25 Aug, 2011

18 commits

  • Patch series 109f6e39..7361c36c back in 2.6.36 added functionality to
    allow credentials to work across pid namespaces for packets sent via
    UNIX sockets. However, the atomic reference counts on pid and
    credentials cause plenty of cache bouncing when numerous threads of
    the same pid share a UNIX socket. This patch mitigates the problem by
    eliminating extraneous reference counts on pid and credentials on both
    the send and receive paths of UNIX sockets. I found a 2x improvement
    in hackbench's threaded case.

    On the receive path in unix_dgram_recvmsg, there is currently one
    increment of the reference counts on pid and credentials, in
    scm_set_cred, followed by two decrements: one in scm_recv and one when
    skb_free_datagram calls the skb->destructor function
    unix_destruct_scm. One increment/decrement pair can be eliminated from
    the receive path, because the reference taken when the skb was created
    on the send side remains held until the skb is destroyed.

    On the send path, there are two increments of ref count on pid and
    credentials, once in scm_send and once in unix_scm_to_skb. Then there
    is a decrement of the reference counts in scm_destroy's call to
    scm_destroy_cred at the end of unix_dgram_sendmsg functions. One pair
    of increment and decrement of the reference counts can be removed so we
    only need to increment the ref counts once.

    With these changes, on a 4-socket NHM-EX machine with 40 cores, the
    execution of hackbench with 50 groups of 20 threads sped up by a
    factor of 2.

    Hackbench command used for testing:
    ./hackbench 50 thread 2000

    Signed-off-by: Tim Chen
    Signed-off-by: David S. Miller

    Tim Chen
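
The reference-count traffic described above can be tallied with a small user-space model. This is only a sketch under simplifying assumptions: `struct shared_obj`, `obj_get()`, `obj_put()` and the two `roundtrip_*` functions are hypothetical names, one counter stands in for both pid and credentials, and the kernel function names appear only in comments to mark where each operation happens according to the commit message.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a shared pid/cred object. */
struct shared_obj {
    atomic_int refcount;
};

/* Number of atomic read-modify-write operations performed so far. */
static int atomic_ops;

static void obj_get(struct shared_obj *o)
{
    atomic_fetch_add(&o->refcount, 1);
    atomic_ops++;
}

static void obj_put(struct shared_obj *o)
{
    atomic_fetch_sub(&o->refcount, 1);
    atomic_ops++;
}

/* One message sent and received with the old scheme: 3 gets, 3 puts. */
static int roundtrip_before(struct shared_obj *o)
{
    atomic_ops = 0;
    obj_get(o); /* send: scm_send */
    obj_get(o); /* send: unix_scm_to_skb */
    obj_put(o); /* send: scm_destroy */
    obj_get(o); /* recv: scm_set_cred */
    obj_put(o); /* recv: scm_recv */
    obj_put(o); /* recv: unix_destruct_scm */
    return atomic_ops;
}

/* Same message with one get/put pair removed from each path. */
static int roundtrip_after(struct shared_obj *o)
{
    atomic_ops = 0;
    obj_get(o); /* send: the single reference carried by the skb */
    obj_put(o); /* recv: dropped when the skb is destroyed */
    return atomic_ops;
}
```

In this model each message costs six atomic operations on the shared cache line before the change and two after it, while the net reference count is unchanged in both cases.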
     
  • With this patch a HEARTBEAT chunk is bundled into the ASCONF-ACK
    for ADD IP ADDRESS, confirming the new destination as quickly as
    possible.

    Signed-off-by: Michio Honda
    Signed-off-by: David S. Miller

    Michio Honda
     
  • This patch fixes a bug in which the ASCONF receiver transmits DATA
    chunks to the newly added UNCONFIRMED destination.

    Signed-off-by: Michio Honda
    Signed-off-by: David S. Miller

    Michio Honda
     
  • This patch implements Proportional Rate Reduction (PRR) for TCP.
    PRR is an algorithm that determines TCP's sending rate in fast
    recovery. PRR avoids excessive window reductions and aims for
    the actual congestion window size at the end of recovery to be as
    close as possible to the window determined by the congestion control
    algorithm. PRR also improves accuracy of the amount of data sent
    during loss recovery.

    The patch implements the recommended flavor of PRR called PRR-SSRB
    (Proportional rate reduction with slow start reduction bound) and
    replaces the existing rate halving algorithm. PRR improves upon the
    existing Linux fast recovery under a number of conditions including:
    1) burst losses where the losses implicitly reduce the amount of
    outstanding data (pipe) below the ssthresh value selected by the
    congestion control algorithm and,
    2) losses near the end of short flows where application runs out of
    data to send.

    As an example, with the existing rate halving implementation a single
    loss event can cause a connection carrying short Web transactions to
    go into the slow start mode after the recovery. This is because during
    recovery Linux pulls the congestion window down to packets_in_flight+1
    on every ACK. A short Web response often runs out of new data to send
    and its pipe reduces to zero by the end of recovery when all its packets
    are drained from the network. Subsequent HTTP responses using the same
    connection will have to slow start to raise cwnd to ssthresh. PRR on
    the other hand aims for the cwnd to be as close as possible to ssthresh
    by the end of recovery.

    A description of PRR and a discussion of its performance can be found at
    the following links:
    - IETF Draft:
    http://tools.ietf.org/html/draft-mathis-tcpm-proportional-rate-reduction-01
    - IETF Slides:
    http://www.ietf.org/proceedings/80/slides/tcpm-6.pdf
    http://tools.ietf.org/agenda/81/slides/tcpm-2.pdf
    - Paper to appear in Internet Measurements Conference (IMC) 2011:
    Improving TCP Loss Recovery
    Nandita Dukkipati, Matt Mathis, Yuchung Cheng

    Signed-off-by: Nandita Dukkipati
    Signed-off-by: David S. Miller

    Nandita Dukkipati
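
The per-ACK window computation of PRR-SSRB described above can be sketched in user-space C. This is an illustration, not the patch itself: `struct prr_state` and `prr_on_ack()` are hypothetical names, all quantities are counted in packets, and the caller is assumed to add every packet it sends during recovery to `prr_out`.

```c
#include <stdint.h>

/* Hypothetical per-connection state kept during fast recovery. */
struct prr_state {
    uint32_t ssthresh;      /* target window chosen by congestion control */
    uint32_t prior_cwnd;    /* congestion window when recovery started */
    uint32_t prr_delivered; /* packets delivered since recovery started */
    uint32_t prr_out;       /* packets sent since recovery started */
};

/*
 * Called for each ACK in recovery. Returns the congestion window to use,
 * given the packets currently in flight and the packets newly reported
 * as delivered by this ACK.
 */
static uint32_t prr_on_ack(struct prr_state *s, uint32_t in_flight,
                           uint32_t newly_delivered)
{
    int64_t sndcnt;

    s->prr_delivered += newly_delivered;

    if (in_flight > s->ssthresh) {
        /* Proportional part: send ssthresh/prior_cwnd packets per
         * delivered packet, rounding up. */
        uint64_t dividend = (uint64_t)s->ssthresh * s->prr_delivered +
                            s->prior_cwnd - 1;
        sndcnt = (int64_t)(dividend / s->prior_cwnd) - s->prr_out;
    } else {
        /* Slow start reduction bound: grow back toward ssthresh, but
         * by no more than what was just delivered plus one packet. */
        int64_t limit = (int64_t)s->prr_delivered - s->prr_out;
        int64_t delta = (int64_t)s->ssthresh - in_flight;

        if (limit < newly_delivered)
            limit = newly_delivered;
        limit += 1;
        sndcnt = delta < limit ? delta : limit;
    }

    if (sndcnt < 0)
        sndcnt = 0;
    return in_flight + (uint32_t)sndcnt;
}
```

With `ssthresh` at half of `prior_cwnd`, the first branch allows roughly one new packet for every two delivered, so the window converges on `ssthresh` rather than collapsing to the amount of data in flight.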
     
  • 1) Blocks can be configured with non-static frame-size.
    2) Read/poll is at block level (as opposed to packet level).
    3) Added poll timeout to avoid indefinite user-space wait on idle links.
    4) Added user-configurable knobs:
       4.1) block::timeout.
       4.2) tpkt_hdr::sk_rxhash.

    Changes:
    C1) tpacket_rcv()
        C1.1) packet_current_frame() is replaced by packet_current_rx_frame().
              The bulk of the processing is then moved into the following chain:
              packet_current_rx_frame()
                __packet_lookup_frame_in_block
                  fill_curr_block()
                  or
                    retire_current_block
                    dispatch_next_block
                  or
                  return NULL (queue is plugged/paused)

    Signed-off-by: Chetan Loke
    Signed-off-by: David S. Miller

    chetan loke
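
The fill / retire / dispatch / plugged decision in the chain above can be modelled with a toy ring of two blocks. This is a sketch under stated assumptions rather than the kernel code: `struct block`, `struct ring`, `release_block()`, `fill_block()` and `current_rx_frame()` are hypothetical names, the sizes are arbitrary, and the block timeout is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS    2
#define BLOCK_SIZE 256

struct block {
    bool user_owned;               /* retired and handed to user space */
    size_t used;                   /* bytes already filled */
    unsigned char data[BLOCK_SIZE];
};

struct ring {
    struct block blocks[NBLOCKS];
    unsigned int cur;              /* block the producer is filling */
};

/* User space hands a consumed block back to the producer. */
static void release_block(struct block *b)
{
    b->used = 0;
    b->user_owned = false;
}

/* Reserve len bytes in a block that is known to have room. */
static void *fill_block(struct block *b, size_t len)
{
    void *frame = &b->data[b->used];

    b->used += len;
    return frame;
}

/* Return space for a frame of len bytes, or NULL if the queue is plugged. */
static void *current_rx_frame(struct ring *r, size_t len)
{
    struct block *b = &r->blocks[r->cur];

    if (len > BLOCK_SIZE || b->user_owned)
        return NULL;                      /* too large, or plugged */
    if (b->used + len <= BLOCK_SIZE)
        return fill_block(b, len);        /* fits in the current block */

    b->user_owned = true;                 /* retire the current block */
    r->cur = (r->cur + 1) % NBLOCKS;      /* dispatch the next block */
    b = &r->blocks[r->cur];
    if (b->user_owned)
        return NULL;                      /* next block not released yet */
    return fill_block(b, len);
}
```

Because ownership changes per block rather than per frame, a reader in this model wakes up once for a whole batch of frames, which is the point of item 2 in the list above.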
     
  • Added TPACKET_V3 definitions.

    Signed-off-by: Chetan Loke
    Signed-off-by: David S. Miller

    chetan loke
     
  • This patch provides base support for transmission of IPv6 packets as
    well as the formation of IPv6 link-local addresses and statelessly
    autoconfigured addresses on top of IEEE 802.15.4 networks.

    For more information please look at the RFC4944 "Compression Format
    for IPv6 Datagrams in Low Power and Lossy Networks (6LoWPAN)".

    Signed-off-by: Alexander Smirnov
    Signed-off-by: David S. Miller

    Alexander Smirnov
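
As an illustration of link-local address formation on an IEEE 802.15.4 interface, the sketch below builds an fe80::/64 address from a 64-bit extended address by inverting the universal/local bit, as is done for EUI-64 based interface identifiers. The function name `eui64_to_link_local()` is hypothetical and the code is not taken from the patch; how the driver itself derives the identifier is an assumption here.

```c
#include <stdint.h>
#include <string.h>

/*
 * Build an IPv6 link-local address (fe80::/64) from an EUI-64.
 * addr receives the 16 address bytes in network order.
 */
static void eui64_to_link_local(const uint8_t eui64[8], uint8_t addr[16])
{
    memset(addr, 0, 16);
    addr[0] = 0xfe;             /* fe80::/64 link-local prefix */
    addr[1] = 0x80;
    memcpy(&addr[8], eui64, 8); /* interface identifier */
    addr[8] ^= 0x02;            /* invert the universal/local bit */
}
```

For the EUI-64 00:11:22:33:44:55:66:77 this yields fe80::211:2233:4455:6677.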
     
  • Signed-off-by: Ian Campbell
    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Ian Campbell
     
  • Signed-off-by: Ian Campbell
    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: "Pekka Savola (ipv6)"
    Cc: James Morris
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Ian Campbell
     
  • Signed-off-by: Ian Campbell
    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: "Pekka Savola (ipv6)"
    Cc: James Morris
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Ian Campbell
     
  • Signed-off-by: Ian Campbell
    Cc: "David S. Miller"
    Cc: Eric Dumazet
    Cc: "Michał Mirosław"
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Ian Campbell
     
  • Signed-off-by: Sathya Perla
    Signed-off-by: David S. Miller

    Sathya Perla
     
  • Flashing some of the PHYs can take longer thus increasing the total flash
    update time to a max of 40s.

    Signed-off-by: Sathya Perla
    Signed-off-by: David S. Miller

    Sathya Perla
     
  • The rx_drops_no_frags HW counter for RSS rings is only 16 bits wide in
    HW and can wrap around often. Maintain a 32-bit accumulator in the
    driver to prevent frequent wraparound.

    Also, incorporated Eric's feedback to use ACCESS_ONCE() for the accumulator
    write.

    Signed-off-by: Sathya Perla
    Signed-off-by: David S. Miller

    Sathya Perla
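
One way to fold a wrapping 16-bit hardware counter into a 32-bit accumulator is sketched below. The function name `accumulate_16bit_val()` is hypothetical, the volatile store is only a user-space stand-in for ACCESS_ONCE(), and the sketch assumes the counter is sampled often enough that it wraps at most once between reads.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Update a 32-bit accumulator from the latest reading of a 16-bit
 * hardware counter. The low 16 bits of *acc mirror the last reading;
 * the high 16 bits count how many times the counter has wrapped.
 */
static void accumulate_16bit_val(uint32_t *acc, uint16_t val)
{
    /* A reading smaller than the previous one means the counter wrapped. */
    bool wrapped = val < (*acc & 0xFFFF);
    uint32_t newacc = (*acc & 0xFFFF0000u) + val;

    if (wrapped)
        newacc += 65536;

    /* Single store, so a concurrent reader never sees a half-updated value. */
    *(volatile uint32_t *)acc = newacc;
}
```

The accumulator itself still wraps at 32 bits, but that happens 65536 times less often than with the raw counter.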
     
  • Get rid of adapter->pcicfg and its use. Use
    pci_read_config_dword()/pci_write_config_dword() instead.

    Signed-off-by: Sathya Perla
    Signed-off-by: David S. Miller

    Sathya Perla
     
  • There is a possibility of be_post_rx_frags() being called simultaneously from
    both be_worker() (when rx_post_starved) and be_poll_rx() (when rxq->used is 0).
    This can be avoided by posting rx buffers only when some completions have been
    reaped.

    Signed-off-by: Sathya Perla
    Signed-off-by: David S. Miller

    Sathya Perla
     
  • Skip IPIP header to get proper layer-4 information.

    As with GRE tunnels, this only works if rxhash is not already provided
    by the device itself (ethtool -K ethX rxhash off), which allows the
    kernel to compute a software rxhash.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
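
Skipping an IPIP header to reach the inner layer-4 header can be sketched on a raw byte buffer. This is a user-space illustration with hypothetical names (`l4_offset()`, `PROTO_IPIP`); it handles IPv4 only and at most one level of encapsulation, which is an assumption of the sketch and not a statement about the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

#define PROTO_IPIP 4 /* IP protocol number for IPv4-in-IPv4 */

/*
 * Return the offset of the layer-4 header in pkt, which starts at an
 * IPv4 header, and store the layer-4 protocol number in *proto.
 * Returns -1 if the packet is malformed or nested too deeply.
 */
static int l4_offset(const uint8_t *pkt, size_t len, uint8_t *proto)
{
    size_t off = 0;

    for (int depth = 0; depth < 2; depth++) {
        if (len - off < 20)
            return -1;                  /* no room for an IPv4 header */

        size_t ihl = (size_t)(pkt[off] & 0x0f) * 4;
        if ((pkt[off] >> 4) != 4 || ihl < 20 || off + ihl > len)
            return -1;                  /* not IPv4, or bad header length */

        uint8_t p = pkt[off + 9];       /* protocol field */
        off += ihl;
        if (p != PROTO_IPIP) {
            *proto = p;                 /* reached the real L4 header */
            return (int)off;
        }
        /* Outer header carried IPIP: loop once more on the inner header. */
    }
    return -1;
}
```

A hash computed over the addresses and ports found at the returned offset distinguishes flows inside the tunnel, whereas hashing only the outer header would map every tunnelled flow to the same value.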
     
  • David S. Miller
     

23 Aug, 2011

5 commits


22 Aug, 2011

8 commits


21 Aug, 2011

9 commits