09 Oct, 2013

30 commits

  • At this point sk might contain garbage.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • TCP listener refactoring, part 4:

    To speed up inet lookups, we moved IPv4 addresses from inet to struct
    sock_common

    Now it is time to do the same for IPv6, because it permits fast lookups
    for all kinds of sockets, including the upcoming SYN_RECV.

    Getting IPv6 addresses in TCP lookups currently requires two extra cache
    lines, plus a dereference (and memory stall).

    inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6

    This patch is way bigger than its IPv4 counterpart because, for IPv4,
    we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6
    this is not easily doable.

    inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
    inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

    Timewait sockets also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
    at the same offsets.

    We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
    macro.
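
    For illustration, the idea is that every socket flavor now carries the
    IPv6 lookup keys at a fixed offset in the shared leading struct (a
    simplified sketch, not the real layout of struct sock_common):

        struct sock_common {
                /* ... existing IPv4 lookup keys (skc_daddr, ...) ... */
                struct in6_addr skc_v6_daddr;     /* seen as sk->sk_v6_daddr */
                struct in6_addr skc_v6_rcv_saddr; /* sk->sk_v6_rcv_saddr */
        };

    Since struct sock and struct inet_timewait_sock share this leading part,
    a single INET6_MATCH() can test either type without dereferencing pinet6.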

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • TCP listener refactoring, part 3:

    Our goal is to hash SYN_RECV sockets into main ehash for fast lookup,
    and parallel SYN processing.

    The current inet_ehash_bucket contains two chains: one for ESTABLISHED
    (and related states) sockets, another for TIME_WAIT sockets only.

    As the hash table is sized to get at most one socket per bucket, it
    makes little sense to have separate twchain, as it makes the lookup
    slightly more complicated, and doubles hash table memory usage.

    If we make sure all socket types have the lookup keys at the same
    offsets, we can use a generic and faster lookup. It turns out TIME_WAIT
    and ESTABLISHED sockets already have common lookup fields for IPv4.

    [ INET_TW_MATCH() is no longer needed ]

    I'll provide a follow-up to factor the IPv6 lookup as well and remove
    INET6_TW_MATCH().

    This way, SYN_RECV pseudo sockets will be supported in the same way.

    A new sock_gen_put() helper is added, doing either a sock_put() or
    inet_twsk_put() [ and will support SYN_RECV later ].

    Note this helper should only be called in the real slow path, when an RCU
    lookup found a socket that was moved to another identity (freed/reused
    immediately), but it could eventually be used in other contexts, like
    sock_edemux().

    Before patch:

    dmesg | grep "TCP established"

    TCP established hash table entries: 524288 (order: 11, 8388608 bytes)

    After patch:

    TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
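
    The new helper is essentially a type dispatch on release; a simplified
    sketch of the idea:

        void sock_gen_put(struct sock *sk)
        {
                if (!atomic_dec_and_test(&sk->sk_refcnt))
                        return;

                if (sk->sk_state == TCP_TIME_WAIT)
                        inet_twsk_free(inet_twsk(sk)); /* timewait flavor */
                else
                        sk_free(sk);                   /* full socket */
                /* a SYN_RECV branch can be added here later */
        }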

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Conflicts:
    include/linux/netdevice.h
    net/core/sock.c

    Trivial merge issues.

    Removal of "extern" for functions declaration in netdevice.h
    at the same time "const" was added to an argument.

    Two parallel line additions in net/core/sock.c

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Ben Hutchings says:

    ====================
    Some more fixes for EF10 support; hopefully the last lot:

    1. Fixes for reading statistics, from Edward Cree and Jon Cooper.
    2. Addition of ethtool statistics for packets dropped by the hardware
    before they were associated with a specific function, from Edward Cree.
    3. Only bind to functions that are in control of their associated port,
    as the driver currently assumes this is the case.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Steinar reported FQ pacing was not working for UDP flows.

    It looks like the initial sk->sk_pacing_rate value of 0 was
    a wrong choice. We should initialize it to ~0U (unlimited).

    Then, TCA_FQ_FLOW_DEFAULT_RATE should be removed because it makes
    no real sense. The default rate is really unlimited, and we
    need to avoid a divide by zero.
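
    A sketch of the intent (simplified, not the exact fq scheduler code):

        u32 rate = sk->sk_pacing_rate; /* ~0U now means "unlimited" */

        if (rate != ~0U && rate != 0)  /* guard the divide */
                delay_ns = div64_u64((u64)skb->len * NSEC_PER_SEC, rate);
        else
                delay_ns = 0;          /* no pacing: send immediately */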

    Reported-by: Steinar H. Gunderson
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This reverts commit 612c337306f00dc8d396830212de51c475844791.

    As per Stephen Hemminger, the layout of the netlink attribute
    is not implemented correctly so revert this for now.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Add the missing destroy_workqueue() before returning from
    qlcnic_probe() in the error handling path.
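
    The usual unwind shape (an illustrative sketch; the label, step and
    workqueue field names are assumptions, not the driver's exact code):

        err = some_later_probe_step(adapter); /* hypothetical step */
        if (err)
                goto err_destroy_wq;
        return 0;

    err_destroy_wq:
        destroy_workqueue(adapter->qlcnic_wq); /* the missing cleanup */
        return err;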

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • This patch fixes the error handling in moxart_mac_probe():
    - return -ENOMEM in the memory allocation failure cases
    - add the missing free_netdev() in the error handling path
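
    The -ENOMEM half follows the classic pattern (sketch; names are
    hypothetical):

        priv->tx_buf_base = kmalloc(priv->tx_buf_size, GFP_KERNEL);
        if (!priv->tx_buf_base) {
                ret = -ENOMEM;  /* previously ret could remain 0 here */
                goto init_fail; /* init_fail now also frees the netdev */
        }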

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • This patch fixes the calculation of the nlmsg size by adding the
    missing nla_total_size().
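
    For context, such sizes are computed along these lines (sketch; the
    attributes here are hypothetical):

        /* nla_total_size() accounts for the attribute header and padding,
         * not just the payload */
        static size_t example_nlmsg_payload_size(void)
        {
                return NLMSG_ALIGN(sizeof(struct rtgenmsg))
                       + nla_total_size(sizeof(u32))  /* attr A */
                       + nla_total_size(sizeof(u32)); /* attr B */
        }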

    Cc: Patrick McHardy
    Signed-off-by: Marc Kleine-Budde
    Signed-off-by: David S. Miller

    Marc Kleine-Budde
     
  • TCA_FQ_INITIAL_QUANTUM should set q->initial_quantum
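
    The fix is presumably the one-liner (sketch):

        if (tb[TCA_FQ_INITIAL_QUANTUM])
                q->initial_quantum = nla_get_u32(tb[TCA_FQ_INITIAL_QUANTUM]);
        /* the typo had this branch assigning q->quantum instead */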

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Unlike ipv4, the struct member hlen holds the length of the GRE and ipv6
    headers, and this length is also counted in dev->hard_header_len.
    Perhaps it would be cleaner to modify hlen to count only the GRE header,
    without the ipv6 header, as the variable name suggests, but the simple
    way to fix this without regression risk is to modify the calculation of
    the limit in ip6gre_tunnel_change_mtu().
    Verified on kernel version v3.11.
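
    In rough terms (an illustrative sketch, not the actual patch):

        /* tunnel->hlen already includes the ipv6 header here, so the upper
         * bound must not count that header a second time through
         * dev->hard_header_len */
        if (new_mtu < IPV6_MIN_MTU || new_mtu > 0xFFF8 - tunnel->hlen)
                return -EINVAL;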

    Signed-off-by: Oussama Ghorbel
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Oussama Ghorbel
     
  • We can get the classid through cgroup_subsys_state;
    this is more direct and efficient.

    Signed-off-by: Gao feng
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Gao feng
     
  • Since the tasks have been migrated to the cgroup,
    there is no need to call task_netprioidx() to get
    the task's cgroup id.

    Signed-off-by: Gao feng
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Gao feng
     
  • Since the removal of the routing cache, fib_compute_spec_dst() does a
    fib_table lookup for each UDP multicast packet received. This has
    introduced a performance regression for some UDP workloads.

    This change skips populating the packet info for sockets that do not have
    IP_PKTINFO set.

    Benchmark results from a netperf UDP_RR test:
    Before 89789.68 transactions/s
    After 90587.62 transactions/s

    Benchmark results from a fio 1 byte UDP multicast pingpong test
    (Multicast one way unicast response):
    Before 12.63us RTT
    After 12.48us RTT
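
    The core of the change is a guard of roughly this shape (sketch,
    simplified from the real call site):

        /* only pay for the fib lookup when the receiver asked for the
         * ancillary data via setsockopt(IP_PKTINFO) */
        if (inet_sk(sk)->cmsg_flags & IP_CMSG_PKTINFO)
                ipv4_pktinfo_prepare(skb);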

    Signed-off-by: Shawn Bohrer
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Shawn Bohrer
     
  • The removal of the routing cache introduced a performance regression for
    some UDP workloads since a dst lookup must be done for each packet.
    This change caches the dst per socket in a similar manner to what we do
    for TCP by implementing early_demux.

    For UDP multicast we can only cache the dst if there is only one
    receiving socket on the host. Since caching only works when there is
    one receiving socket we do the multicast socket lookup using RCU.

    For UDP unicast we only demux sockets with an exact match in order to
    not break forwarding setups. Additionally since the hash chains may be
    long we only check the first socket to see if it is a match and not
    waste extra time searching the whole chain when we might not find an
    exact match.

    Benchmark results from a netperf UDP_RR test:
    Before 87961.22 transactions/s
    After 89789.68 transactions/s

    Benchmark results from a fio 1 byte UDP multicast pingpong test
    (Multicast one way unicast response):
    Before 12.97us RTT
    After 12.63us RTT
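
    Conceptually, the receive path becomes (a hedged sketch; the lookup
    helper name is hypothetical):

        /* early demux: find the single matching socket under RCU and reuse
         * its cached dst, skipping the per-packet route lookup */
        sk = udp_lookup_exact_or_sole_mcast(skb); /* hypothetical */
        if (sk) {
                struct dst_entry *dst = ACCESS_ONCE(sk->sk_rx_dst);

                if (dst)
                        skb_dst_set_noref(skb, dst);
        }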

    Signed-off-by: Shawn Bohrer
    Signed-off-by: David S. Miller

    Shawn Bohrer
     
  • UDP sockets can receive packets from multiple endpoints, so their packets
    may arrive on multiple receive queues. Since packets can arrive on
    multiple receive queues we should not mark the napi_id for all packets.
    This makes busy read/poll only work for connected UDP sockets.

    This additionally enables busy read/poll for UDP multicast packets as
    long as the socket is connected by moving the check into
    __udp_queue_rcv_skb().
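
    Sketched placement (simplified; the exact connected-socket test is an
    assumption):

        static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
        {
                /* only a connected socket maps to a single flow, so only
                 * then is a per-packet napi_id meaningful for busy poll */
                if (inet_sk(sk)->inet_daddr)
                        sk_mark_napi_id(sk, skb);

                /* ... normal queueing follows ... */
        }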

    Signed-off-by: Shawn Bohrer
    Suggested-by: Eric Dumazet
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Shawn Bohrer
     
  • qdisc_tree_decrease_qlen() is called when some packets are dropped
    on a qdisc, and we want to notify parents of qlen changes.

    We can also increment the parents' qdisc qstats drop counters.

    This permits more accurate drop counters up to root qdisc.

    For example, a graft operation typically resets a qdisc
    (drops all packets) and calls qdisc_tree_decrease_qlen().

    Note that callers are responsible for their drop counters.
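
    The shape of the walk, with the new accounting marked (a simplified
    sketch of the helper):

        void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
        {
                u32 parentid;

                while ((parentid = sch->parent)) {
                        sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
                        if (!sch)
                                break;
                        sch->q.qlen -= n;
                        sch->qstats.drops += n; /* new: account the drops */
                        /* per-class qlen_notify() callbacks omitted */
                }
        }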

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • If a guest is destroyed without transitioning its frontend to CLOSED,
    the domain becomes a zombie as netback was not grant unmapping the
    shared rings.

    When removing a VIF, transition the backend to CLOSED so the VIF is
    disconnected if necessary (which will unmap the shared rings etc).

    This fixes a regression introduced by
    279f438e36c0a70b23b86d2090aeec50155034a9 (xen-netback: Don't destroy
    the netdev until the vif is shut down).

    Signed-off-by: David Vrabel
    Cc: Ian Campbell
    Cc: Wei Liu
    Cc: Paul Durrant
    Acked-by: Wei Liu
    Reviewed-by: Paul Durrant
    Signed-off-by: David S. Miller

    David Vrabel
     
  • Amir Vadai says:

    ====================
    net/mlx4_en: Fix pages never dma unmapped on rx

    This patchset fixes a bug introduced by commit 51151a16 (mlx4: allow
    order-0 memory allocations in RX path), where dma_unmap_page() wasn't
    called.

    Changes from V0:
    - Added "Rename name of mlx4_en_rx_alloc members". Old names were confusing.
    - Last frag in page calculation was wrong. Since all frags in page are of the
    same size, need to add this frag_stride to end of frag offset, and not the
    size of next frag in skb.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This patch fixes a bug introduced by commit 51151a16 (mlx4: allow
    order-0 memory allocations in RX path).

    dma_unmap_page() is never reached because the condition used to detect
    the last fragment in a page is wrong: offset + frag_stride can't be
    greater than size. We need to make sure no additional frag will fit in
    the page, i.e. compare offset + frag_stride + next_frag_size instead.
    next_frag_size is the same as the current one, since a page is shared
    only with frags of the same size.
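
    As arithmetic, the corrected test is roughly (sketch; names are
    illustrative):

        /* another same-size frag fits only if its end stays inside the
         * page; otherwise this frag is the last user of the page and the
         * page may be dma_unmap_page()'d */
        last_frag_in_page =
                frag->page_offset + frag_stride + frag_size > page_size;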

    CC: Eric Dumazet
    Signed-off-by: Amir Vadai
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Amir Vadai
     
  • Add a page prefix to the page-related members: rename @size and @offset
    to @page_size and @page_offset.

    CC: Eric Dumazet
    Signed-off-by: Amir Vadai
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Amir Vadai
     
  • Currently, in TLB mode we change mac addresses only by memcpy-ing them to
    net_device->dev_addr, without actually setting them via
    dev_set_mac_address(). This permits us to always receive all the traffic
    on one mac address.

    However, in case the interface flips, some drivers might enforce
    mac filtering for their FW/HW based on the current ->dev_addr, and thus we
    won't be able to receive traffic on that interface if it is selected
    as active in TLB mode.

    Fix it by setting the mac address forcefully on every new active slave that
    we select in TLB mode.

    CC: Jay Vosburgh
    CC: Andy Gospodarek
    CC: Yuval Mintz
    Reported-by: Yuval Mintz
    Tested-by: Yuval Mintz
    Signed-off-by: Veaceslav Falico
    Signed-off-by: David S. Miller

    Veaceslav Falico
     
  • This patch fixes RX packet errors when receiving large amounts of data,
    by setting bit RNC = 1.

    RNC - Receive Enable Control

    0: Upon completion of reception of one frame, the E-DMAC writes
    the receive status to the descriptor and clears the RR bit in
    EDRRR to 0.

    1: Upon completion of reception of one frame, the E-DMAC writes
    (writes back) the receive status to the descriptor. In addition,
    the E-DMAC reads the next descriptor and prepares for reception
    of the next frame.

    In addition, to make packet reception more stable, I set the maximum
    size for the transmit/receive FIFOs and insert padding into the
    receive data.

    Signed-off-by: Nguyen Hong Ky
    Signed-off-by: David S. Miller

    Nguyen Hong Ky
     
  • net/l2tp/l2tp_core.c: In function ‘l2tp_verify_udp_checksum’:
    net/l2tp/l2tp_core.c:499:22: warning: unused variable ‘tunnel’ [-Wunused-variable]

    Create a helper "l2tp_tunnel()" to facilitate this, and as a side
    effect get rid of a bunch of unnecessary void pointer casts.
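
    The helper is essentially a typed accessor (a sketch of the idea):

        static inline struct l2tp_tunnel *l2tp_tunnel(struct sock *sk)
        {
                return sk->sk_user_data; /* tunnel stored at setup time */
        }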

    Signed-off-by: David S. Miller

    David S. Miller
     
  • We play with a wait queue even if the socket is
    non-blocking. This is an obvious waste.
    Besides, it will prevent calling the non-blocking
    variant when current is not valid.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • Alan Ott says:

    ====================
    Fix race conditions in mrf24j40 interrupts

    After testing with the betas of this patchset, it's been rebased and is
    ready for inclusion.

    David Hauweele noticed that the mrf24j40 would hang arbitrarily after some
    period of heavy traffic. Two race conditions were discovered, and the
    driver was changed to use threaded interrupts, since the enable/disable of
    interrupts in the driver has recently been a lightning rod whenever issues
    arise related to interrupts (costing engineering time), and since threaded
    interrupts are the right way to do it.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The mrf24j40 generates level interrupts. There are rare cases where it
    appears that the interrupt line never gets de-asserted between interrupts,
    causing interrupts to be lost, and causing a hung device from the driver's
    perspective. Switching the driver to interpret these interrupts as
    level-triggered fixes this issue.

    Signed-off-by: Alan Ott
    Signed-off-by: David S. Miller

    Alan Ott
     
  • Eliminate all the workqueue and interrupt enable/disable logic.
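
    Registration would then look something like this (a sketch; the handler
    name and flags are assumptions):

        /* NULL hard handler + IRQF_ONESHOT keeps the line masked until the
         * thread, which may sleep on SPI transfers, has finished */
        ret = request_threaded_irq(spi->irq, NULL, mrf24j40_isr,
                                   IRQF_TRIGGER_LOW | IRQF_ONESHOT,
                                   dev_name(&spi->dev), devrec);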

    Signed-off-by: Alan Ott
    Signed-off-by: David S. Miller

    Alan Ott
     
  • This avoids a race condition where complete(tx_complete) could be called
    before tx_complete is initialized.
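
    In other words (sketch; the field name follows the message):

        init_completion(&devrec->tx_complete); /* initialize first... */
        /* ...then start the TX whose interrupt handler will call
         * complete(&devrec->tx_complete) */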

    Signed-off-by: Alan Ott
    Signed-off-by: David S. Miller

    Alan Ott