07 May, 2009

1 commit


06 May, 2009

1 commit

  • v5 -> v6 (current):
    -removed so far unused static functions
    -corrected dev_addr_del_multiple to call del instead of add

    v4 -> v5:
    -added device address type (suggested by davem)
    -removed refcounting (better to have simpler code than to save potentially
    a few bytes)

    v3 -> v4:
    -changed kzalloc to kmalloc in __hw_addr_add_ii()
    -ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init()

    v2 -> v3:
    -removed unnecessary rcu read locking
    -moved dev_addr_flush() calling to ensure no null dereference of dev_addr

    v1 -> v2:
    -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush
    -removed unnecessary rcu_read locking in dev_addr_init
    -use compare_ether_addr_64bits instead of compare_ether_addr
    -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr
    -use call_rcu instead of rcu_synchronize
    -moved is_etherdev_addr into __KERNEL__ ifdef

    This patch introduces a new list in struct net_device and brings a set of
    functions to handle the work with the device address list. The list is a
    replacement for the original dev_addr field, because in some situations there
    is a need to carry several device addresses with the net device. To be
    backward compatible, dev_addr is made to point to the first member of the
    list so original drivers see no difference.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
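
The backward-compatibility idea described above (a list of addresses, with the legacy dev_addr pointer aimed at the first member) can be sketched in userspace C. This is only an illustration: struct toy_addr_dev, struct toy_hw_addr, toy_dev_addr_add(), toy_dev_addr_flush() and TOY_MAX_ADDR_LEN are hypothetical names, not the kernel's API, and the sketch uses plain calloc/free with no RCU or RTNL locking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_ADDR_LEN 32 /* assumed size, for illustration only */

/* Hypothetical stand-in for one entry of the device address list. */
struct toy_hw_addr {
    struct toy_hw_addr *next;
    unsigned char addr[TOY_MAX_ADDR_LEN];
};

/* Hypothetical stand-in for the relevant part of struct net_device. */
struct toy_addr_dev {
    struct toy_hw_addr *dev_addr_list; /* new: list of addresses */
    unsigned char *dev_addr;           /* legacy field: first entry's bytes */
    size_t addr_len;
};

/* Append an address; returns 0 on success, -1 on allocation failure. */
static int toy_dev_addr_add(struct toy_addr_dev *dev, const unsigned char *addr)
{
    struct toy_hw_addr *ha, **pp;

    assert(dev->addr_len <= TOY_MAX_ADDR_LEN);
    ha = calloc(1, sizeof(*ha));
    if (!ha)
        return -1;
    memcpy(ha->addr, addr, dev->addr_len);

    /* Walk to the tail and link the new entry there. */
    for (pp = &dev->dev_addr_list; *pp; pp = &(*pp)->next)
        ;
    *pp = ha;

    /* Keep the legacy pointer aimed at the first list member. */
    dev->dev_addr = dev->dev_addr_list->addr;
    return 0;
}

/* Free every entry and clear the legacy pointer. */
static void toy_dev_addr_flush(struct toy_addr_dev *dev)
{
    while (dev->dev_addr_list) {
        struct toy_hw_addr *ha = dev->dev_addr_list;

        dev->dev_addr_list = ha->next;
        free(ha);
    }
    dev->dev_addr = NULL;
}
```

Code that only ever reads dev_addr keeps working unchanged in this model, because the pointer always refers to the bytes of the first list entry.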
     

30 Apr, 2009

1 commit


28 Apr, 2009

2 commits

  • netif_tx_queue_stopped(txq) is most of the time false.

    Yet its cost is very expensive on SMP.

    static inline int netif_tx_queue_stopped(const struct netdev_queue *dev_queue)
    {
            return test_bit(__QUEUE_STATE_XOFF, &dev_queue->state);
    }

    I saw this on oprofile hunting and bnx2 driver bnx2_tx_int().

    We probably should split "struct netdev_queue" in two parts, one
    being read mostly.

    __netif_tx_lock() touches _xmit_lock & xmit_lock_owner, these
    deserve a separate cache line.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
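
The cache-line concern above can be illustrated with a small layout sketch. It assumes a 64-byte cache line and uses C11 alignment; struct toy_queue, TOY_CACHE_LINE and toy_same_cache_line() are hypothetical names, and the kernel itself would use its own alignment annotations rather than _Alignas.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_LINE 64 /* assumed line size; the real value is per-arch */

/* Hypothetical stand-in for a transmit queue structure. */
struct toy_queue {
    /* Read-mostly part: looked at on every completion. */
    void *dev;
    unsigned long state;

    /* Write-mostly part: starts on its own cache line. */
    _Alignas(TOY_CACHE_LINE) int xmit_lock;
    int xmit_lock_owner;
};

/* The lock must begin exactly on a cache-line boundary. */
static_assert(offsetof(struct toy_queue, xmit_lock) % TOY_CACHE_LINE == 0,
              "xmit_lock should start a cache line");

/* Returns non-zero if two member offsets fall on the same cache line. */
static int toy_same_cache_line(size_t off_a, size_t off_b)
{
    return off_a / TOY_CACHE_LINE == off_b / TOY_CACHE_LINE;
}
```

With this layout, a CPU that only tests the state word does not share a line with the lock fields that another CPU keeps writing.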
     
  • this is the sctp code to enable hardware crc32c offload for
    adapters that support it.

    Originally by: Vlad Yasevich

    modified by Jesse Brandeburg

    Signed-off-by: Jesse Brandeburg
    Signed-off-by: Vlad Yasevich
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jesse Brandeburg
     

27 Apr, 2009

4 commits


21 Apr, 2009

1 commit


16 Apr, 2009

1 commit

  • It turns out that copying a 16-byte area at ~800k times a second
    can be really expensive :) This patch redesigns the frags GRO
    interface to avoid copying that area twice.

    The two disciples of the frags interface have been converted.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

29 Mar, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (119 commits)
    [SCSI] scsi_dh_rdac: Retry for NOT_READY check condition
    [SCSI] mpt2sas: make global symbols unique
    [SCSI] sd: Make revalidate less chatty
    [SCSI] sd: Try READ CAPACITY 16 first for SBC-2 devices
    [SCSI] sd: Refactor sd_read_capacity()
    [SCSI] mpt2sas v00.100.11.15
    [SCSI] mpt2sas: add MPT2SAS_MINOR(221) to miscdevice.h
    [SCSI] ch: Add scsi type modalias
    [SCSI] 3w-9xxx: add power management support
    [SCSI] bsg: add linux/types.h include to bsg.h
    [SCSI] cxgb3i: fix function descriptions
    [SCSI] libiscsi: fix possbile null ptr session command cleanup
    [SCSI] iscsi class: remove host no argument from session creation callout
    [SCSI] libiscsi: pass session failure a session struct
    [SCSI] iscsi lib: remove qdepth param from iscsi host allocation
    [SCSI] iscsi lib: have lib create work queue for transmitting IO
    [SCSI] iscsi class: fix lock dep warning on logout
    [SCSI] libiscsi: don't cap queue depth in iscsi modules
    [SCSI] iscsi_tcp: replace scsi_debug/tcp_debug logging with iscsi conn logging
    [SCSI] libiscsi_tcp: replace tcp_debug/scsi_debug logging with session/conn logging
    ...

    Linus Torvalds
     

28 Mar, 2009

1 commit

  • The inline function skb_gro_mac_header defined in include/linux/netdevice.h
    makes use of page_address(). Depending on configuration options, the latter
    is either defined as a macro or is declared as a function in another header
    file, namely include/linux/mm.h. However, include/linux/netdevice.h does not
    include include/linux/mm.h.

    On MIPS, this has produced the following build error:

    CC kernel/sysctl_check.o
    In file included from include/linux/icmpv6.h:173,
    from include/linux/ipv6.h:208,
    from include/net/ip_vs.h:26,
    from kernel/sysctl_check.c:6:
    include/linux/netdevice.h: In function 'skb_gro_mac_header':
    include/linux/netdevice.h:1132: error: implicit declaration of function
    'page_address'
    include/linux/netdevice.h:1133: warning: pointer/integer type mismatch
    in conditional expression
    make[1]: *** [kernel/sysctl_check.o] Error 1
    make: *** [kernel] Error 2

    The patch adds the missing include and fixes the build error.

    Signed-off-by: Dmitri Vorobiev
    Signed-off-by: David S. Miller

    Dmitri Vorobiev
     

17 Mar, 2009

1 commit

  • As my netpoll fix for net doesn't really work for net-next, we
    need this update to move the checks into the right place. As it
    stands we may pass freed skbs to netpoll_receive_skb.

    This patch also introduces a netpoll_rx_on function to avoid GRO
    completely if we're invoked through netpoll. This might seem
    paranoid but as netpoll may have an external receive hook it's
    better to be safe than sorry. I don't think we need this for
    2.6.29 though since there's nothing immediately broken by it.

    This patch also moves the GRO_* return values to netdevice.h since
    VLAN needs them too (I tried to avoid this originally but alas
    this seems to be the easiest way out). This fixes a bug in VLAN
    where it continued to use the old return value 2 instead of the
    correct GRO_DROP.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

14 Mar, 2009

3 commits


05 Mar, 2009

2 commits


15 Feb, 2009

1 commit


09 Feb, 2009

2 commits

  • This patch optimises the Ethernet header comparison to use 2-byte
    and 4-byte xors instead of memcmp. In order to facilitate this,
    the actual comparison is now carried out by the callers of the
    shared dev_gro_receive function.

    This has a significant impact when receiving 1500B packets through
    10GbE.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
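
A rough userspace sketch of comparing two 14-byte Ethernet headers with one 2-byte and three 4-byte xors, as the message above describes. The names toy_compare_ether_header(), toy_load16() and toy_load32() are hypothetical, and the loads go through memcpy so the sketch stays portable on platforms that fault on unaligned access; the kernel helper may be written differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load helpers that are safe for any alignment. */
static uint16_t toy_load16(const unsigned char *p)
{
    uint16_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

static uint32_t toy_load32(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/*
 * Compare two 14-byte Ethernet headers (6 + 6 + 2 bytes).
 * Returns 0 if they are identical, non-zero otherwise.
 * 14 bytes are covered as 2 + 4 + 4 + 4.
 */
static uint32_t toy_compare_ether_header(const void *a, const void *b)
{
    const unsigned char *pa = a, *pb = b;
    uint32_t diff;

    assert(a && b);
    diff  = (uint32_t)(toy_load16(pa) ^ toy_load16(pb));
    diff |= toy_load32(pa + 2)  ^ toy_load32(pb + 2);
    diff |= toy_load32(pa + 6)  ^ toy_load32(pb + 6);
    diff |= toy_load32(pa + 10) ^ toy_load32(pb + 10);
    return diff;
}
```

Unlike memcmp, the result carries no ordering information; it only says whether the headers differ, which is all a same-flow check needs.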
     
  • This patch prepares for the move of the same_flow checks out of
    dev_gro_receive. As such we need to remember the number of held
    packets since doing a loop just to count them every time is silly.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

06 Feb, 2009

1 commit


30 Jan, 2009

2 commits

  • Unfortunately simplicity isn't always the best. The fraginfo
    interface turned out to be suboptimal. The problem was quite
    obvious. For every packet, we have to copy the headers from
    the frags structure into skb->head, even though for 99% of the
    packets this part is immediately thrown away after the merge.

    LRO didn't have this problem because it directly read the headers
    from the frags structure.

    This patch attempts to address this by creating an interface
    that allows GRO to access the headers in the first frag without
    having to copy it. Because all drivers that use frags place the
    headers in the first frag this optimisation should be enough.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Currently VLAN still has a bit of common code handling the aftermath
    of GRO that's shared with the common path. This patch moves them
    into shared helpers to reduce code duplication.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

22 Jan, 2009

1 commit

  • Following the removal of the unused struct net_device * parameter from
    the NAPI functions named *netif_rx_* in commit 908a7a1, they are
    exactly equivalent to the corresponding *napi_* functions and are
    therefore redundant.

    Signed-off-by: Ben Hutchings
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Ben Hutchings
     

15 Jan, 2009

1 commit

  • This adds an init_dummy_netdev() function that gets a network device
    structure (allocation and lifetime entirely under caller's control) and
    initializes the minimum amount of fields so it can be used to schedule
    NAPI polls without registering a full blown interface. This is to be
    used by drivers that need to tie several hardware interfaces to a single
    NAPI poll scheduler due to HW limitations.

    It also updates the ibm_newemac driver to use that, thus fixing the
    oops on 2.6.29 due to passing NULL as "dev" to netif_napi_add().

    Symbol is exported GPL only as I don't think we want binary drivers doing
    that sort of acrobatics (if we want them at all).

    Signed-off-by: Benjamin Herrenschmidt
    Tested-by: Geert Uytterhoeven
    Signed-off-by: David S. Miller

    Benjamin Herrenschmidt
     

13 Jan, 2009

1 commit


10 Jan, 2009

1 commit

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (22 commits)
    ioat: fix self test for multi-channel case
    dmaengine: bump initcall level to arch_initcall
    dmaengine: advertise all channels on a device to dma_filter_fn
    dmaengine: use idr for registering dma device numbers
    dmaengine: add a release for dma class devices and dependent infrastructure
    ioat: do not perform removal actions at shutdown
    iop-adma: enable module removal
    iop-adma: kill debug BUG_ON
    iop-adma: let devm do its job, don't duplicate free
    dmaengine: kill enum dma_state_client
    dmaengine: remove 'bigref' infrastructure
    dmaengine: kill struct dma_client and supporting infrastructure
    dmaengine: replace dma_async_client_register with dmaengine_get
    atmel-mci: convert to dma_request_channel and down-level dma_slave
    dmatest: convert to dma_request_channel
    dmaengine: introduce dma_request_channel and private channels
    net_dma: convert to dma_find_channel
    dmaengine: provide a common 'issue_pending_all' implementation
    dmaengine: centralize channel allocation, introduce dma_find_channel
    dmaengine: up-level reference counting to the module level
    ...

    Linus Torvalds
     

07 Jan, 2009

2 commits

    Previously GRO's only entry points from the outside were
    napi_gro_receive and napi_gro_frags. These interfaces are for
    device drivers.

    This patch rearranges things to provide a new set of interfaces
    for VLANs. These interfaces are for internal use only. The
    VLAN code itself can then provide a set of entry points for
    device drivers.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Use the general-purpose channel allocation provided by dmaengine.

    Reviewed-by: Andrew Morton
    Signed-off-by: Dan Williams

    Dan Williams
     

05 Jan, 2009

1 commit

  • This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
    in one skb, rather than using the less efficient frag_list.

    It also adds a new interface, napi_gro_frags to allow drivers
    to inject page frags directly into the stack without allocating
    an skb. This is intended to be the GRO equivalent for LRO's
    lro_receive_frags interface.

    The existing GSO interface can already handle page frags with
    or without an appended frag_list so nothing needs to be changed
    there.

    The merging itself is rather simple. We store any new frag entries
    after the last existing entry, without checking whether the first
    new entry can be merged with the last existing entry. Making this
    check would actually be easy but since no existing driver can
    produce contiguous frags anyway it would just be mental masturbation.

    If the total number of entries would exceed the capacity of a
    single skb, we simply resort to using frag_list as we do now.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
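
The merge rule described above (append the new frag entries after the last existing one, and fall back to frag_list when the total would not fit) can be sketched as follows. The types and names here (struct toy_skb, struct toy_frag, toy_gro_merge_frags(), TOY_MAX_FRAGS) are hypothetical and the capacity of 4 is an arbitrary value chosen for the example.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_FRAGS 4 /* assumed small capacity, for illustration only */

struct toy_frag {
    const void *page;
    unsigned int offset;
    unsigned int size;
};

/* Hypothetical stand-in for an skb that carries only page frags. */
struct toy_skb {
    unsigned int nr_frags;
    struct toy_frag frags[TOY_MAX_FRAGS];
    unsigned int len;
};

enum toy_merge { TOY_MERGED_FRAGS, TOY_NEED_FRAG_LIST };

/*
 * Append all frags of new_skb after the last frag of held.
 * No attempt is made to coalesce the first new entry with the
 * last existing one. If the combined count would exceed the
 * capacity, nothing is changed and the caller must fall back
 * to chaining via frag_list.
 */
static enum toy_merge toy_gro_merge_frags(struct toy_skb *held,
                                          const struct toy_skb *new_skb)
{
    assert(held->nr_frags <= TOY_MAX_FRAGS);
    assert(new_skb->nr_frags <= TOY_MAX_FRAGS);

    if (held->nr_frags + new_skb->nr_frags > TOY_MAX_FRAGS)
        return TOY_NEED_FRAG_LIST;

    memcpy(&held->frags[held->nr_frags], new_skb->frags,
           new_skb->nr_frags * sizeof(struct toy_frag));
    held->nr_frags += new_skb->nr_frags;
    held->len += new_skb->len;
    return TOY_MERGED_FRAGS;
}
```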
     

23 Dec, 2008

1 commit

  • When the napi api was changed to separate its 1:1 binding to the net_device
    struct, the netif_rx_[prep|schedule|complete] api failed to remove the now
    vestigial net_device structure parameter. This patch cleans up that api by
    properly removing it.

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller

    Neil Horman
     

16 Dec, 2008

3 commits

  • This patch adds the top-level GRO (Generic Receive Offload) infrastructure.
    This is pretty similar to LRO except that this is protocol-independent.
    Instead of holding packets in an lro_mgr structure, they're now held in
    napi_struct.

    For drivers that intend to use this, they can set the NETIF_F_GRO bit and
    call napi_gro_receive instead of netif_receive_skb or just call netif_rx.
    The latter will call napi_receive_skb automatically. When napi_gro_receive
    is used, the driver must either call napi_complete/napi_rx_complete, or
    call napi_gro_flush in softirq context if the driver uses the primitives
    __napi_complete/__napi_rx_complete.

    Protocols will set the gro_receive and gro_complete function pointers in
    order to participate in this scheme.

    In addition to the packet, gro_receive will get a list of currently held
    packets. Each packet in the list has a same_flow field which is non-zero
    if it is a potential match for the new packet. For each packet that may
    match, they also have a flush field which is non-zero if the held packet
    must not be merged with the new packet.

    Once gro_receive has determined that the new skb matches a held packet,
    the held packet may be processed immediately if the new skb cannot be
    merged with it. In this case gro_receive should return the pointer to
    the existing skb in gro_list. Otherwise the new skb should be merged into
    the existing packet and NULL should be returned, unless the new skb makes
    it impossible for any further merges to be made (e.g., FIN packet) where
    the merged skb should be returned.

    Whenever the skb is merged into an existing entry, the gro_receive
    function should set NAPI_GRO_CB(skb)->same_flow. Note that if an skb
    merely matches an existing entry but can't be merged with it, then
    this shouldn't be set.

    If gro_receive finds it pointless to hold the new skb for future merging,
    it should set NAPI_GRO_CB(skb)->flush.

    Held packets will be flushed by napi_gro_flush which is called by
    napi_complete and napi_rx_complete.

    Currently held packets are stored in a singly linked list just like LRO.
    The list is limited to a maximum of 8 entries. In future, this may be
    expanded to use a hash table to allow more flows to be held for merging.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
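
A much-simplified model of the holding scheme described above, assuming a single integer flow key per packet. It keeps at most 8 held entries, merges a packet into a matching held entry, and passes a packet up normally when it is marked flush or the list is full. All names (struct toy_gro, struct toy_pkt, toy_gro_receive(), toy_gro_flush()) are hypothetical, and the model deliberately leaves out the case where a held packet must be pushed out because a matching packet cannot be merged with it.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_GRO_MAX_HELD 8 /* list limit stated in the commit message */

struct toy_pkt {
    unsigned int flow; /* hypothetical flow key */
    unsigned int len;
    int flush;         /* non-zero: pointless to hold for merging */
};

struct toy_gro {
    struct toy_pkt held[TOY_GRO_MAX_HELD];
    size_t count;
};

enum toy_gro_result { TOY_GRO_MERGED, TOY_GRO_HELD, TOY_GRO_NORMAL };

/* Offer one packet to the held list. */
static enum toy_gro_result toy_gro_receive(struct toy_gro *g,
                                           const struct toy_pkt *p)
{
    size_t i;

    assert(g->count <= TOY_GRO_MAX_HELD);

    /* Same flow and mergeable: grow the held packet. */
    for (i = 0; i < g->count; i++) {
        if (g->held[i].flow == p->flow && !p->flush) {
            g->held[i].len += p->len;
            return TOY_GRO_MERGED;
        }
    }

    /* Not worth holding, or no room left: deliver as usual. */
    if (p->flush || g->count == TOY_GRO_MAX_HELD)
        return TOY_GRO_NORMAL;

    g->held[g->count++] = *p;
    return TOY_GRO_HELD;
}

/* Release everything that is held; returns how many were released. */
static size_t toy_gro_flush(struct toy_gro *g)
{
    size_t n = g->count;

    g->count = 0;
    return n;
}
```

In this model toy_gro_flush() plays the role that the message assigns to napi_gro_flush at the end of a poll.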
     
  • This patch allows GSO to handle frag_list in a limited way for the
    purposes of allowing packets merged by GRO to be refragmented on
    output.

    Most hardware won't (and aren't expected to) support handling GRO
    frag_list packets directly. Therefore we will perform GSO in
    software for those cases.

    However, for drivers that can support it (such as virtual NICs) we
    may not have to segment the packets at all.

    Whether the added overhead of GRO/GSO is worthwhile for bridges
    and routers when weighed against the benefit of potentially
    increasing the MTU within the host is still an open question.
    However, for the case of host nodes this is undoubtedly a win.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Conflicts:

    drivers/net/e1000e/ich8lan.c

    David S. Miller
     

10 Dec, 2008

1 commit

    A few months back a race was discussed between the netpoll napi service
    path, and the fast path through net_rx_action:
    http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470

    A patch was submitted for that bug, but I think we missed a case.

    Consider the following scenario:

    INITIAL STATE
    CPU0 has one napi_struct A on its poll_list
    CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same
    napi_struct A that CPU0 has on its list

    CPU0                            CPU1
    net_rx_action                   poll_napi
    !list_empty (returns true)      locks poll_lock for A
                                     poll_one_napi
                                      napi->poll
                                       netif_rx_complete
                                        __napi_complete
                                        (removes A from poll_list)
    list_entry(list->next)

    In the above scenario, net_rx_action assumes that the per-cpu poll_list is
    exclusive to that cpu. netpoll of course violates that, and because the netpoll
    path can dequeue from the poll list, it's possible for CPU0 to detect a non-empty
    list at the top of the while loop in net_rx_action, but have it become empty by
    the time it calls list_entry. Since the poll_list isn't surrounded by any other
    structure, the returned data from that list_entry call in this situation is
    garbage, and any number of crashes can result based on what exactly that garbage
    is.

    Given that it's not feasible for performance reasons to place exclusive locks
    around each cpu's poll list to provide that mutual exclusion, I think the best
    solution is to modify the netpoll path in such a way that we continue to guarantee
    that the poll_list for a cpu is in fact exclusive to that cpu. To do this I've
    implemented the patch below. It adds an additional bit to the state field in
    the napi_struct. When executing napi->poll from the netpoll_path, this bit will
    be set. When a driver calls netif_rx_complete, if that bit is set, it will not
    remove the napi_struct from the poll_list. That work will be saved for the next
    iteration of net_rx_action.

    I've tested this and it seems to work well. About the biggest drawback I can
    see to it is the fact that it might result in an extra loop through
    net_rx_action in the event that the device is actually contended for (i.e. the
    netpoll path actually performs all the needed work on the device, and the call
    to net_rx_action winds up doing nothing, except removing the napi_struct from
    the poll_list). However I think this is probably a small price to pay, given
    that the alternative is a crash.

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller

    Neil Horman
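
The state-bit approach described above can be modelled in a few lines. This is a single-threaded illustration only: struct toy_napi, the TOY_STATE_* flags and the toy_* functions are hypothetical names, and a real implementation would use atomic bit operations.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_STATE_SCHED 0x1u /* poll is scheduled */
#define TOY_STATE_NPSVC 0x2u /* hypothetical "netpoll is servicing" bit */

struct toy_napi {
    unsigned int state;
    bool on_poll_list;
};

/* Called by the driver when it has no more work. */
static void toy_napi_complete(struct toy_napi *n)
{
    assert(n->state & TOY_STATE_SCHED);

    /*
     * Invoked from the netpoll path: leave the per-cpu list
     * alone, the owning cpu removes the entry on its next pass.
     */
    if (n->state & TOY_STATE_NPSVC)
        return;

    n->on_poll_list = false;
    n->state &= ~TOY_STATE_SCHED;
}

/* A driver poll routine that finishes all of its work. */
static void toy_driver_poll(struct toy_napi *n)
{
    toy_napi_complete(n);
}

/* The netpoll path: mark the napi as being serviced while polling. */
static void toy_netpoll_poll(struct toy_napi *n,
                             void (*poll)(struct toy_napi *))
{
    n->state |= TOY_STATE_NPSVC;
    poll(n);
    n->state &= ~TOY_STATE_NPSVC;
}
```

After toy_netpoll_poll() returns, the entry is still on the list, so the cost is one extra and mostly idle pass through the normal poll loop, which matches the drawback discussed in the message.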
     

08 Dec, 2008

1 commit

    This is the last patch of this series.
    After removing all direct references to netdev->priv, I am killing
    "priv" of "struct net_device" and fixing the related comments/docs.

    No one is allowed to reference netdev->priv directly any more.
    If you want to reference the memory of private data, use netdev_priv()
    instead.
    If the private data is not allocated by alloc_netdev(), use
    netdev->ml_priv to point to that memory after you create that private
    data.

    Signed-off-by: Wang Chen
    Signed-off-by: David S. Miller

    Wang Chen
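
The two ways of reaching private data mentioned above can be sketched in userspace, assuming the private area is placed directly after the device structure in the same allocation and rounded up to a 32-byte boundary. struct toy_netdev, toy_alloc_netdev(), toy_netdev_priv() and TOY_ALIGN are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_ALIGN 32u /* assumed alignment of the private area */
#define TOY_ALIGN_UP(x) \
    (((size_t)(x) + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1))

/* Hypothetical stand-in for struct net_device without a priv member. */
struct toy_netdev {
    char name[16];
    void *ml_priv; /* for private data allocated separately, later on */
};

/* One allocation: device structure followed by the private area. */
static struct toy_netdev *toy_alloc_netdev(size_t sizeof_priv)
{
    return calloc(1, TOY_ALIGN_UP(sizeof(struct toy_netdev)) + sizeof_priv);
}

/* Accessor that replaces direct use of a priv pointer. */
static void *toy_netdev_priv(struct toy_netdev *dev)
{
    assert(dev);
    return (char *)dev + TOY_ALIGN_UP(sizeof(struct toy_netdev));
}
```

Because the accessor computes the address from the device pointer, no pointer member needs to be stored for data allocated together with the device; ml_priv remains for data created afterwards.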
     

25 Nov, 2008

2 commits

  • Since the netlink option for DCB is necessary to actually be useful,
    simplified the Kconfig option. In addition, added useful help text for the
    Kconfig option.

    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jeff Kirsher
     
  • As a concession to vendors who have to deal with one source for different
    kernel versions, add a HAVE_NET_DEVICE_OPS so they don't end up hard
    coding ifdef against kernel version.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger