21 May, 2011

1 commit

  • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
    macvlan: fix panic if lowerdev in a bond
    tg3: Add braces around 5906 workaround.
    tg3: Fix NETIF_F_LOOPBACK error
    macvlan: remove one synchronize_rcu() call
    networking: NET_CLS_ROUTE4 depends on INET
    irda: Fix error propagation in ircomm_lmp_connect_response()
    irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
    irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
    be2net: Kill set but unused variable 'req' in lancer_fw_download()
    irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
    atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
    rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
    pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
    isdn: capi: Use pr_debug() instead of ifdefs.
    tg3: Update version to 3.119
    tg3: Apply rx_discards fix to 5719/5720
    ...

    Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
    as per Davem.

    Linus Torvalds
     

17 May, 2011

1 commit


08 May, 2011

3 commits


30 Apr, 2011

1 commit

  • This makes sure that when a driver calls another driver's ethtool
    get/set_settings() callback, the data passed to it is clean. This
    guarantees that speed_hi will be zeroed correctly if the called
    callback doesn't explicitly set it: we are sure we don't get a
    corrupted speed from the underlying driver. We also take care of
    setting the cmd field appropriately (ETHTOOL_GSET/SSET).

    This applies to dev_ethtool_get_settings(), which now makes sure it
    sets up that ethtool command parameter correctly before passing it to
    drivers. This also means that whoever calls dev_ethtool_get_settings()
    does not have to clean the ethtool command parameter. This function
    also becomes an exported symbol instead of an inline.

    All drivers visible to make allyesconfig under x86_64 have been
    updated.
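
    A minimal sketch of the resulting wrapper behavior (body modeled on
    the description above; treat the exact details as an assumption):

        #include <linux/ethtool.h>
        #include <linux/netdevice.h>

        int dev_ethtool_get_settings(struct net_device *dev,
                                     struct ethtool_cmd *cmd)
        {
                if (!dev->ethtool_ops || !dev->ethtool_ops->get_settings)
                        return -EOPNOTSUPP;

                /* hand the driver a clean command: speed_hi starts zeroed
                 * and the cmd field is set for it */
                memset(cmd, 0, sizeof(*cmd));
                cmd->cmd = ETHTOOL_GSET;
                return dev->ethtool_ops->get_settings(dev, cmd);
        }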

    Signed-off-by: David Decotigny
    Signed-off-by: David S. Miller

    David Decotigny
     

10 Feb, 2011

1 commit

  • commit a512b92 adds a sysfs entry for the net device group, but
    tun had already been using a 'group' sysfs entry before that
    commit, so after this commit the kernel warns like this:
    sysfs: cannot create duplicate filename '/devices/virtual/net/vnet0/group'

    Since tun has used this name for years, renaming tun's sysfs entry
    might break existing userspace, so renaming the sysfs entry for the
    net device group is the better choice.
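
    The fix might then look like registering the net device attribute
    under a non-colliding name; a sketch (the handler names here are
    hypothetical stubs):

        static ssize_t show_group(struct device *dev,
                                  struct device_attribute *attr, char *buf);
        static ssize_t store_group(struct device *dev,
                                   struct device_attribute *attr,
                                   const char *buf, size_t len);

        /* expose the group as "netdev_group" so it no longer collides
         * with tun's long-standing "group" file */
        static DEVICE_ATTR(netdev_group, S_IRUGO | S_IWUSR,
                           show_group, store_group);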

    Signed-off-by: Xiaotian Feng
    Signed-off-by: David S. Miller

    Xiaotian Feng
     

25 Jan, 2011

2 commits

  • The group of a network device can be queried or changed from userspace
    using sysfs.

    For example, with sysfs mounted at /sys, one can change the group
    that interface lo belongs to:
    echo 1 > /sys/class/net/lo/group
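
    On the kernel side, the store path can be a thin wrapper; a sketch,
    assuming the rtnl-locked helper pattern of net-sysfs.c and the
    dev_set_group() setter added alongside this patch:

        /* called with rtnl held by the generic sysfs store path */
        static int change_group(struct net_device *dev, unsigned long new_group)
        {
                dev_set_group(dev, (int)new_group);
                return 0;
        }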

    Signed-off-by: Vlad Dogaru
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Vlad Dogaru
     
  • Quoting Ben Hutchings: we presumably won't be defining features that
    can only be enabled on 64-bit architectures.

    Occurrences found by `grep -r` on net/, drivers/net, include/

    [ Move features and vlan_features next to each other in
    struct netdev, as per Eric Dumazet's suggestion -DaveM ]

    Signed-off-by: Michał Mirosław
    Signed-off-by: David S. Miller

    Michał Mirosław
     

17 Dec, 2010

1 commit


02 Dec, 2010

1 commit

  • Allocate qdisc memory according to NUMA properties of cpus included in
    xps map.

    To be effective, the qdisc should be (re)set up after changes
    of /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus

    I added a numa_node field in struct netdev_queue, containing the
    NUMA node if all CPUs included in xps_cpus share the same node,
    else -1.
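
    A minimal sketch of that node computation (hypothetical helper; the
    result would be stored in the new numa_node field):

        #include <linux/cpumask.h>
        #include <linux/topology.h>

        /* NUMA node shared by all CPUs in the queue's xps mask, else -1 */
        static int xps_map_node(const struct cpumask *mask)
        {
                int cpu, node = -1;

                for_each_cpu(cpu, mask) {
                        if (node == -1)
                                node = cpu_to_node(cpu);
                        else if (node != cpu_to_node(cpu))
                                return -1;      /* CPUs span nodes */
                }
                return node;
        }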

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Nov, 2010

2 commits


29 Nov, 2010

1 commit

  • This patch adds an XPS_CONFIG option to enable and disable XPS,
    done in the same manner as RPS_CONFIG. This also fixes a build
    failure in XPS code when SMP is not enabled.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

25 Nov, 2010

1 commit

  • This patch implements transmit packet steering (XPS) for multiqueue
    devices. XPS selects a transmit queue during packet transmission based
    on configuration. This is done by mapping the CPU transmitting the
    packet to a queue. This is the transmit side analogue to RPS-- where
    RPS is selecting a CPU based on receive queue, XPS selects a queue
    based on the CPU (previously there was an XPS patch from Eric
    Dumazet, but that might more appropriately be called transmit completion
    steering).

    Each transmit queue can be associated with a number of CPUs which will
    use the queue to send packets. This is configured as a CPU mask on a
    per queue basis in:

    /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus

    The mappings are stored per device in an inverted data structure that
    maps CPUs to queues. In the netdevice structure this is an array of
    num_possible_cpus() structures, where each structure holds an array
    of queue indexes for queues which that CPU can use.
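
    A sketch of the inverted lookup (struct layout modeled on the
    description; the hash-based spreading is an assumption):

        #include <linux/types.h>

        /* per-CPU entry of the inverted map: queues this CPU may use */
        struct xps_map {
                unsigned int len;
                u16 queues[];
        };

        /* pick one of this CPU's queues, keeping a given flow stable */
        static u16 xps_pick_queue(const struct xps_map *map, u32 flow_hash)
        {
                if (map->len == 1)
                        return map->queues[0];
                /* scale the 32-bit hash into [0, len) without a divide */
                return map->queues[((u64)flow_hash * map->len) >> 32];
        }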

    The benefits of XPS are improved locality in the per queue data
    structures. Also, transmit completions are more likely to be done
    nearer to the sending thread, so this should promote locality back
    to the socket on free (e.g. UDP). The benefits of XPS are dependent on
    cache hierarchy, application load, and other factors. XPS would
    nominally be configured so that a queue would only be shared by CPUs
    which are sharing a cache; the degenerate configuration would be that
    each CPU has its own queue.

    Below are some benchmark results which show the potential benefit of
    this patch. The netperf test has 500 instances of netperf TCP_RR test
    with 1 byte req. and resp.

    bnx2x on 16 core AMD
    XPS (16 queues, 1 TX queue per CPU) 1234K at 100% CPU
    No XPS (16 queues) 996K at 100% CPU

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

18 Nov, 2010

1 commit

  • netif_set_real_num_rx_queues() can decrement and increment
    the number of rx queues. For example ixgbe does this as
    features and offloads are toggled. Presumably this could
    also happen across down/up on most devices if the available
    resources changed (e.g. a CPU went offline).

    The kobject needs to be zeroed in this case so that the
    state is not preserved across kobject_put()/kobject_init_and_add().

    This resolves the following error report.

    ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
    kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong.
    Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169
    Call Trace:
    [] kobject_init+0x3a/0x83
    [] kobject_init_and_add+0x23/0x57
    [] ? mark_lock+0x21/0x267
    [] net_rx_queue_update_kobjects+0x63/0xc6
    [] netif_set_real_num_rx_queues+0x5f/0x78
    [] ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe]
    [] ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe]
    [] ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe]
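
    The fix amounts to clearing the kobject before it is re-added; a
    sketch (ktype and naming details are placeholders):

        /* re-adding a queue kobject: wipe state left over from the
         * previous kobject_put() before initializing it again */
        memset(kobj, 0, sizeof(*kobj));
        error = kobject_init_and_add(kobj, &rx_queue_ktype,
                                     parent, "rx-%u", index);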

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     

16 Nov, 2010

1 commit

  • This patch moves RX queue allocation to alloc_netdev_mq and freeing of
    the queues to free_netdev (symmetric to TX queue allocation). Each
    kobject RX queue takes a reference to the queue's device so that the
    device can't be freed before all the kobjects have been released-- this
    obviates the need for reference counts specific to RX queues.
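
    A sketch of that refcount symmetry (field and helper names assumed
    from net-sysfs.c conventions):

        /* kobject release: drop the device reference taken at creation,
         * so the netdevice outlives every RX queue kobject */
        static void rx_queue_release(struct kobject *kobj)
        {
                struct netdev_rx_queue *queue =
                        container_of(kobj, struct netdev_rx_queue, kobj);

                dev_put(queue->dev);
        }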

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

26 Oct, 2010

1 commit

  • Add __rcu annotations to:
    (struct netdev_rx_queue)->rps_map
    (struct netdev_rx_queue)->rps_flow_table
    struct rps_sock_flow_table *rps_sock_flow_table;

    And use appropriate rcu primitives.
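
    With the annotations in place, accesses follow the usual RCU
    pattern; a minimal sketch with hypothetical helpers:

        /* reader side: must run under rcu_read_lock() */
        static struct rps_map *rxq_get_map(struct netdev_rx_queue *rxqueue)
        {
                return rcu_dereference(rxqueue->rps_map);
        }

        /* writer side: rcu_assign_pointer() orders the publication */
        static void rxq_set_map(struct netdev_rx_queue *rxqueue,
                                struct rps_map *new_map)
        {
                rcu_assign_pointer(rxqueue->rps_map, new_map);
        }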

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Oct, 2010

1 commit

  • The rx->count reference is used to track reference counts to the
    number of rx-queue kobjects created for the device. This patch
    eliminates initialization of the counter in netif_alloc_rx_queues
    and instead increments the counter each time a kobject is created.
    This is now symmetric with the decrement that is done when an object is
    released.

    Signed-off-by: Tom Herbert
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Tom Herbert
     

28 Sep, 2010

1 commit

  • For RPS, we create a kobject for each RX queue based on the number of
    queues passed to alloc_netdev_mq(). However, drivers generally do not
    determine the numbers of hardware queues to use until much later, so
    this usually represents the maximum number the driver may use and not
    the actual number in use.

    For TX queues, drivers can update the actual number using
    netif_set_real_num_tx_queues(). Add a corresponding function for RX
    queues, netif_set_real_num_rx_queues().
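
    Driver usage mirrors the TX-side call; a sketch (the queue count and
    error label are hypothetical driver details):

        /* tell the stack how many RX queues the hardware will really use */
        err = netif_set_real_num_rx_queues(netdev, hw_rx_queues);
        if (err)
                goto err_queues;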

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     

02 Sep, 2010

1 commit


17 Aug, 2010

1 commit

  • Enable using network namespaces with wireless devices even when
    sysfs is enabled, using the same infrastructure that was built for
    netdevs.

    Signed-off-by: Johannes Berg
    Acked-by: "Eric W. Biederman"
    Signed-off-by: John W. Linville

    Johannes Berg
     

25 Jul, 2010

1 commit

  • Add addr_assign_type to struct net_device and expose it via sysfs.
    This new attribute has the purpose of giving user-space the ability to
    distinguish between different assignment types of MAC addresses.

    For example user-space can treat NICs with randomly generated MAC
    addresses differently than NICs that have permanent (locally assigned)
    MAC addresses.
    For the former udev could write a persistent net rule by matching the
    device path instead of the MAC address.
    There's also the case of devices that 'steal' MAC addresses from
    slave devices, in which case it is also beneficial for user-space
    to be aware of the fact.

    This patch also introduces a helper function to assist adoption of
    drivers that generate MAC addresses randomly.
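
    A sketch of such a helper, modeled on the description (the helper
    name here is illustrative, not necessarily the one in the patch):

        #include <linux/etherdevice.h>
        #include <linux/netdevice.h>

        /* generate a random MAC and record how it was assigned */
        static void netdev_hw_addr_random(struct net_device *dev, u8 *hwaddr)
        {
                random_ether_addr(hwaddr);  /* random, locally administered */
                dev->addr_assign_type |= NET_ADDR_RANDOM;
        }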

    Signed-off-by: Stefan Assmann
    Signed-off-by: David S. Miller

    Stefan Assmann
     

13 Jul, 2010

1 commit


08 Jul, 2010

1 commit

  • There is a small possibility that a reader gets incorrect values on 32
    bit arches. SNMP applications could catch incorrect counters when a
    32bit high part is changed by another stats consumer/provider.

    One way to solve this is to add a rtnl_link_stats64 param to all
    ndo_get_stats64() methods, and also add such a parameter to
    dev_get_stats().

    The rule is that we are not allowed to use dev->stats64 as temporary
    storage for 64-bit stats; we must use a caller-provided area (usually
    on the stack).

    Old drivers (only providing get_stats() method) need no changes.
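
    A caller of the new interface then looks like this sketch:

        /* the caller provides the 64-bit snapshot area, typically on stack */
        struct rtnl_link_stats64 temp;
        const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);

        pr_info("%s: rx_bytes=%llu\n", dev->name,
                (unsigned long long)stats->rx_bytes);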

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Jun, 2010

1 commit

  • Use struct rtnl_link_stats64 as the statistics structure.

    On 32-bit architectures, insert 32 bits of padding after/before each
    field of struct net_device_stats to make its layout compatible with
    struct rtnl_link_stats64. Add an anonymous union in net_device; move
    stats into the union and add struct rtnl_link_stats64 stats64.

    Add net_device_ops::ndo_get_stats64, implementations of which will
    return a pointer to struct rtnl_link_stats64. Drivers that implement
    this operation must not update the structure asynchronously.

    Change dev_get_stats() to call ndo_get_stats64 if available, and to
    return a pointer to struct rtnl_link_stats64. Change callers of
    dev_get_stats() accordingly.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     

22 May, 2010

3 commits

  • This reverts commit aaf8cdc34ddba08122f02217d9d684e2f9f5d575.

    Drivers like the ipw2100 call device_create_group when they
    are initialized and device_remove_group when they are shutdown.
    Moving them between namespaces deletes their sysfs groups early.

    In particular the following call chain results.
    netdev_unregister_kobject -> device_del -> kobject_del -> sysfs_remove_dir
    With sysfs_remove_dir recursively deleting all of its subdirectories,
    and nothing adding them back.

    Ouch!

    Therefore we need to call something that ultimately calls sysfs_mv_dir,
    as that sysfs function can move sysfs directories between namespaces
    without deleting their subdirectories or their contents, allowing
    us to avoid placing extra boilerplate into every driver that does
    something interesting with sysfs.

    Currently the function that provides that capability is device_rename.
    That is, the code works without nasty side effects as originally written.

    So remove the misguided fix for moving devices between namespaces. The
    bug in the kobject layer that inspired it has now been recognized and
    fixed.

    Signed-off-by: Eric W. Biederman
    Acked-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • I had a couple of stupid bugs in:
    netns: Teach network device kobjects which namespace they are in.

    - I duplicated the Kconfig for the NET_NS
    - The build was broken when sysfs was not compiled in

    The sysfs breakage is because after I moved the sysfs operations
    to the kobject layer to make things cleaner, I forgot to move the
    ifdefs. Oops.

    I'm not quite certain how I introduced a second NET_NS Kconfig,
    but it was probably a 3-way merge somewhere along the way that
    did not notice that the NET_NS Kconfig option had moved and thought
    that was a bug. It probably slipped in because the sysfs patches
    used to be the first patches in my network namespace patches.
    Some things just don't go like you would expect.

    Neither of these bugs actually affect anything in the common case
    but they should be fixed.

    Thanks to Serge for noticing they were present.

    Reported-by: Serge E. Hallyn
    Signed-off-by: Eric W. Biederman
    Acked-by: David S. Miller

    Eric W. Biederman
     
  • The problem. Network devices show up in sysfs and with the network
    namespace active multiple devices with the same name can show up in
    the same directory, ouch!

    To avoid that problem and allow existing applications in network namespaces
    to see the same interface that is currently presented in sysfs, this
    patch enables the tagging directory support in sysfs.

    By using the network namespace pointers as tags to separate out
    the sysfs directory entries, we ensure that we don't have conflicts
    in the directories and applications only see a limited set of
    the network devices.

    Signed-off-by: Eric W. Biederman
    Acked-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

20 Apr, 2010

1 commit


17 Apr, 2010

1 commit

  • This patch implements receive flow steering (RFS). RFS steers
    received packets for layer 3 and 4 processing to the CPU where
    the application for the corresponding flow is running. RFS is an
    extension of Receive Packet Steering (RPS).

    The basic idea of RFS is that when an application calls recvmsg
    (or sendmsg) the application's running CPU is stored in a hash
    table that is indexed by the connection's rxhash which is stored in
    the socket structure. The rxhash is passed in skb's received on
    the connection from netif_receive_skb. For each received packet,
    the associated rxhash is used to look up the CPU in the hash table,
    if a valid CPU is set then the packet is steered to that CPU using
    the RPS mechanisms.

    The convolution of the simple approach is that it would potentially
    allow OOO packets. If threads are thrashing around CPUs or multiple
    threads are trying to read from the same sockets, a quickly changing
    CPU value in the hash table could cause rampant OOO packets--
    we consider this a non-starter.

    To avoid OOO packets, this solution implements two types of hash
    tables: rps_sock_flow_table and rps_dev_flow_table.

    rps_sock_flow_table is a global hash table. Each entry is just a CPU
    number and it is populated in recvmsg and sendmsg as described above.
    This table contains the "desired" CPUs for flows.

    rps_dev_flow_table is specific to each device queue. Each entry
    contains a CPU and a tail queue counter. The CPU is the "current"
    CPU for a matching flow. The tail queue counter holds the value
    of a tail queue counter for the associated CPU's backlog queue at
    the time of last enqueue for a flow matching the entry.

    Each backlog queue has a queue head counter which is incremented
    on dequeue, and so a queue tail counter is computed as queue head
    count + queue length. When a packet is enqueued on a backlog queue,
    the current value of the queue tail counter is saved in the hash
    entry of the rps_dev_flow_table.

    And now the trick: when selecting the CPU for RPS (get_rps_cpu)
    the rps_sock_flow table and the rps_dev_flow table for the RX queue
    are consulted. When the desired CPU for the flow (found in the
    rps_sock_flow table) does not match the current CPU (found in the
    rps_dev_flow table), the current CPU is changed to the desired CPU
    if one of the following is true:

    - The current CPU is unset (equal to RPS_NO_CPU)
    - The current CPU is offline
    - The current CPU's queue head counter >= queue tail counter in the
    rps_dev_flow table. This checks if the queue tail has advanced
    beyond the last packet that was enqueued using this table entry.
    This guarantees that all packets queued using this entry have been
    dequeued, thus preserving in order delivery.
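
    A sketch of this decision rule as a single helper (names are
    hypothetical; the counter comparison uses wraparound-safe signed
    arithmetic):

        /* return the CPU to steer to; switch to the desired CPU only
         * when doing so cannot reorder packets of the flow */
        static u16 rfs_next_cpu(u16 desired, u16 cur,
                                unsigned int queue_head,
                                unsigned int last_qtail)
        {
                if (cur == RPS_NO_CPU ||                 /* unset */
                    !cpu_online(cur) ||                  /* offline */
                    (int)(queue_head - last_qtail) >= 0) /* backlog drained */
                        return desired;
                return cur;
        }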

    Making each queue have its own rps_dev_flow table has two advantages:
    1) the tail queue counters will be written on each receive, so
    keeping the table local to the interrupting CPU is good for locality.
    2) this allows lockless access to the table-- the CPU number and queue
    tail counter need to be accessed together under mutual exclusion
    (from netif_receive_skb); we assume that this is only called from
    device napi_poll, which is non-reentrant.

    This patch implements RFS for TCP and connected UDP sockets.
    It should be usable for other flow oriented protocols.

    There are two configuration parameters for RFS. The
    "rps_flow_entries" kernel init parameter sets the number of
    entries in the rps_sock_flow_table, the per rxqueue sysfs entry
    "rps_flow_cnt" contains the number of entries in the rps_dev_flow
    table for the rxqueue. Both are rounded up to a power of two.

    The obvious benefit of RFS (over just RPS) is that it achieves
    CPU locality between the receive processing for a flow and the
    applications processing; this can result in increased performance
    (higher pps, lower latency).

    The benefits of RFS are dependent on cache hierarchy, application
    load, and other factors. On simple benchmarks, we don't necessarily
    see improvement and sometimes see degradation. However, for more
    complex benchmarks and for applications where cache pressure is
    much higher this technique seems to perform very well.

    Below are some benchmark results which show the potential benefit of
    this patch. The netperf test has 500 instances of the netperf TCP_RR
    test with 1 byte req. and resp. The RPC test is a request/response
    test similar in structure to the netperf RR test, with 100 threads
    on each host, but doing more work in userspace than netperf.

    e1000e on 8 core Intel
    No RFS or RPS 104K tps at 30% CPU
    No RFS (best RPS config): 290K tps at 63% CPU
    RFS 303K tps at 61% CPU

    RPC test     tps   CPU%   50/90/99% usec latency   Latency StdDev
    No RFS/RPS   103K  48%    757/900/3185             4472.35
    RPS only     174K  73%    415/993/2468             491.66
    RFS          223K  73%    379/651/1382             315.61

    Signed-off-by: Tom Herbert
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Tom Herbert
     

12 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • include cleanup: update gfp.h and slab.h includes to prepare for
    breaking implicit slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints
    out an error message indicating which .h file needs to be added to
    the file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, while for others adding it to an
    implementation .h or embedding .c file was more appropriate. This
    step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as a bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

29 Mar, 2010

1 commit


23 Mar, 2010

1 commit


17 Mar, 2010

1 commit

  • This patch implements software receive side packet steering (RPS). RPS
    distributes the load of received packet processing across multiple CPUs.

    Problem statement: Protocol processing done in the NAPI context for received
    packets is serialized per device queue and becomes a bottleneck under high
    packet load. This substantially limits pps that can be achieved on a single
    queue NIC and provides no scaling with multiple cores.

    This solution queues packets early on in the receive path on the backlog queues
    of other CPUs. This allows protocol processing (e.g. IP and TCP) to be
    performed on packets in parallel. For each device (or each receive queue in
    a multi-queue device) a mask of CPUs is set to indicate the CPUs that can
    process packets. A CPU is selected on a per packet basis by hashing contents
    of the packet header (e.g. the TCP or UDP 4-tuple) and using the result to index
    into the CPU mask. The IPI mechanism is used to raise networking receive
    softirqs between CPUs. This effectively emulates in software what a multi-queue
    NIC can provide, but is generic requiring no device support.
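
    A minimal sketch of the per-packet selection (map layout modeled on
    the description; the hash-scaling step is an assumption):

        #include <linux/types.h>

        /* CPUs allowed to process packets for one device/queue */
        struct rps_map {
                unsigned int len;
                u16 cpus[];
        };

        /* index the CPU list with the (HW or software) flow hash */
        static int rps_pick_cpu(const struct rps_map *map, u32 rxhash)
        {
                if (!map || !map->len)
                        return -1;              /* RPS not configured */
                return map->cpus[((u64)rxhash * map->len) >> 32];
        }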

    Many devices now provide a hash over the 4-tuple on a per packet basis
    (e.g. the Toeplitz hash). This patch allows drivers to set the HW reported hash
    in an skb field, and that value in turn is used to index into the RPS maps.
    Using the HW generated hash can avoid cache misses on the packet when
    steering it to a remote CPU.

    The CPU mask is set on a per device and per queue basis in the sysfs variable
    /sys/class/net/<dev>/queues/rx-<n>/rps_cpus. This is a set of canonical
    bit maps for receive queues in the device (numbered by <n>). If a device
    does not support multi-queue, a single variable is used for the device (rx-0).

    Generally, we have found this technique increases pps capabilities of a single
    queue device with good CPU utilization. Optimal settings for the CPU mask
    seem to depend on architectures and cache hierarchy. Below are some results
    running 500 instances of netperf TCP_RR test with 1 byte req. and resp.
    Results show cumulative transaction rate and system CPU utilization.

    e1000e on 8 core Intel
    Without RPS: 108K tps at 33% CPU
    With RPS: 311K tps at 64% CPU

    forcedeth on 16 core AMD
    Without RPS: 156K tps at 15% CPU
    With RPS: 404K tps at 49% CPU

    bnx2x on 16 core AMD
    Without RPS 567K tps at 61% CPU (4 HW RX queues)
    Without RPS 738K tps at 96% CPU (8 HW RX queues)
    With RPS: 854K tps at 76% CPU (4 HW RX queues)

    Caveats:
    - The benefits of this patch are dependent on architecture and cache hierarchy.
    Tuning the masks to get best performance is probably necessary.
    - This patch adds overhead in the path for processing a single packet. In
    a lightly loaded server this overhead may eliminate the advantages of
    increased parallelism, and possibly cause some relative performance degradation.
    We have found that masks that are cache aware (share same caches with
    the interrupting CPU) mitigate much of this.
    - The RPS masks can be changed dynamically; however, whenever the mask is
    changed this introduces the possibility of generating out of order packets.
    It's probably best not to change the masks too frequently.

    Signed-off-by: Tom Herbert

    include/linux/netdevice.h | 32 ++++-
    include/linux/skbuff.h | 3 +
    net/core/dev.c | 335 +++++++++++++++++++++++++++++++++++++--------
    net/core/net-sysfs.c | 225 ++++++++++++++++++++++++++++++-
    net/core/skbuff.c | 2 +
    5 files changed, 538 insertions(+), 59 deletions(-)
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Tom Herbert
     

20 Feb, 2010

1 commit


26 Nov, 2009

1 commit

  • Generated with the following semantic patch

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 == n2
    + net_eq(n1, n2)

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 != n2
    + !net_eq(n1, n2)

    applied over {include,net,drivers/net}.

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

31 Oct, 2009

1 commit


28 Oct, 2009

1 commit

  • commit d519e17e2d01a0ee9abe083019532061b4438065
    (net: export device speed and duplex via sysfs)
    made the wrong assumption that netdev->ethtool_ops was always set.

    This makes it possible to crash the kernel and leave rtnl in a locked state.

    modprobe dummy
    ip link set dummy0 up
    (udev runs and crashes)
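
    The shape of the fix is a guard before the dereference; a sketch
    (hypothetical helper name):

        /* dummy-style devices register no ethtool_ops: check before use */
        static int netdev_get_speed(struct net_device *dev, u32 *speed)
        {
                struct ethtool_cmd cmd = { .cmd = ETHTOOL_GSET };

                if (!dev->ethtool_ops || !dev->ethtool_ops->get_settings)
                        return -EINVAL;
                if (dev->ethtool_ops->get_settings(dev, &cmd))
                        return -EINVAL;
                *speed = cmd.speed;     /* speed_hi handling elided */
                return 0;
        }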

    Signed-off-by: Eric Dumazet
    Acked-by: Andy Gospodarek
    Signed-off-by: David S. Miller

    Eric Dumazet