13 Jan, 2012

1 commit

  • commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to
    RCU_INIT_POINTER) did a lot of incorrect changes, since it did a
    complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x,
    y).

    We miss needed barriers, even on x86, when y is not NULL.

    Signed-off-by: Eric Dumazet
    CC: Stephen Hemminger
    CC: Paul E. McKenney
    Signed-off-by: David S. Miller

    Eric Dumazet
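
The ordering problem described in this commit can be sketched in user space with C11 atomics. This is only an analogue, not kernel code: the release store stands in for rcu_assign_pointer(), the relaxed store of NULL stands in for RCU_INIT_POINTER(), and the names `struct config`, `shared_cfg`, `publish_config`, `clear_config` and `read_config_or` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct config {
    int value;
};

/* Pointer that readers may dereference concurrently. */
static _Atomic(struct config *) shared_cfg;

/* Analogue of rcu_assign_pointer(x, y) with y != NULL: the release
 * store keeps the initialisation of *cfg ordered before the pointer
 * becomes visible, for both the compiler and the CPU. */
static void publish_config(struct config *cfg, int value)
{
    assert(cfg != NULL);
    cfg->value = value;
    atomic_store_explicit(&shared_cfg, cfg, memory_order_release);
}

/* Analogue of RCU_INIT_POINTER(x, NULL): a NULL pointer exposes no
 * data, so there is nothing that needs ordering before the store. */
static void clear_config(void)
{
    atomic_store_explicit(&shared_cfg, (struct config *)NULL,
                          memory_order_relaxed);
}

/* Reader side: returns the published value, or fallback if none. */
static int read_config_or(int fallback)
{
    struct config *cfg =
        atomic_load_explicit(&shared_cfg, memory_order_acquire);

    return cfg ? cfg->value : fallback;
}
```

Using the relaxed store in publish_config() would reproduce the bug: a reader could observe the new pointer before the write to `cfg->value`.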
     

25 Dec, 2011

1 commit

  • The aim of this patch is to provide the full range of rps_flow_cnt on
    64-bit arches.

    The theoretical limit on the number of flows is 2^32.

    Fix some buggy RPS/RFS macros as well.

    Signed-off-by: Eric Dumazet
    CC: Tom Herbert
    CC: Xi Wang
    CC: Laurent Chavey
    Signed-off-by: David S. Miller

    Eric Dumazet
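
A minimal sketch of how a flow count can be turned into a 32-bit table mask without overflow. It assumes the flow table is sized to a power of two and indexed with `hash & mask`; the function names, and the choice to reject a count of zero, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute mask = roundup_pow_of_two(count) - 1 without overflowing,
 * and reject counts whose mask would not fit in 32 bits.
 * Returns 0 on success, -1 on an unusable count. */
static int flow_table_mask(unsigned long count, uint32_t *mask_out)
{
    unsigned long mask;

    assert(mask_out != NULL);
    if (count == 0)
        return -1;

    /* Smear the highest set bit of (count - 1) into all lower bits. */
    mask = count - 1;
    while ((mask | (mask >> 1)) != mask)
        mask |= mask >> 1;

    /* Only reachable where unsigned long is wider than 32 bits. */
    if (mask > UINT32_MAX)
        return -1;

    *mask_out = (uint32_t)mask;
    return 0;
}

/* Slot used by a flow with the given hash. */
static uint32_t flow_slot(uint32_t hash, uint32_t mask)
{
    return hash & mask;
}
```

With this scheme a count of exactly 2^32 yields the mask 0xffffffff, which matches the theoretical limit stated above.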
     

24 Dec, 2011

1 commit


23 Dec, 2011

1 commit


06 Dec, 2011

1 commit


30 Nov, 2011

2 commits

  • Networking stack support for byte queue limits, uses dynamic queue
    limits library. Byte queue limits are maintained per transmit queue,
    and a dql structure has been added to netdev_queue structure for this
    purpose.

    Configuration of bql is in the tx- sysfs directory for the queue
    under the byte_queue_limits directory. Configuration includes:
    limit_min, bql minimum limit
    limit_max, bql maximum limit
    hold_time, bql slack hold time

    Also under the directory are:
    limit, current byte limit
    inflight, current number of bytes on the queue

    Signed-off-by: Tom Herbert
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Tom Herbert
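
As a rough illustration of the sysfs layout described in this commit, the helper below builds the path of one byte_queue_limits attribute. The helper name `bql_attr_path` is hypothetical, and the `tx-<index>` directory naming is an assumption, since the message above only shows `tx-`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Attributes named in the commit message. */
static const char *const bql_attrs[] = {
    "limit_min", "limit_max", "hold_time",  /* configuration */
    "limit", "inflight",                    /* current state */
};

/* Write the sysfs path of one byte_queue_limits attribute into buf.
 * Returns 0 on success, -1 for an unknown attribute or short buffer. */
static int bql_attr_path(char *buf, size_t len, const char *dev,
                         unsigned int txq, const char *attr)
{
    const size_t nattrs = sizeof(bql_attrs) / sizeof(bql_attrs[0]);
    size_t i;
    int n;

    assert(buf && dev && attr);
    for (i = 0; i < nattrs; i++)
        if (strcmp(attr, bql_attrs[i]) == 0)
            break;
    if (i == nattrs)
        return -1;

    /* Assumed layout: one tx-<index> directory per transmit queue. */
    n = snprintf(buf, len,
                 "/sys/class/net/%s/queues/tx-%u/byte_queue_limits/%s",
                 dev, txq, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```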
     
  • This patch moves the xps specific parts in netdev_queue_release into
    its own function which netdev_queue_release can call. This allows
    netdev_queue_release to be more generic (for adding new attributes
    to tx queues).

    Signed-off-by: Tom Herbert
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Tom Herbert
     

18 Nov, 2011

1 commit

  • Most machines don't use RPS/RFS, and pay a fair number of instructions in
    netif_receive_skb() / netif_rx() / get_rps_cpu() just to discover that
    RPS/RFS is not set up.

    Add a jump_label named rps_needed.

    If no device rps_map or global rps_sock_flow_table is set up,
    netif_receive_skb() / netif_rx() execute a single instruction instead of
    many, including conditional jumps.

    jmp +0 (if CONFIG_JUMP_LABEL=y)

    Signed-off-by: Eric Dumazet
    CC: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
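
The control flow of this change can be sketched in user space as below. A plain boolean stands in for the jump label, so this shows only the shape of the fast path, not the code patching that makes it a single instruction in the kernel. The counter `rps_users`, the helper names, and the modulo-based CPU choice are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

static bool rps_needed;         /* stand-in for the jump label */
static unsigned int rps_users;  /* configured rps_maps / flow tables */

/* Called when an rps_map or the global flow table is installed. */
static void rps_user_add(void)
{
    if (rps_users++ == 0)
        rps_needed = true;
}

/* Called when such a map or table is removed. */
static void rps_user_del(void)
{
    assert(rps_users > 0);
    if (--rps_users == 0)
        rps_needed = false;
}

/* Receive path: skip all steering work unless something is configured. */
static int pick_cpu(int this_cpu, unsigned int flow_hash, int ncpus)
{
    assert(ncpus > 0);
    if (!rps_needed)
        return this_cpu;                        /* fast path */
    return (int)(flow_hash % (unsigned int)ncpus); /* simplified steering */
}
```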
     

17 Nov, 2011

2 commits

  • This adds the /sys/class/net/DEV/queues/Q/tx_timeout attribute
    containing the total number of timeout events on the given queue. It
    is always available with CONFIG_SYSFS, independently of
    CONFIG_RPS/XPS.

    Credits to Stephen Hemminger for a preliminary version of this patch.

    Tested:
    without CONFIG_SYSFS (compilation only)
    with sysfs and without CONFIG_RPS & CONFIG_XPS
    with sysfs and without CONFIG_RPS
    with sysfs and without CONFIG_XPS
    with defaults

    Signed-off-by: David Decotigny
    Signed-off-by: David S. Miller

    david decotigny
     
  • This commit fixes the following warning:
    net/core/net-sysfs.c:921:6: warning: symbol 'numa_node' shadows an earlier one
    include/linux/topology.h:222:1: originally declared here

    Signed-off-by: David Decotigny
    Signed-off-by: David S. Miller

    david decotigny
     

01 Nov, 2011

1 commit


16 Sep, 2011

1 commit

  • This patch does several things:
    - introduces __ethtool_get_settings which is called from ethtool code and
    from drivers as well. Put ASSERT_RTNL there.
    - dev_ethtool_get_settings() is replaced by __ethtool_get_settings()
    - changes calling in drivers so rtnl locking is respected. In
    iboe_get_rate(), ->get_settings() was previously called unlocked; this
    fixes it. prb_calc_retire_blk_tmo() in af_packet.c had the same
    problem, which is fixed by calling __dev_get_by_index() instead of
    dev_get_by_index() and holding rtnl_lock for both calls.
    - introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create()
    so bnx2fc_if_create() and fcoe_if_create() are called locked as they
    are from other places.
    - use __ethtool_get_settings() in bonding code

    Signed-off-by: Jiri Pirko

    v2->v3:
    -removed dev_ethtool_get_settings()
    -added ASSERT_RTNL into __ethtool_get_settings()
    -prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock
    around it and __ethtool_get_settings() call
    v1->v2:
    add missing export_symbol
    Reviewed-by: Ben Hutchings [except FCoE bits]
    Acked-by: Ralf Baechle
    Signed-off-by: David S. Miller

    Jiri Pirko
     

12 Aug, 2011

1 commit


02 Aug, 2011

1 commit

  • When assigning a NULL value to an RCU protected pointer, no barrier
    is needed. rcu_assign_pointer() used to handle that special case, but
    will soon change to no longer do so.

    Convert all rcu_assign_pointer of NULL value.

    //smpl
    @@ expression P; @@

    - rcu_assign_pointer(P, NULL)
    + RCU_INIT_POINTER(P, NULL)

    //

    Signed-off-by: Stephen Hemminger
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

15 Jul, 2011

1 commit


13 Jun, 2011

1 commit

  • * new refcount in struct net, controlling actual freeing of the memory
    * new method in kobj_ns_type_operations (->drop_ns())
    * ->current_ns() semantics change - it's supposed to be followed by
    corresponding ->drop_ns(). For struct net in case of CONFIG_NET_NS it bumps
    the new refcount; net_drop_ns() decrements it and calls net_free() if the
    last reference has been dropped. Method renamed to ->grab_current_ns().
    * old net_free() callers call net_drop_ns() instead.
    * sysfs_exit_ns() is gone, along with a large part of callchain
    leading to it; now that the references stored in ->ns[...] stay valid we
    do not need to hunt them down and replace them with NULL. That fixes
    problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
    of sb->s_instances abuse.

    Note that the struct net *shutdown* logic has not changed - net_cleanup()
    is called exactly when it used to be called. The only thing postponed by
    having a sysfs instance referring to that struct net is the actual freeing
    of the memory occupied by struct net.

    Signed-off-by: Al Viro

    Al Viro
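
The separation between shutting a namespace down and freeing its memory can be sketched with a second reference count. This is a single-threaded user-space illustration; the struct, the field name `passive`, and the `net_sketch_*` helpers are hypothetical, and a real implementation would need atomic operations.

```c
#include <assert.h>
#include <stdlib.h>

struct net_sketch {
    unsigned int passive;  /* references that only pin the memory */
    int shut_down;         /* set by the (unchanged) shutdown path */
};

static int freed_count;    /* lets a caller observe when memory is freed */

static struct net_sketch *net_sketch_alloc(void)
{
    struct net_sketch *n = calloc(1, sizeof(*n));

    if (n)
        n->passive = 1;    /* reference held by the creator */
    return n;
}

/* Analogue of ->grab_current_ns(): pin the memory for a sysfs user. */
static struct net_sketch *net_sketch_grab(struct net_sketch *n)
{
    assert(n && n->passive > 0);
    n->passive++;
    return n;
}

/* Analogue of ->drop_ns() / net_drop_ns(): free on the last reference. */
static void net_sketch_drop(struct net_sketch *n)
{
    assert(n && n->passive > 0);
    if (--n->passive == 0) {
        free(n);
        freed_count++;
    }
}
```

In this sketch, a sysfs instance that called net_sketch_grab() keeps the memory valid after shutdown, so nothing has to hunt down stale pointers and replace them with NULL.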
     

21 May, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
    macvlan: fix panic if lowerdev in a bond
    tg3: Add braces around 5906 workaround.
    tg3: Fix NETIF_F_LOOPBACK error
    macvlan: remove one synchronize_rcu() call
    networking: NET_CLS_ROUTE4 depends on INET
    irda: Fix error propagation in ircomm_lmp_connect_response()
    irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
    irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
    be2net: Kill set but unused variable 'req' in lancer_fw_download()
    irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
    atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
    rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
    pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
    isdn: capi: Use pr_debug() instead of ifdefs.
    tg3: Update version to 3.119
    tg3: Apply rx_discards fix to 5719/5720
    ...

    Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
    as per Davem.

    Linus Torvalds
     

17 May, 2011

1 commit


08 May, 2011

3 commits


30 Apr, 2011

1 commit

  • This makes sure that when a driver calls the ethtool's
    get/set_settings() callback of another driver, the data passed to it
    is clean. This guarantees that speed_hi will be zeroed correctly if
    the called callback doesn't explicitly set it: we are sure we don't
    get a corrupted speed from the underlying driver. We also take care of
    setting the cmd field appropriately (ETHTOOL_GSET/SSET).

    This applies to dev_ethtool_get_settings(), which now makes sure it
    sets up that ethtool command parameter correctly before passing it to
    drivers. This also means that whoever calls dev_ethtool_get_settings()
    does not have to clean the ethtool command parameter. This function
    also becomes an exported symbol instead of an inline.

    All drivers visible to make allyesconfig under x86_64 have been
    updated.

    Signed-off-by: David Decotigny
    Signed-off-by: David S. Miller

    David Decotigny
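
A minimal sketch of why the structure is cleared before the driver callback runs. The struct below is a cut-down, hypothetical stand-in for the ethtool command structure (only `cmd`, `speed` and `speed_hi`), `SKETCH_GSET` stands in for ETHTOOL_GSET, and combining the two 16-bit halves into one speed value is assumed to work as shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GSET 0x1u   /* stand-in for ETHTOOL_GSET */

struct cmd_sketch {
    uint32_t cmd;
    uint16_t speed;     /* low 16 bits of the link speed */
    uint16_t speed_hi;  /* high 16 bits of the link speed */
};

typedef int (*get_settings_fn)(struct cmd_sketch *);

/* Combine both halves into the full speed value. */
static uint32_t cmd_speed(const struct cmd_sketch *c)
{
    return ((uint32_t)c->speed_hi << 16) | c->speed;
}

/* Wrapper: hand the driver a clean structure with cmd already set. */
static int get_settings_clean(get_settings_fn drv, struct cmd_sketch *c)
{
    assert(drv && c);
    memset(c, 0, sizeof(*c));
    c->cmd = SKETCH_GSET;
    return drv(c);
}

/* An older driver that only fills in the low half of the speed. */
static int legacy_driver(struct cmd_sketch *c)
{
    c->speed = 1000;
    return 0;
}
```

Without the memset(), stale bytes left in `speed_hi` by the caller would be folded into the reported speed.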
     

10 Feb, 2011

1 commit

  • commit a512b92 adds a sysfs entry for the net device group, but tun
    already used a sysfs entry named group before that commit, so after
    that commit was checked in, the kernel warns like this:
    sysfs: cannot create duplicate filename '/devices/virtual/net/vnet0/group'

    Since tun has used this name for years, renaming the sysfs entry under
    tun might break existing userspace, so renaming the group sysfs entry
    for the net device group is the better choice.

    Signed-off-by: Xiaotian Feng
    Signed-off-by: David S. Miller

    Xiaotian Feng
     

25 Jan, 2011

2 commits

  • The group of a network device can be queried or changed from userspace
    using sysfs.

    For example, considering sysfs mounted in /sys, one can change the group
    that interface lo belongs to:
    echo 1 > /sys/class/net/lo/group

    Signed-off-by: Vlad Dogaru
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Vlad Dogaru
     
  • Quoting Ben Hutchings: we presumably won't be defining features that
    can only be enabled on 64-bit architectures.

    Occurrences found by `grep -r` on net/, drivers/net, include/

    [ Move features and vlan_features next to each other in
    struct netdev, as per Eric Dumazet's suggestion -DaveM ]

    Signed-off-by: Michał Mirosław
    Signed-off-by: David S. Miller

    Michał Mirosław
     

17 Dec, 2010

1 commit


02 Dec, 2010

1 commit

  • Allocate qdisc memory according to NUMA properties of cpus included in
    xps map.

    To be effective, qdisc should be (re)setup after changes
    of /sys/class/net/eth/queues/tx-/xps_cpus

    I added a numa_node field in struct netdev_queue, containing NUMA node
    if all cpus included in xps_cpus share same node, else -1.

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
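
The rule for the numa_node field can be sketched as follows: return the node shared by every CPU in the mask, else -1. The function name, the bitmask representation of xps_cpus, and the `cpu_to_node` lookup table are assumptions for illustration. Returning -1 for an empty mask is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Return the NUMA node common to all CPUs set in mask, or -1 if the
 * CPUs span several nodes (or the mask is empty). */
static int common_numa_node(unsigned long mask, const int *cpu_to_node,
                            size_t ncpus)
{
    int node = -1;
    int seen = 0;
    size_t cpu;

    assert(cpu_to_node != NULL);
    for (cpu = 0; cpu < ncpus && cpu < sizeof(mask) * 8; cpu++) {
        if (!(mask & (1UL << cpu)))
            continue;
        if (!seen) {
            node = cpu_to_node[cpu];
            seen = 1;
        } else if (cpu_to_node[cpu] != node) {
            return -1;  /* CPUs on different nodes */
        }
    }
    return node;
}
```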
     

30 Nov, 2010

2 commits


29 Nov, 2010

1 commit

  • This patch adds XPS_CONFIG option to enable and disable XPS. This is
    done in the same manner as RPS_CONFIG. This also fixes a build
    failure in XPS code when SMP is not enabled.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

25 Nov, 2010

1 commit

  • This patch implements transmit packet steering (XPS) for multiqueue
    devices. XPS selects a transmit queue during packet transmission based
    on configuration. This is done by mapping the CPU transmitting the
    packet to a queue. This is the transmit side analogue to RPS-- where
    RPS is selecting a CPU based on receive queue, XPS selects a queue
    based on the CPU (previously there was an XPS patch from Eric
    Dumazet, but that might more appropriately be called transmit completion
    steering).

    Each transmit queue can be associated with a number of CPUs which will
    use the queue to send packets. This is configured as a CPU mask on a
    per queue basis in:

    /sys/class/net/eth/queues/tx-/xps_cpus

    The mappings are stored per device in an inverted data structure that
    maps CPUs to queues. In the netdevice structure this is an array of
    num_possible_cpu structures where each structure holds an array of
    queue_indexes for queues which that CPU can use.

    The benefits of XPS are improved locality in the per queue data
    structures. Also, transmit completions are more likely to be done
    nearer to the sending thread, so this should promote locality back
    to the socket on free (e.g. UDP). The benefits of XPS are dependent on
    cache hierarchy, application load, and other factors. XPS would
    nominally be configured so that a queue would only be shared by CPUs
    which are sharing a cache; the degenerate configuration would be that
    each CPU has its own queue.

    Below are some benchmark results which show the potential benefit of
    this patch. The netperf test has 500 instances of netperf TCP_RR test
    with 1 byte req. and resp.

    bnx2x on 16 core AMD
    XPS (16 queues, 1 TX queue per CPU) 1234K at 100% CPU
    No XPS (16 queues) 996K at 100% CPU

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
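
The inverted data structure described in this commit can be sketched as below: per-queue CPU masks (as written to xps_cpus) are turned into a per-CPU list of usable queues. The fixed sizes, type names and helper names are hypothetical, and picking among several queues by `hash % len` is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS   8
#define MAX_QUEUES 8

/* Per-CPU entry of the inverted map: the queues this CPU may use. */
struct cpu_queues {
    size_t len;
    unsigned short queues[MAX_QUEUES];
};

/* queue_masks[q] is the bitmask of CPUs allowed to use queue q. */
static void xps_invert(const unsigned long *queue_masks, size_t nqueues,
                       struct cpu_queues map[MAX_CPUS])
{
    size_t cpu, q;

    assert(queue_masks && map && nqueues <= MAX_QUEUES);
    for (cpu = 0; cpu < MAX_CPUS; cpu++) {
        map[cpu].len = 0;
        for (q = 0; q < nqueues; q++)
            if (queue_masks[q] & (1UL << cpu))
                map[cpu].queues[map[cpu].len++] = (unsigned short)q;
    }
}

/* Pick a transmit queue for the sending CPU, or -1 if it has none. */
static int xps_pick(const struct cpu_queues map[MAX_CPUS], size_t cpu,
                    unsigned int hash)
{
    assert(map && cpu < MAX_CPUS);
    if (map[cpu].len == 0)
        return -1;  /* caller falls back to its default selection */
    return map[cpu].queues[hash % map[cpu].len];
}
```

Storing the map per CPU means the transmit path needs only one lookup indexed by the current CPU, rather than a scan over every queue's mask.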
     

18 Nov, 2010

1 commit

  • netif_set_real_num_rx_queues() can decrement and increment
    the number of rx queues. For example ixgbe does this as
    features and offloads are toggled. Presumably this could
    also happen across down/up on most devices if the available
    resources changed (cpu offlined).

    The kobject needs to be zero'd in this case so that the
    state is not preserved across kobject_put()/kobject_init_and_add().

    This resolves the following error report.

    ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
    kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong.
    Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169
    Call Trace:
    kobject_init+0x3a/0x83
    kobject_init_and_add+0x23/0x57
    ? mark_lock+0x21/0x267
    net_rx_queue_update_kobjects+0x63/0xc6
    netif_set_real_num_rx_queues+0x5f/0x78
    ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe]
    ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe]
    ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe]

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
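
The failure and the fix can be modelled in a few lines. The struct below is a hypothetical stand-in for a kobject with an "initialised" flag that survives the final put; the helper names are invented for illustration.

```c
#include <assert.h>
#include <string.h>

struct kobj_sketch {
    int initialized;  /* stays set after the last put */
    int refs;
};

/* Refuses to initialise an object twice, like the warning above. */
static int kobj_sketch_init(struct kobj_sketch *k)
{
    assert(k);
    if (k->initialized)
        return -1;
    k->initialized = 1;
    k->refs = 1;
    return 0;
}

/* Drop a reference; the initialised flag is deliberately left alone. */
static void kobj_sketch_put(struct kobj_sketch *k)
{
    assert(k && k->refs > 0);
    k->refs--;
}

/* The fix: clear stale state before the object is set up again. */
static int queue_kobj_readd(struct kobj_sketch *k)
{
    assert(k);
    memset(k, 0, sizeof(*k));
    return kobj_sketch_init(k);
}
```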
     

16 Nov, 2010

1 commit

  • This patch moves RX queue allocation to alloc_netdev_mq and freeing of
    the queues to free_netdev (symmetric to TX queue allocation). Each
    kobject RX queue takes a reference to the queue's device so that the
    device can't be freed before all the kobjects have been released-- this
    obviates the need for reference counts specific to RX queues.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

26 Oct, 2010

1 commit

  • Add __rcu annotations to :
    (struct netdev_rx_queue)->rps_map
    (struct netdev_rx_queue)->rps_flow_table
    struct rps_sock_flow_table *rps_sock_flow_table;

    And use appropriate rcu primitives.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Oct, 2010

1 commit

  • The rx->count reference is used to track reference counts to the
    number of rx-queue kobjects created for the device. This patch
    eliminates initialization of the counter in netif_alloc_rx_queues
    and instead increments the counter each time a kobject is created.
    This is now symmetric with the decrement that is done when an object is
    released.

    Signed-off-by: Tom Herbert
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Tom Herbert
     

28 Sep, 2010

1 commit

  • For RPS, we create a kobject for each RX queue based on the number of
    queues passed to alloc_netdev_mq(). However, drivers generally do not
    determine the numbers of hardware queues to use until much later, so
    this usually represents the maximum number the driver may use and not
    the actual number in use.

    For TX queues, drivers can update the actual number using
    netif_set_real_num_tx_queues(). Add a corresponding function for RX
    queues, netif_set_real_num_rx_queues().

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
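
A simplified sketch of the bookkeeping such a function needs: the number of queues in use may change, but only within the number allocated up front. The struct and the exact error value are assumptions for illustration, and the real function also has to update the per-queue kobjects.

```c
#include <assert.h>
#include <errno.h>

struct dev_sketch {
    unsigned int num_rx_queues;       /* allocated at alloc time */
    unsigned int real_num_rx_queues;  /* actually in use */
};

/* Set the number of RX queues in use; it must stay within 1..allocated.
 * Returns 0 on success or -EINVAL (assumed) for an out-of-range value. */
static int set_real_num_rx_queues(struct dev_sketch *dev, unsigned int rxq)
{
    assert(dev);
    if (rxq < 1 || rxq > dev->num_rx_queues)
        return -EINVAL;
    dev->real_num_rx_queues = rxq;
    return 0;
}
```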
     

02 Sep, 2010

1 commit


17 Aug, 2010

1 commit

  • Enable using network namespaces with
    wireless devices even when sysfs is
    enabled using the same infrastructure
    that was built for netdevs.

    Signed-off-by: Johannes Berg
    Acked-by: "Eric W. Biederman"
    Signed-off-by: John W. Linville

    Johannes Berg
     

25 Jul, 2010

1 commit

  • Add addr_assign_type to struct net_device and expose it via sysfs.
    This new attribute has the purpose of giving user-space the ability to
    distinguish between different assignment types of MAC addresses.

    For example user-space can treat NICs with randomly generated MAC
    addresses differently than NICs that have permanent (locally assigned)
    MAC addresses.
    For the former udev could write a persistent net rule by matching the
    device path instead of the MAC address.
    There's also the case of devices that 'steal' MAC addresses from slave
    devices, in which case it is also beneficial for user-space to be aware
    of the fact.

    This patch also introduces a helper function to assist adoption of
    drivers that generate MAC addresses randomly.

    Signed-off-by: Stefan Assmann
    Signed-off-by: David S. Miller

    Stefan Assmann
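
A sketch of how user space might act on the new attribute, following the udev example in the message above. The numeric values (0 permanent, 1 random, 2 stolen) and the helper name are assumptions made for illustration; the input is the text read from the device's addr_assign_type file.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed encoding of the addr_assign_type attribute. */
enum addr_assign_sketch {
    ADDR_PERM   = 0,  /* permanent address */
    ADDR_RANDOM = 1,  /* randomly generated address */
    ADDR_STOLEN = 2,  /* address taken from another device */
};

/* Decide whether a persistent rule should match on the device path
 * rather than the MAC address, given the attribute's text content. */
static int should_match_by_path(const char *sysfs_text)
{
    char *end;
    long type;

    assert(sysfs_text != NULL);
    type = strtol(sysfs_text, &end, 10);
    if (end == sysfs_text)
        return 0;  /* unparsable: keep matching by MAC address */
    return type == ADDR_RANDOM;
}
```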
     

13 Jul, 2010

1 commit