19 Jul, 2010

1 commit

  • - Without the 8021q module loaded in the kernel, all 802.1p packets
    (VLAN ID 0, but with QoS/priority tagging) are silently discarded (as
    expected, since the protocol handler is not loaded).

    - Without this patch, the 8021q module receives these packets, but it
    also discards them if VLAN 0 is not configured. This should not be the
    default behaviour, since a VLAN 0 frame is not really a VLAN-tagged
    packet but an 802.1p (priority-tagged) packet. Defining VLAN 0 makes
    it almost impossible to communicate with a mix of 802.1p and
    non-802.1p devices on the same network because of ARP table issues.

    - Changed the logic in vlan_skb_recv to skip the VLAN-specific code
    when the VLAN ID is 0 and no VLAN with ID 0 has been defined; instead
    we accept the packet with the encapsulated proto and pass it later to
    netif_rx (a rough sketch follows at the end of this list).

    - In the vlan device event handler, added logic to add VLAN 0 to the
    HW filter on devices that support it (without this, no VLAN 0 traffic
    reached the stack on e1000e with HW filtering under 2.6.35, and
    probably on other HW-filtered cards as well, so we fix it here).

    - In the vlan unregister logic, prevent the removal of VLAN 0 on
    devices with a HW filter.

    - The default behaviour is to ignore the VLAN 0 tag and accept the
    packet as if it were not tagged, but a VLAN 0 can still be defined if
    desired (so the change is backwards compatible).
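
    A rough sketch of the new receive-path logic (illustrative only;
    __find_vlan_dev is the 8021q module's internal lookup helper, and
    error handling is elided):

    vlan_id = vlan_tci & VLAN_VID_MASK;
    vlan_dev = __find_vlan_dev(dev, vlan_id);

    if (!vlan_dev) {
            if (vlan_id)
                    goto err_unlock;        /* real VLAN ID, but not configured */
            /* VLAN 0 with no VLAN 0 device defined: keep the encapsulated
             * protocol and let netif_rx() deliver the frame as if it were
             * untagged.
             */
    } else {
            skb->dev = vlan_dev;            /* normal VLAN receive path */
    }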

    Signed-off-by: Pedro Garcia
    Signed-off-by: David S. Miller

    Pedro Garcia
     

10 Jul, 2010

1 commit

  • In commit be1f3c2c027cc5ad735df6a45a542ed1db7ec48b "net: Enable 64-bit
    net device statistics on 32-bit architectures" I redefined struct
    net_device_stats so that it could be used in a union with struct
    rtnl_link_stats64, avoiding the need for explicit copying or
    conversion between the two. However, this is unsafe because no
    locking is required, and no lock is consistently held, around calls
    to dev_get_stats() and the use of the statistics structure it returns.

    In commit 28172739f0a276eb8d6ca917b3974c2edb036da3 "net: fix 64 bit
    counters on 32 bit arches" Eric Dumazet dealt with that problem by
    requiring callers of dev_get_stats() to provide storage for the
    result. This means that the net_device::stats64 field and the padding
    in struct net_device_stats are now redundant, so remove them.

    Update the comment on net_device_ops::ndo_get_stats64 to reflect its
    new usage.

    Change dev_txq_stats_fold() to use struct rtnl_link_stats64, since
    that is what all its callers are really using and it is no longer
    going to be compatible with struct net_device_stats.

    Eric Dumazet suggested the separate function for the structure
    conversion.
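
    With that change in place, a caller ends up following roughly this
    pattern (a sketch, not a quote of the patch):

    struct rtnl_link_stats64 temp;
    const struct rtnl_link_stats64 *stats;

    stats = dev_get_stats(dev, &temp);      /* temp provides the storage */
    /* read stats->rx_packets etc.; no pointer into dev is kept */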

    Signed-off-by: Ben Hutchings
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Ben Hutchings
     

09 Jul, 2010

1 commit

  • When we need to shape traffic at low speeds, we need to disable TSO
    on the network interface:

    ethtool -K eth0.2240 tso off

    It turns out vlan interfaces lack the set_tso() ethtool method.

    Before enabling TSO, we must check that the real device supports TSO
    for VLAN-tagged packets and currently has TSO enabled.

    Note that a TSO change on the real device propagates the TSO setting
    to all its vlans, even if the admin selected a different TSO setting.
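
    The added method looks roughly like this (a sketch based on the
    description above; vlan_dev_info() is the 8021q module's accessor for
    the vlan private data):

    static int vlan_ethtool_set_tso(struct net_device *dev, u32 data)
    {
            if (data) {
                    struct net_device *real_dev = vlan_dev_info(dev)->real_dev;

                    /* Underlying device must support TSO for VLAN-tagged
                     * packets and must currently have TSO enabled.
                     */
                    if (!(real_dev->vlan_features & NETIF_F_TSO))
                            return -EOPNOTSUPP;
                    if (!(real_dev->features & NETIF_F_TSO))
                            return -EINVAL;
                    dev->features |= NETIF_F_TSO;
            } else {
                    dev->features &= ~NETIF_F_TSO;
            }
            return 0;
    }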

    Signed-off-by: Eric Dumazet
    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Jul, 2010

1 commit

  • There is a small possibility that a reader gets incorrect values on
    32-bit arches: SNMP applications could read inconsistent counters
    while a 32-bit high part is being changed by another stats
    consumer/provider.

    One way to solve this is to add an rtnl_link_stats64 param to all
    ndo_get_stats64() methods, and also add such a parameter to
    dev_get_stats().

    The rule is that we are not allowed to use dev->stats64 as temporary
    storage for 64-bit stats; a caller-provided area (usually on the
    stack) must be used instead.

    Old drivers (only providing get_stats() method) need no changes.
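
    A hedged sketch of what a converted driver method might look like
    under this rule (foo_priv and its counters are hypothetical):

    static struct rtnl_link_stats64 *
    foo_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *storage)
    {
            struct foo_priv *priv = netdev_priv(dev);   /* hypothetical */

            /* Fill the caller-provided area; never hand back a pointer to
             * storage owned by the device itself.
             */
            storage->rx_packets = priv->rx_packets;
            storage->tx_packets = priv->tx_packets;
            return storage;
    }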

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

29 Jun, 2010

1 commit


13 Jun, 2010

1 commit

  • Use struct rtnl_link_stats64 as the statistics structure.

    On 32-bit architectures, insert 32 bits of padding after/before each
    field of struct net_device_stats to make its layout compatible with
    struct rtnl_link_stats64. Add an anonymous union in net_device; move
    stats into the union and add struct rtnl_link_stats64 stats64.

    Add net_device_ops::ndo_get_stats64, implementations of which will
    return a pointer to struct rtnl_link_stats64. Drivers that implement
    this operation must not update the structure asynchronously.

    Change dev_get_stats() to call ndo_get_stats64 if available, and to
    return a pointer to struct rtnl_link_stats64. Change callers of
    dev_get_stats() accordingly.
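
    Illustrative only (simplified types, not the real kernel definitions):
    the padding makes the legacy 32-bit layout line up with the 64-bit one
    so the two views can share a union, roughly in this spirit:

    struct legacy_stats {                 /* cf. padded net_device_stats */
            unsigned long      rx_packets;
            unsigned long      rx_packets_pad;  /* little-endian case; pad goes
                                                 * before the field on big-endian */
            unsigned long      tx_packets;
            unsigned long      tx_packets_pad;
    };

    struct stats64 {                      /* cf. rtnl_link_stats64 */
            unsigned long long rx_packets;
            unsigned long long tx_packets;
    };

    union example_stats {
            struct legacy_stats stats;
            struct stats64      stats64;
    };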

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     

11 Jun, 2010

1 commit

  • Currently, the accelerated receive path for VLANs will
    drop packets if the real device is an inactive slave and
    the packet is not one of the special pkts tested for in
    skb_bond_should_drop(). This behavior is different from
    the non-accelerated path and from pkts over a bonded vlan.

    For example,

    vlanx -> bond0 -> ethx

    will be dropped in the vlan path and not delivered to any
    packet handlers at all. However,

    bond0 -> vlanx -> ethx

    and

    bond0 -> ethx

    will be delivered to handlers that match the exact dev,
    because the VLAN path checks the real_dev, which is not a
    slave, and netif_receive_skb() doesn't drop frames but only
    delivers them to exact matches.

    This patch adds an sk_buff flag which is used for tagging
    skbs that would previously have been dropped and allows the
    skb to continue to netif_receive_skb(). Here we add
    logic to check for the deliver_no_wcard flag and, if it
    is set, only deliver to handlers that match exactly. This
    makes both paths above consistent and gives pkt handlers
    a way to identify skbs that come from inactive slaves.
    Without this patch, in some configurations skbs will be
    delivered to handlers with exact matches and in others
    be dropped outright in the vlan path.
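
    In rough terms (a simplified sketch, not the exact hunks):

    /* in the accelerated VLAN / bonding receive path: tag, don't drop */
    if (skb_bond_should_drop(skb, master))
            skb->deliver_no_wcard = 1;

    /* in the netif_receive_skb() delivery loop: */
    if (skb->deliver_no_wcard && ptype->dev != skb->dev)
            continue;       /* wildcard handlers never see this skb */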

    I have tested the following 4 configurations in failover modes
    and load balancing modes.

    # bond0 -> ethx

    # vlanx -> bond0 -> ethx

    # bond0 -> vlanx -> ethx

    # bond0 -> ethx
    |
    vlanx -> --

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     

02 Jun, 2010

1 commit


18 May, 2010

1 commit


16 May, 2010

1 commit


12 Apr, 2010

1 commit


07 Apr, 2010

1 commit


04 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h (see the example after this list).

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
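
    For example (illustrative only), a file that calls kmalloc()/kfree()
    and previously relied on percpu.h pulling in slab.h would end up with
    the include spelled out:

    #include <linux/percpu.h>
    #include <linux/slab.h>     /* added: kmalloc()/kfree() users need this directly */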

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be discoverable easily on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Mar, 2010

2 commits

  • Updates the vlan device's real_num_tx_queues in case the underlying
    real device has changed its real_num_tx_queues.

    -v2
    As per Eric Dumazet's comments:
    -- added a BUG_ON to catch the case of real_num_tx_queues exceeding num_tx_queues.
    -- created this self-contained patch to just update real_num_tx_queues (sketched below).
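
    The update itself is roughly (a sketch):

    vlandev->real_num_tx_queues = real_dev->real_num_tx_queues;
    BUG_ON(vlandev->real_num_tx_queues > vlandev->num_tx_queues);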

    Signed-off-by: Vasu Dev
    Signed-off-by: Jeff Kirsher
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Vasu Dev
     
  • This is required to correctly select the vlan tx queue for a driver
    that supports multiple tx queues and implements ndo_select_queue,
    since currently the selected vlan tx queue is not aligned with the
    queue selected by the real net_device's ndo_select_queue.

    Unaligned vlan tx queue selection causes thrashing, with higher vlan
    tx lock contention at least for FCoE traffic, and a wrong socket tx
    queue_mapping for ixgbe, which has ndo_select_queue implemented.

    -v2

    As per Eric Dumazet's comments, mirrored the vlan net_device_ops so
    that there is one variant with and one without vlan_dev_select_queue,
    and then select between them according to whether the real device's
    ndo_select_queue is present for a given vlan net_device. This
    completely skips the vlan_dev_select_queue call for a real net_device
    that does not support ndo_select_queue.
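
    The queue selection itself is roughly (a sketch; vlan_dev_info() is
    the 8021q accessor for the vlan private data):

    static u16 vlan_dev_select_queue(struct net_device *dev,
                                     struct sk_buff *skb)
    {
            struct net_device *real_dev = vlan_dev_info(dev)->real_dev;

            return real_dev->netdev_ops->ndo_select_queue(real_dev, skb);
    }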

    Signed-off-by: Vasu Dev
    Signed-off-by: Jeff Kirsher
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Vasu Dev
     

21 Mar, 2010

1 commit


19 Mar, 2010

2 commits

  • When doing "ifenslave -d bond0 eth0", there is a chance of getting a
    NULL dereference in netif_receive_skb(), because dev->master can
    suddenly become NULL after we tested it.

    We should use ACCESS_ONCE() to avoid this (or rcu_dereference()).
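
    The idea, roughly (a sketch):

    /* read the pointer once; dev->master may become NULL under us */
    struct net_device *master = ACCESS_ONCE(skb->dev->master);

    if (master) {
            /* use 'master' from here on, never re-read skb->dev->master */
    }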

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • It is not desirable for underlying devices to change their type. At
    the moment it is, for example, possible for a bond that is a port of
    a bridge to have its type changed from Ethernet to Infiniband. This
    patch fixes this.
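
    In the vlan notifier this amounts to something like (a sketch):

    case NETDEV_PRE_TYPE_CHANGE:
            /* Forbid the underlying device to change its type while
             * vlans are configured on top of it.
             */
            return NOTIFY_BAD;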

    Signed-off-by: Jiri Pirko
    Signed-off-by: Jay Vosburgh
    Signed-off-by: David S. Miller

    Jiri Pirko
     

17 Feb, 2010

1 commit

  • Add __percpu sparse annotations to net.

    These annotations are to make sparse consider percpu variables to be
    in a different address space and warn if accessed without going
    through percpu accessors. This patch doesn't affect normal builds.

    The macro and type tricks around snmp stats make things a bit
    interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field
    as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All
    snmp_mib_*() users which used to cast the argument to (void **) are
    updated to cast it to (void __percpu **).
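
    A minimal illustration of the annotation (hypothetical struct, not
    taken from the patch):

    struct foo_stats { unsigned long packets; };           /* hypothetical */
    struct foo_stats __percpu *stats = alloc_percpu(struct foo_stats);

    per_cpu_ptr(stats, smp_processor_id())->packets++;  /* accessors strip __percpu */
    free_percpu(stats);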

    Signed-off-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Patrick McHardy
    Cc: Arnaldo Carvalho de Melo
    Cc: Vlad Yasevich
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Tejun Heo
     

04 Feb, 2010

1 commit

  • In the vlan and macvlan drivers, the start_xmit function forwards
    data to the dev_queue_xmit function for another device, which may
    potentially belong to a different namespace.

    To make sure that classification stays within a single namespace,
    this patch resets the potentially critical fields.
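
    A minimal sketch of the idea, with a hypothetical helper and a single
    representative field (the exact set of fields reset by the patch may
    differ):

    static void reset_ns_local_fields(struct sk_buff *skb)   /* hypothetical */
    {
            skb->mark = 0;  /* classification mark is namespace-local;
                             * other fields may also need resetting */
    }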

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

28 Jan, 2010

1 commit


25 Jan, 2010

1 commit

  • Bruno Prémont found that commit 9793241fe92f7d930
    (vlan: Precise RX stats accounting) added a regression for non
    hw accelerated vlans:

    [ 26.390576] BUG: unable to handle kernel NULL pointer dereference at (null)
    [ 26.396369] IP: [] vlan_skb_recv+0x89/0x280 [8021q]

    vlan_dev_info() was used with the original device instead of
    skb->dev. Also spotted by Américo Wang.
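
    The fix is essentially (a sketch):

    /* use the vlan device the skb was switched to, not the original dev */
    rx_stats = per_cpu_ptr(vlan_dev_info(skb->dev)->vlan_rx_stats,
                           smp_processor_id());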

    Reported-By: Bruno Prémont
    Tested-By: Bruno Prémont
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Jan, 2010

1 commit


04 Jan, 2010

1 commit

  • This allows a bond device to specify an arp_ip_target as a host that is
    not on the same vlan as the base bond device and still use arp
    validation. A configuration like this now works:

    BONDING_OPTS="mode=active-backup arp_interval=1000 arp_ip_target=10.0.100.1 arp_validate=3"

    1: lo: mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
    valid_lft forever preferred_lft forever
    2: eth1: mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
    link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
    3: eth0: mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
    link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
    8: bond0: mtu 1500 qdisc noqueue
    link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::213:21ff:febe:33e9/64 scope link
    valid_lft forever preferred_lft forever
    9: bond0.100@bond0: mtu 1500 qdisc noqueue
    link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
    inet 10.0.100.2/24 brd 10.0.100.255 scope global bond0.100
    inet6 fe80::213:21ff:febe:33e9/64 scope link
    valid_lft forever preferred_lft forever

    Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

    Bonding Mode: fault-tolerance (active-backup)
    Primary Slave: None
    Currently Active Slave: eth1
    MII Status: up
    MII Polling Interval (ms): 0
    Up Delay (ms): 0
    Down Delay (ms): 0
    ARP Polling Interval (ms): 1000
    ARP IP target/s (n.n.n.n form): 10.0.100.1

    Slave Interface: eth1
    MII Status: up
    Link Failure Count: 1
    Permanent HW addr: 00:40:05:30:ff:30

    Slave Interface: eth0
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: 00:13:21:be:33:e9

    Signed-off-by: Andy Gospodarek
    Signed-off-by: Jay Vosburgh
    Signed-off-by: David S. Miller

    Andy Gospodarek
     

27 Dec, 2009

1 commit

  • Using dev_hard_header allows us to use LLC with VLANs and potentially
    other Ethernet/TokenRing specific encapsulations. It also removes code
    duplication between the LLC and Ethernet/TokenRing core code.
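
    Roughly, instead of hand-building the MAC header, the LLC output path
    can do something like this (a sketch; da/sa stand for the destination
    and source MAC addresses, and the use of ETH_P_802_2 here is an
    assumption):

    int rc = dev_hard_header(skb, skb->dev, ETH_P_802_2, da, sa, skb->len);
    if (rc < 0)
            kfree_skb(skb);         /* hedged error handling, not the exact code */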

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

04 Dec, 2009

1 commit


03 Dec, 2009

1 commit

  • Take advantage of the fact that an explicit rtnl_kill_links is
    unnecessary (and skipping it improves batching), as network namespace
    exit calls dellink on all remaining virtual devices, and
    rtnl_link_unregister calls dellink on all outstanding devices in that
    network namespace. To do this we need to leave the vlan proc
    directories in place until after network device exit time, which is
    done by using register_pernet_subsys instead of
    register_pernet_device.
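
    Concretely, the registration becomes a subsystem registration
    (a sketch):

    /* vlan proc entries now outlive device cleanup during namespace exit */
    int err = register_pernet_subsys(&vlan_net_ops);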

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

02 Dec, 2009

1 commit


27 Nov, 2009

1 commit

  • Currently the UP/DOWN state of VLANs is synchronized to the state of the
    underlying device, meaning all VLANs are set down once the underlying
    device is set down. This causes all routes to the VLAN devices to vanish.

    Add a flag to specify a "loose binding" mode, in which only the operstate
    is transferred, but the VLAN device state is independent.
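
    In the NETDEV_DOWN handler, the check looks roughly like this
    (a sketch; vlan_dev_info() is the 8021q accessor and flgs holds the
    vlan device's current flags):

    if (!(vlan_dev_info(vlandev)->flags & VLAN_FLAG_LOOSE_BINDING))
            dev_change_flags(vlandev, flgs & ~IFF_UP);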

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

19 Nov, 2009

1 commit


18 Nov, 2009

2 commits

  • Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • With multi-queue devices, it's possible that several cpus call
    the vlan RX routines simultaneously for the same vlan device.

    We update the RX stats counters without any locking, so we can
    get slightly wrong counters.

    One possible fix is to use percpu counters, to get precise
    accounting and also a guarantee of no cache line ping-pong
    between cpus.

    Note: this adds 16 bytes (32 bytes on 64bit arches) of percpu
    data per vlan device.
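
    A sketch of the approach (field names may differ slightly from the
    patch):

    struct vlan_rx_stats {
            unsigned long rx_packets;
            unsigned long rx_bytes;
            unsigned long multicast;
            unsigned long rx_errors;
    };
    struct vlan_rx_stats *rx_stats;

    /* at device init time */
    vlan_dev_info(dev)->vlan_rx_stats = alloc_percpu(struct vlan_rx_stats);

    /* on the RX path each cpu touches only its own counters */
    rx_stats = per_cpu_ptr(vlan_dev_info(skb->dev)->vlan_rx_stats,
                           smp_processor_id());
    rx_stats->rx_packets++;
    rx_stats->rx_bytes += skb->len;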

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 Nov, 2009

1 commit

  • In case register_netdevice() returns an error, and a new vlan_group
    was allocated and inserted in vlan_group_hash[], we call
    vlan_group_free() without deleting the group from the hash table.
    Future lookups can then hit infinite loops or crashes.

    We must delete the vlan_group using an RCU-safe procedure.
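
    The error path then does roughly this (a sketch; ngrp is the newly
    allocated group):

    if (ngrp) {
            hlist_del_rcu(&ngrp->hlist);            /* undo the hash insert */
            call_rcu(&ngrp->rcu, vlan_rcu_free);    /* free after readers are done */
    }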

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

16 Nov, 2009

1 commit


14 Nov, 2009

1 commit


11 Nov, 2009

1 commit


08 Nov, 2009

1 commit

  • There is no good reason to not support userspace specifying the
    network namespace during device creation, and it makes it easier
    to create a network device and pass it to a child network namespace
    with a well known name.

    We have to be careful to ensure that the target network namespace
    for the new device exists through the life of the call. To keep
    that logic clear I have factored out the network namespace grabbing
    logic into rtnl_link_get_net.

    In addition we need to continue to pass the source network namespace
    to the rtnl_link_ops.newlink method so that we can find the base
    device's source network namespace.
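
    Usage is roughly (a sketch; error handling abbreviated):

    struct net *dest_net = rtnl_link_get_net(src_net, tb);

    if (IS_ERR(dest_net))
            return PTR_ERR(dest_net);
    /* ... create the device in dest_net ... */
    put_net(dest_net);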

    Signed-off-by: Eric W. Biederman
    Acked-by: Eric Dumazet

    Eric W. Biederman
     

30 Oct, 2009

2 commits

  • The temporary copy of the VLAN group is not necessary since the lower device
    is already in the process of being unregistered; if it were necessary, the
    memset of the global group would introduce a race condition.

    With this removed, the changes to the original code are only a few lines, so
    remove the new function and move the code back into vlan_device_event().

    Signed-off-by: Patrick McHardy
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • This will allow drivers to adjust their receive path dynamically
    based on whether GRO is being applied successfully.

    Currently all in-tree callers ignore the return values of these
    functions and do not need to be changed.
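
    A driver can now react to the outcome, for example (a sketch; the
    counter is hypothetical):

    if (napi_gro_receive(&adapter->napi, skb) == GRO_DROP)
            adapter->gro_dropped++;         /* hypothetical counter */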

    Signed-off-by: Ben Hutchings
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Ben Hutchings