25 Jul, 2018

1 commit


03 Jul, 2018

1 commit


02 Jul, 2018

1 commit

  • Since the addition of GRO for ESP, gro_receive can consume the skb and
    return -EINPROGRESS. In that case, the lower layer GRO handler cannot
    touch the skb anymore.

    Commit 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") converted
    some of the gro_receive handlers that can lead to ESP's gro_receive so
    that they wouldn't access the skb when -EINPROGRESS is returned, but
    missed other spots, mainly in tunneling protocols.

    This patch finishes the conversion to using skb_gro_flush_final(), and
    adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
    GUE.
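
    As a rough illustration of the rule above, the following userspace
    sketch models a helper in the spirit of skb_gro_flush_final(). The names
    gro_cb_model, err_ptr_model(), ptr_err_model() and
    gro_flush_final_model() are hypothetical stand-ins, not kernel APIs. The
    only point is that the flush flag is left alone once the inner handler
    has reported -EINPROGRESS.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-skb GRO control block. */
struct gro_cb_model {
	int flush;
};

/* Userspace imitation of encoding a negative errno value in a pointer. */
static inline void *err_ptr_model(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the value stored by err_ptr_model(). */
static inline long ptr_err_model(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/*
 * Merge the caller's flush decision into the control block, unless the
 * inner handler returned -EINPROGRESS. In that case the skb has been
 * consumed and the caller must not touch it anymore.
 */
static inline void gro_flush_final_model(struct gro_cb_model *cb,
					 const void *pp, int flush)
{
	if (ptr_err_model(pp) != -EINPROGRESS)
		cb->flush |= flush;
}
```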

    Fixes: 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: Stefano Brivio
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

26 Jun, 2018

1 commit

  • Manage pending per-NAPI GRO packets via list_head.

    Return an SKB pointer from the GRO receive handlers. When GRO receive
    handlers return non-NULL, it means that this SKB needs to be completed
    at this time and removed from the NAPI queue.

    Several operations are greatly simplified by this transformation,
    especially timing out the oldest SKB in the list when gro_count
    exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue
    in reverse order.

    Signed-off-by: David S. Miller

    David Miller
     

07 Jun, 2018

1 commit

  • Pull networking updates from David Miller:

    1) Add Maglev hashing scheduler to IPVS, from Inju Song.

    2) Lots of new TC subsystem tests from Roman Mashak.

    3) Add TCP zero copy receive and fix delayed acks and autotuning with
    SO_RCVLOWAT, from Eric Dumazet.

    4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
    Brouer.

    5) Add ttl inherit support to vxlan, from Hangbin Liu.

    6) Properly separate ipv6 routes into their logically independent
    components. fib6_info for the routing table, and fib6_nh for sets of
    nexthops, which thus can be shared. From David Ahern.

    7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
    messages from XDP programs. From Nikita V. Shirokov.

    8) Lots of long overdue cleanups to the r8169 driver, from Heiner
    Kallweit.

    9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.

    10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.

    11) Plumb extack down into fib_rules, from Roopa Prabhu.

    12) Add Flower classifier offload support to igb, from Vinicius Costa
    Gomes.

    13) Add UDP GSO support, from Willem de Bruijn.

    14) Add documentation for eBPF helpers, from Quentin Monnet.

    15) Add TLS tx offload to mlx5, from Ilya Lesokhin.

    16) Allow applications to be given the number of bytes available to read
    on a socket via a control message returned from recvmsg(), from
    Soheil Hassas Yeganeh.

    17) Add x86_32 eBPF JIT compiler, from Wang YanQing.

    18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
    From Björn Töpel.

    19) Remove indirect load support from all of the BPF JITs and handle
    these operations in the verifier by translating them into native BPF
    instead. From Daniel Borkmann.

    20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.

    21) Allow XDP programs to do lookups in the main kernel routing tables
    for forwarding. From David Ahern.

    22) Allow drivers to store hardware state into an ELF section of kernel
    dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.

    23) Various RACK and loss detection improvements in TCP, from Yuchung
    Cheng.

    24) Add TCP SACK compression, from Eric Dumazet.

    25) Add User Mode Helper support and basic bpfilter infrastructure, from
    Alexei Starovoitov.

    26) Support ports and protocol values in RTM_GETROUTE, from Roopa
    Prabhu.

    27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
    Brouer.

    28) Add lots of forwarding selftests, from Petr Machata.

    29) Add generic network device failover driver, from Sridhar Samudrala.

    * ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
    strparser: Add __strp_unpause and use it in ktls.
    rxrpc: Fix terminal retransmission connection ID to include the channel
    net: hns3: Optimize PF CMDQ interrupt switching process
    net: hns3: Fix for VF mailbox receiving unknown message
    net: hns3: Fix for VF mailbox cannot receiving PF response
    bnx2x: use the right constant
    Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
    net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
    enic: fix UDP rss bits
    netdev-FAQ: clarify DaveM's position for stable backports
    rtnetlink: validate attributes in do_setlink()
    mlxsw: Add extack messages for port_{un, }split failures
    netdevsim: Add extack error message for devlink reload
    devlink: Add extack to reload and port_{un, }split operations
    net: metrics: add proper netlink validation
    ipmr: fix error path when ipmr_new_table fails
    ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
    net: hns3: remove unused hclgevf_cfg_func_mta_filter
    netfilter: provide udp*_lib_lookup for nf_tproxy
    qed*: Utilize FW 8.37.2.0
    ...

    Linus Torvalds
     

18 May, 2018

1 commit


16 May, 2018

2 commits


08 May, 2018

1 commit


02 Apr, 2018

1 commit


01 Apr, 2018

1 commit


30 Mar, 2018

1 commit

  • NETIF_F_HW_VLAN_[CS]TAG_FILTER features require more than just a bit
    flip in dev->features in order to keep the driver in a consistent state.
    These features notify the driver of each added/removed vlan, but toggling
    of vlan-filter does not notify the driver accordingly for each of the
    existing vlans.

    This patch implements a similar solution to NETIF_F_RX_UDP_TUNNEL_PORT
    behavior (which notifies the driver about UDP ports in the same manner
    that vids are reported).

    Each toggling of the features propagates to the 8021q module, which
    iterates over the vlans and calls the add/kill ndo accordingly.
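
    The replay described above can be pictured with a small userspace
    model. Everything here (model_driver, model_vlan_group,
    model_set_filter(), the fixed-size vid table) is a hypothetical
    simplification and not the 8021q code itself: toggling the filter walks
    the configured vlans and tells the driver about each one.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_VIDS 8

/* Hypothetical driver state: vids currently programmed into the filter. */
struct model_driver {
	bool hw_vid[MODEL_MAX_VIDS];
};

/* Hypothetical per-device vlan bookkeeping. */
struct model_vlan_group {
	bool configured[MODEL_MAX_VIDS]; /* vlans that exist on the device */
	bool filter_on;                  /* current state of the filter feature */
};

/*
 * Toggle the filter feature. Instead of only flipping a bit, replay every
 * existing vlan to the driver: "add" on enable, "kill" on disable.
 */
static void model_set_filter(struct model_vlan_group *grp,
			     struct model_driver *drv, bool on)
{
	if (grp->filter_on == on)
		return;

	for (size_t vid = 0; vid < MODEL_MAX_VIDS; vid++) {
		if (grp->configured[vid])
			drv->hw_vid[vid] = on;
	}
	grp->filter_on = on;
}
```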

    Signed-off-by: Gal Pressman
    Reviewed-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Gal Pressman
     

28 Mar, 2018

1 commit


27 Mar, 2018

1 commit

  • Prefer the direct use of octal for permissions.

    Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
    and some typing.

    Miscellanea:

    o Whitespace neatening around these conversions.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

23 Mar, 2018

1 commit

  • Fun set of conflict resolutions here...

    For the mac80211 stuff, these were fortunately just parallel
    adds. Trivially resolved.

    In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
    function phy_disable_interrupts() earlier in the file, whilst in
    'net-next' the phy_error() call from this function was removed.

    In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
    'rt_table_id' member of rtable collided with a bug fix in 'net' that
    added a new struct member "rt_mtu_locked" which needs to be copied
    over here.

    The mlxsw driver conflict consisted of net-next separating
    the span code and definitions into separate files, whilst
    a 'net' bug fix made some changes to that moved code.

    The mlx5 infiniband conflict resolution was quite non-trivial,
    the RDMA tree's merge commit was used as a guide here, and
    here are their notes:

    ====================

    Due to bug fixes found by the syzkaller bot and taken into the for-rc
    branch after development for the 4.17 merge window had already started
    being taken into the for-next branch, there were fairly non-trivial
    merge issues that would need to be resolved between the for-rc branch
    and the for-next branch. This merge resolves those conflicts and
    provides a unified base upon which ongoing development for 4.17 can
    be based.

    Conflicts:
    drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524
    (IB/mlx5: Fix cleanup order on unload) added to for-rc and
    commit b5ca15ad7e61 (IB/mlx5: Add proper representors support)
    add as part of the devel cycle both needed to modify the
    init/de-init functions used by mlx5. To support the new
    representors, the new functions added by the cleanup patch
    needed to be made non-static, and the init/de-init list
    added by the representors patch needed to be modified to
    match the init/de-init list changes made by the cleanup
    patch.
    Updates:
    drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
    prototypes added by representors patch to reflect new function
    names as changed by cleanup patch
    drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
    stage list to match new order from cleanup patch
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

16 Mar, 2018

1 commit

  • With reorder header off, received packets are untagged in skb_vlan_untag()
    called from within __netif_receive_skb_core(), and later the tag will be
    inserted back in vlan_do_receive().

    This caused out of order vlan headers when we create a vlan device on top
    of another vlan device, because vlan_do_receive() inserts a tag as the
    outermost vlan tag. E.g. the outer tag is first removed in skb_vlan_untag()
    and inserted back in vlan_do_receive(), then the inner tag is next removed
    and inserted back as the outermost tag.

    This patch fixes the behaviour by inserting the inner tag at the right
    position.

    Signed-off-by: Toshiaki Makita
    Signed-off-by: David S. Miller

    Toshiaki Makita
     

28 Feb, 2018

1 commit


17 Jan, 2018

1 commit

  • /proc has been ignoring struct file_operations::owner field for 10 years.
    Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
    ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
    inode->i_fop is initialized with proxy struct file_operations for
    regular files:

    - if (de->proc_fops)
    - inode->i_fop = de->proc_fops;
    + if (de->proc_fops) {
    + if (S_ISREG(inode->i_mode))
    + inode->i_fop = &proc_reg_file_ops;
    + else
    + inode->i_fop = de->proc_fops;
    + }

    VFS stopped pinning module at this point.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

11 Jan, 2018

1 commit

  • A vlan device with vid 0 is allowed to be created but cannot be fully
    cleaned up by unregister_vlan_dev(), which checks for vlan_id != 0.

    Also, VLAN 0 is probably not a valid number and it is kinda
    "reserved" for HW accelerating devices, but it is probably too
    late to reject it from creation even if it makes sense. Instead,
    just remove the check in unregister_vlan_dev().

    Reported-by: Dmitry Vyukov
    Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
    Cc: Vlad Yasevich
    Cc: Ben Hutchings
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

12 Nov, 2017

1 commit


11 Nov, 2017

1 commit

  • After refcnt reaches zero, vlan_vid_del() could free
    dev->vlan_info via RCU:

    RCU_INIT_POINTER(dev->vlan_info, NULL);
    call_rcu(&vlan_info->rcu, vlan_info_rcu_free);

    However, the pointer 'grp' still points to that memory
    since it is set before vlan_vid_del():

    vlan_info = rtnl_dereference(dev->vlan_info);
    if (!vlan_info)
    goto out;
    grp = &vlan_info->grp;

    Depending on when that RCU callback is scheduled, we could
    trigger a use-after-free in vlan_group_for_each_dev()
    right following this vlan_vid_del().

    Fix it by moving vlan_vid_del() before setting grp. This
    is also symmetric to the vlan_vid_add() we call in
    vlan_device_event().
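
    The ordering issue can be sketched in userspace as follows. The types
    and functions (model_vlan_info, model_dev, model_vid_del(),
    model_device_down()) are hypothetical, and a 'released' flag stands in
    for memory handed to call_rcu(), so nothing is actually freed. The
    sketch shows the fixed order: drop the reference first, then look up
    vlan_info.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for vlan_info; 'released' marks memory given to RCU. */
struct model_vlan_info {
	int nr_vids;
	bool released;
};

struct model_dev {
	struct model_vlan_info *vlan_info;
};

/* Drop one vid reference; the last one detaches and releases vlan_info. */
static void model_vid_del(struct model_dev *dev)
{
	struct model_vlan_info *info = dev->vlan_info;

	if (info && --info->nr_vids == 0) {
		dev->vlan_info = NULL;
		info->released = true;
	}
}

/*
 * Fixed ordering: drop the reference first, then look up vlan_info.
 * Returns true if the group may still be walked, false if it is gone.
 */
static bool model_device_down(struct model_dev *dev)
{
	struct model_vlan_info *info;

	model_vid_del(dev);

	info = dev->vlan_info;
	if (!info)
		return false;

	return !info->released;
}
```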

    Reported-by: Fengguang Wu
    Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct")
    Cc: Alexander Duyck
    Cc: Linus Torvalds
    Cc: Girish Moodalbail
    Signed-off-by: Cong Wang
    Reviewed-by: Girish Moodalbail
    Tested-by: Fengguang Wu
    Signed-off-by: David S. Miller

    Cong Wang
     

04 Nov, 2017

2 commits

  • Some time ago Eric Dumazet suggested a "hack the IFF_XMIT_DST_RELEASE
    flag on the vlan netdev". But the last comment was "does not support
    properly bonding/team. (If the real_dev->priv_flags IFF_XMIT_DST_RELEASE
    bit changes, we want to update all the vlans at the same time)"

    I've extended that patch to support changes of IFF_XMIT_DST_RELEASE in
    bonding/team.
    Both bonding and team call netdev_change_features() after recalculation
    of features including the priv_flags IFF_XMIT_DST_RELEASE bit. So the
    only thing needed to support this is to recheck this bit in
    vlan_transfer_features().
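
    A minimal sketch of such a recheck, in userspace and under assumed
    names: MODEL_XMIT_DST_RELEASE, MODEL_OTHER_FLAG, model_flags_dev and
    model_transfer_dst_release() are all hypothetical, and the bit values
    are not the kernel's. The vlan device mirrors that single bit from the
    real device while keeping its other private flags.

```c
/* Hypothetical bit values; the kernel defines its own priv_flags bits. */
#define MODEL_XMIT_DST_RELEASE (1u << 0)
#define MODEL_OTHER_FLAG       (1u << 1)

/* Hypothetical device carrying only the private flags word. */
struct model_flags_dev {
	unsigned int priv_flags;
};

/*
 * Re-sync the dst-release bit of the vlan device with its real device.
 * The bit is cleared first so that it can also be turned off again.
 */
static void model_transfer_dst_release(struct model_flags_dev *vlandev,
				       const struct model_flags_dev *real_dev)
{
	vlandev->priv_flags &= ~MODEL_XMIT_DST_RELEASE;
	vlandev->priv_flags |= real_dev->priv_flags & MODEL_XMIT_DST_RELEASE;
}
```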

    Suggested-by: Eric Dumazet
    Signed-off-by: Vadim Fedorenko
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Vadim Fedorenko
     
  • Files removed in 'net-next' had their license header updated
    in 'net'. We take the remove from 'net-next'.

    Signed-off-by: David S. Miller

    David S. Miller
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

06 Oct, 2017

1 commit


05 Oct, 2017

2 commits

  • Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • If the vlan is down, free the packet instead of proceeding with other
    processing, or counting it as received. When vlan interfaces are used
    as slaves for bonding, with arp monitoring for connectivity, an
    incrementing rx counter prevents the bond device from observing that
    the interface is down.
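
    In model form, the behaviour described above amounts to an early exit
    before any accounting. The struct and function below are hypothetical
    userspace stand-ins, not the vlan receive path itself.

```c
#include <stdbool.h>

/* Hypothetical vlan device: link state plus a receive counter. */
struct model_vlan_dev {
	bool up;
	unsigned long rx_packets;
};

/*
 * Returns true if the packet is delivered, false if it is dropped.
 * A packet arriving on a vlan that is down is dropped before the
 * receive counter is touched, so monitors do not see phantom traffic.
 */
static bool model_vlan_receive(struct model_vlan_dev *vlan)
{
	if (!vlan->up)
		return false;

	vlan->rx_packets++;
	return true;
}
```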

    CC: David S. Miller
    Signed-off-by: Vishakha Narvekar
    Signed-off-by: David S. Miller

    Vishakha Narvekar
     

27 Jun, 2017

3 commits


22 Jun, 2017

1 commit


20 Jun, 2017

1 commit

  • The register_vlan_device would invoke free_netdev directly, when
    register_vlan_dev failed. It would trigger the BUG_ON in free_netdev
    if the dev was already registered. In this case, the netdev would be
    freed in netdev_run_todo later.

    So add a condition check now: free the dev directly only when it is
    not registered.
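
    The condition can be sketched with a userspace model. All names below
    (model_vdev, the MODEL_* states and outcomes, the two functions) are
    hypothetical, and flags stand in for the real free and the deferred
    cleanup. The caller frees the device itself only if registration never
    got as far as registering it.

```c
#include <stdbool.h>

/* Hypothetical registration states, loosely mirroring a netdev lifecycle. */
enum model_reg_state {
	MODEL_UNINITIALIZED,
	MODEL_REGISTERED,
	MODEL_UNREGISTERING,
};

/* Hypothetical failure points used to drive the model. */
enum model_outcome {
	MODEL_OK,
	MODEL_FAIL_EARLY, /* fails before the device is registered */
	MODEL_FAIL_LATE,  /* fails after registration, e.g. while linking */
};

struct model_vdev {
	enum model_reg_state reg_state;
	bool freed_directly; /* caller freed the device itself */
	bool deferred_free;  /* device left to the deferred cleanup path */
};

/* Inner step: may fail before or after the device becomes registered. */
static int model_register_vlan_dev(struct model_vdev *dev,
				   enum model_outcome outcome)
{
	if (outcome == MODEL_FAIL_EARLY)
		return -1;

	dev->reg_state = MODEL_REGISTERED;
	if (outcome == MODEL_FAIL_LATE) {
		/* Unregistering hands the device to the deferred cleanup. */
		dev->reg_state = MODEL_UNREGISTERING;
		dev->deferred_free = true;
		return -1;
	}
	return 0;
}

/* Outer step: free directly only if the device was never registered. */
static int model_register_vlan_device(struct model_vdev *dev,
				      enum model_outcome outcome)
{
	int err = model_register_vlan_dev(dev, outcome);

	if (err < 0 && dev->reg_state == MODEL_UNINITIALIZED)
		dev->freed_directly = true;
	return err;
}
```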

    The following is part of the coredump when netdev_upper_dev_link failed
    in register_vlan_dev. I removed the lines which are too long.

    [ 411.237457] ------------[ cut here ]------------
    [ 411.237458] kernel BUG at net/core/dev.c:7998!
    [ 411.237484] invalid opcode: 0000 [#1] SMP
    [ 411.237705] [last unloaded: 8021q]
    [ 411.237718] CPU: 1 PID: 12845 Comm: vconfig Tainted: G E 4.12.0-rc5+ #6
    [ 411.237737] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
    [ 411.237764] task: ffff9cbeb6685580 task.stack: ffffa7d2807d8000
    [ 411.237782] RIP: 0010:free_netdev+0x116/0x120
    [ 411.237794] RSP: 0018:ffffa7d2807dbdb0 EFLAGS: 00010297
    [ 411.237808] RAX: 0000000000000002 RBX: ffff9cbeb6ba8fd8 RCX: 0000000000001878
    [ 411.237826] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000000
    [ 411.237844] RBP: ffffa7d2807dbdc8 R08: 0002986100029841 R09: 0002982100029801
    [ 411.237861] R10: 0004000100029980 R11: 0004000100029980 R12: ffff9cbeb6ba9000
    [ 411.238761] R13: ffff9cbeb6ba9060 R14: ffff9cbe60f1a000 R15: ffff9cbeb6ba9000
    [ 411.239518] FS: 00007fb690d81700(0000) GS:ffff9cbebb640000(0000) knlGS:0000000000000000
    [ 411.239949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 411.240454] CR2: 00007f7115624000 CR3: 0000000077cdf000 CR4: 00000000003406e0
    [ 411.240936] Call Trace:
    [ 411.241462] vlan_ioctl_handler+0x3f1/0x400 [8021q]
    [ 411.241910] sock_ioctl+0x18b/0x2c0
    [ 411.242394] do_vfs_ioctl+0xa1/0x5d0
    [ 411.242853] ? sock_alloc_file+0xa6/0x130
    [ 411.243465] SyS_ioctl+0x79/0x90
    [ 411.243900] entry_SYSCALL_64_fastpath+0x1e/0xa9
    [ 411.244425] RIP: 0033:0x7fb69089a357
    [ 411.244863] RSP: 002b:00007ffcd04e0fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
    [ 411.245445] RAX: ffffffffffffffda RBX: 00007ffcd04e2884 RCX: 00007fb69089a357
    [ 411.245903] RDX: 00007ffcd04e0fd0 RSI: 0000000000008983 RDI: 0000000000000003
    [ 411.246527] RBP: 00007ffcd04e0fd0 R08: 0000000000000000 R09: 1999999999999999
    [ 411.246976] R10: 000000000000053f R11: 0000000000000202 R12: 0000000000000004
    [ 411.247414] R13: 00007ffcd04e1128 R14: 00007ffcd04e2888 R15: 0000000000000001
    [ 411.249129] RIP: free_netdev+0x116/0x120 RSP: ffffa7d2807dbdb0

    Signed-off-by: Gao Feng
    Signed-off-by: David S. Miller

    Gao Feng
     

16 Jun, 2017

1 commit

  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions return void * and remove all the casts across
    the tree, adding a (u8 *) cast only where the unsigned char pointer
    was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

    Note that the last part there converts from push(...)[0] to the
    more idiomatic *(u8 *)push(...).
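
    To see why a void * return removes the casts, consider this userspace
    sketch. model_buf, model_push(), model_hdr and model_build_frame() are
    hypothetical and only imitate a push-style helper that grows a buffer
    towards its head. The model aborts on underflow so that the returned
    pointer can be dereferenced directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical buffer whose used area grows towards lower addresses. */
struct model_buf {
	uint8_t storage[64];
	uint8_t *data; /* start of the used area */
	size_t len;    /* number of used bytes */
};

static void model_buf_init(struct model_buf *buf)
{
	buf->data = buf->storage + sizeof(buf->storage);
	buf->len = 0;
}

/* Reserve 'len' bytes in front of the data and return their start. */
static void *model_push(struct model_buf *buf, size_t len)
{
	if (len > (size_t)(buf->data - buf->storage))
		abort(); /* this model aborts when the headroom is exhausted */

	buf->data -= len;
	buf->len += len;
	return buf->data;
}

/* Hypothetical two-byte header used for the demonstration. */
struct model_hdr {
	uint8_t type;
	uint8_t flags;
};

static void model_build_frame(struct model_buf *buf)
{
	/* void * converts implicitly, so no (struct model_hdr *) cast. */
	struct model_hdr *hdr = model_push(buf, sizeof(*hdr));

	hdr->type = 7;
	hdr->flags = 0;

	/* A cast is only needed when dereferencing the result directly. */
	*(uint8_t *)model_push(buf, 1) = 0xff;
}
```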

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

15 Jun, 2017

1 commit


09 Jun, 2017

1 commit

  • Remove support for bridge bypass ndos from stacked devices. At this point
    no driver which supports stack device behavior offload supports operation
    with SELF flag. The case for upper device is already taken care of in both
    of the following cases:

    1. FDB add/del - driver should check at the notification cb if the
    stacked device contains its ports.

    2. Port attribute - calls switchdev code directly which checks
    for case of stack device.

    Signed-off-by: Arkadi Sharshevsky
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Arkadi Sharshevsky
     

08 Jun, 2017

1 commit

  • Network devices can allocate resources and private memory using
    netdev_ops->ndo_init(). However, the release of these resources
    can occur in one of two different places.

    Either netdev_ops->ndo_uninit() or netdev->destructor().

    The decision of which operation frees the resources depends upon
    whether it is necessary for all netdev refs to be released before it
    is safe to perform the freeing.

    netdev_ops->ndo_uninit() presumably can occur right after the
    NETDEV_UNREGISTER notifier completes and the unicast and multicast
    address lists are flushed.

    netdev->destructor(), on the other hand, does not run until the
    netdev references all go away.

    Further complicating the situation is that netdev->destructor()
    almost universally also does a free_netdev().

    This creates a problem for the logic in register_netdevice(),
    because all callers of register_netdevice() manage the freeing
    of the netdev, and invoke free_netdev(dev) if register_netdevice()
    fails.

    If netdev_ops->ndo_init() succeeds, but something else fails inside
    of register_netdevice(), it does call netdev_ops->ndo_uninit(). But
    it is not able to invoke netdev->destructor().

    This is because netdev->destructor() will do a free_netdev() and
    then the caller of register_netdevice() will do the same.

    However, this means that the resources that would normally be released
    by netdev->destructor() will not be.

    Over the years drivers have added local hacks to deal with this, by
    invoking their destructor parts by hand when register_netdevice()
    fails.

    Many drivers do not try to deal with this, and instead we have leaks.

    Let's close this hole by formalizing the distinction between what
    private things need to be freed up by netdev->destructor() and whether
    the driver needs unregister_netdevice() to perform the free_netdev().

    netdev->priv_destructor() performs all actions to free up the private
    resources that used to be freed by netdev->destructor(), except for
    free_netdev().

    netdev->needs_free_netdev is a boolean that indicates whether
    free_netdev() should be done at the end of unregister_netdevice().

    Now, register_netdevice() can sanely release all resources after
    netdev_ops->ndo_init() succeeds, by invoking both
    netdev_ops->ndo_uninit() and netdev->priv_destructor().

    And at the end of unregister_netdevice(), we invoke
    netdev->priv_destructor() and optionally call free_netdev().
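
    The resulting division of labour can be sketched with a userspace
    model. The struct and functions below are hypothetical simplifications
    (a 'freed' flag stands in for free_netdev(), and 'fail_late' simulates
    a failure after the init callback succeeded), not the core networking
    code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with the callbacks discussed above. */
struct model_netdev {
	int  (*init)(struct model_netdev *dev);
	void (*uninit)(struct model_netdev *dev);
	void (*priv_destructor)(struct model_netdev *dev);
	bool needs_free_netdev;
	int  priv_resources; /* private allocations currently held */
	bool initialized;
	bool freed;          /* stands in for free_netdev() having run */
};

static int model_init(struct model_netdev *dev)
{
	dev->initialized = true;
	dev->priv_resources = 1;
	return 0;
}

static void model_uninit(struct model_netdev *dev)
{
	dev->initialized = false;
}

static void model_priv_destructor(struct model_netdev *dev)
{
	dev->priv_resources = 0;
}

/* On a late failure, undo init fully but leave the free to the caller. */
static int model_register(struct model_netdev *dev, bool fail_late)
{
	int err = dev->init ? dev->init(dev) : 0;

	if (err)
		return err;

	if (fail_late) {
		if (dev->uninit)
			dev->uninit(dev);
		if (dev->priv_destructor)
			dev->priv_destructor(dev);
		return -1;
	}
	return 0;
}

/* Release private state, then free only if the driver asked for it. */
static void model_unregister(struct model_netdev *dev)
{
	if (dev->uninit)
		dev->uninit(dev);
	if (dev->priv_destructor)
		dev->priv_destructor(dev);
	if (dev->needs_free_netdev)
		dev->freed = true;
}
```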

    Signed-off-by: David S. Miller

    David S. Miller
     

09 May, 2017

1 commit

  • Vlan devices, like all other software devices, enable the
    NETIF_F_HW_CSUM feature. However, unlike all the other
    software devices, vlans will switch to using IP|IPV6_CSUM
    features, if the underlying device uses them. In these situations,
    checksum offload features on the vlan device can't be controlled
    via ethtool.

    This patch makes vlans keep HW_CSUM feature if the underlying
    device supports checksum offloading. This makes vlan devices
    behave like other software devices, and restores control to the
    user.

    A side-effect is that some offload settings (typically UFO)
    may be enabled on the vlan device while being disabled on the HW.
    However, the GSO code will correctly process the packets. This
    actually results in slightly better raw throughput.
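
    The effect on the computed feature set can be sketched as below. The
    MODEL_F_* bit values are hypothetical, and model_intersect() is only
    loosely modelled on how a generic checksum feature is assumed to
    combine with protocol-specific ones; model_vlan_fix_features() with
    keep_hw_csum set corresponds to the behaviour this patch describes.

```c
#include <stdint.h>

/* Hypothetical bit assignments; the kernel's NETIF_F_* values differ. */
#define MODEL_F_SG         (1u << 0)
#define MODEL_F_IP_CSUM    (1u << 1)
#define MODEL_F_HW_CSUM    (1u << 2)
#define MODEL_F_IPV6_CSUM  (1u << 3)
#define MODEL_F_PROTO_CSUM (MODEL_F_IP_CSUM | MODEL_F_IPV6_CSUM)

/*
 * Intersect two feature sets. If only one side has the generic checksum
 * bit, treat it as also covering the protocol-specific checksum bits.
 */
static uint32_t model_intersect(uint32_t f1, uint32_t f2)
{
	if ((f1 ^ f2) & MODEL_F_HW_CSUM) {
		if (f1 & MODEL_F_HW_CSUM)
			f1 |= MODEL_F_PROTO_CSUM;
		else
			f2 |= MODEL_F_PROTO_CSUM;
	}
	return f1 & f2;
}

/*
 * Compute the vlan device's features from its own and the lower device's.
 * With keep_hw_csum set, a lower device that offers protocol-specific
 * checksumming is treated as offering the generic bit too, so the vlan
 * device keeps its generic checksum feature.
 */
static uint32_t model_vlan_fix_features(uint32_t vlan_features,
					uint32_t lower_features,
					int keep_hw_csum)
{
	if (keep_hw_csum && (lower_features & MODEL_F_PROTO_CSUM))
		lower_features |= MODEL_F_HW_CSUM;

	return model_intersect(vlan_features, lower_features);
}
```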

    Signed-off-by: Vladislav Yasevich
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

14 Apr, 2017

1 commit


22 Mar, 2017

1 commit

  • wanted_features is a set of features which have to be enabled if the
    hardware allows it.

    Currently when a vlan device is created, its wanted_features is set to
    current features of its base device.

    The problem is that the base device can get new features and they are
    not propagated to vlan-s of this device.

    Bonding devices don't have this problem, and this patch fixes the
    issue for vlan devices in the same way it is handled for bonding
    devices.

    We hit this problem when we try to create a vlan device over a bonding
    device. When a system is booting, real devices require time to be
    initialized, so bonding devices are created without slaves, then vlan
    devices are created and only then ethernet devices are added to the
    bonding device. As a result we have vlan devices with disabled
    scatter-gather.

    * create a bonding device
    $ ip link add bond0 type bond
    $ ethtool -k bond0 | grep scatter
    scatter-gather: off
    tx-scatter-gather: off [requested on]
    tx-scatter-gather-fraglist: off [requested on]

    * create a vlan device
    $ ip link add link bond0 name bond0.10 type vlan id 10
    $ ethtool -k bond0.10 | grep scatter
    scatter-gather: off
    tx-scatter-gather: off
    tx-scatter-gather-fraglist: off

    * Add a slave device to bond0
    $ ip link set dev eth0 master bond0

    And now we can see that the bond0 device has got the scatter-gather
    feature, but the bond0.10 hasn't got it.
    [root@laptop linux-task-diag]# ethtool -k bond0 | grep scatter
    scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: on
    [root@laptop linux-task-diag]# ethtool -k bond0.10 | grep scatter
    scatter-gather: off
    tx-scatter-gather: off
    tx-scatter-gather-fraglist: off

    With this patch the vlan device will get all new features from the
    bonding device.

    Here is a call trace showing how the features which are set in this
    patch reach dev->wanted_features.

    register_netdevice
    vlan_dev_init
    ...
    dev->hw_features = NETIF_F_HW_CSUM | NETIF_F_SG |
    NETIF_F_FRAGLIST | NETIF_F_GSO_SOFTWARE |
    NETIF_F_HIGHDMA | NETIF_F_SCTP_CRC |
    NETIF_F_ALL_FCOE;

    dev->features |= dev->hw_features;
    ...
    dev->wanted_features = dev->features & dev->hw_features;
    __netdev_update_features(dev);
    vlan_dev_fix_features
    ...

    Cc: Alexey Kuznetsov
    Cc: Patrick McHardy
    Cc: "David S. Miller"
    Signed-off-by: Andrei Vagin
    Signed-off-by: David S. Miller

    Andrey Vagin
     

07 Feb, 2017

1 commit

  • In commit 18bfb924f000 ("net: introduce default neigh_construct/destroy
    ndo calls for L2 upper devices") we added these ndos to stacked devices
    such as team and bond, so that calls will be propagated to mlxsw.

    However, the previous commit removed the reliance on these ndos and no
    new users of these ndos have appeared since the above-mentioned commit.
    We can therefore safely remove this dead code.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel