21 May, 2019

2 commits

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version this program is distributed in the
    hope that it will be useful but without any warranty without even
    the implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details you
    should have received a copy of the gnu general public license along
    with this program if not see http www gnu org licenses the full gnu
    general public license is included in this distribution in the file
    called license

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Steve Winslow
    Reviewed-by: Kate Stewart
    Reviewed-by: Jilayne Lovejoy
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190519154041.052102771@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Add SPDX license identifiers to all Make/Kconfig files which:

    - Have no license information of any form

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

14 May, 2019

1 commit

  • There's currently a problem with toggling arp_validate on and off with an
    active-backup bond. At the moment, you can start up a bond, like so:

    modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_target=192.168.1.1
    ip link set bond0 down
    echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
    echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
    ip link set bond0 up
    ip addr add 192.168.1.2/24 dev bond0

    Pings to 192.168.1.1 work just fine. Now turn on arp_validate:

    echo 1 > /sys/class/net/bond0/bonding/arp_validate

    Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
    arp_validate off again, the link falls flat on its face:

    echo 0 > /sys/class/net/bond0/bonding/arp_validate
    dmesg
    ...
    [133191.911987] bond0: Setting arp_validate to none (0)
    [133194.257793] bond0: bond_should_notify_peers: slave ens4f0
    [133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
    [133194.259000] bond0: making interface ens4f1 the new active one
    [133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
    [133197.331191] bond0: now running without any active interface!

    The problem lies in bond_options.c, where passing in arp_validate=0
    results in bond->recv_probe getting set to NULL. This flies directly in
    the face of commit 3fe68df97c7f, which says we need to set recv_probe =
    bond_arp_rcv, even if we're not using arp_validate. Said commit fixed
    this in bond_option_arp_interval_set, but missed that we can get to that
    same state in bond_option_arp_validate_set as well.

    One solution would be to universally set recv_probe = bond_arp_rcv here
    as well, but I don't think bond_option_arp_validate_set has any business
    touching recv_probe at all, and that should be left to the arp_interval
    code, so we can just make things much tidier here.

    Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
    CC: Jay Vosburgh
    CC: Veaceslav Falico
    CC: Andy Gospodarek
    CC: "David S. Miller"
    CC: netdev@vger.kernel.org
    Signed-off-by: Jarod Wilson
    Signed-off-by: Jay Vosburgh
    Signed-off-by: David S. Miller

    Jarod Wilson
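
    The failure mode described in this commit can be illustrated with a toy
    model. This is a minimal sketch in Python, not kernel code: the Bond class,
    the string standing in for the handler pointer, and the helper names are
    all assumptions made for illustration. It only shows why a validate setter
    that clears recv_probe leaves the ARP monitor without replies, and why
    leaving recv_probe to the arp_interval code avoids that.

```python
# Toy model of the option setters discussed above; not kernel code.
# recv_probe is modeled as a string (handler name) or None.

class Bond:
    def __init__(self):
        self.arp_interval = 0
        self.arp_validate = 0
        self.recv_probe = None  # handler for incoming ARP probes


def set_arp_interval(bond, interval):
    # The ARP monitor needs the receive handler whenever it is running.
    bond.arp_interval = interval
    bond.recv_probe = "bond_arp_rcv" if interval else None


def set_arp_validate_buggy(bond, value):
    # Behaviour described in the commit: arp_validate=0 clears the handler.
    bond.arp_validate = value
    if bond.arp_interval:
        bond.recv_probe = "bond_arp_rcv" if value else None


def set_arp_validate_fixed(bond, value):
    # Fixed behaviour: recv_probe is owned by the arp_interval code.
    bond.arp_validate = value


def monitor_sees_replies(bond):
    # Without a handler, the monitor never learns that a slave is alive.
    return bond.arp_interval > 0 and bond.recv_probe is not None


def toggle(setter):
    bond = Bond()
    set_arp_interval(bond, 100)
    setter(bond, 1)
    setter(bond, 0)
    return monitor_sees_replies(bond)
```

    In this model, toggle(set_arp_validate_buggy) ends with the monitor blind,
    while toggle(set_arp_validate_fixed) keeps the handler in place.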
     

28 Apr, 2019

1 commit

  • Even though the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
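
    The compatibility concern in this commit comes down to whether a parser
    masks the flag bits out of nla_type. A minimal user-space sketch in Python
    follows; the attribute type 5 and the function names are hypothetical, and
    the constants are the flag values from the netlink UAPI header as I
    understand them (NLA_F_NESTED is bit 15, NLA_F_NET_BYTEORDER is bit 14).

```python
import struct

NLA_F_NESTED = 1 << 15
NLA_F_NET_BYTEORDER = 1 << 14
NLA_TYPE_MASK = 0xFFFF & ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
NLA_HDRLEN = 4  # two 16-bit fields: nla_len, nla_type


def parse_nlattr_header(buf):
    """Return (length, type, is_nested) with the flag bits masked out."""
    nla_len, nla_type = struct.unpack_from("=HH", buf, 0)
    return nla_len, nla_type & NLA_TYPE_MASK, bool(nla_type & NLA_F_NESTED)


def naive_type(buf):
    """What an application sees if it reads nla_type directly."""
    _, nla_type = struct.unpack_from("=HH", buf, 0)
    return nla_type


# Header of an empty nested attribute with hypothetical type 5.
EXAMPLE_TYPE = 5
header = struct.pack("=HH", NLA_HDRLEN, EXAMPLE_TYPE | NLA_F_NESTED)
```

    A parser using parse_nlattr_header() still sees type 5 and learns that the
    attribute is nested; one using naive_type() sees 5 | 0x8000 and would no
    longer match the attribute, which is why the flag could not simply be
    added everywhere.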
     

18 Apr, 2019

1 commit


16 Apr, 2019

1 commit

  • When a bond is enslaved to another bond, bond_netdev_event() only
    handles the event as if the bond is a master, and skips treating the
    bond as a slave.

    This leads to a refcount leak on the slave, since we don't remove the
    adjacency to its master and the master holds a reference on the slave.

    Reproducer:
    ip link add bondL type bond
    ip link add bondU type bond
    ip link set bondL master bondU
    ip link del bondL

    No "Fixes:" tag, this code is older than git history.

    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

06 Apr, 2019

1 commit


30 Mar, 2019

1 commit

  • Bond expects ethernet hwaddr for its slave, but it can be longer than 6
    bytes - infiniband interface for example.

    # cat /sys/devices//net/ib0/address
    80:00:02:08:fe:80:00:00:00:00:00:00:7c:fe:90:03:00:be:5d:e1

    # cat /sys/devices//net/ib0/bonding_slave/perm_hwaddr
    80:00:02:08:fe:80

    So print full hwaddr in sysfs "bonding_slave/perm_hwaddr" as well.

    Signed-off-by: Konstantin Khorenko
    Signed-off-by: David S. Miller

    Konstantin Khorenko
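
    The difference between the two sysfs outputs above is only the number of
    bytes printed. A small Python sketch, assuming a hypothetical
    format_hwaddr() helper (the kernel side is not shown in this log):

```python
ETH_ALEN = 6  # length of an Ethernet hardware address in bytes


def format_hwaddr(addr: bytes, addr_len: int) -> str:
    """Format the first addr_len bytes of addr as colon-separated hex."""
    return ":".join(f"{b:02x}" for b in addr[:addr_len])


# The 20-byte InfiniBand address quoted in the commit message.
IB_ADDR_TEXT = "80:00:02:08:fe:80:00:00:00:00:00:00:7c:fe:90:03:00:be:5d:e1"
ib_addr = bytes.fromhex(IB_ADDR_TEXT.replace(":", ""))

truncated = format_hwaddr(ib_addr, ETH_ALEN)   # old behaviour
full = format_hwaddr(ib_addr, len(ib_addr))    # behaviour after the patch
```

    Here truncated reproduces the six-byte perm_hwaddr output shown above and
    full reproduces the complete address.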
     

21 Mar, 2019

1 commit

  • After the previous patch, all the callers of ndo_select_queue()
    provide as a 'fallback' argument netdev_pick_tx.
    The only exceptions are nested calls to ndo_select_queue(),
    which pass down the 'fallback' available in the current scope
    - still netdev_pick_tx.

    We can drop that argument and replace the fallback() invocation with
    netdev_pick_tx(). This avoids an indirect call per xmit packet
    in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
    with device drivers implementing such ndo. It also cleans up the code
    a bit.

    Tested with ixgbe and CONFIG_FCOE=m

    With pktgen using queue xmit:
    threads    vanilla    patched
               (kpps)     (kpps)
    1          2334       2428
    2          4166       4278
    4          7895       8100

    v1 -> v2:
    - rebased after helper's name change

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
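
    For reference, the relative gain implied by the pktgen figures above can
    be worked out directly. This is plain arithmetic on the numbers quoted in
    the commit message; the helper name is made up for the example.

```python
# threads -> (vanilla kpps, patched kpps), copied from the commit message
RESULTS = {1: (2334, 2428), 2: (4166, 4278), 4: (7895, 8100)}


def gain_percent(vanilla, patched):
    """Relative throughput change in percent."""
    return (patched - vanilla) / vanilla * 100.0


gains = {threads: round(gain_percent(v, p), 1)
         for threads, (v, p) in RESULTS.items()}

for threads, gain in gains.items():
    print(f"{threads} thread(s): +{gain}%")
```

    This gives roughly 4.0%, 2.7% and 2.6% for 1, 2 and 4 threads.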
     

25 Feb, 2019

2 commits

  • This is no longer necessary after eca59f691566 ("net: Remove support for bridge bypass ndos from stacked devices")

    Suggested-by: Ido Schimmel
    Signed-off-by: Florian Fainelli
    Reviewed-by: Andy Gospodarek
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Three conflicts, one of which, for marvell10g.c is non-trivial and
    requires some follow-up from Heiner or someone else.

    The issue is that Heiner converted the marvell10g driver over to
    use the generic c45 code as much as possible.

    However, in 'net' a bug fix appeared which makes sure that a new
    local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
    is cleared.

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Feb, 2019

1 commit

  • This patch fixes a subtle PACKET_ORIGDEV regression which was a side
    effect of fixes introduced by:

    6a9e461f6fe4 bonding: pass link-local packets to bonding master also.

    ... to:

    b89f04c61efe bonding: deliver link-local packets with skb->dev set to link that packets arrived on

    While 6a9e461f6fe4 restored pre-b89f04c61efe presence of link-local
    packets on bonding masters (which is required e.g. by linux bridges
    participating in spanning tree or needed for lab-like setups created
    with group_fwd_mask) it also caused the originating device
    information to be lost due to cloning.

    Maciej Żenczykowski proposed another solution that doesn't require
    packet cloning and retains original device information - instead of
    returning RX_HANDLER_PASS for all link-local packets it's now limited
    only to packets from inactive slaves.

    At the same time, packets passed to bonding masters retain correct
    information about the originating device and PACKET_ORIGDEV can be used
    to determine it.

    This elegantly solves all issues so far:

    - link-local packets that were removed from bonding masters
    - LLDP daemons being forced to explicitly bind to slave interfaces
    - PACKET_ORIGDEV having no effect on bond interfaces

    Fixes: 6a9e461f6fe4 (bonding: pass link-local packets to bonding master also.)
    Reported-by: Vincent Bernat
    Signed-off-by: Michal Soltys
    Signed-off-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller

    Michal Soltys
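
    The decision described in this commit can be sketched as a small function.
    This is a simplified Python model and not the driver's code: only "PASS"
    corresponds to a result named in the message (RX_HANDLER_PASS); the
    "EXACT" and "ANOTHER" labels, the function names, and the reduction of
    the driver's checks to a single slave_is_active flag are assumptions made
    for illustration.

```python
LINK_LOCAL_PREFIX = bytes.fromhex("0180c20000")


def is_link_local(dest: bytes) -> bool:
    """True for 01:80:c2:00:00:00 through 01:80:c2:00:00:0f."""
    return len(dest) == 6 and dest[:5] == LINK_LOCAL_PREFIX and dest[5] <= 0x0F


def rx_decision(dest: bytes, slave_is_active: bool) -> str:
    if not slave_is_active:
        # Only here are link-local frames passed up on the slave itself.
        return "PASS" if is_link_local(dest) else "EXACT"
    # Active slave: hand the frame to the bond without cloning, so the
    # information about the originating device is retained.
    return "ANOTHER"


LLDP_DEST = bytes.fromhex("0180c200000e")
UNICAST_DEST = bytes.fromhex("020000000001")
```

    In this model a link-local frame on an active slave is no longer special
    cased, which is the point of the change: it reaches the bond like any
    other frame.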
     

15 Feb, 2019

1 commit

  • This patch is a small improvement. If the user runs the
    command shown below, we should print message [1]
    instead of [2]. The eth0 device actually exists, so [2] may
    confuse the user.

    $ echo "eth0" > /sys/class/net/bond4/bonding/slaves

    [1] "bond4: no command found in slaves file - use +ifname or -ifname"
    [2] "write error: No such device"

    Signed-off-by: Tonghao Zhang
    Signed-off-by: David S. Miller

    Tonghao Zhang
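
    The check this commit describes can be sketched as follows. This is a
    Python illustration with a hypothetical parse_slaves_command() helper;
    the first error text is message [1] from the commit, the second is made
    up for the example.

```python
NO_COMMAND_MSG = "no command found in slaves file - use +ifname or -ifname"


def parse_slaves_command(text):
    """Return (command, ifname), where command is '+' or '-'.

    Raises ValueError when the leading '+' or '-' is missing, instead of
    trying to look up a device and reporting that it does not exist.
    """
    token = text.strip()
    if not token or token[0] not in ("+", "-"):
        raise ValueError(NO_COMMAND_MSG)
    ifname = token[1:]
    if not ifname:
        raise ValueError("no interface name given")  # hypothetical message
    return token[0], ifname


def try_parse(text):
    try:
        return parse_slaves_command(text)
    except ValueError as exc:
        return str(exc)
```

    With this ordering of checks, writing "eth0" without a prefix yields the
    hint about +ifname or -ifname rather than a misleading "No such device".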
     

25 Jan, 2019

1 commit

  • I made a dumb mistake when I summed up the slave stats, obviously slaves
    can come and go which would make the master stats unreliable.
    Count and export the master stats separately.

    Fixes: a258aeacd7f0 ("bonding: add support for xstats and export 3ad stats")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
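
    Why summing slave counters is unreliable can be shown with a toy model.
    The class and event names below are hypothetical; the sketch only
    demonstrates that a sum over current slaves drops history when a slave is
    removed, while a separately kept master counter does not.

```python
from collections import Counter


class BondStats:
    def __init__(self):
        self.master = Counter()  # counted separately, survives slave removal
        self.slaves = {}         # slave name -> Counter

    def add_slave(self, name):
        self.slaves[name] = Counter()

    def remove_slave(self, name):
        del self.slaves[name]

    def count(self, slave, event):
        # Every event is accounted on both the slave and the master.
        self.slaves[slave][event] += 1
        self.master[event] += 1

    def sum_of_slaves(self, event):
        return sum(c[event] for c in self.slaves.values())


stats = BondStats()
stats.add_slave("eth0")
stats.add_slave("eth1")
for _ in range(3):
    stats.count("eth0", "lacpdu_rx")
for _ in range(2):
    stats.count("eth1", "lacpdu_rx")
stats.remove_slave("eth0")
```

    After eth0 is removed, the sum over the remaining slaves reports 2 events
    while the master counter still reports all 5.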
     

23 Jan, 2019

4 commits

  • This patch adds support for extended statistics (xstats) call to the
    bonding. The first user would be the 3ad code which counts the following
    events:
    - LACPDU Rx/Tx
    - LACPDU unknown type Rx
    - LACPDU illegal Rx
    - Marker Rx/Tx
    - Marker response Rx/Tx
    - Marker unknown type Rx

    All of these are exported via netlink as separate attributes to be
    easily extensible as we plan to add more in the future.
    Similar to how the bridge and other xstats are exported, the structure
    inside is:
    [ IFLA_STATS_LINK_XSTATS ]
      -> [ LINK_XSTATS_TYPE_BOND ]
        -> [ BOND_XSTATS_3AD ]
          -> [ 3ad stats attributes ]

    With this structure it's easy to add more stat types later.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Count the following types of 3ad packets per slave:
    - rx/tx lacpdu
    - rx/tx marker
    - rx/tx marker response
    - rx illegal lacpdus (right now counted on wrong length)
    - rx unknown lacpdu type
    - rx unknown marker type

    The counters are using atomic64 since this is not fast path.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Since the received lacpdu is accessed via skb_header_pointer() in
    bond_3ad_lacpdu_recv() we no longer need to check for skb->len's length.
    If the returned lacpdu pointer is not null that should be enough.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • No functional changes, adjust the style of bond_3ad_rx_indication to
    prepare it for the stats changes:
    - reduce indentation by returning early on wrong length
    - remove extra new lines between switch cases
    - add marker local variable and use it to reduce line length
    - rearrange local variables in reverse xmas tree
    - separate final return

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

11 Jan, 2019

1 commit

  • A network device stack with multiple layers of bonding devices can
    trigger a false positive lockdep warning. Adding lockdep nest levels
    fixes this. Update the level on both enslave and unlink, to avoid the
    following series of events ..

    ip netns add test
    ip netns exec test bash
    ip link set dev lo addr 00:11:22:33:44:55
    ip link set dev lo down

    ip link add dev bond1 type bond
    ip link add dev bond2 type bond

    ip link set dev lo master bond1
    ip link set dev bond1 master bond2

    ip link set dev bond1 nomaster
    ip link set dev bond2 master bond1

    .. from still generating a splat:

    [ 193.652127] ======================================================
    [ 193.658231] WARNING: possible circular locking dependency detected
    [ 193.664350] 4.20.0 #8 Not tainted
    [ 193.668310] ------------------------------------------------------
    [ 193.674417] ip/15577 is trying to acquire lock:
    [ 193.678897] 00000000a40e3b69 (&(&bond->stats_lock)->rlock#3/3){+.+.}, at: bond_get_stats+0x58/0x290
    [ 193.687851]
    but task is already holding lock:
    [ 193.693625] 00000000807b9d9f (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0x58/0x290

    [..]

    [ 193.851092] lock_acquire+0xa7/0x190
    [ 193.855138] _raw_spin_lock_nested+0x2d/0x40
    [ 193.859878] bond_get_stats+0x58/0x290
    [ 193.864093] dev_get_stats+0x5a/0xc0
    [ 193.868140] bond_get_stats+0x105/0x290
    [ 193.872444] dev_get_stats+0x5a/0xc0
    [ 193.876493] rtnl_fill_stats+0x40/0x130
    [ 193.880797] rtnl_fill_ifinfo+0x6c5/0xdc0
    [ 193.885271] rtmsg_ifinfo_build_skb+0x86/0xe0
    [ 193.890091] rtnetlink_event+0x5b/0xa0
    [ 193.894320] raw_notifier_call_chain+0x43/0x60
    [ 193.899225] netdev_change_features+0x50/0xa0
    [ 193.904044] bond_compute_features.isra.46+0x1ab/0x270
    [ 193.909640] bond_enslave+0x141d/0x15b0
    [ 193.913946] do_set_master+0x89/0xa0
    [ 193.918016] do_setlink+0x37c/0xda0
    [ 193.921980] __rtnl_newlink+0x499/0x890
    [ 193.926281] rtnl_newlink+0x48/0x70
    [ 193.930238] rtnetlink_rcv_msg+0x171/0x4b0
    [ 193.934801] netlink_rcv_skb+0xd1/0x110
    [ 193.939103] rtnetlink_rcv+0x15/0x20
    [ 193.943151] netlink_unicast+0x3b5/0x520
    [ 193.947544] netlink_sendmsg+0x2fd/0x3f0
    [ 193.951942] sock_sendmsg+0x38/0x50
    [ 193.955899] ___sys_sendmsg+0x2ba/0x2d0
    [ 193.960205] __x64_sys_sendmsg+0xad/0x100
    [ 193.964687] do_syscall_64+0x5a/0x460
    [ 193.968823] entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fixes: 7e2556e40026 ("bonding: avoid lockdep confusion in bond_get_stats()")
    Reported-by: syzbot
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

19 Dec, 2018

1 commit


14 Dec, 2018

3 commits

  • Give interested parties an opportunity to veto an impending HW address
    change.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Before NETDEV_CHANGEADDR, bond driver should emit NETDEV_PRE_CHANGEADDR,
    and allow consumers to veto the address change. To propagate further the
    return code from NETDEV_PRE_CHANGEADDR, give the function that
    implements address change a return value.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
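
    The veto flow described in this commit can be modeled in a few lines.
    This is a Python sketch under stated assumptions, not the kernel notifier
    API: the listener signature, the dict standing in for a device, and the
    reject_multicast() listener are all hypothetical.

```python
import errno


def set_mac_address(dev, new_addr, pre_change_listeners):
    """Return 0 on success, or the negative errno of the first veto."""
    for listener in pre_change_listeners:
        err = listener(dev, new_addr)
        if err:
            # Propagate the veto to the caller; the address is unchanged.
            return err
    dev["addr"] = new_addr
    return 0


def reject_multicast(dev, new_addr):
    # Hypothetical listener: veto addresses with the multicast bit set.
    first_octet = int(new_addr.split(":")[0], 16)
    return -errno.EADDRNOTAVAIL if first_octet & 1 else 0


device = {"name": "bond0", "addr": "02:00:00:00:00:01"}
vetoed = set_mac_address(device, "01:00:5e:00:00:01", [reject_multicast])
addr_after_veto = device["addr"]
accepted = set_mac_address(device, "02:00:00:00:00:02", [reject_multicast])
```

    The point of giving the address-change function a return value is visible
    here: without it, the caller could not tell that the first request was
    rejected.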
     
  • A follow-up patch will add a notifier type NETDEV_PRE_CHANGEADDR, which
    allows vetoing of MAC address changes. One prominent path to that
    notification is through dev_set_mac_address(). Therefore give this
    function an extack argument, so that it can be packed together with the
    notification. Thus a textual reason for rejection (or a warning) can be
    communicated back to the user.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     

11 Dec, 2018

1 commit


10 Dec, 2018

1 commit

  • Several conflicts, seemingly all over the place.

    I used Stephen Rothwell's sample resolutions for many of these, if not
    just to double check my own work, so definitely the credit largely
    goes to him.

    The NFP conflict consisted of a bug fix (moving operations
    past the rhashtable operation) while changing the initial
    argument in the function call in the moved code.

    The net/dsa/master.c conflict had to do with a bug fix intermixing of
    making dsa_master_set_mtu() static with the fixing of the tagging
    attribute location.

    cls_flower had a conflict because the dup reject fix from Or
    overlapped with the addition of port range classification.

    __set_phy_supported()'s conflict was relatively easy to resolve
    because Andrew fixed it in both trees, so it was just a matter
    of taking the net-next copy. Or at least I think it was :-)

    Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup()
    intermixed with changes on how the sdif and caller_net are calculated
    in these code paths in net-next.

    The remaining BPF conflicts were largely about the addition of the
    __bpf_md_ptr stuff in 'net' overlapping with adjustments and additions
    to the relevant data structure where the MD pointer macros are used.

    Signed-off-by: David S. Miller

    David S. Miller
     

07 Dec, 2018

1 commit

  • In order to pass extack together with NETDEV_PRE_UP notifications, it's
    necessary to route the extack to __dev_open() from diverse (possibly
    indirect) callers. One prominent API through which the notification is
    invoked is dev_open().

    Therefore extend dev_open() with an extra extack argument and update
    all users. Most of the calls end up just encoding NULL, but bond and
    team drivers have the extack readily available.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Petr Machata
     

01 Dec, 2018

1 commit

  • Previously when unbinding a slave the 802.3ad implementation only told
    partner that the port is not suitable for aggregation by setting the port
    aggregation state from aggregatable to individual. This is not enough. If the
    physical layer still stays up and we only unbound this port from the bond there
    is nothing in the aggregation status alone to prevent the partner from sending
    traffic towards us. To ensure that the partner doesn't consider this
    port at all anymore we should also disable collecting and distributing to
    signal that this actor is going away. Also clear AD_STATE_SYNCHRONIZATION to
    ensure partner exits collecting + distributing state.

    I have tested this behaviour against Arista EOS switches with mlx5 cards
    (physical link stays up even when interface is down) and simulated
    the same situation virtually, Linux to Linux, with two network namespaces
    running two veth device pairs. In both cases setting aggregation to
    individual doesn't alone prevent traffic from being sent towards this
    port given that the link stays up at the partner's end. Partner still keeps
    its end in collecting + distributing state and continues until timeout is
    reached. In most cases this means we are losing the traffic partner sends
    towards our port while we wait for timeout. This is most visible with slow
    periodic time (LACP rate slow).

    Other open source implementations like Open vSwitch and libreswitch, and
    vendor implementations like Arista EOS, seem to disable collecting +
    distributing when doing similar port disabling/detaching/removing change.
    With this patch kernel implementation would behave the same way and ensure
    partner doesn't consider our actor viable anymore.

    Signed-off-by: Toni Peltonen
    Signed-off-by: Jay Vosburgh
    Acked-by: Jonathan Toppins
    Signed-off-by: David S. Miller

    Toni Peltonen
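
    In terms of the LACP actor state byte, the change amounts to clearing
    more bits on unbind. The Python sketch below assumes the standard
    802.3ad actor-state bit layout; only AD_STATE_SYNCHRONIZATION is named in
    the commit message, and the other constant names and both helper
    functions are assumptions for illustration.

```python
# Actor state bits, assuming the standard 802.3ad layout.
AD_STATE_LACP_ACTIVITY = 0x01
AD_STATE_LACP_TIMEOUT = 0x02
AD_STATE_AGGREGATION = 0x04
AD_STATE_SYNCHRONIZATION = 0x08
AD_STATE_COLLECTING = 0x10
AD_STATE_DISTRIBUTING = 0x20


def unbind_state_old(state):
    # Previous behaviour: only switch from aggregatable to individual.
    return state & ~AD_STATE_AGGREGATION


def unbind_state_new(state):
    # Behaviour described above: also stop collecting and distributing,
    # and clear synchronization so the partner leaves that state too.
    cleared = (AD_STATE_AGGREGATION | AD_STATE_SYNCHRONIZATION |
               AD_STATE_COLLECTING | AD_STATE_DISTRIBUTING)
    return state & ~cleared


# A port that is active, aggregatable, in sync, collecting and distributing.
ACTIVE_PORT = (AD_STATE_LACP_ACTIVITY | AD_STATE_AGGREGATION |
               AD_STATE_SYNCHRONIZATION | AD_STATE_COLLECTING |
               AD_STATE_DISTRIBUTING)
```

    With the old behaviour the collecting and distributing bits stay set in
    the state advertised to the partner, which matches the traffic loss
    described in the message.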
     

05 Nov, 2018

1 commit

  • Commit 4d2c0cda07448ea6980f00102dc3964eb25e241c set slave->link to
    BOND_LINK_DOWN for 802.3ad bonds whenever invalid speed/duplex values
    were read, to fix a problem with slaves getting into weird states.

    In the process, it broke tracking of link failures: going straight to
    BOND_LINK_DOWN when a link is indeed down (cable pulled, switch rebooted)
    means we break out of bond_miimon_inspect()'s BOND_LINK_DOWN case because
    !link_state is already true, never increment commit, and never get
    a chance to call bond_miimon_commit(), where slave->link_failure_count
    would be incremented.

    I believe the simple fix here is to mark the slave as BOND_LINK_FAIL and
    let bond_miimon_inspect() transition the link from _FAIL to either _UP or
    _DOWN; in the latter case, we now get proper incrementing of
    link_failure_count again.

    Fixes: 4d2c0cda0744 ("bonding: speed/duplex update at NETDEV_UP event")
    CC: Mahesh Bandewar
    CC: David S. Miller
    CC: netdev@vger.kernel.org
    CC: stable@vger.kernel.org
    Signed-off-by: Jarod Wilson
    Signed-off-by: David S. Miller

    Jarod Wilson
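
    The accounting problem can be reproduced with a heavily simplified state
    machine. This Python sketch is not the driver's logic: it collapses the
    inspect and commit steps into two small functions, omits delays and the
    intermediate "back" state, and all names are hypothetical. It only shows
    that a slave forced straight to "down" never passes through the step that
    increments the failure counter.

```python
UP, FAIL, DOWN = "up", "fail", "down"


class Slave:
    def __init__(self, link=UP):
        self.link = link
        self.link_failure_count = 0


def miimon_inspect(slave, carrier_ok):
    """Return the proposed new link state, or None if nothing changes."""
    if slave.link == UP:
        return None if carrier_ok else FAIL
    if slave.link == FAIL:
        return UP if carrier_ok else DOWN
    # slave.link == DOWN: nothing to do while the carrier is still down.
    return UP if carrier_ok else None


def miimon_commit(slave, new_state):
    if new_state == DOWN:
        slave.link_failure_count += 1  # only counted on a committed DOWN
    slave.link = new_state


def monitor_once(slave, carrier_ok):
    new_state = miimon_inspect(slave, carrier_ok)
    if new_state is not None:
        miimon_commit(slave, new_state)


# Old behaviour: the event handler forces DOWN, the monitor has nothing to do.
old = Slave(link=DOWN)
monitor_once(old, carrier_ok=False)

# New behaviour: the handler marks FAIL and the monitor commits DOWN.
new = Slave(link=FAIL)
monitor_once(new, carrier_ok=False)
```

    In this model old.link_failure_count stays at 0 while
    new.link_failure_count becomes 1, with both slaves ending up down.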
     

30 Oct, 2018

1 commit

  • The attribute IFLA_BOND_AD_ACTOR_SYSTEM is sent to user space having the
    length of sizeof(bond->params.ad_actor_system) which is 8 bytes. This
    patch aligns the length to ETH_ALEN to have the same MAC address exposed
    as using sysfs.

    Fixes: f87fda00b6ed2 ("bonding: prevent out of bound accesses")
    Signed-off-by: Tobias Jungel
    Signed-off-by: David S. Miller

    Tobias Jungel
     

20 Oct, 2018

1 commit

  • This fixes a problem introduced by:
    commit 2cde6acd49da ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock")

    When using netconsole on a bond, __netpoll_cleanup can asynchronously
    recurse multiple times, each __netpoll_free_async call can result in
    more __netpoll_free_async's. This means there is now a race between
    cleanup_work queues on multiple netpoll_info's on multiple devices and
    the configuration of a new netpoll. For example if a netconsole is set
    to enable 0, reconfigured, and enable 1 immediately, this netconsole
    will likely not work.

    __netpoll_free_async exists so that cleanup can be requested when rtnl
    is not locked; if it is locked, we should be able to execute
    synchronously. It appears to be locked everywhere it's called from.

    Generalize the design pattern from the teaming driver for current
    callers of __netpoll_free_async.

    CC: Neil Horman
    CC: "David S. Miller"
    Signed-off-by: Debabrata Banerjee
    Signed-off-by: David S. Miller

    Debabrata Banerjee
     

03 Oct, 2018

1 commit

  • RX queue config for bonding master could be different from its slave
    device(s). With the commit 6a9e461f6fe4 ("bonding: pass link-local
    packets to bonding master also."), the packet is reinjected into stack
    with skb->dev as bonding master. This potentially triggers the
    message:

    "bondX received packet on queue Y, but number of RX queues is Z"

    whenever the queue that packet is received on is higher than the
    numrxqueues on bonding master (Y > Z).

    Fixes: 6a9e461f6fe4 ("bonding: pass link-local packets to bonding master also.")
    Reported-by: John Sperbeck
    Signed-off-by: Eric Dumazet
    Signed-off-by: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Mahesh Bandewar
     

27 Sep, 2018

2 commits

  • Syzkaller reported this on a slightly older kernel but it's still
    applicable to the current kernel -

    ======================================================
    WARNING: possible circular locking dependency detected
    4.18.0-next-20180823+ #46 Not tainted
    ------------------------------------------------------
    syz-executor4/26841 is trying to acquire lock:
    00000000dd41ef48 ((wq_completion)bond_dev->name){+.+.}, at: flush_workqueue+0x2db/0x1e10 kernel/workqueue.c:2652

    but task is already holding lock:
    00000000768ab431 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline]
    00000000768ab431 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4708

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #2 (rtnl_mutex){+.+.}:
    __mutex_lock_common kernel/locking/mutex.c:925 [inline]
    __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
    mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
    rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77
    bond_netdev_notify drivers/net/bonding/bond_main.c:1310 [inline]
    bond_netdev_notify_work+0x44/0xd0 drivers/net/bonding/bond_main.c:1320
    process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153
    worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
    kthread+0x35a/0x420 kernel/kthread.c:246
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415

    -> #1 ((work_completion)(&(&nnw->work)->work)){+.+.}:
    process_one_work+0xc0b/0x1aa0 kernel/workqueue.c:2129
    worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
    kthread+0x35a/0x420 kernel/kthread.c:246
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415

    -> #0 ((wq_completion)bond_dev->name){+.+.}:
    lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
    flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
    drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
    destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
    __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
    bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
    register_netdevice+0x337/0x1100 net/core/dev.c:8410
    bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
    rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
    rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
    netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
    rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
    netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
    netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
    netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
    sock_sendmsg_nosec net/socket.c:622 [inline]
    sock_sendmsg+0xd5/0x120 net/socket.c:632
    ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
    __sys_sendmsg+0x11d/0x290 net/socket.c:2153
    __do_sys_sendmsg net/socket.c:2162 [inline]
    __se_sys_sendmsg net/socket.c:2160 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    other info that might help us debug this:

    Chain exists of:
    (wq_completion)bond_dev->name --> (work_completion)(&(&nnw->work)->work) --> rtnl_mutex

    Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(rtnl_mutex);
                                   lock((work_completion)(&(&nnw->work)->work));
                                   lock(rtnl_mutex);
      lock((wq_completion)bond_dev->name);

    *** DEADLOCK ***

    1 lock held by syz-executor4/26841:

    stack backtrace:
    CPU: 1 PID: 26841 Comm: syz-executor4 Not tainted 4.18.0-next-20180823+ #46
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
    print_circular_bug.isra.34.cold.55+0x1bd/0x27d kernel/locking/lockdep.c:1222
    check_prev_add kernel/locking/lockdep.c:1862 [inline]
    check_prevs_add kernel/locking/lockdep.c:1975 [inline]
    validate_chain kernel/locking/lockdep.c:2416 [inline]
    __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3412
    lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
    flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
    drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
    destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
    __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
    bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
    register_netdevice+0x337/0x1100 net/core/dev.c:8410
    bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
    rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
    rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
    netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
    rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
    netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
    netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
    netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
    sock_sendmsg_nosec net/socket.c:622 [inline]
    sock_sendmsg+0xd5/0x120 net/socket.c:632
    ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
    __sys_sendmsg+0x11d/0x290 net/socket.c:2153
    __do_sys_sendmsg net/socket.c:2162 [inline]
    __se_sys_sendmsg net/socket.c:2160 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x457089
    Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f2df20a5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007f2df20a66d4 RCX: 0000000000457089
    RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
    RBP: 0000000000930140 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 00000000004d40b8 R14: 00000000004c8ad8 R15: 0000000000000001

    Signed-off-by: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Mahesh Bandewar
     
  • Commit b89f04c61efe ("bonding: deliver link-local packets with
    skb->dev set to link that packets arrived on") changed the behavior
    of how link-local-multicast packets are processed. The change in
    the behavior broke some legacy use cases where these packets are
    expected to arrive on bonding master device also.

    This patch passes the packet to the stack with the link it arrived
    on as well as passes to the bonding-master device to preserve the
    legacy use case.

    Fixes: b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on")
    Reported-by: Michal Soltys
    Signed-off-by: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Mahesh Bandewar
     

24 Sep, 2018

1 commit

  • We want to allow NAPI drivers to no longer provide
    ndo_poll_controller() method, as it has been proven problematic.

    team driver must not look at its presence, but instead call
    netpoll_poll_dev() which factorizes the needed actions.

    Signed-off-by: Eric Dumazet
    Cc: Jay Vosburgh
    Cc: Veaceslav Falico
    Cc: Andy Gospodarek
    Acked-by: Jay Vosburgh
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Aug, 2018

1 commit


02 Aug, 2018

1 commit

  • syzbot found that the following sequence produces a LOCKDEP splat [1]

    ip link add bond10 type bond
    ip link add bond11 type bond
    ip link set bond11 master bond10

    To fix this, we can use the already provided nest_level.

    This patch also provides correct nesting for dev->addr_list_lock

    [1]
    WARNING: possible recursive locking detected
    4.18.0-rc6+ #167 Not tainted
    --------------------------------------------
    syz-executor751/4439 is trying to acquire lock:
    (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline]
    (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426

    but task is already holding lock:
    (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline]
    (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&(&bond->stats_lock)->rlock);
    lock(&(&bond->stats_lock)->rlock);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    3 locks held by syz-executor751/4439:
    #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77
    #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline]
    #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426
    #2: (____ptrval____) (rcu_read_lock){....}, at: bond_get_stats+0x0/0x560 include/linux/compiler.h:215

    stack backtrace:
    CPU: 0 PID: 4439 Comm: syz-executor751 Not tainted 4.18.0-rc6+ #167
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
    print_deadlock_bug kernel/locking/lockdep.c:1765 [inline]
    check_deadlock kernel/locking/lockdep.c:1809 [inline]
    validate_chain kernel/locking/lockdep.c:2405 [inline]
    __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435
    lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
    __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
    _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144
    spin_lock include/linux/spinlock.h:310 [inline]
    bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426
    dev_get_stats+0x10f/0x470 net/core/dev.c:8316
    bond_get_stats+0x232/0x560 drivers/net/bonding/bond_main.c:3432
    dev_get_stats+0x10f/0x470 net/core/dev.c:8316
    rtnl_fill_stats+0x4d/0xac0 net/core/rtnetlink.c:1169
    rtnl_fill_ifinfo+0x1aa6/0x3fb0 net/core/rtnetlink.c:1611
    rtmsg_ifinfo_build_skb+0xc8/0x190 net/core/rtnetlink.c:3268
    rtmsg_ifinfo_event.part.30+0x45/0xe0 net/core/rtnetlink.c:3300
    rtmsg_ifinfo_event net/core/rtnetlink.c:3297 [inline]
    rtnetlink_event+0x144/0x170 net/core/rtnetlink.c:4716
    notifier_call_chain+0x180/0x390 kernel/notifier.c:93
    __raw_notifier_call_chain kernel/notifier.c:394 [inline]
    raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
    call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735
    call_netdevice_notifiers net/core/dev.c:1753 [inline]
    netdev_features_change net/core/dev.c:1321 [inline]
    netdev_change_features+0xb3/0x110 net/core/dev.c:7759
    bond_compute_features.isra.47+0x585/0xa50 drivers/net/bonding/bond_main.c:1120
    bond_enslave+0x1b25/0x5da0 drivers/net/bonding/bond_main.c:1755
    bond_do_ioctl+0x7cb/0xae0 drivers/net/bonding/bond_main.c:3528
    dev_ifsioc+0x43c/0xb30 net/core/dev_ioctl.c:327
    dev_ioctl+0x1b5/0xcc0 net/core/dev_ioctl.c:493
    sock_do_ioctl+0x1d3/0x3e0 net/socket.c:992
    sock_ioctl+0x30d/0x680 net/socket.c:1093
    vfs_ioctl fs/ioctl.c:46 [inline]
    file_ioctl fs/ioctl.c:500 [inline]
    do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:684
    ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
    __do_sys_ioctl fs/ioctl.c:708 [inline]
    __se_sys_ioctl fs/ioctl.c:706 [inline]
    __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x440859
    Code: e8 2c af 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 3b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007ffc51a92878 EFLAGS: 00000213 ORIG_RAX: 0000000000000010
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440859
    RDX: 0000000020000040 RSI: 0000000000008990 RDI: 0000000000000003
    RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
    R10: 00000000022d5880 R11: 0000000000000213 R12: 0000000000007390
    R13: 0000000000401db0 R14: 0000000000000000 R15: 0000000000000000

    Signed-off-by: Eric Dumazet
    Cc: Jay Vosburgh
    Cc: Veaceslav Falico
    Cc: Andy Gospodarek

    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Jul, 2018

1 commit


23 Jul, 2018

1 commit


22 Jul, 2018

1 commit

  • For some time now, if you load the bonding driver and configure bond
    parameters via sysfs with minimal config options (for example, specifying
    nothing but the mode and relying on defaults for everything else), the
    modes that cannot use arp monitoring (802.3ad, balance-tlb, balance-alb)
    all wind up with both arp_interval=0 (as it should be) and miimon=0. As a
    result, the miimon monitor thread never actually runs. This is
    particularly problematic for 802.3ad.

    For example, from an LNST recipe I've set up:

    $ modprobe bonding max_bonds=0
    $ echo "+t_bond0" > /sys/class/net/bonding_masters
    $ ip link set t_bond0 down
    $ echo "802.3ad" > /sys/class/net/t_bond0/bonding/mode
    $ ip link set ens1f1 down
    $ echo "+ens1f1" > /sys/class/net/t_bond0/bonding/slaves
    $ ip link set ens1f0 down
    $ echo "+ens1f0" > /sys/class/net/t_bond0/bonding/slaves
    $ ethtool -i t_bond0
    $ ip link set ens1f1 up
    $ ip link set ens1f0 up
    $ ip link set t_bond0 up
    $ ip addr add 192.168.9.1/24 dev t_bond0
    $ ip addr add 2002::1/64 dev t_bond0

    This bond comes up okay, but things look slightly suspect in
    /proc/net/bonding/t_bond0 output:

    $ grep -i mii /proc/net/bonding/t_bond0
    MII Status: up
    MII Polling Interval (ms): 0
    MII Status: up
    MII Status: up

    Now, pull a cable on one of the ports in the bond, then reconnect it, and
    you'll see:

    Slave Interface: ens1f0
    MII Status: down
    Speed: 1000 Mbps
    Duplex: full
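
The symptom shown above (a polling interval of 0 together with a slave stuck in "MII Status: down") can be checked mechanically. The following is only a minimal sketch, assuming the /proc/net/bonding layout quoted in this message. The helper name bond_mii_report is hypothetical, and it takes the file path as an argument so it can also be run against a saved copy of the output.

```shell
# Hypothetical helper: summarize MII state from a /proc/net/bonding/<bond> file.
# Prints "miimon <ms>" for the polling interval, then one "<slave> <status>"
# line per slave. The bond-level "MII Status" line is skipped because no
# "Slave Interface" line precedes it.
bond_mii_report() {
    awk '
        /^[ \t]*MII Polling Interval \(ms\):/ { print "miimon", $NF }
        /^[ \t]*Slave Interface:/             { slave = $NF }
        /^[ \t]*MII Status:/ && slave != ""   { print slave, $NF; slave = "" }
    ' "$1"
}
```

On the affected setup, running bond_mii_report /proc/net/bonding/t_bond0 would be expected to report a miimon value of 0 alongside the slave that stays down after the cable is reconnected.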

    I believe this became a major issue as of commit 4d2c0cda0744, which, for
    802.3ad bonds, sets slave->link = BOND_LINK_DOWN, with a comment about
    relying on link monitoring via miimon to set it correctly. Since the
    miimon work queue never runs, the link just stays marked down.

    If we simply tweak bond_option_mode_set() slightly, we can check for the
    non-arp modes having no miimon value set, and insert BOND_DEFAULT_MIIMON,
    which gets things back in full working order. This problem exists as far
    back as 4.14, and might be worth fixing in all stable trees since, though
    the work-around is to simply specify an miimon value yourself.
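
The work-around mentioned above can be scripted on kernels that do not carry this fix. This is a rough sketch rather than the patch itself: it assumes the bonding sysfs attributes miimon and arp_interval under /sys/class/net/<bond>/bonding, the helper name ensure_miimon is hypothetical, and the fallback of 100 ms is an assumed value, not one taken from this message.

```shell
# Hypothetical helper: if neither link monitor is configured, set miimon.
# $1: bonding sysfs directory, e.g. /sys/class/net/t_bond0/bonding
# $2: miimon value in milliseconds (assumed fallback: 100)
ensure_miimon() {
    dir=$1
    value=${2:-100}
    miimon=$(cat "$dir/miimon") || return 1
    arp_interval=$(cat "$dir/arp_interval") || return 1
    # Only touch miimon when both monitors are disabled, which is the
    # state described above for 802.3ad, balance-tlb and balance-alb.
    if [ "$miimon" -eq 0 ] && [ "$arp_interval" -eq 0 ]; then
        echo "$value" > "$dir/miimon"
    fi
}
```

For example, ensure_miimon /sys/class/net/t_bond0/bonding (run as root) leaves a bond that already uses arp monitoring or a non-zero miimon untouched.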

    Reported-by: Bob Ball
    Signed-off-by: Jarod Wilson
    Acked-by: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Jarod Wilson
     

10 Jul, 2018

1 commit

  • This patch makes it so that instead of passing a void pointer as the
    accel_priv, we pass a net_device pointer as sb_dev. Making this
    change allows us to eventually pass the subordinate device through to
    the fallback function, so that we can keep the actual code in the
    ndo_select_queue call as focused as possible on the exception cases.

    Signed-off-by: Alexander Duyck
    Tested-by: Andrew Bowers
    Signed-off-by: Jeff Kirsher

    Alexander Duyck