19 Oct, 2016

1 commit

  • We recently got the following warning after setting up a vlan device on
    top of an offloaded bridge and executing 'bridge link':

    WARNING: CPU: 0 PID: 18566 at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:81 mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
    [...]
    CPU: 0 PID: 18566 Comm: bridge Not tainted 4.8.0-rc7 #1
    Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox switch, BIOS 4.6.5 05/21/2015
    0000000000000286 00000000e64ab94f ffff880406e6f8f0 ffffffff8135eaa3
    0000000000000000 0000000000000000 ffff880406e6f930 ffffffff8108c43b
    0000005106e6f988 ffff8803df398840 ffff880403c60108 ffff880406e6f990
    Call Trace:
    [] dump_stack+0x63/0x90
    [] __warn+0xcb/0xf0
    [] warn_slowpath_null+0x1d/0x20
    [] mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
    [] mlxsw_sp_port_attr_get+0xa5/0xb0 [mlxsw_spectrum]
    [] switchdev_port_attr_get+0x4f/0x140
    [] switchdev_port_attr_get+0x100/0x140
    [] switchdev_port_attr_get+0x100/0x140
    [] switchdev_port_bridge_getlink+0x5b/0xc0
    [] ? switchdev_port_fdb_dump+0x90/0x90
    [] rtnl_bridge_getlink+0xe7/0x190
    [] netlink_dump+0x122/0x290
    [] __netlink_dump_start+0x15f/0x190
    [] ? rtnl_bridge_dellink+0x230/0x230
    [] rtnetlink_rcv_msg+0x1a6/0x220
    [] ? __kmalloc_node_track_caller+0x208/0x2c0
    [] ? rtnl_bridge_dellink+0x230/0x230
    [] ? rtnl_newlink+0x890/0x890
    [] netlink_rcv_skb+0xa4/0xc0
    [] rtnetlink_rcv+0x28/0x30
    [] netlink_unicast+0x18c/0x240
    [] netlink_sendmsg+0x2fb/0x3a0
    [] sock_sendmsg+0x38/0x50
    [] SYSC_sendto+0x101/0x190
    [] ? __sys_recvmsg+0x51/0x90
    [] SyS_sendto+0xe/0x10
    [] entry_SYSCALL_64_fastpath+0x1a/0xa4

    The problem is that the 8021q module propagates the call to
    ndo_bridge_getlink() via switchdev ops, but the switch driver doesn't
    recognize the netdev, as it's not offloaded.

    While we can ignore calls being made to non-bridge ports inside the
    driver, a better fix would be to push this check up to the switchdev
    layer.

    Note that these ndos can be called for non-bridged netdev, but this only
    happens in certain PF drivers which don't call the corresponding
    switchdev functions anyway.

    Fixes: 99f44bb3527b ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
    Signed-off-by: Ido Schimmel
    Reported-by: Tamir Winetroub
    Tested-by: Tamir Winetroub
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     

28 Sep, 2016

2 commits


02 Sep, 2016

1 commit

  • fdb dumps spanning multiple skb's currently restart from the first
    interface again for every skb. This results in unnecessary
    iterations on the already visited interfaces and their fdb
    entries. In large scale setups, we have seen this to slow
    down fdb dumps considerably. On a system with 30k macs we
    see fdb dumps spanning across more than 300 skbs.

    To fix the problem, this patch replaces the existing single fdb
    marker with three markers: netdev hash entries, netdevs and fdb
    index to continue where we left off instead of restarting from the
    first netdev. This is consistent with link dumps.

    In the process of fixing the performance issue, this patch also
    re-implements fix done by
    commit 472681d57a5d ("net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump")
    (with an internal fix from Wilson Kok) in the following ways:
    - change ndo_fdb_dump handlers to return error code instead
    of the last fdb index
    - use cb->args strictly for dump frag markers and not error codes.
    This is consistent with other dump functions.

    Below results were taken on a system with 1000 netdevs
    and 35085 fdb entries:
    before patch:
    $time bridge fdb show | wc -l
    15065

    real 1m11.791s
    user 0m0.070s
    sys 1m8.395s

    (existing code does not return all macs)

    after patch:
    $time bridge fdb show | wc -l
    35085

    real 0m2.017s
    user 0m0.113s
    sys 0m1.942s

    Signed-off-by: Roopa Prabhu
    Signed-off-by: Wilson Kok
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

27 Aug, 2016

2 commits

  • switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
    port netdevs so that packets being flooded by the device won't be
    flooded twice.

    It works by assigning a unique identifier (the ifindex of the first
    bridge port) to bridge ports sharing the same parent ID. This prevents
    packets from being flooded twice by the same switch, but will flood
    packets through bridge ports belonging to a different switch.

    This method is problematic when stacked devices are taken into account,
    such as VLANs. In such cases, a physical port netdev can have upper
    devices being members in two different bridges, thus requiring two
    different 'offload_fwd_mark's to be configured on the port netdev, which
    is impossible.

    The main problem is that packet and netdev marking is performed at the
    physical netdev level, whereas flooding occurs between bridge ports,
    which are not necessarily port netdevs.

    Instead, packet and netdev marking should really be done in the bridge
    driver with the switch driver only telling it which packets it already
    forwarded. The bridge driver will mark such packets using the mark
    assigned to the ingress bridge port and will prevent the packet from
    being forwarded through any bridge port sharing the same mark (i.e.
    having the same parent ID).

    Remove the current switchdev 'offload_fwd_mark' implementation and
    instead implement the proposed method. In addition, make rocker - the
    sole user of the mark - use the proposed method.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • switchdev_port_same_parent_id() currently expects port netdevs, but we
    need it to support stacked devices in the next patch, so drop the
    NO_RECURSE flag.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     

16 Aug, 2016

1 commit


15 Jul, 2016

1 commit


18 May, 2016

1 commit

  • The problem is that fib_info->nh is [0] so the struct fib_info
    allocation size depends on number of nexthops. If we just copy fib_info,
    we do not copy the nexthops info and driver accesses memory which is not
    ours.

    Given the fact that fib4 does not defer operations and therefore it does
    not need copy, just pass the pointer down to drivers as it was done
    before.

    Fixes: 850d0cbc91 ("switchdev: remove pointers from switchdev objects")
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

15 May, 2016

1 commit


25 Apr, 2016

1 commit

  • When using switchdev deferred operation (SWITCHDEV_F_DEFER), the operation
    is executed in different context and the application doesn't have any way
    to get the operation real status.

    Adding a completion callback fixes that. This patch adds fields to
    switchdev_attr and switchdev_obj "complete_priv" field which is used by
    the "complete" callback.

    Application can set a complete function which will be called once the
    operation executed.

    Signed-off-by: Elad Raz
    Signed-off-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Elad Raz
     

25 Mar, 2016

1 commit


27 Feb, 2016

1 commit

  • When the send skbuff reaches the end, nlmsg_put and friends returns
    -EMSGSIZE but it is silently thrown away in ndo_fdb_dump. It is called
    within a for_each_netdev loop and the first fdb entry of a following
    netdev could fit in the remaining skbuff. This breaks the mechanism
    of cb->args[0] and idx to keep track of the entries that are already
    dumped, which results missing entries in bridge fdb show command.

    Signed-off-by: Minoura Makoto
    Signed-off-by: David S. Miller

    MINOURA Makoto / 箕浦 真
     

29 Jan, 2016

1 commit

  • When switchdev drivers process FDB notifications from the underlying
    device they resolve the netdev to which the entry points to and notify
    the bridge using the switchdev notifier.

    However, since the RTNL mutex is not held there is nothing preventing
    the netdev from disappearing in the middle, which will cause
    br_switchdev_event() to dereference a non-existing netdev.

    Make switchdev drivers hold the lock at the beginning of the
    notification processing session and release it once it ends, after
    notifying the bridge.

    Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
    when RTNL mutex is held.

    Fixes: 03bf0c281234 ("switchdev: introduce switchdev notifier")
    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     

11 Jan, 2016

1 commit

  • Define HW multicast entry: MAC and VID.
    Using a MAC address simplifies support for both IPV4 and IPv6.

    Signed-off-by: Elad Raz
    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Elad Raz
     

16 Dec, 2015

1 commit

  • switchdev drivers need to know the netdev on which the switchdev op was
    invoked. For example, the STP state of a VLAN interface configured on top
    of a port can change while being member in a bridge. In this case, the
    underlying driver should only change the STP state of that particular
    VLAN and not of all the VLANs configured on the port.

    However, current switchdev infrastructure only passes the port netdev down
    to the driver. Solve that by passing the original device down to the
    driver as part of the required switchdev object / attribute.

    This doesn't entail any change in current switchdev drivers. It simply
    enables those supporting stacked devices to know the originating device
    and act accordingly.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     

04 Nov, 2015

1 commit

  • Caller passing down the SKIP_EOPNOTSUPP switchdev flag expects that
    -EOPNOTSUPP cannot be returned. But in case of direct op call without
    recurtion, this may happen. So fix this by checking it always on the
    end of __switchdev_port_attr_set function.

    Fixes: 464314ea6c11 ("switchdev: skip over ports returning -EOPNOTSUPP when recursing ports")
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

30 Oct, 2015

3 commits


20 Oct, 2015

1 commit


15 Oct, 2015

6 commits


13 Oct, 2015

2 commits


12 Oct, 2015

1 commit


03 Oct, 2015

6 commits


30 Sep, 2015

4 commits

  • Similar to the notifier_call callback of a notifier_block, change the
    function signature of switchdev add and del operations to:

    int switchdev_port_obj_add/del(struct net_device *dev,
    enum switchdev_obj_id id, void *obj);

    This allows the caller to pass a specific switchdev_obj_* structure
    instead of the generic switchdev_obj one.

    Drivers implementation of these operations and switchdev have been
    changed accordingly.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • Similar to the notifier_call callback of a notifier_block, change the
    function signature of switchdev dump operation to:

    int switchdev_port_obj_dump(struct net_device *dev,
    enum switchdev_obj_id id, void *obj,
    int (*cb)(void *obj));

    This allows the caller to pass and expect back a specific
    switchdev_obj_* structure instead of the generic switchdev_obj one.

    Drivers implementation of dump operation can now expect this specific
    structure and call the callback with it. Drivers have been changed
    accordingly.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • The net_device associated to a dump operation does not have to be passed
    to the callback. switchdev stores it in a superset struct, if needed.

    Also some drivers (such as DSA drivers) may not have easy access to it.

    This will simplify pushing the callback function down to the drivers.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • The FDB dump callback requires the related net_device so move it to the
    struct switchdev_fdb_dump superset instead of using a callback param.

    With this done, it'll be simpler to change the dump function signature.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot