04 May, 2022

5 commits

  • When the switchdev_handle_fdb_event_to_device() event replication helper
    was created, my original thought was that FDB events on LAG interfaces
    should most likely be special-cased, not just replicated towards all
    switchdev ports beneath that LAG. So this replication helper currently
    does not recurse through switchdev lower interfaces of LAG bridge ports,
    but rather calls the lag_mod_cb() if that was provided.

    No switchdev driver uses this helper for FDB events on LAG interfaces
    yet, so that was an assumption which was yet to be tested. It is
    certainly usable for that purpose, as my RFC series shows:

    https://patchwork.kernel.org/project/netdevbpf/cover/20220210125201.2859463-1-vladimir.oltean@nxp.com/

    however this approach is slightly convoluted because:

    - the switchdev driver gets a "dev" that isn't its own net device, but
    rather the LAG net device. It must call switchdev_lower_dev_find(dev)
    in order to get a handle of any of its own net devices (the ones that
    pass check_cb).

    - in order for FDB entries on LAG ports to be correctly refcounted per
    the number of switchdev ports beneath that LAG, we haven't escaped the
    need to iterate through the LAG's lower interfaces. Except that is now
    the responsibility of the switchdev driver, because the replication
    helper just stopped half-way.

    So, even though yes, FDB events on LAG bridge ports must be
    special-cased, in the end it's simpler to let switchdev_handle_fdb_*
    just iterate through the LAG port's switchdev lowers, and let the
    switchdev driver figure out that those physical ports are under a LAG.

    The switchdev_handle_fdb_event_to_device() helper takes a
    "foreign_dev_check" callback so it can figure out whether @dev can
    autonomously forward to @foreign_dev. DSA fills this method properly:
    if the LAG is offloaded by another port in the same tree as @dev, then
    it isn't foreign. If it is a software LAG, it is foreign - forwarding
    happens in software.

    Whether an interface is foreign or not decides whether the replication
    helper will go through the LAG's switchdev lowers or not. Since the
    lan966x doesn't properly fill this out, FDB events on software LAG
    uppers will get called. By changing lan966x_foreign_dev_check(), we can
    suppress them.

    Whereas DSA will now start receiving FDB events for its offloaded LAG
    uppers, so we need to return -EOPNOTSUPP, since we currently don't do
    the right thing for them.

    Cc: Horatiu Vultur
    Signed-off-by: Vladimir Oltean
    Reviewed-by: Florian Fainelli
    Signed-off-by: Jakub Kicinski
    (cherry picked from commit ec638740fce990ad2b9af43ead8088d6d6eb2145)
    Signed-off-by: Vladimir Oltean

    Vladimir Oltean
     
  • The logic from switchdev_handle_port_obj_add_foreign() is directly
    adapted from switchdev_handle_fdb_event_to_device(), which already
    detects events on foreign interfaces and reoffloads them towards the
    switchdev neighbors.

    However, when we have a simple br0 bond0 swp0 topology and the
    switchdev_handle_port_obj_add_foreign() gets called on bond0, we get
    stuck into an infinite recursion:

    1. bond0 does not pass check_cb(), so we attempt to find switchdev
    neighbor interfaces. For that, we recursively call
    __switchdev_handle_port_obj_add() for bond0's bridge, br0.

    2. __switchdev_handle_port_obj_add() recurses through br0's lowers,
    essentially calling __switchdev_handle_port_obj_add() for bond0

    3. Go to step 1.

    This happens because switchdev_handle_fdb_event_to_device() and
    switchdev_handle_port_obj_add_foreign() are not exactly the same.
    The FDB event helper special-cases LAG interfaces with its lag_mod_cb(),
    so this is why we don't end up in an infinite loop - because it doesn't
    attempt to treat LAG interfaces as potentially foreign bridge ports.

    The problem is solved by looking ahead through the bridge's lowers to
    see whether there is any switchdev interface that is foreign to the @dev
    we are currently processing. This stops the recursion described above at
    step 1: __switchdev_handle_port_obj_add(bond0) will not create another
    call to __switchdev_handle_port_obj_add(br0). Going one step upper
    should only happen when we're starting from a bridge port that has been
    determined to be "foreign" to the switchdev driver that passes the
    foreign_dev_check_cb().

    Fixes: c4076cdd21f8 ("net: switchdev: introduce switchdev_handle_port_obj_{add,del} for foreign interfaces")
    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller
    (cherry picked from commit acd8df5880d7c80b0317dce8df3e65b6a6825c88)
    Signed-off-by: Vladimir Oltean

    Vladimir Oltean
     
  • The switchdev_handle_port_obj_add() helper is good for replicating a
    port object on the lower interfaces of @dev, if that object was emitted
    on a bridge, or on a bridge port that is a LAG.

    However, drivers that use this helper limit themselves to a box from
    which they can no longer intercept port objects notified on neighbor
    ports ("foreign interfaces").

    One such driver is DSA, where software bridging with foreign interfaces
    such as standalone NICs or Wi-Fi APs is an important use case. There, a
    VLAN installed on a neighbor bridge port roughly corresponds to a
    forwarding VLAN installed on the DSA switch's CPU port.

    To support this use case while also making use of the benefits of the
    switchdev_handle_* replication helper for port objects, introduce a new
    variant of these functions that crawls through the neighbor ports of
    @dev, in search of potentially compatible switchdev ports that are
    interested in the event.

    The strategy is identical to switchdev_handle_fdb_event_to_device():
    if @dev wasn't a switchdev interface, then go one step upper, and
    recursively call this function on the bridge that this port belongs to.
    At the next recursion step, __switchdev_handle_port_obj_add() will
    iterate through the bridge's lower interfaces. Among those, some will be
    switchdev interfaces, and one will be the original @dev that we came
    from. To prevent infinite recursion, we must suppress reentry into the
    original @dev, and just call the @add_cb for the switchdev_interfaces.

    It looks like this:

    br0
    / | \
    / | \
    / | \
    swp0 swp1 eth0

    1. __switchdev_handle_port_obj_add(eth0)
    -> check_cb(eth0) returns false
    -> eth0 has no lower interfaces
    -> eth0's bridge is br0
    -> switchdev_lower_dev_find(br0, check_cb, foreign_dev_check_cb))
    finds br0

    2. __switchdev_handle_port_obj_add(br0)
    -> check_cb(br0) returns false
    -> netdev_for_each_lower_dev
    -> check_cb(swp0) returns true, so we don't skip this interface

    3. __switchdev_handle_port_obj_add(swp0)
    -> check_cb(swp0) returns true, so we call add_cb(swp0)

    (back to netdev_for_each_lower_dev from 2)
    -> check_cb(swp1) returns true, so we don't skip this interface

    4. __switchdev_handle_port_obj_add(swp1)
    -> check_cb(swp1) returns true, so we call add_cb(swp1)

    (back to netdev_for_each_lower_dev from 2)
    -> check_cb(eth0) returns false, so we skip this interface to
    avoid infinite recursion

    Note: eth0 could have been a LAG, and we don't want to suppress the
    recursion through its lowers if those exist, so when check_cb() returns
    false, we still call switchdev_lower_dev_find() to estimate whether
    there's anything worth a recursion beneath that LAG. Using check_cb()
    and foreign_dev_check_cb(), switchdev_lower_dev_find() not only figures
    out whether the lowers of the LAG are switchdev, but also whether they
    actively offload the LAG or not (whether the LAG is "foreign" to the
    switchdev interface or not).

    The port_obj_info->orig_dev is preserved across recursive calls, so
    switchdev drivers still know on which device was this notification
    originally emitted.

    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller
    (cherry picked from commit c4076cdd21f8d68a96f1e7124bd8915c7e31a474)
    Signed-off-by: Vladimir Oltean

    Vladimir Oltean
     
  • switchdev_lower_dev_find() assumes RCU read-side critical section
    calling context, since it uses netdev_walk_all_lower_dev_rcu().

    Rename it appropriately, in preparation of adding a similar iterator
    that assumes writer-side rtnl_mutex protection.

    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller
    (cherry picked from commit 7b465f4cf39ea1f5df7f425b843578b60f673155)
    Signed-off-by: Vladimir Oltean

    Vladimir Oltean
     
  • To reduce code churn, the same patch makes multiple changes, since they
    all touch the same lines:

    1. The implementations for these two are identical, just with different
    function pointers. Reduce duplications and name the function pointers
    "mod_cb" instead of "add_cb" and "del_cb". Pass the event as argument.

    2. Drop the "const" attribute from "orig_dev". If the driver needs to
    check whether orig_dev belongs to itself and then
    call_switchdev_notifiers(orig_dev, SWITCHDEV_FDB_OFFLOADED), it
    can't, because call_switchdev_notifiers takes a non-const struct
    net_device *.

    Signed-off-by: Vladimir Oltean
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller
    (cherry picked from commit 716a30a97a52aa78afd70db48d522855f624e7e0)
    Signed-off-by: Vladimir Oltean

    Vladimir Oltean
     

04 Aug, 2021

1 commit

  • With the introduction of explicit offloading API in switchdev in commit
    2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform which bridge
    ports are offloaded"), we started having Ethernet switch drivers calling
    directly into a function exported by net/bridge/br_switchdev.c, which is
    a function exported by the bridge driver.

    This means that drivers that did not have an explicit dependency on the
    bridge before, like cpsw and am65-cpsw, now do - otherwise it is not
    possible to call a symbol exported by a driver that can be built as
    module unless you are a module too.

    There was an attempt to solve the dependency issue in the form of commit
    b0e81817629a ("net: build all switchdev drivers as modules when the
    bridge is a module"). Grygorii Strashko, however, says about it:

    | In my opinion, the problem is a bit bigger here than just fixing the
    | build :(
    |
    | In case, of ^cpsw the switchdev mode is kinda optional and in many
    | cases (especially for testing purposes, NFS) the multi-mac mode is
    | still preferable mode.
    |
    | There were no such tight dependency between switchdev drivers and
    | bridge core before and switchdev serviced as independent, notification
    | based layer between them, so ^cpsw still can be "Y" and bridge can be
    | "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE
    | will need to be set as "Y", or we will have to update drivers to
    | support build with BRIDGE=n and maintain separate builds for
    | networking vs non-networking testing. But is this enough? Wouldn't
    | it cause 'chain reaction' required to add more and more "Y" options
    | (like CONFIG_VLAN_8021Q)?
    |
    | PS. Just to be sure we on the same page - ARM builds will be forced
    | (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our
    | automation testing will just fail with omap2plus_defconfig.

    In the light of this, it would be desirable for some configurations to
    avoid dependencies between switchdev drivers and the bridge, and have
    the switchdev mode as completely optional within the driver.

    Arnd Bergmann also tried to write a patch which better expressed the
    build time dependency for Ethernet switch drivers where the switchdev
    support is optional, like cpsw/am65-cpsw, and this made the drivers
    follow the bridge (compile as module if the bridge is a module) only if
    the optional switchdev support in the driver was enabled in the first
    place:
    https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/

    but this still did not solve the fact that cpsw and am65-cpsw now must
    be built as modules when the bridge is a module - it just expressed
    correctly that optional dependency. But the new behavior is an apparent
    regression from Grygorii's perspective.

    So to support the use case where the Ethernet driver is built-in,
    NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we
    need a framework that can handle the possible absence of the bridge from
    the running system, i.e. runtime bloatware as opposed to build-time
    bloatware.

    Luckily we already have this framework, since switchdev has been using
    it extensively. Events from the bridge side are transmitted to the
    driver side using notifier chains - this was originally done so that
    unrelated drivers could snoop for events emitted by the bridge towards
    ports that are implemented by other drivers (think of a switch driver
    with LAG offload that listens for switchdev events on a bonding/team
    interface that it offloads).

    There are also events which are transmitted from the driver side to the
    bridge side, which again are modeled using notifiers.
    SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with
    notifying the bridge that a MAC address has been dynamically learned.
    So there is a precedent we can use for modeling the new framework.

    The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work
    that the bridge needs to do when a port becomes offloaded is blocking in
    its nature: replay VLANs, MDBs etc. The calling context is indeed
    blocking (we are under rtnl_mutex), but the existing switchdev
    notification chain that the bridge is subscribed to is only the atomic
    one. So we need to subscribe the bridge to the blocking switchdev
    notification chain too.

    This patch:
    - keeps the driver-side perception of the switchdev_bridge_port_{,un}offload
    unchanged
    - moves the implementation of switchdev_bridge_port_{,un}offload from
    the bridge module into the switchdev module.
    - makes everybody that is subscribed to the switchdev blocking notifier
    chain "hear" offload & unoffload events
    - makes the bridge driver subscribe and handle those events
    - moves the bridge driver's handling of those events into 2 new
    functions called br_switchdev_port_{,un}offload. These functions
    contain in fact the core of the logic that was previously in
    switchdev_bridge_port_{,un}offload, just that now we go through an
    extra indirection layer to reach them.

    Unlike all the other switchdev notification structures, the structure
    used to carry the bridge port information, struct
    switchdev_notifier_brport_info, does not contain a "bool handled".
    This is because in the current usage pattern, we always know that a
    switchdev bridge port offloading event will be handled by the bridge,
    because the switchdev_bridge_port_offload() call was initiated by a
    NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a
    bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event
    couldn't have happened.

    Signed-off-by: Vladimir Oltean
    Tested-by: Grygorii Strashko
    Signed-off-by: David S. Miller

    Vladimir Oltean
     

22 Jul, 2021

1 commit

  • The newly introduced switchdev_handle_fdb_{add,del}_to_device helpers
    solved a problem but introduced another one. They have a severe design
    bug: they do not propagate FDB events on foreign interfaces to us, i.e.
    this use case:

    br0
    / \
    / \
    / \
    / \
    swp0 eno0
    (switchdev) (foreign)

    when an address is learned on eno0, what is supposed to happen is that
    this event should also be propagated towards swp0. Somehow I managed to
    convince myself that this did work correctly, but obviously it does not.

    The trouble with foreign interfaces is that we must reach a switchdev
    net_device pointer through a foreign net_device that has no direct
    upper/lower relationship with it. So we need to do exploratory searching
    through the lower interfaces of the foreign net_device's bridge upper
    (to reach swp0 from eno0, we must check its upper, br0, for lower
    interfaces that pass the check_cb and foreign_dev_check_cb). This is
    something that the previous code did not do, it just assumed that "dev"
    will become a switchdev interface at some point, somehow, probably by
    magic.

    With this patch, assisted address learning on the CPU port works again
    in DSA:

    ip link add br0 type bridge
    ip link set swp0 master br0
    ip link set eno0 master br0
    ip link set br0 up

    [ 46.708929] mscc_felix 0000:00:00.5 swp0: Adding FDB entry towards eno0, addr 00:04:9f:05:f4:ab vid 0 as host address

    Fixes: 8ca07176ab00 ("net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE")
    Reported-by: Eric Woudstra
    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Vladimir Oltean
     

21 Jul, 2021

1 commit

  • The difference between __switchdev_handle_fdb_del_to_device and
    switchdev_handle_del_to_device is that the former takes an extra
    orig_dev argument, while the latter starts with dev == orig_dev.

    We should recurse into the variant that does not lose the orig_dev along
    the way. This is relevant when deleting FDB entries pointing towards a
    bridge (dev changes to the lower interfaces, but orig_dev shouldn't).

    The addition helper already recurses properly, just the deletion one
    doesn't.

    Fixes: 8ca07176ab00 ("net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE")
    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Vladimir Oltean
     

20 Jul, 2021

1 commit

  • Currently DSA has an issue with FDB entries pointing towards the bridge
    in the presence of br_fdb_replay() being called at port join and leave
    time.

    In particular, each bridge port will ask for a replay for the FDB
    entries pointing towards the bridge when it joins, and for another
    replay when it leaves.

    This means that for example, a bridge with 4 switch ports will notify
    DSA 4 times of the bridge MAC address.

    But if the MAC address of the bridge changes during the normal runtime
    of the system, the bridge notifies switchdev [ once ] of the deletion of
    the old MAC address as a local FDB towards the bridge, and of the
    insertion [ again once ] of the new MAC address as a local FDB.

    This is a problem, because DSA keeps the old MAC address as a host FDB
    entry with refcount 4 (4 ports asked for it using br_fdb_replay). So the
    old MAC address will not be deleted. Additionally, the new MAC address
    will only be installed with refcount 1, and when the first switch port
    leaves the bridge (leaving 3 others as still members), it will delete
    with it the new MAC address of the bridge from the local FDB entries
    kept by DSA (because the br_fdb_replay call on deletion will bring the
    entry's refcount from 1 to 0).

    So the problem, really, is that the number of br_fdb_replay() calls is
    not matched with the refcount that a host FDB is offloaded to DSA during
    normal runtime.

    An elegant way to solve the problem would be to make the switchdev
    notification emitted by br_fdb_change_mac_address() result in a host FDB
    kept by DSA which has a refcount exactly equal to the number of ports
    under that bridge. Then, no matter how many DSA ports join or leave that
    bridge, the host FDB entry will always be deleted when there are exactly
    zero remaining DSA switch ports members of the bridge.

    To implement the proposed solution, we remember that the switchdev
    objects and port attributes have some helpers provided by switchdev,
    which can be optionally called by drivers:
    switchdev_handle_port_obj_{add,del} and switchdev_handle_port_attr_set.
    These helpers:
    - fan out a switchdev object/attribute emitted for the bridge towards
    all the lower interfaces that pass the check_cb().
    - fan out a switchdev object/attribute emitted for a bridge port that is
    a LAG towards all the lower interfaces that pass the check_cb().

    In other words, this is the model we need for the FDB events too:
    something that will keep an FDB entry emitted towards a physical port as
    it is, but translate an FDB entry emitted towards the bridge into N FDB
    entries, one per physical port.

    Of course, there are many differences between fanning out a switchdev
    object (VLAN) on 3 lower interfaces of a LAG and fanning out an FDB
    entry on 3 lower interfaces of a LAG. Intuitively, an FDB entry towards
    a LAG should be treated specially, because FDB entries are unicast, we
    can't just install the same address towards 3 destinations. It is
    imaginable that drivers might want to treat this case specifically, so
    create some methods for this case and do not recurse into the LAG lower
    ports, just the bridge ports.

    DSA also listens for FDB entries on "foreign" interfaces, aka interfaces
    bridged with us which are not part of our hardware domain: think an
    Ethernet switch bridged with a Wi-Fi AP. For those addresses, DSA
    installs host FDB entries. However, there we have the same problem
    (those host FDB entries are installed with a refcount of only 1) and an
    even bigger one which we did not have with FDB entries towards the
    bridge:

    br_fdb_replay() is currently not called for FDB entries on foreign
    interfaces, just for the physical port and for the bridge itself.

    So when DSA sniffs an address learned by the software bridge towards a
    foreign interface like an e1000 port, and then that e1000 leaves the
    bridge, DSA remains with the dangling host FDB address. That will be
    fixed separately by replaying all FDB entries and not just the ones
    towards the port and the bridge.

    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Vladimir Oltean
     

29 Jun, 2021

1 commit

  • In the case where the driver asks for a replay of a certain type of
    event (port object or attribute) for a bridge port that is a LAG, it may
    do so because this port has just joined the LAG.

    But there might already be other switchdev ports in that LAG, and it is
    preferable that those preexisting switchdev ports do not act upon the
    replayed event.

    The solution is to add a context to switchdev events, which is NULL most
    of the time (when the bridge layer initiates the call) but which can be
    set to a value controlled by the switchdev driver when a replay is
    requested. The driver can then check the context to figure out if all
    ports within the LAG should act upon the switchdev event, or just the
    ones that match the context.

    We have to modify all switchdev_handle_* helper functions as well as the
    prototypes in the drivers that use these helpers too, because these
    helpers hide the underlying struct switchdev_notifier_info from us and
    there is no way to retrieve the context otherwise.

    The context structure will be populated and used in later patches.

    Signed-off-by: Vladimir Oltean
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Vladimir Oltean
     

15 Feb, 2021

1 commit


13 Feb, 2021

1 commit

  • When a struct switchdev_attr is notified through switchdev, there is no
    way to report informational messages, unlike for struct switchdev_obj.

    Signed-off-by: Vladimir Oltean
    Reviewed-by: Ido Schimmel
    Reviewed-by: Florian Fainelli
    Reviewed-by: Nikolay Aleksandrov
    Reviewed-by: Grygorii Strashko
    Signed-off-by: David S. Miller

    Vladimir Oltean
     

29 Jan, 2021

1 commit

  • drivers/net/can/dev.c
    b552766c872f ("can: dev: prevent potential information leak in can_fill_info()")
    3e77f70e7345 ("can: dev: move driver related infrastructure into separate subdir")
    0a042c6ec991 ("can: dev: move netlink related code into seperate file")

    Code move.

    drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
    57ac4a31c483 ("net/mlx5e: Correctly handle changing the number of queues when the interface is down")
    214baf22870c ("net/mlx5e: Support HTB offload")

    Adjacent code changes

    net/switchdev/switchdev.c
    20776b465c0c ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP")
    ffb68fc58e96 ("net: switchdev: remove the transaction structure from port object notifiers")
    bae33f2b5afe ("net: switchdev: remove the transaction structure from port attributes")

    Transaction parameter gets dropped otherwise keep the fix.

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

28 Jan, 2021

2 commits

  • CONFIG_NET_SWITCHDEV is a bool option. Change the ifeq conditional to
    the standard obj-$(CONFIG_NET_SWITCHDEV) form.

    Use obj-y in net/switchdev/Makefile because Kbuild visits this Makefile
    only when CONFIG_NET_SWITCHDEV=y.

    Signed-off-by: Masahiro Yamada
    Link: https://lore.kernel.org/r/20210125231659.106201-3-masahiroy@kernel.org
    Signed-off-by: Jakub Kicinski

    Masahiro Yamada
     
  • It's not true that switchdev_port_obj_notify() only inspects the
    ->handled field of "struct switchdev_notifier_port_obj_info" if
    call_switchdev_blocking_notifiers() returns 0 - there's a WARN_ON()
    triggering for a non-zero return combined with ->handled not being
    true. But the real problem here is that -EOPNOTSUPP is not being
    properly handled.

    The wrapper functions switchdev_handle_port_obj_add() et al change a
    return value of -EOPNOTSUPP to 0, and the treatment of ->handled in
    switchdev_port_obj_notify() seems to be designed to change that back
    to -EOPNOTSUPP in case nobody actually acted on the notifier (i.e.,
    everybody returned -EOPNOTSUPP).

    Currently, as soon as some device down the stack passes the check_cb()
    check, ->handled gets set to true, which means that
    switchdev_port_obj_notify() cannot actually ever return -EOPNOTSUPP.

    This, for example, means that the detection of hardware offload
    support in the MRP code is broken: switchdev_port_obj_add() used by
    br_mrp_switchdev_send_ring_test() always returns 0, so since the MRP
    code thinks the generation of MRP test frames has been offloaded, no
    such frames are actually put on the wire. Similarly,
    br_mrp_switchdev_set_ring_role() also always returns 0, causing
    mrp->ring_role_offloaded to be set to 1.

    To fix this, continue to set ->handled true if any callback returns
    success or any error distinct from -EOPNOTSUPP. But if all the
    callbacks return -EOPNOTSUPP, make sure that ->handled stays false, so
    the logic in switchdev_port_obj_notify() can propagate that
    information.

    Fixes: 9a9f26e8f7ea ("bridge: mrp: Connect MRP API with the switchdev API")
    Fixes: f30f0601eb93 ("switchdev: Add helpers to aid traversal through lower devices")
    Reviewed-by: Petr Machata
    Signed-off-by: Rasmus Villemoes
    Link: https://lore.kernel.org/r/20210125124116.102928-1-rasmus.villemoes@prevas.dk
    Signed-off-by: Jakub Kicinski

    Rasmus Villemoes
     

12 Jan, 2021

3 commits

  • Since the introduction of the switchdev API, port attributes were
    transmitted to drivers for offloading using a two-step transactional
    model, with a prepare phase that was supposed to catch all errors, and a
    commit phase that was supposed to never fail.

    Some classes of failures can never be avoided, like hardware access, or
    memory allocation. In the latter case, merely attempting to move the
    memory allocation to the preparation phase makes it impossible to avoid
    memory leaks, since commit 91cf8eceffc1 ("switchdev: Remove unused
    transaction item queue") which has removed the unused mechanism of
    passing on the allocated memory between one phase and another.

    It is time we admit that separating the preparation from the commit
    phase is something that is best left for the driver to decide, and not
    something that should be baked into the API, especially since there are
    no switchdev callers that depend on this.

    This patch removes the struct switchdev_trans member from switchdev port
    attribute notifier structures, and converts drivers to not look at this
    member.

    In part, this patch contains a revert of my previous commit 2e554a7a5d8a
    ("net: dsa: propagate switchdev vlan_filtering prepare phase to
    drivers").

    For the most part, the conversion was trivial except for:
    - Rocker's world implementation based on Broadcom OF-DPA had an odd
    implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
    done mechanically, by pasting the implementation twice, then only
    keeping the code that would get executed during prepare phase on top,
    then only keeping the code that gets executed during the commit phase
    on bottom, then simplifying the resulting code until this was obtained.
    - DSA's offloading of STP state, bridge flags, VLAN filtering and
    multicast router could be converted right away. But the ageing time
    could not, so a shim was introduced and this was left for a further
    commit.

    Signed-off-by: Vladimir Oltean
    Acked-by: Linus Walleij
    Acked-by: Jiri Pirko
    Reviewed-by: Kurt Kanzenbach # hellcreek
    Reviewed-by: Linus Walleij # RTL8366RB
    Reviewed-by: Ido Schimmel
    Reviewed-by: Florian Fainelli
    Signed-off-by: Jakub Kicinski

    Vladimir Oltean
     
  • After the removal of the transactional model inside
    switchdev_port_obj_add_now, it has no added value and we can just call
    switchdev_port_obj_notify directly, bypassing this function. Let's
    delete it.

    Signed-off-by: Vladimir Oltean
    Reviewed-by: Florian Fainelli
    Reviewed-by: Ido Schimmel
    Acked-by: Linus Walleij
    Acked-by: Jiri Pirko
    Signed-off-by: Jakub Kicinski

    Vladimir Oltean
     
  • Since the introduction of the switchdev API, port objects were
    transmitted to drivers for offloading using a two-step transactional
    model, with a prepare phase that was supposed to catch all errors, and a
    commit phase that was supposed to never fail.

    Some classes of failures can never be avoided, like hardware access, or
    memory allocation. In the latter case, merely attempting to move the
    memory allocation to the preparation phase makes it impossible to avoid
    memory leaks, since commit 91cf8eceffc1 ("switchdev: Remove unused
    transaction item queue") which has removed the unused mechanism of
    passing on the allocated memory between one phase and another.

    It is time we admit that separating the preparation from the commit
    phase is something that is best left for the driver to decide, and not
    something that should be baked into the API, especially since there are
    no switchdev callers that depend on this.

    This patch removes the struct switchdev_trans member from switchdev port
    object notifier structures, and converts drivers to not look at this
    member.

    Where driver conversion is trivial (like in the case of the Marvell
    Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
    done in this patch.

    Where driver conversion needs more attention (DSA, Mellanox Spectrum),
    the conversion is left for subsequent patches and here we only fake the
    prepare/commit phases at a lower level, just not in the switchdev
    notifier itself.

    Where the code has a natural structure that is best left alone as a
    preparation and a commit phase (as in the case of the Ocelot switch),
    that structure is left in place, just made to not depend upon the
    switchdev transactional model.

    Signed-off-by: Vladimir Oltean
    Reviewed-by: Florian Fainelli
    Acked-by: Linus Walleij
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jakub Kicinski

    Vladimir Oltean
     

24 Sep, 2020

1 commit

  • Update kernel-doc line comments to fix warnings reported by make W=1.
    net/switchdev/switchdev.c:413: warning: Function parameter or
    member 'extack' not described in 'call_switchdev_notifiers'

    Signed-off-by: Tian Tao
    Acked-by: Ivan Vecera
    Signed-off-by: David S. Miller

    Tian Tao
     

14 Jul, 2020

1 commit


14 Jun, 2020

1 commit

  • Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
    '---help---'"), the number of '---help---' has been gradually
    decreasing, but there are still more than 2400 instances.

    This commit finishes the conversion. While I touched the lines,
    I also fixed the indentation.

    There are a variety of indentation styles found.

    a) 4 spaces + '---help---'
    b) 7 spaces + '---help---'
    c) 8 spaces + '---help---'
    d) 1 space + 1 tab + '---help---'
    e) 1 tab + '---help---' (correct indentation)
    f) 1 tab + 1 space + '---help---'
    g) 1 tab + 2 spaces + '---help---'

    In order to convert all of them to 1 tab + 'help', I ran the
    following commend:

    $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

27 Feb, 2020

1 commit

  • When configuring a tree of independent bridges, propagating changes
    from the upper bridge across a bridge master to the lower bridge
    ports brings surprises.

    For example, a lower bridge may have vlan filtering enabled. It
    may have a vlan interface attached to the bridge master, which may
    then be incorporated into another bridge. As soon as the lower
    bridge vlan interface is attached to the upper bridge, the lower
    bridge has vlan filtering disabled.

    This occurs because switchdev recursively applies its changes to
    all lower devices no matter what.

    Reviewed-by: Ido Schimmel
    Tested-by: Ido Schimmel
    Signed-off-by: Russell King
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Russell King
     

18 Feb, 2020

1 commit

  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
    int stuff;
    struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that, dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


02 Mar, 2019

1 commit

  • There are no more in tree users of the
    switchdev_trans_item_{dequeue,enqueue} or switchdev_trans_item structure
    in the kernel since commit 00fc0c51e35b ("rocker: Change world_ops API
    and implementation to be switchdev independant").

    Remove this unused code and update the documentation accordingly since.

    Signed-off-by: Florian Fainelli
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Florian Fainelli
     

28 Feb, 2019

2 commits

  • Drop switchdev_ops.switchdev_port_attr_set. Drop the uses of this field
    from all clients, which were migrated to use switchdev notification in
    the previous patches.

    Add a new function switchdev_port_attr_notify() that sends the switchdev
    notifications SWITCHDEV_PORT_ATTR_SET and calls the blocking (process)
    notifier chain.

    We have one odd case within net/bridge/br_switchdev.c with the
    SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier that
    requires executing from atomic context, we deal with that one
    specifically.

    Drop __switchdev_port_attr_set() and update switchdev_port_attr_set()
    likewise.

    Signed-off-by: Florian Fainelli
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • In preparation for allowing switchdev enabled drivers to veto specific
    attribute settings from within the context of the caller, introduce a
    new switchdev notifier type for port attributes.

    Suggested-by: Ido Schimmel
    Reviewed-by: Ido Schimmel
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     

25 Feb, 2019

1 commit


07 Feb, 2019

1 commit

  • Now that we have a dedicated NDO for getting a port's parent ID, get rid
    of SWITCHDEV_ATTR_ID_PORT_PARENT_ID and convert all callers to use the
    NDO exclusively. This is a preliminary change to getting rid of
    switchdev_ops eventually.

    Signed-off-by: Florian Fainelli
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Florian Fainelli
     

18 Jan, 2019

1 commit


13 Dec, 2018

3 commits

  • Drivers use switchdev_handle_port_obj_add() to handle recursive descent
    through lower devices. Change this function prototype to take add_cb
    that itself takes an extack argument. Decode extack from
    switchdev_notifier_port_obj_info and pass it to add_cb.

    Update mlxsw and ocelot drivers which use this helper.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Acked-by: Ivan Vecera
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • In order to pass extack to the drivers that need it, add an extack field
    to struct switchdev_notifier_info, and an extack argument to the
    function call_switchdev_blocking_notifiers(). Also add a helper function
    switchdev_notifier_info_to_extack().

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Acked-by: Ivan Vecera
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • After the previous patch, bridge driver has extack argument available to
    pass to switchdev. Therefore extend switchdev_port_obj_add() with this
    argument, updating all callers, and passing the argument through to
    switchdev_port_obj_notify().

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Acked-by: Ivan Vecera
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     

24 Nov, 2018

3 commits

  • Drop switchdev_ops.switchdev_port_obj_add and _del. Drop the uses of
    this field from all clients, which were migrated to use switchdev
    notification in the previous patches.

    Add a new function switchdev_port_obj_notify() that sends the switchdev
    notifications SWITCHDEV_PORT_OBJ_ADD and _DEL.

    Update switchdev_port_obj_del_now() to dispatch to this new function.
    Drop __switchdev_port_obj_add() and update switchdev_port_obj_add()
    likewise.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • After the transition from switchdev operations to notifier chain (which
    will take place in following patches), the onus is on the driver to find
    its own devices below possible layer of LAG or other uppers.

    The logic to do so is fairly repetitive: each driver is looking for its
    own devices among the lowers of the notified device. For those that it
    finds, it calls a handler. To indicate that the event was handled,
    struct switchdev_notifier_port_obj_info.handled is set. The differences
    lie only in what constitutes an "own" device and what handler to call.

    Therefore abstract this logic into two helpers,
    switchdev_handle_port_obj_add() and switchdev_handle_port_obj_del(). If
    a driver only supports physical ports under a bridge device, it will
    simply avoid this layer of indirection.

    One area where this helper diverges from the current switchdev behavior
    is the case of mixed lowers, some of which are switchdev ports and some
    of which are not. Previously, such scenario would fail with -EOPNOTSUPP.
    The helper could do that for lowers for which the passed-in predicate
    doesn't hold. That would however break the case that switchdev ports
    from several different drivers are stashed under one master, a scenario
    that switchdev currently happily supports. Therefore tolerate any and
    all unknown netdevices, whether they are backed by a switchdev driver
    or not.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • In general one can't assume that a switchdev notifier is called in a
    non-atomic context, and correspondingly, the switchdev notifier chain is
    an atomic one.

    However, port object addition and deletion messages are delivered from a
    process context. Even the MDB addition messages, whose delivery is
    scheduled from atomic context, are queued and the delivery itself takes
    place in blocking context. For VLAN messages in particular, keeping the
    blocking nature is important for error reporting.

    Therefore introduce a blocking notifier chain and related service
    functions to distribute the notifications for which a blocking context
    can be assumed.

    Signed-off-by: Petr Machata
    Reviewed-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     

10 Nov, 2017

1 commit


08 Aug, 2017

2 commits

  • Currently the bridge port flags, vlans, FDBs and MDBs can be offloaded
    through the bridge code, making the switchdev's SELF bridge bypass
    implementation to be redundant. This implies several changes:
    - No need for dump infra in switchdev, DSA's special case is handled
    privately.
    - Remove obj_dump from switchdev_ops.
    - FDBs are removed from obj_add/del routines, due to the fact that they
    are offloaded through the bridge notification chain.
    - The switchdev_port_bridge_xx() and switchdev_port_fdb_xx() functions
    can be removed.

    Signed-off-by: Arkadi Sharshevsky
    Reviewed-by: Vivien Didelot
    Acked-by: Jiri Pirko
    Reviewed-by: Ivan Vecera
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Arkadi Sharshevsky
     
  • >From all switchdev devices only DSA requires special FDB dump. This is due
    to lack of ability for syncing the hardware learned FDBs with the bridge.
    Due to this it is removed from switchdev and moved inside DSA.

    Signed-off-by: Arkadi Sharshevsky
    Signed-off-by: David S. Miller

    Arkadi Sharshevsky