11 May, 2020

1 commit

  • Commit 8db0a2ee2c63 ("net: bridge: reject DSA-enabled master netdevices
    as bridge members") added a special check in br_if.c in order to check
    for a DSA master network device with a tagging protocol configured. This
    was done because back then, such devices, once enslaved in a bridge
    would become inoperative and would not pass DSA tagged traffic anymore
    due to br_handle_frame returning RX_HANDLER_CONSUMED.

    But right now we have valid use cases which do require bridging of DSA
    masters. One such example is when the DSA master ports are DSA switch
    ports themselves (in a disjoint tree setup). This should be completely
    equivalent, functionally speaking, to having multiple DSA switches
    hanging off of the ports of a switchdev driver. So we should allow the
    enslaving of DSA tagged master network devices.

    Instead of the regular br_handle_frame(), install a new function
    br_handle_frame_dummy() on these DSA masters, which returns
    RX_HANDLER_PASS in order to call into the DSA specific tagging protocol
    handlers, and lift the restriction from br_add_if.

    Suggested-by: Nikolay Aleksandrov
    Suggested-by: Florian Fainelli
    Signed-off-by: Vladimir Oltean
    Acked-by: Nikolay Aleksandrov
    Reviewed-by: Florian Fainelli
    Tested-by: Florian Fainelli
    Signed-off-by: Jakub Kicinski

    Vladimir Oltean
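
    The handler selection described in this commit can be sketched as a small
    user-space model. This is only an illustration under stated assumptions:
    the enum, the struct and the is_dsa_master flag are hypothetical stand-ins,
    not the kernel's definitions, and the regular handler is reduced to its
    "consume the frame" outcome.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's RX_HANDLER_* results. */
enum model_rx_result { MODEL_RX_CONSUMED, MODEL_RX_PASS };

struct model_rx_port {
    bool is_dsa_master;   /* hypothetical flag: port receives DSA-tagged frames */
};

typedef enum model_rx_result (*model_rx_handler)(const struct model_rx_port *p);

/* Models the regular bridge handler, which takes ownership of the frame. */
static enum model_rx_result model_handle_frame(const struct model_rx_port *p)
{
    (void)p;
    return MODEL_RX_CONSUMED;
}

/* Models the dummy handler: do nothing and let the stack continue. */
static enum model_rx_result model_handle_frame_dummy(const struct model_rx_port *p)
{
    (void)p;
    return MODEL_RX_PASS;
}

/* Pick the handler to install when the port is added to the bridge. */
static model_rx_handler model_pick_rx_handler(const struct model_rx_port *p)
{
    return p->is_dsa_master ? model_handle_frame_dummy : model_handle_frame;
}

/* True if a received frame continues on to the protocol (tagging) handlers. */
static bool model_reaches_tagger(const struct model_rx_port *p)
{
    return model_pick_rx_handler(p)(p) == MODEL_RX_PASS;
}
```

    In this model, a port flagged as a DSA master keeps delivering frames to
    the tagging protocol handlers, while an ordinary port has its frames
    consumed by the bridge.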
     

28 Apr, 2020

1 commit

  • To integrate MRP into the bridge, the bridge needs to do the following:
    - detect whether an MRP frame was received on an MRP ring port; if so,
    process it, otherwise just forward it as usual.
    - enable parsing of MRP frames
    - previously, whenever the bridge was set up, it would set all the ports
    to forwarding state. Add an extra check so that a port is not set to
    forwarding state if it is an MRP ring port. The reason for this change
    is that if the MRP instance initially sets the port to blocked state,
    setting the bridge up would overwrite this setting.

    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: Horatiu Vultur
    Signed-off-by: David S. Miller

    Horatiu Vultur
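
    The third point of this commit (do not force MRP ring ports into
    forwarding state when the bridge comes up) can be illustrated with a
    minimal user-space sketch. The types and the mrp_ring_port flag below are
    hypothetical and only model the described check, not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_BLOCKING, MODEL_FORWARDING };

struct model_port {
    enum model_state state;
    bool mrp_ring_port;   /* hypothetical flag: port belongs to an MRP ring */
};

/*
 * Bring the bridge up: move ports to forwarding, but skip MRP ring ports
 * so that the state chosen by the MRP instance is not overwritten.
 */
static void model_bridge_up(struct model_port *ports, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ports[i].mrp_ring_port)
            continue;
        ports[i].state = MODEL_FORWARDING;
    }
}
```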
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

11 May, 2019

1 commit

  • Currently error return from kobject_init_and_add() is not followed by a
    call to kobject_put(). This means there is a memory leak. We currently
    set p to NULL so that kfree() may be called on it as a no-op; the code is
    arguably clearer if we move the kfree() up closer to where it is
    called (instead of after goto jump).

    Remove a goto label 'err1' and jump to call to kobject_put() in error
    return from kobject_init_and_add() fixing the memory leak. Re-name goto
    label 'put_back' to 'err1' now that we don't use err1, following current
    nomenclature (err1, err2 ...). Move call to kfree out of the error
    code at bottom of function up to closer to where memory was allocated.
    Add comment to clarify call to kfree().

    Signed-off-by: Tobin C. Harding
    Signed-off-by: David S. Miller

    Tobin C. Harding
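
    The leak pattern fixed by this commit can be shown with a small
    reference-counting model. It is a sketch under the assumption that
    initialisation always takes a reference, even when the subsequent "add"
    step fails; every name here is hypothetical and the real kobject API is
    not used.

```c
#include <stdbool.h>
#include <stdlib.h>

static int model_live_objects;   /* allocations that have not been released */

struct model_kobj {
    int refcount;
};

/* Drop one reference; release the object when the count reaches zero. */
static void model_put(struct model_kobj *k)
{
    if (--k->refcount == 0) {
        free(k);
        model_live_objects--;
    }
}

/* Initialise the refcount to 1, then "add" the object; the add may fail. */
static int model_init_and_add(struct model_kobj *k, bool add_fails)
{
    k->refcount = 1;
    return add_fails ? -1 : 0;
}

static struct model_kobj *model_create(bool add_fails)
{
    struct model_kobj *k = malloc(sizeof(*k));

    if (!k)
        return NULL;
    model_live_objects++;

    if (model_init_and_add(k, add_fails) < 0) {
        /* Without this put, the reference taken during init would leak. */
        model_put(k);
        return NULL;
    }
    return k;
}
```

    With the put in the error path, a failed creation leaves
    model_live_objects at zero.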
     

30 Mar, 2019

1 commit


14 Dec, 2018

1 commit

  • When a port is attached to a bridge, the address of the bridge in
    question may change as well. Even if it would not change at this
    point (because the current bridge address is lower), it might end up
    changing later as a result of detach of another port, which can't be
    vetoed.

    Therefore issue NETDEV_PRE_CHANGEADDR regardless of whether the address
    will be used at this point or not, and make sure all involved parties
    would agree with the change.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     

13 Dec, 2018

1 commit

  • ndo_bridge_setlink has been updated in the previous patch to have extack
    available, and changelink RTNL op has had this argument since the time
    extack was added. Propagate both through the bridge driver to eventually
    reach br_switchdev_port_vlan_add(), where it will be used by subsequent
    patches.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Acked-by: Nikolay Aleksandrov
    Acked-by: Ivan Vecera
    Acked-by: Roopa Prabhu
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     

22 Nov, 2018

1 commit

  • Allow querying bridge port flags so that drivers capable of performing
    VxLAN learning will update the bridge driver only if learning is enabled
    on its bridge port corresponding to the VxLAN device.

    Signed-off-by: Ido Schimmel
    Reviewed-by: Petr Machata
    Signed-off-by: David S. Miller

    Ido Schimmel
     

27 Sep, 2018

1 commit


01 Sep, 2018

1 commit


24 Jul, 2018

1 commit

  • This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
    allows setting a backup port to be used for known unicast traffic if the
    port has gone carrier down. The backup pointer is rcu protected and set
    only under RTNL, a counter is maintained so when deleting a port we know
    how many other ports reference it as a backup and we remove it from all.
    Also the pointer is in the first cache line which is hot at the time of
    the check and thus in the common case we only add one more test.
    The backup port will be used only for the non-flooding case since
    it's a part of the bridge and the flooded packets will be forwarded to it
    anyway. To remove the forwarding just send a 0/non-existing backup port.
    This is used to avoid numerous scalability problems when using MLAG. Most
    notably, with thousands of fdbs one would need to change all of them when
    a port's carrier goes down, which takes too long and causes a storm of fdb
    notifications (and again when the port comes back up). In a Multi-chassis
    Link Aggregation setup usually hosts are connected to two different
    switches which act as a single logical switch. Those switches usually have
    a control and backup link between them called peerlink which might be used
    for communication in case a host loses connectivity to one of them.
    We need a fast way to fail over in case a host port goes down, and currently
    none of the solutions (like bond) can fulfill the requirements because
    the participating ports are actually the "master" devices and must have the
    same peerlink as their backup interface and at the same time all of them
    must participate in the bridge device. As Roopa noted it's normal practice
    in routing called fast re-route where a precalculated backup path is used
    when the main one is down.
    Another use case of this is with EVPN, having a single vxlan device which
    is backup of every port. Due to the nature of master devices it's not
    currently possible to use one device as a backup for many and still have
    all of them participate in the bridge (which is master itself).
    More detailed information about MLAG is available at the link below.
    https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG

    Further explanation and a diagram by Roopa:
    Two switches acting in a MLAG pair are connected by the peerlink
    interface which is a bridge port.

    the config on one of the switches looks like the below. The other
    switch also has a similar config.
    eth0 is connected to one port on the server. And the server is
    connected to both switches.

    br0 -- team0---eth0
         |
         -- switch-peerlink

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
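
    The redirect decision described above (use the backup port for known
    unicast traffic when the destination port has lost carrier) can be
    sketched as follows. This is a simplified model with hypothetical types;
    RCU protection, the reference counter and the flooding path are left out.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_brport {
    bool carrier_up;
    struct model_brport *backup;   /* NULL when no backup port is configured */
};

/*
 * Choose the port a known unicast frame is sent out of: the destination
 * itself, or its backup if the destination has no carrier.
 */
static struct model_brport *model_egress_port(struct model_brport *dst)
{
    if (!dst->carrier_up && dst->backup)
        return dst->backup;
    return dst;
}
```

    In the common case (carrier up) the model adds a single test, which
    matches the cost argument made in the commit message.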
     

21 Jul, 2018

1 commit


04 May, 2018

2 commits


30 Apr, 2018

1 commit

  • When we set a bond slave's master to bridge via ioctl, we only check
    the IFF_BRIDGE_PORT flag. Although we will find the slave's real master
    in netdev_master_upper_dev_link() later, by then some settings have
    already been applied and some resources allocated. It would be better
    to return as early as possible.

    v1 -> v2:
    use netdev_master_upper_dev_get() instead of netdev_has_any_upper_dev()
    to check if we have a master, because not all upper devs are masters,
    e.g. vlan device.

    Reported-by: syzbot+de73361ee4971b6e6f75@syzkaller.appspotmail.com
    Signed-off-by: Hangbin Liu
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Hangbin Liu
     

01 Apr, 2018

2 commits

  • As Roopa noted today the biggest source of problems when configuring
    bridge and ports is that the bridge MTU keeps changing automatically on
    port events (add/del/changemtu). That leads to inconsistent behaviour
    and network config software needs to chase the MTU and fix it on each
    such event. Let's improve on that situation and allow for the user to
    set any MTU within ETH_MIN/MAX limits, but once manually configured it
    is the user's responsibility to keep it correct afterwards.

    In case the MTU isn't manually set - the behaviour reverts to the
    previous and the bridge follows the minimum MTU.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Recently the bridge was changed to automatically set maximum MTU on port
    events (add/del/changemtu) when vlan filtering is enabled, but that
    actually changes behaviour in a way which breaks some setups and can lead
    to packet drops. In order to still allow that maximum to be set while being
    compatible, we add the ability for the user to tune the bridge MTU up to
    the maximum when vlan filtering is enabled, but that has to be done
    explicitly and all port events (add/del/changemtu) lead to resetting that
    MTU to the minimum as before.

    Suggested-by: Roopa Prabhu
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
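
    The MTU policy described in the first of the two commits above can be
    sketched in a few lines: port events follow the minimum port MTU until
    the user sets an MTU explicitly, after which automatic adjustment stops.
    This is a user-space model; the struct, the function names and the
    1500-byte fallback for a bridge without ports are assumptions made for
    the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_DEFAULT_MTU 1500u   /* assumed fallback when there are no ports */

struct model_bridge {
    unsigned int mtu;
    bool mtu_set_by_user;
};

/* Smallest MTU among the ports, or the fallback if there are none. */
static unsigned int model_min_port_mtu(const unsigned int *port_mtus, size_t n)
{
    unsigned int min = 0;

    for (size_t i = 0; i < n; i++)
        if (min == 0 || port_mtus[i] < min)
            min = port_mtus[i];
    return min ? min : MODEL_DEFAULT_MTU;
}

/* Manual configuration: from now on the user owns the bridge MTU. */
static void model_set_mtu(struct model_bridge *br, unsigned int mtu)
{
    br->mtu = mtu;
    br->mtu_set_by_user = true;
}

/* Port add/del/changemtu event: auto-adjust only if not set by the user. */
static void model_port_event(struct model_bridge *br,
                             const unsigned int *port_mtus, size_t n)
{
    if (br->mtu_set_by_user)
        return;
    br->mtu = model_min_port_mtu(port_mtus, n);
}
```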
     

24 Mar, 2018

2 commits

  • We need to use br_vlan_enabled() helper otherwise we'll break builds
    without bridge vlans:
    net/bridge//br_if.c: In function ‘br_mtu’:
    net/bridge//br_if.c:458:8: error: ‘const struct net_bridge’ has no
    member named ‘vlan_enabled’
      if (br->vlan_enabled)
            ^
    net/bridge//br_if.c:462:1: warning: control reaches end of non-void
    function [-Wreturn-type]
    }
    ^
    scripts/Makefile.build:324: recipe for target 'net/bridge//br_if.o'
    failed

    Fixes: 419d14af9e07 ("bridge: Allow max MTU when multiple VLANs present")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • If the bridge is allowing multiple VLANs, some VLANs may have
    different MTUs. Instead of choosing the minimum MTU for the
    bridge interface, choose the maximum MTU of the bridge members.
    With this the user only needs to set a larger MTU on the member
    ports that are participating in the large MTU VLANS.

    Signed-off-by: Chas Williams
    Reviewed-by: Nikolay Aleksandrov
    Acked-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Chas Williams
     

02 Nov, 2017

1 commit

  • Currently the bridge device doesn't generate any notifications upon vlan
    modifications on itself because it doesn't use the generic bridge
    notifications.
    With the recent changes we know whether anything was modified in the vlan
    config, so we can generate a notification for the bridge device when
    necessary. Add support for this to br_ifinfo_notify(), similar to how
    other combined functions are done: if a port is present it takes
    precedence, otherwise notify about the bridge. I've explicitly marked the
    locations where the notification should always be for the port by
    setting bridge to NULL.
    I've also taken the liberty to rearrange each modified function's local
    variables in reverse xmas tree as well.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

09 Oct, 2017

1 commit

  • This patch adds a new bridge port flag BR_NEIGH_SUPPRESS to
    suppress arp and nd flood on bridge ports. It implements
    rfc7432, section 10.
    https://tools.ietf.org/html/rfc7432#section-10
    for ethernet VPN deployments. It is similar to the existing
    BR_PROXYARP* flags but has a few semantic differences to conform
    to EVPN standard. Unlike the existing flags, this new flag suppresses
    flood of all neigh discovery packets (arp and nd) to tunnel ports.
    Supports both vlan filtering and non-vlan filtering bridges.

    In case of EVPN, it is mainly used to avoid flooding
    of arp and nd packets to tunnel ports like vxlan.

    This patch adds netlink and sysfs support to set this bridge port
    flag.

    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

05 Oct, 2017

2 commits


27 May, 2017

1 commit

  • It's useful for drivers supporting bridge offload to be able to query
    the bridge's VLAN filtering state.

    Currently, upon enslavement to a bridge master, the offloading driver
    will only learn about the bridge's VLAN filtering state after the bridge
    device was already linked with its slave.

    Being able to query the bridge's VLAN filtering state allows such
    drivers to forbid enslavement in case resource couldn't be allocated for
    a VLAN-aware bridge and also choose the correct initialization routine
    for the enslaved port, which is dependent on the bridge type.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Ido Schimmel
     

28 Apr, 2017

1 commit

  • Support for l2 multicast flood control was added in commit b6cb5ac8331b
    ("net: bridge: add per-port multicast flood flag"). It allows broadcast
    as it was introduced specifically for unknown multicast flood control.
    But as broadcast is a special case of multicast, this may also need to
    be disabled. For this purpose, introduce a flag to disable the flooding
    of received l2 broadcasts. This approach is backwards compatible and
    provides flexibility in filtering for the desired packet types.

    Cc: Nikolay Aleksandrov
    Signed-off-by: Mike Manning
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Mike Manning
     

27 Apr, 2017

1 commit


26 Apr, 2017

1 commit

  • While a bridge device is being removed, if the bridge is still up, a new
    mdb entry can still be added in br_multicast_add_group() after all mdb
    entries have been removed in br_multicast_dev_del(), for example via the
    path:

    mld_ifc_timer_expire ->
    mld_sendpack -> ...
    br_multicast_rcv ->
    br_multicast_add_group

    The new mp's timer will be set up. If the timer expires after the bridge
    is freed, it may cause use-after-free panic in br_multicast_group_expired.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
    IP: [] br_multicast_group_expired+0x28/0xb0 [bridge]
    Call Trace:

    [] call_timer_fn+0x36/0x110
    [] ? br_mdb_free+0x30/0x30 [bridge]
    [] run_timer_softirq+0x237/0x340
    [] __do_softirq+0xef/0x280
    [] call_softirq+0x1c/0x30
    [] do_softirq+0x65/0xa0
    [] irq_exit+0x115/0x120
    [] smp_apic_timer_interrupt+0x45/0x60
    [] apic_timer_interrupt+0x6d/0x80

    Nikolay also found it would cause a memory leak - the mdb hash is
    reallocated and not freed due to the mdb rehash.

    unreferenced object 0xffff8800540ba800 (size 2048):
    backtrace:
    [] kmemleak_alloc+0x67/0xc0
    [] __kmalloc+0x1ba/0x3e0
    [] br_mdb_rehash+0x5e/0x340 [bridge]
    [] br_multicast_new_group+0x43f/0x6e0 [bridge]
    [] br_multicast_add_group+0x203/0x260 [bridge]
    [] br_multicast_rcv+0x945/0x11d0 [bridge]
    [] br_dev_xmit+0x180/0x470 [bridge]
    [] dev_hard_start_xmit+0xbb/0x3d0
    [] __dev_queue_xmit+0xb13/0xc10
    [] dev_queue_xmit+0x10/0x20
    [] ip6_finish_output2+0x5ca/0xac0 [ipv6]
    [] ip6_finish_output+0x126/0x2c0 [ipv6]
    [] ip6_output+0xe5/0x390 [ipv6]
    [] NF_HOOK.constprop.44+0x6c/0x240 [ipv6]
    [] mld_sendpack+0x216/0x3e0 [ipv6]
    [] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6]

    This can happen when a bridge is removed with ip link or when a netns
    containing a bridge device is destroyed.

    Following Nikolay's suggestion, this patch cleans up bridge multicast in
    ndo_uninit after the bridge dev is shut down, instead of in br_dev_delete,
    so that the netif_running check in br_multicast_add_group avoids this issue.

    v1->v2:
    - fix this issue by moving br_multicast_dev_del to ndo_uninit, instead
    of calling dev_close in br_dev_delete.

    (NOTE: Depends upon b6fe0440c637 ("bridge: implement missing ndo_uninit()"))

    Fixes: e10177abf842 ("bridge: multicast: fix handling of temp and perm entries")
    Reported-by: Jianwen Ji
    Signed-off-by: Xin Long
    Reviewed-by: Stephen Hemminger
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Xin Long
     

16 Apr, 2017

1 commit


12 Apr, 2017

1 commit

  • While the bridge driver implements an ndo_init(), it was missing a
    symmetric ndo_uninit(), causing the different de-initialization
    operations to be scattered around its dellink() and destructor().

    Implement a symmetric ndo_uninit() and remove the overlapping operations
    from its dellink() and destructor().

    This is a prerequisite for the next patch, as it allows us to have a
    proper cleanup upon changelink() failure during the bridge's newlink().

    Fixes: b6677449dff6 ("bridge: netlink: call br_changelink() during br_dev_newlink()")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Ido Schimmel
     

29 Mar, 2017

1 commit

  • There is an include loop between netdevice.h, dsa.h, devlink.h because
    of NETDEV_ALIGN, making it impossible to use devlink structures in
    dsa.h.

    Break this loop by taking dsa.h out of netdevice.h, add a forward
    declaration of dsa_switch_tree and netdev_set_default_ethtool_ops()
    function, which is what netdevice.h requires.

    No longer having dsa.h in netdevice.h means the includes in dsa.h no
    longer get included. This breaks a few other files which depend on
    these includes. Add these directly in the affected file.

    Signed-off-by: Andrew Lunn
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Andrew Lunn
     

07 Feb, 2017

1 commit

  • Move the fdb garbage collector to a workqueue which fires at least 10
    milliseconds apart and cleans chain by chain allowing for other tasks
    to run in the meantime. With thousands of fdbs the system is much
    more responsive. Most importantly, remove the need to check whether the
    matched entry has expired in __br_fdb_get, a check that causes false
    sharing and is completely unnecessary if we clean up entries; at worst
    we'll get 10ms of traffic for that entry before it gets deleted.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

02 Sep, 2016

1 commit


27 Aug, 2016

1 commit

  • switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
    port netdevs so that packets being flooded by the device won't be
    flooded twice.

    It works by assigning a unique identifier (the ifindex of the first
    bridge port) to bridge ports sharing the same parent ID. This prevents
    packets from being flooded twice by the same switch, but will flood
    packets through bridge ports belonging to a different switch.

    This method is problematic when stacked devices are taken into account,
    such as VLANs. In such cases, a physical port netdev can have upper
    devices being members in two different bridges, thus requiring two
    different 'offload_fwd_mark's to be configured on the port netdev, which
    is impossible.

    The main problem is that packet and netdev marking is performed at the
    physical netdev level, whereas flooding occurs between bridge ports,
    which are not necessarily port netdevs.

    Instead, packet and netdev marking should really be done in the bridge
    driver with the switch driver only telling it which packets it already
    forwarded. The bridge driver will mark such packets using the mark
    assigned to the ingress bridge port and will prevent the packet from
    being forwarded through any bridge port sharing the same mark (i.e.
    having the same parent ID).

    Remove the current switchdev 'offload_fwd_mark' implementation and
    instead implement the proposed method. In addition, make rocker - the
    sole user of the mark - use the proposed method.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
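
    The egress rule proposed in this commit can be written as a single
    predicate: a packet already forwarded by the switch must not be sent
    again through a bridge port that shares the ingress port's mark. The
    struct and field names below are hypothetical and only model that rule.

```c
#include <stdbool.h>

struct model_pkt {
    bool hw_forwarded;   /* switch driver reports it already forwarded this */
    int ingress_mark;    /* mark of the bridge port the packet arrived on */
};

/*
 * Software forwarding through a port is allowed unless the hardware has
 * already forwarded the packet within the same mark (same parent ID).
 */
static bool model_allowed_egress(const struct model_pkt *pkt, int egress_port_mark)
{
    return !pkt->hw_forwarded || pkt->ingress_mark != egress_port_mark;
}
```

    Because the mark lives on the bridge port rather than on the physical
    netdev, upper devices of one physical port that sit in different bridges
    can carry different marks in this model.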
     

30 Jun, 2016

1 commit

  • This patch adds stats support for the currently used IGMP/MLD types by the
    bridge. The stats are per-port (plus one stat per-bridge) and per-direction
    (RX/TX). The stats are exported via netlink via the new linkxstats API
    (RTM_GETSTATS). In order to minimize the performance impact, a new option
    is used to enable/disable the stats - multicast_stats_enabled, similar to
    the recent vlan stats. Also in order to avoid multiple IGMP/MLD type
    lookups and checks, we make use of the current "igmp" member of the bridge
    private skb->cb region to record the type on Rx (both host-generated and
    external packets pass by multicast_rcv()). We can do that since the igmp
    member was used as a boolean and all the valid IGMP/MLD types are positive
    values. The normal bridge fast-path is not affected at all, the only
    affected paths are the flooding ones and since we make use of the IGMP/MLD
    type, we can quickly determine if the packet should be counted using
    cache-hot data (cb's igmp member). We add counters for:
    * IGMP Queries
    * IGMP Leaves
    * IGMP v1/v2/v3 reports

    * MLD Queries
    * MLD Leaves
    * MLD v1/v2 reports

    These are invaluable when monitoring or debugging complex multicast setups
    with bridges.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

22 Mar, 2016

1 commit

  • It can be useful to lower max_gso_segs on NIC with very low
    number of TX descriptors like bcmgenet.

    However, this is defeated by bridge since it does not propagate
    the lower value of max_gso_segs and max_gso_size.

    Signed-off-by: Eric Dumazet
    Cc: Petri Gynther
    Cc: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
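
    Propagating the lower limits means the bridge device ends up with the
    minimum of its ports' GSO limits. A minimal sketch, assuming the caller
    supplies the defaults to start from; the struct and function names are
    hypothetical:

```c
#include <stddef.h>

struct model_gso {
    unsigned int max_size;
    unsigned int max_segs;
};

/* Start from the defaults and lower each limit to the smallest port value. */
static struct model_gso model_bridge_gso(const struct model_gso *ports, size_t n,
                                         struct model_gso defaults)
{
    struct model_gso out = defaults;

    for (size_t i = 0; i < n; i++) {
        if (ports[i].max_size < out.max_size)
            out.max_size = ports[i].max_size;
        if (ports[i].max_segs < out.max_segs)
            out.max_segs = ports[i].max_segs;
    }
    return out;
}
```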
     

02 Mar, 2016

1 commit


26 Feb, 2016

1 commit


07 Jan, 2016

1 commit


04 Dec, 2015

2 commits