03 Jan, 2018

1 commit

  • [ Upstream commit 84aeb437ab98a2bce3d4b2111c79723aedfceb33 ]

    The early call to br_stp_change_bridge_id in bridge's newlink can cause
    a memory leak if an error occurs during the newlink because the fdb
    entries are not cleaned up if a different lladdr was specified, also
    another minor issue is that it generates fdb notifications with
    ifindex = 0. Another unrelated memory leak is the bridge sysfs entries
    which get added on NETDEV_REGISTER event, but are not cleaned up in the
    newlink error path. To remove this special case the call to
    br_stp_change_bridge_id is done after netdev register and we cleanup the
    bridge on changelink error via br_dev_delete to plug all leaks.

    This patch makes netlink bridge destruction on newlink error the same as
    dellink and ioctl del which is necessary since at that point we have a
    fully initialized bridge device.

    To reproduce the issue:
    $ ip l add br0 address 00:11:22:33:44:55 type bridge group_fwd_mask 1
    RTNETLINK answers: Invalid argument

    $ rmmod bridge
    [ 1822.142525] =============================================================================
    [ 1822.143640] BUG bridge_fdb_cache (Tainted: G O ): Objects remaining in bridge_fdb_cache on __kmem_cache_shutdown()
    [ 1822.144821] -----------------------------------------------------------------------------

    [ 1822.145990] Disabling lock debugging due to kernel taint
    [ 1822.146732] INFO: Slab 0x0000000092a844b2 objects=32 used=2 fp=0x00000000fef011b0 flags=0x1ffff8000000100
    [ 1822.147700] CPU: 2 PID: 13584 Comm: rmmod Tainted: G B O 4.15.0-rc2+ #87
    [ 1822.148578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
    [ 1822.150008] Call Trace:
    [ 1822.150510] dump_stack+0x78/0xa9
    [ 1822.151156] slab_err+0xb1/0xd3
    [ 1822.151834] ? __kmalloc+0x1bb/0x1ce
    [ 1822.152546] __kmem_cache_shutdown+0x151/0x28b
    [ 1822.153395] shutdown_cache+0x13/0x144
    [ 1822.154126] kmem_cache_destroy+0x1c0/0x1fb
    [ 1822.154669] SyS_delete_module+0x194/0x244
    [ 1822.155199] ? trace_hardirqs_on_thunk+0x1a/0x1c
    [ 1822.155773] entry_SYSCALL_64_fastpath+0x23/0x9a
    [ 1822.156343] RIP: 0033:0x7f929bd38b17
    [ 1822.156859] RSP: 002b:00007ffd160e9a98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
    [ 1822.157728] RAX: ffffffffffffffda RBX: 00005578316ba090 RCX: 00007f929bd38b17
    [ 1822.158422] RDX: 00007f929bd9ec60 RSI: 0000000000000800 RDI: 00005578316ba0f0
    [ 1822.159114] RBP: 0000000000000003 R08: 00007f929bff5f20 R09: 00007ffd160e8a11
    [ 1822.159808] R10: 00007ffd160e9860 R11: 0000000000000202 R12: 00007ffd160e8a80
    [ 1822.160513] R13: 0000000000000000 R14: 0000000000000000 R15: 00005578316ba090
    [ 1822.161278] INFO: Object 0x000000007645de29 @offset=0
    [ 1822.161666] INFO: Object 0x00000000d5df2ab5 @offset=128

    Fixes: 30313a3d5794 ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device")
    Fixes: 5b8d5429daa0 ("bridge: netlink: register netdevice before executing changelink")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Aleksandrov
     

22 Oct, 2017

1 commit

  • When vlan tunnels were introduced, vlan range errors got silently
    dropped and instead 0 was returned always. Restore the previous
    behaviour and return errors to user-space.
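
A minimal sketch of the restored behaviour, written as a user-space Python model rather than kernel code. The helper name `process_vlan_range` and the per-VLAN callback `add_one` are hypothetical; the only point illustrated is that the first error is returned to the caller instead of being replaced by 0.

```python
import errno


def process_vlan_range(begin, end, add_one):
    """Apply add_one(vid) to every VLAN id in [begin, end].

    add_one is a hypothetical callback returning 0 on success or a
    negative errno. The first failure is propagated to the caller.
    """
    if begin > end:
        return -errno.EINVAL
    for vid in range(begin, end + 1):
        err = add_one(vid)
        if err:
            return err  # propagate, do not fall through to "return 0"
    return 0
```

With a callback that fails on one VLAN in the range, the caller now sees that error code rather than success.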

    Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support")
    Signed-off-by: Nikolay Aleksandrov
    Acked-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

27 Jun, 2017

4 commits


09 Jun, 2017

1 commit

  • Currently the flood, learning and learning_sync port attributes are
    offloaded by setting the SELF flag. Add support for offloading the
    flood and learning attribute through the bridge code. In case of
    setting an unsupported flag on an offloaded port the operation will
    fail.

    The learning_sync attribute doesn't have any software representation
    and cannot be offloaded through the bridge code.

    Signed-off-by: Arkadi Sharshevsky
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Reviewed-by: Nikolay Aleksandrov
    Reviewed-by: Ivan Vecera
    Signed-off-by: David S. Miller

    Arkadi Sharshevsky
     

07 Jun, 2017

2 commits


27 May, 2017

1 commit

  • It's useful for drivers supporting bridge offload to be able to query
    the bridge's VLAN filtering state.

    Currently, upon enslavement to a bridge master, the offloading driver
    will only learn about the bridge's VLAN filtering state after the bridge
    device was already linked with its slave.

    Being able to query the bridge's VLAN filtering state allows such
    drivers to forbid enslavement in case resources couldn't be allocated for
    a VLAN-aware bridge and also choose the correct initialization routine
    for the enslaved port, which is dependent on the bridge type.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Ido Schimmel
     

18 May, 2017

1 commit

  • Currently it is allowed to set the default pvid of a bridge to a value
    above VLAN_VID_MASK (0xfff). This patch adds a check to br_validate and
    returns -EINVAL in case the pvid is out of bounds.

    Reproduce by calling:

    [root@test ~]# ip l a type bridge
    [root@test ~]# ip l a type dummy
    [root@test ~]# ip l s bridge0 type bridge vlan_filtering 1
    [root@test ~]# ip l s bridge0 type bridge vlan_default_pvid 9999
    [root@test ~]# ip l s dummy0 master bridge0
    [root@test ~]# bridge vlan
    port vlan ids
    bridge0 9999 PVID Egress Untagged

    dummy0 9999 PVID Egress Untagged
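
The bounds check can be sketched as follows. This is an illustrative Python model, not the code in br_validate: the function name is hypothetical, and the comparison follows the wording above ("above VLAN_VID_MASK"), so the exact boundary used by the kernel may be stricter.

```python
import errno

VLAN_VID_MASK = 0x0FFF  # value quoted in the commit message


def validate_default_pvid(pvid):
    """Return 0 if pvid fits in the 12-bit VLAN id space, else -EINVAL."""
    if pvid < 0 or pvid > VLAN_VID_MASK:
        return -errno.EINVAL
    return 0
```

Under this model the value 9999 from the reproduction steps is rejected before it can be applied to the bridge or its ports.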

    Fixes: 0f963b7592ef ("bridge: netlink: add support for default_pvid")
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: Tobias Jungel
    Acked-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Tobias Jungel
     

05 May, 2017

1 commit

  • The attribute sizes for IFLA_BRPORT_MCAST_FLOOD and
    IFLA_BRPORT_BCAST_FLOOD weren't accounted for in br_port_info_size()
    when they were added. Do so now and also add the corresponding policy
    entries:

    Cc: Nikolay Aleksandrov
    Cc: Mike Manning
    Fixes: b6cb5ac8331b ("net: bridge: add per-port multicast flood flag")
    Fixes: 99f906e9ad7b ("bridge: add per-port broadcast flood flag")
    Signed-off-by: Tobias Klauser
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Tobias Klauser
     

28 Apr, 2017

1 commit

  • Support for l2 multicast flood control was added in commit b6cb5ac8331b
    ("net: bridge: add per-port multicast flood flag"). It allows broadcast
    as it was introduced specifically for unknown multicast flood control.
    But as broadcast is a special case of multicast, this may also need to
    be disabled. For this purpose, introduce a flag to disable the flooding
    of received l2 broadcasts. This approach is backwards compatible and
    provides flexibility in filtering for the desired packet types.
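
A rough model of the resulting per-port flood decision, assuming one flag per packet class. The enum names and values below are illustrative stand-ins, not the kernel's BR_* bit definitions.

```python
from enum import Enum, Flag, auto


class PortFlag(Flag):
    FLOOD = auto()        # flood unknown unicast
    MCAST_FLOOD = auto()  # flood unknown multicast
    BCAST_FLOOD = auto()  # flood broadcast (the flag added here)


class PktType(Enum):
    UNICAST = 1
    MULTICAST = 2
    BROADCAST = 3


# Which flag must be set on a port for each packet class to be flooded.
_REQUIRED = {
    PktType.UNICAST: PortFlag.FLOOD,
    PktType.MULTICAST: PortFlag.MCAST_FLOOD,
    PktType.BROADCAST: PortFlag.BCAST_FLOOD,
}


def should_flood(port_flags, pkt_type):
    """True if a packet of pkt_type may be flooded out of this port."""
    return bool(port_flags & _REQUIRED[pkt_type])
```

Keeping all three flags set by default preserves the previous behaviour, which is what makes the approach backwards compatible.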

    Cc: Nikolay Aleksandrov
    Signed-off-by: Mike Manning
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Mike Manning
     

16 Apr, 2017

1 commit


14 Apr, 2017

1 commit


12 Apr, 2017

1 commit

  • Peter reported a kernel oops when executing the following command:

    $ ip link add name test type bridge vlan_default_pvid 1

    [13634.939408] BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
    [13634.939436] IP: __vlan_add+0x73/0x5f0
    [...]
    [13634.939783] Call Trace:
    [13634.939791] ? pcpu_next_unpop+0x3b/0x50
    [13634.939801] ? pcpu_alloc+0x3d2/0x680
    [13634.939810] ? br_vlan_add+0x135/0x1b0
    [13634.939820] ? __br_vlan_set_default_pvid.part.28+0x204/0x2b0
    [13634.939834] ? br_changelink+0x120/0x4e0
    [13634.939844] ? br_dev_newlink+0x50/0x70
    [13634.939854] ? rtnl_newlink+0x5f5/0x8a0
    [13634.939864] ? rtnl_newlink+0x176/0x8a0
    [13634.939874] ? mem_cgroup_commit_charge+0x7c/0x4e0
    [13634.939886] ? rtnetlink_rcv_msg+0xe1/0x220
    [13634.939896] ? lookup_fast+0x52/0x370
    [13634.939905] ? rtnl_newlink+0x8a0/0x8a0
    [13634.939915] ? netlink_rcv_skb+0xa1/0xc0
    [13634.939925] ? rtnetlink_rcv+0x24/0x30
    [13634.939934] ? netlink_unicast+0x177/0x220
    [13634.939944] ? netlink_sendmsg+0x2fe/0x3b0
    [13634.939954] ? _copy_from_user+0x39/0x40
    [13634.939964] ? sock_sendmsg+0x30/0x40
    [13634.940159] ? ___sys_sendmsg+0x29d/0x2b0
    [13634.940326] ? __alloc_pages_nodemask+0xdf/0x230
    [13634.940478] ? mem_cgroup_commit_charge+0x7c/0x4e0
    [13634.940592] ? mem_cgroup_try_charge+0x76/0x1a0
    [13634.940701] ? __handle_mm_fault+0xdb9/0x10b0
    [13634.940809] ? __sys_sendmsg+0x51/0x90
    [13634.940917] ? entry_SYSCALL_64_fastpath+0x1e/0xad

    The problem is that the bridge's VLAN group is created after setting the
    default PVID, when registering the netdevice and executing its
    ndo_init().

    Fix this by changing the order of both operations, so that
    br_changelink() is only processed after the netdevice is registered,
    when the VLAN group is already initialized.

    Fixes: b6677449dff6 ("bridge: netlink: call br_changelink() during br_dev_newlink()")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: Ido Schimmel
    Reported-by: Peter V. Saveliev
    Tested-by: Peter V. Saveliev
    Signed-off-by: David S. Miller

    Ido Schimmel
     

08 Feb, 2017

1 commit


07 Feb, 2017

1 commit

  • Move the fdb garbage collector to a workqueue which fires at least 10
    milliseconds apart and cleans chain by chain allowing for other tasks
    to run in the meantime. When having thousands of fdbs the system is much
    more responsive. Most importantly remove the need to check if the
    matched entry has expired in __br_fdb_get that causes false-sharing and
    is completely unnecessary if we cleanup entries, at worst we'll get 10ms
    of traffic for that entry before it gets deleted.
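
The scheduling idea can be sketched as a single cleanup pass that deletes expired entries and returns how long to wait before the next pass, never less than 10 ms. This is a simplified Python model under stated assumptions: the fdb is a plain dict of MAC to last-update time, and the chain-by-chain walk and static entries are left out.

```python
MIN_DELAY_MS = 10  # minimum gap between runs, from the commit message


def gc_pass(entries, now_ms, ageing_ms):
    """Delete expired entries in place and return the delay until the next pass.

    entries maps a MAC string to its last-updated timestamp in ms
    (a hypothetical stand-in for the fdb hash table).
    """
    delay = ageing_ms
    for mac, updated in list(entries.items()):
        expires = updated + ageing_ms
        if expires <= now_ms:
            del entries[mac]  # aged out, remove it
        else:
            # Wake up again when the soonest remaining entry expires.
            delay = min(delay, expires - now_ms)
    return max(delay, MIN_DELAY_MS)
```

Because expired entries are removed by this pass, a lookup no longer needs its own expiry check; an entry can outlive its ageing time only by the length of the minimum delay.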

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

04 Feb, 2017

1 commit

  • This patch adds support to attach per vlan tunnel info dst
    metadata. This enables bridge driver to map vlan to tunnel_info
    at ingress and egress. It uses the kernel dst_metadata infrastructure.

    The initial use case is vlan to vni bridging, but the api is generic
    to extend to any tunnel_info in the future:
    - Uapi to configure/unconfigure/dump per vlan tunnel data
    - netlink functions to configure vlan and tunnel_info mapping
    - Introduces bridge port flag BR_LWT_VLAN to enable attach/detach
    dst_metadata to bridged packets on ports. off by default.
    - changes to existing code is mainly refactor some existing vlan
    handling netlink code + hooks for new vlan tunnel code
    - I have kept the vlan tunnel code isolated in separate files.
    - most of the netlink vlan tunnel code is handling of vlan-tunid
    ranges (follows the vlan range handling code). To conserve space
    vlan-tunid by default are always dumped in ranges if applicable.

    Use case:
    example use for this is a vxlan bridging gateway or vtep
    which maps vlans to vn-segments (or vnis).

    iproute2 example (patched and pruned iproute2 output to just show
    relevant fdb entries):
    example shows same host mac learnt on two vni's and
    vlan 100 maps to vni 1000, vlan 101 maps to vni 1001

    before (netdev per vni):
    $bridge fdb show | grep "00:02:00:00:00:03"
    00:02:00:00:00:03 dev vxlan1001 vlan 101 master bridge
    00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
    00:02:00:00:00:03 dev vxlan1000 vlan 100 master bridge
    00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self

    after this patch with collect metadata in bridged mode (single netdev):
    $bridge fdb show | grep "00:02:00:00:00:03"
    00:02:00:00:00:03 dev vxlan0 vlan 101 master bridge
    00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
    00:02:00:00:00:03 dev vxlan0 vlan 100 master bridge
    00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self

    CC: Nikolay Aleksandrov
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

28 Jan, 2017

1 commit


25 Jan, 2017

1 commit

  • Implements an optional, per bridge port flag and feature to deliver
    multicast packets to any host on the according port via unicast
    individually. This is done by copying the packet per host and
    changing the multicast destination MAC to a unicast one accordingly.

    multicast-to-unicast works on top of the multicast snooping feature of
    the bridge, which means unicast copies are only delivered to hosts which
    are interested in them and signaled this via IGMP/MLD reports
    previously.
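
The per-host copy step can be illustrated with a small Python sketch operating on raw Ethernet frames. The function name and the way subscribers are passed in are assumptions made for the example; in the bridge the list of interested hosts comes from multicast snooping.

```python
def multicast_to_unicast(frame, subscriber_macs):
    """Return one copy of frame per subscriber, with the destination MAC rewritten.

    frame is a raw Ethernet frame (bytes); the first 6 bytes are the
    destination MAC. subscriber_macs is an iterable of 6-byte MACs.
    """
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    copies = []
    for mac in subscriber_macs:
        if len(mac) != 6:
            raise ValueError("MAC address must be 6 bytes")
        # Keep source MAC, EtherType and payload; replace only the destination.
        copies.append(bytes(mac) + bytes(frame[6:]))
    return copies
```

Each subscriber receives its own frame, so the cost grows with the number of interested hosts on the port.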

    This feature is intended for interface types which have a more reliable
    and/or efficient way to deliver unicast packets than broadcast ones
    (e.g. wifi).

    However, it should only be enabled on interfaces where no IGMPv2/MLDv1
    report suppression takes place. This feature is disabled by default.

    The initial patch and idea is from Felix Fietkau.

    Signed-off-by: Felix Fietkau
    [linus.luessing@c0d3.blue: various bug + style fixes, commit message]
    Signed-off-by: Linus Lüssing
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Felix Fietkau
     

21 Jan, 2017

1 commit


22 Nov, 2016

2 commits

  • This patch adds basic support for MLDv2 queries, the default is MLDv1
    as before. A new multicast option - multicast_mld_version, adds the
    ability to change it between 1 and 2 via netlink and sysfs.
    The MLD option is disabled if CONFIG_IPV6 is disabled.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • This patch adds basic support for IGMPv3 queries, the default is IGMPv2
    as before. A new multicast option - multicast_igmp_version, adds the
    ability to change it between 2 and 3 via netlink and sysfs. The option
    struct member is in a 4 byte hole in net_bridge.
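
A minimal sketch of the option's validation, assuming only the two values named above are accepted. The dict standing in for the bridge and the function name are hypothetical.

```python
import errno


def set_igmp_version(bridge, val):
    """Accept only IGMP version 2 or 3; leave the bridge untouched otherwise."""
    if val not in (2, 3):
        return -errno.EINVAL
    bridge["multicast_igmp_version"] = val
    return 0
```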

    There are also a few minor style adjustments in br_multicast_new_group and
    br_multicast_add_group.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

02 Sep, 2016

1 commit


27 Aug, 2016

1 commit


19 Aug, 2016

2 commits

  • Use one of the vlan xstats padding fields to export the vlan flags. This is
    needed in order to be able to distinguish between master (bridge) and port
    vlan entries in user-space when dumping the bridge vlan stats.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • In the bridge driver we usually have the same function working for both
    port and bridge. In order to follow that logic and also avoid code
    duplication, consolidate the bridge_ and brport_ linkxstats calls into
    one since they share most of their code. As a side effect this allows us
    to dump the vlan stats also via the slave call which is in preparation for
    the upcoming per-port vlan stats and vlan flag dumping.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

30 Jun, 2016

2 commits

  • This patch adds stats support for the currently used IGMP/MLD types by the
    bridge. The stats are per-port (plus one stat per-bridge) and per-direction
    (RX/TX). The stats are exported via netlink via the new linkxstats API
    (RTM_GETSTATS). In order to minimize the performance impact, a new option
    is used to enable/disable the stats - multicast_stats_enabled, similar to
    the recent vlan stats. Also in order to avoid multiple IGMP/MLD type
    lookups and checks, we make use of the current "igmp" member of the bridge
    private skb->cb region to record the type on Rx (both host-generated and
    external packets pass by multicast_rcv()). We can do that since the igmp
    member was used as a boolean and all the valid IGMP/MLD types are positive
    values. The normal bridge fast-path is not affected at all, the only
    affected paths are the flooding ones and since we make use of the IGMP/MLD
    type, we can quickly determine if the packet should be counted using
    cache-hot data (cb's igmp member). We add counters for:
    * IGMP Queries
    * IGMP Leaves
    * IGMP v1/v2/v3 reports

    * MLD Queries
    * MLD Leaves
    * MLD v1/v2 reports

    These are invaluable when monitoring or debugging complex multicast setups
    with bridges.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • This patch adds support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute
    which allows exporting per-slave statistics if the master device supports
    the linkxstats callback. The attribute is passed down to the linkxstats
    callback and it is up to the callback user to use it (an example has been
    added to the only current user - the bridge). This allows us to query only
    specific slaves of master devices like bridge ports and export only what
    we're interested in instead of having to dump all ports and searching only
    for a single one. This will be used to export per-port IGMP/MLD stats and
    also per-port vlan stats in the future, possibly other statistics as well.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

29 Jun, 2016

1 commit

  • I made a dumb off-by-one mistake when I added the vlan stats counter
    dumping code. The increment should happen before the check, not after
    otherwise we miss one entry when we continue dumping.
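
The effect of the ordering can be reproduced with a small Python model of a resumable dump. Everything here is illustrative: `capacity` stands in for the space left in a netlink message, and `prividx` for the saved resume index.

```python
def dump(entries, capacity, prividx, increment_first):
    """Dump up to capacity entries, skipping those already sent.

    Returns (chunk, next_prividx); next_prividx is 0 when the dump is done.
    """
    out = []
    idx = 0
    for entry in entries:
        if increment_first:      # fixed order: increment, then check
            idx += 1
            skip = idx < prividx
        else:                    # buggy order: check, then increment
            skip = idx < prividx
            idx += 1
        if skip:
            continue
        if len(out) == capacity:
            return out, idx      # message full, remember where to resume
        out.append(entry)
    return out, 0


def dump_all(entries, capacity, increment_first):
    """Keep calling dump() until it reports completion."""
    seen, prividx = [], 0
    while True:
        chunk, prividx = dump(entries, capacity, prividx, increment_first)
        seen += chunk
        if prividx == 0:
            return seen
```

With five entries and room for two per message, the check-then-increment order loses the entry at which the first message filled up, while the increment-then-check order resumes exactly there.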

    Fixes: a60c090361ea ("bridge: netlink: export per-vlan stats")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

03 May, 2016

2 commits

  • Add a new LINK_XSTATS_TYPE_BRIDGE attribute and implement the
    RTM_GETSTATS callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and
    get_linkxstats_size) in order to export the per-vlan stats.
    The paddings were added because soon these fields will be needed for
    per-port per-vlan stats (or something else if someone beats me to it) so
    avoiding at least a few more netlink attributes.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets
    allocated a per-cpu stats which is then set in each per-port vlan context
    for quick access. The br_allowed_ingress() common function is used to
    account for Rx packets and the br_handle_vlan() common function is used
    to account for Tx packets. Stats accounting is performed only if the
    bridge-wide vlan_stats_enabled option is set either via sysfs or netlink.
    A struct hole between vlan_enabled and vlan_proto is used for the new
    option so it is in the same cache line. Currently it is binary (on/off)
    but it is intentionally restricted to exactly 0 and 1 since other values
    will be used in the future for different purposes (e.g. per-port stats).

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

26 Apr, 2016

1 commit


19 Feb, 2016

1 commit


22 Oct, 2015

1 commit

  • if_nlmsg_size() overestimates the minimum allocation size of netlink
    dump request (when called from rtnl_calcit()) or the size of the
    message (when called from rtnl_getlink()). This is because
    ext_filter_mask is not supported by rtnl_link_get_af_size() and
    rtnl_link_get_size().

    The over-estimation is significant when at least one netdev has many
    VLANs configured (8 bytes for each configured VLAN).
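
The 8-byte figure follows from netlink attribute sizing: a 4-byte attribute header plus a 4-byte VLAN info payload, aligned to 4 bytes. The sketch below redoes that arithmetic in Python; the payload size assumes a structure of two 16-bit fields (flags and VLAN id).

```python
NLA_HDRLEN = 4       # netlink attribute header size
NLA_ALIGNTO = 4      # netlink attribute alignment
VLAN_INFO_SIZE = 4   # assumed payload: u16 flags + u16 vid


def nla_align(n):
    """Round n up to the next multiple of NLA_ALIGNTO."""
    return (n + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def nla_total_size(payload):
    """Size of one attribute including header and padding."""
    return nla_align(NLA_HDRLEN + payload)


def vlan_attrs_size(num_vlans):
    """Bytes needed to report num_vlans VLANs, one attribute each."""
    return num_vlans * nla_total_size(VLAN_INFO_SIZE)
```

For a device with 4094 VLANs this comes to 32752 bytes, which is the kind of over-estimate that matters when the caller did not ask for VLAN information at all.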

    This patch-set "rightsizes" the protocol specific attribute size
    calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
    and adding it as an argument to the get_link_af_size op in rtnl_af_ops.

    Bridge module already used filtering aware sizing for notifications.
    br_get_link_af_size_filtered() is consistent with the modified
    get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops.
    br_get_link_af_size() becomes unused and thus removed.

    Signed-off-by: Ronen Arad
    Acked-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Arad, Ronen
     

13 Oct, 2015

3 commits

  • br_fill_ifinfo is called by br_ifinfo_notify which can be called from
    many contexts with different locks held, sometimes it relies upon
    bridge's spinlock only, which is a problem for the vlan code, so
    explicitly use RCU for that to avoid problems.

    Signed-off-by: Nikolay Aleksandrov
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • The bridge and port's vlgrp member is already used in RCU way, currently
    we rely on the fact that it cannot disappear while the port exists but
    that is error-prone and we might miss places with improper locking
    (either RCU or RTNL must be held to walk the vlan_list). So make it
    official and use RCU for vlgrp to catch offenders. Introduce proper vlgrp
    accessors and use them consistently throughout the code.

    Signed-off-by: Nikolay Aleksandrov
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Currently it's possible for someone to send a vlan range to the kernel
    with the pvid flag set, which results in the pvid bouncing from one
    vlan to the next and isn't correct. It also introduces problems for hardware
    where it doesn't make sense having more than 1 pvid. iproute2 already
    enforces this, so let's enforce it on kernel-side as well.
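
A sketch of the kind of check described, in Python. The flag constants are assumed to match the bridge UAPI bit positions as best recalled and should be treated as illustrative; the function name is hypothetical.

```python
import errno

# Assumed flag bits (illustrative, modelled on the bridge UAPI).
BRIDGE_VLAN_INFO_PVID = 1 << 1
BRIDGE_VLAN_INFO_RANGE_BEGIN = 1 << 3
BRIDGE_VLAN_INFO_RANGE_END = 1 << 4


def check_vlan_info_flags(flags):
    """Reject a range marker combined with the PVID flag."""
    in_range = flags & (BRIDGE_VLAN_INFO_RANGE_BEGIN | BRIDGE_VLAN_INFO_RANGE_END)
    if in_range and (flags & BRIDGE_VLAN_INFO_PVID):
        return -errno.EINVAL  # a range cannot carry a single PVID
    return 0
```

A single-VLAN request with the PVID flag still passes; only the combination with a range is refused.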

    Reported-by: Elad Raz
    Signed-off-by: Nikolay Aleksandrov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

12 Oct, 2015

1 commit

  • Use SWITCHDEV_F_SKIP_EOPNOTSUPP to skip over ports in bridge that don't
    support setting ageing_time (or setting bridge attrs in general).

    If push fails, don't update ageing_time in bridge and return err to user.

    If push succeeds, update ageing_time in bridge and run gc_timer now to
    recalibrate when to run gc_timer next, based on new ageing_time.

    Signed-off-by: Scott Feldman
    Signed-off-by: Jiri Pirko
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Scott Feldman