08 Dec, 2020

1 commit

  • When enabling multicast snooping, bridge module deadlocks on multicast_lock
    if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
    network.

    The deadlock was caused by the following sequence: While holding the lock,
    br_multicast_open calls br_multicast_join_snoopers, which eventually causes
    IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
    Since the destination Ethernet address is a multicast address, br_dev_xmit
    feeds the packet back to the bridge via br_multicast_rcv, which in turn
    calls br_multicast_add_group, which then deadlocks on multicast_lock.

    The fix is to move the call br_multicast_join_snoopers outside of the
    critical section. This works since br_multicast_join_snoopers only deals
    with IP and does not modify any multicast data structures of the bridge,
    so there's no need to hold the lock.

    Steps to reproduce:
    1. sysctl net.ipv6.conf.all.force_mld_version=1
    2. have another querier
    3. ip link set dev bridge type bridge mcast_snooping 0 && \
    ip link set dev bridge type bridge mcast_snooping 1 < deadlock >

    A typical call trace looks like the following:

    [ 936.251495] _raw_spin_lock+0x5c/0x68
    [ 936.255221] br_multicast_add_group+0x40/0x170 [bridge]
    [ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge]
    [ 936.265322] br_dev_xmit+0x140/0x368 [bridge]
    [ 936.269689] dev_hard_start_xmit+0x94/0x158
    [ 936.273876] __dev_queue_xmit+0x5ac/0x7f8
    [ 936.277890] dev_queue_xmit+0x10/0x18
    [ 936.281563] neigh_resolve_output+0xec/0x198
    [ 936.285845] ip6_finish_output2+0x240/0x710
    [ 936.290039] __ip6_finish_output+0x130/0x170
    [ 936.294318] ip6_output+0x6c/0x1c8
    [ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8
    [ 936.301834] igmp6_send+0x358/0x558
    [ 936.305326] igmp6_join_group.part.0+0x30/0xf0
    [ 936.309774] igmp6_group_added+0xfc/0x110
    [ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290
    [ 936.317885] ipv6_dev_mc_inc+0x10/0x18
    [ 936.321677] br_multicast_open+0xbc/0x110 [bridge]
    [ 936.326506] br_multicast_toggle+0xec/0x140 [bridge]

    Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
    Signed-off-by: Joseph Huang
    Acked-by: Nikolay Aleksandrov
    Link: https://lore.kernel.org/r/20201204235628.50653-1-Joseph.Huang@garmin.com
    Signed-off-by: Jakub Kicinski

    Joseph Huang
     

05 Dec, 2020

1 commit

  • Fix to return a negative error code from the error handling
    case instead of 0, as done elsewhere in this function.

    Fixes: f8ed289fab84 ("bridge: vlan: use br_vlan_(get|put)_master to deal with refcounts")
    Reported-by: Hulk Robot
    Signed-off-by: Zhang Changzhong
    Acked-by: Nikolay Aleksandrov
    Link: https://lore.kernel.org/r/1607071737-33875-1-git-send-email-zhangchangzhong@huawei.com
    Signed-off-by: Jakub Kicinski

    Zhang Changzhong
     

29 Nov, 2020

1 commit

  • Netfilter changes PACKET_OTHERHOST to PACKET_HOST before invoking the
    hooks as, while it's an expected value for a bridge, routing expects
    PACKET_HOST. The change is undone later on after hook traversal. This
    can be seen with pairs of functions updating skb>pkt_type and then
    reverting it to its original value:

    For hook NF_INET_PRE_ROUTING:
    setup_pre_routing / br_nf_pre_routing_finish

    For hook NF_INET_FORWARD:
    br_nf_forward_ip / br_nf_forward_finish

    But the third case where netfilter does this, for hook
    NF_INET_POST_ROUTING, the packet type is changed in br_nf_post_routing
    but never reverted. A comment says:

    /* We assume any code from br_dev_queue_push_xmit onwards doesn't care
    * about the value of skb->pkt_type. */

    But when having a tunnel (say vxlan) attached to a bridge we have the
    following call trace:

    br_nf_pre_routing
    br_nf_pre_routing_ipv6
    br_nf_pre_routing_finish
    br_nf_forward_ip
    br_nf_forward_finish
    br_nf_post_routing
    Reviewed-by: Florian Westphal
    Link: https://lore.kernel.org/r/20201123174902.622102-1-atenart@kernel.org
    Signed-off-by: Jakub Kicinski

    Antoine Tenart
     

17 Nov, 2020

1 commit

  • In br_forward.c and br_input.c fields dev->stats.tx_dropped and
    dev->stats.multicast are populated, but they are ignored in
    ndo_get_stats64.

    Fixes: 28172739f0a2 ("net: fix 64 bit counters on 32 bit arches")
    Signed-off-by: Heiner Kallweit
    Link: https://lore.kernel.org/r/58ea9963-77ad-a7cf-8dfd-fc95ab95f606@gmail.com
    Signed-off-by: Jakub Kicinski

    Heiner Kallweit
     

20 Oct, 2020

1 commit

  • Fixes an error causing small packets to get dropped. skb_ensure_writable
    expects the second parameter to be a length in the ethernet payload.=20
    If we want to write the ethernet header (src, dst), we should pass 0.
    Otherwise, packets with small payloads (< ETH_ALEN) will get dropped.

    Fixes: c1a831167901 ("netfilter: bridge: convert skb_make_writable to skb_ensure_writable")
    Signed-off-by: Timothée COCAULT
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Timothée COCAULT
     

14 Oct, 2020

1 commit


09 Oct, 2020

2 commits


06 Oct, 2020

1 commit


29 Sep, 2020

2 commits

  • Functions related to nested interface infrastructure such as
    netdev_walk_all_{ upper | lower }_dev() pass both private functions
    and "data" pointer to handle their own things.
    At this point, the data pointer type is void *.
    In order to make it easier to expand common variables and functions,
    this new netdev_nested_priv structure is added.

    In the following patch, a new member variable will be added into this
    struct to fix the lockdep issue.

    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     
  • When a user-space software manages fdb entries externally it should
    set the ext_learn flag which marks the fdb entry as externally managed
    and avoids expiring it (they're treated as static fdbs). Unfortunately
    on events where fdb entries are flushed (STP down, netlink fdb flush
    etc) these fdbs are also deleted automatically by the bridge. That in turn
    causes trouble for the managing user-space software (e.g. in MLAG setups
    we lose remote fdb entries on port flaps).
    These entries are completely externally managed so we should avoid
    automatically deleting them, the only exception are offloaded entries
    (i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED). They are flushed as
    before.

    Fixes: eb100e0e24a2 ("net: bridge: allow to add externally learned entries from user-space")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

26 Sep, 2020

1 commit


24 Sep, 2020

15 commits

  • We need to avoid forwarding to ports in MCAST_INCLUDE filter mode when the
    mdst entry is a *,G or when the port has the blocked flag.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Since host joins are considered as EXCLUDE {} joins we need to reflect
    that in all of *,G ports' S,G entries. Since the S,Gs can have
    host_joined == true only set automatically we can safely set it to false
    when removing all automatically added entries upon S,G delete.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • When excluding S,G entries we need a way to block a particular S,G,port.
    The new port group flag is managed based on the source's timer as per
    RFCs 3376 and 3810. When a source expires and its port group is in
    EXCLUDE mode, it will be blocked.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • We need to handle group filter mode transitions and initial state.
    To change a port group's INCLUDE -> EXCLUDE mode (or when we have added
    a new port group in EXCLUDE mode) we need to add that port to all of
    *,G ports' S,G entries for proper replication. When the EXCLUDE state is
    changed from IGMPv3 report, br_multicast_fwd_filter_exclude() must be
    called after the source list processing because the assumption is that
    all of the group's S,G entries will be created before transitioning to
    EXCLUDE mode, i.e. most importantly its blocked entries will already be
    added so it will not get automatically added to them.
    The transition EXCLUDE -> INCLUDE happens only when a port group timer
    expires, it requires us to remove that port from all of *,G ports' S,G
    entries where it was automatically added previously.
    Finally when we are adding a new S,G entry we must add all of *,G's
    EXCLUDE ports to it.
    In order to distinguish automatically added *,G EXCLUDE ports we have a
    new port group flag - MDB_PG_FLAGS_STAR_EXCL.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • This patch adds support for automatic install of S,G mdb entries based
    on the port group's source list and the source entry's timer.
    Once installed the S,G will be used when forwarding packets if the
    approprate multicast/mld versions are set. A new source flag called
    BR_SGRP_F_INSTALLED denotes if the source has a forwarding mdb entry
    installed.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • To speedup S,G forward handling we need to be able to quickly find out
    if a port is a member of an S,G group. To do that add a global S,G port
    rhashtable with key: source addr, group addr, protocol, vid (all br_ip
    fields) and port pointer.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • We need to be able to differentiate between pg entries created by
    user-space and the kernel when we start generating S,G entries for
    IGMPv3/MLDv2's fast path. User-space entries are created by default as
    RTPROT_STATIC and the kernel entries are RTPROT_KERNEL. Later we can
    allow user-space to provide the entry rt_protocol so we can
    differentiate between who added the entries specifically (e.g. clag,
    admin, frr etc).

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • If (S,G) entries are enabled (igmpv3/mldv2) then look them up first. If
    there isn't a present (S,G) entry then try to find (*,G).

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Add new mdb attributes (MDBE_ATTR_SOURCE for setting,
    MDBA_MDB_EATTR_SOURCE for dumping) to allow add/del and dump of mdb
    entries with a source address (S,G). New S,G entries are created with
    filter mode of MCAST_INCLUDE. The same attributes are used for IPv4 and
    IPv6, they're validated and parsed based on their protocol.
    S,G host joined entries which are added by user are not allowed yet.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Since the MDB add/del code expects an exact struct br_mdb_entry we can't
    really add any extensions, thus add a new nested attribute at the level of
    MDBA_SET_ENTRY called MDBA_SET_ENTRY_ATTRS which will be used to pass
    all new options via netlink attributes. This patch doesn't change
    anything functionally since the new attribute is not used yet, only
    parsed.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Since now we have src in br_ip, u no longer makes sense so rename
    it to dst. No functional changes.

    v2: fix build with CONFIG_BATMAN_ADV_MCAST

    CC: Marek Lindner
    CC: Simon Wunderlich
    CC: Antonio Quartulli
    CC: Sven Eckelmann
    CC: b.a.t.m.a.n@lists.open-mesh.org
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Now that we have src and dst in br_ip it is logical to use the src field
    for the cases where we need to work with a source address such as
    querier source address and group source address.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Pass and use extack all the way down to br_mdb_add_group().

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • To avoid doing duplicate device checks and searches (the same were done
    in br_mdb_add and __br_mdb_add) pass the already found port to __br_mdb_add
    and pull the bridge's netif_running and enabled multicast checks to
    br_mdb_add. This would also simplify the future extack errors.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • We can drop the pr_info() calls and just use extack to return a
    meaningful error to user-space when br_mdb_parse() fails.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

23 Sep, 2020

1 commit

  • Two minor conflicts:

    1) net/ipv4/route.c, adding a new local variable while
    moving another local variable and removing it's
    initial assignment.

    2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
    One pretty prints the port mode differently, whilst another
    changes the driver to try and obtain the port mode from
    the port node rather than the switch node.

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Sep, 2020

1 commit

  • When calling the RCU brother of br_vlan_get_pvid(), lockdep warns:

    =============================
    WARNING: suspicious RCU usage
    5.9.0-rc3-01631-g13c17acb8e38-dirty #814 Not tainted
    -----------------------------
    net/bridge/br_private.h:1054 suspicious rcu_dereference_protected() usage!

    Call trace:
    lockdep_rcu_suspicious+0xd4/0xf8
    __br_vlan_get_pvid+0xc0/0x100
    br_vlan_get_pvid_rcu+0x78/0x108

    The warning is because br_vlan_get_pvid_rcu() calls nbp_vlan_group()
    which calls rtnl_dereference() instead of rcu_dereference(). In turn,
    rtnl_dereference() calls rcu_dereference_protected() which assumes
    operation under an RCU write-side critical section, which obviously is
    not the case here. So, when the incorrect primitive is used to access
    the RCU-protected VLAN group pointer, READ_ONCE() is not used, which may
    cause various unexpected problems.

    I'm sad to say that br_vlan_get_pvid() and br_vlan_get_pvid_rcu() cannot
    share the same implementation. So fix the bug by splitting the 2
    functions, and making br_vlan_get_pvid_rcu() retrieve the VLAN groups
    under proper locking annotations.

    Fixes: 7582f5b70f9a ("bridge: add br_vlan_get_pvid_rcu()")
    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Vladimir Oltean
     

19 Sep, 2020

1 commit


17 Sep, 2020

1 commit


16 Sep, 2020

1 commit

  • so the switchdev can notifiy the bridge to flush non-permanent fdb entries
    for this port. This is useful whenever the hardware fdb of the switchdev
    is reset, but the netdev and the bridgeport are not deleted.

    Note that this has the same effect as the IFLA_BRPORT_FLUSH attribute.

    CC: Jiri Pirko
    CC: Ivan Vecera
    CC: Roopa Prabhu
    CC: Nikolay Aleksandrov
    Signed-off-by: Alexandra Winter
    Signed-off-by: Julian Wiedmann
    Acked-by: Nikolay Aleksandrov
    Acked-by: Ivan Vecera
    Signed-off-by: David S. Miller

    Alexandra Winter
     

12 Sep, 2020

1 commit

  • Each MDB entry is encoded in a nested netlink attribute called
    'MDBA_MDB_ENTRY'. In turn, this attribute contains another nested
    attributed called 'MDBA_MDB_ENTRY_INFO', which encodes a single port
    group entry within the MDB entry.

    The cited commit added the ability to restart a dump from a specific
    port group entry. However, on failure to add a port group entry to the
    dump the entire MDB entry (stored in 'nest2') is removed, resulting in
    missing port group entries.

    Fix this by finalizing the MDB entry with the partial list of already
    encoded port group entries.

    Fixes: 5205e919c9f0 ("net: bridge: mcast: add support for src list and filter mode dumping")
    Signed-off-by: Ido Schimmel
    Acked-by: Nikolay Aleksandrov
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     

10 Sep, 2020

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next:

    1) Rewrite inner header IPv6 in ICMPv6 messages in ip6t_NPT,
    from Michael Zhou.

    2) do_ip_vs_set_ctl() dereferences uninitialized value,
    from Peilin Ye.

    3) Support for userdata in tables, from Jose M. Guisado.

    4) Do not increment ct error and invalid stats at the same time,
    from Florian Westphal.

    5) Remove ct ignore stats, also from Florian.

    6) Add ct stats for clash resolution, from Florian Westphal.

    7) Bump reference counter bump on ct clash resolution only,
    this is safe because bucket lock is held, again from Florian.

    8) Use ip_is_fragment() in xt_HMARK, from YueHaibing.

    9) Add wildcard support for nft_socket, from Balazs Scheidler.

    10) Remove superfluous IPVS dependency on iptables, from
    Yaroslav Bolyukin.

    11) Remove unused definition in ebt_stp, from Wang Hai.

    12) Replace CONFIG_NFT_CHAIN_NAT_{IPV4,IPV6} by CONFIG_NFT_NAT
    in selftests/net, from Fabian Frederick.

    13) Add userdata support for nft_object, from Jose M. Guisado.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

09 Sep, 2020

1 commit

  • Stephen reported the following warning:
    net/bridge/br_multicast.c: In function 'br_multicast_find_port':
    net/bridge/br_multicast.c:1818:21: warning: unused variable 'br' [-Wunused-variable]
    1818 | struct net_bridge *br = mp->br;
    | ^~

    It happens due to bridge's mlock_dereference() when lockdep isn't defined.
    Silence the warning by annotating the variable as __maybe_unused.

    Fixes: 0436862e417e ("net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report")
    Reported-by: Stephen Rothwell
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

08 Sep, 2020

5 commits