08 Dec, 2020
1 commit
-
When enabling multicast snooping, bridge module deadlocks on multicast_lock
if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
network.The deadlock was caused by the following sequence: While holding the lock,
br_multicast_open calls br_multicast_join_snoopers, which eventually causes
IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
Since the destination Ethernet address is a multicast address, br_dev_xmit
feeds the packet back to the bridge via br_multicast_rcv, which in turn
calls br_multicast_add_group, which then deadlocks on multicast_lock.The fix is to move the call br_multicast_join_snoopers outside of the
critical section. This works since br_multicast_join_snoopers only deals
with IP and does not modify any multicast data structures of the bridge,
so there's no need to hold the lock.Steps to reproduce:
1. sysctl net.ipv6.conf.all.force_mld_version=1
2. have another querier
3. ip link set dev bridge type bridge mcast_snooping 0 && \
ip link set dev bridge type bridge mcast_snooping 1 < deadlock >A typical call trace looks like the following:
[ 936.251495] _raw_spin_lock+0x5c/0x68
[ 936.255221] br_multicast_add_group+0x40/0x170 [bridge]
[ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge]
[ 936.265322] br_dev_xmit+0x140/0x368 [bridge]
[ 936.269689] dev_hard_start_xmit+0x94/0x158
[ 936.273876] __dev_queue_xmit+0x5ac/0x7f8
[ 936.277890] dev_queue_xmit+0x10/0x18
[ 936.281563] neigh_resolve_output+0xec/0x198
[ 936.285845] ip6_finish_output2+0x240/0x710
[ 936.290039] __ip6_finish_output+0x130/0x170
[ 936.294318] ip6_output+0x6c/0x1c8
[ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8
[ 936.301834] igmp6_send+0x358/0x558
[ 936.305326] igmp6_join_group.part.0+0x30/0xf0
[ 936.309774] igmp6_group_added+0xfc/0x110
[ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290
[ 936.317885] ipv6_dev_mc_inc+0x10/0x18
[ 936.321677] br_multicast_open+0xbc/0x110 [bridge]
[ 936.326506] br_multicast_toggle+0xec/0x140 [bridge]Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
Signed-off-by: Joseph Huang
Acked-by: Nikolay Aleksandrov
Link: https://lore.kernel.org/r/20201204235628.50653-1-Joseph.Huang@garmin.com
Signed-off-by: Jakub Kicinski
05 Dec, 2020
1 commit
-
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.Fixes: f8ed289fab84 ("bridge: vlan: use br_vlan_(get|put)_master to deal with refcounts")
Reported-by: Hulk Robot
Signed-off-by: Zhang Changzhong
Acked-by: Nikolay Aleksandrov
Link: https://lore.kernel.org/r/1607071737-33875-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski
29 Nov, 2020
1 commit
-
Netfilter changes PACKET_OTHERHOST to PACKET_HOST before invoking the
hooks as, while it's an expected value for a bridge, routing expects
PACKET_HOST. The change is undone later on after hook traversal. This
can be seen with pairs of functions updating skb>pkt_type and then
reverting it to its original value:For hook NF_INET_PRE_ROUTING:
setup_pre_routing / br_nf_pre_routing_finishFor hook NF_INET_FORWARD:
br_nf_forward_ip / br_nf_forward_finishBut the third case where netfilter does this, for hook
NF_INET_POST_ROUTING, the packet type is changed in br_nf_post_routing
but never reverted. A comment says:/* We assume any code from br_dev_queue_push_xmit onwards doesn't care
* about the value of skb->pkt_type. */But when having a tunnel (say vxlan) attached to a bridge we have the
following call trace:br_nf_pre_routing
br_nf_pre_routing_ipv6
br_nf_pre_routing_finish
br_nf_forward_ip
br_nf_forward_finish
br_nf_post_routing
Reviewed-by: Florian Westphal
Link: https://lore.kernel.org/r/20201123174902.622102-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski
17 Nov, 2020
1 commit
-
In br_forward.c and br_input.c fields dev->stats.tx_dropped and
dev->stats.multicast are populated, but they are ignored in
ndo_get_stats64.Fixes: 28172739f0a2 ("net: fix 64 bit counters on 32 bit arches")
Signed-off-by: Heiner Kallweit
Link: https://lore.kernel.org/r/58ea9963-77ad-a7cf-8dfd-fc95ab95f606@gmail.com
Signed-off-by: Jakub Kicinski
20 Oct, 2020
1 commit
-
Fixes an error causing small packets to get dropped. skb_ensure_writable
expects the second parameter to be a length in the ethernet payload.=20
If we want to write the ethernet header (src, dst), we should pass 0.
Otherwise, packets with small payloads (< ETH_ALEN) will get dropped.Fixes: c1a831167901 ("netfilter: bridge: convert skb_make_writable to skb_ensure_writable")
Signed-off-by: Timothée COCAULT
Reviewed-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
14 Oct, 2020
1 commit
-
Simplify the code by using new function dev_fetch_sw_netstats().
Signed-off-by: Heiner Kallweit
Link: https://lore.kernel.org/r/d1c3ff29-5691-9d54-d164-16421905fa59@gmail.com
Signed-off-by: Jakub Kicinski
09 Oct, 2020
2 commits
-
Small conflict around locking in rxrpc_process_event() -
channel_lock moved to bundle in next, while state lock
needs _bh() from net.Signed-off-by: Jakub Kicinski
-
This commit is correcting NETLINK br_fill_ifinfo() to be able to
handle 'filter_mask' with multiple flags asserted.Fixes: 36a8e8e265420 ("bridge: Extend br_fill_ifinfo to return MPR status")
Signed-off-by: Henrik Bjoernlund
Reviewed-by: Horatiu Vultur
Suggested-by: Nikolay Aleksandrov
Tested-by: Horatiu Vultur
Acked-by: Nikolay Aleksandrov
Signed-off-by: Jakub Kicinski
06 Oct, 2020
1 commit
-
Rejecting non-native endian BTF overlapped with the addition
of support for it.The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.Signed-off-by: David S. Miller
29 Sep, 2020
2 commits
-
Functions related to nested interface infrastructure such as
netdev_walk_all_{ upper | lower }_dev() pass both private functions
and "data" pointer to handle their own things.
At this point, the data pointer type is void *.
In order to make it easier to expand common variables and functions,
this new netdev_nested_priv structure is added.In the following patch, a new member variable will be added into this
struct to fix the lockdep issue.Signed-off-by: Taehee Yoo
Signed-off-by: David S. Miller -
When a user-space software manages fdb entries externally it should
set the ext_learn flag which marks the fdb entry as externally managed
and avoids expiring it (they're treated as static fdbs). Unfortunately
on events where fdb entries are flushed (STP down, netlink fdb flush
etc) these fdbs are also deleted automatically by the bridge. That in turn
causes trouble for the managing user-space software (e.g. in MLAG setups
we lose remote fdb entries on port flaps).
These entries are completely externally managed so we should avoid
automatically deleting them, the only exception are offloaded entries
(i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED). They are flushed as
before.Fixes: eb100e0e24a2 ("net: bridge: allow to add externally learned entries from user-space")
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller
26 Sep, 2020
1 commit
-
We should remove a group from the sg_port hash only if it's an S,G
entry. This makes it correct and more symmetric with group add. Also
since *,G groups are not added to that hash we can hide a bug.Fixes: 085b53c8beab ("net: bridge: mcast: add sg_port rhashtable")
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller
24 Sep, 2020
15 commits
-
We need to avoid forwarding to ports in MCAST_INCLUDE filter mode when the
mdst entry is a *,G or when the port has the blocked flag.Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
Since host joins are considered as EXCLUDE {} joins we need to reflect
that in all of *,G ports' S,G entries. Since the S,Gs can have
host_joined == true only set automatically we can safely set it to false
when removing all automatically added entries upon S,G delete.Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
When excluding S,G entries we need a way to block a particular S,G,port.
The new port group flag is managed based on the source's timer as per
RFCs 3376 and 3810. When a source expires and its port group is in
EXCLUDE mode, it will be blocked.Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
We need to handle group filter mode transitions and initial state.
To change a port group's INCLUDE -> EXCLUDE mode (or when we have added
a new port group in EXCLUDE mode) we need to add that port to all of
*,G ports' S,G entries for proper replication. When the EXCLUDE state is
changed from IGMPv3 report, br_multicast_fwd_filter_exclude() must be
called after the source list processing because the assumption is that
all of the group's S,G entries will be created before transitioning to
EXCLUDE mode, i.e. most importantly its blocked entries will already be
added so it will not get automatically added to them.
The transition EXCLUDE -> INCLUDE happens only when a port group timer
expires, it requires us to remove that port from all of *,G ports' S,G
entries where it was automatically added previously.
Finally when we are adding a new S,G entry we must add all of *,G's
EXCLUDE ports to it.
In order to distinguish automatically added *,G EXCLUDE ports we have a
new port group flag - MDB_PG_FLAGS_STAR_EXCL.Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
This patch adds support for automatic install of S,G mdb entries based
on the port group's source list and the source entry's timer.
Once installed the S,G will be used when forwarding packets if the
approprate multicast/mld versions are set. A new source flag called
BR_SGRP_F_INSTALLED denotes if the source has a forwarding mdb entry
installed.Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
To speedup S,G forward handling we need to be able to quickly find out
if a port is a member of an S,G group. To do that add a global S,G port
rhashtable with key: source addr, group addr, protocol, vid (all br_ip
fields) and port pointer.Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
We need to be able to differentiate between pg entries created by
user-space and the kernel when we start generating S,G entries for
IGMPv3/MLDv2's fast path. User-space entries are created by default as
RTPROT_STATIC and the kernel entries are RTPROT_KERNEL. Later we can
allow user-space to provide the entry rt_protocol so we can
differentiate between who added the entries specifically (e.g. clag,
admin, frr etc).Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
If (S,G) entries are enabled (igmpv3/mldv2) then look them up first. If
there isn't a present (S,G) entry then try to find (*,G).Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
Add new mdb attributes (MDBE_ATTR_SOURCE for setting,
MDBA_MDB_EATTR_SOURCE for dumping) to allow add/del and dump of mdb
entries with a source address (S,G). New S,G entries are created with
filter mode of MCAST_INCLUDE. The same attributes are used for IPv4 and
IPv6, they're validated and parsed based on their protocol.
S,G host joined entries which are added by user are not allowed yet.Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
Since the MDB add/del code expects an exact struct br_mdb_entry we can't
really add any extensions, thus add a new nested attribute at the level of
MDBA_SET_ENTRY called MDBA_SET_ENTRY_ATTRS which will be used to pass
all new options via netlink attributes. This patch doesn't change
anything functionally since the new attribute is not used yet, only
parsed.Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
Since now we have src in br_ip, u no longer makes sense so rename
it to dst. No functional changes.v2: fix build with CONFIG_BATMAN_ADV_MCAST
CC: Marek Lindner
CC: Simon Wunderlich
CC: Antonio Quartulli
CC: Sven Eckelmann
CC: b.a.t.m.a.n@lists.open-mesh.org
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
Now that we have src and dst in br_ip it is logical to use the src field
for the cases where we need to work with a source address such as
querier source address and group source address.Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
Pass and use extack all the way down to br_mdb_add_group().
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
To avoid doing duplicate device checks and searches (the same were done
in br_mdb_add and __br_mdb_add) pass the already found port to __br_mdb_add
and pull the bridge's netif_running and enabled multicast checks to
br_mdb_add. This would also simplify the future extack errors.Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
We can drop the pr_info() calls and just use extack to return a
meaningful error to user-space when br_mdb_parse() fails.Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller
23 Sep, 2020
1 commit
-
Two minor conflicts:
1) net/ipv4/route.c, adding a new local variable while
moving another local variable and removing it's
initial assignment.2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
One pretty prints the port mode differently, whilst another
changes the driver to try and obtain the port mode from
the port node rather than the switch node.Signed-off-by: David S. Miller
22 Sep, 2020
1 commit
-
When calling the RCU brother of br_vlan_get_pvid(), lockdep warns:
=============================
WARNING: suspicious RCU usage
5.9.0-rc3-01631-g13c17acb8e38-dirty #814 Not tainted
-----------------------------
net/bridge/br_private.h:1054 suspicious rcu_dereference_protected() usage!Call trace:
lockdep_rcu_suspicious+0xd4/0xf8
__br_vlan_get_pvid+0xc0/0x100
br_vlan_get_pvid_rcu+0x78/0x108The warning is because br_vlan_get_pvid_rcu() calls nbp_vlan_group()
which calls rtnl_dereference() instead of rcu_dereference(). In turn,
rtnl_dereference() calls rcu_dereference_protected() which assumes
operation under an RCU write-side critical section, which obviously is
not the case here. So, when the incorrect primitive is used to access
the RCU-protected VLAN group pointer, READ_ONCE() is not used, which may
cause various unexpected problems.I'm sad to say that br_vlan_get_pvid() and br_vlan_get_pvid_rcu() cannot
share the same implementation. So fix the bug by splitting the 2
functions, and making br_vlan_get_pvid_rcu() retrieve the VLAN groups
under proper locking annotations.Fixes: 7582f5b70f9a ("bridge: add br_vlan_get_pvid_rcu()")
Signed-off-by: Vladimir Oltean
Signed-off-by: David S. Miller
19 Sep, 2020
1 commit
-
Drop repeated words in net/bridge/.
Signed-off-by: Randy Dunlap
Cc: "David S. Miller"
Cc: Jakub Kicinski
Cc: Roopa Prabhu
Cc: Nikolay Aleksandrov
Cc: bridge@lists.linux-foundation.org
Signed-off-by: David S. Miller
17 Sep, 2020
1 commit
-
When we're handling TO_EXCLUDE report in EXCLUDE filter mode we should
not ignore the return value of __grp_src_toex_excl() as we'll miss
sending notifications about group changes.Fixes: 5bf1e00b6849 ("net: bridge: mcast: support for IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report")
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller
16 Sep, 2020
1 commit
-
so the switchdev can notifiy the bridge to flush non-permanent fdb entries
for this port. This is useful whenever the hardware fdb of the switchdev
is reset, but the netdev and the bridgeport are not deleted.Note that this has the same effect as the IFLA_BRPORT_FLUSH attribute.
CC: Jiri Pirko
CC: Ivan Vecera
CC: Roopa Prabhu
CC: Nikolay Aleksandrov
Signed-off-by: Alexandra Winter
Signed-off-by: Julian Wiedmann
Acked-by: Nikolay Aleksandrov
Acked-by: Ivan Vecera
Signed-off-by: David S. Miller
12 Sep, 2020
1 commit
-
Each MDB entry is encoded in a nested netlink attribute called
'MDBA_MDB_ENTRY'. In turn, this attribute contains another nested
attributed called 'MDBA_MDB_ENTRY_INFO', which encodes a single port
group entry within the MDB entry.The cited commit added the ability to restart a dump from a specific
port group entry. However, on failure to add a port group entry to the
dump the entire MDB entry (stored in 'nest2') is removed, resulting in
missing port group entries.Fix this by finalizing the MDB entry with the partial list of already
encoded port group entries.Fixes: 5205e919c9f0 ("net: bridge: mcast: add support for src list and filter mode dumping")
Signed-off-by: Ido Schimmel
Acked-by: Nikolay Aleksandrov
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller
10 Sep, 2020
1 commit
-
Pablo Neira Ayuso says:
====================
Netfilter updates for net-nextThe following patchset contains Netfilter updates for net-next:
1) Rewrite inner header IPv6 in ICMPv6 messages in ip6t_NPT,
from Michael Zhou.2) do_ip_vs_set_ctl() dereferences uninitialized value,
from Peilin Ye.3) Support for userdata in tables, from Jose M. Guisado.
4) Do not increment ct error and invalid stats at the same time,
from Florian Westphal.5) Remove ct ignore stats, also from Florian.
6) Add ct stats for clash resolution, from Florian Westphal.
7) Bump reference counter bump on ct clash resolution only,
this is safe because bucket lock is held, again from Florian.8) Use ip_is_fragment() in xt_HMARK, from YueHaibing.
9) Add wildcard support for nft_socket, from Balazs Scheidler.
10) Remove superfluous IPVS dependency on iptables, from
Yaroslav Bolyukin.11) Remove unused definition in ebt_stp, from Wang Hai.
12) Replace CONFIG_NFT_CHAIN_NAT_{IPV4,IPV6} by CONFIG_NFT_NAT
in selftests/net, from Fabian Frederick.13) Add userdata support for nft_object, from Jose M. Guisado.
====================Signed-off-by: David S. Miller
09 Sep, 2020
1 commit
-
Stephen reported the following warning:
net/bridge/br_multicast.c: In function 'br_multicast_find_port':
net/bridge/br_multicast.c:1818:21: warning: unused variable 'br' [-Wunused-variable]
1818 | struct net_bridge *br = mp->br;
| ^~It happens due to bridge's mlock_dereference() when lockdep isn't defined.
Silence the warning by annotating the variable as __maybe_unused.Fixes: 0436862e417e ("net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report")
Reported-by: Stephen Rothwell
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller
08 Sep, 2020
5 commits
-
BPDU_TYPE_TCN is never used after it was introduced.
So better to remove it.Reported-by: Hulk Robot
Signed-off-by: Wang Hai
Signed-off-by: Pablo Neira Ayuso -
Since each entry type has timers that can be running simultaneously we need
to make sure that entries are not freed before their timers have finished.
In order to do that generalize the src gc work to mcast gc work and use a
callback to free the entries (mdb, port group or src).v3: add IPv6 support
v2: force mcast gc on port del to make sure all port group timers have
finished before freeing the bridge portSigned-off-by: Nikolay Aleksandrov
Signed-off-by: Jakub Kicinski -
When an IGMPv3/MLDv2 query is received and we're operating in such mode
then we need to avoid updating group timers if the suppress flag is set.
Also we should update only timers for groups in exclude mode.v3: add IPv6/MLDv2 support
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: Jakub Kicinski -
We already have all necessary helpers, so process IGMPV3/MLDv2
BLOCK_OLD_SOURCES as per the RFCs.v3: add IPv6/MLDv2 support
v2: directly do flag bit operationsSigned-off-by: Nikolay Aleksandrov
Signed-off-by: Jakub Kicinski -
In order to process IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report types we
need new helpers which allow us to mark entries based on their timer
state and to query only marked entries.v3: add IPv6/MLDv2 support, fix other_query checks
v2: directly do flag bit operationsSigned-off-by: Nikolay Aleksandrov
Signed-off-by: Jakub Kicinski