Eric Lee / smarc-fsl-linux-kernel

13 May, 2019

1 commit

3ebb41bf4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Postpone chain policy update to drop after transaction is complete,
from Florian Westphal.

2) Add entry to flowtable after confirmation to fix UDP flows with
packets going in one single direction.

3) Reference count leak in dst object, from Taehee Yoo.

4) Check for TTL field in flowtable datapath, from Taehee Yoo.

5) Fix h323 conntrack helper due to incorrect boundary check,
from Jakub Jankowski.

6) Fix incorrect rcu dereference when fetching basechain stats,
from Florian Westphal.

7) Missing error check when adding new entries to flowtable,
from Taehee Yoo.

8) Use version field in nfnetlink message to honor the nfgen_family
field, from Kristian Evensen.

9) Remove incorrect configuration check for CONFIG_NF_CONNTRACK_IPV6,
from Subash Abhinov Kasiviswanathan.

10) Prevent dying entries from being added to the flowtable,
from Taehee Yoo.

11) Don't hit WARN_ON() with malformed blob in ebtables with
trailing data after last rule, reported by syzbot, patch
from Florian Westphal.

12) Remove NFT_CT_TIMEOUT enumeration, never used in the kernel
code.

13) Fix incorrect definition for NFT_LOGLEVEL_MAX, from Florian
Westphal.

This batch comes with a conflict that can be fixed with this patch:

diff --cc include/uapi/linux/netfilter/nf_tables.h
index 7bdb234f3d8c,f0cf7b0f4f35..505393c6e959
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@@ -966,6 -966,8 +966,7 @@@ enum nft_socket_keys
* @NFT_CT_DST_IP: conntrack layer 3 protocol destination (IPv4 address)
* @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address)
* @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address)
- * @NFT_CT_TIMEOUT: connection tracking timeout policy assigned to conntrack
+ * @NFT_CT_ID: conntrack id
*/
enum nft_ct_keys {
NFT_CT_STATE,
@@@ -991,6 -993,8 +992,7 @@@
NFT_CT_DST_IP,
NFT_CT_SRC_IP6,
NFT_CT_DST_IP6,
- NFT_CT_TIMEOUT,
+ NFT_CT_ID,
__NFT_CT_MAX
};
#define NFT_CT_MAX (__NFT_CT_MAX - 1)

That replaces the unused NFT_CT_TIMEOUT definition by NFT_CT_ID. If you prefer,
I can also solve this conflict here, just let me know.
====================

Signed-off-by: David S. Miller

David S. Miller
2019-05-13 23:55:15 +0800

11 May, 2019

1 commit

bdfad5aec bridge: Fix error path for kobject_init_and_add()

Currently an error return from kobject_init_and_add() is not followed by a
call to kobject_put(), which leaks memory. We currently set p to NULL so
that kfree() may be called on it as a no-op; the code is arguably clearer
if we move the kfree() up closer to the allocation (instead of after the
goto jump).

Remove a goto label 'err1' and jump to call to kobject_put() in error
return from kobject_init_and_add() fixing the memory leak. Re-name goto
label 'put_back' to 'err1' now that we don't use err1, following current
nomenclature (err1, err2 ...). Move the call to kfree() out of the error
code at the bottom of the function, up closer to where the memory was allocated.
Add comment to clarify call to kfree().
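
The ownership rule behind this fix can be sketched in plain userspace C. This is a minimal model, not the kernel code: `demo_kobj`, `demo_init_and_add()` and `demo_put()` are hypothetical stand-ins for `struct kobject`, `kobject_init_and_add()` and `kobject_put()`. It assumes that once init has run the object holds a reference, so a failed add is undone with a put (which runs the release callback) rather than a bare free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kobject: a refcount plus a release hook. */
struct demo_kobj {
    int refcount;
    void (*release)(struct demo_kobj *kobj);
};

struct demo_port {
    struct demo_kobj kobj;  /* kept first so release can free the port */
    int port_no;
};

static int demo_release_calls;  /* counts release invocations for the demo */

static void demo_port_release(struct demo_kobj *kobj)
{
    demo_release_calls++;
    free(kobj);  /* kobj is the first member, so this frees the whole port */
}

/* Initialise the object (taking the first reference), then try to add it.
 * In this model a failure leaves the reference held: the caller must put. */
static int demo_init_and_add(struct demo_kobj *kobj,
                             void (*release)(struct demo_kobj *), int fail)
{
    kobj->refcount = 1;
    kobj->release = release;
    return fail ? -1 : 0;
}

static void demo_put(struct demo_kobj *kobj)
{
    assert(kobj->refcount > 0);
    if (--kobj->refcount == 0)
        kobj->release(kobj);
}

/* Returns 0 on success; on failure the port has already been released. */
static int demo_add_port(struct demo_port **out, int fail)
{
    struct demo_port *p = calloc(1, sizeof(*p));

    *out = NULL;
    if (!p)
        return -1;
    if (demo_init_and_add(&p->kobj, demo_port_release, fail)) {
        demo_put(&p->kobj);  /* drop the reference instead of free(p) */
        return -1;
    }
    *out = p;
    return 0;
}
```

Calling `demo_add_port(&p, 1)` exercises the error path: the release hook runs exactly once and `p` stays NULL.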

Signed-off-by: Tobin C. Harding
Signed-off-by: David S. Miller

Tobin C. Harding
2019-05-11 06:05:08 +0800

09 May, 2019

1 commit

680f6af53 netfilter: ebtables: CONFIG_COMPAT: reject trailing data after last rule

If userspace provides a rule blob with trailing data after last target,
we trigger a splat, then convert ruleset to 64bit format (with trailing
data), then pass that to do_replace_finish() which then returns -EINVAL.

Erroring out right away avoids the splat plus unneeded translation and
error unwind.

Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support")
Reported-by: Tetsuo Handa
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2019-05-09 14:54:49 +0800

28 Apr, 2019

3 commits

8cb081746 netlink: make validation more configurable for future strictness

We currently have two levels of strict validation:

1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted

Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.
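
The four options combine as a bitmask. The following is a simplified userspace sketch of that idea, not the kernel implementation: the `DEMO_VALIDATE_*` names, `struct demo_attr` and `demo_validate()` are hypothetical, and the policy is reduced to an expected payload length per attribute type (0 meaning "no policy entry").

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag names mirroring the four options listed above. */
enum {
    DEMO_VALIDATE_TRAILING     = 1 << 0, /* no trailing data after attrs */
    DEMO_VALIDATE_MAXTYPE      = 1 << 1, /* reject type > max known type */
    DEMO_VALIDATE_UNSPEC       = 1 << 2, /* reject attrs without a policy */
    DEMO_VALIDATE_STRICT_ATTRS = 1 << 3, /* length must match exactly */
};

#define DEMO_VALIDATE_LIBERAL 0
#define DEMO_VALIDATE_DEPRECATED_STRICT \
    (DEMO_VALIDATE_TRAILING | DEMO_VALIDATE_MAXTYPE)
#define DEMO_VALIDATE_STRICT \
    (DEMO_VALIDATE_TRAILING | DEMO_VALIDATE_MAXTYPE | \
     DEMO_VALIDATE_UNSPEC | DEMO_VALIDATE_STRICT_ATTRS)

struct demo_attr {
    int type;
    size_t len;
};

/* policy_len[t] is the expected payload length for type t (0 = unspecified);
 * it must have maxtype + 1 entries. Returns 0 if accepted, -1 if rejected. */
static int demo_validate(const struct demo_attr *attrs, size_t n,
                         size_t trailing_bytes, const size_t *policy_len,
                         int maxtype, unsigned int validate)
{
    for (size_t i = 0; i < n; i++) {
        const struct demo_attr *a = &attrs[i];

        assert(a->type >= 0);
        if (a->type > maxtype) {
            if (validate & DEMO_VALIDATE_MAXTYPE)
                return -1;
            continue;  /* liberal: unknown types are skipped */
        }
        if (policy_len[a->type] == 0) {
            if (validate & DEMO_VALIDATE_UNSPEC)
                return -1;
            continue;  /* liberal: no policy, nothing to check */
        }
        if (a->len < policy_len[a->type])
            return -1;  /* too short is rejected in every mode */
        if ((validate & DEMO_VALIDATE_STRICT_ATTRS) &&
            a->len != policy_len[a->type])
            return -1;
    }
    if (trailing_bytes && (validate & DEMO_VALIDATE_TRAILING))
        return -1;
    return 0;
}
```

With this model, an over-long attribute passes under `DEMO_VALIDATE_DEPRECATED_STRICT` but is rejected under `DEMO_VALIDATE_STRICT`, which is the difference between the old strict mode and the new default described above.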

We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)

@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2019-04-28 05:07:21 +0800
f78c6032c net: fix two coding style issues

This is a simple cleanup addressing two coding style issues found by
checkpatch.pl in an earlier patch. It's submitted as a separate patch to
keep the original patch as it was generated by spatch.

Signed-off-by: Michal Kubecek
Signed-off-by: David S. Miller

Michal Kubecek
2019-04-28 05:03:44 +0800
ae0be8de9 netlink: make nla_nest_start() add NLA_F_NESTED flag

Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().
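
The compatibility concern can be shown with a small userspace sketch. The flag values below match the NLA_F_* definitions in the Linux uapi netlink header; the `DEMO_`/`demo_` names are hypothetical so the example builds without kernel headers, and the functions only model the attribute type field, not a real message buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit positions as NLA_F_NESTED / NLA_F_NET_BYTEORDER. */
#define DEMO_NLA_F_NESTED        (1u << 15)
#define DEMO_NLA_F_NET_BYTEORDER (1u << 14)
#define DEMO_NLA_TYPE_MASK \
    ((uint16_t)~(DEMO_NLA_F_NESTED | DEMO_NLA_F_NET_BYTEORDER))

/* Old behaviour under its new name: the type is stored without the flag. */
static uint16_t demo_nest_start_noflag(uint16_t attrtype)
{
    return attrtype;
}

/* New behaviour: a thin wrapper that sets the nested flag. */
static uint16_t demo_nest_start(uint16_t attrtype)
{
    return demo_nest_start_noflag((uint16_t)(attrtype | DEMO_NLA_F_NESTED));
}

/* What a well-behaved parser does: mask the flags off before comparing. */
static uint16_t demo_nla_type(uint16_t nla_type)
{
    return nla_type & DEMO_NLA_TYPE_MASK;
}
```

A parser that compares the raw type field against 3 stops matching once the flag is set, while one that goes through `demo_nla_type()` keeps working. That is why existing callers were moved to the `_noflag` variant instead of silently gaining the flag.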

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek
Acked-by: Jiri Pirko
Acked-by: David Ahern
Signed-off-by: David S. Miller

Michal Kubecek
2019-04-28 05:03:44 +0800

26 Apr, 2019

1 commit

8b4483658 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Two easy cases of overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2019-04-26 11:52:29 +0800

23 Apr, 2019

2 commits

697cd36cd bridge: Fix possible use-after-free when deleting bridge port

When a bridge port is being deleted, do not dereference it later in
br_vlan_port_event() as it can result in a use-after-free [1] if the RCU
callback was executed before invoking the function.

[1]
[ 129.638551] ==================================================================
[ 129.646904] BUG: KASAN: use-after-free in br_vlan_port_event+0x53c/0x5fd
[ 129.654406] Read of size 8 at addr ffff8881e4aa1ae8 by task ip/483
[ 129.663008] CPU: 0 PID: 483 Comm: ip Not tainted 5.1.0-rc5-custom-02265-ga946bd73daac #1383
[ 129.672359] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[ 129.682484] Call Trace:
[ 129.685242] dump_stack+0xa9/0x10e
[ 129.689068] print_address_description.cold.2+0x9/0x25e
[ 129.694930] kasan_report.cold.3+0x78/0x9d
[ 129.704420] br_vlan_port_event+0x53c/0x5fd
[ 129.728300] br_device_event+0x2c7/0x7a0
[ 129.741505] notifier_call_chain+0xb5/0x1c0
[ 129.746202] rollback_registered_many+0x895/0xe90
[ 129.793119] unregister_netdevice_many+0x48/0x210
[ 129.803384] rtnl_delete_link+0xe1/0x140
[ 129.815906] rtnl_dellink+0x2a3/0x820
[ 129.844166] rtnetlink_rcv_msg+0x397/0x910
[ 129.868517] netlink_rcv_skb+0x137/0x3a0
[ 129.882013] netlink_unicast+0x49b/0x660
[ 129.900019] netlink_sendmsg+0x755/0xc90
[ 129.915758] ___sys_sendmsg+0x761/0x8e0
[ 129.966315] __sys_sendmsg+0xf0/0x1c0
[ 129.988918] do_syscall_64+0xa4/0x470
[ 129.993032] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 129.998696] RIP: 0033:0x7ff578104b58
...
[ 130.073811] Allocated by task 479:
[ 130.077633] __kasan_kmalloc.constprop.5+0xc1/0xd0
[ 130.083008] kmem_cache_alloc_trace+0x152/0x320
[ 130.088090] br_add_if+0x39c/0x1580
[ 130.092005] do_set_master+0x1aa/0x210
[ 130.096211] do_setlink+0x985/0x3100
[ 130.100224] __rtnl_newlink+0xc52/0x1380
[ 130.104625] rtnl_newlink+0x6b/0xa0
[ 130.108541] rtnetlink_rcv_msg+0x397/0x910
[ 130.113136] netlink_rcv_skb+0x137/0x3a0
[ 130.117538] netlink_unicast+0x49b/0x660
[ 130.121939] netlink_sendmsg+0x755/0xc90
[ 130.126340] ___sys_sendmsg+0x761/0x8e0
[ 130.130645] __sys_sendmsg+0xf0/0x1c0
[ 130.134753] do_syscall_64+0xa4/0x470
[ 130.138864] entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ 130.146195] Freed by task 0:
[ 130.149421] __kasan_slab_free+0x125/0x170
[ 130.154016] kfree+0xf3/0x310
[ 130.157349] kobject_put+0x1a8/0x4c0
[ 130.161363] rcu_core+0x859/0x19b0
[ 130.165175] __do_softirq+0x250/0xa26
[ 130.170956] The buggy address belongs to the object at ffff8881e4aa1ae8
which belongs to the cache kmalloc-1k of size 1024
[ 130.184972] The buggy address is located 0 bytes inside of
1024-byte region [ffff8881e4aa1ae8, ffff8881e4aa1ee8)

Fixes: 9c0ec2e7182a ("bridge: support binding vlan dev link state to vlan member bridge ports")
Signed-off-by: Ido Schimmel
Cc: Mike Manning
Acked-by: Nikolay Aleksandrov
Acked-by: Mike Manning
Signed-off-by: David S. Miller

Ido Schimmel
2019-04-23 13:17:47 +0800
acced9d2b Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree:

1) Add a selftest for icmp packet too big errors with conntrack, from
Florian Westphal.

2) Validate inner header in ICMP error message does not lie to us
in conntrack, also from Florian.

3) Initialize ct->timeout to calm down KASAN, from Alexander Potapenko.

4) Skip ICMP error messages from tunnels in IPVS, from Julian Anastasov.

5) Use a hash to expose conntrack and expectation ID, from Florian Westphal.

6) Prevent shift wrap in nft_chain_parse_hook(), from Dan Carpenter.

7) Fix broken ICMP ID randomization with NAT, also from Florian.

8) Remove WARN_ON in ebtables compat that is reached via syzkaller,
from Florian Westphal.

9) Fix broken timestamps since fb420d5d91c1 ("tcp/fq: move back to
CLOCK_MONOTONIC"), from Florian.

10) Fix logging of invalid packets in conntrack, from Andrei Vagin.
====================

Signed-off-by: David S. Miller

David S. Miller
2019-04-23 12:23:55 +0800

22 Apr, 2019

1 commit

7caa56f00 netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON

Hitting the WARN_ON means userspace gave us a ruleset where there is some
other data after the ebtables target but before the beginning of the next rule.

Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support")
Reported-by: syzbot+659574e7bcc7f7eb4df7@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2019-04-22 16:34:30 +0800

20 Apr, 2019

3 commits

8e1acd4fc bridge: update vlan dev link state for bridge netdev changes

If vlan bridge binding is enabled, then the link state of a vlan device
that is an upper device of the bridge tracks the state of bridge ports
that are members of that vlan. But this can only be done when the link
state of the bridge is up. If it is down, then the link state of the
vlan devices must also be down. This is to maintain existing behavior
for when STP is enabled and there are no live ports, in which case the
link state for the bridge and any vlan devices is down.

Signed-off-by: Mike Manning
Acked-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Mike Manning
2019-04-20 04:58:17 +0800
80900acd3 bridge: update vlan dev state when port added to or deleted from vlan

If vlan bridge binding is enabled, then the link state of a vlan device
that is an upper device of the bridge should track the state of bridge
ports that are members of that vlan. So if a bridge port becomes or
stops being a member of a vlan, then update the link state of the
vlan device if necessary.

Signed-off-by: Mike Manning
Acked-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Mike Manning
2019-04-20 04:58:17 +0800
9c0ec2e71 bridge: support binding vlan dev link state to vlan member bridge ports

In the case of vlan filtering on bridges, the bridge may also have the
corresponding vlan devices as upper devices. A vlan bridge binding mode
is added to allow the link state of the vlan device to track only the
state of the subset of bridge ports that are also members of the vlan,
rather than that of all bridge ports. This mode is set with a vlan flag
rather than a bridge sysfs so that the 8021q module is aware that it
should not set the link state for the vlan device.

If bridge vlan is configured, the bridge device event handling results
in the link state for an upper device being set, if it is a vlan device
with the vlan bridge binding mode enabled. This also sets a
vlan_bridge_binding flag so that subsequent UP/DOWN/CHANGE events for
the ports in that bridge result in a link state update of the vlan
device if required.

The link state of the vlan device is up if there is at least one bridge
port that is a vlan member that is admin & oper up, otherwise its oper
state is IF_OPER_LOWERLAYERDOWN.
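
The rule in the last paragraph can be written down as a small predicate. This is a userspace sketch only: `struct demo_port`, `enum demo_oper_state` and `demo_vlan_dev_state()` are hypothetical simplifications and do not reflect the bridge's real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a bridge port. */
struct demo_port {
    bool admin_up;
    bool oper_up;
    bool vlan_member;  /* member of the vlan the upper device represents */
};

enum demo_oper_state { DEMO_OPER_UP, DEMO_OPER_LOWERLAYERDOWN };

/* Up if at least one vlan member port is both admin and oper up. */
static enum demo_oper_state
demo_vlan_dev_state(const struct demo_port *ports, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ports[i].vlan_member && ports[i].admin_up && ports[i].oper_up)
            return DEMO_OPER_UP;
    }
    return DEMO_OPER_LOWERLAYERDOWN;
}
```

Ports that are up but not members of the vlan do not count, which is the difference from tracking all bridge ports.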

Signed-off-by: Mike Manning
Acked-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Mike Manning
2019-04-20 04:58:17 +0800

18 Apr, 2019

1 commit

6b0a7f84e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflict resolution of af_smc.c from Stephen Rothwell.

Signed-off-by: David S. Miller

David S. Miller
2019-04-18 02:26:25 +0800

17 Apr, 2019

2 commits

600bea7db net: bridge: fix netlink export of vlan_stats_per_port option

Since the introduction of the vlan_stats_per_port option the netlink
export of it has been broken since I made a typo and used the ifla
attribute instead of the bridge option to retrieve its state.
Sysfs export is fine, only netlink export has been affected.

Fixes: 9163a0fc1f0c0 ("net: bridge: add support for per-port vlan stats")
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2019-04-17 12:40:29 +0800
3b2e2904d net: bridge: fix per-port af_packet sockets

When the commit below was introduced it changed two visible things:
- the skb was no longer passed through the protocol handlers with the
original device
- the skb was passed up the stack with skb->dev = bridge

The first change broke af_packet sockets on bridge ports. For example we
use them for hostapd which listens for ETH_P_PAE packets on the ports.
We discussed two possible fixes:
- create a clone and pass it through NF_HOOK(), act on the original skb
based on the result
- somehow signal to the caller from the okfn() that it was called,
meaning the skb is ok to be passed, which this patch is trying to
implement via returning 1 from the bridge link-local okfn()

Note that we rely on the fact that NF_QUEUE/STOLEN would return 0 and
drop/error would return < 0 thus the okfn() is called only when the
return was 1, so we signal to the caller that it was called by preserving
the return value from nf_hook().
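
The return-value convention described above can be modelled as follows. This is a hedged userspace sketch: the verdict enum and all `demo_` functions are hypothetical, and the only behaviour assumed is what the text states (queue/stolen yield 0, drop/error yield a negative value, and the okfn runs only when the hook returned 1).

```c
#include <assert.h>
#include <stdbool.h>

enum demo_verdict { DEMO_ACCEPT, DEMO_DROP, DEMO_QUEUE, DEMO_STOLEN };

/* Models the hook traversal result for a given verdict. */
static int demo_nf_hook(enum demo_verdict v)
{
    switch (v) {
    case DEMO_ACCEPT:
        return 1;   /* caller should run the okfn */
    case DEMO_QUEUE:
    case DEMO_STOLEN:
        return 0;   /* skb no longer belongs to the caller */
    default:
        return -1;  /* dropped or error */
    }
}

static int demo_okfn_calls;

/* Link-local okfn: returning 1 signals to the caller that it ran. */
static int demo_local_okfn(void)
{
    demo_okfn_calls++;
    return 1;
}

static int demo_hook_wrapper(enum demo_verdict v)
{
    int ret = demo_nf_hook(v);

    if (ret == 1)
        ret = demo_local_okfn();
    return ret;
}

/* Only a return of 1 means the original skb may be passed up the stack. */
static bool demo_may_pass_up(enum demo_verdict v)
{
    return demo_hook_wrapper(v) == 1;
}
```

Because the okfn returns the same value that triggered it, the caller can tell "okfn ran" apart from "queued, stolen or dropped" without any extra state.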

Fixes: 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2019-04-17 11:30:40 +0800

16 Apr, 2019

1 commit

dc2f4189d bridge: only include nf_queue.h if needed

After merging the netfilter-next tree, today's linux-next build (powerpc
ppc44x_defconfig) failed like this:

In file included from net/bridge/br_input.c:19:
include/net/netfilter/nf_queue.h:16:23: error: field 'state' has incomplete type
struct nf_hook_state state;
^~~~~

Fixes: 971502d77faa ("bridge: netfilter: unroll NF_HOOK helper in bridge input path")
Signed-off-by: Stephen Rothwell
Signed-off-by: Pablo Neira Ayuso

Stephen Rothwell
2019-04-16 00:47:36 +0800

12 Apr, 2019

4 commits

223fd0adf bridge: broute: make broute a real ebtables table

This makes broute a normal ebtables table, hooking at PREROUTING.
The broute hook is removed.

It uses skb->cb to signal to bridge rx handler that the skb should be
routed instead of being bridged.

This change is backwards compatible with ebtables as no userspace visible
parts are changed.

This means we can also remove the !ops test in ebt_register_table,
it was only there for broute table sake.

Signed-off-by: Florian Westphal
Acked-by: David S. Miller
Acked-by: Nikolay Aleksandrov
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2019-04-12 07:47:50 +0800
971502d77 bridge: netfilter: unroll NF_HOOK helper in bridge input path

Replace NF_HOOK() based invocation of the netfilter hooks with a private
copy of nf_hook_slow().

This copy has one difference: it can return the rx handler value expected
by the stack, i.e. RX_HANDLER_CONSUMED or RX_HANDLER_PASS.

This is needed by the next patch to invoke the ebtables
"broute" table via the standard netfilter hooks rather than the custom
"br_should_route_hook" indirection that is used now.

When the skb is to be "brouted", we must return RX_HANDLER_PASS from the
bridge rx input handler, but there is no way to indicate this via
NF_HOOK(), unless perhaps by some hack such as exposing bridge_cb in the
netfilter core or a percpu flag.

   text    data     bss     dec filename
   3369      56       0    3425 net/bridge/br_input.o.before
   3458      40       0    3498 net/bridge/br_input.o.after

This allows removal of the "br_should_route_hook" in the next patch.

Signed-off-by: Florian Westphal
Acked-by: David S. Miller
Acked-by: Nikolay Aleksandrov
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2019-04-12 07:47:39 +0800
f12064d1b bridge: reduce size of input cb to 16 bytes

Reduce size of br_input_skb_cb from 24 to 16 bytes by
using bitfield for those values that can only be 0 or 1.

igmp is the igmp type value, so it needs to be at least u8.

Furthermore, the bridge currently relies on step-by-step initialization
of br_input_skb_cb fields as the skb passes through the stack.

Explicitly zero out the bridge input cb instead, this avoids having to
review/validate that no BR_INPUT_SKB_CB(skb)->foo test can see a
'random' value from previous protocol cb.

AFAICS all current fields are always set up before they are read again,
so this is not a bug fix.
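
Both ideas, packing 0/1 values into bitfields and zeroing the control block up front, can be illustrated in userspace. The two structs below are hypothetical layouts invented for this sketch; they are not the real br_input_skb_cb, and exact sizes depend on the ABI.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical "before" layout: a full int or byte per flag. */
struct demo_cb_wide {
    void *dev;
    int igmp;
    int mrouters_only;
    unsigned char proxyarp_replied;
    unsigned char src_port_isolated;
};

/* Hypothetical "after" layout: 0/1 values packed into single bits. */
struct demo_cb_packed {
    void *dev;
    unsigned char igmp;  /* a type value, so it keeps a full byte */
    unsigned int mrouters_only:1;
    unsigned int proxyarp_replied:1;
    unsigned int src_port_isolated:1;
};

/* Zero everything first so no field can carry a stale value left behind
 * by whoever used this memory before. */
static void demo_cb_init(struct demo_cb_packed *cb, void *dev)
{
    memset(cb, 0, sizeof(*cb));
    cb->dev = dev;
}
```

On common ABIs the packed layout is smaller than the wide one, and after `demo_cb_init()` every flag reads as 0 regardless of what the buffer held before.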

Signed-off-by: Florian Westphal
Acked-by: David S. Miller
Acked-by: Nikolay Aleksandrov
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2019-04-12 07:47:27 +0800
c5b493ce1 net: bridge: multicast: use rcu to access port list from br_multicast_start_querier

br_multicast_start_querier() walks over the port list but it can be
called from a timer with only multicast_lock held which doesn't protect
the port list, so use RCU to walk over it.

Fixes: c83b8fab06fc ("bridge: Restart queries when last querier expires")
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2019-04-12 02:13:51 +0800

08 Apr, 2019

1 commit

8f0db0180 rhashtable: use bit_spin_locks to protect hash bucket.

This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
bucket pointer to lock the hash chain for that bucket.

The benefits of a bit spin_lock are:
- no need to allocate a separate array of locks.
- no need to have a configuration option to guide the
choice of the size of this array
- locking cost is often a single test-and-set in a cache line
that will have to be loaded anyway. When inserting at, or removing
from, the head of the chain, the unlock is free - writing the new
address in the bucket head implicitly clears the lock bit.
For __rhashtable_insert_fast() we ensure this always happens
when adding a new key.
- even when locking costs 2 updates (lock and unlock), they are
in a cacheline that needs to be read anyway.

The cost of using a bit spin_lock is a little bit of code complexity,
which I think is quite manageable.

Bit spin_locks are sometimes inappropriate because they are not fair -
if multiple CPUs repeatedly contend for the same lock, one CPU can
easily be starved. This is not a credible situation with rhashtable.
Multiple CPUs may want to repeatedly add or remove objects, but they
will typically do so at different buckets, so they will attempt to
acquire different locks.

As we have more bit-locks than we previously had spinlocks (by at
least a factor of two) we can expect slightly less contention to
go with the slightly better cache behavior and reduced memory
consumption.

To enhance type checking, a new struct is introduced to represent the
pointer plus lock-bit that is stored in the bucket-table. This is
"struct rhash_lock_head" and is empty. A pointer to this needs to be
cast to either an unsigned long, or a "struct rhash_head *" to be useful.
Variables of this type are most often called "bkt".

Previously "pprev" would sometimes point to a bucket, and sometimes a
->next pointer in an rhash_head. As these are now different types,
pprev is NULL when it would have pointed to the bucket. In that case,
'bkt' is used, together with correct locking protocol.
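
The idea of folding a lock bit into the bucket pointer can be sketched in userspace with C11 atomics. This swaps the kernel's bit_spin_lock for plain atomic operations, and every name here (`demo_bucket`, `demo_node`, the `demo_bucket_*` helpers) is hypothetical. It follows the description above in using BIT(1) and relies on nodes being aligned so that bit is always clear in a valid pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_LOCK_BIT ((uintptr_t)1 << 1)  /* BIT(1) of the bucket word */

struct demo_node {
    struct demo_node *next;
    int key;
};

/* A bucket is one word: the head pointer with the lock bit folded in. */
typedef _Atomic uintptr_t demo_bucket;

static bool demo_bucket_trylock(demo_bucket *bkt)
{
    uintptr_t old = atomic_fetch_or_explicit(bkt, DEMO_LOCK_BIT,
                                             memory_order_acquire);
    return !(old & DEMO_LOCK_BIT);
}

static void demo_bucket_lock(demo_bucket *bkt)
{
    while (!demo_bucket_trylock(bkt))
        ;  /* spin; there is no fairness guarantee */
}

static void demo_bucket_unlock(demo_bucket *bkt)
{
    atomic_fetch_and_explicit(bkt, ~DEMO_LOCK_BIT, memory_order_release);
}

/* Read the head pointer with the lock bit masked off. */
static struct demo_node *demo_bucket_head(demo_bucket *bkt)
{
    uintptr_t v = atomic_load_explicit(bkt, memory_order_acquire);

    return (struct demo_node *)(v & ~DEMO_LOCK_BIT);
}

/* Insert at the head while holding the lock. Storing the new, aligned
 * pointer clears the lock bit, so the store itself is the unlock. */
static void demo_insert_head_and_unlock(demo_bucket *bkt, struct demo_node *n)
{
    assert(((uintptr_t)n & DEMO_LOCK_BIT) == 0);
    n->next = demo_bucket_head(bkt);
    atomic_store_explicit(bkt, (uintptr_t)n, memory_order_release);
}
```

The head-insert path shows the "unlock is free" case from the list of benefits: one store publishes the new head and releases the lock. Any other path has to call `demo_bucket_unlock()` explicitly.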

Signed-off-by: NeilBrown
Signed-off-by: David S. Miller

NeilBrown
2019-04-08 10:12:12 +0800

06 Apr, 2019

1 commit

f83f71519 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Minor comment merge conflict in mlx5.

Staging driver has a fixup due to the skb->xmit_more changes
in 'net-next', but was removed in 'net'.

Signed-off-by: David S. Miller

David S. Miller
2019-04-06 05:14:19 +0800

05 Apr, 2019

4 commits

e177163d3 net: bridge: mcast: remove unused br_ip_equal function

Since the mcast conversion to rhashtable this function has been unused, so
remove it.

Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2019-04-05 08:53:56 +0800
1515a63fc net: bridge: always clear mcast matching struct on reports and leaves

We need to be careful and always zero the whole br_ip struct when it is
used for matching since the rhashtable change. This patch fixes all the
places which didn't properly clear it which in turn might've caused
mismatches.
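
Why the whole struct has to be zeroed can be seen in a small userspace sketch. `struct demo_ip_key` is a hypothetical key shaped loosely like an address union plus protocol and vlan id; it is not the real br_ip layout. The assumption is that the hash table compares keys byte by byte, so bytes the caller never wrote still take part in the comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key: IPv4 or IPv6 address in a union, plus proto and vid. */
struct demo_ip_key {
    union {
        uint32_t ip4;
        uint8_t ip6[16];
    } u;
    uint16_t proto;
    uint16_t vid;
};

/* Byte-wise comparison over the whole struct. */
static int demo_key_equal(const struct demo_ip_key *a,
                          const struct demo_ip_key *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}

/* Correct: clear everything first, then set the fields that matter. */
static void demo_key_init_v4(struct demo_ip_key *k, uint32_t ip4, uint16_t vid)
{
    memset(k, 0, sizeof(*k));
    k->u.ip4 = ip4;
    k->proto = 0x0800;  /* ETH_P_IP */
    k->vid = vid;
}

/* Buggy: the 12 union bytes not covered by ip4 keep their old contents. */
static void demo_key_fill_v4_nozero(struct demo_ip_key *k, uint32_t ip4,
                                    uint16_t vid)
{
    k->u.ip4 = ip4;
    k->proto = 0x0800;
    k->vid = vid;
}
```

A lookup key built with `demo_key_fill_v4_nozero()` on a buffer holding stale data will typically fail to match a stored key with the same address, which is the kind of mismatch this patch removes.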

Thanks for the great bug report with reproducing steps and bisection.

Steps to reproduce (from the bug report):
ip link add br0 type bridge mcast_querier 1
ip link set br0 up

ip link add v2 type veth peer name v3
ip link set v2 master br0
ip link set v2 up
ip link set v3 up
ip addr add 3.0.0.2/24 dev v3

ip netns add test
ip link add v1 type veth peer name v1 netns test
ip link set v1 master br0
ip link set v1 up
ip -n test link set v1 up
ip -n test addr add 3.0.0.1/24 dev v1

# Multicast receiver
ip netns exec test socat UDP4-RECVFROM:5588,ip-add-membership=224.224.224.224:3.0.0.1,fork -

# Multicast sender
echo hello | nc -u -s 3.0.0.2 224.224.224.224 5588

Reported-by: liam.mcbirnie@boeing.com
Fixes: 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable")
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2019-04-05 08:52:40 +0800
8dc350202 net: bridge: optimize backup_port fdb convergence

We can optimize the fdb convergence when a backup_port is present by not
immediately flushing the entries of the stopped port since traffic for
those entries will flow towards the backup_port.

There are 2 cases specifically that benefit most:
- when the stopped port comes up before the entries expire by themselves
- when there's an external entry refresh and they're kept while the
backup_port is operating (e.g. mlag)

Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2019-04-05 08:39:47 +0800
847d44efa net: bridge: update multicast stats from maybe_deliver()

Simplify this code by updating bridge multicast stats from
maybe_deliver().

Note that commit 6db6f0eae605 ("bridge: multicast to unicast"), in case
the port flag BR_MULTICAST_TO_UNICAST is set, never updates the previous
port pointer, therefore it is always going to be different from the
existing port in this deduplicated list iteration.

Signed-off-by: Pablo Neira Ayuso
Acked-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Pablo Neira Ayuso
2019-04-05 01:49:27 +0800

30 Mar, 2019

2 commits

35f861e3c net: bridge: use netif_is_bridge_port()

Replace the br_port_exists() macro with its twin from netdevice.h

CC: Roopa Prabhu
CC: Nikolay Aleksandrov
Signed-off-by: Julian Wiedmann
Acked-by: Roopa Prabhu
Signed-off-by: David S. Miller

Julian Wiedmann
2019-03-30 04:48:40 +0800
3616d08bc ipv6: Move ipv6 stubs to a separate header file

The number of stubs is growing and has nothing to do with addrconf.
Move the definition of the stubs to a separate header file and update
users. In the move, drop the vxlan specific comment before ipv6_stub.

Code move only; no functional change intended.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2019-03-30 01:53:45 +0800

28 Mar, 2019

1 commit

356d71e00 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

David S. Miller
2019-03-28 08:37:58 +0800

21 Mar, 2019

1 commit

1bfe45f4a net: bridge: use eth_broadcast_addr() to assign broadcast address

This patch uses eth_broadcast_addr() to assign the broadcast address
instead of memset().
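
For illustration, the two forms are equivalent in effect. The helper below is a userspace stand-in written for this sketch (`demo_eth_broadcast_addr()` and `DEMO_ETH_ALEN` are hypothetical names); it assumes the kernel helper simply fills the six address bytes with 0xff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6  /* bytes in an Ethernet MAC address */

/* Named helper: the intent (broadcast address) is visible at the call site. */
static void demo_eth_broadcast_addr(uint8_t *addr)
{
    memset(addr, 0xff, DEMO_ETH_ALEN);
}
```

The gain is readability: the call site states what the address is, rather than how its bytes are filled.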

Signed-off-by: Mao Wenan
Signed-off-by: David S. Miller

Mao Wenan
2019-03-21 02:02:47 +0800

18 Mar, 2019

1 commit

e166e4fda netfilter: bridge: set skb transport_header before entering NF_INET_PRE_ROUTING

Since commit 21d1196a35f5 ("ipv4: set transport header earlier"),
skb->transport_header has always been set before entering INET
netfilter. This patch sets skb->transport_header for the bridge
before entering INET netfilter via bridge-nf-call-iptables.

It also fixes an issue where sctp_error() could not compute the correct
checksum due to an unset skb->transport_header.

Fixes: e6d8b64b34aa ("net: sctp: fix and consolidate SCTP checksumming code")
Reported-by: Li Shuang
Suggested-by: Pablo Neira Ayuso
Signed-off-by: Xin Long
Acked-by: Neil Horman
Acked-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Xin Long
2019-03-18 23:21:54 +0800

03 Mar, 2019

1 commit

4e7df119d Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Add .release_ops to properly unroll .select_ops, use it from nft_compat.
After this change, we can remove list of extensions too to simplify this
codebase.

2) Update amanda conntrack helper to support v3.4, from Florian Tham.

3) Get rid of the obsolete BUGPRINT macro in ebtables, from
Florian Westphal.

4) Merge IPv4 and IPv6 masquerading infrastructure into one single module.
From Florian Westphal.

5) Patchset to remove nf_nat_l3proto structure to get rid of
indirections, from Florian Westphal.

6) Skip unnecessary conntrack timeout updates in case the value is
still the same, also from Florian Westphal.

7) Remove unnecessary 'fall through' comments in empty switch cases,
from Li RongQing.

8) Fix lookup to fixed size hashtable sets on big endian with 32-bit keys.

9) Incorrect logic to deactivate path of fixed size hashtable sets,
element was being tested to self.

10) Remove nft_hash_key(), the bitmap set is always selected for 16-bit
keys.

11) Use boolean whenever possible in IPVS codebase, from Andrea Claudi.

12) Enter close state in conntrack if RST matches exact sequence number,
from Florian Westphal.

13) Initialize dst_cache in tunnel extension, from wenxu.

14) Pass protocol as u16 to xt_check_match and xt_check_target, from
Li RongQing.

15) SCTP header is granted to be in a linear area from IPVS NAT handler,
from Xin Long.

16) Don't steal packets coming from slave VRF device from the
ip_sabotage_in() path, from David Ahern.

17) Fix unsafe update of basechain stats, from Li RongQing.

18) Make sure CONNTRACK_LOCKS is power of 2 to let compiler optimize
modulo operation as bitwise AND, from Li RongQing.

19) Use device_attribute instead of internal definition in the IDLETIMER
target, from Sami Tolvanen.

20) Merge redir, masq and IPv4/IPv6 NAT chain types, from Florian Westphal.
====================

Signed-off-by: David S. Miller

David S. Miller
2019-03-03 06:01:04 +0800

01 Mar, 2019

2 commits

cd6428988 netfilter: bridge: Don't sabotage nf_hook calls for an l3mdev slave

Followup to a173f066c7cf ("netfilter: bridge: Don't sabotage nf_hook
calls from an l3mdev"). Some packets (e.g., ndisc) do not have the skb
device flipped to the l3mdev (e.g., VRF) device. Update ip_sabotage_in
to not drop packets for slave devices too. Currently, neighbor
solicitation packets for 'dev -> bridge (addr) -> vrf' setups are getting
dropped. This patch enables IPv6 communications for bridges with an
address that are enslaved to a VRF.

Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device")
Signed-off-by: David Ahern
Signed-off-by: Pablo Neira Ayuso

David Ahern
2019-03-01 21:28:45 +0800

11d4dd0b2 netfilter: convert the proto argument from u8 to u16 ... Browse Code »

The proto field in struct xt_match and struct xt_target is u16, but the
proto argument of xt_check_target/match is u8, which causes truncation.
This is harmless for IP packets, since the IP proto is u8.

If an ebtables match/target has a proto that is u16, the truncation
causes the check to fail.

Also convert be16 to short in bridge/netfilter/ebtables.c.

Signed-off-by: Zhang Yu
Signed-off-by: Li RongQing
Signed-off-by: Pablo Neira Ayuso

Li RongQing
2019-03-01 21:28:43 +0800
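
The truncation described in the commit above can be illustrated with a small user-space sketch. It is not the kernel's xt_check_match() code; proto_matches_u8() and proto_matches_u16() are hypothetical helpers, and 0x0806 (the ARP EtherType) is used only as an example of a protocol value that does not fit in 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Example 16-bit protocol value: the ARP EtherType. */
#define DEMO_ETH_P_ARP 0x0806u

/* Narrowing to 8 bits keeps only the low byte of the value. */
static_assert((uint8_t)DEMO_ETH_P_ARP == 0x06,
              "u8 narrowing drops the high byte");

/* Models the old check: the caller's proto is narrowed to u8 on entry. */
static int proto_matches_u8(uint16_t match_proto, uint8_t proto)
{
    return match_proto == proto;
}

/* Models the fixed check: the full 16-bit proto is compared. */
static int proto_matches_u16(uint16_t match_proto, uint16_t proto)
{
    return match_proto == proto;
}
```

With an 8-bit value such as IP protocol 6 (TCP), both helpers agree. With 0x0806, the u8 variant compares 0x0806 against 0x06 and reports a mismatch, while the u16 variant matches.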

28 Feb, 2019

1 commit

d45224d60 net: switchdev: Replace port attr set SDO with a notification ... Browse Code »

Drop switchdev_ops.switchdev_port_attr_set. Drop the uses of this field
from all clients, which were migrated to use switchdev notification in
the previous patches.

Add a new function switchdev_port_attr_notify() that sends the switchdev
notifications SWITCHDEV_PORT_ATTR_SET and calls the blocking (process)
notifier chain.

We have one odd case within net/bridge/br_switchdev.c with the
SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier that
requires executing from atomic context; we deal with that one
specifically.

Drop __switchdev_port_attr_set() and update switchdev_port_attr_set()
likewise.

Signed-off-by: Florian Fainelli
Reviewed-by: Ido Schimmel
Signed-off-by: David S. Miller

Florian Fainelli
2019-02-28 04:39:56 +0800
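
The move from an ops callback to a notifier chain, as described in the commit above, can be pictured with a simplified user-space model. This is only a sketch under stated assumptions: every demo_* name, the event id and the return codes are hypothetical and are not the kernel's notifier or switchdev API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event id and return codes; not the kernel's definitions. */
#define DEMO_PORT_ATTR_SET 1UL
#define DEMO_NOTIFY_DONE 0
#define DEMO_NOTIFY_STOP 1
#define DEMO_MAX_LISTENERS 8

typedef int (*demo_notifier_fn)(unsigned long event, void *data);

struct demo_chain {
    demo_notifier_fn fns[DEMO_MAX_LISTENERS];
    size_t count;
};

/* Append a listener; returns 0 on success, -1 if the chain is full. */
static int demo_chain_register(struct demo_chain *chain, demo_notifier_fn fn)
{
    assert(chain != NULL && fn != NULL);
    if (chain->count >= DEMO_MAX_LISTENERS)
        return -1;
    chain->fns[chain->count++] = fn;
    return 0;
}

/* Call listeners in registration order until one asks to stop. */
static int demo_chain_call(struct demo_chain *chain, unsigned long event,
                           void *data)
{
    int ret = DEMO_NOTIFY_DONE;

    for (size_t i = 0; i < chain->count; i++) {
        ret = chain->fns[i](event, data);
        if (ret == DEMO_NOTIFY_STOP)
            break;
    }
    return ret;
}

/* Example listener: counts the attribute-set events it sees. */
static int demo_count_attr_set(unsigned long event, void *data)
{
    if (event == DEMO_PORT_ATTR_SET)
        (*(int *)data)++;
    return DEMO_NOTIFY_DONE;
}

/* Register the listener twice, send one event, return the handled count. */
static int demo_notify_once(void)
{
    struct demo_chain chain = { .count = 0 };
    int handled = 0;

    demo_chain_register(&chain, demo_count_attr_set);
    demo_chain_register(&chain, demo_count_attr_set);
    demo_chain_call(&chain, DEMO_PORT_ATTR_SET, &handled);
    return handled;
}
```

The point of the design is that the sender no longer needs a per-device function pointer: any number of listeners can subscribe, and each one decides whether the event concerns it.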

27 Feb, 2019

1 commit

d824548da netfilter: ebtables: remove BUGPRINT messages ... Browse Code »

These messages are frequently triggered by syzkaller, so remove them.

ebtables userspace should never trigger any of these, so there is little
value in making them pr_debug (or ratelimited).

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2019-02-27 17:47:57 +0800

25 Feb, 2019

1 commit

70f352261 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Three conflicts, one of which, for marvell10g.c, is non-trivial and
requires some follow-up from Heiner or someone else.

The issue is that Heiner converted the marvell10g driver over to
use the generic c45 code as much as possible.

However, in 'net' a bug fix appeared which makes sure that a new
local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
is cleared.

Signed-off-by: David S. Miller

David S. Miller
2019-02-25 04:06:19 +0800
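
Clearing a bit mask in a 16-bit register value, as the bug fix mentioned above does, can be sketched as follows. The mask value 0x01e0 is the one quoted in the commit message; DEMO_ADV_NBT_MASK and clear_nbt_bits() are hypothetical names, and this is not the marvell10g driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Mask value quoted in the commit message; the macro name is hypothetical. */
#define DEMO_ADV_NBT_MASK 0x01e0u

/* The mask must fit in a 16-bit register value. */
static_assert((DEMO_ADV_NBT_MASK & ~0xffffu) == 0,
              "mask must fit in 16 bits");

/* Return the register value with every masked bit cleared. */
static uint16_t clear_nbt_bits(uint16_t reg)
{
    return (uint16_t)(reg & (uint16_t)~DEMO_ADV_NBT_MASK);
}
```

Bits outside the mask are left untouched, so a value such as 0x1001 passes through unchanged, while 0xffff becomes 0xfe1f.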

24 Feb, 2019

1 commit

278e2148c Revert "bridge: do not add port to router list when receives query with source 0.0.0.0" ... Browse Code »

This reverts commit 5a2de63fd1a5 ("bridge: do not add port to router list
when receives query with source 0.0.0.0") and commit 0fe5119e267f ("net:
bridge: remove ipv6 zero address check in mcast queries")

The reason is that RFC 4541 is not a standard but only suggestive.
Currently we elect 0.0.0.0 as Querier if there is no IP address configured
on the bridge. If we do not add the port which receives a query with source
0.0.0.0 to the router list, the IGMP reports cannot be forwarded to the
Querier, and IGMP data cannot be forwarded to its destination either.

As Nikolay suggested, revert this change first and add a boolopt API
to disable non-zero election in the future if needed.

Reported-by: Linus Lüssing
Reported-by: Sebastian Gottschall
Fixes: 5a2de63fd1a5 ("bridge: do not add port to router list when receives query with source 0.0.0.0")
Fixes: 0fe5119e267f ("net: bridge: remove ipv6 zero address check in mcast queries")
Signed-off-by: Hangbin Liu
Acked-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Hangbin Liu
2019-02-24 10:36:06 +0800

22 Feb, 2019

1 commit

1ef076448 net: bridge: Stop calling switchdev_port_attr_get() ... Browse Code »

Now that all switchdev drivers have been converted to check the
SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS flags and report flags that they
do not support accordingly, we can migrate the bridge code to try to set
that attribute first, check the results and then do the actual setting.

Signed-off-by: Florian Fainelli
Reviewed-by: Ido Schimmel
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Florian Fainelli
2019-02-22 06:55:14 +0800