09 Aug, 2018
1 commit
-
commit 91874ecf32e41b5d86a4cb9d60e0bee50d828058 upstream.
It's legal to have 64 groups for netlink_sock.
As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
only to first 32 groups.The check for correctness of .bind() userspace supplied parameter
is done by applying mask made from ngroups shift. Which broke Android
as they have 64 groups and the shift for mask resulted in an overflow.Fixes: 61f4b23769f0 ("netlink: Don't shift with UB on nlk->ngroups")
Cc: "David S. Miller"
Cc: Herbert Xu
Cc: Steffen Klassert
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Reported-and-Tested-by: Nathan Chancellor
Signed-off-by: Dmitry Safonov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
06 Aug, 2018
1 commit
-
[ Upstream commit bc5b6c0b62b932626a135f516a41838c510c6eba ]
'protocol' is a user-controlled value, so sanitize it after the bounds
check to avoid using it for speculative out-of-bounds access to arrays
indexed by it.This addresses the following accesses detected with the help of smatch:
* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
spectre issue 'nlk_cb_mutex_keys' [w]* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
spectre issue 'nlk_cb_mutex_key_strings' [w]* net/netlink/af_netlink.c:685 netlink_create() warn: potential spectre
issue 'nl_table' [w] (local cap)Cc: Josh Poimboeuf
Signed-off-by: Jeremy Cline
Reviewed-by: Josh Poimboeuf
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
03 Aug, 2018
2 commits
-
[ Upstream commit 61f4b23769f0cc72ae62c9a81cf08f0397d40da8 ]
On i386 nlk->ngroups might be 32 or 0. Which leads to UB, resulting in
hang during boot.
Check for 0 ngroups and use (unsigned long long) as a type to shift.Fixes: 7acf9d4237c4 ("netlink: Do not subscribe to non-existent groups").
Reported-by: kernel test robot
Signed-off-by: Dmitry Safonov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 7acf9d4237c46894e0fa0492dd96314a41742e84 ]
Make ABI more strict about subscribing to group > ngroups.
Code doesn't check for that and it looks bogus.
(one can subscribe to non-existing group)
Still, it's possible to bind() to all possible groups with (-1)Cc: "David S. Miller"
Cc: Herbert Xu
Cc: Steffen Klassert
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
16 May, 2018
1 commit
-
commit 6091f09c2f79730d895149bcfe3d66140288cd0e upstream.
syzbot reported :
BUG: KMSAN: uninit-value in ffs arch/x86/include/asm/bitops.h:432 [inline]
BUG: KMSAN: uninit-value in netlink_sendmsg+0xb26/0x1310 net/netlink/af_netlink.c:1851Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
12 Apr, 2018
1 commit
-
[ Upstream commit 7880287981b60a6808f39f297bb66936e8bdf57a ]
KMSAN reports use of uninitialized memory in the case when |alen| is
smaller than sizeof(struct sockaddr_nl), and therefore |nladdr| isn't
fully copied from the userspace.Signed-off-by: Alexander Potapenko
Fixes: 1da177e4c3f41524 ("Linux-2.6.12-rc2")
Reviewed-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
09 Mar, 2018
1 commit
-
[ Upstream commit b87b6194be631c94785fe93398651e804ed43e28 ]
Before, if cb->start() failed, the module reference would never be put,
because cb->cb_running is intentionally false at this point. Users are
generally annoyed by this because they can no longer unload modules that
leak references. Also, it may be possible to tediously wrap a reference
counter back to zero, especially since module.c still uses atomic_inc
instead of refcount_inc.This patch expands the error path to simply call module_put if
cb->start() fails.Fixes: 41c87425a1ac ("netlink: do not set cb_running if dump's start() errs")
Signed-off-by: Jason A. Donenfeld
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
31 Jan, 2018
2 commits
-
[ Upstream commit cd443f1e91ca600a092e780e8250cd6a2954b763 ]
Move up the extack reset/initialization in netlink_rcv_skb, so that
those 'goto ack' will not skip it. Otherwise, later on netlink_ack
may use the uninitialized extack and cause kernel crash.Fixes: cbbdf8433a5f ("netlink: extack needs to be reset each time through loop")
Reported-by: syzbot+03bee3680a37466775e7@syzkaller.appspotmail.com
Signed-off-by: Xin Long
Acked-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit cbbdf8433a5f117b1a2119ea30fc651b61ef7570 ]
syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value.
The problem is that netlink_rcv_skb loops over the skb repeatedly invoking
the callback and without resetting the extack leaving potentially stale
data. Initializing each time through avoids the WARN_ON.Fixes: 2d4bc93368f5a ("netlink: extended ACK reporting")
Reported-by: syzbot+315fa6766d0f7c359327@syzkaller.appspotmail.com
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
03 Jan, 2018
1 commit
-
[ Upstream commit 93c647643b48f0131f02e45da3bd367d80443291 ]
Currently, a nlmon link inside a child namespace can observe systemwide
netlink activity. Filter the traffic so that nlmon can only sniff
netlink messages from its own netns.Test case:
vpnns -- bash -c "ip link add nlmon0 type nlmon; \
ip link set nlmon0 up; \
tcpdump -i nlmon0 -q -w /tmp/nlmon.pcap -U" &
sudo ip xfrm state add src 10.1.1.1 dst 10.1.1.2 proto esp \
spi 0x1 mode transport \
auth sha1 0x6162633132330000000000000000000000000000 \
enc aes 0x00000000000000000000000000000000
grep --binary abc123 /tmp/nlmon.pcapSigned-off-by: Kevin Cernekee
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
24 Nov, 2017
1 commit
-
[ Upstream commit 0642840b8bb008528dbdf929cec9f65ac4231ad0 ]
The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.This patch thus saves the NLMSG_DONE for an additional message, for the
case that things are dangerously close to the brim. This requires
keeping track of the errno from ->dump() across calls.Signed-off-by: Jason A. Donenfeld
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
18 Oct, 2017
1 commit
-
It seems that it's possible to toggle NETLINK_F_EXT_ACK
through setsockopt() while another thread/CPU is building
a message inside netlink_ack(), which could then trigger
the WARN_ON()s I added since if it goes from being turned
off to being turned on between allocating and filling the
message, the skb could end up being too small.Avoid this whole situation by storing the value of this
flag in a separate variable and using that throughout the
function instead.Fixes: 2d4bc93368f5 ("netlink: extended ACK reporting")
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller
10 Oct, 2017
1 commit
-
It turns out that multiple places can call netlink_dump(), which means
it's still possible to dereference partially initialized values in
dump() that were the result of a faulty returned start().This fixes the issue by calling start() _before_ setting cb_running to
true, so that there's no chance at all of hitting the dump() function
through any indirect paths.It also moves the call to start() to be when the mutex is held. This has
the nice side effect of serializing invocations to start(), which is
likely desirable anyway. It also prevents any possible other races that
might come out of this logic.In testing this with several different pieces of tricky code to trigger
these issues, this commit fixes all avenues that I'm aware of.Signed-off-by: Jason A. Donenfeld
Cc: Johannes Berg
Reviewed-by: Johannes Berg
Signed-off-by: David S. Miller
30 Sep, 2017
1 commit
-
Drivers that use the start method for netlink dumping rely on dumpit not
being called if start fails. For example, ila_xlat.c allocates memory
and assigns it to cb->args[0] in its start() function. It might fail to
do that and return -ENOMEM instead. However, even when returning an
error, dumpit will be called, which, in the example above, quickly
dereferences the memory in cb->args[0], which will OOPS the kernel. This
is but one example of how this goes wrong.Since start() has always been a function with an int return type, it
therefore makes sense to use it properly, rather than ignoring it. This
patch thus returns early and does not call dumpit() when start() fails.Signed-off-by: Jason A. Donenfeld
Cc: Johannes Berg
Reviewed-by: Johannes Berg
Signed-off-by: David S. Miller
07 Sep, 2017
2 commits
-
Now there is no lock protecting nlk ngroups/groups' accessing in
netlink bind and getname. It's safe from nlk groups' setting in
netlink_release, but not from netlink_realloc_groups called by
netlink_setsockopt.netlink_lock_table is needed in both netlink bind and getname when
accessing nlk groups.Acked-by: Florian Westphal
Signed-off-by: Xin Long
Signed-off-by: David S. Miller -
ChunYu found a netlink use-after-free issue by syzkaller:
[28448.842981] BUG: KASAN: use-after-free in __nla_put+0x37/0x40 at addr ffff8807185e2378
[28448.969918] Call Trace:
[...]
[28449.117207] __nla_put+0x37/0x40
[28449.132027] nla_put+0xf5/0x130
[28449.146261] sk_diag_fill.isra.4.constprop.5+0x5a0/0x750 [netlink_diag]
[28449.176608] __netlink_diag_dump+0x25a/0x700 [netlink_diag]
[28449.202215] netlink_diag_dump+0x176/0x240 [netlink_diag]
[28449.226834] netlink_dump+0x488/0xbb0
[28449.298014] __netlink_dump_start+0x4e8/0x760
[28449.317924] netlink_diag_handler_dump+0x261/0x340 [netlink_diag]
[28449.413414] sock_diag_rcv_msg+0x207/0x390
[28449.432409] netlink_rcv_skb+0x149/0x380
[28449.467647] sock_diag_rcv+0x2d/0x40
[28449.484362] netlink_unicast+0x562/0x7b0
[28449.564790] netlink_sendmsg+0xaa8/0xe60
[28449.661510] sock_sendmsg+0xcf/0x110
[28449.865631] __sys_sendmsg+0xf3/0x240
[28450.000964] SyS_sendmsg+0x32/0x50
[28450.016969] do_syscall_64+0x25c/0x6c0
[28450.154439] entry_SYSCALL64_slow_path+0x25/0x25It was caused by no protection between nlk groups' free in netlink_release
and nlk groups' accessing in sk_diag_dump_groups. The similar issue also
exists in netlink_seq_show().This patch is to defer nlk groups' free in deferred_put_nlk_sk.
Reported-by: ChunYu Wang
Acked-by: Florian Westphal
Signed-off-by: Xin Long
Signed-off-by: David S. Miller
01 Jul, 2017
3 commits
-
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.This patch uses refcount_inc_not_zero() instead of
atomic_inc_not_zero_hint() due to absense of a _hint()
version of refcount API. If the hint() version must
be used, we might need to revisit API.Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller -
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller -
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller
16 Jun, 2017
2 commits
-
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_put, __skb_put };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)@@
expression E, SKB, LEN;
identifier fn = { skb_put, __skb_put };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)which actually doesn't cover pskb_put since there are only three
users overall.A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller -
A common pattern with skb_put() is to just want to memcpy()
some data into the new space, introduce skb_put_data() for
this.An spatch similar to the one for skb_put_zero() converts many
of the places using it:@@
identifier p, p2;
expression len, skb, data;
type t, t2;
@@
(
-p = skb_put(skb, len);
+p = skb_put_data(skb, data, len);
|
-p = (t)skb_put(skb, len);
+p = skb_put_data(skb, data, len);
)
(
p2 = (t2)p;
-memcpy(p2, data, len);
|
-memcpy(p, data, len);
)@@
type t, t2;
identifier p, p2;
expression skb, data;
@@
t *p;
...
(
-p = skb_put(skb, sizeof(t));
+p = skb_put_data(skb, data, sizeof(t));
|
-p = (t *)skb_put(skb, sizeof(t));
+p = skb_put_data(skb, data, sizeof(t));
)
(
p2 = (t2)p;
-memcpy(p2, data, sizeof(*p));
|
-memcpy(p, data, sizeof(*p));
)@@
expression skb, len, data;
@@
-memcpy(skb_put(skb, len), data, len);
+skb_put_data(skb, data, len);(again, manually post-processed to retain some comments)
Reviewed-by: Stephen Hemminger
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller
01 Jun, 2017
1 commit
-
The NETLINK_F_LISTEN_ALL_NSID otion enables to listen all netns that have a
nsid assigned into the netns where the netlink socket is opened.
The nsid is sent as metadata to userland, but the existence of this nsid is
checked only for netns that are different from the socket netns. Thus, if
no nsid is assigned to the socket netns, NETNSA_NSID_NOT_ASSIGNED is
reported to the userland. This value is confusing and useless.
After this patch, only valid nsid are sent to userland.Reported-by: Flavio Leitner
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
14 Apr, 2017
2 commits
-
Now that we have extended error reporting and a new message format for
netlink ACK messages, also extend this to be able to return arbitrary
cookie data on success.This will allow, for example, nl80211 to not send an extra message for
cookies identifying newly created objects, but return those directly
in the ACK message.The cookie data size is currently limited to 20 bytes (since Jamal
talked about using SHA1 for identifiers.)Thanks to Jamal Hadi Salim for bringing up this idea during the
discussions.Signed-off-by: Johannes Berg
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
Add the base infrastructure and UAPI for netlink extended ACK
reporting. All "manual" calls to netlink_ack() pass NULL for now and
thus don't get extended ACK reporting.Big thanks goes to Pablo Neira Ayuso for not only bringing up the
whole topic at netconf (again) but also coming up with the nlattr
passing trick and various other ideas.Signed-off-by: Johannes Berg
Reviewed-by: David Ahern
Signed-off-by: David S. Miller
05 Apr, 2017
1 commit
-
cb_running is reported in /proc/self/net/netlink and it is reported by
the ss tool, when it gets information from the proc files.sock_diag is a new interface which is used instead of proc files, so it
looks reasonable that this interface has to report no less information
about sockets than proc files.We use these flags to dump and restore netlink sockets.
Signed-off-by: Andrei Vagin
Signed-off-by: David S. Miller
22 Mar, 2017
1 commit
-
On Tue, Mar 14, 2017 at 10:44:10AM +0100, Dmitry Vyukov wrote:
>
> Yes, please.
> Disregarding some reports is not a good way long term.Please try this patch.
---8cb_mutex are annotated by lockdep
as a single class. This causes a false lcokdep cycle involving
genl and crypto_user.This patch fixes it by dividing cb_mutex into individual classes
based on the netlink protocol. As genl and crypto_user do not
use the same netlink protocol this breaks the false dependency
loop.Reported-by: Dmitry Vyukov
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
28 Jan, 2017
1 commit
-
Slava Shwartsman reported a warning in skb_try_coalesce(), when we
detect skb->truesize is completely wrong.In his case, issue came from IPv6 reassembly coping with malicious
datagrams, that forced various pskb_may_pull() to reallocate a bigger
skb->head than the one allocated by NIC driver before entering GRO
layer.Current code does not change skb->truesize, leaving this burden to
callers if they care enough.Blindly changing skb->truesize in pskb_expand_head() is not
easy, as some producers might track skb->truesize, for example
in xmit path for back pressure feedback (sk->sk_wmem_alloc)We can detect the cases where it should be safe to change
skb->truesize :1) skb is not attached to a socket.
2) If it is attached to a socket, destructor is sock_edemux()My audit gave only two callers doing their own skb->truesize
manipulation.I had to remove skb parameter in sock_edemux macro when
CONFIG_INET is not set to avoid a compile error.Signed-off-by: Eric Dumazet
Reported-by: Slava Shwartsman
Signed-off-by: David S. Miller
17 Jan, 2017
1 commit
-
In commit d35c99ff77ecb ("netlink: do not enter direct reclaim from
netlink_dump()") we made sure to not trigger expensive memory reclaim.Problem is that a bit later, netlink_trim() might be called and
trigger memory reclaim.netlink_trim() should be best effort, and really as fast as possible.
Under memory pressure, it is fine to not trim this skb.Signed-off-by: Eric Dumazet
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
25 Dec, 2016
1 commit
-
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
sed -i -e "s!$PATT!#include !" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)to do the replacement at the end of the merge window.
Requested-by: Al Viro
Signed-off-by: Linus Torvalds
11 Dec, 2016
1 commit
-
netlink_chain is called in ->release(), which is apparently
a process context, so we don't have to use an atomic notifier
here.Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
06 Dec, 2016
1 commit
-
It is wrong to schedule a work from sk_destruct using the socket
as the memory reserve because the socket will be freed immediately
after the return from sk_destruct.Instead we should do the deferral prior to sk_free.
This patch does just that.
Fixes: 707693c8a498 ("netlink: Call cb->done from a worker thread")
Signed-off-by: Herbert Xu
Tested-by: Andrey Konovalov
Signed-off-by: David S. Miller
30 Nov, 2016
1 commit
-
The cb->done interface expects to be called in process context.
This was broken by the netlink RCU conversion. This patch fixes
it by adding a worker struct to make the cb->done call where
necessary.Fixes: 21e4902aea80 ("netlink: Lockless lookup with RCU grace...")
Reported-by: Subash Abhinov Kasiviswanathan
Signed-off-by: Herbert Xu
Acked-by: Cong Wang
Signed-off-by: David S. Miller
07 Oct, 2016
1 commit
-
Since linux-3.15, netlink_dump() can use up to 16384 bytes skb
allocations.Due to struct skb_shared_info ~320 bytes overhead, we end up using
order-3 (on x86) page allocations, that might trigger direct reclaim and
add stress.The intent was really to attempt a large allocation but immediately
fallback to a smaller one (order-1 on x86) in case of memory stress.On recent kernels (linux-4.4), we can remove __GFP_DIRECT_RECLAIM to
meet the goal. Old kernels would need to remove __GFP_WAITWhile we are at it, since we do an order-3 allocation, allow to use
all the allocated bytes instead of 16384 to reduce syscalls during
large dumps.iproute2 already uses 32KB recvmsg() buffer sizes.
Alexei provided an initial patch downsizing to SKB_WITH_OVERHEAD(16384)
Fixes: 9063e21fb026 ("netlink: autosize skb lengthes")
Signed-off-by: Eric Dumazet
Reported-by: Alexei Starovoitov
Cc: Greg Thelen
Reviewed-by: Greg Rose
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
17 May, 2016
1 commit
-
When we free cb->skb after a dump, we do it after releasing the
lock. This means that a new dump could have started in the time
being and we'll end up freeing their skb instead of ours.This patch saves the skb and module before we unlock so we free
the right memory.Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
Reported-by: Baozeng Ding
Signed-off-by: Herbert Xu
Acked-by: Cong Wang
Signed-off-by: David S. Miller
24 Apr, 2016
1 commit
-
Conflicts were two cases of simple overlapping changes,
nothing serious.In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.Signed-off-by: David S. Miller
11 Apr, 2016
1 commit
-
All existing users of NETLINK_URELEASE use it to clean up resources that
were previously allocated to a socket via some command. As a result, no
users require getting this notification for unbound sockets.Sending it for unbound sockets, however, is a problem because any user
(including unprivileged users) can create a socket that uses the same ID
as an existing socket. Binding this new socket will fail, but if the
NETLINK_URELEASE notification is generated for such sockets, the users
thereof will be tricked into thinking the socket that they allocated the
resources for is closed.In the nl80211 case, this will cause destruction of virtual interfaces
that still belong to an existing hostapd process; this is the case that
Dmitry noticed. In the NFC case, it will cause a poll abort. In the case
of netlink log/queue it will cause them to stop reporting events, as if
NFULNL_CFG_CMD_UNBIND/NFQNL_CFG_CMD_UNBIND had been called.Fix this problem by checking that the socket is bound before generating
the NETLINK_URELEASE notification.Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Ivanov
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller
05 Apr, 2016
1 commit
-
In certain cases, the 802.11 mesh pathtable code wants to
iterate over all of the entries in the forwarding table from
the receive path, which is inside an RCU read-side critical
section. Enable walks inside atomic sections by allowing
GFP_ATOMIC allocations for the walker state.Change all existing callsites to pass in GFP_KERNEL.
Acked-by: Thomas Graf
Signed-off-by: Bob Copeland
[also adjust gfs2/glock.c and rhashtable tests]
Signed-off-by: Johannes Berg
23 Mar, 2016
1 commit
-
By returning -ENOIOCTLCMD, sock_do_ioctl() falls back to calling
dev_ioctl(), which provides support for NIC driver ioctls, which
includes ethtool support. This is similar to the way ioctls are handled
in udp.c or tcp.c.This removes the requirement that ethtool for example be tied to the
support of a specific L3 protocol (ethtool uses an AF_INET socket
today).Signed-off-by: David Decotigny
Signed-off-by: David S. Miller
19 Feb, 2016
2 commits
-
reverts commit 3ab1f683bf8b ("nfnetlink: add support for memory mapped
netlink")'Like previous commits in the series, remove wrappers that are not needed
after mmapped netlink removal.Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller -
mmapped netlink has a number of unresolved issues:
- TX zerocopy support had to be disabled more than a year ago via
commit 4682a0358639b29cf ("netlink: Always copy on mmap TX.")
because the content of the mmapped area can change after netlink
attribute validation but before message processing.- RX support was implemented mainly to speed up nfqueue dumping packet
payload to userspace. However, since commit ae08ce0021087a5d812d2
("netfilter: nfnetlink_queue: zero copy support") we avoid one copy
with the socket-based interface too (via the skb_zerocopy helper).The other problem is that skbs attached to mmaped netlink socket
behave different from normal skbs:- they don't have a shinfo area, so all functions that use skb_shinfo()
(e.g. skb_clone) cannot be used.- reserving headroom prevents userspace from seeing the content as
it expects message to start at skb->head.
See for instance
commit aa3a022094fa ("netlink: not trim skb for mmaped socket when dump").- skbs handed e.g. to netlink_ack must have non-NULL skb->sk, else we
crash because it needs the sk to check if a tx ring is attached.Also not obvious, leads to non-intuitive bug fixes such as 7c7bdf359
("netfilter: nfnetlink: use original skbuff when acking batches").mmaped netlink also didn't play nicely with the skb_zerocopy helper
used by nfqueue and openvswitch. Daniel Borkmann fixed this via
commit 6bb0fef489f6 ("netlink, mmap: fix edge-case leakages in nf queue
zero-copy")' but at the cost of also needing to provide remaining
length to the allocation function.nfqueue also has problems when used with mmaped rx netlink:
- mmaped netlink doesn't allow use of nfqueue batch verdict messages.
Problem is that in the mmap case, the allocation time also determines
the ordering in which the frame will be seen by userspace (A
allocating before B means that A is located in earlier ring slot,
but this also means that B might get a lower sequence number then A
since seqno is decided later. To fix this we would need to extend the
spinlocked region to also cover the allocation and message setup which
isn't desirable.
- nfqueue can now be configured to queue large (GSO) skbs to userspace.
Queing GSO packets is faster than having to force a software segmentation
in the kernel, so this is a desirable option. However, with a mmap based
ring one has to use 64kb per ring slot element, else mmap has to fall back
to the socket path (NL_MMAP_STATUS_COPY) for all large packets.To use the mmap interface, userspace not only has to probe for mmap netlink
support, it also has to implement a recv/socket receive path in order to
handle messages that exceed the size of an rx ring element.Cc: Daniel Borkmann
Cc: Ken-ichirou MATSUZAWA
Cc: Pablo Neira Ayuso
Cc: Patrick McHardy
Cc: Thomas Graf
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller