18 Feb, 2017
1 commit
-
[ Upstream commit 5fa8bbda38c668e56b0c6cdecced2eac2fe36dec ]
Dmitry reported a warning [1] showing that we were calling
net_disable_timestamp() -> static_key_slow_dec() from a non
process context.Grabbing a mutex while holding a spinlock or rcu_read_lock()
is not allowed.As Cong suggested, we now use a work queue.
It is possible netstamp_clear() exits while netstamp_needed_deferred
is not zero, but it is probably not worth trying to do better than that.netstamp_needed_deferred atomic tracks the exact number of deferred
decrements.[1]
[ INFO: suspicious RCU usage. ]
4.10.0-rc5+ #192 Not tainted
-------------------------------
./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
critical section!other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 0
2 locks held by syz-executor14/23111:
#0: (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
include/net/sock.h:1454 [inline]
#0: (sk_lock-AF_INET6){+.+.+.}, at: []
rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
#1: (rcu_read_lock){......}, at: [] nf_hook
include/linux/netfilter.h:201 [inline]
#1: (rcu_read_lock){......}, at: []
__ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160stack backtrace:
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
___might_sleep+0x560/0x650 kernel/sched/core.c:7748
__might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
__static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
__sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sock_wfree+0xae/0x120 net/core/sock.c:1645
skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
skb_release_all+0x15/0x60 net/core/skbuff.c:668
__kfree_skb+0x15/0x20 net/core/skbuff.c:684
kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
inet_frag_put include/net/inet_frag.h:133 [inline]
nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
nf_hook include/linux/netfilter.h:212 [inline]
__ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
vfs_writev+0x87/0xc0 fs/read_write.c:911
do_writev+0x110/0x2c0 fs/read_write.c:944
SYSC_writev fs/read_write.c:1017 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1014
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:752
in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
INFO: lockdep is turned off.
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
__might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
__static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
__sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sock_wfree+0xae/0x120 net/core/sock.c:1645
skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
skb_release_all+0x15/0x60 net/core/skbuff.c:668
__kfree_skb+0x15/0x20 net/core/skbuff.c:684
kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
inet_frag_put include/net/inet_frag.h:133 [inline]
nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
nf_hook include/linux/netfilter.h:212 [inline]
__ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
vfs_writev+0x87/0xc0 fs/read_write.c:911
do_writev+0x110/0x2c0 fs/read_write.c:944
SYSC_writev fs/read_write.c:1017 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1014
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559Fixes: b90e5794c5bd ("net: dont call jump_label_dec from irq context")
Suggested-by: Cong Wang
Reported-by: Dmitry Vyukov
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
04 Feb, 2017
3 commits
-
[ Upstream commit 85c814016ce3b371016c2c054a905fa2492f5a65 ]
When attempting to free lwtunnel state after the module for the encap
has been unloaded an oops occurs:BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: lwtstate_free+0x18/0x40
[..]
task: ffff88003e372380 task.stack: ffffc900001fc000
RIP: 0010:lwtstate_free+0x18/0x40
RSP: 0018:ffff88003fd83e88 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88002bbb3380 RCX: ffff88000c91a300
[..]
Call Trace:
free_fib_info_rcu+0x195/0x1a0
? rt_fibinfo_free+0x50/0x50
rcu_process_callbacks+0x2d3/0x850
? rcu_process_callbacks+0x296/0x850
__do_softirq+0xe4/0x4cb
irq_exit+0xb0/0xc0
smp_apic_timer_interrupt+0x3d/0x50
apic_timer_interrupt+0x93/0xa0
[..]
Code: e8 6e c6 fc ff 89 d8 5b 5d c3 bb de ff ff ff eb f4 66 90 66 66 66 66 90 55 48 89 e5 53 0f b7 07 48 89 fb 48 8b 04 c5 00 81 d5 81 8b 40 08 48 85 c0 74 13 ff d0 48 8d 7b 20 be 20 00 00 00 e8The problem is after the module for the encap can be unloaded the
corresponding ops is removed and is thus NULL here.Modules implementing lwtunnel ops should not be allowed to unload
while there is state alive using those ops, so grab the module
reference for the ops on creating lwtunnel state and of course release
the reference when freeing the state.Fixes: 1104d9ba443a ("lwtunnel: Add destroy state operation")
Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 9ed59592e3e379b2e9557dc1d9e9ec8fcbb33f16]
Trying to add an mpls encap route when the MPLS modules are not loaded
hangs. For example:CONFIG_MPLS=y
CONFIG_NET_MPLS_GSO=m
CONFIG_MPLS_ROUTING=m
CONFIG_MPLS_IPTUNNEL=m$ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
The ip command hangs:
root 880 826 0 21:25 pts/0 00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2$ cat /proc/880/stack
[] call_usermodehelper_exec+0xd6/0x134
[] __request_module+0x27b/0x30a
[] lwtunnel_build_state+0xe4/0x178
[] fib_create_info+0x47f/0xdd4
[] fib_table_insert+0x90/0x41f
[] inet_rtm_newroute+0x4b/0x52
...modprobe is trying to load rtnl-lwt-MPLS:
root 881 5 0 21:25 ? 00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS
and it hangs after loading mpls_router:
$ cat /proc/881/stack
[] rtnl_lock+0x12/0x14
[] register_netdevice_notifier+0x16/0x179
[] mpls_init+0x25/0x1000 [mpls_router]
[] do_one_initcall+0x8e/0x13f
[] do_init_module+0x5a/0x1e5
[] load_module+0x13bd/0x17d6
...The problem is that lwtunnel_build_state is called with rtnl lock
held preventing mpls_init from registering.Given the potential references held by the time lwtunnel_build_state it
can not drop the rtnl lock to the load module. So, extract the module
loading code from lwtunnel_build_state into a new function to validate
the encap type. The new function is called while converting the user
request into a fib_config which is well before any table, device or
fib entries are examined.Fixes: 745041e2aaf1 ("lwtunnel: autoload of lwt modules")
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 7be2c82cfd5d28d7adb66821a992604eb6dd112e ]
Ashizuka reported a highmem oddity and sent a patch for freescale
fec driver.But the problem root cause is that core networking stack
must ensure no skb with highmem fragment is ever sent through
a device that does not assert NETIF_F_HIGHDMA in its features.We need to call illegal_highdma() from harmonize_features()
regardless of CSUM checks.Fixes: ec5f06156423 ("net: Kill link between CSUM and SG features.")
Signed-off-by: Eric Dumazet
Cc: Pravin Shelar
Reported-by: "Ashizuka, Yuusuke"
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
15 Jan, 2017
7 commits
-
[ Upstream commit 7cfd5fd5a9813f1430290d20c0fead9b4582a307 ]
On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
so we shall use min_t() instead of min() to avoid a compiler error.Fixes: 1272ce87fa01 ("gro: Enter slow-path if there is no tailroom")
Reported-by: kernel test robot
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 1272ce87fa017ca4cf32920764d879656b7a005a ]
The GRO path has a fast-path where we avoid calling pskb_may_pull
and pskb_expand by directly accessing frag0. However, this should
only be done if we have enough tailroom in the skb as otherwise
we'll have to expand it later anyway.This patch adds the check by capping frag0_len with the skb tailroom.
Fixes: cb18978cbf45 ("gro: Open-code final pskb_may_pull")
Reported-by: Slava Shwartsman
Signed-off-by: Herbert Xu
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 5d722b3024f6762addb8642ffddc9f275b5107ae ]
Commit bdabad3e363d ("net: Add Qualcomm IPC router") introduced a
new address family. Update the family name tables accordingly so
that the lockdep initialization can use the proper names for this
family.Cc: Courtney Cavin
Cc: Bjorn Andersson
Signed-off-by: Suman Anna
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit d0af683407a26a4437d8fa6e283ea201f2ae8146 ]
__skb_flow_dissect can be called with a skb or a data packet, either
can be NULL. All calls seems to have been moved to __skb_header_pointer
except the pptp handling which is still calling skb_header_pointer.skb_header_pointer will use skb->data and thus:
[ 109.556866] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[ 109.557102] IP: [] __skb_flow_dissect+0xa88/0xce0
[ 109.557263] PGD 0
[ 109.557338]
[ 109.557484] Oops: 0000 [#1] SMP
[ 109.557562] Modules linked in: chaoskey
[ 109.557783] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.0 #79
[ 109.557867] Hardware name: Supermicro A1SRM-LN7F/LN5F/A1SRM-LN7F-2758, BIOS 1.0c 11/04/2015
[ 109.557957] task: ffff94085c27bc00 task.stack: ffffb745c0068000
[ 109.558041] RIP: 0010:[] [] __skb_flow_dissect+0xa88/0xce0
[ 109.558203] RSP: 0018:ffff94087fc83d40 EFLAGS: 00010206
[ 109.558286] RAX: 0000000000000130 RBX: ffffffff8975bf80 RCX: ffff94084fab6800
[ 109.558373] RDX: 0000000000000010 RSI: 000000000000000c RDI: 0000000000000000
[ 109.558460] RBP: 0000000000000b88 R08: 0000000000000000 R09: 0000000000000022
[ 109.558547] R10: 0000000000000008 R11: ffff94087fc83e04 R12: 0000000000000000
[ 109.558763] R13: ffff94084fab6800 R14: ffff94087fc83e04 R15: 000000000000002f
[ 109.558979] FS: 0000000000000000(0000) GS:ffff94087fc80000(0000) knlGS:0000000000000000
[ 109.559326] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 109.559539] CR2: 0000000000000080 CR3: 0000000281809000 CR4: 00000000001026e0
[ 109.559753] Stack:
[ 109.559957] 000000000000000c ffff94084fab6822 0000000000000001 ffff94085c2b5fc0
[ 109.560578] 0000000000000001 0000000000002000 0000000000000000 0000000000000000
[ 109.561200] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 109.561820] Call Trace:
[ 109.562027]
[ 109.562108] [] ? eth_get_headlen+0x7a/0xf0
[ 109.562522] [] ? igb_poll+0x96a/0xe80
[ 109.562737] [] ? net_rx_action+0x20b/0x350
[ 109.562953] [] ? __do_softirq+0xe8/0x280
[ 109.563169] [] ? irq_exit+0xaa/0xb0
[ 109.563382] [] ? do_IRQ+0x4b/0xc0
[ 109.563597] [] ? common_interrupt+0x7f/0x7f
[ 109.563810]
[ 109.563890] [] ? cpuidle_enter_state+0x130/0x2c0
[ 109.564304] [] ? cpuidle_enter_state+0x120/0x2c0
[ 109.564520] [] ? cpu_startup_entry+0x19f/0x1f0
[ 109.564737] [] ? start_secondary+0x12a/0x140
[ 109.564950] Code: 83 e2 20 a8 80 0f 84 60 01 00 00 c7 04 24 08 00
00 00 66 85 d2 0f 84 be fe ff ff e9 69 fe ff ff 8b 34 24 89 f2 83 c2
04 66 85 c0 8b 84 24 80 00 00 00 0f 49 d6 41 8d 31 01 d6 41 2b 84
24 84
[ 109.569959] RIP [] __skb_flow_dissect+0xa88/0xce0
[ 109.570245] RSP
[ 109.570453] CR2: 0000000000000080Fixes: ab10dccb1160 ("rps: Inspect PPTP encapsulated by GRE to get flow hash")
Signed-off-by: Ian Kumlien
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 3b48ab2248e61408910e792fe84d6ec466084c1a ]
Final nlmsg_len field update must reflect inserted net_dm_drop_point
data.This patch depends on previous patch:
"drop_monitor: add missing call to genlmsg_end"Signed-off-by: Reiter Wolfgang
Acked-by: Neil Horman
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 4200462d88f47f3759bdf4705f87e207b0f5b2e4 ]
Update nlmsg_len field with genlmsg_end to enable userspace processing
using nlmsg_next helper. Also adds error handling.Signed-off-by: Reiter Wolfgang
Acked-by: Neil Horman
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 4775cc1f2d5abca894ac32774eefc22c45347d1c ]
We miss to check if the netlink message is actually big enough to contain
a struct if_stats_msg.Add a check to prevent userland from sending us short messages that would
make us access memory beyond the end of the message.Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump...")
Signed-off-by: Mathias Krause
Cc: Roopa Prabhu
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
03 Dec, 2016
1 commit
-
CAP_NET_ADMIN users should not be allowed to set negative
sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
corruptions, crashes, OOM...Note that before commit 82981930125a ("net: cleanups in
sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
and SO_RCVBUF were vulnerable.This needs to be backported to all known linux kernels.
Again, many thanks to syzkaller team for discovering this gem.
Signed-off-by: Eric Dumazet
Reported-by: Andrey Konovalov
Signed-off-by: David S. Miller
02 Dec, 2016
1 commit
-
Use the correct attribute constant names IFLA_GSO_MAX_{SEGS,SIZE}
instead of IFLA_MAX_GSO_{SEGS,SIZE} for the comments int nlmsg_size().Cc: Eric Dumazet
Signed-off-by: Tobias Klauser
Signed-off-by: David S. Miller
28 Nov, 2016
1 commit
-
Steffen Klassert says:
====================
pull request (net): ipsec 2016-11-251) Fix a refcount leak in vti6.
From Nicolas Dichtel.2) Fix a wrong if statement in xfrm_sk_policy_lookup.
From Florian Westphal.3) The flowcache watermarks are per cpu. Take this into
account when comparing to the threshold where we
refusing new allocations. From Miroslav Urbanek.Please pull or let me know if there are problems.
====================Signed-off-by: David S. Miller
26 Nov, 2016
1 commit
-
The ETHTOOL_GLINKSETTINGS command is deprecating the ETHTOOL_GSET
command and likewise it shouldn't require the CAP_NET_ADMIN capability.Signed-off-by: Miroslav Lichvar
Signed-off-by: David S. Miller
24 Nov, 2016
1 commit
-
For RT netlink, calcit() function should return the minimal size for
netlink dump message. This will make sure that dump message for every
network device can be stored.Currently, rtnl_calcit() function doesn't account the size of header of
netlink message, this patch will fix it.Signed-off-by: Zhang Shengju
Signed-off-by: David S. Miller
23 Nov, 2016
2 commits
-
The threshold for OOM protection is too small for systems with large
number of CPUs. Applications report ENOBUFs on connect() every 10
minutes.The problem is that the variable net->xfrm.flow_cache_gc_count is a
global counter while the variable fc->high_watermark is a per-CPU
constant. Take the number of CPUs into account as well.Fixes: 6ad3122a08e3 ("flowcache: Avoid OOM condition under preasure")
Reported-by: Lukáš Koldrt
Tested-by: Jan Hejl
Signed-off-by: Miroslav Urbanek
Signed-off-by: Steffen Klassert -
Andre Noll reported panics after my recent fix (commit 34fad54c2537
"net: __skb_flow_dissect() must cap its return value")After some more headaches, Alexander root caused the problem to
init_default_flow_dissectors() being called too late, in case
a network driver like IGB is not a module and receives DHCP message
very early.Fix is to call init_default_flow_dissectors() much earlier,
as it is a core infrastructure and does not depend on another
kernel service.Fixes: 06635a35d13d4 ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
Signed-off-by: Eric Dumazet
Reported-by: Andre Noll
Diagnosed-by: Alexander Duyck
Signed-off-by: David S. Miller
20 Nov, 2016
1 commit
-
If the link is filtered out, loop index should also be updated. If not,
loop index will not be correct.Fixes: dc599f76c22b0 ("net: Add support for filtering link dump by master device and kind")
Signed-off-by: Zhang Shengju
Acked-by: David Ahern
Signed-off-by: David S. Miller
19 Nov, 2016
1 commit
-
Add missing NDA_VLAN attribute's size.
Fixes: 1e53d5bb8878 ("net: Pass VLAN ID to rtnl_fdb_notify.")
Signed-off-by: Sabrina Dubroca
Signed-off-by: David S. Miller
18 Nov, 2016
1 commit
-
Andrei reports we still allocate netns ID from idr after we destroy
it in cleanup_net().cleanup_net():
...
idr_destroy(&net->netns_ids);
...
list_for_each_entry_reverse(ops, &pernet_list, list)
ops_exit_list(ops, &net_exit_list);
-> rollback_registered_many()
-> rtmsg_ifinfo_build_skb()
-> rtnl_fill_ifinfo()
-> peernet2id_alloc()After that point we should not even access net->netns_ids, we
should check the death of the current netns as early as we can in
peernet2id_alloc().For net-next we can consider to avoid sending rtmsg totally,
it is a good optimization for netns teardown path.Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
Reported-by: Andrei Vagin
Cc: Nicolas Dichtel
Signed-off-by: Cong Wang
Acked-by: Andrei Vagin
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
16 Nov, 2016
2 commits
-
rtnl_xdp_size() only considers the size of the actual payload attribute,
and misses the space taken by the attribute used for nesting (IFLA_XDP).Fixes: d1fdd9138682 ("rtnl: add option for setting link xdp prog")
Signed-off-by: Sabrina Dubroca
Reviewed-by: Brenden Blanco
Signed-off-by: David S. Miller -
The size reported by rtnl_vfinfo_size doesn't match the space used by
rtnl_fill_vfinfo.rtnl_vfinfo_size currently doesn't account for the nest attributes
used by statistics (added in commit 3b766cd83232), nor for struct
ifla_vf_tx_rate (since commit ed616689a3d9, which added ifla_vf_rate
to the dump without removing ifla_vf_tx_rate, but replaced
ifla_vf_tx_rate with ifla_vf_rate in the size computation).Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice")
Fixes: ed616689a3d9 ("net-next:v4: Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool")
Signed-off-by: Sabrina Dubroca
Signed-off-by: David S. Miller
13 Nov, 2016
2 commits
-
After Tom patch, thoff field could point past the end of the buffer,
this could fool some callers.If an skb was provided, skb->len should be the upper limit.
If not, hlen is supposed to be the upper limit.Fixes: a6e544b0a88b ("flow_dissector: Jump to exit code in __skb_flow_dissect")
Signed-off-by: Eric Dumazet
Reported-by: Yibin Yang
Acked-by: Willem de Bruijn
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller -
If the bpf program calls bpf_redirect(dev, 0) and dev is
an ipip/ip6tnl, it currently includes the mac header.
e.g. If dev is ipip, the end result is IP-EthHdr-IP instead
of IP-IP.The fix is to pull the mac header. At ingress, skb_postpull_rcsum()
is not needed because the ethhdr should have been pulled once already
and then got pushed back just before calling the bpf_prog.
At egress, this patch calls skb_postpull_rcsum().If bpf_redirect(dev, BPF_F_INGRESS) is called,
it also fails now because it calls dev_forward_skb() which
eventually calls eth_type_trans(skb, dev). The eth_type_trans()
will set skb->type = PACKET_OTHERHOST because the mac address
does not match the redirecting dev->dev_addr. The PACKET_OTHERHOST
will eventually cause the ip_rcv() errors out. To fix this,
____dev_forward_skb() is added.Joint work with Daniel Borkmann.
Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Acked-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: Martin KaFai Lau
Signed-off-by: David S. Miller
10 Nov, 2016
1 commit
-
To avoid having dangling function pointers left behind, reset calcit in
rtnl_unregister(), too.This is no issue so far, as only the rtnl core registers a netlink
handler with a calcit hook which won't be unregistered, but may become
one if new code makes use of the calcit hook.Fixes: c7ac8679bec9 ("rtnetlink: Compute and store minimum ifinfo...")
Cc: Jeff Kirsher
Cc: Greg Rose
Signed-off-by: Mathias Krause
Signed-off-by: David S. Miller
04 Nov, 2016
1 commit
-
Andrey Konovalov reported following error while fuzzing with syzkaller :
IPv4: Attempt to release alive inet socket ffff880068e98940
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88006b9e0000 task.stack: ffff880068770000
RIP: 0010:[] []
selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639
RSP: 0018:ffff8800687771c8 EFLAGS: 00010202
RAX: ffff88006b9e0000 RBX: 1ffff1000d0eee3f RCX: 1ffff1000d1d312a
RDX: 1ffff1000d1d31a6 RSI: dffffc0000000000 RDI: 0000000000000010
RBP: ffff880068777360 R08: 0000000000000000 R09: 0000000000000002
R10: dffffc0000000000 R11: 0000000000000006 R12: ffff880068e98940
R13: 0000000000000002 R14: ffff880068777338 R15: 0000000000000000
FS: 00007f00ff760700(0000) GS:ffff88006cd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020008000 CR3: 000000006a308000 CR4: 00000000000006e0
Stack:
ffff8800687771e0 ffffffff812508a5 ffff8800686f3168 0000000000000007
ffff88006ac8cdfc ffff8800665ea500 0000000041b58ab3 ffffffff847b5480
ffffffff819eac60 ffff88006b9e0860 ffff88006b9e0868 ffff88006b9e07f0
Call Trace:
[] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317
[] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81
[] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460
[] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873
[] ip_local_deliver_finish+0x332/0xad0
net/ipv4/ip_input.c:216
[< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
[< inline >] NF_HOOK ./include/linux/netfilter.h:255
[] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257
[< inline >] dst_input ./include/net/dst.h:507
[] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396
[< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
[< inline >] NF_HOOK ./include/linux/netfilter.h:255
[] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487
[] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213
[] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251
[] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279
[] netif_receive_skb+0x48/0x250 net/core/dev.c:4303
[] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308
[] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
[< inline >] new_sync_write fs/read_write.c:499
[] __vfs_write+0x334/0x570 fs/read_write.c:512
[] vfs_write+0x17b/0x500 fs/read_write.c:560
[< inline >] SYSC_write fs/read_write.c:607
[] SyS_write+0xd4/0x1a0 fs/read_write.c:599
[] entry_SYSCALL_64_fastpath+0x1f/0xc2It turns out DCCP calls __sk_receive_skb(), and this broke when
lookups no longer took a reference on listeners.Fix this issue by adding a @refcounted parameter to __sk_receive_skb(),
so that sock_put() is used only when needed.Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Eric Dumazet
Reported-by: Andrey Konovalov
Tested-by: Andrey Konovalov
Signed-off-by: David S. Miller
01 Nov, 2016
2 commits
-
Sending zero checksum is ok for TCP, but not for UDP.
UDPv6 receiver should by default drop a frame with a 0 checksum,
and UDPv4 would not verify the checksum and might accept a corrupted
packet.Simply replace such checksum by 0xffff, regardless of transport.
This error was caught on SIT tunnels, but seems generic.
Signed-off-by: Eric Dumazet
Cc: Maciej Żenczykowski
Cc: Willem de Bruijn
Acked-by: Maciej Żenczykowski
Signed-off-by: David S. Miller -
At accept() time, it is possible the parent has a non zero
sk_err_soft, leftover from a prior error.Make sure we do not leave this value in the child, as it
makes future getsockopt(SO_ERROR) calls quite unreliable.Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Signed-off-by: David S. Miller
30 Oct, 2016
2 commits
-
Pull networking fixes from David Miller:
"Lots of fixes, mostly drivers as is usually the case.1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
Khoroshilov.2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
Pedersen.3) Don't put aead_req crypto struct on the stack in mac80211, from
Ard Biesheuvel.4) Several uninitialized variable warning fixes from Arnd Bergmann.
5) Fix memory leak in cxgb4, from Colin Ian King.
6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.
7) Several VRF semantic fixes from David Ahern.
8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.
9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.
10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.
11) Fix stale link state during failover in NCSCI driver, from Gavin
Shan.12) Fix netdev lower adjacency list traversal, from Ido Schimmel.
13) Propvide proper handle when emitting notifications of filter
deletes, from Jamal Hadi Salim.14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.
15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.
16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.
17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.
18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
Leitner.19) Revert a netns locking change that causes regressions, from Paul
Moore.20) Add recursion limit to GRO handling, from Sabrina Dubroca.
21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.
22) Avoid accessing stale vxlan/geneve socket in data path, from
Pravin Shelar"* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
geneve: avoid using stale geneve socket.
vxlan: avoid using stale vxlan socket.
qede: Fix out-of-bound fastpath memory access
net: phy: dp83848: add dp83822 PHY support
enic: fix rq disable
tipc: fix broadcast link synchronization problem
ibmvnic: Fix missing brackets in init_sub_crq_irqs
ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
net/mlx4_en: Save slave ethtool stats command
net/mlx4_en: Fix potential deadlock in port statistics flow
net/mlx4: Fix firmware command timeout during interrupt test
net/mlx4_core: Do not access comm channel if it has not yet been initialized
net/mlx4_en: Fix panic during reboot
net/mlx4_en: Process all completions in RX rings after port goes up
net/mlx4_en: Resolve dividing by zero in 32-bit system
net/mlx4_core: Change the default value of enable_qos
net/mlx4_core: Avoid setting ports to auto when only one port type is supported
net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
... -
When transmitting on a packet socket with PACKET_VNET_HDR and
PACKET_QDISC_BYPASS, validate device support for features requested
in vnet_hdr.Drop TSO packets sent to devices that do not support TSO or have the
feature disabled. Note that the latter currently do process those
packets correctly, regardless of not advertising the feature.Because of SKB_GSO_DODGY, it is not sufficient to test device features
with netif_needs_gso. Full validate_xmit_skb is needed.Switch to software checksum for non-TSO packets that request checksum
offload if that device feature is unsupported or disabled. Note that
similar to the TSO case, device drivers may perform checksum offload
correctly even when not advertising it.When switching to software checksum, packets hit skb_checksum_help,
which has two BUG_ON checksum not in linear segment. Packet sockets
always allocate at least up to csum_start + csum_off + 2 as linear.Tested by running github.com/wdebruij/kerneltools/psock_txring_vnet.c
ethtool -K eth0 tso off tx on
psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v
psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v -Nethtool -K eth0 tx off
psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G
psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G -Nv2:
- add EXPORT_SYMBOL_GPL(validate_xmit_skb_list)Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option")
Signed-off-by: Willem de Bruijn
Acked-by: Eric Dumazet
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
28 Oct, 2016
1 commit
-
gcc warns about an uninitialized pointer dereference in the vlan
priority handling:net/core/flow_dissector.c: In function '__skb_flow_dissect':
net/core/flow_dissector.c:281:61: error: 'vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized]As pointed out by Jiri Pirko, the variable is never actually used
without being initialized first as the only way it end up uninitialized
is with skb_vlan_tag_present(skb)==true, and that means it does not
get accessed.However, the warning hints at some related issues that I'm addressing
here:- the second check for the vlan tag is different from the first one
that tests the skb for being NULL first, causing both the warning
and a possible NULL pointer dereference that was not entirely fixed.
- The same patch that introduced the NULL pointer check dropped an
earlier optimization that skipped the repeated check of the
protocol type
- The local '_vlan' variable is referenced through the 'vlan' pointer
but the variable has gone out of scope by the time that it is
accessed, causing undefined behaviorCaching the result of the 'skb && skb_vlan_tag_present(skb)' check
in a local variable allows the compiler to further optimize the
later check. With those changes, the warning also disappears.Fixes: 3805a938a6c2 ("flow_dissector: Check skb for VLAN only if skb specified.")
Fixes: d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
Signed-off-by: Arnd Bergmann
Acked-by: Jiri Pirko
Acked-by: Eric Garver
Signed-off-by: David S. Miller
23 Oct, 2016
1 commit
-
This reverts commit bc51dddf98c9 ("netns: avoid disabling irq for
netns id") as it was found to cause problems with systems running
SELinux/audit, see the mailing list thread below:* http://marc.info/?t=147694653900002&r=1&w=2
Eventually we should be able to reintroduce this code once we have
rewritten the audit multicast code to queue messages much the same
way we do for unicast messages. A tracking issue for this can be
found below:* https://github.com/linux-audit/audit-kernel/issues/23
Reported-by: Stephen Smalley
Reported-by: Elad Raz
Cc: Cong Wang
Signed-off-by: Paul Moore
Signed-off-by: David S. Miller
21 Oct, 2016
1 commit
-
Currently, GRO can do unlimited recursion through the gro_receive
handlers. This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem. Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.This patch adds a recursion counter to the GRO layer to prevent stack
overflow. When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally. This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.Thanks to Vladimír Beneš for the initial bug report.
Fixes: CVE-2016-7039
Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca
Reviewed-by: Jiri Benc
Acked-by: Hannes Frederic Sowa
Acked-by: Tom Herbert
Signed-off-by: David S. Miller
19 Oct, 2016
3 commits
-
Tamir reported the following trace when processing ARP requests received
via a vlan device on top of a VLAN-aware bridge:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[...]
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.8.0-rc7 #1
Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
task: ffff88017edfea40 task.stack: ffff88017ee10000
RIP: 0010:[] [] netdev_all_lower_get_next_rcu+0x33/0x60
[...]
Call Trace:
[] mlxsw_sp_port_lower_dev_hold+0x5a/0xa0 [mlxsw_spectrum]
[] mlxsw_sp_router_netevent_event+0x80/0x150 [mlxsw_spectrum]
[] notifier_call_chain+0x4a/0x70
[] atomic_notifier_call_chain+0x1a/0x20
[] call_netevent_notifiers+0x1b/0x20
[] neigh_update+0x306/0x740
[] neigh_event_ns+0x4e/0xb0
[] arp_process+0x66f/0x700
[] ? common_interrupt+0x8c/0x8c
[] arp_rcv+0x139/0x1d0
[] ? vlan_do_receive+0xda/0x320
[] __netif_receive_skb_core+0x524/0xab0
[] ? dev_queue_xmit+0x10/0x20
[] ? br_forward_finish+0x3d/0xc0 [bridge]
[] ? br_handle_vlan+0xf6/0x1b0 [bridge]
[] __netif_receive_skb+0x18/0x60
[] netif_receive_skb_internal+0x40/0xb0
[] netif_receive_skb+0x1c/0x70
[] br_pass_frame_up+0xc6/0x160 [bridge]
[] ? deliver_clone+0x37/0x50 [bridge]
[] ? br_flood+0xcc/0x160 [bridge]
[] br_handle_frame_finish+0x224/0x4f0 [bridge]
[] br_handle_frame+0x174/0x300 [bridge]
[] __netif_receive_skb_core+0x329/0xab0
[] ? find_next_bit+0x15/0x20
[] ? cpumask_next_and+0x32/0x50
[] ? load_balance+0x178/0x9b0
[] __netif_receive_skb+0x18/0x60
[] netif_receive_skb_internal+0x40/0xb0
[] netif_receive_skb+0x1c/0x70
[] mlxsw_sp_rx_listener_func+0x61/0xb0 [mlxsw_spectrum]
[] mlxsw_core_skb_receive+0x187/0x200 [mlxsw_core]
[] mlxsw_pci_cq_tasklet+0x63a/0x9b0 [mlxsw_pci]
[] tasklet_action+0xf6/0x110
[] __do_softirq+0xf6/0x280
[] irq_exit+0xdf/0xf0
[] do_IRQ+0x54/0xd0
[] common_interrupt+0x8c/0x8cThe problem is that netdev_all_lower_get_next_rcu() never advances the
iterator, thereby causing the loop over the lower adjacency list to run
forever.Fix this by advancing the iterator and avoid the infinite loop.
Fixes: 7ce856aaaf13 ("mlxsw: spectrum: Add couple of lower device helper functions")
Signed-off-by: Ido Schimmel
Reported-by: Tamir Winetroub
Reviewed-by: Jiri Pirko
Acked-by: David Ahern
Signed-off-by: David S. Miller -
Fixes a panic when calling eth_get_headlen(). Noticed on i40e driver.
Fixes: d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
Signed-off-by: Eric Garver
Reviewed-by: Jakub Sitnicki
Acked-by: Amir Vadai
Signed-off-by: David S. Miller -
reuseport_add_sock() is not used from a module,
no need to export it.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
17 Oct, 2016
1 commit
-
After Jesper commit back in linux-3.18, we trigger a lockdep
splat in proc_create_data() while allocating memory from
pktgen_change_name().This patch converts t->if_lock to a mutex, since it is now only
used from control path, and adds proper locking to pktgen_change_name()1) pktgen_thread_lock to protect the outer loop (iterating threads)
2) t->if_lock to protect the inner loop (iterating devices)Note that before Jesper patch, pktgen_change_name() was lacking proper
protection, but lockdep was not able to detect the problem.Fixes: 8788370a1d4b ("pktgen: RCU-ify "if_list" to remove lock in next_to_run()")
Reported-by: John Sperbeck
Signed-off-by: Eric Dumazet
Cc: Jesper Dangaard Brouer
Signed-off-by: David S. Miller
16 Oct, 2016
1 commit
-
Pull gcc plugins update from Kees Cook:
"This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much possible uncertainty from a running system at boot
time as possible, hoping to capitalize on any possible variation in
CPU operation (due to runtime data differences, hardware differences,
SMP ordering, thermal timing variation, cache behavior, etc).At the very least, this plugin is a much more comprehensive example
for how to manipulate kernel code using the gcc plugin internals"* tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
latent_entropy: Mark functions with __latent_entropy
gcc-plugins: Add latent_entropy plugin
14 Oct, 2016
1 commit
-
The "vf_vlan_info" struct ends with a 2 byte struct hole so we have to
memset it to ensure that no stack information is revealed to user space.Fixes: 79aab093a0b5 ('net: Update API for VF vlan protocol 802.1ad support')
Signed-off-by: Dan Carpenter
Signed-off-by: David S. Miller