Eric Lee / smarc-fsl-linux-kernel

18 Aug, 2016

6 commits

184ca8234 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
Fietkau.

2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.

3) Fix some tg3 ethtool logic bugs, and one that would cause no
interrupts to be generated when rx-coalescing is set to 0. From
Satish Baddipadige and Siva Reddy Kallam.

4) QLCNIC mailbox corruption and napi budget handling fix from Manish
Chopra.

5) Fix fib_trie logic when walking the trie during /proc/net/route
output than can access a stale node pointer. From David Forster.

6) Several sctp_diag fixes from Phil Sutter.

7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.

8) Checksum fixup fixes in bpf from Daniel Borkmann.

9) Memork leaks in nfnetlink, from Liping Zhang.

10) Use after free in rxrpc, from David Howells.

11) Use after free in new skb_array code of macvtap driver, from Jason
Wang.

12) Calipso resource leak, from Colin Ian King.

13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.

14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.

15) Fix lockdep splats in macsec, from Sabrina Dubroca.

16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
handling.

17) Various tc-action bug fixes, from CONG Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net_sched: allow flushing tc police actions
net_sched: unify the init logic for act_police
net_sched: convert tcf_exts from list to pointer array
net_sched: move tc offload macros to pkt_cls.h
net_sched: fix a typo in tc_for_each_action()
net_sched: remove an unnecessary list_del()
net_sched: remove the leftover cleanup_a()
mlxsw: spectrum: Allow packets to be trapped from any PG
mlxsw: spectrum: Unmap 802.1Q FID before destroying it
mlxsw: spectrum: Add missing rollbacks in error path
mlxsw: reg: Fix missing op field fill-up
mlxsw: spectrum: Trap loop-backed packets
mlxsw: spectrum: Add missing packet traps
mlxsw: spectrum: Mark port as active before registering it
mlxsw: spectrum: Create PVID vPort before registering netdevice
mlxsw: spectrum: Remove redundant errors from the code
mlxsw: spectrum: Don't return upon error in removal path
i40e: check for and deal with non-contiguous TCs
ixgbe: Re-enable ability to toggle VLAN filtering
ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
...

Linus Torvalds
2016-08-18 08:26:58 +0800
b5ac85188 net_sched: allow flushing tc police actions ... Browse Code »

The act_police uses its own code to walk the
action hashtable, which leads to that we could
not flush standalone tc police actions, so just
switch to tcf_generic_walker() like other actions.

(Joint work from Roman and Cong.)

Signed-off-by: Roman Mashak
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

Roman Mashak
2016-08-18 07:27:51 +0800
0852e4552 net_sched: unify the init logic for act_police ... Browse Code »

Jamal reported a crash when we create a police action
with a specific index, this is because the init logic
is not correct, we should always create one for this
case. Just unify the logic with other tc actions.

Fixes: a03e6fe56971 ("act_police: fix a crash during removal")
Reported-by: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
2016-08-18 07:27:51 +0800
22dc13c83 net_sched: convert tcf_exts from list to pointer array ... Browse Code »

As pointed out by Jamal, an action could be shared by
multiple filters, so we can't use list to chain them
any more after we get rid of the original tc_action.
Instead, we could just save pointers to these actions
in tcf_exts, since they are refcount'ed, so convert
the list to an array of pointers.

The "ugly" part is the action API still accepts list
as a parameter, I just introduce a helper function to
convert the array of pointers to a list, instead of
relying on the C99 feature to iterate the array.

Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
2016-08-18 07:27:51 +0800
824a7e886 net_sched: remove an unnecessary list_del() ... Browse Code »

This list_del() for tc action is not needed actually,
because we only use this list to chain bulk operations,
therefore should not be carried for latter operations.

Fixes: ec0595cc4495 ("net_sched: get rid of struct tcf_common")
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
2016-08-18 07:27:51 +0800
f07fed82a net_sched: remove the leftover cleanup_a() ... Browse Code »

After refactoring tc_action into tcf_common, we no
longer need to cleanup temporary "actions" in list,
they are permanently stored in the hashtable.

Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
2016-08-18 07:27:51 +0800

17 Aug, 2016

1 commit

3ec60b92d Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost ... Browse Code »

Pull virtio/vhost fixes from Michael Tsirkin:
- test fixes
- a vsock fix

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
tools/virtio: add dma stubs
vhost/test: fix after swiotlb changes
vhost/vsock: drop space available check for TX vq
ringtest: test build fix

Linus Torvalds
2016-08-17 06:51:57 +0800

16 Aug, 2016

3 commits

d2fbdf76b tipc: fix NULL pointer dereference in shutdown() ... Browse Code »

tipc_msg_create() can return a NULL skb and if so, we shouldn't try to
call tipc_node_xmit_skb() on it.

general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 3 PID: 30298 Comm: trinity-c0 Not tainted 4.7.0-rc7+ #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff8800baf09980 ti: ffff8800595b8000 task.ti: ffff8800595b8000
RIP: 0010:[] [] tipc_node_xmit_skb+0x6b/0x140
RSP: 0018:ffff8800595bfce8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003023b0e0
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffff83d12580
RBP: ffff8800595bfd78 R08: ffffed000b2b7f32 R09: 0000000000000000
R10: fffffbfff0759725 R11: 0000000000000000 R12: 1ffff1000b2b7f9f
R13: ffff8800595bfd58 R14: ffffffff83d12580 R15: dffffc0000000000
FS: 00007fcdde242700(0000) GS:ffff88011af80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcddde1db10 CR3: 000000006874b000 CR4: 00000000000006e0
DR0: 00007fcdde248000 DR1: 00007fcddd73d000 DR2: 00007fcdde248000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602
Stack:
0000000000000018 0000000000000018 0000000041b58ab3 ffffffff83954208
ffffffff830bb400 ffff8800595bfd30 ffffffff8309d767 0000000000000018
0000000000000018 ffff8800595bfd78 ffffffff8309da1a 00000000810ee611
Call Trace:
[] tipc_shutdown+0x553/0x880
[] SyS_shutdown+0x14b/0x170
[] do_syscall_64+0x19c/0x410
[] entry_SYSCALL64_slow_path+0x25/0x25
Code: 90 00 b4 0b 83 c7 00 f1 f1 f1 f1 4c 8d 6d e0 c7 40 04 00 00 00 f4 c7 40 08 f3 f3 f3 f3 48 89 d8 48 c1 e8 03 c7 45 b4 00 00 00 00 3c 30 00 75 78 48 8d 7b 08 49 8d 75 c0 48 b8 00 00 00 00 00
RIP [] tipc_node_xmit_skb+0x6b/0x140
RSP
---[ end trace 57b0484e351e71f1 ]---

I feel like we should maybe return -ENOMEM or -ENOBUFS, but I'm not sure
userspace is equipped to handle that. Anyway, this is better than a GPF
and looks somewhat consistent with other tipc_msg_create() callers.

Signed-off-by: Vegard Nossum
Acked-by: Ying Xue
Acked-by: Jon Maloy
Signed-off-by: David S. Miller

Vegard Nossum
2016-08-16 04:55:36 +0800
3d7b33209 gre: set inner_protocol on xmit ... Browse Code »

Ensure that the inner_protocol is set on transmit so that GSO segmentation,
which relies on that field, works correctly.

This is achieved by setting the inner_protocol in gre_build_header rather
than each caller of that function. It ensures that the inner_protocol is
set when gre_fb_xmit() is used to transmit GRE which was not previously the
case.

I have observed this is not the case when OvS transmits GRE using
lwtunnel metadata (which it always does).

Fixes: 38720352412a ("gre: Use inner_proto to obtain inner header protocol")
Cc: Pravin Shelar
Acked-by: Alexander Duyck
Signed-off-by: Simon Horman
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Simon Horman
2016-08-16 04:37:12 +0800
5e4578969 net: ipv6: Fix ping to link-local addresses. ... Browse Code »

ping_v6_sendmsg does not set flowi6_oif in response to
sin6_scope_id or sk_bound_dev_if, so it is not possible to use
these APIs to ping an IPv6 address on a different interface.
Instead, it sets flowi6_iif, which is incorrect but harmless.

Stop setting flowi6_iif, and support various ways of setting oif
in the same priority order used by udpv6_sendmsg.

Tested: https://android-review.googlesource.com/#/c/254470/
Signed-off-by: Lorenzo Colitti
Signed-off-by: David S. Miller

Lorenzo Colitti
2016-08-16 03:19:09 +0800

15 Aug, 2016

1 commit

21bc54fc0 vhost/vsock: drop space available check for TX vq ... Browse Code »

Remove unnecessary use of enable/disable callback notifications
and the incorrect more space available check.

The virtio_transport_tx_work handles when the TX virtqueue
has more buffers available.

Signed-off-by: Gerard Garcia
Acked-by: Stefan Hajnoczi
Signed-off-by: Michael S. Tsirkin

Gerard Garcia
2016-08-15 10:05:21 +0800

14 Aug, 2016

6 commits

952fcfd08 net: remove type_check from dev_get_nest_level() ... Browse Code »

The idea for type_check in dev_get_nest_level() was to count the number
of nested devices of the same type (currently, only macvlan or vlan
devices).
This prevented the false positive lockdep warning on configurations such
as:

eth0
Signed-off-by: David S. Miller

Sabrina Dubroca
2016-08-14 06:15:54 +0800
bc561632d net: ipv6: Do not keep IPv6 addresses when IPv6 is disabled ... Browse Code »

If IPv6 is disabled when the option is set to keep IPv6
addresses on link down, userspace is unaware of this as
there is no such indication via netlink. The solution is to
remove the IPv6 addresses in this case, which results in
netlink messages indicating removal of addresses in the
usual manner. This fix also makes the behavior consistent
with the case of having IPv6 disabled first, which stops
IPv6 addresses from being added.

Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: Mike Manning
Acked-by: David Ahern
Signed-off-by: David S. Miller

Mike Manning
2016-08-14 06:14:00 +0800
54236ab09 net/sctp: always initialise sctp_ht_iter::start_fail ... Browse Code »

sctp_transport_seq_start() does not currently clear iter->start_fail on
success, but relies on it being zero when it is allocated (by
seq_open_net()).

This can be a problem in the following sequence:

open() // allocates iter (and implicitly sets iter->start_fail = 0)
read()
- iter->start() // fails and sets iter->start_fail = 1
- iter->stop() // doesn't call sctp_transport_walk_stop() (correct)
read() again
- iter->start() // succeeds, but doesn't change iter->start_fail
- iter->stop() // doesn't call sctp_transport_walk_stop() (wrong)

We should initialize sctp_ht_iter::start_fail to zero if ->start()
succeeds, otherwise it's possible that we leave an old value of 1 there,
which will cause ->stop() to not call sctp_transport_walk_stop(), which
causes all sorts of problems like not calling rcu_read_unlock() (and
preempt_enable()), eventually leading to more warnings like this:

BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 16551, name: trinity-c2
Preemption disabled at:[] rhashtable_walk_start+0x46/0x150

[] preempt_count_add+0x1fb/0x280
[] _raw_spin_lock+0x12/0x40
[] rhashtable_walk_start+0x46/0x150
[] sctp_transport_walk_start+0x2f/0x60
[] sctp_transport_seq_start+0x4d/0x150
[] traverse+0x170/0x850
[] seq_read+0x7cc/0x1180
[] proc_reg_read+0xbc/0x180
[] do_loop_readv_writev+0x134/0x210
[] do_readv_writev+0x565/0x660
[] vfs_readv+0x67/0xa0
[] do_preadv+0x126/0x170
[] SyS_preadv+0xc/0x10
[] do_syscall_64+0x19c/0x410
[] return_from_SYSCALL_64+0x0/0x6a
[] 0xffffffffffffffff

Notice that this is a subtly different stacktrace from the one in commit
5fc382d875 ("net/sctp: terminate rhashtable walk correctly").

Cc: Xin Long
Cc: Herbert Xu
Cc: Eric W. Biederman
Cc: Marcelo Ricardo Leitner
Signed-off-by: Vegard Nossum
Acked-By: Neil Horman
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Vegard Nossum
2016-08-14 06:10:16 +0800
5ba092efc net/irda: handle iriap_register_lsap() allocation failure ... Browse Code »

If iriap_register_lsap() fails to allocate memory, self->lsap is
set to NULL. However, none of the callers handle the failure and
irlmp_connect_request() will happily dereference it:

iriap_register_lsap: Unable to allocated LSAP!
================================================================================
UBSAN: Undefined behaviour in net/irda/irlmp.c:378:2
member access within null pointer of type 'struct lsap_cb'
CPU: 1 PID: 15403 Comm: trinity-c0 Not tainted 4.8.0-rc1+ #81
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org
04/01/2014
0000000000000000 ffff88010c7e78a8 ffffffff82344f40 0000000041b58ab3
ffffffff84f98000 ffffffff82344e94 ffff88010c7e78d0 ffff88010c7e7880
ffff88010630ad00 ffffffff84a5fae0 ffffffff84d3f5c0 000000000000017a
Call Trace:
[] dump_stack+0xac/0xfc
[] ubsan_epilogue+0xd/0x8a
[] __ubsan_handle_type_mismatch+0x157/0x411
[] irlmp_connect_request+0x7ac/0x970
[] iriap_connect_request+0xa0/0x160
[] state_s_disconnect+0x88/0xd0
[] iriap_do_client_event+0x94/0x120
[] iriap_getvaluebyclass_request+0x3e0/0x6d0
[] irda_find_lsap_sel+0x1eb/0x630
[] irda_connect+0x828/0x12d0
[] SYSC_connect+0x22b/0x340
[] SyS_connect+0x9/0x10
[] do_syscall_64+0x1b3/0x4b0
[] entry_SYSCALL64_slow_path+0x25/0x25
================================================================================

The bug seems to have been around since forever.

There's more problems with missing error checks in iriap_init() (and
indeed all of irda_init()), but that's a bigger problem that needs
very careful review and testing. This patch will fix the most serious
bug (as it's easily reached from unprivileged userspace).

I have tested my patch with a reproducer.

Signed-off-by: Vegard Nossum
Signed-off-by: David S. Miller

Vegard Nossum
2016-08-14 06:09:07 +0800
0ed661d5a bpf: fix write helpers with regards to non-linear parts ... Browse Code »

Fix the bpf_try_make_writable() helper and all call sites we have in BPF,
it's currently defect with regards to skbs when the write_len spans into
non-linear parts, no matter if cloned or not.

There are multiple issues at once. First, using skb_store_bits() is not
correct since even if we have a cloned skb, page frags can still be shared.
To really make them private, we need to pull them in via __pskb_pull_tail()
first, which also gets us a private head via pskb_expand_head() implicitly.

This is for helpers like bpf_skb_store_bytes(), bpf_l3_csum_replace(),
bpf_l4_csum_replace(). Really, the only thing reasonable and working here
is to call skb_ensure_writable() before any write operation. Meaning, via
pskb_may_pull() it makes sure that parts we want to access are pulled in and
if not does so plus unclones the skb implicitly. If our write_len still fits
the headlen and we're cloned and our header of the clone is not writable,
then we need to make a private copy via pskb_expand_head(). skb_store_bits()
is a bit misleading and only safe to store into non-linear data in different
contexts such as 357b40a18b04 ("[IPV6]: IPV6_CHECKSUM socket option can
corrupt kernel memory").

For above BPF helper functions, it means after fixed bpf_try_make_writable(),
we've pulled in enough, so that we operate always based on skb->data. Thus,
the call to skb_header_pointer() and skb_store_bits() becomes superfluous.
In bpf_skb_store_bytes(), the len check is unnecessary too since it can
only pass in maximum of BPF stack size, so adding offset is guaranteed to
never overflow. Also bpf_l3/4_csum_replace() helpers must test for proper
offset alignment since they use __sum16 pointer for writing resulting csum.

The remaining helpers that change skb data not discussed here yet are
bpf_skb_vlan_push(), bpf_skb_vlan_pop() and bpf_skb_change_proto(). The
vlan helpers internally call either skb_ensure_writable() (pop case) and
skb_cow_head() (push case, for head expansion), respectively. Similarly,
bpf_skb_proto_xlat() takes care to not mangle page frags.

Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
Fixes: 91bc4822c3d6 ("tc: bpf: add checksum helpers")
Fixes: 3697649ff29e ("bpf: try harder on clones when writing into skb")
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2016-08-14 06:01:02 +0800
b4c0e0c61 calipso: fix resource leak on calipso_genopt failure ... Browse Code »

Currently, if calipso_genopt fails then the error exit path
does not free the ipv6_opt_hdr new causing a memory leak. Fix
this by kfree'ing new on the error exit path.

Signed-off-by: Colin Ian King
Signed-off-by: David S. Miller

Colin Ian King
2016-08-14 05:56:17 +0800

13 Aug, 2016

2 commits

747ea55e4 bpf: fix bpf_skb_in_cgroup helper naming ... Browse Code »

While hashing out BPF's current_task_under_cgroup helper bits, it came
to discussion that the skb_in_cgroup helper name was suboptimally chosen.

Tejun says:

So, I think in_cgroup should mean that the object is in that
particular cgroup while under_cgroup in the subhierarchy of that
cgroup. Let's rename the other subhierarchy test to under too. I
think that'd be a lot less confusing going forward.

[...]

It's more intuitive and gives us the room to implement the real
"in" test if ever necessary in the future.

Since this touches uapi bits, we need to change this as long as v4.8
is not yet officially released. Thus, change the helper enum and rename
related bits.

Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Reference: http://patchwork.ozlabs.org/patch/658500/
Suggested-by: Sargun Dhillon
Suggested-by: Tejun Heo
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov

Daniel Borkmann
2016-08-13 12:53:33 +0800
990917006 Merge tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:

- Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user
needs multiple different security services (e.g. krb5i and krb5p).

- Stable patch to fix a regression introduced by the use of
SO_REUSEPORT, and that prevented the use of multiple different NFS
versions to the same server.

- TCP socket reconnection timer fixes.

- Patch from Neil to disable the use of IPv6 temporary addresses"

* tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Cap the transport reconnection timer at 1/2 lease period
NFSv4: Cleanup the setting of the nfs4 lease period
SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout
SUNRPC: Fix reconnection timeouts
NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED
SUNRPC: disable the use of IPv6 temporary addresses.
SUNRPC: allow for upcalls for same uid but different gss service
SUNRPC: Fix up socket autodisconnect
SUNRPC: Handle EADDRNOTAVAIL on connection failures

Linus Torvalds
2016-08-13 03:32:24 +0800

12 Aug, 2016

1 commit

6da7e9532 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost ... Browse Code »

Pull virtio/vhost fixes and cleanups from Michael Tsirkin:
"Misc fixes and cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio/s390: deprecate old transport
virtio/s390: keep early_put_chars
virtio_blk: Fix a slient kernel panic
virtio-vsock: fix include guard typo
vhost/vsock: fix vhost virtio_vsock_pkt use-after-free
9p/trans_virtio: use kvfree() for iov_iter_get_pages_alloc()
virtio: fix error handling for debug builds
virtio: fix memory leak in virtqueue_add()

Linus Torvalds
2016-08-12 05:10:23 +0800

11 Aug, 2016

3 commits

4b5b9ba55 openvswitch: do not ignore netdev errors when creating tunnel vports ... Browse Code »

The creation of a tunnel vport (geneve, gre, vxlan) brings up a
corresponding netdev, a multi-step operation which can fail.

For example, changing a vxlan vport's netdev state to 'up' binds the
vport's socket to a UDP port - if the binding fails (e.g. due to the
port being in use), the error is currently ignored giving the
appearance that the tunnel vport creation completed successfully.

Signed-off-by: Martynas Pumputis
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Martynas Pumputis
2016-08-11 14:13:23 +0800
672ca65d9 tipc: fix variable dereference before NULL check ... Browse Code »

In commit cf6f7e1d5109 ("tipc: dump monitor attributes"),
I dereferenced a pointer before checking if its valid.
This is reported by static check Smatch as:
net/tipc/monitor.c:733 tipc_nl_add_monitor_peer()
warn: variable dereferenced before check 'mon' (see line 731)

In this commit, we check for a valid monitor before proceeding
with any other operation.

Fixes: cf6f7e1d5109 ("tipc: dump monitor attributes")
Reported-by: Dan Carpenter
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2016-08-11 08:56:52 +0800
293fddff2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Use mod_timer_pending() to avoid reactivating a dead expectation in
the h323 conntrack helper, from Liping Zhang.

2) Oneliner to fix a type in the register name defined in the nf_tables
header.

3) Don't try to look further when we find an inactive elements with no
descendants in the rbtree set implementation, otherwise we crash.

4) Handle valid zero CSeq in the SIP conntrack helper, from
Christophe Leroy.

5) Don't display a trailing slash in conntrack helper with no classes
via /proc/net/nf_conntrack_expect, from Liping Zhang.

6) Fix an expectation leak during creation from the nfqueue path, again
from Liping Zhang.

7) Validate netlink port ID in verdict message from nfqueue, otherwise
an injection can be possible. Again from Zhang.

8) Reject conntrack tuples with different transport protocol on
original and reply tuples, also from Zhang.

9) Validate offset and length in nft_exthdr, make sure they are under
sizeof(u8), from Laura Garcia Liebana.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-08-11 05:54:27 +0800

10 Aug, 2016

8 commits

4da449ae1 netfilter: nft_exthdr: Add size check on u8 nft_exthdr attributes ... Browse Code »

Fix the direct assignment of offset and length attributes included in
nft_exthdr structure from u32 data to u8.

Signed-off-by: Laura Garcia Liebana
Signed-off-by: Pablo Neira Ayuso

Laura Garcia Liebana
2016-08-10 19:10:13 +0800
7bb90c371 bridge: Fix problems around fdb entries pointing to the bridge device ... Browse Code »

Adding fdb entries pointing to the bridge device uses fdb_insert(),
which lacks various checks and does not respect added_by_user flag.

As a result, some inconsistent behavior can happen:
* Adding temporary entries succeeds but results in permanent entries.
* Same goes for "dynamic" and "use".
* Changing mac address of the bridge device causes deletion of
user-added entries.
* Replacing existing entries looks successful from userspace but actually
not, regardless of NLM_F_EXCL flag.

Use the same logic as other entries and fix them.

Fixes: 3741873b4f73 ("bridge: allow adding of fdb entries pointing to the bridge device")
Signed-off-by: Toshiaki Makita
Acked-by: Roopa Prabhu
Signed-off-by: David S. Miller

Toshiaki Makita
2016-08-10 12:42:44 +0800
a5d0dc810 vti: flush x-netns xfrm cache when vti interface is removed ... Browse Code »

When executing the script included below, the netns delete operation
hangs with the following message (repeated at 10 second intervals):

kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

This occurs because a reference to the lo interface in the "secure" netns
is still held by a dst entry in the xfrm bundle cache in the init netns.

Address this problem by garbage collecting the tunnel netns flow cache
when a cross-namespace vti interface receives a NETDEV_DOWN notification.

A more detailed description of the problem scenario (referencing commands
in the script below):

(1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1

The vti_test interface is created in the init namespace. vti_tunnel_init()
attaches a struct ip_tunnel to the vti interface's netdev_priv(dev),
setting the tunnel net to &init_net.

(2) ip link set vti_test netns secure

The vti_test interface is moved to the "secure" netns. Note that
the associated struct ip_tunnel still has tunnel->net set to &init_net.

(3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

The first packet sent using the vti device causes xfrm_lookup() to be
called as follows:

dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0);

Note that tunnel->net is the init namespace, while skb_dst(skb) references
the vti_test interface in the "secure" namespace. The returned dst
references an interface in the init namespace.

Also note that the first parameter to xfrm_lookup() determines which flow
cache is used to store the computed xfrm bundle, so after xfrm_lookup()
returns there will be a cached bundle in the init namespace flow cache
with a dst referencing a device in the "secure" namespace.

(4) ip netns del secure

Kernel begins to delete the "secure" namespace. At some point the
vti_test interface is deleted, at which point dst_ifdown() changes
the dst->dev in the cached xfrm bundle flow from vti_test to lo (still
in the "secure" namespace however).
Since nothing has happened to cause the init namespace's flow cache
to be garbage collected, this dst remains attached to the flow cache,
so the kernel loops waiting for the last reference to lo to go away.

ip link add br1 type bridge
ip link set dev br1 up
ip addr add dev br1 1.1.1.1/8

ip netns add secure
ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
ip link set vti_test netns secure
ip netns exec secure ip link set vti_test up
ip netns exec secure ip link s lo up
ip netns exec secure ip addr add dev lo 192.168.100.1/24
ip netns exec secure ip route add 192.168.200.0/24 dev vti_test
ip xfrm policy flush
ip xfrm state flush
ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \
proto esp mode tunnel mark 1
ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \
proto esp mode tunnel mark 1
ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \
mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \
mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788

ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

ip netns del secure

Reported-by: Hangbin Liu
Reported-by: Jan Tluka
Signed-off-by: Lance Richardson
Signed-off-by: David S. Miller

Lance Richardson
2016-08-10 03:57:49 +0800
992c273af rxrpc: Free packets discarded in data_ready ... Browse Code »

Under certain conditions, the data_ready handler will discard a packet.
These need to be freed.

Signed-off-by: David Howells

David Howells
2016-08-10 00:13:56 +0800
50fd85a1f rxrpc: Fix a use-after-push in data_ready handler ... Browse Code »

Fix a use of a packet after it has been enqueued onto the packet processing
queue in the data_ready handler. Once on a call's Rx queue, we mustn't
touch it any more as it may be dequeued and freed by the call processor
running on a work queue.

Save the values we need before enqueuing.

Without this, we can get an oops like the following:

BUG: unable to handle kernel NULL pointer dereference at 000000000000009c
IP: [] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc]
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
CPU: 2 PID: 0 Comm: swapper/2 Tainted: G E 4.7.0-fsdevel+ #1336
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
task: ffff88040d6863c0 task.stack: ffff88040d68c000
RIP: 0010:[] [] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc]
RSP: 0018:ffff88041fb03a78 EFLAGS: 00010246
RAX: ffffffffffffffff RBX: ffff8803ff195b00 RCX: 0000000000000001
RDX: ffffffffa01854d1 RSI: 0000000000000008 RDI: ffff8803ff195b00
RBP: ffff88041fb03ab0 R08: 0000000000000000 R09: 0000000000000001
R10: ffff88041fb038c8 R11: 0000000000000000 R12: ffff880406874800
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000009c CR3: 0000000001c14000 CR4: 00000000001406e0
Stack:
ffff8803ff195ea0 ffff880408348800 ffff880406874800 ffff8803ff195b00
ffff880408348800 ffff8803ff195ed8 0000000000000000 ffff88041fb03af0
ffffffffa0186072 0000000000000000 ffff8804054da000 0000000000000000
Call Trace:

[] rxrpc_data_ready+0x89d/0xbae [af_rxrpc]
[] __sock_queue_rcv_skb+0x24c/0x2b2
[] __udp_queue_rcv_skb+0x4b/0x1bd
[] udp_queue_rcv_skb+0x281/0x4db
[] __udp4_lib_rcv+0x7ed/0x963
[] udp_rcv+0x15/0x17
[] ip_local_deliver_finish+0x1c3/0x318
[] ip_local_deliver+0xbb/0xc4
[] ? inet_del_offload+0x40/0x40
[] ip_rcv_finish+0x3ce/0x42c
[] ip_rcv+0x304/0x33d
[] ? ip_local_deliver_finish+0x318/0x318
[] __netif_receive_skb_core+0x601/0x6e8
[] __netif_receive_skb+0x13/0x54
[] netif_receive_skb_internal+0xbb/0x17c
[] napi_gro_receive+0xf9/0x1bd
[] rtl8169_poll+0x32b/0x4a8
[] net_rx_action+0xe8/0x357
[] __do_softirq+0x1aa/0x414
[] irq_exit+0x3d/0xb0
[] do_IRQ+0xe4/0xfc
[] common_interrupt+0x93/0x93

[] ? cpuidle_enter_state+0x1ad/0x2be
[] ? cpuidle_enter_state+0x1a8/0x2be
[] cpuidle_enter+0x12/0x14
[] call_cpuidle+0x39/0x3b
[] cpu_startup_entry+0x230/0x35d
[] start_secondary+0xf4/0xf7

Signed-off-by: David Howells

David Howells
2016-08-10 00:13:55 +0800
2e7e9758b rxrpc: Once packet posted in data_ready, don't retry posting ... Browse Code »

Once a packet has been posted to a connection in the data_ready handler, we
mustn't try reposting if we then find that the connection is dying as the
refcount has been given over to the dying connection and the packet might
no longer exist.

Losing the packet isn't a problem as the peer will retransmit.

Signed-off-by: David Howells

David Howells
2016-08-10 00:13:55 +0800
f9dc57572 rxrpc: Don't access connection from call if pointer is NULL ... Browse Code »

The call state machine processor sets up the message parameters for a UDP
message that it might need to transmit in advance on the basis that there's
a very good chance it's going to have to transmit either an ACK or an
ABORT. This requires it to look in the connection struct to retrieve some
of the parameters.

However, if the call is complete, the call connection pointer may be NULL
to dissuade the processor from transmitting a message. However, there are
some situations where the processor is still going to be called - and it's
still going to set up message parameters whether it needs them or not.

This results in a NULL pointer dereference at:

net/rxrpc/call_event.c:837

To fix this, skip the message pre-initialisation if there's no connection
attached.

Signed-off-by: David Howells

David Howells
2016-08-10 00:12:23 +0800
17b963e31 rxrpc: Need to flag call as being released on connect failure ... Browse Code »

If rxrpc_new_client_call() fails to make a connection, the call record that
it allocated needs to be marked as RXRPC_CALL_RELEASED before it is passed
to rxrpc_put_call() to indicate that it no longer has any attachment to the
AF_RXRPC socket.

Without this, an assertion failure may occur at:

net/rxrpc/call_object:635

Signed-off-by: David Howells

David Howells
2016-08-10 00:12:23 +0800

09 Aug, 2016

9 commits

1b8553c04 9p/trans_virtio: use kvfree() for iov_iter_get_pages_alloc() ... Browse Code »

The memory allocated by iov_iter_get_pages_alloc() can be allocated with
vmalloc() if kmalloc() failed -- see get_pages_array().

In that case we need to free it with vfree(), so let's use kvfree().

The bug manifests like this:

BUG: unable to handle kernel paging request at ffffeb0400072da0
IP: [] kfree+0x4b/0x140
PGD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 2 PID: 675 Comm: trinity-c2 Not tainted 4.7.0-rc7+ #14
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff8800badef2c0 ti: ffff880069208000 task.ti: ffff880069208000
RIP: 0010:[] [] kfree+0x4b/0x140
RSP: 0000:ffff88006920f3f0 EFLAGS: 00010282
RAX: ffffea0000000000 RBX: ffffc90001cb6000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffc90001cb6000
RBP: ffff88006920f410 R08: 0000000000000000 R09: dffffc0000000000
R10: ffff8800badefa30 R11: 0000056a3d3b0d9f R12: ffff88006920f620
R13: ffffeb0400072d80 R14: ffff8800baa94078 R15: 0000000000000000
FS: 00007fbd2b437700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffeb0400072da0 CR3: 000000006926d000 CR4: 00000000000006e0
Stack:
0000000000000001 ffff88006920f620 ffffed001755280f ffff8800baa94078
ffff88006920f6a8 ffffffff8310442b dffffc0000000000 ffff8800badefa30
ffff8800badefa28 ffff88011af1fba0 1ffff1000d241e98 ffff8800ba892150
Call Trace:
[] p9_virtio_zc_request+0x72b/0xdb0
[] p9_client_zc_rpc.constprop.8+0x246/0xb10
[] p9_client_read+0x4c9/0x750
[] v9fs_fid_readpage+0x14c/0x320
[] v9fs_vfs_readpage+0x36/0x50
[] filemap_fault+0x9a3/0xe60
[] __do_fault+0x158/0x300
[] handle_mm_fault+0x1cf1/0x3c80
[] __do_page_fault+0x30a/0x8e0
[] do_page_fault+0x2f/0x80
[] do_async_page_fault+0x27/0xa0
[] async_page_fault+0x28/0x30
Code: 00 80 41 54 53 49 01 fd 48 0f 42 05 b0 39 67 02 48 89 fb 49 01 c5 48 b8 00 00 00 00 00 ea ff ff 49 c1 ed 0c 49 c1 e5 06 49 01 c5 8b 45 20 48 8d 50 ff a8 01 4c 0f 45 ea 49 8b 55 20 48 8d 42
RIP [] kfree+0x4b/0x140
RSP
CR2: ffffeb0400072da0
---[ end trace f3d59a04bafec038 ]---

Cc: Al Viro
Signed-off-by: Vegard Nossum
Signed-off-by: Michael S. Tsirkin

Vegard Nossum
2016-08-09 18:42:36 +0800
55cae7a40 rxrpc: fix uninitialized pointer dereference in debug code ... Browse Code »

A newly added bugfix caused an uninitialized variable to be
used for printing debug output. This is harmless as long
as the debug setting is disabled, but otherwise leads to an
immediate crash.

gcc warns about this when -Wmaybe-uninitialized is enabled:

net/rxrpc/call_object.c: In function 'rxrpc_release_call':
net/rxrpc/call_object.c:496:163: error: 'sp' may be used uninitialized in this function [-Werror=maybe-uninitialized]

The initialization was removed but one of the users remains.
This adds back the initialization.

Signed-off-by: Arnd Bergmann
Fixes: 372ee16386bb ("rxrpc: Fix races between skb free, ACK generation and replying")
Signed-off-by: David Howells

Arnd Bergmann
2016-08-09 17:51:38 +0800
aa0c2c68a netfilter: ctnetlink: reject new conntrack request with different l4proto ... Browse Code »

Currently, user can add a conntrack with different l4proto via nfnetlink.
For example, original tuple is TCP while reply tuple is SCTP. This is
invalid combination, we should report EINVAL to userspace.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-09 16:39:26 +0800
00a3101f5 netfilter: nfnetlink_queue: reject verdict request from different portid ... Browse Code »

Like NFQNL_MSG_VERDICT_BATCH do, we should also reject the verdict
request when the portid is not same with the initial portid(maybe
from another process).

Fixes: 97d32cf9440d ("netfilter: nfnetlink_queue: batch verdict support")
Signed-off-by: Liping Zhang
Reviewed-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-09 16:39:25 +0800
b18bcb001 netfilter: nfnetlink_queue: fix memory leak when attach expectation successfully ... Browse Code »

User can use NFQA_EXP to attach expectations to conntracks, but we
forget to put back nf_conntrack_expect when it is inserted successfully,
i.e. in this normal case, expect's use refcnt will be 3. So even we
unlink it and put it back later, the use refcnt is still 1, then the
memory will be leaked forever.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-09 16:39:25 +0800
b173a28f6 netfilter: nf_ct_expect: remove the redundant slash when policy name is empty ... Browse Code »

The 'name' filed in struct nf_conntrack_expect_policy{} is not a
pointer, so check it is NULL or not will always return true. Even if the
name is empty, slash will always be displayed like follows:
# cat /proc/net/nf_conntrack_expect
297 l3proto = 2 proto=6 src=1.1.1.1 dst=2.2.2.2 sport=1 dport=1025 ftp/
^

Fixes: 3a8fc53a45c4 ("netfilter: nf_ct_helper: allocate 16 bytes for the helper and policy names")
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-09 16:38:46 +0800
1fe323aa1 sctp: use event->chunk when it's valid ... Browse Code »

Commit 52253db924d1 ("sctp: also point GSO head_skb to the sk when
it's available") used event->chunk->head_skb to get the head_skb in
sctp_ulpevent_set_owner().

But at that moment, the event->chunk was NULL, as it cloned the skb
in sctp_ulpevent_make_rcvmsg(). Therefore, that patch didn't really
work.

This patch is to move the event->chunk initialization before calling
sctp_ulpevent_receive_data() so that it uses event->chunk when it's
valid.

Fixes: 52253db924d1 ("sctp: also point GSO head_skb to the sk when it's available")
Signed-off-by: Xin Long
Acked-by: Marcelo Ricardo Leitner
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Xin Long
2016-08-09 05:31:23 +0800
8065694e6 bpf: fix checksum for vlan push/pop helper ... Browse Code »

When having skbs on ingress with CHECKSUM_COMPLETE, tc BPF programs don't
push rcsum of mac header back in and after BPF run back pull out again as
opposed to some other subsystems (ovs, for example).

For cases like q-in-q, meaning when a vlan tag for offloading is already
present and we're about to push another one, then skb_vlan_push() pushes the
inner one into the skb, increasing mac header and skb_postpush_rcsum()'ing
the 4 bytes vlan header diff. Likewise, for the reverse operation in
skb_vlan_pop() for the case where vlan header needs to be pulled out of the
skb, we're decreasing the mac header and skb_postpull_rcsum()'ing the 4 bytes
rcsum of the vlan header that was removed.

However mangling the rcsum here will lead to hw csum failure for BPF case,
since we're pulling or pushing data that was not part of the current rcsum.
Changing tc BPF programs in general to push/pull rcsum around BPF_PROG_RUN()
is also not really an option since current behaviour is ABI by now, but apart
from that would also mean to do quite a bit of useless work in the sense that
usually 12 bytes need to be rcsum pushed/pulled also when we don't need to
touch this vlan related corner case. One way to fix it would be to push the
necessary rcsum fixup down into vlan helpers that are (mostly) slow-path
anyway.

Fixes: 4e10df9a60d9 ("bpf: introduce bpf_skb_vlan_push/pop() helpers")
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2016-08-09 04:11:43 +0800
479ffccce bpf: fix checksum fixups on bpf_skb_store_bytes ... Browse Code »

bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM
flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way.
Where we ran into an issue with bpf_skb_store_bytes() is when we did a
single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM
flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The
underlying issue has been tracked down to a buffer alignment issue.

Meaning, that csum_partial() computations via skb_postpull_rcsum() and
skb_postpush_rcsum() pair invoked had a wrong result since they operated on
an odd address for the hoplimit, while other computations were done on an
even address. This mix doesn't work as-is with skb_postpull_rcsum(),
skb_postpush_rcsum() pair as it always expects at least half-word alignment
of input buffers, which is normally the case. Thus, instead of these helpers
using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(),
csum_block_add(), respectively. For unaligned offsets, they rotate the sum
to align it to a half-word boundary again, otherwise they work the same as
csum_sub() and csum_add().

Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the
offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw
csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers
use a 0 constant for offset so that the compiler optimizes the offset & 1
test away and generates the same code as with csum_sub()/_add().

Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2016-08-09 04:11:43 +0800