Eric Lee / smarc-fsl-linux-kernel

17 Aug, 2016

1 commit

e77e6ff50 netfilter: conntrack: do not dump other netns's conntrack entries via proc

We should skip the conntracks that belong to a different namespace,
otherwise other unrelated netns's conntrack entries will be dumped via
/proc/net/nf_conntrack.
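Here is a rough userspace illustration of the rule described above. Every entry in one shared table records the namespace that owns it, and a per-namespace dump skips entries owned by any other namespace. The types `struct netns` and `struct entry` and the function `count_visible()` are hypothetical stand-ins, not the conntrack code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a network namespace and a tracked entry. */
struct netns { int id; };

struct entry {
	const struct netns *owner;  /* namespace the entry was created in */
	int value;
};

/* Count only the entries that belong to @ns; entries from other
 * namespaces in the shared table are skipped, not shown. */
size_t count_visible(const struct entry *tbl, size_t n, const struct netns *ns)
{
	size_t visible = 0;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].owner != ns)
			continue;   /* belongs to another namespace: skip */
		visible++;
	}
	return visible;
}
```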

Fixes: 56d52d4892d0 ("netfilter: conntrack: use a single hashtable for all namespaces")
Signed-off-by: Liping Zhang
Reviewed-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-17 23:41:58 +0800

16 Aug, 2016

14 commits

a1560dd7a Merge branch 'mediatek-fixes'

Sean Wang says:

====================
mediatek: Fix warning and issue

This patch set fixes the following warning and issues

v1 -> v2: Fix message typos and add cover letter

v2 -> v3: Split from the previous series for submitting bug fixes
as a series targeting 'net'
====================

Signed-off-by: David S. Miller

David S. Miller
2016-08-16 14:02:45 +0800
55a4e7781 net: ethernet: mediatek: fix runtime warning raised by inconsistent struct device pointers passed to DMA API

When the DMA-API debug feature is enabled, a runtime warning is raised
because the pointers passed to the DMA API refer to inconsistent struct
device objects. This patch makes DMA operations such as dma_map_*() and
dma_unmap_*() use the same struct device, eliminating the warning.

Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sean.wang@mediatek.com
2016-08-16 14:02:44 +0800
b2025c7cc net: ethernet: mediatek: fix flow control settings on GMAC0 is not being enabled properly

Commit 08ef55c6f257acf3bdc6940813f80e8f0f5d90ec
("net-next: mediatek: fix gigabit and flow control advertisement")
had supported proper flow control settings for GMAC1. But for GMAC0,

1. GMAC0 shares the common logic with GMAC1 inside mtk_phy_link_adjust()
to adapt various settings for the target phy.

2. GMAC0 uses fixed-phy to connect to a builtin gigabit switch with a
fixed link speed, as commit 0c72c50f6f93b0c3daa9ea35d89ab3a933c7b5a0
("net-next: mediatek: add fixed-phy support") describes.

3. However, fixed-phy doesn't set the SUPPORTED_Pause and SUPPORTED_Asym_Pause
flags by default, so mtk_phy_link_adjust() does not enable flow control on
GMAC0 properly, and packets are dropped under high traffic.

For these reasons, the patch adds the SUPPORTED_Pause and SUPPORTED_Asym_Pause
flags to the fixed-phy used by the driver, so that both GMACs are handled
properly by the shared common logic.
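Here is a minimal sketch of the fix, assuming a simplified PHY struct. The driver ORs the two pause bits into the fixed PHY's supported mask. A shared link-adjust rule that only configures flow control when pause is supported then behaves the same for both MACs. The `SK_*` macros are local copies using the bit positions of the ethtool `SUPPORTED_Pause` (bit 13) and `SUPPORTED_Asym_Pause` (bit 14) flags. `struct fake_phy`, `enable_pause_caps()` and `flow_control_allowed()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the ethtool "supported" bits used here
 * (same bit positions as SUPPORTED_Pause / SUPPORTED_Asym_Pause). */
#define SK_SUPPORTED_PAUSE      (1u << 13)
#define SK_SUPPORTED_ASYM_PAUSE (1u << 14)

struct fake_phy {
	uint32_t supported;   /* capabilities the PHY reports */
};

/* The essence of the fix: advertise pause support on the fixed PHY. */
void enable_pause_caps(struct fake_phy *phy)
{
	phy->supported |= SK_SUPPORTED_PAUSE | SK_SUPPORTED_ASYM_PAUSE;
}

/* Simplified shared link-adjust rule (assumption): flow control is
 * only configured when the PHY claims pause support. */
bool flow_control_allowed(const struct fake_phy *phy)
{
	return (phy->supported & SK_SUPPORTED_PAUSE) != 0;
}
```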

Signed-off-by: Sean Wang
Signed-off-by: David S. Miller

sean.wang@mediatek.com
2016-08-16 14:02:44 +0800
8ca7f4fe0 net: ethernet: mediatek: fix RMII mode and add REVMII supported by GMAC

The patch fixes the incorrect setup of reduced MII (RMII) on GMAC,
adds support for setting up reverse MII (REVMII) on GMAC, and
rearranges the error handling for invalid PHY arguments.

Signed-off-by: Sean Wang
Signed-off-by: David S. Miller

sean.wang@mediatek.com
2016-08-16 14:02:44 +0800
d2fbdf76b tipc: fix NULL pointer dereference in shutdown()

tipc_msg_create() can return a NULL skb and if so, we shouldn't try to
call tipc_node_xmit_skb() on it.

general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 3 PID: 30298 Comm: trinity-c0 Not tainted 4.7.0-rc7+ #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff8800baf09980 ti: ffff8800595b8000 task.ti: ffff8800595b8000
RIP: 0010:[] [] tipc_node_xmit_skb+0x6b/0x140
RSP: 0018:ffff8800595bfce8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003023b0e0
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffff83d12580
RBP: ffff8800595bfd78 R08: ffffed000b2b7f32 R09: 0000000000000000
R10: fffffbfff0759725 R11: 0000000000000000 R12: 1ffff1000b2b7f9f
R13: ffff8800595bfd58 R14: ffffffff83d12580 R15: dffffc0000000000
FS: 00007fcdde242700(0000) GS:ffff88011af80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcddde1db10 CR3: 000000006874b000 CR4: 00000000000006e0
DR0: 00007fcdde248000 DR1: 00007fcddd73d000 DR2: 00007fcdde248000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602
Stack:
0000000000000018 0000000000000018 0000000041b58ab3 ffffffff83954208
ffffffff830bb400 ffff8800595bfd30 ffffffff8309d767 0000000000000018
0000000000000018 ffff8800595bfd78 ffffffff8309da1a 00000000810ee611
Call Trace:
[] tipc_shutdown+0x553/0x880
[] SyS_shutdown+0x14b/0x170
[] do_syscall_64+0x19c/0x410
[] entry_SYSCALL64_slow_path+0x25/0x25
Code: 90 00 b4 0b 83 c7 00 f1 f1 f1 f1 4c 8d 6d e0 c7 40 04 00 00 00 f4 c7 40 08 f3 f3 f3 f3 48 89 d8 48 c1 e8 03 c7 45 b4 00 00 00 00 3c 30 00 75 78 48 8d 7b 08 49 8d 75 c0 48 b8 00 00 00 00 00
RIP [] tipc_node_xmit_skb+0x6b/0x140
RSP
---[ end trace 57b0484e351e71f1 ]---

I feel like we should maybe return -ENOMEM or -ENOBUFS, but I'm not sure
userspace is equipped to handle that. Anyway, this is better than a GPF
and looks somewhat consistent with other tipc_msg_create() callers.
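The pattern here is to check the result of an allocation that may fail before passing it to a routine that dereferences it. The sketch below assumes hypothetical `msg_create()` and `xmit()` helpers that stand in for tipc_msg_create() and tipc_node_xmit_skb(). `sent_bytes` is a made-up counter that makes the effect observable.

```c
#include <stdlib.h>

struct buf { int len; };

int sent_bytes = 0;   /* hypothetical counter of transmitted bytes */

/* Hypothetical allocator that may fail, like tipc_msg_create(). */
struct buf *msg_create(int len, int fail)
{
	struct buf *b;

	if (fail)
		return NULL;
	b = malloc(sizeof(*b));
	if (b)
		b->len = len;
	return b;
}

/* Hypothetical transmit routine; it dereferences its argument. */
void xmit(struct buf *b)
{
	sent_bytes += b->len;
	free(b);
}

/* Shutdown path: only transmit when the allocation succeeded. */
void send_shutdown_msg(int fail)
{
	struct buf *b = msg_create(16, fail);

	if (b)
		xmit(b);
	/* else: skip silently instead of dereferencing NULL */
}
```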

Signed-off-by: Vegard Nossum
Acked-by: Ying Xue
Acked-by: Jon Maloy
Signed-off-by: David S. Miller

Vegard Nossum
2016-08-16 04:55:36 +0800
a8545b60a Merge branch 'hv_netvsc-VF-removal-fixes'

Vitaly Kuznetsov says:

====================
hv_netvsc: fixes for VF removal path

Kernel crash is reported after VF is removed and detached from netvsc
device. Turns out we have multiple different (but related) issues on the
VF removal path which I'm trying to address with PATCHes 2-5 of this
series. PATCH1 is required to support the change.

Changes since v1:
- Re-arrange patches in the series to not introduce new issues [David Miller]
- Add PATCH5 which fixes a new issue I discovered while testing.
- Add Haiyang's A-b tags to PATCH1-4

With regards to Stephen's suggestion: I believe that switching to using RCU
and eliminating vf_use_cnt/vf_inject is the right thing to do long-term, we
can either put this on top of this series or do it later in net-next.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-08-16 04:48:08 +0800
0dbff144a hv_netvsc: fix bonding devices check in netvsc_netdev_event()

Bonding driver sets IFF_BONDING on both master (the bonding device) and
slave (the real NIC) devices and in netvsc_netdev_event() we want to skip
master devices only. Currently, the outcome is uncertain when a slave
interface is removed: if the bonding module comes first in netdev_chain, it
clears the IFF_BONDING flag on the netdev and netvsc_netdev_event() correctly
handles the NETDEV_UNREGISTER event, but if netvsc comes first on the
chain, it sees the device with IFF_BONDING still set and skips it. As
we still hold the vf_netdev pointer to the device, we crash on the next inject.
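Here is a sketch of the distinction, assuming a single simplified flags field. In the kernel, IFF_MASTER and IFF_BONDING live in different fields of struct net_device. A filter that tests only the bonding bit also skips slaves, while a filter that tests for bonding plus master skips only the bond device itself. `F_MASTER`, `F_BONDING`, `struct fake_netdev`, `skip_old()` and `skip_new()` are all hypothetical.

```c
#include <stdbool.h>

/* Illustrative flag values, not the real IFF_* constants. */
#define F_MASTER  0x1   /* device is a bonding master */
#define F_BONDING 0x2   /* device takes part in bonding (master or slave) */

struct fake_netdev {
	unsigned int flags;
};

/* Old filter: also skips slaves, because they carry F_BONDING too. */
bool skip_old(const struct fake_netdev *d)
{
	return d->flags & F_BONDING;
}

/* Fixed filter: skip only the bonding master itself. */
bool skip_new(const struct fake_netdev *d)
{
	return (d->flags & F_BONDING) && (d->flags & F_MASTER);
}
```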

Signed-off-by: Vitaly Kuznetsov
Acked-by: Haiyang Zhang
Signed-off-by: David S. Miller

Vitaly Kuznetsov
2016-08-16 04:48:07 +0800
0f20d795f hv_netvsc: protect module refcount by checking net_device_ctx->vf_netdev

We're not guaranteed to see NETDEV_REGISTER/NETDEV_UNREGISTER notifications
only once per VF but we increase/decrease module refcount unconditionally.
Check vf_netdev to make sure we don't take/release it twice. We presume
that only one VF per netvsc device may exist.
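Here is a minimal sketch of the guard, assuming a hypothetical context struct in which `module_refs` stands in for the module refcount. The reference is taken only when no VF is recorded yet. It is dropped only when the departing device is the recorded one, so repeated notifications are harmless. `struct fake_ctx`, `on_register()` and `on_unregister()` are illustrative names.

```c
#include <stddef.h>

struct fake_ctx {
	void *vf_netdev;   /* currently attached VF, or NULL */
	int module_refs;   /* stand-in for the module refcount */
};

/* REGISTER may be delivered more than once: take a reference only
 * when no VF is recorded yet. */
void on_register(struct fake_ctx *ctx, void *vf)
{
	if (ctx->vf_netdev)
		return;
	ctx->vf_netdev = vf;
	ctx->module_refs++;
}

/* UNREGISTER likewise: drop only the reference we actually hold. */
void on_unregister(struct fake_ctx *ctx, void *vf)
{
	if (ctx->vf_netdev != vf)
		return;
	ctx->vf_netdev = NULL;
	ctx->module_refs--;
}
```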

Signed-off-by: Vitaly Kuznetsov
Acked-by: Haiyang Zhang
Signed-off-by: David S. Miller

Vitaly Kuznetsov
2016-08-16 04:48:07 +0800
57c1826b9 hv_netvsc: reset vf_inject on VF removal

We reset vf_inject on VF going down (netvsc_vf_down()) but we don't on
VF removal (netvsc_unregister_vf()) so vf_inject stays 'true' while
vf_netdev is already NULL and we're trying to inject packets into NULL
net device in netvsc_recv_callback() causing kernel to crash.

Signed-off-by: Vitaly Kuznetsov
Acked-by: Haiyang Zhang
Signed-off-by: David S. Miller

Vitaly Kuznetsov
2016-08-16 04:48:07 +0800
d072218f2 hv_netvsc: avoid deadlocks between rtnl lock and vf_use_cnt wait

Here is a deadlock scenario:
- netvsc_vf_up() schedules netvsc_notify_peers() work and quits.
- netvsc_vf_down() runs before netvsc_notify_peers() gets executed. As it
is being executed from netdev notifier chain we hold rtnl lock when we
get here.
- we enter while (atomic_read(&net_device_ctx->vf_use_cnt) != 0) loop and
wait till netvsc_notify_peers() drops vf_use_cnt.
- netvsc_notify_peers() starts on some other CPU but netdev_notify_peers()
will hang on rtnl_lock().
- deadlock!

Instead of introducing additional synchronization I suggest we drop
gwrk.dwrk completely and call NETDEV_NOTIFY_PEERS directly. As we're
acting under rtnl lock this is legitimate.

Signed-off-by: Vitaly Kuznetsov
Acked-by: Haiyang Zhang
Signed-off-by: David S. Miller

Vitaly Kuznetsov
2016-08-16 04:48:07 +0800
f9a7da913 hv_netvsc: don't lose VF information

struct netvsc_device is not suitable for storing VF information as this
structure is being destroyed on MTU change / set channel operation (see
rndis_filter_device_remove()). Move all VF related stuff to struct
net_device_context which is persistent.

Signed-off-by: Vitaly Kuznetsov
Acked-by: Haiyang Zhang
Signed-off-by: David S. Miller

Vitaly Kuznetsov
2016-08-16 04:48:07 +0800
3d7b33209 gre: set inner_protocol on xmit

Ensure that the inner_protocol is set on transmit so that GSO segmentation,
which relies on that field, works correctly.

This is achieved by setting the inner_protocol in gre_build_header rather
than each caller of that function. It ensures that the inner_protocol is
set when gre_fb_xmit() is used to transmit GRE, which was not previously the
case.

I have observed this is not the case when OvS transmits GRE using
lwtunnel metadata (which it always does).

Fixes: 38720352412a ("gre: Use inner_proto to obtain inner header protocol")
Cc: Pravin Shelar
Acked-by: Alexander Duyck
Signed-off-by: Simon Horman
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Simon Horman
2016-08-16 04:37:12 +0800
5e4578969 net: ipv6: Fix ping to link-local addresses.

ping_v6_sendmsg does not set flowi6_oif in response to
sin6_scope_id or sk_bound_dev_if, so it is not possible to use
these APIs to ping an IPv6 address on a different interface.
Instead, it sets flowi6_iif, which is incorrect but harmless.

Stop setting flowi6_iif, and support various ways of setting oif
in the same priority order used by udpv6_sendmsg.

Tested: https://android-review.googlesource.com/#/c/254470/
Signed-off-by: Lorenzo Colitti
Signed-off-by: David S. Miller

Lorenzo Colitti
2016-08-16 03:19:09 +0800
12311959e rhashtable: fix shift by 64 when shrinking

I got this:

================================================================================
UBSAN: Undefined behaviour in ./include/linux/log2.h:63:13
shift exponent 64 is too large for 64-bit type 'long unsigned int'
CPU: 1 PID: 721 Comm: kworker/1:1 Not tainted 4.8.0-rc1+ #87
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Workqueue: events rht_deferred_worker
0000000000000000 ffff88011661f8d8 ffffffff82344f50 0000000041b58ab3
ffffffff84f98000 ffffffff82344ea4 ffff88011661f900 ffff88011661f8b0
0000000000000001 ffff88011661f6b8 dffffc0000000000 ffffffff867f7640
Call Trace:
[] dump_stack+0xac/0xfc
[] ? _atomic_dec_and_lock+0xc4/0xc4
[] ubsan_epilogue+0xd/0x8a
[] __ubsan_handle_shift_out_of_bounds+0x255/0x29a
[] ? __ubsan_handle_out_of_bounds+0x180/0x180
[] ? nl80211_req_set_reg+0x256/0x2f0
[] ? print_context_stack+0x8a/0x160
[] ? amd_pmu_reset+0x341/0x380
[] rht_deferred_worker+0x1618/0x1790
[] ? rht_deferred_worker+0x1618/0x1790
[] ? rhashtable_jhash2+0x370/0x370
[] ? process_one_work+0x6fd/0x1970
[] process_one_work+0x79f/0x1970
[] ? process_one_work+0x6fd/0x1970
[] ? try_to_grab_pending+0x4c0/0x4c0
[] ? worker_thread+0x1c4/0x1340
[] worker_thread+0x55f/0x1340
[] ? __schedule+0x4df/0x1d40
[] ? process_one_work+0x1970/0x1970
[] ? process_one_work+0x1970/0x1970
[] kthread+0x237/0x390
[] ? __kthread_parkme+0x280/0x280
[] ? _raw_spin_unlock_irq+0x33/0x50
[] ret_from_fork+0x1f/0x40
[] ? __kthread_parkme+0x280/0x280
================================================================================

roundup_pow_of_two() is undefined when called with an argument of 0, so
let's avoid the call and just fall back to ht->p.min_size (which should
never be smaller than HASH_MIN_SIZE).
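Here is a sketch of the shrink-size computation with the zero case guarded. It assumes a target of `nelems * 3 / 2` before rounding, and that factor should be treated as an assumption. `roundup_pow2()` is a portable stand-in for the kernel's roundup_pow_of_two(). This loop version happens to return 1 for an input of 0, but the guard is kept because the kernel helper gives no such guarantee. `shrink_target()` is a hypothetical name.

```c
#include <stddef.h>

/* Portable stand-in for roundup_pow_of_two(): smallest power of two >= n. */
size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* New table size when shrinking: never round up 0, and never go
 * below the configured minimum size. */
size_t shrink_target(size_t nelems, size_t min_size)
{
	size_t size = 0;

	if (nelems)                         /* avoid the undefined 0 case */
		size = roundup_pow2(nelems * 3 / 2);
	if (size < min_size)                /* fall back to the minimum */
		size = min_size;
	return size;
}
```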

Cc: Herbert Xu
Signed-off-by: Vegard Nossum
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Vegard Nossum
2016-08-16 02:10:09 +0800

15 Aug, 2016

2 commits

eb8fc3235 mlxsw: spectrum_router: Fix use after free

In mlxsw_sp_router_fib4_add_info_destroy(), the fib_entry pointer is used
after it has been freed by mlxsw_sp_fib_entry_destroy(). Use a temporary
variable to fix this.
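The general fix for this kind of use after free is to copy whatever is still needed into a local variable before the object is released. The sketch uses a hypothetical `struct item` with a `next` field and a hypothetical `destroy_and_advance()` function. It does not reproduce the mlxsw code.

```c
#include <stdlib.h>

struct item {
	struct item *next;   /* hypothetical link still needed after free */
	int id;
};

/* Free @e and return what it pointed to; the field is read into a
 * temporary first, so @e is never touched after free(). */
struct item *destroy_and_advance(struct item *e)
{
	struct item *next = e->next;   /* save before freeing */

	free(e);
	return next;
}
```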

Fixes: 61c503f976b5449e ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Vincent Stehlé
Cc: Jiri Pirko
Acked-by: Ido Schimmel
Signed-off-by: David S. Miller

Vincent
2016-08-15 12:32:05 +0800
4cf0b354d rhashtable: avoid large lock-array allocations

Sander reports following splat after netfilter nat bysrc table got
converted to rhashtable:

swapper/0: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1 [..]
[] warn_alloc_failed+0xdd/0x140
[] __alloc_pages_nodemask+0x3e1/0xcf0
[] alloc_pages_current+0x8d/0x110
[] kmalloc_order+0x1f/0x70
[] __kmalloc+0x129/0x140
[] bucket_table_alloc+0xc1/0x1d0
[] rhashtable_insert_rehash+0x5d/0xe0
[] nf_nat_setup_info+0x2ef/0x400

The failure happens when allocating the spinlock array.
Even with GFP_KERNEL it's unlikely for such a large allocation
to succeed.

Thomas Graf pointed me at inet_ehash_locks_alloc(), so in addition
to adding NOWARN for atomic allocations this also makes the bucket-array
sizing more conservative.

In commit 095dc8e0c3686 ("tcp: fix/cleanup inet_ehash_locks_alloc()"),
Eric Dumazet says: "Budget 2 cache lines per cpu worth of 'spinlocks'".
IOW, consider the size needed by a single spinlock when determining the
number of locks per cpu. So with 64 bytes per cache line and 4 bytes per
spinlock, this gives 32 locks per cpu.

Resulting size of the lock-array (sizeof(spinlock) == 4):

cpus:   1     2     4     8     16    32    64
old:    1k    1k    4k    8k    16k   16k   16k
new:    128   256   512   1k    2k    4k    8k

An 8k allocation should have a decent chance of success even
with GFP_ATOMIC, and should not fail with GFP_KERNEL.

With 72-byte spinlock (LOCKDEP):
cpus:   1     2
old:    9k    18k
new:    ~2k   ~4k
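The arithmetic behind the "new" rows can be checked with a few lines of C. The sketch assumes a fixed 32 locks per CPU (2 cache lines of 64 bytes divided by a 4-byte spinlock), which is consistent with both tables above. It leaves out any rounding or upper bounds the real allocator may apply. `LOCKS_PER_CPU` and `lock_array_bytes()` are hypothetical names.

```c
#include <stddef.h>

/* Budget from the commit: 2 cache lines of 64 bytes filled with
 * 4-byte spinlocks, i.e. 2 * 64 / 4 = 32 locks per CPU. */
#define LOCKS_PER_CPU (2 * 64 / 4)

/* Lock-array size in bytes for @ncpus CPUs and a spinlock of
 * @lock_bytes (4 normally, 72 with LOCKDEP per the text). */
size_t lock_array_bytes(size_t ncpus, size_t lock_bytes)
{
	return ncpus * LOCKS_PER_CPU * lock_bytes;
}
```

For example, `lock_array_bytes(64, 4)` gives 8192 bytes, which is the 8k entry. `lock_array_bytes(1, 72)` gives 2304 bytes, which is the ~2k LOCKDEP figure.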

Reported-by: Sander Eikelenboom
Suggested-by: Thomas Graf
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2016-08-15 12:12:57 +0800

14 Aug, 2016

10 commits

952fcfd08 net: remove type_check from dev_get_nest_level()

The idea for type_check in dev_get_nest_level() was to count the number
of nested devices of the same type (currently, only macvlan or vlan
devices).
This prevented the false positive lockdep warning on configurations such
as:

eth0
Signed-off-by: David S. Miller

Sabrina Dubroca
2016-08-14 06:15:54 +0800
e20038724 macsec: fix lockdep splats when nesting devices

Currently, trying to set up a vlan over a macsec device, or other
combinations of devices, triggers a lockdep warning.

Use netdev_lockdep_set_classes and ndo_get_lock_subclass, similar to
what macvlan does.

Signed-off-by: Sabrina Dubroca
Signed-off-by: David S. Miller

Sabrina Dubroca
2016-08-14 06:15:54 +0800
bc561632d net: ipv6: Do not keep IPv6 addresses when IPv6 is disabled

If IPv6 is disabled when the option is set to keep IPv6
addresses on link down, userspace is unaware of this as
there is no such indication via netlink. The solution is to
remove the IPv6 addresses in this case, which results in
netlink messages indicating removal of addresses in the
usual manner. This fix also makes the behavior consistent
with the case of having IPv6 disabled first, which stops
IPv6 addresses from being added.

Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: Mike Manning
Acked-by: David Ahern
Signed-off-by: David S. Miller

Mike Manning
2016-08-14 06:14:00 +0800
54236ab09 net/sctp: always initialise sctp_ht_iter::start_fail

sctp_transport_seq_start() does not currently clear iter->start_fail on
success, but relies on it being zero when it is allocated (by
seq_open_net()).

This can be a problem in the following sequence:

open() // allocates iter (and implicitly sets iter->start_fail = 0)
read()
- iter->start() // fails and sets iter->start_fail = 1
- iter->stop() // doesn't call sctp_transport_walk_stop() (correct)
read() again
- iter->start() // succeeds, but doesn't change iter->start_fail
- iter->stop() // doesn't call sctp_transport_walk_stop() (wrong)

We should initialize sctp_ht_iter::start_fail to zero if ->start()
succeeds, otherwise it's possible that we leave an old value of 1 there,
which will cause ->stop() to not call sctp_transport_walk_stop(), which
causes all sorts of problems like not calling rcu_read_unlock() (and
preempt_enable()), eventually leading to more warnings like this:

BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 16551, name: trinity-c2
Preemption disabled at:[] rhashtable_walk_start+0x46/0x150

[] preempt_count_add+0x1fb/0x280
[] _raw_spin_lock+0x12/0x40
[] rhashtable_walk_start+0x46/0x150
[] sctp_transport_walk_start+0x2f/0x60
[] sctp_transport_seq_start+0x4d/0x150
[] traverse+0x170/0x850
[] seq_read+0x7cc/0x1180
[] proc_reg_read+0xbc/0x180
[] do_loop_readv_writev+0x134/0x210
[] do_readv_writev+0x565/0x660
[] vfs_readv+0x67/0xa0
[] do_preadv+0x126/0x170
[] SyS_preadv+0xc/0x10
[] do_syscall_64+0x19c/0x410
[] return_from_SYSCALL_64+0x0/0x6a
[] 0xffffffffffffffff

Notice that this is a subtly different stacktrace from the one in commit
5fc382d875 ("net/sctp: terminate rhashtable walk correctly").
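Here is a stripped-down model of the open/read sequence shown above. It assumes a hypothetical iterator in which `walking` stands in for the state that stop() must release (rcu_read_lock() and the rhashtable walk). start() now records its outcome on success as well as on failure, so a failed read followed by a successful one still ends the walk. `struct iter`, `iter_start()` and `iter_stop()` are illustrative names.

```c
#include <stdbool.h>

struct iter {
	bool start_fail;   /* set when start() could not begin a walk */
	int walking;       /* 1 while a walk (and its locks) is active */
};

/* Fixed start(): record the outcome on every call, success included. */
void iter_start(struct iter *it, bool ok)
{
	it->start_fail = !ok;
	if (ok)
		it->walking = 1;
}

/* stop() ends the walk only when start() succeeded. */
void iter_stop(struct iter *it)
{
	if (!it->start_fail)
		it->walking = 0;
}
```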

Cc: Xin Long
Cc: Herbert Xu
Cc: Eric W. Biederman
Cc: Marcelo Ricardo Leitner
Signed-off-by: Vegard Nossum
Acked-By: Neil Horman
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Vegard Nossum
2016-08-14 06:10:16 +0800
5ba092efc net/irda: handle iriap_register_lsap() allocation failure

If iriap_register_lsap() fails to allocate memory, self->lsap is
set to NULL. However, none of the callers handle the failure and
irlmp_connect_request() will happily dereference it:

iriap_register_lsap: Unable to allocated LSAP!
================================================================================
UBSAN: Undefined behaviour in net/irda/irlmp.c:378:2
member access within null pointer of type 'struct lsap_cb'
CPU: 1 PID: 15403 Comm: trinity-c0 Not tainted 4.8.0-rc1+ #81
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
0000000000000000 ffff88010c7e78a8 ffffffff82344f40 0000000041b58ab3
ffffffff84f98000 ffffffff82344e94 ffff88010c7e78d0 ffff88010c7e7880
ffff88010630ad00 ffffffff84a5fae0 ffffffff84d3f5c0 000000000000017a
Call Trace:
[] dump_stack+0xac/0xfc
[] ubsan_epilogue+0xd/0x8a
[] __ubsan_handle_type_mismatch+0x157/0x411
[] irlmp_connect_request+0x7ac/0x970
[] iriap_connect_request+0xa0/0x160
[] state_s_disconnect+0x88/0xd0
[] iriap_do_client_event+0x94/0x120
[] iriap_getvaluebyclass_request+0x3e0/0x6d0
[] irda_find_lsap_sel+0x1eb/0x630
[] irda_connect+0x828/0x12d0
[] SYSC_connect+0x22b/0x340
[] SyS_connect+0x9/0x10
[] do_syscall_64+0x1b3/0x4b0
[] entry_SYSCALL64_slow_path+0x25/0x25
================================================================================

The bug seems to have been around since forever.

There are more problems with missing error checks in iriap_init() (and
indeed all of irda_init()), but that's a bigger problem that needs
very careful review and testing. This patch fixes the most serious
bug (as it's easily reached from unprivileged userspace).

I have tested my patch with a reproducer.

Signed-off-by: Vegard Nossum
Signed-off-by: David S. Miller

Vegard Nossum
2016-08-14 06:09:07 +0800
c15c0ab12 ipv6: suppress sparse warnings in IP6_ECN_set_ce()

Pass the correct type __wsum to csum_sub() and csum_add(). This doesn't
really change anything since __wsum really *is* __be32, but removes the
address space warnings from sparse.

Cc: Eric Dumazet
Fixes: 34ae6a1aa054 ("ipv6: update skb->csum when CE mark is propagated")
Signed-off-by: Johannes Berg
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Johannes Berg
2016-08-14 06:08:00 +0800
0ed661d5a bpf: fix write helpers with regards to non-linear parts

Fix the bpf_try_make_writable() helper and all call sites we have in BPF;
it is currently defective with regard to skbs whose write_len spans into
non-linear parts, whether cloned or not.

There are multiple issues at once. First, using skb_store_bits() is not
correct since even if we have a cloned skb, page frags can still be shared.
To really make them private, we need to pull them in via __pskb_pull_tail()
first, which also gets us a private head via pskb_expand_head() implicitly.

This is for helpers like bpf_skb_store_bytes(), bpf_l3_csum_replace() and
bpf_l4_csum_replace(). Really, the only reasonable and working approach here
is to call skb_ensure_writable() before any write operation. Via
pskb_may_pull(), it makes sure that the parts we want to access are pulled
in, and if they are not, it pulls them in and unclones the skb implicitly.
If our write_len still fits the headlen and we're cloned and the header of
the clone is not writable, then we need to make a private copy via
pskb_expand_head(). skb_store_bits() is a bit misleading and only safe to
store into non-linear data in different contexts, such as 357b40a18b04
("[IPV6]: IPV6_CHECKSUM socket option can corrupt kernel memory").

For the above BPF helper functions, this means that after the fixed
bpf_try_make_writable() we have pulled in enough to always operate on
skb->data. Thus, the calls to skb_header_pointer() and skb_store_bits()
become superfluous. In bpf_skb_store_bytes(), the len check is unnecessary
too, since it can only pass in at most the BPF stack size, so adding the
offset is guaranteed never to overflow. Also, the bpf_l3/4_csum_replace()
helpers must test for proper offset alignment, since they use a __sum16
pointer for writing the resulting csum.
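Here is a sketch of the alignment rule for checksum writes. It assumes a hypothetical `write_csum16()` helper and a buffer whose start is itself 2-byte aligned. Odd offsets are rejected because the value is stored through a 16-bit pointer, and the bounds check keeps the 2-byte write inside the buffer. The -EFAULT return value is an illustrative choice.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 16-bit checksum at @offset in @buf.
 * Assumes @buf itself is 2-byte aligned. */
int write_csum16(uint8_t *buf, size_t len, size_t offset, uint16_t csum)
{
	/* Reject odd offsets (misaligned 16-bit store) and writes that
	 * would run past the end of the buffer. */
	if ((offset & 1) || offset + sizeof(uint16_t) > len)
		return -EFAULT;
	*(uint16_t *)(buf + offset) = csum;
	return 0;
}
```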

The remaining helpers that change skb data and are not discussed here yet
are bpf_skb_vlan_push(), bpf_skb_vlan_pop() and bpf_skb_change_proto(). The
vlan helpers internally call skb_ensure_writable() (pop case) and
skb_cow_head() (push case, for head expansion), respectively. Similarly,
bpf_skb_proto_xlat() takes care not to mangle page frags.

Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
Fixes: 91bc4822c3d6 ("tc: bpf: add checksum helpers")
Fixes: 3697649ff29e ("bpf: try harder on clones when writing into skb")
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2016-08-14 06:01:02 +0800
e8c2993a4 net: ethernet: mediatek: add the missing of_node_put() after node is used done

This patch adds the missing of_node_put() after finishing the usage
of of_parse_phandle() or of_node_get() used by fixed_phy.

Signed-off-by: Sean Wang
Signed-off-by: David S. Miller

sean.wang@mediatek.com
2016-08-14 05:58:38 +0800
d7005652c net: ethernet: mediatek: fixed that initializing u64_stats_sync is missing

Fix a runtime warning raised when lockdep is enabled, caused by
u64_stats_sync not being initialized, by adding the missing initialization.

Signed-off-by: Sean Wang
Signed-off-by: David S. Miller

sean.wang@mediatek.com
2016-08-14 05:58:38 +0800
b4c0e0c61 calipso: fix resource leak on calipso_genopt failure

Currently, if calipso_genopt fails then the error exit path
does not free the ipv6_opt_hdr new, causing a memory leak. Fix
this by kfree'ing new on the error exit path.

Signed-off-by: Colin Ian King
Signed-off-by: David S. Miller

Colin Ian King
2016-08-14 05:56:17 +0800

13 Aug, 2016

2 commits

747ea55e4 bpf: fix bpf_skb_in_cgroup helper naming

While hashing out BPF's current_task_under_cgroup helper bits, it came
to discussion that the skb_in_cgroup helper name was suboptimally chosen.

Tejun says:

So, I think in_cgroup should mean that the object is in that
particular cgroup while under_cgroup in the subhierarchy of that
cgroup. Let's rename the other subhierarchy test to under too. I
think that'd be a lot less confusing going forward.

[...]

It's more intuitive and gives us the room to implement the real
"in" test if ever necessary in the future.

Since this touches uapi bits, we need to change this while v4.8
is not yet officially released. Thus, change the helper enum and rename
related bits.
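Tejun's distinction between "in" and "under" can be made concrete with a toy hierarchy. The sketch assumes a hypothetical `struct cg` that records only its parent. `cg_in()` is an exact match, while `cg_under()` walks up the ancestors. An object is therefore "under" every cgroup on its path to the root but "in" only its own. This models the naming semantics only, not the BPF helper.

```c
#include <stdbool.h>
#include <stddef.h>

struct cg {
	const struct cg *parent;   /* NULL for the root */
};

/* "in": the object sits exactly in @target. */
bool cg_in(const struct cg *obj, const struct cg *target)
{
	return obj == target;
}

/* "under": @target is @obj itself or one of its ancestors. */
bool cg_under(const struct cg *obj, const struct cg *target)
{
	for (const struct cg *c = obj; c; c = c->parent)
		if (c == target)
			return true;
	return false;
}
```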

Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Reference: http://patchwork.ozlabs.org/patch/658500/
Suggested-by: Sargun Dhillon
Suggested-by: Tejun Heo
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov

Daniel Borkmann
2016-08-13 12:53:33 +0800
601bbae0b dsa: mv88e6xxx: hide unused functions

When CONFIG_NET_DSA_HWMON is disabled, we get warnings about two unused
functions whose only callers are all inside of an #ifdef:

drivers/net/dsa/mv88e6xxx.c:3257:12: 'mv88e6xxx_mdio_page_write' defined but not used [-Werror=unused-function]
drivers/net/dsa/mv88e6xxx.c:3244:12: 'mv88e6xxx_mdio_page_read' defined but not used [-Werror=unused-function]

This adds another ifdef around the function definitions. The warnings
appeared after the functions were marked 'static', but the problem
was already there before that.

Signed-off-by: Arnd Bergmann
Fixes: 57d3231057e9 ("net: dsa: mv88e6xxx: fix style issues")
Reviewed-by: Vivien Didelot
Signed-off-by: David S. Miller

Arnd Bergmann
2016-08-13 08:32:21 +0800

12 Aug, 2016

2 commits

bbe11fab0 macsec: use after free when deleting the underlying device

macsec_notify() loops over the list of macsec devices configured on the
underlying device when this device is being removed. This list is part
of the rx_handler data.

However, macsec_dellink unregisters the rx_handler and frees the
rx_handler data when the last macsec device is removed from the
underlying device.

Add macsec_common_dellink() to delete macsec devices without
unregistering the rx_handler and freeing the associated data.

Fixes: 960d5848dbf1 ("macsec: fix memory leaks around rx_handler (un)registration")
Signed-off-by: Sabrina Dubroca
Signed-off-by: David S. Miller

Sabrina Dubroca
2016-08-12 00:58:57 +0800
104a49339 macvtap: fix use after free for skb_array during release

We've cleaned up the skb_array in macvtap_put_queue() but still try to pop
from it during macvtap_sock_destruct(). Fix this use after free by moving
the skb_array cleanup to macvtap_sock_destruct() instead.

Fixes: 362899b8725b ("macvtap: switch to use skb array")
Reported-by: Cornelia Huck
Tested-by: Cornelia Huck
Signed-off-by: Jason Wang
Signed-off-by: David S. Miller

Jason Wang
2016-08-12 00:55:51 +0800

11 Aug, 2016

4 commits

4b5b9ba55 openvswitch: do not ignore netdev errors when creating tunnel vports

The creation of a tunnel vport (geneve, gre, vxlan) brings up a
corresponding netdev, a multi-step operation which can fail.

For example, changing a vxlan vport's netdev state to 'up' binds the
vport's socket to a UDP port - if the binding fails (e.g. due to the
port being in use), the error is currently ignored, giving the
appearance that the tunnel vport creation completed successfully.

Signed-off-by: Martynas Pumputis
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Martynas Pumputis
2016-08-11 14:13:23 +0800
dafa6b0db net: hns: fix typo in g_gmac_stats_string[]

s/gamc/gmac/

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2016-08-11 08:57:50 +0800
672ca65d9 tipc: fix variable dereference before NULL check

In commit cf6f7e1d5109 ("tipc: dump monitor attributes"),
I dereferenced a pointer before checking if it's valid.
This is reported by the static checker Smatch as:
net/tipc/monitor.c:733 tipc_nl_add_monitor_peer()
warn: variable dereferenced before check 'mon' (see line 731)

In this commit, we check for a valid monitor before proceeding
with any other operation.

Fixes: cf6f7e1d5109 ("tipc: dump monitor attributes")
Reported-by: Dan Carpenter
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2016-08-11 08:56:52 +0800
293fddff2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Use mod_timer_pending() to avoid reactivating a dead expectation in
the h323 conntrack helper, from Liping Zhang.

2) Oneliner to fix a typo in the register name defined in the nf_tables
header.

3) Don't try to look further when we find an inactive element with no
descendants in the rbtree set implementation, otherwise we crash.

4) Handle valid zero CSeq in the SIP conntrack helper, from
Christophe Leroy.

5) Don't display a trailing slash in conntrack helper with no classes
via /proc/net/nf_conntrack_expect, from Liping Zhang.

6) Fix an expectation leak during creation from the nfqueue path, again
from Liping Zhang.

7) Validate netlink port ID in verdict message from nfqueue, otherwise
an injection can be possible. Again from Zhang.

8) Reject conntrack tuples with different transport protocol on
original and reply tuples, also from Zhang.

9) Validate offset and length in nft_exthdr, making sure they fit in a
u8, from Laura Garcia Liebana.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-08-11 05:54:27 +0800

10 Aug, 2016

5 commits

4da449ae1 netfilter: nft_exthdr: Add size check on u8 nft_exthdr attributes

Fix the direct assignment of offset and length attributes included in
nft_exthdr structure from u32 data to u8.
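Here is a sketch of the kind of check this adds, assuming a hypothetical `struct exthdr` and `exthdr_init()`. The 32-bit attribute values are range-checked before they are narrowed to u8, instead of being silently truncated. The -ERANGE error code is an illustrative choice.

```c
#include <errno.h>
#include <stdint.h>

struct exthdr {
	uint8_t offset;
	uint8_t len;
};

/* Validate the 32-bit netlink values before narrowing them to u8,
 * instead of silently truncating them (e.g. 256 would become 0). */
int exthdr_init(struct exthdr *e, uint32_t offset, uint32_t len)
{
	if (offset > UINT8_MAX || len > UINT8_MAX)
		return -ERANGE;
	e->offset = (uint8_t)offset;
	e->len = (uint8_t)len;
	return 0;
}
```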

Signed-off-by: Laura Garcia Liebana
Signed-off-by: Pablo Neira Ayuso

Laura Garcia Liebana
2016-08-10 19:10:13 +0800
7bb90c371 bridge: Fix problems around fdb entries pointing to the bridge device

Adding fdb entries pointing to the bridge device uses fdb_insert(),
which lacks various checks and does not respect added_by_user flag.

As a result, some inconsistent behavior can happen:
* Adding temporary entries succeeds but results in permanent entries.
* Same goes for "dynamic" and "use".
* Changing mac address of the bridge device causes deletion of
user-added entries.
* Replacing existing entries looks successful from userspace but actually
is not, regardless of NLM_F_EXCL flag.

Use the same logic as other entries and fix them.

Fixes: 3741873b4f73 ("bridge: allow adding of fdb entries pointing to the bridge device")
Signed-off-by: Toshiaki Makita
Acked-by: Roopa Prabhu
Signed-off-by: David S. Miller

Toshiaki Makita
2016-08-10 12:42:44 +0800
836384d25 net: phy: micrel: Add specific suspend

Disable all interrupts on suspend; they will be re-enabled on
resume. Otherwise, the suspend/resume process will occasionally be
blocked.

Signed-off-by: Wenyou Yang
Acked-by: Nicolas Ferre
Signed-off-by: David S. Miller

Wenyou Yang
2016-08-10 07:19:15 +0800
a96d3b759 dm9000: Fix irq trigger type setup on non-dt platforms

Commit b5a099c67a1c36b "net: ethernet: davicom: fix devicetree irq
resource" causes an interrupt storm after the ethernet interface
is activated on S3C24XX platform (ARM non-dt), due to the interrupt
trigger type not being set properly.

It seems, after adding parsing of IRQ flags in commit 7085a7401ba54e92b
"drivers: platform: parse IRQ flags from resources", there is no path
for non-dt platforms where irq_set_type callback could be invoked when
we don't pass the trigger type flags to the request_irq() call.

In case of a board where the regression is seen the interrupt trigger
type flags are passed through a platform device's resource and it is
not currently handled properly without passing the irq trigger type
flags to the request_irq() call. In case of OF an of_irq_get() call
within platform_get_irq() function seems to be ensuring required irq_chip
setup, but there is no equivalent code for non OF/ACPI platforms.

This patch mostly restores the irq trigger type setting code that was
removed in commit ("net: ethernet: davicom: fix devicetree irq resource").

Fixes: b5a099c67a1c36b913 ("net: ethernet: davicom: fix devicetree irq resource")

Signed-off-by: Sylwester Nawrocki
Acked-by: Robert Jarzmik
Signed-off-by: David S. Miller

Sylwester Nawrocki
2016-08-10 06:08:22 +0800
0d039f337 bonding: fix the typo

The message "803.ad" should be "802.3ad".

Signed-off-by: Zhu Yanjun
Signed-off-by: David S. Miller

Zhu Yanjun
2016-08-10 05:57:14 +0800