Eric Lee / smarc-fsl-linux-kernel

13 Aug, 2016

1 commit

747ea55e4 bpf: fix bpf_skb_in_cgroup helper naming ... Browse Code »

While hashing out BPF's current_task_under_cgroup helper bits, it came
to discussion that the skb_in_cgroup helper name was suboptimally chosen.

Tejun says:

So, I think in_cgroup should mean that the object is in that
particular cgroup while under_cgroup in the subhierarchy of that
cgroup. Let's rename the other subhierarchy test to under too. I
think that'd be a lot less confusing going forward.

[...]

It's more intuitive and gives us the room to implement the real
"in" test if ever necessary in the future.

Since this touches uapi bits, we need to change this as long as v4.8
is not yet officially released. Thus, change the helper enum and rename
related bits.

Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Reference: http://patchwork.ozlabs.org/patch/658500/
Suggested-by: Sargun Dhillon
Suggested-by: Tejun Heo
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov

Daniel Borkmann
2016-08-13 12:53:33 +0800

11 Aug, 2016

3 commits

4b5b9ba55 openvswitch: do not ignore netdev errors when creating tunnel vports ... Browse Code »

The creation of a tunnel vport (geneve, gre, vxlan) brings up a
corresponding netdev, a multi-step operation which can fail.

For example, changing a vxlan vport's netdev state to 'up' binds the
vport's socket to a UDP port - if the binding fails (e.g. due to the
port being in use), the error is currently ignored giving the
appearance that the tunnel vport creation completed successfully.

Signed-off-by: Martynas Pumputis
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Martynas Pumputis
2016-08-11 14:13:23 +0800
672ca65d9 tipc: fix variable dereference before NULL check ... Browse Code »

In commit cf6f7e1d5109 ("tipc: dump monitor attributes"),
I dereferenced a pointer before checking if its valid.
This is reported by static check Smatch as:
net/tipc/monitor.c:733 tipc_nl_add_monitor_peer()
warn: variable dereferenced before check 'mon' (see line 731)

In this commit, we check for a valid monitor before proceeding
with any other operation.

Fixes: cf6f7e1d5109 ("tipc: dump monitor attributes")
Reported-by: Dan Carpenter
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2016-08-11 08:56:52 +0800
293fddff2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Use mod_timer_pending() to avoid reactivating a dead expectation in
the h323 conntrack helper, from Liping Zhang.

2) Oneliner to fix a type in the register name defined in the nf_tables
header.

3) Don't try to look further when we find an inactive elements with no
descendants in the rbtree set implementation, otherwise we crash.

4) Handle valid zero CSeq in the SIP conntrack helper, from
Christophe Leroy.

5) Don't display a trailing slash in conntrack helper with no classes
via /proc/net/nf_conntrack_expect, from Liping Zhang.

6) Fix an expectation leak during creation from the nfqueue path, again
from Liping Zhang.

7) Validate netlink port ID in verdict message from nfqueue, otherwise
an injection can be possible. Again from Zhang.

8) Reject conntrack tuples with different transport protocol on
original and reply tuples, also from Zhang.

9) Validate offset and length in nft_exthdr, make sure they are under
sizeof(u8), from Laura Garcia Liebana.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-08-11 05:54:27 +0800

10 Aug, 2016

8 commits

4da449ae1 netfilter: nft_exthdr: Add size check on u8 nft_exthdr attributes ... Browse Code »

Fix the direct assignment of offset and length attributes included in
nft_exthdr structure from u32 data to u8.

Signed-off-by: Laura Garcia Liebana
Signed-off-by: Pablo Neira Ayuso

Laura Garcia Liebana
2016-08-10 19:10:13 +0800
7bb90c371 bridge: Fix problems around fdb entries pointing to the bridge device ... Browse Code »

Adding fdb entries pointing to the bridge device uses fdb_insert(),
which lacks various checks and does not respect added_by_user flag.

As a result, some inconsistent behavior can happen:
* Adding temporary entries succeeds but results in permanent entries.
* Same goes for "dynamic" and "use".
* Changing mac address of the bridge device causes deletion of
user-added entries.
* Replacing existing entries looks successful from userspace but actually
not, regardless of NLM_F_EXCL flag.

Use the same logic as other entries and fix them.

Fixes: 3741873b4f73 ("bridge: allow adding of fdb entries pointing to the bridge device")
Signed-off-by: Toshiaki Makita
Acked-by: Roopa Prabhu
Signed-off-by: David S. Miller

Toshiaki Makita
2016-08-10 12:42:44 +0800
a5d0dc810 vti: flush x-netns xfrm cache when vti interface is removed ... Browse Code »

When executing the script included below, the netns delete operation
hangs with the following message (repeated at 10 second intervals):

kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

This occurs because a reference to the lo interface in the "secure" netns
is still held by a dst entry in the xfrm bundle cache in the init netns.

Address this problem by garbage collecting the tunnel netns flow cache
when a cross-namespace vti interface receives a NETDEV_DOWN notification.

A more detailed description of the problem scenario (referencing commands
in the script below):

(1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1

The vti_test interface is created in the init namespace. vti_tunnel_init()
attaches a struct ip_tunnel to the vti interface's netdev_priv(dev),
setting the tunnel net to &init_net.

(2) ip link set vti_test netns secure

The vti_test interface is moved to the "secure" netns. Note that
the associated struct ip_tunnel still has tunnel->net set to &init_net.

(3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

The first packet sent using the vti device causes xfrm_lookup() to be
called as follows:

dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0);

Note that tunnel->net is the init namespace, while skb_dst(skb) references
the vti_test interface in the "secure" namespace. The returned dst
references an interface in the init namespace.

Also note that the first parameter to xfrm_lookup() determines which flow
cache is used to store the computed xfrm bundle, so after xfrm_lookup()
returns there will be a cached bundle in the init namespace flow cache
with a dst referencing a device in the "secure" namespace.

(4) ip netns del secure

Kernel begins to delete the "secure" namespace. At some point the
vti_test interface is deleted, at which point dst_ifdown() changes
the dst->dev in the cached xfrm bundle flow from vti_test to lo (still
in the "secure" namespace however).
Since nothing has happened to cause the init namespace's flow cache
to be garbage collected, this dst remains attached to the flow cache,
so the kernel loops waiting for the last reference to lo to go away.

ip link add br1 type bridge
ip link set dev br1 up
ip addr add dev br1 1.1.1.1/8

ip netns add secure
ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
ip link set vti_test netns secure
ip netns exec secure ip link set vti_test up
ip netns exec secure ip link s lo up
ip netns exec secure ip addr add dev lo 192.168.100.1/24
ip netns exec secure ip route add 192.168.200.0/24 dev vti_test
ip xfrm policy flush
ip xfrm state flush
ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \
proto esp mode tunnel mark 1
ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \
proto esp mode tunnel mark 1
ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \
mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \
mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788

ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

ip netns del secure

Reported-by: Hangbin Liu
Reported-by: Jan Tluka
Signed-off-by: Lance Richardson
Signed-off-by: David S. Miller

Lance Richardson
2016-08-10 03:57:49 +0800
992c273af rxrpc: Free packets discarded in data_ready ... Browse Code »

Under certain conditions, the data_ready handler will discard a packet.
These need to be freed.

Signed-off-by: David Howells

David Howells
2016-08-10 00:13:56 +0800
50fd85a1f rxrpc: Fix a use-after-push in data_ready handler ... Browse Code »

Fix a use of a packet after it has been enqueued onto the packet processing
queue in the data_ready handler. Once on a call's Rx queue, we mustn't
touch it any more as it may be dequeued and freed by the call processor
running on a work queue.

Save the values we need before enqueuing.

Without this, we can get an oops like the following:

BUG: unable to handle kernel NULL pointer dereference at 000000000000009c
IP: [] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc]
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
CPU: 2 PID: 0 Comm: swapper/2 Tainted: G E 4.7.0-fsdevel+ #1336
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
task: ffff88040d6863c0 task.stack: ffff88040d68c000
RIP: 0010:[] [] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc]
RSP: 0018:ffff88041fb03a78 EFLAGS: 00010246
RAX: ffffffffffffffff RBX: ffff8803ff195b00 RCX: 0000000000000001
RDX: ffffffffa01854d1 RSI: 0000000000000008 RDI: ffff8803ff195b00
RBP: ffff88041fb03ab0 R08: 0000000000000000 R09: 0000000000000001
R10: ffff88041fb038c8 R11: 0000000000000000 R12: ffff880406874800
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000009c CR3: 0000000001c14000 CR4: 00000000001406e0
Stack:
ffff8803ff195ea0 ffff880408348800 ffff880406874800 ffff8803ff195b00
ffff880408348800 ffff8803ff195ed8 0000000000000000 ffff88041fb03af0
ffffffffa0186072 0000000000000000 ffff8804054da000 0000000000000000
Call Trace:

[] rxrpc_data_ready+0x89d/0xbae [af_rxrpc]
[] __sock_queue_rcv_skb+0x24c/0x2b2
[] __udp_queue_rcv_skb+0x4b/0x1bd
[] udp_queue_rcv_skb+0x281/0x4db
[] __udp4_lib_rcv+0x7ed/0x963
[] udp_rcv+0x15/0x17
[] ip_local_deliver_finish+0x1c3/0x318
[] ip_local_deliver+0xbb/0xc4
[] ? inet_del_offload+0x40/0x40
[] ip_rcv_finish+0x3ce/0x42c
[] ip_rcv+0x304/0x33d
[] ? ip_local_deliver_finish+0x318/0x318
[] __netif_receive_skb_core+0x601/0x6e8
[] __netif_receive_skb+0x13/0x54
[] netif_receive_skb_internal+0xbb/0x17c
[] napi_gro_receive+0xf9/0x1bd
[] rtl8169_poll+0x32b/0x4a8
[] net_rx_action+0xe8/0x357
[] __do_softirq+0x1aa/0x414
[] irq_exit+0x3d/0xb0
[] do_IRQ+0xe4/0xfc
[] common_interrupt+0x93/0x93

[] ? cpuidle_enter_state+0x1ad/0x2be
[] ? cpuidle_enter_state+0x1a8/0x2be
[] cpuidle_enter+0x12/0x14
[] call_cpuidle+0x39/0x3b
[] cpu_startup_entry+0x230/0x35d
[] start_secondary+0xf4/0xf7

Signed-off-by: David Howells

David Howells
2016-08-10 00:13:55 +0800
2e7e9758b rxrpc: Once packet posted in data_ready, don't retry posting ... Browse Code »

Once a packet has been posted to a connection in the data_ready handler, we
mustn't try reposting if we then find that the connection is dying as the
refcount has been given over to the dying connection and the packet might
no longer exist.

Losing the packet isn't a problem as the peer will retransmit.

Signed-off-by: David Howells

David Howells
2016-08-10 00:13:55 +0800
f9dc57572 rxrpc: Don't access connection from call if pointer is NULL ... Browse Code »

The call state machine processor sets up the message parameters for a UDP
message that it might need to transmit in advance on the basis that there's
a very good chance it's going to have to transmit either an ACK or an
ABORT. This requires it to look in the connection struct to retrieve some
of the parameters.

However, if the call is complete, the call connection pointer may be NULL
to dissuade the processor from transmitting a message. However, there are
some situations where the processor is still going to be called - and it's
still going to set up message parameters whether it needs them or not.

This results in a NULL pointer dereference at:

net/rxrpc/call_event.c:837

To fix this, skip the message pre-initialisation if there's no connection
attached.

Signed-off-by: David Howells

David Howells
2016-08-10 00:12:23 +0800
17b963e31 rxrpc: Need to flag call as being released on connect failure ... Browse Code »

If rxrpc_new_client_call() fails to make a connection, the call record that
it allocated needs to be marked as RXRPC_CALL_RELEASED before it is passed
to rxrpc_put_call() to indicate that it no longer has any attachment to the
AF_RXRPC socket.

Without this, an assertion failure may occur at:

net/rxrpc/call_object:635

Signed-off-by: David Howells

David Howells
2016-08-10 00:12:23 +0800

09 Aug, 2016

11 commits

55cae7a40 rxrpc: fix uninitialized pointer dereference in debug code ... Browse Code »

A newly added bugfix caused an uninitialized variable to be
used for printing debug output. This is harmless as long
as the debug setting is disabled, but otherwise leads to an
immediate crash.

gcc warns about this when -Wmaybe-uninitialized is enabled:

net/rxrpc/call_object.c: In function 'rxrpc_release_call':
net/rxrpc/call_object.c:496:163: error: 'sp' may be used uninitialized in this function [-Werror=maybe-uninitialized]

The initialization was removed but one of the users remains.
This adds back the initialization.

Signed-off-by: Arnd Bergmann
Fixes: 372ee16386bb ("rxrpc: Fix races between skb free, ACK generation and replying")
Signed-off-by: David Howells

Arnd Bergmann
2016-08-09 17:51:38 +0800
aa0c2c68a netfilter: ctnetlink: reject new conntrack request with different l4proto ... Browse Code »

Currently, user can add a conntrack with different l4proto via nfnetlink.
For example, original tuple is TCP while reply tuple is SCTP. This is
invalid combination, we should report EINVAL to userspace.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-09 16:39:26 +0800
00a3101f5 netfilter: nfnetlink_queue: reject verdict request from different portid ... Browse Code »

Like NFQNL_MSG_VERDICT_BATCH do, we should also reject the verdict
request when the portid is not same with the initial portid(maybe
from another process).

Fixes: 97d32cf9440d ("netfilter: nfnetlink_queue: batch verdict support")
Signed-off-by: Liping Zhang
Reviewed-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-09 16:39:25 +0800
b18bcb001 netfilter: nfnetlink_queue: fix memory leak when attach expectation successfully ... Browse Code »

User can use NFQA_EXP to attach expectations to conntracks, but we
forget to put back nf_conntrack_expect when it is inserted successfully,
i.e. in this normal case, expect's use refcnt will be 3. So even we
unlink it and put it back later, the use refcnt is still 1, then the
memory will be leaked forever.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-09 16:39:25 +0800
b173a28f6 netfilter: nf_ct_expect: remove the redundant slash when policy name is empty ... Browse Code »

The 'name' filed in struct nf_conntrack_expect_policy{} is not a
pointer, so check it is NULL or not will always return true. Even if the
name is empty, slash will always be displayed like follows:
# cat /proc/net/nf_conntrack_expect
297 l3proto = 2 proto=6 src=1.1.1.1 dst=2.2.2.2 sport=1 dport=1025 ftp/
^

Fixes: 3a8fc53a45c4 ("netfilter: nf_ct_helper: allocate 16 bytes for the helper and policy names")
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-09 16:38:46 +0800
1fe323aa1 sctp: use event->chunk when it's valid ... Browse Code »

Commit 52253db924d1 ("sctp: also point GSO head_skb to the sk when
it's available") used event->chunk->head_skb to get the head_skb in
sctp_ulpevent_set_owner().

But at that moment, the event->chunk was NULL, as it cloned the skb
in sctp_ulpevent_make_rcvmsg(). Therefore, that patch didn't really
work.

This patch is to move the event->chunk initialization before calling
sctp_ulpevent_receive_data() so that it uses event->chunk when it's
valid.

Fixes: 52253db924d1 ("sctp: also point GSO head_skb to the sk when it's available")
Signed-off-by: Xin Long
Acked-by: Marcelo Ricardo Leitner
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Xin Long
2016-08-09 05:31:23 +0800
8065694e6 bpf: fix checksum for vlan push/pop helper ... Browse Code »

When having skbs on ingress with CHECKSUM_COMPLETE, tc BPF programs don't
push rcsum of mac header back in and after BPF run back pull out again as
opposed to some other subsystems (ovs, for example).

For cases like q-in-q, meaning when a vlan tag for offloading is already
present and we're about to push another one, then skb_vlan_push() pushes the
inner one into the skb, increasing mac header and skb_postpush_rcsum()'ing
the 4 bytes vlan header diff. Likewise, for the reverse operation in
skb_vlan_pop() for the case where vlan header needs to be pulled out of the
skb, we're decreasing the mac header and skb_postpull_rcsum()'ing the 4 bytes
rcsum of the vlan header that was removed.

However mangling the rcsum here will lead to hw csum failure for BPF case,
since we're pulling or pushing data that was not part of the current rcsum.
Changing tc BPF programs in general to push/pull rcsum around BPF_PROG_RUN()
is also not really an option since current behaviour is ABI by now, but apart
from that would also mean to do quite a bit of useless work in the sense that
usually 12 bytes need to be rcsum pushed/pulled also when we don't need to
touch this vlan related corner case. One way to fix it would be to push the
necessary rcsum fixup down into vlan helpers that are (mostly) slow-path
anyway.

Fixes: 4e10df9a60d9 ("bpf: introduce bpf_skb_vlan_push/pop() helpers")
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2016-08-09 04:11:43 +0800
479ffccce bpf: fix checksum fixups on bpf_skb_store_bytes ... Browse Code »

bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM
flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way.
Where we ran into an issue with bpf_skb_store_bytes() is when we did a
single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM
flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The
underlying issue has been tracked down to a buffer alignment issue.

Meaning, that csum_partial() computations via skb_postpull_rcsum() and
skb_postpush_rcsum() pair invoked had a wrong result since they operated on
an odd address for the hoplimit, while other computations were done on an
even address. This mix doesn't work as-is with skb_postpull_rcsum(),
skb_postpush_rcsum() pair as it always expects at least half-word alignment
of input buffers, which is normally the case. Thus, instead of these helpers
using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(),
csum_block_add(), respectively. For unaligned offsets, they rotate the sum
to align it to a half-word boundary again, otherwise they work the same as
csum_sub() and csum_add().

Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the
offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw
csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers
use a 0 constant for offset so that the compiler optimizes the offset & 1
test away and generates the same code as with csum_sub()/_add().

Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2016-08-09 04:11:43 +0800
a2bfe6bf0 bpf: also call skb_postpush_rcsum on xmit occasions ... Browse Code »

Follow-up to commit f8ffad69c9f8 ("bpf: add skb_postpush_rcsum and fix
dev_forward_skb occasions") to fix an issue for dev_queue_xmit() redirect
locations which need CHECKSUM_COMPLETE fixups on ingress.

For the same reasons as described in f8ffad69c9f8 already, we of course
also need this here, since dev_queue_xmit() on a veth device will let us
end up in the dev_forward_skb() helper again to cross namespaces.

Latter then calls into skb_postpull_rcsum() to pull out L2 header, so
that netif_rx_internal() sees CHECKSUM_COMPLETE as it is expected. That
is, CHECKSUM_COMPLETE on ingress covering L2 _payload_, not L2 headers.

Also here we have to address bpf_redirect() and bpf_clone_redirect().

Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper")
Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2016-08-09 04:11:43 +0800
1ba8d77f4 sctp_diag: Respect ss adding TCPF_CLOSE to idiag_states ... Browse Code »

Since 'ss' always adds TCPF_CLOSE to idiag_states flags, sctp_diag can't
rely upon TCPF_LISTEN flag solely being present when listening sockets
are requested.

Signed-off-by: Phil Sutter
Signed-off-by: David S. Miller

Phil Sutter
2016-08-09 03:51:58 +0800
12474e8e5 sctp_diag: Fix T3_rtx timer export ... Browse Code »

The asoc's timer value is not kept in asoc->timeouts array but in it's
primary transport instead.

Furthermore, we must export the timer only if it is pending, otherwise
the value will underrun when stored in an unsigned variable and
user space will only see a very large timeout value.

Signed-off-by: Phil Sutter
Signed-off-by: David S. Miller

Phil Sutter
2016-08-09 03:51:58 +0800

08 Aug, 2016

3 commits

0d35d0815 netfilter: nf_conntrack_sip: CSeq 0 is a valid CSeq ... Browse Code »

Do not drop packet when CSeq is 0 as 0 is also a valid value for CSeq.

simple_strtoul() will return 0 either when all digits are 0
or if there are no digits at all. Therefore when simple_strtoul()
returns 0 we check if first character is digit 0 or not.

Signed-off-by: Christophe Leroy
Signed-off-by: Pablo Neira Ayuso

Christophe Leroy
2016-08-08 17:58:43 +0800
c1eda3c63 netfilter: nft_rbtree: ignore inactive matching element with no descendants ... Browse Code »

If we find a matching element that is inactive with no descendants, we
jump to the found label, then crash because of nul-dereference on the
left branch.

Fix this by checking that the element is active and not an interval end
and skipping the logic that only applies to the tree iteration.

Signed-off-by: Pablo Neira Ayuso
Tested-by: Anders K. Pedersen

Pablo Neira Ayuso
2016-08-08 17:27:37 +0800
707e6835f netfilter: nf_ct_h323: do not re-activate already expired timer ... Browse Code »

Commit 96d1327ac2e3 ("netfilter: h323: Use mod_timer instead of
set_expect_timeout") just simplify the source codes
if (!del_timer(&exp->timeout))
return 0;
add_timer(&exp->timeout);
to mod_timer(&exp->timeout, jiffies + info->timeout * HZ);

This is not correct, and introduce a race codition:
CPU0 CPU1
- timer expire
process_rcf expectation_timed_out
lock(exp_lock) -
find_exp waiting exp_lock...
re-activate timer!! waiting exp_lock...
unlock(exp_lock) lock(exp_lock)
- unlink expect
- free(expect)
- unlock(exp_lock)
So when the timer expires again, we will access the memory that
was already freed.

Replace mod_timer with mod_timer_pending here to fix this problem.

Fixes: 96d1327ac2e3 ("netfilter: h323: Use mod_timer instead of set_expect_timeout")
Cc: Gao Feng
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-08 15:26:40 +0800

07 Aug, 2016

1 commit

ca25ebe55 Merge tag 'mac80211-for-davem-2016-08-05' of git://git.kernel.org/pub/scm/linux/… ... Browse Code »

…kernel/git/jberg/mac80211

Johannes Berg says:

====================
First set of fixes for the current cycle:
* fix 80+80 bandwidth warning
* fix powersave with mac80211 TXQ implementation
* use correct way to free SKBs from multicast buffering
* mesh: fix operation ordering to work with all drivers
* mesh: end service period even when peer goes away
* mesh: correct HT opmode validity checks
* pass hw pointer from mac80211 to driver in TPT method,
fixing a bug (in a bit the wrong way, but that's what
we have right now)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller
2016-08-07 08:52:00 +0800

06 Aug, 2016

3 commits

94d9f1c59 ipv4: panic in leaf_walk_rcu due to stale node pointer ... Browse Code »

Panic occurs when issuing "cat /proc/net/route" whilst
populating FIB with > 1M routes.

Use of cached node pointer in fib_route_get_idx is unsafe.

BUG: unable to handle kernel paging request at ffffc90001630024
IP: [] leaf_walk_rcu+0x10/0xe0
PGD 11b08d067 PUD 11b08e067 PMD dac4b067 PTE 0
Oops: 0000 [#1] SMP
Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscac
snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep virti
acpi_cpufreq button parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd
tio_ring virtio floppy uhci_hcd ehci_hcd usbcore usb_common libata scsi_mod
CPU: 1 PID: 785 Comm: cat Not tainted 4.2.0-rc8+ #4
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
task: ffff8800da1c0bc0 ti: ffff88011a05c000 task.ti: ffff88011a05c000
RIP: 0010:[] [] leaf_walk_rcu+0x10/0xe0
RSP: 0018:ffff88011a05fda0 EFLAGS: 00010202
RAX: ffff8800d8a40c00 RBX: ffff8800da4af940 RCX: ffff88011a05ff20
RDX: ffffc90001630020 RSI: 0000000001013531 RDI: ffff8800da4af950
RBP: 0000000000000000 R08: ffff8800da1f9a00 R09: 0000000000000000
R10: ffff8800db45b7e4 R11: 0000000000000246 R12: ffff8800da4af950
R13: ffff8800d97a74c0 R14: 0000000000000000 R15: ffff8800d97a7480
FS: 00007fd3970e0700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffc90001630024 CR3: 000000011a7e4000 CR4: 00000000000006e0
Stack:
ffffffff814d00d3 0000000000000000 ffff88011a05ff20 ffff8800da1f9a00
ffffffff811dd8b9 0000000000000800 0000000000020000 00007fd396f35000
ffffffff811f8714 0000000000003431 ffffffff8138dce0 0000000000000f80
Call Trace:
[] ? fib_route_seq_start+0x93/0xc0
[] ? seq_read+0x149/0x380
[] ? fsnotify+0x3b4/0x500
[] ? process_echoes+0x70/0x70
[] ? proc_reg_read+0x47/0x70
[] ? __vfs_read+0x23/0xd0
[] ? rw_verify_area+0x52/0xf0
[] ? vfs_read+0x81/0x120
[] ? SyS_read+0x42/0xa0
[] ? entry_SYSCALL_64_fastpath+0x16/0x75
Code: 48 85 c0 75 d8 f3 c3 31 c0 c3 f3 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00
a 04 89 f0 33 02 44 89 c9 48 d3 e8 0f b6 4a 05 49 89
RIP [] leaf_walk_rcu+0x10/0xe0
RSP
CR2: ffffc90001630024

Signed-off-by: Dave Forster
Acked-by: Alexander Duyck
Signed-off-by: David S. Miller

David Forster
2016-08-06 12:10:05 +0800
372ee1638 rxrpc: Fix races between skb free, ACK generation and replying ... Browse Code »

Inside the kafs filesystem it is possible to occasionally have a call
processed and terminated before we've had a chance to check whether we need
to clean up the rx queue for that call because afs_send_simple_reply() ends
the call when it is done, but this is done in a workqueue item that might
happen to run to completion before afs_deliver_to_call() completes.

Further, it is possible for rxrpc_kernel_send_data() to be called to send a
reply before the last request-phase data skb is released. The rxrpc skb
destructor is where the ACK processing is done and the call state is
advanced upon release of the last skb. ACK generation is also deferred to
a work item because it's possible that the skb destructor is not called in
a context where kernel_sendmsg() can be invoked.

To this end, the following changes are made:

(1) kernel_rxrpc_data_consumed() is added. This should be called whenever
an skb is emptied so as to crank the ACK and call states. This does
not release the skb, however. kernel_rxrpc_free_skb() must now be
called to achieve that. These together replace
rxrpc_kernel_data_delivered().

(2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().

This makes afs_deliver_to_call() easier to work as the skb can simply
be discarded unconditionally here without trying to work out what the
return value of the ->deliver() function means.

The ->deliver() functions can, via afs_data_complete(),
afs_transfer_reply() and afs_extract_data() mark that an skb has been
consumed (thereby cranking the state) without the need to
conditionally free the skb to make sure the state is correct on an
incoming call for when the call processor tries to send the reply.

(3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
has finished with a packet and MSG_PEEK isn't set.

(4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().

Because of this, we no longer need to clear the destructor and put the
call before we free the skb in cases where we don't want the ACK/call
state to be cranked.

(5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
the delivery function already), and the caller is now responsible for
producing an abort if that was the last packet.

(6) There are many bits of unmarshalling code where:

ret = afs_extract_data(call, skb, last, ...);
switch (ret) {
case 0: break;
case -EAGAIN: return 0;
default: return ret;
}

is to be found. As -EAGAIN can now be passed back to the caller, we
now just return if ret < 0:

ret = afs_extract_data(call, skb, last, ...);
if (ret < 0)
return ret;

(7) Checks for trailing data and empty final data packets has been
consolidated as afs_data_complete(). So:

if (skb->len > 0)
return -EBADMSG;
if (!last)
return 0;

becomes:

ret = afs_data_complete(call, skb, last);
if (ret < 0)
return ret;

(8) afs_transfer_reply() now checks the amount of data it has against the
amount of data desired and the amount of data in the skb and returns
an error to induce an abort if we don't get exactly what we want.

Without these changes, the following oops can occasionally be observed,
particularly if some printks are inserted into the delivery path:

general protection fault: 0000 [#1] SMP
Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G E 4.7.0-fsdevel+ #1303
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Workqueue: kafsd afs_async_workfn [kafs]
task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000
RIP: 0010:[] [] __lock_acquire+0xcf/0x15a1
RSP: 0018:ffff88040c073bc0 EFLAGS: 00010002
RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710
RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f
FS: 0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0
Stack:
0000000000000006 000000000be04930 0000000000000000 ffff880400000000
ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446
ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38
Call Trace:
[] ? mark_held_locks+0x5e/0x74
[] ? __local_bh_enable_ip+0x9b/0xa1
[] ? trace_hardirqs_on_caller+0x16d/0x189
[] lock_acquire+0x122/0x1b6
[] ? lock_acquire+0x122/0x1b6
[] ? skb_dequeue+0x18/0x61
[] _raw_spin_lock_irqsave+0x35/0x49
[] ? skb_dequeue+0x18/0x61
[] skb_dequeue+0x18/0x61
[] afs_deliver_to_call+0x344/0x39d [kafs]
[] afs_process_async_call+0x4c/0xd5 [kafs]
[] afs_async_workfn+0xe/0x10 [kafs]
[] process_one_work+0x29d/0x57c
[] worker_thread+0x24a/0x385
[] ? rescuer_thread+0x2d0/0x2d0
[] kthread+0xf3/0xfb
[] ret_from_fork+0x1f/0x40
[] ? kthread_create_on_node+0x1cf/0x1cf

Signed-off-by: David Howells
Signed-off-by: David S. Miller

David Howells
2016-08-06 12:08:40 +0800
5ef9f289c OVS: Ignore negative headroom value ... Browse Code »

net_device->ndo_set_rx_headroom (introduced in
871b642adebe300be2e50aa5f65a418510f636ec) says

"Setting a negtaive value reset the rx headroom
to the default value".

It seems that the OVS implementation in
3a927bc7cf9d0fbe8f4a8189dd5f8440228f64e7 overlooked this and sets
dev->needed_headroom unconditionally.

This doesn't have an immediate effect, but can mess up later
LL_RESERVED_SPACE calculations, such as done in
net/ipv6/mcast.c:mld_newpack. For reference, this issue was found
from a skb_panic raised there after the length calculations had given
the wrong result.

Note the other current users of this interface
(drivers/net/tun.c:tun_set_headroom and
drivers/net/veth.c:veth_set_rx_headroom) are both checking this
correctly thus need no modification.

Thanks to Ben for some pointers from the crash dumps!

Cc: Benjamin Poirier
Cc: Paolo Abeni
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414
Signed-off-by: Ian Wienand
Signed-off-by: David S. Miller

Ian Wienand
2016-08-06 12:06:11 +0800

05 Aug, 2016

4 commits

2439ca040 mac80211: Add ieee80211_hw pointer to get_expected_throughput ... Browse Code »

The variable is added to allow the driver an easy access to
it's own hw->priv when the op is invoked.

This fixes a crash in wlcore because it was relying on a
station pointer that wasn't initialized yet. It's the wrong
way to fix the crash, but it solves the problem for now and
it does make sense to have the hw pointer here.

Signed-off-by: Maxim Altshul
[rewrite commit message, fix indentation]
Signed-off-by: Johannes Berg

Maxim Altshul
2016-08-05 20:23:25 +0800
9757235f4 nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value ... Browse Code »

Previously, NL80211_MESHCONF_HT_OPMODE validation rejected correct
flag combinations, e.g. IEEE80211_HT_OP_MODE_PROTECTION_NONHT_MIXED |
IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT.

Doing just a range-check allows setting flags that don't exist (0x8)
and invalid flag combinations.

Implements some checks based on IEEE 802.11 2012 8.4.2.59 "HT
Operation element".

Signed-off-by: Masashi Honma
[reword commit message, simplify a bit]
Signed-off-by: Johannes Berg

Masashi Honma
2016-08-05 20:14:54 +0800
71f2c3470 mac80211: End the MPSP even if EOSP frame was not acked ... Browse Code »

If QoS frame with EOSP (end of service period) subfield=1 sent by local
peer was not acked by remote peer, local peer did not end the MPSP. This
prevents local peer from going to DOZE state. And if the remote peer
goes away without closing connection, local peer continues AWAKE state
and wastes battery.

Signed-off-by: Masashi Honma
Acked-by: Bob Copeland
Signed-off-by: Johannes Berg

Masashi Honma
2016-08-05 20:06:29 +0800
6b07d9ca9 mac80211: fix purging multicast PS buffer queue ... Browse Code »

The code currently assumes that buffered multicast PS frames don't have
a pending ACK frame for tx status reporting.
However, hostapd sends a broadcast deauth frame on teardown for which tx
status is requested. This can lead to the "Have pending ack frames"
warning on module reload.
Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue.

Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau
Signed-off-by: Johannes Berg

Felix Fietkau
2016-08-05 20:06:28 +0800

04 Aug, 2016

1 commit

bce91f8a4 openvswitch: Remove incorrect WARN_ONCE(). ... Browse Code »

ovs_ct_find_existing() issues a warning if an existing conntrack entry
classified as IP_CT_NEW is found, with the premise that this should
not happen. However, a newly confirmed, non-expected conntrack entry
remains IP_CT_NEW as long as no reply direction traffic is seen. This
has resulted into somewhat confusing kernel log messages. This patch
removes this check and warning.

Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.")
Suggested-by: Joe Stringer
Signed-off-by: Jarno Rajahalme
Acked-by: Joe Stringer
Signed-off-by: David S. Miller

Jarno Rajahalme
2016-08-04 02:50:40 +0800

03 Aug, 2016

3 commits

f0936155f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Fix several cases of missing of_node_put() calls in various
networking drivers. From Peter Chen.

2) Don't try to remove unconfigured VLANs in qed driver, from Yuval
Mintz.

3) Unbalanced locking in TIPC error handling, from Wei Yongjun.

4) Fix lockups in CPDMA driver, from Grygorii Strashko.

5) More MACSEC refcount et al fixes, from Sabrina Dubroca.

6) Fix MAC address setting in r8169 during runtime suspend, from
Chun-Hao Lin.

7) Various printf format specifier fixes, from Heinrich Schuchardt.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
qed: Fail driver load in 100g MSI mode.
ethernet: ti: davinci_emac: add missing of_node_put after calling of_parse_phandle
ethernet: stmicro: stmmac: add missing of_node_put after calling of_parse_phandle
ethernet: stmicro: stmmac: dwmac-socfpga: add missing of_node_put after calling of_parse_phandle
ethernet: renesas: sh_eth: add missing of_node_put after calling of_parse_phandle
ethernet: renesas: ravb_main: add missing of_node_put after calling of_parse_phandle
ethernet: marvell: pxa168_eth: add missing of_node_put after calling of_parse_phandle
ethernet: marvell: mvpp2: add missing of_node_put after calling of_parse_phandle
ethernet: marvell: mvneta: add missing of_node_put after calling of_parse_phandle
ethernet: hisilicon: hns: hns_dsaf_main: add missing of_node_put after calling of_parse_phandle
ethernet: hisilicon: hns: hns_dsaf_mac: add missing of_node_put after calling of_parse_phandle
ethernet: cavium: octeon: add missing of_node_put after calling of_parse_phandle
ethernet: aurora: nb8800: add missing of_node_put after calling of_parse_phandle
ethernet: arc: emac_main: add missing of_node_put after calling of_parse_phandle
ethernet: apm: xgene: add missing of_node_put after calling of_parse_phandle
ethernet: altera: add missing of_node_put
8139too: fix system hang when there is a tx timeout event.
qed: Fix error return code in qed_resc_alloc()
net: qlcnic: avoid superfluous assignement
dsa: b53: remove redundant if
...

Linus Torvalds
2016-08-03 19:26:11 +0800
c37a54ac3 mac80211: mesh: flush stations before beacons are stopped ... Browse Code »

Some drivers (e.g. wl18xx) expect that the last stage in the
de-initialization process will be stopping the beacons, similar to AP flow.
Update ieee80211_stop_mesh() flow accordingly.
As peers can be removed dynamically, this would not impact other drivers.

Tested also on Ralink RT3572 chipset.

Signed-off-by: Maital Hahn
Signed-off-by: Yaniv Machani
Signed-off-by: Johannes Berg

Maital Hahn
2016-08-03 14:45:15 +0800
72b5ac54d Merge tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client ... Browse Code »

Pull Ceph updates from Ilya Dryomov:
"The highlights are:

- RADOS namespace support in libceph and CephFS (Zheng Yan and
myself). The stopgaps added in 4.5 to deny access to inodes in
namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
bit is now fully supported

- A large rework of the MDS cap flushing code (Zheng Yan)

- Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were
overly pessimistic before, bailing at the first sight of LOOKUP_RCU

On top of that we've got a few CephFS bug fixes, a couple of cleanups
and Arnd's workaround for a weird genksyms issue"

* tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
ceph: fix symbol versioning for ceph_monc_do_statfs
ceph: Correctly return NXIO errors from ceph_llseek
ceph: Mark the file cache as unreclaimable
ceph: optimize cap flush waiting
ceph: cleanup ceph_flush_snaps()
ceph: kick cap flushes before sending other cap message
ceph: introduce an inode flag to indicates if snapflush is needed
ceph: avoid sending duplicated cap flush message
ceph: unify cap flush and snapcap flush
ceph: use list instead of rbtree to track cap flushes
ceph: update types of some local varibles
ceph: include 'follows' of pending snapflush in cap reconnect message
ceph: update cap reconnect message to version 3
ceph: mount non-default filesystem by name
libceph: fsmap.user subscription support
ceph: handle LOOKUP_RCU in ceph_d_revalidate
ceph: allow dentry_lease_is_valid to work under RCU walk
ceph: clear d_fsinfo pointer under d_lock
ceph: remove ceph_mdsc_lease_release
ceph: don't use ->d_time
...

Linus Torvalds
2016-08-03 07:39:09 +0800

02 Aug, 2016

2 commits

4e3f21bc7 mac80211: fix check for buffered powersave frames with txq ... Browse Code »

The logic was inverted here, set the bit if frames are pending.

Fixes: ba8c3d6f16a1 ("mac80211: add an intermediate software queue implementation")
Signed-off-by: Felix Fietkau
Signed-off-by: Johannes Berg

Felix Fietkau
2016-08-02 15:50:26 +0800
680682d4d cfg80211: fix missing break in NL8211_CHAN_WIDTH_80P80 case ... Browse Code »

The switch on chandef->width is missing a break on the
NL8211_CHAN_WIDTH_80P80 case; currently we get a WARN_ON when
center_freq2 is non-zero because of the missing break.

Signed-off-by: Colin Ian King
Signed-off-by: Johannes Berg

Colin Ian King
2016-08-02 15:50:25 +0800