Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

28 Jan, 2015

1 commit

06539d307 net: don't OOPS on socket aio ... Browse Code »
3

Signed-off-by: Christoph Hellwig
Signed-off-by: David S. Miller

Christoph Hellwig
2015-01-28 04:25:33 +0800

27 Jan, 2015

9 commits

bf693f7be Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec ... Browse Code »

Steffen Klassert says:

====================
ipsec 2015-01-26

Just two small fixes for _decode_session6() where we
might decode to wrong header information in some rare
situations.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-01-27 16:28:38 +0800
6e9e16e61 ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too ... Browse Code »
3

Lubomir Rintel reported that during replacing a route the interface
reference counter isn't correctly decremented.

To quote bug :
| [root@rhel7-5 lkundrak]# sh -x lal
| + ip link add dev0 type dummy
| + ip link set dev0 up
| + ip link add dev1 type dummy
| + ip link set dev1 up
| + ip addr add 2001:db8:8086::2/64 dev dev0
| + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20
| + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10
| + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20
| + ip link del dev0 type dummy
| Message from syslogd@rhel7-5 at Jan 23 10:54:41 ...
| kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
|
| Message from syslogd@rhel7-5 at Jan 23 10:54:51 ...
| kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2

During replacement of a rt6_info we must walk all parent nodes and check
if the to be replaced rt6_info got propagated. If so, replace it with
an alive one.

Fixes: 4a287eba2de3957 ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag")
Reported-by: Lubomir Rintel
Signed-off-by: Hannes Frederic Sowa
Tested-by: Lubomir Rintel
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-01-27 16:22:14 +0800
fc752f1f4 ping: Fix race in free in receive path ... Browse Code »
3

An exception is seen in ICMP ping receive path where the skb
destructor sock_rfree() tries to access a freed socket. This happens
because ping_rcv() releases socket reference with sock_put() and this
internally frees up the socket. Later icmp_rcv() will try to free the
skb and as part of this, skb destructor is called and which leads
to a kernel panic as the socket is freed already in ping_rcv().

-->|exception
-007|sk_mem_uncharge
-007|sock_rfree
-008|skb_release_head_state
-009|skb_release_all
-009|__kfree_skb
-010|kfree_skb
-011|icmp_rcv
-012|ip_local_deliver_finish

Fix this incorrect free by cloning this skb and processing this cloned
skb instead.

This patch was suggested by Eric Dumazet

Signed-off-by: Subash Abhinov Kasiviswanathan
Cc: Eric Dumazet
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

subashab@codeaurora.org
2015-01-27 16:04:53 +0800
86f3cddbc udp_diag: Fix socket skipping within chain ... Browse Code »
3

While working on rhashtable walking I noticed that the UDP diag
dumping code is buggy. In particular, the socket skipping within
a chain never happens, even though we record the number of sockets
that should be skipped.

As this code was supposedly copied from TCP, this patch does what
TCP does and resets num before we walk a chain.

Signed-off-by: Herbert Xu
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Herbert Xu
2015-01-27 16:02:41 +0800
7d63585bf Merge tag 'mac80211-for-davem-2015-01-23' of git://git.kernel.org/pub/scm/linux/… ... Browse Code »

…kernel/git/jberg/mac80211

Another set of last-minute fixes:
* fix station double-removal when suspending while associating
* fix the HT (802.11n) header length calculation
* fix the CCK radiotap flag used for monitoring, a pretty
old regression but a simple one-liner
* fix per-station group-key handling

Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller
2015-01-27 09:32:24 +0800
df4d92549 ipv4: try to cache dst_entries which would cause a redirect ... Browse Code »
3

Not caching dst_entries which cause redirects could be exploited by hosts
on the same subnet, causing a severe DoS attack. This effect aggravated
since commit f88649721268999 ("ipv4: fix dst race in sk_dst_get()").

Lookups causing redirects will be allocated with DST_NOCACHE set which
will force dst_release to free them via RCU. Unfortunately waiting for
RCU grace period just takes too long, we can end up with >1M dst_entries
waiting to be released and the system will run OOM. rcuos threads cannot
catch up under high softirq load.

Attaching the flag to emit a redirect later on to the specific skb allows
us to cache those dst_entries thus reducing the pressure on allocation
and deallocation.

This issue was discovered by Marcelo Leitner.

Cc: Julian Anastasov
Signed-off-by: Marcelo Leitner
Signed-off-by: Florian Westphal
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-01-27 09:28:27 +0800
600ddd682 net: sctp: fix slab corruption from use after free on INIT collisions ... Browse Code »
3

When hitting an INIT collision case during the 4WHS with AUTH enabled, as
already described in detail in commit 1be9a950c646 ("net: sctp: inherit
auth_capable on INIT collisions"), it can happen that we occasionally
still remotely trigger the following panic on server side which seems to
have been uncovered after the fix from commit 1be9a950c646 ...

[ 533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
[ 533.913657] IP: [] __kmalloc+0x95/0x230
[ 533.940559] PGD 5030f2067 PUD 0
[ 533.957104] Oops: 0000 [#1] SMP
[ 533.974283] Modules linked in: sctp mlx4_en [...]
[ 534.939704] Call Trace:
[ 534.951833] [] ? crypto_init_shash_ops+0x60/0xf0
[ 534.984213] [] crypto_init_shash_ops+0x60/0xf0
[ 535.015025] [] __crypto_alloc_tfm+0x6d/0x170
[ 535.045661] [] crypto_alloc_base+0x4c/0xb0
[ 535.074593] [] ? _raw_spin_lock_bh+0x12/0x50
[ 535.105239] [] sctp_inet_listen+0x161/0x1e0 [sctp]
[ 535.138606] [] SyS_listen+0x9d/0xb0
[ 535.166848] [] system_call_fastpath+0x16/0x1b

... or depending on the the application, for example this one:

[ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
[ 1370.026506] IP: [] kmem_cache_alloc+0x75/0x1d0
[ 1370.054568] PGD 633c94067 PUD 0
[ 1370.070446] Oops: 0000 [#1] SMP
[ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
[ 1370.963431] Call Trace:
[ 1370.974632] [] ? SyS_epoll_ctl+0x53f/0x960
[ 1371.000863] [] SyS_epoll_ctl+0x53f/0x960
[ 1371.027154] [] ? anon_inode_getfile+0xd3/0x170
[ 1371.054679] [] ? __alloc_fd+0xa7/0x130
[ 1371.080183] [] system_call_fastpath+0x16/0x1b

With slab debugging enabled, we can see that the poison has been overwritten:

[ 669.826368] BUG kmalloc-128 (Tainted: G W ): Poison overwritten
[ 669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
[ 669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
[ 669.826424] __slab_alloc+0x4bf/0x566
[ 669.826433] __kmalloc+0x280/0x310
[ 669.826453] sctp_auth_create_key+0x23/0x50 [sctp]
[ 669.826471] sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
[ 669.826488] sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
[ 669.826505] sctp_do_sm+0x29d/0x17c0 [sctp] [...]
[ 669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
[ 669.826635] __slab_free+0x39/0x2a8
[ 669.826643] kfree+0x1d6/0x230
[ 669.826650] kzfree+0x31/0x40
[ 669.826666] sctp_auth_key_put+0x19/0x20 [sctp]
[ 669.826681] sctp_assoc_update+0x1ee/0x2d0 [sctp]
[ 669.826695] sctp_do_sm+0x674/0x17c0 [sctp]

Since this only triggers in some collision-cases with AUTH, the problem at
heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
when having refcnt 1, once directly in sctp_assoc_update() and yet again
from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
the already kzfree'd memory, which is also consistent with the observation
of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
at a later point in time when poison is checked on new allocation).

Reference counting of auth keys revisited:

Shared keys for AUTH chunks are being stored in endpoints and associations
in endpoint_shared_keys list. On endpoint creation, a null key is being
added; on association creation, all endpoint shared keys are being cached
and thus cloned over to the association. struct sctp_shared_key only holds
a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
keeps track of users internally through refcounting. Naturally, on assoc
or enpoint destruction, sctp_shared_key are being destroyed directly and
the reference on sctp_auth_bytes dropped.

User space can add keys to either list via setsockopt(2) through struct
sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
with refcount 1 and in case of replacement drops the reference on the old
sctp_auth_bytes. A key can be set active from user space through setsockopt()
on the id via sctp_auth_set_active_key(), which iterates through either
endpoint_shared_keys and in case of an assoc, invokes (one of various places)
sctp_auth_asoc_init_active_key().

sctp_auth_asoc_init_active_key() computes the actual secret from local's
and peer's random, hmac and shared key parameters and returns a new key
directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
the reference if there was a previous one. The secret, which where we
eventually double drop the ref comes from sctp_auth_asoc_set_secret() with
intitial refcount of 1, which also stays unchanged eventually in
sctp_assoc_update(). This key is later being used for crypto layer to
set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().

To close the loop: asoc->asoc_shared_key is freshly allocated secret
material and independant of the sctp_shared_key management keeping track
of only shared keys in endpoints and assocs. Hence, also commit 4184b2a79a76
("net: sctp: fix memory leak in auth key management") is independant of
this bug here since it concerns a different layer (though same structures
being used eventually). asoc->asoc_shared_key is reference dropped correctly
on assoc destruction in sctp_association_free() and when active keys are
being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
to remove that sctp_auth_key_put() from there which fixes these panics.

Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing")
Signed-off-by: Daniel Borkmann
Acked-by: Vlad Yasevich
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Daniel Borkmann
2015-01-27 09:02:05 +0800
3f2ab1359 net: cls_bpf: fix auto generation of per list handles ... Browse Code »

When creating a bpf classifier in tc with priority collisions and
invoking automatic unique handle assignment, cls_bpf_grab_new_handle()
will return a wrong handle id which in fact is non-unique. Usually
altering of specific filters is being addressed over major id, but
in case of collisions we result in a filter chain, where handle ids
address individual cls_bpf_progs inside the classifier.

Issue is, in cls_bpf_grab_new_handle() we probe for head->hgen handle
in cls_bpf_get() and in case we found a free handle, we're supposed
to use exactly head->hgen. In case of insufficient numbers of handles,
we bail out later as handle id 0 is not allowed.

Fixes: 7d1d65cb84e1 ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann
Acked-by: Jiri Pirko
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2015-01-27 07:50:19 +0800
7913ecf69 net: cls_bpf: fix size mismatch on filter preparation ... Browse Code »

In cls_bpf_modify_existing(), we read out the number of filter blocks,
do some sanity checks, allocate a block on that size, and copy over the
BPF instruction blob from user space, then pass everything through the
classic BPF checker prior to installation of the classifier.

We should reject mismatches here, there are 2 scenarios: the number of
filter blocks could be smaller than the provided instruction blob, so
we do a partial copy of the BPF program, and thus the instructions will
either be rejected from the verifier or a valid BPF program will be run;
in the other case, we'll end up copying more than we're supposed to,
and most likely the trailing garbage will be rejected by the verifier
as well (i.e. we need to fit instruction pattern, ret {A,K} needs to be
last instruction, load/stores must be correct, etc); in case not, we
would leak memory when dumping back instruction patterns. The code should
have only used nla_len() as Dave noted to avoid this from the beginning.
Anyway, lets fix it by rejecting such load attempts.

Fixes: 7d1d65cb84e1 ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Daniel Borkmann
2015-01-27 07:50:18 +0800

26 Jan, 2015

2 commits

b0a1ba599 ipv6: Fix __ip6_route_redirect ... Browse Code »

In my last commit (a3c00e4: ipv6: Remove BACKTRACK macro), the changes in
__ip6_route_redirect is incorrect. The following case is missed:
1. The for loop tries to find a valid gateway rt. If it fails to find
one, rt will be NULL.
2. When rt is NULL, it is set to the ip6_null_entry.
3. The newly added 'else if', from a3c00e4, will stop the backtrack from
happening.

Signed-off-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Martin KaFai Lau
2015-01-26 14:09:51 +0800
24df8986f net: dsa: set slave MII bus PHY mask ... Browse Code »

When registering a mdio bus, Linux assumes than every port has a PHY and tries
to scan it. If a switch port has no PHY registered, DSA will fail to register
the slave MII bus. To fix this, set the slave MII bus PHY mask to the switch
PHYs mask.

As an example, if we use a Marvell MV88E6352 (which is a 7-port switch with no
registered PHYs for port 5 and port 6), with the following declared names:

static struct dsa_chip_data switch_cdata = {
[...]
.port_names[0] = "sw0",
.port_names[1] = "sw1",
.port_names[2] = "sw2",
.port_names[3] = "sw3",
.port_names[4] = "sw4",
.port_names[5] = "cpu",
};

DSA will fail to create the switch instance. With the PHY mask set for the
slave MII bus, only the PHY for ports 0-4 will be scanned and the instance will
be successfully created.

Signed-off-by: Vivien Didelot
Tested-by: Florian Fainelli
Acked-by: Florian Fainelli
Signed-off-by: David S. Miller

Vivien Didelot
2015-01-26 08:00:54 +0800

25 Jan, 2015

1 commit

6b8d9117c net: llc: use correct size for sysctl timeout entries ... Browse Code »
2

The timeout entries are sizeof(int) rather than sizeof(long), which
means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.

Signed-off-by: Sasha Levin
Signed-off-by: David S. Miller

Sasha Levin
2015-01-25 16:23:21 +0800

23 Jan, 2015

4 commits

0fa7b3913 nl80211: fix per-station group key get/del and memory leak ... Browse Code »
3

In case userspace attempts to obtain key information for or delete a
unicast key, this is currently erroneously rejected unless the driver
sets the WIPHY_FLAG_IBSS_RSN flag. Apparently enough drivers do so it
was never noticed.

Fix that, and while at it fix a potential memory leak: the error path
in the get_key() function was placed after allocating a message but
didn't free it - move it to a better place. Luckily admin permissions
are needed to call this operation.

Cc: stable@vger.kernel.org
Fixes: e31b82136d1ad ("cfg80211/mac80211: allow per-station GTKs")
Signed-off-by: Johannes Berg

Johannes Berg
2015-01-23 18:21:02 +0800
3a5c5e81d mac80211: properly set CCK flag in radiotap ... Browse Code »
3

Fix a regression introduced by commit a5e70697d0c4 ("mac80211: add radiotap flag
and handling for 5/10 MHz") where the IEEE80211_CHAN_CCK channel type flag was
incorrectly replaced by the IEEE80211_CHAN_OFDM flag. This commit fixes that by
using the CCK flag again.

Cc: stable@vger.kernel.org
Fixes: a5e70697d0c4 ("mac80211: add radiotap flag and handling for 5/10 MHz")
Signed-off-by: Mathy Vanhoef
Signed-off-by: Johannes Berg

Mathy Vanhoef
2015-01-23 17:53:58 +0800
fb142f4bb mac80211: correct header length calculation ... Browse Code »

HT Control field may also be present in management frames, as defined
in 8.2.4.1.10 of 802.11-2012. Account for this in calculation of header
length.

Signed-off-by: Fred Chou
Signed-off-by: Johannes Berg

Fred Chou
2015-01-23 17:52:48 +0800
2af81d671 mac80211: only roll back station states for WDS when suspending ... Browse Code »

In normal cases (i.e. when we are fully associated), cfg80211 takes
care of removing all the stations before calling suspend in mac80211.

But in the corner case when we suspend during authentication or
association, mac80211 needs to roll back the station states. But we
shouldn't roll back the station states in the suspend function,
because this is taken care of in other parts of the code, except for
WDS interfaces. For AP types of interfaces, cfg80211 takes care of
disconnecting all stations before calling the driver's suspend code.
For station interfaces, this is done in the quiesce code.

For WDS interfaces we still need to do it here, so move the code into
a new switch case for WDS.

Cc: stable@kernel.org [3.15+]
Signed-off-by: Luciano Coelho
Signed-off-by: Johannes Berg

Luciano Coelho
2015-01-23 17:47:40 +0800

20 Jan, 2015

1 commit

9d289715e ipv6: stop sending PTB packets for MTU < 1280 ... Browse Code »
3

Reduce the attack vector and stop generating IPv6 Fragment Header for
paths with an MTU smaller than the minimum required IPv6 MTU
size (1280 byte) - called atomic fragments.

See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
for more information and how this "feature" can be misused.

[1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00

Signed-off-by: Fernando Gont
Signed-off-by: Hagen Paul Pfeifer
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hagen Paul Pfeifer
2015-01-20 03:52:07 +0800

18 Jan, 2015

1 commit

2061dcd6b net: sctp: fix race for one-to-many sockets in sendmsg's auto associate ... Browse Code »

I.e. one-to-many sockets in SCTP are not required to explicitly
call into connect(2) or sctp_connectx(2) prior to data exchange.
Instead, they can directly invoke sendmsg(2) and the SCTP stack
will automatically trigger connection establishment through 4WHS
via sctp_primitive_ASSOCIATE(). However, this in its current
implementation is racy: INIT is being sent out immediately (as
it cannot be bundled anyway) and the rest of the DATA chunks are
queued up for later xmit when connection is established, meaning
sendmsg(2) will return successfully. This behaviour can result
in an undesired side-effect that the kernel made the application
think the data has already been transmitted, although none of it
has actually left the machine, worst case even after close(2)'ing
the socket.

Instead, when the association from client side has been shut down
e.g. first gracefully through SCTP_EOF and then close(2), the
client could afterwards still receive the server's INIT_ACK due
to a connection with higher latency. This INIT_ACK is then considered
out of the blue and hence responded with ABORT as there was no
alive assoc found anymore. This can be easily reproduced f.e.
with sctp_test application from lksctp. One way to fix this race
is to wait for the handshake to actually complete.

The fix defers waiting after sctp_primitive_ASSOCIATE() and
sctp_primitive_SEND() succeeded, so that DATA chunks cooked up
from sctp_sendmsg() have already been placed into the output
queue through the side-effect interpreter, and therefore can then
be bundeled together with COOKIE_ECHO control chunks.

strace from example application (shortened):

socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(0)=[], msg_controllen=48, {cmsg_len=48, cmsg_level=0x84 /* SOL_??? */, cmsg_type=, ...},
msg_flags=0}, 0) = 0 // graceful shutdown for SOCK_SEQPACKET via SCTP_EOF
close(3) = 0

tcpdump before patch (fooling the application):

22:33:36.306142 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 3879023686] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3139201684]
22:33:36.316619 IP 192.168.1.115.8888 > 192.168.1.114.41462: sctp (1) [INIT ACK] [init tag: 3345394793] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 3380109591]
22:33:36.317600 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [ABORT]

tcpdump after patch:

14:28:58.884116 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 438593213] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3092969729]
14:28:58.888414 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [INIT ACK] [init tag: 381429855] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 2141904492]
14:28:58.888638 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [COOKIE ECHO] , (2) [DATA] (B)(E) [TSN: 3092969729] [...]
14:28:58.893278 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [COOKIE ACK] , (2) [SACK] [cum ack 3092969729] [a_rwnd 106491] [#gap acks 0] [#dup tsns 0]
14:28:58.893591 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969730] [...]
14:28:59.096963 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969730] [a_rwnd 106496] [#gap acks 0] [#dup tsns 0]
14:28:59.097086 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969731] [...] , (2) [DATA] (B)(E) [TSN: 3092969732] [...]
14:28:59.103218 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969732] [a_rwnd 106486] [#gap acks 0] [#dup tsns 0]
14:28:59.103330 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN]
14:28:59.107793 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SHUTDOWN ACK]
14:28:59.107890 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN COMPLETE]

Looks like this bug is from the pre-git history museum. ;)

Fixes: 08707d5482df ("lksctp-2_5_31-0_5_1.patch")
Signed-off-by: Daniel Borkmann
Acked-by: Vlad Yasevich
Signed-off-by: David S. Miller

Daniel Borkmann
2015-01-18 12:52:20 +0800

17 Jan, 2015

2 commits

ee1c24421 genetlink: synchronize socket closing and family removal ... Browse Code »

In addition to the problem Jeff Layton reported, I looked at the code
and reproduced the same warning by subscribing and removing the genl
family with a socket still open. This is a fairly tricky race which
originates in the fact that generic netlink allows the family to go
away while sockets are still open - unlike regular netlink which has
a module refcount for every open socket so in general this cannot be
triggered.

Trying to resolve this issue by the obvious locking isn't possible as
it will result in deadlocks between unregistration and group unbind
notification (which incidentally lockdep doesn't find due to the home
grown locking in the netlink table.)

To really resolve this, introduce a "closing socket" reference counter
(for generic netlink only, as it's the only affected family) in the
core netlink code and use that in generic netlink to wait for all the
sockets that are being closed at the same time as a generic netlink
family is removed.

This fixes the race that when a socket is closed, it will should call
the unbind, but if the family is removed at the same time the unbind
will not find it, leading to the warning. The real problem though is
that in this case the unbind could actually find a new family that is
registered to have a multicast group with the same ID, and call its
mcast_unbind() leading to confusing.

Also remove the warning since it would still trigger, but is now no
longer a problem.

This also moves the code in af_netlink.c to before unreferencing the
module to avoid having the same problem in the normal non-genl case.

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2015-01-17 06:04:25 +0800
5ad630052 genetlink: disallow subscribing to unknown mcast groups ... Browse Code »

Jeff Layton reported that he could trigger the multicast unbind warning
in generic netlink using trinity. I originally thought it was a race
condition between unregistering the generic netlink family and closing
the socket, but there's a far simpler explanation: genetlink currently
allows subscribing to groups that don't (yet) exist, and the warning is
triggered when unsubscribing again while the group still doesn't exist.

Originally, I had a warning in the subscribe case and accepted it out of
userspace API concerns, but the warning was of course wrong and removed
later.

However, I now think that allowing userspace to subscribe to groups that
don't exist is wrong and could possibly become a security problem:
Consider a (new) genetlink family implementing a permission check in
the mcast_bind() function similar to the like the audit code does today;
it would be possible to bypass the permission check by guessing the ID
and subscribing to the group it exists. This is only possible in case a
family like that would be dynamically loaded, but it doesn't seem like a
huge stretch, for example wireless may be loaded when you plug in a USB
device.

To avoid this reject such subscription attempts.

If this ends up causing userspace issues we may need to add a workaround
in af_netlink to deny such requests but not return an error.

Reported-by: Jeff Layton
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2015-01-17 06:04:24 +0800

16 Jan, 2015

3 commits

ac64da0b8 net: rps: fix cpu unplug ... Browse Code »
3

softnet_data.input_pkt_queue is protected by a spinlock that
we must hold when transferring packets from victim queue to an active
one. This is because other cpus could still be trying to enqueue packets
into victim queue.

A second problem is that when we transfert the NAPI poll_list from
victim to current cpu, we absolutely need to special case the percpu
backlog, because we do not want to add complex locking to protect
process_queue : Only owner cpu is allowed to manipulate it, unless cpu
is offline.

Based on initial patch from Prasad Sodagudi & Subash Abhinov
Kasiviswanathan.

This version is better because we do not slow down packet processing,
only make migration safer.

Reported-by: Prasad Sodagudi
Reported-by: Subash Abhinov Kasiviswanathan
Signed-off-by: Eric Dumazet
Cc: Tom Herbert
Signed-off-by: David S. Miller

Eric Dumazet
2015-01-16 14:02:42 +0800
f812116b1 ip: zero sockaddr returned on error queue ... Browse Code »
3

The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
structure is defined and allocated on the stack as

struct {
struct sock_extended_err ee;
struct sockaddr_in(6) offender;
} errhdr;

The second part is only initialized for certain SO_EE_ORIGIN values.
Always initialize it completely.

An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
would return uninitialized bytes.

Signed-off-by: Willem de Bruijn

----

Also verified that there is no padding between errhdr.ee and
errhdr.offender that could leak additional kernel data.
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Willem de Bruijn
2015-01-16 08:41:16 +0800
aaef66b83 Merge tag 'mac80211-for-davem-2015-01-15' of git://git.kernel.org/pub/scm/linux/… ... Browse Code »

…kernel/git/jberg/mac80211

Just two fixes - one for an uninialized variable and
one for a deadlock in regulatory processing.

Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller
2015-01-16 08:28:36 +0800

15 Jan, 2015

3 commits

a6391a924 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Don't use uninitialized data in IPVS, from Dan Carpenter.

2) conntrack race fixes from Pablo Neira Ayuso.

3) Fix TX hangs with i40e, from Jesse Brandeburg.

4) Fix budget return from poll calls in dnet and alx, from Eric
Dumazet.

5) Fix bugus "if (unlikely(x) < 0)" test in AF_PACKET, from Christoph
Jaeger.

6) Fix bug introduced by conversion to list_head in TIPC retransmit
code, from Jon Paul Maloy.

7) Don't use GFP_NOIO under spinlock in USB kaweth driver, from Alexey
Khoroshilov.

8) Fix bridge build with INET disabled, from Arnd Bergmann.

9) Fix netlink array overrun for PROBE attributes in openvswitch, from
Thomas Graf.

10) Don't hold spinlock across synchronize_irq() in tg3 driver, from
Prashant Sreedharan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
tg3: Release tp->lock before invoking synchronize_irq()
tg3: tg3_reset_task() needs to use rtnl_lock to synchronize
tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync
team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin
openvswitch: packet messages need their own probe attribtue
i40e: adds FCoE configure option
cxgb4vf: Fix queue allocation for 40G adapter
netdevice: Add missing parentheses in macro
bridge: only provide proxy ARP when CONFIG_INET is enabled
neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
net: fec: fix MDIO bus assignement for dual fec SoC's
xen-netfront: use different locks for Rx and Tx stats
drivers: net: cpsw: fix multicast flush in dual emac mode
cxgb4vf: Initialize mdio_addr before using it
net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations
usb/kaweth: use GFP_ATOMIC under spin_lock in usb_start_wait_urb()
MAINTAINERS: add me as ibmveth maintainer
tipc: fix bug in broadcast retransmit code
update ip-sysctl.txt documentation (v2)
net/at91_ether: prepare and unprepare clock
...

Linus Torvalds
2015-01-15 06:17:37 +0800
1ba398041 openvswitch: packet messages need their own probe attribtue ... Browse Code »

User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
and packet messages. This leads to an out-of-bounds access in
ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
OVS_PACKET_ATTR_MAX.

Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
while maintaining to be binary compatible with existing OVS binaries.

Fixes: 05da589 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
Reported-by: Sander Eikelenboom
Tracked-down-by: Florian Westphal
Signed-off-by: Thomas Graf
Reviewed-by: Jesse Gross
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Thomas Graf
2015-01-15 05:49:44 +0800
d92cfdbbe bridge: only provide proxy ARP when CONFIG_INET is enabled ... Browse Code »

When IPV4 support is disabled, we cannot call arp_send from
the bridge code, which would result in a kernel link error:

net/built-in.o: In function `br_handle_frame_finish':
:(.text+0x59914): undefined reference to `arp_send'
:(.text+0x59a50): undefined reference to `arp_tbl'

This makes the newly added proxy ARP support in the bridge
code depend on the CONFIG_INET symbol and lets the compiler
optimize the code out to avoid the link error.

Signed-off-by: Arnd Bergmann
Fixes: 958501163ddd ("bridge: Add support for IEEE 802.11 Proxy ARP")
Cc: Kyeyoon Park
Signed-off-by: David S. Miller

Arnd Bergmann
2015-01-15 04:08:02 +0800

14 Jan, 2015

1 commit

4bf6980dd neighbour: fix base_reachable_time(_ms) not effective immediatly when changed ... Browse Code »

When setting base_reachable_time or base_reachable_time_ms on a
specific interface through sysctl or netlink, the reachable_time
value is not updated.

This means that neighbour entries will continue to be updated using the
old value until it is recomputed in neigh_period_work (which
recomputes the value every 300*HZ).
On systems with HZ equal to 1000 for instance, it means 5mins before
the change is effective.

This patch changes this behavior by recomputing reachable_time after
each set on base_reachable_time or base_reachable_time_ms.
The new value will become effective the next time the neighbour's timer
is triggered.

Changes are made in two places: the netlink code for set and the sysctl
handling code. For sysctl, I use a proc_handler. The ipv6 network
code does provide its own handler but it already refreshes
reachable_time correctly so it's not an issue.
Any other user of neighbour which provide its own handlers must
refresh reachable_time.

Signed-off-by: Jean-Francois Remy
Signed-off-by: David S. Miller

Jean-Francois Remy
2015-01-14 13:28:00 +0800

13 Jan, 2015

1 commit

164167794 tipc: fix bug in broadcast retransmit code ... Browse Code »

In commit 58dc55f25631178ee74cd27185956a8f7dcb3e32 ("tipc: use generic
SKB list APIs to manage link transmission queue") we replace all list
traversal loops with the macros skb_queue_walk() or
skb_queue_walk_safe(). While the previous loops were based on the
assumption that the list was NULL-terminated, the standard macros
stop when the iterator reaches the list head, which is non-NULL.

In the function bclink_retransmit_pkt() this macro replacement has
lead to a bug. When we receive a BCAST STATE_MSG we unconditionally
call the function bclink_retransmit_pkt(), whether there really is
anything to retransmit or not, assuming that the sequence number
comparisons will lead to the correct behavior. However, if the
transmission queue is empty, or if there are no eligible buffers in
the transmission queue, we will by mistake pass the list head pointer
to the function tipc_link_retransmit(). Since the list head is not a
valid sk_buff, this leads to a crash.

In this commit we fix this by only calling tipc_link_retransmit()
if we actually found eligible buffers in the transmission queue.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-01-13 05:01:59 +0800

12 Jan, 2015

2 commits

2bd822180 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf ... Browse Code »

Pablo Neira Ayuso says:

====================
netfilter/ipvs fixes for net

The following patchset contains netfilter/ipvs fixes, they are:

1) Small fix for the FTP helper in IPVS, a diff variable may be left
unset when CONFIG_IP_VS_IPV6 is set. Patch from Dan Carpenter.

2) Fix nf_tables port NAT in little endian archs, patch from leroy
christophe.

3) Fix race condition between conntrack confirmation and flush from
userspace. This is the second reincarnation to resolve this problem.

4) Make sure inner messages in the batch come with the nfnetlink header.

5) Relax strict check from nfnetlink_bind() that may break old userspace
applications using all 1s group mask.

6) Schedule removal of chains once no sets and rules refer to them in
the new nf_tables ruleset flush command. Reported by Asbjoern Sloth
Toennesen.

Note that this batch comes later than usual because of the short
winter holidays.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-01-12 13:14:49 +0800
46d2cfb19 packet: bail out of packet_snd() if L2 header creation fails ... Browse Code »

Due to a misplaced parenthesis, the expression

(unlikely(offset) < 0),

which expands to

(__builtin_expect(!!(offset), 0) < 0),

never evaluates to true. Therefore, when sending packets with
PF_PACKET/SOCK_DGRAM, packet_snd() does not abort as intended
if the creation of the layer 2 header fails.

Spotted by Coverity - CID 1259975 ("Operands don't affect result").

Fixes: 9c7077622dd9 ("packet: make packet_snd fail on len smaller than l2 header")
Signed-off-by: Christoph Jaeger
Acked-by: Eric Dumazet
Acked-by: Willem de Bruijn
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Christoph Jaeger
2015-01-12 10:54:03 +0800

10 Jan, 2015

2 commits

dc9319f5a Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull two nfsd bugfixes from Bruce Fields.

* 'for-3.19' of git://linux-nfs.org/~bfields/linux:
rpc: fix xdr_truncate_encode to handle buffer ending on page boundary
nfsd: fix fi_delegees leak when fi_had_conflict returns true

Linus Torvalds
2015-01-10 10:10:48 +0800
20ebb3452 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client ... Browse Code »

Pull two Ceph fixes from Sage Weil:
"These are both pretty trivial: a sparse warning fix and size_t printk
thing"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: fix sparse endianness warnings
ceph: use %zu for len in ceph_fill_inline_data()

Linus Torvalds
2015-01-10 09:55:00 +0800

09 Jan, 2015

1 commit

d7d5a007b libceph: fix sparse endianness warnings ... Browse Code »

The only real issue is the one in auth_x.c and it came with
3.19-rc1 merge.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-01-09 01:36:57 +0800

08 Jan, 2015

1 commit

49a068f82 rpc: fix xdr_truncate_encode to handle buffer ending on page boundary ... Browse Code »

A struct xdr_stream at a page boundary might point to the end of one
page or the beginning of the next, but xdr_truncate_encode isn't
prepared to handle the former.

This can cause corruption of NFSv4 READDIR replies in the case that a
readdir entry that would have exceeded the client's dircount/maxcount
limit would have ended exactly on a 4k page boundary. You're more
likely to hit this case on large directories.

Other xdr_truncate_encode callers are probably also affected.

Reported-by: Holger Hoffstätte
Tested-by: Holger Hoffstätte
Fixes: 3e19ce762b53 "rpc: xdr_truncate_encode"
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2015-01-08 03:03:58 +0800

07 Jan, 2015

5 commits

20658702e cfg80211: fix deadlock during reg chan check ... Browse Code »

If a P2P GO is active, the cfg80211_reg_can_beacon function will take
the wdev lock, in its call to cfg80211_go_permissive_chan. But the wdev lock
is already taken by the parent channel-checking function, causing a
deadlock.
Split the checking code into two parts. The first part will check if the
wdev is active and saves the channel under the wdev lock. The second part
will check actual channel validity according to type.

Signed-off-by: Arik Nemtsov
Reviewed-by: Ilan Peer
Reviewed-by: Emmanuel Grumbach
Signed-off-by: Johannes Berg

Arik Nemtsov
2015-01-07 21:53:46 +0800
cc72f6e22 mac80211: uninitialized return val in __ieee80211_sta_handle_tspec_ac_params ... Browse Code »

The return value should be initialized to false so that there's a
valid return value when there are no sessions that need work to be
done on them. Luckily, the side effect of using the uninitialized
value is an extra harmless driver call.

Coverity: CID 1260096
Fixes: 02219b3abca59 ("mac80211: add WMM admission control support")
Signed-off-by: John W. Linville
[extend commit message]
Signed-off-by: Johannes Berg

John Linville
2015-01-07 20:57:34 +0800
bdec41963 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:
"Just a pile of random fixes, including:

1) Do not apply TSO limits to non-TSO packets, fix from Herbert Xu.

2) MDI{,X} eeprom check in e100 driver is reversed, from John W.
Linville.

3) Missing error return assignments in several ethernet drivers, from
Julia Lawall.

4) Altera TSE device doesn't come back up after ifconfig down/up
sequence, fix from Kostya Belezko.

5) Add more cases to the check for whether the qmi_wwan device has a
bogus MAC address and needs to be assigned a random one. From
Kristian Evensen.

6) Fix interrupt hangs in CPSW, from Felipe Balbi.

7) Implement ndo_features_check in r8152 so that the stack doesn't
feed GSO packets which are outside of the chip's capabilities.
From Hayes Wang"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
qla3xxx: don't allow never end busy loop
xen-netback: fixing the propagation of the transmit shaper timeout
r8152: support ndo_features_check
batman-adv: fix potential TT client + orig-node memory leak
batman-adv: fix multicast counter when purging originators
batman-adv: fix counter for multicast supporting nodes
batman-adv: fix lock class for decoding hash in network-coding.c
batman-adv: fix delayed foreign originator recognition
batman-adv: fix and simplify condition when bonding should be used
Revert "mac80211: Fix accounting of the tailroom-needed counter"
net: ethernet: cpsw: fix hangs with interrupts
enic: free all rq buffs when allocation fails
qmi_wwan: Set random MAC on devices with buggy fw
openvswitch: Consistently include VLAN header in flow and port stats.
tcp: Do not apply TSO segment limit to non-TSO packets
Altera TSE: Add missing phydev
net/mlx4_core: Fix error flow in mlx4_init_hca()
net/mlx4_core: Correcly update the mtt's offset in the MR re-reg flow
qlcnic: Fix return value in qlcnic_probe()
net: axienet: fix error return code
...

Linus Torvalds
2015-01-07 09:48:14 +0800
a2f18db0c netfilter: nf_tables: fix flush ruleset chain dependencies ... Browse Code »

Jumping between chains doesn't mix well with flush ruleset. Rules
from a different chain and set elements may still refer to us.

[ 353.373791] ------------[ cut here ]------------
[ 353.373845] kernel BUG at net/netfilter/nf_tables_api.c:1159!
[ 353.373896] invalid opcode: 0000 [#1] SMP
[ 353.373942] Modules linked in: intel_powerclamp uas iwldvm iwlwifi
[ 353.374017] CPU: 0 PID: 6445 Comm: 31c3.nft Not tainted 3.18.0 #98
[ 353.374069] Hardware name: LENOVO 5129CTO/5129CTO, BIOS 6QET47WW (1.17 ) 07/14/2010
[...]
[ 353.375018] Call Trace:
[ 353.375046] [] ? nf_tables_commit+0x381/0x540
[ 353.375101] [] nfnetlink_rcv+0x3d8/0x4b0
[ 353.375150] [] netlink_unicast+0x105/0x1a0
[ 353.375200] [] netlink_sendmsg+0x32e/0x790
[ 353.375253] [] sock_sendmsg+0x8e/0xc0
[ 353.375300] [] ? move_addr_to_kernel.part.20+0x19/0x70
[ 353.375357] [] ? move_addr_to_kernel+0x19/0x30
[ 353.375410] [] ? verify_iovec+0x42/0xd0
[ 353.375459] [] ___sys_sendmsg+0x3f0/0x400
[ 353.375510] [] ? native_sched_clock+0x2a/0x90
[ 353.375563] [] ? acct_account_cputime+0x17/0x20
[ 353.375616] [] ? account_user_time+0x88/0xa0
[ 353.375667] [] __sys_sendmsg+0x3d/0x80
[ 353.375719] [] ? int_check_syscall_exit_work+0x34/0x3d
[ 353.375776] [] SyS_sendmsg+0xd/0x20
[ 353.375823] [] system_call_fastpath+0x16/0x1b

Release objects in this order: rules -> sets -> chains -> tables, to
make sure no references to chains are held anymore.

Reported-by: Asbjoern Sloth Toennesen
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2015-01-07 05:27:48 +0800
62924af24 netfilter: nfnetlink: relax strict multicast group check from netlink_bind ... Browse Code »

Relax the checking that was introduced in 97840cb ("netfilter:
nfnetlink: fix insufficient validation in nfnetlink_bind") when the
subscription bitmask is used. Existing userspace code code may request
to listen to all of the existing netlink groups by setting an all to one
subscription group bitmask. Netlink already validates subscription via
setsockopt() for us.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2015-01-07 05:27:47 +0800