Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

13 Jun, 2015

1 commit

ae36806a6 sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO

Currently, we can ask to authenticate DATA chunks and we can send DATA
chunks in the same packet as COOKIE_ECHO, but if you try to combine
both, the DATA chunk will be sent unauthenticated and the peer won't
accept it, leading to a communication failure.

This happens because, even though the data was queued after
authentication of DATA chunks was requested, it was also queued before
we could know that the remote peer can handle authentication, so
sctp_auth_send_cid() returns false.

The fix is to re-check the send queue for chunks that should now be
authenticated whenever we set up an active key. As a result, such a
packet will now contain COOKIE_ECHO + AUTH + DATA chunks, in that order.

Reported-by: Liu Wei
Signed-off-by: Marcelo Ricardo Leitner
Acked-by: Neil Horman
Acked-by: Vlad Yasevich
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2015-06-13 05:18:20 +0800

12 Jun, 2015

2 commits

fb05e7a89 net: don't wait for order-3 page allocation

We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and adds latency. Commit 5640f7685831e0
introduced the order-3 allocation. According to its changelog, the order-3
allocation is not required; it only improves performance. But direct memory
compaction has high overhead, and the benefit of the order-3 allocation can't
compensate for that overhead.

This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the allocation will still succeed, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered and skb_page_frag_refill will
fall back to order-0 immediately, hence the direct memory compaction overhead
is avoided. In the allocation failure case, kswapd is woken up to do
compaction, so chances are the allocation will succeed next time.
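
The fallback described above can be sketched in userspace C. This is only an illustration of the control flow, not the kernel code: `struct fake_allocator`, `try_alloc_nowait()` and `frag_refill()` are hypothetical names, and the allocator is a stub that merely records whether a large block happens to be free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HIGH_ORDER 3 /* 2^3 = 8 contiguous pages */

/* Hypothetical stand-in for the page allocator state. */
struct fake_allocator {
    unsigned int max_free_order; /* largest block available right now */
    unsigned int kswapd_wakeups; /* background compaction requests */
};

/*
 * Non-blocking attempt: on failure it only asks the background thread
 * for help and returns at once; it never compacts in the caller.
 */
static bool try_alloc_nowait(struct fake_allocator *a, unsigned int order)
{
    assert(a != NULL);
    if (order <= a->max_free_order)
        return true;
    a->kswapd_wakeups++;
    return false;
}

/* Returns the order that was actually obtained. */
static unsigned int frag_refill(struct fake_allocator *a)
{
    if (try_alloc_nowait(a, HIGH_ORDER))
        return HIGH_ORDER;
    /* Fall back to a single page; assumed to succeed in this sketch. */
    return 0;
}
```

With an unfragmented allocator the caller still gets the order-3 block; with a fragmented one it gets a single page immediately and the background wakeup counter is bumped once.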

alloc_skb_with_frags is handled the same way.

The Mellanox driver does a similar thing; if this is accepted, we must fix
the driver too.

V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer

Cc: Eric Dumazet
Cc: Chris Mason
Cc: Debabrata Banerjee
Signed-off-by: Shaohua Li
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Shaohua Li
2015-06-12 08:33:44 +0800
0fae3bf01 mpls: handle device renames for per-device sysctls

If a device is renamed and the original name is subsequently reused
for a new device, the following warning is generated:

sysctl duplicate entry: /net/mpls/conf/veth0//input
CPU: 3 PID: 1379 Comm: ip Not tainted 4.1.0-rc4+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
0000000000000000 0000000000000000 ffffffff81566aaf 0000000000000000
ffffffff81236279 ffff88002f7d7f00 0000000000000000 ffff88000db336d8
ffff88000db33698 0000000000000005 ffff88002e046000 ffff8800168c9280
Call Trace:
[] ? dump_stack+0x40/0x50
[] ? __register_sysctl_table+0x289/0x5a0
[] ? mpls_dev_notify+0x1ff/0x300 [mpls_router]
[] ? notifier_call_chain+0x4f/0x70
[] ? register_netdevice+0x2b2/0x480
[] ? veth_newlink+0x178/0x2d3 [veth]
[] ? rtnl_newlink+0x73c/0x8e0
[] ? rtnl_newlink+0x16a/0x8e0
[] ? __kmalloc_reserve.isra.30+0x32/0x90
[] ? rtnetlink_rcv_msg+0x8d/0x250
[] ? __alloc_skb+0x47/0x1f0
[] ? __netlink_lookup+0xab/0xe0
[] ? rtnetlink_rcv+0x30/0x30
[] ? netlink_rcv_skb+0xb0/0xd0
[] ? rtnetlink_rcv+0x24/0x30
[] ? netlink_unicast+0x107/0x1a0
[] ? netlink_sendmsg+0x50e/0x630
[] ? sock_sendmsg+0x3c/0x50
[] ? ___sys_sendmsg+0x27b/0x290
[] ? mem_cgroup_try_charge+0x88/0x110
[] ? mem_cgroup_commit_charge+0x56/0xa0
[] ? do_filp_open+0x30/0xa0
[] ? __sys_sendmsg+0x3e/0x80
[] ? system_call_fastpath+0x16/0x75

Fix this by unregistering the previous sysctl table (registered for
the path containing the original device name) and re-registering the
table for the path containing the new device name.

Fixes: 37bde79979c3 ("mpls: Per-device enabling of packet input")
Reported-by: Scott Feldman
Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller

Robert Shearman
2015-06-12 07:47:16 +0800

11 Jun, 2015

4 commits

5d7536102 net, swap: Remove a warning and clarify why sk_mem_reclaim is required when deactivating swap

Jeff Layton reported the following:

[ 74.232485] ------------[ cut here ]------------
[ 74.233354] WARNING: CPU: 2 PID: 754 at net/core/sock.c:364 sk_clear_memalloc+0x51/0x80()
[ 74.234790] Modules linked in: cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xfs libcrc32c snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device nfsd snd_pcm snd_timer snd e1000 ppdev parport_pc joydev parport pvpanic soundcore floppy serio_raw i2c_piix4 pcspkr nfs_acl lockd virtio_balloon acpi_cpufreq auth_rpcgss grace sunrpc qxl drm_kms_helper ttm drm virtio_console virtio_blk virtio_pci ata_generic virtio_ring pata_acpi virtio
[ 74.243599] CPU: 2 PID: 754 Comm: swapoff Not tainted 4.1.0-rc6+ #5
[ 74.244635] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 74.245546] 0000000000000000 0000000079e69e31 ffff8800d066bde8 ffffffff8179263d
[ 74.246786] 0000000000000000 0000000000000000 ffff8800d066be28 ffffffff8109e6fa
[ 74.248175] 0000000000000000 ffff880118d48000 ffff8800d58f5c08 ffff880036e380a8
[ 74.249483] Call Trace:
[ 74.249872] [] dump_stack+0x45/0x57
[ 74.250703] [] warn_slowpath_common+0x8a/0xc0
[ 74.251655] [] warn_slowpath_null+0x1a/0x20
[ 74.252585] [] sk_clear_memalloc+0x51/0x80
[ 74.253519] [] xs_disable_swap+0x42/0x80 [sunrpc]
[ 74.254537] [] rpc_clnt_swap_deactivate+0x7e/0xc0 [sunrpc]
[ 74.255610] [] nfs_swap_deactivate+0x27/0x30 [nfs]
[ 74.256582] [] destroy_swap_extents+0x74/0x80
[ 74.257496] [] SyS_swapoff+0x222/0x5c0
[ 74.258318] [] ? syscall_trace_leave+0xc7/0x140
[ 74.259253] [] system_call_fastpath+0x12/0x71
[ 74.260158] ---[ end trace 2530722966429f10 ]---

The warning in question was unnecessary but with Jeff's series the rules
are also clearer. This patch removes the warning and updates the comment
to explain why sk_mem_reclaim() may still be called.

[jlayton: remove the if (sk->sk_forward_alloc) conditional, as Leon
points out that it's not needed.]

Cc: Leon Romanovsky
Signed-off-by: Mel Gorman
Signed-off-by: Jeff Layton
Signed-off-by: David S. Miller

Mel Gorman
2015-06-11 14:02:31 +0800
1a040eaca bridge: fix multicast router rlist endless loop

Since the addition of sysfs multicast router support, if one sets
multicast_router to "2" more than once, the port is added to the hlist
every time and can end up linking to itself, causing an endless loop
for rlist walkers.
To reproduce, run the following in a bridge port:
echo 2 > multicast_router; echo 2 > multicast_router;
and let some IGMP traffic flow. For me it hangs in
br_multicast_flood().
Fix this by adding a check in br_multicast_add_router() for whether the
port is already linked.
This didn't happen before the addition of the multicast_router sysfs
entries because there's a !hlist_unhashed check that prevents it.
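
To see why a second insertion produces a self-link, here is a minimal userspace sketch. The `hlist_node`, `hlist_head`, `hlist_unhashed()` and `hlist_add_head()` definitions are simplified re-implementations modeled on the kernel's list helpers (an assumption of this sketch, not copied from the tree); `add_router_unchecked()` and `add_router_checked()` are hypothetical wrappers.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* A node that is not on any list has a NULL pprev. */
static int hlist_unhashed(const struct hlist_node *n)
{
    return n->pprev == NULL;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first; /* if n is already first, n->next becomes n */
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Adds unconditionally: a repeated add links the node to itself. */
static void add_router_unchecked(struct hlist_head *h, struct hlist_node *port)
{
    assert(h != NULL && port != NULL);
    hlist_add_head(port, h);
}

/* Skips the add when the node is already linked. */
static void add_router_checked(struct hlist_head *h, struct hlist_node *port)
{
    assert(h != NULL && port != NULL);
    if (!hlist_unhashed(port))
        return;
    hlist_add_head(port, h);
}
```

After two unchecked adds, `port.next == &port`, so any loop that follows `next` until NULL never terminates. The checked variant leaves `next` as NULL.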

Signed-off-by: Nikolay Aleksandrov
Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2015-06-11 13:07:50 +0800
b3be5e3e7 tipc: disconnect socket directly after probe failure

If the TIPC connection timer expires in a probing state, a
self abort message is supposed to be generated and delivered
to the local socket. This is currently broken, and the abort
message is actually sent out to the peer node with invalid
addressing information. This will cause the link to enter
a constant retransmission state and eventually reset.
We fix this by removing the self-abort message creation and
tearing down the connection immediately instead.

Signed-off-by: Erik Hugne
Reviewed-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Erik Hugne
2015-06-11 13:05:20 +0800
1b0ccfe54 Revert "ipv6: Fix protocol resubmission"

This reverts commit 0243508edd317ff1fa63b495643a7c192fbfcd92.

It introduces new regressions.

Signed-off-by: David S. Miller

David S. Miller
2015-06-11 06:29:31 +0800

10 Jun, 2015

1 commit

9c5a18a31 cfg80211: wext: clear sinfo struct before calling driver

Until recently, mac80211 overwrote all the statistics it could
provide when getting called, but it now relies on the struct
having been zeroed by the caller. This was always the case in
nl80211, but wext used a static struct, which could even cause
values from one device to leak to another.

Using a static struct is OK (as even documented in a comment)
since the whole usage of this function and its return value is
always locked under RTNL. Not clearing the struct before calling
the driver has always been wrong though, since drivers were
free to only fill values they could report, so calling this
for one device and then for another would always have leaked
values from one to the other.

Fix this by initializing the structure in question before the
driver method call.
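
A small userspace sketch of the leak and the fix follows. Everything here is hypothetical and simplified for illustration (`struct station_info_sketch`, the two fake drivers, and `query()`); it only shows why a shared static struct must be zeroed before a driver that fills in a subset of fields is called.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILLED_SIGNAL 0x1u

/* Hypothetical, much reduced statistics struct. */
struct station_info_sketch {
    unsigned int filled; /* bitmask of valid fields */
    int signal;
};

typedef void (*get_station_fn)(struct station_info_sketch *);

/* Driver A can report a signal level. */
static void driver_a_get_station(struct station_info_sketch *si)
{
    si->filled |= FILLED_SIGNAL;
    si->signal = -40;
}

/* Driver B reports nothing and leaves the struct untouched. */
static void driver_b_get_station(struct station_info_sketch *si)
{
    (void)si;
}

/* One static struct shared by all callers, as in the wext path. */
static const struct station_info_sketch *query(get_station_fn drv,
                                               int clear_first)
{
    static struct station_info_sketch sinfo;

    assert(drv != NULL);
    if (clear_first)
        memset(&sinfo, 0, sizeof(sinfo)); /* the fix */
    drv(&sinfo);
    return &sinfo;
}
```

Querying driver A and then driver B without clearing returns A's signal for B's device; with `clear_first` set, B correctly reports no valid fields.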

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=99691

Cc: stable@vger.kernel.org
Reported-by: Gerrit Renker
Reported-by: Alexander Kaltsas
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2015-06-10 04:54:58 +0800

09 Jun, 2015

3 commits

bbbf2df00 net: replace last open coded skb_orphan_frags with function call

Commit 70008aa50e92 ("skbuff: convert to skb_orphan_frags") replaced
open coded tests of SKBTX_DEV_ZEROCOPY and skb_copy_ubufs with calls
to helper function skb_orphan_frags. Apply that to the last remaining
open coded site.

Signed-off-by: Willem de Bruijn
Acked-by: Michael S. Tsirkin
Signed-off-by: David S. Miller

Willem de Bruijn
2015-06-09 03:15:13 +0800
0243508ed ipv6: Fix protocol resubmission

UDP encapsulation is broken on IPv6. This is because the logic to resubmit
the nexthdr is inverted, checking for a ret value > 0 instead of < 0. Also,
the resubmit label is in the wrong position since we already get the
nexthdr value when performing decapsulation. In addition the skb pull is no
longer necessary either.

This changes the return value check to look for < 0, using it for the
nexthdr on the next iteration, and moves the resubmit label to the proper
location.

With these changes the v6 code now matches what we do in the v4 ip input
code wrt resubmitting when decapsulating.

Signed-off-by: Josh Hunt
Acked-by: "Tom Herbert"
Signed-off-by: David S. Miller

Josh Hunt
2015-06-09 03:13:17 +0800
27e41fcfa ipv6: fix possible use after free of dev stats

The memory pointed to by idev->stats.icmpv6msgdev,
idev->stats.icmpv6dev and idev->stats.ipv6 can each be used in an RCU
read context without taking a reference on idev. For example, through
IP6_*_STATS_* calls in ip6_rcv. These memory blocks are freed without
waiting for an RCU grace period to elapse. This could lead to the
memory being written to after it has been freed.

Fix this by using call_rcu to free the memory used for stats, as well
as idev after an RCU grace period has elapsed.

Signed-off-by: Robert Shearman
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Robert Shearman
2015-06-09 03:12:45 +0800

08 Jun, 2015

4 commits

c4c832f89 bridge: disable softirqs around br_fdb_update to avoid lockup

br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to disable softirqs because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. The spin locks in br_fdb_update were _bh before commit f8ae737deea1
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d1398983f ("bridge: add NTF_USE support")
Using local_bh_disable/enable around br_fdb_update() allows us to keep
using the spin_lock/unlock in br_fdb_update for the fast-path.

Signed-off-by: Nikolay Aleksandrov
Fixes: 292d1398983f ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2015-06-08 10:44:13 +0800
7ff46e79f Revert "bridge: use _bh spinlock variant for br_fdb_update to avoid lockup"

This reverts commit 1d7c49037b12016e7056b9f2c990380e2187e766.

Nikolay Aleksandrov has a better version of this fix.

Signed-off-by: David S. Miller

David S. Miller
2015-06-08 10:43:47 +0800
25cc8f076 mpls: fix possible use after free of device

The mpls device is used in an RCU read context without a lock being
held. As the memory is freed without waiting for the RCU grace period
to elapse, the freed memory could still be in use.

Address this by using kfree_rcu to free the memory for the mpls device
after the RCU grace period has elapsed.

Fixes: 03c57747a702 ("mpls: Per-device MPLS state")
Signed-off-by: Robert Shearman
Acked-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Robert Shearman
2015-06-08 10:37:27 +0800
1d7c49037 bridge: use _bh spinlock variant for br_fdb_update to avoid lockup

br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to use spin_lock_bh because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. These locks were _bh before commit f8ae737deea1
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d1398983f ("bridge: add NTF_USE support")

Signed-off-by: Wilson Kok
Signed-off-by: Nikolay Aleksandrov
Fixes: 292d1398983f ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller

Wilson Kok
2015-06-08 06:24:54 +0800

04 Jun, 2015

2 commits

6e5403093 ipv4/udp: Verify multicast group is ours in upd_v4_early_demux()

421b3885bf6d56391297844f43fb7154a6396e12 "udp: ipv4: Add udp early
demux" introduced a regression that allowed sockets bound to INADDR_ANY
to receive packets from multicast groups that the socket had not joined.
For example a socket that had joined 224.168.2.9 could also receive
packets from 225.168.2.9 despite not having joined that group if
ip_early_demux is enabled.

Fix this by calling ip_check_mc_rcu() in udp_v4_early_demux() to verify
that the multicast packet is indeed ours.

Signed-off-by: Shawn Bohrer
Reported-by: Yurij M. Plotnikov
Signed-off-by: David S. Miller

Shawn Bohrer
2015-06-04 15:46:26 +0800
640b2b107 openvswitch: disable LRO

Currently, openvswitch tries to disable LRO from the user space. This does
not work correctly when the device added is a vlan interface, though.
Instead of dealing with possibly complex stacked cross name space relations
in the user space, do the same as bridging does and call dev_disable_lro in
the kernel.

Signed-off-by: Jiri Benc
Acked-by: Flavio Leitner
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2015-06-04 10:39:35 +0800

02 Jun, 2015

4 commits

e453581dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fix for net

The following patch reverts the ebtables chunk that enforces counters that was
introduced in the recently applied d26e2c9ffa38 ('Revert "netfilter: ensure
number of counters is >0 in do_replace()"') since this breaks ebtables.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-06-02 07:56:43 +0800
ccd740cbc vti6: Add pmtu handling to vti6_xmit.

We currently rely on the PMTU discovery of xfrm.
However, if a packet is locally sent, the PMTU mechanism
of xfrm tries to do local socket notification, which
might not work for applications like ping that don't
check for this. So add PMTU handling to vti6_xmit to
report MTU changes immediately.

Signed-off-by: Steffen Klassert
Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller

Steffen Klassert
2015-06-02 07:03:43 +0800
18ec898ee Revert "net: core: 'ethtool' issue with querying phy settings"

This reverts commit f96dee13b8e10f00840124255bed1d8b4c6afd6f.

It isn't right, ethtool is meant to manage one PHY instance
per netdevice at a time, and this is selected by the SET
command. Therefore by definition the GET command must only
return the settings for the configured and selected PHY.

Reported-by: Ben Hutchings
Signed-off-by: David S. Miller

David S. Miller
2015-06-02 05:43:50 +0800
d26e2c9ff Revert "netfilter: ensure number of counters is >0 in do_replace()"

This partially reverts commit 1086bbe97a07 ("netfilter: ensure number of
counters is >0 in do_replace()") in net/bridge/netfilter/ebtables.c.

Setting rules with ebtables does not work any more with 1086bbe97a07 in place.

There is an error message and no rules set in the end.

e.g.

~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP
Unable to update the kernel. Two possible causes:
1. Multiple ebtables programs were executing simultaneously. The ebtables
userspace tool doesn't by default support multiple ebtables programs
running

Reverting the ebtables part of 1086bbe97a07 makes this work again.

Signed-off-by: Bernhard Thaler
Signed-off-by: Pablo Neira Ayuso

Bernhard Thaler
2015-06-02 01:45:47 +0800

01 Jun, 2015

3 commits

24595346d net: dsa: Properly propagate errors from dsa_switch_setup_one

While shuffling some code around, dsa_switch_setup_one() was introduced,
and it was modified to return either an error code using ERR_PTR() or a
NULL pointer when running out of memory or failing to setup a switch.

This is a problem for its caller, dsa_switch_setup(), which uses IS_ERR()
and expects to find an error code, not a NULL pointer, so we still try
to proceed with dsa_switch_setup() and operate on invalid memory
addresses. This can be easily reproduced by having e.g: the bcm_sf2
driver built-in, but having no such switch, such that drv->setup will
fail.

Fix this by returning ERR_PTR() consistently, which is both more informative
and spares the caller from having to use IS_ERR_OR_NULL().
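
The mismatch between a NULL return and an IS_ERR() check can be reproduced in userspace. The helpers below are simplified re-implementations modeled on the kernel's error-pointer convention (an assumption of this sketch), and `setup_one_mixed()` / `setup_one_consistent()` are hypothetical stand-ins for the setup function before and after the fix.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* Decode the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True only for the top MAX_ERRNO addresses; NULL is NOT an error. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct sw_sketch { int id; };
static struct sw_sketch the_switch = { 1 };

/* Before: out-of-memory is reported as NULL, which IS_ERR() misses. */
static struct sw_sketch *setup_one_mixed(int fail_alloc)
{
    if (fail_alloc)
        return NULL;
    return &the_switch;
}

/* After: every failure is reported as an encoded error pointer. */
static struct sw_sketch *setup_one_consistent(int fail_alloc)
{
    if (fail_alloc)
        return ERR_PTR(-ENOMEM);
    return &the_switch;
}
```

A caller that only tests IS_ERR() treats the NULL from `setup_one_mixed(1)` as success and goes on to dereference it; with `setup_one_consistent(1)` the same test catches the failure and the decoded value gives the reason.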

Fixes: df197195a5248 ("net: dsa: split dsa_switch_setup into two functions")
Reported-by: Andrew Lunn
Signed-off-by: Florian Fainelli
Tested-by: Andrew Lunn
Signed-off-by: David S. Miller

Florian Fainelli
2015-06-01 12:50:34 +0800
9f950415e tcp: fix child sockets to use system default congestion control if not set

Linux 3.17 and earlier are explicitly engineered so that if the app
doesn't specifically request a CC module on a listener before the SYN
arrives, then the child gets the system default CC when the connection
is established. See tcp_init_congestion_control() in 3.17 or earlier,
which says "if no choice made yet assign the current value set as
default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
created") altered these semantics, so that children got their parent
listener's congestion control even if the system default had changed
after the listener was created.

This commit returns to those original semantics from 3.17 and earlier,
since they are the original semantics from 2007 in 4d4d3d1e8 ("[TCP]:
Congestion control initialization."), and some Linux congestion
control workflows depend on that.

In summary, if a listener socket specifically sets TCP_CONGESTION to
"x", or the route locks the CC module to "x", then the child gets
"x". Otherwise the child gets current system default from
net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
earlier, and this commit restores that.
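
The selection rule in the summary can be written as a tiny decision function. This is a hedged sketch only: `struct listener_sketch` and `child_cc()` are hypothetical, and the `explicit_cc` field stands for either a TCP_CONGESTION setting on the listener or a route-locked module, without modeling any precedence between those two.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical listener state. */
struct listener_sketch {
    /* NULL unless the app set TCP_CONGESTION or the route locks a module */
    const char *explicit_cc;
};

/*
 * Pick the congestion control for a child socket at the time the
 * connection is established, given the system default at that moment.
 */
static const char *child_cc(const struct listener_sketch *l,
                            const char *current_default)
{
    assert(l != NULL && current_default != NULL);
    if (l->explicit_cc)
        return l->explicit_cc;
    return current_default;
}
```

A listener with no explicit choice that was created while the default was "cubic" yields "reno" children once the default is changed to "reno", while a listener that explicitly chose "vegas" keeps producing "vegas" children.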

Fixes: 55d8694fa82c ("net: tcp: assign tcp cong_ops when tcp sk is created")
Cc: Florian Westphal
Cc: Daniel Borkmann
Cc: Glenn Judd
Cc: Stephen Hemminger
Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Signed-off-by: Yuchung Cheng
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Neal Cardwell
2015-06-01 12:49:14 +0800
beb39db59 udp: fix behavior of wrong checksums

We have two problems in the UDP stack related to bogus checksums:

1) We return -EAGAIN to the application even if the receive queue is not
empty. This breaks applications using edge-triggered epoll().

2) Under UDP flood, we can loop forever without yielding to other
processes, potentially hanging the host, especially on non-SMP.

This patch is an attempt to make things better.

We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.

Signed-off-by: Eric Dumazet
Cc: Willem de Bruijn
Signed-off-by: David S. Miller

Eric Dumazet
2015-06-01 12:42:18 +0800

31 May, 2015

1 commit

71d9f6149 bridge: fix br_multicast_query_expired() bug

The querier argument of br_multicast_query_expired() is a pointer to
a struct bridge_mcast_querier:

struct bridge_mcast_querier {
	struct br_ip addr;
	struct net_bridge_port __rcu *port;
};

The intent of the code was to clear the port field, not the pointer to the querier.
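
The difference between clearing the pointer argument and clearing the field it points to can be shown with plain C. The types and function names below are hypothetical simplifications, and RCU handling is deliberately left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct port_sketch { int port_no; };

struct querier_sketch {
    int addr;
    struct port_sketch *port;
};

/*
 * Buggy shape: only the local copy of the pointer is set to NULL,
 * so the caller's struct keeps its stale port.
 */
static void query_expired_buggy(struct querier_sketch *querier)
{
    querier = NULL;
    (void)querier;
}

/* Intended shape: clear the port field inside the caller's struct. */
static void query_expired_fixed(struct querier_sketch *querier)
{
    assert(querier != NULL);
    querier->port = NULL;
}
```

Because C passes the pointer by value, the buggy variant has no visible effect on the caller, which is why the stale port survived.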

Fixes: 2cd4143192e8 ("bridge: memorize and export selected IGMP/MLD querier port")
Signed-off-by: Eric Dumazet
Acked-by: Thadeu Lima de Souza Cascardo
Acked-by: Linus Lüssing
Cc: Linus Lüssing
Cc: Steinar H. Gunderson
Signed-off-by: David S. Miller

Eric Dumazet
2015-05-31 14:31:28 +0800

29 May, 2015

1 commit

5aab0e8a4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2015-05-28

1) Fix a race in xfrm_state_lookup_byspi, we need to take
the refcount before we release xfrm_state_lock.
From Li RongQing.

2) Fix IV generation on ESN state. We used just the
low order sequence numbers for IV generation on
ESN, as a result the IV can repeat on the same
state. Fix this by using the high order sequence
number bits too and make sure to always initialize
the high order bits with zero. These patches are
serious stable candidates. Fixes from Herbert Xu.

3) Fix the skb->mark handling on vti. We don't
reset skb->mark in skb_scrub_packet anymore,
so vti must take care to restore the original
value after it was used to look up the
vti policy and state. Fixes from Alexander Duyck.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-05-29 11:41:35 +0800

28 May, 2015

5 commits

d55c670cb ip_vti/ip6_vti: Preserve skb->mark after rcv_cb call

The vti6_rcv_cb and vti_rcv_cb calls were leaving the skb->mark modified
after completing the function. This resulted in the original skb->mark
value being lost. Since we only need skb->mark to be set for
xfrm_policy_check we can pull the assignment into the rcv_cb calls and then
just restore the original mark after xfrm_policy_check has been completed.
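
The save, override, restore pattern described here looks roughly like the following userspace sketch. All names are hypothetical (`struct skb_sketch`, `policy_check()`, `rcv_cb_sketch()`, `tunnel_key`); the policy check is a stub that only records which mark it saw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct skb_sketch { uint32_t mark; };

/* Records the mark that the (stubbed) policy check observed. */
static uint32_t mark_seen_by_policy_check;

static int policy_check(const struct skb_sketch *skb)
{
    mark_seen_by_policy_check = skb->mark;
    return 1; /* always accept in this sketch */
}

/*
 * Use the tunnel key as the mark only for the duration of the policy
 * check, then put the original value back.
 */
static int rcv_cb_sketch(struct skb_sketch *skb, uint32_t tunnel_key)
{
    uint32_t orig_mark;
    int ok;

    assert(skb != NULL);
    orig_mark = skb->mark;
    skb->mark = tunnel_key;
    ok = policy_check(skb);
    skb->mark = orig_mark;
    return ok;
}
```

The policy check sees the tunnel key, while code that runs after the callback still sees the mark the packet arrived with.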

Signed-off-by: Alexander Duyck
Signed-off-by: Steffen Klassert

Alexander Duyck
2015-05-28 12:23:32 +0800
049f8e2e2 xfrm: Override skb->mark with tunnel->parm.i_key in xfrm_input

This change makes it so that if a tunnel is defined we just use the mark
from the tunnel instead of the mark from the skb header. By doing this we
can avoid the need to set skb->mark inside of the tunnel receive functions.

Signed-off-by: Alexander Duyck
Signed-off-by: Steffen Klassert

Alexander Duyck
2015-05-28 12:23:31 +0800
cd5279c19 ip_vti/ip6_vti: Do not touch skb->mark on xmit

Instead of modifying skb->mark we can simply modify the flowi_mark that is
generated as a result of the xfrm_decode_session. By doing this we don't
need to actually touch the skb->mark and it can be preserved as it passes
out through the tunnel.

Signed-off-by: Alexander Duyck
Signed-off-by: Steffen Klassert

Alexander Duyck
2015-05-28 12:23:31 +0800
8f98bcdf8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Don't use MMIO on certain iwlwifi devices otherwise we get a
firmware crash.

2) Don't corrupt the GRO lists of mac80211 contexts by doing sends via
timer interrupt, from Johannes Berg.

3) SKB tailroom is miscalculated in AP_VLAN crypto code, from Michal
Kazior.

4) Fix fw_status memory leak in iwlwifi, from Haim Dreyfuss.

5) Fix use after free in iwl_mvm_d0i3_enable_tx(), from Eliad Peller.

6) JIT'ing of large BPF programs is broken on x86, from Alexei
Starovoitov.

7) EMAC driver ethtool register dump size is miscalculated, from Ivan
Mikhaylov.

8) Fix PHY initial link mode when autonegotiation is disabled in
amd-xgbe, from Tom Lendacky.

9) Fix NULL deref on SOCK_DEAD socket in AF_UNIX and CAIF protocols,
from Mark Salyzyn.

10) credit_bytes not initialized properly in xen-netback, from Ross
Lagerwall.

11) Fallback from MSI-X to INTx interrupts not handled properly in mlx4
driver, fix from Benjamin Poirier.

12) Perform ->attach() after binding dev->qdisc in packet scheduler,
otherwise we can crash. From Cong WANG.

13) Don't clobber data in sctp_v4_map_v6(). From Jason Gunthorpe.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
sctp: Fix mangled IPv4 addresses on a IPv6 listening socket
net_sched: invoke ->attach() after setting dev->qdisc
xen-netfront: properly destroy queues when removing device
mlx4_core: Fix fallback from MSI-X to INTx
xen/netback: Properly initialize credit_bytes
net: netxen: correct sysfs bin attribute return code
tools: bpf_jit_disasm: fix segfault on disabled debugging log output
unix/caif: sk_socket can disappear when state is unlocked
amd-xgbe-phy: Fix initial mode when autoneg is disabled
net: dp83640: fix improper double spin locking.
net: dp83640: reinforce locking rules.
net: dp83640: fix broken calibration routine.
net: stmmac: create one debugfs dir per net-device
net/ibm/emac: fix size of emac dump memory areas
x86: bpf_jit: fix compilation of large bpf programs
net: phy: bcm7xxx: Fix 7425 PHY ID and flags
iwlwifi: mvm: avoid use-after-free on iwl_mvm_d0i3_enable_tx()
iwlwifi: mvm: clean net-detect info if device was reset during suspend
iwlwifi: mvm: take the UCODE_DOWN reference when resuming
iwlwifi: mvm: BT Coex - duplicate the command if sent ASYNC
...

Linus Torvalds
2015-05-28 04:41:13 +0800
86e363dc3 net_sched: invoke ->attach() after setting dev->qdisc

For the mq qdisc, we add the per-tx-queue qdiscs to the root qdisc
for display purposes. However, that happens too early,
before the new dev->qdisc is finally set, which causes
q->list to point to an old root qdisc that is about to be
freed right before the new one is assigned.

Fix this by moving ->attach() after setting dev->qdisc.

For the record, this fixes the following crash:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 975 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
list_del corruption. prev->next should be ffff8800d1998ae8, but was 6b6b6b6b6b6b6b6b
CPU: 1 PID: 975 Comm: tc Not tainted 4.1.0-rc4+ #1019
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
0000000000000009 ffff8800d73fb928 ffffffff81a44e7f 0000000047574756
ffff8800d73fb978 ffff8800d73fb968 ffffffff810790da ffff8800cfc4cd20
ffffffff814e725b ffff8800d1998ae8 ffffffff82381250 0000000000000000
Call Trace:
[] dump_stack+0x4c/0x65
[] warn_slowpath_common+0x9c/0xb6
[] ? __list_del_entry+0x5a/0x98
[] warn_slowpath_fmt+0x46/0x48
[] ? dev_graft_qdisc+0x5e/0x6a
[] __list_del_entry+0x5a/0x98
[] list_del+0xe/0x2d
[] qdisc_list_del+0x1e/0x20
[] qdisc_destroy+0x30/0xd6
[] qdisc_graft+0x11d/0x243
[] tc_get_qdisc+0x1a6/0x1d4
[] ? mark_lock+0x2e/0x226
[] rtnetlink_rcv_msg+0x181/0x194
[] ? rtnl_lock+0x17/0x19
[] ? rtnl_lock+0x17/0x19
[] ? __rtnl_unlock+0x17/0x17
[] netlink_rcv_skb+0x4d/0x93
[] rtnetlink_rcv+0x26/0x2d
[] netlink_unicast+0xcb/0x150
[] ? might_fault+0x59/0xa9
[] netlink_sendmsg+0x4fa/0x51c
[] sock_sendmsg_nosec+0x12/0x1d
[] sock_sendmsg+0x29/0x2e
[] ___sys_sendmsg+0x1b4/0x23a
[] ? native_sched_clock+0x35/0x37
[] ? sched_clock_local+0x12/0x72
[] ? sched_clock_cpu+0x9e/0xb7
[] ? current_kernel_time+0xe/0x32
[] ? lock_release_holdtime.part.29+0x71/0x7f
[] ? read_seqcount_begin.constprop.27+0x5f/0x76
[] ? trace_hardirqs_on_caller+0x17d/0x199
[] ? __fget_light+0x50/0x78
[] __sys_sendmsg+0x42/0x60
[] SyS_sendmsg+0x12/0x1c
[] system_call_fastpath+0x12/0x6f
---[ end trace ef29d3fb28e97ae7 ]---

In the long term, we probably need to clean up the qdisc_graft() code
in case it hides other bugs like this.

Fixes: 95dc19299f74 ("pkt_sched: give visibility to mq slave qdiscs")
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

WANG Cong
2015-05-28 02:09:55 +0800

27 May, 2015

2 commits

b48732e4a unix/caif: sk_socket can disappear when state is unlocked

got a rare NULL pointer dereference in clear_bit

Signed-off-by: Mark Salyzyn
Acked-by: Hannes Frederic Sowa
----
v2: switch to sock_flag(sk, SOCK_DEAD) and added net/caif/caif_socket.c
v3: return -ECONNRESET in upstream caller of wait function for SOCK_DEAD
Signed-off-by: David S. Miller

Mark Salyzyn
2015-05-27 11:19:29 +0800
fe9066ade Merge tag 'mac80211-for-davem-2015-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
We have three more fixes:
* AP_VLAN tailroom calculation fix, the bug leads to warnings
along with dropped packets
* NAPI context issue, calling napi_gro_receive() from a timer
(obviously) can lead to crashes
* remain-on-channel combining leads to dropped requests and not
being able to finish certain operations, so remove it
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller
2015-05-27 07:38:53 +0800

24 May, 2015

1 commit

086e8ddb5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

Pull two Ceph fixes from Sage Weil:
"These fix an issue with the RBD notifications when there are topology
changes in the cluster"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
Revert "libceph: clear r_req_lru_item in __unregister_linger_request()"
libceph: request a new osdmap if lingering request maps to no osd

Linus Torvalds
2015-05-24 02:28:25 +0800

23 May, 2015

6 commits

93a33a584 bridge: fix lockdep splat

The following lockdep splat was reported:

[ 29.382286] ===============================
[ 29.382315] [ INFO: suspicious RCU usage. ]
[ 29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
[ 29.382380] -------------------------------
[ 29.382409] net/bridge/br_private.h:626 suspicious rcu_dereference_check() usage!
[ 29.382455]
other info that might help us debug this:

[ 29.382507]
rcu_scheduler_active = 1, debug_locks = 0
[ 29.382549] 2 locks held by swapper/0/0:
[ 29.382576] #0: (((&p->forward_delay_timer))){+.-...}, at: [] call_timer_fn+0x5/0x4f0
[ 29.382660] #1: (&(&br->lock)->rlock){+.-...}, at: [] br_forward_delay_timer_expired+0x31/0x140 [bridge]
[ 29.382754]
stack backtrace:
[ 29.382787] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-0.rc0.git11.1.fc23.x86_64 #1
[ 29.382838] Hardware name: LENOVO 422916G/LENOVO, BIOS A1KT53AUS 04/07/2015
[ 29.382882] 0000000000000000 3ebfc20364115825 ffff880666603c48 ffffffff81892d4b
[ 29.382943] 0000000000000000 ffffffff81e124e0 ffff880666603c78 ffffffff8110bcd7
[ 29.383004] ffff8800785c9d00 ffff88065485ac58 ffff880c62002800 ffff880c5fc88ac0
[ 29.383065] Call Trace:
[ 29.383084] [] dump_stack+0x4c/0x65
[ 29.383130] [] lockdep_rcu_suspicious+0xe7/0x120
[ 29.383178] [] br_fill_ifinfo+0x4a9/0x6a0 [bridge]
[ 29.383225] [] br_ifinfo_notify+0x11b/0x4b0 [bridge]
[ 29.383271] [] ? br_hold_timer_expired+0x70/0x70 [bridge]
[ 29.383320] []
br_forward_delay_timer_expired+0x58/0x140 [bridge]
[ 29.383371] [] ? br_hold_timer_expired+0x70/0x70 [bridge]
[ 29.383416] [] call_timer_fn+0xc3/0x4f0
[ 29.383454] [] ? call_timer_fn+0x5/0x4f0
[ 29.383493] [] ? lock_release_holdtime.part.29+0xf/0x200
[ 29.383541] [] ? br_hold_timer_expired+0x70/0x70 [bridge]
[ 29.383587] [] run_timer_softirq+0x244/0x490
[ 29.383629] [] __do_softirq+0xec/0x670
[ 29.383666] [] irq_exit+0x145/0x150
[ 29.383703] [] smp_apic_timer_interrupt+0x46/0x60
[ 29.383744] [] apic_timer_interrupt+0x73/0x80
[ 29.383782] [] ? cpuidle_enter_state+0x5f/0x2f0
[ 29.383832] [] ? cpuidle_enter_state+0x5b/0x2f0

The problem is that br_forward_delay_timer_expired() is a timer
handler that calls br_ifinfo_notify(), which assumes that either
rcu_read_lock() or RTNL is held.

The simplest fix seems to be adding an RCU read-side section.
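
The locking rule described above can be sketched in plain userspace C. This is only a model under stated assumptions: every name prefixed with model_ is hypothetical, the counters stand in for the kernel's RCU and lockdep state, and none of it is the actual bridge code.

```c
#include <stdbool.h>

/* Userspace stand-ins for kernel state; hypothetical, for illustration only. */
int model_rcu_depth;          /* models RCU read-side nesting */
bool model_rtnl_held;         /* models whether RTNL is held */
int model_suspicious_usages;  /* models "suspicious RCU usage" reports */

void model_rcu_read_lock(void)   { model_rcu_depth++; }
void model_rcu_read_unlock(void) { model_rcu_depth--; }

/* Models br_ifinfo_notify(): expects rcu_read_lock() or RTNL to be held. */
void model_ifinfo_notify(void)
{
    if (model_rcu_depth == 0 && !model_rtnl_held)
        model_suspicious_usages++;
}

/* Timer handler before the fix: notifies with neither lock held. */
void model_timer_expired_unfixed(void)
{
    model_ifinfo_notify();
}

/* Timer handler after the fix: wraps the notification in a read-side section. */
void model_timer_expired_fixed(void)
{
    model_rcu_read_lock();
    model_ifinfo_notify();
    model_rcu_read_unlock();
}
```

In this model only the unfixed handler increments the report counter, which mirrors why the splat appears from timer context.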

Signed-off-by: Eric Dumazet
Reported-by: Josh Boyer
Reported-by: Dominick Grift
Cc: Vlad Yasevich
Signed-off-by: David S. Miller

Eric Dumazet
2015-05-23 04:23:56 +0800
f96dee13b net: core: 'ethtool' issue with querying phy settings

When trying to configure the settings for PHY1 with a command such
as 'ethtool -s eth0 phyad 1 speed 100', 'ethtool' seems to modify
other settings of PHY1 apart from the speed.

'ethtool' seems to query the settings for PHY0 and use them as the
base for applying the new settings to PHY1. This causes the other
settings of PHY1 to be configured wrongly.

The issue is that '_ethtool_get_settings()', which is called for
the 'ETHTOOL_GSET' command, clears the 'cmd' structure (of type
'struct ethtool_cmd') by calling memset. This clears any parameters
passed with the 'ETHTOOL_GSET' command, so the driver's callback is
always invoked with 'cmd->phy_address' set to '0'.

'_ethtool_get_settings()' is also called from other files in
'net/core', so the fix is applied to 'ethtool_get_settings()',
which is only called in the 'ethtool' context.
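
The effect of clearing the request before calling the driver can be illustrated with a small userspace sketch. The struct and the model_ functions below are hypothetical stand-ins, not the kernel's 'struct ethtool_cmd' or the real helpers, and the two speeds are invented values used only to tell the PHYs apart.

```c
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the request structure. */
struct model_cmd {
    unsigned int cmd;
    unsigned int speed;
    unsigned char phy_address;
};

/* Models a driver callback with two PHYs that report different speeds. */
int model_driver_get_settings(struct model_cmd *c)
{
    c->speed = (c->phy_address == 1) ? 1000 : 10;
    return 0;
}

/* Models the shared helper: it clears the request, so phy_address becomes 0. */
int model_get_settings_clearing(struct model_cmd *c)
{
    memset(c, 0, sizeof(*c));
    return model_driver_get_settings(c);
}

/* Models the ethtool-only path after the fix: the caller's fields survive. */
int model_get_settings_preserving(struct model_cmd *c)
{
    return model_driver_get_settings(c);
}
```

With 'phy_address' set to 1, the clearing variant reports the PHY0 speed while the preserving variant reports the PHY1 speed.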

Signed-off-by: Arun Parameswaran
Reviewed-by: Ray Jui
Reviewed-by: Scott Branden
Signed-off-by: David S. Miller

Arun Parameswaran
2015-05-23 04:14:17 +0800
47cc84ce0 bridge: fix parsing of MLDv2 reports

When more than one multicast address is present in an MLDv2 report, all
but the first address are ignored, because the code breaks out of the
loop if there has not been an error adding that address.

This has caused failures when two guests connected through the bridge
tried to communicate using IPv6. Neighbor discoveries would not be
transmitted to the other guest when both used a link-local address and a
static address.

This only happens when there is an MLDv2 querier in the network.

With the fix, the code breaks out of the loop only when adding a
multicast address fails.
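
The inverted loop condition can be reproduced in a few lines of userspace C. This is a sketch under assumptions: the model_ functions and the integer group identifiers are hypothetical and do not reflect the real bridge multicast data structures.

```c
#include <stddef.h>

/* Hypothetical stand-in for adding one group to the mdb; returns 0 on success. */
int model_add_group(int group, int *mdb, size_t *count, size_t cap)
{
    if (*count >= cap)
        return -1;
    mdb[(*count)++] = group;
    return 0;
}

/* Before the fix: the loop ends as soon as one address was added successfully. */
size_t model_parse_report_unfixed(const int *groups, size_t n, int *mdb, size_t cap)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int err = model_add_group(groups[i], mdb, &count, cap);
        if (!err)
            break;
    }
    return count;
}

/* After the fix: the loop ends only when adding an address fails. */
size_t model_parse_report_fixed(const int *groups, size_t n, int *mdb, size_t cap)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int err = model_add_group(groups[i], mdb, &count, cap);
        if (err)
            break;
    }
    return count;
}
```

For a report carrying three addresses, the unfixed loop records one entry and the fixed loop records all three, which matches the difference between the two mdb listings.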

The mdb before the patch:

dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp

After the patch:

dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::fb temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp
dev ovirtmgmt port bond0.86 grp ff02::d temp
dev ovirtmgmt port vnet0 grp ff02::1:ff00:76 temp
dev ovirtmgmt port bond0.86 grp ff02::16 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff00:77 temp
dev ovirtmgmt port bond0.86 grp ff02::1:ff00:def temp
dev ovirtmgmt port bond0.86 grp ff02::1:ffa1:40bf temp

Fixes: 08b202b67264 ("bridge br_multicast: IPv6 MLD support.")
Reported-by: Rik Theys
Signed-off-by: Thadeu Lima de Souza Cascardo
Tested-by: Rik Theys
Signed-off-by: David S. Miller

Thadeu Lima de Souza Cascardo
2015-05-23 03:08:20 +0800
d4e64c290 ipv4: fill in table id when replacing a route

When replacing an IPv4 route, the tb_id member of the new fib_alias
structure is not set in the replace code path, so the new route is
ignored.
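
A minimal sketch of why a missing table id makes the replaced route invisible, assuming a lookup that skips aliases belonging to another table. The struct and the model_ functions are hypothetical simplifications, not the kernel's fib_alias or FIB lookup code.

```c
/* Hypothetical, reduced stand-in for a route alias entry. */
struct model_fib_alias {
    int tb_id;    /* table the alias belongs to */
    int gateway;  /* illustrative payload */
};

/* Models a lookup that ignores aliases from a different table. */
int model_lookup(const struct model_fib_alias *fa, int tb_id)
{
    return fa->tb_id == tb_id ? fa->gateway : -1;
}

/* Replace path before the fix: tb_id stays at its zero-initialised value. */
void model_replace_unfixed(struct model_fib_alias *new_fa, int gateway, int tb_id)
{
    new_fa->tb_id = 0;
    new_fa->gateway = gateway;
    (void)tb_id; /* the table id is never copied */
}

/* Replace path after the fix: the table id is filled in as well. */
void model_replace_fixed(struct model_fib_alias *new_fa, int gateway, int tb_id)
{
    new_fa->tb_id = tb_id;
    new_fa->gateway = gateway;
}
```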

Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
Signed-off-by: Michal Kubecek
Acked-by: Alexander Duyck
Signed-off-by: David S. Miller

Michal Kubeček
2015-05-23 02:33:17 +0800
572152adf Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree:

1) Fix a race in nfnetlink_log and nfnetlink_queue that can lead to a crash.
This problem is due to the wrong order of the per-net registration and
netlink socket events. Patch from Francesco Ruggeri.

2) Make sure that the counters that userspace passes us are higher than 0 in
all the x_tables frontends. Discovered via Trinity, patch from Dave Jones.

3) Revert a patch for br_netfilter to rely on the conntrack status bits. This
breaks stateless IPv6 NAT transformations. Patch from Florian Westphal.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-05-23 02:25:45 +0800
381c759d9 ipv4: Avoid crashing in ip_error

ip_error does not check if in_dev is NULL before dereferencing it.

The following sequence of calls is possible:
CPU A                          CPU B
ip_rcv_finish
  ip_route_input_noref()
    ip_route_input_slow()
                               inetdev_destroy()
  dst_input()

With the result that a network device can be destroyed while processing
an input packet.

A crash was triggered with only unicast packets in flight, and
forwarding enabled on the only network device. The error condition
was created by the removal of the network device.

As such it is likely that the error code was -EHOSTUNREACH, and the
action taken by ip_error (if in_dev had been accessible) would have
been to not increment any counters and to have tried and likely failed
to send an ICMP error as the network device is going away.

Therefore handle this weird case by just dropping the packet if
!in_dev. It will result in dropping the packet sooner, and will not
result in an actual change of behavior.
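
The early-drop behavior can be sketched as follows. This is a userspace illustration only: the structs and model_ip_error() are hypothetical, and the ICMP counter merely stands in for the error handling that the real ip_error would attempt.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the per-device state and the packet. */
struct model_in_dev {
    int forwarding;
};

struct model_pkt {
    struct model_in_dev *in_dev; /* NULL once the device has been destroyed */
    int freed;
};

/* Models the error handler: the packet is always consumed. */
int model_ip_error(struct model_pkt *pkt, int *icmp_attempts)
{
    /* The device may have been destroyed on another CPU: just drop. */
    if (pkt->in_dev == NULL)
        goto out;

    /* Only here is it safe to dereference in_dev. */
    if (pkt->in_dev->forwarding)
        (*icmp_attempts)++;
out:
    pkt->freed = 1;
    return 0;
}
```

When in_dev is NULL the packet is freed without any dereference, which is the only difference from the normal path in this model.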

Fixes: 251da4130115b ("ipv4: Cache ip_error() routes even when not forwarding.")
Reported-by: Vittorio Gambaletta
Tested-by: Vittorio Gambaletta
Signed-off-by: Vittorio Gambaletta
Signed-off-by: "Eric W. Biederman"
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric W. Biederman
2015-05-23 02:23:40 +0800