Eric Lee / smarc-fsl-linux-kernel

08 Oct, 2013

3 commits

d45ed4a4e net: fix unsafe set_memory_rw from softirq

on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[ 56.766097] Possible unsafe locking scenario:
[ 56.766097]
[ 56.780146] CPU0
[ 56.786807] ----
[ 56.793188] lock(&(&vb->lock)->rlock);
[ 56.799593]
[ 56.805889] lock(&(&vb->lock)->rlock);
[ 56.812266]
[ 56.812266] *** DEADLOCK ***
[ 56.812266]
[ 56.830670] 1 lock held by ksoftirqd/1/13:
[ 56.836838] #0: (rcu_read_lock){.+.+..}, at: [] vm_unmap_aliases+0x8c/0x380
[ 56.849757]
[ 56.849757] stack backtrace:
[ 56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[ 56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[ 56.882004] ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
[ 56.895630] ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
[ 56.909313] ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
[ 56.923006] Call Trace:
[ 56.929532] [] dump_stack+0x55/0x76
[ 56.936067] [] print_usage_bug+0x1f7/0x208
[ 56.942445] [] ? save_stack_trace+0x2f/0x50
[ 56.948932] [] ? check_usage_backwards+0x150/0x150
[ 56.955470] [] mark_lock+0x282/0x2c0
[ 56.961945] [] __lock_acquire+0x45d/0x1d50
[ 56.968474] [] ? __lock_acquire+0x2de/0x1d50
[ 56.975140] [] ? cpumask_next_and+0x55/0x90
[ 56.981942] [] lock_acquire+0x92/0x1d0
[ 56.988745] [] ? vm_unmap_aliases+0x16a/0x380
[ 56.995619] [] _raw_spin_lock+0x41/0x50
[ 57.002493] [] ? vm_unmap_aliases+0x16a/0x380
[ 57.009447] [] vm_unmap_aliases+0x16a/0x380
[ 57.016477] [] ? vm_unmap_aliases+0x8c/0x380
[ 57.023607] [] change_page_attr_set_clr+0xc0/0x460
[ 57.030818] [] ? trace_hardirqs_on+0xd/0x10
[ 57.037896] [] ? kmem_cache_free+0xb0/0x2b0
[ 57.044789] [] ? free_object_rcu+0x93/0xa0
[ 57.051720] [] set_memory_rw+0x2f/0x40
[ 57.058727] [] bpf_jit_free+0x2c/0x40
[ 57.065577] [] sk_filter_release_rcu+0x1a/0x30
[ 57.072338] [] rcu_process_callbacks+0x202/0x7c0
[ 57.078962] [] __do_softirq+0xf7/0x3f0
[ 57.085373] [] run_ksoftirqd+0x35/0x70

cannot reuse jited filter memory, since it's readonly,
so use original bpf insns memory to hold work_struct

defer kfree of sk_filter until jit completed freeing

tested on x86_64 and i386

Signed-off-by: Alexei Starovoitov
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Alexei Starovoitov
2013-10-08 03:16:45 +0800
582442d6d ipv6: Allow the MTU of ipip6 tunnel to be set below 1280

The (inner) MTU of an ipip6 (IPv4-in-IPv6) tunnel cannot be set below 1280, which is the minimum MTU in IPv6.
However, there should be no IPv6 on the tunnel interface at all, so the IPv6 rules should not apply.
More info at https://bugzilla.kernel.org/show_bug.cgi?id=15530

This patch checks the minimum MTU of an IPv6 tunnel according to these rules:
- If the tunnel is configured in ipip6 mode, the minimum MTU is 68.
- If the tunnel is configured in ip6ip6 or any mode, the minimum MTU is 1280.
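
The two rules can be sketched in user-space C as follows. This is only an illustration of the rule, not the kernel code: the `tunnel_mode` enum and both function names are hypothetical, and the real check works on the tunnel's configured protocol.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tunnel modes, for illustration only. */
enum tunnel_mode {
    TUNNEL_IPIP6,   /* IPv4 carried inside IPv6 */
    TUNNEL_IP6IP6,  /* IPv6 carried inside IPv6 */
    TUNNEL_ANY      /* either payload family */
};

#define IPV4_MIN_MTU 68u    /* minimum quoted in the commit for ipip6 */
#define IPV6_MIN_MTU 1280u  /* minimum MTU in IPv6 */

/* Return the smallest inner MTU that may be configured for a mode. */
static unsigned int tunnel_min_mtu(enum tunnel_mode mode)
{
    switch (mode) {
    case TUNNEL_IPIP6:
        return IPV4_MIN_MTU;
    case TUNNEL_IP6IP6:
    case TUNNEL_ANY:
        return IPV6_MIN_MTU;
    }
    assert(!"unknown tunnel mode");
    return IPV6_MIN_MTU;
}

/* Accept a new MTU only if it respects the per-mode minimum. */
static bool tunnel_mtu_is_valid(enum tunnel_mode mode, unsigned int new_mtu)
{
    return new_mtu >= tunnel_min_mtu(mode);
}
```

With this sketch, `tunnel_mtu_is_valid(TUNNEL_IPIP6, 576)` is accepted while the same MTU is rejected for `TUNNEL_ANY`.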

Signed-off-by: Oussama Ghorbel
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Oussama Ghorbel
2013-10-08 00:32:26 +0800
3573540ca netif_set_xps_queue: make cpu mask const

virtio wants to pass in cpumask_of(cpu), make parameter
const to avoid build warnings.

Signed-off-by: Michael S. Tsirkin
Signed-off-by: David S. Miller

Michael S. Tsirkin
2013-10-08 00:29:26 +0800

05 Oct, 2013

1 commit

5e8a402f8 tcp: do not forget FIN in tcp_shifted_skb()

Yuchung found the following problem:

There are bugs in the SACK processing code, in the merging part of
tcp_shift_skb_data(), that incorrectly reset or ignore the sacked
skb's FIN flag. When a receiver first SACKs the FIN sequence and later
throws away the ofo queue (e.g., sack-reneging), the sender will stop
retransmitting the FIN flag and hang forever.

The following packetdrill test can be used to reproduce the bug.

$ cat sack-merge-bug.pkt
`sysctl -q net.ipv4.tcp_fack=0`

// Establish a connection and send 10 MSS.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+.000 bind(3, ..., ...) = 0
+.000 listen(3, 1) = 0

+.050 < S 0:0(0) win 32792
+.000 > S. 0:0(0) ack 1
+.001 < . 1:1(0) ack 1 win 1024
+.000 accept(3, ..., ...) = 4

+.100 write(4, ..., 12000) = 12000
+.000 shutdown(4, SHUT_WR) = 0
+.000 > . 1:10001(10000) ack 1
+.050 < . 1:1(0) ack 2001 win 257
+.000 > FP. 10001:12001(2000) ack 1
+.050 < . 1:1(0) ack 2001 win 257
+.050 < . 1:1(0) ack 2001 win 257
// SACK reneg
+.050 < . 1:1(0) ack 12001 win 257
+0 %{ print "unacked: ",tcpi_unacked }%
+5 %{ print "" }%

First, a typo inverted left/right of one OR operation, then
code forgot to advance end_seq if the merged skb carried FIN.
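
A minimal sketch of the two corrected steps, using a hypothetical `struct seg` in place of the real skb control block: the flags of the absorbed segment are OR-ed into the surviving one (not the other way round), and the end sequence advances by one when the absorbed segment carried FIN, since FIN occupies a sequence number.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_FIN 0x01u  /* same bit value as the TCP FIN flag */
#define FLAG_PSH 0x08u  /* same bit value as the TCP PSH flag */

/* Hypothetical, simplified stand-in for the per-skb TCP state. */
struct seg {
    uint32_t seq;      /* first sequence number covered */
    uint32_t end_seq;  /* one past the last sequence number covered */
    uint32_t len;      /* payload bytes */
    uint8_t  flags;
};

/* Merge all payload of 'next' into 'prev'; 'prev' must end where 'next' starts. */
static void seg_merge(struct seg *prev, const struct seg *next)
{
    assert(prev && next);
    assert(prev->end_seq == next->seq);

    prev->len += next->len;
    prev->end_seq += next->len;

    /* The surviving segment inherits the flags of the absorbed one. */
    prev->flags |= next->flags;

    /* FIN consumes one sequence number beyond the payload. */
    if (next->flags & FLAG_FIN)
        prev->end_seq++;
}
```

In the packetdrill trace above, merging the 2000-byte `FP.` segment into the preceding data must leave a segment that still carries FIN and ends one sequence number past the payload.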

Bug was added in 2.6.29 by commit 832d11c5cd076ab
("tcp: Try to restore large SKBs while SACK processing")

Signed-off-by: Eric Dumazet
Signed-off-by: Yuchung Cheng
Acked-by: Neal Cardwell
Cc: Ilpo Järvinen
Acked-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Eric Dumazet
2013-10-05 02:16:36 +0800

04 Oct, 2013

4 commits

7df9b4858 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless

John W. Linville says:

====================
Here is another batch of fixes intended for the 3.12 stream...

For the mac80211 bits, Johannes says:

"This time I have two fixes for IBSS (including one for wext, hah), a fix
for extended rates IEs, an active monitor checking fix and a sysfs
registration race fix."

On top of those...

Amitkumar Karwar brings an mwifiex fix for an interrupt loss issue
w/ SDIO devices. The problem was due to a command timeout issue
introduced by an earlier patch.

Felix Fietkau fixes a stall in the ath9k driver. This patch fixes the
regression introduced in the commit "ath9k: use software queues for
un-aggregated data packets".

Stanislaw Gruszka reverts an rt2x00 patch that was found to cause
connection problems with some devices.
====================

Signed-off-by: David S. Miller

David S. Miller
2013-10-04 04:28:34 +0800
1661bf364 net: heap overflow in __audit_sockaddr()

We need to cap ->msg_namelen or it leads to a buffer overflow when we
do the memcpy() in __audit_sockaddr(). It requires CAP_AUDIT_CONTROL to
exploit this bug.

The call tree is:
___sys_recvmsg()
  move_addr_to_user()
    audit_sockaddr()
      __audit_sockaddr()
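
The idea of capping the length before the copy can be sketched in user-space C as follows. The `audit_ctx` structure, `record_sockaddr()` and `ADDR_BUF_SIZE` are hypothetical names for illustration; the sketch clamps the length, and whether the kernel patch clamps or rejects an oversized value is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ADDR_BUF_SIZE 128  /* hypothetical fixed size of the destination */

/* Hypothetical context holding a fixed-size copy of a socket address. */
struct audit_ctx {
    unsigned char sockaddr[ADDR_BUF_SIZE];
    size_t sockaddr_len;
};

/*
 * Copy at most ADDR_BUF_SIZE bytes of a caller-supplied address.
 * Returns the number of bytes actually stored.
 */
static size_t record_sockaddr(struct audit_ctx *ctx, const void *addr,
                              size_t namelen)
{
    assert(ctx && addr);

    /* Cap the untrusted length so memcpy() cannot overrun the buffer. */
    if (namelen > sizeof(ctx->sockaddr))
        namelen = sizeof(ctx->sockaddr);

    memcpy(ctx->sockaddr, addr, namelen);
    ctx->sockaddr_len = namelen;
    return namelen;
}
```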

Reported-by: Jüri Aedla
Signed-off-by: Dan Carpenter
Signed-off-by: David S. Miller

Dan Carpenter
2013-10-04 04:05:14 +0800
1eea72f03 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/…

…wireless into for-davem

John W. Linville
2013-10-04 04:00:03 +0800
196896d4b Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included change:
- fix multi soft-interfaces setups with Network Coding enabled by
registering the CODED packet type once only (instead of once per soft-if)

Signed-off-by: David S. Miller

David S. Miller
2013-10-04 03:57:36 +0800

03 Oct, 2013

4 commits

e18503f41 l2tp: fix kernel panic when using IPv4-mapped IPv6 addresses

IPv4-mapped addresses cause a kernel panic.
The patch just checks whether the IPv6 address is an IPv4-mapped
address. If so, it uses the IPv4 API instead of IPv6.

[ 940.026915] general protection fault: 0000 [#1]
[ 940.026915] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppox ppp_generic slhc loop psmouse
[ 940.026915] CPU: 0 PID: 3184 Comm: memcheck-amd64- Not tainted 3.11.0+ #1
[ 940.026915] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 940.026915] task: ffff880007130e20 ti: ffff88000737e000 task.ti: ffff88000737e000
[ 940.026915] RIP: 0010:[] [] ip6_xmit+0x276/0x326
[ 940.026915] RSP: 0018:ffff88000737fd28 EFLAGS: 00010286
[ 940.026915] RAX: c748521a75ceff48 RBX: ffff880000c30800 RCX: 0000000000000000
[ 940.026915] RDX: ffff88000075cc4e RSI: 0000000000000028 RDI: ffff8800060e5a40
[ 940.026915] RBP: ffff8800060e5a40 R08: 0000000000000000 R09: ffff88000075cc90
[ 940.026915] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88000737fda0
[ 940.026915] R13: 0000000000000000 R14: 0000000000002000 R15: ffff880005d3b580
[ 940.026915] FS: 00007f163dc5e800(0000) GS:ffffffff81623000(0000) knlGS:0000000000000000
[ 940.026915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 940.026915] CR2: 00000004032dc940 CR3: 0000000005c25000 CR4: 00000000000006f0
[ 940.026915] Stack:
[ 940.026915] ffff88000075cc4e ffffffff81694e90 ffff880000c30b38 0000000000000020
[ 940.026915] 11000000523c4bac ffff88000737fdb4 0000000000000000 ffff880000c30800
[ 940.026915] ffff880005d3b580 ffff880000c30b38 ffff8800060e5a40 0000000000000020
[ 940.026915] Call Trace:
[ 940.026915] [] ? inet6_csk_xmit+0xa4/0xc4
[ 940.026915] [] ? l2tp_xmit_skb+0x503/0x55a [l2tp_core]
[ 940.026915] [] ? pskb_expand_head+0x161/0x214
[ 940.026915] [] ? pppol2tp_xmit+0xf2/0x143 [l2tp_ppp]
[ 940.026915] [] ? ppp_channel_push+0x36/0x8b [ppp_generic]
[ 940.026915] [] ? ppp_write+0xaf/0xc5 [ppp_generic]
[ 940.026915] [] ? vfs_write+0xa2/0x106
[ 940.026915] [] ? SyS_write+0x56/0x8a
[ 940.026915] [] ? system_call_fastpath+0x16/0x1b
[ 940.026915] Code: 00 49 8b 8f d8 00 00 00 66 83 7c 11 02 00 74 60 49
8b 47 58 48 83 e0 fe 48 8b 80 18 01 00 00 48 85 c0 74 13 48 8b 80 78 02
00 00 ff 40 28 41 8b 57 68 48 01 50 30 48 8b 54 24 08 49 c7 c1 51
[ 940.026915] RIP [] ip6_xmit+0x276/0x326
[ 940.026915] RSP
[ 940.057945] ---[ end trace be8aba9a61c8b7f3 ]---
[ 940.058583] Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: François CACHEREUL
Signed-off-by: David S. Miller

François Cachereul
2013-10-03 05:09:22 +0800
80ad1d61e net: do not call sock_put() on TIMEWAIT sockets

commit 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU /
hlist_nulls") incorrectly used sock_put() on TIMEWAIT sockets.

We should instead use inet_twsk_put()

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2013-10-03 05:05:54 +0800
5843ef421 tcp: Always set options to 0 before calling tcp_established_options

tcp_established_options assumes opts->options is 0 before it is called,
as it does a read-modify-write on it.

For the tcp_current_mss() case the opts structure is not zeroed,
so this can be done with uninitialized values.

This is ok, because ->options is not read in this path.
But it's still better to avoid the operation on the uninitialized
field. This shuts up a static code analyzer, and presumably
may help the optimizer.

Cc: netdev@vger.kernel.org
Signed-off-by: Andi Kleen
Signed-off-by: David S. Miller

Andi Kleen
2013-10-03 04:32:43 +0800
6865d1e83 unix_diag: fix info leak

When filling the netlink message we fail to wipe the pad field and
therefore leak one byte of heap memory to userland. Fix this by
setting pad to 0.
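
A small sketch of the pattern, assuming a hypothetical `struct diag_msg` with an explicit padding byte (not the real netlink structure): every byte of a message that is copied to userland has to be written, including padding that carries no information.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical message layout with an explicit padding byte. */
struct diag_msg {
    uint8_t  family;
    uint8_t  type;
    uint8_t  state;
    uint8_t  pad;   /* carries no data, but is still sent to the reader */
    uint32_t ino;
};

/* Fill every field of a message that lives in a reused buffer. */
static void diag_msg_fill(struct diag_msg *msg, uint8_t family, uint8_t type,
                          uint8_t state, uint32_t ino)
{
    assert(msg);
    msg->family = family;
    msg->type = type;
    msg->state = state;
    msg->pad = 0;   /* without this, stale buffer contents would leak */
    msg->ino = ino;
}
```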

Signed-off-by: Mathias Krause
Signed-off-by: David S. Miller

Mathias Krause
2013-10-03 04:08:24 +0800

02 Oct, 2013

10 commits

6c519bad7 batman-adv: set up network coding packet handlers during module init

batman-adv saves its table of packet handlers as a global state, so handlers
must be set up only once (and setting them up a second time will fail).

The recently-added network coding support tries to set up its handler each time
a new softif is registered, which obviously fails when more than one softif is
used (and in consequence, the softif creation fails).

Fix this by splitting up batadv_nc_init into batadv_nc_init (which is called
only once) and batadv_nc_mesh_init (which is called for each softif); in
addition batadv_nc_free is renamed to batadv_nc_mesh_free to keep naming
consistent.

Signed-off-by: Matthias Schiffer
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli

Matthias Schiffer
2013-10-02 19:46:19 +0800
c31eeaced Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking changes from David Miller:

1) Multiply in netfilter IPVS can overflow when calculating destination
weight. From Simon Kirby.

2) Use after free fixes in IPVS from Julian Anastasov.

3) SFC driver bug fixes from Daniel Pieczko.

4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov.

5) Locking and encapsulation fixes to serial line CAN driver, from
Andrew Naujoks.

6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner,
Eilon Greenstein, and Ariel Elior.

7) In lapb, if no other packets are outstanding, T1 timeouts actually
stall things and no packet gets sent. Fix from Josselin Costanzi.

8) ICMP redirects should not make it to the socket error queues, from
Duan Jiong.

9) Fix bugs in skge DMA mapping error handling, from Mikulas Patocka.

10) Fix setting of VLAN priority field on via-rhine driver, from Roger
Luethi.

11) Fix TX stalls and VLAN promisc programming in be2net driver from
Ajit Khaparde.

12) Packet padding doesn't get handled correctly in new usbnet SG
support code, from Ming Lei.

13) Fix races in netdevice teardown wrt. network namespace closing.
From Eric W. Biederman.

14) Fix potential missed initialization of net_secret if no TCP
connections are opened. From Eric Dumazet.

15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from
Aleksander Morgado.

16) skb_cow_head() can change skb->data and thus packet header pointers,
don't use stale ip_hdr reference in ip_tunnel code.

17) Backend state transition handling fixes in xen-netback, from Paul
Durrant.

18) Packet offset for AH protocol is handled wrong in flow dissector,
from Eric Dumazet.

19) Taking down an fq packet scheduler instance can leave stale packets
in the queues, fix from Eric Dumazet.

20) Fix performance regressions introduced by TCP Small Queues. From
Eric Dumazet.

21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from
Hannes Frederic Sowa.

22) Multicast timer handlers in ipv4 and ipv6 can be the last and final
reference to the ipv4/ipv6 specific network device state, so use the
reference put that will check and release the object if the
reference hits zero. From Salam Noureddine.

23) Fix memory corruption in ip_tunnel driver, and use skb_push()
instead of __skb_push() so that similar bugs are less hard to find.
From Steffen Klassert.

24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from
Nicolas Dichtel.

25) fq scheduler doesn't accurately rate limit in certain circumstances,
from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
pkt_sched: fq: rate limiting improvements
ip6tnl: allow to use rtnl ops on fb tunnel
sit: allow to use rtnl ops on fb tunnel
ip_tunnel: Remove double unregister of the fallback device
ip_tunnel_core: Change __skb_push back to skb_push
ip_tunnel: Add fallback tunnels to the hash lists
ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
qlcnic: Fix SR-IOV configuration
ll_temac: Reset dma descriptors indexes on ndo_open
skbuff: size of hole is wrong in a comment
ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
ethernet: moxa: fix incorrect placement of __initdata tag
ipv6: gre: correct calculation of max_headroom
powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
bonding: Fix broken promiscuity reference counting issue
tcp: TSQ can use a dynamic limit
dm9601: fix IFF_ALLMULTI handling
pkt_sched: fq: qdisc dismantle fixes
...

Linus Torvalds
2013-10-02 03:58:48 +0800
0eab5eb7a pkt_sched: fq: rate limiting improvements

FQ rate limiting suffers from two problems, reported
by Steinar:

1) FQ enforces a delay when flow quantum is exhausted in order
to reduce cpu overhead. But if packets are small, current
delay computation is slightly wrong, and observed rates can
be too high.

Steinar had this problem because he disabled TSO and GSO,
and default FQ quantum is 2*1514.

(Of course, I hope the recent TSO auto sizing changes will remove
the need to disable TSO in the first place)

2) maxrate was not used for forwarded flows (skbs not attached
to a socket)

Tested:

tc qdisc add dev eth0 root est 1sec 4sec fq maxrate 8Mbit
netperf -H lpq84 -l 1000 &
sleep 10 ; tc -s qdisc show dev eth0
qdisc fq 8003: root refcnt 32 limit 10000p flow_limit 100p buckets 1024
quantum 3028 initial_quantum 15140 maxrate 8000Kbit
Sent 16819357 bytes 11258 pkt (dropped 0, overlimits 0 requeues 0)
rate 7831Kbit 653pps backlog 7570b 5p requeues 0
44 flows (43 inactive, 1 throttled), next packet delay 2977352 ns
0 gc, 0 highprio, 5545 throttled

lpq83:~# tcpdump -p -i eth0 host lpq84 -c 12
09:02:52.079484 IP lpq83 > lpq84: . 1389536928:1389538376(1448) ack 3808678021 win 457
09:02:52.079499 IP lpq83 > lpq84: . 1448:2896(1448) ack 1 win 457
09:02:52.079906 IP lpq84 > lpq83: . ack 2896 win 16384
09:02:52.082568 IP lpq83 > lpq84: . 2896:4344(1448) ack 1 win 457
09:02:52.082581 IP lpq83 > lpq84: . 4344:5792(1448) ack 1 win 457
09:02:52.083017 IP lpq84 > lpq83: . ack 5792 win 16384
09:02:52.085678 IP lpq83 > lpq84: . 5792:7240(1448) ack 1 win 457
09:02:52.085693 IP lpq83 > lpq84: . 7240:8688(1448) ack 1 win 457
09:02:52.086117 IP lpq84 > lpq83: . ack 8688 win 16384
09:02:52.088792 IP lpq83 > lpq84: . 8688:10136(1448) ack 1 win 457
09:02:52.088806 IP lpq83 > lpq84: . 10136:11584(1448) ack 1 win 457
09:02:52.089217 IP lpq84 > lpq83: . ack 11584 win 16384

Reported-by: Steinar H. Gunderson
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2013-10-02 01:00:38 +0800
bb8140947 ip6tnl: allow to use rtnl ops on fb tunnel

rtnl ops were introduced by c075b13098b3 ("ip6tnl: advertise tunnel param via
rtnl"), but I forgot to assign rtnl ops to fb tunnels.

Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because the fallback tunnel is added to the queue
in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit 0bd8762824e7 ("ip6tnl: add x-netns support")).

Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2013-10-02 00:55:53 +0800
205983c43 sit: allow to use rtnl ops on fb tunnel

rtnl ops were introduced by ba3e3f50a0e5 ("sit: advertise tunnel param via
rtnl"), but I forgot to assign rtnl ops to fb tunnels.

Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because the fallback tunnel is added to the queue
in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit 5e6700b3bf98 ("sit: add support of x-netns")).

Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2013-10-02 00:55:53 +0800
cfe4a5369 ip_tunnel: Remove double unregister of the fallback device

When queueing the netdevices for removal, we queue the
fallback device twice in ip_tunnel_destroy(). The first
time when we queue all netdevices in the namespace and
then again explicitly. Fix this by removing the explicit
queueing of the fallback device.

Bug was introduced when network namespace support was added
with commit 6c742e714d8 ("ipip: add x-netns support").

Cc: Nicolas Dichtel
Signed-off-by: Steffen Klassert
Acked-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Steffen Klassert
2013-10-02 00:42:16 +0800
78a3694d4 ip_tunnel_core: Change __skb_push back to skb_push

Git commit 0e6fbc5b ("ip_tunnels: extend iptunnel_xmit()")
moved the IP header installation to iptunnel_xmit() and
changed skb_push() to __skb_push(). This makes possible
bugs hard to track down, so change it back to skb_push().

Cc: Pravin Shelar
Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller

Steffen Klassert
2013-10-02 00:42:16 +0800
670132826 ip_tunnel: Add fallback tunnels to the hash lists

Currently we can not update the tunnel parameters of
the fallback tunnels because we don't find them in the
hash lists. Fix this by adding them on initialization.

Bug was introduced with commit c544193214
("GRE: Refactor GRE tunneling code.")

Cc: Pravin Shelar
Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller

Steffen Klassert
2013-10-02 00:42:16 +0800
3e08f4a72 ip_tunnel: Fix a memory corruption in ip_tunnel_xmit

We might extend the used area of a skb beyond the total
headroom when we install the ipip header. Fix this by
calling skb_cow_head() unconditionally.

Bug was introduced with commit c544193214
("GRE: Refactor GRE tunneling code.")

Cc: Pravin Shelar
Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller

Steffen Klassert
2013-10-02 00:42:16 +0800
e024bdc05 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter/IPVS fixes for your net
tree, they are:

* Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from
Patrick McHardy.

* Fix possible weight overflow in lblc and lblcr schedulers due to
32-bit arithmetic, from Simon Kirby.

* Fix possible memory access race in the lblc and lblcr schedulers,
introduced when it was converted to use RCU, two patches from
Julian Anastasov.

* Fix hard dependency on CPU 0 when reading per-cpu stats in the
rate estimator, from Julian Anastasov.

* Fix race that may lead to object use after release, when invoking
ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian
Anastasov.
====================
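
The weight overflow item in the list above can be illustrated with a small sketch. It assumes, hypothetically, that two destinations are compared by cross-multiplying connection counts and weights; the `struct dest` type and both function names are invented for the example. In 32-bit arithmetic the product can wrap and flip the comparison, while widening to 64 bits before multiplying keeps it exact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-destination load figures. */
struct dest {
    uint32_t conns;   /* active connections */
    uint32_t weight;  /* configured weight */
};

/* Comparison done in 32 bits: the products wrap modulo 2^32. */
static bool more_loaded_32(const struct dest *a, const struct dest *b)
{
    assert(a && b);
    return a->conns * b->weight > b->conns * a->weight;
}

/* Same comparison with the operands widened before multiplying. */
static bool more_loaded_64(const struct dest *a, const struct dest *b)
{
    assert(a && b);
    return (uint64_t)a->conns * b->weight > (uint64_t)b->conns * a->weight;
}
```

For example, with 70000 and 60000 connections and a weight of 65535 on both sides, the first product exceeds 2^32, so the 32-bit version reports the busier destination as the less loaded one.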

Signed-off-by: David S. Miller

David S. Miller
2013-10-02 00:39:35 +0800

01 Oct, 2013

9 commits

9260d3e10 ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put

It is possible for the timer handlers to run after the call to
ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the
handler function in order to do proper cleanup when the refcnt
reaches 0. Otherwise, the refcnt can reach zero without the
inet6_dev being destroyed and we end up leaking a reference to
the net_device and see messages like the following,

unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Tested on linux-3.4.43.
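
The difference between the two put variants can be sketched with a toy reference count. The `dev_state` type and function names below are hypothetical; the sketch only shows why a put that never checks for zero leaves the object alive when it happens to drop the last reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct dev_state {
    int  refcnt;
    bool destroyed;
};

/* Drop a reference without checking for zero (caller promises it is not last). */
static void dev_state_put_nocheck(struct dev_state *d)
{
    assert(d && d->refcnt > 0);
    d->refcnt--;
}

/* Drop a reference and release the object when the count reaches zero. */
static void dev_state_put(struct dev_state *d)
{
    assert(d && d->refcnt > 0);
    if (--d->refcnt == 0)
        d->destroyed = true;  /* stands in for the real cleanup */
}
```

If a timer handler holds the final reference, only the checking variant performs the cleanup.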

Signed-off-by: Salam Noureddine
Signed-off-by: David S. Miller

Salam Noureddine
2013-10-01 13:28:58 +0800
e2401654d ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put

It is possible for the timer handlers to run after the call to
ip_mc_down so use in_dev_put instead of __in_dev_put in the handler
function in order to do proper cleanup when the refcnt reaches 0.
Otherwise, the refcnt can reach zero without the in_device being
destroyed and we end up leaking a reference to the net_device and
see messages like the following,

unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Tested on linux-3.4.43.

Signed-off-by: Salam Noureddine
Signed-off-by: David S. Miller

Salam Noureddine
2013-10-01 13:28:56 +0800
3da812d86 ipv6: gre: correct calculation of max_headroom

gre_hlen already accounts for sizeof(struct ipv6_hdr) + gre header,
so initialize max_headroom to zero. Otherwise the

if (encap_limit >= 0) {
max_headroom += 8;
mtu -= 8;
}

increments an uninitialized variable before max_headroom was reset.

Found with coverity: 728539

Cc: Dmitry Kozlov
Signed-off-by: Hannes Frederic Sowa
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-10-01 13:04:09 +0800
c9eeec26e tcp: TSQ can use a dynamic limit

When TCP Small Queues was added, we used a sysctl to limit the number of
packets queued on Qdisc/device queues for a given TCP flow.

Problem is this limit is either too big for low rates, or too small
for high rates.

Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
auto sizing, it can better control number of packets in Qdisc/device
queues.

New limit is two packets or at least 1 to 2 ms worth of packets.
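
One possible reading of this limit, as a hedged sketch rather than the patch itself: take the larger of two packets and the bytes the flow sends in roughly one millisecond at its pacing rate. The function name, the use of a right shift by 10 to approximate "per millisecond", and the byte-based units are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-flow queue limit in bytes.
 * pkt_bytes:   size of one packet as accounted in the queue
 * pacing_rate: estimated pacing rate in bytes per second
 */
static uint32_t tsq_limit_bytes(uint32_t pkt_bytes, uint32_t pacing_rate)
{
    assert(pkt_bytes <= UINT32_MAX / 2);

    uint32_t two_packets = 2 * pkt_bytes;
    /* rate / 1024 approximates the bytes sent in one millisecond. */
    uint32_t about_1ms = pacing_rate >> 10;

    return two_packets > about_1ms ? two_packets : about_1ms;
}
```

At a low rate the two-packet floor dominates, while at a pacing rate of 1250000000 bytes per second the time-based term allows roughly 1.2 MB in the queues.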

Low rate flows benefit from this patch by having even smaller
number of packets in queues, allowing for faster recovery,
better RTT estimations.

High rate flows benefit from this patch by allowing more than 2 packets
in flight as we had reports this was a limiting factor to reach line
rate. [ In particular if TX completion is delayed because of coalescing
parameters ]

Example for a single flow on a 10Gbps link controlled by FQ/pacing

14 packets in flight instead of 2

$ tc -s -d qd
qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
buckets 1024 quantum 3028 initial_quantum 15140
Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
requeues 6822476)
rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit

Note that sk_pacing_rate is currently set to twice the actual rate, but
this might be refined in the future when a flow is in congestion
avoidance.

Additional change : skb->destructor should be set to tcp_wfree().

A future patch (for linux 3.13+) might remove tcp_limit_output_bytes

Signed-off-by: Eric Dumazet
Cc: Wei Liu
Cc: Cong Wang
Cc: Yuchung Cheng
Cc: Neal Cardwell
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller

Eric Dumazet
2013-10-01 11:41:57 +0800
15214c2f6 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

John W. Linville
2013-10-01 04:14:27 +0800
8d34ce10c pkt_sched: fq: qdisc dismantle fixes

fq_reset() should drop all packets in queue, including
throttled flows.

This patch moves code from fq_destroy() to fq_reset()
to do the cleaning.

fq_change() must stop calling fq_dequeue() if all remaining
packets are from throttled flows.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2013-10-01 03:51:23 +0800
b86783587 net: flow_dissector: fix thoff for IPPROTO_AH

In commit 8ed781668dd49 ("flow_keys: include thoff into flow_keys for
later usage"), we missed that existing code was using nhoff as a
temporary variable that could not always contain transport header
offset.

This is not a problem for TCP/UDP because port offset (@poff)
is 0 for these protocols.

Signed-off-by: Eric Dumazet
Cc: Daniel Borkmann
Cc: Nikolay Aleksandrov
Acked-by: Nikolay Aleksandrov
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Eric Dumazet
2013-10-01 03:32:05 +0800
c9d55d5bf ipv6: Fix preferred_lft not updating in some cases

Consider the scenario where an IPv6 router is advertising a fixed
preferred_lft of 1800 seconds, while the valid_lft begins at 3600
seconds and counts down in realtime.

A client should reset its preferred_lft to 1800 every time the RA is
received, but a bug is causing Linux to ignore the update.

The core problem is here:
if (prefered_lft != ifp->prefered_lft) {

Note that ifp->prefered_lft is an offset, so it doesn't decrease over
time. Thus, the comparison is always (1800 != 1800), which fails to
trigger an update.

The most direct solution would be to compute a "stored_prefered_lft",
and use that value in the comparison. But I think that trying to filter
out unnecessary updates here is a premature optimization. In order for
the filter to apply, both of these would need to hold:

- The advertised valid_lft and preferred_lft are both declining in
real time.
- No clock skew exists between the router & client.

So in this patch, I've set "update_lft = 1" unconditionally, which
allows the surrounding code to be greatly simplified.
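
The effect described above can be reproduced with a small model. The `addr_lft` structure and the helper names are hypothetical; the lifetime is stored as an offset from a timestamp, as the message explains, so comparing the advertised value with the stored offset never notices that time has passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address state: lifetime stored as an offset from tstamp. */
struct addr_lft {
    uint32_t prefered_lft;  /* seconds, counted from tstamp */
    uint64_t tstamp;        /* time of the last update, in seconds */
};

/* Seconds of preferred lifetime left at time 'now'. */
static uint32_t remaining_prefered(const struct addr_lft *a, uint64_t now)
{
    assert(a && now >= a->tstamp);
    uint64_t elapsed = now - a->tstamp;
    return elapsed >= a->prefered_lft ? 0
                                      : (uint32_t)(a->prefered_lft - elapsed);
}

/* The filter from the old code: compares against the stored offset. */
static bool filter_wants_update(const struct addr_lft *a, uint32_t advertised)
{
    assert(a);
    return advertised != a->prefered_lft;
}

/* Unconditional update, as the patch chooses to do. */
static void apply_ra(struct addr_lft *a, uint32_t advertised, uint64_t now)
{
    assert(a);
    a->prefered_lft = advertised;
    a->tstamp = now;
}
```

After 600 seconds only 1200 seconds remain, yet the filter sees 1800 on both sides and skips the update; applying the advertisement unconditionally restores the full 1800 seconds.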

Signed-off-by: Paul Marks
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Paul Marks
2013-10-01 03:06:19 +0800
d4a71b155 ip_tunnel: Do not use stale inner_iph pointer.

While sending a packet, skb_cow_head() can change the skb header, which
invalidates the inner_iph pointer into that header. This patch
avoids using it. Found by code inspection.

This bug was introduced by commit 0e6fbc5b6c6218 (ip_tunnels: extend
iptunnel_xmit()).
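
The general hazard can be shown with a user-space analogy, under the assumption that a buffer may move when it is resized. The `struct pkt` type and helpers are hypothetical, and `realloc()` here grows the tail rather than the headroom, so this only mirrors the pointer invalidation, not the skb layout. The safe pattern is to remember an offset and re-derive the pointer after any call that may reallocate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer. */
struct pkt {
    unsigned char *data;
    size_t len;
};

/* Grow the buffer; the data may move, so old pointers into it go stale. */
static int pkt_grow(struct pkt *p, size_t extra)
{
    assert(p);
    unsigned char *moved = realloc(p->data, p->len + extra);
    if (!moved)
        return -1;
    p->data = moved;
    p->len += extra;
    return 0;
}

/* Re-derive a pointer from an offset instead of caching the pointer. */
static unsigned char *pkt_at(const struct pkt *p, size_t off)
{
    assert(p && off < p->len);
    return p->data + off;
}
```

A caller keeps `hdr_off` across `pkt_grow()` and calls `pkt_at()` afterwards, rather than holding on to a pointer obtained before the resize.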

Signed-off-by: Pravin B Shelar
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Pravin B Shelar
2013-10-01 03:05:07 +0800

30 Sep, 2013

1 commit

f4a87e7bd netfilter: synproxy: fix BUG_ON triggered by corrupt TCP packets

TCP packets hitting the SYN proxy through the SYNPROXY target are not
validated by TCP conntrack. When th->doff is below 5, an underflow happens
when calculating the options length, causing skb_header_pointer() to
return NULL and triggering the BUG_ON().

Handle this case gracefully by checking for NULL instead of using BUG_ON().
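
The underflow itself is easy to demonstrate. The sketch below is not the patch, which checks the NULL return of skb_header_pointer(); it only shows, with a hypothetical helper, how a data offset below 5 words turns the unsigned options length into a huge value, and how an explicit guard would reject it.

```c
#include <assert.h>
#include <stdbool.h>

#define TCP_HDR_MIN_BYTES 20u  /* fixed TCP header without options */

/*
 * Compute the TCP options length from the data offset field, which
 * counts 32-bit words. Returns false when the header is too short.
 */
static bool tcp_options_len(unsigned int doff, unsigned int *optlen)
{
    assert(optlen);
    unsigned int hdr_bytes = doff * 4;

    /* doff < 5 would make the subtraction below wrap around. */
    if (hdr_bytes < TCP_HDR_MIN_BYTES)
        return false;

    *optlen = hdr_bytes - TCP_HDR_MIN_BYTES;
    return true;
}
```

Without the guard, `4 * 4 - 20` evaluated in unsigned arithmetic is close to the maximum unsigned value, far larger than the 40 bytes of options a valid header can carry.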

Reported-by: Martin Topholm
Tested-by: Martin Topholm
Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso

Patrick McHardy
2013-09-30 18:44:38 +0800

29 Sep, 2013

3 commits

9a3bab6b0 net: net_secret should not depend on TCP

A host might need net_secret[] and never open a single socket.

Problem added in commit aebda156a570782
("net: defer net_secret[] initialization")

Based on prior patch from Hannes Frederic Sowa.

Reported-by: Hannes Frederic Sowa
Signed-off-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Eric Dumazet
2013-09-29 06:19:40 +0800
50624c934 net: Delay default_device_exit_batch until no devices are unregistering v2

There is currently no serialization between network namespaces exiting and
network devices exiting as the final part of netdev_run_todo does not
happen under the rtnl_lock. This is compounded by the fact that the
only list of devices unregistering in netdev_run_todo is local to the
netdev_run_todo.

This lack of serialization in extreme cases results in network devices
unregistering in netdev_run_todo after the loopback device of their
network namespace has been freed (making dst_ifdown unsafe), and after
their network namespace has exited (making the NETDEV_UNREGISTER,
and NETDEV_UNREGISTER_FINAL callbacks unsafe).

Add the missing serialization by a per network namespace count of how
many network devices are unregistering and having a wait queue that is
woken up whenever the count is decreased. The count and wait queue
allow default_device_exit_batch to wait until all of the unregistration
activity for a network namespace has finished before proceeding to
unregister the loopback device and then allowing the network namespace
to exit.

Only a single global wait queue is used because there is a single global
lock, and there is a single waiter, per network namespace wait queues
would be a waste of resources.

The per network namespace count of unregistering devices gives a
progress guarantee because the number of network devices unregistering
in an exiting network namespace must ultimately drop to zero (assuming
network device unregistration completes).

The basic logic remains the same as in v1. This patch is now half
comment and half rtnl_lock_unregistering, an expanded version of
wait_event that performs no extra work in the common case where no network
devices are unregistering when we get to default_device_exit_batch.

Reported-by: Francesco Ruggeri
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2013-09-29 06:09:15 +0800
7df37ff33 IPv6 NAT: Do not drop DNATed 6to4/6rd packets

When a router is doing DNAT for 6to4/6rd packets the latest
anti-spoofing commit 218774dc ("ipv6: add anti-spoofing checks for
6to4 and 6rd") will drop them because the IPv6 address embedded does
not match the IPv4 destination. This patch will allow them to pass by
testing if we have an address that matches on 6to4/6rd interface. I
have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR.
Also, log the dropped packets (with rate limit).

Signed-off-by: Catalin(ux) M. BOIE
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Catalin(ux) M. BOIE
2013-09-29 03:56:15 +0800

28 Sep, 2013

1 commit

0a878747e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/…

…wireless into for-davem

Also fixed-up a badly indented closing brace...

Signed-off-by: John W. Linville <linville@tuxdriver.com>

John W. Linville
2013-09-28 01:11:17 +0800

27 Sep, 2013

4 commits

aa5f66d5a cfg80211: fix sysfs registration race

My locking rework/race fixes caused a regression in the
registration, causing uevent notifications for wireless
devices before the device is really fully registered and
available in nl80211.

Fix this by moving the device_add() under rtnl and move
the rfkill to afterwards (it can't be under rtnl.)

Reported-and-tested-by: Maxime Bizon
Signed-off-by: Johannes Berg

Johannes Berg
2013-09-27 02:03:45 +0800
cc63ec766 mac80211: fix the setting of extended supported rate IE

The patch "mac80211: select and adjust bitrates according to
channel mode" causes a regression and breaks the extended supported rate
IE setting. Since "i" starts at 8, there is no need to introduce
"skip" here.

Signed-off-by: Chun-Yeow Yeoh
Signed-off-by: Colleen Twitty
Reviewed-by: Jason Abele
Signed-off-by: Johannes Berg

Chun-Yeow Yeoh
2013-09-27 01:56:59 +0800
6329b8d91 mac80211: drop spoofed packets in ad-hoc mode

If an Ad-Hoc node receives packets with the Cell ID or its own MAC
address as source address, it hits a WARN_ON in sta_info_insert_check().
With many packets, this can massively spam the logs. One way that this
can easily happen is through having Cisco APs in the area with rogue AP
detection and countermeasures enabled.
Such Cisco APs will regularly send fake beacons, disassoc and deauth
packets that trigger these warnings.

To fix this issue, drop such spoofed packets early in the rx path.
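
The check can be sketched as a plain address comparison. The function name and parameters are hypothetical and stand in for whatever the rx path actually has at hand; the sketch treats a frame as spoofed when its source address equals either the node's own MAC address or the Cell ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6  /* length of an 802.11 MAC address in bytes */

/* Return true when a received frame claims an address that is ours. */
static bool ibss_src_is_spoofed(const uint8_t *src, const uint8_t *own_addr,
                                const uint8_t *cell_id)
{
    assert(src && own_addr && cell_id);
    return memcmp(src, own_addr, MAC_LEN) == 0 ||
           memcmp(src, cell_id, MAC_LEN) == 0;
}
```

Frames for which this returns true would be dropped before any station entry is created for them.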

Cc: stable@vger.kernel.org
Reported-by: Thomas Huehn
Signed-off-by: Felix Fietkau
Signed-off-by: Johannes Berg

Felix Fietkau
2013-09-27 01:56:06 +0800
7c6a4acc6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

John W. Linville
2013-09-27 01:47:05 +0800