14 Oct, 2013
6 commits
-
[ Upstream commit 3e08f4a72f689c6296d336c2aab4bddd60c93ae2 ]
We might extend the used aera of a skb beyond the total
headroom when we install the ipip header. Fix this by
calling skb_cow_head() unconditionally.Bug was introduced with commit c544193214
("GRE: Refactor GRE tunneling code.")Cc: Pravin Shelar
Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit e2401654dd0f5f3fb7a8d80dad9554d73d7ca394 ]
It is possible for the timer handlers to run after the call to
ip_mc_down so use in_dev_put instead of __in_dev_put in the handler
function in order to do proper cleanup when the refcnt reaches 0.
Otherwise, the refcnt can reach zero without the in_device being
destroyed and we end up leaking a reference to the net_device and
see messages like the following,unregister_netdevice: waiting for eth0 to become free. Usage count = 1
Tested on linux-3.4.43.
Signed-off-by: Salam Noureddine
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 9a3bab6b05383f1e4c3716b3615500c51285959e ]
A host might need net_secret[] and never open a single socket.
Problem added in commit aebda156a570782
("net: defer net_secret[] initialization")Based on prior patch from Hannes Frederic Sowa.
Reported-by: Hannes Frederic Sowa
Signed-off-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 703133de331a7a7df47f31fb9de51dc6f68a9de8 ]
If local fragmentation is allowed, then ip_select_ident() and
ip_select_ident_more() need to generate unique IDs to ensure
correct defragmentation on the peer.For example, if IPsec (tunnel mode) has to encrypt large skbs
that have local_df bit set, then all IP fragments that belonged
to different ESP datagrams would have used the same identificator.
If one of these IP fragments would get lost or reordered, then
peer could possibly stitch together wrong IP fragments that did
not belong to the same datagram. This would lead to a packet loss
or data corruption.Signed-off-by: Ansis Atteka
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 749154aa56b57652a282cbde57a57abc278d1205 ]
skb->data already points to IP header, but for the sake of
consistency we can also use ip_hdr() to retrieve it.Signed-off-by: Ansis Atteka
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit e2e5c4c07caf810d7849658dca42f598b3938e21 ]
Signed-off-by: Dave Jones
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
14 Sep, 2013
11 commits
-
[ Upstream commit eb8895debe1baba41fcb62c78a16f0c63c21662a ]
In commit 90ba9b19 (tcp: tcp_make_synack() can use alloc_skb()), Eric changed
the call to sock_wmalloc in tcp_make_synack to alloc_skb. In doing so,
the netfilter owner match lost its ability to block the SYNACK packet on
outbound listening sockets. Revert the change, restoring the owner match
functionality.This closes netfilter bugzilla #847.
Signed-off-by: Phil Oester
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit c27c9322d015dc1d9dfdf31724fca71c0476c4d1 ]
ipv4: raw_sendmsg: don't use header's destination address
A sendto() regression was bisected and found to start with commit
f8126f1d5136be1 (ipv4: Adjust semantics of rt->rt_gateway.)The problem is that it tries to ARP-lookup the constructed packet's
destination address rather than the explicitly provided address.Fix this using FLOWI_FLAG_KNOWN_NH so that given nexthop is used.
cf. commit 2ad5b9e4bd314fc685086b99e90e5de3bc59e26b
Reported-by: Chris Clark
Bisected-by: Chris Clark
Tested-by: Chris Clark
Suggested-by: Julian Anastasov
Signed-off-by: Chris Clark
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit e3e12028315749b7fa2edbc37328e5847be9ede9 ]
The zero value means that tsecr is not valid, so it's a special case.
tsoffset is used to customize tcp_time_stamp for one socket.
tsoffset is usually zero, it's used when a socket was moved from one
host to another host.Currently this issue affects logic of tcp_rcv_rtt_measure_ts. Due to
incorrect value of rcv_tsecr, tcp_rcv_rtt_measure_ts sets rto to
TCP_RTO_MAX.Reported-by: Cyrill Gorcunov
Cc: Pavel Emelyanov
Cc: Eric Dumazet
Cc: "David S. Miller"
Cc: Alexey Kuznetsov
Cc: James Morris
Cc: Hideaki YOSHIFUJI
Cc: Patrick McHardy
Signed-off-by: Andrey Vagin
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit c7781a6e3c4a9a17e144ec2db00ebfea327bd627 ]
u32 rcv_tstamp; /* timestamp of last received ACK */
Its value used in tcp_retransmit_timer, which closes socket
if the last ack was received more then TCP_RTO_MAX ago.Currently rcv_tstamp is initialized to zero and if tcp_retransmit_timer
is called before receiving a first ack, the connection is closed.This patch initializes rcv_tstamp to a timestamp, when a socket was
restored.Reported-by: Cyrill Gorcunov
Cc: Pavel Emelyanov
Cc: Eric Dumazet
Cc: "David S. Miller"
Cc: Alexey Kuznetsov
Cc: James Morris
Cc: Hideaki YOSHIFUJI
Cc: Patrick McHardy
Signed-off-by: Andrey Vagin
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 7ed5c5ae96d23da22de95e1c7a239537acd378b1 ]
When the repair mode is turned off, the write queue seqs are
updated so that the whole queue is considered to be 'already sent.The "when" field must be set for such skb. It's used in tcp_rearm_rto
for example. If the "when" field isn't set, the retransmit timeout can
be calculated incorrectly and a tcp connected can stop for two minutes
(TCP_RTO_MAX).Acked-by: Pavel Emelyanov
Cc: "David S. Miller"
Cc: Alexey Kuznetsov
Cc: James Morris
Cc: Hideaki YOSHIFUJI
Cc: Patrick McHardy
Signed-off-by: Andrey Vagin
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 4221f40513233fa8edeef7fc82e44163fde03b9b ]
Using inner-id for tunnel id is not safe in some rare cases.
E.g. packets coming from multiple sources entering same tunnel
can have same id. Therefore on tunnel packet receive we
could have packets from two different stream but with same
source and dst IP with same ip-id which could confuse ip packet
reassembly.Following patch reverts optimization from commit
490ab08127 (IP_GRE: Fix IP-Identification.)Signed-off-by: Pravin B Shelar
CC: Jarno Rajahalme
CC: Ansis Atteka
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 77a482bdb2e68d13fae87541b341905ba70d572b ]
Fix ipgre_header() (header_ops->create) to return the correct
amount of bytes pushed. Most callers of dev_hard_header() seem
to care only if it was success, but af_packet.c uses it as
offset to the skb to copy from userspace only once. In practice
this fixes packet socket sendto()/sendmsg() to gre tunnels.Regression introduced in c54419321455631079c7d6e60bc732dd0c5914c5
("GRE: Refactor GRE tunneling code.")Cc: Pravin B Shelar
Signed-off-by: Timo Teräs
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit cd6b423afd3c08b27e1fed52db828ade0addbc6b ]
While investigating about strange increase of retransmit rates
on hosts ~24 days after boot, Van found hystart was disabled
if ca->epoch_start was 0, as following condition is true
when tcp_time_stamp high order bit is set.(s32)(tcp_time_stamp - ca->epoch_start) < HZ
Quoting Van :
At initialization & after every loss ca->epoch_start is set to zero so
I believe that the above line will turn off hystart as soon as the 2^31
bit is set in tcp_time_stamp & hystart will stay off for 24 days.
I think we've observed that cubic's restart is too aggressive without
hystart so this might account for the higher drop rate we observe.Diagnosed-by: Van Jacobson
Signed-off-by: Eric Dumazet
Cc: Neal Cardwell
Cc: Yuchung Cheng
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 2ed0edf9090bf4afa2c6fc4f38575a85a80d4b20 ]
commit 17a6e9f1aa9 ("tcp_cubic: fix clock dependency") added an
overflow error in bictcp_update() in following code :/* change the unit from HZ to bictcp_HZ */
t = ((tcp_time_stamp + msecs_to_jiffies(ca->delay_min>>3) -
ca->epoch_start) << BICTCP_HZ) / HZ;Because msecs_to_jiffies() being unsigned long, compiler does
implicit type promotion.We really want to constrain (tcp_time_stamp - ca->epoch_start)
to a signed 32bit value, or else 't' has unexpected high values.This bugs triggers an increase of retransmit rates ~24 days after
boot [1], as the high order bit of tcp_time_stamp flips.[1] for hosts with HZ=1000
Big thanks to Van Jacobson for spotting this problem.
Diagnosed-by: Van Jacobson
Signed-off-by: Eric Dumazet
Cc: Neal Cardwell
Cc: Yuchung Cheng
Cc: Stephen Hemminger
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit aab515d7c32a34300312416c50314e755ea6f765 ]
AddressSanitizer [1] dynamic checker pointed a potential
out of bound access in leaf_walk_rcu()We could allocate one more slot in tnode_new() to leave the prefetch()
in-place but it looks not worth the pain.Bug added in commit 82cfbb008572b ("[IPV4] fib_trie: iterator recode")
[1] :
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernelReported-by: Andrey Konovalov
Signed-off-by: Eric Dumazet
Cc: Dmitry Vyukov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 446266b0c742a2c9ee8f0dce759a0117bce58a86 ]
Commit 5c766d642 ("ipv4: introduce address lifetime") leaves the ifa
resource that was allocated via inet_alloc_ifa() unfreed when returning
the function with -EINVAL. Thus, free it first via inet_free_ifa().Signed-off-by: Daniel Borkmann
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
12 Aug, 2013
1 commit
-
[ Upstream commit 651e92716aaae60fc41b9652f54cb6803896e0da ]
Limit the min/max value passed to the
/proc/sys/net/ipv4/tcp_syn_retries.Signed-off-by: Michal Tesar
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
29 Jul, 2013
7 commits
-
[ Upstream commit 21d1196a35f5686c4323e42a62fdb4b23b0ab4a3 ]
commit 45f00f99d6e ("ipv4: tcp: clean up tcp_v4_early_demux()") added a
performance regression for non GRO traffic, basically disabling
IP early demux.IPv6 stack resets transport header in ip6_rcv() before calling
IP early demux in ip6_rcv_finish(), while IPv4 does this only in
ip_local_deliver_finish(), _after_ IP early demux.GRO traffic happened to enable IP early demux because transport header
is also set in inet_gro_receive()Instead of reverting the faulty commit, we can make IPv4/IPv6 behave the
same : transport_header should be set in ip_rcv() instead of
ip_local_deliver_finish()ip_local_deliver_finish() can also use skb_network_header_len() which is
faster than ip_hdrlen()Signed-off-by: Eric Dumazet
Cc: Neal Cardwell
Cc: Tom Herbert
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 8c91e162e058bb91b7766f26f4d5823a21941026 ]
This change fixes an MTU sizing issue seen with gretap tunnels when non-gso
packets are sent from the interface.In my case I was able to reproduce the issue by simply sending a ping of
1421 bytes with the gretap interface created on a device with a standard
1500 mtu.This fix is based on the fact that the tunnel mtu is already adjusted by
dev->hard_header_len so it would make sense that any packets being compared
against that mtu should also be adjusted by hard_header_len and the tunnel
header instead of just the tunnel header.Signed-off-by: Alexander Duyck
Reported-by: Cong Wang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 8822b64a0fa64a5dd1dfcf837c5b0be83f8c05d1 ]
We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[] [] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8 EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS: 00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
[] skb_push+0x3a/0x40
[] ip6_push_pending_frames+0x1f6/0x4d0
[] ? mark_held_locks+0xbb/0x140
[] udp_v6_push_pending_frames+0x2b9/0x3d0
[] ? udplite_getfrag+0x20/0x20
[] udp_lib_setsockopt+0x1aa/0x1f0
[] ? fget_light+0x387/0x4f0
[] udpv6_setsockopt+0x34/0x40
[] sock_common_setsockopt+0x14/0x20
[] SyS_setsockopt+0x71/0xd0
[] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP [] skb_panic+0x63/0x65
RSPThis patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.This bug was found by Dave Jones with trinity.
(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)Signed-off-by: Hannes Frederic Sowa
Cc: Dave Jones
Cc: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 3b7b514f44bff05d26a6499c4d4fac2a83938e6e ]
This is a regression introduced by
commit fd58156e456d9f68fe0448 (IPIP: Use ip-tunneling code.)Similar to GRE tunnel, previously we only check the parameters
for SIOCADDTUNNEL and SIOCCHGTUNNEL, after that commit, the
check is moved for all commands.So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.
Also, the check for i_key, o_key etc. is suspicious too,
which did not exist before, reset them before passing
to ip_tunnel_ioctl().Signed-off-by: Cong Wang
Cc: Pravin B Shelar
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 23a3647bc4f93bac3776c66dc2c7f7f68b3cd662 ]
In path mtu check, ip header total length works for gre device
but not for gre-tap device. Use skb len which is consistent
for all tunneling types. This is old bug in gre.
This also fixes mtu calculation bug introduced by
commit c54419321455631079c7d (GRE: Refactor GRE tunneling code).Reported-by: Timo Teras
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit ab6c7a0a43c2eaafa57583822b619b22637b49c7 ]
vti module allocates dev->tstats twice: in vti_fb_tunnel_init()
and in vti_tunnel_init(), this lead to a memory leak of
dev->tstats.Just remove the duplicated operations in vti_fb_tunnel_init().
(candidate for -stable)
Signed-off-by: Cong Wang
Cc: Stephen Hemminger
Cc: Saurabh Mohan
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 6c734fb8592f6768170e48e7102cb2f0a1bb9759 ]
When testing GRE tunnel, I got:
# ip tunnel show
get tunnel gre0 failed: Invalid argument
get tunnel gre1 failed: Invalid argumentThis is a regression introduced by commit c54419321455631079c7d
("GRE: Refactor GRE tunneling code.") because previously we
only check the parameters for SIOCADDTUNNEL and SIOCCHGTUNNEL,
after that commit, the check is moved for all commands.So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.
After this patch I got:
# ip tunnel show
gre0: gre/ip remote any local any ttl inherit nopmtudisc
gre1: gre/ip remote 192.168.122.101 local 192.168.122.45 ttl inheritSigned-off-by: Cong Wang
Cc: Pravin B Shelar
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
26 Jun, 2013
1 commit
-
commit 68c331631143 ("v4 GRE: Add TCP segmentation offload for GRE")
added a possible skb leak, because it frees only the head of segment
list, in case a skb_linearize() call fails.This patch adds a kfree_skb_list() helper to fix the bug.
Signed-off-by: Eric Dumazet
Cc: Pravin B Shelar
Cc: Daniel Borkmann
Signed-off-by: David S. Miller
25 Jun, 2013
1 commit
-
Pablo Neira Ayuso says:
====================
The following patchset contains five fixes for Netfilter/IPVS, they are:* A skb leak fix in fragmentation handling in case that helpers are in place,
it occurs since the IPV6 NAT infrastructure, from Phil Oester.* Fix SCTP port mangling in ICMP packets for IPVS, from Julian Anastasov.
* Fix event delivery in ctnetlink regarding the new connlabel infrastructure,
from Florian Westphal.* Fix mangling in the SIP NAT helper, from Balazs Peter Odor.
* Fix crash in ipt_ULOG introduced while adding netnamespace support,
from Gao Feng.I'll take care of passing several of these patches to -stable once they hit
Linus' tree.
====================Signed-off-by: David S. Miller
24 Jun, 2013
1 commit
-
The parameter of setup_timer should be &ulog->nlgroup[i].
the incorrect parameter will cause kernel panic in
ulog_timer.Bug introducted in commit 355430671ad93546b34b4e91bdf720f3a704efa4
"netfilter: ipt_ULOG: add net namespace support for ipt_ULOG"ebt_ULOG doesn't have this problem.
[ I have mangled this patch to fix nlgroup != 0 case, we were
also crashing there --pablo ]Tested-by: George Spelvin
Reported-by: Borislav Petkov
Signed-off-by: Gao feng
Signed-off-by: Pablo Neira Ayuso
20 Jun, 2013
1 commit
-
MD5 key lookups on a given TCP socket were being performed
incorrectly. This fix alters parameter inputs to the MD5
lookup function tcp_md5_do_lookup, which is called by functions
tcp_md5_do_add and tcp_md5_do_del. Specifically, the change now
inputs the correct address and address family required to make
a proper lookup.Signed-off-by: Aydin Arik
Signed-off-by: David S. Miller
13 Jun, 2013
2 commits
-
If CONFIG_NET_NS is not set then __net_init is the same as __init and
__net_exit is the same as __exit. These functions will be removed from
memory after the module loads or is removed. Functions that are exported
for use by other functions should never be labeled for removal.Bug introduced by commit c54419321455631079c
("GRE: Refactor GRE tunneling code.")Reported-by: Steinar H. Gunderson
Signed-off-by: Steven Rostedt
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
If users apply shaper to vti tunnel then it will cause a kernel crash. The
problem seems to be due to the vti_tunnel_xmit function not clearing
skb->opt field before passing the packet to xfrm tunneling code.Signed-off-by: Saurabh Mohan
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller
31 May, 2013
1 commit
-
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter/IPVS fixes for 3.10-rc3,
they are:* fix xt_addrtype with IPv6, from Florian Westphal. This required
a new hook for IPv6 functions in the netfilter core to avoid
hard dependencies with the ipv6 subsystem when this match is
only used for IPv4.* fix connection reuse case in IPVS. Currently, if an reused
connection are directed to the same server. If that server is
down, those connection would fail. Therefore, clear the
connection and choose a new server among the available ones.* fix possible non-nul terminated string sent to user-space if
ipt_ULOG is used as the default netfilter logging stub, from
Chen Gang.* fix mark logging of IPv6 packets in xt_LOG, from Michal Kubecek.
This bug has been there since 2.6.26.* Fix breakage ip_vs_sh due to incorrect structure layout for
RCU, from Jan Beulich.
====================Signed-off-by: David S. Miller
28 May, 2013
1 commit
-
Unlike ipv4_redirect() and ipv4_sk_redirect(), ip_do_redirect()
doesn't call __build_flow_key() directly but via
ip_rt_build_flow_key() wrapper. This leads to __build_flow_key()
getting pointer to IPv4 header of the ICMP redirect packet
rather than pointer to the embedded IPv4 header of the packet
initiating the redirect.As a result, handling of ICMP redirects initiated by TCP packets
is broken. Issue was introduced by4895c771c ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Michal Kubecek
Signed-off-by: David S. Miller
26 May, 2013
1 commit
-
Daniel Petre reported crashes in icmp_dst_unreach() with following call
graph:#3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
#4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
#5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
#6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
#7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
#8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
#9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
#10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
#11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596Daniel found a similar problem mentioned in
http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.htmlAnd indeed this is the root cause : skb->cb[] contains data fooling IP
stack.We must clear IPCB in ip_tunnel_xmit() sooner in case dst_link_failure()
is called. Or else skb->cb[] might contain garbage from GSO segmentation
layer.A similar fix was tested on linux-3.9, but gre code was refactored in
linux-3.10. I'll send patches for stable kernels as well.Many thanks to Daniel for providing reports, patches and testing !
Reported-by: Daniel Petre
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
24 May, 2013
1 commit
-
commit 3853b5841c01a ("xps: Improvements in TX queue selection")
introduced ooo_okay flag, but the condition to set it is slightly wrong.In our traces, we have seen ACK packets being received out of order,
and RST packets sent in response.We should test if we have any packets still in host queue.
Signed-off-by: Eric Dumazet
Cc: Tom Herbert
Cc: Yuchung Cheng
Cc: Neal Cardwell
Signed-off-by: David S. Miller
23 May, 2013
2 commits
-
If nf_log uses ipt_ULOG as logging output, we can deliver non-null
terminated strings to user-space since the maximum length of the
prefix that is passed by nf_log is NF_LOG_PREFIXLEN but pm->prefix
is 32 bytes long (ULOG_PREFIX_LEN).This is actually happening already from nf_conntrack_tcp if ipt_ULOG
is used, since it is passing strings longer than 32 bytes.Signed-off-by: Chen Gang
Signed-off-by: Pablo Neira Ayuso -
This patch is a fix for a bug triggering newly_acked_sacked < 0
in tcp_ack(.).The bug is triggered by sacked_out decreasing relative to prior_sacked,
but packets_out remaining the same as pior_packets. This is because the
snapshot of prior_packets is taken after tcp_sacktag_write_queue() while
prior_sacked is captured before tcp_sacktag_write_queue(). The problem
is: tcp_sacktag_write_queue (tcp_match_skb_to_sack() -> tcp_fragment)
adjusts the pcount for packets_out and sacked_out (MSS change or other
reason). As a result, this delta in pcount is reflected in
(prior_sacked - sacked_out) but not in (prior_packets - packets_out).This patch does the following:
1) initializes prior_packets at the start of tcp_ack() so as to
capture the delta in packets_out created by tcp_fragment.
2) introduces a new "previous_packets_out" variable that snapshots
packets_out right before tcp_clean_rtx_queue, so pkts_acked can be
correctly computed as before.
3) Computes pkts_acked using previous_packets_out, and computes
newly_acked_sacked using prior_packets.Signed-off-by: Nandita Dukkipati
Acked-by: Yuchung Cheng
Signed-off-by: David S. Miller
20 May, 2013
1 commit
-
Another fix needed in ipgre_err(), as parse_gre_header() might change
skb->head.Bug added in commit c54419321455 (GRE: Refactor GRE tunneling code.)
Signed-off-by: Eric Dumazet
Cc: Pravin B Shelar
Signed-off-by: David S. Miller
17 May, 2013
2 commits
-
GSO TCP handler has following issues :
1) ooo_okay from original GSO packet is duplicated to all segments
2) segments (but the last one) are orphaned, so transmit path can not
get transmit queue number from the socket. This happens if GSO
segmentation is done before stacked device for example.Result is we can send packets from a given TCP flow to different TX
queues (if using multiqueue NICS). This generates OOO problems and
spurious SACK & retransmits.Fix this by keeping socket pointer set for all segments.
This means that every segment must also have a destructor, and the
original gso skb truesize must be split on all segments, to keep
precise sk->sk_wmem_alloc accounting.Signed-off-by: Eric Dumazet
Cc: Maciej Żenczykowski
Cc: Tom Herbert
Cc: Neal Cardwell
Cc: Yuchung Cheng
Signed-off-by: David S. Miller -
Pablo Neira Ayuso says:
====================
The following patchset contains three Netfilter fixes and update
for the MAINTAINER file for your net tree, they are:* Fix crash if nf_log_packet is called from conntrack, in that case
both interfaces are NULL, from Hans Schillstrom. This bug introduced
with the logging netns support in the previous merge window.* Fix compilation of nf_log and nf_queue without CONFIG_PROC_FS,
from myself. This bug was introduced in the previous merge window
with the new netns support for the netfilter logging infrastructure.* Fix possible crash in xt_TCPOPTSTRIP due to missing sanity
checkings to validate that the TCP header is well-formed, from
myself. I can find this bug in 2.6.25, probably it's been there
since the beginning. I'll pass this to -stable.* Update MAINTAINER file to point to new nf trees at git.kernel.org,
remove Harald and use M: instead of P: (now obsolete tag) to
keep Jozsef in the list of people.Please, consider pulling this. Thanks!
====================Signed-off-by: David S. Miller