01 Jun, 2014
40 commits
-
[ Upstream commit e33d0ba8047b049c9262fdb1fcafb93cb52ceceb ]
Recycling skb always had been very tough...
This time it appears GRO layer can accumulate skb->truesize
adjustments made by drivers when they attach a fragment to skb.skb_gro_receive() can only subtract from skb->truesize the used part
of a fragment.I spotted this problem seeing TcpExtPruneCalled and
TcpExtTCPRcvCollapsed that were unexpected with a recent kernel, where
TCP receive window should be sized properly to accept traffic coming
from a driver not overshooting skb->truesize.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit fbdc0ad095c0a299e9abf5d8ac8f58374951149a ]
the value of itag is a random value from stack, and may not be initiated by
fib_validate_source, which called fib_combine_itag if CONFIG_IP_ROUTE_CLASSID
is not setThis will make the cached dst uncertainty
Signed-off-by: Li RongQing
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 4de462ab63e23953fd05da511aeb460ae10cc726 ]
When GRE support was added in linux-3.14, CHECKSUM_COMPLETE handling
broke on GRE+IPv6 because we did not update/use the appropriate csum :GRO layer is supposed to use/update NAPI_GRO_CB(skb)->csum instead of
skb->csumTested using a GRE tunnel and IPv6 traffic. GRO aggregation now happens
at the first level (ethernet device) instead of being done in gre
tunnel. Native IPv6+TCP is still properly aggregated.Fixes: bf5a755f5e918 ("net-gre-gro: Add GRE support to the GRO stack")
Signed-off-by: Eric Dumazet
Cc: Jerry Chu
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit bf63ac73b3e132e6bf0c8798aba7b277c3316e19 ]
Kelly reported the following crash:
IP: [] tcf_action_exec+0x46/0x90
PGD 3009067 PUD 300c067 PMD 11ff30067 PTE 800000011634b060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU: 1 PID: 639 Comm: dhclient Not tainted 3.15.0-rc4+ #342
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8801169ecd00 ti: ffff8800d21b8000 task.ti: ffff8800d21b8000
RIP: 0010:[] [] tcf_action_exec+0x46/0x90
RSP: 0018:ffff8800d21b9b90 EFLAGS: 00010283
RAX: 00000000ffffffff RBX: ffff88011634b8e8 RCX: ffff8800cf7133d8
RDX: ffff88011634b900 RSI: ffff8800cf7133e0 RDI: ffff8800d210f840
RBP: ffff8800d21b9bb0 R08: ffffffff8287bf60 R09: 0000000000000001
R10: ffff8800d2b22b24 R11: 0000000000000001 R12: ffff8800d210f840
R13: ffff8800d21b9c50 R14: ffff8800cf7133e0 R15: ffff8800cad433d8
FS: 00007f49723e1840(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88011634b8f0 CR3: 00000000ce469000 CR4: 00000000000006e0
Stack:
ffff8800d2170188 ffff8800d210f840 ffff8800d2171b90 0000000000000000
ffff8800d21b9be8 ffffffff817c55bb ffff8800d21b9c50 ffff8800d2171b90
ffff8800d210f840 ffff8800d21b0300 ffff8800d21b9c50 ffff8800d21b9c18
Call Trace:
[] tcindex_classify+0x88/0x9b
[] tc_classify_compat+0x3e/0x7b
[] tc_classify+0x25/0x9f
[] htb_enqueue+0x55/0x27a
[] dsmark_enqueue+0x165/0x1a4
[] __dev_queue_xmit+0x35e/0x536
[] dev_queue_xmit+0x10/0x12
[] packet_sendmsg+0xb26/0xb9a
[] ? __lock_acquire+0x3ae/0xdf3
[] __sock_sendmsg_nosec+0x25/0x27
[] sock_aio_write+0xd0/0xe7
[] do_sync_write+0x59/0x78
[] vfs_write+0xb5/0x10a
[] SyS_write+0x49/0x7f
[] system_call_fastpath+0x16/0x1bThis is because we memcpy struct tcindex_filter_result which contains
struct tcf_exts, obviously struct list_head can not be simply copied.
This is a regression introduced by commit 33be627159913b094bb578
(net_sched: act: use standard struct list_head).It's not very easy to fix it as the code is a mess:
if (old_r)
memcpy(&cr, r, sizeof(cr));
else {
memset(&cr, 0, sizeof(cr));
tcf_exts_init(&cr.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
}
...
tcf_exts_change(tp, &cr.exts, &e);
...
memcpy(r, &cr, sizeof(cr));the above code should equal to:
tcindex_filter_result_init(&cr);
if (old_r)
cr.res = r->res;
...
if (old_r)
tcf_exts_change(tp, &r->exts, &e);
else
tcf_exts_change(tp, &cr.exts, &e);
...
r->res = cr.res;after this change, since there is no need to copy struct tcf_exts.
And it also fixes other places zero'ing struct's contains struct tcf_exts.
Fixes: commit 33be627159913b0 (net_sched: act: use standard struct list_head)
Reported-by: Kelly Anderson
Tested-by: Kelly Anderson
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 78ff4be45a4c51d8fb21ad92e4fabb467c6c3eeb ]
We need to initialize the fallback device to have a correct mtu
set on this device. Otherwise the mtu is set to null and the device
is unusable.Fixes: fd58156e456d ("IPIP: Use ip-tunneling code.")
Cc: Pravin B Shelar
Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit cc2f33860cea0e48ebec096130bd0f7c4bf6e0bc ]
Change introduced by 88e48d7b3340ef07b108eb8a8b3813dd093cc7f7
("batman-adv: make DAT drop ARP requests targeting local clients")
implements a check that prevents DAT from using the caching
mechanism when the client that is supposed to provide a reply
to an arp request is local.However change brought by be1db4f6615b5e6156c807ea8985171c215c2d57
("batman-adv: make the Distributed ARP Table vlan aware")
has not converted the above check into its vlan aware version
thus making it useless when the local client is behind a vlan.Fix the behaviour by properly specifying the vlan when
checking for a client being local or not.Reported-by: Simon Wunderlich
Signed-off-by: Antonio Quartulli
Signed-off-by: Marek Lindner
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 377fe0f968b30a1a714fab53a908061914f30e26 ]
A pointer to the orig_node representing a bat-gateway is
stored in the gw_node->orig_node member, but the refcount
for such orig_node is never increased.
This leads to memory faults when gw_node->orig_node is accessed
and the originator has already been freed.Fix this by increasing the refcount on gw_node creation
and decreasing it on gw_node free.Signed-off-by: Antonio Quartulli
Signed-off-by: Marek Lindner
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit be181015a189cd141398b761ba4e79d33fe69949 ]
In the new fragmentation code the batadv_frag_send_packet()
function obtains a reference to the primary_if, but it does
not release it upon return.This reference imbalance prevents the primary_if (and then
the related netdevice) to be properly released on shut down.Fix this by releasing the primary_if in batadv_frag_send_packet().
Introduced by ee75ed88879af88558818a5c6609d85f60ff0df4
("batman-adv: Fragment and send skbs larger than mtu")Cc: Martin Hundebøll
Signed-off-by: Antonio Quartulli
Signed-off-by: Marek Lindner
Acked-by: Martin Hundebøll
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 16a4142363b11952d3aa76ac78004502c0c2fe6e ]
If hard_iface is NULL and goto out is made batadv_hardif_free_ref()
doesn't check for NULL before dereferencing it to get to refcount.Introduced in cb1c92ec37fb70543d133a1fa7d9b54d6f8a1ecd
("batman-adv: add debugfs support to view multiif tables").Reported-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Acked-by: Antonio Quartulli
Signed-off-by: Antonio Quartulli
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 29e98242783ed3ba569797846a606ba66f781625 ]
Starting from linux-3.13, GRO attempts to build full size skbs.
Problem is the commit assumed one particular field in skb->cb[]
was clean, but it is not the case on some stacked devices.Timo reported a crash in case traffic is decrypted before
reaching a GRE device.Fix this by initializing NAPI_GRO_CB(skb)->last at the right place,
this also removes one conditional.Thanks a lot to Timo for providing full reports and bisecting this.
Fixes: 8a29111c7ca6 ("net: gro: allow to build full sized skb")
Bisected-by: Timo Teras
Signed-off-by: Eric Dumazet
Tested-by: Timo Teräs
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 81c708068dfedece038e07d818ba68333d8d885d ]
I've missed to add a NULL entry to the bond_intmax_tbl when I introduced
it with the conversion of arp_interval so add it now.CC: Jay Vosburgh
CC: Veaceslav Falico
CC: Andy GospodarekFixes: 7bdb04ed0dbf ("bonding: convert arp_interval to use the new option API")
Signed-off-by: Nikolay Aleksandrov
Acked-by: Veaceslav Falico
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit b394745df2d9d4c30bf1bcc55773bec6f3bc7c67 ]
After the call to phy_init_hw failed in phy_attach_direct, phy_detach is called
to detach the phy device from its network device. If the attached driver is a
generic phy driver, this also detaches the driver. Subsequently phy_resume
is called, which assumes without checking that a driver is attached to the
device. This will result in a crash such asUnable to handle kernel paging request for data at address 0xffffffffffffff90
Faulting instruction address: 0xc0000000003a0e18
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c0000000003a0e18] .phy_attach_direct+0x68/0x17c
LR [c0000000003a0e6c] .phy_attach_direct+0xbc/0x17c
Call Trace:
[c0000003fc0475d0] [c0000000003a0e6c] .phy_attach_direct+0xbc/0x17c (unreliable)
[c0000003fc047670] [c0000000003a0ff8] .phy_connect_direct+0x28/0x98
[c0000003fc047700] [c0000000003f0074] .of_phy_connect+0x4c/0xa4Only call phy_resume if phy_init_hw was successful.
Signed-off-by: Guenter Roeck
Acked-by: Florian Fainelli
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 200b916f3575bdf11609cb447661b8d5957b0bbf ]
From: Cong Wang
commit 50624c934db18ab90 (net: Delay default_device_exit_batch until no
devices are unregistering) introduced rtnl_lock_unregistering() for
default_device_exit_batch(). Same race could happen we when rmmod a driver
which calls rtnl_link_unregister() as we call dev->destructor without rtnl
lock.For long term, I think we should clean up the mess of netdev_run_todo()
and net namespce exit code.Cc: Eric W. Biederman
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 3a1cebe7e05027a1c96f2fc1a8eddf5f19b78f42 ]
tot_len does specify the size of struct ipv6_txoptions. We need opt_flen +
opt_nflen to calculate the overall length of additional ipv6 extensions.I found this while auditing the ipv6 output path for a memory corruption
reported by Alexey Preobrazhensky while he fuzzed an instrumented
AddressSanitizer kernel with trinity. This may or may not be the cause
of the original bug.Fixes: 4df98e76cde7c6 ("ipv6: pmtudisc setting not respected with UFO/CORK")
Reported-by: Alexey Preobrazhensky
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 3d4405226d27b3a215e4d03cfa51f536244e5de7 ]
net_get_random_once depends on the static keys infrastructure to patch up
the branch to the slow path during boot. This was realized by abusing the
static keys api and defining a new initializer to not enable the call
site while still indicating that the branch point should get patched
up. This was needed to have the fast path considered likely by gcc.The static key initialization during boot up normally walks through all
the registered keys and either patches in ideal nops or enables the jump
site but omitted that step on x86 if ideal nops where already placed at
static_key branch points. Thus net_get_random_once branches not always
became active.This patch switches net_get_random_once to the ordinary static_key
api and thus places the kernel fast path in the - by gcc considered -
unlikely path. Microbenchmarks on Intel and AMD x86-64 showed that
the unlikely path actually beats the likely path in terms of cycle cost
and that different nop patterns did not make much difference, thus this
switch should not be noticeable.Fixes: a48e42920ff38b ("net: introduce new macro net_get_random_once")
Reported-by: Tuomas Räsänen
Cc: Linus Torvalds
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit e84d2f8d2ae33c8215429824e1ecf24cbca9645e ]
This is the s390 variant of Alexei's JIT bug fix.
(patch description below stolen from Alexei's patch)bpf_alloc_binary() adds 128 bytes of room to JITed program image
and rounds it up to the nearest page size. If image size is close
to page size (like 4000), it is rounded to two pages:
round_up(4000 + 4 + 128) == 8192
then 'hole' is computed as 8192 - (4000 + 4) = 4188
If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header)
then kernel will crash during bpf_jit_free():kernel BUG at arch/x86/mm/pageattr.c:887!
Call Trace:
[] change_page_attr_set_clr+0x135/0x460
[] ? _raw_spin_unlock_irq+0x30/0x50
[] set_memory_rw+0x2f/0x40
[] bpf_jit_free_deferred+0x2d/0x60
[] process_one_work+0x1d8/0x6a0
[] ? process_one_work+0x178/0x6a0
[] worker_thread+0x11c/0x370since bpf_jit_free() does:
unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
struct bpf_binary_header *header = (void *)addr;
to compute start address of 'bpf_binary_header'
and header->pages will pass junk to:
set_memory_rw(addr, header->pages);Fix it by making sure that &header->image[prandom_u32() % hole] and &header
are in the same page.Fixes: aa2d2c73c21f2 ("s390/bpf,jit: address randomize and write protect jit code")
Reported-by: Alexei Starovoitov
Cc: # v3.11+
Signed-off-by: Heiko Carstens
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 773cd38f40b8834be991dbfed36683acc1dd41ee ]
bpf_alloc_binary() adds 128 bytes of room to JITed program image
and rounds it up to the nearest page size. If image size is close
to page size (like 4000), it is rounded to two pages:
round_up(4000 + 4 + 128) == 8192
then 'hole' is computed as 8192 - (4000 + 4) = 4188
If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header)
then kernel will crash during bpf_jit_free():kernel BUG at arch/x86/mm/pageattr.c:887!
Call Trace:
[] change_page_attr_set_clr+0x135/0x460
[] ? _raw_spin_unlock_irq+0x30/0x50
[] set_memory_rw+0x2f/0x40
[] bpf_jit_free_deferred+0x2d/0x60
[] process_one_work+0x1d8/0x6a0
[] ? process_one_work+0x178/0x6a0
[] worker_thread+0x11c/0x370since bpf_jit_free() does:
unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
struct bpf_binary_header *header = (void *)addr;
to compute start address of 'bpf_binary_header'
and header->pages will pass junk to:
set_memory_rw(addr, header->pages);Fix it by making sure that &header->image[prandom_u32() % hole] and &header
are in the same pageFixes: 314beb9bcabfd ("x86: bpf_jit_comp: secure bpf jit against spraying attacks")
Signed-off-by: Alexei Starovoitov
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 709de13f0c532fe9c468c094aff069a725ed57fe ]
When an interface is removed separately, all neighbors need to be
checked if they have a neigh_ifinfo structure for that particular
interface. If that is the case, remove that ifinfo so any references to
a hard interface can be freed.This is a regression introduced by
89652331c00f43574515059ecbf262d26d885717
("batman-adv: split tq information in neigh_node struct")Reported-by: Antonio Quartulli
Signed-off-by: Simon Wunderlich
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 7b955a9fc164487d7c51acb9787f6d1b01b35ef6 ]
The current code will not execute batadv_purge_orig_neighbors() when an
orig_ifinfo has already been purged. However we need to run it in any
case. Fix that.This is a regression introduced by
7351a4822d42827ba0110677c0cbad88a3d52585
("batman-adv: split out router from orig_node")Signed-off-by: Simon Wunderlich
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 000c8dff97311357535d64539e58990526e4de70 ]
When an interface is removed from batman-adv, the orig_ifinfo of a
orig_node may be removed without releasing the router first.
This will prevent the reference for the neighbor pointed at by the
orig_ifinfo->router to be released, and this leak may result in
reference leaks for the interface used by this neighbor. Fix that.This is a regression introduced by
7351a4822d42827ba0110677c0cbad88a3d52585
("batman-adv: split out router from orig_node").Reported-by: Antonio Quartulli
Signed-off-by: Simon Wunderlich
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit c1e517fbbcdb13f50662af4edc11c3251fe44f86 ]
The neigh_ifinfo object must be freed if it has been used in
batadv_iv_ogm_process_per_outif().This is a regression introduced by
89652331c00f43574515059ecbf262d26d885717
("batman-adv: split tq information in neigh_node struct")Reported-by: Antonio Quartulli
Signed-off-by: Simon Wunderlich
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 2176d5d41891753774f648b67470398a5acab584 ]
Since commit 7e98056964("ipv6: router reachability probing"), a router falls
into NUD_FAILED will be probed.Now if function rt6_select() selects a router which neighbour state is NUD_FAILED,
and at the same time function rt6_probe() changes the neighbour state to NUD_PROBE,
then function dst_neigh_output() can directly send packets, but actually the
neighbour still is unreachable. If we set nud_state to NUD_INCOMPLETE instead
NUD_PROBE, packets will not be sent out until the neihbour is reachable.In addition, because the route should be probes with a single NS, so we must
set neigh->probes to neigh_max_probes(), then the neigh timer timeout and function
neigh_timer_handler() will not send other NS Messages.Signed-off-by: Duan Jiong
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit c8965932a2e3b70197ec02c6741c29460279e2a8 ]
The function ip6_tnl_validate assumes that the rtnl
attribute IFLA_IPTUN_PROTO always be filled . If this
attribute is not filled by the userspace application
kernel get crashed with NULL pointer dereference. This
patch fixes the potential kernel crash when
IFLA_IPTUN_PROTO is missing .Signed-off-by: Susant Sahani
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 1c3639005f48492e5f2d965779efd814e80f8b15 ]
If the sfc driver is in legacy interrupt mode (either explicitly by
using interrupt_mode module param or by falling back to it) it will
hit a warning at kernel/irq/manage.c because it will try to free an irq
which wasn't allocated by it in the first place because the MSI(X) irqs are
zero and it'll try to free them unconditionally. So fix it by checking if
we're in legacy mode and freeing the appropriate irqs.CC: Zenghui Shi
CC: Ben Hutchings
CC:
CC: Shradha Shah
CC: David S. MillerFixes: 1899c111a535 ("sfc: Fix IRQ cleanup in case of a probe failure")
Reported-by: Zenghui Shi
Signed-off-by: Nikolay Aleksandrov
Acked-by: Shradha Shah
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit bbeb0eadcf9fe74fb2b9b1a6fea82cd538b1e556 ]
Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti
overflow on the underlying interface.Attempting the set IFF_ALLMULTI on the underlying interface would cause an
error and the log message:"allmulti touches root, set allmulti failed."
Signed-off-by: Peter Christensen
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 6b5eeb7f874b689403e52a646e485d0191ab9507 ]
This driver maps 802.1q VLANs to MBIM sessions. The mapping is based on
a bogus assumption that all tagged frames will use the acceleration API
because we enable NETIF_F_HW_VLAN_CTAG_TX. This fails for e.g. frames
tagged in userspace using packet sockets. Such frames will erroneously
be considered as untagged and silently dropped based on not being IP.Fix by falling back to looking into the ethernet header for a tag if no
accelerated tag was found.Fixes: a82c7ce5bc5b ("net: cdc_ncm: map MBIM IPS SessionID to VLAN ID")
Cc: Greg Suarez
Signed-off-by: Bjørn Mork
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit aeefa1ecfc799b0ea2c4979617f14cecd5cccbfd ]
Increment fib_info_cnt in fib_create_info() right after successfuly
alllocating fib_info structure, overwise fib_metrics allocation failure
leads to fib_info_cnt incorrectly decremented in free_fib_info(), called
on error path from fib_create_info().Signed-off-by: Sergey Popovich
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 418a31561d594a2b636c1e2fa94ecd9e1245abb1 ]
If conntrack defragments incoming ipv6 frags it stores largest original
frag size in ip6cb and sets ->local_df.We must thus first test the largest original frag size vs. mtu, and not
vice versa.Without this patch PKTTOOBIG is still generated in ip6_fragment() later
in the stack, but1) IPSTATS_MIB_INTOOBIGERRORS won't increment
2) packet did (needlessly) traverse netfilter postrouting hook.Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path")
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit ca6c5d4ad216d5942ae544bbf02503041bd802aa ]
local_df means 'ignore DF bit if set', so if its set we're
allowed to perform ip fragmentation.This wasn't noticed earlier because the output path also drops such skbs
(and emits needed icmp error) and because netfilter ip defrag did not
set local_df until couple of days ago.Only difference is that DF-packets-larger-than MTU now discarded
earlier (f.e. we avoid pointless netfilter postrouting trip).While at it, drop the repeated test ip_exceeds_mtu, checking it once
is enough...Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path")
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 4f4178f3bb1f470d7fb863ec531e08e20a0fd51c ]
Fixes this warning introduced by commit 5b8f15f78e6f
("net: cdc_mbim: handle IPv6 Neigbor Solicitations"):===============================
[ INFO: suspicious RCU usage. ]
3.15.0-rc3 #213 Tainted: G W O
-------------------------------
net/8021q/vlan_core.c:69 suspicious rcu_dereference_check() usage!other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
no locks held by ksoftirqd/0/3.stack backtrace:
CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G W O 3.15.0-rc3 #213
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
0000000000000001 ffff880232533bf0 ffffffff813a5ee6 0000000000000006
ffff880232530090 ffff880232533c20 ffffffff81076b94 0000000000000081
0000000000000000 ffff8802085ac000 ffff88007fc8ea00 ffff880232533c50
Call Trace:
[] dump_stack+0x4e/0x68
[] lockdep_rcu_suspicious+0xfa/0x103
[] __vlan_find_dev_deep+0x54/0x94
[] cdc_mbim_rx_fixup+0x379/0x66a [cdc_mbim]
[] ? _raw_spin_unlock_irqrestore+0x3a/0x49
[] ? trace_hardirqs_on_caller+0x192/0x1a1
[] usbnet_bh+0x59/0x287 [usbnet]
[] tasklet_action+0xbb/0xcd
[] __do_softirq+0x14c/0x30d
[] run_ksoftirqd+0x1f/0x50
[] smpboot_thread_fn+0x172/0x18e
[] ? SyS_setgroups+0xdf/0xdf
[] kthread+0xb5/0xbd
[] ? __wait_for_common+0x13b/0x170
[] ? __kthread_parkme+0x5c/0x5c
[] ret_from_fork+0x7c/0xb0
[] ? __kthread_parkme+0x5c/0x5cFixes: 5b8f15f78e6f ("net: cdc_mbim: handle IPv6 Neigbor Solicitations")
Signed-off-by: Bjørn Mork
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 22fb22eaebf4d16987f3fd9c3484c436ee0badf2 ]
The connected check fails to check for ip_gre nbma mode tunnels
properly. ip_gre creates temporary tnl_params with daddr specified
to pass-in the actual target on per-packet basis from neighbor
layer. Detect these tunnels by inspecting the actual tunnel
configuration.Minimal test case:
ip route add 192.168.1.1/32 via 10.0.0.1
ip route add 192.168.1.2/32 via 10.0.0.2
ip tunnel add nbma0 mode gre key 1 tos c0
ip addr add 172.17.0.0/16 dev nbma0
ip link set nbma0 up
ip neigh add 172.17.0.1 lladdr 192.168.1.1 dev nbma0
ip neigh add 172.17.0.2 lladdr 192.168.1.2 dev nbma0
ping 172.17.0.1
ping 172.17.0.2The second ping should be going to 192.168.1.2 and head 10.0.0.2;
but cached gre tunnel level route is used and it's actually going
to 192.168.1.1 via 10.0.0.1.The lladdr's need to go to separate dst for the bug to trigger.
Test case uses separate route entries, but this can also happen
when the route entry is same: if there is a nexthop exception or
the GRE tunnel is IPsec'ed in which case the dst points to xfrm
bundle unique to the gre lladdr.Fixes: 7d442fab0a67 ("ipv4: Cache dst in tunnels")
Signed-off-by: Timo Teräs
Cc: Tom Herbert
Cc: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit e96f2e7c430014eff52c93cabef1ad4f42ed0db1 ]
In ip_tunnel_rcv(), set skb->network_header to inner IP header
before IP_ECN_decapsulate().Without the fix, IP_ECN_decapsulate() takes outer IP header as
inner IP header, possibly causing error messages or packet drops.Note that this skb_reset_network_header() call was in this spot when
the original feature for checking consistency of ECN bits through
tunnels was added in eccc1bb8d4b4 ("tunnel: drop packet if ECN present
with not-ECT"). It was only removed from this spot in 3d7b46cd20e3
("ip_tunnel: push generic protocol handling to ip_tunnel module.").Fixes: 3d7b46cd20e3 ("ip_tunnel: push generic protocol handling to ip_tunnel module.")
Reported-by: Neal Cardwell
Signed-off-by: Ying Cai
Acked-by: Neal Cardwell
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 83d3459a5928f18c9344683e31bc2a7c3c25562a ]
Carrying out PCI speed/width checks through pcie_get_minimum_link()
on VFs yield wrong results, so remove them.Fixes: b912b2f ('net/mlx4_core: Warn if device doesn't have enough PCI bandwidth')
Signed-off-by: Eyal Perry
Signed-off-by: Or Gerlitz
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 9becd707841207652449a8dfd90fe9c476d88546 ]
Commit 4d619f625a60 ("net: cdc_ncm: no point in filling up the NTBs
if we send ZLPs") changed the padding logic for devices with the ZLP
flag set. This meant that frames of any size will be sent without
additional padding, except for the single byte added if the size is
a multiple of the USB packet size. But if the unpadded size is
identical to the maximum frame size, and the maximum size is a
multiplum of the USB packet size, then this one-byte padding will
overflow the buffer.Prevent padding if already at maximum frame size, letting usbnet
transmit a ZLP instead in this case.Fixes: 4d619f625a60 ("net: cdc_ncm: no point in filling up the NTBs if we send ZLPs")
Reported by: Yu-an Shih
Signed-off-by: Bjørn Mork
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 2c4a336e0a3e203fab6aa8d8f7bb70a0ad968a6b ]
Right now the core vsock module is the owner of the proto family. This
means there's nothing preventing the transport module from unloading if
there are open sockets, which results in a panic. Fix that by allowing
the transport to be the owner, which will refcount it properly.Includes version bump to 1.0.1.0-k
Passes checkpatch this time, I swear...
Acked-by: Dmitry Torokhov
Signed-off-by: Andy King
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit f6a082fed1e6407c2f4437d0d963b1bcbe5f9f58 ]
hhf_change() takes the sch_tree_lock and releases it but misses the
error cases. Fix the missed case here.To reproduce try a command like this,
# tc qdisc change dev p3p2 root hhf quantum 40960 non_hh_weight 300000
Signed-off-by: John Fastabend
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 0cda345d1b2201dd15591b163e3c92bad5191745 ]
commit b9f47a3aaeab (tcp_cubic: limit delayed_ack ratio to prevent
divide error) try to prevent divide error, but there is still a little
chance that delayed_ack can reach zero. In case the param cnt get
negative value, then ratio+cnt would overflow and may happen to be zero.
As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero.In some old kernels, such as 2.6.32, there is a bug that would
pass negative param, which then ultimately leads to this divide error.commit 5b35e1e6e9c (tcp: fix tcp_trim_head() to adjust segment count
with skb MSS) fixed the negative param issue. However,
it's safe that we fix the range of delayed_ack as well,
to make sure we do not hit a divide by zero.CC: Stephen Hemminger
Signed-off-by: Liu Yu
Signed-off-by: Eric Dumazet
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit f114890cdf84d753f6b41cd0cc44ba51d16313da ]
This reverts commit 12a2856b604476c27d85a5f9a57ae1661fc46019.
The commit above doesn't appear to be necessary any more as the
checksums appear to be correctly computed/validated.Additionally the above commit breaks kvm configurations where
one VM is using a device that support checksum offload (virtio) and
the other VM does not.
In this case, packets leaving virtio device will have CHECKSUM_PARTIAL
set. The packets is forwarded to a macvtap that has offload features
turned off. Since we use CHECKSUM_UNNECESSARY, the host does does not
update the checksum and thus a bad checksum is passed up to
the guest.CC: Daniel Lezcano
CC: Patrick McHardy
CC: Andrian Nord
CC: Eric Dumazet
CC: Michael S. Tsirkin
CC: Jason Wang
Signed-off-by: Vlad Yasevich
Acked-by: Michael S. Tsirkin
Acked-by: Jason Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit cbdb04279ccaefcc702c8757757eea8ed76e50cf ]
The following is a problematic configuration:
VM1: virtio-net device connected to macvtap0@eth0
VM2: e1000 device connect to macvtap1@eth0The problem is is that virtio-net supports checksum offloading
and thus sends the packets to the host with CHECKSUM_PARTIAL set.
On the other hand, e1000 does not support any acceleration.For small TCP packets (and this includes the 3-way handshake),
e1000 ends up receiving packets that only have a partial checksum
set. This causes TCP to fail checksum validation and to drop
packets. As a result tcp connections can not be established.Commit 3e4f8b787370978733ca6cae452720a4f0c296b8
macvtap: Perform GSO on forwarding path.
fixes this issue for large packets wthat will end up undergoing GSO.
This commit adds a check for the non-GSO case and attempts to
compute the checksum for partially checksummed packets in the
non-GSO case.CC: Daniel Lezcano
CC: Patrick McHardy
CC: Andrian Nord
CC: Eric Dumazet
CC: Michael S. Tsirkin
CC: Jason Wang
Signed-off-by: Vlad Yasevich
Acked-by: Jason Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman