Eric Lee / smarc-fsl-linux-kernel

17 Nov, 2015

1 commit

41033f029 snmp: Remove duplicate OUTMCAST stat increment ... Browse Code »

the OUTMCAST stat is double incremented, getting bumped once in the mcast code
itself, and again in the common ip output path. Remove the mcast bump, as its
not needed

Validated by the reporter, with good results

Signed-off-by: Neil Horman
Reported-by: Claus Jensen
CC: Claus Jensen
CC: David Miller
Signed-off-by: David S. Miller

Neil Horman
2015-11-17 05:36:32 +0800

16 Nov, 2015

4 commits

00fd38d93 tcp: ensure proper barriers in lockless contexts ... Browse Code »

Some functions access TCP sockets without holding a lock and
might output non consistent data, depending on compiler and or
architecture.

tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ...

Introduce sk_state_load() and sk_state_store() to fix the issues,
and more clearly document where this lack of locking is happening.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-11-16 07:36:38 +0800
02bcf4e08 ipv6: Check rt->dst.from for the DST_NOCACHE route ... Browse Code »

All DST_NOCACHE rt6_info used to have rt->dst.from set to
its parent.

After commit 8e3d5be73681 ("ipv6: Avoid double dst_free"),
DST_NOCACHE is also set to rt6_info which does not have
a parent (i.e. rt->dst.from is NULL).

This patch catches the rt->dst.from == NULL case.

Fixes: 8e3d5be73681 ("ipv6: Avoid double dst_free")
Signed-off-by: Martin KaFai Lau
Cc: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Martin KaFai Lau
2015-11-16 06:12:37 +0800
5973fb1e2 ipv6: Check expire on DST_NOCACHE route ... Browse Code »

Since the expires of the DST_NOCACHE rt can be set during
the ip6_rt_update_pmtu(), we also need to consider the expires
value when doing ip6_dst_check().

This patches creates __rt6_check_expired() to only
check the expire value (if one exists) of the current rt.

In rt6_dst_from_check(), it adds __rt6_check_expired() as
one of the condition check.

Signed-off-by: Martin KaFai Lau
Cc: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Martin KaFai Lau
2015-11-16 06:12:37 +0800
0d3f6d297 ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree ... Browse Code »

The original bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1272571

The setup has a IPv4 GRE tunnel running in a IPSec. The bug
happens when ndisc starts sending router solicitation at the gre
interface. The simplified oops stack is like:

__lock_acquire+0x1b2/0x1c30
lock_acquire+0xb9/0x140
_raw_write_lock_bh+0x3f/0x50
__ip6_ins_rt+0x2e/0x60
ip6_ins_rt+0x49/0x50
~~~~~~~~
__ip6_rt_update_pmtu.part.54+0x145/0x250
ip6_rt_update_pmtu+0x2e/0x40
~~~~~~~~
ip_tunnel_xmit+0x1f1/0xf40
__gre_xmit+0x7a/0x90
ipgre_xmit+0x15a/0x220
dev_hard_start_xmit+0x2bd/0x480
__dev_queue_xmit+0x696/0x730
dev_queue_xmit+0x10/0x20
neigh_direct_output+0x11/0x20
ip6_finish_output2+0x21f/0x770
ip6_finish_output+0xa7/0x1d0
ip6_output+0x56/0x190
~~~~~~~~
ndisc_send_skb+0x1d9/0x400
ndisc_send_rs+0x88/0xc0
~~~~~~~~

The rt passed to ip6_rt_update_pmtu() is created by
icmp6_dst_alloc() and it is not managed by the fib6 tree,
so its rt6i_table == NULL. When __ip6_rt_update_pmtu() creates
a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL
and it causes the ip6_ins_rt() oops.

During pmtu update, we only want to create a RTF_CACHE clone
from a rt which is currently managed (or owned) by the
fib6 tree. It means either rt->rt6i_node != NULL or
rt is a RTF_PCPU clone.

It is worth to note that rt6i_table may not be NULL even it is
not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()).
Hence, rt6i_node is a better check instead of rt6i_table.

Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu")
Signed-off-by: Martin KaFai Lau
Reported-by: Chris Siebenmann
Cc: Chris Siebenmann
Cc: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Martin KaFai Lau
2015-11-16 06:12:37 +0800

06 Nov, 2015

2 commits

49a496c97 tcp: use correct req pointer in tcp_move_syn() calls ... Browse Code »

I mistakenly took wrong request sock pointer when calling tcp_move_syn()

@req_unhash is either a copy of @req, or a NULL value for
FastOpen connexions (as we do not expect to unhash the temporary
request sock from ehash table)

Fixes: 805c4bc05705 ("tcp: fix req->saved_syn race")
Signed-off-by: Eric Dumazet
Cc: Ying Cai
Signed-off-by: David S. Miller

Eric Dumazet
2015-11-06 04:57:51 +0800
805c4bc05 tcp: fix req->saved_syn race ... Browse Code »

For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
req->saved_syn unless we own the request sock.

This fixes races for listeners using TCP_SAVE_SYN option.

Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet
Reported-by: Ying Cai
Signed-off-by: David S. Miller

Eric Dumazet
2015-11-06 03:36:09 +0800

05 Nov, 2015

1 commit

2a189f9e5 ipv6: clean up dev_snmp6 proc entry when we fail to initialize inet6_dev ... Browse Code »

In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up
the dev_snmp6 entry that we have already registered for this device.
Call snmp6_unregister_dev in this case.

Fixes: a317a2f19da7d ("ipv6: fail early when creating netdev named all or default")
Reported-by: Dmitry Vyukov
Signed-off-by: Sabrina Dubroca
Signed-off-by: David S. Miller

Sabrina Dubroca
2015-11-05 12:49:48 +0800

04 Nov, 2015

1 commit

73186df8d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Minor overlapping changes in net/ipv4/ipmr.c, in 'net' we were
fixing the "BH-ness" of the counter bumps whilst in 'net-next'
the functions were modified to take an explicit 'net' parameter.

Signed-off-by: David S. Miller

David S. Miller
2015-11-04 02:41:45 +0800

03 Nov, 2015

5 commits

ebac62fe3 ipv6: fix tunnel error handling ... Browse Code »

Both tunnel6_protocol and tunnel46_protocol share the same error
handler, tunnel6_err(), which traverses through tunnel6_handlers list.
For ipip6 tunnels, we need to traverse tunnel46_handlers as we do e.g.
in tunnel46_rcv(). Current code can generate an ICMPv6 error message
with an IPv4 packet embedded in it.

Fixes: 73d605d1abbd ("[IPSEC]: changing API of xfrm6_tunnel_register")
Signed-off-by: Michal Kubecek
Signed-off-by: David S. Miller

Michal Kubeček
2015-11-03 23:52:13 +0800
4ece90097 sit: fix sit0 percpu double allocations ... Browse Code »

sit0 device allocates its percpu storage twice :
- One time in ipip6_tunnel_init()
- One time in ipip6_fb_tunnel_init()

Thus we leak 48 bytes per possible cpu per network namespace dismantle.

ipip6_fb_tunnel_init() can be much simpler and does not
return an error, and should be called after register_netdev()

Note that ipip6_tunnel_clone_6rd() also needs to be called
after register_netdev() (calling ipip6_tunnel_init())

Fixes: ebe084aafb7e ("sit: Use ipip6_tunnel_init as the ndo_init function.")
Signed-off-by: Eric Dumazet
Reported-by: Dmitry Vyukov
Cc: Steffen Klassert
Signed-off-by: David S. Miller

Eric Dumazet
2015-11-03 11:54:45 +0800
1d6119baf net: fix percpu memory leaks ... Browse Code »

This patch fixes following problems :

1) percpu_counter_init() can return an error, therefore
init_frag_mem_limit() must propagate this error so that
inet_frags_init_net() can do the same up to its callers.

2) If ip[46]_frags_ns_ctl_register() fail, we must unwind
properly and free the percpu_counter.

Without this fix, we leave freed object in percpu_counters
global list (if CONFIG_HOTPLUG_CPU) leading to crashes.

This bug was detected by KASAN and syzkaller tool
(http://github.com/google/syzkaller)

Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting")
Signed-off-by: Eric Dumazet
Reported-by: Dmitry Vyukov
Cc: Hannes Frederic Sowa
Cc: Jesper Dangaard Brouer
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Eric Dumazet
2015-11-03 11:47:14 +0800
ec13ad1d7 ipv6: fix crash on ICMPv6 redirects with prohibited/blackholed source ... Browse Code »

There are other error values besides ip6_null_entry that can be returned by
ip6_route_redirect(): fib6_rule_action() can also result in
ip6_blk_hole_entry and ip6_prohibit_entry if such ip rules are installed.

Only checking for ip6_null_entry in rt6_do_redirect() causes ip6_ins_rt()
to be called with rt->rt6i_table == NULL in these cases, making the kernel
crash.

Signed-off-by: Matthias Schiffer
Signed-off-by: David S. Miller

Matthias Schiffer
2015-11-03 05:30:15 +0800
ce1050089 tcp/dccp: fix ireq->pktopts race ... Browse Code »

IPv6 request sockets store a pointer to skb containing the SYN packet
to be able to transfer it to full blown socket when 3WHS is done
(ireq->pktopts -> np->pktoptions)

As explained in commit 5e0724d027f0 ("tcp/dccp: fix hashdance race for
passive sessions"), we must transfer the skb only if we won the
hashdance race, if multiple cpus receive the 'ack' packet completing
3WHS at the same time.

Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-11-03 04:38:26 +0800

02 Nov, 2015

2 commits

405c92f7a ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment ... Browse Code »

CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
of those warn about them once and handle them gracefully by recalculating
the checksum.

Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
Cc: Eric Dumazet
Cc: Vlad Yasevich
Cc: Benjamin Coddington
Cc: Tom Herbert
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-11-02 01:01:28 +0800
682b1a9d3 ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets ... Browse Code »

We cannot reliable calculate packet size on MSG_MORE corked sockets
and thus cannot decide if they are going to be fragmented later on,
so better not use CHECKSUM_PARTIAL in the first place.

The IPv6 code also intended to protect and not use CHECKSUM_PARTIAL in
the existence of IPv6 extension headers, but the condition was wrong. Fix
it up, too. Also the condition to check whether the packet fits into
one fragment was wrong and has been corrected.

Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
Cc: Eric Dumazet
Cc: Vlad Yasevich
Cc: Benjamin Coddington
Cc: Tom Herbert
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-11-02 01:01:27 +0800

01 Nov, 2015

1 commit

b75ec3af2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2015-11-01 12:15:30 +0800

30 Oct, 2015

2 commits

e7b63ff11 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next ... Browse Code »

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2015-10-30

1) The flow cache is limited by the flow cache limit which
depends on the number of cpus and the xfrm garbage collector
threshold which is independent of the number of cpus. This
leads to the fact that on systems with more than 16 cpus
we hit the xfrm garbage collector limit and refuse new
allocations, so new flows are dropped. On systems with 16
or less cpus, we hit the flowcache limit. In this case, we
shrink the flow cache instead of refusing new flows.

We increase the xfrm garbage collector threshold to INT_MAX
to get the same behaviour, independent of the number of cpus.

2) Fix some unaligned accesses on sparc systems.
From Sowmini Varadhan.

3) Fix some header checks in _decode_session4. We may call
pskb_may_pull with a negative value converted to unsigened
int from pskb_may_pull. This can lead to incorrect policy
lookups. We fix this by a check of the data pointer position
before we call pskb_may_pull.

4) Reload skb header pointers after calling pskb_may_pull
in _decode_session4 as this may change the pointers into
the packet.

5) Add a missing statistic counter on inner mode errors.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-10-30 19:51:56 +0800
b7b0b1d29 ipv6: recreate ipv6 link-local addresses when increasing MTU over IPV6_MIN_MTU ... Browse Code »

This change makes it so that we reinitialize the interface if the MTU is
increased back above IPV6_MIN_MTU and the interface is up.

Cc: Hannes Frederic Sowa
Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller

Alexander Duyck
2015-10-30 17:11:07 +0800

29 Oct, 2015

2 commits

89bc7848a ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues ... Browse Code »

Raw sockets with hdrincl enabled can insert ipv6 extension headers
right into the data stream. In case we need to fragment those packets,
we reparse the options header to find the place where we can insert
the fragment header. If the extension headers exceed the link's MTU we
actually cannot make progress in such a case.

Instead of ending up in broken arithmetic or rounding towards 0 and
entering an endless loop in ip6_fragment, just prevent those cases by
aborting early and signal -EMSGSIZE to user space.

This is the second version of the patch which doesn't use the
overflow_usub function, which got reverted for now.

Suggested-by: Linus Torvalds
Cc: Linus Torvalds
Reported-by: Dmitry Vyukov
Cc: Dmitry Vyukov
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-10-29 22:01:50 +0800
1e0d69a9c Revert "Merge branch 'ipv6-overflow-arith'" ... Browse Code »

Linus dislikes these changes. To not hold up the net-merge let's revert
it for now and fix the bug like Linus suggested.

This reverts commit ec3661b42257d9a06cf0d318175623ac7a660113, reversing
changes made to c80dbe04612986fd6104b4a1be21681b113b5ac9.

Cc: Linus Torvalds
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-10-29 22:01:48 +0800

28 Oct, 2015

1 commit

190b8ffbb ipv6: Export nf_ct_frag6_consume_orig() ... Browse Code »

This is needed in openvswitch to fix an skb leak in the next patch.

Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Joe Stringer
2015-10-28 10:32:17 +0800

27 Oct, 2015

1 commit

4b3418fba ipv6: icmp: include addresses in debug messages ... Browse Code »

Messages like "icmp6_send: no reply to icmp error" are close
to useless. Adding source and destination addresses to provide
some more clue.

Signed-off-by: Bjørn Mork
Signed-off-by: David S. Miller

Bjørn Mork
2015-10-27 12:59:42 +0800

24 Oct, 2015

1 commit

ba3e2084f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
net/ipv6/xfrm6_output.c
net/openvswitch/flow_netlink.c
net/openvswitch/vport-gre.c
net/openvswitch/vport-vxlan.c
net/openvswitch/vport.c
net/openvswitch/vport.h

The openvswitch conflicts were overlapping changes. One was
the egress tunnel info fix in 'net' and the other was the
vport ->send() op simplification in 'net-next'.

The xfrm6_output.c conflicts was also a simplification
overlapping a bug fix.

Signed-off-by: David S. Miller

David S. Miller
2015-10-24 21:54:12 +0800

23 Oct, 2015

3 commits

5e0724d02 tcp/dccp: fix hashdance race for passive sessions ... Browse Code »

Multiple cpus can process duplicates of incoming ACK messages
matching a SYN_RECV request socket. This is a rare event under
normal operations, but definitely can happen.

Only one must win the race, otherwise corruption would occur.

To fix this without adding new atomic ops, we use logic in
inet_ehash_nolisten() to detect the request was present in the same
ehash bucket where we try to insert the new child.

If request socket was not found, we have to undo the child creation.

This actually removes a spin_lock()/spin_unlock() pair in
reqsk_queue_unlink() for the fast path.

Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-10-23 20:42:21 +0800
b72a2b01b ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues ... Browse Code »

Raw sockets with hdrincl enabled can insert ipv6 extension headers
right into the data stream. In case we need to fragment those packets,
we reparse the options header to find the place where we can insert
the fragment header. If the extension headers exceed the link's MTU we
actually cannot make progress in such a case.

Instead of ending up in broken arithmetic or rounding towards 0 and
entering an endless loop in ip6_fragment, just prevent those cases by
aborting early and signal -EMSGSIZE to user space.

Reported-by: Dmitry Vyukov
Cc: Dmitry Vyukov
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-10-23 17:49:36 +0800
ab997ad40 ipv6: fix the incorrect return value of throw route ... Browse Code »

The error condition -EAGAIN, which is signaled by throw routes, tells
the rules framework to walk on searching for next matches. If the walk
ends and we stop walking the rules with the result of a throw route we
have to translate the error conditions to -ENETUNREACH.

Signed-off-by: Xin Long
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

lucien
2015-10-23 17:38:18 +0800

22 Oct, 2015

6 commits

199c65506 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec ... Browse Code »

Steffen Klassert says:

====================
pull request (net): ipsec 2015-10-22

1) Fix IPsec pre-encap fragmentation for GSO packets.
From Herbert Xu.

2) Fix some header checks in _decode_session6.
We skip the header informations if the data pointer points
already behind the header in question for some protocols.
This is because we call pskb_may_pull with a negative value
converted to unsigened int from pskb_may_pull in this case.
Skipping the header informations can lead to incorrect policy
lookups. From Mathias Krause.

3) Allow to change the replay threshold and expiry timer of a
state without having to set other attributes like replay
counter and byte lifetime. Changing these other attributes
may break the SA. From Michael Rossberg.

4) Fix pmtu discovery for local generated packets.
We may fail dispatch to the inner address family.
As a reault, the local error handler is not called
and the mtu value is not reported back to userspace.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-10-22 22:46:05 +0800
d46a9d678 net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set ... Browse Code »

741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
adds the RT6_LOOKUP_F_IFACE flag to make device index mismatch fatal if
oif is given. Hajime reported that this change breaks the Mobile IPv6
use case that wants to force the message through one interface yet use
the source address from another interface. Handle this case by only
adding the flag if oif is set and saddr is not set.

Fixes: 741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
Cc: Hajime Tazaki
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-10-22 22:36:19 +0800
feec0cb3f ipv6: gro: support sit protocol ... Browse Code »

Tom Herbert added SIT support to GRO with commit
19424e052fb4 ("sit: Add gro callbacks to sit_offload"),
later reverted by Herbert Xu.

The problem came because Tom patch was building GRO
packets without proper meta data : If packets were locally
delivered, we would not care.

But if packets needed to be forwarded, GSO engine was not
able to segment individual segments.

With the following patch, we correctly set skb->encapsulation
and inner network header. We also update gso_type.

Tested:

Server :
netserver
modprobe dummy
ifconfig dummy0 8.0.0.1 netmask 255.255.255.0 up
arp -s 8.0.0.100 4e:32:51:04:47:e5
iptables -I INPUT -s 10.246.7.151 -j TEE --gateway 8.0.0.100
ifconfig sixtofour0
sixtofour0 Link encap:IPv6-in-IPv4
inet6 addr: 2002:af6:798::1/128 Scope:Global
inet6 addr: 2002:af6:798::/128 Scope:Global
UP RUNNING NOARP MTU:1480 Metric:1
RX packets:411169 errors:0 dropped:0 overruns:0 frame:0
TX packets:409414 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:20319631739 (20.3 GB) TX bytes:29529556 (29.5 MB)

Client :
netperf -H 2002:af6:798::1 -l 1000 &

Checked on server traffic copied on dummy0 and verify segments were
properly rebuilt, with proper IP headers, TCP checksums...

tcpdump on eth0 shows proper GRO aggregation takes place.

Signed-off-by: Eric Dumazet
Acked-by: Tom Herbert
Signed-off-by: David S. Miller

Eric Dumazet
2015-10-22 10:36:11 +0800
36a28b211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains four Netfilter fixes for net, they are:

1) Fix Kconfig dependencies of new nf_dup_ipv4 and nf_dup_ipv6.

2) Remove bogus test nh_scope in IPv4 rpfilter match that is breaking
--accept-local, from Xin Long.

3) Wait for RCU grace period after dropping the pending packets in the
nfqueue, from Florian Westphal.

4) Fix sleeping allocation while holding spin_lock_bh, from Nikolay Borisov.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-10-22 10:26:17 +0800
b1974ed05 netlink: Rightsize IFLA_AF_SPEC size calculation ... Browse Code »

if_nlmsg_size() overestimates the minimum allocation size of netlink
dump request (when called from rtnl_calcit()) or the size of the
message (when called from rtnl_getlink()). This is because
ext_filter_mask is not supported by rtnl_link_get_af_size() and
rtnl_link_get_size().

The over-estimation is significant when at least one netdev has many
VLANs configured (8 bytes for each configured VLAN).

This patch-set "rightsizes" the protocol specific attribute size
calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
and adding this a argument to get_link_af_size op in rtnl_af_ops.

Bridge module already used filtering aware sizing for notifications.
br_get_link_af_size_filtered() is consistent with the modified
get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops.
br_get_link_af_size() becomes unused and thus removed.

Signed-off-by: Ronen Arad
Acked-by: Sridhar Samudrala
Signed-off-by: David S. Miller

Arad, Ronen
2015-10-22 10:15:20 +0800
f1900fb5e net: Really fix vti6 with oif in dst lookups ... Browse Code »

6e28b000825d ("net: Fix vti use case with oif in dst lookups for IPv6")
is missing the checks on FLOWI_FLAG_SKIP_NH_OIF. Add them.

Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
Cc: Steffen Klassert
Signed-off-by: David Ahern
Acked-by: Steffen Klassert
Signed-off-by: David S. Miller

David Ahern
2015-10-22 10:04:54 +0800

20 Oct, 2015

1 commit

26440c835 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/usb/asix_common.c
net/ipv4/inet_connection_sock.c
net/switchdev/switchdev.c

In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.

The other two conflicts were overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2015-10-20 21:08:27 +0800

19 Oct, 2015

3 commits

ca064bd89 xfrm: Fix pmtu discovery for local generated packets. ... Browse Code »

Commit 044a832a777 ("xfrm: Fix local error reporting crash
with interfamily tunnels") moved the setting of skb->protocol
behind the last access of the inner mode family to fix an
interfamily crash. Unfortunately now skb->protocol might not
be set at all, so we fail dispatch to the inner address family.
As a reault, the local error handler is not called and the
mtu value is not reported back to userspace.

We fix this by setting skb->protocol on message size errors
before we call xfrm_local_error.

Fixes: 044a832a7779c ("xfrm: Fix local error reporting crash with interfamily tunnels")
Signed-off-by: Steffen Klassert

Steffen Klassert
2015-10-19 16:30:05 +0800
371f1c7e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree. Most relevantly, updates for the nfnetlink_log to integrate with
conntrack, fixes for cttimeout and improvements for nf_queue core, they are:

1) Remove useless ifdef around static inline function in IPVS, from
Eric W. Biederman.

2) Simplify the conntrack support for nfnetlink_queue: Merge
nfnetlink_queue_ct.c file into nfnetlink_queue_core.c, then rename it back
to nfnetlink_queue.c

3) Use y2038 safe timestamp from nfnetlink_queue.

4) Get rid of dead function definition in nf_conntrack, from Flavio
Leitner.

5) Attach conntrack support for nfnetlink_log.c, from Ken-ichirou MATSUZAWA.
This adds a new NETFILTER_NETLINK_GLUE_CT Kconfig switch that
controls enabling both nfqueue and nflog integration with conntrack.
The userspace application can request this via NFULNL_CFG_F_CONNTRACK
configuration flag.

6) Remove unused netns variables in IPVS, from Eric W. Biederman and
Simon Horman.

7) Don't put back the refcount on the cttimeout object from xt_CT on success.

8) Fix crash on cttimeout policy object removal. We have to flush out
the cttimeout extension area of the conntrack not to refer to an unexisting
object that was just removed.

9) Make sure rcu_callback completion before removing nfnetlink_cttimeout
module removal.

10) Fix compilation warning in br_netfilter when no nf_defrag_ipv4 and
nf_defrag_ipv6 are enabled. Patch from Arnd Bergmann.

11) Autoload ctnetlink dependencies when NFULNL_CFG_F_CONNTRACK is
requested. Again from Ken-ichirou MATSUZAWA.

12) Don't use pointer to previous hook when reinjecting traffic via
nf_queue with NF_REPEAT verdict since it may be already gone. This
also avoids a deadloop if the userspace application keeps returning
NF_REPEAT.

13) A bunch of cleanups for netfilter IPv4 and IPv6 code from Ian Morris.

14) Consolidate logger instance existence check in nfulnl_recv_config().

15) Fix broken atomicity when applying configuration updates to logger
instances in nfnetlink_log.

16) Get rid of the .owner attribute in our hook object. We don't need
this anymore since we're dropping pending packets that have escaped
from the kernel when unremoving the hook. Patch from Florian Westphal.

17) Remove unnecessary rcu_read_lock() from nf_reinject code, we always
assume RCU read side lock from .call_rcu in nfnetlink. Also from Florian.

18) Use static inline function instead of macros to define NF_HOOK() and
NF_HOOK_COND() when no netfilter support in on, from Arnd Bergmann.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-10-19 13:48:34 +0800
dc6ef6be5 tcp: do not set queue_mapping on SYNACK ... Browse Code »

At the time of commit fff326990789 ("tcp: reflect SYN queue_mapping into
SYNACK packets") we had little ways to cope with SYN floods.

We no longer need to reflect incoming skb queue mappings, and instead
can pick a TX queue based on cpu cooking the SYNACK, with normal XPS
affinities.

Note that all SYNACK retransmits were picking TX queue 0, this no longer
is a win given that SYNACK rtx are now distributed on all cpus.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-10-19 13:26:02 +0800

17 Oct, 2015

2 commits

f0a0a978b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

This merge resolves conflicts with 75aec9df3a78 ("bridge: Remove
br_nf_push_frag_xmit_sk") as part of Eric Biederman's effort to improve
netns support in the network stack that reached upstream via David's
net-next tree.

Signed-off-by: Pablo Neira Ayuso

Conflicts:
net/bridge/br_netfilter_hooks.c

Pablo Neira Ayuso
2015-10-17 20:28:03 +0800
2ffbceb2b netfilter: remove hook owner refcounting ... Browse Code »

since commit 8405a8fff3f8 ("netfilter: nf_qeueue: Drop queue entries on
nf_unregister_hook") all pending queued entries are discarded.

So we can simply remove all of the owner handling -- when module is
removed it also needs to unregister all its hooks.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2015-10-17 00:21:39 +0800

16 Oct, 2015

1 commit

f03f2e154 tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helper ... Browse Code »

Let's reduce the confusion about inet_csk_reqsk_queue_drop() :
In many cases we also need to release reference on request socket,
so add a helper to do this, reducing code size and complexity.

Fixes: 4bdc3d66147b ("tcp/dccp: fix behavior of stale SYN_RECV request sockets")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-10-16 15:52:18 +0800