Eric Lee / smarc-fsl-linux-kernel

09 Feb, 2016

1 commit

9cf749036 tcp: do not drop syn_recv on all icmp reports

Petr Novopashenniy reported that ICMP redirects on SYN_RECV sockets
were leading to RST.

This is of course incorrect.

Only a specific list of ICMP messages should be able to drop a SYN_RECV.

For instance, a REDIRECT on SYN_RECV shall be ignored, as we do
not hold a dst per SYN_RECV pseudo request.
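
The allow-list idea can be sketched as a small predicate. This is a minimal userspace model, not the kernel's code: the helper name icmp_may_drop_syn_recv() is hypothetical, and the particular set of accepted types and codes is an assumption chosen for illustration, since the message above does not enumerate the list. The numeric values are the ICMP type and code numbers from RFC 792.

```c
#include <stdbool.h>

/* ICMP type numbers from RFC 792. */
enum {
    ICMP_T_DEST_UNREACH  = 3,
    ICMP_T_REDIRECT      = 5,
    ICMP_T_TIME_EXCEEDED = 11,
    ICMP_T_PARAM_PROBLEM = 12,
};

/* Destination Unreachable codes from RFC 792. */
enum {
    ICMP_C_NET_UNREACH  = 0,
    ICMP_C_HOST_UNREACH = 1,
    ICMP_C_PORT_UNREACH = 3,
};

/*
 * Hypothetical helper: return true only for ICMP errors on an allow-list.
 * The allow-list below is an assumption for illustration; the commit
 * message does not spell it out.
 */
static bool icmp_may_drop_syn_recv(int type, int code)
{
    switch (type) {
    case ICMP_T_PARAM_PROBLEM:
    case ICMP_T_TIME_EXCEEDED:
        return true;
    case ICMP_T_DEST_UNREACH:
        return code == ICMP_C_NET_UNREACH || code == ICMP_C_HOST_UNREACH;
    default:
        /* Everything else, including redirects, is ignored. */
        return false;
    }
}
```

With this shape, a redirect can never tear down a pending request, whatever its code.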

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111751
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Reported-by: Petr Novopashenniy
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2016-02-09 17:15:37 +0800

08 Feb, 2016

1 commit

44c3d0c1c ipv6: fix a lockdep splat

Silence lockdep false positive about rcu_dereference() being
used in the wrong context.

The first one should use rcu_dereference_protected(), as we own the spinlock.

The second one should be a normal assignment, as no barrier is needed.

Fixes: 18367681a10bd ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Reported-by: Dave Jones
Signed-off-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Eric Dumazet
2016-02-08 23:33:32 +0800

06 Feb, 2016

1 commit

16186a82d ipv6: addrconf: Fix recursive spin lock call

An RCU stall with the following backtrace was seen on a system with
forwarding, optimistic_dad and use_optimistic set. To reproduce,
set these flags and allow ipv6 autoconf.

This occurs because the device write_lock is acquired while the same
context already holds the read_lock. Backtrace below:

INFO: rcu_preempt self-detected stall on CPU { 1} (t=2100 jiffies
g=3992 c=3991 q=4471)
Task dump for CPU 1:
kworker/1:0 R running task 12168 15 2 0x00000002
Workqueue: ipv6_addrconf addrconf_dad_work
Call trace:
[] el1_irq+0x68/0xdc
[] _raw_write_lock_bh+0x20/0x30
[] __ipv6_dev_ac_inc+0x64/0x1b4
[] addrconf_join_anycast+0x9c/0xc4
[] __ipv6_ifa_notify+0x160/0x29c
[] ipv6_ifa_notify+0x50/0x70
[] addrconf_dad_work+0x314/0x334
[] process_one_work+0x244/0x3fc
[] worker_thread+0x2f8/0x418
[] kthread+0xe0/0xec

v2: do addrconf_dad_kick inside read lock and then acquire write
lock for ipv6_ifa_notify as suggested by Eric

Fixes: 7fd2561e4ebdd ("net: ipv6: Add a sysctl to make optimistic
addresses useful candidates")

Cc: Eric Dumazet
Cc: Erik Kline
Cc: Hannes Frederic Sowa
Signed-off-by: Subash Abhinov Kasiviswanathan
Acked-by: Hannes Frederic Sowa
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

subashab@codeaurora.org
2016-02-06 16:08:15 +0800

02 Feb, 2016

1 commit

34229b277 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"This looks like a lot but it's a mixture of regression fixes as well
as fixes for longer standing issues.

1) Fix on-channel cancellation in mac80211, from Johannes Berg.

2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables
module, from Eric Dumazet.

3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric
Dumazet.

4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is
bound, from Craig Gallek.

5) GRO key comparisons don't take lightweight tunnels into account,
from Jesse Gross.

6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric
Dumazet.

7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we
register them, otherwise the NEWLINK netlink message is missing
the proper attributes. From Thadeu Lima de Souza Cascardo.

8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido
Schimmel.

9) Handle fragments properly in ipv4 early socket demux, from Eric
Dumazet.

10) Don't ignore the ifindex key specifier on ipv6 output route
lookups, from Paolo Abeni"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits)
tcp: avoid cwnd undo after receiving ECN
irda: fix a potential use-after-free in ircomm_param_request
net: tg3: avoid uninitialized variable warning
net: nb8800: avoid uninitialized variable warning
net: vxge: avoid unused function warnings
net: bgmac: clarify CONFIG_BCMA dependency
net: hp100: remove unnecessary #ifdefs
net: davinci_cpdma: use dma_addr_t for DMA address
ipv6/udp: use sticky pktinfo egress ifindex on connect()
ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
netlink: not trim skb for mmaped socket when dump
vxlan: fix a out of bounds access in __vxlan_find_mac
net: dsa: mv88e6xxx: fix port VLAN maps
fib_trie: Fix shift by 32 in fib_table_lookup
net: moxart: use correct accessors for DMA memory
ipv4: ipconfig: avoid unused ic_proto_used symbol
bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout.
bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter.
bnxt_en: Ring free response from close path should use completion ring
net_sched: drr: check for NULL pointer in drr_dequeue
...

Linus Torvalds
2016-02-02 07:56:08 +0800

30 Jan, 2016

2 commits

1cdda9187 ipv6/udp: use sticky pktinfo egress ifindex on connect()

Currently, the egress interface index specified via IPV6_PKTINFO
is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
can be subverted when the user space application calls connect()
before sendmsg().
Fix it by properly initializing flowi6_oif in connect() before
performing the route lookup.

Signed-off-by: Paolo Abeni
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Paolo Abeni
2016-01-30 12:31:26 +0800

6f21c96a7 ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()

The current implementation of ip6_dst_lookup_tail() basically
ignores the egress ifindex match. If the saddr is set,
ip6_route_output() purposefully ignores flowi6_oif, due
to commit d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
flag if saddr set"). If the saddr is 'any', the first route lookup
in ip6_dst_lookup_tail() fails, but upon failure a second lookup is
performed with saddr set, thus ignoring the ifindex constraint.

This commit adds an output route lookup function variant, which
allows the caller to specify lookup flags, and modifies
ip6_dst_lookup_tail() to enforce the ifindex match on the second
lookup via said helper.

ip6_route_output() now becomes a static inline function built on
top of ip6_route_output_flags(); as a side effect, out-of-tree
modules now need a GPL license to access the output route lookup
functionality.

Signed-off-by: Paolo Abeni
Acked-by: Hannes Frederic Sowa
Acked-by: David Ahern
Signed-off-by: David S. Miller

Paolo Abeni
2016-01-30 12:31:26 +0800

26 Jan, 2016

2 commits

87e57399e sit: set rtnl_link_ops before calling register_netdevice

When creating a SIT tunnel with ip tunnel, rtnl_link_ops is not set before
ipip6_tunnel_create is called. When register_netdevice is called, there is
no linkinfo attribute in the NEWLINK message because of that.

Setting rtnl_link_ops before calling register_netdevice fixes that.

Signed-off-by: Thadeu Lima de Souza Cascardo
Signed-off-by: David S. Miller

Thadeu Lima de Souza Cascardo
2016-01-26 02:51:53 +0800

32b6170ca ipv4+ipv6: Make INET*_ESP select CRYPTO_ECHAINIV

The ESP algorithms using CBC mode require echainiv. Hence INET*_ESP have
to select CRYPTO_ECHAINIV in order to work properly. This solves the
issues caused by a misconfiguration as described in [1].
The original approach, patching crypto/Kconfig, was turned down by
Herbert Xu [2].

[1] https://lists.strongswan.org/pipermail/users/2015-December/009074.html
[2] http://marc.info/?l=linux-crypto-vger&m=145224655809562&w=2

Signed-off-by: Thomas Egerer
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Thomas Egerer
2016-01-26 02:45:41 +0800

21 Jan, 2016

1 commit

d55f90bfa net: drop tcp_memcontrol.c

tcp_memcontrol.c only contains legacy memory.tcp.kmem.* file definitions
and mem_cgroup->tcp_mem init/destroy stuff. This doesn't belong to
network subsys. Let's move it to memcontrol.c. This also allows us to
reuse generic code for handling legacy memcg files.

Signed-off-by: Vladimir Davydov
Acked-by: Johannes Weiner
Cc: "David S. Miller"
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vladimir Davydov
2016-01-21 09:09:18 +0800

20 Jan, 2016

1 commit

ed0dfffd7 udp: fix potential infinite loop in SO_REUSEPORT logic

Using a combination of connected and un-connected sockets, Dmitry
was able to trigger soft lockups with his fuzzer.

The problem is that sockets in the SO_REUSEPORT array might have
different scores.

Right after sk2=socket(), setsockopt(sk2,...,SO_REUSEPORT, on) and
bind(sk2, ...), but _before_ the connect(sk2) is done, sk2 is added into
the soreuseport array, with a score which is smaller than the score of
first socket sk1 found in hash table (I am speaking of the regular UDP
hash table), if sk1 had the connect() done, giving a +8 to its score.

hash bucket [X] -> sk1 -> sk2 -> NULL

sk1 score = 14 (because it did a connect())
sk2 score = 6

SO_REUSEPORT fast selection is an optimization. If it turns out the
score of the selected socket does not match the score of the first
socket, just fall back to the old SO_REUSEPORT logic instead of trying
to be too smart.

Normal SO_REUSEPORT users do not mix different kinds of sockets, as this
mechanism is used to load balance traffic.
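
The fallback described above can be modelled in a few lines of userspace C. This is only a sketch under simplifying assumptions: struct toy_sock, lookup_by_score() and lookup_reuseport() are hypothetical names, the bucket is a plain array rather than the kernel's hash list, and the "old logic" is reduced to picking the best-scoring socket.

```c
#include <stddef.h>

/* Toy stand-in for a UDP socket: only the lookup score matters here. */
struct toy_sock {
    int score;
};

/* Slow path: scan the whole bucket and keep the best-scoring socket. */
static const struct toy_sock *
lookup_by_score(const struct toy_sock *bucket, size_t n)
{
    const struct toy_sock *best = NULL;

    for (size_t i = 0; i < n; i++)
        if (!best || bucket[i].score > best->score)
            best = &bucket[i];
    return best;
}

/*
 * Fast path: index the group by hash, but trust the pick only if it
 * scores the same as the first socket found in the bucket.
 */
static const struct toy_sock *
lookup_reuseport(const struct toy_sock *bucket, size_t n, unsigned int hash)
{
    if (n == 0)
        return NULL;

    const struct toy_sock *first = &bucket[0];
    const struct toy_sock *pick = &bucket[hash % n];

    if (pick->score != first->score)
        return lookup_by_score(bucket, n);  /* scores differ: fall back */
    return pick;
}
```

With the scores from the example above (sk1 = 14, sk2 = 6), a hash that lands on sk2 no longer selects it; the lookup falls back and returns sk1.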

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Reported-by: Dmitry Vyukov
Signed-off-by: Eric Dumazet
Cc: Craig Gallek
Acked-by: Craig Gallek
Signed-off-by: David S. Miller

Eric Dumazet
2016-01-20 02:52:25 +0800

16 Jan, 2016

2 commits

4e5448a31 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"A quick set of bug fixes after the initial networking merge:

1) Netlink multicast group storage allocator only was tested with
nr_groups equal to 1, make it work for other values too. From
Matti Vaittinen.

2) Check build_skb() return value in macb and hip04_eth drivers, from
Weidong Wang.

3) Don't leak x25_asy on x25_asy_open() failure.

4) More DMA map/unmap fixes in 3c59x from Neil Horman.

5) Don't clobber IP skb control block during GSO segmentation, from
Konstantin Khlebnikov.

6) ECN helpers for ipv6 don't fixup the checksum, from Eric Dumazet.

7) Fix SKB segment utilization estimation in xen-netback, from David
Vrabel.

8) Fix lockdep splat in bridge addrlist handling, from Nikolay
Aleksandrov"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
bgmac: Fix reversed test of build_skb() return value.
bridge: fix lockdep addr_list_lock false positive splat
net: smsc: Add support h8300
xen-netback: free queues after freeing the net device
xen-netback: delete NAPI instance when queue fails to initialize
xen-netback: use skb to determine number of required guest Rx requests
net: sctp: Move sequence start handling into sctp_transport_get_idx()
ipv6: update skb->csum when CE mark is propagated
net: phy: turn carrier off on phy attach
net: macb: clear interrupts when disabling them
sctp: support to lookup with ep+paddr in transport rhashtable
net: hns: fixes no syscon error when init mdio
dts: hisi: fixes no syscon fault when init mdio
net: preserve IP control block during GSO segmentation
fsl/fman: Delete one function call "put_device" in dtsec_config()
hip04_eth: fix missing error handle for build_skb failed
3c59x: fix another page map/single unmap imbalance
3c59x: balance page maps and unmaps
x25_asy: Free x25_asy on x25_asy_open() failure.
mlxsw: fix SWITCHDEV_OBJ_ID_PORT_MDB
...

Linus Torvalds
2016-01-16 05:33:12 +0800

34ae6a1aa ipv6: update skb->csum when CE mark is propagated

When a tunnel decapsulates the outer header, it has to comply
with RFC 6040 and eventually propagate the CE mark into the inner header.

It turns out IP6_ECN_set_ce() does not correctly update skb->csum
for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure"
messages and stack traces.
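
Why the stored checksum must follow the header change can be shown with plain ones' complement arithmetic. The sketch below is a userspace model, not the kernel helper: fold16(), csum16(), csum16_replace() and set_ce_and_fix_csum() are hypothetical names, and it works on 16-bit words for brevity. It assumes the standard IPv6 layout, where the two ECN bits sit in bits 4-5 of the second header byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones' complement sum over big-endian 16-bit words (len must be even). */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)buf[i] << 8 | buf[i + 1];
    return fold16(sum);
}

/* Incremental update: remove the old word, add the new one. */
static uint16_t csum16_replace(uint16_t csum, uint16_t from, uint16_t to)
{
    return fold16((uint32_t)csum + (uint16_t)~from + to);
}

/*
 * Set the CE codepoint (both ECN bits) in an IPv6 header and return the
 * checksum adjusted for that change. Skipping the adjustment leaves a
 * stale sum, which is the situation the commit describes.
 */
static uint16_t set_ce_and_fix_csum(uint8_t *hdr, uint16_t csum)
{
    uint16_t from = (uint16_t)(hdr[0] << 8 | hdr[1]);

    hdr[1] |= 0x30;  /* ECN bits -> 11 (CE) */

    uint16_t to = (uint16_t)(hdr[0] << 8 | hdr[1]);

    return csum16_replace(csum, from, to);
}
```

The adjusted value should match a full recomputation over the modified bytes, while the old value no longer does.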

Signed-off-by: Eric Dumazet
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Eric Dumazet
2016-01-16 04:07:23 +0800

15 Jan, 2016

1 commit

baac50bbc net: tcp_memcontrol: simplify linkage between socket and page counter

There won't be any separate counters for socket memory consumed by
protocols other than TCP in the future. Remove the indirection and link
sockets directly to their owning memory cgroup.

Signed-off-by: Johannes Weiner
Reviewed-by: Vladimir Davydov
Acked-by: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2016-01-15 08:00:49 +0800

12 Jan, 2016

2 commits

9d367eddf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/bonding/bond_main.c
drivers/net/ethernet/mellanox/mlxsw/spectrum.h
drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c

The bond_main.c and mellanox switch conflicts were cases of
overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2016-01-12 12:55:43 +0800

40ba33022 udp: disallow UFO for sockets with SO_NO_CHECK option

Commit acf8dd0a9d0b ("udp: only allow UFO for packets from SOCK_DGRAM
sockets") disallows UFO for packets sent from raw sockets. We need to do
the same for SOCK_DGRAM sockets with the SO_NO_CHECK option, albeit
for a slightly different reason: while such a socket would override the
CHECKSUM_PARTIAL set by ip_ufo_append_data(), gso_size is still set and
the bad offloading flags warning is triggered in __skb_gso_segment().

In the IPv6 case, SO_NO_CHECK option is ignored but we need to disallow
UFO for packets sent by sockets with UDP_NO_CHECK6_TX option.

Signed-off-by: Michal Kubecek
Tested-by: Shannon Nelson
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Michal Kubeček
2016-01-12 06:40:57 +0800

11 Jan, 2016

2 commits

3e4006f0b ipv6: tcp: add rcu locking in tcp_v6_send_synack()

When the first SYNACK is sent, we already hold rcu_read_lock(), but this
is not true if a SYNACK is retransmitted, as a timer (soft) interrupt
does not hold rcu_read_lock().

Fixes: 45f6fad84cc30 ("ipv6: add complete rcu protection around np->opt")
Reported-by: Dave Jones
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2016-01-11 11:58:03 +0800

3d171f390 ipv6: always add flag an address that failed DAD with DADFAILED

Userspace needs to know why the address is being removed so that it can
perhaps obtain a new address.

Without the DADFAILED flag it's impossible to distinguish removal of a
temporary and tentative address due to DAD failure from other reasons (device
removed, manual address removal).
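
From the userspace side, the distinction comes down to one bit test on the address flags delivered with the removal notification. The sketch below is illustrative only: the TOY_IFA_F_* constants mirror the IFA_F_* values in <linux/if_addr.h>, the helper name is hypothetical, and obtaining the flags from the netlink message is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror IFA_F_TEMPORARY, IFA_F_DADFAILED and IFA_F_TENTATIVE. */
enum {
    TOY_IFA_F_TEMPORARY = 0x01,
    TOY_IFA_F_DADFAILED = 0x08,
    TOY_IFA_F_TENTATIVE = 0x40,
};

/* Hypothetical helper: was this address removed because DAD failed? */
static bool removed_due_to_dad_failure(uint32_t ifa_flags)
{
    return (ifa_flags & TOY_IFA_F_DADFAILED) != 0;
}
```

A temporary, tentative address without the DADFAILED bit looks the same whether the device went away or the address was deleted by hand, which is the ambiguity the flag removes.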

Signed-off-by: Lubomir Rintel
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Lubomir Rintel
2016-01-11 11:54:27 +0800

09 Jan, 2016

1 commit

9b59377b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next, they are:

1) Release nf_tables objects on netns destructions via
nft_release_afinfo().

2) Destroy basechain and rules on netdevice removal in the new netdev
family.

3) Get rid of defensive check against removal of inactive objects in
nf_tables.

4) Pass down netns pointer to our existing nfnetlink callbacks, as well
as commit() and abort() nfnetlink callbacks.

5) Allow to invert limit expression in nf_tables, so we can throttle
overlimit traffic.

6) Add packet duplication for the netdev family.

7) Add forward expression for the netdev family.

8) Define pr_fmt() in conntrack helpers.

9) Don't leave nfqueue configuration on inconsistent state in case of
errors, from Ken-ichirou MATSUZAWA, follow up patches are also from
him.

10) Skip queue option handling after unbind.

11) Return error on unknown commands in both nfqueue and nflog.

12) Autoload ctnetlink when NFQA_CFG_F_CONNTRACK is set.

13) Add new NFTA_SET_USERDATA attribute to store user data in sets,
from Carlos Falgueras.

14) Add support for 64-bit byte order changes in nf_tables, from Florian
Westphal.

15) Add conntrack byte/packet counter matching support to nf_tables,
also from Florian.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-01-09 09:53:16 +0800

06 Jan, 2016

2 commits

1134158ba soreuseport: pass skb to secondary UDP socket lookup

This socket-lookup path did not pass along the skb in question
in my original BPF-based socket selection patch. The skb in the
udpN_lib_lookup2 path can be used for BPF-based socket selection just
like it is in the 'traditional' udpN_lib_lookup path.

udpN_lib_lookup2 kicks in when there are more than 10 sockets in
the same hlist slot. Coincidentally, I chose 10 sockets per
reuseport group in my functional test, so the lookup2 path was not
exercised. This adds an additional set of tests with 20 sockets.

Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Fixes: 3ca8e4029969 ("soreuseport: BPF selection functional test")
Suggested-by: Eric Dumazet
Signed-off-by: Craig Gallek
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Craig Gallek
2016-01-06 14:28:04 +0800

a72a5e2d3 inet: kill unused skb_free op

The only user was removed in commit
029f7f3b8701cc7a ("netfilter: ipv6: nf_defrag: avoid/free clone operations").

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2016-01-06 11:25:57 +0800

05 Jan, 2016

3 commits

538950a1b soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF

Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group. These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.

This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.

Signed-off-by: Craig Gallek
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Craig Gallek
2016-01-05 11:49:59 +0800

e32ea7e74 soreuseport: fast reuseport UDP socket selection

Include a struct sock_reuseport instance when a UDP socket binds to
a specific address for the first time with the reuseport flag set.
When selecting a socket for an incoming UDP packet, use the information
available in sock_reuseport if present.

This required adding an additional field to the UDP source address
equality function to differentiate between exact and wildcard matches.
The original use case allowed wildcard matches when checking for
existing port uses during bind. The new use case of adding a socket
to a reuseport group requires exact address matching.
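
The difference between the two use cases can be sketched with a single extra parameter on the comparison. This is a simplified IPv4-only model with hypothetical names (toy_saddr_equal(), TOY_ADDR_ANY); it is not the kernel's equality function, only the shape of the exact versus wildcard distinction.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ADDR_ANY 0u  /* stand-in for the wildcard address 0.0.0.0 */

/*
 * Compare two bound source addresses.
 * match_wildcard = true:  bind-conflict check, the wildcard overlaps
 *                         with any specific address.
 * match_wildcard = false: reuseport group membership, the addresses
 *                         must be identical.
 */
static bool toy_saddr_equal(uint32_t a, uint32_t b, bool match_wildcard)
{
    if (a == b)
        return true;
    return match_wildcard && (a == TOY_ADDR_ANY || b == TOY_ADDR_ANY);
}
```

Under this model a socket bound to the wildcard address conflicts with one bound to 127.0.0.1 at bind time, yet the two do not end up in the same reuseport group.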

Performance test (using a machine with 2 CPU sockets and a total of
48 cores): Create reuseport groups of varying size. Use one socket
from this group per user thread (pinning each thread to a different
core) calling recvmmsg in a tight loop. Record number of messages
received per second while saturating a 10G link.
10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)

This work is based off a similar implementation written by
Ying Cai for implementing policy-based reuseport
selection.

Signed-off-by: Craig Gallek
Signed-off-by: David S. Miller

Craig Gallek
2016-01-05 11:49:58 +0800

197c949e7 udp: properly support MSG_PEEK with truncated buffers

The backport of this upstream commit into stable kernels:
89c22d8c3b27 ("net: Fix skb csum races when peeking")
exposed a bug in the UDP stack's MSG_PEEK support, when the user provides
a buffer smaller than the skb payload.

In this case,
skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
msg->msg_iov);
returns -EFAULT.

This bug does not happen in upstream kernels since Al Viro did a great
job of replacing this with:
skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
This variant is safe vs short buffers.

For the time being, instead of reverting Herbert Xu's patch and adding back
the invalid skb->ip_summed changes, simply store the result of
udp_lib_checksum_complete() so that we avoid computing the checksum a
second time, and avoid the problematic
skb_copy_and_csum_datagram_iovec() call.

This patch can be applied on recent kernels as it avoids a double
checksumming, then backported to stable kernels as a bug fix.

Signed-off-by: Eric Dumazet
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Eric Dumazet
2016-01-05 06:23:36 +0800

01 Jan, 2016

1 commit

c07f30ad6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

David S. Miller
2016-01-01 07:20:10 +0800

29 Dec, 2015

1 commit

df05ef874 netfilter: nf_tables: release objects on netns destruction

We have to release the existing objects on netns removal, otherwise we
leak them. Chains are unregistered first to make sure no
packets are walking on our rules and sets anymore.

The object release happens when we unregister the family via
nft_release_afinfo(), which is called from nft_unregister_afinfo() from
the corresponding __net_exit path in every family.

Reported-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2015-12-29 01:34:35 +0800

26 Dec, 2015

1 commit

039f50629 ip_tunnel: Move stats update to iptunnel_xmit()

By moving stats update into iptunnel_xmit(), we can simplify
iptunnel_xmit() usage. With this change there is no need to
call another function (iptunnel_xmit_stats()) to update stats
in tunnel xmit code path.

Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller

Pravin B Shelar
2015-12-26 12:32:23 +0800

24 Dec, 2015

1 commit

c1a9a291c ipv6: honor ifindex in case we receive ll addresses in router advertisements

Marc Haber reported we don't honor interface indexes when we receive link
local router addresses in router advertisements. Luckily the non-strict
version of ipv6_chk_addr already does the correct job here, so we can
simply use it to lighten the checks and use those addresses by default
without any configuration change.

Link:
Reported-by: Marc Haber
Cc: Marc Haber
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-12-24 11:03:54 +0800

23 Dec, 2015

5 commits

271c3b9b7 tcp: honour SO_BINDTODEVICE for TW_RST case too

Hannes points out that when we generate tcp reset for timewait sockets we
pretend we found no socket and pass NULL sk to tcp_vX_send_reset().

Make it cope with inet tw sockets and then provide tw sk.

This makes RSTs appear on correct interface when SO_BINDTODEVICE is used.

Packetdrill test case:
// want default route to be used, we rely on BINDTODEVICE
`ip route del 192.0.2.0/24 via 192.168.0.2 dev tun0`

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
// test case still works due to BINDTODEVICE
0.001 setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "tun0", 4) = 0
0.100...0.200 connect(3, ..., ...) = 0

0.100 > S 0:0(0)
0.200 < S. 0:0(0) ack 1 win 32792
0.200 > . 1:1(0) ack 1

0.210 close(3) = 0

0.210 > F. 1:1(0) ack 1 win 29200
0.300 < . 1:1(0) ack 2 win 46

// more data while in FIN_WAIT2, expect RST
1.300 < P. 1:1001(1000) ack 1 win 46

// fails without this change -- default route is used
1.301 > R 1:1(0) win 0

Reported-by: Hannes Frederic Sowa
Signed-off-by: Florian Westphal
Acked-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florian Westphal
2015-12-23 06:03:05 +0800

e46787f0d tcp: send_reset: test for non-NULL sk first

tcp_md5_do_lookup requires a full socket, so once we extend
_send_reset() to also accept timewait socket we would have to change

if (!sk && hash_location)

to something like

if ((!sk || !sk_fullsock(sk)) && hash_location) {
...
} else {
(sk && sk_fullsock(sk)) tcp_md5_do_lookup()
}

Switch the two branches: check if we have a socket first, then
fall back to a listener lookup if we saw a md5 option (hash_location).

Signed-off-by: Florian Westphal
Acked-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florian Westphal
2015-12-23 06:03:05 +0800

5449a5ca9 addrconf: always initialize sysctl table data

When sysctl performs restricted writes, it allows writing from
a middle position of a sysctl file, which requires us to initialize
the table data before calling proc_dostring() for the write case.

Fixes: 3d1bec99320d ("ipv6: introduce secret_stable to ipv6_devconf")
Reported-by: Sasha Levin
Acked-by: Hannes Frederic Sowa
Tested-by: Sasha Levin
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2015-12-23 06:00:58 +0800

024f35c55 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2015-12-22

Just one patch to fix dst_entries_init with multiple namespaces.
From Dan Streetman.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-12-23 05:26:31 +0800

e459dfeeb ipv6/addrlabel: fix ip6addrlbl_get()

ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeds,
ip6addrlbl_get() exits with '-ESRCH'. If ip6addrlbl_hold() fails,
ip6addrlbl_get() uses an ip6addrlbl_entry pointer that is about to be freed.

Fix this by inverting ip6addrlbl_hold() check.
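
The effect of the inverted test is easy to see in a small model. The names below (struct toy_entry, toy_hold(), toy_get_buggy(), toy_get_fixed()) are hypothetical, and the reference count is a plain int rather than an atomic; the only assumption carried over is that the hold helper reports success with a nonzero return.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy entry with a plain (non-atomic) reference count. */
struct toy_entry {
    int refcnt;
};

/* Take a reference unless the count already dropped to zero. */
static bool toy_hold(struct toy_entry *e)
{
    if (e->refcnt == 0)
        return false;
    e->refcnt++;
    return true;
}

/* Buggy shape: discards the entry exactly when the hold succeeded. */
static struct toy_entry *toy_get_buggy(struct toy_entry *e)
{
    if (e && toy_hold(e))
        e = NULL;
    return e;
}

/* Fixed shape: discards the entry only when the hold failed. */
static struct toy_entry *toy_get_fixed(struct toy_entry *e)
{
    if (e && !toy_hold(e))
        e = NULL;
    return e;
}
```

In the buggy shape a live entry is reported as missing, while a dying entry (count already zero) is handed back to the caller without a reference.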

Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
Signed-off-by: Andrey Ryabinin
Reviewed-by: Cong Wang
Acked-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller

Andrey Ryabinin
2015-12-23 04:57:54 +0800

19 Dec, 2015

4 commits

59ce9670c Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains the first batch of Netfilter updates for
the upcoming 4.5 kernel. This batch contains userspace netfilter header
compilation fixes, support for packet mangling in nf_tables, the new
tracing infrastructure for nf_tables and cgroup2 support for iptables.
More specifically, they are:

1) Two patches to include dependencies in our netfilter userspace
headers to resolve compilation problems, from Mikko Rapeli.

2) Four cosmetic cleanup patches for the ebtables codebase, from Ian Morris.

3) Remove duplicate include in the netfilter reject infrastructure,
from Stephen Hemminger.

4) Two patches to simplify the netfilter defragmentation code for IPv6,
patch from Florian Westphal.

5) Fix root ownership of /proc/net netfilter for unprivileged net
namespaces, from Philip Whineray.

6) Get rid of unused fields in struct nft_pktinfo, from Florian Westphal.

7) Add mangling support to our nf_tables payload expression, from
Patrick McHardy.

8) Introduce a new netlink-based tracing infrastructure for nf_tables,
from Florian Westphal.

9) Change setter functions in nfnetlink_log to be void, from
Rami Rosen.

10) Add netns support to the cttimeout infrastructure.

11) Add cgroup2 support to iptables, from Tejun Heo.

12) Introduce nfnl_dereference_protected() in nfnetlink, from Florian.

13) Add support for mangling pkttype in the nf_tables meta expression,
also from Florian.

BTW, I need that you pull net into net-next, I have another batch that
requires changes that I don't yet see in net.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-12-19 04:37:42 +0800

6dd9a14e9 net: Allow accepted sockets to be bound to l3mdev domain

Allow accepted sockets to derive their sk_bound_dev_if setting from the
l3mdev domain in which the packets originated. A sysctl setting is added
to control the behavior which is similar to sk_mark and
sysctl_tcp_fwmark_accept.

This effectively allows a process to have a "VRF-global" listen socket,
with child sockets bound to the VRF device in which the packet originated.
A similar behavior can be achieved using sk_mark, but a solution using marks
is incomplete as it does not handle duplicate addresses in different L3
domains/VRFs. Allowing sockets to inherit the sk_bound_dev_if from l3mdev
domain provides a complete solution.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-12-19 03:43:38 +0800

cc9da6cc4 ipv6: addrconf: use stable address generator for ARPHRD_NONE

Add a new address generator mode, using the stable address generator
with an automatically generated secret. This is intended as a default
address generator mode for device types with no EUI64 implementation.
The new generator is used for ARPHRD_NONE interfaces initially, adding
default IPv6 autoconf support to e.g. tun interfaces.

If the addrgenmode is set to 'random', either by default or manually,
and no stable secret is available, then a random secret is used as
input for the stable-privacy address generator. The secret can be
read and modified like manually configured secrets, using the proc
interface. Modifying the secret will change the addrgen mode to
'stable-privacy' to indicate that it operates on a known secret.

Existing behaviour of the 'stable-privacy' mode is kept unchanged. If
a known secret is available when the device is created, then the mode
will default to 'stable-privacy' as before. The mode can be manually
set to 'random' but it will behave exactly like 'stable-privacy' in
this case. The secret will not change.
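
The default selection described above can be summarised as a small decision function. This is a sketch with hypothetical names (enum toy_addr_gen_mode, toy_default_mode()); the EUI64 fallback for other device types is an assumption inferred from the wording "device types with no EUI64 implementation", not something the message states outright.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the address generation modes. */
enum toy_addr_gen_mode {
    TOY_MODE_EUI64,
    TOY_MODE_STABLE_PRIVACY,
    TOY_MODE_RANDOM,
};

/*
 * Default mode at device creation:
 *  - a known stable secret always selects 'stable-privacy';
 *  - otherwise ARPHRD_NONE devices get 'random';
 *  - otherwise EUI64 is assumed (illustrative fallback).
 */
static enum toy_addr_gen_mode
toy_default_mode(bool has_stable_secret, bool is_arphrd_none)
{
    if (has_stable_secret)
        return TOY_MODE_STABLE_PRIVACY;
    if (is_arphrd_none)
        return TOY_MODE_RANDOM;
    return TOY_MODE_EUI64;
}
```

The ordering of the two checks reflects the statement that existing 'stable-privacy' behaviour is unchanged: a configured secret wins even on ARPHRD_NONE devices.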

Cc: Hannes Frederic Sowa
Cc: 吉藤英明
Signed-off-by: Bjørn Mork
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Bjørn Mork
2015-12-19 03:41:07 +0800

8cb964dae ila: add NETFILTER dependency

The recently added generic ILA translation facility fails to
build when CONFIG_NETFILTER is disabled:

net/ipv6/ila/ila_xlat.c:229:20: warning: 'struct nf_hook_state' declared inside parameter list
net/ipv6/ila/ila_xlat.c:235:27: error: array type has incomplete element type 'struct nf_hook_ops'
static struct nf_hook_ops ila_nf_hook_ops[] __read_mostly = {

This adds an explicit Kconfig dependency to avoid that case.

Signed-off-by: Arnd Bergmann
Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility")
Signed-off-by: David S. Miller

Arnd Bergmann
2015-12-19 03:19:28 +0800

18 Dec, 2015

3 commits

b3e0d3d7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/geneve.c

Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.

Signed-off-by: David S. Miller

David S. Miller
2015-12-18 11:08:28 +0800

715f504b1 ipv6: add IPV6_HDRINCL option for raw sockets

Same as in Windows, we miss IPV6_HDRINCL for SOL_IPV6 and SOL_RAW.
The SOL_IP/IP_HDRINCL is not available for IPv6 sockets.

Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-12-18 04:12:28 +0800

32bc201e1 ipv6: allow routes to be configured with expire values

Add support for setting an expire value on routes, as requested by
Tom Gundersen for systemd-networkd; NetworkManager
wants it too.

Implement it by adding the new RTNETLINK attribute RTA_EXPIRES.

Signed-off-by: Xin Long
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Xin Long
2015-12-18 04:08:51 +0800

16 Dec, 2015

1 commit

9b29c6962 ipv6: automatically enable stable privacy mode if stable_secret set

Bjørn reported that while we switch all interfaces to privacy stable mode
when setting the secret, we don't set this mode for new interfaces. This
does not make sense, so change this behaviour.

Fixes: 622c81d57b392cc ("ipv6: generation of stable privacy addresses for link-local and autoconf")
Reported-by: Bjørn Mork
Cc: Bjørn Mork
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-12-16 12:37:32 +0800