Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

12 Jan, 2015

1 commit

2bd822180 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf ... Browse Code »

Pablo Neira Ayuso says:

====================
netfilter/ipvs fixes for net

The following patchset contains netfilter/ipvs fixes, they are:

1) Small fix for the FTP helper in IPVS, a diff variable may be left
unset when CONFIG_IP_VS_IPV6 is set. Patch from Dan Carpenter.

2) Fix nf_tables port NAT in little endian archs, patch from leroy
christophe.

3) Fix race condition between conntrack confirmation and flush from
userspace. This is the second reincarnation to resolve this problem.

4) Make sure inner messages in the batch come with the nfnetlink header.

5) Relax strict check from nfnetlink_bind() that may break old userspace
applications using all 1s group mask.

6) Schedule removal of chains once no sets and rules refer to them in
the new nf_tables ruleset flush command. Reported by Asbjoern Sloth
Toennesen.

Note that this batch comes later than usual because of the short
winter holidays.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-01-12 13:14:49 +0800

23 Dec, 2014

2 commits

7b5bca467 netfilter: nf_tables: fix port natting in little endian archs ... Browse Code »

Make sure this fetches 16-bits port data from the register.
Remove casting to make sparse happy, not needed anymore.

Signed-off-by: leroy christophe
Signed-off-by: Pablo Neira Ayuso

leroy christophe
2014-12-23 22:34:28 +0800
2dc49d168 tcp6: don't move IP6CB before xfrm6_policy_check() ... Browse Code »
13

When xfrm6_policy_check() is used, _decode_session6() is called after some
intermediate functions. This function uses IP6CB(), thus TCP_SKB_CB() must be
prepared after the call of xfrm6_policy_check().

Before this patch, scenarii with IPv6 + TCP + IPsec Transport are broken.

Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Reported-by: Huaibin Wang
Suggested-by: Eric Dumazet
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2014-12-23 05:48:01 +0800

12 Dec, 2014

1 commit

70e71ca0a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from David Miller:

1) New offloading infrastructure and example 'rocker' driver for
offloading of switching and routing to hardware.

This work was done by a large group of dedicated individuals, not
limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

2) Start making the networking operate on IOV iterators instead of
modifying iov objects in-situ during transfers. Thanks to Al Viro
and Herbert Xu.

3) A set of new netlink interfaces for the TIPC stack, from Richard
Alpe.

4) Remove unnecessary looping during ipv6 routing lookups, from Martin
KaFai Lau.

5) Add PAUSE frame generation support to gianfar driver, from Matei
Pavaluca.

6) Allow for larger reordering levels in TCP, which are easily
achievable in the real world right now, from Eric Dumazet.

7) Add a variable of napi_schedule that doesn't need to disable cpu
interrupts, from Eric Dumazet.

8) Use a doubly linked list to optimize neigh_parms_release(), from
Nicolas Dichtel.

9) Various enhancements to the kernel BPF verifier, and allow eBPF
programs to actually be attached to sockets. From Alexei
Starovoitov.

10) Support TSO/LSO in sunvnet driver, from David L Stevens.

11) Allow controlling ECN usage via routing metrics, from Florian
Westphal.

12) Remote checksum offload, from Tom Herbert.

13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
driver, from Thomas Lendacky.

14) Add MPLS support to openvswitch, from Simon Horman.

15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
Klassert.

16) Do gro flushes on a per-device basis using a timer, from Eric
Dumazet. This tries to resolve the conflicting goals between the
desired handling of bulk vs. RPC-like traffic.

17) Allow userspace to ask for the CPU upon what a packet was
received/steered, via SO_INCOMING_CPU. From Eric Dumazet.

18) Limit GSO packets to half the current congestion window, from Eric
Dumazet.

19) Add a generic helper so that all drivers set their RSS keys in a
consistent way, from Eric Dumazet.

20) Add xmit_more support to enic driver, from Govindarajulu
Varadarajan.

21) Add VLAN packet scheduler action, from Jiri Pirko.

22) Support configurable RSS hash functions via ethtool, from Eyal
Perry.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
Fix race condition between vxlan_sock_add and vxlan_sock_release
net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
net/mlx4: Add support for A0 steering
net/mlx4: Refactor QUERY_PORT
net/mlx4_core: Add explicit error message when rule doesn't meet configuration
net/mlx4: Add A0 hybrid steering
net/mlx4: Add mlx4_bitmap zone allocator
net/mlx4: Add a check if there are too many reserved QPs
net/mlx4: Change QP allocation scheme
net/mlx4_core: Use tasklet for user-space CQ completion events
net/mlx4_core: Mask out host side virtualization features for guests
net/mlx4_en: Set csum level for encapsulated packets
be2net: Export tunnel offloads only when a VxLAN tunnel is created
gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
net: fec: only enable mdio interrupt before phy device link up
net: fec: clear all interrupt events to support i.MX6SX
net: fec: reset fep link status in suspend function
net: sock: fix access via invalid file descriptor
net: introduce helper macro for_each_cmsghdr
...

Linus Torvalds
2014-12-12 06:27:06 +0800

11 Dec, 2014

3 commits

f95b414ed net: introduce helper macro for_each_cmsghdr ... Browse Code »

Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating
cmsghdr from msghdr, just cleanup.

Signed-off-by: Gu Zheng
Signed-off-by: David S. Miller

Gu Zheng
2014-12-11 11:41:55 +0800
cbfe0de30 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull VFS changes from Al Viro:
"First pile out of several (there _definitely_ will be more). Stuff in
this one:

- unification of d_splice_alias()/d_materialize_unique()

- iov_iter rewrite

- killing a bunch of ->f_path.dentry users (and f_dentry macro).

Getting that completed will make life much simpler for
unionmount/overlayfs, since then we'll be able to limit the places
sensitive to file _dentry_ to reasonably few. Which allows to have
file_inode(file) pointing to inode in a covered layer, with dentry
pointing to (negative) dentry in union one.

Still not complete, but much closer now.

- crapectomy in lustre (dead code removal, mostly)

- "let's make seq_printf return nothing" preparations

- assorted cleanups and fixes

There _definitely_ will be more piles"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
copy_from_iter_nocache()
new helper: iov_iter_kvec()
csum_and_copy_..._iter()
iov_iter.c: handle ITER_KVEC directly
iov_iter.c: convert copy_to_iter() to iterate_and_advance
iov_iter.c: convert copy_from_iter() to iterate_and_advance
iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
iov_iter.c: convert iov_iter_zero() to iterate_and_advance
iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
iov_iter.c: iterate_and_advance
iov_iter.c: macros for iterating over iov_iter
kill f_dentry macro
dcache: fix kmemcheck warning in switch_names
new helper: audit_file()
nfsd_vfs_write(): use file_inode()
ncpfs: use file_inode()
kill f_dentry uses
lockd: get rid of ->f_path.dentry->d_sb
...

Linus Torvalds
2014-12-11 08:10:49 +0800
22f10923d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/ethernet/amd/xgbe/xgbe-desc.c
drivers/net/ethernet/renesas/sh_eth.c

Overlapping changes in both conflict cases.

Signed-off-by: David S. Miller

David S. Miller
2014-12-11 04:48:20 +0800

10 Dec, 2014

5 commits

0f85feae6 tcp: fix more NULL deref after prequeue changes ... Browse Code »

When I cooked commit c3658e8d0f1 ("tcp: fix possible NULL dereference in
tcp_vX_send_reset()") I missed other spots we could deref a NULL
skb_dst(skb)

Again, if a socket is provided, we do not need skb_dst() to get a
pointer to network namespace : sock_net(sk) is good enough.

Reported-by: Dann Frazier
Bisected-by: Dann Frazier
Tested-by: Dann Frazier
Signed-off-by: Eric Dumazet
Fixes: ca777eff51f7 ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller

Eric Dumazet
2014-12-10 10:38:44 +0800
c0371da60 put iov_iter into msghdr ... Browse Code »
2

Note that the code _using_ ->msg_iter at that point will be very
unhappy with anything other than unshifted iovec-backed iov_iter.
We still need to convert users to proper primitives.

Signed-off-by: Al Viro

Al Viro
2014-12-10 05:29:03 +0800
f69e6d131 ip_generic_getfrag, udplite_getfrag: switch to passing msghdr ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2014-12-10 05:28:22 +0800
19e3c66b5 ipv6 equivalent of "ipv4: Avoid reading user iov twice after raw_probe_proto_opt" ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2014-12-10 05:28:21 +0800
86fe8f892 ipv6: remove useless spin_lock/spin_unlock ... Browse Code »

xchg is atomic, so there is no necessary to use spin_lock/spin_unlock
to protect it. At last, remove the redundant
opt = xchg(&inet6_sk(sk)->opt, opt); statement.

Signed-off-by: Duan Jiong
Signed-off-by: David S. Miller

Duan Jiong
2014-12-10 02:18:09 +0800

09 Dec, 2014

4 commits

6db70e3e1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next ... Browse Code »

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2014-12-03

1) Fix a set but not used warning. From Fabian Frederick.

2) Currently we make sequence number values available to userspace
only if we use ESN. Make the sequence number values also available
for non ESN states. From Zhi Ding.

3) Remove socket policy hashing. We don't need it because socket
policies are always looked up via a linked list. From Herbert Xu.

4) After removing socket policy hashing, we can use __xfrm_policy_link
in xfrm_policy_insert. From Herbert Xu.

5) Add a lookup method for vti6 tunnels with wildcard endpoints.
I forgot this when I initially implemented vti6.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller

David S. Miller
2014-12-09 10:30:21 +0800
ba00410b8 Merge branch 'iov_iter' into for-next Browse Code »

Al Viro
2014-12-09 09:39:29 +0800
60c04aecd udp: Neaten and reduce size of compute_score functions ... Browse Code »

The compute_score functions are a bit difficult to read.

Neaten them a bit to reduce object sizes and make them a
bit more intelligible.

Return early to avoid indentation and avoid unnecessary
initializations.

(allyesconfig, but w/ -O2 and no profiling)

$ size net/ipv[46]/udp.o.*
text data bss dec hex filename
28680 1184 25 29889 74c1 net/ipv4/udp.o.new
28756 1184 25 29965 750d net/ipv4/udp.o.old
17600 1010 2 18612 48b4 net/ipv6/udp.o.new
17632 1010 2 18644 48d4 net/ipv6/udp.o.old

Signed-off-by: Joe Perches
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Joe Perches
2014-12-09 09:28:47 +0800
829ae9d61 net-timestamp: allow reading recv cmsg on errqueue with origin tstamp ... Browse Code »
2

Allow reading of timestamps and cmsg at the same time on all relevant
socket families. One use is to correlate timestamps with egress
device, by asking for cmsg IP_PKTINFO.

on AF_INET sockets, call the relevant function (ip_cmsg_recv). To
avoid changing legacy expectations, only do so if the caller sets a
new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.

on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
returned for all origins. only change is to set ifindex, which is
not initialized for all error origins.

In both cases, only generate the pktinfo message if an ifindex is
known. This is not the case for ACK timestamps.

The difference between the protocol families is probably a historical
accident as a result of the different conditions for generating cmsg
in the relevant ip(v6)_recv_error function:

ipv4: if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
ipv6: if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {

At one time, this was the same test bar for the ICMP/ICMP6
distinction. This is no longer true.

Signed-off-by: Willem de Bruijn

----

Changes
v1 -> v2
large rewrite
- integrate with existing pktinfo cmsg generation code
- on ipv4: only send with new flag, to maintain legacy behavior
- on ipv6: send at most a single pktinfo cmsg
- on ipv6: initialize fields if not yet initialized

The recv cmsg interfaces are also relevant to the discussion of
whether looping packet headers is problematic. For v6, cmsgs that
identify many headers are already returned. This patch expands
that to v4. If it sounds reasonable, I will follow with patches

1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
(http://patchwork.ozlabs.org/patch/366967/)
2. sysctl to conditionally drop all timestamps that have payload or
cmsg from users without CAP_NET_RAW.
Signed-off-by: David S. Miller

Willem de Bruijn
2014-12-09 09:20:48 +0800

06 Dec, 2014

1 commit

244ebd9f8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains netfilter updates for net-next. Basically,
enhancements for xt_recent, skip zeroing of timer in conntrack, fix
linking problem with recent redirect support for nf_tables, ipset
updates and a couple of cleanups. More specifically, they are:

1) Rise maximum number per IP address to be remembered in xt_recent
while retaining backward compatibility, from Florian Westphal.

2) Skip zeroing timer area in nf_conn objects, also from Florian.

3) Inspect IPv4 and IPv6 traffic from the bridge to allow filtering using
using meta l4proto and transport layer header, from Alvaro Neira.

4) Fix linking problems in the new redirect support when CONFIG_IPV6=n
and IP6_NF_IPTABLES=n.

And ipset updates from Jozsef Kadlecsik:

5) Support updating element extensions when the set is full (fixes
netfilter bugzilla id 880).

6) Fix set match with 32-bits userspace / 64-bits kernel.

7) Indicate explicitly when /0 networks are supported in ipset.

8) Simplify cidr handling for hash:*net* types.

9) Allocate the proper size of memory when /0 networks are supported.

10) Explicitly add padding elements to hash:net,net and hash:net,port,
because the elements must be u32 sized for the used hash function.

Jozsef is also cooking ipset RCU conversion which should land soon if
they reach the merge window in time.
====================

Signed-off-by: David S. Miller

David S. Miller
2014-12-06 12:56:46 +0800

30 Nov, 2014

1 commit

60b7379dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2014-11-30 12:47:48 +0800

29 Nov, 2014

1 commit

4338c5725 netfilter: nf_log_ipv6: correct typo in module description ... Browse Code »

It incorrectly identifies itself as "IPv4" packet logging.

Signed-off-by: Steven Noonan
Signed-off-by: Pablo Neira Ayuso

Steven Noonan
2014-11-29 00:28:17 +0800

27 Nov, 2014

2 commits

b59eaf9e2 netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one module ... Browse Code »

This resolves linking problems with CONFIG_IPV6=n:

net/built-in.o: In function `redirect_tg6':
xt_REDIRECT.c:(.text+0x6d021): undefined reference to `nf_nat_redirect_ipv6'

Reported-by: Andreas Ruprecht
Reported-by: Or Gerlitz
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2014-11-27 20:08:42 +0800
73cf0e923 ipv6: Remove unnecessary test ... Browse Code »

The "init_net" test in function addrconf_exit_net is introduced
in commit 44a6bd29 [Create ipv6 devconf-s for namespaces] to avoid freeing
init_net. In commit c900a800 [ipv6: fix bad free of addrconf_init_net],
function addrconf_init_net will allocate memory for every net regardless of
init_net. In this case, it is unnecessary to make "init_net" test.

CC: Hong Zhiguo
CC: Octavian Purdila
CC: Pavel Emelyanov
CC: Cong Wang
Suggested-by: David S. Miller
Signed-off-by: Zhu Yanjun
Signed-off-by: David S. Miller

zhuyj
2014-11-27 01:27:04 +0800

26 Nov, 2014

3 commits

d3fc6b3fd Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

More work from Al Viro to move away from modifying iovecs
by using iov_iter instead.

Signed-off-by: David S. Miller

David S. Miller
2014-11-26 09:02:51 +0800
c3658e8d0 tcp: fix possible NULL dereference in tcp_vX_send_reset() ... Browse Code »
13

After commit ca777eff51f7 ("tcp: remove dst refcount false sharing for
prequeue mode") we have to relax check against skb dst in
tcp_v[46]_send_reset() if prequeue dropped the dst.

If a socket is provided, a full lookup was done to find this socket,
so the dst test can be skipped.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88191
Reported-by: Jaša Bartelj
Signed-off-by: Eric Dumazet
Reported-by: Daniel Borkmann
Fixes: ca777eff51f7 ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller

Eric Dumazet
2014-11-26 03:29:18 +0800
f3750817a ip6_udp_tunnel: Fix checksum calculation ... Browse Code »

The UDP checksum calculation for VXLAN tunnels is currently using the
socket addresses instead of the actual packet source and destination
addresses. As a result the checksum calculated is incorrect in some
cases.

Also uh->check was being set twice, first it was set to 0, and then it is
set again in udp6_set_csum. This change removes the redundant assignment
to 0.

Fixes: acbf74a7 ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.")

Cc: Andy Zhou
Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller

Alexander Duyck
2014-11-26 03:12:12 +0800

25 Nov, 2014

2 commits

be6572fdb ipv6: gre: fix wrong skb->protocol in WCCP ... Browse Code »
3

When using GRE redirection in WCCP, it sets the wrong skb->protocol,
that is, ETH_P_IP instead of ETH_P_IPV6 for the encapuslated traffic.

Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Cc: Dmitry Kozlov
Signed-off-by: Yuri Chislov
Tested-by: Yuri Chislov
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Yuri Chislov
2014-11-25 05:11:05 +0800
958d03b01 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next ... Browse Code »

Pablo Neira Ayuso says:

====================
netfilter/ipvs updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, this includes the NAT redirection support for nf_tables, the
cgroup support for nft meta and conntrack zone support for the connlimit
match. Coming after those, a bunch of sparse warning fixes, missing
netns bits and cleanups. More specifically, they are:

1) Prepare IPv4 and IPv6 NAT redirect code to use it from nf_tables,
patches from Arturo Borrero.

2) Introduce the nf_tables redir expression, from Arturo Borrero.

3) Remove an unnecessary assignment in ip_vs_xmit/__ip_vs_get_out_rt().
Patch from Alex Gartrell.

4) Add nft_log_dereference() macro to the nf_log infrastructure, patch
from Marcelo Leitner.

5) Add some extra validation when registering logger families, also
from Marcelo.

6) Some spelling cleanups from stephen hemminger.

7) Fix sparse warning in nf_logger_find_get().

8) Add cgroup support to nf_tables meta, patch from Ana Rey.

9) A Kconfig fix for the new redir expression and fix sparse warnings in
the new redir expression.

10) Fix several sparse warnings in the netfilter tree, from
Florian Westphal.

11) Reduce verbosity when OOM in nfnetlink_log. User can basically do
nothing when this situation occurs.

12) Add conntrack zone support to xt_connlimit, again from Florian.

13) Add netnamespace support to the h323 conntrack helper, contributed
by Vasily Averin.

14) Remove unnecessary nul-pointer checks before free_percpu() and
module_put(), from Markus Elfring.

15) Use pr_fmt in nfnetlink_log, again patch from Marcelo Leitner.
====================

Signed-off-by: David S. Miller

David S. Miller
2014-11-25 05:00:58 +0800

24 Nov, 2014

4 commits

227158db1 new helper: skb_copy_and_csum_datagram_msg() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2014-11-24 17:28:44 +0800
20ea60ca9 ip_tunnel: the lack of vti_link_ops' dellink() cause kernel panic ... Browse Code »
3

Now the vti_link_ops do not point the .dellink, for fb tunnel device
(ip_vti0), the net_device will be removed as the default .dellink is
unregister_netdevice_queue,but the tunnel still in the tunnel list,
then if we add a new vti tunnel, in ip_tunnel_find():

hlist_for_each_entry_rcu(t, head, hash_node) {
if (local == t->parms.iph.saddr &&
remote == t->parms.iph.daddr &&
link == t->parms.link &&
==> type == t->dev->type &&
ip_tunnel_key_match(&t->parms, flags, key))
break;
}

the panic will happen, cause dev of ip_tunnel *t is null:
[ 3835.072977] IP: [] ip_tunnel_find+0x9d/0xc0 [ip_tunnel]
[ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0
[ 3835.073008] Oops: 0000 [#1] SMP
.....
[ 3835.073008] Stack:
[ 3835.073008] ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0
[ 3835.073008] ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858
[ 3835.073008] ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000
[ 3835.073008] Call Trace:
[ 3835.073008] [] ip_tunnel_newlink+0x64/0x160 [ip_tunnel]
[ 3835.073008] [] vti_newlink+0x43/0x70 [ip_vti]
[ 3835.073008] [] rtnl_newlink+0x4fa/0x5f0
[ 3835.073008] [] ? nla_strlcpy+0x5b/0x70
[ 3835.073008] [] ? rtnl_link_ops_get+0x40/0x60
[ 3835.073008] [] ? rtnl_newlink+0x13f/0x5f0
[ 3835.073008] [] rtnetlink_rcv_msg+0xa4/0x270
[ 3835.073008] [] ? sock_has_perm+0x75/0x90
[ 3835.073008] [] ? rtnetlink_rcv+0x30/0x30
[ 3835.073008] [] netlink_rcv_skb+0xa9/0xc0
[ 3835.073008] [] rtnetlink_rcv+0x28/0x30
....

modprobe ip_vti
ip link del ip_vti0 type vti
ip link add ip_vti0 type vti
rmmod ip_vti

do that one or more times, kernel will panic.

fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, in
which we skip the unregister of fb tunnel device. do the same on ip6_vti.

Signed-off-by: Xin Long
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

lucien
2014-11-24 10:11:17 +0800
e5d08d718 ipv6: coding style improvements (remove assignment in if statements) ... Browse Code »

This change has no functional impact and simply addresses some coding
style issues detected by checkpatch. Specifically this change
adjusts "if" statements which also include the assignment of a
variable.

No changes to the resultant object files result as determined by objdiff.

Signed-off-by: Ian Morris
Signed-off-by: David S. Miller

Ian Morris
2014-11-24 10:00:56 +0800
b6fef4c6b ipv6: Do not treat a GSO_TCPV4 request from UDP tunnel over IPv6 as invalid ... Browse Code »

This patch adds SKB_GSO_TCPV4 to the list of supported GSO types handled by
the IPv6 GSO offloads. Without this change VXLAN tunnels running over IPv6
do not currently handle IPv4 TCP TSO requests correctly and end up handing
the non-segmented frame off to the device.

Below is the before and after for a simple netperf TCP_STREAM test between
two endpoints tunneling IPv4 over a VXLAN tunnel running on IPv6 on top of
a 1Gb/s network adapter.

Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec

87380 16384 16384 10.29 0.88 Before
87380 16384 16384 10.03 895.69 After

Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller

Alexander Duyck
2014-11-24 03:18:11 +0800

22 Nov, 2014

1 commit

145914338 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/ieee802154/fakehard.c

A bug fix went into 'net' for ieee802154/fakehard.c, which is removed
in 'net-next'.

Add build fix into the merge from Stephen Rothwell in openvswitch, the
logging macros take a new initial 'log' argument, a new call was added
in 'net' so when we merge that in here we have to explicitly add the
new 'log' arg to it else the build fails.

Signed-off-by: David S. Miller

David S. Miller
2014-11-22 11:28:24 +0800

20 Nov, 2014

2 commits

fbe68ee87 vti6: Add a lookup method for tunnels with wildcard endpoints. ... Browse Code »

Currently we can't lookup tunnels with wildcard endpoints.
This patch adds a method to lookup these tunnels in the
receive path.

Signed-off-by: Steffen Klassert

Steffen Klassert
2014-11-20 17:03:07 +0800
ffb1388a3 ipv6: delete protocol and unregister rtnetlink when cleanup ... Browse Code »

pim6_protocol was added when initiation, but it not deleted.
Similarly, unregister RTNL_FAMILY_IP6MR rtnetlink.

Signed-off-by: Duan Jiong
Reviewed-by: Cong Wang
Signed-off-by: David S. Miller

Duan Jiong
2014-11-20 05:56:17 +0800

19 Nov, 2014

1 commit

e3e321702 icmp: Remove some spurious dropped packet profile hits from the ICMP path ... Browse Code »

If icmp_rcv() has successfully processed the incoming ICMP datagram, we
should use consume_skb() rather than kfree_skb() because a hit on the likes
of perf -e skb:kfree_skb is not called-for.

Signed-off-by: Rick Jones
Signed-off-by: David S. Miller

Rick Jones
2014-11-19 04:28:28 +0800

17 Nov, 2014

2 commits

feb91a02c ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs ... Browse Code »

It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():

skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
dev:port1
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:100!
invalid opcode: 0000 [#1] SMP
Modules linked in: ixgbe(O)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
[...]
Call Trace:

[] ? skb_put+0x3a/0x3b
[] ? add_grhead+0x45/0x8e
[] ? add_grec+0x394/0x3d4
[] ? mld_ifc_timer_expire+0x195/0x20d
[] ? mld_dad_timer_expire+0x45/0x45
[] ? call_timer_fn.isra.29+0x12/0x68
[] ? run_timer_softirq+0x163/0x182
[] ? __do_softirq+0xe0/0x21d
[] ? irq_exit+0x4e/0xd3
[] ? smp_apic_timer_interrupt+0x3b/0x46
[] ? apic_timer_interrupt+0x6a/0x70

mld_newpack() skb allocations are usually requested with dev->mtu
in size, since commit 72e09ad107e7 ("ipv6: avoid high order allocations")
we have changed the limit in order to be less likely to fail.

However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.

The IGMP case doesn't have this issue as commit 57e1ab6eaddc
("igmp: refine skb allocations") stores the allocation size in
the cb[].

Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().

Reported-by: Wei Liu
Fixes: 72e09ad107e7 ("ipv6: avoid high order allocations")
Signed-off-by: Daniel Borkmann
Cc: Eric Dumazet
Cc: Hannes Frederic Sowa
Cc: David L Stevens
Acked-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Daniel Borkmann
2014-11-17 05:55:06 +0800
f1227c5c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter updates for your net tree,
they are:

1) Fix missing initialization of the range structure (allocated in the
stack) in nft_masq_{ipv4, ipv6}_eval, from Daniel Borkmann.

2) Make sure the data we receive from userspace contains the req_version
structure, otherwise return an error incomplete on truncated input.
From Dan Carpenter.

3) Fix handling og skb->sk which may cause incorrect handling
of connections from a local process. Via Simon Horman, patch from
Calvin Owens.

4) Fix wrong netns in nft_compat when setting target and match params
structure.

5) Relax chain type validation in nft_compat that was recently included,
this broke the matches that need to be run from the route chain type.
Now iptables-test.py automated regression tests report success again
and we avoid the only possible problematic case, which is the use of
nat targets out of nat chain type.

6) Use match->table to validate the tablename, instead of the match->name.
Again patch for nft_compat.

7) Restore the synchronous release of objects from the commit and abort
path in nf_tables. This is causing two major problems: splats when using
nft_compat, given that matches and targets may sleep and call_rcu is
invoked from softirq context. Moreover Patrick reported possible event
notification reordering when rules refer to anonymous sets.

8) Fix race condition in between packets that are being confirmed by
conntrack and the ctnetlink flush operation. This happens since the
removal of the central spinlock. Thanks to Jesper D. Brouer to looking
into this.
====================

Signed-off-by: David S. Miller

David S. Miller
2014-11-17 03:23:56 +0800

13 Nov, 2014

1 commit

567686443 netfilter: fix various sparse warnings ... Browse Code »

net/bridge/br_netfilter.c:870:6: symbol 'br_netfilter_enable' was not declared. Should it be static?
no; add include
net/ipv4/netfilter/nft_reject_ipv4.c:22:6: symbol 'nft_reject_ipv4_eval' was not declared. Should it be static?
yes
net/ipv6/netfilter/nf_reject_ipv6.c:16:6: symbol 'nf_send_reset6' was not declared. Should it be static?
no; add include
net/ipv6/netfilter/nft_reject_ipv6.c:22:6: symbol 'nft_reject_ipv6_eval' was not declared. Should it be static?
yes
net/netfilter/core.c:33:32: symbol 'nf_ipv6_ops' was not declared. Should it be static?
no; add include
net/netfilter/xt_DSCP.c:40:57: cast truncates bits from constant value (ffffff03 becomes 3)
net/netfilter/xt_DSCP.c:57:59: cast truncates bits from constant value (ffffff03 becomes 3)
add __force, 3 is what we want.
net/ipv4/netfilter/nf_log_arp.c:77:6: symbol 'nf_log_arp_packet' was not declared. Should it be static?
yes
net/ipv4/netfilter/nf_reject_ipv4.c:17:6: symbol 'nf_send_reset' was not declared. Should it be static?
no; add include

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2014-11-13 19:14:42 +0800

12 Nov, 2014

3 commits

d7480fd3b neigh: remove dynamic neigh table registration support ... Browse Code »
13

Currently there are only three neigh tables in the whole kernel:
arp table, ndisc table and decnet neigh table. What's more,
we don't support registering multiple tables per family.
Therefore we can just make these tables statically built-in.

Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2014-11-12 04:23:54 +0800
ba7a46f16 net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited ... Browse Code »

Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.

All messages are still ratelimited.

Some KERN_ uses are changed to KERN_DEBUG.

This may have some negative impact on messages that were
emitted at KERN_INFO that are not not enabled at all unless
DEBUG is defined or dynamic_debug is enabled. Even so,
these messages are now _not_ emitted by default.

This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings". For backward compatibility,
the sysctl is not removed, but it has no function. The extern
declaration of net_msg_warn is removed from sock.h and made
static in net/core/sysctl_net_core.c

Miscellanea:

o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign arguments

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller

Joe Perches
2014-11-12 03:10:31 +0800
2c8c56e15 net: introduce SO_INCOMING_CPU ... Browse Code »
5

Alternative to RPS/RFS is to use hardware support for multiple
queues.

Then split a set of million of sockets into worker threads, each
one using epoll() to manage events on its own socket pool.

Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know after accept() or connect() on which queue/cpu a socket is managed.

We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering on socket structure which cpu delivered last packet
is enough to solve the problem.

After accept(), connect(), or even file descriptor passing around
processes, applications can use :

int cpu;
socklen_t len = sizeof(cpu);

getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);

And use this information to put the socket into the right silo
for optimal performance, as all networking stack should run
on the appropriate cpu, without need to send IPI (RPS/RFS).

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2014-11-12 02:00:06 +0800