Eric Lee / smarc-fsl-linux-kernel

27 Nov, 2020

1 commit

4bc3c8dc9 ipvs: fix possible memory leak in ip_vs_control_net_init

kmemleak reports a memory leak as follows:

BUG: memory leak
unreferenced object 0xffff8880759ea000 (size 256):
backtrace:
kmem_cache_zalloc include/linux/slab.h:656 [inline]
__proc_create+0x23d/0x7d0 fs/proc/generic.c:421
proc_create_reg+0x8e/0x140 fs/proc/generic.c:535
proc_create_net_data+0x8c/0x1b0 fs/proc/proc_net.c:126
ip_vs_control_net_init+0x308/0x13a0 net/netfilter/ipvs/ip_vs_ctl.c:4169
__ip_vs_init+0x211/0x400 net/netfilter/ipvs/ip_vs_core.c:2429
ops_init+0xa8/0x3c0 net/core/net_namespace.c:151
setup_net+0x2de/0x7e0 net/core/net_namespace.c:341
copy_net_ns+0x27d/0x530 net/core/net_namespace.c:482
create_new_namespaces+0x382/0xa30 kernel/nsproxy.c:110
copy_namespaces+0x2e6/0x3b0 kernel/nsproxy.c:179
copy_process+0x220a/0x5f00 kernel/fork.c:2072
_do_fork+0xc7/0xda0 kernel/fork.c:2428
__do_sys_clone3+0x18a/0x280 kernel/fork.c:2703
do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9

In the error path of ip_vs_control_net_init(), remove_proc_entry() needs
to be called to remove the added proc entry, otherwise a memory leak
will occur.

Also, add some '#ifdef CONFIG_PROC_FS' guards, because the proc_create_net*
helpers return NULL when CONFIG_PROC_FS is not enabled.
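
The unwinding described above can be sketched in user-space C. This is a minimal model under stated assumptions, not the kernel code: `fake_proc_create()`, `fake_proc_remove()`, `control_init()`, and `control_cleanup()` are hypothetical stand-ins for `proc_create_net*()`, `remove_proc_entry()`, and the IPVS init and cleanup paths.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Live fake entries; a non-zero value after a failed init would be a leak. */
static int live_entries;

struct ctl { void *a, *b, *c; };

/* Hypothetical stand-in for proc_create_net*(): returns NULL on failure. */
static void *fake_proc_create(bool fail)
{
	void *p = fail ? NULL : malloc(16);

	if (p)
		live_entries++;
	return p;
}

/* Hypothetical stand-in for remove_proc_entry(). */
static void fake_proc_remove(void *entry)
{
	free(entry);
	live_entries--;
}

/* Create three entries; on failure, remove the ones already created. */
static int control_init(struct ctl *ctl, int fail_at)
{
	ctl->a = fake_proc_create(fail_at == 1);
	if (!ctl->a)
		goto err;
	ctl->b = fake_proc_create(fail_at == 2);
	if (!ctl->b)
		goto err_a;
	ctl->c = fake_proc_create(fail_at == 3);
	if (!ctl->c)
		goto err_b;
	return 0;

err_b:
	fake_proc_remove(ctl->b);
err_a:
	fake_proc_remove(ctl->a);
err:
	return -1;
}

/* Normal teardown: remove entries in reverse order of creation. */
static void control_cleanup(struct ctl *ctl)
{
	fake_proc_remove(ctl->c);
	fake_proc_remove(ctl->b);
	fake_proc_remove(ctl->a);
}
```

Each label undoes exactly the steps that succeeded before the failure, so a failed init leaves `live_entries` unchanged.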

Fixes: b17fc9963f83 ("IPVS: netns, ip_vs_stats and its procfs")
Fixes: 61b1ab4583e2 ("IPVS: netns, add basic init per netns.")
Reported-by: Hulk Robot
Signed-off-by: Wang Hai
Acked-by: Julian Anastasov
Signed-off-by: Pablo Neira Ayuso

Wang Hai
2020-11-27 19:10:46 +0800

30 Oct, 2020

1 commit

46d6c5ae9 netfilter: use actual socket sk rather than skb sk when routing harder

If netfilter changes the packet mark when mangling, the packet is
rerouted using the route_me_harder set of functions. Prior to this
commit, there's one big difference between route_me_harder and the
ordinary initial routing functions, described in the comment above
__ip_queue_xmit():

/* Note: skb->sk can be different from sk, in case of tunnels */
int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,

That function goes on to correctly make use of sk->sk_bound_dev_if,
rather than skb->sk->sk_bound_dev_if. And indeed the comment is true: a
tunnel will receive a packet in ndo_start_xmit with an initial skb->sk.
It will make some transformations to that packet, and then it will send
the encapsulated packet out of a *new* socket. That new socket will
basically always have a different sk_bound_dev_if (otherwise there'd be
a routing loop). So for the purposes of routing the encapsulated packet,
the routing information as it pertains to the socket should come from
that socket's sk, rather than the packet's original skb->sk. For that
reason __ip_queue_xmit() and related functions all do the right thing.

One might argue that all tunnels should just call skb_orphan(skb) before
transmitting the encapsulated packet into the new socket. But tunnels do
*not* do this -- and this is wisely avoided in skb_scrub_packet() too --
because features like TSQ rely on skb->destructor() being called when
that buffer space is truly available again. Calling skb_orphan(skb) too
early would result in buffers filling up unnecessarily and accounting
info being all wrong. Instead, additional routing must take into account
the new sk, just as __ip_queue_xmit() notes.

So, this commit addresses the problem by fishing the correct sk out of
state->sk -- it's already set properly in the call to nf_hook() in
__ip_local_out(), which receives the sk as part of its normal
functionality. So we make sure to plumb state->sk through the various
route_me_harder functions, and then make correct use of it following the
example of __ip_queue_xmit().
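
As a rough illustration of the difference, the following user-space sketch models a tunnel where the transmitting socket differs from skb->sk. The structs are heavily simplified stand-ins for the kernel types, and `reroute_oif()` and `reroute_oif_old()` are hypothetical names, not functions from the patch.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct sock { int sk_bound_dev_if; };
struct sk_buff { struct sock *sk; };

/* Model of the fixed behaviour: the bound device comes from the socket
 * doing the transmit (the one carried in state->sk), not from skb->sk. */
static int reroute_oif(const struct sock *sk, const struct sk_buff *skb)
{
	(void)skb; /* skb->sk may still point at the original, inner socket */
	return sk ? sk->sk_bound_dev_if : 0;
}

/* Model of the old behaviour, kept only for comparison. */
static int reroute_oif_old(const struct sk_buff *skb)
{
	return skb->sk ? skb->sk->sk_bound_dev_if : 0;
}
```

With an inner socket bound to device 3 and a tunnel socket bound to device 7, the old model picks 3 and the new one picks 7 for the encapsulated packet.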

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason A. Donenfeld
Reviewed-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Jason A. Donenfeld
2020-10-30 19:57:39 +0800

20 Oct, 2020

1 commit

79dce09ab ipvs: adjust the debug info in function set_tcp_state

Output the client, virtual, and destination addresses when the TCP state
changes, which makes connection debugging clearer.

Signed-off-by: longguang.yue
Acked-by: Julian Anastasov
Signed-off-by: Pablo Neira Ayuso

longguang.yue
2020-10-20 19:54:46 +0800

16 Oct, 2020

1 commit

2295cddf9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Minor conflicts in net/mptcp/protocol.h and
tools/testing/selftests/net/Makefile.

In both cases code was added on both sides in the same place
so just keep both.

Signed-off-by: Jakub Kicinski

Jakub Kicinski
2020-10-16 03:43:21 +0800

12 Oct, 2020

2 commits

7980d2eab ipvs: clear skb->tstamp in forwarding path

fq qdisc requires tstamp to be cleared in forwarding path

Reported-by: Evgeny B
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209427
Suggested-by: Eric Dumazet
Fixes: 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths")
Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
Signed-off-by: Julian Anastasov
Reviewed-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso

Julian Anastasov
2020-10-12 07:59:41 +0800
073b04e76 ipvs: inspect reply packets from DR/TUN real servers

Just like for MASQ, inspect the reply packets coming from DR/TUN
real servers and alter the connection's state and timeout
according to the protocol.

IPVS is responsible for keeping traffic statistics for packets that hit
a connection, regardless of the forwarding mode.

Signed-off-by: longguang.yue
Signed-off-by: Julian Anastasov
Signed-off-by: Pablo Neira Ayuso

longguang.yue
2020-10-12 07:57:34 +0800

05 Oct, 2020

1 commit

321e921da Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Rename 'searched' column to 'clashres' in conntrack /proc/ stats
to amend a recent patch, from Florian Westphal.

2) Remove unused nft_data_debug(), from YueHaibing.

3) Remove unused definitions in IPVS, also from YueHaibing.

4) Fix user data memleak in tables and objects, this is also amending
a recent patch, from Jose M. Guisado.

5) Use nla_memdup() to allocate user data in table and objects, also
from Jose M. Guisado

6) User data support for chains, from Jose M. Guisado

7) Remove unused definition in nf_tables_offload, from YueHaibing.

8) Use kvzalloc() in ip_set_alloc(), from Vasily Averin.

9) Fix false positive reported by lockdep in nfnetlink mutexes,
from Florian Westphal.

10) Extend fast variant of cmp for neq operation, from Phil Sutter.

11) Implement fast bitwise variant, also from Phil Sutter.
====================

Signed-off-by: David S. Miller

David S. Miller
2020-10-05 05:35:53 +0800

03 Oct, 2020

1 commit

66a9b9287 genetlink: move to smaller ops wherever possible

Bulk of the genetlink users can use smaller ops, move them.

Signed-off-by: Jakub Kicinski
Reviewed-by: Johannes Berg
Signed-off-by: David S. Miller

Jakub Kicinski
2020-10-03 10:11:11 +0800

22 Sep, 2020

1 commit

18cd9b00f ipvs: Remove unused macros

They are not used since commit e4ff67513096 ("ipvs: add
sync_maxlen parameter for the sync daemon")

Signed-off-by: YueHaibing
Acked-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso

YueHaibing
2020-09-22 07:36:20 +0800

10 Sep, 2020

1 commit

d85427e3c Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Rewrite inner header IPv6 in ICMPv6 messages in ip6t_NPT,
from Michael Zhou.

2) do_ip_vs_set_ctl() dereferences uninitialized value,
from Peilin Ye.

3) Support for userdata in tables, from Jose M. Guisado.

4) Do not increment ct error and invalid stats at the same time,
from Florian Westphal.

5) Remove ct ignore stats, also from Florian.

6) Add ct stats for clash resolution, from Florian Westphal.

7) Bump reference counter on ct clash resolution only,
this is safe because bucket lock is held, again from Florian.

8) Use ip_is_fragment() in xt_HMARK, from YueHaibing.

9) Add wildcard support for nft_socket, from Balazs Scheidler.

10) Remove superfluous IPVS dependency on iptables, from
Yaroslav Bolyukin.

11) Remove unused definition in ebt_stp, from Wang Hai.

12) Replace CONFIG_NFT_CHAIN_NAT_{IPV4,IPV6} by CONFIG_NFT_NAT
in selftests/net, from Fabian Frederick.

13) Add userdata support for nft_object, from Jose M. Guisado.
====================

Signed-off-by: David S. Miller

David S. Miller
2020-09-10 02:21:19 +0800

01 Sep, 2020

1 commit

144b0a0e6 ipvs: remove dependency on ip6_tables

This dependency was added because ipv6_find_hdr was in iptables specific
code but is no longer required

Fixes: f8f626754ebe ("ipv6: Move ipv6_find_hdr() out of Netfilter code.")
Fixes: 63dca2c0b0e7 ("ipvs: Fix faulty IPv6 extension header handling in IPVS")
Signed-off-by: Yaroslav Bolyukin
Acked-by: Julian Anastasov
Signed-off-by: Pablo Neira Ayuso

Yaroslav Bolyukin
2020-09-01 05:06:51 +0800

29 Aug, 2020

1 commit

c5a8a8498 ipvs: Fix uninit-value in do_ip_vs_set_ctl()

do_ip_vs_set_ctl() is referencing uninitialized stack value when `len` is
zero. Fix it.
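
The class of bug can be sketched in user-space C as follows. This is not the actual fix: the command table, `set_ctl()`, and the `CMD_*` names are hypothetical, and `memcpy()` stands in for the copy from user space. The sketch only shows the general idea of validating `len` and never reading stack bytes that were not written.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARG_LEN 64

/* Hypothetical command table: expected argument length per command. */
enum { CMD_FLUSH, CMD_ADD, CMD_MAX };
static const size_t arglen[CMD_MAX] = {
	[CMD_FLUSH] = 0,
	[CMD_ADD]   = sizeof(unsigned),
};

static int set_ctl(int cmd, const void *user, size_t len, unsigned *out)
{
	/* Zero-initialised so no indeterminate stack bytes are ever read. */
	unsigned char arg[MAX_ARG_LEN] = { 0 };

	if (cmd < 0 || cmd >= CMD_MAX)
		return -EINVAL;
	if (len != arglen[cmd] || len > sizeof(arg))
		return -EINVAL;
	if (len)
		memcpy(arg, user, len);

	/* Commands without arguments are handled before arg is touched. */
	if (cmd == CMD_FLUSH) {
		*out = 0;
		return 0;
	}

	memcpy(out, arg, sizeof(*out));
	return 0;
}
```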

Reported-by: syzbot+23b5f9e7caf61d9a3898@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=46ebfb92a8a812621a001ef04d90dfa459520fe2
Suggested-by: Julian Anastasov
Signed-off-by: Peilin Ye
Acked-by: Julian Anastasov
Reviewed-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso

Peilin Ye
2020-08-29 01:18:48 +0800

24 Aug, 2020

1 commit

df561f668 treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and their variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings where applicable.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
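
A small user-space sketch of the pattern follows. The kernel provides its own `fallthrough` macro; the definition below is a fallback written for this example so that it builds outside the kernel, and `count_flags()` is a hypothetical function.

```c
/* Fallback definition for user-space builds (assumption of this sketch). */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Each case deliberately continues into the next one. */
static int count_flags(int level)
{
	int n = 0;

	switch (level) {
	case 2:
		n++;
		fallthrough;
	case 1:
		n++;
		fallthrough;
	default:
		n++;
		break;
	}
	return n;
}
```

Unlike a comment, the pseudo-keyword is a statement the compiler can check, so a missing `break` that is not annotated can still be flagged.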

Signed-off-by: Gustavo A. R. Silva

Gustavo A. R. Silva
2020-08-24 06:36:59 +0800

04 Aug, 2020

1 commit

f2e0b29a9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

1) UAF in chain binding support from previous batch, from Dan Carpenter.

2) Queue up delayed work to expire connections with no destination,
from Andrew Sy Kim.

3) Use fallthrough pseudo-keyword, from Gustavo A. R. Silva.

4) Replace HTTP links with HTTPS, from Alexander A. Klimov.

5) Remove superfluous null header checks in ip6tables, from
Gaurav Singh.

6) Add extended netlink error reporting for expression.

7) Report EEXIST on overlapping chain, set elements and flowtable
devices.
====================

Signed-off-by: David S. Miller

David S. Miller
2020-08-04 07:03:18 +0800

26 Jul, 2020

1 commit

a57066b1a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

The UDP reuseport conflict was a little bit tricky.

The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.

At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.

This requires moving the reuseport_has_conns() logic into the callers.

While we are here, get rid of inline directives as they do not belong
in foo.c files.

The other changes were cases of more straightforward overlapping
modifications.

Signed-off-by: David S. Miller

David S. Miller
2020-07-26 08:49:04 +0800

25 Jul, 2020

1 commit

c2f12630c netfilter: switch nf_setsockopt to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig
Signed-off-by: David S. Miller

Christoph Hellwig
2020-07-25 06:41:54 +0800

22 Jul, 2020

2 commits

8210e344c ipvs: fix the connection sync failed in some cases

sync_thread_backup only checks whether sk_receive_queue is empty.
Connection entries therefore fail to sync when sk_receive_queue is empty
and sk_rmem_alloc is larger than sk_rcvbuf: the sync packets are dropped
in __udp_enqueue_schedule_skb() because the packets in reader_queue have
not been read, so the rmem is not reclaimed.

Add a check of whether the reader_queue of the UDP socket is empty to
solve this problem.
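
The wake-up condition can be modelled in a few lines of user-space C. `struct udp_model` and both predicates are hypothetical simplifications written for this example; queue lengths stand in for the kernel's skb queues.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of a UDP socket with two receive-side queues. */
struct udp_model {
	size_t receive_queue_len; /* models sk_receive_queue */
	size_t reader_queue_len;  /* models the UDP reader_queue */
};

/* Old condition: only the first queue is considered. */
static bool has_pending_old(const struct udp_model *u)
{
	return u->receive_queue_len > 0;
}

/* New condition: data waiting in either queue should wake the thread. */
static bool has_pending(const struct udp_model *u)
{
	return u->receive_queue_len > 0 || u->reader_queue_len > 0;
}
```

When all unread packets sit in the reader queue, the old predicate reports nothing to do and the thread keeps sleeping, which matches the stall described above.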

Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception")
Reported-by: zhouxudong
Signed-off-by: guodeqing
Acked-by: Julian Anastasov
Signed-off-by: Pablo Neira Ayuso

guodeqing
2020-07-22 07:21:34 +0800
35dfb0131 ipvs: queue delayed work to expire no destination connections if expire_nodest_conn=1

When expire_nodest_conn=1 and a destination is deleted, IPVS does not
expire the existing connections until the next matching incoming packet.
If there are many connection entries from a single client to a single
destination, many packets may get dropped before all the connections are
expired (more likely with lots of UDP traffic). An optimization can be
made where upon deletion of a destination, IPVS queues up delayed work
to immediately expire any connections with a deleted destination. This
ensures any reused source ports from a client (within the IPVS timeouts)
are scheduled to new real servers instead of silently dropped.

Signed-off-by: Andrew Sy Kim
Signed-off-by: Julian Anastasov
Signed-off-by: Pablo Neira Ayuso

Andrew Sy Kim
2020-07-22 07:17:59 +0800

04 Jul, 2020

1 commit

f0a5e4d7a ipvs: allow connection reuse for unconfirmed conntrack

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.
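
The tuned check boils down to a single decision, sketched here in user-space C. The enum and `reuse_decision()` are hypothetical names for this example; the real code operates on conntrack entries rather than a boolean.

```c
#include <stdbool.h>

enum resched {
	RESCHED_NOW,          /* free to select a new real server */
	RESCHED_AFTER_EXPIRE, /* one-second delay still applies */
};

/* A confirmed conntrack pins the flow to its current real server;
 * a freshly created (unconfirmed) one does not. */
static enum resched reuse_decision(bool ct_confirmed)
{
	return ct_confirmed ? RESCHED_AFTER_EXPIRE : RESCHED_NOW;
}
```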

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
https://github.com/kubernetes/kubernetes/issues/70747

- Apache Bench can fill up ipvs service proxy in seconds #544
https://github.com/cloudnativelabs/kube-router/issues/544

- Additional 1s latency in `host -> service IP -> pod`
https://github.com/kubernetes/kubernetes/issues/90854

Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi
Signed-off-by: YangYuxi
Signed-off-by: Julian Anastasov
Reviewed-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso

Julian Anastasov
2020-07-04 07:18:37 +0800

01 Jul, 2020

2 commits

f9200a52e ipvs: avoid expiring many connections from timer

Add new functions ip_vs_conn_del() and ip_vs_conn_del_put()
to release many IPVS connections in process context.
They are suitable for connections found in table
when we do not want to overload the timers.

Currently, the change is useful for the dropentry delayed
work but it will be used also in following patch
when flushing connections to failed destinations.

Signed-off-by: Julian Anastasov
Reviewed-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso

Julian Anastasov
2020-07-01 16:18:20 +0800
857ca8971 ipvs: register hooks only with services

Keep the IPVS hooks registered in Netfilter only
while there are configured virtual services. This
saves CPU cycles while IPVS is loaded but not used.
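
One way to picture this is a counter that registers the hooks for the first service and unregisters them after the last one. The sketch below is a user-space model under that assumption; `add_service()`, `del_service()`, and the two counters are hypothetical and do not mirror the patch's data structures.

```c
#include <stdbool.h>

static int num_services;
static bool hooks_registered;

/* Register the hooks when the first service is configured. */
static void add_service(void)
{
	if (num_services == 0)
		hooks_registered = true;
	num_services++;
}

/* Unregister the hooks when the last service goes away. */
static int del_service(void)
{
	if (num_services == 0)
		return -1;
	if (--num_services == 0)
		hooks_registered = false;
	return 0;
}
```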

Signed-off-by: Julian Anastasov
Reviewed-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso

Julian Anastasov
2020-07-01 00:37:39 +0800

14 Jun, 2020

1 commit

a7f7f6248 treewide: replace '---help---' in Kconfig files with 'help'

Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

a) 4 spaces + '---help---'
b) 7 spaces + '---help---'
c) 8 spaces + '---help---'
d) 1 space + 1 tab + '---help---'
e) 1 tab + '---help---' (correct indentation)
f) 1 tab + 1 space + '---help---'
g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following command:

$ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada

Masahiro Yamada
2020-06-14 00:57:21 +0800

27 Apr, 2020

1 commit

32927393d sysctl: pass kernel pointers to ->proc_handler

Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from userspace in common code. This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.
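
The division of labour can be sketched in user-space C. Here `copy_in_terminated()` plays the role of the common code and `handler_parse()` the role of a handler that only ever sees a terminated buffer; both names are hypothetical, and `memcpy()` stands in for the copy from user space.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Common code: copy the caller's bytes and always NUL-terminate. */
static char *copy_in_terminated(const char *src, size_t len)
{
	char *buf = malloc(len + 1);

	if (!buf)
		return NULL;
	memcpy(buf, src, len);
	buf[len] = '\0';
	return buf;
}

/* Handler: can rely on the buffer being a valid C string. */
static long handler_parse(const char *kbuf)
{
	return strtol(kbuf, NULL, 10);
}
```

Because termination happens in one place, a handler cannot run past the end of the input even when the caller's buffer is not itself terminated.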

As most handlers just pass the data through to one of the common handlers,
a lot of the changes are mechanical.

Signed-off-by: Christoph Hellwig
Acked-by: Andrey Ignatov
Signed-off-by: Al Viro

Christoph Hellwig
2020-04-27 14:07:40 +0800

31 Mar, 2020

1 commit

e19680f83 ipvs: fix uninitialized variable warning

If outer_proto is not set, GCC warns as follows:

In file included from net/netfilter/ipvs/ip_vs_core.c:52:
net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_in_icmp':
include/net/ip_vs.h:233:4: warning: 'outer_proto' may be used uninitialized in this function [-Wmaybe-uninitialized]
233 | printk(KERN_DEBUG pr_fmt(msg), ##__VA_ARGS__); \
| ^~~~~~
net/netfilter/ipvs/ip_vs_core.c:1666:8: note: 'outer_proto' was declared here
1666 | char *outer_proto;
| ^~~~~~~~~~~

Fixes: 73348fed35d0 ("ipvs: optimize tunnel dumps for icmp errors")
Signed-off-by: Haishuang Yan
Acked-by: Julian Anastasov
Signed-off-by: Pablo Neira Ayuso

Haishuang Yan
2020-03-31 03:17:53 +0800

28 Mar, 2020

1 commit

73348fed3 ipvs: optimize tunnel dumps for icmp errors

After stripping the GRE/UDP tunnel header from ICMP errors, it is better
to show "GRE/UDP" instead of "IPIP" in the debug message.

Signed-off-by: Haishuang Yan
Acked-by: Julian Anastasov
Signed-off-by: Pablo Neira Ayuso

Haishuang Yan
2020-03-28 01:31:01 +0800

24 Jan, 2020

1 commit

971485a0d ipvs: fix spelling mistake "to" -> "too"

There is a spelling mistake in an IP_VS_ERR_RL message. Fix it.

Signed-off-by: Colin Ian King
Signed-off-by: David S. Miller

Colin Ian King
2020-01-24 15:12:06 +0800

25 Dec, 2019

1 commit

bd085ef67 net: add bool confirm_neigh parameter for dst_ops.update_pmtu

The MTU update code is supposed to be invoked in response to real
networking events that update the PMTU. In IPv6 PMTU update function
__ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
confirmed time.

But for tunnel code, it will call pmtu before xmit, like:
- tnl_update_pmtu()
- skb_dst_update_pmtu()
- ip6_rt_update_pmtu()
- __ip6_rt_update_pmtu()
- dst_confirm_neigh()

If the tunnel remote dst mac address changed and we still do the neigh
confirm, we will not be able to update neigh cache and ping6 remote
will fail.

So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
should not be invoking dst_confirm_neigh() as we have no evidence
of successful two-way communication at this point.

On the other hand it is also important to keep the neigh reachability fresh
for TCP flows, so we cannot remove this dst_confirm_neigh() call.

To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
to choose whether we should do neigh update or not. I will add the parameter
in this patch and set all the callers to true to comply with the previous
way, and fix the tunnel code one by one on later patches.
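
The shape of the new parameter can be sketched in user-space C. `struct dst_model` and this `update_pmtu()` are hypothetical simplifications: a counter stands in for dst_confirm_neigh(), and the MTU is stored unconditionally.

```c
#include <stdbool.h>

struct dst_model {
	unsigned mtu;
	unsigned neigh_confirms; /* counts modelled dst_confirm_neigh() calls */
};

/* The caller decides whether this update is evidence of reachability. */
static void update_pmtu(struct dst_model *dst, unsigned mtu, bool confirm_neigh)
{
	if (confirm_neigh)
		dst->neigh_confirms++;
	dst->mtu = mtu;
}
```

Callers that pass `true` keep the previous behaviour, while a tunnel transmit path could later pass `false` so that an MTU change alone does not refresh the neighbour entry.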

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Suggested-by: David Miller
Reviewed-by: Guillaume Nault
Acked-by: David Ahern
Signed-off-by: Hangbin Liu
Signed-off-by: David S. Miller

Hangbin Liu
2019-12-25 14:28:54 +0800

27 Nov, 2019

1 commit

82f31ebf6 net: port < inet_prot_sock(net) --> inet_port_requires_bind_service(net, port)

Note that the sysctl write accessor functions guarantee that:
net->ipv4.sysctl_ip_prot_sock <= net->ipv4.ip_local_ports.range[0]
invariant is maintained, and as such the max() in selinux hooks is actually spurious.

ie. even though
if (snum < max(inet_prot_sock(sock_net(sk)), low) || snum > high) {
per logic is the same as
if ((snum < inet_prot_sock(sock_net(sk)) && snum < low) || snum > high) {
it is actually functionally equivalent to:
if (snum < low || snum > high) {
which is equivalent to:
if (snum < inet_prot_sock(sock_net(sk)) || snum < low || snum > high) {
even though the first clause is spurious.
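
The equivalence of the max() form, the range-only form, and the three-clause form can be checked by brute force in user-space C, assuming the invariant that the privileged-port boundary does not exceed the low end of the local port range. All function names below are hypothetical and plain integers stand in for the kernel accessors.

```c
#include <stdbool.h>

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

static bool check_with_max(int snum, int prot_sock, int low, int high)
{
	return snum < max_int(prot_sock, low) || snum > high;
}

static bool check_range_only(int snum, int low, int high)
{
	return snum < low || snum > high;
}

static bool check_three_clause(int snum, int prot_sock, int low, int high)
{
	return snum < prot_sock || snum < low || snum > high;
}

/* True if all three forms agree for every snum in [0, limit). */
static bool forms_agree(int prot_sock, int low, int high, int limit)
{
	int snum;

	for (snum = 0; snum < limit; snum++) {
		bool a = check_with_max(snum, prot_sock, low, high);
		bool b = check_range_only(snum, low, high);
		bool c = check_three_clause(snum, prot_sock, low, high);

		if (a != b || b != c)
			return false;
	}
	return true;
}
```

If the invariant is broken, for example with a boundary of 100 and a low end of 50, the max() form and the range-only form disagree for port 70, which is why the invariant matters.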

But we want to hold on to it in case we ever want to change what
inet_port_requires_bind_service() means (for example by changing
it from a, by default, [0..1024) range to some sort of set).

Test: builds, git 'grep inet_prot_sock' finds no other references
Cc: Eric Dumazet
Signed-off-by: Maciej Żenczykowski
Signed-off-by: David S. Miller

Maciej Żenczykowski
2019-11-27 05:20:46 +0800

03 Nov, 2019

1 commit

d31e95585 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

The only slightly tricky merge conflict was the netdevsim because the
mutex locking fix overlapped a lot of driver reload reorganization.

The rest were (relatively) trivial in nature.

Signed-off-by: David S. Miller

David S. Miller
2019-11-03 04:54:56 +0800

29 Oct, 2019

1 commit

e1b185491 net: Fix various misspellings of "connect"

Fix misspellings of "disconnect", "disconnecting", "connections", and
"disconnected".

Signed-off-by: Geert Uytterhoeven
Acked-by: Kalle Valo
Acked-by: Simon Horman
Signed-off-by: David S. Miller

Geert Uytterhoeven
2019-10-29 04:41:59 +0800

24 Oct, 2019

2 commits

c24b75e0f ipvs: move old_secure_tcp into struct netns_ipvs

syzbot reported the following issue:

BUG: KCSAN: data-race in update_defense_level / update_defense_level

read to 0xffffffff861a6260 of 4 bytes by task 3006 on cpu 1:
update_defense_level+0x621/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:177
defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

write to 0xffffffff861a6260 of 4 bytes by task 7333 on cpu 0:
update_defense_level+0xa62/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:205
defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7333 Comm: kworker/0:5 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events defense_work_handler

Indeed, old_secure_tcp is currently a static variable, while it
needs to be a per netns variable.
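
The effect of moving the variable can be sketched in user-space C. `struct netns_model` and this `update_defense_level()` are hypothetical simplifications in which the remembered value lives in the per-namespace structure instead of a function-level static.

```c
/* Per-namespace state; old_secure_tcp is no longer shared globally. */
struct netns_model {
	int sysctl_secure_tcp;
	int old_secure_tcp;
	int transitions; /* how often this namespace saw its mode change */
};

/* Compare against this namespace's own previous value only. */
static void update_defense_level(struct netns_model *ns)
{
	if (ns->old_secure_tcp != ns->sysctl_secure_tcp) {
		ns->transitions++;
		ns->old_secure_tcp = ns->sysctl_secure_tcp;
	}
}
```

With a single shared static, a change in one namespace would look like a transition in every other namespace, and concurrent workers would race on the same memory.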

Fixes: a0840e2e165a ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
Signed-off-by: Simon Horman

Eric Dumazet
2019-10-24 17:56:02 +0800
62931f59c ipvs: don't ignore errors in case refcounting ip_vs module fails

If the IPVS module is removed while the sync daemon is starting, there is
a small gap where try_module_get() might fail getting the refcount inside
ip_vs_use_count_inc(). Then, the refcounts of IPVS module are unbalanced,
and the subsequent call to stop_sync_thread() causes the following splat:

WARNING: CPU: 0 PID: 4013 at kernel/module.c:1146 module_put.part.44+0x15b/0x290
Modules linked in: ip_vs(-) nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ext4 mbcache jbd2 ghash_clmulni_intel snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev pcspkr snd_timer virtio_balloon snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk failover virtio_console qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ata_piix ttm crc32c_intel serio_raw drm virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv6]
CPU: 0 PID: 4013 Comm: modprobe Tainted: G W 5.4.0-rc1.upstream+ #741
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:module_put.part.44+0x15b/0x290
Code: 04 25 28 00 00 00 0f 85 18 01 00 00 48 83 c4 68 5b 5d 41 5c 41 5d 41 5e 41 5f c3 89 44 24 28 83 e8 01 89 c5 0f 89 57 ff ff ff 0b e9 78 ff ff ff 65 8b 1d 67 83 26 4a 89 db be 08 00 00 00 48
RSP: 0018:ffff888050607c78 EFLAGS: 00010297
RAX: 0000000000000003 RBX: ffffffffc1420590 RCX: ffffffffb5db0ef9
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffc1420590
RBP: 00000000ffffffff R08: fffffbfff82840b3 R09: fffffbfff82840b3
R10: 0000000000000001 R11: fffffbfff82840b2 R12: 1ffff1100a0c0f90
R13: ffffffffc1420200 R14: ffff88804f533300 R15: ffff88804f533ca0
FS: 00007f8ea9720740(0000) GS:ffff888053800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3245abe000 CR3: 000000004c28a006 CR4: 00000000001606f0
Call Trace:
stop_sync_thread+0x3a3/0x7c0 [ip_vs]
ip_vs_sync_net_cleanup+0x13/0x50 [ip_vs]
ops_exit_list.isra.5+0x94/0x140
unregister_pernet_operations+0x29d/0x460
unregister_pernet_device+0x26/0x60
ip_vs_cleanup+0x11/0x38 [ip_vs]
__x64_sys_delete_module+0x2d5/0x400
do_syscall_64+0xa5/0x4e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f8ea8bf0db7
Code: 73 01 c3 48 8b 0d b9 80 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 80 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcd38d2fe8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000000002436240 RCX: 00007f8ea8bf0db7
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00000000024362a8
RBP: 0000000000000000 R08: 00007f8ea8eba060 R09: 00007f8ea8c658a0
R10: 00007ffcd38d2a60 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000001 R14: 00000000024362a8 R15: 0000000000000000
irq event stamp: 4538
hardirqs last enabled at (4537): quarantine_put+0x9e/0x170
hardirqs last disabled at (4538): trace_hardirqs_off_thunk+0x1a/0x20
softirqs last enabled at (4522): sk_common_release+0x169/0x2d0
softirqs last disabled at (4520): sk_common_release+0xbe/0x2d0

Check the return value of ip_vs_use_count_inc() and let its caller return
proper error. Inside do_ip_vs_set_ctl() the module is already refcounted,
we don't need refcount/derefcount there. Finally, in register_ip_vs_app()
and start_sync_thread(), take the module refcount earlier and ensure it's
released in the error path.
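
The balanced pattern can be sketched in user-space C. The counters and functions below are hypothetical models of try_module_get() and module_put() usage, and the error codes are illustrative choices for this example.

```c
#include <errno.h>
#include <stdbool.h>

static bool module_going_away;
static int module_refs;

/* Models ip_vs_use_count_inc(): fails while the module is being removed. */
static bool use_count_inc(void)
{
	if (module_going_away)
		return false;
	module_refs++;
	return true;
}

static void use_count_dec(void)
{
	module_refs--;
}

/* Take the reference first, report failure, and drop it on later errors. */
static int start_sync(bool setup_fails)
{
	if (!use_count_inc())
		return -ENOPROTOOPT;
	if (setup_fails) {
		use_count_dec();
		return -ENOMEM;
	}
	return 0;
}
```

Every path that returns an error leaves the count where it was, so a later stop never drops a reference that was not taken.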

Change since v1:
- better return values in case of failure of ip_vs_use_count_inc(),
thanks to Julian Anastasov
- no need to increase/decrease the module refcount in ip_vs_set_ctl(),
thanks to Julian Anastasov

Signed-off-by: Davide Caratti
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman

Davide Caratti
2019-10-24 17:53:19 +0800

08 Oct, 2019

3 commits

ac524481d ipvs: batch __ip_vs_dev_cleanup

It's better to batch __ip_vs_dev_cleanup to speed up IPVS
device dismantling.

Signed-off-by: Haishuang Yan
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Haishuang Yan
2019-10-08 17:28:33 +0800
5d5a0815f ipvs: batch __ip_vs_cleanup

It's better to batch __ip_vs_cleanup to speed up IPVS
connection dismantling.

Signed-off-by: Haishuang Yan
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Haishuang Yan
2019-10-08 17:28:33 +0800
c09b8970f ipvs: no need to update skb route entry for local destination packets.

At the end of the functions __ip_vs_get_out_rt() and
__ip_vs_get_out_rt_v6(), the 'local' variable is always zero.

Signed-off-by: zhang kai
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

zhang kai
2019-10-08 17:28:33 +0800

02 Oct, 2019

1 commit

895b5c9f2 netfilter: drop bridge nf reset from nf_reset

commit 174e23810cd31
("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
recycle always drop skb extensions. The additional skb_ext_del() that is
performed via nf_reset on napi skb recycle is not needed anymore.

Most nf_reset() calls in the stack are there so queued skb won't block
'rmmod nf_conntrack' indefinitely.

This removes the skb_ext_del from nf_reset, and renames it to a more
fitting nf_reset_ct().

In a few selected places, add a call to skb_ext_reset to make sure that
no active extensions remain.

I am submitting this for "net", because we're still early in the release
cycle. The patch applies to net-next too, but I think the rename causes
needless divergence between those trees.

Suggested-by: Eric Dumazet
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2019-10-02 00:42:15 +0800

26 Sep, 2019

1 commit

bf69abad2 net: Fix Kconfig indentation

Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^        /\t/' -i */Kconfig

Signed-off-by: Krzysztof Kozlowski
Acked-by: Sven Eckelmann
Signed-off-by: David S. Miller

Krzysztof Kozlowski
2019-09-26 14:56:17 +0800

14 Aug, 2019

1 commit

c162610c7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Rename mss field to mss_option field in synproxy, from Fernando Mancera.

2) Use SYSCTL_{ZERO,ONE} definitions in conntrack, from Matteo Croce.

3) More strict validation of IPVS sysctl values, from Junwei Hu.

4) Remove unnecessary spaces on the right-hand side of assignments,
from yangxingwu.

5) Add offload support for bitwise operation.

6) Extend the nft_offload_reg structure to store immediate data.

7) Collapse several ip_set header files into ip_set.h, from
Jeremy Sowden.

8) Make netfilter headers compile with CONFIG_KERNEL_HEADER_TEST=y,
from Jeremy Sowden.

9) Fix several sparse warnings due to missing prototypes, from
Valdis Kletnieks.

10) Use static lock initialiser to ensure connlabel spinlock is
initialized on boot time to fix sched/act_ct.c, patch
from Florian Westphal.
====================

Signed-off-by: Jakub Kicinski

Jakub Kicinski
2019-08-14 09:22:57 +0800

13 Aug, 2019

1 commit

7e59b3fea netfilter: remove unnecessary spaces

This patch removes extra spaces.

Signed-off-by: yangxingwu
Signed-off-by: Pablo Neira Ayuso

yangxingwu
2019-08-13 18:08:48 +0800

09 Aug, 2019

1 commit

9d2f11238 net: delete "register" keyword

Delete the long-obsolete "register" keyword.

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller

Alexey Dobriyan
2019-08-09 09:03:42 +0800