Eric Lee / smarc-fsl-linux-kernel

24 Aug, 2018

1 commit

8e39e96f2 ipv6: make ipv6_renew_options() interrupt/kernel safe ... Browse Code »

[ Upstream commit a9ba23d48dbc6ffd08426bb10f05720e0b9f5c14 ]

At present the ipv6_renew_options_kern() function ends up calling into
access_ok() which is problematic if done from inside an interrupt as
access_ok() calls WARN_ON_IN_IRQ() on some (all?) architectures
(x86-64 is affected). Example warning/backtrace is shown below:

WARNING: CPU: 1 PID: 3144 at lib/usercopy.c:11 _copy_from_user+0x85/0x90
...
Call Trace:

ipv6_renew_option+0xb2/0xf0
ipv6_renew_options+0x26a/0x340
ipv6_renew_options_kern+0x2c/0x40
calipso_req_setattr+0x72/0xe0
netlbl_req_setattr+0x126/0x1b0
selinux_netlbl_inet_conn_request+0x80/0x100
selinux_inet_conn_request+0x6d/0xb0
security_inet_conn_request+0x32/0x50
tcp_conn_request+0x35f/0xe00
? __lock_acquire+0x250/0x16c0
? selinux_socket_sock_rcv_skb+0x1ae/0x210
? tcp_rcv_state_process+0x289/0x106b
tcp_rcv_state_process+0x289/0x106b
? tcp_v6_do_rcv+0x1a7/0x3c0
tcp_v6_do_rcv+0x1a7/0x3c0
tcp_v6_rcv+0xc82/0xcf0
ip6_input_finish+0x10d/0x690
ip6_input+0x45/0x1e0
? ip6_rcv_finish+0x1d0/0x1d0
ipv6_rcv+0x32b/0x880
? ip6_make_skb+0x1e0/0x1e0
__netif_receive_skb_core+0x6f2/0xdf0
? process_backlog+0x85/0x250
? process_backlog+0x85/0x250
? process_backlog+0xec/0x250
process_backlog+0xec/0x250
net_rx_action+0x153/0x480
__do_softirq+0xd9/0x4f7
do_softirq_own_stack+0x2a/0x40

...

While not present in the backtrace, ipv6_renew_option() ends up calling
access_ok() via the following chain:

access_ok()
_copy_from_user()
copy_from_user()
ipv6_renew_option()

The fix presented in this patch is to perform the userspace copy
earlier in the call chain such that it is only called when the option
data is actually coming from userspace; that place is
do_ipv6_setsockopt(). Not only does this solve the problem seen in
the backtrace above, it also allows us to simplify the code quite a
bit by removing ipv6_renew_options_kern() completely. We also take
this opportunity to cleanup ipv6_renew_options()/ipv6_renew_option()
a small amount as well.

This patch is heavily based on a rough patch by Al Viro. I've taken
his original patch, converted a kmemdup() call in do_ipv6_setsockopt()
to a memdup_user() call, made better use of the e_inval jump target in
the same function, and cleaned up the use ipv6_renew_option() by
ipv6_renew_options().

CC: Al Viro
Signed-off-by: Paul Moore
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Paul Moore
2018-08-24 19:09:13 +0800

28 Feb, 2018

1 commit

485595768 netfilter: drop outermost socket lock in getsockopt() ... Browse Code »

commit 01ea306f2ac2baff98d472da719193e738759d93 upstream.

The Syzbot reported a possible deadlock in the netfilter area caused by
rtnl lock, xt lock and socket lock being acquired with a different order
on different code paths, leading to the following backtrace:
Reviewed-by: Xin Long

======================================================
WARNING: possible circular locking dependency detected
4.15.0+ #301 Not tainted
------------------------------------------------------
syzkaller233489/4179 is trying to acquire lock:
(rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

but task is already holding lock:
(&xt[i].mutex){+.+.}, at: []
xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041

which lock already depends on the new lock.
===

Since commit 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock
only in the required scope"), we already acquire the socket lock in
the innermost scope, where needed. In such commit I forgot to remove
the outer-most socket lock from the getsockopt() path, this commit
addresses the issues dropping it now.

v1 -> v2: fix bad subj, added relavant 'fixes' tag

Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
Fixes: 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock only in the required scope")
Reported-by: syzbot+ddde1c7b7ff7442d7f2d@syzkaller.appspotmail.com
Suggested-by: Florian Westphal
Signed-off-by: Paolo Abeni
Signed-off-by: Pablo Neira Ayuso
Tested-by: Krzysztof Piotr Oledzki
Signed-off-by: Greg Kroah-Hartman

Paolo Abeni
2018-02-28 17:19:38 +0800

25 Feb, 2018

1 commit

516c855cf netfilter: on sockopt() acquire sock lock only in the required scope ... Browse Code »

commit 3f34cfae1238848fd53f25e5c8fd59da57901f4b upstream.

Syzbot reported several deadlocks in the netfilter area caused by
rtnl lock and socket lock being acquired with a different order on
different code paths, leading to backtraces like the following one:

======================================================
WARNING: possible circular locking dependency detected
4.15.0-rc9+ #212 Not tainted
------------------------------------------------------
syzkaller041579/3682 is trying to acquire lock:
(sk_lock-AF_INET6){+.+.}, at: [] lock_sock
include/net/sock.h:1463 [inline]
(sk_lock-AF_INET6){+.+.}, at: []
do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167

but task is already holding lock:
(rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (rtnl_mutex){+.+.}:
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
register_netdevice_notifier+0xad/0x860 net/core/dev.c:1607
tee_tg_check+0x1a0/0x280 net/netfilter/xt_TEE.c:106
xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845
check_target net/ipv6/netfilter/ip6_tables.c:538 [inline]
find_check_entry.isra.7+0x935/0xcf0
net/ipv6/netfilter/ip6_tables.c:580
translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749
do_replace net/ipv6/netfilter/ip6_tables.c:1165 [inline]
do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1691
nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928
udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
SYSC_setsockopt net/socket.c:1849 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1828
entry_SYSCALL_64_fastpath+0x29/0xa0

-> #0 (sk_lock-AF_INET6){+.+.}:
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
lock_sock_nested+0xc2/0x110 net/core/sock.c:2780
lock_sock include/net/sock.h:1463 [inline]
do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167
ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922
udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
SYSC_setsockopt net/socket.c:1849 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1828
entry_SYSCALL_64_fastpath+0x29/0xa0

other info that might help us debug this:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(rtnl_mutex);
lock(sk_lock-AF_INET6);
lock(rtnl_mutex);
lock(sk_lock-AF_INET6);

*** DEADLOCK ***

1 lock held by syzkaller041579/3682:
#0: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

The problem, as Florian noted, is that nf_setsockopt() is always
called with the socket held, even if the lock itself is required only
for very tight scopes and only for some operation.

This patch addresses the issues moving the lock_sock() call only
where really needed, namely in ipv*_getorigdst(), so that nf_setsockopt()
does not need anymore to acquire both locks.

Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
Reported-by: syzbot+a4c2dc980ac1af699b36@syzkaller.appspotmail.com
Suggested-by: Florian Westphal
Signed-off-by: Paolo Abeni
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Greg Kroah-Hartman

Paolo Abeni
2018-02-25 18:07:50 +0800

31 Jan, 2018

1 commit

2295b3dd5 ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL ... Browse Code »

[ Upstream commit e9191ffb65d8e159680ce0ad2224e1acbde6985c ]

Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after
sysctl setting") removed the initialisation of
ipv6_pinfo::autoflowlabel and added a second flag to indicate
whether this field or the net namespace default should be used.

The getsockopt() handling for this case was not updated, so it
currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is
not explicitly enabled. Fix it to return the effective value, whether
that has been set at the socket or net namespace level.

Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...")
Signed-off-by: Ben Hutchings
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Ben Hutchings
2018-01-31 21:03:44 +0800

03 Jan, 2018

1 commit

6d0317869 net: reevalulate autoflowlabel setting after sysctl setting ... Browse Code »

[ Upstream commit 513674b5a2c9c7a67501506419da5c3c77ac6f08 ]

sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2.
If sockopt doesn't set autoflowlabel, outcome packets from the hosts are
supposed to not include flowlabel. This is true for normal packet, but
not for reset packet.

The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if
we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't
changed, so the sock will keep the old behavior in terms of auto
flowlabel. Reset packet is suffering from this problem, because reset
packet is sent from a special control socket, which is created at boot
time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
socket will always have its ipv6_pinfo.autoflowlabel set, even after
user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always
have flowlabel. Normal sock created before sysctl setting suffers from
the same issue. We can't even turn off autoflowlabel unless we kill all
socks in the hosts.

To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
autoflowlabel setting from user, otherwise we always call
ip6_default_np_autolabel() which has the new settings of sysctl.

Note, this changes behavior a little bit. Before commit 42240901f7c4
(ipv6: Implement different admin modes for automatic flow labels), the
autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes,
existing connection will change autoflowlabel behavior. After that
commit, autoflowlabel behavior is sticky in the whole life of the sock.
With this patch, the behavior isn't sticky again.

Cc: Martin KaFai Lau
Cc: Eric Dumazet
Cc: Tom Herbert
Signed-off-by: Shaohua Li
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Shaohua Li
2018-01-03 03:31:06 +0800

30 Aug, 2017

1 commit

e8d411d29 ipv6: do not set sk_destruct in IPV6_ADDRFORM sockopt ... Browse Code »

ChunYu found a kernel warn_on during syzkaller fuzzing:

[40226.038539] WARNING: CPU: 5 PID: 23720 at net/ipv4/af_inet.c:152 inet_sock_destruct+0x78d/0x9a0
[40226.144849] Call Trace:
[40226.147590]
[40226.149859] dump_stack+0xe2/0x186
[40226.176546] __warn+0x1a4/0x1e0
[40226.180066] warn_slowpath_null+0x31/0x40
[40226.184555] inet_sock_destruct+0x78d/0x9a0
[40226.246355] __sk_destruct+0xfa/0x8c0
[40226.290612] rcu_process_callbacks+0xaa0/0x18a0
[40226.336816] __do_softirq+0x241/0x75e
[40226.367758] irq_exit+0x1f6/0x220
[40226.371458] smp_apic_timer_interrupt+0x7b/0xa0
[40226.376507] apic_timer_interrupt+0x93/0xa0

The warn_on happned when sk->sk_rmem_alloc wasn't 0 in inet_sock_destruct.
As after commit f970bd9e3a06 ("udp: implement memory accounting helpers"),
udp has changed to use udp_destruct_sock as sk_destruct where it would
udp_rmem_release all rmem.

But IPV6_ADDRFORM sockopt sets sk_destruct with inet_sock_destruct after
changing family to PF_INET. If rmem is not 0 at that time, and there is
no place to release rmem before calling inet_sock_destruct, the warn_on
will be triggered.

This patch is to fix it by not setting sk_destruct in IPV6_ADDRFORM sockopt
any more. As IPV6_ADDRFORM sockopt only works for tcp and udp. TCP sock has
already set it's sk_destruct with inet_sock_destruct and UDP has set with
udp_destruct_sock since they're created.

Fixes: f970bd9e3a06 ("udp: implement memory accounting helpers")
Reported-by: ChunYu Wang
Signed-off-by: Xin Long
Acked-by: Paolo Abeni
Signed-off-by: David S. Miller

Xin Long
2017-08-30 01:54:40 +0800

06 Jul, 2017

1 commit

7114f51fc Merge branch 'work.memdup_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull memdup_user() conversions from Al Viro:
"A fairly self-contained series - hunting down open-coded memdup_user()
and memdup_user_nul() instances"

* 'work.memdup_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
bpf: don't open-code memdup_user()
kimage_file_prepare_segments(): don't open-code memdup_user()
ethtool: don't open-code memdup_user()
do_ip_setsockopt(): don't open-code memdup_user()
do_ipv6_setsockopt(): don't open-code memdup_user()
irda: don't open-code memdup_user()
xfrm_user_policy(): don't open-code memdup_user()
ima_write_policy(): don't open-code memdup_user_nul()
sel_write_validatetrans(): don't open-code memdup_user_nul()

Linus Torvalds
2017-07-06 07:05:24 +0800

04 Jul, 2017

1 commit

0aeea21ad net, ipv6: convert ipv6_txoptions.refcnt from atomic_t to refcount_t ... Browse Code »

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller

Reshetova, Elena
2017-07-04 16:29:03 +0800

30 Jun, 2017

1 commit

43727da90 do_ipv6_setsockopt(): don't open-code memdup_user() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2017-06-30 14:04:08 +0800

31 Dec, 2016

1 commit

7bb387c5a net: Allow IP_MULTICAST_IF to set index to L3 slave ... Browse Code »

IP_MULTICAST_IF fails if sk_bound_dev_if is already set and the new index
does not match it. e.g.,

ntpd[15381]: setsockopt IP_MULTICAST_IF 192.168.1.23 fails: Invalid argument

Relax the check in setsockopt to allow setting mc_index to an L3 slave if
sk_bound_dev_if points to an L3 master.

Make a similar change for IPv6. In this case change the device lookup to
take the rcu_read_lock avoiding a refcnt. The rcu lock is also needed for
the lookup of a potential L3 master device.

This really only silences a setsockopt failure since uses of mc_index are
secondary to sk_bound_dev_if if it is set. In both cases, if either index
is an L3 slave or master, lookups are directed to the same FIB table so
relaxing the check at setsockopt time causes no harm.

Patch is based on a suggested change by Darwin for a problem noted in
their code base.

Suggested-by: Darwin Dingel
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2016-12-31 04:24:47 +0800

25 Dec, 2016

1 commit

7c0f6ba68 Replace <asm/uaccess.h> with <linux/uaccess.h> globally ... Browse Code »

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
sed -i -e "s!$PATT!#include !" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-12-25 03:46:01 +0800

10 Nov, 2016

1 commit

a149e7c7c ipv6: sr: add support for SRH injection through setsockopt ... Browse Code »

This patch adds support for per-socket SRH injection with the setsockopt
system call through the IPPROTO_IPV6, IPV6_RTHDR options.
The SRH is pushed through the ipv6_push_nfrag_opts function.

Signed-off-by: David Lebrun
Signed-off-by: David S. Miller

David Lebrun
2016-11-10 09:40:06 +0800

04 Nov, 2016

1 commit

0cc0aa614 ipv6: add IPV6_RECVFRAGSIZE cmsg ... Browse Code »

When reading a datagram or raw packet that arrived fragmented, expose
the maximum fragment size if recorded to allow applications to
estimate receive path MTU.

At this point, the field is only recorded when ipv6 connection
tracking is enabled. A follow-up patch will record this field also
in the ipv6 input path.

Tested using the test for IP_RECVFRAGSIZE plus

ip netns exec to ip addr add dev veth1 fc07::1/64
ip netns exec from ip addr add dev veth0 fc07::2/64

ip netns exec to ./recv_cmsg_recvfragsize -6 -u -p 6000 &
ip netns exec from nc -q 1 -u fc07::1 6000 < payload

Both with and without enabling connection tracking

ip6tables -A INPUT -m state --state NEW -p udp -j LOG

Signed-off-by: Willem de Bruijn
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Willem de Bruijn
2016-11-04 03:41:11 +0800

21 Oct, 2016

1 commit

8651be8f1 ipv6: fix a potential deadlock in do_ipv6_setsockopt() ... Browse Code »

Baozeng reported this deadlock case:

CPU0 CPU1
---- ----
lock([ 165.136033] sk_lock-AF_INET6);
lock([ 165.136033] rtnl_mutex);
lock([ 165.136033] sk_lock-AF_INET6);
lock([ 165.136033] rtnl_mutex);

Similar to commit 87e9f0315952
("ipv4: fix a potential deadlock in mcast getsockopt() path")
this is due to we still have a case, ipv6_sock_mc_close(),
where we acquire sk_lock before rtnl_lock. Close this deadlock
with the similar solution, that is always acquire rtnl lock first.

Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
Reported-by: Baozeng Ding
Tested-by: Baozeng Ding
Cc: Marcelo Ricardo Leitner
Signed-off-by: Cong Wang
Reviewed-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

WANG Cong
2016-10-21 23:29:02 +0800

07 Jul, 2016

1 commit

d011a4d86 Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/selinux into next Browse Code »

James Morris
2016-07-07 08:15:34 +0800

28 Jun, 2016

1 commit

ceba1832b calipso: Set the calipso socket label to match the secattr. ... Browse Code »

CALIPSO is a hop-by-hop IPv6 option. A lot of this patch is based on
the equivalent CISPO code. The main difference is due to manipulating
the options in the hop-by-hop header.

Signed-off-by: Huw Davies
Signed-off-by: Paul Moore

Huw Davies
2016-06-28 03:02:51 +0800

04 May, 2016

1 commit

26879da58 ipv6: add new struct ipcm6_cookie ... Browse Code »

In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local
variables like hlimits, tclass, opt and dontfrag and pass them to corresponding
functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames.
This is not a good practice and makes it hard to add new parameters.
This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in
ipv4 and include the above mentioned variables. And we only pass the
pointer to this structure to corresponding functions. This makes it easier
to add new parameters in the future and makes the function cleaner.

Signed-off-by: Wei Wang
Signed-off-by: David S. Miller

Wei Wang
2016-05-04 04:08:14 +0800

08 Apr, 2016

1 commit

1e1d04e67 net: introduce lockdep_is_held and update various places to use it ... Browse Code »

The socket is either locked if we hold the slock spin_lock for
lock_sock_fast and unlock_sock_fast or we own the lock (sk_lock.owned
!= 0). Check for this and at the same time improve that the current
thread/cpu is really holding the lock.

Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2016-04-08 04:44:14 +0800

05 Apr, 2016

1 commit

ad1e46a83 ipv6: process socket-level control messages in IPv6 ... Browse Code »

Process socket-level control messages by invoking
__sock_cmsg_send in ip6_datagram_send_ctl for control messages on
the SOL_SOCKET layer.

This makes sure whenever ip6_datagram_send_ctl is called for
udp and raw, we also process socket-level control messages.

This is a bit uglier than IPv4, since IPv6 does not have
something like ipcm_cookie. Perhaps we can later create
a control message cookie for IPv6?

Note that this commit interprets new control messages that
were ignored before. As such, this commit does not change
the behavior of IPv6 control messages.

Signed-off-by: Soheil Hassas Yeganeh
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller

Soheil Hassas Yeganeh
2016-04-05 03:50:30 +0800

03 Dec, 2015

1 commit

45f6fad84 ipv6: add complete rcu protection around np->opt ... Browse Code »

This patch addresses multiple problems :

UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.

Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())

This patch adds full RCU protection to np->opt

Reported-by: Dmitry Vyukov
Signed-off-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Eric Dumazet
2015-12-03 12:37:16 +0800

01 Apr, 2015

1 commit

63159f29b ipv6: coding style: comparison for equality with NULL ... Browse Code »

The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x == NULL and sometimes as !x. !x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.

No changes detected by objdiff.

Signed-off-by: Ian Morris
Signed-off-by: David S. Miller

Ian Morris
2015-04-01 01:51:54 +0800

21 Mar, 2015

1 commit

c4a6853d8 ipv6: invert join/leave anycast rtnl/socket locking order ... Browse Code »

Commit baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
missed to update two setsockopt options, IPV6_JOIN_ANYCAST and
IPV6_LEAVE_ANYCAST, causing a lock inverstion regarding to the updated ones.

As ipv6_sock_ac_join and ipv6_sock_ac_leave are only called from
do_ipv6_setsockopt, we are good to just move the rtnl lock upper.

Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
Reported-by: Ying Huang
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2015-03-21 01:32:38 +0800

19 Mar, 2015

2 commits

54ff9ef36 ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop} ... Browse Code »

in favor of their inner __ ones, which doesn't grab rtnl.

As these functions need to operate on a locked socket, we can't be
grabbing rtnl by then. It's too late and doing so causes reversed
locking.

So this patch:
- move rtnl handling to callers instead while already fixing some
reversed locking situations, like on vxlan and ipvs code.
- renames __ ones to not have the __ mark:
__ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
__ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}

Signed-off-by: Marcelo Ricardo Leitner
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2015-03-19 10:05:09 +0800
baf606d9c ipv4,ipv6: grab rtnl before locking the socket ... Browse Code »

There are some setsockopt operations in ipv4 and ipv6 that are grabbing
rtnl after having grabbed the socket lock. Yet this makes it impossible
to do operations that have to lock the socket when already within a rtnl
protected scope, like ndo dev_open and dev_stop.

We normally take coarse grained locks first but setsockopt inverted that.

So this patch invert the lock logic for these operations and makes
setsockopt grab rtnl if it will be needed prior to grabbing socket lock.

Signed-off-by: Marcelo Ricardo Leitner
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2015-03-19 10:05:09 +0800

26 Jan, 2015

1 commit

1dc7b90f7 ipv6: tcp: fix race in IPV6_2292PKTOPTIONS ... Browse Code »

IPv6 TCP sockets store in np->pktoptions skbs, and use skb_set_owner_r()
to charge the skb to socket.

It means that destructor must be called while socket is locked.

Therefore, we cannot use skb_get() or atomic_inc(&skb->users)
to protect ourselves : kfree_skb() might race with other users
manipulating sk->sk_forward_alloc

Fix this race by holding socket lock for the duration of
ip6_datagram_recv_ctl()

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-01-26 16:44:08 +0800

10 Dec, 2014

1 commit

86fe8f892 ipv6: remove useless spin_lock/spin_unlock ... Browse Code »

xchg is atomic, so there is no necessary to use spin_lock/spin_unlock
to protect it. At last, remove the redundant
opt = xchg(&inet6_sk(sk)->opt, opt); statement.

Signed-off-by: Duan Jiong
Signed-off-by: David S. Miller

Duan Jiong
2014-12-10 02:18:09 +0800

25 Aug, 2014

2 commits

4c83acbc5 ipv6: White-space cleansing : gaps between function and symbol export ... Browse Code »

This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

This patch removes some blank lines between the end of a function
definition and the EXPORT_SYMBOL_GPL macro in order to prevent
checkpatch warning that EXPORT_SYMBOL must immediately follow
a function.

Signed-off-by: Ian Morris
Signed-off-by: David S. Miller

Ian Morris
2014-08-25 13:37:52 +0800
67ba4152e ipv6: White-space cleansing : Line Layouts ... Browse Code »

This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

A number of items are addressed in this patch:
* Multiple spaces converted to tabs
* Spaces before tabs removed.
* Spaces in pointer typing cleansed (char *)foo etc.
* Remove space after sizeof
* Ensure spacing around comparators such as if statements.

Signed-off-by: Ian Morris
Signed-off-by: David S. Miller

Ian Morris
2014-08-25 13:37:52 +0800

16 Jul, 2014

1 commit

66635dc12 ipv6: remove unnecessary break after return ... Browse Code »

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2014-07-16 07:27:01 +0800

08 Jul, 2014

1 commit

cb1ce2ef3 ipv6: Implement automatic flow label generation on transmit ... Browse Code »

Automatically generate flow labels for IPv6 packets on transmit.
The flow label is computed based on skb_get_hash. The flow label will
only automatically be set when it is zero otherwise (i.e. flow label
manager hasn't set one). This supports the transmit side functionality
of RFC 6438.

Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
functionality per socket.

By default, auto flowlabels are disabled to avoid possible conflicts
with flow label manager, however if this feature proves useful we
may want to enable it by default.

It should also be noted that FreeBSD has already implemented automatic
flow labels (including the sysctl and socket option). In FreeBSD,
automatic flow labels default to enabled.

Performance impact:

Running super_netperf with 200 flows for TCP_RR and UDP_RR for
IPv6. Note that in UDP case, __skb_get_hash will be called for
every packet with explains slight regression. In the TCP case
the hash is saved in the socket so there is no regression.

Automatic flow labels disabled:

TCP_RR:
86.53% CPU utilization
127/195/322 90/95/99% latencies
1.40498e+06 tps

UDP_RR:
90.70% CPU utilization
118/168/243 90/95/99% latencies
1.50309e+06 tps

Automatic flow labels enabled:

TCP_RR:
85.90% CPU utilization
128/199/337 90/95/99% latencies
1.40051e+06

UDP_RR
92.61% CPU utilization
115/164/236 90/95/99% latencies
1.4687e+06

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2014-07-08 12:14:21 +0800

02 Jul, 2014

1 commit

9fe516ba3 inet: move ipv6only in sock_common ... Browse Code »

When an UDP application switches from AF_INET to AF_INET6 sockets, we
have a small performance degradation for IPv4 communications because of
extra cache line misses to access ipv6only information.

This can also be noticed for TCP listeners, as ipv6_only_sock() is also
used from __inet_lookup_listener()->compute_score()

This is magnified when SO_REUSEPORT is used.

Move ipv6only into struct sock_common so that it is available at
no extra cost in lookups.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2014-07-02 14:46:21 +0800

27 Feb, 2014

1 commit

0b95227a7 ipv6: yet another new IPV6_MTU_DISCOVER option IPV6_PMTUDISC_OMIT ... Browse Code »

This option has the same semantic as IP_PMTUDISC_OMIT for IPv4 which
got recently introduced. It doesn't honor the path mtu discovered by the
host but in contrary to IPV6_PMTUDISC_INTERFACE allows the generation of
fragments if the packet size exceeds the MTU of the outgoing interface
MTU.

Fixes: 93b36cf3425b9b ("ipv6: support IPV6_PMTU_INTERFACE on sockets")
Cc: Florian Weimer
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2014-02-27 04:51:01 +0800

20 Jan, 2014

2 commits

4b261c75a ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams ... Browse Code »

We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.

This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.

ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.

I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.

Reported-by: Gert Doering
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2014-01-20 11:53:18 +0800
46e5f4017 ipv6: add a flag to get the flow label used remotly ... Browse Code »

This information is already available via IPV6_FLOWINFO
of IPV6_2292PKTOPTIONS, and them a filtering to get the flow label
information. But it is probably logical and easier for users to add this
here, and to control both sent/received flow label values with the
IPV6_FLOWLABEL_MGR option.

Signed-off-by: Florent Fourcot
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florent Fourcot
2014-01-20 09:12:31 +0800

16 Jan, 2014

1 commit

d76ed22b2 ipv6: move IPV6_TCLASS_SHIFT into ipv6.h and define a helper ... Browse Code »

Two places defined IPV6_TCLASS_SHIFT, so we should move it into ipv6.h,
and use this macro as possible. And define ip6_tclass helper to return
tclass

Signed-off-by: Li RongQing
Signed-off-by: David S. Miller

Li RongQing
2014-01-16 07:53:18 +0800

19 Dec, 2013

1 commit

93b36cf34 ipv6: support IPV6_PMTU_INTERFACE on sockets ... Browse Code »

IPV6_PMTU_INTERFACE is the same as IPV6_PMTU_PROBE for ipv6. Add it
nontheless for symmetry with IPv4 sockets. Also drop incoming MTU
information if this mode is enabled.

The additional bit in ipv6_pinfo just eats in the padding behind the
bitfield. There are no changes to the layout of the struct at all.

Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-12-19 06:37:05 +0800

13 Dec, 2013

1 commit

685360536 ipv6: fix incorrect type in declaration ... Browse Code »

Introduced by 1397ed35f22d7c30d0b89ba74b6b7829220dfcfd
"ipv6: add flowinfo for tcp6 pkt_options for all cases"

Reported-by: kbuild test robot

V2: fix the title, add empty line after the declaration (Sergei Shtylyov
feedbacks)

Signed-off-by: David S. Miller

Florent Fourcot
2013-12-13 05:14:09 +0800

10 Dec, 2013

2 commits

82e9f105a ipv6: remove rcv_tclass of ipv6_pinfo ... Browse Code »

tclass information in now already stored in rcv_flowinfo
We do not need to store the same information twice.

Signed-off-by: Florent Fourcot
Reviewed-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florent Fourcot
2013-12-10 10:03:49 +0800
1397ed35f ipv6: add flowinfo for tcp6 pkt_options for all cases ... Browse Code »

The current implementation of IPV6_FLOWINFO only gives a
result if pktoptions is available (thanks to the
ip6_datagram_recv_ctl function).
It gives inconsistent results to user space, sometimes
there is a result for getsockopt(IPV6_FLOWINFO), sometimes
not.

This patch add rcv_flowinfo to store it, and return it to
the userspace in the same way than other pkt_options.

Signed-off-by: Florent Fourcot
Reviewed-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florent Fourcot
2013-12-10 10:03:49 +0800

09 Nov, 2013

1 commit

3fdfa5ff5 ipv6: enable IPV6_FLOWLABEL_MGR for getsockopt ... Browse Code »

It is already possible to set/put/renew a label
with IPV6_FLOWLABEL_MGR and setsockopt. This patch
add the possibility to get information about this
label (current value, time before expiration, etc).

It helps application to take decision for a renew
or a release of the label.

v2:
* Add spin_lock to prevent race condition
* return -ENOENT if no result found
* check if flr_action is GET

v3:
* move the spin_lock to protect only the
relevant code

Signed-off-by: Florent Fourcot
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florent Fourcot
2013-11-09 02:42:57 +0800