Eric Lee / smarc-fsl-linux-kernel

29 Aug, 2016

1 commit

320355858 tcp: Set read_sock and peek_len proto_ops ... Browse Code »

In inet_stream_ops we set read_sock to tcp_read_sock and peek_len to
tcp_peek_len (which is just a stub function that calls tcp_inq).

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2016-08-29 11:32:41 +0800

30 Jul, 2016

1 commit

7a1e8b80f Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security ... Browse Code »

Pull security subsystem updates from James Morris:
"Highlights:

- TPM core and driver updates/fixes
- IPv6 security labeling (CALIPSO)
- Lots of Apparmor fixes
- Seccomp: remove 2-phase API, close hole where ptrace can change
syscall #"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
tpm: Factor out common startup code
tpm: use devm_add_action_or_reset
tpm2_i2c_nuvoton: add irq validity check
tpm: read burstcount from TPM_STS in one 32-bit transaction
tpm: fix byte-order for the value read by tpm2_get_tpm_pt
tpm_tis_core: convert max timeouts from msec to jiffies
apparmor: fix arg_size computation for when setprocattr is null terminated
apparmor: fix oops, validate buffer size in apparmor_setprocattr()
apparmor: do not expose kernel stack
apparmor: fix module parameters can be changed after policy is locked
apparmor: fix oops in profile_unpack() when policy_db is not present
apparmor: don't check for vmalloc_addr if kvzalloc() failed
apparmor: add missing id bounds check on dfa verification
apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
apparmor: use list_next_entry instead of list_entry_next
apparmor: fix refcount race when finding a child profile
apparmor: fix ref count leak when profile sha1 hash is read
apparmor: check that xindex is in trans_table bounds
...

Linus Torvalds
2016-07-30 08:38:46 +0800

07 Jul, 2016

1 commit

d011a4d86 Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/selinux into next Browse Code »

James Morris
2016-07-07 08:15:34 +0800

28 Jun, 2016

1 commit

cb72d3821 netlabel: Initial support for the CALIPSO netlink protocol. ... Browse Code »

CALIPSO is a packet labelling protocol for IPv6 which is very similar
to CIPSO. It is specified in RFC 5570. Much of the code is based on
the current CIPSO code.

This adds support for adding passthrough-type CALIPSO DOIs through the
NLBL_CALIPSO_C_ADD command. It requires attributes:

NLBL_CALIPSO_A_TYPE which must be CALIPSO_MAP_PASS.
NLBL_CALIPSO_A_DOI.

In passthrough mode the CALIPSO engine will map MLS secattr levels
and categories directly to the packet label.

At this stage, the major difference between this and the CIPSO
code is that IPv6 may be compiled as a module. To allow for
this the CALIPSO functions are registered at module init time.

Signed-off-by: Huw Davies
Signed-off-by: Paul Moore

Huw Davies
2016-06-28 03:02:46 +0800

10 Jun, 2016

1 commit

e43486371 net: vrf: Fix crash when IPv6 is disabled at boot time ... Browse Code »

Frank Kellermann reported a kernel crash with 4.5.0 when IPv6 is
disabled at boot using the kernel option ipv6.disable=1. Using
current net-next with the boot option:

$ ip link add red type vrf table 1001

Generates:
[12210.919584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000748
[12210.921341] IP: [] fib6_get_table+0x2c/0x5a
[12210.922537] PGD b79e3067 PUD bb32b067 PMD 0
[12210.923479] Oops: 0000 [#1] SMP
[12210.924001] Modules linked in: ipvlan 8021q garp mrp stp llc
[12210.925130] CPU: 3 PID: 1177 Comm: ip Not tainted 4.7.0-rc1+ #235
[12210.926168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[12210.928065] task: ffff8800b9ac4640 ti: ffff8800bacac000 task.ti: ffff8800bacac000
[12210.929328] RIP: 0010:[] [] fib6_get_table+0x2c/0x5a
[12210.930697] RSP: 0018:ffff8800bacaf888 EFLAGS: 00010202
[12210.931563] RAX: 0000000000000748 RBX: ffffffff81a9e280 RCX: ffff8800b9ac4e28
[12210.932688] RDX: 00000000000000e9 RSI: 0000000000000002 RDI: 0000000000000286
[12210.933820] RBP: ffff8800bacaf898 R08: ffff8800b9ac4df0 R09: 000000000052001b
[12210.934941] R10: 00000000657c0000 R11: 000000000000c649 R12: 00000000000003e9
[12210.936032] R13: 00000000000003e9 R14: ffff8800bace7800 R15: ffff8800bb3ec000
[12210.937103] FS: 00007faa1766c700(0000) GS:ffff88013ac00000(0000) knlGS:0000000000000000
[12210.938321] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12210.939166] CR2: 0000000000000748 CR3: 00000000b79d6000 CR4: 00000000000406e0
[12210.940278] Stack:
[12210.940603] ffff8800bb3ec000 ffffffff81a9e280 ffff8800bacaf8c8 ffffffff814b3135
[12210.941818] ffff8800bb3ec000 ffffffff81a9e280 ffffffff81a9e280 ffff8800bace7800
[12210.943040] ffff8800bacaf8f0 ffffffff81397c88 ffff8800bb3ec000 ffffffff81a9e280
[12210.944288] Call Trace:
[12210.944688] [] fib6_new_table+0x24/0x8a
[12210.945516] [] vrf_dev_init+0xd4/0x162
[12210.946328] [] register_netdevice+0x100/0x396
[12210.947209] [] vrf_newlink+0x40/0xb3
[12210.948001] [] rtnl_newlink+0x5d3/0x6d5
...

The problem above is due to the fact that the fib hash table is not
allocated when IPv6 is disabled at boot.

As for the VRF driver it should not do any IPv6 initializations if IPv6
is disabled, so it needs to know if IPv6 is disabled at boot. The disable
parameter is private to the IPv6 module, so provide an accessor for
modules to determine if IPv6 was disabled at boot time.

Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device")
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2016-06-10 14:34:42 +0800

08 Apr, 2016

1 commit

a6024562f udp: Add GRO functions to UDP socket ... Browse Code »

This patch adds GRO functions (gro_receive and gro_complete) to UDP
sockets. udp_gro_receive is changed to perform socket lookup on a
packet. If a socket is found the related GRO functions are called.

This features obsoletes using UDP offload infrastructure for GRO
(udp_offload). This has the advantage of not being limited to provide
offload on a per port basis, GRO is now applied to whatever individual
UDP sockets are bound to. This also allows the possbility of
"application defined GRO"-- that is we can attach something like
a BPF program to a UDP socket to perfrom GRO on an application
layer protocol.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2016-04-08 04:53:29 +0800

06 Apr, 2016

1 commit

627d2d6b5 udp: enable MSG_PEEK at non-zero offset ... Browse Code »

Enable peeking at UDP datagrams at the offset specified with socket
option SOL_SOCKET/SO_PEEK_OFF. Peek at any datagram in the queue, up
to the end of the given datagram.

Implement the SO_PEEK_OFF semantics introduced in commit ef64a54f6e55
("sock: Introduce the SO_PEEK_OFF sock option"). Increase the offset
on peek, decrease it on regular reads.

When peeking, always checksum the packet immediately, to avoid
recomputation on subsequent peeks and final read.

The socket lock is not held for the duration of udp_recvmsg, so
peek and read operations can run concurrently. Only the last store
to sk_peek_off is preserved.

Signed-off-by: Sam Kumar
Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

samanthakumar
2016-04-06 04:29:37 +0800

11 Feb, 2016

1 commit

086c653f5 sock: struct proto hash function may error ... Browse Code »

In order to support fast reuseport lookups in TCP, the hash function
defined in struct proto must be capable of returning an error code.
This patch changes the function signature of all related hash functions
to return an integer and handles or propagates this return value at
all call sites.

Signed-off-by: Craig Gallek
Signed-off-by: David S. Miller

Craig Gallek
2016-02-11 16:54:14 +0800

15 Dec, 2015

1 commit

79462ad02 net: add validation for the socket syscall protocol argument ... Browse Code »

郭永刚 reported that one could simply crash the kernel as root by
using a simple program:

int socket_fd;
struct sockaddr_in addr;
addr.sin_port = 0;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_family = 10;

socket_fd = socket(10,3,0x40000000);
connect(socket_fd , &addr,16);

AF_INET, AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field thus is sized accordingly,
thus larger protocol identifiers simply cut off the higher bits and
store a zero in the protocol fields.

This could lead to e.g. NULL function pointer because as a result of
the cut off inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.

kernel: Call Trace:
kernel: [] ? inet_autobind+0x2e/0x70
kernel: [] inet_dgram_connect+0x54/0x80
kernel: [] SYSC_connect+0xd9/0x110
kernel: [] ? ptrace_notify+0x5b/0x80
kernel: [] ? syscall_trace_enter_phase2+0x108/0x200
kernel: [] SyS_connect+0xe/0x10
kernel: [] tracesys_phase2+0x84/0x89

I found no particular commit which introduced this problem.

CVE: CVE-2015-8543
Cc: Cong Wang
Reported-by: 郭永刚
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-12-15 05:09:30 +0800

04 Dec, 2015

1 commit

6bd4f355d ipv6: kill sk_dst_lock ... Browse Code »

While testing the np->opt RCU conversion, I found that UDP/IPv6 was
using a mixture of xchg() and sk_dst_lock to protect concurrent changes
to sk->sk_dst_cache, leading to possible corruptions and crashes.

ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
way to fix the mess is to remove sk_dst_lock completely, as we did for
IPv4.

__ip6_dst_store() and ip6_dst_store() share same implementation.

sk_setup_caps() being called with socket lock being held or not,
we have to use sk_dst_set() instead of __sk_dst_set()

Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
in ip6_dst_store() before the sk_setup_caps(sk, dst) call.

This is because ip6_dst_store() can be called from process context,
without any lock held.

As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
from another cpu doing a concurrent ip6_dst_store()

Doing the dst dereference before doing the install is needed to make
sure no use after free would trigger.

Signed-off-by: Eric Dumazet
Reported-by: Dmitry Vyukov
Signed-off-by: David S. Miller

Eric Dumazet
2015-12-04 00:32:06 +0800

03 Dec, 2015

1 commit

45f6fad84 ipv6: add complete rcu protection around np->opt ... Browse Code »

This patch addresses multiple problems :

UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.

Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())

This patch adds full RCU protection to np->opt

Reported-by: Dmitry Vyukov
Signed-off-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Eric Dumazet
2015-12-03 12:37:16 +0800

01 Aug, 2015

2 commits

be26849bf ipv6: Disable flowlabel state ranges by default ... Browse Code »

Per RFC6437 stateful flow labels (e.g. labels set by flow label manager)
cannot "disturb" nodes taking part in stateless flow labels. While the
ranges only reduce the flow label entropy by one bit, it is conceivable
that this might bias the algorithm on some routers causing a load
imbalance. For best results on the Internet we really need the full
20 bits.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2015-08-01 08:07:11 +0800
42240901f ipv6: Implement different admin modes for automatic flow labels ... Browse Code »

Change the meaning of net.ipv6.auto_flowlabels to provide a mode for
automatic flow labels generation. There are four modes:

0: flow labels are disabled
1: flow labels are enabled, sockets can opt-out
2: flow labels are allowed, sockets can opt-in
3: flow labels are enabled and enforced, no opt-out for sockets

np->autoflowlabel is initialized according to the sysctl value.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2015-08-01 08:07:11 +0800

10 Jul, 2015

2 commits

35a256fee ipv6: Nonlocal bind ... Browse Code »

Add support to allow non-local binds similar to how this was done for IPv4.
Non-local binds are very useful in emulating the Internet in a box, etc.

This add the ip_nonlocal_bind sysctl under ipv6.

Testing:

Set up nonlocal binding and receive routing on a host, e.g.:

ip -6 rule add from ::/0 iif eth0 lookup 200
ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
sysctl -w net.ipv6.ip_nonlocal_bind=1

Set up routing to 2001:0:0:1::/64 on peer to go to first host

ping6 -I 2001:0:0:1::1 peer-address -- to verify

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2015-07-10 12:09:10 +0800
8b58a3984 ipv6: use flag instead of u16 for hop in inet6_skb_parm ... Browse Code »

Hop was always either 0 or sizeof(struct ipv6hdr).

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2015-07-10 06:06:59 +0800

07 Jun, 2015

1 commit

90c337da1 inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations ... Browse Code »

When an application needs to force a source IP on an active TCP socket
it has to use bind(IP, port=x).

As most applications do not want to deal with already used ports, x is
often set to 0, meaning the kernel is in charge to find an available
port.
But kernel does not know yet if this socket is going to be a listener or
be connected.
It has very limited choices (no full knowledge of final 4-tuple for a
connect())

With limited ephemeral port range (about 32K ports), it is very easy to
fill the space.

This patch adds a new SOL_IP socket option, asking kernel to ignore
the 0 port provided by application in bind(IP, port=0) and only
remember the given IP address.

The port will be automatically chosen at connect() time, in a way
that allows sharing a source port as long as the 4-tuples are unique.

This new feature is available for both IPv4 and IPv6 (Thanks Neal)

Tested:

Wrote a test program and checked its behavior on IPv4 and IPv6.

strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
connect().
Also getsockname() show that the port is still 0 right after bind()
but properly allocated after connect().

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0

IPv6 test :

socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0

I was able to bind()/connect() a million concurrent IPv4 sockets,
instead of ~32000 before patch.

lpaa23:~# ulimit -n 1000010
lpaa23:~# ./bind --connect --num-flows=1000000 &
1000000 sockets

lpaa23:~# grep TCP /proc/net/sockstat
TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66

Check that a given source port is indeed used by many different
connections :

lpaa23:~# ss -t src :40000 | head -10
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 127.0.0.2:40000 127.0.202.33:44983
ESTAB 0 0 127.0.0.2:40000 127.2.27.240:44983
ESTAB 0 0 127.0.0.2:40000 127.2.98.5:44983
ESTAB 0 0 127.0.0.2:40000 127.0.124.196:44983
ESTAB 0 0 127.0.0.2:40000 127.2.139.38:44983
ESTAB 0 0 127.0.0.2:40000 127.1.59.80:44983
ESTAB 0 0 127.0.0.2:40000 127.3.6.228:44983
ESTAB 0 0 127.0.0.2:40000 127.0.38.53:44983
ESTAB 0 0 127.0.0.2:40000 127.1.197.10:44983

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-06-07 14:57:12 +0800

11 May, 2015

1 commit

11aa9c28b net: Pass kern from net_proto_family.create to sk_alloc ... Browse Code »

In preparation for changing how struct net is refcounted
on kernel sockets pass the knowledge that we are creating
a kernel socket from sock_create_kern through to sk_alloc.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-05-11 22:50:17 +0800

04 May, 2015

1 commit

82a584b7c ipv6: Flow label state ranges ... Browse Code »

This patch divides the IPv6 flow label space into two ranges:
0-7ffff is reserved for flow label manager, 80000-fffff will be
used for creating auto flow labels (per RFC6438). This only affects how
labels are set on transmit, it does not affect receive. This range split
can be disbaled by systcl.

Background:

IPv6 flow labels have been an unmitigated disappointment thus far
in the lifetime of IPv6. Support in HW devices to use them for ECMP
is lacking, and OSes don't turn them on by default. If we had these
we could get much better hashing in IPv6 networks without resorting
to DPI, possibly eliminating some of the motivations to to define new
encaps in UDP just for getting ECMP.

Unfortunately, the initial specfications of IPv6 did not clarify
how they are to be used. There has always been a vague concept that
these can be used for ECMP, flow hashing, etc. and we do now have a
good standard how to this in RFC6438. The problem is that flow labels
can be either stateful or stateless (as in RFC6438), and we are
presented with the possibility that a stateless label may collide
with a stateful one. Attempts to split the flow label space were
rejected in IETF. When we added support in Linux for RFC6438, we
could not turn on flow labels by default due to this conflict.

This patch splits the flow label space and should give us
a path to enabling auto flow labels by default for all IPv6 packets.
This is an API change so we need to consider compatibility with
existing deployment. The stateful range is chosen to be the lower
values in hopes that most uses would have chosen small numbers.

Once we resolve the stateless/stateful issue, we can proceed to
look at enabling RFC6438 flow labels by default (starting with
scaled testing).

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2015-05-04 09:58:01 +0800

01 Apr, 2015

2 commits

53b24b8f9 ipv6: coding style: comparison for inequality with NULL ... Browse Code »

The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x != NULL and sometimes as x. x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.

No changes detected by objdiff.

Signed-off-by: Ian Morris
Signed-off-by: David S. Miller

Ian Morris
2015-04-01 01:51:54 +0800
63159f29b ipv6: coding style: comparison for equality with NULL ... Browse Code »

The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x == NULL and sometimes as !x. !x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.

No changes detected by objdiff.

Signed-off-by: Ian Morris
Signed-off-by: David S. Miller

Ian Morris
2015-04-01 01:51:54 +0800

24 Mar, 2015

1 commit

1855b7c3e ipv6: introduce idgen_delay and idgen_retries knobs ... Browse Code »

This is specified by RFC 7217.

Cc: Erik Kline
Cc: Fernando Gont
Cc: Lorenzo Colitti
Cc: YOSHIFUJI Hideaki/吉藤英明
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-03-24 10:12:09 +0800

02 Mar, 2015

1 commit

b4772ef87 net: use common macro for assering skb->cb[] available size in protocol families ... Browse Code »

As part of an effort to move skb->dropcount to skb->cb[] use a common
macro in protocol families using skb->cb[] for ancillary data to
validate available room in skb->cb[].

Signed-off-by: Eyal Birger
Signed-off-by: David S. Miller

Eyal Birger
2015-03-02 13:19:30 +0800

07 Oct, 2014

1 commit

812918c46 ipv6: make fib6 serial number per namespace ... Browse Code »

Try to reduce number of possible fn_sernum mutation by constraining them
to their namespace.

Also remove rt_genid which I forgot to remove in 705f1c869d577c ("ipv6:
remove rt6i_genid").

Cc: YOSHIFUJI Hideaki
Cc: Martin Lau
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2014-10-07 12:02:30 +0800

29 Sep, 2014

1 commit

a224772db ipv6: add a struct inet6_skb_parm param to ipv6_opt_accepted() ... Browse Code »

ipv6_opt_accepted() assumes IP6CB(skb) holds the struct inet6_skb_parm
that it needs. Lets not assume this, as TCP stack might use a different
place.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2014-09-29 04:35:43 +0800

10 Sep, 2014

1 commit

49a601589 net/ipv4: bind ip_nonlocal_bind to current netns ... Browse Code »

net.ipv4.ip_nonlocal_bind sysctl was global to all network
namespaces. This patch allows to set a different value for each
network namespace.

Signed-off-by: Vincent Bernat
Signed-off-by: David S. Miller

Vincent Bernat
2014-09-10 02:27:09 +0800

25 Aug, 2014

1 commit

67ba4152e ipv6: White-space cleansing : Line Layouts ... Browse Code »

This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

A number of items are addressed in this patch:
* Multiple spaces converted to tabs
* Spaces before tabs removed.
* Spaces in pointer typing cleansed (char *)foo etc.
* Remove space after sizeof
* Ensure spacing around comparators such as if statements.

Signed-off-by: Ian Morris
Signed-off-by: David S. Miller

Ian Morris
2014-08-25 13:37:52 +0800

08 Jul, 2014

1 commit

cb1ce2ef3 ipv6: Implement automatic flow label generation on transmit ... Browse Code »

Automatically generate flow labels for IPv6 packets on transmit.
The flow label is computed based on skb_get_hash. The flow label will
only automatically be set when it is zero otherwise (i.e. flow label
manager hasn't set one). This supports the transmit side functionality
of RFC 6438.

Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
functionality per socket.

By default, auto flowlabels are disabled to avoid possible conflicts
with flow label manager, however if this feature proves useful we
may want to enable it by default.

It should also be noted that FreeBSD has already implemented automatic
flow labels (including the sysctl and socket option). In FreeBSD,
automatic flow labels default to enabled.

Performance impact:

Running super_netperf with 200 flows for TCP_RR and UDP_RR for
IPv6. Note that in UDP case, __skb_get_hash will be called for
every packet with explains slight regression. In the TCP case
the hash is saved in the socket so there is no regression.

Automatic flow labels disabled:

TCP_RR:
86.53% CPU utilization
127/195/322 90/95/99% latencies
1.40498e+06 tps

UDP_RR:
90.70% CPU utilization
118/168/243 90/95/99% latencies
1.50309e+06 tps

Automatic flow labels enabled:

TCP_RR:
85.90% CPU utilization
128/199/337 90/95/99% latencies
1.40051e+06

UDP_RR
92.61% CPU utilization
115/164/236 90/95/99% latencies
1.4687e+06

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2014-07-08 12:14:21 +0800

02 Jul, 2014

1 commit

9fe516ba3 inet: move ipv6only in sock_common ... Browse Code »

When an UDP application switches from AF_INET to AF_INET6 sockets, we
have a small performance degradation for IPv4 communications because of
extra cache line misses to access ipv6only information.

This can also be noticed for TCP listeners, as ipv6_only_sock() is also
used from __inet_lookup_listener()->compute_score()

This is magnified when SO_REUSEPORT is used.

Move ipv6only into struct sock_common so that it is available at
no extra cost in lookups.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2014-07-02 14:46:21 +0800

24 May, 2014

1 commit

b26ba202e net: Eliminate no_check from protosw ... Browse Code »

It doesn't seem like an protocols are setting anything other
than the default, and allowing to arbitrarily disable checksums
for a whole protocol seems dangerous. This can be done on a per
socket basis.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2014-05-24 04:28:53 +0800

08 May, 2014

1 commit

698365fa1 net: clean up snmp stats code ... Browse Code »

commit 8f0ea0fe3a036a47767f9c80e (snmp: reduce percpu needs by 50%)
reduced snmp array size to 1, so technically it doesn't have to be
an array any more. What's more, after the following commit:

commit 933393f58fef9963eac61db8093689544e29a600
Date: Thu Dec 22 11:58:51 2011 -0600

percpu: Remove irqsafe_cpu_xxx variants

We simply say that regular this_cpu use must be safe regardless of
preemption and interrupt state. That has no material change for x86
and s390 implementations of this_cpu operations. However, arches that
do not provide their own implementation for this_cpu operations will
now get code generated that disables interrupts instead of preemption.

probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
almost 3 years, no one complains.

So, just convert the array to a single pointer and remove snmp_mib_init()
and snmp_mib_free() as well.

Cc: Christoph Lameter
Cc: Eric Dumazet
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2014-05-08 04:06:05 +0800

20 Jan, 2014

1 commit

6444f72b4 ipv6: add flowlabel_consistency sysctl ... Browse Code »

With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
flow label unicity. This patch introduces a new sysctl to protect the old
behaviour, enable by default.

Changelog of V3:
* rename ip6_flowlabel_consistency to flowlabel_consistency
* use net_info_ratelimited()
* checkpatch cleanups

Signed-off-by: Florent Fourcot
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florent Fourcot
2014-01-20 09:12:31 +0800

20 Dec, 2013

1 commit

1669cb985 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next ... Browse Code »

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2013-12-19

1) Use the user supplied policy index instead of a generated one
if present. From Fan Du.

2) Make xfrm migration namespace aware. From Fan Du.

3) Make the xfrm state and policy locks namespace aware. From Fan Du.

4) Remove ancient sleeping when the SA is in acquire state,
we now queue packets to the policy instead. This replaces the
sleeping code.

5) Remove FLOWI_FLAG_CAN_SLEEP. This was used to notify xfrm about the
posibility to sleep. The sleeping code is gone, so remove it.

6) Check user specified spi for IPComp. Thr spi for IPcomp is only
16 bit wide, so check for a valid value. From Fan Du.

7) Export verify_userspi_info to check for valid user supplied spi ranges
with pfkey and netlink. From Fan Du.

8) RFC3173 states that if the total size of a compressed payload and the IPComp
header is not smaller than the size of the original payload, the IP datagram
must be sent in the original non-compressed form. These packets are dropped
by the inbound policy check because they are not transformed. Document the need
to set 'level use' for IPcomp to receive such packets anyway. From Fan Du.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller

David S. Miller
2013-12-20 07:37:49 +0800

19 Dec, 2013

1 commit

974eda11c inet: make no_pmtu_disc per namespace and kill ipv4_config ... Browse Code »

The other field in ipv4_config, log_martians, was converted to a
per-interface setting, so we can just remove the whole structure.

Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-12-19 05:58:20 +0800

10 Dec, 2013

1 commit

ac1eabcae ipv6: use ip6_flowinfo helper ... Browse Code »

Signed-off-by: Florent Fourcot
Reviewed-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florent Fourcot
2013-12-10 10:03:50 +0800

06 Dec, 2013

1 commit

0e0d44ab4 net: Remove FLOWI_FLAG_CAN_SLEEP ... Browse Code »

FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the posibility
to sleep until the needed states are resolved. This code is gone,
so FLOWI_FLAG_CAN_SLEEP is not needed anymore.

Signed-off-by: Steffen Klassert

Steffen Klassert
2013-12-06 14:24:39 +0800

20 Nov, 2013

1 commit

1ee2dcc22 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:
"Mostly these are fixes for fallout due to merge window changes, as
well as cures for problems that have been with us for a much longer
period of time"

1) Johannes Berg noticed two major deficiencies in our genetlink
registration. Some genetlink protocols we passing in constant
counts for their ops array rather than something like
ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were
using fixed IDs for their multicast groups.

We have to retain these fixed IDs to keep existing userland tools
working, but reserve them so that other multicast groups used by
other protocols can not possibly conflict.

In dealing with these two problems, we actually now use less state
management for genetlink operations and multicast groups.

2) When configuring interface hardware timestamping, fix several
drivers that simply do not validate that the hwtstamp_config value
is one the driver actually supports. From Ben Hutchings.

3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

4) In dev_forward_skb(), set the skb->protocol in the right order
relative to skb_scrub_packet(). From Alexei Starovoitov.

5) Bridge erroneously fails to use the proper wrapper functions to make
calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki
Makita.

6) When detaching a bridge port, make sure to flush all VLAN IDs to
prevent them from leaking, also from Toshiaki Makita.

7) Put in a compromise for TCP Small Queues so that deep queued devices
that delay TX reclaim non-trivially don't have such a performance
decrease. One particularly problematic area is 802.11 AMPDU in
wireless. From Eric Dumazet.

8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
here. Fix from Eric Dumzaet, reported by Dave Jones.

9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

10) When computing mergeable buffer sizes, virtio-net fails to take the
virtio-net header into account. From Michael Dalton.

11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic
bumping, this one has been with us for a while. From Eric Dumazet.

12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
Hugne.

13) 6lowpan bit used for traffic classification was wrong, from Jukka
Rissanen.

14) macvlan has the same issue as normal vlans did wrt. propagating LRO
disabling down to the real device, fix it the same way. From Michal
Kubecek.

15) CPSW driver needs to soft reset all slaves during suspend, from
Daniel Mack.

16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

17) The xen-netfront RX buffer refill timer isn't properly scheduled on
partial RX allocation success, from Ma JieYue.

18) When ipv6 ping protocol support was added, the AF_INET6 protocol
initialization cleanup path on failure was borked a little. Fix
from Vlad Yasevich.

19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
blocks we can do the wrong thing with the msg_name we write back to
userspace. From Hannes Frederic Sowa. There is another fix in the
works from Hannes which will prevent future problems of this nature.

20) Fix route leak in VTI tunnel transmit, from Fan Du.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
genetlink: make multicast groups const, prevent abuse
genetlink: pass family to functions using groups
genetlink: add and use genl_set_err()
genetlink: remove family pointer from genl_multicast_group
genetlink: remove genl_unregister_mc_group()
hsr: don't call genl_unregister_mc_group()
quota/genetlink: use proper genetlink multicast APIs
drop_monitor/genetlink: use proper genetlink multicast APIs
genetlink: only pass array to genl_register_family_with_ops()
tcp: don't update snd_nxt, when a socket is switched from repair mode
atm: idt77252: fix dev refcnt leak
xfrm: Release dst if this dst is improper for vti tunnel
netlink: fix documentation typo in netlink_set_err()
be2net: Delete secondary unicast MAC addresses during be_close
be2net: Fix unconditional enabling of Rx interface options
net, virtio_net: replace the magic value
ping: prevent NULL pointer dereference on write to msg_name
bnx2x: Prevent "timeout waiting for state X"
bnx2x: prevent CFC attention
bnx2x: Prevent panic during DMAE timeout
...

Linus Torvalds
2013-11-20 07:50:47 +0800

19 Nov, 2013

1 commit

eca42aaf8 ipv6: Fix inet6_init() cleanup order ... Browse Code »

Commit 6d0bfe22611602f36617bc7aa2ffa1bbb2f54c67
net: ipv6: Add IPv6 support to the ping socket

introduced a change in the cleanup logic of inet6_init and
has a bug in that ipv6_packet_cleanup() may not be called.
Fix the cleanup ordering.

CC: Hannes Frederic Sowa
CC: Lorenzo Colitti
CC: Fabio Estevam
Signed-off-by: Vlad Yasevich
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Vlad Yasevich
2013-11-19 04:38:46 +0800

14 Nov, 2013

1 commit

5e30025a3 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull core locking changes from Ingo Molnar:
"The biggest changes:

- add lockdep support for seqcount/seqlocks structures, this
unearthed both bugs and required extra annotation.

- move the various kernel locking primitives to the new
kernel/locking/ directory"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
block: Use u64_stats_init() to initialize seqcounts
locking/lockdep: Mark __lockdep_count_forward_deps() as static
lockdep/proc: Fix lock-time avg computation
locking/doc: Update references to kernel/mutex.c
ipv6: Fix possible ipv6 seqlock deadlock
cpuset: Fix potential deadlock w/ set_mems_allowed
seqcount: Add lockdep functionality to seqcount/seqlock structures
net: Explicitly initialize u64_stats_sync structures for lockdep
locking: Move the percpu-rwsem code to kernel/locking/
locking: Move the lglocks code to kernel/locking/
locking: Move the rwsem code to kernel/locking/
locking: Move the rtmutex code to kernel/locking/
locking: Move the semaphore core to kernel/locking/
locking: Move the spinlock code to kernel/locking/
locking: Move the lockdep code to kernel/locking/
locking: Move the mutex code to kernel/locking/
hung_task debugging: Add tracepoint to report the hang
x86/locking/kconfig: Update paravirt spinlock Kconfig description
lockstat: Report avg wait and hold times
lockdep, x86/alternatives: Drop ancient lockdep fixup message
...

Linus Torvalds
2013-11-14 15:30:30 +0800

06 Nov, 2013

1 commit

827da44c6 net: Explicitly initialize u64_stats_sync structures for lockdep ... Browse Code »

In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz
Acked-by: Julian Anastasov
Signed-off-by: Peter Zijlstra
Cc: Alexey Kuznetsov
Cc: "David S. Miller"
Cc: Eric Dumazet
Cc: Hideaki YOSHIFUJI
Cc: James Morris
Cc: Jesse Gross
Cc: Mathieu Desnoyers
Cc: "Michael S. Tsirkin"
Cc: Mirko Lindner
Cc: Patrick McHardy
Cc: Roger Luethi
Cc: Rusty Russell
Cc: Simon Horman
Cc: Stephen Hemminger
Cc: Steven Rostedt
Cc: Thomas Petazzoni
Cc: Wensong Zhang
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar

John Stultz
2013-11-06 19:40:25 +0800

22 Oct, 2013

1 commit

a4fe34bf9 tcp_memcontrol: Remove the per netns control. ... Browse Code »

The code that is implemented is per memory cgroup not per netns, and
having per netns bits is just confusing. Remove the per netns bits to
make it easier to see what is really going on.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2013-10-22 06:43:02 +0800