Eric Lee / smarc-fsl-linux-kernel

25 Feb, 2018

1 commit

f6b581e3d dn_getsockoptdecnet: move nf_{get/set}sockopt outside sock lock ... Browse Code »

commit dfec091439bb2acf763497cfc58f2bdfc67c56b7 upstream.

After commit 3f34cfae1238 ("netfilter: on sockopt() acquire sock lock
only in the required scope"), the caller of nf_{get/set}sockopt() must
not hold any lock, but, in such changeset, I forgot to cope with DECnet.

This commit addresses the issue moving the nf call outside the lock,
in the dn_{get,set}sockopt() with the same schema currently used by
ipv4 and ipv6. Also moves the unhandled sockopts of the end of the main
switch statements, to improve code readability.

Reported-by: Petr Vandrovec
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=198791#c2
Fixes: 3f34cfae1238 ("netfilter: on sockopt() acquire sock lock only in the required scope")
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Paolo Abeni
2018-02-25 18:07:52 +0800

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license ... Browse Code »

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-11-02 18:10:55 +0800

04 Sep, 2017

1 commit

b63f6044d Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree. Basically, updates to the conntrack core, enhancements for
nf_tables, conversion of netfilter hooks from linked list to array to
improve memory locality and asorted improvements for the Netfilter
codebase. More specifically, they are:

1) Add expection to hashes after timer initialization to prevent
access from another CPU that walks on the hashes and calls
del_timer(), from Florian Westphal.

2) Don't update nf_tables chain counters from hot path, this is only
used by the x_tables compatibility layer.

3) Get rid of nested rcu_read_lock() calls from netfilter hook path.
Hooks are always guaranteed to run from rcu read side, so remove
nested rcu_read_lock() where possible. Patch from Taehee Yoo.

4) nf_tables new ruleset generation notifications include PID and name
of the process that has updated the ruleset, from Phil Sutter.

5) Use skb_header_pointer() from nft_fib, so we can reuse this code from
the nf_family netdev family. Patch from Pablo M. Bermudo.

6) Add support for nft_fib in nf_tables netdev family, also from Pablo.

7) Use deferrable workqueue for conntrack garbage collection, to reduce
power consumption, from Patch from Subash Abhinov Kasiviswanathan.

8) Add nf_ct_expect_iterate_net() helper and use it. From Florian
Westphal.

9) Call nf_ct_unconfirmed_destroy only from cttimeout, from Florian.

10) Drop references on conntrack removal path when skbuffs has escaped via
nfqueue, from Florian.

11) Don't queue packets to nfqueue with dying conntrack, from Florian.

12) Constify nf_hook_ops structure, from Florian.

13) Remove neededlessly branch in nf_tables trace code, from Phil Sutter.

14) Add nla_strdup(), from Phil Sutter.

15) Rise nf_tables objects name size up to 255 chars, people want to use
DNS names, so increase this according to what RFC 1035 specifies.
Patch series from Phil Sutter.

16) Kill nf_conntrack_default_on, it's broken. Default on conntrack hook
registration on demand, suggested by Eric Dumazet, patch from Florian.

17) Remove unused variables in compat_copy_entry_from_user both in
ip_tables and arp_tables code. Patch from Taehee Yoo.

18) Constify struct nf_conntrack_l4proto, from Julia Lawall.

19) Constify nf_loginfo structure, also from Julia.

20) Use a single rb root in connlimit, from Taehee Yoo.

21) Remove unused netfilter_queue_init() prototype, from Taehee Yoo.

22) Use audit_log() instead of open-coding it, from Geliang Tang.

23) Allow to mangle tcp options via nft_exthdr, from Florian.

24) Allow to fetch TCP MSS from nft_rt, from Florian. This includes
a fix for a miscalculation of the minimal length.

25) Simplify branch logic in h323 helper, from Nick Desaulniers.

26) Calculate netlink attribute size for conntrack tuple at compile
time, from Florian.

27) Remove protocol name field from nf_conntrack_{l3,l4}proto structure.
From Florian.

28) Remove holes in nf_conntrack_l4proto structure, so it becomes
smaller. From Florian.

29) Get rid of print_tuple() indirection for /proc conntrack listing.
Place all the code in net/netfilter/nf_conntrack_standalone.c.
Patch from Florian.

30) Do not built in print_conntrack() if CONFIG_NF_CONNTRACK_PROCFS is
off. From Florian.

31) Constify most nf_conntrack_{l3,l4}proto helper functions, from
Florian.

32) Fix broken indentation in ebtables extensions, from Colin Ian King.

33) Fix several harmless sparse warning, from Florian.

34) Convert netfilter hook infrastructure to use array for better memory
locality, joint work done by Florian and Aaron Conole. Moreover, add
some instrumentation to debug this.

35) Batch nf_unregister_net_hooks() calls, to call synchronize_net once
per batch, from Florian.

36) Get rid of noisy logging in ICMPv6 conntrack helper, from Florian.

37) Get rid of obsolete NFDEBUG() instrumentation, from Varsha Rao.

38) Remove unused code in the generic protocol tracker, from Davide
Caratti.

I think I will have material for a second Netfilter batch in my queue if
time allow to make it fit in this merge window.
====================

Signed-off-by: David S. Miller

David S. Miller
2017-09-04 08:08:42 +0800

30 Aug, 2017

1 commit

eaa72dc47 neigh: increase queue_len_bytes to match wmem_default ... Browse Code »

Florian reported UDP xmit drops that could be root caused to the
too small neigh limit.

Current limit is 64 KB, meaning that even a single UDP socket would hit
it, since its default sk_sndbuf comes from net.core.wmem_default
(~212992 bytes on 64bit arches).

Once ARP/ND resolution is in progress, we should allow a little more
packets to be queued, at least for one producer.

Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf
limit and either block in sendmsg() or return -EAGAIN.

Signed-off-by: Eric Dumazet
Reported-by: Florian Fainelli
Signed-off-by: David S. Miller

Eric Dumazet
2017-08-30 07:10:50 +0800

10 Aug, 2017

1 commit

b97bac64a rtnetlink: make rtnl_register accept a flags parameter ... Browse Code »

This change allows us to later indicate to rtnetlink core that certain
doit functions should be called without acquiring rtnl_mutex.

This change should have no effect, we simply replace the last (now
unused) calcit argument with the new flag.

Signed-off-by: Florian Westphal
Reviewed-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florian Westphal
2017-08-10 07:57:38 +0800

01 Aug, 2017

1 commit

591bb2789 netfilter: nf_hook_ops structs can be const ... Browse Code »

We no longer place these on a list so they can be const.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2017-08-01 01:10:44 +0800

05 Jul, 2017

1 commit

e0542dd51 net, decnet: convert dn_fib_info.fib_clntref from atomic_t to refcount_t ... Browse Code »

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller

Reshetova, Elena
2017-07-05 05:35:15 +0800

01 Jul, 2017

1 commit

9f2374301 net: convert neighbour.refcnt from atomic_t to refcount_t ... Browse Code »

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller

Reshetova, Elena
2017-07-01 22:39:07 +0800

22 Jun, 2017

1 commit

3d0919824 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Two entries being added at the same time to the IFLA
policy table, whilst parallel bug fixes to decnet
routing dst handling overlapping with the dst gc removal
in net-next.

Signed-off-by: David S. Miller

David S. Miller
2017-06-22 05:35:22 +0800

18 Jun, 2017

2 commits

b2a9c0ed7 net: remove DST_NOGC flag ... Browse Code »

Now that all the components have been changed to release dst based on
refcnt only and not depend on dst gc anymore, we can remove the
temporary flag DST_NOGC.

Note that we also need to remove the DST_NOCACHE check in dst_release()
and dst_hold_safe() because now all the dst are released based on refcnt
and behaves as DST_NOCACHE.

Signed-off-by: Wei Wang
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Wei Wang
2017-06-18 10:54:01 +0800
560fd93bc decnet: take dst->__refcnt when struct dn_route is created ... Browse Code »

struct dn_route is inserted into dn_rt_hash_table but no dst->__refcnt
is taken.
This patch makes sure the dn_rt_hash_table's reference to the dst is ref
counted.

As the dst is always ref counted properly, we can safely mark
DST_NOGC flag so dst_release() will release dst based on refcnt only.
And dst gc is no longer needed and all dst_free() or its related
function calls should be replaced with dst_release() or
dst_release_immediate(). And dst_dev_put() is called when removing dst
from the hash table to release the reference on dst->dev before we lose
pointer to it.

Also, correct the logic in dn_dst_check_expire() and dn_dst_gc() to
check dst->__refcnt to be > 1 to indicate it is referenced by other
users.

Signed-off-by: Wei Wang
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Wei Wang
2017-06-18 10:54:01 +0800

17 Jun, 2017

2 commits

247488c0a decnet: always not take dst->__refcnt when inserting dst into hash table ... Browse Code »

In the existing dn_route.c code, dn_route_output_slow() takes
dst->__refcnt before calling dn_insert_route() while dn_route_input_slow()
does not take dst->__refcnt before calling dn_insert_route().
This makes the whole routing code very buggy.
In dn_dst_check_expire(), dnrt_free() is called when rt expires. This
makes the routes inserted by dn_route_output_slow() not able to be
freed as the refcnt is not released.
In dn_dst_gc(), dnrt_drop() is called to release rt which could
potentially cause the dst->__refcnt to be dropped to -1.
In dn_run_flush(), dst_free() is called to release all the dst. Again,
it makes the dst inserted by dn_route_output_slow() not able to be
released and also, it does not wait on the rcu and could potentially
cause crash in the path where other users still refer to this dst.

This patch makes sure both input and output path do not take
dst->__refcnt before calling dn_insert_route() and also makes sure
dnrt_free()/dst_free() is called when removing dst from the hash table.
The only difference between those 2 calls is that dnrt_free() waits on
the rcu while dst_free() does not.

Signed-off-by: Wei Wang
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Wei Wang
2017-06-17 03:00:00 +0800
76371d2e3 decnet: always not take dst->__refcnt when inserting dst into hash table ... Browse Code »

In the existing dn_route.c code, dn_route_output_slow() takes
dst->__refcnt before calling dn_insert_route() while dn_route_input_slow()
does not take dst->__refcnt before calling dn_insert_route().
This makes the whole routing code very buggy.
In dn_dst_check_expire(), dnrt_free() is called when rt expires. This
makes the routes inserted by dn_route_output_slow() not able to be
freed as the refcnt is not released.
In dn_dst_gc(), dnrt_drop() is called to release rt which could
potentially cause the dst->__refcnt to be dropped to -1.
In dn_run_flush(), dst_free() is called to release all the dst. Again,
it makes the dst inserted by dn_route_output_slow() not able to be
released and also, it does not wait on the rcu and could potentially
cause crash in the path where other users still refer to this dst.

This patch makes sure both input and output path do not take
dst->__refcnt before calling dn_insert_route() and also makes sure
dnrt_free()/dst_free() is called when removing dst from the hash table.
The only difference between those 2 calls is that dnrt_free() waits on
the rcu while dst_free() does not.

Signed-off-by: Wei Wang
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Wei Wang
2017-06-17 02:59:36 +0800

16 Jun, 2017

4 commits

634fef610 networking: add and use skb_put_u8() ... Browse Code »

Joe and Bjørn suggested that it'd be nicer to not have the
cast in the fairly common case of doing
*(u8 *)skb_put(skb, 1) = c;

Add skb_put_u8() for this case, and use it across the code,
using the following spatch:

@@
expression SKB, C, S;
typedef u8;
identifier fn = {skb_put};
fresh identifier fn2 = fn ## "_u8";
@@
- *(u8 *)fn(SKB, S) = C;
+ fn2(SKB, C);

Note that due to the "S", the spatch isn't perfect, it should
have checked that S is 1, but there's also places that use a
sizeof expression like sizeof(var) or sizeof(u8) etc. Turns
out that nobody ever did something like
*(u8 *)skb_put(skb, 2) = c;

which would be wrong anyway since the second byte wouldn't be
initialized.

Suggested-by: Joe Perches
Suggested-by: Bjørn Mork
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2017-06-16 23:48:40 +0800
d58ff3512 networking: make skb_push & __skb_push return void pointers ... Browse Code »

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)

@@
expression E, SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)

@@
expression SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- fn(SKB, LEN)[0]
+ *(u8 *)fn(SKB, LEN)

Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2017-06-16 23:48:40 +0800
4df864c1d networking: make skb_put & friends return void pointers ... Browse Code »

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:

@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_put, __skb_put };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)

@@
expression E, SKB, LEN;
identifier fn = { skb_put, __skb_put };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)

which actually doesn't cover pskb_put since there are only three
users overall.

A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2017-06-16 23:48:39 +0800
59ae1d127 networking: introduce and use skb_put_data() ... Browse Code »

A common pattern with skb_put() is to just want to memcpy()
some data into the new space, introduce skb_put_data() for
this.

An spatch similar to the one for skb_put_zero() converts many
of the places using it:

@@
identifier p, p2;
expression len, skb, data;
type t, t2;
@@
(
-p = skb_put(skb, len);
+p = skb_put_data(skb, data, len);
|
-p = (t)skb_put(skb, len);
+p = skb_put_data(skb, data, len);
)
(
p2 = (t2)p;
-memcpy(p2, data, len);
|
-memcpy(p, data, len);
)

@@
type t, t2;
identifier p, p2;
expression skb, data;
@@
t *p;
...
(
-p = skb_put(skb, sizeof(t));
+p = skb_put_data(skb, data, sizeof(t));
|
-p = (t *)skb_put(skb, sizeof(t));
+p = skb_put_data(skb, data, sizeof(t));
)
(
p2 = (t2)p;
-memcpy(p2, data, sizeof(*p));
|
-memcpy(p, data, sizeof(*p));
)

@@
expression skb, len, data;
@@
-memcpy(skb_put(skb, len), data, len);
+skb_put_data(skb, data, len);

(again, manually post-processed to retain some comments)

Reviewed-by: Stephen Hemminger
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2017-06-16 23:48:37 +0800

15 Jun, 2017

1 commit

0ddead90b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.

Signed-off-by: David S. Miller

David S. Miller
2017-06-15 23:59:32 +0800

08 Jun, 2017

4 commits

060447511 tcp: add TCPMemoryPressuresChrono counter ... Browse Code »

DRAM supply shortage and poor memory pressure tracking in TCP
stack makes any change in SO_SNDBUF/SO_RCVBUF (or equivalent autotuning
limits) and tcp_mem[] quite hazardous.

TCPMemoryPressures SNMP counter is an indication of tcp_mem sysctl
limits being hit, but only tracking number of transitions.

If TCP stack behavior under stress was perfect :
1) It would maintain memory usage close to the limit.
2) Memory pressure state would be entered for short times.

We certainly prefer 100 events lasting 10ms compared to one event
lasting 200 seconds.

This patch adds a new SNMP counter tracking cumulative duration of
memory pressure events, given in ms units.

$ cat /proc/sys/net/ipv4/tcp_mem
3088 4117 6176
$ grep TCP /proc/net/sockstat
TCP: inuse 180 orphan 0 tw 2 alloc 234 mem 4140
$ nstat -n ; sleep 10 ; nstat |grep Pressure
TcpExtTCPMemoryPressures 1700
TcpExtTCPMemoryPressuresChrono 5209

v2: Used EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL() as David
instructed.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2017-06-08 23:26:19 +0800
dd0da17b2 decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb ... Browse Code »

Verify that the length of the socket buffer is sufficient to cover the
nlmsghdr structure before accessing the nlh->nlmsg_len field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < sizeof(*nlh) expression.

The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.

Signed-off-by: Mateusz Jurczyk
Signed-off-by: David S. Miller

Mateusz Jurczyk
2017-06-08 22:51:22 +0800
c164772dd Revert "decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb" ... Browse Code »

This reverts commit 85eac2ba35a2dbfbdd5767c7447a4af07444a5b4.

There is an updated version of this fix which we should
use instead.

Signed-off-by: David S. Miller

David S. Miller
2017-06-08 22:50:18 +0800
85eac2ba3 decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb ... Browse Code »

Verify that the length of the socket buffer is sufficient to cover the
entire nlh->nlmsg_len field before accessing that field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < sizeof(*nlh) expression.

The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.

Signed-off-by: Mateusz Jurczyk
Signed-off-by: David S. Miller

Mateusz Jurczyk
2017-06-08 22:38:54 +0800

10 May, 2017

1 commit

50fb55d88 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Fix multiqueue in stmmac driver on PCI, from Andy Shevchenko.

2) cdc_ncm doesn't actually fully zero out the padding area is
allocates on TX, from Jim Baxter.

3) Don't leak map addresses in BPF verifier, from Daniel Borkmann.

4) If we randomize TCP timestamps, we have to do it everywhere
including SYN cookies. From Eric Dumazet.

5) Fix "ethtool -S" crash in aquantia driver, from Pavel Belous.

6) Fix allocation size for ntp filter bitmap in bnxt_en driver, from
Dan Carpenter.

7) Add missing memory allocation return value check to DSA loop driver,
from Christophe Jaillet.

8) Fix XDP leak on driver unload in qed driver, from Suddarsana Reddy
Kalluru.

9) Don't inherit MC list from parent inet connection sockets, another
syzkaller spotted gem. Fix from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
dccp/tcp: do not inherit mc_list from parent
qede: Split PF/VF ndos.
qed: Correct doorbell configuration for !4Kb pages
qed: Tell QM the number of tasks
qed: Fix VF removal sequence
qede: Fix XDP memory leak on unload
net/mlx4_core: Reduce harmless SRIOV error message to debug level
net/mlx4_en: Avoid adding steering rules with invalid ring
net/mlx4_en: Change the error print to debug print
drivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparison
DECnet: Use container_of() for embedded struct
Revert "ipv4: restore rt->fi for reference counting"
net: mdio-mux: bcm-iproc: call mdiobus_free() in error path
net: ethernet: ti: cpsw: adjust cpsw fifos depth for fullduplex flow control
ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf
net: cdc_ncm: Fix TX zero padding
stmmac: pci: split out common_default_data() helper
stmmac: pci: RX queue routing configuration
stmmac: pci: TX and RX queue priority configuration
stmmac: pci: set default number of rx and tx queues
...

Linus Torvalds
2017-05-10 06:42:31 +0800

09 May, 2017

2 commits

f92ceb01c DECnet: Use container_of() for embedded struct ... Browse Code »

Instead of a direct cross-type cast, use conatiner_of() to locate
the embedded structure, even in the face of future struct layout
randomization.

Signed-off-by: Kees Cook
Signed-off-by: David S. Miller

Kees Cook
2017-05-09 21:39:49 +0800
063246641 format-security: move static strings to const ... Browse Code »

While examining output from trial builds with -Wformat-security enabled,
many strings were found that should be defined as "const", or as a char
array instead of char pointer. This makes some static analysis easier,
by producing fewer false positives.

As these are all trivial changes, it seemed best to put them all in a
single patch rather than chopping them up per maintainer.

Link: http://lkml.kernel.org/r/20170405214711.GA5711@beast
Signed-off-by: Kees Cook
Acked-by: Jes Sorensen [runner.c]
Cc: Tony Lindgren
Cc: Russell King
Cc: "Maciej W. Rozycki"
Cc: Ralf Baechle
Cc: Arnd Bergmann
Cc: Greg Kroah-Hartman
Cc: "Rafael J. Wysocki"
Cc: Viresh Kumar
Cc: Daniel Vetter
Cc: Jani Nikula
Cc: Sean Paul
Cc: David Airlie
Cc: Yisen Zhuang
Cc: Salil Mehta
Cc: Thomas Bogendoerfer
Cc: Jiri Slaby
Cc: Patrice Chotard
Cc: "David S. Miller"
Cc: James Hogan
Cc: Paul Burton
Cc: Matt Redfearn
Cc: Paolo Bonzini
Cc: Ingo Molnar
Cc: Rasmus Villemoes
Cc: Mugunthan V N
Cc: Felipe Balbi
Cc: Jarod Wilson
Cc: Florian Westphal
Cc: Antonio Quartulli
Cc: Dmitry Torokhov
Cc: Kejian Yan
Cc: Daode Huang
Cc: Qianqian Xie
Cc: Philippe Reynes
Cc: Colin Ian King
Cc: Eric Dumazet
Cc: Christian Gromm
Cc: Andrey Shvetsov
Cc: Jason Litzinger
Cc: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2017-05-09 08:15:14 +0800

01 May, 2017

1 commit

a01aa920b Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for your net-next
tree. A large bunch of code cleanups, simplify the conntrack extension
codebase, get rid of the fake conntrack object, speed up netns by
selective synchronize_net() calls. More specifically, they are:

1) Check for ct->status bit instead of using nfct_nat() from IPVS and
Netfilter codebase, patch from Florian Westphal.

2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.

3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.

4) Introduce nft_is_base_chain() helper function.

5) Enforce expectation limit from userspace conntrack helper,
from Gao Feng.

6) Add nf_ct_remove_expect() helper function, from Gao Feng.

7) NAT mangle helper function return boolean, from Gao Feng.

8) ctnetlink_alloc_expect() should only work for conntrack with
helpers, from Gao Feng.

9) Add nfnl_msg_type() helper function to nfnetlink to build the
netlink message type.

10) Get rid of unnecessary cast on void, from simran singhal.

11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
also from simran singhal.

12) Use list_prev_entry() from nf_tables, from simran signhal.

13) Remove unnecessary & on pointer function in the Netfilter and IPVS
code.

14) Remove obsolete comment on set of rules per CPU in ip6_tables,
no longer true. From Arushi Singhal.

15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.

16) Remove unnecessary nested rcu_read_lock() in
__nf_nat_decode_session(). Code running from hooks are already
guaranteed to run under RCU read side.

17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.

18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
also from Aaron.

19) Get rid of unsed __ip_set_get_netlink(), from Aaron Conole.

20) Don't propagate NF_DROP error to userspace via ctnetlink in
__nf_nat_alloc_null_binding() function, from Gao Feng.

21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
from Gao Feng.

22) Kill the fake untracked conntrack objects, use ctinfo instead to
annotate a conntrack object is untracked, from Florian Westphal.

23) Remove nf_ct_is_untracked(), now obsolete since we have no
conntrack template anymore, from Florian.

24) Add event mask support to nft_ct, also from Florian.

25) Move nf_conn_help structure to
include/net/netfilter/nf_conntrack_helper.h.

26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
Thus, we don't deal with variable conntrack extensions anymore.
Make sure userspace conntrack helper doesn't go over that size.
Remove variable size ct extension infrastructure now this code
got no more clients. From Florian Westphal.

27) Restore offset and length of nf_ct_ext structure to 8 bytes now
that wraparound is not possible any longer, also from Florian.

28) Allow to get rid of unassured flows under stress in conntrack,
this applies to DCCP, SCTP and TCP protocols, from Florian.

29) Shrink size of nf_conntrack_ecache structure, from Florian.

30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
from Gao Feng.

31) Register SYNPROXY hooks on demand, from Florian Westphal.

32) Use pernet hook whenever possible, instead of global hook
registration, from Florian Westphal.

33) Pass hook structure to ebt_register_table() to consolidate some
infrastructure code, from Florian Westphal.

34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
SYNPROXY code, to make sure device stats are not fooled, patch
from Gao Feng.

35) Remove NF_CT_EXT_F_PREALLOC this kills quite some code that we
don't need anymore if we just select a fixed size instead of
expensive runtime time calculation of this. From Florian.

36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
from Florian.

37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
Florian.

38) Attach NAT extension on-demand from masquerade and pptp helper
path, from Florian.

39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.

40) Speed up netns by selective calls of synchronize_net(), from
Florian Westphal.

41) Silence stack size warning gcc in 32-bit arch in snmp helper,
from Florian.

42) Inconditionally call nf_ct_ext_destroy(), even if we have no
extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
Liping Zhang.
====================

Signed-off-by: David S. Miller

David S. Miller
2017-05-01 22:47:53 +0800

26 Apr, 2017

1 commit

1a0ed0ad4 netfilter: decnet: only register hooks in init namespace ... Browse Code »

looks like decnet isn't namespacified in first place, so restrict hook
registration to the initial namespace.

Prepares for eventual removal of legacy nf_register_hook() api.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2017-04-26 15:30:21 +0800

18 Apr, 2017

1 commit

c21ef3e34 net: rtnetlink: plumb extended ack to doit function ... Browse Code »

Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
for doit functions that call it directly.

This is the first step to using extended error reporting in rtnetlink.
>From here individual subsystems can be updated to set netlink_ext_ack as
needed.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-04-18 03:35:38 +0800

14 Apr, 2017

2 commits

fceb6435e netlink: pass extended ACK struct to parsing functions ... Browse Code »

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2017-04-14 01:58:22 +0800
2d4bc9336 netlink: extended ACK reporting ... Browse Code »

Add the base infrastructure and UAPI for netlink extended ACK
reporting. All "manual" calls to netlink_ack() pass NULL for now and
thus don't get extended ACK reporting.

Big thanks goes to Pablo Neira Ayuso for not only bringing up the
whole topic at netconf (again) but also coming up with the nlattr
passing trick and various other ideas.

Signed-off-by: Johannes Berg
Reviewed-by: David Ahern
Signed-off-by: David S. Miller

Johannes Berg
2017-04-14 01:58:20 +0800

16 Mar, 2017

1 commit

101c43149 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/ethernet/broadcom/genet/bcmgenet.c
net/core/sock.c

Conflicts were overlapping changes in bcmgenet and the
lockdep handling of sockets.

Signed-off-by: David S. Miller

David S. Miller
2017-03-16 02:59:10 +0800

10 Mar, 2017

1 commit

cdfbabfb2 net: Work around lockdep limitation in sockets that use sockets ... Browse Code »

Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

(1) If the pagefault handler decides it needs to read pages from AFS, it
calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
creating a call requires the socket lock:

mmap_sem must be taken before sk_lock-AF_RXRPC

(2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind()
binds the underlying UDP socket whilst holding its socket lock.
inet_bind() takes its own socket lock:

sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

(3) Reading from a TCP socket into a userspace buffer might cause a fault
and thus cause the kernel to take the mmap_sem, but the TCP socket is
locked whilst doing this:

sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks. The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace. This is
a limitation in the design of lockdep.

Fix the general case by:

(1) Double up all the locking keys used in sockets so that one set are
used if the socket is created by userspace and the other set is used
if the socket is created by the kernel.

(2) Store the kern parameter passed to sk_alloc() in a variable in the
sock struct (sk_kern_sock). This informs sock_lock_init(),
sock_init_data() and sk_clone_lock() as to the lock keys to be used.

Note that the child created by sk_clone_lock() inherits the parent's
kern setting.

(3) Add a 'kern' parameter to ->accept() that is analogous to the one
passed in to ->create() that distinguishes whether kernel_accept() or
sys_accept4() was the caller and can be passed to sk_alloc().

Note that a lot of accept functions merely dequeue an already
allocated socket. I haven't touched these as the new socket already
exists before we get the parameter.

Note also that there are a couple of places where I've made the accepted
socket unconditionally kernel-based:

irda_accept()
rds_rcp_accept_one()
tcp_accept_from_sock()

because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal. I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells
Signed-off-by: David S. Miller

David Howells
2017-03-10 10:23:27 +0800

08 Mar, 2017

1 commit

9c28286b1 decnet: Use TCP nagle macro instead of literal number in decnet ... Browse Code »

Use existing TCP nagle macro TCP_NAGLE_OFF and TCP_NAGLE_CORK instead
of the literal number 1 and 2 in the current decnet codes.

Signed-off-by: Gao Feng
Signed-off-by: David S. Miller

Gao Feng
2017-03-08 06:07:55 +0800

02 Mar, 2017

1 commit

3f07c0144 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> ... Browse Code »

We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2017-03-02 15:42:29 +0800

25 Dec, 2016

1 commit

7c0f6ba68 Replace <asm/uaccess.h> with <linux/uaccess.h> globally ... Browse Code »

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
sed -i -e "s!$PATT!#include !" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-12-25 03:46:01 +0800

18 Dec, 2016

1 commit

9d1c0ca5e net: use designated initializers ... Browse Code »

Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook
Signed-off-by: David S. Miller

Kees Cook
2016-12-18 00:56:57 +0800

15 Nov, 2016

1 commit

d9dc8b0f8 net: fix sleeping for sk_wait_event() ... Browse Code »

Similar to commit 14135f30e33c ("inet: fix sleeping inside inet_wait_for_connect()"),
sk_wait_event() needs to fix too, because release_sock() is blocking,
it changes the process state back to running after sleep, which breaks
the previous prepare_to_wait().

Switch to the new wait API.

Cc: Eric Dumazet
Cc: Peter Zijlstra
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2016-11-15 02:17:21 +0800

06 Jul, 2016

1 commit

ab58298cf net: fix decnet rtnexthop parsing ... Browse Code »

dn_fib_count_nhs() could enter an infinite loop if nhp->rtnh_len == 0
(i.e. if userspace passes a malformed netlink message).

Let's use the helpers from net/nexthop.h which take care of all this
stuff. We can do exactly the same as e.g. fib_count_nexthops() and
fib_get_nhs() from net/ipv4/fib_semantics.c.

This fixes the softlockup for me.

Cc: Thomas Graf
Signed-off-by: Vegard Nossum
Signed-off-by: David S. Miller

Vegard Nossum
2016-07-06 05:08:47 +0800

11 Apr, 2016

1 commit

a36a0d400 decnet: Do not build routes to devices without decnet private data. ... Browse Code »

In particular, make sure we check for decnet private presence
for loopback devices.

Signed-off-by: David S. Miller

David S. Miller
2016-04-11 11:01:30 +0800

15 Dec, 2015

1 commit

79462ad02 net: add validation for the socket syscall protocol argument ... Browse Code »

郭永刚 reported that one could simply crash the kernel as root by
using a simple program:

int socket_fd;
struct sockaddr_in addr;
addr.sin_port = 0;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_family = 10;

socket_fd = socket(10,3,0x40000000);
connect(socket_fd , &addr,16);

AF_INET, AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field thus is sized accordingly,
thus larger protocol identifiers simply cut off the higher bits and
store a zero in the protocol fields.

This could lead to e.g. NULL function pointer because as a result of
the cut off inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.

kernel: Call Trace:
kernel: [] ? inet_autobind+0x2e/0x70
kernel: [] inet_dgram_connect+0x54/0x80
kernel: [] SYSC_connect+0xd9/0x110
kernel: [] ? ptrace_notify+0x5b/0x80
kernel: [] ? syscall_trace_enter_phase2+0x108/0x200
kernel: [] SyS_connect+0xe/0x10
kernel: [] tracesys_phase2+0x84/0x89

I found no particular commit which introduced this problem.

CVE: CVE-2015-8543
Cc: Cong Wang
Reported-by: 郭永刚
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-12-15 05:09:30 +0800