Eric Lee / smarc-fsl-linux-kernel

06 Mar, 2019

1 commit

98fa15f34 mm: replace all open encodings for NUMA_NO_NODE ... Browse Code »

Patch series "Replace all open encodings for NUMA_NO_NODE", v3.

All these places for replacement were found by running the following
grep patterns on the entire kernel code. Please let me know if this
might have missed some instances. This might also have replaced some
false positives. I will appreciate suggestions, inputs and review.

1. git grep "nid == -1"
2. git grep "node == -1"
3. git grep "nid = -1"
4. git grep "node = -1"

This patch (of 2):

At present there are multiple places where invalid node number is
encoded as -1. Even though implicitly understood it is always better to
have macros in there. Replace these open encodings for an invalid node
number with the global macro NUMA_NO_NODE. This helps remove NUMA
related assumptions like 'invalid node' from various places redirecting
them to a common definition.

Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual
Reviewed-by: David Hildenbrand
Acked-by: Jeff Kirsher [ixgbe]
Acked-by: Jens Axboe [mtip32xx]
Acked-by: Vinod Koul [dmaengine.c]
Acked-by: Michael Ellerman [powerpc]
Acked-by: Doug Ledford [drivers/infiniband]
Cc: Joseph Qi
Cc: Hans Verkuil
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Anshuman Khandual
2019-03-06 13:07:14 +0800

14 Sep, 2018

1 commit

f91845da9 pktgen: Fix fall-through annotation ... Browse Code »

Replace "fallthru" with a proper "fall through" annotation.

This fix is part of the ongoing efforts to enabling
-Wimplicit-fallthrough

Signed-off-by: Gustavo A. R. Silva
Signed-off-by: David S. Miller

Gustavo A. R. Silva
2018-09-14 06:36:41 +0800

28 Jul, 2018

1 commit

7a49d3d4e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next ... Browse Code »

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2018-07-27

1) Extend the output_mark to also support the input direction
and masking the mark values before applying to the skb.

2) Add a new lookup key for the upcomming xfrm interfaces.

3) Extend the xfrm lookups to match xfrm interface IDs.

4) Add virtual xfrm interfaces. The purpose of these interfaces
is to overcome the design limitations that the existing
VTI devices have.

The main limitations that we see with the current VTI are the
following:

VTI interfaces are L3 tunnels with configurable endpoints.
For xfrm, the tunnel endpoint are already determined by the SA.
So the VTI tunnel endpoints must be either the same as on the
SA or wildcards. In case VTI tunnel endpoints are same as on
the SA, we get a one to one correlation between the SA and
the tunnel. So each SA needs its own tunnel interface.

On the other hand, we can have only one VTI tunnel with
wildcard src/dst tunnel endpoints in the system because the
lookup is based on the tunnel endpoints. The existing tunnel
lookup won't work with multiple tunnels with wildcard
tunnel endpoints. Some usecases require more than on
VTI tunnel of this type, for example if somebody has multiple
namespaces and every namespace requires such a VTI.

VTI needs separate interfaces for IPv4 and IPv6 tunnels.
So when routing to a VTI, we have to know to which address
family this traffic class is going to be encapsulated.
This is a lmitation because it makes routing more complex
and it is not always possible to know what happens behind the
VTI, e.g. when the VTI is move to some namespace.

VTI works just with tunnel mode SAs. We need generic interfaces
that ensures transfomation, regardless of the xfrm mode and
the encapsulated address family.

VTI is configured with a combination GRE keys and xfrm marks.
With this we have to deal with some extra cases in the generic
tunnel lookup because the GRE keys on the VTI are actually
not GRE keys, the GRE keys were just reused for something else.
All extensions to the VTI interfaces would require to add
even more complexity to the generic tunnel lookup.

So to overcome this, we developed xfrm interfaces with the
following design goal:

It should be possible to tunnel IPv4 and IPv6 through the same
interface.

No limitation on xfrm mode (tunnel, transport and beet).

Should be a generic virtual interface that ensures IPsec
transformation, no need to know what happens behind the
interface.

Interfaces should be configured with a new key that must match a
new policy/SA lookup key.

The lookup logic should stay in the xfrm codebase, no need to
change or extend generic routing and tunnel lookups.

Should be possible to use IPsec hardware offloads of the underlying
interface.

5) Remove xfrm pcpu policy cache. This was added after the flowcache
removal, but it turned out to make things even worse.
From Florian Westphal.

6) Allow to update the set mark on SA updates.
From Nathan Harold.

7) Convert some timestamps to time64_t.
From Arnd Bergmann.

8) Don't check the offload_handle in xfrm code,
it is an opaque data cookie for the driver.
From Shannon Nelson.

9) Remove xfrmi interface ID from flowi. After this pach
no generic code is touched anymore to do xfrm interface
lookups. From Benedict Wong.

10) Allow to update the xfrm interface ID on SA updates.
From Nathan Harold.

11) Don't pass zero to ERR_PTR() in xfrm_resolve_and_create_bundle.
From YueHaibing.

12) Return more detailed errors on xfrm interface creation.
From Benedict Wong.

13) Use PTR_ERR_OR_ZERO instead of IS_ERR + PTR_ERR.
From the kbuild test robot.
====================

Signed-off-by: David S. Miller

David S. Miller
2018-07-28 00:33:37 +0800

19 Jul, 2018

1 commit

f15f084ff pktgen: convert safe uses of strncpy() to strcpy() to avoid string truncation warning ... Browse Code »

GCC 8 complains:

net/core/pktgen.c: In function ‘pktgen_if_write’:
net/core/pktgen.c:1419:4: warning: ‘strncpy’ output may be truncated copying between 0 and 31 bytes from a string of length 127 [-Wstringop-truncation]
strncpy(pkt_dev->src_max, buf, len);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/core/pktgen.c:1399:4: warning: ‘strncpy’ output may be truncated copying between 0 and 31 bytes from a string of length 127 [-Wstringop-truncation]
strncpy(pkt_dev->src_min, buf, len);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/core/pktgen.c:1290:4: warning: ‘strncpy’ output may be truncated copying between 0 and 31 bytes from a string of length 127 [-Wstringop-truncation]
strncpy(pkt_dev->dst_max, buf, len);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/core/pktgen.c:1268:4: warning: ‘strncpy’ output may be truncated copying between 0 and 31 bytes from a string of length 127 [-Wstringop-truncation]
strncpy(pkt_dev->dst_min, buf, len);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is no bug here, but the code is not perfect either. It copies
sizeof(pkt_dev->/member/) - 1 from user space into buf, and then does
a strcmp(pkt_dev->/member/, buf) hence assuming buf will be null-terminated
and shorter than pkt_dev->/member/ (pkt_dev->/member/ is never
explicitly null-terminated, and strncpy() doesn't have to null-terminate
so the assumption must be on buf). The use of strncpy() without explicit
null-termination looks suspicious. Convert to use straight strcpy().

strncpy() would also null-pad the output, but that's clearly unnecessary
since the author calls memset(pkt_dev->/member/, 0, sizeof(..)); prior
to strncpy(), anyway.

While at it format the code for "dst_min", "dst_max", "src_min" and
"src_max" in the same way by removing extra new lines in one case.

Signed-off-by: Jakub Kicinski
Reviewed-by: Jiong Wang
Signed-off-by: David S. Miller

Jakub Kicinski
2018-07-19 06:24:04 +0800

23 Jun, 2018

1 commit

7e6526404 xfrm: Add a new lookup key to match xfrm interfaces. ... Browse Code »

This patch adds the xfrm interface id as a lookup key
for xfrm states and policies. With this we can assign
states and policies to virtual xfrm interfaces.

Signed-off-by: Steffen Klassert
Acked-by: Shannon Nelson
Acked-by: Benedict Wong
Tested-by: Benedict Wong
Tested-by: Antony Antony
Reviewed-by: Eyal Birger

Steffen Klassert
2018-06-23 22:07:15 +0800

13 Jun, 2018

1 commit

fd7becedb treewide: Use array_size() in vzalloc_node() ... Browse Code »

The vzalloc_node() function has no 2-factor argument form, so
multiplication factors need to be wrapped in array_size(). This patch
replaces cases of:

vzalloc_node(a * b, node)

with:
vzalloc_node(array_size(a, b), node)

as well as handling cases of:

vzalloc_node(a * b * c, node)

with:

vzalloc_node(array3_size(a, b, c), node)

This does, however, attempt to ignore constant size factors like:

vzalloc_node(4 * 1024, node)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
vzalloc_node(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
vzalloc_node(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
vzalloc_node(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
vzalloc_node(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
vzalloc_node(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
vzalloc_node(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
vzalloc_node(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
vzalloc_node(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
vzalloc_node(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
vzalloc_node(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
vzalloc_node(
- sizeof(TYPE) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
vzalloc_node(
- sizeof(TYPE) * COUNT_ID
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
vzalloc_node(
- sizeof(TYPE) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
vzalloc_node(
- sizeof(TYPE) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
vzalloc_node(
- sizeof(THING) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
vzalloc_node(
- sizeof(THING) * COUNT_ID
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
vzalloc_node(
- sizeof(THING) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
|
vzalloc_node(
- sizeof(THING) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

vzalloc_node(
- SIZE * COUNT
+ array_size(COUNT, SIZE)
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
vzalloc_node(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vzalloc_node(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vzalloc_node(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vzalloc_node(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vzalloc_node(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
vzalloc_node(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
vzalloc_node(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
vzalloc_node(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
vzalloc_node(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
vzalloc_node(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
vzalloc_node(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
vzalloc_node(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
vzalloc_node(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
vzalloc_node(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
vzalloc_node(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc_node(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc_node(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc_node(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc_node(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc_node(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc_node(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc_node(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
vzalloc_node(C1 * C2 * C3, ...)
|
vzalloc_node(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@

(
vzalloc_node(C1 * C2, ...)
|
vzalloc_node(
- E1 * E2
+ array_size(E1, E2)
, ...)
)

Signed-off-by: Kees Cook

Kees Cook
2018-06-13 07:19:22 +0800

28 Mar, 2018

1 commit

2f635ceeb net: Drop pernet_operations::async ... Browse Code »

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai
Signed-off-by: David S. Miller

Kirill Tkhai
2018-03-28 01:18:09 +0800

14 Mar, 2018

2 commits

29d1df72c pktgen: Fix memory leak in pktgen_if_write ... Browse Code »

_buf_ is an array and the one that must be freed is _tp_ instead.

Fixes: a870a02cc963 ("pktgen: use dynamic allocation for debug print buffer")
Reported-by: Wang Jian
Signed-off-by: Gustavo A. R. Silva
Acked-by: Arnd Bergmann
Signed-off-by: David S. Miller

Gustavo A. R. Silva
2018-03-14 22:02:15 +0800
a870a02cc pktgen: use dynamic allocation for debug print buffer ... Browse Code »

After the removal of the VLA, we get a harmless warning about a large
stack frame:

net/core/pktgen.c: In function 'pktgen_if_write':
net/core/pktgen.c:1710:1: error: the frame size of 1076 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

The function was previously shown to be safe despite hitting
the 1024 bye warning level. To get rid of the annoyging warning,
while keeping it readable, this changes it to use strndup_user().

Obviously this is not a fast path, so the kmalloc() overhead
can be disregarded.

Fixes: 35951393bbff ("pktgen: Remove VLA usage")
Signed-off-by: Arnd Bergmann
Signed-off-by: David S. Miller

Arnd Bergmann
2018-03-14 08:25:26 +0800

10 Mar, 2018

1 commit

35951393b pktgen: Remove VLA usage ... Browse Code »

In preparation to enabling -Wvla, remove VLA usage and replace it
with a fixed-length array instead.

Signed-off-by: Gustavo A. R. Silva
Signed-off-by: David S. Miller

Gustavo A. R. Silva
2018-03-10 00:57:17 +0800

09 Mar, 2018

1 commit

59d269731 net: Convert pg_net_ops ... Browse Code »

These pernet_operations create per-net pktgen threads
and /proc entries. These pernet subsys looks closed
in itself, and there are no pernet_operations outside
this file, which are interested in the threads.
Init and/or exit methods look safe to be executed
in parallel.

Signed-off-by: Kirill Tkhai
Signed-off-by: David S. Miller

Kirill Tkhai
2018-03-09 01:36:44 +0800

25 Jan, 2018

4 commits

52e12d5da pktgen: Clean read user supplied flag mess ... Browse Code »

Don't use error-prone-brute-force way.

Signed-off-by: Dmitry Safonov
Signed-off-by: David S. Miller

Dmitry Safonov
2018-01-25 04:03:36 +0800
99c6d3d20 pktgen: Remove brute-force printing of flags ... Browse Code »

Add macro generated pkt_flag_names array, with a little help of which
the flags can be printed by using an index.

Signed-off-by: Dmitry Safonov
Signed-off-by: David S. Miller

Dmitry Safonov
2018-01-25 04:03:36 +0800
6f107c741 pktgen: Add behaviour flags macro to generate flags/names ... Browse Code »

PKT_FALGS macro will be used to add package behavior names definitions
to simplify the code that prints/reads pkg flags.
Sorted the array in order of printing the flags in pktgen_if_show()
Note: Renamed IPSEC_ON => IPSEC for simplicity.

No visible behavior change expected.

Signed-off-by: Dmitry Safonov
Signed-off-by: David S. Miller

Dmitry Safonov
2018-01-25 04:03:36 +0800
57a5749b0 pktgen: Add missing !flag parameters ... Browse Code »

o FLOW_SEQ now can be disabled with pgset "flag !FLOW_SEQ"
o FLOW_SEQ and FLOW_RND are antonyms, as it's shown by pktgen_if_show()
o IPSEC now may be disabled

Note, that IPV6 is enabled with dst6/src6 parameters, not with
a flag parameter.

Signed-off-by: Dmitry Safonov
Signed-off-by: David S. Miller

Dmitry Safonov
2018-01-25 04:03:36 +0800

17 Jan, 2018

1 commit

96890d625 net: delete /proc THIS_MODULE references ... Browse Code »

/proc has been ignoring struct file_operations::owner field for 10 years.
Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
inode->i_fop is initialized with proxy struct file_operations for
regular files:

- if (de->proc_fops)
- inode->i_fop = de->proc_fops;
+ if (de->proc_fops) {
+ if (S_ISREG(inode->i_mode))
+ inode->i_fop = &proc_reg_file_ops;
+ else
+ inode->i_fop = de->proc_fops;
+ }

VFS stopped pinning module at this point.

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller

Alexey Dobriyan
2018-01-17 04:01:33 +0800

30 Nov, 2017

1 commit

b6ca8bd5a xfrm: Move child route linkage into xfrm_dst. ... Browse Code »

XFRM bundle child chains look like this:

xdst1 --> xdst2 --> xdst3 --> path_dst

All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
The final child pointer in the chain, here called 'path_dst', is some
other kind of route such as an ipv4 or ipv6 one.

The xfrm output path pops routes, one at a time, via the child
pointer, until we hit one which has a dst->xfrm pointer which
is NULL.

We can easily preserve the above mechanisms with child sitting
only in the xfrm_dst structure. All children in the chain
before we break out of the xfrm_output() loop have dst->xfrm
non-NULL and are therefore xfrm_dst objects.

Since we break out of the loop when we find dst->xfrm NULL, we
will not try to dereference 'dst' as if it were an xfrm_dst.

Signed-off-by: David S. Miller

David Miller
2017-11-30 22:54:26 +0800

16 Nov, 2017

1 commit

5bbcc0f59 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from David Miller:
"Highlights:

1) Maintain the TCP retransmit queue using an rbtree, with 1GB
windows at 100Gb this really has become necessary. From Eric
Dumazet.

2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.

3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
Lunn.

4) Add meter action support to openvswitch, from Andy Zhou.

5) Add a data meta pointer for BPF accessible packets, from Daniel
Borkmann.

6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.

7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.

8) More work to move the RTNL mutex down, from Florian Westphal.

9) Add 'bpftool' utility, to help with bpf program introspection.
From Jakub Kicinski.

10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
Dangaard Brouer.

11) Support 'blocks' of transformations in the packet scheduler which
can span multiple network devices, from Jiri Pirko.

12) TC flower offload support in cxgb4, from Kumar Sanghvi.

13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
Leitner.

14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.

15) Add RED qdisc offloadability, and use it in mlxsw driver. From
Nogah Frankel.

16) eBPF based device controller for cgroup v2, from Roman Gushchin.

17) Add some fundamental tracepoints for TCP, from Song Liu.

18) Remove garbage collection from ipv6 route layer, this is a
significant accomplishment. From Wei Wang.

19) Add multicast route offload support to mlxsw, from Yotam Gigi"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
tcp: highest_sack fix
geneve: fix fill_info when link down
bpf: fix lockdep splat
net: cdc_ncm: GetNtbFormat endian fix
openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
netem: remove unnecessary 64 bit modulus
netem: use 64 bit divide by rate
tcp: Namespace-ify sysctl_tcp_default_congestion_control
net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
ipv6: set all.accept_dad to 0 by default
uapi: fix linux/tls.h userspace compilation error
usbnet: ipheth: prevent TX queue timeouts when device not ready
vhost_net: conditionally enable tx polling
uapi: fix linux/rxrpc.h userspace compilation errors
net: stmmac: fix LPI transitioning for dwmac4
atm: horizon: Fix irq release error
net-sysfs: trigger netlink notification on ifalias change via sysfs
openvswitch: Using kfree_rcu() to simplify the code
openvswitch: Make local function ovs_nsh_key_attr_size() static
openvswitch: Fix return value check in ovs_meter_cmd_features()
...

Linus Torvalds
2017-11-16 03:56:19 +0800

08 Nov, 2017

1 commit

7f5d3f272 pktgen: document 32-bit timestamp overflow ... Browse Code »

Timestamps in pktgen are currently retrieved using the deprecated
do_gettimeofday() function that wraps its signed 32-bit seconds in 2038
(on 32-bit architectures) and requires a division operation to calculate
microseconds.

The pktgen header is also defined with the same limitations, hardcoding
to a 32-bit seconds field that can be interpreted as unsigned to produce
times that only wrap in 2106. Whatever code reads the timestamps should
be aware of that problem in general, but probably doesn't care too
much as we are mostly interested in the time passing between packets,
and that is correctly represented.

Using 64-bit nanoseconds would be cheaper and good for 584 years. Using
monotonic times would also make this unambiguous by avoiding the overflow,
but would make it harder to correlate to the times with those on remote
machines. Either approach would require adding a new runtime flag and
implementing the same thing on the remote side, which we probably don't
want to do unless someone sees it as a real problem. Also, this should
be coordinated with other pktgen implementations and might need a new
magic number.

For the moment, I'm documenting the overflow in the source code, and
changing the implementation over to an open-coded ktime_get_real_ts64()
plus division, so we don't have to look at it again while scanning for
deprecated time interfaces.

Signed-off-by: Arnd Bergmann
Signed-off-by: David S. Miller

Arnd Bergmann
2017-11-08 14:56:12 +0800

05 Nov, 2017

1 commit

df7e8e2e3 pktgen: do not abuse IN6_ADDR_HSIZE ... Browse Code »

pktgen accidentally used IN6_ADDR_HSIZE, instead of using the size of an
IPv6 address.

Since IN6_ADDR_HSIZE recently was increased from 16 to 256, this old
bug is hitting us.

Fixes: 3f27fb23219e ("ipv6: addrconf: add per netns perturbation in inet6_addr_hash()")
Signed-off-by: Eric Dumazet
Reported-by: Dan Carpenter
Signed-off-by: David S. Miller

Eric Dumazet
2017-11-05 08:16:46 +0800

25 Oct, 2017

1 commit

6aa7de059 locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to … ... Browse Code »

…READ_ONCE()/WRITE_ONCE()

Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Mark Rutland
2017-10-25 17:01:08 +0800

01 Jul, 2017

1 commit

633547973 net: convert sk_buff.users from atomic_t to refcount_t ... Browse Code »

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller

Reshetova, Elena
2017-07-01 22:39:07 +0800

16 Jun, 2017

3 commits

d58ff3512 networking: make skb_push & __skb_push return void pointers ... Browse Code »

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)

@@
expression E, SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)

@@
expression SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- fn(SKB, LEN)[0]
+ *(u8 *)fn(SKB, LEN)

Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2017-06-16 23:48:40 +0800
4df864c1d networking: make skb_put & friends return void pointers ... Browse Code »

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:

@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_put, __skb_put };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)

@@
expression E, SKB, LEN;
identifier fn = { skb_put, __skb_put };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)

which actually doesn't cover pskb_put since there are only three
users overall.

A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2017-06-16 23:48:39 +0800
b080db585 networking: convert many more places to skb_put_zero() ... Browse Code »

There were many places that my previous spatch didn't find,
as pointed out by yuan linyu in various patches.

The following spatch found many more and also removes the
now unnecessary casts:

@@
identifier p, p2;
expression len;
expression skb;
type t, t2;
@@
(
-p = skb_put(skb, len);
+p = skb_put_zero(skb, len);
|
-p = (t)skb_put(skb, len);
+p = skb_put_zero(skb, len);
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, len);
|
-memset(p, 0, len);
)

@@
type t, t2;
identifier p, p2;
expression skb;
@@
t *p;
...
(
-p = skb_put(skb, sizeof(t));
+p = skb_put_zero(skb, sizeof(t));
|
-p = (t *)skb_put(skb, sizeof(t));
+p = skb_put_zero(skb, sizeof(t));
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, sizeof(*p));
|
-memset(p, 0, sizeof(*p));
)

@@
expression skb, len;
@@
-memset(skb_put(skb, len), 0, len);
+skb_put_zero(skb, len);

Apply it to the tree (with one manual fixup to keep the
comment in vxlan.c, which spatch removed.)

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2017-06-16 23:48:35 +0800

09 Jan, 2017

1 commit

a5135bcfb net-tc: convert tc_verd to integer bitfields ... Browse Code »

Extract the remaining two fields from tc_verd and remove the __u16
completely. TC_AT and TC_FROM are converted to equivalent two-bit
integer fields tc_at and tc_from. Where possible, use existing
helper skb_at_tc_ingress when reading tc_at. Introduce helper
skb_reset_tc to clear fields.

Not documenting tc_from and tc_at, because they will be replaced
with single bit fields in follow-on patches.

Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2017-01-09 09:58:52 +0800

18 Nov, 2016

1 commit

c7d03a00b netns: make struct pernet_operations::id unsigned int ... Browse Code »

Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.

2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

void f(long *p, int i)
{
g(p[i]);
}

roughly translates to

movsx rsi, esi
mov rdi, [rsi+...]
call g

MOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

static inline void *net_generic(const struct net *net, int id)
{
...
ptr = ng->ptr[id - 1];
...
}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]

However, overall balance is in negative direction:

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
function old new delta
nfsd4_lock 3886 3959 +73
tipc_link_build_proto_msg 1096 1140 +44
mac80211_hwsim_new_radio 2776 2808 +32
tipc_mon_rcv 1032 1058 +26
svcauth_gss_legacy_init 1413 1429 +16
tipc_bcbase_select_primary 379 392 +13
nfsd4_exchange_id 1247 1260 +13
nfsd4_setclientid_confirm 782 793 +11
...
put_client_renew_locked 494 480 -14
ip_set_sockfn_get 730 716 -14
geneve_sock_add 829 813 -16
nfsd4_sequence_done 721 703 -18
nlmclnt_lookup_host 708 686 -22
nfsd4_lockt 1085 1063 -22
nfs_get_client 1077 1050 -27
tcf_bpf_init 1106 1076 -30
nfsd4_encode_fattr 5997 5930 -67
Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller

Alexey Dobriyan
2016-11-18 23:59:15 +0800

17 Oct, 2016

1 commit

9a0b1e8ba net: pktgen: remove rcu locking in pktgen_change_name() ... Browse Code »

After Jesper commit back in linux-3.18, we trigger a lockdep
splat in proc_create_data() while allocating memory from
pktgen_change_name().

This patch converts t->if_lock to a mutex, since it is now only
used from control path, and adds proper locking to pktgen_change_name()

1) pktgen_thread_lock to protect the outer loop (iterating threads)
2) t->if_lock to protect the inner loop (iterating devices)

Note that before Jesper patch, pktgen_change_name() was lacking proper
protection, but lockdep was not able to detect the problem.

Fixes: 8788370a1d4b ("pktgen: RCU-ify "if_list" to remove lock in next_to_run()")
Reported-by: John Sperbeck
Signed-off-by: Eric Dumazet
Cc: Jesper Dangaard Brouer
Signed-off-by: David S. Miller

Eric Dumazet
2016-10-17 22:52:59 +0800

03 Oct, 2016

1 commit

63d75463c net: pktgen: fix pkt_size ... Browse Code »

The commit 879c7220e828 ("net: pktgen: Observe needed_headroom
of the device") increased the 'pkt_overhead' field value by
LL_RESERVED_SPACE.
As a side effect the generated packet size, computed as:

/* Eth + IPh + UDPh + mpls */
datalen = pkt_dev->cur_pkt_size - 14 - 20 - 8 -
pkt_dev->pkt_overhead;

is decreased by the same value.
The above changed slightly the behavior of existing pktgen users,
and made the procfs interface somewhat inconsistent.
Fix it by restoring the previous pkt_overhead value and using
LL_RESERVED_SPACE as extralen in skb allocation.
Also, change pktgen_alloc_skb() to only partially reserve
the headroom to allow the caller to prefetch from ll header
start.

v1 -> v2:
- fixed some typos in the comments

Fixes: 879c7220e828 ("net: pktgen: Observe needed_headroom of the device")
Suggested-by: Ben Greear
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller

Paolo Abeni
2016-10-03 13:29:57 +0800

05 Jul, 2016

1 commit

0967f2445 net: pktgen: support injecting packets for qdisc testing ... Browse Code »

Add another xmit_mode to pktgen to allow testing xmit functionality
of qdiscs. The new mode "queue_xmit" injects packets at
__dev_queue_xmit() so that qdisc is called.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller

John Fastabend
2016-07-05 07:07:34 +0800

13 Jun, 2016

1 commit

99860208b sched: remove NET_XMIT_POLICED ... Browse Code »

sch_atm returns this when TC_ACT_SHOT classification occurs.

But all other schedulers that use tc_classify
(htb, hfsc, drr, fq_codel ...) return NET_XMIT_SUCCESS | __BYPASS
in this case so just do that in atm.

BATMAN uses it as an intermediate return value to signal
forwarding vs. buffering, but it did not return POLICED to
callers outside of BATMAN.

Reviewed-by: Sven Eckelmann
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2016-06-13 10:02:11 +0800

01 Jun, 2016

1 commit

bcf91bdb4 net: pktgen: Call destroy_hrtimer_on_stack() ... Browse Code »

If CONFIG_DEBUG_OBJECTS_TIMERS=y, hrtimer_init_on_stack() requires
a matching call to destroy_hrtimer_on_stack() to clean up timer
debug objects.

Signed-off-by: Guenter Roeck
Signed-off-by: David S. Miller

Guenter Roeck
2016-06-01 02:44:08 +0800

27 Apr, 2016

1 commit

f0cdf76c1 net: remove NETDEV_TX_LOCKED support ... Browse Code »

No more users in the tree, remove NETDEV_TX_LOCKED support.
Adds another hole in softnet_stats struct, but better than keeping
the unused collision counter around.

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2016-04-27 03:53:05 +0800

02 Mar, 2016

1 commit

c145aeb3f net: pktgen: use reset to set mac header ... Browse Code »

Since offset is zero, it's not necessary to use set function.

Signed-off-by: Zhang Shengju
Signed-off-by: David S. Miller

Zhang Shengju
2016-03-02 06:46:54 +0800

12 Jan, 2016

2 commits

9d367eddf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/bonding/bond_main.c
drivers/net/ethernet/mellanox/mlxsw/spectrum.h
drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c

The bond_main.c and mellanox switch conflicts were cases of
overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2016-01-12 12:55:43 +0800
3de03596d net: pktgen: fix null ptr deref in skb allocation ... Browse Code »

Fix possible null pointer dereference that may occur when calling
skb_reserve() on a null skb.

Fixes: 879c7220e82 ("net: pktgen: Observe needed_headroom of the device")
Signed-off-by: John Fastabend
Signed-off-by: David S. Miller

John Fastabend
2016-01-12 06:39:44 +0800

16 Dec, 2015

1 commit

c8cd0989b net: Eliminate NETIF_F_GEN_CSUM and NETIF_F_V[46]_CSUM ... Browse Code »

These netif flags are unnecessary convolutions. It is more
straightforward to just use NETIF_F_HW_CSUM, NETIF_F_IP_CSUM,
and NETIF_F_IPV6_CSUM directly.

This patch also:
- Cleans up can_checksum_protocol
- Simplifies netdev_intersect_features

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2015-12-16 05:50:20 +0800

14 Aug, 2015

1 commit

182ad468e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/ethernet/cavium/Kconfig

The cavium conflict was overlapping dependency
changes.

Signed-off-by: David S. Miller

David S. Miller
2015-08-14 07:23:11 +0800

07 Aug, 2015

1 commit

7ba8bd75d net: pktgen: don't abuse current->state in pktgen_thread_worker() ... Browse Code »

Commit 1fbe4b46caca "net: pktgen: kill the Wait for kthread_stop
code in pktgen_thread_worker()" removed (in particular) the final
__set_current_state(TASK_RUNNING) and I didn't notice the previous
set_current_state(TASK_INTERRUPTIBLE). This triggers the warning
in __might_sleep() after return.

Afaics, we can simply remove both set_current_state()'s, and we
could do this a long ago right after ef87979c273a2 "pktgen: better
scheduler friendliness" which changed pktgen_thread_worker() to
use wait_event_interruptible_timeout().

Reported-by: Huang Ying
Signed-off-by: Oleg Nesterov
Signed-off-by: David S. Miller

Oleg Nesterov
2015-08-07 14:52:44 +0800

30 Jul, 2015

1 commit

1ce4b2f44 net: pktgen: Remove unused 'allocated_skbs' field ... Browse Code »

Field pktgen_dev.allocated_skbs had been written to, but never read
from. The number of allocated skbs can be deduced anyway, from the total
number of sent packets and the 'clone_skb' param.

Signed-off-by: Bogdan Hamciuc
Signed-off-by: David S. Miller

Bogdan Hamciuc
2015-07-30 14:00:39 +0800