02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default, all files without license information fall under the
    kernel's default license, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
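    In practice, the tag is a one-line comment at the top of each affected
    file; a minimal illustration for a .c file (headers and assembly use
    whatever comment style is appropriate to them):

```c
// SPDX-License-Identifier: GPL-2.0
```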

    This patch is based on work done by Thomas Gleixner, Kate Stewart, and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX license identifier should be applied
    to a file was done in a spreadsheet of side-by-side results of the output
    of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files, created by Philippe Ombredanne. Philippe prepared the
    base worksheet and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis, with 60,537 files
    assessed. Kate Stewart did a file-by-file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to each file. She confirmed any determination that was
    not immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source.
    - File already had some variant of a license header in it (even if <5
    lines).
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

22 Oct, 2017

6 commits

  • Pull smp/hotplug fix from Thomas Gleixner:
    "The recent rework of the callback invocation missed cleaning up the
    leftovers of the operation, so under certain circumstances a
    subsequent CPU hotplug operation accesses stale data and crashes.
    Clean it up."

    * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    cpu/hotplug: Reset node state after operation

    Linus Torvalds
     
  • Pull irq fixes from Thomas Gleixner:
    "A set of small fixes mostly in the irq drivers area:

    - Make the tango irq chip work correctly, which requires a new
    function in the generic irq chip implementation

    - A set of updates to the GIC-V3 ITS driver removing a bogus BUG_ON()
    and parsing the VCPU table size correctly"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq: generic chip: remove irq_gc_mask_disable_reg_and_ack()
    irqchip/tango: Use irq_gc_mask_disable_and_ack_set
    genirq: generic chip: Add irq_gc_mask_disable_and_ack_set()
    irqchip/gic-v3-its: Add missing changes to support 52bit physical address
    irqchip/gic-v3-its: Fix the incorrect parsing of VCPU table size
    irqchip/gic-v3-its: Fix the incorrect BUG_ON in its_init_vpe_domain()
    DT: arm,gic-v3: Update the ITS size in the examples

    Linus Torvalds
     
  • Pull networking fixes from David Miller:
    "A little more than usual this time around. Been travelling, so that is
    part of it.

    Anyways, here are the highlights:

    1) Deal with memcontrol races wrt. listener dismantle, from Eric
    Dumazet.

    2) Handle page allocation failures properly in nfp driver, from Jakub
    Kicinski.

    3) Fix memory leaks in macsec, from Sabrina Dubroca.

    4) Fix crashes in pppol2tp_session_ioctl(), from Guillaume Nault.

    5) Several fixes in bnxt_en driver, including preventing potential
    NVRAM parameter corruption from Michael Chan.

    6) Fix for KRACK attacks in wireless, from Johannes Berg.

    7) rtnetlink event generation fixes from Xin Long.

    8) Deadlock in mlxsw driver, from Ido Schimmel.

    9) Disallow arithmetic operations on context pointers in bpf, from
    Jakub Kicinski.

    10) Missing sock_owned_by_user() check in sctp_icmp_redirect(), from
    Xin Long.

    11) Only TCP is supported for sockmap, make that explicit with a
    check, from John Fastabend.

    12) Fix IP options state races in DCCP and TCP, from Eric Dumazet.

    13) Fix panic in packet_getsockopt(), also from Eric Dumazet.

    14) Add missing locking in hv_sock layer, from Dexuan Cui.

    15) Various aquantia bug fixes, including several statistics handling
    cures. From Igor Russkikh et al.

    16) Fix arithmetic overflow in devmap code, from John Fastabend.

    17) Fix busted socket memory accounting when we get a fault in the tcp
    zero copy paths. From Willem de Bruijn.

    18) Don't leave opt->tot_len uninitialized in ipv6, from Eric Dumazet"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
    stmmac: Don't access tx_q->dirty_tx before netif_tx_lock
    ipv6: flowlabel: do not leave opt->tot_len with garbage
    of_mdio: Fix broken PHY IRQ in case of probe deferral
    textsearch: fix typos in library helpers
    rxrpc: Don't release call mutex on error pointer
    net: stmmac: Prevent infinite loop in get_rx_timestamp_status()
    net: stmmac: Fix stmmac_get_rx_hwtstamp()
    net: stmmac: Add missing call to dev_kfree_skb()
    mlxsw: spectrum_router: Configure TIGCR on init
    mlxsw: reg: Add Tunneling IPinIP General Configuration Register
    net: ethtool: remove error check for legacy setting transceiver type
    soreuseport: fix initialization race
    net: bridge: fix returning of vlan range op errors
    sock: correct sk_wmem_queued accounting on efault in tcp zerocopy
    bpf: add test cases to bpf selftests to cover all access tests
    bpf: fix pattern matches for direct packet access
    bpf: fix off by one for range markings with L{T, E} patterns
    bpf: devmap fix arithmetic overflow in bitmap_size calculation
    net: aquantia: Bad udp rate on default interrupt coalescing
    net: aquantia: Enable coalescing management via ethtool interface
    ...

    Linus Torvalds
     
  • Alexander had a test program with direct packet access, where
    the access test was in the form of data + X > data_end. In an
    unrelated change to the program, LLVM decided to swap the branches
    and emitted code for the test in the form of data + X <= data_end,
    a pattern the verifier's direct packet access matching did not yet
    recognize.
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • During review I noticed that the current logic for direct packet
    access marking in check_cond_jmp_op() has an off by one for the
    upper right range border when marking in find_good_pkt_pointers()
    with BPF_JLT and BPF_JLE. It's not really harmful given access
    up to pkt_end is always safe, but we should nevertheless correct
    the range marking before it becomes ABI. If pkt_data' denotes a
    pkt_data derived pointer (pkt_data + X), then for pkt_data' < pkt_end
    in the true branch, as well as for the swapped form pkt_end > pkt_data',
    the verifier simulation cannot deduce that a byte load of
    pkt_data' - 1 would succeed in this branch.

    Fixes: b4e432f1000a ("bpf: enable BPF_J{LT, LE, SLT, SLE} opcodes in verifier")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • An integer overflow is possible in dev_map_bitmap_size() when
    calculating the BITS_TO_LONG logic which becomes, after macro
    replacement,

    (((n) + (d) - 1)/ (d))

    where 'n' is a __u32 and 'd' is (8 * sizeof(long)). To avoid the
    overflow, cast to u64 before the arithmetic.

    Reported-by: Richard Weinberger
    Acked-by: Daniel Borkmann
    Signed-off-by: John Fastabend
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    John Fastabend
     

21 Oct, 2017

2 commits

  • The recent rework of the cpu hotplug internals changed the usage of the per
    cpu state->node field, but failed to clean it up after use.

    So subsequent hotplug operations use the stale pointer from a previous
    operation and hand it into the callback functions. The callbacks then
    dereference a pointer which either belongs to a different facility or
    points to freed and potentially reused memory. In either case data
    corruption and crashes are the obvious consequence.

    Reset the node and the last pointers in the per cpu state to NULL after the
    operation which set them has completed.

    Fixes: 96abb968549c ("smp/hotplug: Allow external multi-instance rollback")
    Reported-by: Tvrtko Ursulin
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Boris Ostrovsky
    Cc: "Paul E. McKenney"
    Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710211606130.3213@nanos

    Thomas Gleixner
     
  • As pointed out by Linus and David, the earlier waitid() fix resulted in
    a (currently harmless) unbalanced user_access_end() call. This fixes it
    to just directly return EFAULT on access_ok() failure.

    Fixes: 96ca579a1ecc ("waitid(): Add missing access_ok() checks")
    Acked-by: David Daney
    Cc: Al Viro
    Signed-off-by: Kees Cook
    Signed-off-by: Linus Torvalds

    Kees Cook
     

20 Oct, 2017

6 commits

  • Devmap is used with XDP, which requires CAP_NET_ADMIN, so let's also
    make CAP_NET_ADMIN required to use the map.

    Signed-off-by: John Fastabend
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    John Fastabend
     
  • Restrict sockmap to CAP_NET_ADMIN.

    Signed-off-by: John Fastabend
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    John Fastabend
     
  • SK_SKB BPF programs are run from the socket/tcp context but early in
    the stack before much of the TCP metadata is needed in tcp_skb_cb. So
    we can use some unused fields to place BPF metadata needed for SK_SKB
    programs when implementing the redirect function.

    This allows us to drop the preempt disable logic. It does however
    require an API change so sk_redirect_map() has been updated to
    additionally provide ctx_ptr to skb. Note, we do however continue to
    disable/enable preemption around actual BPF program running to account
    for map updates.

    Signed-off-by: John Fastabend
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    John Fastabend
     
  • Only TCP sockets have been tested and at the moment the state change
    callback only handles TCP sockets. This adds a check to ensure that
    sockets actually being added are TCP sockets.

    For net-next we can consider UDP support.

    Signed-off-by: John Fastabend
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    John Fastabend
     
  • Because many of RCU's files have not been included in docbook, a
    number of errors have accumulated. This commit fixes them.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
     
  • This introduces a "register private expedited" membarrier command which
    allows eventual removal of important memory barrier constraints on the
    scheduler fast-paths. It changes how the "private expedited" membarrier
    command (new to 4.14) is used from user-space.

    This new command allows processes to register their intent to use the
    private expedited command. This affects how the expedited private
    command introduced in 4.14-rc is meant to be used, and should be merged
    before 4.14 final.

    Processes are now required to register before using
    MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.

    This fixes a problem that arose when designing requested extensions to
    sys_membarrier() to allow JITs to efficiently flush old code from
    instruction caches. Several potential algorithms are much less painful
    if the user registers intent to use this functionality early on, for
    example, before the process spawns its second thread. Registering at
    this time removes the need to interrupt each and every thread in that
    process at the first expedited sys_membarrier() system call.

    Signed-off-by: Mathieu Desnoyers
    Acked-by: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     

19 Oct, 2017

2 commits

  • PCPU_MIN_UNIT_SIZE is an implementation detail of the percpu
    allocator. Given we support __GFP_NOWARN now, let's just let
    the allocation request fail naturally instead. The two call
    sites from BPF mistakenly assumed __GFP_NOWARN would work, so
    no changes needed to their actual __alloc_percpu_gfp() calls
    which use the flag already.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • It was reported that syzkaller was able to trigger a splat on
    devmap percpu allocation due to illegal/unsupported allocation
    request size passed to __alloc_percpu():

    [ 70.094249] illegal size (32776) or align (8) for percpu allocation
    [ 70.094256] ------------[ cut here ]------------
    [ 70.094259] WARNING: CPU: 3 PID: 3451 at mm/percpu.c:1365 pcpu_alloc+0x96/0x630
    [...]
    [ 70.094325] Call Trace:
    [ 70.094328] __alloc_percpu_gfp+0x12/0x20
    [ 70.094330] dev_map_alloc+0x134/0x1e0
    [ 70.094331] SyS_bpf+0x9bc/0x1610
    [ 70.094333] ? selinux_task_setrlimit+0x5a/0x60
    [ 70.094334] ? security_task_setrlimit+0x43/0x60
    [ 70.094336] entry_SYSCALL_64_fastpath+0x1a/0xa5

    This was due to too large max_entries for the map such that we
    surpassed the upper limit of PCPU_MIN_UNIT_SIZE. It's fine to
    fail naturally here, so switch to __alloc_percpu_gfp() and pass
    __GFP_NOWARN instead.

    Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
    Reported-by: Mark Rutland
    Reported-by: Shankara Pailoor
    Reported-by: Richard Weinberger
    Signed-off-by: Daniel Borkmann
    Cc: John Fastabend
    Acked-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

18 Oct, 2017

1 commit

  • Commit f1174f77b50c ("bpf/verifier: rework value tracking")
    removed the crafty selection of which pointer types are
    allowed to be modified. This is OK for most pointer types
    since adjust_ptr_min_max_vals() will catch operations on
    immutable pointers. One exception is PTR_TO_CTX, which is
    now allowed to be offset freely.

    The intent of aforementioned commit was to allow context
    access via modified registers. The offset passed to
    ->is_valid_access() verifier callback has been adjusted
    by the value of the variable offset.

    What is missing, however, is taking the variable offset
    into account when the context register is used. Or in terms
    of the code adding the offset to the value passed to the
    ->convert_ctx_access() callback. This leads to the following
    eBPF user code:

    r1 += 68
    r0 = *(u32 *)(r1 + 8)
    exit

    being translated to this in kernel space:

    0: (07) r1 += 68
    1: (61) r0 = *(u32 *)(r1 +180)
    2: (95) exit

    Offset 8 corresponds to 180 in the kernel, but offset
    76 is valid too. The verifier will "accept" access to offset
    68+8=76 but then "convert" it as an access to offset 8, i.e. 180.
    Effective access to offset 248 is beyond the kernel context.
    (This is a __sk_buff example on a debug-heavy kernel -
    packet mark is 8 -> 180, 76 would be data.)

    Dereferencing the modified context pointer is not as easy
    as dereferencing other types, because we have to translate
    the access to reading a field in kernel structures which is
    usually at a different offset and often of a different size.
    To allow modifying the pointer we would have to make sure
    that given eBPF instruction will always access the same
    field or the fields accessed are "compatible" in terms of
    offset and size...

    Disallow dereferencing modified context pointers and add
    to selftests the test case described here.

    Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
    Signed-off-by: Jakub Kicinski
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Acked-by: Edward Cree
    Signed-off-by: David S. Miller

    Jakub Kicinski
     


14 Oct, 2017

1 commit

  • Kmemleak considers any pointers on task stacks as references. This
    patch clears newly allocated and reused vmap stacks.

    Link: http://lkml.kernel.org/r/150728990124.744199.8403409836394318684.stgit@buzz
    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

13 Oct, 2017

3 commits

  • Any usage of the irq_gc_mask_disable_reg_and_ack() function has
    been replaced with the desired functionality.

    The incorrect and ambiguously named function is removed here to
    prevent accidental misuse.

    Signed-off-by: Doug Berger
    Signed-off-by: Marc Zyngier

    Doug Berger
     
  • The irq_gc_mask_disable_reg_and_ack() function name implies that it
    provides the combined functions of irq_gc_mask_disable_reg() and
    irq_gc_ack(). However, the implementation does not actually do
    that since it writes the mask instead of the disable register. It
    also does not maintain the mask cache which makes it inappropriate
    to use with other masking functions.

    In addition, commit 659fb32d1b67 ("genirq: replace irq_gc_ack() with
    {set,clr}_bit variants (fwd)") effectively renamed irq_gc_ack() to
    irq_gc_ack_set_bit() so this function probably should have also been
    renamed at that time.

    The generic chip code currently provides three functions for use
    with the irq_mask member of the irq_chip structure and two functions
    for use with the irq_ack member of the irq_chip structure. These
    functions could be combined into six functions for use with the
    irq_mask_ack member of the irq_chip structure. However, since only
    one of the combinations is currently used, only the function
    irq_gc_mask_disable_and_ack_set() is added by this commit.

    The '_reg' and '_bit' portions of the base function name were left
    out of the new combined function name in an attempt to keep the
    function name length manageable with the 80 character source code
    line length while still allowing the distinct aspects of each
    combination to be captured by the name.

    If other combinations are desired in the future please add them to
    the irq generic chip library at that time.

    Signed-off-by: Doug Berger
    Signed-off-by: Marc Zyngier

    Doug Berger
     
  • Pull livepatching fix from Jiri Kosina:

    - bugfix for handling of coming modules (incorrect handling of failure)
    from Joe Lawrence

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
    livepatch: unpatch all klp_objects if klp_module_coming fails

    Linus Torvalds
     

12 Oct, 2017

1 commit

  • Merge waitid() fix from Kees Cook.

    I'd have hoped that the unsafe_{get|put}_user() naming would have
    avoided these kinds of stupid bugs, but no such luck.

    * waitid-fix:
    waitid(): Add missing access_ok() checks

    Linus Torvalds
     

11 Oct, 2017

4 commits

  • When an incoming module is considered for livepatching by
    klp_module_coming(), it iterates over multiple patches and multiple
    kernel objects in this order:

    list_for_each_entry(patch, &klp_patches, list) {
    klp_for_each_object(patch, obj) {

    which means that if one of the kernel objects fails to patch,
    klp_module_coming()'s error path needs to unpatch and cleanup any kernel
    objects that were already patched by a previous patch.

    Reported-by: Miroslav Benes
    Suggested-by: Petr Mladek
    Signed-off-by: Joe Lawrence
    Acked-by: Josh Poimboeuf
    Reviewed-by: Petr Mladek
    Signed-off-by: Jiri Kosina

    Joe Lawrence
     
  • This reverts commit fbb1fb4ad415cb31ce944f65a5ca700aaf73a227.

    This was not the proper fix, so let's cleanly revert it so that the
    following patch can be carried to stable versions.

    sock_cgroup_ptr() callers do not expect a NULL return value.

    Signed-off-by: Eric Dumazet
    Cc: Johannes Weiner
    Cc: Tejun Heo
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Pull seccomp fixlet from Kees Cook:
    "Minor seccomp fix for v4.14-rc5. I debated sending this at all for
    v4.14, but since it fixes a minor issue in the prior fix, which also
    went to -stable, it seemed better to just get all of it cleaned up
    right now.

    - fix missed "static" to avoid Sparse warning (Colin King)"

    * tag 'seccomp-v4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    seccomp: make function __get_seccomp_filter static

    Linus Torvalds
     
  • The function __get_seccomp_filter is local to the source and does
    not need to be in global scope, so make it static.

    Cleans up sparse warning:
    symbol '__get_seccomp_filter' was not declared. Should it be static?

    Signed-off-by: Colin Ian King
    Fixes: 66a733ea6b61 ("seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Kees Cook

    Colin Ian King
     

10 Oct, 2017

8 commits

  • While load_balance() masks the source CPUs against active_mask, it had
    a hole against the destination CPU. Ensure the destination CPU is also
    part of the 'domain-mask & active-mask' set.

    Reported-by: Levin, Alexander (Sasha Levin)
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The trivial wake_affine_idle() implementation is very good for a
    number of workloads, but it comes apart at the moment there are no
    idle CPUs left, IOW, the overloaded case.

    hackbench:

    NO_WA_WEIGHT WA_WEIGHT

    hackbench-20 : 7.362717561 seconds 6.450509391 seconds

    (win)

    netperf:

    NO_WA_WEIGHT WA_WEIGHT

    TCP_SENDFILE-1 : Avg: 54524.6 Avg: 52224.3
    TCP_SENDFILE-10 : Avg: 48185.2 Avg: 46504.3
    TCP_SENDFILE-20 : Avg: 29031.2 Avg: 28610.3
    TCP_SENDFILE-40 : Avg: 9819.72 Avg: 9253.12
    TCP_SENDFILE-80 : Avg: 5355.3 Avg: 4687.4

    TCP_STREAM-1 : Avg: 41448.3 Avg: 42254
    TCP_STREAM-10 : Avg: 24123.2 Avg: 25847.9
    TCP_STREAM-20 : Avg: 15834.5 Avg: 18374.4
    TCP_STREAM-40 : Avg: 5583.91 Avg: 5599.57
    TCP_STREAM-80 : Avg: 2329.66 Avg: 2726.41

    TCP_RR-1 : Avg: 80473.5 Avg: 82638.8
    TCP_RR-10 : Avg: 72660.5 Avg: 73265.1
    TCP_RR-20 : Avg: 52607.1 Avg: 52634.5
    TCP_RR-40 : Avg: 57199.2 Avg: 56302.3
    TCP_RR-80 : Avg: 25330.3 Avg: 26867.9

    UDP_RR-1 : Avg: 108266 Avg: 107844
    UDP_RR-10 : Avg: 95480 Avg: 95245.2
    UDP_RR-20 : Avg: 68770.8 Avg: 68673.7
    UDP_RR-40 : Avg: 76231 Avg: 75419.1
    UDP_RR-80 : Avg: 34578.3 Avg: 35639.1

    UDP_STREAM-1 : Avg: 64684.3 Avg: 66606
    UDP_STREAM-10 : Avg: 52701.2 Avg: 52959.5
    UDP_STREAM-20 : Avg: 30376.4 Avg: 29704
    UDP_STREAM-40 : Avg: 15685.8 Avg: 15266.5
    UDP_STREAM-80 : Avg: 8415.13 Avg: 7388.97

    (wins and losses)

    sysbench:

    NO_WA_WEIGHT WA_WEIGHT

    sysbench-mysql-2 : 2135.17 per sec. 2142.51 per sec.
    sysbench-mysql-5 : 4809.68 per sec. 4800.19 per sec.
    sysbench-mysql-10 : 9158.59 per sec. 9157.05 per sec.
    sysbench-mysql-20 : 14570.70 per sec. 14543.55 per sec.
    sysbench-mysql-40 : 22130.56 per sec. 22184.82 per sec.
    sysbench-mysql-80 : 20995.56 per sec. 21904.18 per sec.

    sysbench-psql-2 : 1679.58 per sec. 1705.06 per sec.
    sysbench-psql-5 : 3797.69 per sec. 3879.93 per sec.
    sysbench-psql-10 : 7253.22 per sec. 7258.06 per sec.
    sysbench-psql-20 : 11166.75 per sec. 11220.00 per sec.
    sysbench-psql-40 : 17277.28 per sec. 17359.78 per sec.
    sysbench-psql-80 : 17112.44 per sec. 17221.16 per sec.

    (increase on the top end)

    tbench:

    NO_WA_WEIGHT

    Throughput 685.211 MB/sec 2 clients 2 procs max_latency=0.123 ms
    Throughput 1596.64 MB/sec 5 clients 5 procs max_latency=0.119 ms
    Throughput 2985.47 MB/sec 10 clients 10 procs max_latency=0.262 ms
    Throughput 4521.15 MB/sec 20 clients 20 procs max_latency=0.506 ms
    Throughput 9438.1 MB/sec 40 clients 40 procs max_latency=2.052 ms
    Throughput 8210.5 MB/sec 80 clients 80 procs max_latency=8.310 ms

    WA_WEIGHT

    Throughput 697.292 MB/sec 2 clients 2 procs max_latency=0.127 ms
    Throughput 1596.48 MB/sec 5 clients 5 procs max_latency=0.080 ms
    Throughput 2975.22 MB/sec 10 clients 10 procs max_latency=0.254 ms
    Throughput 4575.14 MB/sec 20 clients 20 procs max_latency=0.502 ms
    Throughput 9468.65 MB/sec 40 clients 40 procs max_latency=2.069 ms
    Throughput 8631.73 MB/sec 80 clients 80 procs max_latency=8.605 ms

    (increase on the top end)

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Rik van Riel
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Eric reported a sysbench regression against commit:

    3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()")

    Similarly, Rik was looking at the NAS-lu.C benchmark, which regressed
    against his v3.10 enterprise kernel.

    PRE (current tip/master):

    ivb-ep sysbench:

    2: [30 secs] transactions: 64110 (2136.94 per sec.)
    5: [30 secs] transactions: 143644 (4787.99 per sec.)
    10: [30 secs] transactions: 274298 (9142.93 per sec.)
    20: [30 secs] transactions: 418683 (13955.45 per sec.)
    40: [30 secs] transactions: 320731 (10690.15 per sec.)
    80: [30 secs] transactions: 355096 (11834.28 per sec.)

    hsw-ex NAS:

    OMP_PROC_BIND/lu.C.x_threads_144_run_1.log: Time in seconds = 18.01
    OMP_PROC_BIND/lu.C.x_threads_144_run_2.log: Time in seconds = 17.89
    OMP_PROC_BIND/lu.C.x_threads_144_run_3.log: Time in seconds = 17.93
    lu.C.x_threads_144_run_1.log: Time in seconds = 434.68
    lu.C.x_threads_144_run_2.log: Time in seconds = 405.36
    lu.C.x_threads_144_run_3.log: Time in seconds = 433.83

    POST (+patch):

    ivb-ep sysbench:

    2: [30 secs] transactions: 64494 (2149.75 per sec.)
    5: [30 secs] transactions: 145114 (4836.99 per sec.)
    10: [30 secs] transactions: 278311 (9276.69 per sec.)
    20: [30 secs] transactions: 437169 (14571.60 per sec.)
    40: [30 secs] transactions: 669837 (22326.73 per sec.)
    80: [30 secs] transactions: 631739 (21055.88 per sec.)

    hsw-ex NAS:

    lu.C.x_threads_144_run_1.log: Time in seconds = 23.36
    lu.C.x_threads_144_run_2.log: Time in seconds = 22.96
    lu.C.x_threads_144_run_3.log: Time in seconds = 22.52

    This patch takes out all the shiny wake_affine() stuff and goes back to
    utter basics. Between the two CPUs involved with the wakeup (the CPU
    doing the wakeup and the CPU we ran on previously) pick the CPU we can
    run on _now_.

    This recovers much of the regression against the older kernels,
    but leaves some ground in the overloaded case. The default-enabled
    WA_WEIGHT (which will be introduced in the next patch) is an attempt
    to address the overloaded situation.

    Reported-by: Eric Farman
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Christian Borntraeger
    Cc: Linus Torvalds
    Cc: Matthew Rosato
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: jinpuwang@gmail.com
    Cc: vcaputo@pengaru.com
    Fixes: 3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Update cgroup time when an event is scheduled in by descendants.

    Reviewed-and-tested-by: Jiri Olsa
    Signed-off-by: leilei.lin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: acme@kernel.org
    Cc: alexander.shishkin@linux.intel.com
    Cc: brendan.d.gregg@gmail.com
    Cc: yang_oliver@hotmail.com
    Link: http://lkml.kernel.org/r/CALPjY3mkHiekRkRECzMi9G-bjUQOvOjVBAqxmWkTzc-g+0LwMg@mail.gmail.com
    Signed-off-by: Ingo Molnar

    leilei.lin
     
  • Since commit:

    1fd7e4169954 ("perf/core: Remove perf_cpu_context::unique_pmu")

    ... when a PMU is unregistered then its associated ->pmu_cpu_context is
    unconditionally freed. Whilst this is fine for dynamically allocated
    context types (i.e. those registered using perf_invalid_context), this
    causes a problem for sharing of static contexts such as
    perf_{sw,hw}_context, which are used by multiple built-in PMUs and
    effectively have a global lifetime.

    Whilst testing the ARM SPE driver, which must use perf_sw_context to
    support per-task AUX tracing, unregistering the driver as a result of a
    module unload resulted in:

    Unable to handle kernel NULL pointer dereference at virtual address 00000038
    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    Modules linked in: [last unloaded: arm_spe_pmu]
    PC is at ctx_resched+0x38/0xe8
    LR is at perf_event_exec+0x20c/0x278
    [...]
    ctx_resched+0x38/0xe8
    perf_event_exec+0x20c/0x278
    setup_new_exec+0x88/0x118
    load_elf_binary+0x26c/0x109c
    search_binary_handler+0x90/0x298
    do_execveat_common.isra.14+0x540/0x618
    SyS_execve+0x38/0x48

    since the software context has been freed and the ctx.pmu->pmu_disable_count
    field has been set to NULL.

    This patch fixes the problem by avoiding the freeing of static PMU contexts
    altogether. Whilst the sharing of dynamic contexts is questionable, this
    actually requires the caller to share their context pointer explicitly
    and so the burden is on them to manage the object lifetime.

    Reported-by: Kim Phillips
    Signed-off-by: Will Deacon
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Mark Rutland
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 1fd7e4169954 ("perf/core: Remove perf_cpu_context::unique_pmu")
    Link: http://lkml.kernel.org/r/1507040450-7730-1-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar

    Will Deacon
     
  • There is some complication between check_prevs_add() and
    check_prev_add() wrt. saving stack traces. The problem is that we want
    to be frugal with saving stack traces, since it consumes static
    resources.

    We'll only know in check_prev_add() if we need the trace, but we can
    call into it multiple times. So we want to do on-demand and re-use.

    A further complication is that check_prev_add() can drop graph_lock
    and mess with our static resources.

    In any case, the current state; after commit:

    ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")

    is that we'll assume the trace contains valid data once
    check_prev_add() returns '2'. However, as noted by Josh, this is
    false: check_prev_add() can return '2' before having saved a trace,
    which then results in the possibility of using uninitialized data.
    Testing, as reported by Wu, shows a NULL deref.

    So simplify.

    Since the graph_lock() thing is a debug path that hasn't
    really been used in a long while, take it out back and avoid the
    head-ache.

    Further initialize the stack_trace to a known 'empty' state; as long
    as nr_entries == 0, nothing should deref entries. We can then use the
    'entries == NULL' test for a valid trace / on-demand saving.

    Analyzed-by: Josh Poimboeuf
    Reported-by: Fengguang Wu
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Byungchul Park
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • sk_clone_lock() might run while TCP/DCCP listener already vanished.

    In order to prevent use after free, it is better to defer cgroup_sk_alloc()
    to the point we know both parent and child exist, and from process context.

    Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
    Signed-off-by: Eric Dumazet
    Cc: Johannes Weiner
    Cc: Tejun Heo
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Adds missing access_ok() checks.

    CVE-2017-5123

    Reported-by: Chris Salls
    Signed-off-by: Kees Cook
    Acked-by: Al Viro
    Fixes: 4c48abe91be0 ("waitid(): switch copyout of siginfo to unsafe_put_user()")
    Cc: stable@kernel.org # 4.13
    Signed-off-by: Linus Torvalds

    Kees Cook