Eric Lee / smarc-fsl-linux-kernel

23 Sep, 2020

2 commits

d3017135c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:

- fix failure to add bond interfaces to a bridge, the offload-handling
code was too defensive there and recent refactoring unearthed that.
Users complained (Ido)

- fix unnecessarily reflecting ECN bits within TOS values / QoS marking
in TCP ACK and reset packets (Wei)

- fix a deadlock with bpf iterator. Hopefully we're in the clear on
this front now... (Yonghong)

- BPF fix for clobbering r2 in bpf_gen_ld_abs (Daniel)

- fix AQL on mt76 devices with FW rate control and address a couple of
AQL issues in mac80211 code (Felix)

- fix authentication issue with mwifiex (Maximilian)

- WiFi connectivity fix: revert IGTK support in ti/wlcore (Mauro)

- fix exception handling for multipath routes via same device (David
Ahern)

- revert back to a BH spin lock flavor for nsid_lock: there are paths
which do require the BH context protection (Taehee)

- fix interrupt / queue / NAPI handling in the lantiq driver (Hauke)

- fix ife module load deadlock (Cong)

- make an adjustment to netlink reply message type for code added in
this release (the sole change touching uAPI here) (Michal)

- a number of fixes for small NXP and Microchip switches (Vladimir)

[ Pull request acked by David: "you can expect more of this in the
future as I try to delegate more things to Jakub" ]

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (167 commits)
net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute
net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU
net: Update MAINTAINERS for MediaTek switch driver
net/mlx5e: mlx5e_fec_in_caps() returns a boolean
net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock
net/mlx5e: kTLS, Fix leak on resync error flow
net/mlx5e: kTLS, Add missing dma_unmap in RX resync
net/mlx5e: kTLS, Fix napi sync and possible use-after-free
net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported
net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats()
net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
net/mlx5e: Fix endianness when calculating pedit mask first bit
net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported
net/mlx5e: CT: Fix freeing ct_label mapping
net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready
net/mlx5e: Use synchronize_rcu to sync with NAPI
net/mlx5e: Use RCU to protect rq->xdp_prog
...

Linus Torvalds
2020-09-23 05:43:50 +0800
eff48ddea Merge tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

- Check kprobe is enabled before unregistering from ftrace as it isn't
registered when disabled.

- Remove kprobes enabled via the command line that are on init text
when that text is freed.

- Add missing RCU synchronization for ftrace trampoline symbols removed
from kallsyms.

- Free trampoline on error path if ftrace_startup() fails.

- Give more space for the longer PID numbers in trace output.

- Fix a possible double free in the histogram code.

- A couple of fixes that were discovered by sparse.

* tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
bootconfig: init: make xbc_namebuf static
kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot
tracing: fix double free
ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer
tracing: Make the space reserved for the pid wider
ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms
ftrace: Free the trampoline when ftrace_startup() fails
kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()

Linus Torvalds
2020-09-23 00:08:33 +0800

22 Sep, 2020

1 commit

984777406 Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU fix from Paul McKenney:
"This contains a single commit that fixes a bug that was introduced in
the last merge window. This bug causes a compiler warning complaining
about show_rcu_tasks_classic_gp_kthread() being an unused static
function in !SMP kernels.

The fix is straightforward, just adding an 'inline' to make this a
static inline function, thus avoiding the warning.

This bug was reported by Laurent Pinchart, who would like it fixed
sooner rather than later"

* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread()

Linus Torvalds
2020-09-22 03:42:31 +0800

21 Sep, 2020

2 commits

e2bff391c Merge tag 'core_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull syscall tracing fix from Borislav Petkov:
"Fix the seccomp syscall rewriting so that trace and audit see the
rewritten syscall number, from Kees Cook"

* tag 'core_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
core/entry: Report syscall correctly for trace and audit

Linus Torvalds
2020-09-21 06:37:15 +0800
3d491679b Merge tag 'locking_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Borislav Petkov:
"Two fixes from the locking/urgent pile:

- Fix lockdep's detection of "USED" inversions

Linus Torvalds
2020-09-21 06:25:33 +0800

20 Sep, 2020

4 commits

325d0eab4 Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
"15 patches.

Subsystems affected by this patch series: mailmap, mm/hotfixes,
mm/thp, mm/memory-hotplug, misc, kcsan"

* emailed patches from Andrew Morton:
kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments'
fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype
stackleak: let stack_erasing_sysctl take a kernel pointer buffer
ftrace: let ftrace_enable_sysctl take a kernel pointer buffer
mm/memory_hotplug: drain per-cpu pages again during memory offline
selftests/vm: fix display of page size in map_hugetlb
mm/thp: fix __split_huge_pmd_locked() for migration PMD
kprobes: fix kill kprobe which has been marked as gone
tmpfs: restore functionality of nr_inodes=0
mlock: fix unevictable_pgs event counts on THP
mm: fix check_move_unevictable_pages() on THP
mm: migration of hugetlbfs page skip memcg
ksm: reinstate memcg charge on copied pages
mailmap: add older email addresses for Kees Cook

Linus Torvalds
2020-09-20 09:18:37 +0800
4773ef33f stackleak: let stack_erasing_sysctl take a kernel pointer buffer

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of stack_erasing_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/stackleak.c:31:50: warning: incorrect type in argument 3 (different address spaces)
kernel/stackleak.c:31:50: expected void *
kernel/stackleak.c:31:50: got void [noderef] __user *buffer

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser
Signed-off-by: Andrew Morton
Cc: Christoph Hellwig
Cc: Al Viro
Link: https://lkml.kernel.org/r/20200907093253.13656-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds

Tobias Klauser
2020-09-20 04:13:39 +0800
7bb82ac30 ftrace: let ftrace_enable_sysctl take a kernel pointer buffer

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
kernel/trace/ftrace.c:7544:43: expected void *
kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser
Signed-off-by: Andrew Morton
Cc: Christoph Hellwig
Cc: Al Viro
Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds

Tobias Klauser
2020-09-20 04:13:39 +0800
b0399092c kprobes: fix kill kprobe which has been marked as gone

If a kprobe is marked as gone, we should not kill it again. Otherwise, we
can disarm the kprobe more than once. In that case, the
kprobe_ftrace_enabled count can become unbalanced, which can leave the
kprobe not working.
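
The guard described above can be sketched in user-space C. This is only an illustrative model: `struct probe`, `enabled_count`, `arm_probe()` and `kill_probe()` are hypothetical stand-ins, not the kernel's kprobes code. It shows why an "already gone" check keeps a shared enable counter balanced when the kill path runs twice.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in; not the kernel's struct kprobe. */
struct probe {
	bool gone;	/* set once the probe has been killed */
};

/* Models a global count of armed probes (assumption for this sketch). */
static int enabled_count;

static void arm_probe(struct probe *p)
{
	p->gone = false;
	enabled_count++;
}

/* Kill a probe at most once, so the counter is decremented at most once. */
static void kill_probe(struct probe *p)
{
	if (p->gone)
		return;		/* already killed: do not disarm again */

	p->gone = true;
	enabled_count--;	/* the "disarm" step in this model */
}
```

Without the early return, a second `kill_probe()` call would decrement `enabled_count` again and leave it negative, which is the kind of imbalance the commit message describes.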

Fixes: e8386a0cb22f ("kprobes: support probing module __exit function")
Co-developed-by: Chengming Zhou
Signed-off-by: Muchun Song
Signed-off-by: Chengming Zhou
Signed-off-by: Andrew Morton
Acked-by: Masami Hiramatsu
Cc: "Naveen N . Rao"
Cc: Anil S Keshavamurthy
Cc: David S. Miller
Cc: Song Liu
Cc: Steven Rostedt
Cc:
Link: https://lkml.kernel.org/r/20200822030055.32383-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds

Muchun Song
2020-09-20 04:13:38 +0800

19 Sep, 2020

7 commits

eb5f95f15 Merge tag 's390-5.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

- Fix order in trace_hardirqs_off_caller() to make locking state
consistent even if the IRQ tracer calls into lockdep again. Touches
common code. Acked-by Peter Zijlstra.

- Correctly handle secure storage violation exception to avoid kernel
panic triggered by user space misbehaviour.

- Switch the idle->seqcount over to using raw_write_*() to avoid
"suspicious RCU usage".

- Fix memory leaks on hard unplug in pci code.

- Use kvmalloc instead of kmalloc for larger allocations in zcrypt.

- Add a few missing __init annotations to static functions to avoid
section mismatch complaints when functions are not inlined.

* tag 's390-5.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: add 3f program exception handler
lockdep: fix order in trace_hardirqs_off_caller()
s390/pci: fix leak of DMA tables on hard unplug
s390/init: add missing __init annotations
s390/zcrypt: fix kmalloc 256k failure
s390/idle: fix suspicious RCU usage

Linus Torvalds
2020-09-19 09:51:08 +0800
82d083ab6 kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot

Since the kprobe_event= cmdline option allows the user to put kprobes on
functions in initmem, kprobes has to mark such probes as gone after boot.
Currently the probes on the init functions in modules are handled
by the module callback, but the kernel init text isn't handled.
Without this, kprobes may access a nonexistent text area to disable or
remove a probe.

Link: https://lkml.kernel.org/r/159972810544.428528.1839307531600646955.stgit@devnote2

Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter")
Cc: Jonathan Corbet
Cc: Shuah Khan
Cc: Randy Dunlap
Cc: Ingo Molnar
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)

Masami Hiramatsu
2020-09-19 02:27:24 +0800
46bbe5c67 tracing: fix double free

clang static analyzer reports this problem

trace_events_hist.c:3824:3: warning: Attempt to free
released memory
kfree(hist_data->attrs->var_defs.name[i]);

In parse_var_defs() if there is a problem allocating
var_defs.expr, the earlier var_defs.name is freed.
This free is duplicated by free_var_defs() which frees
the rest of the list.

Because free_var_defs() has to run anyway, remove the
second free from parse_var_defs().
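
The single-owner cleanup pattern behind this fix can be sketched as follows. This is a user-space illustration under stated assumptions: `struct var_defs`, `add_var()` and `dup_str()` here are hypothetical and simplified, and a NULL `expr` argument is used to simulate an allocation failure. It is not the code in trace_events_hist.c.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_VARS 4

/* Hypothetical, simplified table of variable definitions. */
struct var_defs {
	unsigned int n_vars;
	char *name[MAX_VARS];
	char *expr[MAX_VARS];
};

static char *dup_str(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* The only place that frees entries; it also clears each slot. */
static void free_var_defs(struct var_defs *d)
{
	for (unsigned int i = 0; i < d->n_vars; i++) {
		free(d->name[i]);
		d->name[i] = NULL;
		free(d->expr[i]);	/* free(NULL) is a no-op */
		d->expr[i] = NULL;
	}
	d->n_vars = 0;
}

/*
 * Returns 0 on success, -1 on failure. On failure the caller is expected
 * to run free_var_defs(); nothing is freed locally, so no pointer can be
 * freed twice.
 */
static int add_var(struct var_defs *d, const char *name, const char *expr)
{
	unsigned int i = d->n_vars;

	if (i >= MAX_VARS)
		return -1;

	d->name[i] = dup_str(name);
	if (!d->name[i])
		return -1;
	d->expr[i] = NULL;
	d->n_vars++;		/* slot is now owned by free_var_defs() */

	/* A NULL expr simulates a failed allocation in this sketch. */
	d->expr[i] = expr ? dup_str(expr) : NULL;
	if (!d->expr[i])
		return -1;	/* no local free of name[i] here */

	return 0;
}
```

The design choice is that the error path in `add_var()` never frees anything itself; since the common cleanup routine runs regardless, a second free on the error path would only duplicate it.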

Link: https://lkml.kernel.org/r/20200907135845.15804-1-trix@redhat.com

Cc: stable@vger.kernel.org
Fixes: 30350d65ac56 ("tracing: Add variable support to hist triggers")
Reviewed-by: Tom Zanussi
Signed-off-by: Tom Rix
Signed-off-by: Steven Rostedt (VMware)

Tom Rix
2020-09-19 01:16:40 +0800
54fa9ba56 ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
kernel/trace/ftrace.c:7544:43: expected void *
kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer

Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Tobias Klauser
Signed-off-by: Steven Rostedt (VMware)

Tobias Klauser
2020-09-19 01:15:56 +0800
795d6379a tracing: Make the space reserved for the pid wider

For 64bit CONFIG_BASE_SMALL=0 systems PID_MAX_LIMIT is set by default to
4194304. During boot the kernel sets a new value based on number of CPUs
but no lower than 32768. It is 1024 per CPU so with 128 CPUs the default
becomes 131072 which needs six digits.
This value can be increased during run time but must not exceed the
initial upper limit.

Systemd sometime after v241 sets it to the upper limit during boot. The
result is that when the pid exceeds five digits, the trace output is a
little hard to read because it is no longer properly padded (the same
as on big iron with 98+ CPUs).

Increase the pid padding to seven digits.
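
The arithmetic above can be checked with a small user-space sketch. The helper names are hypothetical, and `default_pid_max()` only models the rule quoted in the message (1024 per CPU, never below 32768); it ignores the upper clamp to PID_MAX_LIMIT.

```c
#include <stdio.h>

/* Number of decimal digits needed to print a value. */
static int decimal_digits(unsigned long v)
{
	int digits = 1;

	while (v >= 10) {
		v /= 10;
		digits++;
	}
	return digits;
}

/* Boot-time default as described above: 1024 per CPU, at least 32768. */
static unsigned long default_pid_max(unsigned int cpus)
{
	unsigned long v = 1024UL * cpus;

	return v < 32768UL ? 32768UL : v;
}

/* Print a PID right-aligned in a field of the given width. */
static int format_pid(char *buf, size_t len, int width, unsigned long pid)
{
	return snprintf(buf, len, "%*lu", width, pid);
}
```

With 128 CPUs, `default_pid_max()` gives 131072, which needs six digits, and the upper limit 4194304 needs seven, so a seven-character field keeps every possible PID aligned. The "98+ CPUs" remark follows from the same rule: 97 * 1024 is still five digits, while 98 * 1024 is six.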

Link: https://lkml.kernel.org/r/20200904082331.dcdkrr3bkn3e4qlg@linutronix.de

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Steven Rostedt (VMware)

Sebastian Andrzej Siewior
2020-09-19 00:42:11 +0800
478ece957 ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms

Add synchronize_rcu() after list_del_rcu() in
ftrace_remove_trampoline_from_kallsyms() to protect readers of
ftrace_ops_trampoline_list (in ftrace_get_trampoline_kallsym)
which is used when kallsyms is read.

Link: https://lkml.kernel.org/r/20200901091617.31837-1-adrian.hunter@intel.com

Fixes: fc0ea795f53c8d ("ftrace: Add symbols for ftrace trampolines")
Signed-off-by: Adrian Hunter
Signed-off-by: Steven Rostedt (VMware)

Adrian Hunter
2020-09-19 00:22:42 +0800
d5e47505e ftrace: Free the trampoline when ftrace_startup() fails

Commit fc0ea795f53c ("ftrace: Add symbols for ftrace trampolines")
failed to remove ops from the new ftrace_ops_trampoline_list in
ftrace_startup() if ftrace_hash_ipmodify_enable() fails there. It may
lead to a BUG if such ops come from a module which may be removed.

Moreover, the trampoline itself is not freed in this case.

Fix it by calling ftrace_trampoline_free() during the rollback.

Link: https://lkml.kernel.org/r/20200831122631.28057-1-mbenes@suse.cz

Fixes: fc0ea795f53c ("ftrace: Add symbols for ftrace trampolines")
Fixes: f8b8be8a310a ("ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict")
Signed-off-by: Miroslav Benes
Signed-off-by: Steven Rostedt (VMware)

Miroslav Benes
2020-09-19 00:19:08 +0800

18 Sep, 2020

2 commits

3031313eb kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()

Commit 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at
kprobe_ftrace_handler") fixed one bug, but the fix was incomplete.
Running kprobe_module.tc from ftracetest makes the kernel show the
warning below.

# ./ftracetest test.d/kprobe/kprobe_module.tc
=== Ftrace unit tests ===
[1] Kprobe dynamic event - probing module
...
[ 22.400215] ------------[ cut here ]------------
[ 22.400962] Failed to disarm kprobe-ftrace at trace_printk_irq_work+0x0/0x7e [trace_printk] (-2)
[ 22.402139] WARNING: CPU: 7 PID: 200 at kernel/kprobes.c:1091 __disarm_kprobe_ftrace.isra.0+0x7e/0xa0
[ 22.403358] Modules linked in: trace_printk(-)
[ 22.404028] CPU: 7 PID: 200 Comm: rmmod Not tainted 5.9.0-rc2+ #66
[ 22.404870] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[ 22.406139] RIP: 0010:__disarm_kprobe_ftrace.isra.0+0x7e/0xa0
[ 22.406947] Code: 30 8b 03 eb c9 80 3d e5 09 1f 01 00 75 dc 49 8b 34 24 89 c2 48 c7 c7 a0 c2 05 82 89 45 e4 c6 05 cc 09 1f 01 01 e8 a9 c7 f0 ff 0b 8b 45 e4 eb b9 89 c6 48 c7 c7 70 c2 05 82 89 45 e4 e8 91 c7
[ 22.409544] RSP: 0018:ffffc90000237df0 EFLAGS: 00010286
[ 22.410385] RAX: 0000000000000000 RBX: ffffffff83066024 RCX: 0000000000000000
[ 22.411434] RDX: 0000000000000001 RSI: ffffffff810de8d3 RDI: ffffffff810de8d3
[ 22.412687] RBP: ffffc90000237e10 R08: 0000000000000001 R09: 0000000000000001
[ 22.413762] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88807c478640
[ 22.414852] R13: ffffffff8235ebc0 R14: ffffffffa00060c0 R15: 0000000000000000
[ 22.415941] FS: 00000000019d48c0(0000) GS:ffff88807d7c0000(0000) knlGS:0000000000000000
[ 22.417264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 22.418176] CR2: 00000000005bb7e3 CR3: 0000000078f7a000 CR4: 00000000000006a0
[ 22.419309] Call Trace:
[ 22.419990] kill_kprobe+0x94/0x160
[ 22.420652] kprobes_module_callback+0x64/0x230
[ 22.421470] notifier_call_chain+0x4f/0x70
[ 22.422184] blocking_notifier_call_chain+0x49/0x70
[ 22.422979] __x64_sys_delete_module+0x1ac/0x240
[ 22.423733] do_syscall_64+0x38/0x50
[ 22.424366] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 22.425176] RIP: 0033:0x4bb81d
[ 22.425741] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48
[ 22.428726] RSP: 002b:00007ffc70fef008 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[ 22.430169] RAX: ffffffffffffffda RBX: 00000000019d48a0 RCX: 00000000004bb81d
[ 22.431375] RDX: 0000000000000000 RSI: 0000000000000880 RDI: 00007ffc70fef028
[ 22.432543] RBP: 0000000000000880 R08: 00000000ffffffff R09: 00007ffc70fef320
[ 22.433692] R10: 0000000000656300 R11: 0000000000000246 R12: 00007ffc70fef028
[ 22.434635] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
[ 22.435682] irq event stamp: 1169
[ 22.436240] hardirqs last enabled at (1179): [] console_unlock+0x422/0x580
[ 22.437466] hardirqs last disabled at (1188): [] console_unlock+0x7b/0x580
[ 22.438608] softirqs last enabled at (866): [] __do_softirq+0x38e/0x490
[ 22.439637] softirqs last disabled at (859): [] asm_call_on_stack+0x12/0x20
[ 22.440690] ---[ end trace 1e7ce7e1e4567276 ]---
[ 22.472832] trace_kprobe: This probe might be able to register after target module is loaded. Continue.

This is because kill_kprobe() calls disarm_kprobe_ftrace() even
if the given probe is not enabled. In that case, ftrace_set_filter_ip()
fails because the given probe point is not registered to ftrace.

Fix this by checking that the given (going) probe is enabled before
invoking disarm_kprobe_ftrace().

Link: https://lkml.kernel.org/r/159888672694.1411785.5987998076694782591.stgit@devnote2

Fixes: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler")
Cc: Ingo Molnar
Cc: "Naveen N . Rao"
Cc: Anil S Keshavamurthy
Cc: David Miller
Cc: Muchun Song
Cc: Chengming Zhou
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)

Masami Hiramatsu
2020-09-18 23:50:51 +0800
5ef64cc89 mm: allow a controlled amount of unfairness in the page lock

Commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") made
the page locking entirely fair, in that if a waiter came in while the
lock was held, the lock would be transferred to the lockers strictly in
order.

That was intended to finally get rid of the long-reported watchdog
failures that involved the page lock under extreme load, where a process
could end up waiting essentially forever, as other page lockers stole
the lock from under it.

It also improved some benchmarks, but it ended up causing huge
performance regressions on others, simply because fair lock behavior
doesn't end up giving out the lock as aggressively, causing better
worst-case latency, but potentially much worse average latencies and
throughput.

Instead of reverting that change entirely, this introduces a controlled
amount of unfairness, with a sysctl knob to tune it if somebody needs
to. But the default value should hopefully be good for any normal load,
allowing a few rounds of lock stealing, but enforcing the strict
ordering before the lock has been stolen too many times.

There is also a hint from Matthieu Baerts that the fair page coloring
may end up exposing an ABBA deadlock that is hidden by the usual
optimistic lock stealing, and while the unfairness doesn't fix the
fundamental issue (and I'm still looking at that), it avoids it in
practice.

The amount of unfairness can be modified by writing a new value to the
'sysctl_page_lock_unfairness' variable (default value of 5, exposed
through /proc/sys/vm/page_lock_unfairness), but that is hopefully
something we'd use mainly for debugging rather than being necessary for
any deep system tuning.
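
The "few rounds of stealing, then strict ordering" policy can be modelled roughly as below. This is a loose user-space sketch, not the kernel's wait_on_page_bit_common() logic: `struct waiter`, `wants_handoff` and `waiter_wakeup()` are hypothetical names, and only the default budget of 5 comes from the text above.

```c
#include <stdbool.h>

/* Default budget taken from the commit message. */
#define PAGE_LOCK_UNFAIRNESS_DEFAULT 5

/* Hypothetical per-waiter state for this model. */
struct waiter {
	int unfairness;		/* remaining tolerated lock steals */
	bool wants_handoff;	/* true once strict ordering is requested */
};

/*
 * Called each time the waiter wakes up. 'lock_free' tells whether the lock
 * was still available at that point. Returns true if the waiter took the
 * lock opportunistically. Each failed attempt (the lock was stolen) uses up
 * one unit of unfairness; when the budget is exhausted, the waiter asks for
 * the lock to be handed over directly on the next unlock.
 */
static bool waiter_wakeup(struct waiter *w, bool lock_free)
{
	if (lock_free)
		return true;

	if (w->unfairness > 0 && --w->unfairness == 0)
		w->wants_handoff = true;

	return false;
}
```

In this model a waiter tolerates four steals without changing behaviour and switches to the fair hand-off on the fifth, which bounds the worst-case wait while still letting opportunistic lockers through in the common case.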

This whole issue has exposed just how critical the page lock can be, and
how contended it gets under certain loads. And the main contention
doesn't really seem to be anything related to IO (which was the origin
of this lock), but for things like just verifying that the page file
mapping is stable while faulting in the page into a page table.

Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@MichaelLarabel.com/
Link: https://www.phoronix.com/scan.php?page=article&item=linux-50-59&num=1
Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@tessares.net/
Reported-and-tested-by: Michael Larabel
Tested-by: Matthieu Baerts
Cc: Dave Chinner
Cc: Matthew Wilcox
Cc: Chris Mason
Cc: Jan Kara
Cc: Amir Goldstein
Signed-off-by: Linus Torvalds

Linus Torvalds
2020-09-18 01:26:41 +0800

17 Sep, 2020

1 commit

78edc005f rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread()

Commit 8344496e8b49 ("rcu-tasks: Conditionally compile
show_rcu_tasks_gp_kthreads()") introduced conditional
compilation of several functions, but forgot one occurrence of
show_rcu_tasks_classic_gp_kthread() that causes the compiler to warn of
an unused static function. This commit uses "static inline" to avoid
these complaints and possibly also to avoid emitting an actual definition
of this function.
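
The effect of "static inline" can be illustrated with a minimal sketch. The names `demo_show_gp_kthread()`, `demo_show_all()` and the `DEMO_SMP` macro are hypothetical stand-ins for the conditionally compiled caller; this is not the rcu-tasks code.

```c
/*
 * Hypothetical example: the helper is only called when DEMO_SMP is defined.
 * With plain "static", a build without DEMO_SMP can make GCC warn about an
 * unused function (-Wunused-function); "static inline" avoids that warning
 * and lets the compiler skip emitting a definition when there is no caller.
 */
static inline int demo_show_gp_kthread(void)
{
	return 42;
}

#ifdef DEMO_SMP
int demo_show_all(void)
{
	return demo_show_gp_kthread();
}
#endif
```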

Fixes: 8344496e8b49 ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()")
Cc: # 5.8.x
Reported-by: Laurent Pinchart
Signed-off-by: Paul E. McKenney

Paul E. McKenney
2020-09-17 07:32:36 +0800

16 Sep, 2020

3 commits

e6b1a44ec locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count

The __this_cpu*() accessors are (in general) IRQ-unsafe which, given
that percpu-rwsem is a blocking primitive, should be just fine.

However, file_end_write() is used from IRQ context and will cause
load-store issues on architectures where the per-cpu accessors are not
natively irq-safe.

Fix it by using the IRQ-safe this_cpu_*() for operations on
read_count. This will generate more expensive code on a number of
platforms, which might cause a performance regression for some of the
other percpu-rwsem users.

If any such is reported, we can consider alternative solutions.

Fixes: 70fe2f48152e ("aio: fix freeze protection of aio writes")
Signed-off-by: Hou Tao
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Will Deacon
Acked-by: Oleg Nesterov
Link: https://lkml.kernel.org/r/20200915140750.137881-1-houtao1@huawei.com

Hou Tao
2020-09-16 22:26:56 +0800
d5d325eae Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2020-09-15

The following pull-request contains BPF updates for your *net* tree.

We've added 12 non-merge commits during the last 19 day(s) which contain
a total of 10 files changed, 47 insertions(+), 38 deletions(-).

The main changes are:

1) docs/bpf fixes, from Andrii.

2) ld_abs fix, from Daniel.

3) socket casting helpers fix, from Martin.

4) hash iterator fixes, from Yonghong.
====================

Signed-off-by: David S. Miller

David S. Miller
2020-09-16 10:26:21 +0800
ce880cb82 bpf: Fix a rcu warning for bpffs map pretty-print

When running the selftest
./test_btf -p
the kernel showed the following warning:
[ 51.528185] WARNING: CPU: 3 PID: 1756 at kernel/bpf/hashtab.c:717 htab_map_get_next_key+0x2eb/0x300
[ 51.529217] Modules linked in:
[ 51.529583] CPU: 3 PID: 1756 Comm: test_btf Not tainted 5.9.0-rc1+ #878
[ 51.530346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
[ 51.531410] RIP: 0010:htab_map_get_next_key+0x2eb/0x300
...
[ 51.542826] Call Trace:
[ 51.543119] map_seq_next+0x53/0x80
[ 51.543528] seq_read+0x263/0x400
[ 51.543932] vfs_read+0xad/0x1c0
[ 51.544311] ksys_read+0x5f/0xe0
[ 51.544689] do_syscall_64+0x33/0x40
[ 51.545116] entry_SYSCALL_64_after_hwframe+0x44/0xa9

The related source code in kernel/bpf/hashtab.c:
709 static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
710 {
711 struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
712 struct hlist_nulls_head *head;
713 struct htab_elem *l, *next_l;
714 u32 hash, key_size;
715 int i = 0;
716
717 WARN_ON_ONCE(!rcu_read_lock_held());

In kernel/bpf/inode.c, bpffs map pretty print calls map->ops->map_get_next_key()
without holding rcu_read_lock(), hence causing the above warning.
To fix the issue, surround map->ops->map_get_next_key() with an RCU read lock.

Fixes: a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap")
Reported-by: Alexei Starovoitov
Signed-off-by: Yonghong Song
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Cc: Martin KaFai Lau
Link: https://lore.kernel.org/bpf/20200916004401.146277-1-yhs@fb.com

Yonghong Song
2020-09-16 09:17:39 +0800

15 Sep, 2020

1 commit

b6ec41346 core/entry: Report syscall correctly for trace and audit

On v5.8 when doing seccomp syscall rewrites (e.g. getpid into getppid
as seen in the seccomp selftests), trace (and audit) correctly see the
rewritten syscall on entry and exit:

seccomp_bpf-1307 [000] .... 22974.874393: sys_enter: NR 110 (...
seccomp_bpf-1307 [000] .N.. 22974.874401: sys_exit: NR 110 = 1304

With mainline we see a mismatched enter and exit (the original syscall
is incorrectly visible on entry):

seccomp_bpf-1030 [000] .... 21.806766: sys_enter: NR 39 (...
seccomp_bpf-1030 [000] .... 21.806767: sys_exit: NR 110 = 1027

When ptrace or seccomp change the syscall, this needs to be visible to
trace and audit at that time as well. Update the syscall earlier so they
see the correct value.

Fixes: d88d59b64ca3 ("core/entry: Respect syscall number rewrites")
Reported-by: Michael Ellerman
Signed-off-by: Kees Cook
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20200912005826.586171-1-keescook@chromium.org

Kees Cook
2020-09-15 04:49:51 +0800

14 Sep, 2020

1 commit

73ac74c7d lockdep: fix order in trace_hardirqs_off_caller()

Switch order so that locking state is consistent even
if the IRQ tracer calls into lockdep again.

Acked-by: Peter Zijlstra
Signed-off-by: Sven Schnelle
Signed-off-by: Vasily Gorbik

Sven Schnelle
2020-09-14 16:08:07 +0800

13 Sep, 2020

1 commit

ef2e9a563 Merge tag 'seccomp-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fixes from Kees Cook:
"This fixes a rare race condition in seccomp when using TSYNC and
USER_NOTIF together where a memory allocation would not get freed
(found by syzkaller, fixed by Tycho).

Additionally updates Tycho's MAINTAINERS and .mailmap entries for his
new address"

* tag 'seccomp-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: don't leave dangling ->notif if file allocation fails
mailmap, MAINTAINERS: move to tycho.pizza
seccomp: don't leak memory when filter install races

Linus Torvalds
2020-09-13 03:58:01 +0800

12 Sep, 2020

1 commit

40249c696 gcov: add support for GCC 10.1

Using gcov to collect coverage data for kernels compiled with GCC 10.1
causes random malfunctions and kernel crashes. This is the result of a
changed GCOV_COUNTERS value in GCC 10.1 that causes a mismatch between
the layout of the gcov_info structure created by GCC profiling code and
the related structure used by the kernel.

Fix this by updating the in-kernel GCOV_COUNTERS value. Also re-enable
config GCOV_KERNEL for use with GCC 10.
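
Why a changed counter count breaks things can be seen in a small layout sketch. The struct definitions and the two counts below are illustrative assumptions, not the kernel's or GCC's actual gcov_info definition: when an array sized by the counter count sits in the middle of a struct, every field after it moves if the two sides disagree on that count.

```c
#include <stddef.h>

typedef void (*merge_fn)(long long *counters, unsigned int n);

/* Illustrative values only; the real counts live in the gcov sources. */
#define COUNTERS_EMITTED  8	/* what the producer lays out */
#define COUNTERS_EXPECTED 9	/* what the consumer assumes */

/* Layout as written by the producer (hypothetical, simplified). */
struct info_emitted {
	unsigned int version;
	merge_fn merge[COUNTERS_EMITTED];
	unsigned int n_functions;
};

/* Layout as read by the consumer (hypothetical, simplified). */
struct info_expected {
	unsigned int version;
	merge_fn merge[COUNTERS_EXPECTED];
	unsigned int n_functions;
};

/* How far off the consumer is when it reads n_functions. */
static size_t n_functions_offset_delta(void)
{
	return offsetof(struct info_expected, n_functions) -
	       offsetof(struct info_emitted, n_functions);
}
```

A consumer built with the wrong count reads `n_functions` one function pointer past where the producer stored it, so it interprets unrelated memory as a count, which is consistent with the random malfunctions and crashes described above.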

Reported-by: Colin Ian King
Reported-by: Leon Romanovsky
Signed-off-by: Peter Oberparleiter
Tested-by: Leon Romanovsky
Tested-and-Acked-by: Colin Ian King
Signed-off-by: Linus Torvalds

Peter Oberparleiter
2020-09-12 00:33:54 +0800

10 Sep, 2020

1 commit

7fe10096c Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"This fixes a regression in padata"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
padata: fix possible padata_works_lock deadlock

Linus Torvalds
2020-09-10 10:46:22 +0800

09 Sep, 2020

2 commits

e83931790 seccomp: don't leave dangling ->notif if file allocation fails

Christian and Kees both pointed out that this is a bit sloppy to open-code
both places, and Christian points out that we leave a dangling pointer to
->notif if file allocation fails. Since we check ->notif for null in order
to determine if it's ok to install a filter, this means people won't be
able to install a filter if the file allocation fails for some reason, even
if they subsequently should be able to.

To fix this, let's hoist this free+null into its own little helper and use
it.
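
A "free and null" helper of the kind described can be sketched in user-space C as follows. All names here (`struct filter`, `filter_free_notif()`, `filter_init_listener()`, the `file_alloc_ok` flag used to simulate a failed file allocation) are hypothetical and simplified; this is not the seccomp implementation.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the filter and its notifier. */
struct notification {
	int next_id;
};

struct filter {
	struct notification *notif;
};

/* One helper that releases the notifier state and clears the pointer. */
static void filter_free_notif(struct filter *f)
{
	free(f->notif);
	f->notif = NULL;
}

/* In this model, installing is only allowed when no notifier is attached. */
static bool filter_can_install(const struct filter *f)
{
	return f->notif == NULL;
}

/* 'file_alloc_ok' simulates whether the later file allocation succeeds. */
static int filter_init_listener(struct filter *f, bool file_alloc_ok)
{
	f->notif = calloc(1, sizeof(*f->notif));
	if (!f->notif)
		return -1;

	if (!file_alloc_ok) {
		filter_free_notif(f);	/* no dangling pointer left behind */
		return -1;
	}
	return 0;
}
```

Because the helper always clears the pointer, a failed attempt leaves the filter in a state where a later install can still succeed, and every cleanup site shares the same code.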

Reported-by: Kees Cook
Reported-by: Christian Brauner
Signed-off-by: Tycho Andersen
Acked-by: Christian Brauner
Link: https://lore.kernel.org/r/20200902140953.1201956-1-tycho@tycho.pizza
Signed-off-by: Kees Cook

Tycho Andersen
2020-09-09 02:30:16 +0800
a566a9012 seccomp: don't leak memory when filter install races

In seccomp_set_mode_filter() with TSYNC | NEW_LISTENER, we first initialize
the listener fd, then check to see if we can actually use it later in
seccomp_may_assign_mode(), which can fail if anyone else in our thread
group has installed a filter and caused some divergence. If we can't, we
partially clean up the newly allocated file: we put the fd, put the file,
but don't actually clean up the *memory* that was allocated at
filter->notif. Let's clean that up too.

To accomplish this, let's hoist the actual "detach a notifier from a
filter" code to its own helper out of seccomp_notify_release(), so that in
case anyone adds stuff to init_listener(), they only have to add the
cleanup code in one spot. This does a bit of extra locking and such on the
failure path when the filter is not attached, but it's a slow failure path
anyway.

Fixes: 51891498f2da ("seccomp: allow TSYNC and USER_NOTIF together")
Reported-by: syzbot+3ad9614a12f80994c32e@syzkaller.appspotmail.com
Signed-off-by: Tycho Andersen
Acked-by: Christian Brauner
Link: https://lore.kernel.org/r/20200902014017.934315-1-tycho@tycho.pizza
Signed-off-by: Kees Cook

Tycho Andersen
2020-09-09 02:19:50 +0800

07 Sep, 2020

1 commit

015b3155c Merge tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

- more generic entry code ABI fallout

- debug register handling bugfixes

- fix vmalloc mappings on 32-bit kernels

- kprobes instrumentation output fix on 32-bit kernels

- fix over-eager WARN_ON_ONCE() on !SMAP hardware

- NUMA debugging fix

- fix Clang related crash on !RETPOLINE kernels

* tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry: Unbreak 32bit fast syscall
x86/debug: Allow a single level of #DB recursion
x86/entry: Fix AC assertion
tracing/kprobes, x86/ptrace: Fix regs argument order for i386
x86, fakenuma: Fix invalid starting node ID
x86/mm/32: Bring back vmalloc faulting on x86_32
x86/cmdline: Disable jump tables for cmdline.c

Linus Torvalds
2020-09-07 01:28:00 +0800

06 Sep, 2020

2 commits

7514c0362 Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"19 patches.

Subsystems affected by this patch series: MAINTAINERS, ipc, fork,
checkpatch, lib, and mm (memcg, slub, pagemap, madvise, migration,
hugetlb)"

* emailed patches from Andrew Morton:
include/linux/log2.h: add missing () around n in roundup_pow_of_two()
mm/khugepaged.c: fix khugepaged's request size in collapse_file
mm/hugetlb: fix a race between hugetlb sysctl handlers
mm/hugetlb: try preferred node first when alloc gigantic page from cma
mm/migrate: preserve soft dirty in remove_migration_pte()
mm/migrate: remove unnecessary is_zone_device_page() check
mm/rmap: fixup copying of soft dirty and uffd ptes
mm/migrate: fixup setting UFFD_WP flag
mm: madvise: fix vma user-after-free
checkpatch: fix the usage of capture group ( ... )
fork: adjust sysctl_max_threads definition to match prototype
ipc: adjust proc_ipc_sem_dointvec definition to match prototype
mm: track page table modifications in __apply_to_page_range()
MAINTAINERS: IA64: mark Status as Odd Fixes only
MAINTAINERS: add LLVM maintainers
MAINTAINERS: update Cavium/Marvell entries
mm: slub: fix conversion of freelist_corrupted()
mm: memcg: fix memcg reclaim soft lockup
memcg: fix use-after-free in uncharge_batch

Linus Torvalds
2020-09-06 04:28:40 +0800
b0daa2c73 fork: adjust sysctl_max_threads definition to match prototype

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
definition of sysctl_max_threads to match its prototype in
linux/sysctl.h which fixes the following sparse error/warning:

kernel/fork.c:3050:47: warning: incorrect type in argument 3 (different address spaces)
kernel/fork.c:3050:47: expected void *
kernel/fork.c:3050:47: got void [noderef] __user *buffer
kernel/fork.c:3036:5: error: symbol 'sysctl_max_threads' redeclared with different type (incompatible argument 3 (different address spaces)):
kernel/fork.c:3036:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
kernel/fork.c: note: in included file (through include/linux/key.h, include/linux/cred.h, include/linux/sched/signal.h, include/linux/sched/cputime.h):
include/linux/sysctl.h:242:5: note: previously declared as:
include/linux/sysctl.h:242:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser
Signed-off-by: Andrew Morton
Cc: Christoph Hellwig
Cc: Al Viro
Link: https://lkml.kernel.org/r/20200825093647.24263-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds

Tobias Klauser
2020-09-06 03:14:29 +0800

05 Sep, 2020

1 commit

cfc905f15 gcov: Disable gcov build with GCC 10

GCOV built with GCC 10 doesn't initialize n_function variable. This
produces different kernel panics as was seen by Colin in Ubuntu and me
in FC 32.

As a workaround, let's disable GCOV build for broken GCC 10 version.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1891288
Link: https://lore.kernel.org/lkml/20200827133932.3338519-1-leon@kernel.org
Link: https://lore.kernel.org/lkml/CAHk-=whbijeSdSvx-Xcr0DPMj0BiwhJ+uiNnDSVZcr_h_kg7UA@mail.gmail.com/
Cc: Colin Ian King
Signed-off-by: Leon Romanovsky
Signed-off-by: Linus Torvalds

Leon Romanovsky
2020-09-05 00:19:49 +0800

04 Sep, 2020

4 commits

4facb95b7 x86/entry: Unbreak 32bit fast syscall

Andy reported that the syscall tracing for 32bit fast syscall fails:

# ./tools/testing/selftests/x86/ptrace_syscall_32
...
[RUN] SYSEMU
[FAIL] Initial args are wrong (nr=224, args=10 11 12 13 14 4289172732)
...
[RUN] SYSCALL
[FAIL] Initial args are wrong (nr=29, args=0 0 0 0 0 4289172732)

The reason is that the conversion to generic entry code moved the retrieval
of the sixth argument (EBP) after the point where the syscall entry work
runs, i.e. ptrace, seccomp, audit...

Unbreak it by providing a split up version of syscall_enter_from_user_mode().

- syscall_enter_from_user_mode_prepare() establishes state and enables
interrupts

- syscall_enter_from_user_mode_work() runs the entry work

Replace the call to syscall_enter_from_user_mode() in the 32bit fast
syscall C-entry with the split functions and stick the EBP retrieval
between them.
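
The ordering described above can be modelled in userspace C. This is a sketch under assumptions: the struct, the *_sketch function names and the traced_arg6 variable are hypothetical, and the real entry code does far more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register snapshot: syscall number plus six arguments. */
struct regs_sketch {
	long nr;
	long args[6];
};

static long traced_arg6; /* what the entry work (ptrace etc.) observed */

/* First half: establish state and enable interrupts (a no-op here). */
static void enter_prepare_sketch(struct regs_sketch *regs)
{
	assert(regs != NULL);
}

/* Second half: entry work such as ptrace, seccomp, audit reads the args. */
static long enter_work_sketch(struct regs_sketch *regs)
{
	traced_arg6 = regs->args[5];
	return regs->nr;
}

/*
 * user_stack_word models the sixth argument, which the 32bit fast syscall
 * path has to fetch separately before anything inspects the arguments.
 */
static long fast_syscall_32_sketch(struct regs_sketch *regs,
				   long user_stack_word)
{
	enter_prepare_sketch(regs);
	regs->args[5] = user_stack_word; /* retrieval sits between the halves */
	return enter_work_sketch(regs);
}
```

With a single combined entry function the store into args[5] could only happen after the entry work had already run, which matches the wrong last argument in the selftest output quoted above.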

Fixes: 27d6b4d14f5c ("x86/entry: Use generic syscall entry function")
Reported-by: Andy Lutomirski
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/87k0xdjbtt.fsf@nanos.tec.linutronix.de

Thomas Gleixner
2020-09-04 21:50:14 +0800
1b0df11fd padata: fix possible padata_works_lock deadlock

syzbot reports,

WARNING: inconsistent lock state
5.9.0-rc2-syzkaller #0 Not tainted
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
syz-executor.0/26715 takes:
(padata_works_lock){+.?.}-{2:2}, at: padata_do_parallel kernel/padata.c:220
{IN-SOFTIRQ-W} state was registered at:
spin_lock include/linux/spinlock.h:354 [inline]
padata_do_parallel kernel/padata.c:220
...
__do_softirq kernel/softirq.c:298
...
sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1091
asm_sysvec_apic_timer_interrupt arch/x86/include/asm/idtentry.h:581

Possible unsafe locking scenario:

CPU0
----
lock(padata_works_lock);

lock(padata_works_lock);

padata_do_parallel() takes padata_works_lock with softirqs enabled, so a
deadlock is possible if, on the same CPU, the lock is acquired in
process context and then softirq handling done in an interrupt leads to
the same path.

Fix by leaving softirqs disabled while do_parallel holds
padata_works_lock.
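
The scenario can be replayed with a toy single-CPU model in userspace C. Everything here is hypothetical (the flags, the counters and all *_sketch names); it only mimics the ordering of events and is not kernel code. In the kernel, the usual way to hold a spinlock with softirqs disabled is spin_lock_bh().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; every name here is hypothetical. */
static bool lock_held;        /* models padata_works_lock */
static int  bh_disabled;      /* softirq-disable nesting depth */
static bool softirq_pending;  /* a softirq was raised but has not run yet */
static bool use_bh;           /* true: take the lock with softirqs disabled */
static int  deadlocks, jobs;

static void do_parallel_sketch(bool irq_arrives);

static void run_softirq(void)
{
	softirq_pending = false;
	do_parallel_sketch(false);   /* softirq work re-enters the same path */
}

static void interrupt_sketch(void)
{
	softirq_pending = true;
	if (bh_disabled == 0)        /* softirqs enabled: handler runs at once */
		run_softirq();
}

static void bh_enable_sketch(void)
{
	assert(bh_disabled > 0);
	if (--bh_disabled == 0 && softirq_pending)
		run_softirq();       /* deferred softirq runs after the unlock */
}

static void do_parallel_sketch(bool irq_arrives)
{
	if (use_bh)
		bh_disabled++;
	if (lock_held) {             /* a real spinlock would spin forever */
		deadlocks++;
		if (use_bh)
			bh_enable_sketch();
		return;
	}
	lock_held = true;
	if (irq_arrives)
		interrupt_sketch();  /* interrupt lands while the lock is held */
	jobs++;
	lock_held = false;
	if (use_bh)
		bh_enable_sketch();
}

/* Run one scenario from a clean state; returns the deadlocks it hit. */
static int scenario_sketch(bool with_bh)
{
	lock_held = softirq_pending = false;
	bh_disabled = deadlocks = jobs = 0;
	use_bh = with_bh;
	do_parallel_sketch(true);
	return deadlocks;
}
```

Without the softirq-disable step the nested acquisition is detected once. With it, the softirq is deferred until after the unlock and both jobs complete.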

Reported-by: syzbot+f4b9f49e38e25eb4ef52@syzkaller.appspotmail.com
Fixes: 4611ce2246889 ("padata: allocate work structures for parallel jobs from a pool")
Signed-off-by: Daniel Jordan
Cc: Herbert Xu
Cc: Steffen Klassert
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu

Daniel Jordan
2020-09-04 15:51:55 +0800
3e8d3bdc2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
Kivilinna.

2) Fix loss of RTT samples in rxrpc, from David Howells.

3) Memory leak in hns_nic_dev_probe(), from Dinghao Liu.

4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.

5) We disable BH for too long in sctp_get_port_local(), add a
cond_resched() here as well, from Xin Long.

6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.

7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
Yonghong Song.

8) Missing of_node_put() in mt7530 DSA driver, from Sumera
Priyadarsini.

9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.

10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.

11) Memory leak in rxkad_verify_response, from Dinghao Liu.

12) In tipc, don't use smp_processor_id() in preemptible context. From
Tuong Lien.

13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.

14) Missing clk_disable_unprepare() in gemini driver, from Dan Carpenter.

15) Fix ABI mismatch between driver and firmware in nfp, from Louis
Peens.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
net/smc: fix sock refcounting in case of termination
net/smc: reset sndbuf_desc if freed
net/smc: set rx_off for SMCR explicitly
net/smc: fix toleration of fake add_link messages
tg3: Fix soft lockup when tg3_reset_task() fails.
doc: net: dsa: Fix typo in config code sample
net: dp83867: Fix WoL SecureOn password
nfp: flower: fix ABI mismatch between driver and firmware
tipc: fix shutdown() of connectionless socket
ipv6: Fix sysctl max for fib_multipath_hash_policy
drivers/net/wan/hdlc: Change the default of hard_header_len to 0
net: gemini: Fix another missing clk_disable_unprepare() in probe
net: bcmgenet: fix mask check in bcmgenet_validate_flow()
amd-xgbe: Add support for new port mode
net: usb: dm9601: Add USB ID of Keenetic Plus DSL
vhost: fix typo in error message
net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
pktgen: fix error message with wrong function name
net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
cxgb4: fix thermal zone device registration
...

Linus Torvalds
2020-09-04 09:50:48 +0800
dc0988bbe bpf: Do not use bucket_lock for hashmap iterator

Currently, for hashmap, the bpf iterator will grab a bucket lock, a
spinlock, before traversing the elements in the bucket. This can ensure
all bpf visited elements are valid. But this mechanism may cause
deadlock if update/deletion happens to the same bucket of the
visited map in the program. For example, if we added bpf_map_update_elem()
call to the same visited element in selftests bpf_iter_bpf_hash_map.c,
we will have the following deadlock:

============================================
WARNING: possible recursive locking detected
5.9.0-rc1+ #841 Not tainted
--------------------------------------------
test_progs/1750 is trying to acquire lock:
ffff9a5bb73c5e70 (&htab->buckets[i].raw_lock){....}-{2:2}, at: htab_map_update_elem+0x1cf/0x410

but task is already holding lock:
ffff9a5bb73c5e20 (&htab->buckets[i].raw_lock){....}-{2:2}, at: bpf_hash_map_seq_find_next+0x94/0x120

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&htab->buckets[i].raw_lock);
lock(&htab->buckets[i].raw_lock);

*** DEADLOCK ***
...
Call Trace:
dump_stack+0x78/0xa0
__lock_acquire.cold.74+0x209/0x2e3
lock_acquire+0xba/0x380
? htab_map_update_elem+0x1cf/0x410
? __lock_acquire+0x639/0x20c0
_raw_spin_lock_irqsave+0x3b/0x80
? htab_map_update_elem+0x1cf/0x410
htab_map_update_elem+0x1cf/0x410
? lock_acquire+0xba/0x380
bpf_prog_ad6dab10433b135d_dump_bpf_hash_map+0x88/0xa9c
? find_held_lock+0x34/0xa0
bpf_iter_run_prog+0x81/0x16e
__bpf_hash_map_seq_show+0x145/0x180
bpf_seq_read+0xff/0x3d0
vfs_read+0xad/0x1c0
ksys_read+0x5f/0xe0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
...

The bucket_lock is first grabbed in seq_ops->next() called by bpf_seq_read(),
and then grabbed again in htab_map_update_elem() in the bpf program, causing
deadlocks.

Actually, we do not need bucket_lock here; we can just use rcu_read_lock(),
similar to the netlink iterator, where the rcu_read_{lock,unlock} usage looks like below:
seq_ops->start():
rcu_read_lock();
seq_ops->next():
rcu_read_unlock();
/* next element */
rcu_read_lock();
seq_ops->stop();
rcu_read_unlock();

Compared to the old bucket_lock mechanism, if concurrent update/delete happens,
we may visit stale elements, miss some elements, or repeat some elements.
I think this is a reasonable compromise. For users wanting to avoid
stale, missing/repeated accesses, bpf_map batch access syscall interface
can be used.

Signed-off-by: Yonghong Song
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200902235340.2001375-1-yhs@fb.com

Yonghong Song
2020-09-04 08:36:41 +0800

03 Sep, 2020

1 commit

23870f122 locking/lockdep: Fix "USED" <- "IN-NMI" inversions

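During the LPC RCU BoF Paul asked how come the "USED" <- "IN-NMI"
detector doesn't trip over rcu_read_lock()'s lockdep annotation.

Looking into this I found a typo in the usage_mask test:

- if (!(class->usage_mask & LOCK_USED))
+ if (!(class->usage_mask & LOCKF_USED))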

fixing that will indeed cause rcu_read_lock() to insta-splat :/

The above typo means that instead of testing for: 0x100 (1 <<
LOCK_USED), we test for 8 (LOCK_USED), which corresponds to (1 <<
LOCK_ENABLED_HARDIRQ).

So instead of testing for _any_ used lock, it will only match any lock
used with interrupts enabled.

The rcu_read_lock() annotation uses .check=0, which means it will not
set any of the interrupt bits and will thus never match.
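
The bit-index versus bit-mask mix-up is easy to reproduce in a few lines of C. In this sketch the two numeric values (3 and 8) are the ones quoted in the message above; the _BIT and _MASK suffixed names and the helper functions are hypothetical and chosen only to make the distinction visible.

```c
#include <assert.h>

/* Bit positions, using the values quoted in the commit message. */
enum {
	LOCK_ENABLED_HARDIRQ_BIT = 3,
	LOCK_USED_BIT            = 8,
};

/* Masks derived from the bit positions. */
#define LOCKF_ENABLED_HARDIRQ_MASK (1UL << LOCK_ENABLED_HARDIRQ_BIT) /* 0x008 */
#define LOCKF_USED_MASK            (1UL << LOCK_USED_BIT)            /* 0x100 */

/* Buggy test: ANDs with the bit index (8) instead of the mask (0x100). */
static int is_unused_buggy(unsigned long usage_mask)
{
	return !(usage_mask & LOCK_USED_BIT);
}

/* Fixed test: ANDs with the mask. */
static int is_unused_fixed(unsigned long usage_mask)
{
	return !(usage_mask & LOCKF_USED_MASK);
}

/* The index 8 happens to equal the mask for bit 3, hence the aliasing. */
static void check_aliasing_sketch(void)
{
	assert((unsigned long)LOCK_USED_BIT == LOCKF_ENABLED_HARDIRQ_MASK);
}
```

A usage mask of 0x100 (used, no interrupt bits set, as with a .check=0 annotation) is reported as unused by the buggy test and as used by the fixed one. A mask of 0x108 is reported as used by both.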

In order to properly fix the situation and allow rcu_read_lock() to
correctly work, split LOCK_USED into LOCK_USED and LOCK_USED_READ and by
having .read users set USED_READ and test USED, pure read-recursive
locks are permitted.

Fixes: f6f48e180404 ("lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions")
Signed-off-by: Ingo Molnar
Tested-by: Masami Hiramatsu
Acked-by: Paul E. McKenney
Link: https://lore.kernel.org/r/20200902160323.GK1362448@hirez.programming.kicks-ass.net

peterz@infradead.org
2020-09-03 17:19:42 +0800

31 Aug, 2020

2 commits

dcc5c6f01 Merge tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"Three interrupt related fixes for X86:

- Move disabling of the local APIC after invoking fixup_irqs() to
ensure that interrupts which are incoming are noted in the IRR and
not ignored.

- Unbreak affinity setting.

The rework of the entry code reused the regular exception entry
code for device interrupts. The vector number is pushed into the
errorcode slot on the stack which is then lifted into an argument
and set to -1 because that's regs->orig_ax which is used in quite
some places to check whether the entry came from a syscall.

But it was overlooked that orig_ax is used in the affinity cleanup
code to validate whether the interrupt has arrived on the new
target. It turned out that this vector check is pointless because
interrupts are never moved from one vector to another on the same
CPU. That check is a historical leftover from the time where x86
supported multi-CPU affinities, but no longer needed with the now
strict single CPU affinity. Famous last words ...

- Add a missing check for an empty cpumask into the matrix allocator.

The affinity change added a warning to catch the case where an
interrupt is moved on the same CPU to a different vector. This
triggers because a condition with an empty cpumask returns an
assignment from the allocator as the allocator uses for_each_cpu()
without checking the cpumask for being empty. The historical
inconsistent for_each_cpu() behaviour of ignoring the cpumask and
unconditionally claiming that CPU0 is in the mask struck again.
Sigh.

plus a new entry into the MAINTAINER file for the HPE/UV platform"
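
The empty-cpumask item above can be illustrated with a small userspace C sketch. The iterator macro below is a hypothetical model of a uniprocessor-style for_each_cpu() that never looks at the mask, and the two pick_cpu_* helpers are invented for the example; none of this is the allocator's actual code.

```c
#include <assert.h>

/* UP-style iterator: runs once with cpu == 0 and ignores the mask. */
#define for_each_cpu_up_sketch(cpu, mask) \
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask))

/* Picks the last CPU the iterator yields; trusts the iterator blindly. */
static int pick_cpu_unchecked(unsigned long mask)
{
	int cpu, best = -1;

	for_each_cpu_up_sketch(cpu, mask)
		best = cpu;
	return best;
}

/* Same search, but an empty mask is rejected before iterating. */
static int pick_cpu_checked(unsigned long mask)
{
	int cpu;

	if (mask == 0)
		return -1;
	cpu = pick_cpu_unchecked(mask);
	assert(cpu == 0); /* only CPU 0 exists in this model */
	return cpu;
}
```

With an empty mask the unchecked variant still returns CPU 0, which is the kind of bogus assignment the pull message describes; the explicit emptiness check returns -1 instead.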

* tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/matrix: Deal with the sillyness of for_each_cpu() on UP
x86/irq: Unbreak interrupt affinity setting
x86/hotplug: Silence APIC only after all interrupts are migrated
MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers

Linus Torvalds
2020-08-31 03:01:23 +0800
b69bea8a6 Merge tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
"A set of fixes for lockdep, tracing and RCU:

- Prevent recursion by using raw_cpu_* operations

- Fixup the interrupt state in the cpu idle code to be consistent

- Push rcu_idle_enter/exit() invocations deeper into the idle path so
that the lock operations are inside the RCU watching sections

- Move trace_cpu_idle() into generic code so it's called before RCU
goes idle.

- Handle raw_local_irq* vs. local_irq* operations correctly

- Move the tracepoints out from under the lockdep recursion handling
which turned out to be fragile and inconsistent"

* tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep,trace: Expose tracepoints
lockdep: Only trace IRQ edges
mips: Implement arch_irqs_disabled()
arm64: Implement arch_irqs_disabled()
nds32: Implement arch_irqs_disabled()
locking/lockdep: Cleanup
x86/entry: Remove unused THUNKs
cpuidle: Move trace_cpu_idle() into generic code
cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic
sched,idle,rcu: Push rcu_idle deeper into the idle path
cpuidle: Fixup IRQ state
lockdep: Use raw_cpu_*() for per-cpu variables

Linus Torvalds
2020-08-31 02:43:50 +0800