23 Sep, 2020

2 commits

  • Pull networking fixes from Jakub Kicinski:

    - fix failure to add bond interfaces to a bridge, the offload-handling
    code was too defensive there and recent refactoring unearthed that.
    Users complained (Ido)

    - fix unnecessarily reflecting ECN bits within TOS values / QoS marking
    in TCP ACK and reset packets (Wei)

    - fix a deadlock with bpf iterator. Hopefully we're in the clear on
    this front now... (Yonghong)

    - BPF fix for clobbering r2 in bpf_gen_ld_abs (Daniel)

    - fix AQL on mt76 devices with FW rate control and address a couple of AQL
    issues in mac80211 code (Felix)

    - fix authentication issue with mwifiex (Maximilian)

    - WiFi connectivity fix: revert IGTK support in ti/wlcore (Mauro)

    - fix exception handling for multipath routes via same device (David
    Ahern)

    - revert back to a BH spin lock flavor for nsid_lock: there are paths
    which do require the BH context protection (Taehee)

    - fix interrupt / queue / NAPI handling in the lantiq driver (Hauke)

    - fix ife module load deadlock (Cong)

    - make an adjustment to netlink reply message type for code added in
    this release (the sole change touching uAPI here) (Michal)

    - a number of fixes for small NXP and Microchip switches (Vladimir)

    [ Pull request acked by David: "you can expect more of this in the
    future as I try to delegate more things to Jakub" ]

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (167 commits)
    net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
    net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
    net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
    inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute
    net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU
    net: Update MAINTAINERS for MediaTek switch driver
    net/mlx5e: mlx5e_fec_in_caps() returns a boolean
    net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock
    net/mlx5e: kTLS, Fix leak on resync error flow
    net/mlx5e: kTLS, Add missing dma_unmap in RX resync
    net/mlx5e: kTLS, Fix napi sync and possible use-after-free
    net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported
    net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats()
    net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
    net/mlx5e: Fix endianness when calculating pedit mask first bit
    net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported
    net/mlx5e: CT: Fix freeing ct_label mapping
    net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready
    net/mlx5e: Use synchronize_rcu to sync with NAPI
    net/mlx5e: Use RCU to protect rq->xdp_prog
    ...

    Linus Torvalds
     
  • Pull tracing fixes from Steven Rostedt:

    - Check kprobe is enabled before unregistering from ftrace as it isn't
    registered when disabled.

    - Remove kprobes enabled via the command line that are on init text
    when that text is freed.

    - Add missing RCU synchronization for ftrace trampoline symbols removed
    from kallsyms.

    - Free trampoline on error path if ftrace_startup() fails.

    - Give more space for the longer PID numbers in trace output.

    - Fix a possible double free in the histogram code.

    - A couple of fixes that were discovered by sparse.

    * tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    bootconfig: init: make xbc_namebuf static
    kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot
    tracing: fix double free
    ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer
    tracing: Make the space reserved for the pid wider
    ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms
    ftrace: Free the trampoline when ftrace_startup() fails
    kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()

    Linus Torvalds
     

22 Sep, 2020

1 commit

  • Pull RCU fix from Paul McKenney:
    "This contains a single commit that fixes a bug that was introduced in
    the last merge window. This bug causes a compiler warning complaining
    about show_rcu_tasks_classic_gp_kthread() being an unused static
    function in !SMP kernels.

    The fix is straightforward, just adding an 'inline' to make this a
    static inline function, thus avoiding the warning.

    This bug was reported by Laurent Pinchart, who would like it fixed
    sooner rather than later"

    * 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
    rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread()

    Linus Torvalds
     

21 Sep, 2020

2 commits


20 Sep, 2020

4 commits

  • Merge fixes from Andrew Morton:
    "15 patches.

    Subsystems affected by this patch series: mailmap, mm/hotfixes,
    mm/thp, mm/memory-hotplug, misc, kcsan"

    * emailed patches from Andrew Morton:
    kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments'
    fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype
    stackleak: let stack_erasing_sysctl take a kernel pointer buffer
    ftrace: let ftrace_enable_sysctl take a kernel pointer buffer
    mm/memory_hotplug: drain per-cpu pages again during memory offline
    selftests/vm: fix display of page size in map_hugetlb
    mm/thp: fix __split_huge_pmd_locked() for migration PMD
    kprobes: fix kill kprobe which has been marked as gone
    tmpfs: restore functionality of nr_inodes=0
    mlock: fix unevictable_pgs event counts on THP
    mm: fix check_move_unevictable_pages() on THP
    mm: migration of hugetlbfs page skip memcg
    ksm: reinstate memcg charge on copied pages
    mailmap: add older email addresses for Kees Cook

    Linus Torvalds
     
  • Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    changed ctl_table.proc_handler to take a kernel pointer. Adjust the
    signature of stack_erasing_sysctl to match ctl_table.proc_handler which
    fixes the following sparse warning:

    kernel/stackleak.c:31:50: warning: incorrect type in argument 3 (different address spaces)
    kernel/stackleak.c:31:50: expected void *
    kernel/stackleak.c:31:50: got void [noderef] __user *buffer

    Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    Signed-off-by: Tobias Klauser
    Signed-off-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Al Viro
    Link: https://lkml.kernel.org/r/20200907093253.13656-1-tklauser@distanz.ch
    Signed-off-by: Linus Torvalds

    Tobias Klauser
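
The signature adjustment described above can be illustrated with a minimal userspace sketch. Everything named demo_* is hypothetical and only stands in for ctl_table.proc_handler and a handler such as stack_erasing_sysctl; the real kernel types take more arguments, and sparse's __user address-space annotation cannot be reproduced in plain C. The point is only that the definition carries exactly the parameter types the table's function pointer expects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the handler type after commit 32927393dc1c:
 * the buffer argument is a plain (kernel) pointer, not a __user one. */
typedef int (*demo_proc_handler)(int write, void *buffer, size_t *lenp);

struct demo_ctl_table {
    const char *procname;
    demo_proc_handler proc_handler;
};

int demo_value = 1;

/* The definition uses exactly the parameter types of demo_proc_handler,
 * so it can be stored in the table without a cast. */
int demo_sysctl(int write, void *buffer, size_t *lenp)
{
    if (*lenp < sizeof(demo_value))
        return -1;
    if (write)
        memcpy(&demo_value, buffer, sizeof(demo_value));
    else
        memcpy(buffer, &demo_value, sizeof(demo_value));
    *lenp = sizeof(demo_value);
    return 0;
}

const struct demo_ctl_table demo_table = { "demo", demo_sysctl };
```

In the kernel, a definition that still declared the buffer as a __user pointer would be flagged by sparse as a different address space, which is the warning quoted in the commit message.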
     
  • Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    changed ctl_table.proc_handler to take a kernel pointer. Adjust the
    signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
    fixes the following sparse warning:

    kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
    kernel/trace/ftrace.c:7544:43: expected void *
    kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer

    Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    Signed-off-by: Tobias Klauser
    Signed-off-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Al Viro
    Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch
    Signed-off-by: Linus Torvalds

    Tobias Klauser
     
  • If a kprobe is marked as gone, we should not kill it again. Otherwise, we
    can disarm the kprobe more than once. In that case, the kprobe_ftrace_enabled
    count can become unbalanced, which can leave that kprobe not working.

    Fixes: e8386a0cb22f ("kprobes: support probing module __exit function")
    Co-developed-by: Chengming Zhou
    Signed-off-by: Muchun Song
    Signed-off-by: Chengming Zhou
    Signed-off-by: Andrew Morton
    Acked-by: Masami Hiramatsu
    Cc: "Naveen N . Rao"
    Cc: Anil S Keshavamurthy
    Cc: David S. Miller
    Cc: Song Liu
    Cc: Steven Rostedt
    Cc:
    Link: https://lkml.kernel.org/r/20200822030055.32383-1-songmuchun@bytedance.com
    Signed-off-by: Linus Torvalds

    Muchun Song
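
A minimal userspace model of the counter imbalance, assuming a single global count in place of kprobe_ftrace_enabled. The demo_* names and the flag value are hypothetical and are not the kernel's; the sketch only shows why a second kill must be a no-op.

```c
#include <assert.h>

#define DEMO_FLAG_GONE 0x1u   /* hypothetical flag, models "marked as gone" */

struct demo_probe {
    unsigned int flags;
};

/* Models the count of ftrace-based probes that are currently armed. */
int demo_ftrace_enabled;

void demo_arm(struct demo_probe *p)
{
    p->flags = 0;
    demo_ftrace_enabled++;
}

void demo_kill(struct demo_probe *p)
{
    /* Guard: a probe that is already gone was disarmed before, so
     * disarming it again would decrement the count a second time. */
    if (p->flags & DEMO_FLAG_GONE)
        return;
    p->flags |= DEMO_FLAG_GONE;
    demo_ftrace_enabled--;
}
```

Without the guard, killing the same probe twice would take the count below the number of probes that are still armed.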
     

19 Sep, 2020

7 commits

  • Pull s390 fixes from Vasily Gorbik:

    - Fix order in trace_hardirqs_off_caller() to make locking state
    consistent even if the IRQ tracer calls into lockdep again. Touches
    common code. Acked-by Peter Zijlstra.

    - Correctly handle secure storage violation exception to avoid kernel
    panic triggered by user space misbehaviour.

    - Switch the idle->seqcount over to using raw_write_*() to avoid
    "suspicious RCU usage".

    - Fix memory leaks on hard unplug in pci code.

    - Use kvmalloc instead of kmalloc for larger allocations in zcrypt.

    - Add a few missing __init annotations to static functions to avoid
    section mismatch complaints when functions are not inlined.

    * tag 's390-5.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390: add 3f program exception handler
    lockdep: fix order in trace_hardirqs_off_caller()
    s390/pci: fix leak of DMA tables on hard unplug
    s390/init: add missing __init annotations
    s390/zcrypt: fix kmalloc 256k failure
    s390/idle: fix suspicious RCU usage

    Linus Torvalds
     
  • Since the kprobe_event= cmdline option allows the user to put kprobes on
    functions in initmem, kprobes has to mark such probes as gone after boot.
    Currently the probes on the init functions in modules are handled
    by the module callback, but the kernel init text isn't handled.
    Without this, kprobes may access a nonexistent text area when disabling or
    removing such a probe.

    Link: https://lkml.kernel.org/r/159972810544.428528.1839307531600646955.stgit@devnote2

    Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter")
    Cc: Jonathan Corbet
    Cc: Shuah Khan
    Cc: Randy Dunlap
    Cc: Ingo Molnar
    Cc: stable@vger.kernel.org
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • clang static analyzer reports this problem

    trace_events_hist.c:3824:3: warning: Attempt to free
    released memory
    kfree(hist_data->attrs->var_defs.name[i]);

    In parse_var_defs() if there is a problem allocating
    var_defs.expr, the earlier var_defs.name is freed.
    This free is duplicated by free_var_defs() which frees
    the rest of the list.

    Because free_var_defs() has to run anyway, remove the
    second free from parse_var_defs().

    Link: https://lkml.kernel.org/r/20200907135845.15804-1-trix@redhat.com

    Cc: stable@vger.kernel.org
    Fixes: 30350d65ac56 ("tracing: Add variable support to hist triggers")
    Reviewed-by: Tom Zanussi
    Signed-off-by: Tom Rix
    Signed-off-by: Steven Rostedt (VMware)

    Tom Rix
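
The ownership rule behind this fix can be sketched in userspace C. This is not the kernel code: the demo_* names, the fixed array size, and the fault-injection switch are assumptions made for illustration. Once a name is stored in the list, only the common cleanup function frees it, so the error path has nothing to free locally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_VARS 4

struct demo_var_defs {
    unsigned int n_vars;
    char *name[DEMO_MAX_VARS];
    char *expr[DEMO_MAX_VARS];
};

int demo_live_allocs;      /* allocations not yet freed */
int demo_fail_expr_alloc;  /* hypothetical switch to simulate an allocation failure */

static char *demo_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy) {
        memcpy(copy, s, len);
        demo_live_allocs++;
    }
    return copy;
}

static void demo_free(char *p)
{
    if (p) {
        free(p);
        demo_live_allocs--;
    }
}

/* Single cleanup path: frees every entry recorded in the list. */
void demo_free_var_defs(struct demo_var_defs *defs)
{
    for (unsigned int i = 0; i < defs->n_vars; i++) {
        demo_free(defs->name[i]);
        demo_free(defs->expr[i]);
        defs->name[i] = NULL;
        defs->expr[i] = NULL;
    }
    defs->n_vars = 0;
}

int demo_add_var(struct demo_var_defs *defs, const char *name, const char *expr)
{
    unsigned int i = defs->n_vars;
    char *s;

    if (i >= DEMO_MAX_VARS)
        return -1;
    s = demo_strdup(name);
    if (!s)
        return -1;
    defs->name[i] = s;
    defs->expr[i] = NULL;
    defs->n_vars = i + 1;   /* the list owns the name from here on */

    s = demo_fail_expr_alloc ? NULL : demo_strdup(expr);
    if (!s)
        return -1;          /* no local free: demo_free_var_defs() releases the name */
    defs->expr[i] = s;
    return 0;
}
```

The caller is expected to run demo_free_var_defs() after any failure, mirroring the statement above that free_var_defs() has to run anyway.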
     
  • Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    changed ctl_table.proc_handler to take a kernel pointer. Adjust the
    signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
    fixes the following sparse warning:

    kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
    kernel/trace/ftrace.c:7544:43: expected void *
    kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer

    Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch

    Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Tobias Klauser
    Signed-off-by: Steven Rostedt (VMware)

    Tobias Klauser
     
  • For 64bit CONFIG_BASE_SMALL=0 systems PID_MAX_LIMIT is set by default to
    4194304. During boot the kernel sets a new value based on number of CPUs
    but no lower than 32768. It is 1024 per CPU so with 128 CPUs the default
    becomes 131072 which needs six digits.
    This value can be increased during run time but must not exceed the
    initial upper limit.

    Systemd sometime after v241 sets it to the upper limit during boot. The
    result is that when the pid exceeds five digits, the trace output is a
    little hard to read because it is no longer properly padded (the same as
    on big iron with 98+ CPUs).

    Increase the pid padding to seven digits.

    Link: https://lkml.kernel.org/r/20200904082331.dcdkrr3bkn3e4qlg@linutronix.de

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Steven Rostedt (VMware)

    Sebastian Andrzej Siewior
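
The arithmetic in this message can be checked with a short sketch. The constants come from the text above (1024 PIDs per CPU, a floor of 32768, an upper limit of 4194304); the demo_* function names and the clamping helper are hypothetical, and the "%7d" format only illustrates seven-column padding, not the exact trace output format.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_PID_MAX_DEFAULT 32768
#define DEMO_PID_MAX_LIMIT   4194304
#define DEMO_PIDS_PER_CPU    1024

/* Boot-time default as described above: 1024 per CPU, never below
 * 32768 and never above the upper limit. */
int demo_boot_pid_max(int cpus)
{
    int v = DEMO_PIDS_PER_CPU * cpus;

    if (v < DEMO_PID_MAX_DEFAULT)
        v = DEMO_PID_MAX_DEFAULT;
    if (v > DEMO_PID_MAX_LIMIT)
        v = DEMO_PID_MAX_LIMIT;
    return v;
}

/* Number of decimal digits needed to print a non-negative value. */
int demo_digits(int n)
{
    int d = 1;

    while (n >= 10) {
        n /= 10;
        d++;
    }
    return d;
}

/* Right-align the pid in a seven-character column. */
int demo_format_pid(char *buf, size_t size, int pid)
{
    return snprintf(buf, size, "%7d", pid);
}
```

With these definitions, 97 CPUs still give a five-digit default while 98 CPUs give a six-digit one, and the upper limit itself needs seven digits, which is why a seven-column field covers every possible pid.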
     
  • Add synchronize_rcu() after list_del_rcu() in
    ftrace_remove_trampoline_from_kallsyms() to protect readers of
    ftrace_ops_trampoline_list (in ftrace_get_trampoline_kallsym)
    which is used when kallsyms is read.

    Link: https://lkml.kernel.org/r/20200901091617.31837-1-adrian.hunter@intel.com

    Fixes: fc0ea795f53c8d ("ftrace: Add symbols for ftrace trampolines")
    Signed-off-by: Adrian Hunter
    Signed-off-by: Steven Rostedt (VMware)

    Adrian Hunter
     
  • Commit fc0ea795f53c ("ftrace: Add symbols for ftrace trampolines")
    failed to remove ops from the new ftrace_ops_trampoline_list in
    ftrace_startup() if ftrace_hash_ipmodify_enable() fails there. This may
    lead to a BUG if such ops come from a module which may be removed.

    Moreover, the trampoline itself is not freed in this case.

    Fix it by calling ftrace_trampoline_free() during the rollback.

    Link: https://lkml.kernel.org/r/20200831122631.28057-1-mbenes@suse.cz

    Fixes: fc0ea795f53c ("ftrace: Add symbols for ftrace trampolines")
    Fixes: f8b8be8a310a ("ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict")
    Signed-off-by: Miroslav Benes
    Signed-off-by: Steven Rostedt (VMware)

    Miroslav Benes
     

18 Sep, 2020

2 commits

  • Commit 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at
    kprobe_ftrace_handler") fixed one bug, but the fix was not complete.
    Running kprobe_module.tc from ftracetest makes the kernel show the
    warning below.

    # ./ftracetest test.d/kprobe/kprobe_module.tc
    === Ftrace unit tests ===
    [1] Kprobe dynamic event - probing module
    ...
    [ 22.400215] ------------[ cut here ]------------
    [ 22.400962] Failed to disarm kprobe-ftrace at trace_printk_irq_work+0x0/0x7e [trace_printk] (-2)
    [ 22.402139] WARNING: CPU: 7 PID: 200 at kernel/kprobes.c:1091 __disarm_kprobe_ftrace.isra.0+0x7e/0xa0
    [ 22.403358] Modules linked in: trace_printk(-)
    [ 22.404028] CPU: 7 PID: 200 Comm: rmmod Not tainted 5.9.0-rc2+ #66
    [ 22.404870] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
    [ 22.406139] RIP: 0010:__disarm_kprobe_ftrace.isra.0+0x7e/0xa0
    [ 22.406947] Code: 30 8b 03 eb c9 80 3d e5 09 1f 01 00 75 dc 49 8b 34 24 89 c2 48 c7 c7 a0 c2 05 82 89 45 e4 c6 05 cc 09 1f 01 01 e8 a9 c7 f0 ff 0b 8b 45 e4 eb b9 89 c6 48 c7 c7 70 c2 05 82 89 45 e4 e8 91 c7
    [ 22.409544] RSP: 0018:ffffc90000237df0 EFLAGS: 00010286
    [ 22.410385] RAX: 0000000000000000 RBX: ffffffff83066024 RCX: 0000000000000000
    [ 22.411434] RDX: 0000000000000001 RSI: ffffffff810de8d3 RDI: ffffffff810de8d3
    [ 22.412687] RBP: ffffc90000237e10 R08: 0000000000000001 R09: 0000000000000001
    [ 22.413762] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88807c478640
    [ 22.414852] R13: ffffffff8235ebc0 R14: ffffffffa00060c0 R15: 0000000000000000
    [ 22.415941] FS: 00000000019d48c0(0000) GS:ffff88807d7c0000(0000) knlGS:0000000000000000
    [ 22.417264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 22.418176] CR2: 00000000005bb7e3 CR3: 0000000078f7a000 CR4: 00000000000006a0
    [ 22.419309] Call Trace:
    [ 22.419990] kill_kprobe+0x94/0x160
    [ 22.420652] kprobes_module_callback+0x64/0x230
    [ 22.421470] notifier_call_chain+0x4f/0x70
    [ 22.422184] blocking_notifier_call_chain+0x49/0x70
    [ 22.422979] __x64_sys_delete_module+0x1ac/0x240
    [ 22.423733] do_syscall_64+0x38/0x50
    [ 22.424366] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 22.425176] RIP: 0033:0x4bb81d
    [ 22.425741] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48
    [ 22.428726] RSP: 002b:00007ffc70fef008 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
    [ 22.430169] RAX: ffffffffffffffda RBX: 00000000019d48a0 RCX: 00000000004bb81d
    [ 22.431375] RDX: 0000000000000000 RSI: 0000000000000880 RDI: 00007ffc70fef028
    [ 22.432543] RBP: 0000000000000880 R08: 00000000ffffffff R09: 00007ffc70fef320
    [ 22.433692] R10: 0000000000656300 R11: 0000000000000246 R12: 00007ffc70fef028
    [ 22.434635] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
    [ 22.435682] irq event stamp: 1169
    [ 22.436240] hardirqs last enabled at (1179): [] console_unlock+0x422/0x580
    [ 22.437466] hardirqs last disabled at (1188): [] console_unlock+0x7b/0x580
    [ 22.438608] softirqs last enabled at (866): [] __do_softirq+0x38e/0x490
    [ 22.439637] softirqs last disabled at (859): [] asm_call_on_stack+0x12/0x20
    [ 22.440690] ---[ end trace 1e7ce7e1e4567276 ]---
    [ 22.472832] trace_kprobe: This probe might be able to register after target module is loaded. Continue.

    This is because kill_kprobe() calls disarm_kprobe_ftrace() even
    if the given probe is not enabled. In that case, ftrace_set_filter_ip()
    fails because the given probe point is not registered to ftrace.

    Fix this by checking that the given (going) probe is enabled before
    invoking disarm_kprobe_ftrace().

    Link: https://lkml.kernel.org/r/159888672694.1411785.5987998076694782591.stgit@devnote2

    Fixes: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler")
    Cc: Ingo Molnar
    Cc: "Naveen N . Rao"
    Cc: Anil S Keshavamurthy
    Cc: David Miller
    Cc: Muchun Song
    Cc: Chengming Zhou
    Cc: stable@vger.kernel.org
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
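
A small userspace model of the check described above. The struct, its fields, and the demo_* functions are hypothetical simplifications. The disarm step fails with -ENOENT when the probe was never registered, which on Linux is the (-2) seen in the warning, and the kill path skips it for probes that are not enabled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct demo_probe {
    bool enabled;   /* the user enabled the probe */
    bool armed;     /* the probe point is registered with the tracer */
    bool gone;      /* the probed code is going away */
};

/* Models the disarm step: removing a filter entry that was never
 * registered fails. */
int demo_disarm(struct demo_probe *p)
{
    if (!p->armed)
        return -ENOENT;
    p->armed = false;
    return 0;
}

/* Models the fixed kill path: only enabled probes are disarmed. */
int demo_kill(struct demo_probe *p)
{
    int ret = 0;

    p->gone = true;
    if (p->enabled)
        ret = demo_disarm(p);
    p->enabled = false;
    return ret;
}
```

Calling demo_disarm() directly on a disabled probe reproduces the failing case, while demo_kill() on the same probe returns 0 because the disarm step is skipped.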
     
  • Commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") made
    the page locking entirely fair, in that if a waiter came in while the
    lock was held, the lock would be transferred to the lockers strictly in
    order.

    That was intended to finally get rid of the long-reported watchdog
    failures that involved the page lock under extreme load, where a process
    could end up waiting essentially forever, as other page lockers stole
    the lock from under it.

    It also improved some benchmarks, but it ended up causing huge
    performance regressions on others, simply because fair lock behavior
    doesn't end up giving out the lock as aggressively, causing better
    worst-case latency, but potentially much worse average latencies and
    throughput.

    Instead of reverting that change entirely, this introduces a controlled
    amount of unfairness, with a sysctl knob to tune it if somebody needs
    to. But the default value should hopefully be good for any normal load,
    allowing a few rounds of lock stealing, but enforcing the strict
    ordering before the lock has been stolen too many times.

    There is also a hint from Matthieu Baerts that the fair page coloring
    may end up exposing an ABBA deadlock that is hidden by the usual
    optimistic lock stealing, and while the unfairness doesn't fix the
    fundamental issue (and I'm still looking at that), it avoids it in
    practice.

    The amount of unfairness can be modified by writing a new value to the
    'sysctl_page_lock_unfairness' variable (default value of 5, exposed
    through /proc/sys/vm/page_lock_unfairness), but that is hopefully
    something we'd use mainly for debugging rather than being necessary for
    any deep system tuning.

    This whole issue has exposed just how critical the page lock can be, and
    how contended it gets under certain loads. And the main contention
    doesn't really seem to be anything related to IO (which was the origin
    of this lock), but for things like just verifying that the page file
    mapping is stable while faulting in the page into a page table.

    Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@MichaelLarabel.com/
    Link: https://www.phoronix.com/scan.php?page=article&item=linux-50-59&num=1
    Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@tessares.net/
    Reported-and-tested-by: Michael Larabel
    Tested-by: Matthieu Baerts
    Cc: Dave Chinner
    Cc: Matthew Wilcox
    Cc: Chris Mason
    Cc: Jan Kara
    Cc: Amir Goldstein
    Signed-off-by: Linus Torvalds

    Linus Torvalds
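
The bounded-unfairness idea can be sketched as a per-waiter budget. This is a simplified model under stated assumptions, not the kernel's wait-queue code: the demo_* names are hypothetical, and the budget is simply initialized from a value such as the default of 5 mentioned above. Each time the waiter has to queue again because someone else took the lock, it spends one unit; once the budget is exhausted it asks for a strict handoff.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_waiter {
    int unfairness_left;   /* how many more times we tolerate lock stealing */
    bool wants_handoff;    /* set once strict ordering is demanded */
};

void demo_waiter_init(struct demo_waiter *w, int unfairness)
{
    w->unfairness_left = unfairness;
    w->wants_handoff = false;
}

/* Called every time the waiter queues (or re-queues after losing the
 * lock to a stealer). Returns true once the waiter insists that the
 * next unlock hands the lock to it directly. */
bool demo_waiter_queue(struct demo_waiter *w)
{
    if (--w->unfairness_left < 0)
        w->wants_handoff = true;
    return w->wants_handoff;
}
```

With a budget of 5 the first five queue attempts stay optimistic and the sixth demands the handoff; a budget of 0 makes the very first attempt strict, which corresponds to the fully fair behavior.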
     

17 Sep, 2020

1 commit

  • Commit 8344496e8b49 ("rcu-tasks: Conditionally compile
    show_rcu_tasks_gp_kthreads()") introduced conditional
    compilation of several functions, but forgot one occurrence of
    show_rcu_tasks_classic_gp_kthread() that causes the compiler to warn of
    an unused static function. This commit uses "static inline" to avoid
    these complaints and possibly also to avoid emitting an actual definition
    of this function.

    Fixes: 8344496e8b49 ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()")
    Cc: # 5.8.x
    Reported-by: Laurent Pinchart
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
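
A minimal sketch of the pattern, assuming GCC's -Wunused-function behavior, which warns about an unused non-inline static function but not about a static inline one. DEMO_FEATURE and the demo_* functions are hypothetical and merely stand in for a configuration in which the only caller is compiled out.

```c
#include <assert.h>

#define DEMO_FEATURE 0   /* hypothetical switch; 0 models the config without a caller */

/* Declared "static inline": with a plain "static", GCC's
 * -Wunused-function would complain when DEMO_FEATURE is 0. */
static inline int demo_show_state(void)
{
    return 42;
}

int demo_report(void)
{
#if DEMO_FEATURE
    return demo_show_state();
#else
    return 0;
#endif
}
```

The inline keyword also lets the compiler skip emitting a standalone definition when nothing calls the function, as the commit message notes.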
     

16 Sep, 2020

3 commits

  • The __this_cpu*() accessors are (in general) IRQ-unsafe which, given
    that percpu-rwsem is a blocking primitive, should be just fine.

    However, file_end_write() is used from IRQ context and will cause
    load-store issues on architectures where the per-cpu accessors are not
    natively irq-safe.

    Fix it by using the IRQ-safe this_cpu_*() for operations on
    read_count. This will generate more expensive code on a number of
    platforms, which might cause a performance regression for some of the
    other percpu-rwsem users.

    If any such is reported, we can consider alternative solutions.

    Fixes: 70fe2f48152e ("aio: fix freeze protection of aio writes")
    Signed-off-by: Hou Tao
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Acked-by: Oleg Nesterov
    Link: https://lkml.kernel.org/r/20200915140750.137881-1-houtao1@huawei.com

    Hou Tao
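
The load-store issue can be modeled deterministically in userspace by letting a simulated interrupt run between the load and the store of a counter update. All names here are hypothetical; the callback stands in for an interrupt handler that also modifies the counter, as file_end_write() would when called from IRQ context.

```c
#include <assert.h>

int demo_read_count;

typedef void (*demo_irq_fn)(void);

/* Simulated interrupt handler that decrements the same counter. */
void demo_irq_handler(void)
{
    demo_read_count--;
}

/* Models a non-IRQ-safe update: load, (interrupt), store.
 * The handler's decrement is overwritten by the stale value. */
void demo_split_inc(demo_irq_fn irq)
{
    int tmp = demo_read_count;

    if (irq)
        irq();
    demo_read_count = tmp + 1;
}

/* Models an IRQ-safe update: the read-modify-write completes as a
 * unit and the pending interrupt only runs afterwards. */
void demo_irqsafe_inc(demo_irq_fn irq)
{
    demo_read_count++;
    if (irq)
        irq();
}
```

Starting from zero, the split version ends at 1 because the decrement is lost, while the IRQ-safe version ends at 0 as expected.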
     
  • Alexei Starovoitov says:

    ====================
    pull-request: bpf 2020-09-15

    The following pull-request contains BPF updates for your *net* tree.

    We've added 12 non-merge commits during the last 19 day(s) which contain
    a total of 10 files changed, 47 insertions(+), 38 deletions(-).

    The main changes are:

    1) docs/bpf fixes, from Andrii.

    2) ld_abs fix, from Daniel.

    3) socket casting helpers fix, from Martin.

    4) hash iterator fixes, from Yonghong.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Running selftest
    ./test_btf -p
    the kernel had the following warning:
    [ 51.528185] WARNING: CPU: 3 PID: 1756 at kernel/bpf/hashtab.c:717 htab_map_get_next_key+0x2eb/0x300
    [ 51.529217] Modules linked in:
    [ 51.529583] CPU: 3 PID: 1756 Comm: test_btf Not tainted 5.9.0-rc1+ #878
    [ 51.530346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
    [ 51.531410] RIP: 0010:htab_map_get_next_key+0x2eb/0x300
    ...
    [ 51.542826] Call Trace:
    [ 51.543119] map_seq_next+0x53/0x80
    [ 51.543528] seq_read+0x263/0x400
    [ 51.543932] vfs_read+0xad/0x1c0
    [ 51.544311] ksys_read+0x5f/0xe0
    [ 51.544689] do_syscall_64+0x33/0x40
    [ 51.545116] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    The related source code in kernel/bpf/hashtab.c:
    709 static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
    710 {
    711 struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
    712 struct hlist_nulls_head *head;
    713 struct htab_elem *l, *next_l;
    714 u32 hash, key_size;
    715 int i = 0;
    716
    717 WARN_ON_ONCE(!rcu_read_lock_held());

    In kernel/bpf/inode.c, bpffs map pretty print calls map->ops->map_get_next_key()
    without holding rcu_read_lock(), hence causing the above warning.
    To fix the issue, surround map->ops->map_get_next_key() with the RCU read lock.

    Fixes: a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap")
    Reported-by: Alexei Starovoitov
    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Cc: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200916004401.146277-1-yhs@fb.com

    Yonghong Song
     

15 Sep, 2020

1 commit

  • On v5.8 when doing seccomp syscall rewrites (e.g. getpid into getppid
    as seen in the seccomp selftests), trace (and audit) correctly see the
    rewritten syscall on entry and exit:

    seccomp_bpf-1307 [000] .... 22974.874393: sys_enter: NR 110 (...
    seccomp_bpf-1307 [000] .N.. 22974.874401: sys_exit: NR 110 = 1304

    With mainline we see a mismatched enter and exit (the original syscall
    is incorrectly visible on entry):

    seccomp_bpf-1030 [000] .... 21.806766: sys_enter: NR 39 (...
    seccomp_bpf-1030 [000] .... 21.806767: sys_exit: NR 110 = 1027

    When ptrace or seccomp change the syscall, this needs to be visible to
    trace and audit at that time as well. Update the syscall earlier so they
    see the correct value.

    Fixes: d88d59b64ca3 ("core/entry: Respect syscall number rewrites")
    Reported-by: Michael Ellerman
    Signed-off-by: Kees Cook
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20200912005826.586171-1-keescook@chromium.org

    Kees Cook
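
The ordering problem can be sketched with a toy entry path. The demo_* names are hypothetical and the real entry code is considerably more involved; the syscall numbers 39 and 110 are taken from the trace lines above. If the number handed to the tracer is read before the rewrite, the tracer sees the original syscall; re-reading it after the rewrite makes entry and exit agree.

```c
#include <assert.h>

struct demo_regs {
    long nr;   /* syscall number as stored in the register state */
};

long demo_traced_nr;   /* what a sys_enter tracepoint would have logged */

/* Models a seccomp/ptrace rewrite of syscall 39 into 110. */
static void demo_seccomp_rewrite(struct demo_regs *regs)
{
    if (regs->nr == 39)
        regs->nr = 110;
}

/* Returns the syscall number that will actually be executed. */
long demo_syscall_enter(struct demo_regs *regs, int reread_after_rewrite)
{
    long syscall = regs->nr;       /* cached before the entry work runs */

    demo_seccomp_rewrite(regs);
    if (reread_after_rewrite)
        syscall = regs->nr;        /* pick up the rewritten number */
    demo_traced_nr = syscall;
    return regs->nr;
}
```

With the re-read disabled the traced number stays 39 while 110 is executed, which matches the mismatched sys_enter and sys_exit lines shown for mainline.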
     

14 Sep, 2020

1 commit


13 Sep, 2020

1 commit

  • Pull seccomp fixes from Kees Cook:
    "This fixes a rare race condition in seccomp when using TSYNC and
    USER_NOTIF together where a memory allocation would not get freed
    (found by syzkaller, fixed by Tycho).

    Additionally updates Tycho's MAINTAINERS and .mailmap entries for his
    new address"

    * tag 'seccomp-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    seccomp: don't leave dangling ->notif if file allocation fails
    mailmap, MAINTAINERS: move to tycho.pizza
    seccomp: don't leak memory when filter install races

    Linus Torvalds
     

12 Sep, 2020

1 commit

  • Using gcov to collect coverage data for kernels compiled with GCC 10.1
    causes random malfunctions and kernel crashes. This is the result of a
    changed GCOV_COUNTERS value in GCC 10.1 that causes a mismatch between
    the layout of the gcov_info structure created by GCC profiling code and
    the related structure used by the kernel.

    Fix this by updating the in-kernel GCOV_COUNTERS value. Also re-enable
    config GCOV_KERNEL for use with GCC 10.

    Reported-by: Colin Ian King
    Reported-by: Leon Romanovsky
    Signed-off-by: Peter Oberparleiter
    Tested-by: Leon Romanovsky
    Tested-and-Acked-by: Colin Ian King
    Signed-off-by: Linus Torvalds

    Peter Oberparleiter
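
Why a wrong counter count corrupts everything after it can be shown with two struct layouts that differ only in an array length. The struct below is a simplified, hypothetical stand-in for gcov_info, and the counts 8 and 9 are illustrative values chosen for this sketch. Every field placed after the array moves by one array element when the producer and consumer disagree by one.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_merge_fn)(long long *, unsigned int);

/* Declares a simplified info structure with a configurable number of
 * per-counter merge callbacks. */
#define DEMO_DECLARE_INFO(name, counters)       \
    struct name {                               \
        unsigned int version;                   \
        const char *filename;                   \
        demo_merge_fn merge[counters];          \
        unsigned int n_functions;               \
    }

DEMO_DECLARE_INFO(demo_info_compiler, 8);  /* layout emitted by the compiler */
DEMO_DECLARE_INFO(demo_info_kernel, 9);    /* layout assumed by the reader */

/* Distance in bytes between where the reader looks for n_functions
 * and where the producer actually stored it. */
size_t demo_n_functions_skew(void)
{
    return offsetof(struct demo_info_kernel, n_functions) -
           offsetof(struct demo_info_compiler, n_functions);
}
```

A non-zero skew means the reader interprets unrelated memory as n_functions, which is consistent with the random malfunctions described above.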
     

10 Sep, 2020

1 commit


09 Sep, 2020

2 commits

  • Christian and Kees both pointed out that it is a bit sloppy to open-code
    this in both places, and Christian points out that we leave a dangling pointer to
    ->notif if file allocation fails. Since we check ->notif for null in order
    to determine if it's ok to install a filter, this means people won't be
    able to install a filter if the file allocation fails for some reason, even
    if they subsequently should be able to.

    To fix this, let's hoist this free+null into its own little helper and use
    it.

    Reported-by: Kees Cook
    Reported-by: Christian Brauner
    Signed-off-by: Tycho Andersen
    Acked-by: Christian Brauner
    Link: https://lore.kernel.org/r/20200902140953.1201956-1-tycho@tycho.pizza
    Signed-off-by: Kees Cook

    Tycho Andersen
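
The free+null helper can be sketched in userspace as follows. The demo_* names, the error values, and the fault-injection switch are assumptions for illustration only. Because installation is refused while ->notif is non-NULL, the failure path must reset the pointer as well as free the memory, otherwise a later attempt is rejected for no reason.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_notif {
    int next_id;
};

struct demo_filter {
    struct demo_notif *notif;
};

int demo_fail_file_alloc;   /* hypothetical switch to simulate the failure */

/* The helper: free the notifier and clear the pointer in one place. */
void demo_notify_free(struct demo_filter *filter)
{
    free(filter->notif);
    filter->notif = NULL;
}

int demo_init_listener(struct demo_filter *filter)
{
    if (filter->notif)
        return -2;          /* a listener already appears to be installed */

    filter->notif = calloc(1, sizeof(*filter->notif));
    if (!filter->notif)
        return -1;

    if (demo_fail_file_alloc) {
        /* Without clearing the pointer here, the next attempt would
         * hit the "already installed" check above. */
        demo_notify_free(filter);
        return -1;
    }
    return 0;
}
```

After a simulated failure a second call succeeds, and a third call is refused because a listener really is installed at that point.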
     
  • In seccomp_set_mode_filter() with TSYNC | NEW_LISTENER, we first initialize
    the listener fd, then check to see if we can actually use it later in
    seccomp_may_assign_mode(), which can fail if anyone else in our thread
    group has installed a filter and caused some divergence. If we can't, we
    partially clean up the newly allocated file: we put the fd, put the file,
    but don't actually clean up the *memory* that was allocated at
    filter->notif. Let's clean that up too.

    To accomplish this, let's hoist the actual "detach a notifier from a
    filter" code to its own helper out of seccomp_notify_release(), so that in
    case anyone adds stuff to init_listener(), they only have to add the
    cleanup code in one spot. This does a bit of extra locking and such on the
    failure path when the filter is not attached, but it's a slow failure path
    anyway.

    Fixes: 51891498f2da ("seccomp: allow TSYNC and USER_NOTIF together")
    Reported-by: syzbot+3ad9614a12f80994c32e@syzkaller.appspotmail.com
    Signed-off-by: Tycho Andersen
    Acked-by: Christian Brauner
    Link: https://lore.kernel.org/r/20200902014017.934315-1-tycho@tycho.pizza
    Signed-off-by: Kees Cook

    Tycho Andersen
     

07 Sep, 2020

1 commit

  • Pull x86 fixes from Ingo Molnar:

    - more generic entry code ABI fallout

    - debug register handling bugfixes

    - fix vmalloc mappings on 32-bit kernels

    - kprobes instrumentation output fix on 32-bit kernels

    - fix over-eager WARN_ON_ONCE() on !SMAP hardware

    - NUMA debugging fix

    - fix Clang related crash on !RETPOLINE kernels

    * tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/entry: Unbreak 32bit fast syscall
    x86/debug: Allow a single level of #DB recursion
    x86/entry: Fix AC assertion
    tracing/kprobes, x86/ptrace: Fix regs argument order for i386
    x86, fakenuma: Fix invalid starting node ID
    x86/mm/32: Bring back vmalloc faulting on x86_32
    x86/cmdline: Disable jump tables for cmdline.c

    Linus Torvalds
     

06 Sep, 2020

2 commits

  • Merge misc fixes from Andrew Morton:
    "19 patches.

    Subsystems affected by this patch series: MAINTAINERS, ipc, fork,
    checkpatch, lib, and mm (memcg, slub, pagemap, madvise, migration,
    hugetlb)"

    * emailed patches from Andrew Morton:
    include/linux/log2.h: add missing () around n in roundup_pow_of_two()
    mm/khugepaged.c: fix khugepaged's request size in collapse_file
    mm/hugetlb: fix a race between hugetlb sysctl handlers
    mm/hugetlb: try preferred node first when alloc gigantic page from cma
    mm/migrate: preserve soft dirty in remove_migration_pte()
    mm/migrate: remove unnecessary is_zone_device_page() check
    mm/rmap: fixup copying of soft dirty and uffd ptes
    mm/migrate: fixup setting UFFD_WP flag
    mm: madvise: fix vma user-after-free
    checkpatch: fix the usage of capture group ( ... )
    fork: adjust sysctl_max_threads definition to match prototype
    ipc: adjust proc_ipc_sem_dointvec definition to match prototype
    mm: track page table modifications in __apply_to_page_range()
    MAINTAINERS: IA64: mark Status as Odd Fixes only
    MAINTAINERS: add LLVM maintainers
    MAINTAINERS: update Cavium/Marvell entries
    mm: slub: fix conversion of freelist_corrupted()
    mm: memcg: fix memcg reclaim soft lockup
    memcg: fix use-after-free in uncharge_batch

    Linus Torvalds
     
  • Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    changed ctl_table.proc_handler to take a kernel pointer. Adjust the
    definition of sysctl_max_threads to match its prototype in
    linux/sysctl.h which fixes the following sparse error/warning:

    kernel/fork.c:3050:47: warning: incorrect type in argument 3 (different address spaces)
    kernel/fork.c:3050:47: expected void *
    kernel/fork.c:3050:47: got void [noderef] __user *buffer
    kernel/fork.c:3036:5: error: symbol 'sysctl_max_threads' redeclared with different type (incompatible argument 3 (different address spaces)):
    kernel/fork.c:3036:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
    kernel/fork.c: note: in included file (through include/linux/key.h, include/linux/cred.h, include/linux/sched/signal.h, include/linux/sched/cputime.h):
    include/linux/sysctl.h:242:5: note: previously declared as:
    include/linux/sysctl.h:242:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )

    Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    Signed-off-by: Tobias Klauser
    Signed-off-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Al Viro
    Link: https://lkml.kernel.org/r/20200825093647.24263-1-tklauser@distanz.ch
    Signed-off-by: Linus Torvalds

    Tobias Klauser
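
The prototype mismatch fixed above can be illustrated in plain userspace C. This is only a sketch: `struct ctl_table_sketch` and `max_threads_handler` are hypothetical stand-ins, and the kernel's `__user` address-space annotation (which only sparse checks) is not reproduced. It shows a handler whose definition takes the same plain `void *buffer` as its prototype and as the `proc_handler` callback type.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct ctl_table. */
struct ctl_table_sketch {
    int *data;
    /* Handlers receive a plain (kernel) pointer as the buffer. */
    int (*proc_handler)(struct ctl_table_sketch *table, int write,
                        void *buffer, size_t *lenp);
};

/* Prototype, as a shared header would declare it. */
int max_threads_handler(struct ctl_table_sketch *table, int write,
                        void *buffer, size_t *lenp);

/* Definition: the third parameter is `void *`, matching the prototype. */
int max_threads_handler(struct ctl_table_sketch *table, int write,
                        void *buffer, size_t *lenp)
{
    if (*lenp < sizeof(int))
        return -1;
    if (write)
        memcpy(table->data, buffer, sizeof(int));   /* store the new value */
    else
        memcpy(buffer, table->data, sizeof(int));   /* report the current value */
    *lenp = sizeof(int);
    return 0;
}
```

Because the definition and the prototype agree, the function can be assigned to `proc_handler` without a cast.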
     

05 Sep, 2020

1 commit

  • GCOV built with GCC 10 doesn't initialize the n_function variable. This
    produces various kernel panics, as seen by Colin in Ubuntu and by me
    in FC 32.

    As a workaround, let's disable the GCOV build for the broken GCC 10 version.

    Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1891288
    Link: https://lore.kernel.org/lkml/20200827133932.3338519-1-leon@kernel.org
    Link: https://lore.kernel.org/lkml/CAHk-=whbijeSdSvx-Xcr0DPMj0BiwhJ+uiNnDSVZcr_h_kg7UA@mail.gmail.com/
    Cc: Colin Ian King
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Linus Torvalds

    Leon Romanovsky
     

04 Sep, 2020

4 commits

  • Andy reported that the syscall tracing for 32bit fast syscall fails:

    # ./tools/testing/selftests/x86/ptrace_syscall_32
    ...
    [RUN] SYSEMU
    [FAIL] Initial args are wrong (nr=224, args=10 11 12 13 14 4289172732)
    ...
    [RUN] SYSCALL
    [FAIL] Initial args are wrong (nr=29, args=0 0 0 0 0 4289172732)

    The reason is that the conversion to generic entry code moved the retrieval
    of the sixth argument (EBP) after the point where the syscall entry work
    runs, i.e. ptrace, seccomp, audit...

    Unbreak it by providing a split up version of syscall_enter_from_user_mode().

    - syscall_enter_from_user_mode_prepare() establishes state and enables
    interrupts

    - syscall_enter_from_user_mode_work() runs the entry work

    Replace the call to syscall_enter_from_user_mode() in the 32bit fast
    syscall C-entry with the split functions and stick the EBP retrieval
    between them.

    Fixes: 27d6b4d14f5c ("x86/entry: Use generic syscall entry function")
    Reported-by: Andy Lutomirski
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/87k0xdjbtt.fsf@nanos.tec.linutronix.de

    Thomas Gleixner
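
The ordering problem in the commit above can be sketched as a small userspace model. All names here (`struct regs_sketch`, `enter_prepare`, `enter_work`, `fast_syscall_entry`) are hypothetical, and modelling the sixth-argument retrieval as a read through a pointer is an assumption made for illustration. The point is only that the retrieval sits between the two split steps, so the entry work sees the correct value.

```c
#include <stdint.h>

/* Hypothetical, simplified register frame: number plus six arguments. */
struct regs_sketch {
    uint32_t nr;
    uint32_t args[6];
};

/* What a tracer (ptrace, seccomp, audit) would observe at entry. */
uint32_t traced_arg6;

/* Step 1: establish state; the real function also enables interrupts. */
static void enter_prepare(struct regs_sketch *regs)
{
    (void)regs;
}

/* Step 2: run the entry work, which exposes the arguments to tracers. */
static uint32_t enter_work(struct regs_sketch *regs)
{
    traced_arg6 = regs->args[5];
    return regs->nr;
}

/* Fast 32-bit entry: fetch the sixth argument between the two steps. */
uint32_t fast_syscall_entry(struct regs_sketch *regs, const uint32_t *user_stack)
{
    enter_prepare(regs);
    regs->args[5] = *user_stack;   /* must happen before the entry work */
    return enter_work(regs);
}
```

If the assignment to `regs->args[5]` were moved after `enter_work()`, the tracer would record whatever stale value the frame held, which matches the failure mode in the selftest output.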
     
  • syzbot reports,

    WARNING: inconsistent lock state
    5.9.0-rc2-syzkaller #0 Not tainted
    --------------------------------
    inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
    syz-executor.0/26715 takes:
    (padata_works_lock){+.?.}-{2:2}, at: padata_do_parallel kernel/padata.c:220
    {IN-SOFTIRQ-W} state was registered at:
    spin_lock include/linux/spinlock.h:354 [inline]
    padata_do_parallel kernel/padata.c:220
    ...
    __do_softirq kernel/softirq.c:298
    ...
    sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1091
    asm_sysvec_apic_timer_interrupt arch/x86/include/asm/idtentry.h:581

    Possible unsafe locking scenario:

    CPU0
    ----
    lock(padata_works_lock);

    lock(padata_works_lock);

    padata_do_parallel() takes padata_works_lock with softirqs enabled, so a
    deadlock is possible if, on the same CPU, the lock is acquired in
    process context and then softirq handling done in an interrupt leads to
    the same path.

    Fix by leaving softirqs disabled while do_parallel holds
    padata_works_lock.

    Reported-by: syzbot+f4b9f49e38e25eb4ef52@syzkaller.appspotmail.com
    Fixes: 4611ce2246889 ("padata: allocate work structures for parallel jobs from a pool")
    Signed-off-by: Daniel Jordan
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • Pull networking fixes from David Miller:

    1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
    Kivilinna.

    2) Fix loss of RTT samples in rxrpc, from David Howells.

    3) Memory leak in hns_nic_dev_probe(), from Dinghao Liu.

    4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.

    5) We disable BH for too long in sctp_get_port_local(), add a
    cond_resched() here as well, from Xin Long.

    6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.

    7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
    Yonghong Song.

    8) Missing of_node_put() in mt7530 DSA driver, from Sumera
    Priyadarsini.

    9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.

    10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.

    11) Memory leak in rxkad_verify_response, from Dinghao Liu.

    12) In tipc, don't use smp_processor_id() in preemptible context. From
    Tuong Lien.

    13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.

    14) Missing clk_disable_unprepare() in gemini driver, from Dan Carpenter.

    15) Fix ABI mismatch between driver and firmware in nfp, from Louis
    Peens.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
    net/smc: fix sock refcounting in case of termination
    net/smc: reset sndbuf_desc if freed
    net/smc: set rx_off for SMCR explicitly
    net/smc: fix toleration of fake add_link messages
    tg3: Fix soft lockup when tg3_reset_task() fails.
    doc: net: dsa: Fix typo in config code sample
    net: dp83867: Fix WoL SecureOn password
    nfp: flower: fix ABI mismatch between driver and firmware
    tipc: fix shutdown() of connectionless socket
    ipv6: Fix sysctl max for fib_multipath_hash_policy
    drivers/net/wan/hdlc: Change the default of hard_header_len to 0
    net: gemini: Fix another missing clk_disable_unprepare() in probe
    net: bcmgenet: fix mask check in bcmgenet_validate_flow()
    amd-xgbe: Add support for new port mode
    net: usb: dm9601: Add USB ID of Keenetic Plus DSL
    vhost: fix typo in error message
    net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
    pktgen: fix error message with wrong function name
    net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
    cxgb4: fix thermal zone device registration
    ...

    Linus Torvalds
     
  • Currently, for hashmap, the bpf iterator will grab a bucket lock, a
    spinlock, before traversing the elements in the bucket. This can ensure
    all bpf visited elements are valid. But this mechanism may cause
    deadlock if update/deletion happens to the same bucket of the
    visited map in the program. For example, if we added bpf_map_update_elem()
    call to the same visited element in selftests bpf_iter_bpf_hash_map.c,
    we will have the following deadlock:

    ============================================
    WARNING: possible recursive locking detected
    5.9.0-rc1+ #841 Not tainted
    --------------------------------------------
    test_progs/1750 is trying to acquire lock:
    ffff9a5bb73c5e70 (&htab->buckets[i].raw_lock){....}-{2:2}, at: htab_map_update_elem+0x1cf/0x410

    but task is already holding lock:
    ffff9a5bb73c5e20 (&htab->buckets[i].raw_lock){....}-{2:2}, at: bpf_hash_map_seq_find_next+0x94/0x120

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&htab->buckets[i].raw_lock);
    lock(&htab->buckets[i].raw_lock);

    *** DEADLOCK ***
    ...
    Call Trace:
    dump_stack+0x78/0xa0
    __lock_acquire.cold.74+0x209/0x2e3
    lock_acquire+0xba/0x380
    ? htab_map_update_elem+0x1cf/0x410
    ? __lock_acquire+0x639/0x20c0
    _raw_spin_lock_irqsave+0x3b/0x80
    ? htab_map_update_elem+0x1cf/0x410
    htab_map_update_elem+0x1cf/0x410
    ? lock_acquire+0xba/0x380
    bpf_prog_ad6dab10433b135d_dump_bpf_hash_map+0x88/0xa9c
    ? find_held_lock+0x34/0xa0
    bpf_iter_run_prog+0x81/0x16e
    __bpf_hash_map_seq_show+0x145/0x180
    bpf_seq_read+0xff/0x3d0
    vfs_read+0xad/0x1c0
    ksys_read+0x5f/0xe0
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    ...

    The bucket_lock is first grabbed in seq_ops->next(), called by bpf_seq_read(),
    and then grabbed again in htab_map_update_elem() in the bpf program, causing
    the deadlock.

    Actually, we do not need bucket_lock here; we can just use rcu_read_lock(),
    similar to the netlink iterator, where the rcu_read_{lock,unlock} calls look
    like this:
      seq_ops->start():
        rcu_read_lock();
      seq_ops->next():
        rcu_read_unlock();
        /* next element */
        rcu_read_lock();
      seq_ops->stop():
        rcu_read_unlock();

    Compared to the old bucket_lock mechanism, if a concurrent update/delete happens,
    we may visit stale elements, miss some elements, or repeat some elements.
    I think this is a reasonable compromise. For users wanting to avoid
    stale, missing/repeated accesses, bpf_map batch access syscall interface
    can be used.

    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200902235340.2001375-1-yhs@fb.com

    Yonghong Song
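
The self-deadlock described in the commit above can be modelled in a few lines of single-threaded C. This is a sketch under stated assumptions, not the kernel code: `struct bucket_sketch` and the functions below are hypothetical, the lock is a plain flag whose failed re-acquisition stands in for a spinlock that would spin forever, and the RCU read-side section is reduced to comments.

```c
#include <stdbool.h>

/* Hypothetical non-recursive bucket lock guarding one value. */
struct bucket_sketch {
    bool locked;
    int value;
};

/* Returns false where a real spinlock would spin forever. */
static bool bucket_trylock(struct bucket_sketch *b)
{
    if (b->locked)
        return false;
    b->locked = true;
    return true;
}

static void bucket_unlock(struct bucket_sketch *b)
{
    b->locked = false;
}

/* Models an update issued by the visiting program: it needs the lock. */
static bool update_elem(struct bucket_sketch *b, int value)
{
    if (!bucket_trylock(b))
        return false;            /* would be a self-deadlock */
    b->value = value;
    bucket_unlock(b);
    return true;
}

/* Old scheme: the element is visited while the bucket lock is held. */
bool visit_with_bucket_lock(struct bucket_sketch *b, int value)
{
    bool ok;

    if (!bucket_trylock(b))
        return false;
    ok = update_elem(b, value);  /* fails: lock is already held */
    bucket_unlock(b);
    return ok;
}

/* New scheme: only a read-side critical section surrounds the visit. */
bool visit_with_rcu(struct bucket_sketch *b, int value)
{
    bool ok;

    /* rcu_read_lock() would go here */
    ok = update_elem(b, value);  /* succeeds: no bucket lock is held */
    /* rcu_read_unlock() would go here */
    return ok;
}
```

The model does not capture the trade-off the message mentions: without the bucket lock, a concurrent writer may cause stale, missed, or repeated elements.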
     

03 Sep, 2020

1 commit

  • During the LPC RCU BoF Paul asked how come the "USED" [...]

    - if (!(class->usage_mask & LOCK_USED))
    + if (!(class->usage_mask & LOCKF_USED))

    fixing that will indeed cause rcu_read_lock() to insta-splat :/

    The above typo means that instead of testing for: 0x100 (1 <<
    LOCK_USED), we test for 8 (LOCK_USED), which corresponds to (1 <<
    LOCK_ENABLED_HARDIRQ).

    So instead of testing for _any_ used lock, it will only match any lock
    used with interrupts enabled.

    The rcu_read_lock() annotation uses .check=0, which means it will not
    set any of the interrupt bits and will thus never match.

    In order to properly fix the situation and allow rcu_read_lock() to
    correctly work, split LOCK_USED into LOCK_USED and LOCK_USED_READ and by
    having .read users set USED_READ and test USED, pure read-recursive
    locks are permitted.

    Fixes: f6f48e180404 ("lockdep: Teach lockdep about "USED"
    Signed-off-by: Ingo Molnar
    Tested-by: Masami Hiramatsu
    Acked-by: Paul E. McKenney
    Link: https://lore.kernel.org/r/20200902160323.GK1362448@hirez.programming.kicks-ass.net

    peterz@infradead.org
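
The index-versus-mask confusion described in the commit above can be reproduced in a few lines of C. The two numeric values come from the message (LOCK_USED is bit index 8, and 8 equals 1 << LOCK_ENABLED_HARDIRQ, so that index is 3); the reduced enum and the two helper functions are hypothetical and exist only for this sketch.

```c
#include <stdbool.h>

/* Bit indices; a reduced, hypothetical stand-in for the real enum. */
enum lock_usage_bit_sketch {
    LOCK_ENABLED_HARDIRQ = 3,
    LOCK_USED = 8,
};

/* Bit masks derived from the indices. */
#define LOCKF_ENABLED_HARDIRQ (1UL << LOCK_ENABLED_HARDIRQ)   /* 0x008 */
#define LOCKF_USED            (1UL << LOCK_USED)              /* 0x100 */

/* Buggy test: uses the bit index as if it were a mask. */
bool is_used_buggy(unsigned long usage_mask)
{
    return (usage_mask & LOCK_USED) != 0;
}

/* Fixed test: uses the mask. */
bool is_used_fixed(unsigned long usage_mask)
{
    return (usage_mask & LOCKF_USED) != 0;
}
```

With a mask of only `LOCKF_USED`, the buggy test reports false and the fixed test reports true; with only `LOCKF_ENABLED_HARDIRQ` set, the results are reversed, which is why the buggy version matched only locks used with interrupts enabled.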
     

31 Aug, 2020

2 commits

  • Pull x86 fixes from Thomas Gleixner:
    "Three interrupt related fixes for X86:

    - Move disabling of the local APIC after invoking fixup_irqs() to
    ensure that interrupts which are incoming are noted in the IRR and
    not ignored.

    - Unbreak affinity setting.

    The rework of the entry code reused the regular exception entry
    code for device interrupts. The vector number is pushed into the
    errorcode slot on the stack which is then lifted into an argument
    and set to -1 because that's regs->orig_ax which is used in quite
    a few places to check whether the entry came from a syscall.

    But it was overlooked that orig_ax is used in the affinity cleanup
    code to validate whether the interrupt has arrived on the new
    target. It turned out that this vector check is pointless because
    interrupts are never moved from one vector to another on the same
    CPU. That check is a historical leftover from the time where x86
    supported multi-CPU affinities, but is no longer needed with the now
    strict single CPU affinity. Famous last words ...

    - Add a missing check for an empty cpumask into the matrix allocator.

    The affinity change added a warning to catch the case where an
    interrupt is moved on the same CPU to a different vector. This
    triggers because a condition with an empty cpumask returns an
    assignment from the allocator as the allocator uses for_each_cpu()
    without checking the cpumask for being empty. The historical
    inconsistent for_each_cpu() behaviour of ignoring the cpumask and
    unconditionally claiming that CPU0 is in the mask struck again.
    Sigh.

    plus a new entry into the MAINTAINERS file for the HPE/UV platform"

    * tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq/matrix: Deal with the sillyness of for_each_cpu() on UP
    x86/irq: Unbreak interrupt affinity setting
    x86/hotplug: Silence APIC only after all interrupts are migrated
    MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers

    Linus Torvalds
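
The empty-cpumask pitfall mentioned in the pull message above can be sketched in userspace C. The macro below is modelled on the uniprocessor behaviour the message describes (the loop visits CPU 0 once and never consults the mask); `struct cpumask_sketch`, `for_each_cpu_up`, and the two `pick_cpu_*` functions are hypothetical names for illustration, not the matrix allocator's code.

```c
#include <stdbool.h>

/* Hypothetical single-word CPU mask for a uniprocessor build. */
struct cpumask_sketch {
    unsigned long bits;
};

/* The body runs once for CPU 0; the mask is evaluated but ignored. */
#define for_each_cpu_up(cpu, mask) \
    for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask))

static bool cpumask_is_empty(const struct cpumask_sketch *m)
{
    return m->bits == 0;
}

/* Without an explicit check, an empty mask still "finds" CPU 0. */
int pick_cpu_unchecked(const struct cpumask_sketch *mask)
{
    int cpu, best = -1;

    for_each_cpu_up(cpu, mask)
        best = cpu;
    return best;
}

/* With the check, an empty mask yields no CPU. */
int pick_cpu_checked(const struct cpumask_sketch *mask)
{
    if (cpumask_is_empty(mask))
        return -1;
    return pick_cpu_unchecked(mask);
}
```

Calling `pick_cpu_unchecked()` with an all-zero mask returns 0, which corresponds to the bogus assignment the message describes; `pick_cpu_checked()` returns -1 for the same input.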
     
  • Pull locking fixes from Thomas Gleixner:
    "A set of fixes for lockdep, tracing and RCU:

    - Prevent recursion by using raw_cpu_* operations

    - Fixup the interrupt state in the cpu idle code to be consistent

    - Push rcu_idle_enter/exit() invocations deeper into the idle path so
    that the lock operations are inside the RCU watching sections

    - Move trace_cpu_idle() into generic code so it's called before RCU
    goes idle.

    - Handle raw_local_irq* vs. local_irq* operations correctly

    - Move the tracepoints out from under the lockdep recursion handling
    which turned out to be fragile and inconsistent"

    * tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    lockdep,trace: Expose tracepoints
    lockdep: Only trace IRQ edges
    mips: Implement arch_irqs_disabled()
    arm64: Implement arch_irqs_disabled()
    nds32: Implement arch_irqs_disabled()
    locking/lockdep: Cleanup
    x86/entry: Remove unused THUNKs
    cpuidle: Move trace_cpu_idle() into generic code
    cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic
    sched,idle,rcu: Push rcu_idle deeper into the idle path
    cpuidle: Fixup IRQ state
    lockdep: Use raw_cpu_*() for per-cpu variables

    Linus Torvalds