20 Jan, 2021

1 commit

  • [ Upstream commit 1b04fa9900263b4e217ca2509fd778b32c2b4eb2 ]

    PowerPC testing encountered boot failures due to RCU Tasks not being
    fully initialized until core_initcall() time. This commit therefore
    initializes RCU Tasks (along with Rude RCU and RCU Tasks Trace) just
    before early_initcall() time, thus allowing waiting on RCU Tasks grace
    periods from early_initcall() handlers.

    Link: https://lore.kernel.org/rcu/87eekfh80a.fsf@dja-thinkpad.axtens.net/
    Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall")
    Tested-by: Daniel Axtens
    Signed-off-by: Uladzislau Rezki (Sony)
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Sasha Levin

    Uladzislau Rezki (Sony)
     

30 Dec, 2020

3 commits

  • [ Upstream commit 56292e8609e39537297a7468dda4d87b9bd81d6a ]

    The current memory-allocation interface causes the following difficulties
    for kvfree_rcu():

    a) If built with CONFIG_PROVE_RAW_LOCK_NESTING, lockdep will
    complain about a violation of the nesting rules, as in "BUG: Invalid
    wait context". This Kconfig option checks for proper raw_spinlock
    vs. spinlock nesting; in particular, it is not legal to acquire a
    spinlock_t while holding a raw_spinlock_t.

    This is a problem because kfree_rcu() uses raw_spinlock_t whereas the
    page allocator internally uses spinlock_t to protect access to its
    zones. The same violation can also arise one level up, when a caller
    already holds a raw spinlock:

    raw_spin_lock(&some_lock);
    kfree_rcu(some_pointer, some_field_offset);

    b) If built with CONFIG_PREEMPT_RT, spinlock_t is converted into
    sleeplock. This means that invoking the page allocator from atomic
    contexts results in "BUG: scheduling while atomic".

    c) Please note that call_rcu() is already invoked from raw atomic context,
    so it is only reasonable to expect that kfree_rcu() and kvfree_rcu()
    will also be called from raw atomic context.

    This commit therefore defers page allocation to a clean context using the
    combination of an hrtimer and a workqueue. The hrtimer stage is required
    in order to avoid deadlocks with the scheduler. This deferred allocation
    is required only when kvfree_rcu()'s per-CPU page cache is empty.
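
    The deferral described above can be modelled in user space roughly as
    follows. This is only a sketch under stated assumptions: the structure,
    the function names, CACHE_MAX and PAGE_BYTES are all hypothetical, the
    "atomic" path merely sets a flag where the kernel would arm the hrtimer,
    and malloc() stands in for the page allocator. The caller's fallback
    when the cache is empty is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_MAX  4	/* hypothetical cache depth */
#define PAGE_BYTES 4096	/* hypothetical page size */

struct page_cache {
	void *pages[CACHE_MAX];
	int nr;			/* number of cached pages */
	bool refill_pending;	/* set when the atomic path found the cache empty */
};

/* Atomic-context path: takes a page from the cache and never allocates. */
static void *cache_get_atomic(struct page_cache *c)
{
	if (c->nr > 0)
		return c->pages[--c->nr];
	c->refill_pending = true;	/* models arming the hrtimer */
	return NULL;			/* caller must fall back to another path */
}

/* Clean-context path: models the work item, which is allowed to allocate. */
static int cache_refill(struct page_cache *c)
{
	int added = 0;

	if (!c->refill_pending)
		return 0;
	while (c->nr < CACHE_MAX) {
		void *p = malloc(PAGE_BYTES);

		if (!p)
			break;
		c->pages[c->nr++] = p;
		added++;
	}
	assert(c->nr <= CACHE_MAX);
	c->refill_pending = false;
	return added;
}
```

    The point of the split is that cache_get_atomic() is safe to call with
    a raw lock held, because everything that might sleep or take a
    spinlock_t lives in cache_refill().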

    Link: https://lore.kernel.org/lkml/20200630164543.4mdcf6zb4zfclhln@linutronix.de/
    Fixes: 3042f83f19be ("rcu: Support reclaim for head-less object")
    Reported-by: Sebastian Andrzej Siewior
    Signed-off-by: Uladzislau Rezki (Sony)
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Sasha Levin

    Uladzislau Rezki (Sony)
     
  • [ Upstream commit d2098b4440981705e844c50254540ba7b5f82795 ]

    Kim reported that perf-ftrace made his box unhappy. It turns out that
    commit:

    ff5c4f5cad33 ("rcu/tree: Mark the idle relevant functions noinstr")

    removed one too many notrace qualifiers, probably due to there not being
    a helpful comment.

    This commit therefore reinstates the notrace and adds a comment to avoid
    losing it again.

    [ paulmck: Apply Steven Rostedt's feedback on the comment. ]
    Fixes: ff5c4f5cad33 ("rcu/tree: Mark the idle relevant functions noinstr")
    Reported-by: Kim Phillips
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Sasha Levin

    Peter Zijlstra
     
  • [ Upstream commit 6dbce04d8417ae706596366e16841d77c454ba52 ]

    Eugenio managed to tickle #PF from NMI context which resulted in
    hitting a WARN in RCU through irqentry_enter() ->
    __rcu_irq_enter_check_tick().

    However, this situation is perfectly sane and does not warrant a
    WARN. The #PF will (necessarily) be atomic and not require messing
    with the tick state, so early return is correct. This commit
    therefore removes the WARN.

    Fixes: aaf2bc50df1f ("rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()")
    Reported-by: "Eugenio Pérez"
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Andy Lutomirski
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Sasha Levin

    Peter Zijlstra
     

18 Nov, 2020

1 commit

  • Pull RCU fix from Paul McKenney:
    "A single commit that fixes a bug that was introduced a couple of merge
    windows ago, but which rather more recently converged to an
    agreed-upon fix. The bug is that interrupts can be incorrectly enabled
    while holding an irq-disabled spinlock. This can of course result in
    self-deadlocks.

    The bug is a bit difficult to trigger. It requires that a preempted
    task be blocking a preemptible-RCU grace period long enough to trigger
    an RCU CPU stall warning. In addition, an interrupt must occur at just
    the right time, and that interrupt's handler must acquire that same
    irq-disabled spinlock. Still, a deadlock is a deadlock.

    Furthermore, we do now have a fix, and that fix survives kernel test
    robot, -next, and rcutorture testing. It has also been verified by
    Sebastian as fixing the bug. Therefore..."

    * 'urgent-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
    rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled

    Linus Torvalds
     

14 Nov, 2020

1 commit

  • Pull arm64 fixes from Will Deacon:

    - Spectre/Meltdown safelisting for some Qualcomm KRYO cores

    - Fix RCU splat when failing to online a CPU due to a feature mismatch

    - Fix a recently introduced sparse warning in kexec()

    - Fix handling of CPU erratum 1418040 for late CPUs

    - Ensure hot-added memory falls within linear-mapped region

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: cpu_errata: Apply Erratum 845719 to KRYO2XX Silver
    arm64: proton-pack: Add KRYO2XX silver CPUs to spectre-v2 safe-list
    arm64: kpti: Add KRYO2XX gold/silver CPU cores to kpti safelist
    arm64: Add MIDR value for KRYO2XX gold/silver CPU cores
    arm64/mm: Validate hotplug range before creating linear mapping
    arm64: smp: Tell RCU about CPUs that fail to come online
    arm64: psci: Avoid printing in cpu_psci_cpu_die()
    arm64: kexec_file: Fix sparse warning
    arm64: errata: Fix handling of 1418040 with late CPU onlining

    Linus Torvalds
     

11 Nov, 2020

1 commit

  • The try_invoke_on_locked_down_task() function requires that
    interrupts be enabled, but it is called with interrupts disabled from
    rcu_print_task_stall(), resulting in an "IRQs not enabled as expected"
    diagnostic. This commit therefore updates rcu_print_task_stall()
    to accumulate a list of the first few tasks while holding the current
    leaf rcu_node structure's ->lock, then releases that lock and only then
    uses try_invoke_on_locked_down_task() to attempt to obtain per-task
    detailed information. Of course, as soon as ->lock is released, the
    task might exit, so the get_task_struct() function is used to prevent
    the task structure from going away in the meantime.
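
    The snapshot-then-inspect pattern described above can be sketched in
    user-space C. Everything here is a simplified stand-in rather than the
    kernel code: the structures, the helper names and MAX_REPORT are
    hypothetical, the lock is a plain flag so that the ordering rule can be
    asserted, and inspect_task() takes the place of the call that must not
    run with the lock held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REPORT 3	/* "first few tasks"; the value is an assumption */

struct task {
	int pid;
	int refcount;
	struct task *next_blocked;
};

struct node {
	bool locked;		/* models the leaf node's ->lock */
	struct task *blocked;	/* tasks blocking the grace period */
};

static void node_lock(struct node *n)   { assert(!n->locked); n->locked = true; }
static void node_unlock(struct node *n) { assert(n->locked); n->locked = false; }
static void task_get(struct task *t)    { t->refcount++; }
static void task_put(struct task *t)    { assert(t->refcount > 0); t->refcount--; }

/* Stand-in for the per-task inspection; must run with the lock released. */
static int inspect_task(const struct node *n, const struct task *t)
{
	assert(!n->locked);
	return t->pid;
}

/* Returns the number of tasks reported; their pids are stored in pids_out. */
static int report_blocked_tasks(struct node *n, int *pids_out)
{
	struct task *snap[MAX_REPORT];
	int i, cnt = 0;

	node_lock(n);
	for (struct task *t = n->blocked; t && cnt < MAX_REPORT; t = t->next_blocked) {
		task_get(t);		/* pin the task before dropping the lock */
		snap[cnt++] = t;
	}
	node_unlock(n);

	for (i = 0; i < cnt; i++) {
		pids_out[i] = inspect_task(n, snap[i]);
		task_put(snap[i]);	/* drop the reference taken above */
	}
	return cnt;
}
```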

    Link: https://lore.kernel.org/lkml/000000000000903d5805ab908fc4@google.com/
    Fixes: 5bef8da66a9c ("rcu: Add per-task state to RCU CPU stall warnings")
    Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com
    Reported-by: syzbot+f04854e1c5c9e913cc27@syzkaller.appspotmail.com
    Tested-by: Sebastian Andrzej Siewior
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

10 Nov, 2020

1 commit

  • Commit ce3d31ad3cac ("arm64/smp: Move rcu_cpu_starting() earlier") ensured
    that RCU is informed early about incoming CPUs that might end up calling
    into printk() before they are online. However, if such a CPU fails the
    early CPU feature compatibility checks in check_local_cpu_capabilities(),
    then it will be powered off or parked without informing RCU, leading to
    an endless stream of stalls:

    | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
    | rcu: 2-O...: (0 ticks this GP) idle=002/1/0x4000000000000000 softirq=0/0 fqs=2593
    | (detected by 0, t=5252 jiffies, g=9317, q=136)
    | Task dump for CPU 2:
    | task:swapper/2 state:R running task stack: 0 pid: 0 ppid: 1 flags:0x00000028
    | Call trace:
    | ret_from_fork+0x0/0x30

    Ensure that the dying CPU invokes rcu_report_dead() prior to being powered
    off or parked.

    Cc: Qian Cai
    Cc: "Paul E. McKenney"
    Reviewed-by: Paul E. McKenney
    Suggested-by: Qian Cai
    Link: https://lore.kernel.org/r/20201105222242.GA8842@willie-the-truck
    Link: https://lore.kernel.org/r/20201106103602.9849-3-will@kernel.org
    Signed-off-by: Will Deacon

    Will Deacon
     

26 Oct, 2020

1 commit

  • Some architectures assume that the stopped CPUs don't make function calls
    to traceable functions when they are in the stopped state. See also commit
    cb9d7fd51d9f ("watchdog: Mark watchdog touch functions as notrace").

    Violating this assumption causes kernel crashes when switching tracer on
    RISC-V.

    Mark rcu_momentary_dyntick_idle() and stop_machine_yield() notrace to
    prevent this.

    Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
    Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in multi_cpu_stop()")
    Signed-off-by: Zong Li
    Signed-off-by: Thomas Gleixner
    Tested-by: Atish Patra
    Tested-by: Colin Ian King
    Acked-by: Steven Rostedt (VMware)
    Acked-by: Paul E. McKenney
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20201021073839.43935-1-zong.li@sifive.com

    Zong Li
     

19 Oct, 2020

1 commit

  • Pull RCU changes from Ingo Molnar:

    - Debugging for smp_call_function()

    - RT raw/non-raw lock ordering fixes

    - Strict grace periods for KASAN

    - New smp_call_function() torture test

    - Torture-test updates

    - Documentation updates

    - Miscellaneous fixes

    [ This doesn't actually pull the tag - I've dropped the last merge from
    the RCU branch due to questions about the series. - Linus ]

    * tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
    smp: Make symbol 'csd_bug_count' static
    kernel/smp: Provide CSD lock timeout diagnostics
    smp: Add source and destination CPUs to __call_single_data
    rcu: Shrink each possible cpu krcp
    rcu/segcblist: Prevent useless GP start if no CBs to accelerate
    torture: Add gdb support
    rcutorture: Allow pointer leaks to test diagnostic code
    rcutorture: Hoist OOM registry up one level
    refperf: Avoid null pointer dereference when buf fails to allocate
    rcutorture: Properly synchronize with OOM notifier
    rcutorture: Properly set rcu_fwds for OOM handling
    torture: Add kvm.sh --help and update help message
    rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
    torture: Update initrd documentation
    rcutorture: Replace HTTP links with HTTPS ones
    locktorture: Make function torture_percpu_rwsem_init() static
    torture: document --allcpus argument added to the kvm.sh script
    rcutorture: Output number of elapsed grace periods
    rcutorture: Remove KCSAN stubs
    rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
    ...

    Linus Torvalds
     

17 Oct, 2020

1 commit

  • Pull documentation updates from Mauro Carvalho Chehab:
    "A series of patches addressing warnings produced by make htmldocs.
    This includes:

    - kernel-doc markup fixes

    - ReST fixes

    - Updates at the build system in order to support newer versions of
    the docs build toolchain (Sphinx)

    After this series, the number of html build warnings should reduce
    significantly, and building with Sphinx 3.1 or later should now be
    supported (although it is still recommended to use Sphinx 2.4.4).

    As agreed with Jon, I should be sending you a late pull request by the
    end of the merge window addressing remaining issues with docs build,
    as there are a number of warning fixes that depends on pull requests
    that should be happening along the merge window.

    The end goal is to have a clean htmldocs build on Kernel 5.10.

    PS. It should be noticed that Sphinx 3.0 is not currently supported,
    as it lacks support for C domain namespaces. Such feature, needed in
    order to document uAPI system calls with Sphinx 3.x, was added only on
    Sphinx 3.1"

    * tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (75 commits)
    PM / devfreq: remove a duplicated kernel-doc markup
    mm/doc: fix a literal block markup
    workqueue: fix a kernel-doc warning
    docs: virt: user_mode_linux_howto_v2.rst: fix a literal block markup
    Input: sparse-keymap: add a description for @sw
    rcu/tree: docs: document bkvcache new members at struct kfree_rcu_cpu
    nl80211: docs: add a description for s1g_cap parameter
    usb: docs: document altmode register/unregister functions
    kunit: test.h: fix a bad kernel-doc markup
    drivers: core: fix kernel-doc markup for dev_err_probe()
    docs: bio: fix a kerneldoc markup
    kunit: test.h: solve kernel-doc warnings
    block: bio: fix a warning at the kernel-doc markups
    docs: powerpc: syscall64-abi.rst: fix a malformed table
    drivers: net: hamradio: fix document location
    net: appletalk: Kconfig: Fix docs location
    dt-bindings: fix references to files converted to yaml
    memblock: get rid of a :c:type leftover
    math64.h: kernel-docs: Convert some markups into normal comments
    media: uAPI: buffer.rst: remove a left-over documentation
    ...

    Linus Torvalds
     

16 Oct, 2020

1 commit

  • Pull networking updates from Jakub Kicinski:

    - Add redirect_neigh() BPF packet redirect helper, allowing to limit
    stack traversal in common container configs and improving TCP
    back-pressure.

    Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

    - Expand netlink policy support and improve policy export to user
    space. (Ge)netlink core performs request validation according to
    declared policies. Expand the expressiveness of those policies
    (min/max length and bitmasks). Allow dumping policies for particular
    commands. This is used for feature discovery by user space (instead
    of kernel version parsing or trial and error).

    - Support IGMPv3/MLDv2 multicast listener discovery protocols in
    bridge.

    - Allow more than 255 IPv4 multicast interfaces.

    - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
    packets of TCPv6.

    - In Multi-path TCP (MPTCP) support concurrent transmission of data on
    multiple subflows in a load balancing scenario. Enhance advertising
    addresses via the RM_ADDR/ADD_ADDR options.

    - Support SMC-Dv2 version of SMC, which enables multi-subnet
    deployments.

    - Allow more calls to same peer in RxRPC.

    - Support two new Controller Area Network (CAN) protocols - CAN-FD and
    ISO 15765-2:2016.

    - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
    kernel problem.

    - Add TC actions for implementing MPLS L2 VPNs.

    - Improve nexthop code - e.g. handle various corner cases when nexthop
    objects are removed from groups better, skip unnecessary
    notifications and make it easier to offload nexthops into HW by
    converting to a blocking notifier.

    - Support adding and consuming TCP header options by BPF programs,
    opening the doors for easy experimental and deployment-specific TCP
    option use.

    - Reorganize TCP congestion control (CC) initialization to simplify
    life of TCP CC implemented in BPF.

    - Add support for shipping BPF programs with the kernel and loading
    them early on boot via the User Mode Driver mechanism, hence reusing
    all the user space infra we have.

    - Support sleepable BPF programs, initially targeting LSM and tracing.

    - Add bpf_d_path() helper for returning full path for given 'struct
    path'.

    - Make bpf_tail_call compatible with bpf-to-bpf calls.

    - Allow BPF programs to call map_update_elem on sockmaps.

    - Add BPF Type Format (BTF) support for type and enum discovery, as
    well as support for using BTF within the kernel itself (current use
    is for pretty printing structures).

    - Support listing and getting information about bpf_links via the bpf
    syscall.

    - Enhance kernel interfaces around NIC firmware update. Allow
    specifying overwrite mask to control if settings etc. are reset
    during update; report expected max time operation may take to users;
    support firmware activation without machine reboot incl. limits of
    how much impact reset may have (e.g. dropping link or not).

    - Extend ethtool configuration interface to report IEEE-standard
    counters, to limit the need for per-vendor logic in user space.

    - Adopt or extend devlink use for debug, monitoring, fw update in many
    drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
    dpaa2-eth).

    - In mlxsw expose critical and emergency SFP module temperature alarms.
    Refactor port buffer handling to make the defaults more suitable and
    support setting these values explicitly via the DCBNL interface.

    - Add XDP support for Intel's igb driver.

    - Support offloading TC flower classification and filtering rules to
    mscc_ocelot switches.

    - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
    fixed interval period pulse generator and one-step timestamping in
    dpaa-eth.

    - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
    offload.

    - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
    this HW to use it. Convert mvpp2 to split PCS.

    - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
    7-port Mediatek MT7531 IP.

    - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
    and wcn3680 support in wcn36xx.

    - Improve performance for packets which don't require much offloads on
    recent Mellanox NICs by 20% by making multiple packets share a
    descriptor entry.

    - Move chelsio inline crypto drivers (for TLS and IPsec) from the
    crypto subtree to drivers/net. Move MDIO drivers out of the phy
    directory.

    - Clean up a lot of W=1 warnings, reportedly the actively developed
    subsections of networking drivers should now build W=1 warning free.

    - Make sure drivers don't use in_interrupt() to dynamically adapt their
    code. Convert tasklets to use new tasklet_setup API (sadly this
    conversion is not yet complete).

    * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
    Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
    net, sockmap: Don't call bpf_prog_put() on NULL pointer
    bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
    bpf, sockmap: Add locking annotations to iterator
    netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
    net: fix pos incrementment in ipv6_route_seq_next
    net/smc: fix invalid return code in smcd_new_buf_create()
    net/smc: fix valid DMBE buffer sizes
    net/smc: fix use-after-free of delayed events
    bpfilter: Fix build error with CONFIG_BPFILTER_UMH
    cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
    net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
    bpf: Fix register equivalence tracking.
    rxrpc: Fix loss of final ack on shutdown
    rxrpc: Fix bundle counting for exclusive connections
    netfilter: restore NF_INET_NUMHOOKS
    ibmveth: Identify ingress large send packets.
    ibmveth: Switch order of ibmveth_helper calls.
    cxgb4: handle 4-tuple PEDIT to NAT mode translation
    selftests: Add VRF route leaking tests
    ...

    Linus Torvalds
     

15 Oct, 2020

1 commit

  • Changeset 53c72b590b3a ("rcu/tree: cache specified number of objects")
    added new members to struct kfree_rcu_cpu, but didn't add the
    corresponding kernel-doc markup, as reported when doing
    "make htmldocs":
    ./kernel/rcu/tree.c:3113: warning: Function parameter or member 'bkvcache' not described in 'kfree_rcu_cpu'
    ./kernel/rcu/tree.c:3113: warning: Function parameter or member 'nr_bkv_objs' not described in 'kfree_rcu_cpu'

    So, move the description for bkvcache to kernel-doc, and add a
    description for nr_bkv_objs.

    Fixes: 53c72b590b3a ("rcu/tree: cache specified number of objects")
    Acked-by: Paul E. McKenney
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

13 Oct, 2020

1 commit

  • Pull debugobjects updates from Thomas Gleixner:
    "A small set of updates for debug objects:

    - Make all debug object descriptors constant. There is no reason to
    have them writeable.

    - Free the per CPU object pool after CPU unplug to avoid memory
    waste"

    * tag 'core-debugobjects-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    debugobjects: Free per CPU pool after CPU unplug
    treewide: Make all debug_obj_descriptors const
    debugobjects: Allow debug_obj_descr to be const

    Linus Torvalds
     

09 Oct, 2020

1 commit

  • …k/linux-rcu into core/rcu

    Pull v5.10 RCU changes from Paul E. McKenney:

    - Debugging for smp_call_function().

    - Strict grace periods for KASAN. The point of this series is to find
    RCU-usage bugs, so the corresponding new RCU_STRICT_GRACE_PERIOD
    Kconfig option depends on both DEBUG_KERNEL and RCU_EXPERT, and is
    further disabled by default. Finally, the help text includes
    a goodly list of scary caveats.

    - New smp_call_function() torture test.

    - Torture-test updates.

    - Documentation updates.

    - Miscellaneous fixes.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

06 Oct, 2020

1 commit


26 Sep, 2020

1 commit

  • Pull power management fixes from Rafael Wysocki:
    "These fix more fallout of recent RCU-lockdep changes in CPU idle code
    and two devfreq issues.

    Specifics:

    - Export rcu_idle_{enter,exit} to modules to fix build issues
    introduced by recent RCU-lockdep fixes (Borislav Petkov)

    - Add missing return statement to a stub function in the ACPI
    processor driver to fix a build issue introduced by recent
    RCU-lockdep fixes (Rafael Wysocki)

    - Fix recently introduced suspicious RCU usage warnings in the PSCI
    cpuidle driver and drop stale comments regarding RCU_NONIDLE()
    usage from enter_s2idle_proper() (Ulf Hansson)

    - Fix error code path in the tegra30 devfreq driver (Dan Carpenter)

    - Add missing information to devfreq_summary debugfs (Chanwoo Choi)"

    * tag 'pm-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI: processor: Fix build for ARCH_APICTIMER_STOPS_ON_C3 unset
    PM / devfreq: tegra30: Disable clock on error in probe
    PM / devfreq: Add timer type to devfreq_summary debugfs
    cpuidle: Drop misleading comments about RCU usage
    cpuidle: psci: Fix suspicious RCU usage
    rcu/tree: Export rcu_idle_{enter,exit} to modules

    Linus Torvalds
     

25 Sep, 2020

1 commit

  • This should make it harder for the kernel to corrupt the debug object
    descriptor, used to call functions to fixup state and track debug objects,
    by moving the structure to read-only memory.

    Signed-off-by: Stephen Boyd
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kees Cook
    Link: https://lore.kernel.org/r/20200815004027.2046113-3-swboyd@chromium.org

    Stephen Boyd
     

21 Sep, 2020

1 commit

  • Fix this link error:

    ERROR: modpost: "rcu_idle_enter" [drivers/acpi/processor.ko] undefined!
    ERROR: modpost: "rcu_idle_exit" [drivers/acpi/processor.ko] undefined!

    when CONFIG_ACPI_PROCESSOR is built as a module. PeterZ says that in light
    of ARM needing those soon too, they should simply be exported.

    Fixes: 1fecfdbb7acc ("ACPI: processor: Take over RCU-idle for C3-BM idle")
    Reported-by: Sven Joachim
    Suggested-by: Peter Zijlstra
    Signed-off-by: Borislav Petkov
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Rafael J. Wysocki

    Borislav Petkov
     

17 Sep, 2020

8 commits

  • The rcu_tasks_trace_postgp() function uses for_each_process_thread()
    to scan the task list without the benefit of RCU read-side protection,
    which can result in use-after-free errors on task_struct structures.
    This error was missed because the TRACE01 rcutorture scenario enables
    lockdep, but also builds with CONFIG_PREEMPT_NONE=y. In this situation,
    preemption is disabled everywhere, so lockdep thinks everywhere can
    be a legitimate RCU reader. This commit therefore adds the needed
    rcu_read_lock() and rcu_read_unlock().

    Note that this bug can occur only after an RCU Tasks Trace CPU stall
    warning, which by default only happens after a grace period has extended
    for ten minutes (yes, not a typo, minutes).

    Fixes: 4593e772b502 ("rcu-tasks: Add stall warnings for RCU Tasks Trace")
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Jiri Olsa
    Cc:
    Cc: # 5.7.x
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • When the rcu_tasks_trace_postgp() function detects an RCU Tasks Trace
    CPU stall, it adds all tasks blocking the current grace period to
    a list, invoking get_task_struct() on each to prevent them from
    being freed while on the list. It then traverses that list,
    printing stall-warning messages for each one that is still blocking
    the current grace period and removing it from the list. The list
    removal invokes the matching put_task_struct().

    This of course means that in the admittedly unlikely event that some
    task executes its outermost rcu_read_unlock_trace() in the meantime, it
    won't be removed from the list and put_task_struct() won't be executed,
    resulting in a task_struct leak. This commit therefore makes the list
    removal and put_task_struct() unconditional, stopping the leak.
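
    A minimal user-space model of the fixed traversal, assuming a singly
    linked holdout list and a plain integer reference count (the structure
    and function names are hypothetical): only the warning depends on
    whether the task is still blocking, while the removal and the reference
    drop happen for every entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task {
	int refcount;
	bool still_blocking;	/* still blocking the current grace period? */
	struct task *next;	/* holdout-list linkage */
};

/* Empties the list and returns how many tasks would have been warned about. */
static int drain_holdouts(struct task **list)
{
	int warned = 0;

	while (*list) {
		struct task *t = *list;

		if (t->still_blocking)
			warned++;	/* stall warning would be printed here */

		/* Unconditional: unlink and drop the list's reference. */
		*list = t->next;
		t->next = NULL;
		assert(t->refcount > 0);
		t->refcount--;
	}
	return warned;
}
```

    Had the unlink and the decrement been placed inside the
    "still_blocking" branch, a task that left its read-side critical
    section in the meantime would keep the extra reference forever.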

    Note further that this bug can occur only after an RCU Tasks Trace CPU
    stall warning, which by default only happens after a grace period has
    extended for ten minutes (yes, not a typo, minutes).

    Fixes: 4593e772b502 ("rcu-tasks: Add stall warnings for RCU Tasks Trace")
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Jiri Olsa
    Cc:
    Cc: # 5.7.x
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The more intense grace-period processing resulting from the 50x RCU
    Tasks Trace grace-period speedups exposed the following race condition:

    o Task A running on CPU 0 executes rcu_read_lock_trace(),
    entering a read-side critical section.

    o When Task A eventually invokes rcu_read_unlock_trace()
    to exit its read-side critical section, this function
    notes that the ->trc_reader_special.s flag is zero and
    will therefore set ->trc_reader_nesting to zero
    using WRITE_ONCE(). But before that happens...

    o The RCU Tasks Trace grace-period kthread running on some other
    CPU interrogates Task A, but this fails because this task is
    currently running. This kthread therefore sends an IPI to CPU 0.

    o CPU 0 receives the IPI, and thus invokes trc_read_check_handler().
    Because Task A has not yet cleared its ->trc_reader_nesting
    counter, this function sees that Task A is still within its
    read-side critical section. This function therefore sets the
    ->trc_reader_special.b.need_qs flag, AKA the .need_qs flag.

    Except that Task A has already checked the .need_qs flag, which
    is part of the ->trc_reader_special.s flag. The .need_qs flag
    therefore remains set until Task A's next rcu_read_unlock_trace().

    o Task A now invokes synchronize_rcu_tasks_trace(), which cannot
    start a new grace period until the current grace period completes.
    And thus cannot return until after that time.

    But Task A's .need_qs flag is still set, which prevents the current
    grace period from completing. And because Task A is blocked, it
    will never execute rcu_read_unlock_trace() until its call to
    synchronize_rcu_tasks_trace() returns.

    We are therefore deadlocked.

    This race is improbable, but 80 hours of rcutorture made it happen twice.
    The race was possible before the grace-period speedup, but roughly 50x
    less probable. Several thousand hours of rcutorture would have been
    necessary to have a reasonable chance of making this happen before this
    50x speedup.

    This commit therefore eliminates this deadlock by setting
    ->trc_reader_nesting to a large negative number before checking the
    .need_qs and zeroing (or decrementing with respect to its initial
    value) ->trc_reader_nesting. For its part, the IPI handler's
    trc_read_check_handler() function adds a check for negative values,
    deferring evaluation of the task in this case. Taken together, these
    changes avoid this deadlock scenario.
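
    The fix can be illustrated with a single-threaded model in which the
    IPI is simulated by a callback invoked inside the formerly racy window.
    This is a sketch, not the kernel code: the structure and function names
    are hypothetical, INT_MIN is assumed as the "large negative number",
    and the READ_ONCE()/WRITE_ONCE() accesses and barriers that real
    concurrent code needs are omitted.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

struct reader {
	int nesting;	/* models ->trc_reader_nesting */
	bool need_qs;	/* models the .need_qs flag */
};

/* IPI handler model: returns false when evaluation must be deferred. */
static bool ipi_check(struct reader *r)
{
	if (r->nesting < 0)
		return false;		/* unlock in progress: try again later */
	if (r->nesting > 0)
		r->need_qs = true;	/* still a reader: request a report */
	return true;
}

static void reader_lock(struct reader *r)
{
	r->nesting++;
}

/* ipi may be NULL; if set, it fires between the flag check and the store. */
static void reader_unlock(struct reader *r, bool (*ipi)(struct reader *))
{
	int nesting;

	assert(r->nesting > 0);
	nesting = r->nesting - 1;
	r->nesting = INT_MIN;		/* keep the handler from setting .need_qs */
	if (!r->need_qs || nesting) {
		if (ipi)
			ipi(r);		/* simulated IPI in the old race window */
		r->nesting = nesting;
		return;
	}
	r->need_qs = false;		/* slow path: report the quiescent state */
	r->nesting = nesting;
}
```

    With the negative marker in place, an IPI that lands in the window
    sees nesting < 0 and defers, so .need_qs cannot be left set on a task
    that has already passed its check.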

    Fixes: 276c410448db ("rcu-tasks: Split ->trc_reader_need_end")
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Jiri Olsa
    Cc:
    Cc: # 5.7.x
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The various RCU tasks flavors currently wait 100 milliseconds between each
    grace period in order to prevent CPU-bound loops and to favor efficiency
    over latency. However, RCU Tasks Trace needs to have a grace-period
    latency of roughly 25 milliseconds, which is completely infeasible given
    the 100-millisecond per-grace-period sleep. This commit therefore reduces
    this sleep duration to 5 milliseconds (or one jiffy, whichever is longer)
    in kernels built with CONFIG_TASKS_TRACE_RCU_READ_MB=y.
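
    As a rough illustration of the "5 milliseconds (or one jiffy, whichever
    is longer)" rule, the sketch below converts the sleep to jiffies for a
    given HZ. It assumes the sleep is computed as HZ / 200 with a floor of
    one jiffy; the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: 5 ms expressed in jiffies, never less than one jiffy. */
static int gp_sleep_jiffies(int hz)
{
	int j;

	assert(hz > 0);
	j = hz / 200;		/* 1/200 s == 5 ms; integer division rounds down */
	return j > 0 ? j : 1;	/* "or one jiffy, whichever is longer" */
}
```

    With HZ=1000 this gives 5 jiffies (5 ms); with HZ=100 the division
    yields zero, so the floor of one jiffy (10 ms) applies.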

    Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
    Reported-by: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Jiri Olsa
    Cc:
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Many workloads are quite sensitive to IPIs, and such workloads should
    build kernels with CONFIG_TASKS_TRACE_RCU_READ_MB=y to prevent RCU
    Tasks Trace from using them under normal conditions. However, other
    workloads are quite happy to permit more IPIs if doing so makes BPF
    program updates go faster. This commit therefore sets the default
    value for the rcupdate.rcu_task_ipi_delay kernel parameter to zero for
    kernels that have been built with CONFIG_TASKS_TRACE_RCU_READ_MB=n,
    while retaining the old default of (HZ / 10) for kernels that have
    indicated an aversion to IPIs via CONFIG_TASKS_TRACE_RCU_READ_MB=y.
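
    The resulting default can be summarized by the following sketch, in
    which a boolean stands in for CONFIG_TASKS_TRACE_RCU_READ_MB and the
    function name is hypothetical. The result is in jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper modelling the default IPI delay described above. */
static int ipi_delay_default(bool read_mb, int hz)
{
	assert(hz > 0);
	return read_mb ? hz / 10 : 0;	/* old default (HZ / 10) vs. no delay */
}
```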

    Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
    Reported-by: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Jiri Olsa
    Cc:
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Commit 8344496e8b49 ("rcu-tasks: Conditionally compile
    show_rcu_tasks_gp_kthreads()") introduced conditional
    compilation of several functions, but forgot one occurrence of
    show_rcu_tasks_classic_gp_kthread() that causes the compiler to warn of
    an unused static function. This commit uses "static inline" to avoid
    these complaints and possibly also to avoid emitting an actual definition
    of this function.

    Fixes: 8344496e8b49 ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()")
    Cc: # 5.8.x
    Reported-by: Laurent Pinchart
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The RCU Tasks Trace grace periods are too slow, as in 40x slower than
    those of RCU Tasks. This is due to my having assumed a one-second grace
    period was OK, and thus not having optimized any further. This commit
    provides the first step in this optimization process, namely by allowing
    the task_list scan backoff interval to be specified on a per-flavor basis,
    and then speeding up the scans for RCU Tasks Trace. However, kernels
    built with CONFIG_TASKS_TRACE_RCU_READ_MB=y continue to use the old slower
    backoff, consistent with that Kconfig option's goal of reducing IPIs.

    Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
    Reported-by: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Jiri Olsa
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The n_heavy_reader_attempts, n_heavy_reader_updates, and
    n_heavy_reader_ofl_updates variables are not used outside of their
    translation unit, so this commit marks them static.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

04 Sep, 2020

5 commits

  • strictgp.2020.08.24a: Strict grace periods for KASAN testing.

    Paul E. McKenney
     
  • scftorture.2020.08.24a: Torture tests for smp_call_function() and friends.

    Paul E. McKenney
     
  • doc.2020.08.24a: Documentation updates.
    fixes.2020.09.03b: Miscellaneous fixes.
    torture.2020.08.24a: Torture-test updates.

    Paul E. McKenney
     
  • CPUs can go offline shortly after kfree_call_rcu() has been invoked,
    which can leave memory stranded until those CPUs come back online.
    This commit therefore drains the kcrp of each CPU, not just the
    ones that happen to be online.
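
    The difference between the two drain loops can be sketched in userspace as follows. The cpu_cache structure, NR_CPUS value, and both functions are hypothetical simplifications of the per-CPU kfree_rcu() state, assumed only for illustration.

```c
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count */

/* Hypothetical stand-in for one CPU's kfree_rcu() bookkeeping. */
struct cpu_cache {
	bool online;
	int pending;	/* objects still waiting to be freed */
};

/* Old behavior: skip offline CPUs, leaving their objects stranded. */
static int drain_online(struct cpu_cache *c)
{
	int freed = 0;

	for (int i = 0; i < NR_CPUS; i++) {
		if (!c[i].online)
			continue;
		freed += c[i].pending;
		c[i].pending = 0;
	}
	return freed;
}

/* New behavior: visit every CPU, whether online or not. */
static int drain_all(struct cpu_cache *c)
{
	int freed = 0;

	for (int i = 0; i < NR_CPUS; i++) {
		freed += c[i].pending;
		c[i].pending = 0;
	}
	return freed;
}
```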

    Acked-by: Joel Fernandes
    Signed-off-by: Zqiang
    Signed-off-by: Paul E. McKenney

    Zqiang
     
  • The rcu_segcblist_accelerate() function returns true iff it is necessary
    to request another grace period. A tracing session showed that this
    function unnecessarily requests grace periods.

    For example, consider the following sequence of events:
    1. Callbacks are queued only on the NEXT segment of CPU A's callback list.
    2. CPU A runs RCU_SOFTIRQ, accelerating these callbacks from NEXT to WAIT.
    3. Thus rcu_segcblist_accelerate() returns true, requesting grace period N.
    4. RCU's grace-period kthread wakes up on CPU B and starts grace period N.
    5. CPU A notices the new grace period and invokes RCU_SOFTIRQ.
    6. CPU A's RCU_SOFTIRQ again invokes rcu_segcblist_accelerate(), but
    there are no new callbacks. However, rcu_segcblist_accelerate()
    nevertheless (uselessly) requests a new grace period N+1.

    This extra grace period results in additional lock contention and also
    additional wakeups, all for no good reason.

    This commit therefore adds a check to rcu_segcblist_accelerate() that
    prevents the return of true when there are no new callbacks.

    This change reduces the number of grace periods (GPs) and wakeups in each
    of eleven five-second rcutorture runs as follows:

    +----+-------------------+-------------------+
    | #  |   Number of GPs   | Number of Wakeups |
    +====+=========+=========+=========+=========+
    | 1  | With    | Without | With    | Without |
    +----+---------+---------+---------+---------+
    | 2  |      75 |      89 |     113 |     119 |
    +----+---------+---------+---------+---------+
    | 3  |      62 |      91 |     105 |     123 |
    +----+---------+---------+---------+---------+
    | 4  |      60 |      79 |      98 |     110 |
    +----+---------+---------+---------+---------+
    | 5  |      63 |      79 |      99 |     112 |
    +----+---------+---------+---------+---------+
    | 6  |      57 |      89 |      96 |     123 |
    +----+---------+---------+---------+---------+
    | 7  |      64 |      85 |      97 |     118 |
    +----+---------+---------+---------+---------+
    | 8  |      58 |      83 |      98 |     113 |
    +----+---------+---------+---------+---------+
    | 9  |      57 |      77 |      89 |     104 |
    +----+---------+---------+---------+---------+
    | 10 |      66 |      82 |      98 |     119 |
    +----+---------+---------+---------+---------+
    | 11 |      52 |      82 |      83 |     117 |
    +----+---------+---------+---------+---------+

    The reduction in the number of wakeups ranges from 5% to 40%.
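
    The added check can be illustrated with a toy model. This is a deliberately simplified sketch, not the kernel's rcu_segcblist implementation: the toy_cblist structure tracks only callback counts for the WAIT and NEXT segments, and toy_accelerate() is a hypothetical name.

```c
#include <stdbool.h>

/* Toy model: callback counts for two of the list's segments. */
struct toy_cblist {
	int n_wait;	/* callbacks already assigned to a grace period */
	int n_next;	/* newly queued callbacks, not yet assigned */
};

/*
 * Move new callbacks from NEXT to WAIT. Return true only if something
 * was moved, that is, only if a new grace period is really needed.
 */
static bool toy_accelerate(struct toy_cblist *l)
{
	if (l->n_next == 0)
		return false;	/* no new callbacks: nothing to request */
	l->n_wait += l->n_next;
	l->n_next = 0;
	return true;
}
```

    In this model, the second call in the sequence above finds NEXT empty and returns false, so no grace period N+1 is requested.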

    Cc: urezki@gmail.com
    [ paulmck: Rework commit log and comment. ]
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Paul E. McKenney

    Joel Fernandes (Google)
     

25 Aug, 2020

8 commits

  • This commit adds an rcutorture.leakpointer module parameter that
    intentionally leaks an RCU-protected pointer out of the RCU read-side
    critical section and checks to see if the corresponding grace period
    has elapsed, emitting a WARN_ON_ONCE() if so. This module parameter can
    be used to test facilities like CONFIG_RCU_STRICT_GRACE_PERIOD that end
    grace periods quickly.

    While in the area, also document rcutorture.irqreader, which was
    previously left out.

    Reported-by: Jann Horn
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, registering and unregistering the OOM notifier is done
    right before and after the test, respectively. This will not work
    well for multi-threaded tests, so this commit hoists this registering
    and unregistering up into the rcu_torture_fwd_prog_init() and
    rcu_torture_fwd_prog_cleanup() functions.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
    Currently, in the unlikely event that buf fails to be allocated, it
    is nevertheless dereferenced a few times. Use the errexit flag to
    determine whether buf should be written to, thus avoiding the null
    pointer dereferences.
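
    The guard pattern can be sketched in userspace as follows. The write_summary() function and its arguments are hypothetical, and malloc() stands in for the kernel allocator; only the use of an errexit flag to skip writes to an unallocated buffer reflects the description above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: format a summary into a heap buffer.
 * A size of zero simulates allocation failure.
 * Returns the formatted length, or -1 if the buffer was unavailable.
 */
static int write_summary(size_t size, int nruns)
{
	bool errexit = false;
	int len = 0;
	char *buf = size ? malloc(size) : NULL;

	if (!buf)
		errexit = true;
	/* Touch buf only when the allocation succeeded. */
	if (!errexit)
		len = snprintf(buf, size, "runs: %d", nruns);
	free(buf);	/* free(NULL) is a no-op */
	return errexit ? -1 : len;
}
```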

    Addresses-Coverity: ("Dereference after null check")
    Fixes: f518f154ecef ("refperf: Dynamically allocate experiment-summary output buffer")
    Signed-off-by: Colin Ian King
    Signed-off-by: Paul E. McKenney

    Colin Ian King
     
  • The current rcutorture forward-progress code assumes that it is the
    only cause of out-of-memory (OOM) events. For script-based rcutorture
    testing, this assumption is in fact correct. However, testing based
    on modprobe/rmmod might well encounter external OOM events, which could
    happen at any time.

    This commit therefore properly synchronizes the interaction between
    rcutorture's forward-progress testing and its OOM notifier by adding a
    global mutex.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The conversion of rcu_fwds to dynamic allocation failed to actually
    allocate the required structure. This commit therefore allocates it,
    frees it, and updates rcu_fwds accordingly. While in the area, it
    abstracts the cleanup actions into rcu_torture_fwd_prog_cleanup().

    Fixes: 5155be9994e5 ("rcutorture: Dynamically allocate rcu_fwds structure")
    Reported-by: kernel test robot
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit adds code to print the grace-period number at the start
    of the test along with both the grace-period number and the number of
    elapsed grace periods at the end of the test. Note that variants of
    RCU without the notion of a grace-period number (for example, Tiny RCU)
    just print zeroes.

    [ paulmck: Adjust commit log. ]
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Paul E. McKenney

    Joel Fernandes (Google)
     
  • KCSAN is now in mainline, so this commit removes the stubs for the
    data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS()
    macros.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The "cpu" parameter to rcu_report_qs_rdp() is not used, with rdp->cpu
    being used instead. Furthermore, every call to rcu_report_qs_rdp()
    invokes it on rdp->cpu. This commit therefore removes this unused "cpu"
    parameter and converts a check of rdp->cpu against smp_processor_id()
    to a WARN_ON_ONCE().
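
    The shape of this change can be sketched in userspace as follows. Every name here is a hypothetical stand-in: toy_rdp models the per-CPU rcu_data structure, toy_this_cpu models smp_processor_id(), and toy_warn_on_once() loosely models WARN_ON_ONCE().

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for the kernel's per-CPU rcu_data. */
struct toy_rdp {
	int cpu;		/* CPU that owns this structure */
	int qs_reported;	/* number of quiescent states reported */
};

static int toy_this_cpu;	/* stands in for smp_processor_id() */
static int toy_warnings;	/* number of warnings emitted so far */

/* Warn at most once when cond is true. */
static void toy_warn_on_once(int cond)
{
	if (cond && toy_warnings == 0) {
		toy_warnings++;
		fprintf(stderr, "WARNING: report on wrong CPU\n");
	}
}

/* No "cpu" parameter: the CPU comes from rdp->cpu and is sanity-checked. */
static void toy_report_qs(struct toy_rdp *rdp)
{
	toy_warn_on_once(rdp->cpu != toy_this_cpu);
	rdp->qs_reported++;
}
```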

    Reported-by: Jann Horn
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney