26 Jan, 2017

2 commits

  • commit 52d7e48b86fc108e45a656d8e53e4237993c481d upstream.

    The current preemptible RCU implementation goes through three phases
    during bootup. In the first phase, there is only one CPU that is running
    with preemption disabled, so that a synchronous grace period is a no-op.
    In the second mid-boot phase, the scheduler is running, but RCU has
    not yet gotten its kthreads spawned (and, for expedited grace periods,
    workqueues are not yet running). During this time, any attempt to do
    a synchronous grace period will hang the system (or complain bitterly,
    depending). In the third and final phase, RCU is fully operational and
    everything works normally.

    This has been OK for some time, but some synchronous grace periods
    have recently been showing up during the second mid-boot phase. This
    code worked "by accident" for a while, but started failing as soon
    as expedited RCU grace periods switched over to workqueues in commit
    8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue").
    Note that the code was buggy even before this commit, as it was subject
    to failure on real-time systems that forced all expedited grace periods
    to run as normal grace periods (for example, using the rcu_normal ksysfs
    parameter). The callchain from the failure case is as follows:

    early_amd_iommu_init()
    |-> acpi_put_table(ivrs_base);
    |-> acpi_tb_put_table(table_desc);
    |-> acpi_tb_invalidate_table(table_desc);
    |-> acpi_tb_release_table(...)
    |-> acpi_os_unmap_memory
    |-> acpi_os_unmap_iomem
    |-> acpi_os_map_cleanup
    |-> synchronize_rcu_expedited

    The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
    which caused the code to try using workqueues before they were
    initialized, which did not go well.

    This commit therefore reworks RCU to permit synchronous grace periods
    to proceed during this mid-boot phase. It fixes a regression introduced
    in v4.9, and is therefore being put forward post-merge-window in v4.10.

    This commit sets a flag from the existing rcu_scheduler_starting()
    function which causes all synchronous grace periods to take the expedited
    path. The expedited path now checks this flag, using the requesting task
    to drive the expedited grace period forward during the mid-boot phase.
    Finally, this flag is updated by a core_initcall() function named
    rcu_exp_runtime_mode(), which causes the runtime codepaths to be used.

    Note that this arrangement assumes that tasks are not sent POSIX signals
    (or anything similar) from the time that the first task is spawned
    through core_initcall() time.
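
    As a hedged sketch (not the actual kernel code; identifiers other
    than rcu_exp_runtime_mode() and core_initcall() are simplified
    stand-ins), the arrangement looks roughly like this:

    static bool rcu_mid_boot = true;	/* set from rcu_scheduler_starting() */

    void synchronize_rcu_sketch(void)
    {
            if (rcu_mid_boot) {
                    /* Mid-boot: expedited path, driven by the requesting task. */
                    synchronize_rcu_expedited();
                    return;
            }
            /* Fully operational: use the normal grace-period machinery. */
    }

    /* Switch to the runtime codepaths once initcalls are running. */
    static int __init rcu_exp_runtime_mode(void)
    {
            rcu_mid_boot = false;
            return 0;
    }
    core_initcall(rcu_exp_runtime_mode);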

    Fixes: 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")
    Reported-by: "Zheng, Lv"
    Reported-by: Borislav Petkov
    Signed-off-by: Paul E. McKenney
    Tested-by: Stan Kain
    Tested-by: Ivan
    Tested-by: Emanuel Castelo
    Tested-by: Bruno Pesavento
    Tested-by: Borislav Petkov
    Tested-by: Frederic Bezies
    Signed-off-by: Greg Kroah-Hartman

    Paul E. McKenney
     
  • commit f466ae66fa6a599f9a53b5f9bafea4b8cfffa7fb upstream.

    It is now legal to invoke synchronize_sched() at early boot, which causes
    Tiny RCU's synchronize_sched() to emit spurious splats. This commit
    therefore removes the cond_resched() from Tiny RCU's synchronize_sched().

    Fixes: 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Greg Kroah-Hartman

    Paul E. McKenney
     

16 Oct, 2016

1 commit

  • Pull gcc plugins update from Kees Cook:
    "This adds a new gcc plugin named "latent_entropy". It is designed to
    extract as much possible uncertainty from a running system at boot
    time as possible, hoping to capitalize on any possible variation in
    CPU operation (due to runtime data differences, hardware differences,
    SMP ordering, thermal timing variation, cache behavior, etc).

    At the very least, this plugin is a much more comprehensive example
    for how to manipulate kernel code using the gcc plugin internals"

    * tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    latent_entropy: Mark functions with __latent_entropy
    gcc-plugins: Add latent_entropy plugin

    Linus Torvalds
     

11 Oct, 2016

1 commit

  • The __latent_entropy gcc attribute can be used only on functions and
    variables. If it is on a function then the plugin will instrument it for
    gathering control-flow entropy. If the attribute is on a variable then
    the plugin will initialize it with random contents. The variable must
    be an integer, an integer array type or a structure with integer fields.

    These specific functions have been selected because they are init
    functions (to help gather boot-time entropy), are called at
    unpredictable times, or have variable loops, each of which provides
    some level of latent entropy.
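
    For illustration (boot_seed and foo_init() are hypothetical names),
    the attribute can be applied like this:

    static u64 __latent_entropy boot_seed;	/* filled with random contents */

    static int __init __latent_entropy foo_init(void)
    {
            /* The plugin instruments this function for control-flow entropy. */
            return 0;
    }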

    Signed-off-by: Emese Revfy
    [kees: expanded commit message]
    Signed-off-by: Kees Cook

    Emese Revfy
     

04 Oct, 2016

1 commit

  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle were:

    - rwsem micro-optimizations (Davidlohr Bueso)

    - Improve the implementation and optimize the performance of
    percpu-rwsems. (Peter Zijlstra)

    - Convert all lglock users to better facilities such as percpu-rwsems
    or percpu-spinlocks and remove lglocks. (Peter Zijlstra)

    - Remove the ticket (spin)lock implementation. (Peter Zijlstra)

    - Korean translation of memory-barriers.txt and related fixes to the
    English document. (SeongJae Park)

    - misc fixes and cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    x86/cmpxchg, locking/atomics: Remove superfluous definitions
    x86, locking/spinlocks: Remove ticket (spin)lock implementation
    locking/lglock: Remove lglock implementation
    stop_machine: Remove stop_cpus_lock and lg_double_lock/unlock()
    fs/locks: Use percpu_down_read_preempt_disable()
    locking/percpu-rwsem: Add down_read_preempt_disable()
    fs/locks: Replace lg_local with a per-cpu spinlock
    fs/locks: Replace lg_global with a percpu-rwsem
    locking/percpu-rwsem: Add DEFINE_STATIC_PERCPU_RWSEM and percpu_rwsem_assert_held()
    locking/pv-qspinlock: Use cmpxchg_release() in __pv_queued_spin_unlock()
    locking/rwsem, x86: Drop a bogus cc clobber
    futex: Add some more function commentry
    locking/hung_task: Show all locks
    locking/rwsem: Scan the wait_list for readers only once
    locking/rwsem: Remove a few useless comments
    locking/rwsem: Return void in __rwsem_mark_wake()
    locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()
    locking/Documentation: Add Korean translation
    locking/Documentation: Fix a typo of example result
    locking/Documentation: Fix wrong section reference
    ...

    Linus Torvalds
     

23 Aug, 2016

14 commits

  • A few rcuperf dmesg output messages have no space between the flag and
    the start of the message. In contrast, every other message consistently
    supplies a single space. This difference makes rcuperf dmesg output
    hard to read and to mechanically parse. This commit therefore fixes
    this problem by modifying a pr_alert() call and PERFOUT_STRING() macro
    function to provide that single space.
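
    A sketch of the macro after the fix (simplified; the exact macro
    body may differ):

    #define PERFOUT_STRING(s) \
            pr_alert("%s" PERF_FLAG " %s\n", perf_type, s)	/* note the space */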

    Signed-off-by: SeongJae Park
    Signed-off-by: Paul E. McKenney

    SeongJae Park
     
  • Tests for rcu_barrier() were introduced by commit fae4b54f28f0 ("rcu:
    Introduce rcutorture testing for rcu_barrier()"). This commit updated
    the documentation to say that the "rtbe" field in rcutorture's dmesg
    output indicates test failure. However, the code was not updated, only
    the documentation. This commit therefore updates the code to match the
    updated documentation.

    Signed-off-by: SeongJae Park
    Signed-off-by: Paul E. McKenney

    SeongJae Park
     
  • This commit adds a dump of the scheduler state for stalled rcutorture
    writer tasks. This addition provides yet more debug for the intermittent
    "failures to proceed", where grace periods move ahead but the rcutorture
    writer tasks fail to do so.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Install the callbacks via the state machine and let the core invoke
    the callbacks on the already online CPUs.

    Cc: Josh Triplett
    Cc: "Paul E. McKenney"
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Cc: Lai Jiangshan
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Paul E. McKenney

    Sebastian Andrzej Siewior
     
  • Up to now, RCU has assumed that the CPU-online process makes it from
    CPU_UP_PREPARE to set_cpu_online() within one jiffy. Given the recent
    rise of virtualized environments, this assumption is very clearly
    obsolete. Failing to meet this deadline can result in RCU paying
    attention to an incoming CPU for one jiffy, then ignoring it until the
    grace period following the one in which that CPU sets itself online.
    This situation might prove to be fatally disappointing to any RCU
    read-side critical sections that had the misfortune to execute during
    the time in which RCU was ignoring the slow-to-come-online CPU.

    This commit therefore updates RCU's internal CPU state-tracking
    information at notify_cpu_starting() time, thus providing RCU with
    an exact transition of the CPU's state from offline to online.

    Note that this means that incoming CPUs must not use RCU read-side
    critical sections (other than those of SRCU) until notify_cpu_starting()
    time. Note also that the CPU_STARTING notifiers -are- allowed to use
    RCU read-side critical sections. (Of course, CPU-hotplug notifiers are
    rapidly becoming obsolete, so you need to act fast!)

    If a given architecture or CPU family needs to use RCU read-side
    critical sections earlier, the call to rcu_cpu_starting() from
    notify_cpu_starting() will need to be architecture-specific, with
    architectures that need early use being required to hand-place
    the call to rcu_cpu_starting() at some point preceding the call to
    notify_cpu_starting().
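
    A hedged sketch of such an architecture-specific placement
    (start_secondary() stands in for whatever the architecture's
    secondary-CPU startup function is called):

    static void start_secondary(void)
    {
            /* Permit this CPU's RCU read-side critical sections early. */
            rcu_cpu_starting(smp_processor_id());
            notify_cpu_starting(smp_processor_id());
            /* ... remainder of the CPU-online processing ... */
    }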

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, __note_gp_changes() checks to see if the CPU has slept through
    multiple grace periods. If it has, it resynchronizes that CPU's view
    of the grace-period state, which includes whether or not the current
    grace period needs a quiescent state from this CPU. The fact of this
    need (or lack thereof) needs to be in two places, rdp->cpu_no_qs.b.norm
    and rdp->core_needs_qs. The former tells RCU's context-switch code to
    go get a quiescent state and the latter says that it needs to be reported.
    The current code unconditionally sets the former to true, but correctly
    sets the latter.

    This does not result in failures, but it does unnecessarily increase
    the amount of work done on average at context-switch time. This commit
    therefore correctly sets both fields.
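
    A hedged sketch of the corrected assignments (simplified from
    __note_gp_changes(); the qsmask test is an assumption about the
    surrounding code):

    bool need_qs = !!(rnp->qsmask & rdp->grpmask);

    rdp->cpu_no_qs.b.norm = need_qs;	/* was unconditionally true */
    rdp->core_needs_qs = need_qs;	/* this one was already correct */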

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The Kconfig currently controlling compilation of tree.c is:

    init/Kconfig:config TREE_RCU
    init/Kconfig: bool

    ...and update.c and sync.c are "obj-y" meaning that none are ever
    built as a module by anyone.

    Since MODULE_ALIAS is a no-op for non-modular code, we can remove
    them from these files.

    We leave moduleparam.h behind since the files still instantiate some
    boot-time configuration parameters with module_param().
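
    For illustration, the removed and retained lines look roughly like
    this (blimit is one of tree.c's actual boot-time parameters):

    MODULE_ALIAS("rcutree");		/* removed: no-op in non-modular code */
    module_param(blimit, long, 0444);	/* kept: boot-time configuration */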

    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Cc: Lai Jiangshan
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Paul E. McKenney

    Paul Gortmaker
     
  • Commit abedf8e2419f ("rcu: Use simple wait queues where possible in
    rcutree") converts Tree RCU's wait queues to simple wait queues,
    but it incorrectly reverts the commit 2aa792e6faf1 ("rcu: Use
    rcu_gp_kthread_wake() to wake up grace period kthreads"). This can
    result in redundant self-wakeups.

    This commit therefore replaces the simple wait-queue wakeups with
    rcu_gp_kthread_wake(), thus avoiding the redundant wakeups.
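
    A sketch of what rcu_gp_kthread_wake() checks before waking
    (hedged and simplified from the actual helper):

    static void rcu_gp_kthread_wake(struct rcu_state *rsp)
    {
            /* Avoid self-wakeups and wakeups with no work pending. */
            if (current == rsp->gp_kthread || !READ_ONCE(rsp->gp_flags))
                    return;
            swake_up(&rsp->gp_wq);
    }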

    Signed-off-by: Jisheng Zhang
    Signed-off-by: Paul E. McKenney

    Jisheng Zhang
     
  • This commit improves the accuracy of the interaction between CPU hotplug
    operations and RCU's expedited grace periods by using RCU's online-CPU
    state to determine when failed IPIs should be retried.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The expedited RCU grace periods currently rely on a failure indication
    from smp_call_function_single() to determine that a given CPU is offline.
    This works after a fashion, but is more contorted and less precise than
    relying on RCU's internal state. This commit therefore takes a first
    step towards relying on internal state.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
    The expedited RCU CPU stall warnings currently respond to neither the
    panic_on_rcu_stall sysctl setting nor the rcupdate.rcu_cpu_stall_suppress
    kernel boot parameter. This commit therefore updates the expedited code
    to respond to these two controls.
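
    A hedged sketch of the two checks (the helper name and placement
    are illustrative):

    static void rcu_exp_stall_warning(void)
    {
            if (rcu_cpu_stall_suppress)	/* rcupdate.rcu_cpu_stall_suppress */
                    return;
            /* ... print the expedited stall warning ... */
            if (sysctl_panic_on_rcu_stall)	/* kernel.panic_on_rcu_stall */
                    panic("RCU Stall\n");
    }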

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Now that RCU expedited grace periods are always driven by a workqueue,
    there is no need to account for signal reception, and thus no need
    to disable expedited RCU CPU stall warnings due to signal reception.
    This commit therefore removes the signal-reception checks, leaving a
    WARN_ON() to catch possible future bugs.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current implementation of expedited grace periods has the user
    task drive the grace period. This works, but has downsides: (1) The
    user task must awaken tasks piggybacking on this grace period, which
    can result in latencies rivaling that of the grace period itself, and
    (2) User tasks can receive signals, which interfere with RCU CPU stall
    warnings.

    This commit therefore uses workqueues to drive the grace periods, so
    that the user task need not do the awakening. A subsequent commit
    will remove the now-unnecessary code allowing for signals.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The functions synchronize_rcu_expedited() and synchronize_sched_expedited()
    have nearly identical code. This commit therefore consolidates this code
    into a new _synchronize_rcu_expedited() function.
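
    A sketch of the consolidated shape (hedged; the real parameter
    types may differ):

    static void _synchronize_rcu_expedited(struct rcu_state *rsp,
                                           smp_call_func_t func)
    {
            /* Common setup, grace-period wait, and wakeups live here. */
    }

    void synchronize_rcu_expedited(void)
    {
            _synchronize_rcu_expedited(rcu_state_p, sync_rcu_exp_handler);
    }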

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

22 Aug, 2016

1 commit

  • Carrying out the following steps results in a softlockup in the
    RCU callback-offload (rcuo) kthreads:

    1. Connect to ixgbevf, and set the speed to 10Gb/s.
    2. Use ifconfig to bring the nic up and down repeatedly.

    [ 317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
    [ 368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
    [ 368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ 368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
    [ 368.106005] RIP: 0010:[] [] fib_table_lookup+0x14/0x390
    [ 368.106005] RSP: 0018:ffff88061fc83ce8 EFLAGS: 00000286
    [ 368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
    [ 368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
    [ 368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
    [ 368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
    [ 368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
    [ 368.106005] FS: 0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
    [ 368.106005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
    [ 368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 368.106005] Stack:
    [ 368.106005] 00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
    [ 368.106005] ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
    [ 368.106005] ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
    [ 368.106005] Call Trace:
    [ 368.106005]
    [ 368.106005]
    [ 368.106005] [] ip_route_input_noref+0x516/0xbd0
    [ 368.106005] [] ? skb_release_data+0xd6/0x110
    [ 368.106005] [] ? kfree_skb+0x3a/0xa0
    [ 368.106005] [] ip_rcv_finish+0x29f/0x350
    [ 368.106005] [] ip_rcv+0x234/0x380
    [ 368.106005] [] __netif_receive_skb_core+0x676/0x870
    [ 368.106005] [] __netif_receive_skb+0x18/0x60
    [ 368.106005] [] process_backlog+0xae/0x180
    [ 368.106005] [] net_rx_action+0x152/0x240
    [ 368.106005] [] __do_softirq+0xef/0x280
    [ 368.106005] [] call_softirq+0x1c/0x30
    [ 368.106005]
    [ 368.106005]
    [ 368.106005] [] do_softirq+0x65/0xa0
    [ 368.106005] [] local_bh_enable+0x94/0xa0
    [ 368.106005] [] rcu_nocb_kthread+0x232/0x370
    [ 368.106005] [] ? wake_up_bit+0x30/0x30
    [ 368.106005] [] ? rcu_start_gp+0x40/0x40
    [ 368.106005] [] kthread+0xcf/0xe0
    [ 368.106005] [] ? kthread_create_on_node+0x140/0x140
    [ 368.106005] [] ret_from_fork+0x58/0x90
    [ 368.106005] [] ? kthread_create_on_node+0x140/0x140

    ==================================cut here==============================

    It turns out that the rcuos callback-offload kthread is busy processing
    a very large quantity of RCU callbacks, and it is not relinquishing the
    CPU while doing so. This commit therefore adds a cond_resched_rcu_qs()
    within the loop to allow other tasks to run.
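
    A hedged sketch of the loop after the fix (invoke_one_callback() is
    a hypothetical stand-in for the real callback-invocation code):

    while (list != NULL) {
            list = invoke_one_callback(list);
            /* Yield the CPU and report an RCU quiescent state. */
            cond_resched_rcu_qs();
    }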

    Signed-off-by: Ding Tianhong
    [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
    Signed-off-by: Paul E. McKenney

    Ding Tianhong
     

18 Aug, 2016

1 commit

  • The current percpu-rwsem read side is entirely free of serializing insns
    at the cost of having a synchronize_sched() in the write path.

    The latency of the synchronize_sched() is too high for cgroups. The
    commit 1ed1328792ff talks about the write path being a fairly cold path,
    but this is not the case for Android, which moves tasks to the foreground
    cgroup and back around binder IPC calls from foreground processes to
    background processes, so it is significantly hotter than human-initiated
    operations.

    Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the
    problem; it should hopefully not be that slow after another commit:

    80127a39681b ("locking/percpu-rwsem: Optimize readers and reduce global impact").

    We could just add rcu_sync_enter() into cgroup_init() but we do not want
    another synchronize_sched() at boot time, so this patch adds the new helper
    which doesn't block but currently can only be called before the first use.
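
    Assuming the new helper is rcu_sync_enter_start(), as merged
    upstream, the boot-time call looks roughly like:

    void __init cgroup_init(void)
    {
            /* Force slow mode before first use, without blocking. */
            rcu_sync_enter_start(&cgroup_threadgroup_rwsem.rss);
            /* ... */
    }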

    Reported-by: John Stultz
    Reported-by: Dmitry Shmidt
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Colin Cross
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rom Lemarchand
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Link: http://lkml.kernel.org/r/20160811165413.GA22807@redhat.com
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

10 Aug, 2016

1 commit

    Currently the percpu-rwsem switches to (global) atomic ops while a
    writer is waiting, which could be quite a while, and this slows down
    releasing the readers.

    This patch cures this problem by ordering the reader-state vs
    reader-count (see the comments in __percpu_down_read() and
    percpu_down_write()). This changes a global atomic op into a full
    memory barrier, which doesn't have the global cacheline contention.

    This also enables using the percpu-rwsem with rcu_sync disabled in order
    to bias the implementation differently, reducing the writer latency by
    adding some cost to readers.
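
    Conceptually (a sketch, not the kernel's exact code), the reader
    fast path orders its count increment against its read of the writer
    state with a full barrier:

    __this_cpu_inc(*sem->read_count);	/* reader-count */
    smp_mb();				/* order count vs. reader-state */
    if (likely(!smp_load_acquire(&sem->readers_block)))
            return;			/* no writer: stay on the fast path */
    /* otherwise fall back to the slow path */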

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    [ Fixed modular build. ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

30 Jul, 2016

1 commit

  • Pull smp hotplug updates from Thomas Gleixner:
    "This is the next part of the hotplug rework.

    - Convert all notifiers with a priority assigned

    - Convert all CPU_STARTING/DYING notifiers

    The final removal of the STARTING/DYING infrastructure will happen
    when the merge window closes.

    Another 700 lines of impenetrable maze gone :)"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
    timers/core: Correct callback order during CPU hot plug
    leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
    powerpc/numa: Convert to hotplug state machine
    arm/perf: Fix hotplug state machine conversion
    irqchip/armada: Avoid unused function warnings
    ARC/time: Convert to hotplug state machine
    clocksource/atlas7: Convert to hotplug state machine
    clocksource/armada-370-xp: Convert to hotplug state machine
    clocksource/exynos_mct: Convert to hotplug state machine
    clocksource/arm_global_timer: Convert to hotplug state machine
    rcu: Convert rcutree to hotplug state machine
    KVM/arm/arm64/vgic-new: Convert to hotplug state machine
    smp/cfd: Convert core to hotplug state machine
    x86/x2apic: Convert to CPU hotplug state machine
    profile: Convert to hotplug state machine
    timers/core: Convert to hotplug state machine
    hrtimer: Convert to hotplug state machine
    x86/tboot: Convert to hotplug state machine
    arm64/armv8 deprecated: Convert to hotplug state machine
    hwtracing/coresight-etm4x: Convert to hotplug state machine
    ...

    Linus Torvalds
     

26 Jul, 2016

1 commit

  • Pull locking updates from Ingo Molnar:
    "The locking tree was busier in this cycle than the usual pattern - a
    couple of major projects happened to coincide.

    The main changes are:

    - implement the atomic_fetch_{add,sub,and,or,xor}() API natively
    across all SMP architectures (Peter Zijlstra)

    - add atomic_fetch_{inc/dec}() as well, using the generic primitives
    (Davidlohr Bueso)

    - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
    Waiman Long)

    - optimize smp_cond_load_acquire() on arm64 and implement LSE based
    atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    on arm64 (Will Deacon)

    - introduce smp_acquire__after_ctrl_dep() and fix various barrier
    mis-uses and bugs (Peter Zijlstra)

    - after discovering ancient spin_unlock_wait() barrier bugs in its
    implementation and usage, strengthen its semantics and update/fix
    usage sites (Peter Zijlstra)

    - optimize mutex_trylock() fastpath (Peter Zijlstra)

    - ... misc fixes and cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
    locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
    locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
    locking/static_keys: Fix non static symbol Sparse warning
    locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
    locking/atomic, arch/tile: Fix tilepro build
    locking/atomic, arch/m68k: Remove comment
    locking/atomic, arch/arc: Fix build
    locking/Documentation: Clarify limited control-dependency scope
    locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
    locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
    locking/atomic, arch/mips: Convert to _relaxed atomics
    locking/atomic, arch/alpha: Convert to _relaxed atomics
    locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
    locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
    locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    locking/atomic: Fix atomic64_relaxed() bits
    locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    ...

    Linus Torvalds
     

15 Jul, 2016

1 commit

    Straightforward conversion to the state machine, though the question
    arises whether it really needs all of these state transitions to work.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Anna-Maria Gleixner
    Reviewed-by: Sebastian Andrzej Siewior
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160713153337.982013161@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

16 Jun, 2016

5 commits

  • doc.2016.06.15a: Documentation updates
    fixes.2016.06.15b: Documentation updates
    torture.2016.06.14a: Documentation updates

    Paul E. McKenney
     
  • In many cases in the RCU tree code, we iterate over the set of cpus for
    a leaf node described by rcu_node::grplo and rcu_node::grphi, checking
    per-cpu data for each cpu in this range. However, if the set of possible
    cpus is sparse, some cpus described in this range are not possible, and
    thus no per-cpu region will have been allocated (or initialised) for
    them by the generic percpu code.

    Erroneous accesses to a per-cpu area for these !possible cpus may fault
    or may hit other data, depending on the address generated when the
    erroneous per-cpu offset is applied. In practice, both cases have been
    observed on arm64 hardware (the former being silent, but detectable with
    additional patches).

    To avoid issues resulting from this, we must iterate over the set of
    *possible* cpus for a given leaf node. This patch adds a new helper,
    for_each_leaf_node_possible_cpu, to enable this. As iteration is often
    intertwined with rcu_node local bitmask manipulation, a new
    leaf_node_cpu_bit helper is added to make this simpler and more
    consistent. The RCU tree code is made to use both of these where
    appropriate.
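
    A sketch of the two helpers (close to, but hedged against, the
    actual definitions):

    #define for_each_leaf_node_possible_cpu(rnp, cpu) \
            for ((cpu) = cpumask_next((rnp)->grplo - 1, cpu_possible_mask); \
                 (cpu) <= (rnp)->grphi; \
                 (cpu) = cpumask_next((cpu), cpu_possible_mask))

    /* Bit in rcu_node bitmasks corresponding to this possible CPU. */
    #define leaf_node_cpu_bit(rnp, cpu) (1UL << ((cpu) - (rnp)->grplo))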

    Without this patch, running reboot at a shell can result in an oops
    like:

    [ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c
    [ 3369.083881] pgd = ffffffc3ecdda000
    [ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000
    [ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP
    [ 3369.101781] Modules linked in:
    [ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G W 4.6.0+ #3
    [ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000
    [ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510
    [ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510
    [ 3369.139479] pc : [] lr : [] pstate: 200001c5
    [ 3369.146860] sp : ffffffc3eb9435a0
    [ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88
    [ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600
    [ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88
    [ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80
    [ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40
    [ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000
    [ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0
    [ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000
    [ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000
    [ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78
    [ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000
    [ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003
    [ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280
    [ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001
    [ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140
    ...
    [ 3369.972257] [] sync_rcu_exp_select_cpus+0x188/0x510
    [ 3369.978685] [] synchronize_rcu_expedited+0x64/0xa8
    [ 3369.985026] [] synchronize_net+0x24/0x30
    [ 3369.990499] [] dev_deactivate_many+0x28c/0x298
    [ 3369.996493] [] __dev_close_many+0x60/0xd0
    [ 3370.002052] [] __dev_close+0x28/0x40
    [ 3370.007178] [] __dev_change_flags+0x8c/0x158
    [ 3370.012999] [] dev_change_flags+0x20/0x60
    [ 3370.018558] [] do_setlink+0x288/0x918
    [ 3370.023771] [] rtnl_newlink+0x398/0x6a8
    [ 3370.029158] [] rtnetlink_rcv_msg+0xe4/0x220
    [ 3370.034891] [] netlink_rcv_skb+0xc4/0xf8
    [ 3370.040364] [] rtnetlink_rcv+0x2c/0x40
    [ 3370.045663] [] netlink_unicast+0x160/0x238
    [ 3370.051309] [] netlink_sendmsg+0x2f0/0x358
    [ 3370.056956] [] sock_sendmsg+0x18/0x30
    [ 3370.062168] [] ___sys_sendmsg+0x26c/0x280
    [ 3370.067728] [] __sys_sendmsg+0x44/0x88
    [ 3370.073027] [] SyS_sendmsg+0x10/0x20
    [ 3370.078153] [] el0_svc_naked+0x24/0x28

    Signed-off-by: Mark Rutland
    Reported-by: Dennis Chen
    Cc: Catalin Marinas
    Cc: Josh Triplett
    Cc: Lai Jiangshan
    Cc: Mathieu Desnoyers
    Cc: Steve Capper
    Cc: Steven Rostedt
    Cc: Will Deacon
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Paul E. McKenney

    Mark Rutland
     
    It is not always easy to determine the cause of an RCU stall just by
    analysing the RCU stall messages, mainly when the problem is caused
    by the indirect starvation of rcu threads, for example, when
    preempt_rcu is not awakened due to the starvation of a timer softirq.

    We have been hard coding panic() in the RCU stall functions for
    some time while testing the kernel-rt. But this is not possible in
    some scenarios, like when supporting customers.

    This patch implements the sysctl kernel.panic_on_rcu_stall. If
    set to 1, the system will panic() when an RCU stall takes place,
    enabling the capture of a vmcore. The vmcore provides a way to analyze
    all kernel/task states, helping to pinpoint the culprit and the
    solution for the stall.

    The kernel.panic_on_rcu_stall sysctl is disabled by default.
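
    Per the change list below, the check lives in a static function,
    roughly:

    static void panic_on_rcu_stall(void)
    {
            if (sysctl_panic_on_rcu_stall)
                    panic("RCU Stall\n");
    }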

    Changes from v1:
    - Fixed a typo in the git log
    - The if(sysctl_panic_on_rcu_stall) panic() is in a static function
    - Fixed the CONFIG_TINY_RCU compilation issue
    - The var sysctl_panic_on_rcu_stall is now __read_mostly

    Cc: Jonathan Corbet
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Cc: Lai Jiangshan
    Acked-by: Christian Borntraeger
    Reviewed-by: Josh Triplett
    Reviewed-by: Arnaldo Carvalho de Melo
    Tested-by: "Luis Claudio R. Goncalves"
    Signed-off-by: Daniel Bristot de Oliveira
    Signed-off-by: Paul E. McKenney

    Daniel Bristot de Oliveira
     
    While in the area in hot pursuit of a bug, we might as well clean it up.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, if the very first call to call_rcu_tasks() has irqs disabled,
    it will create the rcu_tasks_kthread with irqs disabled, which will
    result in a splat in the memory allocator, which kthread_run() invokes
    with the expectation that irqs are enabled.

    This commit fixes this problem by deferring kthread creation if called
    with irqs disabled. The first call to call_rcu_tasks() that has irqs
    enabled will create the kthread.
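
    A hedged sketch of the deferral (simplified; enqueue details
    omitted):

    void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)
    {
            unsigned long flags;
            bool havetask = READ_ONCE(rcu_tasks_kthread_ptr) != NULL;

            raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
            /* ... enqueue rhp ... */
            raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);

            /* kthread_run() needs irqs enabled, so defer creation otherwise. */
            if (!havetask && !irqs_disabled_flags(flags))
                    rcu_spawn_tasks_kthread();
    }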

    This bug was detected by rcutorture changes that were motivated by
    Iftekhar Ahmed's mutation-testing efforts.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

15 Jun, 2016

9 commits

    Fix the kcalloc() error-handling case to return a negative error code
    (-ENOMEM) instead of 0, as is done elsewhere in this function.
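
    The pattern being fixed, for illustration:

    buf = kcalloc(n, sizeof(*buf), GFP_KERNEL);
    if (!buf)
            return -ENOMEM;	/* previously returned 0 here */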

    Signed-off-by: Wei Yongjun
    Signed-off-by: Paul E. McKenney

    Wei Yongjun
     
    0day found a boot warning triggered in rcu_perf_writer() on !SMP
    kernels:

    WARN_ON(rcu_gp_is_normal() && gp_exp);

    The root cause is trying to measure expedited grace periods (by
    setting gp_exp to true by default) when all the grace periods are
    normal (Tiny RCU has only normal grace periods).

    However, such a mis-setting would only result in failing to measure
    the performance for a specific kind of grace period, so using a
    WARN_ON to check for it is overkill. We can instead handle this
    inside the rcuperf module with error messages that tell users about
    the mis-setting.

    Therefore this patch removes the WARN_ON in rcu_perf_writer() and
    handles those checks in rcu_perf_init() with plain if() code.

    Moreover, this patch changes the default value of gp_exp in order to
    (1) align with the rcutorture tests and (2) make the default setting
    work for all RCU implementations.

    Suggested-by: Paul E. McKenney
    Signed-off-by: Boqun Feng
    Fixes: http://lkml.kernel.org/r/57411b10.mFvG0+AgcrMXGtcj%fengguang.wu@intel.com
    Signed-off-by: Paul E. McKenney

    Boqun Feng
     
  • This commit removes CONFIG_RCU_TORTURE_TEST_RUNNABLE in favor of the
    already-existing rcutorture.torture_runnable kernel boot parameter.
    It also converts an #ifdef into IS_ENABLED(), saving a few lines of code.
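
    The shape of the conversion, as a hedged sketch (the MODULE-based
    default is an assumption about the resulting code):

    /* Before: */
    #ifdef CONFIG_RCU_TORTURE_TEST_RUNNABLE
    static int torture_runnable = 1;
    #else
    static int torture_runnable;
    #endif

    /* After, with the Kconfig option removed: */
    static int torture_runnable = IS_ENABLED(MODULE);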

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit applies the infamous IS_ENABLED() macro to eliminate a #ifdef.
    It also eliminates the RCU_PERF_TEST_RUNNABLE Kconfig option in favor
    of the already-existing rcuperf.perf_runnable kernel boot parameter.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • People have been having some difficulty finding their way around the
    RCU code. This commit therefore pulls some of the expedited grace-period
    code from tree_plugin.h to a new tree_exp.h file. This commit is strictly
    code movement.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • People have been having some difficulty finding their way around the
    RCU code. This commit therefore pulls some of the expedited grace-period
    code from tree.c to a new tree_exp.h file. This commit is strictly code
    movement, with the exception of a forward declaration that was added
    for the sync_sched_exp_online_cleanup() function.

    A subsequent commit will move the remaining expedited grace-period code
    from tree_plugin.h to tree_exp.h.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
    I think you'll find this condition is superfluous, as the whole
    function is already under an #ifdef of that same condition.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney

    Peter Zijlstra
     
    In the past, RCU grace-period initialization excluded CPU-hotplug
    operations, but this is no longer the case. This commit therefore
    removes an outdated comment in rcu_gp_init() claiming that these
    operations are excluded.

    Reported-by: Lihao Liang
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The comment header for rcu_scheduler_active states that it is used
    to optimize synchronize_sched() at early boot. This is incorrect.
    The synchronize_sched() function instead checks the number of online
    CPUs. This commit therefore replaces the comment's synchronize_sched()
    with synchronize_rcu(), which really does use rcu_scheduler_active for
    this purpose.

    Reported-by: Lihao Liang
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney