26 Sep, 2017

6 commits

  • Add a sysfs file to one-time fail a specific state. This can be used
    to test the state rollback code paths.

    Something like this (hotplug-up.sh):

    #!/bin/bash

    echo 0 > /debug/sched_debug
    echo 1 > /debug/tracing/events/cpuhp/enable

    ALL_STATES=`cat /sys/devices/system/cpu/hotplug/states | cut -d':' -f1`
    STATES=${1:-$ALL_STATES}

    for state in $STATES
    do
        echo 0 > /sys/devices/system/cpu/cpu1/online
        echo 0 > /debug/tracing/trace
        echo "Fail state: $state"
        echo $state > /sys/devices/system/cpu/cpu1/hotplug/fail
        cat /sys/devices/system/cpu/cpu1/hotplug/fail
        echo 1 > /sys/devices/system/cpu/cpu1/online

        cat /debug/tracing/trace > hotfail-${state}.trace

        sleep 1
    done

    Can be used to test for all possible rollback (barring multi-instance)
    scenarios on CPU-up; CPU-down is a trivial modification of the above.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: bigeasy@linutronix.de
    Cc: efault@gmx.de
    Cc: rostedt@goodmis.org
    Cc: max.byungchul.park@gmail.com
    Link: https://lkml.kernel.org/r/20170920170546.972581715@infradead.org

    Peter Zijlstra
     
  • With lockdep-crossrelease we get deadlock reports that span cpu-up and
    cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
    and cpu-down are globally serialized.

    takedown_cpu()
      irq_lock_sparse()
      wait_for_completion(&st->done)

                            cpuhp_thread_fun
                              cpuhp_up_callback
                                cpuhp_invoke_callback
                                  irq_affinity_online_cpu
                                    irq_lock_sparse()
                                    irq_unlock_sparse()

                            complete(&st->done)

    Now that we have consistent AP state, we can trivially separate the
    AP completion between up and down using st->bringup.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Acked-by: max.byungchul.park@gmail.com
    Cc: bigeasy@linutronix.de
    Cc: efault@gmx.de
    Cc: rostedt@goodmis.org
    Link: https://lkml.kernel.org/r/20170920170546.872472799@infradead.org

    Peter Zijlstra
     
  • With lockdep-crossrelease we get deadlock reports that span cpu-up and
    cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
    and cpu-down are globally serialized.

    CPU0                   CPU1                    CPU2
    cpuhp_up_callbacks:    takedown_cpu:           cpuhp_thread_fun:

    cpuhp_state
    irq_lock_sparse()
                           irq_lock_sparse()
                           wait_for_completion()
                                                   cpuhp_state
                                                   complete()

    Now that we have consistent AP state, we can trivially separate the
    AP-work class between up and down using st->bringup.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: max.byungchul.park@gmail.com
    Cc: bigeasy@linutronix.de
    Cc: efault@gmx.de
    Cc: rostedt@goodmis.org
    Link: https://lkml.kernel.org/r/20170920170546.922524234@infradead.org

    Peter Zijlstra
     
  • While the generic callback functions have an 'int' return and thus
    appear to be allowed to return error, this is not true for all states.

    Specifically, what used to be STARTING/DYING are run with IRQs
    disabled from critical parts of CPU bringup/teardown and are not
    allowed to fail. Add WARNs to enforce this rule.

    But since some callbacks are indeed allowed to fail, we can end up in
    a situation where a state-machine rollback itself encounters a
    failure; in that case we're stuck, unable to go forward and unable to
    go back. Also add a WARN for that case.

    AFAICT this is a fundamental 'problem' with no real obvious solution.
    We want the 'prepare' callbacks to allow failure on either up or down.
    Typically on prepare-up this would be things like -ENOMEM from
    resource allocations, and the typical usage in prepare-down would be
    something like -EBUSY to avoid CPUs being taken away.
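
    As a rough illustration of the distinction (names are hypothetical,
    not from this patch): a prepare-stage callback may fail, a
    STARTING-stage callback must not:

    #include <linux/cpuhotplug.h>
    #include <linux/percpu.h>
    #include <linux/slab.h>

    static DEFINE_PER_CPU(void *, foo_buf);

    /* PREPARE-stage callback: runs on the control CPU and may fail. */
    static int foo_prepare_cpu(unsigned int cpu)
    {
        void *buf = kzalloc(PAGE_SIZE, GFP_KERNEL);

        if (!buf)
            return -ENOMEM;     /* propagates and triggers rollback */
        per_cpu(foo_buf, cpu) = buf;
        return 0;
    }

    /* STARTING-stage callback: runs with IRQs disabled on the new CPU
     * during early bringup and is not allowed to fail. */
    static int foo_starting_cpu(unsigned int cpu)
    {
        /* enable per-cpu hardware here: no allocation, no sleeping */
        return 0;
    }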

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: bigeasy@linutronix.de
    Cc: efault@gmx.de
    Cc: rostedt@goodmis.org
    Cc: max.byungchul.park@gmail.com
    Link: https://lkml.kernel.org/r/20170920170546.819539119@infradead.org

    Peter Zijlstra
     
  • There is currently no explicit state change on rollback. That is,
    st->bringup, st->rollback and st->target are not consistent when doing
    the rollback.

    Rework the AP state handling to be more coherent. This does mean we
    have to do a second AP kick-and-wait for rollback, but since rollback
    is the slow path of a slowpath, this really should not matter.

    Take this opportunity to simplify the AP thread function to only run a
    single callback per invocation. This unifies the three single/up/down
    modes it supports. The looping it used to do for up/down is achieved
    by retaining should_run and relying on the main smpboot_thread_fn()
    loop.

    (I have most of a patch that does the same for the BP state handling,
    but that's not critical and gets a little complicated because
    CPUHP_BRINGUP_CPU does the AP handoff from a callback, which gets
    recursive @st usage; I still have to de-fugly that.)

    [ tglx: Move cpuhp_down_callbacks() et al. into the HOTPLUG_CPU section to
    avoid gcc complaining about unused functions. Make the HOTPLUG_CPU
    one piece instead of having two consecutive ifdef sections of the
    same type. ]

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: bigeasy@linutronix.de
    Cc: efault@gmx.de
    Cc: rostedt@goodmis.org
    Cc: max.byungchul.park@gmail.com
    Link: https://lkml.kernel.org/r/20170920170546.769658088@infradead.org

    Peter Zijlstra
     
  • Currently the rollback of multi-instance states is handled inside
    cpuhp_invoke_callback(). The problem is that when we want to allow an
    explicit state change for rollback, we need to return from the
    function without doing the rollback.

    Change cpuhp_invoke_callback() to optionally return the multi-instance
    state, such that rollback can be done from a subsequent call.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: bigeasy@linutronix.de
    Cc: efault@gmx.de
    Cc: rostedt@goodmis.org
    Cc: max.byungchul.park@gmail.com
    Link: https://lkml.kernel.org/r/20170920170546.720361181@infradead.org

    Peter Zijlstra
     

26 Jul, 2017

1 commit

  • RCU callbacks must be migrated away from an outgoing CPU, and this is
    done near the end of the CPU-hotplug operation, after the outgoing CPU is
    long gone. Unfortunately, this means that other CPU-hotplug callbacks
    can execute while the outgoing CPU's callbacks are still immobilized
    on the long-gone CPU's callback lists. If any of these CPU-hotplug
    callbacks must wait, either directly or indirectly, for the invocation
    of any of the immobilized RCU callbacks, the system will hang.

    This commit avoids such hangs by migrating the callbacks away from the
    outgoing CPU immediately upon its departure, shortly after the return
    from __cpu_die() in takedown_cpu(). Thus, RCU is able to advance these
    callbacks and invoke them, which allows all the after-the-fact CPU-hotplug
    callbacks to wait on these RCU callbacks without risk of a hang.

    While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage()
    and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including
    dead code on the one hand and to avoid define-without-use warnings on the
    other hand.

    Reported-by: Jeffrey Hugo
    Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org
    Signed-off-by: Paul E. McKenney
    Cc: Thomas Gleixner
    Cc: Sebastian Andrzej Siewior
    Cc: Ingo Molnar
    Cc: Anna-Maria Gleixner
    Cc: Boris Ostrovsky
    Cc: Richard Weinberger

    Paul E. McKenney
     

20 Jul, 2017

1 commit

  • If cpuhp_store_callbacks() is called for CPUHP_AP_ONLINE_DYN or
    CPUHP_BP_PREPARE_DYN, which are the indicators for dynamically allocated
    states, then cpuhp_store_callbacks() allocates a new dynamic state. The
    first allocation in each range returns CPUHP_AP_ONLINE_DYN or
    CPUHP_BP_PREPARE_DYN.

    If cpuhp_remove_state() is invoked for one of these states, then there is
    no protection against the allocation mechanism. So the removal, which
    should clear the callbacks and the name, gets a new state assigned and
    clears that one.

    As a consequence the state which should be cleared stays initialized. A
    consecutive CPU hotplug operation dereferences the state callbacks and
    accesses either freed or reused memory, resulting in crashes.

    Add a protection against this by checking the name argument for NULL. If
    it's NULL, it's a removal. If not, it's an allocation.
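
    For reference, a typical dynamic-state user looks roughly like this
    (callback and state names hypothetical); the positive return value is
    the allocated state, which must be handed back to
    cpuhp_remove_state() on the removal path:

    #include <linux/cpuhotplug.h>
    #include <linux/init.h>

    static enum cpuhp_state foo_state;

    static int foo_online(unsigned int cpu)  { return 0; }
    static int foo_offline(unsigned int cpu) { return 0; }

    static int __init foo_init(void)
    {
        int ret;

        /* CPUHP_AP_ONLINE_DYN allocates a free state in the dynamic range */
        ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "foo:online",
                                foo_online, foo_offline);
        if (ret < 0)
            return ret;
        foo_state = ret;        /* the state actually allocated */
        return 0;
    }

    static void __exit foo_exit(void)
    {
        /* Removal passes a NULL name internally; it must never be
         * mistaken for a fresh allocation. */
        cpuhp_remove_state(foo_state);
    }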

    [ tglx: Added a comment and massaged changelog ]

    Fixes: 5b7aa87e0482 ("cpu/hotplug: Implement setup/removal interface")
    Signed-off-by: Ethan Barnes
    Signed-off-by: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "Srivatsa S. Bhat"
    Cc: Sebastian Siewior
    Cc: Paul McKenney
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/DM2PR04MB398242FC7776D603D9F99C894A60@DM2PR04MB398.namprd04.prod.outlook.com

    Ethan Barnes
     

12 Jul, 2017

1 commit

  • The move of the unpark functions to the control thread moved the BUG_ON()
    there as well. While it made some sense in the idle thread of the upcoming
    CPU, it's bogus to crash the control thread on the already online CPU,
    especially as the function has a return value and the callsite is prepared
    to handle an error return.

    Replace it with a WARN_ON_ONCE() and return a proper error code.

    Fixes: 9cd4f1a4e7a8 ("smp/hotplug: Move unparking of percpu threads to the control CPU")
    Rightfully-ranted-at-by: Linus Torvalds
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

06 Jul, 2017

1 commit

  • Vikram reported the following backtrace:

    BUG: scheduling while atomic: swapper/7/0/0x00000002
    CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680
    schedule
    schedule_hrtimeout_range_clock
    schedule_hrtimeout
    wait_task_inactive
    __kthread_bind_mask
    __kthread_bind
    __kthread_unpark
    kthread_unpark
    cpuhp_online_idle
    cpu_startup_entry
    secondary_start_kernel

    He analyzed correctly that a parked cpu hotplug thread of an offlined CPU
    was still on the runqueue when the CPU came back online and tried to unpark
    it. This causes the thread which invoked kthread_unpark() to call
    wait_task_inactive() and subsequently schedule() with preemption disabled.
    His proposed workaround was to "make sure" that a parked thread has
    scheduled out when the CPU goes offline, so the situation cannot happen.

    But that's still wrong, because the root cause is neither the fact
    that the percpu thread is still on the runqueue nor that preemption is
    disabled; the latter could simply be solved by enabling preemption
    before calling kthread_unpark().

    The real issue is that the calling thread is the idle task of the upcoming
    CPU, which is not supposed to call anything which might sleep. The moron,
    who wrote that code, missed completely that kthread_unpark() might end up
    in schedule().

    The solution is simpler than expected. The thread which controls the
    hotplug operation is waiting for the CPU to call complete() on the hotplug
    state completion. So the idle task of the upcoming CPU can set its state to
    CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the control
    task on a different CPU, which then can safely do the unpark and kick the
    now unparked hotplug thread of the upcoming CPU to complete the bringup to
    the final target state.

    Control CPU                        AP

    bringup_cpu();
      __cpu_up()  ------------>
                                       bringup_ap();
      bringup_wait_for_ap()
        wait_for_completion();
                                       cpuhp_online_idle();
                  <------------        complete();
        unpark(AP->stopper);
        unpark(AP->hotplugthread);
                                       while(1)
                                         do_idle();
        kick(AP->hotplugthread);
        wait_for_completion();         hotplug_thread()
                                         run_online_callbacks();
                                         complete();
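
    The resulting idle-side hook is, in rough sketch form (structure and
    field names per the description above; details may differ):

    /* Called from the idle loop of the upcoming CPU. It must not sleep,
     * so it only marks the state and completes; the control CPU then
     * performs the unpark from a context that is allowed to block. */
    void cpuhp_online_idle(enum cpuhp_state state)
    {
        struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);

        /* Nothing to do for the boot CPU */
        if (state != CPUHP_AP_ONLINE_IDLE)
            return;

        st->state = CPUHP_AP_ONLINE_IDLE;
        complete(&st->done);    /* wakes bringup_wait_for_ap() */
    }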

    Fixes: 8df3e07e7f21 ("cpu/hotplug: Let upcoming cpu bring itself fully up")
    Reported-by: Vikram Mulukutla
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Cc: Sebastian Sewior
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707042218020.2131@nanos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

04 Jul, 2017

1 commit

  • Pull SMP hotplug updates from Thomas Gleixner:
    "This update is primarily a cleanup of the CPU hotplug locking code.

    The hotplug locking mechanism is an open coded RWSEM, which allows
    recursive locking. The main problem with that is the recursive nature
    as it evades the full lockdep coverage and hides potential deadlocks.

    The rework replaces the open coded RWSEM with a percpu RWSEM and
    establishes full lockdep coverage that way.

    The bulk of the changes fix up recursive locking issues and address
    the now fully reported potential deadlocks all over the place. Some of
    these deadlocks have been observed in the RT tree, but on mainline the
    probability was low enough to hide them away."

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
    cpu/hotplug: Constify attribute_group structures
    powerpc: Only obtain cpu_hotplug_lock if called by rtasd
    ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init
    cpu/hotplug: Remove unused check_for_tasks() function
    perf/core: Don't release cred_guard_mutex if not taken
    cpuhotplug: Link lock stacks for hotplug callbacks
    acpi/processor: Prevent cpu hotplug deadlock
    sched: Provide is_percpu_thread() helper
    cpu/hotplug: Convert hotplug locking to percpu rwsem
    s390: Prevent hotplug rwsem recursion
    arm: Prevent hotplug rwsem recursion
    arm64: Prevent cpu hotplug rwsem recursion
    kprobes: Cure hotplug lock ordering issues
    jump_label: Reorder hotplug lock and jump_label_lock
    perf/tracing/cpuhotplug: Fix locking order
    ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus()
    PCI: Replace the racy recursion prevention
    PCI: Use cpu_hotplug_disable() instead of get_online_cpus()
    perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode()
    x86/perf: Drop EXPORT of perf_check_microcode
    ...

    Linus Torvalds
     

30 Jun, 2017

1 commit

    attribute_groups are not supposed to change at runtime. All functions
    working with attribute_groups provided by <linux/sysfs.h> work with
    const attribute_group.

    So mark the non-const structs as const:

    File size before:
       text    data  bss    dec   hex  filename
      12582   15361   20  27963  6d3b  kernel/cpu.o

    File size after adding 'const':
       text    data  bss    dec   hex  filename
      12710   15265   20  27995  6d5b  kernel/cpu.o
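
    The change itself is mechanical; sketched here on the hotplug sysfs
    attributes (member details elided). Marking the struct const moves it
    from .data to .rodata, which is why text grows while data shrinks:

    /* Before: writable at runtime, lands in .data */
    static struct attribute_group cpuhp_cpu_attr_group = {
        .attrs = cpuhp_cpu_attrs,
        .name = "hotplug",
    };

    /* After: const, placed in .rodata */
    static const struct attribute_group cpuhp_cpu_attr_group = {
        .attrs = cpuhp_cpu_attrs,
        .name = "hotplug",
    };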

    Signed-off-by: Arvind Yadav
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: anna-maria@linutronix.de
    Cc: bigeasy@linutronix.de
    Cc: boris.ostrovsky@oracle.com
    Cc: rcochran@linutronix.de
    Link: http://lkml.kernel.org/r/f9079e94e12b36d245e7adbf67d312bc5d0250c6.1498737970.git.arvind.yadav.cs@gmail.com
    Signed-off-by: Ingo Molnar

    Arvind Yadav
     

23 Jun, 2017

1 commit

    If a CPU goes offline, interrupts affine to the CPU are moved away. If
    the outgoing CPU is the last CPU in the affinity mask, the migration
    code breaks the affinity and sets it to all online CPUs.

    This is a problem for affinity managed interrupts, as CPU hotplug is
    often used for power management purposes. If the affinity is broken,
    the interrupt is no longer affine to the CPUs to which it was
    allocated.

    The affinity spreading makes it possible to lay out multi queue
    devices so that they are assigned to a single CPU or a group of CPUs.
    If the last CPU goes offline, the queue is no longer used, so the
    interrupt can be shut down gracefully and parked until one of the
    assigned CPUs comes online again.

    Add a graceful shutdown mechanism into the irq affinity breaking code path,
    mark the irq as MANAGED_SHUTDOWN and leave the affinity mask unmodified.

    In the online path, scan the active interrupts for managed interrupts.
    If such an interrupt is functional and the newly online CPU is part of
    the affinity mask, restart it if it is marked MANAGED_SHUTDOWN, or, if
    it is already started up, try to add the CPU back to the effective
    affinity mask.

    Originally-by: Christoph Hellwig
    Signed-off-by: Thomas Gleixner
    Cc: Jens Axboe
    Cc: Marc Zyngier
    Cc: Michael Ellerman
    Cc: Keith Busch
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20170619235447.273417334@linutronix.de

    Thomas Gleixner
     

13 Jun, 2017

1 commit

  • clang -Wunused-function found one remaining function that was
    apparently meant to be removed in a recent code cleanup:

    kernel/cpu.c:565:20: warning: unused function 'check_for_tasks' [-Wunused-function]

    Sebastian explained: the function became unused unintentionally, but
    there is already a failure check when a task cannot be removed from
    the outgoing CPU in the scheduler code, so bringing it back would not
    add any value.

    Fixes: 530e9b76ae8f ("cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions")
    Signed-off-by: Arnd Bergmann
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Boris Ostrovsky
    Cc: Anna-Maria Gleixner
    Link: http://lkml.kernel.org/r/20170608085544.2257132-1-arnd@arndb.de
    Signed-off-by: Thomas Gleixner

    Arnd Bergmann
     

03 Jun, 2017

1 commit

    If a custom CPU target is specified and that one is not available _or_
    can't be interrupted, then the code returns to userland without
    dropping a lock, as noticed by lockdep:

    |echo 133 > /sys/devices/system/cpu/cpu7/hotplug/target
    | ================================================
    | [ BUG: lock held when returning to user space! ]
    | ------------------------------------------------
    | bash/503 is leaving the kernel with locks still held!
    | 1 lock held by bash/503:
    | #0: (device_hotplug_lock){+.+...}, at: [] lock_device_hotplug_sysfs+0x10/0x40

    So release the lock then.
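
    A minimal sketch of the corrected pattern in the sysfs store path
    (the validation and transition helpers here are hypothetical; only
    the lock/unlock calls are the real device core API):

    static ssize_t target_store_sketch(const char *buf, size_t count)
    {
        int ret, target;

        ret = kstrtoint(buf, 10, &target);
        if (ret)
            return ret;

        ret = lock_device_hotplug_sysfs();
        if (ret)
            return ret;

        if (!cpuhp_target_valid(target)) {      /* hypothetical check */
            ret = -EINVAL;
            goto out;   /* previously returned here with the lock held */
        }

        ret = do_cpu_target_transition(target); /* hypothetical */
    out:
        unlock_device_hotplug();
        return ret ? ret : count;
    }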

    Fixes: 757c989b9994 ("cpu/hotplug: Make target state writeable")
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170602142714.3ogo25f2wbq6fjpj@linutronix.de

    Sebastian Andrzej Siewior
     

26 May, 2017

6 commits

  • The CPU hotplug callbacks are not covered by lockdep versus the cpu hotplug
    rwsem.

    CPU0                                 CPU1
    cpuhp_setup_state(STATE, startup, teardown);
      cpus_read_lock();
      invoke_callback_on_ap();
        kick_hotplug_thread(ap);
        wait_for_completion();           hotplug_thread_fn()
                                           lock(m);
                                           do_stuff();
                                           unlock(m);

    Lockdep does not know about this dependency and will not trigger on the
    following code sequence:

    lock(m);
    cpus_read_lock();

    Add a lockdep map and connect the initiator's lock chain with the
    hotplug thread's lock chain, so potential deadlocks can be detected.
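
    The mechanism is roughly a shared lockdep map acquired on both sides
    (a sketch of the idea, not the exact patch; the wrapper functions are
    illustrative):

    #include <linux/lockdep.h>

    static struct lockdep_map cpuhp_state_lock_map =
        STATIC_LOCKDEP_MAP_INIT("cpuhp_state", &cpuhp_state_lock_map);

    /* Initiator side: wraps "kick AP thread and wait". */
    static void cpuhp_issue_call_sketch(void)
    {
        lock_map_acquire(&cpuhp_state_lock_map);
        /* kick_hotplug_thread(ap); wait_for_completion(...); */
        lock_map_release(&cpuhp_state_lock_map);
    }

    /* Hotplug thread side: wraps the callback invocation, so any mutex
     * taken by a callback is chained after cpuhp_state_lock_map. */
    static void cpuhp_thread_fun_sketch(void)
    {
        lock_map_acquire(&cpuhp_state_lock_map);
        /* cpuhp_invoke_callback(...); */
        lock_map_release(&cpuhp_state_lock_map);
    }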

    Signed-off-by: Thomas Gleixner
    Tested-by: Paul E. McKenney
    Acked-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20170524081549.709375845@linutronix.de

    Thomas Gleixner
     
  • There are no more (known) nested calls to get_online_cpus() and all
    observed lock ordering problems have been addressed.

    Replace the magic nested 'rwsem' hackery with a percpu-rwsem.
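
    After the conversion, the locking boils down to something like this
    (a sketch consistent with the description above):

    #include <linux/percpu-rwsem.h>

    DEFINE_STATIC_PERCPU_RWSEM(cpu_hotplug_lock);

    void cpus_read_lock(void)
    {
        percpu_down_read(&cpu_hotplug_lock);
    }

    void cpus_read_unlock(void)
    {
        percpu_up_read(&cpu_hotplug_lock);
    }

    void cpus_write_lock(void)
    {
        percpu_down_write(&cpu_hotplug_lock);
    }

    void cpus_write_unlock(void)
    {
        percpu_up_write(&cpu_hotplug_lock);
    }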

    Signed-off-by: Thomas Gleixner
    Tested-by: Paul E. McKenney
    Acked-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20170524081549.447014063@linutronix.de

    Thomas Gleixner
     
  • takedown_cpu() is a cpu hotplug function invoking stop_machine(). The cpu
    hotplug machinery holds the hotplug lock for write.

    stop_machine() invokes get_online_cpus() as well. This is correct, but
    prevents the conversion of the hotplug locking to a percpu rwsem.

    Use stop_machine_cpuslocked() to avoid the nested call.
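
    The call site change in takedown_cpu() is then roughly:

    /* Before: nested read-lock of the hotplug lock via stop_machine() */
    err = stop_machine(take_cpu_down, NULL, cpumask_of(cpu));

    /* After: takedown_cpu() already holds cpu_hotplug_lock for write */
    err = stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu));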

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Tested-by: Paul E. McKenney
    Acked-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20170524081548.423292433@linutronix.de

    Sebastian Andrzej Siewior
     
  • Add cpuslocked() variants for the multi instance registration so this can
    be called from a cpus_read_lock() protected region.
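
    Usage from an already-locked region might look like this (driver
    names and surrounding structure hypothetical):

    #include <linux/cpuhotplug.h>

    struct foo_dev {
        struct hlist_node node;     /* hotplug multi-instance linkage */
        /* ... */
    };

    static enum cpuhp_state foo_hp_state;   /* set up elsewhere */

    static int foo_register(struct foo_dev *foo)
    {
        int ret;

        cpus_read_lock();
        /* ... other setup that must not race with CPU hotplug ... */
        ret = cpuhp_state_add_instance_cpuslocked(foo_hp_state,
                                                  &foo->node);
        cpus_read_unlock();
        return ret;
    }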

    Signed-off-by: Thomas Gleixner
    Tested-by: Paul E. McKenney
    Acked-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20170524081547.321782217@linutronix.de

    Thomas Gleixner
     
  • Some call sites of cpuhp_setup/remove_state[_nocalls]() are within a
    cpus_read locked region.

    cpuhp_setup/remove_state[_nocalls]() call cpus_read_lock() as well, which
    is possible in the current implementation but prevents converting the
    hotplug locking to a percpu rwsem.

    Provide locked versions of the interfaces to avoid nested calls to
    cpus_read_lock().
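
    For example (state string and callbacks hypothetical):

    static int foo_online(unsigned int cpu)  { return 0; }
    static int foo_offline(unsigned int cpu) { return 0; }

    static int foo_setup(void)
    {
        int ret;

        cpus_read_lock();
        /* ... work that already required the hotplug lock ... */
        ret = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
                                           "foo:online",
                                           foo_online, foo_offline);
        cpus_read_unlock();
        return ret < 0 ? ret : 0;
    }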

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Tested-by: Paul E. McKenney
    Acked-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20170524081547.239600868@linutronix.de

    Sebastian Andrzej Siewior
     
    The counting 'rwsem' hackery of get|put_online_cpus() is going to be
    replaced by a percpu rwsem.

    Rename the functions to make it clear that it's locking and not some
    refcount style interface. These new functions will be used for the
    preparatory patches which make the code ready for the percpu rwsem
    conversion.

    Rename all instances in the cpu hotplug code while at it.
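
    As I read the description, the mapping is:

    /*
     * Old refcount-style name      New locking name
     * ------------------------     --------------------
     * get_online_cpus()        ->  cpus_read_lock()
     * put_online_cpus()        ->  cpus_read_unlock()
     * cpu_hotplug_begin()      ->  cpus_write_lock()
     * cpu_hotplug_done()       ->  cpus_write_unlock()
     */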

    Signed-off-by: Thomas Gleixner
    Tested-by: Paul E. McKenney
    Acked-by: Paul E. McKenney
    Acked-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20170524081547.080397752@linutronix.de

    Thomas Gleixner
     

26 Mar, 2017

1 commit

  • Since commit 383776fa7527 ("locking/lockdep: Handle statically initialized
    PER_CPU locks properly") we try to collapse per-cpu locks into a single
    class by giving them all the same key. For this key we choose the canonical
    address of the per-cpu object, which would be the offset into the per-cpu
    area.

    This has two problems:

    - there is a case where we run a !0 lock->key through static_obj()
      and expect this to pass; it doesn't for canonical pointers.

    - 0 is a valid canonical address.

    Cure both issues by redefining the canonical address as the address of the
    per-cpu variable on the boot CPU.

    Since I didn't want to rely on CPU0 being the boot-cpu, or even existing at
    all, track the boot CPU in a variable.

    Fixes: 383776fa7527 ("locking/lockdep: Handle statically initialized PER_CPU locks properly")
    Reported-by: kernel test robot
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: Borislav Petkov
    Cc: Sebastian Andrzej Siewior
    Cc: linux-mm@kvack.org
    Cc: wfg@linux.intel.com
    Cc: kernel test robot
    Cc: LKP
    Link: http://lkml.kernel.org/r/20170320114108.kbvcsuepem45j5cr@hirez.programming.kicks-ass.net
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     

15 Mar, 2017

1 commit

    The setup/remove_state/instance() functions in the hotplug core code
    are serialized against concurrent CPU hotplug, but unfortunately not
    serialized against themselves.

    As a consequence, a concurrent invocation of these functions results
    in corruption of the callback machinery, because two instances try to
    invoke callbacks on remote CPUs at the same time. This results in
    missing callback invocations and initiator threads waiting forever on
    the completion.

    The obvious solution of replacing get_online_cpus() with
    cpu_hotplug_begin() is not possible because at least one call site
    calls into these functions from a get_online_cpus() locked region.

    Extend the protection scope of the cpuhp_state_mutex from solely protecting
    the state arrays to cover the callback invocation machinery as well.

    Fixes: 5b7aa87e0482 ("cpu/hotplug: Implement setup/removal interface")
    Reported-and-tested-by: Bart Van Assche
    Signed-off-by: Sebastian Andrzej Siewior
    Cc: hpa@zytor.com
    Cc: mingo@kernel.org
    Cc: akpm@linux-foundation.org
    Cc: torvalds@linux-foundation.org
    Link: http://lkml.kernel.org/r/20170314150645.g4tdyoszlcbajmna@linutronix.de
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
     

18 Jan, 2017

1 commit

  • After the recent removal of the hotplug notifiers the variable 'hasdied' in
    _cpu_down() is set but no longer read, leading to the following GCC warning
    when building with 'make W=1':

    kernel/cpu.c:767:7: warning: variable ‘hasdied’ set but not used [-Wunused-but-set-variable]

    Fix it by removing the variable.

    Fixes: 530e9b76ae8f ("cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions")
    Signed-off-by: Tobias Klauser
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20170117143501.20893-1-tklauser@distanz.ch
    Signed-off-by: Thomas Gleixner

    Tobias Klauser
     

16 Jan, 2017

1 commit

  • Mathieu reported that the LTTNG modules are broken as of 4.10-rc1 due to
    the removal of the cpu hotplug notifiers.

    Usually I don't care much about out of tree modules, but LTTNG is widely
    used in distros. There are two ways to solve that:

    1) Reserve a hotplug state for LTTNG

    2) Add a dynamic range for the prepare states.

    While #1 is the simplest solution, #2 is the proper one as we can convert
    in tree users, which do not care about ordering, to the dynamic range as
    well.

    Add a dynamic range which allows LTTNG to request states in the prepare
    stage.

    Reported-and-tested-by: Mathieu Desnoyers
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Mathieu Desnoyers
    Cc: Peter Zijlstra
    Cc: Sebastian Sewior
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701101353010.3401@nanos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

27 Dec, 2016

1 commit

  • The attempt to prevent overwriting an active state resulted in a
    disaster which effectively disables all dynamically allocated hotplug
    states.

    Cleanup the mess.

    Fixes: dc280d936239 ("cpu/hotplug: Prevent overwriting of callbacks")
    Reported-by: Markus Trippelsdorf
    Reported-by: Boris Ostrovsky
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

25 Dec, 2016

2 commits

  • hotcpu_notifier(), cpu_notifier(), __hotcpu_notifier(), __cpu_notifier(),
    register_hotcpu_notifier(), register_cpu_notifier(),
    __register_hotcpu_notifier(), __register_cpu_notifier(),
    unregister_hotcpu_notifier(), unregister_cpu_notifier(),
    __unregister_hotcpu_notifier(), __unregister_cpu_notifier()

    are unused now. Remove them and all related code.

    Remove also the now pointless cpu notifier error injection mechanism. The
    states can be executed step by step and error rollback is the same as cpu
    down, so any state transition can be tested w/o requiring the notifier
    error injection.

    Some CPU hotplug states are kept as they are (ab)used for hotplug state
    tracking.

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20161221192112.005642358@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Developers manage to overwrite states blindly without thought. That's fatal
    and hard to debug. Add sanity checks to make it fail.

    This requires restructuring the code so that the dynamic state
    allocation happens in the same lock protected section as the actual
    store. Otherwise the previous assignment of 'Reserved' to the name
    field would trigger the overwrite check.
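
    Inside the store path the check presumably boils down to something
    like this (a sketch; field layout as in kernel/cpu.c of that era):

    /* Refuse to overwrite an already occupied state. Because the
     * dynamic allocation now happens in the same locked section as the
     * store, the transient "Reserved" name can no longer trip this. */
    if (name && sp->name)
        return -EBUSY;

    sp->startup.single  = startup;
    sp->teardown.single = teardown;
    sp->name = name;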

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Link: http://lkml.kernel.org/r/20161221192111.675234535@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

16 Dec, 2016

1 commit

    When invoked with the CPUHP_AP_ONLINE_DYN state, __cpuhp_setup_state()
    is expected to return a positive value, namely the hotplug state that
    the routine assigns.

    Signed-off-by: Boris Ostrovsky
    Cc: linux-pm@vger.kernel.org
    Cc: viresh.kumar@linaro.org
    Cc: bigeasy@linutronix.de
    Cc: rjw@rjwysocki.net
    Cc: xen-devel@lists.xenproject.org
    Link: http://lkml.kernel.org/r/1481814058-4799-2-git-send-email-boris.ostrovsky@oracle.com
    Signed-off-by: Thomas Gleixner

    Boris Ostrovsky
     

08 Dec, 2016

1 commit

  • Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
    notifiers when HOTPLUG_CPU=y while the registration might succeed even
    when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
    might keep a stale notifier on the list on the manual clean up during
    the pool tear down and thus corrupt the list, resulting in the
    following splat:

    [ 144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
    [ 144.971337] IP: [] raw_notifier_chain_register+0x1b/0x40

    [ 145.122628] Call Trace:
    [ 145.125086] [] __register_cpu_notifier+0x18/0x20
    [ 145.131350] [] zswap_pool_create+0x273/0x400
    [ 145.137268] [] __zswap_param_set+0x1fc/0x300
    [ 145.143188] [] ? trace_hardirqs_on+0xd/0x10
    [ 145.149018] [] ? kernel_param_lock+0x28/0x30
    [ 145.154940] [] ? __might_fault+0x4f/0xa0
    [ 145.160511] [] zswap_compressor_param_set+0x17/0x20
    [ 145.167035] [] param_attr_store+0x5c/0xb0
    [ 145.172694] [] module_attr_store+0x1d/0x30
    [ 145.178443] [] sysfs_kf_write+0x4f/0x70
    [ 145.183925] [] kernfs_fop_write+0x149/0x180
    [ 145.189761] [] __vfs_write+0x18/0x40
    [ 145.194982] [] vfs_write+0xb2/0x1a0
    [ 145.200122] [] SyS_write+0x52/0xa0
    [ 145.205177] [] entry_SYSCALL_64_fastpath+0x12/0x17

    This can be even triggered manually by changing
    /sys/module/zswap/parameters/compressor multiple times.

    Fix this issue by making the unregister APIs symmetric to the register
    APIs, so there are no surprises.

    Fixes: 47e627bc8c9a ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
    Reported-and-tested-by: Yu Zhao
    Signed-off-by: Michal Hocko
    Cc: linux-mm@kvack.org
    Cc: Andrew Morton
    Cc: Dan Streetman
    Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
    Signed-off-by: Thomas Gleixner

    Michal Hocko
     

16 Oct, 2016

1 commit

  • Use distinctive name for cpu_hotplug.dep_map to avoid the actual
    cpu_hotplug.lock appearing as cpu_hotplug.lock#2 in lockdep splats.

    Signed-off-by: Joonas Lahtinen
    Reviewed-by: Chris Wilson
    Acked-by: Gautham R. Shenoy
    Cc: Andrew Morton
    Cc: Daniel Vetter
    Cc: Gautham R . Shenoy
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: intel-gfx@lists.freedesktop.org
    Cc: trivial@kernel.org
    Signed-off-by: Ingo Molnar

    Joonas Lahtinen
     

04 Oct, 2016

2 commits

  • Pull CPU hotplug updates from Thomas Gleixner:
    "Yet another batch of cpu hotplug core updates and conversions:

    - Provide core infrastructure for multi instance drivers so the
    drivers do not have to keep custom lists.

    - Convert custom lists to the new infrastructure. The block-mq custom
    list conversion comes through the block tree and makes the diffstat
    tip over to more lines removed than added.

    - Handle unbalanced hotplug enable/disable calls more gracefully.

    - Remove the obsolete CPU_STARTING/DYING notifier support.

    - Convert another batch of notifier users.

    The relayfs changes which conflicted with the conversion have been
    shipped to me by Andrew.

    The remaining lot is targeted for 4.10 so that we finally can remove
    the rest of the notifiers"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    cpufreq: Fix up conversion to hotplug state machine
    blk/mq: Reserve hotplug states for block multiqueue
    x86/apic/uv: Convert to hotplug state machine
    s390/mm/pfault: Convert to hotplug state machine
    mips/loongson/smp: Convert to hotplug state machine
    mips/octeon/smp: Convert to hotplug state machine
    fault-injection/cpu: Convert to hotplug state machine
    padata: Convert to hotplug state machine
    cpufreq: Convert to hotplug state machine
    ACPI/processor: Convert to hotplug state machine
    virtio scsi: Convert to hotplug state machine
    oprofile/timer: Convert to hotplug state machine
    block/softirq: Convert to hotplug state machine
    lib/irq_poll: Convert to hotplug state machine
    x86/microcode: Convert to hotplug state machine
    sh/SH-X3 SMP: Convert to hotplug state machine
    ia64/mca: Convert to hotplug state machine
    ARM/OMAP/wakeupgen: Convert to hotplug state machine
    ARM/shmobile: Convert to hotplug state machine
    arm64/FP/SIMD: Convert to hotplug state machine
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Expedited grace-period changes, most notably avoiding having user
    threads drive expedited grace periods, using a workqueue instead.

    - Miscellaneous fixes, including a performance fix for lists that was
    sent with the lists modifications.

    - CPU hotplug updates, most notably providing exact CPU-online
    tracking for RCU. This will in turn allow removal of the checks
    supporting RCU's prior heuristic that was based on the assumption
    that CPUs would take no longer than one jiffy to come online.

    - Torture-test updates.

    - Documentation updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
    list: Expand list_first_entry_or_null()
    torture: TOROUT_STRING(): Insert a space between flag and message
    rcuperf: Consistently insert space between flag and message
    rcutorture: Print out barrier error as document says
    torture: Add task state to writer-task stall printk()s
    torture: Convert torture_shutdown() to hrtimer
    rcutorture: Convert to hotplug state machine
    cpu/hotplug: Get rid of CPU_STARTING reference
    rcu: Provide exact CPU-online tracking for RCU
    rcu: Avoid redundant quiescent-state chasing
    rcu: Don't use modular infrastructure in non-modular code
    sched: Make wake_up_nohz_cpu() handle CPUs going offline
    rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads
    rcu: Use RCU's online-CPU state for expedited IPI retry
    rcu: Exclude RCU-offline CPUs from expedited grace periods
    rcu: Make expedited RCU CPU stall warnings respond to controls
    rcu: Stop disabling expedited RCU CPU stall warnings
    rcu: Drive expedited grace periods from workqueue
    rcu: Consolidate expedited grace period machinery
    documentation: Record reason for rcu_head two-byte alignment
    ...

    Linus Torvalds
     

07 Sep, 2016

2 commits

  • Install the callbacks via the state machine.

    Signed-off-by: Richard Weinberger
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sebastian Andrzej Siewior
    Reviewed-by: Sebastian Andrzej Siewior
    Cc: Peter Zijlstra
    Cc: Pekka Enberg
    Cc: linux-mm@kvack.org
    Cc: rt@linutronix.de
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrew Morton
    Cc: Christoph Lameter
    Link: http://lkml.kernel.org/r/20160823125319.abeapfjapf2kfezp@linutronix.de
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
     
    Install the callbacks via the state machine. They are installed at
    runtime, but relay_prepare_cpu() does not need to be invoked for the
    boot CPU because relay_open() has not been invoked yet and there are
    no pools that need to be created.
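
    Such a conversion typically reduces to registering the prepare
    callback once at init, roughly like this (the state constant and
    callback exist in the tree; the exact string label is approximate):

    static int __init relay_hotplug_init(void)
    {
        int ret;

        /* Prepare-stage callback only; no teardown needed. */
        ret = cpuhp_setup_state(CPUHP_RELAY_PREPARE, "lib/relay:prepare",
                                relay_prepare_cpu, NULL);
        WARN_ON(ret < 0);
        return 0;
    }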

    Signed-off-by: Richard Weinberger
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sebastian Andrzej Siewior
    Reviewed-by: Sebastian Andrzej Siewior
    Cc: Peter Zijlstra
    Cc: rt@linutronix.de
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20160818125731.27256-3-bigeasy@linutronix.de
    Signed-off-by: Thomas Gleixner

    Richard Weinberger