15 Apr, 2015

1 commit

  • Pull RCU changes from Ingo Molnar:
    "The main changes in this cycle were:

    - changes permitting use of call_rcu() and friends very early in
    boot, for example, before rcu_init() is invoked.

    - add in-kernel API to enable and disable expediting of normal RCU
    grace periods.

    - improve RCU's handling of (hotplug-) outgoing CPUs.

    - NO_HZ_FULL_SYSIDLE fixes.

    - tiny-RCU updates to make it more tiny.

    - documentation updates.

    - miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
    cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well
    cpu: Defer smpboot kthread unparking until CPU known to scheduler
    rcu: Associate quiescent-state reports with grace period
    rcu: Yet another fix for preemption and CPU hotplug
    rcu: Add diagnostics to grace-period cleanup
    rcutorture: Default to grace-period-initialization delays
    rcu: Handle outgoing CPUs on exit from idle loop
    cpu: Make CPU-offline idle-loop transition point more precise
    rcu: Eliminate ->onoff_mutex from rcu_node structure
    rcu: Process offlining and onlining only at grace-period start
    rcu: Move rcu_report_unblock_qs_rnp() to common code
    rcu: Rework preemptible expedited bitmask handling
    rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs
    rcutorture: Enable slow grace-period initializations
    rcu: Provide diagnostic option to slow down grace-period initialization
    rcu: Detect stalls caused by failure to propagate up rcu_node tree
    rcu: Eliminate empty HOTPLUG_CPU ifdef
    rcu: Simplify sync_rcu_preempt_exp_init()
    rcu: Put all orphan-callback-related code under same comment
    rcu: Consolidate offline-CPU callback initialization
    ...

    Linus Torvalds
     

13 Apr, 2015

1 commit

  • Currently, smpboot_unpark_threads() is invoked before the incoming CPU
    has been added to the scheduler's runqueue structures. This can cause
    the unparked kthread to run on the wrong CPU, since the correct CPU
    isn't fully set up yet.

    That causes a sporadic, hard-to-debug boot crash triggering on some
    systems, reported by Borislav Petkov, and bisected down to:

    2a442c9c6453 ("x86: Use common outgoing-CPU-notification code")

    This patch places smpboot_unpark_threads() in a CPU hotplug
    notifier with priority set so that these kthreads are unparked just after
    the CPU has been added to the runqueues.

    Reported-and-tested-by: Borislav Petkov
    Signed-off-by: Paul E. McKenney
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
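
    A minimal sketch of the general pattern (added for illustration, not part
    of the commit): a CPU-hotplug notifier whose .priority field controls when
    it runs relative to other CPU_ONLINE callbacks, using the notifier API of
    that era. The example_* names are hypothetical.

        #include <linux/cpu.h>
        #include <linux/init.h>
        #include <linux/notifier.h>
        #include <linux/printk.h>

        /* Hypothetical callback: runs once the new CPU is known to the scheduler. */
        static int example_cpu_callback(struct notifier_block *nb,
                                        unsigned long action, void *hcpu)
        {
                unsigned int cpu = (unsigned long)hcpu;

                switch (action & ~CPU_TASKS_FROZEN) {
                case CPU_ONLINE:
                        pr_info("cpu%u online, unparking per-CPU helpers\n", cpu);
                        break;
                }
                return NOTIFY_OK;
        }

        static struct notifier_block example_cpu_notifier = {
                .notifier_call = example_cpu_callback,
                .priority      = 10,    /* higher value = called earlier on online */
        };

        static int __init example_init(void)
        {
                register_cpu_notifier(&example_cpu_notifier);
                return 0;
        }
        early_initcall(example_init);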
     

03 Apr, 2015

2 commits

  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the cleanup function for a dead cpu and invoke it
    directly from the cpu down code. Make it conditional on
    CPU_HOTPLUG as well.

    Temporary change, will be refined in the future.

    Signed-off-by: Thomas Gleixner
    [ Rebased, added clockevents_notify() removal ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
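
    The shape of the change, sketched (not the literal diff; the helper name
    follows the tick code introduced around this series and should be treated
    as illustrative):

        #include <linux/clockchips.h>
        #include <linux/tick.h>

        static void example_cpu_dead_cleanup(int cpu)
        {
                /*
                 * Before: funneled through the multiplex call:
                 *
                 *     clockevents_notify(CLOCK_EVT_NOTIFY_CPU_DEAD, &cpu);
                 *
                 * After: one explicit call from the cpu-down code, built in
                 * only when CPU hotplug is configured.
                 */
        #ifdef CONFIG_HOTPLUG_CPU
                tick_cleanup_dead_cpu(cpu);
        #endif
        }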
     
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the tick_handover call and invoke it explicitly from
    the hotplug code. This is a temporary solution that will be cleaned
    up in later patches.

    Signed-off-by: Thomas Gleixner
    [ Rebase ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: John Stultz
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1658173.RkEEILFiQZ@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

02 Apr, 2015

1 commit

  • It was found, when doing a hotplug stress test on POWER, that the
    machine either hit softlockups or rcu_sched stall warnings. The
    issue was traced to commit:

    7cba160ad789 ("powernv/cpuidle: Redesign idle states management")

    which exposed the cpu_down() race with hrtimer based broadcast mode:

    5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")

    The race is the following:

    Assume CPU1 is the CPU which holds the hrtimer broadcasting duty
    before it is taken down.

    CPU0                                    CPU1

    cpu_down()                              take_cpu_down()
                                            disable_interrupts()

    cpu_die()

     while (CPU1 != CPU_DEAD) {
       msleep(100);
         switch_to_idle();
           stop_cpu_timer();
             schedule_broadcast();
     }

    tick_cleanup_cpu_dead()
       take_over_broadcast()

    After CPU1 has disabled interrupts it cannot handle the broadcast
    hrtimer anymore, so CPU0 will be stuck forever.

    Fix this by explicitly taking over broadcast duty before cpu_die().

    This is a temporary workaround. What we really want is a callback
    in the clockevent device which allows us to do that from the dying
    CPU by pushing the hrtimer onto a different cpu. That might involve
    an IPI and is definitely more complex than this immediate fix.

    Changelog was picked up from:

    https://lkml.org/lkml/2015/2/16/213

    Suggested-by: Thomas Gleixner
    Tested-by: Nicolas Pitre
    Signed-off-by: Preeti U. Murthy
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: mpe@ellerman.id.au
    Cc: nicolas.pitre@linaro.org
    Cc: peterz@infradead.org
    Cc: rjw@rjwysocki.net
    Fixes: http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html
    Link: http://lkml.kernel.org/r/20150330092410.24979.59887.stgit@preeti.in.ibm.com
    [ Merged it to the latest timer tree, renamed the callback, tidied up the changelog. ]
    Signed-off-by: Ingo Molnar

    Preeti U Murthy
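
    A rough sketch of the idea, with a hypothetical helper name (the commit's
    actual function differs): before the CPU running cpu_down() starts waiting
    for the victim in cpu_die(), it explicitly pulls the hrtimer-broadcast duty
    away from the dying CPU, which can no longer take interrupts.

        /*
         * Hypothetical helper: assumed to move the broadcast hrtimer duty
         * from 'dying_cpu' to some other online CPU.
         */
        void example_tick_broadcast_takeover(unsigned int dying_cpu);

        /* Called on the CPU running cpu_down(), before it waits in cpu_die(). */
        static void example_prepare_cpu_die(unsigned int dying_cpu)
        {
                /*
                 * The dying CPU has already disabled interrupts and cannot
                 * fire the broadcast hrtimer anymore; take the duty over here
                 * so the msleep() loop waiting for CPU_DEAD can still be woken.
                 */
                example_tick_broadcast_takeover(dying_cpu);
        }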
     

13 Mar, 2015

1 commit

  • This commit uses a per-CPU variable to make the CPU-offline code path
    through the idle loop more precise, so that the outgoing CPU is
    guaranteed to make it into the idle loop before it is powered off.
    This commit is in preparation for putting the RCU offline-handling
    code on this code path, which will eliminate the magic one-jiffy
    wait that RCU uses as the maximum time for an outgoing CPU to get
    all the way through the scheduler.

    The magic one-jiffy wait for incoming CPUs remains a separate issue.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
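
    A minimal sketch of the mechanism, with hypothetical names (added for
    illustration; a real implementation also needs memory barriers and arch
    hooks): the outgoing CPU sets a per-CPU flag once it reaches the idle
    loop, and the CPU tearing it down waits on that flag instead of a
    one-jiffy guess.

        #include <linux/percpu.h>
        #include <linux/sched.h>

        /* Hypothetical per-CPU marker: set when the offline-bound CPU hits idle. */
        static DEFINE_PER_CPU(bool, example_cpu_in_idle);

        /* Called by the outgoing CPU from the idle loop. */
        void example_report_idle_entry(void)
        {
                this_cpu_write(example_cpu_in_idle, true);
        }

        /* Called by the CPU performing the teardown. */
        void example_wait_for_idle_entry(unsigned int cpu)
        {
                while (!per_cpu(example_cpu_in_idle, cpu))
                        cpu_relax();
        }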
     

07 Jan, 2015

1 commit

  • Commit b2c4623dcd07 ("rcu: More on deadlock between CPU hotplug and expedited
    grace periods") introduced another problem that can easily be reproduced by
    starting/stopping cpus in a loop.

    E.g.:
    for i in `seq 5000`; do
        echo 1 > /sys/devices/system/cpu/cpu1/online
        echo 0 > /sys/devices/system/cpu/cpu1/online
    done

    Will result in:
    INFO: task /cpu_start_stop:1 blocked for more than 120 seconds.
    Call Trace:
    ([] __schedule+0x406/0x91c)
    [] cpu_hotplug_begin+0xd0/0xd4
    [] _cpu_up+0x3e/0x1c4
    [] cpu_up+0xb6/0xd4
    [] device_online+0x80/0xc0
    [] online_store+0x90/0xb0
    ...

    And a deadlock.

    The problem is that if the last reference in put_online_cpus() can't get
    the cpu_hotplug.lock, the puts_pending count is incremented, but a sleeping
    active_writer might never be woken up and will therefore never exit the
    loop in cpu_hotplug_begin().

    This fix removes puts_pending and turns refcount into an atomic variable. We
    also introduce a wait queue for the active_writer, to avoid possible races and
    use-after-free. There is no need to take the lock in put_online_cpus() anymore.

    Can't reproduce it with this fix.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Paul E. McKenney

    David Hildenbrand
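
    A condensed sketch of the scheme, with simplified names (illustrative only;
    real code must also block new readers while a writer is active): an atomic
    reader count plus a waitqueue, so the last reader reliably wakes a writer
    without having to take cpu_hotplug.lock.

        #include <linux/atomic.h>
        #include <linux/sched.h>
        #include <linux/wait.h>

        static atomic_t example_refcount = ATOMIC_INIT(0);
        static DECLARE_WAIT_QUEUE_HEAD(example_writer_wq);

        void example_get_online_cpus(void)
        {
                atomic_inc(&example_refcount);          /* reader enters */
        }

        void example_put_online_cpus(void)
        {
                /* Last reader out wakes any writer sleeping in begin(). */
                if (atomic_dec_and_test(&example_refcount))
                        wake_up(&example_writer_wq);
        }

        void example_cpu_hotplug_begin(void)
        {
                /* The writer sleeps until all readers are gone, instead of
                 * looping on a lock it may never be handed. */
                wait_event(example_writer_wq, atomic_read(&example_refcount) == 0);
        }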
     

04 Nov, 2014

1 commit

  • A long string of get_online_cpus() with each followed by a
    put_online_cpu() that fails to acquire cpu_hotplug.lock can result in
    overflow of the cpu_hotplug.puts_pending counter. Although this is
    perhaps improbable, a system with absolutely no CPU-hotplug operations
    will have an arbitrarily long time in which this overflow could occur.
    This commit therefore adds overflow checks to get_online_cpus() and
    try_get_online_cpus().

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Pranith Kumar

    Paul E. McKenney
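
    A small sketch of the kind of guard described (field names simplified, not
    the actual kernel/cpu.c layout): fold puts_pending back into refcount
    before it can grow without bound, and warn if the counters ever look
    inconsistent.

        #include <linux/bug.h>

        struct example_hotplug_state {
                int refcount;
                int puts_pending;
        };

        static void example_apply_puts_pending(struct example_hotplug_state *st)
        {
                if (st->puts_pending) {
                        st->refcount -= st->puts_pending;
                        st->puts_pending = 0;
                }
                /* An overflow or unmatched put would show up as a negative count. */
                WARN_ON(st->refcount < 0);
        }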
     

23 Oct, 2014

1 commit

  • Commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and
    expedited grace periods) was incomplete. Although it did eliminate
    deadlocks involving synchronize_sched_expedited()'s acquisition of
    cpu_hotplug.lock via get_online_cpus(), it did nothing about the similar
    deadlock involving acquisition of this same lock via put_online_cpus().
    This deadlock became apparent with testing involving hibernation.

    This commit therefore changes put_online_cpus() acquisition of this lock
    to be conditional, and increments a new cpu_hotplug.puts_pending field
    in case of acquisition failure. Then cpu_hotplug_begin() checks for this
    new field being non-zero, and applies any changes to cpu_hotplug.refcount.

    Reported-by: Jiri Kosina
    Signed-off-by: Paul E. McKenney
    Tested-by: Jiri Kosina
    Tested-by: Borislav Petkov

    Paul E. McKenney
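
    The deferral pattern described above, as a brief sketch with simplified
    names (illustrative, not the kernel/cpu.c source): if the lock cannot be
    taken, record the put and let the hotplug writer reconcile it later.

        #include <linux/atomic.h>
        #include <linux/mutex.h>

        static DEFINE_MUTEX(example_hotplug_lock);
        static int example_refcount;
        static atomic_t example_puts_pending = ATOMIC_INIT(0);

        void example_put_online_cpus(void)
        {
                if (mutex_trylock(&example_hotplug_lock)) {
                        example_refcount--;                     /* normal path */
                        mutex_unlock(&example_hotplug_lock);
                } else {
                        /* A writer holds the lock: defer the decrement. */
                        atomic_inc(&example_puts_pending);
                }
        }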
     

19 Sep, 2014

1 commit

  • Currently, the expedited grace-period primitives do get_online_cpus().
    This greatly simplifies their implementation, but means that calls
    to them holding locks that are acquired by CPU-hotplug notifiers (to
    say nothing of calls to these primitives from CPU-hotplug notifiers)
    can deadlock. But this is starting to become inconvenient, as can be
    seen here: https://lkml.org/lkml/2014/8/5/754. The problem in this
    case is that some developers need to acquire a mutex from a CPU-hotplug
    notifier, but also need to hold it across a synchronize_rcu_expedited().
    As noted above, this currently results in deadlock.

    This commit avoids the deadlock and retains the simplicity by creating
    a try_get_online_cpus(), which returns false if the get_online_cpus()
    reference count could not immediately be incremented. If a call to
    try_get_online_cpus() returns true, the expedited primitives operate as
    before. If a call returns false, the expedited primitives fall back to
    normal grace-period operations. This falling back of course results in
    increased grace-period latency, but only during times when CPU hotplug
    operations are actually in flight. The effect should therefore be
    negligible during normal operation.

    Signed-off-by: Paul E. McKenney
    Cc: Josh Triplett
    Cc: "Rafael J. Wysocki"
    Tested-by: Lan Tianyu

    Paul E. McKenney
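
    A sketch of the fallback shape (in the actual change the check lives inside
    the expedited primitives themselves; this caller-side version just
    illustrates the idea using the RCU APIs of that era):

        #include <linux/cpu.h>
        #include <linux/rcupdate.h>

        static void example_sync_sched(void)
        {
                if (try_get_online_cpus()) {
                        synchronize_sched_expedited();  /* fast path */
                        put_online_cpus();
                } else {
                        /* CPU hotplug in flight: fall back to a normal GP. */
                        synchronize_sched();
                }
        }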
     

05 Jul, 2014

1 commit

  • 1) Iterate through all threads in the system.
    Check all threads, not only group leaders.

    2) Check p->on_rq instead of p->state and cputime.
    A preempted task in a !TASK_RUNNING state or a
    just-created task may be queued; we want those to be
    reported too.

    3) Use read_lock() instead of write_lock().
    This function does not change any structures, and
    read_lock() is enough.

    Signed-off-by: Kirill Tkhai
    Reviewed-by: Srikar Dronamraju
    Cc: Andrew Morton
    Cc: Ben Segall
    Cc: Fabian Frederick
    Cc: Gautham R. Shenoy
    Cc: Konstantin Khorenko
    Cc: Linus Torvalds
    Cc: Michael wang
    Cc: Mike Galbraith
    Cc: Paul Gortmaker
    Cc: Paul Turner
    Cc: Rafael J. Wysocki
    Cc: Srivatsa S. Bhat
    Cc: Todd E Brandt
    Cc: Toshi Kani
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1403684395.3462.44.camel@tkhai
    Signed-off-by: Ingo Molnar

    Kirill Tkhai
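
    A sketch of the resulting loop shape (the reporting helper is hypothetical;
    the walk itself uses the standard tasklist APIs):

        #include <linux/sched.h>
        #include <linux/spinlock.h>

        static void example_report_queued_task(struct task_struct *p)
        {
                /* hypothetical: warn about a task still queued on a dead CPU */
        }

        static void example_scan_queued_tasks(void)
        {
                struct task_struct *g, *p;

                /* Read-only walk, so read_lock() is enough. */
                read_lock(&tasklist_lock);
                for_each_process_thread(g, p) {         /* all threads, not only leaders */
                        if (p->on_rq)                   /* queued, whatever p->state says */
                                example_report_queued_task(p);
                }
                read_unlock(&tasklist_lock);
        }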
     

13 Jun, 2014

1 commit

  • Pull more ACPI and power management updates from Rafael Wysocki:
    "These are fixups on top of the previous PM+ACPI pull request,
    regression fixes (ACPI hotplug, cpufreq ppc-corenet), other bug fixes
    (ACPI reset, cpufreq), new PM trace points for system suspend
    profiling and a copyright notice update.

    Specifics:

    - I didn't remember correctly that Hans de Goede's ACPI video
    patches actually didn't flip the video.use_native_backlight
    default, although we had discussed that and decided to do that.
    Since I said we would do that in the previous PM+ACPI pull request,
    make that change for real now.

    - ACPI bus check notifications for PCI host bridges don't cause the
    bus below the host bridge to be checked for changes as they should
    because of a mistake in the ACPI-based PCI hotplug (ACPIPHP)
    subsystem that forgets to add hotplug contexts to PCI host bridge
    ACPI device objects. Create hotplug contexts for PCI host bridges
    too as appropriate.

    - Revert recent cpufreq commit related to the big.LITTLE cpufreq
    driver that breaks arm64 builds.

    - Fix for a regression in the ppc-corenet cpufreq driver introduced
    during the 3.15 cycle and causing the driver to use the remainder
    from do_div instead of the quotient. From Ed Swarthout.

    - Resets triggered by panic activate a BUG_ON() in vmalloc.c on
    systems where the ACPI reset register is located in memory address
    space. Fix from Randy Wright.

    - Fix for a problem with cpufreq governors where the decisions they
    make may be suboptimal because they use deferrable timers for CPU
    load sampling. From Srivatsa S Bhat.

    - Fix for a problem with the Tegra cpufreq driver where the CPU
    frequency is temporarily switched to a "stable" level that is
    different from both the initial and target frequencies during
    transitions, which sometimes causes udelay() to expire earlier than
    it should. From Viresh Kumar.

    - New trace points and rework of some existing trace points for
    system suspend/resume profiling from Todd Brandt.

    - Assorted cpufreq fixes and cleanups from Stratos Karafotis and
    Viresh Kumar.

    - Copyright notice update for suspend-and-cpuhotplug.txt from
    Srivatsa S Bhat"

    * tag 'pm+acpi-3.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI / hotplug / PCI: Add hotplug contexts to PCI host bridges
    PM / sleep: trace events for device PM callbacks
    cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR
    cpufreq: tegra: update comment for clarity
    cpufreq: intel_pstate: Remove duplicate CPU ID check
    cpufreq: Mark CPU0 driver with CPUFREQ_NEED_INITIAL_FREQ_CHECK flag
    PM / Documentation: Update copyright in suspend-and-cpuhotplug.txt
    cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info'
    cpufreq: governor: Be friendly towards latency-sensitive bursty workloads
    PM / sleep: trace events for suspend/resume
    cpufreq: ppc-corenet-cpu-freq: do_div use quotient
    Revert "cpufreq: Enable big.LITTLE cpufreq driver on arm64"
    cpufreq: Tegra: implement intermediate frequency callbacks
    cpufreq: add support for intermediate (stable) frequencies
    ACPI / video: Change the default for video.use_native_backlight to 1
    ACPI: Fix bug when ACPI reset register is implemented in system memory

    Linus Torvalds
     

07 Jun, 2014

1 commit

  • Adds trace events that give finer resolution into suspend/resume. These
    events are graphed in the timelines generated by the analyze_suspend.py
    script. They represent large areas of time consumed that are typical to
    suspend and resume.

    The event is triggered by calling the function "trace_suspend_resume"
    with three arguments: a string (the name of the event to be displayed
    in the timeline), an integer (case specific number, such as the power
    state or cpu number), and a boolean (where true is used to denote the start
    of the timeline event, and false to denote the end).

    The suspend_resume trace event reproduces the data that the machine_suspend
    trace event did, so the latter has been removed.

    Signed-off-by: Todd Brandt
    Acked-by: Steven Rostedt
    Signed-off-by: Rafael J. Wysocki

    Todd E Brandt
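
    Usage shape of the trace point (sketch; in the kernel the string is wrapped
    with the TPS()/tracepoint_string() helper, a plain literal is used here for
    brevity):

        #include <linux/suspend.h>
        #include <trace/events/power.h>

        static void example_enter_state(suspend_state_t state)
        {
                /* name shown in the timeline, phase-specific value, start flag */
                trace_suspend_resume("machine_suspend", state, true);

                /* ... the actual suspend work for this phase runs here ... */

                trace_suspend_resume("machine_suspend", state, false);
        }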
     

05 Jun, 2014

1 commit

  • printk() with no level converted to pr_warn() (error case);
    printk() with no level converted to pr_info() (disabling non-boot CPUs);
    other printk() calls converted to their respective levels.

    Signed-off-by: Fabian Frederick
    Cc: "Rafael J. Wysocki"
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
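
    The kind of conversion described, in sketch form (the message texts are
    illustrative, not the exact kernel/cpu.c strings):

        #include <linux/printk.h>

        static void example_report(int cpu, int err)
        {
                /* Before: no level, so the message got the default log level:
                 *
                 *     printk("Error taking CPU%d down: %d\n", cpu, err);
                 */
                if (err)
                        pr_warn("Error taking CPU%d down: %d\n", cpu, err);

                pr_info("Disabling non-boot CPUs ...\n");
        }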
     

22 May, 2014

1 commit

  • Lai found that:

    WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
    ...
    migration_cpu_stop+0x1d/0x22

    was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
    always a sub-set of cpu_online_mask.

    This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active madness").

    So set active and online at the same time to avoid this particular
    problem.

    Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness")
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Gautham R. Shenoy
    Cc: Linus Torvalds
    Cc: Michael wang
    Cc: Paul Gortmaker
    Cc: Rafael J. Wysocki
    Cc: Srivatsa S. Bhat
    Cc: Toshi Kani
    Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     

20 Mar, 2014

2 commits

  • The following method of CPU hotplug callback registration is not safe
    due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
    and the cpu_hotplug.lock.

    get_online_cpus();

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    register_cpu_notifier(&foobar_cpu_notifier);

    put_online_cpus();

    The deadlock is shown below:

    CPU 0                                   CPU 1
    -----                                   -----

    Acquire cpu_hotplug.lock
    [via get_online_cpus()]

                                            CPU online/offline operation
                                            takes cpu_add_remove_lock
                                            [via cpu_maps_update_begin()]

    Try to acquire
    cpu_add_remove_lock
    [via register_cpu_notifier()]

                                            CPU online/offline operation
                                            tries to acquire cpu_hotplug.lock
                                            [via cpu_hotplug_begin()]

                        *** DEADLOCK! ***

    The problem here is that callback registration takes the locks in one order
    whereas the CPU hotplug operations take the same locks in the opposite order.
    To avoid this issue and to provide a race-free method to register CPU hotplug
    callbacks (along with initialization of already online CPUs), introduce new
    variants of the callback registration APIs that simply register the callbacks
    without holding the cpu_add_remove_lock during the registration. That way,
    we can avoid the ABBA scenario. However, we will need to hold the
    cpu_add_remove_lock throughout the entire critical section, to protect updates
    to the callback/notifier chain.

    This can be achieved by writing the callback registration code as follows:

    cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ]

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    /* This doesn't take the cpu_add_remove_lock */
    __register_cpu_notifier(&foobar_cpu_notifier);

    cpu_maps_update_done(); [ or cpu_notifier_register_done(); see below ]

    Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
    because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
    notifiers, and hence get_online_cpus() cannot provide the necessary
    synchronization to protect the callback/notifier chains against concurrent
    reads and writes. On the other hand, since the cpu_add_remove_lock protects
    the entire hotplug operation (including CPU_POST_DEAD), we can use
    cpu_maps_update_begin/done() to guarantee proper synchronization.

    Also, since cpu_maps_update_begin/done() is like a super-set of
    get/put_online_cpus(), the former naturally protects the critical sections
    from concurrent hotplug operations.

    Since the names cpu_maps_update_begin/done() don't make much sense in CPU
    hotplug callback registration scenarios, we'll introduce new APIs named
    cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done().

    In summary, introduce the lockless variants of un/register_cpu_notifier() and
    also export the cpu_notifier_register_begin/done() APIs for use by modules.
    This way, we provide a race-free way to register hotplug callbacks as well as
    perform initialization for the CPUs that are already online.

    Cc: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Acked-by: Oleg Nesterov
    Acked-by: Toshi Kani
    Reviewed-by: Gautham R. Shenoy
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
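
    Put together as a self-contained sketch of the race-free pattern (the
    foobar_* names follow the commit's own example; init_cpu() stands in for
    driver-specific per-CPU setup):

        #include <linux/cpu.h>
        #include <linux/init.h>
        #include <linux/notifier.h>

        static void init_cpu(unsigned int cpu)
        {
                /* driver-specific per-CPU setup goes here */
        }

        static int foobar_cpu_callback(struct notifier_block *nb,
                                       unsigned long action, void *hcpu)
        {
                if ((action & ~CPU_TASKS_FROZEN) == CPU_ONLINE)
                        init_cpu((unsigned long)hcpu);
                return NOTIFY_OK;
        }

        static struct notifier_block foobar_cpu_notifier = {
                .notifier_call = foobar_cpu_callback,
        };

        static int __init foobar_init(void)
        {
                unsigned int cpu;

                cpu_notifier_register_begin();

                for_each_online_cpu(cpu)
                        init_cpu(cpu);

                /* Registers without taking the cpu_add_remove_lock itself. */
                __register_cpu_notifier(&foobar_cpu_notifier);

                cpu_notifier_register_done();
                return 0;
        }
        device_initcall(foobar_init);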
     
  • Add lockdep annotations for get/put_online_cpus() and
    cpu_hotplug_begin()/cpu_hotplug_end().

    Cc: Ingo Molnar
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Gautham R. Shenoy
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Gautham R. Shenoy
     

13 Nov, 2013

2 commits

  • Commit 6acce3ef8:

    sched: Remove get_online_cpus() usage

    tries to do sync_sched/rcu() inside _cpu_down() but triggers:

    INFO: task swapper/0:1 blocked for more than 120 seconds.
    ...
    [] synchronize_rcu+0x2c/0x30
    [] _cpu_down+0x2b2/0x340
    ...

    It was caused by the fact that, in the rcu boost case, we rely on the
    smpboot thread to finish the rcu callback, but that thread has already
    been parked before the sync point here, which leads to the endless
    sync_sched/rcu().

    This patch swaps the order of smpboot_park_threads() and
    sync_sched/rcu() to fix the bug.

    Reported-by: Fengguang Wu
    Tested-by: Fengguang Wu
    Signed-off-by: Michael Wang
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/5282EDC0.6060003@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Michael wang
     
  • cpu_up() has #ifdef CONFIG_MEMORY_HOTPLUG code blocks, which call
    mem_online_node() to put its node online if offlined and then call
    build_all_zonelists() to initialize the zone list.

    These steps are specific to memory hotplug, and should be managed in
    mm/memory_hotplug.c. lock_memory_hotplug() should also be held across
    all of these steps.

    For this reason, this patch replaces mem_online_node() with
    try_online_node(), which performs all of these steps with
    lock_memory_hotplug() held. try_online_node() is named after
    try_offline_node() as they have similar purpose.

    There is no functional change in this patch.

    Signed-off-by: Toshi Kani
    Reviewed-by: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
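
    The shape of the change in cpu_up(), sketched rather than quoted:

        #include <linux/memory_hotplug.h>
        #include <linux/topology.h>

        static int example_prepare_node(unsigned int cpu)
        {
                /*
                 * Before (roughly): cpu_up() open-coded the memory-hotplug
                 * steps, calling mem_online_node() and then building the
                 * zonelists itself under CONFIG_MEMORY_HOTPLUG.
                 *
                 * After: a single call that does all of it with
                 * lock_memory_hotplug() held inside mm/memory_hotplug.c.
                 */
                return try_online_node(cpu_to_node(cpu));
        }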
     

16 Oct, 2013

1 commit

  • Remove get_online_cpus() usage from the scheduler; there's 4 sites that
    use it:

    - sched_init_smp(); where it's completely superfluous since we're in
    'early' boot and there simply cannot be any hotplugging.

    - sched_getaffinity(); we already take a raw spinlock to protect the
    task cpus_allowed mask, this disables preemption and therefore
    also stabilizes cpu_online_mask as that's modified using
    stop_machine. However switch to active mask for symmetry with
    sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
    mask stability by inserting sync_rcu/sched() into _cpu_down.

    - sched_setaffinity(); we don't appear to need get_online_cpus()
    either, there's two sites where hotplug appears relevant:
    * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
    for the cpuset case we hold task_lock, which is a spinlock and
    thus for mainline disables preemption (might cause pain on RT).
    * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
    preemption properly disabled; also it already deals with hotplug
    races explicitly where it releases them.

    - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
    us with a little trickery. By adding a sync_sched/rcu() after the
    CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
    cpu_active_mask. Use these to validate that both our cpus are active
    when queueing the stop work before we queue the stop_machine works
    for take_cpu_down().

    Signed-off-by: Peter Zijlstra
    Cc: "Srivatsa S. Bhat"
    Cc: Paul McKenney
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Steven Rostedt
    Cc: Oleg Nesterov
    Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

13 Aug, 2013

1 commit

  • CPU system maps are protected with reader/writer locks. The reader
    lock, get_online_cpus(), assures that the maps are not updated while
    holding the lock. The writer lock, cpu_hotplug_begin(), is used to
    update the cpu maps along with cpu_maps_update_begin().

    However, the ACPI processor handler updates the cpu maps without
    holding the writer lock.

    acpi_map_lsapic() is called from acpi_processor_hotadd_init() to
    update cpu_possible_mask and cpu_present_mask. acpi_unmap_lsapic()
    is called from acpi_processor_remove() to update cpu_possible_mask.
    Currently, they are either unprotected or protected with the reader
    lock, which is not correct.

    For example, the get_online_cpus() below is supposed to assure that
    cpu_possible_mask is not changed while the code is iterating with
    for_each_possible_cpu().

    get_online_cpus();
    for_each_possible_cpu(cpu) {
    :
    }
    put_online_cpus();

    However, this lock has no protection with CPU hotplug since the ACPI
    processor handler does not use the writer lock when it updates
    cpu_possible_mask. The reader lock does not serialize within the
    readers.

    This patch protects them with the writer lock with cpu_hotplug_begin()
    along with cpu_maps_update_begin(), which must be held before calling
    cpu_hotplug_begin(). It also protects arch_register_cpu() /
    arch_unregister_cpu(), which creates / deletes a sysfs cpu device
    interface. For this purpose it changes cpu_hotplug_begin() and
    cpu_hotplug_done() to global and exports them in cpu.h.

    Signed-off-by: Toshi Kani
    Signed-off-by: Rafael J. Wysocki

    Toshi Kani
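
    The writer-side locking pattern the commit establishes for code that
    modifies the CPU maps (sketch; the body is a placeholder for the
    arch/ACPI-specific map update):

        #include <linux/cpu.h>

        void example_update_cpu_maps(void)
        {
                /* cpu_maps_update_begin() must be held before cpu_hotplug_begin(),
                 * mirroring what _cpu_up()/_cpu_down() do. */
                cpu_maps_update_begin();
                cpu_hotplug_begin();

                /* ... update cpu_possible_mask / cpu_present_mask here ... */

                cpu_hotplug_done();
                cpu_maps_update_done();
        }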
     

15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

13 Jun, 2013

1 commit

  • There are instances in the kernel where we would like to disable CPU
    hotplug (from sysfs) during some important operation. Today the freezer
    code depends on this and the code to do it was kinda tailor-made for
    that.

    Restructure the code and make it generic enough to be useful for other
    usecases too.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Robin Holt
    Cc: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Russ Anderson
    Cc: Robin Holt
    Cc: Russell King
    Cc: Guan Xuetao
    Cc: Shawn Guo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srivatsa S. Bhat
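
    Sketch of the resulting generic usage (the critical-section body is
    illustrative):

        #include <linux/cpu.h>

        void example_critical_operation(void)
        {
                /* Keep sysfs-initiated CPU hotplug out while we work. */
                cpu_hotplug_disable();

                /* ... work that must not race with CPU hotplug ... */

                cpu_hotplug_enable();
        }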
     

14 Feb, 2013

1 commit

  • Use the smpboot thread infrastructure. Mark the stopper thread
    selfparking and park it after it has finished the take_cpu_down()
    work.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Paul McKenney
    Cc: Srivatsa S. Bhat
    Cc: Arjan van de Veen
    Cc: Paul Turner
    Cc: Richard Weinberger
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130131120741.686315164@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

28 Jan, 2013

1 commit

  • This is in preparation for the full dynticks feature. While
    remotely reading the cputime of a task running in a full
    dynticks CPU, we'll need to do some extra computation. This
    way we can account the time it spent tickless in userspace
    since its last cputime snapshot.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

12 Dec, 2012

1 commit

  • Pull x86 BSP hotplug changes from Ingo Molnar:
    "This tree enables CPU#0 (the boot processor) to be onlined/offlined on
    x86, just like any other CPU. Enabled on Intel CPUs for now.

    Allowing this required the identification and fixing of latent CPU#0
    assumptions (such as CPU#0 initializations, etc.) in the x86
    architecture code, plus the identification of barriers to
    BSP-offlining, such as active PIC interrupts which can only be
    serviced on the BSP.

    It's behind a default-off option, and there's a debug option that
    allows the automatic testing of this feature.

    The motivation of this feature is to allow and prepare for true
    CPU-hotplug hardware support: recent changes to MCE support enable us
    to detect a deteriorating but not yet hard-failing L1/L2 cache on a
    CPU that could be soft-unplugged - or a failing L3 cache on a
    multi-socket system.

    Note that true hardware hot-plug is not yet fully enabled by this,
    because that requires a special platform wakeup sequence to be sent to
    the freshly powered up CPU#0. Future patches for this are planned,
    once such a platform exists. Chicken and egg"

    * 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, topology: Debug CPU0 hotplug
    x86/i387.c: Initialize thread xstate only on CPU0 only once
    x86, hotplug: Handle retrigger irq by the first available CPU
    x86, hotplug: The first online processor saves the MTRR state
    x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
    x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
    x86-32, hotplug: Add start_cpu0() entry point to head_32.S
    x86-64, hotplug: Add start_cpu0() entry point to head_64.S
    kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
    x86, hotplug, suspend: Online CPU0 for suspend or hibernate
    x86, hotplug: Support functions for CPU0 online/offline
    x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
    x86, Kconfig: Add config switch for CPU0 hotplug
    doc: Add x86 CPU0 online/offline feature

    Linus Torvalds
     

15 Nov, 2012

2 commits

  • Even if acpi_processor_handle_eject() offlines a cpu, there is a chance
    that the cpu is onlined again afterwards. So the patch closes the window
    by using get/put_online_cpus().

    Why does the patch change _cpu_up() logic?

    The patch cares about the race between cpu hot-remove and _cpu_up(). If
    the patch does not change it, there is the following race:

    hot-remove cpu                       | _cpu_up()
    -------------------------------------+------------------------------------
    call acpi_processor_handle_eject()   |
    call cpu_down()                      |
    call get_online_cpus()               |
                                         | call cpu_hotplug_begin() and stop here
    call arch_unregister_cpu()           |
    call acpi_unmap_lsapic()             |
    call put_online_cpus()               |
                                         | start and continue _cpu_up()
    return acpi_processor_remove()       |
    continue hot-remove the cpu          |

    So _cpu_up() can continue, and cpu hot-remove can also continue. If the
    patch changes the _cpu_up() logic, the race disappears as below:

    hot-remove cpu                       | _cpu_up()
    -------------------------------------+------------------------------------
    call acpi_processor_handle_eject()   |
    call cpu_down()                      |
    call get_online_cpus()               |
                                         | call cpu_hotplug_begin() and stop here
    call arch_unregister_cpu()           |
    call acpi_unmap_lsapic()             |
      cpu's cpu_present is set           |
      to false by set_cpu_present()      |
    call put_online_cpus()               |
                                         | start _cpu_up()
                                         | check cpu_present() and return -EINVAL
    return acpi_processor_remove()       |
    continue hot-remove the cpu          |

    Signed-off-by: Yasuaki Ishimatsu
    Reviewed-by: Srivatsa S. Bhat
    Reviewed-by: Toshi Kani
    Signed-off-by: Rafael J. Wysocki

    Yasuaki Ishimatsu
     
  • cpu_hotplug_pm_callback should have a higher priority than
    bsp_pm_callback, which depends on cpu_hotplug_pm_callback to disable cpu
    hotplug in order to avoid a race during bsp online checking.

    This is to highlight the priorities between the two callbacks in case
    people may overlook the order.

    Ideally the priorities should be defined in a macro/enum instead of fixed
    values. To do that, a separate patchset may be pushed which will touch
    several other generic files and is out of scope of this patchset.

    Signed-off-by: Fenghua Yu
    Link: http://lkml.kernel.org/r/1352835171-3958-7-git-send-email-fenghua.yu@intel.com
    Reviewed-by: Srivatsa S. Bhat
    Acked-by: Rafael J. Wysocki
    Signed-off-by: H. Peter Anvin

    Fenghua Yu
     

09 Oct, 2012

1 commit

  • The synchronization between CPU hotplug readers and writers is achieved
    by means of refcounting, safeguarded by the cpu_hotplug.lock.

    get_online_cpus() increments the refcount, whereas put_online_cpus()
    decrements it. If we ever hit an imbalance between the two, we end up
    compromising the guarantees of the hotplug synchronization i.e, for
    example, an extra call to put_online_cpus() can end up allowing a
    hotplug reader to execute concurrently with a hotplug writer.

    So, add a WARN_ON() in put_online_cpus() to detect such cases where the
    refcount can go negative, and also attempt to fix it up, so that we can
    continue to run.

    Signed-off-by: Srivatsa S. Bhat
    Reviewed-by: Yasuaki Ishimatsu
    Cc: Jiri Kosina
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srivatsa S. Bhat
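
    A compact sketch of the defensive check described (names simplified, not
    the kernel/cpu.c source):

        #include <linux/bug.h>
        #include <linux/mutex.h>
        #include <linux/sched.h>

        struct example_hotplug {
                struct mutex lock;
                int refcount;
                struct task_struct *active_writer;
        };

        void example_put_online_cpus(struct example_hotplug *h)
        {
                mutex_lock(&h->lock);
                /* A negative count means an unmatched put: warn and fix up. */
                if (WARN_ON(--h->refcount < 0))
                        h->refcount++;
                if (!h->refcount && h->active_writer)
                        wake_up_process(h->active_writer);
                mutex_unlock(&h->lock);
        }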
     

02 Oct, 2012

1 commit

  • Pull x86/asm changes from Ingo Molnar:
    "The one change that stands out is the alternatives patching change
    that prevents us from ever patching back instructions from SMP to UP:
    this simplifies things and speeds up CPU hotplug.

    Other than that it's smaller fixes, cleanups and improvements."

    * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86: Unspaghettize do_trap()
    x86_64: Work around old GAS bug
    x86: Use REP BSF unconditionally
    x86: Prefer TZCNT over BFS
    x86/64: Adjust types of temporaries used by ffs()/fls()/fls64()
    x86: Drop unnecessary kernel_eflags variable on 64-bit
    x86/smp: Don't ever patch back to UP if we unplug cpus

    Linus Torvalds
     

23 Aug, 2012

1 commit

  • We still patch SMP instructions to UP variants if we boot with a
    single CPU, but not at any other time. In particular, not if we
    unplug CPUs to return to a single cpu.

    Paul McKenney points out:

    mean offline overhead is 6251/48=130.2 milliseconds.

    If I remove the alternatives_smp_switch() from the offline
    path [...] the mean offline overhead is 550/42=13.1 milliseconds

    Basically, we're never going to get those 120ms back, and the
    code is pretty messy.

    We get rid of:

    1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the
    documentation is wrong. It's now the default.

    2) The skip_smp_alternatives flag used by suspend.

    3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end()
    which were only used to set this one flag.

    Signed-off-by: Rusty Russell
    Cc: Paul McKenney
    Cc: Suresh Siddha
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.au
    Signed-off-by: Ingo Molnar

    Rusty Russell
     

13 Aug, 2012

1 commit

  • Provide a generic interface for setting up and tearing down percpu
    threads.

    On registration the threads for already online cpus are created and
    started. On deregistration (modules) the threads are stopped.

    During hotplug operations the threads are created, started, parked and
    unparked. The data structure for registration provides a pointer to
    percpu storage space and optional setup, cleanup, park, unpark
    functions. These functions are called when the thread state changes.

    Each implementation has to provide a function which is queried and
    returns whether the thread should run and the thread function itself.

    The core code handles all state transitions and avoids duplicated code
    in the call sites.

    [ paulmck: Preemption leak fix ]

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Reviewed-by: Srivatsa S. Bhat
    Cc: Rusty Russell
    Reviewed-by: Paul E. McKenney
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20120716103948.352501068@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
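
    A sketch of how a client uses the interface (struct smp_hotplug_thread and
    smpboot_register_percpu_thread() are the resulting API; the example_*
    pieces are hypothetical):

        #include <linux/init.h>
        #include <linux/percpu.h>
        #include <linux/sched.h>
        #include <linux/smpboot.h>

        static DEFINE_PER_CPU(unsigned int, example_work_pending);
        static DEFINE_PER_CPU(struct task_struct *, example_thread);

        static int example_should_run(unsigned int cpu)
        {
                return per_cpu(example_work_pending, cpu);
        }

        static void example_thread_fn(unsigned int cpu)
        {
                /* Process one unit of per-CPU work, then return to the core loop. */
                per_cpu(example_work_pending, cpu) = 0;
        }

        static struct smp_hotplug_thread example_threads = {
                .store             = &example_thread,   /* per-CPU task pointers */
                .thread_should_run = example_should_run,
                .thread_fn         = example_thread_fn,
                .thread_comm       = "example/%u",
        };

        static int __init example_smpboot_init(void)
        {
                /* Creates and starts threads for online CPUs; the core parks
                 * and unparks them across hotplug operations. */
                return smpboot_register_percpu_thread(&example_threads);
        }
        early_initcall(example_smpboot_init);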
     

01 Aug, 2012

1 commit

  • When hotadd_new_pgdat() is called to create new pgdat for a new node, a
    fallback zonelist should be created for the new node. There's code to try
    to achieve that in hotadd_new_pgdat() as below:

    /*
     * The node we allocated has no zone fallback lists. For avoiding
     * to access not-initialized zonelist, build here.
     */
    mutex_lock(&zonelists_mutex);
    build_all_zonelists(pgdat, NULL);
    mutex_unlock(&zonelists_mutex);

    But it doesn't work as expected. When hotadd_new_pgdat() is called, the
    new node is still in offline state because node_set_online(nid) hasn't
    been called yet. And build_all_zonelists() only builds zonelists for
    online nodes as:

    for_each_online_node(nid) {
        pg_data_t *pgdat = NODE_DATA(nid);

        build_zonelists(pgdat);
        build_zonelist_cache(pgdat);
    }

    Though we hope to create a zonelist for the new pgdat, it doesn't get
    built. So add a new parameter "pgdat" to build_all_zonelists() so that
    the zonelists are built for the new pgdat too.

    Signed-off-by: Jiang Liu
    Signed-off-by: Xishi Qiu
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rusty Russell
    Cc: Yinghai Lu
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: David Rientjes
    Cc: Keping Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     

01 Jun, 2012

2 commits

  • Add more comments on clear_tasks_mm_cpumask(), plus add a runtime check:
    the function is only suitable for offlined CPUs, and if called
    inappropriately, the kernel should scream aloud.

    [akpm@linux-foundation.org: tweak comment: s/walks up/walks/, use 80 cols]
    Suggested-by: Andrew Morton
    Suggested-by: Peter Zijlstra
    Signed-off-by: Anton Vorontsov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Vorontsov
     
  • Many architectures clear tasks' mm_cpumask like this:

    read_lock(&tasklist_lock);
    for_each_process(p) {
        if (p->mm)
            cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
    }
    read_unlock(&tasklist_lock);

    Depending on the context, the code above may have several problems,
    such as:

    1. Working with task->mm without getting the mm or grabbing the task lock is
    dangerous as ->mm might disappear (exit_mm() assigns NULL under
    task_lock(), so tasklist lock is not enough).

    2. Checking for process->mm is not enough because process' main
    thread may exit or detach its mm via use_mm(), but other threads
    may still have a valid mm.

    This patch implements a small helper function that does things
    correctly, i.e.:

    1. We take the task's lock while we handle its mm (we can't use
    get_task_mm()/mmput() pair as mmput() might sleep);

    2. To catch exited main thread case, we use find_lock_task_mm(),
    which walks up all threads and returns an appropriate task
    (with task lock held).

    Also, per Peter Zijlstra's idea, now we don't grab tasklist_lock in
    the new helper, instead we take the rcu read lock. We can do this
    because the function is called after the cpu is taken down and marked
    offline, so no new tasks will get this cpu set in their mm mask.

    Signed-off-by: Anton Vorontsov
    Cc: Richard Weinberger
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Benjamin Herrenschmidt
    Cc: Mike Frysinger
    Cc: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Vorontsov
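
    A sketch of the helper's body as described (close to the approach in the
    commit, but simplified and not the literal mm source):

        #include <linux/cpumask.h>
        #include <linux/mm_types.h>
        #include <linux/oom.h>          /* find_lock_task_mm() */
        #include <linux/rcupdate.h>
        #include <linux/sched.h>

        void example_clear_tasks_mm_cpumask(int cpu)
        {
                struct task_struct *p;

                /*
                 * The cpu is already offline, so no new task can have it set
                 * in its mm mask; the RCU read lock is enough for the walk.
                 */
                rcu_read_lock();
                for_each_process(p) {
                        struct task_struct *t;

                        /* Finds a live thread with a valid ->mm, task-locked. */
                        t = find_lock_task_mm(p);
                        if (!t)
                                continue;
                        cpumask_clear_cpu(cpu, mm_cpumask(t->mm));
                        task_unlock(t);
                }
                rcu_read_unlock();
        }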