06 Dec, 2018

2 commits

  • commit a74cfffb03b73d41e08f84c2e5c87dec0ce3db9f upstream

    arch_smt_update() is only called when the sysfs SMT control knob is
    changed. This means that when SMT is enabled in the sysfs control knob the
    system is considered to have SMT active even if all siblings are offline.

    To allow finegrained control of the speculation mitigations, the actual SMT
    state is more interesting than the fact that siblings could be enabled.

    Rework the code, so arch_smt_update() is invoked from each individual CPU
    hotplug function, and simplify the update function while at it.
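
    As a rough illustration of the difference from userspace (a sketch only; the
    sysfs files below are the ones added by the SMT control series, and 'active'
    reports whether any sibling is actually online):

    # Administrative state of the knob: on / off / forceoff / notsupported
    cat /sys/devices/system/cpu/smt/control
    # Actual SMT state the mitigation code now keys off:
    cat /sys/devices/system/cpu/smt/active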

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Linus Torvalds
    Cc: Jiri Kosina
    Cc: Tom Lendacky
    Cc: Josh Poimboeuf
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Cc: Tim Chen
    Cc: Andi Kleen
    Cc: Dave Hansen
    Cc: Casey Schaufler
    Cc: Asit Mallick
    Cc: Arjan van de Ven
    Cc: Jon Masters
    Cc: Waiman Long
    Cc: Greg KH
    Cc: Dave Stewart
    Cc: Kees Cook
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 53c613fe6349994f023245519265999eed75957f upstream

    STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
    (once enabled) prevents cross-hyperthread control of decisions made by
    indirect branch predictors.

    Enable this feature if

    - the CPU is vulnerable to spectre v2
    - the CPU supports SMT and has SMT siblings online
    - spectre_v2 mitigation autoselection is enabled (default)

    After some previous discussion, this leaves STIBP on all the time, as wrmsr
    on crossing kernel boundary is a no-no. This could perhaps later be a bit
    more optimized (like disabling it in NOHZ, experiment with disabling it in
    idle, etc) if needed.

    Note that the synchronization of the mask manipulation via newly added
    spec_ctrl_mutex is currently not strictly needed, as the only updater is
    already being serialized by cpu_add_remove_lock, but let's make this a
    little bit more future-proof.
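
    A hedged way to check the resulting state on a running system (the exact
    wording of this file varies between kernel versions and CPUs):

    cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
    # With this patch applied on an affected SMT system the reported mitigation
    # includes STIBP; on unaffected CPUs the file reads "Not affected".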

    Signed-off-by: Jiri Kosina
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Josh Poimboeuf
    Cc: Andrea Arcangeli
    Cc: "WoodhouseDavid"
    Cc: Andi Kleen
    Cc: Tim Chen
    Cc: "SchauflerCasey"
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
    Signed-off-by: Greg Kroah-Hartman

    Jiri Kosina
     

23 Nov, 2018

1 commit

  • This reverts commit 8a13906ae519b3ed95cd0fb73f1098b46362f6c4 which is
    commit 53c613fe6349994f023245519265999eed75957f upstream.

    It's not ready for the stable trees as there are major slowdowns
    involved with this patch.

    Reported-by: Jiri Kosina
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Josh Poimboeuf
    Cc: Andrea Arcangeli
    Cc: "WoodhouseDavid"
    Cc: Andi Kleen
    Cc: Tim Chen
    Cc: "SchauflerCasey"
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

14 Nov, 2018

1 commit

  • commit 53c613fe6349994f023245519265999eed75957f upstream.

    STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
    (once enabled) prevents cross-hyperthread control of decisions made by
    indirect branch predictors.

    Enable this feature if

    - the CPU is vulnerable to spectre v2
    - the CPU supports SMT and has SMT siblings online
    - spectre_v2 mitigation autoselection is enabled (default)

    After some previous discussion, this leaves STIBP on all the time, as wrmsr
    on crossing kernel boundary is a no-no. This could perhaps later be a bit
    more optimized (like disabling it in NOHZ, experiment with disabling it in
    idle, etc) if needed.

    Note that the synchronization of the mask manipulation via newly added
    spec_ctrl_mutex is currently not strictly needed, as the only updater is
    already being serialized by cpu_add_remove_lock, but let's make this a
    little bit more future-proof.

    Signed-off-by: Jiri Kosina
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Josh Poimboeuf
    Cc: Andrea Arcangeli
    Cc: "WoodhouseDavid"
    Cc: Andi Kleen
    Cc: Tim Chen
    Cc: "SchauflerCasey"
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
    Signed-off-by: Greg Kroah-Hartman

    Jiri Kosina
     

20 Sep, 2018

2 commits

  • commit 69fa6eb7d6a64801ea261025cce9723d9442d773 upstream.

    When a teardown callback fails, the CPU hotplug code brings the CPU back to
    the previous state. The previous state becomes the new target state. The
    rollback happens in undo_cpu_down() which increments the state
    unconditionally even if the state is already the same as the target.

    As a consequence the next CPU hotplug operation will start at the wrong
    state. This is easy to observe when __cpu_disable() fails.

    Prevent the unconditional undo by checking the state vs. target before
    incrementing state and fix up the consequently wrong conditional in the
    unplug code which handles the failure of the final CPU take down on the
    control CPU side.
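
    The effect can be observed from userspace via the per-CPU hotplug state
    files (a sketch; the numeric values are kernel-internal state numbers and
    differ between versions):

    # Current and target state of the hotplug state machine for cpu1:
    cat /sys/devices/system/cpu/cpu1/hotplug/state
    cat /sys/devices/system/cpu/cpu1/hotplug/target
    # Mapping of numbers to state names:
    cat /sys/devices/system/cpu/hotplug/states
    # After a failed offline, state and target should agree again once the
    # rollback has completed.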

    Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
    Reported-by: Neeraj Upadhyay
    Signed-off-by: Thomas Gleixner
    Tested-by: Geert Uytterhoeven
    Tested-by: Sudeep Holla
    Tested-by: Neeraj Upadhyay
    Cc: josh@joshtriplett.org
    Cc: peterz@infradead.org
    Cc: jiangshanlai@gmail.com
    Cc: dzickus@redhat.com
    Cc: brendan.jackman@arm.com
    Cc: malat@debian.org
    Cc: sramana@codeaurora.org
    Cc: linux-arm-msm@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1809051419580.1416@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit f8b7530aa0a1def79c93101216b5b17cf408a70a upstream.

    The smp_mb() in cpuhp_thread_fun() is misplaced. It needs to be after the
    load of st->should_run to prevent reordering of the later load/stores
    w.r.t. the load of st->should_run.

    Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
    Signed-off-by: Neeraj Upadhyay
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: josh@joshtriplett.org
    Cc: peterz@infradead.org
    Cc: jiangshanlai@gmail.com
    Cc: dzickus@redhat.com
    Cc: brendan.jackman@arm.com
    Cc: malat@debian.org
    Cc: mojha@codeaurora.org
    Cc: sramana@codeaurora.org
    Cc: linux-arm-msm@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1536126727-11629-1-git-send-email-neeraju@codeaurora.org
    Signed-off-by: Greg Kroah-Hartman

    Neeraj Upadhyay
     

16 Aug, 2018

12 commits

  • commit 269777aa530f3438ec1781586cdac0b5fe47b061 upstream.

    Commit 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
    breaks non-SMP builds.

    [ I suspect the 'bool' fields should just be made to be bitfields and be
    exposed regardless of configuration, but that's a separate cleanup
    that I'll leave to the owners of this file for later. - Linus ]

    Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
    Cc: Dave Hansen
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Signed-off-by: Abel Vesa
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Abel Vesa
     
  • commit bc2d8d262cba5736332cbc866acb11b1c5748aa9 upstream

    Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
    cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
    on the kernel command line as it cannot differentiate between SMT disabled
    by BIOS and SMT soft-disabled via 'nosmt'. That wrecks the state and
    makes the sysfs interface unusable.

    Rework this so that during bringup of the non boot CPUs the availability of
    SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
    'primary' thread, set the local cpu_smt_available marker and evaluate
    this explicitly right after the initial SMP bringup has finished.

    SMT evaluation on x86 is a trainwreck as the firmware has all the
    information _before_ booting the kernel, but there is no interface to query
    it.

    Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
    Reported-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 73d5e2b472640b1fcdb61ae8be389912ef211bda upstream

    If SMT is disabled in BIOS, the CPU code doesn't properly detect it.
    The /sys/devices/system/cpu/smt/control file shows 'on', and the 'l1tf'
    vulnerabilities file shows SMT as vulnerable.

    Fix it by forcing 'cpu_smt_control' to CPU_SMT_NOT_SUPPORTED in such a
    case. Unfortunately the detection can only be done after bringing all
    the CPUs online, so we have to overwrite any previous writes to the
    variable.
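
    On a machine with SMT disabled in the BIOS, the symptom and the fix can be
    checked as follows (a sketch; the exact l1tf wording depends on the kernel
    version):

    # Reported 'on' before this fix, 'notsupported' afterwards:
    cat /sys/devices/system/cpu/smt/control
    # Should no longer flag SMT as vulnerable:
    cat /sys/devices/system/cpu/vulnerabilities/l1tf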

    Reported-by: Joe Mario
    Tested-by: Jiri Kosina
    Fixes: f048c399e0f7 ("x86/topology: Provide topology_smt_supported()")
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Josh Poimboeuf
     
  • commit fee0aede6f4739c87179eca76136f83210953b86 upstream

    The CPU_SMT_NOT_SUPPORTED state is set (if the processor does not support
    SMT) when the sysfs SMT control file is initialized.

    That was fine so far as this was only required to make the output of the
    control file correct and to prevent writes in that case.

    With the upcoming l1tf command line parameter, this needs to be set up
    before the L1TF mitigation selection and command line parsing happens.

    Signed-off-by: Thomas Gleixner
    Tested-by: Jiri Kosina
    Reviewed-by: Greg Kroah-Hartman
    Reviewed-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20180713142323.121795971@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 8e1b706b6e819bed215c0db16345568864660393 upstream

    The L1TF mitigation will gain a command line parameter which allows setting
    a combination of hypervisor mitigation and SMT control.

    Expose cpu_smt_disable() so the command line parser can tweak SMT settings.

    [ tglx: Split out of larger patch and made it preserve an already existing
    force off state ]

    Signed-off-by: Jiri Kosina
    Signed-off-by: Thomas Gleixner
    Tested-by: Jiri Kosina
    Reviewed-by: Greg Kroah-Hartman
    Reviewed-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20180713142323.039715135@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Jiri Kosina
     
  • commit 215af5499d9e2b55f111d2431ea20218115f29b3 upstream

    Writing 'off' to /sys/devices/system/cpu/smt/control offlines all SMT
    siblings. Writing 'on' merely enables the ability to online them, but does
    not online them automatically.

    Make 'on' more useful by onlining all offline siblings.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 26acfb666a473d960f0fd971fe68f3e3ad16c70b upstream

    If the L1TF CPU bug is present we allow the KVM module to be loaded as the
    majority of users that use Linux and KVM have trusted guests and do not want a
    broken setup.

    Cloud vendors are the ones that are uncomfortable with CVE-2018-3620 and as
    such they are the ones that should set nosmt to one.

    Setting 'nosmt' means that the system administrator also needs to disable
    SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line
    parameter, or via the /sys/devices/system/cpu/smt/control. See commit
    05736e4ac13c ("cpu/hotplug: Provide knobs to control SMT").

    Other mitigations are to use task affinity, cpu sets, interrupt binding,
    etc - anything to make sure that _only_ the same guests vCPUs are running
    on sibling threads.

    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Konrad Rzeszutek Wilk
     
  • commit 0cc3cd21657be04cb0559fe8063f2130493f92cf upstream

    Due to the way Machine Check Exceptions work on X86 hyperthreads it's
    required to boot up _all_ logical cores at least once in order to set the
    CR4.MCE bit.

    So instead of ignoring the sibling threads right away, let them boot up
    once so they can configure themselves. After they come out of the initial
    boot stage, check whether it's a "secondary" sibling and cancel the operation
    which puts the CPU back into offline state.

    Reported-by: Dave Hansen
    Signed-off-by: Thomas Gleixner
    Tested-by: Tony Luck
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 05736e4ac13c08a4a9b1ef2de26dd31a32cbee57 upstream

    Provide a command line and a sysfs knob to control SMT.

    The command line options are:

    'nosmt': Enumerate secondary threads, but do not online them

    'nosmt=force': Ignore secondary threads completely during enumeration
    via MP table and ACPI/MADT.

    The sysfs control file has the following states (read/write):

    'on':           SMT is enabled. Secondary threads can be freely onlined
    'off':          SMT is disabled. Secondary threads, even if enumerated,
                    cannot be onlined
    'forceoff':     SMT is permanently disabled. Writes to the control
                    file are rejected.
    'notsupported': SMT is not supported by the CPU

    The command line option 'nosmt' sets the sysfs control to 'off'. This
    can be changed to 'on' to reenable SMT during runtime.

    The command line option 'nosmt=force' sets the sysfs control to
    'forceoff'. This cannot be changed during runtime.

    When SMT is 'on' and the control file is changed to 'off' then all online
    secondary threads are offlined and attempts to online a secondary thread
    later on are rejected.

    When SMT is 'off' and the control file is changed to 'on' then secondary
    threads can be onlined again. The 'off' -> 'on' transition does not
    automatically online the secondary threads.

    When the control file is set to 'forceoff', the behaviour is the same as
    setting it to 'off', but the operation is irreversible and later writes to
    the control file are rejected.

    When the control status is 'notsupported' then writes to the control file
    are rejected.
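
    Putting the knobs together (illustrative only; cpu1 is assumed to be a
    secondary SMT sibling on the test machine):

    # Booted with 'nosmt' on the kernel command line:
    cat /sys/devices/system/cpu/smt/control         # -> off
    echo 1 > /sys/devices/system/cpu/cpu1/online    # rejected while 'off'
    echo on > /sys/devices/system/cpu/smt/control
    echo 1 > /sys/devices/system/cpu/cpu1/online    # allowed again
    echo forceoff > /sys/devices/system/cpu/smt/control
    echo on > /sys/devices/system/cpu/smt/control   # rejected, forceoff is final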

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Konrad Rzeszutek Wilk
    Acked-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit cc1fe215e1efa406b03aa4389e6269b61342dec5 upstream

    Split out the inner workings of do_cpu_down() to allow reuse of that
    function for the upcoming SMT disabling mechanism.

    No functional change.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Konrad Rzeszutek Wilk
    Acked-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit c4de65696d865c225fda3b9913b31284ea65ea96 upstream

    The asymmetry caused a warning to trigger if the bootup was stopped in state
    CPUHP_AP_ONLINE_IDLE. The warning no longer triggers as kthread_park() can
    now be invoked on already or still parked threads. But there is still no
    reason to have this be asymmetric.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Konrad Rzeszutek Wilk
    Acked-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit b5b1404d0815894de0690de8a1ab58269e56eae6 upstream.

    This is purely a preparatory patch for upcoming changes during the 4.19
    merge window.

    We have a function called "boot_cpu_state_init()" that isn't really
    about the bootup cpu state: that is done much earlier by the similarly
    named "boot_cpu_init()" (note lack of "state" in name).

    This function initializes some hotplug CPU state, and needs to run after
    the percpu data has been properly initialized. It even has a comment to
    that effect.

    Except it _doesn't_ actually run after the percpu data has been properly
    initialized. On x86 it happens to do that, but on at least arm and
    arm64, the percpu base pointers are initialized by the arch-specific
    'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().

    This had some unexpected results, and in particular we have a patch
    pending for the merge window that did the obvious cleanup of using
    'this_cpu_write()' in the cpu hotplug init code:

    - per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
    + this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);

    which is obviously the right thing to do. Except because of the
    ordering issue, it actually failed miserably and unexpectedly on arm64.

    So this just fixes the ordering, and changes the name of the function to
    be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
    hotplug state, because the core CPU state was supposed to have already
    been done earlier.

    Marked for stable, since the (not yet merged) patch that will show this
    problem is marked for stable.

    Reported-by: Vlastimil Babka
    Reported-by: Mian Yousaf Kaukab
    Suggested-by: Catalin Marinas
    Acked-by: Thomas Gleixner
    Cc: Will Deacon
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     

03 Jan, 2018

1 commit

  • commit 26456f87aca7157c057de65c9414b37f1ab881d1 upstream.

    The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
    them with a potentially stale clk and next_expiry value, which can cause
    trouble when the CPU is plugged in.

    Add a prepare callback which forwards the clock, sets next_expiry to far in
    the future and resets the control flags to a known state.

    Set base->must_forward_clk so the first timer which is queued will try to
    forward the clock to current jiffies.

    Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
    Reported-by: Paul E. McKenney
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Sebastian Siewior
    Cc: Anna-Maria Gleixner
    Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

14 Dec, 2017

1 commit

  • commit 46febd37f9c758b05cd25feae8512f22584742fe upstream.

    Commit 31487f8328f2 ("smp/cfd: Convert core to hotplug state machine")
    accidentally put this step in the wrong place. The step should be in the
    cpuhp_ap_states[] rather than the cpuhp_bp_states[].

    grep smpcfd /sys/devices/system/cpu/hotplug/states
    40: smpcfd:prepare
    129: smpcfd:dying

    "smpcfd:dying" was missing before.
    So was the invocation of the function smpcfd_dying_cpu().

    Fixes: 31487f8328f2 ("smp/cfd: Convert core to hotplug state machine")
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Richard Weinberger
    Cc: Sebastian Andrzej Siewior
    Cc: Boris Ostrovsky
    Link: https://lkml.kernel.org/r/20171128131954.81229-1-jiangshanlai@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    Lai Jiangshan
     

21 Oct, 2017

1 commit

  • The recent rework of the cpu hotplug internals changed the usage of the per
    cpu state->node field, but failed to clean it up after use.

    So subsequent hotplug operations use the stale pointer from a previous
    operation and hand it into the callback functions. The callbacks then
    dereference a pointer which either belongs to a different facility or
    points to freed and potentially reused memory. In either case data
    corruption and crashes are the obvious consequence.

    Reset the node and the last pointers in the per cpu state to NULL after the
    operation which set them has completed.

    Fixes: 96abb968549c ("smp/hotplug: Allow external multi-instance rollback")
    Reported-by: Tvrtko Ursulin
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Boris Ostrovsky
    Cc: "Paul E. McKenney"
    Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710211606130.3213@nanos

    Thomas Gleixner
     

06 Oct, 2017

1 commit

  • Pull watchdog clean-up and fixes from Thomas Gleixner:
    "The watchdog (hard/softlockup detector) code is pretty much broken in
    its current state. The patch series addresses this by removing all
    duct tape and refactoring it into a workable state.

    The reasons why I ask for inclusion that late in the cycle are:

    1) The code causes lockdep splats vs. hotplug locking which get
    reported over and over. Unfortunately there is no easy fix.

    2) The risk of breakage is minimal because it's already broken

    3) As 4.14 is a long term stable kernel, I prefer to have working
    watchdog code in that and the lockdep issues resolved. I wouldn't
    ask you to pull if 4.14 wouldn't be a LTS kernel or if the
    solution would be easy to backport.

    4) The series was around before the merge window opened, but then got
    delayed due to the UP failure caused by the for_each_cpu()
    surprise which we discussed recently.

    Changes vs. V1:

    - Addressed your review points

    - Addressed the warning in the powerpc code which was discovered late

    - Changed two function names which made sense up to a certain point
    in the series. Now they match what they do in the end.

    - Fixed an 'unused variable' warning, which was not detected by the
    Intel robot. I triggered it when trying all possible related config
    combinations manually. Randconfig testing seems not random enough.

    The changes have been tested by and reviewed by Don Zickus and tested
    and acked by Michael Ellerman for powerpc"

    * 'core-watchdog-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
    watchdog/core: Put softlockup_threads_initialized under ifdef guard
    watchdog/core: Rename some softlockup_* functions
    powerpc/watchdog: Make use of watchdog_nmi_probe()
    watchdog/core, powerpc: Lock cpus across reconfiguration
    watchdog/core, powerpc: Replace watchdog_nmi_reconfigure()
    watchdog/hardlockup/perf: Fix spelling mistake: "permanetely" -> "permanently"
    watchdog/hardlockup/perf: Cure UP damage
    watchdog/hardlockup: Clean up hotplug locking mess
    watchdog/hardlockup/perf: Simplify deferred event destroy
    watchdog/hardlockup/perf: Use new perf CPU enable mechanism
    watchdog/hardlockup/perf: Implement CPU enable replacement
    watchdog/hardlockup/perf: Implement init time detection of perf
    watchdog/hardlockup/perf: Implement init time perf validation
    watchdog/core: Get rid of the racy update loop
    watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage
    watchdog/sysctl: Clean up sysctl variable name space
    watchdog/sysctl: Get rid of the #ifdeffery
    watchdog/core: Clean up header mess
    watchdog/core: Further simplify sysctl handling
    watchdog/core: Get rid of the thread teardown/setup dance
    ...

    Linus Torvalds
     

26 Sep, 2017

6 commits

  • Add a sysfs file to one-time fail a specific state. This can be used
    to test the state rollback code paths.

    Something like this (hotplug-up.sh):

    #!/bin/bash

    echo 0 > /debug/sched_debug
    echo 1 > /debug/tracing/events/cpuhp/enable

    ALL_STATES=`cat /sys/devices/system/cpu/hotplug/states | cut -d':' -f1`
    STATES=${1:-$ALL_STATES}

    for state in $STATES
    do
            echo 0 > /sys/devices/system/cpu/cpu1/online
            echo 0 > /debug/tracing/trace
            echo Fail state: $state
            echo $state > /sys/devices/system/cpu/cpu1/hotplug/fail
            cat /sys/devices/system/cpu/cpu1/hotplug/fail
            echo 1 > /sys/devices/system/cpu/cpu1/online

            cat /debug/tracing/trace > hotfail-${state}.trace

            sleep 1
    done

    Can be used to test for all possible rollback (barring multi-instance)
    scenarios on CPU-up; CPU-down is a trivial modification of the above.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: bigeasy@linutronix.de
    Cc: efault@gmx.de
    Cc: rostedt@goodmis.org
    Cc: max.byungchul.park@gmail.com
    Link: https://lkml.kernel.org/r/20170920170546.972581715@infradead.org

    Peter Zijlstra
     
  • With lockdep-crossrelease we get deadlock reports that span cpu-up and
    cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
    and cpu-down are globally serialized.

    takedown_cpu()
      irq_lock_sparse()
      wait_for_completion(&st->done)

                                cpuhp_thread_fun
                                 cpuhp_up_callback
                                  cpuhp_invoke_callback
                                   irq_affinity_online_cpu
                                    irq_lock_sparse()
                                    irq_unlock_sparse()
                                complete(&st->done)

    Now that we have consistent AP state, we can trivially separate the
    AP completion between up and down using st->bringup.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Acked-by: max.byungchul.park@gmail.com
    Cc: bigeasy@linutronix.de
    Cc: efault@gmx.de
    Cc: rostedt@goodmis.org
    Link: https://lkml.kernel.org/r/20170920170546.872472799@infradead.org

    Peter Zijlstra
     
  • With lockdep-crossrelease we get deadlock reports that span cpu-up and
    cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
    and cpu-down are globally serialized.

    CPU0                     CPU1                     CPU2
    cpuhp_up_callbacks:      takedown_cpu:            cpuhp_thread_fun:

    cpuhp_state
      irq_lock_sparse()
                             irq_lock_sparse()
                               wait_for_completion()
                                                      cpuhp_state
                                                      complete()

    Now that we have consistent AP state, we can trivially separate the
    AP-work class between up and down using st->bringup.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: max.byungchul.park@gmail.com
    Cc: bigeasy@linutronix.de
    Cc: efault@gmx.de
    Cc: rostedt@goodmis.org
    Link: https://lkml.kernel.org/r/20170920170546.922524234@infradead.org

    Peter Zijlstra
     
  • While the generic callback functions have an 'int' return and thus
    appear to be allowed to return error, this is not true for all states.

    Specifically, what used to be STARTING/DYING are run with IRQs
    disabled from critical parts of CPU bringup/teardown and are not
    allowed to fail. Add WARNs to enforce this rule.

    But since some callbacks are indeed allowed to fail, we have the
    situation where a state-machine rollback encounters a failure, in this
    case we're stuck, we can't go forward and we can't go back. Also add a
    WARN for that case.

    AFAICT this is a fundamental 'problem' with no real obvious solution.
    We want the 'prepare' callbacks to allow failure on either up or down.
    Typically on prepare-up this would be things like -ENOMEM from
    resource allocations, and the typical usage in prepare-down would be
    something like -EBUSY to avoid CPUs being taken away.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: bigeasy@linutronix.de
    Cc: efault@gmx.de
    Cc: rostedt@goodmis.org
    Cc: max.byungchul.park@gmail.com
    Link: https://lkml.kernel.org/r/20170920170546.819539119@infradead.org

    Peter Zijlstra
     
  • There is currently no explicit state change on rollback. That is,
    st->bringup, st->rollback and st->target are not consistent when doing
    the rollback.

    Rework the AP state handling to be more coherent. This does mean we
    have to do a second AP kick-and-wait for rollback, but since rollback
    is the slow path of a slowpath, this really should not matter.

    Take this opportunity to simplify the AP thread function to only run a
    single callback per invocation. This unifies the three single/up/down
    modes it supports. The looping it used to do for up/down is achieved
    by retaining should_run and relying on the main smpboot_thread_fn()
    loop.

    (I have most of a patch that does the same for the BP state handling,
    but that's not critical and gets a little complicated because
    CPUHP_BRINGUP_CPU does the AP handoff from a callback, which gets
    recursive @st usage; I still have to de-fugly that.)

    [ tglx: Move cpuhp_down_callbacks() et al. into the HOTPLUG_CPU section to
    avoid gcc complaining about unused functions. Make the HOTPLUG_CPU
    one piece instead of having two consecutive ifdef sections of the
    same type. ]

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: bigeasy@linutronix.de
    Cc: efault@gmx.de
    Cc: rostedt@goodmis.org
    Cc: max.byungchul.park@gmail.com
    Link: https://lkml.kernel.org/r/20170920170546.769658088@infradead.org

    Peter Zijlstra
     
  • Currently the rollback of multi-instance states is handled inside
    cpuhp_invoke_callback(). The problem is that when we want to allow an
    explicit state change for rollback, we need to return from the
    function without doing the rollback.

    Change cpuhp_invoke_callback() to optionally return the multi-instance
    state, such that rollback can be done from a subsequent call.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: bigeasy@linutronix.de
    Cc: efault@gmx.de
    Cc: rostedt@goodmis.org
    Cc: max.byungchul.park@gmail.com
    Link: https://lkml.kernel.org/r/20170920170546.720361181@infradead.org

    Peter Zijlstra
     

14 Sep, 2017

1 commit

  • The following deadlock is possible in the watchdog hotplug code:

    cpus_write_lock()
      ...
      takedown_cpu()
        smpboot_park_threads()
          smpboot_park_thread()
            kthread_park()
              ->park() := watchdog_disable()
                watchdog_nmi_disable()
                  perf_event_release_kernel();
                    put_event()
                      _free_event()
                        ->destroy() := hw_perf_event_destroy()
                          x86_release_hardware()
                            release_ds_buffers()
                              get_online_cpus()

    when a per cpu watchdog perf event is destroyed which drops the last
    reference to the PMU hardware. The cleanup code there invokes
    get_online_cpus() which instantly deadlocks because the hotplug percpu
    rwsem is write locked.

    To solve this add a deferring mechanism:

    cpus_write_lock()
      kthread_park()
        watchdog_nmi_disable(deferred)
          perf_event_disable(event);
          move_event_to_deferred(event);
      ....
    cpus_write_unlock()
    cleanup_deferred_events()
      perf_event_release_kernel()

    This is still properly serialized against concurrent hotplug via the
    cpu_add_remove_lock, which is held by the task which initiated the hotplug
    event.

    This is also used to handle event destruction when the watchdog threads are
    parked via other mechanisms than CPU hotplug.

    Analyzed-by: Peter Zijlstra

    Reported-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194146.884469246@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

05 Sep, 2017

1 commit


26 Jul, 2017

1 commit

  • RCU callbacks must be migrated away from an outgoing CPU, and this is
    done near the end of the CPU-hotplug operation, after the outgoing CPU is
    long gone. Unfortunately, this means that other CPU-hotplug callbacks
    can execute while the outgoing CPU's callbacks are still immobilized
    on the long-gone CPU's callback lists. If any of these CPU-hotplug
    callbacks must wait, either directly or indirectly, for the invocation
    of any of the immobilized RCU callbacks, the system will hang.

    This commit avoids such hangs by migrating the callbacks away from the
    outgoing CPU immediately upon its departure, shortly after the return
    from __cpu_die() in takedown_cpu(). Thus, RCU is able to advance these
    callbacks and invoke them, which allows all the after-the-fact CPU-hotplug
    callbacks to wait on these RCU callbacks without risk of a hang.

    While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage()
    and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including
    dead code on the one hand and to avoid define-without-use warnings on the
    other hand.

    Reported-by: Jeffrey Hugo
    Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org
    Signed-off-by: Paul E. McKenney
    Cc: Thomas Gleixner
    Cc: Sebastian Andrzej Siewior
    Cc: Ingo Molnar
    Cc: Anna-Maria Gleixner
    Cc: Boris Ostrovsky
    Cc: Richard Weinberger

    Paul E. McKenney
     

20 Jul, 2017

1 commit

  • If cpuhp_store_callbacks() is called for CPUHP_AP_ONLINE_DYN or
    CPUHP_BP_PREPARE_DYN, which are the indicators for dynamically allocated
    states, then cpuhp_store_callbacks() allocates a new dynamic state. The
    first allocation in each range returns CPUHP_AP_ONLINE_DYN or
    CPUHP_BP_PREPARE_DYN.

    If cpuhp_remove_state() is invoked for one of these states, then there is
    no protection against the allocation mechanism. So the removal, which
    should clear the callbacks and the name, gets a new state assigned and
    clears that one.

    As a consequence the state which should be cleared stays initialized. A
    consecutive CPU hotplug operation dereferences the state callbacks and
    accesses either freed or reused memory, resulting in crashes.

    Add a protection against this by checking the name argument for NULL. If
    it's NULL it's a removal. If not, it's an allocation.

    [ tglx: Added a comment and massaged changelog ]

    Fixes: 5b7aa87e0482 ("cpu/hotplug: Implement setup/removal interface")
    Signed-off-by: Ethan Barnes
    Signed-off-by: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "Srivatsa S. Bhat"
    Cc: Sebastian Siewior
    Cc: Paul McKenney
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/DM2PR04MB398242FC7776D603D9F99C894A60@DM2PR04MB398.namprd04.prod.outlook.com

    Ethan Barnes
     

12 Jul, 2017

1 commit

  • The move of the unpark functions to the control thread moved the BUG_ON()
    there as well. While it made some sense in the idle thread of the upcoming
    CPU, it's bogus to crash the control thread on the already online CPU,
    especially as the function has a return value and the callsite is prepared
    to handle an error return.

    Replace it with a WARN_ON_ONCE() and return a proper error code.

    Fixes: 9cd4f1a4e7a8 ("smp/hotplug: Move unparking of percpu threads to the control CPU")
    Rightfully-ranted-at-by: Linus Torvalds
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

06 Jul, 2017

1 commit

  • Vikram reported the following backtrace:

    BUG: scheduling while atomic: swapper/7/0/0x00000002
    CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680
    schedule
    schedule_hrtimeout_range_clock
    schedule_hrtimeout
    wait_task_inactive
    __kthread_bind_mask
    __kthread_bind
    __kthread_unpark
    kthread_unpark
    cpuhp_online_idle
    cpu_startup_entry
    secondary_start_kernel

    He analyzed correctly that a parked cpu hotplug thread of an offlined CPU
    was still on the runqueue when the CPU came back online and tried to unpark
    it. This causes the thread which invoked kthread_unpark() to call
    wait_task_inactive() and subsequently schedule() with preemption disabled.
    His proposed workaround was to "make sure" that a parked thread has
    scheduled out when the CPU goes offline, so the situation cannot happen.

    But that's still wrong because the root cause is not the fact that the
    percpu thread is still on the runqueue and neither that preemption is
    disabled, which could be simply solved by enabling preemption before
    calling kthread_unpark().

    The real issue is that the calling thread is the idle task of the upcoming
    CPU, which is not supposed to call anything which might sleep. The moron,
    who wrote that code, missed completely that kthread_unpark() might end up
    in schedule().

    The solution is simpler than expected. The thread which controls the
    hotplug operation is waiting for the CPU to call complete() on the hotplug
    state completion. So the idle task of the upcoming CPU can set its state to
    CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the control
    task on a different CPU, which then can safely do the unpark and kick the
    now unparked hotplug thread of the upcoming CPU to complete the bringup to
    the final target state.

    Control CPU                          AP

    bringup_cpu();
      __cpu_up()  ------------>
                                         bringup_ap();
      bringup_wait_for_ap()
        wait_for_completion();
                                         cpuhp_online_idle();
      <------------                      complete();
      unpark(AP->stopper);
      unpark(AP->hotplugthread);
                                         while(1)
                                           do_idle();
      kick(AP->hotplugthread);
      wait_for_completion();             hotplug_thread()
                                           run_online_callbacks();
                                           complete();

    Fixes: 8df3e07e7f21 ("cpu/hotplug: Let upcoming cpu bring itself fully up")
    Reported-by: Vikram Mulukutla
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Cc: Sebastian Sewior
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707042218020.2131@nanos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

04 Jul, 2017

1 commit

  • Pull SMP hotplug updates from Thomas Gleixner:
    "This update is primarily a cleanup of the CPU hotplug locking code.

    The hotplug locking mechanism is an open coded RWSEM, which allows
    recursive locking. The main problem with that is the recursive nature
    as it evades the full lockdep coverage and hides potential deadlocks.

    The rework replaces the open coded RWSEM with a percpu RWSEM and
    establishes full lockdep coverage that way.

    The bulk of the changes fix up recursive locking issues and address
    the now fully reported potential deadlocks all over the place. Some of
    these deadlocks have been observed in the RT tree, but on mainline the
    probability was low enough to hide them away."

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
    cpu/hotplug: Constify attribute_group structures
    powerpc: Only obtain cpu_hotplug_lock if called by rtasd
    ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init
    cpu/hotplug: Remove unused check_for_tasks() function
    perf/core: Don't release cred_guard_mutex if not taken
    cpuhotplug: Link lock stacks for hotplug callbacks
    acpi/processor: Prevent cpu hotplug deadlock
    sched: Provide is_percpu_thread() helper
    cpu/hotplug: Convert hotplug locking to percpu rwsem
    s390: Prevent hotplug rwsem recursion
    arm: Prevent hotplug rwsem recursion
    arm64: Prevent cpu hotplug rwsem recursion
    kprobes: Cure hotplug lock ordering issues
    jump_label: Reorder hotplug lock and jump_label_lock
    perf/tracing/cpuhotplug: Fix locking order
    ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus()
    PCI: Replace the racy recursion prevention
    PCI: Use cpu_hotplug_disable() instead of get_online_cpus()
    perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode()
    x86/perf: Drop EXPORT of perf_check_microcode
    ...

    Linus Torvalds
     

30 Jun, 2017

1 commit

  • attribute_groups are not supposed to change at runtime. All functions
    working with attribute_groups provided by <linux/sysfs.h> work with const
    attribute_group.

    So mark the non-const structs as const:

    File size before:
       text    data     bss     dec     hex filename
      12582   15361      20   27963    6d3b kernel/cpu.o

    File size after adding 'const':
       text    data     bss     dec     hex filename
      12710   15265      20   27995    6d5b kernel/cpu.o

    Signed-off-by: Arvind Yadav
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: anna-maria@linutronix.de
    Cc: bigeasy@linutronix.de
    Cc: boris.ostrovsky@oracle.com
    Cc: rcochran@linutronix.de
    Link: http://lkml.kernel.org/r/f9079e94e12b36d245e7adbf67d312bc5d0250c6.1498737970.git.arvind.yadav.cs@gmail.com
    Signed-off-by: Ingo Molnar

    Arvind Yadav
     

23 Jun, 2017

1 commit

  • If a CPU goes offline, interrupts affine to the CPU are moved away. If the
    outgoing CPU is the last CPU in the affinity mask the migration code breaks
    the affinity and sets it to all online CPUs.

    This is a problem for affinity managed interrupts as CPU hotplug is often
    used for power management purposes. If the affinity is broken, the
    interrupt is no longer affine to the CPUs to which it was allocated.

    The affinity spreading allows laying out multi-queue devices in a way that
    they are assigned to a single CPU or a group of CPUs. If the last CPU goes
    offline, then the queue is no longer used, so the interrupt can be
    shut down gracefully and parked until one of the assigned CPUs comes online
    again.

    Add a graceful shutdown mechanism into the irq affinity breaking code path,
    mark the irq as MANAGED_SHUTDOWN and leave the affinity mask unmodified.

    In the online path, scan the active interrupts for managed interrupts and,
    if the interrupt is functional and the newly online CPU is part of the
    affinity mask, restart it if it is marked MANAGED_SHUTDOWN. If the
    interrupt is already started up, try to add the CPU back to the effective
    affinity mask.

    Originally-by: Christoph Hellwig
    Signed-off-by: Thomas Gleixner
    Cc: Jens Axboe
    Cc: Marc Zyngier
    Cc: Michael Ellerman
    Cc: Keith Busch
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20170619235447.273417334@linutronix.de

    Thomas Gleixner
     

13 Jun, 2017

1 commit

  • clang -Wunused-function found one remaining function that was
    apparently meant to be removed in a recent code cleanup:

    kernel/cpu.c:565:20: warning: unused function 'check_for_tasks' [-Wunused-function]

    Sebastian explained: The function became unused unintentionally, but there
    is already a failure check when a task cannot be removed from the outgoing
    CPU in the scheduler code, so bringing it back is not really giving any
    extra value.

    Fixes: 530e9b76ae8f ("cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions")
    Signed-off-by: Arnd Bergmann
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Boris Ostrovsky
    Cc: Anna-Maria Gleixner
    Link: http://lkml.kernel.org/r/20170608085544.2257132-1-arnd@arndb.de
    Signed-off-by: Thomas Gleixner

    Arnd Bergmann
     

03 Jun, 2017

1 commit

  • If a custom CPU target is specified and that one is not available _or_
    can't be interrupted then the code returns to userland without dropping a
    lock, as noticed by lockdep:

    |echo 133 > /sys/devices/system/cpu/cpu7/hotplug/target
    | ================================================
    | [ BUG: lock held when returning to user space! ]
    | ------------------------------------------------
    | bash/503 is leaving the kernel with locks still held!
    | 1 lock held by bash/503:
    | #0: (device_hotplug_lock){+.+...}, at: [] lock_device_hotplug_sysfs+0x10/0x40

    So release the lock then.

    Fixes: 757c989b9994 ("cpu/hotplug: Make target state writeable")
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170602142714.3ogo25f2wbq6fjpj@linutronix.de

    Sebastian Andrzej Siewior
     

26 May, 2017

1 commit

  • The CPU hotplug callbacks are not covered by lockdep versus the cpu hotplug
    rwsem.

    CPU0                                        CPU1
    cpuhp_setup_state(STATE, startup, teardown);
      cpus_read_lock();
        invoke_callback_on_ap();
          kick_hotplug_thread(ap);
          wait_for_completion();                hotplug_thread_fn()
                                                  lock(m);
                                                  do_stuff();
                                                  unlock(m);

    Lockdep does not know about this dependency and will not trigger on the
    following code sequence:

    lock(m);
    cpus_read_lock();

    Add a lockdep map and connect the initiator's lock chain with the hotplug
    thread lock chain, so potential deadlocks can be detected.

    Signed-off-by: Thomas Gleixner
    Tested-by: Paul E. McKenney
    Acked-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20170524081549.709375845@linutronix.de

    Thomas Gleixner