22 Jun, 2020

1 commit

  • [ Upstream commit bf2c59fce4074e55d622089b34be3a6bc95484fb ]

    In the CPU-offline process, it calls mmdrop() after idle entry and the
    subsequent call to cpuhp_report_idle_dead(). Once execution passes the
    call to rcu_report_dead(), RCU is ignoring the CPU, which results in
    lockdep complaining when mmdrop() uses RCU from either memcg or
    debugobjects below.

    Fix it by cleaning up the active_mm state from BP instead. Every arch
    which has CONFIG_HOTPLUG_CPU should have already called idle_task_exit()
    from AP. The only exception is parisc because it switches them to
    &init_mm unconditionally (see smp_boot_one_cpu() and smp_cpu_init()),
    but the patch will still work there because it calls mmgrab(&init_mm) in
    smp_cpu_init() and then should call mmdrop(&init_mm) in finish_cpu().

    WARNING: suspicious RCU usage
    -----------------------------
    kernel/workqueue.c:710 RCU or wq_pool_mutex should be held!

    other info that might help us debug this:

    RCU used illegally from offline CPU!
    Call Trace:
    dump_stack+0xf4/0x164 (unreliable)
    lockdep_rcu_suspicious+0x140/0x164
    get_work_pool+0x110/0x150
    __queue_work+0x1bc/0xca0
    queue_work_on+0x114/0x120
    css_release+0x9c/0xc0
    percpu_ref_put_many+0x204/0x230
    free_pcp_prepare+0x264/0x570
    free_unref_page+0x38/0xf0
    __mmdrop+0x21c/0x2c0
    idle_task_exit+0x170/0x1b0
    pnv_smp_cpu_kill_self+0x38/0x2e0
    cpu_die+0x48/0x64
    arch_cpu_idle_dead+0x30/0x50
    do_idle+0x2f4/0x470
    cpu_startup_entry+0x38/0x40
    start_secondary+0x7a8/0xa80
    start_secondary_resume+0x10/0x14
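
    The active_mm handoff described above can be sketched as a small user-space
    model. The names mirror the kernel's (mmgrab/mmdrop, idle_task_exit,
    finish_cpu), but everything here is illustrative, not kernel code:

```c
#include <assert.h>

/* Toy model of the reference handoff: mm_count stands in for
 * struct mm_struct's reference counter (an atomic in the kernel). */
static int mm_count = 1;            /* init_mm starts with one reference */

static void mmgrab(void)  { mm_count++; }
static int  mmdrop(void)  { return --mm_count; }   /* returns remaining refs */

/* AP side: idle_task_exit() switches to init_mm and takes a reference,
 * but after the fix it no longer drops the old one itself, because RCU
 * may already be ignoring the dying CPU at that point. */
static void ap_idle_task_exit(void) { mmgrab(); }

/* BP side: finish_cpu() drops the reference once the CPU is fully dead,
 * from a context where RCU is still watching. */
static int bp_finish_cpu(void) { return mmdrop(); }
```

    The point of the model is only the ordering: the final drop happens on the
    control CPU, never on the offlined one.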

    Signed-off-by: Qian Cai
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Michael Ellerman (powerpc)
    Link: https://lkml.kernel.org/r/20200401214033.8448-1-cai@lca.pw
    Signed-off-by: Sasha Levin

    Peter Zijlstra
     

17 Apr, 2020

1 commit

  • commit e98eac6ff1b45e4e73f2e6031b37c256ccb5d36b upstream.

    A recent change to freeze_secondary_cpus() which added an early abort if a
    wakeup is pending missed the fact that the function is also invoked for
    shutdown, reboot and kexec via disable_nonboot_cpus().

    In case of disable_nonboot_cpus() the wakeup event needs to be ignored as
    the purpose is to terminate the currently running kernel.

    Add a 'suspend' argument which is only set when the freeze is in context of
    a suspend operation. If it is not set, any pending wakeup event is
    ignored.
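
    The gating can be sketched in a few lines; the function name matches the
    kernel's, but the wakeup check and return convention are simplified
    stand-ins:

```c
#include <stdbool.h>

/* Simplified model of the 'suspend' argument described above. */
static bool wakeup_pending;

/* Returns true if the freeze was aborted because of a pending wakeup. */
static bool freeze_secondary_cpus(bool suspend)
{
    if (suspend && wakeup_pending)
        return true;   /* suspend path: respect the wakeup and abort */
    /* shutdown/reboot/kexec path (via disable_nonboot_cpus()): the
     * running kernel is being terminated, so wakeups are ignored */
    return false;
}
```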

    Fixes: a66d955e910a ("cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending")
    Reported-by: Boqun Feng
    Signed-off-by: Thomas Gleixner
    Cc: Pavankumar Kondeti
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/874kuaxdiz.fsf@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

24 Feb, 2020

1 commit

  • [ Upstream commit 45178ac0cea853fe0e405bf11e101bdebea57b15 ]

    Paul reported a very sporadic, rcutorture induced, workqueue failure.
    When the planets align, the workqueue rescuer's self-migrate fails and
    then triggers a WARN for running a work on the wrong CPU.

    Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call
    could be ignored! When stopper->enabled is false, stop_machine will
    instantly complete the work without actually running it. Worse, it
    will not WARN about this (we really should fix this).

    It turns out there is a small window where a freshly online'ed CPU is
    marked 'online' but doesn't yet have the stopper task running:

    BP                                    AP

    bringup_cpu()
      __cpu_up(cpu, idle)        -->      start_secondary()
                                          ...
                                          cpu_startup_entry()
      bringup_wait_for_ap()
        wait_for_ap_thread()
          ...
          enable = true;

    Close this by moving the stop_machine_unpark() into
    cpuhp_online_idle(), such that the stopper thread is ready before we
    start the idle loop and schedule.
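
    The ordering the fix enforces can be modelled with two flags; the function
    names follow the commit, the rest is an illustrative stand-in:

```c
#include <stdbool.h>

/* Model of the race window: a CPU must not be usable for scheduling
 * before its stopper thread is enabled. */
static bool stopper_enabled;
static bool cpu_online_flag;

static void stop_machine_unpark(void) { stopper_enabled = true; }

/* After the fix: the stopper is unparked from cpuhp_online_idle(), on the
 * AP, before the CPU is reported up and the idle loop may schedule. */
static void cpuhp_online_idle(void)
{
    stop_machine_unpark();
    cpu_online_flag = true;   /* only now may work be stopped onto this CPU */
}

/* stop_one_cpu() silently "completes" without running anything while the
 * stopper is disabled -- the window the commit closes. Returns true if
 * the requested function would really have run. */
static bool stop_one_cpu(void)
{
    return stopper_enabled;
}
```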

    Reported-by: "Paul E. McKenney"
    Debugged-by: Tejun Heo
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: "Paul E. McKenney"
    Signed-off-by: Sasha Levin

    Peter Zijlstra
     

23 Jan, 2020

1 commit

  • commit dc8d37ed304eeeea47e65fb9edc1c6c8b0093386 upstream.

    When CONFIG_SYSFS is disabled, but CONFIG_HOTPLUG_SMT is enabled,
    the kernel fails to link:

    arch/x86/power/cpu.o: In function `hibernate_resume_nonboot_cpu_disable':
    (.text+0x38d): undefined reference to `cpuhp_smt_enable'
    arch/x86/power/hibernate.o: In function `arch_resume_nosmt':
    hibernate.c:(.text+0x291): undefined reference to `cpuhp_smt_enable'
    hibernate.c:(.text+0x29c): undefined reference to `cpuhp_smt_disable'

    Move the exported functions out of the #ifdef section into one of their
    own with the correct conditions.

    The patch that caused this is marked for stable backports, so
    this one may need to be backported as well.

    Fixes: ec527c318036 ("x86/power: Fix 'nosmt' vs hibernation triple fault during resume")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Jiri Kosina
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20191210195614.786555-1-arnd@arndb.de
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     

04 Nov, 2019

1 commit

  • A kernel module may need to check the value of the "mitigations=" kernel
    command line parameter as part of its setup when the module needs
    to perform software mitigations for a CPU flaw.

    Uninline and export the helper functions surrounding the cpu_mitigations
    enum to allow for their usage from a module.

    Lastly, privatize the enum and cpu_mitigations variable since the value of
    cpu_mitigations can be checked with the exported helper functions.
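
    The resulting interface shape can be sketched as follows. The helper names
    come from the commit; the enum values and surrounding scaffolding are
    illustrative:

```c
#include <stdbool.h>

/* The enum and the variable become private to one translation unit ... */
enum cpu_mitigations {
    CPU_MITIGATIONS_OFF,
    CPU_MITIGATIONS_AUTO,
    CPU_MITIGATIONS_AUTO_NOSMT,
};

static enum cpu_mitigations cpu_mitigations = CPU_MITIGATIONS_AUTO;

/* ... while formerly-inline helpers become real, out-of-line functions
 * that a module can link against (EXPORT_SYMBOL_GPL in the kernel). */
bool cpu_mitigations_off(void)
{
    return cpu_mitigations == CPU_MITIGATIONS_OFF;
}

bool cpu_mitigations_auto_nosmt(void)
{
    return cpu_mitigations == CPU_MITIGATIONS_AUTO_NOSMT;
}
```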

    Signed-off-by: Tyler Hicks
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Thomas Gleixner

    Tyler Hicks
     

25 Jul, 2019

2 commits

  • Re-evaluating the bitmap weight of the online cpus bitmap in every
    invocation of num_online_cpus() over and over is a pretty useless
    exercise. Especially when num_online_cpus() is used in code paths
    like the IPI delivery of x86 or the membarrier code.

    Cache the number of online CPUs in the core and just return the cached
    variable. The accessor function provides only a snapshot when used without
    protection against concurrent CPU hotplug.

    The storage needs to use an atomic_t because the kexec and reboot code
    (ab)use set_cpu_online() in their 'shutdown' handlers without any form of
    serialization as pointed out by Mathieu. Regular CPU hotplug usage is
    properly serialized.
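
    A user-space model of the caching, with a plain int standing in for the
    kernel's atomic_t and a single word standing in for the cpumask:

```c
#include <stdbool.h>

/* set_cpu_online() keeps a counter in sync so num_online_cpus() becomes
 * a plain read instead of a popcount over the online mask on every call. */
static unsigned long cpu_online_mask;   /* one bit per CPU, up to 64 here */
static int nr_online;                   /* atomic_t in the kernel */

static void set_cpu_online(unsigned int cpu, bool online)
{
    unsigned long bit = 1UL << cpu;

    if (online) {
        if (!(cpu_online_mask & bit)) {   /* count each CPU only once */
            cpu_online_mask |= bit;
            nr_online++;
        }
    } else if (cpu_online_mask & bit) {
        cpu_online_mask &= ~bit;
        nr_online--;
    }
}

/* Snapshot only: callers needing a stable value must hold hotplug
 * protection, exactly as the commit message notes. */
static int num_online_cpus(void) { return nr_online; }
```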

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Mathieu Desnoyers
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907091622590.1634@nanos.tec.linutronix.de

    Thomas Gleixner
     
  • The booted once information which is required to deal with the MCE
    broadcast issue on X86 correctly is stored in the per cpu hotplug state,
    which is perfectly fine for the intended purpose.

    X86 needs that information for supporting NMI broadcasting via shortcuts,
    but retrieving it from per cpu data is cumbersome.

    Move it to a cpumask so the information can be checked against the
    cpu_present_mask quickly.

    No functional change intended.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20190722105219.818822855@linutronix.de

    Thomas Gleixner
     

09 Jul, 2019

1 commit

  • Pull SMP/hotplug updates from Thomas Gleixner:
    "A small set of updates for SMP and CPU hotplug:

    - Abort disabling secondary CPUs in the freezer when a wakeup is
    pending instead of evaluating it only after all CPUs have been
    offlined.

    - Remove the shared annotation for the strict per CPU cfd_data in the
    smp function call core code.

    - Remove the return values of smp_call_function() and on_each_cpu()
    as they are unconditionally 0. Fixup the few callers which actually
    bothered to check the return value"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    smp: Remove smp_call_function() and on_each_cpu() return values
    smp: Do not mark call_function_data as shared
    cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending
    cpu/hotplug: Fix notify_cpu_starting() reference in bringup_wait_for_ap()

    Linus Torvalds
     

27 Jun, 2019

1 commit

  • Setting an invalid value via /sys/devices/system/cpu/cpuX/hotplug/fail
    can control the `struct cpuhp_step *sp` address, resulting in the
    following global-out-of-bounds read.

    Reproducer:

    # echo -2 > /sys/devices/system/cpu/cpu0/hotplug/fail

    KASAN report:

    BUG: KASAN: global-out-of-bounds in write_cpuhp_fail+0x2cd/0x2e0
    Read of size 8 at addr ffffffff89734438 by task bash/1941

    CPU: 0 PID: 1941 Comm: bash Not tainted 5.2.0-rc6+ #31
    Call Trace:
    write_cpuhp_fail+0x2cd/0x2e0
    dev_attr_store+0x58/0x80
    sysfs_kf_write+0x13d/0x1a0
    kernfs_fop_write+0x2bc/0x460
    vfs_write+0x1e1/0x560
    ksys_write+0x126/0x250
    do_syscall_64+0xc1/0x390
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f05e4f4c970

    The buggy address belongs to the variable:
    cpu_hotplug_lock+0x98/0xa0

    Memory state around the buggy address:
    ffffffff89734300: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
    ffffffff89734380: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
    >ffffffff89734400: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
    ^
    ffffffff89734480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffffffff89734500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    Add a sanity check for the value written from user space.
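
    The added check can be sketched like this; the bounds constants are
    illustrative stand-ins for the kernel's cpuhp state enum, not its real
    values:

```c
/* Illustrative bounds, standing in for the kernel's cpuhp state range. */
#define CPUHP_OFFLINE   0
#define CPUHP_ONLINE  233   /* hypothetical upper bound for this sketch */

/* Returns 0 on success, -1 (standing in for -EINVAL) on a bad value. */
static int write_cpuhp_fail(int fail)
{
    if (fail < CPUHP_OFFLINE || fail > CPUHP_ONLINE)
        return -1;   /* e.g. "echo -2" is rejected up front instead of
                      * being used to compute an out-of-bounds pointer */
    /* ... store the value and use it for fault injection ... */
    return 0;
}
```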

    Fixes: 1db49484f21ed ("smp/hotplug: Hotplug state fail injection")
    Signed-off-by: Eiichi Tsukata
    Signed-off-by: Thomas Gleixner
    Cc: peterz@infradead.org
    Link: https://lkml.kernel.org/r/20190627024732.31672-1-devel@etsukata.com

    Eiichi Tsukata
     

26 Jun, 2019

1 commit

  • Currently, if the user specifies an unsupported mitigation strategy on the
    kernel command line, it will be ignored silently. The code will fall back
    to the default strategy, possibly leaving the system more vulnerable than
    expected.

    This may happen due to e.g. a simple typo, or, for a stable kernel release,
    because not all mitigation strategies have been backported.

    Inform the user by printing a message.

    Fixes: 98af8452945c5565 ("cpu/speculation: Add 'mitigations=' cmdline option")
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Thomas Gleixner
    Acked-by: Josh Poimboeuf
    Cc: Peter Zijlstra
    Cc: Jiri Kosina
    Cc: Greg Kroah-Hartman
    Cc: Ben Hutchings
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190516070935.22546-1-geert@linux-m68k.org

    Geert Uytterhoeven
     

12 Jun, 2019

1 commit

  • When "deep" suspend is enabled, all CPUs except the primary CPU are frozen
    via CPU hotplug one by one. After all secondary CPUs are unplugged the
    wakeup pending condition is evaluated and if pending the suspend operation
    is aborted and the secondary CPUs are brought up again.

    CPU hotplug is a slow operation, so it makes sense to check for wakeup
    pending in the freezer loop before bringing down the next CPU. This
    improves the system suspend abort latency significantly.

    [ tglx: Massaged changelog and improved printk message ]

    Signed-off-by: Pavankumar Kondeti
    Signed-off-by: Thomas Gleixner
    Cc: "Rafael J. Wysocki"
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: Josh Poimboeuf
    Cc: Peter Zijlstra
    Cc: Konrad Rzeszutek Wilk
    Cc: Jiri Kosina
    Cc: Mukesh Ojha
    Cc: linux-pm@vger.kernel.org
    Link: https://lkml.kernel.org/r/1559536263-16472-1-git-send-email-pkondeti@codeaurora.org

    Pavankumar Kondeti
     

03 Jun, 2019

1 commit

  • As explained in

    0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")

    we always, no matter what, have to bring up x86 HT siblings during boot at
    least once in order to avoid first MCE bringing the system to its knees.

    That means that whenever 'nosmt' is supplied on the kernel command-line,
    all the HT siblings are as a result sitting in mwait or cpuidle after
    going through the online-offline cycle at least once.

    This causes a serious issue though when a kernel, which saw 'nosmt' on its
    commandline, is going to perform resume from hibernation: if the resume
    from the hibernated image is successful, cr3 is flipped in order to point
    to the address space of the kernel that is being resumed, which in turn
    means that all the HT siblings are all of a sudden mwaiting on an
    address which is no longer valid.

    That results in a triple fault shortly after cr3 is switched, and the
    machine reboots.

    Fix this by always waking up all the SMT siblings before initiating the
    'restore from hibernation' process; this guarantees that all the HT
    siblings will be properly carried over to the resumed kernel waiting in
    resume_play_dead(), and acted upon accordingly afterwards, based on the
    target kernel configuration.

    Symmetrically, the resumed kernel has to push the SMT siblings to mwait
    again in case it has SMT disabled; this means it has to online all
    the siblings when resuming (so that they come out of hlt) and offline
    them again to let them reach mwait.

    Cc: stable@vger.kernel.org # v4.19+
    Debugged-by: Thomas Gleixner
    Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
    Signed-off-by: Jiri Kosina
    Acked-by: Pavel Machek
    Reviewed-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Signed-off-by: Rafael J. Wysocki

    Jiri Kosina
     

07 May, 2019

4 commits

  • Pull timer updates from Ingo Molnar:
    "This cycle had the following changes:

    - Timer tracing improvements (Anna-Maria Gleixner)

    - Continued tasklet reduction work: remove the hrtimer_tasklet
    (Thomas Gleixner)

    - Fix CPU hotplug remove race in the tick-broadcast mask handling
    code (Thomas Gleixner)

    - Force upper bound for setting CLOCK_REALTIME, to fix ABI
    inconsistencies with handling values that are close to the maximum
    supported and the vagueness of when uptime related wraparound might
    occur. Make the consistent maximum the year 2232 across all
    relevant ABIs and APIs. (Thomas Gleixner)

    - various cleanups and smaller fixes"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    tick: Fix typos in comments
    tick/broadcast: Fix warning about undefined tick_broadcast_oneshot_offline()
    timekeeping: Force upper bound for setting CLOCK_REALTIME
    timer/trace: Improve timer tracing
    timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps
    timer: Move trace point to get proper index
    tick/sched: Update tick_sched struct documentation
    tick: Remove outgoing CPU from broadcast masks
    timekeeping: Consistently use unsigned int for seqcount snapshot
    softirq: Remove tasklet_hrtimer
    xfrm: Replace hrtimer tasklet with softirq hrtimer
    mac80211_hwsim: Replace hrtimer tasklet with softirq hrtimer

    Linus Torvalds
     
  • Pull CPU hotplug updates from Ingo Molnar:
    "Two changes in this cycle:

    - Make the /sys/devices/system/cpu/smt/* files available on all
    arches, so user space has a consistent way to detect whether SMT is
    enabled.

    - Sparse annotation fix"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    smpboot: Place the __percpu annotation correctly
    cpu/hotplug: Create SMT sysfs interface for all arches

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Make nohz housekeeping processing more permissive and less
    intrusive to isolated CPUs

    - Decouple CPU-bound workqueue accounting from the scheduler and move
    it into the workqueue code.

    - Optimize topology building

    - Better handle quota and period overflows

    - Add more RCU annotations

    - Comment updates, misc cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    nohz_full: Allow the boot CPU to be nohz_full
    sched/isolation: Require a present CPU in housekeeping mask
    kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze
    power/suspend: Add function to disable secondaries for suspend
    sched/core: Allow the remote scheduler tick to be started on CPU0
    sched/nohz: Run NOHZ idle load balancer on HK_FLAG_MISC CPUs
    sched/debug: Fix spelling mistake "logaritmic" -> "logarithmic"
    sched/topology: Update init_sched_domains() comment
    cgroup/cpuset: Update stale generate_sched_domains() comments
    sched/core: Check quota and period overflow at usec to nsec conversion
    sched/core: Handle overflow in cpu_shares_write_u64
    sched/rt: Check integer overflow at usec to nsec conversion
    sched/core: Fix typo in comment
    sched/core: Make some functions static
    sched/core: Unify p->on_rq updates
    sched/core: Remove ttwu_activate()
    sched/core, workqueues: Distangle worker accounting from rq lock
    sched/fair: Remove unneeded prototype of capacity_of()
    sched/topology: Skip duplicate group rewrites in build_sched_groups()
    sched/topology: Fix build_sched_groups() comment
    ...

    Linus Torvalds
     
  • Pull speculation mitigation update from Ingo Molnar:
    "This adds the "mitigations=" bootline option, which offers a
    cross-arch set of options that will work on x86, PowerPC and s390 that
    will map to the arch specific option internally"

    * 'core-speculation-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    s390/speculation: Support 'mitigations=' cmdline option
    powerpc/speculation: Support 'mitigations=' cmdline option
    x86/speculation: Support 'mitigations=' cmdline option
    cpu/speculation: Add 'mitigations=' cmdline option

    Linus Torvalds
     

04 May, 2019

1 commit

  • This patch provides an arch option, ARCH_SUSPEND_NONZERO_CPU, to
    opt-in to allowing suspend to occur on one of the housekeeping CPUs
    rather than hardcoded CPU0.

    This will allow CPU0 to be a nohz_full CPU with a later change.

    It may be possible for platforms with hardware/firmware restrictions
    on suspend/wake to effectively support this by handing off the final
    stage to CPU0 when kernel housekeeping is no longer required. Another
    option is to make the housekeeping / nohz_full mask dynamic at runtime,
    but the complexity could not be justified at this time.

    Signed-off-by: Nicholas Piggin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rafael J . Wysocki
    Cc: Thomas Gleixner
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: https://lkml.kernel.org/r/20190411033448.20842-4-npiggin@gmail.com
    Signed-off-by: Ingo Molnar

    Nicholas Piggin
     

18 Apr, 2019

1 commit

  • Keeping track of the number of mitigations for all the CPU speculation
    bugs has become overwhelming for many users. It's getting more and more
    complicated to decide which mitigations are needed for a given
    architecture. Complicating matters is the fact that each arch tends to
    have its own custom way to mitigate the same vulnerability.

    Most users fall into a few basic categories:

    a) they want all mitigations off;

    b) they want all reasonable mitigations on, with SMT enabled even if
    it's vulnerable; or

    c) they want all reasonable mitigations on, with SMT disabled if
    vulnerable.

    Define a set of curated, arch-independent options, each of which is an
    aggregation of existing options:

    - mitigations=off: Disable all mitigations.

    - mitigations=auto: [default] Enable all the default mitigations, but
    leave SMT enabled, even if it's vulnerable.

    - mitigations=auto,nosmt: Enable all the default mitigations, disabling
    SMT if needed by a mitigation.

    Currently, these options are placeholders which don't actually do
    anything. They will be fleshed out in upcoming patches.
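
    The three accepted values can be sketched as a tiny parser. The option
    strings come from the commit; the enum and fallback behaviour here are
    illustrative (a later commit adds a warning for unrecognised values):

```c
#include <string.h>

enum mitigations { MITIGATIONS_OFF, MITIGATIONS_AUTO, MITIGATIONS_AUTO_NOSMT };

/* Map a "mitigations=" argument to one of the curated options. */
static enum mitigations parse_mitigations(const char *arg)
{
    if (!strcmp(arg, "off"))
        return MITIGATIONS_OFF;
    if (!strcmp(arg, "auto,nosmt"))
        return MITIGATIONS_AUTO_NOSMT;
    /* "auto" -- and, in this sketch, anything unrecognised -- falls
     * back to the default */
    return MITIGATIONS_AUTO;
}
```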

    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Tested-by: Jiri Kosina (on x86)
    Reviewed-by: Jiri Kosina
    Cc: Borislav Petkov
    Cc: "H . Peter Anvin"
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Jiri Kosina
    Cc: Waiman Long
    Cc: Andrea Arcangeli
    Cc: Jon Masters
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux-s390@vger.kernel.org
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-arch@vger.kernel.org
    Cc: Greg Kroah-Hartman
    Cc: Tyler Hicks
    Cc: Linus Torvalds
    Cc: Randy Dunlap
    Cc: Steven Price
    Cc: Phil Auld
    Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com

    Josh Poimboeuf
     

02 Apr, 2019

1 commit

  • Make the /sys/devices/system/cpu/smt/* files available on all arches, so
    user space has a consistent way to detect whether SMT is enabled.

    The 'control' file now shows 'notimplemented' for architectures which
    don't yet have CONFIG_HOTPLUG_SMT.

    [ tglx: Make notimplemented a real state ]

    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Cc: Andrea Arcangeli
    Cc: Waiman Long
    Cc: Peter Zijlstra
    Cc: Jiri Kosina
    Link: https://lkml.kernel.org/r/469c2b98055f2c41e75748e06447d592a64080c9.1553635520.git.jpoimboe@redhat.com

    Josh Poimboeuf
     

28 Mar, 2019

1 commit

  • Tianyu reported a crash in a CPU hotplug teardown callback when booting a
    kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot
    parameter.

    It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken
    forever whenever a bringup callback fails. Unfortunately this issue was
    not recognized when the CPU hotplug code was reworked, so the shortcoming
    just stayed in place.

    When a bringup callback fails, the CPU hotplug code rolls back the
    operation and takes the CPU offline.

    The 'nosmt' command line argument uses a bringup failure to abort the
    bringup of SMT sibling CPUs. This partial bringup is required due to the
    MCE misdesign on Intel CPUs.

    With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but
    CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level
    teardown of a CPU including the synchronizations in various facilities like
    RCU, NOHZ and others.

    As a consequence, the teardown callbacks which must be executed on the
    outgoing CPU within stop machine with interrupts disabled are executed
    on the control CPU with interrupts enabled and in preemptible context,
    causing the kernel to crash and burn. The pre-state-machine code has a
    different failure mode which is more subtle, resulting in a less obvious
    use-after-free crash because the control side frees resources which are
    still in use by the undead CPU.

    But this is not an x86-only problem. Any architecture which supports the
    SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
    likely to be triggered because in 99.99999% of the cases all bringup
    callbacks succeed.

    The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on
    all architectures as the following architectures have either no hotplug
    support at all or not all subarchitectures support it:

    alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).

    Crashing the kernel in such a situation is not an acceptable state
    either.

    Implement a minimal rollback variant by limiting the teardown to the point
    where all regular teardown callbacks have been invoked and leave the CPU in
    the 'dead' idle state. This has the following consequences:

    - the CPU is brought down to the point where the stop_machine takedown
    would happen.

    - the CPU stays there forever and is idle

    - The CPU is cleared in the CPU active mask, but not in the CPU online
    mask which is a legit state.

    - Interrupts are not forced away from the CPU

    - All facilities which only look at online mask would still see it, but
    that is the case during normal hotplug/unplug operations as well. It's
    just a (way) longer time frame.

    This will expose issues which haven't been exposed before, or only
    seldom, because now the normally transient state of being non-active but
    online is a permanent state. In testing this already exposed an issue
    vs. work queues where the vmstat code schedules work on the almost dead
    CPU which ends up in an unbound workqueue and triggers 'preemptible
    context' warnings. This is not a problem of this change; it merely
    exposes an already existing issue. Still, this is better than crashing
    fully without a chance to debug it.

    This is mainly thought as workaround for those architectures which do not
    support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP.
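
    The consequence list above reduces to a small state sketch. The flags
    mirror the active/online masks; the function name and structure are
    illustrative, not the kernel's:

```c
#include <stdbool.h>

struct cpu_state { bool active, online; };

/* Model of the rollback after a failed bringup callback. */
static void rollback_failed_bringup(struct cpu_state *cs, bool hotplug_cpu)
{
    cs->active = false;          /* no new work is scheduled on this CPU */
    if (hotplug_cpu) {
        cs->online = false;      /* full takedown via stop_machine */
        return;
    }
    /* !HOTPLUG_CPU: minimal rollback -- the CPU parks forever in dead
     * idle, cleared from the active mask but intentionally left in the
     * online mask, which the commit calls a legit (if long-lived) state. */
}
```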

    Fixes: 2e1a3483ce74 ("cpu/hotplug: Split out the state walk into functions")
    Reported-by: Tianyu Lan
    Signed-off-by: Thomas Gleixner
    Tested-by: Tianyu Lan
    Acked-by: Greg Kroah-Hartman
    Cc: Konrad Wilk
    Cc: Josh Poimboeuf
    Cc: Mukesh Ojha
    Cc: Peter Zijlstra
    Cc: Jiri Kosina
    Cc: Rik van Riel
    Cc: Andy Lutomirski
    Cc: Michael Kelley
    Cc: "K. Y. Srinivasan"
    Cc: Linus Torvalds
    Cc: Borislav Petkov
    Cc: K. Y. Srinivasan
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190326163811.503390616@linutronix.de

    Thomas Gleixner
     

24 Mar, 2019

1 commit

  • Valentin reported that unplugging a CPU occasionally results in a warning
    in the tick broadcast code which is triggered when an offline CPU is in the
    broadcast mask.

    This happens because the outgoing CPU is not removing itself from the
    broadcast masks, especially not from the broadcast_force_mask. The removal
    happens on the control CPU after the outgoing CPU is dead. It's a long
    standing issue, but the warning is harmless.

    Rework the hotplug mechanism so that the outgoing CPU removes itself from
    the broadcast masks after disabling interrupts and removing itself from the
    online mask.
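
    The reworked ordering can be modelled with plain bitmasks standing in for
    the kernel's cpumasks; the mask names follow the commit, the function is
    an illustrative stand-in:

```c
/* One bit per CPU; words stand in for the kernel's cpumasks. */
static unsigned long online_mask, broadcast_mask, broadcast_force_mask;

/* Outgoing-CPU side of the takedown after the rework: the CPU removes
 * itself from the broadcast masks, with interrupts already disabled,
 * right after leaving the online mask -- so the control CPU can never
 * observe an offline CPU lingering in a broadcast mask. */
static void takedown_self(unsigned int cpu)
{
    unsigned long bit = 1UL << cpu;

    online_mask &= ~bit;
    broadcast_mask &= ~bit;        /* previously cleaned up later, on the */
    broadcast_force_mask &= ~bit;  /* control CPU, after this CPU was dead */
}
```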

    Reported-by: Valentin Schneider
    Signed-off-by: Thomas Gleixner
    Tested-by: Valentin Schneider
    Cc: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903211540180.1784@nanos.tec.linutronix.de

    Thomas Gleixner
     

31 Jan, 2019

1 commit

  • With the following commit:

    73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")

    ... the hotplug code attempted to detect when SMT was disabled by BIOS,
    in which case it reported SMT as permanently disabled. However, that
    code broke a virt hotplug scenario, where the guest is booted with only
    primary CPU threads, and a sibling is brought online later.

    The problem is that there doesn't seem to be a way to reliably
    distinguish between the HW "SMT disabled by BIOS" case and the virt
    "sibling not yet brought online" case. So the above-mentioned commit
    was a bit misguided, as it permanently disabled SMT for both cases,
    preventing future virt sibling hotplugs.

    Going back and reviewing the original problems which were attempted to
    be solved by that commit, when SMT was disabled in BIOS:

    1) /sys/devices/system/cpu/smt/control showed "on" instead of
    "notsupported"; and

    2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.

    I'd propose that we instead consider #1 above to not actually be a
    problem. Because, at least in the virt case, it's possible that SMT
    wasn't disabled by BIOS and a sibling thread could be brought online
    later. So it makes sense to just always default the smt control to "on"
    to allow for that possibility (assuming cpuid indicates that the CPU
    supports SMT).

    The real problem is #2, which has a simple fix: change vmx_vm_init() to
    query the actual current SMT state -- i.e., whether any siblings are
    currently online -- instead of looking at the SMT "control" sysfs value.

    So fix it by:

    a) reverting the original "fix" and its followup fix:

    73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
    bc2d8d262cba ("cpu/hotplug: Fix SMT supported evaluation")

    and

    b) changing vmx_vm_init() to query the actual current SMT state --
    instead of the sysfs control value -- to determine whether the L1TF
    warning is needed. This also requires the 'sched_smt_present'
    variable to be exported, instead of 'cpu_smt_control'.

    Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
    Reported-by: Igor Mammedov
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Cc: Joe Mario
    Cc: Jiri Kosina
    Cc: Peter Zijlstra
    Cc: kvm@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com

    Josh Poimboeuf
     

30 Jan, 2019

1 commit

  • With commit a74cfffb03b7 ("x86/speculation: Rework SMT state change"),
    arch_smt_update() is invoked from each individual CPU hotplug function.

    Therefore the extra arch_smt_update() call in the sysfs SMT control is
    redundant.

    Fixes: a74cfffb03b7 ("x86/speculation: Rework SMT state change")
    Signed-off-by: Zhenzhong Duan
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/e2e064f2-e8ef-42ca-bf4f-76b612964752@default

    Zhenzhong Duan
     

21 Jan, 2019

1 commit

  • Since we've had:

    commit cb538267ea1e ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations")

    we've been getting some lockdep warnings during init, such as on HiKey960:

    [ 0.820495] WARNING: CPU: 4 PID: 0 at kernel/cpu.c:316 lockdep_assert_cpus_held+0x3c/0x48
    [ 0.820498] Modules linked in:
    [ 0.820509] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G S 4.20.0-rc5-00051-g4cae42a #34
    [ 0.820511] Hardware name: HiKey960 (DT)
    [ 0.820516] pstate: 600001c5 (nZCv dAIF -PAN -UAO)
    [ 0.820520] pc : lockdep_assert_cpus_held+0x3c/0x48
    [ 0.820523] lr : lockdep_assert_cpus_held+0x38/0x48
    [ 0.820526] sp : ffff00000a9cbe50
    [ 0.820528] x29: ffff00000a9cbe50 x28: 0000000000000000
    [ 0.820533] x27: 00008000b69e5000 x26: ffff8000bff4cfe0
    [ 0.820537] x25: ffff000008ba69e0 x24: 0000000000000001
    [ 0.820541] x23: ffff000008fce000 x22: ffff000008ba70c8
    [ 0.820545] x21: 0000000000000001 x20: 0000000000000003
    [ 0.820548] x19: ffff00000a35d628 x18: ffffffffffffffff
    [ 0.820552] x17: 0000000000000000 x16: 0000000000000000
    [ 0.820556] x15: ffff00000958f848 x14: 455f3052464d4d34
    [ 0.820559] x13: 00000000769dde98 x12: ffff8000bf3f65a8
    [ 0.820564] x11: 0000000000000000 x10: ffff00000958f848
    [ 0.820567] x9 : ffff000009592000 x8 : ffff00000958f848
    [ 0.820571] x7 : ffff00000818ffa0 x6 : 0000000000000000
    [ 0.820574] x5 : 0000000000000000 x4 : 0000000000000001
    [ 0.820578] x3 : 0000000000000000 x2 : 0000000000000001
    [ 0.820582] x1 : 00000000ffffffff x0 : 0000000000000000
    [ 0.820587] Call trace:
    [ 0.820591] lockdep_assert_cpus_held+0x3c/0x48
    [ 0.820598] static_key_enable_cpuslocked+0x28/0xd0
    [ 0.820606] arch_timer_check_ool_workaround+0xe8/0x228
    [ 0.820610] arch_timer_starting_cpu+0xe4/0x2d8
    [ 0.820615] cpuhp_invoke_callback+0xe8/0xd08
    [ 0.820619] notify_cpu_starting+0x80/0xb8
    [ 0.820625] secondary_start_kernel+0x118/0x1d0

    We've also had a similar warning in sched_init_smp() for every
    asymmetric system that would enable the sched_asym_cpucapacity static
    key, although that was singled out in:

    commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()")

    Those warnings are actually harmless, since we cannot have hotplug
    operations at the time they appear. Instead of starting to sprinkle
    useless hotplug lock operations in the init codepaths, mute the
    warnings until they start warning about real problems.
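    The muting described above can be sketched in a standalone C mock: the
    assertion simply returns early while the system is still booting, since no
    hotplug operation can race at that point. All names here are illustrative
    stand-ins for the kernel state, not kernel source.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical mock of the relevant kernel state. */
    enum system_states { SYSTEM_BOOTING, SYSTEM_SCHEDULING, SYSTEM_RUNNING };
    static enum system_states system_state = SYSTEM_BOOTING;
    static bool lock_held;   /* whether cpu_hotplug_lock is actually held */
    static int warnings;     /* count of lockdep complaints emitted */

    /* Early in boot no hotplug can race, so the assertion is muted until
     * the system reaches SYSTEM_RUNNING (mirrors the idea of the fix). */
    static void lockdep_assert_cpus_held(void)
    {
        if (system_state < SYSTEM_RUNNING)
            return;
        if (!lock_held)
            warnings++;
    }
    ```

    With this guard, the init-time callers (secondary CPU bringup, sched_init_smp())
    no longer trip the warning, while post-boot misuse is still caught.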

    Suggested-by: Peter Zijlstra
    Signed-off-by: Valentin Schneider
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: cai@gmx.us
    Cc: daniel.lezcano@linaro.org
    Cc: dietmar.eggemann@arm.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: longman@redhat.com
    Cc: marc.zyngier@arm.com
    Cc: mark.rutland@arm.com
    Link: https://lkml.kernel.org/r/1545243796-23224-2-git-send-email-valentin.schneider@arm.com
    Signed-off-by: Ingo Molnar

    Valentin Schneider
     

28 Nov, 2018

1 commit

  • arch_smt_update() is only called when the sysfs SMT control knob is
    changed. This means that when SMT is enabled in the sysfs control knob the
    system is considered to have SMT active even if all siblings are offline.

    To allow finegrained control of the speculation mitigations, the actual SMT
    state is more interesting than the fact that siblings could be enabled.

    Rework the code, so arch_smt_update() is invoked from each individual CPU
    hotplug function, and simplify the update function while at it.
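    The reworked behavior can be sketched as a minimal standalone mock: each
    hotplug operation recomputes the SMT-active state from the siblings actually
    online, instead of trusting the sysfs knob. Function and variable names are
    illustrative, not the kernel's.

    ```c
    #include <stdbool.h>

    /* Standalone sketch: arch_smt_update() runs on every hotplug operation
     * so "SMT active" tracks the real sibling state, not the control knob. */
    static int online_siblings;
    static bool smt_active;

    static void arch_smt_update(void)
    {
        smt_active = online_siblings > 1;   /* actual state, not the knob */
    }

    static void cpu_up(void)   { online_siblings++; arch_smt_update(); }
    static void cpu_down(void) { online_siblings--; arch_smt_update(); }
    ```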

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Linus Torvalds
    Cc: Jiri Kosina
    Cc: Tom Lendacky
    Cc: Josh Poimboeuf
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Cc: Tim Chen
    Cc: Andi Kleen
    Cc: Dave Hansen
    Cc: Casey Schaufler
    Cc: Asit Mallick
    Cc: Arjan van de Ven
    Cc: Jon Masters
    Cc: Waiman Long
    Cc: Greg KH
    Cc: Dave Stewart
    Cc: Kees Cook
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de

    Thomas Gleixner
     

24 Oct, 2018

1 commit

  • Pull x86 pti updates from Ingo Molnar:
    "The main changes:

    - Make the IBPB barrier more strict and add STIBP support (Jiri
    Kosina)

    - Micro-optimize and clean up the entry code (Andy Lutomirski)

    - ... plus misc other fixes"

    * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/speculation: Propagate information about RSB filling mitigation to sysfs
    x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
    x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
    x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant
    x86/CPU: Fix unused variable warning when !CONFIG_IA32_EMULATION
    x86/pti/64: Remove the SYSCALL64 entry trampoline
    x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch space
    x86/entry/64: Document idtentry

    Linus Torvalds
     

23 Oct, 2018

1 commit

  • Pull scheduler updates from Ingo Molnar:
    "The main changes are:

    - Migrate CPU-intense 'misfit' tasks on asymmetric capacity systems,
    to better utilize (much) faster 'big core' CPUs. (Morten Rasmussen,
    Valentin Schneider)

    - Topology handling improvements, in particular when CPU capacity
    changes and related load-balancing fixes/improvements (Morten
    Rasmussen)

    - ... plus misc other improvements, fixes and updates"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
    sched/completions/Documentation: Add recommendation for dynamic and ONSTACK completions
    sched/completions/Documentation: Clean up the document some more
    sched/completions/Documentation: Fix a couple of punctuation nits
    cpu/SMT: State SMT is disabled even with nosmt and without "=force"
    sched/core: Fix comment regarding nr_iowait_cpu() and get_iowait_load()
    sched/fair: Remove setting task's se->runnable_weight during PELT update
    sched/fair: Disable LB_BIAS by default
    sched/pelt: Fix warning and clean up IRQ PELT config
    sched/topology: Make local variables static
    sched/debug: Use symbolic names for task state constants
    sched/numa: Remove unused numa_stats::nr_running field
    sched/numa: Remove unused code from update_numa_stats()
    sched/debug: Explicitly cast sched_feat() to bool
    sched/core: Disable SD_PREFER_SIBLING on asymmetric CPU capacity domains
    sched/fair: Don't move tasks to lower capacity CPUs unless necessary
    sched/fair: Set rq->rd->overload when misfit
    sched/fair: Wrap rq->rd->overload accesses with READ/WRITE_ONCE()
    sched/core: Change root_domain->overload type to int
    sched/fair: Change 'prefer_sibling' type to bool
    sched/fair: Kick nohz balance if rq->misfit_task_load
    ...

    Linus Torvalds
     

05 Oct, 2018

1 commit

  • When booting with "nosmt=force" a message is issued into dmesg to
    confirm that SMT has been force-disabled but such a message is not
    issued when only "nosmt" is on the kernel command line.

    Fix that.

    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20181004172227.10094-1-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

26 Sep, 2018

1 commit

  • STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
    (once enabled) prevents cross-hyperthread control of decisions made by
    indirect branch predictors.

    Enable this feature if

    - the CPU is vulnerable to spectre v2
    - the CPU supports SMT and has SMT siblings online
    - spectre_v2 mitigation autoselection is enabled (default)

    After some previous discussion, this leaves STIBP on all the time, as wrmsr
    on crossing kernel boundary is a no-no. This could perhaps later be a bit
    more optimized (like disabling it in NOHZ, experiment with disabling it in
    idle, etc) if needed.

    Note that the synchronization of the mask manipulation via newly added
    spec_ctrl_mutex is currently not strictly needed, as the only updater is
    already being serialized by cpu_add_remove_lock, but let's make this a
    little bit more future-proof.
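    The three enable conditions listed above reduce to a single conjunction,
    sketched here with an illustrative helper (the function and parameter names
    are hypothetical, not kernel API):

    ```c
    #include <stdbool.h>

    /* Sketch of the STIBP enable condition described in the changelog. */
    static bool stibp_needed(bool vulnerable_to_spectre_v2,
                             bool smt_siblings_online,
                             bool mitigation_autoselect)
    {
        return vulnerable_to_spectre_v2 &&
               smt_siblings_online &&
               mitigation_autoselect;
    }
    ```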

    Signed-off-by: Jiri Kosina
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Josh Poimboeuf
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Cc: Andi Kleen
    Cc: Tim Chen
    Cc: Casey Schaufler
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm

    Jiri Kosina
     

12 Sep, 2018

1 commit

  • Anybody trying to assert the cpu_hotplug_lock is held (lockdep_assert_cpus_held())
    from AP callbacks will fail, because the lock is held by the BP.

    Stick in an explicit annotation in cpuhp_thread_fun() to make this work.
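    The idea can be mocked in a few lines: the BP owns cpu_hotplug_lock, so the
    AP thread records an annotation saying the lock is held on its behalf, and
    the assertion accepts either. This is a conceptual sketch with illustrative
    names; the real fix annotates the lock's lockdep dep_map.

    ```c
    #include <stdbool.h>

    static bool held_by_me;   /* this thread actually owns the lock */
    static bool annotated;    /* lock held on our behalf (by the BP) */

    /* The assertion succeeds if the lock is held either way. */
    static bool assert_cpus_held(void)
    {
        return held_by_me || annotated;
    }

    static void cpuhp_thread_fun(void)
    {
        annotated = true;     /* the explicit annotation added by the fix */
        /* ... run AP callbacks, which may call assert_cpus_held() ... */
        annotated = false;
    }
    ```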

    Reported-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-tip-commits@vger.kernel.org
    Fixes: cb538267ea1e ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations")
    Link: http://lkml.kernel.org/r/20180911095127.GT24082@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

06 Sep, 2018

2 commits

  • When a teardown callback fails, the CPU hotplug code brings the CPU back to
    the previous state. The previous state becomes the new target state. The
    rollback happens in undo_cpu_down() which increments the state
    unconditionally even if the state is already the same as the target.

    As a consequence the next CPU hotplug operation will start at the wrong
    state. This is easy to observe when __cpu_disable() fails.

    Prevent the unconditional undo by checking the state vs. target before
    incrementing state and fix up the consequently wrong conditional in the
    unplug code which handles the failure of the final CPU take down on the
    control CPU side.
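    The guarded rollback can be sketched standalone: undo_cpu_down() steps the
    state only while it still differs from the target, whereas the pre-fix
    version incremented unconditionally and could overshoot. Names are
    illustrative, not the kernel's exact code.

    ```c
    /* Sketch of the fixed rollback step. */
    static int state, target;

    static void undo_cpu_down(void)
    {
        if (state != target)   /* the fix: previously unconditional */
            state++;
    }
    ```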

    Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
    Reported-by: Neeraj Upadhyay
    Signed-off-by: Thomas Gleixner
    Tested-by: Geert Uytterhoeven
    Tested-by: Sudeep Holla
    Tested-by: Neeraj Upadhyay
    Cc: josh@joshtriplett.org
    Cc: peterz@infradead.org
    Cc: jiangshanlai@gmail.com
    Cc: dzickus@redhat.com
    Cc: brendan.jackman@arm.com
    Cc: malat@debian.org
    Cc: sramana@codeaurora.org
    Cc: linux-arm-msm@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1809051419580.1416@nanos.tec.linutronix.de


    Thomas Gleixner
     
  • The smp_mb() in cpuhp_thread_fun() is misplaced. It needs to be after the
    load of st->should_run to prevent reordering of the later load/stores
    w.r.t. the load of st->should_run.
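    In C11 atomics the corrected ordering looks roughly like this: the full
    barrier sits *after* the load of should_run, so later loads and stores in
    the thread function cannot be reordered before it. This is an illustrative
    sketch, not the kernel's smp_mb()-based code.

    ```c
    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_bool should_run;

    static bool read_should_run(void)
    {
        bool run = atomic_load_explicit(&should_run, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);  /* plays the smp_mb() role */
        return run;
    }
    ```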

    Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
    Signed-off-by: Neeraj Upadhyay
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: josh@joshtriplett.org
    Cc: peterz@infradead.org
    Cc: jiangshanlai@gmail.com
    Cc: dzickus@redhat.com
    Cc: brendan.jackman@arm.com
    Cc: malat@debian.org
    Cc: mojha@codeaurora.org
    Cc: sramana@codeaurora.org
    Cc: linux-arm-msm@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1536126727-11629-1-git-send-email-neeraju@codeaurora.org

    Neeraj Upadhyay
     

31 Aug, 2018

1 commit

  • When notifiers were there, `skip_onerr` was used to avoid calling
    particular step startup/teardown callbacks in the CPU up/down rollback
    path, which made the hotplug asymmetric.

    As notifiers are gone now after the full state machine conversion, the
    `skip_onerr` field is no longer required.

    Remove it from the structure and its usage.

    Signed-off-by: Mukesh Ojha
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/1535439294-31426-1-git-send-email-mojha@codeaurora.org

    Mukesh Ojha
     

15 Aug, 2018

3 commits

  • Commit 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
    breaks non-SMP builds.

    [ I suspect the 'bool' fields should just be made to be bitfields and be
    exposed regardless of configuration, but that's a separate cleanup
    that I'll leave to the owners of this file for later. - Linus ]

    Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
    Cc: Dave Hansen
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Signed-off-by: Abel Vesa
    Signed-off-by: Linus Torvalds

    Abel Vesa
     
  • Pull power management updates from Rafael Wysocki:
    "These add a new framework for CPU idle time injection, to be used by
    all of the idle injection code in the kernel in the future, fix some
    issues and add a number of relatively small extensions in multiple
    places.

    Specifics:

    - Add a new framework for CPU idle time injection (Daniel Lezcano).

    - Add AVS support to the armada-37xx cpufreq driver (Gregory
    CLEMENT).

    - Add support for current CPU frequency reporting to the ACPI CPPC
    cpufreq driver (George Cherian).

    - Rework the cooling device registration in the imx6q/thermal driver
    (Bastian Stender).

    - Make the pcc-cpufreq driver refuse to work with dynamic scaling
    governors on systems with many CPUs to avoid scalability issues
    with it (Rafael Wysocki).

    - Fix the intel_pstate driver to report different maximum CPU
    frequencies on systems where they really are different and to
    ignore the turbo active ratio if hardware-managed P-states (HWP)
    are in use; make it use the match_string() helper (Xie Yisheng,
    Srinivas Pandruvada).

    - Fix a minor deferred probe issue in the qcom-kryo cpufreq driver
    (Niklas Cassel).

    - Add a tracepoint for the tracking of frequency limits changes (from
    Android) to the cpufreq core (Ruchi Kandoi).

    - Fix a circular lock dependency between CPU hotplug and sysfs
    locking in the cpufreq core reported by lockdep (Waiman Long).

    - Avoid excessive error reports on driver registration failures in
    the ARM cpuidle driver (Sudeep Holla).

    - Add a new device links flag to the driver core to make links go
    away automatically on supplier driver removal (Vivek Gautam).

    - Eliminate potential race condition between system-wide power
    management transitions and system shutdown (Pingfan Liu).

    - Add a quirk to save NVS memory on system suspend for the ASUS 1025C
    laptop (Willy Tarreau).

    - Make more systems use suspend-to-idle (instead of ACPI S3) by
    default (Tristian Celestin).

    - Get rid of stack VLA usage in the low-level hibernation code on
    64-bit x86 (Kees Cook).

    - Fix error handling in the hibernation core and mark an expected
    fall-through switch in it (Chengguang Xu, Gustavo Silva).

    - Extend the generic power domains (genpd) framework to support
    attaching a device to a power domain by name (Ulf Hansson).

    - Fix device reference counting and user limits initialization in the
    devfreq core (Arvind Yadav, Matthias Kaehlcke).

    - Fix a few issues in the rk3399_dmc devfreq driver and improve its
    documentation (Enric Balletbo i Serra, Lin Huang, Nick Milner).

    - Drop a redundant error message from the exynos-ppmu devfreq driver
    (Markus Elfring)"

    * tag 'pm-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits)
    PM / reboot: Eliminate race between reboot and suspend
    PM / hibernate: Mark expected switch fall-through
    cpufreq: intel_pstate: Ignore turbo active ratio in HWP
    cpufreq: Fix a circular lock dependency problem
    cpu/hotplug: Add a cpus_read_trylock() function
    x86/power/hibernate_64: Remove VLA usage
    cpufreq: trace frequency limits change
    cpufreq: intel_pstate: Show different max frequency with turbo 3 and HWP
    cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems
    cpufreq: qcom-kryo: Silently error out on EPROBE_DEFER
    cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
    cpufreq: armada-37xx: Add AVS support
    dt-bindings: marvell: Add documentation for the Armada 3700 AVS binding
    PM / devfreq: rk3399_dmc: Fix duplicated opp table on reload.
    PM / devfreq: Init user limits from OPP limits, not viceversa
    PM / devfreq: rk3399_dmc: fix spelling mistakes.
    PM / devfreq: rk3399_dmc: do not print error when get supply and clk defer.
    dt-bindings: devfreq: rk3399_dmc: move interrupts to be optional.
    PM / devfreq: rk3399_dmc: remove wait for dcf irq event.
    dt-bindings: clock: add rk3399 DDR3 standard speed bins.
    ...

    Linus Torvalds
     
  • Merge L1 Terminal Fault fixes from Thomas Gleixner:
    "L1TF, aka L1 Terminal Fault, is yet another speculative hardware
    engineering trainwreck. It's a hardware vulnerability which allows
    unprivileged speculative access to data which is available in the
    Level 1 Data Cache when the page table entry controlling the virtual
    address, which is used for the access, has the Present bit cleared or
    other reserved bits set.

    If an instruction accesses a virtual address for which the relevant
    page table entry (PTE) has the Present bit cleared or other reserved
    bits set, then speculative execution ignores the invalid PTE and loads
    the referenced data if it is present in the Level 1 Data Cache, as if
    the page referenced by the address bits in the PTE was still present
    and accessible.

    While this is a purely speculative mechanism and the instruction will
    raise a page fault when it is retired eventually, the pure act of
    loading the data and making it available to other speculative
    instructions opens up the opportunity for side channel attacks to
    unprivileged malicious code, similar to the Meltdown attack.

    While Meltdown breaks the user space to kernel space protection, L1TF
    allows attacking any physical memory address in the system, and the
    attack works across all protection domains. It allows attacks on SGX
    and also works from inside virtual machines because the speculation
    bypasses the extended page table (EPT) protection mechanism.

    The associated CVEs are: CVE-2018-3615, CVE-2018-3620, CVE-2018-3646

    The mitigations provided by this pull request include:

    - Host side protection by inverting the upper address bits of a non
    present page table entry so the entry points to uncacheable memory.

    - Hypervisor protection by flushing L1 Data Cache on VMENTER.

    - SMT (HyperThreading) control knobs, which allow to 'turn off' SMT
    by offlining the sibling CPU threads. The knobs are available on
    the kernel command line and at runtime via sysfs

    - Control knobs for the hypervisor mitigation, related to L1D flush
    and SMT control. The knobs are available on the kernel command line
    and at runtime via sysfs

    - Extensive documentation about L1TF including various degrees of
    mitigations.

    Thanks to all people who have contributed to this in various ways -
    patches, review, testing, backporting - and the fruitful, sometimes
    heated, but at the end constructive discussions.

    There is work in progress to provide other forms of mitigations, which
    might be less horrible performance wise for a particular kind of
    workloads, but this is not yet ready for consumption due to their
    complexity and limitations"

    * 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
    x86/microcode: Allow late microcode loading with SMT disabled
    tools headers: Synchronise x86 cpufeatures.h for L1TF additions
    x86/mm/kmmio: Make the tracer robust against L1TF
    x86/mm/pat: Make set_memory_np() L1TF safe
    x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
    x86/speculation/l1tf: Invert all not present mappings
    cpu/hotplug: Fix SMT supported evaluation
    KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
    x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
    x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
    Documentation/l1tf: Remove Yonah processors from not vulnerable list
    x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
    x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
    x86: Don't include linux/irq.h from asm/hardirq.h
    x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
    x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
    x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
    x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
    x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
    cpu/hotplug: detect SMT disabled by BIOS
    ...

    Linus Torvalds
     

14 Aug, 2018

1 commit