04 Oct, 2018

5 commits

  • [ Upstream commit 9b2e0388bec8ec5427403e23faff3b58dd1c3200 ]

    When sockmap code is using the stream parser it also handles the write
    space events in order to handle the case where (a) verdict redirects
    skb to another socket and (b) the sockmap then sends the skb but due
    to memory constraints (or other EAGAIN errors) needs to do a retry.

    But the initial code missed a third case, where
    skb_send_sock_locked() triggers an sk_wait_event(). A typical case
    is when the sndbuf size is exceeded. When this happens, because we
    do not pass the write_space event to the lower layers, we never wake
    up the waiter and it sleeps until sndtimeo expires, which, as noted
    in the ktls fix, may be rather large and look like a hang to the user.

    To reproduce this, the best test is to reduce the sndbuf size and
    send 1B data chunks to stress the memory handling. To fix it, pass the
    write_space event from the upper layer to the lower layer.
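
    In rough outline the change looks like the following sketch; the psock
    field and helper names here are illustrative, not the exact upstream
    sockmap code:

        static void smap_write_space(struct sock *sk)
        {
                struct smap_psock *psock = smap_psock_sk(sk);   /* illustrative lookup */

                /* existing behaviour: kick the retry/transmit worker */
                if (psock)
                        schedule_work(&psock->tx_work);

                /* new: forward the event so sk_stream_wait_memory() waiters wake up */
                if (psock && psock->save_write_space)
                        psock->save_write_space(sk);
        }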

    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    John Fastabend
     
  • [ Upstream commit 9f2d1e68cf4d641def734adaccfc3823d3575e6c ]

    Livepatch modules are special in that we preserve their entire symbol
    tables in order to be able to apply relocations after module load. The
    unwanted side effect of this is that undefined (SHN_UNDEF) symbols of
    livepatch modules are accessible via the kallsyms api and this can
    confuse symbol resolution in livepatch (klp_find_object_symbol()) and
    cause subtle bugs in livepatch.

    Have the module kallsyms api skip over SHN_UNDEF symbols. These symbols
    are usually not available for normal modules anyway as we cut down their
    symbol tables to just the core (non-undefined) symbols, so this should
    really just affect livepatch modules. Note that this patch doesn't
    affect the display of undefined symbols in /proc/kallsyms.
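
    The core of the change can be illustrated with a user-space sketch of
    walking a symbol table while skipping undefined entries; the kernel's
    own iteration lives in kernel/module.c and differs in detail:

        #include <elf.h>
        #include <stdio.h>

        /* Print only the symbols actually defined in this object;
         * SHN_UNDEF entries are skipped, mirroring the kallsyms change. */
        static void walk_symbols(const Elf64_Sym *syms, unsigned int num,
                                 const char *strtab)
        {
                for (unsigned int i = 0; i < num; i++) {
                        if (syms[i].st_shndx == SHN_UNDEF)
                                continue;   /* undefined here, resolved elsewhere */
                        printf("%s\n", strtab + syms[i].st_name);
                }
        }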

    Reported-by: Josh Poimboeuf
    Tested-by: Josh Poimboeuf
    Reviewed-by: Josh Poimboeuf
    Signed-off-by: Jessica Yu
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jessica Yu
     
  • [ Upstream commit 78c9c4dfbf8c04883941445a195276bb4bb92c76 ]

    The posix timer overrun handling is broken because the forwarding functions
    can return a huge number of overruns which does not fit in an int. As a
    consequence timer_getoverrun(2) and siginfo::si_overrun can turn into
    random number generators.

    The k_clock::timer_forward() callbacks return a 64 bit value now. Make
    k_itimer::ti_overrun[_last] 64 bit as well, so the kernel internal
    accounting is correct. Remove the temporary (int) casts.

    Add a helper function which clamps the overrun value returned to user
    space via timer_getoverrun(2) or siginfo::si_overrun to a positive value
    between 0 and INT_MAX. INT_MAX is an indicator for user space that the
    overrun value has been clamped.
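
    A minimal sketch of such a clamp, assuming the internal counter is a
    signed 64 bit value (the in-kernel helper name and signature may
    differ):

        #include <limits.h>
        #include <stdint.h>

        /* Clamp a 64 bit internal overrun count to the int seen by user
         * space; INT_MAX signals that clamping happened. */
        static int overrun_to_user(int64_t overrun)
        {
                if (overrun < 0)
                        return 0;
                if (overrun > INT_MAX)
                        return INT_MAX;
                return (int)overrun;
        }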

    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Michael Kerrisk
    Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • [ Upstream commit 6fec64e1c92d5c715c6d0f50786daa7708266bde ]

    The posix timer ti_overrun handling is broken because the forwarding
    functions can return a huge number of overruns which does not fit in an
    int. As a consequence timer_getoverrun(2) and siginfo::si_overrun can turn
    into random number generators.

    As a first step to address that let the timer_forward() callbacks return
    the full 64 bit value.

    Cast it to (int) temporarily until k_itimer::ti_overrun is converted to
    64bit and the conversion to user space visible values is sanitized.

    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Michael Kerrisk
    Link: https://lkml.kernel.org/r/20180626132704.922098090@linutronix.de
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • [ Upstream commit 5f936e19cc0ef97dbe3a56e9498922ad5ba1edef ]

    Air Icy reported:

    UBSAN: Undefined behaviour in kernel/time/alarmtimer.c:811:7
    signed integer overflow:
    1529859276030040771 + 9223372036854775807 cannot be represented in type 'long long int'
    Call Trace:
    alarm_timer_nsleep+0x44c/0x510 kernel/time/alarmtimer.c:811
    __do_sys_clock_nanosleep kernel/time/posix-timers.c:1235 [inline]
    __se_sys_clock_nanosleep kernel/time/posix-timers.c:1213 [inline]
    __x64_sys_clock_nanosleep+0x326/0x4e0 kernel/time/posix-timers.c:1213
    do_syscall_64+0xb8/0x3a0 arch/x86/entry/common.c:290

    alarm_timer_nsleep() uses ktime_add() to add the current time and the
    relative expiry value. ktime_add() has no sanity checks so the addition
    can overflow when the relative timeout is large enough.

    Use ktime_add_safe() which has the necessary sanity checks in place and
    limits the result to the valid range.
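
    The effect of ktime_add_safe() can be sketched as a saturating
    addition; KTIME_MAX below is a stand-in for the kernel's upper limit
    and the real helper handles a few more corner cases:

        #include <stdint.h>

        #define KTIME_MAX INT64_MAX   /* assumption: stand-in for the kernel limit */

        /* Saturate instead of wrapping when the relative timeout is huge;
         * assumes non-negative operands, as in the nsleep path above. */
        static int64_t add_safe_ns(int64_t now, int64_t rel)
        {
                if (rel > KTIME_MAX - now)
                        return KTIME_MAX;
                return now + rel;
        }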

    Fixes: 9a7adcf5c6de ("timers: Posix interface for alarm-timers")
    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1807020926360.1595@nanos.tec.linutronix.de
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

29 Sep, 2018

3 commits

  • Commit 0a0e0829f990 ("nohz: Fix missing tick reprogram when interrupting an
    inline softirq") got backported to stable trees and now causes the NOHZ
    softirq pending warning to trigger. It's not an upstream issue as the NOHZ
    update logic has been changed there.

    The problem occurs when a softirq-disabled section gets interrupted and,
    on return from interrupt, the tick/nohz state is evaluated, which can
    then observe pending soft interrupts. These soft interrupts are legitimately
    pending because they cannot be processed as long as soft interrupts are
    disabled and the interrupted code will correctly process them when soft
    interrupts are reenabled.

    Add a check for softirqs disabled to the pending check to prevent the
    warning.

    Reported-by: Grygorii Strashko
    Reported-by: John Crispin
    Signed-off-by: Thomas Gleixner
    Tested-by: Grygorii Strashko
    Tested-by: John Crispin
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Anna-Maria Gleixner
    Cc: stable@vger.kernel.org
    Fixes: 2d898915ccf4838c ("nohz: Fix missing tick reprogram when interrupting an inline softirq")
    Acked-by: Frederic Weisbecker
    Tested-by: Geert Uytterhoeven

    Thomas Gleixner
     
  • commit d0cdb3ce8834332d918fc9c8ff74f8a169ec9abe upstream.

    When a task which previously ran on a given CPU is remotely queued to
    wake up on that same CPU, there is a period where the task's state is
    TASK_WAKING and its vruntime is not normalized. This is not accounted
    for in vruntime_normalized() which will cause an error in the task's
    vruntime if it is switched from the fair class during this time.

    For example if it is boosted to RT priority via rt_mutex_setprio(),
    rq->min_vruntime will not be subtracted from the task's vruntime but
    it will be added again when the task returns to the fair class. The
    task's vruntime will have been erroneously doubled and the effective
    priority of the task will be reduced.

    Note this will also lead to inflation of all vruntimes since the doubled
    vruntime value will become the rq's min_vruntime when other tasks leave
    the rq. This leads to repeated doubling of the vruntime and priority
    penalty.

    Fix this by recognizing a WAKING task's vruntime as normalized only if
    sched_remote_wakeup is true. This indicates a migration, in which case
    the vruntime would have been normalized in migrate_task_rq_fair().
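
    The relevant check, roughly as described above (a sketch of the logic
    in vruntime_normalized(), not a verbatim copy of the patch):

        /*
         * A WAKING task only has a normalized vruntime if it was a remote
         * wakeup, i.e. migrate_task_rq_fair() already removed the source
         * runqueue's min_vruntime.
         */
        if (!se->sum_exec_runtime ||
            (p->state == TASK_WAKING && p->sched_remote_wakeup))
                return true;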

    Based on a similar patch from John Dias.

    Suggested-by: Peter Zijlstra
    Tested-by: Dietmar Eggemann
    Signed-off-by: Steve Muckle
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Chris Redpath
    Cc: John Dias
    Cc: Linus Torvalds
    Cc: Miguel de Dios
    Cc: Morten Rasmussen
    Cc: Patrick Bellasi
    Cc: Paul Turner
    Cc: Quentin Perret
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: kernel-team@android.com
    Fixes: b5179ac70de8 ("sched/fair: Prepare to fix fairness problems on migration")
    Link: http://lkml.kernel.org/r/20180831224217.169476-1-smuckle@google.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Steve Muckle
     
  • commit 83f365554e47997ec68dc4eca3f5dce525cd15c3 upstream.

    When reducing ring buffer size, pages are removed by scheduling a work
    item on each CPU for the corresponding CPU ring buffer. After the pages
    are removed from ring buffer linked list, the pages are free()d in a
    tight loop. The loop does not give up CPU until all pages are removed.
    In the worst case, when a lot of pages are to be freed, it can
    cause a system stall.

    After the pages are removed from the list, the free() can happen while
    the work is rescheduled. Call cond_resched() in the loop to prevent the
    system hangup.
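
    A sketch of the freeing loop after the change; the helper and list
    names are illustrative, the point is the cond_resched() per
    iteration:

        struct buffer_page *bpage, *tmp;

        list_for_each_entry_safe(bpage, tmp, &pages_removed, list) {
                list_del_init(&bpage->list);
                free_buffer_page(bpage);
                /* Give up the CPU between frees so a large shrink
                 * cannot stall the system. */
                cond_resched();
        }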

    Link: http://lkml.kernel.org/r/20180907223129.71994-1-vnagarnaik@google.com

    Cc: stable@vger.kernel.org
    Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic")
    Reported-by: Jason Behmer
    Signed-off-by: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Vaibhav Nagarnaik
     

26 Sep, 2018

4 commits

  • [ Upstream commit 8fe5c5a937d0f4e84221631833a2718afde52285 ]

    When a new task wakes up for the first time, its initial utilization
    is set to half of the spare capacity of its CPU. The current
    implementation of post_init_entity_util_avg() uses SCHED_CAPACITY_SCALE
    directly as a capacity reference. As a result, on a big.LITTLE system, a
    new task waking up on an idle little CPU will be given ~512 of util_avg,
    even if the CPU's capacity is significantly less than that.

    Fix this by computing the spare capacity with arch_scale_cpu_capacity().
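
    Sketch of the computation after the fix; the two-argument
    arch_scale_cpu_capacity() form is assumed from kernels of that era and
    the exact expression is illustrative:

        int cpu = cpu_of(rq_of(cfs_rq));
        long cpu_scale = arch_scale_cpu_capacity(NULL, cpu);
        long cap = (long)(cpu_scale - cfs_rq->avg.util_avg) / 2;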

    Signed-off-by: Quentin Perret
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Vincent Guittot
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dietmar.eggemann@arm.com
    Cc: morten.rasmussen@arm.com
    Cc: patrick.bellasi@arm.com
    Link: http://lkml.kernel.org/r/20180612112215.25448-1-quentin.perret@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Quentin Perret
     
  • [ Upstream commit 76e079fefc8f62bd9b2cd2950814d1ee806e31a5 ]

    wake_woken_function() synchronizes with wait_woken() as follows:

    [wait_woken]                        [wake_woken_function]

    entry->flags &= ~wq_flag_woken;     condition = true;
    smp_mb();                           smp_wmb();
    if (condition)                      wq_entry->flags |= wq_flag_woken;
        break;

    This commit replaces the above smp_wmb() with an smp_mb() in order to
    guarantee that either wait_woken() sees the wait condition being true
    or the store to wq_entry->flags in woken_wake_function() follows the
    store in wait_woken() in the coherence order (so that the former can
    eventually be observed by wait_woken()).

    The commit also fixes a comment associated to set_current_state() in
    wait_woken(): the comment pairs the barrier in set_current_state() to
    the above smp_wmb(), while the actual pairing involves the barrier in
    set_current_state() and the barrier executed by the try_to_wake_up()
    in wake_woken_function().

    Signed-off-by: Andrea Parri
    Signed-off-by: Paul E. McKenney
    Acked-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akiyks@gmail.com
    Cc: boqun.feng@gmail.com
    Cc: dhowells@redhat.com
    Cc: j.alglave@ucl.ac.uk
    Cc: linux-arch@vger.kernel.org
    Cc: luc.maranget@inria.fr
    Cc: npiggin@gmail.com
    Cc: parri.andrea@gmail.com
    Cc: stern@rowland.harvard.edu
    Cc: will.deacon@arm.com
    Link: http://lkml.kernel.org/r/20180716180605.16115-10-paulmck@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Andrea Parri
     
  • [ Upstream commit baa2a4fdd525c8c4b0f704d20457195b29437839 ]

    audit_add_watch stores locally krule->watch without taking a reference
    on watch. Then, it calls audit_add_to_parent, and uses the watch stored
    locally.

    Unfortunately, it is possible that audit_add_to_parent updates
    krule->watch.
    When it happens, it also drops a reference of watch which
    could free the watch.

    How to reproduce (with KASAN enabled):

    auditctl -w /etc/passwd -F success=0 -k test_passwd
    auditctl -w /etc/passwd -F success=1 -k test_passwd2

    The second call to auditctl triggers the use-after-free, because
    audit_add_to_parent updates krule->watch to use a previously existing
    watch and drops the reference to the newly created watch.

    To fix the issue, we grab a reference of watch and we release it at the
    end of the function.
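
    In outline, the fix pins the watch around the call that may swap
    krule->watch (a sketch, not the full audit_add_watch()):

        audit_get_watch(watch);          /* pin it across the call below */
        ret = audit_add_to_parent(krule, parent);
        /* krule->watch may now point at a pre-existing watch; the
         * reference taken above keeps the original watch alive until we
         * are done with it. */
        audit_put_watch(watch);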

    Signed-off-by: Ronny Chevalier
    Reviewed-by: Richard Guy Briggs
    Signed-off-by: Paul Moore
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ronny Chevalier
     
  • commit 02e184476eff848273826c1d6617bb37e5bcc7ad upstream.

    Perf can record user stack data in response to a synchronous request, such
    as a tracepoint firing. If this happens under set_fs(KERNEL_DS), then we
    end up reading user stack data using __copy_from_user_inatomic() under
    set_fs(KERNEL_DS). I think this conflicts with the intention of using
    set_fs(KERNEL_DS). And it is explicitly forbidden by hardware on ARM64
    when both CONFIG_ARM64_UAO and CONFIG_ARM64_PAN are used.

    So fix this by forcing USER_DS when recording user stack data.
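
    The pattern amounts to the following sketch of the copy path; the copy
    helper name is taken from the perf output code and may differ across
    versions:

        mm_segment_t fs = get_fs();

        set_fs(USER_DS);
        rem = __output_copy_user(handle, (void *)sp, dump_size);
        set_fs(fs);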

    Signed-off-by: Yabin Cui
    Acked-by: Peter Zijlstra (Intel)
    Cc:
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 88b0193d9418 ("perf/callchain: Force USER_DS when invoking perf_callchain_user()")
    Link: http://lkml.kernel.org/r/20180823225935.27035-1-yabinc@google.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Yabin Cui
     

20 Sep, 2018

3 commits

  • [ Upstream commit 363e934d8811d799c88faffc5bfca782fd728334 ]

    timer_base::must_forward_clock is indicating that the base clock might be
    stale due to a long idle sleep.

    The forwarding of the base clock takes place in the timer softirq or when a
    timer is enqueued to a base which is idle. If the enqueue of timer to an
    idle base happens from a remote CPU, then the following race can happen:

    CPU0                                  CPU1
    run_timer_softirq                     mod_timer

                                          base = lock_timer_base(timer);
    base->must_forward_clk = false
                                          if (base->must_forward_clk)
                                                 forward(base); -> skipped

                                          enqueue_timer(base, timer, idx);
                                          -> idx is calculated high due to
                                             stale base
                                          unlock_timer_base(timer);
    base = lock_timer_base(timer);
    forward(base);

    The root cause is that timer_base::must_forward_clk is cleared outside the
    timer_base::lock held region, so the remote queuing CPU observes it as
    cleared, but the base clock is still stale. This can cause large
    granularity values for timers, i.e. the accuracy of the expiry time
    suffers.

    Prevent this by clearing the flag with timer_base::lock held, so that the
    forwarding takes place before the cleared flag is observable by a remote
    CPU.

    Signed-off-by: Gaurav Kohli
    Signed-off-by: Thomas Gleixner
    Cc: john.stultz@linaro.org
    Cc: sboyd@kernel.org
    Cc: linux-arm-msm@vger.kernel.org
    Link: https://lkml.kernel.org/r/1533199863-22748-1-git-send-email-gkohli@codeaurora.org
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Gaurav Kohli
     
  • commit 69fa6eb7d6a64801ea261025cce9723d9442d773 upstream.

    When a teardown callback fails, the CPU hotplug code brings the CPU back to
    the previous state. The previous state becomes the new target state. The
    rollback happens in undo_cpu_down() which increments the state
    unconditionally even if the state is already the same as the target.

    As a consequence the next CPU hotplug operation will start at the wrong
    state. This is easy to observe when __cpu_disable() fails.

    Prevent the unconditional undo by checking the state vs. target before
    incrementing state and fix up the consequently wrong conditional in the
    unplug code which handles the failure of the final CPU take down on the
    control CPU side.

    Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
    Reported-by: Neeraj Upadhyay
    Signed-off-by: Thomas Gleixner
    Tested-by: Geert Uytterhoeven
    Tested-by: Sudeep Holla
    Tested-by: Neeraj Upadhyay
    Cc: josh@joshtriplett.org
    Cc: peterz@infradead.org
    Cc: jiangshanlai@gmail.com
    Cc: dzickus@redhat.com
    Cc: brendan.jackman@arm.com
    Cc: malat@debian.org
    Cc: sramana@codeaurora.org
    Cc: linux-arm-msm@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1809051419580.1416@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit f8b7530aa0a1def79c93101216b5b17cf408a70a upstream.

    The smp_mb() in cpuhp_thread_fun() is misplaced. It needs to be after the
    load of st->should_run to prevent reordering of the later load/stores
    w.r.t. the load of st->should_run.

    Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
    Signed-off-by: Neeraj Upadhyay
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: josh@joshtriplett.org
    Cc: peterz@infradead.org
    Cc: jiangshanlai@gmail.com
    Cc: dzickus@redhat.com
    Cc: brendan.jackman@arm.com
    Cc: malat@debian.org
    Cc: mojha@codeaurora.org
    Cc: sramana@codeaurora.org
    Cc: linux-arm-msm@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1536126727-11629-1-git-send-email-neeraju@codeaurora.org
    Signed-off-by: Greg Kroah-Hartman

    Neeraj Upadhyay
     

15 Sep, 2018

3 commits

  • commit 77dd66a3c67c93ab401ccc15efff25578be281fd upstream.

    If devm_memremap_pages() detects a collision while adding entries
    to the radix-tree, we call pgmap_radix_release(). Unfortunately,
    the function removes *all* entries for the range -- including the
    entries that caused the collision in the first place.

    Modify pgmap_radix_release() to take an additional argument to
    indicate where to stop, so that only newly added entries are removed
    from the tree.

    Cc:
    Fixes: 9476df7d80df ("mm: introduce find_dev_pagemap()")
    Signed-off-by: Jan H. Schönherr
    Signed-off-by: Dan Williams
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Jan H. Schönherr
     
  • commit 295d6d5e373607729bcc8182c25afe964655714f upstream.

    Fix a bug introduced in:

    72f9f3fdc928 ("sched/deadline: Remove dl_new from struct sched_dl_entity")

    After that commit, when switching to -deadline if the scheduling
    deadline of a task is in the past then switched_to_dl() calls
    setup_new_entity() to properly initialize the scheduling deadline
    and runtime.

    The problem is that the task is enqueued _before_ having its parameters
    initialized by setup_new_entity(), and this can cause problems.
    For example, a task with its out-of-date deadline in the past will
    potentially be enqueued as the highest priority one; however, its
    adjusted deadline may not be the earliest one.

    This patch fixes the problem by initializing the task's parameters before
    enqueuing it.

    Signed-off-by: luca abeni
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Daniel Bristot de Oliveira
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Mathieu Poirier
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1504778971-13573-3-git-send-email-luca.abeni@santannapisa.it
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Luca Abeni
     
  • [ Upstream commit 06e62a46bbba20aa5286102016a04214bb446141 ]

    Before this change, if a multithreaded process forks while one of its
    threads is changing a signal handler using sigaction(), the memcpy() in
    copy_sighand() can race with the struct assignment in do_sigaction(). It
    isn't clear whether this can cause corruption of the userspace signal
    handler pointer, but it definitely can cause inconsistency between
    different fields of struct sigaction.

    Take the appropriate spinlock to avoid this.

    I have tested that this patch prevents inconsistency between sa_sigaction
    and sa_flags, which is possible before this patch.
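
    The fix is essentially to copy the handlers under the sighand lock,
    roughly as in this sketch of the copy_sighand() portion:

        spin_lock_irq(&current->sighand->siglock);
        memcpy(sig->action, current->sighand->action, sizeof(sig->action));
        spin_unlock_irq(&current->sighand->siglock);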

    Link: http://lkml.kernel.org/r/20180702145108.73189-1-jannh@google.com
    Signed-off-by: Jann Horn
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Rik van Riel
    Cc: "Peter Zijlstra (Intel)"
    Cc: Kees Cook
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

10 Sep, 2018

8 commits

  • commit 5820f140edef111a9ea2ef414ab2428b8cb805b1 upstream.

    The old code would hold the userns_state_mutex indefinitely if
    memdup_user_nul stalled due to e.g. a userfault region. Prevent that by
    moving the memdup_user_nul in front of the mutex_lock().
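
    In outline (a sketch of map_write() after the change, with error
    handling trimmed):

        /* Copy the user buffer before taking the mutex, so a stalled copy
         * (e.g. a userfaultfd-backed buffer) cannot hold the lock. */
        kbuf = memdup_user_nul(buf, count);
        if (IS_ERR(kbuf))
                return PTR_ERR(kbuf);

        mutex_lock(&userns_state_mutex);
        /* ... parse kbuf and install the uid/gid map ... */
        mutex_unlock(&userns_state_mutex);
        kfree(kbuf);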

    Note: This changes the error precedence of invalid buf/count/*ppos vs
    map already written / capabilities missing.

    Fixes: 22d917d80e84 ("userns: Rework the user_namespace adding uid/gid...")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jann Horn
    Acked-by: Christian Brauner
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • commit 42a0cc3478584d4d63f68f2f5af021ddbea771fa upstream.

    Holding uts_sem as a writer while accessing userspace memory allows a
    namespace admin to stall all processes that attempt to take uts_sem.
    Instead, move data through stack buffers and don't access userspace memory
    while uts_sem is held.

    Cc: stable@vger.kernel.org
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jann Horn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • commit 3df6f61fff49632492490fb6e42646b803a9958a upstream.

    Commit ea0212f40c6 (power: auto select CONFIG_SRCU) made the code in
    drivers/base/power/wakeup.c use SRCU instead of RCU, but it forgot to
    select CONFIG_SRCU in Kconfig, which leads to the following build
    error if CONFIG_SRCU is not selected somewhere else:

    drivers/built-in.o: In function `wakeup_source_remove':
    (.text+0x3c6fc): undefined reference to `synchronize_srcu'
    drivers/built-in.o: In function `pm_print_active_wakeup_sources':
    (.text+0x3c7a8): undefined reference to `__srcu_read_lock'
    drivers/built-in.o: In function `pm_print_active_wakeup_sources':
    (.text+0x3c84c): undefined reference to `__srcu_read_unlock'
    drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
    (.text+0x3d1d8): undefined reference to `__srcu_read_lock'
    drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
    (.text+0x3d228): undefined reference to `__srcu_read_unlock'
    drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
    (.text+0x3d24c): undefined reference to `__srcu_read_lock'
    drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
    (.text+0x3d29c): undefined reference to `__srcu_read_unlock'
    drivers/built-in.o:(.data+0x4158): undefined reference to `process_srcu'

    Fix this error by selecting CONFIG_SRCU when PM_SLEEP is enabled.

    Fixes: ea0212f40c6 (power: auto select CONFIG_SRCU)
    Cc: 4.2+ # 4.2+
    Signed-off-by: zhangyi (F)
    [ rjw: Minor subject/changelog fixups ]
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    zhangyi (F)
     
  • commit 016f8ffc48cb01d1e7701649c728c5d2e737d295 upstream.

    While debugging another bug, I was looking at all the synchronize*()
    functions being used in kernel/trace, and noticed that trace_uprobes was
    using synchronize_sched(), with a comment to synchronize with
    {u,ret}_probe_trace_func(). When looking at those functions, the data is
    protected with "rcu_read_lock()" and not with "rcu_read_lock_sched()". This
    is using the wrong synchronize_*() function.
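
    In other words (a sketch of the pairing):

        /* Readers in {u,ret}_probe_trace_func(): */
        rcu_read_lock();
        /* ... walk the per-uprobe trace file list ... */
        rcu_read_unlock();

        /* So the updater has to wait with the matching primitive: */
        synchronize_rcu();          /* was synchronize_sched() before the fix */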

    Link: http://lkml.kernel.org/r/20180809160553.469e1e32@gandalf.local.home

    Cc: stable@vger.kernel.org
    Fixes: 70ed91c6ec7f8 ("tracing/uprobes: Support ftrace_event_file base multibuffer")
    Acked-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit 6e9df95b76cad18f7b217bdad7bb8a26d63b8c47 upstream.

    A livepatch module author can pass a module name or old function name
    that exceeds the defined character limit. With an obj->name longer than
    MODULE_NAME_LEN, the livepatch module gets loaded but waits forever on
    the module specified by obj->name to be loaded. It also populates a /sys
    directory with an untruncated object name.

    In the case of a funcs->old_name longer than KSYM_NAME_LEN, it cannot
    match any symbol table entry, so the code loops through the whole symbol
    table comparing entries against a nonexistent function, which can be
    avoided.

    The same issues apply to misspelled or incorrect names. At least gatekeep
    modules with over-the-limit string lengths by checking the lengths during
    livepatch module registration.
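
    A sketch of such gatekeeping; the limits are shown with their values
    from kernels of that era, and the actual check sits in the livepatch
    registration path:

        #include <string.h>

        #define MODULE_NAME_LEN (64 - sizeof(unsigned long))
        #define KSYM_NAME_LEN   128

        /* Reject names that could never match a loaded module or symbol. */
        static int check_name_lengths(const char *obj_name, const char *func_name)
        {
                if (obj_name && strlen(obj_name) >= MODULE_NAME_LEN)
                        return -1;
                if (func_name && strlen(func_name) >= KSYM_NAME_LEN)
                        return -1;
                return 0;
        }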

    Cc: stable@vger.kernel.org
    Signed-off-by: Kamalesh Babulal
    Acked-by: Josh Poimboeuf
    Signed-off-by: Jiri Kosina
    Signed-off-by: Greg Kroah-Hartman

    Kamalesh Babulal
     
  • commit d1c392c9e2a301f38998a353f467f76414e38725 upstream.

    I hit the following splat in my tests:

    ------------[ cut here ]------------
    IRQs not enabled as expected
    WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
    Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
    CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
    Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
    EIP: tick_nohz_idle_enter+0x44/0x8c
    Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
    75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff 0b 58 fa bb a0
    e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
    EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
    ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
    CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
    Call Trace:
    do_idle+0x33/0x202
    cpu_startup_entry+0x61/0x63
    start_secondary+0x18e/0x1ed
    startup_32_smp+0x164/0x168
    irq event stamp: 18773830
    hardirqs last enabled at (18773829): [] trace_hardirqs_on_thunk+0xc/0x10
    hardirqs last disabled at (18773830): [] trace_hardirqs_off_thunk+0xc/0x10
    softirqs last enabled at (18773824): [] __do_softirq+0x25f/0x2bf
    softirqs last disabled at (18773767): [] call_on_stack+0x45/0x4b
    ---[ end trace b7c64aa79e17954a ]---

    After a bit of debugging, I found what was happening. This would trigger
    when performing "perf" with a high NMI interrupt rate, while enabling and
    disabling function tracer. Ftrace uses breakpoints to convert the nops at
    the start of functions to calls to the function trampolines. The breakpoint
    traps disable interrupts and this makes calls into lockdep via the
    trace_hardirqs_off_thunk in the entry.S code. What happens is the following:

    do_idle {

      [interrupts enabled]

      <interrupt> [interrupts disabled]
        TRACE_IRQS_OFF [lockdep says irqs off]
        [...]
        TRACE_IRQS_IRET
            test if pt_regs say return to interrupts enabled [yes]
            TRACE_IRQS_ON [lockdep says irqs are on]
            <nmi interrupt>
               nmi_enter() {
                   printk_nmi_enter() [traced by ftrace]
                   [ hit ftrace breakpoint ]
                   <breakpoint exception>
                       TRACE_IRQS_OFF [lockdep says irqs off]
                       [...]
                       TRACE_IRQS_IRET [return from breakpoint]
                          test if pt_regs say interrupts enabled [no]
                          [iret back to interrupt]
                   [iret back to code]

    tick_nohz_idle_enter() {

      lockdep_assert_irqs_enabled() [lockdep say no!]

    Although interrupts are indeed enabled, lockdep thinks they are not, and
    since we now do asserts via lockdep, it gives a false warning. The issue here is
    that printk_nmi_enter() is called before lockdep_off(), which disables
    lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
    printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
    confused.

    Cc: stable@vger.kernel.org
    Fixes: 42a0bb3f71383 ("printk/nmi: generic solution for safe printk in NMI")
    Acked-by: Sergey Senozhatsky
    Acked-by: Petr Mladek
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit 757d9140072054528b13bbe291583d9823cde195 upstream.

    Masami Hiramatsu reported:

    Current trace-enable attribute in sysfs returns an error
    if user writes the same setting value as current one,
    e.g.

    # cat /sys/block/sda/trace/enable
    0
    # echo 0 > /sys/block/sda/trace/enable
    bash: echo: write error: Invalid argument
    # echo 1 > /sys/block/sda/trace/enable
    # echo 1 > /sys/block/sda/trace/enable
    bash: echo: write error: Device or resource busy

    But this is not the preferred behavior; the write should be ignored
    if the new setting is the same as the current one. This fixes the
    problem, as shown below.

    # cat /sys/block/sda/trace/enable
    0
    # echo 0 > /sys/block/sda/trace/enable
    # echo 1 > /sys/block/sda/trace/enable
    # echo 1 > /sys/block/sda/trace/enable

    Link: http://lkml.kernel.org/r/20180816103802.08678002@gandalf.local.home

    Cc: Ingo Molnar
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: stable@vger.kernel.org
    Fixes: cd649b8bb830d ("blktrace: remove sysfs_blk_trace_enable_show/store()")
    Reported-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit f143641bfef9a4a60c57af30de26c63057e7e695 upstream.

    Currently, when one echoes 1 into tracing_on, the current tracer's
    "start()" function is executed, even if tracing_on was already one. This can
    lead to strange side effects. One being that if the hwlat tracer is enabled,
    and someone does "echo 1 > tracing_on" into tracing_on, the hwlat tracer's
    start() function is called again which will recreate another kernel thread,
    and make it unable to remove the old one.

    Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de

    Cc: stable@vger.kernel.org
    Fixes: 2df8f8a6a897e ("tracing: Fix regression with irqsoff tracer and tracing_on file")
    Reported-by: Erica Bugden
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     

05 Sep, 2018

9 commits

  • commit cb9d7fd51d9fbb329d182423bd7b92d0f8cb0e01 upstream.

    Some architectures need to use stop_machine() to patch functions for
    ftrace, and the assumption is that the stopped CPUs do not make function
    calls to traceable functions when they are in the stopped state.

    Commit ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after
    MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
    the stopped CPUs and those functions lack notrace annotations. This
    leads to crashes when enabling/disabling ftrace on ARM kernels built
    with the Thumb-2 instruction set.

    Fix it by adding the necessary notrace annotations.

    Fixes: ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE")
    Signed-off-by: Vincent Whitchurch
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: oleg@redhat.com
    Cc: tj@kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com
    Signed-off-by: Greg Kroah-Hartman

    Vincent Whitchurch
     
  • commit f2a3ab36077222437b4826fc76111caa14562b7c upstream.

    Since the blacklist and list files on debugfs expose sensitive address
    information to the reader, they should be restricted to the root user.

    Suggested-by: Thomas Richter
    Suggested-by: Ingo Molnar
    Signed-off-by: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: Arnd Bergmann
    Cc: David Howells
    Cc: David S . Miller
    Cc: Heiko Carstens
    Cc: Jon Medhurst
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tobin C . Harding
    Cc: Will Deacon
    Cc: acme@kernel.org
    Cc: akpm@linux-foundation.org
    Cc: brueckner@linux.vnet.ibm.com
    Cc: linux-arch@vger.kernel.org
    Cc: rostedt@goodmis.org
    Cc: schwidefsky@de.ibm.com
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/lkml/152491890171.9916.5183693615601334087.stgit@devbox
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu
     
  • commit cfd355145c32bb7ccb65fccbe2d67280dc2119e1 upstream.

    When cpu_stop_queue_work() releases the lock for the stopper
    thread that was queued into its wake queue, preemption is
    enabled, which leads to the following deadlock:

    CPU0                                  CPU1
    sched_setaffinity(0, ...)
    __set_cpus_allowed_ptr()
    stop_one_cpu(0, ...)                  stop_two_cpus(0, 1, ...)
    cpu_stop_queue_work(0, ...)           cpu_stop_queue_two_works(0, ..., 1, ...)

    -grabs lock for migration/0-
                                          -spins with preemption disabled,
                                           waiting for migration/0's lock to be
                                           released-

    -adds work items for migration/0
     and queues migration/0 to its
     wake_q-

    -releases lock for migration/0
     and preemption is enabled-

    -current thread is preempted,
     and __set_cpus_allowed_ptr
     has changed the thread's
     cpu allowed mask to CPU1 only-

                                          -acquires migration/0 and migration/1's
                                           locks-

                                          -adds work for migration/0 but does not
                                           add migration/0 to wake_q, since it is
                                           already in a wake_q-

                                          -adds work for migration/1 and adds
                                           migration/1 to its wake_q-

                                          -releases migration/0 and migration/1's
                                           locks, wakes migration/1, and enables
                                           preemption-

                                          -since migration/1 is requested to run,
                                           migration/1 begins to run and waits on
                                           migration/0, but migration/0 will never
                                           be able to run, since the thread that
                                           can wake it is affine to CPU1-

    Disable preemption in cpu_stop_queue_work() before queueing works for
    stopper threads, and queueing the stopper thread in the wake queue, to
    ensure that the operation of queueing the works and waking the stopper
    threads is atomic.
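
    Sketch of the resulting shape of cpu_stop_queue_work(), with the
    details elided:

        preempt_disable();
        raw_spin_lock_irqsave(&stopper->lock, flags);
        /* ... enqueue the work and add the stopper thread to wakeq ... */
        raw_spin_unlock_irqrestore(&stopper->lock, flags);
        wake_up_q(&wakeq);
        /* Only here may we be preempted: the stopper has been woken. */
        preempt_enable();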

    Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
    Signed-off-by: Prasad Sodagudi
    Signed-off-by: Isaac J. Manjarres
    Signed-off-by: Thomas Gleixner
    Cc: peterz@infradead.org
    Cc: matt@codeblueprint.co.uk
    Cc: bigeasy@linutronix.de
    Cc: gregkh@linuxfoundation.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1533329766-4856-1-git-send-email-isaacm@codeaurora.org
    Signed-off-by: Greg Kroah-Hartman

    Co-Developed-by: Isaac J. Manjarres

    Prasad Sodagudi
     
  • commit b80a2bfce85e1051056d98d04ecb2d0b55cbbc1c upstream.

    The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by
    lifting the preempt_disable() to the top to create more natural nesting wrt
    the spinlocks and make the wake_up_q() and preempt_enable() unconditional
    at the end.

    Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait
    with preemption enabled.

    Suggested-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: Sebastian Andrzej Siewior
    Cc: isaacm@codeaurora.org
    Cc: matt@codeblueprint.co.uk
    Cc: psodagud@codeaurora.org
    Cc: gregkh@linuxfoundation.org
    Cc: pkondeti@codeaurora.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180730112140.GH2494@hirez.programming.kicks-ass.net
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 03fc7f9c99c1e7ae2925d459e8487f1a6f199f79 upstream.

    The commit 719f6a7040f1bdaf96 ("printk: Use the main logbuf in NMI
    when logbuf_lock is available") brought back the possible deadlocks
    in printk() and NMI.

    The check of logbuf_lock is done only in printk_nmi_enter() to prevent
    mixed output. But another CPU might take the lock later, enter NMI, and:

    + Both NMIs might be serialized by yet another lock, for example,
    the one in nmi_cpu_backtrace().

    + The other CPU might get stopped in NMI, see smp_send_stop()
    in panic().

    The only safe solution is to use trylock when storing the message
    into the main log-buffer. It might cause reordering when some lines
    go to the main log buffer directly and others are delayed via
    the per-CPU buffer. It means that it is not useful in general.

    This patch replaces the problematic NMI deferred context with NMI
    direct context. It can be used to mark a code that might produce
    many messages in NMI and the risk of losing them is more critical
    than problems with eventual reordering.

    The context is then used when dumping trace buffers on oops. It was
    the primary motivation for the original fix. Also the reordering is
    even smaller issue there because some traces have their own time stamps.

    Finally, nmi_cpu_backtrace() no longer needs to be serialized because
    it will always use the per-CPU buffers again.

    Fixes: 719f6a7040f1bdaf96 ("printk: Use the main logbuf in NMI when logbuf_lock is available")
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20180627142028.11259-1-pmladek@suse.com
    To: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Tetsuo Handa
    Cc: Sergey Senozhatsky
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Greg Kroah-Hartman

    Petr Mladek
     
  • commit a338f84dc196f44b63ba0863d2f34fd9b1613572 upstream.

    It is just a preparation step. The patch does not change
    the existing behavior.

    Link: http://lkml.kernel.org/r/20180627140817.27764-3-pmladek@suse.com
    To: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Tetsuo Handa
    Cc: Sergey Senozhatsky
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Greg Kroah-Hartman

    Petr Mladek
     
  • commit ba552399954dde1b388f7749fecad5c349216981 upstream.

    It is just a preparation step. The patch does not change
    the existing behavior.

    Link: http://lkml.kernel.org/r/20180627140817.27764-2-pmladek@suse.com
    To: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Tetsuo Handa
    Cc: Sergey Senozhatsky
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Greg Kroah-Hartman

    Petr Mladek
     
  • [ Upstream commit f3d133ee0a17d5694c6f21873eec9863e11fa423 ]

    The NO_RT_RUNTIME_SHARE feature is used to prevent a CPU from borrowing
    runtime while running a spinning RT task.

    However, if the RT_RUNTIME_SHARE feature is enabled and the rt_rq has
    borrowed enough rt_runtime at the beginning, rt_runtime cannot be
    restored to its initial bandwidth after we disable RT_RUNTIME_SHARE.

    E.g. on my PC with 4 cores, procedure to reproduce:
    1) Make sure RT_RUNTIME_SHARE is enabled
    cat /sys/kernel/debug/sched_features
    GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY
    CACHE_HOT_BUDDY WAKEUP_PREEMPTION NO_HRTICK NO_DOUBLE_TICK
    LB_BIAS NONTASK_CAPACITY TTWU_QUEUE NO_SIS_AVG_CPU SIS_PROP
    NO_WARN_DOUBLE_CLOCK RT_PUSH_IPI RT_RUNTIME_SHARE NO_LB_MIN
    ATTACH_AGE_LOAD WA_IDLE WA_WEIGHT WA_BIAS
    2) Start a spin-rt-task
    ./loop_rr &
    3) set affinity to the last cpu
    taskset -p 8 $pid_of_loop_rr
    4) Observe that last cpu have borrowed enough runtime.
    cat /proc/sched_debug | grep rt_runtime
    .rt_runtime : 950.000000
    .rt_runtime : 900.000000
    .rt_runtime : 950.000000
    .rt_runtime : 1000.000000
    5) Disable RT_RUNTIME_SHARE
    echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
    6) Observe that rt_runtime cannot be restored
    cat /proc/sched_debug | grep rt_runtime
    .rt_runtime : 950.000000
    .rt_runtime : 900.000000
    .rt_runtime : 950.000000
    .rt_runtime : 1000.000000

    This patch helps to restore rt_runtime after RT_RUNTIME_SHARE is
    disabled.

    Signed-off-by: Hailong Liu
    Signed-off-by: Jiang Biao
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: zhong.weidong@zte.com.cn
    Link: http://lkml.kernel.org/r/1531874815-39357-1-git-send-email-liu.hailong6@zte.com.cn
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Hailong Liu
     
  • [ Upstream commit 62cedf3e60af03e47849fe2bd6a03ec179422a8a ]

    Needed for annotating rt_mutex locks.

    Tested-by: John Sperbeck
    Signed-off-by: Peter Rosin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Davidlohr Bueso
    Cc: Deepa Dinamani
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Peter Chang
    Cc: Peter Zijlstra
    Cc: Philippe Ombredanne
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Wolfram Sang
    Link: http://lkml.kernel.org/r/20180720083914.1950-2-peda@axentia.se
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Peter Rosin
     

24 Aug, 2018

3 commits

  • [ Upstream commit 26b68dd2f48fe7699a89f0cfbb9f4a650dc1c837 ]

    Silence warnings (triggered at W=1) by adding relevant __printf attributes.

    CC kernel/trace/trace.o
    kernel/trace/trace.c: In function ‘__trace_array_vprintk’:
    kernel/trace/trace.c:2979:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
    len = vscnprintf(tbuffer, TRACE_BUF_SIZE, fmt, args);
    ^~~
    AR kernel/trace/built-in.o
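
    For illustration, the annotation has this shape; the argument
    positions here are an example, not necessarily the ones used in
    trace.c:

        /* Argument 3 is the format string, 0 means the values arrive as
         * a va_list. */
        __printf(3, 0)
        static int __trace_array_vprintk(struct ring_buffer *buffer,
                                         unsigned long ip, const char *fmt,
                                         va_list args);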

    Link: http://lkml.kernel.org/r/20180308205843.27447-1-malat@debian.org

    Signed-off-by: Mathieu Malaterre
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Mathieu Malaterre
     
  • [ Upstream commit ed2b82c03dc187018307c7c6bf9299705f3db383 ]

    Decrement the number of elements in the map in case the allocation
    of a new node fails.
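
    Sketch of the error path after the fix; the surrounding allocation
    code in kernel/bpf/hashtab.c is abbreviated:

        l_new = kmalloc_node(htab->elem_size, GFP_ATOMIC | __GFP_NOWARN,
                             htab->map.numa_node);
        if (!l_new) {
                atomic_dec(&htab->count);   /* undo the earlier increment */
                return ERR_PTR(-ENOMEM);
        }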

    Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
    Signed-off-by: Mauricio Vasquez B
    Acked-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Mauricio Vasquez B
     
  • [ Upstream commit fcc784be837714a9173b372ff9fb9b514590dad9 ]

    While debugging where things were going wrong with mapping the
    enabling/disabling of interrupts to the lockdep state and the actual
    real enabling and disabling of interrupts, I had to silence the IRQ
    disabling/enabling in debug_check_no_locks_freed() because it was
    always showing up, as it was called before the splat was printed.

    Use raw_local_irq_save/restore() for not only debug_check_no_locks_freed()
    but for all internal lockdep functions, as they hide useful information
    about where interrupts were used incorrectly last.
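
    That is, the internal helpers now follow this pattern (sketch):

        unsigned long flags;

        raw_local_irq_save(flags);   /* invisible to lockdep's IRQ tracking */
        /* ... lockdep internal bookkeeping ... */
        raw_local_irq_restore(flags);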

    Signed-off-by: Steven Rostedt (VMware)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: https://lkml.kernel.org/lkml/20180404140630.3f4f4c7a@gandalf.local.home
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     

16 Aug, 2018

2 commits

  • commit 269777aa530f3438ec1781586cdac0b5fe47b061 upstream.

    Commit 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
    breaks non-SMP builds.

    [ I suspect the 'bool' fields should just be made to be bitfields and be
    exposed regardless of configuration, but that's a separate cleanup
    that I'll leave to the owners of this file for later. - Linus ]

    Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
    Cc: Dave Hansen
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Signed-off-by: Abel Vesa
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Abel Vesa
     
  • commit bc2d8d262cba5736332cbc866acb11b1c5748aa9 upstream

    Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
    cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
    on the kernel command line as it cannot differentiate between SMT disabled
    by BIOS and SMT soft disable via 'nosmt'. That wrecks the state and
    makes the sysfs interface unusable.

    Rework this so that during bringup of the non boot CPUs the availability of
    SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
    'primary' thread then set the local cpu_smt_available marker and evaluate
    this explicitly right after the initial SMP bringup has finished.

    SMT evaluation on x86 is a trainwreck as the firmware has all the
    information _before_ booting the kernel, but there is no interface to query
    it.

    Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
    Reported-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner