29 Oct, 2018

2 commits

  • When CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=n, the call path
    hrtimer_reprogram -> clockevents_program_event ->
    clockevents_program_min_delta will not retry if the clock event driver
    returns -ETIME.

    If the driver could not satisfy the program_min_delta for any reason, the
    lack of a retry means the CPU may not receive a tick interrupt, potentially
    until the counter does a full period. This leads to rcu_sched timeout
    messages as the stalled CPU is detected by other CPUs, and other issues if
    the CPU is holding locks or other resources at the point at which it
    stalls.

    There have been a couple of observed mechanisms through which a clock event
    driver could not satisfy the requested min_delta and return -ETIME.

    With the MIPS GIC driver, the shared execution resource within MT cores
    means that inconvenient latency, caused by execution of instructions from
    other hardware threads in the core while gic_next_event runs, can result
    in an event being set in the past.

    Additionally under virtualisation it is possible to get unexpected latency
    during a clockevent device's set_next_event() callback which can make it
    return -ETIME even for a delta based on min_delta_ns.

    It isn't appropriate to use MIN_ADJUST in the virtualisation case as
    occasional hypervisor induced high latency will cause min_delta_ns to
    quickly increase to the maximum.

    Instead, borrow the retry pattern from the MIN_ADJUST case, but without
    making adjustments. Retry up to 10 times, each time increasing the
    attempted delta by min_delta, before giving up.
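
    A hedged sketch of the resulting retry loop (simplified from the
    clockevents code for illustration; not a verbatim copy):

        /* Retry up to 10 times, growing the delta by min_delta_ns each time. */
        static int clockevents_program_min_delta(struct clock_event_device *dev)
        {
                unsigned long long clc;
                int64_t delta = 0;
                int i;

                for (i = 0; i < 10; i++) {
                        delta += dev->min_delta_ns;
                        dev->next_event = ktime_add_ns(ktime_get(), delta);

                        if (clockevent_state_shutdown(dev))
                                return 0;

                        dev->retries++;
                        clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
                        if (dev->set_next_event((unsigned long) clc, dev) == 0)
                                return 0;
                }
                return -ETIME;
        }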

    [ Matt: Reworked the loop and made retry increase the delta. ]

    Signed-off-by: James Hogan
    Signed-off-by: Matt Redfearn
    Signed-off-by: Thomas Gleixner
    Cc: linux-mips@linux-mips.org
    Cc: Daniel Lezcano
    Cc: "Martin Schwidefsky"
    Cc: James Hogan
    Link: https://lkml.kernel.org/r/1508422643-6075-1-git-send-email-matt.redfearn@mips.com

    James Hogan
     
  • These macros can be reused by governors which don't use the common
    governor code present in cpufreq_governor.c and should be moved to the
    relevant header.

    Now that they are getting moved to the right header file, reuse them in
    schedutil governor as well (that required rename of show/store
    routines).

    Also create gov_attr_wo() macro for write-only sysfs files, this will be
    used by Interactive governor in a later patch.
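
    A hedged sketch of the macros in question (illustrative; the exact
    definitions live in the cpufreq headers):

        #define gov_attr_ro(_name)                                      \
        static struct governor_attr _name =                             \
        __ATTR(_name, 0444, show_##_name, NULL)

        #define gov_attr_rw(_name)                                      \
        static struct governor_attr _name =                             \
        __ATTR(_name, 0644, show_##_name, store_##_name)

        /* New write-only variant for sysfs files that must not be readable. */
        #define gov_attr_wo(_name)                                      \
        static struct governor_attr _name =                             \
        __ATTR(_name, 0200, NULL, store_##_name)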

    Signed-off-by: Viresh Kumar

    Viresh Kumar
     

20 Oct, 2018

1 commit

  • commit 15d36fecd0bdc7510b70a0e5ec6671140b3fce0c upstream.

    When pmem namespaces are created smaller than the section size, this can
    cause an issue during removal, and a general protection fault was observed:

    general protection fault: 0000 1 SMP PTI
    CPU: 36 PID: 3941 Comm: ndctl Tainted: G W 4.14.28-1.el7uek.x86_64 #2
    task: ffff88acda150000 task.stack: ffffc900233a4000
    RIP: 0010:__put_page+0x56/0x79
    Call Trace:
    devm_memremap_pages_release+0x155/0x23a
    release_nodes+0x21e/0x260
    devres_release_all+0x3c/0x48
    device_release_driver_internal+0x15c/0x207
    device_release_driver+0x12/0x14
    unbind_store+0xba/0xd8
    drv_attr_store+0x27/0x31
    sysfs_kf_write+0x3f/0x46
    kernfs_fop_write+0x10f/0x18b
    __vfs_write+0x3a/0x16d
    vfs_write+0xb2/0x1a1
    SyS_write+0x55/0xb9
    do_syscall_64+0x79/0x1ae
    entry_SYSCALL_64_after_hwframe+0x3d/0x0

    Add code to check whether we have a mapping already in the same section
    and prevent additional mappings from being created if that is the case.
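
    A hedged sketch of the kind of check described (simplified from the
    devm_memremap_pages() path; assumes the section-aligned bounds have
    already been computed, and the exact error handling may differ):

        /* Refuse to set up a pgmap whose section-aligned start or end
         * already belongs to another mapping. */
        conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_start), NULL);
        if (conflict_pgmap) {
                dev_WARN(dev, "Conflicting mapping in same section\n");
                put_dev_pagemap(conflict_pgmap);
                return ERR_PTR(-ENOMEM);
        }

        conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_end), NULL);
        if (conflict_pgmap) {
                dev_WARN(dev, "Conflicting mapping in same section\n");
                put_dev_pagemap(conflict_pgmap);
                return ERR_PTR(-ENOMEM);
        }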

    Link: http://lkml.kernel.org/r/152909478401.50143.312364396244072931.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Cc: Dan Williams
    Cc: Robert Elliott
    Cc: Jeff Moyer
    Cc: Matthew Wilcox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Dave Jiang
     

18 Oct, 2018

1 commit

  • commit 479adb89a97b0a33e5a9d702119872cc82ca21aa upstream.

    A cgroup which is already a threaded domain may be converted into a
    threaded cgroup if the prerequisite conditions are met. When this
    happens, all threaded descendant should also have their ->dom_cgrp
    updated to the new threaded domain cgroup. Unfortunately, this
    propagation was missing leading to the following failure.

    # cd /sys/fs/cgroup/unified
    # cat cgroup.subtree_control # show that no controllers are enabled

    # mkdir -p mycgrp/a/b/c
    # echo threaded > mycgrp/a/b/cgroup.type

    At this point, the hierarchy looks as follows:

    mycgrp [d]
        a [dt]
            b [t]
                c [inv]

    Now let's make node "a" threaded (and thus "mycgrp" is made "domain threaded"):

    # echo threaded > mycgrp/a/cgroup.type

    By this point, we now have a hierarchy that looks as follows:

    mycgrp [dt]
        a [t]
            b [t]
                c [inv]

    But, when we try to convert the node "c" from "domain invalid" to
    "threaded", we get ENOTSUP on the write():

    # echo threaded > mycgrp/a/b/c/cgroup.type
    sh: echo: write error: Operation not supported

    This patch fixes the problem by

    * Moving the opencoded ->dom_cgrp save and restoration in
    cgroup_enable_threaded() into cgroup_{save|restore}_control() so
    that multiple cgroups can be handled.

    * Updating all threaded descendants' ->dom_cgrp to point to the new
    dom_cgrp when enabling threaded mode (see the sketch below).
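
    A hedged sketch of the propagation from the second item (simplified
    from cgroup_enable_threaded()):

        /* Point cgrp and every already-threaded descendant at the new
         * threaded domain cgroup. */
        cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp)
                if (dsct == cgrp || cgroup_is_threaded(dsct))
                        dsct->dom_cgrp = dom_cgrp;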

    Signed-off-by: Tejun Heo
    Reported-and-tested-by: "Michael Kerrisk (man-pages)"
    Reported-by: Amin Jamali
    Reported-by: Joao De Almeida Pereira
    Link: https://lore.kernel.org/r/CAKgNAkhHYCMn74TCNiMJ=ccLd7DcmXSbvw3CbZ1YREeG7iJM5g@mail.gmail.com
    Fixes: 454000adaa2a ("cgroup: introduce cgroup->dom_cgrp and threaded css_set handling")
    Cc: stable@vger.kernel.org # v4.14+
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

13 Oct, 2018

1 commit

  • commit befb1b3c2703897c5b8ffb0044dc5d0e5f27c5d7 upstream.

    It is possible that a failure can occur during the scheduling of a
    pinned event. The initial portion of perf_event_read_local() contains
    the various error checks an event should pass before it can be
    considered valid. Ensure that the potential scheduling failure
    of a pinned event is checked for and a credible error returned.
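
    A hedged sketch of the added check (simplified from
    perf_event_read_local()):

        /* A pinned event that failed to be scheduled is not running on
         * this CPU; fail the read with a credible error. */
        if (event->attr.pinned && event->oncpu != smp_processor_id()) {
                ret = -EBUSY;
                goto out;
        }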

    Suggested-by: Peter Zijlstra
    Signed-off-by: Reinette Chatre
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: fenghua.yu@intel.com
    Cc: tony.luck@intel.com
    Cc: acme@kernel.org
    Cc: gavin.hindman@intel.com
    Cc: jithu.joseph@intel.com
    Cc: dave.hansen@intel.com
    Cc: hpa@zytor.com
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/6486385d1f30336e9973b24c8c65f5079543d3d3.1537377064.git.reinette.chatre@intel.com
    Signed-off-by: Greg Kroah-Hartman

    Reinette Chatre
     

10 Oct, 2018

1 commit

  • commit b799207e1e1816b09e7a5920fbb2d5fcf6edd681 upstream.

    When I wrote commit 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification"), I
    assumed that, in order to emulate 64-bit arithmetic with 32-bit logic, it
    is sufficient to just truncate the output to 32 bits; and so I just moved
    the register size coercion that used to be at the start of the function to
    the end of the function.

    That assumption is true for almost every op, but not for 32-bit right
    shifts, because those can propagate information towards the least
    significant bit. Fix it by always truncating inputs for 32-bit ops to 32
    bits.
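
    A small self-contained illustration of the problem (not verifier code):

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
                uint64_t reg = 0x100000001ULL;          /* bits 32 and 0 set */

                /* Truncating only the output lets bit 32 leak into bit 31. */
                uint32_t wrong = (uint32_t)(reg >> 1);  /* 0x80000000 */

                /* Truncating the input first matches 32-bit ALU behaviour. */
                uint32_t right = (uint32_t)reg >> 1;    /* 0x00000000 */

                printf("wrong=%#x right=%#x\n", wrong, right);
                return 0;
        }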

    Also get rid of the coerce_reg_to_size() after the ALU op, since that has
    no effect.

    Fixes: 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification")
    Acked-by: Daniel Borkmann
    Signed-off-by: Jann Horn
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

04 Oct, 2018

5 commits

  • [ Upstream commit 9b2e0388bec8ec5427403e23faff3b58dd1c3200 ]

    When sockmap code is using the stream parser it also handles the write
    space events in order to handle the case where (a) verdict redirects
    skb to another socket and (b) the sockmap then sends the skb but due
    to memory constraints (or other EAGAIN errors) needs to do a retry.

    But the initial code missed a third case where
    skb_send_sock_locked() triggers an sk_wait_event(). A typical case
    would be when sndbuf size is exceeded. If this happens, because we
    do not pass the write_space event to the lower layers we never wake
    up the waiter and it will wait for sndtimeo, which, as noted in the
    ktls fix, may be rather large and look like a hang to the user.

    To reproduce, the best test is to reduce the sndbuf size and send
    1B data chunks to stress the memory handling. To fix this, pass the
    event from the upper layer to the lower layer.
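
    A hedged sketch of the resulting handler (simplified from the sockmap
    write_space callback):

        static void smap_write_space(struct sock *sk)
        {
                struct smap_psock *psock;
                void (*write_space)(struct sock *sk);

                rcu_read_lock();
                psock = smap_psock_sk(sk);
                if (likely(psock && test_bit(SMAP_TX_RUNNING, &psock->state)))
                        schedule_work(&psock->tx_work);
                write_space = psock->save_write_space; /* original callback */
                rcu_read_unlock();
                write_space(sk);                       /* pass the event down */
        }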

    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    John Fastabend
     
  • [ Upstream commit 9f2d1e68cf4d641def734adaccfc3823d3575e6c ]

    Livepatch modules are special in that we preserve their entire symbol
    tables in order to be able to apply relocations after module load. The
    unwanted side effect of this is that undefined (SHN_UNDEF) symbols of
    livepatch modules are accessible via the kallsyms api and this can
    confuse symbol resolution in livepatch (klp_find_object_symbol()) and
    cause subtle bugs in livepatch.

    Have the module kallsyms api skip over SHN_UNDEF symbols. These symbols
    are usually not available for normal modules anyway as we cut down their
    symbol tables to just the core (non-undefined) symbols, so this should
    really just affect livepatch modules. Note that this patch doesn't
    affect the display of undefined symbols in /proc/kallsyms.
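
    A hedged sketch of the filtering (illustrative; the real code lives in
    the module kallsyms helpers):

        for (i = 0; i < kallsyms->num_symtab; i++) {
                const Elf_Sym *sym = &kallsyms->symtab[i];

                /* Undefined symbols are not provided by this module. */
                if (sym->st_shndx == SHN_UNDEF)
                        continue;

                ret = fn(data, kallsyms->strtab + sym->st_name,
                         mod, sym->st_value);
                if (ret)
                        return ret;
        }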

    Reported-by: Josh Poimboeuf
    Tested-by: Josh Poimboeuf
    Reviewed-by: Josh Poimboeuf
    Signed-off-by: Jessica Yu
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jessica Yu
     
  • [ Upstream commit 78c9c4dfbf8c04883941445a195276bb4bb92c76 ]

    The posix timer overrun handling is broken because the forwarding functions
    can return a huge number of overruns which does not fit in an int. As a
    consequence timer_getoverrun(2) and siginfo::si_overrun can turn into
    random number generators.

    The k_clock::timer_forward() callbacks return a 64 bit value now. Make
    k_itimer::it_overrun[_last] 64 bit as well, so the kernel internal
    accounting is correct. Remove the temporary (int) casts.

    Add a helper function which clamps the overrun value returned to user space
    via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value
    between 0 and INT_MAX. INT_MAX is an indicator for user space that the
    overrun value has been clamped.
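
    A hedged sketch of such a clamping helper (close to, but not
    necessarily identical with, the actual patch):

        static int timer_overrun_to_int(struct k_itimer *timr, int baseval)
        {
                s64 sum = timr->it_overrun_last + (s64)baseval;

                /* INT_MAX signals to user space that the value was clamped. */
                return sum > (s64)INT_MAX ? INT_MAX : (int)sum;
        }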

    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Michael Kerrisk
    Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • [ Upstream commit 6fec64e1c92d5c715c6d0f50786daa7708266bde ]

    The posix timer it_overrun handling is broken because the forwarding
    functions can return a huge number of overruns which does not fit in an
    int. As a consequence timer_getoverrun(2) and siginfo::si_overrun can turn
    into random number generators.

    As a first step to address that let the timer_forward() callbacks return
    the full 64 bit value.

    Cast it to (int) temporarily until k_itimer::it_overrun is converted to
    64bit and the conversion to user space visible values is sanitized.

    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Michael Kerrisk
    Link: https://lkml.kernel.org/r/20180626132704.922098090@linutronix.de
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • [ Upstream commit 5f936e19cc0ef97dbe3a56e9498922ad5ba1edef ]

    Air Icy reported:

    UBSAN: Undefined behaviour in kernel/time/alarmtimer.c:811:7
    signed integer overflow:
    1529859276030040771 + 9223372036854775807 cannot be represented in type 'long long int'
    Call Trace:
    alarm_timer_nsleep+0x44c/0x510 kernel/time/alarmtimer.c:811
    __do_sys_clock_nanosleep kernel/time/posix-timers.c:1235 [inline]
    __se_sys_clock_nanosleep kernel/time/posix-timers.c:1213 [inline]
    __x64_sys_clock_nanosleep+0x326/0x4e0 kernel/time/posix-timers.c:1213
    do_syscall_64+0xb8/0x3a0 arch/x86/entry/common.c:290

    alarm_timer_nsleep() uses ktime_add() to add the current time and the
    relative expiry value. ktime_add() has no sanity checks so the addition
    can overflow when the relative timeout is large enough.

    Use ktime_add_safe() which has the necessary sanity checks in place and
    limits the result to the valid range.
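
    A hedged sketch of the clamping behaviour relied upon (simplified from
    the hrtimer code):

        ktime_t ktime_add_safe(const ktime_t lhs, const ktime_t rhs)
        {
                ktime_t res = ktime_add_unsafe(lhs, rhs);

                /* Clamp to the largest valid timeout instead of wrapping. */
                if (res < 0 || res < lhs || res < rhs)
                        res = ktime_set(KTIME_SEC_MAX, 0);

                return res;
        }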

    Fixes: 9a7adcf5c6de ("timers: Posix interface for alarm-timers")
    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1807020926360.1595@nanos.tec.linutronix.de
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

29 Sep, 2018

3 commits

  • Commit 0a0e0829f990 ("nohz: Fix missing tick reprogram when interrupting an
    inline softirq") got backported to stable trees and now causes the NOHZ
    softirq pending warning to trigger. It's not an upstream issue as the NOHZ
    update logic has been changed there.

    The problem is when a softirq disabled section gets interrupted and on
    return from interrupt the tick/nohz state is evaluated, which then can
    observe pending soft interrupts. These soft interrupts are legitimately
    pending because they cannot be processed as long as soft interrupts are
    disabled and the interrupted code will correctly process them when soft
    interrupts are reenabled.

    Add a check for softirqs disabled to the pending check to prevent the
    warning.

    Reported-by: Grygorii Strashko
    Reported-by: John Crispin
    Signed-off-by: Thomas Gleixner
    Tested-by: Grygorii Strashko
    Tested-by: John Crispin
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Anna-Maria Gleixner
    Cc: stable@vger.kernel.org
    Fixes: 2d898915ccf4838c ("nohz: Fix missing tick reprogram when interrupting an inline softirq")
    Acked-by: Frederic Weisbecker
    Tested-by: Geert Uytterhoeven

    Thomas Gleixner
     
  • commit d0cdb3ce8834332d918fc9c8ff74f8a169ec9abe upstream.

    When a task which previously ran on a given CPU is remotely queued to
    wake up on that same CPU, there is a period where the task's state is
    TASK_WAKING and its vruntime is not normalized. This is not accounted
    for in vruntime_normalized() which will cause an error in the task's
    vruntime if it is switched from the fair class during this time.

    For example if it is boosted to RT priority via rt_mutex_setprio(),
    rq->min_vruntime will not be subtracted from the task's vruntime but
    it will be added again when the task returns to the fair class. The
    task's vruntime will have been erroneously doubled and the effective
    priority of the task will be reduced.

    Note this will also lead to inflation of all vruntimes since the doubled
    vruntime value will become the rq's min_vruntime when other tasks leave
    the rq. This leads to repeated doubling of the vruntime and priority
    penalty.

    Fix this by recognizing a WAKING task's vruntime as normalized only if
    sched_remote_wakeup is true. This indicates a migration, in which case
    the vruntime would have been normalized in migrate_task_rq_fair().
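
    A hedged sketch of the changed condition (simplified from
    vruntime_normalized()):

        /* A WAKING task's vruntime is only normalized when the wakeup was
         * remote, i.e. migrate_task_rq_fair() already normalized it. */
        if (!se->sum_exec_runtime ||
            (p->state == TASK_WAKING && p->sched_remote_wakeup))
                return true;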

    Based on a similar patch from John Dias .

    Suggested-by: Peter Zijlstra
    Tested-by: Dietmar Eggemann
    Signed-off-by: Steve Muckle
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Chris Redpath
    Cc: John Dias
    Cc: Linus Torvalds
    Cc: Miguel de Dios
    Cc: Morten Rasmussen
    Cc: Patrick Bellasi
    Cc: Paul Turner
    Cc: Quentin Perret
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: kernel-team@android.com
    Fixes: b5179ac70de8 ("sched/fair: Prepare to fix fairness problems on migration")
    Link: http://lkml.kernel.org/r/20180831224217.169476-1-smuckle@google.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Steve Muckle
     
  • commit 83f365554e47997ec68dc4eca3f5dce525cd15c3 upstream.

    When reducing ring buffer size, pages are removed by scheduling a work
    item on each CPU for the corresponding CPU ring buffer. After the pages
    are removed from the ring buffer linked list, the pages are free()d in a
    tight loop. The loop does not give up the CPU until all pages are removed.
    In the worst case, when a lot of pages are to be freed, it can
    cause a system stall.

    After the pages are removed from the list, the free() can happen while
    the work is rescheduled. Call cond_resched() in the loop to prevent the
    system hangup.
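
    A hedged sketch of the loop in question (simplified from the
    ring-buffer page removal path):

        do {
                to_remove_page = tmp_iter_page;
                rb_inc_page(cpu_buffer, &tmp_iter_page);
                /* ... account the removed entries ... */
                free_buffer_page(to_remove_page);
                cond_resched();         /* give up the CPU between frees */
                nr_removed--;
        } while (to_remove_page != last_page);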

    Link: http://lkml.kernel.org/r/20180907223129.71994-1-vnagarnaik@google.com

    Cc: stable@vger.kernel.org
    Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic")
    Reported-by: Jason Behmer
    Signed-off-by: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Vaibhav Nagarnaik
     

26 Sep, 2018

4 commits

  • [ Upstream commit 8fe5c5a937d0f4e84221631833a2718afde52285 ]

    When a new task wakes-up for the first time, its initial utilization
    is set to half of the spare capacity of its CPU. The current
    implementation of post_init_entity_util_avg() uses SCHED_CAPACITY_SCALE
    directly as a capacity reference. As a result, on a big.LITTLE system, a
    new task waking up on an idle little CPU will be given ~512 of util_avg,
    even if the CPU's capacity is significantly less than that.

    Fix this by computing the spare capacity with arch_scale_cpu_capacity().
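
    A hedged sketch of the change (simplified from
    post_init_entity_util_avg()):

        /* Base the spare-capacity estimate on the CPU's real capacity
         * instead of SCHED_CAPACITY_SCALE. */
        long cpu_scale = arch_scale_cpu_capacity(NULL, cpu_of(rq_of(cfs_rq)));
        long cap = (long)(cpu_scale - cfs_rq->avg.util_avg) / 2;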

    Signed-off-by: Quentin Perret
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Vincent Guittot
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dietmar.eggemann@arm.com
    Cc: morten.rasmussen@arm.com
    Cc: patrick.bellasi@arm.com
    Link: http://lkml.kernel.org/r/20180612112215.25448-1-quentin.perret@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Quentin Perret
     
  • [ Upstream commit 76e079fefc8f62bd9b2cd2950814d1ee806e31a5 ]

    wake_woken_function() synchronizes with wait_woken() as follows:

    [wait_woken]                          [wake_woken_function]

    entry->flags &= ~wq_flag_woken;       condition = true;
    smp_mb();                             smp_wmb();
    if (condition)                        wq_entry->flags |= wq_flag_woken;
        break;

    This commit replaces the above smp_wmb() with an smp_mb() in order to
    guarantee that either wait_woken() sees the wait condition being true
    or the store to wq_entry->flags in woken_wake_function() follows the
    store in wait_woken() in the coherence order (so that the former can
    eventually be observed by wait_woken()).

    The commit also fixes a comment associated to set_current_state() in
    wait_woken(): the comment pairs the barrier in set_current_state() to
    the above smp_wmb(), while the actual pairing involves the barrier in
    set_current_state() and the barrier executed by the try_to_wake_up()
    in wake_woken_function().

    Signed-off-by: Andrea Parri
    Signed-off-by: Paul E. McKenney
    Acked-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akiyks@gmail.com
    Cc: boqun.feng@gmail.com
    Cc: dhowells@redhat.com
    Cc: j.alglave@ucl.ac.uk
    Cc: linux-arch@vger.kernel.org
    Cc: luc.maranget@inria.fr
    Cc: npiggin@gmail.com
    Cc: parri.andrea@gmail.com
    Cc: stern@rowland.harvard.edu
    Cc: will.deacon@arm.com
    Link: http://lkml.kernel.org/r/20180716180605.16115-10-paulmck@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Andrea Parri
     
  • [ Upstream commit baa2a4fdd525c8c4b0f704d20457195b29437839 ]

    audit_add_watch stores locally krule->watch without taking a reference
    on watch. Then, it calls audit_add_to_parent, and uses the watch stored
    locally.

    Unfortunately, it is possible that audit_add_to_parent updates
    krule->watch.
    When it happens, it also drops a reference of watch which
    could free the watch.

    How to reproduce (with KASAN enabled):

    auditctl -w /etc/passwd -F success=0 -k test_passwd
    auditctl -w /etc/passwd -F success=1 -k test_passwd2

    The second call to auditctl triggers the use-after-free, because
    audit_add_to_parent updates krule->watch to use a previously existing
    watch and drops the reference to the newly created watch.

    To fix the issue, we grab a reference of watch and we release it at the
    end of the function.
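
    A hedged sketch of the fix (simplified from audit_add_watch()):

        struct audit_watch *watch = krule->watch;

        audit_get_watch(watch);   /* keep the local pointer alive */
        /* ... audit_add_to_parent() may swap krule->watch and drop the
         *     reference of the watch it replaces ... */
        audit_put_watch(watch);   /* release our reference at the end */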

    Signed-off-by: Ronny Chevalier
    Reviewed-by: Richard Guy Briggs
    Signed-off-by: Paul Moore
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ronny Chevalier
     
  • commit 02e184476eff848273826c1d6617bb37e5bcc7ad upstream.

    Perf can record user stack data in response to a synchronous request, such
    as a tracepoint firing. If this happens under set_fs(KERNEL_DS), then we
    end up reading user stack data using __copy_from_user_inatomic() under
    set_fs(KERNEL_DS). I think this conflicts with the intention of using
    set_fs(KERNEL_DS). And it is explicitly forbidden by hardware on ARM64
    when both CONFIG_ARM64_UAO and CONFIG_ARM64_PAN are used.

    So fix this by forcing USER_DS when recording user stack data.
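
    A hedged sketch of the fix (simplified from
    perf_output_sample_ustack()):

        mm_segment_t fs;

        sp = perf_user_stack_pointer(regs);
        fs = get_fs();
        set_fs(USER_DS);          /* never widen the copy beyond user space */
        rem = __output_copy_user(handle, (void *)sp, dump_size);
        set_fs(fs);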

    Signed-off-by: Yabin Cui
    Acked-by: Peter Zijlstra (Intel)
    Cc:
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 88b0193d9418 ("perf/callchain: Force USER_DS when invoking perf_callchain_user()")
    Link: http://lkml.kernel.org/r/20180823225935.27035-1-yabinc@google.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Yabin Cui
     

20 Sep, 2018

3 commits

  • [ Upstream commit 363e934d8811d799c88faffc5bfca782fd728334 ]

    timer_base::must_forward_clk indicates that the base clock might be
    stale due to a long idle sleep.

    The forwarding of the base clock takes place in the timer softirq or when a
    timer is enqueued to a base which is idle. If the enqueue of timer to an
    idle base happens from a remote CPU, then the following race can happen:

    CPU0                                  CPU1
    run_timer_softirq                     mod_timer

                                          base = lock_timer_base(timer);
    base->must_forward_clk = false
                                          if (base->must_forward_clk)
                                              forward(base); -> skipped

                                          enqueue_timer(base, timer, idx);
                                          -> idx is calculated high due to
                                             stale base
                                          unlock_timer_base(timer);
    base = lock_timer_base(timer);
    forward(base);

    The root cause is that timer_base::must_forward_clk is cleared outside the
    timer_base::lock held region, so the remote queuing CPU observes it as
    cleared, but the base clock is still stale. This can cause large
    granularity values for timers, i.e. the accuracy of the expiry time
    suffers.

    Prevent this by clearing the flag with timer_base::lock held, so that the
    forwarding takes place before the cleared flag is observable by a remote
    CPU.
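
    A hedged sketch of the resulting ordering (simplified from
    __run_timers()):

        raw_spin_lock_irq(&base->lock);
        /* Cleared with the lock held: a remote enqueuer either sees the
         * flag still set or an already forwarded base clock. */
        base->must_forward_clk = false;
        /* ... expire pending timers ... */
        raw_spin_unlock_irq(&base->lock);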

    Signed-off-by: Gaurav Kohli
    Signed-off-by: Thomas Gleixner
    Cc: john.stultz@linaro.org
    Cc: sboyd@kernel.org
    Cc: linux-arm-msm@vger.kernel.org
    Link: https://lkml.kernel.org/r/1533199863-22748-1-git-send-email-gkohli@codeaurora.org
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Gaurav Kohli
     
  • commit 69fa6eb7d6a64801ea261025cce9723d9442d773 upstream.

    When a teardown callback fails, the CPU hotplug code brings the CPU back to
    the previous state. The previous state becomes the new target state. The
    rollback happens in undo_cpu_down() which increments the state
    unconditionally even if the state is already the same as the target.

    As a consequence the next CPU hotplug operation will start at the wrong
    state. This is easy to observe when __cpu_disable() fails.

    Prevent the unconditional undo by checking the state vs. target before
    incrementing state and fix up the consequently wrong conditional in the
    unplug code which handles the failure of the final CPU take down on the
    control CPU side.

    Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
    Reported-by: Neeraj Upadhyay
    Signed-off-by: Thomas Gleixner
    Tested-by: Geert Uytterhoeven
    Tested-by: Sudeep Holla
    Tested-by: Neeraj Upadhyay
    Cc: josh@joshtriplett.org
    Cc: peterz@infradead.org
    Cc: jiangshanlai@gmail.com
    Cc: dzickus@redhat.com
    Cc: brendan.jackman@arm.com
    Cc: malat@debian.org
    Cc: sramana@codeaurora.org
    Cc: linux-arm-msm@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1809051419580.1416@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit f8b7530aa0a1def79c93101216b5b17cf408a70a upstream.

    The smp_mb() in cpuhp_thread_fun() is misplaced. It needs to be after the
    load of st->should_run to prevent reordering of the later load/stores
    w.r.t. the load of st->should_run.

    Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
    Signed-off-by: Neeraj Upadhyay
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: josh@joshtriplett.org
    Cc: peterz@infradead.org
    Cc: jiangshanlai@gmail.com
    Cc: dzickus@redhat.com
    Cc: brendan.jackman@arm.com
    Cc: malat@debian.org
    Cc: mojha@codeaurora.org
    Cc: sramana@codeaurora.org
    Cc: linux-arm-msm@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1536126727-11629-1-git-send-email-neeraju@codeaurora.org
    Signed-off-by: Greg Kroah-Hartman

    Neeraj Upadhyay
     

15 Sep, 2018

3 commits

  • commit 77dd66a3c67c93ab401ccc15efff25578be281fd upstream.

    If devm_memremap_pages() detects a collision while adding entries
    to the radix-tree, we call pgmap_radix_release(). Unfortunately,
    the function removes *all* entries for the range -- including the
    entries that caused the collision in the first place.

    Modify pgmap_radix_release() to take an additional argument to
    indicate where to stop, so that only newly added entries are removed
    from the tree.

    Cc:
    Fixes: 9476df7d80df ("mm: introduce find_dev_pagemap()")
    Signed-off-by: Jan H. Schönherr
    Signed-off-by: Dan Williams
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Jan H. Schönherr
     
  • commit 295d6d5e373607729bcc8182c25afe964655714f upstream.

    Fix a bug introduced in:

    72f9f3fdc928 ("sched/deadline: Remove dl_new from struct sched_dl_entity")

    After that commit, when switching to -deadline if the scheduling
    deadline of a task is in the past then switched_to_dl() calls
    setup_new_entity() to properly initialize the scheduling deadline
    and runtime.

    The problem is that the task is enqueued _before_ having its parameters
    initialized by setup_new_entity(), and this can cause problems.
    For example, a task with its out-of-date deadline in the past will
    potentially be enqueued as the highest priority one; however, its
    adjusted deadline may not be the earliest one.

    This patch fixes the problem by initializing the task's parameters before
    enqueuing it.

    Signed-off-by: luca abeni
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Daniel Bristot de Oliveira
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Mathieu Poirier
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1504778971-13573-3-git-send-email-luca.abeni@santannapisa.it
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Luca Abeni
     
  • [ Upstream commit 06e62a46bbba20aa5286102016a04214bb446141 ]

    Before this change, if a multithreaded process forks while one of its
    threads is changing a signal handler using sigaction(), the memcpy() in
    copy_sighand() can race with the struct assignment in do_sigaction(). It
    isn't clear whether this can cause corruption of the userspace signal
    handler pointer, but it definitely can cause inconsistency between
    different fields of struct sigaction.

    Take the appropriate spinlock to avoid this.

    I have tested that this patch prevents inconsistency between sa_sigaction
    and sa_flags, which is possible before this patch.
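
    A hedged sketch of the fix (simplified from copy_sighand()):

        spin_lock_irq(&current->sighand->siglock);
        memcpy(sig->action, current->sighand->action, sizeof(sig->action));
        spin_unlock_irq(&current->sighand->siglock);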

    Link: http://lkml.kernel.org/r/20180702145108.73189-1-jannh@google.com
    Signed-off-by: Jann Horn
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Rik van Riel
    Cc: "Peter Zijlstra (Intel)"
    Cc: Kees Cook
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

10 Sep, 2018

8 commits

  • commit 5820f140edef111a9ea2ef414ab2428b8cb805b1 upstream.

    The old code would hold the userns_state_mutex indefinitely if
    memdup_user_nul stalled due to e.g. a userfault region. Prevent that by
    moving the memdup_user_nul in front of the mutex_lock().

    Note: This changes the error precedence of invalid buf/count/*ppos vs
    map already written / capabilities missing.

    Fixes: 22d917d80e84 ("userns: Rework the user_namespace adding uid/gid...")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jann Horn
    Acked-by: Christian Brauner
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • commit 42a0cc3478584d4d63f68f2f5af021ddbea771fa upstream.

    Holding uts_sem as a writer while accessing userspace memory allows a
    namespace admin to stall all processes that attempt to take uts_sem.
    Instead, move data through stack buffers and don't access userspace memory
    while uts_sem is held.
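
    A hedged sketch of the pattern (simplified from the sethostname() path):

        struct new_utsname *u;
        char tmp[__NEW_UTS_LEN];

        if (copy_from_user(tmp, name, len))     /* touch user memory first */
                return -EFAULT;

        down_write(&uts_sem);                   /* only then take uts_sem */
        u = utsname();
        memcpy(u->nodename, tmp, len);
        memset(u->nodename + len, 0, sizeof(u->nodename) - len);
        uts_proc_notify(UTS_PROC_HOSTNAME);
        up_write(&uts_sem);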

    Cc: stable@vger.kernel.org
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jann Horn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • commit 3df6f61fff49632492490fb6e42646b803a9958a upstream.

    Commit ea0212f40c6 (power: auto select CONFIG_SRCU) made the code in
    drivers/base/power/wakeup.c use SRCU instead of RCU, but it forgot to
    select CONFIG_SRCU in Kconfig, which leads to the following build
    error if CONFIG_SRCU is not selected somewhere else:

    drivers/built-in.o: In function `wakeup_source_remove':
    (.text+0x3c6fc): undefined reference to `synchronize_srcu'
    drivers/built-in.o: In function `pm_print_active_wakeup_sources':
    (.text+0x3c7a8): undefined reference to `__srcu_read_lock'
    drivers/built-in.o: In function `pm_print_active_wakeup_sources':
    (.text+0x3c84c): undefined reference to `__srcu_read_unlock'
    drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
    (.text+0x3d1d8): undefined reference to `__srcu_read_lock'
    drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
    (.text+0x3d228): undefined reference to `__srcu_read_unlock'
    drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
    (.text+0x3d24c): undefined reference to `__srcu_read_lock'
    drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
    (.text+0x3d29c): undefined reference to `__srcu_read_unlock'
    drivers/built-in.o:(.data+0x4158): undefined reference to `process_srcu'

    Fix this error by selecting CONFIG_SRCU when PM_SLEEP is enabled.

    Fixes: ea0212f40c6 (power: auto select CONFIG_SRCU)
    Cc: 4.2+ # 4.2+
    Signed-off-by: zhangyi (F)
    [ rjw: Minor subject/changelog fixups ]
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    zhangyi (F)
     
  • commit 016f8ffc48cb01d1e7701649c728c5d2e737d295 upstream.

    While debugging another bug, I was looking at all the synchronize*()
    functions being used in kernel/trace, and noticed that trace_uprobes was
    using synchronize_sched(), with a comment to synchronize with
    {u,ret}_probe_trace_func(). When looking at those functions, the data is
    protected with "rcu_read_lock()" and not with "rcu_read_lock_sched()". This
    is using the wrong synchronize_*() function.
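
    A hedged sketch of the corrected pairing (simplified from the uprobe
    event teardown path):

        list_del_rcu(&link->list);
        /* The handlers run under rcu_read_lock(), so wait for normal RCU
         * readers, not for rcu-sched. */
        synchronize_rcu();
        kfree(link);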

    Link: http://lkml.kernel.org/r/20180809160553.469e1e32@gandalf.local.home

    Cc: stable@vger.kernel.org
    Fixes: 70ed91c6ec7f8 ("tracing/uprobes: Support ftrace_event_file base multibuffer")
    Acked-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit 6e9df95b76cad18f7b217bdad7bb8a26d63b8c47 upstream.

    A livepatch module author can pass a module name/old function name longer
    than the defined character limit. With an obj->name length greater than
    MODULE_NAME_LEN, the livepatch module gets loaded but waits forever for
    the module specified by obj->name to be loaded. It also populates a /sys
    directory with an untruncated object name.

    If funcs->old_name is longer than KSYM_NAME_LEN, it will not match any of
    the symbol table entries; instead we loop through the whole symbol table
    comparing against a nonexistent function, which can be avoided.

    The same issues apply to misspelled/incorrect names. At least gatekeep
    modules whose strings exceed the limits by checking the lengths during
    livepatch module registration.
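
    A hedged sketch of the gatekeeping (simplified from the livepatch init
    paths):

        if (strlen(func->old_name) >= KSYM_NAME_LEN)
                return -EINVAL;

        if (klp_is_module(obj) && strlen(obj->name) >= MODULE_NAME_LEN)
                return -EINVAL;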

    Cc: stable@vger.kernel.org
    Signed-off-by: Kamalesh Babulal
    Acked-by: Josh Poimboeuf
    Signed-off-by: Jiri Kosina
    Signed-off-by: Greg Kroah-Hartman

    Kamalesh Babulal
     
  • commit d1c392c9e2a301f38998a353f467f76414e38725 upstream.

    I hit the following splat in my tests:

    ------------[ cut here ]------------
    IRQs not enabled as expected
    WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
    Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
    CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
    Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
    EIP: tick_nohz_idle_enter+0x44/0x8c
    Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
    75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff 0b 58 fa bb a0
    e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
    EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
    ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
    CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
    Call Trace:
    do_idle+0x33/0x202
    cpu_startup_entry+0x61/0x63
    start_secondary+0x18e/0x1ed
    startup_32_smp+0x164/0x168
    irq event stamp: 18773830
    hardirqs last enabled at (18773829): [] trace_hardirqs_on_thunk+0xc/0x10
    hardirqs last disabled at (18773830): [] trace_hardirqs_off_thunk+0xc/0x10
    softirqs last enabled at (18773824): [] __do_softirq+0x25f/0x2bf
    softirqs last disabled at (18773767): [] call_on_stack+0x45/0x4b
    ---[ end trace b7c64aa79e17954a ]---

    After a bit of debugging, I found what was happening. This would trigger
    when performing "perf" with a high NMI interrupt rate, while enabling and
    disabling function tracer. Ftrace uses breakpoints to convert the nops at
    the start of functions to calls to the function trampolines. The breakpoint
    traps disable interrupts and this makes calls into lockdep via the
    trace_hardirqs_off_thunk in the entry.S code. What happens is the following:

    do_idle {

      [interrupts enabled]

        <interrupt> [interrupts disabled]
          TRACE_IRQS_OFF [lockdep says irqs off]
          [...]
          TRACE_IRQS_IRET
              test if pt_regs say return to interrupts enabled [yes]
              TRACE_IRQS_ON [lockdep says irqs are on]

              <nmi interrupt>
                  nmi_enter() {
                      printk_nmi_enter() [traced by ftrace]
                      [ hit ftrace breakpoint ]

                          <breakpoint exception>
                          TRACE_IRQS_OFF [lockdep says irqs off]
                          [...]
                          TRACE_IRQS_IRET [return from breakpoint]
                              test if pt_regs say interrupts enabled [no]
                  [iret back to interrupt]
        [iret back to code]

    tick_nohz_idle_enter() {

        lockdep_assert_irqs_enabled() [lockdep say no!]

    Although interrupts are indeed enabled, lockdep thinks it is not, and since
    we now do asserts via lockdep, it gives a false warning. The issue here is
    that printk_nmi_enter() is called before lockdep_off(), which disables
    lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
    printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
    confused.

    Cc: stable@vger.kernel.org
    Fixes: 42a0bb3f71383 ("printk/nmi: generic solution for safe printk in NMI")
    Acked-by: Sergey Senozhatsky
    Acked-by: Petr Mladek
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit 757d9140072054528b13bbe291583d9823cde195 upstream.

    Masami Hiramatsu reported:

    The current trace/enable attribute in sysfs returns an error
    if the user writes the same setting value as the current one,
    e.g.

    # cat /sys/block/sda/trace/enable
    0
    # echo 0 > /sys/block/sda/trace/enable
    bash: echo: write error: Invalid argument
    # echo 1 > /sys/block/sda/trace/enable
    # echo 1 > /sys/block/sda/trace/enable
    bash: echo: write error: Device or resource busy

    But this is not the preferred behavior; the write should be
    ignored if the new setting is the same as the current one.
    This fixes the problem as below.

    # cat /sys/block/sda/trace/enable
    0
    # echo 0 > /sys/block/sda/trace/enable
    # echo 1 > /sys/block/sda/trace/enable
    # echo 1 > /sys/block/sda/trace/enable
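
    A hedged sketch of the fix (simplified from the blktrace sysfs store
    handler):

        if (attr == &dev_attr_enable) {
                /* Writing the value that is already set is now a no-op. */
                if (!!value == !!q->blk_trace) {
                        ret = 0;
                        goto out_unlock_bdev;
                }
                if (value)
                        ret = blk_trace_setup_queue(q, bdev);
                else
                        ret = blk_trace_remove_queue(q);
                goto out_unlock_bdev;
        }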

    Link: http://lkml.kernel.org/r/20180816103802.08678002@gandalf.local.home

    Cc: Ingo Molnar
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: stable@vger.kernel.org
    Fixes: cd649b8bb830d ("blktrace: remove sysfs_blk_trace_enable_show/store()")
    Reported-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit f143641bfef9a4a60c57af30de26c63057e7e695 upstream.

    Currently, when one echoes 1 into tracing_on, the current tracer's
    "start()" function is executed, even if tracing_on was already one. This can
    lead to strange side effects. One being that if the hwlat tracer is enabled,
    and someone does "echo 1 > tracing_on" into tracing_on, the hwlat tracer's
    start() function is called again which will recreate another kernel thread,
    and make it unable to remove the old one.
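
    A hedged sketch of the fix (simplified from rb_simple_write()):

        if (!!val == tracer_tracing_is_on(tr)) {
                val = 0;        /* already in the requested state: do nothing */
        } else if (val) {
                tracer_tracing_on(tr);
                if (tr->current_trace->start)
                        tr->current_trace->start(tr);
        } else {
                tracer_tracing_off(tr);
                if (tr->current_trace->stop)
                        tr->current_trace->stop(tr);
        }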

    Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de

    Cc: stable@vger.kernel.org
    Fixes: 2df8f8a6a897e ("tracing: Fix regression with irqsoff tracer and tracing_on file")
    Reported-by: Erica Bugden
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     

05 Sep, 2018

8 commits

  • commit cb9d7fd51d9fbb329d182423bd7b92d0f8cb0e01 upstream.

    Some architectures need to use stop_machine() to patch functions for
    ftrace, and the assumption is that the stopped CPUs do not make function
    calls to traceable functions when they are in the stopped state.

    Commit ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after
    MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
    the stopped CPUs and those functions lack notrace annotations. This
    leads to crashes when enabling/disabling ftrace on ARM kernels built
    with the Thumb-2 instruction set.

    Fix it by adding the necessary notrace annotations.

    Fixes: ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE")
    Signed-off-by: Vincent Whitchurch
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: oleg@redhat.com
    Cc: tj@kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com
    Signed-off-by: Greg Kroah-Hartman

    Vincent Whitchurch
     
  • commit f2a3ab36077222437b4826fc76111caa14562b7c upstream.

    Since the blacklist and list files on debugfs expose sensitive
    address information to the reader, they should be
    restricted to the root user.

    Suggested-by: Thomas Richter
    Suggested-by: Ingo Molnar
    Signed-off-by: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: Arnd Bergmann
    Cc: David Howells
    Cc: David S . Miller
    Cc: Heiko Carstens
    Cc: Jon Medhurst
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tobin C . Harding
    Cc: Will Deacon
    Cc: acme@kernel.org
    Cc: akpm@linux-foundation.org
    Cc: brueckner@linux.vnet.ibm.com
    Cc: linux-arch@vger.kernel.org
    Cc: rostedt@goodmis.org
    Cc: schwidefsky@de.ibm.com
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/lkml/152491890171.9916.5183693615601334087.stgit@devbox
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu
     
  • commit cfd355145c32bb7ccb65fccbe2d67280dc2119e1 upstream.

    When cpu_stop_queue_work() releases the lock for the stopper
    thread that was queued into its wake queue, preemption is
    enabled, which leads to the following deadlock:

    CPU0                                  CPU1
    sched_setaffinity(0, ...)
    __set_cpus_allowed_ptr()
    stop_one_cpu(0, ...)                  stop_two_cpus(0, 1, ...)
    cpu_stop_queue_work(0, ...)           cpu_stop_queue_two_works(0, ..., 1, ...)

    -grabs lock for migration/0-
                                          -spins with preemption disabled,
                                           waiting for migration/0's lock to be
                                           released-

    -adds work items for migration/0
     and queues migration/0 to its
     wake_q-

    -releases lock for migration/0
     and preemption is enabled-

    -current thread is preempted,
     and __set_cpus_allowed_ptr
     has changed the thread's
     cpu allowed mask to CPU1 only-

                                          -acquires migration/0 and migration/1's
                                           locks-

                                          -adds work for migration/0 but does not
                                           add migration/0 to wake_q, since it is
                                           already in a wake_q-

                                          -adds work for migration/1 and adds
                                           migration/1 to its wake_q-

                                          -releases migration/0 and migration/1's
                                           locks, wakes migration/1, and enables
                                           preemption-

                                          -since migration/1 is requested to run,
                                           migration/1 begins to run and waits on
                                           migration/0, but migration/0 will never
                                           be able to run, since the thread that
                                           can wake it is affine to CPU1-

    Disable preemption in cpu_stop_queue_work() before queueing works for
    stopper threads, and queueing the stopper thread in the wake queue, to
    ensure that the operation of queueing the works and waking the stopper
    threads is atomic.
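
    A hedged sketch of the resulting function (simplified from
    cpu_stop_queue_work()):

        static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
        {
                struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
                DEFINE_WAKE_Q(wakeq);
                unsigned long flags;
                bool enabled;

                preempt_disable();      /* cover both queueing and wakeup */
                raw_spin_lock_irqsave(&stopper->lock, flags);
                enabled = stopper->enabled;
                if (enabled)
                        __cpu_stop_queue_work(stopper, work, &wakeq);
                else if (work->done)
                        cpu_stop_signal_done(work->done);
                raw_spin_unlock_irqrestore(&stopper->lock, flags);

                wake_up_q(&wakeq);
                preempt_enable();

                return enabled;
        }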

    Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
    Signed-off-by: Prasad Sodagudi
    Signed-off-by: Isaac J. Manjarres
    Signed-off-by: Thomas Gleixner
    Cc: peterz@infradead.org
    Cc: matt@codeblueprint.co.uk
    Cc: bigeasy@linutronix.de
    Cc: gregkh@linuxfoundation.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1533329766-4856-1-git-send-email-isaacm@codeaurora.org
    Signed-off-by: Greg Kroah-Hartman

    Co-Developed-by: Isaac J. Manjarres

    Prasad Sodagudi
     
  • commit b80a2bfce85e1051056d98d04ecb2d0b55cbbc1c upstream.

    The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by
    lifting the preempt_disable() to the top to create more natural nesting wrt
    the spinlocks and make the wake_up_q() and preempt_enable() unconditional
    at the end.

    Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait
    with preemption enabled.

    Suggested-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: Sebastian Andrzej Siewior
    Cc: isaacm@codeaurora.org
    Cc: matt@codeblueprint.co.uk
    Cc: psodagud@codeaurora.org
    Cc: gregkh@linuxfoundation.org
    Cc: pkondeti@codeaurora.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180730112140.GH2494@hirez.programming.kicks-ass.net
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 03fc7f9c99c1e7ae2925d459e8487f1a6f199f79 upstream.

    The commit 719f6a7040f1bdaf96 ("printk: Use the main logbuf in NMI
    when logbuf_lock is available") brought back the possible deadlocks
    in printk() and NMI.

    The check of logbuf_lock is done only in printk_nmi_enter() to prevent
    mixed output. But another CPU might take the lock later, enter NMI, and:

    + Both NMIs might be serialized by yet another lock, for example,
    the one in nmi_cpu_backtrace().

    + The other CPU might get stopped in NMI, see smp_send_stop()
    in panic().

    The only safe solution is to use trylock when storing the message
    into the main log buffer. It might cause reordering when some lines
    go to the main log buffer directly and others are delayed via
    the per-CPU buffer. It means that it is not useful in general.

    This patch replaces the problematic NMI deferred context with NMI
    direct context. It can be used to mark a code that might produce
    many messages in NMI and the risk of losing them is more critical
    than problems with eventual reordering.

    The context is then used when dumping trace buffers on oops. It was
    the primary motivation for the original fix. Also the reordering is
    even smaller issue there because some traces have their own time stamps.
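
    A hedged sketch of how the direct context is used (simplified from the
    trace dump path):

        printk_nmi_direct_enter();
        /* ... dump the trace buffers with printk(); messages go straight
         *     to the main log buffer via trylock ... */
        printk_nmi_direct_exit();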

    Finally, nmi_cpu_backtrace() no longer needs to be serialized because
    it will always use the per-CPU buffers again.

    Fixes: 719f6a7040f1bdaf96 ("printk: Use the main logbuf in NMI when logbuf_lock is available")
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20180627142028.11259-1-pmladek@suse.com
    To: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Tetsuo Handa
    Cc: Sergey Senozhatsky
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Greg Kroah-Hartman

    Petr Mladek
     
  • commit a338f84dc196f44b63ba0863d2f34fd9b1613572 upstream.

    It is just a preparation step. The patch does not change
    the existing behavior.

    Link: http://lkml.kernel.org/r/20180627140817.27764-3-pmladek@suse.com
    To: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Tetsuo Handa
    Cc: Sergey Senozhatsky
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Greg Kroah-Hartman

    Petr Mladek
     
  • commit ba552399954dde1b388f7749fecad5c349216981 upstream.

    It is just a preparation step. The patch does not change
    the existing behavior.

    Link: http://lkml.kernel.org/r/20180627140817.27764-2-pmladek@suse.com
    To: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Tetsuo Handa
    Cc: Sergey Senozhatsky
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Greg Kroah-Hartman

    Petr Mladek
     
  • [ Upstream commit f3d133ee0a17d5694c6f21873eec9863e11fa423 ]

    The NO_RT_RUNTIME_SHARE feature is used to prevent a CPU running a
    spinning RT task from borrowing runtime from other CPUs.

    However, if the RT_RUNTIME_SHARE feature is enabled and the rt_rq has
    borrowed enough rt_runtime at the beginning, rt_runtime can't be restored
    to its initial bandwidth after we disable RT_RUNTIME_SHARE.

    E.g. on my PC with 4 cores, procedure to reproduce:
    1) Make sure RT_RUNTIME_SHARE is enabled
    cat /sys/kernel/debug/sched_features
    GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY
    CACHE_HOT_BUDDY WAKEUP_PREEMPTION NO_HRTICK NO_DOUBLE_TICK
    LB_BIAS NONTASK_CAPACITY TTWU_QUEUE NO_SIS_AVG_CPU SIS_PROP
    NO_WARN_DOUBLE_CLOCK RT_PUSH_IPI RT_RUNTIME_SHARE NO_LB_MIN
    ATTACH_AGE_LOAD WA_IDLE WA_WEIGHT WA_BIAS
    2) Start a spin-rt-task
    ./loop_rr &
    3) set affinity to the last cpu
    taskset -p 8 $pid_of_loop_rr
    4) Observe that last cpu have borrowed enough runtime.
    cat /proc/sched_debug | grep rt_runtime
    .rt_runtime : 950.000000
    .rt_runtime : 900.000000
    .rt_runtime : 950.000000
    .rt_runtime : 1000.000000
    5) Disable RT_RUNTIME_SHARE
    echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
    6) Observe that rt_runtime can not been restored
    cat /proc/sched_debug | grep rt_runtime
    .rt_runtime : 950.000000
    .rt_runtime : 900.000000
    .rt_runtime : 950.000000
    .rt_runtime : 1000.000000

    This patch helps restore rt_runtime after we disable
    RT_RUNTIME_SHARE.

    Signed-off-by: Hailong Liu
    Signed-off-by: Jiang Biao
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: zhong.weidong@zte.com.cn
    Link: http://lkml.kernel.org/r/1531874815-39357-1-git-send-email-liu.hailong6@zte.com.cn
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Hailong Liu