29 Oct, 2018

2 commits

  • When CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=n, the call path
    hrtimer_reprogram -> clockevents_program_event ->
    clockevents_program_min_delta will not retry if the clock event driver
    returns -ETIME.

    If the driver could not satisfy the program_min_delta for any reason, the
    lack of a retry means the CPU may not receive a tick interrupt, potentially
    until the counter does a full period. This leads to rcu_sched timeout
    messages as the stalled CPU is detected by other CPUs, and other issues if
    the CPU is holding locks or other resources at the point at which it
    stalls.

    There have been a couple of observed mechanisms through which a clock event
    driver could not satisfy the requested min_delta and return -ETIME.

    With the MIPS GIC driver, the shared execution resource within MT cores
    means that inconvenient latency, caused by execution of instructions from
    other hardware threads in the core while gic_next_event runs, can result
    in an event being set in the past.

    Additionally under virtualisation it is possible to get unexpected latency
    during a clockevent device's set_next_event() callback which can make it
    return -ETIME even for a delta based on min_delta_ns.

    It isn't appropriate to use MIN_ADJUST in the virtualisation case as
    occasional hypervisor induced high latency will cause min_delta_ns to
    quickly increase to the maximum.

    Instead, borrow the retry pattern from the MIN_ADJUST case, but without
    making adjustments. Retry up to 10 times, each time increasing the
    attempted delta by min_delta, before giving up.
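
    A hedged sketch of the resulting retry loop (simplified from the
    clockevents code for illustration; not a verbatim copy):

        /* Retry up to 10 times, growing the delta by min_delta_ns each time. */
        static int clockevents_program_min_delta(struct clock_event_device *dev)
        {
                unsigned long long clc;
                int64_t delta = 0;
                int i;

                for (i = 0; i < 10; i++) {
                        delta += dev->min_delta_ns;
                        dev->next_event = ktime_add_ns(ktime_get(), delta);

                        if (clockevent_state_shutdown(dev))
                                return 0;

                        dev->retries++;
                        clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
                        if (dev->set_next_event((unsigned long) clc, dev) == 0)
                                return 0;
                }
                return -ETIME;
        }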

    [ Matt: Reworked the loop and made retry increase the delta. ]

    Signed-off-by: James Hogan
    Signed-off-by: Matt Redfearn
    Signed-off-by: Thomas Gleixner
    Cc: linux-mips@linux-mips.org
    Cc: Daniel Lezcano
    Cc: "Martin Schwidefsky"
    Cc: James Hogan
    Link: https://lkml.kernel.org/r/1508422643-6075-1-git-send-email-matt.redfearn@mips.com

    James Hogan
     
  • These macros can be reused by governors which don't use the common
    governor code present in cpufreq_governor.c and should be moved to the
    relevant header.

    Now that they are getting moved to the right header file, reuse them in
    schedutil governor as well (that required rename of show/store
    routines).

    Also create gov_attr_wo() macro for write-only sysfs files, this will be
    used by Interactive governor in a later patch.
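
    A hedged sketch of the macros in question (illustrative; the exact
    definitions live in the cpufreq headers):

        #define gov_attr_ro(_name)                                      \
        static struct governor_attr _name =                             \
        __ATTR(_name, 0444, show_##_name, NULL)

        #define gov_attr_rw(_name)                                      \
        static struct governor_attr _name =                             \
        __ATTR(_name, 0644, show_##_name, store_##_name)

        /* New write-only variant for sysfs files that must not be readable. */
        #define gov_attr_wo(_name)                                      \
        static struct governor_attr _name =                             \
        __ATTR(_name, 0200, NULL, store_##_name)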

    Signed-off-by: Viresh Kumar

    Viresh Kumar
     

20 Oct, 2018

1 commit

  • commit 15d36fecd0bdc7510b70a0e5ec6671140b3fce0c upstream.

    When pmem namespaces are created smaller than the section size, this can
    cause an issue during removal, and a general protection fault was observed:

    general protection fault: 0000 1 SMP PTI
    CPU: 36 PID: 3941 Comm: ndctl Tainted: G W 4.14.28-1.el7uek.x86_64 #2
    task: ffff88acda150000 task.stack: ffffc900233a4000
    RIP: 0010:__put_page+0x56/0x79
    Call Trace:
    devm_memremap_pages_release+0x155/0x23a
    release_nodes+0x21e/0x260
    devres_release_all+0x3c/0x48
    device_release_driver_internal+0x15c/0x207
    device_release_driver+0x12/0x14
    unbind_store+0xba/0xd8
    drv_attr_store+0x27/0x31
    sysfs_kf_write+0x3f/0x46
    kernfs_fop_write+0x10f/0x18b
    __vfs_write+0x3a/0x16d
    vfs_write+0xb2/0x1a1
    SyS_write+0x55/0xb9
    do_syscall_64+0x79/0x1ae
    entry_SYSCALL_64_after_hwframe+0x3d/0x0

    Add code to check whether we have a mapping already in the same section
    and prevent additional mappings from being created if that is the case.
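
    A hedged sketch of the kind of check described (simplified from the
    devm_memremap_pages() path; assumes the section-aligned bounds have
    already been computed, and the exact error handling may differ):

        /* Refuse to set up a pgmap whose section-aligned start or end
         * already belongs to another mapping. */
        conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_start), NULL);
        if (conflict_pgmap) {
                dev_WARN(dev, "Conflicting mapping in same section\n");
                put_dev_pagemap(conflict_pgmap);
                return ERR_PTR(-ENOMEM);
        }

        conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_end), NULL);
        if (conflict_pgmap) {
                dev_WARN(dev, "Conflicting mapping in same section\n");
                put_dev_pagemap(conflict_pgmap);
                return ERR_PTR(-ENOMEM);
        }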

    Link: http://lkml.kernel.org/r/152909478401.50143.312364396244072931.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Cc: Dan Williams
    Cc: Robert Elliott
    Cc: Jeff Moyer
    Cc: Matthew Wilcox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Dave Jiang
     

18 Oct, 2018

1 commit

  • commit 479adb89a97b0a33e5a9d702119872cc82ca21aa upstream.

    A cgroup which is already a threaded domain may be converted into a
    threaded cgroup if the prerequisite conditions are met. When this
    happens, all threaded descendant should also have their ->dom_cgrp
    updated to the new threaded domain cgroup. Unfortunately, this
    propagation was missing leading to the following failure.

    # cd /sys/fs/cgroup/unified
    # cat cgroup.subtree_control # show that no controllers are enabled

    # mkdir -p mycgrp/a/b/c
    # echo threaded > mycgrp/a/b/cgroup.type

    At this point, the hierarchy looks as follows:

    mycgrp [d]
        a [dt]
            b [t]
                c [inv]

    Now let's make node "a" threaded (and thus "mycgrp" is made "domain threaded"):

    # echo threaded > mycgrp/a/cgroup.type

    By this point, we now have a hierarchy that looks as follows:

    mycgrp [dt]
        a [t]
            b [t]
                c [inv]

    But, when we try to convert the node "c" from "domain invalid" to
    "threaded", we get ENOTSUP on the write():

    # echo threaded > mycgrp/a/b/c/cgroup.type
    sh: echo: write error: Operation not supported

    This patch fixes the problem by

    * Moving the opencoded ->dom_cgrp save and restoration in
    cgroup_enable_threaded() into cgroup_{save|restore}_control() so
    that multiple cgroups can be handled.

    * Updating all threaded descendants' ->dom_cgrp to point to the new
    dom_cgrp when enabling threaded mode (see the sketch below).
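
    A hedged sketch of the propagation from the second item (simplified
    from cgroup_enable_threaded()):

        /* Point cgrp and every already-threaded descendant at the new
         * threaded domain cgroup. */
        cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp)
                if (dsct == cgrp || cgroup_is_threaded(dsct))
                        dsct->dom_cgrp = dom_cgrp;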

    Signed-off-by: Tejun Heo
    Reported-and-tested-by: "Michael Kerrisk (man-pages)"
    Reported-by: Amin Jamali
    Reported-by: Joao De Almeida Pereira
    Link: https://lore.kernel.org/r/CAKgNAkhHYCMn74TCNiMJ=ccLd7DcmXSbvw3CbZ1YREeG7iJM5g@mail.gmail.com
    Fixes: 454000adaa2a ("cgroup: introduce cgroup->dom_cgrp and threaded css_set handling")
    Cc: stable@vger.kernel.org # v4.14+
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

13 Oct, 2018

1 commit

  • commit befb1b3c2703897c5b8ffb0044dc5d0e5f27c5d7 upstream.

    It is possible that a failure can occur during the scheduling of a
    pinned event. The initial portion of perf_event_read_local() contains
    the various error checks an event should pass before it can be
    considered valid. Ensure that the potential scheduling failure
    of a pinned event is checked for and a credible error returned.
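
    A hedged sketch of the added check (simplified from
    perf_event_read_local()):

        /* A pinned event that failed to be scheduled is not running on
         * this CPU; fail the read with a credible error. */
        if (event->attr.pinned && event->oncpu != smp_processor_id()) {
                ret = -EBUSY;
                goto out;
        }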

    Suggested-by: Peter Zijlstra
    Signed-off-by: Reinette Chatre
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: fenghua.yu@intel.com
    Cc: tony.luck@intel.com
    Cc: acme@kernel.org
    Cc: gavin.hindman@intel.com
    Cc: jithu.joseph@intel.com
    Cc: dave.hansen@intel.com
    Cc: hpa@zytor.com
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/6486385d1f30336e9973b24c8c65f5079543d3d3.1537377064.git.reinette.chatre@intel.com
    Signed-off-by: Greg Kroah-Hartman

    Reinette Chatre
     

10 Oct, 2018

1 commit

  • commit b799207e1e1816b09e7a5920fbb2d5fcf6edd681 upstream.

    When I wrote commit 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification"), I
    assumed that, in order to emulate 64-bit arithmetic with 32-bit logic, it
    is sufficient to just truncate the output to 32 bits; and so I just moved
    the register size coercion that used to be at the start of the function to
    the end of the function.

    That assumption is true for almost every op, but not for 32-bit right
    shifts, because those can propagate information towards the least
    significant bit. Fix it by always truncating inputs for 32-bit ops to 32
    bits.
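
    A small self-contained illustration of the problem (not verifier code):

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
                uint64_t reg = 0x100000001ULL;          /* bits 32 and 0 set */

                /* Truncating only the output lets bit 32 leak into bit 31. */
                uint32_t wrong = (uint32_t)(reg >> 1);  /* 0x80000000 */

                /* Truncating the input first matches 32-bit ALU behaviour. */
                uint32_t right = (uint32_t)reg >> 1;    /* 0x00000000 */

                printf("wrong=%#x right=%#x\n", wrong, right);
                return 0;
        }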

    Also get rid of the coerce_reg_to_size() after the ALU op, since that has
    no effect.

    Fixes: 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification")
    Acked-by: Daniel Borkmann
    Signed-off-by: Jann Horn
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

04 Oct, 2018

5 commits

  • [ Upstream commit 9b2e0388bec8ec5427403e23faff3b58dd1c3200 ]

    When sockmap code is using the stream parser it also handles the write
    space events in order to handle the case where (a) verdict redirects
    skb to another socket and (b) the sockmap then sends the skb but due
    to memory constraints (or other EAGAIN errors) needs to do a retry.

    But the initial code missed a third case where
    skb_send_sock_locked() triggers an sk_wait_event(). A typical case
    would be when sndbuf size is exceeded. If this happens, because we
    do not pass the write_space event to the lower layers we never wake
    up the waiter and it will wait for sndtimeo, which, as noted in the
    ktls fix, may be rather large and look like a hang to the user.

    To reproduce, the best test is to reduce the sndbuf size and send
    1B data chunks to stress the memory handling. To fix this, pass the
    event from the upper layer to the lower layer.
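
    A hedged sketch of the resulting handler (simplified from the sockmap
    write_space callback):

        static void smap_write_space(struct sock *sk)
        {
                struct smap_psock *psock;
                void (*write_space)(struct sock *sk);

                rcu_read_lock();
                psock = smap_psock_sk(sk);
                if (likely(psock && test_bit(SMAP_TX_RUNNING, &psock->state)))
                        schedule_work(&psock->tx_work);
                write_space = psock->save_write_space; /* original callback */
                rcu_read_unlock();
                write_space(sk);                       /* pass the event down */
        }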

    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    John Fastabend
     
  • [ Upstream commit 9f2d1e68cf4d641def734adaccfc3823d3575e6c ]

    Livepatch modules are special in that we preserve their entire symbol
    tables in order to be able to apply relocations after module load. The
    unwanted side effect of this is that undefined (SHN_UNDEF) symbols of
    livepatch modules are accessible via the kallsyms api and this can
    confuse symbol resolution in livepatch (klp_find_object_symbol()) and
    cause subtle bugs in livepatch.

    Have the module kallsyms api skip over SHN_UNDEF symbols. These symbols
    are usually not available for normal modules anyway as we cut down their
    symbol tables to just the core (non-undefined) symbols, so this should
    really just affect livepatch modules. Note that this patch doesn't
    affect the display of undefined symbols in /proc/kallsyms.
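
    A hedged sketch of the filtering (illustrative; the real code lives in
    the module kallsyms helpers):

        for (i = 0; i < kallsyms->num_symtab; i++) {
                const Elf_Sym *sym = &kallsyms->symtab[i];

                /* Undefined symbols are not provided by this module. */
                if (sym->st_shndx == SHN_UNDEF)
                        continue;

                ret = fn(data, kallsyms->strtab + sym->st_name,
                         mod, sym->st_value);
                if (ret)
                        return ret;
        }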

    Reported-by: Josh Poimboeuf
    Tested-by: Josh Poimboeuf
    Reviewed-by: Josh Poimboeuf
    Signed-off-by: Jessica Yu
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jessica Yu
     
  • [ Upstream commit 78c9c4dfbf8c04883941445a195276bb4bb92c76 ]

    The posix timer overrun handling is broken because the forwarding functions
    can return a huge number of overruns which does not fit in an int. As a
    consequence timer_getoverrun(2) and siginfo::si_overrun can turn into
    random number generators.

    The k_clock::timer_forward() callbacks return a 64 bit value now. Make
    k_itimer::it_overrun[_last] 64 bit as well, so the kernel internal
    accounting is correct. Remove the temporary (int) casts.

    Add a helper function which clamps the overrun value returned to user space
    via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value
    between 0 and INT_MAX. INT_MAX is an indicator for user space that the
    overrun value has been clamped.
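
    A hedged sketch of such a clamping helper (close to, but not
    necessarily identical with, the actual patch):

        static int timer_overrun_to_int(struct k_itimer *timr, int baseval)
        {
                s64 sum = timr->it_overrun_last + (s64)baseval;

                /* INT_MAX signals to user space that the value was clamped. */
                return sum > (s64)INT_MAX ? INT_MAX : (int)sum;
        }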

    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Michael Kerrisk
    Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • [ Upstream commit 6fec64e1c92d5c715c6d0f50786daa7708266bde ]

    The posix timer it_overrun handling is broken because the forwarding
    functions can return a huge number of overruns which does not fit in an
    int. As a consequence timer_getoverrun(2) and siginfo::si_overrun can turn
    into random number generators.

    As a first step to address that let the timer_forward() callbacks return
    the full 64 bit value.

    Cast it to (int) temporarily until k_itimer::it_overrun is converted to
    64bit and the conversion to user space visible values is sanitized.

    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Michael Kerrisk
    Link: https://lkml.kernel.org/r/20180626132704.922098090@linutronix.de
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • [ Upstream commit 5f936e19cc0ef97dbe3a56e9498922ad5ba1edef ]

    Air Icy reported:

    UBSAN: Undefined behaviour in kernel/time/alarmtimer.c:811:7
    signed integer overflow:
    1529859276030040771 + 9223372036854775807 cannot be represented in type 'long long int'
    Call Trace:
    alarm_timer_nsleep+0x44c/0x510 kernel/time/alarmtimer.c:811
    __do_sys_clock_nanosleep kernel/time/posix-timers.c:1235 [inline]
    __se_sys_clock_nanosleep kernel/time/posix-timers.c:1213 [inline]
    __x64_sys_clock_nanosleep+0x326/0x4e0 kernel/time/posix-timers.c:1213
    do_syscall_64+0xb8/0x3a0 arch/x86/entry/common.c:290

    alarm_timer_nsleep() uses ktime_add() to add the current time and the
    relative expiry value. ktime_add() has no sanity checks so the addition
    can overflow when the relative timeout is large enough.

    Use ktime_add_safe() which has the necessary sanity checks in place and
    limits the result to the valid range.
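
    A hedged sketch of the clamping behaviour relied upon (simplified from
    the hrtimer code):

        ktime_t ktime_add_safe(const ktime_t lhs, const ktime_t rhs)
        {
                ktime_t res = ktime_add_unsafe(lhs, rhs);

                /* Clamp to the largest valid timeout instead of wrapping. */
                if (res < 0 || res < lhs || res < rhs)
                        res = ktime_set(KTIME_SEC_MAX, 0);

                return res;
        }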

    Fixes: 9a7adcf5c6de ("timers: Posix interface for alarm-timers")
    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1807020926360.1595@nanos.tec.linutronix.de
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

29 Sep, 2018

3 commits

  • Commit 0a0e0829f990 ("nohz: Fix missing tick reprogram when interrupting an
    inline softirq") got backported to stable trees and now causes the NOHZ
    softirq pending warning to trigger. It's not an upstream issue as the NOHZ
    update logic has been changed there.

    The problem is when a softirq disabled section gets interrupted and on
    return from interrupt the tick/nohz state is evaluated, which then can
    observe pending soft interrupts. These soft interrupts are legitimately
    pending because they cannot be processed as long as soft interrupts are
    disabled and the interrupted code will correctly process them when soft
    interrupts are reenabled.

    Add a check for softirqs disabled to the pending check to prevent the
    warning.

    Reported-by: Grygorii Strashko
    Reported-by: John Crispin
    Signed-off-by: Thomas Gleixner
    Tested-by: Grygorii Strashko
    Tested-by: John Crispin
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Anna-Maria Gleixner
    Cc: stable@vger.kernel.org
    Fixes: 2d898915ccf4838c ("nohz: Fix missing tick reprogram when interrupting an inline softirq")
    Acked-by: Frederic Weisbecker
    Tested-by: Geert Uytterhoeven

    Thomas Gleixner
     
  • commit d0cdb3ce8834332d918fc9c8ff74f8a169ec9abe upstream.

    When a task which previously ran on a given CPU is remotely queued to
    wake up on that same CPU, there is a period where the task's state is
    TASK_WAKING and its vruntime is not normalized. This is not accounted
    for in vruntime_normalized() which will cause an error in the task's
    vruntime if it is switched from the fair class during this time.

    For example if it is boosted to RT priority via rt_mutex_setprio(),
    rq->min_vruntime will not be subtracted from the task's vruntime but
    it will be added again when the task returns to the fair class. The
    task's vruntime will have been erroneously doubled and the effective
    priority of the task will be reduced.

    Note this will also lead to inflation of all vruntimes since the doubled
    vruntime value will become the rq's min_vruntime when other tasks leave
    the rq. This leads to repeated doubling of the vruntime and priority
    penalty.

    Fix this by recognizing a WAKING task's vruntime as normalized only if
    sched_remote_wakeup is true. This indicates a migration, in which case
    the vruntime would have been normalized in migrate_task_rq_fair().
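
    A hedged sketch of the changed condition (simplified from
    vruntime_normalized()):

        /* A WAKING task's vruntime is only normalized when the wakeup was
         * remote, i.e. migrate_task_rq_fair() already normalized it. */
        if (!se->sum_exec_runtime ||
            (p->state == TASK_WAKING && p->sched_remote_wakeup))
                return true;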

    Based on a similar patch from John Dias .

    Suggested-by: Peter Zijlstra
    Tested-by: Dietmar Eggemann
    Signed-off-by: Steve Muckle
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Chris Redpath
    Cc: John Dias
    Cc: Linus Torvalds
    Cc: Miguel de Dios
    Cc: Morten Rasmussen
    Cc: Patrick Bellasi
    Cc: Paul Turner
    Cc: Quentin Perret
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: kernel-team@android.com
    Fixes: b5179ac70de8 ("sched/fair: Prepare to fix fairness problems on migration")
    Link: http://lkml.kernel.org/r/20180831224217.169476-1-smuckle@google.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Steve Muckle
     
  • commit 83f365554e47997ec68dc4eca3f5dce525cd15c3 upstream.

    When reducing ring buffer size, pages are removed by scheduling a work
    item on each CPU for the corresponding CPU ring buffer. After the pages
    are removed from the ring buffer linked list, the pages are free()d in a
    tight loop. The loop does not give up the CPU until all pages are removed.
    In the worst case, when a lot of pages are to be freed, it can
    cause a system stall.

    After the pages are removed from the list, the free() can happen while
    the work is rescheduled. Call cond_resched() in the loop to prevent the
    system hangup.
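
    A hedged sketch of the loop in question (simplified from the
    ring-buffer page removal path):

        do {
                to_remove_page = tmp_iter_page;
                rb_inc_page(cpu_buffer, &tmp_iter_page);
                /* ... account the removed entries ... */
                free_buffer_page(to_remove_page);
                cond_resched();         /* give up the CPU between frees */
                nr_removed--;
        } while (to_remove_page != last_page);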

    Link: http://lkml.kernel.org/r/20180907223129.71994-1-vnagarnaik@google.com

    Cc: stable@vger.kernel.org
    Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic")
    Reported-by: Jason Behmer
    Signed-off-by: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Vaibhav Nagarnaik
     

26 Sep, 2018

4 commits

  • [ Upstream commit 8fe5c5a937d0f4e84221631833a2718afde52285 ]

    When a new task wakes-up for the first time, its initial utilization
    is set to half of the spare capacity of its CPU. The current
    implementation of post_init_entity_util_avg() uses SCHED_CAPACITY_SCALE
    directly as a capacity reference. As a result, on a big.LITTLE system, a
    new task waking up on an idle little CPU will be given ~512 of util_avg,
    even if the CPU's capacity is significantly less than that.

    Fix this by computing the spare capacity with arch_scale_cpu_capacity().
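
    A hedged sketch of the change (simplified from
    post_init_entity_util_avg()):

        /* Base the spare-capacity estimate on the CPU's real capacity
         * instead of SCHED_CAPACITY_SCALE. */
        long cpu_scale = arch_scale_cpu_capacity(NULL, cpu_of(rq_of(cfs_rq)));
        long cap = (long)(cpu_scale - cfs_rq->avg.util_avg) / 2;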

    Signed-off-by: Quentin Perret
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Vincent Guittot
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dietmar.eggemann@arm.com
    Cc: morten.rasmussen@arm.com
    Cc: patrick.bellasi@arm.com
    Link: http://lkml.kernel.org/r/20180612112215.25448-1-quentin.perret@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Quentin Perret
     
  • [ Upstream commit 76e079fefc8f62bd9b2cd2950814d1ee806e31a5 ]

    wake_woken_function() synchronizes with wait_woken() as follows:

    [wait_woken]                          [wake_woken_function]

    entry->flags &= ~wq_flag_woken;       condition = true;
    smp_mb();                             smp_wmb();
    if (condition)                        wq_entry->flags |= wq_flag_woken;
        break;

    This commit replaces the above smp_wmb() with an smp_mb() in order to
    guarantee that either wait_woken() sees the wait condition being true
    or the store to wq_entry->flags in woken_wake_function() follows the
    store in wait_woken() in the coherence order (so that the former can
    eventually be observed by wait_woken()).

    The commit also fixes a comment associated to set_current_state() in
    wait_woken(): the comment pairs the barrier in set_current_state() to
    the above smp_wmb(), while the actual pairing involves the barrier in
    set_current_state() and the barrier executed by the try_to_wake_up()
    in wake_woken_function().

    Signed-off-by: Andrea Parri
    Signed-off-by: Paul E. McKenney
    Acked-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akiyks@gmail.com
    Cc: boqun.feng@gmail.com
    Cc: dhowells@redhat.com
    Cc: j.alglave@ucl.ac.uk
    Cc: linux-arch@vger.kernel.org
    Cc: luc.maranget@inria.fr
    Cc: npiggin@gmail.com
    Cc: parri.andrea@gmail.com
    Cc: stern@rowland.harvard.edu
    Cc: will.deacon@arm.com
    Link: http://lkml.kernel.org/r/20180716180605.16115-10-paulmck@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Andrea Parri
     
  • [ Upstream commit baa2a4fdd525c8c4b0f704d20457195b29437839 ]

    audit_add_watch stores locally krule->watch without taking a reference
    on watch. Then, it calls audit_add_to_parent, and uses the watch stored
    locally.

    Unfortunately, it is possible that audit_add_to_parent updates
    krule->watch.
    When it happens, it also drops a reference of watch which
    could free the watch.

    How to reproduce (with KASAN enabled):

    auditctl -w /etc/passwd -F success=0 -k test_passwd
    auditctl -w /etc/passwd -F success=1 -k test_passwd2

    The second call to auditctl triggers the use-after-free, because
    audit_add_to_parent updates krule->watch to use a previously existing
    watch and drops the reference to the newly created watch.

    To fix the issue, we grab a reference of watch and we release it at the
    end of the function.
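
    A hedged sketch of the fix (simplified from audit_add_watch()):

        struct audit_watch *watch = krule->watch;

        audit_get_watch(watch);   /* keep the local pointer alive */
        /* ... audit_add_to_parent() may swap krule->watch and drop the
         *     reference of the watch it replaces ... */
        audit_put_watch(watch);   /* release our reference at the end */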

    Signed-off-by: Ronny Chevalier
    Reviewed-by: Richard Guy Briggs
    Signed-off-by: Paul Moore
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ronny Chevalier
     
  • commit 02e184476eff848273826c1d6617bb37e5bcc7ad upstream.

    Perf can record user stack data in response to a synchronous request, such
    as a tracepoint firing. If this happens under set_fs(KERNEL_DS), then we
    end up reading user stack data using __copy_from_user_inatomic() under
    set_fs(KERNEL_DS). I think this conflicts with the intention of using
    set_fs(KERNEL_DS). And it is explicitly forbidden by hardware on ARM64
    when both CONFIG_ARM64_UAO and CONFIG_ARM64_PAN are used.

    So fix this by forcing USER_DS when recording user stack data.
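
    A hedged sketch of the fix (simplified from
    perf_output_sample_ustack()):

        mm_segment_t fs;

        sp = perf_user_stack_pointer(regs);
        fs = get_fs();
        set_fs(USER_DS);          /* never widen the copy beyond user space */
        rem = __output_copy_user(handle, (void *)sp, dump_size);
        set_fs(fs);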

    Signed-off-by: Yabin Cui
    Acked-by: Peter Zijlstra (Intel)
    Cc:
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 88b0193d9418 ("perf/callchain: Force USER_DS when invoking perf_callchain_user()")
    Link: http://lkml.kernel.org/r/20180823225935.27035-1-yabinc@google.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Yabin Cui
     

20 Sep, 2018

3 commits

  • [ Upstream commit 363e934d8811d799c88faffc5bfca782fd728334 ]

    timer_base::must_forward_clk indicates that the base clock might be
    stale due to a long idle sleep.

    The forwarding of the base clock takes place in the timer softirq or when a
    timer is enqueued to a base which is idle. If the enqueue of timer to an
    idle base happens from a remote CPU, then the following race can happen:

    CPU0                                  CPU1
    run_timer_softirq                     mod_timer

                                          base = lock_timer_base(timer);
    base->must_forward_clk = false
                                          if (base->must_forward_clk)
                                              forward(base); -> skipped

                                          enqueue_timer(base, timer, idx);
                                          -> idx is calculated high due to
                                             stale base
                                          unlock_timer_base(timer);
    base = lock_timer_base(timer);
    forward(base);

    The root cause is that timer_base::must_forward_clk is cleared outside the
    timer_base::lock held region, so the remote queuing CPU observes it as
    cleared, but the base clock is still stale. This can cause large
    granularity values for timers, i.e. the accuracy of the expiry time
    suffers.

    Prevent this by clearing the flag with timer_base::lock held, so that the
    forwarding takes place before the cleared flag is observable by a remote
    CPU.
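
    A hedged sketch of the resulting ordering (simplified from
    __run_timers()):

        raw_spin_lock_irq(&base->lock);
        /* Cleared with the lock held: a remote enqueuer either sees the
         * flag still set or an already forwarded base clock. */
        base->must_forward_clk = false;
        /* ... expire pending timers ... */
        raw_spin_unlock_irq(&base->lock);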

    Signed-off-by: Gaurav Kohli
    Signed-off-by: Thomas Gleixner
    Cc: john.stultz@linaro.org
    Cc: sboyd@kernel.org
    Cc: linux-arm-msm@vger.kernel.org
    Link: https://lkml.kernel.org/r/1533199863-22748-1-git-send-email-gkohli@codeaurora.org
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Gaurav Kohli
     
  • commit 69fa6eb7d6a64801ea261025cce9723d9442d773 upstream.

    When a teardown callback fails, the CPU hotplug code brings the CPU back to
    the previous state. The previous state becomes the new target state. The
    rollback happens in undo_cpu_down() which increments the state
    unconditionally even if the state is already the same as the target.

    As a consequence the next CPU hotplug operation will start at the wrong
    state. This is easy to observe when __cpu_disable() fails.

    Prevent the unconditional undo by checking the state vs. target before
    incrementing state and fix up the consequently wrong conditional in the
    unplug code which handles the failure of the final CPU take down on the
    control CPU side.

    Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
    Reported-by: Neeraj Upadhyay
    Signed-off-by: Thomas Gleixner
    Tested-by: Geert Uytterhoeven
    Tested-by: Sudeep Holla
    Tested-by: Neeraj Upadhyay
    Cc: josh@joshtriplett.org
    Cc: peterz@infradead.org
    Cc: jiangshanlai@gmail.com
    Cc: dzickus@redhat.com
    Cc: brendan.jackman@arm.com
    Cc: malat@debian.org
    Cc: sramana@codeaurora.org
    Cc: linux-arm-msm@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1809051419580.1416@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit f8b7530aa0a1def79c93101216b5b17cf408a70a upstream.

    The smp_mb() in cpuhp_thread_fun() is misplaced. It needs to be after the
    load of st->should_run to prevent reordering of the later load/stores
    w.r.t. the load of st->should_run.

    Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
    Signed-off-by: Neeraj Upadhyay
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: josh@joshtriplett.org
    Cc: peterz@infradead.org
    Cc: jiangshanlai@gmail.com
    Cc: dzickus@redhat.com
    Cc: brendan.jackman@arm.com
    Cc: malat@debian.org
    Cc: mojha@codeaurora.org
    Cc: sramana@codeaurora.org
    Cc: linux-arm-msm@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1536126727-11629-1-git-send-email-neeraju@codeaurora.org
    Signed-off-by: Greg Kroah-Hartman

    Neeraj Upadhyay
     

15 Sep, 2018

3 commits

  • commit 77dd66a3c67c93ab401ccc15efff25578be281fd upstream.

    If devm_memremap_pages() detects a collision while adding entries
    to the radix-tree, we call pgmap_radix_release(). Unfortunately,
    the function removes *all* entries for the range -- including the
    entries that caused the collision in the first place.

    Modify pgmap_radix_release() to take an additional argument to
    indicate where to stop, so that only newly added entries are removed
    from the tree.

    Cc:
    Fixes: 9476df7d80df ("mm: introduce find_dev_pagemap()")
    Signed-off-by: Jan H. Schönherr
    Signed-off-by: Dan Williams
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Jan H. Schönherr
     
  • commit 295d6d5e373607729bcc8182c25afe964655714f upstream.

    Fix a bug introduced in:

    72f9f3fdc928 ("sched/deadline: Remove dl_new from struct sched_dl_entity")

    After that commit, when switching to -deadline if the scheduling
    deadline of a task is in the past then switched_to_dl() calls
    setup_new_entity() to properly initialize the scheduling deadline
    and runtime.

    The problem is that the task is enqueued _before_ having its parameters
    initialized by setup_new_entity(), and this can cause problems.
    For example, a task with its out-of-date deadline in the past will
    potentially be enqueued as the highest priority one; however, its
    adjusted deadline may not be the earliest one.

    This patch fixes the problem by initializing the task's parameters before
    enqueuing it.

    Signed-off-by: luca abeni
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Daniel Bristot de Oliveira
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Mathieu Poirier
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1504778971-13573-3-git-send-email-luca.abeni@santannapisa.it
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Luca Abeni
     
  • [ Upstream commit 06e62a46bbba20aa5286102016a04214bb446141 ]

    Before this change, if a multithreaded process forks while one of its
    threads is changing a signal handler using sigaction(), the memcpy() in
    copy_sighand() can race with the struct assignment in do_sigaction(). It
    isn't clear whether this can cause corruption of the userspace signal
    handler pointer, but it definitely can cause inconsistency between
    different fields of struct sigaction.

    Take the appropriate spinlock to avoid this.

    I have tested that this patch prevents inconsistency between sa_sigaction
    and sa_flags, which is possible before this patch.
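
    A hedged sketch of the fix (simplified from copy_sighand()):

        spin_lock_irq(&current->sighand->siglock);
        memcpy(sig->action, current->sighand->action, sizeof(sig->action));
        spin_unlock_irq(&current->sighand->siglock);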

    Link: http://lkml.kernel.org/r/20180702145108.73189-1-jannh@google.com
    Signed-off-by: Jann Horn
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Rik van Riel
    Cc: "Peter Zijlstra (Intel)"
    Cc: Kees Cook
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

10 Sep, 2018

8 commits

  • commit 5820f140edef111a9ea2ef414ab2428b8cb805b1 upstream.

    The old code would hold the userns_state_mutex indefinitely if
    memdup_user_nul stalled due to e.g. a userfault region. Prevent that by
    moving the memdup_user_nul in front of the mutex_lock().

    Note: This changes the error precedence of invalid buf/count/*ppos vs
    map already written / capabilities missing.

    Fixes: 22d917d80e84 ("userns: Rework the user_namespace adding uid/gid...")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jann Horn
    Acked-by: Christian Brauner
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • commit 42a0cc3478584d4d63f68f2f5af021ddbea771fa upstream.

    Holding uts_sem as a writer while accessing userspace memory allows a
    namespace admin to stall all processes that attempt to take uts_sem.
    Instead, move data through stack buffers and don't access userspace memory
    while uts_sem is held.
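
    A hedged sketch of the pattern (simplified from the sethostname() path):

        struct new_utsname *u;
        char tmp[__NEW_UTS_LEN];

        if (copy_from_user(tmp, name, len))     /* touch user memory first */
                return -EFAULT;

        down_write(&uts_sem);                   /* only then take uts_sem */
        u = utsname();
        memcpy(u->nodename, tmp, len);
        memset(u->nodename + len, 0, sizeof(u->nodename) - len);
        uts_proc_notify(UTS_PROC_HOSTNAME);
        up_write(&uts_sem);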

    Cc: stable@vger.kernel.org
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jann Horn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • commit 3df6f61fff49632492490fb6e42646b803a9958a upstream.

    Commit ea0212f40c6 (power: auto select CONFIG_SRCU) made the code in
    drivers/base/power/wakeup.c use SRCU instead of RCU, but it forgot to
    select CONFIG_SRCU in Kconfig, which leads to the following build
    error if CONFIG_SRCU is not selected somewhere else:

    drivers/built-in.o: In function `wakeup_source_remove':
    (.text+0x3c6fc): undefined reference to `synchronize_srcu'
    drivers/built-in.o: In function `pm_print_active_wakeup_sources':
    (.text+0x3c7a8): undefined reference to `__srcu_read_lock'
    drivers/built-in.o: In function `pm_print_active_wakeup_sources':
    (.text+0x3c84c): undefined reference to `__srcu_read_unlock'
    drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
    (.text+0x3d1d8): undefined reference to `__srcu_read_lock'
    drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
    (.text+0x3d228): undefined reference to `__srcu_read_unlock'
    drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
    (.text+0x3d24c): undefined reference to `__srcu_read_lock'
    drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
    (.text+0x3d29c): undefined reference to `__srcu_read_unlock'
    drivers/built-in.o:(.data+0x4158): undefined reference to `process_srcu'

    Fix this error by selecting CONFIG_SRCU when PM_SLEEP is enabled.

    Fixes: ea0212f40c6 (power: auto select CONFIG_SRCU)
    Cc: 4.2+ # 4.2+
    Signed-off-by: zhangyi (F)
    [ rjw: Minor subject/changelog fixups ]
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    zhangyi (F)
     
  • commit 016f8ffc48cb01d1e7701649c728c5d2e737d295 upstream.

    While debugging another bug, I was looking at all the synchronize*()
    functions being used in kernel/trace, and noticed that trace_uprobes was
    using synchronize_sched(), with a comment to synchronize with
    {u,ret}_probe_trace_func(). When looking at those functions, the data is
    protected with "rcu_read_lock()" and not with "rcu_read_lock_sched()". This
    is using the wrong synchronize_*() function.
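
    A hedged sketch of the corrected pairing (simplified from the uprobe
    event teardown path):

        list_del_rcu(&link->list);
        /* The handlers run under rcu_read_lock(), so wait for normal RCU
         * readers, not for rcu-sched. */
        synchronize_rcu();
        kfree(link);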

    Link: http://lkml.kernel.org/r/20180809160553.469e1e32@gandalf.local.home

    Cc: stable@vger.kernel.org
    Fixes: 70ed91c6ec7f8 ("tracing/uprobes: Support ftrace_event_file base multibuffer")
    Acked-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit 6e9df95b76cad18f7b217bdad7bb8a26d63b8c47 upstream.

    A livepatch module author can pass a module name/old function name longer
    than the defined character limit. With an obj->name length greater than
    MODULE_NAME_LEN, the livepatch module gets loaded but waits forever for
    the module specified by obj->name to be loaded. It also populates a /sys
    directory with an untruncated object name.

    If funcs->old_name is longer than KSYM_NAME_LEN, it will not match any of
    the symbol table entries; instead we loop through the whole symbol table
    comparing against a nonexistent function, which can be avoided.

    The same issues apply to misspelled/incorrect names. At least gatekeep
    modules whose strings exceed the limits by checking the lengths during
    livepatch module registration.
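
    A hedged sketch of the gatekeeping (simplified from the livepatch init
    paths):

        if (strlen(func->old_name) >= KSYM_NAME_LEN)
                return -EINVAL;

        if (klp_is_module(obj) && strlen(obj->name) >= MODULE_NAME_LEN)
                return -EINVAL;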

    Cc: stable@vger.kernel.org
    Signed-off-by: Kamalesh Babulal
    Acked-by: Josh Poimboeuf
    Signed-off-by: Jiri Kosina
    Signed-off-by: Greg Kroah-Hartman

    Kamalesh Babulal
     
  • commit d1c392c9e2a301f38998a353f467f76414e38725 upstream.

    I hit the following splat in my tests:

    ------------[ cut here ]------------
    IRQs not enabled as expected
    WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
    Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
    CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
    Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
    EIP: tick_nohz_idle_enter+0x44/0x8c
    Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
    75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff 0b 58 fa bb a0
    e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
    EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
    ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
    CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
    Call Trace:
    do_idle+0x33/0x202
    cpu_startup_entry+0x61/0x63
    start_secondary+0x18e/0x1ed
    startup_32_smp+0x164/0x168
    irq event stamp: 18773830
    hardirqs last enabled at (18773829): [] trace_hardirqs_on_thunk+0xc/0x10
    hardirqs last disabled at (18773830): [] trace_hardirqs_off_thunk+0xc/0x10
    softirqs last enabled at (18773824): [] __do_softirq+0x25f/0x2bf
    softirqs last disabled at (18773767): [] call_on_stack+0x45/0x4b
    ---[ end trace b7c64aa79e17954a ]---

    After a bit of debugging, I found what was happening. This would trigger
    when performing "perf" with a high NMI interrupt rate, while enabling and
    disabling function tracer. Ftrace uses breakpoints to convert the nops at
    the start of functions to calls to the function trampolines. The breakpoint
    traps disable interrupts and this makes calls into lockdep via the
    trace_hardirqs_off_thunk in the entry.S code. What happens is the following:

    do_idle {

      [interrupts enabled]

        <interrupt> [interrupts disabled]
          TRACE_IRQS_OFF [lockdep says irqs off]
          [...]
          TRACE_IRQS_IRET
              test if pt_regs say return to interrupts enabled [yes]
              TRACE_IRQS_ON [lockdep says irqs are on]

              <nmi interrupt>
                  nmi_enter() {
                      printk_nmi_enter() [traced by ftrace]
                      [ hit ftrace breakpoint ]

                          <breakpoint exception>
                          TRACE_IRQS_OFF [lockdep says irqs off]
                          [...]
                          TRACE_IRQS_IRET [return from breakpoint]
                              test if pt_regs say interrupts enabled [no]
                  [iret back to interrupt]
        [iret back to code]

    tick_nohz_idle_enter() {

        lockdep_assert_irqs_enabled() [lockdep say no!]

    Although interrupts are indeed enabled, lockdep thinks it is not, and since
    we now do asserts via lockdep, it gives a false warning. The issue here is
    that printk_nmi_enter() is called before lockdep_off(), which disables
    lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
    printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
    confused.

    Cc: stable@vger.kernel.org
    Fixes: 42a0bb3f71383 ("printk/nmi: generic solution for safe printk in NMI")
    Acked-by: Sergey Senozhatsky
    Acked-by: Petr Mladek
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit 757d9140072054528b13bbe291583d9823cde195 upstream.

    Masami Hiramatsu reported:

    The current trace/enable attribute in sysfs returns an error
    if the user writes the same setting value as the current one,
    e.g.

    # cat /sys/block/sda/trace/enable
    0
    # echo 0 > /sys/block/sda/trace/enable
    bash: echo: write error: Invalid argument
    # echo 1 > /sys/block/sda/trace/enable
    # echo 1 > /sys/block/sda/trace/enable
    bash: echo: write error: Device or resource busy

    But this is not the preferred behavior; the write should be
    ignored if the new setting is the same as the current one.
    This fixes the problem as below.

    # cat /sys/block/sda/trace/enable
    0
    # echo 0 > /sys/block/sda/trace/enable
    # echo 1 > /sys/block/sda/trace/enable
    # echo 1 > /sys/block/sda/trace/enable
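
    A hedged sketch of the fix (simplified from the blktrace sysfs store
    handler):

        if (attr == &dev_attr_enable) {
                /* Writing the value that is already set is now a no-op. */
                if (!!value == !!q->blk_trace) {
                        ret = 0;
                        goto out_unlock_bdev;
                }
                if (value)
                        ret = blk_trace_setup_queue(q, bdev);
                else
                        ret = blk_trace_remove_queue(q);
                goto out_unlock_bdev;
        }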

    Link: http://lkml.kernel.org/r/20180816103802.08678002@gandalf.local.home

    Cc: Ingo Molnar
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: stable@vger.kernel.org
    Fixes: cd649b8bb830d ("blktrace: remove sysfs_blk_trace_enable_show/store()")
    Reported-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit f143641bfef9a4a60c57af30de26c63057e7e695 upstream.

    Currently, when one echoes 1 into tracing_on, the current tracer's
    "start()" function is executed, even if tracing_on was already one. This can
    lead to strange side effects. One being that if the hwlat tracer is enabled,
    and someone does "echo 1 > tracing_on" into tracing_on, the hwlat tracer's
    start() function is called again which will recreate another kernel thread,
    and make it unable to remove the old one.
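
    A hedged sketch of the fix (simplified from rb_simple_write()):

        if (!!val == tracer_tracing_is_on(tr)) {
                val = 0;        /* already in the requested state: do nothing */
        } else if (val) {
                tracer_tracing_on(tr);
                if (tr->current_trace->start)
                        tr->current_trace->start(tr);
        } else {
                tracer_tracing_off(tr);
                if (tr->current_trace->stop)
                        tr->current_trace->stop(tr);
        }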

    Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de

    Cc: stable@vger.kernel.org
    Fixes: 2df8f8a6a897e ("tracing: Fix regression with irqsoff tracer and tracing_on file")
    Reported-by: Erica Bugden
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     

05 Sep, 2018

8 commits

  • commit cb9d7fd51d9fbb329d182423bd7b92d0f8cb0e01 upstream.

    Some architectures need to use stop_machine() to patch functions for
    ftrace, and the assumption is that the stopped CPUs do not make function
    calls to traceable functions when they are in the stopped state.

    Commit ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after
    MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
    the stopped CPUs and those functions lack notrace annotations. This
    leads to crashes when enabling/disabling ftrace on ARM kernels built
    with the Thumb-2 instruction set.

    Fix it by adding the necessary notrace annotations.

    Fixes: ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE")
    Signed-off-by: Vincent Whitchurch
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: oleg@redhat.com
    Cc: tj@kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com
    Signed-off-by: Greg Kroah-Hartman

    Vincent Whitchurch
     
  • commit f2a3ab36077222437b4826fc76111caa14562b7c upstream.

    Since the blacklist and list files on debugfs expose sensitive
    address information to the reader, they should be
    restricted to the root user.

    Suggested-by: Thomas Richter
    Suggested-by: Ingo Molnar
    Signed-off-by: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: Arnd Bergmann
    Cc: David Howells
    Cc: David S . Miller
    Cc: Heiko Carstens
    Cc: Jon Medhurst
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tobin C . Harding
    Cc: Will Deacon
    Cc: acme@kernel.org
    Cc: akpm@linux-foundation.org
    Cc: brueckner@linux.vnet.ibm.com
    Cc: linux-arch@vger.kernel.org
    Cc: rostedt@goodmis.org
    Cc: schwidefsky@de.ibm.com
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/lkml/152491890171.9916.5183693615601334087.stgit@devbox
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu
     
  • commit cfd355145c32bb7ccb65fccbe2d67280dc2119e1 upstream.

    When cpu_stop_queue_work() releases the lock for the stopper
    thread that was queued into its wake queue, preemption is
    enabled, which leads to the following deadlock:

    CPU0                                  CPU1
    sched_setaffinity(0, ...)
    __set_cpus_allowed_ptr()
    stop_one_cpu(0, ...)                  stop_two_cpus(0, 1, ...)
    cpu_stop_queue_work(0, ...)           cpu_stop_queue_two_works(0, ..., 1, ...)

    -grabs lock for migration/0-
                                          -spins with preemption disabled,
                                           waiting for migration/0's lock to be
                                           released-

    -adds work items for migration/0
     and queues migration/0 to its
     wake_q-

    -releases lock for migration/0
     and preemption is enabled-

    -current thread is preempted,
     and __set_cpus_allowed_ptr
     has changed the thread's
     cpu allowed mask to CPU1 only-

                                          -acquires migration/0 and migration/1's
                                           locks-

                                          -adds work for migration/0 but does not
                                           add migration/0 to wake_q, since it is
                                           already in a wake_q-

                                          -adds work for migration/1 and adds
                                           migration/1 to its wake_q-

                                          -releases migration/0 and migration/1's
                                           locks, wakes migration/1, and enables
                                           preemption-

                                          -since migration/1 is requested to run,
                                           migration/1 begins to run and waits on
                                           migration/0, but migration/0 will never
                                           be able to run, since the thread that
                                           can wake it is affine to CPU1-

    Disable preemption in cpu_stop_queue_work() before queueing works for
    stopper threads, and queueing the stopper thread in the wake queue, to
    ensure that the operation of queueing the works and waking the stopper
    threads is atomic.
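
    A hedged sketch of the resulting function (simplified from
    cpu_stop_queue_work()):

        static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
        {
                struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
                DEFINE_WAKE_Q(wakeq);
                unsigned long flags;
                bool enabled;

                preempt_disable();      /* cover both queueing and wakeup */
                raw_spin_lock_irqsave(&stopper->lock, flags);
                enabled = stopper->enabled;
                if (enabled)
                        __cpu_stop_queue_work(stopper, work, &wakeq);
                else if (work->done)
                        cpu_stop_signal_done(work->done);
                raw_spin_unlock_irqrestore(&stopper->lock, flags);

                wake_up_q(&wakeq);
                preempt_enable();

                return enabled;
        }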

    Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
    Signed-off-by: Prasad Sodagudi
    Signed-off-by: Isaac J. Manjarres
    Signed-off-by: Thomas Gleixner
    Cc: peterz@infradead.org
    Cc: matt@codeblueprint.co.uk
    Cc: bigeasy@linutronix.de
    Cc: gregkh@linuxfoundation.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1533329766-4856-1-git-send-email-isaacm@codeaurora.org
    Signed-off-by: Greg Kroah-Hartman

    Co-Developed-by: Isaac J. Manjarres

    Prasad Sodagudi
     
  • commit b80a2bfce85e1051056d98d04ecb2d0b55cbbc1c upstream.

    The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by
    lifting the preempt_disable() to the top to create more natural nesting wrt
    the spinlocks and make the wake_up_q() and preempt_enable() unconditional
    at the end.

    Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait
    with preemption enabled.

    Suggested-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: Sebastian Andrzej Siewior
    Cc: isaacm@codeaurora.org
    Cc: matt@codeblueprint.co.uk
    Cc: psodagud@codeaurora.org
    Cc: gregkh@linuxfoundation.org
    Cc: pkondeti@codeaurora.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180730112140.GH2494@hirez.programming.kicks-ass.net
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 03fc7f9c99c1e7ae2925d459e8487f1a6f199f79 upstream.

    The commit 719f6a7040f1bdaf96 ("printk: Use the main logbuf in NMI
    when logbuf_lock is available") brought back the possible deadlocks
    in printk() and NMI.

    The check of logbuf_lock is done only in printk_nmi_enter() to prevent
    mixed output. But another CPU might take the lock later, enter NMI, and:

    + Both NMIs might be serialized by yet another lock, for example,
    the one in nmi_cpu_backtrace().

    + The other CPU might get stopped in NMI, see smp_send_stop()
    in panic().

    The only safe solution is to use trylock when storing the message
    into the main log buffer. It might cause reordering when some lines
    go to the main log buffer directly and others are delayed via
    the per-CPU buffer. It means that it is not useful in general.

    This patch replaces the problematic NMI deferred context with NMI
    direct context. It can be used to mark a code that might produce
    many messages in NMI and the risk of losing them is more critical
    than problems with eventual reordering.

    The context is then used when dumping trace buffers on oops. It was
    the primary motivation for the original fix. Also the reordering is
    even smaller issue there because some traces have their own time stamps.
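
    A hedged sketch of how the direct context is used (simplified from the
    trace dump path):

        printk_nmi_direct_enter();
        /* ... dump the trace buffers with printk(); messages go straight
         *     to the main log buffer via trylock ... */
        printk_nmi_direct_exit();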

    Finally, nmi_cpu_backtrace() no longer needs to be serialized because
    it will always use the per-CPU buffers again.

    Fixes: 719f6a7040f1bdaf96 ("printk: Use the main logbuf in NMI when logbuf_lock is available")
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20180627142028.11259-1-pmladek@suse.com
    To: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Tetsuo Handa
    Cc: Sergey Senozhatsky
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Greg Kroah-Hartman

    Petr Mladek
     
  • commit a338f84dc196f44b63ba0863d2f34fd9b1613572 upstream.

    It is just a preparation step. The patch does not change
    the existing behavior.

    Link: http://lkml.kernel.org/r/20180627140817.27764-3-pmladek@suse.com
    To: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Tetsuo Handa
    Cc: Sergey Senozhatsky
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Greg Kroah-Hartman

    Petr Mladek
     
  • commit ba552399954dde1b388f7749fecad5c349216981 upstream.

    It is just a preparation step. The patch does not change
    the existing behavior.

    Link: http://lkml.kernel.org/r/20180627140817.27764-2-pmladek@suse.com
    To: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Tetsuo Handa
    Cc: Sergey Senozhatsky
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Greg Kroah-Hartman

    Petr Mladek
     
  • [ Upstream commit f3d133ee0a17d5694c6f21873eec9863e11fa423 ]

    The NO_RT_RUNTIME_SHARE feature is used to prevent a CPU running a
    spinning RT task from borrowing runtime from other CPUs.

    However, if the RT_RUNTIME_SHARE feature is enabled and the rt_rq has
    borrowed enough rt_runtime at the beginning, rt_runtime can't be restored
    to its initial bandwidth after we disable RT_RUNTIME_SHARE.

    E.g. on my PC with 4 cores, procedure to reproduce:
    1) Make sure RT_RUNTIME_SHARE is enabled
    cat /sys/kernel/debug/sched_features
    GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY
    CACHE_HOT_BUDDY WAKEUP_PREEMPTION NO_HRTICK NO_DOUBLE_TICK
    LB_BIAS NONTASK_CAPACITY TTWU_QUEUE NO_SIS_AVG_CPU SIS_PROP
    NO_WARN_DOUBLE_CLOCK RT_PUSH_IPI RT_RUNTIME_SHARE NO_LB_MIN
    ATTACH_AGE_LOAD WA_IDLE WA_WEIGHT WA_BIAS
    2) Start a spin-rt-task
    ./loop_rr &
    3) set affinity to the last cpu
    taskset -p 8 $pid_of_loop_rr
    4) Observe that last cpu have borrowed enough runtime.
    cat /proc/sched_debug | grep rt_runtime
    .rt_runtime : 950.000000
    .rt_runtime : 900.000000
    .rt_runtime : 950.000000
    .rt_runtime : 1000.000000
    5) Disable RT_RUNTIME_SHARE
    echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
    6) Observe that rt_runtime can not been restored
    cat /proc/sched_debug | grep rt_runtime
    .rt_runtime : 950.000000
    .rt_runtime : 900.000000
    .rt_runtime : 950.000000
    .rt_runtime : 1000.000000

    This patch helps restore rt_runtime after we disable
    RT_RUNTIME_SHARE.

    Signed-off-by: Hailong Liu
    Signed-off-by: Jiang Biao
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: zhong.weidong@zte.com.cn
    Link: http://lkml.kernel.org/r/1531874815-39357-1-git-send-email-liu.hailong6@zte.com.cn
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Hailong Liu