16 Jul, 2016

1 commit


15 Jul, 2016

3 commits

  • Merge misc fixes from Andrew Morton:
    "20 fixes"

    * emailed patches from Andrew Morton :
    m32r: fix build warning about putc
    mm: workingset: printk missing log level, use pr_info()
    mm: thp: refix false positive BUG in page_move_anon_rmap()
    mm: rmap: call page_check_address() with sync enabled to avoid racy check
    mm: thp: move pmd check inside ptl for freeze_page()
    vmlinux.lds: account for destructor sections
    gcov: add support for gcc version >= 6
    mm, meminit: ensure node is online before checking whether pages are uninitialised
    mm, meminit: always return a valid node from early_pfn_to_nid
    kasan/quarantine: fix bugs on qlist_move_cache()
    uapi: export lirc.h header
    madvise_free, thp: fix madvise_free_huge_pmd return value after splitting
    Revert "scripts/gdb: add documentation example for radix tree"
    Revert "scripts/gdb: add a Radix Tree Parser"
    scripts/gdb: Perform path expansion to lx-symbol's arguments
    scripts/gdb: add constants.py to .gitignore
    scripts/gdb: rebuild constants.py on dependancy change
    scripts/gdb: silence 'nothing to do' message
    kasan: add newline to messages
    mm, compaction: prevent VM_BUG_ON when terminating freeing scanner

    Linus Torvalds
     
  • Pull scheduler fix from Ingo Molnar:
    "Fix a CPU hotplug related corruption of the load average that got
    introduced in this merge window"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/core: Correct off by one bug in load migration calculation

    Linus Torvalds
     
  • Link: http://lkml.kernel.org/r/20160701130914.GA23225@styxhp
    Signed-off-by: Florian Meier
    Reviewed-by: Peter Oberparleiter
    Tested-by: Peter Oberparleiter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Florian Meier
     

14 Jul, 2016

1 commit

  • …t.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull perf and timer fixes from Ingo Molnar:
    "A fix for a posix CPU timers bug, and a perf printk message fix"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86: Fix bogus kernel printk, again

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    posix_cpu_timer: Exit early when process has been reaped

    Linus Torvalds
     

13 Jul, 2016

2 commits

  • The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
    account that the function is now called from a thread running on the outgoing
CPU. As a result, a CPU unplug leaks a load of 1 into the global load
    accounting mechanism.

    Fix it by adjusting for the currently running thread which calls
    calc_load_migrate().

    Reported-by: Anton Blanchard
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Vaidyanathan Srinivasan
    Cc: rt@linutronix.de
    Cc: shreyas@linux.vnet.ibm.com
    Fixes: e9cd8fa4fcfd: ("sched/migration: Move calc_load_migrate() into CPU_DYING")
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607121744350.4083@nanos
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Xiaolong Ye reported lock debug warnings triggered by the following commit:

    8de4a0066106 ("perf/x86: Convert the core to the hotplug state machine")

    The bug is the following: the cpuhp_bp_states[] array is cut short when
    CONFIG_SMP=n, but the dynamically registered callbacks are stored nevertheless
    and happily scribble outside of the array bounds...

    We need to store them in case the state is unregistered, so that we can
    invoke the teardown function. That's independent of CONFIG_SMP. Make sure
    the array is large enough.

    Reported-by: kernel test robot
    Signed-off-by: Thomas Gleixner
    Cc: Adam Borowski
    Cc: Alexander Shishkin
    Cc: Anna-Maria Gleixner
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: lkp@01.org
    Cc: stable@vger.kernel.org
    Cc: tipbuild@zytor.com
    Fixes: cff7d378d3fd "cpu/hotplug: Convert to a state machine for the control processor"
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607122144560.4083@nanos
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

11 Jul, 2016

1 commit

Variable "now" seems to be genuinely used uninitialized if the branch

    if (CPUCLOCK_PERTHREAD(timer->it_clock)) {

    is not taken and the branch

    if (unlikely(sighand == NULL)) {

    is taken. In this case the process has been reaped and the timer is marked as
    disarmed anyway. So none of the postprocessing of the sample is
    required. Return right away.

    Signed-off-by: Alexey Dobriyan
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160707223911.GA26483@p183.telecom.by
    Signed-off-by: Thomas Gleixner

    Alexey Dobriyan
     

09 Jul, 2016

1 commit


07 Jul, 2016

1 commit

  • The following commit:

    66eb579e66ec ("perf: allow for PMU-specific event filtering")

    added the pmu::filter_match() callback. This was intended to
    avoid HW constraints on events from resulting in extremely
    pessimistic scheduling.

    However, pmu::filter_match() is only called for the leader of each event
    group. When the leader is a SW event, we do not filter the groups, and
    may fail at pmu::add() time, and when this happens we'll give up on
    scheduling any event groups later in the list until they are rotated
    ahead of the failing group.

    This can result in extremely sub-optimal event scheduling behaviour,
    e.g. if running the following on a big.LITTLE platform:

    $ taskset -c 0 ./perf stat \
    -e 'a57{context-switches,armv8_cortex_a57/config=0x11/}' \
    -e 'a53{context-switches,armv8_cortex_a53/config=0x11/}' \
    ls

    <not counted> context-switches (0.00%)
    <not counted> armv8_cortex_a57/config=0x11/ (0.00%)
    24 context-switches (37.36%)
    57589154 armv8_cortex_a53/config=0x11/ (37.36%)

    Here the 'a53' event group was always eligible to be scheduled, but
    the 'a57' group never eligible to be scheduled, as the task was always
    affine to a Cortex-A53 CPU. The SW (group leader) event in the 'a57'
    group was eligible, but the HW event failed at pmu::add() time,
    resulting in ctx_flexible_sched_in giving up on scheduling further
    groups with HW events.

    One way of avoiding this is to check pmu::filter_match() on siblings
    as well as the group leader. If any of these fail their
    pmu::filter_match() call, we must skip the entire group before
    attempting to add any events.

    Signed-off-by: Mark Rutland
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Fixes: 66eb579e66ec ("perf: allow for PMU-specific event filtering")
    Link: http://lkml.kernel.org/r/1465917041-15339-1-git-send-email-mark.rutland@arm.com
    [ Small readability edits. ]
    Signed-off-by: Ingo Molnar

    Mark Rutland
     

30 Jun, 2016

3 commits

  • Pull audit fixes from Paul Moore:
    "Two small patches to fix audit problems in 4.7-rcX: the first fixes a
    potential kref leak, the second removes some header file noise.

    The first is an important bug fix that really should go in before 4.7
    is released, the second is not critical, but falls into the very-nice-
    to-have category so I'm including in the pull request.

    Both patches are straightforward, self-contained, and pass our
    testsuite without problem"

    * 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit:
    audit: move audit_get_tty to reduce scope and kabi changes
    audit: move calcs after alloc and check when logging set loginuid

    Linus Torvalds
     
  • Pull networking fixes from David Miller:
    "I've been traveling, so this accumulates more than a week or so of bug
    fixing. It perhaps looks a little worse than it really is.

    1) Fix deadlock in ath10k driver, from Ben Greear.

    2) Increase scan timeout in iwlwifi, from Luca Coelho.

    3) Unbreak STP by properly reinjecting STP packets back into the
    stack. Regression fix from Ido Schimmel.

    4) Mediatek driver fixes (missing malloc failure checks, leaking of
    scratch memory, wrong indexing when mapping TX buffers, etc.) from
    John Crispin.

    5) Fix endianness bug in icmpv6_err() handler, from Hannes Frederic
    Sowa.

    6) Fix hashing of flows in UDP in the reuseport case, from Xuemin Su.

    7) Fix netlink notifications in ovs for tunnels, delete link messages
    are never emitted because of how the device registry state is
    handled. From Nicolas Dichtel.

    8) Conntrack module leaks kmemcache on unload, from Florian Westphal.

    9) Prevent endless jump loops in nft rules, from Liping Zhang and
    Pablo Neira Ayuso.

    10) Not early enough spinlock initialization in mlx4, from Eric
    Dumazet.

    11) Bind refcount leak in act_ipt, from Cong WANG.

    12) Missing RCU locking in HTB scheduler, from Florian Westphal.

    13) Several small MACSEC bug fixes from Sabrina Dubroca (missing RCU
    barrier, using heap for SG and IV, and erroneous use of async flag
    when allocating AEAD context.)

    14) RCU handling fix in TIPC, from Ying Xue.

    15) Pass correct protocol down into ipv4_{update_pmtu,redirect}() in
    SIT driver, from Simon Horman.

    16) Socket timer deadlock fix in TIPC from Jon Paul Maloy.

    17) Fix potential deadlock in team enslave, from Ido Schimmel.

    18) Memory leak in KCM procfs handling, from Jiri Slaby.

    19) ESN generation fix in ipv4 ESP, from Herbert Xu.

    20) Fix GFP_KERNEL allocations with locks held in act_ife, from Cong
    WANG.

    21) Use after free in netem, from Eric Dumazet.

    22) Uninitialized last assert time in multicast router code, from Tom
    Goff.

    23) Skip raw sockets in sock_diag destruction broadcast, from Willem
    de Bruijn.

    24) Fix link status reporting in thunderx, from Sunil Goutham.

    25) Limit resegmentation of retransmit queue so that we do not
    retransmit too large GSO frames. From Eric Dumazet.

    26) Delay bpf program release after grace period, from Daniel
    Borkmann"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (141 commits)
    openvswitch: fix conntrack netlink event delivery
    qed: Protect the doorbell BAR with the write barriers.
    neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
    e1000e: keep VLAN interfaces functional after rxvlan off
    cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header
    qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag()
    bpf, perf: delay release of BPF prog after grace period
    net: bridge: fix vlan stats continue counter
    tcp: do not send too big packets at retransmit time
    ibmvnic: fix to use list_for_each_safe() when delete items
    net: thunderx: Fix TL4 configuration for secondary Qsets
    net: thunderx: Fix link status reporting
    net/mlx5e: Reorganize ethtool statistics
    net/mlx5e: Fix number of PFC counters reported to ethtool
    net/mlx5e: Prevent adding the same vxlan port
    net/mlx5e: Check for BlueFlame capability before allocating SQ uar
    net/mlx5e: Change enum to better reflect usage
    net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices
    net/mlx5: Update command strings
    net: marvell: Add separate config ANEG function for Marvell 88E1111
    ...

    Linus Torvalds
     
  • Pull cgroup fixes from Tejun Heo:
    "Three fix patches. Two are for cgroup / css init failure path. The
    last one makes css_set_lock irq-safe as the deadline scheduler ends up
    calling put_css_set() from irq context"

    * 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: Disable IRQs while holding css_set_lock
    cgroup: set css->id to -1 during init
    cgroup: remove redundant cleanup in css_create

    Linus Torvalds
     

29 Jun, 2016

3 commits

  • Commit dead9f29ddcc ("perf: Fix race in BPF program unregister") moved
    destruction of BPF program from free_event_rcu() callback to __free_event(),
    which is problematic if used with tail calls: if prog A is attached as
    trace event directly, but at the same time present in a tail call map used
    by another trace event program elsewhere, then we need to delay destruction
    via RCU grace period since it can still be in use by the program doing the
    tail call (the prog first needs to be dropped from the tail call map, then
    trace event with prog A attached destroyed, so we get immediate destruction).

    Fixes: dead9f29ddcc ("perf: Fix race in BPF program unregister")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Cc: Jann Horn
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • The only users of audit_get_tty and audit_put_tty are internal to
    audit, so move it out of include/linux/audit.h to kernel.h and create
    a proper function rather than inlining it. This also reduces kABI
    changes.

    Suggested-by: Paul Moore
    Signed-off-by: Richard Guy Briggs
    [PM: line wrapped description]
    Signed-off-by: Paul Moore

    Richard Guy Briggs
     
  • Move the calculations of values after the allocation in case the
    allocation fails. This avoids wasting effort in the rare case that it
    fails, but more importantly saves us extra logic to release the tty
    ref.

    Signed-off-by: Richard Guy Briggs
    Signed-off-by: Paul Moore

    Richard Guy Briggs
     

27 Jun, 2016

2 commits

  • Commit:

    fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")

    did something non-obvious, and also introduced a bug that remained latent.

    The problem was exposed for real by a later commit in the v4.7 merge window:

    2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")

    ... after which tg->load_avg and cfs_rq->load.weight had different
    units (10 bit fixed point and 20 bit fixed point resp.).

    Add a comment to explain the use of cfs_rq->load.weight over the
    'natural' cfs_rq->avg.load_avg and add scale_load_down() to correct
    for the difference in unit.

    Since this is (now, as per a previous commit) the only user of
    calc_tg_weight(), collapse it.

    The effects of this bug should be randomly inconsistent SMP-balancing
    of cgroups workloads.

    Reported-by: Jirka Hladky
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
    Fixes: fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Starting with the following commit:

    fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")

    calc_tg_weight() doesn't compute the right value as expected by effective_load().

    The difference is in the 'correction' term. In order to ensure \Sum
    rw_j >= rw_i we cannot use tg->load_avg directly, since that might be
    lagging a correction on the current cfs_rq->avg.load_avg value.
    Therefore we use tg->load_avg - cfs_rq->tg_load_avg_contrib +
    cfs_rq->avg.load_avg.

    Now, per the referenced commit, calc_tg_weight() doesn't use
    cfs_rq->avg.load_avg, as is later used in @w, but uses
    cfs_rq->load.weight instead.

    So stop using calc_tg_weight() and do it explicitly.

    The effects of this bug are wake_affine() making randomly
    poor choices in cgroup-intense workloads.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: stable@vger.kernel.org # v4.3+
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

25 Jun, 2016

6 commits

  • Pull scheduler fixes from Thomas Gleixner:
    "A couple of scheduler fixes:

    - force watchdog reset while processing sysrq-w

    - fix a deadlock when enabling trace events in the scheduler

    - fixes to the throttled next buddy logic

    - fixes for the average accounting (missing serialization and
    underflow handling)

    - allow kernel threads for fallback to online but not active cpus"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/core: Allow kthreads to fall back to online && !active cpus
    sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
    sched/fair: Initialize throttle_count for new task-groups lazily
    sched/fair: Fix cfs_rq avg tracking underflow
    kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w
    sched/debug: Fix deadlock when enabling sched events
    sched/fair: Fix post_init_entity_util_avg() serialization

    Linus Torvalds
     
  • Pull locking fix from Thomas Gleixner:
    "A single fix to address a race in the static key logic"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/static_key: Fix concurrent static_key_slow_inc()

    Linus Torvalds
     
  • Commit b235beea9e99 ("Clarify naming of thread info/stack allocators")
    breaks the build on some powerpc configs, where THREAD_SIZE < PAGE_SIZE:

    kernel/fork.c:235:2: error: implicit declaration of function 'free_thread_stack'
    kernel/fork.c:355:8: error: assignment from incompatible pointer type
    stack = alloc_thread_stack_node(tsk, node);
    ^

    Fix it by renaming free_stack() to free_thread_stack(), and updating the
    return type of alloc_thread_stack_node().

    Fixes: b235beea9e99 ("Clarify naming of thread info/stack allocators")
    Signed-off-by: Michael Ellerman
    Signed-off-by: Linus Torvalds

    Michael Ellerman
     
  • Merge misc fixes from Andrew Morton:
    "Two weeks worth of fixes here"

    * emailed patches from Andrew Morton : (41 commits)
    init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
    autofs: don't get stuck in a loop if vfs_write() returns an error
    mm/page_owner: avoid null pointer dereference
    tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
    fs/nilfs2: fix potential underflow in call to crc32_le
    oom, suspend: fix oom_reaper vs. oom_killer_disable race
    ocfs2: disable BUG assertions in reading blocks
    mm, compaction: abort free scanner if split fails
    mm: prevent KASAN false positives in kmemleak
    mm/hugetlb: clear compound_mapcount when freeing gigantic pages
    mm/swap.c: flush lru pvecs on compound page arrival
    memcg: css_alloc should return an ERR_PTR value on error
    memcg: mem_cgroup_migrate() may be called with irq disabled
    hugetlb: fix nr_pmds accounting with shared page tables
    Revert "mm: disable fault around on emulated access bit architecture"
    Revert "mm: make faultaround produce old ptes"
    mailmap: add Boris Brezillon's email
    mailmap: add Antoine Tenart's email
    mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
    mm: mempool: kasan: don't poot mempool objects in quarantine
    ...

    Linus Torvalds
     
  • Tetsuo has reported the following potential oom_killer_disable vs.
    oom_reaper race:

    (1) freeze_processes() starts freezing user space threads.
    (2) Somebody (maybe a kernel thread) calls out_of_memory().
    (3) The OOM killer calls mark_oom_victim() on a user space thread
    P1 which is already in __refrigerator().
    (4) oom_killer_disable() sets oom_killer_disabled = true.
    (5) P1 leaves __refrigerator() and enters do_exit().
    (6) The OOM reaper calls exit_oom_victim(P1) before P1 can call
    exit_oom_victim() itself.
    (7) oom_killer_disable() returns while P1 has not yet finished.
    (8) P1 performs IO / interferes with the freezer.

    This situation is unfortunate. We cannot move oom_killer_disable() after
    all the freezable kernel threads are frozen, because the oom victim might
    depend on some of those kthreads to make forward progress and exit, so
    we could deadlock. It is also far from trivial to teach the oom_reaper
    not to call exit_oom_victim(), because then we would lose the forward
    progress guarantee of the OOM killer and oom_killer_disable(), since
    exit_mm->mmput might block and never call exit_oom_victim().

    It seems the easiest way forward is to work around this race by calling
    try_to_freeze_tasks() again after oom_killer_disable(). This makes sure
    that all tasks are frozen, or bails out otherwise.

    Fixes: 449d777d7ad6 ("mm, oom_reaper: clear TIF_MEMDIE for all tasks queued for oom_reaper")
    Link: http://lkml.kernel.org/r/1466597634-16199-1-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reported-by: Tetsuo Handa
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • We've had the thread info allocated together with the thread stack for
    most architectures for a long time (since the thread_info was split off
    from the task struct), but that is about to change.

    But the patches that move the thread info to be off-stack (and a part of
    the task struct instead) made it clear how confused the allocator and
    freeing functions are.

    Because the common case was that we share an allocation with the thread
    stack and the thread_info, the two pointers were identical. That
    identity then meant that we would have things like

    ti = alloc_thread_info_node(tsk, node);
    ...
    tsk->stack = ti;

    which certainly _worked_ (since stack and thread_info have the same
    value), but is rather confusing: why are we assigning a thread_info to
    the stack? And if we move the thread_info away, the "confusing" code
    just gets to be entirely bogus.

    So remove all this confusion, and make it clear that we are doing the
    stack allocation by renaming and clarifying the function names to be
    about the stack. The fact that the thread_info then shares the
    allocation is an implementation detail, and not really about the
    allocation itself.

    This is a pure renaming and type fix: we pass in the same pointer, it's
    just that we clarify what the pointer means.

    The ia64 code that actually only has one single allocation (for all of
    task_struct, thread_info and kernel thread stack) now looks a bit odd,
    but since "tsk->stack" is actually not even used there, that oddity
    doesn't matter. It would be a separate thing to clean that up, I
    intentionally left the ia64 changes as a pure brute-force renaming and
    type change.

    Acked-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

24 Jun, 2016

6 commits

  • During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is
    online but not active. A CPU_ONLINE callback may create or bind a
    kthread so that its cpus_allowed mask only allows the CPU which is
    being brought online. The kthread may start executing before the CPU
    is made active and can end up in select_fallback_rq().

    In such cases, the expected behavior is selecting the CPU which is
    coming online; however, because select_fallback_rq() only chooses from
    active CPUs, it determines that the task doesn't have any viable CPU
    in its allowed mask and ends up overriding it to cpu_possible_mask.

    CPU_ONLINE callbacks should be able to put kthreads on the CPU which
    is coming online. Update select_fallback_rq() so that it follows
    cpu_online() rather than cpu_active() for kthreads.

    Reported-by: Gautham R Shenoy
    Tested-by: Gautham R. Shenoy
    Signed-off-by: Tejun Heo
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Abdul Haleem
    Cc: Aneesh Kumar
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: kernel-team@fb.com
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/20160616193504.GB3262@mtj.duckdns.org
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
    The hierarchy could already be throttled at this point. A throttled next
    buddy could trigger a NULL pointer dereference in pick_next_task_fair().

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Ben Segall
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz
    Signed-off-by: Ingo Molnar

    Konstantin Khlebnikov
     
    A cgroup created inside a throttled group must inherit the current
    throttle_count. A broken throttle_count allows throttled entries to be
    nominated as the next buddy, which later leads to a NULL pointer
    dereference in pick_next_task_fair().

    This patch initializes cfs_rq->throttle_count at the first enqueue:
    laziness allows us to skip locking all runqueues at group creation. The
    lazy approach also allows skipping a full sub-tree scan when throttling
    a hierarchy (not in this patch).

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: bsegall@google.com
    Link: http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz
    Signed-off-by: Ingo Molnar

    Konstantin Khlebnikov
     
  • The following scenario is possible:

    CPU 1                                   CPU 2

    static_key_slow_inc()
      atomic_inc_not_zero()
        -> key.enabled == 0, no increment
      jump_label_lock()
      atomic_inc_return()
        -> key.enabled == 1 now
                                            static_key_slow_inc()
                                              atomic_inc_not_zero()
                                                -> key.enabled == 1, inc to 2
                                              return
                                            ** static key is wrong!
      jump_label_update()
      jump_label_unlock()

    Testing the static key at the point marked by (**) will follow the
    wrong path for jumps that have not been patched yet. This can
    actually happen when creating many KVM virtual machines with userspace
    LAPIC emulation; just run several copies of the following program:

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        for (;;) {
            int kvmfd = open("/dev/kvm", O_RDONLY);
            int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
            close(ioctl(vmfd, KVM_CREATE_VCPU, 1));
            close(vmfd);
            close(kvmfd);
        }
        return 0;
    }

    Every KVM_CREATE_VCPU ioctl will attempt a static_key_slow_inc() call.
    The static key's purpose is to skip NULL pointer checks and indeed one
    of the processes eventually dereferences NULL.

    As explained in the commit that introduced the bug:

    706249c222f6 ("locking/static_keys: Rework update logic")

    jump_label_update() needs key.enabled to be true. The solution adopted
    here is to temporarily make key.enabled == -1, and go down the slow
    path when key.enabled <= 0.

    Signed-off-by: Paolo Bonzini
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: stable@vger.kernel.org # v4.3+
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 706249c222f6 ("locking/static_keys: Rework update logic")
    Link: http://lkml.kernel.org/r/1466527937-69798-1-git-send-email-pbonzini@redhat.com
    [ Small stylistic edits to the changelog and the code. ]
    Signed-off-by: Ingo Molnar

    Paolo Bonzini
     
  • While testing the deadline scheduler + cgroup setup I hit this
    warning.

    [ 132.612935] ------------[ cut here ]------------
    [ 132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80
    [ 132.612952] Modules linked in: (a ton of modules...)
    [ 132.612981] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2 #2
    [ 132.612981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
    [ 132.612982] 0000000000000086 45c8bb5effdd088b ffff88013fd43da0 ffffffff813d229e
    [ 132.612984] 0000000000000000 0000000000000000 ffff88013fd43de0 ffffffff810a652b
    [ 132.612985] 00000096811387b5 0000000000000200 ffff8800bab29d80 ffff880034c54c00
    [ 132.612986] Call Trace:
    [ 132.612987] [] dump_stack+0x63/0x85
    [ 132.612994] [] __warn+0xcb/0xf0
    [ 132.612997] [] ? push_dl_task.part.32+0x170/0x170
    [ 132.612999] [] warn_slowpath_null+0x1d/0x20
    [ 132.613000] [] __local_bh_enable_ip+0x6b/0x80
    [ 132.613008] [] _raw_write_unlock_bh+0x1a/0x20
    [ 132.613010] [] _raw_spin_unlock_bh+0xe/0x10
    [ 132.613015] [] put_css_set+0x5c/0x60
    [ 132.613016] [] cgroup_free+0x7f/0xa0
    [ 132.613017] [] __put_task_struct+0x42/0x140
    [ 132.613018] [] dl_task_timer+0xca/0x250
    [ 132.613027] [] ? push_dl_task.part.32+0x170/0x170
    [ 132.613030] [] __hrtimer_run_queues+0xee/0x270
    [ 132.613031] [] hrtimer_interrupt+0xa8/0x190
    [ 132.613034] [] local_apic_timer_interrupt+0x38/0x60
    [ 132.613035] [] smp_apic_timer_interrupt+0x3d/0x50
    [ 132.613037] [] apic_timer_interrupt+0x8c/0xa0
    [ 132.613038] [] ? native_safe_halt+0x6/0x10
    [ 132.613043] [] default_idle+0x1e/0xd0
    [ 132.613044] [] arch_cpu_idle+0xf/0x20
    [ 132.613046] [] default_idle_call+0x2a/0x40
    [ 132.613047] [] cpu_startup_entry+0x2e7/0x340
    [ 132.613048] [] start_secondary+0x155/0x190
    [ 132.613049] ---[ end trace f91934d162ce9977 ]---

    The warning is the spin_(lock|unlock)_bh(&css_set_lock) in interrupt
    context. Convert the spin_lock_bh to spin_lock_irq(save) to avoid
    this problem - and other problems of sharing a spinlock with an
    interrupt.

    Cc: Tejun Heo
    Cc: Li Zefan
    Cc: Johannes Weiner
    Cc: Juri Lelli
    Cc: Steven Rostedt
    Cc: cgroups@vger.kernel.org
    Cc: stable@vger.kernel.org # 4.5+
    Cc: linux-kernel@vger.kernel.org
    Reviewed-by: Rik van Riel
    Reviewed-by: "Luis Claudio R. Goncalves"
    Signed-off-by: Daniel Bristot de Oliveira
    Acked-by: Zefan Li
    Signed-off-by: Tejun Heo

    Daniel Bristot de Oliveira
     
  • None of the code actually wants a thread_info, it all wants a
    task_struct, and it's just converting back and forth between the two
    ("ti->task" to get the task_struct from the thread_info, and
    "task_thread_info(task)" to go the other way).

    No semantic change.

    Acked-by: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

21 Jun, 2016

1 commit

  • Pull tracing fixes from Steven Rostedt:
    "Two fixes for the tracing system:

    - When trace_printk() is used with a non constant format descriptor,
    it adds a NULL pointer into the trace format section, and the code
    isn't prepared to deal with it. This bug appeared by a change that
    was added in v3.5.

    - The ftracetest (selftests section) can't handle testing histograms
    when histograms are not configured. Currently it shows that they
    fail the test, when they should state that they are unsupported.
    This bug was added in the 4.7 merge window with the addition of the
    histogram code"

    * tag 'trace-v4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ftracetest: Fix hist unsupported result in hist selftests
    tracing: Handle NULL formats in hold_module_trace_bprintk_format()

    Linus Torvalds
     

20 Jun, 2016

2 commits

  • If a task uses a non constant string for the format parameter in
    trace_printk(), then the trace_printk_fmt variable is set to NULL. This
    variable is then saved in the __trace_printk_fmt section.

    The function hold_module_trace_bprintk_format() checks whether duplicate
    formats are used by modules and reuses them if so (saving new ones to the
    list). But this function calls lookup_format(), which does a strcmp()
    against the value (which is now NULL) and can cause a kernel oops.

    This wasn't an issue until 3debb0a9ddb ("tracing: Fix trace_printk() to print
    when not using bprintk()"), which added "__used" to the trace_printk_fmt
    variable; before that, the kernel simply optimized it out (no NULL value
    was saved).

    The fix is simply to handle the NULL pointer in lookup_format() and have the
    caller ignore the value if it was NULL.

    Link: http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.xing@intel.com

    Reported-by: xingzhen
    Acked-by: Namhyung Kim
    Fixes: 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()")
    Cc: stable@vger.kernel.org # v3.5+
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
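The fix described above can be illustrated with a small userspace sketch. The names below (lookup_format(), hold_format(), the fixed-size list) are simplified stand-ins for the kernel code, not the actual implementation; the point is only the NULL check before strcmp():

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's saved-format list. */
static const char *saved_fmts[16];
static int nr_saved;

/* NULL-safe lookup: a NULL fmt (from a non-constant trace_printk()
 * format) must never reach strcmp(), so bail out early. */
static const char *lookup_format(const char *fmt)
{
    if (!fmt)               /* the fix: tolerate NULL entries */
        return NULL;
    for (int i = 0; i < nr_saved; i++)
        if (strcmp(saved_fmts[i], fmt) == 0)
            return saved_fmts[i];
    return NULL;
}

/* Caller mirrors hold_module_trace_bprintk_format(): reuse a
 * duplicate if found, save new non-NULL formats, ignore NULL. */
static const char *hold_format(const char *fmt)
{
    const char *found = lookup_format(fmt);

    if (found)
        return found;
    if (fmt && nr_saved < 16)
        saved_fmts[nr_saved++] = fmt;
    return fmt;
}
```

Before the fix, the equivalent of lookup_format(NULL) would pass NULL straight into strcmp() and oops.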
     
  • As per commit:

    b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization")

    > the code generated from update_cfs_rq_load_avg():
    >
    > if (atomic_long_read(&cfs_rq->removed_load_avg)) {
    > s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
    > sa->load_avg = max_t(long, sa->load_avg - r, 0);
    > sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
    > removed_load = 1;
    > }
    >
    > turns into:
    >
    > ffffffff81087064: 49 8b 85 98 00 00 00 mov 0x98(%r13),%rax
    > ffffffff8108706b: 48 85 c0 test %rax,%rax
    > ffffffff8108706e: 74 40 je ffffffff810870b0
    > ffffffff81087070: 4c 89 f8 mov %r15,%rax
    > ffffffff81087073: 49 87 85 98 00 00 00 xchg %rax,0x98(%r13)
    > ffffffff8108707a: 49 29 45 70 sub %rax,0x70(%r13)
    > ffffffff8108707e: 4c 89 f9 mov %r15,%rcx
    > ffffffff81087081: bb 01 00 00 00 mov $0x1,%ebx
    > ffffffff81087086: 49 83 7d 70 00 cmpq $0x0,0x70(%r13)
    > ffffffff8108708b: 49 0f 49 4d 70 cmovns 0x70(%r13),%rcx
    >
    > Which you'll note ends up with sa->load_avg -= r in memory at
    > ffffffff8108707a.

    So I _should_ have looked at other unserialized users of ->load_avg,
    but alas. Luckily nikbor reported a similar divide-by-zero from
    task_h_load(), which instantly triggered recollection of this problem.

    Aside from the intermediate value hitting memory and causing problems,
    there's another problem: the underflow detection relies on the sign
    bit. This reduces the effective width of the variables; IOW, it's
    effectively the same as having these variables be of signed type.

    This patch switches to a different means of unsigned underflow
    detection that does not rely on the sign bit. This allows the
    variables to use the 'full' unsigned range. And it does so with an
    explicit LOAD - STORE to ensure any intermediate value will never be
    visible in memory, allowing these unserialized loads.

    Note: GCC generates crap code for this, might warrant a look later.

    Note2: I say 'full' above, if we end up at U*_MAX we'll still explode;
    maybe we should do clamping on add too.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrey Ryabinin
    Cc: Chris Wilson
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Yuyang Du
    Cc: bsegall@google.com
    Cc: kernel@kyup.com
    Cc: morten.rasmussen@arm.com
    Cc: pjt@google.com
    Cc: steve.muckle@linaro.org
    Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
    Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
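The scheme described above boils down to one explicit load, a subtract clamped at zero, and one explicit store. A minimal userspace sketch of the idea follows; the kernel's actual helper differs in detail and uses READ_ONCE()/WRITE_ONCE() for the load and store, which this plain-C version elides:

```c
/* Clamped subtract on an unsigned counter. Underflow is detected by
 * comparing the result against the starting value (unsigned wrap
 * makes the result larger) instead of relying on a sign bit, and
 * only the final value is published, so no intermediate (possibly
 * underflowed) value ever hits memory. */
static void sub_positive(unsigned long *ptr, unsigned long val)
{
    unsigned long var = *ptr;       /* single explicit LOAD  */
    unsigned long res = var - val;

    if (res > var)                  /* wrapped => underflow  */
        res = 0;

    *ptr = res;                     /* single explicit STORE */
}
```

Because the check is `res > var` rather than `(long)res < 0`, the counter keeps its full unsigned range, which is exactly the point made in the message above.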
     

17 Jun, 2016

2 commits

  • If percpu_ref initialization fails during css_create(), the free path
    can end up trying to free css->id of zero. As ID 0 is unused, it
    doesn't cause a critical breakage but it does trigger a warning
    message. Fix it by setting css->id to -1 from init_and_link_css().

    Signed-off-by: Tejun Heo
    Cc: Wenwei Tao
    Fixes: 01e586598b22 ("cgroup: release css->id after css_free")
    Cc: stable@vger.kernel.org # v4.0+
    Signed-off-by: Tejun Heo

    Tejun Heo
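The pattern behind this fix is to initialize the ID to an invalid sentinel so the error path can tell "never allocated" from "ID 0". A hedged sketch of that pattern (the struct, field, and function names are illustrative, not the cgroup code itself):

```c
#define INVALID_ID (-1)

struct obj {
    int id;
};

/* Mirror of init_and_link_css(): start from a sentinel so a
 * failure before ID allocation leaves id < 0, not 0. */
static void obj_init(struct obj *o)
{
    o->id = INVALID_ID;
}

/* Mirror of the free path: only release an ID that was really
 * allocated; returns 1 if an ID was released, 0 otherwise. */
static int obj_release_id(struct obj *o)
{
    if (o->id < 0)
        return 0;       /* nothing allocated, nothing to free */
    /* ... release o->id here ... */
    o->id = INVALID_ID;
    return 1;
}
```

With a zero-initialized struct instead, the free path would try to release ID 0 after an early failure, which is the warning the patch silences.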
     
  • With commit e9d867a67fd03ccc ("sched: Allow per-cpu kernel threads to
    run on online && !active"), __set_cpus_allowed_ptr() expects that only
    strict per-cpu kernel threads can have affinity to an online CPU which
    is not yet active.

    This assumption is currently broken in the CPU_ONLINE notification
    handler for the workqueues, where restore_unbound_workers_cpumask()
    calls set_cpus_allowed_ptr() when the first CPU in the unbound
    worker's pool->attr->cpumask comes online. Since
    set_cpus_allowed_ptr() is called with a pool->attr->cpumask in which
    the only online CPU is not yet active, we get the following
    WARN_ON during a CPU online operation.

    ------------[ cut here ]------------
    WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166
    __set_cpus_allowed_ptr+0x228/0x2e0
    Modules linked in:
    CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4

    Call Trace:
    [c000000f273ff920] [c00000000010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 (unreliable)
    [c000000f273ffac0] [c0000000000ed4b0] workqueue_cpu_up_callback+0x2c0/0x470
    [c000000f273ffb70] [c0000000000f5c58] notifier_call_chain+0x98/0x100
    [c000000f273ffbc0] [c0000000000c5ed0] __cpu_notify+0x70/0xe0
    [c000000f273ffc00] [c0000000000c6028] notify_online+0x38/0x50
    [c000000f273ffc30] [c0000000000c5214] cpuhp_invoke_callback+0x84/0x250
    [c000000f273ffc90] [c0000000000c562c] cpuhp_up_callbacks+0x5c/0x120
    [c000000f273ffce0] [c0000000000c64d4] cpuhp_thread_fun+0x184/0x1c0
    [c000000f273ffd20] [c0000000000fa050] smpboot_thread_fn+0x290/0x2a0
    [c000000f273ffd80] [c0000000000f45b0] kthread+0x110/0x130
    [c000000f273ffe30] [c000000000009570] ret_from_kernel_thread+0x5c/0x6c
    ---[ end trace 00f1456578b2a3b2 ]---

    This patch fixes this by limiting the mask to the intersection of
    the pool affinity and online CPUs.

    Changelog-cribbed-from: Gautham R. Shenoy
    Reported-by: Abdul Haleem
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Tejun Heo

    Peter Zijlstra
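The fix amounts to intersecting the pool's configured affinity mask with the set of online CPUs before calling set_cpus_allowed_ptr(); in the kernel this is a cpumask_and() on the pool's mask. A toy plain-C bitmask sketch of the same idea (the type and function names here are illustrative):

```c
typedef unsigned long cpumask_t;   /* one bit per CPU, toy version */

/* Intersect the unbound pool's configured affinity with the CPUs
 * that are actually online, as restore_unbound_workers_cpumask()
 * does before handing the mask to set_cpus_allowed_ptr(). */
static cpumask_t restrict_to_online(cpumask_t pool_mask,
                                    cpumask_t online_mask)
{
    return pool_mask & online_mask;
}
```

Passing the raw pool mask through would still include online-but-not-active CPUs, which is exactly what trips the WARN_ON in __set_cpus_allowed_ptr() above.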
     

16 Jun, 2016

2 commits

  • Similar to bpf_perf_event_output(), the bpf_perf_event_read() helper
    needs to check the type of the perf_event before reading the counter.

    Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper")
    Reported-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • The ctx structure passed into bpf programs differs depending on the bpf
    program type. The verifier incorrectly marked ctx->data and ctx->data_end
    accesses based on ctx offset only. That caused loads in tracing programs,
    e.g.
    int bpf_prog(struct pt_regs *ctx) { .. ctx->ax .. }
    to be incorrectly marked as PTR_TO_PACKET, which later caused the verifier
    to reject a program that was actually valid in a tracing context.
    Fix this by doing program-type-specific matching of ctx offsets.

    Fixes: 969bf05eb3ce ("bpf: direct packet access")
    Reported-by: Sasha Goldshtein
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
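The fix can be pictured as making the data/data_end recognition depend on program type, not on ctx offset alone: only program types whose context actually carries packet pointers may mark those offsets PTR_TO_PACKET. A simplified illustration follows; the enum values, offsets, and function are illustrative stand-ins, not the verifier's real code:

```c
#include <stdbool.h>

enum prog_type { PROG_SOCKET_FILTER, PROG_KPROBE };

/* Hypothetical offsets of data/data_end in a packet-carrying ctx. */
#define CTX_DATA_OFF      76
#define CTX_DATA_END_OFF  80

/* Before the fix, the verifier matched on offset alone, so a kprobe
 * program loading pt_regs fields at the same offsets was wrongly
 * typed PTR_TO_PACKET. Matching on program type as well fixes it. */
static bool is_packet_ptr_load(enum prog_type type, int off)
{
    if (type != PROG_SOCKET_FILTER)
        return false;
    return off == CTX_DATA_OFF || off == CTX_DATA_END_OFF;
}
```

Under this check, the `ctx->ax` load from the tracing example above is no longer misclassified, so the program verifies in its own context.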
     

15 Jun, 2016

1 commit

  • Since commit 49d200deaa68 ("debugfs: prevent access to removed files'
    private data"), a debugfs file's file_operations methods get proxied
    through lifetime aware wrappers.

    However, only a certain subset of the file_operations members is supported
    by debugfs and ->mmap isn't among them -- it appears to be NULL from the
    VFS layer's perspective.

    This behaviour breaks the /sys/kernel/debug/kcov file introduced
    concurrently with commit 5c9a8750a640 ("kernel: add kcov code coverage").

    Since that file never gets removed, there is no file removal race and thus,
    a lifetime checking proxy isn't needed.

    Avoid the proxying for /sys/kernel/debug/kcov by creating it via
    debugfs_create_file_unsafe() rather than debugfs_create_file().

    Fixes: 49d200deaa68 ("debugfs: prevent access to removed files' private data")
    Fixes: 5c9a8750a640 ("kernel: add kcov code coverage")
    Reported-by: Sasha Levin
    Signed-off-by: Nicolai Stange
    Signed-off-by: Greg Kroah-Hartman

    Nicolai Stange
     

14 Jun, 2016

2 commits

  • Lengthy output of sysrq-w may take a lot of time on a slow serial
    console.

    Currently we reset the NMI watchdog on the current CPU to avoid spurious
    lockup messages. Sometimes this doesn't work, since the softlockup
    watchdog might trigger on another CPU which is waiting for an IPI to
    proceed. We reset softlockup watchdogs on all CPUs, but we do this only
    after listing all tasks, and this may be too late on a busy system.

    So, reset the watchdogs on all CPUs earlier, in the
    for_each_process_thread() loop.

    Signed-off-by: Andrey Ryabinin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc:
    Link: http://lkml.kernel.org/r/1465474805-14641-1-git-send-email-aryabinin@virtuozzo.com
    Signed-off-by: Ingo Molnar

    Andrey Ryabinin
     
  • I see a hang when enabling sched events:

    echo 1 > /sys/kernel/debug/tracing/events/sched/enable

    The printk buffer shows:

    BUG: spinlock recursion on CPU#1, swapper/1/0
    lock: 0xffff88007d5d8c00, .magic: dead4ead, .owner: swapper/1/0, .owner_cpu: 1
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc2+ #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
    ...
    Call Trace:
    [] dump_stack+0x85/0xc2
    [] spin_dump+0x78/0xc0
    [] do_raw_spin_lock+0x11a/0x150
    [] _raw_spin_lock+0x61/0x80
    [] ? try_to_wake_up+0x256/0x4e0
    [] try_to_wake_up+0x256/0x4e0
    [] ? _raw_spin_unlock_irqrestore+0x4a/0x80
    [] wake_up_process+0x15/0x20
    [] insert_work+0x84/0xc0
    [] __queue_work+0x18f/0x660
    [] queue_work_on+0x46/0x90
    [] drm_fb_helper_dirty.isra.11+0xcb/0xe0 [drm_kms_helper]
    [] drm_fb_helper_sys_imageblit+0x30/0x40 [drm_kms_helper]
    [] soft_cursor+0x1ad/0x230
    [] bit_cursor+0x649/0x680
    [] ? update_attr.isra.2+0x90/0x90
    [] fbcon_cursor+0x14a/0x1c0
    [] hide_cursor+0x28/0x90
    [] vt_console_print+0x3bf/0x3f0
    [] call_console_drivers.constprop.24+0x183/0x200
    [] console_unlock+0x3d4/0x610
    [] vprintk_emit+0x3c5/0x610
    [] vprintk_default+0x29/0x40
    [] printk+0x57/0x73
    [] enqueue_entity+0xc2e/0xc70
    [] enqueue_task_fair+0x59/0xab0
    [] ? kvm_sched_clock_read+0x9/0x20
    [] ? sched_clock+0x9/0x10
    [] activate_task+0x5c/0xa0
    [] ttwu_do_activate+0x54/0xb0
    [] sched_ttwu_pending+0x7a/0xb0
    [] scheduler_ipi+0x61/0x170
    [] smp_trace_reschedule_interrupt+0x4f/0x2a0
    [] trace_reschedule_interrupt+0x96/0xa0
    [] ? native_safe_halt+0x6/0x10
    [] ? trace_hardirqs_on+0xd/0x10
    [] default_idle+0x20/0x1a0
    [] arch_cpu_idle+0xf/0x20
    [] default_idle_call+0x2f/0x50
    [] cpu_startup_entry+0x37e/0x450
    [] start_secondary+0x160/0x1a0

    Note the hang only occurs when echoing the above from a physical serial
    console, not from an ssh session.

    The bug is caused by a deadlock where the task is trying to grab the rq
    lock twice, because printk() calls aren't safe in scheduler code.

    Signed-off-by: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Mel Gorman
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
    Link: http://lkml.kernel.org/r/20160613073209.gdvdybiruljbkn3p@treble
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf