13 Jul, 2017
3 commits
-
…d963889dbbe2b0aa3ae11
Conflicts:
drivers/mxc/gpu-viv/hal/os/linux/kernel/platform/freescale/gc_hal_kernel_platform_imx6q14.config -
Enable vndbinder device else mediacodecservice crash with below log.
2471 I /vendor/bin/hw/android.hardware.media.omx@1.0-service: mediacodecservice starting
2471 W ProcessState: Opening '/dev/vndbinder' failed: No such file or directory
2471 F ProcessState: Binder driver could not be opened. Terminating.
2471 F libc : Fatal signal 6 (SIGABRT), code -6 in tid 2471 (android.hardwar)Change-Id: I2563fb664bbd8f3b130a817568fd4708d9f76f6f
Signed-off-by: Richard Liu -
…d963889dbbe2b0aa3ae11
Conflicts:
drivers/cpufreq/cpufreq_interactive.c
drivers/dma/imx-sdma.c
drivers/mmc/core/core.c
drivers/mmc/core/host.c
drivers/staging/android/Kconfig
09 Jun, 2017
1 commit
-
When the tick is stopped and we reach the dynticks evaluation code on
IRQ exit, we perform a soft tick restart if we observe an expired timer
from there. It means we program the nearest possible tick but we stay in
dynticks mode (ts->tick_stopped = 1) because we may need to stop the tick
again after that expired timer is handled.Now this solution works most of the time but if we suffer an IRQ storm
and those interrupts trigger faster than the hardware clockevents min
delay, our tick won't fire until that IRQ storm is finished.Here is the problem: on IRQ exit we reprog the timer to at least
NOW() + min_clockevents_delay. Another IRQ fires before the tick so we
reschedule again to NOW() + min_clockevents_delay, etc... The tick
is eternally rescheduled min_clockevents_delay ahead.A solution is to simply remove this soft tick restart. After all
the normal dynticks evaluation path can handle 0 delay just fine. And
by doing that we benefit from the optimization branch which avoids
clock reprogramming if the clockevents deadline hasn't changed since
the last reprog. This fixes our issue because we don't do repetitive
clock reprog that always add hardware min delay.As a side effect it should even optimize the 0 delay path in general.
Reported-and-tested-by: Octavian Purdila
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Rik van Riel
Cc: Ingo Molnar
Signed-off-by: Frederic Weisbecker
22 Mar, 2017
14 commits
-
am: c406096522
Change-Id: I2ac74fa5f03a3dff0b45e91d8be4e80c9c70da98
-
am: 1522181f4b
Change-Id: I74d73e5550ccb3a8d846f636b4f7940b124d261c
-
am: 6244ffc5a1
Change-Id: I973ed47c736e8deeebcea67e885abfb557fc1469
-
am: 0e0f1d6fdb
Change-Id: I3990cdd8bc095423723df1771ec87158dc8cb63a
-
am: 1889d6d9b5
Change-Id: I528df00bc5cd00aec10f56ad53907f6ca8044973
-
am: b7f5aa1ca0
Change-Id: I1d61f752ac33e41ebb748eba7d3f48003ee29c19
-
am: 1411707acb
Change-Id: I02e316379d5a19258ea601e75b9f4cece6450acd
-
commit 17fcbd590d0c3e35bd9646e2215f86586378bc42 upstream.
We hang if SIGKILL has been sent, but the task is stuck in down_read()
(after do_exit()), even though no task is doing down_write() on the
rwsem in question:INFO: task libupnp:21868 blocked for more than 120 seconds.
libupnp D 0 21868 1 0x08100008
...
Call Trace:
__schedule()
schedule()
__down_read()
do_exit()
do_group_exit()
__wake_up_parent()This bug has already been fixed for CONFIG_RWSEM_XCHGADD_ALGORITHM=y in
the following commit:04cafed7fc19 ("locking/rwsem: Fix down_write_killable()")
... however, this bug also exists for CONFIG_RWSEM_GENERIC_SPINLOCK=y.
Signed-off-by: Niklas Cassel
Signed-off-by: Peter Zijlstra (Intel)
Cc:
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Niklas Cassel
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Fixes: d47996082f52 ("locking/rwsem: Introduce basis for down_write_killable()")
Link: http://lkml.kernel.org/r/1487981873-12649-1-git-send-email-niklass@axis.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 9bbb25afeb182502ca4f2c4f3f88af0681b34cae upstream.
Thomas spotted that fixup_pi_state_owner() can return errors and we
fail to unlock the rt_mutex in that case.Reported-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Darren Hart
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.867401760@infradead.org
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman -
commit c236c8e95a3d395b0494e7108f0d41cf36ec107c upstream.
While working on the futex code, I stumbled over this potential
use-after-free scenario. Dmitry triggered it later with syzkaller.pi_mutex is a pointer into pi_state, which we drop the reference on in
unqueue_me_pi(). So any access to that pointer after that is bad.Since other sites already do rt_mutex_unlock() with hb->lock held, see
for example futex_lock_pi(), simply move the unlock before
unqueue_me_pi().Reported-by: Dmitry Vyukov
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Darren Hart
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.801744246@infradead.org
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 6760bf2ddde8ad64f8205a651223a93de3a35494 ]
Martin reported a verifier issue that hit the BUG_ON() for his
test case in the mark_reg_unknown_value() function:[ 202.861380] kernel BUG at kernel/bpf/verifier.c:467!
[...]
[ 203.291109] Call Trace:
[ 203.296501] [] mark_map_reg+0x45/0x50
[ 203.308225] [] mark_map_regs+0x78/0x90
[ 203.320140] [] do_check+0x226d/0x2c90
[ 203.331865] [] bpf_check+0x48b/0x780
[ 203.343403] [] bpf_prog_load+0x27e/0x440
[ 203.355705] [] ? handle_mm_fault+0x11af/0x1230
[ 203.369158] [] ? security_capable+0x48/0x60
[ 203.382035] [] SyS_bpf+0x124/0x960
[ 203.393185] [] ? __do_page_fault+0x276/0x490
[ 203.406258] [] entry_SYSCALL_64_fastpath+0x13/0x94This issue got uncovered after the fix in a08dd0da5307 ("bpf: fix
regression on verifier pruning wrt map lookups"). The reason why it
wasn't noticed before was, because as mentioned in a08dd0da5307,
mark_map_regs() was doing the id matching incorrectly based on the
uncached regs[regno].id. So, in the first loop, we walked all regs
and as soon as we found regno == i, then this reg's id was cleared
when calling mark_reg_unknown_value() thus that every subsequent
register was probed against id of 0 (which, in combination with the
PTR_TO_MAP_VALUE_OR_NULL type is an invalid condition that no other
register state can hold), and therefore wasn't type transitioned such
as in the spilled register case for the second loop.Now since that got fixed, it turned out that 57a09bf0a416 ("bpf:
Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") used
mark_reg_unknown_value() incorrectly for the spilled regs, and thus
hitting the BUG_ON() in some cases due to regno >= MAX_BPF_REG.Although spilled regs have the same type as the non-spilled regs
for the verifier state, that is, struct bpf_reg_state, they are
semantically different from the non-spilled regs. In other words,
there can be up to 64 (MAX_BPF_STACK / BPF_REG_SIZE) spilled regs
in the stack, for example, register R could have been spilled by
the program to stack location X, Y, Z, and in mark_map_regs() we
need to scan these stack slots of type STACK_SPILL for potential
registers that we have to transition from PTR_TO_MAP_VALUE_OR_NULL.
Therefore, depending on the location, the spilled_regs regno can
be a lot higher than just MAX_BPF_REG's value since we operate on
stack instead. The reset in mark_reg_unknown_value() itself is
just fine, only that the BUG_ON() was inappropriate for this. Fix
it by making a __mark_reg_unknown_value() version that can be
called from mark_map_reg() generically; we know for the non-spilled
case that the regno is always < MAX_BPF_REG anyway.Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Reported-by: Martin KaFai Lau
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit a08dd0da5307ba01295c8383923e51e7997c3576 ]
Commit 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL
registers") introduced a regression where existing programs stopped
loading due to reaching the verifier's maximum complexity limit,
whereas prior to this commit they were loading just fine; the affected
program has roughly 2k instructions.What was found is that state pruning couldn't be performed effectively
anymore due to mismatches of the verifier's register state, in particular
in the id tracking. It doesn't mean that 57a09bf0a416 is incorrect per
se, but rather that verifier needs to perform a lot more work for the
same program with regards to involved map lookups.Since commit 57a09bf0a416 is only about tracking registers with type
PTR_TO_MAP_VALUE_OR_NULL, the id is only needed to follow registers
until they are promoted through pattern matching with a NULL check to
either PTR_TO_MAP_VALUE or UNKNOWN_VALUE type. After that point, the
id becomes irrelevant for the transitioned types.For UNKNOWN_VALUE, id is already reset to 0 via mark_reg_unknown_value(),
but not so for PTR_TO_MAP_VALUE where id is becoming stale. It's even
transferred further into other types that don't make use of it. Among
others, one example is where UNKNOWN_VALUE is set on function call
return with RET_INTEGER return type.states_equal() will then fall through the memcmp() on register state;
note that the second memcmp() uses offsetofend(), so the id is part of
that since d2a4dd37f6b4 ("bpf: fix state equivalence"). But the bisect
pointed already to 57a09bf0a416, where we really reach beyond complexity
limit. What I found was that states_equal() often failed in this
case due to id mismatches in spilled regs with registers in type
PTR_TO_MAP_VALUE. Unlike non-spilled regs, spilled regs just perform
a memcmp() on their reg state and don't have any other optimizations
in place, therefore also id was relevant in this case for making a
pruning decision.We can safely reset id to 0 as well when converting to PTR_TO_MAP_VALUE.
For the affected program, it resulted in a ~17 fold reduction of
complexity and let the program load fine again. Selftest suite also
runs fine. The only other place where env->id_gen is used currently is
through direct packet access, but for these cases id is long living, thus
a different scenario.Also, the current logic in mark_map_regs() is not fully correct when
marking NULL branch with UNKNOWN_VALUE. We need to cache the destination
reg's id in any case. Otherwise, once we marked that reg as UNKNOWN_VALUE,
it's id is reset and any subsequent registers that hold the original id
and are of type PTR_TO_MAP_VALUE_OR_NULL won't be marked UNKNOWN_VALUE
anymore, since mark_map_reg() reuses the uncached regs[regno].id that
was just overridden. Note, we don't need to cache it outside of
mark_map_regs(), since it's called once on this_branch and the other
time on other_branch, which are both two independent verifier states.
A test case for this is added here, too.Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Signed-off-by: Daniel Borkmann
Acked-by: Thomas Graf
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit d2a4dd37f6b41fbcad76efbf63124eb3126c66fe ]
Commmits 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
and 484611357c19 ("bpf: allow access into map value arrays") by themselves
are correct, but in combination they make state equivalence ignore 'id' field
of the register state which can lead to accepting invalid program.Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Fixes: 484611357c19 ("bpf: allow access into map value arrays")
Signed-off-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 57a09bf0a416700676e77102c28f9cfcb48267e0 ]
A BPF program is required to check the return register of a
map_elem_lookup() call before accessing memory. The verifier keeps
track of this by converting the type of the result register from
PTR_TO_MAP_VALUE_OR_NULL to PTR_TO_MAP_VALUE after a conditional
jump ensures safety. This check is currently exclusively performed
for the result register 0.In the event the compiler reorders instructions, BPF_MOV64_REG
instructions may be moved before the conditional jump which causes
them to keep their type PTR_TO_MAP_VALUE_OR_NULL to which the
verifier objects when the register is accessed:0: (b7) r1 = 10
1: (7b) *(u64 *)(r10 -8) = r1
2: (bf) r2 = r10
3: (07) r2 += -8
4: (18) r1 = 0x59c00000
6: (85) call 1
7: (bf) r4 = r0
8: (15) if r0 == 0x0 goto pc+1
R0=map_value(ks=8,vs=8) R4=map_value_or_null(ks=8,vs=8) R10=fp
9: (7a) *(u64 *)(r4 +0) = 0
R4 invalid mem access 'map_value_or_null'This commit extends the verifier to keep track of all identical
PTR_TO_MAP_VALUE_OR_NULL registers after a map_elem_lookup() by
assigning them an ID and then marking them all when the conditional
jump is observed.Signed-off-by: Thomas Graf
Reviewed-by: Josef Bacik
Acked-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
18 Mar, 2017
3 commits
-
am: ee6f7ee1e4
Change-Id: I222bf26458f918b2481e0c14ce2b0bf5fd3c85d8
-
commit 040757f738e13caaa9c5078bca79aa97e11dde88 upstream.
Always increment/decrement ucount->count under the ucounts_lock. The
increments are there already and moving the decrements there means the
locking logic of the code is simpler. This simplification in the
locking logic fixes a race between put_ucounts and get_ucounts that
could result in a use-after-free because the count could go zero then
be found by get_ucounts and then be freed by put_ucounts.A bug presumably this one was found by a combination of syzkaller and
KASAN. JongWhan Kim reported the syzkaller failure and Dmitry Vyukov
spotted the race in the code.Fixes: f6b2db1a3e8d ("userns: Make the count of user namespaces per user")
Reported-by: JongHwan Kim
Reported-by: Dmitry Vyukov
Reviewed-by: Andrei Vagin
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Greg Kroah-Hartman -
EAS uses "const struct sched_group_energy * const" fairly consistently.
But a couple of places swap the "*" and second "const", making the
pointer mutable.In the case of struct sched_group, "* const" would have been an error,
since init_sched_energy() writes to sd->groups->sge.Change-Id: Ic6a8fcf99e65c0f25d9cc55c32625ef3ca5c9aca
Signed-off-by: Greg Hackmann
15 Mar, 2017
2 commits
-
am: d3381fab77
Change-Id: I5f0019d2c86afd0a055f823f6751db9f6378e38f
-
commit 93faccbbfa958a9668d3ab4e30f38dd205cee8d8 upstream.
To support unprivileged users mounting filesystems two permission
checks have to be performed: a test to see if the user allowed to
create a mount in the mount namespace, and a test to see if
the user is allowed to access the specified filesystem.The automount case is special in that mounting the original filesystem
grants permission to mount the sub-filesystems, to any user who
happens to stumble across the their mountpoint and satisfies the
ordinary filesystem permission checks.Attempting to handle the automount case by using override_creds
almost works. It preserves the idea that permission to mount
the original filesystem is permission to mount the sub-filesystem.
Unfortunately using override_creds messes up the filesystems
ordinary permission checks.Solve this by being explicit that a mount is a submount by introducing
vfs_submount, and using it where appropriate.vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let
sget and friends know that a mount is a submount so they can take appropriate
action.sget and sget_userns are modified to not perform any permission checks
on submounts.follow_automount is modified to stop using override_creds as that
has proven problemantic.do_mount is modified to always remove the new MS_SUBMOUNT flag so
that we know userspace will never by able to specify it.autofs4 is modified to stop using current_real_cred that was put in
there to handle the previous version of submount permission checking.cifs is modified to pass the mountpoint all of the way down to vfs_submount.
debugfs is modified to pass the mountpoint all of the way down to
trace_automount by adding a new parameter. To make this change easier
a new typedef debugfs_automount_t is introduced to capture the type of
the debugfs automount function.Fixes: 069d5ac9ae0d ("autofs: Fix automounts by using current_real_cred()->uid")
Fixes: aeaa4a79ff6a ("fs: Call d_automount with the filesystems creds")
Reviewed-by: Trond Myklebust
Reviewed-by: Seth Forshee
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Greg Kroah-Hartman
14 Mar, 2017
1 commit
-
commit 26c2154816c7 ("ANDROID: sched: Introduce Window Assisted Load
Tracking (WALT)") which was a forward port of WALT from android 4.4
missed locking for 2 set_task_cpu() calls, re-added here.Signed-off-by: Andres Oportus
12 Mar, 2017
6 commits
-
am: 3de5a92847
Change-Id: Ib7d8806d40c785e0128af2d846bda76727aa1255
-
am: 6d94a6b32e
Change-Id: I972940b00813db773e278ce8b6bc46f3cba15988
-
am: f1faaec484
Change-Id: I7e37ea31b7376f63e9ff38b5d40020e793a64c1a
-
commit 907565337ebf998a68cb5c5b2174ce5e5da065eb upstream.
Userspace applications should be allowed to expect the membarrier system
call with MEMBARRIER_CMD_SHARED command to issue memory barriers on
nohz_full CPUs, but synchronize_sched() does not take those into
account.Given that we do not want unrelated processes to be able to affect
real-time sensitive nohz_full CPUs, simply return ENOSYS when membarrier
is invoked on a kernel with enabled nohz_full CPUs.Signed-off-by: Mathieu Desnoyers
CC: Josh Triplett
CC: Steven Rostedt
Signed-off-by: Paul E. McKenney
Cc: Frederic Weisbecker
Cc: Chris Metcalf
Cc: Rik van Riel
Acked-by: Lai Jiangshan
Reviewed-by: Josh Triplett
Signed-off-by: Greg Kroah-Hartman -
commit 441398d378f29a5ad6d0fcda07918e54e4961800 upstream.
Currently SS_AUTODISARM is not supported in compatibility mode, but does
not return -EINVAL either. This makes dosemu built with -m32 on x86_64
to crash. Also the kernel's sigaltstack selftest fails if compiled with
-m32.This patch adds the needed support.
Link: http://lkml.kernel.org/r/20170205101213.8163-2-stsp@list.ru
Signed-off-by: Stas Sergeev
Cc: Milosz Tanski
Cc: Andy Lutomirski
Cc: Al Viro
Cc: Arnd Bergmann
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: Nicolas Pitre
Cc: Waiman Long
Cc: Dave Hansen
Cc: Dmitry Safonov
Cc: Wang Xiaoqiang
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman -
commit b5d24fda9c3dce51fcb4eee459550a458eaaf1e2 upstream.
The mem_hotplug_{begin,done} lock coordinates with {get,put}_online_mems()
to hold off "readers" of the current state of memory from new hotplug
actions. mem_hotplug_begin() expects exclusive access, via the
device_hotplug lock, to set mem_hotplug.active_writer. Calling
mem_hotplug_begin() without locking device_hotplug can lead to
corrupting mem_hotplug.refcount and missed wakeups / soft lockups.[dan.j.williams@intel.com: v2]
Link: http://lkml.kernel.org/r/148728203365.38457.17804568297887708345.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/148693885680.16345.17802627926777862337.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: f931ab479dd2 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")
Signed-off-by: Dan Williams
Reported-by: Ben Hutchings
Cc: Michal Hocko
Cc: Toshi Kani
Cc: Vlastimil Babka
Cc: Logan Gunthorpe
Cc: Masayoshi Mizuma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
24 Feb, 2017
4 commits
-
commit f222449c9dfad7c9bb8cb53e64c5c407b172ebbc upstream.
We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock[ 48.951721] Possible unsafe locking scenario:
[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[ 48.951934] [] (show_stack) from [] (dump_stack+0xac/0xe0)
[ 48.951934] [] (dump_stack) from [] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [] (print_circular_bug) from [] (validate_chain+0xf50/0x1324)
[ 48.951965] [] (validate_chain) from [] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [] (__lock_acquire) from [] (lock_acquire+0xe0/0x294)
[ 48.951995] [] (lock_acquire) from [] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [] (ktime_get_update_offsets_now) from [] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [] (retrigger_next_event) from [] (on_each_cpu+0x40/0x7c)
[ 48.952056] [] (on_each_cpu) from [] (clock_was_set_work+0x14/0x20)
[ 48.952056] [] (clock_was_set_work) from [] (process_one_work+0x2b4/0x808)
[ 48.952087] [] (process_one_work) from [] (worker_thread+0x3c/0x550)
[ 48.952087] [] (worker_thread) from [] (kthread+0x104/0x148)
[ 48.952087] [] (kthread) from [] (ret_from_fork+0x14/0x24)Replace printk() with printk_deferred(), which does not call into
the scheduler.Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren
Signed-off-by: Sergey Senozhatsky
Cc: Petr Mladek
Cc: Sergey Senozhatsky
Cc: Peter Zijlstra
Cc: "Rafael J . Wysocki"
Cc: Steven Rostedt
Cc: John Stultz
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman -
commit fc98c3c8c9dcafd67adcce69e6ce3191d5306c9c upstream.
Use rcuidle console tracepoint because, apparently, it may be issued
from an idle CPU:hw-breakpoint: Failed to enable monitor mode on CPU 0.
hw-breakpoint: CPU 0 failed to disable vector catch===============================
[ ERR: suspicious RCU usage. ]
4.10.0-rc8-next-20170215+ #119 Not tainted
-------------------------------
./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!other info that might help us debug this:
RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 0
RCU used illegally from extended quiescent state!
2 locks held by swapper/0/0:
#0: (cpu_pm_notifier_lock){......}, at: [] cpu_pm_exit+0x10/0x54
#1: (console_lock){+.+.+.}, at: [] vprintk_emit+0x264/0x474stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119
Hardware name: Generic OMAP4 (Flattened Device Tree)
console_unlock
vprintk_emit
vprintk_default
printk
reset_ctrl_regs
dbg_cpu_pm_notify
notifier_call_chain
cpu_pm_exit
omap_enter_idle_coupled
cpuidle_enter_state
cpuidle_enter_state_coupled
do_idle
cpu_startup_entry
start_kernelThis RCU warning, however, is suppressed by lockdep_off() in printk().
lockdep_off() increments the ->lockdep_recursion counter and thus
disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want
lockdep to be enabled "current->lockdep_recursion == 0".Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky
Reported-by: Tony Lindgren
Tested-by: Tony Lindgren
Acked-by: Paul E. McKenney
Acked-by: Steven Rostedt (VMware)
Cc: Petr Mladek
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Tony Lindgren
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman -
commit 25f71d1c3e98ef0e52371746220d66458eac75bc upstream.
The UEVENT user mode helper is enabled before the initcalls are executed
and is available when the root filesystem has been mounted.The user mode helper is triggered by device init calls and the executable
might use the futex syscall.futex_init() is marked __initcall which maps to device_initcall, but there
is no guarantee that futex_init() is invoked _before_ the first device init
call which triggers the UEVENT user mode helper.If the user mode helper uses the futex syscall before futex_init() then the
syscall crashes with a NULL pointer dereference because the futex subsystem
has not been initialized yet.Move futex_init() to core_initcall so futexes are initialized before the
root filesystem is mounted and the usermode helper becomes available.[ tglx: Rewrote changelog ]
Signed-off-by: Yang Yang
Cc: jiang.biao2@zte.com.cn
Cc: jiang.zhengxiong@zte.com.cn
Cc: zhong.weidong@zte.com.cn
Cc: deng.huali@zte.com.cn
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
23 Feb, 2017
2 commits
-
Move the x86_64 idle notifiers originally by Andi Kleen and Venkatesh
Pallipadi to generic.Change-Id: Idf29cda15be151f494ff245933c12462643388d5
Acked-by: Nicolas Pitre
Signed-off-by: Todd Poynor -
These macros can be reused by governors which don't use the common
governor code present in cpufreq_governor.c and should be moved to the
relevant header.Now that they are getting moved to the right header file, reuse them in
schedutil governor as well (that required rename of show/store
routines).Also create gov_attr_wo() macro for write-only sysfs files, this will be
used by Interactive governor in a later patch.Signed-off-by: Viresh Kumar
17 Feb, 2017
4 commits
-
Modify the schedutil cpufreq governor to boost the CPU
frequency if the SCHED_CPUFREQ_IOWAIT flag is passed to
it via cpufreq_update_util().If that happens, the frequency is set to the maximum during
the first update after receiving the SCHED_CPUFREQ_IOWAIT flag
and then the boost is reduced by half during each following update.Signed-off-by: Rafael J. Wysocki
Looks-good-to: Steve Muckle
Acked-by: Peter Zijlstra (Intel)
(cherry picked from commit 21ca6d2c52f8ca8638129c1dfc489d0b0ae68532) -
PELT does not consider SMT when scaling its utilization values via
arch_scale_cpu_capacity(). The value in rq->cpu_capacity_orig does
take SMT into consideration though and therefore may be smaller than
the utilization reported by PELT.On an Intel i7-3630QM for example rq->cpu_capacity_orig is 589 but
util_avg scales up to 1024. This means that a 50% utilized CPU will show
up in schedutil as ~86% busy.Fix this by using the same CPU scaling value in schedutil as that which
is used by PELT.Signed-off-by: Steve Muckle
Signed-off-by: Rafael J. Wysocki
(cherry picked from commit 8314bc83f6a33958a033955e9bdc48e8dd4d5fb0) -
It is useful to know the reason why cpufreq_update_util() has just
been called and that can be passed as flags to cpufreq_update_util()
and to the ->func() callback in struct update_util_data. However,
doing that in addition to passing the util and max arguments they
already take would be clumsy, so avoid it.Instead, use the observation that the schedutil governor is part
of the scheduler proper, so it can access scheduler data directly.
This allows the util and max arguments of cpufreq_update_util()
and the ->func() callback in struct update_util_data to be replaced
with a flags one, but schedutil has to be modified to follow.Thus make the schedutil governor obtain the CFS utilization
information from the scheduler and use the "RT" and "DL" flags
instead of the special utilization value of ULONG_MAX to track
updates from the RT and DL sched classes. Make it non-modular
too to avoid having to export scheduler variables to modules at
large.Next, update all of the other users of cpufreq_update_util()
and the ->func() callback in struct update_util_data accordingly.Suggested-by: Peter Zijlstra
Signed-off-by: Rafael J. Wysocki
Acked-by: Peter Zijlstra (Intel)
Acked-by: Viresh Kumar
(cherry picked from commit 58919e83c85c3a3c5fb34025dc0e95ddd998c478) -
The slow-path frequency transition path is relatively expensive as it
requires waking up a thread to do work. Should support be added for
remote CPU cpufreq updates that is also expensive since it requires an
IPI. These activities should be avoided if they are not necessary.To that end, calculate the actual driver-supported frequency required by
the new utilization value in schedutil by using the recently added
cpufreq_driver_resolve_freq API. If it is the same as the previously
requested driver frequency then there is no need to continue with the
update assuming the cpu frequency limits have not changed. This will
have additional benefits should the semantics of the rate limit be
changed to apply solely to frequency transitions rather than to
frequency calculations in schedutil.The last raw required frequency is cached. This allows the driver
frequency lookup to be skipped in the event that the new raw required
frequency matches the last one, assuming a frequency update has not been
forced due to limits changing (indicated by a next_freq value of
UINT_MAX, see sugov_should_update_freq).Signed-off-by: Steve Muckle
Reviewed-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki
(cherry picked from commit 5cbea46984d67f614c74c4401b54b9d681861e80)