15 Jan, 2016

1 commit


15 Dec, 2015

1 commit

  • [ Upstream commit fbca9d2d35c6ef1b323fae75cc9545005ba25097 ]

    During my own review, but also reported by Dmitry's syzkaller [1], it has
    been noticed that we trigger a heap out-of-bounds access on eBPF array
    maps when updating elements. This happens with any map whose
    map->value_size (specified at map creation time) is not a multiple of 8
    bytes.

    In array_map_alloc(), elem_size is round_up(attr->value_size, 8) and
    used to align array map slots for faster access. However, in function
    array_map_update_elem(), we update the element as ...

    memcpy(array->value + array->elem_size * index, value, array->elem_size);

    ... where we access 'value' out-of-bounds, since it was allocated on the
    map_update_elem() syscall side as kmalloc(map->value_size, GFP_USER)
    and later filled via copy_from_user(value, uvalue, map->value_size).
    Thus, we can read up to 7 bytes out-of-bounds.

    The same could happen from within an eBPF program, where in the worst
    case we access beyond an eBPF program's designated stack.

    Since 1be7f75d1668 ("bpf: enable non-root eBPF programs") hasn't hit an
    official release yet, this only affects privileged users.

    In case of array_map_lookup_elem(), the verifier prevents eBPF programs
    from accessing beyond map->value_size through check_map_access(). Also
    from syscall side map_lookup_elem() only copies map->value_size back to
    user, so nothing could leak.
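
    The upstream fix caps the copy length at the user-visible value size
    rather than the padded slot size. A minimal sketch of the idea,
    simplified from kernel/bpf/arraymap.c (locking and flag checks omitted):

    static int array_map_update_elem(struct bpf_map *map, void *key,
                                     void *value, u64 map_flags)
    {
        struct bpf_array *array = container_of(map, struct bpf_array, map);
        u32 index = *(u32 *)key;

        if (index >= array->map.max_entries)
            return -E2BIG;

        /* Slots stay rounded up to 8 bytes for alignment, but only
         * map->value_size bytes may be read from 'value'; copying
         * array->elem_size bytes was the out-of-bounds access. */
        memcpy(array->value + array->elem_size * index, value,
               map->value_size);
        return 0;
    }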

    [1] http://github.com/google/syzkaller

    Fixes: 28fbcfa08d8e ("bpf: add array type of eBPF maps")
    Reported-by: Dmitry Vyukov
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     

10 Nov, 2015

1 commit

  • commit 275d7d44d802ef271a42dc87ac091a495ba72fc5 upstream.

    Poma (on the way to another bug) reported an assertion triggering:

    [] module_assert_mutex_or_preempt+0x49/0x90
    [] __module_address+0x32/0x150
    [] __module_text_address+0x16/0x70
    [] symbol_put_addr+0x29/0x40
    [] dvb_frontend_detach+0x7d/0x90 [dvb_core]

    Laura Abbott produced a patch which led us to
    inspect symbol_put_addr(). This function has a comment claiming it
    doesn't need to disable preemption around the module lookup
    because it holds a reference to the module it wants to find, which
    therefore cannot go away.

    This is wrong (and a false optimization too, preempt_disable() is really
    rather cheap, and I doubt any of this is on uber critical paths,
    otherwise it would've retained a pointer to the actual module anyway and
    avoided the second lookup).

    While it's true that the module cannot go away while we hold a reference
    to it, the data structure we do the lookup in very much _CAN_ change
    while we do the lookup. Therefore fix the comment and add the
    required preempt_disable().
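
    A sketch of the fixed function, simplified from kernel/module.c: the
    reference pins the module itself, but only disabling preemption pins
    the data structure the lookup walks:

    void symbol_put_addr(void *addr)
    {
        struct module *modaddr;
        unsigned long a = (unsigned long)dereference_function_descriptor(addr);

        if (core_kernel_text(a))
            return;

        /* Holding a reference keeps the module alive, but not the
         * module list/tree that __module_text_address() traverses. */
        preempt_disable();
        modaddr = __module_text_address(a);
        BUG_ON(!modaddr);
        module_put(modaddr);
        preempt_enable();
    }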

    Reported-by: poma
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell
    Fixes: a6e6abd575fc ("module: remove module_text_address()")
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     

27 Oct, 2015

2 commits

  • commit fe32d3cd5e8eb0f82e459763374aa80797023403 upstream.

    These functions check should_resched() before unlocking a spinlock or
    re-enabling bottom halves; at that point preempt_count is always
    non-zero, so should_resched() always returns false. cond_resched_lock()
    worked only if spin_needbreak was set.

    This patch adds argument "preempt_offset" to should_resched().

    preempt_count offset constants for that:

    PREEMPT_DISABLE_OFFSET - offset after preempt_disable()
    PREEMPT_LOCK_OFFSET - offset after spin_lock()
    SOFTIRQ_DISABLE_OFFSET - offset after local_bh_disable()
    SOFTIRQ_LOCK_OFFSET - offset after spin_lock_bh()
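
    A sketch of how the offset is used, simplified from the generic preempt
    helpers: the caller passes the preempt_count it is known to hold, so
    should_resched() no longer unconditionally fails under a held lock:

    static __always_inline bool should_resched(int preempt_offset)
    {
        /* Reschedulable only when preempt_count() matches exactly
         * what the calling context holds. */
        return unlikely(preempt_count() == preempt_offset &&
                        tif_need_resched());
    }

    /* e.g. cond_resched_lock() holds exactly one spin_lock: */
    int __cond_resched_lock(spinlock_t *lock)
    {
        if (spin_needbreak(lock) || should_resched(PREEMPT_LOCK_OFFSET)) {
            spin_unlock(lock);
            preempt_schedule_common();
            spin_lock(lock);
            return 1;
        }
        return 0;
    }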

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Graf
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: bdb438065890 ("sched: Extract the basic add/sub preempt_count modifiers")
    Link: http://lkml.kernel.org/r/20150715095204.12246.98268.stgit@buzz
    Signed-off-by: Ingo Molnar
    Signed-off-by: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
     
  • commit 874bbfe600a660cba9c776b3957b1ce393151b76 upstream.

    My system keeps crashing with the message below. vmstat_update()
    schedules a delayed work on the current CPU and expects the work to run
    on that CPU. schedule_delayed_work() is expected to make delayed work
    run on the local CPU. The problem is that the timer can be migrated with
    NO_HZ. __queue_work() queues the work in the timer handler, which could
    run on a different CPU than the one where the delayed work was
    scheduled. The end result is that the delayed work runs on a different
    CPU. This patch makes __queue_delayed_work() record the local CPU
    earlier; with the change, where the timer runs no longer changes where
    the work runs.

    [ 28.010131] ------------[ cut here ]------------
    [ 28.010609] kernel BUG at ../mm/vmstat.c:1392!
    [ 28.011099] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
    [ 28.011860] Modules linked in:
    [ 28.012245] CPU: 0 PID: 289 Comm: kworker/0:3 Tainted: G        W 4.3.0-rc3+ #634
    [ 28.013065] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
    [ 28.014160] Workqueue: events vmstat_update
    [ 28.014571] task: ffff880117682580 ti: ffff8800ba428000 task.ti: ffff8800ba428000
    [ 28.015445] RIP: 0010:[] [] vmstat_update+0x31/0x80
    [ 28.016282] RSP: 0018:ffff8800ba42fd80 EFLAGS: 00010297
    [ 28.016812] RAX: 0000000000000000 RBX: ffff88011a858dc0 RCX: 0000000000000000
    [ 28.017585] RDX: ffff880117682580 RSI: ffffffff81f14d8c RDI: ffffffff81f4df8d
    [ 28.018366] RBP: ffff8800ba42fd90 R08: 0000000000000001 R09: 0000000000000000
    [ 28.019169] R10: 0000000000000000 R11: 0000000000000121 R12: ffff8800baa9f640
    [ 28.019947] R13: ffff88011a81e340 R14: ffff88011a823700 R15: 0000000000000000
    [ 28.020071] FS:  0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
    [ 28.020071] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 28.020071] CR2: 00007ff6144b01d0 CR3: 00000000b8e93000 CR4: 00000000000006f0
    [ 28.020071] Stack:
    [ 28.020071]  ffff88011a858dc0 ffff8800baa9f640 ffff8800ba42fe00 ffffffff8106bd88
    [ 28.020071]  ffffffff8106bd0b 0000000000000096 0000000000000000 ffffffff82f9b1e8
    [ 28.020071]  ffffffff829f0b10 0000000000000000 ffffffff81f18460 ffff88011a81e340
    [ 28.020071] Call Trace:
    [ 28.020071] [] process_one_work+0x1c8/0x540
    [ 28.020071] [] ? process_one_work+0x14b/0x540
    [ 28.020071] [] worker_thread+0x114/0x460
    [ 28.020071] [] ? process_one_work+0x540/0x540
    [ 28.020071] [] kthread+0xf8/0x110
    [ 28.020071] [] ? kthread_create_on_node+0x200/0x200
    [ 28.020071] [] ret_from_fork+0x3f/0x70
    [ 28.020071] [] ? kthread_create_on_node+0x200/0x200
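
    A sketch of the change in __queue_delayed_work(), simplified from
    kernel/workqueue.c: resolve WORK_CPU_UNBOUND to the submitting CPU
    before arming the timer, so it no longer matters where the timer
    handler itself runs:

    static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
                                     struct delayed_work *dwork,
                                     unsigned long delay)
    {
        struct timer_list *timer = &dwork->timer;

        dwork->wq = wq;
        /* The timer isn't guaranteed to fire on this CPU under NO_HZ,
         * so record the local CPU now rather than in the handler. */
        if (cpu == WORK_CPU_UNBOUND)
            cpu = raw_smp_processor_id();
        dwork->cpu = cpu;
        timer->expires = jiffies + delay;
        add_timer_on(timer, cpu);
    }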

    Signed-off-by: Shaohua Li
    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Shaohua Li
     

23 Oct, 2015

6 commits

  • commit 95c2b17534654829db428f11bcf4297c059a2a7e upstream.

    Per-IRQ directories in procfs are created only when a handler is first
    added to the irqdesc, not when the irqdesc is created. In the case of
    a shared IRQ, multiple tasks can race to create a directory. This
    race condition seems to have been present forever, but is easier to
    hit with async probing.

    Signed-off-by: Ben Hutchings
    Link: http://lkml.kernel.org/r/1443266636.2004.2.camel@decadent.org.uk
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Ben Hutchings
     
  • commit 54d27365cae88fbcc853b391dcd561e71acb81fa upstream.

    The optimized task selection logic optimistically selects a new task
    to run without first doing a full put_prev_task(). This is so that we
    can avoid a put/set on the common ancestors of the old and new task.

    Similarly, we should only call check_cfs_rq_runtime() to throttle
    eligible groups if they're part of the common ancestry, otherwise it
    is possible to end up with no eligible task in the simple task
    selection.

    Imagine:

                /root
        /prev           /next
        /A              /B

    If our optimistic selection ends up throttling /next, we goto simple
    and our put_prev_task() ends up throttling /prev, after which we're
    going to bug out in set_next_entity() because there aren't any tasks
    left.

    Avoid this scenario by only throttling common ancestors.

    Reported-by: Mohammed Naser
    Reported-by: Konstantin Khlebnikov
    Signed-off-by: Ben Segall
    [ munged Changelog ]
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Thomas Gleixner
    Cc: pjt@google.com
    Fixes: 678d5718d8d0 ("sched/fair: Optimize cgroup pick_next_task_fair()")
    Link: http://lkml.kernel.org/r/xm26wq1oswoq.fsf@sword-of-the-dawn.mtv.corp.google.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Ben Segall
     
  • commit 95913d97914f44db2b81271c2e2ebd4d2ac2df83 upstream.

    So the problem this patch is trying to address is as follows:

    CPU0                            CPU1

    context_switch(A, B)
                                    ttwu(A)
                                      LOCK A->pi_lock
                                      A->on_cpu == 0
    finish_task_switch(A)
      prev_state = A->state  <-.
      WMB                      |
      A->on_cpu = 0;           |
      UNLOCK rq0->lock         |
                               |    context_switch(C, A)
                               `--  A->state = TASK_DEAD
                                      prev_state == TASK_DEAD
                                        put_task_struct(A)
                            context_switch(A, C)
                            finish_task_switch(A)
                              A->state == TASK_DEAD
                                put_task_struct(A)

    The argument being that the WMB will allow the load of A->state on CPU0
    to cross over and observe CPU1's store of A->state, which will then
    result in a double-drop and use-after-free.

    Now the comment states (and this was true once upon a long time ago)
    that we need to observe A->state while holding rq->lock because that
    will order us against the wakeup; however the wakeup will not in fact
    acquire (that) rq->lock; it takes A->pi_lock these days.

    We can obviously fix this by upgrading the WMB to an MB, but that is
    expensive, so we'd rather avoid that.

    The alternative this patch takes is: smp_store_release(&A->on_cpu, 0),
    which avoids the MB on some archs, but not important ones like ARM.
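
    A sketch of the resulting ordering in finish_task_switch(), simplified:

    /* The state load must not be reordered past publishing on_cpu == 0,
     * because once the wakeup side observes on_cpu == 0 it may rewrite
     * prev->state (e.g. to TASK_DEAD). */
    prev_state = prev->state;
    smp_store_release(&prev->on_cpu, 0);
    if (prev_state == TASK_DEAD)
        put_task_struct(prev);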

    Reported-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Cc: manfred@colorfullife.com
    Cc: will.deacon@arm.com
    Fixes: e4a52bcb9a18 ("sched: Remove rq->lock from the first half of ttwu()")
    Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 00cc1633816de8c95f337608a1ea64e228faf771 upstream.

    Commit 2ee507c47293 ("sched: Add function single_task_running to let a task
    check if it is the only task running on a cpu") referenced the current
    runqueue via smp_processor_id(). When CONFIG_DEBUG_PREEMPT is enabled,
    that is only allowed if preemption is disabled or the current task is
    bound to the local cpu (e.g. kernel worker).

    With commit f78195129963 ("kvm: add halt_poll_ns module parameter") KVM
    calls single_task_running. If CONFIG_DEBUG_PREEMPT is enabled that
    generates a lot of kernel messages.

    To avoid adding preemption disabling in those cases, as it would limit
    the usefulness, we change single_task_running() to directly access the
    CPU-local runqueue.
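
    A sketch of the fixed helper (kernel/sched/core.c): the raw, CPU-local
    access silences the debug check; if we migrate right after the read,
    the answer was stale anyway:

    bool single_task_running(void)
    {
        return raw_rq()->nr_running == 1;
    }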

    Cc: Tim Chen
    Suggested-by: Peter Zijlstra
    Acked-by: Peter Zijlstra (Intel)
    Fixes: 2ee507c47293 ("sched: Add function single_task_running to let a task check if it is the only task running on a cpu")
    Signed-off-by: Dominik Dingel
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Dominik Dingel
     
  • commit 57ffc5ca679f499f4704fd9b6a372916f59930ee upstream.

    It's currently possible to drop the last refcount to the aux buffer
    from NMI context, which results in the expected fireworks.

    The refcounting needs a bigger overhaul, but to cure the immediate
    problem, delay the freeing by using an irq_work.

    Reviewed-and-tested-by: Alexander Shishkin
    Reported-by: Vince Weaver
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20150618103249.GK19282@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 2619d7e9c92d524cb155ec89fd72875321512e5b upstream.

    The internal clocksteering done for fine-grained error
    correction uses a logarithmic approximation, so any time
    adjtimex() adjusts the clock steering, timekeeping_freqadjust()
    quickly approximates the correct clock frequency over a series
    of ticks.

    Unfortunately, the logic in timekeeping_freqadjust(), introduced
    in commit:

    dc491596f639 ("timekeeping: Rework frequency adjustments to work better w/ nohz")

    used the abs() function with a s64 error value to calculate the
    size of the approximated adjustment to be made.

    Per include/linux/kernel.h:

    "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".

    Thus on 32-bit platforms, this resulted in the clock steering taking a
    quite dampened random walk trying to converge on the proper frequency,
    which caused the adjustments to be made much slower than intended (most
    easily observed when large adjustments are made).

    This patch fixes the issue by using abs64() instead.
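
    For reference, a userspace demo of this failure class (an illustration,
    not kernel code): a 32-bit int abs() truncates an s64 argument before
    taking the absolute value:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    int main(void)
    {
        int64_t err = -4294967299LL;    /* needs more than 32 bits */

        /* abs() takes an int: the high bits are silently dropped. */
        printf("abs((int)err) = %d\n", abs((int)err));        /* 3 */
        /* llabs() (the userspace analogue of abs64()) is correct. */
        printf("llabs(err) = %lld\n", (long long)llabs(err));
        return 0;
    }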

    Reported-by: Nuno Gonçalves
    Tested-by: Nuno Goncalves
    Signed-off-by: John Stultz
    Cc: Linus Torvalds
    Cc: Miroslav Lichvar
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    John Stultz
     

30 Sep, 2015

1 commit

  • commit 12c641ab8270f787dfcce08b5f20ce8b65008096 upstream.

    The logic in the initial commit of unshare made creating a new thread
    group for a process contingent upon creating a new memory address space
    for that process. That is wrong. Two separate processes in different
    thread groups can share a memory address space, and clone allows the
    creation of such processes.

    This is significant because it was observed that mm_users > 1 does not
    mean that a process is multi-threaded, as reading /proc/PID/maps
    temporarily increments mm_users, which allows other processes to
    (accidentally) interfere with unshare() calls.

    Correct the checks in check_unshare_flags(): test !thread_group_empty()
    for CLONE_THREAD, CLONE_SIGHAND, and CLONE_VM; test sighand->count > 1
    for CLONE_SIGHAND and CLONE_VM; and test !current_is_single_threaded()
    instead of mm_users > 1 for CLONE_VM.

    By using the correct checks in unshare this removes the possibility of
    an accidental denial of service attack.

    Additionally, using the correct checks in unshare ensures that only an
    explicit unshare(CLONE_VM) can possibly trigger the slow path of
    current_is_single_threaded(). As an explicit unshare(CLONE_VM) is
    pointless, it is not expected that many applications make that call.
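
    A sketch of the corrected checks, simplified from kernel/fork.c
    (flag-validity checks elided):

    static int check_unshare_flags(unsigned long unshare_flags)
    {
        if (unshare_flags & (CLONE_THREAD | CLONE_SIGHAND | CLONE_VM)) {
            if (!thread_group_empty(current))
                return -EINVAL;
        }
        if (unshare_flags & (CLONE_SIGHAND | CLONE_VM)) {
            if (atomic_read(&current->sighand->count) > 1)
                return -EINVAL;
        }
        if (unshare_flags & CLONE_VM) {
            /* replaces the racy mm_users > 1 test */
            if (!current_is_single_threaded())
                return -EINVAL;
        }
        return 0;
    }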

    Fixes: b2e0d98705e6 ("userns: Implement unshare of the user namespace")
    Reported-by: Ricky Zhou
    Reported-by: Kees Cook
    Reviewed-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

22 Sep, 2015

2 commits

  • commit a068acf2ee77693e0bf39d6e07139ba704f461c3 upstream.

    Many file systems that implement the show_options hook fail to correctly
    escape their output which could lead to unescaped characters (e.g. new
    lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
    could lead to confusion, spoofed entries (resulting in things like
    systemd issuing false d-bus "mount" notifications), and who knows what
    else. This looks like it would only be the root user stepping on
    themselves, but it's possible weird things could happen in containers or
    in other situations with delegated mount privileges.

    Here's an example using overlay with setuid fusermount trusting the
    contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use
    of "sudo" is something more sneaky:

    $ BASE="ovl"
    $ MNT="$BASE/mnt"
    $ LOW="$BASE/lower"
    $ UP="$BASE/upper"
    $ WORK="$BASE/work/
    0 0
    none /proc fuse.pwn user_id=1000"
    $ mkdir -p "$LOW" "$UP" "$WORK"
    $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
    $ cat /proc/mounts
    none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/
    0 0
    none /proc fuse.pwn user_id=1000 0 0
    $ fusermount -u /proc
    $ cat /proc/mounts
    cat: /proc/mounts: No such file or directory

    This fixes the problem by adding new seq_show_option and
    seq_show_option_n helpers, and updating the vulnerable show_option
    handlers to use them as needed. Some, like SELinux, need to be open
    coded due to unusual existing escape mechanisms.
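
    A sketch of the new helper, simplified from include/linux/seq_file.h:
    it escapes both the option name and its value before they reach the
    seq_file:

    static inline void seq_show_option(struct seq_file *m, const char *name,
                                       const char *value)
    {
        seq_putc(m, ',');
        seq_escape(m, name, ",= \t\n\\");
        if (value) {
            seq_putc(m, '=');
            seq_escape(m, value, ", \t\n\\");
        }
    }

    A handler then replaces raw printing such as
    seq_printf(m, ",upperdir=%s", path) with
    seq_show_option(m, "upperdir", path).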

    [akpm@linux-foundation.org: add lost chunk, per Kees]
    [keescook@chromium.org: seq_show_option should be using const parameters]
    Signed-off-by: Kees Cook
    Acked-by: Serge Hallyn
    Acked-by: Jan Kara
    Acked-by: Paul Moore
    Cc: J. R. Okajima
    Signed-off-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • commit dd9d3843755da95f63dd3a376f62b3e45c011210 upstream.

    There is a race condition in SMP bootup code, which may result
    in

    WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
    workqueue_cpu_up_callback()
    or
    kernel BUG at kernel/smpboot.c:135!

    It can be triggered with a bit of luck in Linux guests running
    on busy hosts.

    CPU0                              CPUn
    ====                              ====

    _cpu_up()
      __cpu_up()
                                      start_secondary()
                                        set_cpu_online()
                                          cpumask_set_cpu(cpu,
                                             to_cpumask(cpu_online_bits));
      cpu_notify(CPU_ONLINE)
                                          cpumask_set_cpu(cpu,
                                             to_cpumask(cpu_active_bits));

    During the various CPU_ONLINE callbacks CPUn is online but not
    active. Several things can go wrong at that point, depending on
    the scheduling of tasks on CPU0.

    Variant 1:

    cpu_notify(CPU_ONLINE)
      workqueue_cpu_up_callback()
        rebind_workers()
          set_cpus_allowed_ptr()

    This call fails because it requires an active CPU; rebind_workers()
    ends with a warning:

    WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
    workqueue_cpu_up_callback()

    Variant 2:

    cpu_notify(CPU_ONLINE)
      smpboot_thread_call()
        smpboot_unpark_threads()
          ..
          __kthread_unpark()
            __kthread_bind()
            wake_up_state()
              ..
              select_task_rq()
                select_fallback_rq()

    The ->wake_cpu of the unparked thread is not allowed, making a call
    to select_fallback_rq() necessary. Then, select_fallback_rq() cannot
    find an allowed, active CPU and promptly resets the allowed CPUs, so
    that the task in question ends up on CPU0.

    When those unparked tasks are eventually executed, they run
    immediately into a BUG:

    kernel BUG at kernel/smpboot.c:135!

    Just changing the order in which the online/active bits are set
    (and adding some memory barriers), would solve the two issues
    above. However, it would change the order of operations back to
    the one before commit 6acbfb96976f ("sched: Fix hotplug vs.
    set_cpus_allowed_ptr()"), thus, reintroducing that particular
    problem.

    Going further back into history, we have at least the following
    commits touching this topic:
    - commit 2baab4e90495 ("sched: Fix select_fallback_rq() vs cpu_active/cpu_online")
    - commit 5fbd036b552f ("sched: Cleanup cpu_active madness")

    Together, these give us the following non-working solutions:

    - secondary CPU sets active before online, because active is assumed to
    be a subset of online;

    - secondary CPU sets online before active, because the primary CPU
    assumes that an online CPU is also active;

    - secondary CPU sets online and waits for primary CPU to set active,
    because it might deadlock.

    Commit 875ebe940d77 ("powerpc/smp: Wait until secondaries are
    active & online") introduces an arch-specific solution to this
    arch-independent problem.

    Now, go for a more general solution without explicit waiting and
    simply set active twice: once on the secondary CPU after online
    was set and once on the primary CPU after online was seen.

    set_cpus_allowed_ptr()")

    Signed-off-by: Jan H. Schönherr
    Acked-by: Peter Zijlstra
    Cc: Anton Blanchard
    Cc: Borislav Petkov
    Cc: Joerg Roedel
    Cc: Linus Torvalds
    Cc: Matt Wilson
    Cc: Michael Ellerman
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
    Link: http://lkml.kernel.org/r/1439408156-18840-1-git-send-email-jschoenh@amazon.de
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Jan H. Schönherr
     

14 Sep, 2015

7 commits

  • commit b7560de198222994374c1340a389f12d5efb244a upstream.

    This helper is required for irq chips which do not implement an
    irq_set_type callback and need to call down the irq domain hierarchy
    for the actual trigger type change.

    This helper is required to fix further wreckage caused by the
    conversion of TI OMAP to hierarchical irq domains and is therefore
    tagged for stable.

    [ tglx: Massaged changelog ]
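
    A sketch of the helper (kernel/irq/chip.c): it simply forwards the
    trigger-type request one level up the domain hierarchy:

    int irq_chip_set_type_parent(struct irq_data *data, unsigned int type)
    {
        data = data->parent_data;
        if (data->chip->irq_set_type)
            return data->chip->irq_set_type(data, type);

        return -ENOSYS;
    }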

    Signed-off-by: Grygorii Strashko
    Cc: Sudeep Holla
    Cc: stable@vger.kernel.org # 4.1
    Link: http://lkml.kernel.org/r/1439554830-19502-3-git-send-email-grygorii.strashko@ti.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Grygorii Strashko
     
  • commit 6d4affea7d5aa5ca5ff4c3e5fbf3ee16801cc527 upstream.

    irq_chip_retrigger_hierarchy() returns -ENOSYS if it was not able to
    find at least one .irq_retrigger() callback implemented in the IRQ
    domain hierarchy.

    That's wrong, because check_irq_resend() expects a 0 return value from
    the callback when the hardware-assisted resend was not possible. If the
    return value is non-zero, the core code assumes a successful hardware
    resend and the software resend is not invoked.

    This results in lost interrupts on platforms where none of the parent
    irq chips in the hierarchy implements the retrigger callback.
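
    A sketch of the fix (kernel/irq/chip.c, simplified): returning 0
    instead of -ENOSYS tells check_irq_resend() that nothing was resent in
    hardware, so it falls back to the software resend:

    int irq_chip_retrigger_hierarchy(struct irq_data *data)
    {
        for (data = data->parent_data; data; data = data->parent_data)
            if (data->chip && data->chip->irq_retrigger)
                return data->chip->irq_retrigger(data);

        return 0;    /* was -ENOSYS, which faked a hardware resend */
    }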

    This is observable on TI OMAP, where the hierarchy is:

    ARM GIC <- OMAP wakeupgen <- TI crossbar

    Reviewed-by: Marc Zyngier
    Reviewed-by: Jiang Liu
    Cc: Sudeep Holla
    Link: http://lkml.kernel.org/r/1439554830-19502-2-git-send-email-grygorii.strashko@ti.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Grygorii Strashko
     
  • commit 24ee3cf89bef04e8bc23788aca4e029a3f0f06d9 upstream.

    The comment says it's using trialcs->mems_allowed as a temp variable,
    but the code didn't match the comment. Change the code to match the
    comment.

    This fixes an issue when writing to cpuset.mems when a sub-directory
    exists: we need to write several times for the information to persist:

    | root@alban:/sys/fs/cgroup/cpuset# mkdir footest9
    | root@alban:/sys/fs/cgroup/cpuset# cd footest9
    | root@alban:/sys/fs/cgroup/cpuset/footest9# mkdir aa
    | root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems
    |
    | root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 > cpuset.mems
    | root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems
    |
    | root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 > cpuset.mems
    | root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems
    | 0
    | root@alban:/sys/fs/cgroup/cpuset/footest9# cat aa/cpuset.mems
    |
    | root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 > aa/cpuset.mems
    | root@alban:/sys/fs/cgroup/cpuset/footest9# cat aa/cpuset.mems
    | 0
    | root@alban:/sys/fs/cgroup/cpuset/footest9#

    This should help to fix the following issue in Docker:
    https://github.com/opencontainers/runc/issues/133
    In some conditions, a Docker container needs to be started twice in
    order to work.

    Signed-off-by: Alban Crequy
    Tested-by: Iago López Galeiras
    Acked-by: Li Zefan
    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Alban Crequy
     
  • commit c7999c6f3fed9e383d3131474588f282ae6d56b9 upstream.

    I ran the perf fuzzer, which triggered some WARN()s which are due to
    trying to stop/restart an event on the wrong CPU.

    Use the normal IPI pattern to ensure we run the code on the correct CPU.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Vince Weaver
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: bad7192b842c ("perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the period")
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit ee9397a6fb9bc4e52677f5e33eed4abee0f515e6 upstream.

    If rb->aux_refcount is decremented to zero before rb->refcount,
    __rb_free_aux() may be called twice resulting in a double free of
    rb->aux_pages. Fix this by adding a check to __rb_free_aux().
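
    A sketch of the guard, simplified from kernel/events/ring_buffer.c:
    zeroing the page count on the first pass makes a second call a no-op:

    static void __rb_free_aux(struct ring_buffer *rb)
    {
        int pg;

        if (rb->aux_nr_pages) {
            for (pg = 0; pg < rb->aux_nr_pages; pg++)
                rb_free_aux_page(rb, pg);

            kfree(rb->aux_pages);
            rb->aux_nr_pages = 0;    /* second call now does nothing */
        }
    }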

    Signed-off-by: Ben Hutchings
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 57ffc5ca679f ("perf: Fix AUX buffer refcounting")
    Link: http://lkml.kernel.org/r/1437953468.12842.17.camel@decadent.org.uk
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Ben Hutchings
     
  • commit 00a2916f7f82c348a2a94dbb572874173bc308a3 upstream.

    A recent fix to the shadow timestamp inadvertently broke the running
    time accounting.

    We must not update the running timestamp if we fail to schedule the
    event; the event will not have run. This can (and did) result in
    negative total runtime, because the stopped timestamp was before the
    running timestamp (we 'started' but never stopped the event -- because
    it never really started we didn't have to stop it either).

    Reported-and-Tested-by: Vince Weaver
    Fixes: 72f669c0086f ("perf: Update shadow timestamp before add event")
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Shaohua Li
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit fed66e2cdd4f127a43fd11b8d92a99bdd429528c upstream.

    Vince reported that the fasync signal stuff doesn't work properly for
    inherited events. So fix that.

    Installing fasync allocates memory and sets filp->f_flags |= FASYNC,
    which upon the demise of the file descriptor ensures the allocation is
    freed and state is updated.

    Now for perf, we can have the events stick around for a while after the
    original FD is dead because of references from child events. So we
    cannot copy the fasync pointer around. We can however consistently use
    the parent's fasync, as that will be updated.
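
    A sketch of the helper the fix introduces (kernel/events/core.c,
    simplified): every reader resolves fasync through the parent event,
    whose file-backed state is the one kept up to date:

    static struct fasync_struct **perf_event_fasync(struct perf_event *event)
    {
        /* Only the parent event is tied to the file; child events
         * that outlive the fd borrow the parent's fasync state. */
        if (event->parent)
            event = event->parent;
        return &event->fasync;
    }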

    Reported-and-Tested-by: Vince Weaver
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: eranian@google.com
    Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twins
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     

17 Aug, 2015

2 commits

  • commit 3c00cb5e68dc719f2fc73a33b1b230aadfcb1309 upstream.

    This function can leak kernel stack data when the user siginfo_t has a
    positive si_code value. The top 16 bits of si_code describe which fields
    in the siginfo_t union are active, but they are treated inconsistently
    between copy_siginfo_from_user32, copy_siginfo_to_user32 and
    copy_siginfo_to_user.

    copy_siginfo_from_user32 is called from rt_sigqueueinfo and
    rt_tgsigqueueinfo, in which the user has full control over the top 16 bits
    of si_code.

    This fixes the following information leaks:
    x86: 8 bytes leaked when sending a signal from a 32-bit process to
    itself. This leak grows to 16 bytes if the process uses x32.
    (si_code = __SI_CHLD)
    x86: 100 bytes leaked when sending a signal from a 32-bit process to
    a 64-bit process. (si_code = -1)
    sparc: 4 bytes leaked when sending a signal from a 32-bit process to a
    64-bit process. (si_code = any)

    parisc and s390 have similar bugs, but they are not vulnerable because
    rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code
    to a different process. These bugs are also fixed for consistency.

    Signed-off-by: Amanieu d'Antras
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Chris Metcalf
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Amanieu d'Antras
     
  • commit 26135022f85105ad725cda103fa069e29e83bd16 upstream.

    This function may copy the si_addr_lsb, si_lower and si_upper fields to
    user mode when they haven't been initialized, which can leak kernel
    stack data to user mode.

    Just checking the value of si_code is insufficient because the same
    si_code value is shared between multiple signals. This is solved by
    checking the value of si_signo in addition to si_code.
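
    A sketch of the strengthened check in copy_siginfo_to_user(),
    simplified: the extra fields are copied only for the signals that
    actually define them:

    /* __SI_FAULT case: si_code alone is ambiguous because several
     * signals share these numeric code values. */
    #ifdef BUS_MCEERR_AO
        if (from->si_signo == SIGBUS &&
            (from->si_code == BUS_MCEERR_AR ||
             from->si_code == BUS_MCEERR_AO))
            err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
    #endif
    #ifdef SEGV_BNDERR
        if (from->si_signo == SIGSEGV && from->si_code == SEGV_BNDERR) {
            err |= __put_user(from->si_lower, &to->si_lower);
            err |= __put_user(from->si_upper, &to->si_upper);
        }
    #endif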

    Signed-off-by: Amanieu d'Antras
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Amanieu d'Antras
     

11 Aug, 2015

2 commits

  • commit e3eea1404f5ff7a2ceb7b5e7ba412a6fd94f2935 upstream.

    Commit 4104d326b670 ("ftrace: Remove global function list and call function
    directly") simplified the ftrace code by removing the global_ops list with a
    new design. But this cleanup also broke the filtering of PIDs that are added
    to the set_ftrace_pid file.

    Add back the proper hooks to have pid filtering working once again.

    Reported-by: Matt Fleming
    Reported-by: Richard Weinberger
    Tested-by: Matt Fleming
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Red Hat)
     
  • commit 75a06189fc508a2acf470b0b12710362ffb2c4b1 upstream.

    The resend mechanism happily calls the interrupt handler of interrupts
    which are marked IRQ_NESTED_THREAD from softirq context. This can
    result in crashes because the interrupt handler is not the proper way
    to invoke the device handlers. They must be invoked via
    handle_nested_irq.

    Prevent the resend even if the interrupt has no valid parent irq
    set. It's better to have a lost interrupt than a crashing machine.
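
    A sketch of the idea in the software resend path (kernel/irq/resend.c,
    simplified; the exact structure may differ): redirect nested-thread
    interrupts to their parent, or drop them when no parent is set:

    if (irq_settings_is_nested_thread(desc)) {
        /* The device handlers must run via handle_nested_irq();
         * from softirq context we can only retrigger the parent. */
        if (!desc->parent_irq)
            return;             /* better lost than crashing */
        irq = desc->parent_irq;
    }
    /* ... the (parent) irq is then raised by the resend mechanism ... */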

    Reported-by: Uwe Kleine-König
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

04 Aug, 2015

6 commits

  • commit d194e5d666225b04c7754471df0948f645b6ab3a upstream.

    The final version of commit 637241a900cb ("kmsg: honor dmesg_restrict
    sysctl on /dev/kmsg") lost a few hooks; as a result, security_syslog()
    is processed incorrectly:

    - open of /dev/kmsg checks syslog access permissions by using
    check_syslog_permissions() where security_syslog() is not called if
    dmesg_restrict is set.

    - the syslog syscall and /proc/kmsg call do_syslog(), where
    security_syslog() can be executed twice (inside
    check_syslog_permissions() and then directly in do_syslog())

    With this patch security_syslog() is called once only in all
    syslog-related operations regardless of dmesg_restrict value.

    Fixes: 637241a900cb ("kmsg: honor dmesg_restrict sysctl on /dev/kmsg")
    Signed-off-by: Vasily Averin
    Cc: Kees Cook
    Cc: Josh Boyer
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Vasily Averin
     
  • commit fff3b16d2754a061a3549c4307a186423a0128fd upstream.

    Many hard disks (mostly WD ones) have firmware problems and take too
    long, more than 10 seconds, to resume from suspend, which often
    exceeds the default DPM watchdog timeout (12 seconds) and results in a
    sudden kernel panic.

    Since most distros just take the default as is, we should pick a
    safer value. This patch increases the default from 12 seconds to one
    minute, which has been confirmed to be long enough for such
    problematic disks.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=91921
    Fixes: 70fea60d888d ("PM / Sleep: Detect device suspend/resume lockup and log event")
    Signed-off-by: Takashi Iwai
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Takashi Iwai
     
  • commit 6224beb12e190ff11f3c7d4bf50cb2922878f600 upstream.

    Fengguang Wu's tests triggered a bug in the branch tracer's start up
    test when CONFIG_DEBUG_PREEMPT is set. This is because that config
    adds some debug logic in the per-cpu field, which calls back into
    the branch tracer.

    The branch tracer has its own recursive checks, but uses a per cpu
    variable to implement it. If retrieving the per cpu variable calls
    back into the branch tracer, you can see how things will break.

    Instead of using a per cpu variable, use the trace_recursion field
    of the current task struct. Simply set a bit when entering the
    branch tracing and clear it when leaving. If the bit is set on
    entry, just don't do the tracing.

    There's also the case with lockdep, as the local_irq_save() called
    before the recursion check can also trigger code that can call back
    into the function. Changing that to a raw_local_irq_save() will protect
    that as well.

    This prevents the recursion and the inevitable crash that follows.
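
    A sketch of the guard, simplified from kernel/trace/trace_branch.c:
    the flag lives in current->trace_recursion, which can be tested and
    set without touching the per-cpu machinery:

    unsigned long flags;

    if (unlikely(trace_recursion_test(TRACE_BRANCH_BIT)))
        return;                     /* already inside the tracer */
    trace_recursion_set(TRACE_BRANCH_BIT);

    raw_local_irq_save(flags);      /* raw_: keep lockdep out of the way */
    /* ... record the branch event ... */
    raw_local_irq_restore(flags);

    trace_recursion_clear(TRACE_BRANCH_BIT);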

    Link: http://lkml.kernel.org/r/20150630141803.GA28071@wfg-t540p.sh.intel.com

    Reported-by: Fengguang Wu
    Tested-by: Fengguang Wu
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Red Hat)
     
  • commit cc9e4bde03f2b4cfba52406c021364cbd2a4a0f3 upstream.

    The trace.h header, when included without CONFIG_EVENT_TRACING enabled
    (seldom done), will not compile because of a typo in the prototype
    of trace_event_enum_update().

    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Red Hat)
     
  • commit 6b88f44e161b9ee2a803e5b2b1fbcf4e20e8b980 upstream.

    While debugging a WARN_ON() for filtering, I found that it is possible
    for the filter string to be referenced after its end. With the filter:

    # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter

    The filter_parse() function can call infix_get_op() which calls
    infix_advance() that updates the infix filter pointers for the cnt
    and tail without checking if the filter is already at the end, which
    will put the cnt to zero and the tail beyond the end. The loop then calls
    infix_next() that has

    ps->infix.cnt--;
    return ps->infix.string[ps->infix.tail++];

    The cnt will now be below zero, and the tail that is returned is
    already past the end of the filter string. So far the allocation
    of the filter string usually has some buffer that is zeroed out, but
    if the filter string is of the exact size of the allocated buffer,
    there's no guarantee that the character after the nul terminating
    character will be zero.

    Luckily, only root can write to the filter.
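
    A sketch of the guarded helper (kernel/trace/trace_events_filter.c,
    simplified): bail out at the end of the string instead of walking
    past it:

    static char infix_next(struct filter_parse_state *ps)
    {
        if (!ps->infix.cnt)
            return 0;       /* never read past the filter string */

        ps->infix.cnt--;
        return ps->infix.string[ps->infix.tail++];
    }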

    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Red Hat)
     
  • commit b4875bbe7e68f139bd3383828ae8e994a0df6d28 upstream.

    When testing the fix for the trace filter, I could not come up with
    a scenario where the operand count goes below zero, so I added a
    WARN_ON_ONCE(cnt < 0) to the logic. But there is a legitimate case
    where it can happen (although the filter would be wrong):

    # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter

    That is, a single operation without any operands will hit the path
    where the WARN_ON_ONCE() can trigger. Although this is harmless and
    the filter is reported as an error, instead of spitting out a warning
    to the kernel dmesg, just fail nicely and report it via the proper
    channels.

    Link: http://lkml.kernel.org/r/558C6082.90608@oracle.com

    Reported-by: Vince Weaver
    Reported-by: Sasha Levin
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Red Hat)
     

22 Jul, 2015

5 commits

  • commit 63781394c540dd9e666a6b21d70b64dd52bce76e upstream.

    request_any_context_irq() returns a negative value on failure and
    either IRQC_IS_HARDIRQ or IRQC_IS_NESTED on success, so fix the
    testing of request_any_context_irq()'s return value.

    Also fixup the return value of devm_request_any_context_irq() to make it
    consistent with request_any_context_irq().
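
    A sketch of the corrected test (kernel/irq/devres.c, simplified):
    success is a positive IRQC_* value, so only negative returns are
    errors:

    rc = request_any_context_irq(irq, handler, irqflags, devname, dev_id);
    if (rc < 0) {       /* was "if (rc)", which rejected valid successes */
        devres_free(dr);
        return rc;
    }

    dr->irq = irq;
    devres_add(dev, dr);
    return rc;          /* IRQC_IS_HARDIRQ or IRQC_IS_NESTED */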

    Fixes: 0668d3065128 ("genirq: Add devm_request_any_context_irq()")
    Signed-off-by: Axel Lin
    Reviewed-by: Stephen Boyd
    Link: http://lkml.kernel.org/r/1431334978.17783.4.camel@ingics.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Axel Lin
     
  • commit 9a1bd63cdae4b623494c4ebaf723a91c35ec49fb upstream.

    The list of loaded modules is walked through in
    module_kallsyms_on_each_symbol (called by kallsyms_on_each_symbol). The
    module_mutex lock should be acquired to prevent potential corruptions
    in the list.

    This was uncovered with new lockdep asserts in module code introduced
    by commit 0be964be0d45 ("module: Sanitize RCU usage and locking") in
    recent linux-next trees.

    Signed-off-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Signed-off-by: Jiri Kosina
    Signed-off-by: Greg Kroah-Hartman

    Miroslav Benes
     
  • commit 6e91f8cb138625be96070b778d9ba71ce520ea7e upstream.

    If, at the time __rcu_process_callbacks() is invoked, there are callbacks
    in Tiny RCU's callback list, but none of them are ready to be invoked,
    the current list-management code will knit the non-ready callbacks out
    of the list. This can result in hangs and possibly worse. This commit
    therefore inserts a check for there being no callbacks that can be
    invoked immediately.
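
    A sketch of the inserted check (kernel/rcu/tiny.c, simplified): if the
    done-tail still points at the list head, nothing is ready, so leave
    the list untouched:

    static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
    {
        unsigned long flags;

        local_irq_save(flags);
        if (rcp->donetail == &rcp->rcucblist) {
            /* No callbacks ready to invoke: don't knit anything out. */
            local_irq_restore(flags);
            return;
        }
        /* ... detach the ready sublist and invoke it ... */
    }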

    This bug is unlikely to occur -- you have to get a new callback between
    the time rcu_sched_qs() or rcu_bh_qs() was called, but before we get to
    __rcu_process_callbacks(). It was detected by the addition of RCU-bh
    testing to rcutorture, which in turn was instigated by Iftekhar Ahmed's
    mutation testing. Although this bug was made much more likely by
    915e8a4fe45e (rcu: Remove fastpath from __rcu_process_callbacks()), this
    did not cause the bug, but rather made it much more probable. That
    said, it takes more than 40 hours of rcutorture testing, on average,
    for this bug to appear, so this fix cannot be considered an emergency.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett
    Signed-off-by: Greg Kroah-Hartman

    Paul E. McKenney
     
  • commit f9bb48825a6b5d02f4cabcc78967c75db903dcdc upstream.

    This allows for better documentation in the code and
    it allows for a simpler and fully correct version of
    fs_fully_visible to be written.

    The mount points converted and their filesystems are:
    /sys/hypervisor/s390/ s390_hypfs
    /sys/kernel/config/ configfs
    /sys/kernel/debug/ debugfs
    /sys/firmware/efi/efivars/ efivarfs
    /sys/fs/fuse/connections/ fusectl
    /sys/fs/pstore/ pstore
    /sys/kernel/tracing/ tracefs
    /sys/fs/cgroup/ cgroup
    /sys/kernel/security/ securityfs
    /sys/fs/selinux/ selinuxfs
    /sys/fs/smackfs/ smackfs

    Acked-by: Greg Kroah-Hartman
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit f9bd6733d3f11e24f3949becf277507d422ee1eb upstream.

    Add a magic sysctl table sysctl_mount_point that when used to
    create a directory forces that directory to be permanently empty.

    Update the code to use make_empty_dir_inode when accessing permanently
    empty directories.

    Update the code to not allow adding to permanently empty directories.

    Update /proc/sys/fs/binfmt_misc to be a permanently empty directory.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

30 Jun, 2015

1 commit

  • commit 2f993cf093643b98477c421fa2b9a98dcc940323 upstream.

    While looking for other users of get_state/cond_sync, I found
    ring_buffer_attach(), and it looks obviously buggy.

    Don't we need to ensure that we have "synchronize" _between_
    list_del() and list_add() ?

    IOW, suppose that ring_buffer_attach() is preempted right after
    get_state_synchronize_rcu() and the grace period completes before
    spin_lock().

    In this case cond_synchronize_rcu() does nothing and we reuse
    ->rb_entry without waiting for a grace period in between.

    It also moves the ->rcu_pending check under "if (rb)", to make it
    more readable imo.
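
    A sketch of the reordering (kernel/events/core.c, simplified): snapshot
    the RCU state right after list_del_rcu(), and wait on it just before
    ->rb_entry is reused by list_add_rcu():

    if (event->rb) {
        spin_lock_irqsave(&event->rb->event_lock, flags);
        list_del_rcu(&event->rb_entry);
        spin_unlock_irqrestore(&event->rb->event_lock, flags);

        event->rcu_batches = get_state_synchronize_rcu();
        event->rcu_pending = 1;
    }

    if (rb) {
        if (event->rcu_pending) {
            /* ensure a full grace period between del and add */
            cond_synchronize_rcu(event->rcu_batches);
            event->rcu_pending = 0;
        }
        spin_lock_irqsave(&rb->event_lock, flags);
        list_add_rcu(&event->rb_entry, &rb->event_list);
        spin_unlock_irqrestore(&rb->event_lock, flags);
    }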

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: der.herr@hofr.at
    Cc: josh@joshtriplett.org
    Cc: tj@kernel.org
    Fixes: b69cf53640da ("perf: Fix a race between ring_buffer_detach() and ring_buffer_attach()")
    Link: http://lkml.kernel.org/r/20150530200425.GA15748@redhat.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     

18 Jun, 2015

1 commit

  • Merge tag 'trace-fix-filter-4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

    Pull tracing filter fix from Steven Rostedt:
    "Vince Weaver reported a warning when he added perf event filters into
    his fuzzer tests. There's a missing check of balanced operations when
    parenthesis are used, and this triggers a WARN_ON() and when reading
    the failure, the filter reports no failure occurred.

    The operands were not being checked if they match, this adds that"

    * tag 'trace-fix-filter-4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Have filter check for balanced ops

    Linus Torvalds
     

17 Jun, 2015

1 commit

  • When the following filter is used it causes a warning to trigger:

    # cd /sys/kernel/debug/tracing
    # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
    -bash: echo: write error: Invalid argument
    # cat events/ext4/ext4_truncate_exit/filter
    ((dev==1)blocks==2)
    ^
    parse_error: No error

    ------------[ cut here ]------------
    WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990()
    Modules linked in: bnep lockd grace bluetooth ...
    CPU: 3 PID: 1223 Comm: bash Tainted: G W 4.1.0-rc3-test+ #450
    Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
    0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0
    0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c
    ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea
    Call Trace:
    [] dump_stack+0x4c/0x6e
    [] warn_slowpath_common+0x97/0xe0
    [] ? _kstrtoull+0x2c/0x80
    [] warn_slowpath_null+0x1a/0x20
    [] replace_preds+0x3c5/0x990
    [] create_filter+0x82/0xb0
    [] apply_event_filter+0xd4/0x180
    [] event_filter_write+0x8f/0x120
    [] __vfs_write+0x28/0xe0
    [] ? __sb_start_write+0x53/0xf0
    [] ? security_file_permission+0x30/0xc0
    [] vfs_write+0xb8/0x1b0
    [] SyS_write+0x4f/0xb0
    [] system_call_fastpath+0x12/0x6a
    ---[ end trace e11028bd95818dcd ]---

    Worse yet, reading the error message (the filter again), it says that
    there was no error, when there clearly was. The issue is that the
    code that checks the input does not check for balanced ops; that is,
    it does not verify that an op sits between a closed parenthesis and
    the next token.

    This would only cause a warning, and fail out before doing any real
    harm, but it should still not cause a warning, and the error reported
    should work:

    # cd /sys/kernel/debug/tracing
    # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
    -bash: echo: write error: Invalid argument
    # cat events/ext4/ext4_truncate_exit/filter
    ((dev==1)blocks==2)
    ^
    parse_error: Meaningless filter expression

    And give no kernel warning.
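
    A sketch of the strengthened counting in check_preds()
    (kernel/trace/trace_events_filter.c, simplified): every operand pushes
    one value, every binary op consumes one, and a well-formed postfix
    expression must end with exactly one value:

    int cnt = 0, n_normal_preds = 0, n_logical_preds = 0;
    struct postfix_elt *elt;

    list_for_each_entry(elt, &ps->postfix, list) {
        if (elt->op == OP_NONE) {       /* an operand */
            cnt++;
            continue;
        }
        if (elt->op == OP_AND || elt->op == OP_OR) {
            n_logical_preds++;
            cnt--;                      /* two in, one out */
            continue;
        }
        if (elt->op != OP_NOT)
            cnt--;                      /* comparison op: two in, one out */
        n_normal_preds++;
        if (cnt < 0)                    /* op missing its operands */
            break;
    }

    if (cnt != 1 || !n_normal_preds || n_logical_preds >= n_normal_preds) {
        parse_error(ps, FILT_ERR_INVALID_FILTER, 0);
        return -EINVAL;
    }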

    Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home

    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Cc: stable@vger.kernel.org # 2.6.31+
    Reported-by: Vince Weaver
    Tested-by: Vince Weaver
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

15 Jun, 2015

1 commit