Eric Lee / smarc-fsl-linux-kernel

27 Apr, 2015

1 commit

9ec3a646f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only

Linus Torvalds
2015-04-27 08:22:07 +0800

23 Apr, 2015

3 commits

27cf3a16b Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit ... Browse Code »

Pull audit fixes from Paul Moore:
"Seven audit patches for v4.1, all bug fixes.

The largest, and perhaps most significant commit helps resolve some
memory pressure issues related to the inode cache and audit, there are
also a few small commits which help resolve some timing issues with
the audit log queue, and the rest fall into the always popular "code
clean-up" category.

In general, nothing really substantial, just a nice set of maintenance
patches"

* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
audit: Remove condition which always evaluates to false
audit: reduce mmap_sem hold for mm->exe_file
audit: consolidate handling of mm->exe_file
audit: code clean up
audit: don't reset working wait time accidentally with auditd
audit: don't lose set wait time on first successful call to audit_log_start()
audit: move the tree pruning to a dedicated thread

Linus Torvalds
2015-04-23 05:49:23 +0800
4f2112351 Merge tag 'trace-v4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull tracing fixes from Steven Rostedt:
"This adds three fixes for the tracing code.

The first is a bug when ftrace_dump_on_oops is triggered in atomic
context and function graph tracer is the tracer that is being
reported.

The second fix is bad parsing of the trace_events from the kernel
command line, where it would ignore specific events if the system name
is used with defining the event(it enables all events within the
system).

The last one is a fix to the TRACE_DEFINE_ENUM(), where a check was
missing to see if the ptr was incremented to the end of the string,
but the loop increments it again and can miss the nul delimiter to
stop processing"

* tag 'trace-v4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix possible out of bounds memory access when parsing enums
tracing: Fix incorrect enabling of trace events by boot cmdline
tracing: Handle ftrace_dump() atomic context in graph_trace_open()

Linus Torvalds
2015-04-23 02:27:36 +0800
15ce2658d Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux ... Browse Code »

Pull module updates from Rusty Russell:
"Quentin opened a can of worms by adding extable entry checking to
modpost, but most architectures seem fixed now. Thanks to all
involved.

Last minute rebase because I noticed a "[PATCH]" had snuck into a
commit message somehow"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
modpost: don't emit section mismatch warnings for compiler optimizations
modpost: expand pattern matching to support substring matches
modpost: do not try to match the SHT_NUL section.
modpost: fix extable entry size calculation.
modpost: fix inverted logic in is_extable_fault_address().
modpost: handle -ffunction-sections
modpost: Whitelist .text.fixup and .exception.text
params: handle quotes properly for values not of form foo="bar".
modpost: document the use of struct section_check.
modpost: handle relocations mismatch in __ex_table.
scripts: add check_extable.sh script.
modpost: mismatch_handler: retrieve tosym information only when needed.
modpost: factorize symbol pretty print in get_pretty_name().
modpost: add handler function pointer to sectioncheck.
modpost: add .sched.text and .kprobes.text to the TEXT_SECTIONS list.
modpost: add strict white-listing when referencing sections.
module: do not print allocation-fail warning on bogus user buffer size
kernel/module.c: fix typos in message about unused symbols

Linus Torvalds
2015-04-23 00:49:24 +0800

22 Apr, 2015

2 commits

1fc149933 Merge tag 'char-misc-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc ... Browse Code »

Pull char/misc driver updates from Greg KH:
"Here's the big char/misc driver patchset for 4.1-rc1.

Lots of different driver subsystem updates here, nothing major, full
details are in the shortlog.

All of this has been in linux-next for a while"

* tag 'char-misc-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (133 commits)
mei: trace: remove unused TRACE_SYSTEM_STRING
DTS: ARM: OMAP3-N900: Add lis3lv02d support
Documentation: DT: lis302: update wakeup binding
lis3lv02d: DT: add wakeup unit 2 and wakeup threshold
lis3lv02d: DT: use s32 to support negative values
Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case
Drivers: hv: hv_balloon: correctly handle val.freeram directory
coresight-tmc: Adding a status interface to sysfs
coresight: remove the unnecessary configuration coresight-default-sink
...

Linus Torvalds
2015-04-22 00:42:58 +0800
41d5e08ea Merge tag 'tty-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty ... Browse Code »

Pull tty/serial updates from Greg KH:
"Here's the big tty/serial driver update for 4.1-rc1.

It was delayed for a bit due to some questions surrounding some of the
console command line parsing changes that are in here. There's still
one tiny regression for people who were previously putting multiple
console command lines and expecting them all to be ignored for some
odd reason, but Peter is working on fixing that. If not, I'll send a
revert for the offending patch, but I have faith that Peter can
address it.

Other than the console work here, there's the usual serial driver
updates and changes, and a buch of 8250 reworks to try to make that
driver easier to maintain over time, and have it support more devices
in the future.

All of these have been in linux-next for a while"

* tag 'tty-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (119 commits)
n_gsm: Drop unneeded cast on netdev_priv
sc16is7xx: expose RTS inversion in RS-485 mode
serial: 8250_pci: port failed after wakeup from S3
earlycon: 8250: Document kernel command line options
earlycon: 8250: Fix command line regression
earlycon: Fix __earlycon_table stride
tty: clean up the tty time logic a bit
serial: 8250_dw: only get the clock rate in one place
serial: 8250_dw: remove useless ACPI ID check
dmaengine: hsu: move memory allocation to GFP_NOWAIT
dmaengine: hsu: remove redundant pieces of code
serial: 8250_pci: add Intel Tangier support
dmaengine: hsu: add Intel Tangier PCI ID
serial: 8250_pci: replace switch-case by formula for Intel MID
serial: 8250_pci: replace switch-case by formula
tty: cpm_uart: replace CONFIG_8xx by CONFIG_CPM1
serial: jsm: some off by one bugs
serial: xuartps: Fix check in console_setup().
serial: xuartps: Get rid of register access macros.
serial: xuartps: Fix iobase use.
...

Linus Torvalds
2015-04-22 00:33:10 +0800

20 Apr, 2015

1 commit

5224b9613 smp: Fix error case handling in smp_call_function_*() ... Browse Code »

Commit 8053871d0f7f ("smp: Fix smp_call_function_single_async()
locking") fixed the locking for the asynchronous smp-call case, but in
the process of moving the lock handling around, one of the error cases
ended up not unlocking the call data at all.

This went unnoticed on x86, because this is a "caller is buggy" case,
where the caller is trying to call a non-existent CPU. But apparently
ARM does that (at least under qemu-arm). Bindly doing cross-cpu calls
to random CPU's that aren't even online seems a bit fishy, but the error
handling was clearly not correct.

Simply add the missing "csd_unlock()" to the error path.

Reported-and-tested-by: Guenter Roeck
Analyzed-by: Rabin Vincent
Acked-by: Ingo Molnar
Signed-off-by: Linus Torvalds

Linus Torvalds
2015-04-20 04:19:23 +0800

18 Apr, 2015

2 commits

396c9df22 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull locking fixes from Ingo Molnar:
"Two fixes: an smp-call fix and a lockdep fix"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Fix smp_call_function_single_async() locking
lockdep: Make print_lock() robust against concurrent release

Linus Torvalds
2015-04-18 23:23:42 +0800
388f99762 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Fix verifier memory corruption and other bugs in BPF layer, from
Alexei Starovoitov.

2) Add a conservative fix for doing BPF properly in the BPF classifier
of the packet scheduler on ingress. Also from Alexei.

3) The SKB scrubber should not clear out the packet MARK and security
label, from Herbert Xu.

4) Fix oops on rmmod in stmmac driver, from Bryan O'Donoghue.

5) Pause handling is not correct in the stmmac driver because it
doesn't take into consideration the RX and TX fifo sizes. From
Vince Bridgers.

6) Failure path missing unlock in FOU driver, from Wang Cong.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
net: dsa: use DEVICE_ATTR_RW to declare temp1_max
netns: remove BUG_ONs from net_generic()
IB/ipoib: Fix ndo_get_iflink
sfc: Fix memcpy() with const destination compiler warning.
altera tse: Fix network-delays and -retransmissions after high throughput.
net: remove unused 'dev' argument from netif_needs_gso()
act_mirred: Fix bogus header when redirecting from VLAN
inet_diag: fix access to tcp cc information
tcp: tcp_get_info() should fetch socket fields once
net: dsa: mv88e6xxx: Add missing initialization in mv88e6xxx_set_port_state()
skbuff: Do not scrub skb mark within the same name space
Revert "net: Reset secmark when scrubbing packet"
bpf: fix two bugs in verification logic when accessing 'ctx' pointer
bpf: fix bpf helpers to use skb->mac_header relative offsets
stmmac: Configure Flow Control to work correctly based on rxfifo size
stmmac: Enable unicast pause frame detect in GMAC Register 6
stmmac: Read tx-fifo-depth and rx-fifo-depth from the devicetree
stmmac: Add defines and documentation for enabling flow control
stmmac: Add properties for transmit and receive fifo sizes
stmmac: fix oops on rmmod after assigning ip addr
...

Linus Torvalds
2015-04-18 04:31:08 +0800

17 Apr, 2015

18 commits

3193899d4 tracing: Fix possible out of bounds memory access when parsing enums ... Browse Code »

The code that replaces the enum names with the enum values in the
tracepoints' format files could possible miss the end of string nul
character. This was caused by processing things like backslashes, quotes
and other tokens. After processing the tokens, a check for the nul
character needed to be done before continuing the loop, because the loop
incremented the pointer before doing the check, which could bypass the nul
character.

Link: http://lkml.kernel.org/r/552E661D.5060502@oracle.com

Reported-by: Sasha Levin # via KASan
Tested-by: Andrey Ryabinin
Fixes: 0c564a538aa9 "tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values"
Signed-off-by: Steven Rostedt

Steven Rostedt (Red Hat)
2015-04-17 22:34:43 +0800
11163348a oprofile: reduce mmap_sem hold for mm->exe_file ... Browse Code »

sync_buffer() needs the mmap_sem for two distinct operations, both only
occurring upon user context switch handling:

1) Dealing with the exe_file.

2) Adding the dcookie data as we need to lookup the vma that
backs it. This is done via add_sample() and add_data().

This patch isolates 1), for it will no longer need the mmap_sem for
serialization. However, for now, make of the more standard
get_mm_exe_file(), requiring only holding the mmap_sem to read the value,
and relying on reference counting to make sure that the exe file won't
dissappear underneath us while doing the get dcookie.

As a consequence, for 2) we move the mmap_sem locking into where we really
need it, in lookup_dcookie(). The benefits are twofold: reduce mmap_sem
hold times, and cleaner code.

[akpm@linux-foundation.org: export get_mm_exe_file for arch/x86/oprofile/oprofile.ko]
Signed-off-by: Davidlohr Bueso
Cc: Robert Richter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2015-04-17 21:04:11 +0800
9d796e662 gcov: fix softlockups ... Browse Code »

gcov profiling if enabled with other heavy compile-time instrumentation
like KASan could trigger following softlockups:

NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
Modules linked in:
irq event stamp: 22823276
hardirqs last enabled at (22823275): [] mutex_lock_nested+0x7d9/0x930
hardirqs last disabled at (22823276): [] apic_timer_interrupt+0x6d/0x80
softirqs last enabled at (22823172): [] __do_softirq+0x4db/0x729
softirqs last disabled at (22823167): [] irq_exit+0x7d/0x15b
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 3.19.0-05245-gbb33326-dirty #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
task: ffff88006cba8000 ti: ffff88006cbb0000 task.ti: ffff88006cbb0000
RIP: kasan_mem_to_shadow+0x1e/0x1f
Call Trace:
strcmp+0x28/0x70
get_node_by_name+0x66/0x99
gcov_event+0x4f/0x69e
gcov_enable_events+0x54/0x7b
gcov_fs_init+0xf8/0x134
do_one_initcall+0x1b2/0x288
kernel_init_freeable+0x467/0x580
kernel_init+0x15/0x18b
ret_from_fork+0x7c/0xb0
Kernel panic - not syncing: softlockup: hung tasks

Fix this by sticking cond_resched() in gcov_enable_events().

Signed-off-by: Andrey Ryabinin
Reported-by: Fengguang Wu
Cc: Peter Oberparleiter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Ryabinin
2015-04-17 21:04:08 +0800
230633d10 kernel/sysctl.c: detect overflows when converting to int ... Browse Code »

When converting unsigned long to int overflows may occur. These currently
are not detected when writing to the sysctl file system.

E.g. on a system where int has 32 bits and long has 64 bits
echo 0x800001234 > /proc/sys/kernel/threads-max
has the same effect as
echo 0x1234 > /proc/sys/kernel/threads-max

The patch adds the missing check in do_proc_dointvec_conv.

With the patch an overflow will result in an error EINVAL when writing to
the the sysctl file system.

Signed-off-by: Heinrich Schuchardt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heinrich Schuchardt
2015-04-17 21:04:08 +0800
6e399cd14 prctl: avoid using mmap_sem for exe_file serialization ... Browse Code »

Oleg cleverly suggested using xchg() to set the new mm->exe_file instead
of calling set_mm_exe_file() which requires some form of serialization --
mmap_sem in this case. For archs that do not have atomic rmw instructions
we still fallback to a spinlock alternative, so this should always be
safe. As such, we only need the mmap_sem for looking up the backing
vm_file, which can be done sharing the lock. Naturally, this means we
need to manually deal with both the new and old file reference counting,
and we need not worry about the MMF_EXE_FILE_CHANGED bits, which can
probably be deleted in the future anyway.

Signed-off-by: Davidlohr Bueso
Suggested-by: Oleg Nesterov
Acked-by: Oleg Nesterov
Reviewed-by: Konstantin Khlebnikov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2015-04-17 21:04:07 +0800
90f31d0ea mm: rcu-protected get_mm_exe_file() ... Browse Code »

This patch removes mm->mmap_sem from mm->exe_file read side.
Also it kills dup_mm_exe_file() and moves exe_file duplication into
dup_mmap() where both mmap_sems are locked.

[akpm@linux-foundation.org: fix comment typo]
Signed-off-by: Konstantin Khlebnikov
Cc: Davidlohr Bueso
Cc: Al Viro
Cc: Oleg Nesterov
Cc: "Paul E. McKenney"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2015-04-17 21:04:07 +0800
16db3d3f1 kernel/sysctl.c: threads-max observe limits ... Browse Code »

Users can change the maximum number of threads by writing to
/proc/sys/kernel/threads-max.

With the patch the value entered is checked against the same limits that
apply when fork_init is called.

Signed-off-by: Heinrich Schuchardt
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Guenter Roeck
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heinrich Schuchardt
2015-04-17 21:04:07 +0800
ac1b398de kernel/fork.c: avoid division by zero ... Browse Code »

PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.

E.g. architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero in the calculation of max_threads.

With 32-bit calculation there is no solution which delivers valid results
for all possible combinations of the parameters. The code is only called
once. Hence a 64-bit calculation can be used as solution.

[akpm@linux-foundation.org: use clamp_t(), per Oleg]
Signed-off-by: Heinrich Schuchardt
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Guenter Roeck
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heinrich Schuchardt
2015-04-17 21:04:07 +0800
ff691f6e0 kernel/fork.c: new function for max_threads ... Browse Code »

PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.

E.g. architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero in the calculation of max_threads.

With this patch the buggy code is moved to a separate function
set_max_threads. The error is not fixed.

After fixing the problem in a separate patch the new function can be
reused to adjust max_threads after adding or removing memory.

Argument mempages of function fork_init() is removed as totalram_pages is
an exported symbol.

The creation of separate patches for refactoring to a new function and for
fixing the logic was suggested by Ingo Molnar.

Signed-off-by: Heinrich Schuchardt
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Guenter Roeck
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heinrich Schuchardt
2015-04-17 21:04:06 +0800
3ea7f5e25 fork_init: update max_threads comment ... Browse Code »

The comment explaining what value max_threads is set to is outdated. The
maximum memory consumption ratio for thread structures was 1/2 until
February 2002, then it was briefly changed to 1/16 before being set to 1/8
which we still use today. The comment was never updated to reflect that
change, it's about time.

Signed-off-by: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jean Delvare
2015-04-17 21:04:06 +0800
35f71bc0a fork: report pid reservation failure properly ... Browse Code »

copy_process will report any failure in alloc_pid as ENOMEM currently
which is misleading because the pid allocation might fail not only when
the memory is short but also when the pid space is consumed already.

The current man page even mentions this case:

: EAGAIN
:
: A system-imposed limit on the number of threads was encountered.
: There are a number of limits that may trigger this error: the
: RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which
: limits the number of processes and threads for a real user ID, was
: reached; the kernel's system-wide limit on the number of processes
: and threads, /proc/sys/kernel/threads-max, was reached (see
: proc(5)); or the maximum number of PIDs, /proc/sys/kernel/pid_max,
: was reached (see proc(5)).

so the current behavior is also incorrect wrt. documentation. POSIX man
page also suggest returing EAGAIN when the process count limit is reached.

This patch simply propagates error code from alloc_pid and makes sure we
return -EAGAIN due to reservation failure. This will make behavior of
fork closer to both our documentation and POSIX.

alloc_pid might alsoo fail when the reaper in the pid namespace is dead
(the namespace basically disallows all new processes) and there is no
good error code which would match documented ones. We have traditionally
returned ENOMEM for this case which is misleading as well but as per
Eric W. Biederman this behavior is documented in man pid_namespaces(7)

: If the "init" process of a PID namespace terminates, the kernel
: terminates all of the processes in the namespace via a SIGKILL signal.
: This behavior reflects the fact that the "init" process is essential for
: the correct operation of a PID namespace. In this case, a subsequent
: fork(2) into this PID namespace will fail with the error ENOMEM; it is
: not possible to create a new processes in a PID namespace whose "init"
: process has terminated.

and introducing a new error code would be too risky so let's stick to
ENOMEM for this case.

Signed-off-by: Michal Hocko
Cc: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2015-04-17 21:04:06 +0800
69828dce7 signal: remove warning about using SI_TKILL in rt_[tg]sigqueueinfo ... Browse Code »

Sending SI_TKILL from rt_[tg]sigqueueinfo was deprecated, so now we issue
a warning on the first attempt of doing it. We use WARN_ON_ONCE, which is
not informative and, what is worse, taints the kernel, making the trinity
syscall fuzzer complain false-positively from time to time.

It does not look like we need this warning at all, because the behaviour
changed quite a long time ago (2.6.39), and if an application relies on
the old API, it gets EPERM anyway and can issue a warning by itself.

So let us zap the warning in kernel.

Signed-off-by: Vladimir Davydov
Acked-by: Oleg Nesterov
Cc: Richard Weinberger
Cc: "Paul E. McKenney"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vladimir Davydov
2015-04-17 21:04:06 +0800
64a4096c5 ptrace: ptrace_detach() can no longer race with SIGKILL ... Browse Code »

ptrace_detach() re-checks ->ptrace under tasklist lock and calls
release_task() if __ptrace_detach() returns true. This was needed because
the __TASK_TRACED tracee could be killed/untraced, and it could even pass
exit_notify() before we take tasklist_lock.

But this is no longer possible after 9899d11f6544 "ptrace: ensure
arch_ptrace/ptrace_request can never race with SIGKILL". We can turn
these checks into WARN_ON() and remove release_task().

While at it, document the setting of child->exit_code.

Signed-off-by: Oleg Nesterov
Cc: Pavel Labath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2015-04-17 21:04:06 +0800
b72c18699 ptrace: fix race between ptrace_resume() and wait_task_stopped() ... Browse Code »

ptrace_resume() is called when the tracee is still __TASK_TRACED. We set
tracee->exit_code and then wake_up_state() changes tracee->state. If the
tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
wrongly looks like another report from tracee.

This confuses debugger, and since wait_task_stopped() clears ->exit_code
the tracee can miss a signal.

Test-case:

#include
#include
#include
#include
#include
#include

int pid;

void *waiter(void *arg)
{
int stat;

for (;;) {
assert(pid == wait(&stat));
assert(WIFSTOPPED(stat));
if (WSTOPSIG(stat) == SIGHUP)
continue;

assert(WSTOPSIG(stat) == SIGCONT);
printf("ERR! extra/wrong report:%x\n", stat);
}
}

int main(void)
{
pthread_t thread;

pid = fork();
if (!pid) {
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
for (;;)
kill(getpid(), SIGHUP);
}

assert(pthread_create(&thread, NULL, waiter, NULL) == 0);

for (;;)
ptrace(PTRACE_CONT, pid, 0, SIGCONT);

return 0;
}

Note for stable: the bug is very old, but without 9899d11f6544 "ptrace:
ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
should use lock_task_sighand(child).

Signed-off-by: Oleg Nesterov
Reported-by: Pavel Labath
Tested-by: Pavel Labath
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2015-04-17 21:04:06 +0800
8053871d0 smp: Fix smp_call_function_single_async() locking ... Browse Code »

The current smp_function_call code suffers a number of problems, most
notably smp_call_function_single_async() is broken.

The problem is that flush_smp_call_function_queue() does csd_unlock()
_after_ calling csd->func(). This means that a caller cannot properly
synchronize the csd usage as it has to.

Change the code to release the csd before calling ->func() for the
async case, and put a WARN_ON_ONCE(csd->flags & CSD_FLAG_LOCK) in
smp_call_function_single_async() to warn us of improper serialization,
because any waiting there can results in deadlocks when called with
IRQs disabled.

Rename the (currently) unused WAIT flag to SYNCHRONOUS and (re)use it
such that we know what to do in flush_smp_call_function_queue().

Rework csd_{,un}lock() to use smp_load_acquire() / smp_store_release()
to avoid some full barriers while more clearly providing lock
semantics.

Finally move the csd maintenance out of generic_exec_single() into its
callers for clearer code.

Signed-off-by: Linus Torvalds
[ Added changelog. ]
Signed-off-by: Peter Zijlstra (Intel)
Cc: Frederic Weisbecker
Cc: Jens Axboe
Cc: Rafael David Tinoco
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/CA+55aFz492bzLFhdbKN-Hygjcreup7CjMEYk3nTSfRWjppz-OA@mail.gmail.com
Signed-off-by: Ingo Molnar

Linus Torvalds
2015-04-17 15:57:52 +0800
d7bc3197b lockdep: Make print_lock() robust against concurrent release ... Browse Code »

During sysrq's show-held-locks command it is possible that
hlock_class() returns NULL for a given lock. The result is then (after
the warning):

|BUG: unable to handle kernel NULL pointer dereference at 0000001c
|IP: [] get_usage_chars+0x5/0x100
|Call Trace:
| [] print_lock_name+0x23/0x60
| [] print_lock+0x5d/0x7e
| [] lockdep_print_held_locks+0x74/0xe0
| [] debug_show_all_locks+0x132/0x1b0
| [] sysrq_handle_showlocks+0x8/0x10

This *might* happen because the thread on the other CPU drops the lock
after we are looking ->lockdep_depth and ->held_locks points no longer
to a lock that is held.

The fix here is to simply ignore it and continue.

Reported-by: Andreas Messerschmid
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Sebastian Andrzej Siewior
Cc: Thomas Gleixner
Signed-off-by: Ingo Molnar

Peter Zijlstra
2015-04-17 15:42:14 +0800
725f9dcd5 bpf: fix two bugs in verification logic when accessing 'ctx' pointer ... Browse Code »

1.
first bug is a silly mistake. It broke tracing examples and prevented
simple bpf programs from loading.

In the following code:
if (insn->imm == 0 && BPF_SIZE(insn->code) == BPF_W) {
} else if (...) {
// this part should have been executed when
// insn->code == BPF_W and insn->imm != 0
}

Obviously it's not doing that. So simple instructions like:
r2 = *(u64 *)(r1 + 8)
will be rejected. Note the comments in the code around these branches
were and still valid and indicate the true intent.

Replace it with:
if (BPF_SIZE(insn->code) != BPF_W)
continue;

if (insn->imm == 0) {
} else if (...) {
// now this code will be executed when
// insn->code == BPF_W and insn->imm != 0
}

2.
second bug is more subtle.
If malicious code is using the same dest register as source register,
the checks designed to prevent the same instruction to be used with different
pointer types will fail to trigger, since we were assigning src_reg_type
when it was already overwritten by check_mem_access().
The fix is trivial. Just move line:
src_reg_type = regs[insn->src_reg].type;
before check_mem_access().
Add new 'access skb fields bad4' test to check this case.

Fixes: 9bac3d6d548e ("bpf: allow extended BPF programs access skb fields")
Signed-off-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Alexei Starovoitov
2015-04-17 02:08:49 +0800
c3de6317d bpf: fix verifier memory corruption ... Browse Code »

Due to missing bounds check the DAG pass of the BPF verifier can corrupt
the memory which can cause random crashes during program loading:

[8.449451] BUG: unable to handle kernel paging request at ffffffffffffffff
[8.451293] IP: [] kmem_cache_alloc_trace+0x8d/0x2f0
[8.452329] Oops: 0000 [#1] SMP
[8.452329] Call Trace:
[8.452329] [] bpf_check+0x852/0x2000
[8.452329] [] bpf_prog_load+0x1e4/0x310
[8.452329] [] ? might_fault+0x5f/0xb0
[8.452329] [] SyS_bpf+0x806/0xa30

Fixes: f1bca824dabb ("bpf: add search pruning optimization to verifier")
Signed-off-by: Alexei Starovoitov
Acked-by: Hannes Frederic Sowa
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Alexei Starovoitov
2015-04-17 00:06:11 +0800

16 Apr, 2015

13 commits

84fce9db4 tracing: Fix incorrect enabling of trace events by boot cmdline ... Browse Code »

There is a problem that trace events are not properly enabled with
boot cmdline. The problem is that if we pass "trace_event=kmem:mm_page_alloc"
to the boot cmdline, it enables all kmem trace events, and not just
the page_alloc event.

This is caused by the parsing mechanism. When we parse the cmdline, the buffer
contents is modified due to tokenization. And, if we use this buffer
again, we will get the wrong result.

Unfortunately, this buffer is be accessed three times to set trace events
properly at boot time. So, we need to handle this situation.

There is already code handling ",", but we need another for ":".
This patch adds it.

Link: http://lkml.kernel.org/r/1429159484-22977-1-git-send-email-iamjoonsoo.kim@lge.com

Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Joonsoo Kim
[ added missing return ret; ]
Signed-off-by: Steven Rostedt

Joonsoo Kim
2015-04-16 21:44:07 +0800
ef99b88b1 tracing: Handle ftrace_dump() atomic context in graph_trace_open() ... Browse Code »

graph_trace_open() can be called in atomic context from ftrace_dump().
Use GFP_ATOMIC for the memory allocations when that's the case, in order
to avoid the following splat.

BUG: sleeping function called from invalid context at mm/slab.c:2849
in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/0
Backtrace:
..
[] (__might_sleep) from [] (kmem_cache_alloc_trace+0x160/0x238)
r7:87800040 r6:000080d0 r5:810d16e8 r4:000080d0
[] (kmem_cache_alloc_trace) from [] (graph_trace_open+0x30/0xd0)
r10:00000100 r9:809171a8 r8:00008e28 r7:810d16f0 r6:00000001 r5:810d16e8
r4:810d16f0
[] (graph_trace_open) from [] (trace_init_global_iter+0x50/0x9c)
r8:00008e28 r7:808c853c r6:00000001 r5:810d16e8 r4:810d16f0 r3:800cbd30
[] (trace_init_global_iter) from [] (ftrace_dump+0x90/0x2ec)
r4:810d2580 r3:00000000
[] (ftrace_dump) from [] (sysrq_ftrace_dump+0x1c/0x20)
r10:00000100 r9:809171a8 r8:808f6e7c r7:00000001 r6:00000007 r5:0000007a
r4:808d5394
[] (sysrq_ftrace_dump) from [] (return_to_handler+0x0/0x18)
[] (__handle_sysrq) from [] (return_to_handler+0x0/0x18)
r8:808c8100 r7:808c8444 r6:00000101 r5:00000010 r4:84eb3210
[] (handle_sysrq) from [] (return_to_handler+0x0/0x18)
[] (pl011_int) from [] (return_to_handler+0x0/0x18)
r10:809171bc r9:809171a8 r8:00000001 r7:00000026 r6:808c6000 r5:84f01e60
r4:8454fe00
[] (handle_irq_event_percpu) from [] (handle_irq_event+0x4c/0x6c)
r10:808c7ef0 r9:87283e00 r8:00000001 r7:00000000 r6:8454fe00 r5:84f01e60
r4:84f01e00
[] (handle_irq_event) from [] (handle_fasteoi_irq+0xf0/0x1ac)
r6:808f52a4 r5:84f01e60 r4:84f01e00 r3:00000000
[] (handle_fasteoi_irq) from [] (generic_handle_irq+0x3c/0x4c)
r6:00000026 r5:00000000 r4:00000026 r3:8007a938
[] (generic_handle_irq) from [] (__handle_domain_irq+0x8c/0xfc)
r4:808c1e38 r3:0000002e
[] (__handle_domain_irq) from [] (gic_handle_irq+0x34/0x6c)
r10:80917748 r9:00000001 r8:88802100 r7:808c7ef0 r6:808c8fb0 r5:00000015
r4:8880210c r3:808c7ef0
[] (gic_handle_irq) from [] (__irq_svc+0x44/0x7c)

Link: http://lkml.kernel.org/r/1428953721-31349-1-git-send-email-rabin@rab.in
Link: http://lkml.kernel.org/r/1428957012-2319-1-git-send-email-rabin@rab.in

Cc: stable@vger.kernel.org # 3.13+
Signed-off-by: Rabin Vincent
Signed-off-by: Steven Rostedt

Rabin Vincent
2015-04-16 21:32:17 +0800
eea3a0026 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge second patchbomb from Andrew Morton:

- the rest of MM

- various misc bits

- add ability to run /sbin/reboot at reboot time

- printk/vsprintf changes

- fiddle with seq_printf() return value

* akpm: (114 commits)
parisc: remove use of seq_printf return value
lru_cache: remove use of seq_printf return value
tracing: remove use of seq_printf return value
cgroup: remove use of seq_printf return value
proc: remove use of seq_printf return value
s390: remove use of seq_printf return value
cris fasttimer: remove use of seq_printf return value
cris: remove use of seq_printf return value
openrisc: remove use of seq_printf return value
ARM: plat-pxa: remove use of seq_printf return value
nios2: cpuinfo: remove use of seq_printf return value
microblaze: mb: remove use of seq_printf return value
ipc: remove use of seq_printf return value
rtc: remove use of seq_printf return value
power: wakeup: remove use of seq_printf return value
x86: mtrr: if: remove use of seq_printf return value
linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK
MAINTAINERS: CREDITS: remove Stefano Brivio from B43
.mailmap: add Ricardo Ribalda
CREDITS: add Ricardo Ribalda Delgado
...

Linus Torvalds
2015-04-16 07:39:15 +0800
962e3707d tracing: remove use of seq_printf return value ... Browse Code »

The seq_printf return value, because it's frequently misused,
will eventually be converted to void.

See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
seq_has_overflowed() and make public")

Miscellanea:

o Remove unused return value from trace_lookup_stack

Signed-off-by: Joe Perches
Acked-by: Steven Rostedt
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2015-04-16 07:35:25 +0800
94ff212d0 cgroup: remove use of seq_printf return value ... Browse Code »

The seq_printf return value, because it's frequently misused,
will eventually be converted to void.

See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
seq_has_overflowed() and make public")

Signed-off-by: Joe Perches
Acked-by: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2015-04-16 07:35:25 +0800
7a54f46b3 kernel/reboot.c: add orderly_reboot for graceful reboot ... Browse Code »

The kernel has orderly_poweroff which allows the kernel to initiate a
graceful shutdown of userspace, by running /sbin/poweroff. This adds
orderly_reboot that will cause userspace to shut itself down by calling
/sbin/reboot.

This will be used for shutdown initiated by a system controller on
platforms that do not use ACPI.

orderly_reboot() should be used when the system wants to allow userspace
to gracefully shut itself down. For cases where the system may imminently
catch on fire, the existing emergency_restart() provides an immediate
reboot without involving userspace.

Signed-off-by: Joel Stanley
Cc: Fabian Frederick
Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Cc: Rusty Russell
Cc: Jeremy Kerr
Cc: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joel Stanley
2015-04-16 07:35:23 +0800
972fae699 kernel/hung_task.c: change hung_task.c to use for_each_process_thread() ... Browse Code »

In check_hung_uninterruptible_tasks() avoid the use of deprecated
while_each_thread().

The "max_count" logic will prevent a livelock - see commit 0c740d0a
("introduce for_each_thread() to replace the buggy while_each_thread()").
Having said this let's use for_each_process_thread().

Signed-off-by: Aaron Tomlin
Acked-by: Oleg Nesterov
Cc: David Rientjes
Cc: Dave Wysochanski
Cc: Aaron Tomlin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Aaron Tomlin
2015-04-16 07:35:22 +0800
96831c0a6 kernel/resource.c: remove deprecated __check_region() and friends ... Browse Code »

All users of __check_region(), check_region(), and check_mem_region() are
gone. We got rid of the last user in v4.0-rc1. Remove them.

bloat-o-meter on x86_64 shows:

add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-102 (-102)
function old new delta
__kstrtab___check_region 15 - -15
__ksymtab___check_region 16 - -16
__check_region 71 - -71

Signed-off-by: Jakub Sitnicki
Cc: Bjorn Helgaas
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jakub Sitnicki
2015-04-16 07:35:22 +0800
2813893f8 kernel: conditionally support non-root users, groups and capabilities ... Browse Code »

There are a lot of embedded systems that run most or all of their
functionality in init, running as root:root. For these systems,
supporting multiple users is not necessary.

This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
non-root users, non-root groups, and capabilities optional. It is enabled
under CONFIG_EXPERT menu.

When this symbol is not defined, UID and GID are zero in any possible case
and processes always have all capabilities.

The following syscalls are compiled out: setuid, setregid, setgid,
setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
getgroups, setfsuid, setfsgid, capget, capset.

Also, groups.c is compiled out completely.

In kernel/capability.c, capable function was moved in order to avoid
adding two ifdef blocks.

This change saves about 25 KB on a defconfig build. The most minimal
kernels have total text sizes in the high hundreds of kB rather than
low MB. (The 25k goes down a bit with allnoconfig, but not that much.

The kernel was booted in Qemu. All the common functionalities work.
Adding users/groups is not possible, failing with -ENOSYS.

Bloat-o-meter output:
add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Iulia Manda
Reviewed-by: Josh Triplett
Acked-by: Geert Uytterhoeven
Tested-by: Paul E. McKenney
Reviewed-by: Paul E. McKenney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Iulia Manda
2015-04-16 07:35:22 +0800
5bbe3547a mm: allow compaction of unevictable pages ... Browse Code »

Currently, pages which are marked as unevictable are protected from
compaction, but not from other types of migration. The POSIX real time
extension explicitly states that mlock() will prevent a major page
fault, but the spirit of this is that mlock() should give a process the
ability to control sources of latency, including minor page faults.
However, the mlock manpage only explicitly says that a locked page will
not be written to swap and this can cause some confusion. The
compaction code today does not give a developer who wants to avoid swap
but wants to have large contiguous areas available any method to achieve
this state. This patch introduces a sysctl for controlling compaction
behavior with respect to the unevictable lru. Users who demand no page
faults after a page is present can set compact_unevictable_allowed to 0
and users who need the large contiguous areas can enable compaction on
locked memory by leaving the default value of 1.

To illustrate this problem I wrote a quick test program that mmaps a
large number of 1MB files filled with random data. These maps are
created locked and read only. Then every other mmap is unmapped and I
attempt to allocate huge pages to the static huge page pool. When the
compact_unevictable_allowed sysctl is 0, I cannot allocate hugepages
after fragmenting memory. When the value is set to 1, allocations
succeed.

Signed-off-by: Eric B Munson
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Acked-by: Christoph Lameter
Acked-by: David Rientjes
Acked-by: Rik van Riel
Cc: Vlastimil Babka
Cc: Thomas Gleixner
Cc: Christoph Lameter
Cc: Peter Zijlstra
Cc: Mel Gorman
Cc: David Rientjes
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric B Munson
2015-04-16 07:35:17 +0800
adbe427b9 memcg: zap mem_cgroup_lookup() ... Browse Code »

mem_cgroup_lookup() is a wrapper around mem_cgroup_from_id(), which
checks that id != 0 before issuing the function call. Today, there is
no point in this additional check apart from optimization, because there
is no css with id 0 to css_from_id.

Signed-off-by: Vladimir Davydov
Acked-by: Michal Hocko
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vladimir Davydov
2015-04-16 07:35:16 +0800
fa2e5c073 Merge branch 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc ... Browse Code »

Pull exec domain removal from Richard Weinberger:
"This series removes execution domain support from Linux.

The idea behind exec domains was to support different ABIs. The
feature was never complete nor stable. Let's rip it out and make the
kernel signal handling code less complicated"

* 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc: (27 commits)
arm64: Removed unused variable
sparc: Fix execution domain removal
Remove rest of exec domains.
arch: Remove exec_domain from remaining archs
arc: Remove signal translation and exec_domain
xtensa: Remove signal translation and exec_domain
xtensa: Autogenerate offsets in struct thread_info
x86: Remove signal translation and exec_domain
unicore32: Remove signal translation and exec_domain
um: Remove signal translation and exec_domain
tile: Remove signal translation and exec_domain
sparc: Remove signal translation and exec_domain
sh: Remove signal translation and exec_domain
s390: Remove signal translation and exec_domain
mn10300: Remove signal translation and exec_domain
microblaze: Remove signal translation and exec_domain
m68k: Remove signal translation and exec_domain
m32r: Remove signal translation and exec_domain
m32r: Autogenerate offsets in struct thread_info
frv: Remove signal translation and exec_domain
...

Linus Torvalds
2015-04-16 04:53:55 +0800
fa927894b Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull second vfs update from Al Viro:
"Now that net-next went in... Here's the next big chunk - killing
->aio_read() and ->aio_write().

There'll be one more pile today (direct_IO changes and
generic_write_checks() cleanups/fixes), but I'd prefer to keep that
one separate"

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
->aio_read and ->aio_write removed
pcm: another weird API abuse
infinibad: weird APIs switched to ->write_iter()
kill do_sync_read/do_sync_write
fuse: use iov_iter_get_pages() for non-splice path
fuse: switch to ->read_iter/->write_iter
switch drivers/char/mem.c to ->read_iter/->write_iter
make new_sync_{read,write}() static
coredump: accept any write method
switch /dev/loop to vfs_iter_write()
serial2002: switch to __vfs_read/__vfs_write
ashmem: use __vfs_read()
export __vfs_read()
autofs: switch to __vfs_write()
new helper: __vfs_write()
switch hugetlbfs to ->read_iter()
coda: switch to ->read_iter/->write_iter
ncpfs: switch to ->read_iter/->write_iter
net/9p: remove (now-)unused helpers
p9_client_attach(): set fid->uid correctly
...

Linus Torvalds
2015-04-16 04:22:56 +0800