Doug / smarc-fsl-linux-kernel | Embedian Git Server

07 Dec, 2012

2 commits

54d1ae492 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux ... Browse Code »

Pull module signing fixes from Rusty Russell:
"David gave me these a month ago, during my git workflow churn :("

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
ASN.1: Fix an indefinite length skip error
MODSIGN: Don't use enum-type bitfields in module signature info block

Linus Torvalds
2012-12-07 00:29:08 +0800
cfd1f032f Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull watchdog fix from Thomas Gleixner:
"Trivial CPU hotplug regression fix for the watchdog code"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog: Fix CPU hotplug regression

Linus Torvalds
2012-12-07 00:27:11 +0800

05 Dec, 2012

4 commits

12e130b04 MODSIGN: Don't use enum-type bitfields in module signature info block ... Browse Code »

Don't use enum-type bitfields in the module signature info block as we can't be
certain how the compiler will handle them. As I understand it, it is arch
dependent, and it is possible for the compiler to rearrange them based on
endianness and to insert a byte of padding to pad the three enums out to four
bytes.

Instead use u8 fields for these, which the compiler should emit in the right
order without padding.

Signed-off-by: David Howells
Signed-off-by: Rusty Russell

David Howells
2012-12-05 08:57:24 +0800
8d4516904 watchdog: Fix CPU hotplug regression ... Browse Code »

Norbert reported:
"3.7-rc6 booted with nmi_watchdog=0 fails to suspend to RAM or
offline CPUs. It's reproducable with a KVM guest and physical
system."

The reason is that commit bcd951cf(watchdog: Use hotplug thread
infrastructure) missed to take this into account. So the cpu offline
code gets stuck in the teardown function because it accesses non
initialized data structures.

Add a check for watchdog_enabled into that path to cure the issue.

Reported-and-tested-by: Norbert Warmuth
Tested-by: Joseph Salisbury
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1211231033230.2701@ionos
Link: http://bugs.launchpad.net/bugs/1079534
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2012-12-05 02:56:59 +0800
df2fc246c Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux ... Browse Code »

Pull module fixes from Rusty Russell:
"Module signing build fixes for blackfin and metag"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
modsign: add symbol prefix to certificate list
linux/kernel.h: define SYMBOL_PREFIX

Linus Torvalds
2012-12-05 01:32:12 +0800
ca50496eb Merge branch 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq ... Browse Code »

Pull workqueue fixes from Tejun Heo:
"So, safe fixes my ass.

Commit 8852aac25e79 ("workqueue: mod_delayed_work_on() shouldn't queue
timer on 0 delay") had the side-effect of performing delayed_work
sanity checks even when @delay is 0, which should be fine for any sane
use cases.

Unfortunately, megaraid was being overly ingenious. It seemingly
wanted to use cancel_delayed_work_sync() before cancel_work_sync() was
introduced, but didn't want to waste the space for full delayed_work
as it was only going to use 0 @delay. So, it only allocated space for
struct work_struct and then cast it to struct delayed_work and passed
it into delayed_work functions - truly awesome engineering tradeoff to
save some bytes.

Xiaotian fixed it by making megraid allocate full delayed_work for
now. It should be converted to use work_struct and cancel_work_sync()
but I think we better do that after 3.7.

I added another commit to change BUG_ON()s in __queue_delayed_work()
to WARN_ON_ONCE()s so that the kernel doesn't crash even if there are
more such abuses."

* 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: convert BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s
megaraid: fix BUG_ON() from incorrect use of delayed work

Linus Torvalds
2012-12-05 01:02:45 +0800

04 Dec, 2012

2 commits

fc4b514f2 workqueue: convert BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s ... Browse Code »

8852aac25e ("workqueue: mod_delayed_work_on() shouldn't queue timer on
0 delay") unexpectedly uncovered a very nasty abuse of delayed_work in
megaraid - it allocated work_struct, casted it to delayed_work and
then pass that into queue_delayed_work().

Previously, this was okay because 0 @delay short-circuited to
queue_work() before doing anything with delayed_work. 8852aac25e
moved 0 @delay test into __queue_delayed_work() after sanity check on
delayed_work making megaraid trigger BUG_ON().

Although megaraid is already fixed by c1d390d8e6 ("megaraid: fix
BUG_ON() from incorrect use of delayed work"), this patch converts
BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s so that such
abusers, if there are more, trigger warning but don't crash the
machine.

Signed-off-by: Tejun Heo
Cc: Xiaotian Feng

Tejun Heo
2012-12-04 23:58:47 +0800
fd8ef1173 Revert "sched, autogroup: Stop going ahead if autogroup is disabled" ... Browse Code »

This reverts commit 800d4d30c8f20bd728e5741a3b77c4859a613f7c.

Between commits 8323f26ce342 ("sched: Fix race in task_group()") and
800d4d30c8f2 ("sched, autogroup: Stop going ahead if autogroup is
disabled"), autogroup is a wreck.

With both applied, all you have to do to crash a box is disable
autogroup during boot up, then reboot.. boom, NULL pointer dereference
due to commit 800d4d30c8f2 not allowing autogroup to move things, and
commit 8323f26ce342 making that the only way to switch runqueues:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [] effective_load.isra.43+0x50/0x90
Pid: 7047, comm: systemd-user-se Not tainted 3.6.8-smp #7 MEDIONPC MS-7502/MS-7502
RIP: effective_load.isra.43+0x50/0x90
Process systemd-user-se (pid: 7047, threadinfo ffff880221dde000, task ffff88022618b3a0)
Call Trace:
select_task_rq_fair+0x255/0x780
try_to_wake_up+0x156/0x2c0
wake_up_state+0xb/0x10
signal_wake_up+0x28/0x40
complete_signal+0x1d6/0x250
__send_signal+0x170/0x310
send_signal+0x40/0x80
do_send_sig_info+0x47/0x90
group_send_sig_info+0x4a/0x70
kill_pid_info+0x3a/0x60
sys_kill+0x97/0x1a0
? vfs_read+0x120/0x160
? sys_read+0x45/0x90
system_call_fastpath+0x16/0x1b
Code: 49 0f af 41 50 31 d2 49 f7 f0 48 83 f8 01 48 0f 46 c6 48 2b 07 48 8b bf 40 01 00 00 48 85 ff 74 3a 45 31 c0 48 8b 8f 50 01 00 00 8b 11 4c 8b 89 80 00 00 00 49 89 d2 48 01 d0 45 8b 59 58 4c
RIP [] effective_load.isra.43+0x50/0x90
RSP
CR2: 0000000000000000

Signed-off-by: Mike Galbraith
Acked-by: Ingo Molnar
Cc: Yong Zhang
Cc: Peter Zijlstra
Cc: stable@vger.kernel.org # 2.6.39+
Signed-off-by: Linus Torvalds

Mike Galbraith
2012-12-04 03:10:24 +0800

03 Dec, 2012

1 commit

84ecfd15f modsign: add symbol prefix to certificate list ... Browse Code »

Add the arch symbol prefix (if applicable) to the asm definition of
modsign_certificate_list and modsign_certificate_list_end. This uses the
recently defined SYMBOL_PREFIX which is derived from
CONFIG_SYMBOL_PREFIX.

This fixes the build of module signing on the blackfin and metag
architectures.

Signed-off-by: James Hogan
Cc: Rusty Russell
Cc: David Howells
Cc: Mike Frysinger
Signed-off-by: Rusty Russell

James Hogan
2012-12-03 10:36:25 +0800

02 Dec, 2012

3 commits

8852aac25 workqueue: mod_delayed_work_on() shouldn't queue timer on 0 delay ... Browse Code »

8376fe22c7 ("workqueue: implement mod_delayed_work[_on]()")
implemented mod_delayed_work[_on]() using the improved
try_to_grab_pending(). The function is later used, among others, to
replace [__]candel_delayed_work() + queue_delayed_work() combinations.

Unfortunately, a delayed_work item w/ zero @delay is handled slightly
differently by mod_delayed_work_on() compared to
queue_delayed_work_on(). The latter skips timer altogether and
directly queues it using queue_work_on() while the former schedules
timer which will expire on the closest tick. This means, when @delay
is zero, that [__]cancel_delayed_work() + queue_delayed_work_on()
makes the target item immediately executable while
mod_delayed_work_on() may induce delay of upto a full tick.

This somewhat subtle difference breaks some of the converted users.
e.g. block queue plugging uses delayed_work for deferred processing
and uses mod_delayed_work_on() when the queue needs to be immediately
unplugged. The above problem manifested as noticeably higher number
of context switches under certain circumstances.

The difference in behavior was caused by missing special case handling
for 0 delay in mod_delayed_work_on() compared to
queue_delayed_work_on(). Joonsoo Kim posted a patch to add it -
("workqueue: optimize mod_delayed_work_on() when @delay == 0")[1].
The patch was queued for 3.8 but it was described as optimization and
I missed that it was a correctness issue.

As both queue_delayed_work_on() and mod_delayed_work_on() use
__queue_delayed_work() for queueing, it seems that the better approach
is to move the 0 delay special handling to the function instead of
duplicating it in mod_delayed_work_on().

Fix the problem by moving 0 delay special case handling from
queue_delayed_work_on() to __queue_delayed_work(). This replaces
Joonsoo's patch.

[1] http://thread.gmane.org/gmane.linux.kernel/1379011/focus=1379012

Signed-off-by: Tejun Heo
Reported-and-tested-by: Anders Kaseorg
Reported-and-tested-by: Zlatko Calusic
LKML-Reference:
LKML-Reference:
Cc: Joonsoo Kim

Tejun Heo
2012-12-02 08:43:18 +0800
412d32e6c workqueue: exit rescuer_thread() as TASK_RUNNING ... Browse Code »

A rescue thread exiting TASK_INTERRUPTIBLE can lead to a task scheduling
off, never to be seen again. In the case where this occurred, an exiting
thread hit reiserfs homebrew conditional resched while holding a mutex,
bringing the box to its knees.

PID: 18105 TASK: ffff8807fd412180 CPU: 5 COMMAND: "kdmflush"
#0 [ffff8808157e7670] schedule at ffffffff8143f489
#1 [ffff8808157e77b8] reiserfs_get_block at ffffffffa038ab2d [reiserfs]
#2 [ffff8808157e79a8] __block_write_begin at ffffffff8117fb14
#3 [ffff8808157e7a98] reiserfs_write_begin at ffffffffa0388695 [reiserfs]
#4 [ffff8808157e7ad8] generic_perform_write at ffffffff810ee9e2
#5 [ffff8808157e7b58] generic_file_buffered_write at ffffffff810eeb41
#6 [ffff8808157e7ba8] __generic_file_aio_write at ffffffff810f1a3a
#7 [ffff8808157e7c58] generic_file_aio_write at ffffffff810f1c88
#8 [ffff8808157e7cc8] do_sync_write at ffffffff8114f850
#9 [ffff8808157e7dd8] do_acct_process at ffffffff810a268f
[exception RIP: kernel_thread_helper]
RIP: ffffffff8144a5c0 RSP: ffff8808157e7f58 RFLAGS: 00000202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8107af60 RDI: ffff8803ee491d18
RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

Signed-off-by: Mike Galbraith
Signed-off-by: Tejun Heo
Cc: stable@vger.kernel.org

Mike Galbraith
2012-12-02 07:56:42 +0800
455e987c0 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf fixes from Ingo Molnar:
"This is mostly about unbreaking architectures that took the UAPI
changes in the v3.7 cycle, plus misc fixes."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf kvm: Fix building perf kvm on non x86 arches
perf kvm: Rename perf_kvm to perf_kvm_stat
perf: Make perf build for x86 with UAPI disintegration applied
perf powerpc: Use uapi/unistd.h to fix build error
tools: Pass the target in descend
tools: Honour the O= flag when tool build called from a higher Makefile
tools: Define a Makefile function to do subdir processing
x86: Export asm/{svm.h,vmx.h,perf_regs.h}
perf tools: Fix strbuf_addf() when the buffer needs to grow
perf header: Fix numa topology printing
perf, powerpc: Fix hw breakpoints returning -ENOSPC

Linus Torvalds
2012-12-02 05:07:48 +0800

27 Nov, 2012

2 commits

aa10990e0 futex: avoid wake_futex() for a PI futex_q ... Browse Code »

Dave Jones reported a bug with futex_lock_pi() that his trinity test
exposed. Sometime between queue_me() and taking the q.lock_ptr, the
lock_ptr became NULL, resulting in a crash.

While futex_wake() is careful to not call wake_futex() on futex_q's with
a pi_state or an rt_waiter (which are either waiting for a
futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and
futex_requeue() do not perform the same test.

Update futex_wake_op() and futex_requeue() to test for q.pi_state and
q.rt_waiter and abort with -EINVAL if detected. To ensure any future
breakage is caught, add a WARN() to wake_futex() if the same condition
is true.

This fix has seen 3 hours of testing with "trinity -c futex" on an
x86_64 VM with 4 CPUS.

[akpm@linux-foundation.org: tidy up the WARN()]
Signed-off-by: Darren Hart
Reported-by: Dave Jones
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: John Kacur
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Darren Hart
2012-11-27 09:41:24 +0800
8ffeb9b0e watchdog: using u64 in get_sample_period() ... Browse Code »

In get_sample_period(), unsigned long is not enough:

watchdog_thresh * 2 * (NSEC_PER_SEC / 5)

case1:
watchdog_thresh is 10 by default, the sample value will be: 0xEE6B2800

case2:
set watchdog_thresh is 20, the sample value will be: 0x1 DCD6 5000

In case2, we need use u64 to express the sample period. Otherwise,
changing the threshold thru proc often can not be successful.

Signed-off-by: liu chuansheng
Acked-by: Don Zickus
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chuansheng Liu
2012-11-27 09:41:24 +0800

13 Nov, 2012

1 commit

b0db954c0 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull futex fix from Thomas Gleixner:
"Single fix for a long standing futex race when taking over a futex
whose owner died. You can end up with two owners, which violates
quite some rules."

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Handle futex_pi OWNER_DIED take over correctly

Linus Torvalds
2012-11-13 09:02:21 +0800

01 Nov, 2012

1 commit

59fa62451 futex: Handle futex_pi OWNER_DIED take over correctly ... Browse Code »

Siddhesh analyzed a failure in the take over of pi futexes in case the
owner died and provided a workaround.
See: http://sourceware.org/bugzilla/show_bug.cgi?id=14076

The detailed problem analysis shows:

Futex F is initialized with PTHREAD_PRIO_INHERIT and
PTHREAD_MUTEX_ROBUST_NP attributes.

T1 lock_futex_pi(F);

T2 lock_futex_pi(F);
--> T2 blocks on the futex and creates pi_state which is associated
to T1.

T1 exits
--> exit_robust_list() runs
--> Futex F userspace value TID field is set to 0 and
FUTEX_OWNER_DIED bit is set.

T3 lock_futex_pi(F);
--> Succeeds due to the check for F's userspace TID field == 0
--> Claims ownership of the futex and sets its own TID into the
userspace TID field of futex F
--> returns to user space

T1 --> exit_pi_state_list()
--> Transfers pi_state to waiter T2 and wakes T2 via
rt_mutex_unlock(&pi_state->mutex)

T2 --> acquires pi_state->mutex and gains real ownership of the
pi_state
--> Claims ownership of the futex and sets its own TID into the
userspace TID field of futex F
--> returns to user space

T3 --> observes inconsistent state

This problem is independent of UP/SMP, preemptible/non preemptible
kernels, or process shared vs. private. The only difference is that
certain configurations are more likely to expose it.

So as Siddhesh correctly analyzed the following check in
futex_lock_pi_atomic() is the culprit:

if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {

We check the userspace value for a TID value of 0 and take over the
futex unconditionally if that's true.

AFAICT this check is there as it is correct for a different corner
case of futexes: the WAITERS bit became stale.

Now the proposed change

- if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
+ if (unlikely(ownerdied ||
+ !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) {

solves the problem, but it's not obvious why and it wreckages the
"stale WAITERS bit" case.

What happens is, that due to the WAITERS bit being set (T2 is blocked
on that futex) it enforces T3 to go through lookup_pi_state(), which
in the above case returns an existing pi_state and therefor forces T3
to legitimately fight with T2 over the ownership of the pi_state (via
pi_state->mutex). Probelm solved!

Though that does not work for the "WAITERS bit is stale" problem
because if lookup_pi_state() does not find existing pi_state it
returns -ERSCH (due to TID == 0) which causes futex_lock_pi() to
return -ESRCH to user space because the OWNER_DIED bit is not set.

Now there is a different solution to that problem. Do not look at the
user space value at all and enforce a lookup of possibly available
pi_state. If pi_state can be found, then the new incoming locker T3
blocks on that pi_state and legitimately races with T2 to acquire the
rt_mutex and the pi_state and therefor the proper ownership of the
user space futex.

lookup_pi_state() has the correct order of checks. It first tries to
find a pi_state associated with the user space futex and only if that
fails it checks for futex TID value = 0. If no pi_state is available
nothing can create new state at that point because this happens with
the hash bucket lock held.

So the above scenario changes to:

T1 lock_futex_pi(F);

T2 lock_futex_pi(F);
--> T2 blocks on the futex and creates pi_state which is associated
to T1.

T1 exits
--> exit_robust_list() runs
--> Futex F userspace value TID field is set to 0 and
FUTEX_OWNER_DIED bit is set.

T3 lock_futex_pi(F);
--> Finds pi_state and blocks on pi_state->rt_mutex

T1 --> exit_pi_state_list()
--> Transfers pi_state to waiter T2 and wakes it via
rt_mutex_unlock(&pi_state->mutex)

T2 --> acquires pi_state->mutex and gains ownership of the pi_state
--> Claims ownership of the futex and sets its own TID into the
userspace TID field of futex F
--> returns to user space

This covers all gazillion points on which T3 might come in between
T1's exit_robust_list() clearing the TID field and T2 fixing it up. It
also solves the "WAITERS bit stale" problem by forcing the take over.

Another benefit of changing the code this way is that it makes it less
dependent on untrusted user space values and therefor minimizes the
possible wreckage which might be inflicted.

As usual after staring for too long at the futex code my brain hurts
so much that I really want to ditch that whole optimization of
avoiding the syscall for the non contended case for PI futexes and rip
out the maze of corner case handling code. Unfortunately we can't as
user space relies on that existing behaviour, but at least thinking
about it helps me to preserve my mental sanity. Maybe we should
nevertheless :)

Reported-and-tested-by: Siddhesh Poyarekar
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1210232138540.2756@ionos
Acked-by: Darren Hart
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2012-11-01 19:06:54 +0800

31 Oct, 2012

1 commit

59ef28b1f module: fix out-by-one error in kallsyms ... Browse Code »

Masaki found and patched a kallsyms issue: the last symbol in a
module's symtab wasn't transferred. This is because we manually copy
the zero'th entry (which is always empty) then copy the rest in a loop
starting at 1, though from src[0]. His fix was minimal, I prefer to
rewrite the loops in more standard form.

There are two loops: one to get the size, and one to copy. Make these
identical: always count entry 0 and any defined symbol in an allocated
non-init section.

This bug exists since the following commit was introduced.
module: reduce symbol table for loaded modules (v2)
commit: 4a4962263f07d14660849ec134ee42b63e95ea9a

LKML: http://lkml.org/lkml/2012/10/24/27
Reported-by: Masaki Kimura
Cc: stable@kernel.org

Rusty Russell
2012-10-31 11:26:37 +0800

30 Oct, 2012

1 commit

0d855354e perf, powerpc: Fix hw breakpoints returning -ENOSPC ... Browse Code »

I've been trying to get hardware breakpoints with perf to work
on POWER7 but I'm getting the following:

% perf record -e mem:0x10000000 true

Error: sys_perf_event_open() syscall returned with 28 (No space left on device). /bin/dmesg may provide additional information.

Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?

true: Terminated

(FWIW adding -a and it works fine)

Debugging it seems that __reserve_bp_slot() is returning ENOSPC
because it thinks there are no free breakpoint slots on this
CPU.

I have a 2 CPUs, so perf userspace is doing two perf_event_open
syscalls to add a counter to each CPU [1]. The first syscall
succeeds but the second is failing.

On this second syscall, fetch_bp_busy_slots() sets slots.pinned
to be 1, despite there being no breakpoint on this CPU. This is
because the call the task_bp_pinned, checks all CPUs, rather
than just the current CPU. POWER7 only has one hardware
breakpoint per CPU (ie. HBP_NUM=1), so we return ENOSPC.

The following patch fixes this by checking the associated CPU
for each breakpoint in task_bp_pinned. I'm not familiar with
this code, so it's provided as a reference to the above issue.

Signed-off-by: Michael Neuling
Acked-by: Peter Zijlstra
Cc: Michael Ellerman
Cc: Jovi Zhang
Cc: K Prasad
Link: http://lkml.kernel.org/r/1351268936-2956-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar

Michael Neuling
2012-10-30 17:07:58 +0800

26 Oct, 2012

3 commits

2ab3f29dd Merge branch 'akpm' (Andrew's fixes) ... Browse Code »

Merge misc fixes from Andrew Morton:
"18 total. 15 fixes and some updates to a device_cgroup patchset which
bring it up to date with the version which I should have merged in the
first place."

* emailed patches from Andrew Morton : (18 patches)
fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check
gen_init_cpio: avoid stack overflow when expanding
drivers/rtc/rtc-imxdi.c: add missing spin lock initialization
mm, numa: avoid setting zone_reclaim_mode unless a node is sufficiently distant
pidns: limit the nesting depth of pid namespaces
drivers/dma/dw_dmac: make driver's endianness configurable
mm/mmu_notifier: allocate mmu_notifier in advance
tools/testing/selftests/epoll/test_epoll.c: fix build
UAPI: fix tools/vm/page-types.c
mm/page_alloc.c:alloc_contig_range(): return early for err path
rbtree: include linux/compiler.h for definition of __always_inline
genalloc: stop crashing the system when destroying a pool
backlight: ili9320: add missing SPI dependency
device_cgroup: add proper checking when changing default behavior
device_cgroup: stop using simple_strtoul()
device_cgroup: rename deny_all to behavior
cgroup: fix invalid rcu dereference
mm: fix XFS oops due to dirty pages without buffers on s390

Linus Torvalds
2012-10-26 07:05:57 +0800
2008713c7 Makefile: Documentation for external tool should be correct ... Browse Code »

If one includes documentation for an external tool, it should be
correct. This is not:

1. Overriding the input to rngd should typically be neither
necessary nor desired. This is especially so since newer
versions of rngd support a number of different *types* of sources.
2. The default kernel-exported device is called /dev/hwrng not
/dev/hwrandom nor /dev/hw_random (both of which were used in the
past; however, kernel and udev seem to have converged on
/dev/hwrng.)

Overall it is better if the documentation for rngd is kept with rngd
rather than in a kernel Makefile.

Signed-off-by: H. Peter Anvin
Cc: David Howells
Cc: Jeff Garzik
Signed-off-by: Linus Torvalds

H. Peter Anvin
2012-10-26 07:00:53 +0800
f23025057 pidns: limit the nesting depth of pid namespaces ... Browse Code »

'struct pid' is a "variable sized struct" - a header with an array of
upids at the end.

The size of the array depends on a level (depth) of pid namespaces. Now a
level of pidns is not limited, so 'struct pid' can be more than one page.

Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL is
not calculated from PAGE_SIZE, because in this case it depends on
architectures, config options and it will be reduced, if someone adds a
new fields in struct pid or struct upid.

I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to expand
"struct pid" and it's more than enough for all known for me use-cases.
When someone finds a reasonable use case, we can add a config option or a
sysctl parameter.

In addition it will reduce the effect of another problem, when we have
many nested namespaces and the oldest one starts dying.
zap_pid_ns_processe will be called for each namespace and find_vpid will
be called for each process in a namespace. find_vpid will be called
minimum max_level^2 / 2 times. The reason of that is that when we found a
bit in pidmap, we can't determine this pidns is top for this process or it
isn't.

vpid is a heavy operation, so a fork bomb, which create many nested
namespace, can make a system inaccessible for a long time. For example my
system becomes inaccessible for a few minutes with 4000 processes.

[akpm@linux-foundation.org: return -EINVAL in response to excessive nesting, not -ENOMEM]
Signed-off-by: Andrew Vagin
Acked-by: Oleg Nesterov
Cc: Cyrill Gorcunov
Cc: "Eric W. Biederman"
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Vagin
2012-10-26 05:37:53 +0800

25 Oct, 2012

3 commits

cbb525b44 Merge branch 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup fixes from Tejun Heo:
"This pull request contains three fixes.

Two are reverts of task_lock() removal in cgroup fork path. The
optimizations incorrectly assumed that threadgroup_lock can protect
process forks (as opposed to thread creations) too. Further cleanup
of cgroup fork path is scheduled.

The third fixes cgroup emptiness notification loss."

* 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Revert "cgroup: Remove task_lock() from cgroup_post_fork()"
Revert "cgroup: Drop task_lock(parent) on cgroup_fork()"
cgroup: notify_on_release may not be triggered in some cases

Linus Torvalds
2012-10-25 07:35:13 +0800
d579a35d0 Merge branch 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq ... Browse Code »

Pull workqueue fix from Tejun Heo:
"This pull request contains one patch from Dan Magenheimer to fix
cancel_delayed_work() regression introduced by its reimplementation
using try_to_grab_pending(). The reimplementation made it incorrectly
return %true when the work item is idle.

There aren't too many consumers of the return value but it broke at
least ramster."

* 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: cancel_delayed_work() should return %false if work item is idle

Linus Torvalds
2012-10-25 07:33:22 +0800
c0158ca64 workqueue: cancel_delayed_work() should return %false if work item is idle ... Browse Code »

57b30ae77b ("workqueue: reimplement cancel_delayed_work() using
try_to_grab_pending()") made cancel_delayed_work() always return %true
unless someone else is also trying to cancel the work item, which is
broken - if the target work item is idle, the return value should be
%false.

try_to_grab_pending() indicates that the target work item was idle by
zero return value. Use it for return. Note that this brings
cancel_delayed_work() in line with __cancel_work_timer() in return
value handling.

Signed-off-by: Dan Magenheimer
Signed-off-by: Tejun Heo
LKML-Reference:

Dan Magenheimer
2012-10-25 03:38:16 +0800

24 Oct, 2012

1 commit

e17b13158 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf fixes from Ingo Molnar:
"Most of these are uprobes race fixes from Oleg, and their preparatory
cleanups. (It's larger than what I'd normally send for an -rc kernel,
but they looked significant enough to not delay them.)

There's also an oprofile fix and an uncore PMU fix."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
perf/x86: Disable uncore on virtualized CPUs
oprofile, x86: Fix wrapping bug in op_x86_get_ctrl()
ring-buffer: Check for uninitialized cpu buffer before resizing
uprobes: Fix the racy uprobe->flags manipulation
uprobes: Fix prepare_uprobe() race with itself
uprobes: Introduce prepare_uprobe()
uprobes: Fix handle_swbp() vs unregister() + register() race
uprobes: Do not delete uprobe if uprobe_unregister() fails
uprobes: Don't return success if alloc_uprobe() fails
uprobes/x86: Only rep+nop can be emulated correctly
uprobes: Simplify is_swbp_at_addr(), remove stale comments
uprobes: Kill set_orig_insn()->is_swbp_at_addr()
uprobes: Introduce copy_opcode(), kill read_opcode()
uprobes: Kill set_swbp()->is_swbp_at_addr()
uprobes: Restrict valid_vma(false) to skip VM_SHARED vmas
uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC
uprobes: Change write_opcode() to use FOLL_FORCE
uprobes: Move clear_thread_flag(TIF_UPROBE) to uprobe_notify_resume()
uprobes: Kill UTASK_BP_HIT state
uprobes: Fix UPROBE_SKIP_SSTEP checks in handle_swbp()
...

Linus Torvalds
2012-10-24 09:07:51 +0800

22 Oct, 2012

3 commits

0390c8835 module_signing: fix printk format warning ... Browse Code »

Fix the warning:

kernel/module_signing.c:195:2: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t'

by using the proper 'z' modifier for printing a size_t.

Signed-off-by: Randy Dunlap
Cc: David Howells
Signed-off-by: Linus Torvalds

Randy Dunlap
2012-10-22 13:56:34 +0800
ef8ff74ed Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/… ... Browse Code »

…rostedt/linux-trace into perf/urgent

Pull ftrace ring-buffer resizing fix from Steve Rostedt.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ingo Molnar
2012-10-22 01:53:34 +0800
f38787f4f Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/ole… ... Browse Code »

…g/misc into perf/urgent

Pull various uprobes bugfixes from Oleg Nesterov - mostly race and
failure path fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ingo Molnar
2012-10-22 00:18:17 +0800

20 Oct, 2012

6 commits

31fd84b95 use clamp_t in UNAME26 fix ... Browse Code »

The min/max call needed to have explicit types on some architectures
(e.g. mn10300). Use clamp_t instead to avoid the warning:

kernel/sys.c: In function 'override_release':
kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a cast [enabled by default]

Reported-by: Fengguang Wu
Signed-off-by: Kees Cook
Signed-off-by: Linus Torvalds

Kees Cook
2012-10-20 09:51:17 +0800
caabe2405 MODSIGN: Move the magic string to the end of a module and eliminate the search ... Browse Code »

Emit the magic string that indicates a module has a signature after the
signature data instead of before it. This allows module_sig_check() to
be made simpler and faster by the elimination of the search for the
magic string. Instead we just need to do a single memcmp().

This works because at the end of the signature data there is the
fixed-length signature information block. This block then falls
immediately prior to the magic number.

From the contents of the information block, it is trivial to calculate
the size of the signature data and thus the size of the actual module
data.

Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2012-10-20 08:30:40 +0800
d87838321 Revert "cgroup: Remove task_lock() from cgroup_post_fork()" ... Browse Code »

This reverts commit 7e3aa30ac8c904a706518b725c451bb486daaae9.

The commit incorrectly assumed that fork path always performed
threadgroup_change_begin/end() and depended on that for
synchronization against task exit and cgroup migration paths instead
of explicitly grabbing task_lock().

threadgroup_change is not locked when forking a new process (as
opposed to a new thread in the same process) and even if it were it
wouldn't be effective as different processes use different threadgroup
locks.

Revert the incorrect optimization.

Signed-off-by: Tejun Heo
LKML-Reference:
Acked-by: Li Zefan
Cc: Frederic Weisbecker
Cc: stable@vger.kernel.org

Tejun Heo
2012-10-20 05:09:35 +0800
9bb71308b Revert "cgroup: Drop task_lock(parent) on cgroup_fork()" ... Browse Code »

This reverts commit 7e381b0eb1e1a9805c37335562e8dc02e7d7848c.

The commit incorrectly assumed that fork path always performed
threadgroup_change_begin/end() and depended on that for
synchronization against task exit and cgroup migration paths instead
of explicitly grabbing task_lock().

threadgroup_change is not locked when forking a new process (as
opposed to a new thread in the same process) and even if it were it
wouldn't be effective as different processes use different threadgroup
locks.

Revert the incorrect optimization.

Signed-off-by: Tejun Heo
LKML-Reference:
Acked-by: Li Zefan
Bitterly-Acked-by: Frederic Weisbecker
Cc: stable@vger.kernel.org

Tejun Heo
2012-10-20 05:08:49 +0800
bbc2e3ef8 pidns: remove recursion from free_pid_ns() ... Browse Code »

free_pid_ns() operates in a recursive fashion:

free_pid_ns(parent)
put_pid_ns(parent)
kref_put(&ns->kref, free_pid_ns);
free_pid_ns

thus if there was a huge nesting of namespaces the userspace may trigger
avalanche calling of free_pid_ns leading to kernel stack exhausting and a
panic eventually.

This patch turns the recursion into an iterative loop.

Based on a patch by Andrew Vagin.

[akpm@linux-foundation.org: export put_pid_ns() to modules]
Signed-off-by: Cyrill Gorcunov
Cc: Andrew Vagin
Cc: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Pavel Emelyanov
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cyrill Gorcunov
2012-10-20 05:07:47 +0800
2702b1526 kernel/sys.c: fix stack memory content leak via UNAME26 ... Browse Code »

Calling uname() with the UNAME26 personality set allows a leak of kernel
stack contents. This fixes it by defensively calculating the length of
copy_to_user() call, making the len argument unsigned, and initializing
the stack buffer to zero (now technically unneeded, but hey, overkill).

CVE-2012-0957

Reported-by: PaX Team
Signed-off-by: Kees Cook
Cc: Andi Kleen
Cc: PaX Team
Cc: Brad Spengler
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2012-10-20 05:07:47 +0800

17 Oct, 2012

2 commits

85eae82a0 printk: Fix scheduling-while-atomic problem in console_cpu_notify() ... Browse Code »

The console_cpu_notify() function runs with interrupts disabled in the
CPU_DYING case. It therefore cannot block, for example, as will happen
when it calls console_lock(). Therefore, remove the CPU_DYING leg of
the switch statement to avoid this problem.

Signed-off-by: Paul E. McKenney
Reviewed-by: Srivatsa S. Bhat
Signed-off-by: Linus Torvalds

Paul E. McKenney
2012-10-17 09:17:44 +0800
1f5320d59 cgroup: notify_on_release may not be triggered in some cases ... Browse Code »

notify_on_release must be triggered when the last process in a cgroup is
move to another. But if the first(and only) process in a cgroup is moved to
another, notify_on_release is not triggered.

# mkdir /cgroup/cpu/SRC
# mkdir /cgroup/cpu/DST
#
# echo 1 >/cgroup/cpu/SRC/notify_on_release
# echo 1 >/cgroup/cpu/DST/notify_on_release
#
# sleep 300 &
[1] 8629
#
# echo 8629 >/cgroup/cpu/SRC/tasks
# echo 8629 >/cgroup/cpu/DST/tasks
-> notify_on_release for /SRC must be triggered at this point,
but it isn't.

This is because put_css_set() is called before setting CGRP_RELEASABLE
in cgroup_task_migrate(), and is a regression introduce by the
commit:74a1166d(cgroups: make procs file writable), which was merged
into v3.0.

Cc: Ben Blum
Cc: # v3.0.x and later
Acked-by: Li Zefan
Signed-off-by: Daisuke Nishimura
Signed-off-by: Tejun Heo

Daisuke Nishimura
2012-10-17 08:09:36 +0800

15 Oct, 2012

1 commit

d25282d1c Merge branch 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux ... Browse Code »

Pull module signing support from Rusty Russell:
"module signing is the highlight, but it's an all-over David Howells frenzy..."

Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.

* 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
X.509: Fix indefinite length element skip error handling
X.509: Convert some printk calls to pr_devel
asymmetric keys: fix printk format warning
MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
MODSIGN: Make mrproper should remove generated files.
MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
MODSIGN: Use the same digest for the autogen key sig as for the module sig
MODSIGN: Sign modules during the build process
MODSIGN: Provide a script for generating a key ID from an X.509 cert
MODSIGN: Implement module signature checking
MODSIGN: Provide module signing public keys to the kernel
MODSIGN: Automatically generate module signing keys if missing
MODSIGN: Provide Kconfig options
MODSIGN: Provide gitignore and make clean rules for extra files
MODSIGN: Add FIPS policy
module: signature checking hook
X.509: Add a crypto key parser for binary (DER) X.509 certificates
MPILIB: Provide a function to read raw data into an MPI
X.509: Add an ASN.1 decoder
X.509: Add simple ASN.1 grammar compiler
...

Linus Torvalds
2012-10-15 04:39:34 +0800

13 Oct, 2012

3 commits

6c536a17f Merge tag 'for_linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb ... Browse Code »

Pull KGDB/KDB fixes and cleanups from Jason Wessel:
"Cleanups
- Clean up compile warnings in kgdboc.c and x86/kernel/kgdb.c
- Add module event hooks for simplified debugging with gdb
Fixes
- Fix kdb to stop paging with 'q' on bta and dmesg
- Fix for data that scrolls off the vga console due to line wrapping
when using the kdb pager
New
- The debug core registers for kernel module events which allows a
kernel aware gdb to automatically load symbols and break on entry
to a kernel module
- Allow kgdboc=kdb to setup kdb on the vga console"

* tag 'for_linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
tty/console: fix warnings in drivers/tty/serial/kgdboc.c
kdb,vt_console: Fix missed data due to pager overruns
kdb: Fix dmesg/bta scroll to quit with 'q'
kgdboc: Accept either kbd or kdb to activate the vga + keyboard kdb shell
kgdb,x86: fix warning about unused variable
mips,kgdb: fix recursive page fault with CONFIG_KPROBES
kgdb: Add module event hooks

Linus Torvalds
2012-10-13 10:16:58 +0800
ade0899b2 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf updates from Ingo Molnar:
"This tree includes some late late perf items that missed the first
round:

tools:

- Bash auto completion improvements, now we can auto complete the
tools long options, tracepoint event names, etc, from Namhyung Kim.

- Look up thread using tid instead of pid in 'perf sched'.

- Move global variables into a perf_kvm struct, from David Ahern.

- Hists refactorings, preparatory for improved 'diff' command, from
Jiri Olsa.

- Hists refactorings, preparatory for event group viewieng work, from
Namhyung Kim.

- Remove double negation on optional feature macro definitions, from
Namhyung Kim.

- Remove several cases of needless global variables, on most
builtins.

- misc fixes

kernel:

- sysfs support for IBS on AMD CPUs, from Robert Richter.

- Support for an upcoming Intel CPU, the Xeon-Phi / Knights Corner
HPC blade PMU, from Vince Weaver.

- misc fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
perf: Fix perf_cgroup_switch for sw-events
perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmu
perf/AMD/IBS: Add sysfs support
perf hists: Add more helpers for hist entry stat
perf hists: Move he->stat.nr_events initialization to a template
perf hists: Introduce struct he_stat
perf diff: Removing the total_period argument from output code
perf tool: Add hpp interface to enable/disable hpp column
perf tools: Removing hists pair argument from output path
perf hists: Separate overhead and baseline columns
perf diff: Refactor diff displacement possition info
perf hists: Add struct hists pointer to struct hist_entry
perf tools: Complete tracepoint event names
perf/x86: Add support for Intel Xeon-Phi Knights Corner PMU
perf evlist: Remove some unused methods
perf evlist: Introduce add_newtp method
perf kvm: Move global variables into a perf_kvm struct
perf tools: Convert to BACKTRACE_SUPPORT
perf tools: Long option completion support for each subcommands
perf tools: Complete long option names of perf command
...

Linus Torvalds
2012-10-13 09:20:11 +0800
4e21fc138 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal ... Browse Code »

Pull third pile of kernel_execve() patches from Al Viro:
"The last bits of infrastructure for kernel_thread() et.al., with
alpha/arm/x86 use of those. Plus sanitizing the asm glue and
do_notify_resume() on alpha, fixing the "disabled irq while running
task_work stuff" breakage there.

At that point the rest of kernel_thread/kernel_execve/sys_execve work
can be done independently for different architectures. The only
pending bits that do depend on having all architectures converted are
restrictred to fs/* and kernel/* - that'll obviously have to wait for
the next cycle.

I thought we'd have to wait for all of them done before we start
eliminating the longjump-style insanity in kernel_execve(), but it
turned out there's a very simple way to do that without flagday-style
changes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
alpha: switch to saner kernel_execve() semantics
arm: switch to saner kernel_execve() semantics
x86, um: convert to saner kernel_execve() semantics
infrastructure for saner ret_from_kernel_thread semantics
make sure that kernel_thread() callbacks call do_exit() themselves
make sure that we always have a return path from kernel_execve()
ppc: eeh_event should just use kthread_run()
don't bother with kernel_thread/kernel_execve for launching linuxrc
alpha: get rid of switch_stack argument of do_work_pending()
alpha: don't bother passing switch_stack separately from regs
alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c
alpha: simplify TIF_NEED_RESCHED handling

Linus Torvalds
2012-10-13 09:05:52 +0800