13 Apr, 2012

6 commits

  • commit 79549c6dfda0603dba9a70a53467ce62d9335c33 upstream.

    keyctl_session_to_parent(task) sets ->replacement_session_keyring;
    it should be processed and cleared by key_replace_session_keyring().

    However, this task can fork before it notices TIF_NOTIFY_RESUME and
    the new child gets the bogus ->replacement_session_keyring copied by
    dup_task_struct(). This is obviously wrong and, if nothing else, this
    leads to put_cred(already_freed_cred).

    Change copy_creds() to clear this member. If copy_process() fails
    before this point, the wrong ->replacement_session_keyring doesn't
    matter; exit_creds() won't be called.
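
    A minimal sketch of the shape of the fix (not the literal upstream
    diff; the CONFIG_KEYS guard is assumed from the era's layout):

        int copy_creds(struct task_struct *p, unsigned long clone_flags)
        {
        #ifdef CONFIG_KEYS
            /* the child must not inherit a session-keyring replacement
             * that the parent has queued but not yet processed */
            p->replacement_session_keyring = NULL;
        #endif
            /* ... existing credential copying ... */
        }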

    Signed-off-by: Oleg Nesterov
    Acked-by: David Howells
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     
  • commit 620f6e8e855d6d447688a5f67a4e176944a084e8 upstream.

    Commit bfdc0b4 adds code to restrict access to dmesg_restrict;
    however, it incorrectly alters kptr_restrict rather than
    dmesg_restrict.

    The original patch from Richard Weinberger
    (https://lkml.org/lkml/2011/3/14/362) alters dmesg_restrict as
    expected, and so the patch seems to have been misapplied.

    This adds the CAP_SYS_ADMIN check to both dmesg_restrict and
    kptr_restrict, since both are sensitive.
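
    A hedged sketch of the resulting check: a sysctl handler that refuses
    writes from callers lacking CAP_SYS_ADMIN before falling through to
    the usual min/max handler:

        static int proc_dointvec_minmax_sysadmin(struct ctl_table *table,
                int write, void __user *buffer, size_t *lenp, loff_t *ppos)
        {
            if (write && !capable(CAP_SYS_ADMIN))
                return -EPERM;
            return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
        }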

    Reported-by: Phillip Lougher
    Signed-off-by: Kees Cook
    Acked-by: Serge Hallyn
    Acked-by: Richard Weinberger
    Signed-off-by: James Morris
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • commit 98b54aa1a2241b59372468bd1e9c2d207bdba54b upstream.

    There is extra state information that needs to be exposed in the
    kgdb_bpt structure for tracking how a breakpoint was installed. The
    debug core only uses probe_kernel_write() to install breakpoints, but
    this is not enough for all the archs. Some archs, such as x86, need to
    use text_poke() in order to install a breakpoint into a read-only page.

    Passing the kgdb_bpt structure to kgdb_arch_set_breakpoint() and
    kgdb_arch_remove_breakpoint() allows other archs to set the type
    variable which indicates how the breakpoint was installed.
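
    In sketch form, the arch hooks change from taking a raw address plus a
    saved-instruction buffer to taking the breakpoint descriptor itself
    (hedged reconstruction of the prototypes):

        /* before */
        int kgdb_arch_set_breakpoint(unsigned long addr, char *saved_instr);

        /* after: the arch can record how it installed the breakpoint in
         * bpt->type next to the saved instruction */
        int kgdb_arch_set_breakpoint(struct kgdb_bpt *bpt);
        int kgdb_arch_remove_breakpoint(struct kgdb_bpt *bpt);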

    Signed-off-by: Jason Wessel
    Signed-off-by: Greg Kroah-Hartman

    Jason Wessel
     
  • commit 12b5da349a8b94c9dbc3430a6bc42eabd9eaf50b upstream.

    When reading the trace file, the records of each of the per_cpu buffers
    are examined to find the next event to print out. At the point of looking
    at the event, the size of the event is recorded. But if the first event is
    chosen, the other events in the other CPU buffers will reset the event size
    that is stored in the iterator descriptor, causing the event size passed to
    the output functions to be incorrect.

    In most cases this is not a problem, but for the case of stack traces, it
    is. With the change to the stack tracing to record a dynamic number of
    back traces, the output depends on the size of the entry instead of the
    fixed 8 back traces. When the entry size is not correct, the back traces
    would not be fully printed.

    Note, reading from the per-cpu trace files was not affected.

    Reported-by: Thomas Gleixner
    Tested-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt
     
  • commit 01de982abf8c9e10fc3089e10585cd2cc914bdab upstream.

    8 hex characters tell only half the tale for 64 bit CPUs,
    so use the appropriate length.
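
    Since printf-style widths are minimums, nothing is truncated; the
    problem is half-width, misaligned addresses. A small userspace
    illustration (assumes a 64-bit machine):

        #include <stdio.h>

        int main(void)
        {
            unsigned long lo = 0x1234UL;             /* small value */
            unsigned long hi = 0xffffffff81000000UL; /* kernel-text-like */

            printf("[%08lx]\n[%08lx]\n", lo, hi);    /* columns drift */
            printf("[%016lx]\n[%016lx]\n", lo, hi);  /* 2*sizeof(long) wide */
            return 0;
        }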

    Link: http://lkml.kernel.org/r/1332411501-8059-2-git-send-email-wolfgang.mauerer@siemens.com

    Signed-off-by: Wolfgang Mauerer
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Wolfgang Mauerer
     
  • commit f5cb92ac82d06cb583c1f66666314c5c0a4d7913 upstream.

    irq_move_masked_irq() checks the return code of
    chip->irq_set_affinity() only for 0, but IRQ_SET_MASK_OK_NOCOPY is
    also a valid return code, which is there to avoid a redundant copy of
    the cpumask. But in the IRQ_SET_MASK_OK_NOCOPY case we not only avoid
    the redundant copy, we also fail to adjust the thread affinity of a
    potentially threaded interrupt handler.

    Handle the IRQ_SET_MASK_OK (== 0) and IRQ_SET_MASK_OK_NOCOPY (== 1)
    return values correctly by checking the valid return values separately.
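
    A hedged sketch of the corrected dispatch: both success codes adjust
    the thread affinity, but only IRQ_SET_MASK_OK copies the cpumask:

        switch (chip->irq_set_affinity(&desc->irq_data,
                                       desc->pending_mask, false)) {
        case IRQ_SET_MASK_OK:
            cpumask_copy(desc->irq_data.affinity, desc->pending_mask);
            /* fall through */
        case IRQ_SET_MASK_OK_NOCOPY:
            irq_set_thread_affinity(desc);
            break;
        }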

    Signed-off-by: Jiang Liu
    Cc: Jiang Liu
    Cc: Keping Chen
    Link: http://lkml.kernel.org/r/1333120296-13563-2-git-send-email-jiang.liu@huawei.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Jiang Liu
     

03 Apr, 2012

6 commits

  • commit f946eeb9313ff1470758e171a60fe7438a2ded3f upstream.

    Module size was limited to 64MB. This was a legacy limitation, due to
    vmalloc(), which was removed a while ago.

    Limiting module size to 64MB is both pointless and affects real world
    use cases.
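
    In sketch form, the dropped check was a hard cap in the module loader
    (constant spelled out here for illustration):

        if (len > 64 * 1024 * 1024)  /* legacy vmalloc-era cap, now dropped */
            return -ENOEXEC;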

    Cc: Tim Abbott
    Signed-off-by: Sasha Levin
    Signed-off-by: Rusty Russell
    Signed-off-by: Greg Kroah-Hartman

    Sasha Levin
     
  • commit 05b4877f6a4f1ba4952d1222213d262bf8c132b7 upstream.

    If create_basic_memory_bitmaps() fails, usermodehelpers are not re-enabled
    before returning. Fix this. And while at it, reword the goto labels so that
    they look more meaningful.
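
    The shape of the fix, hedged (the label name below is hypothetical,
    not quoted from the patch):

        error = create_basic_memory_bitmaps();
        if (error) {
            usermodehelper_enable();  /* was skipped on this error path */
            goto Unlock;              /* hypothetical, more meaningful label */
        }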

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Srivatsa S. Bhat
     
  • commit 540b60e24f3f4781d80e47122f0c4486a03375b8 upstream.

    We do not want a bitwise AND between boolean operands.
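
    A small userspace illustration of why: two logically true values can
    have no bits in common, so '&' and '&&' disagree exactly when it hurts:

        #include <stdio.h>

        int main(void)
        {
            int a = 0x2; /* logically true */
            int b = 0x1; /* logically true */

            printf("a & b  = %d\n", a & b);  /* 0: no common bits */
            printf("a && b = %d\n", a && b); /* 1: both are nonzero */
            return 0;
        }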

    Signed-off-by: Alexander Gordeev
    Cc: Oleg Nesterov
    Link: http://lkml.kernel.org/r/20120309135912.GA2114@dhcp-26-207.brq.redhat.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Alexander Gordeev
     
  • commit a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5 upstream.

    In 2008, commit 0c5d1eb77a8be ("genirq: record trigger type") modified the
    way set_irq_type() handles the 'no trigger' condition. However, this has
    an adverse effect on PCMCIA support on Intel StrongARM and probably PXA
    platforms.

    PCMCIA has several status signals on the socket which can trigger
    interrupts; some of these status signals depend on the card's mode
    (whether it is configured in memory or IO mode). For example, cards have
    a 'Ready/IRQ' signal: in memory mode, this provides an indication to
    PCMCIA that the card has finished its power up initialization. In IO
    mode, it provides the device interrupt signal. Other status signals
    switch between on-board battery status and loud speaker output.

    In classical PCMCIA implementations, where you have a specific socket
    controller, the controller provides a method to mask interrupts from the
    socket, and importantly ignore any state transitions on the pins which
    correspond with interrupts once masked. This masking prevents unwanted
    events caused by the removal and application of socket power being
    forwarded.

    However, on platforms where there is no socket controller, the PCMCIA
    status and interrupt signals are routed to standard edge-triggered GPIOs.
    These GPIOs can be configured to interrupt on rising edge, falling edge,
    or never. This is where the problems start.

    Edge triggered interrupts are required to record events while disabled via
    the usual methods of {free,request,disable,enable}_irq() to prevent
    problems with dropped interrupts (eg, the 8390 driver uses disable_irq()
    to defer the delivery of interrupts). As a result, these interfaces can
    not be used to implement the desired behaviour.

    The side effect of this is that if the 'Ready/IRQ' GPIO is disabled via
    disable_irq() on suspend, and enabled via enable_irq() after resume, we
    will record the state transitions caused by powering events as valid
    interrupts, and forward them to the card driver, which may attempt to
    access a card which is not powered up.

    This delays resume while drivers spin in their interrupt handlers, and
    produces complaints from drivers before they realize what has happened.

    Moreover, in the case of the 'Ready/IRQ' signal, this is requested and
    freed by the card driver itself; the PCMCIA core has no idea whether the
    interrupt is requested, and, therefore, whether a call to disable_irq()
    would be valid. (We tried this around 2.4.17 / 2.5.1 kernel era, and
    ended up throwing it out because of this problem.)

    Therefore, it was decided back in around 2002 to disable the edge
    triggering instead, resulting in all state transitions on the GPIO being
    ignored. That's what we actually need the hardware to do.

    The commit above changes this behaviour; it explicitly prevents the 'no
    trigger' state being selected.

    The reason that request_irq() does not accept the 'no trigger' state is
    for compatibility with existing drivers which do not provide their desired
    triggering configuration. The set_irq_type() function is 'new' and not
    used by non-trigger aware drivers.

    Therefore, revert this change, and restore previously working platforms
    back to their former state.

    Signed-off-by: Russell King
    Cc: linux@arm.linux.org.uk
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Russell King
     
  • commit a078c6d0e6288fad6d83fb6d5edd91ddb7b6ab33 upstream.

    'long secs' is passed as the divisor to div_s64(), which accepts a
    32-bit divisor. On 64-bit machines that value is trimmed from 8 bytes
    to 4, causing a divide by zero when the number is bigger than
    (1 << 32) - 1 and all 32 lower bits are 0.

    Use div64_long() instead.
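
    A userspace illustration of the truncation (not kernel code; assumes a
    64-bit machine): casting such a value to a 32-bit divisor yields zero,
    and the subsequent division traps:

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
            long secs = 1L << 32;            /* low 32 bits are all zero */
            int32_t divisor = (int32_t)secs; /* what div_s64() sees: 0 */

            printf("secs=%ld truncated=%d\n", secs, divisor);
            /* 'secs / divisor' here would divide by zero, as in the bug */
            return 0;
        }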

    Signed-off-by: Sasha Levin
    Cc: johnstul@us.ibm.com
    Link: http://lkml.kernel.org/r/1331829374-31543-2-git-send-email-levinsasha928@gmail.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Sasha Levin
     
  • commit 59263b513c11398cd66a52d4c5b2b118ce1e0359 upstream.

    Some of the newer futex PI opcodes do not check the cmpxchg enabled
    variable and call unconditionally into the handling functions. Cover
    all PI opcodes in a separate check.
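
    The separate check has roughly this shape (hedged sketch of the
    dispatch in do_futex()):

        switch (cmd) {
        case FUTEX_LOCK_PI:
        case FUTEX_UNLOCK_PI:
        case FUTEX_TRYLOCK_PI:
        case FUTEX_WAIT_REQUEUE_PI:
        case FUTEX_CMP_REQUEUE_PI:
            if (!futex_cmpxchg_enabled)
                return -ENOSYS;
        }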

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Darren Hart
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

20 Mar, 2012

1 commit

  • commit 62d3c5439c534b0e6c653fc63e6d8c67be3a57b1 upstream.

    This patch (as1519) fixes a bug in the block layer's disk-events
    polling. The polling is done by a work routine queued on the
    system_nrt_wq workqueue. Since that workqueue isn't freezable, the
    polling continues even in the middle of a system sleep transition.

    Obviously, polling a suspended drive for media changes and such isn't
    a good thing to do; in the case of USB mass-storage devices it can
    lead to real problems requiring device resets and even re-enumeration.

    The patch fixes things by creating a new system-wide, non-reentrant,
    freezable workqueue and using it for disk-events polling.
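
    A hedged sketch of the idea (queue and symbol names assumed, not
    quoted from the patch): allocate a non-reentrant, freezable queue and
    point the polling work at it instead of system_nrt_wq:

        wq = alloc_workqueue("events_nrt_freezable",
                             WQ_NON_REENTRANT | WQ_FREEZABLE, 0);

        /* disk-events polling is then queued on the freezable queue */
        queue_delayed_work(wq, &ev->dwork, intv);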

    Signed-off-by: Alan Stern
    Acked-by: Tejun Heo
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Alan Stern
     

13 Mar, 2012

2 commits

  • commit f986a499ef6f317d906e6f6f281be966e1237a10 upstream.

    register_kprobe() aborts if the address of the new request falls in a
    prohibited area (such as ftrace pouch, __kprobes annotated functions,
    non-kernel text addresses, jump label text). However, we don't return
    the right error on this abort, resulting in a silent failure - incorrect
    adding/reporting of kprobes ('perf probe do_fork+18' or 'perf probe
    mcount' for instance).

    In V2 we are incorporating Masami Hiramatsu's feedback.

    This patch fixes it by returning -EINVAL upon failure.

    While we are here, rename the label used for exit to be more appropriate.
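
    The shape of the fix, hedged (the exact prohibited-area tests and the
    new label name may differ from the patch):

        ret = -EINVAL;
        if (!kernel_text_address((unsigned long)p->addr) ||
            in_kprobes_functions((unsigned long)p->addr) ||
            ftrace_text_reserved(p->addr, p->addr) ||
            jump_label_text_reserved(p->addr, p->addr))
            goto cannot_probe;  /* previously fell out without an error */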

    Signed-off-by: Ananth N Mavinakayanahalli
    Signed-off-by: Prashanth K Nageshappa
    Acked-by: Masami Hiramatsu
    Cc: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Prashanth Nageshappa
     
  • commit 52abb700e16a9aa4cbc03f3d7f80206cbbc80680 upstream.

    Commit ac5637611 (genirq: Unmask oneshot irqs when thread was not woken)
    fails to unmask when a !IRQ_ONESHOT threaded handler is handled by
    handle_level_irq.

    This happens because thread_mask is or'ed unconditionally in
    irq_wake_thread(), but it is never cleared for !IRQ_ONESHOT interrupts.
    So the check for !desc->thread_active fails and keeps the interrupt
    disabled.

    Keep the thread_mask zero for !IRQ_ONESHOT interrupts.

    Document the thread_mask magic while at it.
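
    A hedged sketch of the core of the fix in __setup_irq(): only oneshot
    actions get a bit in the per-descriptor thread_mask, so the unmask
    check can no longer be tripped by !IRQ_ONESHOT threads:

        if (new->flags & IRQF_ONESHOT)
            new->thread_mask = 1 << ffz(thread_mask);
        /* else: new->thread_mask stays 0 */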

    Reported-and-tested-by: Sven Joachim
    Reported-and-tested-by: Stefan Lippers-Hollmann
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

01 Mar, 2012

3 commits

  • commit d80e731ecab420ddcb79ee9d0ac427acbc187b4b upstream.

    This patch is intentionally incomplete to simplify the review.
    It ignores ep_unregister_pollwait() which plays with the same wqh.
    See the next change.

    epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
    f_op->poll() needs. In particular it assumes that the wait queue
    can't go away until eventpoll_release(). This is not true in case
    of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
    which is not connected to the file.

    This patch adds the special event, POLLFREE, currently only for
    epoll. It expects that init_poll_funcptr()'ed hook should do the
    necessary cleanup. Perhaps it should be defined as EPOLLFREE in
    eventpoll.

    __cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
    ->signalfd_wqh is not empty, we add the new signalfd_cleanup()
    helper.

    ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
    This makes this poll entry inconsistent, but we don't care. If you
    share an epoll fd which contains our sigfd with another process you
    should blame yourself. signalfd is "really special". I simply do
    not know how we can define the "right" semantics if it is used with
    epoll.

    The main problem is, epoll calls signalfd_poll() once to establish
    the connection with the wait queue, after that signalfd_poll(NULL)
    returns the different/inconsistent results depending on who does
    EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
    has nothing to do with the file, it works with the current thread.

    In short: this patch is the hack which tries to fix the symptoms.
    It also assumes that nobody can take tasklist_lock under epoll
    locks, this seems to be true.

    Note:

    - we do not have wake_up_all_poll() but wake_up_poll()
    is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

    - signalfd_cleanup() uses POLLHUP along with POLLFREE,
    we need a couple of simple changes in eventpoll.c to
    make sure it can't be "lost".
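
    For reference, a hedged sketch of the signalfd_cleanup() helper
    described above:

        void signalfd_cleanup(struct sighand_struct *sighand)
        {
            wait_queue_head_t *wqh = &sighand->signalfd_wqh;

            if (likely(!waitqueue_active(wqh)))
                return;

            /* wait_queue_t->func(POLLFREE) should unregister itself */
            wake_up_poll(wqh, POLLHUP | POLLFREE);
        }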

    Reported-by: Maxime Bizon
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     
  • commit b4bc724e82e80478cba5fe9825b62e71ddf78757 upstream.

    An interrupt might be pending when irq_startup() is called, but the
    startup code does not invoke the resend logic. In some cases this
    prevents the device from issuing another interrupt which renders the
    device non functional.

    Call the resend function in irq_startup() to keep things going.
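
    A hedged sketch of the change (helper names from the genirq code of
    the era; details may differ from the patch):

        int irq_startup(struct irq_desc *desc)
        {
            int ret = 0;

            irq_state_clr_disabled(desc);
            desc->depth = 0;

            if (desc->irq_data.chip->irq_startup) {
                ret = desc->irq_data.chip->irq_startup(&desc->irq_data);
                irq_state_clr_masked(desc);
            } else {
                irq_enable(desc);
            }
            /* the fix: replay an interrupt that was already pending */
            check_irq_resend(desc, desc->irq_data.irq);
            return ret;
        }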

    Reported-and-tested-by: Russell King
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit ac5637611150281f398bb7a47e3fcb69a09e7803 upstream.

    When the primary handler of an interrupt which is marked IRQ_ONESHOT
    returns IRQ_HANDLED or IRQ_NONE, then the interrupt thread is not
    woken and the unmask logic of the interrupt line is never
    invoked. This keeps the interrupt masked forever.

    This was not noticed as most IRQ_ONESHOT users wake the thread
    unconditionally (usually because they cannot access the underlying
    device from hard interrupt context). Though this behaviour was nowhere
    documented and not necessarily intentional, some drivers can avoid the
    thread wakeup in certain cases and run into the situation where the
    interrupt line is kept masked.

    Handle it gracefully.

    Reported-and-tested-by: Lothar Wassmann
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 Feb, 2012

1 commit


14 Feb, 2012

7 commits

  • commit 9ec84acee1e221d99dc33237bff5e82839d10cc0 upstream.

    We do want to allow lock debugging for GPL-compatible modules
    that are not (yet) built in-tree. This was disabled as a
    side-effect of commit 2449b8ba0745327c5fa49a8d9acffe03b2eded69
    ('module,bug: Add TAINT_OOT_MODULE flag for modules not built
    in-tree'). Lock debug warnings now include taint flags, so
    kernel developers should still be able to deflect warnings
    caused by out-of-tree modules.

    The TAINT_PROPRIETARY_MODULE flag for non-GPL-compatible modules
    will still disable lock debugging.

    Signed-off-by: Ben Hutchings
    Cc: Nick Bowler
    Cc: Dave Jones
    Cc: Rusty Russell
    Cc: Randy Dunlap
    Cc: Debian kernel maintainers
    Cc: Peter Zijlstra
    Cc: Alan Cox
    Link: http://lkml.kernel.org/r/1323268258.18450.11.camel@deadeye
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Ben Hutchings
     
  • commit df754e6af2f237a6c020c0daff55a1a609338e31 upstream.

    It's unlikely that TAINT_FIRMWARE_WORKAROUND causes false
    lockdep messages, so do not disable lockdep in that case.
    We still want to keep lockdep disabled in the
    TAINT_OOT_MODULE case:

    - bin-only modules can cause various instabilities in
    their own and in unrelated kernel code

    - they are impossible to debug for kernel developers

    - they also typically do not have the copyright license
    permission to link to the GPL-ed lockdep code.

    Suggested-by: Ben Hutchings
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-xopopjjens57r0i13qnyh2yo@git.kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit fe9161db2e6053da21e4649d77bbefaf3030b11d upstream.

    In the SNAPSHOT_CREATE_IMAGE ioctl, if the call to hibernation_snapshot()
    fails, the frozen tasks are not thawed.

    And in the case of success, if we happen to exit due to a successful freezer
    test, all tasks (including those of userspace) are thawed, whereas actually
    we should have thawed only the kernel threads at that point. Fix both these
    issues.
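
    The corrected flow, roughly (hedged sketch of the SNAPSHOT_CREATE_IMAGE
    path in snapshot_ioctl()):

        error = hibernation_snapshot(data->platform_support);
        if (error) {
            thaw_kernel_threads();     /* previously missing */
        } else if (freezer_test_done) {
            freezer_test_done = false;
            thaw_kernel_threads();     /* kernel threads only, not all tasks */
        }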

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Srivatsa S. Bhat
     
  • commit 97819a26224f019e73d88bb2fd4eb5a614860461 upstream.

    Commit 2aede851ddf08666f68ffc17be446420e9d2a056 (PM / Hibernate: Freeze
    kernel threads after preallocating memory) moved the freezing of kernel
    threads to hibernation_snapshot() function.

    So now, if the call to hibernation_snapshot() returns early due to a
    successful hibernation test, the caller has to thaw processes to ensure
    that the system gets back to its original state.

    But in the SNAPSHOT_CREATE_IMAGE hibernation ioctl, the caller does not
    thaw processes in case hibernation_snapshot() returned due to a
    successful freezer test. Fix this issue. But note we still send the
    value of 'in_suspend' (which is now 0) to userspace, because we are not
    in an error path per se, and moreover, the value of in_suspend correctly
    depicts the situation here.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Srivatsa S. Bhat
     
  • commit cb297a3e433dbdcf7ad81e0564e7b804c941ff0d upstream.

    This issue happens under the following conditions:

    1. preemption is off
    2. __ARCH_WANT_INTERRUPTS_ON_CTXSW is defined
    3. RT scheduling class
    4. SMP system

    Sequence is as follows:

    1. Suppose the current task is A. Start schedule().
    2. Task A is enqueued as a pushable task at the entry of schedule():
       __schedule
        prev = rq->curr;
        ...
        put_prev_task
         put_prev_task_rt
          enqueue_pushable_task
    3. Pick task B as the next task:
       next = pick_next_task(rq);
    4. rq->curr is set to task B and context_switch is started:
       rq->curr = next;
    5. At the entry of context_switch, release this cpu's rq->lock:
       context_switch
        prepare_task_switch
         prepare_lock_switch
          raw_spin_unlock_irq(&rq->lock);
    6. Shortly after rq->lock is released, an interrupt occurs and IRQ
       context starts.
    7. try_to_wake_up(), called by the ISR, acquires rq->lock:
       try_to_wake_up
        ttwu_remote
         rq = __task_rq_lock(p)
         ttwu_do_wakeup(rq, p, wake_flags);
          task_woken_rt
    8. push_rt_task picks the task A which was enqueued before:
       task_woken_rt
        push_rt_tasks(rq)
         next_task = pick_next_pushable_task(rq)
    9. At find_lock_lowest_rq(), if double_lock_balance() returns 0,
       lowest_rq can be the remote rq.
       (But if preemption is on, double_lock_balance() always returns 1
       and this doesn't happen.)
       push_rt_task
        find_lock_lowest_rq
         if (double_lock_balance(rq, lowest_rq))..
    10. find_lock_lowest_rq() returns the available rq. Task A is migrated
        to the remote cpu/rq:
        push_rt_task
         ...
         deactivate_task(rq, next_task, 0);
         set_task_cpu(next_task, lowest_rq->cpu);
         activate_task(lowest_rq, next_task, 0);
    11. But task A is in irq context on this cpu, so task A is scheduled by
        two cpus at the same time until it returns from the IRQ.
        Task A's stack is corrupted.

    To fix it, don't migrate an RT task if it's still running.
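
    A hedged sketch of the added guard in find_lock_lowest_rq(): after the
    double-lock dance, give up if the candidate task is still running:

        if (unlikely(task_rq(task) != rq ||
                     !cpumask_test_cpu(lowest_rq->cpu,
                                       tsk_cpus_allowed(task)) ||
                     task_running(rq, task) ||  /* the new check */
                     !task->on_rq)) {
            raw_spin_unlock(&lowest_rq->lock);
            lowest_rq = NULL;
            break;
        }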

    Signed-off-by: Chanho Min
    Signed-off-by: Peter Zijlstra
    Acked-by: Steven Rostedt
    Link: http://lkml.kernel.org/r/CAOAMb1BHA=5fm7KTewYyke6u-8DP0iUuJMpgQw54vNeXFsGpoQ@mail.gmail.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Chanho Min
     
  • commit 181e9bdef37bfcaa41f3ab6c948a2a0d60a268b5 upstream.

    Commit 2aede851ddf08666f68ffc17be446420e9d2a056

    PM / Hibernate: Freeze kernel threads after preallocating memory

    introduced a mechanism by which kernel threads were frozen after
    the preallocation of hibernate image memory to avoid problems with
    frozen kernel threads not responding to memory freeing requests.
    However, it overlooked the s2disk code path, in which the
    SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
    which caused freeze_workqueues_begin() to BUG(), because it saw
    that workqueues had already been frozen.

    Although in principle this issue might be addressed by removing
    the relevant BUG_ON() from freeze_workqueues_begin(), that would
    reintroduce the very problem that commit 2aede851ddf08666f68ffc17be4
    attempted to avoid into that particular code path. For this reason,
    to fix the issue at hand, introduce thaw_kernel_threads() and make
    the SNAPSHOT_FREE ioctl execute it.

    Special thanks to Srivatsa S. Bhat for detailed analysis of the
    problem.

    Reported-and-tested-by: Jiri Slaby
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Srivatsa S. Bhat
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
     
  • commit 55ca6140e9bb307efc97a9301a4f501de02a6fd6 upstream.

    In function pre_handler_kretprobe(), the allocated kretprobe_instance
    object will be leaked if the entry_handler callback returns non-zero.
    This may cause all the preallocated kretprobe_instance objects to be
    exhausted.

    This issue can be reproduced by changing
    samples/kprobes/kretprobe_example.c to probe "mutex_unlock". And the fix
    is straightforward: just put the allocated kretprobe_instance object back
    onto the free_instances list.
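
    A hedged sketch of the fix in pre_handler_kretprobe():

        if (rp->entry_handler && rp->entry_handler(ri, regs)) {
            /* the entry_handler vetoed this instance: recycle it */
            raw_spin_lock_irqsave(&rp->lock, flags);
            hlist_add_head(&ri->hlist, &rp->free_instances);
            raw_spin_unlock_irqrestore(&rp->lock, flags);
            return 0;
        }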

    [akpm@linux-foundation.org: use raw_spin_lock/unlock]
    Signed-off-by: Jiang Liu
    Acked-by: Jim Keniston
    Acked-by: Ananth N Mavinakayanahalli
    Cc: Masami Hiramatsu
    Cc: Anil S Keshavamurthy
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jiang Liu
     

26 Jan, 2012

3 commits

  • commit d496aab567e7e52b3e974c9192a5de6e77dce32c upstream.

    Commit ef53d9c5e ("kprobes: improve kretprobe scalability with hashed
    locking") introduced a bug where we can potentially leak
    kretprobe_instances since we initialize a hlist head after having used
    it.

    Initialize the hlist head before using it.

    Reported-by: Jim Keniston
    Acked-by: Jim Keniston
    Signed-off-by: Ananth N Mavinakayanahalli
    Acked-by: Masami Hiramatsu
    Cc: Srinivasa D S
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Ananth N Mavinakayanahalli
     
  • commit c10076c4304083af15a41f6bc5e657e781c1f9a6 upstream.

    Tracepoints are disabled for tainted modules, which is usually because the
    module is either proprietary or was forced, and we don't want either of them
    using kernel tracepoints.

    But, a module can also be tainted by being in the staging directory or
    compiled out of tree. Either is fine for use with tracepoints, no need
    to punish them. I found this out when I noticed that my sample trace event
    module, when done out of tree, stopped working.
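
    A hedged sketch of the relaxed check (the taint mask now ignores the
    out-of-tree and staging bits):

        /* skip tainted modules, except out-of-tree and staging ones */
        if (mod->taints & ~((1 << TAINT_OOT_MODULE) | (1 << TAINT_CRAP)))
            return 0;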

    Cc: Mathieu Desnoyers
    Cc: Ben Hutchings
    Cc: Dave Jones
    Cc: Greg Kroah-Hartman
    Cc: Rusty Russell
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt
     
  • commit 30fb6aa74011dcf595f306ca2727254d708b786e upstream.

    Multiple users of the function tracer can register their functions
    with the ftrace_ops structure. The accounting within ftrace will
    update the counter on each function record that is being traced.
    When the ftrace_ops filtering adds or removes functions, the
    function records will be updated accordingly if the ftrace_ops is
    still registered.

    When a ftrace_ops is removed, the counter of the function records,
    that the ftrace_ops traces, are decremented. When they reach zero
    the functions that they represent are modified to stop calling the
    mcount code.

    When changes are made, the code is updated via stop_machine() with
    a command passed to the function to tell it what to do. There is an
    ENABLE and DISABLE command that tells the called function to enable
    or disable the functions. But the ENABLE is really a misnomer as it
    should just update the records, as records that have been enabled
    and now have a count of zero should be disabled.

    The DISABLE command is used to disable all functions regardless of
    their counter values. This is the big off switch and is not the
    complement of the ENABLE command.

    To make matters worse, when a ftrace_ops is unregistered and there
    is another ftrace_ops registered, neither the DISABLE nor the
    ENABLE command are set when calling into the stop_machine() function
    and the records will not be updated to match their counter. A command
    is passed to that function that will update the mcount code to call
    the registered callback directly if it is the only one left. This
    means that the ftrace_ops that is still registered will have its callback
    called by all functions that have been set for it as well as the ftrace_ops
    that was just unregistered.

    Here's a way to trigger this bug. Compile the kernel with
    CONFIG_FUNCTION_PROFILER set and with CONFIG_FUNCTION_GRAPH not set:

    CONFIG_FUNCTION_PROFILER=y
    # CONFIG_FUNCTION_GRAPH is not set

    This will force the function profiler to use the function tracer instead
    of the function graph tracer.

    # cd /sys/kernel/debug/tracing
    # echo schedule > set_ftrace_filter
    # echo function > current_tracer
    # cat set_ftrace_filter
    schedule
    # cat trace
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 692/68108025 #P:4
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    kworker/0:2-909 [000] ....  531.235574: schedule <-worker_thread
         <idle>-0   [001] .N..  531.235575: schedule <-cpu_idle
    # echo 1 > function_profile_enabled
    # echo 0 > function_profile_enabled
    # cat set_ftrace_filter
    schedule
    # cat trace
    # tracer: function
    #
    # entries-in-buffer/entries-written: 159701/118821262 #P:4
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    <idle>-0 [002] ...1 604.870655: local_touch_nmi <-cpu_idle
    <idle>-0 [002] d..1 604.870655: enter_idle <-cpu_idle
    <idle>-0 [002] d..1 604.870656: atomic_notifier_call_chain <-enter_idle
    <idle>-0 [002] d..1 604.870656: __atomic_notifier_call_chain <-atomic_notifier_call_chain
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Jiri Olsa
     

13 Jan, 2012

1 commit

  • commit 0d19ea866562e46989412a0676412fa0983c9ce7 upstream.

    If we mount a hierarchy with a specified name, the name is unique,
    and we can use it to mount the hierarchy without specifying its
    set of subsystem names. This feature is documented in
    Documentation/cgroups/cgroups.txt, section 2.3.

    Here's an example:

    # mount -t cgroup -o cpuset,name=myhier xxx /cgroup1
    # mount -t cgroup -o name=myhier xxx /cgroup2

    But it was broken by commit 32a8cf235e2f192eb002755076994525cdbaa35a
    (cgroup: make the mount options parsing more accurate)

    This fixes the regression.

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Li Zefan
     

05 Jan, 2012

2 commits

  • This is the temporary simple fix for 3.2, we need more changes in this
    area.

    1. do_signal_stop() assumes that the running untraced thread in the
    stopped thread group is not possible. This was our goal but it is
    not yet achieved: a stopped-but-resumed tracee can clone the running
    thread which can initiate another group-stop.

    Remove WARN_ON_ONCE(!current->ptrace).

    2. A new thread always starts with ->jobctl = 0. If it is auto-attached
    and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
    but the JOBCTL_STOP_SIGMASK part is zero; this triggers WARN_ON(!signr)
    in do_jobctl_trap() if another debugger attaches.

    Change __ptrace_unlink() to set the artificial SIGSTOP for report.

    Alternatively we could change ptrace_init_task() to copy signr from
    current, but this means we can copy it for no reason and hide the
    possible similar problems.

    Acked-by: Tejun Heo
    Cc: [3.1]
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Test-case:

    #include <stdio.h>
    #include <unistd.h>
    #include <assert.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>

    int main(void)
    {
        int pid, status;

        pid = fork();
        if (!pid) {
            for (;;) {
                if (!fork())
                    return 0;
                if (waitpid(-1, &status, 0) < 0) {
                    printf("ERR!! wait: %m\n");
                    return 0;
                }
            }
        }

        assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
        assert(waitpid(-1, NULL, 0) == pid);

        assert(ptrace(PTRACE_SETOPTIONS, pid, 0,
                      PTRACE_O_TRACEFORK) == 0);

        do {
            ptrace(PTRACE_CONT, pid, 0, 0);
            pid = waitpid(-1, NULL, 0);
        } while (pid > 0);

        return 1;
    }

    It fails because ->real_parent sees its child in EXIT_DEAD state
    while the tracer is going to change the state back to EXIT_ZOMBIE
    in wait_task_zombie().

    The offending commit is 823b018e which moved the EXIT_DEAD check,
    but in fact we should not blame it. The original code was not
    correct as well because it didn't take ptrace_reparented() into
    account and because we can't really trust ->ptrace.

    This patch adds the additional check to close this particular
    race but it doesn't solve the whole problem. We simply can't
    rely on ->ptrace in this case, it can be cleared if the tracer
    is multithreaded by the exiting ->parent.

    I think we should kill EXIT_DEAD altogether, we should always
    remove the soon-to-be-reaped child from ->children or at least
    we should never do the DEAD->ZOMBIE transition. But this is too
    complex for 3.2.

    Reported-and-tested-by: Denys Vlasenko
    Tested-by: Lukasz Michalik
    Acked-by: Tejun Heo
    Cc: [3.0+]
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

04 Jan, 2012

1 commit

  • vfork parent uninterruptibly and unkillably waits for its child to
    exec/exit. This wait is of unbounded length. Ignore such waits
    in the hung_task detector.

    Signed-off-by: Mandeep Singh Baines
    Reported-by: Sasha Levin
    LKML-Reference:
    Cc: Linus Torvalds
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: John Kacur
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
     

01 Jan, 2012

1 commit

  • It was found (by Sasha) that if you use a futex located in the gate
    area we get stuck in an uninterruptible infinite loop, much like the
    ZERO_PAGE issue.

    While looking at this problem, PeterZ realized you'll get into similar
    trouble when hitting any install_special_mapping() mapping. And are
    there still drivers setting up their own special mmaps without
    page->mapping, and without special VM or pte flags to make
    get_user_pages fail?

    In most cases, if page->mapping is NULL, we do not need to retry at all:
    Linus points out that even /proc/sys/vm/drop_caches poses no problem,
    because it ends up using remove_mapping(), which takes care not to
    interfere when the page reference count is raised.

    But there is still one case which does need a retry: if memory pressure
    called shmem_writepage in between get_user_pages_fast dropping page
    table lock and our acquiring page lock, then the page gets switched from
    filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
    Fault it back in to get the page->mapping needed for key->shared.inode.
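
    A hedged sketch of the retry in get_futex_key():

        lock_page(page);
        if (!page->mapping) {
            int shmem_swizzled = PageSwapCache(page);

            unlock_page(page);
            put_page(page);
            if (shmem_swizzled)
                goto again;  /* fault the page back in and retry */
            return -EFAULT;
        }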

    Reported-by: Sasha Levin
    Signed-off-by: Hugh Dickins
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

31 Dec, 2011

1 commit

  • This reverts commit de28f25e8244c7353abed8de0c7792f5f883588c.

    It results in resume problems for various people. See for example

    http://thread.gmane.org/gmane.linux.kernel/1233033
    http://thread.gmane.org/gmane.linux.kernel/1233389
    http://thread.gmane.org/gmane.linux.kernel/1233159
    http://thread.gmane.org/gmane.linux.kernel/1227868/focus=1230877

    and the fedora and ubuntu bug reports

    https://bugzilla.redhat.com/show_bug.cgi?id=767248
    https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904569

    which got bisected down to the stable version of this commit.

    Reported-by: Jonathan Nieder
    Reported-by: Phil Miller
    Reported-by: Philip Langdale
    Reported-by: Tim Gardner
    Cc: Thomas Gleixner
    Cc: Greg KH
    Cc: stable@kernel.org # for stable kernels that applied the original
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

21 Dec, 2011

4 commits

  • * 'for-3.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroups: fix a css_set not found bug in cgroup_attach_proc

    Linus Torvalds
     
  • * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    time/clocksource: Fix kernel-doc warnings
    rtc: m41t80: Workaround broken alarm functionality
    rtc: Expire alarms after the time is set.

    Linus Torvalds
     
    binary_sysctl() calls sysctl_getname(), which allocates from the
    names_cache slab using __getname().

    The matching function to free the name is __putname(), not putname(),
    which should be used only to match getname() allocations.

    This is because when auditing is enabled, putname() calls audit_putname
    *instead of* (not in addition to) __putname(). Then, if a syscall is in
    progress, audit_putname does not release the name - instead, it expects
    the name to get released when the syscall completes, but that will happen
    only if audit_getname() was called previously, i.e. if the name was
    allocated with getname() rather than the naked __getname(). So,
    __getname() followed by putname() ends up leaking memory.

    Signed-off-by: Michel Lespinasse
    Acked-by: Al Viro
    Cc: Christoph Hellwig
    Cc: Eric Paris
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Kernels where MAX_NUMNODES > BITS_PER_LONG may temporarily see an empty
    nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
    new set of allowed cpuset nodes where the two nodemasks, as a result of
    the remap, are now disjoint.

    c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing
    cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
    nodes from changing for a thread. This causes any update to a set of
    allowed nodes to stall until put_mems_allowed() is called.

    This stall is unnecessary, however, if at least one node remains unchanged
    in the update to the set of allowed nodes. This was addressed by
    89e8a244b97e ("cpusets: avoid looping when storing to mems_allowed if one
    node remains set"), but it's still possible that an empty nodemask may be
    read from a mempolicy because the old nodemask may be remapped to the new
    nodemask during rebind. To prevent this, only avoid the stall if there is
    no mempolicy for the thread being changed.

    This is a temporary solution until all reads from mempolicy nodemasks can
    be guaranteed to not be empty without the get_mems_allowed()
    synchronization.

    Also moves the check for nodemask intersection inside task_lock() so that
    tsk->mems_allowed cannot change. This ensures that nothing can set this
    tsk's mems_allowed out from under us and also protects tsk->mempolicy.

    Reported-by: Miao Xie
    Signed-off-by: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: Paul Menage
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

20 Dec, 2011

1 commit

  • There is a BUG when migrating a PF_EXITING proc. Since css_set_prefetch()
    is not called for the PF_EXITING case, find_existing_css_set() will return
    NULL inside cgroup_task_migrate() causing a BUG.

    This bug is easy to reproduce. Create a zombie and echo its pid to
    cgroup.procs.

    $ cat zombie.c
    #include <unistd.h>

    int main()
    {
        if (fork())
            pause();
        return 0;
    }
    $

    We are hitting this bug pretty regularly on ChromeOS.

    This bug is already fixed by Tejun Heo's cgroup patchset, which is
    targeted for the next merge window:

    https://lkml.org/lkml/2011/11/1/356

    I've created a smaller patch here which just fixes this bug so that a
    fix can be merged into the current release and stable.

    Signed-off-by: Mandeep Singh Baines
    Downstream-Bug-Report: http://crosbug.com/23953
    Reviewed-by: Li Zefan
    Signed-off-by: Tejun Heo
    Cc: containers@lists.linux-foundation.org
    Cc: cgroups@vger.kernel.org
    Cc: stable@kernel.org
    Cc: KAMEZAWA Hiroyuki
    Cc: Frederic Weisbecker
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Paul Menage
    Cc: Olof Johansson

    Mandeep Singh Baines