14 Dec, 2011

2 commits

  • cic is the association between an io_context and a request_queue.
    A cic is linked from both the ioc and the q and should be destroyed
    when either one goes away. As ioc and q each have their own lock,
    locking becomes a bit complex - both lock orders work for removal
    from one side but not from the other.

    Currently, cfq tries to circumvent this locking order issue with RCU.
    ioc->lock nests inside queue_lock, but the radix tree and the cic's
    are also protected by RCU, allowing either side to walk its lists
    without grabbing the lock.

    This rather unconventional use of RCU quickly devolves into extremely
    fragile convolution. e.g. the following oops is from cfqd going away
    too soon after an ioc/q exit race.

    general protection fault: 0000 [#1] PREEMPT SMP
    CPU 2
    Modules linked in:
    [ 88.503444]
    Pid: 599, comm: hexdump Not tainted 3.1.0-rc10-work+ #158 Bochs Bochs
    RIP: 0010:[] [] cfq_exit_single_io_context+0x58/0xf0
    ...
    Call Trace:
    [] call_for_each_cic+0x5a/0x90
    [] cfq_exit_io_context+0x15/0x20
    [] exit_io_context+0x100/0x140
    [] do_exit+0x579/0x850
    [] do_group_exit+0x5b/0xd0
    [] sys_exit_group+0x17/0x20
    [] system_call_fastpath+0x16/0x1b

    The only real hot path here is cic lookup during request
    initialization, and avoiding extra locking there requires very
    confined use of RCU. This patch makes cic removal from both ioc and
    request_queue use double-locking and unlink immediately.

    * From q side, the change is almost trivial as ioc->lock nests inside
    queue_lock. It just needs to grab each ioc->lock as it walks
    cic_list and unlink it.

    * From ioc side, it's a bit more difficult because of the inverted
    lock order. ioc needs its lock to walk its cic_list but can't grab
    the matching queue_lock, so it needs to perform an unlock-relock
    dance.

    Unlinking is now wholly done from put_io_context() and the fast path
    is optimized by using the queue_lock the caller already holds, which
    is by far the most common case. If the ioc accessed multiple
    devices, it tries a trylock. In the unlikely case of fast path
    failure, it falls back to the full double-locking dance from a
    workqueue.
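
    As a rough illustration of that fast path / fallback split (a
    simplified sketch, not the exact code from the patch; first_cic(),
    unlink_cic() and release_work are assumed names):

        void put_io_context(struct io_context *ioc,
                            struct request_queue *locked_q)
        {
                unsigned long flags;

                if (!atomic_long_dec_and_test(&ioc->refcount))
                        return;

                spin_lock_irqsave(&ioc->lock, flags);
                while (!hlist_empty(&ioc->cic_list)) {
                        struct cfq_io_context *cic = first_cic(ioc);
                        struct request_queue *this_q = cic->q;

                        if (this_q == locked_q) {
                                /* caller already holds this queue_lock */
                                unlink_cic(cic);
                        } else if (spin_trylock(this_q->queue_lock)) {
                                /* opportunistically grab the other lock */
                                unlink_cic(cic);
                                spin_unlock(this_q->queue_lock);
                        } else {
                                /* inverse lock order - punt the full
                                 * unlock-relock dance to a workqueue */
                                schedule_work(&ioc->release_work);
                                break;
                        }
                }
                spin_unlock_irqrestore(&ioc->lock, flags);
        }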

    Double-locking isn't the prettiest thing in the world but it's *far*
    simpler and more understandable than the RCU trick, without adding
    any meaningful overhead.

    This still leaves a lot of now-unnecessary RCU logic. Future patches
    will trim it.

    -v2: Vivek pointed out that cic->q was being dereferenced after
    cic->release() was called. Updated to use local variable @this_q
    instead.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Ignoring copy_io() during fork, io_context can be allocated from two
    places - current_io_context() and set_task_ioprio(). The former is
    always called from the local task while the latter can be called
    from a different task. The synchronization between them is peculiar
    and dubious.

    * current_io_context() doesn't grab task_lock() and assumes that if it
    saw %NULL ->io_context, it would stay that way until allocation and
    assignment is complete. It has smp_wmb() between alloc/init and
    assignment.

    * set_task_ioprio() grabs task_lock() for assignment and does
    smp_read_barrier_depends() between "ioc = task->io_context" and "if
    (ioc)". Unfortunately, this doesn't achieve anything - the latter
    is not a dependent load of the former. i.e. if ioc itself were
    being dereferenced, "ioc->xxx", it would mean something (not sure
    what, though), but as the code currently stands, the dependent read
    barrier is a noop (illustrated below).
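
    (For reference, a dependent load - the only thing
    smp_read_barrier_depends() orders - would look like this sketch:

        ioc = task->io_context;
        smp_read_barrier_depends();
        prio = ioc->ioprio;     /* load through the obtained pointer */

    whereas "if (ioc)" is a control dependency, which this barrier does
    nothing for.)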

    As only one of the two test-assignment sequences is task_lock()
    protected, task_lock() can't do much about a race between the two.
    Nothing prevents current_io_context() and set_task_ioprio() from
    each allocating its own ioc for the same task and overwriting the
    other's.

    Also, set_task_ioprio() can race with an exiting task and create a
    new ioc after exit_io_context() has finished.

    ioc get/put doesn't have any reason to be complex. The only hot path
    is accessing the existing ioc of %current, which is simple to achieve
    given that ->io_context is never destroyed as long as the task is
    alive. All other paths can happily go through task_lock() like all
    other task substructures without impacting anything.

    This patch updates ioc get/put so that it becomes more conventional.

    * alloc_io_context() is replaced with get_task_io_context(). This is
    the only interface which can acquire access to the ioc of another
    task. On return, the caller has an explicit reference to the object
    which should later be put using put_io_context() (see the sketch
    after this list).

    * The functionality of current_io_context() remains the same but when
    creating a new ioc, it shares the code path with
    get_task_io_context() and always goes through task_lock().

    * get_io_context() now means incrementing ref on an ioc which the
    caller already has access to (be that an explicit refcnt or implicit
    %current one).

    * PF_EXITING inhibits creation of new io_context and once
    exit_io_context() is finished, it's guaranteed that both ioc
    acquisition functions return %NULL.

    * All users are updated. Most are trivial but
    smp_read_barrier_depends() removal from cfq_get_io_context() needs a
    bit of explanation. I suppose the original intention was to ensure
    ioc->ioprio is visible when set_task_ioprio() allocates a new
    io_context and installs it; however, this wouldn't have worked
    because set_task_ioprio() doesn't have a wmb between init and
    install. There are other problems with this which will be fixed in
    another patch.

    * While at it, use NUMA_NO_NODE instead of -1 for wildcard node
    specification.
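
    A sketch of the resulting get path (simplified;
    create_task_io_context() stands in for the shared allocation path
    and the exact code may differ):

        struct io_context *get_task_io_context(struct task_struct *task,
                                               gfp_t gfp_flags, int node)
        {
                struct io_context *ioc;

                task_lock(task);
                if (task->flags & PF_EXITING) {
                        /* exit_io_context() has run or is running -
                         * never install a new ioc past this point */
                        task_unlock(task);
                        return NULL;
                }
                ioc = task->io_context;
                if (ioc) {
                        get_io_context(ioc);    /* plain ref increment */
                        task_unlock(task);
                        return ioc;
                }
                task_unlock(task);

                /* slow path: allocate, then install under task_lock() */
                return create_task_io_context(task, gfp_flags, node);
        }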

    -v2: Vivek spotted contamination from debug patch. Removed.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     

10 Dec, 2011

1 commit


09 Dec, 2011

3 commits


07 Dec, 2011

3 commits

  • perf_event_sched_in() shouldn't try to schedule task events if there
    are none; otherwise the task's ctx->is_active will be set and will
    not be cleared during sched_out. This will prevent newly added
    events from being scheduled into the task context.
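
    In code terms the idea is roughly this (a sketch, reusing the names
    from the referenced commit):

        /* only a context that actually has events becomes task_ctx */
        if (ctx->nr_events)
                cpuctx->task_ctx = ctx;

        /* schedule in the (possibly NULL) task_ctx, so an empty
         * context never gets ctx->is_active set */
        perf_event_sched_in(cpuctx, cpuctx->task_ctx, task);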

    Fixes a boo-boo in commit 1d5f003f5a9 ("perf: Do not set task_ctx
    pointer in cpuctx if there are no events in the context").

    Signed-off-by: Gleb Natapov
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111122140821.GF2557@redhat.com
    Signed-off-by: Ingo Molnar

    Gleb Natapov
     
  • * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ftrace: Fix hash record accounting bug
    perf: Fix parsing of __print_flags() in TP_printk()
    jump_label: jump_label_inc may return before the code is patched
    ftrace: Remove force undef config value left for testing
    tracing: Restore system filter behavior
    tracing: fix event_subsystem ref counting

    Linus Torvalds
     
  • Since commit f59de89 ("lockdep: Clear whole lockdep_map on initialization"),
    lockdep_init_map() clears the whole struct. But this breaks
    lock_set_class()/lock_set_subclass(). A typical race condition
    looks like this:

    CPU A                              CPU B
    lock_set_subclass(lockA);
                                       lock_set_class(lockA);
                                         lockdep_init_map(lockA);
                                           /* lockA->name is cleared */
                                           memset(lockA);
    __lock_acquire(lockA);
      /* lockA->class_cache[] is cleared */
      register_lock_class(lockA);
        look_up_lock_class(lockA);
          WARN_ON_ONCE(class->name !=
                       lock->name);

                                           lock->name = name;

    So restore what we did before commit f59de89 but annotate
    ->lock with kmemcheck_mark_initialized() to suppress the kmemcheck
    warning reported in commit f59de89.
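
    i.e., roughly (a sketch of the shape of the fix):

        /* annotate the map for kmemcheck instead of memset()ing it */
        kmemcheck_mark_initialized(lock, sizeof(*lock));
        /* ...then initialize the fields individually, as before
         * commit f59de89... */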

    Reported-by: Sergey Senozhatsky
    Reported-by: Borislav Petkov
    Suggested-by: Vegard Nossum
    Signed-off-by: Yong Zhang
    Cc: Tejun Heo
    Cc: David Rientjes
    Cc:
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111109080451.GB8124@zhy
    Signed-off-by: Ingo Molnar

    Yong Zhang
     

06 Dec, 2011

10 commits

  • The expiry function compares the timer against the current time and
    does not expire the timer when the expiry time is >= now. That's
    wrong. If the timer is set for now, then it must expire.

    Make the condition expiry > now for breaking out of the loop.
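
    In code, the break condition in the expiry loop changes roughly like
    this (a sketch; the exact field names in the alarmtimer code may
    differ):

        if (expires > now)      /* was: expires >= now, so a timer due
                                 * exactly at 'now' never fired */
                break;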

    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: stable@kernel.org

    Thomas Gleixner
     
  • * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf: Fix loss of notification with multi-event
    perf, x86: Force IBS LVT offset assignment for family 10h
    perf, x86: Disable PEBS on SandyBridge chips
    trace_events_filter: Use rcu_assign_pointer() when setting ftrace_event_call->filter
    perf session: Fix crash with invalid CPU list
    perf python: Fix undefined symbol problem
    perf/x86: Enable raw event access to Intel offcore events
    perf: Don't use -ENOSPC for out of PMU resources
    perf: Do not set task_ctx pointer in cpuctx if there are no events in the context
    perf/x86: Fix PEBS instruction unwind
    oprofile, x86: Fix crash when unloading module (nmi timer mode)
    oprofile: Fix crash when unloading module (hr timer mode)

    Linus Torvalds
     
  • * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    clockevents: Set noop handler in clockevents_exchange_device()
    tick-broadcast: Stop active broadcast device when replacing it
    clocksource: Fix bug with max_deferment margin calculation
    rtc: Fix some bugs that allowed accumulating time drift in suspend/resume
    rtc: Disable the alarm in the hardware

    Linus Torvalds
     
  • …ernel.org/pub/scm/linux/kernel/git/tip/tip

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    slab, lockdep: Fix silly bug

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq: Fix race condition when stopping the irq thread

    Linus Torvalds
     
  • * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched, x86: Avoid unnecessary overflow in sched_clock
    sched: Fix buglet in return_cfs_rq_runtime()
    sched: Avoid SMT siblings in select_idle_sibling() if possible
    sched: Set the command name of the idle tasks in SMP kernels
    sched, rt: Provide means of disabling cross-cpu bandwidth sharing
    sched: Document wait_for_completion_*() return values
    sched_fair: Fix a typo in the comment describing update_sd_lb_stats
    sched: Add a comment to effective_load() since it's a pain

    Linus Torvalds
     
  • If the set_ftrace_filter file is cleared by writing just whitespace
    to it, then the filter hash refcounts will be decremented but not
    updated. This causes two bugs:

    1) No functions will be enabled for tracing when they all should be

    2) If the user clears the set_ftrace_filter twice, it will crash ftrace:

    ------------[ cut here ]------------
    WARNING: at /home/rostedt/work/git/linux-trace.git/kernel/trace/ftrace.c:1384 __ftrace_hash_rec_update.part.27+0x157/0x1a7()
    Modules linked in:
    Pid: 2330, comm: bash Not tainted 3.1.0-test+ #32
    Call Trace:
    [] warn_slowpath_common+0x83/0x9b
    [] warn_slowpath_null+0x1a/0x1c
    [] __ftrace_hash_rec_update.part.27+0x157/0x1a7
    [] ? ftrace_regex_release+0xa7/0x10f
    [] ? kfree+0xe5/0x115
    [] ftrace_hash_move+0x2e/0x151
    [] ftrace_regex_release+0xba/0x10f
    [] fput+0xfd/0x1c2
    [] filp_close+0x6d/0x78
    [] sys_dup3+0x197/0x1c1
    [] sys_dup2+0x4f/0x54
    [] system_call_fastpath+0x16/0x1b
    ---[ end trace 77a3a7ee73794a02 ]---

    Link: http://lkml.kernel.org/r/20111101141420.GA4918@debian

    Reported-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • If cpu A calls jump_label_inc() just after atomic_add_return() is
    called by cpu B, atomic_inc_not_zero() will return a value greater
    than zero and jump_label_inc() will return to the caller before
    jump_label_update() finishes its job on cpu B.
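
    A sketch of the racy path (simplified, not the exact source):

        void jump_label_inc(struct jump_label_key *key)
        {
                if (atomic_inc_not_zero(&key->enabled))
                        return; /* BUG: enabled may already be non-zero
                                 * while cpu B is still inside
                                 * jump_label_update(), so we return
                                 * before the code is patched */

                jump_label_lock();
                if (atomic_add_return(1, &key->enabled) == 1)
                        jump_label_update(key, JUMP_LABEL_ENABLE);
                jump_label_unlock();
        }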

    Link: http://lkml.kernel.org/r/20111018175551.GH17571@redhat.com

    Cc: stable@vger.kernel.org
    Cc: Peter Zijlstra
    Acked-by: Jason Baron
    Signed-off-by: Gleb Natapov
    Signed-off-by: Steven Rostedt

    Gleb Natapov
     
  • A forced undef of a config value was used for testing and was
    accidentally left in during the final commit. This causes x86 to
    run slower than needed while running function tracing, and also
    causes the function graph selftest to fail when DYNAMIC_FTRACE
    is not set. This is because the code in MCOUNT expects the ftrace
    code to be processed with the config value set - the very value
    that was forced not set.

    The forced config option was left in by:
    commit 6331c28c962561aee59e5a493b7556a4bb585957
    ftrace: Fix dynamic selftest failure on some archs

    Link: http://lkml.kernel.org/r/20111102150255.GA6973@debian

    Cc: stable@vger.kernel.org
    Reported-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Though not all events have the field 'prev_pid', it was allowed to do this:

    # echo 'prev_pid == 100' > events/sched/filter

    but commit 75b8e98263fdb0bfbdeba60d4db463259f1fe8a2 (tracing/filter: Swap
    entire filter of events) broke it without any reason.

    Link: http://lkml.kernel.org/r/4EAF46CF.8040408@cn.fujitsu.com

    Signed-off-by: Li Zefan
    Signed-off-by: Steven Rostedt

    Li Zefan
     
  • Fix a bug introduced by e9dbfae5, which prevents event_subsystem from
    ever being released.

    Ref_count was added to keep track of subsystem users, not for
    counting events. A subsystem is created with ref_count = 1, so there
    is no need to increment it for every event; we have nr_events for
    that. Fix this by touching ref_count only when we actually have a
    new user - subsystem_open().

    Cc: stable@vger.kernel.org
    Signed-off-by: Ilya Dryomov
    Link: http://lkml.kernel.org/r/1320052062-7846-1-git-send-email-idryomov@gmail.com
    Signed-off-by: Steven Rostedt

    Ilya Dryomov
     

05 Dec, 2011

2 commits

  • …rostedt/linux-trace into perf/urgent

    Ingo Molnar
     
  • When you do:
    $ perf record -e cycles,cycles,cycles noploop 10

    You expect about 10,000 samples for each event, i.e., 10s at
    1000 samples/sec. However, this is not what's happening. You
    get far fewer samples, maybe 3700 samples/event:

    $ perf report -D | tail -15
    Aggregated stats:
           TOTAL events:      10998
            MMAP events:         66
            COMM events:          2
          SAMPLE events:      10930
    cycles stats:
           TOTAL events:       3644
          SAMPLE events:       3644
    cycles stats:
           TOTAL events:       3642
          SAMPLE events:       3642
    cycles stats:
           TOTAL events:       3644
          SAMPLE events:       3644

    On an Intel Nehalem or even AMD64, there are 4 counters capable
    of measuring cycles, so there is plenty of space to measure those
    events without multiplexing (even with the NMI watchdog active).
    And even with multiplexing, we'd expect roughly the same number
    of samples per event.

    The root of the problem was that when the event that caused the buffer
    to become full was not the first event passed on the cmdline, the user
    notification would get lost. The notification was sent to the file
    descriptor of the overflowed event but the perf tool was not polling
    on it. The perf tool aggregates all samples into a single buffer,
    i.e., the buffer of the first event. Consequently, it assumes
    notifications for any event will come via that descriptor.

    The seemingly straightforward solution of moving the waitq into the
    ringbuffer object doesn't work because of life-time issues. One could
    perf_event_set_output() on an fd that you're also blocking on and
    cause the old rb object to be freed while its waitq would still be
    referenced by the blocked thread -> FAIL.

    Therefore link all events to the ringbuffer and broadcast the wakeup
    from the ringbuffer object to all possible events that could be waited
    upon. This is rather ugly, and we're open to better solutions but it
    works for now.
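
    The broadcast roughly takes this shape (a sketch based on the
    description above; exact names may differ):

        static void ring_buffer_wakeup(struct perf_event *event)
        {
                struct ring_buffer *rb;
                struct perf_event *iter;

                rcu_read_lock();
                rb = rcu_dereference(event->rb);
                list_for_each_entry_rcu(iter, &rb->event_list, rb_entry)
                        wake_up_all(&iter->waitq);
                rcu_read_unlock();
        }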

    Reported-by: Stephane Eranian
    Finished-by: Stephane Eranian
    Reviewed-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

02 Dec, 2011

5 commits

  • If a device is shut down, there might still be a pending interrupt,
    which will be processed after we reenable interrupts and which
    causes the original handler to be run. If the old handler is the
    (broadcast) periodic handler, the shutdown state might hang the
    kernel completely.
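
    Per the 'timers-urgent' summary above, the fix is to set a noop
    handler in clockevents_exchange_device() (sketch):

        /* a late interrupt from the released device now hits a noop
         * handler instead of the old (e.g. periodic) one */
        released->event_handler = clockevents_handle_noop;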

    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org

    Thomas Gleixner
     
  • When a better rated broadcast device is installed, the currently
    active device is not disabled, which results in two running
    broadcast devices.

    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org

    Thomas Gleixner
     
  • In irq_wait_for_interrupt(), the should_stop member is verified before
    setting the task's state to TASK_INTERRUPTIBLE and calling schedule().
    In case kthread_stop sets should_stop and wakes up the process after
    should_stop is checked by the irq thread but before the task's state
    is changed, the irq thread might never exit:

    kthread_stop                     irq_wait_for_interrupt
    ------------                     ----------------------
                                     ...
    ...                              while (!kthread_should_stop()) {
    kthread->should_stop = 1;
    wake_up_process(k);
    wait_for_completion(&kthread->exited);
    ...
                                         set_current_state(TASK_INTERRUPTIBLE);

                                         ...

                                         schedule();
                                     }

    Fix this by checking if the thread should stop after modifying the
    task's state.
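
    With the fix, the wait loop takes roughly this shape (a simplified
    sketch of irq_wait_for_interrupt()):

        set_current_state(TASK_INTERRUPTIBLE);
        while (!kthread_should_stop()) {
                /* the stop flag is re-tested after the state change,
                 * so a wakeup from kthread_stop() is no longer lost */
                if (test_and_clear_bit(IRQTF_RUNTHREAD,
                                       &action->thread_flags)) {
                        __set_current_state(TASK_RUNNING);
                        return 0;
                }
                schedule();
                set_current_state(TASK_INTERRUPTIBLE);
        }
        __set_current_state(TASK_RUNNING);
        return -1;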

    [ tglx: Simplified it a bit ]

    Signed-off-by: Ido Yariv
    Link: http://lkml.kernel.org/r/1322740508-22640-1-git-send-email-ido@wizery.com
    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org

    Ido Yariv
     
  • ftrace_event_call->filter is sched RCU protected but didn't use
    rcu_assign_pointer(). Use it.

    TODO: Add proper __rcu annotation to call->filter and all its users.

    -v2: Use RCU_INIT_POINTER() for %NULL clearing as suggested by Eric.
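
    The pattern, for reference (a minimal sketch):

        /* publish: RCU readers must see a fully initialized filter,
         * so the assignment needs the write barrier variant */
        rcu_assign_pointer(call->filter, filter);

        /* clearing to NULL publishes no new data - no barrier needed */
        RCU_INIT_POINTER(call->filter, NULL);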

    Link: http://lkml.kernel.org/r/20111123164949.GA29639@google.com

    Cc: Eric Dumazet
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: stable@kernel.org # (2.6.39+)
    Signed-off-by: Tejun Heo
    Signed-off-by: Steven Rostedt

    Tejun Heo
     
  • In order to leave a margin of 12.5% (1/8) we should shift right by
    3 (>> 3), not by 5 (>> 5 leaves only a 1/32, i.e. ~3%, margin).
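
    In other words (a sketch of the corrected calculation; variable
    names assumed):

        /* leave a 12.5% (1/8) safety margin on the max deferment */
        return max_nsecs - (max_nsecs >> 3);    /* was: >> 5 */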

    CC: stable@kernel.org
    Signed-off-by: Yang Honggang (Joseph)
    [jstultz: Modified commit subject]
    Signed-off-by: John Stultz

    Yang Honggang (Joseph)
     

30 Nov, 2011

1 commit

  • * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    PM: Update comments describing device power management callbacks
    PM / Sleep: Update documentation related to system wakeup
    PM / Runtime: Make documentation follow the new behavior of irq_safe
    PM / Sleep: Correct inaccurate information in devices.txt
    PM / Domains: Document how PM domains are used by the PM core
    PM / Hibernate: Do not leak memory in error/test code paths

    Linus Torvalds
     

29 Nov, 2011

4 commits


25 Nov, 2011

1 commit

  • 2d3cbf8b (cgroup_freezer: update_freezer_state() does incorrect state
    transitions) removed is_task_frozen_enough and replaced it with a
    simple frozen() call. This, however, breaks freezing for a group
    with stopped tasks, because those cannot be frozen and so the group
    remains in the CGROUP_FREEZING state (update_if_frozen doesn't count
    stopped tasks) and never reaches CGROUP_FROZEN.

    Let's add is_task_frozen_enough back and use it at the original locations
    (update_if_frozen and try_to_freeze_cgroup). Semantically we consider
    stopped tasks as frozen enough so we should consider both cases when
    testing frozen tasks.
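
    The restored helper boils down to this (a sketch reconstructed from
    the description above):

        static bool is_task_frozen_enough(struct task_struct *task)
        {
                /* a stopped/traced task that is freezing counts as
                 * frozen enough - it cannot enter the refrigerator */
                return frozen(task) ||
                       (task_is_stopped_or_traced(task) && freezing(task));
        }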

    Testcase:
    mkdir /dev/freezer
    mount -t cgroup -o freezer none /dev/freezer
    mkdir /dev/freezer/foo
    sleep 1h &
    pid=$!
    kill -STOP $pid
    echo $pid > /dev/freezer/foo/tasks
    echo FROZEN > /dev/freezer/foo/freezer.state
    while true
    do
        cat /dev/freezer/foo/freezer.state
        [ "`cat /dev/freezer/foo/freezer.state`" = "FROZEN" ] && break
        sleep 1
    done
    echo OK

    Signed-off-by: Michal Hocko
    Acked-by: Li Zefan
    Cc: Tomasz Buchert
    Cc: Paul Menage
    Cc: Andrew Morton
    Cc: stable@kernel.org
    Signed-off-by: Tejun Heo

    Michal Hocko
     

24 Nov, 2011

1 commit

  • The hibernation core code forgets to release memory preallocated
    for hibernation if there's an error in its early stages or if test
    modes causing hibernation_snapshot() to return early are used. This
    causes the system to be hardly usable, because the amount of
    preallocated memory is usually huge. Fix this problem.

    Reported-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Srivatsa S. Bhat

    Rafael J. Wysocki
     

23 Nov, 2011

1 commit


21 Nov, 2011

1 commit

  • * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    PM / Suspend: Fix bug in suspend statistics update
    PM / Hibernate: Fix the early termination of test modes
    PM / shmobile: Fix build of sh7372_pm_init() for CONFIG_PM unset
    PM Sleep: Do not extend wakeup paths to devices with ignore_children set
    PM / driver core: disable device's runtime PM during shutdown
    PM / devfreq: correct Kconfig dependency
    PM / devfreq: fix use after free in devfreq_remove_device
    PM / shmobile: Avoid restoring the INTCS state during initialization
    PM / devfreq: Remove compiler error after irq.h update
    PM / QoS: Properly use the WARN() macro in dev_pm_qos_add_request()
    PM / Clocks: Only disable enabled clocks in pm_clk_suspend()
    ARM: mach-shmobile: sh7372 A3SP no_suspend_console fix
    PM / shmobile: Don't skip debugging output in pd_power_up()

    Linus Torvalds
     

19 Nov, 2011

3 commits

  • After commit 2a77c46de1e3dace73745015635ebbc648eca69c
    (PM / Suspend: Add statistics debugfs file for suspend to RAM)
    a missing pair of braces inside the state_store() function causes even
    invalid arguments to suspend to be wrongly treated as failed suspend
    attempts. Fix this.
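
    Roughly, the statistics update needs to live inside the braces of
    the valid-state branch (a sketch; the exact code may differ):

        if (state < PM_SUSPEND_MAX && *s) {
                error = pm_suspend(state);
                if (error) {
                        suspend_stats.fail++;
                        dpm_save_failed_errno(error);
                } else {
                        suspend_stats.success++;
                }
        }
        /* before the fix, the stats update sat outside these braces,
         * so an invalid argument also bumped the failure count */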

    [rjw: Put the hash/subject of the buggy commit into the changelog.]

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     
  • __remove_hrtimer() attempts to reprogram the clockevent device when
    the timer being removed is the next to expire. However,
    __remove_hrtimer() reprograms the clockevent *before* removing the
    timer from the timerqueue and thus when hrtimer_force_reprogram()
    finds the next timer to expire it finds the timer we're trying to
    remove.

    This is especially noticeable when the system switches to NOHZ mode
    and the system tick is removed. The timer tick is removed from the
    system, but the clockevent is programmed to wake up another HZ
    period later anyway.

    Silence the extra wakeup by removing the timer from the timerqueue
    before calling hrtimer_force_reprogram() so that we actually program
    the clockevent for the next timer to expire.
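
    A sketch of the corrected ordering in __remove_hrtimer()
    (simplified):

        /* note whether this timer is the next to expire... */
        next_timer = timerqueue_getnext(&base->active);
        /* ...remove it from the queue first... */
        timerqueue_del(&base->active, &timer->node);
        /* ...so that reprogramming sees the real next expiry */
        if (reprogram && &timer->node == next_timer)
                hrtimer_force_reprogram(base->cpu_base, 1);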

    This was broken by 998adc3 "hrtimers: Convert hrtimers to use
    timerlist infrastructure".

    Signed-off-by: Jeff Ohlstein
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1321660030-8520-1-git-send-email-johlstei@codeaurora.org
    Signed-off-by: Thomas Gleixner

    Jeff Ohlstein
     
  • Commit 2aede851ddf08666f68ffc17be446420e9d2a056
    (PM / Hibernate: Freeze kernel threads after preallocating memory)
    postponed the freezing of kernel threads to after preallocating
    memory for hibernation. But while doing that, the hibernation test
    TEST_FREEZER and the test mode HIBERNATION_TESTPROC were not moved
    accordingly.

    As a result, when using these test modes, hibernation only goes up
    to the freezing of userspace and exits, when in fact it should go
    to the very end of the task-freezing stage, namely the freezing of
    kernel threads as well.

    So, move these points of exit to the appropriate places so that
    freezing of kernel threads is also tested when using these test
    harnesses.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     

18 Nov, 2011

2 commits

  • ktime_get and ktime_get_ts were calling timekeeping_get_ns()
    but were not then calling arch_gettimeoffset(), so architectures
    using this mechanism returned 0 ns when calling these functions.
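
    The fix essentially folds the arch offset back in (a sketch of the
    idea):

        /* in ktime_get()/ktime_get_ts() */
        nsecs = timekeeping_get_ns();
        /* if the arch requires it, add in the legacy gettimeoffset() */
        nsecs += arch_gettimeoffset();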

    This happened, for example, when running Busybox's ping, which calls
    syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts) and eventually
    reaches ktime_get. As a result, the returned ping travel time was
    zero.

    CC: stable@kernel.org
    Signed-off-by: Hector Palacios
    Signed-off-by: John Stultz

    Hector Palacios
     
  • The power management functions related to interrupts do not know
    (yet) about per-cpu interrupts and end up calling the wrong
    low-level methods to enable/disable interrupts.

    This leads to all kinds of interesting issues (action taken on one
    CPU only, updating a refcount which is not otherwise used...).

    The workaround for the time being is simply to flag these interrupts
    with IRQF_NO_SUSPEND. At least on ARM, these interrupts are actually
    dealt with at the architecture level.
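
    For instance, such a per-cpu timer interrupt would now be requested
    along these lines (handler and name are hypothetical, for
    illustration only):

        err = request_irq(irq, twd_handler,
                          IRQF_PERCPU | IRQF_NO_SUSPEND,
                          "twd", dev_id);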

    Reported-by: Santosh Shilimkar
    Tested-by: Santosh Shilimkar
    Signed-off-by: Marc Zyngier
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/1321446459-31409-1-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner

    Marc Zyngier