25 Apr, 2011
1 commit
-
When a task is traced and is in a stopped state, the tracer
may execute a ptrace request to examine the tracee state and
get its task struct. Right after, the tracee can be killed
and thus its breakpoints released.
This can happen concurrently when the tracer is in the middle
of reading or modifying these breakpoints, leading to dereferencing
a freed pointer.

Hence, to prepare the fix, create a generic breakpoint reference
holding API. When a reference on the breakpoints of a task is
held, the breakpoints won't be released until the last reference
is dropped. After that, no more ptrace requests on the task's
breakpoints can be serviced for the tracer.

Reported-by: Oleg Nesterov
Signed-off-by: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: Prasad
Cc: Paul Mundt
Cc: v2.6.33..
Link: http://lkml.kernel.org/r/1302284067-7860-2-git-send-email-fweisbec@gmail.com
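A reference-holding API of this kind might look roughly as follows (a simplified sketch; the per-task refcount field name ptrace_bp_refcnt is illustrative, flush_ptrace_hw_breakpoint() is the existing release helper):

    /* Sketch: the count starts at 1; it reaches zero exactly once. */
    int ptrace_get_breakpoints(struct task_struct *tsk)
    {
        /* Fails once the count has already hit zero, i.e. the
         * breakpoints are gone and no new reference may be taken. */
        if (atomic_inc_not_zero(&tsk->ptrace_bp_refcnt))
            return 0;
        return -1;
    }

    void ptrace_put_breakpoints(struct task_struct *tsk)
    {
        /* The last reference actually releases the breakpoints. */
        if (atomic_dec_and_test(&tsk->ptrace_bp_refcnt))
            flush_ptrace_hw_breakpoint(tsk);
    }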
31 Mar, 2011
1 commit
-
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi
10 Mar, 2011
1 commit
-
This patch adds support for creating a queuing context outside
of the queue itself. This enables us to batch up pieces of IO
before grabbing the block device queue lock and submitting them to
the IO scheduler.

The context is created on the stack of the process and assigned in
the task structure, so that we can auto-unplug it if we hit a schedule
event.

The current queue plugging happens implicitly if IO is submitted to
an empty device, yet callers have to remember to unplug that IO when
they are going to wait for it. This is an ugly API and has caused bugs
in the past. Additionally, it requires hacks in the vm (->sync_page()
callback) to handle that logic. By switching to an explicit plugging
scheme we make the API a lot nicer and can get rid of the ->sync_page()
hack in the vm.

Signed-off-by: Jens Axboe
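From the caller's side, explicit plugging boils down to bracketing a batch of submissions with an on-stack plug (a minimal usage sketch; submit_batch_of_io() stands in for the caller's own submission loop):

    struct blk_plug plug;

    blk_start_plug(&plug);    /* attach an on-stack plug to current */
    submit_batch_of_io();     /* bios collect in the plug list */
    blk_finish_plug(&plug);   /* flush the whole batch to the scheduler */

If the task schedules with the plug still set, the block layer flushes it automatically, which is what makes the on-stack context safe to leave armed across blocking operations.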
12 Jan, 2011
1 commit
-
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits)
perf session: Fix infinite loop in __perf_session__process_events
perf evsel: Support perf_evsel__open(cpus > 1 && threads > 1)
perf sched: Use PTHREAD_STACK_MIN to avoid pthread_attr_setstacksize() fail
perf tools: Emit clearer message for sys_perf_event_open ENOENT return
perf stat: better error message for unsupported events
perf sched: Fix allocation result check
perf, x86: P4 PMU - Fix unflagged overflows handling
dynamic debug: Fix build issue with older gcc
tracing: Fix TRACE_EVENT power tracepoint creation
tracing: Fix preempt count leak
tracepoint: Add __rcu annotation
tracing: remove duplicate null-pointer check in skb tracepoint
tracing/trivial: Add missing comma in TRACE_EVENT comment
tracing: Include module.h in define_trace.h
x86: Save rbp in pt_regs on irq entry
x86, dumpstack: Fix unused variable warning
x86, NMI: Clean-up default_do_nmi()
x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU
x86, NMI: Remove DIE_NMI_IPI
x86, NMI: Add priorities to handlers
...
07 Jan, 2011
1 commit
-
In particular this patch moves perf_event_exit_task() before
cgroup_exit() to allow for cgroup support. The cgroup_exit()
function detaches the cgroups attached to a task.

Other movements include hoisting some definitions and inlines
to the top of perf_event.c.

Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
17 Dec, 2010
1 commit
-
__get_cpu_var() can be replaced with this_cpu_read() and will then use a
single read instruction with implied address calculation to access the
correct per-cpu instance.

However, the address of a per-cpu variable passed to __this_cpu_read()
cannot be determined (since it's an implied address conversion through
segment prefixes). Therefore apply this only to uses of __get_cpu_var
where the address of the variable is not used.

Cc: Pekka Enberg
Cc: Hugh Dickins
Cc: Thomas Gleixner
Acked-by: H. Peter Anvin
Signed-off-by: Christoph Lameter
Signed-off-by: Tejun Heo
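As a concrete example of the transformation (the per-cpu variable process_counts is illustrative):

    /* Before: __get_cpu_var() forms the per-cpu address, then loads. */
    nr = __get_cpu_var(process_counts);

    /* After: a single segment-prefixed load, no separate address step. */
    nr = __this_cpu_read(process_counts);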
03 Dec, 2010
1 commit
-
If a user manages to trigger an oops with fs set to KERNEL_DS, fs is not
otherwise reset before do_exit(). do_exit may later (via mm_release in
fork.c) do a put_user to a user-controlled address, potentially allowing
a user to leverage an oops into a controlled write into kernel memory.

This is only triggerable in the presence of another bug, but this
potentially turns a lot of DoS bugs into privilege escalations, so it's
worth fixing. I have proof-of-concept code which uses this bug along
with CVE-2010-3849 to write a zero to an arbitrary kernel address, so
I've tested that this is not theoretical.

A more logical place to put this fix might be when we know an oops has
occurred, before we call do_exit(), but that would involve changing
every architecture, in multiple places.

Let's just stick it in do_exit instead.

[akpm@linux-foundation.org: update code comment]
[akpm@linux-foundation.org: update code comment]
Signed-off-by: Nelson Elhage
Cc: KOSAKI Motohiro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
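Concretely, the fix amounts to a reset near the top of do_exit() (a sketch of the idea):

    /* In do_exit(), before anything that can reach put_user()
     * (e.g. mm_release() writing to clear_child_tid): */
    set_fs(USER_DS);    /* undo any KERNEL_DS left over from an oops */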
06 Nov, 2010
1 commit
-
posix-cpu-timers.c correctly assumes that the dying process does
posix_cpu_timers_exit_group() and removes all !CPUCLOCK_PERTHREAD
timers from the signal->cpu_timers list.

But it also assumes that timer->it.cpu.task is always the group
leader, and thus the dead ->task means the dead thread group.

This is obviously not true after de_thread() changes the leader.
After that almost every posix_cpu_timer_ method has problems.

It is not simple to fix this bug correctly. First of all, I think
that timer->it.cpu should use struct pid instead of task_struct.
Also, the locking should be reworked completely. In particular,
tasklist_lock should not be used at all. This all needs a lot of
nontrivial and hard-to-test changes.

Change __exit_signal() to do posix_cpu_timers_exit_group() when
the old leader dies during exec. This is not the fix, just a
temporary hack to hide the problem for 2.6.37 and stable. IOW,
this is obviously wrong but this is what we currently have anyway:
cpu timers do not work after mt exec.

In theory this change adds another race. The exiting leader can
detach the timers which were attached to the new leader. However,
the window between de_thread() and release_task() is small; we
can pretend that sys_timer_create() was called before de_thread().

Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds
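The hack in __exit_signal() looks roughly like this (a sketch; has_group_leader_pid() stands for "this thread still owns the group-leader pid", which a non-leader can only do when it is the old leader dying inside de_thread()):

    posix_cpu_timers_exit(tsk);
    if (group_dead) {
        posix_cpu_timers_exit_group(tsk);
    } else {
        /* The group is still alive but this exiting thread holds the
         * group leader's pid: only possible when the old leader dies
         * during exec. Detach the group timers here too. */
        if (unlikely(has_group_leader_pid(tsk)))
            posix_cpu_timers_exit_group(tsk);
    }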
28 Oct, 2010
1 commit
-
find_new_reaper() releases and regrabs tasklist_lock but was missing
proper annotations. Add them. This removes the following sparse warning:

warning: context imbalance in 'find_new_reaper' - unexpected unlock
Signed-off-by: Namhyung Kim
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
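The annotation just tells sparse that the function temporarily drops the lock it was called with (sketch):

    static struct task_struct *find_new_reaper(struct task_struct *father)
        __releases(&tasklist_lock)
        __acquires(&tasklist_lock)
    {
        /* may write_unlock_irq(&tasklist_lock) and retake it before
         * returning; the annotations keep sparse's lock context
         * balanced across the function */
        ...
    }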
27 Oct, 2010
1 commit
-
It's pointless to kill a task if another thread sharing its mm cannot be
killed to allow future memory freeing. A subsequent patch will prevent
kills in such cases, but first it's necessary to have a way to flag a task
that shares memory with an OOM_DISABLE task without incurring an
additional tasklist scan, which would make select_bad_process() an O(n^2)
function.

This patch adds an atomic counter to struct mm_struct that tracks how
many threads attached to it have an oom_score_adj of OOM_SCORE_ADJ_MIN.
They cannot be killed by the kernel, so their memory cannot be freed in
oom conditions.

This only requires task_lock() on the task that we're operating on; it
does not require mm->mmap_sem, since task_lock() pins the mm and the
operation is atomic.

[rientjes@google.com: changelog and sys_unshare() code]
[rientjes@google.com: protect oom_disable_count with task_lock in fork]
[rientjes@google.com: use old_mm for oom_disable_count in exec]
Signed-off-by: Ying Han
Signed-off-by: David Rientjes
Cc: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
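A sketch of the bookkeeping described above (the counter name follows the changelog; the surrounding code is illustrative):

    /* struct mm_struct gains: atomic_t oom_disable_count; */

    /* e.g. when a thread's oom_score_adj drops to OOM_SCORE_ADJ_MIN: */
    task_lock(task);                /* pins task->mm */
    if (task->mm)
        atomic_inc(&task->mm->oom_disable_count);
    task_unlock(task);

    /* select_bad_process() can then skip the whole mm in O(1): */
    if (atomic_read(&mm->oom_disable_count))
        continue;   /* an unkillable thread shares this mm */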
10 Sep, 2010
1 commit
-
I missed a perf_event_ctxp user when converting it to an array. Pull this
last user into perf_event.c as well and fix it up.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
18 Aug, 2010
1 commit
-
Using a program like the following:

#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/wait.h>

int main() {
    id_t id;
    siginfo_t infop;
    pid_t res;

    id = fork();
    if (id == 0) { sleep(1); exit(0); }
    kill(id, SIGSTOP);
    alarm(1);
    waitid(P_PID, id, &infop, WCONTINUED);
    return 0;
}

to call waitid() on a stopped process results in access to the child task's
credentials without the RCU read lock being held - which may be replaced in the
meantime - eliciting the following warning:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1460 invoked rcu_dereference_check() without protection!

other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
2 locks held by waitid02/22252:
#0: (tasklist_lock){.?.?..}, at: [] do_wait+0xc5/0x310
#1: (&(&sighand->siglock)->rlock){-.-...}, at: []
wait_consider_task+0x19a/0xbe0

stack backtrace:
Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3
Call Trace:
[] lockdep_rcu_dereference+0xa4/0xc0
[] wait_consider_task+0xaf1/0xbe0
[] do_wait+0xf5/0x310
[] sys_waitid+0x86/0x1f0
[] ? child_wait_callback+0x0/0x70
[] system_call_fastpath+0x16/0x1b

This is fixed by holding the RCU read lock in wait_task_continued() to ensure
that the task's current credentials aren't destroyed between us reading the
cred pointer and us reading the UID from those credentials.

Furthermore, protect wait_task_stopped() in the same way.

We don't need to keep holding the RCU read lock once we've read the UID from
the credentials as holding the RCU read lock doesn't stop the target task from
changing its creds under us - so the credentials may be outdated immediately
after we've read the pointer, lock or no lock.

Signed-off-by: Daniel J Blueman
Signed-off-by: David Howells
Acked-by: Paul E. McKenney
Acked-by: Oleg Nesterov
Signed-off-by: Linus Torvalds
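The shape of the fix, then, is simply to bracket the cred dereference with the RCU read lock (a sketch of the idea; the snapshot is taken inside the lock and used outside):

    /* in wait_task_stopped()/wait_task_continued() */
    rcu_read_lock();
    uid = __task_cred(p)->uid;   /* snapshot while the cred can't be freed */
    rcu_read_unlock();
    /* uid may be stale immediately afterwards, but it was read safely */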
11 Aug, 2010
1 commit
-
exit_ptrace() takes tasklist_lock unconditionally. We need this lock to
avoid the race with ptrace_traceme(); it acts as a barrier.

Change its caller, forget_original_parent(), to call exit_ptrace() under
tasklist_lock. Change exit_ptrace() to drop and reacquire this lock if
needed.

This allows us to add the fastpath list_empty(ptraced) check. In the
likely no-tracees case exit_ptrace() just returns and we avoid the lock()
+ unlock() sequence.

"Zhang, Yanmin" suggested adding this check, and he reports that this
change adds about 11% improvement in some tests.

Suggested-and-tested-by: "Zhang, Yanmin"
Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
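A sketch of the reworked entry (slow path elided):

    /* Called with tasklist_lock held by forget_original_parent(). */
    void exit_ptrace(struct task_struct *tracer)
    {
        if (likely(list_empty(&tracer->ptraced)))
            return;   /* fastpath: no tracees, no unlock/relock dance */

        /* slowpath: drop tasklist_lock, detach and release each
         * tracee, then reacquire the lock for the caller ... */
    }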
28 May, 2010
10 commits
-
No functional changes, just s/atomic_t count/int nr_threads/.

With the recent changes this counter has a single user, get_nr_threads().
And none of its callers needs the really accurate number of threads, not
to mention that each caller obviously races with fork/exit. It is only
used to report this value to user-space, except that first_tid() uses it
to avoid the unnecessary while_each_thread() loop in the unlikely case.

It is a bit sad we need a word in struct signal_struct for this; perhaps
we can change get_nr_threads() to approximate the number of threads using
signal->live and kill ->nr_threads later.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov
Cc: Alexey Dobriyan
Cc: "Eric W. Biederman"
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Move taskstats_tgid_free() from __exit_signal() to free_signal_struct().

This way signal->stats never points to nowhere and we can read ->stats
lockless.

Signed-off-by: Oleg Nesterov
Cc: Balbir Singh
Cc: Roland McGrath
Cc: Veaceslav Falico
Cc: Stanislaw Gruszka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Cleanup:

- Add the boolean, group_dead = thread_group_leader(), for clarity.

- Do not test/set sig == NULL to detect the all-dead case; use this
  boolean.

- Pass this boolean to __unhash_process() and use it instead of another
  thread_group_leader() call which needs ->group_leader.

This can be considered a micro-optimization, but hopefully it also
allows us to do other cleanups later.

Signed-off-by: Oleg Nesterov
Cc: Balbir Singh
Cc: Roland McGrath
Cc: Veaceslav Falico
Cc: Stanislaw Gruszka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Now that task->signal can't go away we can revert the horrible hack added
by ad474caca3e2a0550b7ce0706527ad5ab389a4d4 ("fix for
account_group_exec_runtime(), make sure ->signal can't be freed under
rq->lock").

And we can do more cleanups in sched_stats.h/posix-cpu-timers.c later.

Signed-off-by: Oleg Nesterov
Cc: Alan Cox
Cc: Ingo Molnar
Cc: Peter Zijlstra
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
When the last thread exits signal->tty is freed, but the pointer is not
cleared and points to nowhere.

This is OK. Nobody should use signal->tty lockless, and it is no longer
possible to take ->siglock. However this looks wrong even if correct, and
the nice OOPS is better than subtle and hard-to-find bugs.

Change __exit_signal() to clear signal->tty under ->siglock.

Note: __exit_signal() needs more cleanups. It should not check "sig !=
NULL" to detect the all-dead case and we have the same issues with
signal->stats.

Signed-off-by: Oleg Nesterov
Cc: Alan Cox
Cc: Ingo Molnar
Acked-by: Peter Zijlstra
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
We have a lot of problems with accessing task_struct->signal: it can
"disappear" at any moment. Even current can't use its ->signal safely
after exit_notify(). ->siglock helps, but it is not convenient, not
always possible, and sometimes it makes sense to use task->signal even
after this task is already dead.

This patch adds a reference counter, sigcnt, to signal_struct. This
reference is owned by task_struct and is dropped in
__put_task_struct(). Perhaps it makes sense to export
get/put_signal_struct() later, but currently I don't see an immediate
reason.

Rename __cleanup_signal() to free_signal_struct() and unexport it. With
the previous changes it does nothing except kmem_cache_free().

Change __exit_signal() to not clear/free ->signal; it will be freed when
the last reference to any thread in the thread group goes away.

Note:

- when the last thread exits, signal->tty can point to nowhere, see
  the next patch.

- with or without this patch signal_struct->count should go away,
  or at least it should be "int nr_threads" for fs/proc. This will
  be addressed later.

Signed-off-by: Oleg Nesterov
Cc: Alan Cox
Cc: Ingo Molnar
Cc: Peter Zijlstra
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
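The core of such refcounting might look like this (a simplified sketch; the helper names follow the changelog):

    /* signal_struct gains: atomic_t sigcnt; set to 1 in copy_signal()
     * and incremented for each CLONE_THREAD child. */

    static inline void put_signal_struct(struct signal_struct *sig)
    {
        if (atomic_dec_and_test(&sig->sigcnt))
            free_signal_struct(sig);    /* just kmem_cache_free() now */
    }

    void __put_task_struct(struct task_struct *tsk)
    {
        /* ...the reference is owned by task_struct, so ->signal goes
         * away together with the last thread's task_struct... */
        put_signal_struct(tsk->signal);
    }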
-
tty_kref_put() has two callsites in the copy_process() paths:

1. if copy_process() succeeds, it is called before we copy
   signal->tty from the parent

2. otherwise it is called from __cleanup_signal() under the
   bad_fork_cleanup_signal: label

In both cases tty_kref_put() is not right and unneeded because we don't
have the balancing tty_kref_get(). Fortunately, this is harmless because
this can only happen without CLONE_THREAD, and in this case signal->tty
must be NULL.

Remove tty_kref_put() from copy_process() and __cleanup_signal(), and
change another caller of __cleanup_signal(), __exit_signal(), to call
tty_kref_put() by hand.

I hope this change makes sense by itself, but it is also needed to make
->signal refcountable.

Signed-off-by: Oleg Nesterov
Acked-by: Alan Cox
Acked-by: Roland McGrath
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Change __exit_signal() to check thread_group_leader() instead of
atomic_dec_and_test(&sig->count). This must be equivalent: the group
leader must be released only after all other threads have exited and
passed __exit_signal().

Henceforth sig->count is not actually used, except in fs/proc for
get_nr_threads/etc.

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: Veaceslav Falico
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
de_thread() and __exit_signal() use signal_struct->count/notify_count for
synchronization. We can simplify the code and use ->notify_count only.
Instead of comparing these two counters, we can change de_thread() to set
->notify_count = nr_of_sub_threads, then change __exit_signal() to
dec-and-test this counter and notify group_exit_task.

Note that __exit_signal() checks "notify_count > 0" just for symmetry
with exit_notify(); we could just check that it is != 0.

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: Veaceslav Falico
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
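The two sides of that handshake, sketched (nr_of_sub_threads is the changelog's name for the count de_thread() waits on):

    /* de_thread(): wait for every sub-thread to pass __exit_signal() */
    sig->notify_count = nr_of_sub_threads;
    sig->group_exit_task = current;
    /* ...sleep until woken... */

    /* __exit_signal(): each exiting sub-thread decrements the count;
     * the last one wakes the execing thread */
    if (sig->notify_count > 0 && !--sig->notify_count)
        wake_up_process(sig->group_exit_task);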
-
signal_struct->count in its current form must die.

- it has no reason to be atomic_t

- it looks like a reference counter, but it is not

- otoh, we really need to make task->signal refcountable, just look at
  the extremely ugly task_rq_unlock_wait() called from __exit_signals().

- we should change the lifetime rules for task->signal, it should be
  pinned to task_struct. We have a lot of code which can be simplified
  after that.

- it is not needed! While the code is correct, any usage of this
  counter is artificial, except fs/proc uses it correctly to show the
  number of threads.

This series removes the usage of sig->count from the exit paths.

This patch:

Now that Veaceslav changed copy_signal() to use zalloc(), exit_notify()
can just check notify_count < 0 to see whether the execing sub-thread
needs the notification from us. No need to do other checks;
notify_count != 0 must always mean that ->group_exit_task != NULL is
waiting for us.

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: Veaceslav Falico
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 May, 2010
1 commit
-
Before applying this patch, cpuset updates task->mems_allowed and
mempolicy by setting all new bits in the nodemask first, and clearing all
old disallowed bits later. But in the meantime, the allocator may find
that there is no node from which to allocate memory.

The reason is that, when cpuset rebinds the task's mempolicy, it clears
the nodes on which the allocator can allocate pages, for example:

(mpol: mempolicy)
        task1                   task1's mpol    task2
        alloc page              1
          alloc on node0? NO    1
                                1               change mems from 1 to 0
                                1               rebind task1's mpol
                                0-1               set new bits
                                0                 clear disallowed bits
          alloc on node1? NO    0
          ...
        can't alloc page
          goto oom

This patch fixes the problem by expanding the node range first (setting
the newly allowed bits) and shrinking it lazily (clearing the newly
disallowed bits). So we use a variable to tell the write-side task that a
read-side task is reading the nodemask, and the write-side task clears the
newly disallowed nodes after the read-side task ends its current memory
allocation.

[akpm@linux-foundation.org: fix spello]
Signed-off-by: Miao Xie
Cc: David Rientjes
Cc: Nick Piggin
Cc: Paul Menage
Cc: Lee Schermerhorn
Cc: Hugh Dickins
Cc: Ravikiran Thirumalai
Cc: KOSAKI Motohiro
Cc: Christoph Lameter
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
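The expand-then-shrink idea, sketched with the nodemask API (the synchronization with in-flight allocations is elided; mems_allowed and newmems follow the changelog):

    /* step 1: grow - readers always see at least one allowed node */
    nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);

    /* ...wait until no read-side task is inside the allocator... */

    /* step 2: shrink lazily - now it is safe to drop the old bits */
    tsk->mems_allowed = *newmems;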
15 Apr, 2010
1 commit
-
Merge reason: merge the latest fixes, update to -rc4.
Signed-off-by: Ingo Molnar
07 Apr, 2010
1 commit
-
- We weren't zeroing p->rss_stat[] at fork()
- Consequently sync_mm_rss() was dereferencing tsk->mm for kernel
  threads and was oopsing.
- Make __sync_task_rss_stat() static, too.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=15648
[akpm@linux-foundation.org: remove the BUG_ON(!mm->rss)]
Reported-by: Troels Liebe Bentsen
Signed-off-by: KAMEZAWA Hiroyuki
"Michael S. Tsirkin"
Cc: Andrea Arcangeli
Cc: Rik van Riel
Cc: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Apr, 2010
1 commit
-
This is left over from commit 7c9414385e ("sched: Remove USER_SCHED")
Signed-off-by: Li Zefan
Acked-by: Dhaval Giani
Signed-off-by: Peter Zijlstra
Cc: David Howells
LKML-Reference:
Signed-off-by: Ingo Molnar
14 Mar, 2010
1 commit
-
Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
locking: Make sparse work with inline spinlocks and rwlocks
x86/mce: Fix RCU lockdep splats
rcu: Increase RCU CPU stall timeouts if PROVE_RCU
ftrace: Replace read_barrier_depends() with rcu_dereference_raw()
rcu: Suppress RCU lockdep warnings during early boot
rcu, ftrace: Fix RCU lockdep splat in ftrace_perf_buf_prepare()
rcu: Suppress __mpol_dup() false positive from RCU lockdep
rcu: Make rcu_read_lock_sched_held() handle !PREEMPT
rcu: Add control variables to lockdep_rcu_dereference() diagnostics
rcu, cgroup: Relax the check in task_subsys_state() as early boot is now handled by lockdep-RCU
rcu: Use wrapper function instead of exporting tasklist_lock
sched, rcu: Fix rcu_dereference() for RCU-lockdep
rcu: Make task_subsys_state() RCU-lockdep checks handle boot-time use
rcu: Fix holdoff for accelerated GPs for last non-dynticked CPU
x86/gart: Unexport gart_iommu_aperture

Fix trivial conflicts in kernel/trace/ftrace.c
07 Mar, 2010
2 commits
-
kernel/exit.c:1183:26: warning: symbol 'status' shadows an earlier one
kernel/exit.c:1173:21: originally declared here

Signed-off-by: Thiago Farina
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Considering the nature of per-mm stats, they are a shared object among
threads and can be a cache-miss point in the page fault path.

This patch adds a per-thread cache for mm_counter. The RSS value will be
counted into a struct in task_struct and synchronized with the mm's one
at events.

Now, in this patch, the event is the number of calls to handle_mm_fault.
The per-thread value is added to the mm at each 64 calls.

A rough estimation with a small benchmark on parallel threads (2 threads)
shows:

[before]
4.5 cache-misses/fault
[after]
4.0 cache-misses/fault

Anyway, the most contended object is mmap_sem if the number of threads
grows.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Minchan Kim
Cc: Christoph Lameter
Cc: Lee Schermerhorn
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
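The synchronization point might look roughly like this (a sketch; the threshold and helper names follow the changelog's description):

    #define TASK_RSS_EVENTS_THRESH  (64)

    /* Called on each handle_mm_fault(): fold this thread's cached
     * deltas into the shared mm counters only every 64th event, so
     * the hot path usually touches thread-local memory only. */
    static void check_sync_rss_stat(struct task_struct *task)
    {
        if (unlikely(task != current))
            return;
        if (unlikely(task->rss_stat.events++ > TASK_RSS_EVENTS_THRESH))
            __sync_task_rss_stat(task, task->mm);
    }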
04 Mar, 2010
1 commit
-
Lockdep-RCU commit d11c563d exported tasklist_lock, which is not
a good thing. This patch instead exports a function that uses
lockdep to check whether tasklist_lock is held.

Suggested-by: Christoph Hellwig
Signed-off-by: Paul E. McKenney
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: Christoph Hellwig
LKML-Reference:
Signed-off-by: Ingo Molnar
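Such a helper can be a thin wrapper over lockdep (sketch):

    #ifdef CONFIG_PROVE_RCU
    int lockdep_tasklist_lock_is_held(void)
    {
        return lockdep_is_held(&tasklist_lock);
    }
    EXPORT_SYMBOL_GPL(lockdep_tasklist_lock_is_held);
    #endif

    /* rcu_dereference_check() callers can then write, e.g.:
     *   rcu_dereference_check(tsk->real_parent,
     *                         lockdep_tasklist_lock_is_held());
     * keeping tasklist_lock itself unexported. */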
25 Feb, 2010
1 commit
-
Update the rcu_dereference() usages to take advantage of the new
lockdep-based checking.

Signed-off-by: Paul E. McKenney
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference:
[ -v2: fix allmodconfig missing symbol export build failure on x86 ]
Signed-off-by: Ingo Molnar
18 Dec, 2009
1 commit
-
Thanks to Roland who pointed out de_thread() issues.

Currently we add sub-threads to the ->real_parent->children list. This
buys nothing but slows down do_wait().

With this patch ->children contains only main threads (group leaders).
The only complication is that forget_original_parent() should iterate
over sub-threads by hand, and de_thread() needs another list_replace()
when it changes ->group_leader.

Henceforth do_wait_thread() can never see task_detached() && !EXIT_DEAD
tasks, so we can remove this check (and we can unify do_wait_thread() and
ptrace_do_wait()).

This change can confuse the optimistic search in mm_update_next_owner(),
but this is fixable and minor.

Perhaps badness() and oom_kill_process() should be updated, but they
should be fixed in any case.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Cc: Ingo Molnar
Cc: Ratan Nalumasu
Cc: Vitaly Mayatskikh
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Dec, 2009
1 commit
-
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Acked-by: Ingo Molnar
12 Dec, 2009
1 commit
-
There are two call points, both want to check that tty->signal->leader is
set. Move the test into disassociate_ctty() as that will make locking
changes easier in a bit.

Signed-off-by: Alan Cox
Signed-off-by: Greg Kroah-Hartman
09 Dec, 2009
1 commit
-
* 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block: (113 commits)
cfq-iosched: Do not access cfqq after freeing it
block: include linux/err.h to use ERR_PTR
cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit
blkio: Allow CFQ group IO scheduling even when CFQ is a module
blkio: Implement dynamic io controlling policy registration
blkio: Export some symbols from blkio as its user CFQ can be a module
block: Fix io_context leak after failure of clone with CLONE_IO
block: Fix io_context leak after clone with CLONE_IO
cfq-iosched: make nonrot check logic consistent
io controller: quick fix for blk-cgroup and modular CFQ
cfq-iosched: move IO controller declerations to a header file
cfq-iosched: fix compile problem with !CONFIG_CGROUP
blkio: Documentation
blkio: Wait on sync-noidle queue even if rq_noidle = 1
blkio: Implement group_isolation tunable
blkio: Determine async workload length based on total number of queues
blkio: Wait for cfq queue to get backlogged if group is empty
blkio: Propagate cgroup weight updation to cfq groups
blkio: Drop the reference to queue once the task changes cgroup
blkio: Provide some isolation between groups
...
06 Dec, 2009
1 commit
-
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
sched, cputime: Introduce thread_group_times()
sched, cputime: Cleanups related to task_times()
Revert "sched, x86: Optimize branch hint in __switch_to()"
sched: Fix isolcpus boot option
sched: Revert 498657a478c60be092208422fefa9c7b248729c2
sched, time: Define nsecs_to_jiffies()
sched: Remove task_{u,s,g}time()
sched: Introduce task_times() to replace task_{u,s}time() pair
sched: Limit the number of scheduler debug messages
sched.c: Call debug_show_all_locks() when dumping all tasks
sched, x86: Optimize branch hint in __switch_to()
sched: Optimize branch hint in context_switch()
sched: Optimize branch hint in pick_next_task_fair()
sched_feat_write(): Update ppos instead of file->f_pos
sched: Sched_rt_periodic_timer vs cpu hotplug
sched, kvm: Fix race condition involving sched_in_preempt_notifers
sched: More generic WAKE_AFFINE vs select_idle_sibling()
sched: Cleanup select_task_rq_fair()
sched: Fix granularity of task_u/stime()
sched: Fix/add missing update_rq_clock() calls
...
04 Dec, 2009
1 commit
-
With CLONE_IO, the parent's io_context->nr_tasks is incremented, but never
decremented whenever copy_process() fails afterwards, which prevents
exit_io_context() from calling the IO schedulers' exit functions.

Give a task_struct to exit_io_context(), and call exit_io_context()
instead of put_io_context() in the copy_process() cleanup path.

Signed-off-by: Louis Rilling
Signed-off-by: Jens Axboe
03 Dec, 2009
1 commit
-
This is a real fix for the problem of utime/stime values decreasing,
described in the thread:

  http://lkml.org/lkml/2009/11/3/522

Now cputime is accounted in the following way:

- {u,s}time in task_struct are increased every time the thread is
  interrupted by a tick (timer interrupt).

- When a thread exits, its {u,s}time are added to signal->{u,s}time,
  after being adjusted by task_times().

- When all threads in a thread group exit, the accumulated {u,s}time
  (and also c{u,s}time) in the signal struct are added to c{u,s}time
  in the signal struct of the group's parent.

So {u,s}time in the task struct are "raw" tick counts, while
{u,s}time and c{u,s}time in the signal struct are "adjusted" values.

And the accounted values are used by:

- task_times(), to get the cputime of a thread:
  This function returns adjusted values that originate from the raw
  {u,s}time, scaled by the sum_exec_runtime accounted by CFS.

- thread_group_cputime(), to get the cputime of a thread group:
  This function returns the sum of all {u,s}time of living threads in
  the group, plus {u,s}time in the signal struct, which is the sum of
  adjusted cputimes of all exited threads that belonged to the group.

The problem is the return value of thread_group_cputime(),
because it is a mixed sum of "raw" and "adjusted" values:

  group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)

This misbehavior can break {u,s}time monotonicity.
Assume there is a thread whose raw values are greater than its
adjusted values (e.g. interrupted by 1000Hz ticks 50 times but
only running 45ms); if it exits, cputime will decrease (e.g. by
-5ms).

To fix this, we could do:

  group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)

But task_times() contains hard divisions, so applying it to
every thread should be avoided.

This patch fixes the above problem in the following way:

- Modify the thread's exit (= __exit_signal()) not to use task_times().
  It means {u,s}time in the signal struct accumulate raw values instead
  of adjusted values. As a result, thread_group_cputime() returns a
  pure sum of "raw" values.

- Introduce a new function thread_group_times(*task, *utime, *stime)
  that converts the "raw" values of thread_group_cputime() to
  "adjusted" values, using the same calculation procedure as
  task_times().

- Modify the group's exit (= wait_task_zombie()) to use this introduced
  thread_group_times(). It makes c{u,s}time in the signal struct have
  adjusted values, as before this patch.

- Replace some thread_group_cputime() calls by thread_group_times().
  These replacements are applied only where the "adjusted" cputime is
  conveyed to users and where task_times() is already used nearby
  (i.e. sys_times(), getrusage(), and /proc/<pid>/stat).

This patch has a positive side effect:

- Before this patch, if a group contained many short-lived threads
  (e.g. each runs 0.9ms and is never interrupted by ticks), the group's
  cputime could be invisible, since a thread's cputime was accumulated
  after adjustment: imagine the adjustment function as adj(ticks, runtime),
    {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
  After this patch this will not happen, because the adjustment is
  applied after accumulation.

v2:
- remove if()s, put new variables into signal_struct.

Signed-off-by: Hidetoshi Seto
Acked-by: Peter Zijlstra
Cc: Spencer Candland
Cc: Americo Wang
Cc: Oleg Nesterov
Cc: Balbir Singh
Cc: Stanislaw Gruszka
LKML-Reference:
Signed-off-by: Ingo Molnar
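A sketch of what thread_group_times() then looks like (simplified; scale_utime() is a stand-in for the same runtime-scaling arithmetic task_times() uses, and prev_utime/prev_stime are the new signal_struct fields from the v2 note):

    void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
    {
        struct signal_struct *sig = p->signal;
        struct task_cputime cputime;
        cputime_t rtime, utime;

        thread_group_cputime(p, &cputime);   /* pure sum of "raw" ticks */

        rtime = nsecs_to_cputime(cputime.sum_exec_runtime);
        /* split rtime into utime/stime in the raw utime:stime ratio */
        utime = scale_utime(cputime.utime,
                            cputime_add(cputime.utime, cputime.stime),
                            rtime);

        /* never report less than before: keep the values monotonic */
        sig->prev_utime = max(sig->prev_utime, utime);
        sig->prev_stime = max(sig->prev_stime,
                              cputime_sub(rtime, sig->prev_utime));

        *ut = sig->prev_utime;
        *st = sig->prev_stime;
    }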
26 Nov, 2009
1 commit
-
Now all task_{u,s}time() pairs are replaced by task_times().
And task_gtime() is too simple to be an inline function.

Clean them all up.

Signed-off-by: Hidetoshi Seto
Signed-off-by: Hidetoshi Seto
Acked-by: Peter Zijlstra
Cc: Stanislaw Gruszka
Cc: Spencer Candland
Cc: Oleg Nesterov
Cc: Balbir Singh
Cc: Americo Wang
LKML-Reference:
Signed-off-by: Ingo Molnar