20 Oct, 2007

1 commit

  • With pid namespaces this field is now dangerous to use explicitly, so hide
    it behind the helpers.

    Also, the pid and pgrp fields of task_struct and signal_struct are to be
    deprecated. Unfortunately, that patch cannot be sent right now as it
    leads to tons of warnings, so start isolating them, and deprecate later.

    Actually, the p->tgid == pid comparison has to be changed to
    has_group_leader_pid(), but Oleg pointed out that in the case of posix cpu
    timers the two are equivalent, and thread_group_leader() is preferable.
    (A toy sketch of the helper-based check follows this entry.)

    Signed-off-by: Pavel Emelyanov
    Acked-by: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
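
    A toy userspace model of the change described above (purely illustrative;
    the struct layout and the helper body are simplified assumptions, not the
    kernel's definitions):

        #include <stdio.h>
        #include <sys/types.h>

        /* Simplified stand-in for the kernel's task_struct. */
        struct task_struct {
            pid_t pid;                          /* thread id  */
            pid_t tgid;                         /* process id */
            struct task_struct *group_leader;
        };

        /* Helper in the spirit of thread_group_leader(): true when this task
         * is the first thread of its thread group. */
        static int thread_group_leader(const struct task_struct *p)
        {
            return p == p->group_leader;
        }

        int main(void)
        {
            struct task_struct leader = { .pid = 100, .tgid = 100 };
            struct task_struct worker = { .pid = 101, .tgid = 100 };

            leader.group_leader = &leader;
            worker.group_leader = &leader;

            /* Old style: poke at the raw fields directly. */
            printf("worker is leader (raw check):    %d\n",
                   worker.pid == worker.tgid);

            /* New style: go through a helper, so the raw field can later be
             * hidden or reinterpreted (e.g. for pid namespaces). */
            printf("worker is leader (helper check): %d\n",
                   thread_group_leader(&worker));
            printf("leader is leader (helper check): %d\n",
                   thread_group_leader(&leader));
            return 0;
        }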
     

10 Jul, 2007

1 commit


09 May, 2007

1 commit

  • There are many places in the kernel where constructions like

        foo = list_entry(head->next, struct foo_struct, list);

    are used. The code might look more descriptive and neat using the macro

        #define list_first_entry(head, type, member) \
            list_entry((head)->next, type, member)

    Here is the macro itself and examples of its usage in the generic code.
    If it turns out to be useful, I can prepare a set of patches to inject it
    into arch-specific code, drivers, networking, etc. (A standalone
    demonstration follows this entry.)

    Signed-off-by: Pavel Emelianov
    Signed-off-by: Kirill Korotaev
    Cc: Randy Dunlap
    Cc: Andi Kleen
    Cc: Zach Brown
    Cc: Davide Libenzi
    Cc: John McCutchan
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Cc: Ram Pai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
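
    A standalone demonstration of the macro introduced above (a minimal
    sketch: the tiny list implementation here is a simplified userspace
    stand-in for the kernel's <linux/list.h>, not the real thing):

        #include <stdio.h>
        #include <stddef.h>

        struct list_head {
            struct list_head *next, *prev;
        };

        #define LIST_HEAD_INIT(name) { &(name), &(name) }

        static void list_add_tail(struct list_head *new, struct list_head *head)
        {
            new->prev = head->prev;
            new->next = head;
            head->prev->next = new;
            head->prev = new;
        }

        #define list_entry(ptr, type, member) \
            ((type *)((char *)(ptr) - offsetof(type, member)))

        /* The macro proposed by the commit above. */
        #define list_first_entry(head, type, member) \
            list_entry((head)->next, type, member)

        struct foo_struct {
            int value;
            struct list_head list;
        };

        int main(void)
        {
            struct list_head head = LIST_HEAD_INIT(head);
            struct foo_struct a = { .value = 1 };
            struct foo_struct b = { .value = 2 };

            list_add_tail(&a.list, &head);
            list_add_tail(&b.list, &head);

            /* Before: foo = list_entry(head.next, struct foo_struct, list); */
            struct foo_struct *first =
                list_first_entry(&head, struct foo_struct, list);

            printf("first entry value = %d\n", first->value);  /* prints 1 */
            return 0;
        }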
     

17 Feb, 2007

1 commit

  • Use RCU to avoid the need to acquire tasklist_lock in the single-threaded
    case of clock_gettime(). It still acquires tasklist_lock when sampling the
    clock of a (potentially multithreaded) process. This change allows realtime
    applications to frequently monitor CPU consumption of individual tasks, as
    requested (and now deployed) by some off-list users. (A userspace example
    of such monitoring follows this entry.)

    This has been in Ingo Molnar's -rt patchset since late 2005 with no
    problems reported, and tests successfully on 2.6.20-rc6, so I believe that
    it is long-since ready for mainline adoption.

    [paulmck@linux.vnet.ibm.com: fix exit()/posix_cpu_clock_get() race spotted by Oleg]
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Cc: Roman Zippel
    Cc: Oleg Nesterov
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
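
    A userspace example of the kind of per-task CPU-time monitoring this
    change speeds up (a minimal sketch; clock_getcpuclockid() and
    clock_gettime() are standard POSIX calls, and on older glibc the program
    needs to be linked with -lrt):

        #define _POSIX_C_SOURCE 200112L
        #include <stdio.h>
        #include <string.h>
        #include <time.h>
        #include <unistd.h>

        int main(void)
        {
            clockid_t clk;
            struct timespec ts;
            int err;

            /* CPU-time clock of this process; pass another pid to watch
             * that task instead. */
            err = clock_getcpuclockid(getpid(), &clk);
            if (err != 0) {
                fprintf(stderr, "clock_getcpuclockid: %s\n", strerror(err));
                return 1;
            }

            /* Burn a little CPU so there is something to measure. */
            for (volatile long i = 0; i < 50000000L; i++)
                ;

            if (clock_gettime(clk, &ts) != 0) {
                perror("clock_gettime");
                return 1;
            }
            printf("CPU time consumed: %ld.%09ld s\n",
                   (long)ts.tv_sec, ts.tv_nsec);
            return 0;
        }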
     

17 Oct, 2006

1 commit

  • The integer divisions in the timer accounting code can round the result
    down to 0. Adding 0 has no effect, and signal delivery stops.

    Clamp the division result to a minimum of 1 to avoid this. (A small
    userspace sketch of the clamp follows this entry.)

    The problem was reported by Seongbae Park, who also provided an initial
    patch.

    Roland sayeth:

    I have had some more time to think about the problem, and to reproduce it
    using Toyo's test case. For the record, if my understanding of the problem
    is correct, this happens only in one very particular case. First, the
    expiry time has to be so soon that in cputime_t units (usually 1s/HZ ticks)
    it's < nthreads so the division yields zero. Second, it only affects each
    thread that is so new that its CPU time accumulation is zero so now+0 is
    still zero and ->it_*_expires winds up staying zero. For the VIRT and PROF
    clocks when cputime_t is tick granularity (or the SCHED clock on
    configurations where sched_clock's value only advances on clock ticks), this
    is not hard to arrange with new threads starting up and blocking before they
    accumulate a whole tick of CPU time. That's what happens in Toyo's test
    case.

    Note that in general it is fine for that division to round down to zero,
    and set each thread's expiry time to its "now" time. The problem only
    arises with threads whose "now" value is still zero, so that now+0 winds up
    0 and is interpreted as "not set" instead of ">= now". So it would be a
    sufficient and more precise fix to just use max(ticks, 1) inside the loop
    when setting each it_*_expires value.

    But, it does no harm to round the division up to one and always advance
    every thread's expiry time. If the thread didn't already fire timers for
    the expiry time of "now", there is no expectation that it will do so before
    the next tick anyway. So I followed Thomas's patch in lifting the max out
    of the loops.

    This patch also covers the reload cases, which are harder to write a test
    for (and I didn't try). I've tested it with Toyo's case and it fixes that.

    [toyoa@mvista.com: fix: min_t -> max_t]
    Signed-off-by: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Roland McGrath
    Cc: Daniel Walker
    Cc: Toyo Abe
    Cc: john stultz
    Cc: Roman Zippel
    Cc: Seongbae Park
    Cc: Peter Mattis
    Cc: Rohit Seth
    Cc: Martin Bligh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
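
    A small userspace sketch of the arithmetic being fixed (the type, names
    and numbers below are made up for illustration; only the "divide, then
    clamp to at least 1" shape mirrors the patch):

        #include <stdio.h>

        /* cputime_t stand-in: one unit per scheduler tick. */
        typedef unsigned long cputime_t;

        static cputime_t per_thread_share(cputime_t expires, unsigned int nthreads)
        {
            cputime_t ticks = expires / nthreads;

            /* The fix: never hand a thread an increment of 0, otherwise a
             * brand-new thread keeps expiry == 0, which means "not set". */
            if (ticks == 0)
                ticks = 1;
            return ticks;
        }

        int main(void)
        {
            cputime_t expires = 3;          /* expiry only 3 ticks away ... */
            unsigned int nthreads = 8;      /* ... split across 8 threads   */

            printf("naive share:   %lu\n", expires / nthreads);   /* 0 */
            printf("clamped share: %lu\n", per_thread_share(expires, nthreads));
            return 0;
        }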
     

30 Sep, 2006

2 commits

  • When a posix_cpu_nsleep() sleep is interrupted by a signal more than twice,
    it incorrectly reports the sleep time remaining to the user, because
    posix_cpu_nsleep() doesn't report back to the user when it's called from
    the restart function, due to the wrong flags handling.

    This patch, which applies after the previous one, moves the nanosleep
    functionality from posix_cpu_nsleep() to do_cpu_nanosleep() and cleans up
    the flags handling appropriately.

    Signed-off-by: Toyo Abe
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toyo Abe
     
  • The clock_nanosleep() function does not return the time remaining when the
    sleep is interrupted by a signal.

    This patch creates a new call out, compat_clock_nanosleep_restart(), which
    handles returning the remaining time after a sleep is interrupted. The
    patch also revives clock_nanosleep_restart(), which is now accessed via the
    new call out; compat_clock_nanosleep_restart() is used for compatibility
    access. (A userspace example of the remaining-time behavior follows this
    entry.)

    Since this is implemented in compatibility mode, the normal path is
    virtually unaffected - no real performance impact.

    Signed-off-by: Toyo Abe
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toyo Abe
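
    A userspace illustration of the remaining-time contract involved here (a
    minimal sketch; it uses CLOCK_MONOTONIC simply to stay self-contained,
    while the fix itself concerns the CPU-time clocks handled by
    posix_cpu_nsleep()):

        #define _POSIX_C_SOURCE 200112L
        #include <errno.h>
        #include <signal.h>
        #include <stdio.h>
        #include <string.h>
        #include <time.h>
        #include <unistd.h>

        static void on_alarm(int sig)
        {
            (void)sig;          /* nothing to do: just interrupt the sleep */
        }

        int main(void)
        {
            struct sigaction sa;
            struct timespec req = { 5, 0 };     /* ask for a 5 second sleep */
            struct timespec rem = { 0, 0 };
            int err;

            memset(&sa, 0, sizeof(sa));
            sa.sa_handler = on_alarm;           /* no SA_RESTART: let EINTR happen */
            sigaction(SIGALRM, &sa, NULL);

            alarm(1);                           /* interrupt after ~1 second */

            err = clock_nanosleep(CLOCK_MONOTONIC, 0, &req, &rem);
            if (err == EINTR)
                printf("interrupted, remaining: %ld.%09ld s\n",
                       (long)rem.tv_sec, rem.tv_nsec);
            else
                printf("clock_nanosleep returned %d\n", err);
            return 0;
        }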
     

18 Jun, 2006

3 commits

  • arm_timer() checks PF_EXITING to prevent BUG_ON(->exit_state)
    in run_posix_cpu_timers().

    However, for some reason it does so only for the CPUCLOCK_PERTHREAD
    case (which is imho wrong).

    Also, this check is not reliable: PF_EXITING could be set on
    another cpu, without any locks/barriers, just after the check,
    so it can't prevent the timer from being attached to the exiting
    task.

    The previous patch makes this check unneeded.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • do_exit() clears ->it_##clock##_expires, but nothing prevents
    another cpu from attaching the timer to the exiting process after that.
    arm_timer() tries to protect against this race, but the check
    is racy.

    After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and
    before do_exit() calls 'schedule()', a local timer interrupt can find
    tsk->exit_state != 0. If that state is EXIT_DEAD (or another cpu
    does sys_wait4), the interrupted task has ->signal == NULL.

    At this moment the exiting task has no pending cpu timers; they were
    cleaned up in __exit_signal()->posix_cpu_timers_exit{,_group}(),
    so we can just return from the irq.

    John Stultz recently confirmed this bug, see

    http://marc.theaimsgroup.com/?l=linux-kernel&m=115015841413687

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • If the local timer interrupt happens just after do_exit() sets PF_EXITING
    (and before it clears ->it_xxx_expires), run_posix_cpu_timers() will call
    check_process_timers() with tasklist_lock + ->siglock held, and

        check_process_timers:

            t = tsk;
            do {
                ....
                do {
                    t = next_thread(t);
                } while (unlikely(t->flags & PF_EXITING));
            } while (t != tsk);

    the outer loop will never stop.

    Actually, the window is bigger. Another process can attach the timer
    after ->it_xxx_expires was cleared (see the next commit) and the 'if
    (PF_EXITING)' check in arm_timer() is racy (see the one after that).

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

11 Jan, 2006

2 commits


07 Jan, 2006

1 commit

  • I've spent the past 3 days digging into a glibc testsuite failure in
    current CVS, specifically libc/rt/tst-cputimer1.c. The thr1 and thr2
    timers fire too early in the second pass of this test. The second
    pass is noteworthy because it makes use of intervals, whereas the
    first pass does not.

    All throughout the posix-cpu-timers.c code, the calculation of the
    process sched_time sum is implemented roughly as:

        unsigned long long sum;

        sum = tsk->signal->sched_time;
        t = tsk;
        do {
            sum += t->sched_time;
            t = next_thread(t);
        } while (t != tsk);

    In fact this is the exact scheme used by check_process_timers().

    In the case of check_process_timers(), current->sched_time has just
    been updated (via scheduler_tick(), which is invoked by
    update_process_times(), which subsequently invokes
    run_posix_cpu_timers()), so there is no special processing necessary
    wrt. that.

    In other contexts, we have to allow for the fact that tsk->sched_time
    might be a bit out of date if we are current. And the
    posix-cpu-timers.c code uses current_sched_time() to deal with that.

    Unfortunately it does so in an erroneous and inconsistent manner in
    one spot which is what results in the early timer firing.

    In cpu_clock_sample_group_locked(), it does this:

        cpu->sched = p->signal->sched_time;
        /* Add in each other live thread. */
        while ((t = next_thread(t)) != p) {
            cpu->sched += t->sched_time;
        }
        if (p->tgid == current->tgid) {
            /*
             * We're sampling ourselves, so include the
             * cycles not yet banked. We still omit
             * other threads running on other CPUs,
             * so the total can always be behind as
             * much as max(nthreads-1,ncpus) * (NSEC_PER_SEC/HZ).
             */
            cpu->sched += current_sched_time(current);
        } else {
            cpu->sched += p->sched_time;
        }

    The problem is the "p->tgid == current->tgid" test. If "p" is
    not current, and the tgids are the same, we will add current's
    sched_time twice into cpu->sched and omit "p"'s sched_time,
    which is very very very wrong.

    posix-cpu-timers.c has a helper function, sched_ns(p) which takes care
    of this, so my fix is to use that here instead of this special tgid
    test.

    The fact that current can be one of the sub-threads of "p" points out
    that we could make things a little bit more accurate, perhaps by using
    sched_ns() on every thread we process in these loops. It also points
    out that we don't use the most accurate value for threads in the group
    actively running on other cpus (and this is mentioned in the comment).

    But that is a future enhancement, and this fix here definitely makes
    sense. (A toy model of the sched_ns()-style sampling follows this entry.)

    Signed-off-by: David S. Miller
    Signed-off-by: Linus Torvalds

    David S. Miller
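
    A toy model of the sched_ns()-style sampling described above (purely
    illustrative userspace code; the structures and numbers are made up, and
    applying sched_ns() to every thread in the loop is the "more accurate"
    variant the commit mentions as a future enhancement):

        #include <stdio.h>

        struct task {
            unsigned long long sched_time;      /* time already banked  */
            struct task *next_thread;           /* circular thread list */
        };

        static struct task *current_task;       /* stand-in for "current" */

        /* Not-yet-banked CPU time of the running task (a fixed amount here). */
        static unsigned long long current_sched_time(const struct task *t)
        {
            return t->sched_time + 1234;
        }

        /* The helper the fix relies on: freshest value for current, banked
         * value for everyone else - no tgid special case. */
        static unsigned long long sched_ns(const struct task *t)
        {
            return t == current_task ? current_sched_time(t) : t->sched_time;
        }

        int main(void)
        {
            struct task leader = { .sched_time = 1000 };
            struct task worker = { .sched_time = 500 };
            unsigned long long sum = 0;
            struct task *t = &leader;

            leader.next_thread = &worker;
            worker.next_thread = &leader;
            current_task = &worker;     /* we are a sub-thread of "leader" */

            do {
                sum += sched_ns(t);
                t = t->next_thread;
            } while (t != &leader);

            /* 1000 (leader, banked) + 500 + 1234 (worker, incl. unbanked) */
            printf("group CPU time sample: %llu\n", sum);
            return 0;
        }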
     

29 Nov, 2005

1 commit


07 Nov, 2005

1 commit


31 Oct, 2005

1 commit


28 Oct, 2005

2 commits


27 Oct, 2005

2 commits


24 Oct, 2005

5 commits

  • This might be harmless, but looks like a race from code inspection (I
    was unable to trigger it). I must admit, I don't understand why we
    can't return TIMER_RETRY after 'spin_unlock(&p->sighand->siglock)'
    without doing bump_cpu_timer(), but this is what the original code does.

        posix_cpu_timer_set:

            read_lock(&tasklist_lock);

            spin_lock(&p->sighand->siglock);
            list_del_init(&timer->it.cpu.entry);
            spin_unlock(&p->sighand->siglock);

    We are probably deleting the timer from run_posix_cpu_timers's 'firing'
    local list_head while run_posix_cpu_timers() does list_for_each_safe.

    Various bad things can happen; for example, we can just delete this timer
    so that list_for_each() will not notice it and run_posix_cpu_timers()
    will not reset the '->firing' flag. In that case,

        ....

        if (timer->it.cpu.firing) {
            read_unlock(&tasklist_lock);
            timer->it.cpu.firing = -1;
            return TIMER_RETRY;
        }

    sys_timer_settime() goes to 'retry:', calls posix_cpu_timer_set() again,
    it returns TIMER_RETRY ...

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • No need to rebalance when task exited

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • do_exit() clears ->it_##clock##_expires, but nothing prevents
    another cpu from attaching the timer to the exiting process after that.

    After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and
    before do_exit() calls 'schedule()', a local timer interrupt can find
    tsk->exit_state != 0. If that state is EXIT_DEAD (or another cpu
    does sys_wait4), the interrupted task has ->signal == NULL.

    At this moment the exiting task has no pending cpu timers; they were
    cleaned up in __exit_signal()->posix_cpu_timers_exit{,_group}(), so we can
    just return from the irq.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • 1. cleanup_timers() sets timer->task = NULL under tasklist + ->sighand locks.
    That means that this code in posix_cpu_timer_del() and posix_cpu_timer_set()

        lock_timer(timer);
        if (timer->task == NULL)
            return;
        read_lock(tasklist);
        put_task_struct(timer->task);

    is racy. With this patch, timer->task is modified and accessed only under
    timer->it_lock. Sadly, this means that a dead task_struct won't be freed
    until the timer is deleted or armed.

    2. run_posix_cpu_timers() collects expired timers into a local list under
    tasklist + ->sighand again. That means that posix_cpu_timer_del()
    should check timer->it.cpu.firing under these locks too.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Bursty timers aren't good for anybody, very much including latency for
    other programs when we trigger lots of timers in interrupt context. So
    set a random limit, after which we'll handle the rest on the next timer
    tick. (A toy sketch of such a per-tick cap follows this entry.)

    Noted by Oleg Nesterov

    Signed-off-by: Linus Torvalds

    Linus Torvalds
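
    A toy sketch of the idea (illustrative userspace code with made-up names
    and numbers; the point is only the shape: fire at most a fixed number of
    expired timers per tick and defer the rest):

        #include <stdio.h>

        #define MAX_PER_TICK 3      /* arbitrary cap, like the "random limit" above */

        int main(void)
        {
            int pending = 10;       /* pretend 10 timers expired at once */
            int tick = 0;

            while (pending > 0) {
                int fire = pending < MAX_PER_TICK ? pending : MAX_PER_TICK;

                pending -= fire;
                printf("tick %d: fired %d timer(s), %d deferred\n",
                       ++tick, fire, pending);
            }
            return 0;
        }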
     

22 Oct, 2005

1 commit


20 Oct, 2005

1 commit

  • Oleg Nesterov reported an SMP deadlock. If there is a running timer
    tracking a different process's CPU time clock when the process owning
    the timer exits, we deadlock on tasklist_lock in posix_cpu_timer_del via
    exit_itimers.

    That code was using tasklist_lock to check for a race with __exit_signal
    being called on the timer-target task and clearing its ->signal.
    However, there is actually no such race. __exit_signal will have called
    posix_cpu_timers_exit and posix_cpu_timers_exit_group before it does
    that. Those will clear those k_itimer's association with the dying
    task, so posix_cpu_timer_del will return early and never reach the code
    in question.

    In addition, posix_cpu_timer_del called from exit_itimers during execve
    or directly from timer_delete in the process owning the timer can race
    with an exiting timer-target task to cause a double put on the timer-target
    task struct. Make sure we always access cpu_timers lists with the sighand
    lock held.

    Signed-off-by: Roland McGrath
    Signed-off-by: Chris Wright
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

18 Oct, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds