Eric Lee / smarc-fsl-linux-kernel

27 Mar, 2010

1 commit

054319b5e Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  time: Fix accumulation bug triggered by long delay.
  posix-cpu-timers: Reset expire cache when no timer is running
  timer stats: Fix del_timer_sync() and try_to_del_timer_sync()
  clockevents: Sanitize min_delta_ns adjustment and prevent overflows

Linus Torvalds
2010-03-27 06:10:38 +0800

13 Mar, 2010

1 commit

15365c108 posix-cpu-timers: Reset expire cache when no timer is running

When a process deletes a CPU timer or a timer expires, we do not clear
the expiration cache sig->cputimer_expires.

As a result, fastpath_timer_check(), which saves us from looping over
all threads when no timer is active, does not work, and we run the
slow path needlessly on every tick.

Zero sig->cputimer_expires in stop_process_timers().
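
As a rough illustration of why the stale cache matters, here is a minimal user-space sketch. The names timer_cache, needs_slow_path and stop_timers are hypothetical and model the idea with a single clock only; this is not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached earliest expiry; 0 means "no timer armed". */
struct timer_cache {
    uint64_t earliest_expiry;
};

/* Fast path: report whether the expensive per-thread walk is needed at time `now`. */
static bool needs_slow_path(const struct timer_cache *c, uint64_t now)
{
    return c->earliest_expiry != 0 && now >= c->earliest_expiry;
}

/* Models the fix: once no timer is running, zero the cache. */
static void stop_timers(struct timer_cache *c)
{
    c->earliest_expiry = 0;
}
```

With a cached expiry of 100 that is never cleared, every later tick takes the slow path; after stop_timers() the fast path returns immediately again.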

Signed-off-by: Stanislaw Gruszka
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Hidetoshi Seto
Cc: Spencer Candland
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Stanislaw Gruszka
2010-03-13 02:12:18 +0800

07 Mar, 2010

2 commits

78d7d407b kernel core: use helpers for rlimits

Make sure compiler won't do weird things with limits. E.g. fetching them
twice may return 2 different values after writable limits are implemented.

I.e. either use rlimit helpers added in commit 3e10e716abf3 ("resource:
add helpers for fetching rlimits") or ACCESS_ONCE if not applicable.
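
The hazard can be sketched in plain C. This is a simplified model, not the kernel helpers: limit_sketch, load_once and classify_usage are hypothetical names, and the volatile cast only imitates the idea behind ACCESS_ONCE for one fixed type.

```c
/* Hypothetical limit pair that another thread may rewrite at any time. */
struct limit_sketch {
    unsigned long cur;  /* soft limit */
    unsigned long max;  /* hard limit */
};

/* Force exactly one load of *p; the compiler may not re-read it later. */
static unsigned long load_once(const unsigned long *p)
{
    return *(const volatile unsigned long *)p;
}

/* Returns 0 below the soft limit, 1 from soft up to hard, 2 at or above hard. */
static int classify_usage(const struct limit_sketch *l, unsigned long usage)
{
    /* Snapshot both values once, then decide on the local copies only. */
    unsigned long soft = load_once(&l->cur);
    unsigned long hard = load_once(&l->max);

    if (usage >= hard)
        return 2;
    if (usage >= soft)
        return 1;
    return 0;
}
```

Because every comparison uses the local snapshot, a concurrent writer cannot make two checks within one call disagree.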

Signed-off-by: Jiri Slaby
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: john stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Slaby
2010-03-07 03:26:33 +0800
d4bb52743 posix-cpu-timers: cleanup rlimits usage

Fetch rlimit (both hard and soft) values only once and work on them. It
removes many accesses through sig structure and makes the code cleaner.

Mostly a preparation for writable resource limits support.

Signed-off-by: Jiri Slaby
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: john stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Slaby
2010-03-07 03:26:32 +0800

18 Nov, 2009

1 commit

ba5ea951d posix-cpu-timers: optimize and document timer_create callback

new_timer is already initialized to all zeros, so the initializations
in the function are not needed. Also document what the function expects
of the new_timer argument.

Signed-off-by: Stanislaw Gruszka
Cc: johnstul@us.ibm.com
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Stanislaw Gruszka
2009-11-18 19:36:05 +0800

29 Aug, 2009

2 commits

3f0a525eb itimers: Add tracepoints for itimer

Add tracepoints for all itimer variants: ITIMER_REAL, ITIMER_VIRTUAL
and ITIMER_PROF.

[ tglx: Fixed comments and made the output more readable, parseable
and consistent. Replaced pid_vnr by pid_nr because the hrtimer
callback can happen in any namespace ]

Signed-off-by: Xiao Guangrong
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Mathieu Desnoyers
Cc: Anton Blanchard
Cc: Peter Zijlstra
Cc: KOSAKI Motohiro
Cc: Zhaolei
LKML-Reference:
Signed-off-by: Thomas Gleixner

Xiao Guangrong
2009-08-29 20:10:07 +0800
f71bb0ac5 Merge branch 'timers/posixtimers' into timers/tracing

Merge reason: timer tracepoint patches depend on both branches

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2009-08-29 16:34:29 +0800

09 Aug, 2009

1 commit

17d42c1c4 posix_cpu_timers_exit_group(): Do not use thread_group_cputime()

When the process exits, we don't have to start a new cputimer or
use a running one (as it does not account when tsk->exit_state != 0)
to get the process CPU times. As there is only one thread, we can
just use the CPU time fields from the task and signal structs.

Signed-off-by: Stanislaw Gruszka
Cc: Peter Zijlstra
Cc: Roland McGrath
Cc: Vitaly Mayatskikh
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

Stanislaw Gruszka
2009-08-09 00:30:25 +0800

03 Aug, 2009

4 commits

a42548a18 cputime: Optimize jiffies_to_cputime(1)

For powerpc with CONFIG_VIRT_CPU_ACCOUNTING,
jiffies_to_cputime(1) is not a compile-time constant and run-time
calculations are quite expensive. To optimize, we use a
precomputed value. For all other architectures it is a
preprocessor definition.

Signed-off-by: Stanislaw Gruszka
Acked-by: Peter Zijlstra
Acked-by: Thomas Gleixner
Cc: Oleg Nesterov
Cc: Andrew Morton
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
LKML-Reference:
Signed-off-by: Ingo Molnar

Stanislaw Gruszka
2009-08-03 20:48:36 +0800
d1e3b6d19 itimers: Simplify arm_timer() code a bit

Don't update values in expiration cache when new ones are
equal. Add expire_le() and expire_gt() helpers to simplify the
code.

Signed-off-by: Stanislaw Gruszka
Acked-by: Peter Zijlstra
Acked-by: Thomas Gleixner
Cc: Oleg Nesterov
Cc: Andrew Morton
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
LKML-Reference:
Signed-off-by: Ingo Molnar

Stanislaw Gruszka
2009-08-03 20:48:36 +0800
8356b5f9c itimers: Fix periodic tics precision

Measure the interval error of ITIMER_PROF and ITIMER_VIRT timers,
i.e. the difference between the real ticks and the interval requested
by the user. Take it into account when scheduling the next tick.

This patch introduces the possibility that the time between two
consecutive ticks is smaller than the requested interval. It
preserves, however, the guarantee that the n-th tick is generated not
earlier than n*interval, counting from the beginning of
periodic signal generation.
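
The effect of carrying the error forward can be shown with a small model. This is a sketch under assumptions, not the patch itself: the timer is assumed to fire only on multiples of a tick, and round_up_to_tick, nth_fire_naive and nth_fire_compensated are hypothetical names.

```c
#include <stdint.h>

/* Round t up to the next multiple of `tick` (the timer only fires on ticks). */
static uint64_t round_up_to_tick(uint64_t t, uint64_t tick)
{
    return (t + tick - 1) / tick * tick;
}

/* Naive re-arm: each period starts at the actual fire time, so rounding error piles up. */
static uint64_t nth_fire_naive(unsigned n, uint64_t interval, uint64_t tick)
{
    uint64_t t = 0;
    for (unsigned i = 0; i < n; i++)
        t = round_up_to_tick(t + interval, tick);
    return t;
}

/* Compensated re-arm: subtract the previous rounding error from the next period. */
static uint64_t nth_fire_compensated(unsigned n, uint64_t interval, uint64_t tick)
{
    uint64_t t = 0, error = 0;
    for (unsigned i = 0; i < n; i++) {
        uint64_t ideal = t + interval - error; /* equals (i + 1) * interval */
        uint64_t fire = round_up_to_tick(ideal, tick);
        error = fire - ideal;                  /* how late this tick was */
        t = fire;
    }
    return t;
}
```

With an interval of 10 and a tick of 4, the naive scheme fires at 12, 24, 36, 48, while the compensated one fires at 12, 20, 32, 40: some gaps are shorter than 10, but the n-th fire is never earlier than n*10.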

Signed-off-by: Stanislaw Gruszka
Acked-by: Peter Zijlstra
Acked-by: Thomas Gleixner
Cc: Oleg Nesterov
Cc: Andrew Morton
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
LKML-Reference:
Signed-off-by: Ingo Molnar

Stanislaw Gruszka
2009-08-03 20:48:35 +0800
42c4ab41a itimers: Merge ITIMER_VIRT and ITIMER_PROF

Both CPU itimers have the same data flow in a few places; this
patch unifies the code related to the VIRT and PROF
itimers.

Signed-off-by: Stanislaw Gruszka
Acked-by: Peter Zijlstra
Acked-by: Thomas Gleixner
Cc: Oleg Nesterov
Cc: Andrew Morton
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
LKML-Reference:
Signed-off-by: Ingo Molnar

Stanislaw Gruszka
2009-08-03 20:48:35 +0800

30 Apr, 2009

1 commit

6e85c5ba7 kernel/posix-cpu-timers.c: fix sparse warning

Sparse reports the following in kernel/posix-cpu-timers.c:

warning: symbol 'firing' shadows an earlier one

Signed-off-by: H Hartley Sweeten
Cc: Subrata Modak
LKML-Reference:
Signed-off-by: Ingo Molnar

H Hartley Sweeten
2009-04-30 14:08:31 +0800

10 Apr, 2009

1 commit

17b2e9bf2 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: do not count frozen tasks toward load
  sched: refresh MAINTAINERS entry
  sched: Print sched_group::__cpu_power in sched_domain_debug
  cpuacct: add per-cgroup utime/stime statistics
  posixtimers, sched: Fix posix clock monotonicity
  sched_rt: don't allocate cpumask in fastpath
  cpuacct: make cpuacct hierarchy walk in cpuacct_charge() safe when rcupreempt is used -v2

Linus Torvalds
2009-04-10 01:37:28 +0800

08 Apr, 2009

2 commits

8f2e58656 posix-timers: fix RLIMIT_CPU && setitimer(CPUCLOCK_PROF)

update_rlimit_cpu() tries to optimize out set_process_cpu_timer() in case
when we already have CPUCLOCK_PROF timer which should expire first. But it
uses cputime_lt() instead of cputime_gt().

Test case:

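#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

int main(void)
{
	struct itimerval it = {
		.it_value = { .tv_sec = 1000 },
	};

	assert(!setitimer(ITIMER_PROF, &it, NULL));

	struct rlimit rl = {
		.rlim_cur = 1,
		.rlim_max = 1,
	};

	assert(!setrlimit(RLIMIT_CPU, &rl));

	/* Burn CPU; RLIMIT_CPU should terminate the task. */
	for (;;)
		;

	return 0;
}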

Without this patch, the task is not killed as RLIMIT_CPU demands.

Signed-off-by: Oleg Nesterov
Acked-by: Peter Zijlstra
Cc: Peter Lojkin
Cc: Roland McGrath
Cc: stable@kernel.org
LKML-Reference:
Signed-off-by: Ingo Molnar

Oleg Nesterov
2009-04-08 23:51:39 +0800
5af8c4e0f Merge commit 'v2.6.30-rc1' into sched/urgent

Merge reason: update to latest upstream to queue up fix

Signed-off-by: Ingo Molnar

Ingo Molnar
2009-04-08 23:26:00 +0800

01 Apr, 2009

1 commit

c5f8d9958 posixtimers, sched: Fix posix clock monotonicity

Impact: Regression fix (against clock_gettime() backwarding bug)

This patch re-introduces a couple of functions, task_sched_runtime
and thread_group_sched_runtime, which were removed at the
time of 2.6.28-rc1.

These functions protect the sampling of the thread/process clock with
the rq lock. The rq lock is required so that rq->clock is not updated
during the sampling.

i.e.
The clock_gettime() may return
((accounted runtime before update) + (delta after update))
that is less than what it should be.

v2 -> v3:
- Rename static helper function __task_delta_exec()
to do_task_delta_exec() since -tip tree already has
a __task_delta_exec() of different version.

v1 -> v2:
- Revises comments of function and patch description.
- Add note about accuracy of thread group's runtime.

Signed-off-by: Hidetoshi Seto
Acked-by: Peter Zijlstra
Cc: stable@kernel.org [2.6.28.x][2.6.29.x]
LKML-Reference:
Signed-off-by: Ingo Molnar

Hidetoshi Seto
2009-04-01 22:44:16 +0800

24 Mar, 2009

1 commit

37bebc70d posix timers: fix RLIMIT_CPU && fork()

See http://bugzilla.kernel.org/show_bug.cgi?id=12911

copy_signal() copies signal->rlim, but RLIMIT_CPU is "lost". Because
posix_cpu_timers_init_group() sets cputime_expires.prof_exp = 0 and thus
fastpath_timer_check() returns false unless we have other cpu timers.

This is the minimal fix for 2.6.29 (tested) and 2.6.28. The patch is not
optimal, we need further cleanups here. With this patch update_rlimit_cpu()
is not really needed, but I don't think it should be removed.

The proper fix (I think) is:

- set_process_cpu_timer() should just start the cputimer->running
logic (it does), no need to change cputime_expires.xxx_exp

- posix_cpu_timers_init_group() should set ->running when needed

- fastpath_timer_check() can check ->running instead of
task_cputime_zero(signal->cputime_expires)

Reported-by: Peter Lojkin
Signed-off-by: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Roland McGrath
Cc: [for 2.6.29.x]
LKML-Reference:
Signed-off-by: Ingo Molnar

Oleg Nesterov
2009-03-24 03:43:35 +0800

13 Feb, 2009

1 commit

3997ad317 timers: more consistently use clock vs timer

While reviewing the manpages, I noticed I'd missed some clock vs timer sites.

Make sure that all timer functions call cpu_timer_sample_group() and not
cpu_clock_sample_group(). This ensures that we enable the process wide timer
in time, and therefore pay the O(n) thread group cost from the syscall.

Not doing it here, will result in the first jiffy tick after setting the timer
doing this, resulting in a very expensive tick (but only once) and a delay in
actually starting the timer.

Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-02-13 20:04:05 +0800

11 Feb, 2009

2 commits

4da94d49b timers: fix TIMER_ABSTIME for process wide cpu timers

The POSIX timer interface allows for absolute time expiry values through the
TIMER_ABSTIME flag, therefore we have to synchronize the timer to the clock
every time we start it.

Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-02-11 21:04:21 +0800
3fccfd67d timers: split process wide cpu clocks/timers, fix

To decrease the chance of a missed enable, always enable the timer when we
sample it, we'll always disable it when we find that there are no active timers
in the jiffy tick.

This fixes a flood of warnings reported by Mike Galbraith.

Reported-by: Mike Galbraith
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-02-11 21:04:19 +0800

05 Feb, 2009

1 commit

4cd4c1b40 timers: split process wide cpu clocks/timers

Change the process wide cpu timers/clocks so that we:

1) don't mess up the kernel with too many threads,
2) don't have a per-cpu allocation for each process,
3) have no impact when not used.

In order to accomplish this we're going to split it into two parts:

- clocks; which can take all the time they want since they run
from user context -- ie. sys_clock_gettime(CLOCK_PROCESS_CPUTIME_ID)

- timers; which need constant time sampling but since they're
explicitly used, the user can pay the overhead.

The clock readout will go back to a full sum of the thread group, while the
timers will run off a global 'clock' that only runs when needed, so only
programs that make use of the facility pay the price.

Signed-off-by: Peter Zijlstra
Reviewed-by: Ingo Molnar
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-02-05 20:04:33 +0800

08 Jan, 2009

1 commit

490dea45d itimers: remove the per-cpu-ish-ness

Either we bounce one cacheline per cpu per tick, yielding n^2 bounces,
or we just bounce a single one.

Also, using per-cpu allocations for the thread-groups complicates the
per-cpu allocator in that it's currently aimed to be a fixed-size
allocator and the only possible extension to that would be vmap based,
which is seriously constrained on 32 bit archs.

So making the per-cpu memory requirement depend on the number of
processes is an issue.

Lastly, it didn't deal with cpu-hotplug, although admittedly that might
be fixable.

Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-01-08 01:52:44 +0800

25 Dec, 2008

1 commit

0b271ef45 Merge commit 'v2.6.28' into core/core

Ingo Molnar
2008-12-25 20:51:46 +0800

24 Nov, 2008

1 commit

eccdaeafa posix-cpu-timers: fix clock_gettime with CLOCK_PROCESS_CPUTIME_ID

Since CLOCK_PROCESS_CPUTIME_ID is in fact translated to -6, the switch
statement in cpu_clock_sample_group() must first mask off the irrelevant
bits, similar to cpu_clock_sample().
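
A small sketch of the masking step follows. The constants are assumed to mirror the kernel's CPUCLOCK_* definitions (PROF 0, VIRT 1, SCHED 2, clock mask 3), and which_cpu_clock is a hypothetical helper name for illustration.

```c
/* Assumed to mirror the kernel's clock-type encoding in the low two bits. */
#define CPUCLOCK_PROF        0
#define CPUCLOCK_VIRT        1
#define CPUCLOCK_SCHED       2
#define CPUCLOCK_CLOCK_MASK  3

/* Strip the pid and per-thread bits so only the clock type remains. */
static int which_cpu_clock(int clock_id)
{
    return clock_id & CPUCLOCK_CLOCK_MASK;
}
```

Switching directly on -6 matches none of the three cases, while the masked value selects CPUCLOCK_SCHED (on the usual two's complement representation).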

Signed-off-by: Petr Tesarik
Signed-off-by: Thomas Gleixner

--
posix-cpu-timers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Petr Tesarik
2008-11-24 23:41:40 +0800

17 Nov, 2008

2 commits

ce394471d thread_group_cputime: kill the bogus ->signal != NULL check

Impact: simplify the code

thread_group_cputime() is called by current when it must have the valid
->signal, or under ->siglock, or under tasklist_lock after the ->signal
check, or the caller is wait_task_zombie() which reaps the child. In any
case ->signal can't be NULL.

But the point of this patch is not optimization. If it is possible to call
thread_group_cputime() when ->signal == NULL we are doing something wrong,
and we should not mask the problem. thread_group_cputime() fills *times
and the caller will use it, if we silently use task_struct->*times* we
report the wrong values.

Signed-off-by: Oleg Nesterov
Signed-off-by: Ingo Molnar

Oleg Nesterov
2008-11-17 23:55:54 +0800
ad133ba3d sched, signals: fix the racy usage of ->signal in account_group_xxx/run_posix_cpu_timers

Impact: fix potential NULL dereference

Contrary to ad474caca3e2a0550b7ce0706527ad5ab389a4d4 changelog, other
acct_group_xxx() helpers can be called after exit_notify() by timer tick.
Thanks to Roland for pointing out this. Somehow I missed this simple fact
when I read the original patch, and I am afraid I confused Frank during
the discussion. Sorry.

Fortunately, these helpers work with current, we can check ->exit_state
to ensure that ->signal can't go away under us.

Also, add the comment and compiler barrier to account_group_exec_runtime(),
to make sure we load ->signal only once.

Signed-off-by: Oleg Nesterov
Signed-off-by: Ingo Molnar

Oleg Nesterov
2008-11-17 23:49:35 +0800

23 Sep, 2008

1 commit

bb34d92f6 timers: fix itimer/many thread hang, v2

This is the second resubmission of the posix timer rework patch, posted
a few days ago.

This includes the changes from the previous resubmission, which addressed
Oleg Nesterov's comments, removing the RCU stuff from the patch and
un-inlining the thread_group_cputime() function for SMP.

In addition, per Ingo Molnar it simplifies the UP code, consolidating much
of it with the SMP version and depending on lower-level SMP/UP handling to
take care of the differences.

It also cleans up some UP compile errors, moves the scheduler stats-related
macros into kernel/sched_stats.h, cleans up a merge error in
kernel/fork.c and has a few other minor fixes and cleanups as suggested
by Oleg and Ingo. Thanks for the review, guys.

Signed-off-by: Frank Mayhar
Cc: Roland McGrath
Cc: Alexey Dobriyan
Cc: Andrew Morton
Signed-off-by: Ingo Molnar

Frank Mayhar
2008-09-23 19:38:44 +0800

14 Sep, 2008

2 commits

5ce73a4a5 timers: fix itimer/many thread hang, cleanups

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-09-14 23:11:46 +0800
f06febc96 timers: fix itimer/many thread hang

Overview

This patch reworks the handling of POSIX CPU timers, including the
ITIMER_PROF, ITIMER_VIRT timers and rlimit handling. It was put together
with the help of Roland McGrath, the owner and original writer of this code.

The problem we ran into, and the reason for this rework, has to do with using
a profiling timer in a process with a large number of threads. It appears
that the performance of the old implementation of run_posix_cpu_timers() was
at least O(n*3) (where "n" is the number of threads in a process) or worse.
Everything is fine with an increasing number of threads until the time taken
for that routine to run becomes the same as or greater than the tick time, at
which point things degrade rather quickly.

This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF."

Code Changes

This rework corrects the implementation of run_posix_cpu_timers() to make it
run in constant time for a particular machine. (Performance may vary between
one machine and another depending upon whether the kernel is built as single-
or multiprocessor and, in the latter case, depending upon the number of
running processors.) To do this, at each tick we now update fields in
signal_struct as well as task_struct. The run_posix_cpu_timers() function
uses those fields to make its decisions.

We define a new structure, "task_cputime," to contain user, system and
scheduler times and use these in appropriate places:

	struct task_cputime {
		cputime_t utime;
		cputime_t stime;
		unsigned long long sum_exec_runtime;
	};

This is included in the structure "thread_group_cputime," which is a new
substructure of signal_struct and which varies for uniprocessor versus
multiprocessor kernels. For uniprocessor kernels, it uses "task_cputime" as
a simple substructure, while for multiprocessor kernels it is a pointer:

	struct thread_group_cputime {
		struct task_cputime totals;
	};

	struct thread_group_cputime {
		struct task_cputime *totals;
	};

We also add a new task_cputime substructure directly to signal_struct, to
cache the earliest expiration of process-wide timers, and task_cputime also
replaces the it_*_expires fields of task_struct (used for earliest expiration
of thread timers). The "thread_group_cputime" structure contains process-wide
timers that are updated via account_user_time() and friends. In the non-SMP
case the structure is a simple aggregator; unfortunately in the SMP case that
simplicity was not achievable due to cache-line contention between CPUs (in
one measured case performance was actually _worse_ on a 16-cpu system than
the same test on a 4-cpu system, due to this contention). For SMP, the
thread_group_cputime counters are maintained as a per-cpu structure allocated
using alloc_percpu(). The timer functions update only the timer field in
the structure corresponding to the running CPU, obtained using per_cpu_ptr().
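
The SMP scheme described above can be approximated in user space as follows. This is only a sketch: a fixed array stands in for alloc_percpu()/per_cpu_ptr(), and NR_CPUS_SKETCH, cputime_sketch, account_user_time_on and group_cputime_snapshot are hypothetical names.

```c
#define NR_CPUS_SKETCH 4  /* assumed fixed CPU count for the sketch */

struct cputime_sketch {
    unsigned long long utime;
    unsigned long long stime;
    unsigned long long sum_exec_runtime;
};

/* One slot per CPU, so writers on different CPUs never touch the same slot. */
struct group_cputime_sketch {
    struct cputime_sketch percpu[NR_CPUS_SKETCH];
};

/* Tick-side update: only the slot of the running CPU is modified. */
static void account_user_time_on(struct group_cputime_sketch *g, int cpu,
                                 unsigned long long delta)
{
    g->percpu[cpu].utime += delta;
}

/* Reader side: sum all per-CPU slots into one snapshot. */
static struct cputime_sketch group_cputime_snapshot(const struct group_cputime_sketch *g)
{
    struct cputime_sketch total = { 0, 0, 0 };

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        total.utime += g->percpu[cpu].utime;
        total.stime += g->percpu[cpu].stime;
        total.sum_exec_runtime += g->percpu[cpu].sum_exec_runtime;
    }
    return total;
}
```

The design trades a more expensive read (a loop over CPUs) for contention-free updates on the tick path.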

We define a set of inline functions in sched.h that we use to maintain the
thread_group_cputime structure and hide the differences between UP and SMP
implementations from the rest of the kernel. The thread_group_cputime_init()
function initializes the thread_group_cputime structure for the given task.
The thread_group_cputime_alloc() is a no-op for UP; for SMP it calls the
out-of-line function thread_group_cputime_alloc_smp() to allocate and fill
in the per-cpu structures and fields. The thread_group_cputime_free()
function, also a no-op for UP, in SMP frees the per-cpu structures. The
thread_group_cputime_clone_thread() function (also a UP no-op) for SMP calls
thread_group_cputime_alloc() if the per-cpu structures haven't yet been
allocated. The thread_group_cputime() function fills the task_cputime
structure it is passed with the contents of the thread_group_cputime fields;
in UP it's that simple but in SMP it must also safely check that tsk->signal
is non-NULL (if it is it just uses the appropriate fields of task_struct) and,
if so, sums the per-cpu values for each online CPU. Finally, the three
functions account_group_user_time(), account_group_system_time() and
account_group_exec_runtime() are used by timer functions to update the
respective fields of the thread_group_cputime structure.

Non-SMP operation is trivial and will not be mentioned further.

The per-cpu structure is always allocated when a task creates its first new
thread, via a call to thread_group_cputime_clone_thread() from copy_signal().
It is freed at process exit via a call to thread_group_cputime_free() from
cleanup_signal().

All functions that formerly summed utime/stime/sum_sched_runtime values from
all threads in the thread group now use thread_group_cputime() to
snapshot the values in the thread_group_cputime structure or the values in
the task structure itself if the per-cpu structure hasn't been allocated.

Finally, the code in kernel/posix-cpu-timers.c has changed quite a bit.
The run_posix_cpu_timers() function has been split into a fast path and a
slow path; the former safely checks whether there are any expired thread
timers and, if not, just returns, while the slow path does the heavy lifting.
With the dedicated thread group fields, timers are no longer "rebalanced" and
the process_timer_rebalance() function and related code has gone away. All
summing loops are gone and all code that used them now uses the
thread_group_cputime() inline. When process-wide timers are set, the new
task_cputime structure in signal_struct is used to cache the earliest
expiration; this is checked in the fast path.

Performance

The fix appears not to add significant overhead to existing operations. It
generally performs the same as the current code except in two cases, one in
which it performs slightly worse (Case 5 below) and one in which it performs
very significantly better (Case 2 below). Overall it's a wash except in those
two cases.

I've since done somewhat more involved testing on a dual-core Opteron system.

Case 1: With no itimer running, for a test with 100,000 threads, the fixed
kernel took 1428.5 seconds, 513 seconds more than the unfixed system,
all of which was spent in the system. There were twice as many
voluntary context switches with the fix as without it.

Case 2: With an itimer running at .01 second ticks and 4000 threads (the most
an unmodified kernel can handle), the fixed kernel ran the test in
eight percent of the time (5.8 seconds as opposed to 70 seconds) and
had better tick accuracy (.012 seconds per tick as opposed to .023
seconds per tick).

Case 3: A 4000-thread test with an initial timer tick of .01 second and an
interval of 10,000 seconds (i.e. a timer that ticks only once) had
very nearly the same performance in both cases: 6.3 seconds elapsed
for the fixed kernel versus 5.5 seconds for the unfixed kernel.

With fewer threads (eight in these tests), the Case 1 test ran in essentially
the same time on both the modified and unmodified kernels (5.2 seconds versus
5.8 seconds). The Case 2 test ran in about the same time as well, 5.9 seconds
versus 5.4 seconds but again with much better tick accuracy, .013 seconds per
tick versus .025 seconds per tick for the unmodified kernel.

Since the fix affected the rlimit code, I also tested soft and hard CPU limits.

Case 4: With a hard CPU limit of 20 seconds and eight threads (and an itimer
running), the modified kernel was very slightly favored in that while
it killed the process in 19.997 seconds of CPU time (5.002 seconds of
wall time), only .003 seconds of that was system time, the rest was
user time. The unmodified kernel killed the process in 20.001 seconds
of CPU (5.014 seconds of wall time) of which .016 seconds was system
time. Really, though, the results were too close to call. The results
were essentially the same with no itimer running.

Case 5: With a soft limit of 20 seconds and a hard limit of 2000 seconds
(where the hard limit would never be reached) and an itimer running,
the modified kernel exhibited worse tick accuracy than the unmodified
kernel: .050 seconds/tick versus .028 seconds/tick. Otherwise,
performance was almost indistinguishable. With no itimer running this
test exhibited virtually identical behavior and times in both cases.

In times past I did some limited performance testing. Those results are below.

On a four-cpu Opteron system without this fix, a sixteen-thread test executed
in 3569.991 seconds, of which user was 3568.435s and system was 1.556s. On
the same system with the fix, user and elapsed time were about the same, but
system time dropped to 0.007 seconds. Performance with eight, four and one
thread were comparable. Interestingly, the timer ticks with the fix seemed
more accurate: The sixteen-thread test with the fix received 149543 ticks
for 0.024 seconds per tick, while the same test without the fix received 58720
for 0.061 seconds per tick. Both cases were configured for an interval of
0.01 seconds. Again, the other tests were comparable. Each thread in this
test computed the primes up to 25,000,000.

I also did a test with a large number of threads, 100,000 threads, which is
impossible without the fix. In this case each thread computed the primes only
up to 10,000 (to make the runtime manageable). System time dominated, at
1546.968 seconds out of a total 2176.906 seconds (giving a user time of
629.938s). It received 147651 ticks for 0.015 seconds per tick, still quite
accurate. There is obviously no comparable test without the fix.

Signed-off-by: Frank Mayhar
Cc: Roland McGrath
Cc: Alexey Dobriyan
Cc: Andrew Morton
Signed-off-by: Ingo Molnar

Frank Mayhar
2008-09-14 22:25:35 +0800

25 May, 2008

1 commit

81d50bb25 posix-timers: print RT watchdog message

It's useful to detect which process is killed by RT watchdog.

Signed-off-by: Hiroshi Shimamoto
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner

Hiroshi Shimamoto
2008-05-25 00:49:22 +0800

01 May, 2008

1 commit

f8bd2258e remove div_long_long_rem

x86 is the only arch right now which provides an optimized version of
div_long_long_rem, and it has the downside that one has to be very careful that
the divide doesn't overflow.

The API is a little awkward, as the arguments for the unsigned divide are
signed. The signed version also doesn't handle a negative divisor and
produces worse code on 64bit archs.

There is little incentive to keep this API alive, so this converts the few
users to the new API.

Signed-off-by: Roman Zippel
Cc: Ralf Baechle
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: john stultz
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roman Zippel
2008-05-01 23:03:58 +0800

17 Apr, 2008

1 commit

ee7dd205b posix-timers: fix shadowed variables

Fix sparse warnings like this:
kernel/posix-cpu-timers.c:1090:25: warning: symbol 't' shadows an earlier one
kernel/posix-cpu-timers.c:1058:21: originally declared here

Signed-off-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

WANG Cong
2008-04-17 18:22:30 +0800

09 Feb, 2008

1 commit

8dc86af00 Use find_task_by_vpid in posix timers

All the functions that need to lookup a task by pid in posix timers obtain
this pid from a user space, and thus this value refers to a task in the same
namespace, as the current task lives in.

So the proper behavior is to call find_task_by_vpid() here.

Signed-off-by: Pavel Emelyanov
Cc: "Eric W. Biederman"
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-02-09 01:22:41 +0800

26 Jan, 2008

2 commits

5a52dd500 sched: rt-watchdog: fix .rlim_max = RLIM_INFINITY

Remove the curious logic to set it_sched_expires in the future. It is useless
because rt.timeout wouldn't be incremented anyway.

Explicitly check for RLIM_INFINITY, as a test program that had a 1s soft limit
and an infinite hard limit would SIGKILL at 1s. This is because RLIM_INFINITY+d-1
is d-2.
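
The wraparound can be reproduced in a few lines. This sketch assumes the limit was being converted with a round-up division (n + d - 1) / d and that the infinite limit is the all-ones unsigned long; INFINITY_SKETCH, div_round_up and limit_in_ticks are hypothetical names.

```c
#include <limits.h>

/* Stand-in for RLIM_INFINITY, assumed here to be the largest unsigned long. */
#define INFINITY_SKETCH ULONG_MAX

/* Classic round-up division; n + d - 1 wraps when n is near ULONG_MAX. */
static unsigned long div_round_up(unsigned long n, unsigned long d)
{
    return (n + d - 1) / d;
}

/* Guarded conversion: treat the infinite limit explicitly before dividing. */
static unsigned long limit_in_ticks(unsigned long limit_us, unsigned long us_per_tick)
{
    if (limit_us == INFINITY_SKETCH)
        return INFINITY_SKETCH;
    return div_round_up(limit_us, us_per_tick);
}
```

Without the guard, an infinite hard limit turns into a limit of zero ticks, which is why the task was killed as soon as the soft limit fired.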

Signed-off-by: Peter Zijlsta
CC: Michal Schmidt
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-01-26 04:08:32 +0800
78f2c7db6 sched: SCHED_FIFO/SCHED_RR watchdog timer

Introduce a new rlimit that allows the user to set a runtime timeout on
the slice of real-time tasks. Once this limit is exceeded the task will
receive SIGXCPU.

So it measures runtime since the last sleep.
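
From user space, the limit can be set roughly like this. The sketch assumes a Linux libc that exposes this rlimit as RLIMIT_RTTIME (in microseconds); set_rt_runtime_soft_limit is a hypothetical helper name.

```c
#include <sys/resource.h>

/*
 * Set only the soft limit (in microseconds of real-time CPU time without
 * sleeping) and keep the current hard limit. Returns 0 on success.
 */
static int set_rt_runtime_soft_limit(rlim_t usec)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_RTTIME, &rl) != 0)
        return -1;

    rl.rlim_cur = usec;
    return setrlimit(RLIMIT_RTTIME, &rl);
}
```

A SCHED_FIFO or SCHED_RR task that calls this with, say, 500000 would be expected to receive SIGXCPU after running for half a second without blocking.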

Input and ideas by Thomas Gleixner and Lennart Poettering.

Signed-off-by: Peter Zijlstra
CC: Lennart Poettering
CC: Michael Kerrisk
CC: Ulrich Drepper
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-01-26 04:08:27 +0800

20 Oct, 2007

1 commit

bac0abd61 Isolate some explicit usage of task->tgid

With pid namespaces this field is now dangerous to use explicitly, so hide
it behind the helpers.

Also the pid and pgrp fields of task_struct and signal_struct are to be
deprecated. Unfortunately this patch cannot be sent right now as this
leads to tons of warnings, so start isolating them, and deprecate later.

Actually the p->tgid == pid has to be changed to has_group_leader_pid(),
but Oleg pointed out that in case of posix cpu timers this is the same, and
thread_group_leader() is more preferable.

Signed-off-by: Pavel Emelyanov
Acked-by: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:40 +0800

10 Jul, 2007

1 commit

41b86e9c5 sched: make posix-cpu-timers use CFS's accounting information

update the posix-cpu-timers code to use CFS's CPU accounting information.

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-07-10 00:51:58 +0800

09 May, 2007

1 commit

b5e618181 Introduce a handy list_first_entry macro

There are many places in the kernel where constructions like

foo = list_entry(head->next, struct foo_struct, list);

are used.
The code might look more descriptive and neat if using the macro

list_first_entry(head, type, member) \
list_entry((head)->next, type, member)

Here is the macro itself and the examples of its usage in the generic code.
If it turns out to be useful, I can prepare a set of patches to
inject it into arch-specific code, drivers, networking, etc.
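
A self-contained user-space sketch of the macro in use follows. The list_head type and list_entry here are simplified re-implementations for illustration, and list_add_tail_sketch is a hypothetical helper; only the list_first_entry definition is taken from the message above.

```c
#include <stddef.h>

/* Minimal doubly linked list node, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the containing structure from a pointer to its list member. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The macro from the commit message: the entry that follows the list head. */
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

struct foo_struct {
    int value;
    struct list_head list;
};

/* Hypothetical helper: append `node` just before `head`, i.e. at the tail. */
static void list_add_tail_sketch(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}
```

Note that, like the open-coded form it replaces, the macro assumes the list is not empty; on an empty list it would yield a bogus pointer computed from the head itself.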

Signed-off-by: Pavel Emelianov
Signed-off-by: Kirill Korotaev
Cc: Randy Dunlap
Cc: Andi Kleen
Cc: Zach Brown
Cc: Davide Libenzi
Cc: John McCutchan
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: john stultz
Cc: Ram Pai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelianov
2007-05-09 02:15:11 +0800

17 Feb, 2007

1 commit

1f2ea0837 [PATCH] posix timers: RCU optimization for clock_gettime()

Use RCU to avoid the need to acquire tasklist_lock in the single-threaded
case of clock_gettime(). It still acquires tasklist_lock for a
(potentially multithreaded) process. This change allows realtime
applications to frequently monitor CPU consumption of individual tasks, as
requested (and now deployed) by some off-list users.

This has been in Ingo Molnar's -rt patchset since late 2005 with no
problems reported, and tests successfully on 2.6.20-rc6, so I believe that
it is long-since ready for mainline adoption.

[paulmck@linux.vnet.ibm.com: fix exit()/posix_cpu_clock_get() race spotted by Oleg]
Signed-off-by: Paul E. McKenney
Signed-off-by: Ingo Molnar
Cc: Thomas Gleixner
Cc: john stultz
Cc: Roman Zippel
Cc: Oleg Nesterov
Signed-off-by: Paul E. McKenney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul E. McKenney
2007-02-17 00:14:00 +0800