Doug / smarc-fsl-linux-kernel | Embedian Git Server

03 Jul, 2013

5 commits

a0b2062b0 posix_timers: fix racy timer delta caching on task exit ... Browse Code »

When a task exits, we perform a caching of the remaining cputime delta
before expiring of its timers.

This is done from the following places:

* When the task is reaped. We iterate through its list of
posix cpu timers and store the remaining timer delta to
the timer struct instead of the absolute value.
(See posix_cpu_timers_exit() / posix_cpu_timers_exit_group() )

* When we call posix_cpu_timer_get() or posix_cpu_timer_schedule().
If the timer's task is considered dying when watched from these
places, the same conversion from absolute to relative expiry time
is performed. Then the given task's reference is released.
(See clear_dead_task() ).

The relevance of this caching is questionable but this is another
and deeper debate.

The big issue here is that these two sources of caching don't mix
up very well together.

More specifically, the caching can easily be done twice, resulting
in a wrong delta as it gets spuriously substracted a second time by
the elapsed clock. This can happen in the following scenario:

1) The task exits and gets reaped: we call posix_cpu_timers_exit()
and the absolute timer expiry values are converted to a relative
delta.

2) timer_gettime() -> posix_cpu_timer_get() is called and relies on
clear_dead_task() because tsk->exit_state == EXIT_DEAD.
The delta gets substracted again by the elapsed clock and we return
a wrong result.

To fix this, just remove the caching done on task reaping time. It
doesn't bring much value on its own. The caching done from
posix_cpu_timer_get/schedule is enough.

And it would also be hard to get it really right: we could make it put and
clear the target task in the timer struct so that readers know if they are
dealing with a relative cached of absolute value. But it would be racy.
The only safe way to do it would be to lock the itimer->it_lock so that we
know nobody reads the cputime expiry value while we modify it and its
target task reference. Doing so would involve some funny workarounds to
avoid circular lock against the sighand lock. There is just no reason to
maintain this.

The user visible effect of this patch can be observed by running the
following code: it creates a subthread that launches a posix cputimer
which expires after 10 seconds. But then the subthread only busy loops for 2
seconds and exits. The parent reaps the subthread and read the timer value.
Its expected value should the be the initial timer's expiration value
minus the cputime elapsed in the subthread. Roughly 10 - 2 = 8 seconds:

#include
#include
#include
#include
#include

static timer_t id;
static struct itimerspec val = { .it_value.tv_sec = 10, }, new;

static void *thread(void *unused)
{
int err;
struct timeval start, end, diff;

timer_create(CLOCK_THREAD_CPUTIME_ID, NULL, &id);
if (err < 0) {
perror("Can't create timer\n");
return NULL;
}

/* Arm 10 sec timer */
err = timer_settime(id, 0, &val, NULL);
if (err < 0) {
perror("Can't set timer\n");
return NULL;
}

/* Exit after 2 seconds of execution */
gettimeofday(&start, NULL);
do {
gettimeofday(&end, NULL);
timersub(&end, &start, &diff);
} while (diff.tv_sec < 2);

return NULL;
}

int main(int argc, char **argv)
{
pthread_t pthread;
int err;

err = pthread_create(&pthread, NULL, thread, NULL);
if (err) {
perror("Can't create thread\n");
return -1;
}
pthread_join(pthread, NULL);
/* Just wait a little bit to make sure the child got reaped */
sleep(1);
err = timer_gettime(id, &new);
if (err)
perror("Can't get timer value\n");
printf("%d %ld\n", new.it_value.tv_sec, new.it_value.tv_nsec);

return 0;
}

Before the patch:

$ ./posix_cpu_timers
6 2278074

After the patch:

$ ./posix_cpu_timers
8 1158766

Before the patch, the elapsed time got two more seconds spuriously accounted.

Signed-off-by: Frederic Weisbecker
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: KOSAKI Motohiro
Cc: Olivier Langlois
Signed-off-by: Andrew Morton

Frederic Weisbecker
2013-07-03 22:54:42 +0800
76cdcdd97 posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule() ... Browse Code »

In order to re-arm a timer after it fired, we take a sample of the current
process or thread cputime.

If the task is dying though, we don't arm anything but we cache the
remaining timer expiration delta for further reads.

Something similar is performed in posix_cpu_timer_get() but here we forget
to take the process wide cputime sample before caching it.

As a result we are storing random stack content, leading every further
reads of that timer to return junk values.

Fix this by taking the appropriate sample in the case of process wide
timers.

This probably doesn't matter much in practice because, at this stage, the
thread is the last one in the group and we reached exit_notify(). This
implies that we called exit_itimers() and there should be no more timers
to handle for that task.

So this is likely dead code anyway but let's fix the current logic
and the warning that came along:

kernel/posix-cpu-timers.c: In function 'posix_cpu_timer_schedule':
kernel/posix-cpu-timers.c:1127: warning: 'now' may be used uninitialized in this function

Then we can start to think further about cleaning up that code.

Reported-by: Andrew Morton
Reported-by: Chen Gang
Signed-off-by: Frederic Weisbecker
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: Chen Gang
Cc: KOSAKI Motohiro
Cc: Olivier Langlois
Signed-off-by: Andrew Morton

Frederic Weisbecker
2013-07-03 22:20:20 +0800
2473f3e7a posix_cpu_timers: consolidate expired timers check ... Browse Code »

Consolidate the common code amongst per thread and per process timers list
on tick time.

List traversal, expiry check and subsequent updates can be shared in a
common helper.

Signed-off-by: Frederic Weisbecker
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: KOSAKI Motohiro
Cc: Olivier Langlois
Signed-off-by: Andrew Morton

Frederic Weisbecker
2013-07-03 22:19:23 +0800
1a7fa510b posix_cpu_timers: consolidate timer list cleanups ... Browse Code »

Cleaning up the posix cpu timers on task exit shares some common code
among timer list types, most notably the list traversal and expiry time
update.

Unify this in a common helper.

Signed-off-by: Frederic Weisbecker
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: KOSAKI Motohiro
Cc: Olivier Langlois
Signed-off-by: Andrew Morton

Frederic Weisbecker
2013-07-03 22:18:37 +0800
55ccb616a posix_cpu_timer: consolidate expiry time type ... Browse Code »

The posix cpu timer expiry time is stored in a union of two types: a 64
bits field if we rely on scheduler precise accounting, or a cputime_t if
we rely on jiffies.

This results in quite some duplicate code and special cases to handle the
two types.

Just unify this into a single 64 bits field. cputime_t can always fit
into it.

Signed-off-by: Frederic Weisbecker
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: KOSAKI Motohiro
Cc: Olivier Langlois
Signed-off-by: Andrew Morton

Frederic Weisbecker
2013-07-03 22:16:20 +0800

23 Apr, 2013

1 commit

6ac29178b posix_timers: Fix pre-condition to stop the tick on full dynticks ... Browse Code »

The test that checks if a CPU can stop its tick from posix CPU
timers angle was mistakenly inverted.

What we want is to prevent the tick from being stopped as long
as the current CPU's task runs a posix CPU timer.

Fix this.

Signed-off-by: Frederic Weisbecker
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: Ingo Molnar
Cc: Kevin Hilman
Cc: Li Zhong
Cc: Oleg Nesterov
Cc: Paul E. McKenney
Cc: Paul Gortmaker
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Thomas Gleixner

Frederic Weisbecker
2013-04-23 01:59:25 +0800

19 Apr, 2013

2 commits

555347f6c posix_timers: New API to prevent from stopping the tick when timers are running ... Browse Code »

Bring a new helper that the full dynticks infrastructure can
call in order to know if it can safely stop the tick from
the posix cpu timers subsystem point of view.

Signed-off-by: Frederic Weisbecker
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: Ingo Molnar
Cc: Kevin Hilman
Cc: Li Zhong
Cc: Oleg Nesterov
Cc: Paul E. McKenney
Cc: Paul Gortmaker
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Thomas Gleixner

Frederic Weisbecker
2013-04-19 22:31:40 +0800
a85721601 posix_timers: Kick full dynticks CPUs when a posix cpu timer is armed ... Browse Code »

Kick the full dynticks CPUs when a posix cpu timer is enqueued by
way of a standard call to posix_cpu_timer_set() or set_process_cpu_timer().
This also include rescheduled firing timers.

This way they can re-evaluate the state of (and possibly restart) their
tick against the new expiry modification.

Signed-off-by: Frederic Weisbecker
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: Ingo Molnar
Cc: Kevin Hilman
Cc: Li Zhong
Cc: Oleg Nesterov
Cc: Paul E. McKenney
Cc: Paul Gortmaker
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Thomas Gleixner

Frederic Weisbecker
2013-04-19 22:30:13 +0800

20 Feb, 2013

1 commit

266d7ad7f Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer changes from Ingo Molnar:
"Main changes:

- ntp: Add CONFIG_RTC_SYSTOHC: a generic RTC driver facility
complementing the existing CONFIG_RTC_HCTOSYS, which uses NTP to
keep the hardware clock updated.

- posix-timers: Fix clock_adjtime to always return timex data on
success. This is changing the ABI, but no breakage was expected
and found - caution is warranted nevertheless.

- platform persistent clock improvements/cleanups.

- clockevents: refactor timer broadcast handling to be more generic
and less duplicated with matching architecture code (mostly ARM
motivated.)

- various fixes and cleanups"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers/x86/hpet: Use HPET_COUNTER to specify the hpet counter in vread_hpet()
posix-cpu-timers: Fix nanosleep task_struct leak
clockevents: Fix generic broadcast for FEAT_C3STOP
time, Fix setting of hardware clock in NTP code
hrtimer: Prevent hrtimer_enqueue_reprogram race
clockevents: Add generic timer broadcast function
clockevents: Add generic timer broadcast receiver
timekeeping: Switch HAS_PERSISTENT_CLOCK to ALWAYS_USE_PERSISTENT_CLOCK
x86/time/rtc: Don't print extended CMOS year when reading RTC
x86: Select HAS_PERSISTENT_CLOCK on x86
timekeeping: Add CONFIG_HAS_PERSISTENT_CLOCK option
rtc: Skip the suspend/resume handling if persistent clock exist
timekeeping: Add persistent_clock_exist flag
posix-timers: Fix clock_adjtime to always return timex data on success
Round the calculated scale factor in set_cyc2ns_scale()
NTP: Add a CONFIG_RTC_SYSTOHC configuration
MAINTAINERS: Update John Stultz's email
time: create __getnstimeofday for WARNless calls

Linus Torvalds
2013-02-20 11:05:45 +0800

15 Feb, 2013

1 commit

e6c42c295 posix-cpu-timers: Fix nanosleep task_struct leak ... Browse Code »

The trinity fuzzer triggered a task_struct reference leak via
clock_nanosleep with CPU_TIMERs. do_cpu_nanosleep() calls
posic_cpu_timer_create(), but misses a corresponding
posix_cpu_timer_del() which leads to the task_struct reference leak.

Reported-and-tested-by: Tommi Rantala
Signed-off-by: Stanislaw Gruszka
Cc: Dave Jones
Cc: John Stultz
Cc: Oleg Nesterov
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20130215100810.GF4392@redhat.com
Signed-off-by: Thomas Gleixner

Stanislaw Gruszka
2013-02-15 18:41:56 +0800

28 Jan, 2013

1 commit

6fac4829c cputime: Use accessors to read task cputime stats ... Browse Code »

This is in preparation for the full dynticks feature. While
remotely reading the cputime of a task running in a full
dynticks CPU, we'll need to do some extra-computation. This
way we can account the time it spent tickless in userspace
since its last cputime snapshot.

Signed-off-by: Frederic Weisbecker
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Li Zhong
Cc: Namhyung Kim
Cc: Paul E. McKenney
Cc: Paul Gortmaker
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Thomas Gleixner

Frederic Weisbecker
2013-01-28 02:23:31 +0800

20 Dec, 2012

1 commit

7005cd397 Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random ... Browse Code »

Pull random updates from Ted Ts'o:
"A few /dev/random improvements for the v3.8 merge window."

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: Mix cputime from each thread that exits to the pool
random: prime last_data value per fips requirements
random: fix debug format strings
random: make it possible to enable debugging without rebuild

Linus Torvalds
2012-12-20 12:23:37 +0800

17 Dec, 2012

1 commit

613370549 random: Mix cputime from each thread that exits to the pool ... Browse Code »

When a thread exits mix it's cputime (userspace + kernelspace) to the entropy pool.

We don't know how "random" this is, so we use add_device_randomness that doesn't mess
with entropy count.

Signed-off-by: Nick Kossifidis
Signed-off-by: Theodore Ts'o

Nick Kossifidis
2012-12-17 11:18:11 +0800

29 Nov, 2012

1 commit

a634f9333 cputime: Move thread_group_cputime() to sched code ... Browse Code »

thread_group_cputime() is a general cputime API that is not only
used by posix cpu timer. Let's move this helper to sched code.

Signed-off-by: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Steven Rostedt
Cc: Paul Gortmaker

Frederic Weisbecker
2012-11-29 00:07:38 +0800

15 Dec, 2011

1 commit

648616343 [S390] cputime: add sparse checking and cleanup ... Browse Code »

Make cputime_t and cputime64_t nocast to enable sparse checking to
detect incorrect use of cputime. Drop the cputime macros for simple
scalar operations. The conversion macros are still needed.

Signed-off-by: Martin Schwidefsky

Martin Schwidefsky
2011-12-15 21:56:19 +0800

26 Oct, 2011

1 commit

3cfef9524 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
rtmutex: Add missing rcu_read_unlock() in debug_rt_mutex_print_deadlock()
lockdep: Comment all warnings
lib: atomic64: Change the type of local lock to raw_spinlock_t
locking, lib/atomic64: Annotate atomic64_lock::lock as raw
locking, x86, iommu: Annotate qi->q_lock as raw
locking, x86, iommu: Annotate irq_2_ir_lock as raw
locking, x86, iommu: Annotate iommu->register_lock as raw
locking, dma, ipu: Annotate bank_lock as raw
locking, ARM: Annotate low level hw locks as raw
locking, drivers/dca: Annotate dca_lock as raw
locking, powerpc: Annotate uic->lock as raw
locking, x86: mce: Annotate cmci_discover_lock as raw
locking, ACPI: Annotate c3_lock as raw
locking, oprofile: Annotate oprofilefs lock as raw
locking, video: Annotate vga console lock as raw
locking, latencytop: Annotate latency_lock as raw
locking, timer_stats: Annotate table_lock as raw
locking, rwsem: Annotate inner lock as raw
locking, semaphores: Annotate inner lock as raw
locking, sched: Annotate thread_group_cputimer as raw
...

Fix up conflicts in kernel/posix-cpu-timers.c manually: making
cputimer->cputime a raw lock conflicted with the ABBA fix in commit
bcd5cff7216f ("cputimer: Cure lock inversion").

Linus Torvalds
2011-10-26 22:17:32 +0800

18 Oct, 2011

1 commit

bcd5cff72 cputimer: Cure lock inversion ... Browse Code »

There's a lock inversion between the cputimer->lock and rq->lock;
notably the two callchains involved are:

update_rlimit_cpu()
sighand->siglock
set_process_cpu_timer()
cpu_timer_sample_group()
thread_group_cputimer()
cputimer->lock
thread_group_cputime()
task_sched_runtime()
->pi_lock
rq->lock

scheduler_tick()
rq->lock
task_tick_fair()
update_curr()
account_group_exec()
cputimer->lock

Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and
the second one is keeping up-to-date.

This problem was introduced by e8abccb7193 ("posix-cpu-timers: Cure
SMP accounting oddities").

Cure the problem by removing the cputimer->lock and rq->lock nesting,
this leaves concurrent enablers doing duplicate work, but the time
wasted should be on the same order otherwise wasted spinning on the
lock and the greater-than assignment filter should ensure we preserve
monotonicity.

Reported-by: Dave Jones
Reported-by: Simon Kirby
Signed-off-by: Peter Zijlstra
Cc: stable@kernel.org
Cc: Linus Torvalds
Cc: Martin Schwidefsky
Link: http://lkml.kernel.org/r/1318928713.21167.4.camel@twins
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2011-10-18 17:36:59 +0800

30 Sep, 2011

1 commit

d670ec131 posix-cpu-timers: Cure SMP wobbles ... Browse Code »

David reported:

Attached below is a watered-down version of rt/tst-cpuclock2.c from
GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or
similar.

Run it several times, and you will see cases where the main thread
will measure a process clock difference before and after the nanosleep
which is smaller than the cpu-burner thread's individual thread clock
difference. This doesn't make any sense since the cpu-burner thread
is part of the top-level process's thread group.

I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
64-bit binaries).

For example:

[davem@boricha build-x86_64-linux]$ ./test
process: before(0.001221967) after(0.498624371) diff(497402404)
thread: before(0.000081692) after(0.498316431) diff(498234739)
self: before(0.001223521) after(0.001240219) diff(16698)
[davem@boricha build-x86_64-linux]$

The diff of 'process' should always be >= the diff of 'thread'.

I make sure to wrap the 'thread' clock measurements the most tightly
around the nanosleep() call, and that the 'process' clock measurements
are the outer-most ones.

---
#include
#include
#include
#include
#include
#include
#include
#include

static pthread_barrier_t barrier;

static void *chew_cpu(void *arg)
{
pthread_barrier_wait(&barrier);
while (1)
__asm__ __volatile__("" : : : "memory");
return NULL;
}

int main(void)
{
clockid_t process_clock, my_thread_clock, th_clock;
struct timespec process_before, process_after;
struct timespec me_before, me_after;
struct timespec th_before, th_after;
struct timespec sleeptime;
unsigned long diff;
pthread_t th;
int err;

err = clock_getcpuclockid(0, &process_clock);
if (err)
return 1;

err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
if (err)
return 1;

pthread_barrier_init(&barrier, NULL, 2);
err = pthread_create(&th, NULL, chew_cpu, NULL);
if (err)
return 1;

err = pthread_getcpuclockid(th, &th_clock);
if (err)
return 1;

pthread_barrier_wait(&barrier);

err = clock_gettime(process_clock, &process_before);
if (err)
return 1;

err = clock_gettime(my_thread_clock, &me_before);
if (err)
return 1;

err = clock_gettime(th_clock, &th_before);
if (err)
return 1;

sleeptime.tv_sec = 0;
sleeptime.tv_nsec = 500000000;
nanosleep(&sleeptime, NULL);

err = clock_gettime(th_clock, &th_after);
if (err)
return 1;

err = clock_gettime(my_thread_clock, &me_after);
if (err)
return 1;

err = clock_gettime(process_clock, &process_after);
if (err)
return 1;

diff = process_after.tv_nsec - process_before.tv_nsec;
printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
process_before.tv_sec, process_before.tv_nsec,
process_after.tv_sec, process_after.tv_nsec, diff);
diff = th_after.tv_nsec - th_before.tv_nsec;
printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
th_before.tv_sec, th_before.tv_nsec,
th_after.tv_sec, th_after.tv_nsec, diff);
diff = me_after.tv_nsec - me_before.tv_nsec;
printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
me_before.tv_sec, me_before.tv_nsec,
me_after.tv_sec, me_after.tv_nsec, diff);

return 0;
}

This is due to us using p->se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.

This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().

Aside of that it makes the function safe on 32 bit systems. The old
code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
64bit value and could be changed on another cpu at the same time.

Reported-by: David Miller
Signed-off-by: Peter Zijlstra
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
Tested-by: David Miller
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2011-09-30 20:07:06 +0800

13 Sep, 2011

1 commit

ee30a7b2f locking, sched: Annotate thread_group_cputimer as raw ... Browse Code »

The thread_group_cputimer lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.

In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2011-09-13 17:11:55 +0800

23 May, 2011

1 commit

ab8177bc5 hrtimers: Avoid touching inactive timer bases ... Browse Code »

Instead of iterating over all possible timer bases avoid it by marking
the active bases in the cpu base.

Signed-off-by: Thomas Gleixner
Reviewed-by: Peter Zijlstra

Thomas Gleixner
2011-05-23 19:59:54 +0800

31 Mar, 2011

1 commit

25985edce Fix common misspellings ... Browse Code »

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi

Lucas De Marchi
2011-03-31 22:26:23 +0800

02 Feb, 2011

7 commits

527087374 posix-timers: Cleanup namespace ... Browse Code »

Rename register_posix_clock() to posix_timers_register_clock(). That's
what the function really does. As a side effect this cleans up the
posix_clock namespace for the upcoming dynamic posix_clock
infrastructure.

Signed-off-by: Thomas Gleixner
Tested-by: Richard Cochran
Cc: John Stultz
LKML-Reference:

Thomas Gleixner
2011-02-02 22:28:19 +0800
bc2c8ea48 posix-timers: Make posix-cpu-timers functions static ... Browse Code »

All functions are accessed via clock_posix_cpu now. So make them static.

Signed-off-by: Thomas Gleixner
Acked-by: John Stultz
Tested-by: Richard Cochran
LKML-Reference:

Thomas Gleixner
2011-02-02 22:28:17 +0800
26f9a4796 posix-timers: Convert clock_settime to clockid_to_kclock() ... Browse Code »

Use the new kclock decoding function in clock_settime and cleanup all
kclocks which use the default functions. Rename the misnomed
common_clock_set() to posix_clock_realtime_set().

Signed-off-by: Thomas Gleixner
Acked-by: John Stultz
Tested-by: Richard Cochran
LKML-Reference:

Thomas Gleixner
2011-02-02 22:28:14 +0800
79c9da0d0 posix-cpu-timers: Remove the stub nanosleep functions ... Browse Code »

CLOCK_THREAD_CPUTIME_ID implements stub functions for nanosleep and
nanosleep_restart, which return -EINVAL. That return value is
wrong. The correct return value is -ENOTSUP.

Remove the stubs and let the new dispatch code return the correct
error code.

Signed-off-by: Thomas Gleixner
Acked-by: John Stultz
Tested-by: Richard Cochran
LKML-Reference:

Thomas Gleixner
2011-02-02 22:28:14 +0800
3751f9f29 posix-timers: Cleanup restart_block usage ... Browse Code »

posix timers still use the legacy arg0-arg3 members of
restart_block. Use restart_block.nanosleep instead

Signed-off-by: Thomas Gleixner
Acked-by: John Stultz
Tested-by: Richard Cochran
LKML-Reference:

Thomas Gleixner
2011-02-02 22:28:13 +0800
1976945ee posix-timers: Introduce clock_posix_cpu ... Browse Code »

The CLOCK_DISPATCH() macro is a horrible magic. We call common
functions if a function pointer is not set. That's just backwards.

To support dynamic file decriptor based clocks we need to cleanup that
dispatch logic.

Create a k_clock struct clock_posix_cpu which has all the
posix-cpu-timer functions filled in. After the cleanup the functions
can be made static.

Signed-off-by: Thomas Gleixner
Acked-by: John Stultz
Tested-by: Richard Cochran
LKML-Reference:

Thomas Gleixner
2011-02-02 22:28:12 +0800
2fd1f0408 posix-timers: Cleanup struct initializers ... Browse Code »

Cosmetic. No functional change

Signed-off-by: Thomas Gleixner
Acked-by: John Stultz
Tested-by: Richard Cochran
LKML-Reference:

Thomas Gleixner
2011-02-02 22:28:12 +0800

10 Nov, 2010

1 commit

c0deae8c9 posix-cpu-timers: Rcu_read_lock/unlock protect find_task_by_vpid call ... Browse Code »

Commit 4221a9918e38b7494cee341dda7b7b4bb8c04bde "Add RCU check for
find_task_by_vpid()" introduced rcu_lockdep_assert to find_task_by_pid_ns.
Add rcu_read_lock/rcu_read_unlock to call find_task_by_vpid.

Tetsuo Handa wrote:
| Quoting from one of posts in that thead
| http://kerneltrap.org/mailarchive/linux-kernel/2010/2/8/4536388
|
|| Usually tasklist gives enough protection, but if copy_process() fails
|| it calls free_pid() lockless and does call_rcu(delayed_put_pid().
|| This means, without rcu lock find_pid_ns() can't scan the hash table
|| safely.

Thomas Gleixner wrote:
| We can remove the tasklist_lock while at it. rcu_read_lock is enough.

Patch also replaces thread_group_leader with has_group_leader_pid
in accordance to comment by Oleg Nesterov:

| ... thread_group_leader() check is not relaible without
| tasklist. If we race with de_thread() find_task_by_vpid() can find
| the new leader before it updates its ->group_leader.
|
| perhaps it makes sense to change posix_cpu_timer_create() to use
| has_group_leader_pid() instead, just to make this code not look racy
| and avoid adding new problems.

Signed-off-by: Sergey Senozhatsky
Cc: Peter Zijlstra
Cc: Stanislaw Gruszka
Reviewed-by: Oleg Nesterov
LKML-Reference:
Signed-off-by: Thomas Gleixner

Sergey Senozhatsky
2010-11-10 20:07:06 +0800

11 Aug, 2010

1 commit

b34d8915c Merge branch 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux ... Browse Code »

* 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux:
unistd: add __NR_prlimit64 syscall numbers
rlimits: implement prlimit64 syscall
rlimits: switch more rlimit syscalls to do_prlimit
rlimits: redo do_setrlimit to more generic do_prlimit
rlimits: add rlimit64 structure
rlimits: do security check under task_lock
rlimits: allow setrlimit to non-current tasks
rlimits: split sys_setrlimit
rlimits: selinux, do rlimits changes under task_lock
rlimits: make sure ->rlim_max never grows in sys_setrlimit
rlimits: add task_struct to update_rlimit_cpu
rlimits: security, add task_struct to setrlimit

Fix up various system call number conflicts. We not only added fanotify
system calls in the meantime, but asm-generic/unistd.h added a wait4
along with a range of reserved per-architecture system calls.

Linus Torvalds
2010-08-11 03:07:51 +0800

16 Jul, 2010

1 commit

5ab46b345 rlimits: add task_struct to update_rlimit_cpu ... Browse Code »

Add task_struct as a parameter to update_rlimit_cpu to be able to set
rlimit_cpu of different task than current.

Signed-off-by: Jiri Slaby
Acked-by: James Morris

Jiri Slaby
2010-07-16 15:48:45 +0800

18 Jun, 2010

3 commits

8d1f431cb sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check() ... Browse Code »

fastpath_timer_check()->thread_group_cputimer() is racy and
unneeded.

It is racy because another thread can clear ->running before
thread_group_cputimer() takes cputimer->lock. In this case
thread_group_cputimer() will set ->running = true again and call
thread_group_cputime(). But since we do not hold tasklist or
siglock, we can race with fork/exit and copy the wrong results
into cputimer->cputime.

It is unneeded because if ->running == true we can just use
the numbers in cputimer->cputime we already have.

Change fastpath_timer_check() to copy cputimer->cputime into
the local variable under cputimer->lock. We do not re-check
->running under cputimer->lock, run_posix_cpu_timers() does
this check later.

Note: we can add more optimizations on top of this change.

Signed-off-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Oleg Nesterov
2010-06-18 16:46:57 +0800
0bdd2ed41 sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand() ... Browse Code »

run_posix_cpu_timers() doesn't work if current has already passed
exit_notify(). This was needed to prevent the races with do_wait().

Since ea6d290c ->signal is always valid and can't go away. We can
remove the "tsk->exit_state == 0" in fastpath_timer_check() and
convert run_posix_cpu_timers() to use lock_task_sighand().

Note: it makes sense to take group_leader's sighand instead, the
sub-thread still uses CPU after release_task(). But we need more
changes to do this.

Signed-off-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Oleg Nesterov
2010-06-18 16:46:57 +0800
bfac70091 sched: thread_group_cputime: Simplify, document the "alive" check ... Browse Code »

thread_group_cputime() looks as if it is rcu-safe, but in fact this
was wrong until ea6d290c which pins task->signal to task_struct.
It checks ->sighand != NULL under rcu, but this can't help if ->signal
can go away. Fortunately the caller either holds ->siglock, or it is
fastpath_timer_check() which uses current and checks exit_state == 0.

- Since ea6d290c commit tsk->signal is stable, we can read it first
and avoid the initialization from INIT_CPUTIME.

- Even if tsk->signal is always valid, we still have to check it
is safe to use next_thread() under rcu_read_lock(). Currently
the code checks ->sighand != NULL, change it to use pid_alive()
which is commonly used to ensure the task wasn't unhashed before
we take rcu_read_lock().

Add the comment to explain this check.

- Change the main loop to use the while_each_thread() helper.

Signed-off-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Oleg Nesterov
2010-06-18 16:46:56 +0800

28 May, 2010

1 commit

d30fda355 posix-cpu-timers: avoid "task->signal != NULL" checks ... Browse Code »

Preparation to make task->signal immutable, no functional changes.

posix-cpu-timers.c checks task->signal != NULL to ensure this task is
alive and didn't pass __exit_signal(). This is correct but we are going
to change the lifetime rules for ->signal and never reset this pointer.

Change the code to check ->sighand instead, it doesn't matter which
pointer we check under tasklist, they both are cleared simultaneously.

As Roland pointed out, some of these changes are not strictly needed and
probably it makes sense to revert them later, when ->signal will be pinned
to task_struct. But this patch tries to ensure the subsequent changes in
fork/exit can't make any visible impact on posix cpu timers.

Signed-off-by: Oleg Nesterov
Cc: Fenghua Yu
Acked-by: Roland McGrath
Cc: Stanislaw Gruszka
Cc: Tony Luck
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2010-05-28 00:12:46 +0800

10 May, 2010

2 commits

29f87b793 posix-cpu-timers: Optimize run_posix_cpu_timers() ... Browse Code »

We can optimize and simplify things taking into account signal->cputimer
is always running when we have configured any process wide cpu timer.

In check_process_timers(), we don't have to check if new updated value of
signal->cputime_expires is smaller, since we maintain new first expiration
time ({prof,virt,sched}_expires) in code flow and all other writes to
expiration cache are protected by sighand->siglock .

Signed-off-by: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Hidetoshi Seto
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Stanislaw Gruszka
2010-05-10 20:24:26 +0800
dbb6be6d5 Merge branch 'linus' into timers/core ... Browse Code »

Reason: Further posix_cpu_timer patches depend on mainline changes

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2010-05-10 20:20:42 +0800

27 Mar, 2010

1 commit

054319b5e Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kern… ... Browse Code »

…el/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
time: Fix accumulation bug triggered by long delay.
posix-cpu-timers: Reset expire cache when no timer is running
timer stats: Fix del_timer_sync() and try_to_del_timer_sync()
clockevents: Sanitize min_delta_ns adjustment and prevent overflows

Linus Torvalds
2010-03-27 06:10:38 +0800

13 Mar, 2010

2 commits

c28739375 cpu-timers: Avoid iterating over all threads in fastpath_timer_check() ... Browse Code »

Spread p->sighand->siglock locking scope to make sure that
fastpath_timer_check() never iterates over all threads. Without
locking there is small possibility that signal->cputimer will stop
running while we write values to signal->cputime_expires.

Calling thread_group_cputime() from fastpath_timer_check() is not only
bad because it is slow, also it is racy with __exit_signal() which can
lead to invalid signal->{s,u}time values.

Signed-off-by: Stanislaw Gruszka
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Hidetoshi Seto
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Stanislaw Gruszka
2010-03-13 05:40:41 +0800
1f169f84d cpu-timers: Change SIGEV_NONE timer implementation ... Browse Code »

When user sets up a timer without associated signal and process does
not use any other cpu timers and does not exit, tsk->signal->cputimer
is enabled and running forever.

Avoid running the timer for no reason.

I used below program to check patch does not break current user space
visible behavior.

#include
#include
#include
#include
#include
#include
#include
#include

void consume_cpu(void)
{
int i = 0;
int count = 0;

for(i=0; i< 30; i++) {
consume_cpu();
memset(&spec, 0, sizeof(spec));
assert(timer_gettime(tid, &spec) == 0);
printf("%lu.%09lu\n",
(unsigned long) spec.it_value.tv_sec,
(unsigned long) spec.it_value.tv_nsec);
}

assert(timer_delete(tid) == 0);
return 0;
}

Signed-off-by: Stanislaw Gruszka
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Hidetoshi Seto
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Stanislaw Gruszka
2010-03-13 05:40:40 +0800