03 Jul, 2013

5 commits

  • When a task exits, we perform a caching of the remaining cputime delta
    before expiring of its timers.

    This is done from the following places:

    * When the task is reaped. We iterate through its list of
    posix cpu timers and store the remaining timer delta to
    the timer struct instead of the absolute value.
    (See posix_cpu_timers_exit() / posix_cpu_timers_exit_group() )

    * When we call posix_cpu_timer_get() or posix_cpu_timer_schedule().
    If the timer's task is considered dying when watched from these
    places, the same conversion from absolute to relative expiry time
    is performed. Then the given task's reference is released.
    (See clear_dead_task() ).

    The relevance of this caching is questionable but this is another
    and deeper debate.

    The big issue here is that these two sources of caching don't mix
    up very well together.

    More specifically, the caching can easily be done twice, resulting
    in a wrong delta as it gets spuriously substracted a second time by
    the elapsed clock. This can happen in the following scenario:

    1) The task exits and gets reaped: we call posix_cpu_timers_exit()
    and the absolute timer expiry values are converted to a relative
    delta.

    2) timer_gettime() -> posix_cpu_timer_get() is called and relies on
    clear_dead_task() because tsk->exit_state == EXIT_DEAD.
    The delta gets substracted again by the elapsed clock and we return
    a wrong result.

    To fix this, just remove the caching done on task reaping time. It
    doesn't bring much value on its own. The caching done from
    posix_cpu_timer_get/schedule is enough.

    And it would also be hard to get it really right: we could make it put and
    clear the target task in the timer struct so that readers know if they are
    dealing with a relative cached of absolute value. But it would be racy.
    The only safe way to do it would be to lock the itimer->it_lock so that we
    know nobody reads the cputime expiry value while we modify it and its
    target task reference. Doing so would involve some funny workarounds to
    avoid circular lock against the sighand lock. There is just no reason to
    maintain this.

    The user visible effect of this patch can be observed by running the
    following code: it creates a subthread that launches a posix cputimer
    which expires after 10 seconds. But then the subthread only busy loops for 2
    seconds and exits. The parent reaps the subthread and read the timer value.
    Its expected value should the be the initial timer's expiration value
    minus the cputime elapsed in the subthread. Roughly 10 - 2 = 8 seconds:

    #include
    #include
    #include
    #include
    #include

    static timer_t id;
    static struct itimerspec val = { .it_value.tv_sec = 10, }, new;

    static void *thread(void *unused)
    {
    int err;
    struct timeval start, end, diff;

    timer_create(CLOCK_THREAD_CPUTIME_ID, NULL, &id);
    if (err < 0) {
    perror("Can't create timer\n");
    return NULL;
    }

    /* Arm 10 sec timer */
    err = timer_settime(id, 0, &val, NULL);
    if (err < 0) {
    perror("Can't set timer\n");
    return NULL;
    }

    /* Exit after 2 seconds of execution */
    gettimeofday(&start, NULL);
    do {
    gettimeofday(&end, NULL);
    timersub(&end, &start, &diff);
    } while (diff.tv_sec < 2);

    return NULL;
    }

    int main(int argc, char **argv)
    {
    pthread_t pthread;
    int err;

    err = pthread_create(&pthread, NULL, thread, NULL);
    if (err) {
    perror("Can't create thread\n");
    return -1;
    }
    pthread_join(pthread, NULL);
    /* Just wait a little bit to make sure the child got reaped */
    sleep(1);
    err = timer_gettime(id, &new);
    if (err)
    perror("Can't get timer value\n");
    printf("%d %ld\n", new.it_value.tv_sec, new.it_value.tv_nsec);

    return 0;
    }

    Before the patch:

    $ ./posix_cpu_timers
    6 2278074

    After the patch:

    $ ./posix_cpu_timers
    8 1158766

    Before the patch, the elapsed time got two more seconds spuriously accounted.

    Signed-off-by: Frederic Weisbecker
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: KOSAKI Motohiro
    Cc: Olivier Langlois
    Signed-off-by: Andrew Morton

    Frederic Weisbecker
     
  • In order to re-arm a timer after it fired, we take a sample of the current
    process or thread cputime.

    If the task is dying though, we don't arm anything but we cache the
    remaining timer expiration delta for further reads.

    Something similar is performed in posix_cpu_timer_get() but here we forget
    to take the process wide cputime sample before caching it.

    As a result we are storing random stack content, leading every further
    reads of that timer to return junk values.

    Fix this by taking the appropriate sample in the case of process wide
    timers.

    This probably doesn't matter much in practice because, at this stage, the
    thread is the last one in the group and we reached exit_notify(). This
    implies that we called exit_itimers() and there should be no more timers
    to handle for that task.

    So this is likely dead code anyway but let's fix the current logic
    and the warning that came along:

    kernel/posix-cpu-timers.c: In function 'posix_cpu_timer_schedule':
    kernel/posix-cpu-timers.c:1127: warning: 'now' may be used uninitialized in this function

    Then we can start to think further about cleaning up that code.

    Reported-by: Andrew Morton
    Reported-by: Chen Gang
    Signed-off-by: Frederic Weisbecker
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Chen Gang
    Cc: KOSAKI Motohiro
    Cc: Olivier Langlois
    Signed-off-by: Andrew Morton

    Frederic Weisbecker
     
  • Consolidate the common code amongst per thread and per process timers list
    on tick time.

    List traversal, expiry check and subsequent updates can be shared in a
    common helper.

    Signed-off-by: Frederic Weisbecker
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: KOSAKI Motohiro
    Cc: Olivier Langlois
    Signed-off-by: Andrew Morton

    Frederic Weisbecker
     
  • Cleaning up the posix cpu timers on task exit shares some common code
    among timer list types, most notably the list traversal and expiry time
    update.

    Unify this in a common helper.

    Signed-off-by: Frederic Weisbecker
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: KOSAKI Motohiro
    Cc: Olivier Langlois
    Signed-off-by: Andrew Morton

    Frederic Weisbecker
     
  • The posix cpu timer expiry time is stored in a union of two types: a 64
    bits field if we rely on scheduler precise accounting, or a cputime_t if
    we rely on jiffies.

    This results in quite some duplicate code and special cases to handle the
    two types.

    Just unify this into a single 64 bits field. cputime_t can always fit
    into it.

    Signed-off-by: Frederic Weisbecker
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: KOSAKI Motohiro
    Cc: Olivier Langlois
    Signed-off-by: Andrew Morton

    Frederic Weisbecker
     

23 Apr, 2013

1 commit

  • The test that checks if a CPU can stop its tick from posix CPU
    timers angle was mistakenly inverted.

    What we want is to prevent the tick from being stopped as long
    as the current CPU's task runs a posix CPU timer.

    Fix this.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

19 Apr, 2013

2 commits

  • Bring a new helper that the full dynticks infrastructure can
    call in order to know if it can safely stop the tick from
    the posix cpu timers subsystem point of view.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • Kick the full dynticks CPUs when a posix cpu timer is enqueued by
    way of a standard call to posix_cpu_timer_set() or set_process_cpu_timer().
    This also include rescheduled firing timers.

    This way they can re-evaluate the state of (and possibly restart) their
    tick against the new expiry modification.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

20 Feb, 2013

1 commit

  • Pull timer changes from Ingo Molnar:
    "Main changes:

    - ntp: Add CONFIG_RTC_SYSTOHC: a generic RTC driver facility
    complementing the existing CONFIG_RTC_HCTOSYS, which uses NTP to
    keep the hardware clock updated.

    - posix-timers: Fix clock_adjtime to always return timex data on
    success. This is changing the ABI, but no breakage was expected
    and found - caution is warranted nevertheless.

    - platform persistent clock improvements/cleanups.

    - clockevents: refactor timer broadcast handling to be more generic
    and less duplicated with matching architecture code (mostly ARM
    motivated.)

    - various fixes and cleanups"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    timers/x86/hpet: Use HPET_COUNTER to specify the hpet counter in vread_hpet()
    posix-cpu-timers: Fix nanosleep task_struct leak
    clockevents: Fix generic broadcast for FEAT_C3STOP
    time, Fix setting of hardware clock in NTP code
    hrtimer: Prevent hrtimer_enqueue_reprogram race
    clockevents: Add generic timer broadcast function
    clockevents: Add generic timer broadcast receiver
    timekeeping: Switch HAS_PERSISTENT_CLOCK to ALWAYS_USE_PERSISTENT_CLOCK
    x86/time/rtc: Don't print extended CMOS year when reading RTC
    x86: Select HAS_PERSISTENT_CLOCK on x86
    timekeeping: Add CONFIG_HAS_PERSISTENT_CLOCK option
    rtc: Skip the suspend/resume handling if persistent clock exist
    timekeeping: Add persistent_clock_exist flag
    posix-timers: Fix clock_adjtime to always return timex data on success
    Round the calculated scale factor in set_cyc2ns_scale()
    NTP: Add a CONFIG_RTC_SYSTOHC configuration
    MAINTAINERS: Update John Stultz's email
    time: create __getnstimeofday for WARNless calls

    Linus Torvalds
     

15 Feb, 2013

1 commit

  • The trinity fuzzer triggered a task_struct reference leak via
    clock_nanosleep with CPU_TIMERs. do_cpu_nanosleep() calls
    posic_cpu_timer_create(), but misses a corresponding
    posix_cpu_timer_del() which leads to the task_struct reference leak.

    Reported-and-tested-by: Tommi Rantala
    Signed-off-by: Stanislaw Gruszka
    Cc: Dave Jones
    Cc: John Stultz
    Cc: Oleg Nesterov
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20130215100810.GF4392@redhat.com
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
     

28 Jan, 2013

1 commit

  • This is in preparation for the full dynticks feature. While
    remotely reading the cputime of a task running in a full
    dynticks CPU, we'll need to do some extra-computation. This
    way we can account the time it spent tickless in userspace
    since its last cputime snapshot.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

20 Dec, 2012

1 commit


17 Dec, 2012

1 commit


29 Nov, 2012

1 commit


15 Dec, 2011

1 commit


26 Oct, 2011

1 commit

  • * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
    rtmutex: Add missing rcu_read_unlock() in debug_rt_mutex_print_deadlock()
    lockdep: Comment all warnings
    lib: atomic64: Change the type of local lock to raw_spinlock_t
    locking, lib/atomic64: Annotate atomic64_lock::lock as raw
    locking, x86, iommu: Annotate qi->q_lock as raw
    locking, x86, iommu: Annotate irq_2_ir_lock as raw
    locking, x86, iommu: Annotate iommu->register_lock as raw
    locking, dma, ipu: Annotate bank_lock as raw
    locking, ARM: Annotate low level hw locks as raw
    locking, drivers/dca: Annotate dca_lock as raw
    locking, powerpc: Annotate uic->lock as raw
    locking, x86: mce: Annotate cmci_discover_lock as raw
    locking, ACPI: Annotate c3_lock as raw
    locking, oprofile: Annotate oprofilefs lock as raw
    locking, video: Annotate vga console lock as raw
    locking, latencytop: Annotate latency_lock as raw
    locking, timer_stats: Annotate table_lock as raw
    locking, rwsem: Annotate inner lock as raw
    locking, semaphores: Annotate inner lock as raw
    locking, sched: Annotate thread_group_cputimer as raw
    ...

    Fix up conflicts in kernel/posix-cpu-timers.c manually: making
    cputimer->cputime a raw lock conflicted with the ABBA fix in commit
    bcd5cff7216f ("cputimer: Cure lock inversion").

    Linus Torvalds
     

18 Oct, 2011

1 commit

  • There's a lock inversion between the cputimer->lock and rq->lock;
    notably the two callchains involved are:

    update_rlimit_cpu()
    sighand->siglock
    set_process_cpu_timer()
    cpu_timer_sample_group()
    thread_group_cputimer()
    cputimer->lock
    thread_group_cputime()
    task_sched_runtime()
    ->pi_lock
    rq->lock

    scheduler_tick()
    rq->lock
    task_tick_fair()
    update_curr()
    account_group_exec()
    cputimer->lock

    Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and
    the second one is keeping up-to-date.

    This problem was introduced by e8abccb7193 ("posix-cpu-timers: Cure
    SMP accounting oddities").

    Cure the problem by removing the cputimer->lock and rq->lock nesting,
    this leaves concurrent enablers doing duplicate work, but the time
    wasted should be on the same order otherwise wasted spinning on the
    lock and the greater-than assignment filter should ensure we preserve
    monotonicity.

    Reported-by: Dave Jones
    Reported-by: Simon Kirby
    Signed-off-by: Peter Zijlstra
    Cc: stable@kernel.org
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Link: http://lkml.kernel.org/r/1318928713.21167.4.camel@twins
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     

30 Sep, 2011

1 commit

  • David reported:

    Attached below is a watered-down version of rt/tst-cpuclock2.c from
    GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or
    similar.

    Run it several times, and you will see cases where the main thread
    will measure a process clock difference before and after the nanosleep
    which is smaller than the cpu-burner thread's individual thread clock
    difference. This doesn't make any sense since the cpu-burner thread
    is part of the top-level process's thread group.

    I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
    64-bit binaries).

    For example:

    [davem@boricha build-x86_64-linux]$ ./test
    process: before(0.001221967) after(0.498624371) diff(497402404)
    thread: before(0.000081692) after(0.498316431) diff(498234739)
    self: before(0.001223521) after(0.001240219) diff(16698)
    [davem@boricha build-x86_64-linux]$

    The diff of 'process' should always be >= the diff of 'thread'.

    I make sure to wrap the 'thread' clock measurements the most tightly
    around the nanosleep() call, and that the 'process' clock measurements
    are the outer-most ones.

    ---
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include

    static pthread_barrier_t barrier;

    static void *chew_cpu(void *arg)
    {
    pthread_barrier_wait(&barrier);
    while (1)
    __asm__ __volatile__("" : : : "memory");
    return NULL;
    }

    int main(void)
    {
    clockid_t process_clock, my_thread_clock, th_clock;
    struct timespec process_before, process_after;
    struct timespec me_before, me_after;
    struct timespec th_before, th_after;
    struct timespec sleeptime;
    unsigned long diff;
    pthread_t th;
    int err;

    err = clock_getcpuclockid(0, &process_clock);
    if (err)
    return 1;

    err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
    if (err)
    return 1;

    pthread_barrier_init(&barrier, NULL, 2);
    err = pthread_create(&th, NULL, chew_cpu, NULL);
    if (err)
    return 1;

    err = pthread_getcpuclockid(th, &th_clock);
    if (err)
    return 1;

    pthread_barrier_wait(&barrier);

    err = clock_gettime(process_clock, &process_before);
    if (err)
    return 1;

    err = clock_gettime(my_thread_clock, &me_before);
    if (err)
    return 1;

    err = clock_gettime(th_clock, &th_before);
    if (err)
    return 1;

    sleeptime.tv_sec = 0;
    sleeptime.tv_nsec = 500000000;
    nanosleep(&sleeptime, NULL);

    err = clock_gettime(th_clock, &th_after);
    if (err)
    return 1;

    err = clock_gettime(my_thread_clock, &me_after);
    if (err)
    return 1;

    err = clock_gettime(process_clock, &process_after);
    if (err)
    return 1;

    diff = process_after.tv_nsec - process_before.tv_nsec;
    printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
    process_before.tv_sec, process_before.tv_nsec,
    process_after.tv_sec, process_after.tv_nsec, diff);
    diff = th_after.tv_nsec - th_before.tv_nsec;
    printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
    th_before.tv_sec, th_before.tv_nsec,
    th_after.tv_sec, th_after.tv_nsec, diff);
    diff = me_after.tv_nsec - me_before.tv_nsec;
    printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
    me_before.tv_sec, me_before.tv_nsec,
    me_after.tv_sec, me_after.tv_nsec, diff);

    return 0;
    }

    This is due to us using p->se.sum_exec_runtime in
    thread_group_cputime() where we iterate the thread group and sum all
    data. This does not take time since the last schedule operation (tick
    or otherwise) into account. We can cure this by using
    task_sched_runtime() at the cost of having to take locks.

    This also means we can (and must) do away with
    thread_group_sched_runtime() since the modified thread_group_cputime()
    is now more accurate and would deadlock when called from
    thread_group_sched_runtime().

    Aside of that it makes the function safe on 32 bit systems. The old
    code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
    64bit value and could be changed on another cpu at the same time.

    Reported-by: David Miller
    Signed-off-by: Peter Zijlstra
    Cc: stable@kernel.org
    Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
    Tested-by: David Miller
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     

13 Sep, 2011

1 commit

  • The thread_group_cputimer lock can be taken in atomic context and therefore
    cannot be preempted on -rt - annotate it.

    In mainline this change documents the low level nature of
    the lock - otherwise there's no functional difference. Lockdep
    and Sparse checking will work as usual.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

23 May, 2011

1 commit


31 Mar, 2011

1 commit


02 Feb, 2011

7 commits


10 Nov, 2010

1 commit

  • Commit 4221a9918e38b7494cee341dda7b7b4bb8c04bde "Add RCU check for
    find_task_by_vpid()" introduced rcu_lockdep_assert to find_task_by_pid_ns.
    Add rcu_read_lock/rcu_read_unlock to call find_task_by_vpid.

    Tetsuo Handa wrote:
    | Quoting from one of posts in that thead
    | http://kerneltrap.org/mailarchive/linux-kernel/2010/2/8/4536388
    |
    || Usually tasklist gives enough protection, but if copy_process() fails
    || it calls free_pid() lockless and does call_rcu(delayed_put_pid().
    || This means, without rcu lock find_pid_ns() can't scan the hash table
    || safely.

    Thomas Gleixner wrote:
    | We can remove the tasklist_lock while at it. rcu_read_lock is enough.

    Patch also replaces thread_group_leader with has_group_leader_pid
    in accordance to comment by Oleg Nesterov:

    | ... thread_group_leader() check is not relaible without
    | tasklist. If we race with de_thread() find_task_by_vpid() can find
    | the new leader before it updates its ->group_leader.
    |
    | perhaps it makes sense to change posix_cpu_timer_create() to use
    | has_group_leader_pid() instead, just to make this code not look racy
    | and avoid adding new problems.

    Signed-off-by: Sergey Senozhatsky
    Cc: Peter Zijlstra
    Cc: Stanislaw Gruszka
    Reviewed-by: Oleg Nesterov
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Sergey Senozhatsky
     

11 Aug, 2010

1 commit

  • * 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux:
    unistd: add __NR_prlimit64 syscall numbers
    rlimits: implement prlimit64 syscall
    rlimits: switch more rlimit syscalls to do_prlimit
    rlimits: redo do_setrlimit to more generic do_prlimit
    rlimits: add rlimit64 structure
    rlimits: do security check under task_lock
    rlimits: allow setrlimit to non-current tasks
    rlimits: split sys_setrlimit
    rlimits: selinux, do rlimits changes under task_lock
    rlimits: make sure ->rlim_max never grows in sys_setrlimit
    rlimits: add task_struct to update_rlimit_cpu
    rlimits: security, add task_struct to setrlimit

    Fix up various system call number conflicts. We not only added fanotify
    system calls in the meantime, but asm-generic/unistd.h added a wait4
    along with a range of reserved per-architecture system calls.

    Linus Torvalds
     

16 Jul, 2010

1 commit


18 Jun, 2010

3 commits

  • fastpath_timer_check()->thread_group_cputimer() is racy and
    unneeded.

    It is racy because another thread can clear ->running before
    thread_group_cputimer() takes cputimer->lock. In this case
    thread_group_cputimer() will set ->running = true again and call
    thread_group_cputime(). But since we do not hold tasklist or
    siglock, we can race with fork/exit and copy the wrong results
    into cputimer->cputime.

    It is unneeded because if ->running == true we can just use
    the numbers in cputimer->cputime we already have.

    Change fastpath_timer_check() to copy cputimer->cputime into
    the local variable under cputimer->lock. We do not re-check
    ->running under cputimer->lock, run_posix_cpu_timers() does
    this check later.

    Note: we can add more optimizations on top of this change.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • run_posix_cpu_timers() doesn't work if current has already passed
    exit_notify(). This was needed to prevent the races with do_wait().

    Since ea6d290c ->signal is always valid and can't go away. We can
    remove the "tsk->exit_state == 0" in fastpath_timer_check() and
    convert run_posix_cpu_timers() to use lock_task_sighand().

    Note: it makes sense to take group_leader's sighand instead, the
    sub-thread still uses CPU after release_task(). But we need more
    changes to do this.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • thread_group_cputime() looks as if it is rcu-safe, but in fact this
    was wrong until ea6d290c which pins task->signal to task_struct.
    It checks ->sighand != NULL under rcu, but this can't help if ->signal
    can go away. Fortunately the caller either holds ->siglock, or it is
    fastpath_timer_check() which uses current and checks exit_state == 0.

    - Since ea6d290c commit tsk->signal is stable, we can read it first
    and avoid the initialization from INIT_CPUTIME.

    - Even if tsk->signal is always valid, we still have to check it
    is safe to use next_thread() under rcu_read_lock(). Currently
    the code checks ->sighand != NULL, change it to use pid_alive()
    which is commonly used to ensure the task wasn't unhashed before
    we take rcu_read_lock().

    Add the comment to explain this check.

    - Change the main loop to use the while_each_thread() helper.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

28 May, 2010

1 commit

  • Preparation to make task->signal immutable, no functional changes.

    posix-cpu-timers.c checks task->signal != NULL to ensure this task is
    alive and didn't pass __exit_signal(). This is correct but we are going
    to change the lifetime rules for ->signal and never reset this pointer.

    Change the code to check ->sighand instead, it doesn't matter which
    pointer we check under tasklist, they both are cleared simultaneously.

    As Roland pointed out, some of these changes are not strictly needed and
    probably it makes sense to revert them later, when ->signal will be pinned
    to task_struct. But this patch tries to ensure the subsequent changes in
    fork/exit can't make any visible impact on posix cpu timers.

    Signed-off-by: Oleg Nesterov
    Cc: Fenghua Yu
    Acked-by: Roland McGrath
    Cc: Stanislaw Gruszka
    Cc: Tony Luck
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

10 May, 2010

2 commits

  • We can optimize and simplify things taking into account signal->cputimer
    is always running when we have configured any process wide cpu timer.

    In check_process_timers(), we don't have to check if new updated value of
    signal->cputime_expires is smaller, since we maintain new first expiration
    time ({prof,virt,sched}_expires) in code flow and all other writes to
    expiration cache are protected by sighand->siglock .

    Signed-off-by: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Hidetoshi Seto
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
     
  • Reason: Further posix_cpu_timer patches depend on mainline changes

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

27 Mar, 2010

1 commit


13 Mar, 2010

2 commits

  • Spread p->sighand->siglock locking scope to make sure that
    fastpath_timer_check() never iterates over all threads. Without
    locking there is small possibility that signal->cputimer will stop
    running while we write values to signal->cputime_expires.

    Calling thread_group_cputime() from fastpath_timer_check() is not only
    bad because it is slow, also it is racy with __exit_signal() which can
    lead to invalid signal->{s,u}time values.

    Signed-off-by: Stanislaw Gruszka
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Hidetoshi Seto
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
     
  • When user sets up a timer without associated signal and process does
    not use any other cpu timers and does not exit, tsk->signal->cputimer
    is enabled and running forever.

    Avoid running the timer for no reason.

    I used below program to check patch does not break current user space
    visible behavior.

    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include

    void consume_cpu(void)
    {
    int i = 0;
    int count = 0;

    for(i=0; i< 30; i++) {
    consume_cpu();
    memset(&spec, 0, sizeof(spec));
    assert(timer_gettime(tid, &spec) == 0);
    printf("%lu.%09lu\n",
    (unsigned long) spec.it_value.tv_sec,
    (unsigned long) spec.it_value.tv_nsec);
    }

    assert(timer_delete(tid) == 0);
    return 0;
    }

    Signed-off-by: Stanislaw Gruszka
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Hidetoshi Seto
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka