20 Oct, 2007

1 commit

  • With pid namespaces this field is now dangerous to use explicitly, so hide
    it behind the helpers.

    Also, the pid and pgrp fields of task_struct and signal_struct are to be
    deprecated. Unfortunately, that patch cannot be sent right now as it
    leads to tons of warnings, so start isolating them, and deprecate later.

    Actually, the p->tgid == pid comparison has to be changed to
    has_group_leader_pid(), but Oleg pointed out that in the case of posix cpu
    timers the two are equivalent, and thread_group_leader() is preferable.
    (A toy sketch of the helper-based check follows this entry.)

    Signed-off-by: Pavel Emelyanov
    Acked-by: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
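
    A toy userspace model of the change described above (purely illustrative;
    the struct layout and the helper body are simplified assumptions, not the
    kernel's definitions):

        #include <stdio.h>
        #include <sys/types.h>

        /* Simplified stand-in for the kernel's task_struct. */
        struct task_struct {
            pid_t pid;                          /* thread id  */
            pid_t tgid;                         /* process id */
            struct task_struct *group_leader;
        };

        /* Helper in the spirit of thread_group_leader(): true when this task
         * is the first thread of its thread group. */
        static int thread_group_leader(const struct task_struct *p)
        {
            return p == p->group_leader;
        }

        int main(void)
        {
            struct task_struct leader = { .pid = 100, .tgid = 100 };
            struct task_struct worker = { .pid = 101, .tgid = 100 };

            leader.group_leader = &leader;
            worker.group_leader = &leader;

            /* Old style: poke at the raw fields directly. */
            printf("worker is leader (raw check):    %d\n",
                   worker.pid == worker.tgid);

            /* New style: go through a helper, so the raw field can later be
             * hidden or reinterpreted (e.g. for pid namespaces). */
            printf("worker is leader (helper check): %d\n",
                   thread_group_leader(&worker));
            printf("leader is leader (helper check): %d\n",
                   thread_group_leader(&leader));
            return 0;
        }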
     

10 Jul, 2007

1 commit


09 May, 2007

1 commit

  • There are many places in the kernel where constructions like

        foo = list_entry(head->next, struct foo_struct, list);

    are used. The code might look more descriptive and neat using the macro

        #define list_first_entry(head, type, member) \
            list_entry((head)->next, type, member)

    Here is the macro itself and examples of its usage in the generic code.
    If it turns out to be useful, I can prepare a set of patches to inject it
    into arch-specific code, drivers, networking, etc. (A standalone
    demonstration follows this entry.)

    Signed-off-by: Pavel Emelianov
    Signed-off-by: Kirill Korotaev
    Cc: Randy Dunlap
    Cc: Andi Kleen
    Cc: Zach Brown
    Cc: Davide Libenzi
    Cc: John McCutchan
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Cc: Ram Pai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
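
    A standalone demonstration of the macro introduced above (a minimal
    sketch: the tiny list implementation here is a simplified userspace
    stand-in for the kernel's <linux/list.h>, not the real thing):

        #include <stdio.h>
        #include <stddef.h>

        struct list_head {
            struct list_head *next, *prev;
        };

        #define LIST_HEAD_INIT(name) { &(name), &(name) }

        static void list_add_tail(struct list_head *new, struct list_head *head)
        {
            new->prev = head->prev;
            new->next = head;
            head->prev->next = new;
            head->prev = new;
        }

        #define list_entry(ptr, type, member) \
            ((type *)((char *)(ptr) - offsetof(type, member)))

        /* The macro proposed by the commit above. */
        #define list_first_entry(head, type, member) \
            list_entry((head)->next, type, member)

        struct foo_struct {
            int value;
            struct list_head list;
        };

        int main(void)
        {
            struct list_head head = LIST_HEAD_INIT(head);
            struct foo_struct a = { .value = 1 };
            struct foo_struct b = { .value = 2 };

            list_add_tail(&a.list, &head);
            list_add_tail(&b.list, &head);

            /* Before: foo = list_entry(head.next, struct foo_struct, list); */
            struct foo_struct *first =
                list_first_entry(&head, struct foo_struct, list);

            printf("first entry value = %d\n", first->value);  /* prints 1 */
            return 0;
        }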
     

17 Feb, 2007

1 commit

  • Use RCU to avoid the need to acquire tasklist_lock in the single-threaded
    case of clock_gettime(). It still acquires tasklist_lock when sampling the
    clock of a (potentially multithreaded) process. This change allows realtime
    applications to frequently monitor CPU consumption of individual tasks, as
    requested (and now deployed) by some off-list users. (A userspace example
    of such monitoring follows this entry.)

    This has been in Ingo Molnar's -rt patchset since late 2005 with no
    problems reported, and tests successfully on 2.6.20-rc6, so I believe that
    it is long-since ready for mainline adoption.

    [paulmck@linux.vnet.ibm.com: fix exit()/posix_cpu_clock_get() race spotted by Oleg]
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Cc: Roman Zippel
    Cc: Oleg Nesterov
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
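
    A userspace example of the kind of per-task CPU-time monitoring this
    change speeds up (a minimal sketch; clock_getcpuclockid() and
    clock_gettime() are standard POSIX calls, and on older glibc the program
    needs to be linked with -lrt):

        #define _POSIX_C_SOURCE 200112L
        #include <stdio.h>
        #include <string.h>
        #include <time.h>
        #include <unistd.h>

        int main(void)
        {
            clockid_t clk;
            struct timespec ts;
            int err;

            /* CPU-time clock of this process; pass another pid to watch
             * that task instead. */
            err = clock_getcpuclockid(getpid(), &clk);
            if (err != 0) {
                fprintf(stderr, "clock_getcpuclockid: %s\n", strerror(err));
                return 1;
            }

            /* Burn a little CPU so there is something to measure. */
            for (volatile long i = 0; i < 50000000L; i++)
                ;

            if (clock_gettime(clk, &ts) != 0) {
                perror("clock_gettime");
                return 1;
            }
            printf("CPU time consumed: %ld.%09ld s\n",
                   (long)ts.tv_sec, ts.tv_nsec);
            return 0;
        }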
     

17 Oct, 2006

1 commit

  • The integer divisions in the timer accounting code can round the result
    down to 0. Adding 0 has no effect, and signal delivery stops.

    Clamp the division result to a minimum of 1 to avoid this. (A small
    userspace sketch of the clamp follows this entry.)

    The problem was reported by Seongbae Park, who also provided an initial
    patch.

    Roland sayeth:

    I have had some more time to think about the problem, and to reproduce it
    using Toyo's test case. For the record, if my understanding of the problem
    is correct, this happens only in one very particular case. First, the
    expiry time has to be so soon that in cputime_t units (usually 1s/HZ ticks)
    it's < nthreads so the division yields zero. Second, it only affects each
    thread that is so new that its CPU time accumulation is zero so now+0 is
    still zero and ->it_*_expires winds up staying zero. For the VIRT and PROF
    clocks when cputime_t is tick granularity (or the SCHED clock on
    configurations where sched_clock's value only advances on clock ticks), this
    is not hard to arrange with new threads starting up and blocking before they
    accumulate a whole tick of CPU time. That's what happens in Toyo's test
    case.

    Note that in general it is fine for that division to round down to zero,
    and set each thread's expiry time to its "now" time. The problem only
    arises with threads whose "now" value is still zero, so that now+0 winds up
    0 and is interpreted as "not set" instead of ">= now". So it would be a
    sufficient and more precise fix to just use max(ticks, 1) inside the loop
    when setting each it_*_expires value.

    But, it does no harm to round the division up to one and always advance
    every thread's expiry time. If the thread didn't already fire timers for
    the expiry time of "now", there is no expectation that it will do so before
    the next tick anyway. So I followed Thomas's patch in lifting the max out
    of the loops.

    This patch also covers the reload cases, which are harder to write a test
    for (and I didn't try). I've tested it with Toyo's case and it fixes that.

    [toyoa@mvista.com: fix: min_t -> max_t]
    Signed-off-by: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Roland McGrath
    Cc: Daniel Walker
    Cc: Toyo Abe
    Cc: john stultz
    Cc: Roman Zippel
    Cc: Seongbae Park
    Cc: Peter Mattis
    Cc: Rohit Seth
    Cc: Martin Bligh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
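
    A small userspace sketch of the arithmetic being fixed (the type, names
    and numbers below are made up for illustration; only the "divide, then
    clamp to at least 1" shape mirrors the patch):

        #include <stdio.h>

        /* cputime_t stand-in: one unit per scheduler tick. */
        typedef unsigned long cputime_t;

        static cputime_t per_thread_share(cputime_t expires, unsigned int nthreads)
        {
            cputime_t ticks = expires / nthreads;

            /* The fix: never hand a thread an increment of 0, otherwise a
             * brand-new thread keeps expiry == 0, which means "not set". */
            if (ticks == 0)
                ticks = 1;
            return ticks;
        }

        int main(void)
        {
            cputime_t expires = 3;          /* expiry only 3 ticks away ... */
            unsigned int nthreads = 8;      /* ... split across 8 threads   */

            printf("naive share:   %lu\n", expires / nthreads);   /* 0 */
            printf("clamped share: %lu\n", per_thread_share(expires, nthreads));
            return 0;
        }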
     

30 Sep, 2006

2 commits

  • When a posix_cpu_nsleep() sleep is interrupted by a signal more than twice,
    it incorrectly reports the sleep time remaining to the user, because
    posix_cpu_nsleep() doesn't report back to the user when it's called from
    the restart function, due to the wrong flags handling.

    This patch, which applies after the previous one, moves the nanosleep
    functionality from posix_cpu_nsleep() to do_cpu_nanosleep() and cleans up
    the flags handling appropriately.

    Signed-off-by: Toyo Abe
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toyo Abe
     
  • The clock_nanosleep() function does not return the time remaining when the
    sleep is interrupted by a signal.

    This patch creates a new call out, compat_clock_nanosleep_restart(), which
    handles returning the remaining time after a sleep is interrupted. The
    patch also revives clock_nanosleep_restart(), which is now accessed via the
    new call out; compat_clock_nanosleep_restart() is used for compatibility
    access. (A userspace example of the remaining-time behavior follows this
    entry.)

    Since this is implemented in compatibility mode, the normal path is
    virtually unaffected - no real performance impact.

    Signed-off-by: Toyo Abe
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toyo Abe
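
    A userspace illustration of the remaining-time contract involved here (a
    minimal sketch; it uses CLOCK_MONOTONIC simply to stay self-contained,
    while the fix itself concerns the CPU-time clocks handled by
    posix_cpu_nsleep()):

        #define _POSIX_C_SOURCE 200112L
        #include <errno.h>
        #include <signal.h>
        #include <stdio.h>
        #include <string.h>
        #include <time.h>
        #include <unistd.h>

        static void on_alarm(int sig)
        {
            (void)sig;          /* nothing to do: just interrupt the sleep */
        }

        int main(void)
        {
            struct sigaction sa;
            struct timespec req = { 5, 0 };     /* ask for a 5 second sleep */
            struct timespec rem = { 0, 0 };
            int err;

            memset(&sa, 0, sizeof(sa));
            sa.sa_handler = on_alarm;           /* no SA_RESTART: let EINTR happen */
            sigaction(SIGALRM, &sa, NULL);

            alarm(1);                           /* interrupt after ~1 second */

            err = clock_nanosleep(CLOCK_MONOTONIC, 0, &req, &rem);
            if (err == EINTR)
                printf("interrupted, remaining: %ld.%09ld s\n",
                       (long)rem.tv_sec, rem.tv_nsec);
            else
                printf("clock_nanosleep returned %d\n", err);
            return 0;
        }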
     

18 Jun, 2006

3 commits

  • arm_timer() checks PF_EXITING to prevent BUG_ON(->exit_state)
    in run_posix_cpu_timers().

    However, for some reason it does so only for the CPUCLOCK_PERTHREAD
    case (which is imho wrong).

    Also, this check is not reliable: PF_EXITING could be set on
    another cpu, without any locks/barriers, just after the check,
    so it can't prevent the timer from being attached to the exiting
    task.

    The previous patch makes this check unneeded.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • do_exit() clears ->it_##clock##_expires, but nothing prevents
    another cpu from attaching the timer to the exiting process after that.
    arm_timer() tries to protect against this race, but the check
    is racy.

    After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and
    before do_exit() calls 'schedule()', a local timer interrupt can find
    tsk->exit_state != 0. If that state is EXIT_DEAD (or another cpu
    does sys_wait4), the interrupted task has ->signal == NULL.

    At this moment the exiting task has no pending cpu timers; they were
    cleaned up in __exit_signal()->posix_cpu_timers_exit{,_group}(),
    so we can just return from the irq.

    John Stultz recently confirmed this bug, see

    http://marc.theaimsgroup.com/?l=linux-kernel&m=115015841413687

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • If the local timer interrupt happens just after do_exit() sets PF_EXITING
    (and before it clears ->it_xxx_expires), run_posix_cpu_timers() will call
    check_process_timers() with tasklist_lock + ->siglock held, and

        check_process_timers:

            t = tsk;
            do {
                ....
                do {
                    t = next_thread(t);
                } while (unlikely(t->flags & PF_EXITING));
            } while (t != tsk);

    the outer loop will never stop.

    Actually, the window is bigger. Another process can attach the timer
    after ->it_xxx_expires was cleared (see the next commit) and the 'if
    (PF_EXITING)' check in arm_timer() is racy (see the one after that).

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

11 Jan, 2006

2 commits


07 Jan, 2006

1 commit

  • I've spent the past 3 days digging into a glibc testsuite failure in
    current CVS, specifically libc/rt/tst-cputimer1.c. The thr1 and thr2
    timers fire too early in the second pass of this test. The second
    pass is noteworthy because it makes use of intervals, whereas the
    first pass does not.

    All throughout the posix-cpu-timers.c code, the calculation of the
    process sched_time sum is implemented roughly as:

        unsigned long long sum;

        sum = tsk->signal->sched_time;
        t = tsk;
        do {
            sum += t->sched_time;
            t = next_thread(t);
        } while (t != tsk);

    In fact this is the exact scheme used by check_process_timers().

    In the case of check_process_timers(), current->sched_time has just
    been updated (via scheduler_tick(), which is invoked by
    update_process_times(), which subsequently invokes
    run_posix_cpu_timers()), so there is no special processing necessary
    wrt. that.

    In other contexts, we have to allow for the fact that tsk->sched_time
    might be a bit out of date if we are current. And the
    posix-cpu-timers.c code uses current_sched_time() to deal with that.

    Unfortunately it does so in an erroneous and inconsistent manner in
    one spot which is what results in the early timer firing.

    In cpu_clock_sample_group_locked(), it does this:

        cpu->sched = p->signal->sched_time;
        /* Add in each other live thread. */
        while ((t = next_thread(t)) != p) {
            cpu->sched += t->sched_time;
        }
        if (p->tgid == current->tgid) {
            /*
             * We're sampling ourselves, so include the
             * cycles not yet banked. We still omit
             * other threads running on other CPUs,
             * so the total can always be behind as
             * much as max(nthreads-1,ncpus) * (NSEC_PER_SEC/HZ).
             */
            cpu->sched += current_sched_time(current);
        } else {
            cpu->sched += p->sched_time;
        }

    The problem is the "p->tgid == current->tgid" test. If "p" is
    not current, and the tgids are the same, we will add current's
    sched_time twice into cpu->sched and omit "p"'s sched_time,
    which is very very very wrong.

    posix-cpu-timers.c has a helper function, sched_ns(p) which takes care
    of this, so my fix is to use that here instead of this special tgid
    test.

    The fact that current can be one of the sub-threads of "p" points out
    that we could make things a little bit more accurate, perhaps by using
    sched_ns() on every thread we process in these loops. It also points
    out that we don't use the most accurate value for threads in the group
    actively running on other cpus (and this is mentioned in the comment).

    But that is a future enhancement, and this fix here definitely makes
    sense. (A toy model of the sched_ns()-style sampling follows this entry.)

    Signed-off-by: David S. Miller
    Signed-off-by: Linus Torvalds

    David S. Miller
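
    A toy model of the sched_ns()-style sampling described above (purely
    illustrative userspace code; the structures and numbers are made up, and
    applying sched_ns() to every thread in the loop is the "more accurate"
    variant the commit mentions as a future enhancement):

        #include <stdio.h>

        struct task {
            unsigned long long sched_time;      /* time already banked  */
            struct task *next_thread;           /* circular thread list */
        };

        static struct task *current_task;       /* stand-in for "current" */

        /* Not-yet-banked CPU time of the running task (a fixed amount here). */
        static unsigned long long current_sched_time(const struct task *t)
        {
            return t->sched_time + 1234;
        }

        /* The helper the fix relies on: freshest value for current, banked
         * value for everyone else - no tgid special case. */
        static unsigned long long sched_ns(const struct task *t)
        {
            return t == current_task ? current_sched_time(t) : t->sched_time;
        }

        int main(void)
        {
            struct task leader = { .sched_time = 1000 };
            struct task worker = { .sched_time = 500 };
            unsigned long long sum = 0;
            struct task *t = &leader;

            leader.next_thread = &worker;
            worker.next_thread = &leader;
            current_task = &worker;     /* we are a sub-thread of "leader" */

            do {
                sum += sched_ns(t);
                t = t->next_thread;
            } while (t != &leader);

            /* 1000 (leader, banked) + 500 + 1234 (worker, incl. unbanked) */
            printf("group CPU time sample: %llu\n", sum);
            return 0;
        }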
     

29 Nov, 2005

1 commit


07 Nov, 2005

1 commit


31 Oct, 2005

1 commit


28 Oct, 2005

2 commits


27 Oct, 2005

2 commits


24 Oct, 2005

5 commits

  • This might be harmless, but looks like a race from code inspection (I
    was unable to trigger it). I must admit, I don't understand why we
    can't return TIMER_RETRY after 'spin_unlock(&p->sighand->siglock)'
    without doing bump_cpu_timer(), but this is what the original code does.

        posix_cpu_timer_set:

            read_lock(&tasklist_lock);

            spin_lock(&p->sighand->siglock);
            list_del_init(&timer->it.cpu.entry);
            spin_unlock(&p->sighand->siglock);

    We are probably deleting the timer from run_posix_cpu_timers's 'firing'
    local list_head while run_posix_cpu_timers() does list_for_each_safe.

    Various bad things can happen; for example, we can just delete this timer
    so that list_for_each() will not notice it and run_posix_cpu_timers()
    will not reset the '->firing' flag. In that case,

        ....

        if (timer->it.cpu.firing) {
            read_unlock(&tasklist_lock);
            timer->it.cpu.firing = -1;
            return TIMER_RETRY;
        }

    sys_timer_settime() goes to 'retry:', calls posix_cpu_timer_set() again,
    it returns TIMER_RETRY ...

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • No need to rebalance when task exited

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • do_exit() clears ->it_##clock##_expires, but nothing prevents
    another cpu from attaching the timer to the exiting process after that.

    After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and
    before do_exit() calls 'schedule()', a local timer interrupt can find
    tsk->exit_state != 0. If that state is EXIT_DEAD (or another cpu
    does sys_wait4), the interrupted task has ->signal == NULL.

    At this moment the exiting task has no pending cpu timers; they were
    cleaned up in __exit_signal()->posix_cpu_timers_exit{,_group}(), so we can
    just return from the irq.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • 1. cleanup_timers() sets timer->task = NULL under tasklist + ->sighand locks.
    That means that this code in posix_cpu_timer_del() and posix_cpu_timer_set()

        lock_timer(timer);
        if (timer->task == NULL)
            return;
        read_lock(tasklist);
        put_task_struct(timer->task);

    is racy. With this patch, timer->task is modified and accessed only under
    timer->it_lock. Sadly, this means that a dead task_struct won't be freed
    until the timer is deleted or armed.

    2. run_posix_cpu_timers() collects expired timers into a local list under
    tasklist + ->sighand again. That means that posix_cpu_timer_del()
    should check timer->it.cpu.firing under these locks too.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Bursty timers aren't good for anybody, very much including latency for
    other programs when we trigger lots of timers in interrupt context. So
    set a random limit, after which we'll handle the rest on the next timer
    tick. (A toy sketch of such a per-tick cap follows this entry.)

    Noted by Oleg Nesterov

    Signed-off-by: Linus Torvalds

    Linus Torvalds
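
    A toy sketch of the idea (illustrative userspace code with made-up names
    and numbers; the point is only the shape: fire at most a fixed number of
    expired timers per tick and defer the rest):

        #include <stdio.h>

        #define MAX_PER_TICK 3      /* arbitrary cap, like the "random limit" above */

        int main(void)
        {
            int pending = 10;       /* pretend 10 timers expired at once */
            int tick = 0;

            while (pending > 0) {
                int fire = pending < MAX_PER_TICK ? pending : MAX_PER_TICK;

                pending -= fire;
                printf("tick %d: fired %d timer(s), %d deferred\n",
                       ++tick, fire, pending);
            }
            return 0;
        }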
     

22 Oct, 2005

1 commit


20 Oct, 2005

1 commit

  • Oleg Nesterov reported an SMP deadlock. If there is a running timer
    tracking a different process's CPU time clock when the process owning
    the timer exits, we deadlock on tasklist_lock in posix_cpu_timer_del via
    exit_itimers.

    That code was using tasklist_lock to check for a race with __exit_signal
    being called on the timer-target task and clearing its ->signal.
    However, there is actually no such race. __exit_signal will have called
    posix_cpu_timers_exit and posix_cpu_timers_exit_group before it does
    that. Those will clear those k_itimer's association with the dying
    task, so posix_cpu_timer_del will return early and never reach the code
    in question.

    In addition, posix_cpu_timer_del called from exit_itimers during execve
    or directly from timer_delete in the process owning the timer can race
    with an exiting timer-target task to cause a double put on the timer-target
    task struct. Make sure we always access cpu_timers lists with the sighand
    lock held.

    Signed-off-by: Roland McGrath
    Signed-off-by: Chris Wright
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

18 Oct, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds