22 Oct, 2008

1 commit


20 Oct, 2008

1 commit


18 Oct, 2008

1 commit


03 Oct, 2008

1 commit


24 Sep, 2008

9 commits

  • Cleanup. IMHO this makes the code much more understandable. At the very
    least, this patch reduces the size of both the source and the compiled code.

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • lock_timer() checks that the timer found by idr_find(timer_id) has ->it_id
    == timer_id. This buys nothing. This check can fail only if
    sys_timer_create() unlocked idr_lock after idr_get_new(), but didn't set
    ->it_id = new_timer_id yet. But in that case ->it_process == NULL so
    lock_timer() can't succeed anyway.

    Also remove a couple of unneeded typecasts.

    Note that with or without this patch we have a small problem.
    sys_timer_create() doesn't ensure that the result of setting (say)
    ->it_sigev_notify is visible if lock_timer() succeeds.

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • With the recent changes ->it_sigev_signo and ->it_sigev_value are only
    used in sys_timer_create(), kill them.

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • Cleanup.

    - sys_timer_create() is big and complicated. The code above the "out:"
    label relies on "error" being 0. This is not very robust; make the code
    more explicit. Remove the unneeded initialization of error.

    - If idr_get_new() succeeds (as it normally should), we check the returned
    value twice. Move the "-EAGAIN" check under "if (error)".

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • posix_timer_event() always populates timer->sigq with the same values;
    move this code into sys_timer_create().

    Note that with this patch we can kill it_sigev_signo and it_sigev_value.

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • - Change the code to do rcu_read_lock() instead of taking tasklist_lock;
    it is safe to get_task_struct(p) if p was found under RCU.

    However, we must no longer use the process's sighand/signal; they may be
    NULL. We can use current->sighand/signal instead, since this "process"
    must belong to current's thread group.

    - Factor out the common code for 2 "if (timer_event_spec)" branches, the
    !timer_event_spec case can use current too.

    - use spin_lock_irq() instead of _irqsave(), kill "flags".

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • sys_timer_create() returns -EINVAL if the target thread has PF_EXITING set.

    This doesn't really make sense; the sub-thread can die right after unlock.
    In fact, this is just wrong. Without SIGEV_THREAD_ID, good_sigevent()
    returns ->group_leader, and it is quite possible that the leader is already
    dead. This is OK; we shouldn't return an error in this case.

    Remove this check and the comment. Note that the "process" was found
    under tasklist_lock, so it must have ->sighand != NULL.

    Also, remove a couple of unneeded initializations.

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • Change the code to get/put timer->it_process regardless of
    SIGEV_THREAD_ID. This streamlines the create/destroy paths and allows us
    to simplify the usage of exit_itimers() in de_thread().

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • posix_timer_event() drops SIGEV_THREAD_ID and switches to ->group_leader
    if send_sigqueue() fails.

    This is not very useful and doesn't work reliably. send_sigqueue() can
    only fail if ->it_process is dead. But it can die before it dequeues the
    SI_TIMER signal; in that case the timer stops anyway.

    Remove this code. I guess it was needed long ago to ensure that the
    timer is not destroyed when its creator thread dies.

    Q: perhaps it makes sense to change sys_timer_settime() to return an error
    if ->it_process is dead?

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     

06 Sep, 2008

1 commit


21 Aug, 2008

1 commit

  • While discussing his work on clock synchronization (see btime.sf.net),
    Josip Loncaric mentioned that for really close synchronization, it is
    useful to have access to "hardware time", that is, a notion of time that
    is not in any way adjusted by the clock slewing done to keep close time sync.

    Part of the issue is that if we use the kernel's NTP-adjusted
    representation of time to measure how we should correct time, we can run
    into what Paul McKenney aptly described as "Painting a road using the
    lines we're painting as the guide".

    I had been thinking of a similar problem, and was trying to come up with a
    way to give users access to a purely hardware-based time representation
    that avoided users having to know the underlying frequency and mask values
    needed to deal with the wide variety of possible underlying hardware
    counters.

    My solution is to introduce CLOCK_MONOTONIC_RAW. This exposes a
    nanosecond-based time value that increments starting at bootup and has no
    frequency adjustments made to it whatsoever.

    The time is accessed from userspace via the clock_gettime() syscall,
    passing CLOCK_MONOTONIC_RAW as the clock_id.

    Signed-off-by: John Stultz
    Signed-off-by: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    John Stultz
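    The CLOCK_MONOTONIC_RAW commit above says the clock is read from userspace
    by passing CLOCK_MONOTONIC_RAW as the clock_id. A minimal C sketch of that,
    assuming a Linux libc whose <time.h> defines CLOCK_MONOTONIC_RAW (the
    #ifdef fallback simply reports failure otherwise); the helper name
    read_raw_ns() is hypothetical:

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

/* Read CLOCK_MONOTONIC_RAW as a single nanosecond count.
 * Returns 0 on success, -1 if the clock is unavailable
 * (not defined by the headers, or rejected by the kernel). */
static int read_raw_ns(int64_t *out)
{
#ifdef CLOCK_MONOTONIC_RAW
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts) != 0)
        return -1;
    *out = (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    return 0;
#else
    (void)out;
    return -1;
#endif
}
```

    Because this clock is never slewed, the difference between two readings
    reflects only the underlying hardware counter, scaled to nanoseconds.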
     

12 Aug, 2008

1 commit


26 Jul, 2008

2 commits

  • release_posix_timer() must not be called with ->it_process != NULL. Once
    sys_timer_create() sets ->it_process it must not call
    release_posix_timer(); otherwise we can race with another thread doing
    sys_timer_delete(), because this timer is already visible to idr_find()
    and unlocked.

    The same is true for two other callers (actually, for any possible
    caller), sys_timer_delete() and itimer_delete(). They must clear
    ->it_process before unlock_timer() + release_posix_timer().

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc: john stultz
    Cc: Thomas Gleixner
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • sys_timer_delete() and itimer_delete() check "timer->it_process != NULL",
    this looks completely bogus. ->it_process == NULL means that this timer
    is already under destruction or it is not fully initialized, this must not
    happen.

    sys_timer_delete: the timer is locked, and lock_timer() can't succeed
    if ->it_process == NULL.

    itimer_delete: it is called by exit_itimers() when there are no other
    threads which can play with signal_struct->posix_timers.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc: john stultz
    Cc: Thomas Gleixner
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

24 Jul, 2008

2 commits

  • The bug was reported and analysed by Mark McLoughlin;
    the patch is based on his and Roland's suggestions.

    posix_timer_event() always rewrites the pre-allocated siginfo before sending
    the signal. Most of the written info is the same all the time, but memset(0)
    is very wrong. If ->sigq is queued we can race with collect_signal() which
    can fail to find this siginfo looking at .si_signo, or copy_siginfo() can
    copy the wrong .si_code/si_tid/etc.

    In short, sys_timer_settime() can in fact stop the active timer, or the user
    can receive the siginfo with the wrong .si_xxx values.

    Move "memset(->info, 0)" from posix_timer_event() to alloc_posix_timer(),
    change send_sigqueue() to set .si_overrun = 0 when ->sigq is not queued.
    It would be nice to move the whole sigq->info initialization from send to
    create path, but this is not easy to do without uglifying timer_create()
    further.

    As Roland rightly pointed out, we need more cleanups/fixes here, see the
    "FIXME" comment in the patch. Hopefully this patch makes sense anyway, and
    at least it masks the worst implications.

    Reported-by: Mark McLoughlin
    Signed-off-by: Oleg Nesterov
    Cc: Mark McLoughlin
    Cc: Oliver Pinter
    Cc: Roland McGrath
    Cc: stable@kernel.org
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    kernel/posix-timers.c | 17 +++++++++++++----
    kernel/signal.c | 1 +
    2 files changed, 14 insertions(+), 4 deletions(-)

    Oleg Nesterov
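    The siginfo fix above (zero the siginfo once at allocation; on send, only
    reset .si_overrun when the sigqueue is not already queued) can be modeled
    in plain C. This is a simplified, single-threaded sketch, not the kernel's
    code: struct timer_sigq, sigq_init(), sigq_send() and sigq_dequeue() are
    hypothetical, and the "queued" flag stands in for the sigqueue being on a
    pending list:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, trimmed-down siginfo. */
struct timer_info {
    int si_signo;
    int si_code;
    int si_overrun;
};

struct timer_sigq {
    bool queued;              /* stands in for "already on a pending list" */
    struct timer_info info;
};

/* Create path: zero everything exactly once, then fill the fields
 * that never change for the lifetime of the timer. */
static void sigq_init(struct timer_sigq *q, int signo, int code)
{
    memset(q, 0, sizeof *q);
    q->info.si_signo = signo;
    q->info.si_code = code;
}

/* Send path: never memset here. If the previous signal is still queued,
 * only bump the overrun count; otherwise reset it and queue. */
static bool sigq_send(struct timer_sigq *q)
{
    if (q->queued) {
        q->info.si_overrun++;
        return false;         /* not queued a second time */
    }
    q->info.si_overrun = 0;
    q->queued = true;
    return true;
}

/* Receiver side: copy the siginfo out and mark it dequeued. */
static struct timer_info sigq_dequeue(struct timer_sigq *q)
{
    q->queued = false;
    return q->info;
}
```

    The fields a receiver relies on to identify the signal (si_signo,
    si_code) are written once on the create path and never rewritten while
    the siginfo may be queued.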
     
  • do_schedule_next_timer() sets info->si_overrun = timr->it_overrun_last,
    this discards the already accumulated overruns.

    Signed-off-by: Oleg Nesterov
    Cc: Mark McLoughlin
    Cc: Oliver Pinter
    Cc: Roland McGrath
    Cc: stable@kernel.org
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
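    The overrun fix above comes down to assignment versus accumulation. A
    minimal sketch of the difference, assuming the fix amounts to adding
    ->it_overrun_last to si_overrun instead of overwriting it; the structs and
    both function names are hypothetical simplifications:

```c
/* Hypothetical stand-ins for the timer and its pending siginfo. */
struct timer_state {
    int it_overrun_last;  /* overruns recorded at the last rearm */
};

struct sig_info {
    int si_overrun;       /* value eventually reported to user space */
};

/* Before the fix: '=' throws away overruns already accumulated in info. */
static void rearm_buggy(const struct timer_state *t, struct sig_info *info)
{
    info->si_overrun = t->it_overrun_last;
}

/* After the fix: add the newly recorded overruns to what is already there. */
static void rearm_fixed(const struct timer_state *t, struct sig_info *info)
{
    info->si_overrun += t->it_overrun_last;
}
```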
     

30 Apr, 2008

1 commit

  • We export send_sigqueue() and send_group_sigqueue() for their only user,
    posix_timer_event(). This is a bit silly, because both are just trivial
    helpers on top of do_send_sigqueue() and because we pass the unused
    .si_signo parameter.

    Kill them both, rename do_send_sigqueue() to send_sigqueue(), and export it.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

19 Apr, 2008

1 commit


15 Feb, 2008

1 commit

  • Various user space callers ask for relative timeouts. While we fixed
    that overflow issue in hrtimer_start(), the sites which convert
    relative user space values to absolute timeouts themselves were left
    unprotected.

    Instead of putting overflow checks into each place add a function
    which does the sanity checking and convert all affected callers to use
    it.

    Thanks to Frans Pop, who reported the problem and tested the fixes.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Tested-by: Frans Pop

    Thomas Gleixner
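    The commit above replaces per-caller overflow checks with one sanitizing
    helper. A userspace sketch of the idea, assuming times are signed 64-bit
    nanosecond counts and that the caller has already rejected negative
    values; add_timeout_safe() and TIMEOUT_MAX_NS are hypothetical names, and
    the kernel may clamp to a different limit:

```c
#include <stdint.h>

/* Hypothetical cap: the largest representable absolute timeout. */
#define TIMEOUT_MAX_NS INT64_MAX

/* Convert a relative timeout to an absolute one without overflowing.
 * Assumes now_ns >= 0 and rel_ns >= 0. The check runs before the
 * addition, so signed overflow (undefined in C) never happens. */
static int64_t add_timeout_safe(int64_t now_ns, int64_t rel_ns)
{
    if (rel_ns > TIMEOUT_MAX_NS - now_ns)
        return TIMEOUT_MAX_NS;   /* saturate instead of wrapping negative */
    return now_ns + rel_ns;
}
```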
     

10 Feb, 2008

1 commit

  • Spotted by Pavel Emelyanov and Alexey Dobriyan.

    hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to
    the local variable which lives in the caller's stack frame. This means that
    if sys_restart_syscall() actually happens and it is interrupted as well, we
    don't update the user-space variable, but write into the already dead stack
    frame.

    Introduced by commit 04c227140fed77587432667a574b14736a06dd7f
    hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier

    Change the callers to pass "__user *rmtp" to hrtimer_nanosleep(), and change
    hrtimer_nanosleep() to use copy_to_user() to actually update *rmtp.

    A small problem remains. man 2 nanosleep states that *rmtp should be written
    if nanosleep() was interrupted (it says nothing about whether it is OK to
    update *rmtp if nanosleep returns 0), but (with or without this patch) we
    can dirty *rmtp even if nanosleep() returns 0.

    NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other
    bugs. Fixed by the next patch.

    Signed-off-by: Oleg Nesterov
    Cc: Alexey Dobriyan
    Cc: Michael Kerrisk
    Cc: Pavel Emelyanov
    Cc: Peter Zijlstra
    Cc: Toyo Abe
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    include/linux/hrtimer.h | 2 -
    kernel/hrtimer.c | 51 +++++++++++++++++++++++++-----------------------
    kernel/posix-timers.c | 14 +------------
    3 files changed, 30 insertions(+), 37 deletions(-)

    Oleg Nesterov
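    From the userspace side, the remaining-time contract discussed in the
    nanosleep commit looks like this. A minimal sketch, assuming POSIX
    nanosleep() semantics: the remaining time is only consulted after a call
    that failed with EINTR, which sidesteps the question of whether *rmtp may
    be dirtied on success. sleep_fully() is a hypothetical helper:

```c
#include <errno.h>
#include <time.h>

/* Sleep for the full duration of req, resuming after signal interruptions.
 * On EINTR, nanosleep() stores the unslept time in rem, and we sleep again
 * for that remainder. Returns 0 on success, -1 on any other error. */
static int sleep_fully(struct timespec req)
{
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;   /* rem is only meaningful after an interrupted call */
    }
    return 0;
}
```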
     

09 Feb, 2008

1 commit

  • All the functions in posix timers that need to look up a task by pid obtain
    this pid from user space, and thus this value refers to a task in the same
    namespace as the current task.

    So the proper behavior is to call find_task_by_vpid() here.

    Signed-off-by: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

06 Feb, 2008

1 commit

  • This is the new timerfd API as it is implemented by the following patch:

    int timerfd_create(int clockid, int flags);
    int timerfd_settime(int ufd, int flags,
                        const struct itimerspec *utmr,
                        struct itimerspec *otmr);
    int timerfd_gettime(int ufd, struct itimerspec *otmr);

    The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
    parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.

    The timerfd_settime() API sets new timer settings on the timerfd fd,
    optionally retrieving the previous expiration time (if the "otmr"
    parameter is not NULL).

    The time value specified in "utmr" is absolute if the TFD_TIMER_ABSTIME bit
    is set in the "flags" parameter. Otherwise it's a relative time.

    The timerfd_gettime() API returns the next expiration time of the timer, or
    {0, 0} if the timerfd has not been set yet.

    Like the previous timerfd API implementation, read(2) and poll(2) are
    supported (with the same interface). Here's a simple test program I used to
    exercise the new timerfd APIs:

    http://www.xmailserver.org/timerfd-test2.c

    [akpm@linux-foundation.org: coding-style cleanups]
    [akpm@linux-foundation.org: fix ia64 build]
    [akpm@linux-foundation.org: fix m68k build]
    [akpm@linux-foundation.org: fix mips build]
    [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
    [heiko.carstens@de.ibm.com: fix s390]
    [akpm@linux-foundation.org: fix powerpc build]
    [akpm@linux-foundation.org: fix sparc64 more]
    Signed-off-by: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Thomas Gleixner
    Cc: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Cc: Michael Kerrisk
    Cc: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

03 Feb, 2008

1 commit


20 Oct, 2007

1 commit

  • With pid namespaces this field is now dangerous to use explicitly, so hide
    it behind the helpers.

    Also, the pid and pgrp fields of task_struct and signal_struct are to be
    deprecated. Unfortunately this patch cannot be sent right now, as it
    leads to tons of warnings, so start isolating them and deprecate them later.

    Actually the p->tgid == pid check has to be changed to has_group_leader_pid(),
    but Oleg pointed out that in the case of posix cpu timers this is the same,
    and thread_group_leader() is preferable.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

19 Oct, 2007

1 commit


17 Oct, 2007

1 commit


15 Oct, 2007

1 commit


23 Aug, 2007

2 commits

  • sys_timer_create() sets ->it_process and unlocks ->siglock, then checks
    tmr->it_sigev_notify to decide whether get_task_struct() is needed.

    We have already passed ->it_id to the caller, so another thread can delete
    this timer and free its memory in between.

    As a minimal fix, move this code under ->siglock; sys_timer_delete() takes
    it too before calling release_posix_timer(). Proper serialization would
    take ->it_lock instead: as things stand, we add a partly initialized timer
    to posix_timers_id, which is not good.

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • timer_delete does:
    lock_timer();
    timer->it_process = NULL;
    unlock_timer();
    release_posix_timer();

    timer->it_process is checked in lock_timer() to prevent access to a
    timer which is about to be deleted, but the check happens after
    idr_lock is dropped. This allows release_posix_timer() to delete the
    timer before the lock code can check it:

    CPU 0                              CPU 1

    lock_timer();
    timer->it_process = NULL;
    unlock_timer();
                                       lock_timer()
                                         spin_lock(idr_lock);
                                         timer = idr_find();
                                         spin_lock(timer->lock);
                                         spin_unlock(idr_lock);
    release_posix_timer();
      spin_lock(idr_lock);
      idr_remove(timer);
      spin_unlock(idr_lock);
      free_timer(timer);
                                         if (timer->......)

    Change the locking to prevent this.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
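    The race diagram above shows the window but not the fix. One way to close
    it, assumed here rather than taken from the patch, is for the lookup to
    take the per-timer lock and check the "being deleted" marker while the
    table lock is still held, and for the deleter to clear that marker before
    removing and freeing the timer under the table lock. A pthreads sketch
    with hypothetical names (struct sim_timer, table_lock, lock_timer_sim(),
    delete_timer_sim()); owner plays the role of ->it_process:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define NTIMERS 16

/* Hypothetical stand-in for a posix timer. */
struct sim_timer {
    pthread_mutex_t lock;   /* role of ->it_lock */
    void *owner;            /* role of ->it_process; NULL means dying */
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER; /* idr_lock */
static struct sim_timer *table[NTIMERS];

/* Publish a fully initialized timer under the table lock. */
static int create_timer_sim(int id, void *owner)
{
    struct sim_timer *t = malloc(sizeof *t);
    if (!t)
        return -1;
    pthread_mutex_init(&t->lock, NULL);
    t->owner = owner;

    pthread_mutex_lock(&table_lock);
    if (id < 0 || id >= NTIMERS || table[id]) {
        pthread_mutex_unlock(&table_lock);
        pthread_mutex_destroy(&t->lock);
        free(t);
        return -1;
    }
    table[id] = t;
    pthread_mutex_unlock(&table_lock);
    return 0;
}

/* Lookup: lock the timer and validate owner before dropping the table
 * lock, so a deleter cannot remove and free it in between.
 * Returns the timer locked, or NULL. */
static struct sim_timer *lock_timer_sim(int id)
{
    struct sim_timer *t = NULL;

    pthread_mutex_lock(&table_lock);
    if (id >= 0 && id < NTIMERS && table[id]) {
        t = table[id];
        pthread_mutex_lock(&t->lock);
        if (t->owner == NULL) {          /* already being deleted */
            pthread_mutex_unlock(&t->lock);
            t = NULL;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return t;
}

/* Delete: mark dead under the timer lock, then unpublish under the
 * table lock; only then is it safe to free. */
static void delete_timer_sim(int id)
{
    struct sim_timer *t = lock_timer_sim(id);
    if (!t)
        return;
    t->owner = NULL;
    pthread_mutex_unlock(&t->lock);

    pthread_mutex_lock(&table_lock);
    table[id] = NULL;
    pthread_mutex_unlock(&table_lock);

    pthread_mutex_destroy(&t->lock);
    free(t);
}
```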
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

22 Jun, 2007

1 commit

  • posix-timers which deliver an ignored signal are currently rearmed in
    the timer softirq. This is necessary because the timer needs to be
    delivered again when SIG_IGN is removed. This is not a problem when
    the interval is reasonable.

    With high resolution timers enabled one might arm a posix timer with a
    very small interval and ignore the signal. This might lead to softirq
    starvation when the interval is so small that the timer is
    requeued onto the softirq pending list right away.

    This problem was pointed out by Jan Kiszka. Thanks, Jan!

    The correct solution would be to stop the timer when the signal is
    ignored and rearm it when SIG_IGN is removed. Unfortunately this
    requires modifications to sigaction and involves non-trivial sighand
    locking. It's too late in the release cycle for such a change.

    For now we just keep the timer running and enforce that the timer fires
    at most once per jiffy. This does not break anything, as we keep the
    overrun counter correct. It adds a little inaccuracy to the
    timer_gettime() interface, but...

    The more complex change is necessary anyway to fix another shortcoming
    of the current implementation, which I discovered while looking
    at this problem: a pending signal is discarded when SIG_IGN is set. If
    a posix timer signal is pending, it is discarded as well, but when
    SIG_IGN is removed later nothing rearms the timer. This is not new; it
    has been that way since posix timers were merged. So there is nothing
    to worry about right now.

    I have a working solution to fix all of this, but the impact is too
    large for both stable and 2.6.22. I'm going to send it out for review
    in the next few days.

    This should go into 2.6.21.stable as well.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Cc: Jan Kiszka
    Cc: Ulrich Drepper
    Cc: Stable Team
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
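    The interim workaround above (keep the timer running, but let it fire at
    most about once per jiffy while keeping the overrun count correct) can be
    sketched with plain integer nanoseconds. This is an illustration under
    assumptions, not the patch: forward_timer() is loosely modeled on the
    semantics of hrtimer_forward(), and rearm_ignored() and the jiffy length
    passed to it are hypothetical:

```c
#include <stdint.h>

/* Advance *expires by whole multiples of interval until it lies after now;
 * return how many intervals were skipped (the overrun count).
 * interval must be > 0. */
static uint64_t forward_timer(int64_t *expires, int64_t now, int64_t interval)
{
    if (now < *expires)
        return 0;
    uint64_t n = (uint64_t)((now - *expires) / interval) + 1;
    *expires += (int64_t)n * interval;
    return n;
}

/* Rearm a timer whose signal is ignored: when the interval is shorter
 * than a jiffy, forward against now + one jiffy, so the next expiry is at
 * least a jiffy away, while every interval skipped on the way is still
 * counted as an overrun. */
static uint64_t rearm_ignored(int64_t *expires, int64_t now,
                              int64_t interval, int64_t jiffy_ns)
{
    int64_t horizon = now;

    if (interval < jiffy_ns)
        horizon += jiffy_ns;
    return forward_timer(expires, horizon, interval);
}
```

    With a 10 ns interval and a 1000 ns jiffy, one rearm from time 0 skips
    101 intervals and reports them as overruns, instead of re-queuing the
    timer for every 10 ns interval.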
     

09 May, 2007

1 commit


17 Feb, 2007

2 commits

  • Implement high resolution timers on top of the hrtimers infrastructure and the
    clockevents / tick-management framework. This provides accurate timers for
    all hrtimer subsystem users.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • - hrtimers did not use the hrtimer_restart enum and relied on the implicit
    int representation. Fix the prototypes and the functions using the enums.
    - Use separate namespaces for the enumerations
    - Convert hrtimer_restart macro to inline function
    - Add comments

    No functional changes.

    [akpm@osdl.org: fix input driver]
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Cc: Dmitry Torokhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
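    The enum cleanup above makes hrtimer callbacks return enum hrtimer_restart
    rather than a bare int. A self-contained simulation of how such a typed
    return value drives rearming; only the two enumerator names come from the
    kernel, while struct sim_hrtimer, three_shots() and run_expired() are
    hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

enum hrtimer_restart {
    HRTIMER_NORESTART,   /* timer is not restarted */
    HRTIMER_RESTART,     /* timer must be restarted */
};

struct sim_hrtimer {
    int64_t expires;
    int64_t period;
    int fired;
    enum hrtimer_restart (*function)(struct sim_hrtimer *);
};

/* A periodic callback that stops itself after three expiries. */
static enum hrtimer_restart three_shots(struct sim_hrtimer *t)
{
    if (++t->fired < 3) {
        t->expires += t->period;   /* callback moves its own expiry */
        return HRTIMER_RESTART;
    }
    return HRTIMER_NORESTART;
}

/* Run every expiry up to now; the typed return value decides rearming.
 * Returns the number of callback invocations. */
static int run_expired(struct sim_hrtimer *t, int64_t now)
{
    int runs = 0;

    while (t->function && t->expires <= now) {
        runs++;
        if (t->function(t) == HRTIMER_NORESTART) {
            t->function = NULL;    /* inactive: never run again */
            break;
        }
    }
    return runs;
}
```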
     

12 Feb, 2007

1 commit

  • Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
    corresponding "kmem_cache_zalloc()" call.

    Signed-off-by: Robert P. J. Day
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Cc: Roland McGrath
    Cc: James Bottomley
    Cc: Greg KH
    Acked-by: Joel Becker
    Cc: Steven Whitehouse
    Cc: Jan Kara
    Cc: Michael Halcrow
    Cc: "David S. Miller"
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

08 Dec, 2006

1 commit

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
        quilt add $file
        sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
        mv /tmp/$$ $file
        quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter