03 Oct, 2008

1 commit


12 Aug, 2008

1 commit


26 Jul, 2008

2 commits

  • release_posix_timer() can't be called with ->it_process != NULL. Once
    sys_timer_create() sets ->it_process it must not call
    release_posix_timer(), otherwise we can race with another thread doing
    sys_timer_delete(): the timer is visible to idr_find() and unlocked.

    The same is true for two other callers (actually, for any possible
    caller), sys_timer_delete() and itimer_delete(). They must clear
    ->it_process before unlock_timer() + release_posix_timer().

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc: john stultz
    Cc: Thomas Gleixner
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • sys_timer_delete() and itimer_delete() check "timer->it_process != NULL",
    which looks completely bogus. ->it_process == NULL means that the timer
    is already under destruction or is not fully initialized; this must not
    happen.

    sys_timer_delete: the timer is locked, and lock_timer() can't succeed
    if ->it_process == NULL.

    itimer_delete: it is called by exit_itimers() when there are no other
    threads which can play with signal_struct->posix_timers.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc: john stultz
    Cc: Thomas Gleixner
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

24 Jul, 2008

2 commits

  • The bug was reported and analysed by Mark McLoughlin;
    the patch is based on his and Roland's suggestions.

    posix_timer_event() always rewrites the pre-allocated siginfo before sending
    the signal. Most of the written info is the same all the time, but memset(0)
    is very wrong. If ->sigq is queued we can race with collect_signal() which
    can fail to find this siginfo looking at .si_signo, or copy_siginfo() can
    copy the wrong .si_code/si_tid/etc.

    In short, sys_timer_settime() can in fact stop the active timer, or the user
    can receive the siginfo with the wrong .si_xxx values.

    Move "memset(->info, 0)" from posix_timer_event() to alloc_posix_timer(),
    change send_sigqueue() to set .si_overrun = 0 when ->sigq is not queued.
    It would be nice to move the whole sigq->info initialization from send to
    create path, but this is not easy to do without uglifying timer_create()
    further.

    As Roland rightly pointed out, we need more cleanups/fixes here, see the
    "FIXME" comment in the patch. Hopefully this patch makes sense anyway, and
    it can mask the worst implications.

    Reported-by: Mark McLoughlin
    Signed-off-by: Oleg Nesterov
    Cc: Mark McLoughlin
    Cc: Oliver Pinter
    Cc: Roland McGrath
    Cc: stable@kernel.org
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    kernel/posix-timers.c | 17 +++++++++++++----
    kernel/signal.c | 1 +
    2 files changed, 14 insertions(+), 4 deletions(-)

    Oleg Nesterov
     
  • do_schedule_next_timer() sets info->si_overrun = timr->it_overrun_last,
    which discards the already accumulated overruns.

    Signed-off-by: Oleg Nesterov
    Cc: Mark McLoughlin
    Cc: Oliver Pinter
    Cc: Roland McGrath
    Cc: stable@kernel.org
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
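
The difference between overwriting and accumulating the overrun count can be sketched in plain C. This is only an illustration under stated assumptions: the struct and function names below are hypothetical stand-ins, not the kernel's definitions, and the accumulate branch shows one plausible shape of the fix rather than the patch itself.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's siginfo and timer fields. */
struct sketch_siginfo { int si_overrun; };
struct sketch_timer   { int it_overrun_last; };

/*
 * Model the re-arm step. "accumulated" is the overrun count already
 * stored in the siginfo, "last" is the count from the latest period.
 * With accumulate == 0 the stored value is overwritten (the behaviour
 * the commit message describes as a bug); with accumulate != 0 the new
 * count is added on top of it.
 */
int overrun_after_rearm(int accumulated, int last, int accumulate)
{
    struct sketch_siginfo info = { .si_overrun = accumulated };
    struct sketch_timer timr = { .it_overrun_last = last };

    if (accumulate)
        info.si_overrun += timr.it_overrun_last;
    else
        info.si_overrun = timr.it_overrun_last;

    return info.si_overrun;
}
```

With three overruns already recorded and two new ones, the overwriting variant reports two while the accumulating variant reports five.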
     

30 Apr, 2008

1 commit

  • We export send_sigqueue() and send_group_sigqueue() for the only user,
    posix_timer_event(). This is a bit silly, because both are just trivial
    helpers on top of do_send_sigqueue() and because we pass the unused
    .si_signo parameter.

    Kill them both, rename do_send_sigqueue() to send_sigqueue(), and export it.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

19 Apr, 2008

1 commit


15 Feb, 2008

1 commit

  • Various user space callers ask for relative timeouts. While we fixed
    that overflow issue in hrtimer_start(), the sites which convert
    relative user space values to absolute timeouts themselves were not
    covered.

    Instead of putting overflow checks into each place add a function
    which does the sanity checking and convert all affected callers to use
    it.

    Thanks to Frans Pop, who reported the problem and tested the fixes.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Tested-by: Frans Pop

    Thomas Gleixner
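
The kind of sanity check described above can be sketched as a saturating addition. This is a user-space illustration only: the function name is hypothetical, times are modelled as signed 64-bit nanosecond counts, and the current time is assumed to be non-negative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a relative timeout to an absolute one without overflowing.
 * Assumes now_ns >= 0. If now_ns + rel_ns would exceed INT64_MAX the
 * result is clamped to INT64_MAX, i.e. "as far in the future as can be
 * represented", instead of wrapping around to a time in the past.
 */
int64_t timeout_add_safe(int64_t now_ns, int64_t rel_ns)
{
    assert(now_ns >= 0);

    if (rel_ns > INT64_MAX - now_ns)
        return INT64_MAX;

    return now_ns + rel_ns;
}
```

Checking before adding avoids relying on signed overflow, which is undefined behaviour in standard C.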
     

10 Feb, 2008

1 commit

  • Spotted by Pavel Emelyanov and Alexey Dobriyan.

    hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to
    the local variable which lives in the caller's stack frame. This means that
    if sys_restart_syscall() actually happens and it is interrupted as well, we
    don't update the user-space variable, but write into the already dead stack
    frame.

    Introduced by commit 04c227140fed77587432667a574b14736a06dd7f
    hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier

    Change the callers to pass "__user *rmtp" to hrtimer_nanosleep(), and change
    hrtimer_nanosleep() to use copy_to_user() to actually update *rmtp.

    A small problem remains. man 2 nanosleep states that *rmtp should be written
    if nanosleep() was interrupted (it says nothing about whether it is OK to
    update *rmtp if nanosleep() returns 0), but (with or without this patch) we
    can dirty *rmtp even if nanosleep() returns 0.

    NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other
    bugs. Fixed by the next patch.

    Signed-off-by: Oleg Nesterov
    Cc: Alexey Dobriyan
    Cc: Michael Kerrisk
    Cc: Pavel Emelyanov
    Cc: Peter Zijlstra
    Cc: Toyo Abe
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    include/linux/hrtimer.h | 2 -
    kernel/hrtimer.c | 51 +++++++++++++++++++++++++-----------------------
    kernel/posix-timers.c | 14 +------------
    3 files changed, 30 insertions(+), 37 deletions(-)

    Oleg Nesterov
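
From the caller's side, the remaining-time contract discussed above is usually consumed as in the following sketch. The helper name is hypothetical. Note that it reads the remaining time only after an EINTR failure, so it does not depend on what the kernel leaves in that variable when nanosleep() returns 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Sleep for the full requested duration, restarting after signals.
 * Returns 0 on success and -1 on any error other than EINTR.
 */
int sleep_fully(time_t sec, long nsec)
{
    struct timespec req = { .tv_sec = sec, .tv_nsec = nsec };
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        /* Interrupted: rem holds the unslept time, so retry with it. */
        req = rem;
    }
    return 0;
}
```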
     

09 Feb, 2008

1 commit

  • All the functions that need to look up a task by pid in posix timers obtain
    this pid from user space, and thus this value refers to a task in the same
    namespace as the one the current task lives in.

    So the proper behavior is to call find_task_by_vpid() here.

    Signed-off-by: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

06 Feb, 2008

1 commit

  • This is the new timerfd API as it is implemented by the following patch:

    int timerfd_create(int clockid, int flags);
    int timerfd_settime(int ufd, int flags,
                        const struct itimerspec *utmr,
                        struct itimerspec *otmr);
    int timerfd_gettime(int ufd, struct itimerspec *otmr);

    The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
    parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.

    The timerfd_settime() API gives new settings to the timerfd fd, optionally
    retrieving the previous expiration time (in case the "otmr" parameter is not
    NULL).

    The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
    is set in the "flags" parameter. Otherwise it's a relative time.

    The timerfd_gettime() API returns the next expiration time of the timer, or
    {0, 0} if the timerfd has not been set yet.

    Like the previous timerfd API implementation, read(2) and poll(2) are
    supported (with the same interface). Here's a simple test program I used to
    exercise the new timerfd APIs:

    http://www.xmailserver.org/timerfd-test2.c

    [akpm@linux-foundation.org: coding-style cleanups]
    [akpm@linux-foundation.org: fix ia64 build]
    [akpm@linux-foundation.org: fix m68k build]
    [akpm@linux-foundation.org: fix mips build]
    [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
    [heiko.carstens@de.ibm.com: fix s390]
    [akpm@linux-foundation.org: fix powerpc build]
    [akpm@linux-foundation.org: fix sparc64 more]
    Signed-off-by: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Thomas Gleixner
    Cc: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Cc: Michael Kerrisk
    Cc: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

03 Feb, 2008

1 commit


20 Oct, 2007

1 commit

  • With pid namespaces this field is now dangerous to use explicitly, so hide
    it behind the helpers.

    Also the pid and pgrp fields of task_struct and signal_struct are to be
    deprecated. Unfortunately this patch cannot be sent right now as this
    leads to tons of warnings, so start isolating them, and deprecate later.

    Actually the p->tgid == pid has to be changed to has_group_leader_pid(),
    but Oleg pointed out that in case of posix cpu timers this is the same, and
    thread_group_leader() is more preferable.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

19 Oct, 2007

1 commit


17 Oct, 2007

1 commit


15 Oct, 2007

1 commit


23 Aug, 2007

2 commits

  • sys_timer_create() sets ->it_process and unlocks ->siglock, then checks
    tmr->it_sigev_notify to determine whether get_task_struct() is needed.

    We already passed ->it_id to the caller, another thread can delete this timer
    and free its memory in between.

    As a minimal fix, move this code under ->siglock; sys_timer_delete() takes it
    too before calling release_posix_timer(). Proper serialization would be to
    take ->it_lock, since we add a partly initialized timer to posix_timers_id,
    which is not good.

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • timer_delete does:
    lock_timer();
    timer->it_process = NULL;
    unlock_timer();
    release_posix_timer();

    timer->it_process is checked in lock_timer() to prevent access to a
    timer, which is on the way to be deleted, but the check happens after
    idr_lock is dropped. This allows release_posix_timer() to delete the
    timer before the lock code can check the timer:

    CPU 0                           CPU 1

    lock_timer();
    timer->it_process = NULL;
    unlock_timer();
                                    lock_timer()
                                      spin_lock(idr_lock);
                                      timer = idr_find();
                                      spin_lock(timer->lock);
                                      spin_unlock(idr_lock);
    release_posix_timer();
      spin_lock(idr_lock);
      idr_remove(timer);
      spin_unlock(idr_lock);
      free_timer(timer);
                                      if (timer->......)

    Change the locking to prevent this.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
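
One way to close a race of this shape is to validate the object while the table lock is still held, so that the deleter cannot remove and free it in between. The following user-space sketch uses pthread mutexes. Every name in it is hypothetical, and it illustrates the general technique rather than the locking change the patch actually makes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define SK_MAX_TIMERS 8

struct sk_timer {
    pthread_mutex_t lock;   /* stands in for the per-timer lock */
    const void *owner;      /* stands in for ->it_process; NULL = dying */
};

/* Stands in for idr_lock and the id table it protects. */
static pthread_mutex_t sk_table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sk_timer *sk_table[SK_MAX_TIMERS];

int sk_create(int id, const void *owner)
{
    struct sk_timer *t;
    int ret = -1;

    if (id < 0 || id >= SK_MAX_TIMERS || owner == NULL)
        return -1;
    t = malloc(sizeof(*t));
    if (t == NULL)
        return -1;
    pthread_mutex_init(&t->lock, NULL);
    t->owner = owner;

    pthread_mutex_lock(&sk_table_lock);
    if (sk_table[id] == NULL) {
        sk_table[id] = t;
        ret = 0;
    }
    pthread_mutex_unlock(&sk_table_lock);

    if (ret != 0) {             /* slot already taken */
        pthread_mutex_destroy(&t->lock);
        free(t);
    }
    return ret;
}

/* Returns the timer with its lock held, or NULL if absent or dying. */
struct sk_timer *sk_lock_timer(int id)
{
    struct sk_timer *t;

    if (id < 0 || id >= SK_MAX_TIMERS)
        return NULL;

    pthread_mutex_lock(&sk_table_lock);
    t = sk_table[id];
    if (t != NULL) {
        pthread_mutex_lock(&t->lock);
        /* Checked before the table lock is dropped: the deleter needs
         * that lock to unlink and free, so t cannot vanish here. */
        if (t->owner == NULL) {
            pthread_mutex_unlock(&t->lock);
            t = NULL;
        }
    }
    pthread_mutex_unlock(&sk_table_lock);
    return t;
}

void sk_unlock_timer(struct sk_timer *t)
{
    pthread_mutex_unlock(&t->lock);
}

int sk_is_live(int id)
{
    struct sk_timer *t = sk_lock_timer(id);

    if (t == NULL)
        return 0;
    sk_unlock_timer(t);
    return 1;
}

int sk_delete(int id)
{
    struct sk_timer *t = sk_lock_timer(id);

    if (t == NULL)
        return -1;
    t->owner = NULL;            /* mark as dying under the timer lock */
    sk_unlock_timer(t);

    pthread_mutex_lock(&sk_table_lock);
    sk_table[id] = NULL;        /* unlink before freeing */
    pthread_mutex_unlock(&sk_table_lock);

    pthread_mutex_destroy(&t->lock);
    free(t);
    return 0;
}
```

A concurrent lookup either sees the timer before it is marked dying and gets it locked, or sees the NULL owner while still holding the table lock and backs off without touching freed memory.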
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

22 Jun, 2007

1 commit

  • posix-timers which deliver an ignored signal are currently rearmed in
    the timer softirq: This is necessary because the timer needs to be
    delivered again when SIG_IGN is removed. This is not a problem, when
    the interval is reasonable.

    With high resolution timers enabled one might arm a posix timer with a
    very small interval and ignore the signal. This might lead to a
    softirq starvation when the interval is so small that the timer is
    requeued onto the softirq pending list right away.

    This problem was pointed out by Jan Kiszka. Thanks Jan !

    The correct solution would be to stop the timer, when the signal is
    ignored and rearm it when SIG_IGN is removed. Unfortunately this
    requires modification in sigaction and involves non trivial sighand
    locking. It's too late in the release cycle for such a change.

    For now we just keep the timer running and enforce that the timer fires
    at most once per jiffy. This does not break anything as we keep the
    overrun counter correct. It adds a little inaccuracy to the
    timer_gettime() interface, but...

    The more complex change is necessary anyway to fix another
    shortcoming of the current implementation, which I discovered while looking
    at this problem: A pending signal is discarded when SIG_IGN is set. In
    case that a posixtimer signal is pending then it is discarded as well,
    but when SIG_IGN is removed later nothing rearms the timer. This is
    not new, it's that way since posix timers have been merged. So nothing
    to worry about right now.

    I have a working solution to fix all of this, but the impact is too
    large for both stable and 2.6.22. I'm going to send it out for review
    in the next days.

    This should go into 2.6.21.stable as well.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Cc: Jan Kiszka
    Cc: Ulrich Drepper
    Cc: Stable Team
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

09 May, 2007

1 commit


17 Feb, 2007

2 commits

  • Implement high resolution timers on top of the hrtimers infrastructure and the
    clockevents / tick-management framework. This provides accurate timers for
    all hrtimer subsystem users.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • - hrtimers did not use the hrtimer_restart enum and relied on the implicit
    int representation. Fix the prototypes and the functions using the enums.
    - Use separate name spaces for the enumerations
    - Convert hrtimer_restart macro to inline function
    - Add comments

    No functional changes.

    [akpm@osdl.org: fix input driver]
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Cc: Dmitry Torokhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

12 Feb, 2007

1 commit

  • Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
    corresponding "kmem_cache_zalloc()" call.

    Signed-off-by: Robert P. J. Day
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Cc: Roland McGrath
    Cc: James Bottomley
    Cc: Greg KH
    Acked-by: Joel Becker
    Cc: Steven Whitehouse
    Cc: Jan Kara
    Cc: Michael Halcrow
    Cc: "David S. Miller"
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

08 Dec, 2006

1 commit

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
        quilt add $file
        sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
        mv /tmp/$$ $file
        quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

04 Oct, 2006

1 commit


30 Sep, 2006

1 commit

  • The clock_nanosleep() function does not return the time remaining when the
    sleep is interrupted by a signal.

    This patch creates a new call out, compat_clock_nanosleep_restart(), which
    handles returning the remaining time after a sleep is interrupted. This
    patch revives clock_nanosleep_restart(). It is now accessed via the new
    call out. The compat_clock_nanosleep_restart() is used for compatibility
    access.

    Since this is implemented in compatibility mode the normal path is
    virtually unaffected - no real performance impact.

    Signed-off-by: Toyo Abe
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toyo Abe
     

27 Mar, 2006

3 commits


23 Mar, 2006

1 commit

  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

17 Mar, 2006

1 commit


02 Feb, 2006

4 commits


15 Jan, 2006

1 commit


11 Jan, 2006

2 commits