23 May, 2011

1 commit


02 Feb, 2011

1 commit


18 Sep, 2010

1 commit


06 Apr, 2009

1 commit

  • PI Futexes and their underlying rt_mutex cannot be left ownerless if
    there are pending waiters as this will break the PI boosting logic, so
    the standard requeue commands aren't sufficient. The new commands
    properly manage pi futex ownership by ensuring a futex with waiters
    has an owner at all times. This will allow glibc to properly handle
    pi mutexes with pthread_condvars.

    The approach taken here is to create two new futex op codes:

    FUTEX_WAIT_REQUEUE_PI:
    Tasks will use this op code to wait on a futex (such as a non-pi waitqueue)
    and wake after they have been requeued to a pi futex. Prior to returning to
    userspace, they will acquire this pi futex (and the underlying rt_mutex).

    futex_wait_requeue_pi() is the result of a high speed collision between
    futex_wait() and futex_lock_pi() (with the first part of futex_lock_pi() being
    done by futex_proxy_trylock_atomic() on behalf of the top_waiter).

    FUTEX_REQUEUE_PI (and FUTEX_CMP_REQUEUE_PI):
    This call must be used to wake tasks waiting with FUTEX_WAIT_REQUEUE_PI,
    regardless of how many tasks the caller intends to wake or requeue.
    pthread_cond_broadcast() should call this with nr_wake=1 and
    nr_requeue=INT_MAX. pthread_cond_signal() should call this with nr_wake=1 and
    nr_requeue=0. The reason is that both callers need to get the benefit of the
    futex_proxy_trylock_atomic() routine. futex_requeue() also enqueues the
    top_waiter on the rt_mutex via rt_mutex_start_proxy_lock().

    Signed-off-by: Darren Hart
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Thomas Gleixner

    Darren Hart
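
The argument choice described for FUTEX_REQUEUE_PI can be sketched in user-space C. This is only an illustration: `struct requeue_pi_counts` and `requeue_pi_counts_for()` are hypothetical names that do not exist in glibc or the kernel, and no system call is made.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical container for the two counts passed with the requeue op. */
struct requeue_pi_counts {
	int nr_wake;    /* tasks to wake directly */
	int nr_requeue; /* tasks to move onto the PI futex */
};

/*
 * Hypothetical helper: pick the counts the commit message recommends.
 * nr_wake is 1 in both cases; only nr_requeue differs.
 */
static struct requeue_pi_counts requeue_pi_counts_for(bool broadcast)
{
	struct requeue_pi_counts c;

	c.nr_wake = 1;
	c.nr_requeue = broadcast ? INT_MAX : 0;
	return c;
}
```

A pthread_cond_signal()-style caller would use `requeue_pi_counts_for(false)`, a pthread_cond_broadcast()-style caller `requeue_pi_counts_for(true)`.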
     

06 Sep, 2008

1 commit

  • With hrtimer poll/select, the signal restart data is no longer a single
    long representing a jiffies count. It becomes a second/nanosecond pair
    that also needs to encode whether there was a timeout at all.

    This patch adds a struct to the restart_block union for this purpose.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Arjan van de Ven

    Thomas Gleixner
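
A rough sketch of the idea, assuming a simplified layout: `struct restart_sketch`, `struct timeout_sketch` and both helpers are hypothetical names for illustration and are not the kernel's definitions. The point is that the timeout is kept as a second/nanosecond pair plus a flag that records whether a timeout exists at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the restart data. */
struct restart_sketch {
	union {
		struct {
			unsigned long arg0, arg1, arg2, arg3; /* generic slots */
		} generic;
		struct {
			int has_timeout;       /* 0 means "no timeout" */
			unsigned long tv_sec;
			unsigned long tv_nsec;
		} poll;
	} u;
};

/* Hypothetical timeout type used only by this sketch. */
struct timeout_sketch {
	unsigned long sec;
	unsigned long nsec;
};

/* Record the timeout; a NULL pointer means the caller had no timeout. */
static void save_poll_timeout(struct restart_sketch *r,
			      const struct timeout_sketch *t)
{
	r->u.poll.has_timeout = (t != NULL);
	r->u.poll.tv_sec = t ? t->sec : 0;
	r->u.poll.tv_nsec = t ? t->nsec : 0;
}

/* Returns false when no timeout was recorded; otherwise fills *out. */
static bool load_poll_timeout(const struct restart_sketch *r,
			      struct timeout_sketch *out)
{
	if (!r->u.poll.has_timeout)
		return false;
	out->sec = r->u.poll.tv_sec;
	out->nsec = r->u.poll.tv_nsec;
	return true;
}
```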
     

30 Apr, 2008

3 commits

  • Change all the #ifdef TIF_RESTORE_SIGMASK conditionals in non-arch code to
    #ifdef HAVE_SET_RESTORE_SIGMASK. If arch code defines it first, the generic
    set_restore_sigmask() using TIF_RESTORE_SIGMASK is not defined.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • Set TIF_SIGPENDING in set_restore_sigmask. This lets arch code take
    TIF_RESTORE_SIGMASK out of the set of bits that will be noticed on return to
    user mode. On some machines those bits are scarce, and we can free this
    unneeded one up for other uses.

    It is probably the case that TIF_SIGPENDING is always set anyway everywhere
    set_restore_sigmask() is used. But this is some cheap paranoia in case there
    is an arcane case where it might not be.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This adds the set_restore_sigmask() inline and replaces every
    set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it. No functional
    change, but it abstracts the details of the flag protocol from all the calls.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
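
The flag protocol from the three commits above can be sketched outside the kernel. Everything here is an assumption made for illustration: the `sketch_` names, the bit numbers, and the plain global standing in for the per-thread flag word. It shows a helper that sets the restore-sigmask bit together with the signal-pending bit, so callers never touch the flags directly.

```c
#include <stdbool.h>

/* Hypothetical bit numbers; the real ones are architecture-specific. */
#define SKETCH_TIF_SIGPENDING      1
#define SKETCH_TIF_RESTORE_SIGMASK 2

/* Stand-in for the per-thread flag word. */
static unsigned long sketch_thread_flags;

static void sketch_set_thread_flag(int bit)
{
	sketch_thread_flags |= 1UL << bit;
}

static bool sketch_test_thread_flag(int bit)
{
	return ((sketch_thread_flags >> bit) & 1UL) != 0;
}

/*
 * Callers use this helper instead of setting the restore-sigmask bit
 * themselves. Setting the signal-pending bit as well means the return
 * path only has to check that one bit.
 */
static void sketch_set_restore_sigmask(void)
{
	sketch_set_thread_flag(SKETCH_TIF_RESTORE_SIGMASK);
	sketch_set_thread_flag(SKETCH_TIF_SIGPENDING);
}
```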
     

17 Apr, 2008

1 commit


02 Feb, 2008

1 commit

  • To allow the implementation of optimized rw-locks in user space, glibc
    needs a way to select waiters for wakeup depending on a bitset
    mask.

    This requires two new futex OPs: FUTEX_WAIT_BITS and FUTEX_WAKE_BITS.
    These OPs are basically the same as FUTEX_WAIT and FUTEX_WAKE plus an
    additional argument - a bitset. Furthermore, the FUTEX_WAIT_BITS OP
    expects an absolute timeout value instead of the relative one that
    is used for the FUTEX_WAIT OP.

    FUTEX_WAIT_BITS calls into the kernel with a bitset. The bitset is
    stored in the futex_q structure, which is used to enqueue the waiter
    into the hashed futex waitqueue.

    FUTEX_WAKE_BITS also calls into the kernel with a bitset. The wakeup
    function logically ANDs the bitset with the bitset stored in each
    waiter's futex_q structure. If the result is zero (i.e. the two bitsets
    have no set bit in common), then the waiter is not woken up. If
    the result is not zero (i.e. at least one set bit is common to both
    bitsets), then the waiter is woken.

    The bitset provided by the caller must be non zero. In case the
    provided bitset is zero the kernel returns EINVAL.

    Internally the new OPs are only extensions to the existing FUTEX_WAIT
    and FUTEX_WAKE functions. The existing OPs hand a bitset with all bits
    set into the futex_wait() and futex_wake() functions.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
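
The matching rule described above can be sketched as a small C function. `sketch_should_wake()` and `SKETCH_MATCH_ANY` are hypothetical names for this illustration, not kernel symbols; the logic follows the description: a zero wake bitset is rejected with EINVAL, otherwise a waiter is woken only when the AND of the two bitsets is non-zero.

```c
#include <errno.h>
#include <stdint.h>

/* All bits set: what the plain wait/wake OPs are described as passing. */
#define SKETCH_MATCH_ANY 0xffffffffu

/*
 * Returns 1 if a waiter that stored waiter_bitset should be woken by a
 * wake call carrying wake_bitset, 0 if not, and -EINVAL if the caller
 * passed an empty bitset.
 */
static int sketch_should_wake(uint32_t waiter_bitset, uint32_t wake_bitset)
{
	if (wake_bitset == 0)
		return -EINVAL;
	return (waiter_bitset & wake_bitset) != 0;
}
```

A reader/writer lock could, for example, give readers one bit and writers another, and then wake only the class it wants.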
     

30 Jan, 2008

1 commit


05 Dec, 2007

1 commit

  • David Holmes found a bug in the -rt tree with respect to
    pthread_cond_timedwait. After trying his test program on the latest git
    from mainline, I found the bug was there too. His test program showed
    that if one were to do a "Ctrl-Z" on a process that was in
    pthread_cond_timedwait, and then did a "bg" on that process, it would
    return with "-ETIMEDOUT", but early. That is, the timer would go off
    early.

    Looking into this, I found the source of the problem, and it is a rather
    nasty bug at that.

    Here's the relevant code from kernel/futex.c: (not in order in the file)

    [...]
    asmlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
    			  struct timespec __user *utime, u32 __user *uaddr2,
    			  u32 val3)
    {
    	struct timespec ts;
    	ktime_t t, *tp = NULL;
    	u32 val2 = 0;
    	int cmd = op & FUTEX_CMD_MASK;

    	if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
    		if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
    			return -EFAULT;
    		if (!timespec_valid(&ts))
    			return -EINVAL;

    		t = timespec_to_ktime(ts);
    		if (cmd == FUTEX_WAIT)
    			t = ktime_add(ktime_get(), t);
    		tp = &t;
    	}
    	[...]
    	return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
    }

    [...]

    long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
    	      u32 __user *uaddr2, u32 val2, u32 val3)
    {
    	int ret;
    	int cmd = op & FUTEX_CMD_MASK;
    	struct rw_semaphore *fshared = NULL;

    	if (!(op & FUTEX_PRIVATE_FLAG))
    		fshared = &current->mm->mmap_sem;

    	switch (cmd) {
    	case FUTEX_WAIT:
    		ret = futex_wait(uaddr, fshared, val, timeout);

    [...]

    static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
    		      u32 val, ktime_t *abs_time)
    {
    	[...]
    	struct restart_block *restart;
    	restart = &current_thread_info()->restart_block;
    	restart->fn = futex_wait_restart;
    	restart->arg0 = (unsigned long)uaddr;
    	restart->arg1 = (unsigned long)val;
    	restart->arg2 = (unsigned long)abs_time;
    	restart->arg3 = 0;
    	if (fshared)
    		restart->arg3 |= ARG3_SHARED;
    	return -ERESTART_RESTARTBLOCK;
    [...]

    static long futex_wait_restart(struct restart_block *restart)
    {
    	u32 __user *uaddr = (u32 __user *)restart->arg0;
    	u32 val = (u32)restart->arg1;
    	ktime_t *abs_time = (ktime_t *)restart->arg2;
    	struct rw_semaphore *fshared = NULL;

    	restart->fn = do_no_restart_syscall;
    	if (restart->arg3 & ARG3_SHARED)
    		fshared = &current->mm->mmap_sem;
    	return (long)futex_wait(uaddr, fshared, val, abs_time);
    }

    So when futex_wait is interrupted by a signal, we break out of the
    hrtimer code and set up or return from signal. This code does not return
    back to userspace, so we set up a RESTARTBLOCK. The bug here is that we
    save "abs_time", which is a pointer to the stack variable "ktime_t t"
    from sys_futex.

    This returns and unwinds the stack before we get to call our signal. On
    return from the signal we go to futex_wait_restart, where we update all
    the parameters for futex_wait and call it. But here we have a problem
    where abs_time is no longer valid.

    I verified this with print statements, and sure enough, what abs_time
    was set to ends up being garbage when we get to futex_wait_restart.

    My solution (with input from Linus Torvalds) was to add unions to the
    restart_block to allow system calls to use the restart with specific
    parameters. This way the futex code now saves the time as a 64-bit value
    in the restart block instead of keeping a pointer to a value on the
    stack.

    Note: I'm a bit nervous about adding "linux/types.h" and using u32 and u64
    in thread_info.h, when there's an #ifdef __KERNEL__ just below that.
    Not sure what that is there for. If this turns out to be a problem, I've
    tested this using "unsigned int" for u32 and "unsigned long long" for
    u64 and it worked just the same. I'm using u32 and u64 just to be
    consistent with what the futex code uses.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Acked-by: Linus Torvalds

    Steven Rostedt
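
The fix can be illustrated with a small user-space sketch. The names `struct futex_restart_sketch`, `sketch_save_restart()` and `sketch_restart_time()` are hypothetical, and the layout is a simplified assumption rather than the kernel's actual restart_block. What it shows is the difference that matters: the timeout is copied into the restart data as a 64-bit value, so it stays valid after the stack frame that held the original variable is gone.

```c
#include <stdint.h>

/* Hypothetical, simplified restart data with a futex-specific member. */
struct futex_restart_sketch {
	union {
		struct {
			unsigned long arg0, arg1, arg2, arg3; /* generic slots */
		} generic;
		struct {
			uint32_t *uaddr;
			uint32_t val;
			uint32_t flags;
			uint64_t time; /* absolute timeout, stored by value */
		} futex;
	} u;
};

/* Copy the timeout value itself; do not keep the caller's pointer. */
static void sketch_save_restart(struct futex_restart_sketch *r,
				uint32_t *uaddr, uint32_t val,
				const int64_t *abs_time)
{
	r->u.futex.uaddr = uaddr;
	r->u.futex.val = val;
	r->u.futex.flags = 0;
	r->u.futex.time = (uint64_t)*abs_time;
}

/* On restart, the timeout is read back from the restart data. */
static int64_t sketch_restart_time(const struct futex_restart_sketch *r)
{
	return (int64_t)r->u.futex.time;
}
```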
     

14 Nov, 2005

1 commit

  • Remove task_work structure, use the standard thread flags functions and use
    shifts in entry.S to test the thread flags. Add a few local labels to entry.S
    to allow gas to generate short jumps.

    Finally, it changes a number of inline functions in thread_info.h to macros to
    delay the current_thread_info() usage, which on m68k requires a structure
    (task_struct) that is not yet defined at this point.

    Signed-off-by: Roman Zippel
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds