16 Jan, 2011

1 commit

  • …linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    rcu: avoid pointless blocked-task warnings
    rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status
    rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi()

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, olpc: Add missing Kconfig dependencies
    x86, mrst: Set correct APB timer IRQ affinity for secondary cpu
    x86: tsc: Fix calibration refinement conditionals to avoid divide by zero
    x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    timekeeping: Make local variables static
    time: Rename misnamed minsec argument of clocks_calc_mult_shift()

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: Remove syscall_exit_fields
    tracing: Only process module tracepoints once
    perf record: Add "nodelay" mode, disabled by default
    perf sched: Fix list of events, dropping unsupported ':r' modifier
    Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return"
    perf top: Fix annotate segv
    perf evsel: Fix order of event list deletion

    Linus Torvalds
     

14 Jan, 2011

1 commit

  • Futex code is smarter than most other gup_fast O_DIRECT code and knows
    about the compound internals. However now doing a put_page(head_page)
    will not release the pin on the tail page taken by gup-fast, leading to
    all sorts of refcounting bugchecks. Getting a stable head_page is a little
    tricky.

    page_head = page is there because if this is not a tail page it's also the
    page_head. Only in case this is a tail page, compound_head is called,
    otherwise it's guaranteed unnecessary. And if it's a tail page
    compound_head has to run atomically inside irq disabled section
    __get_user_pages_fast before returning. Otherwise ->first_page won't be a
    stable pointer.

    Disabling irqs before __get_user_pages_fast and re-enabling them after running
    compound_head is needed because if __get_user_pages_fast returns == 1, it
    means the huge pmd is established and cannot go away from under us.
    pmdp_splitting_flush_notify in __split_huge_page_splitting will have to
    wait for local_irq_enable before the IPI delivery can return. This means
    __split_huge_page_refcount can't be running from under us, and in turn
    when we run compound_head(page) we're not reading a dangling pointer from
    tailpage->first_page. Then, after we get a stable head page, we are
    always safe to call compound_lock, and after taking the compound lock on
    the head page we can finally re-check whether the page returned by gup-fast
    is still a tail page, in which case we're set and we didn't need to split
    the hugepage in order to take a futex on it.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Mel Gorman
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

11 Jan, 2011

1 commit

  • The comment explaining why rt_mutex_next_owner() can return NULL in
    wake_futex_pi() does not describe the normal case.

    Tracing the cause shows that it is more likely that the waiter
    simply timed out. But because it originally caused contention with
    the futex, the owner will go into the kernel when it unlocks
    the lock. Then it will hit this code path and
    rt_mutex_next_owner() will return NULL.

    Cc: Thomas Gleixner
    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

10 Nov, 2010

4 commits

  • The futex_q struct has grown considerably over the last couple years. I
    believe it now merits a static initializer to avoid uninitialized data
    errors (having spent more time than I care to admit debugging an uninitialized
    q.bitset in an experimental new op code).

    With the key initializer built in, several of the FUTEX_KEY_INIT calls can
    be removed.

    V2: use a static variable instead of an init macro.
    use a C99 initializer and don't rely on variable ordering in the struct.
    V3: make futex_q_init const
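
    A minimal user-space sketch of the pattern described above: one const
    instance written with C99 designated initializers, copied into each new
    waiter. The demo_* struct, its fields, and DEMO_MATCH_ANY are simplified,
    hypothetical stand-ins, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel's key and futex_q. */
struct demo_key {
    uintptr_t word;
    void *ptr;
    int offset;
};

struct demo_futex_q {
    struct demo_key key;
    void *pi_state;
    void *rt_waiter;
    struct demo_key *requeue_pi_key;
    uint32_t bitset;
};

#define DEMO_MATCH_ANY 0xffffffffu

/*
 * One const template. Designated initializers name each field, so the
 * result does not depend on member order; fields not listed are zeroed.
 */
static const struct demo_futex_q demo_futex_q_init = {
    .key = { .word = 0, .ptr = NULL, .offset = 0 },
    .bitset = DEMO_MATCH_ANY,
};

/* Start every waiter from the template instead of field-by-field setup. */
void demo_init_waiter(struct demo_futex_q *q)
{
    assert(q != NULL);
    *q = demo_futex_q_init;
}
```

    With this shape, a newly added field is zero in every waiter unless the
    template says otherwise, which is the class of bug the message mentions.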

    Signed-off-by: Darren Hart
    Cc: Peter Zijlstra
    Cc: Eric Dumazet
    Cc: John Kacur
    Cc: Ingo Molnar
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Darren Hart
     
  • In the early days we passed the mmap sem around. That became the
    "int fshared" with the fast gup improvements. Then we added
    "int clockrt" in places. This patch unifies these options as "flags".

    [ tglx: Split out the stale fshared cleanup ]
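
    As a rough illustration of the unification, the two separate int
    arguments can be folded into one bitmask. The DEMO_FLAGS_* names and
    values and the helper functions below are assumptions made for this
    sketch, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical flag bits, one per option the message lists. */
#define DEMO_FLAGS_SHARED  0x01u
#define DEMO_FLAGS_CLOCKRT 0x02u

/* Build the single flags word from the two former int arguments. */
unsigned int demo_make_flags(int fshared, int clockrt)
{
    unsigned int flags = 0;

    if (fshared)
        flags |= DEMO_FLAGS_SHARED;
    if (clockrt)
        flags |= DEMO_FLAGS_CLOCKRT;
    return flags;
}

/* Callees test the bits they care about instead of taking extra ints. */
int demo_is_shared(unsigned int flags)
{
    return (flags & DEMO_FLAGS_SHARED) != 0;
}

int demo_uses_clock_realtime(unsigned int flags)
{
    return (flags & DEMO_FLAGS_CLOCKRT) != 0;
}
```

    A further option then costs one more bit rather than one more parameter
    on every function in the call chain.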

    Signed-off-by: Darren Hart
    Cc: Peter Zijlstra
    Cc: Eric Dumazet
    Cc: John Kacur
    Cc: Ingo Molnar
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Darren Hart
     
  • The fast GUP changes stopped using the fshared flag in
    put_futex_keys(), but we kept the interface the same.

    Clean up all stale users.

    This patch is split out from Darren Hart's combo patch which also
    combines various flags. This way the changes are clearly separated.

    Signed-off-by: Thomas Gleixner
    Cc: Darren Hart
    LKML-Reference:

    Thomas Gleixner
     
  • Since commit 1dcc41bb (futex: Change 3rd arg of fetch_robust_entry()
    to unsigned int*) some gcc versions decided to emit the following
    warning:

    kernel/futex.c: In function ‘exit_robust_list’:
    kernel/futex.c:2492: warning: ‘next_pi’ may be used uninitialized in this function

    The commit did not introduce the warning as gcc should have warned
    before that commit as well. It's just gcc being silly.

    The code path really can't result in next_pi being uninitialized (or
    should not), but let's keep the build clean. Annotate next_pi as an
    uninitialized_var.

    [ tglx: Addressed the same issue in futex_compat.c and massaged the
    changelog ]

    Signed-off-by: Darren Hart
    Tested-by: Matt Fleming
    Tested-by: Uwe Kleine-König
    Cc: Peter Zijlstra
    Cc: Eric Dumazet
    Cc: John Kacur
    Cc: Ingo Molnar
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Darren Hart
     

26 Oct, 2010

1 commit


22 Oct, 2010

1 commit


19 Oct, 2010

1 commit

  • futex_wait() is leaking key references due to futex_wait_setup()
    acquiring an additional reference via the queue_lock() routine. The
    nested key ref-counting has been masking bugs and complicating code
    analysis. queue_lock() is only called with a previously ref-counted
    key, so remove the additional ref-counting from the queue_(un)lock()
    functions.

    Also futex_wait_requeue_pi() drops one key reference too many in
    unqueue_me_pi(). Remove the key reference handling from
    unqueue_me_pi(). This was paired with a queue_lock() in
    futex_lock_pi(), so the count remains unchanged.

    Document remaining nested key ref-counting sites.

    Signed-off-by: Darren Hart
    Reported-and-tested-by: Matthieu Fertré
    Reported-by: Louis Rilling
    Cc: Peter Zijlstra
    Cc: Eric Dumazet
    Cc: John Kacur
    Cc: Rusty Russell
    LKML-Reference:
    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org

    Darren Hart
     

14 Oct, 2010

1 commit

  • Convert futex_requeue() function parameters to use @name
    kernel-doc notation and add @fshared & @cmpval to prevent
    kernel-doc warnings.

    Add @list to struct futex_q.

    Fix a few typos.

    Signed-off-by: Randy Dunlap
    Acked-by: Rusty Russell
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Randy Dunlap
     

18 Sep, 2010

3 commits


01 Jul, 2010

1 commit

  • futex_find_get_task is currently used (through lookup_pi_state) from two
    contexts, futex_requeue and futex_lock_pi_atomic. Neither of the paths
    looks like it needs the credentials check, though. Different (e)uids
    shouldn't matter at all because the only thing that is important for
    shared futex is the accessibility of the shared memory.

    The credential check results in a glibc assert failure or a process hang (if
    glibc is compiled without assert support) for a shared robust pthread
    mutex with priority inheritance if a process tries to lock an already held
    lock owned by a process with a different euid:

    pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.

    The problem is that futex_lock_pi_atomic, which is called when we try to
    lock an already held lock, checks the current holder (tid is stored in the
    futex value) to get the PI state. It uses lookup_pi_state, which in turn
    gets the task struct from futex_find_get_task. ESRCH is returned either
    when the task is not found or if the credentials check fails.

    futex_lock_pi_atomic simply returns if it gets ESRCH. glibc code,
    however, doesn't expect that robust lock returns with ESRCH because it
    should get either success or owner died.

    Signed-off-by: Michal Hocko
    Acked-by: Darren Hart
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Nick Piggin
    Cc: Alexey Kuznetsov
    Cc: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

03 Feb, 2010

3 commits

  • The WARN_ON in lookup_pi_state which complains about a mismatch
    between pi_state->owner->pid and the pid which we retrieved from the
    user space futex is completely bogus.

    The code just emits the warning and then continues despite the fact
    that it detected an inconsistent state of the futex. A convenient way
    for user space to spam the syslog.

    Replace the WARN_ON by a consistency check. If the values do not match
    return -EINVAL and let user space deal with the mess it created.

    This also fixes the missing task_pid_vnr() when we compare the
    pi_state->owner pid with the futex value.

    Reported-by: Jermome Marchand
    Signed-off-by: Thomas Gleixner
    Acked-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc:

    Thomas Gleixner
     
  • If the owner of a PI futex dies we fix up the pi_state and set
    pi_state->owner to NULL. When a malicious or just sloppily programmed
    user space application sets the futex value to 0 e.g. by calling
    pthread_mutex_init(), then the futex can be acquired again. A new
    waiter manages to enqueue itself on the pi_state w/o damage, but on
    unlock the kernel dereferences pi_state->owner and oopses.

    Prevent this by checking pi_state->owner in the unlock path. If
    pi_state->owner is not current we know that user space manipulated the
    futex value. Ignore the mess and return -EINVAL.

    This catches the above case and also the case where a task hijacks the
    futex by setting the tid value and then tries to unlock it.
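
    The unlock-path check can be sketched in user space as follows. The
    demo_* types and function are hypothetical simplifications; the point
    is only that the owner pointer is compared, never dereferenced, before
    the request is rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a task and the PI state hung off a futex. */
struct demo_task {
    int tid;
};

struct demo_pi_state {
    struct demo_task *owner;   /* NULL once the owner has died */
};

/*
 * Unlock on behalf of current_task. If the recorded owner is someone
 * else (or NULL), user space tampered with the futex value: refuse
 * with -EINVAL instead of dereferencing a bogus owner.
 */
int demo_unlock_pi(struct demo_pi_state *st, struct demo_task *current_task)
{
    assert(st != NULL && current_task != NULL);

    if (st->owner != current_task)
        return -EINVAL;

    st->owner = NULL;          /* ownership released */
    return 0;
}
```

    The same comparison covers both scenarios in the message: an owner
    cleared after death, and a task that wrote its own tid into the futex
    word without ever owning the lock.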

    Reported-by: Jermome Marchand
    Signed-off-by: Thomas Gleixner
    Acked-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc:

    Thomas Gleixner
     
  • This fixes a futex key reference count bug in futex_lock_pi(),
    where a key's reference count is incremented twice but decremented
    only once, causing the backing object to not be released.

    If the futex is created in a temporary file in an ext3 file system,
    this bug causes the file's inode to become an "undead" orphan,
    which causes an oops from a BUG_ON() in ext3_put_super() when the
    file system is unmounted. glibc's test suite is known to trigger this,
    see .

    The bug is a regression from 2.6.28-git3, namely Peter Zijlstra's
    38d47c1b7075bd7ec3881141bb3629da58f88dab "[PATCH] futex: rely on
    get_user_pages() for shared futexes". That commit made get_futex_key()
    also increment the reference count of the futex key, and updated its
    callers to decrement the key's reference count before returning.
    Unfortunately the normal exit path in futex_lock_pi() wasn't corrected:
    the reference count is incremented by get_futex_key() and queue_lock(),
    but the normal exit path only decrements once, via unqueue_me_pi().
    The fix is to put_futex_key() after unqueue_me_pi(), since 2.6.31
    this is easily done by 'goto out_put_key' rather than 'goto out'.
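
    The imbalance and its fix can be modelled with a plain counter and two
    exit labels. This is only a sketch: the demo_* names are hypothetical,
    and the use_fixed_exit switch exists purely so both exit paths can be
    compared side by side.

```c
#include <assert.h>

/* Hypothetical key with a bare reference counter. */
struct demo_key {
    int refs;
};

static void demo_get_key(struct demo_key *k)
{
    k->refs++;
}

static void demo_put_key(struct demo_key *k)
{
    assert(k->refs > 0);
    k->refs--;
}

/*
 * Two references are taken (lookup, then queueing) and unqueueing
 * drops one. The exit label decides whether the lookup reference
 * is dropped as well.
 */
int demo_lock_pi(struct demo_key *k, int use_fixed_exit)
{
    demo_get_key(k);           /* reference from the key lookup */
    demo_get_key(k);           /* reference from queueing */

    /* ... the lock itself would be acquired here ... */

    demo_put_key(k);           /* unqueueing drops the queueing reference */

    if (!use_fixed_exit)
        goto out;              /* pre-fix exit: lookup reference leaks */
    goto out_put_key;

out_put_key:
    demo_put_key(k);           /* drop the lookup reference */
out:
    return 0;
}
```

    Calling demo_lock_pi() with use_fixed_exit set to 0 leaves the counter
    at 1, the analogue of the backing object never being released.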

    Signed-off-by: Mikael Pettersson
    Acked-by: Peter Zijlstra
    Acked-by: Darren Hart
    Signed-off-by: Thomas Gleixner
    Cc:

    Mikael Pettersson
     

13 Jan, 2010

1 commit

  • Currently, futexes have two problems:

    A) The current futex code doesn't handle private file mappings properly.

    get_futex_key() uses PageAnon() to distinguish file and
    anon, which can cause the following bad scenario:

    1) thread-A calls futex(private-mapping, FUTEX_WAIT) and
    sleeps on the file mapping object.
    2) thread-B writes a variable, which triggers COW on the page.
    3) thread-B calls futex(private-mapping, FUTEX_WAKE), which looks for
    blocked threads on the anonymous page (but finds none).

    B) Current futex code doesn't handle zero page properly.

    Read mode get_user_pages() can return the zero page, but the current
    futex code doesn't handle it at all. The zero page then causes an
    infinite loop internally.

    The solution is to always use write mode get_user_page() for
    page lookup. It prevents the lookup of both the file page of private
    mappings and the zero page.

    Performance concerns:

    Probably very little, because glibc always initializes futex variables
    before calling futex(). This means glibc users never see
    the overhead of this patch.

    Compatibility concerns:

    This patch has a few compatibility issues. After this patch,
    FUTEX_WAIT requires writable access to futex variables (read-only
    mappings result in EFAULT). In practice this is not a problem:
    glibc always initializes variables for futexes explicitly, so nobody
    uses read-only mappings.

    Reported-by: Hugh Dickins
    Signed-off-by: KOSAKI Motohiro
    Acked-by: Peter Zijlstra
    Acked-by: Darren Hart
    Cc:
    Cc: Linus Torvalds
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: Ulrich Drepper
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    KOSAKI Motohiro
     

15 Dec, 2009

3 commits


08 Dec, 2009

1 commit


29 Oct, 2009

1 commit

  • The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
    NULL test) nor does it use the wake_list of futex_wake(), which were
    the reason for commit 41890f2 (futex: Handle spurious wake up).

    See debugging discussion on LKML Message-ID:

    The changes in this fix to the wait_requeue_pi path were considered to
    be a likely unnecessary, but harmless safety net. But it turns out that
    due to the fact that for unknown $@#!*( reasons EWOULDBLOCK is defined
    as EAGAIN we built an endless loop in the code path which correctly
    returns EWOULDBLOCK.

    Spurious wakeups in wait_requeue_pi code path are unlikely so we do
    the easy solution and return EWOULDBLOCK^WEAGAIN to user space and let
    it deal with the spurious wakeup.

    Cc: Darren Hart
    Cc: Peter Zijlstra
    Cc: Eric Dumazet
    Cc: John Stultz
    Cc: Dinakar Guniguntala
    LKML-Reference:
    Cc: stable@kernel.org
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

16 Oct, 2009

1 commit

  • When requeuing tasks from one futex to another, the reference held
    by the requeued task to the original futex location needs to be
    dropped eventually.

    Dropping the reference may ultimately lead to a call to
    "iput_final" and subsequently call into filesystem-specific code,
    which may be non-atomic.

    It is therefore safer to defer this drop operation until after the
    futex_hash_bucket spinlock has been dropped.
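
    One way to picture the deferral: count the pending drops while the lock
    is held and perform them only after it is released. The sketch below
    uses a flag in place of a real spinlock, and every demo_* name is
    hypothetical.

```c
#include <assert.h>

/* Hypothetical bucket whose "lock" is a flag, enough to show ordering. */
struct demo_bucket {
    int locked;
};

/* Hypothetical reference holder that records drops made under the lock. */
struct demo_ref {
    int count;
    int dropped_under_lock;
};

static void demo_lock(struct demo_bucket *b)
{
    assert(!b->locked);
    b->locked = 1;
}

static void demo_unlock(struct demo_bucket *b)
{
    assert(b->locked);
    b->locked = 0;
}

/* Stands in for a drop that may call into code that can sleep. */
static void demo_drop_ref(struct demo_ref *r, const struct demo_bucket *b)
{
    if (b->locked)
        r->dropped_under_lock++;
    r->count--;
}

/* Requeue nr_waiters waiters; defer their reference drops past unlock. */
void demo_requeue(struct demo_bucket *b, struct demo_ref *old_refs,
                  int nr_waiters)
{
    int pending = 0;
    int i;

    demo_lock(b);
    for (i = 0; i < nr_waiters; i++)
        pending++;             /* waiter moved; remember one drop */
    demo_unlock(b);

    while (pending-- > 0)
        demo_drop_ref(old_refs, b);
}
```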

    Originally-From: Helge Bahmann
    Signed-off-by: Darren Hart
    Cc:
    Cc: Peter Zijlstra
    Cc: Eric Dumazet
    Cc: Dinakar Guniguntala
    Cc: John Stultz
    Cc: Sven-Thorsten Dietrich
    Cc: John Kacur
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     

15 Oct, 2009

1 commit

  • If userspace tries to perform a requeue_pi on a non-requeue_pi waiter,
    it will find the futex_q->requeue_pi_key to be NULL and OOPS.

    Check for NULL in match_futex() instead of doing explicit NULL pointer
    checks on all call sites. While match_futex(NULL, NULL) returning
    false is a little odd, it's still correct as we expect valid key
    references.
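
    A comparison helper with the NULL check folded in might look like the
    following sketch. The key layout and the demo_match_key name are
    simplified assumptions rather than the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified key. */
struct demo_key {
    uintptr_t word;
    void *ptr;
    int offset;
};

/*
 * Returns 1 only if both keys are present and equal. A NULL on either
 * side, including (NULL, NULL), yields 0, so callers need no NULL
 * checks of their own.
 */
int demo_match_key(const struct demo_key *a, const struct demo_key *b)
{
    return a && b &&
           a->word == b->word &&
           a->ptr == b->ptr &&
           a->offset == b->offset;
}
```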

    Signed-off-by: Darren Hart
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    CC: Eric Dumazet
    CC: Dinakar Guniguntala
    CC: John Stultz
    Cc: stable@kernel.org
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Darren Hart
     

14 Oct, 2009

1 commit

  • The futex code does not handle spurious wake up in futex_wait and
    futex_wait_requeue_pi.

    The code assumes that any wake up which was not caused by futex_wake /
    requeue or by a timeout was caused by a signal wake up and returns one
    of the syscall restart error codes.

    In case of a spurious wake up the signal delivery code which deals
    with the restart error codes is not invoked and we return that error
    code to user space. That causes applications which actually check the
    return codes to fail. Blaise reported that on preempt-rt a python test
    program ran into an exception trap. -rt exposed that due to a built in
    spurious wake up accelerator :)

    Solve this by checking signal_pending(current) in the wake up path and
    handle the spurious wake up case w/o returning to user space.
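
    The control flow can be sketched as a loop over wake-up reasons in which
    only a real wake, a timeout, or a pending signal ends the wait. This is
    a user-space model with hypothetical names: the scripted event array
    replaces actual sleeping, and -EINTR stands in for the kernel-internal
    restart codes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reasons a sleeping waiter may resume. */
enum demo_wake_reason {
    DEMO_WOKEN,      /* woken by a futex wake or requeue */
    DEMO_TIMED_OUT,  /* timeout expired */
    DEMO_SIGNAL,     /* a signal is pending */
    DEMO_SPURIOUS    /* none of the above */
};

/*
 * Walk a scripted sequence of wake-ups. A spurious wake-up does not
 * return to the caller; the waiter simply waits again.
 */
int demo_futex_wait(const enum demo_wake_reason *events, size_t n)
{
    size_t i;

    assert(events != NULL);
    for (i = 0; i < n; i++) {
        switch (events[i]) {
        case DEMO_WOKEN:
            return 0;
        case DEMO_TIMED_OUT:
            return -ETIMEDOUT;
        case DEMO_SIGNAL:
            return -EINTR;     /* stand-in for a restart code */
        case DEMO_SPURIOUS:
            break;             /* no signal pending: wait again */
        }
    }
    return -EAGAIN;            /* script exhausted (demo only) */
}
```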

    Reported-by: Blaise Gassend
    Debugged-by: Darren Hart
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: stable@kernel.org
    LKML-Reference:

    Thomas Gleixner
     

09 Oct, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    futex: fix requeue_pi key imbalance
    futex: Fix typo in FUTEX_WAIT/WAKE_BITSET_PRIVATE definitions
    rcu: Place root rcu_node structure in separate lockdep class
    rcu: Make hot-unplugged CPU relinquish its own RCU callbacks
    rcu: Move rcu_barrier() to rcutree
    futex: Move exit_pi_state() call to release_mm()
    futex: Nullify robust lists after cleanup
    futex: Fix locking imbalance
    panic: Fix panic message visibility by calling bust_spinlocks(0) before dying
    rcu: Replace the rcu_barrier enum with pointer to call_rcu*() function
    rcu: Clean up code based on review feedback from Josh Triplett, part 4
    rcu: Clean up code based on review feedback from Josh Triplett, part 3
    rcu: Fix rcu_lock_map build failure on CONFIG_PROVE_LOCKING=y
    rcu: Clean up code to address Ingo's checkpatch feedback
    rcu: Clean up code based on review feedback from Josh Triplett, part 2
    rcu: Clean up code based on review feedback from Josh Triplett

    Linus Torvalds
     

08 Oct, 2009

1 commit

  • If futex_wait_requeue_pi() wakes prior to requeue, we drop the
    reference to the source futex_key twice, once in
    handle_early_requeue_pi_wakeup() and once on our way out.

    Remove the drop from the handle_early_requeue_pi_wakeup() and keep
    the get/drops together in futex_wait_requeue_pi().

    Reported-by: Helge Bahmann
    Signed-off-by: Darren Hart
    Cc: Helge Bahmann
    Cc: Peter Zijlstra
    Cc: Eric Dumazet
    Cc: Dinakar Guniguntala
    Cc: John Stultz
    Cc: stable-2.6.31
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Darren Hart
     

06 Oct, 2009

1 commit

  • Rich reported a lock imbalance in the futex code:

    http://bugzilla.kernel.org/show_bug.cgi?id=14288

    It's caused by the displacement of the retry_private label in
    futex_wake_op(). The code unlocks the hash bucket locks in the
    error handling path and retries without locking them again which
    makes the next unlock fail.

    Move retry_private so we lock the hash bucket locks when we retry.
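
    The effect of the label's position can be shown with a flag-based lock
    that asserts on unbalanced use. The sketch is simplified to a single
    bucket and uses hypothetical demo_* names; the faults argument merely
    scripts how many times the error path is taken.

```c
#include <assert.h>

/* Hypothetical bucket; the flag catches double locks and stray unlocks. */
struct demo_bucket {
    int locked;
};

static void demo_lock(struct demo_bucket *b)
{
    assert(!b->locked);
    b->locked = 1;
}

static void demo_unlock(struct demo_bucket *b)
{
    assert(b->locked);
    b->locked = 0;
}

/*
 * Returns the number of attempts. The retry label sits before the lock
 * call, so each retry re-takes the lock that the error path released.
 */
int demo_wake_op(struct demo_bucket *b, int faults)
{
    int attempts = 0;

retry_private:
    demo_lock(b);
    attempts++;

    if (faults > 0) {
        faults--;
        demo_unlock(b);        /* error path drops the lock ... */
        goto retry_private;    /* ... and the retry locks it again */
    }

    demo_unlock(b);
    return attempts;
}
```

    If the label were placed after demo_lock(), the second pass would reach
    demo_unlock() with the lock not held and trip the assertion, mirroring
    the failed unlock in the report.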

    Reported-by: Rich Ercolany
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Darren Hart
    Cc: stable-2.6.31
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

25 Sep, 2009

1 commit


22 Sep, 2009

5 commits

  • PI futexes do not use the same plist_node_empty() test for wakeup.
    It was possible for the waiter (in futex_wait_requeue_pi()) to set
    TASK_INTERRUPTIBLE after the waker assigned the rtmutex to the
    waiter. The waiter would then note the plist was not empty and call
    schedule(). The task would not be found by any subsequent futex
    wakeups, resulting in a userspace hang.

    By moving the setting of TASK_INTERRUPTIBLE to before the call to
    queue_me(), the race with the waker is eliminated. Since we no
    longer call get_user() from within queue_me(), there is no need to
    delay the setting of TASK_INTERRUPTIBLE until after the call to
    queue_me().

    The FUTEX_LOCK_PI operation is not affected as futex_lock_pi()
    relies entirely on the rtmutex code to handle schedule() and
    wakeup. The requeue PI code is affected because the waiter starts
    as a non-PI waiter and is woken on a PI futex.

    Remove the crusty old comment about holding spinlocks() across
    get_user() as we no longer do that. Correct the locking statement
    with a description of why the test is performed.

    Signed-off-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Eric Dumazet
    Cc: Dinakar Guniguntala
    Cc: John Stultz
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     
  • Use kernel-doc format to describe struct futex_q.

    Correct the wakeup definition to eliminate the statement about
    waking the waiter between the plist_del() and the q->lock_ptr = 0.

    Note in the comment that PI futexes have a different definition of
    the woken state.

    Signed-off-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Eric Dumazet
    Cc: Dinakar Guniguntala
    Cc: John Stultz
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     
  • Make the existing function kernel-doc consistent throughout
    futex.c, following Documentation/kernel-doc-nano-howto.txt as
    closely as possible.

    When unsure, at least be consistent within futex.c.

    Signed-off-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Eric Dumazet
    Cc: Dinakar Guniguntala
    Cc: John Stultz
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     
  • The queue_me/unqueue_me commentary is oddly placed and out of date.
    Clean it up and correct the inaccurate bits.

    Signed-off-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Eric Dumazet
    Cc: Dinakar Guniguntala
    Cc: John Stultz
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     
  • Correct various typos and formatting inconsistencies in the
    commentary of futex_wait_requeue_pi().

    Signed-off-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Eric Dumazet
    Cc: Dinakar Guniguntala
    Cc: John Stultz
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     

12 Sep, 2009

1 commit


16 Aug, 2009

1 commit

  • There is currently no check to ensure that userspace uses the same
    futex requeue target (uaddr2) in futex_requeue() that the waiter used
    in futex_wait_requeue_pi(). A mismatch here could cause very unexpected
    results as the waiter assumes it either wakes on uaddr1 or uaddr2. We
    could detect this on wakeup in the waiter, but the cleanup is more
    intense after the improper requeue has occurred.

    This patch stores the waiter's expected requeue target in a new
    requeue_pi_key pointer in the futex_q which futex_requeue() checks
    prior to attempting to do a proxy lock acquisition or a requeue when
    requeue_pi=1. If they don't match, return -EINVAL from futex_requeue,
    aborting the requeue of any remaining waiters.

    Signed-off-by: Darren Hart
    Cc: Peter Zijlstra
    Cc: Eric Dumazet
    Cc: John Kacur
    Cc: Dinakar Guniguntala
    Cc: John Stultz
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Darren Hart
     

11 Aug, 2009

1 commit

  • If futex_requeue(requeue_pi=1) finds a futex_q that was created by a call
    other than futex_wait_requeue_pi(), the q.rt_waiter may be null. If so,
    this will result in an oops from the following call graph:

    futex_requeue()
      rt_mutex_start_proxy_lock()
        task_blocks_on_rt_mutex()
          waiter->task dereference
            OOPS

    We currently WARN_ON() if this is detected; clearly this is inadequate.
    If we detect a mispairing in futex_requeue(), bail out, sending -EINVAL to
    user-space.

    V2: Fix parenthesis warnings.

    Signed-off-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: John Kacur
    Cc: Eric Dumazet
    Cc: Dinakar Guniguntala
    Cc: John Stultz
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     

10 Aug, 2009

1 commit

  • futex_requeue() can acquire the lock on behalf of a waiter
    early on or during the requeue loop if it is uncontended or in
    the event of a lock steal or owner died. On wakeup, the waiter
    (in futex_wait_requeue_pi()) cleans up the pi_state owner using
    the lock_ptr to protect against concurrent access to the
    pi_state. The pi_state is hung off futex_q's on the requeue
    target futex hash bucket so the lock_ptr needs to be updated
    accordingly.

    The problem manifested by triggering the WARN_ON in
    lookup_pi_state() about the pid != pi_state->owner->pid. With
    this patch, the pi_state is properly guarded against concurrent
    access via the requeue target hb lock.

    The astute reviewer may notice that there is a window of time
    between when futex_requeue() unlocks the hb locks and when
    futex_wait_requeue_pi() will acquire hb2->lock. During this
    time the pi_state and uval are not in sync with the underlying
    rtmutex owner (but the uval does indicate there are waiters, so
    no atomic changes will occur in userspace). However, this is
    not a problem. Should a contending thread enter
    lookup_pi_state() and acquire hb2->lock before the ownership is
    fixed up, it will find the pi_state hung off a waiter's
    (possibly the pending owner's) futex_q and block on the
    rtmutex. Once futex_wait_requeue_pi() fixes up the owner, it
    will also move the pi_state from the old owner's
    task->pi_state_list to its own.

    v3: Fix plist lock name for application to mainline (rather
    than -rt). Compile tested against tip/v2.6.31-rc5.

    Signed-off-by: Darren Hart
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Eric Dumazet
    Cc: Dinakar Guniguntala
    Cc: John Stultz
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart