20 May, 2009

1 commit

  • The futex code installs a read only mapping via get_user_pages_fast()
    even if the futex op function has to modify user space data. The
    eventual fault was fixed up by futex_handle_fault() which walked the
    VMA with mmap_sem held.

    After the cleanup patches which removed the mmap_sem dependency of the
    futex code, commit 4dc5b7a36a49eff97050894cf1b3a9a02523717 (futex:
    clean up fault logic) removed the private VMA walk logic from the
    futex code. This change results in a stale RO mapping which is not
    fixed up.

    Instead of reintroducing the previous fault logic we set up the
    mapping in get_user_pages_fast() read/write for all operations which
    modify user space data. Also handle private futexes in the same way
    and make the current unconditional access_ok(VERIFY_WRITE) depend on
    the futex op.

    Reported-by: Andreas Schwab
    Signed-off-by: Thomas Gleixner
    CC: stable@kernel.org

    Thomas Gleixner
     

03 Apr, 2009

1 commit

  • We've tripped over the futex_requeue drop_count referring to key2
    instead of key1. The code is actually correct, but is non-intuitive.
    This patch adds an explicit comment explaining the requeue.

    Signed-off-by: Darren Hart
    Cc: Peter Zijlstra
    Cc: Nick Piggin
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Darren Hart
     

13 Mar, 2009

2 commits


12 Mar, 2009

6 commits

  • Impact: cleanup

    Older versions of the futex code held the mmap_sem which had to
    be dropped in order to call get_user(), so a two-pronged fault
    handling mechanism was employed to handle faults of the atomic
    operations. The mmap_sem is no longer held, so get_user()
    should be adequate. This patch greatly simplifies the logic and
    improves legibility.

    Build and boot tested on a 4 way Intel x86_64 workstation.
    Passes basic pthread_mutex and PI tests out of
    ltp/testcases/realtime.

    Signed-off-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     
  • Impact: rt-mutex failure case fix

    futex_lock_pi can potentially return -EFAULT with the rt_mutex
    held. This seems like the wrong thing to do as userspace should
    assume -EFAULT means the lock was not taken. Even if it could
    figure this out, we'd be leaving the pi_state->owner in an
    inconsistent state. This patch unlocks the rt_mutex prior to
    returning -EFAULT to userspace.

    Build and boot tested on a 4 way Intel x86_64 workstation.
    Passes basic pthread_mutex and PI tests out of
    ltp/testcases/realtime.

    Signed-off-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     
  • RT tasks should set their timer slack to 0 on their own. This
    patch removes the 'if (rt_task()) slack = 0;' block in
    futex_wait.

    Build and boot tested on a 4 way Intel x86_64 workstation.
    Passes basic pthread_mutex and PI tests out of
    ltp/testcases/realtime.

    Signed-off-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Arjan van de Ven
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     
  • Impact: cleanup

    The futex code uses double_lock_hb() which locks the hb->lock's
    in pointer value order. There is no parallel unlock routine,
    and the code unlocks them in name order, ignoring pointer value.

    This patch adds double_unlock_hb() to refactor the duplicated
    code segments.

    Build and boot tested on a 4 way Intel x86_64 workstation.
    Passes basic pthread_mutex and PI tests out of
    ltp/testcases/realtime.

    Signed-off-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     
  • Impact: fix races

    futex_requeue and futex_lock_pi still had some bad
    (get|put)_futex_key() usage. This patch adds the missing
    put_futex_keys() and corrects a goto in futex_lock_pi() to avoid
    a double get.

    Build and boot tested on a 4 way Intel x86_64 workstation.
    Passes basic pthread_mutex and PI tests out of
    ltp/testcases/realtime.

    Signed-off-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     
  • Impact: cleanup

    The futex_hash_bucket can be a bit confusing when first looking
    at the code as it is a shared queue (and futex_q isn't a queue
    at all, but rather an element on the queue).

    The mmap_sem is no longer held outside of the
    futex_handle_fault() routine, yet numerous comments refer to it.
    The fshared argument is now an integer. I left some of these
    comments alone as they are simply removed in future patches.

    Some of the commentary referring to futexes by virtual page
    mappings was not very clear, nor completely accurate (as for
    shared futexes both the page and the offset are used to
    determine the key). For the purposes of the function
    description, just referring to "the futex" seems sufficient.

    With hashed futexes we now access the page after the hash-bucket
    is locked, and not only after it is enqueued.

    Signed-off-by: Darren Hart
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
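
The lock ordering behind the double_lock_hb()/double_unlock_hb() cleanup above can be sketched in user space. This is a minimal illustration, not the kernel code: `struct bucket`, `bucket_lock()`, `bucket_trylock()`, `double_lock()` and `double_unlock()` are hypothetical names, and a C11 `atomic_flag` spinlock stands in for the hash bucket spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a futex hash bucket: only the lock matters here. */
struct bucket {
    atomic_flag lock;
};

static void bucket_lock(struct bucket *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void bucket_unlock(struct bucket *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Single attempt; returns true if the lock was free and is now held. */
static bool bucket_trylock(struct bucket *b)
{
    return !atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire);
}

/*
 * Take both locks in address order, so two callers passing the same
 * buckets in opposite argument order cannot deadlock against each other.
 * If both arguments name the same bucket, it is locked only once.
 */
static void double_lock(struct bucket *b1, struct bucket *b2)
{
    assert(b1 && b2);
    if ((uintptr_t)b1 <= (uintptr_t)b2) {
        bucket_lock(b1);
        if (b1 != b2)
            bucket_lock(b2);
    } else {
        bucket_lock(b2);
        bucket_lock(b1);
    }
}

/* Release order does not matter for correctness; only avoid a double unlock. */
static void double_unlock(struct bucket *b1, struct bucket *b2)
{
    assert(b1 && b2);
    bucket_unlock(b1);
    if (b1 != b2)
        bucket_unlock(b2);
}
```

A caller would pair `double_lock(&a, &b)` with `double_unlock(&a, &b)` regardless of which bucket has the lower address; the helper hides the same-bucket special case that otherwise gets duplicated at every unlock site.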
     

12 Feb, 2009

1 commit

  • Catalin noticed that (38d47c1b7075: futex: rely on get_user_pages() for
    shared futexes) caused an mm_struct leak.

    Some tracing with the function graph tracer quickly pointed out that
    futex_wait() has exit paths with unbalanced reference counts.

    This regression was discovered by kmemleak.

    Reported-by: Catalin Marinas
    Signed-off-by: Peter Zijlstra
    Tested-by: "Pallipadi, Venkatesh"
    Tested-by: Catalin Marinas
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

14 Jan, 2009

2 commits


06 Jan, 2009

1 commit


03 Jan, 2009

1 commit

  • Impact: add debug check

    Following up on my previous key reference accounting patches, this patch
    will catch puts on keys that haven't been "got". This won't catch nested
    get/put mismatches though.

    Build and boot tested, with minimal desktop activity and a run of the
    open_posix_testsuite in LTP for testing. No warnings logged.

    Signed-off-by: Darren Hart
    Cc:
    Signed-off-by: Ingo Molnar

    Darren Hart
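
A rough user-space sketch of the kind of check described above, and of its stated limitation. Everything here is an assumption made for illustration: `struct key`, `key_get()`, `key_put()` and the `put_without_get_warnings` counter are hypothetical, and the counter merely stands in for a kernel warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical key: 'ptr' is non-NULL only after a successful get. */
struct key {
    void *ptr;
    int refs;
};

/* Stands in for a logged warning; counts puts on keys that were never got. */
static int put_without_get_warnings;

/* Returns 0 on success, -1 if the get failed (the key stays empty). */
static int key_get(struct key *k, void *obj)
{
    assert(k != NULL);
    if (obj == NULL)
        return -1;
    k->ptr = obj;
    k->refs++;
    return 0;
}

/* Returns false (and records a warning) if the key was never got. */
static bool key_put(struct key *k)
{
    assert(k != NULL);
    if (k->ptr == NULL) {
        put_without_get_warnings++;
        return false;
    }
    /* 'ptr' is left set, so a second put on the same key is NOT detected. */
    k->refs--;
    return true;
}
```

Calling `key_put()` on a zeroed key trips the check, while two puts after a single get pass the check and leave `refs` at -1, which mirrors the "won't catch nested get/put mismatches" caveat.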
     

31 Dec, 2008

1 commit

  • * 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
    stacktrace: provide save_stack_trace_tsk() weak alias
    rcu: provide RCU options on non-preempt architectures too
    printk: fix discarding message when recursion_bug
    futex: clean up futex_(un)lock_pi fault handling
    "Tree RCU": scalable classic RCU implementation
    futex: rename field in futex_q to clarify single waiter semantics
    x86/swiotlb: add default swiotlb_arch_range_needs_mapping
    x86/swiotlb: add default physbus conversion
    x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
    x86: add swiotlb allocation functions
    swiotlb: consolidate swiotlb info message printing
    swiotlb: support bouncing of HighMem pages
    swiotlb: factor out copy to/from device
    swiotlb: add arch hook to force mapping
    swiotlb: allow architectures to override phys<->bus<->phys conversions
    swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
    rcu: fix rcutorture behavior during reboot
    resources: skip sanity check of busy resources
    swiotlb: move some definitions to header
    swiotlb: allow architectures to override swiotlb pool allocation
    ...

    Fix up trivial conflicts in
    arch/x86/kernel/Makefile
    arch/x86/mm/init_32.c
    include/linux/hardirq.h
    as per Ingo's suggestions.

    Linus Torvalds
     

30 Dec, 2008

1 commit

  • Impact: cleanup

    This patch makes the calls to futex_get_key_refs() and futex_drop_key_refs()
    explicitly symmetric by only "putting" keys we successfully "got". Also
    clean up a couple of return points that didn't "put" after a successful "get".

    Build and boot tested on an x86_64 system.

    Signed-off-by: Darren Hart
    Cc:
    Signed-off-by: Ingo Molnar

    Darren Hart
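
The symmetric get/put discipline described above is commonly expressed with unwinding goto labels. The following is a sketch under stated assumptions, not the patched kernel function: `key_get()`, `key_put()`, `two_key_op()` and the `live_refs` counter are hypothetical names used only to make the balance observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Outstanding references; 0 means every get has been matched by a put. */
static int live_refs;

/* Hypothetical get: 'ok' selects whether the lookup succeeds. */
static int key_get(bool ok)
{
    if (!ok)
        return -EFAULT;
    live_refs++;
    return 0;
}

static void key_put(void)
{
    assert(live_refs > 0);
    live_refs--;
}

/*
 * Operation on two keys. Each exit label undoes exactly the gets that
 * succeeded before the failure, so no path puts a key it never got and
 * no path returns with a reference still held.
 */
static int two_key_op(bool key1_ok, bool key2_ok, bool work_ok)
{
    int ret;

    ret = key_get(key1_ok);
    if (ret)
        goto out;
    ret = key_get(key2_ok);
    if (ret)
        goto out_put_key1;

    ret = work_ok ? 0 : -EAGAIN; /* placeholder for the real work */

    key_put(); /* key2 */
out_put_key1:
    key_put(); /* key1 */
out:
    return ret;
}
```

Whatever combination of failures is simulated, `live_refs` is back at zero when `two_key_op()` returns.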
     

19 Dec, 2008

1 commit

  • Impact: cleanup

    Some apparently left over cruft code was complicating the fault logic:

    Testing if uval != -EFAULT doesn't have any meaning: get_user() sets ret
    to either 0 or -EFAULT, so there's no need to compare uval, especially not
    against EFAULT which it will never be. This patch removes the superfluous
    test and clarifies the comment blocks.

    Build and boot tested on an 8way x86_64 system.

    Signed-off-by: Darren Hart
    Signed-off-by: Ingo Molnar

    Darren Hart
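
To illustrate why the value read must not be compared against an error code, here is a small user-space sketch. `get_user_sim()` is a hypothetical stand-in for get_user(), and a NULL source pointer is an assumption used to model an address that faults.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for get_user(): the status is the return value,
 * the data goes through 'dst'. The two never share a channel.
 */
static int get_user_sim(uint32_t *dst, const uint32_t *src)
{
    assert(dst != NULL);
    if (src == NULL)        /* models a faulting user address */
        return -EFAULT;
    *dst = *src;            /* any 32-bit pattern is a legitimate value */
    return 0;
}
```

A futex word may legitimately hold the bit pattern of `(uint32_t)-EFAULT`; the read still succeeds with a return value of 0, so only the return value says anything about faults.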
     

18 Dec, 2008

1 commit

  • Impact: simplify code

    I've tripped over the naming of this field a couple times.

    The futex_q uses a "waiters" list to represent a single blocked task and
    then calls wake_up_all().

    This can lead to confusion in trying to understand the intent of the code,
    which is to have a single futex_q for every task waiting on a futex.

    This patch corrects the problem, using a single pointer to the waiting
    task, and an appropriate call to wake_up, rather than wake_up_all.

    Compile and boot tested on an 8way x86_64 machine.

    Signed-off-by: Darren Hart
    Acked-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Darren Hart
     

25 Nov, 2008

2 commits


14 Nov, 2008

3 commits

  • Use RCU to access another task's creds and to release a task's own creds.
    This means that it will be possible for the credentials of a task to be
    replaced without another task (a) requiring a full lock to read them, and (b)
    seeing deallocated memory.

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • Separate the task security context from task_struct. At this point, the
    security data is temporarily embedded in the task_struct with two pointers
    pointing to it.

    Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
    entry.S via asm-offsets.

    With comment fixes Signed-off-by: Marc Dionne

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Al Viro
    Cc: linux-audit@redhat.com
    Cc: containers@lists.linux-foundation.org
    Cc: linux-mm@kvack.org
    Signed-off-by: James Morris

    David Howells
     

30 Sep, 2008

5 commits


11 Sep, 2008

1 commit


06 Sep, 2008

1 commit


23 Jun, 2008

1 commit

  • This patch addresses a very sporadic pi-futex related failure in
    highly threaded java apps on large SMP systems.

    David Holmes reported that the pi_state consistency check in
    lookup_pi_state triggered with his test application. This means that
    the kernel internal pi_state and the user space futex variable are out
    of sync. First we assumed that this is a user space data corruption,
    but deeper investigation revealed that the problem happened because the
    pi-futex code is not handling a fault in the futex_lock_pi path when
    the user space variable needs to be fixed up.

    The fault happens when a fork mapped the anon memory which contains
    the futex readonly for COW or the page got swapped out exactly between
    the unlock of the futex and the return of either the new futex owner
    or the task which was the expected owner but failed to acquire the
    kernel internal rtmutex. The current futex_lock_pi() code drops out
    with an inconsistent state in case it faults and returns -EFAULT to user
    space. User space has no way to fix up that state.

    When we wrote this code we thought that we could not drop the hash
    bucket lock at this point to handle the fault.

    After analysing the code again it turned out to be wrong because there
    are only two tasks involved which might modify the pi_state and the
    user space variable:

    - the task which acquired the rtmutex
    - the pending owner of the pi_state which did not get the rtmutex

    Both tasks drop into the fixup_pi_state() function before returning to
    user space. The first task which acquired the hash bucket lock faults
    in the fixup of the user space variable, drops the spinlock and calls
    futex_handle_fault() to fault in the page. Now the second task can
    acquire the hash bucket lock and try to fix up the user space
    variable as well. It either faults as well or it succeeds because the
    first task already faulted the page in.

    One caveat is to avoid a double fixup. After returning from the fault
    handling we reacquire the hash bucket lock and check whether the
    pi_state owner has been modified already.

    Reported-by: David Holmes
    Signed-off-by: Thomas Gleixner
    Cc: Andrew Morton
    Cc: David Holmes
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc:
    Signed-off-by: Ingo Molnar

    kernel/futex.c | 93 ++++++++++++++++++++++++++++++++++++++++++++-------------
    1 file changed, 73 insertions(+), 20 deletions(-)

    Thomas Gleixner
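
The drop-lock, fault-in, reacquire, recheck sequence described above can be modelled single-threaded. This is a simplified sketch under loud assumptions, not the kernel function: `struct user_word`, `struct pi_state`, `put_user_sim()`, `handle_fault_sim()`, `fixup_owner()` and `second_task_wins()` are all hypothetical, the lock is a plain flag, and the `other_task` callback simulates what the second task might do while the lock is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated user-space futex word; 'present' models a writable mapping. */
struct user_word { bool present; int value; };

/* Hypothetical kernel-side state guarded by the hash bucket lock. */
struct pi_state { int owner; bool hb_locked; };

static int put_user_sim(struct user_word *u, int v)
{
    if (!u->present)
        return -EFAULT;
    u->value = v;
    return 0;
}

static int handle_fault_sim(struct user_word *u)
{
    u->present = true;
    return 0;
}

/* Models the second task completing the fixup during the unlocked window. */
static void second_task_wins(struct pi_state *ps, struct user_word *u)
{
    u->value = 3;
    ps->owner = 3;
}

/* Called with the bucket lock held; returns with it held. */
static int fixup_owner(struct pi_state *ps, struct user_word *u, int new_owner,
                       void (*other_task)(struct pi_state *, struct user_word *))
{
    int old_owner = ps->owner;
    int ret;

    assert(ps->hb_locked);
    for (;;) {
        ret = put_user_sim(u, new_owner);
        if (ret == 0) {
            ps->owner = new_owner;
            return 0;
        }
        ps->hb_locked = false;          /* drop the lock: fault handling may sleep */
        ret = handle_fault_sim(u);
        if (other_task)
            other_task(ps, u);          /* another task may run here */
        ps->hb_locked = true;           /* reacquire */
        if (ps->owner != old_owner)     /* already fixed up: avoid a double fixup */
            return 0;
        if (ret)
            return ret;                 /* the fault could not be resolved */
    }
}
```

With no competing task, the first write faults, the page is faulted in, and the retry succeeds. With `second_task_wins` as the callback, the owner check after reacquiring the lock makes the function return without overwriting the other task's result.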
     

05 May, 2008

1 commit

  • Since FUTEX_FD was scheduled for removal in June 2007, let's remove it.

    Google Code search found no users for it and NGPT was abandoned in 2003
    according to IBM. futex.h is left untouched to make sure the id does
    not get reassigned. Since queue_me() has no users left it is commented
    out to avoid a warning; I didn't remove it completely since it is part of
    the internal api (matching unqueue_me())

    Signed-off-by: Eric Sesterhenn
    Signed-off-by: Rusty Russell (removed rest)
    Acked-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Eric Sesterhenn
     

30 Apr, 2008

1 commit

  • hrtimers now have dynamic users in the network code. Put them under
    debugobjects surveillance as well.

    Add calls to the generic object debugging infrastructure and provide fixup
    functions which keep the system alive when recoverable problems have
    been detected by the object debugging core code.

    Signed-off-by: Thomas Gleixner
    Cc: Greg KH
    Cc: Randy Dunlap
    Cc: Kay Sievers
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

31 Mar, 2008

1 commit


27 Mar, 2008

1 commit


24 Feb, 2008

2 commits

  • Not all architectures implement futex_atomic_cmpxchg_inatomic(). The default
    implementation returns -ENOSYS, which is currently not handled inside of the
    futex guts.

    Futex PI calls and robust list exits with a held futex result in an endless
    loop in the futex code on architectures which have no support.

    Fixing up every place where futex_atomic_cmpxchg_inatomic() is called would
    add a fair amount of extra if/else constructs to the already complex code. It
    is also not possible to disable the robust feature before user space tries to
    register robust lists.

    Compile time disabling is not a good idea either, as there are already
    architectures with runtime detection of futex_atomic_cmpxchg_inatomic support.

    Detect the functionality at runtime instead by calling
    cmpxchg_futex_value_locked() with a NULL pointer from the futex initialization
    code. This is guaranteed to fail, but the call of
    futex_atomic_cmpxchg_inatomic() happens with pagefaults disabled.

    On architectures which use the asm-generic implementation or have a runtime
    CPU feature detection, a -ENOSYS return value disables the PI/robust features.

    On architectures with a working implementation the call returns -EFAULT and
    the PI/robust features are enabled.

    When the detection fails, the relevant syscalls return -ENOSYS and the
    robust list exit code is blocked.

    Fixes http://lkml.org/lkml/2008/2/11/149
    Originally reported by: Lennert Buytenhek

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Cc: Lennert Buytenhek
    Cc: Riku Voipio
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • When the futex init code fails to initialize the futex pseudo file system it
    returns early without initializing the hash queues. Should the boot succeed
    then a futex syscall which tries to enqueue a waiter on the hashqueue will
    crash due to the uninitialized plist heads.

    Initialize the hash queues before the filesystem.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Cc: Lennert Buytenhek
    Cc: Riku Voipio
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
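
The runtime detection described in the first commit above can be sketched as a probe with a NULL pointer at init time. This is an illustration only: `cmpxchg_fn`, `cmpxchg_unsupported()`, `cmpxchg_supported()`, `futex_init_sketch()` and `pi_syscall_sketch()` are hypothetical names, and the NULL check models the fault that the real probe relies on.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature for an architecture's atomic compare-and-exchange. */
typedef int (*cmpxchg_fn)(uint32_t *uaddr, uint32_t oldval, uint32_t newval);

/* Models the generic fallback: the operation is not implemented. */
static int cmpxchg_unsupported(uint32_t *uaddr, uint32_t oldval, uint32_t newval)
{
    (void)uaddr; (void)oldval; (void)newval;
    return -ENOSYS;
}

/* Models a working implementation; NULL stands in for a faulting address. */
static int cmpxchg_supported(uint32_t *uaddr, uint32_t oldval, uint32_t newval)
{
    uint32_t cur;

    if (uaddr == NULL)
        return -EFAULT;
    cur = *uaddr;
    if (cur == oldval)
        *uaddr = newval;
    return (int)cur;
}

static bool cmpxchg_enabled;

/* Probe once: a working implementation must fault on NULL, not say ENOSYS. */
static void futex_init_sketch(cmpxchg_fn arch_cmpxchg)
{
    assert(arch_cmpxchg != NULL);
    cmpxchg_enabled = (arch_cmpxchg(NULL, 0, 0) == -EFAULT);
}

/* Entry point of a PI/robust operation: refuse early if unsupported. */
static int pi_syscall_sketch(void)
{
    return cmpxchg_enabled ? 0 : -ENOSYS;
}
```

The design keeps the check in one place: callers of the compare-and-exchange never see -ENOSYS, because the features that need it are switched off before any of them can run.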
     

15 Feb, 2008

1 commit

  • Various user space callers ask for relative timeouts. While we fixed
    that overflow issue in hrtimer_start(), the sites which convert
    relative user space values to absolute timeouts themselves were not covered.

    Instead of putting overflow checks into each place add a function
    which does the sanity checking and convert all affected callers to use
    it.

    Thanks to Frans Pop, who reported the problem and tested the fixes.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Tested-by: Frans Pop

    Thomas Gleixner
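
A sanity-checking helper of the kind described above might look like the following sketch. The name `timeout_add_safe()`, the `TIMEOUT_MAX` constant, and the use of signed 64-bit nanoseconds are assumptions for illustration; the commit text does not name the function it adds.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed representation: signed 64-bit nanoseconds, saturating at the maximum. */
#define TIMEOUT_MAX INT64_MAX

/*
 * Convert a relative timeout to an absolute one. If the sum would
 * overflow, clamp to TIMEOUT_MAX instead of wrapping to a negative
 * value that would expire immediately.
 */
static int64_t timeout_add_safe(int64_t now_ns, int64_t rel_ns)
{
    assert(now_ns >= 0 && rel_ns >= 0);
    if (rel_ns > TIMEOUT_MAX - now_ns)  /* checked without overflowing */
        return TIMEOUT_MAX;
    return now_ns + rel_ns;
}
```

Centralising the check means each call site that turns a user-supplied relative value into an absolute expiry only has to swap a plain addition for this helper.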
     

02 Feb, 2008

1 commit

  • To allow the implementation of optimized rw-locks in user space, glibc
    needs a possibility to select waiters for wakeup depending on a bitset
    mask.

    This requires two new futex OPs: FUTEX_WAIT_BITS and FUTEX_WAKE_BITS.
    These OPs are basically the same as FUTEX_WAIT and FUTEX_WAKE plus an
    additional argument - a bitset. Further the FUTEX_WAIT_BITS OP is
    expecting an absolute timeout value instead of the relative one, which
    is used for the FUTEX_WAIT OP.

    FUTEX_WAIT_BITS calls into the kernel with a bitset. The bitset is
    stored in the futex_q structure, which is used to enqueue the waiter
    into the hashed futex waitqueue.

    FUTEX_WAKE_BITS also calls into the kernel with a bitset. The wakeup
    function logically ANDs the bitset with the bitset stored in each
    waiter's futex_q structure. If the result is zero (i.e. none of the set
    bits in the bitsets is matching), then the waiter is not woken up. If
    the result is not zero (i.e. one of the set bits in the bitsets is
    matching), then the waiter is woken.

    The bitset provided by the caller must be non zero. In case the
    provided bitset is zero the kernel returns EINVAL.

    Internally the new OPs are only extensions to the existing FUTEX_WAIT
    and FUTEX_WAKE functions. The existing OPs hand a bitset with all bits
    set into the futex_wait() and futex_wake() functions.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
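
The bitset matching rule described above can be sketched over a plain array of waiters. This is a simplified model, not the kernel implementation: `struct waiter`, `wake_bitset()`, `wake_plain()` and `BITSET_MATCH_ANY` are hypothetical names, and an array stands in for the hashed wait queue.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All bits set: what the plain wait/wake operations pass internally. */
#define BITSET_MATCH_ANY 0xffffffffu

/* Hypothetical queued waiter; 'bitset' was stored when it started waiting. */
struct waiter {
    uint32_t bitset;
    bool woken;
};

/*
 * Wake up to nr_wake waiters whose stored bitset shares at least one set
 * bit with the caller's bitset. Returns the number woken, or -EINVAL for
 * an empty bitset.
 */
static int wake_bitset(struct waiter *q, size_t n, int nr_wake, uint32_t bitset)
{
    int woken = 0;

    assert(q != NULL || n == 0);
    if (bitset == 0)
        return -EINVAL;
    for (size_t i = 0; i < n && woken < nr_wake; i++) {
        if (q[i].woken || !(q[i].bitset & bitset))
            continue;               /* no common bit: leave it asleep */
        q[i].woken = true;
        woken++;
    }
    return woken;
}

/* The plain wake is the bitset wake with every bit set. */
static int wake_plain(struct waiter *q, size_t n, int nr_wake)
{
    return wake_bitset(q, n, nr_wake, BITSET_MATCH_ANY);
}
```

With waiters holding bitsets 0x1, 0x2 and 0x6, a wake with bitset 0x2 wakes the second and third and leaves the first queued; a subsequent plain wake then wakes the first.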