20 May, 2009
1 commit
-
The futex code installs a read-only mapping via get_user_pages_fast()
even if the futex op function has to modify user space data. The
eventual fault was fixed up by futex_handle_fault(), which walked the
VMA with mmap_sem held.

After the cleanup patches which removed the mmap_sem dependency of the
futex code, commit 4dc5b7a36a49eff97050894cf1b3a9a02523717 (futex:
clean up fault logic) removed the private VMA walk logic from the
futex code. This change results in a stale RO mapping which is not
fixed up.

Instead of reintroducing the previous fault logic, we set up the
mapping in get_user_pages_fast() read/write for all operations which
modify user space data. Also handle private futexes in the same way
and make the current unconditional access_ok(VERIFY_WRITE) depend on
the futex op.

Reported-by: Andreas Schwab
Signed-off-by: Thomas Gleixner
CC: stable@kernel.org
03 Apr, 2009
1 commit
-
We've tripped over the futex_requeue drop_count referring to key2
instead of key1. The code is actually correct, but is non-intuitive.
This patch adds an explicit comment explaining the requeue.

Signed-off-by: Darren Hart
Cc: Peter Zijlstra
Cc: Nick Piggin
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar
13 Mar, 2009
2 commits
-
Impact: fix double unlock crash
Thomas Gleixner noticed that the simplified double_unlock_hb()
became ... too unsophisticated: in the hb1 == hb2 case it will
do a double unlock.

Reported-by: Thomas Gleixner
Cc: Darren Hart
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Impact: simplify code
I mistakenly included the pointer value ordering in the
double_unlock_hb() in my previous patch. It's only necessary
in the double_lock_hb() function. This patch removes it.

Signed-off-by: Darren Hart
Acked-by: Peter Zijlstra
Cc: Rusty Russell
LKML-Reference:
Signed-off-by: Ingo Molnar
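The locking rule behind these two fixes can be illustrated with a small user-space sketch. POSIX mutexes stand in for the kernel's hash-bucket spinlocks. The function names mirror the kernel ones only for readability, and this is not the kernel code. Locking in address order avoids ABBA deadlocks. Unlocking needs no ordering, but when both futexes hash to the same bucket it must not release that lock twice.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a futex hash bucket: only the lock matters. */
struct hash_bucket {
    pthread_mutex_t lock;
};

/* Lock two buckets in a fixed (address) order so that two threads locking
 * the same pair from opposite ends cannot deadlock. If both futexes hash to
 * the same bucket, take its lock only once. */
static void double_lock_hb(struct hash_bucket *hb1, struct hash_bucket *hb2)
{
    assert(hb1 != NULL && hb2 != NULL);
    if ((uintptr_t)hb1 > (uintptr_t)hb2) {
        struct hash_bucket *tmp = hb1;
        hb1 = hb2;
        hb2 = tmp;
    }
    pthread_mutex_lock(&hb1->lock);
    if (hb1 != hb2)
        pthread_mutex_lock(&hb2->lock);
}

/* Unlock order does not matter for deadlock avoidance, so no pointer
 * comparison is needed here; the hb1 == hb2 case must still unlock once. */
static void double_unlock_hb(struct hash_bucket *hb1, struct hash_bucket *hb2)
{
    pthread_mutex_unlock(&hb1->lock);
    if (hb1 != hb2)
        pthread_mutex_unlock(&hb2->lock);
}
```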
12 Mar, 2009
6 commits
-
Impact: cleanup
Older versions of the futex code held the mmap_sem which had to
be dropped in order to call get_user(), so a two-pronged fault
handling mechanism was employed to handle faults of the atomic
operations. The mmap_sem is no longer held, so get_user()
should be adequate. This patch greatly simplifies the logic and
improves legibility.

Build and boot tested on a 4 way Intel x86_64 workstation.
Passes basic pthread_mutex and PI tests out of
ltp/testcases/realtime.

Signed-off-by: Darren Hart
Acked-by: Peter Zijlstra
Cc: Rusty Russell
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Impact: rt-mutex failure case fix
futex_lock_pi can potentially return -EFAULT with the rt_mutex
held. This seems like the wrong thing to do as userspace should
assume -EFAULT means the lock was not taken. Even if it could
figure this out, we'd be leaving the pi_state->owner in an
inconsistent state. This patch unlocks the rt_mutex prior to
returning -EFAULT to userspace.

Build and boot tested on a 4 way Intel x86_64 workstation.
Passes basic pthread_mutex and PI tests out of
ltp/testcases/realtime.

Signed-off-by: Darren Hart
Acked-by: Peter Zijlstra
Cc: Rusty Russell
LKML-Reference:
Signed-off-by: Ingo Molnar
-
RT tasks should set their timer slack to 0 on their own. This
patch removes the 'if (rt_task()) slack = 0;' block in
futex_wait.

Build and boot tested on a 4 way Intel x86_64 workstation.
Passes basic pthread_mutex and PI tests out of
ltp/testcases/realtime.

Signed-off-by: Darren Hart
Acked-by: Peter Zijlstra
Cc: Rusty Russell
Cc: Arjan van de Ven
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Impact: cleanup
The futex code uses double_lock_hb(), which takes the hb->lock of both
buckets in pointer value order. There is no parallel unlock routine,
and the code unlocks them in name order, ignoring pointer value.

This patch adds double_unlock_hb() to refactor the duplicated
code segments.

Build and boot tested on a 4 way Intel x86_64 workstation.
Passes basic pthread_mutex and PI tests out of
ltp/testcases/realtime.

Signed-off-by: Darren Hart
Acked-by: Peter Zijlstra
Cc: Rusty Russell
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Impact: fix races
futex_requeue and futex_lock_pi still had some bad
(get|put)_futex_key() usage. This patch adds the missing
put_futex_keys() and corrects a goto in futex_lock_pi() to avoid
a double get.

Build and boot tested on a 4 way Intel x86_64 workstation.
Passes basic pthread_mutex and PI tests out of
ltp/testcases/realtime.

Signed-off-by: Darren Hart
Acked-by: Peter Zijlstra
Cc: Rusty Russell
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Impact: cleanup
The futex_hash_bucket can be a bit confusing when first looking
at the code as it is a shared queue (and futex_q isn't a queue
at all, but rather an element on the queue).

The mmap_sem is no longer held outside of the
futex_handle_fault() routine, yet numerous comments refer to it.
The fshared argument is now an integer. I left some of these
comments alone as they are simply removed in future patches.

Some of the commentary referring to futexes by virtual page
mappings was not very clear, nor completely accurate (as for
shared futexes both the page and the offset are used to
determine the key). For the purposes of the function
description, just referring to "the futex" seems sufficient.

With hashed futexes we now access the page after the hash-bucket
is locked, and not only after it is enqueued.

Signed-off-by: Darren Hart
Acked-by: Peter Zijlstra
Cc: Rusty Russell
LKML-Reference:
Signed-off-by: Ingo Molnar
12 Feb, 2009
1 commit
-
Catalin noticed that (38d47c1b7075: futex: rely on get_user_pages() for
shared futexes) caused an mm_struct leak.

Some tracing with the function graph tracer quickly pointed out that
futex_wait() has exit paths with unbalanced reference counts.

This regression was discovered by kmemleak.
Reported-by: Catalin Marinas
Signed-off-by: Peter Zijlstra
Tested-by: "Pallipadi, Venkatesh"
Tested-by: Catalin Marinas
Signed-off-by: Ingo Molnar
14 Jan, 2009
2 commits
-
Signed-off-by: Heiko Carstens
-
Signed-off-by: Heiko Carstens
06 Jan, 2009
1 commit
03 Jan, 2009
1 commit
-
Impact: add debug check
Following up on my previous key reference accounting patches, this patch
will catch puts on keys that haven't been "got". This won't catch nested
get/put mismatches though.

Build and boot tested, with minimal desktop activity and a run of the
open_posix_testsuite in LTP for testing. No warnings logged.

Signed-off-by: Darren Hart
Cc:
Signed-off-by: Ingo Molnar
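The following is a minimal user-space sketch of this kind of check. It assumes, hypothetically, that a key which was never successfully "got" can be recognized by an empty pointer field. The struct and function names are illustrative, not the kernel's. The comment in put_key() marks the stated limitation: a second put on a key that was got once is not caught.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical futex key: ptr is only filled in by a successful get. */
struct key {
    void *ptr;
    int refs;
};

static int bad_puts;   /* puts rejected by the debug check */

static void get_key(struct key *k, void *object)
{
    assert(object != NULL);
    k->ptr = object;
    k->refs++;
}

static void put_key(struct key *k)
{
    /* Debug check: a key that was never got has no ptr. Report it and bail
     * out instead of dropping a reference that was never taken. */
    if (k->ptr == NULL) {
        bad_puts++;
        return;
    }
    /* Not caught: a second put on a key that was got once (a nested
     * get/put mismatch) passes the check and just decrements refs again. */
    k->refs--;
}
```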
31 Dec, 2008
1 commit
-
* 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
stacktrace: provide save_stack_trace_tsk() weak alias
rcu: provide RCU options on non-preempt architectures too
printk: fix discarding message when recursion_bug
futex: clean up futex_(un)lock_pi fault handling
"Tree RCU": scalable classic RCU implementation
futex: rename field in futex_q to clarify single waiter semantics
x86/swiotlb: add default swiotlb_arch_range_needs_mapping
x86/swiotlb: add default physbus conversion
x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
x86: add swiotlb allocation functions
swiotlb: consolidate swiotlb info message printing
swiotlb: support bouncing of HighMem pages
swiotlb: factor out copy to/from device
swiotlb: add arch hook to force mapping
swiotlb: allow architectures to override phys<->bus<->phys conversions
swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
rcu: fix rcutorture behavior during reboot
resources: skip sanity check of busy resources
swiotlb: move some definitions to header
swiotlb: allow architectures to override swiotlb pool allocation
...

Fix up trivial conflicts in
arch/x86/kernel/Makefile
arch/x86/mm/init_32.c
include/linux/hardirq.h
as per Ingo's suggestions.
30 Dec, 2008
1 commit
-
Impact: cleanup
This patch makes the calls to futex_get_key_refs() and futex_drop_key_refs()
explicitly symmetric by only "putting" keys we successfully "got". Also
clean up a couple of return points that didn't "put" after a successful "get".

Build and boot tested on an x86_64 system.
Signed-off-by: Darren Hart
Cc:
Signed-off-by: Ingo Molnar
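The "only put what you got" rule maps naturally onto goto-based unwinding. Each error exit jumps to the label that releases exactly the references taken so far. Below is a sketch with hypothetical names (struct key, get_key(), put_key(), two_key_op()). It is not the futex code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical key with a reference count; 'fail' makes get_key() fail. */
struct key {
    int refs;
    int fail;
};

static int get_key(struct key *k)
{
    if (k->fail)
        return -EFAULT;
    k->refs++;
    return 0;
}

static void put_key(struct key *k)
{
    k->refs--;
}

/* Hypothetical two-key operation: every exit path puts exactly the keys
 * that were successfully got, in reverse order. */
static int two_key_op(struct key *k1, struct key *k2)
{
    int ret;

    assert(k1 != NULL && k2 != NULL);
    ret = get_key(k1);
    if (ret)
        goto out;            /* nothing was got: nothing to put */
    ret = get_key(k2);
    if (ret)
        goto out_put_k1;     /* only k1 was got */

    /* ... the actual work on both keys would go here ... */

    put_key(k2);
out_put_k1:
    put_key(k1);
out:
    return ret;
}
```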
19 Dec, 2008
1 commit
-
Impact: cleanup
Some apparently left over cruft code was complicating the fault logic:
Testing if uval != -EFAULT doesn't have any meaning: get_user() sets ret
to either 0 or -EFAULT, so there's no need to compare uval, especially not
against EFAULT, which it will never be. This patch removes the superfluous
test and clarifies the comment blocks.

Build and boot tested on an 8way x86_64 system.
Signed-off-by: Darren Hart
Signed-off-by: Ingo Molnar
18 Dec, 2008
1 commit
-
Impact: simplify code
I've tripped over the naming of this field a couple times.
The futex_q uses a "waiters" list to represent a single blocked task and
then calls wake_up_all().

This can lead to confusion in trying to understand the intent of the code,
which is to have a single futex_q for every task waiting on a futex.

This patch corrects the problem, using a single pointer to the waiting
task, and an appropriate call to wake_up, rather than wake_up_all.

Compile and boot tested on an 8way x86_64 machine.
Signed-off-by: Darren Hart
Acked-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
25 Nov, 2008
2 commits
-
FUTEX_WAIT_BITSET could be used instead of FUTEX_WAIT by setting the
bit set to FUTEX_BITSET_MATCH_ANY, but FUTEX_WAIT uses CLOCK_REALTIME
while FUTEX_WAIT_BITSET uses CLOCK_MONOTONIC.

Add a flag to select CLOCK_REALTIME for FUTEX_WAIT_BITSET so glibc can
replace the FUTEX_WAIT logic which needs to do gettimeofday() calls
before and after the syscall to convert the absolute timeout to a
relative timeout for FUTEX_WAIT.

Signed-off-by: Thomas Gleixner
Cc: Ulrich Drepper
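With this flag, a user-space caller can pass an absolute CLOCK_REALTIME deadline straight to the kernel. It no longer has to convert the deadline to a relative timeout around the syscall. The sketch below assumes a Linux system whose <linux/futex.h> provides FUTEX_WAIT_BITSET, FUTEX_CLOCK_REALTIME and FUTEX_BITSET_MATCH_ANY. The helper name wait_realtime_abs() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: sleep while *uaddr == expected, but no later than an
 * absolute CLOCK_REALTIME deadline timeout_ms from now. Returns 0 when woken,
 * or -1 with errno set (EAGAIN if *uaddr != expected, ETIMEDOUT, EINTR...). */
static int wait_realtime_abs(uint32_t *uaddr, uint32_t expected, long timeout_ms)
{
    struct timespec deadline;

    assert(timeout_ms >= 0);
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
        return -1;
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* FUTEX_WAIT_BITSET interprets the timeout as absolute.
     * FUTEX_CLOCK_REALTIME measures it against CLOCK_REALTIME instead of the
     * default CLOCK_MONOTONIC, and FUTEX_BITSET_MATCH_ANY makes the wait
     * behave like a plain FUTEX_WAIT with respect to wakeups. */
    return (int)syscall(SYS_futex, uaddr,
                        FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME,
                        expected, &deadline, NULL, FUTEX_BITSET_MATCH_ANY);
}
```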
14 Nov, 2008
3 commits
-
Use RCU to access another task's creds and to release a task's own creds.
This means that it will be possible for the credentials of a task to be
replaced without another task (a) requiring a full lock to read them, and (b)
seeing deallocated memory.

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris
-
Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.

Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.

With comment fixes Signed-off-by: Marc Dionne
Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris
-
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells
Reviewed-by: James Morris
Acked-by: Serge Hallyn
Cc: Al Viro
Cc: linux-audit@redhat.com
Cc: containers@lists.linux-foundation.org
Cc: linux-mm@kvack.org
Signed-off-by: James Morris
30 Sep, 2008
5 commits
-
With the get_user_pages_fast() patches we made get_futex_key() obtain a
reference on the returned key, but failed to do so for private futexes.

Signed-off-by: Peter Zijlstra
Acked-by: Nick Piggin
Signed-off-by: Ingo Molnar
-
fshared doesn't need to be a rw_sem pointer anymore, so clean that up.

Signed-off-by: Peter Zijlstra
Acked-by: Nick Piggin
Signed-off-by: Ingo Molnar
-
Replace the get_user_pages() call with fast_gup(), which doesn't require holding
the mmap_sem, thereby removing the mmap_sem from all fast paths.

Signed-off-by: Peter Zijlstra
Acked-by: Nick Piggin
Signed-off-by: Ingo Molnar
-
Now that we rely on get_user_pages() for the shared key handling,
move all the mmap_sem handling closely around the slow paths.

Signed-off-by: Peter Zijlstra
Acked-by: Nick Piggin
Signed-off-by: Ingo Molnar
-
On the way to getting rid of the mmap_sem requirement for shared futexes,
start by relying on get_user_pages().

Signed-off-by: Peter Zijlstra
Acked-by: Nick Piggin
Signed-off-by: Ingo Molnar
11 Sep, 2008
1 commit
-
This patch makes the futex() system call use the per process
slack value; with this, users are able to externally control existing
applications to reduce the wakeup rate.

Signed-off-by: Arjan van de Ven
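Once futex timeouts honor the slack value, a thread that wants tight wakeups can shrink its own slack with prctl(2). The minimal sketch below uses PR_SET_TIMERSLACK and PR_GET_TIMERSLACK. The helper name set_timer_slack_ns() is hypothetical. Note that prctl(2) exposes a per-thread value. Passing 0 resets the slack to the thread's default rather than disabling it, so 1 ns is the smallest value that can be requested this way.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Hypothetical helper: set the calling thread's timer slack (in ns) and
 * return the value the kernel now reports, or -1 on error.
 * PR_SET_TIMERSLACK treats 0 as "reset to the default slack", so it cannot
 * be used to request zero slack; 1 ns is the smallest explicit value. */
static long set_timer_slack_ns(unsigned long ns)
{
    assert(ns > 0);
    if (prctl(PR_SET_TIMERSLACK, ns, 0UL, 0UL, 0UL) == -1)
        return -1;
    return prctl(PR_GET_TIMERSLACK, 0UL, 0UL, 0UL, 0UL);
}
```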
06 Sep, 2008
1 commit
-
In order to be able to do range hrtimers we need to use accessor functions
to the "expire" member of the hrtimer struct.
This patch converts kernel/* to these accessors.

Signed-off-by: Arjan van de Ven
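Accessors help here because a timer that carries an expiry range needs two fields instead of one. A setter can then keep both ends consistent without touching any caller. The user-space sketch below uses a hypothetical struct range_timer and hypothetical accessor names. It is not the kernel's hrtimer API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timer with a range: it may fire anywhere between the soft
 * and the hard expiry. Callers go through the accessors below only. */
struct range_timer {
    int64_t soft_expires_ns;
    int64_t hard_expires_ns;
};

/* Single deadline: a range of zero width. */
static void timer_set_expires(struct range_timer *t, int64_t when)
{
    t->soft_expires_ns = when;
    t->hard_expires_ns = when;
}

/* Range deadline: fire no earlier than 'when', no later than when + slack. */
static void timer_set_expires_range(struct range_timer *t, int64_t when,
                                    int64_t slack)
{
    assert(slack >= 0);
    t->soft_expires_ns = when;
    t->hard_expires_ns = when + slack;
}

/* Latest allowed expiry. */
static int64_t timer_get_expires(const struct range_timer *t)
{
    return t->hard_expires_ns;
}

/* Earliest allowed expiry. */
static int64_t timer_get_softexpires(const struct range_timer *t)
{
    return t->soft_expires_ns;
}
```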
23 Jun, 2008
1 commit
-
This patch addresses a very sporadic pi-futex related failure in
highly threaded java apps on large SMP systems.

David Holmes reported that the pi_state consistency check in
lookup_pi_state triggered with his test application. This means that
the kernel internal pi_state and the user space futex variable are out
of sync. First we assumed that this is a user space data corruption,
but deeper investigation revealed that the problem happened because the
pi-futex code is not handling a fault in the futex_lock_pi path when
the user space variable needs to be fixed up.

The fault happens when a fork mapped the anon memory which contains
the futex read-only for COW or the page got swapped out exactly between
the unlock of the futex and the return of either the new futex owner
or the task which was the expected owner but failed to acquire the
kernel internal rtmutex. The current futex_lock_pi() code drops out
with an inconsistent state in case it faults and returns -EFAULT to user
space. User space has no way to fix up that state.

When we wrote this code we thought that we could not drop the hash
bucket lock at this point to handle the fault.

After analysing the code again it turned out to be wrong because there
are only two tasks involved which might modify the pi_state and the
user space variable:

- the task which acquired the rtmutex
- the pending owner of the pi_state which did not get the rtmutex

Both tasks drop into the fixup_pi_state() function before returning to
user space. The first task which acquired the hash bucket lock faults
in the fixup of the user space variable, drops the spinlock and calls
futex_handle_fault() to fault in the page. Now the second task could
acquire the hash bucket lock and try to fix up the user space
variable as well. It either faults as well or it succeeds because the
first task already faulted the page in.

One caveat is to avoid a double fixup. After returning from the fault
handling we reacquire the hash bucket lock and check whether the
pi_state owner has been modified already.

Reported-by: David Holmes
Signed-off-by: Thomas Gleixner
Cc: Andrew Morton
Cc: David Holmes
Cc: Peter Zijlstra
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc:
Signed-off-by: Ingo Molnar

kernel/futex.c | 93 ++++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 73 insertions(+), 20 deletions(-)
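The drop-lock, fault-in, relock, recheck sequence described above can be sketched in user space. Everything in the sketch is hypothetical. A POSIX mutex stands in for the hash bucket lock, and an atomic flag stands in for the user page being mapped writable. fixup_owner() is not the kernel function. The recheck after relocking is what prevents the double fixup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared state: 'owner' mirrors pi_state->owner, 'user_word'
 * mirrors the user space futex value that must be brought in sync. */
struct pi_shared {
    pthread_mutex_t hb_lock;
    int owner;
    int user_word;
    atomic_int user_page_present;   /* 0: writing user_word would "fault" */
};

/* Simulated fault handling; called without hb_lock held, since in the
 * kernel it may sleep. */
static void fault_in_user_page(struct pi_shared *s)
{
    atomic_store(&s->user_page_present, 1);
}

/* Called with hb_lock held and returns with it held. Returns 1 if this call
 * did the fixup, 0 if another task had already done it meanwhile. */
static int fixup_owner(struct pi_shared *s, int new_owner)
{
    while (!atomic_load(&s->user_page_present)) {
        /* "Fault": drop the lock, fault the page in, retake the lock. */
        pthread_mutex_unlock(&s->hb_lock);
        fault_in_user_page(s);
        pthread_mutex_lock(&s->hb_lock);

        /* The other task may have fixed things up while the lock was
         * dropped: avoid a double fixup. */
        if (s->owner == new_owner)
            return 0;
    }
    s->user_word = new_owner;
    s->owner = new_owner;
    return 1;
}
```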
05 May, 2008
1 commit
-
Since FUTEX_FD was scheduled for removal in June 2007, let's remove it.
Google Code search found no users for it and NGPT was abandoned in 2003
according to IBM. futex.h is left untouched to make sure the id does
not get reassigned. Since queue_me() has no users left, it is commented
out to avoid a warning; I didn't remove it completely since it is part of
the internal API (matching unqueue_me()).

Signed-off-by: Eric Sesterhenn
Signed-off-by: Rusty Russell (removed rest)
Acked-by: Thomas Gleixner
Signed-off-by: Linus Torvalds
30 Apr, 2008
1 commit
-
hrtimers now have dynamic users in the network code. Put them under
debugobjects surveillance as well.

Add calls to the generic object debugging infrastructure and provide fixup
functions which keep the system alive when recoverable problems have
been detected by the object debugging core code.

Signed-off-by: Thomas Gleixner
Cc: Greg KH
Cc: Randy Dunlap
Cc: Kay Sievers
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
31 Mar, 2008
1 commit
-
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds
27 Mar, 2008
1 commit
-
The futex init function is called init(). This is a pain in the neck
when debugging, when your code dies in ... init :-)

This renames it to futex_init().
Signed-off-by: Benjamin Herrenschmidt
Signed-off-by: Linus Torvalds
24 Feb, 2008
2 commits
-
Not all architectures implement futex_atomic_cmpxchg_inatomic(). The default
implementation returns -ENOSYS, which is currently not handled inside of the
futex guts.

Futex PI calls and robust list exits with a held futex result in an endless
loop in the futex code on architectures which have no support.

Fixing up every place where futex_atomic_cmpxchg_inatomic() is called would
add a fair amount of extra if/else constructs to the already complex code. It
is also not possible to disable the robust feature before user space tries to
register robust lists.

Compile time disabling is not a good idea either, as there are already
architectures with runtime detection of futex_atomic_cmpxchg_inatomic support.

Detect the functionality at runtime instead by calling
cmpxchg_futex_value_locked() with a NULL pointer from the futex initialization
code. This is guaranteed to fail, but the call of
futex_atomic_cmpxchg_inatomic() happens with pagefaults disabled.

On architectures which use the asm-generic implementation or have a runtime
CPU feature detection, a -ENOSYS return value disables the PI/robust features.

On architectures with a working implementation the call returns -EFAULT and
the PI/robust features are enabled.

The relevant syscalls return -ENOSYS and the robust list exit code is blocked
when the detection fails.

Fixes http://lkml.org/lkml/2008/2/11/149
Originally reported by: Lennert Buytenhek

Signed-off-by: Thomas Gleixner
Acked-by: Ingo Molnar
Cc: Lennert Buytenhek
Cc: Riku Voipio
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
When the futex init code fails to initialize the futex pseudo file system it
returns early without initializing the hash queues. Should the boot succeed
then a futex syscall which tries to enqueue a waiter on the hashqueue will
crash due to the uninitialized plist heads.

Initialize the hash queues before the filesystem.
Signed-off-by: Thomas Gleixner
Acked-by: Ingo Molnar
Cc: Lennert Buytenhek
Cc: Riku Voipio
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
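Taken together, these two changes suggest an init order. First set up the hash queues unconditionally. Then probe the atomic cmpxchg with a NULL address. Only after that, do the parts that may fail. The user-space sketch below assumes that order. The stand-in cmpxchg functions, the bucket struct and futex_init_sketch() are all hypothetical. The stand-ins do not reproduce the kernel's return conventions beyond -ENOSYS and -EFAULT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BUCKETS 16

/* Hypothetical hash bucket: just an "initialized" marker. */
struct bucket {
    int initialized;
};

static struct bucket queues[NR_BUCKETS];
static int cmpxchg_enabled;   /* PI/robust features usable? */

typedef int (*cmpxchg_fn)(uint32_t *uaddr, uint32_t oldval, uint32_t newval);

/* Stand-in for an architecture without support (asm-generic style). */
static int cmpxchg_unsupported(uint32_t *uaddr, uint32_t oldval, uint32_t newval)
{
    (void)uaddr; (void)oldval; (void)newval;
    return -ENOSYS;
}

/* Stand-in for a working implementation: a NULL address "faults". */
static int cmpxchg_working(uint32_t *uaddr, uint32_t oldval, uint32_t newval)
{
    if (uaddr == NULL)
        return -EFAULT;
    if (*uaddr == oldval)
        *uaddr = newval;      /* not atomic: illustration only */
    return 0;
}

static int fs_register_fails(void) { return -ENOMEM; }
static int fs_register_ok(void) { return 0; }

static int futex_init_sketch(cmpxchg_fn cmpxchg, int (*register_fs)(void))
{
    assert(cmpxchg != NULL && register_fs != NULL);

    /* 1. Hash queues first: a later failure must not leave them
     *    uninitialized for the futex syscalls. */
    for (size_t i = 0; i < NR_BUCKETS; i++)
        queues[i].initialized = 1;

    /* 2. Probe with NULL: the call always fails, only the code matters. */
    cmpxchg_enabled = (cmpxchg(NULL, 0, 0) == -EFAULT);

    /* 3. Parts that may fail come last. */
    return register_fs();
}
```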
15 Feb, 2008
1 commit
-
Various user space callers ask for relative timeouts. While we fixed
that overflow issue in hrtimer_start(), the sites which convert
relative user space values to absolute timeouts themselves were still
unprotected.

Instead of putting overflow checks into each place, add a function
which does the sanity checking and convert all affected callers to use
it.

Thanks to Frans Pop, who reported the problem and tested the fixes.
Signed-off-by: Thomas Gleixner
Acked-by: Ingo Molnar
Tested-by: Frans Pop
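The sketch below shows the kind of overflow-safe addition such a helper performs. It assumes nanosecond values held in a signed 64-bit integer and clamps to the largest value, read as "never expires". The mainline helper is ktime_add_safe(), but this code is an illustration under those assumptions, not its implementation. The name add_timeout_safe() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TIMEOUT_MAX_NS INT64_MAX   /* hypothetical "never expires" value */

/* Convert a relative timeout to an absolute one without overflowing. If
 * now + rel would not fit, clamp to the maximum (an effectively infinite
 * timeout) instead of wrapping to a negative, already-expired time.
 * Assumes both inputs are non-negative. */
static int64_t add_timeout_safe(int64_t now_ns, int64_t rel_ns)
{
    assert(now_ns >= 0 && rel_ns >= 0);
    if (rel_ns > TIMEOUT_MAX_NS - now_ns)
        return TIMEOUT_MAX_NS;
    return now_ns + rel_ns;
}
```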
02 Feb, 2008
1 commit
-
To allow the implementation of optimized rw-locks in user space, glibc
needs a way to select waiters for wakeup depending on a bitset
mask.

This requires two new futex OPs: FUTEX_WAIT_BITS and FUTEX_WAKE_BITS.
These OPs are basically the same as FUTEX_WAIT and FUTEX_WAKE plus an
additional argument: a bitset. Further, the FUTEX_WAIT_BITS OP
expects an absolute timeout value instead of the relative one which
is used for the FUTEX_WAIT OP.

FUTEX_WAIT_BITS calls into the kernel with a bitset. The bitset is
stored in the futex_q structure, which is used to enqueue the waiter
into the hashed futex waitqueue.

FUTEX_WAKE_BITS also calls into the kernel with a bitset. The wakeup
function logically ANDs the bitset with the bitset stored in each
waiter's futex_q structure. If the result is zero (i.e. none of the set
bits in the bitsets match), then the waiter is not woken up. If
the result is not zero (i.e. at least one of the set bits in the bitsets
matches), then the waiter is woken.

The bitset provided by the caller must be nonzero. In case the
provided bitset is zero, the kernel returns EINVAL.

Internally the new OPs are only extensions to the existing FUTEX_WAIT
and FUTEX_WAKE functions. The existing OPs hand a bitset with all bits
set into the futex_wait() and futex_wake() functions.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
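The matching rule can be sketched in a few lines. The waiter array, wake_bitset() and BITSET_MATCH_ANY are hypothetical stand-ins. BITSET_MATCH_ANY mirrors the all-bits-set mask that the existing OPs pass internally. The kernel walks a hashed plist of futex_q entries rather than an array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BITSET_MATCH_ANY 0xffffffffu   /* all bits set */

/* Hypothetical waiter: remembers the bitset it waited with. */
struct waiter {
    uint32_t bitset;
    int woken;
};

/* Wake up to nr_wake waiters whose stored bitset shares at least one bit
 * with 'bitset'. Returns the number woken, or -EINVAL for a zero bitset. */
static int wake_bitset(struct waiter *w, size_t n, int nr_wake, uint32_t bitset)
{
    int woken = 0;

    if (bitset == 0)
        return -EINVAL;
    for (size_t i = 0; i < n && woken < nr_wake; i++) {
        if (w[i].woken || !(w[i].bitset & bitset))
            continue;          /* already woken, or no common bit */
        w[i].woken = 1;
        woken++;
    }
    return woken;
}
```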