Eric Lee / smarc-fsl-linux-kernel

01 Jan, 2012

1 commit

e6780f724 futex: Fix uninterruptible loop due to gate_area

It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.

While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_pages() mapping. And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?

In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.

But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.
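The retry decision described above can be modelled in user space. This is a minimal sketch, not the kernel code: `struct model_page`, its fields and `classify_futex_page()` are hypothetical stand-ins for `page->mapping` and `PageSwapCache()` as consulted by `get_futex_key()`, and the real check may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct page properties involved. */
struct model_page {
	const void *mapping;  /* models page->mapping; NULL when there is none */
	bool swapcache;       /* models PageSwapCache(page) */
};

enum key_result { KEY_INODE, KEY_RETRY, KEY_EFAULT };

/*
 * What to do with a shared-futex page returned by get_user_pages_fast():
 *  - it has a mapping: build the inode-based key (key->shared.inode);
 *  - no mapping, but shmem_writepage() moved it to swapcache in the
 *    meantime: fault it back in and try again;
 *  - no mapping otherwise (gate area, special mappings): fail with
 *    -EFAULT instead of retrying forever.
 */
static enum key_result classify_futex_page(const struct model_page *page)
{
	if (page->mapping != NULL)
		return KEY_INODE;
	if (page->swapcache)
		return KEY_RETRY;
	return KEY_EFAULT;
}
```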

Reported-by: Sasha Levin
Signed-off-by: Hugh Dickins
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds

Hugh Dickins
2012-01-01 03:48:28 +0800

31 Oct, 2011

1 commit

9984de1a5 kernel: Map most files to use export.h instead of module.h

The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else. Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

-#include <linux/module.h>
+#include <linux/export.h>

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:12 +0800

15 Sep, 2011

3 commits

e060c3843 Merge branch 'master' into for-next

Fast-forward merge with Linus to be able to merge patches
based on more recent version of the tree.

Jiri Kosina
2011-09-15 21:08:18 +0800
ca4a04cf3 futex: Fix spelling in a source code comment

Change a single occurrence of "unlcoked" into "unlocked".

Signed-off-by: Bart Van Assche
Cc: Darren Hart
Cc: Thomas Gleixner
Signed-off-by: Jiri Kosina

Bart Van Assche
2011-09-15 20:37:17 +0800
7cfdaf38d futex: uninitialized warning corrections

The variables here are really not used uninitialized.

kernel/futex.c: In function 'fixup_pi_state_owner.clone.17':
kernel/futex.c:1582:6: warning: 'curval' may be used uninitialized in this function
kernel/futex.c: In function 'handle_futex_death':
kernel/futex.c:2486:6: warning: 'nval' may be used uninitialized in this function
kernel/futex.c: In function 'do_futex':
kernel/futex.c:863:11: warning: 'curval' may be used uninitialized in this function
kernel/futex.c:828:6: note: 'curval' was declared here
kernel/futex.c:898:5: warning: 'oldval' may be used uninitialized in this function
kernel/futex.c:890:6: note: 'oldval' was declared here

Signed-off-by: Vitaliy Ivanov
Acked-by: Darren Hart
Signed-off-by: Jiri Kosina

Vitaliy Ivanov
2011-09-15 20:23:07 +0800

04 Aug, 2011

1 commit

d7619fe39 Merge branch 'linus' into core/urgent

Ingo Molnar
2011-08-04 15:09:27 +0800

27 Jul, 2011

1 commit

9ea71503a futex: Fix regression with read only mappings

commit 7485d0d3758e8e6491a5c9468114e74dc050785d (futexes: Remove rw
parameter from get_futex_key()) in 2.6.33 fixed two problems: First, it
prevented a loop when encountering a ZERO_PAGE. Second, it fixed RW
MAP_PRIVATE futex operations by forcing the COW to occur by
unconditionally performing a write access get_user_pages_fast() to get
the page. The commit also introduced a user-mode regression in that it
broke futex operations on read-only memory maps. For example, this
breaks workloads that have one or more reader processes doing a
FUTEX_WAIT on a futex within a read only shared file mapping, and a
writer process that has a writable mapping issuing the FUTEX_WAKE.

This fixes the regression for valid futex operations on RO mappings by
trying a RO get_user_pages_fast() when the RW get_user_pages_fast()
fails. This change makes it necessary to also check for invalid use
cases, such as anonymous RO mappings (which can never change) and the
ZERO_PAGE which the commit referenced above was written to address.
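A user-space check of the restored behaviour, as a sketch assuming Linux, a writable /tmp and a kernel that carries this fix (a kernel with only the 2.6.33 change would report EFAULT instead). The helper names `sys_futex()` and `map_shared_twice()` are illustrative; using the shared ops (no `FUTEX_PRIVATE_FLAG`) forces the file-based key path through `get_futex_key()`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>  /* FUTEX_WAIT, FUTEX_WAKE */
#include <stdint.h>
#include <stdlib.h>       /* mkstemp() */
#include <sys/mman.h>
#include <sys/syscall.h>  /* SYS_futex */
#include <unistd.h>

/* glibc has no futex() wrapper, so issue the raw system call. */
static long sys_futex(uint32_t *uaddr, int op, uint32_t val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/*
 * Map one page of a temporary file twice: a read-only view for the
 * "reader" and a read-write view for the "writer".  Both are
 * MAP_SHARED, so the futex key is file (inode) based.
 */
static int map_shared_twice(uint32_t **ro, uint32_t **rw)
{
	char path[] = "/tmp/futex-ro-XXXXXX";
	long pagesz = sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);  /* the mappings keep the file alive */
	if (ftruncate(fd, pagesz) != 0) {
		close(fd);
		return -1;
	}
	void *r = mmap(NULL, pagesz, PROT_READ, MAP_SHARED, fd, 0);
	void *w = mmap(NULL, pagesz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	if (r == MAP_FAILED || w == MAP_FAILED) {
		if (r != MAP_FAILED)
			munmap(r, pagesz);
		if (w != MAP_FAILED)
			munmap(w, pagesz);
		return -1;
	}
	*ro = r;
	*rw = w;
	return 0;
}
```

A FUTEX_WAIT whose expected value does not match the word returns EAGAIN at once, so probing the read-only view this way never blocks.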

This patch does restore the original behavior with RO MAP_PRIVATE
mappings, which have inherent user-mode usage problems and don't really
make sense. With this patch performing a FUTEX_WAIT within a RO
MAP_PRIVATE mapping will be successfully woken provided another process
updates the region of the underlying mapped file. However, the mmap()
man page states that for a MAP_PRIVATE mapping:

It is unspecified whether changes made to the file after
the mmap() call are visible in the mapped region.

So user-mode users attempting to use futex operations on RO MAP_PRIVATE
mappings are depending on unspecified behavior. Additionally a
RO MAP_PRIVATE mapping could fail to wake up in the following case.

Thread-A: call futex(FUTEX_WAIT, memory-region-A).
  get_futex_key() returns an inode-based key.
  sleep on the key
Thread-B: call mprotect(PROT_READ|PROT_WRITE, memory-region-A)
Thread-B: write memory-region-A.
  COW happens. This process's memory-region-A becomes associated
  with a new COWed private (i.e. PageAnon=1) page.
Thread-B: call futex(FUTEX_WAKE, memory-region-A).
  get_futex_key() returns an mm-based key.
IOW, we fail to wake up Thread-A.

Once again doing something like this is just silly and users who do
something like this get what they deserve.

While RO MAP_PRIVATE mappings are nonsensical, checking for a private
mapping requires walking the vmas and was deemed too costly to avoid a
userspace hang.

This patch is based on Peter Zijlstra's initial patch with modifications to
only allow RO mappings for futex operations that need VERIFY_READ access.

Reported-by: David Oliver
Signed-off-by: Shawn Bohrer
Acked-by: Peter Zijlstra
Signed-off-by: Darren Hart
Cc: KOSAKI Motohiro
Cc: peterz@infradead.org
Cc: eric.dumazet@gmail.com
Cc: zvonler@rgmadvisors.com
Cc: hughd@google.com
Link: http://lkml.kernel.org/r/1309450892-30676-1-git-send-email-sbohrer@rgmadvisors.com
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner

Shawn Bohrer
2011-07-27 02:59:35 +0800

26 Jul, 2011

1 commit

2efaca927 mm/futex: fix futex writes on archs with SW tracking of dirty & young

I haven't reproduced it myself but the fail scenario is that on such
machines (notably ARM and some embedded powerpc), if you manage to hit
that futex path on a writable page whose dirty bit has gone from the PTE,
you'll livelock inside the kernel from what I can tell.

It will go in a loop of trying the atomic access, failing, trying gup to
"fix it up", getting success from gup, going back to the atomic access,
failing again because dirty wasn't fixed, etc.

So I think you essentially hang in the kernel.

The scenario is probably rare-ish because the affected architectures are
embedded and tend to not swap much (if at all) so we probably rarely hit
the case where dirty is missing or young is missing, but I think Shan has
a piece of SW that can reliably reproduce it using a shared writable
mapping & fork or something like that.

On archs that use SW tracking of dirty & young, a page without dirty is
effectively mapped read-only and a page without young is inaccessible in the
PTE.

Additionally, some architectures might lazily flush the TLB when relaxing
write protection (by doing only a local flush), and expect a fault to
invalidate the stale entry if it's still present on another processor.

The futex code assumes that if the "in_atomic()" access -EFAULT's, it can
"fix it up" by causing get_user_pages() which would then be equivalent to
taking the fault.

However that isn't the case. get_user_pages() will not call
handle_mm_fault() in the case where the PTE seems to have the right
permissions, regardless of the dirty and young state. It will eventually
update those bits ... in the struct page, but not in the PTE.

Additionally, it will not handle the lazy TLB flushing that can be
required by some architectures in the fault case.

Basically, gup is the wrong interface for the job. The patch provides a
more appropriate one which boils down to just calling handle_mm_fault()
since what we are trying to do is simulate a real page fault.

The futex code currently attempts to write to user memory within a
pagefault disabled section, and if that fails, tries to fix it up using
get_user_pages().

This doesn't work on archs where the dirty and young bits are maintained
by software, since they will gate access permission in the TLB, and will
not be updated by gup().

In addition, there's an expectation on some archs that a spurious write
fault triggers a local TLB flush, and that is missing from the picture as
well.

I decided that adding those "features" to gup() would be too much for this
already too complex function, and instead added a new simpler
fixup_user_fault() which is essentially a wrapper around handle_mm_fault()
which the futex code can call.
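The livelock can be illustrated with a toy model. Everything here is hypothetical: `struct sw_pte` stands for a PTE with software-managed dirty/young bits, `fixup_with_gup()` mimics gup leaving a present, writable PTE untouched, and `fixup_with_fault()` mimics the `handle_mm_fault()` path that `fixup_user_fault()` takes. The retry loop is bounded so the model itself terminates.

```c
#include <stdbool.h>

/* Hypothetical PTE on an arch that tracks dirty/young in software. */
struct sw_pte {
	bool present;
	bool writable;  /* permission granted by the VMA */
	bool dirty;     /* clear: the hardware entry is effectively read-only */
	bool young;     /* clear: the hardware entry is effectively invalid */
};

/* Models the pagefault-disabled user write: it succeeds only if the
 * hardware entry, as shaped by the software bits, allows it. */
static bool atomic_user_write(const struct sw_pte *pte)
{
	return pte->present && pte->writable && pte->dirty && pte->young;
}

/* Models handle_mm_fault() for a write fault: sets the software bits. */
static void fixup_with_fault(struct sw_pte *pte)
{
	pte->present = true;
	pte->dirty = true;
	pte->young = true;
}

/* Models get_user_pages(): a present, writable PTE "looks right", so no
 * fault is taken and the PTE's dirty/young bits stay as they were. */
static void fixup_with_gup(struct sw_pte *pte)
{
	if (!pte->present)
		fixup_with_fault(pte);
}

/* The futex retry loop.  Returns the attempt that succeeded, or -1 if
 * it gave up after max_tries (the livelock case). */
static int futex_write_retry(struct sw_pte *pte,
			     void (*fixup)(struct sw_pte *), int max_tries)
{
	for (int i = 1; i <= max_tries; i++) {
		if (atomic_user_write(pte))
			return i;
		fixup(pte);
	}
	return -1;
}
```

Starting from a present, writable PTE whose dirty bit is clear, the gup-style fixup never makes progress, while the fault-style fixup lets the second attempt succeed.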

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix some nits Darren saw, fiddle comment layout]
Signed-off-by: Benjamin Herrenschmidt
Reported-by: Shan Hai
Tested-by: Shan Hai
Cc: David Laight
Acked-by: Peter Zijlstra
Cc: Darren Hart
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Benjamin Herrenschmidt
2011-07-26 11:57:11 +0800

08 Jul, 2011

1 commit

732375c6a plist: Remove the need to supply locks to plist heads

This was legacy code brought over from the RT tree and
is no longer necessary.

Signed-off-by: Dima Zavin
Acked-by: Thomas Gleixner
Cc: Daniel Walker
Cc: Steven Rostedt
Cc: Peter Zijlstra
Cc: Andi Kleen
Cc: Lai Jiangshan
Link: http://lkml.kernel.org/r/1310084879-10351-2-git-send-email-dima@android.com
Signed-off-by: Ingo Molnar

Dima Zavin
2011-07-08 20:02:53 +0800

15 Apr, 2011

1 commit

0cd9c6494 futex: Set FLAGS_HAS_TIMEOUT during futex_wait restart setup

The FLAGS_HAS_TIMEOUT flag was not getting set, causing the restart_block to
restart futex_wait() without a timeout after a signal.

Commit b41277dc7a18ee332d in 2.6.38 introduced the regression by accidentally
removing the FLAGS_HAS_TIMEOUT assignment from futex_wait() during the setup
of the restart block. Restore the original behavior.
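A sketch of the restart-block bookkeeping this fix restores. The flag values are assumed to match kernel/futex.c of that period, and `struct restart_futex_model`, `setup_restart()` and `restart_uses_timeout()` are hypothetical simplifications of `restart_block.futex`, futex_wait() and futex_wait_restart().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Internal flag values, assumed to match kernel/futex.c of this period. */
#define FLAGS_SHARED      0x01
#define FLAGS_CLOCKRT     0x02
#define FLAGS_HAS_TIMEOUT 0x04

/* Hypothetical, trimmed-down stand-in for restart_block.futex. */
struct restart_futex_model {
	unsigned int flags;
	uint64_t time;  /* absolute expiry; meaningful only with FLAGS_HAS_TIMEOUT */
};

/* Restart setup done when a signal interrupts futex_wait(). */
static void setup_restart(struct restart_futex_model *r, unsigned int flags,
			  const uint64_t *abs_time)
{
	r->flags = flags;
	r->time = 0;
	if (abs_time != NULL) {
		r->flags |= FLAGS_HAS_TIMEOUT;  /* the assignment the regression lost */
		r->time = *abs_time;
	}
}

/* The restart path re-arms a timeout only when the flag says there is one. */
static bool restart_uses_timeout(const struct restart_futex_model *r)
{
	return (r->flags & FLAGS_HAS_TIMEOUT) != 0;
}
```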

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=32922

Reported-by: Tim Smith
Reported-by: Torsten Hilbrich
Signed-off-by: Darren Hart
Signed-off-by: Eric Dumazet
Cc: Peter Zijlstra
Cc: John Kacur
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/%3Cdaac0eb3af607f72b9a4d3126b2ba8fb5ed3b883.1302820917.git.dvhart%40linux.intel.com%3E
Signed-off-by: Thomas Gleixner

Darren Hart
2011-04-15 22:34:32 +0800

26 Mar, 2011

1 commit

94df491c4 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: Fix WARN_ON() test for UP
WARN_ON_SMP(): Allow use in if() statements on UP
x86, dumpstack: Use %pB format specifier for stack trace
vsprintf: Introduce %pB format specifier
lockdep: Remove unused 'factor' variable from lockdep_stats_show()

Linus Torvalds
2011-03-26 08:52:22 +0800

25 Mar, 2011

1 commit

290962021 futex: Fix WARN_ON() test for UP

An update of the futex code had a

WARN_ON(!spin_is_locked(q->lock_ptr))

But on UP, spin_is_locked() is always false, and will
trigger this warning, and even worse, it will exit the function
without doing the necessary work.

Converting this to a WARN_ON_SMP() fixes the problem.
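The UP pitfall can be reproduced with user-space stand-ins. `warn_on()`, `up_spin_is_locked()` and the two `unqueue_*()` functions are hypothetical; only the shape of `WARN_ON()`/`WARN_ON_SMP()` follows the kernel. Built without `CONFIG_SMP` defined, the `WARN_ON()` guard always fires and skips the work, while the `WARN_ON_SMP()` guard compiles to 0.

```c
#include <stdio.h>

/* User-space stand-in for WARN_ON(): report and evaluate to the condition. */
static int warn_on(int cond, const char *expr)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", expr);
	return cond;
}
#define WARN_ON(x) warn_on(!!(x), #x)

#ifdef CONFIG_SMP
# define WARN_ON_SMP(x) WARN_ON(x)
#else
/* UP: the check is meaningless, so it is compiled out and evaluates to 0.
 * (The kernel spells this ({0;}) so it also works as a bare statement.) */
# define WARN_ON_SMP(x) 0
#endif

/* Hypothetical UP spinlock query: there is no lock word, so always "unlocked". */
static int up_spin_is_locked(const int *lock)
{
	(void)lock;
	return 0;
}

/* Both return 1 if the dequeue work was done, 0 if the guard bailed out. */
static int unqueue_buggy(const int *lock_ptr)
{
	if (WARN_ON(!up_spin_is_locked(lock_ptr)))
		return 0;  /* on UP this always triggers: the work is skipped */
	return 1;
}

static int unqueue_fixed(const int *lock_ptr)
{
	if (WARN_ON_SMP(!up_spin_is_locked(lock_ptr)))
		return 0;
	return 1;
}
```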

Reported-by: Richard Weinberger
Tested-by: Richard Weinberger
Signed-off-by: Steven Rostedt
Acked-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Acked-by: Darren Hart
Cc: Lai Jiangshan
LKML-Reference:
Signed-off-by: Ingo Molnar

Steven Rostedt
2011-03-25 18:32:11 +0800

24 Mar, 2011

1 commit

b0e77598f userns: user namespaces: convert several capable() calls

CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
because the resource comes from current's own ipc namespace.

setuid/setgid are to uids in own namespace, so again checks can be against
current_user_ns().

Changelog:
Jan 11: Use task_ns_capable() in place of sched_capable().
Jan 11: Use nsown_capable() as suggested by Bastian Blank.
Jan 11: Clarify (hopefully) some logic in futex and sched.c
Feb 15: use ns_capable for ipc, not nsown_capable
Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
Feb 23: pass ns down rather than taking it from current

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Acked-by: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2011-03-24 10:47:08 +0800

16 Mar, 2011

1 commit

0586bed3e Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rtmutex: tester: Remove the remaining BKL leftovers
lockdep/timers: Explain in detail the locking problems del_timer_sync() may cause
rtmutex: Simplify PI algorithm and make highest prio task get lock
rwsem: Remove redundant asmregparm annotation
rwsem: Move duplicate function prototypes to linux/rwsem.h
rwsem: Unify the duplicate rwsem_is_locked() inlines
rwsem: Move duplicate init macros and functions to linux/rwsem.h
rwsem: Move duplicate struct rwsem declaration to linux/rwsem.h
x86: Cleanup rwsem_count_t typedef
rwsem: Cleanup includes
locking: Remove deprecated lock initializers
cred: Replace deprecated spinlock initialization
kthread: Replace deprecated spinlock initialization
xtensa: Replace deprecated spinlock initialization
um: Replace deprecated spinlock initialization
sparc: Replace deprecated spinlock initialization
mips: Replace deprecated spinlock initialization
cris: Replace deprecated spinlock initialization
alpha: Replace deprecated spinlock initialization
rtmutex-tester: Remove BKL tests

Linus Torvalds
2011-03-16 09:28:30 +0800

15 Mar, 2011

1 commit

6e0aa9f8a futex: Deobfuscate handle_futex_death()

handle_futex_death() uses futex_atomic_cmpxchg_inatomic() without
disabling page faults. That's ok, but totally non-obvious.

We don't hold locks so we actually can and want to fault here, because
the get_user() before futex_atomic_cmpxchg_inatomic() does not
guarantee a R/W mapping.

We could just add a big fat comment to explain this, but actually
changing the code so that the functionality is entirely clear is
better.

Use the helper function which disables page faults around the
futex_atomic_cmpxchg_inatomic() and handle a fault with a call to
fault_in_user_writeable() as all other places in the futex code do as
well.
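The owner-died update that handle_futex_death() performs on the user word can be sketched with C11 atomics. This is a user-space approximation: `mark_owner_died()` is a hypothetical name, `atomic_compare_exchange_strong()` stands in for `futex_atomic_cmpxchg_inatomic()`, and the fault path (`fault_in_user_writeable()`) and the PI special case are left out. The FUTEX_* bit masks come from <linux/futex.h>.

```c
#include <linux/futex.h>  /* FUTEX_WAITERS, FUTEX_OWNER_DIED, FUTEX_TID_MASK */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * If the dying task `tid` still owns the futex word, keep only the
 * FUTEX_WAITERS bit and set FUTEX_OWNER_DIED, retrying if the word
 * changes under us.  Returns true if waiters should be woken.
 */
static bool mark_owner_died(_Atomic uint32_t *uaddr, uint32_t tid)
{
	uint32_t uval = atomic_load(uaddr);

	for (;;) {
		if ((uval & FUTEX_TID_MASK) != tid)
			return false;  /* owned by someone else: nothing to do */
		uint32_t mval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
		/* On failure uval is refreshed with the current word: retry. */
		if (atomic_compare_exchange_strong(uaddr, &uval, mval))
			return (uval & FUTEX_WAITERS) != 0;
	}
}
```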

Pointed-out-by: Linus Torvalds
Signed-off-by: Thomas Gleixner
Acked-by: Darren Hart
Cc: Michel Lespinasse
Cc: Peter Zijlstra
Cc: Matt Turner
Cc: Russell King
Cc: David Howells
Cc: Tony Luck
Cc: Michal Simek
Cc: Ralf Baechle
Cc: "James E.J. Bottomley"
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: Paul Mundt
Cc: "David S. Miller"
Cc: Chris Metcalf
LKML-Reference:
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2011-03-15 04:08:47 +0800

12 Mar, 2011

3 commits

995612178 Merge branch 'tip/futex/devel' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-rt into core/futexes

futex,plist: Pass the real head of the priority list to plist_del()
futex,plist: Remove debug lock assignment from plist_node
plist: Shrink struct plist_head
plist: Add priority list test

Thomas Gleixner
2011-03-12 18:43:32 +0800
017f2b239 futex,plist: Remove debug lock assignment from plist_node

The original code uses &plist_node->plist as the fake head of
the priority list for plist_del(); the debug locks in that
fake head are needed for CONFIG_DEBUG_PI_LIST.

But now that we always pass the real head to plist_del(), the debug locks
in plist_node are no longer used, so we remove these assignments.

Acked-by: Darren Hart
Signed-off-by: Lai Jiangshan
LKML-Reference:
Signed-off-by: Steven Rostedt

Lai Jiangshan
2011-03-12 04:09:53 +0800
2e12978a9 futex,plist: Pass the real head of the priority list to plist_del()

Some plist_del()s in kernel/futex.c are passed a faked head of the
priority list.

It does not fail because the current code does not require the real head
in plist_del(). The current code of plist_del() just uses the head for checking,
so it will not cause a bad result even when we use a faked head.

But it is undocumented usage:

/**
* plist_del - Remove a @node from plist.
*
* @node: &struct plist_node pointer - entry to be removed
* @head: &struct plist_head pointer - list head
*/

The kernel-doc says that @head is the "list head", i.e. the head of the priority list.

In the futex code, several places use "plist_del(&q->list, &q->list.plist);",
which passes a fake head. We need to fix them all.

Thanks to Darren Hart for many suggestions.

Acked-by: Darren Hart
Signed-off-by: Lai Jiangshan
LKML-Reference:
Signed-off-by: Steven Rostedt

Lai Jiangshan
2011-03-12 04:09:52 +0800

11 Mar, 2011

3 commits

37a9d912b futex: Sanitize cmpxchg_futex_value_locked API

The cmpxchg_futex_value_locked API was funny in that it returned either
the original, user-exposed futex value OR an error code such as -EFAULT.
This was confusing at best, and could be a source of livelocks in places
that retry the cmpxchg_futex_value_locked after trying to fix the issue
by running fault_in_user_writeable().

This change makes the cmpxchg_futex_value_locked API more similar to the
get_futex_value_locked one, returning an error code and updating the
original value through a reference argument.

Signed-off-by: Michel Lespinasse
Acked-by: Chris Metcalf [tile]
Acked-by: Tony Luck [ia64]
Acked-by: Thomas Gleixner
Tested-by: Michal Simek [microblaze]
Acked-by: David Howells [frv]
Cc: Darren Hart
Cc: Peter Zijlstra
Cc: Matt Turner
Cc: Russell King
Cc: Ralf Baechle
Cc: "James E.J. Bottomley"
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: Paul Mundt
Cc: "David S. Miller"
Cc: Linus Torvalds
LKML-Reference:
Signed-off-by: Thomas Gleixner

Michel Lespinasse
2011-03-11 19:23:08 +0800
c0c9ed150 futex: Avoid redudant evaluation of task_pid_vnr()

The result is not going to change under us, so no need to reevaluate
this over and over. Seems to be a leftover from the mechanical mass
conversion of task->pid to task_pid_vnr(tsk).

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2011-03-11 19:23:07 +0800
8fe8f545c futex: Update futex_wait_setup comments about locking

Reviving a cleanup I had done about a year ago as part of a larger
futex_set_wait proposal. Over the years, the locking of the hashed
futex queue got improved, so that some of the "rare but normal" race
conditions described in comments can't actually happen anymore.

Signed-off-by: Michel Lespinasse
Cc: Linus Torvalds
Cc: Darren Hart
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Thomas Gleixner

Michel Lespinasse
2011-03-11 02:56:18 +0800

28 Jan, 2011

1 commit

8161239a8 rtmutex: Simplify PI algorithm and make highest prio task get lock

In the current rtmutex, the pending owner may be boosted by the tasks
in the rtmutex's waitlist when the pending owner is deboosted
or a task in the waitlist is boosted. This boosting is unrelated,
because the pending owner does not really hold the rtmutex yet.
It is not reasonable.

Example.

time1:
A (high prio) owns the rtmutex.
B (mid prio) and C (low prio) are in the waitlist.

time2:
A releases the lock, B becomes the pending owner.
A (or another high prio task) continues to run. B's prio is lower
than A's, so B is just queued on the runqueue.

time3:
A or the other high prio task sleeps, but some time has passed.
B's and C's priorities changed during that period (time2 ~ time3)
due to boosting or deboosting. Now C has a higher priority
than B. ***Is it reasonable that C has to boost B and help B to
get the rtmutex?

NO!! I think it is unrelated/unneeded boosting before B really
owns the rtmutex. We should give C a chance to beat B and
win the rtmutex.

This is the motivation of this patch. This patch *ensures*
only the top waiter or higher priority task can take the lock.

How?
1) We don't dequeue the top waiter on unlock; if the top waiter
changes, the old top waiter will fail and go to sleep again.
2) When acquiring the lock, a task gets it if the lock is not taken and:
there is no waiter, OR it has higher priority than the waiters, OR it is the top waiter.
3) Any time the top waiter changes, the (new) top waiter will be woken up.
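Rule 2) can be written as a single predicate. This is a sketch under stated assumptions: `struct task`, `struct rt_lock_model` and `can_take_lock()` are hypothetical, a lower `prio` value means higher priority as in the kernel, and the wakeup logic of rules 1) and 3) is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task: lower prio value means higher priority. */
struct task {
	int prio;
};

/* Hypothetical lock: its owner and its current top (highest-prio) waiter. */
struct rt_lock_model {
	const struct task *owner;       /* NULL when the lock is free */
	const struct task *top_waiter;  /* NULL when there are no waiters */
};

/* A task may take a free lock if there are no waiters, if it is the top
 * waiter, or if it has strictly higher priority than the top waiter. */
static bool can_take_lock(const struct rt_lock_model *lock,
			  const struct task *task)
{
	if (lock->owner != NULL)
		return false;
	if (lock->top_waiter == NULL || lock->top_waiter == task)
		return true;
	return task->prio < lock->top_waiter->prio;
}
```

Replaying the example above: once C is boosted above B and becomes the top waiter, B can no longer take the freed lock, and C can.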

The algorithm is much simpler than before, no pending owner, no
boosting for pending owner.

Other advantages of this patch:
1) The number of rtmutex states is halved, making the code easier to read.
2) The code becomes shorter.
3) The top waiter is not dequeued until it really takes the lock,
so waiters retain FIFO order when the lock is stolen.

Neither advantage nor disadvantage:
1) Even though we may wake up multiple waiters (any time the top waiter changes),
we hardly cause a "thundering herd";
the number of woken tasks is likely 1 or very small.
2) Two APIs are changed.
rt_mutex_owner() will not return a pending owner; it will return NULL when
the top waiter is going to take the lock.
rt_mutex_next_owner() always returns the top waiter;
it will not return NULL if we have waiters,
because the top waiter is not dequeued.

I have fixed the code that use these APIs.

Need to be updated after this patch is accepted:
1) Document/*
2) the testcase scripts/rt-tester/t4-l2-pi-deboost.tst

Signed-off-by: Lai Jiangshan
LKML-Reference:
Reviewed-by: Steven Rostedt
Signed-off-by: Steven Rostedt

Lai Jiangshan
2011-01-28 10:13:51 +0800

16 Jan, 2011

1 commit

f9ee7f60d Merge branches 'core-fixes-for-linus', 'x86-fixes-for-linus', 'timers-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: avoid pointless blocked-task warnings
rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status
rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi()

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, olpc: Add missing Kconfig dependencies
x86, mrst: Set correct APB timer IRQ affinity for secondary cpu
x86: tsc: Fix calibration refinement conditionals to avoid divide by zero
x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timekeeping: Make local variables static
time: Rename misnamed minsec argument of clocks_calc_mult_shift()

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Remove syscall_exit_fields
tracing: Only process module tracepoints once
perf record: Add "nodelay" mode, disabled by default
perf sched: Fix list of events, dropping unsupported ':r' modifier
Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return"
perf top: Fix annotate segv
perf evsel: Fix order of event list deletion

Linus Torvalds
2011-01-16 04:45:00 +0800

14 Jan, 2011

1 commit

a5b338f2b thp: update futex compound knowledge

Futex code is smarter than most other gup_fast O_DIRECT code and knows
about the compound internals. However now doing a put_page(head_page)
will not release the pin on the tail page taken by gup-fast, leading to
all sorts of refcounting bugchecks. Getting a stable head_page is a little
tricky.

page_head = page is there because if this is not a tail page it's also the
page_head. Only in case this is a tail page, compound_head is called,
otherwise it's guaranteed unnecessary. And if it's a tail page
compound_head has to run atomically inside irq disabled section
__get_user_pages_fast before returning. Otherwise ->first_page won't be a
stable pointer.

Disabling irq before __get_user_pages_fast and releasing irq after running
compound_head is needed because if __get_user_pages_fast returns == 1, it
means the huge pmd is established and cannot go away from under us.
pmdp_splitting_flush_notify in __split_huge_page_splitting will have to
wait for local_irq_enable before the IPI delivery can return. This means
__split_huge_page_refcount can't be running from under us, and in turn
when we run compound_head(page) we're not reading a dangling pointer from
tailpage->first_page. Then after we get to stable head page, we are
always safe to call compound_lock and after taking the compound lock on
head page we can finally re-check if the page returned by gup-fast is
still a tail page, in which case we're set and we didn't need to split
the hugepage in order to take a futex on it.

Signed-off-by: Andrea Arcangeli
Acked-by: Mel Gorman
Acked-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2011-01-14 09:32:39 +0800

11 Jan, 2011

1 commit

f123c98e7 rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi()

The comment about why rt_mutex_next_owner() can return NULL in
wake_futex_pi() describes a case that is not the normal one.

Tracing the cause shows that it is more likely that the waiter
simply timed out. But because it originally caused contention with
the futex, the owner will go into the kernel when it unlocks
the lock. Then it will hit this code path and
rt_mutex_next_owner() will return NULL.

Cc: Thomas Gleixner
Signed-off-by: Steven Rostedt
Signed-off-by: Ingo Molnar

Steven Rostedt
2011-01-11 22:17:24 +0800

10 Nov, 2010

4 commits

5bdb05f91 futex: Add futex_q static initializer

The futex_q struct has grown considerably over the last couple years. I
believe it now merits a static initializer to avoid uninitialized data
errors (having spent more time than I care to admit debugging an uninitialized
q.bitset in an experimental new op code).

With the key initializer built in, several of the FUTEX_KEY_INIT calls can
be removed.

V2: use a static variable instead of an init macro.
use a C99 initializer and don't rely on variable ordering in the struct.
V3: make futex_q_init const
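The pattern is a const template object copied into each waiter. The structs below are hypothetical, trimmed-down stand-ins for struct futex_q and union futex_key; only FUTEX_BITSET_MATCH_ANY comes from <linux/futex.h>. Designated initializers name the fields, so member order does not matter, and every field that is not named is zero-initialized.

```c
#include <linux/futex.h>  /* FUTEX_BITSET_MATCH_ANY */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct key_model {
	unsigned long word;
	void *ptr;
	int offset;
};

struct futex_q_model {
	struct key_model key;
	void *lock_ptr;
	uint32_t bitset;
	void *pi_state;
};

/* One const template; fields not named here are zeroed. */
static const struct futex_q_model futex_q_init = {
	.key = { .ptr = NULL },
	.bitset = FUTEX_BITSET_MATCH_ANY,
};

/* Every waiter starts life as a copy of the template. */
static struct futex_q_model new_waiter(void)
{
	struct futex_q_model q = futex_q_init;
	return q;
}
```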

Signed-off-by: Darren Hart
Cc: Peter Zijlstra
Cc: Eric Dumazet
Cc: John Kacur
Cc: Ingo Molnar
LKML-Reference:
Signed-off-by: Thomas Gleixner

Darren Hart
2010-11-10 22:01:34 +0800
b41277dc7 futex: Replace fshared and clockrt with combined flags

In the early days we passed the mmap sem around. That became the
"int fshared" with the fast gup improvements. Then we added
"int clockrt" in places. This patch unifies these options as "flags".
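A sketch of folding the op word into one flags value. FUTEX_PRIVATE_FLAG, FUTEX_CLOCK_REALTIME and FUTEX_CMD_MASK come from <linux/futex.h>; the FLAGS_* values are assumed to match kernel/futex.c after this change, and `futex_op_flags()`/`futex_op_cmd()` are hypothetical helpers.

```c
#include <linux/futex.h>  /* FUTEX_PRIVATE_FLAG, FUTEX_CLOCK_REALTIME, FUTEX_CMD_MASK */

/* Internal flag values, assumed to match kernel/futex.c after this change. */
#define FLAGS_SHARED  0x01
#define FLAGS_CLOCKRT 0x02

/*
 * Fold the option bits of a futex(2) op word into one "flags" value,
 * instead of separate int fshared / int clockrt parameters.  The real
 * do_futex() also rejects FUTEX_CLOCK_REALTIME for commands that do not
 * support it; that check is omitted here.
 */
static unsigned int futex_op_flags(int op)
{
	unsigned int flags = 0;

	if (!(op & FUTEX_PRIVATE_FLAG))
		flags |= FLAGS_SHARED;
	if (op & FUTEX_CLOCK_REALTIME)
		flags |= FLAGS_CLOCKRT;
	return flags;
}

/* The command is what remains once the option bits are masked off. */
static int futex_op_cmd(int op)
{
	return op & FUTEX_CMD_MASK;
}
```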

[ tglx: Split out the stale fshared cleanup ]

Signed-off-by: Darren Hart
Cc: Peter Zijlstra
Cc: Eric Dumazet
Cc: John Kacur
Cc: Ingo Molnar
LKML-Reference:
Signed-off-by: Thomas Gleixner

Darren Hart
2010-11-10 22:01:33 +0800
ae791a2d2 futex: Cleanup stale fshared flag interfaces

The fast GUP changes stopped using the fshared flag in
put_futex_keys(), but we kept the interface the same.

Cleanup all stale users.

This patch is split out from Darren Hart's combo patch which also
combines various flags. This way the changes are clearly separated.

Signed-off-by: Thomas Gleixner
Cc: Darren Hart
LKML-Reference:

Thomas Gleixner
2010-11-10 22:01:33 +0800
4c115e951 futex: Address compiler warnings in exit_robust_list

Since commit 1dcc41bb (futex: Change 3rd arg of fetch_robust_entry()
to unsigned int*) some gcc versions decided to emit the following
warning:

kernel/futex.c: In function ‘exit_robust_list’:
kernel/futex.c:2492: warning: ‘next_pi’ may be used uninitialized in this function

The commit did not introduce the warning as gcc should have warned
before that commit as well. It's just gcc being silly.

The code path really can't result in next_pi being uninitialized (or
should not), but let's keep the build clean. Annotate next_pi as an
uninitialized_var.

[ tglx: Addressed the same issue in futex_compat.c and massaged the
changelog ]

Signed-off-by: Darren Hart
Tested-by: Matt Fleming
Tested-by: Uwe Kleine-König
Cc: Peter Zijlstra
Cc: Eric Dumazet
Cc: John Kacur
Cc: Ingo Molnar
LKML-Reference:
Signed-off-by: Thomas Gleixner

Darren Hart
2010-11-10 20:27:50 +0800

26 Oct, 2010

1 commit

7de9c6ee3 new helper: ihold()

Clones an existing reference to inode; caller must already hold one.
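The contract can be sketched with C11 atomics. `struct inode_model` and `ihold_model()` are hypothetical; the kernel helper increments `inode->i_count` and warns if the result is below 2, which means the caller did not already hold a reference. The model reports that case by returning -1.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical inode carrying only a reference count. */
struct inode_model {
	atomic_int i_count;
};

/*
 * Take an additional reference.  The caller must already hold one, so
 * the count must be at least 2 afterwards; otherwise complain.
 */
static int ihold_model(struct inode_model *inode)
{
	int newcount = atomic_fetch_add(&inode->i_count, 1) + 1;

	if (newcount < 2) {
		fprintf(stderr, "ihold: caller held no reference\n");
		return -1;
	}
	return 0;
}
```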

Signed-off-by: Al Viro

Al Viro
2010-10-26 09:26:11 +0800

22 Oct, 2010

1 commit

b61f6a57f Merge branch 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: Fix kernel-doc notation & typos
futex: Add lock context annotations
futex: Mark restart_block.futex.uaddr[2] __user
futex: Change 3rd arg of fetch_robust_entry() to unsigned int*

Linus Torvalds
2010-10-22 05:06:17 +0800

19 Oct, 2010

1 commit

7ada876a8 futex: Fix errors in nested key ref-counting

futex_wait() is leaking key references due to futex_wait_setup()
acquiring an additional reference via the queue_lock() routine. The
nested key ref-counting has been masking bugs and complicating code
analysis. queue_lock() is only called with a previously ref-counted
key, so remove the additional ref-counting from the queue_(un)lock()
functions.

Also futex_wait_requeue_pi() drops one key reference too many in
unqueue_me_pi(). Remove the key reference handling from
unqueue_me_pi(). This was paired with a queue_lock() in
futex_lock_pi(), so the count remains unchanged.

Document remaining nested key ref-counting sites.

Signed-off-by: Darren Hart
Reported-and-tested-by: Matthieu Fertré
Reported-by: Louis Rilling
Cc: Peter Zijlstra
Cc: Eric Dumazet
Cc: John Kacur
Cc: Rusty Russell
LKML-Reference:
Signed-off-by: Thomas Gleixner
Cc: stable@kernel.org

Darren Hart
2010-10-19 17:41:54 +0800

14 Oct, 2010

1 commit

fb62db2ba futex: Fix kernel-doc notation & typos

Convert futex_requeue() function parameters to use @name
kernel-doc notation and add @fshared & @cmpval to prevent
kernel-doc warnings.

Add @list to struct futex_q.

Fix a few typos.

Signed-off-by: Randy Dunlap
Acked-by: Rusty Russell
LKML-Reference:
Signed-off-by: Ingo Molnar

Randy Dunlap
2010-10-14 14:57:35 +0800

18 Sep, 2010

3 commits

15e408cd6 futex: Add lock context annotations

queue_lock/unlock/me() and unqueue_me_pi() grab/release spinlocks
but are missing proper annotations. Add them.
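A sketch of what such annotations look like. The `__acquires`/`__releases` definitions mirror the kernel's: attributes only when sparse defines `__CHECKER__`, empty for the compiler. The hash-bucket struct and the `queue_*_model()` functions are hypothetical, and an `atomic_flag` stands in for the spinlock.

```c
#include <stdatomic.h>

/* Lock-context annotations: attributes for sparse, empty for the compiler. */
#ifdef __CHECKER__
# define __acquires(x) __attribute__((context(x, 0, 1)))
# define __releases(x) __attribute__((context(x, 1, 0)))
#else
# define __acquires(x)
# define __releases(x)
#endif

/* Hypothetical hash bucket: a spinlock plus a waiter count. */
struct hb_model {
	atomic_flag lock;
	int nr_waiters;
};

/* Returns with hb->lock held (context +1), so sparse can check callers. */
static struct hb_model *queue_lock_model(struct hb_model *hb)
	__acquires(&hb->lock)
{
	while (atomic_flag_test_and_set(&hb->lock))
		;  /* spin until the flag was previously clear */
	return hb;
}

/* Drops hb->lock (context -1). */
static void queue_unlock_model(struct hb_model *hb)
	__releases(&hb->lock)
{
	atomic_flag_clear(&hb->lock);
}
```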

Signed-off-by: Namhyung Kim
Cc: Peter Zijlstra
Cc: Darren Hart
LKML-Reference:
Signed-off-by: Thomas Gleixner

Namhyung Kim
2010-09-18 18:19:21 +0800
a3c74c525 futex: Mark restart_block.futex.uaddr[2] __user

@uaddr and @uaddr2 fields in restart_block.futex are user
pointers. Add __user and remove unnecessary casts.

Signed-off-by: Namhyung Kim
Cc: Peter Zijlstra
Cc: Darren Hart
LKML-Reference:
Signed-off-by: Thomas Gleixner

Namhyung Kim
2010-09-18 18:19:21 +0800
1dcc41bb0 futex: Change 3rd arg of fetch_robust_entry() to unsigned int*

Sparse complains:
kernel/futex.c:2495:59: warning: incorrect type in argument 3 (different signedness)

Make 3rd argument of fetch_robust_entry() 'unsigned int'.

Signed-off-by: Namhyung Kim
Cc: Peter Zijlstra
Cc: Darren Hart
LKML-Reference:
Signed-off-by: Thomas Gleixner

Namhyung Kim
2010-09-18 18:19:21 +0800

01 Jul, 2010

1 commit

7a0ea09ad futex: futex_find_get_task remove credentails check

futex_find_get_task is currently used (through lookup_pi_state) from two
contexts, futex_requeue and futex_lock_pi_atomic. None of the paths
seems to need the credentials check, though. Different (e)uids
shouldn't matter at all, because the only thing that is important for
shared futexes is the accessibility of the shared memory.

The credentials check results in a glibc assert failure or a process hang
(if glibc is compiled without assert support) for a shared robust pthread
mutex with priority inheritance when a process tries to lock an already
held lock owned by a process with a different euid:

pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.

The problem is that futex_lock_pi_atomic, which is called when we try to
lock an already held lock, checks the current holder (whose tid is stored
in the futex value) to get the PI state. It uses lookup_pi_state, which in
turn gets the task struct from futex_find_get_task. ESRCH is returned
either when the task is not found or when the credentials check fails.

futex_lock_pi_atomic simply returns if it gets ESRCH. glibc, however,
does not expect a robust lock to return ESRCH, because it should get
either success or owner died.

Signed-off-by: Michal Hocko
Acked-by: Darren Hart
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Nick Piggin
Cc: Alexey Kuznetsov
Cc: Peter Zijlstra
Signed-off-by: Linus Torvalds

Michal Hocko
2010-07-01 06:43:44 +0800

03 Feb, 2010

3 commits

59647b6ac futex: Handle futex value corruption gracefully

The WARN_ON in lookup_pi_state which complains about a mismatch
between pi_state->owner->pid and the pid which we retrieved from the
user space futex is completely bogus.

The code just emits the warning and then continues, despite the fact
that it detected an inconsistent state of the futex. This makes it a
convenient way for user space to spam the syslog.

Replace the WARN_ON with a consistency check. If the values do not match,
return -EINVAL and let user space deal with the mess it created.

This also fixes the missing task_pid_vnr() when we compare the
pi_state->owner pid with the futex value.
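
The following is a minimal userspace sketch of the consistency check, modelled loosely on the description above. `struct pi_state_model` and `check_pi_owner()` are hypothetical. `owner_vpid` stands for `task_pid_vnr(pi_state->owner)`, the owner's pid as seen from the caller's pid namespace. `MODEL_TID_MASK` copies the value of `FUTEX_TID_MASK` from `<linux/futex.h>`.

```c
#include <assert.h>
#include <errno.h>

/* Same value as FUTEX_TID_MASK in <linux/futex.h>. */
#define MODEL_TID_MASK 0x3fffffffu

/* Hypothetical model: has_owner == 0 stands for pi_state->owner == NULL,
 * owner_vpid for task_pid_vnr(pi_state->owner). */
struct pi_state_model {
	int has_owner;
	int owner_vpid;
};

/* Returns 0 if the futex value agrees with the kernel's pi_state, or
 * -EINVAL if user space corrupted it (instead of WARN_ON and carry on). */
static int check_pi_owner(const struct pi_state_model *ps, unsigned int uval)
{
	int pid = (int)(uval & MODEL_TID_MASK);	/* strip waiter/died bits */

	/* pid == 0 or no recorded owner: nothing to compare against. */
	if (pid && ps->has_owner && pid != ps->owner_vpid)
		return -EINVAL;
	return 0;
}
```

Comparing against the namespace-local pid is what the `task_pid_vnr()` part of the fix addresses, since the futex value holds the tid as user space sees it.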

Reported-by: Jerome Marchand
Signed-off-by: Thomas Gleixner
Acked-by: Darren Hart
Acked-by: Peter Zijlstra
Cc:

Thomas Gleixner
2010-02-03 22:13:22 +0800
51246bfd1 futex: Handle user space corruption gracefully

If the owner of a PI futex dies, we fix up the pi_state and set
pi_state->owner to NULL. When a malicious or just sloppily programmed
user space application sets the futex value to 0, e.g. by calling
pthread_mutex_init(), the futex can be acquired again. A new
waiter manages to enqueue itself on the pi_state without damage, but on
unlock the kernel dereferences pi_state->owner and oopses.

Prevent this by checking pi_state->owner in the unlock path. If
pi_state->owner is not current, we know that user space manipulated the
futex value. Ignore the mess and return -EINVAL.

This catches the above case and also the case where a task hijacks the
futex by setting the tid value and then tries to unlock it.
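
The following is a minimal userspace sketch of the unlock-path check described above. `struct task_model`, `struct pi_unlock_model` and `unlock_pi_model()` are hypothetical. A task is identified by its address, and `owner == NULL` models a pi_state whose owner died.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical models of a task and of the pi_state owner field. */
struct task_model { int tid; };
struct pi_unlock_model { const struct task_model *owner; };

/* Only the recorded owner may hand the lock on. Anything else means user
 * space tampered with the futex value, so refuse instead of oopsing. */
static int unlock_pi_model(struct pi_unlock_model *ps,
			   const struct task_model *current_task,
			   const struct task_model *next_owner)
{
	if (ps->owner != current_task)
		return -EINVAL;	/* also catches owner == NULL: no dereference */
	ps->owner = next_owner;	/* hand the lock to the next waiter */
	return 0;
}
```

The same comparison covers both cases in the message: an orphaned pi_state (`owner == NULL`) and a task that hijacked the futex by writing its own tid.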

Reported-by: Jerome Marchand
Signed-off-by: Thomas Gleixner
Acked-by: Darren Hart
Acked-by: Peter Zijlstra
Cc:

Thomas Gleixner
2010-02-03 22:13:22 +0800
5ecb01cfd futex_lock_pi() key refcnt fix

This fixes a futex key reference count bug in futex_lock_pi(),
where a key's reference count is incremented twice but decremented
only once, so the backing object is never released.

If the futex is created in a temporary file in an ext3 file system,
this bug causes the file's inode to become an "undead" orphan,
which causes an oops from a BUG_ON() in ext3_put_super() when the
file system is unmounted. glibc's test suite is known to trigger this,
see .

The bug is a regression from 2.6.28-git3, namely Peter Zijlstra's
38d47c1b7075bd7ec3881141bb3629da58f88dab "[PATCH] futex: rely on
get_user_pages() for shared futexes". That commit made get_futex_key()
also increment the reference count of the futex key, and updated its
callers to decrement the key's reference count before returning.
Unfortunately the normal exit path in futex_lock_pi() wasn't corrected:
the reference count is incremented by get_futex_key() and queue_lock(),
but the normal exit path only decrements once, via unqueue_me_pi().
The fix is to call put_futex_key() after unqueue_me_pi(); since 2.6.31
this is easily done by 'goto out_put_key' rather than 'goto out'.
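
The following is a minimal userspace sketch of the reference-count imbalance and the one-label fix. It assumes only what the message states: get_futex_key() and queue_lock() each take a reference, and unqueue_me_pi() drops one. `struct key_model`, `get_key()`, `put_key()`, `queue_lock_model()`, `unqueue_model()` and `lock_pi_model()` are hypothetical names, not kernel functions.

```c
#include <assert.h>

/* Hypothetical model of a futex key: only its reference count matters. */
struct key_model {
	int refs;
};

static void get_key(struct key_model *k) { k->refs++; }
static void put_key(struct key_model *k) { assert(k->refs > 0); k->refs--; }

/* Models queue_lock() taking a second reference on the key. */
static void queue_lock_model(struct key_model *k) { get_key(k); }
/* Models unqueue_me_pi() dropping the queue's reference. */
static void unqueue_model(struct key_model *k) { put_key(k); }

/* Normal (successful) exit path of a futex_lock_pi()-like function;
 * 'fixed' selects the post-fix 'goto out_put_key'. */
static int lock_pi_model(struct key_model *k, int fixed)
{
	int ret = 0;

	get_key(k);		/* get_futex_key(): reference #1 */
	queue_lock_model(k);	/* queue_lock(): reference #2 */

	/* ... lock acquired ... */

	unqueue_model(k);	/* unqueue_me_pi(): drops reference #2 */
	if (fixed)
		goto out_put_key;
	goto out;		/* pre-fix path: reference #1 leaks */

out_put_key:
	put_key(k);		/* put_futex_key(): drops reference #1 */
out:
	return ret;
}
```

With the pre-fix path, each successful lock leaves one extra reference behind, which is what keeps the inode alive as an "undead" orphan.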

Signed-off-by: Mikael Pettersson
Acked-by: Peter Zijlstra
Acked-by: Darren Hart
Signed-off-by: Thomas Gleixner
Cc:

Mikael Pettersson
2010-02-03 22:13:22 +0800