23 Aug, 2010
1 commit
-
…/linux-2.6-rcu into core/rcu
21 Aug, 2010
8 commits
-
It's a really simple list, and several of the users want to go backwards
in it to find the previous vma. So rather than have to look up the
previous entry with 'find_vma_prev()' or something similar, just make it
doubly linked instead.

Tested-by: Ian Campbell
Signed-off-by: Linus Torvalds
-
kfifo_skip() is currently broken because its internal helper function is
missing. Add it.

Signed-off-by: Andrea Righi
Cc: Greg KH
Acked-by: Stefani Seibold
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Because list_empty() does not dereference any RCU-protected pointers, and
further does not pass such pointers to the caller (so that the caller
does not dereference them either), it is safe to use list_empty() on
RCU-protected lists. There is no need for a list_empty_rcu(). This
commit adds a comment stating this explicitly.

Requested-by: Andrew Morton
Signed-off-by: Paul E. McKenney
-
The CONFIG_PREEMPT_RCU kernel configuration parameter was recently
re-introduced, but as an indication of the type of RCU (preemptible
vs. non-preemptible) instead of as selecting a given implementation.
This commit uses CONFIG_PREEMPT_RCU to combine duplicate code
from include/linux/rcutiny.h and include/linux/rcutree.h into
include/linux/rcupdate.h. This commit also combines a few other pieces
of duplicate code that have accumulated.

Signed-off-by: Paul E. McKenney
-
It is illegal to wait for an SRCU grace period while within the
corresponding flavor of SRCU read-side critical section. Therefore,
this commit updates the srcu_read_lock() docbook accordingly.

Signed-off-by: Paul E. McKenney
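The rule can be pictured with a toy, single-threaded model. This is only a sketch: `struct toy_srcu` and the `toy_*` functions are hypothetical stand-ins, not the kernel SRCU API, and the model reports an error where real code would simply hang.

```c
#include <errno.h>

/* Toy, single-threaded stand-in for one SRCU domain (not the kernel API). */
struct toy_srcu {
    int readers;    /* read-side critical sections currently open */
};

/* Enter a read-side critical section of this domain. */
static void toy_srcu_read_lock(struct toy_srcu *sp)
{
    sp->readers++;
}

/* Leave a read-side critical section of this domain. */
static void toy_srcu_read_unlock(struct toy_srcu *sp)
{
    sp->readers--;
}

/*
 * A grace period can end only after every pre-existing reader of this
 * domain has exited. In this single-threaded toy any open reader is the
 * caller itself, so the wait could never finish; report -EDEADLK
 * instead of hanging.
 */
static int toy_synchronize_srcu(const struct toy_srcu *sp)
{
    if (sp->readers > 0)
        return -EDEADLK;
    return 0;
}
```

Waiting on a different domain while inside a reader is not flagged by the toy, which matches the "corresponding flavor" wording above.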
-
Combine the duplicate definitions of ULONG_CMP_GE(), ULONG_CMP_LT(),
and rcu_preempt_depth() into include/linux/rcupdate.h.

Signed-off-by: Paul E. McKenney
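For readers unfamiliar with these helpers, the sketch below shows the wraparound-tolerant comparison idiom in userspace. The macro bodies are written from memory and should be treated as an assumption rather than a quotation of include/linux/rcupdate.h; `counter_reached()` is a hypothetical helper added only for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Wraparound-tolerant comparisons of unsigned long counters.
 * Assumption: these bodies mirror the kernel macros of the same name;
 * check the kernel header before relying on them.
 */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/* Hypothetical helper: has `now` reached `target`, modulo wraparound? */
static bool counter_reached(unsigned long now, unsigned long target)
{
    return ULONG_CMP_GE(now, target);
}
```

Because the subtraction is done in unsigned arithmetic, a counter that has just wrapped past zero still compares as "ahead of" a value taken shortly before the wrap, provided the two values are less than half the counter range apart.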
-
When using a kernel debugger, a long sojourn in the debugger can get
you lots of RCU CPU stall warnings once you resume. This might not be
helpful, especially if you are using the system console. This patch
therefore allows RCU CPU stall warnings to be suppressed, but only for
the duration of the current set of grace periods.

This differs from Jason's original patch in that it adds support for
tiny RCU and preemptible RCU, and uses a slightly different method for
suppressing the RCU CPU stall warning messages.

Signed-off-by: Jason Wessel
Signed-off-by: Paul E. McKenney
Tested-by: Jason Wessel
-
The comment says that blocking is illegal in rcu_read_lock()-style
RCU read-side critical sections, which is no longer entirely true
given preemptible RCU. This commit provides a fix.

Suggested-by: David Miller
Signed-off-by: Paul E. McKenney
20 Aug, 2010
16 commits
-
Implement a small-memory-footprint uniprocessor-only implementation of
preemptible RCU. This implementation uses but a single blocked-tasks
list rather than the combinatorial number used per leaf rcu_node by
TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies
processing. This version also takes advantage of uniprocessor execution
to accelerate grace periods in the case where there are no readers.

The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU.
This implementation is a step towards having RCU implementation driven
off of the SMP and PREEMPT kernel configuration variables, which can
happen once this implementation has accumulated sufficient experience.

Removed ACCESS_ONCE() from __rcu_read_unlock() and added barrier() as
suggested by Steve Rostedt in order to avoid the compiler-reordering
issue noted by Mathieu Desnoyers (http://lkml.org/lkml/2010/8/16/183).

As can be seen below, CONFIG_TINY_PREEMPT_RCU represents almost 5Kbyte
savings compared to CONFIG_TREE_PREEMPT_RCU. Of course, for non-real-time
workloads, CONFIG_TINY_RCU is even better.

CONFIG_TREE_PREEMPT_RCU

   text    data     bss     dec   filename
     13       0       0      13   kernel/rcupdate.o
   6170     825      28    7023   kernel/rcutree.o
                           ----
                           7026   Total

CONFIG_TINY_PREEMPT_RCU

   text    data     bss     dec   filename
     13       0       0      13   kernel/rcupdate.o
   2081      81       8    2170   kernel/rcutiny.o
                           ----
                           2183   Total

CONFIG_TINY_RCU (non-preemptible)

   text    data     bss     dec   filename
     13       0       0      13   kernel/rcupdate.o
    719      25       0     744   kernel/rcutiny.o
                            ---
                            757   Total

Requested-by: Loïc Minier
Signed-off-by: Paul E. McKenney
-
RCU heads really don't need to be initialized. Their state before call_rcu()
really does not matter.

We need to keep init/destroy_rcu_head_on_stack() though, since we want
debugobjects to be able to keep track of these objects.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Mathieu Desnoyers
CC: David S. Miller
CC: "Paul E. McKenney"
CC: akpm@linux-foundation.org
CC: mingo@elte.hu
CC: laijs@cn.fujitsu.com
CC: dipankar@in.ibm.com
CC: josh@joshtriplett.org
CC: dvhltc@us.ibm.com
CC: niv@us.ibm.com
CC: tglx@linutronix.de
CC: peterz@infradead.org
CC: rostedt@goodmis.org
CC: Valdis.Kletnieks@vt.edu
CC: dhowells@redhat.com
CC: eric.dumazet@gmail.com
CC: Alexey Dobriyan
Signed-off-by: Paul E. McKenney
Reviewed-by: Josh Triplett
-
This adds annotations for RCU operations in core kernel components.

Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Cc: Al Viro
Cc: Jens Axboe
Cc: Andrew Morton
Reviewed-by: Josh Triplett
-
Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Cc: Manfred Spraul
Reviewed-by: Josh Triplett
-
Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Cc: Nick Piggin
Reviewed-by: Josh Triplett
-
Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Cc: Alan Cox
Reviewed-by: Josh Triplett
-
Make it explicit that new RCU read-side critical sections that start
after call_rcu() and synchronize_rcu() start might still be running
after the end of the relevant grace period.

Signed-off-by: Paul E. McKenney
Reviewed-by: Josh Triplett
-
find_task_by_vpid() says "Must be called under rcu_read_lock().". But due to
commit 3120438 "rcu: Disable lockdep checking in RCU list-traversal primitives",
we are currently unable to catch "find_task_by_vpid() with tasklist_lock held
but RCU lock not held" errors due to the RCU-lockdep checks being
suppressed in the RCU variants of the struct list_head traversals.
This commit therefore places an explicit check for being in an RCU
read-side critical section in find_task_by_pid_ns().

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/pid.c:386 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
1 lock held by rc.sysinit/1102:
 #0: (tasklist_lock){.+.+..}, at: [] sys_setpgid+0x40/0x160

stack backtrace:
Pid: 1102, comm: rc.sysinit Not tainted 2.6.35-rc3-dirty #1
Call Trace:
 [] lockdep_rcu_dereference+0x94/0xb0
 [] find_task_by_pid_ns+0x6d/0x70
 [] find_task_by_vpid+0x18/0x20
 [] sys_setpgid+0x47/0x160
 [] sysenter_do_call+0x12/0x36

Commit updated to use a new rcu_lockdep_assert() exported API rather than
the old internal __do_rcu_dereference().

Signed-off-by: Tetsuo Handa
Signed-off-by: Paul E. McKenney
Reviewed-by: Josh Triplett
-
Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Cc: Avi Kivity
Cc: Marcelo Tosatti
Reviewed-by: Josh Triplett
-
Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Cc: Dmitry Torokhov
Acked-by: Dmitry Torokhov
Reviewed-by: Josh Triplett
-
Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Acked-by: Trond Myklebust
-
Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Acked-by: David Howells
Reviewed-by: Josh Triplett
-
Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Ingo Molnar
Acked-by: David Howells
Reviewed-by: Josh Triplett
-
Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Acked-by: Paul Menage
Cc: Li Zefan
Reviewed-by: Josh Triplett
-
This avoids warnings from missing __rcu annotations
in the rculist implementation, making it possible to
use the same lists in both RCU and non-RCU cases.

We can add rculist annotations later, together with
lockdep support for rculist, which is missing as well,
but that may involve changing all the users.

Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Cc: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Reviewed-by: Josh Triplett
-
This commit provides definitions for the __rcu annotation defined earlier.
This annotation permits sparse to check for correct use of RCU-protected
pointers. If a pointer that is annotated with __rcu is accessed
directly (as opposed to via rcu_dereference(), rcu_assign_pointer(),
or one of their variants), sparse can be made to complain. To enable
such complaints, use the new default-disabled CONFIG_SPARSE_RCU_POINTER
kernel configuration option. Please note that these sparse complaints are
intended to be a debugging aid, -not- a code-style-enforcement mechanism.

There are special rcu_dereference_protected() and rcu_access_pointer()
accessors for use when RCU read-side protection is not required, for
example, when no other CPU has access to the data structure in question
or while the current CPU holds the update-side lock.

This patch also updates a number of docbook comments that were showing
their age.

Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Cc: Christopher Li
Reviewed-by: Josh Triplett
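The accessor discipline described above can be sketched in userspace with C11 atomics. This is an analogy only: `TOY_RCU`, `struct config`, and the three functions are hypothetical names, the marker macro expands to nothing (much as an annotation would for an ordinary compiler), and grace periods are not modeled at all.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical marker; expands to nothing for an ordinary compiler. */
#define TOY_RCU

struct config {
    int value;
};

/* Shared pointer that must only be touched through the accessors below. */
static _Atomic(struct config *) TOY_RCU current_config;

/* Updater: publish a fully initialised object (rcu_assign_pointer() analogue). */
static void publish_config(struct config *c)
{
    atomic_store_explicit(&current_config, c, memory_order_release);
}

/* Reader: fetch the pointer before dereferencing it (rcu_dereference() analogue). */
static struct config *read_config(void)
{
    return atomic_load_explicit(&current_config, memory_order_acquire);
}

/*
 * Update side, with the update lock held or no other CPU able to see the
 * structure (rcu_dereference_protected() analogue): no ordering is needed.
 */
static struct config *read_config_protected(void)
{
    return atomic_load_explicit(&current_config, memory_order_relaxed);
}
```

A checker in the spirit of the one described above would flag any use of `current_config` that bypasses these three functions.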
19 Aug, 2010
2 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
fs: brlock vfsmount_lock
fs: scale files_lock
lglock: introduce special lglock and brlock spin locks
tty: fix fu_list abuse
fs: cleanup files_lock locking
fs: remove extra lookup in __lookup_hash
fs: fs_struct rwlock to spinlock
apparmor: use task path helpers
fs: dentry allocation consolidation
fs: fix do_lookup false negative
mbcache: Limit the maximum number of cache entries
hostfs ->follow_link() braino
hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
remove SWRITE* I/O types
kill BH_Ordered flag
vfs: update ctime when changing the file's permission by setfacl
cramfs: only unlock new inodes
fix reiserfs_evict_inode end_writeback second call
-
* 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6:
spi.h: missing kernel-doc notation, please fix
of: fix missing headers for of_address_to_resource() in MTD and SysACE drivers
of: Fix missing includes
ata: update for of_device to platform_device replacement
microblaze: Fix of: eliminate of_device->node and dev_archdata->{of,prom}_node
microblaze: Fix of/address: Merge all of the bus translation code
booting-without-of: Remove nonexistent chapters from TOC, fix numbering
18 Aug, 2010
11 commits
-
fs: scale files_lock
Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).

One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variable in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from different cpu's list.

However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.

A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
degenerates to contending on a single lock, which is no worse than before. When
more than one CPU is allocating files, even if they are always freed by
different CPUs, there will be more parallelism than the single-lock case.

Testing results:

On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.

Booting:    locks=  25049 cpu-hits=  23174 (92.5%) node-hits=  23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64   locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)

So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.

Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.

                throughput
2.6.34-rc2      24.5
+patch          24.9

                us      sys     idle    IO wait (in %)
2.6.34-rc2      51.25   28.25   17.25   3.25
+patch          53.75   18.5    19      8.75

So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.

Single threaded performance difference was within the noise of microbenchmarks.
That is not to say a penalty does not exist: the code is larger and requires
more memory accesses, so it will be slightly slower.

Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen
Cc: Andi Kleen
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
-
lglock: introduce special lglock and brlock spin locks
This patch introduces "local-global" locks (lglocks). These can be used to:
- Provide fast exclusive access to per-CPU data, with exclusive access to
another CPU's data allowed but possibly subject to contention, and to provide
very slow exclusive access to all per-CPU data.
- Or to provide very fast and scalable read serialisation, and to provide
very slow exclusive serialisation of data (not necessarily per-CPU data).

Brlocks are also implemented as a short-hand notation for the latter use
case.

Thanks to Paul for local/global naming convention.

Cc: linux-kernel@vger.kernel.org
Cc: Al Viro
Cc: "Paul E. McKenney"
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
-
tty: fix fu_list abuse
tty code abuses fu_list, which causes a bug in remount,ro handling.
If a tty device node is opened on a filesystem, then the last link to the inode
removed, the filesystem will be allowed to be remounted readonly. This is
because fs_may_remount_ro does not find the 0 link tty inode on the file sb
list (because the tty code incorrectly removed it to use for its own purpose).
This can result in a filesystem with errors after it is marked "clean".

Taking idea from Christoph's initial patch, allocate a tty private struct
at file->private_data and put our required list fields in there, linking
file and tty. This makes tty nodes behave the same way as other device nodes
and avoid meddling with the vfs, and avoids this bug.

The error handling is not trivial in the tty code, so for this bugfix, I take
the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
This is not a problem because our allocator doesn't fail small allocs as a rule
anyway. So proper error handling is left as an exercise for tty hackers.

[ Arguably filesystem's device inode would ideally be divorced from the
driver's pseudo inode when it is opened, but in practice it's not clear whether
that will ever be worth implementing. ]

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig
Cc: Alan Cox
Cc: Greg Kroah-Hartman
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
-
fs: cleanup files_lock locking
Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig
Cc: Alan Cox
Acked-by: Andi Kleen
Acked-by: Greg Kroah-Hartman
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
-
fs: fs_struct rwlock to spinlock
struct fs_struct.lock is an rwlock with the read-side used to protect root and
pwd members while taking references to them. Taking a reference to a path
typically requires just 2 atomic ops, so the critical section is very small.
Parallel read-side operations would have cacheline contention on the lock, the
dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a
real parallelism increase.

Replace it with a spinlock to avoid one or two atomic operations in typical
path lookup fastpath.

Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
-
These flags aren't real I/O types, but tell ll_rw_block to always
lock the buffer instead of giving up on a failed trylock.

Instead add a new write_dirty_buffer helper that implements this semantic
and use it from the existing SWRITE* callers. Note that the ll_rw_block
code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
this patch fixes.

In the ufs code clean up the helper that used to call ll_rw_block
to mirror sync_dirty_buffer, which is the function it implements for
compound buffers.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
Instead of abusing a buffer_head flag just add a variant of
sync_dirty_buffer which allows passing the exact type of write
flag required.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
Added comments in kernel-doc notation for previously added struct fields.
Signed-off-by: Ernst Schwab
Acked-by: Randy Dunlap
Signed-off-by: Grant Likely
-
* master.kernel.org:/home/rmk/linux-2.6-arm:
VIDEO: amba clcd: don't disable an already disabled clock
ARM: Tighten check for allowable CPSR values
ARM: 6329/1: wire up sys_accept4() on ARM
ARM: 6328/1: Build with -fno-dwarf2-cfi-asm
ARM: 6326/1: kgdb: fix GDB_MAX_REGS no longer used
-
Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:

arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type

This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to. This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel(). A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().

do_execve() may not change any of the strings it is passed as part of the argv
or envp lists, as some of them are in .rodata, so marking these strings as
const should be fine.

Further kernel_execve() and sys_execve() need to be changed to match.

This has been test built on x86_64, frv, arm and mips.
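
To make the "consted twice" point concrete, here is a small userspace sketch. `count_args()` and `demo_argv` are hypothetical names invented for the example and are not part of the kernel interfaces named above.

```c
#include <stddef.h>

/*
 * Both levels are const: the callee may modify neither the pointer array
 * nor the strings it points to, so callers can safely pass string
 * literals that may live in read-only storage.
 */
static size_t count_args(const char *const *argv)
{
    size_t n = 0;

    while (argv[n] != NULL)
        n++;
    return n;
}

/* Hypothetical caller: a fully const, NULL-terminated argument vector. */
static const char *const demo_argv[] = { "/sbin/init", "--verbose", NULL };
```

With only one const (on the strings), the array itself could still be rewritten by the callee; with only the other (on the array), the strings could.
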
Signed-off-by: David Howells
Tested-by: Ralf Baechle
Acked-by: Russell King
Signed-off-by: Linus Torvalds
-
Fix the clock enable/disable tracking in the AMBA CLCD driver so
that the driver doesn't try to disable an already disabled clock,
thereby causing the clock (if shared) to become unbalanced.

This resolves a problem with CLCD on LPC32xx ARM platforms.

Reported-by: Kevin Wells
Signed-off-by: Russell King
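The kind of tracking described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the driver's code: `struct toy_clk`, `struct toy_fb`, and the `toy_fb_clk_*` functions are hypothetical, and the shared clock is reduced to a bare enable count.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a shared clock with an enable count. */
struct toy_clk {
    int enable_count;
};

/* Hypothetical per-device state; clk_enabled is the tracking flag. */
struct toy_fb {
    struct toy_clk *clk;
    bool clk_enabled;
};

/* Enable the clock on behalf of this device, at most once. */
static void toy_fb_clk_enable(struct toy_fb *fb)
{
    if (!fb->clk_enabled) {
        fb->clk_enabled = true;
        fb->clk->enable_count++;
    }
}

/* Disable the clock only if this device currently has it enabled. */
static void toy_fb_clk_disable(struct toy_fb *fb)
{
    if (fb->clk_enabled) {
        fb->clk_enabled = false;
        fb->clk->enable_count--;
    }
}
```

Without the flag, a second disable from one device would drop a reference that belongs to another user of the shared clock.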
16 Aug, 2010
1 commit
-
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
gcc-4.6: ACPI: fix unused but set variables in ACPI
ACPI thermal: make procfs I/F depend on CONFIG_ACPI_PROCFS
ACPI video: make procfs I/F depend on CONFIG_ACPI_PROCFS
ACPI processor: remove deprecated ACPI procfs I/F
ACPI power_resource: remove unused procfs I/F
ACPI: remove deprecated ACPI procfs I/F
ACPI: introduce drivers/acpi/sysfs.c
ACPI: introduce module parameter acpi.aml_debug_output
ACPI: introduce drivers/acpi/debugfs.c
ACPI, APEI, ERST debug support
ACPI, APEI, Manage GHES as platform devices
ACPI, APEI, Rename CPER and GHES severity constants
ACPI, APEI, Fix a typo of error path of apei_resources_request
ACPI / ACPICA: Fix reference counting problems with GPE handlers
ACPI: Add the check of ADR flag in course of finding ACPI handle for PCI device
ACPI / Sleep: Drop acpi_suspend_finish()
ACPI / Sleep: Consolidate suspend and hibernation routines
ACPI / Wakeup: Simplify enabling of wakeup devices
ACPI / Sleep: Rework enabling wakeup devices
ACPI / Sleep: Free NVS copy if suspending of devices fails

Fixed up totally buggered "ACPI: fix unused but set variables in ACPI"
patch that doesn't even compile in the merge.

Thanks to Sedat Dilek for noticing the
breakage before I even pulled. And a big "Grrr.." at Len for not even
bothering to compile the tree before asking me to pull.
15 Aug, 2010
1 commit
-
Conflicts:
drivers/acpi/debug.c

Signed-off-by: Len Brown