20 Jul, 2007
40 commits
-
Lguest block driver
A simple block driver for lguest.
Signed-off-by: Rusty Russell
Cc: Andi Kleen
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Lguest net driver
A simple net driver for lguest.
[akpm@linux-foundation.org: include fix]
Signed-off-by: Rusty Russell
Cc: Andi Kleen
Cc: Jeff Garzik
Acked-by: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A simple console driver for lguest.
Signed-off-by: Rusty Russell
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This is the Kconfig and Makefile to allow lguest to actually be
compiled.Signed-off-by: Rusty Russell
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This is the structure offsets required by lg.ko's switcher.S.
Unfortunately we don't have infrastructure for private asm-offsets
creation.Signed-off-by: Rusty Russell
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This is the code for the "lg.ko" module, which allows lguest guests to
be launched.[akpm@linux-foundation.org: update for futex-new-private-futexes]
[akpm@linux-foundation.org: build fix]
[jmorris@namei.org: lguest: use hrtimers]
[akpm@linux-foundation.org: x86_64 build fix]
Signed-off-by: Rusty Russell
Cc: Andi Kleen
Cc: Eric Dumazet
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
lguest is a simple hypervisor for Linux on Linux. Unlike kvm it doesn't need
VT/SVM hardware. Unlike Xen it's simply "modprobe and go". Unlike both, it's
5000 lines and self-contained.Performance is ok, but not great (-30% on kernel compile). But given its
hackability, I expect this to improve, along with the paravirt_ops code which
it supplies a complete example for. There's also a 64-bit version being
worked on and other craziness.But most of all, lguest is awesome fun! Too much of the kernel is a big ball
of hair. lguest is simple enough to dive into and hack, plus has some warts
which scream "fork me!".This patch:
This is the code and headers required to make an i386 kernel an lguest guest.
Signed-off-by: Rusty Russell
Cc: Andi Kleen
Cc: Jeremy Fitzhardinge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
lguest does some fairly lowlevel things to support a host, which
normal modules don't need:math_state_restore:
When the guest triggers a Device Not Available fault, we need
to be able to restore the FPU__put_task_struct:
We need to hold a reference to another task for inter-guest
I/O, and put_task_struct() is an inline function which calls
__put_task_struct.access_process_vm:
We need to access another task for inter-guest I/O.map_vm_area & __get_vm_area:
We need to map the switcher shim (ie. monitor) at 0xFFC01000.Signed-off-by: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Adds support for periodic irq enabling in rtc-cmos. This could be used by
the ALSA driver and is already being tested with the zaptel ztdummy module.Signed-off-by: Alessandro Zummo
Cc: David Brownell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Share a little common code, reverse the arguments for consistency, drop the
unnecessary "inline", and lowercase the name.Signed-off-by: "J. Bruce Fields"
Acked-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
EX_RDONLY is only called in one place; just put it there.
Signed-off-by: "J. Bruce Fields"
Acked-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We can now assume that rqst_exp_get_by_name() does not return NULL; so clean
up some unnecessary checks.Signed-off-by: "J. Bruce Fields"
Acked-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I converted the various export-returning functions to return -ENOENT instead
of NULL, but missed a few cases.This particular case could cause actual bugs in the case of a krb5 client that
doesn't match any ip-based client and that is trying to access a filesystem
not exported to krb5 clients.Signed-off-by: "J. Bruce Fields"
Acked-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The value of nperbucket calculated here is too small--we should be rounding up
instead of down--with the result that the index j in the following loop can
overflow the raparm_hash array. At least in my case, the next thing in memory
turns out to be export_table, so the symptoms I see are crashes caused by the
appearance of four zeroed-out export entries in the first bucket of the hash
table of exports (which were actually entries in the readahead cache, a
pointer to which had been written to the export table in this initialization
code).It looks like the bug was probably introduced with commit
fce1456a19f5c08b688c29f00ef90fdfa074c79b ("knfsd: make the readahead params
cache SMP-friendly").Cc:
Cc: Greg Banks
Signed-off-by: "J. Bruce Fields"
Acked-by: NeilBrown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
page-writeback accounting is presently performed in the page-flags macros.
This is inconsistent and a bit ugly and makes it awkward to implement
per-backing_dev under-writeback page accounting.So move this accounting down to the callsite(s).
Acked-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
clocksource_adjust() has a clock argument, which shadows the file global clock
variable. Fix this up.Signed-off-by: Thomas Gleixner
Cc: john stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove is_in_rom() function. It doesn't actually serve the purpose it was
intended to. If you look at the use of it _access_ok() (which is the only use
of it) then it is obvious that most of memory is marked as access_ok. No
point having is_in_rom() then, so remove it.Signed-off-by: Greg Ungerer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In die_if_kernel() start the stack dump at the exception-time SP, not at the
SP with all the saved registers; the stack below exception-time sp contains
only exception-saved values and is already printed in details just before.Signed-off-by: Philippe De Muyter
Signed-off-by: Greg Ungerer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change the m68knommu irq handling to use the generic irq framework.
Signed-off-by: Greg Ungerer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use appropriate accessor function to set compound page destructor
function.Cc: William Irwin
Signed-off-by: Akinobu Mita
Acked-by: Adam Litke
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The fix to that race in alloc_fresh_huge_page() which could give an illegal
node ID did not need nid_lock at all: the fix was to replace static int nid
by static int prev_nid and do the work on local int nid. nid_lock did make
sure that racers strictly roundrobin the nodes, but that's not something we
need to enforce strictly. Kill nid_lock.Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There is check_reset() -- global function in drivers/isdn/sc/
There is check_reset -- variable holding module param in aacraid driver.On allyesconfig they clash with:
LD drivers/built-in.o
drivers/isdn/built-in.o: In function `check_reset':
: multiple definition of `check_reset'
drivers/scsi/built-in.o:(.data+0xe458): first defined here
ld: Warning: size of symbol `check_reset' changed from 4 in drivers/scsi/built-in.o to 219 in drivers/isdn/built-in.o
ld: Warning: type of symbol `check_reset' changed from 1 to 2 in drivers/isdn/built-in.oRename the former.
Signed-off-by: Alexey Dobriyan
Cc: Karsten Keil
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I've noticed lots of failures of vmalloc_32 on machines where it
shouldn't have failed unless it was doing an atomic operation.Looking closely, I noticed that:
#if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
#define GFP_VMALLOC32 GFP_DMA32
#elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
#define GFP_VMALLOC32 GFP_DMA
#else
#define GFP_VMALLOC32 GFP_KERNEL
#endifWhich seems to be incorrect, it should always -or- in the DMA flags
on top of GFP_KERNEL, thus this patch.This fixes frequent errors launchin X with the nouveau DRM for example.
Signed-off-by: Benjamin Herrenschmidt
Cc: Andi Kleen
Cc: Dave Airlie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Work around a possible bug in the FRV compiler.
What appears to be happening is that gcc resolves the
__builtin_constant_p() in kmalloc() to true, but then fails to reduce the
therefore constant conditions in the if-statements it guards to constant
results.When compiling with -O2 or -Os, one single spurious error crops up in
cpuup_callback() in mm/slab.c. This can be avoided by making the memsize
variable const.Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
mm/hugetlb.c: In function `dequeue_huge_page':
mm/hugetlb.c:72: warning: 'nid' might be used uninitialized in this functionCc: Christoph Lameter
Cc: Adam Litke
Cc: David Gibson
Cc: William Lee Irwin III
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc).
Here is a short excerpt of the semantic patch performing
this transformation:@@
type T2;
expression x;
identifier f,fld;
expression E;
expression E1,E2;
expression e1,e2,e3,y;
statement S;
@@x =
- kmalloc
+ kzalloc
(E1,E2)
... when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\)
- memset((T2)x,0,E1);@@
expression E1,E2,E3;
@@- kzalloc(E1 * E2,E3)
+ kcalloc(E1,E2,E3)[akpm@linux-foundation.org: get kcalloc args the right way around]
Signed-off-by: Yoann Padioleau
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Acked-by: Russell King
Cc: Bryan Wu
Acked-by: Jiri Slaby
Cc: Dave Airlie
Acked-by: Roland Dreier
Cc: Jiri Kosina
Acked-by: Dmitry Torokhov
Cc: Benjamin Herrenschmidt
Acked-by: Mauro Carvalho Chehab
Acked-by: Pierre Ossman
Cc: Jeff Garzik
Cc: "David S. Miller"
Acked-by: Greg KH
Cc: James Bottomley
Cc: "Antonino A. Daplas"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The print_stack_trace macro in stacktrace.h has a wrong number of
arguments, fix it.Signed-off-by: Johannes Berg
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When I started adding support for lockdep to 64-bit powerpc, I got a
lockdep_init_error and with this patch was able to pinpoint why and where
to put lockdep_init(). Let's support this generally for others adding
lockdep support to their architecture.Signed-off-by: Johannes Berg
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
optionally add class->name_version and class->subclass to the class name
Signed-off-by: Peter Zijlstra
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
__acquire
|
lock _____
| \
| __contended
| |
| wait
| _______/
|/
|
__acquired
|
__release
|
unlockWe measure acquisition and contention bouncing.
This is done by recording a cpu stamp in each lock instance.
Contention bouncing requires the cpu stamp to be set on acquisition. Hence we
move __acquired into the generic path.__acquired is then used to measure acquisition bouncing by comparing the
current cpu with the old stamp before replacing it.__contended is used to measure contention bouncing (only useful for preemptable
locks)[akpm@linux-foundation.org: cleanups]
Signed-off-by: Peter Zijlstra
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
the two init sites resulted in inconsistend names for the lock class.
Signed-off-by: Peter Zijlstra
Acked-by: Ingo Molnar
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
- update the copyright notices
- use the default hash function
- fix a thinko in a BUILD_BUG_ON
- add a WARN_ON to spot inconsitent naming
- fix a termination issue in /proc/lock_stat[akpm@linux-foundation.org: cleanups]
Signed-off-by: Peter Zijlstra
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Call the new lockstat tracking functions from the various lock primitives.
Signed-off-by: Peter Zijlstra
Acked-by: Ingo Molnar
Acked-by: Jason Baron
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Present all this fancy new lock statistics information:
*warning, _wide_ output ahead*
(output edited for purpose of brevity)
# cat /proc/lock_stat
lock_stat version 0.1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
class name contentions waittime-min waittime-max waittime-total acquisitions holdtime-min holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------&inode->i_mutex: 14458 6.57 398832.75 2469412.23 6768876 0.34 11398383.65 339410830.89
---------------
&inode->i_mutex 4486 [] pipe_wait+0x86/0x8d
&inode->i_mutex 0 [] pipe_write_fasync+0x29/0x5d
&inode->i_mutex 0 [] pipe_read+0x74/0x3a5
&inode->i_mutex 0 [] do_lookup+0x81/0x1ae.................................................................................................................................................................
&inode->i_data.tree_lock-W: 491 0.27 62.47 493.89 2477833 0.39 468.89 1146584.25
&inode->i_data.tree_lock-R: 65 0.44 4.27 48.78 26288792 0.36 184.62 10197458.24
--------------------------
&inode->i_data.tree_lock 46 [] __do_page_cache_readahead+0x69/0x24f
&inode->i_data.tree_lock 31 [] add_to_page_cache+0x31/0xba
&inode->i_data.tree_lock 0 [] __do_page_cache_readahead+0xc2/0x24f
&inode->i_data.tree_lock 0 [] find_get_page+0x1a/0x58.................................................................................................................................................................
proc_inum_idr.lock: 0 0.00 0.00 0.00 36 0.00 65.60 148.26
proc_subdir_lock: 0 0.00 0.00 0.00 3049859 0.00 106.81 1563212.42
shrinker_rwsem-W: 0 0.00 0.00 0.00 5 0.00 1.73 3.68
shrinker_rwsem-R: 0 0.00 0.00 0.00 633 2.57 246.57 10909.76'contentions' and 'acquisitions' are the number of such events measured (since
the last reset). The waittime- and holdtime- (min, max, total) numbers are
presented in microseconds.If there are any contention points, the lock class is presented in the block
format (as i_mutex and tree_lock above), otherwise a single line of output is
presented.The output is sorted on absolute number of contentions (read + write), this
should get the worst offenders presented first, so that:# grep : /proc/lock_stat | head
will quickly show who's bad.
The stats can be reset using:
# echo 0 > /proc/lock_stat
[bunk@stusta.de: make 2 functions static]
[akpm@linux-foundation.org: fix printk warning]
Signed-off-by: Peter Zijlstra
Acked-by: Ingo Molnar
Acked-by: Jason Baron
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Introduce the core lock statistics code.
Lock statistics provides lock wait-time and hold-time (as well as the count
of corresponding contention and acquisitions events). Also, the first few
call-sites that encounter contention are tracked.Lock wait-time is the time spent waiting on the lock. This provides insight
into the locking scheme, that is, a heavily contended lock is indicative of
a too coarse locking scheme.Lock hold-time is the duration the lock was held, this provides a reference for
the wait-time numbers, so they can be put into perspective.1)
lock
2)
... do stuff ..
unlock
3)The time between 1 and 2 is the wait-time. The time between 2 and 3 is the
hold-time.The lockdep held-lock tracking code is reused, because it already collects locks
into meaningful groups (classes), and because it is an existing infrastructure
for lock instrumentation.Currently lockdep tracks lock acquisition with two hooks:
lock()
lock_acquire()
_lock()... code protected by lock ...
unlock()
lock_release()
_unlock()We need to extend this with two more hooks, in order to measure contention.
lock_contended() - used to measure contention events
lock_acquired() - completion of the contentionThese are then placed the following way:
lock()
lock_acquire()
if (!_try_lock())
lock_contended()
_lock()
lock_acquired()... do locked stuff ...
unlock()
lock_release()
_unlock()(Note: the try_lock() 'trick' is used to avoid instrumenting all platform
dependent lock primitive implementations.)It is also possible to toggle the two lockdep features at runtime using:
/proc/sys/kernel/prove_locking
/proc/sys/kernel/lock_stat(esp. turning off the O(n^2) prove_locking functionaliy can help)
[akpm@linux-foundation.org: build fixes]
[akpm@linux-foundation.org: nuke unneeded ifdefs]
Signed-off-by: Peter Zijlstra
Acked-by: Ingo Molnar
Acked-by: Jason Baron
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move code around to get fewer but larger #ifdef sections. Break some
in-function #ifdefs out into their own functions.Signed-off-by: Peter Zijlstra
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Ensure that all of the lock dependency tracking code is under
CONFIG_PROVE_LOCKING. This allows us to use the held lock tracking code for
other purposes.Signed-off-by: Peter Zijlstra
Acked-by: Ingo Molnar
Acked-by: Jason Baron
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use the lockdep infrastructure to track lock contention and other lock
statistics.It tracks lock contention events, and the first four unique call-sites that
encountered contention.It also measures lock wait-time and hold-time in nanoseconds. The minimum and
maximum times are tracked, as well as a total (which together with the number
of event can give the avg).All statistics are done per lock class, per write (exclusive state) and per read
(shared state).The statistics are collected per-cpu, so that the collection overhead is
minimized via having no global cachemisses.This new lock statistics feature is independent of the lock dependency checking
traditionally done by lockdep; it just shares the lock tracking code. It is
also possible to enable both and runtime disabled either component - thereby
avoiding the O(n^2) lock chain walks for instance.This patch:
raw_spinlock_t should not use lockdep (and doesn't) since lockdep itself
relies on it.Signed-off-by: Peter Zijlstra
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Jan Harkes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds