20 Jul, 2007

40 commits

  • Lguest block driver

    A simple block driver for lguest.

    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Lguest net driver

    A simple net driver for lguest.

    [akpm@linux-foundation.org: include fix]
    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Cc: Jeff Garzik
    Acked-by: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • A simple console driver for lguest.

    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • This is the Kconfig and Makefile to allow lguest to actually be
    compiled.

    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • This is the structure offsets required by lg.ko's switcher.S.

    Unfortunately we don't have infrastructure for private asm-offsets
    creation.

    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • This is the code for the "lg.ko" module, which allows lguest guests to
    be launched.

    [akpm@linux-foundation.org: update for futex-new-private-futexes]
    [akpm@linux-foundation.org: build fix]
    [jmorris@namei.org: lguest: use hrtimers]
    [akpm@linux-foundation.org: x86_64 build fix]
    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Cc: Eric Dumazet
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • lguest is a simple hypervisor for Linux on Linux. Unlike kvm it doesn't need
    VT/SVM hardware. Unlike Xen it's simply "modprobe and go". Unlike both, it's
    5000 lines and self-contained.

    Performance is ok, but not great (-30% on kernel compile). But given its
    hackability, I expect this to improve, along with the paravirt_ops code which
    it supplies a complete example for. There's also a 64-bit version being
    worked on and other craziness.

    But most of all, lguest is awesome fun! Too much of the kernel is a big ball
    of hair. lguest is simple enough to dive into and hack, plus has some warts
    which scream "fork me!".

    This patch:

    This is the code and headers required to make an i386 kernel an lguest guest.

    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • lguest does some fairly low-level things to support a host, which
    normal modules don't need:

    math_state_restore:
    When the guest triggers a Device Not Available fault, we need
    to be able to restore the FPU

    __put_task_struct:
    We need to hold a reference to another task for inter-guest
    I/O, and put_task_struct() is an inline function which calls
    __put_task_struct.

    access_process_vm:
    We need to access another task for inter-guest I/O.

    map_vm_area & __get_vm_area:
    We need to map the switcher shim (ie. monitor) at 0xFFC01000.

    Signed-off-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Adds support for periodic irq enabling in rtc-cmos. This could be used by
    the ALSA driver and is already being tested with the zaptel ztdummy module.

    Signed-off-by: Alessandro Zummo
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alessandro Zummo
     
  • Share a little common code, reverse the arguments for consistency, drop the
    unnecessary "inline", and lowercase the name.

    Signed-off-by: "J. Bruce Fields"
    Acked-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • EX_RDONLY is only called in one place; just put it there.

    Signed-off-by: "J. Bruce Fields"
    Acked-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • We can now assume that rqst_exp_get_by_name() does not return NULL; so clean
    up some unnecessary checks.

    Signed-off-by: "J. Bruce Fields"
    Acked-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • I converted the various export-returning functions to return -ENOENT instead
    of NULL, but missed a few cases.

    This particular case could cause actual bugs in the case of a krb5 client that
    doesn't match any ip-based client and that is trying to access a filesystem
    not exported to krb5 clients.

    Signed-off-by: "J. Bruce Fields"
    Acked-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • The value of nperbucket calculated here is too small--we should be rounding up
    instead of down--with the result that the index j in the following loop can
    overflow the raparm_hash array. At least in my case, the next thing in memory
    turns out to be export_table, so the symptoms I see are crashes caused by the
    appearance of four zeroed-out export entries in the first bucket of the hash
    table of exports (which were actually entries in the readahead cache, a
    pointer to which had been written to the export table in this initialization
    code).

    It looks like the bug was probably introduced with commit
    fce1456a19f5c08b688c29f00ef90fdfa074c79b ("knfsd: make the readahead params
    cache SMP-friendly").
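
    The rounding problem described above can be sketched in plain C. This is
    only an illustration: HASH_SIZE, the helper names and the sample
    cache size are assumptions made for the example, not the values or
    functions used in the nfsd code.

```c
#include <stddef.h>

/* Hypothetical bucket count, chosen only to make the arithmetic visible. */
enum { HASH_SIZE = 16 };

/* Rounds down: HASH_SIZE buckets of this size may not hold every entry. */
static size_t per_bucket_floor(size_t cache_size)
{
    return cache_size / HASH_SIZE;
}

/* Rounds up: HASH_SIZE buckets of this size always cover cache_size entries. */
static size_t per_bucket_ceil(size_t cache_size)
{
    return (cache_size + HASH_SIZE - 1) / HASH_SIZE;
}

/* Number of buckets touched when entries are dealt out nperbucket at a time. */
static size_t buckets_needed(size_t cache_size, size_t nperbucket)
{
    return (cache_size + nperbucket - 1) / nperbucket;
}
```

    With 100 entries, rounding down gives 6 per bucket and therefore needs 17
    buckets, one more than the 16 that exist; rounding up gives 7 per bucket
    and needs only 15.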

    Cc:
    Cc: Greg Banks
    Signed-off-by: "J. Bruce Fields"
    Acked-by: NeilBrown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • page-writeback accounting is presently performed in the page-flags macros.
    This is inconsistent and a bit ugly and makes it awkward to implement
    per-backing_dev under-writeback page accounting.

    So move this accounting down to the callsite(s).

    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • clocksource_adjust() has a clock argument, which shadows the file global clock
    variable. Fix this up.

    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Remove is_in_rom() function. It doesn't actually serve the purpose it was
    intended to. If you look at the use of it in _access_ok() (which is the only use
    of it) then it is obvious that most of memory is marked as access_ok. No
    point having is_in_rom() then, so remove it.

    Signed-off-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Ungerer
     
  • In die_if_kernel() start the stack dump at the exception-time SP, not at the
    SP with all the saved registers; the stack below exception-time sp contains
    only exception-saved values and is already printed in detail just before.

    Signed-off-by: Philippe De Muyter
    Signed-off-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Ungerer
     
  • Change the m68knommu irq handling to use the generic irq framework.

    Signed-off-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Ungerer
     
  • Use appropriate accessor function to set compound page destructor
    function.

    Cc: William Irwin
    Signed-off-by: Akinobu Mita
    Acked-by: Adam Litke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • The fix to that race in alloc_fresh_huge_page() which could give an illegal
    node ID did not need nid_lock at all: the fix was to replace static int nid
    by static int prev_nid and do the work on local int nid. nid_lock did make
    sure that racers strictly roundrobin the nodes, but that's not something we
    need to enforce strictly. Kill nid_lock.
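
    A minimal sketch of the prev_nid pattern, assuming a fixed node count and
    a hypothetical next_node_after() helper in place of the kernel's node
    iteration. It is not the hugetlb code itself.

```c
/* Hypothetical node count; real code would consult the online node map. */
enum { NR_NODES = 4 };

static int next_node_after(int nid)
{
    return (nid + 1) % NR_NODES;
}

/*
 * Pick the next node round-robin without a lock. All work is done on the
 * local nid, so a racing caller may cause two callers to pick the same
 * node, but neither can observe a half-updated, out-of-range value.
 */
static int pick_node(void)
{
    static int prev_nid;                    /* shared, updated without locking */
    int nid = next_node_after(prev_nid);    /* work on a local copy */

    prev_nid = nid;
    return nid;
}
```

    Strict round-robin ordering is lost under contention, which is the
    trade-off the commit message accepts.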

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • There is check_reset() -- global function in drivers/isdn/sc/
    There is check_reset -- variable holding module param in aacraid driver.

    On allyesconfig they clash with:

    LD drivers/built-in.o
    drivers/isdn/built-in.o: In function `check_reset':
    : multiple definition of `check_reset'
    drivers/scsi/built-in.o:(.data+0xe458): first defined here
    ld: Warning: size of symbol `check_reset' changed from 4 in drivers/scsi/built-in.o to 219 in drivers/isdn/built-in.o
    ld: Warning: type of symbol `check_reset' changed from 1 to 2 in drivers/isdn/built-in.o

    Rename the former.

    Signed-off-by: Alexey Dobriyan
    Cc: Karsten Keil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • I've noticed lots of failures of vmalloc_32 on machines where it
    shouldn't have failed unless it was doing an atomic operation.

    Looking closely, I noticed that:

    #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
    #define GFP_VMALLOC32 GFP_DMA32
    #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
    #define GFP_VMALLOC32 GFP_DMA
    #else
    #define GFP_VMALLOC32 GFP_KERNEL
    #endif

    This seems to be incorrect: it should always -or- in the DMA flags
    on top of GFP_KERNEL, thus this patch.

    This fixes frequent errors launching X with the nouveau DRM, for example.
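
    The effect of or-ing the zone flag on top of GFP_KERNEL can be sketched
    with stand-in constants. The DEMO_ names and bit values below are
    assumptions for illustration only; the real flags are defined in
    linux/gfp.h.

```c
/* Hypothetical flag values; they only mimic the shape of the real flags. */
#define DEMO_GFP_DMA32   0x04u
#define DEMO_GFP_WAIT    0x10u
#define DEMO_GFP_IO      0x40u
#define DEMO_GFP_FS      0x80u
#define DEMO_GFP_KERNEL  (DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)

/* Zone selector alone: none of the GFP_KERNEL behaviour bits are set. */
#define DEMO_VMALLOC32_BEFORE  DEMO_GFP_DMA32
/* Zone selector or-ed on top of the normal kernel allocation flags. */
#define DEMO_VMALLOC32_AFTER   (DEMO_GFP_DMA32 | DEMO_GFP_KERNEL)

/* An allocation may block only if the wait bit is present. */
static int may_sleep(unsigned int flags)
{
    return (flags & DEMO_GFP_WAIT) != 0;
}
```

    In this sketch the "before" definition behaves like an atomic allocation
    because the wait bit is missing, which matches the failures described
    above.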

    Signed-off-by: Benjamin Herrenschmidt
    Cc: Andi Kleen
    Cc: Dave Airlie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • Work around a possible bug in the FRV compiler.

    What appears to be happening is that gcc resolves the
    __builtin_constant_p() in kmalloc() to true, but then fails to reduce the
    therefore constant conditions in the if-statements it guards to constant
    results.

    When compiling with -O2 or -Os, one single spurious error crops up in
    cpuup_callback() in mm/slab.c. This can be avoided by making the memsize
    variable const.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • mm/hugetlb.c: In function `dequeue_huge_page':
    mm/hugetlb.c:72: warning: 'nid' might be used uninitialized in this function

    Cc: Christoph Lameter
    Cc: Adam Litke
    Cc: David Gibson
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc).

    Here is a short excerpt of the semantic patch performing
    this transformation:

    @@
    type T2;
    expression x;
    identifier f,fld;
    expression E;
    expression E1,E2;
    expression e1,e2,e3,y;
    statement S;
    @@

    x =
    - kmalloc
    + kzalloc
    (E1,E2)
    ... when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\)
    - memset((T2)x,0,E1);

    @@
    expression E1,E2,E3;
    @@

    - kzalloc(E1 * E2,E3)
    + kcalloc(E1,E2,E3)
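
    A user-space analogue of the transformation, using malloc/memset and
    calloc as stand-ins for kmalloc/memset and kcalloc. struct item and the
    function names are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

struct item {
    int id;
    char name[16];
};

/* Before: allocate, then clear in a separate step. */
static struct item *items_alloc_two_step(size_t n)
{
    struct item *p = malloc(n * sizeof(*p));    /* n * size is unchecked */

    if (p)
        memset(p, 0, n * sizeof(*p));
    return p;
}

/* After: a single call that returns zeroed memory for n elements. */
static struct item *items_alloc_zeroed(size_t n)
{
    return calloc(n, sizeof(struct item));      /* count first, then size */
}
```

    Both functions return memory with every field cleared; the second is
    shorter and leaves the multiplication to the allocator.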

    [akpm@linux-foundation.org: get kcalloc args the right way around]
    Signed-off-by: Yoann Padioleau
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Acked-by: Russell King
    Cc: Bryan Wu
    Acked-by: Jiri Slaby
    Cc: Dave Airlie
    Acked-by: Roland Dreier
    Cc: Jiri Kosina
    Acked-by: Dmitry Torokhov
    Cc: Benjamin Herrenschmidt
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Pierre Ossman
    Cc: Jeff Garzik
    Cc: "David S. Miller"
    Acked-by: Greg KH
    Cc: James Bottomley
    Cc: "Antonino A. Daplas"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yoann Padioleau
     
  • The print_stack_trace macro in stacktrace.h has a wrong number of
    arguments, fix it.

    Signed-off-by: Johannes Berg
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • When I started adding support for lockdep to 64-bit powerpc, I got a
    lockdep_init_error and with this patch was able to pinpoint why and where
    to put lockdep_init(). Let's support this generally for others adding
    lockdep support to their architecture.

    Signed-off-by: Johannes Berg
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • optionally add class->name_version and class->subclass to the class name

    Signed-off-by: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  •         __acquire
                |
               lock _____
                |        \
                |    __contended
                |         |
                |        wait
                | _______/
                |/
                |
           __acquired
                |
           __release
                |
             unlock

    We measure acquisition and contention bouncing.

    This is done by recording a cpu stamp in each lock instance.

    Contention bouncing requires the cpu stamp to be set on acquisition. Hence we
    move __acquired into the generic path.

    __acquired is then used to measure acquisition bouncing by comparing the
    current cpu with the old stamp before replacing it.

    __contended is used to measure contention bouncing (only useful for preemptable
    locks)
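
    The cpu stamp comparison can be sketched as follows. struct demo_lock,
    its fields and both functions are hypothetical, and the current CPU is
    passed in as a parameter rather than read from the scheduler.

```c
/* Hypothetical per-lock bookkeeping; names are illustrative only. */
struct demo_lock {
    int cpu;                        /* CPU that last acquired the lock */
    unsigned long acquisitions;
    unsigned long acq_bounces;      /* acquisitions on a different CPU */
    unsigned long contentions;
    unsigned long con_bounces;      /* contentions seen from a different CPU */
};

/* Called when a lock attempt has to wait. */
static void demo_contended(struct demo_lock *l, int this_cpu)
{
    l->contentions++;
    if (l->cpu != this_cpu)
        l->con_bounces++;
}

/* Called once the lock has been taken. */
static void demo_acquired(struct demo_lock *l, int this_cpu)
{
    l->acquisitions++;
    if (l->cpu != this_cpu)         /* compare with the old stamp ... */
        l->acq_bounces++;
    l->cpu = this_cpu;              /* ... before replacing it */
}
```

    The stamp is written only in demo_acquired(), which is why the
    acquisition hook has to run on every path, contended or not.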

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • The two init sites resulted in inconsistent names for the lock class.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • - update the copyright notices
    - use the default hash function
    - fix a thinko in a BUILD_BUG_ON
    - add a WARN_ON to spot inconsistent naming
    - fix a termination issue in /proc/lock_stat

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Call the new lockstat tracking functions from the various lock primitives.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Present all this fancy new lock statistics information:

    *warning, _wide_ output ahead*

    (output edited for purpose of brevity)

    # cat /proc/lock_stat
    lock_stat version 0.1
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------
    class name                      contentions   waittime-min   waittime-max waittime-total   acquisitions   holdtime-min   holdtime-max holdtime-total
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------

    &inode->i_mutex:                      14458           6.57      398832.75     2469412.23        6768876           0.34    11398383.65   339410830.89
    ---------------
    &inode->i_mutex                        4486  [] pipe_wait+0x86/0x8d
    &inode->i_mutex                           0  [] pipe_write_fasync+0x29/0x5d
    &inode->i_mutex                           0  [] pipe_read+0x74/0x3a5
    &inode->i_mutex                           0  [] do_lookup+0x81/0x1ae

    .................................................................................................................................................................

    &inode->i_data.tree_lock-W:             491           0.27          62.47         493.89        2477833           0.39         468.89     1146584.25
    &inode->i_data.tree_lock-R:              65           0.44           4.27          48.78       26288792           0.36         184.62    10197458.24
    --------------------------
    &inode->i_data.tree_lock                 46  [] __do_page_cache_readahead+0x69/0x24f
    &inode->i_data.tree_lock                 31  [] add_to_page_cache+0x31/0xba
    &inode->i_data.tree_lock                  0  [] __do_page_cache_readahead+0xc2/0x24f
    &inode->i_data.tree_lock                  0  [] find_get_page+0x1a/0x58

    .................................................................................................................................................................

    proc_inum_idr.lock:                       0           0.00           0.00           0.00             36           0.00          65.60         148.26
    proc_subdir_lock:                         0           0.00           0.00           0.00        3049859           0.00         106.81     1563212.42
    shrinker_rwsem-W:                         0           0.00           0.00           0.00              5           0.00           1.73           3.68
    shrinker_rwsem-R:                         0           0.00           0.00           0.00            633           2.57         246.57       10909.76

    'contentions' and 'acquisitions' are the number of such events measured (since
    the last reset). The waittime- and holdtime- (min, max, total) numbers are
    presented in microseconds.

    If there are any contention points, the lock class is presented in the block
    format (as i_mutex and tree_lock above), otherwise a single line of output is
    presented.

    The output is sorted by absolute number of contentions (read + write); this
    should get the worst offenders presented first, so that:

    # grep : /proc/lock_stat | head

    will quickly show who's bad.

    The stats can be reset using:

    # echo 0 > /proc/lock_stat

    [bunk@stusta.de: make 2 functions static]
    [akpm@linux-foundation.org: fix printk warning]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Introduce the core lock statistics code.

    Lock statistics provides lock wait-time and hold-time (as well as the count
    of corresponding contention and acquisitions events). Also, the first few
    call-sites that encounter contention are tracked.

    Lock wait-time is the time spent waiting on the lock. This provides insight
    into the locking scheme, that is, a heavily contended lock is indicative of
    a too coarse locking scheme.

    Lock hold-time is the duration the lock was held; this provides a reference for
    the wait-time numbers, so they can be put into perspective.

    1)
      lock
    2)
      ... do stuff ..
      unlock
    3)

    The time between 1 and 2 is the wait-time. The time between 2 and 3 is the
    hold-time.

    The lockdep held-lock tracking code is reused, because it already collects locks
    into meaningful groups (classes), and because it is an existing infrastructure
    for lock instrumentation.

    Currently lockdep tracks lock acquisition with two hooks:

    lock()
      lock_acquire()
      _lock()

    ... code protected by lock ...

    unlock()
      lock_release()
      _unlock()

    We need to extend this with two more hooks, in order to measure contention.

    lock_contended() - used to measure contention events
    lock_acquired() - completion of the contention

    These are then placed the following way:

    lock()
      lock_acquire()
      if (!_try_lock())
        lock_contended()
      _lock()
      lock_acquired()

    ... do locked stuff ...

    unlock()
      lock_release()
      _unlock()

    (Note: the try_lock() 'trick' is used to avoid instrumenting all platform
    dependent lock primitive implementations.)
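
    The try-lock placement can be illustrated in user space with a POSIX
    mutex. This is a sketch under stated assumptions: struct timed_mutex and
    the timed_lock()/timed_unlock() wrappers are invented for the example and
    are not the lockdep hooks; the statistics are only updated while the
    mutex is held, so the mutex itself protects them.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

struct timed_mutex {
    pthread_mutex_t mutex;
    unsigned long contentions;  /* lock attempts that had to wait */
    long long wait_ns;          /* total time spent waiting (points 1 to 2) */
    long long hold_ns;          /* total time held (points 2 to 3) */
    long long acquired_at;      /* timestamp of the current acquisition */
};

static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void timed_lock(struct timed_mutex *m)
{
    long long start = now_ns();                     /* point 1 */
    int contended = 0;

    if (pthread_mutex_trylock(&m->mutex) != 0) {    /* fast path failed */
        contended = 1;
        pthread_mutex_lock(&m->mutex);              /* now really wait */
    }
    m->acquired_at = now_ns();                      /* point 2 */
    if (contended) {
        m->contentions++;
        m->wait_ns += m->acquired_at - start;
    }
}

static void timed_unlock(struct timed_mutex *m)
{
    m->hold_ns += now_ns() - m->acquired_at;        /* point 3 */
    pthread_mutex_unlock(&m->mutex);
}
```

    Because the contention check is done with the generic try-lock call, the
    underlying lock implementation does not need to be modified.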

    It is also possible to toggle the two lockdep features at runtime using:

    /proc/sys/kernel/prove_locking
    /proc/sys/kernel/lock_stat

    (esp. turning off the O(n^2) prove_locking functionality can help)

    [akpm@linux-foundation.org: build fixes]
    [akpm@linux-foundation.org: nuke unneeded ifdefs]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Move code around to get fewer but larger #ifdef sections. Break some
    in-function #ifdefs out into their own functions.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Ensure that all of the lock dependency tracking code is under
    CONFIG_PROVE_LOCKING. This allows us to use the held lock tracking code for
    other purposes.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Use the lockdep infrastructure to track lock contention and other lock
    statistics.

    It tracks lock contention events, and the first four unique call-sites that
    encountered contention.

    It also measures lock wait-time and hold-time in nanoseconds. The minimum and
    maximum times are tracked, as well as a total (which together with the number
    of events can give the average).

    All statistics are done per lock class, per write (exclusive state) and per read
    (shared state).

    The statistics are collected per-cpu, so that the collection overhead is
    minimized via having no global cachemisses.
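
    A minimal sketch of min/max/total bookkeeping kept per CPU and merged
    only when read. struct demo_time and the three functions are
    hypothetical names for the example, and the per-CPU storage is modelled
    as a plain array.

```c
struct demo_time {
    long long min, max, total;
    unsigned long nr;           /* number of samples recorded */
};

/* Record one sample; each CPU would update only its own slot. */
static void demo_time_update(struct demo_time *t, long long sample)
{
    if (t->nr == 0 || sample < t->min)
        t->min = sample;
    if (t->nr == 0 || sample > t->max)
        t->max = sample;
    t->total += sample;
    t->nr++;
}

/* Combine the per-CPU slots into one result at read time. */
static struct demo_time demo_time_merge(const struct demo_time *percpu, int ncpus)
{
    struct demo_time sum = { 0, 0, 0, 0 };

    for (int i = 0; i < ncpus; i++) {
        const struct demo_time *t = &percpu[i];

        if (t->nr == 0)
            continue;           /* this CPU never saw the event */
        if (sum.nr == 0 || t->min < sum.min)
            sum.min = t->min;
        if (sum.nr == 0 || t->max > sum.max)
            sum.max = t->max;
        sum.total += t->total;
        sum.nr += t->nr;
    }
    return sum;
}

/* The average is derived from the total and the event count. */
static long long demo_time_avg(const struct demo_time *t)
{
    return t->nr ? t->total / (long long)t->nr : 0;
}
```

    Writers touch only their own slot, so the cost of combining the numbers
    is paid by the reader.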

    This new lock statistics feature is independent of the lock dependency checking
    traditionally done by lockdep; it just shares the lock tracking code. It is
    also possible to enable both and disable either component at runtime - thereby
    avoiding the O(n^2) lock chain walks for instance.

    This patch:

    raw_spinlock_t should not use lockdep (and doesn't) since lockdep itself
    relies on it.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes