20 Jul, 2007

40 commits

  • Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc).

    Here is a short excerpt of the semantic patch performing
    this transformation:

    @@
    type T2;
    expression x;
    identifier f,fld;
    expression E;
    expression E1,E2;
    expression e1,e2,e3,y;
    statement S;
    @@

    x =
    - kmalloc
    + kzalloc
    (E1,E2)
    ... when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\)
    - memset((T2)x,0,E1);

    @@
    expression E1,E2,E3;
    @@

    - kzalloc(E1 * E2,E3)
    + kcalloc(E1,E2,E3)
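
    As a rough userspace analogy (not kernel code), malloc()/memset() and
    calloc() can stand in for kmalloc()/memset() and kzalloc()/kcalloc() to show
    the shape of the code before and after the transformation. All helper names
    below are hypothetical, and the explicit overflow check only illustrates why
    the array form takes the element count and element size separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Before: allocate, then zero in a second step (kmalloc + memset shape). */
static void *alloc_then_zero(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, 0, size);
	return p;
}

/* After: a single call that returns zeroed memory (kzalloc shape). */
static void *alloc_zeroed(size_t size)
{
	return calloc(1, size);
}

/* Array form (kcalloc shape): element count first, element size second. */
static void *alloc_zeroed_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow */
	return calloc(n, size);
}

/* Helper for checking a result: returns 1 if all bytes are zero. */
static int is_zeroed(const void *p, size_t size)
{
	const unsigned char *b = p;

	assert(p != NULL);
	for (size_t i = 0; i < size; i++)
		if (b[i] != 0)
			return 0;
	return 1;
}
```

    Note the argument order in the array form: count first, then size.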

    [akpm@linux-foundation.org: get kcalloc args the right way around]
    Signed-off-by: Yoann Padioleau
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Acked-by: Russell King
    Cc: Bryan Wu
    Acked-by: Jiri Slaby
    Cc: Dave Airlie
    Acked-by: Roland Dreier
    Cc: Jiri Kosina
    Acked-by: Dmitry Torokhov
    Cc: Benjamin Herrenschmidt
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Pierre Ossman
    Cc: Jeff Garzik
    Cc: "David S. Miller"
    Acked-by: Greg KH
    Cc: James Bottomley
    Cc: "Antonino A. Daplas"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yoann Padioleau
     
  • The print_stack_trace macro in stacktrace.h takes the wrong number of
    arguments; fix it.

    Signed-off-by: Johannes Berg
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • When I started adding support for lockdep to 64-bit powerpc, I got a
    lockdep_init_error and with this patch was able to pinpoint why and where
    to put lockdep_init(). Let's support this generally for others adding
    lockdep support to their architecture.

    Signed-off-by: Johannes Berg
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • optionally add class->name_version and class->subclass to the class name

    Signed-off-by: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • __acquire
        |
       lock _____
        |        \
        |    __contended
        |         |
        |        wait
        | _______/
        |/
        |
    __acquired
        |
    __release
        |
      unlock

    We measure acquisition and contention bouncing.

    This is done by recording a cpu stamp in each lock instance.

    Contention bouncing requires the cpu stamp to be set on acquisition. Hence we
    move __acquired into the generic path.

    __acquired is then used to measure acquisition bouncing by comparing the
    current cpu with the old stamp before replacing it.

    __contended is used to measure contention bouncing (only useful for preemptable
    locks)
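
    The stamp comparison described above can be sketched as follows. This is a
    simplified model, not the lockdep code: the struct, the hook names and the
    NO_CPU sentinel are hypothetical, and it assumes that the very first
    acquisition of a lock does not count as a bounce.

```c
#include <assert.h>

#define NO_CPU (-1)	/* hypothetical "never acquired" stamp */

struct lock_instance {
	int cpu_stamp;			/* CPU that last acquired the lock */
	unsigned long acq_bounces;	/* acquisitions on a different CPU */
	unsigned long con_bounces;	/* contentions seen from a different CPU */
};

static void lock_instance_init(struct lock_instance *l)
{
	l->cpu_stamp = NO_CPU;
	l->acq_bounces = 0;
	l->con_bounces = 0;
}

/* Contention hook: the stamp still names the CPU of the last acquisition. */
static void on_contended(struct lock_instance *l, int cpu)
{
	assert(cpu >= 0);
	if (l->cpu_stamp != NO_CPU && l->cpu_stamp != cpu)
		l->con_bounces++;
}

/* Acquisition hook: compare with the old stamp, then replace it. */
static void on_acquired(struct lock_instance *l, int cpu)
{
	assert(cpu >= 0);
	if (l->cpu_stamp != NO_CPU && l->cpu_stamp != cpu)
		l->acq_bounces++;
	l->cpu_stamp = cpu;
}
```

    Because the contention hook only reads the stamp, the stamp has to be
    written on every acquisition, contended or not.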

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • The two init sites resulted in inconsistent names for the lock class.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • - update the copyright notices
    - use the default hash function
    - fix a thinko in a BUILD_BUG_ON
    - add a WARN_ON to spot inconsistent naming
    - fix a termination issue in /proc/lock_stat

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Call the new lockstat tracking functions from the various lock primitives.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Present all this fancy new lock statistics information:

    *warning, _wide_ output ahead*

    (output edited for purpose of brevity)

    # cat /proc/lock_stat
    lock_stat version 0.1
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------
                                  class name    contentions   waittime-min   waittime-max waittime-total   acquisitions   holdtime-min   holdtime-max holdtime-total
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------

                             &inode->i_mutex:         14458           6.57      398832.75     2469412.23        6768876           0.34    11398383.65   339410830.89
                             ---------------
                             &inode->i_mutex           4486 [] pipe_wait+0x86/0x8d
                             &inode->i_mutex              0 [] pipe_write_fasync+0x29/0x5d
                             &inode->i_mutex              0 [] pipe_read+0x74/0x3a5
                             &inode->i_mutex              0 [] do_lookup+0x81/0x1ae

    .................................................................................................................................................................

                  &inode->i_data.tree_lock-W:           491           0.27          62.47         493.89        2477833           0.39         468.89     1146584.25
                  &inode->i_data.tree_lock-R:            65           0.44           4.27          48.78       26288792           0.36         184.62    10197458.24
                  --------------------------
                    &inode->i_data.tree_lock             46 [] __do_page_cache_readahead+0x69/0x24f
                    &inode->i_data.tree_lock             31 [] add_to_page_cache+0x31/0xba
                    &inode->i_data.tree_lock              0 [] __do_page_cache_readahead+0xc2/0x24f
                    &inode->i_data.tree_lock              0 [] find_get_page+0x1a/0x58

    .................................................................................................................................................................

                          proc_inum_idr.lock:             0           0.00           0.00           0.00             36           0.00          65.60         148.26
                            proc_subdir_lock:             0           0.00           0.00           0.00        3049859           0.00         106.81     1563212.42
                            shrinker_rwsem-W:             0           0.00           0.00           0.00              5           0.00           1.73           3.68
                            shrinker_rwsem-R:             0           0.00           0.00           0.00            633           2.57         246.57       10909.76

    'contentions' and 'acquisitions' are the number of such events measured (since
    the last reset). The waittime- and holdtime- (min, max, total) numbers are
    presented in microseconds.

    If there are any contention points, the lock class is presented in the block
    format (as i_mutex and tree_lock above), otherwise a single line of output is
    presented.

    The output is sorted by the absolute number of contentions (read + write).
    This should present the worst offenders first, so that:

    # grep : /proc/lock_stat | head

    will quickly show who's bad.

    The stats can be reset using:

    # echo 0 > /proc/lock_stat
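
    For post-processing, a class line such as the ones shown above can be picked
    apart with sscanf(). This is only a sketch that assumes the version 0.1
    column order described here (name, contentions, three wait times,
    acquisitions, three hold times); the struct and function names are
    hypothetical.

```c
#include <assert.h>
#include <stdio.h>

struct lock_stat_line {
	char name[64];
	unsigned long contentions;
	double wait_min, wait_max, wait_total;	/* microseconds */
	unsigned long acquisitions;
	double hold_min, hold_max, hold_total;	/* microseconds */
};

/* Parse one "name: ..." class line; returns 1 on success, 0 otherwise. */
static int parse_lock_stat_line(const char *line, struct lock_stat_line *out)
{
	assert(line != NULL && out != NULL);
	return sscanf(line, " %63[^:]: %lu %lf %lf %lf %lu %lf %lf %lf",
		      out->name, &out->contentions,
		      &out->wait_min, &out->wait_max, &out->wait_total,
		      &out->acquisitions,
		      &out->hold_min, &out->hold_max, &out->hold_total) == 9;
}

/* Average wait per contention in microseconds; 0 if never contended. */
static double avg_wait(const struct lock_stat_line *s)
{
	return s->contentions ? s->wait_total / s->contentions : 0.0;
}
```

    Contention-point lines carry no colon after the class name, so the parser
    rejects them and only the per-class summary lines are picked up.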

    [bunk@stusta.de: make 2 functions static]
    [akpm@linux-foundation.org: fix printk warning]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Introduce the core lock statistics code.

    Lock statistics provides lock wait-time and hold-time (as well as the count
    of corresponding contention and acquisition events). Also, the first few
    call-sites that encounter contention are tracked.

    Lock wait-time is the time spent waiting on the lock. This provides insight
    into the locking scheme, that is, a heavily contended lock is indicative of
    a too coarse locking scheme.

    Lock hold-time is the duration the lock was held. It provides a reference for
    the wait-time numbers, so they can be put into perspective.

    1)
    lock
    2)
    ... do stuff ..
    unlock
    3)

    The time between 1 and 2 is the wait-time. The time between 2 and 3 is the
    hold-time.
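
    Given timestamps taken at points 1, 2 and 3, the bookkeeping could look like
    the sketch below. The struct and function names are hypothetical, the stats
    are assumed to start out zeroed, and the unit is whatever the timestamps use.

```c
#include <assert.h>
#include <stdint.h>

struct time_stat {
	uint64_t min, max, total;
	unsigned long nr;	/* number of samples */
};

/* Fold one sample into the running min/max/total. */
static void time_stat_add(struct time_stat *s, uint64_t t)
{
	if (s->nr == 0 || t < s->min)
		s->min = t;
	if (t > s->max)
		s->max = t;
	s->total += t;
	s->nr++;
}

/* t1: about to take the lock, t2: lock obtained, t3: lock released. */
static void account_lock_cycle(struct time_stat *wait, struct time_stat *hold,
			       uint64_t t1, uint64_t t2, uint64_t t3)
{
	assert(t1 <= t2 && t2 <= t3);
	time_stat_add(wait, t2 - t1);	/* wait-time: between 1 and 2 */
	time_stat_add(hold, t3 - t2);	/* hold-time: between 2 and 3 */
}
```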

    The lockdep held-lock tracking code is reused, because it already collects locks
    into meaningful groups (classes), and because it is an existing infrastructure
    for lock instrumentation.

    Currently lockdep tracks lock acquisition with two hooks:

    lock()
    lock_acquire()
    _lock()

    ... code protected by lock ...

    unlock()
    lock_release()
    _unlock()

    We need to extend this with two more hooks, in order to measure contention.

    lock_contended() - used to measure contention events
    lock_acquired() - completion of the contention

    These are then placed the following way:

    lock()
    lock_acquire()
    if (!_try_lock())
    lock_contended()
    _lock()
    lock_acquired()

    ... do locked stuff ...

    unlock()
    lock_release()
    _unlock()

    (Note: the try_lock() 'trick' is used to avoid instrumenting all platform
    dependent lock primitive implementations.)
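
    A toy userspace rendering of that hook placement, using a C11 atomic_flag
    spinlock in place of a real lock primitive. Everything here is a hypothetical
    stand-in: raw_try_lock()/raw_lock()/raw_unlock() play the role of the
    platform primitives, and lock_acquire()/lock_release() appear only as
    comments because dependency tracking is out of scope for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_lock {
	atomic_flag held;		/* the "platform" lock word */
	atomic_ulong contentions;	/* bumped before the lock is held */
	unsigned long acquisitions;	/* only touched while holding the lock */
};

static void toy_lock_init(struct toy_lock *l)
{
	assert(l != NULL);
	atomic_flag_clear(&l->held);
	atomic_init(&l->contentions, 0);
	l->acquisitions = 0;
}

static int raw_try_lock(struct toy_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

static void raw_lock(struct toy_lock *l)
{
	while (atomic_flag_test_and_set(&l->held))
		;	/* spin until the holder clears the flag */
}

static void raw_unlock(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

static void lock_contended(struct toy_lock *l)
{
	atomic_fetch_add(&l->contentions, 1);
}

static void lock_acquired(struct toy_lock *l)
{
	l->acquisitions++;
}

static void toy_lock_lock(struct toy_lock *l)
{
	/* lock_acquire() would run here */
	if (!raw_try_lock(l)) {
		lock_contended(l);	/* fast path failed: contention event */
		raw_lock(l);		/* now really wait for the lock */
	}
	lock_acquired(l);
}

static void toy_lock_unlock(struct toy_lock *l)
{
	/* lock_release() would run here */
	raw_unlock(l);
}
```

    Only the generic wrapper changes; the primitives themselves stay
    uninstrumented, which is the point of the try-lock trick.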

    It is also possible to toggle the two lockdep features at runtime using:

    /proc/sys/kernel/prove_locking
    /proc/sys/kernel/lock_stat

    (esp. turning off the O(n^2) prove_locking functionality can help)

    [akpm@linux-foundation.org: build fixes]
    [akpm@linux-foundation.org: nuke unneeded ifdefs]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Move code around to get fewer but larger #ifdef sections. Break some
    in-function #ifdefs out into their own functions.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Ensure that all of the lock dependency tracking code is under
    CONFIG_PROVE_LOCKING. This allows us to use the held lock tracking code for
    other purposes.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Use the lockdep infrastructure to track lock contention and other lock
    statistics.

    It tracks lock contention events, and the first four unique call-sites that
    encountered contention.

    It also measures lock wait-time and hold-time in nanoseconds. The minimum and
    maximum times are tracked, as well as a total (which together with the number
    of events can give the average).

    All statistics are done per lock class, per write (exclusive state) and per read
    (shared state).

    The statistics are collected per-cpu, so that the collection overhead is
    minimized via having no global cachemisses.
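
    The per-cpu idea can be illustrated with a plain array indexed by CPU
    number. This is a userspace sketch, not the kernel's per-cpu machinery;
    NR_CPUS is a hypothetical fixed value here, and the struct and function names
    are made up for the example.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical fixed CPU count for the sketch */

struct class_stats {
	unsigned long contentions;
	unsigned long acquisitions;
};

/* One slot per CPU: each CPU only ever writes its own slot. */
static struct class_stats per_cpu_stats[NR_CPUS];

/* Writer side: touches only the calling CPU's slot. */
static void count_acquisition(int cpu, int contended)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	per_cpu_stats[cpu].acquisitions++;
	if (contended)
		per_cpu_stats[cpu].contentions++;
}

/* Reader side: fold all per-CPU slots into one total. */
static struct class_stats sum_stats(void)
{
	struct class_stats sum = { 0, 0 };

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		sum.contentions += per_cpu_stats[cpu].contentions;
		sum.acquisitions += per_cpu_stats[cpu].acquisitions;
	}
	return sum;
}
```

    The cost moves to the reader, which has to walk every slot, but reads are
    rare compared to lock operations.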

    This new lock statistics feature is independent of the lock dependency checking
    traditionally done by lockdep; it just shares the lock tracking code. It is
    also possible to enable both and disable either component at runtime, thereby
    avoiding the O(n^2) lock chain walks, for instance.

    This patch:

    raw_spinlock_t should not use lockdep (and doesn't) since lockdep itself
    relies on it.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Similar information can easily be obtained with strace -c.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • The sb_info structure only contains a single pointer to the character device,
    there is no need for the added indirection.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Venus returns an ENOENT error on open, so we shouldn't try to grab the
    filehandle for the returned fd.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Signed-off-by: Jan Harkes
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • We ignore signals for about 30 seconds to give userspace a chance to see the
    upcall. Because we did not block signals, a signal received during that
    period left us in a busy loop for the remainder of it.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Making the code that processes upcall responses more straightforward
    uncovered at least one bad assumption: we trusted that vc_inuse would be 0
    when upcalls are aborted, but the device may have been reopened.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • - Make sure device index is not a negative number.
    - Unlink queued requests when the device is closed to avoid passing them
      to the next opener.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Set MS_NOATIME flag to avoid unnecessary calls when the coda inode is
    accessed.

    Also, set statfs.f_bsize to 4k. 1k is obviously too small for the suggested
    IO size.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • A directory without children may still be busy when it is the cwd for some
    process. We can safely remove such a directory because the VFS prevents
    further operations. Also we don't need to call d_delete as it is already
    called in vfs_rmdir.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • The Coda client sets the directory link count to 1 when it isn't sure how many
    subdirectories we have. In this case we shouldn't change the link count in
    the kernel when a subdirectory is created or removed.
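
    In code, the rule amounts to a guard around the link count updates. The
    struct and function names below are hypothetical stand-ins, not the Coda or
    VFS code.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode {		/* hypothetical stand-in for the kernel inode */
	unsigned int nlink;
};

/* A link count of 1 means "subdirectory count unknown": leave it alone. */
static void dir_subdir_added(struct toy_inode *dir)
{
	assert(dir != NULL);
	if (dir->nlink != 1)
		dir->nlink++;
}

static void dir_subdir_removed(struct toy_inode *dir)
{
	assert(dir != NULL);
	if (dir->nlink != 1)
		dir->nlink--;
}
```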

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Change the epoch value to force a refresh instead of clearing the cached
    rights mask and blocking all further accesses to the object.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • When open fails the fd in the response is uninitialized and we ended up taking
    a reference on the file struct and never released it.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • This change passes --build-id to ld when linking the kernel and when linking
    modules, if ld supports it. This is a new GNU ld option that synthesizes an
    ELF note section inside the read-only data. The note in this section contains
    unique identifying bits called the "build ID", which are generated so as to be
    different for any two linked ELF files that aren't identical. The build ID
    can be recovered from stripped files, memory dumps, etc. and used to look up
    the original program built, locate debuginfo or other details or history
    associated with it. For normal program linking, the compiler passes
    --build-id to ld by default, but the option is needed when using ld directly
    as we do.

    Signed-off-by: Roland McGrath
    Cc: Andi Kleen
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This patch adds the /sys/kernel/notes magic file. Reading this delivers the
    contents of the kernel's .notes section. This lets userland easily glean any
    detailed information about the running kernel's build that was stored there at
    compile time.
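
    A userland consumer could walk the note records along these lines. The
    sketch assumes the usual ELF note layout (three 32-bit words for name size,
    descriptor size and type, followed by the name and the descriptor, each
    padded to 4 bytes) in native byte order, and looks for a note named "GNU" of
    type 3 (NT_GNU_BUILD_ID). The function and struct names are hypothetical;
    the buffer would be whatever was read from the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOTE_TYPE_GNU_BUILD_ID 3	/* NT_GNU_BUILD_ID in <elf.h> */

struct note_header {		/* same layout as Elf32_Nhdr / Elf64_Nhdr */
	uint32_t namesz;	/* name length, including the trailing NUL */
	uint32_t descsz;	/* descriptor (payload) length */
	uint32_t type;
};

static size_t align4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Scan a buffer of ELF notes for the GNU build ID. Returns a pointer to
 * the ID bytes and stores their count in *len, or returns NULL if no
 * well-formed build-ID note is found.
 */
static const unsigned char *find_build_id(const unsigned char *buf,
					  size_t size, size_t *len)
{
	size_t off = 0;

	assert(buf != NULL && len != NULL);
	while (off <= size && size - off >= sizeof(struct note_header)) {
		struct note_header h;
		size_t name_off, desc_off;

		memcpy(&h, buf + off, sizeof(h));
		if (h.namesz > size || h.descsz > size)
			break;			/* malformed note */
		name_off = off + sizeof(h);
		desc_off = name_off + align4(h.namesz);
		if (desc_off > size || h.descsz > size - desc_off)
			break;			/* note runs past the buffer */
		if (h.type == NOTE_TYPE_GNU_BUILD_ID && h.namesz == 4 &&
		    memcmp(buf + name_off, "GNU", 4) == 0) {
			*len = h.descsz;
			return buf + desc_off;
		}
		off = desc_off + align4(h.descsz);
	}
	return NULL;
}
```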

    Signed-off-by: Roland McGrath
    Cc: Andi Kleen
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This changes the s390 linker script to use the asm-generic NOTES macro so that
    ELF note sections with SHF_ALLOC set are linked into the kernel image along
    with other read-only data. The PT_NOTE also points to their location.

    This paves the way for putting useful build-time information into ELF notes
    that can be found easily later in a kernel memory dump.

    Signed-off-by: Roland McGrath
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This changes the powerpc linker script to use the asm-generic NOTES macro so
    that ELF note sections with SHF_ALLOC set are linked into the kernel image
    along with other read-only data. The PT_NOTE also points to their location.

    This paves the way for putting useful build-time information into ELF notes
    that can be found easily later in a kernel memory dump.

    Signed-off-by: Roland McGrath
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This changes the alpha linker script to use the asm-generic NOTES macro so
    that ELF note sections with SHF_ALLOC set are linked into the kernel image
    along with other read-only data. The PT_NOTE also points to their location.

    This paves the way for putting useful build-time information into ELF notes
    that can be found easily later in a kernel memory dump.

    Signed-off-by: Roland McGrath
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This changes the x86_64 linker script to use the asm-generic NOTES macro so
    that ELF note sections with SHF_ALLOC set are linked into the kernel image
    along with other read-only data. The PT_NOTE also points to their location.

    This paves the way for putting useful build-time information into ELF notes
    that can be found easily later in a kernel memory dump.

    Signed-off-by: Roland McGrath
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This changes the i386 linker script and the asm-generic macro it uses so that
    ELF note sections with SHF_ALLOC set are linked into the kernel image along
    with other read-only data. The PT_NOTE also points to their location.

    This paves the way for putting useful build-time information into ELF notes
    that can be found easily later in a kernel memory dump.

    Signed-off-by: Roland McGrath
    Cc: Andi Kleen
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • Looking at the jbd_debug() definition in include/linux/jbd2.h in the current
    linus-git tree:

    extern u8 journal_enable_debug;

    #define jbd_debug(n, f, a...) \
    do { \
    if ((n) <= jbd2_journal_enable_debug) { \
    ...

    > fs/ext4/inode.c: In function ‘ext4_write_inode’:
    > fs/ext4/inode.c:2906: warning: comparison is always true due to limited
    > range of data type
    >
    > fs/jbd2/recovery.c: In function ‘jbd2_journal_recover’:
    > fs/jbd2/recovery.c:254: warning: comparison is always true due to
    > limited range of data type
    > fs/jbd2/recovery.c:257: warning: comparison is always true due to
    > limited range of data type
    >
    > fs/jbd2/recovery.c: In function ‘jbd2_journal_skip_recovery’:
    > fs/jbd2/recovery.c:301: warning: comparison is always true due to
    > limited range of data type
    >
    All of these warnings occur when the debug level is 0. It turns out that
    the "jbd2: Move jbd2-debug file to debugfs" patch
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0f49d5d019afa4e94253bfc92f0daca3badb990b

    changed jbd2_journal_enable_debug from int to u8, which makes the
    jbd_debug comparison always true when the debugging level is 0. Hence
    the compile warnings.

    Changing jbd2_journal_enable_debug back to int is not an option, because
    jbd2-debug has moved to debugfs, and debugfs_create_u8(), which creates
    the debugfs entry, needs the value to be of type u8.

    Even if we changed the data type back to int, the code would still be
    buggy: the kernel should not print jbd2 debug messages when
    jbd2_journal_enable_debug is set to 0, but it does.

    The fix is to change the level of debugging to 1. The same should be fixed
    in ext3/JBD, but ext3's jbd-debug via /proc is currently broken, so we
    should probably fix it all together.
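
    The logic problem can be reproduced in isolation. The sketch below assumes
    the macro prints when (n) <= jbd2_journal_enable_debug, which is what the
    "always true" warning for level 0 implies; enable_debug and would_print()
    are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for jbd2_journal_enable_debug, which is a u8 per the text above. */
static uint8_t enable_debug;

/* Same shape of test as the macro: print when level n is <= the knob. */
static int would_print(int n)
{
	assert(n >= 0);
	return n <= enable_debug;
}
```

    With the knob at 0, would_print(0) is true, so level-0 messages are emitted
    even though debugging is off, while would_print(1) is false. Moving those
    call sites to level 1 makes 0 mean "no output".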

    Signed-off-by: Mingming Cao
    Cc: Jeff Garzik
    Cc: Theodore Tso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • This version brings a number of new checks, and a number of bug
    fixes. Of note:

    - warnings for multiple assignments per line
    - warnings for multiple declarations per line
    - checks for single statement blocks with braces

    This patch includes an update for feature-removal-schedule.txt to
    better target checks.

    Andy Whitcroft (12):
    Version: 0.08
    only apply printk checks where there is a string literal
    allow suppression of errors for when no patch is found
    warn about multiple assignments
    warn on declaration of multiple variables
    check for kfree() with needless null check
    check for single statement braced blocks
    check for aggregate initialisation on the next line
    handle the => operator
    check for spaces between function name and open parenthesis
    move to explicit Check: entries in feature-removal-schedule.txt
    handle pointer attributes

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Signed-off-by: Rolf Eike Beer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rolf Eike Beer