26 Oct, 2011

1 commit

  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
    rcu: Move propagation of ->completed from rcu_start_gp() to rcu_report_qs_rsp()
    rcu: Remove rcu_needs_cpu_flush() to avoid false quiescent states
    rcu: Wire up RCU_BOOST_PRIO for rcutree
    rcu: Make rcu_torture_boost() exit loops at end of test
    rcu: Make rcu_torture_fqs() exit loops at end of test
    rcu: Permit rt_mutex_unlock() with irqs disabled
    rcu: Avoid having just-onlined CPU resched itself when RCU is idle
    rcu: Suppress NMI backtraces when stall ends before dump
    rcu: Prohibit grace periods during early boot
    rcu: Simplify unboosting checks
    rcu: Prevent early boot set_need_resched() from __rcu_pending()
    rcu: Dump local stack if cannot dump all CPUs' stacks
    rcu: Move __rcu_read_unlock()'s barrier() within if-statement
    rcu: Improve rcu_assign_pointer() and RCU_INIT_POINTER() documentation
    rcu: Make rcu_assign_pointer() unconditionally insert a memory barrier
    rcu: Make rcu_implicit_dynticks_qs() locals be correct size
    rcu: Eliminate in_irq() checks in rcu_enter_nohz()
    nohz: Remove nohz_cpu_mask
    rcu: Document interpretation of RCU-lockdep splats
    rcu: Allow rcutorture's stat_interval parameter to be changed at runtime
    ...

    Linus Torvalds
     

29 Sep, 2011

1 commit

  • Long ago, using TREE_RCU with PREEMPT would result in "scheduling
    while atomic" diagnostics if you blocked in an RCU read-side critical
    section. However, PREEMPT now implies TREE_PREEMPT_RCU, which defeats
    this diagnostic. This commit therefore adds a replacement diagnostic
    based on PROVE_RCU.

    Because rcu_lockdep_assert() and lockdep_rcu_dereference() are now being
    used for things that have nothing to do with rcu_dereference(), rename
    lockdep_rcu_dereference() to lockdep_rcu_suspicious() and add a third
    argument that is a string indicating what is suspicious. This third
    argument is passed in from a new third argument to rcu_lockdep_assert().
    Update all calls to rcu_lockdep_assert() to add an informative third
    argument.

    Also, add a pair of rcu_lockdep_assert() calls from within
    rcu_note_context_switch(), one complaining if a context switch occurs
    in an RCU-bh read-side critical section and another complaining if a
    context switch occurs in an RCU-sched read-side critical section.
    These are present only if the PROVE_RCU kernel configuration option is enabled.

    Finally, fix some checkpatch whitespace complaints in lockdep.c.

    Again, you must enable PROVE_RCU to see these new diagnostics. But you
    are enabling PROVE_RCU to check out new RCU uses in any case, aren't you?

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
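
    To illustrate the interface described above, a minimal sketch of an
    rcu_lockdep_assert() call carrying an informative message string; the
    function, condition and message are made up for the example and are
    not the assertions added by this commit:

        #include <linux/rcupdate.h>

        static void example_access(void)
        {
                /* Under PROVE_RCU, splat with a useful message if we are
                 * not in an RCU read-side critical section. */
                rcu_lockdep_assert(rcu_read_lock_held(),
                                   "example_access() called outside an RCU "
                                   "read-side critical section");
        }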
     

18 Sep, 2011

1 commit

  • Andrew requested that I comment all the lockdep WARN()s to help other
    people figure out what is wrong.

    Requested-by: Andrew Morton
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1315301493.3191.9.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

09 Aug, 2011

1 commit

  • match_held_lock() was assuming it was being called on a lock class
    that had already seen usage.

    This condition was true for bug-free code using lockdep_assert_held(),
    since you're in fact holding the lock when calling it. However the
    assumption fails the moment you assume the assertion can fail, which
    is the whole point of having the assertion in the first place.

    Anyway, now that there are more lockdep_is_held() users, notably
    __rcu_dereference_check(), it's much easier to trigger this since we
    test for a number of locks and we only need to hold any one of them to
    be good.

    Reported-by: Sergey Senozhatsky
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1312547787.28695.2.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
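
    For reference, a hedged sketch of the interfaces involved; the mutex
    and pointer names are invented for the example:

        #include <linux/mutex.h>
        #include <linux/lockdep.h>
        #include <linux/rcupdate.h>

        static DEFINE_MUTEX(example_mutex);
        static int __rcu *example_ptr;          /* hypothetical */

        static void example_update(void)
        {
                /* Warns (via match_held_lock()) if example_mutex is not held. */
                lockdep_assert_held(&example_mutex);
                /* ... update example_ptr ... */
        }

        static int *example_read(void)
        {
                /* rcu_dereference_check() takes lockdep_is_held() conditions:
                 * either RCU or example_mutex protects the pointer here. */
                return rcu_dereference_check(example_ptr,
                                             lockdep_is_held(&example_mutex));
        }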
     

04 Aug, 2011

3 commits

  • lockdep_init_map() only initializes parts of lockdep_map and triggers
    a kmemcheck warning when the structure is copied as a whole. There
    isn't anything to be gained by clearing fields selectively. memset()
    the whole structure and remove the loop for ->class_cache[] clearing.

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=35532

    Signed-off-by: Tejun Heo
    Reported-and-tested-by: Christian Casteyde
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=35532
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110714131909.GJ3455@htj.dyndns.org
    Signed-off-by: Ingo Molnar

    Tejun Heo
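
    The shape of the change, as a simplified sketch rather than the actual
    diff:

        void lockdep_init_map(struct lockdep_map *lock, const char *name,
                              struct lock_class_key *key, int subclass)
        {
                /* Zero the whole map so copies are fully initialized,
                 * instead of clearing selected fields and looping over
                 * ->class_cache[]. */
                memset(lock, 0, sizeof(*lock));
                lock->name = name;
                lock->key = key;
                /* ... remaining checks and initialization ... */
        }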
     
  • On Sun, 2011-07-24 at 21:06 -0400, Arnaud Lacombe wrote:

    > /src/linux/linux/kernel/lockdep.c: In function 'mark_held_locks':
    > /src/linux/linux/kernel/lockdep.c:2471:31: warning: comparison of
    > distinct pointer types lacks a cast

    The warning is harmless in this case, but the below makes it go away.

    Reported-by: Arnaud Lacombe
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1311588599.2617.56.camel@laptop
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Commit dd4e5d3ac4a ("lockdep: Fix trace_[soft,hard]irqs_[on,off]()
    recursion") made a bit of a mess of the various checks and error
    conditions.

    In particular it moved the check for !irqs_disabled() before the
    spurious enable test, resulting in some warnings.

    Reported-by: Arnaud Lacombe
    Reported-by: Dave Jones
    Reported-and-tested-by: Sergey Senozhatsky
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1311679697.24752.28.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

22 Jul, 2011

1 commit

  • Thomas noticed that a lock marked with lockdep_set_novalidate_class()
    will still trigger warnings for IRQ inversions. Cure this by skipping
    those when marking irq state.

    Reported-and-tested-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-2dp5vmpsxeraqm42kgww6ge2@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

22 Jun, 2011

1 commit

  • Commit:

    1efc5da3cf56: [PATCH] order of lockdep off/on in vprintk() should be changed

    explains the reason for having raw_local_irq_*() and lockdep_off()
    in printk(). Instead of working around the broken recursion detection
    of interrupt state tracking, fix it.

    Signed-off-by: Peter Zijlstra
    Cc: efault@gmx.de
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110621153806.185242734@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

07 Jun, 2011

1 commit

  • The main lock_is_held() user is lockdep_assert_held(); avoid false
    assertions in lockdep_off() sections by unconditionally reporting the
    lock as taken.

    [ the reason this is important is a lockdep_assert_held() in ttwu(),
    which triggers a warning under lockdep_off() as in printk(), which
    can trigger another wakeup and lock up due to spinlock
    recursion, as reported and heroically debugged by Arne Jansen ]

    Reported-and-tested-by: Arne Jansen
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc:
    Link: http://lkml.kernel.org/r/1307398759.2497.966.camel@laptop
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
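
    A sketch of the idea, close to but not necessarily identical to the
    actual patch:

        int lock_is_held(struct lockdep_map *lock)
        {
                int ret = 0;

                /* In lockdep_off() sections (e.g. inside printk()) lock
                 * tracking is suspended, so claim the lock is held rather
                 * than let lockdep_assert_held() in ttwu() fire falsely. */
                if (unlikely(current->lockdep_recursion))
                        return 1;

                /* ... otherwise walk current->held_locks and set ret ... */
                return ret;
        }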
     

22 Apr, 2011

7 commits

  • For some reason nr_chain_hlocks is updated with cmpxchg, but
    this is performed inside of the lockdep global graph_lock(),
    which also makes simple modification of this variable atomic.

    Remove the cmpxchg logic for updating nr_chain_hlocks and
    simplify the code.

    Signed-off-by: Steven Rostedt
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110421014300.727863282@goodmis.org
    Signed-off-by: Ingo Molnar

    Steven Rostedt
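
    Conceptually the result looks something like this sketch (not the
    literal diff):

        /* Everything here already runs under the global graph_lock(),
         * so a plain update of nr_chain_hlocks is sufficient. */
        if (nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS) {
                chain->base = nr_chain_hlocks;
                nr_chain_hlocks += chain->depth;
                /* ... record the held-lock ids in chain_hlocks[] ... */
        }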
     
  • Lockdep output can be pretty cryptic; having nicer output
    can save a lot of head scratching. When a simple irq inversion
    scenario is detected by lockdep (lock A taken in interrupt
    context but also in thread context without disabling interrupts)
    we now get the following (hopefully more informative) output:

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(lockA);
    <Interrupt>
      lock(lockA);

    *** DEADLOCK ***

    Signed-off-by: Steven Rostedt
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110421014300.436140880@goodmis.org
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • The message of "Bad BFS generated tree" is a bit confusing.
    Replace it with a more sane error message.

    Thanks to Peter Zijlstra for helping me come up with a better
    message.

    Signed-off-by: Steven Rostedt
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110421014300.135521252@goodmis.org
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Irq inversion and irq dependency bugs are only subtly
    different. The difference lies in where the interrupt occurred.

    For irq dependency:

    irq_disable
    lock(A)
    lock(B)
    unlock(B)
    unlock(A)
    irq_enable

    lock(B)
    unlock(B)

    <interrupt>
      lock(A)

    The interrupt comes in after it has been established that lock A
    can be held when taking an irq unsafe lock. Lockdep detects the
    problem when taking lock A in interrupt context.

    With the irq inversion, the irq happens before the dependency is
    established, and lockdep detects the problem with the taking of lock B:

    <interrupt>
      lock(A)

    irq_disable
    lock(A)
    lock(B)
    unlock(B)
    unlock(A)
    irq_enable

    lock(B)
    unlock(B)

    Since the problem with the locking logic for both of these issues
    is in actuality the same, they both should report the same scenario.
    This patch implements that and prints this:

    other info that might help us debug this:

    Chain exists of:
    &rq->lock --> lockA --> lockC

    Possible interrupt unsafe locking scenario:

    CPU0                    CPU1
    ----                    ----
    lock(lockC);
                            local_irq_disable();
                            lock(&rq->lock);
                            lock(lockA);
    <Interrupt>
      lock(&rq->lock);

    *** DEADLOCK ***

    Signed-off-by: Steven Rostedt
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110421014259.910720381@goodmis.org
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Lockdep output can be pretty cryptic; having nicer output
    can save a lot of head scratching. When a simple deadlock
    scenario is detected by lockdep (lock A -> lock A) we now
    get the following new output:

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&(lock)->rlock);
    lock(&(lock)->rlock);

    *** DEADLOCK ***

    Signed-off-by: Steven Rostedt
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110421014259.643930104@goodmis.org
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • The lockdep output can be pretty cryptic; having nicer output
    can save a lot of head scratching. When a normal deadlock
    scenario is detected by lockdep (lock A -> lock B and there
    exists a place where lock B -> lock A) we now get the following
    new output:

    other info that might help us debug this:

    Possible unsafe locking scenario:

    CPU0                    CPU1
    ----                    ----
    lock(lockB);
                            lock(lockA);
                            lock(lockB);
    lock(lockA);

    *** DEADLOCK ***

    In cases where there's a deeper chain, it shows the partial
    chain that can cause the issue:

    Chain exists of:
    lockC --> lockA --> lockB

    Possible unsafe locking scenario:

    CPU0                    CPU1
    ----                    ----
    lock(lockB);
                            lock(lockA);
                            lock(lockB);
    lock(lockC);

    *** DEADLOCK ***

    Signed-off-by: Steven Rostedt
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110421014259.380621789@goodmis.org
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Locking order inversion due to interrupts is a subtle problem.

    When an irq lock inversion is discovered by lockdep, it currently
    reports something like:

    [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]

    ... and then prints out the locks that are involved, as back traces.

    Judging by lkml feedback developers were routinely confused by what
    a HARDIRQ->safe to unsafe issue is all about, and sometimes even
    blew it off as a bug in lockdep.

    The message is especially confusing when lockdep prints it about a
    lock that is never itself taken in interrupt context.

    After explaining the problems that lockdep is reporting, I
    decided to add a description of the problem in visual form. Now
    the following is shown:

    ---
    other info that might help us debug this:

    Possible interrupt unsafe locking scenario:

    CPU0                    CPU1
    ----                    ----
    lock(lockA);
                            local_irq_disable();
                            lock(&rq->lock);
                            lock(lockA);
    <Interrupt>
      lock(&rq->lock);

    *** DEADLOCK ***

    ---

    The above is the case when the unsafe lock is taken while
    holding a lock taken in irq context. But when a lock is taken
    that also grabs an unsafe lock, the call chain is shown:

    ---
    other info that might help us debug this:

    Chain exists of:
    &rq->lock --> lockA --> lockC

    Possible interrupt unsafe locking scenario:

    CPU0                    CPU1
    ----                    ----
    lock(lockC);
                            local_irq_disable();
                            lock(&rq->lock);
                            lock(lockA);
    <Interrupt>
      lock(&rq->lock);

    *** DEADLOCK ***

    Signed-off-by: Steven Rostedt
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110421014259.132728798@goodmis.org
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

20 Jan, 2011

1 commit

  • During early boot, local IRQs are disabled until the IRQ subsystem is
    properly initialized. During this time, no one should enable local
    IRQs, and some operations which usually are not allowed with IRQs
    disabled, e.g. operations which might sleep or require communication
    with other processors, are allowed.

    lockdep tracked this with early_boot_irqs_off/on() callbacks.
    As other subsystems need this information too, move it to
    init/main.c and make it generally available. While at it,
    invert the boolean to early_boot_irqs_disabled instead of
    enabled, so that it can be initialized with %false and %true
    indicates the exceptional condition.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Acked-by: Pekka Enberg
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tejun Heo
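
    A small sketch of what this makes possible; the consumer below is
    hypothetical, only the flag itself comes from this commit:

        #include <linux/kernel.h>       /* early_boot_irqs_disabled (header may differ) */
        #include <linux/irqflags.h>

        static void example_check(void)
        {
                /* IRQs being off is expected during early boot; after the
                 * IRQ subsystem is up, this (made-up) path insists on
                 * running with IRQs enabled. */
                if (!early_boot_irqs_disabled)
                        WARN_ON_ONCE(irqs_disabled());
        }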
     

19 Oct, 2010

2 commits

  • Currently look_up_lock_class() doesn't check the parameter "subclass".
    This rarely causes problems because the main caller of this function,
    register_lock_class(), checks it.

    But register_lock_class() is not the only function which calls
    look_up_lock_class(). lock_set_class() and its callees also call it.
    And lock_set_class() doesn't check this parameter.

    This causes problems when the value of subclass is larger than
    MAX_LOCKDEP_SUBCLASSES, because the address (used as the class key)
    calculated with too large a subclass can end up pointing at a key
    inside a different lock_class_key.

    Of course this problem depends on the memory layout and
    occurs only with really low probability.

    Signed-off-by: Hitoshi Mitake
    Cc: Dmitry Torokhov
    Cc: Vojtech Pavlik
    Cc: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
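
    The fix is roughly of this shape (a sketch; the exact message and
    handling may differ):

        static inline struct lock_class *
        look_up_lock_class(struct lockdep_map *lock, unsigned int subclass)
        {
                struct lock_class *class = NULL;

                /* A subclass beyond MAX_LOCKDEP_SUBCLASSES would index past
                 * the per-key subkeys and could alias a key belonging to a
                 * different lock_class_key. */
                if (unlikely(subclass >= MAX_LOCKDEP_SUBCLASSES)) {
                        debug_locks_off();
                        printk(KERN_ERR "BUG: looking up invalid subclass: %u\n",
                               subclass);
                        dump_stack();
                        return NULL;
                }

                /* ... normal key calculation and hash lookup into class ... */
                return class;
        }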
     
  • Currently lockdep_map only caches one class, the one with subclass == 0,
    and looks up the hash table of classes when subclass != 0.

    It seems that this is not a problem because the case of
    subclass != 0 is rare. But locks of struct rq are
    acquired with subclass == 1 when task migration is executed.
    Task migration is a high-frequency event, so I modified lockdep
    to cache subclasses.

    I measured the score of perf bench sched messaging.
    This patch has a slight but certain (on the order of milliseconds
    or tens of milliseconds) effect when lots of tasks are running.
    I'll show the result at the tail of this description.

    NR_LOCKDEP_CACHING_CLASSES specifies how many classes can be
    cached in the instances of lockdep_map.
    I discussed this approach with Peter Zijlstra at LinuxCon Japan,
    and he pointed out that caching every subclass (8)
    is clearly a waste of memory. So the number of cached classes
    should be configurable.

    === Score comparison of benchmarks ===
    # "min" means best score, and "max" means worst score

    for i in `seq 1 10`; do ./perf bench -f simple sched messaging; done

    before: min: 0.565000, max: 0.583000, avg: 0.572500
    after: min: 0.559000, max: 0.568000, avg: 0.563300

    # with more processes
    for i in `seq 1 10`; do ./perf bench -f simple sched messaging -g 40; done

    before: min: 2.274000, max: 2.298000, avg: 2.286300
    after: min: 2.242000, max: 2.270000, avg: 2.259700

    Signed-off-by: Hitoshi Mitake
    Cc: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
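
    The resulting lockdep_map looks roughly like this (a sketch based on
    the description above; field order and the exact cache size may
    differ):

        #define NR_LOCKDEP_CACHING_CLASSES      2

        struct lockdep_map {
                struct lock_class_key   *key;
                /* Was a single class_cache pointer for subclass 0 only;
                 * now the first few subclasses get cached slots. */
                struct lock_class       *class_cache[NR_LOCKDEP_CACHING_CLASSES];
                const char              *name;
                /* ... CONFIG_LOCK_STAT fields ... */
        };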
     

09 Jun, 2010

1 commit

  • For people who otherwise get to write: cpu_clock(smp_processor_id()),
    there is now: local_clock().

    Also, as per suggestion from Andrew, provide some documentation on
    the various clock interfaces, and minimize the unsigned long long vs
    u64 mess.

    Signed-off-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jens Axboe
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
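
    Usage-wise the change is as simple as it sounds; a small sketch with a
    made-up caller:

        #include <linux/kernel.h>
        #include <linux/sched.h>        /* local_clock() */

        static void example_timing(void)
        {
                u64 t0, t1;

                /* Before: t0 = cpu_clock(smp_processor_id()); */
                t0 = local_clock();
                /* ... the work being timed ... */
                t1 = local_clock();
                pr_debug("took %llu ns\n", (unsigned long long)(t1 - t0));
        }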
     

22 May, 2010

1 commit

  • The conversion of device->sem to device->mutex resulted in lockdep
    warnings. Create a novalidate class for now until the driver folks
    come up with separate classes. That way we have at least the basic
    mutex debugging coverage.

    Add a checkpatch error so the usage is reserved for device->mutex.

    [ tglx: checkpatch and compile fix for LOCKDEP=n ]

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
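
    A sketch of the usage this reserves, based on the description above
    (the exact call site in the driver core may differ slightly):

        /* In device_initialize(): basic mutex debugging, but no lock-order
         * validation until drivers get proper per-class annotations. */
        mutex_init(&dev->mutex);
        lockdep_set_novalidate_class(&dev->mutex);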
     

18 May, 2010

2 commits

  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (311 commits)
    perf tools: Add mode to build without newt support
    perf symbols: symbol inconsistency message should be done only at verbose=1
    perf tui: Add explicit -lslang option
    perf options: Type check all the remaining OPT_ variants
    perf options: Type check OPT_BOOLEAN and fix the offenders
    perf options: Check v type in OPT_U?INTEGER
    perf options: Introduce OPT_UINTEGER
    perf tui: Add workaround for slang < 2.1.4
    perf record: Fix bug mismatch with -c option definition
    perf options: Introduce OPT_U64
    perf tui: Add help window to show key associations
    perf tui: Make <- exit menus too
    perf newt: Add single key shortcuts for zoom into DSO and threads
    perf newt: Exit browser unconditionally when CTRL+C, q or Q is pressed
    perf newt: Fix the 'A'/'a' shortcut for annotate
    perf newt: Make <- exit the ui_browser
    x86, perf: P4 PMU - fix counters management logic
    perf newt: Make <- zoom out filters
    perf report: Report number of events, not samples
    perf hist: Clarify events_stats fields usage
    ...

    Fix up trivial conflicts in kernel/fork.c and tools/perf/builtin-record.c

    Linus Torvalds
     
  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
    rcu: remove all rcu head initializations, except on_stack initializations
    rcu head introduce rcu head init on stack
    Debugobjects transition check
    rcu: fix build bug in RCU_FAST_NO_HZ builds
    rcu: RCU_FAST_NO_HZ must check RCU dyntick state
    rcu: make SRCU usable in modules
    rcu: improve the RCU CPU-stall warning documentation
    rcu: reduce the number of spurious RCU_SOFTIRQ invocations
    rcu: permit discontiguous cpu_possible_mask CPU numbering
    rcu: improve RCU CPU stall-warning messages
    rcu: print boot-time console messages if RCU configs out of ordinary
    rcu: disable CPU stall warnings upon panic
    rcu: enable CPU_STALL_VERBOSE by default
    rcu: slim down rcutiny by removing rcu_scheduler_active and friends
    rcu: refactor RCU's context-switch handling
    rcu: rename rcutiny rcu_ctrlblk to rcu_sched_ctrlblk
    rcu: shrink rcutiny by making synchronize_rcu_bh() be inline
    rcu: fix now-bogus rcu_scheduler_active comments.
    rcu: Fix bogus CONFIG_PROVE_LOCKING in comments to reflect reality.
    rcu: ignore offline CPUs in last non-dyntick-idle CPU check
    ...

    Linus Torvalds
     

11 May, 2010

1 commit

  • There is no need to disable lockdep after an RCU lockdep splat,
    so remove the debug_locks_off() call from lockdep_rcu_dereference().
    To avoid repeated lockdep splats, use a static variable in the inlined
    rcu_dereference_check() and rcu_dereference_protected() macros so that
    a given instance splats only once, but so that multiple instances can
    be detected per boot.

    This is controlled by a new config variable CONFIG_PROVE_RCU_REPEATEDLY,
    which is disabled by default. This provides the normal lockdep behavior
    by default, but permits people who want to find multiple RCU-lockdep
    splats per boot to easily do so.

    Requested-by: Eric Paris
    Signed-off-by: Lai Jiangshan
    Tested-by: Eric Paris
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
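
    The one-splat-per-site behaviour can be sketched like this (simplified
    from the real rcu_dereference_check() plumbing of that era; the macro
    name is changed to mark it as an illustration):

        /* Each use of the macro gets its own static __warned flag, so a
         * given call site splats at most once per boot (unless
         * CONFIG_PROVE_RCU_REPEATEDLY asks for repeated splats). */
        #define rcu_dereference_check_sketch(p, c)                        \
        ({                                                                \
                static bool __warned;                                     \
                if (debug_lockdep_rcu_enabled() && !(c) && !__warned) {   \
                        __warned = true;                                  \
                        lockdep_rcu_dereference(__FILE__, __LINE__);      \
                }                                                         \
                rcu_dereference_raw(p);                                   \
        })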
     

07 May, 2010

1 commit

  • When calling check_prevs_add(), if all validations pass,
    add_lock_to_list() will add the new lock to the dependency tree and
    allocate a stack_trace for each list_entry.

    But at this time, we are always on the same stack, so the stack_trace
    for each list_entry has the same value. This is redundant and eats
    up lots of memory, which can lead to a warning when
    MAX_STACK_TRACE_ENTRIES is low.

    Use one copy of stack_trace instead.

    V2: As suggested by Peter Zijlstra, move save_trace() from
    check_prevs_add() to check_prev_add().
    Add tracking for trylock dependence which is also redundant.

    Signed-off-by: Yong Zhang
    Cc: David S. Miller
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yong Zhang
     

04 May, 2010

1 commit

  • We forgot to provide a !CONFIG_DEBUG_LOCKDEP case for the
    redundant_hardirqs_on stat handling.

    Manage that in the headers with a new __debug_atomic_inc() helper.

    Fixes:

    kernel/lockdep.c:2306: error: 'lockdep_stats' undeclared (first use in this function)
    kernel/lockdep.c:2306: error: (Each undeclared identifier is reported only once
    kernel/lockdep.c:2306: error: for each function it appears in.)

    Reported-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra

    Frederic Weisbecker
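
    The helper is roughly of this shape (a sketch of the
    lockdep_internals.h change; details may differ):

        #ifdef CONFIG_DEBUG_LOCKDEP
        # define __debug_atomic_inc(ptr)        this_cpu_inc(lockdep_stats.ptr)
        #else
        # define __debug_atomic_inc(ptr)        do { } while (0)
        #endif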
     

06 Apr, 2010

2 commits

  • Locking statistics are implemented using global atomic
    variables. This is usually fine unless some path writes them very
    often.

    This is the case for the function and function graph tracers
    that disable irqs for each entry saved (except if the function
    tracer is in preempt disabled only mode).
    And calls to local_irq_save/restore() increment
    hardirqs_on_events and hardirqs_off_events stats (or similar
    stats for redundant versions).

    Incrementing these global vars for each function ends up in too
    much cache bouncing if lockstats are enabled.

    To solve this, implement the debug_atomic_*() operations using
    per cpu vars.

    -v2: Use per_cpu() instead of get_cpu_var() to fetch the desired
    cpu vars on debug_atomic_read()

    -v3: Store the stats in a structure. No need for local_t as we
    are NMI/irq safe.

    -v4: Fix tons of build errors. I thought I had tested it but I
    probably forgot to select the relevant config.

    Suggested-by: Steven Rostedt
    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt

    Frederic Weisbecker
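
    In outline, the per-cpu scheme looks like this (a sketch of the
    approach, not the exact header; the counter list is abbreviated):

        struct lockdep_stats {
                int     hardirqs_on_events;
                int     hardirqs_off_events;
                int     redundant_hardirqs_on;
                /* ... the other counters ... */
        };

        DECLARE_PER_CPU(struct lockdep_stats, lockdep_stats);

        /* Writers only touch their own CPU's copy: no cache-line bouncing. */
        #define debug_atomic_inc(ptr)   this_cpu_inc(lockdep_stats.ptr)

        /* Readers (lockdep_proc) sum over all possible CPUs. */
        #define debug_atomic_read(ptr) ({                                 \
                unsigned long long __total = 0;                           \
                int __cpu;                                                \
                for_each_possible_cpu(__cpu)                              \
                        __total += per_cpu(lockdep_stats, __cpu).ptr;     \
                __total;                                                  \
        })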
     
  • * 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc:
    eeepc-wmi: include slab.h
    staging/otus: include slab.h from usbdrv.h
    percpu: don't implicitly include slab.h from percpu.h
    kmemcheck: Fix build errors due to missing slab.h
    include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
    iwlwifi: don't include iwl-dev.h from iwl-devtrace.h
    x86: don't include slab.h from arch/x86/include/asm/pgtable_32.h

    Fix up trivial conflicts in include/linux/percpu.h due to
    is_kernel_percpu_address() having been introduced since the slab.h
    cleanup with the percpu_up.c splitup.

    Linus Torvalds
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

29 Mar, 2010

1 commit

  • lockdep has custom code to check whether a pointer belongs to static
    percpu area which is somewhat broken. Implement proper
    is_kernel/module_percpu_address() and replace the custom code.

    On UP, percpu variables are regular static variables and can't be
    distinguished from them. Always return %false on UP.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Ingo Molnar

    Tejun Heo
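
    For reference, a simplified sketch of how lockdep's static_obj() test
    can use the new helpers:

        static int static_obj(void *obj)
        {
                unsigned long start = (unsigned long) &_stext,
                              end   = (unsigned long) &_end,
                              addr  = (unsigned long) obj;

                /* Static kernel image objects. */
                if ((addr >= start) && (addr < end))
                        return 1;

                /* Static per-cpu variables: ask the percpu allocator
                 * instead of the old hand-rolled address-range check. */
                if (is_kernel_percpu_address(addr))
                        return 1;

                /* Module static objects, including module per-cpu areas. */
                return is_module_address(addr) || is_module_percpu_address(addr);
        }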
     

19 Mar, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
    perf: Fix unexported generic perf_arch_fetch_caller_regs
    perf record: Don't try to find buildids in a zero sized file
    perf: export perf_trace_regs and perf_arch_fetch_caller_regs
    perf, x86: Fix hw_perf_enable() event assignment
    perf, ppc: Fix compile error due to new cpu notifiers
    perf: Make the install relative to DESTDIR if specified
    kprobes: Calculate the index correctly when freeing the out-of-line execution slot
    perf tools: Fix sparse CPU numbering related bugs
    perf_event: Fix oops triggered by cpu offline/online
    perf: Drop the obsolete profile naming for trace events
    perf: Take a hot regs snapshot for trace events
    perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot
    perf/x86-64: Use frame pointer to walk on irq and process stacks
    lockdep: Move lock events under lockdep recursion protection
    perf report: Print the map table just after samples for which no map was found
    perf report: Add multiple event support
    perf session: Change perf_session post processing functions to take histogram tree
    perf session: Add storage for seperating event types in report
    perf session: Change add_hist_entry to take the tree root instead of session
    perf record: Add ID and to recorded event data when recording multiple events
    ...

    Linus Torvalds
     

10 Mar, 2010

1 commit

  • There are RCU-locked read-side sections in the path where we submit
    a trace event. And these rcu_read_(un)lock() calls trigger lock events,
    which create recursive events.

    One pair in do_perf_sw_event:

    __lock_acquire
    |
    |--96.11%-- lock_acquire
    | |
    | |--27.21%-- do_perf_sw_event
    | | perf_tp_event
    | | |
    | | |--49.62%-- ftrace_profile_lock_release
    | | | lock_release
    | | | |
    | | | |--33.85%-- _raw_spin_unlock

    Another pair in perf_output_begin/end:

    __lock_acquire
    |--23.40%-- perf_output_begin
    | | __perf_event_overflow
    | | perf_swevent_overflow
    | | perf_swevent_add
    | | perf_swevent_ctx_event
    | | do_perf_sw_event
    | | perf_tp_event
    | | |
    | | |--55.37%-- ftrace_profile_lock_acquire
    | | | lock_acquire
    | | | |
    | | | |--37.31%-- _raw_spin_lock

    The problem is not so much the trace recursion itself, as we already
    have recursion protection (though it's always wasteful to recurse).
    But the trace events are outside the lockdep recursion protection, so
    each lockdep event triggers a lock trace, which in turn triggers two
    other lockdep events. Here the recursive lock trace event won't
    be taken because of the trace recursion, so the recursion stops there,
    but lockdep will still analyse these new events:

    To sum up, for each lockdep event we have:

    lock_*()
    |
    trace lock_acquire
    |
    ----- rcu_read_lock()
    | |
    | lock_acquire()
    | |
    | trace_lock_acquire() (stopped)
    | |
    | lockdep analyze
    |
    ----- rcu_read_unlock()
    |
    lock_release
    |
    trace_lock_release() (stopped)
    |
    lockdep analyze

    And you can repeat the above two times as we have two rcu read side
    sections when we submit an event.

    This is fixed in this patch by moving the lock trace event under
    the lockdep recursion protection.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: Hitoshi Mitake
    Cc: Li Zefan
    Cc: Lai Jiangshan
    Cc: Masami Hiramatsu
    Cc: Jens Axboe

    Frederic Weisbecker
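
    In outline, the fix moves the trace events inside lockdep's own
    recursion guard; a simplified sketch of lock_acquire() after the
    change:

        void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
                          int trylock, int read, int check,
                          struct lockdep_map *nest_lock, unsigned long ip)
        {
                unsigned long flags;

                if (unlikely(current->lockdep_recursion))
                        return;

                raw_local_irq_save(flags);
                check_flags(flags);

                current->lockdep_recursion = 1;
                /* The trace event now fires inside the recursion guard, so
                 * the rcu_read_lock()/unlock() it implies cannot re-enter
                 * lockdep. */
                trace_lock_acquire(lock, subclass, trylock, read, check,
                                   nest_lock, ip);
                __lock_acquire(lock, subclass, trylock, read, check,
                               irqs_disabled_flags(flags), nest_lock, ip, 0);
                current->lockdep_recursion = 0;
                raw_local_irq_restore(flags);
        }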