27 Aug, 2008

1 commit

  • The problem was found during iwlagn driver testing on a
    v2.6.27-rc4-176-gb8e6c91 kernel, but it turns out to be a lockdep bug.
    In our testing, we frequently load and unload the iwlagn driver
    (>50 times); the MAX_STACK_TRACE_ENTRIES limit is then reached
    (expected behaviour?). The error message with the call trace is below.

    BUG: MAX_STACK_TRACE_ENTRIES too low!
    turning off the locking correctness validator.
    Pid: 4895, comm: iwlagn Not tainted 2.6.27-rc4 #13

    Call Trace:
    [] save_stack_trace+0x22/0x3e
    [] save_trace+0x8b/0x91
    [] mark_lock+0x1b0/0x8fa
    [] __lock_acquire+0x5b9/0x716
    [] ieee80211_sta_work+0x0/0x6ea [mac80211]
    [] lock_acquire+0x52/0x6b
    [] run_workqueue+0x97/0x1ed
    [] run_workqueue+0xe7/0x1ed
    [] run_workqueue+0x97/0x1ed
    [] worker_thread+0xd8/0xe3
    [] autoremove_wake_function+0x0/0x2e
    [] worker_thread+0x0/0xe3
    [] kthread+0x47/0x73
    [] trace_hardirqs_on_thunk+0x3a/0x3f
    [] child_rip+0xa/0x11
    [] restore_args+0x0/0x30
    [] finish_task_switch+0x0/0xcc
    [] kthread+0x0/0x73
    [] child_rip+0x0/0x11

    Although the above is harmless, when the iwlagn module is removed
    later, lockdep will trigger a kernel oops as below.

    BUG: unable to handle kernel NULL pointer dereference at
    0000000000000008
    IP: [] zap_class+0x24/0x82
    PGD 73128067 PUD 7448c067 PMD 0
    Oops: 0002 [1] SMP
    CPU 0
    Modules linked in: rfcomm l2cap bluetooth autofs4 sunrpc
    nf_conntrack_ipv6 xt_state nf_conntrack xt_tcpudp ip6t_ipv6header
    ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand
    acpi_cpufreq dm_mirror dm_log dm_multipath dm_mod snd_hda_intel sr_mod
    snd_seq_dummy snd_seq_oss snd_seq_midi_event battery snd_seq
    snd_seq_device cdrom button snd_pcm_oss snd_mixer_oss snd_pcm
    snd_timer snd_page_alloc e1000e snd_hwdep sg iTCO_wdt
    iTCO_vendor_support ac pcspkr i2c_i801 i2c_core snd soundcore video
    output ata_piix ata_generic libata sd_mod scsi_mod ext3 jbd mbcache
    uhci_hcd ohci_hcd ehci_hcd [last unloaded: mac80211]
    Pid: 4941, comm: modprobe Not tainted 2.6.27-rc4 #10
    RIP: 0010:[] []
    zap_class+0x24/0x82
    RSP: 0000:ffff88007bcb3eb0 EFLAGS: 00010046
    RAX: 0000000000068ee8 RBX: ffffffff8192a0a0 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000001dfb RDI: ffffffff816e70b0
    RBP: ffffffffa00cd000 R08: ffffffff816818f8 R09: ffff88007c923558
    R10: ffffe20002ad2408 R11: ffffffff811028ec R12: ffffffff8192a0a0
    R13: 000000000002bd90 R14: 0000000000000000 R15: 0000000000000296
    FS: 00007f9d1cee56f0(0000) GS:ffffffff814a58c0(0000)
    knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000008 CR3: 0000000073047000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process modprobe (pid: 4941, threadinfo ffff88007bcb2000, task
    ffff8800758d1fc0)
    Stack: ffffffff81057376 0000000000000000 ffffffffa00f7b00
    0000000000000000
    0000000000000080 0000000000618278 00007fff24f16720 0000000000000000
    ffffffff8105d37a ffffffffa00f7b00 ffffffff8105d591 313132303863616d
    Call Trace:
    [] ? lockdep_free_key_range+0x61/0xf5
    [] ? free_module+0xd4/0xe4
    [] ? sys_delete_module+0x1de/0x1f9
    [] ? audit_syscall_entry+0x12d/0x160
    [] ? system_call_fastpath+0x16/0x1b

    Code: b2 00 01 00 00 00 c3 31 f6 49 c7 c0 10 8a 61 81 eb 32 49 39 38
    75 26 48 98 48 6b c0 38 48 8b 90 08 8a 61 81 48 8b 88 00 8a 61 81
    89 51 08 48 89 0a 48 c7 80 08 8a 61 81 00 02 20 00 48 ff c6
    RIP [] zap_class+0x24/0x82
    RSP
    CR2: 0000000000000008
    ---[ end trace a1297e0c4abb0f2e ]---

    The root cause of this oops is in add_lock_to_list(): when
    save_trace() fails because MAX_STACK_TRACE_ENTRIES has been reached,
    entry->class is assigned but entry is never added to any lock list.
    This makes the list_del_rcu() in zap_class() oops later when the
    module is unloaded. This patch fixes the problem by assigning
    entry->class only after save_trace() returns success.
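
    The fix, as a minimal sketch (the function shape follows
    kernel/lockdep.c of that era; treat the details as illustrative):

    static int add_lock_to_list(struct lock_class *class,
                                struct lock_class *this,
                                struct list_head *head, unsigned long ip,
                                int distance)
    {
            struct lock_list *entry;

            entry = alloc_list_entry();
            if (!entry)
                    return 0;

            /*
             * Bail out *before* touching entry->class: if the stack-trace
             * pool is exhausted, this entry must stay fully inert, or
             * zap_class() will later list_del_rcu() a node that was never
             * added to any list.
             */
            if (!save_trace(&entry->trace))
                    return 0;

            entry->class = this;    /* previously assigned before save_trace() */
            entry->distance = distance;
            list_add_tail_rcu(&entry->entry, head);

            return 1;
    }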

    Signed-off-by: Zhu Yi
    Signed-off-by: Ingo Molnar

    Zhu Yi
     

26 Aug, 2008

1 commit


18 Aug, 2008

1 commit


14 Aug, 2008

1 commit


12 Aug, 2008

1 commit

  • When we enable DEBUG_LOCK_ALLOC but do not enable PROVE_LOCKING and/or
    LOCK_STAT, lock_acquire() and lock_release() turn into nops, even
    though we should still be doing hlock checking (check=1).

    This causes a false warning and a lockdep self-disable.

    Rectify this.
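
    A hedged sketch of the config dependency being fixed (the declarations
    live in include/linux/lockdep.h; the exact argument lists are
    era-dependent): the hooks must stay real whenever
    CONFIG_DEBUG_LOCK_ALLOC is set, not only under PROVE_LOCKING/LOCK_STAT.

    #if defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_LOCK_STAT) || \
        defined(CONFIG_DEBUG_LOCK_ALLOC)
    extern void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
                             int trylock, int read, int check,
                             struct lockdep_map *nest_lock, unsigned long ip);
    extern void lock_release(struct lockdep_map *lock, int nested,
                             unsigned long ip);
    #else
    /* only without any of the above may the hooks compile away */
    # define lock_acquire(l, s, t, r, c, n, i)      do { } while (0)
    # define lock_release(l, n, i)                  do { } while (0)
    #endif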

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

11 Aug, 2008

4 commits

  • Solve this by marking the classes as unused and not printing information
    about the unused classes.

    Reported-by: Eric Sesterhenn
    Signed-off-by: Rabin Vincent
    Acked-by: Huang Ying
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Rabin Vincent
     
  • On Fri, 2008-08-01 at 16:26 -0700, Linus Torvalds wrote:

    > On Fri, 1 Aug 2008, David Miller wrote:
    > >
    > > Taking more than a few locks of the same class at once is bad
    > > news and it's better to find an alternative method.
    >
    > It's not always wrong.
    >
    > If you can guarantee that anybody that takes more than one lock of a
    > particular class will always take a single top-level lock _first_, then
    > that's all good. You can obviously screw up and take the same lock _twice_
    > (which will deadlock), but at least you cannot get into ABBA situations.
    >
    > So maybe the right thing to do is to just teach lockdep about "lock
    > protection locks". That would have solved the multi-queue issues for
    > networking too - all the actual network drivers would still have taken
    > just their single queue lock, but the one case that needs to take all of
    > them would have taken a separate top-level lock first.
    >
    > Never mind that the multi-queue locks were always taken in the same order:
    > it's never wrong to just have some top-level serialization, and anybody
    > who needs to take locks might as well do so, because they sure as
    > hell aren't going to be on _any_ fastpaths.
    >
    > So the simplest solution really sounds like just teaching lockdep about
    > that one special case. It's not "nesting" exactly, although it's obviously
    > related to it.

    Do as Linus suggested. The lock protection lock is called nest_lock.

    Note that we still have the MAX_LOCK_DEPTH (48) limit to consider, so
    anything that spills over that is still up shit creek.
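
    A hedged usage sketch of the resulting annotation (the API this became
    is spin_lock_nest_lock(); the structure and names below are
    illustrative, not from the patch):

    struct mq_dev {
            spinlock_t      all_lock;       /* the lock protection lock  */
            spinlock_t      queue_lock[8];  /* many locks of one class   */
    };

    static void freeze_all_queues(struct mq_dev *dev)
    {
            int i;

            spin_lock(&dev->all_lock);      /* serializes multi-lock takers */
            for (i = 0; i < 8; i++)
                    /* tells lockdep all_lock is held, so taking many
                     * locks of the same class cannot ABBA-deadlock */
                    spin_lock_nest_lock(&dev->queue_lock[i], &dev->all_lock);
    }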

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • struct held_lock {
            u64                     prev_chain_key;   /*     0     8 */
            struct lock_class *     class;            /*     8     8 */
            long unsigned int       acquire_ip;       /*    16     8 */
            struct lockdep_map *    instance;         /*    24     8 */
            int                     irq_context;      /*    32     4 */
            int                     trylock;          /*    36     4 */
            int                     read;             /*    40     4 */
            int                     check;            /*    44     4 */
            int                     hardirqs_off;     /*    48     4 */

            /* size: 56, cachelines: 1 */
            /* padding: 4 */
            /* last cacheline: 56 bytes */
    };

    struct held_lock {
            u64                     prev_chain_key;   /*     0     8 */
            long unsigned int       acquire_ip;       /*     8     8 */
            struct lockdep_map *    instance;         /*    16     8 */
            unsigned int            class_idx:11;     /*    24:21  4 */
            unsigned int            irq_context:2;    /*    24:19  4 */
            unsigned int            trylock:1;        /*    24:18  4 */
            unsigned int            read:2;           /*    24:16  4 */
            unsigned int            check:2;          /*    24:14  4 */
            unsigned int            hardirqs_off:1;   /*    24:13  4 */

            /* size: 32, cachelines: 1 */
            /* padding: 4 */
            /* bit_padding: 13 bits */
            /* last cacheline: 32 bytes */
    };

    [mingo@elte.hu: shrunk hlock->class too]
    [peterz@infradead.org: fixup bit sizes]
    Signed-off-by: Dave Jones
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Dave Jones
     
  • This can be used to reset a held lock's subclass, for arbitrary-depth
    iterated data structures such as trees or lists which have per-node
    locks.
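
    A hedged sketch of the intended use (lock_set_subclass() is the
    interface added here; the list walk itself is illustrative):
    hand-over-hand locking takes each next node at SINGLE_DEPTH_NESTING,
    then resets the held lock's subclass to 0 so the walk can continue to
    arbitrary depth.

    static void walk(struct node *node)
    {
            struct node *next;

            spin_lock(&node->lock);
            while ((next = node->next) != NULL) {
                    spin_lock_nested(&next->lock, SINGLE_DEPTH_NESTING);
                    /* next->lock is about to become the only held lock;
                     * drop its annotation depth back to subclass 0 */
                    lock_set_subclass(&next->lock.dep_map, 0, _RET_IP_);
                    spin_unlock(&node->lock);
                    node = next;
            }
            spin_unlock(&node->lock);
    }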

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

01 Aug, 2008

1 commit

  • When we traverse the graph, either forwards or backwards, we
    are interested in whether a certain property exists somewhere
    in a node reachable in the graph.

    Therefore it is never necessary to traverse through a node more
    than once to get a correct answer to the given query.

    Take advantage of this property using a global ID counter so that we
    need not clear all the markers in all the lock_class entries before
    doing a traversal. A new ID is chosen when we start to traverse, and
    we continue through a lock_class only if its ID hasn't been marked
    with the new value yet.

    This short-circuiting is essential especially for high CPU count
    systems. The scheduler has a runqueue per cpu, and needs to take
    two runqueue locks at a time, which leads to long chains of
    backwards and forwards subgraphs from these runqueue lock nodes.
    Without the short-circuit implemented here, a graph traversal on
    a runqueue lock can take up to (1 << (N - 1)) checks on a system
    with N cpus.

    For anything more than 16 cpus or so, lockdep will eventually bring
    the machine to a complete standstill.
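
    A hedged sketch of the short-circuit (the field and counter names are
    illustrative; the idea is a per-query generation stamp instead of
    clearing markers between queries):

    static unsigned int graph_gen_id;       /* bumped once per traversal */

    static int find_property(struct lock_class *class)
    {
            struct lock_list *entry;

            if (class->gen_id == graph_gen_id)
                    return 0;               /* already visited this query */
            class->gen_id = graph_gen_id;

            if (has_property(class))        /* the query's predicate */
                    return 1;

            list_for_each_entry(entry, &class->locks_after, entry)
                    if (find_property(entry->class))
                            return 1;
            return 0;
    }

    /* every query starts with: graph_gen_id++; find_property(root); */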

    Signed-off-by: David S. Miller
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    David Miller
     

15 Jul, 2008

1 commit

  • * 'core/locking' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    lockdep: fix kernel/fork.c warning
    lockdep: fix ftrace irq tracing false positive
    lockdep: remove duplicate definition of STATIC_LOCKDEP_MAP_INIT
    lockdep: add lock_class information to lock_chain and output it
    lockdep: add lock_class information to lock_chain and output it
    lockdep: output lock_class key instead of address for forward dependency output
    __mutex_lock_common: use signal_pending_state()
    mutex-debug: check mutex magic before owner

    Fixed up conflict in kernel/fork.c manually

    Linus Torvalds
     

14 Jul, 2008

1 commit

  • fix this false positive:

    [ 0.020000] ------------[ cut here ]------------
    [ 0.020000] WARNING: at kernel/lockdep.c:2718 check_flags+0x14a/0x170()
    [ 0.020000] Modules linked in:
    [ 0.020000] Pid: 0, comm: swapper Not tainted 2.6.26-tip-00343-gd7e5521-dirty #14486
    [ 0.020000] [] warn_on_slowpath+0x54/0x80
    [ 0.020000] [] ? _spin_unlock_irqrestore+0x61/0x70
    [ 0.020000] [] ? release_console_sem+0x201/0x210
    [ 0.020000] [] ? __kernel_text_address+0x35/0x40
    [ 0.020000] [] ? dump_trace+0x5e/0x140
    [ 0.020000] [] ? __lock_acquire+0x245/0x820
    [ 0.020000] [] check_flags+0x14a/0x170
    [ 0.020000] [] ? lock_acquire+0x48/0xc0
    [ 0.020000] [] lock_acquire+0x51/0xc0
    [ 0.020000] [] ? down+0x2c/0x40
    [ 0.020000] [] ? sched_clock+0x9/0x10
    [ 0.020000] [] _write_lock+0x32/0x60
    [ 0.020000] [] ? request_resource+0x1f/0xb0
    [ 0.020000] [] request_resource+0x1f/0xb0
    [ 0.020000] [] vgacon_startup+0x2bd/0x3e0
    [ 0.020000] [] con_init+0x19/0x22f
    [ 0.020000] [] ? tty_register_ldisc+0x5c/0x70
    [ 0.020000] [] console_init+0x20/0x2e
    [ 0.020000] [] start_kernel+0x20c/0x379
    [ 0.020000] [] ? unknown_bootoption+0x0/0x1f6
    [ 0.020000] [] __init_begin+0x99/0xa1
    [ 0.020000] =======================
    [ 0.020000] ---[ end trace 4eaa2a86a8e2da22 ]---
    [ 0.020000] possible reason: unannotated irqs-on.
    [ 0.020000] irq event stamp: 0

    which occurs if CONFIG_TRACE_IRQFLAGS=y, CONFIG_DEBUG_LOCKDEP=y,
    but CONFIG_PROVE_LOCKING is disabled.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

24 Jun, 2008

1 commit


20 Jun, 2008

1 commit


24 May, 2008

4 commits

  • With the introduction of ftrace, it is possible to recurse into
    the lockdep functions via the mcount call. Updating the
    lockdep_recursion counter on grabbing the internal lockdep_lock
    prevents such possible lockups.
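
    A hedged sketch of the guard (abridged from the kernel/lockdep.c
    shape of the era): lockdep_recursion stays raised for as long as
    lockdep_lock is held, so an mcount hook firing inside lockdep bails
    out early instead of deadlocking on lockdep_lock.

    static int graph_lock(void)
    {
            __raw_spin_lock(&lockdep_lock);
            /* prevent any recursions within lockdep from causing deadlocks */
            current->lockdep_recursion++;
            return 1;
    }

    static int graph_unlock(void)
    {
            current->lockdep_recursion--;
            __raw_spin_unlock(&lockdep_lock);
            return 0;
    }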

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Steven Rostedt
     
  • This patch removes the "notrace" annotation from lockdep and adds the
    debugging files in the kernel directory to those that should not be
    compiled with "-pg" mcount tracing.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Steven Rostedt
     
  • Add notrace annotations to lockdep to keep ftrace from causing
    recursive problems with lock tracing and debugging.
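
    For reference, the annotation is the compiler's no-instrument
    attribute (as defined in include/linux/compiler.h), which suppresses
    the mcount profiling call in a function; the helper below is
    illustrative.

    #define notrace __attribute__((no_instrument_function))

    static void notrace lockdep_internal_helper(void)
    {
            /* never entered via mcount, so it cannot recurse into ftrace */
    }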

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Steven Rostedt
     
  • This patch adds latency tracing for critical timings
    (how long interrupts are disabled for).

    "irqsoff" is added to /debugfs/tracing/available_tracers

    Note:

    tracing_max_latency
      also holds the max latency for irqsoff (in usecs)
      (defaults to a large number, so one must explicitly start latency
      tracing)

    tracing_thresh
      threshold (in usecs) to always print out if irqs-off is detected
      to be longer than stated here. If tracing_thresh is non-zero, then
      tracing_max_latency is ignored.

    Here's an example of a trace with ftrace_enabled = 0

    =======
    preemption latency trace v1.1.5 on 2.6.24-rc7
    --------------------------------------------------------------------
    latency: 100 us, #3/3, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
    -----------------
    | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
    -----------------
    => started at: _spin_lock_irqsave+0x2a/0xb7
    => ended at: _spin_unlock_irqrestore+0x32/0x5f

                      _------=> CPU#
                     / _-----=> irqs-off
                    | / _----=> need-resched
                    || / _---=> hardirq/softirq
                    ||| / _--=> preempt-depth
                    |||| /
                    |||||     delay
           cmd     pid ||||| time  |   caller
              \   /    |||||   \   |   /
    swapper-0 1d.s3 0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
    swapper-0 1d.s3 100us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
    swapper-0 1d.s3 100us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)

    vim:ft=help
    =======

    And this is a trace with ftrace_enabled == 1

    =======
    preemption latency trace v1.1.5 on 2.6.24-rc7
    --------------------------------------------------------------------
    latency: 102 us, #12/12, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
    -----------------
    | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
    -----------------
    => started at: _spin_lock_irqsave+0x2a/0xb7
    => ended at: _spin_unlock_irqrestore+0x32/0x5f

                      _------=> CPU#
                     / _-----=> irqs-off
                    | / _----=> need-resched
                    || / _---=> hardirq/softirq
                    ||| / _--=> preempt-depth
                    |||| /
                    |||||     delay
           cmd     pid ||||| time  |   caller
              \   /    |||||   \   |   /
    swapper-0 1dNs3 0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
    swapper-0 1dNs3 46us : e1000_read_phy_reg+0x16/0x225 [e1000] (e1000_update_stats+0x5e2/0x64c [e1000])
    swapper-0 1dNs3 46us : e1000_swfw_sync_acquire+0x10/0x99 [e1000] (e1000_read_phy_reg+0x49/0x225 [e1000])
    swapper-0 1dNs3 46us : e1000_get_hw_eeprom_semaphore+0x12/0xa6 [e1000] (e1000_swfw_sync_acquire+0x36/0x99 [e1000])
    swapper-0 1dNs3 47us : __const_udelay+0x9/0x47 (e1000_read_phy_reg+0x116/0x225 [e1000])
    swapper-0 1dNs3 47us+: __delay+0x9/0x50 (__const_udelay+0x45/0x47)
    swapper-0 1dNs3 97us : preempt_schedule+0xc/0x84 (__delay+0x4e/0x50)
    swapper-0 1dNs3 98us : e1000_swfw_sync_release+0xc/0x55 [e1000] (e1000_read_phy_reg+0x211/0x225 [e1000])
    swapper-0 1dNs3 99us+: e1000_put_hw_eeprom_semaphore+0x9/0x35 [e1000] (e1000_swfw_sync_release+0x50/0x55 [e1000])
    swapper-0 1dNs3 101us : _spin_unlock_irqrestore+0xe/0x5f (e1000_update_stats+0x641/0x64c [e1000])
    swapper-0 1dNs3 102us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
    swapper-0 1dNs3 102us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)

    vim:ft=help
    =======

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Steven Rostedt
     

26 Feb, 2008

1 commit


26 Jan, 2008

1 commit

  • This patch extends the soft-lockup detector to automatically
    detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
    printed the following way:

    ------------------>
    INFO: task prctl:3042 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
    prctl D fd5e3793 0 3042 2997
    f6050f38 00000046 00000001 fd5e3793 00000009 c06d8264 c06dae80 00000286
    f6050f40 f6050f00 f7d34d90 f7d34fc8 c1e1be80 00000001 f6050000 00000000
    f7e92d00 00000286 f6050f18 c0489d1a f6050f40 00006605 00000000 c0133a5b
    Call Trace:
    [] schedule_timeout+0x6d/0x8b
    [] schedule_timeout_uninterruptible+0x15/0x17
    [] msleep+0x10/0x16
    [] sys_prctl+0x30/0x1e2
    [] sysenter_past_esp+0x5f/0xa5
    =======================
    2 locks held by prctl/3042:
    #0: (&sb->s_type->i_mutex_key#5){--..}, at: [] do_fsync+0x38/0x7a
    #1: (jbd_handle){--..}, at: [] journal_start+0xc7/0xe9
    [ …: CPU hotplug fixes. ]
    [ Andrew Morton: build warning fix. ]

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven

    Ingo Molnar
     

25 Jan, 2008

1 commit

  • Michael Wu noticed in his lkml post at

    http://marc.info/?l=linux-kernel&m=119396182726091&w=2

    that certain wireless drivers ended up having their name in module
    memory, which would then crash the kernel on module unload.

    The patch he proposed was a bit clumsy in that it increased the size of
    a lockdep entry significantly; the patch below tries another approach:
    it checks, on module teardown, whether the name of a class is in module
    space and then zaps the class. This is very similar to what we already
    do with keys that are in module space.
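
    A minimal sketch of that teardown check (after kernel/lockdep.c's
    lockdep_free_key_range(); locking omitted for brevity): a class is
    zapped when either its key or its name string points into the range
    being freed.

    static inline int within(const void *addr, void *start, unsigned long size)
    {
            return addr >= start && addr < start + size;
    }

    void lockdep_free_key_range(void *start, unsigned long size)
    {
            struct lock_class *class, *next;
            struct list_head *head;
            int i;

            for (i = 0; i < CLASSHASH_SIZE; i++) {
                    head = classhash_table + i;
                    list_for_each_entry_safe(class, next, head, hash_entry) {
                            if (within(class->key, start, size) ||
                                within(class->name, start, size))
                                    zap_class(class);
                    }
            }
    }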

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

16 Jan, 2008

1 commit


08 Dec, 2007

1 commit


05 Dec, 2007

2 commits

  • Torsten Kaiser wrote:

    | static inline int in_range(const void *start, const void *addr, const void *end)
    | {
    |         return addr >= start && addr <= end;
    | }
    |
    | mem_from is the first byte of the freed range, mem_to is the last
    | byte of the freed range, that fits in_range.
    | lock_from = (void *)hlock->instance;
    | -> first byte of the lock
    | lock_to = (void *)(hlock->instance + 1);
    | -> first byte of the next lock, not last byte of the lock that is being checked!
    |
    | The test is:
    | if (!in_range(mem_from, lock_from, mem_to) &&
    |     !in_range(mem_from, lock_to, mem_to))
    |         continue;
    | So it tests, if the first byte of the lock is in the range that is freed ->OK
    | And if the first byte of the *next* lock is in the range that is freed
    | -> Not OK.

    We can also simplify the in_range checks: we need only two comparisons,
    not four. If the lock is not inside the memory range, it must lie
    either entirely to the left of the range or entirely to the right.
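
    The simplified check, as a hedged sketch (this is the shape the helper
    took in kernel/lockdep.c): the lock and the freed range are disjoint
    iff the lock ends at or before the range's start, or starts at or
    after the range's end.

    static inline int not_in_range(const void *mem_from, unsigned long mem_len,
                                   const void *lock_from, unsigned long lock_len)
    {
            return lock_from + lock_len <= mem_from ||
                   mem_from + mem_len <= lock_from;
    }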

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Oleg Nesterov
     
  • fix the oops that can be seen in:

    http://bugzilla.kernel.org/attachment.cgi?id=13828&action=view

    it is not safe to print the locks of running tasks.

    (even with this fix we have a small race - but this is a debug
    function after all.)

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Ingo Molnar
     

29 Oct, 2007

1 commit


20 Oct, 2007

2 commits

  • The task_struct->pid member is going to be deprecated, so start
    using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
    the kernel.

    The first thing to start with is the pid, printed to dmesg - in
    this case we may safely use task_pid_nr(). Besides, printks account
    for more (much more) than half of all the explicit pid usage.
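
    A before/after sketch of the conversion (the printk itself is
    illustrative):

    static void show_self(void)
    {
            /* before: raw field access, wrong under pid namespaces */
            printk(KERN_INFO "pid %d\n", current->pid);

            /* after: namespace-aware helper, safe for dmesg */
            printk(KERN_INFO "pid %d\n", task_pid_nr(current));
    }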

    [akpm@linux-foundation.org: git-drm went and changed lots of stuff]
    Signed-off-by: Pavel Emelyanov
    Cc: Dave Airlie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • In the following scenario:

    code path 1:
    my_function() -> lock(L1); ...; flush_workqueue(); ...

    code path 2:
    run_workqueue() -> my_work() -> ...; lock(L1); ...

    you can get a deadlock when my_work() is queued or running
    but my_function() has acquired L1 already.

    This patch adds a pseudo-lock to each workqueue to make lockdep
    warn about this scenario.
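
    A hedged sketch of the annotation (shown with the later
    lock_map_acquire()/lock_map_release() helpers; the original used raw
    lock_acquire() calls, whose argument list is era-dependent):

    struct workqueue_struct {
            /* ... */
            struct lockdep_map lockdep_map; /* the pseudo-lock */
    };

    void flush_workqueue(struct workqueue_struct *wq)
    {
            /* flushing "takes" the map, so flushing while holding L1
             * records the dependency L1 -> wq->lockdep_map */
            lock_map_acquire(&wq->lockdep_map);
            lock_map_release(&wq->lockdep_map);
            /* ... wait for all queued work to finish ... */
    }

    run_workqueue() wraps each work item in the same map, so when
    my_work() takes L1 the reverse dependency wq->lockdep_map -> L1 is
    recorded and lockdep reports the inversion.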

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Johannes Berg
    Acked-by: Oleg Nesterov
    Acked-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     

12 Oct, 2007

2 commits

  • Provide a check to validate that we do not hold any locks when switching
    back to user-space.
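
    A minimal sketch of the check (the hook is lockdep_sys_exit(), called
    on the return-to-userspace path; the printout is abridged):

    void lockdep_sys_exit(void)
    {
            struct task_struct *curr = current;

            if (unlikely(curr->lockdep_depth)) {
                    printk("BUG: lock held when returning to user space!\n");
                    lockdep_print_held_locks(curr);
            }
    }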

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • It is possible for the current->curr_chain_key to become inconsistent with the
    current index if the chain fails to validate. The end result is that future
    lock_acquire() operations may inadvertently fail to find a hit in the cache
    resulting in a new node being added to the graph for every acquire.

    Signed-off-by: Gregory Haskins
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Gregory Haskins
     

20 Jul, 2007

7 commits

  • When I started adding support for lockdep to 64-bit powerpc, I got a
    lockdep_init_error and with this patch was able to pinpoint why and where
    to put lockdep_init(). Let's support this generally for others adding
    lockdep support to their architecture.

    Signed-off-by: Johannes Berg
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • __acquire
        |
       lock _____
        |        \
        |    __contended
        |         |
        |        wait
        |  _______/
        |/
        |
    __acquired
        |
    __release
        |
      unlock

    We measure acquisition and contention bouncing.

    This is done by recording a cpu stamp in each lock instance.

    Contention bouncing requires the cpu stamp to be set on acquisition. Hence we
    move __acquired into the generic path.

    __acquired is then used to measure acquisition bouncing by comparing the
    current cpu with the old stamp before replacing it.

    __contended is used to measure contention bouncing (only useful for
    preemptable locks).
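
    A hedged sketch of the cpu-stamp comparison (class_stats_of() and the
    counter field are hypothetical names; the stamp lives in the lock
    instance, the counters in the per-class statistics):

    static void record_acquired(struct lockdep_map *lock)
    {
            struct lock_class_stats *stats = class_stats_of(lock);
            int cpu = smp_processor_id();

            /* a different cpu than the previous holder means the lock
             * (and its cacheline) bounced between cpus */
            if (lock->cpu != cpu)
                    stats->acquire_bounces++;

            lock->cpu = cpu;        /* stamp for the next comparison */
    }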

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • - update the copyright notices
    - use the default hash function
    - fix a thinko in a BUILD_BUG_ON
    - add a WARN_ON to spot inconsistent naming
    - fix a termination issue in /proc/lock_stat

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Present all this fancy new lock statistics information:

    *warning, _wide_ output ahead*

    (output edited for purpose of brevity)

    # cat /proc/lock_stat
    lock_stat version 0.1
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------
    class name contentions waittime-min waittime-max waittime-total acquisitions holdtime-min holdtime-max holdtime-total
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------

    &inode->i_mutex: 14458 6.57 398832.75 2469412.23 6768876 0.34 11398383.65 339410830.89
    ---------------
    &inode->i_mutex 4486 [] pipe_wait+0x86/0x8d
    &inode->i_mutex 0 [] pipe_write_fasync+0x29/0x5d
    &inode->i_mutex 0 [] pipe_read+0x74/0x3a5
    &inode->i_mutex 0 [] do_lookup+0x81/0x1ae

    .................................................................................................................................................................

    &inode->i_data.tree_lock-W: 491 0.27 62.47 493.89 2477833 0.39 468.89 1146584.25
    &inode->i_data.tree_lock-R: 65 0.44 4.27 48.78 26288792 0.36 184.62 10197458.24
    --------------------------
    &inode->i_data.tree_lock 46 [] __do_page_cache_readahead+0x69/0x24f
    &inode->i_data.tree_lock 31 [] add_to_page_cache+0x31/0xba
    &inode->i_data.tree_lock 0 [] __do_page_cache_readahead+0xc2/0x24f
    &inode->i_data.tree_lock 0 [] find_get_page+0x1a/0x58

    .................................................................................................................................................................

    proc_inum_idr.lock: 0 0.00 0.00 0.00 36 0.00 65.60 148.26
    proc_subdir_lock: 0 0.00 0.00 0.00 3049859 0.00 106.81 1563212.42
    shrinker_rwsem-W: 0 0.00 0.00 0.00 5 0.00 1.73 3.68
    shrinker_rwsem-R: 0 0.00 0.00 0.00 633 2.57 246.57 10909.76

    'contentions' and 'acquisitions' are the number of such events measured (since
    the last reset). The waittime- and holdtime- (min, max, total) numbers are
    presented in microseconds.

    If there are any contention points, the lock class is presented in the block
    format (as i_mutex and tree_lock above), otherwise a single line of output is
    presented.

    The output is sorted on the absolute number of contentions (read +
    write); this should present the worst offenders first, so that:

    # grep : /proc/lock_stat | head

    will quickly show who's bad.

    The stats can be reset using:

    # echo 0 > /proc/lock_stat

    [bunk@stusta.de: make 2 functions static]
    [akpm@linux-foundation.org: fix printk warning]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Introduce the core lock statistics code.

    Lock statistics provides lock wait-time and hold-time (as well as the
    count of corresponding contention and acquisition events). Also, the
    first few call-sites that encounter contention are tracked.

    Lock wait-time is the time spent waiting on the lock. This provides
    insight into the locking scheme: a heavily contended lock is
    indicative of too coarse a locking scheme.

    Lock hold-time is the duration the lock was held; this provides a
    reference for the wait-time numbers, so they can be put into
    perspective.

    1)
      lock
    2)
      ... do stuff ..
      unlock
    3)

    The time between 1 and 2 is the wait-time. The time between 2 and 3 is the
    hold-time.

    The lockdep held-lock tracking code is reused, because it already collects locks
    into meaningful groups (classes), and because it is an existing infrastructure
    for lock instrumentation.

    Currently lockdep tracks lock acquisition with two hooks:

    lock()
      lock_acquire()
      _lock()

    ... code protected by lock ...

    unlock()
      lock_release()
      _unlock()

    We need to extend this with two more hooks, in order to measure contention.

    lock_contended() - used to measure contention events
    lock_acquired() - completion of the contention

    These are then placed the following way:

    lock()
      lock_acquire()
      if (!_try_lock())
        lock_contended()
      _lock()
      lock_acquired()

    ... do locked stuff ...

    unlock()
      lock_release()
      _unlock()

    (Note: the try_lock() 'trick' is used to avoid instrumenting all platform
    dependent lock primitive implementations.)

    It is also possible to toggle the two lockdep features at runtime using:

    /proc/sys/kernel/prove_locking
    /proc/sys/kernel/lock_stat

    (esp. turning off the O(n^2) prove_locking functionality can help)

    [akpm@linux-foundation.org: build fixes]
    [akpm@linux-foundation.org: nuke unneeded ifdefs]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Move code around to get fewer but larger #ifdef sections. Break some
    in-function #ifdefs out into their own functions.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Ensure that all of the lock dependency tracking code is under
    CONFIG_PROVE_LOCKING. This allows us to use the held lock tracking code for
    other purposes.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

18 Jul, 2007

1 commit

  • KSYM_NAME_LEN is peculiar in that it does not include the space for the
    trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when
    allocating a buffer. This is nonsense and error-prone. Moreover, when
    the caller forgets this, the mistake is very likely to subtly bite back
    by corrupting the stack, because the last position of the buffer is
    always cleared to zero.

    This patch increments KSYM_NAME_LEN by one and updates code
    accordingly (a before/after sketch follows the list below).

    * off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
    is fixed.

    * Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
    MODULE_NAME_LEN was treated as if it didn't include space for the
    trailing '\0'. Fix it.
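
    A before/after sketch of the convention change (lookup_symbol_name()
    is used illustratively):

    static void print_symbol_at(unsigned long addr)
    {
            /*
             * after the patch the natural declaration is also the correct
             * one: KSYM_NAME_LEN includes room for the trailing '\0'
             * (before, callers had to write KSYM_NAME_LEN + 1)
             */
            char name[KSYM_NAME_LEN];

            if (!lookup_symbol_name(addr, name))
                    printk(KERN_INFO "%s\n", name);
    }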

    Signed-off-by: Tejun Heo
    Acked-by: Paulo Marques
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

09 May, 2007

2 commits