01 Sep, 2008

1 commit

  • Daniel J. Blueman reported:
    > =======================================================
    > [ INFO: possible circular locking dependency detected ]
    > 2.6.27-rc4-224c #1
    > -------------------------------------------------------
    > hald/4680 is trying to acquire lock:
    > (&n->list_lock){++..}, at: [] add_partial+0x26/0x80
    >
    > but task is already holding lock:
    > (&obj_hash[i].lock){++..}, at: []
    > debug_object_free+0x5c/0x120

    We fix it by moving the actual freeing to outside the lock (the lock
    now only protects the list).

    The pool lock is also promoted to irq-safe (suggested by Dan). It's
    necessary because free_pool is now called outside the irq disabled
    region. So we need to protect against an interrupt handler which calls
    debug_object_init().

    [tglx@linutronix.de: added hlist_move_list helper to avoid looping
    through the list twice]
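
    The pattern described above (detach the list under the lock, free
    outside it) can be sketched in user space roughly as follows. This is
    an illustration only, not the kernel code: struct node, struct bucket,
    bucket_add() and bucket_drain() are hypothetical names, and a pthread
    mutex stands in for the irq-safe spinlock.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical tracking node; stands in for a tracked debug object. */
struct node {
    struct node *next;
    void *payload;
};

struct bucket {
    pthread_mutex_t lock;   /* protects only 'head' */
    struct node *head;
};

/* Push a new node. The allocation happens before the lock is taken. */
static int bucket_add(struct bucket *b, void *payload)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return -1;
    n->payload = payload;

    pthread_mutex_lock(&b->lock);
    n->next = b->head;
    b->head = n;
    pthread_mutex_unlock(&b->lock);
    return 0;
}

/*
 * Detach the whole list while holding the lock, then free the nodes
 * after dropping it. Returns the number of nodes freed.
 */
static int bucket_drain(struct bucket *b)
{
    struct node *list, *next;
    int freed = 0;

    pthread_mutex_lock(&b->lock);
    list = b->head;         /* single pointer move, no second list walk */
    b->head = NULL;
    pthread_mutex_unlock(&b->lock);

    /* The allocator is only called with the bucket lock released. */
    for (; list; list = next) {
        next = list->next;
        free(list);
        freed++;
    }
    return freed;
}
```

    Because the allocator is never entered with the bucket lock held, the
    lock order "bucket lock, then allocator lock" cannot occur in this
    sketch.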

    Reported-by: Daniel J Blueman
    Signed-off-by: Vegard Nossum
    Signed-off-by: Thomas Gleixner

    Vegard Nossum
     

27 Jul, 2008

1 commit

  • Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes
    part of the warning section for better reporting/collection. In addition, one
    of the if() clauses collapses into the WARN() entirely now.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

25 Jul, 2008

1 commit

  • lib/debugobjects.c has a function to test if an object is on the stack.
    The block layer and IDE need it (they need to avoid DMA from/to stack
    buffers). This patch moves the function to include/linux/sched.h so that
    everyone can use it.

    lib/debugobjects.c uses current->stack but this patch uses a
    task_stack_page() accessor, which is a preferable way to access the stack.
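
    The check itself is a simple address range test. A minimal user-space
    sketch, assuming the stack base and size are passed in explicitly (in
    the kernel they would presumably come from task_stack_page(current)
    and THREAD_SIZE); addr_in_stack() is a hypothetical name, not the
    function added by this patch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if 'obj' lies inside [stack_base, stack_base + stack_size).
 * Both the base and the size are plain parameters here (assumption).
 */
static bool addr_in_stack(const void *stack_base, size_t stack_size,
                          const void *obj)
{
    uintptr_t base = (uintptr_t)stack_base;
    uintptr_t addr = (uintptr_t)obj;

    /* addr - base is only evaluated when addr >= base, so no wraparound. */
    return addr >= base && addr - base < stack_size;
}
```

    A caller that must not DMA from or to the stack could run such a test
    on a buffer address before mapping it.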

    Signed-off-by: FUJITA Tomonori
    Cc: Christoph Lameter
    Cc: Andy Whitcroft
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
     

18 Jun, 2008

1 commit

  • Daniel J Blueman reported:
    | =======================================================
    | [ INFO: possible circular locking dependency detected ]
    | 2.6.26-rc5-201c #1
    | -------------------------------------------------------
    | nscd/3669 is trying to acquire lock:
    | (&n->list_lock){.+..}, at: [] deactivate_slab+0x173/0x1e0
    |
    | but task is already holding lock:
    | (&obj_hash[i].lock){++..}, at: []
    | __debug_object_init+0x2f/0x350
    |
    | which lock already depends on the new lock.

    There are two locks involved here; the first is a SLUB-local lock, and
    the second is a debugobjects-local lock. They are basically taken in two
    different orders:

    1. SLUB { debugobjects { ... } }
    2. debugobjects { SLUB { ... } }

    This patch changes pattern #2 by trying to fill the memory pool (i.e.
    the call into SLUB/kmalloc()) outside the debugobjects lock, so now the
    two patterns look like this:

    1. SLUB { debugobjects { ... } }
    2. SLUB { } debugobjects { ... }

    [ daniel.blueman@gmail.com: pool_lock needs to be taken irq safe in fill_pool ]
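
    The reordering of pattern #2 can be illustrated with a small
    user-space sketch. It is not the kernel implementation: POOL_MIN,
    struct pool_obj, pool_take() and track_object() are hypothetical,
    malloc() stands in for the SLUB allocator, and pthread mutexes stand
    in for the irq-safe spinlocks.

```c
#include <pthread.h>
#include <stdlib.h>

#define POOL_MIN 4      /* hypothetical refill threshold */

struct pool_obj {
    struct pool_obj *next;
};

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bucket_lock = PTHREAD_MUTEX_INITIALIZER;
static struct pool_obj *pool_head;
static int pool_count;

/* Refill the pool. The allocator is called with no lock held. */
static void fill_pool(void)
{
    for (;;) {
        struct pool_obj *obj;
        int cnt;

        pthread_mutex_lock(&pool_lock);
        cnt = pool_count;
        pthread_mutex_unlock(&pool_lock);
        if (cnt >= POOL_MIN)
            return;

        obj = malloc(sizeof(*obj));     /* allocator: no lock held */
        if (!obj)
            return;

        pthread_mutex_lock(&pool_lock); /* lock only to link the object */
        obj->next = pool_head;
        pool_head = obj;
        pool_count++;
        pthread_mutex_unlock(&pool_lock);
    }
}

/* Take one preallocated object; never calls the allocator. */
static struct pool_obj *pool_take(void)
{
    struct pool_obj *obj;

    pthread_mutex_lock(&pool_lock);
    obj = pool_head;
    if (obj) {
        pool_head = obj->next;
        pool_count--;
    }
    pthread_mutex_unlock(&pool_lock);
    return obj;
}

/* Pattern #2 after the change: allocator { } bucket lock { ... } */
static struct pool_obj *track_object(void)
{
    struct pool_obj *obj;

    fill_pool();                        /* allocator call comes first */

    pthread_mutex_lock(&bucket_lock);
    obj = pool_take();                  /* only pool_lock nests inside */
    pthread_mutex_unlock(&bucket_lock);
    return obj;                         /* caller owns the object */
}
```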

    Reported-by: Daniel J Blueman
    Signed-off-by: Vegard Nossum
    Signed-off-by: Thomas Gleixner

    Vegard Nossum
     

30 Apr, 2008

1 commit

  • We can see an ever-repeating problem pattern with objects of any kind in the
    kernel:

    1) freeing of active objects
    2) reinitialization of active objects

    Both problems can be hard to debug because the crash happens at a point where
    we have no chance to decode the root cause anymore. One problem spot is
    kernel timers, where the detection of the problem often happens in interrupt
    context and usually causes the machine to panic.

    While working on a timer related bug report I had to hack specialized code
    into the timer subsystem to get a reasonable hint for the root cause. This
    debug hack was fine for temporary use, but far from a mergeable solution due
    to the intrusiveness into the timer code.

    The code further lacked the ability to detect and report the root cause
    instantly and keep the system operational.

    Keeping the system operational is important to get hold of the debug
    information without special debugging aids like serial consoles and special
    knowledge of the bug reporter.

    The problems described above are not restricted to timers, but timers tend to
    expose them in a full system crash. Other objects are less explosive,
    but the symptoms caused by such mistakes can be even harder to debug.

    Instead of creating specialized debugging code for the timer subsystem, a
    generic infrastructure is created which allows developers to verify their code
    and provides an easy-to-enable debug facility for users in case of trouble.

    The debugobjects core code keeps track of operations on static and dynamic
    objects by inserting them into a hashed list and sanity checking them on
    object operations and provides additional checks whenever kernel memory is
    freed.

    The tracked object operations are:
    - initializing an object
    - adding an object to a subsystem list
    - deleting an object from a subsystem list

    Each operation is sanity checked before the operation is executed and the
    subsystem specific code can provide a fixup function which can prevent
    the damage the operation would cause. When the sanity check triggers, a
    warning message and a stack trace are printed.
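
    The check-then-fixup flow described above might look roughly like the
    following user-space sketch. All names (enum obj_state, struct
    tracked, check_init(), deactivate_fixup()) are hypothetical and do
    not reflect the interface added by this patch; the sketch only prints
    a warning and omits the stack trace.

```c
#include <stdio.h>

enum obj_state { OBJ_NONE, OBJ_INIT, OBJ_ACTIVE };

struct tracked {
    enum obj_state state;
    /* Optional subsystem fixup; returns 1 if it repaired the object. */
    int (*fixup_init)(struct tracked *t);
};

/*
 * Sanity check run before an object is (re)initialized.
 * Returns 0 if the operation is fine, 1 if a problem was detected and
 * fixed up, -1 if a problem was detected and could not be fixed.
 */
static int check_init(struct tracked *t)
{
    if (t->state == OBJ_ACTIVE) {
        fprintf(stderr, "warning: init of active object %p\n", (void *)t);
        if (t->fixup_init && t->fixup_init(t)) {
            t->state = OBJ_INIT;
            return 1;
        }
        return -1;      /* state left untouched */
    }
    t->state = OBJ_INIT;
    return 0;
}

/* Example fixup: take the object out of service before it is reused. */
static int deactivate_fixup(struct tracked *t)
{
    t->state = OBJ_NONE;
    return 1;
}
```

    The point of the fixup hook is that the system stays operational
    after the warning, so the report can still be collected.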

    The list of operations can be extended if the need arises. For now it's
    limited to the requirements of the first user (timers).

    The core code enqueues the objects into hash buckets. The hash index is
    generated from the address of the object to simplify the lookup for the check
    on kfree/vfree. Each bucket has its own spinlock to avoid contention on a
    global lock.
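
    A minimal user-space sketch of address-hashed buckets with per-bucket
    locks follows. It makes several assumptions that are not taken from
    the patch: the table size (HASH_BITS), the 4 KiB chunk granularity
    (CHUNK_SHIFT), a plain mask instead of a real hash function, and all
    function names. Pthread mutexes stand in for spinlocks.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS   6                   /* hypothetical: 64 buckets */
#define HASH_SIZE   (1u << HASH_BITS)
#define CHUNK_SHIFT 12                  /* hypothetical: 4 KiB chunks */

struct entry {
    struct entry *next;
    const void *addr;                   /* address of the tracked object */
};

struct bucket {
    pthread_mutex_t lock;               /* one lock per bucket */
    struct entry *head;
};

static struct bucket table[HASH_SIZE];

static void table_init(void)
{
    for (unsigned int i = 0; i < HASH_SIZE; i++) {
        pthread_mutex_init(&table[i].lock, NULL);
        table[i].head = NULL;
    }
}

/* All addresses within one chunk map to the same bucket. */
static unsigned int bucket_index(const void *addr)
{
    return (unsigned int)(((uintptr_t)addr >> CHUNK_SHIFT) & (HASH_SIZE - 1));
}

static void table_insert(struct entry *e)
{
    struct bucket *b = &table[bucket_index(e->addr)];

    pthread_mutex_lock(&b->lock);
    e->next = b->head;
    b->head = e;
    pthread_mutex_unlock(&b->lock);
}

/*
 * Free-time check: count tracked objects inside [start, start + len).
 * Only the buckets of the chunks covered by the range are visited.
 */
static int table_count_in_range(const void *start, size_t len)
{
    uintptr_t lo = (uintptr_t)start, hi = lo + len;
    uintptr_t chunk;
    int found = 0;

    if (len == 0)
        return 0;

    for (chunk = lo >> CHUNK_SHIFT; chunk <= (hi - 1) >> CHUNK_SHIFT; chunk++) {
        struct bucket *b = &table[chunk & (HASH_SIZE - 1)];
        struct entry *e;

        pthread_mutex_lock(&b->lock);
        for (e = b->head; e; e = e->next) {
            uintptr_t a = (uintptr_t)e->addr;

            /* The chunk comparison avoids double counting on wraparound. */
            if (a >= lo && a < hi && (a >> CHUNK_SHIFT) == chunk)
                found++;
        }
        pthread_mutex_unlock(&b->lock);
    }
    return found;
}
```

    Deriving the index from the address means a freed memory range can be
    checked by walking only the buckets for the chunks it covers, each
    under its own lock.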

    The debug code can be compiled in without being active. The runtime overhead
    is minimal and could be optimized by asm alternatives. A kernel command line
    option enables the debugging code.

    Thanks to Ingo Molnar for review, suggestions and cleanup patches.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: Greg KH
    Cc: Randy Dunlap
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner