20 Dec, 2011

1 commit

  • When printing the code bytes in show_registers(), the markers around the
    byte at the fault address could make the printk() format string look
    like a valid log level and facility code. This would prevent this byte
    from being printed and result in a spurious newline:

    [ 7555.765589] Code: 8b 32 e9 94 00 00 00 81 7d 00 ff 00 00 00 0f 87 96 00 00 00 48 8b 83 c0 00 00 00 44 89 e2 44 89 e6 48 89 df 48 8b 80 d8 02 00 00
    [ 7555.765683] 8b 48 28 48 89 d0 81 e2 ff 0f 00 00 48 c1 e8 0c 48 c1 e0 04

    Add KERN_CONT where needed, and elsewhere in show_registers() for
    consistency.
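
    To see why the "<%02x> " marker could be swallowed, here is a minimal
    user-space sketch. It assumes a hypothetical logger that, like printk
    at the time, accepts a leading "<digits>" as a combined level and
    facility prefix. level_prefix_len() and format_fault_byte() are
    illustrative names, not kernel functions. A fault byte whose hex form
    is all decimal digits, such as 0x48, yields a string that passes the
    check. 0x8b does not.

    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <stddef.h>
    #include <stdio.h>

    /*
     * Hypothetical logger rule (not the real printk parser): a message that
     * starts with '<', one or more decimal digits and '>' is taken to carry
     * a level/facility prefix, which is stripped instead of printed.
     * Returns the length of that prefix, or 0 if there is none.
     */
    static size_t level_prefix_len(const char *s)
    {
        size_t i = 1;

        if (s[0] != '<' || !isdigit((unsigned char)s[1]))
            return 0;
        while (isdigit((unsigned char)s[i]))
            i++;
        return s[i] == '>' ? i + 1 : 0;
    }

    /* Format the faulting code byte with markers, in the show_registers() style. */
    static void format_fault_byte(char *buf, size_t len, unsigned char c)
    {
        snprintf(buf, len, "<%02x> ", c);
    }
    ```

    Per the commit message, marking these printk() calls with KERN_CONT
    keeps the kernel from taking such a marker for a prefix.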

    Signed-off-by: Clemens Ladisch
    Link: http://lkml.kernel.org/r/4EEFA7AE.9020407@ladisch.de
    Signed-off-by: H. Peter Anvin

    Clemens Ladisch
     

03 Jul, 2011

2 commits

  • rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
    in order to restore it later in ret_from_intr.

    It is convenient because we save its value in the irq regs
    and it's easily restored using the leave instruction.

    However, this is a kind of abuse of the frame pointer, whose
    role is to help unwind the kernel by chaining frames
    together, each node following the return address to the
    previous frame.

    But since we break the frame by switching the stack
    pointer, there is no return address preceding the new
    frame. Hence using the frame pointer to link the two stacks
    breaks the stack unwinders, which find a random value instead
    of a return address here.

    There is no workaround that works in every case. We use
    the fixup_bp_irq_link() function to dereference that abused frame
    pointer in the case of a non-nesting interrupt (which means the
    stack changed).
    But that doesn't fix the case of interrupts that don't change the
    stack (where we still have the unconditional frame link), which is
    the case of a hardirq interrupting a softirq. We have no way to
    detect this transition, so the irq frame link is treated as a real
    frame pointer and the return address next to it is dereferenced,
    even though it is a spurious one.

    There are two possible results: either the spurious return
    address, a random stack value, happens to belong to the kernel
    text, in which case unwinding continues and we just get a weird
    entry in the stack trace; or it doesn't belong to the kernel text
    and unwinding stops there.

    This is the reason why stacktraces (including perf callchains) on
    irqs that interrupted softirqs don't work very well.
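
    The unwinding failure described above can be modelled in user space.
    The sketch below is a simplified frame-pointer walker, not the
    kernel's dump_trace(). An array stands in for memory, indices stand in
    for addresses, and TEXT_START/TEXT_END are a hypothetical kernel text
    range. In linked_stack, the second frame's "return address" slot holds
    a random stack value outside the text range, so the walk stops early,
    which is the second outcome listed above.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical kernel text range, for this simulation only. */
    #define TEXT_START 0x1000UL
    #define TEXT_END   0x2000UL

    /*
     * Walk a frame-pointer chain in simulated memory. A frame at index bp
     * holds the caller's bp in mem[bp] and its return address in
     * mem[bp + 1] (the layout of struct stack_frame). bp == 0 ends the
     * chain. The walk stops at the first return address outside the text
     * range. Returns the number of return addresses stored in out.
     */
    static size_t walk_frames(const unsigned long *mem, size_t mem_len,
                              size_t bp, unsigned long *out, size_t max)
    {
        size_t n = 0;

        while (bp != 0 && bp + 1 < mem_len && n < max) {
            unsigned long ret = mem[bp + 1];

            if (ret < TEXT_START || ret >= TEXT_END)
                break;                  /* not kernel text: unwinding stops */
            out[n++] = ret;
            bp = (size_t)mem[bp];       /* follow the saved frame pointer */
        }
        return n;
    }

    /* Healthy chain: frame@2 -> frame@5 -> end (saved bp 0). */
    static const unsigned long good_stack[8] = {
        0, 0, 5, 0x1100, 0, 0, 0x1200, 0
    };

    /* Broken link: frame@5's "return address" is a random stack value. */
    static const unsigned long linked_stack[8] = {
        0, 0, 5, 0x1100, 0, 0, 0xdead0000UL, 0
    };
    ```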

    To solve this, we no longer save the old stack pointer in rbp;
    instead we save it in a scratch register that we push onto the new
    stack and pop back on irq return.

    This preserves the whole frame chain without spurious return addresses
    in the middle and drops the need for the horrid fixup_bp_irq_link()
    workaround.

    And finally, irqs that interrupt softirqs are unwound sanely.

    Before:

    99.81% perf [kernel.kallsyms] [k] perf_pending_event
    |
    --- perf_pending_event
    irq_work_run
    smp_irq_work_interrupt
    irq_work_interrupt
    |
    |--41.60%-- __read
    | |
    | |--99.90%-- create_worker
    | | bench_sched_messaging
    | | cmd_bench
    | | run_builtin
    | | main
    | | __libc_start_main
    | --0.10%-- [...]

    After:

    1.64% swapper [kernel.kallsyms] [k] perf_pending_event
    |
    --- perf_pending_event
    irq_work_run
    smp_irq_work_interrupt
    irq_work_interrupt
    |
    |--95.00%-- arch_irq_work_raise
    | irq_work_queue
    | __perf_event_overflow
    | perf_swevent_overflow
    | perf_swevent_event
    | perf_tp_event
    | perf_trace_softirq
    | __do_softirq
    | call_softirq
    | do_softirq
    | irq_exit
    | |
    | |--73.68%-- smp_apic_timer_interrupt
    | | apic_timer_interrupt
    | | |
    | | |--96.43%-- amd_e400_idle
    | | | cpu_idle
    | | | start_secondary

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Jan Beulich

    Frederic Weisbecker
     
  • When regs are passed to dump_stack(), we fetch the frame
    pointer from the regs but the stack pointer is taken from
    the current frame.

    Thus the frame and stack pointers may not come from the same
    context. For example, this can make the unwinder think that
    the context is in irq, due to the current value of the stack,
    while the frame pointer taken from the regs points to a frame
    somewhere else. It then tries to fix up the irq link but ends
    up dereferencing a random frame pointer that doesn't belong
    to the irq stack:

    [ 9131.706906] ------------[ cut here ]------------
    [ 9131.707003] WARNING: at arch/x86/kernel/dumpstack_64.c:129 dump_trace+0x2aa/0x330()
    [ 9131.707003] Hardware name: AMD690VM-FMH
    [ 9131.707003] Perf: bad frame pointer = 0000000000000005 in callchain
    [ 9131.707003] Modules linked in:
    [ 9131.707003] Pid: 1050, comm: perf Not tainted 3.0.0-rc3+ #181
    [ 9131.707003] Call Trace:
    [ 9131.707003] [] warn_slowpath_common+0x7a/0xb0
    [ 9131.707003] [] warn_slowpath_fmt+0x41/0x50
    [ 9131.707003] [] ? bad_to_user+0x6d/0x10be
    [ 9131.707003] [] dump_trace+0x2aa/0x330
    [ 9131.707003] [] ? native_sched_clock+0x13/0x50
    [ 9131.707003] [] perf_callchain_kernel+0x54/0x70
    [ 9131.707003] [] perf_prepare_sample+0x19f/0x2a0
    [ 9131.707003] [] __perf_event_overflow+0x16c/0x290
    [ 9131.707003] [] ? __perf_event_overflow+0x130/0x290
    [ 9131.707003] [] ? native_sched_clock+0x13/0x50
    [ 9131.707003] [] ? sched_clock+0x9/0x10
    [ 9131.707003] [] ? T.375+0x15/0x90
    [ 9131.707003] [] ? trace_hardirqs_on_caller+0x64/0x180
    [ 9131.707003] [] ? trace_hardirqs_off+0xd/0x10
    [ 9131.707003] [] perf_event_overflow+0x14/0x20
    [ 9131.707003] [] perf_swevent_hrtimer+0x11c/0x130
    [ 9131.707003] [] ? error_exit+0x51/0xb0
    [ 9131.707003] [] __run_hrtimer+0x83/0x1e0
    [ 9131.707003] [] ? perf_event_overflow+0x20/0x20
    [ 9131.707003] [] hrtimer_interrupt+0x106/0x250
    [ 9131.707003] [] ? trace_hardirqs_off_thunk+0x3a/0x3c
    [ 9131.707003] [] smp_apic_timer_interrupt+0x53/0x90
    [ 9131.707003] [] apic_timer_interrupt+0x13/0x20
    [ 9131.707003] [] ? error_exit+0x51/0xb0
    [ 9131.707003] [] ? error_exit+0x4c/0xb0
    [ 9131.707003] ---[ end trace b2560d4876709347 ]---

    Fix this by simply taking the stack pointer from regs->sp
    when regs are provided.
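
    A minimal sketch of the rule this fix follows, assuming hypothetical
    types sim_regs and sim_ctx in place of struct pt_regs: when a register
    snapshot is given, take both the stack pointer and the frame pointer
    from it so that they describe one and the same context.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical, trimmed-down register snapshot. */
    struct sim_regs {
        unsigned long sp;
        unsigned long bp;
    };

    /* The pair of pointers an unwinder starts from. */
    struct sim_ctx {
        unsigned long sp;
        unsigned long bp;
    };

    /*
     * With regs, both values come from the snapshot; without regs, both
     * come from the live context (cur_sp/cur_bp). Never mix the two.
     */
    static struct sim_ctx pick_unwind_ctx(const struct sim_regs *regs,
                                          unsigned long cur_sp,
                                          unsigned long cur_bp)
    {
        struct sim_ctx ctx;

        if (regs) {
            ctx.sp = regs->sp;
            ctx.bp = regs->bp;
        } else {
            ctx.sp = cur_sp;
            ctx.bp = cur_bp;
        }
        return ctx;
    }
    ```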

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     

18 Mar, 2011

1 commit

    The current stack dump code scans the entire stack and checks whether
    each entry contains a pointer to kernel code. If CONFIG_FRAME_POINTER=y,
    it can mark whether the pointer is valid or not based on the value of
    the frame pointer. Invalid entries are preceded by a '?' sign.

    However, this never happened because the scan start point was
    always higher than the frame pointer, so the two could never
    meet.

    Commit 9c0729dc8062 ("x86: Eliminate bp argument from the stack
    tracing routines") delayed the point at which bp is read, so bp
    was taken from a lower frame and, as a result, all of the entries
    were marked invalid.
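
    The '?' marking logic can be sketched as follows. This is a user-space
    model in the spirit of print_context_stack(), with a hypothetical text
    range and array indices in place of addresses. A text address counts
    as valid only when it sits right above the frame that bp currently
    points to, after which bp moves on to the saved frame. Starting from a
    bp that lies below the scan start (index 0 in demo_stack) reproduces
    the bug: no slot ever matches, so every entry is marked invalid.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical kernel text range, for this simulation only. */
    #define TEXT_START 0x1000UL
    #define TEXT_END   0x2000UL

    static int is_text(unsigned long v)
    {
        return v >= TEXT_START && v < TEXT_END;
    }

    /*
     * Scan mem[start..end) for text addresses. A frame at index bp holds
     * the saved bp in mem[bp] and the return address in mem[bp + 1].
     * marks[k] is set to 1 for a valid entry (on the frame chain) or 0 for
     * one that would be printed with '?'. Returns the number of text
     * addresses found.
     */
    static size_t scan_stack(const unsigned long *mem, size_t start,
                             size_t end, size_t bp, int *marks)
    {
        size_t n = 0;

        for (size_t i = start; i < end; i++) {
            if (!is_text(mem[i]))
                continue;
            if (i == bp + 1) {
                marks[n++] = 1;
                bp = (size_t)mem[bp];   /* next frame up the chain */
            } else {
                marks[n++] = 0;         /* text, but not on the chain */
            }
        }
        return n;
    }

    /*
     * Frames at 2 -> 5 -> 8, plus a stale text value at 4. Slots 0-1 model
     * a lower frame that lies below the scan start.
     */
    static const unsigned long demo_stack[10] = {
        2, 0x1050, 5, 0x1100, 0x1500, 8, 0x1200, 0x42, 0, 0x1300
    };
    ```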

    This patch fixes this by reverting the above commit while retaining
    the stack_frame() helper, as suggested by Frederic Weisbecker.

    The end result looks like this:

    before:

    [ 3.508329] Call Trace:
    [ 3.508551] [] ? panic+0x91/0x199
    [ 3.508662] [] ? printk+0x68/0x6a
    [ 3.508770] [] ? mount_block_root+0x257/0x26e
    [ 3.508876] [] ? mount_root+0x56/0x5a
    [ 3.508975] [] ? prepare_namespace+0x170/0x1a9
    [ 3.509216] [] ? kernel_init+0x1d2/0x1e2
    [ 3.509335] [] ? kernel_thread_helper+0x4/0x10
    [ 3.509442] [] ? restore_args+0x0/0x30
    [ 3.509542] [] ? kernel_init+0x0/0x1e2
    [ 3.509641] [] ? kernel_thread_helper+0x0/0x10

    after:

    [ 3.522991] Call Trace:
    [ 3.523351] [] panic+0x91/0x199
    [ 3.523468] [] ? printk+0x68/0x6a
    [ 3.523576] [] mount_block_root+0x257/0x26e
    [ 3.523681] [] mount_root+0x56/0x5a
    [ 3.523780] [] prepare_namespace+0x170/0x1a9
    [ 3.523885] [] kernel_init+0x1d2/0x1e2
    [ 3.523987] [] kernel_thread_helper+0x4/0x10
    [ 3.524228] [] ? restore_args+0x0/0x30
    [ 3.524345] [] ? kernel_init+0x0/0x1e2
    [ 3.524445] [] ? kernel_thread_helper+0x0/0x10

    -v5:
    * fix build breakage with oprofile

    -v4:
    * use 0 instead of regs->bp
    * separate out printk changes

    -v3:
    * apply comment from Frederic
    * add a couple of printk fixes

    Signed-off-by: Namhyung Kim
    Acked-by: Peter Zijlstra
    Acked-by: Frederic Weisbecker
    Cc: Soren Sandmann
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Robert Richter
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Namhyung Kim
     

25 Jan, 2011

1 commit

  • In arch/x86/kernel/dumpstack_64.c::dump_trace() we have this code:

    ...
    if (!stack) {
        unsigned long dummy;
        stack = &dummy;
        if (task && task != current)
            stack = (unsigned long *)task->thread.sp;
    }

    bp = stack_frame(task, regs);
    /*
     * Print function call entries in all stacks, starting at the
     * current stack address. If the stacks consist of nested
     * exceptions
     */
    tinfo = task_thread_info(task);

    for (;;) {
        char *id;
        unsigned long *estack_end;
        estack_end = in_exception_stack(cpu, (unsigned long)stack,
                                        &used, &id);
    ...

    You'll notice that we assign to 'stack' the address of the variable
    'dummy' which is only in-scope inside the 'if (!stack)'. So when we later
    access stack (at the end of the above, and assuming we did not take the
    'if (task && task != current)' branch) we'll be using the address of a
    variable that is no longer in scope. I believe this patch is the proper
    fix, but I freely admit that I'm not 100% certain.
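
    A reduced sketch of the scoping the message argues for, using a
    hypothetical helper start_scan() rather than the real dump_trace(): the
    fallback variable is declared at function scope, so the pointer to it
    stays valid wherever 'stack' is used later in the function.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /*
     * When no stack pointer is passed, the address of a local stands in
     * for "the current stack position". 'dummy' lives until the function
     * returns, so dereferencing 'stack' below is well defined on both
     * paths. Returns 1 if the fallback was used; stores the first word.
     */
    static int start_scan(const unsigned long *stack, unsigned long *first)
    {
        unsigned long dummy = 0;   /* function scope, not inside the if */
        int used_fallback = 0;

        if (!stack) {
            stack = &dummy;
            used_fallback = 1;
        }
        *first = *stack;           /* dummy is still in scope here */
        return used_fallback;
    }
    ```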

    Signed-off-by: Jesper Juhl
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Jesper Juhl
     

18 Nov, 2010

1 commit

    The various stack tracing routines take a 'bp' argument in which the
    caller is supposed to provide the base pointer to use, or 0 if it
    doesn't have one. Since bp is garbage whenever CONFIG_FRAME_POINTER
    is not defined, all callers should in principle either always pass
    0 or be conditional on CONFIG_FRAME_POINTER.

    However, there are only really three use cases for stack tracing:

    (a) Trace the current task, including IRQ stack if any
    (b) Trace the current task, but skip IRQ stack
    (c) Trace some other task

    In all cases, if CONFIG_FRAME_POINTER is not defined, bp should just
    be 0. If it _is_ defined, then

    - in case (a) bp should be read directly from the CPU's register, so
    the caller should pass NULL for regs,

    - in case (b) the caller should pass the IRQ registers to
    dump_trace(),

    - in case (c) bp should be read from the top of the task's stack, so
    the caller should pass NULL for regs.

    Hence, the bp argument is not necessary because the combination of
    task and regs is sufficient to determine an appropriate value for bp.

    This patch introduces a new inline function stack_frame(task, regs)
    that computes the desired bp. This function is then called from the
    two versions of dump_stack().
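
    The selection performed by stack_frame(task, regs) can be sketched in
    user space as below. sim_task, sim_regs, SIM_FRAME_POINTER and the
    cur_task/cur_bp parameters are hypothetical stand-ins for task_struct,
    pt_regs, CONFIG_FRAME_POINTER, current and the live bp register. This
    illustrates the three cases above; it is not the kernel's inline
    function.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical stand-ins for task_struct and pt_regs. */
    struct sim_task {
        const unsigned long *saved_sp;  /* top of a switched-out task's stack */
    };

    struct sim_regs {
        unsigned long bp;
    };

    #define SIM_FRAME_POINTER 1         /* models CONFIG_FRAME_POINTER=y */

    static unsigned long sim_stack_frame(const struct sim_task *task,
                                         const struct sim_regs *regs,
                                         const struct sim_task *cur_task,
                                         unsigned long cur_bp)
    {
    #if SIM_FRAME_POINTER
        if (regs)               /* case (b): bp of the interrupted context */
            return regs->bp;
        if (task == cur_task)   /* case (a): bp straight from the CPU */
            return cur_bp;
        return *task->saved_sp; /* case (c): bp saved at the top of its stack */
    #else
        (void)task; (void)regs; (void)cur_task; (void)cur_bp;
        return 0;               /* no frame pointers: bp is meaningless */
    #endif
    }
    ```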

    Signed-off-by: Soren Sandmann
    Acked-by: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Arjan van de Ven
    Cc: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Soeren Sandmann Pedersen
     

24 Oct, 2010

1 commit

  • The stack output currently looks like this:

    7fffffffffffffff 0000000a00000000 ffffffff81093341 0000000000000046
    ffff88003a545fd8 0000000000000000 0000000000000000 00007fffa39769c0
    ffff88003e403f58 ffffffff8102fc4c ffff88003e403f58 ffff88003e403f78

    The superfluous log-level markers are caused by the recent printk
    KERN_CONT change. A bare level marker is now ignored by printk unless
    some text follows the level, and even then it still has to come first
    in the format message.

    Note that the log_lvl parameter is now completely ignored in
    show_stack_log_lvl and the stack is dumped with the default
    level (as has been the case for quite some time already). This
    matches the rest of the dump: function traces are dumped in the
    very same manner. Only Code and maybe some lines are printed with
    EMERG level.

    Unfortunately, I see no conceptual way to print the whole
    oops/BUG/panic output at the same level, so for the time being
    this only removes the superfluous characters.

    Just for illustration:

    Process kworker/0:0 (pid: 0, threadinfo ffff88003c8a6000, task ffff88003c85c100)
    Stack:
    ffffffff818022c0 0000000a00000001 0000000000000001 0000000000000046
    ffff88003c8a7fd8 0000000000000001 ffff88003c8a7e58 0000000000000000
    ffff88003e503f48 ffffffff8102fc4c ffff88003e503f48 ffff88003e503f68
    Call Trace:

    [] ? call_softirq+0x1c/0x30 ...
    Code: 00 01 00 00 65 8b 04 25 80 c5 00 00 c7 45 ...

    Signed-off-by: Jiri Slaby
    Cc: jirislaby@gmail.com
    Cc: Linus Torvalds
    Cc: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jiri Slaby
     

09 Jun, 2010

1 commit

    arch/x86/include/asm/stacktrace.h and arch/x86/kernel/dumpstack.h
    declare objects that deal with the same topic. In fact, most of
    the files that include stacktrace.h also include dumpstack.h.

    Although dumpstack.h seems more reserved for the internals of stack
    traces, those are quite often needed to define specialized stack
    trace operations, and perf event arch headers are going to need
    access to such low-level operations anyway. So there is no point
    in keeping dumpstack.h separate, as it is no longer about isolated
    deep internals.

    v2: fix struct stack_frame definition conflict in sysprof

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: Soeren Sandmann

    Frederic Weisbecker
     

10 Mar, 2010

2 commits


04 Mar, 2010

1 commit


03 Mar, 2010

1 commit

    Callers of a stacktrace might pass bad frame pointers. Those
    are usually checked for safety in stack walking helpers before
    any dereferencing, but this is not the case when we need to go
    through one more frame pointer that backlinks the irq stack to
    the previous one, as we don't have any reliable address boundaries
    to compare this frame pointer against.

    This causes crashes when we record callchains for ftrace events
    with perf, because we don't use the right helpers to capture
    registers there. We get wrong frame pointers as we call
    task_pt_regs() even on kernel threads, which is wrong as it
    gives us the initial state of any freshly created kernel thread.
    It is not even what we want for user tasks. What we want
    is a hot snapshot of registers when the ftrace event triggers, not
    the state before a task entered the kernel.

    Doing this correctly requires more thought, though.
    So first add a guard to ensure the given frame pointer
    can be dereferenced, to avoid crashes. We'll think about how to fix
    the callers in a subsequent patch.
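
    A user-space sketch of the guard, assuming an array as memory and a
    hypothetical safe_read() in place of a fault-tolerant kernel read: the
    irq back link is followed only if its slot can be read; otherwise bp is
    returned unchanged so the walk can stop instead of crashing.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /*
     * Hypothetical fault-safe read: an out-of-range index plays the role
     * of an unmapped address. Returns 0 on success, -1 on failure.
     */
    static int safe_read(const unsigned long *mem, size_t len, size_t idx,
                         unsigned long *out)
    {
        if (idx >= len)
            return -1;
        *out = mem[idx];
        return 0;
    }

    /* Follow the irq-stack back link only if it can be dereferenced. */
    static size_t follow_irq_link(const unsigned long *mem, size_t len,
                                  size_t bp)
    {
        unsigned long next;

        if (safe_read(mem, len, bp, &next) == 0)
            return (size_t)next;
        return bp;                  /* bad pointer: keep bp, do not crash */
    }
    ```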

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: 2.6.33.x
    Cc: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     

01 Mar, 2010

1 commit

  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (172 commits)
    perf_event, amd: Fix spinlock initialization
    perf_event: Fix preempt warning in perf_clock()
    perf tools: Flush maps on COMM events
    perf_events, x86: Split PMU definitions into separate files
    perf annotate: Handle samples not at objdump output addr boundaries
    perf_events, x86: Remove superflous MSR writes
    perf_events: Simplify code by removing cpu argument to hw_perf_group_sched_in()
    perf_events, x86: AMD event scheduling
    perf_events: Add new start/stop PMU callbacks
    perf_events: Report the MMAP pgoff value in bytes
    perf annotate: Defer allocating sym_priv->hist array
    perf symbols: Improve debugging information about symtab origins
    perf top: Use a macro instead of a constant variable
    perf symbols: Check the right return variable
    perf/scripts: Tag syscall_name helper as not yet available
    perf/scripts: Add perf-trace-python Documentation
    perf/scripts: Remove unnecessary PyTuple resizes
    perf/scripts: Add syscall tracing scripts
    perf/scripts: Add Python scripting engine
    perf/scripts: Remove check-perf-trace from listed scripts
    ...

    Fix trivial conflict in tools/perf/util/probe-event.c

    Linus Torvalds
     

04 Feb, 2010

1 commit


13 Jan, 2010

1 commit

    The check that ignores the debug and nmi stack frames is useless
    now that we have a frame pointer that makes us start at the
    right place. We no longer have to deal with these.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

17 Dec, 2009

1 commit

  • The current print_context_stack helper that does the stack
    walking job is good for usual stacktraces as it walks through
    all the stack and reports even addresses that look unreliable,
    which is nice when we don't have frame pointers for example.

    But we have users like perf that only require reliable
    stacktraces, and those may want a better-suited stack walker, so
    let's make this function a callback in stacktrace_ops that users
    can tune for their needs.
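
    Making the walker a callback can be sketched as below. sim_stack_ops,
    walk_all() and walk_reliable() are hypothetical and much simpler than
    the real stacktrace_ops: entries arrive with precomputed reliability
    flags, and each ops table supplies its own walk_stack policy. A
    generic dump keeps everything, while a perf-style user keeps only
    reliable entries.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical, simplified ops table; the real struct differs. */
    struct sim_stack_ops {
        /* Copy the selected entries to out; return how many were copied. */
        size_t (*walk_stack)(const unsigned long *entries, const int *reliable,
                             size_t n, unsigned long *out);
    };

    /* Generic dump policy: report everything, reliable or not. */
    static size_t walk_all(const unsigned long *entries, const int *reliable,
                           size_t n, unsigned long *out)
    {
        (void)reliable;
        for (size_t i = 0; i < n; i++)
            out[i] = entries[i];
        return n;
    }

    /* Perf-style policy: keep only entries known to be reliable. */
    static size_t walk_reliable(const unsigned long *entries,
                                const int *reliable, size_t n,
                                unsigned long *out)
    {
        size_t k = 0;

        for (size_t i = 0; i < n; i++)
            if (reliable[i])
                out[k++] = entries[i];
        return k;
    }

    static const struct sim_stack_ops dump_ops = { walk_all };
    static const struct sim_stack_ops perf_ops = { walk_reliable };
    ```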

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

12 Dec, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (57 commits)
    x86, perf events: Check if we have APIC enabled
    perf_event: Fix variable initialization in other codepaths
    perf kmem: Fix unused argument build warning
    perf symbols: perf_header__read_build_ids() offset'n'size should be u64
    perf symbols: dsos__read_build_ids() should read both user and kernel buildids
    perf tools: Align long options which have no short forms
    perf kmem: Show usage if no option is specified
    sched: Mark sched_clock() as notrace
    perf sched: Add max delay time snapshot
    perf tools: Correct size given to memset
    perf_event: Fix perf_swevent_hrtimer() variable initialization
    perf sched: Fix for getting task's execution time
    tracing/kprobes: Fix field creation's bad error handling
    perf_event: Cleanup for cpu_clock_perf_event_update()
    perf_event: Allocate children's perf_event_ctxp at the right time
    perf_event: Clean up __perf_event_init_context()
    hw-breakpoints: Modify breakpoints without unregistering them
    perf probe: Update perf-probe document
    perf probe: Support --del option
    trace-kprobe: Support delete probe syntax
    ...

    Linus Torvalds
     

06 Dec, 2009

1 commit

    When we enter an irq, two things can happen to preserve the link
    to the previous frame pointer:

    - If we were in an irq already, we don't switch to the irq stack
    as we are already on it. We just need to save the previous frame
    pointer and link the new one to it.

    - Otherwise we need another level of indirection. We enter the irq on
    the previous stack. We save the previous bp there and make bp
    point to its saved address. Then we switch to the irq stack and
    push bp once more, this time onto the new stack. This makes two
    levels to dereference instead of one.

    In the second case, the current stacktrace code omits the second level
    and loses the frame pointer accuracy. The stack that follows is then
    considered unreliable.
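
    The extra dereference can be modelled with a small simulation. In the
    hypothetical layout below, indices [10, 20) are the irq stack. The last
    irq frame links to the slot on the interrupted stack where the old bp
    was saved (index 3), and the slot next to it holds unrelated data. With
    fixup set, walk() dereferences once more when the chain leaves the irq
    stack and recovers the task frame; without it, the walk stops at the
    data value.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical layout, for this simulation only. */
    #define TEXT_START 0x1000UL
    #define TEXT_END   0x2000UL
    #define IRQ_LO     10       /* irq stack occupies indices [10, 20) */
    #define IRQ_HI     20

    static int in_irq_stack(size_t bp)
    {
        return bp >= IRQ_LO && bp < IRQ_HI;
    }

    /*
     * Walk a chain that starts on the irq stack. A frame at index bp holds
     * the next bp in mem[bp] and a return address in mem[bp + 1]. When the
     * chain first leaves the irq stack and 'fixup' is set, one more
     * dereference skips the slot where the old bp was saved at irq entry.
     */
    static size_t walk(const unsigned long *mem, size_t len, size_t bp,
                       int fixup, unsigned long *out, size_t max)
    {
        size_t n = 0;
        int on_irq = in_irq_stack(bp);

        while (bp != 0 && bp + 1 < len && n < max) {
            unsigned long ret = mem[bp + 1];

            if (ret < TEXT_START || ret >= TEXT_END)
                break;                      /* rest would be unreliable */
            out[n++] = ret;
            bp = (size_t)mem[bp];           /* follow the chain */
            if (on_irq && bp != 0 && !in_irq_stack(bp)) {
                on_irq = 0;                 /* just left the irq stack */
                if (fixup && bp < len)
                    bp = (size_t)mem[bp];   /* second level of indirection */
            }
        }
        return n;
    }

    static const unsigned long demo_mem[20] = {
        [3] = 6,   [4] = 0x55,      /* saved old bp, then unrelated data */
        [6] = 0,   [7] = 0x1300,    /* task frame, end of chain */
        [12] = 15, [13] = 0x1100,   /* irq frame */
        [15] = 3,  [16] = 0x1200,   /* last irq frame, links to saved slot */
    };
    ```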

    Handling that makes the perf callchain happier.
    Before:

    43.94% [k] _raw_read_lock
    |
    --- _read_lock
    |
    |--60.53%-- send_sigio
    | __kill_fasync
    | kill_fasync
    | evdev_pass_event
    | evdev_event
    | input_pass_event
    | input_handle_event
    | input_event
    | synaptics_process_byte
    | psmouse_handle_byte
    | psmouse_interrupt
    | serio_interrupt
    | i8042_interrupt
    | handle_IRQ_event
    | handle_edge_irq
    | handle_irq
    | __irqentry_text_start
    | ret_from_intr
    | |
    | |--30.43%-- __select
    | |
    | |--17.39%-- 0x454f15
    | |
    | |--13.04%-- __read
    | |
    | |--13.04%-- vread_hpet
    | |
    | |--13.04%-- _xcb_lock_io
    | |
    | --13.04%-- 0x7f630878ce8

    After:

    50.00% [k] _raw_read_lock
    |
    --- _read_lock
    |
    |--98.97%-- send_sigio
    | __kill_fasync
    | kill_fasync
    | evdev_pass_event
    | evdev_event
    | input_pass_event
    | input_handle_event
    | input_event
    | |
    | |--96.88%-- synaptics_process_byte
    | | psmouse_handle_byte
    | | psmouse_interrupt
    | | serio_interrupt
    | | i8042_interrupt
    | | handle_IRQ_event
    | | handle_edge_irq
    | | handle_irq
    | | __irqentry_text_start
    | | ret_from_intr
    | | |
    | | |--39.78%-- __const_udelay
    | | | |
    | | | |--91.89%-- ath5k_hw_register_timeout
    | | | | ath5k_hw_noise_floor_calibration
    | | | | ath5k_hw_reset
    | | | | ath5k_reset
    | | | | ath5k_config
    | | | | ieee80211_hw_config
    | | | | |
    | | | | |--88.24%-- ieee80211_scan_work
    | | | | | worker_thread
    | | | | | kthread
    | | | | | child_rip
    | | | | |
    | | | | --11.76%-- ieee80211_scan_completed
    | | | | ieee80211_scan_work
    | | | | worker_thread
    | | | | kthread
    | | | | child_rip
    | | | |
    | | | --8.11%-- ath5k_hw_noise_floor_calibration
    | | | ath5k_hw_reset
    | | | ath5k_reset
    | | | ath5k_config

    Note: this affects not only perf events but also x86-64
    stacktraces, which were considered unreliable once we left
    the irq stack frame.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: "K. Prasad"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"

    Frederic Weisbecker
     

26 Nov, 2009

2 commits

  • This warning:

    [ 847.140022] rb_producer D 0000000000000000 5928 519 2 0x00000000
    [ 847.203627] BUG: using smp_processor_id() in preemptible [00000000] code: khungtaskd/517
    [ 847.207360] caller is show_stack_log_lvl+0x2e/0x241
    [ 847.210364] Pid: 517, comm: khungtaskd Not tainted 2.6.32-rc8-tip+ #13761
    [ 847.213395] Call Trace:
    [ 847.215847] [] debug_smp_processor_id+0x1f0/0x20a
    [ 847.216809] [] show_stack_log_lvl+0x2e/0x241
    [ 847.220027] [] show_stack+0x1c/0x1e
    [ 847.223365] [] sched_show_task+0xe4/0xe9
    [ 847.226694] [] check_hung_task+0x140/0x199
    [ 847.230261] [] check_hung_uninterruptible_tasks+0x1b7/0x20f
    [ 847.233371] [] ? watchdog+0x0/0x50
    [ 847.236683] [] watchdog+0x4e/0x50
    [ 847.240034] [] kthread+0x97/0x9f
    [ 847.243372] [] child_rip+0xa/0x20
    [ 847.246690] [] ? restore_args+0x0/0x30
    [ 847.250019] [] ? _spin_lock+0xe/0x10
    [ 847.253351] [] ? kthread+0x0/0x9f
    [ 847.256833] [] ? child_rip+0x0/0x20

    This happens because, on preempt-RCU, khungtaskd calls show_stack()
    with preemption enabled.

    Make sure we are not preemptible while walking the IRQ and exception
    stacks on 64-bit. (32-bit stack dumping is preemption safe.)

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Make the initialization more readable, plus tidy up a few small
    visual details as well.

    No change in functionality.

    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

24 Sep, 2009

1 commit

  • * remove asm/atomic.h inclusion from linux/utsname.h --
    not needed after kref conversion
    * remove linux/utsname.h inclusion from files which do not need it

    NOTE: it looks like fs/binfmt_elf.c does not need utsname.h; however,
    due to some personality stuff it _is_ needed -- cowardly leave ELF-related
    headers and files alone.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

02 Jul, 2009

1 commit

    Almost every callchain recorded with perf record is filled up
    with the internal perfcounter nmi frames:

    perf_callchain
    perf_counter_overflow
    intel_pmu_handle_irq
    perf_counter_nmi_handler
    notifier_call_chain
    atomic_notifier_call_chain
    notify_die
    do_nmi
    nmi

    We want to ignore these frames as they are not interesting for
    instrumentation. To solve this, we simply ignore every frame
    from nmi context.

    New example of "perf report -s sym -c" after this patch:

    9.59% [k] search_by_key
    4.88%
    search_by_key
    reiserfs_read_locked_inode
    reiserfs_iget
    reiserfs_lookup
    do_lookup
    __link_path_walk
    path_walk
    do_path_lookup
    user_path_at
    vfs_fstatat
    vfs_lstat
    sys_newlstat
    system_call_fastpath
    __lxstat
    0x406fb1

    3.19%
    search_by_key
    search_by_entry_key
    reiserfs_find_entry
    reiserfs_lookup
    do_lookup
    __link_path_walk
    path_walk
    do_path_lookup
    user_path_at
    vfs_fstatat
    vfs_lstat
    sys_newlstat
    system_call_fastpath
    __lxstat
    0x406fb1
    [...]

    For now this patch only solves the problem in x86-64.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

18 Jan, 2009

2 commits

  • Signed-off-by: Brian Gerst
    Signed-off-by: Tejun Heo

    Brian Gerst
     
  • Move the irqstackptr variable from the PDA to per-cpu. Make the
    stacks themselves per-cpu, removing some specific allocation code.
    Add a separate flag (is_boot_cpu) to simplify the per-cpu boot
    adjustments.

    tj: * sprinkle some underbars around.

    * irq_stack_ptr is not used till traps_init(), no reason to
    initialize it early. On SMP, just leaving it NULL till proper
    initialization in setup_per_cpu_areas() works. Dropped
    is_boot_cpu and early irq_stack_ptr initialization.

    * do DECLARE/DEFINE_PER_CPU(char[IRQ_STACK_SIZE], irq_stack)
    instead of (char, irq_stack[IRQ_STACK_SIZE]).

    Signed-off-by: Brian Gerst
    Signed-off-by: Tejun Heo

    Brian Gerst
     

03 Dec, 2008

1 commit

  • Impact: better dumpstack output

    I noticed in my crash dumps and even in the stack tracer that a
    lot of functions listed in the stack trace are simply
    return_to_handler, which is the ftrace function graph tracer's way
    to insert its own call into the return of a function.

    But we lose track of where the actual function was called from.

    This patch adds hooks to the dumpstack mechanism that detect
    this and find the real function to print. Both are printed to
    let the user know that a hook is still in place.
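
    A minimal sketch of the lookup, assuming a hypothetical shadow stack of
    saved return addresses (sim_ret_stack) and a made-up trampoline address
    RETURN_TO_HANDLER. Each time the trampoline appears in an
    innermost-first trace, the next saved address is taken from the top of
    the shadow stack and emitted right after it, so both entries are kept.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical address of the graph tracer's trampoline. */
    #define RETURN_TO_HANDLER 0x1f00UL

    /* Hypothetical shadow stack of original return addresses, newest last. */
    struct sim_ret_stack {
        const unsigned long *ret;
        size_t depth;           /* number of valid entries */
    };

    /*
     * Resolve an innermost-first trace: after each trampoline entry, emit
     * the real return address popped from the top of the shadow stack.
     * Returns the number of addresses written to out.
     */
    static size_t resolve_trace(const unsigned long *trace, size_t n,
                                struct sim_ret_stack rs,
                                unsigned long *out, size_t max)
    {
        size_t k = 0;
        size_t top = rs.depth;

        for (size_t i = 0; i < n && k < max; i++) {
            out[k++] = trace[i];
            if (trace[i] == RETURN_TO_HANDLER && top > 0 && k < max)
                out[k++] = rs.ret[--top];   /* the real caller */
        }
        return k;
    }
    ```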

    This does give a funny side effect in the stack tracer output:

    Depth Size Location (80 entries)
    ----- ---- --------
    0) 4144 48 save_stack_trace+0x2f/0x4d
    1) 4096 128 ftrace_call+0x5/0x2b
    2) 3968 16 mempool_alloc_slab+0x16/0x18
    3) 3952 384 return_to_handler+0x0/0x73
    4) 3568 -240 stack_trace_call+0x11d/0x209
    5) 3808 144 return_to_handler+0x0/0x73
    6) 3664 -128 mempool_alloc+0x4d/0xfe
    7) 3792 128 return_to_handler+0x0/0x73
    8) 3664 -32 scsi_sg_alloc+0x48/0x4a [scsi_mod]

    As you can see, the real functions now show negative sizes. This is
    because they are not found on the stack.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

28 Oct, 2008

1 commit

  • Impact: cleanup

    As promised, now that dumpstack_32 and dumpstack_64 have so many bits
    in common, we should merge the in-sync bits into a common file, to
    prevent them from diverging again.

    This patch removes bits which are common between dumpstack_32.c and
    dumpstack_64.c and places them in a common dumpstack.c which is built
    for both 32 and 64 bit arches.

    Signed-off-by: Neil Horman
    Acked-by: Alexander van Heukelum
    Signed-off-by: Ingo Molnar

     arch/x86/kernel/Makefile       |    2
     arch/x86/kernel/dumpstack.c    |  319 +++++++++++++++++++++++++++++++++++++++++
     arch/x86/kernel/dumpstack.h    |   39 +++++
     arch/x86/kernel/dumpstack_32.c |  294 -------------------------------------
     arch/x86/kernel/dumpstack_64.c |  285 ------------------------------------
     5 files changed, 363 insertions(+), 576 deletions(-)

    Neil Horman
     

22 Oct, 2008

5 commits


17 Oct, 2008

1 commit

    Print the name of the last-accessed sysfs file when we oops, to help track
    down oopses which occur in sysfs store/read handlers, because these oopses
    tend not to leave any trace of the offending code in the stack traces.

    Cc: Kay Sievers
    Cc: Mathieu Desnoyers
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Andrew Morton
     

13 Oct, 2008

7 commits