25 Aug, 2010

1 commit

  • This patch fixes a crash during shutdown reported below. The crash is
    caused by accessing already freed task structs. The fix changes the
    order for registering and unregistering notifier callbacks.

    All notifiers must be initialized before buffers start working. To
    stop buffer synchronization we cancel all workqueues, unregister the
    notifier callback and then flush all buffers. After all of this we
    finally can free all tasks listed.

    This should avoid accessing freed tasks.

    On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:

    > So the initial observation is a spinlock bad magic followed by a crash
    > in the spinlock debug code:
    >
    > [ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136
    > [ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03
    >
    > Backtrace looks like:
    >
    > spin_bug+0x74/0xd4
    > ._raw_spin_lock+0x48/0x184
    > ._spin_lock+0x10/0x24
    > .get_task_mm+0x28/0x8c
    > .sync_buffer+0x1b4/0x598
    > .wq_sync_buffer+0xa0/0xdc
    > .worker_thread+0x1d8/0x2a8
    > .kthread+0xa8/0xb4
    > .kernel_thread+0x54/0x70
    >
    > So we are accessing a freed task struct in the work queue when
    > processing the samples.

    Reported-by: Benjamin Herrenschmidt
    Cc: stable@kernel.org
    Signed-off-by: Robert Richter

    Robert Richter
     

26 Jul, 2010

1 commit


04 May, 2010

1 commit

  • http://lkml.org/lkml/2010/4/27/285

    Protect against dereferencing regs when it's NULL, and
    force a magic number into pc to prevent too deep processing.
    This approach permits the dropped samples to be tallied as
    invalid Instruction Pointer events.

    e.g. output from about 15mins at 10kHz sample rate:
    Nr. samples received: 2565380
    Nr. samples lost invalid pc: 4

    Signed-off-by: Phil Carmody
    Signed-off-by: Robert Richter

    Phil Carmody
     

23 Apr, 2010

3 commits


08 Apr, 2010

1 commit

  • Conflicts:
    include/linux/module.h
    kernel/module.c

    Semantic conflict:
    include/trace/events/module.h

    Merge reason: Resolve the conflict with upstream commit 5fbfb18 ("Fix up
    possibly racy module refcounting")

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

01 Apr, 2010

1 commit

  • Currently, when the ring buffer drops events, it does not record
    the fact that it did so. It does inform the writer that the event
    was dropped by returning a NULL event, but it does not put in any
    place holder where the event was dropped.

    This is not a trivial thing to add because the ring buffer mostly
    runs in overwrite (flight recorder) mode. That is, when the ring
    buffer is full, new data will overwrite old data.

    In a produce/consumer mode, where new data is simply dropped when
    the ring buffer is full, it is trivial to add the placeholder
    for dropped events. When there's more room to write new data, then
    a special event can be added to notify the reader about the dropped
    events.

    But in overwrite mode, any new write can overwrite events. A place
    holder can not be inserted into the ring buffer since there never
    may be room. A reader could also come in at anytime and miss the
    placeholder.

    Luckily, the way the ring buffer works, the read side can find out
    if events were lost or not, and how many events. Everytime a write
    takes place, if it overwrites the header page (the next read) it
    updates a "overrun" variable that keeps track of the number of
    lost events. When a reader swaps out a page from the ring buffer,
    it can record this number, perfom the swap, and then check to
    see if the number changed, and take the diff if it has, which would be
    the number of events dropped. This can be stored by the reader
    and returned to callers of the reader.

    Since the reader page swap will fail if the writer moved the head
    page since the time the reader page set up the swap, this gives room
    to record the overruns without worrying about races. If the reader
    sets up the pages, records the overrun, than performs the swap,
    if the swap succeeds, then the overrun variable has not been
    updated since the setup before the swap.

    For binary readers of the ring buffer, a flag is set in the header
    of each sub page (sub buffer) of the ring buffer. This flag is embedded
    in the size field of the data on the sub buffer, in the 31st bit (the size
    can be 32 or 64 bits depending on the architecture), but only 27
    bits needs to be used for the actual size (less actually).

    We could add a new field in the sub buffer header to also record the
    number of events dropped since the last read, but this will change the
    format of the binary ring buffer a bit too much. Perhaps this change can
    be made if the information on the number of events dropped is considered
    important enough.

    Note, the notification of dropped events is only used by consuming reads
    or peeking at the ring buffer. Iterating over the ring buffer does not
    keep this information because the necessary data is only available when
    a page swap is made, and the iterator does not swap out pages.

    Cc: Robert Richter
    Cc: Andi Kleen
    Cc: Li Zefan
    Cc: Arnaldo Carvalho de Melo
    Cc: "Luis Claudio R. Goncalves"
    Cc: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

03 Mar, 2010

1 commit

  • Oprofile is currently broken on systems running with NOHZ enabled.
    A maximum of 1 tick is accounted via the timer_hook if a cpu sleeps
    for a longer period of time. This does bad things to the percentages
    in the profiler output. To solve this problem convert oprofile to
    use a restarting hrtimer instead of the timer_hook.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Robert Richter

    Martin Schwidefsky
     

15 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
    m68k: rename global variable vmalloc_end to m68k_vmalloc_end
    percpu: add missing per_cpu_ptr_to_phys() definition for UP
    percpu: Fix kdump failure if booted with percpu_alloc=page
    percpu: make misc percpu symbols unique
    percpu: make percpu symbols in ia64 unique
    percpu: make percpu symbols in powerpc unique
    percpu: make percpu symbols in x86 unique
    percpu: make percpu symbols in xen unique
    percpu: make percpu symbols in cpufreq unique
    percpu: make percpu symbols in oprofile unique
    percpu: make percpu symbols in tracer unique
    percpu: make percpu symbols under kernel/ and mm/ unique
    percpu: remove some sparse warnings
    percpu: make alloc_percpu() handle array types
    vmalloc: fix use of non-existent percpu variable in put_cpu_var()
    this_cpu: Use this_cpu_xx in trace_functions_graph.c
    this_cpu: Use this_cpu_xx for ftrace
    this_cpu: Use this_cpu_xx in nmi handling
    this_cpu: Use this_cpu operations in RCU
    this_cpu: Use this_cpu ops for VM statistics
    ...

    Fix up trivial (famous last words) global per-cpu naming conflicts in
    arch/x86/kvm/svm.c
    mm/slab.c

    Linus Torvalds
     

29 Oct, 2009

1 commit

  • This patch updates percpu related symbols in oprofile such that percpu
    symbols are unique and don't clash with local symbols. This serves
    two purposes of decreasing the possibility of global percpu symbol
    collision and allowing dropping per_cpu__ prefix from percpu symbols.

    * drivers/oprofile/cpu_buffer.c: s/cpu_buffer/op_cpu_buffer/

    Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
    which cause name clashes" patch.

    Signed-off-by: Tejun Heo
    Acked-by: Robert Richter
    Cc: Rusty Russell

    Tejun Heo
     

10 Oct, 2009

2 commits

  • A race shouldn't happen since all workqueues or handlers are canceled
    or flushed before the event buffer is freed. A warning is triggered
    now if the buffer is freed too early.

    Also, this patch adds some comments about event buffer protection,
    reworks some code and adds code to clear buffer_pos during alloc and
    free of the event buffer.

    Cc: David Rientjes
    Cc: Stephane Eranian
    Signed-off-by: Robert Richter

    Robert Richter
     
  • Looking at the 2.6.31-rc9 code, it appears there is a race condition
    in the event_buffer cleanup code path (shutdown). This could lead to
    kernel panic as some CPUs may be operating on the event buffer AFTER
    it has been freed. The attached patch solves the problem and makes
    sure CPUs check if the buffer is not NULL before they access it as
    some may have been spinning on the mutex while the buffer was being
    freed.

    The race may happen if the buffer is freed during pending reads. But
    it is not clear why there are races in add_event_entry() since all
    workqueues or handlers are canceled or flushed before the event buffer
    is freed.

    Signed-off-by: David Rientjes
    Signed-off-by: Stephane Eranian
    Signed-off-by: Robert Richter

    David Rientjes
     

24 Sep, 2009

1 commit


22 Sep, 2009

1 commit


20 Jul, 2009

6 commits


14 Jul, 2009

1 commit


10 Jul, 2009

1 commit


13 Jun, 2009

1 commit


12 Jun, 2009

2 commits


11 Jun, 2009

1 commit


07 May, 2009

1 commit

  • The unit of oprofile_cpu_buffer_size is in samples, but was allocated
    in bytes. This led to the allocation of too small cpu buffers. This
    patch recalculates the buffer size in bytes taking also the
    ring_buffer_event header size into account.

    Reported-by: Suravee Suthikulpanit
    Signed-off-by: Robert Richter

    Robert Richter
     

06 Apr, 2009

1 commit

  • * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
    tracing, net: fix net tree and tracing tree merge interaction
    tracing, powerpc: fix powerpc tree and tracing tree interaction
    ring-buffer: do not remove reader page from list on ring buffer free
    function-graph: allow unregistering twice
    trace: make argument 'mem' of trace_seq_putmem() const
    tracing: add missing 'extern' keywords to trace_output.h
    tracing: provide trace_seq_reserve()
    blktrace: print out BLK_TN_MESSAGE properly
    blktrace: extract duplidate code
    blktrace: fix memory leak when freeing struct blk_io_trace
    blktrace: fix blk_probes_ref chaos
    blktrace: make classic output more classic
    blktrace: fix off-by-one bug
    blktrace: fix the original blktrace
    blktrace: fix a race when creating blk_tree_root in debugfs
    blktrace: fix timestamp in binary output
    tracing, Text Edit Lock: cleanup
    tracing: filter fix for TRACE_EVENT_FORMAT events
    ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
    x86: kretprobe-booster interrupt emulation code fix
    ...

    Fix up trivial conflicts in
    arch/parisc/include/asm/ftrace.h
    include/linux/memory.h
    kernel/extable.c
    kernel/module.c

    Linus Torvalds
     

02 Apr, 2009

1 commit


30 Mar, 2009

1 commit


11 Mar, 2009

1 commit


06 Mar, 2009

1 commit


06 Feb, 2009

1 commit


27 Jan, 2009

1 commit


22 Jan, 2009

1 commit


18 Jan, 2009

1 commit


12 Jan, 2009

1 commit

  • Impact: use new cpumask API.

    Convert misc driver functions to use struct cpumask.

    To Do:
    - Convert iucv_buffer_cpumask to cpumask_var_t.

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis
    Acked-by: Dean Nelson
    Cc: Robert Richter
    Cc: oprofile-list@lists.sf.net
    Cc: Jeremy Fitzhardinge
    Cc: Chris Wright
    Cc: virtualization@lists.osdl.org
    Cc: xen-devel@lists.xensource.com
    Cc: Ursula Braun
    Cc: linux390@de.ibm.com
    Cc: linux-s390@vger.kernel.org

    Rusty Russell
     

10 Jan, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile: (31 commits)
    powerpc/oprofile: fix whitespaces in op_model_cell.c
    powerpc/oprofile: IBM CELL: add SPU event profiling support
    powerpc/oprofile: fix cell/pr_util.h
    powerpc/oprofile: IBM CELL: cleanup and restructuring
    oprofile: make new cpu buffer functions part of the api
    oprofile: remove #ifdef CONFIG_OPROFILE_IBS in non-ibs code
    ring_buffer: fix ring_buffer_event_length()
    oprofile: use new data sample format for ibs
    oprofile: add op_cpu_buffer_get_data()
    oprofile: add op_cpu_buffer_add_data()
    oprofile: rework implementation of cpu buffer events
    oprofile: modify op_cpu_buffer_read_entry()
    oprofile: add op_cpu_buffer_write_reserve()
    oprofile: rename variables in add_ibs_begin()
    oprofile: rename add_sample() in cpu_buffer.c
    oprofile: rename variable ibs_allowed to has_ibs in op_model_amd.c
    oprofile: making add_sample_entry() inline
    oprofile: remove backtrace code for ibs
    oprofile: remove unused ibs macro
    oprofile: remove unused components in struct oprofile_cpu_buffer
    ...

    Linus Torvalds