06 Apr, 2009

40 commits

  • Impact: build fix

    mutex_lock() was defined inline in kernel/mutex.c, but wasn't
    declared inline in its header. This didn't cause a problem until
    checkin 3a2d367d9aabac486ac4444c6c7ec7a1dab16267 added the
    atomic_dec_and_mutex_lock() inline in between declaration and
    definition.

    This broke building with CONFIG_ALLOW_WARNINGS=n, e.g. make
    allnoconfig.

    Neither in the source code nor in the allnoconfig binary output can I
    find any internal references to mutex_lock() in kernel/mutex.c, so
    presumably this "inline" is now-useless legacy.

    Cc: Eric Paris
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Orig-LKML-Reference:
    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     
  • Much like the atomic_dec_and_lock() function, in which we take and hold a
    spin_lock if we drop the atomic to 0, this function takes and holds the
    mutex if we dec the atomic to 0.

    Signed-off-by: Eric Paris
    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Paris
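
The semantics described above can be sketched in userspace with C11 atomics and a pthread mutex. This is only an illustration under assumptions: the names dec_unless_one() and dec_and_mutex_lock() are hypothetical, and the kernel function itself operates on atomic_t and struct mutex.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Decrement *cnt unless it currently equals 1.
 * Returns true if the decrement happened. */
static bool dec_unless_one(atomic_int *cnt)
{
    int old = atomic_load(cnt);

    while (old != 1) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return true;
    }
    return false;
}

/* Returns true with *lock held if the counter dropped to 0.
 * Returns false, with the lock not held, otherwise. */
static bool dec_and_mutex_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
    /* Fast path: the counter cannot reach 0, so no lock is needed. */
    if (dec_unless_one(cnt))
        return false;

    /* Slow path: we might hit 0, so take the lock before decrementing. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(cnt, 1) != 1) {
        /* Someone raised the counter in the meantime; we did not hit 0. */
        pthread_mutex_unlock(lock);
        return false;
    }
    return true;
}
```

A caller that gets true back owns the mutex and is responsible for unlocking it, typically after tearing down whatever the counter was protecting.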
     
  • Impact: new feature giving performance improvement

    This adds the ability for userspace to do an mmap on a hardware counter
    fd and get access to a read-only page that contains the information
    needed to translate a hardware counter value to the full 64-bit
    counter value that would be returned by a read on the fd. This is
    useful on architectures that allow user programs to read the hardware
    counters, such as PowerPC.

    The mmap will only succeed if the counter is a hardware counter
    monitoring the current process.

    On my quad 2.5GHz PowerPC 970MP machine, userspace can read a counter
    and translate it to the full 64-bit value in about 30ns using the
    mmapped page, compared to about 830ns for the read syscall on the
    counter, so this does give a significant performance improvement.

    Signed-off-by: Paul Mackerras
    Signed-off-by: Peter Zijlstra
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
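
A rough sketch of how userspace might use such a page follows. Everything specific here is an assumption made for illustration: the struct counter_page layout, the sequence-count retry loop, and the fake_hw_read() stand-in for the architecture's counter-read instruction are hypothetical and are not taken from the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout of the read-only page; not the kernel's definition. */
struct counter_page {
    volatile uint32_t lock;  /* sequence count, assumed to change on each update */
    uint32_t index;          /* which hardware counter to read */
    int64_t offset;          /* added to the raw reading to get the 64-bit value */
};

/* Stand-in for the instruction that reads a hardware counter from userspace. */
typedef uint64_t (*read_hw_fn)(uint32_t index);

static uint64_t fake_hw_read(uint32_t index)
{
    return 1000u + index;
}

/* Combine the raw hardware reading with the offset published in the page,
 * retrying if the page changed while it was being read. */
static uint64_t read_full_counter(const struct counter_page *pg, read_hw_fn read_hw)
{
    uint32_t seq;
    uint64_t value;

    do {
        seq = pg->lock;
        atomic_thread_fence(memory_order_seq_cst);
        value = (uint64_t)pg->offset + read_hw(pg->index);
        atomic_thread_fence(memory_order_seq_cst);
    } while (seq != pg->lock);

    return value;
}
```

The point of the design is that the loop contains no system call, which is where the reported difference between roughly 30ns and 830ns comes from.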
     
  • Tracepoint events like lock_acquire and software counters like
    pagefaults can recurse into the perf counter code again, avoid that.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
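
One common way to avoid this kind of re-entry is a guard flag that is set while an event is being processed. The sketch below is a userspace illustration only: a thread-local flag stands in for per-CPU state, and counter_event() and handle_event() are hypothetical names, not the functions touched by the patch.

```c
#include <stdbool.h>

static _Thread_local bool in_event; /* stands in for a per-CPU flag */
static int handled;                 /* events actually processed */
static int dropped;                 /* recursive events that were ignored */

static void handle_event(int depth);

/* Hypothetical entry point called for every event. */
static void counter_event(int depth)
{
    if (in_event) {
        /* We are already inside the counter code: ignore the nested event. */
        dropped++;
        return;
    }
    in_event = true;
    handle_event(depth);
    in_event = false;
}

/* Simulates processing that itself triggers another event, the way taking
 * a lock with a tracepoint or touching a faulting page would. */
static void handle_event(int depth)
{
    handled++;
    if (depth > 0)
        counter_event(depth - 1);
}
```

With the guard in place, a call such as counter_event(3) processes one event and drops the single nested attempt instead of recursing.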
     
  • Since the bitfields turned into a bit of a mess, remove them and rely on
    good old masks.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
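
As an illustration of the mask-based approach, the sketch below packs a raw flag, a type and an event id into one 64-bit word. The bit positions and the EV_* and ev_*() names are assumptions chosen for the example, not the values used by the patch.

```c
#include <stdint.h>

/* Hypothetical layout: bit 63 = raw flag, bits 56..62 = type, bits 0..55 = id. */
#define EV_TYPE_SHIFT 56
#define EV_RAW_MASK   (UINT64_C(1) << 63)
#define EV_TYPE_MASK  (UINT64_C(0x7f) << EV_TYPE_SHIFT)
#define EV_ID_MASK    ((UINT64_C(1) << EV_TYPE_SHIFT) - 1)

/* Build a config word from a type and an event id. */
static uint64_t ev_pack(unsigned type, uint64_t id)
{
    return (((uint64_t)type << EV_TYPE_SHIFT) & EV_TYPE_MASK) | (id & EV_ID_MASK);
}

static unsigned ev_type(uint64_t config)
{
    return (unsigned)((config & EV_TYPE_MASK) >> EV_TYPE_SHIFT);
}

static uint64_t ev_id(uint64_t config)
{
    return config & EV_ID_MASK;
}

static int ev_is_raw(uint64_t config)
{
    return (config & EV_RAW_MASK) != 0;
}
```

Unlike bitfields, the shifts and masks give the same bit positions on every compiler and endianness, which matters for a structure shared between user and kernel space.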
     
  • Signed-off-by: Wu Fengguang
    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Signed-off-by: Ingo Molnar

    Wu Fengguang
     
  • On my system, it takes kerneltop dozens of minutes to
    show usable numbers. Making the default count 100 times
    smaller fixes this long startup latency.

    I'm not sure if it's the right solution though.

    Signed-off-by: Wu Fengguang
    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Signed-off-by: Ingo Molnar

    Wu Fengguang
     
  • Signed-off-by: Wu Fengguang
    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Signed-off-by: Ingo Molnar

    Wu Fengguang
     
  • Signed-off-by: Wu Fengguang
    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Signed-off-by: Ingo Molnar

    Wu Fengguang
     
  • - perfstat.c can be safely removed now
    - perfstat: -s => -a for system wide accounting
    - kerneltop: add -S/--stat for perfstat mode
    - minor adjustments to kerneltop --help, perfstat --help

    Signed-off-by: Wu Fengguang
    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Signed-off-by: Ingo Molnar

    Wu Fengguang
     
  • - can handle sw counters now
    - the outputs will look slightly different

    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Signed-off-by: Ingo Molnar

    Wu Fengguang
     
  • - kerneltop: --event_id => --event
    - kerneltop: can accept SW event types now
    - perfstat: it used to implicitly add event -2 (task-clock);
    the new code no longer does this. Shall we?

    Signed-off-by: Wu Fengguang
    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Signed-off-by: Ingo Molnar

    Wu Fengguang
     
  • Signed-off-by: Wu Fengguang
    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Signed-off-by: Ingo Molnar

    Wu Fengguang
     
  • kerneltop's MAX_COUNTERS is increased from 8 to 64 (the value used by perfstat).

    Signed-off-by: Wu Fengguang
    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Signed-off-by: Ingo Molnar

    Wu Fengguang
     
  • Initial version of kerneltop.c and perfstat.c.

    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We'll have more files in that directory, prepare for that.

    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Impact: build fix for powerpc

    Commit db3a944aca35ae61 ("perf_counter: revamp syscall input ABI")
    expanded the hw_event.type field into a union of structs containing
    bitfields. In particular it introduced a type field and a raw_type
    field, with the intention that the 1-bit raw_type field should
    overlay the most-significant bit of the 8-bit type field, and in fact
    perf_counter_alloc() now assumes that (or at least, assumes that
    raw_type doesn't overlay any of the bits that are 1 in the values of
    PERF_TYPE_{HARDWARE,SOFTWARE,TRACEPOINT}).

    Unfortunately this is not true on big-endian systems such as PowerPC,
    where bitfields are laid out from left to right, i.e. from most
    significant bit to least significant. This means that setting
    hw_event.type = PERF_TYPE_SOFTWARE will set hw_event.raw_type to 1.

    This fixes it by making the layout depend on whether or not
    __BIG_ENDIAN_BITFIELD is defined. It's a bit ugly, but that's what
    we get for using bitfields in a user/kernel ABI.

    Also, that commit didn't fix up some places in arch/powerpc/kernel/
    perf_counter.c where hw_event.raw and hw_event.event_id were used.
    This fixes them too.

    Signed-off-by: Paul Mackerras

    Paul Mackerras
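
The layout difference can be observed with a small test program. This is a simplified sketch: struct flags is a hypothetical 32-bit structure with a 1-bit field declared first, not the hw_event union from the patch.

```c
#include <stdint.h>
#include <string.h>

/* A 1-bit field declared first, followed by the remaining bits. */
struct flags {
    unsigned int raw  : 1;
    unsigned int rest : 31;
};

_Static_assert(sizeof(struct flags) == sizeof(uint32_t),
               "sketch assumes the bitfields share one 32-bit unit");

/* Set only the first-declared bit and return the containing word as the
 * CPU sees it, so the bit's position within the word becomes visible. */
static uint32_t raw_bit_as_word(void)
{
    struct flags f;
    uint32_t word;

    memset(&f, 0, sizeof f);
    f.raw = 1;
    memcpy(&word, &f, sizeof word);
    return word;
}
```

With the usual GCC conventions the result is 0x00000001 on a little-endian target and 0x80000000 on a big-endian one, which is why a field meant to overlay the most significant bit ends up in the wrong place on one of them.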
     
  • Impact: cleanup

    This updates the powerpc perf_counter_interrupt following on from the
    "perf_counter: unify irq output code" patch. Since we now use the
    generic perf_counter_output code, which sets the perf_counter_pending
    flag directly, we no longer need the need_wakeup variable.

    This removes need_wakeup and makes perf_counter_interrupt use
    get_perf_counter_pending() instead.

    Signed-off-by: Paul Mackerras
    Signed-off-by: Peter Zijlstra
    Cc: Steven Rostedt
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
     
  • Impact: cleanup

    Having 3 slightly different copies of the same code around does nobody
    any good. First step in revamping the output format.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Impact: modify ABI

    The hardware/software classification in hw_event->type became a little
    strained due to the addition of tracepoint tracing.

    Instead split up the field and provide a type field to explicitly specify
    the counter type, while using the event_id field to specify which event to
    use.

    Raw counters still work as before, only the raw config now goes into
    raw_event.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Impact: new perfcounters feature

    Enable usage of tracepoints as perf counter events.

    tracepoint event ids can be found in /debug/tracing/event/*/*/id
    and (for now) are represented as -65536+id in the type field.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
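
The encoding described above is simple arithmetic. The helper names and the TRACEPOINT_BASE macro below are hypothetical, and a 64-bit signed type is assumed for the field purely for the sake of the example.

```c
#include <stdint.h>

/* Base value from the commit message: ids are represented as -65536 + id. */
#define TRACEPOINT_BASE INT64_C(-65536)

/* Turn a tracepoint id, as read from the id file, into a type value. */
static int64_t tp_encode(int id)
{
    return TRACEPOINT_BASE + id;
}

/* Recover the tracepoint id from a type value. */
static int tp_decode(int64_t type)
{
    return (int)(type - TRACEPOINT_BASE);
}
```

For example, a tracepoint whose id file contains 42 would be requested as -65494.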
     
  • Impact: fix crash during perfcounters use

    I found another counter free path, create a free_counter() call to
    accommodate generic tear-down.

    Fixes an RCU bug.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Impact: cleanup

    Use the generic software events for context switches.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Impact: fix boot crash

    When doing the generic context switch event I ran into some early
    boot hangs, which were caused by infinite function recursion (event,
    fault, event, fault).

    I eventually tracked it down to event_list not being initialized
    at the time of the first event. Fix this.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Impact: build fix for powerpc

    Commit bd753921015e7905 ("perf_counter: software counter event
    infrastructure") introduced a use of TIF_PERF_COUNTERS into the core
    perfcounter code. This breaks the build on powerpc because we use
    a flag in a per-cpu area to signal wakeups on powerpc rather than
    a thread_info flag, because the thread_info flags have to be
    manipulated with atomic operations and are thus slower than per-cpu
    flags.

    This fixes the build by changing the core to use an abstracted
    set_perf_counter_pending() function, which is defined on x86 to set
    the TIF_PERF_COUNTERS flag and on powerpc to set the per-cpu flag
    (paca->perf_counter_pending). It changes the previous powerpc
    definition of set_perf_counter_pending to not take an argument and
    adds a clear_perf_counter_pending, so as to simplify the definition
    on x86.

    On x86, set_perf_counter_pending() is defined as a macro. Defining
    it as a static inline in arch/x86/include/asm/perf_counters.h causes
    compile failures because that header gets included early by another
    header, before the definitions of set_tsk_thread_flag etc. are
    available. (On powerpc this problem is avoided by defining
    set_perf_counter_pending etc. in a different header.)

    Signed-off-by: Paul Mackerras

    Paul Mackerras
     
  • Impact: fix boot crash on Intel Perfmon Version 1 systems

    Intel Perfmon v1 does not support the global MSRs, nor does
    it offer the generalized MSR ranges. So support v2 and later
    CPUs only.

    Also mark pmc_ops as read-mostly - to avoid false cacheline
    sharing.

    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Impact: build fix

    In order to compile a kernel with performance counter patches,
    the header that provides the following declaration has to be included:
    struct pt_regs *get_irq_regs(void);

    [ This bug was masked by unrelated x86 header file changes in the
    x86 tree, but occurs in the tip:perfcounters/core standalone
    tree. ]

    Signed-off-by: Tim Blechmann
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tim Blechmann
     
  • Impact: fix deadlock with perfstat

    Fix for the perfstat fubar..

    We cannot unconditionally call hrtimer_cancel() without ever having done
    hrtimer_init() on the thing.

    Signed-off-by: Peter Zijlstra
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • I noticed that the counter_list only includes top-level counters, thus
    perf_swcounter_event() will miss sw-counters in groups.

    Since perf_swcounter_event() also wants an RCU safe list, create a new
    event_list that includes all counters and uses RCU list ops and use call_rcu
    to free the counter structure.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Use hrtimers to provide timer based sampling for the software time
    counters.

    This allows platforms without hardware counter support to still
    perform sample based profiling.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Provide separate sw counters for major and minor page faults.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • We use the generic software counter infrastructure to provide
    page fault events.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Provide generic software counter infrastructure that supports
    software events.

    This will be used to allow sample based profiling based on software
    events such as pagefaults. The current infrastructure can only
    provide a count of such events, no place information.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Fix a build warning on 32bit machines by explicitly marking the
    constants as 64-bit.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
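
The general pattern behind this kind of fix can be sketched as follows. The COUNTER_MASK macro and its 40-bit width are hypothetical and only serve to show a constant that does not fit in 32 bits.

```c
#include <stdint.h>

/* Writing (1 << 40) would shift a plain int, which is undefined where int
 * is 32 bits wide. The ULL suffix makes the whole expression 64-bit on
 * every target. */
#define COUNTER_MASK ((1ULL << 40) - 1)

/* Keep only the low 40 bits of a counter value. */
static uint64_t counter_wrap(uint64_t value)
{
    return value & COUNTER_MASK;
}
```

Because the suffix fixes the type of the constant itself, the result no longer depends on the width of int or long on the machine doing the build.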
     
  • We need to ensure the enabled=0 write happens before we
    start disabling the actual counters, so that a pmc_amd_enable()
    will not enable one underneath us.

    I think the race is impossible anyway: we always balance the
    ops within any one context and perform enable() with IRQs disabled.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Instead of del/add use a move list-op.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
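
To show what a move operation amounts to, here is a minimal userspace sketch of a circular doubly linked list. The struct node type and the node_*() helpers are hypothetical; the kernel provides its own list_move() and list_move_tail() in its list header.

```c
/* Minimal circular doubly linked list; the head is a node without payload. */
struct node {
    struct node *prev, *next;
};

static void node_init(struct node *head)
{
    head->prev = head->next = head;
}

/* Unlink n from whatever list it is on. */
static void node_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Insert n just before head, i.e. at the tail of the list. */
static void node_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* One call that expresses "take n off its list and put it at the tail". */
static void node_move_tail(struct node *n, struct node *head)
{
    node_del(n);
    node_add_tail(n, head);
}
```

The behavior is the same as an explicit delete followed by an add; the single call mainly states the intent more clearly at the call site.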
     
  • No need to assume the irq_period is 32bit.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Merge reason: we have gathered quite a few conflicts, need to merge upstream

    Conflicts:
    arch/powerpc/kernel/Makefile
    arch/x86/ia32/ia32entry.S
    arch/x86/include/asm/hardirq.h
    arch/x86/include/asm/unistd_32.h
    arch/x86/include/asm/unistd_64.h
    arch/x86/kernel/cpu/common.c
    arch/x86/kernel/irq.c
    arch/x86/kernel/syscall_table_32.S
    arch/x86/mm/iomap_32.c
    include/linux/sched.h
    kernel/Makefile

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • * 'audit.b62' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
    Audit: remove spaces from audit_log_d_path
    audit: audit_set_auditable defined but not used
    audit: incorrect ref counting in audit tree tag_chunk
    audit: Fix possible return value truncation in audit_get_context()
    audit: ignore terminating NUL in AUDIT_USER_TTY messages
    Audit: fix handling of 'strings' with NULL characters
    make the e->rule.xxx shorter in kernel auditfilter.c
    auditsc: fix kernel-doc notation
    audit: EXECVE record - removed bogus newline

    Linus Torvalds
     
  • * 'for-next' of git://git.o-hand.com/linux-mfd:
    mfd: fix da903x warning
    mfd: fix MAINTAINERS entry
    mfd: Use the value of the final spin when reading the AUXADC
    mfd: Storage class should be before const qualifier
    mfd: PASIC3: supply clock_rate to DS1WM via driver_data
    mfd: remove DS1WM clock handling
    mfd: remove unused PASIC3 bus_shift field
    pxa/magician: remove deprecated .bus_shift from PASIC3 platform_data
    mfd: convert PASIC3 to use MFD core
    mfd: convert DS1WM to use MFD core
    mfd: Support active high IRQs on WM835x
    mfd: Use bulk read to fill WM8350 register cache
    mfd: remove duplicated #include from pcf50633

    Linus Torvalds