26 May, 2011

1 commit

  • Commit 0837e3242c73566fc1c0196b4ec61779c25ffc93 fixes a situation on POWER7
    where events can roll back if a speculative event doesn't actually complete.
    This can raise a performance monitor exception. We need to catch this to ensure
    that we reset the PMC. In all cases the PMC will be less than 256 cycles from
    overflow.

    This patch lifts Anton's fix for the problem in perf and applies it to oprofile
    as well.

    Signed-off-by: Eric B Munson
    Cc: # as far back as it applies cleanly
    Signed-off-by: Benjamin Herrenschmidt

    Eric B Munson
     

31 Mar, 2011

1 commit


02 Nov, 2010

1 commit

  • Fix typos: "gadget", "through", "command", "maintain", "controller",
    "address", "between", "initiali[zs]e", "instead", "function", "select",
    "already", "equal", "access", "management", "hierarchy", "registration",
    "interest", "relative", "memory" and "offset".

    Signed-off-by: Uwe Kleine-König
    Signed-off-by: Jiri Kosina

    Uwe Kleine-König
     

14 Oct, 2010

1 commit

  • On an arch 2.06 hypervisor, a pending perfmon interrupt will be delivered
    to the hypervisor at any point the guest is running, regardless of
    MSR[EE]. In order to reflect this interrupt, the hypervisor has to mask
    the interrupt in PMGC0 -- and set MSRP[PMMP] to intercept further guest
    accesses to the PMRs to detect when to unmask (and prevent the guest from
    unmasking early, or seeing inconsistent state).

    This has the side effect of ignoring any changes the guest makes to
    MSR[PMM], so wait until after the interrupt is clear, and thus the
    hypervisor should have cleared MSRP[PMMP], before setting MSR[PMM]. The
    counters will not actually run until PMGC0[FAC] is cleared in
    pmc_start_ctrs(), so this will not reduce the effectiveness of PMM.

    Signed-off-by: Scott Wood
    Signed-off-by: Kumar Gala

    Scott Wood
     

13 Oct, 2010

1 commit


02 Sep, 2010

1 commit


14 Jul, 2010

1 commit


07 Jun, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    include gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some files didn't need the
    inclusion, some needed manual addition, and for others adding it to an
    implementation .h or embedding .c file was more appropriate. This step
    added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them, as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from the build
    tests in step 7, I'm fairly confident about the coverage of this
    conversion patch. If there is a breakage, it's likely to be something
    in one of the arch headers which should be easily discoverable on
    most builds of the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

04 Dec, 2009

1 commit

  • That is "success", "unknown", "through", "performance", "[re|un]mapping"
    , "access", "default", "reasonable", "[con]currently", "temperature"
    , "channel", "[un]used", "application", "example","hierarchy", "therefore"
    , "[over|under]flow", "contiguous", "threshold", "enough" and others.

    Signed-off-by: André Goddard Rosa
    Signed-off-by: Jiri Kosina

    André Goddard Rosa
     

09 Nov, 2009

1 commit

  • something-bility is spelled as something-blity,
    so a grep for 'blit' would find these lines.

    This is so trivial that I didn't split it by subsystem / copy
    additional maintainers - all changes are to comments.
    The only purpose is to get fewer false positives when grepping
    around the kernel sources.

    Signed-off-by: Dirk Hohndel
    Signed-off-by: Jiri Kosina

    Dirk Hohndel
     

08 Jul, 2009

1 commit


16 Jun, 2009

1 commit

  • Add the option to build the code under arch/powerpc with -Werror.

    The intention is to make it harder for people to inadvertently introduce
    warnings in the arch/powerpc code. It needs to be configurable so that
    if a warning is introduced, people can easily work around it while it's
    being fixed.

    The option is a negative, i.e. don't enable -Werror, so that it will be
    turned on for allyes and allmodconfig builds.

    The default is n, in the hope that developers will build with -Werror.
    That will probably lead to some build breaks; I am prepared to be flamed.

    It's not enabled for math-emu, which is a steaming pile of warnings.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Benjamin Herrenschmidt

    Michael Ellerman
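
A minimal Kconfig sketch of the negative option described above (symbol names and prompt text are illustrative, not necessarily those the patch used):

```
config PPC_DISABLE_WERROR
	bool "Don't build arch/powerpc with -Werror"
	default n

config PPC_WERROR
	def_bool !PPC_DISABLE_WERROR
```

Paired with a Makefile line along the lines of `subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror`, allyesconfig and allmodconfig take the default n for the negative option and therefore build with -Werror, while a developer hit by a new warning can flip one switch to keep working.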
     

19 May, 2009

1 commit


15 May, 2009

1 commit

  • Description
    -----------
    Change the ppc64 oprofile kernel driver to use the SLOT bits (MMCRA[37:39])
    only on older processors where those bits are defined.

    Background
    ----------
    The performance monitor unit of the 64-bit POWER processor family has the
    ability to collect accurate instruction-level samples when profiling on marked
    events (i.e., "PM_MRK_"). In processors prior to POWER6, the MMCRA
    register contained "slot information" that the oprofile kernel driver used to
    adjust the value latched in the SIAR at the time of a PMU interrupt. But as of
    POWER6, these slot bits in MMCRA are no longer necessary for oprofile to use,
    since the SIAR itself holds the accurate sampled instruction address. With
    POWER6, these MMCRA slot bits were zeroed out by hardware, so oprofile's
    use of these slot bits was, in effect, a NOP. But with POWER7, these bits
    are no longer zeroed out; they now serve a purpose other than slot
    information. Thus, using these bits on POWER7 to adjust the SIAR value results
    in samples being attributed to the wrong instructions. The attached patch
    changes the oprofile kernel driver to ignore these slot bits on all newer
    processors starting with POWER6.

    Signed-off-by: Maynard Johnson
    Signed-off-by: Michael Wolf
    Signed-off-by: Benjamin Herrenschmidt

    Maynard Johnson
     

11 Mar, 2009

1 commit


10 Feb, 2009

1 commit


13 Jan, 2009

1 commit

  • Convert arch/powerpc/ over to long long based u64:

    -#ifdef __powerpc64__
    -# include <asm-generic/int-l64.h>
    -#else
    -# include <asm-generic/int-ll64.h>
    -#endif
    +#include <asm-generic/int-ll64.h>

    This will avoid recurring spurious warnings in core kernel code that
    come up when people test on their own hardware. (i.e. x86 in ~98% of
    the cases) This is what x86 uses and it generally helps keep 64-bit
    code 32-bit clean too.

    [Adjusted to not impact user mode (from paulus) - sfr]

    Signed-off-by: Ingo Molnar
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Benjamin Herrenschmidt

    Ingo Molnar
     

10 Jan, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile: (31 commits)
    powerpc/oprofile: fix whitespaces in op_model_cell.c
    powerpc/oprofile: IBM CELL: add SPU event profiling support
    powerpc/oprofile: fix cell/pr_util.h
    powerpc/oprofile: IBM CELL: cleanup and restructuring
    oprofile: make new cpu buffer functions part of the api
    oprofile: remove #ifdef CONFIG_OPROFILE_IBS in non-ibs code
    ring_buffer: fix ring_buffer_event_length()
    oprofile: use new data sample format for ibs
    oprofile: add op_cpu_buffer_get_data()
    oprofile: add op_cpu_buffer_add_data()
    oprofile: rework implementation of cpu buffer events
    oprofile: modify op_cpu_buffer_read_entry()
    oprofile: add op_cpu_buffer_write_reserve()
    oprofile: rename variables in add_ibs_begin()
    oprofile: rename add_sample() in cpu_buffer.c
    oprofile: rename variable ibs_allowed to has_ibs in op_model_amd.c
    oprofile: making add_sample_entry() inline
    oprofile: remove backtrace code for ibs
    oprofile: remove unused ibs macro
    oprofile: remove unused components in struct oprofile_cpu_buffer
    ...

    Linus Torvalds
     

08 Jan, 2009

5 commits


06 Jan, 2009

1 commit


01 Jan, 2009

1 commit

  • struct dentry is one of the most critical structures in the kernel. So it's
    sad to see it going neglected.

    With CONFIG_PROFILING turned on (which is probably the common case at least
    for distros and kernel developers), sizeof(struct dentry) == 208 here
    (64-bit). This gives 19 objects per slab.

    I packed d_mounted into a hole, and took another 4 bytes off the inline
    name length to take the padding out from the end of the structure. This
    shrinks it to 200 bytes. I could have gone the other way and increased the
    length to 40, but I'm aiming for a magic number, read on...

    I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
    why was this ever a good idea? The cookie system should increase its hash
    size or use a tree or something if lookups are a problem. Also the "fast
    dcookie lookups" in oprofile should be moved into the dcookie code -- how
    can oprofile possibly care about the dcookie_mutex? It gets dropped after
    get_dcookie() returns so it can't be providing any sort of protection.

    At 192 bytes, 21 objects fit into a 4K page, saving about 3MB on my system
    with ~140 000 entries allocated. 192 is also a multiple of 64, so we get
    nice cacheline alignment on 64 and 32 byte line systems -- any given dentry
    will now require 3 cachelines to touch all fields whereas previously it
    would require 4.

    I know the inline name size was chosen quite carefully, however with the
    reduction in cacheline footprint, it should actually be just about as fast
    to do a name lookup for a 36 character name as it was before the patch (and
    faster for other sizes). The memory footprint savings for names which are
    <= 32 bytes should more than make up for the memory cost for
    33-36 byte names.

    Performance is a feature...

    Signed-off-by: Al Viro

    Nick Piggin
     

31 Oct, 2008

1 commit

  • The size of the pm_signal_local array should be equal to the
    number of SPUs being configured in the array. Currently, the
    array is of size 4 (NR_PHYS_CTRS) but being indexed by a for
    loop from 0 to 7 (NUM_SPUS_PER_NODE). This could potentially
    cause an oops or random memory corruption since the pm_signal_local
    array is on the stack. This fixes it.

    Signed-off-by: Carl Love
    Signed-off-by: Paul Mackerras

    Carl Love
     

24 Oct, 2008

2 commits

  • …inux/kernel/git/tip/linux-2.6-tip

    * 'v28-range-hrtimers-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
    hrtimers: add missing docbook comments to struct hrtimer
    hrtimers: simplify hrtimer_peek_ahead_timers()
    hrtimers: fix docbook comments
    DECLARE_PER_CPU needs linux/percpu.h
    hrtimers: fix typo
    rangetimers: fix the bug reported by Ingo for real
    rangetimer: fix BUG_ON reported by Ingo
    rangetimer: fix x86 build failure for the !HRTIMERS case
    select: fix alpha OSF wrapper
    select: fix alpha OSF wrapper
    hrtimer: peek at the timer queue just before going idle
    hrtimer: make the futex() system call use the per process slack value
    hrtimer: make the nanosleep() syscall use the per process slack
    hrtimer: fix signed/unsigned bug in slack estimator
    hrtimer: show the timer ranges in /proc/timer_list
    hrtimer: incorporate feedback from Peter Zijlstra
    hrtimer: add a hrtimer_start_range() function
    hrtimer: another build fix
    hrtimer: fix build bug found by Ingo
    hrtimer: make select() and poll() use the hrtimer range feature
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile: (21 commits)
    OProfile: Fix buffer synchronization for IBS
    oprofile: hotplug cpu fix
    oprofile: fixing whitespaces in arch/x86/oprofile/*
    oprofile: fixing whitespaces in arch/x86/oprofile/*
    oprofile: fixing whitespaces in drivers/oprofile/*
    x86/oprofile: add the logic for enabling additional IBS bits
    x86/oprofile: reordering functions in nmi_int.c
    x86/oprofile: removing unused function parameter in add_ibs_begin()
    oprofile: more whitespace fixes
    oprofile: whitespace fixes
    OProfile: Rename IBS sysfs dir into "ibs_op"
    OProfile: Rework string handling in setup_ibs_files()
    OProfile: Rework oprofile_add_ibs_sample() function
    oprofile: discover counters for op ppro too
    oprofile: Implement Intel architectural perfmon support
    oprofile: Don't report Nehalem as core_2
    oprofile: drop const in num counters field
    Revert "Oprofile Multiplexing Patch"
    x86, oprofile: BUG: using smp_processor_id() in preemptible code
    x86/oprofile: fix on_each_cpu build error
    ...

    Manually fixed trivial conflicts in
    drivers/oprofile/{cpu_buffer.c,event_buffer.h}

    Linus Torvalds
     

21 Oct, 2008

1 commit

  • The issue is the SPU code is not holding the kernel mutex lock while
    adding samples to the kernel buffer.

    This patch creates per SPU buffers to hold the data. Data
    is added to the buffers in interrupt context. The data
    is periodically pushed to the kernel buffer via a new OProfile
    function, oprofile_put_buff(). The oprofile_put_buff() function
    is called via a work queue, enabling the function to acquire the
    mutex lock.

    The existing user controls for adjusting the per CPU buffer
    size are used to control the size of the per SPU buffers.
    Similarly, overflows of the SPU buffers are reported by
    incrementing the per CPU buffer stats. This eliminates the
    need to have architecture specific controls for the per SPU
    buffers which is not acceptable to the OProfile user tool
    maintainer.

    The export of the oprofile add_event_entry() is removed as it
    is no longer needed given this patch.

    Note, this patch has not addressed the issue of indexing arrays
    by the spu number. This still needs to be fixed as the spu
    numbering is not guaranteed to be 0 to max_num_spus-1.

    Signed-off-by: Carl Love
    Signed-off-by: Maynard Johnson
    Signed-off-by: Arnd Bergmann
    Acked-by: Robert Richter
    Signed-off-by: Benjamin Herrenschmidt

    Carl Love
     

18 Oct, 2008

1 commit


16 Oct, 2008

1 commit


10 Oct, 2008

1 commit

  • Offset is unsigned and when an address isn't found in the vma map
    vma_map_lookup() returns the vma physical address + 0x10000000.

    vma_map_lookup() used to return 0xffffffff on a failed lookup, but
    a change was made to return the vma physical address + 0x10000000.
    There are two callers of vma_map_lookup(): one of them correctly
    deals with this new return value, but the other (below) did not.

    Signed-off-by: Roel Kluin
    Acked-by: Maynard Johnson
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Benjamin Herrenschmidt

    Roel Kluin
     

06 Sep, 2008

1 commit


20 Aug, 2008

1 commit

    The file arch/powerpc/kernel/sysfs.c is currently only compiled for
    64-bit kernels. It contains code to register CPU sysdevs in sysfs and
    add various properties such as cache topology and raw access by root
    to performance monitor counters (PMCs). A lot of that can be re-used
    as is on 32-bit.

    The file is now built for both, with appropriate ifdef'ing
    for the few bits that are really 64-bit specific, and adds some
    support for the raw PMCs of the 75x and 74xx processors.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Paul Mackerras

    Benjamin Herrenschmidt
     

26 Jun, 2008

1 commit


14 Apr, 2008

1 commit


01 Apr, 2008

1 commit


31 Mar, 2008

1 commit


03 Mar, 2008

1 commit

  • This patch enables OProfile callgraph support for the Cell processor. The
    original code was just calling a function to add the PC value, now it will
    call a function that first checks the callgraph depth. Callgraph is already
    enabled on the other Power platforms.

    Signed-off-by: Bob Nelson
    Signed-off-by: Arnd Bergmann

    Bob Nelson