01 May, 2013

1 commit

  • If we allocate a perf ring buffer with the size of a single (user)
    page, we will get memory corruption when releasing it in the
    rb_free_work function (for the CONFIG_PERF_USE_VMALLOC option).

    For single page sized ring buffer the page_order is -1 (because
    nr_pages is 0). This needs to be recognized in the rb_free_work
    function to release proper amount of pages.

    Adding data_page_nr function that returns number of allocated
    data pages. Customizing the rest of the code to use it.

    Reported-by: Jan Stancek
    Original-patch-by: Peter Zijlstra
    Acked-by: Peter Zijlstra
    Cc: Corey Ashford
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Signed-off-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20130319143509.GA1128@krava.brq.redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
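
    The page accounting described in the commit above can be sketched
    in plain user-space C. This is only a model, not the kernel code:
    struct rb_model, ilog2_model, rb_model_init and pages_to_free are
    hypothetical names, and the convention that nr_pages is 1 whenever
    data pages exist is an assumption made for the illustration.

```c
/* Hypothetical stand-in for the vmalloc-backed buffer bookkeeping. */
struct rb_model {
    int nr_pages;   /* 0 when only the user page was allocated */
    int page_order; /* log2 of the data page count, -1 when there is none */
};

/* Integer log2 that yields -1 for 0, matching the "-1" in the commit text. */
static int ilog2_model(unsigned int n)
{
    int order = -1;

    while (n) {
        n >>= 1;
        order++;
    }
    return order;
}

/* Assumed setup: one "big page" of the given order, or nothing at all. */
static struct rb_model rb_model_init(unsigned int data_pages)
{
    struct rb_model rb;

    rb.page_order = ilog2_model(data_pages);
    rb.nr_pages = data_pages ? 1 : 0;
    return rb;
}

/* Number of data pages; the zero case never shifts by a negative order. */
static int data_page_nr(const struct rb_model *rb)
{
    if (rb->nr_pages == 0)
        return 0;
    return rb->nr_pages << rb->page_order;
}

/* Pages to release: the data pages plus the single user page. */
static int pages_to_free(const struct rb_model *rb)
{
    return data_page_nr(rb) + 1;
}
```

    In this model a buffer created with rb_model_init(0) has
    page_order -1 and releases exactly one page, which is the case
    the commit says has to be recognized.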
     

21 Mar, 2013

1 commit

  • This patch fixes a flaw in perf_output_space(). In case the size
    of the space needed is bigger than the actual buffer size, there
    may be situations where the function would return true (i.e.,
    there is space) when it should not. This happens when head > offset
    due to rounding in the masking logic.

    The problem can be tested by activating BTS on Intel processors.
    A BTS record can be as big as 16 pages. The following command
    fails:

    $ perf record -m 4 -c 1 -e branches:u my_test_program

    You will get a buffer corruption with this. Perf report won't be
    able to parse the perf.data.

    The fix is to first check that the requested space is smaller
    than the buffer size. If so, then the masking logic will work
    fine. If not, then there is no chance the record can be saved
    and it will be gracefully handled by upper code layers.

    [ In v2, we also make the logic for the writable more explicit by
    renaming it to rb->overwrite because it tells whether or not the
    buffer can overwrite its tail (suggested by PeterZ). ]

    Signed-off-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Cc: peterz@infradead.org
    Cc: jolsa@redhat.com
    Cc: fweisbec@gmail.com
    Link: http://lkml.kernel.org/r/20130318133327.GA3056@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
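
    The check described in the commit above can be sketched as a
    user-space model. The function names and the flat parameter list
    are assumptions for illustration; the real perf_output_space()
    works on the kernel's ring buffer structure. The buffer size is
    assumed to be a power of two so that size - 1 is usable as a mask.

```c
#include <stdbool.h>

/* Masking logic alone: positions are taken relative to tail, modulo size. */
static bool output_space_unchecked(unsigned long size, unsigned long tail,
                                   unsigned long offset, unsigned long head)
{
    unsigned long mask = size - 1; /* size assumed to be a power of two */

    offset = (offset - tail) & mask;
    head = (head - tail) & mask;

    /* After masking, head must not have wrapped behind offset. */
    return (long)(head - offset) >= 0;
}

/* Same logic with the size check done first, as the commit describes. */
static bool output_space_model(unsigned long size, bool overwrite,
                               unsigned long tail, unsigned long offset,
                               unsigned long head)
{
    /* An overwritable buffer may run over its own tail. */
    if (overwrite)
        return true;

    /* A record bigger than the whole buffer can never fit. */
    if (head - offset > size)
        return false;

    return output_space_unchecked(size, tail, offset, head);
}
```

    With a 4-page buffer (16384 bytes, assuming 4096-byte pages) and
    a 16-page record starting at offset 0, the unchecked variant
    masks head down to 0 and reports space, while the checked
    variant rejects the request before any masking.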
     

10 Aug, 2012

2 commits

  • Introducing perf_output_skip function to be able to skip data within the
    perf ring buffer.

    When writing data into perf ring buffer we first reserve needed place in
    ring buffer and then copy the actual data.

    There's a possibility we won't be able to fill all the reserved size
    with data, so we need a way to skip the remaining bytes.

    This is going to be useful when storing the user stack dump, where we
    might end up with less data than we originally requested.

    Signed-off-by: Jiri Olsa
    Acked-by: Frederic Weisbecker
    Cc: "Frank Ch. Eigler"
    Cc: Arun Sharma
    Cc: Benjamin Redelings
    Cc: Corey Ashford
    Cc: Cyrill Gorcunov
    Cc: Frank Ch. Eigler
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/1344345647-11536-5-git-send-email-jolsa@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
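
    The reserve, copy, then skip sequence from the commit above can
    be sketched over a flat buffer. struct out_model, out_copy and
    out_skip are hypothetical names; the kernel handle also tracks
    pages, which this model leaves out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output handle: write cursor plus remaining reserved bytes. */
struct out_model {
    unsigned char *addr;
    size_t size;
};

/* Copy up to len bytes into the reserved area; returns bytes written. */
static size_t out_copy(struct out_model *h, const void *src, size_t len)
{
    size_t n = len < h->size ? len : h->size;

    memcpy(h->addr, src, n);
    h->addr += n;
    h->size -= n;
    return n;
}

/* Advance the cursor without writing anything; returns bytes skipped. */
static size_t out_skip(struct out_model *h, size_t len)
{
    size_t n = len < h->size ? len : h->size;

    h->addr += n;
    h->size -= n;
    return n;
}
```

    A caller that reserved 16 bytes but only obtained 4 bytes of
    data would call out_copy() for the 4 bytes and out_skip() for
    the remaining 12, leaving the cursor at the end of the reserved
    area.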
     
  • Adding a generic way to use __output_copy function with specific copy
    function via DEFINE_PERF_OUTPUT_COPY macro.

    Using this to add new __output_copy_user function, that provides output
    copy from user pointers. For x86 the copy_from_user_nmi function is used
    and __copy_from_user_inatomic for the rest of the architectures.

    This new function will be used in user stack dump on sample, coming in
    next patches.

    Signed-off-by: Jiri Olsa
    Cc: "Frank Ch. Eigler"
    Cc: Arun Sharma
    Cc: Benjamin Redelings
    Cc: Corey Ashford
    Cc: Cyrill Gorcunov
    Cc: Frank Ch. Eigler
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/1344345647-11536-4-git-send-email-jolsa@redhat.com
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Arnaldo Carvalho de Melo

    Frederic Weisbecker
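
    The idea of generating copy routines from a macro that takes the
    copy primitive as a parameter, as in the commit above, can be
    sketched as follows. Everything here is a hypothetical
    user-space model: DEFINE_OUTPUT_COPY_MODEL, struct out_handle
    and both primitives are invented for the illustration, and the
    convention that a primitive returns the number of bytes it could
    NOT copy is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output handle: write cursor plus remaining space. */
struct out_handle {
    unsigned char *addr;
    size_t size;
};

/* Plain primitive: always copies everything, so nothing is left over. */
static size_t copy_plain(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    return 0;
}

/* Stand-in for a copy that can stop early: gives up after 8 bytes. */
static size_t copy_faulty(void *dst, const void *src, size_t n)
{
    size_t ok = n < 8 ? n : 8;

    memcpy(dst, src, ok);
    return n - ok;
}

/* Generate an output-copy function around the given primitive. */
#define DEFINE_OUTPUT_COPY_MODEL(func_name, copy_func)                  \
static size_t func_name(struct out_handle *h, const void *buf,          \
                        size_t len)                                     \
{                                                                       \
    size_t n = len < h->size ? len : h->size;                           \
    size_t left = copy_func(h->addr, buf, n);                           \
    size_t written = n - left;                                          \
                                                                        \
    h->addr += written;                                                 \
    h->size -= written;                                                 \
    return len - written; /* bytes that did not make it into h */       \
}

DEFINE_OUTPUT_COPY_MODEL(output_copy_plain, copy_plain)
DEFINE_OUTPUT_COPY_MODEL(output_copy_faulty, copy_faulty)
```

    Both generated functions share the cursor handling and differ
    only in the primitive, which is the point of the macro approach.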
     

09 Jan, 2012

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
    Kconfig: acpi: Fix typo in comment.
    misc latin1 to utf8 conversions
    devres: Fix a typo in devm_kfree comment
    btrfs: free-space-cache.c: remove extra semicolon.
    fat: Spelling s/obsolate/obsolete/g
    SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
    tools/power turbostat: update fields in manpage
    mac80211: drop spelling fix
    types.h: fix comment spelling for 'architectures'
    typo fixes: aera -> area, exntension -> extension
    devices.txt: Fix typo of 'VMware'.
    sis900: Fix enum typo 'sis900_rx_bufer_status'
    decompress_bunzip2: remove invalid vi modeline
    treewide: Fix comment and string typo 'bufer'
    hyper-v: Update MAINTAINERS
    treewide: Fix typos in various parts of the kernel, and fix some comments.
    clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
    gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
    leds: Kconfig: Fix typo 'D2NET_V2'
    sound: Kconfig: drop unknown symbol ARCH_CLPS7500
    ...

    Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
    kconfig additions, close to removed commented-out old ones)

    Linus Torvalds
     

02 Jan, 2012

1 commit


05 Dec, 2011

1 commit

  • When you do:
    $ perf record -e cycles,cycles,cycles noploop 10

    You expect about 10,000 samples for each event, i.e., 10s at
    1000 samples/sec. However, this is not what's happening. You
    get far fewer samples, maybe 3700 samples/event:

    $ perf report -D | tail -15
    Aggregated stats:
    TOTAL events: 10998
    MMAP events: 66
    COMM events: 2
    SAMPLE events: 10930
    cycles stats:
    TOTAL events: 3644
    SAMPLE events: 3644
    cycles stats:
    TOTAL events: 3642
    SAMPLE events: 3642
    cycles stats:
    TOTAL events: 3644
    SAMPLE events: 3644

    On an Intel Nehalem or even AMD64, there are 4 counters capable
    of measuring cycles, so there is plenty of space to measure those
    events without multiplexing (even with the NMI watchdog active).
    And even with multiplexing, we'd expect roughly the same number
    of samples per event.

    The root of the problem was that when the event that caused the buffer
    to become full was not the first event passed on the cmdline, the user
    notification would get lost. The notification was sent to the file
    descriptor of the overflowed event but the perf tool was not polling
    on it. The perf tool aggregates all samples into a single buffer,
    i.e., the buffer of the first event. Consequently, it assumes
    notifications for any event will come via that descriptor.

    The seemingly straightforward solution of moving the waitq into the
    ringbuffer object doesn't work because of life-time issues. One could
    perf_event_set_output() on a fd that you're also blocking on and cause
    the old rb object to be freed while its waitq would still be
    referenced by the blocked thread -> FAIL.

    Therefore link all events to the ringbuffer and broadcast the wakeup
    from the ringbuffer object to all possible events that could be waited
    upon. This is rather ugly, and we're open to better solutions but it
    works for now.

    Reported-by: Stephane Eranian
    Finished-by: Stephane Eranian
    Reviewed-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
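
    The broadcast described in the commit above can be sketched with
    a plain linked list. struct event_model, struct rb_wake_model,
    rb_attach and rb_wakeup_all are hypothetical names, and a
    counter stands in for waking a wait queue; locking and lifetime
    handling are left out entirely.

```c
#include <stddef.h>

/* Hypothetical event: counts how often it was woken. */
struct event_model {
    int wakeups;
    struct event_model *next;
};

/* Hypothetical ring buffer that knows every event writing into it. */
struct rb_wake_model {
    struct event_model *events;
};

/* Link an event to the ring buffer it outputs to. */
static void rb_attach(struct rb_wake_model *rb, struct event_model *ev)
{
    ev->next = rb->events;
    rb->events = ev;
}

/* Broadcast: notify every linked event, whichever one overflowed. */
static void rb_wakeup_all(struct rb_wake_model *rb)
{
    struct event_model *ev;

    for (ev = rb->events; ev != NULL; ev = ev->next)
        ev->wakeups++;
}
```

    With three events attached to one buffer, a single
    rb_wakeup_all() call reaches all three, so a poller waiting on
    the first event is notified even when the third one filled the
    buffer.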
     

01 Jul, 2011

3 commits

  • Since only samples call perf_output_sample() it's much saner (and more
    correct) to put the sample logic in there than in the
    perf_output_begin()/perf_output_end() pair.

    Saves a useless argument, reduces conditionals and shrinks
    struct perf_output_handle, win!

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The nmi parameter indicated if we could do wakeups from the current
    context, if not, we would set some state and self-IPI and let the
    resulting interrupt do the wakeup.

    For the various event classes:

    - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
    the PMI-tail (ARM etc.)
    - tracepoint: nmi=0; since tracepoint could be from NMI context.
    - software: nmi=[0,1]; some, like the schedule thing cannot
    perform wakeups, and hence need 0.

    As one can see, there is very little nmi=1 usage, and the down-side of
    not using it is that on some platforms some software events can have a
    jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

    The up-side however is that we can remove the nmi parameter and save a
    bunch of conditionals in fast paths.

    Signed-off-by: Peter Zijlstra
    Cc: Michael Cree
    Cc: Will Deacon
    Cc: Deng-Cheng Zhu
    Cc: Anton Blanchard
    Cc: Eric B Munson
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: David S. Miller
    Cc: Frederic Weisbecker
    Cc: Jason Wessel
    Cc: Don Zickus
    Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since 2.6.36 (specifically commit d57e34fdd60b ("perf: Simplify the
    ring-buffer logic: make perf_buffer_alloc() do everything needed")),
    the perf_buffer_init_code() has been mis-setting the buffer watermark
    if perf_event_attr.wakeup_events has a non-zero value.

    This is because perf_event_attr.wakeup_events is a union with
    perf_event_attr.wakeup_watermark.

    This commit re-enables the check for perf_event_attr.watermark being
    set before continuing with setting a non-default watermark.

    This bug is most noticeable when you are trying to use PERF_IOC_REFRESH
    with a value larger than one and perf_event_attr.wakeup_events is set to
    one. In this case the buffer watermark will be set to 1 and you will
    get extraneous POLL_IN overflows rather than POLL_HUP as expected.

    [ avoid using attr.wakeup_events when attr.watermark is set ]

    Signed-off-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Cc:
    Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106011506390.5384@cl320.eecs.utk.edu
    Signed-off-by: Ingo Molnar

    Vince Weaver
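
    The union aliasing behind the commit above can be sketched with
    a cut-down attribute structure. struct attr_model and both
    helper functions are hypothetical, and falling back to half the
    buffer size when no watermark is requested is an assumption made
    for the illustration. The anonymous union needs C11.

```c
#include <stdint.h>

/* Cut-down model of the attribute fields the commit mentions. */
struct attr_model {
    unsigned int watermark : 1;    /* set when wakeup_watermark is meant */
    union {
        uint32_t wakeup_events;    /* wake up every N events */
        uint32_t wakeup_watermark; /* wake up after N bytes */
    };
};

/* Mis-setting: reads the union without checking the watermark flag. */
static long buggy_watermark(const struct attr_model *attr, long buf_size)
{
    long wm = attr->wakeup_watermark; /* may really be wakeup_events */

    return wm ? wm : buf_size / 2;
}

/* Checked version: only trust the field when the flag says so. */
static long fixed_watermark(const struct attr_model *attr, long buf_size)
{
    long wm = attr->watermark ? (long)attr->wakeup_watermark : 0;

    return wm ? wm : buf_size / 2;
}
```

    With wakeup_events set to 1 and the watermark flag clear, the
    unchecked helper returns a 1-byte watermark, while the checked
    helper falls back to the default.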
     

09 Jun, 2011

1 commit

  • And create the internal perf events header.

    v2: Keep an internal inlined perf_output_copy()

    Signed-off-by: Frederic Weisbecker
    Acked-by: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Stephane Eranian
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1305827704-5607-1-git-send-email-fweisbec@gmail.com
    [ v3: use clearer 'ring_buffer' and 'rb' naming ]
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker