16 May, 2018

1 commit

  • commit bfb3d7b8b906b66551424d7636182126e1d134c8 upstream.

    If the get_callchain_buffers fails to allocate the buffer it will
    decrease the nr_callchain_events right away.

    There's no point in checking the allocation error when
    nr_callchain_events > 1, so remove that check.

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: H. Peter Anvin
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: syzkaller-bugs@googlegroups.com
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/20180415092352.12403-3-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Jiri Olsa
     

26 Apr, 2018

1 commit

  • commit 5af44ca53d019de47efe6dbc4003dd518e5197ed upstream.

    syzbot hit a KASAN bug in perf_callchain_store, where an entry was stored
    beyond the allocated bounds [1].

    We miss the sample_max_stack check for the initial event that allocates
    callchain buffers. This missing check allows creating an event with a
    sample_max_stack value bigger than the global sysctl maximum:

    # sysctl -a | grep perf_event_max_stack
    kernel.perf_event_max_stack = 127

    # perf record -vv -C 1 -e cycles/max-stack=256/ kill
    ...
    perf_event_attr:
    size 112
    ...
    sample_max_stack 256
    ------------------------------------------------------------
    sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 4

    Note the '-C 1', which forces perf record to create just a single event.
    Otherwise it opens an event for every cpu, then the sample_max_stack check
    fails on the second event and all's fine.

    The fix is to run the sample_max_stack check also for the first event
    with callchains.

    [1] https://marc.info/?l=linux-kernel&m=152352732920874&w=2

    Reported-by: syzbot+7c449856228b63ac951e@syzkaller.appspotmail.com
    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: H. Peter Anvin
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: syzkaller-bugs@googlegroups.com
    Cc: x86@kernel.org
    Fixes: 97c79a38cd45 ("perf core: Per event callchain limit")
    Link: http://lkml.kernel.org/r/20180415092352.12403-2-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Jiri Olsa
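
The ordering of the fix can be illustrated with a small user-space model. This is only a sketch under stated assumptions: cc_get_buffers(), cc_global_max_stack, cc_nr_events and cc_buffer are hypothetical stand-ins for the kernel's get_callchain_buffers() and its state, locking is omitted, and calloc() replaces the per-cpu allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel state; not the real symbols. */
int cc_global_max_stack = 127;  /* models kernel.perf_event_max_stack */
int cc_nr_events;               /* models nr_callchain_events */
uint64_t *cc_buffer;            /* models the callchain buffers */

/* Model of taking a callchain reference for one event. */
int cc_get_buffers(int event_max_stack)
{
    int err = 0;
    int count;

    assert(event_max_stack >= 0);   /* the real field is an unsigned 16-bit value */

    count = ++cc_nr_events;

    /* The bound check runs for every event, including the first one. */
    if (event_max_stack > cc_global_max_stack) {
        err = -EOVERFLOW;
    } else if (count == 1) {
        /* Only the first user allocates, sized by the global maximum. */
        cc_buffer = calloc((size_t)cc_global_max_stack, sizeof(*cc_buffer));
        if (!cc_buffer)
            err = -ENOMEM;
    }

    if (err)
        cc_nr_events--;             /* failed callers do not keep a reference */

    return err;
}
```

In this model a first event asking for 256 entries is rejected before any buffer exists, which is the case the '-C 1' reproducer above exercises.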
     

10 May, 2017

1 commit

  • Perf can generate and record a user callchain in response to a synchronous
    request, such as a tracepoint firing. If this happens under set_fs(KERNEL_DS),
    then we can end up walking the user stack (and dereferencing/saving whatever we
    find there) without the protections usually afforded by checks such as
    access_ok.

    Rather than play whack-a-mole with each architecture's stack unwinding
    implementation, fix the root of the problem by ensuring that we force USER_DS
    when invoking perf_callchain_user from the perf core.

    Reported-by: Al Viro
    Signed-off-by: Will Deacon
    Acked-by: Peter Zijlstra
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Will Deacon
     

02 Mar, 2017

1 commit


30 May, 2016

1 commit

  • In addition to being able to control the system wide maximum depth via
    /proc/sys/kernel/perf_event_max_stack, we are now able to ask for
    different depths per event, using perf_event_attr.sample_max_stack for
    that.

    This uses a u16 hole at the end of perf_event_attr. When
    perf_event_attr.sample_type has PERF_SAMPLE_CALLCHAIN set, a
    sample_max_stack of zero means use perf_event_max_stack; otherwise
    it is bounds checked under callchain_mutex.

    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexei Starovoitov
    Cc: Brendan Gregg
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: He Kuang
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: Wang Nan
    Cc: Zefan Li
    Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
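
From user space, requesting a per event depth might look like the sketch below. It assumes kernel headers recent enough to declare perf_event_attr.sample_max_stack; init_cycles_attr() and open_on_cpu() are hypothetical helper names, and the sample period is an arbitrary illustrative value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill in a cycles event that samples callchains with a per event depth.
 * Passing max_stack == 0 asks the kernel to use kernel.perf_event_max_stack. */
void init_cycles_attr(struct perf_event_attr *attr, uint16_t max_stack)
{
    assert(attr != NULL);

    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_HARDWARE;
    attr->size = sizeof(*attr);
    attr->config = PERF_COUNT_HW_CPU_CYCLES;
    attr->sample_period = 100000;   /* illustrative value only */
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_CALLCHAIN;
    attr->sample_max_stack = max_stack;
}

/* Thin wrapper around the raw syscall: all tasks on one cpu, no group.
 * Returns a file descriptor, or -1 with errno set. */
long open_on_cpu(struct perf_event_attr *attr, int cpu)
{
    return syscall(SYS_perf_event_open, attr, -1, cpu, -1, 0UL);
}
```

A caller would typically run init_cycles_attr(&attr, 64) followed by open_on_cpu(&attr, 1); opening the event usually needs sufficient privileges, so the wrapper may well fail on a locked-down system.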
     

17 May, 2016

5 commits

  • The perf_sample->ip_callchain->nr value includes all the entries in the
    ip_callchain->ip[] array, real addresses and PERF_CONTEXT_{KERNEL,USER,etc},
    while what the user expects is that what is in the kernel.perf_event_max_stack
    sysctl or in the upcoming per event perf_event_attr.sample_max_stack knob be
    honoured in terms of IP addresses in the stack trace.

    So allocate a bunch of extra entries for contexts, and do the accounting
    via perf_callchain_entry_ctx struct members.

    A new sysctl, kernel.perf_event_max_contexts_per_stack is also
    introduced for investigating possible bugs in the callchain
    implementation by some arch.

    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexei Starovoitov
    Cc: Brendan Gregg
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: He Kuang
    Cc: Jiri Olsa
    Cc: Masami Hiramatsu
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: Wang Nan
    Cc: Zefan Li
    Link: http://lkml.kernel.org/n/tip-3b4wnqk340c4sg4gwkfdi9yk@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
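
The split accounting described above can be sketched as a user-space model. Everything here is an assumption made for illustration: the model_* names, the fixed 16 slot array and the marker values are hypothetical and only mirror the idea of counting real addresses separately from context markers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ENTRIES    16             /* addresses plus extra room for markers */
#define MODEL_CONTEXT_KERNEL UINT64_C(1)    /* hypothetical marker values */
#define MODEL_CONTEXT_USER   UINT64_C(2)

struct model_entry {
    uint64_t nr;                        /* all slots used: markers and addresses */
    uint64_t ip[MODEL_MAX_ENTRIES];
};

struct model_ctx {
    struct model_entry *entry;
    uint32_t max_stack;                 /* cap on real addresses */
    uint32_t max_contexts;              /* cap on context markers */
    uint32_t nr;                        /* real addresses stored so far */
    uint32_t contexts;                  /* context markers stored so far */
    bool contexts_maxed;
};

/* Store a context marker; it does not count against max_stack. */
int model_store_context(struct model_ctx *ctx, uint64_t marker)
{
    if (ctx->contexts < ctx->max_contexts) {
        assert(ctx->entry->nr < MODEL_MAX_ENTRIES);
        ctx->entry->ip[ctx->entry->nr++] = marker;
        ctx->contexts++;
        return 0;
    }
    ctx->contexts_maxed = true;
    return -1;                          /* no room: stop walking the stack */
}

/* Store a real address; only these count against max_stack. */
int model_store(struct model_ctx *ctx, uint64_t ip)
{
    if (ctx->nr < ctx->max_stack && !ctx->contexts_maxed) {
        assert(ctx->entry->nr < MODEL_MAX_ENTRIES);
        ctx->entry->ip[ctx->entry->nr++] = ip;
        ctx->nr++;
        return 0;
    }
    return -1;                          /* no room: stop walking the stack */
}
```

With max_stack set to 2, one marker followed by three addresses leaves entry->nr at 3 while the address count stops at 2, so the knob is honoured in terms of real addresses.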
     
  • We need to have different helpers to account for how many contexts we
    have in the sample and for real addresses, so do it now as a prep patch,
    to ease review.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-q964tnyuqrxw5gld18vizs3c@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • We will use it to count how many addresses are in the entry->ip[] array,
    excluding PERF_CONTEXT_{KERNEL,USER,etc} entries, so that we can really
    return the number of entries specified by the user via the relevant
    sysctl, kernel.perf_event_max_contexts, or via the per event
    perf_event_attr.sample_max_stack knob.

    This way we keep the perf_sample->ip_callchain->nr meaning, that is the
    number of entries, be it real addresses or PERF_CONTEXT_ entries, while
    honouring the max_stack knobs, i.e. the end result will be max_stack
    entries if we have at least that many entries in a given stack trace.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-s8teto51tdqvlfhefndtat9r@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • This makes perf_callchain_{user,kernel}() receive the max stack
    as context for the perf_callchain_entry, instead of accessing
    the global sysctl_perf_event_max_stack.

    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexei Starovoitov
    Cc: Brendan Gregg
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: He Kuang
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: Wang Nan
    Cc: Zefan Li
    Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • So that it can be used for other stack related knobs, such as the
    upcoming one to tweak the max number of contexts per stack sample.

    In all those cases we can only change the value if there are no perf
    sessions collecting stacks, so they need to grab that mutex, etc.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-8t3fk94wuzp8m2z1n4gc0s17@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

27 Apr, 2016

1 commit

  • The default remains 127, which is good for most cases, and not even hit
    most of the time, but then for some cases, as reported by Brendan, 1024+
    deep frames are appearing on the radar for things like groovy, ruby.

    And in some workloads putting a _lower_ cap on this may make sense. One
    that is per event still needs to be put in place tho.

    The new file is:

    # cat /proc/sys/kernel/perf_event_max_stack
    127

    Changing it:

    # echo 256 > /proc/sys/kernel/perf_event_max_stack
    # cat /proc/sys/kernel/perf_event_max_stack
    256

    But as soon as there is some event using callchains we get:

    # echo 512 > /proc/sys/kernel/perf_event_max_stack
    -bash: echo: write error: Device or resource busy
    #

    This works because we only allocate the callchain percpu data structures
    when there is a user, which allows for changing the max easily; it's just
    a matter of having no callchain users at that point.

    Reported-and-Tested-by: Brendan Gregg
    Reviewed-by: Frederic Weisbecker
    Acked-by: Alexei Starovoitov
    Acked-by: David Ahern
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: He Kuang
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: Wang Nan
    Cc: Zefan Li
    Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
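
The write behaviour shown above can be modelled in a few lines. This is a sketch, not the sysctl handler itself: struct stack_knob, knob_write() and knob_entry_size() are hypothetical names, and the entry layout (one count word followed by max_stack address slots) is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the knob and of who is using callchains. */
struct stack_knob {
    int max_stack;      /* current cap, 127 by default */
    int nr_users;       /* events currently using callchains */
};

/* Size of one callchain entry for the current cap (assumed layout:
 * one count word followed by max_stack address slots). */
size_t knob_entry_size(const struct stack_knob *k)
{
    return sizeof(uint64_t) * (1 + (size_t)k->max_stack);
}

/* Model of writing a new cap: refused while buffers sized by the old
 * cap may still be in use. */
int knob_write(struct stack_knob *k, int new_value)
{
    assert(new_value >= 0);

    if (k->nr_users > 0)
        return -EBUSY;  /* "Device or resource busy" */

    k->max_stack = new_value;
    return 0;
}
```

Since the entry size depends on the cap, changing it while buffers exist would leave them sized for the old value, which is why the model, like the transcript above, reports busy instead.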
     

20 Feb, 2016

1 commit


23 Nov, 2015

1 commit

  • There were still a number of references to my old Red Hat email
    address in the kernel source. Remove these while keeping the
    Red Hat copyright notices intact.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

15 Oct, 2014

1 commit

  • Pull percpu consistent-ops changes from Tejun Heo:
    "Way back, before the current percpu allocator was implemented, static
    and dynamic percpu memory areas were allocated and handled separately
    and had their own accessors. The distinction has been gone for many
    years now; however, the now duplicate two sets of accessors remained
    with the pointer based ones - this_cpu_*() - evolving various other
    operations over time. During the process, we also accumulated other
    inconsistent operations.

    This pull request contains Christoph's patches to clean up the
    duplicate accessor situation. __get_cpu_var() uses are replaced with
    this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

    Unfortunately, the former sometimes is tricky thanks to C being a bit
    messy with the distinction between lvalues and pointers, which led to
    a rather ugly solution for cpumask_var_t involving the introduction of
    this_cpu_cpumask_var_ptr().

    This converts most of the uses but not all. Christoph will follow up
    with the remaining conversions in this merge window and hopefully
    remove the obsolete accessors"

    * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
    irqchip: Properly fetch the per cpu offset
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
    ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
    Revert "powerpc: Replace __get_cpu_var uses"
    percpu: Remove __this_cpu_ptr
    clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
    sparc: Replace __get_cpu_var uses
    avr32: Replace __get_cpu_var with __this_cpu_write
    blackfin: Replace __get_cpu_var uses
    tile: Use this_cpu_ptr() for hardware counters
    tile: Replace __get_cpu_var uses
    powerpc: Replace __get_cpu_var uses
    alpha: Replace __get_cpu_var
    ia64: Replace __get_cpu_var uses
    s390: cio driver &__get_cpu_var replacements
    s390: Replace __get_cpu_var uses
    mips: Replace __get_cpu_var uses
    MIPS: Replace __get_cpu_var uses in FPU emulator.
    arm: Replace __this_cpu_ptr with raw_cpu_ptr
    ...

    Linus Torvalds
     

09 Sep, 2014

1 commit

  • The use of "rcu_assign_pointer()" is NULLing out the pointer.
    According to RCU_INIT_POINTER()'s block comment:

    "1. This use of RCU_INIT_POINTER() is NULLing out the pointer"

    it is better to use it instead of rcu_assign_pointer() because it has a
    smaller overhead.

    The following Coccinelle semantic patch was used:
    @@
    @@

    - rcu_assign_pointer
    + RCU_INIT_POINTER
    (..., NULL)

    Signed-off-by: Andreea-Cristina Bernat
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: paulmck@linux.vnet.ibm.com
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/20140822141536.GA32051@ada
    Signed-off-by: Ingo Molnar

    Andreea-Cristina Bernat
     

27 Aug, 2014

1 commit


16 Aug, 2013

1 commit

  • When we fail to allocate the callchain buffers, we roll back the refcount
    we did and return from get_callchain_buffers().

    However we take the refcount and allocate under the callchain lock
    but the rollback is done outside the lock.

    As a result, while we roll back, some concurrent callchain user may
    call get_callchain_buffers(), see the non-zero refcount and give up
    because the buffers are NULL without itself retrying the allocation.

    The consequences aren't that bad, but that behaviour looks weird enough,
    and it's better to give the following callchain users a chance to
    allocate where we failed.

    Reported-by: Jiri Olsa
    Signed-off-by: Frederic Weisbecker
    Acked-by: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1375460996-16329-2-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

31 Jul, 2013

1 commit

  • In case of allocation failure, get_callchain_buffers() keeps the
    refcount incremented for the current event.

    As a result, when get_callchain_buffers() returns an error,
    we must cleanup what it did by cancelling its last refcount
    with a call to put_callchain_buffers().

    This is a hack in order to be able to call free_event()
    after that failure.

    The original purpose of that was to simplify the failure
    path. But this error handling is actually counter intuitive,
    ugly and not very easy to follow because one expects
    the resources used to perform a service to be cleaned up
    by the callee in case of failure, not by the caller.

    So let's clean this up by cancelling the refcount from
    get_callchain_buffers() in case of failure. And correctly free
    the event accordingly in perf_event_alloc().

    Signed-off-by: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1374539466-4799-3-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
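
The callee side cleanup can be sketched in user space as follows. The model_* names and the model_fail_next_alloc hook are hypothetical, malloc() stands in for the per-cpu allocation, and the locking the kernel does around these steps is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

int model_refcount;             /* callchain users holding a reference */
void *model_buffers;            /* NULL while nobody has allocated */
bool model_fail_next_alloc;     /* test hook: force one allocation failure */

int model_alloc(void)
{
    if (model_fail_next_alloc) {
        model_fail_next_alloc = false;
        return -ENOMEM;
    }
    model_buffers = malloc(64);
    return model_buffers ? 0 : -ENOMEM;
}

/* The callee takes the reference and, on failure, drops it again itself,
 * so the caller has nothing to undo. */
int model_get_buffers(void)
{
    int err = 0;

    if (++model_refcount == 1)
        err = model_alloc();

    if (err)
        model_refcount--;

    return err;
}

void model_put_buffers(void)
{
    assert(model_refcount > 0);

    if (--model_refcount == 0) {
        free(model_buffers);
        model_buffers = NULL;
    }
}
```

Because a failed call leaves the count at zero, the next caller sees itself as the first user and retries the allocation instead of inheriting a NULL buffer.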
     

10 Aug, 2012

1 commit

  • Introducing the following bits to the perf_event_attr struct:

    - exclude_callchain_kernel to filter out kernel callchain
    from the sample dump

    - exclude_callchain_user to filter out user callchain
    from the sample dump

    We need to be able to disable standard user callchain dump when we use
    the dwarf cfi callchain mode, because frame pointer based user
    callchains are useless in this mode.

    Implementing also exclude_callchain_kernel to have complete set of
    options.

    Signed-off-by: Jiri Olsa
    [ Added kernel callchains filtering ]
    Cc: "Frank Ch. Eigler"
    Cc: Arun Sharma
    Cc: Benjamin Redelings
    Cc: Corey Ashford
    Cc: Cyrill Gorcunov
    Cc: Frank Ch. Eigler
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/1344345647-11536-7-git-send-email-jolsa@redhat.com
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Arnaldo Carvalho de Melo

    Frederic Weisbecker
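
A tool could set the two bits along the lines of the sketch below. It assumes a linux/perf_event.h that declares both fields; enum callchain_side and select_callchain_side() are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>

/* Hypothetical selector for which side of the callchain to keep. */
enum callchain_side {
    CALLCHAIN_BOTH,
    CALLCHAIN_KERNEL_ONLY,
    CALLCHAIN_USER_ONLY,
};

/* Request callchains and filter out the side the tool does not want. */
void select_callchain_side(struct perf_event_attr *attr, enum callchain_side side)
{
    assert(attr != NULL);

    attr->sample_type |= PERF_SAMPLE_CALLCHAIN;
    attr->exclude_callchain_kernel = (side == CALLCHAIN_USER_ONLY);
    attr->exclude_callchain_user = (side == CALLCHAIN_KERNEL_ONLY);
}
```

A tool that unwinds the user stack itself, as in the dwarf cfi mode mentioned above, would presumably pick CALLCHAIN_KERNEL_ONLY so the frame pointer based user callchain is left out of the sample.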
     

31 Jul, 2012

1 commit

  • A few events are interesting not only for the current task.
    For example, sched_stat_* events are interesting for the task
    which wakes up. For this reason, it would be good if such
    events were delivered to a target task too.

    Now a target task can be set by using __perf_task().

    The original idea and a draft patch belong to Peter Zijlstra.

    I need these events for profiling sleep times. sched_switch is used for
    getting callchains and sched_stat_* is used for getting time periods.
    These events are combined in user space, where they can then be analyzed
    by perf tools.

    Inspired-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Arun Sharma
    Signed-off-by: Andrew Vagin
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342016098-213063-1-git-send-email-avagin@openvz.org
    Signed-off-by: Ingo Molnar

    Andrew Vagin
     

21 Jan, 2012

1 commit

  • When alloc_callchain_buffers() fails, it frees all of the
    entries before returning. Calling
    release_callchain_buffers() in addition will cause a NULL pointer
    dereference since callchain_cpu_entries is not set.

    Signed-off-by: Namhyung Kim
    Acked-by: Frederic Weisbecker
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/1327021966-27688-1-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Ingo Molnar

    Namhyung Kim
     

14 Nov, 2011

1 commit

  • Split the callchain code from the perf events core into
    a new kernel/events/callchain.c file.

    This simplifies the big core.c a bit.

    Signed-off-by: Borislav Petkov
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    [keep ctx recursion handling inline and use internal headers]
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1318778104-17152-1-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Borislav Petkov