03 Mar, 2016

6 commits

  • Create record__synthesize(). It can be used to create tracking events
    for each perf.data file once perf supports splitting its output into
    multiple files.

    Signed-off-by: He Kuang
    Cc: Alexei Starovoitov
    Cc: He Kuang
    Cc: Jiri Olsa
    Cc: Li Zefan
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Zefan Li
    Cc: pi3orama@163.com
    Link: http://lkml.kernel.org/r/1456479154-136027-20-git-send-email-wangnan0@huawei.com
    Signed-off-by: Wang Nan
    Signed-off-by: Arnaldo Carvalho de Melo

    Wang Nan
     
  • Upcoming commits in a BPF patchkit will extract the kernel and module
    synthesizing code into a separate function and call it multiple times.
    This patch replaces the plain 'if (err < 0)' error reporting with
    WARN_ONCE(), making sure the error message is shown only once.
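
    For illustration, a minimal sketch of the intended pattern (not the
    actual perf code; synthesize_kernel() below is a hypothetical stand-in
    for the extracted function):

    #include <asm/bug.h>            /* tools/include: provides WARN_ONCE() */

    static int synthesize_kernel(void);    /* hypothetical extracted helper */

    static void synthesize_all(int nr_outputs)
    {
            int i;

            for (i = 0; i < nr_outputs; i++) {
                    int err = synthesize_kernel();

                    /* unlike pr_err() in a loop, this prints only once */
                    WARN_ONCE(err < 0, "Couldn't synthesize kernel mmap events.\n");
            }
    }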

    Signed-off-by: Wang Nan
    Cc: Alexei Starovoitov
    Cc: He Kuang
    Cc: Jiri Olsa
    Cc: Li Zefan
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Zefan Li
    Cc: pi3orama@163.com
    Link: http://lkml.kernel.org/r/1456479154-136027-19-git-send-email-wangnan0@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Wang Nan
     
  • After babeltrace commit 5cec03e402aa ("ir: copy variants and sequences
    when setting a field path"), 'perf data convert' produces incorrect
    results if there is bpf-output data. For example:

    # perf data convert --to-ctf ./out.ctf
    # babeltrace ./out.ctf
    [10:44:31.186045346] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E7DD1, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0xC028E32F, [1] = 0x815D0100, [2] = 0x1000000 ] }
    [10:44:31.286101003] (+0.100055657) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF8105B609, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0x35D9F1EB, [1] = 0x15D81, [2] = 0x2 ] }

    The expected result of the first sample should be:

    raw_data = [ [0] = 0x2FE328C0, [1] = 0x15D81, [2] = 0x1 ] }

    However, 'perf data convert' outputs big endian values to the resulting
    CTF file.

    The reason is an internal change (or a bug?) in babeltrace.

    Before this patch, at the first add_bpf_output_values() call, the byte
    order of all integer types is uncertain (it is 0, neither 1234 (le) nor
    4321 (be)). It would normally be fixed up by:

    perf_evlist__deliver_sample
    -> process_sample_event
    -> ctf_stream
    ...
    -> bt_ctf_trace_add_stream_class
    -> bt_ctf_field_type_structure_set_byte_order
    -> bt_ctf_field_type_integer_set_byte_order

    during creating the stream.

    However, the babeltrace commit mentioned above duplicates types in
    sequences to prevent potential conflicts, and in the following call
    stack links the newly allocated type into the 'raw_data' sequence:

    perf_evlist__deliver_sample
    -> process_sample_event
    -> ctf_stream
    ...
    -> bt_ctf_trace_add_stream_class
    -> bt_ctf_stream_class_resolve_types
    ...
    -> bt_ctf_field_type_sequence_copy
    -> bt_ctf_field_type_integer_copy

    This happens before the byte order is set, so only the newly allocated
    type gets initialized; the byte order of the original type that perf
    chose to create the first raw_data with is still uncertain.

    The byte order in the CTF output is not related to the byte order in
    perf.data. Setting it to anything other than BT_CTF_BYTE_ORDER_NATIVE
    solves this problem (only BT_CTF_BYTE_ORDER_NATIVE needs to be fixed
    up). To minimize the behavior change, set the byte order according to
    the compile-time options.
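
    A minimal sketch of that compile-time selection (illustrative only, not
    the exact hunk from this patch):

    #include <babeltrace/ctf-writer/event-types.h>

    /* Pick an explicit byte order instead of BT_CTF_BYTE_ORDER_NATIVE,
     * matching the byte order perf itself was compiled for. */
    static int set_explicit_byte_order(struct bt_ctf_field_type *type)
    {
    #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
            return bt_ctf_field_type_set_byte_order(type,
                            BT_CTF_BYTE_ORDER_BIG_ENDIAN);
    #else
            return bt_ctf_field_type_set_byte_order(type,
                            BT_CTF_BYTE_ORDER_LITTLE_ENDIAN);
    #endif
    }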

    Signed-off-by: Wang Nan
    Cc: Jeremie Galarneau
    Cc: Alexei Starovoitov
    Cc: Brendan Gregg
    Cc: Jiri Olsa
    Cc: Jérémie Galarneau
    Cc: Li Zefan
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Zefan Li
    Cc: pi3orama@163.com
    Link: http://lkml.kernel.org/r/1456479154-136027-10-git-send-email-wangnan0@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Wang Nan
     
  • bpf_perf_event_output() outputs data through sample->raw_data. This
    patch adds support for converting that data into CTF. A Python script
    can then be used to process the output data from BPF programs.

    Test result:

    # cat ./test_bpf_output_2.c
    /************************ BEGIN **************************/
    #include
    struct bpf_map_def {
            unsigned int type;
            unsigned int key_size;
            unsigned int value_size;
            unsigned int max_entries;
    };
    #define SEC(NAME) __attribute__((section(NAME), used))
    static u64 (*ktime_get_ns)(void) =
            (void *)BPF_FUNC_ktime_get_ns;
    static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
            (void *)BPF_FUNC_trace_printk;
    static int (*get_smp_processor_id)(void) =
            (void *)BPF_FUNC_get_smp_processor_id;
    static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
            (void *)BPF_FUNC_perf_event_output;

    struct bpf_map_def SEC("maps") channel = {
            .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
            .key_size = sizeof(int),
            .value_size = sizeof(u32),
            .max_entries = __NR_CPUS__,
    };

    static inline int __attribute__((always_inline))
    func(void *ctx, int type)
    {
            struct {
                    u64 ktime;
                    int type;
            } __attribute__((packed)) output_data;
            char error_data[] = "Error: failed to output\n";
            int err;

            output_data.type = type;
            output_data.ktime = ktime_get_ns();
            err = perf_event_output(ctx, &channel, get_smp_processor_id(),
                                    &output_data, sizeof(output_data));
            if (err)
                    trace_printk(error_data, sizeof(error_data));
            return 0;
    }
    SEC("func_begin=sys_nanosleep")
    int func_begin(void *ctx) { return func(ctx, 1); }
    SEC("func_end=sys_nanosleep%return")
    int func_end(void *ctx) { return func(ctx, 2); }
    char _license[] SEC("license") = "GPL";
    int _version SEC("version") = LINUX_VERSION_CODE;
    /************************* END ***************************/

    # ./perf record -e bpf-output/no-inherit,name=evt/ \
                    -e ./test_bpf_output_2.c/map:channel.event=evt/ \
                    usleep 100000
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ]

    # ./perf script
    usleep 14942 92503.198504: evt: ffffffff810e0ba1 sys_nanosleep (/lib/modules/4.3.0....
    usleep 14942 92503.298562: evt: ffffffff810585e9 kretprobe_trampoline_holder (/lib....

    # ./perf data convert --to-ctf ./out.ctf
    [ perf data convert: Converted 'perf.data' into CTF data './out.ctf' ]
    [ perf data convert: Converted and wrote 0.000 MB (2 samples) ]

    # babeltrace ./out.ctf
    [01:41:43.198504134] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E0BA1, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x32C0C07B, [1] = 0x5421, [2] = 0x1 ] }
    [01:41:43.298562257] (+0.100058123) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810585E9, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x38B77FAA, [1] = 0x5421, [2] = 0x2 ] }

    # cat ./test_bpf_output_2.py
    from babeltrace import TraceCollection
    tc = TraceCollection()
    tc.add_trace('./out.ctf', 'ctf')
    d = {1:[], 2:[]}
    for event in tc.events:
        if not event.name.startswith('evt'):
            continue
        raw_data = event['raw_data']
        (time, type) = ((raw_data[0] + (raw_data[1] << 32)), raw_data[2])
        d[type].append(time)
    print(list(map(lambda i: d[2][i] - d[1][i], range(len(d[1])))))

    # python3 ./test_bpf_output_2.py
    [100056879]

    Committer note:

    Make sure you have python3-devel installed, not python-devel, which may
    be for python2 and will lead to some "PyInstance_Type" errors. Also
    make sure that you use the right libbabeltrace: it is shipped in
    Fedora, for instance, but as an older version.

    To build libbabeltrace's Python bindings one also needs to use:

    ./configure --enable-python-bindings

    And then set PYTHONPATH=/usr/local/lib64/python3.4/site-packages/.

    Signed-off-by: Wang Nan
    Tested-by: Arnaldo Carvalho de Melo
    Acked-by: Jiri Olsa
    Cc: Alexei Starovoitov
    Cc: Brendan Gregg
    Cc: Li Zefan
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Zefan Li
    Cc: pi3orama@163.com
    Link: http://lkml.kernel.org/r/1456479154-136027-9-git-send-email-wangnan0@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Wang Nan
     
  • Only put the frontend/backend stalled cycles into the default perf stat
    events when the CPU actually supports them.

    This avoids empty columns with --metric-only on newer Intel CPUs.
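
    A rough sketch of the idea (hedged: not the literal patch; the
    add_default_events() helper and the *_attrs arrays are hypothetical,
    only pmu_have_event() is perf's real probing helper):

    #include <linux/perf_event.h>
    #include "util/pmu.h"  /* bool pmu_have_event(const char *pname, const char *name) */

    /* hypothetical stand-ins for this sketch: */
    extern struct perf_event_attr frontend_attrs[], backend_attrs[];
    extern void add_default_events(struct perf_event_attr *attrs);

    static void pick_default_stalled_cycles_events(void)
    {
            /* only add the events the "cpu" PMU actually advertises */
            if (pmu_have_event("cpu", "stalled-cycles-frontend"))
                    add_default_events(frontend_attrs);
            if (pmu_have_event("cpu", "stalled-cycles-backend"))
                    add_default_events(backend_attrs);
    }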

    Committer note:

    Before:

    $ perf stat ls

    Performance counter stats for 'ls':

    1.080893 task-clock (msec) # 0.619 CPUs utilized
    0 context-switches # 0.000 K/sec
    0 cpu-migrations # 0.000 K/sec
    97 page-faults # 0.090 M/sec
    3,327,741 cycles # 3.079 GHz
    stalled-cycles-frontend
    stalled-cycles-backend
    1,609,544 instructions # 0.48 insn per cycle
    319,117 branches # 295.235 M/sec
    12,246 branch-misses # 3.84% of all branches

    0.001746508 seconds time elapsed
    $

    After:

    $ perf stat ls

    Performance counter stats for 'ls':

    0.693948 task-clock (msec) # 0.662 CPUs utilized
    0 context-switches # 0.000 K/sec
    0 cpu-migrations # 0.000 K/sec
    95 page-faults # 0.137 M/sec
    1,792,509 cycles # 2.583 GHz
    1,599,047 instructions # 0.89 insn per cycle
    316,328 branches # 455.838 M/sec
    12,453 branch-misses # 3.94% of all branches

    0.001048987 seconds time elapsed
    $

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1456532881-26621-2-git-send-email-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     
  • Ingo reported a regression in the display format of big numbers: the
    separators are missing (in the default perf stat output).

    triton:~/tip> perf stat -a sleep 1
    ...
    127008602 cycles # 0.011 GHz
    279538533 stalled-cycles-frontend # 220.09% frontend cycles idle
    119213269 instructions # 0.94 insn per cycle

    This is caused by the recent change:

    perf stat: Check existence of frontend/backed stalled cycles

    which added a call to pmu_have_event(), which subsequently calls
    perf_pmu__parse_scale(), which has a bug in locale handling.

    The lc string returned from setlocale(), which we use to store the old
    locale value, may be allocated in static storage. Get a dynamic copy so
    that it survives another setlocale() call.
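
    The pattern being fixed, as a small self-contained sketch (this mirrors
    the description above, not the exact perf_pmu__parse_scale() code):

    #include <locale.h>
    #include <stdlib.h>
    #include <string.h>

    static double parse_scale(const char *buf)
    {
            double scale;
            char *lc;

            /* setlocale() may return a pointer into static storage ... */
            lc = setlocale(LC_NUMERIC, NULL);
            /* ... so take a dynamic copy before the next setlocale() call */
            lc = strdup(lc);
            if (!lc)
                    return 0.0;

            /* the kernel formats the scale string using the C locale */
            setlocale(LC_NUMERIC, "C");
            scale = strtod(buf, NULL);

            /* restore the caller's locale and release the copy */
            setlocale(LC_NUMERIC, lc);
            free(lc);
            return scale;
    }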

    $ perf stat ls
    ...
    2,360,602 cycles # 3.080 GHz
    2,703,090 instructions # 1.15 insn per cycle
    546,031 branches # 712.511 M/sec

    Committer note:

    Since the patch introducing the regression hasn't made it to perf/core
    yet, move this fix to just before where the regression was introduced,
    so that we don't break bisection for this feature.

    Reported-by: Ingo Molnar
    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20160303095348.GA24511@krava.redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

29 Feb, 2016

33 commits

  • Currently there's a single function that is used to display a record's
    data in human readable format. That's pevent_print_event().
    Unfortunately, this gives little room for adding other output within the
    line without updating that function call.

    I've decided to split that function into 3 parts.

    pevent_print_event_task() which prints the task comm, pid and the CPU
    pevent_print_event_time() which outputs the record's timestamp
    pevent_print_event_data() which outputs the rest of the event data.

    pevent_print_event() now simply calls these three functions.

    To avoid repeating the search for the event from the record's type, I
    created a new helper function called pevent_find_event_by_record(),
    which returns the record's event; this event has to be passed to the
    above functions.
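
    A hedged illustration of the new call pattern (the exact parameter
    lists below are assumptions based on the description above, not copied
    from the library headers):

    #include <stdbool.h>
    #include "event-parse.h"    /* libtraceevent types */

    /* Instead of one pevent_print_event() call, a caller that wants to
     * inject its own output between the columns can now do: */
    static void print_record(struct pevent *pevent, struct trace_seq *s,
                             struct pevent_record *record, bool use_trace_clock)
    {
            struct event_format *event;

            /* look the event up once instead of once per print call */
            event = pevent_find_event_by_record(pevent, record);
            if (!event)
                    return;

            pevent_print_event_task(pevent, s, event, record); /* comm/pid/CPU */
            pevent_print_event_time(pevent, s, event, record, use_trace_clock);
            /* ... custom per-line output could go here ... */
            pevent_print_event_data(pevent, s, event, record); /* the rest */
    }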

    Signed-off-by: Steven Rostedt
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20160229090128.43a56704@gandalf.local.home
    Signed-off-by: Arnaldo Carvalho de Melo

    Steven Rostedt
     
  • Some tracepoints have multiple fields with the same name, "nr": the
    first one is the unique syscall ID, the other is a syscall argument:

    # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_io_getevents/format
    name: sys_enter_io_getevents
    ID: 747
    format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:int nr; offset:8; size:4; signed:1;
    field:aio_context_t ctx_id; offset:16; size:8; signed:0;
    field:long min_nr; offset:24; size:8; signed:0;
    field:long nr; offset:32; size:8; signed:0;
    field:struct io_event * events; offset:40; size:8; signed:0;
    field:struct timespec * timeout; offset:48; size:8; signed:0;

    print fmt: "ctx_id: 0x%08lx, min_nr: 0x%08lx, nr: 0x%08lx, events: 0x%08lx, timeout: 0x%08lx", ((unsigned long)(REC->ctx_id)), ((unsigned long)(REC->min_nr)), ((unsigned long)(REC->nr)), ((unsigned long)(REC->events)), ((unsigned long)(REC->timeout))
    #

    Fix it by renaming the "/format" common tracepoint field "nr" to "__syscall_nr".
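
    After the rename, the duplicated entry in the '/format' file is
    expected to read along these lines (illustrative; only the field name
    changes, the struct member keeps its name):

    field:int __syscall_nr; offset:8; size:4; signed:1;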

    Signed-off-by: Taeung Song
    [ Do not rename the struct member, just the '/format' field name ]
    Signed-off-by: Steven Rostedt
    Acked-by: Peter Zijlstra
    Cc: Jiri Olsa
    Cc: Lai Jiangshan
    Cc: Namhyung Kim
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20160226132301.3ae065a4@gandalf.local.home
    Signed-off-by: Arnaldo Carvalho de Melo

    Taeung Song
     
  • The format fields of a syscall start with a variable, '__syscall_nr' or
    'nr', that holds the syscall number. It isn't relevant here, so drop
    it.

    The 'nr' field of syscall tracepoints was renamed to '__syscall_nr', so
    add exception handling to also drop '__syscall_nr', and update the
    comment for this exception handling.
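
    A self-contained sketch of that exception handling (the field-list type
    is a stand-in here, not perf's real format field struct):

    #include <stdio.h>
    #include <string.h>

    struct field_stub {                     /* hypothetical stand-in type */
            const char *name;
            struct field_stub *next;
    };

    int main(void)
    {
            struct field_stub timeout = { "timeout", NULL };
            struct field_stub nr      = { "__syscall_nr", &timeout };
            struct field_stub *f;

            for (f = &nr; f; f = f->next) {
                    /* drop the syscall number field: older kernels call it
                     * "nr", newer ones "__syscall_nr" */
                    if (!strcmp(f->name, "nr") || !strcmp(f->name, "__syscall_nr"))
                            continue;
                    printf("field: %s\n", f->name);
            }
            return 0;
    }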

    Reported-by: Arnaldo Carvalho de Melo
    Signed-off-by: Taeung Song
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1456492465-5946-1-git-send-email-treeze.taeung@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Taeung Song
     
  • The util/python-ext-sources file lists the source files required to
    build the python extension, relative to $(srctree)/tools/perf.

    Such a file path $(FILE).c is handed over to the python extension build
    system, which builds the final object in the
    $(PYTHON_EXTBUILD)/tmp/$(FILE).o path.

    After the build is done, all files from $(PYTHON_EXTBUILD)lib/ are
    carried over as the resulting binaries.

    The above system fails when we add source files relative to ../lib, as
    we do for:

    ../lib/bitmap.c
    ../lib/find_bit.c
    ../lib/hweight.c
    ../lib/rbtree.c

    All the above objects will be built as:

    $(PYTHON_EXTBUILD)/tmp/../lib/bitmap.o
    $(PYTHON_EXTBUILD)/tmp/../lib/find_bit.o
    $(PYTHON_EXTBUILD)/tmp/../lib/hweight.o
    $(PYTHON_EXTBUILD)/tmp/../lib/rbtree.o

    which accidentally happens to be the final library path:

    $(PYTHON_EXTBUILD)/lib/

    Change setup.py to pass full paths of the source files to the Extension
    build class, and thus keep all built objects under the
    $(PYTHON_EXTBUILD)tmp directory.

    Reported-by: Jeff Bastian
    Signed-off-by: Jiri Olsa
    Tested-by: Josh Boyer
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: stable@vger.kernel.org # v4.2+
    Link: http://lkml.kernel.org/r/20160227201350.GB28494@krava.redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Required to use it in modular perf drivers.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221012.930735780@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • RAPL is a per package facility and we already have a mechanism for a
    dedicated per package reader. So there is no point in having multiple
    CPUs doing the same thing. The current implementation actually starts
    two timers on two CPUs if one does:

    perf stat -C1,2 -e power/energy-pkg ....

    which makes the whole concept of 1 reader per package moot.

    What's worse is that the above returns double the actual energy
    consumption, but that's a different problem to address and cannot be
    solved by removing the pointless per-cpuness of that mechanism.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221012.845369524@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Store the PMU pointer in event->pmu_private and use it instead of the
    per CPU data. This is a preparatory step for getting rid of the per CPU
    allocations. The usage sites are in the perf fast path, so we keep this
    even after the conversion to per package storage, as a CPU to package
    lookup involves 3 loads versus 1 with the pmu_private pointer.
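
    Roughly, the change amounts to the following (a sketch under stated
    assumptions: rapl_pmu_for_cpu() is a hypothetical lookup helper, only
    event->pmu_private and struct rapl_pmu come from the core/driver):

    #include <linux/perf_event.h>

    struct rapl_pmu;                                   /* driver-internal type */
    extern struct rapl_pmu *rapl_pmu_for_cpu(int cpu); /* hypothetical lookup */

    /* at event init time, resolve the RAPL PMU instance once and cache it */
    static int rapl_event_init_sketch(struct perf_event *event)
    {
            struct rapl_pmu *pmu = rapl_pmu_for_cpu(event->cpu);

            event->pmu_private = pmu;    /* one load in the fast path later */
            return 0;
    }

    /* in the counter update fast path, use the cached pointer instead of
     * going through the per CPU data */
    static void rapl_event_update_sketch(struct perf_event *event)
    {
            struct rapl_pmu *pmu = event->pmu_private;

            /* ... use pmu (lock, timer, counters) here ... */
            (void)pmu;
    }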

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221012.748151799@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • This lock is taken in hard interrupt context even on Preempt-RT. Make it raw
    so RT does not have to patch it.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221012.669411833@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Split out code from init into separate functions. Tidy up the code and
    get rid of pointless comments. I wish there were comments for the code
    which is not obvious....

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221012.588544679@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The output is inconsistent. Use a proper pr_fmt prefix and split out
    the advertisement into a separate function.

    Remove the WARN_ON() in the failure case. It's pointless as we already know
    where it failed.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221012.504551295@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • No point in doing the same calculation over and over. Do it once in
    rapl_check_hw_unit().

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221012.409238136@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • There is no point in having a quirk machinery for a single possible
    function. Get rid of it and move the quirk to a place where it actually
    makes sense.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221012.311639465@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Like uncore, the RAPL driver lacks error handling. It leaks memory and
    leaves the hotplug notifier registered.

    Add the proper error checks, clean up the memory and register the
    hotplug notifier only on success.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221012.231222076@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The Knights Landings support added the events and the detection case, but then
    returns 0 without actually initializing the driver.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Dasaratharaman Chandramouli
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Fixes: 3a2a7797326a4 "perf/x86/intel/rapl: Add support for Knights Landing (KNL)"
    Link: http://lkml.kernel.org/r/20160222221012.149331888@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • CQM is a strict per package facility. Use the proper cpumasks to look
    up the readers.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221012.054916179@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Almost every cpumask function is exported, just not the one I need to make the
    Intel uncore driver modular.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: David S. Miller
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221011.878299859@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Andi wanted to do this before, but the patch fell through the cracks.
    Implement it with the proper error handling.

    Requested-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221011.799159968@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The only missing bit is to completely clear the hardware state on failure
    exit. This is now a pretty simple exercise.

    Undo the box->init_box() setup on all packages which have been initialized so
    far.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221011.702452407@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Uncore is a per package facility, but the code tries to mimic a per CPU
    facility with completely convoluted constructs.

    Simplify the whole machinery by tracking per package information. While at it,
    avoid the kfree/alloc dance when a CPU goes offline and online again. There is
    no point in freeing the box after it was allocated. We just keep proper
    refcounting and the first CPU which comes online in a package does the
    initialization/activation of the box.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221011.622258933@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • For per package oriented services we must be able to rely on the number of CPU
    packages to be within bounds. Create a tracking facility, which

    - calculates the number of possible packages depending on nr_cpu_ids after boot

    - makes sure that the package id is within the number of possible packages. If
    the apic id is outside we map it to a logical package id if there is enough
    space available.

    Provide interfaces for drivers to query the mapping and do translations
    from physical to logical ids.
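
    A hedged sketch of how a per package facility might consume these
    interfaces (the pkg_state names are made up for illustration;
    topology_max_packages() and topology_logical_package_id() are the
    interfaces this change provides):

    #include <linux/errno.h>
    #include <linux/init.h>
    #include <linux/slab.h>
    #include <linux/topology.h>
    #include <linux/types.h>

    struct pkg_state { u64 energy; };      /* hypothetical per-package data */
    static struct pkg_state **pkg_state;   /* entries filled as packages appear */

    static int __init pkg_driver_init(void)
    {
            /* bounded by the number of possible packages, known after boot */
            pkg_state = kcalloc(topology_max_packages(), sizeof(*pkg_state),
                                GFP_KERNEL);
            return pkg_state ? 0 : -ENOMEM;
    }

    static struct pkg_state *pkg_state_for_cpu(unsigned int cpu)
    {
            /* translate a CPU to its logical (not physical/APIC) package id */
            return pkg_state[topology_logical_package_id(cpu)];
    }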

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Toshi Kani
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221011.541071755@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Store the PMU pointer in event->pmu_private, so we can get rid of the
    per CPU data storage.

    We keep it after converting to per package data, because a CPU to
    package lookup will be 3 loads versus one and these usage sites are
    in the perf fast path.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221011.460851335@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • For PMUs which are not per CPU, but e.g. per package/socket, we want to be
    able to store a reference to the underlying per package/socket facility in the
    event at init time so we can avoid magic storage constructs in the PMU driver.

    This allows us to get rid of the per CPU dance in the intel uncore and RAPL
    drivers and avoids a lookup of the per package data in the perf hotpath.
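
    In terms of the data structure, this amounts to one driver-private
    pointer in struct perf_event (excerpt-style sketch; the comment is
    mine, not taken from the header):

    struct perf_event {
            /* ... */
            struct pmu      *pmu;
            void            *pmu_private;  /* e.g. per package/socket state,
                                              set by the PMU at event init */
            /* ... */
    };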

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221011.364140369@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • No users outside of this file.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221011.285504825@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Clean up the code a bit before reworking it completely.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221011.204771538@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • When tearing down the boxes nothing undoes the hardware state which was
    set up by box->init_box(). Add a box->exit_box() callback and implement
    it for the uncores which have an init_box() callback.
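
    Conceptually (a sketch under assumptions about the driver-internal
    uncore structures, not the exact patch), the teardown side mirrors
    init_box():

    /* a new optional callback next to init_box() in the uncore ops: */
    struct intel_uncore_ops_sketch {
            void (*init_box)(struct intel_uncore_box *box);
            void (*exit_box)(struct intel_uncore_box *box);   /* new */
            /* ... */
    };

    /* called when a box is torn down, undoing what init_box() set up;
     * assumes the definitions from the driver's uncore header */
    static void uncore_box_exit_sketch(struct intel_uncore_box *box)
    {
            if (box->pmu->type->ops->exit_box)
                    box->pmu->type->ops->exit_box(box);
    }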

    This misses the cleanup in the error exit paths, but I cannot be
    bothered to implement it before cleaning up the rest of the driver,
    which makes that task way simpler.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221011.023930023@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The storage array is size limited, but lacks a sanity check.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221010.929967806@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • This driver lacks any form of proper error handling. If initialization
    fails or hotplug prepare fails, it leaves the facility around with half
    initialized stuff.

    Fix the state and memory leaks in a first step. As a second step we need to
    undo the hardware state which is set via uncore_box_init() on some of the
    uncore implementations.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221010.848880559@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • No point in doing partial rollbacks. Robustify uncore_exit_type() so it does
    not dereference type->pmus unconditionally and remove all the partial rollback
    hackery.

    Preparatory patch for proper error handling.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221010.751077467@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • uncore_cpumask_init() is only ever called from intel_uncore_init() where the
    mask is guaranteed to be empty.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221010.657326866@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Alexander volunteered to review perf (kernel) patches.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • BDX-DE and BDX-EP share the same uncore code path, but there is no SBOX
    in BDX-DE. This patch removes SBOX support for BDX-DE.

    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Tony Battersby
    Cc: Vince Weaver
    Link: http://lkml.kernel.org/r/37D7C6CF3E00A74B8858931C1DB2F0770589D336@SHSMSX103.ccr.corp.intel.com
    Signed-off-by: Ingo Molnar

    Kan Liang
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Linus Torvalds
     

28 Feb, 2016

1 commit

  • Pull perf fixes from Thomas Gleixner:
    "A rather largish series of 12 patches addressing a maze of race
    conditions in the perf core code from Peter Zijlstra"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf: Robustify task_function_call()
    perf: Fix scaling vs. perf_install_in_context()
    perf: Fix scaling vs. perf_event_enable()
    perf: Fix scaling vs. perf_event_enable_on_exec()
    perf: Fix ctx time tracking by introducing EVENT_TIME
    perf: Cure event->pending_disable race
    perf: Fix race between event install and jump_labels
    perf: Fix cloning
    perf: Only update context time when active
    perf: Allow perf_release() with !event->ctx
    perf: Do not double free
    perf: Close install vs. exit race

    Linus Torvalds