23 Nov, 2015

1 commit

This patch reinforces the lockdep checks performed by
    perf_cgroup_from_task() by passing the perf_event_context
    whenever possible. It is okay to not hold the RCU read lock
    when we know we hold the ctx->lock. This patch makes sure this
    property holds.

    In some functions, such as perf_cgroup_sched_in(), we do not
    pass the context because we are sure we are holding the RCU
    read lock.
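
    For illustration, a sketch of the shape such a check can take with
    the usual cgroup/RCU accessors; treat the wiring below as an
    assumption rather than the verbatim kernel code:

    /* Illustrative sketch: when a ctx is passed, holding ctx->lock
     * satisfies the lockdep condition, so the RCU read lock is not
     * required; without a ctx we fall back to requiring RCU. */
    static inline struct perf_cgroup *
    sketch_cgroup_from_task(struct task_struct *task,
                            struct perf_event_context *ctx)
    {
            return container_of(task_css_check(task, perf_event_cgrp_id,
                                    ctx ? lockdep_is_held(&ctx->lock) : true),
                                struct perf_cgroup, css);
    }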

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: edumazet@google.com
    Link: http://lkml.kernel.org/r/1447322404-10920-3-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

13 Sep, 2015

4 commits

We currently use the PERF_EVENT_TXN flag to determine if we are in the middle
    of a transaction. If in a transaction, we defer the schedulability checks
    from the pmu->add() operation to the pmu->commit_txn() operation.

    Now that we have "transaction types" (PERF_PMU_TXN_ADD, PERF_PMU_TXN_READ)
    we can use the type to determine if we are in a transaction and drop the
    PERF_EVENT_TXN flag.

    When PERF_EVENT_TXN is dropped, the cpuhw->group_flag on some architectures
    becomes unused, so drop that field as well.

This is an extension of the PowerPC patch from Peter Zijlstra to the
    s390, Sparc and x86 architectures.
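
    As a rough sketch (illustrative names, not the exact code of any of
    these ports), an architecture's pmu->add() can now key off the
    transaction type instead of a dedicated group flag:

    /* Sketch: defer the schedulability check when inside an ADD
     * transaction, keyed off cpuhw->txn_flags rather than the old
     * PERF_EVENT_TXN bit in cpuhw->group_flag. */
    static int sketch_pmu_add(struct perf_event *event, int flags)
    {
            struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);

            /* ... program the counter ... */

            if (cpuhw->txn_flags & PERF_PMU_TXN_ADD)
                    return 0;       /* checked later, in commit_txn() */

            return sketch_schedule_events(cpuhw);   /* check immediately */
    }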

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Peter Zijlstra (Intel)
Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: http://lkml.kernel.org/r/1441336073-22750-11-git-send-email-sukadev@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Sukadev Bhattiprolu
     
  • Define a new PERF_PMU_TXN_READ interface to read a group of counters
    at once.

pmu->start_txn()                    // Initialize before first event

    for each event in group
            pmu->read(event);       // Queue each event to be read

    rc = pmu->commit_txn()          // Read/update all queued counters

    Note that we use this interface with all PMUs. PMUs that implement this
    interface use the ->read() operation to _queue_ the counters to be read
    and use ->commit_txn() to actually read all the queued counters at once.

    PMUs that don't implement PERF_PMU_TXN_READ ignore ->start_txn() and
    ->commit_txn() and continue to read counters one at a time.
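
    A sketch of the driver side, assuming a per-cpu queue (all names
    here are illustrative): ->read() only queues the event while a READ
    transaction is open, and ->commit_txn() performs one combined
    readout.

    static void sketch_pmu_read(struct perf_event *event)
    {
            struct sketch_txn *txn = this_cpu_ptr(&sketch_txn_state);

            if (txn->flags & PERF_PMU_TXN_READ) {
                    txn->queued[txn->nr++] = event; /* defer the readout */
                    return;
            }
            sketch_read_one(event);                 /* legacy single read */
    }

    static int sketch_pmu_commit_txn(struct pmu *pmu)
    {
            struct sketch_txn *txn = this_cpu_ptr(&sketch_txn_state);
            int i, rc;

            if (!(txn->flags & PERF_PMU_TXN_READ))
                    return 0;

            rc = sketch_read_many(txn->queued, txn->nr); /* one readout */
            for (i = 0; !rc && i < txn->nr; i++)
                    local64_set(&txn->queued[i]->count,
                                sketch_result(txn, i));
            txn->nr = 0;
            return rc;
    }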

    Thanks to input from Peter Zijlstra.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Peter Zijlstra (Intel)
Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: http://lkml.kernel.org/r/1441336073-22750-9-git-send-email-sukadev@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Sukadev Bhattiprolu
     
Currently, the PMU interface allows reading only one counter at a time.
    But some PMUs, like the 24x7 counters in Power, support reading several
    counters at once. To leverage this functionality, extend the transaction
    interface to support a "transaction type".

The first type, PERF_PMU_TXN_ADD, refers to the existing transactions,
    i.e. those used to _schedule_ all the events on the PMU as a group. A
    second transaction type, PERF_PMU_TXN_READ, will be used in a follow-on
    patch by the 24x7 counters to read several counters at once.

    Extend the transaction interfaces to the PMU to accept a 'txn_flags'
    parameter and use this parameter to ignore any transactions that are
    not of type PERF_PMU_TXN_ADD.
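
    A minimal sketch of the resulting shape of start_txn(), close to
    what the powerpc port does (field names are illustrative):

    static void sketch_pmu_start_txn(struct pmu *pmu, unsigned int txn_flags)
    {
            struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);

            cpuhw->txn_flags = txn_flags;   /* remember the type */
            if (txn_flags & ~PERF_PMU_TXN_ADD)
                    return;                 /* not an ADD txn: no-op */

            perf_pmu_disable(pmu);          /* existing ADD setup */
            cpuhw->n_txn_start = cpuhw->n_events;
    }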

    Thanks to Peter Zijlstra for his input.

    Signed-off-by: Sukadev Bhattiprolu
    [peterz: s390 compile fix]
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Michael Ellerman
Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: http://lkml.kernel.org/r/1441336073-22750-3-git-send-email-sukadev@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Sukadev Bhattiprolu
     
  • Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

10 Aug, 2015

1 commit

This patch adds three core perf APIs:
    - perf_event_attrs(): export the struct perf_event_attr from a struct
    perf_event;
    - perf_event_get(): get the struct perf_event from the given fd;
    - perf_event_read_local(): read an event's counter while it is active
    on the current CPU.
    These APIs are needed when accessing event counters in eBPF programs.
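
    A sketch of how a kernel-side consumer might chain these APIs
    (signatures as introduced by this patch; reference handling is
    elided, so treat this as illustrative only):

    static u64 sketch_read_perf_fd(unsigned int fd)
    {
            struct perf_event *event;
            const struct perf_event_attr *attr;

            event = perf_event_get(fd);          /* fd -> struct perf_event */
            if (IS_ERR(event))
                    return 0;

            attr = perf_event_attrs(event);      /* inspect the attr */
            if (IS_ERR(attr) || attr->inherit)
                    return 0;

            return perf_event_read_local(event); /* active on this CPU */
    }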

The perf_event_read_local() API comes from Peter, and I have added the
    corresponding SOB.

    Signed-off-by: Kaixu Xia
    Signed-off-by: Peter Zijlstra
    Signed-off-by: David S. Miller

    Kaixu Xia
     

27 Jun, 2015

2 commits

  • Pull tracing updates from Steven Rostedt:
    "This patch series contains several clean ups and even a new trace
    clock "monitonic raw". Also some enhancements to make the ring buffer
    even faster. But the biggest and most noticeable change is the
    renaming of the ftrace* files, structures and variables that have to
    deal with trace events.

    Over the years I've had several developers tell me about their
    confusion with what ftrace is compared to events. Technically,
    "ftrace" is the infrastructure to do the function hooks, which include
    tracing and also helps with live kernel patching. But the trace
    events are a separate entity altogether, and the files that affect the
    trace events should not be named "ftrace". These include:

    include/trace/ftrace.h -> include/trace/trace_events.h
    include/linux/ftrace_event.h -> include/linux/trace_events.h

    Also, functions that are specific for trace events have also been renamed:

    ftrace_print_*() -> trace_print_*()
    (un)register_ftrace_event() -> (un)register_trace_event()
    ftrace_event_name() -> trace_event_name()
    ftrace_trigger_soft_disabled() -> trace_trigger_soft_disabled()
    ftrace_define_fields_##call() -> trace_define_fields_##call()
    ftrace_get_offsets_##call() -> trace_get_offsets_##call()

    Structures have been renamed:

    ftrace_event_file -> trace_event_file
    ftrace_event_{call,class} -> trace_event_{call,class}
    ftrace_event_buffer -> trace_event_buffer
    ftrace_subsystem_dir -> trace_subsystem_dir
    ftrace_event_raw_##call -> trace_event_raw_##call
    ftrace_event_data_offset_##call-> trace_event_data_offset_##call
    ftrace_event_type_funcs_##call -> trace_event_type_funcs_##call

    And a few various variables and flags have also been updated.

    This has been sitting in linux-next for some time, and I have not
    heard a single complaint about this rename breaking anything. Mostly
    because these functions, variables and structures are mostly internal
    to the tracing system and are seldom (if ever) used by anything
    external to that"

    * tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
    ring_buffer: Allow to exit the ring buffer benchmark immediately
    ring-buffer-benchmark: Fix the wrong type
    ring-buffer-benchmark: Fix the wrong param in module_param
    ring-buffer: Add enum names for the context levels
    ring-buffer: Remove useless unused tracing_off_permanent()
    ring-buffer: Give NMIs a chance to lock the reader_lock
    ring-buffer: Add trace_recursive checks to ring_buffer_write()
    ring-buffer: Allways do the trace_recursive checks
    ring-buffer: Move recursive check to per_cpu descriptor
    ring-buffer: Add unlikelys to make fast path the default
    tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call()
    tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call()
    tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call
    tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call
    tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
    tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()
    tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
    tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
    tracing: Rename ftrace_event_name() to trace_event_name()
    tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX
    ...

    Linus Torvalds
     
  • Pull ARM updates from Russell King:
    "Bigger items included in this update are:

    - A series of updates from Arnd for ARM randconfig build failures
    - Updates from Dmitry for StrongARM SA-1100 to move IRQ handling to
    drivers/irqchip/
- Move ARM's SP804 timer to drivers/clocksource/
    - Perf updates from Mark Rutland in preparation for moving the ARM perf
    code into drivers/ so it can be shared with ARM64.
    - MCPM updates from Nicolas
    - Add support for taking platform serial number from DT
    - Re-implement Keystone2 physical address space switch to conform to
    architecture requirements
    - Clean up ARMv7 LPAE code, which goes in hand with the Keystone2
    changes.
- L2C cleanups to avoid unlocking caches if the secure support
    prevents us from unlocking.
    - Avoid cleaning a potentially dirty cache containing stale data on
    CPU initialisation
    - Add ARM-only entry point for secondary startup (for machines that
    can only call into a Thumb kernel in ARM mode). Same thing is also
    done for the resume entry point.
    - Provide arch_irqs_disabled via asm-generic
    - Enlarge ARMv7M vector table
    - Always use BFD linker for VDSO, as gold doesn't accept some of the
    options we need.
    - Fix an incorrect BSYM (for Thumb symbols) usage, and convert all
    BSYM compiler macros to a "badr" (for branch address).
    - Shut up compiler warnings provoked by our cmpxchg() implementation.
    - Ensure bad xchg sizes fail to link"

    * 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (75 commits)
    ARM: Fix build if CLKDEV_LOOKUP is not configured
    ARM: fix new BSYM() usage introduced via for-arm-soc branch
    ARM: 8383/1: nommu: avoid deprecated source register on mov
    ARM: 8391/1: l2c: add options to overwrite prefetching behavior
    ARM: 8390/1: irqflags: Get arch_irqs_disabled from asm-generic
    ARM: 8387/1: arm/mm/dma-mapping.c: Add arm_coherent_dma_mmap
    ARM: 8388/1: tcm: Don't crash when TCM banks are protected by TrustZone
    ARM: 8384/1: VDSO: force use of BFD linker
    ARM: 8385/1: VDSO: group link options
    ARM: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations
    ARM: remove __bad_xchg definition
    ARM: 8369/1: ARMv7M: define size of vector table for Vybrid
    ARM: 8382/1: clocksource: make ARM_TIMER_SP804 depend on GENERIC_SCHED_CLOCK
    ARM: 8366/1: move Dual-Timer SP804 driver to drivers/clocksource
    ARM: 8365/1: introduce sp804_timer_disable and remove arm_timer.h inclusion
    ARM: 8364/1: fix BE32 module loading
    ARM: 8360/1: add secondary_startup_arm prototype in header file
    ARM: 8359/1: correct secondary_startup_arm mode
    ARM: proc-v7: sanitise and document registers around errata
    ARM: proc-v7: clean up MIDR access
    ...

    Linus Torvalds
     

23 Jun, 2015

1 commit

  • Pull timer updates from Thomas Gleixner:
    "A rather largish update for everything time and timer related:

    - Cache footprint optimizations for both hrtimers and timer wheel

    - Lower the NOHZ impact on systems which have NOHZ or timer migration
    disabled at runtime.

    - Optimize run time overhead of hrtimer interrupt by making the clock
    offset updates smarter

    - hrtimer cleanups and removal of restrictions to tackle some
    problems in sched/perf

    - Some more leap second tweaks

    - Another round of changes addressing the 2038 problem

    - First step to change the internals of clock event devices by
    introducing the necessary infrastructure

    - Allow constant folding for usecs/msecs_to_jiffies()

    - The usual pile of clockevent/clocksource driver updates

    The hrtimer changes contain updates to sched, perf and x86 as they
    depend on them plus changes all over the tree to cleanup API changes
    and redundant code, which got copied all over the place. The y2038
changes touch s390 to remove the last non-2038-safe code related to
    the boot/persistent clock"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
    clocksource: Increase dependencies of timer-stm32 to limit build wreckage
    timer: Minimize nohz off overhead
    timer: Reduce timer migration overhead if disabled
    timer: Stats: Simplify the flags handling
    timer: Replace timer base by a cpu index
    timer: Use hlist for the timer wheel hash buckets
    timer: Remove FIFO "guarantee"
    timers: Sanitize catchup_timer_jiffies() usage
    hrtimer: Allow hrtimer::function() to free the timer
    seqcount: Introduce raw_write_seqcount_barrier()
    seqcount: Rename write_seqcount_barrier()
    hrtimer: Fix hrtimer_is_queued() hole
    hrtimer: Remove HRTIMER_STATE_MIGRATE
    selftest: Timers: Avoid signal deadlock in leap-a-day
    timekeeping: Copy the shadow-timekeeper over the real timekeeper last
    clockevents: Check state instead of mode in suspend/resume path
    selftests: timers: Add leap-second timer edge testing to leap-a-day.c
    ntp: Do leapsecond adjustment in adjtimex read path
    time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
    ntp: Introduce and use SECS_PER_DAY macro instead of 86400
    ...

    Linus Torvalds
     

07 Jun, 2015

2 commits

After enlarging the PEBS interrupt threshold, there may be some mixed-up
    PEBS samples which are discarded by the kernel.

    This patch makes the kernel emit a PERF_RECORD_LOST_SAMPLES record with
    the number of possible discarded records when it is impossible to demux
    the samples.

    It makes sure the user is not left in the dark about such discards.
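
    For a userspace reader, the record is small; a sketch of its layout
    and handling (the struct spelling below is illustrative, the field
    layout follows the perf ABI):

    #include <stdio.h>
    #include <linux/perf_event.h>

    struct lost_samples_event {
            struct perf_event_header header; /* .type == PERF_RECORD_LOST_SAMPLES */
            __u64 lost;                      /* number of discarded samples */
            /* followed by sample_id fields, if requested */
    };

    static void sketch_handle_record(struct perf_event_header *hdr)
    {
            if (hdr->type == PERF_RECORD_LOST_SAMPLES) {
                    struct lost_samples_event *e = (void *)hdr;
                    fprintf(stderr, "lost %llu samples\n",
                            (unsigned long long)e->lost);
            }
    }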

    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: acme@infradead.org
    Cc: eranian@google.com
    Link: http://lkml.kernel.org/r/1431285195-14269-8-git-send-email-kan.liang@intel.com
    Signed-off-by: Ingo Molnar

    Kan Liang
     
  • When the PEBS interrupt threshold is larger than one record and the
    machine supports multiple PEBS events, the records of these events are
    mixed up and we need to demultiplex them.

    Demuxing the records is hard because the hardware is deficient. The
    hardware has two issues that, when combined, create impossible
    scenarios to demux.

    The first issue is that the 'status' field of the PEBS record is a copy
    of the GLOBAL_STATUS MSR at PEBS assist time. To see why this is a
    problem let us first describe the regular PEBS cycle:

    A) the CTRn value reaches 0:
    - the corresponding bit in GLOBAL_STATUS gets set
    - we start arming the hardware assist
    < some unspecified amount of time later -- this could cover multiple
    events of interest >

    B) the hardware assist is armed, any next event will trigger it

    C) a matching event happens:
    - the hardware assist triggers and generates a PEBS record
    this includes a copy of GLOBAL_STATUS at this moment
    - if we auto-reload we (re)set CTRn
    - we clear the relevant bit in GLOBAL_STATUS

    Now consider the following chain of events:

    A0, B0, A1, C0

The event generated for counter 0 will include a status with counter 1
    set, even though it's not at all related to the record. A similar thing
    can happen with a !PEBS event if it just happens to overflow at the
    right moment.

The second issue is that the hardware will only emit one record for two
    or more counters if the events that trigger the assists are 'close'
    together. 'Close' can mean several cycles; in some cases it can even
    span the complete assist, if the event is something that doesn't need
    retirement.

    For instance, consider this chain of events:

    A0, B0, A1, B1, C01

Where C01 is an event that triggers both hardware assists, we will
    generate only a single record, but again with both counters listed in
    the status field.

    This time the record pertains to both events.

Note that these two cases are different but indistinguishable from the
    data as generated. Therefore demuxing records with multiple PEBS bits
    (we can safely ignore status bits for !PEBS counters) is impossible.

    Furthermore we cannot emit the record to both events because that might
    cause a data leak -- the events might not have the same privileges -- so
    what this patch does is discard such events.

    The assumption/hope is that such discards will be rare.

Here are some possible ways you may get a high discard rate (a sketch
    of the discard decision follows the list):

- when you count the same thing multiple times. But this is not a useful
    configuration.
    - you can be unfortunate if you measure with a userspace-only PEBS
    event along with either a kernel or unrestricted PEBS event. Imagine
    the event triggering and setting the overflow flag right before
    entering the kernel. Then all kernel-side events will end up with
    multiple bits set.
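
    Sketch of the discard decision described above (not the exact kernel
    code): restrict the record's status to PEBS-enabled counters and
    deliver only when exactly one bit remains.

    static struct perf_event *
    sketch_demux(u64 pebs_status, u64 pebs_enabled,
                 struct perf_event *events[], u64 *lost)
    {
            u64 bits = pebs_status & pebs_enabled;  /* drop !PEBS bits */

            if (hweight64(bits) != 1) {
                    (*lost)++;              /* ambiguous: discard record */
                    return NULL;
            }
            return events[__ffs(bits)];     /* unique owner: deliver */
    }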

    Signed-off-by: Yan, Zheng
    Signed-off-by: Kan Liang
    [ Changelog improvements. ]
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: acme@infradead.org
    Cc: eranian@google.com
    Link: http://lkml.kernel.org/r/1430940834-8964-4-git-send-email-kan.liang@intel.com
    Signed-off-by: Ingo Molnar

    Yan, Zheng
     

27 May, 2015

4 commits

  • In certain circumstances it may not be possible to schedule particular
    events due to constraints other than a lack of hardware counters (e.g.
    on big.LITTLE systems where CPUs support different events). The core
    perf event code does not distinguish these cases and pessimistically
    assumes that any failure to schedule an event means that it is not worth
    attempting to schedule later events, even if some hardware counters are
    still unused.

When an event that a pmu cannot schedule exists in a flexible group list
    it can unnecessarily prevent the event groups that follow it in the list
    from being scheduled (until it is rotated to the end of the list). This
    means some events are scheduled for only a portion of the time that they
    could be, and for short-running programs no events may be scheduled if
    the list is initially sorted in an unfortunate order.

    This patch adds a new (optional) filter_match function pointer to struct
    pmu which a pmu driver can use to tell perf core when an event matches
    pmu-specific scheduling requirements. This plugs into the existing
    event_filter_match logic, and makes it possible to avoid the scheduling
    problem described above. When no filter is provided by the PMU, the
    existing behaviour is retained.
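
    A sketch of what such a driver callback can look like, along the
    lines of a big.LITTLE PMU that only counts on its own cluster (names
    are illustrative):

    static int sketch_pmu_filter_match(struct perf_event *event)
    {
            struct sketch_pmu *spmu = to_sketch_pmu(event->pmu);

            /* Match only when this CPU belongs to the PMU's cluster. */
            return cpumask_test_cpu(smp_processor_id(),
                                    &spmu->supported_cpus);
    }

    /* registration: spmu->pmu.filter_match = sketch_pmu_filter_match;
     * when the pointer is NULL, core behaves as before (always match). */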

    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Acked-by: Will Deacon
    Acked-by: Peter Zijlstra
    Signed-off-by: Mark Rutland
    Signed-off-by: Will Deacon

    Mark Rutland
     
  • 'int' is really not a proper data type for an MSR. Use u32 to make it
    clear that we are dealing with a 32-bit unsigned hardware value.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Matt Fleming
    Cc: Kanaka Juvva
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Vikas Shivappa
    Cc: Will Auld
    Link: http://lkml.kernel.org/r/20150518235149.919350144@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
Commit 43b4578071c0 ("perf/x86: Reduce stack usage of
    x86_schedule_events()") violated the rule that 'fake' scheduling, as
    used for event/group validation, should not change the event state.

This went mostly unnoticed because repeated calls of
    x86_pmu::get_event_constraints() would give the same result. And
    x86_pmu::put_event_constraints() would mostly not do anything.

    Commit e979121b1b15 ("perf/x86/intel: Implement cross-HT corruption
    bug workaround") made the situation much worse by actually setting the
    event->hw.constraint value to NULL, so when validation and actual
    scheduling interact we get NULL ptr derefs.

Fix it by removing the constraint pointer from the event and moving it
    back to an array, this time in cpuc instead of on the stack.

validate_group()
      x86_schedule_events()
        event->hw.constraint = c;             # store

            <context switch>
            perf_task_event_sched_in()
              ...
              x86_schedule_events();
                event->hw.constraint = c2;    # store

                ...

                put_event_constraints(event); # assume failure to schedule
                  intel_put_event_constraints()
                    event->hw.constraint = NULL;
            <context switch end>

        c = event->hw.constraint;             # read -> NULL

        if (!test_bit(hwc->idx, c->idxmsk))   # NULL pointer dereference

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Hunter
    Cc: Linus Torvalds
    Cc: Maria Dimakopoulou
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 43b4578071c0 ("perf/x86: Reduce stack usage of x86_schedule_events()")
    Fixes: e979121b1b15 ("perf/x86/intel: Implement cross-HT corruption bug workaround")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

18 May, 2015

1 commit

  • In the below two commits (see Fixes) we have periodic timers that can
    stop themselves when they're no longer required, but need to be
    (re)-started when their idle condition changes.

A further complication is that we want the timer handler to always do
    the forward such that it will always correctly deal with the overruns,
    and we do not want to race such that the handler has already decided
    to stop, but the (external) restart sees the timer still active and we
    end up with a 'lost' timer.

    The problem with the current code is that the re-start can come before
    the callback does the forward, at which point the forward from the
    callback will WARN about forwarding an enqueued timer.

Now, conceptually it's easy to detect if you're before or after the fwd
    by comparing the expiration time against the current time. Of course,
    that's expensive (and racy) because we don't have the current time.

    Alternatively one could cache this state inside the timer, but then
    everybody pays the overhead of maintaining this extra state, and that
    is undesired.

    The only other option that I could see is the external timer_active
    variable, which I tried to kill before. I would love a nicer interface
    for this seemingly simple 'problem' but alas.
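
    A sketch of the resulting pattern, with the external 'active' flag
    guarded by the same lock on both paths (illustrative names):

    /* Handler: always forward first, then decide to stop or restart. */
    static enum hrtimer_restart sketch_timer_fn(struct hrtimer *timer)
    {
            struct sketch *s = container_of(timer, struct sketch, timer);
            u64 overruns = hrtimer_forward_now(timer, s->period);
            int idle;

            raw_spin_lock(&s->lock);
            sketch_account(s, overruns);
            idle = sketch_is_idle(s);
            if (idle)
                    s->timer_active = 0;    /* external path may restart */
            raw_spin_unlock(&s->lock);

            return idle ? HRTIMER_NORESTART : HRTIMER_RESTART;
    }

    /* External (re)start: only start if the handler really stopped. */
    static void sketch_timer_start(struct sketch *s)
    {
            lockdep_assert_held(&s->lock);
            if (s->timer_active)
                    return;                 /* queued: handler will forward */
            s->timer_active = 1;
            hrtimer_start_expires(&s->timer, HRTIMER_MODE_ABS_PINNED);
    }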

    Fixes: 272325c4821f ("perf: Fix mux_interval hrtimer wreckage")
    Fixes: 77a4d1a1b9a1 ("sched: Cleanup bandwidth timers")
    Cc: pjt@google.com
    Cc: tglx@linutronix.de
    Cc: klamm@yandex-team.ru
    Cc: mingo@kernel.org
    Cc: bsegall@google.com
    Cc: hpa@zytor.com
    Cc: Sasha Levin
    Signed-off-by: Peter Zijlstra (Intel)
    Link: http://lkml.kernel.org/r/20150514102311.GX21418@twins.programming.kicks-ass.net

    Peter Zijlstra
     


08 May, 2015

1 commit

Stephane asked about PERF_COUNT_SW_CPU_MIGRATIONS and I realized it
    was broken:

> The problem is that the task isn't actually scheduled while it's being
    > migrated (obviously), and if it's not scheduled, the counters aren't
    > scheduled either, so there's no observing of the fact.
    >
    > A further problem with migrations is that many migrations happen from
    > softirq context, which is nested inside the 'random' task context of
    > whomever happens to run at that time, similarly for the wakeup
    > migrations triggered from (soft)irq context. All those end up being
    > accounted in the task that's currently running, e.g. your 'ls'.

    The below cures this by marking a task as migrated and accounting it
    on the subsequent sched_in().
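
    A sketch of that shape (helper and flag names are assumptions based
    on the patch description, not verbatim kernel code):

    /* On migrate (may run in (soft)irq context): just set a flag. */
    static inline void sketch_event_task_migrate(struct task_struct *task)
    {
            task->sched_migrated = 1;       /* safe from (soft)irq context */
    }

    /* On the next sched_in, account the migration to the task itself. */
    static inline void sketch_sched_in(struct task_struct *task)
    {
            if (task->sched_migrated) {
                    perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0);
                    task->sched_migrated = 0;
            }
    }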

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

02 Apr, 2015

6 commits

  • For counters that generate AUX data that is bound to the context of a
    running task, such as instruction tracing, the decoder needs to know
    exactly which task is running when the event is first scheduled in,
    before the first sched_switch. The decoder's need to know this stems
    from the fact that instruction flow trace decoding will almost always
require the program's object code in order to reconstruct said flow, and
    for that we need at least its pid/tid in the perf stream.

To single out such instruction tracing pmus, this patch introduces the
    ITRACE PMU capability. The reason this is not part of the RECORD_AUX
    record is that not all pmus capable of generating AUX data need it,
    and the opposite is *probably* also true.

While sched_switch covers most cases, there are two problems with it:
    the consumer will need to process events out of order (that is, having
    found RECORD_AUX, it will have to skip forward to the nearest sched_switch
    to figure out which task it was, then go back to the actual trace to
    decode it), and it completely misses the case when the tracing is enabled
    and disabled before sched_switch, for example, via PERF_EVENT_IOC_DISABLE.

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kaixu Xia
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: acme@infradead.org
    Cc: adrian.hunter@intel.com
    Cc: kan.liang@intel.com
    Cc: markus.t.metzger@intel.com
    Cc: mathieu.poirier@linaro.org
    Link: http://lkml.kernel.org/r/1421237903-181015-15-git-send-email-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     
  • For pmus that wish to write data to ring buffer's AUX area, provide
    perf_aux_output_{begin,end}() calls to initiate/commit data writes,
    similarly to perf_output_{begin,end}. These also use the same output
    handle structure. Also, similarly to software counterparts, these
    will direct inherited events' output to parents' ring buffers.

After perf_aux_output_begin() returns successfully, handle->size
    is set to the maximum amount of data that can be written with respect
    to the aux_tail pointer, so that no data the user hasn't seen will be
    overwritten; therefore this should always be called before hardware
    writing is enabled. On success, this returns the pointer to the pmu
    driver's private structure allocated for this aux area by
    pmu::setup_aux. The same pointer can also be retrieved using
    perf_get_aux() while hardware writing is enabled.

The PMU driver should pass the actual amount of data written as a
    parameter to perf_aux_output_end(). All hardware writes should be
    completed and visible before this one is called.

    Additionally, perf_aux_output_skip() will adjust output handle and
    aux_head in case some part of the buffer has to be skipped over to
    maintain hardware's alignment constraints.

    Nested writers are forbidden and guards are in place to catch such
    attempts.
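
    A sketch of a driver's flush path using the handle (signatures as
    described above; the driver names are illustrative):

    static void sketch_pmu_flush(struct perf_event *event)
    {
            struct perf_output_handle handle;
            struct sketch_buf *buf;
            unsigned long written;

            buf = perf_aux_output_begin(&handle, event);
            if (!buf)
                    return;         /* no buffer or no room: nothing to do */

            /* drain at most handle.size bytes of hardware trace data */
            written = sketch_drain_hw(buf, handle.size);

            /* all hardware writes must be visible before this call */
            perf_aux_output_end(&handle, written, written < handle.size);
    }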

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kaixu Xia
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: acme@infradead.org
    Cc: adrian.hunter@intel.com
    Cc: kan.liang@intel.com
    Cc: markus.t.metzger@intel.com
    Cc: mathieu.poirier@linaro.org
    Link: http://lkml.kernel.org/r/1421237903-181015-8-git-send-email-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     
  • Usually, pmus that do, for example, instruction tracing, would only ever
    be able to have one event per task per cpu (or per perf_event_context). For
    such pmus it makes sense to disallow creating conflicting events early on,
    so as to provide consistent behavior for the user.

    This patch adds a pmu capability that indicates such constraint on event
    creation.

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kaixu Xia
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: acme@infradead.org
    Cc: adrian.hunter@intel.com
    Cc: kan.liang@intel.com
    Cc: markus.t.metzger@intel.com
    Cc: mathieu.poirier@linaro.org
    Link: http://lkml.kernel.org/r/1422613866-113186-1-git-send-email-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     
  • For pmus that don't support scatter-gather for AUX data in hardware, it
    might still make sense to implement software double buffering to avoid
losing data while the user is reading data out. For this purpose, add
    a pmu capability that guarantees multiple high-order chunks for the AUX
    buffer, so that the pmu driver can do switchover tricks.

    To make use of this feature, add PERF_PMU_CAP_AUX_SW_DOUBLEBUF to your
    pmu's capability mask. This will make the ring buffer AUX allocation code
    ensure that the biggest high order allocation for the aux buffer pages is
    no bigger than half of the total requested buffer size, thus making sure
    that the buffer has at least two high order allocations.
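
    Opting in is a one-liner at pmu registration; a sketch, here
    combined with the no-scatter-gather capability described in the next
    entry (assuming the PERF_PMU_CAP_AUX_NO_SG flag name):

    /* Sketch: request contiguous high-order AUX chunks plus software
     * double-buffering in the driver's pmu registration. */
    sketch_pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG |
                              PERF_PMU_CAP_AUX_SW_DOUBLEBUF;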

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kaixu Xia
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: acme@infradead.org
    Cc: adrian.hunter@intel.com
    Cc: kan.liang@intel.com
    Cc: markus.t.metzger@intel.com
    Cc: mathieu.poirier@linaro.org
    Link: http://lkml.kernel.org/r/1421237903-181015-5-git-send-email-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     
  • Some pmus (such as BTS or Intel PT without multiple-entry ToPA capability)
    don't support scatter-gather and will prefer larger contiguous areas for
    their output regions.

    This patch adds a new pmu capability to request higher order allocations.

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kaixu Xia
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: acme@infradead.org
    Cc: adrian.hunter@intel.com
    Cc: kan.liang@intel.com
    Cc: markus.t.metzger@intel.com
    Cc: mathieu.poirier@linaro.org
    Link: http://lkml.kernel.org/r/1421237903-181015-4-git-send-email-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     
  • This patch introduces "AUX space" in the perf mmap buffer, intended for
    exporting high bandwidth data streams to userspace, such as instruction
    flow traces.

    AUX space is a ring buffer, defined by aux_{offset,size} fields in the
    user_page structure, and read/write pointers aux_{head,tail}, which abide
    by the same rules as data_* counterparts of the main perf buffer.

In order to allocate/mmap AUX, userspace needs to set aux_offset to
    an offset greater than data_offset+data_size, and aux_size to the
    desired buffer size. Both need to be page aligned. Then the same
    aux_offset and aux_size should be passed to the mmap() call, and
    if everything adds up, you should have an AUX buffer as a result.

Pages that are mapped into this buffer also count against the user's
    mlock rlimit plus the perf_event_mlock_kb allowance.
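
    A userspace sketch of that setup (error handling trimmed; 'fd' is
    assumed to be a perf_event_open() fd for an AUX-capable event):

    #include <sys/mman.h>
    #include <unistd.h>
    #include <linux/perf_event.h>

    static void *map_aux(int fd, size_t data_pages, size_t aux_pages)
    {
            size_t page = sysconf(_SC_PAGESIZE);
            struct perf_event_mmap_page *up;

            /* user page + data area must be mapped first */
            up = mmap(NULL, (data_pages + 1) * page,
                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (up == MAP_FAILED)
                    return NULL;

            up->aux_offset = (data_pages + 1) * page; /* past the data area */
            up->aux_size   = aux_pages * page;        /* page aligned */

            /* the same offset/size are then passed to mmap() */
            return mmap(NULL, up->aux_size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, up->aux_offset);
    }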

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Alexander Shishkin
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kaixu Xia
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: acme@infradead.org
    Cc: adrian.hunter@intel.com
    Cc: kan.liang@intel.com
    Cc: markus.t.metzger@intel.com
    Cc: mathieu.poirier@linaro.org
    Link: http://lkml.kernel.org/r/1421237903-181015-3-git-send-email-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

27 Mar, 2015

2 commits

While thinking about the whole clock discussion it occurred to me that
    we have two distinct uses of time:

    1) the tracking of event/ctx/cgroup enabled/running/stopped times
    which includes the self-monitoring support in struct
    perf_event_mmap_page.

    2) the actual timestamps visible in the data records.

    And we've been conflating them.

The first is all about tracking time deltas; nobody should really care
    in what time base that happens, it's all relative information, and as
    long as it's internally consistent it works.

    The second however is what people are worried about when having to
    merge their data with external sources. And here we have the
    discussion on MONOTONIC vs MONOTONIC_RAW etc..

Where MONOTONIC is good for correlating between machines (static
    offset), MONOTONIC_RAW is required for correlating against a fixed-rate
    hardware clock.

This means configurability; now 1) makes that hard because it needs to
    be internally consistent across groups of unrelated events, which is
    why we had to have a global perf_clock().

    However, for 2) it doesn't really matter, perf itself doesn't care
    what it writes into the buffer.

    The below patch makes the distinction between these two cases by
    adding perf_event_clock() which is used for the second case. It
    further makes this configurable on a per-event basis, but adds a few
    sanity checks such that we cannot combine events with different clocks
    in confusing ways.

    And since we then have per-event configurability we might as well
    retain the 'legacy' behaviour as a default.
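
    For the user-visible side, a sketch of opting in to a specific clock
    via the new attr fields (with use_clockid clear, the legacy
    behaviour is kept):

    #include <time.h>
    #include <linux/perf_event.h>

    struct perf_event_attr attr = {
            .type        = PERF_TYPE_HARDWARE,
            .config      = PERF_COUNT_HW_CPU_CYCLES,
            .use_clockid = 1,                   /* opt in to a chosen clock */
            .clockid     = CLOCK_MONOTONIC_RAW, /* fixed-rate HW correlation */
    };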

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: John Stultz
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

23 Mar, 2015

1 commit

  • The only reason CQM had to use a hard-coded pmu type was so it could use
    cqm_target in hw_perf_event.

    Do away with the {tp,bp,cqm}_target pointers and provide a non type
    specific one.

    This allows us to do away with that silly pmu type as well.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Vince Weaver
    Cc: acme@kernel.org
    Cc: acme@redhat.com
    Cc: hpa@zytor.com
    Cc: jolsa@redhat.com
    Cc: kanaka.d.juvva@intel.com
    Cc: matt.fleming@intel.com
    Cc: tglx@linutronix.de
    Cc: torvalds@linux-foundation.org
    Cc: vikas.shivappa@linux.intel.com
    Link: http://lkml.kernel.org/r/20150305211019.GU21418@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     


25 Feb, 2015

4 commits

  • Add support for task events as well as system-wide events. This change
    has a big impact on the way that we gather LLC occupancy values in
    intel_cqm_event_read().

    Currently, for system-wide (per-cpu) events we defer processing to
    userspace which knows how to discard all but one cpu result per package.

    Things aren't so simple for task events because we need to do the value
    aggregation ourselves. To do this, we defer updating the LLC occupancy
    value in event->count from intel_cqm_event_read() and do an SMP
    cross-call to read values for all packages in intel_cqm_event_count().
    We need to ensure that we only do this for one task event per cache
    group, otherwise we'll report duplicate values.

    If we're a system-wide event we want to fallback to the default
    perf_event_count() implementation. Refactor this into a common function
    so that we don't duplicate the code.

    Also, introduce PERF_TYPE_INTEL_CQM, since we need a way to track an
    event's task (if the event isn't per-cpu) inside of the Intel CQM PMU
driver. This task information is only available in the upper layers of
    the perf infrastructure.

    Other perf backends stash the target task in event->hw.*target so we
    need to do something similar. The task is used to determine whether
    events should share a cache group and an RMID.

    Signed-off-by: Matt Fleming
    Signed-off-by: Peter Zijlstra (Intel)
Cc: Arnaldo Carvalho de Melo
    Cc: H. Peter Anvin
    Cc: Jiri Olsa
    Cc: Kanaka Juvva
    Cc: Linus Torvalds
    Cc: Vikas Shivappa
    Cc: linux-api@vger.kernel.org
    Link: http://lkml.kernel.org/r/1422038748-21397-8-git-send-email-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar

    Matt Fleming
     
Future Intel Xeon processors support a Cache QoS Monitoring feature that
    allows tracking of the LLC occupancy for a task or task group, i.e. the
    amount of data pulled into the LLC for the task (group).

    Currently the PMU only supports per-cpu events. We create an event for
    each cpu and read out all the LLC occupancy values.

    Because this results in duplicate values being written out to userspace,
    we also export a .per-pkg event file so that the perf tools only
    accumulate values for one cpu per package.

    Signed-off-by: Matt Fleming
    Signed-off-by: Peter Zijlstra (Intel)
Cc: Arnaldo Carvalho de Melo
    Cc: H. Peter Anvin
    Cc: Jiri Olsa
    Cc: Kanaka Juvva
    Cc: Linus Torvalds
    Cc: Vikas Shivappa
    Link: http://lkml.kernel.org/r/1422038748-21397-6-git-send-email-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar

    Matt Fleming
     
  • For PMU drivers that record per-package counters, the ->count variable
    cannot be used to record an accurate aggregated value, since it's not
    possible to perform SMP cross-calls to cpus on other packages from the
    context in which we update ->count.

    Introduce a new optional ->count() accessor function that can be used to
    customize how values are collected. If a PMU driver doesn't provide a
    ->count() function, we fallback to the existing code.

    There is necessarily a window of staleness with this approach because
    the task that generated the counter value may not have been scheduled by
    the cpu recently.

    An alternative and more complex approach would be to use a hrtimer to
    periodically refresh the values from a more permissive scheduling
    context. So, we're trading off complexity for accuracy.
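
    The resulting core logic amounts to a simple fallback; a sketch (not
    the verbatim kernel code):

    static u64 sketch_perf_event_count(struct perf_event *event)
    {
            if (event->pmu->count)
                    return event->pmu->count(event); /* e.g. per-package read */

            return local64_read(&event->count) +
                   atomic64_read(&event->child_count); /* default behaviour */
    }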

    Signed-off-by: Matt Fleming
    Signed-off-by: Peter Zijlstra (Intel)
Cc: Arnaldo Carvalho de Melo
    Cc: H. Peter Anvin
    Cc: Jiri Olsa
    Cc: Kanaka Juvva
    Cc: Linus Torvalds
    Cc: Vikas Shivappa
    Link: http://lkml.kernel.org/r/1422038748-21397-3-git-send-email-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar

    Matt Fleming
     
  • Move perf_cgroup_from_task() from kernel/events/ to include/linux/ along
    with the necessary struct definitions, so that it can be used by the PMU
    code.

    When the upcoming Intel Cache Monitoring PMU driver assigns monitoring
    IDs to perf events, it needs to be able to check whether any two
    monitoring events overlap (say, a cgroup and task event), which means we
    need to be able to lookup the cgroup associated with a task (if any).

    Signed-off-by: Matt Fleming
    Signed-off-by: Peter Zijlstra (Intel)
Cc: Arnaldo Carvalho de Melo
    Cc: H. Peter Anvin
    Cc: Jiri Olsa
    Cc: Kanaka Juvva
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Vikas Shivappa
    Link: http://lkml.kernel.org/r/1422038748-21397-2-git-send-email-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar

    Matt Fleming
     

19 Feb, 2015

5 commits

  • The recent LBR rework for x86 left a stray flush_branch_stack() user in
    the PowerPC code, fix that up.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Anshuman Khandual
    Cc: Anton Blanchard
    Cc: Arnaldo Carvalho de Melo
    Cc: Benjamin Herrenschmidt
    Cc: Christoph Lameter
    Cc: Joel Stanley
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Michael Neuling
    Cc: Paul Mackerras
    Cc: Tejun Heo
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
Use event->attr.branch_sample_type to replace
    intel_pmu_needs_lbr_smpl(), avoiding duplicated code that
    implicitly enables the LBR.

    Currently, branch stack can be enabled by user explicitly requesting
    branch sampling or implicit branch sampling to correct PEBS skid.

For user explicitly requested branch sampling, the branch_sample_type
    is explicitly set by the user. For the PEBS case, the branch_sample_type
    is implicitly set to PERF_SAMPLE_BRANCH_ANY in x86_pmu_hw_config().

    Signed-off-by: Yan, Zheng
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: eranian@google.com
    Cc: jolsa@redhat.com
    Link: http://lkml.kernel.org/r/1415156173-10035-11-git-send-email-kan.liang@intel.com
    Signed-off-by: Ingo Molnar

    Yan, Zheng
     
Introduce a new flag, PERF_ATTACH_TASK_DATA, for a perf event's attach
    state. The flag is set by the PMU's event_init() callback; it indicates
    that the perf event needs PMU-specific data.

The PMU-specific data is initialized to zeros. Later patches will
    use the PMU-specific data to save the LBR stack.

    Signed-off-by: Yan, Zheng
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: eranian@google.com
    Cc: jolsa@redhat.com
    Link: http://lkml.kernel.org/r/1415156173-10035-6-git-send-email-kan.liang@intel.com
    Signed-off-by: Ingo Molnar

    Yan, Zheng
     
The previous commit introduced a context switch callback whose function
    overlaps with the flush branch stack callback, so we can use the
    context switch callback to flush the LBR stack.

    This patch adds code that uses the callback to flush the LBR stack
    when a task is being scheduled in. The callback is enabled only when
    there are events that use the LBR hardware. This patch also removes
    all the old flush branch stack code.

    Signed-off-by: Yan, Zheng
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Vince Weaver
    Cc: eranian@google.com
    Cc: jolsa@redhat.com
    Link: http://lkml.kernel.org/r/1415156173-10035-4-git-send-email-kan.liang@intel.com
    Signed-off-by: Ingo Molnar

    Yan, Zheng
     
The callback is invoked when a process is scheduled in or out.
    It provides a mechanism for later patches to save/restore the LBR
    stack. For the schedule-in case, the callback is invoked at
    the same place that the flush branch stack callback is invoked,
    so it can also replace the flush branch stack callback. To
    avoid unnecessary overhead, the callback is enabled only when
    there are events that use the LBR stack.
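
    A sketch of the driver side (illustrative names): the callback is
    toggled with a reference count so it only fires while LBR users
    exist.

    static void sketch_sched_task(struct perf_event_context *ctx, bool sched_in)
    {
            if (sched_in)
                    sketch_flush_or_restore_lbr(ctx);  /* scheduled in */
    }

    /* while an LBR-using event is alive:
     *   perf_sched_cb_inc(event->pmu);   on event init
     *   perf_sched_cb_dec(event->pmu);   on event destroy
     * and set:  sketch_x86_pmu.sched_task = sketch_sched_task;  */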

    Signed-off-by: Yan, Zheng
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Vince Weaver
    Cc: eranian@google.com
    Cc: jolsa@redhat.com
    Link: http://lkml.kernel.org/r/1415156173-10035-3-git-send-email-kan.liang@intel.com
    Signed-off-by: Ingo Molnar

    Yan, Zheng
     

17 Feb, 2015

1 commit

  • Pull x86 perf updates from Ingo Molnar:
    "This series tightens up RDPMC permissions: currently even highly
    sandboxed x86 execution environments (such as seccomp) have permission
    to execute RDPMC, which may leak various perf events / PMU state such
    as timing information and other CPU execution details.

    This 'all is allowed' RDPMC mode is still preserved as the
    (non-default) /sys/devices/cpu/rdpmc=2 setting. The new default is
    that RDPMC access is only allowed if a perf event is mmap-ed (which is
    needed to correctly interpret RDPMC counter values in any case).

    As a side effect of these changes CR4 handling is cleaned up in the
    x86 code and a shadow copy of the CR4 value is added.

The extra CR4 manipulation adds ~ ..."

    * '...' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks
    perf/x86: Only allow rdpmc if a perf_event is mapped
    perf: Pass the event to arch_perf_update_userpage()
    perf: Add pmu callbacks to track event mapping and unmapping
    x86: Add a comment clarifying LDT context switching
    x86: Store a per-cpu shadow copy of CR4
    x86: Clean up cr4 manipulation

    Linus Torvalds
     

12 Feb, 2015

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - Update of all defconfigs

    - Addition of a bunch of config options to modernise our defconfigs

    - Some PS3 updates from Geoff

    - Optimised memcmp for 64 bit from Anton

    - Fix for kprobes that allows 'perf probe' to work from Naveen

    - Several cxl updates from Ian & Ryan

    - Expanded support for the '24x7' PMU from Cody & Sukadev

    - Freescale updates from Scott:
    "Highlights include 8xx optimizations, some more work on datapath
    device tree content, e300 machine check support, t1040 corenet
    error reporting, and various cleanups and fixes"

    * tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (102 commits)
    cxl: Add missing return statement after handling AFU errror
    cxl: Fail AFU initialisation if an invalid configuration record is found
    cxl: Export optional AFU configuration record in sysfs
    powerpc/mm: Warn on flushing tlb page in kernel context
    powerpc/powernv: Add OPAL soft-poweroff routine
    powerpc/perf/hv-24x7: Document sysfs event description entries
    powerpc/perf/hv-gpci: add the remaining gpci requests
    powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated
    powerpc/perf/hv-24x7: parse catalog and populate sysfs with events
    perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper
    perf: add PMU_EVENT_ATTR_STRING() helper
    perf: provide sysfs_show for struct perf_pmu_events_attr
    powerpc/kernel: Avoid initializing device-tree pointer twice
    powerpc: Remove old compile time disabled syscall tracing code
    powerpc/kernel: Make syscall_exit a local label
    cxl: Fix device_node reference counting
    powerpc/mm: bail out early when flushing TLB page
    powerpc: defconfigs: add MTD_SPI_NOR (new dependency for M25P80)
    perf/powerpc: reset event hw state when adding it to the PMU
    powerpc/qe: Use strlcpy()
    ...

    Linus Torvalds
     

04 Feb, 2015

1 commit

  • Signed-off-by: Andy Lutomirski
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Kees Cook
    Cc: Andrea Arcangeli
    Cc: Vince Weaver
    Cc: "hillf.zj"
    Cc: Valdis Kletnieks
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/266afcba1d1f91ea5501e4e16e94bbbc1a9339b6.1414190806.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Andy Lutomirski