10 Nov, 2020

2 commits

  • struct perf_sample_data lives on-stack, so we should be careful about
    its size. Furthermore, the pt_regs copy in there is only because x86_64
    is a trainwreck; solve it differently.

    Reported-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: Steven Rostedt
    Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org

    Peter Zijlstra
     
  • __perf_output_begin() has an on-stack struct perf_sample_data in the
    unlikely case it needs to generate a LOST record. However, every call
    to perf_output_begin() must already have a perf_sample_data on-stack.
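
    A hedged sketch of the resulting calling convention (assuming the
    signature described in this series): callers pass their on-stack
    sample data down, so the LOST path can reuse it.

        struct perf_sample_data data;

        perf_sample_data_init(&data, 0, event->hw.last_period);
        if (perf_output_begin(&handle, &data, event, size))
                return;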

    Reported-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20201030151954.985416146@infradead.org

    Peter Zijlstra
     

10 Sep, 2020

1 commit

  • The pmu::sched_task() is a context switch callback. It passes the
    cpuctx->task_ctx as a parameter to the lower code. To find the
    cpuctx->task_ctx, the current code iterates a cpuctx list.
    The same context will be iterated in perf_event_context_sched_out()
    soon. Sharing the cpuctx->task_ctx avoids the unnecessary iteration
    of the cpuctx list.

    The pmu::sched_task() is also required for the optimization case for
    equivalent contexts.

    The task_ctx_sched_out() will eventually disable and reenable the PMU
    when scheduling out events. Adding perf_pmu_disable() and
    perf_pmu_enable() around task_ctx_sched_out() doesn't break anything.

    Drop the cpuctx->ctx.lock for the pmu::sched_task(). The lock is for
    the per-CPU context and is not necessary for the per-task context
    switch.

    No one uses sched_cb_entry, perf_sched_cb_usages, sched_cb_list, and
    perf_pmu_sched_task() any more.
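
    A simplified, hedged sketch of the resulting ordering (details differ
    in the actual patch):

        perf_pmu_disable(pmu);
        if (pmu->sched_task)
                pmu->sched_task(cpuctx->task_ctx, false); /* sched out */
        task_ctx_sched_out(cpuctx, ctx, EVENT_ALL);
        perf_pmu_enable(pmu);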

    Suggested-by: Peter Zijlstra (Intel)
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200821195754.20159-2-kan.liang@linux.intel.com

    Kan Liang
     

18 Aug, 2020

2 commits

  • Starting with Ice Lake, the TopDown metrics are directly available as
    fixed counters and do not require generic counters. Also, the TopDown
    metrics can be collected per thread. Extend the RDPMC usage to support
    per-thread TopDown metrics.

    The RDPMC index of the PERF_METRICS will be output if RDPMC users ask
    for the RDPMC index of the metrics events.

    To support per-thread RDPMC TopDown, the metrics and slots counters
    have to be saved/restored during context switching.

    The last_period and period_left fields are not used in counting mode.
    Reuse them for saved_metric and saved_slots.
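
    For reference, a hedged userspace sketch of a raw RDPMC read; the
    index for a metrics event would come from the event's mmap'ed user
    page rather than being hard-coded:

        #include <stdint.h>

        static inline uint64_t rdpmc(uint32_t counter)
        {
                uint32_t lo, hi;

                __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
                return ((uint64_t)hi << 32) | lo;
        }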

    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200723171117.9918-12-kan.liang@linux.intel.com

    Kan Liang
     
  • Current perf assumes that events in a group are independent: closing
    an event doesn't impact the value of the other events in the same
    group. If the closed event is a member, the other events keep running
    as a group after the closure. If the closed event is the leader, the
    other events continue running as singleton events.

    Add PERF_EV_CAP_SIBLING to allow events to indicate that they require
    being part of a group, and that when the leader dies they cannot
    exist independently.
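
    A hedged sketch of how a PMU driver would mark such an event at
    event init time:

        /* Only meaningful as a sibling of its group leader. */
        event->event_caps |= PERF_EV_CAP_SIBLING;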

    Suggested-by: Peter Zijlstra
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200723171117.9918-8-kan.liang@linux.intel.com

    Kan Liang
     

06 Aug, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.

    2) Support UDP segmentation in core TSO code, from Eric Dumazet.

    3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

    4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

    5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

    6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

    7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

    8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

    9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

    10) Switch qca8k driver over to phylink, from Jonathan McDowell.

    11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

    12) Several conversions over to generic power management, from Vaibhav
    Gupta.

    13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

    14) Various https url conversions, from Alexander A. Klimov.

    15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

    16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

    17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

    18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

    19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

    20) XDP support for xen-netfront, from Denis Kirjanov.

    21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

    22) Support EF100 chip in sfc driver, from Edward Cree.

    23) Add XDP support to mvpp2 driver, from Matteo Croce.

    24) Support MPTCP in sock_diag, from Paolo Abeni.

    25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

    26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

    27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

    28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

    29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

    30) Add rfc4884 support to icmp code, from Willem de Bruijn.

    31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

    32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

    33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

    34) Support TCP syncookies in MPTCP, from Florian Westphal.

    35) Fix several tricky cases of PMTU handling wrt. bridging, from Stefano
    Brivio.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
    net: thunderx: initialize VF's mailbox mutex before first usage
    usb: hso: remove bogus check for EINPROGRESS
    usb: hso: no complaint about kmalloc failure
    hso: fix bailout in error case of probe
    ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
    selftests/net: relax cpu affinity requirement in msg_zerocopy test
    mptcp: be careful on subflow creation
    selftests: rtnetlink: make kci_test_encap() return sub-test result
    selftests: rtnetlink: correct the final return value for the test
    net: dsa: sja1105: use detected device id instead of DT one on mismatch
    tipc: set ub->ifindex for local ipv6 address
    ipv6: add ipv6_dev_find()
    net: openvswitch: silence suspicious RCU usage warning
    Revert "vxlan: fix tos value before xmit"
    ptp: only allow phase values lower than 1 period
    farsync: switch from 'pci_' to 'dma_' API
    wan: wanxl: switch from 'pci_' to 'dma_' API
    hv_netvsc: do not use VF device if link is down
    dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
    net: macb: Properly handle phylink on at91sam9x
    ...

    Linus Torvalds
     

08 Jul, 2020

2 commits

  • A new kmem_cache method has replaced kzalloc() for allocating the PMU
    specific data, so the task_ctx_size field is no longer required.

    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/1593780569-62993-19-git-send-email-kan.liang@linux.intel.com

    Kan Liang
     
  • Currently, the PMU specific data task_ctx_data is allocated by the
    function kzalloc() in the perf generic code. When there is no specific
    alignment requirement for the task_ctx_data, the method works well for
    now. However, there will be a problem once a specific alignment
    requirement is introduced in future features, e.g., the Architecture LBR
    XSAVE feature requires 64-byte alignment. If the specific alignment
    requirement is not fulfilled, the XSAVE family of instructions will fail
    to save/restore the xstate to/from the task_ctx_data.

    The function kzalloc() itself only guarantees a natural alignment. A
    new method to allocate the task_ctx_data has to be introduced, and it
    has to meet the following requirements:
    - it must be a generic method that can be used by different
      architectures, because the allocation of the task_ctx_data is
      implemented in the perf generic code;
    - it must guarantee the alignment (the alignment requirement does not
      change after boot);
    - it must be able to allocate/free a buffer (smaller than a page size)
      dynamically;
    - it should not cause extra CPU or space overhead.

    Several options were considered:
    - One option is to allocate a larger buffer for the task_ctx_data and
      align within it, e.g.,
          ptr = kmalloc(size + alignment, GFP_KERNEL);
          ptr = PTR_ALIGN(ptr, alignment);
      This option causes space overhead.
    - Another option is to allocate the task_ctx_data in the PMU specific
      code. To do so, several function pointers have to be added. As a
      result, both the generic structure and the PMU specific structure
      will become bigger. Besides, extra function calls are added when
      allocating/freeing the buffer. This option will increase both the
      space overhead and the CPU overhead.
    - The third option is to use a kmem_cache to allocate a buffer for the
      task_ctx_data. The kmem_cache can be created with a specific
      alignment requirement by the PMU at boot time. A new pointer for the
      kmem_cache has to be added to the generic struct pmu and is used to
      dynamically allocate a buffer for the task_ctx_data at run time.
      Although the new pointer is added to the struct pmu, the existing
      task_ctx_size variable is no longer required, so the size of the
      generic structure stays the same.

    The third option, which meets all the aforementioned requirements, is
    used to replace kzalloc() for the PMU specific data allocation. A
    later patch will remove the kzalloc() method and the related
    variables.
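
    A hedged sketch of the third option, with hypothetical names for a
    PMU whose context needs 64-byte alignment:

        static struct kmem_cache *ctx_cache;    /* hypothetical */

        /* Boot time: the PMU creates the cache with a fixed alignment;
         * ctx_size is the PMU's context size (hypothetical). */
        ctx_cache = kmem_cache_create("pmu_task_ctx_data", ctx_size,
                                      64, SLAB_PANIC, NULL);

        /* Run time: allocate/free the task_ctx_data dynamically. */
        void *task_ctx_data = kmem_cache_zalloc(ctx_cache, GFP_KERNEL);
        kmem_cache_free(ctx_cache, task_ctx_data);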

    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/1593780569-62993-17-git-send-email-kan.liang@linux.intel.com

    Kan Liang
     

01 Jul, 2020

1 commit

  • Sanitize and expose get/put_callchain_entry(). This will be used by
    the bpf stack map.
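
    A hedged usage sketch, assuming the sanitized interface returns the
    entry and reports the recursion context through a pointer:

        int rctx;
        struct perf_callchain_entry *entry = get_callchain_entry(&rctx);

        if (entry) {
                /* ... fill the entry for the bpf stack map ... */
                put_callchain_entry(rctx);
        }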

    Suggested-by: Peter Zijlstra
    Signed-off-by: Song Liu
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200630062846.664389-2-songliubraving@fb.com

    Song Liu
     

15 Jun, 2020

1 commit

  • Record (single instruction) changes to the kernel text (i.e.
    self-modifying code) in order to support tracers like Intel PT and
    ARM CoreSight.

    A copy of the running kernel code is needed as a reference point (e.g.
    from /proc/kcore). The text poke event records the old bytes and the
    new bytes so that the event can be processed forwards or backwards.

    The basic problem is recording the modified instruction in an
    unambiguous manner given SMP instruction cache (in)coherence. That is,
    when modifying an instruction concurrently any solution with one or
    multiple timestamps is not sufficient:

         CPU0                CPU1
    0
    1    write insn A
    2                        execute insn A
    3    sync-I$
    4

    Due to I$, CPU1 might execute either the old or new A. No matter where
    we record tracepoints on CPU0, one simply cannot tell what CPU1 will
    have observed, except that at 0 it must be the old one and at 4 it
    must be the new one.

    To solve this, take inspiration from x86 text poking, which has to
    solve this exact problem due to variable length instruction encoding
    and I-fetch windows.

    1) overwrite the instruction with a breakpoint and sync I$

    This guarantees that the code flow will never hit the target
    instruction anymore, on any CPU (or rather, it will cause an
    exception).

    2) issue the TEXT_POKE event

    3) overwrite the breakpoint with the new instruction and sync I$

    Now we know that any execution after the TEXT_POKE event will either
    observe the breakpoint (and hit the exception) or the new instruction.

    So, by guarding the TEXT_POKE event with an exception on either side,
    we can now tell, without doubt, which instruction another CPU will
    have observed.
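
    A hedged sketch of the sequence, with hypothetical helpers standing
    in for the arch-specific poke and I$-sync primitives:

        /* 1) fence off the old instruction on all CPUs */
        write_insn(addr, BREAKPOINT);           /* hypothetical */
        sync_icache_all_cpus();                 /* hypothetical */

        /* 2) emit the event; it records both old and new bytes */
        perf_event_text_poke(addr, old, len, new, len);

        /* 3) install the new instruction */
        write_insn(addr, new);
        sync_icache_all_cpus();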

    Signed-off-by: Adrian Hunter
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200512121922.8997-2-adrian.hunter@intel.com

    Adrian Hunter
     

04 Jun, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

    2) Add GSO partial support to igc, from Sasha Neftin.

    3) Several cleanups and improvements to r8169 from Heiner Kallweit.

    4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

    5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

    6) Support GRO via gro_cells in DSA layer, from Alexander Lobakin.

    7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

    8) Add sriov and vf support to hinic, from Luo bin.

    9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

    10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

    11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

    12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

    13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

    14) Allow to dump cgroup id and filter by it in inet_diag code, from
    Dmitry Yakunin.

    15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

    16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

    17) Fallback to the default qdisc if qdisc init fails because otherwise
    a packet scheduler init failure will make a device inoperative. From
    Jesper Dangaard Brouer.

    18) Several RISCV bpf jit optimizations, from Luke Nelson.

    19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

    20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

    21) Add BPF iterators, from Yonghong Song.

    22) Add cable test infrastructure, including ethtool interfaces, from
    Andrew Lunn. Marvell PHY driver is the first to support this
    facility.

    23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

    24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

    25) Add CAP_BPF, from Alexei Starovoitov.

    26) Support terse dumps in the packet scheduler, from Vlad Buslov.

    27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

    28) Add devm_register_netdev(), from Bartosz Golaszewski.

    29) Minimize qdisc resets, from Cong Wang.

    30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
    selftests: net: ip_defrag: ignore EPERM
    net_failover: fixed rollback in net_failover_open()
    Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
    Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
    vmxnet3: allow rx flow hash ops only when rss is enabled
    hinic: add set_channels ethtool_ops support
    selftests/bpf: Add a default $(CXX) value
    tools/bpf: Don't use $(COMPILE.c)
    bpf, selftests: Use bpf_probe_read_kernel
    s390/bpf: Use bcr 0,%0 as tail call nop filler
    s390/bpf: Maintain 8-byte stack alignment
    selftests/bpf: Fix verifier test
    selftests/bpf: Fix sample_cnt shared between two threads
    bpf, selftests: Adapt cls_redirect to call csum_level helper
    bpf: Add csum_level helper for fixing up csum levels
    bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
    sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
    crypto/chtls: IPv6 support for inline TLS
    Crypto/chcr: Fixes a coccinile check error
    Crypto/chcr: Fixes compilations warnings
    ...

    Linus Torvalds
     

20 May, 2020

1 commit

  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
            int stuff;
            struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]

    sizeof(flexible-array-member) triggers a warning because flexible array
    members have incomplete type[1]. There are some instances of code in
    which the sizeof operator is being incorrectly/erroneously applied to
    zero-length arrays and the result is zero. Such instances may be hiding
    some bugs. So, this work (flexible-array member conversions) will also
    help to get completely rid of those sorts of issues.

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
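
    A hedged allocation example for the struct above; struct_size() from
    include/linux/overflow.h avoids open-coding the size math:

        struct foo *p = kzalloc(struct_size(p, array, n), GFP_KERNEL);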

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200511201227.GA14041@embeddedor

    Gustavo A. R. Silva
     

27 Apr, 2020

1 commit

  • Instead of having all the sysctl handlers deal with user pointers, which
    is rather hairy in terms of the BPF interaction, copy the input to and
    from userspace in common code. This also means that the strings are
    always NUL-terminated by the common code, making the API a little bit
    safer.

    As most handlers just pass the data through to one of the common
    handlers, a lot of the changes are mechanical.
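
    After the change, a handler sees a kernel buffer; a hedged sketch of
    the resulting shape (handler name hypothetical):

        static int my_handler(struct ctl_table *table, int write,
                              void *buffer, size_t *lenp, loff_t *ppos)
        {
                /* buffer is kernel memory; common code did the copy */
                return proc_dointvec_minmax(table, write, buffer,
                                            lenp, ppos);
        }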

    Signed-off-by: Christoph Hellwig
    Acked-by: Andrey Ignatov
    Signed-off-by: Al Viro

    Christoph Hellwig
     

16 Apr, 2020

1 commit

  • Open access to monitoring of kernel code, CPUs, tracepoints and
    namespaces data for a CAP_PERFMON privileged process. Providing the
    access under the CAP_PERFMON capability alone, without the rest of
    the CAP_SYS_ADMIN credentials, reduces the chance of credential
    misuse and makes operation more secure.

    CAP_PERFMON implements the principle of least privilege for performance
    monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
    principle of least privilege: A security design principle that states
    that a process or program be granted only those privileges (e.g.,
    capabilities) necessary to accomplish its legitimate function, and only
    for the time that such privileges are actually required)

    For backward compatibility reasons, access to the perf_events
    subsystem remains open for CAP_SYS_ADMIN privileged processes, but
    using CAP_SYS_ADMIN for secure perf_events monitoring is discouraged
    in favor of the CAP_PERFMON capability.
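
    The backward-compatibility rule maps onto a helper like the
    perfmon_capable() accessor from this series (hedged sketch):

        static inline bool perfmon_capable(void)
        {
                return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
        }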

    Signed-off-by: Alexey Budankov
    Reviewed-by: James Morris
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexei Starovoitov
    Cc: Andi Kleen
    Cc: Igor Lubashev
    Cc: Jiri Olsa
    Cc: linux-man@vger.kernel.org
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Serge Hallyn
    Cc: Song Liu
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: intel-gfx@lists.freedesktop.org
    Cc: linux-doc@vger.kernel.org
    Cc: linux-security-module@vger.kernel.org
    Cc: selinux@vger.kernel.org
    Link: http://lore.kernel.org/lkml/471acaef-bb8a-5ce2-923f-90606b78eef9@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Alexey Budankov
     

27 Mar, 2020

1 commit

  • The PERF_SAMPLE_CGROUP bit is to save (perf_event) cgroup information
    in the sample. It will add a 64-bit id to identify the current
    cgroup, which is the file handle in the cgroup file system. Userspace
    should use this information with the PERF_RECORD_CGROUP event to
    match which cgroup the sample belongs to.

    I put it before PERF_SAMPLE_AUX for simplicity, since it just needs a
    64-bit word. But if we want bigger samples, I can work in that
    direction too.

    Committer testing:

    $ pahole perf_sample_data | grep -w cgroup -B5 -A5
            /* --- cacheline 4 boundary (256 bytes) was 56 bytes ago --- */
            struct perf_regs        regs_intr;            /*   312    16 */
            /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
            u64                     stack_user_size;      /*   328     8 */
            u64                     phys_addr;            /*   336     8 */
            u64                     cgroup;               /*   344     8 */

            /* size: 384, cachelines: 6, members: 22 */
            /* padding: 32 */
    };
    $
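
    A hedged userspace sketch of requesting the new sample field:

        struct perf_event_attr attr = { 0 };

        attr.sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_CGROUP;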

    Signed-off-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Johannes Weiner
    Cc: Mark Rutland
    Cc: Zefan Li
    Link: http://lore.kernel.org/lkml/20200325124536.2800725-3-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

06 Mar, 2020

1 commit

  • The storage required for visit_groups_merge's min heap needs to vary in
    order to support more iterators, such as when multiple nested cgroups'
    events are being visited. This change allows for 2 iterators and doesn't
    support growth.

    Based-on-work-by: Peter Zijlstra (Intel)
    Signed-off-by: Ian Rogers
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Link: https://lkml.kernel.org/r/20200214075133.181299-5-irogers@google.com

    Ian Rogers
     

11 Feb, 2020

1 commit

  • The low level index is the index in the underlying hardware buffer of
    the most recently captured taken branch, which is always saved in
    branch_entries[0]. It is very useful for reconstructing the call
    stack. For example, in Intel LBR call stack mode, the depth of the
    reconstructed LBR call stack is limited by the number of LBR
    registers. With the low level index information, the perf tool may
    stitch the stacks of two samples, and the reconstructed LBR call
    stack can break that HW limitation.

    Add a new branch sample type to retrieve the low level index of raw
    branch records. The low level index is between -1 (unknown) and the
    max depth, which can be retrieved from /sys/devices/cpu/caps/branches.

    The low level index information is dumped into the
    PERF_SAMPLE_BRANCH_STACK output only when the new branch sample type
    is set. The perf tool should check attr.branch_sample_type and apply
    the corresponding format for PERF_SAMPLE_BRANCH_STACK samples.
    Otherwise, some use cases may break. For example, users may parse a
    perf.data file that includes the new branch sample type with an old
    version of the perf tool (without the check) and get incorrect
    information without any warning.
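
    A hedged parsing sketch, assuming the new bit is named
    PERF_SAMPLE_BRANCH_HW_INDEX and the index sits in the branch stack:

        /* Only interpret the index when it was explicitly requested. */
        if (attr->branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)
                hw_idx = branch_stack->hw_idx;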

    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Link: https://lkml.kernel.org/r/20200127165355.27495-2-kan.liang@linux.intel.com

    Kan Liang
     

10 Feb, 2020

1 commit

  • Pull perf fixes from Thomas Gleixner:
    "A set of fixes and improvements for the perf subsystem:

    Kernel fixes:

    - Install cgroup events to the correct CPU context to prevent a
    potential list double add

    - Prevent an integer underflow in the perf mlock accounting

    - Add a missing prototype for arch_perf_update_userpage()

    Tooling:

    - Add a missing unlock in the error path of maps__insert() in perf
    maps.

    - Fix the build with the latest libbfd

    - Fix the perf parser so it does not delete parse event terms, which
    caused a regression when using perf with ARM CoreSight, as the sink
    configuration was missing due to the deletion.

    - Fix the double free in the perf CPU map merging test case

    - Add the missing ustring support for the perf probe command"

    * tag 'perf-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf maps: Add missing unlock to maps__insert() error case
    perf probe: Add ustring support for perf probe command
    perf: Make perf able to build with latest libbfd
    perf test: Fix test case Merge cpu map
    perf parse: Copy string to perf_evsel_config_term
    perf parse: Refactor 'struct perf_evsel_config_term'
    kernel/events: Add a missing prototype for arch_perf_update_userpage()
    perf/cgroups: Install cgroup events to correct cpuctx
    perf/core: Fix mlock accounting in perf_mmap()

    Linus Torvalds
     

14 Jan, 2020

1 commit

  • eBPF needs to know the size of the perf ring buffer structure, but it
    unfortunately has the same name as the generic ring buffer used by
    tracing and oprofile. To make it less ambiguous, rename the perf ring
    buffer structure to "perf_buffer".

    As other parts of the ring buffer code have "perf_" as the prefix, it
    only makes sense to give the ring buffer the "perf_" prefix as well.

    Link: https://lore.kernel.org/r/20191213153553.GE20583@krava
    Acked-by: Peter Zijlstra
    Suggested-by: Alexei Starovoitov
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

27 Nov, 2019

1 commit

  • Pull perf updates from Ingo Molnar:
    "The main kernel side changes in this cycle were:

    - Various Intel-PT updates and optimizations (Alexander Shishkin)

    - Prohibit kprobes on Xen/KVM emulate prefixes (Masami Hiramatsu)

    - Add support for LSM and SELinux checks to control access to the
    perf syscall (Joel Fernandes)

    - Misc other changes, optimizations, fixes and cleanups - see the
    shortlog for details.

    There were numerous tooling changes as well - 254 non-merge commits.
    Here are the main changes - too many to list in detail:

    - Enhancements to core tooling infrastructure, perf.data, libperf,
    libtraceevent, event parsing, vendor events, Intel PT, callchains,
    BPF support and instruction decoding.

    - There were updates to the following tools:

    perf annotate
    perf diff
    perf inject
    perf kvm
    perf list
    perf maps
    perf parse
    perf probe
    perf record
    perf report
    perf script
    perf stat
    perf test
    perf trace

    - And a lot of other changes: please see the shortlog and Git log for
    more details"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (279 commits)
    perf parse: Fix potential memory leak when handling tracepoint errors
    perf probe: Fix spelling mistake "addrees" -> "address"
    libtraceevent: Fix memory leakage in copy_filter_type
    libtraceevent: Fix header installation
    perf intel-bts: Does not support AUX area sampling
    perf intel-pt: Add support for decoding AUX area samples
    perf intel-pt: Add support for recording AUX area samples
    perf pmu: When using default config, record which bits of config were changed by the user
    perf auxtrace: Add support for queuing AUX area samples
    perf session: Add facility to peek at all events
    perf auxtrace: Add support for dumping AUX area samples
    perf inject: Cut AUX area samples
    perf record: Add aux-sample-size config term
    perf record: Add support for AUX area sampling
    perf auxtrace: Add support for AUX area sample recording
    perf auxtrace: Move perf_evsel__find_pmu()
    perf record: Add a function to test for kernel support for AUX area sampling
    perf tools: Add kernel AUX area sampling definitions
    perf/core: Make the mlock accounting simple again
    perf report: Jump to symbol source view from total cycles view
    ...

    Linus Torvalds
     

15 Nov, 2019

2 commits

  • Export perf_event_pause() as an external accessor for kernel users
    (such as KVM) that may want to both disable the perf_event and read
    its count while holding perf_event_ctx_lock only once. The value can
    also optionally be reset.
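
    A hedged usage sketch, assuming the accessor takes a reset flag and
    returns the count:

        /* Disable, read and reset, under one ctx-lock acquisition. */
        u64 count = perf_event_pause(event, true);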

    Suggested-by: Peter Zijlstra
    Signed-off-by: Like Xu
    Acked-by: Peter Zijlstra
    Signed-off-by: Paolo Bonzini

    Like Xu
     
  • Currently, perf_event_period() is used by user tools via ioctl.
    Following the naming convention, export perf_event_period() for
    kernel users (such as KVM) that may recalibrate the event period for
    their assigned counter according to their requirements.

    The perf_event_period() is an external accessor, just like
    perf_event_{en,dis}able(), and should thus use perf_event_ctx_lock().
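
    A hedged usage sketch of the exported accessor (new_period is
    hypothetical):

        /* Recalibrate the sampling period of an assigned counter. */
        int err = perf_event_period(event, new_period);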

    Suggested-by: Kan Liang
    Signed-off-by: Like Xu
    Acked-by: Peter Zijlstra
    Signed-off-by: Paolo Bonzini

    Like Xu
     

13 Nov, 2019

1 commit

  • AUX data can be used to annotate perf events such as performance counters
    or tracepoints/breakpoints by including it in sample records when
    PERF_SAMPLE_AUX flag is set. Such samples would be instrumental in debugging
    and profiling by providing, for example, a history of instruction flow
    leading up to the event's overflow.

    The implementation makes use of grouping an AUX event with all the events
    that wish to take samples of the AUX data, such that the former is the
    group leader. The samplees should also specify the desired size of the AUX
    sample via attr.aux_sample_size.

    AUX capable PMUs need to explicitly add support for sampling, because it
    relies on a new callback to take a snapshot of the buffer without touching
    the event states.
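
    A hedged userspace sketch (pe and aux_fd hypothetical): the samplee
    requests 1KB of AUX data per sample and joins the AUX event's group:

        pe.sample_type |= PERF_SAMPLE_AUX;
        pe.aux_sample_size = 1024;
        /* aux_fd is the fd of the AUX event, i.e. the group leader */
        fd = syscall(__NR_perf_event_open, &pe, 0, -1, aux_fd, 0);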

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: adrian.hunter@intel.com
    Cc: mathieu.poirier@linaro.org
    Link: https://lkml.kernel.org/r/20191025140835.53665-2-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     

28 Oct, 2019

3 commits

  • Declare swap_task_ctx() methods at the generic and x86 specific
    pmu types to bridge calls to platform specific PMU code on the
    optimized context switch path between equivalent task perf event
    contexts.
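
    A hedged sketch of the new method's declaration in struct pmu:

        /* Swap PMU specific per-task state between two equivalent
         * task contexts on the optimized switch path. */
        void (*swap_task_ctx)(struct perf_event_context *prev,
                              struct perf_event_context *next);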

    Signed-off-by: Alexey Budankov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Ian Rogers
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Song Liu
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: https://lkml.kernel.org/r/9a0aa84a-f062-9b64-3133-373658550c4b@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexey Budankov
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • As per POSIX, the correct spelling of the error code is EACCES:

    include/uapi/asm-generic/errno-base.h:#define EACCES 13 /* Permission denied */

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Kosina
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: https://lkml.kernel.org/r/20191024122904.12463-1-geert+renesas@glider.be
    Signed-off-by: Ingo Molnar

    Geert Uytterhoeven
     

18 Oct, 2019

1 commit

  • In current mainline, the degree of access to perf_event_open(2) system
    call depends on the perf_event_paranoid sysctl. This has a number of
    limitations:

    1. The sysctl is only a single value. Many types of accesses are
    controlled based on that single value, thus making the control very
    limited and coarse-grained.
    2. The sysctl is global, so if the sysctl is changed, then all
    processes get access to perf_event_open(2), opening the door to
    security issues.

    This patch adds LSM and SELinux access checking which will be used in
    Android to access perf_event_open(2) for the purposes of attaching BPF
    programs to tracepoints, perf profiling and other operations from
    userspace. These operations are intended for production systems.

    5 new LSM hooks are added:
    1. perf_event_open: This controls access during the perf_event_open(2)
    syscall itself. The hook is called from all the places where the
    perf_event_paranoid sysctl is checked, to keep it consistent with the
    sysctl. The hook gets passed a 'type' argument which controls CPU,
    kernel and tracepoint accesses (in this context, CPU, kernel and
    tracepoint have the same semantics as the perf_event_paranoid sysctl).
    Additionally, I added an 'open' type which is similar to the
    perf_event_paranoid sysctl == 3 patch carried in Android and several
    other distros but was rejected in mainline [1] in 2016.

    2. perf_event_alloc: This allocates a new security object for the event
    which stores the current SID within the event. It will be useful when
    the perf event's FD is passed through IPC to another process which may
    try to read the FD. Appropriate security checks will limit access.

    3. perf_event_free: Called when the event is closed.

    4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event.

    5. perf_event_write: Called from the ioctl(2) syscalls for the event.

    [1] https://lwn.net/Articles/696240/

    Since Peter had suggested LSM hooks in 2016 [1], I am adding his
    Suggested-by tag below.

    To use this patch, we set the perf_event_paranoid sysctl to -1 and then
    apply selinux checking as appropriate (default deny everything, and then
    add policy rules to give access to domains that need it). In the future
    we can remove the perf_event_paranoid sysctl altogether.
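
    A hedged sketch of the main hook's shape, with 'type' mirroring the
    paranoid levels described above:

        /* Called wherever perf_event_paranoid used to be the gate. */
        int security_perf_event_open(struct perf_event_attr *attr,
                                     int type);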

    Suggested-by: Peter Zijlstra
    Co-developed-by: Peter Zijlstra
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: James Morris
    Cc: Arnaldo Carvalho de Melo
    Cc: rostedt@goodmis.org
    Cc: Yonghong Song
    Cc: Kees Cook
    Cc: Ingo Molnar
    Cc: Alexei Starovoitov
    Cc: jeffv@google.com
    Cc: Jiri Olsa
    Cc: Daniel Borkmann
    Cc: primiano@google.com
    Cc: Song Liu
    Cc: rsavitski@google.com
    Cc: Namhyung Kim
    Cc: Matthew Garrett
    Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org

    Joel Fernandes (Google)
     

28 Aug, 2019

1 commit

  • In some cases, ordinary (non-AUX) events can generate data for AUX events.
    For example, PEBS events can come out as records in the Intel PT stream
    instead of their usual DS records, if configured to do so.

    One requirement for such events is to consistently schedule together, to
    ensure that the data from the "AUX output" events isn't lost while their
    corresponding AUX event is not scheduled. We use grouping to provide this
    guarantee: an "AUX output" event can be added to a group where an AUX event
    is a group leader, and provided that the former supports writing to the
    latter.
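
    A hedged userspace sketch of the grouping contract (pe and
    intel_pt_fd hypothetical): the "AUX output" event sets aux_output
    and opens with the AUX event as its leader:

        pe.aux_output = 1;
        fd = syscall(__NR_perf_event_open, &pe, 0, -1, intel_pt_fd, 0);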

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Cc: kan.liang@linux.intel.com
    Link: https://lkml.kernel.org/r/20190806084606.4021-2-alexander.shishkin@linux.intel.com

    Alexander Shishkin
     

13 Jul, 2019

1 commit

  • So far, we tried to disallow grouping exclusive events for fear of
    complications they would cause with moving between contexts. Specifically,
    moving a software group to a hardware context would violate the exclusivity
    rules if both groups contain matching exclusive events.

    This attempt was, however, unsuccessful: the check that we have in the
    perf_event_open() syscall is both wrong (looks at wrong PMU) and
    insufficient (group leader may still be exclusive), as can be illustrated
    by running:

    $ perf record -e '{intel_pt//,cycles}' uname
    $ perf record -e '{cycles,intel_pt//}' uname

    both of which ultimately succeed.

    Furthermore, we are completely free to trigger the exclusivity violation
    by:

    perf -e '{cycles,intel_pt//}' -e '{intel_pt//,instructions}'

    even though the helpful perf record will not allow that, the ABI will.

    The warning later in the perf_event_open() path will also not trigger, because
    it's also wrong.

    Fix all this by validating the original group before moving, getting
    rid of broken safeguards and placing a useful one in
    perf_install_in_context().

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: mathieu.poirier@linaro.org
    Cc: will.deacon@arm.com
    Fixes: bed5b25ad9c8a ("perf: Add a pmu capability for "exclusive" events")
    Link: https://lkml.kernel.org/r/20190701110755.24646-1-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     

25 Jun, 2019

2 commits

  • Currently perf_rotate_context assumes that if the context's nr_events !=
    nr_active a rotation is necessary for perf event multiplexing. With
    cgroups, nr_events is the total count of events for all cgroups and
    nr_active will not include events in a cgroup other than the current
    task's. This makes rotation appear necessary for cgroups when it is not.

    Add a perf_event_context flag that is set when rotation is necessary.
    Clear the flag during sched_out and set it when a flexible sched_in
    fails due to resources.
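
    A hedged sketch of the flag protocol (condition name hypothetical):

        ctx->rotate_necessary = 0;          /* on sched_out */

        if (flexible_group_did_not_fit)     /* hypothetical condition */
                ctx->rotate_necessary = 1;  /* sched_in hit resource limit */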

    Signed-off-by: Ian Rogers
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: https://lkml.kernel.org/r/20190601082722.44543-1-irogers@google.com
    Signed-off-by: Ingo Molnar

    Ian Rogers
     
  • The perf fuzzer caused a Skylake machine to crash:

    [ 9680.085831] Call Trace:
    [ 9680.088301] <IRQ>
    [ 9680.090363] perf_output_sample_regs+0x43/0xa0
    [ 9680.094928] perf_output_sample+0x3aa/0x7a0
    [ 9680.099181] perf_event_output_forward+0x53/0x80
    [ 9680.103917] __perf_event_overflow+0x52/0xf0
    [ 9680.108266] ? perf_trace_run_bpf_submit+0xc0/0xc0
    [ 9680.113108] perf_swevent_hrtimer+0xe2/0x150
    [ 9680.117475] ? check_preempt_wakeup+0x181/0x230
    [ 9680.122091] ? check_preempt_curr+0x62/0x90
    [ 9680.126361] ? ttwu_do_wakeup+0x19/0x140
    [ 9680.130355] ? try_to_wake_up+0x54/0x460
    [ 9680.134366] ? reweight_entity+0x15b/0x1a0
    [ 9680.138559] ? __queue_work+0x103/0x3f0
    [ 9680.142472] ? update_dl_rq_load_avg+0x1cd/0x270
    [ 9680.147194] ? timerqueue_del+0x1e/0x40
    [ 9680.151092] ? __remove_hrtimer+0x35/0x70
    [ 9680.155191] __hrtimer_run_queues+0x100/0x280
    [ 9680.159658] hrtimer_interrupt+0x100/0x220
    [ 9680.163835] smp_apic_timer_interrupt+0x6a/0x140
    [ 9680.168555] apic_timer_interrupt+0xf/0x20
    [ 9680.172756] </IRQ>

    The XMM registers can only be collected by PEBS hardware events on
    platforms with PEBS baseline support, e.g. Icelake; they cannot be
    collected by software/probe events.

    Add the capability flag PERF_PMU_CAP_EXTENDED_REGS to indicate PMUs
    which support extended registers. For x86, the extended registers are
    the XMM registers.

    Add has_extended_regs() to check if extended registers are requested.

    The generic code defines the mask of extended registers as 0 if the
    arch headers haven't overridden it.
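
    A hedged sketch matching that description:

        #ifndef PERF_REG_EXTENDED_MASK
        #define PERF_REG_EXTENDED_MASK 0  /* arch did not override */
        #endif

        static inline bool has_extended_regs(struct perf_event *event)
        {
                return (event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK) ||
                       (event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK);
        }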

    Originally-by: Peter Zijlstra (Intel)
    Reported-by: Vince Weaver
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Fixes: 878068ea270e ("perf/x86: Support outputting XMM registers")
    Link: https://lkml.kernel.org/r/1559081314-9714-1-git-send-email-kan.liang@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kan Liang
     

03 Jun, 2019

1 commit

  • Add an attr_update attribute group into struct pmu, to allow having
    multiple attribute groups for the same group name.

    This will allow us to update the "events" or "format" directories
    with attributes that depend on various HW conditions.

    For example, a group_format_extra group that updates the "format"
    directory only if the pmu version is 2 or higher:

    static umode_t
    exra_is_visible(struct kobject *kobj, struct attribute *attr, int i)
    {
            return x86_pmu.version >= 2 ? attr->mode : 0;
    }

    static struct attribute_group group_format_extra = {
            .name       = "format",
            .is_visible = exra_is_visible,
    };

    Signed-off-by: Jiri Olsa
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190512155518.21468-3-jolsa@kernel.org
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

18 May, 2019

1 commit

  • Pull KVM updates from Paolo Bonzini:
    "ARM:
    - support for SVE and Pointer Authentication in guests
    - PMU improvements

    POWER:
    - support for direct access to the POWER9 XIVE interrupt controller
    - memory and performance optimizations

    x86:
    - support for accessing memory not backed by struct page
    - fixes and refactoring

    Generic:
    - dirty page tracking improvements"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits)
    kvm: fix compilation on aarch64
    Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU"
    kvm: x86: Fix L1TF mitigation for shadow MMU
    KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible
    KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device
    KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing"
    KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs
    kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete
    tests: kvm: Add tests for KVM_SET_NESTED_STATE
    KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state
    tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID
    tests: kvm: Add tests to .gitignore
    KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
    KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one
    KVM: Fix the bitmap range to copy during clear dirty
    KVM: arm64: Fix ptrauth ID register masking logic
    KVM: x86: use direct accessors for RIP and RSP
    KVM: VMX: Use accessors for GPRs outside of dedicated caching logic
    KVM: x86: Omit caching logic for always-available GPRs
    kvm, x86: Properly check whether a pfn is an MMIO or not
    ...

    Linus Torvalds
     

07 May, 2019

1 commit

  • Pull perf updates from Ingo Molnar:
    "The main kernel changes were:

    - add support for Intel's "adaptive PEBS v4" - which embeds LBR data
    in PEBS records and can thus batch up and reduce the IRQ (NMI) rate
    significantly - reducing overhead and making call-graph profiling
    less intrusive.

    - add Intel CPU core and uncore support updates for Tremont, Icelake,

    - extend the x86 PMU constraints scheduler with 'constraint ranges'
    to better support Icelake hw constraints,

    - make x86 call-chain support work better with CONFIG_FRAME_POINTER=y

    - misc other changes

    Tooling changes:

    - updates to the main tools: 'perf record', 'perf trace', 'perf
    stat'

    - updated Intel and S/390 vendor events

    - libtraceevent updates

    - misc other updates and fixes"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
    perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER
    watchdog: Fix typo in comment
    perf/x86/intel: Add Tremont core PMU support
    perf/x86/intel/uncore: Add Intel Icelake uncore support
    perf/x86/msr: Add Icelake support
    perf/x86/intel/rapl: Add Icelake support
    perf/x86/intel/cstate: Add Icelake support
    perf/x86/intel: Add Icelake support
    perf/x86: Support constraint ranges
    perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them
    perf/x86/intel: Support adaptive PEBS v4
    perf/x86/intel/ds: Extract code of event update in short period
    perf/x86/intel: Extract memory code PEBS parser for reuse
    perf/x86: Support outputting XMM registers
    perf/x86/intel: Force resched when TFA sysctl is modified
    perf/core: Add perf_pmu_resched() as global function
    perf/headers: Fix stale comment for struct perf_addr_filter
    perf/core: Make perf_swevent_init_cpu() static
    perf/x86: Add sanity checks to x86_schedule_events()
    perf/x86: Optimize x86_schedule_events()
    ...

    Linus Torvalds
     

03 May, 2019

1 commit

  • Now that all AUX allocations are high-order by default, the software
    double buffering PMU capability doesn't make sense any more, get rid
    of it. In case some PMUs choose to opt out, we can re-introduce it.

    Signed-off-by: Alexander Shishkin
    Acked-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: adrian.hunter@intel.com
    Link: http://lkml.kernel.org/r/20190503085536.24119-3-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     

01 May, 2019

1 commit