26 Mar, 2016

1 commit

  • KASAN needs to know whether the allocation happens in an IRQ handler.
    This lets us strip everything below the IRQ entry point to reduce the
    number of unique stack traces needed to be stored.

    Move the definition of __irq_entry to <linux/interrupt.h> so that the
    users don't need to pull in <linux/ftrace.h>. Also introduce the
    __softirq_entry macro which is similar to __irq_entry, but puts the
    corresponding functions to the .softirqentry.text section.
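
    The section-placement idea can be sketched in plain C; the macro names
    match the commit, but the definitions below are a simplified userspace
    model, not the kernel's:

```c
#include <assert.h>

/* Functions marked this way land in dedicated ELF sections, so a stack
 * unwinder (KASAN here) can recognize the IRQ entry point by address and
 * strip every frame below it.  Simplified model of the kernel macros. */
#define __irq_entry     __attribute__((__section__(".irqentry.text")))
#define __softirq_entry __attribute__((__section__(".softirqentry.text")))

int __irq_entry handle_hardirq(int nr)     { return nr + 1; }
int __softirq_entry handle_softirq(int nr) { return nr + 2; }
```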

    Signed-off-by: Alexander Potapenko
    Acked-by: Steven Rostedt
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

25 Mar, 2016

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Nothing major this round. Mostly small clean ups and fixes.

    Some visible changes:

    - A new flag was added to distinguish traces done in NMI context.

    - Preempt tracer now shows functions where preemption is disabled but
    interrupts are still enabled.

    Other notes:

    - Updates were done to function tracing to allow better performance
    with perf.

    - Infrastructure code has been added to allow for a new histogram
    feature for recording live trace event histograms that can be
    configured by simple user commands. The feature itself was just
    finished, but needs a round in linux-next before being pulled.

    This only includes some infrastructure changes that will be needed"

    * tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (22 commits)
    tracing: Record and show NMI state
    tracing: Fix trace_printk() to print when not using bprintk()
    tracing: Remove redundant reset per-CPU buff in irqsoff tracer
    x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c
    tracing: Fix crash from reading trace_pipe with sendfile
    tracing: Have preempt(irqs)off trace preempt disabled functions
    tracing: Fix return while holding a lock in register_tracer()
    ftrace: Use kasprintf() in ftrace_profile_tracefs()
    ftrace: Update dynamic ftrace calls only if necessary
    ftrace: Make ftrace_hash_rec_enable return update bool
    tracing: Fix typoes in code comment and printk in trace_nop.c
    tracing, writeback: Replace cgroup path to cgroup ino
    tracing: Use flags instead of bool in trigger structure
    tracing: Add an unreg_all() callback to trigger commands
    tracing: Add needs_rec flag to event triggers
    tracing: Add a per-event-trigger 'paused' field
    tracing: Add get_syscall_name()
    tracing: Add event record param to trigger_ops.func()
    tracing: Make event trigger functions available
    tracing: Make ftrace_event_field checking functions available
    ...

    Linus Torvalds
     

23 Mar, 2016

3 commits

  • Use the more common logging method with the eventual goal of removing
    pr_warning altogether.

    Miscellanea:

    - Realign arguments
    - Coalesce formats
    - Add missing space between a few coalesced formats

    Signed-off-by: Joe Perches
    Acked-by: Rafael J. Wysocki [kernel/power/suspend.c]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • The latency tracer format has a nice column to indicate IRQ state, but
    this is not able to tell us about NMI state.

    When tracing perf interrupt handlers (which often run in NMI context)
    it is very useful to see how the events nest.
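
    One way the extra column could encode the nesting (flag bits and
    characters below are illustrative, not necessarily the kernel's):

```c
#include <assert.h>

/* Map recorded context flags to the single latency-format character; an
 * NMI bit gets its own characters so perf NMI handlers stand out. */
enum { FL_HARDIRQ = 1 << 0, FL_SOFTIRQ = 1 << 1, FL_NMI = 1 << 2 };

static char ctx_char(unsigned int flags)
{
    if (flags & FL_NMI)
        return (flags & FL_HARDIRQ) ? 'Z' : 'z';  /* NMI (inside hardirq) */
    if ((flags & (FL_HARDIRQ | FL_SOFTIRQ)) == (FL_HARDIRQ | FL_SOFTIRQ))
        return 'H';                               /* hardirq in softirq */
    if (flags & FL_HARDIRQ)
        return 'h';
    if (flags & FL_SOFTIRQ)
        return 's';
    return '.';
}
```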

    Link: http://lkml.kernel.org/r/20160318153022.105068893@infradead.org

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Steven Rostedt

    Peter Zijlstra
     
  • The trace_printk() code will allocate extra buffers if the compile detects
    that a trace_printk() is used. To do this, the format of the trace_printk()
    is saved to the __trace_printk_fmt section, and if that section is bigger
    than zero, the buffers are allocated (along with a message that this has
    happened).

    If trace_printk() uses a format that is not a constant, and thus something
    not guaranteed to be around when the print happens, the compiler optimizes
    the fmt out, as it is not used, and the __trace_printk_fmt section is not
    filled. This means the kernel will not allocate the special buffers needed
    for the trace_printk() and the trace_printk() will not write anything to the
    tracing buffer.

    Adding a "__used" to the variable in the __trace_printk_fmt section will
    keep it around, even though it is set to NULL. This will keep the string
    from being printed in the debugfs/tracing/printk_formats section as it is
    not needed.
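
    A userspace model of the fix (section and variable names illustrative,
    not the kernel's plumbing):

```c
#include <assert.h>
#include <stddef.h>

/* __used keeps the format pointer alive even though it is NULL and never
 * referenced, so the special section stays non-empty and the "is the
 * section bigger than zero?" allocation check still triggers. */
#define __used __attribute__((__used__))

static const char *trace_printk_fmt __used
        __attribute__((__section__("trace_printk_fmts"))) = NULL;

/* GNU ld provides __start_/__stop_ symbols for sections whose names are
 * valid C identifiers; the kernel's linker script does the equivalent. */
extern const char *__start_trace_printk_fmts[];
extern const char *__stop_trace_printk_fmts[];

static long fmt_section_entries(void)
{
    return __stop_trace_printk_fmts - __start_trace_printk_fmts;
}
```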

    Reported-by: Vlastimil Babka
    Fixes: 07d777fe8c398 "tracing: Add percpu buffers for trace_printk()"
    Cc: stable@vger.kernel.org # v3.5+
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

20 Mar, 2016

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support more Realtek wireless chips, from Jes Sorenson.

    2) New BPF types for per-cpu hash and array maps, from Alexei
    Starovoitov.

    3) Make several TCP sysctls per-namespace, from Nikolay Borisov.

    4) Allow the use of SO_REUSEPORT in order to do per-thread processing
    of incoming TCP/UDP connections. The muxing can be done using a
    BPF program which hashes the incoming packet. From Craig Gallek.

    5) Add a multiplexer for TCP streams, to provide a messaged based
    interface. BPF programs can be used to determine the message
    boundaries. From Tom Herbert.

    6) Add 802.1AE MACSEC support, from Sabrina Dubroca.

    7) Avoid factorial complexity when taking down an inetdev interface
    with lots of configured addresses. We were doing things like
    traversing the entire address list for each address removed, and
    flushing the entire netfilter conntrack table for every address as
    well.

    8) Add and use SKB bulk free infrastructure, from Jesper Brouer.

    9) Allow offloading u32 classifiers to hardware, and implement for
    ixgbe, from John Fastabend.

    10) Allow configuring IRQ coalescing parameters on a per-queue basis,
    from Kan Liang.

    11) Extend ethtool so that larger link mode masks can be supported.
    From David Decotigny.

    12) Introduce devlink, which can be used to configure port link types
    (ethernet vs Infiniband, etc.), port splitting, and switch device
    level attributes as a whole. From Jiri Pirko.

    13) Hardware offload support for flower classifiers, from Amir Vadai.

    14) Add "Local Checksum Offload". Basically, for a tunneled packet
    the checksum of the outer header is 'constant' (because with the
    checksum field filled into the inner protocol header, the payload
    of the outer frame checksums to 'zero'), and we can take advantage
    of that in various ways. From Edward Cree"
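
    The "checksums to zero" property quoted above follows from the Internet
    checksum definition; a minimal demonstration with a plain
    one's-complement sum:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* One's-complement sum over 16-bit words, carries folded back in. */
static uint16_t ocsum(const uint16_t *w, size_t n)
{
    uint32_t s = 0;

    while (n--)
        s += *w++;
    while (s >> 16)
        s = (s & 0xffff) + (s >> 16);   /* fold carries */
    return (uint16_t)s;
}

/* Once the checksum field holds the complement of the rest, the whole
 * packet folds to 0xFFFF, i.e. its own checksum is zero - the constant
 * an outer tunnel header can rely on. */
static int lco_demo(void)
{
    uint16_t pkt[4] = { 0x1234, 0xabcd, 0x0000 /* csum */, 0x4242 };

    pkt[2] = (uint16_t)~ocsum(pkt, 4);  /* fill the checksum field */
    return ocsum(pkt, 4) == 0xffff;
}
```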

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
    bonding: fix bond_get_stats()
    net: bcmgenet: fix dma api length mismatch
    net/mlx4_core: Fix backward compatibility on VFs
    phy: mdio-thunder: Fix some Kconfig typos
    lan78xx: add ndo_get_stats64
    lan78xx: handle statistics counter rollover
    RDS: TCP: Remove unused constant
    RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
    net: smc911x: convert pxa dma to dmaengine
    team: remove duplicate set of flag IFF_MULTICAST
    bonding: remove duplicate set of flag IFF_MULTICAST
    net: fix a comment typo
    ethernet: micrel: fix some error codes
    ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
    bpf, dst: add and use dst_tclassid helper
    bpf: make skb->tc_classid also readable
    net: mvneta: bm: clarify dependencies
    cls_bpf: reset class and reuse major in da
    ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
    ldmvsw: Add ldmvsw.c driver code
    ...

    Linus Torvalds
     

19 Mar, 2016

3 commits

    There is no reason to do it twice: since commit b6f11df26fdc28
    ("trace: Call tracing_reset_online_cpus before tracer->init()")
    the resetting of per-CPU buffers is done before the tracer->init() call.

    tracer->init() calls {irqs,preempt,preemptirqs}off_tracer_init(), which
    calls __irqsoff_tracer_init(), and that resets the per-CPU ring buffer
    a second time.
    It's a slow path, but still worth avoiding.

    Link: http://lkml.kernel.org/r/1445278226-16187-1-git-send-email-0x7f454c46@gmail.com

    Signed-off-by: Dmitry Safonov
    Signed-off-by: Steven Rostedt

    Dmitry Safonov
     
  • If tracing contains data and the trace_pipe file is read with sendfile(),
    then it can trigger a NULL pointer dereference and various BUG_ON within the
    VM code.

    There's a patch to fix this in the splice_to_pipe() code, but it's also a
    good idea to not let that happen from trace_pipe either.

    Link: http://lkml.kernel.org/r/1457641146-9068-1-git-send-email-rabin@rab.in

    Cc: stable@vger.kernel.org # 2.6.30+
    Reported-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Joel Fernandes reported that the function tracing of preempt disabled
    sections was not being reported when running either the preemptirqsoff or
    preemptoff tracers. This was due to the fact that the function tracer
    callback for those tracers checked if irqs were disabled before tracing. But
    this fails when we want to trace preempt off locations as well.

    Joel explained that he wanted to see functions where interrupts are enabled
    but preemption was disabled. The expected output he wanted:

    -2265 1d.h1 3419us : preempt_count_sub
    -2265 1d..1 3419us : __do_softirq
    -2265 1d..1 3419us : msecs_to_jiffies
    -2265 1d..1 3420us : irqtime_account_irq
    -2265 1d..1 3420us : __local_bh_disable_ip
    -2265 1..s1 3421us : run_timer_softirq
    -2265 1..s1 3421us : hrtimer_run_pending
    -2265 1..s1 3421us : _raw_spin_lock_irq
    -2265 1d.s1 3422us : preempt_count_add
    -2265 1d.s2 3422us : _raw_spin_unlock_irq
    -2265 1..s2 3422us : preempt_count_sub
    -2265 1..s1 3423us : rcu_bh_qs
    -2265 1d.s1 3423us : irqtime_account_irq
    -2265 1d.s1 3423us : __local_bh_enable

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

18 Mar, 2016

4 commits

    commit d39cdd2036a6 ("tracing: Make tracer_flags use the right set_flag
    callback") introduces a potential mutex deadlock issue, as it forgets to
    release the mutex when allocating the tracer_flags fails.

    The issue was found by Dan Carpenter with the Smatch static analysis tool.

    Link: http://lkml.kernel.org/r/1457958941-30265-1-git-send-email-chuhu@redhat.com

    Fixes: d39cdd2036a6 ("tracing: Make tracer_flags use the right set_flag callback")
    Reported-by: Dan Carpenter
    Signed-off-by: Chunyu Hu
    Signed-off-by: Steven Rostedt

    Chunyu Hu
     
  • Use kasprintf() instead of kmalloc() and snprintf().
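
    The pattern being replaced and its one-call equivalent can be sketched
    in portable userspace C (kasprintf_like is a stand-in, not the kernel
    API):

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* What kasprintf() bundles: measure the formatted length, allocate, then
 * format - replacing an explicit kmalloc() + snprintf() pair at each
 * call site. */
static char *kasprintf_like(const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf;
    int len;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap);          /* measure */
    va_end(ap);
    if (len < 0 || !(buf = malloc((size_t)len + 1))) {
        va_end(ap2);
        return NULL;
    }
    vsnprintf(buf, (size_t)len + 1, fmt, ap2);  /* format */
    va_end(ap2);
    return buf;                                 /* caller frees */
}
```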

    Link: http://lkml.kernel.org/r/135a7bc36e51fd9eaa57124dd2140285b771f738.1458050835.git.geliangtang@163.com

    Acked-by: Namhyung Kim
    Signed-off-by: Geliang Tang
    Signed-off-by: Steven Rostedt

    Geliang Tang
     
  • Currently dynamic ftrace calls are updated any time
    the ftrace_ops is un/registered. If we do this update
    only when it's needed, we save a lot of time for perf
    system wide ftrace function sampling/counting.

    The reason is that for system wide sampling/counting,
    perf creates event for each cpu in the system.

    Each event then registers separate copy of ftrace_ops,
    which ends up in FTRACE_UPDATE_CALLS updates. On servers
    with many cpus that means serious stall (240 cpus server):

    Counting:
    # time ./perf stat -e ftrace:function -a sleep 1

    Performance counter stats for 'system wide':

    370,663 ftrace:function

    1.401427505 seconds time elapsed

    real 3m51.743s
    user 0m0.023s
    sys 3m48.569s

    Sampling:
    # time ./perf record -e ftrace:function -a sleep 1
    [ perf record: Woken up 0 times to write data ]
    Warning:
    Processed 141200 events and lost 5 chunks!

    [ perf record: Captured and wrote 10.703 MB perf.data (135950 samples) ]

    real 2m31.429s
    user 0m0.213s
    sys 2m29.494s

    There's no reason to do the FTRACE_UPDATE_CALLS update
    for each event in the perf case, because all the ftrace_ops
    always share the same filter, so the updated calls are
    always the same.

    It's required that only first ftrace_ops registration
    does the FTRACE_UPDATE_CALLS update (also sometimes
    the second if the first one used the trampoline), but
    the rest can be only cheaply linked into the ftrace_ops
    list.
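
    The saving can be modeled with a toy registration counter (names
    illustrative; the real logic also accounts for trampolines):

```c
#include <assert.h>

static int nr_ops;           /* registered ftrace_ops (one per event/cpu) */
static int nr_call_updates;  /* expensive FTRACE_UPDATE_CALLS passes */

/* All perf ops share one filter, so only the first registration needs to
 * patch the call sites; the rest are just linked into the ops list. */
static void register_shared_ops(void)
{
    if (nr_ops++ == 0)
        nr_call_updates++;   /* patch every traced call site */
    /* else: cheap list insertion only */
}
```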

    Counting:
    # time ./perf stat -e ftrace:function -a sleep 1

    Performance counter stats for 'system wide':

    398,571 ftrace:function

    1.377503733 seconds time elapsed

    real 0m2.787s
    user 0m0.005s
    sys 0m1.883s

    Sampling:
    # time ./perf record -e ftrace:function -a sleep 1
    [ perf record: Woken up 0 times to write data ]
    Warning:
    Processed 261730 events and lost 9 chunks!

    [ perf record: Captured and wrote 19.907 MB perf.data (256293 samples) ]

    real 1m31.948s
    user 0m0.309s
    sys 1m32.051s

    Link: http://lkml.kernel.org/r/1458138873-1553-6-git-send-email-jolsa@kernel.org

    Acked-by: Namhyung Kim
    Signed-off-by: Jiri Olsa
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     
    Change __ftrace_hash_rec_update to return true in case
    we need to update dynamic ftrace call records. It returns
    false in case no update is needed.
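
    A toy model of the contract (illustrative, not the kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Report whether the set of enabled call records actually changed: going
 * from zero records to some, or from some to zero, is what forces the
 * expensive dynamic-call update; anything else can be skipped. */
static bool hash_rec_update(unsigned int *enabled, int delta)
{
    unsigned int before = *enabled;

    *enabled += delta;
    return (before == 0) != (*enabled == 0);
}
```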

    Link: http://lkml.kernel.org/r/1458138873-1553-5-git-send-email-jolsa@kernel.org

    Acked-by: Namhyung Kim
    Signed-off-by: Jiri Olsa
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     

17 Mar, 2016

1 commit

  • Pull power management and ACPI updates from Rafael Wysocki:
    "This time the majority of changes go into cpufreq and they are
    significant.

    First off, the way CPU frequency updates are triggered is different
    now. Instead of having to set up and manage a deferrable timer for
    each CPU in the system to evaluate and possibly change its frequency
    periodically, cpufreq governors set up callbacks to be invoked by the
    scheduler on a regular basis (basically on utilization updates). The
    "old" governors, "ondemand" and "conservative", still do all of their
    work in process context (although that is triggered by the scheduler
    now), but intel_pstate does it all in the callback invoked by the
    scheduler with no need for any additional asynchronous processing.

    Of course, this eliminates the overhead related to the management of
    all those timers, but also it allows the cpufreq governor code to be
    simplified quite a bit. On top of that, the common code and data
    structures used by the "ondemand" and "conservative" governors are
    cleaned up and made more straightforward and some long-standing and
    quite annoying problems are addressed. In particular, the handling of
    governor sysfs attributes is modified and the related locking becomes
    more fine grained which allows some concurrency problems to be avoided
    (particularly deadlocks with the core cpufreq code).

    In principle, the new mechanism for triggering frequency updates
    allows utilization information to be passed from the scheduler to
    cpufreq. Although the current code doesn't make use of it, in the
    works is a new cpufreq governor that will make decisions based on the
    scheduler's utilization data. That should allow the scheduler and
    cpufreq to work more closely together in the long run.

    In addition to the core and governor changes, cpufreq drivers are
    updated too. Fixes and optimizations go into intel_pstate, the
    cpufreq-dt driver is updated on top of some modification in the
    Operating Performance Points (OPP) framework and there are fixes and
    other updates in the powernv cpufreq driver.

    Apart from the cpufreq updates there is some new ACPICA material,
    including a fix for a problem introduced by previous ACPICA updates,
    and some less significant changes in the ACPI code, like CPPC code
    optimizations, ACPI processor driver cleanups and support for loading
    ACPI tables from initrd.

    Also updated are the generic power domains framework, the Intel RAPL
    power capping driver and the turbostat utility and we have a bunch of
    traditional assorted fixes and cleanups.

    Specifics:

    - Redesign of cpufreq governors and the intel_pstate driver to make
    them use callbacks invoked by the scheduler to trigger CPU
    frequency evaluation instead of using per-CPU deferrable timers for
    that purpose (Rafael Wysocki).

    - Reorganization and cleanup of cpufreq governor code to make it more
    straightforward and fix some concurrency problems in it (Rafael
    Wysocki, Viresh Kumar).

    - Cleanup and improvements of locking in the cpufreq core (Viresh
    Kumar).

    - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
    Kumar, Eric Biggers).

    - intel_pstate driver updates including fixes, optimizations and a
    modification to make it enable hardware-coordinated P-state
    selection (HWP) by default if supported by the processor (Philippe
    Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
    Franciosi).

    - Operating Performance Points (OPP) framework updates to improve its
    handling of voltage regulators and device clocks and updates of the
    cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).

    - Updates of the powernv cpufreq driver to fix initialization and
    cleanup problems in it and correct its worker thread handling with
    respect to CPU offline, new powernv_throttle tracepoint (Shilpasri
    Bhat).

    - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).

    - ACPICA updates including one fix for a regression introduced by
    previous changes in the ACPICA code (Bob Moore, Lv Zheng, David Box,
    Colin Ian King).

    - Support for installing ACPI tables from initrd (Lv Zheng).

    - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
    Chaugule).

    - Support for _HID(ACPI0010) devices (ACPI processor containers) and
    ACPI processor driver cleanups (Sudeep Holla).

    - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
    Aleksey Makarov).

    - Modification of the ACPI PCI IRQ management code to make it treat
    255 in the Interrupt Line register as "not connected" on x86 (as
    per the specification) and avoid attempts to use that value as a
    valid interrupt vector (Chen Fan).

    - ACPI APEI fixes related to resource leaks (Josh Hunt).

    - Removal of modularity from a few ACPI drivers (BGRT, GHES,
    intel_pmic_crc) that cannot be built as modules in practice (Paul
    Gortmaker).

    - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
    as a valid resource type (Harb Abdulhamid).

    - New device ID (future AMD I2C controller) in the ACPI driver for
    AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).

    - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).

    - cpuidle menu governor optimization to avoid a square root
    computation in it (Rasmus Villemoes).

    - Fix for potential use-after-free in the generic device properties
    framework (Heikki Krogerus).

    - Updates of the generic power domains (genpd) framework including
    support for multiple power states of a domain, fixes and debugfs
    output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
    Geert Uytterhoeven).

    - Intel RAPL power capping driver updates to reduce IPI overhead in
    it (Jacob Pan).

    - System suspend/hibernation code cleanups (Eric Biggers, Saurabh
    Sengar).

    - Year 2038 fix for the process freezer (Abhilash Jindal).

    - turbostat utility updates including new features (decoding of more
    registers and CPUID fields, sub-second intervals support, GFX MHz
    and RC6 printout, --out command line option), fixes (syscall jitter
    detection and workaround, reduction of the number of syscalls
    made, fixes related to Xeon x200 processors, compiler warning
    fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)"

    * tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits)
    tools/power turbostat: bugfix: TDP MSRs print bits fixing
    tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
    tools/power turbostat: call __cpuid() instead of __get_cpuid()
    tools/power turbostat: indicate SMX and SGX support
    tools/power turbostat: detect and work around syscall jitter
    tools/power turbostat: show GFX%rc6
    tools/power turbostat: show GFXMHz
    tools/power turbostat: show IRQs per CPU
    tools/power turbostat: make fewer systems calls
    tools/power turbostat: fix compiler warnings
    tools/power turbostat: add --out option for saving output in a file
    tools/power turbostat: re-name "%Busy" field to "Busy%"
    tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
    tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
    tools/power turbostat: allow sub-sec intervals
    ACPI / APEI: ERST: Fixed leaked resources in erst_init
    ACPI / APEI: Fix leaked resources
    intel_pstate: Do not skip samples partially
    intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
    intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
    ...

    Linus Torvalds
     

15 Mar, 2016

1 commit

  • Pull perf updates from Ingo Molnar:
    "Main kernel side changes:

    - Big reorganization of the x86 perf support code. The old code grew
    organically deep inside arch/x86/kernel/cpu/perf* and its naming
    became somewhat messy.

    The new location is under arch/x86/events/, using the following
    cleaner hierarchy of source code files:

    perf/x86: Move perf_event.c .................. => x86/events/core.c
    perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c
    perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c
    perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch]
    perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c
    perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c
    perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c
    perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c
    perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c
    perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c
    perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c
    perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch]
    perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c
    perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch]
    perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nhmex.c
    perf/x86: Move perf_event_intel_uncore_snb.c => x86/events/intel/uncore_snb.c
    perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c
    perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c
    perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c
    perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c
    perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c

    (Borislav Petkov)

    - Update various x86 PMU constraint and hw support details (Stephane
    Eranian)

    - Optimize kprobes for BPF execution (Martin KaFai Lau)

    - Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas
    Gleixner)

    - Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner)

    - Various fixes and smaller cleanups.

    There are lots of perf tooling updates as well. A few highlights:

    perf report/top:

    - Hierarchy histogram mode for 'perf top' and 'perf report',
    showing multiple levels, one per --sort entry: (Namhyung Kim)

    On a mostly idle system:

    # perf top --hierarchy -s comm,dso

    Then expand some levels and use 'P' to take a snapshot:

    # cat perf.hist.0
    - 92.32% perf
    58.20% perf
    22.29% libc-2.22.so
    5.97% [kernel]
    4.18% libelf-0.165.so
    1.69% [unknown]
    - 4.71% qemu-system-x86
    3.10% [kernel]
    1.60% qemu-system-x86_64 (deleted)
    + 2.97% swapper
    #

    - Add 'L' hotkey to dynamically set the percent threshold for
    histogram entries and callchains, i.e. dynamically do what the
    --percent-limit command line option to 'top' and 'report' does.
    (Namhyung Kim)

    perf mem:

    - Allow specifying events via -e in 'perf mem record', also listing
    what events can be specified via 'perf mem record -e list' (Jiri
    Olsa)

    perf record:

    - Add 'perf record' --all-user/--all-kernel options, so that one
    can tell that all the events in the command line should be
    restricted to the user or kernel levels (Jiri Olsa), i.e.:

    perf record -e cycles:u,instructions:u

    is equivalent to:

    perf record --all-user -e cycles,instructions

    - Make 'perf record' collect CPU cache info in the perf.data file header:

    $ perf record usleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
    $ perf report --header-only -I | tail -10 | head -8
    # CPU cache info:
    # L1 Data 32K [0-1]
    # L1 Instruction 32K [0-1]
    # L1 Data 32K [2-3]
    # L1 Instruction 32K [2-3]
    # L2 Unified 256K [0-1]
    # L2 Unified 256K [2-3]
    # L3 Unified 4096K [0-3]

    Will be used in 'perf c2c' and eventually in 'perf diff' to
    allow, for instance running the same workload in multiple
    machines and then when using 'diff' show the hardware difference.
    (Jiri Olsa)

    - Improved support for Java, using the JVMTI agent library to do
    jitdumps that then will be inserted in synthesized
    PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized
    ELF files stored in ~/.debug and keyed with build-ids, to allow
    symbol resolution and even annotation with source line info, see
    the changeset comments to see how to use it (Stephane Eranian)

    perf script/trace:

    - Decode data_src values (e.g. perf.data files generated by 'perf
    mem record') in 'perf script': (Jiri Olsa)

    # perf script
    perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P: ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    - Improve support to 'data_src', 'weight' and 'addr' fields in
    'perf script' (Jiri Olsa)

    - Handle empty print fmts in 'perf script -s' i.e. when running
    python or perl scripts (Taeung Song)

    perf stat:

    - 'perf stat' now shows shadow metrics (insn per cycle, etc) in
    interval mode too. E.g:

    # perf stat -I 1000 -e instructions,cycles sleep 1
    # time counts unit events
    1.000215928 519,620 instructions # 0.69 insn per cycle
    1.000215928 752,003 cycles

    - Port 'perf kvm stat' to PowerPC (Hemant Kumar)

    - Implement CSV metrics output in 'perf stat' (Andi Kleen)

    perf BPF support:

    - Support converting data from bpf events in 'perf data' (Wang Nan)

    - Print bpf-output events in 'perf script': (Wang Nan).

    # perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000
    # perf script
    usleep 4882 21384.532523: evt: ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms])
    BPF output: 0000: 52 61 69 73 65 20 61 20 Raise a
    0008: 42 50 46 20 65 76 65 6e BPF even
    0010: 74 21 00 00 t!..
    BPF string: "Raise a BPF event!"
    #

    - Add API to set values of map entries in a BPF object, be it
    individual map slots or ranges (Wang Nan)

    - Introduce support for the 'bpf-output' event (Wang Nan)

    - Add glue to read perf events in a BPF program (Wang Nan)

    - Improve support for bpf-output events in 'perf trace' (Wang Nan)

    ... and tons of other changes as well - see the shortlog and git log
    for details!"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits)
    perf stat: Add --metric-only support for -A
    perf stat: Implement --metric-only mode
    perf stat: Document CSV format in manpage
    perf hists browser: Check sort keys before hot key actions
    perf hists browser: Allow thread filtering for comm sort key
    perf tools: Add sort__has_comm variable
    perf tools: Recalc total periods using top-level entries in hierarchy
    perf tools: Remove nr_sort_keys field
    perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry()
    perf tools: Remove hist_entry->fmt field
    perf tools: Fix command line filters in hierarchy mode
    perf tools: Add more sort entry check functions
    perf tools: Fix hist_entry__filter() for hierarchy
    perf jitdump: Build only on supported archs
    tools lib traceevent: Add '~' operation within arg_num_eval()
    perf tools: Omit unnecessary cast in perf_pmu__parse_scale
    perf tools: Pass perf_hpp_list all the way through setup_sort_list
    perf tools: Fix perf script python database export crash
    perf jitdump: DWARF is also needed
    perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes
    ...

    Linus Torvalds
     

14 Mar, 2016

1 commit

  • * pm-cpufreq: (94 commits)
    intel_pstate: Do not skip samples partially
    intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
    intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
    intel_pstate: Optimize calculation for max/min_perf_adj
    intel_pstate: Remove extra conversions in pid calculation
    cpufreq: Move scheduler-related code to the sched directory
    Revert "cpufreq: postfix policy directory with the first CPU in related_cpus"
    cpufreq: Reduce cpufreq_update_util() overhead a bit
    cpufreq: Select IRQ_WORK if CPU_FREQ_GOV_COMMON is set
    cpufreq: Remove 'policy->governor_enabled'
    cpufreq: Rename __cpufreq_governor() to cpufreq_governor()
    cpufreq: Relocate handle_update() to kill its declaration
    cpufreq: governor: Drop unnecessary checks from show() and store()
    cpufreq: governor: Fix race in dbs_update_util_handler()
    cpufreq: governor: Make gov_set_update_util() static
    cpufreq: governor: Narrow down the dbs_data_mutex coverage
    cpufreq: governor: Make dbs_data_mutex static
    cpufreq: governor: Relocate definitions of tuners structures
    cpufreq: governor: Move per-CPU data to the common code
    cpufreq: governor: Make governor private data per-policy
    ...

    Rafael J. Wysocki
     

09 Mar, 2016

12 commits

    If a kprobe is placed within the update or delete hash map helpers
    that hold the bucket spin lock, and the triggered bpf program tries
    to grab the spinlock for the same bucket on the same cpu, it will
    deadlock.
    Fix it by extending the existing recursion prevention mechanism.

    Note, map_lookup and other tracing helpers don't have this problem,
    since they don't hold any locks and don't modify global data.
    bpf_trace_printk has its own recursion check and is ok as well.
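
    The guard can be modeled with a simple "active" flag (the kernel uses
    a per-cpu counter; this single-threaded sketch only shows the shape):

```c
#include <assert.h>

static int prog_active;  /* per-cpu in the kernel */

/* Refuse re-entry: a kprobe firing inside a map helper that already holds
 * the bucket lock would otherwise try to take the same lock again. */
static int run_bpf_prog(int (*prog)(void))
{
    int ret;

    if (prog_active)
        return -1;       /* recursion refused */
    prog_active = 1;
    ret = prog();
    prog_active = 0;
    return ret;
}

static int inner_prog(void)  { return 7; }
static int nested_prog(void) { return run_bpf_prog(inner_prog); }
```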

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Several cases of overlapping changes, as well as one instance
    (vxlan) of a bug fix in 'net' overlapping with code movement
    in 'net-next'.

    Signed-off-by: David S. Miller

    David S. Miller
     
    echo nop > /sys/kernel/debug/tracing/current_tracer
    echo 1 > /sys/kernel/debug/tracing/options/test_nop_accept
    echo 0 > /sys/kernel/debug/tracing/options/test_nop_accept
    echo 1 > /sys/kernel/debug/tracing/options/test_nop_refuse

    Before the fix, the dmesg output is a bit ugly due to an alignment issue.

    [ 191.973081] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result
    [ 195.156942] nop_test_refuse flag set to 1: we refuse.Now cat trace_options to see the result

    After the fix, dmesg will show aligned logs for nop_test_refuse and nop_test_accept.

    [ 2718.032413] nop_test_refuse flag set to 1: we refuse. Now cat trace_options to see the result
    [ 2734.253360] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result

    Link: http://lkml.kernel.org/r/1457444222-8654-2-git-send-email-chuhu@redhat.com

    Signed-off-by: Chunyu Hu
    Signed-off-by: Steven Rostedt

    Chunyu Hu
     
  • gcc isn't known for handling bool well in structures. Instead of using
    bool members, use an integer mask with bit flags.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Add a new unreg_all() callback that can be used to remove all
    command-specific triggers from an event and arrange to have it called
    whenever a trigger file is opened with O_TRUNC set.

    Commands that don't want truncate semantics, or existing commands that
    don't implement this function simply do nothing and their triggers
    remain intact.

    Link: http://lkml.kernel.org/r/2b7d62854d01f28c19185e1bbb8f826f385edfba.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Add a new needs_rec flag for triggers that require unconditional
    access to trace records in order to function.

    Normally a trigger requires access to the contents of a trace record
    only if it has a filter associated with it (since filters need the
    contents of a record in order to make a filtering decision). Some
    types of triggers, such as 'hist' triggers, require access to trace
    record contents independent of the presence of filters, so add a new
    flag for those triggers.

    Link: http://lkml.kernel.org/r/7be8fa38f9b90fdb6c47ca0f98d20a07b9fd512b.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Add a simple per-trigger 'paused' flag, allowing individual triggers
    to pause. We could leave it to individual triggers that need this
    functionality to do it themselves, but we also want to allow other
    events to control pausing, so add it to the trigger data.

    Link: http://lkml.kernel.org/r/fed37e4879684d7dcc57fe00ce0cbf170032b06d.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Add a utility function to grab the syscall name from the syscall
    metadata, given a syscall id.

    Link: http://lkml.kernel.org/r/be26a8dfe3f15e16a837799f1c1e2b4d62742843.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Some triggers may need access to the trace event, so pass it in. Also
    fix up the existing trigger funcs and their callers.

    Link: http://lkml.kernel.org/r/543e31e9fc445ef61077421ab219033401c39846.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Make various event trigger utility functions available outside of
    trace_events_trigger.c so that new triggers can be defined outside of
    that file.

    Link: http://lkml.kernel.org/r/4a40c1695dd43cac6cd475d72e13ffe30ba84bff.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Make is_string_field() and is_function_field() accessible outside of
    trace_event_filters.c for other users of ftrace_event_fields.

    Link: http://lkml.kernel.org/r/2d3f00d3311702e556e82eed7754bae6f017939f.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Reviewed-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • When I was updating the ftrace_stress test of LTP, I encountered
    a strange phenomenon; execute the following steps:

    echo nop > /sys/kernel/debug/tracing/current_tracer
    echo 0 > /sys/kernel/debug/tracing/options/funcgraph-cpu
    bash: echo: write error: Invalid argument

    check dmesg:
    [ 1024.903855] nop_test_refuse flag set to 0: we refuse.Now cat trace_options to see the result

    The reason is that the trace option test will randomly set up trace
    options under tracing/options no matter what the current_tracer is,
    but set_tracer_option always uses the set_flag callback
    from the current_tracer. This patch adds a pointer to tracer_flags
    and makes it point to the tracer it belongs to. When the option is
    set, the set_flag of the right tracer will be used no matter
    what the current_tracer is.

    The old dummy_tracer_flags was shared by all the tracers that
    don't have a tracer_flags of their own, so it cannot be used to save
    the pointer of a single tracer. Remove it and use a dynamic dummy
    tracer_flags for each tracer needing one; as a result, no tracers
    share tracer_flags, so the sharing check code can be removed as well.

    Saving the current tracer in trace_option_dentry is not a good
    alternative either, as it may waste memory when the debug/trace fs is
    mounted more than once.

    Link: http://lkml.kernel.org/r/1457444222-8654-1-git-send-email-chuhu@redhat.com

    Signed-off-by: Chunyu Hu
    [ Fixed up function tracer options to work with the change ]
    Signed-off-by: Steven Rostedt

    Chunyu Hu
     

05 Mar, 2016

1 commit

  • …t/rostedt/linux-trace

    Pull tracing fix from Steven Rostedt:
    "A feature was added in 4.3 that allowed users to filter trace points
    on a task's "comm" field. But this prevented filtering on a comm field
    that is within a trace event (like sched_migrate_task).

    When trying to filter on when a program migrated, this change
    prevented filtering sched_migrate_task events.

    To fix this, the event fields are examined first, and then the extra
    fields like "comm" and "cpu" are examined. Also, instead of testing
    to assign the comm filter function based on the field's name, the
    generic comm field is given a new filter type (FILTER_COMM). When
    this field is used to filter the type is checked. The same is done
    for the cpu filter field.

    Two new special filter types are added: "COMM" and "CPU". This allows
    users to still filter on the task's comm for events that have "comm" as
    one of their fields, in cases where users would like to filter
    sched_migrate_task on the comm of the task that called the event, and
    not the comm of the task that is being migrated"

    * tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Do not have 'comm' filter override event 'comm' field

    Linus Torvalds
     

04 Mar, 2016

1 commit

  • Commit 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and
    process names" added a 'comm' filter that will filter events based on the
    current task's struct 'comm'. But this now hides the ability to filter events
    that have a 'comm' field too. For example, sched_migrate_task trace event.
    That has a 'comm' field of the task to be migrated.

    echo 'comm == "bash"' > events/sched_migrate_task/filter

    will now filter all sched_migrate_task events for tasks named "bash" that
    migrate other tasks (in interrupt context), instead of seeing when "bash"
    itself gets migrated.

    This fix requires a couple of changes.

    1) Change the look up order for filter predicates to look at the events
    fields before looking at the generic filters.

    2) Instead of basing the filter function off of the "comm" name, have the
    generic "comm" filter have its own filter_type (FILTER_COMM). Test
    against the type instead of the name to assign the filter function.

    3) Add a new "COMM" filter that works just like "comm" but will filter based
    on the current task, even if the trace event contains a "comm" field.

    Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".

    Cc: stable@vger.kernel.org # v4.3+
    Fixes: 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and process names"
    Reported-by: Matt Fleming
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

29 Feb, 2016

2 commits

  • Some tracepoints have multiple fields with the same name, "nr": the first
    one is a unique syscall ID, the other is a syscall argument:

    # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_io_getevents/format
    name: sys_enter_io_getevents
    ID: 747
    format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:int nr; offset:8; size:4; signed:1;
    field:aio_context_t ctx_id; offset:16; size:8; signed:0;
    field:long min_nr; offset:24; size:8; signed:0;
    field:long nr; offset:32; size:8; signed:0;
    field:struct io_event * events; offset:40; size:8; signed:0;
    field:struct timespec * timeout; offset:48; size:8; signed:0;

    print fmt: "ctx_id: 0x%08lx, min_nr: 0x%08lx, nr: 0x%08lx, events: 0x%08lx, timeout: 0x%08lx", ((unsigned long)(REC->ctx_id)), ((unsigned long)(REC->min_nr)), ((unsigned long)(REC->nr)), ((unsigned long)(REC->events)), ((unsigned long)(REC->timeout))
    #

    Fix it by renaming the "/format" common tracepoint field "nr" to "__syscall_nr".

    Signed-off-by: Taeung Song
    [ Do not rename the struct member, just the '/format' field name ]
    Signed-off-by: Steven Rostedt
    Acked-by: Peter Zijlstra
    Cc: Jiri Olsa
    Cc: Lai Jiangshan
    Cc: Namhyung Kim
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20160226132301.3ae065a4@gandalf.local.home
    Signed-off-by: Arnaldo Carvalho de Melo

    Taeung Song
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

26 Feb, 2016

1 commit

  • …git/rostedt/linux-trace

    Pull tracing fix from Steven Rostedt:
    "Another small bug reported to me by Chunyu Hu.

    When perf added a "reg" function to the function tracing event (not a
    tracepoint), it caused that event to be displayed as a tracepoint and
    could cause errors in tracepoint handling. That was solved by adding
    a flag to ignore ftrace non-tracepoint events. But that flag was
    missed when displaying events in available_events, which should only
    contain tracepoint events.

    This broke a documented way to enable all events with:

    cat available_events > set_event

    As the function non-tracepoint event would cause that to error out.
    The commit here fixes that by having the available_events file not
    list events that have the ignore flag set"

    * tag 'trace-fixes-v4.5-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Fix showing function event in available_events

    Linus Torvalds
     

24 Feb, 2016

1 commit

  • The ftrace:function event is only displayed for parsing the function tracer
    data. It is not used to enable function tracing, and does not include an
    "enable" file in its event directory.

    Originally, this event was kept separate from other events because it did
    not have a ->reg parameter. But perf added a "reg" parameter for its use,
    which caused issues because it made the event available in contexts
    it was not compatible with.

    Commit 9b63776fa3ca9 "tracing: Do not enable function event with enable"
    added a TRACE_EVENT_FL_IGNORE_ENABLE flag that prevented the function event
    from being enabled by normal trace events. But this commit missed keeping
    the function event from being displayed by the "available_events" directory,
    which is used to show what events can be enabled by set_event.

    One documented way to enable all events is to:

    cat available_events > set_event

    But because the function event is displayed in the available_events, this
    now causes an INVALID error:

    cat: write error: Invalid argument

    Reported-by: Chunyu Hu
    Fixes: 9b63776fa3ca9 "tracing: Do not enable function event with enable"
    Cc: stable@vger.kernel.org # 3.4+
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

23 Feb, 2016

2 commits

  • Conflicts:
    drivers/net/phy/bcm7xxx.c
    drivers/net/phy/marvell.c
    drivers/net/vxlan.c

    All three conflicts were cases of simple overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • …t/rostedt/linux-trace

    Pull tracing fixes from Steven Rostedt:
    "Two more small fixes.

    One is by Yang Shi who added a READ_ONCE_NOCHECK() to the scan of the
    stack made by the stack tracer. As the stack tracer scans the entire
    kernel stack, KASAN triggers, seeing it as a "stack out of bounds"
    error, since the scan looks at the contents of the stacks of
    parent functions. The NOCHECK() tells KASAN that this is done on
    purpose and is not some kind of stack overflow.

    The second fix is to the ftrace selftests, to retrieve the PID of
    executed commands from the shell with '$!' and not by parsing 'jobs'"

    * tag 'trace-fixes-v4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing, kasan: Silence Kasan warning in check_stack of stack_tracer
    ftracetest: Fix instance test to use proper shell command for pids

    Linus Torvalds
     

20 Feb, 2016

2 commits

  • add new map type to store stack traces and corresponding helper
    bpf_get_stackid(ctx, map, flags) - walk user or kernel stack and return id
    @ctx: struct pt_regs*
    @map: pointer to stack_trace map
    @flags: bits 0-7 - number of stack frames to skip
    bit 8 - collect user stack instead of kernel
    bit 9 - compare stacks by hash only
    bit 10 - if two different stacks hash into the same stackid
    discard old
    other bits - reserved
    Return: >= 0 stackid on success or negative error

    stackid is a 32-bit integer handle that can be further combined with
    other data (including other stackid) and used as a key into maps.

    Userspace will access stackmap using standard lookup/delete syscall commands to
    retrieve full stack trace for given stackid.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • When enabling stack trace via "echo 1 > /proc/sys/kernel/stack_tracer_enabled",
    the below KASAN warning is triggered:

    BUG: KASAN: stack-out-of-bounds in check_stack+0x344/0x848 at addr ffffffc0689ebab8
    Read of size 8 by task ksoftirqd/4/29
    page:ffffffbdc3a27ac0 count:0 mapcount:0 mapping: (null) index:0x0
    flags: 0x0()
    page dumped because: kasan: bad access detected
    CPU: 4 PID: 29 Comm: ksoftirqd/4 Not tainted 4.5.0-rc1 #129
    Hardware name: Freescale Layerscape 2085a RDB Board (DT)
    Call trace:
    [] dump_backtrace+0x0/0x3a0
    [] show_stack+0x24/0x30
    [] dump_stack+0xd8/0x168
    [] kasan_report_error+0x6a0/0x920
    [] kasan_report+0x70/0xb8
    [] __asan_load8+0x60/0x78
    [] check_stack+0x344/0x848
    [] stack_trace_call+0x1c4/0x370
    [] ftrace_ops_no_ops+0x2c0/0x590
    [] ftrace_graph_call+0x0/0x14
    [] fpsimd_thread_switch+0x24/0x1e8
    [] __switch_to+0x34/0x218
    [] __schedule+0x3ac/0x15b8
    [] schedule+0x5c/0x178
    [] smpboot_thread_fn+0x350/0x960
    [] kthread+0x1d8/0x2b0
    [] ret_from_fork+0x10/0x40
    Memory state around the buggy address:
    ffffffc0689eb980: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4 f4
    ffffffc0689eba00: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
    >ffffffc0689eba80: 00 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00
    ^
    ffffffc0689ebb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffffffc0689ebb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    The stack tracer traverses the whole kernel stack when saving the max stack
    trace. It may touch the stack red zones, causing the warning. So, just disable
    the instrumentation to silence the warning.

    Link: http://lkml.kernel.org/r/1455309960-18930-1-git-send-email-yang.shi@linaro.org

    Signed-off-by: Yang Shi
    Signed-off-by: Steven Rostedt

    Yang Shi
     

19 Feb, 2016

1 commit

  • Pull livepatching fixes from Jiri Kosina:

    - regression (from 4.4) fix for an ordering issue, introduced by an
    earlier ftrace change, that broke live patching of modules.

    The fix replaces the ftrace module notifier by direct call in order
    to make the ordering guaranteed and well-defined. The patch, from
    Jessica Yu, has been acked both by Steven and Rusty

    - error message fix from Miroslav Benes

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
    ftrace/module: remove ftrace module notifier
    livepatch: change the error message in asm/livepatch.h header files

    Linus Torvalds
     

18 Feb, 2016

1 commit

  • Remove the ftrace module notifier in favor of directly calling
    ftrace_module_enable() and ftrace_release_mod() in the module loader.
    Hard-coding the function calls directly in the module loader removes
    dependence on the module notifier call chain and provides better
    visibility and control over what gets called when, which is important
    to kernel utilities such as livepatch.

    This fixes a notifier ordering issue in which the ftrace module notifier
    (and hence ftrace_module_enable()) for coming modules was being called
    after klp_module_notify(), which caused livepatch modules to initialize
    incorrectly. This patch removes dependence on the module notifier call
    chain in favor of hard coding the corresponding function calls in the
    module loader. This ensures that ftrace and livepatch code get called in
    the correct order on patch module load and unload.

    Fixes: 5156dca34a3e ("ftrace: Fix the race between ftrace and insmod")
    Signed-off-by: Jessica Yu
    Reviewed-by: Steven Rostedt
    Reviewed-by: Petr Mladek
    Acked-by: Rusty Russell
    Reviewed-by: Josh Poimboeuf
    Reviewed-by: Miroslav Benes
    Signed-off-by: Jiri Kosina

    Jessica Yu