26 Mar, 2016

1 commit

  • KASAN needs to know whether the allocation happens in an IRQ handler.
    This lets us strip everything below the IRQ entry point to reduce the
    number of unique stack traces needed to be stored.

    Move the definition of __irq_entry to <linux/interrupt.h> so that the
    users don't need to pull in <linux/ftrace.h>. Also introduce the
    __softirq_entry macro which is similar to __irq_entry, but puts the
    corresponding functions to the .softirqentry.text section.
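
    The section-placement idea can be sketched in plain C; the macro names
    match the commit, but the definitions below are a simplified userspace
    model, not the kernel's:

```c
#include <assert.h>

/* Functions marked this way land in dedicated ELF sections, so a stack
 * unwinder (KASAN here) can recognize the IRQ entry point by address and
 * strip every frame below it.  Simplified model of the kernel macros. */
#define __irq_entry     __attribute__((__section__(".irqentry.text")))
#define __softirq_entry __attribute__((__section__(".softirqentry.text")))

int __irq_entry handle_hardirq(int nr)     { return nr + 1; }
int __softirq_entry handle_softirq(int nr) { return nr + 2; }
```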

    Signed-off-by: Alexander Potapenko
    Acked-by: Steven Rostedt
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

25 Mar, 2016

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Nothing major this round. Mostly small clean ups and fixes.

    Some visible changes:

    - A new flag was added to distinguish traces done in NMI context.

    - Preempt tracer now shows functions where preemption is disabled but
    interrupts are still enabled.

    Other notes:

    - Updates were done to function tracing to allow better performance
    with perf.

    - Infrastructure code has been added to allow for a new histogram
    feature for recording live trace event histograms that can be
    configured by simple user commands. The feature itself was just
    finished, but needs a round in linux-next before being pulled.

    This only includes some infrastructure changes that will be needed"

    * tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (22 commits)
    tracing: Record and show NMI state
    tracing: Fix trace_printk() to print when not using bprintk()
    tracing: Remove redundant reset per-CPU buff in irqsoff tracer
    x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c
    tracing: Fix crash from reading trace_pipe with sendfile
    tracing: Have preempt(irqs)off trace preempt disabled functions
    tracing: Fix return while holding a lock in register_tracer()
    ftrace: Use kasprintf() in ftrace_profile_tracefs()
    ftrace: Update dynamic ftrace calls only if necessary
    ftrace: Make ftrace_hash_rec_enable return update bool
    tracing: Fix typoes in code comment and printk in trace_nop.c
    tracing, writeback: Replace cgroup path to cgroup ino
    tracing: Use flags instead of bool in trigger structure
    tracing: Add an unreg_all() callback to trigger commands
    tracing: Add needs_rec flag to event triggers
    tracing: Add a per-event-trigger 'paused' field
    tracing: Add get_syscall_name()
    tracing: Add event record param to trigger_ops.func()
    tracing: Make event trigger functions available
    tracing: Make ftrace_event_field checking functions available
    ...

    Linus Torvalds
     

23 Mar, 2016

3 commits

  • Use the more common logging method with the eventual goal of removing
    pr_warning altogether.

    Miscellanea:

    - Realign arguments
    - Coalesce formats
    - Add missing space between a few coalesced formats

    Signed-off-by: Joe Perches
    Acked-by: Rafael J. Wysocki [kernel/power/suspend.c]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • The latency tracer format has a nice column to indicate IRQ state, but
    this is not able to tell us about NMI state.

    When tracing perf interrupt handlers (which often run in NMI context)
    it is very useful to see how the events nest.
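
    One way the extra column could encode the nesting (flag bits and
    characters below are illustrative, not necessarily the kernel's):

```c
#include <assert.h>

/* Map recorded context flags to the single latency-format character; an
 * NMI bit gets its own characters so perf NMI handlers stand out. */
enum { FL_HARDIRQ = 1 << 0, FL_SOFTIRQ = 1 << 1, FL_NMI = 1 << 2 };

static char ctx_char(unsigned int flags)
{
    if (flags & FL_NMI)
        return (flags & FL_HARDIRQ) ? 'Z' : 'z';  /* NMI (inside hardirq) */
    if ((flags & (FL_HARDIRQ | FL_SOFTIRQ)) == (FL_HARDIRQ | FL_SOFTIRQ))
        return 'H';                               /* hardirq in softirq */
    if (flags & FL_HARDIRQ)
        return 'h';
    if (flags & FL_SOFTIRQ)
        return 's';
    return '.';
}
```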

    Link: http://lkml.kernel.org/r/20160318153022.105068893@infradead.org

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Steven Rostedt

    Peter Zijlstra
     
  • The trace_printk() code will allocate extra buffers if the compile detects
    that a trace_printk() is used. To do this, the format of the trace_printk()
    is saved to the __trace_printk_fmt section, and if that section is bigger
    than zero, the buffers are allocated (along with a message that this has
    happened).

    If trace_printk() uses a format that is not a constant, and thus something
    not guaranteed to be around when the print happens, the compiler optimizes
    the fmt out, as it is not used, and the __trace_printk_fmt section is not
    filled. This means the kernel will not allocate the special buffers needed
    for the trace_printk() and the trace_printk() will not write anything to the
    tracing buffer.

    Adding a "__used" to the variable in the __trace_printk_fmt section will
    keep it around, even though it is set to NULL. This will keep the string
    from being printed in the debugfs/tracing/printk_formats section as it is
    not needed.
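
    A userspace model of the fix (section and variable names illustrative,
    not the kernel's plumbing):

```c
#include <assert.h>
#include <stddef.h>

/* __used keeps the format pointer alive even though it is NULL and never
 * referenced, so the special section stays non-empty and the "is the
 * section bigger than zero?" allocation check still triggers. */
#define __used __attribute__((__used__))

static const char *trace_printk_fmt __used
        __attribute__((__section__("trace_printk_fmts"))) = NULL;

/* GNU ld provides __start_/__stop_ symbols for sections whose names are
 * valid C identifiers; the kernel's linker script does the equivalent. */
extern const char *__start_trace_printk_fmts[];
extern const char *__stop_trace_printk_fmts[];

static long fmt_section_entries(void)
{
    return __stop_trace_printk_fmts - __start_trace_printk_fmts;
}
```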

    Reported-by: Vlastimil Babka
    Fixes: 07d777fe8c398 "tracing: Add percpu buffers for trace_printk()"
    Cc: stable@vger.kernel.org # v3.5+
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

20 Mar, 2016

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support more Realtek wireless chips, from Jes Sorenson.

    2) New BPF types for per-cpu hash and array maps, from Alexei
    Starovoitov.

    3) Make several TCP sysctls per-namespace, from Nikolay Borisov.

    4) Allow the use of SO_REUSEPORT in order to do per-thread processing
    of incoming TCP/UDP connections. The muxing can be done using a
    BPF program which hashes the incoming packet. From Craig Gallek.

    5) Add a multiplexer for TCP streams, to provide a messaged based
    interface. BPF programs can be used to determine the message
    boundaries. From Tom Herbert.

    6) Add 802.1AE MACSEC support, from Sabrina Dubroca.

    7) Avoid factorial complexity when taking down an inetdev interface
    with lots of configured addresses. We were doing things like
    traversing the entire address list for each address removed, and
    flushing the entire netfilter conntrack table for every address as
    well.

    8) Add and use SKB bulk free infrastructure, from Jesper Brouer.

    9) Allow offloading u32 classifiers to hardware, and implement for
    ixgbe, from John Fastabend.

    10) Allow configuring IRQ coalescing parameters on a per-queue basis,
    from Kan Liang.

    11) Extend ethtool so that larger link mode masks can be supported.
    From David Decotigny.

    12) Introduce devlink, which can be used to configure port link types
    (ethernet vs Infiniband, etc.), port splitting, and switch device
    level attributes as a whole. From Jiri Pirko.

    13) Hardware offload support for flower classifiers, from Amir Vadai.

    14) Add "Local Checksum Offload". Basically, for a tunneled packet
    the checksum of the outer header is 'constant' (because with the
    checksum field filled into the inner protocol header, the payload
    of the outer frame checksums to 'zero'), and we can take advantage
    of that in various ways. From Edward Cree"
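
    The "checksums to zero" property quoted above follows from the Internet
    checksum definition; a minimal demonstration with a plain
    one's-complement sum:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* One's-complement sum over 16-bit words, carries folded back in. */
static uint16_t ocsum(const uint16_t *w, size_t n)
{
    uint32_t s = 0;

    while (n--)
        s += *w++;
    while (s >> 16)
        s = (s & 0xffff) + (s >> 16);   /* fold carries */
    return (uint16_t)s;
}

/* Once the checksum field holds the complement of the rest, the whole
 * packet folds to 0xFFFF, i.e. its own checksum is zero - the constant
 * an outer tunnel header can rely on. */
static int lco_demo(void)
{
    uint16_t pkt[4] = { 0x1234, 0xabcd, 0x0000 /* csum */, 0x4242 };

    pkt[2] = (uint16_t)~ocsum(pkt, 4);  /* fill the checksum field */
    return ocsum(pkt, 4) == 0xffff;
}
```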

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
    bonding: fix bond_get_stats()
    net: bcmgenet: fix dma api length mismatch
    net/mlx4_core: Fix backward compatibility on VFs
    phy: mdio-thunder: Fix some Kconfig typos
    lan78xx: add ndo_get_stats64
    lan78xx: handle statistics counter rollover
    RDS: TCP: Remove unused constant
    RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
    net: smc911x: convert pxa dma to dmaengine
    team: remove duplicate set of flag IFF_MULTICAST
    bonding: remove duplicate set of flag IFF_MULTICAST
    net: fix a comment typo
    ethernet: micrel: fix some error codes
    ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
    bpf, dst: add and use dst_tclassid helper
    bpf: make skb->tc_classid also readable
    net: mvneta: bm: clarify dependencies
    cls_bpf: reset class and reuse major in da
    ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
    ldmvsw: Add ldmvsw.c driver code
    ...

    Linus Torvalds
     

19 Mar, 2016

3 commits

    There is no reason to do it twice: since commit b6f11df26fdc28
    ("trace: Call tracing_reset_online_cpus before tracer->init()")
    the resetting of per-CPU buffers is done before the tracer->init() call.

    tracer->init() calls {irqs,preempt,preemptirqs}off_tracer_init(), which
    calls __irqsoff_tracer_init(), and that resets the per-CPU ring buffer
    a second time.
    It's a slow path, but still worth avoiding.

    Link: http://lkml.kernel.org/r/1445278226-16187-1-git-send-email-0x7f454c46@gmail.com

    Signed-off-by: Dmitry Safonov
    Signed-off-by: Steven Rostedt

    Dmitry Safonov
     
  • If tracing contains data and the trace_pipe file is read with sendfile(),
    then it can trigger a NULL pointer dereference and various BUG_ON within the
    VM code.

    There's a patch to fix this in the splice_to_pipe() code, but it's also a
    good idea to not let that happen from trace_pipe either.

    Link: http://lkml.kernel.org/r/1457641146-9068-1-git-send-email-rabin@rab.in

    Cc: stable@vger.kernel.org # 2.6.30+
    Reported-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Joel Fernandes reported that the function tracing of preempt disabled
    sections was not being reported when running either the preemptirqsoff or
    preemptoff tracers. This was due to the fact that the function tracer
    callback for those tracers checked if irqs were disabled before tracing. But
    this fails when we want to trace preempt off locations as well.

    Joel explained that he wanted to see functions where interrupts are enabled
    but preemption was disabled. The expected output he wanted:

    -2265 1d.h1 3419us : preempt_count_sub
    -2265 1d..1 3419us : __do_softirq
    -2265 1d..1 3419us : msecs_to_jiffies
    -2265 1d..1 3420us : irqtime_account_irq
    -2265 1d..1 3420us : __local_bh_disable_ip
    -2265 1..s1 3421us : run_timer_softirq
    -2265 1..s1 3421us : hrtimer_run_pending
    -2265 1..s1 3421us : _raw_spin_lock_irq
    -2265 1d.s1 3422us : preempt_count_add
    -2265 1d.s2 3422us : _raw_spin_unlock_irq
    -2265 1..s2 3422us : preempt_count_sub
    -2265 1..s1 3423us : rcu_bh_qs
    -2265 1d.s1 3423us : irqtime_account_irq
    -2265 1d.s1 3423us : __local_bh_enable

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

18 Mar, 2016

4 commits

    commit d39cdd2036a6 ("tracing: Make tracer_flags use the right set_flag
    callback") introduces a potential mutex deadlock issue, as it forgets to
    release the mutex when allocating the tracer_flags fails.

    The issue was found by Dan Carpenter with the Smatch static analysis tool.

    Link: http://lkml.kernel.org/r/1457958941-30265-1-git-send-email-chuhu@redhat.com

    Fixes: d39cdd2036a6 ("tracing: Make tracer_flags use the right set_flag callback")
    Reported-by: Dan Carpenter
    Signed-off-by: Chunyu Hu
    Signed-off-by: Steven Rostedt

    Chunyu Hu
     
  • Use kasprintf() instead of kmalloc() and snprintf().
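
    The pattern being replaced and its one-call equivalent can be sketched
    in portable userspace C (kasprintf_like is a stand-in, not the kernel
    API):

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* What kasprintf() bundles: measure the formatted length, allocate, then
 * format - replacing an explicit kmalloc() + snprintf() pair at each
 * call site. */
static char *kasprintf_like(const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf;
    int len;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap);          /* measure */
    va_end(ap);
    if (len < 0 || !(buf = malloc((size_t)len + 1))) {
        va_end(ap2);
        return NULL;
    }
    vsnprintf(buf, (size_t)len + 1, fmt, ap2);  /* format */
    va_end(ap2);
    return buf;                                 /* caller frees */
}
```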

    Link: http://lkml.kernel.org/r/135a7bc36e51fd9eaa57124dd2140285b771f738.1458050835.git.geliangtang@163.com

    Acked-by: Namhyung Kim
    Signed-off-by: Geliang Tang
    Signed-off-by: Steven Rostedt

    Geliang Tang
     
  • Currently dynamic ftrace calls are updated any time
    the ftrace_ops is un/registered. If we do this update
    only when it's needed, we save a lot of time for perf
    system wide ftrace function sampling/counting.

    The reason is that for system wide sampling/counting,
    perf creates event for each cpu in the system.

    Each event then registers separate copy of ftrace_ops,
    which ends up in FTRACE_UPDATE_CALLS updates. On servers
    with many cpus that means serious stall (240 cpus server):

    Counting:
    # time ./perf stat -e ftrace:function -a sleep 1

    Performance counter stats for 'system wide':

    370,663 ftrace:function

    1.401427505 seconds time elapsed

    real 3m51.743s
    user 0m0.023s
    sys 3m48.569s

    Sampling:
    # time ./perf record -e ftrace:function -a sleep 1
    [ perf record: Woken up 0 times to write data ]
    Warning:
    Processed 141200 events and lost 5 chunks!

    [ perf record: Captured and wrote 10.703 MB perf.data (135950 samples) ]

    real 2m31.429s
    user 0m0.213s
    sys 2m29.494s

    There's no reason to do the FTRACE_UPDATE_CALLS update
    for each event in the perf case, because all the ftrace_ops
    always share the same filter, so the updated calls are
    always the same.

    It's required that only first ftrace_ops registration
    does the FTRACE_UPDATE_CALLS update (also sometimes
    the second if the first one used the trampoline), but
    the rest can be only cheaply linked into the ftrace_ops
    list.
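
    The saving can be modeled with a toy registration counter (names
    illustrative; the real logic also accounts for trampolines):

```c
#include <assert.h>

static int nr_ops;           /* registered ftrace_ops (one per event/cpu) */
static int nr_call_updates;  /* expensive FTRACE_UPDATE_CALLS passes */

/* All perf ops share one filter, so only the first registration needs to
 * patch the call sites; the rest are just linked into the ops list. */
static void register_shared_ops(void)
{
    if (nr_ops++ == 0)
        nr_call_updates++;   /* patch every traced call site */
    /* else: cheap list insertion only */
}
```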

    Counting:
    # time ./perf stat -e ftrace:function -a sleep 1

    Performance counter stats for 'system wide':

    398,571 ftrace:function

    1.377503733 seconds time elapsed

    real 0m2.787s
    user 0m0.005s
    sys 0m1.883s

    Sampling:
    # time ./perf record -e ftrace:function -a sleep 1
    [ perf record: Woken up 0 times to write data ]
    Warning:
    Processed 261730 events and lost 9 chunks!

    [ perf record: Captured and wrote 19.907 MB perf.data (256293 samples) ]

    real 1m31.948s
    user 0m0.309s
    sys 1m32.051s

    Link: http://lkml.kernel.org/r/1458138873-1553-6-git-send-email-jolsa@kernel.org

    Acked-by: Namhyung Kim
    Signed-off-by: Jiri Olsa
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     
    Change __ftrace_hash_rec_update to return true in case
    we need to update dynamic ftrace call records. It returns
    false in case no update is needed.
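
    A toy model of the contract (illustrative, not the kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Report whether the set of enabled call records actually changed: going
 * from zero records to some, or from some to zero, is what forces the
 * expensive dynamic-call update; anything else can be skipped. */
static bool hash_rec_update(unsigned int *enabled, int delta)
{
    unsigned int before = *enabled;

    *enabled += delta;
    return (before == 0) != (*enabled == 0);
}
```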

    Link: http://lkml.kernel.org/r/1458138873-1553-5-git-send-email-jolsa@kernel.org

    Acked-by: Namhyung Kim
    Signed-off-by: Jiri Olsa
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     

17 Mar, 2016

1 commit

  • Pull power management and ACPI updates from Rafael Wysocki:
    "This time the majority of changes go into cpufreq and they are
    significant.

    First off, the way CPU frequency updates are triggered is different
    now. Instead of having to set up and manage a deferrable timer for
    each CPU in the system to evaluate and possibly change its frequency
    periodically, cpufreq governors set up callbacks to be invoked by the
    scheduler on a regular basis (basically on utilization updates). The
    "old" governors, "ondemand" and "conservative", still do all of their
    work in process context (although that is triggered by the scheduler
    now), but intel_pstate does it all in the callback invoked by the
    scheduler with no need for any additional asynchronous processing.

    Of course, this eliminates the overhead related to the management of
    all those timers, but also it allows the cpufreq governor code to be
    simplified quite a bit. On top of that, the common code and data
    structures used by the "ondemand" and "conservative" governors are
    cleaned up and made more straightforward and some long-standing and
    quite annoying problems are addressed. In particular, the handling of
    governor sysfs attributes is modified and the related locking becomes
    more fine grained which allows some concurrency problems to be avoided
    (particularly deadlocks with the core cpufreq code).

    In principle, the new mechanism for triggering frequency updates
    allows utilization information to be passed from the scheduler to
    cpufreq. Although the current code doesn't make use of it, in the
    works is a new cpufreq governor that will make decisions based on the
    scheduler's utilization data. That should allow the scheduler and
    cpufreq to work more closely together in the long run.

    In addition to the core and governor changes, cpufreq drivers are
    updated too. Fixes and optimizations go into intel_pstate, the
    cpufreq-dt driver is updated on top of some modification in the
    Operating Performance Points (OPP) framework and there are fixes and
    other updates in the powernv cpufreq driver.

    Apart from the cpufreq updates there is some new ACPICA material,
    including a fix for a problem introduced by previous ACPICA updates,
    and some less significant changes in the ACPI code, like CPPC code
    optimizations, ACPI processor driver cleanups and support for loading
    ACPI tables from initrd.

    Also updated are the generic power domains framework, the Intel RAPL
    power capping driver and the turbostat utility and we have a bunch of
    traditional assorted fixes and cleanups.

    Specifics:

    - Redesign of cpufreq governors and the intel_pstate driver to make
    them use callbacks invoked by the scheduler to trigger CPU
    frequency evaluation instead of using per-CPU deferrable timers for
    that purpose (Rafael Wysocki).

    - Reorganization and cleanup of cpufreq governor code to make it more
    straightforward and fix some concurrency problems in it (Rafael
    Wysocki, Viresh Kumar).

    - Cleanup and improvements of locking in the cpufreq core (Viresh
    Kumar).

    - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
    Kumar, Eric Biggers).

    - intel_pstate driver updates including fixes, optimizations and a
    modification to make it enable hardware-coordinated P-state
    selection (HWP) by default if supported by the processor (Philippe
    Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
    Franciosi).

    - Operating Performance Points (OPP) framework updates to improve its
    handling of voltage regulators and device clocks and updates of the
    cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).

    - Updates of the powernv cpufreq driver to fix initialization and
    cleanup problems in it and correct its worker thread handling with
    respect to CPU offline, new powernv_throttle tracepoint (Shilpasri
    Bhat).

    - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).

    - ACPICA updates including one fix for a regression introduced by
    previous changes in the ACPICA code (Bob Moore, Lv Zheng, David Box,
    Colin Ian King).

    - Support for installing ACPI tables from initrd (Lv Zheng).

    - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
    Chaugule).

    - Support for _HID(ACPI0010) devices (ACPI processor containers) and
    ACPI processor driver cleanups (Sudeep Holla).

    - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
    Aleksey Makarov).

    - Modification of the ACPI PCI IRQ management code to make it treat
    255 in the Interrupt Line register as "not connected" on x86 (as
    per the specification) and avoid attempts to use that value as a
    valid interrupt vector (Chen Fan).

    - ACPI APEI fixes related to resource leaks (Josh Hunt).

    - Removal of modularity from a few ACPI drivers (BGRT, GHES,
    intel_pmic_crc) that cannot be built as modules in practice (Paul
    Gortmaker).

    - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
    as a valid resource type (Harb Abdulhamid).

    - New device ID (future AMD I2C controller) in the ACPI driver for
    AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).

    - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).

    - cpuidle menu governor optimization to avoid a square root
    computation in it (Rasmus Villemoes).

    - Fix for potential use-after-free in the generic device properties
    framework (Heikki Krogerus).

    - Updates of the generic power domains (genpd) framework including
    support for multiple power states of a domain, fixes and debugfs
    output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
    Geert Uytterhoeven).

    - Intel RAPL power capping driver updates to reduce IPI overhead in
    it (Jacob Pan).

    - System suspend/hibernation code cleanups (Eric Biggers, Saurabh
    Sengar).

    - Year 2038 fix for the process freezer (Abhilash Jindal).

    - turbostat utility updates including new features (decoding of more
    registers and CPUID fields, sub-second intervals support, GFX MHz
    and RC6 printout, --out command line option), fixes (syscall jitter
    detection and workaround, reduction of the number of syscalls
    made, fixes related to Xeon x200 processors, compiler warning
    fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)"

    * tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits)
    tools/power turbostat: bugfix: TDP MSRs print bits fixing
    tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
    tools/power turbostat: call __cpuid() instead of __get_cpuid()
    tools/power turbostat: indicate SMX and SGX support
    tools/power turbostat: detect and work around syscall jitter
    tools/power turbostat: show GFX%rc6
    tools/power turbostat: show GFXMHz
    tools/power turbostat: show IRQs per CPU
    tools/power turbostat: make fewer systems calls
    tools/power turbostat: fix compiler warnings
    tools/power turbostat: add --out option for saving output in a file
    tools/power turbostat: re-name "%Busy" field to "Busy%"
    tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
    tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
    tools/power turbostat: allow sub-sec intervals
    ACPI / APEI: ERST: Fixed leaked resources in erst_init
    ACPI / APEI: Fix leaked resources
    intel_pstate: Do not skip samples partially
    intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
    intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
    ...

    Linus Torvalds
     

15 Mar, 2016

1 commit

  • Pull perf updates from Ingo Molnar:
    "Main kernel side changes:

    - Big reorganization of the x86 perf support code. The old code grew
    organically deep inside arch/x86/kernel/cpu/perf* and its naming
    became somewhat messy.

    The new location is under arch/x86/events/, using the following
    cleaner hierarchy of source code files:

    perf/x86: Move perf_event.c .................. => x86/events/core.c
    perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c
    perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c
    perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch]
    perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c
    perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c
    perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c
    perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c
    perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c
    perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c
    perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c
    perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch]
    perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c
    perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch]
    perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nhmex.c
    perf/x86: Move perf_event_intel_uncore_snb.c => x86/events/intel/uncore_snb.c
    perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c
    perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c
    perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c
    perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c
    perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c

    (Borislav Petkov)

    - Update various x86 PMU constraint and hw support details (Stephane
    Eranian)

    - Optimize kprobes for BPF execution (Martin KaFai Lau)

    - Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas
    Gleixner)

    - Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner)

    - Various fixes and smaller cleanups.

    There are lots of perf tooling updates as well. A few highlights:

    perf report/top:

    - Hierarchy histogram mode for 'perf top' and 'perf report',
    showing multiple levels, one per --sort entry: (Namhyung Kim)

    On a mostly idle system:

    # perf top --hierarchy -s comm,dso

    Then expand some levels and use 'P' to take a snapshot:

    # cat perf.hist.0
    - 92.32% perf
    58.20% perf
    22.29% libc-2.22.so
    5.97% [kernel]
    4.18% libelf-0.165.so
    1.69% [unknown]
    - 4.71% qemu-system-x86
    3.10% [kernel]
    1.60% qemu-system-x86_64 (deleted)
    + 2.97% swapper
    #

    - Add 'L' hotkey to dynamically set the percent threshold for
    histogram entries and callchains, i.e. dynamically do what the
    --percent-limit command line option to 'top' and 'report' does.
    (Namhyung Kim)

    perf mem:

    - Allow specifying events via -e in 'perf mem record', also listing
    what events can be specified via 'perf mem record -e list' (Jiri
    Olsa)

    perf record:

    - Add 'perf record' --all-user/--all-kernel options, so that one
    can tell that all the events in the command line should be
    restricted to the user or kernel levels (Jiri Olsa), i.e.:

    perf record -e cycles:u,instructions:u

    is equivalent to:

    perf record --all-user -e cycles,instructions

    - Make 'perf record' collect CPU cache info in the perf.data file header:

    $ perf record usleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
    $ perf report --header-only -I | tail -10 | head -8
    # CPU cache info:
    # L1 Data 32K [0-1]
    # L1 Instruction 32K [0-1]
    # L1 Data 32K [2-3]
    # L1 Instruction 32K [2-3]
    # L2 Unified 256K [0-1]
    # L2 Unified 256K [2-3]
    # L3 Unified 4096K [0-3]

    Will be used in 'perf c2c' and eventually in 'perf diff' to
    allow, for instance running the same workload in multiple
    machines and then when using 'diff' show the hardware difference.
    (Jiri Olsa)

    - Improved support for Java, using the JVMTI agent library to do
    jitdumps that then will be inserted in synthesized
    PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized
    ELF files stored in ~/.debug and keyed with build-ids, to allow
    symbol resolution and even annotation with source line info, see
    the changeset comments to see how to use it (Stephane Eranian)

    perf script/trace:

    - Decode data_src values (e.g. perf.data files generated by 'perf
    mem record') in 'perf script': (Jiri Olsa)

    # perf script
    perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P: ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    - Improve support to 'data_src', 'weight' and 'addr' fields in
    'perf script' (Jiri Olsa)

    - Handle empty print fmts in 'perf script -s' i.e. when running
    python or perl scripts (Taeung Song)

    perf stat:

    - 'perf stat' now shows shadow metrics (insn per cycle, etc) in
    interval mode too. E.g:

    # perf stat -I 1000 -e instructions,cycles sleep 1
    # time counts unit events
    1.000215928 519,620 instructions # 0.69 insn per cycle
    1.000215928 752,003 cycles

    - Port 'perf kvm stat' to PowerPC (Hemant Kumar)

    - Implement CSV metrics output in 'perf stat' (Andi Kleen)

    perf BPF support:

    - Support converting data from bpf events in 'perf data' (Wang Nan)

    - Print bpf-output events in 'perf script': (Wang Nan).

    # perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000
    # perf script
    usleep 4882 21384.532523: evt: ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms])
    BPF output: 0000: 52 61 69 73 65 20 61 20 Raise a
    0008: 42 50 46 20 65 76 65 6e BPF even
    0010: 74 21 00 00 t!..
    BPF string: "Raise a BPF event!"
    #

    - Add API to set values of map entries in a BPF object, be it
    individual map slots or ranges (Wang Nan)

    - Introduce support for the 'bpf-output' event (Wang Nan)

    - Add glue to read perf events in a BPF program (Wang Nan)

    - Improve support for bpf-output events in 'perf trace' (Wang Nan)

    ... and tons of other changes as well - see the shortlog and git log
    for details!"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits)
    perf stat: Add --metric-only support for -A
    perf stat: Implement --metric-only mode
    perf stat: Document CSV format in manpage
    perf hists browser: Check sort keys before hot key actions
    perf hists browser: Allow thread filtering for comm sort key
    perf tools: Add sort__has_comm variable
    perf tools: Recalc total periods using top-level entries in hierarchy
    perf tools: Remove nr_sort_keys field
    perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry()
    perf tools: Remove hist_entry->fmt field
    perf tools: Fix command line filters in hierarchy mode
    perf tools: Add more sort entry check functions
    perf tools: Fix hist_entry__filter() for hierarchy
    perf jitdump: Build only on supported archs
    tools lib traceevent: Add '~' operation within arg_num_eval()
    perf tools: Omit unnecessary cast in perf_pmu__parse_scale
    perf tools: Pass perf_hpp_list all the way through setup_sort_list
    perf tools: Fix perf script python database export crash
    perf jitdump: DWARF is also needed
    perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes
    ...

    Linus Torvalds
     

14 Mar, 2016

1 commit

  • * pm-cpufreq: (94 commits)
    intel_pstate: Do not skip samples partially
    intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
    intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
    intel_pstate: Optimize calculation for max/min_perf_adj
    intel_pstate: Remove extra conversions in pid calculation
    cpufreq: Move scheduler-related code to the sched directory
    Revert "cpufreq: postfix policy directory with the first CPU in related_cpus"
    cpufreq: Reduce cpufreq_update_util() overhead a bit
    cpufreq: Select IRQ_WORK if CPU_FREQ_GOV_COMMON is set
    cpufreq: Remove 'policy->governor_enabled'
    cpufreq: Rename __cpufreq_governor() to cpufreq_governor()
    cpufreq: Relocate handle_update() to kill its declaration
    cpufreq: governor: Drop unnecessary checks from show() and store()
    cpufreq: governor: Fix race in dbs_update_util_handler()
    cpufreq: governor: Make gov_set_update_util() static
    cpufreq: governor: Narrow down the dbs_data_mutex coverage
    cpufreq: governor: Make dbs_data_mutex static
    cpufreq: governor: Relocate definitions of tuners structures
    cpufreq: governor: Move per-CPU data to the common code
    cpufreq: governor: Make governor private data per-policy
    ...

    Rafael J. Wysocki
     

09 Mar, 2016

12 commits

    If a kprobe is placed within the update or delete hash map helpers
    that hold the bucket spin lock, and the triggered bpf program tries
    to grab the spinlock for the same bucket on the same cpu, it will
    deadlock.
    Fix it by extending the existing recursion prevention mechanism.

    Note, map_lookup and other tracing helpers don't have this problem,
    since they don't hold any locks and don't modify global data.
    bpf_trace_printk has its own recursion check and is ok as well.
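
    The guard can be modeled with a simple "active" flag (the kernel uses
    a per-cpu counter; this single-threaded sketch only shows the shape):

```c
#include <assert.h>

static int prog_active;  /* per-cpu in the kernel */

/* Refuse re-entry: a kprobe firing inside a map helper that already holds
 * the bucket lock would otherwise try to take the same lock again. */
static int run_bpf_prog(int (*prog)(void))
{
    int ret;

    if (prog_active)
        return -1;       /* recursion refused */
    prog_active = 1;
    ret = prog();
    prog_active = 0;
    return ret;
}

static int inner_prog(void)  { return 7; }
static int nested_prog(void) { return run_bpf_prog(inner_prog); }
```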

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Several cases of overlapping changes, as well as one instance
    (vxlan) of a bug fix in 'net' overlapping with code movement
    in 'net-next'.

    Signed-off-by: David S. Miller

    David S. Miller
     
    echo nop > /sys/kernel/debug/tracing/current_tracer
    echo 1 > /sys/kernel/debug/tracing/options/test_nop_accept
    echo 0 > /sys/kernel/debug/tracing/options/test_nop_accept
    echo 1 > /sys/kernel/debug/tracing/options/test_nop_refuse

    Before the fix, the dmesg output is a bit ugly due to an alignment issue.

    [ 191.973081] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result
    [ 195.156942] nop_test_refuse flag set to 1: we refuse.Now cat trace_options to see the result

    After the fix, dmesg will show aligned logs for nop_test_refuse and nop_test_accept.

    [ 2718.032413] nop_test_refuse flag set to 1: we refuse. Now cat trace_options to see the result
    [ 2734.253360] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result

    Link: http://lkml.kernel.org/r/1457444222-8654-2-git-send-email-chuhu@redhat.com

    Signed-off-by: Chunyu Hu
    Signed-off-by: Steven Rostedt

    Chunyu Hu
     
  • gcc isn't known for handling bool well in structures. Instead of using
    bool members, use an integer mask with bit flags.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Add a new unreg_all() callback that can be used to remove all
    command-specific triggers from an event and arrange to have it called
    whenever a trigger file is opened with O_TRUNC set.

    Commands that don't want truncate semantics, or existing commands that
    don't implement this function simply do nothing and their triggers
    remain intact.

    Link: http://lkml.kernel.org/r/2b7d62854d01f28c19185e1bbb8f826f385edfba.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Add a new needs_rec flag for triggers that require unconditional
    access to trace records in order to function.

    Normally a trigger requires access to the contents of a trace record
    only if it has a filter associated with it (since filters need the
    contents of a record in order to make a filtering decision). Some
    types of triggers, such as 'hist' triggers, require access to trace
    record contents independent of the presence of filters, so add a new
    flag for those triggers.

    Link: http://lkml.kernel.org/r/7be8fa38f9b90fdb6c47ca0f98d20a07b9fd512b.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Add a simple per-trigger 'paused' flag, allowing individual triggers
    to pause. We could leave it to individual triggers that need this
    functionality to do it themselves, but we also want to allow other
    events to control pausing, so add it to the trigger data.

    Link: http://lkml.kernel.org/r/fed37e4879684d7dcc57fe00ce0cbf170032b06d.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Add a utility function to grab the syscall name from the syscall
    metadata, given a syscall id.

    Link: http://lkml.kernel.org/r/be26a8dfe3f15e16a837799f1c1e2b4d62742843.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Some triggers may need access to the trace event, so pass it in. Also
    fix up the existing trigger funcs and their callers.

    Link: http://lkml.kernel.org/r/543e31e9fc445ef61077421ab219033401c39846.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Make various event trigger utility functions available outside of
    trace_events_trigger.c so that new triggers can be defined outside of
    that file.

    Link: http://lkml.kernel.org/r/4a40c1695dd43cac6cd475d72e13ffe30ba84bff.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Make is_string_field() and is_function_field() accessible outside of
    trace_event_filters.c for other users of ftrace_event_fields.

    Link: http://lkml.kernel.org/r/2d3f00d3311702e556e82eed7754bae6f017939f.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Reviewed-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • When I was updating the ftrace_stress test of LTP, I encountered
    a strange phenomenon; execute the following steps:

    echo nop > /sys/kernel/debug/tracing/current_tracer
    echo 0 > /sys/kernel/debug/tracing/options/funcgraph-cpu
    bash: echo: write error: Invalid argument

    check dmesg:
    [ 1024.903855] nop_test_refuse flag set to 0: we refuse.Now cat trace_options to see the result

    The reason is that the trace option test will randomly set up trace
    options under tracing/options no matter what the current_tracer is,
    but set_tracer_option always uses the set_flag callback
    from the current_tracer. This patch adds a pointer to tracer_flags
    and makes it point to the tracer it belongs to. When the option is
    set, the set_flag of the right tracer will be used no matter
    what the current_tracer is.

    The old dummy_tracer_flags was shared by all the tracers that
    don't have a tracer_flags of their own, so it cannot be used to save
    the pointer of a single tracer. Remove it and use a dynamic dummy
    tracer_flags for each tracer needing one; as a result, no tracers
    share tracer_flags, so the sharing check code can be removed as well.

    Saving the current tracer in trace_option_dentry is not a good
    alternative either, as it may waste memory when the debug/trace fs is
    mounted more than once.

    Link: http://lkml.kernel.org/r/1457444222-8654-1-git-send-email-chuhu@redhat.com

    Signed-off-by: Chunyu Hu
    [ Fixed up function tracer options to work with the change ]
    Signed-off-by: Steven Rostedt

    Chunyu Hu
     

05 Mar, 2016

1 commit

  • …t/rostedt/linux-trace

    Pull tracing fix from Steven Rostedt:
    "A feature was added in 4.3 that allowed users to filter trace points
    on a task's "comm" field. But this prevented filtering on a comm field
    that is within a trace event (like sched_migrate_task).

    When trying to filter on when a program migrated, this change
    prevented filtering sched_migrate_task events.

    To fix this, the event fields are examined first, and then the extra
    fields like "comm" and "cpu" are examined. Also, instead of testing
    to assign the comm filter function based on the field's name, the
    generic comm field is given a new filter type (FILTER_COMM). When
    this field is used to filter the type is checked. The same is done
    for the cpu filter field.

    Two new special filter types are added: "COMM" and "CPU". This allows
    users to still filter on the task's comm for events that have "comm" as
    one of their fields, in cases where users would like to filter
    sched_migrate_task on the comm of the task that called the event, and
    not the comm of the task that is being migrated"

    * tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Do not have 'comm' filter override event 'comm' field

    Linus Torvalds
     

04 Mar, 2016

1 commit

  • Commit 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and
    process names" added a 'comm' filter that will filter events based on the
    current task's struct 'comm'. But this now hides the ability to filter events
    that have a 'comm' field too. For example, sched_migrate_task trace event.
    That has a 'comm' field of the task to be migrated.

    echo 'comm == "bash"' > events/sched_migrate_task/filter

    will now filter all sched_migrate_task events for tasks named "bash" that
    migrate other tasks (in interrupt context), instead of seeing when "bash"
    itself gets migrated.

    This fix requires a couple of changes.

    1) Change the look up order for filter predicates to look at the events
    fields before looking at the generic filters.

    2) Instead of basing the filter function off of the "comm" name, have the
    generic "comm" filter have its own filter_type (FILTER_COMM). Test
    against the type instead of the name to assign the filter function.

    3) Add a new "COMM" filter that works just like "comm" but will filter based
    on the current task, even if the trace event contains a "comm" field.

    Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".

    Cc: stable@vger.kernel.org # v4.3+
    Fixes: 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and process names"
    Reported-by: Matt Fleming
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

29 Feb, 2016

2 commits

  • Some tracepoints have multiple fields with the same name, "nr": the first
    one is a unique syscall ID, the other is a syscall argument:

    # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_io_getevents/format
    name: sys_enter_io_getevents
    ID: 747
    format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:int nr; offset:8; size:4; signed:1;
    field:aio_context_t ctx_id; offset:16; size:8; signed:0;
    field:long min_nr; offset:24; size:8; signed:0;
    field:long nr; offset:32; size:8; signed:0;
    field:struct io_event * events; offset:40; size:8; signed:0;
    field:struct timespec * timeout; offset:48; size:8; signed:0;

    print fmt: "ctx_id: 0x%08lx, min_nr: 0x%08lx, nr: 0x%08lx, events: 0x%08lx, timeout: 0x%08lx", ((unsigned long)(REC->ctx_id)), ((unsigned long)(REC->min_nr)), ((unsigned long)(REC->nr)), ((unsigned long)(REC->events)), ((unsigned long)(REC->timeout))
    #

    Fix it by renaming the "/format" common tracepoint field "nr" to "__syscall_nr".

    Signed-off-by: Taeung Song
    [ Do not rename the struct member, just the '/format' field name ]
    Signed-off-by: Steven Rostedt
    Acked-by: Peter Zijlstra
    Cc: Jiri Olsa
    Cc: Lai Jiangshan
    Cc: Namhyung Kim
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20160226132301.3ae065a4@gandalf.local.home
    Signed-off-by: Arnaldo Carvalho de Melo

    Taeung Song
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

26 Feb, 2016

1 commit

  • …git/rostedt/linux-trace

    Pull tracing fix from Steven Rostedt:
    "Another small bug reported to me by Chunyu Hu.

    When perf added a "reg" function to the function tracing event (not a
    tracepoint), it caused that event to be displayed as a tracepoint and
    could cause errors in tracepoint handling. That was solved by adding
    a flag to ignore ftrace non-tracepoint events. But that flag was
    missed when displaying events in available_events, which should only
    contain tracepoint events.

    This broke a documented way to enable all events with:

    cat available_events > set_event

    As the function non-tracepoint event would cause that to error out.
    The commit here fixes that by having the available_events file not
    list events that have the ignore flag set"

    * tag 'trace-fixes-v4.5-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Fix showing function event in available_events

    Linus Torvalds
     

24 Feb, 2016

1 commit

  • The ftrace:function event is only displayed for parsing the function tracer
    data. It is not used to enable function tracing, and does not include an
    "enable" file in its event directory.

    Originally, this event was kept separate from other events because it did
    not have a ->reg parameter. But perf added a "reg" parameter for its use,
    which caused issues because it made the event available in contexts
    it was not compatible with.

    Commit 9b63776fa3ca9 "tracing: Do not enable function event with enable"
    added a TRACE_EVENT_FL_IGNORE_ENABLE flag that prevented the function event
    from being enabled by normal trace events. But this commit missed keeping
    the function event from being displayed by the "available_events" directory,
    which is used to show what events can be enabled by set_event.

    One documented way to enable all events is to:

    cat available_events > set_event

    But because the function event is displayed in the available_events, this
    now causes an INVALID error:

    cat: write error: Invalid argument

    Reported-by: Chunyu Hu
    Fixes: 9b63776fa3ca9 "tracing: Do not enable function event with enable"
    Cc: stable@vger.kernel.org # 3.4+
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

23 Feb, 2016

2 commits

  • Conflicts:
    drivers/net/phy/bcm7xxx.c
    drivers/net/phy/marvell.c
    drivers/net/vxlan.c

    All three conflicts were cases of simple overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • …t/rostedt/linux-trace

    Pull tracing fixes from Steven Rostedt:
    "Two more small fixes.

    One is by Yang Shi who added a READ_ONCE_NOCHECK() to the scan of the
    stack made by the stack tracer. As the stack tracer scans the entire
    kernel stack, KASAN triggers, seeing it as a "stack out of bounds"
    error, since the scan looks at the contents of the stacks of
    parent functions. The NOCHECK() tells KASAN that this is done on
    purpose and is not some kind of stack overflow.

    The second fix is to the ftrace selftests, to retrieve the PID of
    executed commands from the shell with '$!' and not by parsing 'jobs'"

    * tag 'trace-fixes-v4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing, kasan: Silence Kasan warning in check_stack of stack_tracer
    ftracetest: Fix instance test to use proper shell command for pids

    Linus Torvalds
     

20 Feb, 2016

2 commits

  • add new map type to store stack traces and corresponding helper
    bpf_get_stackid(ctx, map, flags) - walk user or kernel stack and return id
    @ctx: struct pt_regs*
    @map: pointer to stack_trace map
    @flags: bits 0-7 - number of stack frames to skip
    bit 8 - collect user stack instead of kernel
    bit 9 - compare stacks by hash only
    bit 10 - if two different stacks hash into the same stackid
    discard old
    other bits - reserved
    Return: >= 0 stackid on success or negative error

    stackid is a 32-bit integer handle that can be further combined with
    other data (including other stackid) and used as a key into maps.

    Userspace will access stackmap using standard lookup/delete syscall commands to
    retrieve full stack trace for given stackid.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • When enabling stack trace via "echo 1 > /proc/sys/kernel/stack_tracer_enabled",
    the below KASAN warning is triggered:

    BUG: KASAN: stack-out-of-bounds in check_stack+0x344/0x848 at addr ffffffc0689ebab8
    Read of size 8 by task ksoftirqd/4/29
    page:ffffffbdc3a27ac0 count:0 mapcount:0 mapping: (null) index:0x0
    flags: 0x0()
    page dumped because: kasan: bad access detected
    CPU: 4 PID: 29 Comm: ksoftirqd/4 Not tainted 4.5.0-rc1 #129
    Hardware name: Freescale Layerscape 2085a RDB Board (DT)
    Call trace:
    [] dump_backtrace+0x0/0x3a0
    [] show_stack+0x24/0x30
    [] dump_stack+0xd8/0x168
    [] kasan_report_error+0x6a0/0x920
    [] kasan_report+0x70/0xb8
    [] __asan_load8+0x60/0x78
    [] check_stack+0x344/0x848
    [] stack_trace_call+0x1c4/0x370
    [] ftrace_ops_no_ops+0x2c0/0x590
    [] ftrace_graph_call+0x0/0x14
    [] fpsimd_thread_switch+0x24/0x1e8
    [] __switch_to+0x34/0x218
    [] __schedule+0x3ac/0x15b8
    [] schedule+0x5c/0x178
    [] smpboot_thread_fn+0x350/0x960
    [] kthread+0x1d8/0x2b0
    [] ret_from_fork+0x10/0x40
    Memory state around the buggy address:
    ffffffc0689eb980: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4 f4
    ffffffc0689eba00: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
    >ffffffc0689eba80: 00 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00
    ^
    ffffffc0689ebb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffffffc0689ebb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    The stack tracer traverses the whole kernel stack when saving the max stack
    trace. It may touch the stack red zones, causing the warning. So, just disable
    the instrumentation to silence the warning.

    Link: http://lkml.kernel.org/r/1455309960-18930-1-git-send-email-yang.shi@linaro.org

    Signed-off-by: Yang Shi
    Signed-off-by: Steven Rostedt

    Yang Shi
     

19 Feb, 2016

1 commit

  • Pull livepatching fixes from Jiri Kosina:

    - regression (from 4.4) fix for an ordering issue, introduced by an
    earlier ftrace change, that broke live patching of modules.

    The fix replaces the ftrace module notifier by direct call in order
    to make the ordering guaranteed and well-defined. The patch, from
    Jessica Yu, has been acked both by Steven and Rusty

    - error message fix from Miroslav Benes

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
    ftrace/module: remove ftrace module notifier
    livepatch: change the error message in asm/livepatch.h header files

    Linus Torvalds
     

18 Feb, 2016

1 commit

  • Remove the ftrace module notifier in favor of directly calling
    ftrace_module_enable() and ftrace_release_mod() in the module loader.
    Hard-coding the function calls directly in the module loader removes
    dependence on the module notifier call chain and provides better
    visibility and control over what gets called when, which is important
    to kernel utilities such as livepatch.

    This fixes a notifier ordering issue in which the ftrace module notifier
    (and hence ftrace_module_enable()) for coming modules was being called
    after klp_module_notify(), which caused livepatch modules to initialize
    incorrectly. This patch removes dependence on the module notifier call
    chain in favor of hard coding the corresponding function calls in the
    module loader. This ensures that ftrace and livepatch code get called in
    the correct order on patch module load and unload.

    Fixes: 5156dca34a3e ("ftrace: Fix the race between ftrace and insmod")
    Signed-off-by: Jessica Yu
    Reviewed-by: Steven Rostedt
    Reviewed-by: Petr Mladek
    Acked-by: Rusty Russell
    Reviewed-by: Josh Poimboeuf
    Reviewed-by: Miroslav Benes
    Signed-off-by: Jiri Kosina

    Jessica Yu