01 Aug, 2012

3 commits

  • Merge Andrew's second set of patches:
    - MM
    - a few random fixes
    - a couple of RTC leftovers

    * emailed patches from Andrew Morton : (120 commits)
    rtc/rtc-88pm80x: remove unneed devm_kfree
    rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
    mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
    tmpfs: distribute interleave better across nodes
    mm: remove redundant initialization
    mm: warn if pg_data_t isn't initialized with zero
    mips: zero out pg_data_t when it's allocated
    memcg: gix memory accounting scalability in shrink_page_list
    mm/sparse: remove index_init_lock
    mm/sparse: more checks on mem_section number
    mm/sparse: optimize sparse_index_alloc
    memcg: add mem_cgroup_from_css() helper
    memcg: further prevent OOM with too many dirty pages
    memcg: prevent OOM with too many dirty pages
    mm: mmu_notifier: fix freed page still mapped in secondary MMU
    mm: memcg: only check anon swapin page charges for swap cache
    mm: memcg: only check swap cache pages for repeated charging
    mm: memcg: split swapin charge function into private and public part
    mm: memcg: remove needless !mm fixup to init_mm when charging
    mm: memcg: remove unneeded shmem charge type
    ...

    Linus Torvalds
     
  • "fault-injection: add tool to run command with failslab or
    fail_page_alloc" added tools/testing/fault-injection/failcmd.sh to make it
    easier to inject slab/page allocation failures by fault injection.

    failcmd.sh prints the following warning when the command it is asked
    to run has arguments:

    # ./failcmd.sh echo aaa
    failcmd.sh: line 209: [: echo: binary operator expected
    aaa

    The warning is caused by an incorrect check for whether at least one
    parameter is left after the command options have been parsed.

    Fix it by testing the length of $1 instead of $@.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Pull perf updates from Ingo Molnar:
    "The biggest changes are Intel Nehalem-EX PMU uncore support, uprobes
    updates/cleanups/fixes from Oleg and diverse tooling updates (mostly
    fixes) now that Arnaldo is back from vacation."

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
    uprobes: __replace_page() needs munlock_vma_page()
    uprobes: Rename vma_address() and make it return "unsigned long"
    uprobes: Fix register_for_each_vma()->vma_address() check
    uprobes: Introduce vaddr_to_offset(vma, vaddr)
    uprobes: Teach build_probe_list() to consider the range
    uprobes: Remove insert_vm_struct()->uprobe_mmap()
    uprobes: Remove copy_vma()->uprobe_mmap()
    uprobes: Fix overflow in vma_address()/find_active_uprobe()
    uprobes: Suppress uprobe_munmap() from mmput()
    uprobes: Uprobe_mmap/munmap needs list_for_each_entry_safe()
    uprobes: Clean up and document write_opcode()->lock_page(old_page)
    uprobes: Kill write_opcode()->lock_page(new_page)
    uprobes: __replace_page() should not use page_address_in_vma()
    uprobes: Don't recheck vma/f_mapping in write_opcode()
    perf/x86: Fix missing struct before structure name
    perf/x86: Fix format definition of SNB-EP uncore QPI box
    perf/x86: Make bitfield unsigned
    perf/x86: Fix LLC-* and node-* events on Intel SandyBridge
    perf/x86: Add Intel Nehalem-EX uncore support
    perf/x86: Fix typo in format definition of uncore PCU filter
    ...

    Linus Torvalds
     

31 Jul, 2012

7 commits

  • Merge Andrew's first set of patches:
    "Non-MM patches:

    - lots of misc bits

    - tree-wide have_clk() cleanups

    - quite a lot of printk tweaks. I draw your attention to "printk:
    convert the format for KERN_ to a 2 byte pattern" which
    looks a bit scary. But afaict it's solid.

    - backlight updates

    - lib/ feature work (notably the addition and use of memweight())

    - checkpatch updates

    - rtc updates

    - nilfs updates

    - fatfs updates (partial, still waiting for acks)

    - kdump, proc, fork, IPC, sysctl, taskstats, pps, etc

    - new fault-injection feature work"

    * Merge emailed patches from Andrew Morton : (128 commits)
    drivers/misc/lkdtm.c: fix missing allocation failure check
    lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table()
    fault-injection: add tool to run command with failslab or fail_page_alloc
    fault-injection: add selftests for cpu and memory hotplug
    powerpc: pSeries reconfig notifier error injection module
    memory: memory notifier error injection module
    PM: PM notifier error injection module
    cpu: rewrite cpu-notifier-error-inject module
    fault-injection: notifier error injection
    c/r: fcntl: add F_GETOWNER_UIDS option
    resource: make sure requested range is included in the root range
    include/linux/aio.h: cpp->C conversions
    fs: cachefiles: add support for large files in filesystem caching
    pps: return PTR_ERR on error in device_create
    taskstats: check nla_reserve() return
    sysctl: suppress kmemleak messages
    ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION
    ipc: compat: use signed size_t types for msgsnd and msgrcv
    ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC
    ipc: add COMPAT_SHMLBA support
    ...

    Linus Torvalds
     
  • This adds tools/testing/fault-injection/failcmd.sh to run a command while
    injecting slab/page allocation failures via fault injection.

    Example:

    Run the command "make -C tools/testing/selftests/ run_tests" while
    injecting slab allocation failures:

    # ./tools/testing/fault-injection/failcmd.sh \
    -- make -C tools/testing/selftests/ run_tests

    Same as above, but allow at most 100 failures instead of the default
    of at most one:

    # ./tools/testing/fault-injection/failcmd.sh --times=100 \
    -- make -C tools/testing/selftests/ run_tests

    Same as above, but inject page allocation failures instead of slab
    allocation failures:

    # env FAILCMD_TYPE=fail_page_alloc \
    ./tools/testing/fault-injection/failcmd.sh --times=100 \
    -- make -C tools/testing/selftests/ run_tests

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
    This adds two selftests.

    * tools/testing/selftests/cpu-hotplug/on-off-test.sh is a test script
    for CPU hotplug:

    1. Online all hot-pluggable CPUs
    2. Offline all hot-pluggable CPUs
    3. Online all hot-pluggable CPUs again
    4. Exit if cpu-notifier-error-inject.ko is not available
    5. Offline all hot-pluggable CPUs in preparation for testing
    6. Test CPU hot-add error handling by injecting notifier errors
    7. Online all hot-pluggable CPUs in preparation for testing
    8. Test CPU hot-remove error handling by injecting notifier errors

    * tools/testing/selftests/memory-hotplug/on-off-test.sh does the
    same for memory hotplug:

    1. Online all hot-pluggable memory
    2. Offline 10% of hot-pluggable memory
    3. Online all hot-pluggable memory again
    4. Exit if memory-notifier-error-inject.ko is not available
    5. Offline 10% of hot-pluggable memory in preparation for testing
    6. Test memory hot-add error handling by injecting notifier errors
    7. Online all hot-pluggable memory in preparation for testing
    8. Test memory hot-remove error handling by injecting notifier errors

    Signed-off-by: Akinobu Mita
    Suggested-by: Andrew Morton
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Greg KH
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Pull ktest changes from Steven Rostedt:
    "Set of updates for v3.6 (some fixes too)

    Seems that you opened the merge window the day I left for the beach.
    I just got back (yes us Americans only take a week vacation), and just
    got the last of my ktest quilt queue into git."

    * tag 'ktest-v3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
    ktest: Allow perl regex expressions in conditional statements
    ktest: Ignore errors it tests if IGNORE_ERRORS is set
    ktest: Reset saved min (force) configs for each test
    ktest: Add check for bug or panic during reboot
    ktest: Add MAX_MONITOR_WAIT option
    ktest: Fix config bisect with how make oldnoconfig works
    ktest: Add CONFIG_BISECT_CHECK option
    ktest: Add PRE_INSTALL option
    ktest: Add PRE/POST_KTEST and TEST options
    ktest: Remove commented exit

    Linus Torvalds
     
  • Add '=~' and '!~' to the list of allowed conditionals for DEFAULT and
    TEST_START section if statements.

    For example:

    TEST_START IF TEST =~ .*test$

    Signed-off-by: Steven Rostedt

    Steven Rostedt
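
How such a conditional evaluates can be sketched as below. This is only an illustration: ktest is a Perl script, and the hypothetical helper `eval_condition` models the two new operators with Python's `re.search`, which is close to but not identical to Perl matching.

```python
import re

def eval_condition(value, op, pattern):
    """Evaluate `value =~ pattern` or `value !~ pattern`.

    Hypothetical helper, not part of ktest.
    """
    matched = re.search(pattern, value) is not None
    if op == "=~":
        return matched
    if op == "!~":
        return not matched
    raise ValueError(f"unsupported operator: {op}")

# Mirrors: TEST_START IF TEST =~ .*test$
print(eval_condition("boottest", "=~", r".*test$"))
print(eval_condition("bisect", "=~", r".*test$"))
print(eval_condition("bisect", "!~", r".*test$"))
```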
     
  • The option IGNORE_ERRORS is used to allow a test to succeed even if a
    warning appears from the kernel. Sometimes kernels will produce warnings
    that are not associated with a test, and the user wants to test
    something else.

    IGNORE_ERRORS worked during boot up, but a test run still failed if
    the kernel produced a warning.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Pull SLAB changes from Pekka Enberg:
    "Most of the changes included are from Christoph Lameter's "common
    slab" patch series that unifies common parts of SLUB, SLAB, and SLOB
    allocators. The unification is needed for Glauber Costa's "kmem
    memcg" work that will hopefully appear for v3.7.

    The rest of the changes are fixes and speedups by various people."

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (32 commits)
    mm: Fix build warning in kmem_cache_create()
    slob: Fix early boot kernel crash
    mm, slub: ensure irqs are enabled for kmemcheck
    mm, sl[aou]b: Move kmem_cache_create mutex handling to common code
    mm, sl[aou]b: Use a common mutex definition
    mm, sl[aou]b: Common definition for boot state of the slab allocators
    mm, sl[aou]b: Extract common code for kmem_cache_create()
    slub: remove invalid reference to list iterator variable
    mm: Fix signal SIGFPE in slabinfo.c.
    slab: move FULL state transition to an initcall
    slab: Fix a typo in commit 8c138b "slab: Get rid of obj_size macro"
    mm, slab: Build fix for recent kmem_cache changes
    slab: rename gfpflags to allocflags
    slub: refactoring unfreeze_partials()
    slub: use __cmpxchg_double_slab() at interrupt disabled place
    slab/mempolicy: always use local policy from interrupt context
    slab: Get rid of obj_size macro
    mm, sl[aou]b: Extract common fields from struct kmem_cache
    slab: Remove some accessors
    slab: Use page struct fields instead of casting
    ...

    Linus Torvalds
     

27 Jul, 2012

2 commits

  • Pull ACPI & power management update from Len Brown:
    "Re-write of the turbostat tool.
    Lower overhead was necessary for measuring very large systems when
    they are very idle.

    IVB support in intel_idle
    It's what I run on my IVB, others should be able to also:-)

    ACPICA core update
    We have found some bugs due to divergence between Linux and the
    upstream ACPICA base. Most of these patches are to reduce that
    divergence to reduce the risk of future bugs.

    Some cpuidle updates, mostly for non-Intel
    More will be coming, as they depend on this part.

    Some thermal management changes needed by non-ACPI systems.

    Some _OST (OS Status Indication) updates for ACPI hot-plug."

    * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (51 commits)
    Thermal: Documentation update
    Thermal: Add Hysteresis attributes
    Thermal: Make Thermal trip points writeable
    ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check
    tools/power: turbostat: fix large c1% issue
    tools/power: turbostat v2 - re-write for efficiency
    ACPICA: Update to version 20120711
    ACPICA: AcpiSrc: Fix some translation issues for Linux conversion
    ACPICA: Update header files copyrights to 2012
    ACPICA: Add new ACPI table load/unload external interfaces
    ACPICA: Split file: tbxface.c -> tbxfload.c
    ACPICA: Add PCC address space to space ID decode function
    ACPICA: Fix some comment fields
    ACPICA: Table manager: deploy new firmware error/warning interfaces
    ACPICA: Add new interfaces for BIOS(firmware) errors and warnings
    ACPICA: Split exception code utilities to a new file, utexcep.c
    ACPI: acpi_pad: tune round_robin_time
    ACPICA: Update to version 20120620
    ACPICA: Add support for implicit notify on multiple devices
    ACPICA: Update comments; no functional change
    ...

    Linus Torvalds
     
  • Pull USB patches from Greg Kroah-Hartman:
    "Here's the big USB patch set for the 3.6-rc1 merge window.

    Lots of little changes in here, primarily for gadget controllers and
    drivers. There's some scsi changes that I think also went in through
    the scsi tree, but they merge just fine. All of these patches have
    been in the linux-next tree for a while now.

    Signed-off-by: Greg Kroah-Hartman "

    Fix up trivial conflicts in include/scsi/scsi_device.h (same libata
    conflict that Jeff had already encountered)

    * tag 'usb-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (207 commits)
    usb: Add USB_QUIRK_RESET_RESUME for all Logitech UVC webcams
    usb: Add quirk detection based on interface information
    usb: s3c-hsotg: Add header file protection macros in s3c-hsotg.h
    USB: ehci-s5p: Add vbus setup function to the s5p ehci glue layer
    USB: add USB_VENDOR_AND_INTERFACE_INFO() macro
    USB: notify phy when root hub port connect change
    USB: remove 8 bytes of padding from usb_host_interface on 64 bit builds
    USB: option: add ZTE MF821D
    USB: sierra: QMI mode MC7710 moved to qcserial
    USB: qcserial: adding Sierra Wireless devices
    USB: qcserial: support generic Qualcomm serial ports
    USB: qcserial: make probe more flexible
    USB: qcserial: centralize probe exit path
    USB: qcserial: consolidate usb_set_interface calls
    USB: ehci-s5p: Add support for device tree
    USB: ohci-exynos: Add support for device tree
    USB: ehci-omap: fix compile failure(v1)
    usb: host: tegra: pass correct pointer in ehci_setup()
    USB: ehci-fsl: Update ifdef check to work on 64-bit ppc
    USB: serial: keyspan: Removed unrequired parentheses.
    ...

    Linus Torvalds
     

26 Jul, 2012

2 commits

  • …coupled', 'cpuidle-tweaks', 'intel_idle-ivb', 'ost', 'red-hat-bz-772730', 'thermal', 'thermal-spear' and 'turbostat-v2' into release

    Len Brown
     
  • A large enough symbol size causes an overflow in the size parameter to
    the histogram allocation, leading to a segfault in
    symbol__inc_addr_samples later on when this histogram is accessed.

    In the case of being called via perf-report, this returns back and
    gracefully ignores the sample, eventually ignoring the chained return
    value of perf_session_deliver_event in flush_sample_queue.

    Signed-off-by: Cody Schafer
    Acked-by: Namhyung Kim
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Sukadev Bhattiprolu
    Link: http://lkml.kernel.org/r/1342753525-4521-1-git-send-email-cody@linux.vnet.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Cody Schafer
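
The kind of overflow described in this commit can be illustrated with a small model. This is a sketch under assumptions: a 64-bit `size_t`, and placeholder `header` and `entry` sizes that are not perf's real structure sizes. Python integers do not wrap, so the unchecked C result is simulated with a mask.

```python
SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t

def hist_alloc_size(symbol_size, nr_events, header=16, entry=8):
    """Return the allocation size in bytes, or None if it would overflow.

    `header` and `entry` are placeholder sizes for illustration only.
    """
    # Reject a symbol size whose per-histogram size would exceed SIZE_MAX.
    if symbol_size > (SIZE_MAX - header) // entry:
        return None
    per_hist = header + symbol_size * entry
    # Reject an event count whose total would exceed SIZE_MAX.
    if nr_events != 0 and per_hist > SIZE_MAX // nr_events:
        return None
    return per_hist * nr_events

def wrapped_size(symbol_size, nr_events, header=16, entry=8):
    """What unchecked C arithmetic would compute: the result modulo 2**64."""
    return ((header + symbol_size * entry) * nr_events) & SIZE_MAX

print(hist_alloc_size(100, 2))
print(hist_alloc_size(2**61, 1))
print(wrapped_size(2**61, 1))
```

In the wrapped case the allocation would be far smaller than the histogram that is later indexed, which is the kind of mismatch that leads to an out-of-bounds access.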
     

25 Jul, 2012

17 commits

    The TRACEEVENT-CFLAGS file is used to detect any change in the compiler
    flags. Just ignore it.

    Signed-off-by: Namhyung Kim
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1341559297-25725-3-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
    Cross compiling perf requires setting the ARCH and CROSS_COMPILE variables,
    but libtraceevent could not detect that they had changed, so it ended up
    believing that no recompilation was required. The link then failed like this:

    LINK perf
    ../lib/traceevent//libtraceevent.a: member ../lib/traceevent//libtraceevent.a(event-parse.o) in archive is not an object
    collect2: ld returned 1 exit status
    make: *** [perf] Error 1

    This patch fixes this by adding a TRACEEVENT-CFLAGS file, like
    PERF-CFLAGS, to track those changes.

    Signed-off-by: Namhyung Kim
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1341559297-25725-2-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
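
The flag-tracking idea in this commit can be sketched as follows. This is an illustration of the general technique, not the Makefile logic from the patch: the function name `flags_changed` and the flag strings are made up, and only the file name TRACEEVENT-CFLAGS comes from the commit message.

```python
import os
import tempfile

def flags_changed(tracking_file, current_flags):
    """Record `current_flags` and report whether they differ from last time.

    The file is rewritten only on change, so its timestamp can serve as a
    rebuild trigger for anything that depends on it.
    """
    try:
        with open(tracking_file) as f:
            previous = f.read()
    except FileNotFoundError:
        previous = None
    if previous == current_flags:
        return False
    with open(tracking_file, "w") as f:
        f.write(current_flags)
    return True

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "TRACEEVENT-CFLAGS")
    first = flags_changed(path, "ARCH=x86 CROSS_COMPILE=")
    same = flags_changed(path, "ARCH=x86 CROSS_COMPILE=")
    cross = flags_changed(path, "ARCH=arm CROSS_COMPILE=arm-linux-")
    print(first, same, cross)
```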
     
    Bison 2.6 started to generate a declaration of parse_events_parse() in the
    header, which makes our own declaration a redundant redeclaration:

    util/parse-events.c:29:5: error: redundant redeclaration of ‘parse_events_parse’ [-Werror=redundant-decls]
    In file included from util/parse-events.c:14:0:
    util/parse-events-bison.h:99:5: note: previous declaration of ‘parse_events_parse’ was here
    cc1: all warnings being treated as errors

    Let's disable -Wredundant-decls for util/parse-events.c since it includes
    a header we can't control.

    Signed-off-by: Kirill A. Shutemov
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120723210407.GA25186@shutemov.name
    Signed-off-by: Arnaldo Carvalho de Melo

    Kirill A. Shutemov
     
    Perf uses the GNU-specific version of strerror_r(). The GNU-specific strerror_r()
    returns a pointer to a string containing the error message. This may be either
    a pointer to a string that the function stores in buf, or a pointer to some
    (immutable) static string (in which case buf is unused).

    In glibc-2.16 the GNU version was marked with the warn_unused_result attribute,
    which triggers a few warnings in perf:

    util/target.c: In function ‘perf_target__strerror’:
    util/target.c:114:13: error: ignoring return value of ‘strerror_r’, declared with attribute warn_unused_result [-Werror=unused-result]
    ui/browsers/hists.c: In function ‘hist_browser__dump’:
    ui/browsers/hists.c:981:13: error: ignoring return value of ‘strerror_r’, declared with attribute warn_unused_result [-Werror=unused-result]

    They are bugs.

    Let's fix strerror_r() usage.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Ulrich Drepper
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/20120723210654.GA25248@shutemov.name
    [ committer note: s/assert/BUG_ON/g ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Kirill A. Shutemov
     
    There is a problem with hw_breakpoint perf events: the events reported
    to userspace are not correct. Sometimes a single breakpoint trigger
    reports several events, and sometimes a breakpoint event never reaches
    the user.

    The root cause is that attr->freq is passed to the kernel as 1 by default
    for breakpoint events, so the kernel does not calculate the event period
    as expected. Setting the sample period to 1 changes attr->freq to 0,
    which fixes the problem.

    This patch is similar to commit f92128, which did the same for tracepoint
    events:
    perf: Make the trace events sample period default to 1

    Signed-off-by: Jovi Zhang
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/CACV3sbLF8taiCq_VYW-sgRJyupeMzg58C7ZXfMe3xZUiH_Mx6w@mail.gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jovi Zhang
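
The effect of this change can be modelled in a few lines. This is a toy sketch: `EventAttr` is a hypothetical stand-in for the handful of perf_event_attr fields mentioned in the commit, and the initial `sample_value` is a placeholder rather than perf's actual default.

```python
from dataclasses import dataclass

@dataclass
class EventAttr:
    """Toy stand-in for the perf_event_attr fields discussed here."""
    type: str
    freq: int = 1             # 1: sample_value is a frequency, 0: a period
    sample_value: int = 4000  # placeholder, not perf's real default

def apply_breakpoint_default(attr):
    """Give breakpoint events a fixed period of 1 (one sample per hit)."""
    if attr.type == "breakpoint":
        attr.freq = 0
        attr.sample_value = 1
    return attr

bp = apply_breakpoint_default(EventAttr(type="breakpoint"))
hw = apply_breakpoint_default(EventAttr(type="hardware"))
print(bp, hw)
```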
     
    Add an automated test for DSO data reading. It tests raw and cached
    reads from different file/cache locations.

    Signed-off-by: Jiri Olsa
    Cc: Arun Sharma
    Cc: Benjamin Redelings
    Cc: Corey Ashford
    Cc: Cyrill Gorcunov
    Cc: Frank Ch. Eigler
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/1342959280-5361-18-git-send-email-jolsa@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
    Add DSO data caching so we don't need to open/read/close each time
    we want DSO data.

    The DSO data caching affects the following functions:
    dso__data_read_offset
    dso__data_read_addr

    Each DSO read tries to find the data (based on offset) inside the cache.
    If the data is not present, the cache is filled from the file and the
    data is returned. If it is present, the data is returned with no file read.

    Each data read is cached by reading a cache-page sized and aligned amount
    of DSO data. The cache page size is hardcoded to 4096. The cache uses an
    RB tree with the file offset as the sort key.

    Signed-off-by: Jiri Olsa
    Cc: Arun Sharma
    Cc: Benjamin Redelings
    Cc: Corey Ashford
    Cc: Cyrill Gorcunov
    Cc: Frank Ch. Eigler
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/1342959280-5361-17-git-send-email-jolsa@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
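
The caching scheme described in this commit can be sketched as follows. This is a simplified model, not the perf implementation: it uses a dict keyed by page-aligned offset where the commit uses an RB tree, and the class name `DataCache` and the test file are made up for illustration.

```python
import os
import tempfile

PAGE_SIZE = 4096  # cache page size stated in the commit message

class DataCache:
    """Page-granular read cache keyed by page-aligned file offset."""

    def __init__(self, path):
        self.path = path
        self.pages = {}       # dict stands in for the RB tree
        self.file_reads = 0

    def _page(self, page_offset):
        if page_offset not in self.pages:
            # Cache miss: open, read one aligned page, close.
            with open(self.path, "rb") as f:
                f.seek(page_offset)
                self.pages[page_offset] = f.read(PAGE_SIZE)
            self.file_reads += 1
        return self.pages[page_offset]

    def read(self, offset, size):
        """Return up to `size` bytes starting at `offset`."""
        out = bytearray()
        while size > 0:
            page_offset = offset & ~(PAGE_SIZE - 1)
            start = offset - page_offset
            chunk = self._page(page_offset)[start:start + size]
            if not chunk:
                break  # end of file
            out += chunk
            offset += len(chunk)
            size -= len(chunk)
        return bytes(out)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 40)  # 10240 bytes
cache = DataCache(tmp.name)
first = cache.read(4090, 12)   # crosses a page boundary
again = cache.read(4090, 12)   # served entirely from the cache
reads = cache.file_reads
os.unlink(tmp.name)
print(first, reads)
```

A read that crosses a page boundary fills two cache pages; repeating it touches the file no further.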
     
    Add the following interface for DSO objects to allow
    reading of DSO image data:

    dso__data_fd
    - opens the DSO and returns a file descriptor.
      Binary types are used to locate/open the DSO in the following order:
        DSO_BINARY_TYPE__BUILD_ID_CACHE
        DSO_BINARY_TYPE__SYSTEM_PATH_DSO
      In other words, we first try to open the DSO build-id path,
      and if that fails we try to open the DSO system path.

    dso__data_read_offset
    - reads DSO data from the specified offset

    dso__data_read_addr
    - reads DSO data from the specified address/map.

    Signed-off-by: Jiri Olsa
    Cc: Arun Sharma
    Cc: Benjamin Redelings
    Cc: Corey Ashford
    Cc: Cyrill Gorcunov
    Cc: Frank Ch. Eigler
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/1342959280-5361-11-git-send-email-jolsa@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
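
The lookup order described for dso__data_fd amounts to "try each candidate path in turn and keep the first that opens". A minimal sketch of that pattern, with a hypothetical helper `open_first` and made-up file names standing in for the build-id cache path and the system path:

```python
import os
import tempfile

def open_first(candidates):
    """Open the first readable path in `candidates`; return (fd, path).

    Returns (-1, None) when every candidate fails.
    """
    for path in candidates:
        try:
            return os.open(path, os.O_RDONLY), path
        except OSError:
            continue  # fall through to the next candidate
    return -1, None

with tempfile.TemporaryDirectory() as tmp:
    build_id_path = os.path.join(tmp, "build-id-cache-copy")  # not created
    system_path = os.path.join(tmp, "libexample.so")
    with open(system_path, "wb") as f:
        f.write(b"\x7fELF")
    fd, used = open_first([build_id_path, system_path])
    picked_system = used == system_path
    os.close(fd)

print(picked_system)
```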
     
    Add an interface to access DSOs so that it can be used
    from other places.

    A new DSO binary type is added, making the current SYMTAB__*
    types more general:
    DSO_BINARY_TYPE__* = SYMTAB__*

    The following function is added to return the path for the specified
    binary type:
    dso__binary_type_file

    Signed-off-by: Jiri Olsa
    Cc: Arun Sharma
    Cc: Benjamin Redelings
    Cc: Corey Ashford
    Cc: Cyrill Gorcunov
    Cc: Frank Ch. Eigler
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/1342959280-5361-10-git-send-email-jolsa@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Tiny cosmetic fix. The lack of a newline between hists callchains was
    looking slightly messy.

    Before:

    0.24% swapper [kernel.kallsyms] [k] _raw_spin_lock_irq
    |
    --- _raw_spin_lock_irq
    run_timer_softirq
    __do_softirq
    call_softirq
    do_softirq
    irq_exit
    smp_apic_timer_interrupt
    apic_timer_interrupt
    default_idle
    amd_e400_idle
    cpu_idle
    start_secondary
    0.10% perf [kernel.kallsyms] [k] lock_is_held
    |
    --- lock_is_held
    __might_sleep
    mutex_lock_nested
    perf_event_for_each_child
    perf_ioctl
    do_vfs_ioctl
    sys_ioctl
    system_call_fastpath
    ioctl
    cmd_record
    run_builtin
    main
    __libc_start_main

    After:

    0.24% swapper [kernel.kallsyms] [k] _raw_spin_lock_irq
    |
    --- _raw_spin_lock_irq
    run_timer_softirq
    __do_softirq
    call_softirq
    do_softirq
    irq_exit
    smp_apic_timer_interrupt
    apic_timer_interrupt
    default_idle
    amd_e400_idle
    cpu_idle
    start_secondary

    0.10% perf [kernel.kallsyms] [k] lock_is_held
    |
    --- lock_is_held
    __might_sleep
    mutex_lock_nested
    perf_event_for_each_child
    perf_ioctl
    do_vfs_ioctl
    sys_ioctl
    system_call_fastpath
    ioctl
    cmd_record
    run_builtin
    main
    __libc_start_main

    Signed-off-by: Frederic Weisbecker
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342631456-7233-3-git-send-email-fweisbec@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     
    Trace events have a period (weight) of 1 by default. This can be
    overridden in the event definition by using the __perf_count() macro.

    For example, the sched_stat_runtime() is weighted with the runtime of
    the task that fired the event.

    By default, perf handles such a weighted event by dividing it into
    individual events carrying a weight of 1. For example if
    sched_stat_runtime is fired and the task has run 5000000 nsecs, perf
    divides it into 5000000 events in the buffer.

    This behaviour makes weighted events unusable because they quickly
    fill the buffers and we lose most events.

    The commit 5d81e5cfb37a174e8ddc0413e2e70cdf05807ace ("events: Don't
    divide events if it has field period") solves this problem by sending
    only one event when PERF_SAMPLE_PERIOD flag is set. The weight is
    carried in the sample itself such that we don't need to demultiplex it
    anymore.

    This patch provides the last missing piece to use this feature by
    setting PERF_SAMPLE_PERIOD from perf tools when we deal with trace
    events.

    Before:
    $ ./perf record -e sched:* -a sleep 1
    [ perf record: Woken up 3 times to write data ]
    [ perf record: Captured and wrote 1.619 MB perf.data (~70749 samples) ]
    Warning:
    Processed 16909 events and lost 1 chunks!

    Check IO/CPU overload!

    $ ./perf script
    perf 1894 [003] 824.898327: sched_migrate_task: comm=perf pid=1898 prio=120 orig_cpu=2 dest_cpu=0
    perf 1894 [003] 824.898335: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
    perf 1894 [003] 824.898336: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
    perf 1894 [003] 824.898337: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
    perf 1894 [003] 824.898338: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
    perf 1894 [003] 824.898339: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
    perf 1894 [003] 824.898340: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
    perf 1894 [003] 824.898341: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
    [...]

    After:
    $ ./perf record -e sched:* -a sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.074 MB perf.data (~3228 samples) ]

    $ ./perf script

    perf 1461 [000] 554.286957: sched_migrate_task: comm=perf pid=1465 prio=120 orig_cpu=3 dest_cpu=1
    perf 1461 [000] 554.286964: sched_stat_sleep: comm=perf pid=1465 delay=133047190 [ns]
    perf 1461 [000] 554.286967: sched_wakeup: comm=perf pid=1465 prio=120 success=1 target_cpu=001
    swapper 0 [001] 554.286976: sched_stat_wait: comm=perf pid=1465 delay=0 [ns]
    swapper 0 [001] 554.286983: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf
    [...]

    Signed-off-by: Frederic Weisbecker
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342631456-7233-1-git-send-email-fweisbec@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Frederic Weisbecker
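
The difference between the two behaviours described in this commit comes down to simple arithmetic. A toy model, where the function `records_for` and its parameter names are made up for illustration:

```python
def records_for(weight, sample_period_flag):
    """Return (number of records, period carried by each record).

    Without the flag the event is split into `weight` records of period 1;
    with it, a single record carries the whole weight.
    """
    if sample_period_flag:
        return 1, weight
    return weight, 1

# sched_stat_runtime fired for a task that ran 5000000 ns
split = records_for(5_000_000, sample_period_flag=False)
single = records_for(5_000_000, sample_period_flag=True)
print(split, single)
```

In both cases the product of the two values equals the event weight; only the number of records written to the buffer changes.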
     
  • Include the omitted number of characters printed for the first entry.

    Not that it really matters because nobody seems to care about the number
    of printed characters for now. But just in case.

    Signed-off-by: Frederic Weisbecker
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342631456-7233-2-git-send-email-fweisbec@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     
  • Adds the attributes to the event line in the header dump.
    From:
    event : name = cycles, type = 0, config = 0x0, config1 = 0x0,
    config2 = 0x0, excl_usr = 0, excl_kern = 0, ...
    to
    event : name = cycles, type = 0, config = 0x0, config1 = 0x0,
    config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0,
    excl_guest = 0, precise_ip = 0, ...

    Signed-off-by: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1342826756-64663-8-git-send-email-dsahern@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     
    After 7ed97ad, use of the guestmount option without a subdir for *each*
    VM generates an error message for each sample related to that VM. Once
    per VM is enough.

    Signed-off-by: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342826756-64663-7-git-send-email-dsahern@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     
  • Guest kernel symbols are not resolved despite passing the information
    needed to resolve them. e.g.,

    perf kvm --guest --guestmount=/tmp/guest-mount record -a -- sleep 1
    perf kvm --guest --guestmount=/tmp/guest-mount report --stdio

    36.55% [guest/11399] [unknown] [g] 0xffffffff81600bc8
    33.19% [guest/10474] [unknown] [g] 0x00000000c0116e00
    30.26% [guest/11094] [unknown] [g] 0xffffffff8100a288
    43.69% [guest/10474] [unknown] [g] 0x00000000c0103d90
    37.38% [guest/11399] [unknown] [g] 0xffffffff81600bc8
    12.24% [guest/11094] [unknown] [g] 0xffffffff810aa91d
    6.69% [guest/11094] [unknown] [u] 0x00007fa784d721c3

    which is just pathetic.

    After a maddening 2 days sifting through perf minutiae I found it --
    id_hdr_size is not initialized for guest machines. This shows up on the
    report side as random garbage for the cpu and timestamp, e.g.,

    29816 7310572949125804849 0x1ac0 [0x50]: PERF_RECORD_MMAP ...

    That messes up the sample sorting such that synthesized guest maps are
    processed last.

    With this patch you get a much more helpful report:

    12.11% [guest/11399] [guest.kernel.kallsyms.11399] [g] irqtime_account_process_tick
    10.58% [guest/11399] [guest.kernel.kallsyms.11399] [g] run_timer_softirq
    6.95% [guest/11094] [guest.kernel.kallsyms.11094] [g] printk_needs_cpu
    6.50% [guest/11094] [guest.kernel.kallsyms.11094] [g] do_timer
    6.45% [guest/11399] [guest.kernel.kallsyms.11399] [g] idle_balance
    4.90% [guest/11094] [guest.kernel.kallsyms.11094] [g] native_read_tsc
    ...

    v2:
    - changed rbtree walk to use rb_first per Namhyung's suggestion

    Tested-by: Jiri Olsa
    Signed-off-by: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342826756-64663-5-git-send-email-dsahern@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     
  • e.g., perf kvm --host --guest report -i perf.data --stdio -D
    shows:

    1 599127912065356 0x143b8 [0x48]: PERF_RECORD_SAMPLE(IP, 5): 5671/5676: 0x7fdf95a061c0 period: 1 addr: 0
    ... chain: nr:2
    ..... 0: ffffffffffffff80
    ..... 1: fffffffffffffe00
    ... thread: qemu-kvm:5671
    ...... dso:

    (IP, 5) means sample in guest userspace. Those samples should not be lumped
    into the VMM's host thread, i.e., the report output:

    56.86% qemu-kvm [unknown] [u] 0x00007fdf95a061c0

    With this patch the output emphasizes it is a guest userspace hit:

    56.86% [guest/5671] [unknown] [u] 0x00007fdf95a061c0

    Looking at 3 VMs (2 64-bit, 1 32-bit) with each running a CPU bound
    process (openssl speed), perf report currently shows:

    93.84% 117726 qemu-kvm [unknown] [u] 0x00007fd7dcaea8e5

    which is wrong. With this patch you get:

    31.50% 39258 [guest/18772] [unknown] [u] 0x00007fd7dcaea8e5
    31.50% 39236 [guest/11230] [unknown] [u] 0x0000000000a57340
    30.84% 39232 [guest/18395] [unknown] [u] 0x00007f66f641e107

    Tested-by: Jiri Olsa
    Signed-off-by: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342826756-64663-4-git-send-email-dsahern@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
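
A minimal sketch of the labeling rule described in the commit above, written in Python rather than perf's C. The cpumode values 4 and 5 are the perf ABI's PERF_RECORD_MISC_GUEST_KERNEL and PERF_RECORD_MISC_GUEST_USER; the function name and its arguments are hypothetical and do not mirror perf's internal code.

```python
# cpumode values from the perf_event ABI header.
PERF_RECORD_MISC_GUEST_KERNEL = 4
PERF_RECORD_MISC_GUEST_USER = 5

GUEST_MODES = {PERF_RECORD_MISC_GUEST_KERNEL, PERF_RECORD_MISC_GUEST_USER}


def sample_label(cpumode, vmm_pid, host_comm):
    """Return the name a report would show for a sample (hypothetical helper)."""
    if cpumode in GUEST_MODES:
        # Guest samples are attributed to the guest machine, keyed by VMM pid.
        return "[guest/%d]" % vmm_pid
    # Host samples keep the host thread name.
    return host_comm


print(sample_label(5, 5671, "qemu-kvm"))  # guest userspace sample
print(sample_label(2, 5671, "qemu-kvm"))  # host userspace sample
```

Keying the label on the VMM pid is what lets several VMs be told apart in one report.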
     
  • COMM events are not generated in the context of a guest machine, so the
    thread name is never set for the VMM process. For example, the qemu-kvm
    name applies to the process in the host machine, not the guest machine.
    So, samples for guest machines are currently displayed as:

    99.67% :5671 [unknown] [g] 0xffffffff81366b41

    where 5671 is the pid of the VMM. With this patch the samples in the guest
    machine are shown as:

    18.43% [guest/5671] [unknown] [g] 0xffffffff810d68b7

    Tested-by: Jiri Olsa
    Signed-off-by: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342826756-64663-3-git-send-email-dsahern@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     

24 Jul, 2012

1 commit

  • Current debug message is:
    Problems creating module maps, continuing anyway...

    When running multiple VMs it would be nice to know which machine the
    message is referring to:

    $ perf kvm --guest --guestmount=/tmp/guest-mount record -av -- sleep 10
    Problems creating module maps for guest 6613, continuing anyway...

    Signed-off-by: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342826756-64663-2-git-send-email-dsahern@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     

21 Jul, 2012

1 commit

  • The min configs are saved in a Perl hash called force_config, and this
    hash is used to add configs to the .config file. But it was not being
    reset between tests and a min config from a previous test would affect
    the min config of the next test causing undesirable results.

    Reset the force_config hash at the start of each test.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
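
The bug pattern in the commit above is not specific to Perl: state kept in a long-lived table leaks from one test into the next unless it is cleared. A minimal sketch in Python (ktest itself is Perl); `run_test`, its `reset` flag, and the option names are hypothetical and exist only for illustration.

```python
force_config = {}  # stands in for ktest's hash of forced min-config options


def run_test(min_config, reset=True):
    """Apply a test's min config and return the options that would be forced."""
    if reset:
        # The fix: start every test with an empty table.
        force_config.clear()
    force_config.update(min_config)
    return dict(force_config)


first = run_test({"CONFIG_NET": "y"})
second = run_test({"CONFIG_EXT4_FS": "y"})
print(second)  # only the second test's options
```

Calling `run_test(..., reset=False)` twice reproduces the old behavior, where the first test's options are still present during the second.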
     

20 Jul, 2012

7 commits

  • Under some conditions, c1% was displayed as a very large number,
    much higher than 100%.

    c1% is not measured; it is derived as "that which is left over"
    from other counters. However, the other counters are not collected
    atomically, so it is possible for c1% to be calculated as
    a small negative number, which is displayed as a very large positive one.

    There was a check for mperf vs tsc for this already,
    but it needed to also include the other counters
    that are used to calculate c1.

    Signed-off-by: Len Brown

    Len Brown
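
To see why a slightly negative residual shows up as a huge percentage, consider unsigned 64-bit counters. The sketch below models that arithmetic in Python. The names tsc and mperf come from the commit text; c3, c6 and c7 are assumed stand-ins for "the other counters", and clamping to zero is an assumption made for illustration, not a statement of what turbostat does.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wraparound


def c1_unchecked(tsc, mperf, c3, c6, c7):
    """Residual computed with wrapping unsigned arithmetic."""
    return (tsc - mperf - c3 - c6 - c7) & MASK64


def c1_checked(tsc, mperf, c3, c6, c7):
    """Residual clamped to zero when the other counters exceed tsc."""
    used = mperf + c3 + c6 + c7
    return 0 if used > tsc else tsc - used


# Counters sampled at slightly different instants: the parts exceed the whole.
tsc, mperf, c3, c6, c7 = 1000, 400, 300, 200, 105
print(100.0 * c1_unchecked(tsc, mperf, c3, c6, c7) / tsc)
print(100.0 * c1_checked(tsc, mperf, c3, c6, c7) / tsc)
```

The check has to cover every counter that enters the subtraction; comparing only mperf against tsc leaves the other terms free to push the result below zero.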
     
  • Measuring large profoundly-idle configurations
    requires turbostat to be more lightweight.
    Otherwise, the operation of turbostat itself
    can interfere with the measurements.

    This re-write makes turbostat topology aware.
    Hardware is accessed in "topology order".
    Redundant hardware accesses are deleted.
    Redundant output is deleted.
    Also, output is buffered and
    local RDTSC use replaces remote MSR access for TSC.

    From a feature point of view, the output
    looks different since redundant figures are absent.
    Also, there are now -c and -p options -- to restrict
    output to the 1st thread in each core, and the 1st
    thread in each package, respectively. This is helpful
    to reduce output on big systems, where more detail
    than the "-s" system summary is desired.
    Finally, periodic mode output is now on stdout, not stderr.

    Turbostat v2 is also slightly more robust in
    handling run-time CPU online/offline events,
    as it now checks the actual map of on-line cpus rather
    than just the total number of on-line cpus.

    Signed-off-by: Len Brown

    Len Brown
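
The difference between counting CPUs and reading the actual online map can be sketched as follows. The range-list format ("0-3,5") is the one Linux uses in /sys/devices/system/cpu/online; how turbostat itself obtains the map is not stated in the commit message, so this Python parser is an illustration only.

```python
def parse_cpu_list(text):
    """Parse a Linux CPU range list such as "0-2,4,6-7" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


before = parse_cpu_list("0-3")
after = parse_cpu_list("0-1,3-4")  # cpu2 went offline, cpu4 came online

print(len(before) == len(after))  # the count alone sees no change
print(sorted(before ^ after))     # the map shows which CPUs changed
```

A tool that only compares the number of online CPUs would miss this change entirely, which is the case the map comparison is meant to catch.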
     
  • Usually the target is booted into a dependable kernel when a test
    starts. The test will install the test kernel and reboot the box. But
    there may be times when the target is running an unreliable kernel and
    the reboot may crash.

    Have ktest detect crashes on a reboot and force a power-cycle instead.

    This usually happens if a test kernel was installed to run manual
    tests, but the user forgot to reboot to the known good kernel.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • If the console is constantly outputting content, this can cause ktest
    to get stuck waiting on the monitor to settle down.

    The option MAX_MONITOR_WAIT is the maximum time (in seconds) for ktest
    to wait for the console to flush.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
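
A minimal sketch of a settle loop with an upper bound, in Python (ktest is Perl). `read_line` is a hypothetical callable that returns the next console line, or None if nothing arrived within `settle` seconds. The default values are arbitrary, and the `clock` parameter exists only so the loop can be exercised without real delays.

```python
import time


def wait_for_console(read_line, settle=2.0, max_wait=60.0, clock=time.monotonic):
    """Drain console output until it goes quiet or max_wait seconds elapse.

    Returns True if the console settled, False if the limit was hit.
    """
    start = clock()
    while clock() - start < max_wait:
        if read_line(settle) is None:
            return True  # no output within the settle window
    return False  # console never went quiet; stop waiting anyway


# A console that never stops talking, driven by a fake clock.
ticks = iter(range(100))
print(wait_for_console(lambda t: "spam", max_wait=5, clock=lambda: next(ticks)))
```

Without the `max_wait` bound, the loop above would spin for as long as the console kept producing output.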
     
  • With a name like 'oldnoconfig' one may think that the config generated
    would disable all configs that were not defined (selecting "no" for all
    options). But this is not the case. It selects the default. If a config
    has a 'default y', then it is added if not specified.

    This broke the config bisect, because options not specified by a config
    will just use the default, where ktest expected them to be turned off.
    This caused an option to be enabled that in turn disabled the option
    that would break the build.
    The end result was that we never found the bad config at the end of the
    test.

    Instead of using 'make oldnoconfig', ktest now builds the options it
    expects enabled and disabled. When it turns off an option, it will no
    longer remove it, but actually set it to:

    # CONFIG_FOO is not set

    Signed-off-by: Steven Rostedt

    Steven Rostedt
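
The explicit form matters because Kconfig treats a missing symbol as "use the default", while a "# CONFIG_FOO is not set" line records a deliberate "n". A minimal sketch in Python of emitting both kinds of line; the helper name and the option names are hypothetical, and ktest's own Perl code may be organized quite differently.

```python
def render_config(enabled, disabled):
    """Render .config lines with explicit entries for disabled options."""
    lines = ["%s=y" % name for name in sorted(enabled)]
    # Write disabled options out explicitly instead of omitting them,
    # so that defaults cannot silently turn them back on.
    lines += ["# %s is not set" % name for name in sorted(disabled)]
    return "\n".join(lines) + "\n"


print(render_config({"CONFIG_BAR"}, {"CONFIG_FOO"}), end="")
```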
     
  • The config-bisect can take a bad config and bisect it down to find out
    which config option actually causes the failure. But as all tests apply
    a user-defined minconfig before booting, it is possible
    that the minconfig could actually make the bad config work (minconfigs
    can disable configs). The end result is that the config bisect test will
    not find a config that breaks. This can be rather frustrating to the
    user.

    The CONFIG_BISECT_CHECK option, when set to 1, will make sure that the
    bad config (with the minconfig applied) still fails before trying to
    bisect.

    And yes, I did get burned by this.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
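
The sanity check described above can be sketched in a few lines of Python (ktest is Perl). `fails` is a hypothetical callable standing in for "build and boot this config and report whether it fails"; the function name, the `check` flag, and the option names are likewise assumptions for illustration.

```python
def config_bisect(bad_config, min_config, fails, check=True):
    """Refuse to bisect when the merged config no longer fails."""
    merged = dict(bad_config)
    merged.update(min_config)  # the minconfig overrides the bad config
    if check and not fails(merged):
        raise RuntimeError("bad config passes once the minconfig is applied")
    return merged  # a real bisect would start from this config


def fails(cfg):
    # Pretend the failure is caused by one specific option being enabled.
    return cfg.get("CONFIG_BROKEN") == "y"


try:
    config_bisect({"CONFIG_BROKEN": "y"}, {"CONFIG_BROKEN": "n"}, fails)
except RuntimeError as err:
    print("not bisecting:", err)
```

Failing early here costs one extra build and boot, but it avoids a full bisect run that can never converge on a culprit.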
     
  • Add the PRE_INSTALL option that will allow a user to specify a shell
    command to be executed before the install operation executes.

    Signed-off-by: Steven Rostedt

    Steven Rostedt