27 Jul, 2016

8 commits

  • Merge updates from Andrew Morton:

    - a few misc bits

    - ocfs2

    - most(?) of MM

    * emailed patches from Andrew Morton: (125 commits)
    thp: fix comments of __pmd_trans_huge_lock()
    cgroup: remove unnecessary 0 check from css_from_id()
    cgroup: fix idr leak for the first cgroup root
    mm: memcontrol: fix documentation for compound parameter
    mm: memcontrol: remove BUG_ON in uncharge_list
    mm: fix build warnings in
    mm, thp: convert from optimistic swapin collapsing to conservative
    mm, thp: fix comment inconsistency for swapin readahead functions
    thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
    shmem: split huge pages beyond i_size under memory pressure
    thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
    khugepaged: add support of collapse for tmpfs/shmem pages
    shmem: make shmem_inode_info::lock irq-safe
    khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
    thp: extract khugepaged from mm/huge_memory.c
    shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
    shmem: add huge pages support
    shmem: get_unmapped_area align huge page
    shmem: prepare huge= mount option and sysfs knob
    mm, rmap: account shmem thp pages
    ...

    Linus Torvalds
     
  • Pull power management updates from Rafael Wysocki:
    "Again, the majority of changes go into the cpufreq subsystem, but
    there are no big features this time. The cpufreq changes that stand
    out somewhat are the governor interface rework and improvements
    related to the handling of frequency tables. Apart from those, there
    are fixes and new device/CPU IDs in drivers, cleanups and an
    improvement of the new schedutil governor.

    Next, there are some changes in the hibernation core, including a fix
    for a nasty problem related to the MONITOR/MWAIT usage by CPU offline
    during resume from hibernation, a few core improvements related to
    memory management during resume, a couple of additional debug features
    and cleanups.

    Finally, we have some fixes and cleanups in the devfreq subsystem,
    generic power domains framework improvements related to system
    suspend/resume, support for some new chips in intel_idle and in the
    power capping RAPL driver, a new version of the AnalyzeSuspend utility
    and some assorted fixes and cleanups.

    Specifics:

    - Rework the cpufreq governor interface to make it more
    straightforward and modify the conservative governor to avoid using
    transition notifications (Rafael Wysocki).

    - Rework the handling of frequency tables by the cpufreq core to make
    it more efficient (Viresh Kumar).

    - Modify the schedutil governor to reduce the number of wakeups it
    causes when the CPU frequency doesn't need to be changed (Steve
    Muckle, Viresh Kumar).

    - Fix some minor issues and clean up code in the cpufreq core and
    governors (Rafael Wysocki, Viresh Kumar).

    - Add Intel Broxton support to the intel_pstate driver (Srinivas
    Pandruvada).

    - Fix problems related to the config TDP feature and to the validity
    of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka,
    Srinivas Pandruvada).

    - Make intel_pstate update the cpu_frequency tracepoint even if the
    frequency doesn't change to avoid confusing powertop (Rafael
    Wysocki).

    - Clean up the usage of __init/__initdata in intel_pstate, mark some
    of its internal variables as __read_mostly and drop an unused
    structure element from it (Jisheng Zhang, Carsten Emde).

    - Clean up the usage of some duplicate MSR symbols in intel_pstate
    and turbostat (Srinivas Pandruvada).

    - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay
    Adiga, Viresh Kumar, Ben Dooks).

    - Fix a regression (introduced during the 4.5 cycle) in the
    pcc-cpufreq driver by reverting the problematic commit (Andreas
    Herrmann).

    - Add support for Intel Denverton to intel_idle, clean up Broxton
    support in it and make it explicitly non-modular (Jacob Pan, Jan
    Beulich, Paul Gortmaker).

    - Add support for Denverton and Ivy Bridge server to the Intel RAPL
    power capping driver and make it more careful about the handling of
    MSRs that may not be present (Jacob Pan, Xiaolong Wang).

    - Fix resume from hibernation on x86-64 by making the CPU offline
    during resume avoid using MONITOR/MWAIT in the "play dead" loop
    which may lead to an inadvertent "revival" of a "dead" CPU and a
    page fault leading to a kernel crash from it (Rafael Wysocki).

    - Make memory management during resume from hibernation more
    straightforward (Rafael Wysocki).

    - Add debug features that should help to detect problems related to
    hibernation and resume from it (Rafael Wysocki, Chen Yu).

    - Clean up hibernation core somewhat (Rafael Wysocki).

    - Prevent KASAN from instrumenting the hibernation core, which leads
    to large numbers of false positives from it (James Morse).

    - Prevent PM (hibernate and suspend) notifiers from being called
    during the cleanup phase if they have not been called during the
    corresponding preparation phase, which is possible if one of the
    other notifiers returns an error at that time (Lianwei Wang).

    - Improve suspend-related debug printout in the tasks freezer and
    clean up suspend-related console handling (Roger Lu, Borislav
    Petkov).

    - Update the AnalyzeSuspend script in the kernel sources to version
    4.2 (Todd Brandt).

    - Modify the generic power domains framework to make it handle system
    suspend/resume better (Ulf Hansson).

    - Make the runtime PM framework avoid resuming devices synchronously
    when user space changes the runtime PM settings for them and
    improve its error reporting (Rafael Wysocki, Linus Walleij).

    - Fix error paths in devfreq drivers (exynos, exynos-ppmu,
    exynos-bus) and in the core, make some devfreq code explicitly
    non-modular and change some of it into tristate (Bartlomiej
    Zolnierkiewicz, Peter Chen, Paul Gortmaker).

    - Add DT support to the generic PM clocks management code and make it
    export some more symbols (Jon Hunter, Paul Gortmaker).

    - Make the PCI PM core code slightly more robust against possible
    driver errors (Andy Shevchenko).

    - Make it possible to change DESTDIR and PREFIX in turbostat (Andy
    Shevchenko)"
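    The frequency-table rework and the new cpufreq_driver_resolve_freq() in the shortlog below are about mapping a raw frequency request (for instance from schedutil) onto a frequency the driver can actually set. A minimal sketch, assuming an ascending table and "lowest supported frequency at or above the target" semantics, similar in spirit to CPUFREQ_RELATION_L; the table, its values and resolve_freq_sketch() are hypothetical, not the cpufreq API.

```c
#include <stddef.h>

/* Hypothetical frequency table in kHz, sorted in ascending order.
   This is not struct cpufreq_frequency_table. */
static const unsigned int freq_table_khz[] = {
    800000, 1200000, 1600000, 2000000, 2400000
};
#define FREQ_TABLE_LEN (sizeof(freq_table_khz) / sizeof(freq_table_khz[0]))

/* Return the lowest table frequency >= target_khz, or the highest
   entry if the target exceeds every entry. Binary search is valid
   because the table is sorted. */
static unsigned int resolve_freq_sketch(unsigned int target_khz)
{
    size_t lo = 0, hi = FREQ_TABLE_LEN;  /* search range is [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (freq_table_khz[mid] < target_khz)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < FREQ_TABLE_LEN ? freq_table_khz[lo]
                               : freq_table_khz[FREQ_TABLE_LEN - 1];
}
```

    With a resolved value in hand, a governor can compare it against the frequency it last requested and skip the update (and the wakeup it would cost) when the two match, which is plausibly the kind of saving the schedutil item above describes.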

    * tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
    Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency"
    PM / hibernate: Introduce test_resume mode for hibernation
    cpufreq: export cpufreq_driver_resolve_freq()
    cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()
    PCI / PM: check all fields in pci_set_platform_pm()
    cpufreq: acpi-cpufreq: use cached frequency mapping when possible
    cpufreq: schedutil: map raw required frequency to driver frequency
    cpufreq: add cpufreq_driver_resolve_freq()
    cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT
    intel_pstate: Update cpu_frequency tracepoint every time
    cpufreq: intel_pstate: clean remnant struct element
    PM / tools: scripts: AnalyzeSuspend v4.2
    x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
    cpufreq: powernv: Replacing pstate_id with frequency table index
    intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
    PM / hibernate: Image data protection during restoration
    PM / hibernate: Add missing braces in __register_nosave_region()
    PM / hibernate: Clean up comments in snapshot.c
    PM / hibernate: Clean up function headers in snapshot.c
    PM / hibernate: Add missing braces in hibernate_setup()
    ...

    Linus Torvalds
     
  • css_idr allocation starts at 1, so index 0 will never point to an item.
    css_from_id() currently filters that before asking idr_find(), but
    idr_find() would also just return NULL, so this is not needed.
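    The reasoning generalizes: if IDs are handed out starting at 1, slot 0 is never populated, so a lookup of 0 already returns NULL and an explicit zero check is redundant. A minimal sketch with a hypothetical fixed-size table standing in for the IDR; id_alloc() and id_find() are illustrative names, not kernel functions.

```c
#include <stddef.h>

#define MAX_IDS 16

/* Hypothetical ID table: slot i holds the object for ID i, or NULL. */
static void *id_table[MAX_IDS];
static int next_id = 1;  /* allocation starts at 1, like css_idr */

/* Store obj under the next free ID; returns the ID or -1 when full. */
static int id_alloc(void *obj)
{
    if (next_id >= MAX_IDS)
        return -1;
    id_table[next_id] = obj;
    return next_id++;
}

/* No special case for ID 0: slot 0 is never filled, so it is NULL. */
static void *id_find(int id)
{
    if (id < 0 || id >= MAX_IDS)
        return NULL;
    return id_table[id];
}
```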

    Link: http://lkml.kernel.org/r/20160617162427.GC19084@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Cc: Vladimir Davydov
    Cc: Tejun Heo
    Cc: Nikolay Borisov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The valid cgroup hierarchy ID range includes 0, so we can't filter for
    positive numbers when freeing it, or it'll leak the first ID. No big
    deal, just disruptive when reading the code.

    The ID is freed during error handling and when the reference count hits
    zero, so the double-free test is not necessary; remove it.
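    To see why a "positive IDs only" filter leaks, consider an allocator whose valid range starts at 0. A minimal sketch with a hypothetical bitmap allocator (not the cgroup code): the buggy free path never releases ID 0, so that ID can never be handed out again, while the fixed path simply frees whatever it is given.

```c
#include <stdbool.h>

#define NR_HIER_IDS 8

/* Hypothetical ID bitmap: used[i] is true while ID i is allocated. */
static bool used[NR_HIER_IDS];

/* Return the lowest free ID, which may be 0, or -1 if all are taken. */
static int hier_id_alloc(void)
{
    for (int i = 0; i < NR_HIER_IDS; i++) {
        if (!used[i]) {
            used[i] = true;
            return i;
        }
    }
    return -1;
}

/* Buggy variant: the "id > 0" filter silently skips ID 0. */
static void hier_id_free_buggy(int id)
{
    if (id > 0)
        used[id] = false;
}

/* Fixed variant: every ID returned by hier_id_alloc() can be freed. */
static void hier_id_free(int id)
{
    used[id] = false;
}
```

    In this model the first allocation returns 0; after the buggy free, the next allocation returns 1 because 0 is still marked as used.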

    Link: http://lkml.kernel.org/r/20160617162359.GB19084@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Cc: Vladimir Davydov
    Cc: Tejun Heo
    Cc: Nikolay Borisov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Currently, to charge a non-slab allocation to kmemcg one has to use
    alloc_kmem_pages helper with __GFP_ACCOUNT flag. A page allocated with
    this helper should finally be freed using free_kmem_pages, otherwise it
    won't be uncharged.

    This API suits its current users fine, but it turns out to be impossible
    to use along with page reference counting, i.e. when an allocation is
    supposed to be freed with put_page, as is the case with pipe or unix
    socket buffers.

    To overcome this limitation, this patch moves charging/uncharging to
    generic page allocator paths, i.e. to __alloc_pages_nodemask and
    free_pages_prepare, and zaps alloc/free_kmem_pages helpers. This way,
    one can use any of the available page allocation functions to get the
    allocated page charged to kmemcg - it's enough to pass __GFP_ACCOUNT,
    just as in the case of kmalloc and friends. A charged page will be
    automatically uncharged on free.

    To make it possible, we need to mark pages charged to kmemcg somehow.
    To avoid introducing a new page flag, we make use of page->_mapcount for
    marking such pages. Since pages charged to kmemcg are not supposed to
    be mapped to userspace, it should work just fine. There are other
    (ab)users of page->_mapcount - buddy and balloon pages - but we don't
    conflict with them.

    In case kmemcg is compiled out or not used at runtime, this patch
    introduces no overhead to generic page allocator paths. If kmemcg is
    used, the cost is one extra gfp flags check on alloc and one extra
    page->_mapcount check on free, which shouldn't hurt performance because
    the data accessed are hot.
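    The marking scheme can be modeled in a few lines of userspace C. This is not kernel code: struct page_sketch, GFP_ACCOUNT_SKETCH and the sentinel value -512 are hypothetical stand-ins for struct page, __GFP_ACCOUNT and whatever _mapcount value the kernel reserves. The model shows the two properties the text relies on: any free path uncharges correctly, and the only extra work is one flag test on allocation and one _mapcount test on free.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's gfp flag and page struct. */
#define GFP_ACCOUNT_SKETCH 0x1u
#define MAPCOUNT_UNMAPPED  (-1)    /* an ordinary, unmapped page */
#define MAPCOUNT_KMEMCG    (-512)  /* illustrative sentinel value */

struct page_sketch {
    int mapcount;
};

static long kmemcg_charged;  /* pages currently charged to the group */

static struct page_sketch *alloc_page_sketch(unsigned int gfp)
{
    struct page_sketch *page = malloc(sizeof(*page));

    if (!page)
        return NULL;
    page->mapcount = MAPCOUNT_UNMAPPED;
    /* The one extra flag test on the allocation path. */
    if (gfp & GFP_ACCOUNT_SKETCH) {
        kmemcg_charged++;
        /* Safe only because charged pages are never mapped to userspace. */
        page->mapcount = MAPCOUNT_KMEMCG;
    }
    return page;
}

static bool page_is_kmemcg(const struct page_sketch *page)
{
    return page->mapcount == MAPCOUNT_KMEMCG;
}

/* Any free path works: the marker, not the caller, decides uncharging. */
static void free_page_sketch(struct page_sketch *page)
{
    /* The one extra _mapcount test on the free path. */
    if (page_is_kmemcg(page)) {
        kmemcg_charged--;
        page->mapcount = MAPCOUNT_UNMAPPED;
    }
    free(page);
}
```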

    Link: http://lkml.kernel.org/r/a9736d856f895bcb465d9f257b54efe32eda6f99.1464079538.git.vdavydov@virtuozzo.com
    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Eric Dumazet
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Pull block driver updates from Jens Axboe:
    "This branch also contains core changes. I've come to the conclusion
    that from 4.9 and forward, I'll be doing just a single branch. We
    often have dependencies between core and drivers, and it's hard to
    always split them up appropriately without pulling core into drivers
    when that happens.

    That said, this contains:

    - separate secure erase type for the core block layer, from
    Christoph.

    - set of discard fixes, from Christoph.

    - bio shrinking fixes from Christoph, as a follow-up to the
    op/flags change in the core branch.

    - map and append request fixes from Christoph.

    - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
    exciting!

    - nvme-loop fixes from Arnd.

    - removal of ->driverfs_dev from Dan, after providing a
    device_add_disk() helper.

    - bcache fixes from Bhaktipriya and Yijing.

    - cdrom subchannel read fix from Vchannaiah.

    - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

    - set of drbd updates and fixes from Fabian, Lars, and Philipp.

    - mg_disk error path fix from Bart.

    - user notification when adding a loop device fails, from Minfei.

    - NVMe in general:
    + NVMe delay quirk from Guilherme.
    + SR-IOV support and command retry limits from Keith.
    + fix for memory-less NUMA node from Masayoshi.
    + use UINT_MAX for discard sectors, from Minfei.
    + cancel IO fixes from Ming.
    + don't allocate unused major, from Neil.
    + error code fixup from Dan.
    + use constants for PSDT/FUSE from James.
    + variable init fix from Jay.
    + fabrics fixes from Ming, Sagi, and Wei.
    + various fixes"

    * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
    nvme/pci: Provide SR-IOV support
    nvme: initialize variable before logical OR'ing it
    block: unexport various bio mapping helpers
    scsi/osd: open code blk_make_request
    target: stop using blk_make_request
    block: simplify and export blk_rq_append_bio
    block: ensure bios return from blk_get_request are properly initialized
    virtio_blk: use blk_rq_map_kern
    memstick: don't allow REQ_TYPE_BLOCK_PC requests
    block: shrink bio size again
    block: simplify and cleanup bvec pool handling
    block: get rid of bio_rw and READA
    block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
    block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
    NVMe: don't allocate unused nvme_major
    nvme: avoid crashes when node 0 is memoryless node.
    nvme: Limit command retries
    loop: Make user notify for adding loop device failed
    nvme-loop: fix nvme-loop Kconfig dependencies
    nvmet: fix return value check in nvmet_subsys_alloc()
    ...

    Linus Torvalds
     
  • Pull core block updates from Jens Axboe:

    - the big change is the cleanup from Mike Christie, cleaning up our
    uses of command types and modified flags. This is what will cause
    some merge conflicts.

    - regression fix for the above for btrfs, from Vincent

    - following up to the above, better packing of struct request from
    Christoph

    - a 2038 fix for blktrace from Arnd

    - a few trivial/spelling fixes from Bart Van Assche

    - a front merge check fix from Damien, which could cause issues on
    SMR drives

    - Atari partition fix from Gabriel

    - convert cfq to highres timers, since jiffies isn't granular enough
    for some devices these days. From Jan and Jeff

    - CFQ priority boost fix for idle classes, from me

    - cleanup series from Ming, improving our bio/bvec iteration

    - a direct issue fix for blk-mq from Omar

    - fix to make plug merging consult the IO scheduler, like we do for
    other types of merges. From Tahsin

    - expose DAX type internally and through sysfs. From Toshi and Yigal

    * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
    block: Fix front merge check
    block: do not merge requests without consulting with io scheduler
    block: Fix spelling in a source code comment
    block: expose QUEUE_FLAG_DAX in sysfs
    block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
    Btrfs: fix comparison in __btrfs_map_block()
    block: atari: Return early for unsupported sector size
    Doc: block: Fix a typo in queue-sysfs.txt
    cfq-iosched: Charge at least 1 jiffie instead of 1 ns
    cfq-iosched: Fix regression in bonnie++ rewrite performance
    cfq-iosched: Convert slice_resid from u64 to s64
    block: Convert fifo_time from ulong to u64
    blktrace: avoid using timespec
    block/blk-cgroup.c: Declare local symbols static
    block/bio-integrity.c: Add #include "blk.h"
    block/partition-generic.c: Remove a set-but-not-used variable
    block: bio: kill BIO_MAX_SIZE
    cfq-iosched: temporarily boost queue priority for idle classes
    block: drbd: avoid to use BIO_MAX_SIZE
    block: bio: remove BIO_MAX_SECTORS
    ...

    Linus Torvalds
     
  • Pull cgroup updates from Tejun Heo:
    "Nothing too exciting.

    - updates to the pids controller so that pid limit breaches can be
    noticed and monitored from userland.

    - cleanups and non-critical bug fixes"
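    A userland monitor could pick up the new pids controller events by reading the controller's events file. The shortlog mentions an events_limit counter printed with %lld; this sketch assumes it is exposed as a "max <count>" line in a file such as pids.events, which is an assumption here rather than something stated above. pids_events_max() is a hypothetical helper that parses such a buffer.

```c
#include <stdio.h>
#include <string.h>

/* Parse the value of the "max" key from a pids.events-style buffer.
   Assumes one "key value" pair per line (an assumption of this
   sketch); returns -1 if the key is absent. */
static long long pids_events_max(const char *buf)
{
    const char *line = buf;

    while (line && *line) {
        long long value;

        if (sscanf(line, "max %lld", &value) == 1)
            return value;
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

    A monitor would read the file periodically (for example under /sys/fs/cgroup/<group>/, path assumed) and treat an increase in the count as new fork failures caused by the pid limit.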

    * 'for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: remove duplicated include from cgroup.c
    cgroup: Use lld instead of ld when printing pids controller events_limit
    cgroup: Add pids controller event when fork fails because of pid limit
    cgroup: allow NULL return from ss->css_alloc()
    cgroup: remove unnecessary 0 check from css_from_id()
    cgroup: fix idr leak for the first cgroup root

    Linus Torvalds
     

26 Jul, 2016

8 commits

  • Pull irq updates from Thomas Gleixner:
    "The irq department delivers:

    - new core infrastructure to allow better management of multi-queue
    devices (interrupt spreading, node aware descriptor allocation ...)

    - a new interrupt flow handler to support the newfangled Intel VMD
    devices.

    - yet another new interrupt controller driver.

    - a series of fixes which address sparse warnings, missing
    includes, missing static declarations etc. from Ben Dooks.

    - a fix for the error handling in the hierarchical domain allocation
    code.

    - the usual pile of small updates to core and driver code"
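    The interrupt-spreading infrastructure gives each queue vector of a multi-queue device its own subset of CPUs. A minimal sketch, assuming at most 64 CPUs in a plain bitmask and a simple policy of contiguous, near-equal CPU blocks per vector; spread_vectors() is hypothetical, and the kernel's helper is also NUMA-node aware, which this ignores.

```c
#include <stdint.h>

/* Split the CPUs in cpu_mask into nvec contiguous, near-equal blocks
   and store block v in vec_masks[v]. Vectors beyond the CPU count get
   an empty mask. Hypothetical helper; not NUMA aware. */
static void spread_vectors(uint64_t cpu_mask, unsigned int nvec,
                           uint64_t vec_masks[])
{
    unsigned int ncpus = 0, v = 0, filled = 0;

    for (unsigned int i = 0; i < nvec; i++)
        vec_masks[i] = 0;
    if (nvec == 0)
        return;

    /* Count the CPUs available for spreading. */
    for (unsigned int cpu = 0; cpu < 64; cpu++)
        if (cpu_mask & (UINT64_C(1) << cpu))
            ncpus++;

    for (unsigned int cpu = 0; cpu < 64 && v < nvec; cpu++) {
        /* The first ncpus % nvec vectors take one extra CPU. */
        unsigned int share = ncpus / nvec + (v < ncpus % nvec ? 1 : 0);

        if (!(cpu_mask & (UINT64_C(1) << cpu)))
            continue;
        vec_masks[v] |= UINT64_C(1) << cpu;
        if (++filled == share) {
            v++;
            filled = 0;
        }
    }
}
```

    Contiguous blocks keep neighbouring CPUs on the same vector, which tends to suit cache topology better than a round-robin assignment would.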

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    genirq: Fix missing irq allocation affinity hint
    irqdomain: Fix irq_domain_alloc_irqs_recursive() error handling
    irq/Documentation: Correct result of echoing 5 to smp_affinity
    MAINTAINERS: Remove Jiang Liu from irq domains
    genirq/msi: Fix broken debug output
    genirq: Add a helper to spread an affinity mask for MSI/MSI-X vectors
    genirq/msi: Make use of affinity aware allocations
    genirq: Use affinity hint in irqdesc allocation
    genirq: Add affinity hint to irq allocation
    genirq: Introduce IRQD_AFFINITY_MANAGED flag
    genirq/msi: Remove unused MSI_FLAG_IDENTITY_MAP
    irqchip/s3c24xx: Fixup IO accessors for big endian
    irqchip/exynos-combiner: Fix usage of __raw IO
    irqdomain: Fix disposal of mappings for interrupt hierarchies
    irqchip/aspeed-vic: Add irq controller for Aspeed
    doc/devicetree: Add Aspeed VIC bindings
    x86/PCI/VMD: Use untracked irq handler
    genirq: Add untracked irq handler
    irqchip/mips-gic: Populate irq_domain names
    irqchip/gicv3-its: Implement two-level(indirect) device table support
    ...

    Linus Torvalds
     
  • Pull timer updates from Thomas Gleixner:
    "This update provides the following changes:

    - The rework of the timer wheel, which addresses the shortcomings of
    the current wheel (cascading, slow search for next expiring timer,
    etc). That's the first major change of the wheel since Finn
    implemented it almost 20 years ago.

    - A large overhaul of the clocksource drivers init functions to
    consolidate the Device Tree initialization

    - Some more Y2038 updates

    - A capability fix for timerfd

    - Yet another clock chip driver

    - The usual pile of updates, comment improvements all over the place"
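    The idea behind the non-cascading wheel: a timer is filed once, in a level whose granularity grows with the distance to its expiry, and is never moved between levels; the cost is that far-out timers may fire slightly late. A minimal sketch with assumed parameters (64 buckets per level, each level 8x coarser than the one below); the WHEEL_* macros and both functions are hypothetical and not the kernel's own index calculation.

```c
/* Hypothetical wheel parameters: 64 buckets per level, each level
   8x coarser than the one below it. */
#define WHEEL_CLK_SHIFT 3
#define WHEEL_BITS      6
#define WHEEL_SIZE      (1UL << WHEEL_BITS)
#define WHEEL_MASK      (WHEEL_SIZE - 1)
#define WHEEL_DEPTH     8
#define WHEEL_SHIFT(n)  ((n) * WHEEL_CLK_SHIFT)
#define WHEEL_GRAN(n)   (1UL << WHEEL_SHIFT(n))

/* Choose the first level whose range covers delta ticks. Level n
   spans WHEEL_SIZE buckets of WHEEL_GRAN(n) ticks each. Deltas beyond
   the last level would need clamping, which this sketch omits. */
static unsigned int wheel_level(unsigned long delta)
{
    unsigned int lvl;

    for (lvl = 0; lvl < WHEEL_DEPTH - 1; lvl++) {
        if (delta < (WHEEL_SIZE << WHEEL_SHIFT(lvl)))
            return lvl;
    }
    return WHEEL_DEPTH - 1;
}

/* Global bucket index for an absolute expiry time on a given level.
   Rounding up to the level's granularity means a timer can fire late
   but never early; it stays in this bucket until it expires. */
static unsigned int wheel_index(unsigned long expires, unsigned int lvl)
{
    unsigned long slot = (expires + WHEEL_GRAN(lvl) - 1) >> WHEEL_SHIFT(lvl);

    return lvl * WHEEL_SIZE + (slot & WHEEL_MASK);
}
```

    For example, with the clock at 0, a timer due at tick 100 lands on level 1, whose buckets are 8 ticks wide; it is rounded up to tick 104, so in this model it fires 4 ticks late, and no level-1 timer fires more than 7 ticks late.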

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
    tick/nohz: Optimize nohz idle enter
    clockevents: Make clockevents_subsys static
    clocksource/drivers/time-armada-370-xp: Fix return value check
    timers: Implement optimization for same expiry time in mod_timer()
    timers: Split out index calculation
    timers: Only wake softirq if necessary
    timers: Forward the wheel clock whenever possible
    timers/nohz: Remove pointless tick_nohz_kick_tick() function
    timers: Optimize collect_expired_timers() for NOHZ
    timers: Move __run_timers() function
    timers: Remove set_timer_slack() leftovers
    timers: Switch to a non-cascading wheel
    timers: Reduce the CPU index space to 256k
    timers: Give a few structs and members proper names
    hlist: Add hlist_is_singular_node() helper
    signals: Use hrtimer for sigtimedwait()
    timers: Remove the deprecated mod_timer_pinned() API
    timers, net/ipv4/inet: Initialize connection request timers as pinned
    timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned
    timers, drivers/tty/metag_da: Initialize the poll timer as pinned
    ...

    Linus Torvalds
     
  • Pull x86 boot updates from Ingo Molnar:
    "The main changes:

    - add initial commits to randomize kernel memory section virtual
    addresses, enabled via a new kernel option: RANDOMIZE_MEMORY
    (Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu)

    - enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees
    Cook)

    - EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar)

    - misc cleanups/fixes"
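    RANDOMIZE_MEMORY slides the base of kernel memory regions (the shortlog names the physical mapping and vmalloc regions) by a random, aligned offset. A minimal sketch of the offset arithmetic only, assuming a window of free virtual address space and a fixed alignment such as 1 GiB (the size a PUD entry maps on x86-64); randomize_region_offset() is hypothetical, and the random value is passed in so the result is deterministic.

```c
#include <stdint.h>

/* Pick an aligned offset for a region of region_size bytes inside a
   window of window_size bytes. rnd is any random 64-bit value. Returns
   0 (no randomization) if the region leaves no slack. The result
   always satisfies offset + region_size <= window_size. */
static uint64_t randomize_region_offset(uint64_t window_size,
                                        uint64_t region_size,
                                        uint64_t align, uint64_t rnd)
{
    uint64_t slack, slots;

    if (align == 0 || region_size >= window_size)
        return 0;
    slack = window_size - region_size;  /* room to slide the region */
    slots = slack / align + 1;          /* aligned positions available */
    return (rnd % slots) * align;
}
```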

    * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/boot: Simplify EBDA-vs-BIOS reservation logic
    x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
    x86/boot: Reorganize and clean up the BIOS area reservation code
    x86/mm: Do not reference phys addr beyond kernel
    x86/mm: Add memory hotplug support for KASLR memory randomization
    x86/mm: Enable KASLR for vmalloc memory regions
    x86/mm: Enable KASLR for physical mapping memory regions
    x86/mm: Implement ASLR for kernel memory regions
    x86/mm: Separate variable for trampoline PGD
    x86/mm: Add PUD VA support for physical mapping
    x86/mm: Update physical mapping variable names
    x86/mm: Refactor KASLR entropy functions
    x86/KASLR: Fix boot crash with certain memory configurations
    x86/boot/64: Add forgotten end of function marker
    x86/KASLR: Allow randomization below the load address
    x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
    x86/KASLR: Randomize virtual address separately
    x86/KASLR: Clarify identity map interface
    x86/boot: Refuse to build with data relocations
    x86/KASLR, x86/power: Remove x86 hibernation restrictions

    Linus Torvalds
     
  • Pull NOHZ updates from Ingo Molnar:

    - fix system/idle cputime leaked on cputime accounting (all nohz
    configs) (Rik van Riel)

    - remove the messy, ad-hoc irqtime accounting on nohz-full and make it
    compatible with CONFIG_IRQ_TIME_ACCOUNTING=y instead (Rik van Riel)

    - cleanups (Frederic Weisbecker)

    - remove unnecessary irq disablement in the irqtime code (Rik van Riel)

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/cputime: Drop local_irq_save/restore from irqtime_account_irq()
    sched/cputime: Reorganize vtime native irqtime accounting headers
    sched/cputime: Clean up the old vtime gen irqtime accounting completely
    sched/cputime: Replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code
    sched/cputime: Count actually elapsed irq & softirq time

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:

    - introduce and use task_rcu_dereference()/try_get_task_struct() to fix
    and generalize task_struct handling (Oleg Nesterov)

    - do various per entity load tracking (PELT) fixes and optimizations
    (Peter Zijlstra)

    - cputime virt-steal time accounting enhancements/fixes (Wanpeng Li)

    - introduce consolidated cputime output file cpuacct.usage_all and
    related refactorings (Zhao Lei)

    - ... plus misc fixes and enhancements
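    For readers new to PELT: per-entity load tracking keeps a geometrically decaying sum of an entity's recent activity, split into periods of roughly 1 ms. The general formulation (background, not something this pull changes) is, with u_i the contribution from the period i periods ago:

```latex
L = \sum_{i \ge 0} u_i \, y^{i}, \qquad y^{32} = \tfrac{1}{2}
\qquad\Longrightarrow\qquad
L \le \sum_{i \ge 0} 1024 \, y^{i} = \frac{1024}{1 - y}
\quad \text{whenever every } u_i \le 1024
```

    So activity from 32 periods ago counts half as much as activity in the current period, and the sum stays bounded however long the entity runs.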

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/core: Panic on scheduling while atomic bugs if kernel.panic_on_warn is set
    sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together
    sched/cpuacct: Use loop to consolidate code in cpuacct_stats_show()
    sched/cpuacct: Merge cpuacct_usage_index and cpuacct_stat_index enums
    sched/fair: Rework throttle_count sync
    sched/core: Fix sched_getaffinity() return value kerneldoc comment
    sched/fair: Reorder cgroup creation code
    sched/fair: Apply more PELT fixes
    sched/fair: Fix PELT integrity for new tasks
    sched/cgroup: Fix cpu_cgroup_fork() handling
    sched/fair: Fix PELT integrity for new groups
    sched/fair: Fix and optimize the fork() path
    sched/cputime: Add steal time support to full dynticks CPU time accounting
    sched/cputime: Fix prev steal time accounting during CPU hotplug
    KVM: Fix steal clock warp during guest CPU hotplug
    sched/debug: Always show 'nr_migrations'
    sched/fair: Use task_rcu_dereference()
    sched/api: Introduce task_rcu_dereference() and try_get_task_struct()
    sched/idle: Optimize the generic idle loop
    sched/fair: Fix the wrong throttled clock time for cfs_rq_clock_task()

    Linus Torvalds
     
  • Pull perf updates from Ingo Molnar:
    "With over 300 commits it's been a busy cycle - with most of the work
    concentrated on the tooling side (as it should).

    The main kernel side enhancements were:

    - Add per event callchain limit: Recently we introduced a sysctl to
    tune the max-stack for all events for which callchains were
    requested:

    $ sysctl kernel.perf_event_max_stack
    kernel.perf_event_max_stack = 127

    Now this patch introduces a way to configure this per event, i.e.
    this becomes possible:

    $ perf record -e sched:*/max-stack=2/ -e block:*/max-stack=10/ -a

    allowing finer tuning of how much buffer space callchains use.

    This uses a u16 from the reserved space at the end, leaving
    another u16 for future use.

    There has been interest in even finer tuning, namely to control the
    max stack for kernel and userspace callchains separately. Further
    discussion is needed; we may, for instance, use the remaining u16 for
    that and, when it is present, assume that the sample_max_stack
    introduced in this patch applies to the kernel, and the u16 left
    is used for limiting the userspace callchain (Arnaldo Carvalho de
    Melo)

    - Optimize AUX event (hardware assisted side-band event) delivery
    (Kan Liang)

    - Rework Intel family name macro usage (this is partially x86 arch
    work) (Dave Hansen)

    - Refine and fix Intel LBR support (David Carrillo-Cisneros)

    - Add support for Intel 'TopDown' events (Andi Kleen)

    - Intel uncore PMU driver fixes and enhancements (Kan Liang)

    - ... other misc changes.

    Here's an incomplete list of the tooling enhancements (but there's
    much more, see the shortlog and the git log for details):

    - Support cross unwinding, i.e. collecting '--call-graph dwarf'
    perf.data files on one machine and then doing analysis on another
    machine of a different hardware architecture. This makes it possible,
    for instance, to do:

    $ perf record -a --call-graph dwarf

    on an x86-32 or aarch64 system and then do 'perf report' on it on an
    x86_64 workstation (He Kuang)

    - Allow reading from a backward ring buffer (one set up via
    sys_perf_event_open() with perf_event_attr.write_backward = 1)
    (Wang Nan)

    - Finish merging initial SDT (Statically Defined Traces) support, see
    cset comments for details about how it all works (Masami Hiramatsu)

    - Support attaching eBPF programs to tracepoints (Wang Nan)

    - Add demangling of symbols in programs written in the Rust language
    (David Tolnay)

    - Add support for tracepoints in the python binding, including an
    example, that sets up and parses sched:sched_switch events,
    tools/perf/python/tracepoint.py (Jiri Olsa)

    - Introduce --stdio-color to set up the color output mode selection
    in 'annotate' and 'report', allowing color escape sequences to be
    emitted when redirecting the output of these tools (Arnaldo Carvalho
    de Melo)

    - Add 'callindent' option to 'perf script -F', to indent the Intel PT
    call stack, making this output more ftrace-like (Adrian Hunter,
    Andi Kleen)

    - Allow dumping the object files generated by llvm when processing
    eBPF scriptlet events (Wang Nan)

    - Add stackcollapse.py script to help generate flame graphs (Paolo
    Bonzini)

    - Add --ldlat option to 'perf mem' to specify load latency for loads
    event (e.g. cpu/mem-loads/ ) (Jiri Olsa)

    - Tooling support for Intel TopDown counters, recently added to the
    kernel (Andi Kleen)"
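    The per-event callchain limit can be pictured as a cap applied when a callchain is stored. A minimal sketch; both helpers are hypothetical, and two semantics are assumptions rather than documented kernel behavior: a per-event value of 0 falls back to kernel.perf_event_max_stack, and a per-event value above the sysctl is clamped to it.

```c
#include <stddef.h>
#include <stdint.h>

/* Resolve the depth limit for one event (assumed semantics: 0 means
   "use the sysctl", and the sysctl is an upper bound). */
static unsigned int effective_max_stack(uint16_t per_event_max,
                                        unsigned int sysctl_max)
{
    if (per_event_max == 0 || per_event_max > sysctl_max)
        return sysctl_max;
    return per_event_max;
}

/* Copy at most max_stack return addresses into out and return the
   number stored. Fewer frames means less ring-buffer space per sample. */
static size_t record_callchain(const uint64_t *frames, size_t nframes,
                               unsigned int max_stack, uint64_t *out)
{
    size_t n = nframes < max_stack ? nframes : max_stack;

    for (size_t i = 0; i < n; i++)
        out[i] = frames[i];
    return n;
}
```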

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (303 commits)
    perf tests: Add is_printable_array test
    perf tools: Make is_printable_array global
    perf script python: Fix string vs byte array resolving
    perf probe: Warn unmatched function filter correctly
    perf cpu_map: Add more helpers
    perf stat: Balance opening and reading events
    tools: Copy linux/{hash,poison}.h and check for drift
    perf tools: Remove include/linux/list.h from perf's MANIFEST
    tools: Copy the bitops files accessed from the kernel and check for drift
    Remove: kernel unistd*h files from perf's MANIFEST, not used
    perf tools: Remove tools/perf/util/include/linux/const.h
    perf tools: Remove tools/perf/util/include/asm/byteorder.h
    perf tools: Add missing linux/compiler.h include to perf-sys.h
    perf jit: Remove some no-op error handling
    perf jit: Add missing curly braces
    objtool: Initialize variable to silence old compiler
    objtool: Add -I$(srctree)/tools/arch/$(ARCH)/include/uapi
    perf record: Add --tail-synthesize option
    perf session: Don't warn about out of order event if write_backward is used
    perf tools: Enable overwrite settings
    ...

    Linus Torvalds
     
  • Pull locking updates from Ingo Molnar:
    "The locking tree was busier in this cycle than the usual pattern - a
    couple of major projects happened to coincide.

    The main changes are:

    - implement the atomic_fetch_{add,sub,and,or,xor}() API natively
    across all SMP architectures (Peter Zijlstra)

    - add atomic_fetch_{inc/dec}() as well, using the generic primitives
    (Davidlohr Bueso)

    - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
    Waiman Long)

    - optimize smp_cond_load_acquire() on arm64 and implement LSE based
    atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    on arm64 (Will Deacon)

    - introduce smp_acquire__after_ctrl_dep() and fix various barrier
    mis-uses and bugs (Peter Zijlstra)

    - after discovering ancient spin_unlock_wait() barrier bugs in its
    implementation and usage, strengthen its semantics and update/fix
    usage sites (Peter Zijlstra)

    - optimize mutex_trylock() fastpath (Peter Zijlstra)

    - ... misc fixes and cleanups"
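    The practical difference between the new fetch forms and the older atomic_{op}_return() forms is which value comes back: atomic_fetch_{op}() returns the value before the operation, {op}_return() the value after. A sketch using C11 <stdatomic.h>, whose atomic_fetch_* functions follow the same return-the-old-value contract; the *_sketch wrappers are hypothetical and do not use the kernel API.

```c
#include <stdatomic.h>

/* Like atomic_fetch_add(): returns the value BEFORE the addition. */
static int fetch_add_sketch(atomic_int *v, int i)
{
    return atomic_fetch_add(v, i);
}

/* Like atomic_add_return(): returns the value AFTER the addition. */
static int add_return_sketch(atomic_int *v, int i)
{
    return atomic_fetch_add(v, i) + i;
}

/* A fetch_or use case: set a bit and learn whether it was already set,
   in a single atomic operation. */
static int test_and_set_bit_sketch(atomic_int *v, int bit)
{
    int mask = 1 << bit;

    return (atomic_fetch_or(v, mask) & mask) != 0;
}
```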

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
    locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
    locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
    locking/static_keys: Fix non static symbol Sparse warning
    locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
    locking/atomic, arch/tile: Fix tilepro build
    locking/atomic, arch/m68k: Remove comment
    locking/atomic, arch/arc: Fix build
    locking/Documentation: Clarify limited control-dependency scope
    locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
    locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
    locking/atomic, arch/mips: Convert to _relaxed atomics
    locking/atomic, arch/alpha: Convert to _relaxed atomics
    locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
    locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
    locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    locking/atomic: Fix atomic64_relaxed() bits
    locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle were:

    - documentation updates

    - miscellaneous fixes

    - minor reorganization of code

    - torture-test updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
    rcu: Correctly handle sparse possible cpus
    rcu: sysctl: Panic on RCU Stall
    rcu: Fix a typo in a comment
    rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
    rcu: Disable TASKS_RCU for usermode Linux
    rcu: No ordering for rcu_assign_pointer() of NULL
    rcutorture: Fix error return code in rcu_perf_init()
    torture: Inflict default jitter
    rcuperf: Don't treat gp_exp mis-setting as a WARN
    rcutorture: Drop "-soundhw pcspkr" from x86 boot arguments
    rcutorture: Don't specify the cpu type of QEMU on PPC
    rcutorture: Make -soundhw a x86 specific option
    rcutorture: Use vmlinux as the fallback kernel image
    rcutorture/doc: Create initrd using dracut
    torture: Stop onoff task if there is only one cpu
    torture: Add starvation events to error summary
    torture: Break online and offline functions out of torture_onoff()
    torture: Forgive lengthy trace dumps and preemption
    torture: Remove CONFIG_RCU_TORTURE_TEST_RUNNABLE, simplify code
    torture: Simplify code, eliminate RCU_PERF_TEST_RUNNABLE
    ...

    Linus Torvalds
     

25 Jul, 2016

3 commits

  • * pm-cpufreq: (41 commits)
    Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency"
    cpufreq: export cpufreq_driver_resolve_freq()
    cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()
    cpufreq: acpi-cpufreq: use cached frequency mapping when possible
    cpufreq: schedutil: map raw required frequency to driver frequency
    cpufreq: add cpufreq_driver_resolve_freq()
    cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT
    intel_pstate: Update cpu_frequency tracepoint every time
    cpufreq: intel_pstate: clean remnant struct element
    cpufreq: powernv: Replacing pstate_id with frequency table index
    intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
    cpufreq: Reuse new freq-table helpers
    cpufreq: Handle sorted frequency tables more efficiently
    cpufreq: Drop redundant check from cpufreq_update_current_freq()
    intel_pstate: Declare pid_params/pstate_funcs/hwp_active __read_mostly
    intel_pstate: add __init/__initdata marker to some functions/variables
    intel_pstate: Fix incorrect placement of __initdata
    cpufreq: mvebu: fix integer to pointer cast
    cpufreq: intel_pstate: Broxton support
    cpufreq: conservative: Do not use transition notifications
    ...

    Rafael J. Wysocki
     
  • * pm-sleep:
    PM / hibernate: Introduce test_resume mode for hibernation
    x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
    PM / hibernate: Image data protection during restoration
    PM / hibernate: Add missing braces in __register_nosave_region()
    PM / hibernate: Clean up comments in snapshot.c
    PM / hibernate: Clean up function headers in snapshot.c
    PM / hibernate: Add missing braces in hibernate_setup()
    PM / hibernate: Recycle safe pages after image restoration
    PM / hibernate: Simplify mark_unsafe_pages()
    PM / hibernate: Do not free preallocated safe pages during image restore
    PM / suspend: show workqueue state in suspend flow
    PM / sleep: make PM notifiers called symmetrically
    PM / sleep: Make pm_prepare_console() return void
    PM / Hibernate: Don't let kasan instrument snapshot.c

    * pm-tools:
    PM / tools: scripts: AnalyzeSuspend v4.2
    tools/turbostat: allow user to alter DESTDIR and PREFIX

    Rafael J. Wysocki
     
  • Pull staging and IIO driver updates from Greg KH:
    "Here is the big Staging and IIO driver update for 4.8-rc1.

    We ended up adding more code than removing, again, but it's not all
    that bad. Lots of cleanups all over the staging tree, and new IIO
    drivers, full details in the shortlog.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'staging-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (417 commits)
    drivers:iio:accel:mma8452: removed unwanted return statements
    drivers:iio:accel:mma8452: added cleanup provision in case of failure.
    iio: Add iio.git tree to MAINTAINERS
    iio:st_pressure: clean useless static channel initializers
    iio:st_pressure:lps22hb: temperature support
    iio:st_pressure:lps22hb: open drain support
    iio:st_pressure: temperature triggered buffering
    iio:st_pressure: document sampling gains
    iio:st_pressure: align storagebits on power of 2
    iio:st_sensors: align on storagebits boundaries
    staging:iio:lis3l02dq drop separate driver
    iio: accel: st_accel: Add lis3l02dq support
    iio: adc: add missing of_node references to iio_dev
    iio: adc: ti-ads1015: add indio_dev->dev.of_node reference
    iio: potentiometer: Fix typo in Kconfig
    iio: potentiometer: mcp4531: Add device tree binding
    iio: potentiometer: mcp4531: Add device tree binding documentation
    iio: potentiometer: mcp4531: Add support for MCP454x, MCP456x, MCP464x and MCP466x
    iio:imu:mpu6050: icm20608 initial support
    iio: adc: max1363: Add device tree binding
    ...

    Linus Torvalds
     

22 Jul, 2016

2 commits

  • The test_resume mode verifies whether the snapshot data
    written to the swap device can be successfully restored
    to memory. It eases debugging of hibernation, since this
    mode bypasses not only the BIOS/bootloader but also the
    system re-initialization.

    To avoid the risk of breaking the filesystem on persistent storage,
    this patch resumes the image with tasks frozen.

    For example:
    echo test_resume > /sys/power/disk
    echo disk > /sys/power/state

    [ 187.306470] PM: Image saving progress: 70%
    [ 187.395298] PM: Image saving progress: 80%
    [ 187.476697] PM: Image saving progress: 90%
    [ 187.554641] PM: Image saving done.
    [ 187.558896] PM: Wrote 594600 kbytes in 0.90 seconds (660.66 MB/s)
    [ 187.566000] PM: S|
    [ 187.589742] PM: Basic memory bitmaps freed
    [ 187.594694] PM: Checking hibernation image
    [ 187.599865] PM: Image signature found, resuming
    [ 187.605209] PM: Loading hibernation image.
    [ 187.665753] PM: Basic memory bitmaps created
    [ 187.691397] PM: Using 3 thread(s) for decompression.
    [ 187.691397] PM: Loading and decompressing image data (148650 pages)...
    [ 187.889719] PM: Image loading progress: 0%
    [ 188.100452] PM: Image loading progress: 10%
    [ 188.244781] PM: Image loading progress: 20%
    [ 189.057305] PM: Image loading done.
    [ 189.068793] PM: Image successfully loaded

    Suggested-by: Rafael J. Wysocki
    Signed-off-by: Chen Yu
    Signed-off-by: Rafael J. Wysocki

    Chen Yu
     
  • The slow-path frequency transition path is relatively expensive, as it
    requires waking up a thread to do work. If support is added for
    remote CPU cpufreq updates, that will also be expensive, since it
    requires an IPI. These activities should be avoided if they are not necessary.

    To that end, calculate the actual driver-supported frequency required by
    the new utilization value in schedutil by using the recently added
    cpufreq_driver_resolve_freq API. If it is the same as the previously
    requested driver frequency then there is no need to continue with the
    update assuming the cpu frequency limits have not changed. This will
    have additional benefits should the semantics of the rate limit be
    changed to apply solely to frequency transitions rather than to
    frequency calculations in schedutil.

    The last raw required frequency is cached. This allows the driver
    frequency lookup to be skipped in the event that the new raw required
    frequency matches the last one, assuming a frequency update has not been
    forced due to limits changing (indicated by a next_freq value of
    UINT_MAX, see sugov_should_update_freq).
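    A minimal C sketch of the caching described above, under stated assumptions:
    the frequency table, the round-up resolve policy and the names resolve_freq(),
    struct sugov_sketch and get_next_freq() are hypothetical stand-ins, not the
    schedutil code or the cpufreq_driver_resolve_freq() API. Only the control flow
    mirrors the description: skip the lookup when the raw request repeats and
    next_freq is not UINT_MAX, and report no change when the resolved frequency
    equals the previous one.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical driver frequency table, sorted ascending (kHz). */
static const unsigned int freq_table[] = { 800000, 1200000, 1600000, 2000000 };
#define NR_FREQS (sizeof(freq_table) / sizeof(freq_table[0]))

static unsigned int lookups; /* counts table lookups, for illustration only */

/* Assumed policy: lowest supported frequency >= raw, else the highest. */
static unsigned int resolve_freq(unsigned int raw)
{
	lookups++;
	for (size_t i = 0; i < NR_FREQS; i++)
		if (freq_table[i] >= raw)
			return freq_table[i];
	return freq_table[NR_FREQS - 1];
}

struct sugov_sketch {
	unsigned int cached_raw_freq; /* last raw required frequency */
	unsigned int next_freq;       /* last driver frequency; UINT_MAX forces an update */
};

/* Returns the driver frequency; *changed says whether a transition is needed. */
static unsigned int get_next_freq(struct sugov_sketch *sg, unsigned int raw,
				  int *changed)
{
	unsigned int freq;

	assert(changed != NULL);
	if (raw == sg->cached_raw_freq && sg->next_freq != UINT_MAX) {
		*changed = 0; /* same raw request, limits unchanged: skip the lookup */
		return sg->next_freq;
	}
	sg->cached_raw_freq = raw;
	freq = resolve_freq(raw);
	*changed = (freq != sg->next_freq); /* same driver frequency: no transition */
	sg->next_freq = freq;
	return freq;
}
```

    Note that a different raw request can still resolve to the same driver
    frequency, in which case no transition is requested.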

    Signed-off-by: Steve Muckle
    Reviewed-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Steve Muckle
     

20 Jul, 2016

1 commit


19 Jul, 2016

3 commits

  • tick_nohz_start_idle() is called before checking whether the idle tick can be
    stopped. If the tick cannot be stopped, calling tick_nohz_start_idle() is
    pointless and just wastes CPU cycles.

    Only invoke tick_nohz_start_idle() when can_stop_idle_tick() returns true. A
    short, one-minute observation of the effect on ARM64 shows a 1.5% reduction in
    calls, thus optimizing the idle entry sequence.
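    A hypothetical sketch of the reordering, with stubs standing in for the real
    tick code (the _stub/_sketch names and the counter are illustrative
    assumptions): the idle-start bookkeeping sits inside the can_stop_idle_tick()
    branch, so idle entries where the tick cannot be stopped skip it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter standing in for the real idle-entry bookkeeping. */
static unsigned long start_idle_calls;

/* Stub for tick_nohz_start_idle(): here it only counts invocations. */
static void tick_nohz_start_idle_stub(void)
{
	start_idle_calls++;
}

/* Reordered idle-entry path: the result of can_stop_idle_tick() is modelled
 * by the boolean argument, and idle-start work happens only when it is true. */
static void idle_enter_sketch(bool can_stop_idle_tick)
{
	if (can_stop_idle_tick) {
		tick_nohz_start_idle_stub();
		/* ... proceed to stop the tick ... */
	}
}
```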

    [tglx: Massaged changelog ]

    Co-developed-by: Sanjeev Yadav
    Signed-off-by: Gaurav Jindal
    Link: http://lkml.kernel.org/r/20160714120416.GB21099@gaurav.jindal@spreadtrum.com
    Signed-off-by: Thomas Gleixner

    Gaurav Jindal
     
  • The new affinity hint argument of __irq_domain_alloc_irqs() is missing from
    the call in irq_reserve_ipi(). Add it.

    This fixes the following compilation error:

    kernel/irq/ipi.c: In function ‘irq_reserve_ipi’:
    kernel/irq/ipi.c:85:9: error: too few arguments to function ‘__irq_domain_alloc_irqs’
    virq = __irq_domain_alloc_irqs(domain, virq, nr_irqs, NUMA_NO_NODE,
    ^
    Fixes: 06ee6d571f0e ("genirq: Add affinity hint to irq allocation")
    Signed-off-by: Vincent Stehlé
    Cc: linux-pci@vger.kernel.org
    Cc: Christoph Hellwig
    Signed-off-by: Thomas Gleixner

    Vincent Stehle
     
  • The clockevents_subsys struct is used for sysfs support and
    is not declared or used outside the file it is defined in.
    Fix the following warning by making it static:

    kernel/time/clockevents.c:648:17: warning: symbol 'clockevents_subsys' was not declared. Should it be static?
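    To illustrate the fix, here is a self-contained sketch of a file-local object
    with internal linkage. struct bus_type_sketch is a hypothetical stand-in for
    the kernel's struct bus_type, and the field values are assumptions for the
    example:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct bus_type. */
struct bus_type_sketch {
	const char *name;
	const char *dev_name;
};

/* 'static' gives the object internal linkage: it is visible only in this
 * translation unit, which is what Sparse's "Should it be static?" warning
 * suggests for a file-scope symbol that has no external declaration. */
static struct bus_type_sketch clockevents_subsys = {
	.name     = "clockevents",
	.dev_name = "clockevent",
};

/* Code in the same file can still use the object normally. */
static const char *subsys_name(void)
{
	return clockevents_subsys.name;
}
```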

    Signed-off-by: Ben Dooks
    Cc: linux-kernel@lists.codethink.co.uk
    Link: http://lkml.kernel.org/r/1466178974-7105-1-git-send-email-ben.dooks@codethink.co.uk
    Signed-off-by: Thomas Gleixner

    Ben Dooks
     

16 Jul, 2016

2 commits

  • Pull workqueue fix from Tejun Heo:
    "The optimization for setting unbound worker affinity masks collided
    with recent scheduler changes triggering warning messages.

    This late pull request fixes the bug by removing the optimization"

    * 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: Fix setting affinity of unbound worker threads

    Linus Torvalds
     
  • On Intel hardware, native_play_dead() uses mwait_play_dead() by
    default and only falls back to the other methods if that fails.
    That also happens during resume from hibernation, when the restore
    (boot) kernel runs disable_nonboot_cpus() to take all of the CPUs
    except for the boot one offline.

    However, that is problematic, because the address passed to
    __monitor() in mwait_play_dead() is likely to be written to in the
    last phase of hibernate image restoration and that causes the "dead"
    CPU to start executing instructions again. Unfortunately, the page
    containing the address in that CPU's instruction pointer may no longer be
    valid at that point.

    First, that page may have been overwritten with image kernel memory
    contents already, so the instructions the CPU attempts to execute may
    simply be invalid. Second, the page tables previously used by that
    CPU may have been overwritten by image kernel memory contents, so the
    address in its instruction pointer is impossible to resolve then.

    A report from Varun Koyyalagunta and investigation carried out by
    Chen Yu show that the latter sometimes happens in practice.

    To prevent it from happening, temporarily change the smp_ops.play_dead
    pointer during resume from hibernation so that it points to a special
    "play dead" routine which uses hlt_play_dead() and avoids the
    inadvertent "revivals" of "dead" CPUs this way.

    A slightly unpleasant consequence of this change is that if the
    system is hibernated with one or more CPUs offline, it will generally
    draw more power after resume than it did before hibernation, because
    the physical state entered by CPUs via hlt_play_dead() is higher-power
    than the mwait_play_dead() one in the majority of cases. It is
    possible to work around this, but it is unclear how much of a problem
    that's going to be in practice, so the workaround will be implemented
    later if it turns out to be necessary.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371
    Reported-by: Varun Koyyalagunta
    Original-by: Chen Yu
    Tested-by: Chen Yu
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Ingo Molnar

    Rafael J. Wysocki
     

15 Jul, 2016

3 commits

  • Merge misc fixes from Andrew Morton:
    "20 fixes"

    * emailed patches from Andrew Morton :
    m32r: fix build warning about putc
    mm: workingset: printk missing log level, use pr_info()
    mm: thp: refix false positive BUG in page_move_anon_rmap()
    mm: rmap: call page_check_address() with sync enabled to avoid racy check
    mm: thp: move pmd check inside ptl for freeze_page()
    vmlinux.lds: account for destructor sections
    gcov: add support for gcc version >= 6
    mm, meminit: ensure node is online before checking whether pages are uninitialised
    mm, meminit: always return a valid node from early_pfn_to_nid
    kasan/quarantine: fix bugs on qlist_move_cache()
    uapi: export lirc.h header
    madvise_free, thp: fix madvise_free_huge_pmd return value after splitting
    Revert "scripts/gdb: add documentation example for radix tree"
    Revert "scripts/gdb: add a Radix Tree Parser"
    scripts/gdb: Perform path expansion to lx-symbol's arguments
    scripts/gdb: add constants.py to .gitignore
    scripts/gdb: rebuild constants.py on dependancy change
    scripts/gdb: silence 'nothing to do' message
    kasan: add newline to messages
    mm, compaction: prevent VM_BUG_ON when terminating freeing scanner

    Linus Torvalds
     
  • Pull scheduler fix from Ingo Molnar:
    "Fix a CPU hotplug related corruption of the load average that got
    introduced in this merge window"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/core: Correct off by one bug in load migration calculation

    Linus Torvalds
     
  • Link: http://lkml.kernel.org/r/20160701130914.GA23225@styxhp
    Signed-off-by: Florian Meier
    Reviewed-by: Peter Oberparleiter
    Tested-by: Peter Oberparleiter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Florian Meier
     

14 Jul, 2016

7 commits

  • Paolo pointed out that irqs are already blocked when irqtime_account_irq()
    is called. That means there is no reason to call local_irq_save/restore()
    again.

    Suggested-by: Paolo Bonzini
    Signed-off-by: Rik van Riel
    Signed-off-by: Frederic Weisbecker
    Reviewed-by: Paolo Bonzini
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: Thomas Gleixner
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1468421405-20056-6-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Rik van Riel
     
  • Vtime generic irqtime accounting has been removed but there are a few
    remnants to clean up:

    * The vtime_accounting_cpu_enabled() check in irq entry was only used
    by CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can safely remove it.

    * Without the vtime_accounting_cpu_enabled(), we no longer need to
    have a vtime_common_account_irq_enter() indirect function.

    * Move vtime_account_irq_enter() implementation under
    CONFIG_VIRT_CPU_ACCOUNTING_NATIVE which is the last user.

    * The vtime_account_user() call was only used on irq entry for
    CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can remove that too.

    Signed-off-by: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1468421405-20056-4-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not
    currently appear to work correctly.

    On CPUs without nohz_full=, only tick-based irq time sampling is
    done, which breaks down when dealing with a nohz_idle CPU.

    On firewalls and similar systems, no ticks may happen on a CPU for a
    while, and the irq time spent may never get accounted properly. This
    can cause issues with capacity planning and power saving, which use
    the CPU statistics as inputs in decision making.

    Remove the VTIME_GEN vtime irq time code, and replace it with the
    IRQ_TIME_ACCOUNTING code, when selected as a config option by the user.

    Signed-off-by: Rik van Riel
    Signed-off-by: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: Thomas Gleixner
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1468421405-20056-3-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Rik van Riel
     
  • Currently, if there was any irq or softirq time during 'ticks'
    jiffies, the entire period will be accounted as irq or softirq
    time.

    This is inaccurate if only a subset of the time was actually spent
    handling irqs, and could conceivably mis-count all of the ticks during
    a period as irq time, when there was some irq and some softirq time.

    This can actually happen when irqtime_account_process_tick is called
    from account_idle_ticks, which can pass a larger number of ticks down
    all at once.

    Fix this by changing irqtime_account_hi_update(), irqtime_account_si_update(),
    and steal_account_process_ticks() to work with cputime_t time units, and
    return the amount of time spent in each mode.

    Rename steal_account_process_ticks() to steal_account_process_time(), to
    reflect that time is now accounted in cputime_t, instead of ticks.

    Additionally, have irqtime_account_process_tick() take into account how
    much time was spent in each of steal, irq, and softirq time.

    The latter could help improve the accuracy of cputime
    accounting when returning from idle on a NO_HZ_IDLE CPU.

    Properly accounting how much time was spent in hardirq and
    softirq time will also allow the NO_HZ_FULL code to re-use
    these same functions for hardirq and softirq accounting.
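    A minimal sketch of splitting one accounted period, assuming time is tracked
    as plain nanosecond counts (cputime_sketch_t) and that pending steal, hardirq
    and softirq time are charged in that order, each capped by what is left. The
    types and helpers are hypothetical, not the kernel's cputime code; the point
    is that only the time actually spent in each mode is charged there, and the
    remainder stays with normal accounting.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cputime_sketch_t; /* hypothetical: plain nanoseconds */

/* Pending time not yet accounted, per category (hypothetical state). */
struct pending_sketch {
	cputime_sketch_t steal, hardirq, softirq;
};

struct split_sketch {
	cputime_sketch_t steal, hardirq, softirq, rest;
};

/* Take at most 'max' from *pending and return the amount taken. */
static cputime_sketch_t take(cputime_sketch_t *pending, cputime_sketch_t max)
{
	cputime_sketch_t t = *pending < max ? *pending : max;

	*pending -= t;
	return t;
}

/* Split a period of 'total' time instead of charging all of it to irq or
 * softirq; only the remainder goes to the user/system/idle bucket. */
static struct split_sketch split_period(struct pending_sketch *p,
					cputime_sketch_t total)
{
	struct split_sketch s;
	cputime_sketch_t left = total;

	s.steal   = take(&p->steal, left);   left -= s.steal;
	s.hardirq = take(&p->hardirq, left); left -= s.hardirq;
	s.softirq = take(&p->softirq, left); left -= s.softirq;
	s.rest    = left;
	assert(s.steal + s.hardirq + s.softirq + s.rest == total);
	return s;
}
```

    Anything that does not fit in the current period stays pending for the next
    one rather than being dropped.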

    Signed-off-by: Rik van Riel
    [ Make nsecs_to_cputime64() actually return cputime64_t. ]
    Signed-off-by: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: Thomas Gleixner
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1468421405-20056-2-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Rik van Riel
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • …iio into staging-next

    Jonathan writes:

    Third set of IIO new device support, features and cleanups for the 4.8 cycle.

    New core features
    - Selection of the clock source for IIO timestamps. This is done per device
    as it makes little sense to have events in one timebase and data timestamped
    in another. The biggest reason for this is that we currently use a clock
    source which is non-monotonic, which can result in 'interesting' data sets.
    (Includes an export of get_monotonic_coarse64, which Thomas Gleixner didn't
    mind in an earlier version.)
    - MAINTAINERS add the git tree to the list for IIO.
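    As a userspace analogy only (the kernel side uses its own timekeeping
    helpers), the per-device choice can be sketched with POSIX clock_gettime():
    a hypothetical struct iio_dev_sketch holds a single clockid_t used for every
    timestamp of that device, so events and data share one timebase.
    CLOCK_MONOTONIC is not affected by discontinuous jumps in the system time,
    unlike CLOCK_REALTIME.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-device state: one clock shared by events and data. */
struct iio_dev_sketch {
	clockid_t clock_id; /* e.g. CLOCK_REALTIME (can jump) or CLOCK_MONOTONIC */
};

/* Every timestamp for this device, event or sample, comes from its clock. */
static int64_t iio_dev_timestamp_ns(const struct iio_dev_sketch *dev)
{
	struct timespec ts;

	if (clock_gettime(dev->clock_id, &ts) != 0)
		return -1;
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```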

    New device support + a kind of indirect staging graduation.
    * Broadcom iproc-static-adc
    - new driver
    * mcp4531
    - support for MCP454x, MCP456x, MCP464x and MCP466x potentiometers
    * mpu6050
    - support the ICM20608 6-axis motion tracking device
    * st-sensors
    - support the lis3l02dq + drop the lis3l02dq driver from staging.
    The general-purpose driver is missing event support, but it is good to get
    rid of this driver, which was rather long in the tooth.

    New driver features
    * ak8975
    - Add vid regulator support and refactor handling in general.
    - Allow a delay after enabling regulators.
    - Runtime and system PM.
    * bmg160
    - filter frequency control support.
    * bmp280
    - SPI device support.
    - EOC interrupt support for the BMP085
    - power management support.
    - supply regulator support.
    - reset gpio support
    - dt bindings for reset gpio and regulators.
    - of table to support device tree registration
    * max1363
    - Device tree bindings.
    * mcp4531
    - Device tree bindings.
    * st-pressure
    - temperature channels as part of triggered buffer (previously not due
    probably to alignment issues - see below).
    - lps22hb open drain interrupt support.
    - lps22hb temperature channel support

    Cleanups and reworkings.
    * numerous ADC drivers
    - ensure the iio_dev->dev.of_node is set to the parent dev.of_node so
    as to allow client bindings to find the device.
    * ak8975
    - Fix incorrect handling of missing regulator
    - make sure power is down on remove.
    * bmp280
    - read the calibration data only once as it doesn't change.
    * isl29125
    - Use a few macros to make code a touch more readable.
    * mma8452
    - fix a memory leak on error.
    - drop an unnecessary bit of return value handling.
    * potentiometer kconfig
    - typo fix.
    * st-pressure
    - drop some uninformative default assignments of elements of the channel
    array structure (aids readability).
    * st-sensors
    - Harden interrupt handling considerably. These are actually all using
    level interrupts, but at least two known boards have them wired to
    edge only interrupt chips. Hence a slightly interesting bit of handling
    is needed in which we first allow for the easy option (level triggered) and
    secondly check the status registers before re-enabling edge interrupts and
    fall back to a tight loop in the thread until we successfully clear the
    interrupt. No harm is done if we never succeed in doing so. It's an odd
    patch that has been through a lot of revisions to reach a consensus on how
    to handle what is basically broken hardware (which the previous defaults
    allowed to kind of work).
    - Fix alignment to defined storagebits boundaries.
    - Ensure alignment to power-of-2 byte boundaries. This has in theory always
    been part of the IIO ABI, but a few channels that need fixing slipped
    through. The effect was minor, as they were only followed by timestamp
    channels, which were correctly aligned.
    - Add some docs to explain the gain calculations.
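    A sketch of the alignment rule described above, assuming each channel
    occupies storagebits/8 bytes and must start at an offset that is a multiple
    of that size. scan_layout() and the final padding to the largest element are
    assumptions for illustration, not the IIO core code. The power-of-2 assertion
    shows why a storage size such as 24 bits cannot be aligned this way.

```c
#include <assert.h>
#include <stddef.h>

/* Round 'off' up to a multiple of 'align' (align must be a power of 2). */
static size_t align_up(size_t off, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (off + align - 1) & ~(align - 1);
}

/* Hypothetical layout helper: place each channel at an offset that is a
 * multiple of its own byte size and return the padded total scan size. */
static size_t scan_layout(const unsigned *storagebits, size_t n, size_t *offsets)
{
	size_t off = 0, largest = 1;

	for (size_t i = 0; i < n; i++) {
		size_t bytes = storagebits[i] / 8;

		off = align_up(off, bytes);
		offsets[i] = off;
		off += bytes;
		if (bytes > largest)
			largest = bytes;
	}
	return align_up(off, largest);
}
```

    For example, two 16-bit channels followed by a 64-bit timestamp place the
    timestamp at offset 8, not 4.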

    Greg Kroah-Hartman
     
  • …t.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull perf and timer fixes from Ingo Molnar:
    "A fix for a posix CPU timers bug, and a perf printk message fix"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86: Fix bogus kernel printk, again

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    posix_cpu_timer: Exit early when process has been reaped

    Linus Torvalds
     

13 Jul, 2016

2 commits

  • The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
    account that the function is now called from a thread running on the outgoing
    CPU. As a result, a CPU unplug leaks a load of 1 into the global load
    accounting mechanism.

    Fix it by adjusting for the currently running thread which calls
    calc_load_migrate().
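    A simplified sketch modelled loosely on the fix: the fold function takes an
    adjust count so that the thread performing the migration on the outgoing CPU
    is not counted as active load. struct rq_sketch and
    calc_load_fold_active_sketch() are hypothetical simplifications, not the
    scheduler's code.

```c
#include <assert.h>

/* Hypothetical, simplified run-queue counters. */
struct rq_sketch {
	long nr_running;
	long nr_uninterruptible;
	long calc_load_active; /* contribution already folded into the global count */
};

/* Return the change in this runqueue's contribution to the global load,
 * excluding 'adjust' tasks (e.g. the caller itself on the outgoing CPU). */
static long calc_load_fold_active_sketch(struct rq_sketch *rq, long adjust)
{
	long nr_active = rq->nr_running - adjust + rq->nr_uninterruptible;
	long delta = 0;

	assert(nr_active >= 0);
	if (nr_active != rq->calc_load_active) {
		delta = nr_active - rq->calc_load_active;
		rq->calc_load_active = nr_active;
	}
	return delta;
}
```

    With only the calling thread running, an adjust of 0 reports a spurious load
    of 1, while an adjust of 1 reports none.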

    Reported-by: Anton Blanchard
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Vaidyanathan Srinivasan
    Cc: rt@linutronix.de
    Cc: shreyas@linux.vnet.ibm.com
    Fixes: e9cd8fa4fcfd: ("sched/migration: Move calc_load_migrate() into CPU_DYING")
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607121744350.4083@nanos
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Xiaolong Ye reported lock debug warnings triggered by the following commit:

    8de4a0066106 ("perf/x86: Convert the core to the hotplug state machine")

    The bug is the following: the cpuhp_bp_states[] array is cut short when
    CONFIG_SMP=n, but the dynamically registered callbacks are stored nevertheless
    and happily scribble outside of the array bounds...

    We need to store them in case the state is unregistered, so we can invoke
    the teardown function. That's independent of CONFIG_SMP. Make sure the array
    is large enough.
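    A hypothetical sketch of the sizing rule: the step array is dimensioned by
    the full state enumeration regardless of SMP, so a dynamically registered
    state and its teardown callback always have a slot. The enum values,
    struct step_sketch and register_step() are illustrative assumptions, not the
    cpu/hotplug code; the bounds check is an extra safety net added for the
    example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state enumeration: static states, then a dynamic range. */
enum cpuhp_state_sketch {
	STATE_OFFLINE,
	STATE_SMP_ONLY_A,   /* only meaningful with SMP */
	STATE_SMP_ONLY_B,
	STATE_DYN_FIRST,    /* dynamically registered states live here */
	STATE_DYN_LAST = STATE_DYN_FIRST + 3,
	STATE_ONLINE,
};

struct step_sketch {
	const char *name;
	int (*teardown)(unsigned int cpu);
};

/* Sized by the whole enumeration, independent of any SMP configuration. */
static struct step_sketch steps[STATE_ONLINE + 1];

static int register_step(enum cpuhp_state_sketch st, const char *name,
			 int (*teardown)(unsigned int))
{
	if ((size_t)st >= sizeof(steps) / sizeof(steps[0]))
		return -1; /* refuse instead of scribbling past the end */
	steps[st].name = name;
	steps[st].teardown = teardown;
	return 0;
}
```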

    Reported-by: kernel test robot
    Signed-off-by: Thomas Gleixner
    Cc: Adam Borowski
    Cc: Alexander Shishkin
    Cc: Anna-Maria Gleixner
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: lkp@01.org
    Cc: stable@vger.kernel.org
    Cc: tipbuild@zytor.com
    Fixes: cff7d378d3fd "cpu/hotplug: Convert to a state machine for the control processor"
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607122144560.4083@nanos
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

11 Jul, 2016

1 commit

  • If an irq_domain is auto-recursive and irq_domain_alloc_irqs_recursive()
    for its parent has returned an error, then return immediately and avoid
    needlessly calling irq_domain_free_irqs_recursive(), because:
    - if domain->ops->alloc() had failed for an auto-recursive irq_domain,
    then irq_domain_free_irqs_recursive() had already been called;
    - if domain->ops->alloc() had failed for a not auto-recursive irq_domain,
    then there is nothing to free at all.
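    A simplified model of the control flow after this change, using hypothetical
    types (struct domain_sketch, with a flag standing in for an ops->alloc()
    failure and a counter for free calls): when the parent allocation fails, the
    function returns immediately, and the parent is freed only when this domain's
    own allocation fails after the parent succeeded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified irq_domain with an optional recursive parent. */
struct domain_sketch {
	struct domain_sketch *parent;
	int auto_recursive; /* allocate in the parent first */
	int fail_alloc;     /* force this domain's alloc to fail, for the demo */
	int allocated;
	int free_calls;
};

static void free_recursive(struct domain_sketch *d)
{
	d->free_calls++;
	d->allocated = 0;
	if (d->auto_recursive && d->parent)
		free_recursive(d->parent);
}

static int alloc_recursive(struct domain_sketch *d)
{
	if (d->auto_recursive) {
		int ret;

		assert(d->parent != NULL);
		ret = alloc_recursive(d->parent);
		if (ret < 0)
			return ret; /* nothing left to free: parent cleaned up itself */
	}
	if (d->fail_alloc) {
		if (d->auto_recursive)
			free_recursive(d->parent); /* undo the parent's allocation */
		return -1;
	}
	d->allocated = 1;
	return 0;
}
```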

    Signed-off-by: Alexander Popov
    Acked-by: Marc Zyngier
    Link: http://lkml.kernel.org/r/1467505448-2850-1-git-send-email-alex.popov@linux.com
    Signed-off-by: Thomas Gleixner

    Alexander Popov