27 Jul, 2016

1 commit

  • Currently, to charge a non-slab allocation to kmemcg one has to use
    alloc_kmem_pages helper with __GFP_ACCOUNT flag. A page allocated with
    this helper should finally be freed using free_kmem_pages, otherwise it
    won't be uncharged.

    This API suits its current users fine, but it turns out to be impossible
    to use along with page reference counting, i.e. when an allocation is
    supposed to be freed with put_page, as is the case with pipe or unix
    socket buffers.

    To overcome this limitation, this patch moves charging/uncharging to
    generic page allocator paths, i.e. to __alloc_pages_nodemask and
    free_pages_prepare, and zaps alloc/free_kmem_pages helpers. This way,
    one can use any of the available page allocation functions to get the
    allocated page charged to kmemcg - it's enough to pass __GFP_ACCOUNT,
    just like in case of kmalloc and friends. A charged page will be
    automatically uncharged on free.

    To make it possible, we need to mark pages charged to kmemcg somehow.
    To avoid introducing a new page flag, we make use of page->_mapcount for
    marking such pages. Since pages charged to kmemcg are not supposed to
    be mapped to userspace, it should work just fine. There are other
    (ab)users of page->_mapcount - buddy and balloon pages - but we don't
    conflict with them.

    If kmemcg is compiled out or not used at runtime, this patch
    introduces no overhead to generic page allocator paths. If kmemcg is
    used, it adds one gfp flags check on alloc and one
    page->_mapcount check on free, which shouldn't hurt performance, because
    the data accessed are hot.
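
    The charge-on-alloc, uncharge-on-free flow described above can be
    modelled in user space. This is only a sketch: struct page_model,
    model_alloc_page(), model_free_page(), GFP_ACCOUNT_FLAG and the marker
    value are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define GFP_ACCOUNT_FLAG 0x1u   /* hypothetical stand-in for __GFP_ACCOUNT */
#define MAPCOUNT_UNSET   (-1)   /* models a page that is not mapped */
#define MAPCOUNT_KMEMCG  (-512) /* hypothetical "charged to kmemcg" marker */

struct page_model {
	int mapcount;               /* models page->_mapcount */
};

static long charged_pages;      /* models the cgroup's kmem counter */

/* Allocate a page; charge it only when the caller asks for accounting. */
static struct page_model *model_alloc_page(unsigned int gfp)
{
	struct page_model *page = malloc(sizeof(*page));

	if (!page)
		return NULL;
	page->mapcount = MAPCOUNT_UNSET;
	if (gfp & GFP_ACCOUNT_FLAG) {          /* the one extra check on alloc */
		charged_pages++;
		page->mapcount = MAPCOUNT_KMEMCG;  /* remember that we charged it */
	}
	return page;
}

/* Free a page; the marker alone decides whether to uncharge. */
static void model_free_page(struct page_model *page)
{
	assert(page != NULL);
	if (page->mapcount == MAPCOUNT_KMEMCG) {  /* the one extra check on free */
		charged_pages--;
		page->mapcount = MAPCOUNT_UNSET;
	}
	free(page);
}
```

    Because the marker travels with the page, the free path needs no
    knowledge of which allocation helper produced the page, which is what
    lets a put_page-style release uncharge correctly.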

    Link: http://lkml.kernel.org/r/a9736d856f895bcb465d9f257b54efe32eda6f99.1464079538.git.vdavydov@virtuozzo.com
    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Eric Dumazet
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

26 Jul, 2016

8 commits

  • Pull irq updates from Thomas Gleixner:
    "The irq department delivers:

    - new core infrastructure to allow better management of multi-queue
    devices (interrupt spreading, node aware descriptor allocation ...)

    - a new interrupt flow handler to support the newfangled Intel VMD
    devices.

    - yet another new interrupt controller driver.

    - a series of fixes which addresses sparse warnings, missing
    includes, missing static declarations etc from Ben Dooks.

    - a fix for the error handling in the hierarchical domain allocation
    code.

    - the usual pile of small updates to core and driver code"

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    genirq: Fix missing irq allocation affinity hint
    irqdomain: Fix irq_domain_alloc_irqs_recursive() error handling
    irq/Documentation: Correct result of echnoing 5 to smp_affinity
    MAINTAINERS: Remove Jiang Liu from irq domains
    genirq/msi: Fix broken debug output
    genirq: Add a helper to spread an affinity mask for MSI/MSI-X vectors
    genirq/msi: Make use of affinity aware allocations
    genirq: Use affinity hint in irqdesc allocation
    genirq: Add affinity hint to irq allocation
    genirq: Introduce IRQD_AFFINITY_MANAGED flag
    genirq/msi: Remove unused MSI_FLAG_IDENTITY_MAP
    irqchip/s3c24xx: Fixup IO accessors for big endian
    irqchip/exynos-combiner: Fix usage of __raw IO
    irqdomain: Fix disposal of mappings for interrupt hierarchies
    irqchip/aspeed-vic: Add irq controller for Aspeed
    doc/devicetree: Add Aspeed VIC bindings
    x86/PCI/VMD: Use untracked irq handler
    genirq: Add untracked irq handler
    irqchip/mips-gic: Populate irq_domain names
    irqchip/gicv3-its: Implement two-level(indirect) device table support
    ...

    Linus Torvalds
     
  • Pull timer updates from Thomas Gleixner:
    "This update provides the following changes:

    - The rework of the timer wheel which addresses the shortcomings of
    the current wheel (cascading, slow search for next expiring timer,
    etc). That's the first major change of the wheel in almost 20
    years since Finn implemented it.

    - A large overhaul of the clocksource drivers init functions to
    consolidate the Device Tree initialization

    - Some more Y2038 updates

    - A capability fix for timerfd

    - Yet another clock chip driver

    - The usual pile of updates, comment improvements all over the place"
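
    As a rough illustration of what a non-cascading wheel means, the sketch
    below picks a level once, based on how far away the expiry is, and never
    moves the timer between levels afterwards. The constants, the depth and
    the helper names wheel_level() and wheel_bucket() are illustrative
    assumptions, not the values or functions used in kernel/time/timer.c.

```c
#include <assert.h>

#define LVL_BITS      6                   /* 64 buckets per level */
#define LVL_SIZE      (1UL << LVL_BITS)
#define LVL_CLK_SHIFT 3                   /* each level is 8x coarser */
#define LVL_DEPTH     4                   /* illustrative number of levels */

/* Pick the level whose range covers 'delta' (expiry minus now, in ticks). */
static unsigned int wheel_level(unsigned long delta)
{
	unsigned int lvl = 0;
	unsigned long span = LVL_SIZE;        /* range covered by level 0 */

	while (lvl < LVL_DEPTH - 1 && delta >= span) {
		span <<= LVL_CLK_SHIFT;           /* next level covers 8x more */
		lvl++;
	}
	return lvl;
}

/* Bucket index inside the whole wheel for an absolute expiry time. */
static unsigned int wheel_bucket(unsigned long expires, unsigned int lvl)
{
	unsigned int shift = lvl * LVL_CLK_SHIFT;
	unsigned long gran = 1UL << shift;    /* granularity of this level */

	assert(lvl < LVL_DEPTH);
	/* Round up so that a timer never fires before its expiry. */
	return (unsigned int)(lvl * LVL_SIZE +
			      (((expires + gran) >> shift) & (LVL_SIZE - 1)));
}
```

    The trade-off in this model is that far-away timers land in coarse
    buckets and may fire somewhat late, in exchange for never having to be
    re-sorted into finer levels as time advances.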

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
    tick/nohz: Optimize nohz idle enter
    clockevents: Make clockevents_subsys static
    clocksource/drivers/time-armada-370-xp: Fix return value check
    timers: Implement optimization for same expiry time in mod_timer()
    timers: Split out index calculation
    timers: Only wake softirq if necessary
    timers: Forward the wheel clock whenever possible
    timers/nohz: Remove pointless tick_nohz_kick_tick() function
    timers: Optimize collect_expired_timers() for NOHZ
    timers: Move __run_timers() function
    timers: Remove set_timer_slack() leftovers
    timers: Switch to a non-cascading wheel
    timers: Reduce the CPU index space to 256k
    timers: Give a few structs and members proper names
    hlist: Add hlist_is_singular_node() helper
    signals: Use hrtimer for sigtimedwait()
    timers: Remove the deprecated mod_timer_pinned() API
    timers, net/ipv4/inet: Initialize connection request timers as pinned
    timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned
    timers, drivers/tty/metag_da: Initialize the poll timer as pinned
    ...

    Linus Torvalds
     
  • Pull x86 boot updates from Ingo Molnar:
    "The main changes:

    - add initial commits to randomize kernel memory section virtual
    addresses, enabled via a new kernel option: RANDOMIZE_MEMORY
    (Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu)

    - enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees
    Cook)

    - EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar)

    - misc cleanups/fixes"

    * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/boot: Simplify EBDA-vs-BIOS reservation logic
    x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
    x86/boot: Reorganize and clean up the BIOS area reservation code
    x86/mm: Do not reference phys addr beyond kernel
    x86/mm: Add memory hotplug support for KASLR memory randomization
    x86/mm: Enable KASLR for vmalloc memory regions
    x86/mm: Enable KASLR for physical mapping memory regions
    x86/mm: Implement ASLR for kernel memory regions
    x86/mm: Separate variable for trampoline PGD
    x86/mm: Add PUD VA support for physical mapping
    x86/mm: Update physical mapping variable names
    x86/mm: Refactor KASLR entropy functions
    x86/KASLR: Fix boot crash with certain memory configurations
    x86/boot/64: Add forgotten end of function marker
    x86/KASLR: Allow randomization below the load address
    x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
    x86/KASLR: Randomize virtual address separately
    x86/KASLR: Clarify identity map interface
    x86/boot: Refuse to build with data relocations
    x86/KASLR, x86/power: Remove x86 hibernation restrictions

    Linus Torvalds
     
  • Pull NOHZ updates from Ingo Molnar:

    - fix system/idle cputime leaked on cputime accounting (all nohz
    configs) (Rik van Riel)

    - remove the messy, ad-hoc irqtime accounting on nohz-full and make it
    compatible with CONFIG_IRQ_TIME_ACCOUNTING=y instead (Rik van Riel)

    - cleanups (Frederic Weisbecker)

    - remove unnecessary irq disablement in the irqtime code (Rik van Riel)

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/cputime: Drop local_irq_save/restore from irqtime_account_irq()
    sched/cputime: Reorganize vtime native irqtime accounting headers
    sched/cputime: Clean up the old vtime gen irqtime accounting completely
    sched/cputime: Replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code
    sched/cputime: Count actually elapsed irq & softirq time

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:

    - introduce and use task_rcu_dereference()/try_get_task_struct() to fix
    and generalize task_struct handling (Oleg Nesterov)

    - do various per entity load tracking (PELT) fixes and optimizations
    (Peter Zijlstra)

    - cputime virt-steal time accounting enhancements/fixes (Wanpeng Li)

    - introduce consolidated cputime output file cpuacct.usage_all and
    related refactorings (Zhao Lei)

    - ... plus misc fixes and enhancements

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/core: Panic on scheduling while atomic bugs if kernel.panic_on_warn is set
    sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together
    sched/cpuacct: Use loop to consolidate code in cpuacct_stats_show()
    sched/cpuacct: Merge cpuacct_usage_index and cpuacct_stat_index enums
    sched/fair: Rework throttle_count sync
    sched/core: Fix sched_getaffinity() return value kerneldoc comment
    sched/fair: Reorder cgroup creation code
    sched/fair: Apply more PELT fixes
    sched/fair: Fix PELT integrity for new tasks
    sched/cgroup: Fix cpu_cgroup_fork() handling
    sched/fair: Fix PELT integrity for new groups
    sched/fair: Fix and optimize the fork() path
    sched/cputime: Add steal time support to full dynticks CPU time accounting
    sched/cputime: Fix prev steal time accouting during CPU hotplug
    KVM: Fix steal clock warp during guest CPU hotplug
    sched/debug: Always show 'nr_migrations'
    sched/fair: Use task_rcu_dereference()
    sched/api: Introduce task_rcu_dereference() and try_get_task_struct()
    sched/idle: Optimize the generic idle loop
    sched/fair: Fix the wrong throttled clock time for cfs_rq_clock_task()

    Linus Torvalds
     
  • Pull perf updates from Ingo Molnar:
    "With over 300 commits it's been a busy cycle - with most of the work
    concentrated on the tooling side (as it should).

    The main kernel side enhancements were:

    - Add per event callchain limit: Recently we introduced a sysctl to
    tune the max-stack for all events for which callchains were
    requested:

    $ sysctl kernel.perf_event_max_stack
    kernel.perf_event_max_stack = 127

    Now this patch introduces a way to configure this per event, i.e.
    this becomes possible:

    $ perf record -e sched:*/max-stack=2/ -e block:*/max-stack=10/ -a

    allowing finer tuning of how much buffer space callchains use.

    This uses a u16 from the reserved space at the end, leaving
    another u16 for future use.

    There has been interest in even finer tuning, namely to control the
    max stack for kernel and userspace callchains separately. Further
    discussion is needed, we may for instance use the remaining u16 for
    that and when it is present, assume that the sample_max_stack
    introduced in this patch applies for the kernel, and the u16 left
    is used for limiting the userspace callchain (Arnaldo Carvalho de
    Melo)

    - Optimize AUX event (hardware assisted side-band event) delivery
    (Kan Liang)

    - Rework Intel family name macro usage (this is partially x86 arch
    work) (Dave Hansen)

    - Refine and fix Intel LBR support (David Carrillo-Cisneros)

    - Add support for Intel 'TopDown' events (Andi Kleen)

    - Intel uncore PMU driver fixes and enhancements (Kan Liang)

    - ... other misc changes.

    Here's an incomplete list of the tooling enhancements (but there's
    much more, see the shortlog and the git log for details):

    - Support cross unwinding, i.e. collecting '--call-graph dwarf'
    perf.data files in one machine and then doing analysis in another
    machine of a different hardware architecture. This enables, for
    instance, to do:

    $ perf record -a --call-graph dwarf

    on an x86-32 or aarch64 system and then do 'perf report' on it on an
    x86_64 workstation (He Kuang)

    - Allow reading from a backward ring buffer (one setup via
    sys_perf_event_open() with perf_event_attr.write_backward = 1)
    (Wang Nan)

    - Finish merging initial SDT (Statically Defined Traces) support, see
    cset comments for details about how it all works (Masami Hiramatsu)

    - Support attaching eBPF programs to tracepoints (Wang Nan)

    - Add demangling of symbols in programs written in the Rust language
    (David Tolnay)

    - Add support for tracepoints in the python binding, including an
    example, that sets up and parses sched:sched_switch events,
    tools/perf/python/tracepoint.py (Jiri Olsa)

    - Introduce --stdio-color to set up the color output mode selection
    in 'annotate' and 'report', allowing them to emit color escape
    sequences when redirecting the output of these tools (Arnaldo
    Carvalho de Melo)

    - Add 'callindent' option to 'perf script -F', to indent the Intel PT
    call stack, making this output more ftrace-like (Adrian Hunter,
    Andi Kleen)

    - Allow dumping the object files generated by llvm when processing
    eBPF scriptlet events (Wang Nan)

    - Add stackcollapse.py script to help generate flame graphs (Paolo
    Bonzini)

    - Add --ldlat option to 'perf mem' to specify the load latency for the
    loads event (e.g. cpu/mem-loads/) (Jiri Olsa)

    - Tooling support for Intel TopDown counters, recently added to the
    kernel (Andi Kleen)"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (303 commits)
    perf tests: Add is_printable_array test
    perf tools: Make is_printable_array global
    perf script python: Fix string vs byte array resolving
    perf probe: Warn unmatched function filter correctly
    perf cpu_map: Add more helpers
    perf stat: Balance opening and reading events
    tools: Copy linux/{hash,poison}.h and check for drift
    perf tools: Remove include/linux/list.h from perf's MANIFEST
    tools: Copy the bitops files accessed from the kernel and check for drift
    Remove: kernel unistd*h files from perf's MANIFEST, not used
    perf tools: Remove tools/perf/util/include/linux/const.h
    perf tools: Remove tools/perf/util/include/asm/byteorder.h
    perf tools: Add missing linux/compiler.h include to perf-sys.h
    perf jit: Remove some no-op error handling
    perf jit: Add missing curly braces
    objtool: Initialize variable to silence old compiler
    objtool: Add -I$(srctree)/tools/arch/$(ARCH)/include/uapi
    perf record: Add --tail-synthesize option
    perf session: Don't warn about out of order event if write_backward is used
    perf tools: Enable overwrite settings
    ...

    Linus Torvalds
     
  • Pull locking updates from Ingo Molnar:
    "The locking tree was busier in this cycle than the usual pattern - a
    couple of major projects happened to coincide.

    The main changes are:

    - implement the atomic_fetch_{add,sub,and,or,xor}() API natively
    across all SMP architectures (Peter Zijlstra)

    - add atomic_fetch_{inc/dec}() as well, using the generic primitives
    (Davidlohr Bueso)

    - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
    Waiman Long)

    - optimize smp_cond_load_acquire() on arm64 and implement LSE based
    atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    on arm64 (Will Deacon)

    - introduce smp_acquire__after_ctrl_dep() and fix various barrier
    mis-uses and bugs (Peter Zijlstra)

    - after discovering ancient spin_unlock_wait() barrier bugs in its
    implementation and usage, strengthen its semantics and update/fix
    usage sites (Peter Zijlstra)

    - optimize mutex_trylock() fastpath (Peter Zijlstra)

    - ... misc fixes and cleanups"
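
    The difference between a fetch-style operation and a return-style one
    can be shown with a user-space analogue built on C11 <stdatomic.h>. This
    is a sketch, not the kernel API: model_fetch_add(), model_add_return()
    and model_fetch_inc() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Fetch form: returns the value as it was *before* the addition. */
static int model_fetch_add(atomic_int *v, int i)
{
	assert(v != NULL);
	return atomic_fetch_add(v, i);
}

/* Return form: the value *after* the addition, derived from the fetch form. */
static int model_add_return(atomic_int *v, int i)
{
	assert(v != NULL);
	return atomic_fetch_add(v, i) + i;
}

/* An inc variant expressed through the generic fetch_add primitive. */
static int model_fetch_inc(atomic_int *v)
{
	return model_fetch_add(v, 1);
}
```

    The fetch form is the more general of the two for operations such as
    and/or, where the old value cannot be reconstructed from the new one.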

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
    locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
    locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
    locking/static_keys: Fix non static symbol Sparse warning
    locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
    locking/atomic, arch/tile: Fix tilepro build
    locking/atomic, arch/m68k: Remove comment
    locking/atomic, arch/arc: Fix build
    locking/Documentation: Clarify limited control-dependency scope
    locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
    locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
    locking/atomic, arch/mips: Convert to _relaxed atomics
    locking/atomic, arch/alpha: Convert to _relaxed atomics
    locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
    locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
    locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    locking/atomic: Fix atomic64_relaxed() bits
    locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle were:

    - documentation updates

    - miscellaneous fixes

    - minor reorganization of code

    - torture-test updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
    rcu: Correctly handle sparse possible cpus
    rcu: sysctl: Panic on RCU Stall
    rcu: Fix a typo in a comment
    rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
    rcu: Disable TASKS_RCU for usermode Linux
    rcu: No ordering for rcu_assign_pointer() of NULL
    rcutorture: Fix error return code in rcu_perf_init()
    torture: Inflict default jitter
    rcuperf: Don't treat gp_exp mis-setting as a WARN
    rcutorture: Drop "-soundhw pcspkr" from x86 boot arguments
    rcutorture: Don't specify the cpu type of QEMU on PPC
    rcutorture: Make -soundhw a x86 specific option
    rcutorture: Use vmlinux as the fallback kernel image
    rcutorture/doc: Create initrd using dracut
    torture: Stop onoff task if there is only one cpu
    torture: Add starvation events to error summary
    torture: Break online and offline functions out of torture_onoff()
    torture: Forgive lengthy trace dumps and preemption
    torture: Remove CONFIG_RCU_TORTURE_TEST_RUNNABLE, simplify code
    torture: Simplify code, eliminate RCU_PERF_TEST_RUNNABLE
    ...

    Linus Torvalds
     

25 Jul, 2016

1 commit

  • Pull staging and IIO driver updates from Greg KH:
    "Here is the big Staging and IIO driver update for 4.8-rc1.

    We ended up adding more code than removing, again, but it's not all
    that bad. Lots of cleanups all over the staging tree, and new IIO
    drivers, full details in the shortlog.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'staging-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (417 commits)
    drivers:iio:accel:mma8452: removed unwanted return statements
    drivers:iio:accel:mma8452: added cleanup provision in case of failure.
    iio: Add iio.git tree to MAINTAINERS
    iio:st_pressure: clean useless static channel initializers
    iio:st_pressure:lps22hb: temperature support
    iio:st_pressure:lps22hb: open drain support
    iio:st_pressure: temperature triggered buffering
    iio:st_pressure: document sampling gains
    iio:st_pressure: align storagebits on power of 2
    iio:st_sensors: align on storagebits boundaries
    staging:iio:lis3l02dq drop separate driver
    iio: accel: st_accel: Add lis3l02dq support
    iio: adc: add missing of_node references to iio_dev
    iio: adc: ti-ads1015: add indio_dev->dev.of_node reference
    iio: potentiometer: Fix typo in Kconfig
    iio: potentiometer: mcp4531: Add device tree binding
    iio: potentiometer: mcp4531: Add device tree binding documentation
    iio: potentiometer: mcp4531: Add support for MCP454x, MCP456x, MCP464x and MCP466x
    iio:imu:mpu6050: icm20608 initial support
    iio: adc: max1363: Add device tree binding
    ...

    Linus Torvalds
     

19 Jul, 2016

3 commits

  • tick_nohz_start_idle() is called before checking whether the idle tick can be
    stopped. If the tick cannot be stopped, calling tick_nohz_start_idle() is
    pointless and just wastes CPU cycles.

    Only invoke tick_nohz_start_idle() when can_stop_idle_tick() returns true. A
    short one-minute observation of the effect on ARM64 shows a reduction of calls
    by 1.5%, thus optimizing the idle entry sequence.
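
    The reordering can be pictured with a small user-space model that only
    counts how often the bookkeeping runs. Everything here is a hypothetical
    stand-in: model_start_idle() replaces tick_nohz_start_idle(), and the
    boolean argument replaces the result of can_stop_idle_tick().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static unsigned long start_idle_calls;   /* how often the bookkeeping ran */

/* Hypothetical stand-in for tick_nohz_start_idle(): only counts calls. */
static void model_start_idle(void)
{
	start_idle_calls++;
}

/* Old ordering: bookkeeping first, then the "can we stop?" decision. */
static bool idle_enter_old(bool can_stop)
{
	model_start_idle();
	return can_stop;          /* true means the tick would be stopped */
}

/* New ordering: bookkeeping only once we know the tick can be stopped. */
static bool idle_enter_new(bool can_stop)
{
	if (!can_stop)
		return false;
	model_start_idle();
	return true;
}

/* Replay a sequence of decisions and return the bookkeeping call count. */
static unsigned long count_calls(bool (*enter)(bool), const bool *seq, size_t n)
{
	assert(enter != NULL && seq != NULL);
	start_idle_calls = 0;
	for (size_t i = 0; i < n; i++)
		enter(seq[i]);
	return start_idle_calls;
}
```

    In this model the saving equals the number of idle entries for which
    the tick cannot be stopped.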

    [tglx: Massaged changelog ]

    Co-developed-by: Sanjeev Yadav
    Signed-off-by: Gaurav Jindal
    Link: http://lkml.kernel.org/r/20160714120416.GB21099@gaurav.jindal@spreadtrum.com
    Signed-off-by: Thomas Gleixner

    Gaurav Jindal
     
  • The new affinity hint argument of __irq_domain_alloc_irqs() is missing in
    irq_reserve_ipi(). Add it.

    This fixes the following compilation error:

    kernel/irq/ipi.c: In function ‘irq_reserve_ipi’:
    kernel/irq/ipi.c:85:9: error: too few arguments to function ‘__irq_domain_alloc_irqs’
    virq = __irq_domain_alloc_irqs(domain, virq, nr_irqs, NUMA_NO_NODE,
    ^
    Fixes: 06ee6d571f0e ("genirq: Add affinity hint to irq allocation")
    Signed-off-by: Vincent Stehlé
    Cc: linux-pci@vger.kernel.org
    Cc: Christoph Hellwig
    Signed-off-by: Thomas Gleixner

    Vincent Stehle
     
  • The clockevents_subsys struct is used for sysfs support and
    is not declared or used outside the file it is defined in.
    Fix the following warning by making it static:

    kernel/time/clockevents.c:648:17: warning: symbol 'clockevents_subsys' was not declared. Should it be static?

    Signed-off-by: Ben Dooks
    Cc: linux-kernel@lists.codethink.co.uk
    Link: http://lkml.kernel.org/r/1466178974-7105-1-git-send-email-ben.dooks@codethink.co.uk
    Signed-off-by: Thomas Gleixner

    Ben Dooks
     

16 Jul, 2016

1 commit


15 Jul, 2016

3 commits

  • Merge misc fixes from Andrew Morton:
    "20 fixes"

    * emailed patches from Andrew Morton:
    m32r: fix build warning about putc
    mm: workingset: printk missing log level, use pr_info()
    mm: thp: refix false positive BUG in page_move_anon_rmap()
    mm: rmap: call page_check_address() with sync enabled to avoid racy check
    mm: thp: move pmd check inside ptl for freeze_page()
    vmlinux.lds: account for destructor sections
    gcov: add support for gcc version >= 6
    mm, meminit: ensure node is online before checking whether pages are uninitialised
    mm, meminit: always return a valid node from early_pfn_to_nid
    kasan/quarantine: fix bugs on qlist_move_cache()
    uapi: export lirc.h header
    madvise_free, thp: fix madvise_free_huge_pmd return value after splitting
    Revert "scripts/gdb: add documentation example for radix tree"
    Revert "scripts/gdb: add a Radix Tree Parser"
    scripts/gdb: Perform path expansion to lx-symbol's arguments
    scripts/gdb: add constants.py to .gitignore
    scripts/gdb: rebuild constants.py on dependancy change
    scripts/gdb: silence 'nothing to do' message
    kasan: add newline to messages
    mm, compaction: prevent VM_BUG_ON when terminating freeing scanner

    Linus Torvalds
     
  • Pull scheduler fix from Ingo Molnar:
    "Fix a CPU hotplug related corruption of the load average that got
    introduced in this merge window"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/core: Correct off by one bug in load migration calculation

    Linus Torvalds
     
  • Link: http://lkml.kernel.org/r/20160701130914.GA23225@styxhp
    Signed-off-by: Florian Meier
    Reviewed-by: Peter Oberparleiter
    Tested-by: Peter Oberparleiter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Florian Meier
     

14 Jul, 2016

7 commits

  • Paolo pointed out that irqs are already blocked when irqtime_account_irq()
    is called. That means there is no reason to call local_irq_save/restore()
    again.

    Suggested-by: Paolo Bonzini
    Signed-off-by: Rik van Riel
    Signed-off-by: Frederic Weisbecker
    Reviewed-by: Paolo Bonzini
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: Thomas Gleixner
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1468421405-20056-6-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Rik van Riel
     
  • Vtime generic irqtime accounting has been removed but there are a few
    remnants to clean up:

    * The vtime_accounting_cpu_enabled() check in irq entry was only used
    by CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can safely remove it.

    * Without the vtime_accounting_cpu_enabled(), we no longer need to
    have a vtime_common_account_irq_enter() indirect function.

    * Move vtime_account_irq_enter() implementation under
    CONFIG_VIRT_CPU_ACCOUNTING_NATIVE which is the last user.

    * The vtime_account_user() call was only used on irq entry for
    CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can remove that too.

    Signed-off-by: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1468421405-20056-4-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not
    appear to currently work right.

    On CPUs without nohz_full=, only tick based irq time sampling is
    done, which breaks down when dealing with a nohz_idle CPU.

    On firewalls and similar systems, no ticks may happen on a CPU for a
    while, and the irq time spent may never get accounted properly. This
    can cause issues with capacity planning and power saving, which use
    the CPU statistics as inputs in decision making.

    Remove the VTIME_GEN vtime irq time code, and replace it with the
    IRQ_TIME_ACCOUNTING code, when selected as a config option by the user.

    Signed-off-by: Rik van Riel
    Signed-off-by: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: Thomas Gleixner
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1468421405-20056-3-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Rik van Riel
     
  • Currently, if there was any irq or softirq time during 'ticks'
    jiffies, the entire period will be accounted as irq or softirq
    time.

    This is inaccurate if only a subset of the time was actually spent
    handling irqs, and could conceivably mis-count all of the ticks during
    a period as irq time, when there was some irq and some softirq time.

    This can actually happen when irqtime_account_process_tick is called
    from account_idle_ticks, which can pass a larger number of ticks down
    all at once.

    Fix this by changing irqtime_account_hi_update(), irqtime_account_si_update(),
    and steal_account_process_ticks() to work with cputime_t time units, and
    return the amount of time spent in each mode.

    Rename steal_account_process_ticks() to steal_account_process_time(), to
    reflect that time is now accounted in cputime_t, instead of ticks.

    Additionally, have irqtime_account_process_tick() take into account how
    much time was spent in each of steal, irq, and softirq time.

    The latter could help improve the accuracy of cputime
    accounting when returning from idle on a NO_HZ_IDLE CPU.

    Properly accounting how much time was spent in hardirq and
    softirq time will also allow the NO_HZ_FULL code to re-use
    these same functions for hardirq and softirq accounting.
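
    The idea of accounting only the time actually spent in each mode,
    capped by the length of the period, can be sketched in user space as
    follows. The types and helpers (cputime_model_t, take(),
    account_period()) are hypothetical and only model the approach; they are
    not the kernel functions named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cputime_model_t;   /* stand-in for cputime_t */

struct pending_time {               /* measured but not yet accounted */
	cputime_model_t steal, irq, softirq;
};

struct accounted_time {             /* how one period was split up */
	cputime_model_t steal, irq, softirq, rest;
};

/* Take at most 'max' out of '*pending' and return the amount taken. */
static cputime_model_t take(cputime_model_t *pending, cputime_model_t max)
{
	cputime_model_t t = *pending < max ? *pending : max;

	*pending -= t;                  /* the remainder stays for later */
	return t;
}

/* Split 'period' into steal, irq, softirq and whatever is left over. */
static struct accounted_time account_period(struct pending_time *p,
					    cputime_model_t period)
{
	struct accounted_time a;
	cputime_model_t left = period;

	assert(p != NULL);
	a.steal = take(&p->steal, left);
	left -= a.steal;
	a.irq = take(&p->irq, left);
	left -= a.irq;
	a.softirq = take(&p->softirq, left);
	left -= a.softirq;
	a.rest = left;                  /* goes to user/system/idle time */
	return a;
}
```

    With 3 units of irq and 2 units of softirq time pending over a 10 unit
    period, this model attributes 3 and 2 units respectively and leaves 5,
    rather than booking the whole period as irq time.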

    Signed-off-by: Rik van Riel
    [ Make nsecs_to_cputime64() actually return cputime64_t. ]
    Signed-off-by: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: Thomas Gleixner
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1468421405-20056-2-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Rik van Riel
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • …iio into staging-next

    Jonathan writes:

    Third set of IIO new device support, features and cleanups for the 4.8 cycle.

    New core features
    - Selection of the clock source for IIO timestamps. This is done per device
    as it makes little sense to have events in one timebase and data timestamped
    on another. Biggest reason for this is that we currently use a clock
    source which is non-monotonic which can result in 'interesting' data sets.
    (Includes export for get_monotonic_coarse64 which Thomas Gleixner didn't mind
    in an earlier version.)
    - MAINTAINERS add the git tree to the list for IIO.

    New device support + a kind of indirect staging graduation.
    * Broadcom iproc-static-adc
    - new driver
    * mcp4531
    - support for MCP454x, MCP456x, MCP464x and MCP466x potentiometers
    * mpu6050
    - support the ICM20608 6 axis motion tracking device
    * st-sensors
    - support the lis3l02dq + drop the lis3l02dq driver from staging.
    The general purpose driver is missing event support, but good to get
    rid of this driver which was rather long in the tooth.

    New driver features
    * ak8975
    - Add vid regulator support and refactor handling in general.
    - Allow a delay after enabling regulators.
    - Runtime and system PM.
    * bmg160
    - filter frequency control support.
    * bmp280
    - SPI device support.
    - EOC interrupt support for the BMP085
    - power management support.
    - supply regulator support.
    - reset gpio support
    - dt bindings for reset gpio and regulators.
    - of table to support device tree registration
    * max1363
    - Device tree bindings.
    * mcp4531
    - Device tree bindings.
    * st-pressure
    - temperature channels as part of triggered buffer (previously not due
    probably to alignment issues - see below).
    - lps22hb open drain interrupt support.
    - lps22hb temperature channel support

    Cleanups and reworkings.
    * numerous ADC drivers
    - ensure the iio_dev->dev.of_node is set to the parent dev.of_node so
    as to allow client bindings to find the device.
    * ak8975
    - Fix incorrect handling of missing regulator
    - make sure power is down on remove.
    * bmp280
    - read the calibration data only once as it doesn't change.
    * isl29125
    - Use a few macros to make code a touch more readable.
    * mma8452
    - fix a memory leak on error.
    - drop an unnecessary bit of return value handling.
    * potentiometer kconfig
    - typo fix.
    * st-pressure
    - drop some uninformative default assignments of elements of the channel
    array structure (aids readability).
    * st-sensors
    - Harden interrupt handling considerably. These are actually all using
    level interrupts, but at least two known boards have them wired to
    edge only interrupt chips. Hence a slightly interesting bit of handling
    is needed in which we first allow for the easy option (level triggered) and
    secondly check the status registers before reenabling edge interrupts and
    fall back to a tight loop in the thread until we successfully clear the
    interrupt. No harm is done if we never succeed in doing so. It's an odd
    patch that has been through a lot of revisions to reach a consensus on how
    to handle what is basically broken hardware (which the previous defaults
    allowed to kind of work).
    - Fix alignment to defined storagebytes boundaries.
    - Ensure alignment to power-of-2 byte boundaries. This has always in theory
    been part of the ABI of IIO, but we missed a few that snuck in that need
    fixing. The effect was minor as they were only followed by timestamp
    channels which were correctly aligned.
    - Add some docs to explain the gain calculations.

    Greg Kroah-Hartman
     
  • …t.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull perf and timer fixes from Ingo Molnar:
    "A fix for a posix CPU timers bug, and a perf printk message fix"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86: Fix bogus kernel printk, again

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    posix_cpu_timer: Exit early when process has been reaped

    Linus Torvalds
     

13 Jul, 2016

2 commits

  • The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
    account that the function is now called from a thread running on the outgoing
    CPU. As a result a cpu unplug leaks a load of 1 into the global load
    accounting mechanism.

    Fix it by adjusting for the currently running thread which calls
    calc_load_migrate().
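
The adjustment can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `struct rq_model`, its fields and `fold_active()` are hypothetical names, and the `adjust` argument stands for the number of running tasks to ignore (here, the thread doing the migration).

```c
#include <assert.h>

/* Hypothetical, simplified model of one CPU's load bookkeeping. */
struct rq_model {
    long nr_running;          /* runnable tasks, including the current one */
    long nr_uninterruptible;  /* tasks in uninterruptible sleep */
    long calc_load_active;    /* what this CPU last contributed globally */
};

/*
 * Return the delta that would be folded into the global count.
 * `adjust` is the number of running tasks that must not be counted,
 * e.g. the thread that is executing this very call.
 */
long fold_active(struct rq_model *rq, long adjust)
{
    long nr_active, delta = 0;

    assert(adjust >= 0 && adjust <= rq->nr_running);

    nr_active = rq->nr_running - adjust + rq->nr_uninterruptible;
    if (nr_active != rq->calc_load_active) {
        delta = nr_active - rq->calc_load_active;
        rq->calc_load_active = nr_active;
    }
    return delta;
}
```

In this model, an outgoing CPU whose only runnable task is the caller yields a delta of 1 with `adjust == 0` and a delta of 0 with `adjust == 1`.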

    Reported-by: Anton Blanchard
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Vaidyanathan Srinivasan
    Cc: rt@linutronix.de
    Cc: shreyas@linux.vnet.ibm.com
    Fixes: e9cd8fa4fcfd: ("sched/migration: Move calc_load_migrate() into CPU_DYING")
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607121744350.4083@nanos
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Xiaolong Ye reported lock debug warnings triggered by the following commit:

    8de4a0066106 ("perf/x86: Convert the core to the hotplug state machine")

    The bug is the following: the cpuhp_bp_states[] array is cut short when
    CONFIG_SMP=n, but the dynamically registered callbacks are stored nevertheless
    and happily scribble outside of the array bounds...

    We need to store them in case that the state is unregistered so we can invoke
    the teardown function. That's independent of CONFIG_SMP. Make sure the array
    is large enough.
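
One way to picture the problem and a possible remedy is the sketch below. It is an illustration only: the enum, the table and `model_register()` are hypothetical stand-ins, and `MODEL_SMP` is an invented macro playing the role of the config option. It relies on the C rule that an array declared with `[]` is sized by its highest designated initializer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state list; the kernel's real enum has many more entries. */
enum model_state {
    ST_OFFLINE,
    ST_BRINGUP_CPU,             /* only initialized on "SMP" builds here */
    ST_DYN,                     /* first dynamically registered slot */
    ST_DYN_END = ST_DYN + 3,    /* last dynamically registered slot */
    ST_ONLINE,                  /* final state */
};

struct model_step {
    const char *name;
    int (*teardown)(unsigned int cpu);
};

static struct model_step model_steps[] = {
    [ST_OFFLINE] = { .name = "offline" },
#ifdef MODEL_SMP
    [ST_BRINGUP_CPU] = { .name = "cpu:bringup" },
#endif
    /* Always present, so the array spans every state in both builds. */
    [ST_ONLINE] = { .name = "online" },
};

#define MODEL_NR_STEPS (sizeof(model_steps) / sizeof(model_steps[0]))

/* Store a dynamically registered teardown callback; -1 if out of range. */
int model_register(enum model_state st, int (*teardown)(unsigned int cpu))
{
    if (st < ST_DYN || st > ST_DYN_END)
        return -1;
    assert((size_t)st < MODEL_NR_STEPS);
    model_steps[st].teardown = teardown;
    return 0;
}
```

Without the unconditional last entry, a build that compiles out the middle entries would end the array early and the store in `model_register()` would land outside it.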

    Reported-by: kernel test robot
    Signed-off-by: Thomas Gleixner
    Cc: Adam Borowski
    Cc: Alexander Shishkin
    Cc: Anna-Maria Gleixner
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: lkp@01.org
    Cc: stable@vger.kernel.org
    Cc: tipbuild@zytor.com
    Fixes: cff7d378d3fd "cpu/hotplug: Convert to a state machine for the control processor"
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607122144560.4083@nanos
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

11 Jul, 2016

4 commits

  • If an irq_domain is auto-recursive and irq_domain_alloc_irqs_recursive()
    for its parent has returned an error, then do return and avoid calling
    irq_domain_free_irqs_recursive() uselessly, because:
    - if domain->ops->alloc() had failed for an auto-recursive irq_domain,
    then irq_domain_free_irqs_recursive() had already been called;
    - if domain->ops->alloc() had failed for a not auto-recursive irq_domain,
    then there is nothing to free at all.
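
The control flow described above can be sketched with a toy model. All names here (`struct domain_model`, `alloc_recursive()`, `free_recursive()`) are hypothetical, and each level's allocation outcome is simply a preset field; the point is only to show where the early return sits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one level of a domain hierarchy. */
struct domain_model {
    struct domain_model *parent;
    int auto_recursive;   /* non-zero: the core walks to the parent itself */
    int alloc_result;     /* preset result of this level's own alloc step */
    int allocated;
    int free_calls;       /* how often this level was asked to free */
};

/* Free this level, then the parent chain if the level is auto-recursive. */
static void free_recursive(struct domain_model *d)
{
    while (d) {
        d->free_calls++;
        d->allocated = 0;
        d = d->auto_recursive ? d->parent : NULL;
    }
}

int alloc_recursive(struct domain_model *d)
{
    int recursive, ret = 0;

    assert(d != NULL);
    recursive = d->auto_recursive && d->parent != NULL;

    if (recursive)
        ret = alloc_recursive(d->parent);
    if (ret < 0)
        return ret;   /* parent failed: it cleaned up or had nothing to free */

    ret = d->alloc_result;
    if (ret < 0) {
        /* Only this level failed, so undo what the parent chain set up. */
        if (recursive)
            free_recursive(d->parent);
        return ret;
    }
    d->allocated = 1;
    return 0;
}
```

With a failing parent, no free is issued at all in this model; with a failing child, the parent is freed exactly once.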

    Signed-off-by: Alexander Popov
    Acked-by: Marc Zyngier
    Link: http://lkml.kernel.org/r/1467505448-2850-1-git-send-email-alex.popov@linux.com
    Signed-off-by: Thomas Gleixner

    Alexander Popov
     
  • Variable "now" seems to be genuinely used uninitialized
    if branch

    if (CPUCLOCK_PERTHREAD(timer->it_clock)) {

    is not taken and branch

    if (unlikely(sighand == NULL)) {

    is taken. In this case the process has been reaped and the timer is marked as
    disarmed anyway. So none of the postprocessing of the sample is
    required. Return right away.

    Signed-off-by: Alexey Dobriyan
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160707223911.GA26483@p183.telecom.by
    Signed-off-by: Thomas Gleixner

    Alexey Dobriyan
     
  • This reverts commit 2c95afc1e83d93fac3be6923465e1753c2c53b0a.

    Stephane reported the following regression:

    > Since Andi added:
    >
    > commit 2c95afc1e83d93fac3be6923465e1753c2c53b0a
    > Author: Andi Kleen
    > Date: Thu Jun 9 06:14:38 2016 -0700
    >
    > perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
    >
    > $ perf stat -e ref-cycles ls
    > ....
    >
    > fails systematically because the ref-cycles is now used by the
    > watchdog and given this is a system-wide pinned event, it monopolizes
    > the fixed counter 2 which is the only counter able to measure this event.

    Since the next merge window is near, fix the regression for now
    by reverting the commit.

    Reported-by: Stephane Eranian
    Cc: Andi Kleen
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Vince Weaver
    Cc: Alexander Shishkin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Currently, a schedule while atomic error prints the stack trace to the
    kernel log and the system continues running.

    Although it is possible to collect the kernel log messages and analyze
    them, often more information is needed. Furthermore, keeping the system
    running is not always the best choice. For example, when the preempt
    count underflows the system will not stop complaining about scheduling
    while atomic, so the kernel log can wrap around and overwrite the first
    stack trace, making the analysis even more challenging.

    This patch uses the kernel.panic_on_warn sysctl to help out on these
    more complex situations.

    When kernel.panic_on_warn is set to 1, the kernel will panic() in the
    schedule while atomic detection.

    The default value of the sysctl is 0, maintaining the current behavior.
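
To check from user space whether the sysctl is enabled, a program can read its procfs file. The sketch below assumes the usual mapping of kernel.panic_on_warn to /proc/sys/kernel/panic_on_warn; the helper names `parse_flag()` and `read_panic_on_warn()` are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the text of an integer sysctl file; returns the value or -1. */
int parse_flag(const char *text)
{
    int value;

    assert(text != NULL);
    if (sscanf(text, "%d", &value) != 1)
        return -1;
    return value;
}

/* Read the sysctl from `path`; returns its value, or -1 if unreadable. */
int read_panic_on_warn(const char *path)
{
    char buf[32];
    int ret = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        ret = parse_flag(buf);
    fclose(f);
    return ret;
}
```

A caller would typically pass "/proc/sys/kernel/panic_on_warn" and treat -1 as "not available on this kernel".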

    Signed-off-by: Daniel Bristot de Oliveira
    Reviewed-by: Luis Claudio R. Goncalves
    Cc: Christian Borntraeger
    Cc: Linus Torvalds
    Cc: Luis Claudio R. Goncalves
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/e8f7b80f353aa22c63bd8557208163989af8493d.1464983675.git.bristot@redhat.com
    Signed-off-by: Ingo Molnar

    Daniel Bristot de Oliveira
     

09 Jul, 2016

4 commits

  • In current code, we can get cpuacct data from several files,
    but each file has various limitations.

    For example:

    - We can get CPU usage in user and kernel mode via cpuacct.stat,
    but we can't get detailed data about each CPU.

    - We can get each CPU's kernel mode usage in cpuacct.usage_percpu_sys,
    but we can't get user mode usage data at the same time.

    This patch introduces cpuacct.usage_all, to show all detailed CPU
    accounting data together:

    # cat cpuacct.usage_all
    cpu user system
    0 3809760299 5807968992
    1 3250329855 454612211
    ..
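
A consumer of this file could parse each data row as sketched below. The function name `parse_usage_all_row()` is hypothetical, and the sketch assumes only the layout shown in the sample above: a header line followed by rows of three whitespace-separated unsigned numbers.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse one line of the file. Returns 1 and fills the outputs for a
 * data row, or 0 for anything else (such as the "cpu user system" header).
 */
int parse_usage_all_row(const char *line, unsigned int *cpu,
                        unsigned long long *user, unsigned long long *sys)
{
    assert(line && cpu && user && sys);
    return sscanf(line, "%u %llu %llu", cpu, user, sys) == 3;
}
```

Looping over the file with `fgets()` and skipping lines for which the function returns 0 is enough to collect the per-CPU values.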

    Signed-off-by: Zhao Lei
    Cc: KOSAKI Motohiro
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/7744460969edd7caaf0e903592ee52353ed9bdd6.1466415271.git.zhaolei@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Zhao Lei
     
  • In cpuacct_stats_show() we currently have copies of similar code,
    for each cpustat(system/user) variant.

    Use a loop instead to consolidate the code. This will also work better
    if we extend the CPUACCT_STAT_NSTATS type.
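
The kind of consolidation described here can be sketched as a table-driven loop. The enum, the name table and `format_stats()` below are hypothetical user-space stand-ins, not the kernel's definitions; adding a new statistic in this model means adding one enum entry and one name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical index type; the last entry counts the statistics. */
enum model_stat { MODEL_STAT_USER, MODEL_STAT_SYSTEM, MODEL_STAT_NSTATS };

static const char *const model_stat_desc[MODEL_STAT_NSTATS] = {
    [MODEL_STAT_USER]   = "user",
    [MODEL_STAT_SYSTEM] = "system",
};

/*
 * Write one "name value" line per statistic into buf.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
int format_stats(char *buf, size_t len,
                 const unsigned long long val[MODEL_STAT_NSTATS])
{
    size_t off = 0;
    int i;

    assert(buf != NULL && len > 0);
    for (i = 0; i < MODEL_STAT_NSTATS; i++) {
        int n = snprintf(buf + off, len - off, "%s %llu\n",
                         model_stat_desc[i], val[i]);
        if (n < 0 || (size_t)n >= len - off)
            return -1;
        off += (size_t)n;
    }
    return (int)off;
}
```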

    Signed-off-by: Zhao Lei
    Cc: KOSAKI Motohiro
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/b0597d4224655e9f333f1a6224ed9654c7d7d36a.1466415271.git.zhaolei@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Zhao Lei
     
  • These two types have similar function, no need to separate them.

    Signed-off-by: Zhao Lei
    Cc: KOSAKI Motohiro
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/436748885270d64363c7dc67167507d486c2057a.1466415271.git.zhaolei@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Zhao Lei
     
  • Pull scheduler fixes from Ingo Molnar:
    "Two load-balancing fixes for cgroups-intense workloads"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/fair: Fix calc_cfs_shares() fixed point arithmetics width confusion
    sched/fair: Fix effective_load() to consistently use smoothed load

    Linus Torvalds
     

08 Jul, 2016

1 commit


07 Jul, 2016

5 commits

  • Ingo Molnar
     
  • The existing optimization for same expiry time in mod_timer() checks whether
    the timer expiry time is the same as the new requested expiry time. In the old
    timer wheel implementation this does not take the slack batching into account,
    neither does the new implementation evaluate whether the new expiry time will
    requeue the timer to the same bucket.

    To optimize that, we can calculate the resulting bucket and check if the new
    expiry time is different from the current expiry time. This calculation
    happens outside the base lock held region. If the resulting bucket is the same
    we can avoid taking the base lock and requeueing the timer.

    If the timer needs to be requeued then we have to check under the base lock
    whether the base time has changed between the lockless calculation and taking
    the lock. If it has changed we need to recalculate under the lock.
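
The bucket comparison can be illustrated with a simplified wheel model. This is a sketch, not the kernel implementation: the constants (6 bits per level, a factor of 8 between levels, 8 levels) and the names `wheel_index()` and `needs_requeue()` are assumptions chosen for illustration, and locking and the recalculation under the lock are left out.

```c
#include <assert.h>

#define LVL_CLK_SHIFT 3                   /* assumed: each level is 8x coarser */
#define LVL_BITS      6                   /* assumed: 64 buckets per level */
#define LVL_SIZE      (1UL << LVL_BITS)
#define LVL_MASK      (LVL_SIZE - 1)
#define LVL_DEPTH     8                   /* assumed number of levels */

/* First delta (in ticks) that no longer fits into level n - 1. */
static unsigned long lvl_start(unsigned int n)
{
    return (LVL_SIZE - 1) << ((n - 1) * LVL_CLK_SHIFT);
}

/* Bucket index for a timer expiring at `expires`, given wheel clock `clk`. */
unsigned int wheel_index(unsigned long expires, unsigned long clk)
{
    unsigned long delta, slot;
    unsigned int lvl = 0, shift;

    assert(expires >= clk);   /* the model ignores expired timers and wraparound */
    delta = expires - clk;
    while (lvl + 1 < LVL_DEPTH && delta >= lvl_start(lvl + 1))
        lvl++;

    shift = lvl * LVL_CLK_SHIFT;
    slot = (expires + (1UL << shift)) >> shift;   /* round up to the level's granularity */
    return lvl * (unsigned int)LVL_SIZE + (unsigned int)(slot & LVL_MASK);
}

/* Non-zero if the new expiry lands in a different bucket than cur_idx. */
int needs_requeue(unsigned int cur_idx, unsigned long new_expires,
                  unsigned long clk)
{
    return wheel_index(new_expires, clk) != cur_idx;
}
```

In this model a timer at tick 1000 (wheel clock 0) sits in a bucket that covers 64 ticks, so moving it to tick 1010 keeps the bucket and needs no requeue, while moving it to tick 1100 does.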

    This optimization takes effect for timers which are enqueued into the less
    granular wheel levels (1 and above). With a simple test case the functionality
    has been verified:

                 Before    After
    Match:         5.5%    86.6%
    Requeue:      94.5%    13.4%
    Recalc:

    Signed-off-by: Thomas Gleixner
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: Eric Dumazet
    Cc: Frederic Weisbecker
    Cc: George Spelvin
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094342.778527749@linutronix.de
    Signed-off-by: Ingo Molnar

    Anna-Maria Gleixner
     
  • For further optimizations we need to separate index calculation
    from queueing. No functional change.

    Signed-off-by: Anna-Maria Gleixner
    Signed-off-by: Thomas Gleixner
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: Eric Dumazet
    Cc: Frederic Weisbecker
    Cc: George Spelvin
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094342.691159619@linutronix.de
    Signed-off-by: Ingo Molnar

    Anna-Maria Gleixner
     
  • With the wheel forwarding in place and with the HZ=1000 4ms folding we can
    avoid running the softirq at all.

    Signed-off-by: Thomas Gleixner
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: Frederic Weisbecker
    Cc: George Spelvin
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094342.607650550@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The wheel clock is stale when a CPU goes into a long idle sleep. This has the
    side effect that timers which are queued end up in the outer wheel levels.
    That results in coarser granularity.

    To solve this, we keep track of the idle state and forward the wheel clock
    whenever possible.
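
Forwarding can be pictured with the toy model below. It is a sketch under assumptions: `struct base_model` and `forward_base()` are hypothetical, counter wraparound is ignored, and the rule that the clock must not move past the next pending expiry is an assumption of the model, made so that no queued timer is skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a timer base. */
struct base_model {
    unsigned long clk;          /* wheel clock, in ticks */
    unsigned long next_expiry;  /* earliest pending timer, in ticks */
};

/* Move the wheel clock towards `now` without passing the next expiry. */
void forward_base(struct base_model *base, unsigned long now)
{
    unsigned long target = now;

    assert(base != NULL);
    if (base->next_expiry < target)
        target = base->next_expiry;
    if (target > base->clk)     /* the clock never moves backwards */
        base->clk = target;
}
```

After a long idle period with no timer due, the clock in this model catches up to `now`, so a newly queued timer is placed relative to the current time rather than the stale one.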

    Signed-off-by: Thomas Gleixner
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: Eric Dumazet
    Cc: Frederic Weisbecker
    Cc: George Spelvin
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094342.512039360@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner