05 Jun, 2014

2 commits

  • Currently to allocate a page that should be charged to kmemcg (e.g.
    threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page
    allocated is then to be freed by free_memcg_kmem_pages. Apart from
    looking asymmetrical, this also intrudes into the general
    allocation path. So let's introduce separate functions that will
    alloc/free pages charged to kmemcg.

    The new functions are called alloc_kmem_pages and free_kmem_pages. They
    should be used when the caller actually would like to use kmalloc, but
    has to fall back to the page allocator because the allocation is large.
    They only differ from alloc_pages and free_pages in that besides
    allocating or freeing pages they also charge them to the kmem resource
    counter of the current memory cgroup.
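
    The pairing described above can be sketched in userspace C. This is
    only an illustrative model, not the kernel implementation: struct
    kmem_counter, kmem_charge(), kmem_uncharge(), MODEL_PAGE_SIZE and the
    model_* functions are hypothetical names, and malloc()/free() stand in
    for the page allocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory cgroup's kmem resource counter. */
struct kmem_counter {
	size_t charged_pages;	/* pages currently charged */
	size_t limit_pages;	/* charging beyond this fails */
};

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the model */

/* Try to charge nr_pages; refuse if the limit would be exceeded. */
static bool kmem_charge(struct kmem_counter *c, size_t nr_pages)
{
	if (c->charged_pages + nr_pages > c->limit_pages)
		return false;
	c->charged_pages += nr_pages;
	return true;
}

static void kmem_uncharge(struct kmem_counter *c, size_t nr_pages)
{
	c->charged_pages -= nr_pages;
}

/* Model of the alloc side: charge first, then allocate 2^order pages. */
static void *model_alloc_kmem_pages(struct kmem_counter *c, unsigned int order)
{
	size_t nr_pages = (size_t)1 << order;
	void *p;

	if (!kmem_charge(c, nr_pages))
		return NULL;
	p = malloc(nr_pages * MODEL_PAGE_SIZE);
	if (!p)
		kmem_uncharge(c, nr_pages);	/* roll back on allocation failure */
	return p;
}

/* Model of the free side: release the pages and drop the charge. */
static void model_free_kmem_pages(struct kmem_counter *c, void *p,
				  unsigned int order)
{
	if (!p)
		return;
	free(p);
	kmem_uncharge(c, (size_t)1 << order);
}
```

    The point of the model is the symmetry: every successful
    model_alloc_kmem_pages() is undone by one model_free_kmem_pages() with
    the same order, and neither touches a shared general allocation path.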

    [sfr@canb.auug.org.au: export kmalloc_order() to modules]
    Signed-off-by: Vladimir Davydov
    Acked-by: Greg Thelen
    Cc: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Glauber Costa
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Commit 786235eeba0e ("kthread: make kthread_create() killable") was
    meant to allow kthread_create() to abort as soon as it is killed by the
    OOM-killer. But returning -ENOMEM is wrong if it is killed by SIGKILL from
    userspace. Change kthread_create() to return -EINTR upon SIGKILL.
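
    As a rough illustration of the error-code choice, the sketch below
    maps the outcome of the wait to a return value. It is a userspace
    model under stated assumptions: enum wait_outcome and
    model_kthread_create_result() are hypothetical, and the model returns
    a plain int rather than whatever the real function returns.

```c
#include <errno.h>

/* Hypothetical outcomes of waiting for the new thread to be created. */
enum wait_outcome {
	THREAD_CREATED,		/* creation completed normally */
	KILLED_BY_SIGKILL,	/* the caller received SIGKILL while waiting */
	OUT_OF_MEMORY		/* creation failed for lack of memory */
};

/*
 * Pick the error code for each outcome. After the change described
 * above, an interrupted wait reports -EINTR instead of -ENOMEM.
 */
static int model_kthread_create_result(enum wait_outcome outcome)
{
	switch (outcome) {
	case THREAD_CREATED:
		return 0;
	case KILLED_BY_SIGKILL:
		return -EINTR;
	case OUT_OF_MEMORY:
		return -ENOMEM;
	}
	return -EINVAL;	/* unknown outcome */
}
```

    Callers that previously treated every failure as memory exhaustion can
    then tell an interrupted request apart from a real allocation failure.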

    Signed-off-by: Tetsuo Handa
    Cc: Oleg Nesterov
    Acked-by: David Rientjes
    Cc: [3.13+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     

04 Jun, 2014

10 commits

  • …fael/linux-pm into next

    Pull ACPI and power management updates from Rafael Wysocki:
    "ACPICA is the leader this time (63 commits), followed by cpufreq (28
    commits), devfreq (15 commits), system suspend/hibernation (12
    commits), ACPI video and ACPI device enumeration (10 commits each).

    We have no major new features this time, but there are a few
    significant changes of how things work. The most visible one will
    probably be that we are now going to create platform devices rather
    than PNP devices by default for ACPI device objects with _HID. That
    was long overdue and will be really necessary to be able to use the
    same drivers for the same hardware blocks on ACPI and DT-based systems
    going forward. We're not expecting fallout from this one (as usual),
    but it's something to watch nevertheless.

    The second change having a chance to be visible is that ACPI video
    will now default to using native backlight rather than the ACPI
    backlight interface which should generally help systems with broken
    Win8 BIOSes. We're hoping that all problems with the native backlight
    handling that we had previously have been addressed and we are in a
    good enough shape to flip the default, but this change should be easy
    enough to revert if need be.

    In addition to that, the system suspend core has a new mechanism to
    allow runtime-suspended devices to stay suspended throughout system
    suspend/resume transitions if some extra conditions are met
    (generally, they are related to coordination within device hierarchy).
    However, enabling this feature requires cooperation from the bus type
    layer and for now it has only been implemented for the ACPI PM domain
    (used by ACPI-enumerated platform devices mostly today).

    Also, the acpidump utility that was previously shipped as a separate
    tool will now be provided by the upstream ACPICA along with the rest
    of ACPICA code, which will allow it to be more up to date and better
    supported, and we have one new cpuidle driver (ARM clps711x).

    The rest is improvements related to certain specific use cases,
    cleanups and fixes all over the place.

    Specifics:

    - ACPICA update to upstream version 20140424. That includes a number
    of fixes and improvements related to things like GPE handling,
    table loading, headers, memory mapping and unmapping, DSDT/SSDT
    overriding, and the Unload() operator. The acpidump utility from
    upstream ACPICA is included too. From Bob Moore, Lv Zheng, David
    Box, David Binderman, and Colin Ian King.

    - Fixes and cleanups related to ACPI video and backlight interfaces
    from Hans de Goede. That includes blacklist entries for some new
    machines and using native backlight by default.

    - ACPI device enumeration changes to create platform devices rather
    than PNP devices for ACPI device objects with _HID by default. PNP
    devices will still be created for the ACPI device object with
    device IDs corresponding to real PNP devices, so that change should
    not break things left and right, and we're expecting to see more
    and more ACPI-enumerated platform devices in the future. From
    Zhang Rui and Rafael J Wysocki.

    - Updates for the ACPI LPSS (Low-Power Subsystem) driver allowing it
    to handle system suspend/resume on Asus T100 correctly. From
    Heikki Krogerus and Rafael J Wysocki.

    - PM core update introducing a mechanism to allow runtime-suspended
    devices to stay suspended over system suspend/resume transitions if
    certain additional conditions related to coordination within device
    hierarchy are met. Related PM documentation update and ACPI PM
    domain support for the new feature. From Rafael J Wysocki.

    - Fixes and improvements related to the "freeze" sleep state. They
    affect several places including cpuidle, PM core, ACPI core, and
    the ACPI battery driver. From Rafael J Wysocki and Zhang Rui.

    - Miscellaneous fixes and updates of the ACPI core from Aaron Lu,
    Bjørn Mork, Hanjun Guo, Lan Tianyu, and Rafael J Wysocki.

    - Fixes and cleanups for the ACPI processor and ACPI PAD (Processor
    Aggregator Device) drivers from Baoquan He, Manuel Schölling, Tony
    Camuso, and Toshi Kani.

    - System suspend/resume optimization in the ACPI battery driver from
    Lan Tianyu.

    - OPP (Operating Performance Points) subsystem updates from Chander
    Kashyap, Mark Brown, and Nishanth Menon.

    - cpufreq core fixes, updates and cleanups from Srivatsa S Bhat,
    Stratos Karafotis, and Viresh Kumar.

    - Updates, fixes and cleanups for the Tegra, powernow-k8, imx6q,
    s5pv210, nforce2, and powernv cpufreq drivers from Brian Norris,
    Jingoo Han, Paul Bolle, Philipp Zabel, Stratos Karafotis, and
    Viresh Kumar.

    - intel_pstate driver fixes and cleanups from Dirk Brandewie, Doug
    Smythies, and Stratos Karafotis.

    - Enabling the big.LITTLE cpufreq driver on arm64 from Mark Brown.

    - Fix for the cpuidle menu governor from Chander Kashyap.

    - New ARM clps711x cpuidle driver from Alexander Shiyan.

    - Hibernate core fixes and cleanups from Chen Gang, Dan Carpenter,
    Fabian Frederick, Pali Rohár, and Sebastian Capella.

    - Intel RAPL (Running Average Power Limit) driver updates from Jacob
    Pan.

    - PNP subsystem updates from Bjorn Helgaas and Fabian Frederick.

    - devfreq core updates from Chanwoo Choi and Paul Bolle.

    - devfreq updates for exynos4 and exynos5 from Chanwoo Choi and
    Bartlomiej Zolnierkiewicz.

    - turbostat tool fix from Jean Delvare.

    - cpupower tool updates from Prarit Bhargava, Ramkumar Ramachandra
    and Thomas Renninger.

    - New ACPI ec_access.c tool for poking at the EC in a safe way from
    Thomas Renninger"

    * tag 'pm+acpi-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (187 commits)
    ACPICA: Namespace: Remove _PRP method support.
    intel_pstate: Improve initial busy calculation
    intel_pstate: add sample time scaling
    intel_pstate: Correct rounding in busy calculation
    intel_pstate: Remove C0 tracking
    PM / hibernate: fixed typo in comment
    ACPI: Fix x86 regression related to early mapping size limitation
    ACPICA: Tables: Add mechanism to control early table checksum verification.
    ACPI / scan: use platform bus type by default for _HID enumeration
    ACPI / scan: always register ACPI LPSS scan handler
    ACPI / scan: always register memory hotplug scan handler
    ACPI / scan: always register container scan handler
    ACPI / scan: Change the meaning of missing .attach() in scan handlers
    ACPI / scan: introduce platform_id device PNP type flag
    ACPI / scan: drop unsupported serial IDs from PNP ACPI scan handler ID list
    ACPI / scan: drop IDs that do not comply with the ACPI PNP ID rule
    ACPI / PNP: use device ID list for PNPACPI device enumeration
    ACPI / scan: .match() callback for ACPI scan handlers
    ACPI / battery: wakeup the system only when necessary
    power_supply: allow power supply devices registered w/o wakeup source
    ...

    Linus Torvalds
     
  • * pnp:
    MAINTAINERS: Remove Bjorn Helgaas as PNP maintainer
    PNP / resources: remove positive test on unsigned values

    * powercap:
    powercap / RAPL: add new CPU IDs
    powercap / RAPL: further relax energy counter checks

    * pm-runtime:
    PM / runtime: Update documentation to reflect the current code flow

    * pm-opp:
    PM / OPP: discard duplicate OPPs
    PM / OPP: Make OPP invisible to users in Kconfig
    PM / OPP: fix incorrect OPP count handling in of_init_opp_table

    Rafael J. Wysocki
     
  • * acpi-pm:
    ACPI / PM: Export rest of the subsys PM callbacks
    ACPI / PM: Avoid resuming devices in ACPI PM domain during system suspend
    ACPI / PM: Hold ACPI scan lock over the "freeze" sleep state
    ACPI / PM: Export acpi_target_system_state() to modules

    Rafael J. Wysocki
     
  • * pm-sleep:
    PM / hibernate: fixed typo in comment
    PM / sleep: unregister wakeup source when disabling device wakeup
    PM / sleep: Introduce command line argument for sleep state enumeration
    PM / sleep: Use valid_state() for platform-dependent sleep states only
    PM / sleep: Add state field to pm_states[] entries
    PM / sleep: Update device PM documentation to cover direct_complete
    PM / sleep: Mechanism to avoid resuming runtime-suspended devices unnecessarily
    PM / hibernate: Fix memory corruption in resumedelay_setup()
    PM / hibernate: convert simple_strtoul to kstrtoul
    PM / hibernate: Documentation: Fix script for unswapping
    PM / hibernate: no kernel_power_off when pm_power_off NULL
    PM / hibernate: use unsigned local variables in swsusp_show_speed()

    Rafael J. Wysocki
     
  • * pm-cpuidle:
    PM / suspend: Always use deepest C-state in the "freeze" sleep state
    cpuidle / menu: move repeated correction factor check to init
    cpuidle / menu: Return (-1) if there are no suitable states
    cpuidle: Combine cpuidle_enabled() with cpuidle_select()
    ARM: clps711x: Add cpuidle driver

    Rafael J. Wysocki
     
  • …/git/tip/tip into next

    Pull scheduler updates from Ingo Molnar:
    "The main scheduling related changes in this cycle were:

    - various sched/numa updates, for better performance

    - tree wide cleanup of open coded nice levels

    - nohz fix related to rq->nr_running use

    - cpuidle changes and continued consolidation to improve the
    kernel/sched/idle.c high level idle scheduling logic. As part of
    this effort I pulled cpuidle driver changes from Rafael as well.

    - standardized idle polling amongst architectures

    - continued work on preparing better power/energy aware scheduling

    - sched/rt updates

    - misc fixlets and cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
    sched/numa: Decay ->wakee_flips instead of zeroing
    sched/numa: Update migrate_improves/degrades_locality()
    sched/numa: Allow task switch if load imbalance improves
    sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code
    sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice()
    sched: Initialize rq->age_stamp on processor start
    sched, nohz: Change rq->nr_running to always use wrappers
    sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance()
    sched: Use clamp() and clamp_val() to make sys_nice() more readable
    sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()
    sched/numa: Fix initialization of sched_domain_topology for NUMA
    sched: Call select_idle_sibling() when not affine_sd
    sched: Simplify return logic in sched_read_attr()
    sched: Simplify return logic in sched_copy_attr()
    sched: Fix exec_start/task_hot on migrated tasks
    arm64: Remove TIF_POLLING_NRFLAG
    metag: Remove TIF_POLLING_NRFLAG
    sched/idle: Make cpuidle_idle_call() void
    sched/idle: Reflow cpuidle_idle_call()
    sched/idle: Delay clearing the polling bit
    ...

    Linus Torvalds
     
  • …git/tip/tip into next

    Pull perf updates from Ingo Molnar:
    "The tooling changes maintained by Jiri Olsa while Arnaldo is on
    vacation:

    User visible changes:
    - Add -F option for specifying output fields (Namhyung Kim)
    - Propagate exit status of a command line workload for record command
    (Namhyung Kim)
    - Use tid for finding thread (Namhyung Kim)
    - Clarify the output of perf sched map plus small sched command
    fixes (Dongsheng Yang)
    - Wire up perf_regs and unwind support for ARM64 (Jean Pihet)
    - Factor hists statistics counts processing which in turn also fixes
    several bugs in TUI report command (Namhyung Kim)
    - Add --percentage option to control absolute/relative percentage
    output (Namhyung Kim)
    - Add --list-cmds to 'kmem', 'mem', 'lock' and 'sched', for use by
    completion scripts (Ramkumar Ramachandra)

    Development/infrastructure changes and fixes:
    - Android related fixes for pager and map dso resolving (Michael
    Lentine)
    - Add libdw DWARF post unwind support for ARM (Jean Pihet)
    - Consolidate types.h for ARM and ARM64 (Jean Pihet)
    - Fix possible null pointer dereference in session.c (Masanari Iida)
    - Cleanup, remove unused variables in map_switch_event() (Dongsheng
    Yang)
    - Remove nr_state_machine_bugs in perf latency (Dongsheng Yang)
    - Remove usage of trace_sched_wakeup(.success) (Peter Zijlstra)
    - Cleanups for perf.h header (Jiri Olsa)
    - Consolidate types.h and export.h within tools (Borislav Petkov)
    - Move u64_swap union to its single user's header, evsel.h (Borislav
    Petkov)
    - Fix for s390 to properly parse tracepoints plus test code
    (Alexander Yarygin)
    - Handle EINTR error for readn/writen (Namhyung Kim)
    - Add a test case for hists filtering (Namhyung Kim)
    - Share map_groups among threads of the same group (Arnaldo Carvalho
    de Melo, Jiri Olsa)
    - Making some code (cpu node map and report parse callchain callback)
    global to be usable by upcoming changes (Don Zickus)
    - Fix pmu object compilation error (Jiri Olsa)

    Kernel side changes:
    - intrusive uprobes fixes from Oleg Nesterov. Since the interface is
    admin-only, and the bug only affects user-space ("any probed
    jmp/call can kill the application"), we queued these fixes via the
    development tree, as a special exception.
    - more fuzzer motivated race fixes and related refactoring and
    robustization.
    - allow PMU drivers to be built as modules. (No actual module yet,
    because the x86 Intel uncore module wasn't ready in time for this)"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
    perf tools: Add automatic remapping of Android libraries
    perf tools: Add cat as fallback pager
    perf tests: Add a testcase for histogram output sorting
    perf tests: Factor out print_hists_*()
    perf tools: Introduce reset_output_field()
    perf tools: Get rid of obsolete hist_entry__sort_list
    perf hists: Reset width of output fields with header length
    perf tools: Skip elided sort entries
    perf top: Add --fields option to specify output fields
    perf report/tui: Fix a bug when --fields/sort is given
    perf tools: Add ->sort() member to struct sort_entry
    perf report: Add -F option to specify output fields
    perf tools: Call perf_hpp__init() before setting up GUI browsers
    perf tools: Consolidate management of default sort orders
    perf tools: Allow hpp fields to be sort keys
    perf ui: Get rid of callback from __hpp__fmt()
    perf tools: Consolidate output field handling to hpp format routines
    perf tools: Use hpp formats to sort final output
    perf tools: Support event grouping in hpp ->sort()
    perf tools: Use hpp formats to sort hist entries
    ...

    Linus Torvalds
     
  • …el/git/tip/tip into next

    Pull core locking updates from Ingo Molnar:
    "The main changes in this cycle were:

    - reduced/streamlined smp_mb__*() interface that allows more usecases
    and makes the existing ones less buggy, especially in rarer
    architectures

    - add rwsem implementation comments

    - bump up lockdep limits"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
    rwsem: Add comments to explain the meaning of the rwsem's count field
    lockdep: Increase static allocations
    arch: Mass conversion of smp_mb__*()
    arch,doc: Convert smp_mb__*()
    arch,xtensa: Convert smp_mb__*()
    arch,x86: Convert smp_mb__*()
    arch,tile: Convert smp_mb__*()
    arch,sparc: Convert smp_mb__*()
    arch,sh: Convert smp_mb__*()
    arch,score: Convert smp_mb__*()
    arch,s390: Convert smp_mb__*()
    arch,powerpc: Convert smp_mb__*()
    arch,parisc: Convert smp_mb__*()
    arch,openrisc: Convert smp_mb__*()
    arch,mn10300: Convert smp_mb__*()
    arch,mips: Convert smp_mb__*()
    arch,metag: Convert smp_mb__*()
    arch,m68k: Convert smp_mb__*()
    arch,m32r: Convert smp_mb__*()
    arch,ia64: Convert smp_mb__*()
    ...

    Linus Torvalds
     
  • Pull RCU changes from Ingo Molnar:
    "The main RCU changes in this cycle were:

    - RCU torture-test changes.

    - variable-name renaming cleanup.

    - update RCU documentation.

    - miscellaneous fixes.

    - patch to suppress RCU stall warnings while sysrq requests are being
    processed"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits)
    rcu: Provide API to suppress stall warnings while sysrc runs
    rcu: Variable name changed in tree_plugin.h and used in tree.c
    torture: Remove unused definition
    torture: Remove __init from torture_init_begin/end
    torture: Check for multiple concurrent torture tests
    locktorture: Remove reference to nonexistent Kconfig parameter
    rcutorture: Run rcu_torture_writer at normal priority
    rcutorture: Note diffs from git commits
    rcutorture: Add missing destroy_timer_on_stack()
    rcutorture: Explicitly test synchronous grace-period primitives
    rcutorture: Add tests for get_state_synchronize_rcu()
    rcutorture: Test RCU-sched primitives in TREE_PREEMPT_RCU kernels
    torture: Use elapsed time to detect hangs
    rcutorture: Check for rcu_torture_fqs creation errors
    torture: Better summary diagnostics for build failures
    torture: Notice if an all-zero cpumask is passed inside a critical section
    rcutorture: Make rcu_torture_reader() use cond_resched()
    sched,rcu: Make cond_resched() report RCU quiescent states
    percpu: Fix raw_cpu_inc_return()
    rcutorture: Export RCU grace-period kthread wait state to rcutorture
    ...

    Linus Torvalds
     
  • Pull tty/serial driver updates from Greg KH:
    "Here is the big tty / serial driver pull request for 3.16-rc1.

    A variety of different serial driver fixes and updates and additions,
    nothing huge, and no real major core tty changes at all.

    All have been in linux-next for a while"

    * tag 'tty-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (84 commits)
    Revert "serial: imx: remove the DMA wait queue"
    serial: kgdb_nmi: Improve console integration with KDB I/O
    serial: kgdb_nmi: Switch from tasklets to real timers
    serial: kgdb_nmi: Use container_of() to locate private data
    serial: cpm_uart: No LF conversion in put_poll_char()
    serial: sirf: Fix compilation failure
    console: Remove superfluous readonly check
    console: Use explicit pointer type for vc_uni_pagedir* fields
    vgacon: Fix & cleanup refcounting
    ARM: tty: Move HVC DCC assembly to arch/arm
    tty/hvc/hvc_console: Fix wakeup of HVC thread on hvc_kick()
    drivers/tty/n_hdlc.c: replace kmalloc/memset by kzalloc
    vt: emulate 8- and 24-bit colour codes.
    printk/of_serial: fix serial console cessation part way through boot.
    serial: 8250_dma: check the result of TX buffer mapping
    serial: uart: add hw flow control support configuration
    tty/serial: at91: add interrupts for modem control lines
    tty/serial: at91: use mctrl_gpio helpers
    tty/serial: Add GPIOLIB helpers for controlling modem lines
    ARM: at91: gpio: implement get_direction
    ...

    Linus Torvalds
     

03 Jun, 2014

2 commits

  • …t/gregkh/driver-core into next

    Pull driver core / kernfs changes from Greg KH:
    "Here is the "big" pull request for 3.16-rc1.

    Not a lot of changes here, some kernfs work, a revert of a very old
    driver core change that ended up causing some memory leaks on driver
    probe error paths, and other minor things.

    As was pointed out earlier today, one commit here, 26fc9cd200ec
    ("kernfs: move the last knowledge of sysfs out from kernfs") is also
    needed in your 3.15-final branch as well. If you could cherry-pick it
    there, it would be most appreciated by Andy Lutomirski to prevent a
    regression there.

    All of these have been in linux-next for a while"

    * tag 'driver-core-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    crypto/nx/nx-842: dev_set_drvdata can no longer fail
    kernfs: move the last knowledge of sysfs out from kernfs
    sysfs: fix attribute_group bin file path on removal
    sysfs.h: don't return a void-valued expression in sysfs_remove_file
    init.h: Update initcall_sync variants to fix build errors
    driver core: Inline dev_set/get_drvdata
    driver core: dev_get_drvdata: Don't check for NULL dev
    driver core: dev_set_drvdata returns void
    driver core: dev_set_drvdata can no longer fail
    driver core: Move driver_data back to struct device
    lib/devres.c: fix checkpatch warnings
    lib/devres.c: use dev in devm_request_and_ioremap
    kobject: Make support for uevent_helper optional.
    kernfs: make kernfs_notify() trigger inotify events too
    kernfs: implement kernfs_root->supers list

    Linus Torvalds
     
  • Pull PCI changes from Bjorn Helgaas:
    "Enumeration
    - Notify driver before and after device reset (Keith Busch)
    - Use reset notification in NVMe (Keith Busch)

    NUMA
    - Warn if we have to guess host bridge node information (Myron Stowe)
    - Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee
    Suthikulpanit)
    - Clean up and mark early_root_info_init() as deprecated (Suravee
    Suthikulpanit)

    Driver binding
    - Add "driver_override" to force specific binding (Alex Williamson)
    - Fail "new_id" addition for devices we already know about (Bandan
    Das)

    Resource management
    - Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox)
    - Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas)
    - Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
    - Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas)
    - Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas)
    - Don't convert BAR address to resource if dma_addr_t is too small
    (Bjorn Helgaas)
    - Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas)
    - Don't print anything while decoding is disabled (Bjorn Helgaas)
    - Don't add disabled subtractive decode bus resources (Bjorn Helgaas)
    - Add resource allocation comments (Bjorn Helgaas)
    - Restrict 64-bit prefetchable bridge windows to 64-bit resources
    (Yinghai Lu)
    - Assign i82875p_edac PCI resources before adding device (Yinghai Lu)

    PCI device hotplug
    - Remove unnecessary "dev->bus" test (Bjorn Helgaas)
    - Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas)
    - Fix rpaphp endianness issues (Laurent Dufour)
    - Acknowledge spurious "cmd completed" event (Rajat Jain)
    - Allow hotplug service drivers to operate in polling mode (Rajat Jain)
    - Fix cpqphp possible NULL dereference (Rickard Strandqvist)

    MSI
    - Replace pci_enable_msi_block() by pci_enable_msi_exact()
    (Alexander Gordeev)
    - Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev)
    - Simplify populate_msi_sysfs() (Jan Beulich)

    Virtualization
    - Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson)
    - Mark RTL8110SC INTx masking as broken (Alex Williamson)

    Generic host bridge driver
    - Add generic PCI host controller driver (Will Deacon)

    Freescale i.MX6
    - Use new clock names (Lucas Stach)
    - Drop old IRQ mapping (Lucas Stach)
    - Remove optional (and unused) IRQs (Lucas Stach)
    - Add support for MSI (Lucas Stach)
    - Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat)

    Renesas R-Car
    - Add gen2 device tree support (Ben Dooks)
    - Use new OF interrupt mapping when possible (Lucas Stach)
    - Add PCIe driver (Phil Edworthy)
    - Add PCIe MSI support (Phil Edworthy)
    - Add PCIe device tree bindings (Phil Edworthy)

    Samsung Exynos
    - Remove unnecessary OOM messages (Jingoo Han)
    - Fix add_pcie_port() section mismatch warning (Sachin Kamat)

    Synopsys DesignWare
    - Make MSI ISR shared IRQ aware (Lucas Stach)

    Miscellaneous
    - Check for broken config space aliasing (Alex Williamson)
    - Update email address (Ben Hutchings)
    - Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas)
    - Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas)
    - Remove unnecessary __ref annotations (Bjorn Helgaas)
    - Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns
    (Bjorn Helgaas)
    - Fix use of uninitialized MPS value (Bjorn Helgaas)
    - Tidy x86/gart messages (Bjorn Helgaas)
    - Fix return value from pci_user_{read,write}_config_*() (Gavin Shan)
    - Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo)
    - Remove unused serial device IDs (Jean Delvare)
    - Use designated initialization in PCI_VDEVICE (Mark Rustad)
    - Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu)
    - Configure MPS on ARM (Murali Karicheri)
    - Remove unnecessary includes of (Paul Gortmaker)
    - Move Open Firmware devspec attribute to PCI common code (Sebastian Ott)
    - Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott)
    - Remove pcibios_add_platform_entries() (Sebastian Ott)
    - Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch)
    - Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang)
    - Add and use new pci_is_bridge() interface (Yijing Wang)
    - Make pci_bus_add_device() void (Yijing Wang)

    DMA API
    - Clarify physical/bus address distinction in docs (Bjorn Helgaas)
    - Fix typos in docs (Emilio López)
    - Update dma_pool_create() and dma_pool_alloc() descriptions (Gioh Kim)
    - Change dma_declare_coherent_memory() CPU address to phys_addr_t
    (Bjorn Helgaas)
    - Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory()
    (Bjorn Helgaas)"

    * tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (92 commits)
    MAINTAINERS: Add generic PCI host controller driver
    PCI: generic: Add generic PCI host controller driver
    PCI: imx6: Add support for MSI
    PCI: designware: Make MSI ISR shared IRQ aware
    PCI: imx6: Remove optional (and unused) IRQs
    PCI: imx6: Drop old IRQ mapping
    PCI: imx6: Use new clock names
    i82875p_edac: Assign PCI resources before adding device
    ARM/PCI: Call pcie_bus_configure_settings() to set MPS
    PCI: imx6: Fix imx6_add_pcie_port() section mismatch warning
    PCI: Make pci_bus_add_device() void
    PCI: exynos: Fix add_pcie_port() section mismatch warning
    PCI: Introduce new device binding path using pci_dev.driver_override
    PCI: rcar: Add gen2 device tree support
    PCI: cpqphp: Fix possible null pointer dereference
    PCI: rcar: Add R-Car PCIe device tree bindings
    PCI: rcar: Add MSI support for PCIe
    PCI: rcar: Add Renesas R-Car PCIe driver
    PCI: Fix return value from pci_user_{read,write}_config_*()
    PCI: exynos: Remove unnecessary OOM messages
    ...

    Linus Torvalds
     

02 Jun, 2014

1 commit

  • Pull scheduler fixes from Ingo Molnar:
    "Various fixlets, mostly related to the (root-only) SCHED_DEADLINE
    policy, but also a hotplug bug fix and a fix for a NR_CPUS related
    overallocation bug causing a suspend/resume regression"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: Fix hotplug vs. set_cpus_allowed_ptr()
    sched/cpupri: Replace NR_CPUS arrays
    sched/deadline: Replace NR_CPUS arrays
    sched/deadline: Restrict user params max value to 2^63 ns
    sched/deadline: Change sched_getparam() behaviour vs SCHED_DEADLINE
    sched: Disallow sched_attr::sched_policy < 0
    sched: Make sched_setattr() correctly return -EFBIG

    Linus Torvalds
     

01 Jun, 2014

2 commits

  • Fix a trivial comment typo (s/mam/map) in kernel/power/swap.c.

    Signed-off-by: Niv Yehezkel
    Signed-off-by: Rafael J. Wysocki

    Niv Yehezkel
     
  • Pull core futex/rtmutex fixes from Thomas Gleixner:
    "Three fixlets for long standing issues in the futex/rtmutex code
    unearthed by Dave Jones' syscall fuzzer:

    - Add missing early deadlock detection checks in the futex code
    - Prevent user space from attaching a futex to kernel threads
    - Make the deadlock detector of rtmutex work again

    Looks large, but is more comments than code change"

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    rtmutex: Fix deadlock detector for real
    futex: Prevent attaching to kernel threads
    futex: Add another early deadlock detection check

    Linus Torvalds
     

29 May, 2014

1 commit

  • Commit 5f5c9ae56c38942623f69c3e6dc6ec78e4da2076
    "serial_core: Unregister console in uart_remove_one_port()"
    fixed a crash where a serial port was removed but
    not deregistered as a console.

    There is a side effect of that commit for platforms having serial consoles
    and of_serial configured (CONFIG_SERIAL_OF_PLATFORM). The serial console
    is disabled midway through the boot process.

    This cessation of the serial console affects PowerPC computers
    such as the MVME5100 and SAM440EP.

    The sequence is:

    bootconsole [udbg0] enabled
    ....
    serial8250/16550 driver initialises and registers its UARTS,
    one of these is the serial console.
    console [ttyS0] enabled
    ....
    of_serial probes "platform" devices, registering them as it goes.
    One of these is the serial console.
    console [ttyS0] disabled.

    The disabling of the serial console is due to:

    a. unregister_console in printk not clearing the
    CON_ENABLED bit in the console flags,
    even though it has announced that the console is disabled; and

    b. of_platform_serial_probe in of_serial not setting the port type
    before it registers with serial8250_register_8250_port.

    This patch ensures that the serial console is re-enabled when of_serial
    registers a serial port that corresponds to the designated console.

    Signed-off-by: Stephen Chivers
    Tested-by: Stephen Chivers
    Acked-by: Geert Uytterhoeven [unregister_console]
    Cc: stable # 3.15

    ===
    The above failure was identified in Linux-3.15-rc2.

    Tested using MVME5100 and SAM440EP PowerPC computers with
    kernels built from Linux-3.15-rc5 and tty-next.

    The continued operation of the serial console is vital for computers
    such as the MVME5100 as that Single Board Computer does not
    have any graphical/display hardware.
    Signed-off-by: Greg Kroah-Hartman

    Stephen Chivers
     

28 May, 2014

4 commits

  • The current deadlock detection logic does not work reliably due to the
    following early exit path:

    /*
    * Drop out, when the task has no waiters. Note,
    * top_waiter can be NULL, when we are in the deboosting
    * mode!
    */
    if (top_waiter && (!task_has_pi_waiters(task) ||
    top_waiter != task_top_pi_waiter(task)))
    goto out_unlock_pi;

    So this not only exits when the task has no waiters, it also exits
    unconditionally when the current waiter is not the top priority waiter
    of the task.

    So in a nested locking scenario, it might abort the lock chain walk
    and therefore miss a potential deadlock.

    Simple fix: continue the chain walk when deadlock detection is
    enabled.

    We also avoid the whole enqueue if we detect the deadlock right away
    (A-A). This is an optimization, but it also prevents another waiter,
    arriving after the detection and before the task has undone the
    damage, from observing the situation, detecting a deadlock and
    returning -EDEADLOCK, which would be wrong because that other task
    is not in a deadlock situation.
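
    The difference between the old and the new exit condition can be
    sketched as two predicates over the three facts tested in the quoted
    snippet. This is a userspace illustration, not the kernel code; the
    struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of the facts the early exit path looks at. */
struct walk_state {
    bool have_top_waiter;      /* top_waiter != NULL */
    bool task_has_pi_waiters;  /* task_has_pi_waiters(task) */
    bool waiter_is_task_top;   /* top_waiter == task_top_pi_waiter(task) */
};

/* Old behaviour: leave the chain walk whenever the quoted test is true. */
bool old_stops_walk(const struct walk_state *s)
{
    return s->have_top_waiter &&
           (!s->task_has_pi_waiters || !s->waiter_is_task_top);
}

/*
 * Sketched fix: having no waiters still ends the walk, but not being
 * the top waiter ends it only when deadlock detection is off.
 */
bool new_stops_walk(const struct walk_state *s, bool detect_deadlock)
{
    if (!s->have_top_waiter)
        return false;
    if (!s->task_has_pi_waiters)
        return true;
    return !s->waiter_is_task_top && !detect_deadlock;
}
```

    For a waiter that is not the top priority waiter, old_stops_walk()
    is true, while new_stops_walk() is true only with detection off.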

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Reviewed-by: Steven Rostedt
    Cc: Lai Jiangshan
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20140522031949.725272460@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Pull two powerpc fixes from Ben Herrenschmidt:
    "Here's a pair of powerpc fixes for 3.15 which are also going to
    stable.

    One's a fix for building with newer binutils (the problem currently
    only affects the BookE kernels but the affected macro might come back
    into use on BookS platforms at any time). Unfortunately, the binutils
    maintainer made a backward-incompatible change to a construct that we
    use, so we have to add a Makefile check.

    The other one is a fix for CPUs getting stuck in kexec when running
    single threaded. Since we routinely use kexec on power (including in
    our newer bootloaders), I deemed that important enough"

    * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode
    powerpc: Fix 64 bit builds with binutils 2.24

    Linus Torvalds
     
  • If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
    (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
    get the following messages during boot:

    [ 0.089866] POWER8 performance monitor hardware support registered
    [ 0.089985] power8-pmu: PMAO restore workaround active.
    [ 5.095419] Processor 1 is stuck.
    [ 10.097933] Processor 2 is stuck.
    [ 15.100480] Processor 3 is stuck.
    [ 20.102982] Processor 4 is stuck.
    [ 25.105489] Processor 5 is stuck.
    [ 30.108005] Processor 6 is stuck.
    [ 35.110518] Processor 7 is stuck.
    [ 40.113369] Processor 9 is stuck.
    [ 45.115879] Processor 10 is stuck.
    [ 50.118389] Processor 11 is stuck.
    [ 55.120904] Processor 12 is stuck.
    [ 60.123425] Processor 13 is stuck.
    [ 65.125970] Processor 14 is stuck.
    [ 70.128495] Processor 15 is stuck.
    [ 75.131316] Processor 17 is stuck.

    Note that only the sibling threads are stuck, while the primary threads (0, 8,
    16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
    that kexec tries to wake up (bring online) the sibling threads of all the cores,
    before performing kexec:

    [ 9464.131231] Starting new kernel
    [ 9464.148507] kexec: Waking offline cpu 1.
    [ 9464.148552] kexec: Waking offline cpu 2.
    [ 9464.148600] kexec: Waking offline cpu 3.
    [ 9464.148636] kexec: Waking offline cpu 4.
    [ 9464.148671] kexec: Waking offline cpu 5.
    [ 9464.148708] kexec: Waking offline cpu 6.
    [ 9464.148743] kexec: Waking offline cpu 7.
    [ 9464.148779] kexec: Waking offline cpu 9.
    [ 9464.148815] kexec: Waking offline cpu 10.
    [ 9464.148851] kexec: Waking offline cpu 11.
    [ 9464.148887] kexec: Waking offline cpu 12.
    [ 9464.148922] kexec: Waking offline cpu 13.
    [ 9464.148958] kexec: Waking offline cpu 14.
    [ 9464.148994] kexec: Waking offline cpu 15.
    [ 9464.149030] kexec: Waking offline cpu 17.

    Instrumenting this piece of code revealed that the cpu_up() operation actually
    fails with -EBUSY. Thus, only the primary threads of all the cores are online
    during kexec, and hence this is a sure-shot recipe for disaster, as explained
    in commit e8e5c2155b (powerpc/kexec: Fix orphaned offline CPUs across kexec),
    as well as in the comment above wake_offline_cpus().

    It turns out that cpu_up() was returning -EBUSY because the variable
    'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
    by migrate_to_reboot_cpu() inside kernel_kexec().

    Now, migrate_to_reboot_cpu() was originally written with the assumption that
    any further code will not need to perform CPU hotplug, since we are anyway in
    the reboot path. However, kexec is clearly not such a case, since we depend on
    onlining CPUs, at least on powerpc.

    So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
    kexec path, to fix this regression in kexec on powerpc.

    Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
    can catch such issues more easily in the future.
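
    The failure mode can be reproduced in a few lines of userspace C.
    The model_* names below are hypothetical stand-ins for the kernel
    functions named above; the sketch only shows why every wake-up
    attempt fails while hotplug stays disabled, and why re-enabling it
    before the wake-up loop helps.

```c
#include <errno.h>
#include <stdbool.h>

static bool hotplug_disabled;   /* models 'cpu_hotplug_disabled' */

/* Models cpu_up(): refuses with -EBUSY while hotplug is disabled. */
int model_cpu_up(bool *online)
{
    if (hotplug_disabled)
        return -EBUSY;
    *online = true;
    return 0;
}

void model_migrate_to_reboot_cpu(void) { hotplug_disabled = true; }
void model_enable_hotplug(void)        { hotplug_disabled = false; }

/* Returns the number of offline CPUs that could not be brought online. */
int model_kexec_wake(bool *cpus, int n, bool reenable_hotplug)
{
    int failed = 0;

    model_migrate_to_reboot_cpu();
    if (reenable_hotplug)
        model_enable_hotplug();

    for (int i = 0; i < n; i++)
        if (!cpus[i] && model_cpu_up(&cpus[i]) != 0)
            failed++;   /* the real patch wraps this call in WARN_ON() */
    return failed;
}
```

    Calling model_kexec_wake() with reenable_hotplug set to false leaves
    every sibling thread offline; with true they all come up.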

    Fixes: c97102ba963 (kexec: migrate to reboot cpu)
    Cc: stable@vger.kernel.org
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Benjamin Herrenschmidt

    Srivatsa S. Bhat
     
  • There is still one residue of sysfs remaining: the sb_magic
    SYSFS_MAGIC. However, this should be kernfs-user-specific,
    so this patch moves it out. Kernfs users should specify their
    magic number while mounting.

    Signed-off-by: Jianyu Zhan
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Jianyu Zhan
     

26 May, 2014

3 commits

  • On some systems the platform supports neither
    PM_SUSPEND_MEM nor PM_SUSPEND_STANDBY, so PM_SUSPEND_FREEZE is the
    only available system sleep state. However, some user space frameworks
    only use the "mem" and (sometimes) "standby" sleep state labels, so
    the users of those systems need to modify user space in order to be
    able to use system suspend at all and that is not always possible.

    For this reason, add a new kernel command line argument,
    relative_sleep_states, allowing the users of those systems to change
    the way in which the kernel assigns labels to system sleep states.
    Namely, for relative_sleep_states=1, the "mem", "standby" and "freeze"
    labels will enumerate the available system sleep states from the
    deepest to the shallowest, respectively, so that "mem" is always
    present in /sys/power/state and the other state strings may or may
    not be present depending on what is supported by the platform.
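
    The labelling rule can be sketched as follows. This is a userspace
    illustration under the assumptions stated in the text (freeze is
    always available, standby and mem depend on the platform); the
    enum, the function name and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Ordered from shallowest to deepest. */
enum model_state { MODEL_FREEZE, MODEL_STANDBY, MODEL_MEM, MODEL_NSTATES };

/*
 * out[s] receives the label for state s, or NULL if s is unsupported.
 * With 'relative' set, labels are handed out deepest-first to the
 * states that are actually available.
 */
void model_assign_labels(bool standby_ok, bool mem_ok, bool relative,
                         const char *out[MODEL_NSTATES])
{
    static const char *const deep_first[] = { "mem", "standby", "freeze" };
    static const char *const fixed[MODEL_NSTATES] = {
        "freeze", "standby", "mem"
    };
    bool ok[MODEL_NSTATES] = { true, standby_ok, mem_ok };
    size_t next = 0;

    for (int s = MODEL_NSTATES - 1; s >= 0; s--) {
        if (!ok[s])
            out[s] = NULL;
        else
            out[s] = relative ? deep_first[next++] : fixed[s];
    }
}
```

    On a platform that only supports freeze, the relative mode labels
    that state "mem", so user space that only knows "mem" keeps working.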

    Update system sleep states documentation to reflect this change.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • Use the observation that, for platform-dependent sleep states
    (PM_SUSPEND_STANDBY, PM_SUSPEND_MEM), a given state is either
    always supported or always unsupported and store that information
    in pm_states[] instead of calling valid_state() every time we
    need to check it.

    Also do not use valid_state() for PM_SUSPEND_FREEZE, which is always
    valid, and move the pm_test_level validity check for PM_SUSPEND_FREEZE
    directly into enter_state().

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • To allow sleep states corresponding to the "mem", "standby" and
    "freeze" labels to be different from the pm_states[] indexes of
    those strings, introduce struct pm_sleep_state, consisting of
    a string label and a state number, and turn pm_states[] into an
    array of objects of that type.

    This modification should not lead to any functional changes.
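
    A minimal sketch of such a table, assuming a label plus a state
    number per entry as described above. The model_* names and the
    convention that state 0 means "not available" are assumptions made
    for the illustration only.

```c
#include <stddef.h>
#include <string.h>

struct model_sleep_state {
    const char *label;  /* string shown to user space */
    int state;          /* state number; 0 means "not available" here */
};

/* Returns the state number for 'label', or 0 if it is not available. */
int model_lookup_state(const struct model_sleep_state *tbl, size_t n,
                       const char *label)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].state && strcmp(tbl[i].label, label) == 0)
            return tbl[i].state;
    return 0;
}
```

    Because the label and the state number are stored separately, the
    number found for "mem" no longer has to equal its array index.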

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

24 May, 2014

3 commits

  • Pull scheduler fixes from Ingo Molnar:
    "The biggest commit is an irqtime accounting loop latency fix, the rest
    are misc fixes all over the place: deadline scheduling, docs, numa,
    balancer and a bad to-idle latency fix"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/numa: Initialize newidle balance stats in sd_numa_init()
    sched: Fix updating rq->max_idle_balance_cost and rq->next_balance in idle_balance()
    sched: Skip double execution of pick_next_task_fair()
    sched: Use CPUPRI_NR_PRIORITIES instead of MAX_RT_PRIO in cpupri check
    sched/deadline: Fix memory leak
    sched/deadline: Fix sched_yield() behavior
    sched: Sanitize irq accounting madness
    sched/docbook: Fix 'make htmldocs' warnings caused by missing description

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "The biggest changes are fixes for races that kept triggering Trinity
    crashes, plus liblockdep build fixes and smaller misc fixes.

    The liblockdep bits in perf/urgent are a pull mistake - they should
    have been in locking/urgent - but by the time I noticed other commits
    were added and testing was done :-/ Sorry about that"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf: Fix a race between ring_buffer_detach() and ring_buffer_attach()
    perf: Prevent false warning in perf_swevent_add
    perf: Limit perf_event_attr::sample_period to 63 bits
    tools/liblockdep: Remove all build files when doing make clean
    tools/liblockdep: Build liblockdep from tools/Makefile
    perf/x86/intel: Fix Silvermont's event constraints
    perf: Fix perf_event_init_context()
    perf: Fix race in removing an event

    Linus Torvalds
     
  • The resource map sanity check message is a bit confusing. Change it to be
    more readable:

    -resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
    +resource sanity check: requesting [mem 0xfed10000-0xfed15fff], which spans more than pnp 00:01 [mem 0xfed10000-0xfed13fff]

    Signed-off-by: Bjorn Helgaas

    Bjorn Helgaas
     

23 May, 2014

1 commit


22 May, 2014

11 commits

  • …/linux-rcu into core/rcu

    Pull RCU updates from Paul E. McKenney:

    " 1. Update RCU documentation. These were posted to LKML at
    https://lkml.org/lkml/2014/4/28/634.

    2. Miscellaneous fixes. These were posted to LKML at
    https://lkml.org/lkml/2014/4/28/645.

    3. Torture-test changes. These were posted to LKML at
    https://lkml.org/lkml/2014/4/28/667.

    4. Variable-name renaming cleanup, sent separately due to conflicts.
    This was posted to LKML at https://lkml.org/lkml/2014/5/13/854.

    5. Patch to suppress RCU stall warnings while sysrq requests are
    being processed. This patch is the RCU portions of the patch
    that Rik posted to LKML at https://lkml.org/lkml/2014/4/29/457.
    The reason for pushing this patch ahead instead of waiting until
    3.17 is that the NMI-based stack traces are messing up sysrq
    output, and in some cases also messing up the system as well."

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Affine wakeups have the potential to interfere with NUMA placement.
    If a task wakes up too many other tasks, affine wakeups will get
    disabled.

    However, regardless of how many other tasks it wakes up, it gets
    re-enabled once a second, potentially interfering with NUMA
    placement of other tasks.

    By halving wakee_wakes instead of zeroing it, we can avoid
    that problem for some workloads.
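
    The effect of the two decay policies can be compared with a small
    model. The struct, its field names and the once-per-'hz' check are
    hypothetical simplifications, not the scheduler's actual code.

```c
/* Hypothetical per-task wake-up statistics. */
struct model_task {
    unsigned int wake_count;     /* models the wakee counter */
    unsigned long last_decay;    /* time of the last decay, in ticks */
};

/* Decay at most once per 'hz' ticks; 'halve' selects the new policy. */
void model_decay(struct model_task *t, unsigned long now,
                 unsigned long hz, int halve)
{
    if (now > t->last_decay + hz) {
        if (halve)
            t->wake_count >>= 1;   /* new: keep half of the history */
        else
            t->wake_count = 0;     /* old: forget everything */
        t->last_decay = now;
    }
}
```

    A task that wakes many others keeps a high counter across the
    one-second boundary when halving, so it stays above whatever
    threshold disables affine wakeups instead of being reset each time.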

    Signed-off-by: Rik van Riel
    Signed-off-by: Peter Zijlstra
    Cc: chegu_vinod@hp.com
    Cc: umgwanakikbuti@gmail.com
    Link: http://lkml.kernel.org/r/20140516001332.67f91af2@annuminas.surriel.com
    Signed-off-by: Ingo Molnar

    Rik van Riel
     
  • Update the migrate_improves/degrades_locality() functions with
    knowledge of pseudo-interleaving.

    Do not consider moving tasks around within the set of a group's active
    nodes as improving or degrading locality. Instead, leave the load
    balancer free to balance the load between a numa_group's active nodes.

    Also, switch from the group/task_weight functions to the group/task_fault
    functions. The "weight" functions involve a division, but both calls use
    the same divisor, so there's no point in doing that from these functions.

    On a 4 node (x10 core) system, performance of SPECjbb2005 seems
    unaffected, though the number of migrations with 2 8-warehouse wide
    instances seems to have almost halved, due to the scheduler running
    each instance on a single node.

    Signed-off-by: Rik van Riel
    Signed-off-by: Peter Zijlstra
    Cc: mgorman@suse.de
    Cc: chegu_vinod@hp.com
    Link: http://lkml.kernel.org/r/20140515130306.61aae7db@cuia.bos.redhat.com
    Signed-off-by: Ingo Molnar

    Rik van Riel
     
  • Currently the NUMA balancing code only allows moving tasks between NUMA
    nodes when the load on both nodes is in balance. This breaks down when
    the load was imbalanced to begin with.

    Allow tasks to be moved between NUMA nodes if the imbalance is small,
    or if the new imbalance is smaller than the original one.
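
    One way to express this rule is sketched below. The function name,
    the percentage-based threshold and its parameter are assumptions
    made for the illustration; only the two conditions (small
    imbalance, or smaller than before) come from the description above.

```c
#include <stdbool.h>

/*
 * Returns true if a move should be rejected. Loads are compared as
 * (larger * 100) against (smaller * imbalance_pct); a non-positive
 * result counts as "small enough".
 */
bool model_too_imbalanced(long orig_src, long orig_dst,
                          long src, long dst, long imbalance_pct)
{
    long tmp, imb, old_imb;

    /* Only the size of the imbalance matters, not its direction. */
    if (dst < src) { tmp = dst; dst = src; src = tmp; }
    imb = dst * 100 - src * imbalance_pct;
    if (imb <= 0)
        return false;               /* small imbalance: allow */

    if (orig_dst < orig_src) { tmp = orig_dst; orig_dst = orig_src; orig_src = tmp; }
    old_imb = orig_dst * 100 - orig_src * imbalance_pct;

    return imb > old_imb;           /* reject only if things get worse */
}
```

    With this shape, a move from a 100/200 split to 120/180 is allowed
    even though the result is still above the threshold, because the
    imbalance shrinks.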

    Suggested-by: Peter Zijlstra
    Signed-off-by: Rik van Riel
    Signed-off-by: Peter Zijlstra
    Cc: mgorman@suse.de
    Cc: chegu_vinod@hp.com
    Signed-off-by: Ingo Molnar
    Link: http://lkml.kernel.org/r/20140514132221.274b3463@annuminas.surriel.com

    Rik van Riel
     
  • … current upstream code

    Signed-off-by: xiaofeng.yan <xiaofeng.yan@huawei.com>
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Link: http://lkml.kernel.org/r/1399605687-18094-1-git-send-email-xiaofeng.yan@huawei.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    xiaofeng.yan
     
  • …o_rlimit() and rlimit_to_nice()

    Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Link: http://lkml.kernel.org/r/a568a1e3cc8e78648f41b5035fa5e381d36274da.1399532322.git.yangds.fnst@cn.fujitsu.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Dongsheng Yang
     
  • If the sched_clock time starts at a large value, the kernel will spin
    in sched_avg_update for a long time while rq->age_stamp catches up
    with rq->clock.

    The comment in kernel/sched/clock.c says that there is no strict promise
    that it starts at zero. So initialize rq->age_stamp when a cpu starts up
    to avoid this.

    I was seeing long delays on a simulator that didn't start the clock at
    zero. This might also be an issue on reboots on processors that don't
    re-initialize the timer to zero on reset, and when using kexec.
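
    The cost of the catch-up loop can be seen in a userspace model. The
    function name and the iteration counter are hypothetical; the loop
    shape (advance the stamp by one period until it is within a period
    of the clock) follows the description above.

```c
#include <stdint.h>

/* Advances *age_stamp towards 'clock'; returns the iteration count. */
uint64_t model_catch_up(uint64_t clock, uint64_t *age_stamp, int64_t period)
{
    uint64_t iterations = 0;

    while ((int64_t)(clock - *age_stamp) > period) {
        *age_stamp += (uint64_t)period;
        iterations++;
    }
    return iterations;
}
```

    Starting with an age stamp of zero, the iteration count grows
    linearly with the initial clock value; initialising the stamp to
    the current clock makes the loop exit immediately.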

    Signed-off-by: Corey Minyard
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1399574859-11714-1-git-send-email-minyard@acm.org
    Signed-off-by: Ingo Molnar

    Corey Minyard
     
  • Sometimes ->nr_running may cross 2 without an interrupt being
    sent to the rq's cpu. In this case we don't reenable the timer.
    This may be the reason for rare unexpected effects when nohz is
    enabled.

    The patch replaces all direct modifications of nr_running
    and makes add_nr_running() take care of the border crossing.
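
    The crossing check can be sketched like this. The struct and the
    tick_kicks counter are hypothetical; in the model, the counter
    stands in for whatever is done to get the tick re-enabled.

```c
/* Hypothetical run-queue model. */
struct model_rq {
    unsigned int nr_running;
    unsigned int tick_kicks;   /* how often we asked for the tick back */
};

/* Adds 'count' tasks and notices when nr_running crosses from <2 to >=2. */
void model_add_nr_running(struct model_rq *rq, unsigned int count)
{
    unsigned int prev = rq->nr_running;

    rq->nr_running = prev + count;
    if (prev < 2 && rq->nr_running >= 2)
        rq->tick_kicks++;
}
```

    Comparing the value before and after the update catches a jump
    such as 1 -> 3, which a test for "exactly 2" would miss.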

    Signed-off-by: Kirill Tkhai
    Acked-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140508225830.2469.97461.stgit@localhost
    Signed-off-by: Ingo Molnar

    Kirill Tkhai
     
  • Currently, in idle_balance(), we update rq->next_balance only when we
    pull tasks. However, it is important to update it in the !pulled_tasks
    case too.

    When the CPU is "busy" (the CPU isn't idle), rq->next_balance gets computed
    using sd->busy_factor (so we increase the balance interval when the CPU is
    busy). However, when the CPU goes idle, rq->next_balance could still be set
    to a large value that was computed with the sd->busy_factor.

    Thus, we need to also update rq->next_balance in idle_balance() in the cases
    where !pulled_tasks too, so that rq->next_balance gets updated without taking
    the busy_factor into account when the CPU is about to go idle.

    This patch makes rq->next_balance get updated independently of whether or
    not we pulled_task. Also, we add logic to ensure that we always traverse
    at least 1 of the sched domains to get a proper next_balance value for
    updating rq->next_balance.

    Additionally, since load_balance() modifies the sd->balance_interval, we
    need to re-obtain the sched domain's interval after the call to
    load_balance() in rebalance_domains() before we update rq->next_balance.

    This patch adds and uses 2 new helper functions, update_next_balance() and
    get_sd_balance_interval() to update next_balance and obtain the sched
    domain's balance_interval.
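
    A simplified model of the two helpers is shown below. It is a
    sketch only: the model_* names are hypothetical, the interval is
    kept in raw ticks, and the unit conversion and upper clamp that a
    real implementation would need are left out.

```c
/* Hypothetical sched-domain model. */
struct model_sd {
    unsigned long last_balance;
    unsigned long balance_interval;
    unsigned int busy_factor;
};

/* Interval to the next balance; stretched by busy_factor when busy. */
unsigned long model_get_interval(const struct model_sd *sd, int cpu_busy)
{
    unsigned long interval = sd->balance_interval;

    if (cpu_busy)
        interval *= sd->busy_factor;
    return interval ? interval : 1;
}

/* Pulls *next_balance earlier if this domain is due sooner. */
void model_update_next_balance(const struct model_sd *sd, int cpu_busy,
                               unsigned long *next_balance)
{
    unsigned long next = sd->last_balance + model_get_interval(sd, cpu_busy);

    /* Wrap-safe "is next before *next_balance" comparison. */
    if ((long)(next - *next_balance) < 0)
        *next_balance = next;
}
```

    A next_balance value computed while busy (last_balance + 8 * 32) is
    pulled back to last_balance + 8 once the helper is called with
    cpu_busy set to 0, which is the case the patch cares about.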

    Signed-off-by: Jason Low
    Reviewed-by: Preeti U Murthy
    Signed-off-by: Peter Zijlstra
    Cc: daniel.lezcano@linaro.org
    Cc: alex.shi@linaro.org
    Cc: efault@gmx.de
    Cc: vincent.guittot@linaro.org
    Cc: morten.rasmussen@arm.com
    Cc: aswin@hp.com
    Link: http://lkml.kernel.org/r/1399596562.2200.7.camel@j-VirtualBox
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • Suggested-by: Kees Cook
    Signed-off-by: Dongsheng Yang
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1399541715-19568-1-git-send-email-yangds.fnst@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Dongsheng Yang
     
  • There is no need to zero struct sched_group member cpumask and struct
    sched_group_power member power since both structures are already allocated
    as zeroed memory in __sdt_alloc().

    This patch has been tested with
    BUG_ON(!cpumask_empty(sched_group_cpus(sg))); and BUG_ON(sg->sgp->power);
    in build_sched_groups() on ARM TC2 and INTEL i5 M520 platform including
    CPU hotplug scenarios.

    Signed-off-by: Dietmar Eggemann
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1398865178-12577-1-git-send-email-dietmar.eggemann@arm.com
    Signed-off-by: Ingo Molnar

    Dietmar Eggemann