15 Apr, 2015

9 commits

  • Have kvm_guest_init() use hardlockup_detector_disable() instead of
    watchdog_enable_hardlockup_detector(false).

    Remove the watchdog_hardlockup_detector_is_enabled() and the
    watchdog_enable_hardlockup_detector() function which are no longer needed.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • Rename the update_timers*() functions to update_watchdog*().

    Remove the boolean argument from watchdog_enable_all_cpus() because
    update_watchdog_all_cpus() is now a generic function to change the run
    state of the lockup detectors and to have the lockup detectors use a new
    sample period.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • With the current user interface of the watchdog mechanism it is only
    possible to disable or enable both lockup detectors at the same time.
    This series introduces new kernel parameters and changes the semantics of
    some existing kernel parameters, so that the hard lockup detector and the
    soft lockup detector can be disabled or enabled individually. With this
    series applied, the user interface is as follows.

    - parameters in /proc/sys/kernel

    . soft_watchdog
    This is a new parameter to control and examine the run state of
    the soft lockup detector.

    . nmi_watchdog
    The semantics of this parameter have changed. It can now be used
    to control and examine the run state of the hard lockup detector.

    . watchdog
    This parameter is still available to control the run state of both
    lockup detectors at the same time. If this parameter is examined,
    it shows the logical OR of soft_watchdog and nmi_watchdog.

    . watchdog_thresh
    The semantics of this parameter are not affected by the patch.

    - kernel command line parameters

    . nosoftlockup
    The semantics of this parameter have changed. It can now be used
    to disable the soft lockup detector at boot time.

    . nmi_watchdog=0 or nmi_watchdog=1
    Disable or enable the hard lockup detector at boot time. The patch
    introduces '=1' as a new option.

    . nowatchdog
    The semantics of this parameter are not affected by the patch. It
    is still available to disable both lockup detectors at boot time.

    Also, remove the proc_dowatchdog() function which is no longer needed.
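
    As a rough userspace sketch of the semantics above (hypothetical names
    and bit layout, not the kernel source), reading 'watchdog' reduces to a
    logical OR of the two per-detector bits:

```c
#include <assert.h>

/* Hypothetical bit layout for the two run-state bits; the real kernel
 * constants may be named and ordered differently. */
#define NMI_WATCHDOG_ENABLED  (1 << 0)  /* hard lockup detector */
#define SOFT_WATCHDOG_ENABLED (1 << 1)  /* soft lockup detector */

static unsigned long watchdog_enabled = NMI_WATCHDOG_ENABLED | SOFT_WATCHDOG_ENABLED;

/* /proc/sys/kernel/nmi_watchdog: run state of the hard lockup detector. */
static int read_nmi_watchdog(void)  { return !!(watchdog_enabled & NMI_WATCHDOG_ENABLED); }

/* /proc/sys/kernel/soft_watchdog: run state of the soft lockup detector. */
static int read_soft_watchdog(void) { return !!(watchdog_enabled & SOFT_WATCHDOG_ENABLED); }

/* /proc/sys/kernel/watchdog: logical OR of the two bits above. */
static int read_watchdog(void)      { return read_nmi_watchdog() | read_soft_watchdog(); }
```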

    [dzickus@redhat.com: wrote changelog]
    [dzickus@redhat.com: update documentation for kernel params and sysctl]
    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • If watchdog_nmi_enable() fails to set up the hardware perf event of one
    CPU, the entire hard lockup detector is deemed unreliable. Hence, disable
    the hard lockup detector and shut down the hardware perf events on all
    CPUs.

    [dzickus@redhat.com: update comments to explain some code]
    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • Separate handlers for each watchdog parameter in /proc/sys/kernel replace
    the proc_dowatchdog() function. Three of those handlers merely call
    proc_watchdog_common() with one different argument.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • Three of four handlers for the watchdog parameters in /proc/sys/kernel
    essentially have to do the same thing.

    if the parameter is being read {
            return the state of the corresponding bit(s) in 'watchdog_enabled'
    } else {
            set/clear the state of the corresponding bit(s) in 'watchdog_enabled'
            update the run state of the lockup detector(s)
    }

    Hence, introduce a common function that can be called by those handlers.
    The callers pass a 'bit mask' to this function to indicate which bit(s)
    should be set/cleared in 'watchdog_enabled'.

    This function handles an uncommon race with watchdog_nmi_enable() where a
    concurrent update of 'watchdog_enabled' is possible. We use 'cmpxchg' to
    detect the concurrency. [This avoids introducing a new spinlock or a
    mutex to synchronize updates of 'watchdog_enabled'. Using the same lock
    or mutex in watchdog thread context and in system call context needs to be
    considered carefully because it can make the code prone to deadlock
    situations in connection with parking/unparking the watchdog threads.]
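
    A minimal userspace sketch of that cmpxchg-based detection follows
    (hypothetical names; C11 atomics stand in for the kernel's cmpxchg):

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the kernel's 'watchdog_enabled' word. */
static _Atomic unsigned long watchdog_enabled;

/* Hypothetical common handler body: set or clear the 'which' bits, and
 * report -1 if a concurrent update (e.g. from watchdog_nmi_enable())
 * changed the word between the read and the store. */
static int watchdog_update_bits(unsigned long which, int enable)
{
    unsigned long old = atomic_load(&watchdog_enabled);
    unsigned long newval = enable ? (old | which) : (old & ~which);

    /* cmpxchg: the store only happens if the word still equals 'old'. */
    if (!atomic_compare_exchange_strong(&watchdog_enabled, &old, newval))
        return -1; /* concurrency detected; a caller could retry */
    return 0;
}
```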

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • This series removes proc_dowatchdog(). Since multiple new functions need
    the 'watchdog_proc_mutex' to serialize access to the watchdog parameters
    in /proc/sys/kernel, move the mutex outside of any function.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • This series introduces a separate handler for each watchdog parameter in
    /proc/sys/kernel. The separate handlers need a common function that they
    can call to update the run state of the lockup detectors, or to have the
    lockup detectors use a new sample period.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • The hardlockup and softlockup detectors have always been tied together.
    The KVM folks requested the ability to have one enabled but not the
    other. Internally, rework the code to split things apart more cleanly.

    There is a bunch of churn here, but the end result should be code that
    is easier to maintain and fix without knowing the internals of what is
    going on.

    This patch (of 9):

    Introduce new definitions and variables to separate the user interface in
    /proc/sys/kernel from the internal run state of the lockup detectors. The
    internal run state is represented by two bits in a new variable that is
    named 'watchdog_enabled'. This helps simplify the code, for example:

    - In order to check if either of the two lockup detectors is enabled,
    it is sufficient to check whether 'watchdog_enabled' is non-zero.

    - In order to enable/disable one or both lockup detectors,
    it is sufficient to set/clear one or both bits in 'watchdog_enabled'.

    - Concurrent updates of 'watchdog_enabled' need not be synchronized via
    a spinlock or a mutex. Updates can either be atomic or concurrency can
    be detected by using 'cmpxchg'.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     

14 Apr, 2015

7 commits

  • Pull cgroup updates from Tejun Heo:
    "Nothing too interesting. Rik made cpuset cooperate better with
    isolcpus and there are several other cleanup patches"

    * 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cpuset, isolcpus: document relationship between cpusets & isolcpus
    cpusets, isolcpus: exclude isolcpus from load balancing in cpusets
    sched, isolcpu: make cpu_isolated_map visible outside scheduler
    cpuset: initialize cpuset a bit early
    cgroup: Use kvfree in pidlist_free()
    cgroup: call cgroup_subsys->bind on cgroup subsys initialization

    Linus Torvalds
     
  • Pull workqueue updates from Tejun Heo:
    "Workqueue now prints debug information at the end of sysrq-t which
    should be helpful when tracking down suspected workqueue stalls. It
    only prints out the ones with something currently going on so it
    shouldn't add much output in most cases"

    * 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: Reorder sysfs code
    percpu: Fix trivial typos in comments
    workqueue: dump workqueues on sysrq-t
    workqueue: keep track of the flushing task and pool manager
    workqueue: make the workqueues list RCU walkable

    Linus Torvalds
     
  • Pull irq core updates from Thomas Gleixner:
    "Managerial summary:

    Core code:
    - final removal of IRQF_DISABLED
    - new state save/restore functions for virtualization support
    - wakeup support for stacked irqdomains
    - new function to solve the netpoll synchronization problem

    irqchips:
    - new driver for STi based devices
    - new driver for Vybrid MSCM
    - massive cleanup of the GIC driver by moving the GIC-addons to
    stacked irqdomains
    - the usual pile of fixes and updates to the various chip drivers"

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
    irqchip: GICv3: Add support for irq_[get, set]_irqchip_state()
    irqchip: GIC: Add support for irq_[get, set]_irqchip_state()
    genirq: Allow the irqchip state of an IRQ to be save/restored
    genirq: MSI: Fix freeing of unallocated MSI
    irqchip: renesas-irqc: Add wake-up support
    irqchip: armada-370-xp: Allow using wakeup source
    irqchip: mips-gic: Add new functions to start/stop the GIC counter
    irqchip: tegra: Add Tegra210 support
    irqchip: digicolor: Move digicolor_set_gc to init section
    irqchip: renesas-irqc: Add functional clock to bindings
    irqchip: renesas-irqc: Add minimal runtime PM support
    irqchip: renesas-irqc: Add more register documentation
    DT: exynos: update PMU binding
    ARM: exynos4/5: convert pmu wakeup to stacked domains
    irqchip: gic: Don't complain in gic_get_cpumask() if UP system
    ARM: zynq: switch from gic_arch_extn to gic_set_irqchip_flags
    ARM: ux500: switch from gic_arch_extn to gic_set_irqchip_flags
    ARM: shmobile: remove use of gic_arch_extn.irq_set_wake
    irqchip: gic: Add an entry point to set up irqchip flags
    ARM: omap: convert wakeupgen to stacked domains
    ...

    Linus Torvalds
     
  • Pull timer updates from Ingo Molnar:
    "The main changes in this cycle were:

    - clockevents state machine cleanups and enhancements (Viresh Kumar)

    - clockevents broadcast notifier horror to state machine conversion
    and related cleanups (Thomas Gleixner, Rafael J Wysocki)

    - clocksource and timekeeping core updates (John Stultz)

    - clocksource driver updates and fixes (Ben Dooks, Dmitry Osipenko,
    Hans de Goede, Laurent Pinchart, Maxime Ripard, Xunlei Pang)

    - y2038 fixes (Xunlei Pang, John Stultz)

    - NMI-safe ktime_get_raw_fast() and general refactoring of the clock
    code, in preparation to perf's per event clock ID support (Peter
    Zijlstra)

    - generic sched/clock fixes, optimizations and cleanups (Daniel
    Thompson)

    - clockevents cpu_down() race fix (Preeti U Murthy)"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits)
    timers/PM: Drop unnecessary braces from tick_freeze()
    timers/PM: Fix up tick_unfreeze()
    timekeeping: Get rid of stale comment
    clockevents: Cleanup dead cpu explicitely
    clockevents: Make tick handover explicit
    clockevents: Remove broadcast oneshot control leftovers
    sched/idle: Use explicit broadcast oneshot control function
    ARM: Tegra: Use explicit broadcast oneshot control function
    ARM: OMAP: Use explicit broadcast oneshot control function
    intel_idle: Use explicit broadcast oneshot control function
    ACPI/idle: Use explicit broadcast control function
    ACPI/PAD: Use explicit broadcast oneshot control function
    x86/amd/idle, clockevents: Use explicit broadcast oneshot control functions
    clockevents: Provide explicit broadcast oneshot control functions
    clockevents: Remove the broadcast control leftovers
    ARM: OMAP: Use explicit broadcast control function
    intel_idle: Use explicit broadcast control function
    cpuidle: Use explicit broadcast control function
    ACPI/processor: Use explicit broadcast control function
    ACPI/PAD: Use explicit broadcast control function
    ...

    Linus Torvalds
     
  • Pull scheduler changes from Ingo Molnar:
    "Major changes:

    - Reworked CPU capacity code, for better SMP load balancing on
    systems with asymmetric CPUs. (Vincent Guittot, Morten Rasmussen)

    - Reworked RT task SMP balancing to be push based instead of pull
    based, to reduce latencies on large CPU count systems. (Steven
    Rostedt)

    - SCHED_DEADLINE support updates and fixes. (Juri Lelli)

    - SCHED_DEADLINE task migration support during CPU hotplug. (Wanpeng Li)

    - x86 mwait-idle optimizations and fixes. (Mike Galbraith, Len Brown)

    - sched/numa improvements. (Rik van Riel)

    - various cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
    sched/core: Drop debugging leftover trace_printk call
    sched/deadline: Support DL task migration during CPU hotplug
    sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()
    sched/deadline: Always enqueue on previous rq when dl_task_timer() fires
    sched/core: Remove unused argument from init_[rt|dl]_rq()
    sched/deadline: Fix rt runtime corruption when dl fails its global constraints
    sched/deadline: Avoid a superfluous check
    sched: Improve load balancing in the presence of idle CPUs
    sched: Optimize freq invariant accounting
    sched: Move CFS tasks to CPUs with higher capacity
    sched: Add SD_PREFER_SIBLING for SMT level
    sched: Remove unused struct sched_group_capacity::capacity_orig
    sched: Replace capacity_factor by usage
    sched: Calculate CPU's usage statistic and put it into struct sg_lb_stats::group_usage
    sched: Add struct rq::cpu_capacity_orig
    sched: Make scale_rt invariant with frequency
    sched: Make sched entity usage tracking scale-invariant
    sched: Remove frequency scaling from cpu_capacity
    sched: Track group sched_entity usage contributions
    sched: Add sched_avg::utilization_avg_contrib
    ...

    Linus Torvalds
     
  • Pull core locking changes from Ingo Molnar:
    "Main changes:

    - jump label asm preparatory work for PowerPC (Anton Blanchard)

    - rwsem optimizations and cleanups (Davidlohr Bueso)

    - mutex optimizations and cleanups (Jason Low)

    - futex fix (Oleg Nesterov)

    - remove broken atomicity checks from {READ,WRITE}_ONCE() (Peter
    Zijlstra)"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    powerpc, jump_label: Include linux/jump_label.h to get HAVE_JUMP_LABEL define
    jump_label: Allow jump labels to be used in assembly
    jump_label: Allow asm/jump_label.h to be included in assembly
    locking/mutex: Further simplify mutex_spin_on_owner()
    locking: Remove atomicy checks from {READ,WRITE}_ONCE
    locking/rtmutex: Rename argument in the rt_mutex_adjust_prio_chain() documentation as well
    locking/rwsem: Fix lock optimistic spinning when owner is not running
    locking: Remove ACCESS_ONCE() usage
    locking/rwsem: Check for active lock before bailing on spinning
    locking/rwsem: Avoid deceiving lock spinners
    locking/rwsem: Set lock ownership ASAP
    locking/rwsem: Document barrier need when waking tasks
    locking/futex: Check PF_KTHREAD rather than !p->mm to filter out kthreads
    locking/mutex: Refactor mutex_spin_on_owner()
    locking/mutex: In mutex_spin_on_owner(), return true when owner changes

    Linus Torvalds
     
  • Pull KVM updates from Paolo Bonzini:
    "First batch of KVM changes for 4.1

    The most interesting bit here is irqfd/ioeventfd support for ARM and
    ARM64.

    Summary:

    ARM/ARM64:
    fixes for live migration, irqfd and ioeventfd support (enabling
    vhost, too), page aging

    s390:
    interrupt handling rework, allowing to inject all local interrupts
    via new ioctl and to get/set the full local irq state for migration
    and introspection. New ioctls to access memory by virtual address,
    and to get/set the guest storage keys. SIMD support.

    MIPS:
    FPU and MIPS SIMD Architecture (MSA) support. Includes some
    patches from Ralf Baechle's MIPS tree.

    x86:
    bugfixes (notably for pvclock, the others are small) and cleanups.
    Another small latency improvement for the TSC deadline timer"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
    KVM: use slowpath for cross page cached accesses
    kvm: mmu: lazy collapse small sptes into large sptes
    KVM: x86: Clear CR2 on VCPU reset
    KVM: x86: DR0-DR3 are not clear on reset
    KVM: x86: BSP in MSR_IA32_APICBASE is writable
    KVM: x86: simplify kvm_apic_map
    KVM: x86: avoid logical_map when it is invalid
    KVM: x86: fix mixed APIC mode broadcast
    KVM: x86: use MDA for interrupt matching
    kvm/ppc/mpic: drop unused IRQ_testbit
    KVM: nVMX: remove unnecessary double caching of MAXPHYADDR
    KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
    KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu
    KVM: vmx: pass error code with internal error #2
    x86: vdso: fix pvclock races with task migration
    KVM: remove kvm_read_hva and kvm_read_hva_atomic
    KVM: x86: optimize delivery of TSC deadline timer interrupt
    KVM: x86: extract blocking logic from __vcpu_run
    kvm: x86: fix x86 eflags fixed bit
    KVM: s390: migrate vcpu interrupt state
    ...

    Linus Torvalds
     

11 Apr, 2015

1 commit

  • irqchip core change for v4.1 (round 3) from Jason Cooper

    Purge the gic_arch_extn hacks and abuse by using the new stacked domains

    NOTE: Due to the nature of these changes, patches crossing subsystems have
    been kept together in their own branches.

    - tegra
    - Handle the LIC properly

    - omap
    - Convert crossbar to stacked domains
    - kill arm,routable-irqs in GIC binding

    - exynos
    - Convert PMU wakeup to stacked domains

    - shmobile, ux500, zynq (irq_set_wake branch)
    - Switch from abusing gic_arch_extn to using gic_set_irqchip_flags

    Thomas Gleixner
     

10 Apr, 2015

2 commits

  • Pull power management and ACPI fixes from Rafael Wysocki:
    "These are stable-candidate fixes of some recently reported issues in
    the cpufreq core, cpuidle core, the ACPI cpuidle driver and the
    hibernate core.

    Specifics:

    - Revert a 3.17 hibernate commit that was supposed to fix an issue
    related to e820 reserved regions, but broke resume from hibernation
    on Lenovo x230 (Rafael J Wysocki).

    - Prevent the ACPI cpuidle driver from overwriting the name and
    description of the C0 state set by the core when the list of
    C-states changes (Thomas Schlichter).

    - Remove the no longer needed state_count field from struct
    cpuidle_device which prevents the list of C-states shown by the
    sysfs interface from becoming incorrect when the current number of
    them is different from the number of C-states on boot (Bartlomiej
    Zolnierkiewicz).

    - The cpufreq core updates the policy object of the only online CPU
    during system resume to make it reflect the current hardware state,
    but it always assumes that CPU to be CPU0 which need not be the
    case, so fix the code to avoid that assumption (Viresh Kumar)"

    * tag 'pm+acpi-4.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions"
    cpuidle: ACPI: do not overwrite name and description of C0
    cpuidle: remove state_count field from struct cpuidle_device
    cpufreq: Schedule work for the first-online CPU on resume

    Linus Torvalds
     
  • * pm-sleep:
    Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions"

    * pm-cpufreq:
    cpufreq: Schedule work for the first-online CPU on resume

    * pm-cpuidle:
    cpuidle: ACPI: do not overwrite name and description of C0
    cpuidle: remove state_count field from struct cpuidle_device

    Rafael J. Wysocki
     

09 Apr, 2015

6 commits

  • Similar to what Linus suggested for rwsem_spin_on_owner(), in
    mutex_spin_on_owner() instead of having while (true) and
    breaking out of the spin loop on lock->owner != owner, we can
    have the loop directly check for while (lock->owner == owner) to
    improve the readability of the code.

    It also shrinks the code a bit:

       text    data     bss     dec     hex filename
       3721       0       0    3721     e89 mutex.o.before
       3705       0       0    3705     e79 mutex.o.after
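
    The shape of the change can be sketched like this (a simplified
    stand-in, not the kernel code, which also checks need_resched() and
    calls cpu_relax() in the loop body):

```c
#include <assert.h>
#include <stdbool.h>

struct mutex { void *owner; };

/* Stub: pretend the lock owner has been preempted, so spinning should
 * stop. The real kernel checks the owner task's run state. */
static bool owner_running(struct mutex *lock, void *owner)
{
    (void)lock; (void)owner;
    return false;
}

/* Before: open-coded infinite loop, exits via an explicit break. */
static bool spin_before(struct mutex *lock, void *owner)
{
    while (1) {
        if (lock->owner != owner)
            break;
        if (!owner_running(lock, owner))
            return false;
    }
    return true;
}

/* After: the loop condition states directly what we are waiting on. */
static bool spin_after(struct mutex *lock, void *owner)
{
    while (lock->owner == owner) {
        if (!owner_running(lock, owner))
            return false;
    }
    return true;
}
```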

    Signed-off-by: Jason Low
    Cc: Andrew Morton
    Cc: Aswin Chandramouleeswaran
    Cc: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tim Chen
    Link: http://lkml.kernel.org/r/1428521960-5268-2-git-send-email-jason.low2@hp.com
    [ Added code generation info. ]
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • Merge misc fixes from Andrew Morton:
    "Three fixes"

    * emailed patches from Andrew Morton :
    mm: numa: disable change protection for vma(VM_HUGETLB)
    include/linux/dmapool.h: declare struct device
    mm: move zone lock to a different cache line than order-0 free page lists

    Linus Torvalds
     
  • Unlike most (all?) other copies from user space, kernel module loading
    is almost unlimited in size. So we do a potentially huge
    "copy_from_user()" when we copy the module data from user space to the
    kernel buffer, which can be a latency concern when preemption is
    disabled (or voluntary).

    Also, because 'copy_from_user()' clears the tail of the kernel buffer on
    failures, even a *failed* copy can end up wasting a lot of time.

    Normally neither of these is a concern in real life, but they do
    trigger when doing stress-testing with trinity. Running in a VM seems
    to add its own overhead, causing trinity module load testing to even
    trigger the watchdog.

    The simple fix is to just chunk up the module loading, so that it never
    tries to copy insanely big areas in one go. That bounds the latency,
    and also the amount of (unnecessarily, in this case) cleared memory for
    the failure case.
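
    A userspace model of the fix (memcpy() stands in for copy_from_user(),
    and the chunk size here is illustrative, not the kernel's choice):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy in bounded chunks rather than one potentially huge copy. */
#define COPY_CHUNK_SIZE (16 * 1024)

static size_t copy_chunked(void *dst, const void *src, size_t len)
{
    size_t done = 0;

    while (done < len) {
        size_t n = len - done;

        if (n > COPY_CHUNK_SIZE)
            n = COPY_CHUNK_SIZE;
        /* In the kernel, a failing copy_from_user() here lets the loop
         * stop early, bounding both the latency and the amount of
         * needlessly cleared memory on the failure path. */
        memcpy((char *)dst + done, (const char *)src + done, n);
        done += n;
    }
    return done;
}
```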

    Reported-by: Sasha Levin
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • There are a number of cases where a kernel subsystem may want to
    introspect the state of an interrupt at the irqchip level:

    - When a peripheral is shared between virtual machines,
    its interrupt state becomes part of the guest's state,
    and must be switched accordingly. KVM on arm/arm64 requires
    this for its guest-visible timer.
    - Some GPIO controllers seem to require peeking into the
    interrupt controller they are connected to in order to report
    their internal state.

    This seems to be a pattern that is common enough for the core code
    to try and support without too many horrible hacks. Introduce
    a pair of accessors (irq_get_irqchip_state/irq_set_irqchip_state)
    to retrieve the bits that can be of interest to another subsystem:
    pending, active, and masked.

    - irq_get_irqchip_state returns the state of the interrupt according
    to a parameter set to IRQCHIP_STATE_PENDING, IRQCHIP_STATE_ACTIVE,
    IRQCHIP_STATE_MASKED or IRQCHIP_STATE_LINE_LEVEL.
    - irq_set_irqchip_state similarly sets the state of the interrupt.
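
    A toy userspace model of the accessor pair (the enum values mirror the
    description above; the storage and error code are invented for
    illustration, not the kernel implementation):

```c
#include <assert.h>
#include <stdbool.h>

enum irqchip_irq_state {
    IRQCHIP_STATE_PENDING,
    IRQCHIP_STATE_ACTIVE,
    IRQCHIP_STATE_MASKED,
    IRQCHIP_STATE_LINE_LEVEL,
};

/* Fake per-interrupt state table standing in for the irqchip hardware. */
static bool fake_irq_state[64][4];

static int irq_get_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
                                 bool *state)
{
    if (irq >= 64)
        return -1; /* the kernel would return a negative errno */
    *state = fake_irq_state[irq][which];
    return 0;
}

static int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
                                 bool state)
{
    if (irq >= 64)
        return -1;
    fake_irq_state[irq][which] = state;
    return 0;
}
```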

    Signed-off-by: Marc Zyngier
    Reviewed-by: Bjorn Andersson
    Tested-by: Bjorn Andersson
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Abhijeet Dharmapurikar
    Cc: Stephen Boyd
    Cc: Phong Vo
    Cc: Linus Walleij
    Cc: Tin Huynh
    Cc: Y Vo
    Cc: Toan Le
    Cc: Bjorn Andersson
    Cc: Jason Cooper
    Cc: Arnd Bergmann
    Link: http://lkml.kernel.org/r/1426676484-21812-2-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
     
  • While debugging an unrelated issue with the GICv3 ITS driver, the
    following trace triggered:

    WARNING: CPU: 1 PID: 1 at kernel/irq/irqdomain.c:1121 irq_domain_free_irqs+0x160/0x17c()
    NULL pointer, cannot free irq
    Modules linked in:
    CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 3.19.0-rc6+ #3690
    Hardware name: FVP Base (DT)
    Call trace:
    [] dump_backtrace+0x0/0x13c
    [] show_stack+0x10/0x1c
    [] dump_stack+0x74/0x94
    [] warn_slowpath_common+0x9c/0xd4
    [] warn_slowpath_fmt+0x5c/0x80
    [] irq_domain_free_irqs+0x15c/0x17c
    [] msi_domain_free_irqs+0x58/0x74
    [] free_msi_irqs+0xb4/0x1c0

    // The msi_prepare callback fails here

    [] pci_enable_msix+0x25c/0x3d4
    [] pci_enable_msix_range+0x34/0x80
    [] vp_try_to_find_vqs+0xec/0x528
    [] vp_find_vqs+0x6c/0xa8
    [] init_vq+0x120/0x248
    [] virtblk_probe+0xb0/0x6bc
    [] virtio_dev_probe+0x17c/0x214
    [] driver_probe_device+0x7c/0x23c
    [] __driver_attach+0x98/0xa0
    [] bus_for_each_dev+0x60/0xb4
    [] driver_attach+0x1c/0x28
    [] bus_add_driver+0x150/0x208
    [] driver_register+0x64/0x130
    [] register_virtio_driver+0x24/0x68
    [] init+0x70/0xac
    [] do_one_initcall+0x94/0x1d0
    [] kernel_init_freeable+0x144/0x1e4
    [] kernel_init+0xc/0xd8
    ---[ end trace f9ee562a77cc7bae ]---

    The ITS msi_prepare callback having failed, we end up trying to
    free MSIs that have never been allocated. Oddly enough, the kernel
    is pretty upset about it.

    It turns out that this behaviour was expected before the MSI domain
    was introduced (and dealt with in arch_teardown_msi_irqs).

    The obvious fix is to detect this early enough and bail out.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Jiang Liu
    Link: http://lkml.kernel.org/r/1422299419-6051-1-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
     
  • conflict with pending GIC changes.

    Conflicts:
    drivers/usb/isp1760/isp1760-core.c

    Thomas Gleixner
     

08 Apr, 2015

1 commit

  • Currently when a process accesses a hugetlb range protected with
    PROTNONE, unexpected COWs are triggered, which finally puts the hugetlb
    subsystem into a broken/uncontrollable state, where for example
    h->resv_huge_pages is subtracted too much and wraps around to a very
    large number, and the free hugepage pool is no longer maintainable.

    This patch simply stops changing protection for vma(VM_HUGETLB) to fix
    the problem. And this also allows us to avoid useless overhead of minor
    faults.

    Signed-off-by: Naoya Horiguchi
    Suggested-by: Mel Gorman
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: David Rientjes
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

07 Apr, 2015

1 commit


06 Apr, 2015

1 commit

  • The sysfs code usually belongs at the bottom of the file, since it
    deals with high-level objects. In the workqueue code it is misplaced,
    and we would need to work around function references to allow the
    sysfs code to call APIs like apply_workqueue_attrs().

    Let's move that block further down in the file, almost to the bottom,
    and declare workqueue_sysfs_unregister() just before
    destroy_workqueue(), which references it.

    tj: Moved workqueue_sysfs_unregister() forward declaration where other
    forward declarations are.

    Suggested-by: Tejun Heo
    Cc: Christoph Lameter
    Cc: Kevin Hilman
    Cc: Lai Jiangshan
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Tejun Heo
    Cc: Viresh Kumar
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Frederic Weisbecker
     

03 Apr, 2015

12 commits

  • Some braces in tick_freeze() are not necessary, so drop them.

    Signed-off-by: Rafael J. Wysocki
    Cc: peterz@infradead.org
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1534128.H5hN3KBFB4@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Rafael J. Wysocki
     
  • A recent conflict resolution has left tick_resume() in
    tick_unfreeze() which leads to an unbalanced execution of
    tick_resume_broadcast() every time that function runs.

    Fix that by replacing the tick_resume() in tick_unfreeze()
    with tick_resume_local() as appropriate.

    Signed-off-by: Rafael J. Wysocki
    Cc: boris.ostrovsky@oracle.com
    Cc: david.vrabel@citrix.com
    Cc: konrad.wilk@oracle.com
    Cc: peterz@infradead.org
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/8099075.V0LvN3pQAV@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Rafael J. Wysocki
     
  • Commit:

    3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()")

    left a trace_printk() debugging call in, and Steve's banner screamed
    in dmesg. Remove it.

    Signed-off-by: Borislav Petkov
    Cc: Juri Lelli
    Cc: Juri Lelli
    Cc: Peter Zijlstra (Intel)
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1428050570-21041-1-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • Arch specific management of xtime/jiffies/wall_to_monotonic is
    gone for quite a while. Zap the stale comment.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Acked-by: John Stultz
    Cc: Peter Zijlstra
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/2422730.dmO29q661S@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the cleanup function for a dead cpu and invoke it
    directly from the cpu down code. Make it conditional on
    CPU_HOTPLUG as well.

    Temporary change, will be refined in the future.
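
    The multiplex-versus-explicit distinction can be sketched as follows
    (hypothetical names, userspace model; not the kernel source):

```c
#include <assert.h>
#include <stddef.h>

/* Old style: one multiplex entry point that switches on a reason code,
 * obscuring what actually happens at each call site. */
enum notify_reason { NOTIFY_CPU_DEAD, NOTIFY_CPU_DYING };

static int dead_cleanups;

static void clockevents_notify_model(enum notify_reason reason, void *arg)
{
    (void)arg;
    switch (reason) {
    case NOTIFY_CPU_DEAD:
        dead_cleanups++;
        break;
    default:
        break;
    }
}

/* New style: an explicit, self-describing call invoked directly from
 * the CPU-down path (and compiled conditionally on CPU hotplug). */
static void tick_cleanup_dead_cpu_model(int cpu)
{
    (void)cpu;
    dead_cleanups++;
}
```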

    Signed-off-by: Thomas Gleixner
    [ Rebased, added clockevents_notify() removal ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the tick_handover call and invoke it explicitly from
    the hotplug code. This is a temporary solution that will be cleaned
    up in later patches.

    Signed-off-by: Thomas Gleixner
    [ Rebase ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: John Stultz
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1658173.RkEEILFiQZ@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Now that all users are converted over to explicit calls into the
    clockevents state machine, remove the notification chain leftovers.

    Original-from: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/14018863.NQUzkFuafr@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Rafael J. Wysocki
     
  • Replace the clockevents_notify() call with an explicit function call.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/6422336.RMm7oUHcXh@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the broadcast oneshot control into a separate function
    and provide inline helpers. Switch clockevents_notify() over.
    This will go away once all callers are converted.

    This also gets rid of the nested locking of clockevents_lock and
    broadcast_lock. The broadcast oneshot control functions do not
    require clockevents_lock. Only the managing functions
    (setup/shutdown/suspend/resume of the broadcast device) require
    clockevents_lock.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Alexandre Courbot
    Cc: Daniel Lezcano
    Cc: Len Brown
    Cc: Peter Zijlstra
    Cc: Stephen Warren
    Cc: Thierry Reding
    Cc: Tony Lindgren
    Link: http://lkml.kernel.org/r/13000649.8qZuEDV0OA@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • All users converted. Remove the notify leftovers.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/2076318.76XJZ8QYP3@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the broadcast control into a separate function and
    provide inline helpers. Switch clockevents_notify() over. This
    will go away once all callers are converted.

    This also gets rid of the nested locking of clockevents_lock and
    broadcast_lock. The broadcast control functions do not require
    clockevents_lock. Only the managing functions
    (setup/shutdown/suspend/resume of the broadcast device) require
    clockevents_lock.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Daniel Lezcano
    Cc: Len Brown
    Cc: Peter Zijlstra
    Cc: Tony Lindgren
    Link: http://lkml.kernel.org/r/8086559.ttsuS0n1Xr@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Ingo noted that the description of clocks_calc_max_nsecs()'s
    50% safety margin was somewhat circular. So this patch tries
    to improve the comment to better explain what we mean by the
    50% safety margin and why we need it.

    Signed-off-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1427945681-29972-20-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz