15 Apr, 2015

9 commits

  • Have kvm_guest_init() use hardlockup_detector_disable() instead of
    watchdog_enable_hardlockup_detector(false).

    Remove the watchdog_hardlockup_detector_is_enabled() and the
    watchdog_enable_hardlockup_detector() function which are no longer needed.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • Rename the update_timers*() functions to update_watchdog*().

    Remove the boolean argument from watchdog_enable_all_cpus() because
    update_watchdog_all_cpus() is now a generic function to change the run
    state of the lockup detectors and to have the lockup detectors use a new
    sample period.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • With the current user interface of the watchdog mechanism it is only
    possible to disable or enable both lockup detectors at the same time.
    This series introduces new kernel parameters and changes the semantics of
    some existing kernel parameters, so that the hard lockup detector and the
    soft lockup detector can be disabled or enabled individually. With this
    series applied, the user interface is as follows.

    - parameters in /proc/sys/kernel

    . soft_watchdog
    This is a new parameter to control and examine the run state of
    the soft lockup detector.

    . nmi_watchdog
    The semantics of this parameter have changed. It can now be used
    to control and examine the run state of the hard lockup detector.

    . watchdog
    This parameter is still available to control the run state of both
    lockup detectors at the same time. If this parameter is examined,
    it shows the logical OR of soft_watchdog and nmi_watchdog.

    . watchdog_thresh
    The semantics of this parameter are not affected by the patch.

    - kernel command line parameters

    . nosoftlockup
    The semantics of this parameter have changed. It can now be used
    to disable the soft lockup detector at boot time.

    . nmi_watchdog=0 or nmi_watchdog=1
    Disable or enable the hard lockup detector at boot time. The patch
    introduces '=1' as a new option.

    . nowatchdog
    The semantics of this parameter are not affected by the patch. It
    is still available to disable both lockup detectors at boot time.

    Also, remove the proc_dowatchdog() function which is no longer needed.
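
    As a rough userspace sketch of the semantics above (hypothetical names
    and bit layout, not the kernel source), reading 'watchdog' reduces to a
    logical OR of the two per-detector bits:

```c
#include <assert.h>

/* Hypothetical bit layout for the two run-state bits; the real kernel
 * constants may be named and ordered differently. */
#define NMI_WATCHDOG_ENABLED  (1 << 0)  /* hard lockup detector */
#define SOFT_WATCHDOG_ENABLED (1 << 1)  /* soft lockup detector */

static unsigned long watchdog_enabled = NMI_WATCHDOG_ENABLED | SOFT_WATCHDOG_ENABLED;

/* /proc/sys/kernel/nmi_watchdog: run state of the hard lockup detector. */
static int read_nmi_watchdog(void)  { return !!(watchdog_enabled & NMI_WATCHDOG_ENABLED); }

/* /proc/sys/kernel/soft_watchdog: run state of the soft lockup detector. */
static int read_soft_watchdog(void) { return !!(watchdog_enabled & SOFT_WATCHDOG_ENABLED); }

/* /proc/sys/kernel/watchdog: logical OR of the two bits above. */
static int read_watchdog(void)      { return read_nmi_watchdog() | read_soft_watchdog(); }
```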

    [dzickus@redhat.com: wrote changelog]
    [dzickus@redhat.com: update documentation for kernel params and sysctl]
    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • If watchdog_nmi_enable() fails to set up the hardware perf event of one
    CPU, the entire hard lockup detector is deemed unreliable. Hence, disable
    the hard lockup detector and shut down the hardware perf events on all
    CPUs.

    [dzickus@redhat.com: update comments to explain some code]
    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • Separate handlers for each watchdog parameter in /proc/sys/kernel replace
    the proc_dowatchdog() function. Three of those handlers merely call
    proc_watchdog_common() with one different argument.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • Three of four handlers for the watchdog parameters in /proc/sys/kernel
    essentially have to do the same thing.

    if the parameter is being read {
            return the state of the corresponding bit(s) in 'watchdog_enabled'
    } else {
            set/clear the state of the corresponding bit(s) in 'watchdog_enabled'
            update the run state of the lockup detector(s)
    }

    Hence, introduce a common function that can be called by those handlers.
    The callers pass a 'bit mask' to this function to indicate which bit(s)
    should be set/cleared in 'watchdog_enabled'.

    This function handles an uncommon race with watchdog_nmi_enable() where a
    concurrent update of 'watchdog_enabled' is possible. We use 'cmpxchg' to
    detect the concurrency. [This avoids introducing a new spinlock or a
    mutex to synchronize updates of 'watchdog_enabled'. Using the same lock
    or mutex in watchdog thread context and in system call context needs to be
    considered carefully because it can make the code prone to deadlock
    situations in connection with parking/unparking the watchdog threads.]
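
    A minimal userspace sketch of that cmpxchg-based detection follows
    (hypothetical names; C11 atomics stand in for the kernel's cmpxchg):

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the kernel's 'watchdog_enabled' word. */
static _Atomic unsigned long watchdog_enabled;

/* Hypothetical common handler body: set or clear the 'which' bits, and
 * report -1 if a concurrent update (e.g. from watchdog_nmi_enable())
 * changed the word between the read and the store. */
static int watchdog_update_bits(unsigned long which, int enable)
{
    unsigned long old = atomic_load(&watchdog_enabled);
    unsigned long newval = enable ? (old | which) : (old & ~which);

    /* cmpxchg: the store only happens if the word still equals 'old'. */
    if (!atomic_compare_exchange_strong(&watchdog_enabled, &old, newval))
        return -1; /* concurrency detected; a caller could retry */
    return 0;
}
```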

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • This series removes proc_dowatchdog(). Since multiple new functions need
    the 'watchdog_proc_mutex' to serialize access to the watchdog parameters
    in /proc/sys/kernel, move the mutex outside of any function.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • This series introduces a separate handler for each watchdog parameter in
    /proc/sys/kernel. The separate handlers need a common function that they
    can call to update the run state of the lockup detectors, or to have the
    lockup detectors use a new sample period.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • The hardlockup and softlockup detectors have always been tied together.
    The KVM folks requested the ability to have one enabled but not the
    other. Internally, rework the code to split things apart more cleanly.

    There is a bunch of churn here, but the end result should be code that
    is easier to maintain and fix without knowing the internals of what is
    going on.

    This patch (of 9):

    Introduce new definitions and variables to separate the user interface in
    /proc/sys/kernel from the internal run state of the lockup detectors. The
    internal run state is represented by two bits in a new variable that is
    named 'watchdog_enabled'. This helps simplify the code, for example:

    - In order to check if either of the two lockup detectors is enabled,
    it is sufficient to check whether 'watchdog_enabled' is non-zero.

    - In order to enable/disable one or both lockup detectors,
    it is sufficient to set/clear one or both bits in 'watchdog_enabled'.

    - Concurrent updates of 'watchdog_enabled' need not be synchronized via
    a spinlock or a mutex. Updates can either be atomic or concurrency can
    be detected by using 'cmpxchg'.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     

14 Apr, 2015

7 commits

  • Pull cgroup updates from Tejun Heo:
    "Nothing too interesting. Rik made cpuset cooperate better with
    isolcpus and there are several other cleanup patches"

    * 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cpuset, isolcpus: document relationship between cpusets & isolcpus
    cpusets, isolcpus: exclude isolcpus from load balancing in cpusets
    sched, isolcpu: make cpu_isolated_map visible outside scheduler
    cpuset: initialize cpuset a bit early
    cgroup: Use kvfree in pidlist_free()
    cgroup: call cgroup_subsys->bind on cgroup subsys initialization

    Linus Torvalds
     
  • Pull workqueue updates from Tejun Heo:
    "Workqueue now prints debug information at the end of sysrq-t which
    should be helpful when tracking down suspected workqueue stalls. It
    only prints out the ones with something currently going on so it
    shouldn't add much output in most cases"

    * 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: Reorder sysfs code
    percpu: Fix trivial typos in comments
    workqueue: dump workqueues on sysrq-t
    workqueue: keep track of the flushing task and pool manager
    workqueue: make the workqueues list RCU walkable

    Linus Torvalds
     
  • Pull irq core updates from Thomas Gleixner:
    "Managerial summary:

    Core code:
    - final removal of IRQF_DISABLED
    - new state save/restore functions for virtualization support
    - wakeup support for stacked irqdomains
    - new function to solve the netpoll synchronization problem

    irqchips:
    - new driver for STi based devices
    - new driver for Vybrid MSCM
    - massive cleanup of the GIC driver by moving the GIC-addons to
    stacked irqdomains
    - the usual pile of fixes and updates to the various chip drivers"

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
    irqchip: GICv3: Add support for irq_[get, set]_irqchip_state()
    irqchip: GIC: Add support for irq_[get, set]_irqchip_state()
    genirq: Allow the irqchip state of an IRQ to be save/restored
    genirq: MSI: Fix freeing of unallocated MSI
    irqchip: renesas-irqc: Add wake-up support
    irqchip: armada-370-xp: Allow using wakeup source
    irqchip: mips-gic: Add new functions to start/stop the GIC counter
    irqchip: tegra: Add Tegra210 support
    irqchip: digicolor: Move digicolor_set_gc to init section
    irqchip: renesas-irqc: Add functional clock to bindings
    irqchip: renesas-irqc: Add minimal runtime PM support
    irqchip: renesas-irqc: Add more register documentation
    DT: exynos: update PMU binding
    ARM: exynos4/5: convert pmu wakeup to stacked domains
    irqchip: gic: Don't complain in gic_get_cpumask() if UP system
    ARM: zynq: switch from gic_arch_extn to gic_set_irqchip_flags
    ARM: ux500: switch from gic_arch_extn to gic_set_irqchip_flags
    ARM: shmobile: remove use of gic_arch_extn.irq_set_wake
    irqchip: gic: Add an entry point to set up irqchip flags
    ARM: omap: convert wakeupgen to stacked domains
    ...

    Linus Torvalds
     
  • Pull timer updates from Ingo Molnar:
    "The main changes in this cycle were:

    - clockevents state machine cleanups and enhancements (Viresh Kumar)

    - clockevents broadcast notifier horror to state machine conversion
    and related cleanups (Thomas Gleixner, Rafael J Wysocki)

    - clocksource and timekeeping core updates (John Stultz)

    - clocksource driver updates and fixes (Ben Dooks, Dmitry Osipenko,
    Hans de Goede, Laurent Pinchart, Maxime Ripard, Xunlei Pang)

    - y2038 fixes (Xunlei Pang, John Stultz)

    - NMI-safe ktime_get_raw_fast() and general refactoring of the clock
    code, in preparation to perf's per event clock ID support (Peter
    Zijlstra)

    - generic sched/clock fixes, optimizations and cleanups (Daniel
    Thompson)

    - clockevents cpu_down() race fix (Preeti U Murthy)"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits)
    timers/PM: Drop unnecessary braces from tick_freeze()
    timers/PM: Fix up tick_unfreeze()
    timekeeping: Get rid of stale comment
    clockevents: Cleanup dead cpu explicitely
    clockevents: Make tick handover explicit
    clockevents: Remove broadcast oneshot control leftovers
    sched/idle: Use explicit broadcast oneshot control function
    ARM: Tegra: Use explicit broadcast oneshot control function
    ARM: OMAP: Use explicit broadcast oneshot control function
    intel_idle: Use explicit broadcast oneshot control function
    ACPI/idle: Use explicit broadcast control function
    ACPI/PAD: Use explicit broadcast oneshot control function
    x86/amd/idle, clockevents: Use explicit broadcast oneshot control functions
    clockevents: Provide explicit broadcast oneshot control functions
    clockevents: Remove the broadcast control leftovers
    ARM: OMAP: Use explicit broadcast control function
    intel_idle: Use explicit broadcast control function
    cpuidle: Use explicit broadcast control function
    ACPI/processor: Use explicit broadcast control function
    ACPI/PAD: Use explicit broadcast control function
    ...

    Linus Torvalds
     
  • Pull scheduler changes from Ingo Molnar:
    "Major changes:

    - Reworked CPU capacity code, for better SMP load balancing on
    systems with asymmetric CPUs. (Vincent Guittot, Morten Rasmussen)

    - Reworked RT task SMP balancing to be push based instead of pull
    based, to reduce latencies on large CPU count systems. (Steven
    Rostedt)

    - SCHED_DEADLINE support updates and fixes. (Juri Lelli)

    - SCHED_DEADLINE task migration support during CPU hotplug. (Wanpeng Li)

    - x86 mwait-idle optimizations and fixes. (Mike Galbraith, Len Brown)

    - sched/numa improvements. (Rik van Riel)

    - various cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
    sched/core: Drop debugging leftover trace_printk call
    sched/deadline: Support DL task migration during CPU hotplug
    sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()
    sched/deadline: Always enqueue on previous rq when dl_task_timer() fires
    sched/core: Remove unused argument from init_[rt|dl]_rq()
    sched/deadline: Fix rt runtime corruption when dl fails its global constraints
    sched/deadline: Avoid a superfluous check
    sched: Improve load balancing in the presence of idle CPUs
    sched: Optimize freq invariant accounting
    sched: Move CFS tasks to CPUs with higher capacity
    sched: Add SD_PREFER_SIBLING for SMT level
    sched: Remove unused struct sched_group_capacity::capacity_orig
    sched: Replace capacity_factor by usage
    sched: Calculate CPU's usage statistic and put it into struct sg_lb_stats::group_usage
    sched: Add struct rq::cpu_capacity_orig
    sched: Make scale_rt invariant with frequency
    sched: Make sched entity usage tracking scale-invariant
    sched: Remove frequency scaling from cpu_capacity
    sched: Track group sched_entity usage contributions
    sched: Add sched_avg::utilization_avg_contrib
    ...

    Linus Torvalds
     
  • Pull core locking changes from Ingo Molnar:
    "Main changes:

    - jump label asm preparatory work for PowerPC (Anton Blanchard)

    - rwsem optimizations and cleanups (Davidlohr Bueso)

    - mutex optimizations and cleanups (Jason Low)

    - futex fix (Oleg Nesterov)

    - remove broken atomicity checks from {READ,WRITE}_ONCE() (Peter
    Zijlstra)"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    powerpc, jump_label: Include linux/jump_label.h to get HAVE_JUMP_LABEL define
    jump_label: Allow jump labels to be used in assembly
    jump_label: Allow asm/jump_label.h to be included in assembly
    locking/mutex: Further simplify mutex_spin_on_owner()
    locking: Remove atomicy checks from {READ,WRITE}_ONCE
    locking/rtmutex: Rename argument in the rt_mutex_adjust_prio_chain() documentation as well
    locking/rwsem: Fix lock optimistic spinning when owner is not running
    locking: Remove ACCESS_ONCE() usage
    locking/rwsem: Check for active lock before bailing on spinning
    locking/rwsem: Avoid deceiving lock spinners
    locking/rwsem: Set lock ownership ASAP
    locking/rwsem: Document barrier need when waking tasks
    locking/futex: Check PF_KTHREAD rather than !p->mm to filter out kthreads
    locking/mutex: Refactor mutex_spin_on_owner()
    locking/mutex: In mutex_spin_on_owner(), return true when owner changes

    Linus Torvalds
     
  • Pull KVM updates from Paolo Bonzini:
    "First batch of KVM changes for 4.1

    The most interesting bit here is irqfd/ioeventfd support for ARM and
    ARM64.

    Summary:

    ARM/ARM64:
    fixes for live migration, irqfd and ioeventfd support (enabling
    vhost, too), page aging

    s390:
    interrupt handling rework, allowing to inject all local interrupts
    via new ioctl and to get/set the full local irq state for migration
    and introspection. New ioctls to access memory by virtual address,
    and to get/set the guest storage keys. SIMD support.

    MIPS:
    FPU and MIPS SIMD Architecture (MSA) support. Includes some
    patches from Ralf Baechle's MIPS tree.

    x86:
    bugfixes (notably for pvclock, the others are small) and cleanups.
    Another small latency improvement for the TSC deadline timer"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
    KVM: use slowpath for cross page cached accesses
    kvm: mmu: lazy collapse small sptes into large sptes
    KVM: x86: Clear CR2 on VCPU reset
    KVM: x86: DR0-DR3 are not clear on reset
    KVM: x86: BSP in MSR_IA32_APICBASE is writable
    KVM: x86: simplify kvm_apic_map
    KVM: x86: avoid logical_map when it is invalid
    KVM: x86: fix mixed APIC mode broadcast
    KVM: x86: use MDA for interrupt matching
    kvm/ppc/mpic: drop unused IRQ_testbit
    KVM: nVMX: remove unnecessary double caching of MAXPHYADDR
    KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
    KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu
    KVM: vmx: pass error code with internal error #2
    x86: vdso: fix pvclock races with task migration
    KVM: remove kvm_read_hva and kvm_read_hva_atomic
    KVM: x86: optimize delivery of TSC deadline timer interrupt
    KVM: x86: extract blocking logic from __vcpu_run
    kvm: x86: fix x86 eflags fixed bit
    KVM: s390: migrate vcpu interrupt state
    ...

    Linus Torvalds
     

11 Apr, 2015

1 commit

  • irqchip core change for v4.1 (round 3) from Jason Cooper

    Purge the gic_arch_extn hacks and abuse by using the new stacked domains

    NOTE: Due to the nature of these changes, patches crossing subsystems have
    been kept together in their own branches.

    - tegra
    - Handle the LIC properly

    - omap
    - Convert crossbar to stacked domains
    - kill arm,routable-irqs in GIC binding

    - exynos
    - Convert PMU wakeup to stacked domains

    - shmobile, ux500, zynq (irq_set_wake branch)
    - Switch from abusing gic_arch_extn to using gic_set_irqchip_flags

    Thomas Gleixner
     

10 Apr, 2015

2 commits

  • Pull power management and ACPI fixes from Rafael Wysocki:
    "These are stable-candidate fixes of some recently reported issues in
    the cpufreq core, cpuidle core, the ACPI cpuidle driver and the
    hibernate core.

    Specifics:

    - Revert a 3.17 hibernate commit that was supposed to fix an issue
    related to e820 reserved regions, but broke resume from hibernation
    on Lenovo x230 (Rafael J Wysocki).

    - Prevent the ACPI cpuidle driver from overwriting the name and
    description of the C0 state set by the core when the list of
    C-states changes (Thomas Schlichter).

    - Remove the no longer needed state_count field from struct
    cpuidle_device which prevents the list of C-states shown by the
    sysfs interface from becoming incorrect when the current number of
    them is different from the number of C-states on boot (Bartlomiej
    Zolnierkiewicz).

    - The cpufreq core updates the policy object of the only online CPU
    during system resume to make it reflect the current hardware state,
    but it always assumes that CPU to be CPU0 which need not be the
    case, so fix the code to avoid that assumption (Viresh Kumar)"

    * tag 'pm+acpi-4.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions"
    cpuidle: ACPI: do not overwrite name and description of C0
    cpuidle: remove state_count field from struct cpuidle_device
    cpufreq: Schedule work for the first-online CPU on resume

    Linus Torvalds
     
  • * pm-sleep:
    Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions"

    * pm-cpufreq:
    cpufreq: Schedule work for the first-online CPU on resume

    * pm-cpuidle:
    cpuidle: ACPI: do not overwrite name and description of C0
    cpuidle: remove state_count field from struct cpuidle_device

    Rafael J. Wysocki
     

09 Apr, 2015

6 commits

  • Similar to what Linus suggested for rwsem_spin_on_owner(), in
    mutex_spin_on_owner() instead of having while (true) and
    breaking out of the spin loop on lock->owner != owner, we can
    have the loop directly check for while (lock->owner == owner) to
    improve the readability of the code.

    It also shrinks the code a bit:

       text    data     bss     dec     hex filename
       3721       0       0    3721     e89 mutex.o.before
       3705       0       0    3705     e79 mutex.o.after
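
    The shape of the change can be sketched like this (a simplified
    stand-in, not the kernel code, which also checks need_resched() and
    calls cpu_relax() in the loop body):

```c
#include <assert.h>
#include <stdbool.h>

struct mutex { void *owner; };

/* Stub: pretend the lock owner has been preempted, so spinning should
 * stop. The real kernel checks the owner task's run state. */
static bool owner_running(struct mutex *lock, void *owner)
{
    (void)lock; (void)owner;
    return false;
}

/* Before: open-coded infinite loop, exits via an explicit break. */
static bool spin_before(struct mutex *lock, void *owner)
{
    while (1) {
        if (lock->owner != owner)
            break;
        if (!owner_running(lock, owner))
            return false;
    }
    return true;
}

/* After: the loop condition states directly what we are waiting on. */
static bool spin_after(struct mutex *lock, void *owner)
{
    while (lock->owner == owner) {
        if (!owner_running(lock, owner))
            return false;
    }
    return true;
}
```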

    Signed-off-by: Jason Low
    Cc: Andrew Morton
    Cc: Aswin Chandramouleeswaran
    Cc: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tim Chen
    Link: http://lkml.kernel.org/r/1428521960-5268-2-git-send-email-jason.low2@hp.com
    [ Added code generation info. ]
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • Merge misc fixes from Andrew Morton:
    "Three fixes"

    * emailed patches from Andrew Morton :
    mm: numa: disable change protection for vma(VM_HUGETLB)
    include/linux/dmapool.h: declare struct device
    mm: move zone lock to a different cache line than order-0 free page lists

    Linus Torvalds
     
  • Unlike most (all?) other copies from user space, kernel module loading
    is almost unlimited in size. So we do a potentially huge
    "copy_from_user()" when we copy the module data from user space to the
    kernel buffer, which can be a latency concern when preemption is
    disabled (or voluntary).

    Also, because 'copy_from_user()' clears the tail of the kernel buffer on
    failures, even a *failed* copy can end up wasting a lot of time.

    Normally neither of these is a concern in real life, but they do
    trigger when doing stress-testing with trinity. Running in a VM seems
    to add its own overhead, causing trinity module load testing to even
    trigger the watchdog.

    The simple fix is to just chunk up the module loading, so that it never
    tries to copy insanely big areas in one go. That bounds the latency,
    and also the amount of (unnecessarily, in this case) cleared memory for
    the failure case.
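
    A userspace model of the fix (memcpy() stands in for copy_from_user(),
    and the chunk size here is illustrative, not the kernel's choice):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy in bounded chunks rather than one potentially huge copy. */
#define COPY_CHUNK_SIZE (16 * 1024)

static size_t copy_chunked(void *dst, const void *src, size_t len)
{
    size_t done = 0;

    while (done < len) {
        size_t n = len - done;

        if (n > COPY_CHUNK_SIZE)
            n = COPY_CHUNK_SIZE;
        /* In the kernel, a failing copy_from_user() here lets the loop
         * stop early, bounding both the latency and the amount of
         * needlessly cleared memory on the failure path. */
        memcpy((char *)dst + done, (const char *)src + done, n);
        done += n;
    }
    return done;
}
```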

    Reported-by: Sasha Levin
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • There are a number of cases where a kernel subsystem may want to
    introspect the state of an interrupt at the irqchip level:

    - When a peripheral is shared between virtual machines,
    its interrupt state becomes part of the guest's state,
    and must be switched accordingly. KVM on arm/arm64 requires
    this for its guest-visible timer.
    - Some GPIO controllers seem to require peeking into the
    interrupt controller they are connected to in order to report
    their internal state.

    This seems to be a pattern that is common enough for the core code
    to try and support without too many horrible hacks. Introduce
    a pair of accessors (irq_get_irqchip_state/irq_set_irqchip_state)
    to retrieve the bits that can be of interest to another subsystem:
    pending, active, and masked.

    - irq_get_irqchip_state returns the state of the interrupt according
    to a parameter set to IRQCHIP_STATE_PENDING, IRQCHIP_STATE_ACTIVE,
    IRQCHIP_STATE_MASKED or IRQCHIP_STATE_LINE_LEVEL.
    - irq_set_irqchip_state similarly sets the state of the interrupt.
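
    A toy userspace model of the accessor pair (the enum values mirror the
    description above; the storage and error code are invented for
    illustration, not the kernel implementation):

```c
#include <assert.h>
#include <stdbool.h>

enum irqchip_irq_state {
    IRQCHIP_STATE_PENDING,
    IRQCHIP_STATE_ACTIVE,
    IRQCHIP_STATE_MASKED,
    IRQCHIP_STATE_LINE_LEVEL,
};

/* Fake per-interrupt state table standing in for the irqchip hardware. */
static bool fake_irq_state[64][4];

static int irq_get_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
                                 bool *state)
{
    if (irq >= 64)
        return -1; /* the kernel would return a negative errno */
    *state = fake_irq_state[irq][which];
    return 0;
}

static int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
                                 bool state)
{
    if (irq >= 64)
        return -1;
    fake_irq_state[irq][which] = state;
    return 0;
}
```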

    Signed-off-by: Marc Zyngier
    Reviewed-by: Bjorn Andersson
    Tested-by: Bjorn Andersson
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Abhijeet Dharmapurikar
    Cc: Stephen Boyd
    Cc: Phong Vo
    Cc: Linus Walleij
    Cc: Tin Huynh
    Cc: Y Vo
    Cc: Toan Le
    Cc: Bjorn Andersson
    Cc: Jason Cooper
    Cc: Arnd Bergmann
    Link: http://lkml.kernel.org/r/1426676484-21812-2-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
     
  • While debugging an unrelated issue with the GICv3 ITS driver, the
    following trace triggered:

    WARNING: CPU: 1 PID: 1 at kernel/irq/irqdomain.c:1121 irq_domain_free_irqs+0x160/0x17c()
    NULL pointer, cannot free irq
    Modules linked in:
    CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 3.19.0-rc6+ #3690
    Hardware name: FVP Base (DT)
    Call trace:
    [] dump_backtrace+0x0/0x13c
    [] show_stack+0x10/0x1c
    [] dump_stack+0x74/0x94
    [] warn_slowpath_common+0x9c/0xd4
    [] warn_slowpath_fmt+0x5c/0x80
    [] irq_domain_free_irqs+0x15c/0x17c
    [] msi_domain_free_irqs+0x58/0x74
    [] free_msi_irqs+0xb4/0x1c0

    // The msi_prepare callback fails here

    [] pci_enable_msix+0x25c/0x3d4
    [] pci_enable_msix_range+0x34/0x80
    [] vp_try_to_find_vqs+0xec/0x528
    [] vp_find_vqs+0x6c/0xa8
    [] init_vq+0x120/0x248
    [] virtblk_probe+0xb0/0x6bc
    [] virtio_dev_probe+0x17c/0x214
    [] driver_probe_device+0x7c/0x23c
    [] __driver_attach+0x98/0xa0
    [] bus_for_each_dev+0x60/0xb4
    [] driver_attach+0x1c/0x28
    [] bus_add_driver+0x150/0x208
    [] driver_register+0x64/0x130
    [] register_virtio_driver+0x24/0x68
    [] init+0x70/0xac
    [] do_one_initcall+0x94/0x1d0
    [] kernel_init_freeable+0x144/0x1e4
    [] kernel_init+0xc/0xd8
    ---[ end trace f9ee562a77cc7bae ]---

    The ITS msi_prepare callback having failed, we end up trying to
    free MSIs that have never been allocated. Oddly enough, the kernel
    is pretty upset about it.

    It turns out that this behaviour was expected before the MSI domain
    was introduced (and dealt with in arch_teardown_msi_irqs).

    The obvious fix is to detect this early enough and bail out.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Jiang Liu
    Link: http://lkml.kernel.org/r/1422299419-6051-1-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
     
  • conflict with pending GIC changes.

    Conflicts:
    drivers/usb/isp1760/isp1760-core.c

    Thomas Gleixner
     

08 Apr, 2015

1 commit

  • Currently when a process accesses a hugetlb range protected with
    PROTNONE, unexpected COWs are triggered, which finally puts the hugetlb
    subsystem into a broken/uncontrollable state, where for example
    h->resv_huge_pages is subtracted too much and wraps around to a very
    large number, and the free hugepage pool is no longer maintainable.

    This patch simply stops changing protection for vma(VM_HUGETLB) to fix
    the problem. And this also allows us to avoid useless overhead of minor
    faults.

    Signed-off-by: Naoya Horiguchi
    Suggested-by: Mel Gorman
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: David Rientjes
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

07 Apr, 2015

1 commit


06 Apr, 2015

1 commit

  • The sysfs code usually belongs at the bottom of the file, since it
    deals with high-level objects. In the workqueue code it is misplaced,
    and we would need to work around function references to allow the
    sysfs code to call APIs like apply_workqueue_attrs().

    Let's move that block further down in the file, almost to the bottom,
    and declare workqueue_sysfs_unregister() just before
    destroy_workqueue(), which references it.

    tj: Moved workqueue_sysfs_unregister() forward declaration where other
    forward declarations are.

    Suggested-by: Tejun Heo
    Cc: Christoph Lameter
    Cc: Kevin Hilman
    Cc: Lai Jiangshan
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Tejun Heo
    Cc: Viresh Kumar
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Frederic Weisbecker
     

03 Apr, 2015

12 commits

  • Some braces in tick_freeze() are not necessary, so drop them.

    Signed-off-by: Rafael J. Wysocki
    Cc: peterz@infradead.org
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1534128.H5hN3KBFB4@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Rafael J. Wysocki
     
  • A recent conflict resolution has left tick_resume() in
    tick_unfreeze() which leads to an unbalanced execution of
    tick_resume_broadcast() every time that function runs.

    Fix that by replacing the tick_resume() in tick_unfreeze()
    with tick_resume_local() as appropriate.

    Signed-off-by: Rafael J. Wysocki
    Cc: boris.ostrovsky@oracle.com
    Cc: david.vrabel@citrix.com
    Cc: konrad.wilk@oracle.com
    Cc: peterz@infradead.org
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/8099075.V0LvN3pQAV@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Rafael J. Wysocki
     
  • Commit:

    3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()")

    left a trace_printk() debugging call in, and Steve's banner screamed
    in dmesg. Remove it.

    Signed-off-by: Borislav Petkov
    Cc: Juri Lelli
    Cc: Juri Lelli
    Cc: Peter Zijlstra (Intel)
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1428050570-21041-1-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • Arch specific management of xtime/jiffies/wall_to_monotonic is
    gone for quite a while. Zap the stale comment.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Acked-by: John Stultz
    Cc: Peter Zijlstra
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/2422730.dmO29q661S@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the cleanup function for a dead cpu and invoke it
    directly from the cpu down code. Make it conditional on
    CPU_HOTPLUG as well.

    Temporary change, will be refined in the future.
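
    The multiplex-versus-explicit distinction can be sketched as follows
    (hypothetical names, userspace model; not the kernel source):

```c
#include <assert.h>
#include <stddef.h>

/* Old style: one multiplex entry point that switches on a reason code,
 * obscuring what actually happens at each call site. */
enum notify_reason { NOTIFY_CPU_DEAD, NOTIFY_CPU_DYING };

static int dead_cleanups;

static void clockevents_notify_model(enum notify_reason reason, void *arg)
{
    (void)arg;
    switch (reason) {
    case NOTIFY_CPU_DEAD:
        dead_cleanups++;
        break;
    default:
        break;
    }
}

/* New style: an explicit, self-describing call invoked directly from
 * the CPU-down path (and compiled conditionally on CPU hotplug). */
static void tick_cleanup_dead_cpu_model(int cpu)
{
    (void)cpu;
    dead_cleanups++;
}
```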

    Signed-off-by: Thomas Gleixner
    [ Rebased, added clockevents_notify() removal ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the tick_handover call and invoke it explicitly from
    the hotplug code. This is a temporary solution that will be cleaned
    up in later patches.

    Signed-off-by: Thomas Gleixner
    [ Rebase ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: John Stultz
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1658173.RkEEILFiQZ@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Now that all users are converted over to explicit calls into the
    clockevents state machine, remove the notification chain leftovers.

    Original-from: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/14018863.NQUzkFuafr@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Rafael J. Wysocki
     
  • Replace the clockevents_notify() call with an explicit function call.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/6422336.RMm7oUHcXh@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the broadcast oneshot control into a separate function
    and provide inline helpers. Switch clockevents_notify() over.
    This will go away once all callers are converted.

    This also gets rid of the nested locking of clockevents_lock and
    broadcast_lock. The broadcast oneshot control functions do not
    require clockevents_lock. Only the managing functions
    (setup/shutdown/suspend/resume of the broadcast device) require
    clockevents_lock.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Alexandre Courbot
    Cc: Daniel Lezcano
    Cc: Len Brown
    Cc: Peter Zijlstra
    Cc: Stephen Warren
    Cc: Thierry Reding
    Cc: Tony Lindgren
    Link: http://lkml.kernel.org/r/13000649.8qZuEDV0OA@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • All users converted. Remove the notify leftovers.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/2076318.76XJZ8QYP3@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the broadcast control into a separate function and
    provide inline helpers. Switch clockevents_notify() over. This
    will go away once all callers are converted.

    This also gets rid of the nested locking of clockevents_lock and
    broadcast_lock. The broadcast control functions do not require
    clockevents_lock. Only the managing functions
    (setup/shutdown/suspend/resume of the broadcast device) require
    clockevents_lock.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Daniel Lezcano
    Cc: Len Brown
    Cc: Peter Zijlstra
    Cc: Tony Lindgren
    Link: http://lkml.kernel.org/r/8086559.ttsuS0n1Xr@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Ingo noted that the description of clocks_calc_max_nsecs()'s
    50% safety margin was somewhat circular. So this patch tries
    to improve the comment to better explain what we mean by the
    50% safety margin and why we need it.

    Signed-off-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1427945681-29972-20-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz