Eric Lee / smarc-fsl-linux-kernel

08 May, 2015

11 commits

2c83e8e94 locking/qspinlock: Use a simple write to grab the lock ... Browse Code »

Currently, atomic_cmpxchg() is used to get the lock. However, this
is not really necessary if there is more than one task in the queue
and the queue head don't need to reset the tail code. For that case,
a simple write to set the lock bit is enough as the queue head will
be the only one eligible to get the lock as long as it checks that
both the lock and pending bits are not set. The current pending bit
waiting code will ensure that the bit will not be set as soon as the
tail code in the lock is set.

With that change, the are some slight improvement in the performance
of the queued spinlock in the 5M loop micro-benchmark run on a 4-socket
Westere-EX machine as shown in the tables below.

[Standalone/Embedded - same node]
# of tasks Before patch After patch %Change
---------- ----------- ---------- -------
3 2324/2321 2248/2265 -3%/-2%
4 2890/2896 2819/2831 -2%/-2%
5 3611/3595 3522/3512 -2%/-2%
6 4281/4276 4173/4160 -3%/-3%
7 5018/5001 4875/4861 -3%/-3%
8 5759/5750 5563/5568 -3%/-3%

[Standalone/Embedded - different nodes]
# of tasks Before patch After patch %Change
---------- ----------- ---------- -------
3 12242/12237 12087/12093 -1%/-1%
4 10688/10696 10507/10521 -2%/-2%

It was also found that this change produced a much bigger performance
improvement in the newer IvyBridge-EX chip and was essentially to close
the performance gap between the ticket spinlock and queued spinlock.

The disk workload of the AIM7 benchmark was run on a 4-socket
Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users
on a 3.14 based kernel. The results of the test runs were:

AIM7 XFS Disk Test
kernel JPM Real Time Sys Time Usr Time
----- --- --------- -------- --------
ticketlock 5678233 3.17 96.61 5.81
qspinlock 5750799 3.13 94.83 5.97

AIM7 EXT4 Disk Test
kernel JPM Real Time Sys Time Usr Time
----- --- --------- -------- --------
ticketlock 1114551 16.15 509.72 7.11
qspinlock 2184466 8.24 232.99 6.01

The ext4 filesystem run had a much higher spinlock contention than
the xfs filesystem run.

The "ebizzy -m" test was also run with the following results:

kernel records/s Real Time Sys Time Usr Time
----- --------- --------- -------- --------
ticketlock 2075 10.00 216.35 3.49
qspinlock 3023 10.00 198.20 4.80

Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Daniel J Blueman
Cc: David Vrabel
Cc: Douglas Hatch
Cc: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Paolo Bonzini
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Raghavendra K T
Cc: Rik van Riel
Cc: Scott J Norton
Cc: Thomas Gleixner
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-7-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar

Waiman Long
2015-05-08 18:36:55 +0800
69f9cae90 locking/qspinlock: Optimize for smaller NR_CPUS ... Browse Code »

When we allow for a max NR_CPUS < 2^14 we can optimize the pending
wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16bit.
This means we can use xchg16 for the tail part and do away with all
the repeated compxchg() operations.

This in turn allows us to unconditionally acquire; the locked state
as observed by the wait loops cannot change. And because both locked
and pending are now a full byte we can use simple stores for the
state transition, obviating one atomic operation entirely.

This optimization is needed to make the qspinlock achieve performance
parity with ticket spinlock at light load.

All this is horribly broken on Alpha pre EV56 (and any other arch that
cannot do single-copy atomic byte stores).

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Daniel J Blueman
Cc: David Vrabel
Cc: Douglas Hatch
Cc: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Paolo Bonzini
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Raghavendra K T
Cc: Rik van Riel
Cc: Scott J Norton
Cc: Thomas Gleixner
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar

Peter Zijlstra (Intel)
2015-05-08 18:36:48 +0800
6403bd7d0 locking/qspinlock: Extract out code snippets for the next patch ... Browse Code »

This is a preparatory patch that extracts out the following 2 code
snippets to prepare for the next performance optimization patch.

1) the logic for the exchange of new and previous tail code words
into a new xchg_tail() function.
2) the logic for clearing the pending bit and setting the locked bit
into a new clear_pending_set_locked() function.

This patch also simplifies the trylock operation before queuing by
calling queued_spin_trylock() directly.

Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Daniel J Blueman
Cc: David Vrabel
Cc: Douglas Hatch
Cc: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Paolo Bonzini
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Raghavendra K T
Cc: Rik van Riel
Cc: Scott J Norton
Cc: Thomas Gleixner
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-5-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar

Waiman Long
2015-05-08 18:36:41 +0800
c1fb159db locking/qspinlock: Add pending bit ... Browse Code »

Because the qspinlock needs to touch a second cacheline (the per-cpu
mcs_nodes[]); add a pending bit and allow a single in-word spinner
before we punt to the second cacheline.

It is possible so observe the pending bit without the locked bit when
the last owner has just released but the pending owner has not yet
taken ownership.

In this case we would normally queue -- because the pending bit is
already taken. However, in this case the pending bit is guaranteed
to be released 'soon', therefore wait for it and avoid queueing.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Daniel J Blueman
Cc: David Vrabel
Cc: Douglas Hatch
Cc: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Paolo Bonzini
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Raghavendra K T
Cc: Rik van Riel
Cc: Scott J Norton
Cc: Thomas Gleixner
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar

Peter Zijlstra (Intel)
2015-05-08 18:36:32 +0800
d73a33973 locking/qspinlock, x86: Enable x86-64 to use queued spinlocks ... Browse Code »

This patch makes the necessary changes at the x86 architecture
specific layer to enable the use of queued spinlocks for x86-64. As
x86-32 machines are typically not multi-socket. The benefit of queue
spinlock may not be apparent. So queued spinlocks are not enabled.

Currently, there is some incompatibilities between the para-virtualized
spinlock code (which hard-codes the use of ticket spinlock) and the
queued spinlocks. Therefore, the use of queued spinlocks is disabled
when the para-virtualized spinlock is enabled.

The arch/x86/include/asm/qspinlock.h header file includes some x86
specific optimization which will make the queueds spinlock code
perform better than the generic implementation.

Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Daniel J Blueman
Cc: David Vrabel
Cc: Douglas Hatch
Cc: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Paolo Bonzini
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Raghavendra K T
Cc: Rik van Riel
Cc: Scott J Norton
Cc: Thomas Gleixner
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar

Waiman Long
2015-05-08 18:36:26 +0800
a33fda35e locking/qspinlock: Introduce a simple generic 4-byte queued spinlock ... Browse Code »

This patch introduces a new generic queued spinlock implementation that
can serve as an alternative to the default ticket spinlock. Compared
with the ticket spinlock, this queued spinlock should be almost as fair
as the ticket spinlock. It has about the same speed in single-thread
and it can be much faster in high contention situations especially when
the spinlock is embedded within the data structure to be protected.

Only in light to moderate contention where the average queue depth
is around 1-3 will this queued spinlock be potentially a bit slower
due to the higher slowpath overhead.

This queued spinlock is especially suit to NUMA machines with a large
number of cores as the chance of spinlock contention is much higher
in those machines. The cost of contention is also higher because of
slower inter-node memory traffic.

Due to the fact that spinlocks are acquired with preemption disabled,
the process will not be migrated to another CPU while it is trying
to get a spinlock. Ignoring interrupt handling, a CPU can only be
contending in one spinlock at any one time. Counting soft IRQ, hard
IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock waiting
activities. By allocating a set of per-cpu queue nodes and used them
to form a waiting queue, we can encode the queue node address into a
much smaller 24-bit size (including CPU number and queue node index)
leaving one byte for the lock.

Please note that the queue node is only needed when waiting for the
lock. Once the lock is acquired, the queue node can be released to
be used later.

Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Daniel J Blueman
Cc: David Vrabel
Cc: Douglas Hatch
Cc: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Paolo Bonzini
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Raghavendra K T
Cc: Rik van Riel
Cc: Scott J Norton
Cc: Thomas Gleixner
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-2-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar

Waiman Long
2015-05-08 18:36:25 +0800
663fdcbee kernel: Replace reference to ASSIGN_ONCE() with WRITE_ONCE() in comment ... Browse Code »

Looks like commit :

43239cbe79fc ("kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)")

left behind a reference to ASSIGN_ONCE(). Update this to WRITE_ONCE().

Signed-off-by: Preeti U Murthy
Signed-off-by: Peter Zijlstra (Intel)
Cc: Borislav Petkov
Cc: H. Peter Anvin
Cc: Thomas Gleixner
Cc: borntraeger@de.ibm.com
Cc: dave@stgolabs.net
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20150430115721.22278.94082.stgit@preeti.in.ibm.com
Signed-off-by: Ingo Molnar

Preeti U Murthy
2015-05-08 18:28:53 +0800
59aabfc7e locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write() ... Browse Code »

In up_write()/up_read(), rwsem_wake() will be called whenever it
detects that some writers/readers are waiting. The rwsem_wake()
function will take the wait_lock and call __rwsem_do_wake() to do the
real wakeup. For a heavily contended rwsem, doing a spin_lock() on
wait_lock will cause further contention on the heavily contended rwsem
cacheline resulting in delay in the completion of the up_read/up_write
operations.

This patch makes the wait_lock taking and the call to __rwsem_do_wake()
optional if at least one spinning writer is present. The spinning
writer will be able to take the rwsem and call rwsem_wake() later
when it calls up_write(). With the presence of a spinning writer,
rwsem_wake() will now try to acquire the lock using trylock. If that
fails, it will just quit.

Suggested-by: Peter Zijlstra (Intel)
Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Davidlohr Bueso
Acked-by: Jason Low
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Douglas Hatch
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Scott J Norton
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1430428337-16802-2-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar

Waiman Long
2015-05-08 18:27:59 +0800
3e0283a53 Merge tag 'pm+acpi-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull power management and ACPI fixes from Rafael Wysocki:
"These include three regression fixes (PCI resources management,
ACPI/PNP device enumeration, ACPI SBS on MacBook) and two ACPI
documentation fixes related to GPIO.

Specifics:

- Fix for a PCI resources management regression introduced during the
4.0 cycle and related to the handling of ACPI resources'
Producer/Consumer flags that turn out to be useless (Jiang Liu)

- Fix for a MacBook regression related to the Smart Battery Subsystem
(SBS) driver causing various problems (stalls on boot, failure to
detect or report battery) to happen and introduced during the 3.18
cycle (Chris Bainbridge)

- Fix for an ACPI/PNP device enumeration regression introduced during
the 3.16 cycle caused by failing to include two PNP device IDs into
the list of IDs that PNP device objects need to be created for
(Witold Szczeponik)

- Fixes for two minor mistakes in the ACPI GPIO properties
documentation (Antonio Ospite, Rafael J Wysocki)"

* tag 'pm+acpi-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / PNP: add two IDs to list for PNPACPI device enumeration
ACPI / documentation: Fix ambiguity in the GPIO properties document
ACPI / documentation: fix a sentence about GPIO resources
ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook
x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus

Linus Torvalds
2015-05-08 06:58:00 +0800
9a5d9315e Merge branches 'acpi-resources', 'acpi-battery', 'acpi-doc' and 'acpi-pnp' ... Browse Code »

* acpi-resources:
x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus

* acpi-battery:
ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook

* acpi-doc:
ACPI / documentation: Fix ambiguity in the GPIO properties document
ACPI / documentation: fix a sentence about GPIO resources

* acpi-pnp:
ACPI / PNP: add two IDs to list for PNPACPI device enumeration

Rafael J. Wysocki
2015-05-08 03:24:34 +0800
68c2f356c Merge tag 'for-f2fs-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs ... Browse Code »

Pull f2fs fixes from Jaegeuk Kim:
"Fix a performance regression and a bug"

* tag 'for-f2fs-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: fix wrong error hanlder in f2fs_follow_link
Revert "f2fs: enhance multi-threads performance"

Linus Torvalds
2015-05-08 02:18:34 +0800

07 May, 2015

7 commits

fbb7b92f1 Merge tag 'pinctrl-v4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl ... Browse Code »

Pull pin control fixes from Linus Walleij:
"Here is a smallish set of pin control fixes for the v4.1 cycle,
collected the last two weeks:

- fix a real nasty legacy bug that has screwed up the protection of
adding pinctrl maps dynamically. Normally this didn't happen so
much but Dough Anderson ran into it and fixed it, kudos!

- minor driver fixes for Qualcomm spmi, mediatek and Marvell drivers"

* tag 'pinctrl-v4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: Don't just pretend to protect pinctrl_maps, do it for real
pinctrl: mediatek: mtk-common: initialize unmask
pinctrl: qcom-spmi-mpp: Fix input value report
pinctrl: qcom-spmi: Fix pin direction configuration
pinctrl: mvebu: Fix mapping of pin 63 (gpo -> gpio)

Linus Torvalds
2015-05-07 23:27:38 +0800
7bbcd1b86 Merge tag 'vfio-v4.1-rc3' of git://github.com/awilliam/linux-vfio ... Browse Code »

Pull vfio fixes from Alex Williamson:
"Fix some undesirable behavior with the vfio device request interface:

- increase verbosity of device request channel (Alex Williamson)

- fix runaway interruptible timeout (Alex Williamson)"

* tag 'vfio-v4.1-rc3' of git://github.com/awilliam/linux-vfio:
vfio: Fix runaway interruptible timeout
vfio-pci: Log device requests more verbosely

Linus Torvalds
2015-05-07 23:18:01 +0800
8cb7c15b3 Merge tag 'for-linus' of git://github.com/dledford/linux ... Browse Code »

Pull infiniband updates from Doug Ledford:
"Minor updates for 4.1-rc

Most of the changes are fairly small and well confined. The iWARP
address reporting changes are the only ones that are a medium size. I
had these queued up prior to rc1, but due to the shuffle in
maintainers, they did not get submitted when I expected. My apologies
for that. I feel comfortable with them however due to the testing
they've received, so I left them in this submission"

* tag 'for-linus' of git://github.com/dledford/linux:
MAINTAINERS: Update InfiniBand subsystem maintainer
MAINTAINERS: add include/rdma/ to InfiniBand subsystem
IPoIB/CM: Fix indentation level
iw_cxgb4: Remove negative advice dmesg warnings
IB/core: Fix unaligned accesses
IB/core: change rdma_gid2ip into void function as it always return zero
IB/qib: use arch_phys_wc_add()
IB/qib: add acounting for MTRR
IB/core: dma unmap optimizations
IB/core: dma map/unmap locking optimizations
RDMA/cxgb4: Report the actual address of the remote connecting peer
RDMA/nes: Report the actual address of the remote connecting peer
RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the connecting peer to its clients
iw_cxgb4: enforce qp/cq id requirements
iw_cxgb4: use BAR2 GTS register for T5 kernel mode CQs
iw_cxgb4: 32b platform fixes
iw_cxgb4: Cleanup register defines/MACROS
RDMA/CMA: Canonize IPv4 on IPV6 sockets properly

Linus Torvalds
2015-05-07 22:04:33 +0800
0e1dc4274 Merge tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip ... Browse Code »

Pull xen bug fixes from David Vrabel:

- fix blkback regression if using persistent grants

- fix various event channel related suspend/resume bugs

- fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS

- SWIOTLB on ARM now uses frames evtchn before binding the channel to CPU in __startup_pirq()
xen/console: Update console event channel on resume
xen/xenbus: Update xenbus event channel on resume
xen/events: Clear cpu_evtchn_mask before resuming
xen-pciback: Add name prefix to global 'permissive' variable
xen: Suspend ticks on all CPUs during suspend
xen/grant: introduce func gnttab_unmap_refs_sync()
xen/blkback: safely unmap purge persistent grants

Linus Torvalds
2015-05-07 06:58:06 +0800
3d54ac9e3 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 fixes from Ingo Molnar:
"EFI fixes, and FPU fix, a ticket spinlock boundary condition fix and
two build fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Always restore_xinit_state() when use_eager_cpu()
x86: Make cpu_tss available to external modules
efi: Fix error handling in add_sysfs_runtime_map_entry()
x86/spinlocks: Fix regression in spinlock contention detection
x86/mm: Clean up types in xlate_dev_mem_ptr()
x86/efi: Store upper bits of command line buffer address in ext_cmd_line_ptr
efivarfs: Ensure VariableName is NUL-terminated

Linus Torvalds
2015-05-07 01:57:37 +0800
d8fce2db7 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also an uncore PMU driver fix and an uncore
PMU driver hardware-enablement addition"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf probe: Fix segfault if passed with ''.
perf report: Fix -T/--threads option to work again
perf bench numa: Fix immediate meeting of convergence condition
perf bench numa: Fixes of --quiet argument
perf bench futex: Fix hung wakeup tasks after requeueing
perf probe: Fix bug with global variables handling
perf top: Fix a segfault when kernel map is restricted.
tools lib traceevent: Fix build failure on 32-bit arch
perf kmem: Fix compiles on RHEL6/OL6
tools lib api: Undefine _FORTIFY_SOURCE before setting it
perf kmem: Consistently use PRIu64 for printing u64 values
perf trace: Disable events and drain events when forked workload ends
perf trace: Enable events when doing system wide tracing and starting a workload
perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver
perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu

Linus Torvalds
2015-05-07 01:47:25 +0800
02f0f5721 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull RCU fix from Ingo Molnar:
"An RCU Kconfig fix that eliminates an annoying interactive kconfig
question for CONFIG_RCU_TORTURE_TEST_SLOW_INIT"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu: Control grace-period delays directly from value

Linus Torvalds
2015-05-07 01:26:37 +0800

06 May, 2015

22 commits

c5272a285 pinctrl: Don't just pretend to protect pinctrl_maps, do it for real ... Browse Code »

Way back, when the world was a simpler place and there was no war, no
evil, and no kernel bugs, there was just a single pinctrl lock. That
was how the world was when (57291ce pinctrl: core device tree mapping
table parsing support) was written. In that case, there were
instances where the pinctrl mutex was already held when
pinctrl_register_map() was called, hence a "locked" parameter was
passed to the function to indicate that the mutex was already locked
(so we shouldn't lock it again).

A few years ago in (42fed7b pinctrl: move subsystem mutex to
pinctrl_dev struct), we switched to a separate pinctrl_maps_mutex.
...but (oops) we forgot to re-think about the whole "locked" parameter
for pinctrl_register_map(). Basically the "locked" parameter appears
to still refer to whether the bigger pinctrl_dev mutex is locked, but
we're using it to skip locks of our (now separate) pinctrl_maps_mutex.

That's kind of a bad thing(TM). Probably nobody noticed because most
of the calls to pinctrl_register_map happen at boot time and we've got
synchronous device probing. ...and even cases where we're
asynchronous don't end up actually hitting the race too often. ...but
after banging my head against the wall for a bug that reproduced 1 out
of 1000 reboots and lots of looking through kgdb, I finally noticed
this.

Anyway, we can now safely remove the "locked" parameter and go back to
a war-free, evil-free, and kernel-bug-free world.

Fixes: 42fed7ba44e4 ("pinctrl: move subsystem mutex to pinctrl_dev struct")
Signed-off-by: Doug Anderson
Signed-off-by: Linus Walleij

Doug Anderson
2015-05-06 22:24:28 +0800
8746515d7 xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM ... Browse Code »

Make sure that xen_swiotlb_init allocates buffers that are DMA capable
when at least one memblock is available below 4G. Otherwise we assume
that all devices on the SoC can cope with >4G addresses. We do this on
ARM and ARM64, where dom0 is mapped 1:1, so pfn == mfn in this case.

No functional changes on x86.

From: Chen Baozi

Signed-off-by: Chen Baozi
Signed-off-by: Stefano Stabellini
Tested-by: Chen Baozi
Acked-by: Konrad Rzeszutek Wilk
Signed-off-by: David Vrabel

Stefano Stabellini
2015-05-06 22:02:58 +0800
c88d47480 x86/fpu: Always restore_xinit_state() when use_eager_cpu() ... Browse Code »

The following commit:

f893959b0898 ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")

removed drop_init_fpu() usage from flush_thread(). This seems to break
things for me - the Go 1.4 test suite fails all over the place with
floating point comparision errors (offending commit found through
bisection).

The functional change was that flush_thread() after this commit
only calls restore_init_xstate() when both use_eager_fpu() and
!used_math() are true. drop_init_fpu() (now fpu_reset_state()) calls
restore_init_xstate() regardless of whether current used_math() - apply
the same logic here.

Switch used_math() -> tsk_used_math(tsk) to consistently use the grabbed
tsk instead of current, like in the rest of flush_thread().

Tested-by: Dave Hansen
Signed-off-by: Bobby Powers
Signed-off-by: Borislav Petkov
Acked-by: Oleg Nesterov
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Fenghua Yu
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Pekka Riikonen
Cc: Quentin Casasnovas
Cc: Rik van Riel
Cc: Suresh Siddha
Cc: Thomas Gleixner
Fixes: f893959b ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")
Link: http://lkml.kernel.org/r/1430147441-9820-1-git-send-email-bobbypowers@gmail.com
Signed-off-by: Ingo Molnar

Bobby Powers
2015-05-06 17:22:03 +0800
c102cb097 Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent ... Browse Code »

Pull EFI fixes from Matt Fleming:

* Avoid garbage names in efivarfs due to buggy firmware by zeroing
EFI variable name. (Ross Lagerwall)

* Stop erroneously dropping upper 32 bits of boot command line pointer
in EFI boot stub and stash them in ext_cmd_line_ptr. (Roy Franz)

* Fix double-free bug in error handling code path of EFI runtime map
code. (Dan Carpenter)

Signed-off-by: Ingo Molnar

Ingo Molnar
2015-05-06 14:30:24 +0800
74f40c1f4 Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/g… ... Browse Code »

…it/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

- Fix 'perf probe -a' segfault if passed with '' (Wang Nan)

- Fix report -T/--threads option (Namhyung Kim)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ingo Molnar
2015-05-06 10:54:11 +0800
5198b4437 Merge tag 'for-linus-4.1-1' of git://git.code.sf.net/p/openipmi/linux-ipmi ... Browse Code »

Pull IPMI fixes from Corey Minyard:
"Lots of minor IPMI fixes, especially ones that have have come up since
the SSIF driver has been in the main kernel for a while"

* tag 'for-linus-4.1-1' of git://git.code.sf.net/p/openipmi/linux-ipmi:
ipmi: Fix multi-part message handling
ipmi: Add alert handling to SSIF
ipmi: Fix a problem that messages are not issued in run_to_completion mode
ipmi: Report an error if ACPI _IFT doesn't exist
ipmi: Remove unused including
ipmi: Don't report err in the SI driver for SSIF devices
ipmi: Remove incorrect use of seq_has_overflowed
ipmi:ssif: Ignore spaces when comparing I2C adapter names
ipmi_ssif: Fix the logic on user-supplied addresses

Linus Torvalds
2015-05-06 10:42:01 +0800
2a171aa21 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge misc fixes from Andrew Morton:
"16 patches

This includes a new rtc driver for the Abracon AB x80x and isn't very
appropriate for -rc2. It was still being fiddled with a bit during
the merge window and I fell asleep during -rc1"

[ So I took the new driver, it seems small and won't regress anything.
I'm a softy. - Linus ]

* emailed patches from Andrew Morton :
rtc: armada38x: fix concurrency access in armada38x_rtc_set_time
ocfs2: dlm: fix race between purge and get lock resource
nilfs2: fix sanity check of btree level in nilfs_btree_root_broken()
util_macros.h: have array pointer point to array of constants
configfs: init configfs module earlier at boot time
mm/hwpoison-inject: check PageLRU of hpage
mm/hwpoison-inject: fix refcounting in no-injection case
mm: soft-offline: fix num_poisoned_pages counting on concurrent events
rtc: add rtc-abx80x, a driver for the Abracon AB x80x i2c rtc
Documentation: bindings: add abracon,abx80x
kasan: show gcc version requirements in Kconfig and Documentation
mm/memory-failure: call shake_page() when error hits thp tail page
lib: delete lib/find_last_bit.c
MAINTAINERS: add co-maintainer for LED subsystem
zram: add Designated Reviewer for zram in MAINTAINERS
revert "zram: move compact_store() to sysfs functions area"

Linus Torvalds
2015-05-06 09:52:13 +0800
3ce05a4e7 Merge tag 'platform-drivers-x86-v4.1-2' of git://git.infradead.org/users/dvhart/… ... Browse Code »

…linux-platform-drivers-x86

Pull x86 platform driver fixes from Darren Hart:
"This includes a trivial warning and adding a Lenovo laptop to an
existing quirk.

I've held off on things like the latter in the past, but I didn't feel
it was risky enough to push out to 4.2.

- thinkpad_acpi:
Fix warning for static not at beginning

- ideapad_laptop:
Add Lenovo G40-30 to devices without radio switch"

* tag 'platform-drivers-x86-v4.1-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
thinkpad_acpi: Fix warning for static not at beginning
ideapad_laptop: Add Lenovo G40-30 to devices without radio switch

Linus Torvalds
2015-05-06 09:14:04 +0800
3d69d43ba ipmi: Fix multi-part message handling ... Browse Code »

Lots of little fixes for multi-part messages:

The values was not being re-initialized, if something went wrong
handling a multi-part message and it got left in a bad state, it
might be an issue.

The commands were not correct when issuing multi-part reads, the
code was not passing in the proper value for commands. Also clean
up some minor formatting issues.

Get the block number from the right location, limit the maximum send
message size to 63 bytes and explain why, and fix some minor sylistic
issues.

Signed-off-by: Corey Minyard

Corey Minyard
2015-05-06 08:37:22 +0800
916205217 ipmi: Add alert handling to SSIF ... Browse Code »

The SSIF interface can optionally have an SMBus alert come in when
data is ready. Unfortunately, the IPMI spec gives wiggle room to
the implementer to allow them to always have the alert enabled,
even if the driver doesn't enable it. So implement alerts.
If you don't in this situation, the SMBus alert handling will
constantly complain.

Signed-off-by: Corey Minyard

Corey Minyard
2015-05-06 08:36:38 +0800
9f8127048 ipmi: Fix a problem that messages are not issued in run_to_completion mode ... Browse Code »

start_next_msg() issues a message placed in smi_info->waiting_msg
if it is non-NULL. However, sender() sets a message to
smi_info->curr_msg and NULL to smi_info->waiting_msg in the context
of run_to_completion mode. As the result, it leads an infinite
loop by waiting the completion of unissued message when leaving
dying message after kernel panic.

sender() should set the message to smi_info->waiting_msg not
curr_msg.

Signed-off-by: Hidehiro Kawai
Signed-off-by: Corey Minyard

Hidehiro Kawai
2015-05-06 08:33:49 +0800
a182a4b2b ipmi: Report an error if ACPI _IFT doesn't exist ... Browse Code »

When probing an ACPI table, report a specific error, instead of just
returning an error, if _IFT doesn't exist.

Signed-off-by: Corey Minyard

Corey Minyard
2015-05-06 08:33:48 +0800
15c5725e6 ipmi: Remove unused including <linux/version.h> ... Browse Code »

Remove including that don't need it.

Signed-off-by: Wei Yongjun
Signed-off-by: Corey Minyard

Wei Yongjun
2015-05-06 08:33:28 +0800
489405fe5 rtc: armada38x: fix concurrency access in armada38x_rtc_set_time ... Browse Code »

While setting the time, the RTC TIME register should not be accessed.
However due to hardware constraints, setting the RTC time involves
sleeping during 100ms. This sleep was done outside the critical section
protected by the spinlock, so it was possible to read the RTC TIME
register and get an incorrect value. This patch introduces a mutex for
protecting the RTC TIME access, unlike the spinlock it is allowed to
sleep in a critical section protected by a mutex.

The RTC STATUS register can still be used from the interrupt handler but
it has no effect on setting the time.

Signed-off-by: Gregory CLEMENT
Acked-by: Alexandre Belloni
Acked-by: Andrew Lunn
Cc: Alessandro Zummo
Cc: [4.0]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gregory CLEMENT
2015-05-06 08:10:11 +0800
b1432a2a3 ocfs2: dlm: fix race between purge and get lock resource ... Browse Code »

There is a race window in dlm_get_lock_resource(), which may return a
lock resource which has been purged. This will cause the process to
hang forever in dlmlock() as the ast msg can't be handled due to its
lock resource not existing.

dlm_get_lock_resource {
...
spin_lock(&dlm->spinlock);
tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
if (tmpres) {
spin_unlock(&dlm->spinlock);
>>>>>>>> race window, dlm_run_purge_list() may run and purge
the lock resource
spin_lock(&tmpres->spinlock);
...
spin_unlock(&tmpres->spinlock);
}
}

Signed-off-by: Junxiao Bi
Cc: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Junxiao Bi
2015-05-06 08:10:11 +0800
d8fd150fe nilfs2: fix sanity check of btree level in nilfs_btree_root_broken() ... Browse Code »

The range check for b-tree level parameter in nilfs_btree_root_broken()
is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even
though the level is limited to values in the range of 0 to
(NILFS_BTREE_LEVEL_MAX - 1).

Since the level parameter is read from storage device and used to index
nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it
can cause memory overrun during btree operations if the boundary value
is set to the level parameter on device.

This fixes the broken sanity check and adds a comment to clarify that
the upper bound NILFS_BTREE_LEVEL_MAX is exclusive.

Signed-off-by: Ryusuke Konishi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ryusuke Konishi
2015-05-06 08:10:11 +0800
05836c378 util_macros.h: have array pointer point to array of constants ... Browse Code »

Using the new find_closest() macro can result in the following sparse
warnings.

drivers/hwmon/lm85.c:194:16: warning:
incorrect type in initializer (different modifiers)
drivers/hwmon/lm85.c:194:16: expected int *__fc_a
drivers/hwmon/lm85.c:194:16: got int static const [toplevel] *
drivers/hwmon/lm85.c:210:16: warning:
incorrect type in initializer (different modifiers)
drivers/hwmon/lm85.c:210:16: expected int *__fc_a
drivers/hwmon/lm85.c:210:16: got int const *map

This is because the array passed to find_closest() will typically be
declared as array of constants, but the macro declares a non-constant
pointer to it.

Signed-off-by: Guenter Roeck
Cc: Bartosz Golaszewski

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Guenter Roeck
2015-05-06 08:10:11 +0800
f5b697700 configfs: init configfs module earlier at boot time ... Browse Code »

We need this earlier in the boot process to allow various subsystems to
use configfs (e.g Industrial IIO).

Also, debugfs is at core_initcall level and configfs should be on the same
level from infrastructure point of view.

Signed-off-by: Daniel Baluta
Suggested-by: Lars-Peter Clausen
Reviewed-by: Christoph Hellwig
Cc: Al Viro
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daniel Baluta
2015-05-06 08:10:11 +0800
e386eed89 mm/hwpoison-inject: check PageLRU of hpage ... Browse Code »

Hwpoison injector checks PageLRU of the raw target page to find out
whether the page is an appropriate target, but current code now filters
out thp tail pages, which prevents us from testing for such cases via this
interface. So let's check hpage instead of p.

Signed-off-by: Naoya Horiguchi
Acked-by: Dean Nelson
Cc: Andi Kleen
Cc: Andrea Arcangeli
Cc: Hidetoshi Seto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2015-05-06 08:10:11 +0800
7ea434a4e mm/hwpoison-inject: fix refcounting in no-injection case ... Browse Code »

Hwpoison injection via debugfs:hwpoison/corrupt-pfn takes a refcount of
the target page. But current code doesn't release it if the target page
is not supposed to be injected, which results in memory leak. This patch
simply adds the refcount releasing code.

Signed-off-by: Naoya Horiguchi
Acked-by: Dean Nelson
Cc: Andi Kleen
Cc: Andrea Arcangeli
Cc: Hidetoshi Seto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2015-05-06 08:10:10 +0800
602498f9a mm: soft-offline: fix num_poisoned_pages counting on concurrent events ... Browse Code »

If multiple soft offline events hit one free page/hugepage concurrently,
soft_offline_page() can handle the free page/hugepage multiple times,
which makes num_poisoned_pages counter increased more than once. This
patch fixes this wrong counting by checking TestSetPageHWPoison for normal
papes and by checking the return value of dequeue_hwpoisoned_huge_page()
for hugepages.

Signed-off-by: Naoya Horiguchi
Acked-by: Dean Nelson
Cc: Andi Kleen
Cc: [3.14+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2015-05-06 08:10:10 +0800
4d61ff6b9 rtc: add rtc-abx80x, a driver for the Abracon AB x80x i2c rtc ... Browse Code »

This is a basic driver for the ultra-low-power Abracon AB x80x series of RTC
chips. It supports in particular, the supersets AB0805 and AB1805.
It allows reading and writing the time, and enables the supercapacitor/
battery charger.

[arnd@arndb.de: abx805 depends on i2c]
[alexandre.belloni@free-electrons.com: renam buffer from date to buf in abx80x_rtc_read_time()]
Signed-off-by: Philippe De Muyter
Cc: Alessandro Zummo
Signed-off-by: Alexandre Belloni
Signed-off-by: Arnd Bergmann
Cc: Paul Bolle
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Philippe De Muyter
2015-05-06 08:10:10 +0800