25 May, 2016

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Fix a number of bugs, most notably a potential stale data exposure
    after a crash and a potential BUG_ON crash if a file has the data
    journalling flag enabled while it has dirty delayed allocation blocks
    that haven't been written yet. Also fix a potential crash in the new
    project quota code and a crash when handling a maliciously corrupted
    file system.

    In addition, fix some DAX-specific bugs, including when there is a
    transient ENOSPC situation and races between writes via direct I/O and
    an mmap'ed segment that could lead to lost I/O.

    Finally the usual set of miscellaneous cleanups"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
    ext4: pre-zero allocated blocks for DAX IO
    ext4: refactor direct IO code
    ext4: fix race in transient ENOSPC detection
    ext4: handle transient ENOSPC properly for DAX
    dax: call get_blocks() with create == 1 for write faults to unwritten extents
    ext4: remove unmeetable inconsistency check from ext4_find_extent()
    jbd2: remove excess descriptions for handle_s
    ext4: remove unnecessary bio get/put
    ext4: silence UBSAN in ext4_mb_init()
    ext4: address UBSAN warning in mb_find_order_for_block()
    ext4: fix oops on corrupted filesystem
    ext4: fix check of dqget() return value in ext4_ioctl_setproject()
    ext4: clean up error handling when orphan list is corrupted
    ext4: fix hang when processing corrupted orphaned inode list
    ext4: remove trailing \n from ext4_warning/ext4_error calls
    ext4: fix races between changing inode journal mode and ext4_writepages
    ext4: handle unwritten or delalloc buffers before enabling data journaling
    ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart()
    ext4: do not ask jbd2 to write data for delalloc buffers
    jbd2: add support for avoiding data writes during transaction commits
    ...

    Linus Torvalds
     

17 May, 2016

2 commits

  • Pull scheduler updates from Ingo Molnar:

    - massive CPU hotplug rework (Thomas Gleixner)

    - improve migration fairness (Peter Zijlstra)

    - CPU load calculation updates/cleanups (Yuyang Du)

    - cpufreq updates (Steve Muckle)

    - nohz optimizations (Frederic Weisbecker)

    - switch_mm() micro-optimization on x86 (Andy Lutomirski)

    - ... lots of other enhancements, fixes and cleanups.

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (66 commits)
    ARM: Hide finish_arch_post_lock_switch() from modules
    sched/core: Provide a tsk_nr_cpus_allowed() helper
    sched/core: Use tsk_cpus_allowed() instead of accessing ->cpus_allowed
    sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems
    sched/fair: Correct unit of load_above_capacity
    sched/fair: Clean up scale confusion
    sched/nohz: Fix affine unpinned timers mess
    sched/fair: Fix fairness issue on migration
    sched/core: Kill sched_class::task_waking to clean up the migration logic
    sched/fair: Prepare to fix fairness problems on migration
    sched/fair: Move record_wakee()
    sched/core: Fix comment typo in wake_q_add()
    sched/core: Remove unused variable
    sched: Make hrtick_notifier an explicit call
    sched/fair: Make ilb_notifier an explicit call
    sched/hotplug: Make activate() the last hotplug step
    sched/hotplug: Move migration CPU_DYING to sched_cpu_dying()
    sched/migration: Move CPU_ONLINE into scheduler state
    sched/migration: Move calc_load_migrate() into CPU_DYING
    sched/migration: Move prepare transition to SCHED_STARTING state
    ...

    Linus Torvalds
     
  • Pull support for killable rwsems from Ingo Molnar:
    "This, by Michal Hocko, implements down_write_killable().

    The main usecase will be to update mm_sem usage sites to use this new
    API, to allow the mm-reaper introduced in commit aac453635549 ("mm,
    oom: introduce oom reaper") to tear down oom victim address spaces
    asynchronously with minimum latencies and without deadlock worries"

    [ The vfs will want it too as the inode lock is changed from a mutex to
    a rwsem due to the parallel lookup and readdir updates ]

    * 'locking-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/rwsem: Fix comment on register clobbering
    locking/rwsem: Fix down_write_killable()
    locking/rwsem, x86: Add frame annotation for call_rwsem_down_write_failed_killable()
    locking/rwsem: Provide down_write_killable()
    locking/rwsem, x86: Provide __down_write_killable()
    locking/rwsem, s390: Provide __down_write_killable()
    locking/rwsem, ia64: Provide __down_write_killable()
    locking/rwsem, alpha: Provide __down_write_killable()
    locking/rwsem: Introduce basis for down_write_killable()
    locking/rwsem, sparc: Drop superfluous arch specific implementation
    locking/rwsem, sh: Drop superfluous arch specific implementation
    locking/rwsem, xtensa: Drop superfluous arch specific implementation
    locking/rwsem: Drop explicit memory barriers
    locking/rwsem: Get rid of __down_write_nested()

    Linus Torvalds
     

16 May, 2016

1 commit

  • Tetsuo Handa reported that the new signal_pending exit path in
    __rwsem_down_write_failed_common() was breaking his kernel.

    Upon inspection it was found that there are two things wrong with it:

    - it forgets to remove WAITING_BIAS if it leaves the list empty, and
    - it forgets to wake further waiters that were blocked on the
    now-removed waiter.

    Especially the first issue causes new lock attempts to block and stall
    indefinitely, as the code assumes that pending waiters mean there is
    an owner that will wake when it releases the lock.

    Reported-by: Tetsuo Handa
    Tested-by: Tetsuo Handa
    Tested-by: Michal Hocko
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Andrew Morton
    Cc: Arnaldo Carvalho de Melo
    Cc: Chris Zankel
    Cc: David S. Miller
    Cc: Davidlohr Bueso
    Cc: H. Peter Anvin
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Max Filippov
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vince Weaver
    Cc: Waiman Long
    Link: http://lkml.kernel.org/r/20160512115745.GP3192@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

05 May, 2016

4 commits

  • Specifically around the debugfs file creation calls: I have no idea if
    they could ever possibly fail, but this is core code (debug aside), so
    let's at least check the return value and report anything fishy.
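
    [ For illustration, a minimal sketch of the kind of check described
    above; the "qlockstat" directory name, the qstat_fops structure and
    the error handling are placeholders, not the actual code: ]

        struct dentry *dir, *file;

        dir = debugfs_create_dir("qlockstat", NULL);
        if (!dir)
                return -ENOMEM;         /* report the failure instead of ignoring it */

        file = debugfs_create_file("pv_hash_hops", 0400, dir, NULL, &qstat_fops);
        if (!file)
                pr_warn("Could not create 'pv_hash_hops' debugfs entry\n");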

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Waiman Long
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20160420041725.GC3472@linux-uzut.site
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • ... remove the redundant second iteration; this is most likely a
    copy/paste buglet.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: waiman.long@hpe.com
    Link: http://lkml.kernel.org/r/1460961103-24953-2-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The problem with the existing lock pinning is that each pin is of
    value 1; this means you can simply unpin if you know it's pinned,
    without having any extra information.

    This scheme generates a random (16-bit) cookie for each pin and
    requires this same cookie to unpin. This means you have to keep the
    cookie in context.

    No objsize difference for !LOCKDEP kernels.
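
    [ A sketch of the resulting usage pattern; the lockdep_pin_lock() /
    lockdep_unpin_lock() helper names and the struct pin_cookie type are
    assumed here, and 'lock' is just an example lockdep_map user: ]

        struct pin_cookie cookie;

        cookie = lockdep_pin_lock(&lock->dep_map);      /* returns a random 16-bit cookie */
        /* ... code that must not drop 'lock' ... */
        lockdep_unpin_lock(&lock->dep_map, cookie);     /* only the matching cookie unpins */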

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

28 Apr, 2016

1 commit

  • This commit replaces an #ifdef with IS_ENABLED(), saving five lines.
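
    [ The general shape of such a conversion (a generic sketch, not the
    actual hunk; do_expensive_check() is a made-up helper): ]

        /* Before: preprocessor conditional */
        #ifdef CONFIG_DEBUG_LOCKDEP
                do_expensive_check(lock);
        #endif

        /* After: ordinary C; the compiler still removes the dead branch */
        if (IS_ENABLED(CONFIG_DEBUG_LOCKDEP))
                do_expensive_check(lock);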

    Signed-off-by: Paul E. McKenney
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: corbet@lwn.net
    Cc: dave@stgolabs.net
    Cc: dhowells@redhat.com
    Cc: linux-doc@vger.kernel.org
    Cc: will.deacon@arm.com
    Link: http://lkml.kernel.org/r/1461691328-5429-4-git-send-email-paulmck@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

26 Apr, 2016

1 commit

  • In ext4, there is a race condition between changing an inode's journal
    mode and ext4_writepages(). While ext4_writepages() is executing on an
    inode in non-journalled mode, the inode's journal mode could be enabled
    by ioctl(), and pages dirtied after switching the journal mode would
    still be exposed to ext4_writepages() in non-journalled mode. To resolve
    this problem, we use a filesystem-wide per-cpu rw semaphore, as suggested
    by Jan Kara, because we don't want to waste space in ext4_inode_info for
    this rare case.
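
    [ A rough sketch of the locking scheme described above, using the
    kernel's percpu_rw_semaphore API; the s_journal_flag_rwsem field name
    and the exact call sites are simplifications: ]

        /* writeback takes the semaphore for read (cheap, per-cpu) */
        percpu_down_read(&sbi->s_journal_flag_rwsem);
        ret = ext4_writepages(mapping, wbc);
        percpu_up_read(&sbi->s_journal_flag_rwsem);

        /* the rare journal-mode switch takes it for write, draining readers */
        percpu_down_write(&sbi->s_journal_flag_rwsem);
        err = ext4_change_inode_journal_flag(inode, val);
        percpu_up_write(&sbi->s_journal_flag_rwsem);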

    Signed-off-by: Daeho Jeong
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Daeho Jeong
     

23 Apr, 2016

2 commits

  • lock_chain::base is used to store an index into the chain_hlocks[]
    array; however, that array contains more elements than can be indexed
    using a u16.

    Change the lock_chain structure to use a bitfield to encode the data
    it needs, and add BUILD_BUG_ON() assertions to check that the fields
    are wide enough.

    Also, for DEBUG_LOCKDEP, assert that we don't run out of elements of
    that array, as that would wreck the collision detection.
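
    [ Roughly the shape of the change, sketched from the description above;
    the exact field widths are illustrative: ]

        struct lock_chain {
                /* see BUILD_BUG_ON()s in lookup_chain_cache() */
                unsigned int            irq_context :  2,
                                        depth       :  6,
                                        base        : 24;
                struct hlist_node       entry;
                u64                     chain_key;
        };

        BUILD_BUG_ON((1UL << 24) <= ARRAY_SIZE(chain_hlocks));
        BUILD_BUG_ON((1UL << 6)  <= ARRAY_SIZE(curr->held_locks));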

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alfredo Alvarez Fernandez
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Sedat Dilek
    Cc: Theodore Ts'o
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20160330093659.GS3408@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • task_irq_context() returns the encoded irq_context of the task; the
    return value is encoded in the same way as ->irq_context of held_lock.

    It always returns 0 if !(CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING).

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Josh Triplett
    Cc: Lai Jiangshan
    Cc: Linus Torvalds
    Cc: Mathieu Desnoyers
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: sasha.levin@oracle.com
    Link: http://lkml.kernel.org/r/1455602265-16490-2-git-send-email-boqun.feng@gmail.com
    Signed-off-by: Ingo Molnar

    Boqun Feng
     

22 Apr, 2016

1 commit

  • Now that all the architectures implement the necessary glue code, we can
    introduce down_write_killable(). The only difference with respect to the
    regular down_write() is that the slow path waits in TASK_KILLABLE state
    and interruption by a fatal signal is reported as -EINTR to the caller.
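
    [ The resulting calling convention, as a minimal sketch; mmap_sem is
    just the motivating example mentioned in the merge above, and returning
    -EINTR directly is only one possible caller policy: ]

        if (down_write_killable(&mm->mmap_sem))
                return -EINTR;          /* a fatal signal interrupted the wait */

        /* ... modify the address space ... */

        up_write(&mm->mmap_sem);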

    Signed-off-by: Michal Hocko
    Cc: Andrew Morton
    Cc: Chris Zankel
    Cc: David S. Miller
    Cc: Linus Torvalds
    Cc: Max Filippov
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Signed-off-by: Davidlohr Bueso
    Cc: Signed-off-by: Jason Low
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-ia64@vger.kernel.org
    Cc: linux-s390@vger.kernel.org
    Cc: linux-sh@vger.kernel.org
    Cc: linux-xtensa@linux-xtensa.org
    Cc: sparclinux@vger.kernel.org
    Link: http://lkml.kernel.org/r/1460041951-22347-12-git-send-email-mhocko@kernel.org
    Signed-off-by: Ingo Molnar

    Michal Hocko
     

19 Apr, 2016

1 commit

  • While playing with the qstat statistics (in /qlockstat/) I ran into
    the following splat on a VM when opening pv_hash_hops:

    divide error: 0000 [#1] SMP
    ...
    RIP: 0010:[] [] qstat_read+0x12e/0x1e0
    ...
    Call Trace:
    [] ? mem_cgroup_commit_charge+0x6c/0xd0
    [] ? page_add_new_anon_rmap+0x8c/0xd0
    [] ? handle_mm_fault+0x1439/0x1b40
    [] ? do_mmap+0x449/0x550
    [] ? __vfs_read+0x23/0xd0
    [] ? rw_verify_area+0x52/0xd0
    [] ? vfs_read+0x81/0x120
    [] ? SyS_read+0x42/0xa0
    [] ? entry_SYSCALL_64_fastpath+0x1e/0xa8

    Fix this by verifying that qstat_pv_kick_unlock is in fact non-zero,
    similarly to what the qstat_pv_latency_wake case does; if nothing else,
    this can come from resetting the statistics, so having 0 kicks is quite
    valid in this context.
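
    [ The essence of the guard, in sketch form; per_cpu_sum() is a
    hypothetical helper standing in for the per-CPU summation the stat
    code performs: ]

        u64 kicks = per_cpu_sum(qstat_pv_kick_unlock);
        u64 hops  = per_cpu_sum(qstat_pv_hash_hops);
        u64 avg   = 0;

        if (kicks)              /* stats may have just been reset; 0 kicks is valid */
                avg = div64_u64(hops, kicks);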

    Signed-off-by: Davidlohr Bueso
    Reviewed-by: Waiman Long
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: waiman.long@hpe.com
    Link: http://lkml.kernel.org/r/1460961103-24953-1-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

13 Apr, 2016

5 commits

  • Introduce a generic implementation necessary for down_write_killable().

    This is a trivial extension of the already existing down_write() call
    which can be interrupted by SIGKILL. This patch doesn't provide
    down_write_killable() yet because architectures first have to provide
    the necessary pieces.

    rwsem_down_write_failed() which is a generic slow path for the
    write lock is extended to take a task state and renamed to
    __rwsem_down_write_failed_common(). The return value is either a valid
    semaphore pointer or ERR_PTR(-EINTR).

    rwsem_down_write_failed_killable() is exported as a new way to wait for
    the lock and be killable.

    For the rwsem-spinlock implementation, the current __down_write() is
    updated in a similar way to __rwsem_down_write_failed_common(), except
    that it doesn't need new exports, just a visible __down_write_killable().

    Architectures which are not using the generic rwsem implementation are
    supposed to provide their __down_write_killable() implementation and
    use rwsem_down_write_failed_killable() for the slow path.
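
    [ The shape of the split described above, as a sketch (the common
    helper's body is elided): ]

        /* generic write-lock slowpath, now parameterised by the sleep state */
        extern struct rw_semaphore *
        __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state);

        struct rw_semaphore * __sched
        rwsem_down_write_failed(struct rw_semaphore *sem)
        {
                return __rwsem_down_write_failed_common(sem, TASK_UNINTERRUPTIBLE);
        }

        struct rw_semaphore * __sched
        rwsem_down_write_failed_killable(struct rw_semaphore *sem)
        {
                return __rwsem_down_write_failed_common(sem, TASK_KILLABLE);
        }
        EXPORT_SYMBOL(rwsem_down_write_failed_killable);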

    Signed-off-by: Michal Hocko
    Cc: Andrew Morton
    Cc: Chris Zankel
    Cc: David S. Miller
    Cc: Linus Torvalds
    Cc: Max Filippov
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Signed-off-by: Davidlohr Bueso
    Cc: Signed-off-by: Jason Low
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-ia64@vger.kernel.org
    Cc: linux-s390@vger.kernel.org
    Cc: linux-sh@vger.kernel.org
    Cc: linux-xtensa@linux-xtensa.org
    Cc: sparclinux@vger.kernel.org
    Link: http://lkml.kernel.org/r/1460041951-22347-7-git-send-email-mhocko@kernel.org
    Signed-off-by: Ingo Molnar

    Michal Hocko
     
  • This is no longer used anywhere and all callers (__down_write()) use
    0 as a subclass. Ditch __down_write_nested() to make the code easier
    to follow.

    This shouldn't introduce any functional change.

    Signed-off-by: Michal Hocko
    Acked-by: Davidlohr Bueso
    Cc: Andrew Morton
    Cc: Chris Zankel
    Cc: David S. Miller
    Cc: Linus Torvalds
    Cc: Max Filippov
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Signed-off-by: Davidlohr Bueso
    Cc: Signed-off-by: Jason Low
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-ia64@vger.kernel.org
    Cc: linux-s390@vger.kernel.org
    Cc: linux-sh@vger.kernel.org
    Cc: linux-xtensa@linux-xtensa.org
    Cc: sparclinux@vger.kernel.org
    Link: http://lkml.kernel.org/r/1460041951-22347-2-git-send-email-mhocko@kernel.org
    Signed-off-by: Ingo Molnar

    Michal Hocko
     
  • This function compiles to 1328 bytes of machine code. Three callsites.

    Registering a new lock class is definitely not *that* time-critical to inline it.

    Signed-off-by: Denys Vlasenko
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1460141926-13069-5-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     
  • It has been found that paths that invoke cleanups through
    lock_torture_cleanup() can trigger NULL pointer dereferencing
    bugs during the statistics printing phase. This is mainly
    because we should not be calling into statistics before we are
    sure things have been set up correctly.

    Specifically, early checks (and the need for handling this in
    the cleanup call) only include parameter checks and basic
    statistics allocation. Once we start write/read kthreads
    we then consider the test as started. As such, update the function
    in question to check for cxt.lwsa writer stats; if they are not set,
    we either have a bogus parameter or an -ENOMEM situation and
    therefore only need to deal with general torture calls.
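
    [ The gist of the check, sketched against the description above; the
    exact placement and the torture_cleanup_end() call are simplifications
    of the real cleanup path: ]

        static void lock_torture_cleanup(void)
        {
                /*
                 * Early cleanup: the test never really started (bogus
                 * parameters or -ENOMEM), so there are no statistics to
                 * print; only undo the generic torture bookkeeping.
                 */
                if (!cxt.lwsa) {
                        torture_cleanup_end();
                        return;
                }

                /* ... normal teardown and statistics printing ... */
        }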

    Reported-and-tested-by: Kefeng Wang
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Paul E. McKenney
    Cc: Andrew Morton
    Cc: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: bobby.prani@gmail.com
    Cc: dhowells@redhat.com
    Cc: dipankar@in.ibm.com
    Cc: dvhart@linux.intel.com
    Cc: edumazet@google.com
    Cc: fweisbec@gmail.com
    Cc: jiangshanlai@gmail.com
    Cc: josh@joshtriplett.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: oleg@redhat.com
    Cc: rostedt@goodmis.org
    Link: http://lkml.kernel.org/r/1460476038-27060-2-git-send-email-paulmck@linux.vnet.ibm.com
    [ Improved the changelog. ]
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • For the case of rtmutex torturing we will randomly call into the
    boost() handler, including upon module exiting when the tasks are
    deboosted before stopping. In such cases the task may or may not have
    already been boosted, and therefore the NULL being explicitly passed
    can occur anywhere. Currently we simply assume that the task is at a
    higher prio and, in consequence, dereference a NULL pointer.

    This patch fixes the case of a rmmod locktorture exploding while
    pounding on the rtmutex lock (partial trace):

    task: ffff88081026cf80 ti: ffff880816120000 task.ti: ffff880816120000
    RIP: 0010:[] [] torture_random+0x5/0x60 [torture]
    RSP: 0018:ffff880816123eb0 EFLAGS: 00010206
    RAX: ffff88081026cf80 RBX: ffff880816bfa630 RCX: 0000000000160d1b
    RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
    RBP: ffff88081026cf80 R08: 000000000000001f R09: ffff88017c20ca80
    R10: 0000000000000000 R11: 000000000048c316 R12: ffffffffa05d1840
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff88203f880000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 0000000001c0a000 CR4: 00000000000406e0
    Stack:
    ffffffffa05d141d ffff880816bfa630 ffffffffa05d1922 ffff88081e70c2c0
    ffff880816bfa630 ffffffff81095fed 0000000000000000 ffffffff8107bf60
    ffff880816bfa630 ffffffff00000000 ffff880800000000 ffff880816123f08
    Call Trace:
    [] torture_rtmutex_boost+0x1d/0x90 [locktorture]
    [] lock_torture_writer+0xe2/0x170 [locktorture]
    [] kthread+0xbd/0xe0
    [] ret_from_fork+0x3f/0x70

    This patch ensures that if the random state pointer is not NULL and
    current is not boosted, nothing is done.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Paul E. McKenney
    Cc: Andrew Morton
    Cc: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: bobby.prani@gmail.com
    Cc: dhowells@redhat.com
    Cc: dipankar@in.ibm.com
    Cc: dvhart@linux.intel.com
    Cc: edumazet@google.com
    Cc: fweisbec@gmail.com
    Cc: jiangshanlai@gmail.com
    Cc: josh@joshtriplett.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: oleg@redhat.com
    Cc: rostedt@goodmis.org
    Link: http://lkml.kernel.org/r/1460476038-27060-1-git-send-email-paulmck@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

04 Apr, 2016

1 commit

  • Fix this:

    kernel/locking/lockdep.c:2051:13: warning: ‘print_collision’ defined but not used [-Wunused-function]
    static void print_collision(struct task_struct *curr,
    ^

    Signed-off-by: Borislav Petkov
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1459759327-2880-1-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

31 Mar, 2016

1 commit

  • A sequence of pairs [class_idx -> corresponding chain_key iteration]
    is printed for both the current held_lock chain and the cached chain.

    That exposes the two different class_idx sequences that led to that
    particular hash value.

    This helps with debugging hash chain collision reports.

    Signed-off-by: Alfredo Alvarez Fernandez
    Acked-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-fsdevel@vger.kernel.org
    Cc: sedat.dilek@gmail.com
    Cc: tytso@mit.edu
    Link: http://lkml.kernel.org/r/1459357416-19190-1-git-send-email-alfredoalvarezernandez@gmail.com
    Signed-off-by: Ingo Molnar

    Alfredo Alvarez Fernandez
     

23 Mar, 2016

1 commit

  • kcov provides code coverage collection for coverage-guided fuzzing
    (randomized testing). Coverage-guided fuzzing is a testing technique
    that uses coverage feedback to determine new interesting inputs to a
    system. A notable user-space example is AFL
    (http://lcamtuf.coredump.cx/afl/). However, this technique is not
    widely used for kernel testing due to missing compiler and kernel
    support.

    kcov does not aim to collect as much coverage as possible. It aims to
    collect more or less stable coverage that is a function of syscall
    inputs. To achieve this goal it does not collect coverage in soft/hard
    interrupts, and instrumentation of some inherently non-deterministic or
    non-interesting parts of the kernel is disabled (e.g. scheduler, locking).

    Currently there is a single coverage collection mode (tracing), but the
    API anticipates additional collection modes. Initially I also
    implemented a second mode which exposes coverage in a fixed-size hash
    table of counters (what Quentin used in his original patch). I've
    dropped the second mode for simplicity.

    This patch adds the necessary support on the kernel side. The
    complementary compiler support was added in gcc revision 231296.

    We've used this support to build the syzkaller system call fuzzer, which
    has found 90 kernel bugs in just 2 months:

    https://github.com/google/syzkaller/wiki/Found-Bugs

    We've also found 30+ bugs in our internal systems with syzkaller.
    Another (yet unexplored) direction where kcov coverage would greatly
    help is more traditional "blob mutation". For example, mounting a
    random blob as a filesystem, or receiving a random blob over wire.

    Why not gcov? A typical fuzzing loop looks as follows: (1) reset
    coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
    typical coverage can be just a dozen basic blocks (e.g. an invalid
    input). In such a context gcov becomes prohibitively expensive, as the
    reset/collect coverage steps depend on the total number of basic
    blocks/edges in the program (in the case of the kernel it is about 2M).
    The cost of kcov depends only on the number of executed basic
    blocks/edges. On top of that, the kernel requires per-thread coverage
    because there are always background threads and unrelated processes
    that also produce coverage. With inlined gcov instrumentation,
    per-thread coverage is not possible.

    kcov exposes kernel PCs and control flow to user-space which is
    insecure. But debugfs should not be mapped as user accessible.
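
    [ For reference, a condensed user-space usage sketch in the style of the
    kcov documentation; the ioctl numbers and the debugfs path follow that
    documentation and should be treated as illustrative: ]

        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/ioctl.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
        #define KCOV_ENABLE     _IO('c', 100)
        #define KCOV_DISABLE    _IO('c', 101)
        #define COVER_SIZE      (64 << 10)

        int main(void)
        {
                unsigned long *cover, n, i;
                int fd = open("/sys/kernel/debug/kcov", O_RDWR);

                if (fd == -1 || ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE))
                        exit(1);
                cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
                             PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
                if (cover == MAP_FAILED || ioctl(fd, KCOV_ENABLE, 0))
                        exit(1);

                cover[0] = 0;                   /* reset coverage */
                read(-1, NULL, 0);              /* the code under test */
                n = cover[0];                   /* number of collected PCs */

                for (i = 0; i < n; i++)
                        printf("0x%lx\n", cover[i + 1]);

                ioctl(fd, KCOV_DISABLE, 0);
                close(fd);
                return 0;
        }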

    Based on a patch by Quentin Casasnovas.

    [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
    [akpm@linux-foundation.org: unbreak allmodconfig]
    [akpm@linux-foundation.org: follow x86 Makefile layout standards]
    Signed-off-by: Dmitry Vyukov
    Reviewed-by: Kees Cook
    Cc: syzkaller
    Cc: Vegard Nossum
    Cc: Catalin Marinas
    Cc: Tavis Ormandy
    Cc: Will Deacon
    Cc: Quentin Casasnovas
    Cc: Kostya Serebryany
    Cc: Eric Dumazet
    Cc: Alexander Potapenko
    Cc: Kees Cook
    Cc: Bjorn Helgaas
    Cc: Sasha Levin
    Cc: David Drysdale
    Cc: Ard Biesheuvel
    Cc: Andrey Ryabinin
    Cc: Kirill A. Shutemov
    Cc: Jiri Slaby
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     

16 Mar, 2016

1 commit

  • $ make tags
    GEN tags
    ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
    ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
    ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1"
    ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
    ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
    ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1"
    ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
    ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
    ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"

    Which are all the result of the DEFINE_PER_CPU pattern:

    scripts/tags.sh:200: '/\
    Acked-by: David S. Miller
    Acked-by: Rafael J. Wysocki
    Cc: Tejun Heo
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

29 Feb, 2016

7 commits

  • Add detection for chain_key collision under CONFIG_DEBUG_LOCKDEP.
    When a collision is detected the problem is reported and all lock
    debugging is turned off.

    Tested using liblockdep and the added tests before and after
    applying the fix, confirming both that the code added for the
    detection correctly reports the problem and that the fix actually
    fixes it.

    Tested tweaking lockdep to generate false collisions and
    verified that the problem is reported and that lock debugging is
    turned off.

    Also tested with lockdep's test suite after applying the patch:

    [ 0.000000] Good, all 253 testcases passed! |

    Signed-off-by: Alfredo Alvarez Fernandez
    Cc: Alfredo Alvarez Fernandez
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: sasha.levin@oracle.com
    Link: http://lkml.kernel.org/r/1455864533-7536-4-git-send-email-alfredoalvarezernandez@gmail.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The chain_key hashing macro iterate_chain_key(key1, key2) does not
    generate a new different value if both key1 and key2 are 0. In that
    case the generated value is again 0. This can lead to collisions which
    can result in lockdep not detecting deadlocks or circular
    dependencies.

    Avoid the problem by using class_idx (1-based) instead of class id
    (0-based) as an input for the hashing macro 'key2' in
    iterate_chain_key(key1, key2).

    The use of class id created collisions in cases like the following:

    1.- Consider an initial state in which no class has been acquired yet.
    Under these circumstances an AA deadlock will not be detected by
    lockdep:

    lock [key1,key2]->new key (key1=old chain_key, key2=id)
    --------------------------
    A [0,0]->0
    A [0,0]->0 (collision)

    The newly generated chain_key collides with the one used before and as
    a result the check for a deadlock is skipped

    A simple test using liblockdep and a pthread mutex confirms the
    problem: (omitting stack traces)

    new class 0xe15038: 0x7ffc64950f20
    acquire class [0xe15038] 0x7ffc64950f20
    acquire class [0xe15038] 0x7ffc64950f20
    hash chain already cached, key: 0000000000000000 tail class:
    [0xe15038] 0x7ffc64950f20

    2.- Consider an ABBA in 2 different tasks and no class yet acquired.

    T1 [key1,key2]->new key          T2 [key1,key2]->new key
    --                               --
    A  [0,0]->0

                                     B [0,1]->1

    B  [0,1]->1 (collision)

                                     A

    In this case the collision prevents lockdep from creating the new
    dependency A->B. This in turn results in lockdep not detecting the
    circular dependency when T2 acquires A.

    Signed-off-by: Alfredo Alvarez Fernandez
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: sasha.levin@oracle.com
    Link: http://lkml.kernel.org/r/1455147212-2389-4-git-send-email-alfredoalvarezernandez@gmail.com
    Signed-off-by: Ingo Molnar

    Alfredo Alvarez Fernandez
     
  • Make use of wake-queues and enable the wakeup to occur after releasing
    the wait_lock. This is similar to what we do with the rtmutex top
    waiter, slightly shortening the critical region and allowing other
    waiters to acquire the wait_lock sooner. In low contention cases it can
    also help the recently woken waiter to find the wait_lock available
    (fastpath) when it continues execution.
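
    [ A sketch of the pattern, assuming the wake_q helpers as they existed
    at the time (WAKE_Q(), wake_q_add(), wake_up_q()); the surrounding
    mutex internals are elided: ]

        WAKE_Q(wake_q);

        raw_spin_lock(&lock->wait_lock);
        /* ... pick the waiter to be woken ... */
        wake_q_add(&wake_q, waiter->task);
        raw_spin_unlock(&lock->wait_lock);

        wake_up_q(&wake_q);     /* the wakeup happens after wait_lock is dropped */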

    Reviewed-by: Waiman Long
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Ding Tianhong
    Cc: Jason Low
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tim Chen
    Cc: Waiman Long
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20160125022343.GA3322@linux-uzut.site
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • This patch enables the tracking of the number of slowpath locking
    operations performed. This can be used to compare against the number
    of lock stealing operations to see what percentage of locks are stolen
    versus acquired via the regular slowpath.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Douglas Hatch
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1449778666-13593-2-git-send-email-Waiman.Long@hpe.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • The newly introduced smp_cond_acquire() was used to replace the
    slowpath lock acquisition loop. Similarly, the new function can also
    be applied to the pending bit locking loop. This patch uses the new
    function in that loop.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Douglas Hatch
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1449778666-13593-1-git-send-email-Waiman.Long@hpe.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • This patch moves the lock stealing count tracking code into
    pv_queued_spin_steal_lock() instead of via a jacket function,
    simplifying the code.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Douglas Hatch
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1449778666-13593-3-git-send-email-Waiman.Long@hpe.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • Similar to commit b4b29f94856a ("locking/osq: Fix ordering of node
    initialisation in osq_lock"), the use of xchg_acquire() is
    fundamentally broken with MCS-like constructs.

    Furthermore, it turns out we rely on the global transitivity of this
    operation because the unlock path observes the pointer with a
    READ_ONCE(), not an smp_load_acquire().

    This is non-critical because the MCS code isn't actually used and
    mostly serves as documentation, a stepping stone to the more complex
    things we've built on top of the idea.

    Reported-by: Andrea Parri
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Fixes: 3552a07a9c4a ("locking/mcs: Use acquire/release semantics")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

09 Feb, 2016

3 commits

  • Lockdep is initialized at compile time now. Get rid of lockdep_init().

    Signed-off-by: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Cc: Linus Torvalds
    Cc: Mike Krinkin
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Cc: mm-commits@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Andrey Ryabinin
     
  • Mike said:

    : CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i.e.
    : kernel with CONFIG_UBSAN_ALIGNMENT=y fails to load without even any error
    : message.
    :
    : The problem is that ubsan callbacks use spinlocks and might be called
    : before lockdep is initialized. Particularly this line in the
    : reserve_ebda_region function causes problem:
    :
    : lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
    :
    : If i put lockdep_init() before reserve_ebda_region call in
    : x86_64_start_reservations kernel loads well.

    Fix this ordering issue permanently: change lockdep so that it uses hlists
    for the hash tables. Unlike a list_head, an hlist_head is in its
    initialized state when it is all-zeroes, so lockdep is ready for operation
    immediately upon boot - lockdep_init() need not have run.

    The patch will also save some memory.

    Probably lockdep_init() and lockdep_initialized can be done away with now.
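
    [ The property being relied on, in sketch form; classhash_table and
    CLASSHASH_SIZE are lockdep's names, while old_table is just a stand-in
    for the previous list_head-based table: ]

        /*
         * A zeroed list_head is NOT an empty list: both pointers must point
         * back at the head, which requires LIST_HEAD_INIT or runtime init.
         */
        static struct list_head  old_table[CLASSHASH_SIZE];    /* unusable while all-zero */

        /*
         * A zeroed hlist_head (first == NULL) IS a valid empty list, so a
         * static table in BSS works before any initialisation code runs.
         */
        static struct hlist_head classhash_table[CLASSHASH_SIZE];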

    Suggested-by: Mike Krinkin
    Reported-by: Mike Krinkin
    Signed-off-by: Andrew Morton
    Cc: Andrey Ryabinin
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Cc: mm-commits@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Andrew Morton
     
  • check_prev_add() caches the saved stack trace in a static trace variable
    to avoid duplicate save_trace() calls in dependencies involving trylocks.
    But that caching logic contains a bug. We may not save the trace on the
    first iteration due to an early return from check_prev_add(). Then on
    the second iteration, when we actually need the trace, we don't save it
    because we think that we've already saved it.

    Let check_prev_add() itself control when the stack is saved.

    There is another bug: the trace variable is protected by the graph lock,
    but we can temporarily release the graph lock during printing.

    Fix this by invalidating the cached stack trace when we release the
    graph lock.

    Signed-off-by: Dmitry Vyukov
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: glider@google.com
    Cc: kcc@google.com
    Cc: peter@hurleysoftware.com
    Cc: sasha.levin@oracle.com
    Link: http://lkml.kernel.org/r/1454593240-121647-1-git-send-email-dvyukov@google.com
    Signed-off-by: Ingo Molnar

    Dmitry Vyukov
     

26 Jan, 2016

1 commit

  • Sasha reported a lockdep splat about a potential deadlock between RCU boosting
    rtmutex and the posix timer it_lock.

    CPU0                                    CPU1

    rtmutex_lock(&rcu->rt_mutex)
      spin_lock(&rcu->rt_mutex.wait_lock)
                                            local_irq_disable()
                                            spin_lock(&timer->it_lock)
                                            spin_lock(&rcu->mutex.wait_lock)
    --> Interrupt
        spin_lock(&timer->it_lock)

    This is caused by the following code sequence on CPU1

    rcu_read_lock();
    x = lookup();
    if (x)
            spin_lock_irqsave(&x->it_lock);
    rcu_read_unlock();
    return x;

    We could fix that in the posix timer code by keeping rcu read locked across
    the spinlocked and irq disabled section, but the above sequence is common and
    there is no reason not to support it.

    Making rt_mutex.wait_lock irq safe prevents the deadlock.

    Reported-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Paul McKenney

    Thomas Gleixner
     

12 Jan, 2016

1 commit

  • Pull locking updates from Ingo Molnar:
    "So we have a laundry list of locking subsystem changes:

    - continuing barrier API and code improvements

    - futex enhancements

    - atomics API improvements

    - pvqspinlock enhancements: in particular lock stealing and adaptive
    spinning

    - qspinlock micro-enhancements"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
    futex: Cleanup the goto confusion in requeue_pi()
    futex: Remove pointless put_pi_state calls in requeue()
    futex: Document pi_state refcounting in requeue code
    futex: Rename free_pi_state() to put_pi_state()
    futex: Drop refcount if requeue_pi() acquired the rtmutex
    locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation
    locking/barriers, arch: Use smp barriers in smp_store_release()
    locking/cmpxchg, arch: Remove tas() definitions
    locking/pvqspinlock: Queue node adaptive spinning
    locking/pvqspinlock: Allow limited lock stealing
    locking/pvqspinlock: Collect slowpath lock statistics
    sched/core, locking: Document Program-Order guarantees
    locking, sched: Introduce smp_cond_acquire() and use it
    locking/pvqspinlock, x86: Optimize the PV unlock code path
    locking/qspinlock: Avoid redundant read of next pointer
    locking/qspinlock: Prefetch the next node cacheline
    locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg()
    atomics: Add test for atomic operations with _relaxed variants

    Linus Torvalds
     

18 Dec, 2015

1 commit

  • The Cavium guys reported a soft lockup on their arm64 machine, caused by
    commit c55a6ffa6285 ("locking/osq: Relax atomic semantics"):

    mutex_optimistic_spin+0x9c/0x1d0
    __mutex_lock_slowpath+0x44/0x158
    mutex_lock+0x54/0x58
    kernfs_iop_permission+0x38/0x70
    __inode_permission+0x88/0xd8
    inode_permission+0x30/0x6c
    link_path_walk+0x68/0x4d4
    path_openat+0xb4/0x2bc
    do_filp_open+0x74/0xd0
    do_sys_open+0x14c/0x228
    SyS_openat+0x3c/0x48
    el0_svc_naked+0x24/0x28

    This is because in osq_lock we initialise the node for the current CPU:

    node->locked = 0;
    node->next = NULL;
    node->cpu = curr;

    and then publish the current CPU in the lock tail:

    old = atomic_xchg_acquire(&lock->tail, curr);

    Once the update to lock->tail is visible to another CPU, the node is
    then live and can be both read and updated by concurrent lockers.

    Unfortunately, the ACQUIRE semantics of the xchg operation mean that
    there is no guarantee the contents of the node will be visible before
    lock tail is updated. This can lead to lock corruption when, for
    example, a concurrent locker races to set the next field.
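
    [ In essence, the change replaces the relaxed exchange with a fully
    ordered one; a sketch rather than the literal diff: ]

        /* before (broken): ACQUIRE does not order the node stores above it */
        old = atomic_xchg_acquire(&lock->tail, curr);

        /* after: the full barrier in xchg() publishes the initialised node */
        old = atomic_xchg(&lock->tail, curr);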

    Fixes: c55a6ffa6285 ("locking/osq: Relax atomic semantics")
    Reported-by: David Daney
    Reported-by: Andrew Pinski
    Tested-by: Andrew Pinski
    Acked-by: Davidlohr Bueso
    Signed-off-by: Will Deacon
    Signed-off-by: Peter Zijlstra (Intel)
    Link: http://lkml.kernel.org/r/1449856001-21177-1-git-send-email-will.deacon@arm.com
    Signed-off-by: Linus Torvalds

    Will Deacon
     

04 Dec, 2015

2 commits

  • In an overcommitted guest where some vCPUs have to be halted to make
    forward progress in other areas, it is highly likely that a vCPU later
    in the spinlock queue will be spinning while the ones earlier in the
    queue would have been halted. The spinning in the later vCPUs is then
    just a waste of precious CPU cycles because they are not going to
    get the lock soon, as the earlier ones have to be woken up and take
    their turn to get the lock.

    This patch implements an adaptive spinning mechanism where the vCPU
    will call pv_wait() if the previous vCPU is not running.

    Linux kernel builds were run in KVM guest on an 8-socket, 4
    cores/socket Westmere-EX system and a 4-socket, 8 cores/socket
    Haswell-EX system. Both systems are configured to have 32 physical
    CPUs. The kernel build times before and after the patch were:

                     Westmere                   Haswell
    Patch            32 vCPUs   48 vCPUs        32 vCPUs   48 vCPUs
    -----            --------   --------        --------   --------
    Before patch     3m02.3s    5m00.2s         1m43.7s    3m03.5s
    After patch      3m03.0s    4m37.5s         1m43.0s    2m47.2s

    For 32 vCPUs, this patch doesn't cause any noticeable change in
    performance. For 48 vCPUs (over-committed), there is about 8%
    performance improvement.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Davidlohr Bueso
    Cc: Douglas Hatch
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1447114167-47185-8-git-send-email-Waiman.Long@hpe.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • This patch allows one attempt for the lock waiter to steal the lock
    when entering the PV slowpath. To prevent lock starvation, the pending
    bit will be set by the queue head vCPU when it is in the active lock
    spinning loop to disable any lock stealing attempt. This helps to
    reduce the performance penalty caused by lock waiter preemption while
    not having much of the downsides of a real unfair lock.

    The pv_wait_head() function was renamed as pv_wait_head_or_lock()
    as it was modified to acquire the lock before returning. This is
    necessary because of possible lock stealing attempts from other tasks.

    Linux kernel builds were run in KVM guest on an 8-socket, 4
    cores/socket Westmere-EX system and a 4-socket, 8 cores/socket
    Haswell-EX system. Both systems are configured to have 32 physical
    CPUs. The kernel build times before and after the patch were:

                     Westmere                   Haswell
    Patch            32 vCPUs   48 vCPUs        32 vCPUs   48 vCPUs
    -----            --------   --------        --------   --------
    Before patch     3m15.6s    10m56.1s        1m44.1s    5m29.1s
    After patch      3m02.3s    5m00.2s         1m43.7s    3m03.5s

    For the overcommitted case (48 vCPUs), this patch is able to reduce
    kernel build time by more than 54% for Westmere and 44% for Haswell.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Davidlohr Bueso
    Cc: Douglas Hatch
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1447190336-53317-1-git-send-email-Waiman.Long@hpe.com
    Signed-off-by: Ingo Molnar

    Waiman Long