20 Apr, 2015
1 commit
-
Commit 8053871d0f7f ("smp: Fix smp_call_function_single_async()
locking") fixed the locking for the asynchronous smp-call case, but in
the process of moving the lock handling around, one of the error cases
ended up not unlocking the call data at all.

This went unnoticed on x86, because this is a "caller is buggy" case,
where the caller is trying to call a non-existent CPU. But apparently
ARM does that (at least under qemu-arm). Blindly doing cross-cpu calls
to random CPUs that aren't even online seems a bit fishy, but the error
handling was clearly not correct.

Simply add the missing "csd_unlock()" to the error path.
Reported-and-tested-by: Guenter Roeck
Analyzed-by: Rabin Vincent
Acked-by: Ingo Molnar
Signed-off-by: Linus Torvalds
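
As an illustration of the fix's shape, here is a minimal sketch of
generic_exec_single() with the missing unlock restored, assuming the
csd/llist queueing described in the next entry; treat the details as
illustrative rather than the verbatim diff:

static int generic_exec_single(int cpu, struct call_single_data *csd,
                               smp_call_func_t func, void *info)
{
        if (cpu == smp_processor_id()) {
                unsigned long flags;

                /* Run locally; can deadlock if IRQs are disabled. */
                local_irq_save(flags);
                func(info);
                local_irq_restore(flags);
                return 0;
        }

        if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
                csd_unlock(csd);        /* the previously missing unlock */
                return -ENXIO;
        }

        csd->func = func;
        csd->info = info;

        /* Raise the IPI only if we made the queue non-empty. */
        if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
                arch_send_call_function_single_ipi(cpu);

        return 0;
}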
17 Apr, 2015
1 commit
-
The current smp_call_function code suffers a number of problems, most
notably smp_call_function_single_async() is broken.

The problem is that flush_smp_call_function_queue() does csd_unlock()
_after_ calling csd->func(). This means that a caller cannot properly
synchronize the csd usage as it has to.

Change the code to release the csd before calling ->func() for the
async case, and put a WARN_ON_ONCE(csd->flags & CSD_FLAG_LOCK) in
smp_call_function_single_async() to warn us of improper serialization,
because any waiting there can result in deadlocks when called with
IRQs disabled.

Rename the (currently) unused WAIT flag to SYNCHRONOUS and (re)use it
such that we know what to do in flush_smp_call_function_queue().

Rework csd_{,un}lock() to use smp_load_acquire() / smp_store_release()
to avoid some full barriers while more clearly providing lock
semantics.

Finally move the csd maintenance out of generic_exec_single() into its
callers for clearer code.

Signed-off-by: Linus Torvalds
[ Added changelog. ]
Signed-off-by: Peter Zijlstra (Intel)
Cc: Frederic Weisbecker
Cc: Jens Axboe
Cc: Rafael David Tinoco
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/CA+55aFz492bzLFhdbKN-Hygjcreup7CjMEYk3nTSfRWjppz-OA@mail.gmail.com
Signed-off-by: Ingo Molnar
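
A minimal sketch of the acquire/release locking described above; the
upstream helpers differ in detail, so read this as illustrative:

/* Sketch: csd lock/unlock with acquire/release ordering. */
static void csd_lock_wait(struct call_single_data *csd)
{
        /* Acquire: pairs with the release in csd_unlock(). */
        while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
                cpu_relax();
}

static void csd_lock(struct call_single_data *csd)
{
        csd_lock_wait(csd);
        csd->flags |= CSD_FLAG_LOCK;

        /* Order the flag store before the caller's func/info stores. */
        smp_wmb();
}

static void csd_unlock(struct call_single_data *csd)
{
        WARN_ON(!(csd->flags & CSD_FLAG_LOCK));

        /* Release: all csd updates are visible before the flag clears. */
        smp_store_release(&csd->flags, csd->flags & ~CSD_FLAG_LOCK);
}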
15 Oct, 2014
1 commit
-
Pull percpu consistent-ops changes from Tejun Heo:
"Way back, before the current percpu allocator was implemented, static
and dynamic percpu memory areas were allocated and handled separately
and had their own accessors. The distinction has been gone for many
years now; however, the now duplicate two sets of accessors remained
with the pointer based ones - this_cpu_*() - evolving various other
operations over time. During the process, we also accumulated other
inconsistent operations.

This pull request contains Christoph's patches to clean up the
duplicate accessor situation. __get_cpu_var() uses are replaced
with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

Unfortunately, the former sometimes is tricky thanks to C being a bit
messy with the distinction between lvalues and pointers, which led to
a rather ugly solution for cpumask_var_t involving the introduction of
this_cpu_cpumask_var_ptr().

This converts most of the uses but not all. Christoph will follow up
with the remaining conversions in this merge window and hopefully
remove the obsolete accessors"

* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
irqchip: Properly fetch the per cpu offset
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
Revert "powerpc: Replace __get_cpu_var uses"
percpu: Remove __this_cpu_ptr
clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
sparc: Replace __get_cpu_var uses
avr32: Replace __get_cpu_var with __this_cpu_write
blackfin: Replace __get_cpu_var uses
tile: Use this_cpu_ptr() for hardware counters
tile: Replace __get_cpu_var uses
powerpc: Replace __get_cpu_var uses
alpha: Replace __get_cpu_var
ia64: Replace __get_cpu_var uses
s390: cio driver &__get_cpu_var replacements
s390: Replace __get_cpu_var uses
mips: Replace __get_cpu_var uses
MIPS: Replace __get_cpu_var uses in FPU emulator.
arm: Replace __this_cpu_ptr with raw_cpu_ptr
...
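
A short before/after illustration of the accessor cleanup described in
this pull request; the per-cpu variable here is hypothetical, only the
accessor names are real:

static DEFINE_PER_CPU(struct llist_head, example_list);  /* hypothetical */

static void accessor_example(void)
{
        /* Old: lvalue accessor, address taken by the caller. */
        /* struct llist_head *head = &__get_cpu_var(example_list); */

        /* New: pointer accessor. */
        struct llist_head *head = this_cpu_ptr(&example_list);

        /* Likewise, __this_cpu_ptr(&example_list) becomes: */
        struct llist_head *raw = raw_cpu_ptr(&example_list);

        (void)head;
        (void)raw;
}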
19 Sep, 2014
1 commit
-
Currently kick_all_cpus_sync() can break non-polling idle cpus
through IPI interrupts.

But sometimes we need to break the polling idle cpus immediately
to reselect the suitable c-state; also, for non-idle cpus, we need
to do nothing if we try to wake them up.

Here add one new function, wake_up_all_idle_cpus(), to let all cpus
out of idle based on the function wake_up_if_idle().

Signed-off-by: Chuansheng Liu
Signed-off-by: Peter Zijlstra (Intel)
Cc: daniel.lezcano@linaro.org
Cc: rjw@rjwysocki.net
Cc: linux-pm@vger.kernel.org
Cc: changcheng.liu@intel.com
Cc: xiaoming.wang@intel.com
Cc: souvik.k.chakravarty@intel.com
Cc: luto@amacapital.net
Cc: Andrew Morton
Cc: Christoph Hellwig
Cc: Frederic Weisbecker
Cc: Geert Uytterhoeven
Cc: Jan Kara
Cc: Jens Axboe
Cc: Linus Torvalds
Cc: Michal Hocko
Cc: Paul Gortmaker
Cc: Roman Gushchin
Cc: Srivatsa S. Bhat
Link: http://lkml.kernel.org/r/1409815075-4180-2-git-send-email-chuansheng.liu@intel.com
Signed-off-by: Ingo Molnar
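
A minimal sketch of the new helper as described, assuming the
wake_up_if_idle() primitive from the same series; illustrative, not the
exact upstream body:

/* Sketch: kick every other online cpu out of idle. */
void wake_up_all_idle_cpus(void)
{
        int cpu;

        preempt_disable();
        for_each_online_cpu(cpu) {
                if (cpu == smp_processor_id())
                        continue;       /* this cpu is clearly not idle */

                wake_up_if_idle(cpu);
        }
        preempt_enable();
}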
27 Aug, 2014
1 commit
-
Replace uses of __get_cpu_var for address calculation with this_cpu_ptr.
Cc: akpm@linux-foundation.org
Signed-off-by: Christoph Lameter
Signed-off-by: Tejun Heo
07 Aug, 2014
1 commit
-
The rarely-executed memory-allocation-failed callback path generates a
WARN_ON_ONCE() when smp_call_function_single() succeeds. Presumably
it's supposed to warn on failures.

Signed-off-by: Sasha Levin
Cc: Christoph Lameter
Cc: Gilad Ben-Yossef
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Tejun Heo
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
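
A minimal sketch of the polarity fix; the fallback context is elided
and the helper name is hypothetical:

/*
 * In the memory-allocation-failed fallback we call cpus one at a
 * time; smp_call_function_single() returns 0 on success, so warn
 * on a non-zero return, not on zero.
 */
static void fallback_call_one(int cpu, smp_call_func_t func, void *info,
                              int wait)
{
        /* was: WARN_ON_ONCE(!smp_call_function_single(...)); */
        WARN_ON_ONCE(smp_call_function_single(cpu, func, info, wait));
}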
16 Jul, 2014
1 commit
-
…ger tree-wide changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
24 Jun, 2014
1 commit
-
There is a race between the CPU offline code (within stop-machine) and
the smp-call-function code, which can lead to getting IPIs on the
outgoing CPU, *after* it has gone offline.

Specifically, this can happen when using
smp_call_function_single_async() to send the IPI, since this API allows
sending asynchronous IPIs from IRQ disabled contexts. The exact race
condition is described below.

During CPU offline, in stop-machine, we don't enforce any rule in the
_DISABLE_IRQ stage, regarding the order in which the outgoing CPU and
the other CPUs disable their local interrupts. Due to this, we can
encounter a situation in which an IPI is sent by one of the other CPUs
to the outgoing CPU (while it is *still* online), but the outgoing CPU
ends up noticing it only *after* it has gone offline.

             CPU 1                               CPU 2
         (Online CPU)                      (CPU going offline)

      Enter _PREPARE stage                 Enter _PREPARE stage

                                           Enter _DISABLE_IRQ stage

                                         =
      Got a device interrupt, and the    | Didn't notice the IPI
      interrupt handler sent an IPI to   | since interrupts were
      CPU 2 using                        | disabled on this CPU.
      smp_call_function_single_async()   |
                                         =

      Enter _DISABLE_IRQ stage

      Enter _RUN stage                     Enter _RUN stage

                                         =
      Busy loop with interrupts          | Invoke take_cpu_down()
      disabled.                          | and take CPU 2 offline
                                         =

      Enter _EXIT stage                    Enter _EXIT stage

      Re-enable interrupts                 Re-enable interrupts

                                           The pending IPI is noted
                                           immediately, but alas,
                                           the CPU is offline at
                                           this point.

This, of course, makes the smp-call-function IPI handler code running on
CPU 2 unhappy and it complains about "receiving an IPI on an offline
CPU".

One real example of the scenario on CPU 1 is the block layer's
complete-request call-path:

    __blk_complete_request() [interrupt-handler]
        raise_blk_irq()
            smp_call_function_single_async()

However, if we look closely, the block layer does check that the target
CPU is online before firing the IPI. So in this case, it is actually
the unfortunate ordering/timing of events in the stop-machine phase that
leads to receiving IPIs after the target CPU has gone offline.

In reality, getting a late IPI on an offline CPU is not too bad by
itself (this can happen even due to hardware latencies in IPI
send-receive). It is a bug only if the target CPU really went offline
without executing all the callbacks queued on its list. (Note that a
CPU is free to execute its pending smp-call-function callbacks in a
batch, without waiting for the corresponding IPIs to arrive for each one
of those callbacks).

So, fixing this issue can be broken up into two parts:

1. Ensure that a CPU goes offline only after executing all the
   callbacks queued on it.

2. Modify the warning condition in the smp-call-function IPI handler
   code such that it warns only if an offline CPU got an IPI *and* that
   CPU had gone offline with callbacks still pending in its queue.

Achieving part 1 is straight-forward - just flush (execute) all the
queued callbacks on the outgoing CPU in the CPU_DYING stage[1],
including those callbacks for which the source CPU's IPIs might not have
been received on the outgoing CPU yet. Once we do this, an IPI that
arrives late on the CPU going offline (either due to the race mentioned
above, or due to hardware latencies) will be completely harmless, since
the outgoing CPU would have executed all the queued callbacks before
going offline.

Overall, this fix (parts 1 and 2 put together) additionally guarantees
that we will see a warning only when the *IPI-sender code* is buggy -
that is, if it queues the callback _after_ the target CPU has gone
offline.

[1]. The CPU_DYING part needs a little more explanation: by the time we
execute the CPU_DYING notifier callbacks, the CPU would have already
been marked offline. But we want to flush out the pending callbacks at
this stage, ignoring the fact that the CPU is offline. So restructure
the IPI handler code so that we can by-pass the "is-cpu-offline?" check
in this particular case. (Of course, the right solution here is to fix
CPU hotplug to mark the CPU offline _after_ invoking the CPU_DYING
notifiers, but this requires a lot of audit to ensure that this change
doesn't break any existing code; hence let's go with the solution
proposed above until that is done).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Srivatsa S. Bhat
Suggested-by: Frederic Weisbecker
Cc: "Paul E. McKenney"
Cc: Borislav Petkov
Cc: Christoph Hellwig
Cc: Frederic Weisbecker
Cc: Gautham R Shenoy
Cc: Ingo Molnar
Cc: Mel Gorman
Cc: Mike Galbraith
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Rik van Riel
Cc: Rusty Russell
Cc: Steven Rostedt
Cc: Tejun Heo
Cc: Thomas Gleixner
Tested-by: Sachin Kamat
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
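
A minimal sketch of part 1, assuming a hotplug notifier and a flush
routine whose "warn about offline" check can be bypassed; names follow
the description above but are illustrative:

/* Sketch: flush pending smp-call-function callbacks on the dying CPU. */
static int hotplug_cfd(struct notifier_block *nfb, unsigned long action,
                       void *hcpu)
{
        switch (action & ~CPU_TASKS_FROZEN) {
        case CPU_DYING:
                /*
                 * The CPU is already marked offline here, so run the
                 * queued callbacks without the is-cpu-offline warning.
                 */
                flush_smp_call_function_queue(false);
                break;
        }
        return NOTIFY_OK;
}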
16 Jun, 2014
1 commit
-
irq work currently only supports local callbacks. However its code
is mostly ready to run remote callbacks and we have some potential users.

The full nohz subsystem currently open codes its own remote irq work
on top of the scheduler ipi when it wants a CPU to reevaluate its next
tick. However this ad hoc solution bloats the scheduler IPI.

Let's just extend the irq work subsystem to support remote queuing on top
of the generic SMP IPI to handle this kind of user. This shouldn't add
noticeable overhead.

Suggested-by: Peter Zijlstra
Acked-by: Peter Zijlstra
Cc: Andrew Morton
Cc: Eric Dumazet
Cc: Ingo Molnar
Cc: Kevin Hilman
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Viresh Kumar
Signed-off-by: Frederic Weisbecker
07 Jun, 2014
1 commit
-
There is a longstanding problem related to CPU hotplug which causes IPIs
to be delivered to offline CPUs, and the smp-call-function IPI handler
code prints out a warning whenever this is detected. Every once in a
while this (usually harmless) warning gets reported on LKML, but so far
it has not been completely fixed. Usually the solution involves finding
out the IPI sender and fixing it by adding appropriate synchronization
with CPU hotplug.

However, while going through one such internal bug report, I found that
there is a significant bug in the receiver side itself (more
specifically, in stop-machine) that can lead to this problem even when
the sender code is perfectly fine. This patchset fixes that
synchronization problem in the CPU hotplug stop-machine code.

Patch 1 adds some additional debug code to the smp-call-function
framework, to help debug such issues easily.

Patch 2 modifies the stop-machine code to ensure that any IPIs that were
sent while the target CPU was online, would be noticed and handled by
that CPU without fail before it goes offline. Thus, this avoids
scenarios where IPIs are received on offline CPUs (as long as the sender
uses proper hotplug synchronization).

In fact, I debugged the problem by using Patch 1, and found that the
payload of the IPI was always the block layer's trigger_softirq()
function. But I was not able to find anything wrong with the block
layer code. That's when I started looking at the stop-machine code and
realized that there is a race-window which makes the IPI _receiver_ the
culprit, not the sender. Patch 2 fixes that race and hence this should
put an end to most of the hard-to-debug IPI-to-offline-CPU issues.

This patch (of 2):
Today the smp-call-function code just prints a warning if we get an IPI
on an offline CPU. This info is sufficient to let us know that
something went wrong, but often it is very hard to debug exactly who
sent the IPI and why, from this info alone.

In most cases, we get the warning about the IPI to an offline CPU,
immediately after the CPU going offline comes out of the stop-machine
phase and reenables interrupts. Since all online CPUs participate in
stop-machine, the information regarding the sender of the IPI is already
lost by the time we exit the stop-machine loop. So even if we dump the
stack on each CPU at this point, we won't find anything useful since all
of them will show the stack-trace of the stopper thread. So we need a
better way to figure out who sent the IPI and why.

To achieve this, when we detect an IPI targeted to an offline CPU, loop
through the call-single-data linked list and print out the payload
(i.e., the name of the function which was supposed to be executed by the
target CPU). This would give us an insight as to who might have sent
the IPI and help us debug this further.

[akpm@linux-foundation.org: correctly suppress warning output on second and later occurrences]
Signed-off-by: Srivatsa S. Bhat
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Tejun Heo
Cc: Rusty Russell
Cc: Frederic Weisbecker
Cc: Christoph Hellwig
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Borislav Petkov
Cc: Steven Rostedt
Cc: Mike Galbraith
Cc: Gautham R Shenoy
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Rafael J. Wysocki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
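
A minimal sketch of the debug printout, assuming the csd llist layout
used elsewhere in this log; names are illustrative:

/* Sketch: name every callback that was queued to an offline CPU. */
static void warn_ipi_to_offline_cpu(struct llist_node *entry)
{
        static bool warned;
        struct call_single_data *csd;

        if (warned)
                return;         /* suppress second and later occurrences */
        warned = true;

        WARN(1, "IPI on offline CPU %d\n", smp_processor_id());

        llist_for_each_entry(csd, entry, llist)
                pr_warn("IPI callback %pS sent to offline CPU\n",
                        csd->func);
}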
25 Feb, 2014
6 commits
-
The name __smp_call_function_single() doesn't tell much about the
properties of this function, especially when compared to
smp_call_function_single().

The comments above the implementation are also misleading. The main
point of this function is actually not to be able to embed the csd
in an object. This is actually a requirement that results from the
purpose of this function, which is to raise an IPI asynchronously.

As such it can be called with interrupts disabled. And this feature
comes at the cost of the caller who then needs to serialize the
IPIs on this csd.

Let's rename the function and enhance the comments so that they reflect
these properties.

Suggested-by: Christoph Hellwig
Cc: Andrew Morton
Cc: Christoph Hellwig
Cc: Ingo Molnar
Cc: Jan Kara
Cc: Jens Axboe
Signed-off-by: Frederic Weisbecker
Signed-off-by: Jens Axboe
-
The main point of calling __smp_call_function_single() is to send
an IPI in a pure asynchronous way. By embedding a csd in an object,
a caller can send the IPI without waiting for a previous one to complete,
as is required by smp_call_function_single() for example. As such,
sending this kind of IPI can be safe even when irqs are disabled.

This flexibility comes at the expense of the caller who then needs to
synchronize the csd lifecycle by himself and make sure that IPIs on a
single csd are serialized.

This is how __smp_call_function_single() works when wait = 0 and this
usecase is relevant.

Now there don't seem to be any usecases with wait = 1 that can't be
covered by smp_call_function_single() instead, which is safer. Let's look
at the two possible scenarios:

1) The user calls __smp_call_function_single(wait = 1) on a csd embedded
in an object. It looks like a nice and convenient pattern at first
sight because we can then retrieve the object from the IPI handler easily.

But actually it is a waste of memory space in the object since the csd
can be allocated from the stack by smp_call_function_single(wait = 1)
and the object can be passed as the IPI argument.

Besides that, embedding the csd in an object is more error prone
because the caller must take care of the serialization of the IPIs
for this csd.

2) The user calls __smp_call_function_single(wait = 1) on a csd that
is allocated on the stack. It's ok but smp_call_function_single()
can do it as well and it already takes care of the allocation on the
stack. Again it's simpler and less error prone.

Therefore, using the underscore-prepended API version with wait = 1
is a bad pattern and a sign that the caller can do something safer and
simpler.

There was a single user of that which has just been converted.
So let's remove this option to discourage further users.

Cc: Andrew Morton
Cc: Christoph Hellwig
Cc: Ingo Molnar
Cc: Jan Kara
Cc: Jens Axboe
Signed-off-by: Frederic Weisbecker
Signed-off-by: Jens Axboe
-
Move this function closer to __smp_call_function_single(). These functions
have very similar behavior and should be displayed in the same block
for clarity.

Reviewed-by: Jan Kara
Cc: Andrew Morton
Cc: Christoph Hellwig
Cc: Ingo Molnar
Cc: Jan Kara
Cc: Jens Axboe
Signed-off-by: Frederic Weisbecker
Signed-off-by: Jens Axboe
-
__smp_call_function_single() and smp_call_function_single() share some
code that can be factorized: execute inline when the target is local,
check if the target is online, lock the csd, call generic_exec_single().

Let's move the common parts to generic_exec_single().

Reviewed-by: Jan Kara
Cc: Andrew Morton
Cc: Christoph Hellwig
Cc: Ingo Molnar
Cc: Jan Kara
Cc: Jens Axboe
Signed-off-by: Frederic Weisbecker
Signed-off-by: Jens Axboe
-
Align __smp_call_function_single() with smp_call_function_single() so
that it also checks whether the requested cpu is still online.

Signed-off-by: Jan Kara
Cc: Andrew Morton
Cc: Christoph Hellwig
Cc: Ingo Molnar
Cc: Jens Axboe
Signed-off-by: Frederic Weisbecker
Signed-off-by: Jens Axboe
-
The IPI function llist iteration is open coded. Let's simplify this
by using an llist iterator.

Also we want to keep the iteration safe against possible
csd.llist->next value reuse from the IPI handler. At least the block
subsystem used to do such things, so let's stay careful and use
llist_for_each_entry_safe().

Signed-off-by: Jan Kara
Cc: Andrew Morton
Cc: Christoph Hellwig
Cc: Ingo Molnar
Cc: Jens Axboe
Signed-off-by: Frederic Weisbecker
Signed-off-by: Jens Axboe
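
A minimal sketch of the safe iteration, assuming csd_unlock() may hand
the entry back to the sender for reuse; illustrative:

/* Sketch: csd->llist.next may be reused once csd is unlocked. */
static void flush_queued_csds(struct llist_node *entry)
{
        struct call_single_data *csd, *csd_next;

        llist_for_each_entry_safe(csd, csd_next, entry, llist) {
                csd->func(csd->info);
                csd_unlock(csd);
        }
}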
31 Jan, 2014
2 commits
-
After commit 9a46ad6d6df3 ("smp: make smp_call_function_many() use logic
similar to smp_call_function_single()"), cfd->cpumask is accessed only
in smp_call_function_many(). So there is no more need to copy it into
cfd->cpumask_ipi before putting csd into the list. The cpumask_ipi
field is obsolete and can be removed.

Signed-off-by: Roman Gushchin
Cc: Ingo Molnar
Cc: Christoph Hellwig
Cc: Wang YanQing
Cc: Xie XiuQi
Cc: Shaohua Li
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Make smp_call_function_single and friends more efficient by using a
lockless list.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
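
A minimal sketch of the lockless queueing idea; per the llist contract,
llist_add() returns true only when the list was previously empty, so
only the first enqueuer raises the IPI (names illustrative):

static DEFINE_PER_CPU(struct llist_head, call_single_queue);

static void queue_single_call(int cpu, struct call_single_data *csd)
{
        /* Lockless enqueue; no spinlock on the hot path. */
        if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
                arch_send_call_function_single_ipi(cpu);
}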
15 Nov, 2013
2 commits
-
Signed-off-by: Christoph Hellwig
Cc: Jan Kara
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
We've switched over every architecture that supports SMP to it, so
remove the now-useless config variable.

Signed-off-by: Christoph Hellwig
Cc: Jan Kara
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Nov, 2013
1 commit
-
Pull block IO core updates from Jens Axboe:
"This is the pull request for the core changes in the block layer for
3.13. It contains:

- The new blk-mq request interface.
This is a new and more scalable queueing model that marries the
best part of the request based interface we currently have (which
is fully featured, but scales poorly) and the bio based "interface"
which the new drivers for high IOPS devices end up using because
it's much faster than the request based one.

The bio interface has no block layer support, since it taps into
the stack much earlier. This means that drivers end up having to
implement a lot of functionality on their own, like tagging,
timeout handling, requeue, etc. The blk-mq interface provides all
these. Some drivers even provide a switch to select bio or rq and
has code to handle both, since things like merging only works in
the rq model and hence is faster for some workloads. This is a
huge mess. Conversion of these drivers nets us a substantial code
reduction. Initial results on converting SCSI to this model even
shows an 8x improvement on single queue devices. So while the
model was intended to work on the newer multiqueue devices, it has
substantial improvements for "classic" hardware as well. This code
has gone through extensive testing and development, it's now ready
to go. A pull request to convert virtio-blk to this model will be
coming as well, with more drivers scheduled for 3.14 conversion.

- Two blktrace fixes from Jan and Chen Gang.
- A plug merge fix from Alireza Haghdoost.
- Conversion of __get_cpu_var() from Christoph Lameter.
- Fix for sector_div() with 64-bit divider from Geert Uytterhoeven.
- A fix for a race between request completion and the timeout
handling from Jeff Moyer. This is what caused the merge conflict
with blk-mq/core, in case you are looking at that.

- A dm stacking fix from Mike Snitzer.
- A code consolidation fix and duplicated code removal from Kent
Overstreet.

- A handful of block bug fixes from Mikulas Patocka, fixing a loop
crash and memory corruption on blk cg.

- Elevator switch bug fix from Tomoki Sekiyama.

A heads-up that I had to rebase this branch. Initially the immutable
bio_vecs had been queued up for inclusion, but a week later, it became
clear that it wasn't fully cooked yet. So the decision was made to
pull this out and postpone it until 3.14. It was a straightforward
rebase, just pruning out the immutable series and the later fixes of
problems with it. The rest of the patches applied directly and no
further changes were made"* 'for-3.13/core' of git://git.kernel.dk/linux-block: (31 commits)
block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
block: Do not call sector_div() with a 64-bit divisor
kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup()
block: Consolidate duplicated bio_trim() implementations
block: Use rw_copy_check_uvector()
block: Enable sysfs nomerge control for I/O requests in the plug list
block: properly stack underlying max_segment_size to DM device
elevator: acquire q->sysfs_lock in elevator_change()
elevator: Fix a race in elevator switching and md device initialization
block: Replace __get_cpu_var uses
bdi: test bdi_init failure
block: fix a probe argument to blk_register_region
loop: fix crash if blk_alloc_queue fails
blk-core: Fix memory corruption if blkcg_init_queue fails
block: fix race between request completion and timeout handling
blktrace: Send BLK_TN_PROCESS events to all running traces
blk-mq: don't disallow request merges for req->special being set
blk-mq: mq plug list breakage
blk-mq: fix for flush deadlock
...
25 Oct, 2013
2 commits
-
blk-mq reuses the request potentially immediately, since the most
cache hot is always given out first. This means that rq->csd could
be reused between csd->func() being called and csd_unlock() being
called. This isn't a problem, since we never use wait == 1 for
the smp call function. Add CSD_FLAG_WAIT to be able to tell the
difference, retaining the warning for other cases.

Cc: Ingo Molnar
Signed-off-by: Jens Axboe
-
The blk-mq core and the blk-mq null driver use it.
Reviewed-by: Christoph Hellwig
Acked-by: Ingo Molnar
Signed-off-by: Jens Axboe
01 Oct, 2013
1 commit
-
Turn it into (for example):
[ 0.073380] x86: Booting SMP configuration:
[ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
[ 0.603005] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15
[ 1.200005] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23
[ 1.796005] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31
[ 2.393005] .... node #4, CPUs: #32 #33 #34 #35 #36 #37 #38 #39
[ 2.996005] .... node #5, CPUs: #40 #41 #42 #43 #44 #45 #46 #47
[ 3.600005] .... node #6, CPUs: #48 #49 #50 #51 #52 #53 #54 #55
[ 4.202005] .... node #7, CPUs: #56 #57 #58 #59 #60 #61 #62 #63
[ 4.811005] .... node #8, CPUs: #64 #65 #66 #67 #68 #69 #70 #71
[ 5.421006] .... node #9, CPUs: #72 #73 #74 #75 #76 #77 #78 #79
[ 6.032005] .... node #10, CPUs: #80 #81 #82 #83 #84 #85 #86 #87
[ 6.648006] .... node #11, CPUs: #88 #89 #90 #91 #92 #93 #94 #95
[ 7.262005] .... node #12, CPUs: #96 #97 #98 #99 #100 #101 #102 #103
[ 7.865005] .... node #13, CPUs: #104 #105 #106 #107 #108 #109 #110 #111
[ 8.466005] .... node #14, CPUs: #112 #113 #114 #115 #116 #117 #118 #119
[ 9.073006] .... node #15, CPUs: #120 #121 #122 #123 #124 #125 #126 #127
[ 9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.
Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar
12 Sep, 2013
2 commits
-
As in commit f21afc25f9ed ("smp.h: Use local_irq_{save,restore}() in
!SMP version of on_each_cpu()"), we don't want to enable irqs if they
are not already enabled.

I don't know of any bugs currently caused by this unconditional
local_irq_enable(), but I want to use this function in MIPS/OCTEON early
boot (when we have early_boot_irqs_disabled). This also makes this
function have similar semantics to on_each_cpu() which is good in
itself.

Signed-off-by: David Daney
Cc: Gilad Ben-Yossef
Cc: Christoph Lameter
Cc: Chris Metcalf
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
When a failure occurs in hotplug_cfd(), we need to release the related
resources, or we will cause a memory leak.

Signed-off-by: Chen Gang
Acked-by: Wang YanQing
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
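
A minimal sketch of the fix's shape, assuming the CPU_UP_PREPARE path
allocates a cpumask and then a percpu csd array; illustrative, not the
exact diff:

/* Sketch: undo the first allocation when the second one fails. */
static int cfd_prepare_cpu(struct call_function_data *cfd, int cpu)
{
        if (!zalloc_cpumask_var_node(&cfd->cpumask, GFP_KERNEL,
                                     cpu_to_node(cpu)))
                return -ENOMEM;

        cfd->csd = alloc_percpu(struct call_single_data);
        if (!cfd->csd) {
                free_cpumask_var(cfd->cpumask); /* previously leaked */
                return -ENOMEM;
        }

        return 0;
}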
04 Sep, 2013
1 commit
-
Pull scheduler changes from Ingo Molnar:
"Various optimizations, cleanups and smaller fixes - no major changes
in scheduler behavior"* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix the sd_parent_degenerate() code
sched/fair: Rework and comment the group_imb code
sched/fair: Optimize find_busiest_queue()
sched/fair: Make group power more consistent
sched/fair: Remove duplicate load_per_task computations
sched/fair: Shrink sg_lb_stats and play memset games
sched: Clean-up struct sd_lb_stat
sched: Factor out code to should_we_balance()
sched: Remove one division operation in find_busiest_queue()
sched/cputime: Use this_cpu_add() in task_group_account_field()
cpumask: Fix cpumask leak in partition_sched_domains()
sched/x86: Optimize switch_mm() for multi-threaded workloads
generic-ipi: Kill unnecessary variable - csd_flags
numa: Mark __node_set() as __always_inline
sched/fair: Cleanup: remove duplicate variable declaration
sched/__wake_up_sync_key(): Fix nr_exclusive tasks which lead to WF_SYNC clearing
19 Aug, 2013
1 commit
-
Fix locking description: after commit 8969a5ede0f9e17da4b9437
("generic-ipi: remove kmalloc()"), wait = 0 can be guaranteed
because we don't kmalloc() anymore.

Signed-off-by: Xie XiuQi
Cc: Sheng Yang
Cc: Peter Zijlstra
Cc: Jens Axboe
Cc: Rusty Russell
Link: http://lkml.kernel.org/r/51F5E6F8.1000801@huawei.com
Signed-off-by: Ingo Molnar
31 Jul, 2013
1 commit
-
After commit 8969a5ede0f9e17da4b943712429aef2c9bcd82b
("generic-ipi: remove kmalloc()"), wait = 0 can be guaranteed,
and all callsites of generic_exec_single() do an unconditional
csd_lock() now.

So csd_flags is unnecessary now. Remove it.

Signed-off-by: Xie XiuQi
Signed-off-by: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Linus Torvalds
Cc: Nick Piggin
Cc: Jens Axboe
Cc: "Paul E. McKenney"
Cc: Rusty Russell
Link: http://lkml.kernel.org/r/51F72DA1.7010401@huawei.com
Signed-off-by: Ingo Molnar
15 Jul, 2013
1 commit
-
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

This removes all the uses of the __cpuinit macros from C files in
the core kernel directories (kernel, init, lib, mm, and include)
that don't really have a specific maintainer.

[1] https://lkml.org/lkml/2013/5/20/589
Signed-off-by: Paul Gortmaker
01 May, 2013
2 commits
-
We sometimes use "struct call_single_data *data" and sometimes "struct
call_single_data *csd". Use "csd" consistently.

We sometimes use "struct call_function_data *data" and sometimes "struct
call_function_data *cfd". Use "cfd" consistently.

Also, avoid some 80-col layout tricks.

Cc: Ingo Molnar
Cc: Jens Axboe
Cc: Peter Zijlstra
Cc: Shaohua Li
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
csd_lock() uses assignment to data->flags rather than |=. That is not
buggy at present because only one bit (CSD_FLAG_LOCK) is defined in
call_single_data.flags.

But it will become buggy if we later add another flag, so fix it now.

Signed-off-by: liguang
Cc: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
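
A minimal sketch of the change; helper names follow the entries above,
details illustrative:

static void csd_lock(struct call_single_data *csd)
{
        csd_lock_wait(csd);             /* wait out any previous user */

        /* was: csd->flags = CSD_FLAG_LOCK; clobbers any other bits */
        csd->flags |= CSD_FLAG_LOCK;

        /* Order the flag store before the caller's csd updates. */
        smp_mb();
}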
22 Feb, 2013
1 commit
-
I'm testing a swapout workload in a two-socket Xeon machine. The workload
has 10 threads, each thread sequentially accesses a separate memory
region. TLB flush overhead is very big in the workload. For each page,
page reclaim needs to move it from the active lru list and then unmap it.
Both need a TLB flush. And this is a multithread workload, so TLB flushes
happen on 10 CPUs. On x86, TLB flush uses the generic smp_call_function.
So this workload stresses smp_call_function_many heavily.

Without the patch, perf shows:
+ 24.49% [k] generic_smp_call_function_interrupt
- 21.72% [k] _raw_spin_lock
- _raw_spin_lock
+ 79.80% __page_check_address
+ 6.42% generic_smp_call_function_interrupt
+ 3.31% get_swap_page
+ 2.37% free_pcppages_bulk
+ 1.75% handle_pte_fault
+ 1.54% put_super
+ 1.41% grab_super_passive
+ 1.36% __swap_duplicate
+ 0.68% blk_flush_plug_list
+ 0.62% swap_info_get
+ 6.55% [k] flush_tlb_func
+ 6.46% [k] smp_call_function_many
+ 5.09% [k] call_function_interrupt
+ 4.75% [k] default_send_IPI_mask_sequence_phys
+ 2.18% [k] find_next_bit

swapout throughput is around 1300M/s.
With the patch, perf shows:
- 27.23% [k] _raw_spin_lock
- _raw_spin_lock
+ 80.53% __page_check_address
+ 8.39% generic_smp_call_function_single_interrupt
+ 2.44% get_swap_page
+ 1.76% free_pcppages_bulk
+ 1.40% handle_pte_fault
+ 1.15% __swap_duplicate
+ 1.05% put_super
+ 0.98% grab_super_passive
+ 0.86% blk_flush_plug_list
+ 0.57% swap_info_get
+ 8.25% [k] default_send_IPI_mask_sequence_phys
+ 7.55% [k] call_function_interrupt
+ 7.47% [k] smp_call_function_many
+ 7.25% [k] flush_tlb_func
+ 3.81% [k] _raw_spin_lock_irqsave
+ 3.78% [k] generic_smp_call_function_single_interrupt

swapout throughput is around 1400M/s. So there is around a 7%
improvement, and total cpu utilization doesn't change.

Without the patch, cfd_data is shared by all CPUs.
generic_smp_call_function_interrupt does read/write cfd_data several times
which will create a lot of cache ping-pong. With the patch, the data
becomes per-cpu. The ping-pong is avoided. And from the perf data, this
doesn't make call_single_queue lock contend.

Next step is to remove generic_smp_call_function_interrupt() from arch
code.

Signed-off-by: Shaohua Li
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Steven Rostedt
Cc: Jens Axboe
Cc: Linus Torvalds
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
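
A minimal sketch of the per-cpu conversion; illustrative:

/* Sketch: one call_function_data per CPU instead of one shared copy. */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);

static struct call_function_data *get_my_cfd(void)
{
        /* Each caller touches only its own cacheline-aligned copy. */
        return this_cpu_ptr(&cfd_data);
}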
28 Jan, 2013
1 commit
-
I get the following warning with v3.7, once or
twice a day:

[ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
As explained by Linus as well:
|
| Once we've done the "list_add_rcu()" to add it to the
| queue, we can have (another) IPI to the target CPU that can
| now see it and clear the mask.
|
| So by the time we get to actually send the IPI, the mask might
| have been cleared by another IPI.
|

This patch also fixes a system hang problem, if the data->cpumask
gets cleared after passing this point:

    if (WARN_ONCE(!mask, "empty IPI mask"))
        return;

then the problem in commit 83d349f35e1a ("x86: don't send an IPI to
the empty set of CPU's") will happen again.

Signed-off-by: Wang YanQing
Acked-by: Linus Torvalds
Acked-by: Jan Beulich
Cc: Paul E. McKenney
Cc: Andrew Morton
Cc: peterz@infradead.org
Cc: mina86@mina86.org
Cc: srivatsa.bhat@linux.vnet.ibm.com
Cc:
Link: http://lkml.kernel.org/r/20130126075357.GA3205@udknight
[ Tidied up the changelog and the comment in the code. ]
Signed-off-by: Ingo Molnar
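
A minimal sketch of the avoidance, assuming a second cpumask field that
is snapshotted before the csd is published (the 31 Jan, 2014 entry above
notes this cpumask_ipi field later became obsolete); illustrative:

/*
 * Sketch: after list_add_rcu() publishes the csd, a racing IPI may
 * already clear bits in cfd->cpumask, so IPI from a private copy.
 */
static void publish_and_send(struct call_function_data *cfd)
{
        cpumask_copy(cfd->cpumask_ipi, cfd->cpumask);   /* snapshot */

        /* ... list_add_rcu() makes the csd globally visible ... */

        arch_send_call_function_ipi_mask(cfd->cpumask_ipi);
}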
05 Jun, 2012
1 commit
-
There is no user of those APIs anymore, just remove them.
Signed-off-by: Yong Zhang
Cc: ralf@linux-mips.org
Cc: sshtylyov@mvista.com
Cc: david.daney@cavium.com
Cc: nikunj@linux.vnet.ibm.com
Cc: paulmck@linux.vnet.ibm.com
Cc: axboe@kernel.dk
Cc: Andrew Morton
Link: http://lkml.kernel.org/r/1338275765-3217-11-git-send-email-yong.zhang0@gmail.com
Acked-by: Srivatsa S. Bhat
Acked-by: Peter Zijlstra
Signed-off-by: Thomas Gleixner
08 May, 2012
1 commit
-
Will replace the misnamed cpu_idle_wait() function which is copied a
gazillion times all over arch/*

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20120507175652.049316594@linutronix.de
04 May, 2012
1 commit
-
percpu areas are already allocated during boot for each possible cpu.
percpu idle threads can be considered as an extension of the percpu areas,
so allocate them for each possible cpu during boot.

This will eliminate the need for workqueue based idle thread allocation.
In future we can move the idle thread area into the percpu area too.

[ tglx: Moved the loop into smpboot.c and added an error check when
  the init code failed to allocate an idle thread for a cpu which
  should be onlined ]

Signed-off-by: Suresh Siddha
Cc: Peter Zijlstra
Cc: Rusty Russell
Cc: Paul E. McKenney
Cc: Srivatsa S. Bhat
Cc: Tejun Heo
Cc: David Rientjes
Cc: venki@google.com
Link: http://lkml.kernel.org/r/1334966930.28674.245.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Thomas Gleixner
29 Mar, 2012
2 commits
-
Add the on_each_cpu_cond() function that wraps on_each_cpu_mask() and
calculates the cpumask of cpus to IPI by calling a function supplied as a
parameter in order to determine whether to IPI each specific cpu.

The function works around allocation failure of the cpumask variable in
CONFIG_CPUMASK_OFFSTACK=y by iterating over the cpus, sending one IPI at
a time via smp_call_function_single().

The function is useful since it allows one to separate the specific code
that decides in each case whether to IPI a specific cpu for a specific
request from the common boilerplate code of creating the mask, handling
failures, etc.

[akpm@linux-foundation.org: s/gfpflags/gfp_flags/]
[akpm@linux-foundation.org: avoid double-evaluation of `info' (per Michal), parenthesise evaluation of `cond_func']
[akpm@linux-foundation.org: s/CPU/CPUs, use all 80 cols in comment]
Signed-off-by: Gilad Ben-Yossef
Cc: Chris Metcalf
Cc: Christoph Lameter
Acked-by: Peter Zijlstra
Cc: Frederic Weisbecker
Cc: Russell King
Cc: Pekka Enberg
Cc: Matt Mackall
Cc: Sasha Levin
Cc: Rik van Riel
Cc: Andi Kleen
Cc: Alexander Viro
Cc: Avi Kivity
Acked-by: Michal Nazarewicz
Cc: Kosaki Motohiro
Cc: Milton Miller
Reviewed-by: "Srivatsa S. Bhat"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
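
A minimal sketch of the described fallback behavior; illustrative, not
the exact upstream body:

/*
 * Sketch: IPI only the cpus for which cond_func returns true; if
 * the cpumask cannot be allocated, fall back to one IPI at a time.
 */
void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
                      smp_call_func_t func, void *info, bool wait,
                      gfp_t gfp_flags)
{
        cpumask_var_t cpus;
        int cpu;

        if (likely(zalloc_cpumask_var(&cpus, gfp_flags))) {
                preempt_disable();
                for_each_online_cpu(cpu)
                        if (cond_func(cpu, info))
                                cpumask_set_cpu(cpu, cpus);
                on_each_cpu_mask(cpus, func, info, wait);
                preempt_enable();
                free_cpumask_var(cpus);
        } else {
                /* CONFIG_CPUMASK_OFFSTACK allocation failure path. */
                preempt_disable();
                for_each_online_cpu(cpu)
                        if (cond_func(cpu, info))
                                smp_call_function_single(cpu, func,
                                                         info, wait);
                preempt_enable();
        }
}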
-
We have lots of infrastructure in place to partition multi-core systems
such that we have a group of CPUs that are dedicated to specific task:
cgroups, scheduler and interrupt affinity, and cpuisol= boot parameter.
Still, kernel code will at times interrupt all CPUs in the system via IPIs
for various needs. These IPIs are useful and cannot be avoided
altogether, but in certain cases it is possible to interrupt only specific
CPUs that have useful work to do and not the entire system.

This patch set, inspired by discussions with Peter Zijlstra and Frederic
Weisbecker when testing the nohz task patch set, is a first stab at trying
to explore doing this by locating the places where such global IPI calls
are being made and turning the global IPI into an IPI for a specific group
of CPUs. The purpose of the patch set is to get feedback if this is the
right way to go for dealing with this issue and indeed, if the issue is
even worth dealing with at all. Based on the feedback from this patch set
I plan to offer further patches that address similar issues in other
code paths.

This patch creates an on_each_cpu_mask() and on_each_cpu_cond()
infrastructure API (the former derived from existing arch specific
versions in Tile and Arm) and uses them to turn several global IPI
invocations to per CPU group invocations.

Core kernel:
on_each_cpu_mask() calls a function on processors specified by cpumask,
which may or may not include the local processor.

You must not call this function with disabled interrupts or from a
hardware interrupt handler or from a bottom half handler.

arch/arm:
Note that the generic version is a little different than the Arm one:
1. It has the mask as first parameter
2. It calls the function on the calling CPU with interrupts disabled,
   but this should be OK since the function is called on the other CPUs
   with interrupts disabled anyway.

arch/tile:
The API is the same as the tile private one, but the generic version
also calls the function on the local CPU with interrupts disabled in the
UP case.

This is OK since the function is called on the other CPUs
with interrupts disabled.

Signed-off-by: Gilad Ben-Yossef
Reviewed-by: Christoph Lameter
Acked-by: Chris Metcalf
Acked-by: Peter Zijlstra
Cc: Frederic Weisbecker
Cc: Russell King
Cc: Pekka Enberg
Cc: Matt Mackall
Cc: Rik van Riel
Cc: Andi Kleen
Cc: Sasha Levin
Cc: Mel Gorman
Cc: Alexander Viro
Cc: Avi Kivity
Acked-by: Michal Nazarewicz
Cc: Kosaki Motohiro
Cc: Milton Miller
Cc: Russell King
Acked-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
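
A minimal sketch of the generic helper described above; illustrative:

/*
 * Sketch: run func on every cpu in mask, including (with interrupts
 * disabled) the local cpu if it is part of the mask.
 */
void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
                      void *info, bool wait)
{
        int cpu = get_cpu();

        smp_call_function_many(mask, func, info, wait);
        if (cpumask_test_cpu(cpu, mask)) {
                unsigned long flags;

                local_irq_save(flags);
                func(info);
                local_irq_restore(flags);
        }
        put_cpu();
}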
31 Oct, 2011
1 commit
-
The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else. Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

-#include <linux/module.h>
+#include <linux/export.h>

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker