Eric Lee / smarc-fsl-linux-kernel

12 Sep, 2013

2 commits

202da4005 kernel/smp.c: quit unconditionally enabling irqs in on_each_cpu_mask(). ... Browse Code »

As in commit f21afc25f9ed ("smp.h: Use local_irq_{save,restore}() in
!SMP version of on_each_cpu()"), we don't want to enable irqs if they
are not already enabled.

I don't know of any bugs currently caused by this unconditional
local_irq_enable(), but I want to use this function in MIPS/OCTEON early
boot (when we have early_boot_irqs_disabled). This also makes this
function have similar semantics to on_each_cpu() which is good in
itself.

Signed-off-by: David Daney
Cc: Gilad Ben-Yossef
Cc: Christoph Lameter
Cc: Chris Metcalf
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Daney
2013-09-12 06:58:25 +0800
60c323699 kernel/smp.c: free related resources when failure occurs in hotplug_cfd() ... Browse Code »

When failure occurs in hotplug_cfd(), need release related resources, or
will cause memory leak.

Signed-off-by: Chen Gang
Acked-by: Wang YanQing
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chen Gang
2013-09-12 06:58:21 +0800

04 Sep, 2013

1 commit

5e0b3a4e8 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull scheduler changes from Ingo Molnar:
"Various optimizations, cleanups and smaller fixes - no major changes
in scheduler behavior"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix the sd_parent_degenerate() code
sched/fair: Rework and comment the group_imb code
sched/fair: Optimize find_busiest_queue()
sched/fair: Make group power more consistent
sched/fair: Remove duplicate load_per_task computations
sched/fair: Shrink sg_lb_stats and play memset games
sched: Clean-up struct sd_lb_stat
sched: Factor out code to should_we_balance()
sched: Remove one division operation in find_busiest_queue()
sched/cputime: Use this_cpu_add() in task_group_account_field()
cpumask: Fix cpumask leak in partition_sched_domains()
sched/x86: Optimize switch_mm() for multi-threaded workloads
generic-ipi: Kill unnecessary variable - csd_flags
numa: Mark __node_set() as __always_inline
sched/fair: Cleanup: remove duplicate variable declaration
sched/__wake_up_sync_key(): Fix nr_exclusive tasks which lead to WF_SYNC clearing

Linus Torvalds
2013-09-04 23:36:35 +0800

19 Aug, 2013

1 commit

15e71911f generic-ipi/locking: Fix misleading smp_call_function_any() description ... Browse Code »

Fix locking description: after commit 8969a5ede0f9e17da4b9437
("generic-ipi: remove kmalloc()"), wait = 0 can be guaranteed
because we don't kmalloc() anymore.

Signed-off-by: Xie XiuQi
Cc: Sheng Yang
Cc: Peter Zijlstra
Cc: Jens Axboe
Cc: Rusty Russell
Link: http://lkml.kernel.org/r/51F5E6F8.1000801@huawei.com
Signed-off-by: Ingo Molnar

Xie XiuQi
2013-08-19 15:03:50 +0800

31 Jul, 2013

1 commit

46591962c generic-ipi: Kill unnecessary variable - csd_flags ... Browse Code »

After commit 8969a5ede0f9e17da4b943712429aef2c9bcd82b
("generic-ipi: remove kmalloc()"), wait = 0 can be guaranteed,
and all callsites of generic_exec_single() do an unconditional
csd_lock() now.

So csd_flags is unnecessary now. Remove it.

Signed-off-by: Xie XiuQi
Signed-off-by: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Linus Torvalds
Cc: Nick Piggin
Cc: Jens Axboe
Cc: "Paul E. McKenney"
Cc: Rusty Russell
Link: http://lkml.kernel.org/r/51F72DA1.7010401@huawei.com
Signed-off-by: Ingo Molnar

Xie XiuQi
2013-07-31 04:19:05 +0800

15 Jul, 2013

1 commit

0db0628d9 kernel: delete __cpuinit usage from all core kernel files ... Browse Code »

The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

This removes all the uses of the __cpuinit macros from C files in
the core kernel directories (kernel, init, lib, mm, and include)
that don't really have a specific maintainer.

[1] https://lkml.org/lkml/2013/5/20/589

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2013-07-15 07:36:59 +0800

01 May, 2013

2 commits

e1d12f327 kernel/smp.c: cleanups ... Browse Code »

We sometimes use "struct call_single_data *data" and sometimes "struct
call_single_data *csd". Use "csd" consistently.

We sometimes use "struct call_function_data *data" and sometimes "struct
call_function_data *cfd". Use "cfd" consistently.

Also, avoid some 80-col layout tricks.

Cc: Ingo Molnar
Cc: Jens Axboe
Cc: Peter Zijlstra
Cc: Shaohua Li
Cc: Shaohua Li
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2013-05-01 08:04:03 +0800
1def1dc91 kernel/smp.c: use '|=' for csd_lock ... Browse Code »

csd_lock() uses assignment to data->flags rather than |=. That is not
buggy at present because only one bit (CSD_FLAG_LOCK) is defined in
call_single_data.flags.

But it will become buggy if we later add another flag, so fix it now.

Signed-off-by: liguang
Cc: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

liguang
2013-05-01 08:04:02 +0800

22 Feb, 2013

1 commit

9a46ad6d6 smp: make smp_call_function_many() use logic similar to smp_call_function_single() ... Browse Code »

I'm testing swapout workload in a two-socket Xeon machine. The workload
has 10 threads, each thread sequentially accesses separate memory
region. TLB flush overhead is very big in the workload. For each page,
page reclaim need move it from active lru list and then unmap it. Both
need a TLB flush. And this is a multthread workload, TLB flush happens
in 10 CPUs. In X86, TLB flush uses generic smp_call)function. So this
workload stress smp_call_function_many heavily.

Without patch, perf shows:
+ 24.49% [k] generic_smp_call_function_interrupt
- 21.72% [k] _raw_spin_lock
- _raw_spin_lock
+ 79.80% __page_check_address
+ 6.42% generic_smp_call_function_interrupt
+ 3.31% get_swap_page
+ 2.37% free_pcppages_bulk
+ 1.75% handle_pte_fault
+ 1.54% put_super
+ 1.41% grab_super_passive
+ 1.36% __swap_duplicate
+ 0.68% blk_flush_plug_list
+ 0.62% swap_info_get
+ 6.55% [k] flush_tlb_func
+ 6.46% [k] smp_call_function_many
+ 5.09% [k] call_function_interrupt
+ 4.75% [k] default_send_IPI_mask_sequence_phys
+ 2.18% [k] find_next_bit

swapout throughput is around 1300M/s.

With the patch, perf shows:
- 27.23% [k] _raw_spin_lock
- _raw_spin_lock
+ 80.53% __page_check_address
+ 8.39% generic_smp_call_function_single_interrupt
+ 2.44% get_swap_page
+ 1.76% free_pcppages_bulk
+ 1.40% handle_pte_fault
+ 1.15% __swap_duplicate
+ 1.05% put_super
+ 0.98% grab_super_passive
+ 0.86% blk_flush_plug_list
+ 0.57% swap_info_get
+ 8.25% [k] default_send_IPI_mask_sequence_phys
+ 7.55% [k] call_function_interrupt
+ 7.47% [k] smp_call_function_many
+ 7.25% [k] flush_tlb_func
+ 3.81% [k] _raw_spin_lock_irqsave
+ 3.78% [k] generic_smp_call_function_single_interrupt

swapout throughput is around 1400M/s. So there is around a 7%
improvement, and total cpu utilization doesn't change.

Without the patch, cfd_data is shared by all CPUs.
generic_smp_call_function_interrupt does read/write cfd_data several times
which will create a lot of cache ping-pong. With the patch, the data
becomes per-cpu. The ping-pong is avoided. And from the perf data, this
doesn't make call_single_queue lock contend.

Next step is to remove generic_smp_call_function_interrupt() from arch
code.

Signed-off-by: Shaohua Li
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Steven Rostedt
Cc: Jens Axboe
Cc: Linus Torvalds
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shaohua Li
2013-02-22 09:22:20 +0800

28 Jan, 2013

1 commit

f44310b98 smp: Fix SMP function call empty cpu mask race ... Browse Code »

I get the following warning every day with v3.7, once or
twice a day:

[ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()

As explained by Linus as well:

|
| Once we've done the "list_add_rcu()" to add it to the
| queue, we can have (another) IPI to the target CPU that can
| now see it and clear the mask.
|
| So by the time we get to actually send the IPI, the mask might
| have been cleared by another IPI.
|

This patch also fixes a system hang problem, if the data->cpumask
gets cleared after passing this point:

if (WARN_ONCE(!mask, "empty IPI mask"))
return;

then the problem in commit 83d349f35e1a ("x86: don't send an IPI to
the empty set of CPU's") will happen again.

Signed-off-by: Wang YanQing
Acked-by: Linus Torvalds
Acked-by: Jan Beulich
Cc: Paul E. McKenney
Cc: Andrew Morton
Cc: peterz@infradead.org
Cc: mina86@mina86.org
Cc: srivatsa.bhat@linux.vnet.ibm.com
Cc:
Link: http://lkml.kernel.org/r/20130126075357.GA3205@udknight
[ Tidied up the changelog and the comment in the code. ]
Signed-off-by: Ingo Molnar

Wang YanQing
2013-01-28 18:21:57 +0800

05 Jun, 2012

1 commit

8feb8e896 smp: Remove ipi_call_lock[_irq]()/ipi_call_unlock[_irq]() ... Browse Code »

There is no user of those APIs anymore, just remove it.

Signed-off-by: Yong Zhang
Cc: ralf@linux-mips.org
Cc: sshtylyov@mvista.com
Cc: david.daney@cavium.com
Cc: nikunj@linux.vnet.ibm.com
Cc: paulmck@linux.vnet.ibm.com
Cc: axboe@kernel.dk
Cc: Andrew Morton
Link: http://lkml.kernel.org/r/1338275765-3217-11-git-send-email-yong.zhang0@gmail.com
Acked-by: Srivatsa S. Bhat
Acked-by: Peter Zijlstra
Signed-off-by: Thomas Gleixner

Yong Zhang
2012-06-05 23:27:14 +0800

08 May, 2012

1 commit

f37f435f3 smp: Implement kick_all_cpus_sync() ... Browse Code »

Will replace the misnomed cpu_idle_wait() function which is copied a
gazillion times all over arch/*

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20120507175652.049316594@linutronix.de

Thomas Gleixner
2012-05-08 18:35:06 +0800

04 May, 2012

1 commit

3bb5d2ee3 smp, idle: Allocate idle thread for each possible cpu during boot ... Browse Code »

percpu areas are already allocated during boot for each possible cpu.
percpu idle threads can be considered as an extension of the percpu areas,
and allocate them for each possible cpu during boot.

This will eliminate the need for workqueue based idle thread allocation.
In future we can move the idle thread area into the percpu area too.

[ tglx: Moved the loop into smpboot.c and added an error check when
the init code failed to allocate an idle thread for a cpu which
should be onlined ]

Signed-off-by: Suresh Siddha
Cc: Peter Zijlstra
Cc: Rusty Russell
Cc: Paul E. McKenney
Cc: Srivatsa S. Bhat
Cc: Tejun Heo
Cc: David Rientjes
Cc: venki@google.com
Link: http://lkml.kernel.org/r/1334966930.28674.245.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Thomas Gleixner

Suresh Siddha
2012-05-04 01:32:34 +0800

29 Mar, 2012

2 commits

b3a7e98e0 smp: add func to IPI cpus based on parameter func ... Browse Code »

Add the on_each_cpu_cond() function that wraps on_each_cpu_mask() and
calculates the cpumask of cpus to IPI by calling a function supplied as a
parameter in order to determine whether to IPI each specific cpu.

The function works around allocation failure of cpumask variable in
CONFIG_CPUMASK_OFFSTACK=y by itereating over cpus sending an IPI a time
via smp_call_function_single().

The function is useful since it allows to seperate the specific code that
decided in each case whether to IPI a specific cpu for a specific request
from the common boilerplate code of handling creating the mask, handling
failures etc.

[akpm@linux-foundation.org: s/gfpflags/gfp_flags/]
[akpm@linux-foundation.org: avoid double-evaluation of `info' (per Michal), parenthesise evaluation of `cond_func']
[akpm@linux-foundation.org: s/CPU/CPUs, use all 80 cols in comment]
Signed-off-by: Gilad Ben-Yossef
Cc: Chris Metcalf
Cc: Christoph Lameter
Acked-by: Peter Zijlstra
Cc: Frederic Weisbecker
Cc: Russell King
Cc: Pekka Enberg
Cc: Matt Mackall
Cc: Sasha Levin
Cc: Rik van Riel
Cc: Andi Kleen
Cc: Alexander Viro
Cc: Avi Kivity
Acked-by: Michal Nazarewicz
Cc: Kosaki Motohiro
Cc: Milton Miller
Reviewed-by: "Srivatsa S. Bhat"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gilad Ben-Yossef
2012-03-29 08:14:35 +0800
3fc498f16 smp: introduce a generic on_each_cpu_mask() function ... Browse Code »

We have lots of infrastructure in place to partition multi-core systems
such that we have a group of CPUs that are dedicated to specific task:
cgroups, scheduler and interrupt affinity, and cpuisol= boot parameter.
Still, kernel code will at times interrupt all CPUs in the system via IPIs
for various needs. These IPIs are useful and cannot be avoided
altogether, but in certain cases it is possible to interrupt only specific
CPUs that have useful work to do and not the entire system.

This patch set, inspired by discussions with Peter Zijlstra and Frederic
Weisbecker when testing the nohz task patch set, is a first stab at trying
to explore doing this by locating the places where such global IPI calls
are being made and turning the global IPI into an IPI for a specific group
of CPUs. The purpose of the patch set is to get feedback if this is the
right way to go for dealing with this issue and indeed, if the issue is
even worth dealing with at all. Based on the feedback from this patch set
I plan to offer further patches that address similar issue in other code
paths.

This patch creates an on_each_cpu_mask() and on_each_cpu_cond()
infrastructure API (the former derived from existing arch specific
versions in Tile and Arm) and uses them to turn several global IPI
invocation to per CPU group invocations.

Core kernel:

on_each_cpu_mask() calls a function on processors specified by cpumask,
which may or may not include the local processor.

You must not call this function with disabled interrupts or from a
hardware interrupt handler or from a bottom half handler.

arch/arm:

Note that the generic version is a little different then the Arm one:

1. It has the mask as first parameter
2. It calls the function on the calling CPU with interrupts disabled,
but this should be OK since the function is called on the other CPUs
with interrupts disabled anyway.

arch/tile:

The API is the same as the tile private one, but the generic version
also calls the function on the with interrupts disabled in UP case

This is OK since the function is called on the other CPUs
with interrupts disabled.

Signed-off-by: Gilad Ben-Yossef
Reviewed-by: Christoph Lameter
Acked-by: Chris Metcalf
Acked-by: Peter Zijlstra
Cc: Frederic Weisbecker
Cc: Russell King
Cc: Pekka Enberg
Cc: Matt Mackall
Cc: Rik van Riel
Cc: Andi Kleen
Cc: Sasha Levin
Cc: Mel Gorman
Cc: Alexander Viro
Cc: Avi Kivity
Acked-by: Michal Nazarewicz
Cc: Kosaki Motohiro
Cc: Milton Miller
Cc: Russell King
Acked-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gilad Ben-Yossef
2012-03-29 08:14:35 +0800

31 Oct, 2011

1 commit

9984de1a5 kernel: Map most files to use export.h instead of module.h ... Browse Code »

The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else. Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

-#include
+#include

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:12 +0800

17 Jun, 2011

1 commit

d8ad7d112 generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts ... Browse Code »

There is a problem that kdump(2nd kernel) sometimes hangs up due
to a pending IPI from 1st kernel. Kernel panic occurs because IPI
comes before call_single_queue is initialized.

To fix the crash, rename init_call_single_data() to call_function_init()
and call it in start_kernel() so that call_single_queue can be
initialized before enabling interrupts.

The details of the crash are:

(1) 2nd kernel boots up

(2) A pending IPI from 1st kernel comes when irqs are first enabled
in start_kernel().

(3) Kernel tries to handle the interrupt, but call_single_queue
is not initialized yet at this point. As a result, in the
generic_smp_call_function_single_interrupt(), NULL pointer
dereference occurs when list_replace_init() tries to access
&q->list.next.

Therefore this patch changes the name of init_call_single_data()
to call_function_init() and calls it before local_irq_enable()
in start_kernel().

Signed-off-by: Takao Indoh
Reviewed-by: WANG Cong
Acked-by: Neil Horman
Acked-by: Vivek Goyal
Acked-by: Peter Zijlstra
Cc: Milton Miller
Cc: Jens Axboe
Cc: Paul E. McKenney
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar

Takao Indoh
2011-06-17 16:17:12 +0800

23 Mar, 2011

1 commit

34db18a05 smp: move smp setup functions to kernel/smp.c ... Browse Code »

Move setup_nr_cpu_ids(), smp_init() and some other SMP boot parameter
setup functions from init/main.c to kenrel/smp.c, saves some #ifdef
CONFIG_SMP.

Signed-off-by: WANG Cong
Cc: Rakib Mullick
Cc: David Howells
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Tejun Heo
Cc: Arnd Bergmann
Cc: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Amerigo Wang
2011-03-23 08:44:11 +0800

18 Mar, 2011

4 commits

c8def554d smp_call_function_interrupt: use typedef and %pf ... Browse Code »

Use the newly added smp_call_func_t in smp_call_function_interrupt for
the func variable, and make the comment above the WARN more assertive
and explicit. Also, func is a function pointer and does not need an
offset, so use %pf not %pS.

Signed-off-by: Milton Miller
Signed-off-by: Linus Torvalds

Milton Miller
2011-03-18 07:58:11 +0800
723aae25d smp_call_function_many: handle concurrent clearing of mask ... Browse Code »

Mike Galbraith reported finding a lockup ("perma-spin bug") where the
cpumask passed to smp_call_function_many was cleared by other cpu(s)
while a cpu was preparing its call_data block, resulting in no cpu to
clear the last ref and unlock the block.

Having cpus clear their bit asynchronously could be useful on a mask of
cpus that might have a translation context, or cpus that need a push to
complete an rcu window.

Instead of adding a BUG_ON and requiring yet another cpumask copy, just
detect the race and handle it.

Note: arch_send_call_function_ipi_mask must still handle an empty
cpumask because the data block is globally visible before the that arch
callback is made. And (obviously) there are no guarantees to which cpus
are notified if the mask is changed during the call; only cpus that were
online and had their mask bit set during the whole call are guaranteed
to be called.

Reported-by: Mike Galbraith
Reported-by: Jan Beulich
Acked-by: Jan Beulich
Cc: stable@kernel.org
Signed-off-by: Milton Miller
Signed-off-by: Linus Torvalds

Milton Miller
2011-03-18 07:58:10 +0800
45a579192 call_function_many: add missing ordering ... Browse Code »

Paul McKenney's review pointed out two problems with the barriers in the
2.6.38 update to the smp call function many code.

First, a barrier that would force the func and info members of data to
be visible before their consumption in the interrupt handler was
missing. This can be solved by adding a smp_wmb between setting the
func and info members and setting setting the cpumask; this will pair
with the existing and required smp_rmb ordering the cpumask read before
the read of refs. This placement avoids the need a second smp_rmb in
the interrupt handler which would be executed on each of the N cpus
executing the call request. (I was thinking this barrier was present
but was not).

Second, the previous write to refs (establishing the zero that we the
interrupt handler was testing from all cpus) was performed by a third
party cpu. This would invoke transitivity which, as a recient or
concurrent addition to memory-barriers.txt now explicitly states, would
require a full smp_mb().

However, we know the cpumask will only be set by one cpu (the data
owner) and any preivous iteration of the mask would have cleared by the
reading cpu. By redundantly writing refs to 0 on the owning cpu before
the smp_wmb, the write to refs will follow the same path as the writes
that set the cpumask, which in turn allows us to keep the barrier in the
interrupt handler a smp_rmb instead of promoting it to a smp_mb (which
will be be executed by N cpus for each of the possible M elements on the
list).

I moved and expanded the comment about our (ab)use of the rcu list
primitives for the concurrent walk earlier into this function. I
considered moving the first two paragraphs to the queue list head and
lock, but felt it would have been too disconected from the code.

Cc: Paul McKinney
Cc: stable@kernel.org (2.6.32 and later)
Signed-off-by: Milton Miller
Signed-off-by: Linus Torvalds

Milton Miller
2011-03-18 07:58:10 +0800
e6cd1e07a call_function_many: fix list delete vs add race ... Browse Code »

Peter pointed out there was nothing preventing the list_del_rcu in
smp_call_function_interrupt from running before the list_add_rcu in
smp_call_function_many.

Fix this by not setting refs until we have gotten the lock for the list.
Take advantage of the wmb in list_add_rcu to save an explicit additional
one.

I tried to force this race with a udelay before the lock & list_add and
by mixing all 64 online cpus with just 3 random cpus in the mask, but
was unsuccessful. Still, inspection shows a valid race, and the fix is
a extension of the existing protection window in the current code.

Cc: stable@kernel.org (v2.6.32 and later)
Reported-by: Peter Zijlstra
Signed-off-by: Milton Miller
Signed-off-by: Linus Torvalds

Milton Miller
2011-03-18 07:58:10 +0800

21 Jan, 2011

3 commits

2b1caf6ed Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel… ... Browse Code »

…/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled status to init/main.c
lockdep: Move early boot local IRQ enable/disable status to init/main.c

Linus Torvalds
2011-01-21 10:30:37 +0800
225c8e010 kernel/smp.c: consolidate writes in smp_call_function_interrupt() ... Browse Code »

We have to test the cpu mask in the interrupt handler before checking the
refs, otherwise we can start to follow an entry before its deleted and
find it partially initailzed for the next trip. Presently we also clear
the cpumask bit before executing the called function, which implies
getting write access to the line. After the function is called we then
decrement refs, and if they go to zero we then unlock the structure.

However, this implies getting write access to the call function data
before and after another the function is called. If we can assert that no
smp_call_function execution function is allowed to enable interrupts, then
we can move both writes to after the function is called, hopfully allowing
both writes with one cache line bounce.

On a 256 thread system with a kernel compiled for 1024 threads, the time
to execute testcase in the "smp_call_function_many race" changelog was
reduced by about 30-40ms out of about 545 ms.

I decided to keep this as WARN because its now a buggy function, even
though the stack trace is of no value -- a simple printk would give us the
information needed.

Raw data:

Without patch:
ipi_test startup took 1219366ns complete 539819014ns total 541038380ns
ipi_test startup took 1695754ns complete 543439872ns total 545135626ns
ipi_test startup took 7513568ns complete 539606362ns total 547119930ns
ipi_test startup took 13304064ns complete 533898562ns total 547202626ns
ipi_test startup took 8668192ns complete 544264074ns total 552932266ns
ipi_test startup took 4977626ns complete 548862684ns total 553840310ns
ipi_test startup took 2144486ns complete 541292318ns total 543436804ns
ipi_test startup took 21245824ns complete 530280180ns total 551526004ns

With patch:
ipi_test startup took 5961748ns complete 500859628ns total 506821376ns
ipi_test startup took 8975996ns complete 495098924ns total 504074920ns
ipi_test startup took 19797750ns complete 492204740ns total 512002490ns
ipi_test startup took 14824796ns complete 487495878ns total 502320674ns
ipi_test startup took 11514882ns complete 494439372ns total 505954254ns
ipi_test startup took 8288084ns complete 502570774ns total 510858858ns
ipi_test startup took 6789954ns complete 493388112ns total 500178066ns

#include
#include
#include /* sched clock */

#define ITERATIONS 100

static void do_nothing_ipi(void *dummy)
{
}

static void do_ipis(struct work_struct *dummy)
{
int i;

for (i = 0; i < ITERATIONS; i++)
smp_call_function(do_nothing_ipi, NULL, 1);

printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
}

static struct work_struct work[NR_CPUS];

static int __init testcase_init(void)
{
int cpu;
u64 start, started, done;

start = local_clock();
for_each_online_cpu(cpu) {
INIT_WORK(&work[cpu], do_ipis);
schedule_work_on(cpu, &work[cpu]);
}
started = local_clock();
for_each_online_cpu(cpu)
flush_work(&work[cpu]);
done = local_clock();
pr_info("ipi_test startup took %lldns complete %lldns total %lldns\n",
started-start, done-started, done-start);

return 0;
}

static void __exit testcase_exit(void)
{
}

module_init(testcase_init)
module_exit(testcase_exit)
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Anton Blanchard");

Signed-off-by: Milton Miller
Cc: Anton Blanchard
Cc: Ingo Molnar
Cc: "Paul E. McKenney"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Milton Miller
2011-01-21 09:02:06 +0800
6dc198999 kernel/smp.c: fix smp_call_function_many() SMP race ... Browse Code »

I noticed a failure where we hit the following WARN_ON in
generic_smp_call_function_interrupt:

if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
continue;

data->csd.func(data->csd.info);

refs = atomic_dec_return(&data->refs);
WARN_ON(refs < 0); cpumask sees and
clears bit in cpumask
might be using old or new fn!
decrements refs below 0

set data->refs (too late!)

The important thing to note is since the interrupt handler walks a
potentially stale call_function.queue without any locking, then another
cpu can view the percpu *data structure at any time, even when the owner
is in the process of initialising it.

The following test case hits the WARN_ON 100% of the time on my PowerPC
box (having 128 threads does help :)

#include
#include

#define ITERATIONS 100

static void do_nothing_ipi(void *dummy)
{
}

static void do_ipis(struct work_struct *dummy)
{
int i;

for (i = 0; i < ITERATIONS; i++)
smp_call_function(do_nothing_ipi, NULL, 1);

printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
}

static struct work_struct work[NR_CPUS];

static int __init testcase_init(void)
{
int cpu;

for_each_online_cpu(cpu) {
INIT_WORK(&work[cpu], do_ipis);
schedule_work_on(cpu, &work[cpu]);
}

return 0;
}

static void __exit testcase_exit(void)
{
}

module_init(testcase_init)
module_exit(testcase_exit)
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Anton Blanchard");

I tried to fix it by ordering the read and the write of ->cpumask and
->refs. In doing so I missed a critical case but Paul McKenney was able
to spot my bug thankfully :) To ensure we arent viewing previous
iterations the interrupt handler needs to read ->refs then ->cpumask then
->refs _again_.

Thanks to Milton Miller and Paul McKenney for helping to debug this issue.

[miltonm@bga.com: add WARN_ON and BUG_ON, remove extra read of refs before initial read of mask that doesn't help (also noted by Peter Zijlstra), adjust comments, hopefully clarify scenario ]
[miltonm@bga.com: remove excess tests]
Signed-off-by: Anton Blanchard
Signed-off-by: Milton Miller
Cc: Ingo Molnar
Cc: "Paul E. McKenney"
Cc: [2.6.32+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Anton Blanchard
2011-01-21 09:02:06 +0800

20 Jan, 2011

1 commit

bd924e8cb smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled status to init/main.c ... Browse Code »

percpu may end up calling vfree() during early boot which in
turn may call on_each_cpu() for TLB flushes. The function of
on_each_cpu() can be done safely while IRQ is disabled during
early boot but it assumed that the function is always called
with local IRQ enabled which ended up enabling local IRQ
prematurely during boot and triggering a couple of warnings.

This patch updates on_each_cpu() and smp_call_function_many()
such on_each_cpu() can be used safely while
early_boot_irqs_disabled is set.

Signed-off-by: Tejun Heo
Acked-by: Peter Zijlstra
Acked-by: Pekka Enberg
Cc: Linus Torvalds
LKML-Reference:
Signed-off-by: Ingo Molnar
Reported-by: Ingo Molnar

Tejun Heo
2011-01-20 20:32:34 +0800

14 Jan, 2011

1 commit

351f8f8e6 kernel: clean up USE_GENERIC_SMP_HELPERS ... Browse Code »

For arch which needs USE_GENERIC_SMP_HELPERS, it has to select
USE_GENERIC_SMP_HELPERS, rather than leaving a choice to user, since they
don't provide their own implementions.

Also, move on_each_cpu() to kernel/smp.c, it is strange to put it in
kernel/softirq.c.

For arch which doesn't use USE_GENERIC_SMP_HELPERS, e.g. blackfin, only
on_each_cpu() is compiled.

Signed-off-by: Amerigo Wang
Cc: David Howells
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Yinghai Lu
Cc: Peter Zijlstra
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Amerigo Wang
2011-01-14 00:03:08 +0800

28 Oct, 2010

1 commit

3a5f65df5 Typedef SMP call function pointer ... Browse Code »

Typedef the pointer to the function to be called by smp_call_function() and
friends:

typedef void (*smp_call_func_t)(void *info);

as it is used in a fair number of places.

Signed-off-by: David Howells
cc: linux-arch@vger.kernel.org

David Howells
2010-10-28 00:28:36 +0800

10 Sep, 2010

1 commit

27c379f7f generic-ipi: Fix deadlock in __smp_call_function_single ... Browse Code »

Just got my 6 way machine to a state where cpu 0 is in an
endless loop within __smp_call_function_single.
All other cpus are idle.

The call trace on cpu 0 looks like this:

__smp_call_function_single
scheduler_tick
update_process_times
tick_sched_timer
__run_hrtimer
hrtimer_interrupt
clock_comparator_work
do_extint
ext_int_handler
----> timer irq
cpu_idle

__smp_call_function_single() got called from nohz_balancer_kick()
(inlined) with the remote cpu being 1, wait being 0 and the per
cpu variable remote_sched_softirq_cb (call_single_data) of the
current cpu (0).

Then it loops forever when it tries to grab the lock of the
call_single_data, since it is already locked and enqueued on cpu 0.

My theory how this could have happened: for some reason the
scheduler decided to call __smp_call_function_single() on it's own
cpu, and sends an IPI to itself. The interrupt stays pending
since IRQs are disabled. If then the hypervisor schedules the
cpu away it might happen that upon rescheduling both the IPI and
the timer IRQ are pending. If then interrupts are enabled again
it depends which one gets scheduled first.
If the timer interrupt gets delivered first we end up with the
local deadlock as seen in the calltrace above.

Let's make __smp_call_function_single() check if the target cpu is
the current cpu and execute the function immediately just like
smp_call_function_single does. That should prevent at least the
scenario described here.

It might also be that the scheduler is not supposed to call
__smp_call_function_single with the remote cpu being the current
cpu, but that is a different issue.

Signed-off-by: Heiko Carstens
Acked-by: Peter Zijlstra
Acked-by: Jens Axboe
Cc: Venkatesh Pallipadi
Cc: Suresh Siddha
LKML-Reference:
Signed-off-by: Ingo Molnar

Heiko Carstens
2010-09-10 22:48:40 +0800

28 May, 2010

1 commit

80b5184cc kernel/: convert cpu notifier to return encapsulate errno value ... Browse Code »

By the previous modification, the cpu notifier can return encapsulate
errno value. This converts the cpu notifiers for kernel/*.c

Signed-off-by: Akinobu Mita
Cc: Ingo Molnar
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2010-05-28 00:12:48 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

18 Jan, 2010

1 commit

e03bcb686 generic-ipi: Optimize accesses by using DEFINE_PER_CPU_SHARED_ALIGNED for IPI data ... Browse Code »

The smp ipi data is passed around and given write access by
other cpus and should be separated from per-cpu data consumed by
this cpu.

Looking for hot lines, I saw call_function_data shared with
tick_cpu_sched.

Signed-off-by: Milton Miller
Acked-by: Anton Blanchard
Acked-by: Jens Axboe
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: : Nick Piggin
LKML-Reference:
Signed-off-by: Ingo Molnar

Milton Miller
2010-01-18 16:02:59 +0800

17 Jan, 2010

1 commit

af2422c42 smp_call_function_any(): pass the node value to cpumask_of_node() ... Browse Code »

The change in acpi_cpufreq to use smp_call_function_any causes a warning
when it is called since the function erroneously passes the cpu id to
cpumask_of_node rather than the node that the cpu is on. Fix this.

cpumask_of_node(3): node > nr_node_ids(1)
Pid: 1, comm: swapper Not tainted 2.6.33-rc3-00097-g2c1f189 #223
Call Trace:
[] cpumask_of_node+0x23/0x58
[] smp_call_function_any+0x65/0xfa
[] ? do_drv_read+0x0/0x2f
[] get_cur_val+0xb0/0x102
[] get_cur_freq_on_cpu+0x74/0xc5
[] acpi_cpufreq_cpu_init+0x417/0x515
[] ? __down_write+0xb/0xd
[] cpufreq_add_dev+0x278/0x922

Signed-off-by: David John
Cc: Suresh Siddha
Cc: Rusty Russell
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David John
2010-01-17 04:15:39 +0800

16 Dec, 2009

2 commits

8f0ddf91f Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kern… ... Browse Code »

…el/git/tip/linux-2.6-tip

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
clockevents: Convert to raw_spinlock
clockevents: Make tick_device_lock static
debugobjects: Convert to raw_spinlocks
perf_event: Convert to raw_spinlock
hrtimers: Convert to raw_spinlocks
genirq: Convert irq_desc.lock to raw_spinlock
smp: Convert smplocks to raw_spinlocks
rtmutes: Convert rtmutex.lock to raw_spinlock
sched: Convert pi_lock to raw_spinlock
sched: Convert cpupri lock to raw_spinlock
sched: Convert rt_runtime_lock to raw_spinlock
sched: Convert rq->lock to raw_spinlock
plist: Make plist debugging raw_spinlock aware
bkl: Fixup core_lock fallout
locking: Cleanup the name space completely
locking: Further name space cleanups
alpha: Fix fallout from locking changes
locking: Implement new raw_spinlock
locking: Convert raw_rwlock functions to arch_rwlock
locking: Convert raw_rwlock to arch_rwlock
...

Linus Torvalds
2009-12-16 01:02:01 +0800
c0f68c2fa generic-ipi: cleanup for generic_smp_call_function_interrupt() ... Browse Code »

Use smp_processor_id() instead of get_cpu() and put_cpu() in
generic_smp_call_function_interrupt(), It's no need to disable preempt,
because we must call generic_smp_call_function_interrupt() with interrupts
disabled.

Signed-off-by: Xiao Guangrong
Acked-by: Ingo Molnar
Cc: Jens Axboe
Cc: Nick Piggin
Cc: Peter Zijlstra
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xiao Guangrong
2009-12-16 00:53:25 +0800

15 Dec, 2009

1 commit

9f5a5621e smp: Convert smplocks to raw_spinlocks ... Browse Code »

Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Acked-by: Ingo Molnar

Thomas Gleixner
2009-12-15 06:55:33 +0800

18 Nov, 2009

1 commit

2ea6dec4a generic-ipi: Add smp_call_function_any() ... Browse Code »

Andrew points out that acpi-cpufreq uses cpumask_any, when it really
would prefer to use the same CPU if possible (to avoid an IPI). In
general, this seems a good idea to offer.

[ tglx: Documented selection preference and Inlined the UP case to
avoid the copy of smp_call_function_single() and the extra
EXPORT ]

Signed-off-by: Rusty Russell
Cc: Ingo Molnar
Cc: Venkatesh Pallipadi
Cc: Len Brown
Cc: Zhao Yakui
Cc: Dave Jones
Cc: Thomas Gleixner
Cc: Mike Galbraith
Cc: "Zhang, Yanmin"
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Rusty Russell
2009-11-18 21:52:25 +0800

23 Oct, 2009

1 commit

72f279b25 generic-ipi: Fix misleading smp_call_function*() description ... Browse Code »

After commit:8969a5ede0f9e17da4b943712429aef2c9bcd82b
"generic-ipi: remove kmalloc()", wait = 0 can be guaranteed.

Signed-off-by: Sheng Yang
Cc: Peter Zijlstra
Cc: Jens Axboe
Cc: Nick Piggin
LKML-Reference:
Signed-off-by: Ingo Molnar

Sheng Yang
2009-10-23 19:51:45 +0800

24 Sep, 2009

1 commit

0748bd017 cpumask: remove arch_send_call_function_ipi ... Browse Code »

Now everyone is converted to arch_send_call_function_ipi_mask, remove
the shim and the #defines.

Signed-off-by: Rusty Russell

Rusty Russell
2009-09-24 08:04:47 +0800

23 Sep, 2009

1 commit

54fdade1c generic-ipi: make struct call_function_data lockless ... Browse Code »

This patch can remove spinlock from struct call_function_data, the
reasons are below:

1: add a new interface for cpumask named cpumask_test_and_clear_cpu(),
it can atomically test and clear specific cpu, we can use it instead
of cpumask_test_cpu() and cpumask_clear_cpu() and no need data->lock
to protect those in generic_smp_call_function_interrupt().

2: in smp_call_function_many(), after csd_lock() return, the current's
cfd_data is deleted from call_function list, so it not have race
between other cpus, then cfs_data is only used in
smp_call_function_many() that must disable preemption and not from
a hardware interrupthandler or from a bottom half handler to call,
only the correspond cpu can use it, so it not have race in current
cpu, no need cfs_data->lock to protect it.

3: after 1 and 2, cfs_data->lock is only use to protect cfs_data->refs in
generic_smp_call_function_interrupt(), so we can define cfs_data->refs
to atomic_t, and no need cfs_data->lock any more.

Signed-off-by: Xiao Guangrong
Cc: Ingo Molnar
Cc: Jens Axboe
Cc: Nick Piggin
Cc: Peter Zijlstra
Acked-by: Rusty Russell
[akpm@linux-foundation.org: use atomic_dec_return()]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xiao Guangrong
2009-09-23 22:39:28 +0800