01 May, 2013

2 commits

  • We sometimes use "struct call_single_data *data" and sometimes "struct
    call_single_data *csd". Use "csd" consistently.

    We sometimes use "struct call_function_data *data" and sometimes "struct
    call_function_data *cfd". Use "cfd" consistently.

    Also, avoid some 80-col layout tricks.

    Cc: Ingo Molnar
    Cc: Jens Axboe
    Cc: Peter Zijlstra
    Cc: Shaohua Li
    Cc: Shaohua Li
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • csd_lock() uses assignment to data->flags rather than |=. That is not
    buggy at present because only one bit (CSD_FLAG_LOCK) is defined in
    call_single_data.flags.

    But it will become buggy if we later add another flag, so fix it now.
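    A sketch of the one-line difference, using the field and flag names from
    the text above (surrounding code in csd_lock() omitted):

        /* before: plain assignment clobbers any flag bits added later */
        data->flags = CSD_FLAG_LOCK;

        /* after: only set the lock bit, preserving any others */
        data->flags |= CSD_FLAG_LOCK;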

    Signed-off-by: liguang
    Cc: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    liguang
     

22 Feb, 2013

1 commit

  • I'm testing a swapout workload on a two-socket Xeon machine. The workload
    has 10 threads; each thread sequentially accesses a separate memory
    region. TLB flush overhead is very big in this workload. For each page,
    page reclaim needs to move it off the active lru list and then unmap it.
    Both steps need a TLB flush. And since this is a multithreaded workload,
    TLB flushes happen on 10 CPUs. On x86, TLB flush uses the generic
    smp_call_function machinery, so this workload stresses
    smp_call_function_many() heavily.

    Without patch, perf shows:
    + 24.49% [k] generic_smp_call_function_interrupt
    - 21.72% [k] _raw_spin_lock
    - _raw_spin_lock
    + 79.80% __page_check_address
    + 6.42% generic_smp_call_function_interrupt
    + 3.31% get_swap_page
    + 2.37% free_pcppages_bulk
    + 1.75% handle_pte_fault
    + 1.54% put_super
    + 1.41% grab_super_passive
    + 1.36% __swap_duplicate
    + 0.68% blk_flush_plug_list
    + 0.62% swap_info_get
    + 6.55% [k] flush_tlb_func
    + 6.46% [k] smp_call_function_many
    + 5.09% [k] call_function_interrupt
    + 4.75% [k] default_send_IPI_mask_sequence_phys
    + 2.18% [k] find_next_bit

    swapout throughput is around 1300M/s.

    With the patch, perf shows:
    - 27.23% [k] _raw_spin_lock
    - _raw_spin_lock
    + 80.53% __page_check_address
    + 8.39% generic_smp_call_function_single_interrupt
    + 2.44% get_swap_page
    + 1.76% free_pcppages_bulk
    + 1.40% handle_pte_fault
    + 1.15% __swap_duplicate
    + 1.05% put_super
    + 0.98% grab_super_passive
    + 0.86% blk_flush_plug_list
    + 0.57% swap_info_get
    + 8.25% [k] default_send_IPI_mask_sequence_phys
    + 7.55% [k] call_function_interrupt
    + 7.47% [k] smp_call_function_many
    + 7.25% [k] flush_tlb_func
    + 3.81% [k] _raw_spin_lock_irqsave
    + 3.78% [k] generic_smp_call_function_single_interrupt

    swapout throughput is around 1400M/s. So there is around a 7%
    improvement, and total cpu utilization doesn't change.

    Without the patch, cfd_data is shared by all CPUs.
    generic_smp_call_function_interrupt() reads and writes cfd_data several
    times, which creates a lot of cache ping-pong. With the patch, the data
    becomes per-cpu, so the ping-pong is avoided. And from the perf data, this
    doesn't add contention on the call_single_queue lock.
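    A rough sketch of the shape of the change (simplified, with the locking
    around the per-cpu queue omitted; field names are assumptions based on the
    description above): each target cpu gets its own call_single_data, queued
    on that cpu's own call_single_queue, so no single descriptor is read and
    written by every cpu.

        for_each_cpu(cpu, cfd->cpumask) {
                struct call_single_data *csd = per_cpu_ptr(cfd->csd, cpu);

                csd_lock(csd);
                csd->func = func;
                csd->info = info;
                /* enqueue on the target cpu's private call_single_queue */
                list_add_tail(&csd->list, &per_cpu(call_single_queue, cpu).list);
        }
        arch_send_call_function_ipi_mask(cfd->cpumask);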

    Next step is to remove generic_smp_call_function_interrupt() from arch
    code.

    Signed-off-by: Shaohua Li
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Jens Axboe
    Cc: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

28 Jan, 2013

1 commit

  • I get the following warning with v3.7, once or twice a day:

    [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()

    As explained by Linus as well:

    |
    | Once we've done the "list_add_rcu()" to add it to the
    | queue, we can have (another) IPI to the target CPU that can
    | now see it and clear the mask.
    |
    | So by the time we get to actually send the IPI, the mask might
    | have been cleared by another IPI.
    |

    This patch also fixes a system hang problem, if the data->cpumask
    gets cleared after passing this point:

    if (WARN_ONCE(!mask, "empty IPI mask"))
            return;

    then the problem in commit 83d349f35e1a ("x86: don't send an IPI to
    the empty set of CPU's") will happen again.
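    A hedged sketch of the idea behind the fix (the cpumask_ipi field name is
    an assumption here): the sender keeps a private snapshot of the mask for
    the actual IPI, so bits cleared concurrently by the handlers in
    data->cpumask cannot empty the mask under it.

        /* snapshot the mask before target cpus can start clearing bits */
        cpumask_copy(data->cpumask_ipi, data->cpumask);
        /* ... publish the data block ... */
        arch_send_call_function_ipi_mask(data->cpumask_ipi);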

    Signed-off-by: Wang YanQing
    Acked-by: Linus Torvalds
    Acked-by: Jan Beulich
    Cc: Paul E. McKenney
    Cc: Andrew Morton
    Cc: peterz@infradead.org
    Cc: mina86@mina86.org
    Cc: srivatsa.bhat@linux.vnet.ibm.com
    Cc:
    Link: http://lkml.kernel.org/r/20130126075357.GA3205@udknight
    [ Tidied up the changelog and the comment in the code. ]
    Signed-off-by: Ingo Molnar

    Wang YanQing
     

05 Jun, 2012

1 commit

  • There are no users of those APIs anymore; just remove them.

    Signed-off-by: Yong Zhang
    Cc: ralf@linux-mips.org
    Cc: sshtylyov@mvista.com
    Cc: david.daney@cavium.com
    Cc: nikunj@linux.vnet.ibm.com
    Cc: paulmck@linux.vnet.ibm.com
    Cc: axboe@kernel.dk
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1338275765-3217-11-git-send-email-yong.zhang0@gmail.com
    Acked-by: Srivatsa S. Bhat
    Acked-by: Peter Zijlstra
    Signed-off-by: Thomas Gleixner

    Yong Zhang
     

08 May, 2012

1 commit


04 May, 2012

1 commit

  • percpu areas are already allocated during boot for each possible cpu.
    percpu idle threads can be considered an extension of the percpu areas, so
    allocate them for each possible cpu during boot as well.

    This will eliminate the need for workqueue based idle thread allocation.
    In future we can move the idle thread area into the percpu area too.

    [ tglx: Moved the loop into smpboot.c and added an error check when
    the init code failed to allocate an idle thread for a cpu which
    should be onlined ]

    Signed-off-by: Suresh Siddha
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Paul E. McKenney
    Cc: Srivatsa S. Bhat
    Cc: Tejun Heo
    Cc: David Rientjes
    Cc: venki@google.com
    Link: http://lkml.kernel.org/r/1334966930.28674.245.camel@sbsiddha-desk.sc.intel.com
    Signed-off-by: Thomas Gleixner

    Suresh Siddha
     

29 Mar, 2012

2 commits

  • Add the on_each_cpu_cond() function that wraps on_each_cpu_mask() and
    calculates the cpumask of cpus to IPI by calling a function supplied as a
    parameter in order to determine whether to IPI each specific cpu.

    The function works around allocation failure of the cpumask variable with
    CONFIG_CPUMASK_OFFSTACK=y by iterating over the cpus and sending one IPI
    at a time via smp_call_function_single().

    The function is useful since it separates the specific code that decides,
    in each case, whether to IPI a specific cpu for a specific request from
    the common boilerplate of creating the mask, handling failures, etc.
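    A hedged usage sketch (the predicate, callback and per-cpu counter are
    made up for illustration; the parameter order follows the description
    above: condition callback, function, argument, wait flag and GFP flags
    for the off-stack cpumask):

        /* IPI only the cpus that actually have something queued */
        static bool cpu_has_work(int cpu, void *info)
        {
                return per_cpu(pending_items, cpu) != 0;
        }

        static void flush_cpu(void *info)
        {
                /* runs on each cpu for which cpu_has_work() returned true */
        }

        on_each_cpu_cond(cpu_has_work, flush_cpu, NULL, true, GFP_ATOMIC);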

    [akpm@linux-foundation.org: s/gfpflags/gfp_flags/]
    [akpm@linux-foundation.org: avoid double-evaluation of `info' (per Michal), parenthesise evaluation of `cond_func']
    [akpm@linux-foundation.org: s/CPU/CPUs, use all 80 cols in comment]
    Signed-off-by: Gilad Ben-Yossef
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Russell King
    Cc: Pekka Enberg
    Cc: Matt Mackall
    Cc: Sasha Levin
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Alexander Viro
    Cc: Avi Kivity
    Acked-by: Michal Nazarewicz
    Cc: Kosaki Motohiro
    Cc: Milton Miller
    Reviewed-by: "Srivatsa S. Bhat"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gilad Ben-Yossef
     
  • We have lots of infrastructure in place to partition multi-core systems
    such that we have a group of CPUs that are dedicated to specific task:
    cgroups, scheduler and interrupt affinity, and cpuisol= boot parameter.
    Still, kernel code will at times interrupt all CPUs in the system via IPIs
    for various needs. These IPIs are useful and cannot be avoided
    altogether, but in certain cases it is possible to interrupt only specific
    CPUs that have useful work to do and not the entire system.

    This patch set, inspired by discussions with Peter Zijlstra and Frederic
    Weisbecker when testing the nohz task patch set, is a first stab at trying
    to explore doing this by locating the places where such global IPI calls
    are being made and turning the global IPI into an IPI for a specific group
    of CPUs. The purpose of the patch set is to get feedback if this is the
    right way to go for dealing with this issue and indeed, if the issue is
    even worth dealing with at all. Based on the feedback from this patch set
    I plan to offer further patches that address similar issue in other code
    paths.

    This patch creates an on_each_cpu_mask() and on_each_cpu_cond()
    infrastructure API (the former derived from existing arch specific
    versions in Tile and Arm) and uses them to turn several global IPI
    invocation to per CPU group invocations.

    Core kernel:

    on_each_cpu_mask() calls a function on processors specified by cpumask,
    which may or may not include the local processor.

    You must not call this function with disabled interrupts or from a
    hardware interrupt handler or from a bottom half handler.

    arch/arm:

    Note that the generic version is a little different than the Arm one:

    1. It has the mask as first parameter
    2. It calls the function on the calling CPU with interrupts disabled,
    but this should be OK since the function is called on the other CPUs
    with interrupts disabled anyway.

    arch/tile:

    The API is the same as the tile private one, but the generic version also
    calls the function on the calling CPU with interrupts disabled in the UP
    case.

    This is OK since the function is called on the other CPUs
    with interrupts disabled.
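    A hedged usage sketch of the generic helper described above (the mask and
    callback are made up for illustration):

        static void drain_local(void *info)
        {
                /* per-cpu work; runs with interrupts disabled on remote cpus */
        }

        /* call drain_local() only on the cpus set in 'targets' */
        on_each_cpu_mask(targets, drain_local, NULL, 1);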

    Signed-off-by: Gilad Ben-Yossef
    Reviewed-by: Christoph Lameter
    Acked-by: Chris Metcalf
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Russell King
    Cc: Pekka Enberg
    Cc: Matt Mackall
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Sasha Levin
    Cc: Mel Gorman
    Cc: Alexander Viro
    Cc: Avi Kivity
    Acked-by: Michal Nazarewicz
    Cc: Kosaki Motohiro
    Cc: Milton Miller
    Cc: Russell King
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gilad Ben-Yossef
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

17 Jun, 2011

1 commit

  • There is a problem that kdump(2nd kernel) sometimes hangs up due
    to a pending IPI from 1st kernel. Kernel panic occurs because IPI
    comes before call_single_queue is initialized.

    To fix the crash, rename init_call_single_data() to call_function_init()
    and call it in start_kernel() so that call_single_queue can be
    initialized before enabling interrupts.

    The details of the crash are:

    (1) 2nd kernel boots up

    (2) A pending IPI from 1st kernel comes when irqs are first enabled
    in start_kernel().

    (3) Kernel tries to handle the interrupt, but call_single_queue
    is not initialized yet at this point. As a result, in the
    generic_smp_call_function_single_interrupt(), NULL pointer
    dereference occurs when list_replace_init() tries to access
    &q->list.next.

    Therefore this patch changes the name of init_call_single_data()
    to call_function_init() and calls it before local_irq_enable()
    in start_kernel().
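    A sketch of where the new call sits (simplified; the surrounding
    start_kernel() steps are only indicated as comments):

        asmlinkage void __init start_kernel(void)
        {
                /* ... earlier boot setup ... */
                call_function_init();   /* call_single_queue ready before IRQs are on */
                /* ... */
                local_irq_enable();     /* a stale kdump IPI delivered here is now safe */
                /* ... rest of boot ... */
        }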

    Signed-off-by: Takao Indoh
    Reviewed-by: WANG Cong
    Acked-by: Neil Horman
    Acked-by: Vivek Goyal
    Acked-by: Peter Zijlstra
    Cc: Milton Miller
    Cc: Jens Axboe
    Cc: Paul E. McKenney
    Cc: kexec@lists.infradead.org
    Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.com
    Signed-off-by: Ingo Molnar

    Takao Indoh
     

23 Mar, 2011

1 commit

  • Move setup_nr_cpu_ids(), smp_init() and some other SMP boot parameter
    setup functions from init/main.c to kernel/smp.c; this saves some #ifdef
    CONFIG_SMP.

    Signed-off-by: WANG Cong
    Cc: Rakib Mullick
    Cc: David Howells
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Arnd Bergmann
    Cc: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     

18 Mar, 2011

4 commits

  • Use the newly added smp_call_func_t in smp_call_function_interrupt for
    the func variable, and make the comment above the WARN more assertive
    and explicit. Also, func is a function pointer and does not need an
    offset, so use %pf not %pS.
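    For reference, the difference between the two format specifiers mentioned
    above (illustrative printk lines, not taken from the patch):

        printk("%pf\n", func);  /* plain function pointer: name only, no offset */
        printk("%pS\n", func);  /* symbol plus offset, meant for return addresses */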

    Signed-off-by: Milton Miller
    Signed-off-by: Linus Torvalds

    Milton Miller
     
  • Mike Galbraith reported finding a lockup ("perma-spin bug") where the
    cpumask passed to smp_call_function_many was cleared by other cpu(s)
    while a cpu was preparing its call_data block, resulting in no cpu to
    clear the last ref and unlock the block.

    Having cpus clear their bit asynchronously could be useful on a mask of
    cpus that might have a translation context, or cpus that need a push to
    complete an rcu window.

    Instead of adding a BUG_ON and requiring yet another cpumask copy, just
    detect the race and handle it.

    Note: arch_send_call_function_ipi_mask must still handle an empty
    cpumask because the data block is globally visible before that arch
    callback is made. And (obviously) there are no guarantees as to which
    cpus are notified if the mask is changed during the call; only cpus that
    were online and had their mask bit set during the whole call are
    guaranteed to be called.

    Reported-by: Mike Galbraith
    Reported-by: Jan Beulich
    Acked-by: Jan Beulich
    Cc: stable@kernel.org
    Signed-off-by: Milton Miller
    Signed-off-by: Linus Torvalds

    Milton Miller
     
  • Paul McKenney's review pointed out two problems with the barriers in the
    2.6.38 update to the smp call function many code.

    First, a barrier that would force the func and info members of data to
    be visible before their consumption in the interrupt handler was
    missing. This can be solved by adding an smp_wmb between setting the
    func and info members and setting the cpumask; this will pair with the
    existing and required smp_rmb ordering the cpumask read before the read
    of refs. This placement avoids the need for a second smp_rmb in the
    interrupt handler, which would be executed on each of the N cpus
    executing the call request. (I had been thinking this barrier was
    present, but it was not.)

    Second, the previous write to refs (establishing the zero that the
    interrupt handler was testing from all cpus) was performed by a third
    party cpu. This would invoke transitivity which, as a recent or
    concurrent addition to memory-barriers.txt now explicitly states, would
    require a full smp_mb().

    However, we know the cpumask will only be set by one cpu (the data
    owner) and any previous iteration of the mask would have been cleared by
    the reading cpu. By redundantly writing refs to 0 on the owning cpu
    before the smp_wmb, the write to refs will follow the same path as the
    writes that set the cpumask, which in turn allows us to keep the barrier
    in the interrupt handler an smp_rmb instead of promoting it to an smp_mb
    (which would be executed by N cpus for each of the possible M elements
    on the list).
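    A sketch of the ordering described above (owner side only, simplified;
    variable names follow the surrounding text):

        data->csd.func = func;
        data->csd.info = info;
        atomic_set(&data->refs, 0);          /* redundant write on the owning cpu */

        smp_wmb();                           /* pairs with the smp_rmb in the handler */

        cpumask_copy(data->cpumask, mask);   /* handler may observe the mask from here */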

    I moved and expanded the comment about our (ab)use of the rcu list
    primitives for the concurrent walk earlier into this function. I
    considered moving the first two paragraphs to the queue list head and
    lock, but felt it would have been too disconnected from the code.

    Cc: Paul McKenney
    Cc: stable@kernel.org (2.6.32 and later)
    Signed-off-by: Milton Miller
    Signed-off-by: Linus Torvalds

    Milton Miller
     
  • Peter pointed out there was nothing preventing the list_del_rcu in
    smp_call_function_interrupt from running before the list_add_rcu in
    smp_call_function_many.

    Fix this by not setting refs until we have gotten the lock for the list.
    Take advantage of the wmb in list_add_rcu to save an explicit additional
    one.

    I tried to force this race with a udelay before the lock & list_add and
    by mixing all 64 online cpus with just 3 random cpus in the mask, but
    was unsuccessful. Still, inspection shows a valid race, and the fix is
    an extension of the existing protection window in the current code.

    Cc: stable@kernel.org (v2.6.32 and later)
    Reported-by: Peter Zijlstra
    Signed-off-by: Milton Miller
    Signed-off-by: Linus Torvalds

    Milton Miller
     

21 Jan, 2011

3 commits

  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled
    lockdep: Move early boot local IRQ enable/disable status to init/main.c

    Linus Torvalds
     
  • We have to test the cpu mask in the interrupt handler before checking the
    refs, otherwise we can start to follow an entry before it's deleted and
    find it partially initialized for the next trip. Presently we also clear
    the cpumask bit before executing the called function, which implies
    getting write access to the line. After the function is called we then
    decrement refs, and if they go to zero we then unlock the structure.

    However, this implies getting write access to the call function data
    before and after the function is called. If we can assert that no
    smp_call_function execution function is allowed to enable interrupts,
    then we can move both writes to after the function is called, hopefully
    allowing both writes with one cache line bounce.
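    A sketch of the reordering (simplified; the surrounding loop, the refs
    check and error handling are omitted):

        /* before: write (clear bit), call, write (dec refs) */
        if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
                continue;
        data->csd.func(data->csd.info);
        refs = atomic_dec_return(&data->refs);

        /* after: read-only test, call, then both writes together */
        if (!cpumask_test_cpu(cpu, data->cpumask))
                continue;
        data->csd.func(data->csd.info);
        cpumask_clear_cpu(cpu, data->cpumask);
        refs = atomic_dec_return(&data->refs);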

    On a 256 thread system with a kernel compiled for 1024 threads, the time
    to execute testcase in the "smp_call_function_many race" changelog was
    reduced by about 30-40ms out of about 545 ms.

    I decided to keep this as WARN because it's now a buggy function, even
    though the stack trace is of no value -- a simple printk would give us
    the information needed.

    Raw data:

    Without patch:
    ipi_test startup took 1219366ns complete 539819014ns total 541038380ns
    ipi_test startup took 1695754ns complete 543439872ns total 545135626ns
    ipi_test startup took 7513568ns complete 539606362ns total 547119930ns
    ipi_test startup took 13304064ns complete 533898562ns total 547202626ns
    ipi_test startup took 8668192ns complete 544264074ns total 552932266ns
    ipi_test startup took 4977626ns complete 548862684ns total 553840310ns
    ipi_test startup took 2144486ns complete 541292318ns total 543436804ns
    ipi_test startup took 21245824ns complete 530280180ns total 551526004ns

    With patch:
    ipi_test startup took 5961748ns complete 500859628ns total 506821376ns
    ipi_test startup took 8975996ns complete 495098924ns total 504074920ns
    ipi_test startup took 19797750ns complete 492204740ns total 512002490ns
    ipi_test startup took 14824796ns complete 487495878ns total 502320674ns
    ipi_test startup took 11514882ns complete 494439372ns total 505954254ns
    ipi_test startup took 8288084ns complete 502570774ns total 510858858ns
    ipi_test startup took 6789954ns complete 493388112ns total 500178066ns

    #include <linux/module.h>
    #include <linux/workqueue.h>
    #include <linux/sched.h> /* sched clock */

    #define ITERATIONS 100

    static void do_nothing_ipi(void *dummy)
    {
    }

    static void do_ipis(struct work_struct *dummy)
    {
            int i;

            for (i = 0; i < ITERATIONS; i++)
                    smp_call_function(do_nothing_ipi, NULL, 1);

            printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
    }

    static struct work_struct work[NR_CPUS];

    static int __init testcase_init(void)
    {
            int cpu;
            u64 start, started, done;

            start = local_clock();
            for_each_online_cpu(cpu) {
                    INIT_WORK(&work[cpu], do_ipis);
                    schedule_work_on(cpu, &work[cpu]);
            }
            started = local_clock();
            for_each_online_cpu(cpu)
                    flush_work(&work[cpu]);
            done = local_clock();
            pr_info("ipi_test startup took %lldns complete %lldns total %lldns\n",
                    started-start, done-started, done-start);

            return 0;
    }

    static void __exit testcase_exit(void)
    {
    }

    module_init(testcase_init)
    module_exit(testcase_exit)
    MODULE_LICENSE("GPL");
    MODULE_AUTHOR("Anton Blanchard");

    Signed-off-by: Milton Miller
    Cc: Anton Blanchard
    Cc: Ingo Molnar
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Milton Miller
     
  • I noticed a failure where we hit the following WARN_ON in
    generic_smp_call_function_interrupt:

    if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
            continue;

    data->csd.func(data->csd.info);

    refs = atomic_dec_return(&data->refs);
    WARN_ON(refs < 0);

    owner cpu                            other cpu (interrupt handler)
    ---------                            -----------------------------
                                         sees and clears bit in cpumask
                                         might be using old or new fn!
                                         decrements refs below 0
    set data->refs (too late!)

    The important thing to note is since the interrupt handler walks a
    potentially stale call_function.queue without any locking, then another
    cpu can view the percpu *data structure at any time, even when the owner
    is in the process of initialising it.

    The following test case hits the WARN_ON 100% of the time on my PowerPC
    box (having 128 threads does help :)

    #include <linux/module.h>
    #include <linux/workqueue.h>

    #define ITERATIONS 100

    static void do_nothing_ipi(void *dummy)
    {
    }

    static void do_ipis(struct work_struct *dummy)
    {
            int i;

            for (i = 0; i < ITERATIONS; i++)
                    smp_call_function(do_nothing_ipi, NULL, 1);

            printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
    }

    static struct work_struct work[NR_CPUS];

    static int __init testcase_init(void)
    {
            int cpu;

            for_each_online_cpu(cpu) {
                    INIT_WORK(&work[cpu], do_ipis);
                    schedule_work_on(cpu, &work[cpu]);
            }

            return 0;
    }

    static void __exit testcase_exit(void)
    {
    }

    module_init(testcase_init)
    module_exit(testcase_exit)
    MODULE_LICENSE("GPL");
    MODULE_AUTHOR("Anton Blanchard");

    I tried to fix it by ordering the read and the write of ->cpumask and
    ->refs. In doing so I missed a critical case, but Paul McKenney was able
    to spot my bug, thankfully :) To ensure we aren't viewing previous
    iterations, the interrupt handler needs to read ->refs, then ->cpumask,
    then ->refs _again_.

    Thanks to Milton Miller and Paul McKenney for helping to debug this issue.

    [miltonm@bga.com: add WARN_ON and BUG_ON, remove extra read of refs before initial read of mask that doesn't help (also noted by Peter Zijlstra), adjust comments, hopefully clarify scenario ]
    [miltonm@bga.com: remove excess tests]
    Signed-off-by: Anton Blanchard
    Signed-off-by: Milton Miller
    Cc: Ingo Molnar
    Cc: "Paul E. McKenney"
    Cc: [2.6.32+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     

20 Jan, 2011

1 commit

  • percpu may end up calling vfree() during early boot which in
    turn may call on_each_cpu() for TLB flushes. What on_each_cpu()
    does can be done safely while IRQs are disabled during early boot,
    but it assumed it was always called with local IRQs enabled, which
    ended up enabling local IRQs prematurely during boot and triggering
    a couple of warnings.

    This patch updates on_each_cpu() and smp_call_function_many() such
    that on_each_cpu() can be used safely while
    early_boot_irqs_disabled is set.
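    A sketch of the relevant part of on_each_cpu() after the change
    (simplified; the assumption is that saving and restoring flags, rather
    than unconditionally enabling IRQs, is what keeps early boot safe):

        int on_each_cpu(void (*func)(void *), void *info, int wait)
        {
                unsigned long flags;
                int ret;

                preempt_disable();
                ret = smp_call_function(func, info, wait);
                local_irq_save(flags);          /* was local_irq_disable() */
                func(info);
                local_irq_restore(flags);       /* was local_irq_enable() */
                preempt_enable();
                return ret;
        }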

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Acked-by: Pekka Enberg
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Reported-by: Ingo Molnar

    Tejun Heo
     

14 Jan, 2011

1 commit

  • An arch which needs USE_GENERIC_SMP_HELPERS has to select
    USE_GENERIC_SMP_HELPERS, rather than leaving the choice to the user,
    since it doesn't provide its own implementation.

    Also, move on_each_cpu() to kernel/smp.c; it is strange to put it in
    kernel/softirq.c.

    For an arch which doesn't use USE_GENERIC_SMP_HELPERS, e.g. blackfin,
    only on_each_cpu() is compiled.

    Signed-off-by: Amerigo Wang
    Cc: David Howells
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     

28 Oct, 2010

1 commit

  • Typedef the pointer to the function to be called by smp_call_function() and
    friends:

    typedef void (*smp_call_func_t)(void *info);

    as it is used in a fair number of places.
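    A small illustration of the typedef in use (the callback and counter are
    made up):

        static void bump_counter(void *info)
        {
                atomic_inc((atomic_t *)info);
        }

        smp_call_func_t fn = bump_counter;      /* instead of void (*fn)(void *info) */
        smp_call_function(fn, &my_counter, 1);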

    Signed-off-by: David Howells
    cc: linux-arch@vger.kernel.org

    David Howells
     

10 Sep, 2010

1 commit

  • Just got my 6 way machine to a state where cpu 0 is in an
    endless loop within __smp_call_function_single.
    All other cpus are idle.

    The call trace on cpu 0 looks like this:

    __smp_call_function_single
    scheduler_tick
    update_process_times
    tick_sched_timer
    __run_hrtimer
    hrtimer_interrupt
    clock_comparator_work
    do_extint
    ext_int_handler
    ----> timer irq
    cpu_idle

    __smp_call_function_single() got called from nohz_balancer_kick()
    (inlined) with the remote cpu being 1, wait being 0 and the per
    cpu variable remote_sched_softirq_cb (call_single_data) of the
    current cpu (0).

    Then it loops forever when it tries to grab the lock of the
    call_single_data, since it is already locked and enqueued on cpu 0.

    My theory of how this could have happened: for some reason the
    scheduler decided to call __smp_call_function_single() on its own
    cpu, and sends an IPI to itself. The interrupt stays pending
    since IRQs are disabled. If the hypervisor then schedules the
    cpu away, it might happen that upon rescheduling both the IPI and
    the timer IRQ are pending. If interrupts are then enabled again,
    it depends which one gets delivered first.
    If the timer interrupt gets delivered first, we end up with the
    local deadlock as seen in the calltrace above.

    Let's make __smp_call_function_single() check if the target cpu is
    the current cpu and execute the function immediately just like
    smp_call_function_single does. That should prevent at least the
    scenario described here.
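    A sketch of the check described above (simplified; the csd locking and
    the actual queueing path for the remote case are omitted):

        if (cpu == smp_processor_id()) {
                /* run it right here instead of sending an IPI to ourselves */
                local_irq_save(flags);
                data->func(data->info);
                local_irq_restore(flags);
        } else {
                /* lock the call_single_data and queue it on the remote cpu */
        }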

    It might also be that the scheduler is not supposed to call
    __smp_call_function_single with the remote cpu being the current
    cpu, but that is a different issue.

    Signed-off-by: Heiko Carstens
    Acked-by: Peter Zijlstra
    Acked-by: Jens Axboe
    Cc: Venkatesh Pallipadi
    Cc: Suresh Siddha
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     

28 May, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

18 Jan, 2010

1 commit


17 Jan, 2010

1 commit

  • The change in acpi_cpufreq to use smp_call_function_any causes a warning
    when it is called since the function erroneously passes the cpu id to
    cpumask_of_node rather than the node that the cpu is on. Fix this.
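    The shape of the fix as described above (a two-line sketch; the
    surrounding smp_call_function_any() code is omitted):

        /* before: passes a cpu id where a node id is expected */
        nodemask = cpumask_of_node(cpu);

        /* after: look up the node the cpu is on first */
        nodemask = cpumask_of_node(cpu_to_node(cpu));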

    cpumask_of_node(3): node > nr_node_ids(1)
    Pid: 1, comm: swapper Not tainted 2.6.33-rc3-00097-g2c1f189 #223
    Call Trace:
    [] cpumask_of_node+0x23/0x58
    [] smp_call_function_any+0x65/0xfa
    [] ? do_drv_read+0x0/0x2f
    [] get_cur_val+0xb0/0x102
    [] get_cur_freq_on_cpu+0x74/0xc5
    [] acpi_cpufreq_cpu_init+0x417/0x515
    [] ? __down_write+0xb/0xd
    [] cpufreq_add_dev+0x278/0x922

    Signed-off-by: David John
    Cc: Suresh Siddha
    Cc: Rusty Russell
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David John
     

16 Dec, 2009

2 commits

  • …el/git/tip/linux-2.6-tip

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
    clockevents: Convert to raw_spinlock
    clockevents: Make tick_device_lock static
    debugobjects: Convert to raw_spinlocks
    perf_event: Convert to raw_spinlock
    hrtimers: Convert to raw_spinlocks
    genirq: Convert irq_desc.lock to raw_spinlock
    smp: Convert smplocks to raw_spinlocks
    rtmutes: Convert rtmutex.lock to raw_spinlock
    sched: Convert pi_lock to raw_spinlock
    sched: Convert cpupri lock to raw_spinlock
    sched: Convert rt_runtime_lock to raw_spinlock
    sched: Convert rq->lock to raw_spinlock
    plist: Make plist debugging raw_spinlock aware
    bkl: Fixup core_lock fallout
    locking: Cleanup the name space completely
    locking: Further name space cleanups
    alpha: Fix fallout from locking changes
    locking: Implement new raw_spinlock
    locking: Convert raw_rwlock functions to arch_rwlock
    locking: Convert raw_rwlock to arch_rwlock
    ...

    Linus Torvalds
     
  • Use smp_processor_id() instead of get_cpu() and put_cpu() in
    generic_smp_call_function_interrupt(). There is no need to disable
    preemption, because generic_smp_call_function_interrupt() must be called
    with interrupts disabled.
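    A sketch of the substitution (the surrounding handler is omitted):

        /* before: disables and re-enables preemption around the lookup */
        int cpu = get_cpu();
        /* ... */
        put_cpu();

        /* after: interrupts are already off here, so preemption can't happen */
        int cpu = smp_processor_id();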

    Signed-off-by: Xiao Guangrong
    Acked-by: Ingo Molnar
    Cc: Jens Axboe
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiao Guangrong
     

15 Dec, 2009

1 commit


18 Nov, 2009

1 commit

  • Andrew points out that acpi-cpufreq uses cpumask_any, when it really
    would prefer to use the same CPU if possible (to avoid an IPI). In
    general, this seems a good idea to offer.

    [ tglx: Documented selection preference and Inlined the UP case to
    avoid the copy of smp_call_function_single() and the extra
    EXPORT ]
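    A hedged usage sketch (the callback and mask are made up; the point is
    that the helper prefers the current cpu, then a cpu on the same node,
    before falling back to any cpu in the mask):

        static void read_freq_reg(void *info)
        {
                /* e.g. read a register on whichever cpu this ends up running on */
        }

        /* run read_freq_reg on some cpu from policy_mask, avoiding an IPI if possible */
        smp_call_function_any(policy_mask, read_freq_reg, &result, 1);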

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar
    Cc: Venkatesh Pallipadi
    Cc: Len Brown
    Cc: Zhao Yakui
    Cc: Dave Jones
    Cc: Thomas Gleixner
    Cc: Mike Galbraith
    Cc: "Zhang, Yanmin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Rusty Russell
     

23 Oct, 2009

1 commit


24 Sep, 2009

1 commit


23 Sep, 2009

1 commit

  • This patch removes the spinlock from struct call_function_data; the
    reasons are below:

    1: add a new interface for cpumask named cpumask_test_and_clear_cpu();
    it can atomically test and clear a specific cpu, so we can use it
    instead of cpumask_test_cpu() plus cpumask_clear_cpu() and no longer
    need data->lock to protect those in
    generic_smp_call_function_interrupt() (see the sketch after this list).

    2: in smp_call_function_many(), after csd_lock() returns, the current
    cpu's cfd_data has been deleted from the call_function list, so it does
    not race with other cpus; cfd_data is then only used in
    smp_call_function_many(), which must be called with preemption disabled
    and not from a hardware interrupt handler or a bottom half handler, so
    only the corresponding cpu can use it and there is no race on the
    current cpu either, hence no need for cfd_data->lock to protect it.

    3: after 1 and 2, cfd_data->lock is only used to protect cfd_data->refs
    in generic_smp_call_function_interrupt(), so we can make cfd_data->refs
    an atomic_t and no longer need cfd_data->lock at all.
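    A sketch of the substitution from point 1 (simplified; the data->lock
    handling that the original pair needed is omitted):

        /* before: test and clear as two steps, serialized by data->lock */
        if (!cpumask_test_cpu(cpu, data->cpumask))
                continue;
        cpumask_clear_cpu(cpu, data->cpumask);

        /* after: one atomic test-and-clear, no lock needed for this */
        if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
                continue;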

    Signed-off-by: Xiao Guangrong
    Cc: Ingo Molnar
    Cc: Jens Axboe
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Acked-by: Rusty Russell
    [akpm@linux-foundation.org: use atomic_dec_return()]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiao Guangrong
     

27 Aug, 2009

1 commit


22 Aug, 2009

1 commit


08 Aug, 2009

1 commit

  • Use CONFIG_HOTPLUG_CPU, not CONFIG_CPU_HOTPLUG

    When hot-unplugging a cpu, it will leak memory allocated at cpu hotplug,
    but only if CPUMASK_OFFSTACK=y, which defaults to n.

    The bug was introduced by 8969a5ede0f9e17da4b943712429aef2c9bcd82b
    ("generic-ipi: remove kmalloc()").

    Signed-off-by: Xiao Guangrong
    Cc: Ingo Molnar
    Cc: Jens Axboe
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiao Guangrong
     

09 Jun, 2009

1 commit


13 Mar, 2009

1 commit


25 Feb, 2009

1 commit

  • Andrew pointed out that there's some small amount of
    style rot in kernel/smp.c.

    Clean it up.

    Reported-by: Andrew Morton
    Cc: Nick Piggin
    Cc: Jens Axboe
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Ingo Molnar