15 Oct, 2014

1 commit

  • Pull percpu consistent-ops changes from Tejun Heo:
    "Way back, before the current percpu allocator was implemented, static
    and dynamic percpu memory areas were allocated and handled separately
    and had their own accessors. The distinction has been gone for many
    years now; however, the now duplicate two sets of accessors remained
    with the pointer based ones - this_cpu_*() - evolving various other
    operations over time. During the process, we also accumulated other
    inconsistent operations.

    This pull request contains Christoph's patches to clean up the
    duplicate accessor situation. __get_cpu_var() uses are replaced with
    this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

    Unfortunately, the former sometimes is tricky thanks to C being a bit
    messy with the distinction between lvalues and pointers, which led to
    a rather ugly solution for cpumask_var_t involving the introduction of
    this_cpu_cpumask_var_ptr().

    This converts most of the uses but not all. Christoph will follow up
    with the remaining conversions in this merge window and hopefully
    remove the obsolete accessors"

    * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
    irqchip: Properly fetch the per cpu offset
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
    ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
    Revert "powerpc: Replace __get_cpu_var uses"
    percpu: Remove __this_cpu_ptr
    clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
    sparc: Replace __get_cpu_var uses
    avr32: Replace __get_cpu_var with __this_cpu_write
    blackfin: Replace __get_cpu_var uses
    tile: Use this_cpu_ptr() for hardware counters
    tile: Replace __get_cpu_var uses
    powerpc: Replace __get_cpu_var uses
    alpha: Replace __get_cpu_var
    ia64: Replace __get_cpu_var uses
    s390: cio driver &__get_cpu_var replacements
    s390: Replace __get_cpu_var uses
    mips: Replace __get_cpu_var uses
    MIPS: Replace __get_cpu_var uses in FPU emulator.
    arm: Replace __this_cpu_ptr with raw_cpu_ptr
    ...

    Linus Torvalds
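The lvalue-versus-pointer distinction mentioned in the pull request above can be illustrated with a small userspace model. This is only a sketch: the `toy_` macros, the array standing in for the per-CPU area, and the `toy_current_cpu` variable are all hypothetical and are not the kernel's implementation, which relies on per-CPU offsets.

```c
#include <assert.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for a per-CPU variable: one slot per CPU. */
static int toy_counter[TOY_NR_CPUS];
static int toy_current_cpu;    /* pretend id of the CPU we are running on */

/* Old style: the accessor names the variable and yields an lvalue. */
#define toy_get_cpu_var(var)    ((var)[toy_current_cpu])

/* New style: the accessor takes the variable's address, returns a pointer. */
#define toy_this_cpu_ptr(ptr)   (&(*(ptr))[toy_current_cpu])

static void toy_set_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    toy_current_cpu = cpu;
}

/* The same increment written against each accessor style. */
static void toy_bump_old(void)
{
    toy_get_cpu_var(toy_counter)++;
}

static void toy_bump_new(void)
{
    (*toy_this_cpu_ptr(&toy_counter))++;
}
```

In this model a conversion means rewriting `toy_get_cpu_var(x)` as `*toy_this_cpu_ptr(&x)`, and `&toy_get_cpu_var(x)` as `toy_this_cpu_ptr(&x)`. Types that are already array-like need extra care, which is the kind of ambiguity the message describes for cpumask_var_t.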
     

19 Sep, 2014

1 commit

  • Currently kick_all_cpus_sync() can break non-polling idle CPUs
    out of idle through IPIs.

    But sometimes we need to break the polling idle CPUs out of idle
    immediately so that they reselect a suitable C-state, while for
    non-idle CPUs nothing needs to be done when we try to wake them up.

    Add a new function, wake_up_all_idle_cpus(), which brings all CPUs
    out of idle and is based on wake_up_if_idle().

    Signed-off-by: Chuansheng Liu
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: daniel.lezcano@linaro.org
    Cc: rjw@rjwysocki.net
    Cc: linux-pm@vger.kernel.org
    Cc: changcheng.liu@intel.com
    Cc: xiaoming.wang@intel.com
    Cc: souvik.k.chakravarty@intel.com
    Cc: luto@amacapital.net
    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Frederic Weisbecker
    Cc: Geert Uytterhoeven
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Linus Torvalds
    Cc: Michal Hocko
    Cc: Paul Gortmaker
    Cc: Roman Gushchin
    Cc: Srivatsa S. Bhat
    Link: http://lkml.kernel.org/r/1409815075-4180-2-git-send-email-chuansheng.liu@intel.com
    Signed-off-by: Ingo Molnar

    Chuansheng Liu
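A minimal userspace sketch of the behaviour described in the commit above: walk the CPUs, leave busy ones alone, and kick only the idle ones. The `toy_` names, the `struct toy_cpu` bookkeeping and the returned count are assumptions made for illustration, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

struct toy_cpu {
    bool online;
    bool idle;
    int wakeups;    /* how many times this CPU was kicked out of idle */
};

static struct toy_cpu toy_cpus[TOY_NR_CPUS];

/* Kick one CPU, but only if it is currently idle; busy CPUs are untouched. */
static void toy_wake_up_if_idle(int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    if (!toy_cpus[cpu].idle)
        return;
    toy_cpus[cpu].idle = false;
    toy_cpus[cpu].wakeups++;
}

/* Visit every online CPU except the caller; return how many were kicked. */
static int toy_wake_up_all_idle_cpus(int this_cpu)
{
    int cpu, kicked = 0;

    for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        if (cpu == this_cpu || !toy_cpus[cpu].online)
            continue;
        if (toy_cpus[cpu].idle)
            kicked++;
        toy_wake_up_if_idle(cpu);
    }
    return kicked;
}
```

With CPU 1 idle, CPU 2 busy and CPU 3 offline, a call from CPU 0 kicks only CPU 1.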
     

27 Aug, 2014

1 commit


07 Aug, 2014

1 commit

  • The rarely-executed memory-allocation-failed callback path generates a
    WARN_ON_ONCE() when smp_call_function_single() succeeds. Presumably
    it's supposed to warn on failures.

    Signed-off-by: Sasha Levin
    Cc: Christoph Lameter
    Cc: Gilad Ben-Yossef
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
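The inverted condition described above can be shown with a small model. It assumes the usual convention that the cross-CPU call returns 0 on success and a negative value on failure. The `toy_` helpers below are hypothetical stand-ins for a warn-once check, not the kernel macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int toy_warnings;    /* total warnings printed, for inspection */

/* Warn the first time cond is true for a given latch; always return cond. */
static bool toy_warn_on_once(bool cond, bool *latched)
{
    assert(latched != NULL);
    if (cond && !*latched) {
        *latched = true;
        toy_warnings++;
        fprintf(stderr, "WARNING (toy): cross-CPU call check fired\n");
    }
    return cond;
}

/* Buggy form: fires when ret == 0, that is, when the call succeeded. */
static bool toy_check_buggy(int ret)
{
    static bool latched;
    return toy_warn_on_once(!ret, &latched);
}

/* Intended form: fires only when the call reported a failure. */
static bool toy_check_fixed(int ret)
{
    static bool latched;
    return toy_warn_on_once(ret != 0, &latched);
}
```

Calling `toy_check_buggy(0)` warns on a successful call, while `toy_check_fixed(0)` stays silent and `toy_check_fixed(-6)` warns once.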
     

16 Jul, 2014

1 commit


24 Jun, 2014

1 commit

  • There is a race between the CPU offline code (within stop-machine) and
    the smp-call-function code, which can lead to getting IPIs on the
    outgoing CPU, *after* it has gone offline.

    Specifically, this can happen when using
    smp_call_function_single_async() to send the IPI, since this API allows
    sending asynchronous IPIs from IRQ disabled contexts. The exact race
    condition is described below.

    During CPU offline, in stop-machine, we don't enforce any rule in the
    _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and
    the other CPUs disable their local interrupts. Due to this, we can
    encounter a situation in which an IPI is sent by one of the other CPUs
    to the outgoing CPU (while it is *still* online), but the outgoing CPU
    ends up noticing it only *after* it has gone offline.

            CPU 1                                       CPU 2
        (Online CPU)                              (CPU going offline)

    Enter _PREPARE stage                          Enter _PREPARE stage

                                                  Enter _DISABLE_IRQ stage

                                                =
    Got a device interrupt, and                 | Didn't notice the IPI
    the interrupt handler sent an               | since interrupts were
    IPI to CPU 2 using                          | disabled on this CPU.
    smp_call_function_single_async()            |
                                                =

    Enter _DISABLE_IRQ stage

    Enter _RUN stage                              Enter _RUN stage

                               =
    Busy loop with interrupts  |                  Invoke take_cpu_down()
    disabled.                  |                  and take CPU 2 offline
                               =

    Enter _EXIT stage                             Enter _EXIT stage

    Re-enable interrupts                          Re-enable interrupts

                                                  The pending IPI is noted
                                                  immediately, but alas,
                                                  the CPU is offline at
                                                  this point.

    This of course, makes the smp-call-function IPI handler code running on
    CPU 2 unhappy and it complains about "receiving an IPI on an offline
    CPU".

    One real example of the scenario on CPU 1 is the block layer's
    complete-request call-path:

    __blk_complete_request() [interrupt-handler]
        raise_blk_irq()
            smp_call_function_single_async()

    However, if we look closely, the block layer does check that the target
    CPU is online before firing the IPI. So in this case, it is actually
    the unfortunate ordering/timing of events in the stop-machine phase that
    leads to receiving IPIs after the target CPU has gone offline.

    In reality, getting a late IPI on an offline CPU is not too bad by
    itself (this can happen even due to hardware latencies in IPI
    send-receive). It is a bug only if the target CPU really went offline
    without executing all the callbacks queued on its list. (Note that a
    CPU is free to execute its pending smp-call-function callbacks in a
    batch, without waiting for the corresponding IPIs to arrive for each one
    of those callbacks).

    So, fixing this issue can be broken up into two parts:

    1. Ensure that a CPU goes offline only after executing all the
    callbacks queued on it.

    2. Modify the warning condition in the smp-call-function IPI handler
    code such that it warns only if an offline CPU got an IPI *and* that
    CPU had gone offline with callbacks still pending in its queue.

    Achieving part 1 is straight-forward - just flush (execute) all the
    queued callbacks on the outgoing CPU in the CPU_DYING stage[1],
    including those callbacks for which the source CPU's IPIs might not have
    been received on the outgoing CPU yet. Once we do this, an IPI that
    arrives late on the CPU going offline (either due to the race mentioned
    above, or due to hardware latencies) will be completely harmless, since
    the outgoing CPU would have executed all the queued callbacks before
    going offline.

    Overall, this fix (parts 1 and 2 put together) additionally guarantees
    that we will see a warning only when the *IPI-sender code* is buggy -
    that is, if it queues the callback _after_ the target CPU has gone
    offline.

    [1]. The CPU_DYING part needs a little more explanation: by the time we
    execute the CPU_DYING notifier callbacks, the CPU would have already
    been marked offline. But we want to flush out the pending callbacks at
    this stage, ignoring the fact that the CPU is offline. So restructure
    the IPI handler code so that we can by-pass the "is-cpu-offline?" check
    in this particular case. (Of course, the right solution here is to fix
    CPU hotplug to mark the CPU offline _after_ invoking the CPU_DYING
    notifiers, but this requires a lot of audit to ensure that this change
    doesn't break any existing code; hence let's go with the solution
    proposed above until that is done).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Srivatsa S. Bhat
    Suggested-by: Frederic Weisbecker
    Cc: "Paul E. McKenney"
    Cc: Borislav Petkov
    Cc: Christoph Hellwig
    Cc: Frederic Weisbecker
    Cc: Gautham R Shenoy
    Cc: Ingo Molnar
    Cc: Mel Gorman
    Cc: Mike Galbraith
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Steven Rostedt
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Tested-by: Sachin Kamat
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srivatsa S. Bhat
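The two-part fix described above can be modelled in userspace. In this sketch the outgoing CPU drains its queue while going offline, and the IPI handler warns only when an offline CPU still has callbacks pending. All `toy_` names and the single-threaded queue are assumptions for illustration; the kernel's data structures and locking are different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_csd {
    struct toy_csd *next;
    void (*func)(void *info);
    void *info;
};

struct toy_cpu {
    bool online;
    struct toy_csd *queue;    /* callbacks waiting to run on this CPU */
    int warnings;
};

static void toy_queue(struct toy_cpu *cpu, struct toy_csd *csd)
{
    csd->next = cpu->queue;
    cpu->queue = csd;
}

/* Run everything queued; optionally apply the "offline CPU" check. */
static void toy_flush(struct toy_cpu *cpu, bool warn_cpu_offline)
{
    struct toy_csd *csd = cpu->queue;

    cpu->queue = NULL;
    /* Part 2: warn only if the CPU is offline AND work is still pending. */
    if (warn_cpu_offline && !cpu->online && csd != NULL)
        cpu->warnings++;
    while (csd != NULL) {
        struct toy_csd *next = csd->next;
        csd->func(csd->info);
        csd = next;
    }
}

/* IPI handler path: the offline check applies. */
static void toy_ipi_handler(struct toy_cpu *cpu)
{
    toy_flush(cpu, true);
}

/* Part 1: on the way down, flush pending work and bypass the check. */
static void toy_take_cpu_down(struct toy_cpu *cpu)
{
    cpu->online = false;    /* already marked offline, as in footnote [1] */
    toy_flush(cpu, false);
}

static void toy_count(void *info)
{
    ++*(int *)info;
}
```

In this model, a late IPI that arrives after `toy_take_cpu_down()` finds an empty queue and is harmless. A warning appears only when a sender queues a callback after the CPU has gone offline.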
     

16 Jun, 2014

1 commit

  • irq work currently only supports local callbacks. However, its code
    is mostly ready to run remote callbacks and we have some potential users.

    The full nohz subsystem currently open codes its own remote irq work
    on top of the scheduler ipi when it wants a CPU to reevaluate its next
    tick. However this ad hoc solution bloats the scheduler IPI.

    Let's just extend the irq work subsystem to support remote queuing on top
    of the generic SMP IPI to handle this kind of user. This shouldn't add
    noticeable overhead.

    Suggested-by: Peter Zijlstra
    Acked-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Eric Dumazet
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Viresh Kumar
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     

07 Jun, 2014

1 commit

  • There is a longstanding problem related to CPU hotplug which causes IPIs
    to be delivered to offline CPUs, and the smp-call-function IPI handler
    code prints out a warning whenever this is detected. Every once in a
    while this (usually harmless) warning gets reported on LKML, but so far
    it has not been completely fixed. Usually the solution involves finding
    out the IPI sender and fixing it by adding appropriate synchronization
    with CPU hotplug.

    However, while going through one such internal bug report, I found that
    there is a significant bug in the receiver side itself (more
    specifically, in stop-machine) that can lead to this problem even when
    the sender code is perfectly fine. This patchset fixes that
    synchronization problem in the CPU hotplug stop-machine code.

    Patch 1 adds some additional debug code to the smp-call-function
    framework, to help debug such issues easily.

    Patch 2 modifies the stop-machine code to ensure that any IPIs that were
    sent while the target CPU was online, would be noticed and handled by
    that CPU without fail before it goes offline. Thus, this avoids
    scenarios where IPIs are received on offline CPUs (as long as the sender
    uses proper hotplug synchronization).

    In fact, I debugged the problem by using Patch 1, and found that the
    payload of the IPI was always the block layer's trigger_softirq()
    function. But I was not able to find anything wrong with the block
    layer code. That's when I started looking at the stop-machine code and
    realized that there is a race-window which makes the IPI _receiver_ the
    culprit, not the sender. Patch 2 fixes that race and hence this should
    put an end to most of the hard-to-debug IPI-to-offline-CPU issues.

    This patch (of 2):

    Today the smp-call-function code just prints a warning if we get an IPI
    on an offline CPU. This info is sufficient to let us know that
    something went wrong, but often it is very hard to debug exactly who
    sent the IPI and why, from this info alone.

    In most cases, we get the warning about the IPI to an offline CPU,
    immediately after the CPU going offline comes out of the stop-machine
    phase and reenables interrupts. Since all online CPUs participate in
    stop-machine, the information regarding the sender of the IPI is already
    lost by the time we exit the stop-machine loop. So even if we dump the
    stack on each CPU at this point, we won't find anything useful since all
    of them will show the stack-trace of the stopper thread. So we need a
    better way to figure out who sent the IPI and why.

    To achieve this, when we detect an IPI targeted to an offline CPU, loop
    through the call-single-data linked list and print out the payload
    (i.e., the name of the function which was supposed to be executed by the
    target CPU). This would give us an insight as to who might have sent
    the IPI and help us debug this further.

    [akpm@linux-foundation.org: correctly suppress warning output on second and later occurrences]
    Signed-off-by: Srivatsa S. Bhat
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Tejun Heo
    Cc: Rusty Russell
    Cc: Frederic Weisbecker
    Cc: Christoph Hellwig
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Borislav Petkov
    Cc: Steven Rostedt
    Cc: Mike Galbraith
    Cc: Gautham R Shenoy
    Cc: "Paul E. McKenney"
    Cc: Oleg Nesterov
    Cc: Rafael J. Wysocki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srivatsa S. Bhat
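The debugging aid described above can be sketched as follows: when an IPI is detected on an offline CPU, walk the pending list and report what each entry was supposed to run. This sketch stores function names as strings and formats into a caller-supplied buffer. Both choices, the message text and the `toy_` names are assumptions, since userspace has no equivalent of the kernel's symbol printing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct toy_csd {
    struct toy_csd *next;
    const char *func_name;    /* stand-in for the callback's symbol name */
};

/*
 * Report every pending payload once; later calls stay silent, mirroring the
 * "suppress output on second and later occurrences" note in the commit.
 * Returns the number of entries written to out.
 */
static int toy_report_pending(const struct toy_csd *head, int cpu,
                              char *out, size_t len)
{
    static int warned;
    size_t used = 0;
    int n = 0;

    assert(out != NULL && len > 0);
    out[0] = '\0';
    if (head == NULL || warned)
        return 0;
    warned = 1;

    for (; head != NULL; head = head->next) {
        int w = snprintf(out + used, len - used,
                         "IPI callback %s sent to offline CPU %d\n",
                         head->func_name, cpu);
        if (w < 0 || (size_t)w >= len - used) {
            out[used] = '\0';    /* drop a partially written line */
            break;
        }
        used += (size_t)w;
        n++;
    }
    return n;
}
```

The names in the report point at the sender, which is the information that is otherwise lost once every CPU is sitting in the stopper thread.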
     

25 Feb, 2014

6 commits

  • The name __smp_call_function_single() doesn't tell much about the
    properties of this function, especially when compared to
    smp_call_function_single().

    The comments above the implementation are also misleading. The main
    point of this function is actually not to be able to embed the csd
    in an object. This is actually a requirement that results from the
    purpose of this function, which is to raise an IPI asynchronously.

    As such it can be called with interrupts disabled. And this feature
    comes at the cost of the caller who then needs to serialize the
    IPIs on this csd.

    Let's rename the function and enhance the comments so that they reflect
    these properties.

    Suggested-by: Christoph Hellwig
    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Jens Axboe

    Frederic Weisbecker
     
  • The main point of calling __smp_call_function_single() is to send
    an IPI in a pure asynchronous way. By embedding a csd in an object,
    a caller can send the IPI without waiting for a previous one to complete
    as is required by smp_call_function_single() for example. As such,
    sending this kind of IPI can be safe even when irqs are disabled.

    This flexibility comes at the expense of the caller who then needs to
    synchronize the csd lifecycle by himself and make sure that IPIs on a
    single csd are serialized.

    This is how __smp_call_function_single() works when wait = 0 and this
    usecase is relevant.

    Now there doesn't seem to be any usecase with wait = 1 that can't be
    covered by smp_call_function_single() instead, which is safer. Let's look
    at the two possible scenarios:

    1) The user calls __smp_call_function_single(wait = 1) on a csd embedded
    in an object. It looks like a nice and convenient pattern at the first
    sight because we can then retrieve the object from the IPI handler easily.

    But actually it is a waste of memory space in the object since the csd
    can be allocated from the stack by smp_call_function_single(wait = 1)
    and the object can be passed as the IPI argument.

    Besides that, embedding the csd in an object is more error prone
    because the caller must take care of the serialization of the IPIs
    for this csd.

    2) The user calls __smp_call_function_single(wait = 1) on a csd that
    is allocated on the stack. It's ok but smp_call_function_single()
    can do it as well and it already takes care of the allocation on the
    stack. Again, it's simpler and less error prone.

    Therefore, using the underscore-prefixed API version with wait = 1
    is a bad pattern and a sign that the caller can do something safer
    and simpler.

    There was a single user of that which has just been converted.
    So let's remove this option to discourage further users.

    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Jens Axboe

    Frederic Weisbecker
     
  • Move this function closer to __smp_call_function_single(). These functions
    have very similar behavior and should be displayed in the same block
    for clarity.

    Reviewed-by: Jan Kara
    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Jens Axboe

    Frederic Weisbecker
     
  • __smp_call_function_single() and smp_call_function_single() share some
    code that can be factorized: execute inline when the target is local,
    check if the target is online, lock the csd, call generic_exec_single().

    Let's move the common parts to generic_exec_single().

    Reviewed-by: Jan Kara
    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Jens Axboe

    Frederic Weisbecker
     
  • Align __smp_call_function_single() with smp_call_function_single() so
    that it also checks whether requested cpu is still online.

    Signed-off-by: Jan Kara
    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Ingo Molnar
    Cc: Jens Axboe
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • The IPI function llist iteration is open coded. Let's simplify this
    by using an llist iterator.

    Also we want to keep the iteration safe against possible
    csd.llist->next value reuse from the IPI handler. At least the block
    subsystem used to do such things, so let's stay careful and use
    llist_for_each_entry_safe().

    Signed-off-by: Jan Kara
    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Ingo Molnar
    Cc: Jens Axboe
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Jens Axboe

    Jan Kara
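The reason for the "safe" iterator in the commit above can be seen in a small userspace model: if a callback reuses its node's next pointer, the loop must have read that pointer before calling it. The `toy_` names and the requeueing callback are hypothetical. This is the general pattern, not the kernel macro.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
    struct toy_node *next;
    void (*func)(struct toy_node *self);
    int ran;
};

static struct toy_node *toy_other_list;

/* A callback that recycles its own node onto another list,
 * overwriting ->next in the process. */
static void toy_requeue(struct toy_node *self)
{
    self->ran++;
    self->next = toy_other_list;
    toy_other_list = self;
}

/* Safe walk: fetch the successor before the callback can clobber it. */
static int toy_run_all_safe(struct toy_node *head)
{
    struct toy_node *n, *next;
    int count = 0;

    for (n = head; n != NULL; n = next) {
        next = n->next;
        assert(n->func != NULL);
        n->func(n);
        count++;
    }
    return count;
}
```

A walk that read `n->next` after the callback would stop after the first node here, because the first requeue sets that pointer to NULL.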
     

31 Jan, 2014

2 commits

  • After commit 9a46ad6d6df3 ("smp: make smp_call_function_many() use logic
    similar to smp_call_function_single()"), cfd->cpumask is accessed only
    in smp_call_function_many(). So there is no more need to copy it into
    cfd->cpumask_ipi before putting csd into the list. The cpumask_ipi
    field is obsolete and can be removed.

    Signed-off-by: Roman Gushchin
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Wang YanQing
    Cc: Xie XiuQi
    Cc: Shaohua Li
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • Make smp_call_function_single and friends more efficient by using a
    lockless list.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
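A lockless list of the kind mentioned above can be sketched with C11 atomics: producers push with a compare-and-swap loop, and the consumer detaches the whole list with one atomic exchange. The `toy_` types and functions are illustrative assumptions and do not reproduce the kernel's llist implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_lnode {
    struct toy_lnode *next;
    int val;
};

struct toy_lhead {
    _Atomic(struct toy_lnode *) first;
};

/*
 * Push a node without taking a lock. Returns true if the list was empty
 * beforehand, which a sender could use to decide that an IPI is needed.
 */
static bool toy_llist_add(struct toy_lnode *n, struct toy_lhead *h)
{
    struct toy_lnode *old;

    assert(n != NULL && h != NULL);
    old = atomic_load(&h->first);
    do {
        n->next = old;    /* old is refreshed whenever the CAS fails */
    } while (!atomic_compare_exchange_weak(&h->first, &old, n));
    return old == NULL;
}

/* Detach the entire list in one step; the caller then owns every node. */
static struct toy_lnode *toy_llist_del_all(struct toy_lhead *h)
{
    assert(h != NULL);
    return atomic_exchange(&h->first, (struct toy_lnode *)NULL);
}
```

Nodes come back in last-in, first-out order, so a consumer that needs queueing order has to reverse the detached list first.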
     

15 Nov, 2013

2 commits


14 Nov, 2013

1 commit

  • Pull block IO core updates from Jens Axboe:
    "This is the pull request for the core changes in the block layer for
    3.13. It contains:

    - The new blk-mq request interface.

    This is a new and more scalable queueing model that marries the
    best part of the request based interface we currently have (which
    is fully featured, but scales poorly) and the bio based "interface"
    which the new drivers for high IOPS devices end up using because
    it's much faster than the request based one.

    The bio interface has no block layer support, since it taps into
    the stack much earlier. This means that drivers end up having to
    implement a lot of functionality on their own, like tagging,
    timeout handling, requeue, etc. The blk-mq interface provides all
    these. Some drivers even provide a switch to select bio or rq and
    have code to handle both, since things like merging only work in
    the rq model and hence it is faster for some workloads. This is a
    huge mess. Conversion of these drivers nets us a substantial code
    reduction. Initial results on converting SCSI to this model even
    show an 8x improvement on single queue devices. So while the
    model was intended to work on the newer multiqueue devices, it has
    substantial improvements for "classic" hardware as well. This code
    has gone through extensive testing and development, it's now ready
    to go. A pull request to convert virtio-blk to this model will be
    coming as well, with more drivers scheduled for 3.14 conversion.

    - Two blktrace fixes from Jan and Chen Gang.

    - A plug merge fix from Alireza Haghdoost.

    - Conversion of __get_cpu_var() from Christoph Lameter.

    - Fix for sector_div() with 64-bit divider from Geert Uytterhoeven.

    - A fix for a race between request completion and the timeout
    handling from Jeff Moyer. This is what caused the merge conflict
    with blk-mq/core, in case you are looking at that.

    - A dm stacking fix from Mike Snitzer.

    - A code consolidation fix and duplicated code removal from Kent
    Overstreet.

    - A handful of block bug fixes from Mikulas Patocka, fixing a loop
    crash and memory corruption on blk cg.

    - Elevator switch bug fix from Tomoki Sekiyama.

    A heads-up that I had to rebase this branch. Initially the immutable
    bio_vecs had been queued up for inclusion, but a week later, it became
    clear that it wasn't fully cooked yet. So the decision was made to
    pull this out and postpone it until 3.14. It was a straight forward
    rebase, just pruning out the immutable series and the later fixes of
    problems with it. The rest of the patches applied directly and no
    further changes were made"

    * 'for-3.13/core' of git://git.kernel.dk/linux-block: (31 commits)
    block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
    block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
    block: Do not call sector_div() with a 64-bit divisor
    kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup()
    block: Consolidate duplicated bio_trim() implementations
    block: Use rw_copy_check_uvector()
    block: Enable sysfs nomerge control for I/O requests in the plug list
    block: properly stack underlying max_segment_size to DM device
    elevator: acquire q->sysfs_lock in elevator_change()
    elevator: Fix a race in elevator switching and md device initialization
    block: Replace __get_cpu_var uses
    bdi: test bdi_init failure
    block: fix a probe argument to blk_register_region
    loop: fix crash if blk_alloc_queue fails
    blk-core: Fix memory corruption if blkcg_init_queue fails
    block: fix race between request completion and timeout handling
    blktrace: Send BLK_TN_PROCESS events to all running traces
    blk-mq: don't disallow request merges for req->special being set
    blk-mq: mq plug list breakage
    blk-mq: fix for flush deadlock
    ...

    Linus Torvalds
     

25 Oct, 2013

2 commits


01 Oct, 2013

1 commit

  • Turn it into (for example):

    [ 0.073380] x86: Booting SMP configuration:
    [ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
    [ 0.603005] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15
    [ 1.200005] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23
    [ 1.796005] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31
    [ 2.393005] .... node #4, CPUs: #32 #33 #34 #35 #36 #37 #38 #39
    [ 2.996005] .... node #5, CPUs: #40 #41 #42 #43 #44 #45 #46 #47
    [ 3.600005] .... node #6, CPUs: #48 #49 #50 #51 #52 #53 #54 #55
    [ 4.202005] .... node #7, CPUs: #56 #57 #58 #59 #60 #61 #62 #63
    [ 4.811005] .... node #8, CPUs: #64 #65 #66 #67 #68 #69 #70 #71
    [ 5.421006] .... node #9, CPUs: #72 #73 #74 #75 #76 #77 #78 #79
    [ 6.032005] .... node #10, CPUs: #80 #81 #82 #83 #84 #85 #86 #87
    [ 6.648006] .... node #11, CPUs: #88 #89 #90 #91 #92 #93 #94 #95
    [ 7.262005] .... node #12, CPUs: #96 #97 #98 #99 #100 #101 #102 #103
    [ 7.865005] .... node #13, CPUs: #104 #105 #106 #107 #108 #109 #110 #111
    [ 8.466005] .... node #14, CPUs: #112 #113 #114 #115 #116 #117 #118 #119
    [ 9.073006] .... node #15, CPUs: #120 #121 #122 #123 #124 #125 #126 #127
    [ 9.679901] x86: Booted up 16 nodes, 128 CPUs

    and drop useless elements.

    Change num_digits() to hpa's division-avoiding, cell-phone-typed
    version which he went at great lengths and pains to submit on a
    Saturday evening.

    Signed-off-by: Borislav Petkov
    Cc: huawei.libin@huawei.com
    Cc: wangyijing@huawei.com
    Cc: fenghua.yu@intel.com
    Cc: guohanjun@huawei.com
    Cc: paul.gortmaker@windriver.com
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
    Signed-off-by: Ingo Molnar

    Borislav Petkov
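One way to count decimal digits without dividing, in the spirit of the num_digits() change mentioned above, is to grow a power of ten until it exceeds the value. This is a sketch under stated assumptions: the name `toy_num_digits`, the widening to `long long`, and counting a leading minus sign as one character are choices of this example, not necessarily those of the kernel function.

```c
#include <assert.h>

/* Number of characters needed to print val in decimal, sign included. */
static int toy_num_digits(int val)
{
    long long v = val;    /* widened so negating the minimum int is safe */
    long long m = 10;     /* smallest value that needs one more digit */
    int d = 1;

    if (v < 0) {
        d++;              /* count the minus sign */
        v = -v;
    }
    while (v >= m) {
        m *= 10;          /* multiply instead of dividing v down */
        d++;
    }
    /* A 32-bit int needs at most a sign plus ten digits. */
    assert(d <= 11);
    return d;
}
```

A helper like this lets the boot messages pad CPU numbers to a fixed width so that the per-node rows line up.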
     

12 Sep, 2013

2 commits

  • As in commit f21afc25f9ed ("smp.h: Use local_irq_{save,restore}() in
    !SMP version of on_each_cpu()"), we don't want to enable irqs if they
    are not already enabled.

    I don't know of any bugs currently caused by this unconditional
    local_irq_enable(), but I want to use this function in MIPS/OCTEON early
    boot (when we have early_boot_irqs_disabled). This also makes this
    function have similar semantics to on_each_cpu() which is good in
    itself.

    Signed-off-by: David Daney
    Cc: Gilad Ben-Yossef
    Cc: Christoph Lameter
    Cc: Chris Metcalf
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Daney
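The difference between unconditionally enabling interrupts and restoring the saved state can be shown with a userspace model in which a single flag stands in for the CPU's interrupt state. All `toy_` names are hypothetical. The real primitives operate on hardware state.

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_irqs_enabled = true;    /* models the CPU interrupt flag */

static void toy_irq_disable(void) { toy_irqs_enabled = false; }
static void toy_irq_enable(void)  { toy_irqs_enabled = true; }

/* Remember the previous state, then disable. */
static bool toy_irq_save(void)
{
    bool was = toy_irqs_enabled;
    toy_irqs_enabled = false;
    return was;
}

/* Put back whatever state the caller had. */
static void toy_irq_restore(bool was) { toy_irqs_enabled = was; }

/* Callback that expects to run with interrupts off. */
static void toy_note(void *info)
{
    assert(!toy_irqs_enabled);
    ++*(int *)info;
}

/* Disable/enable pair: always leaves interrupts ON afterwards. */
static void toy_call_unconditional(void (*func)(void *), void *info)
{
    toy_irq_disable();
    func(info);
    toy_irq_enable();
}

/* Save/restore pair: leaves interrupts as the caller had them. */
static void toy_call_save_restore(void (*func)(void *), void *info)
{
    bool flags = toy_irq_save();
    func(info);
    toy_irq_restore(flags);
}
```

A caller that runs with interrupts disabled, as in the early-boot case described above, keeps them disabled only with the save/restore variant.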
     
  • When a failure occurs in hotplug_cfd(), the related resources need to
    be released, otherwise memory is leaked.

    Signed-off-by: Chen Gang
    Acked-by: Wang YanQing
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Gang
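The leak pattern fixed above is a common one: a setup path allocates several resources and must release the earlier ones when a later allocation fails. The sketch below shows the idea in userspace. The `struct toy_cfd` fields, the `toy_fail_second` test knob and the helper names are assumptions for illustration and do not mirror the actual hotplug_cfd() code.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_cfd {
    unsigned long *cpumask;
    int *csd;
};

static int toy_fail_second;    /* test knob: force the second allocation to fail */

static void *toy_alloc(size_t n, int fail)
{
    return fail ? NULL : calloc(1, n);
}

/* Returns 0 on success, -1 on failure with nothing left allocated. */
static int toy_prepare(struct toy_cfd *cfd)
{
    assert(cfd != NULL);

    cfd->cpumask = toy_alloc(sizeof(*cfd->cpumask), 0);
    if (cfd->cpumask == NULL)
        return -1;

    cfd->csd = toy_alloc(sizeof(*cfd->csd), toy_fail_second);
    if (cfd->csd == NULL) {
        /* Undo the first allocation, otherwise it would leak. */
        free(cfd->cpumask);
        cfd->cpumask = NULL;
        return -1;
    }
    return 0;
}

static void toy_release(struct toy_cfd *cfd)
{
    free(cfd->csd);
    free(cfd->cpumask);
    cfd->csd = NULL;
    cfd->cpumask = NULL;
}
```

Resetting the pointer to NULL after freeing it makes a repeated cleanup harmless.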
     

04 Sep, 2013

1 commit

  • Pull scheduler changes from Ingo Molnar:
    "Various optimizations, cleanups and smaller fixes - no major changes
    in scheduler behavior"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/fair: Fix the sd_parent_degenerate() code
    sched/fair: Rework and comment the group_imb code
    sched/fair: Optimize find_busiest_queue()
    sched/fair: Make group power more consistent
    sched/fair: Remove duplicate load_per_task computations
    sched/fair: Shrink sg_lb_stats and play memset games
    sched: Clean-up struct sd_lb_stat
    sched: Factor out code to should_we_balance()
    sched: Remove one division operation in find_busiest_queue()
    sched/cputime: Use this_cpu_add() in task_group_account_field()
    cpumask: Fix cpumask leak in partition_sched_domains()
    sched/x86: Optimize switch_mm() for multi-threaded workloads
    generic-ipi: Kill unnecessary variable - csd_flags
    numa: Mark __node_set() as __always_inline
    sched/fair: Cleanup: remove duplicate variable declaration
    sched/__wake_up_sync_key(): Fix nr_exclusive tasks which lead to WF_SYNC clearing

    Linus Torvalds
     

19 Aug, 2013

1 commit


31 Jul, 2013

1 commit

  • After commit 8969a5ede0f9e17da4b943712429aef2c9bcd82b
    ("generic-ipi: remove kmalloc()"), wait = 0 can be guaranteed,
    and all callsites of generic_exec_single() do an unconditional
    csd_lock() now.

    So csd_flags is unnecessary now. Remove it.

    Signed-off-by: Xie XiuQi
    Signed-off-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Linus Torvalds
    Cc: Nick Piggin
    Cc: Jens Axboe
    Cc: "Paul E. McKenney"
    Cc: Rusty Russell
    Link: http://lkml.kernel.org/r/51F72DA1.7010401@huawei.com
    Signed-off-by: Ingo Molnar

    Xie XiuQi
     

15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

01 May, 2013

2 commits

  • We sometimes use "struct call_single_data *data" and sometimes "struct
    call_single_data *csd". Use "csd" consistently.

    We sometimes use "struct call_function_data *data" and sometimes "struct
    call_function_data *cfd". Use "cfd" consistently.

    Also, avoid some 80-col layout tricks.

    Cc: Ingo Molnar
    Cc: Jens Axboe
    Cc: Peter Zijlstra
    Cc: Shaohua Li
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • csd_lock() uses assignment to data->flags rather than |=. That is not
    buggy at present because only one bit (CSD_FLAG_LOCK) is defined in
    call_single_data.flags.

    But it will become buggy if we later add another flag, so fix it now.

    Signed-off-by: liguang
    Cc: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    liguang
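The difference between assigning the lock flag and OR-ing it in only shows up once a second flag exists. The sketch below adds a hypothetical second bit (`TOY_FLAG_OTHER`) to make that visible. The `toy_` names are illustrative and not the kernel definitions.

```c
#include <assert.h>

#define TOY_FLAG_LOCK   0x01u
#define TOY_FLAG_OTHER  0x02u    /* hypothetical future flag */

struct toy_csd {
    unsigned int flags;
};

/* Assignment: takes the lock but silently wipes every other flag. */
static void toy_lock_assign(struct toy_csd *csd)
{
    csd->flags = TOY_FLAG_LOCK;
}

/* OR-assignment: takes the lock and leaves other flags alone. */
static void toy_lock_or(struct toy_csd *csd)
{
    csd->flags |= TOY_FLAG_LOCK;
}

/* Clear only the lock bit; unlocking an unlocked csd is a bug. */
static void toy_unlock(struct toy_csd *csd)
{
    assert(csd->flags & TOY_FLAG_LOCK);
    csd->flags &= ~TOY_FLAG_LOCK;
}
```

Starting from a csd with only `TOY_FLAG_OTHER` set, `toy_lock_assign()` loses that bit while `toy_lock_or()` keeps it.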
     

22 Feb, 2013

1 commit

  • I'm testing a swapout workload on a two-socket Xeon machine. The workload
    has 10 threads, and each thread sequentially accesses a separate memory
    region. TLB flush overhead is very big in the workload. For each page,
    page reclaim needs to move it from the active lru list and then unmap
    it. Both need a TLB flush. And this is a multithreaded workload, so TLB
    flushes happen on 10 CPUs. On x86, TLB flush uses the generic
    smp_call_function. So this workload stresses smp_call_function_many
    heavily.

    Without patch, perf shows:
    +  24.49%  [k] generic_smp_call_function_interrupt
    -  21.72%  [k] _raw_spin_lock
       - _raw_spin_lock
          + 79.80% __page_check_address
          + 6.42% generic_smp_call_function_interrupt
          + 3.31% get_swap_page
          + 2.37% free_pcppages_bulk
          + 1.75% handle_pte_fault
          + 1.54% put_super
          + 1.41% grab_super_passive
          + 1.36% __swap_duplicate
          + 0.68% blk_flush_plug_list
          + 0.62% swap_info_get
    +   6.55%  [k] flush_tlb_func
    +   6.46%  [k] smp_call_function_many
    +   5.09%  [k] call_function_interrupt
    +   4.75%  [k] default_send_IPI_mask_sequence_phys
    +   2.18%  [k] find_next_bit

    swapout throughput is around 1300M/s.

    With the patch, perf shows:
    -  27.23%  [k] _raw_spin_lock
       - _raw_spin_lock
          + 80.53% __page_check_address
          + 8.39% generic_smp_call_function_single_interrupt
          + 2.44% get_swap_page
          + 1.76% free_pcppages_bulk
          + 1.40% handle_pte_fault
          + 1.15% __swap_duplicate
          + 1.05% put_super
          + 0.98% grab_super_passive
          + 0.86% blk_flush_plug_list
          + 0.57% swap_info_get
    +   8.25%  [k] default_send_IPI_mask_sequence_phys
    +   7.55%  [k] call_function_interrupt
    +   7.47%  [k] smp_call_function_many
    +   7.25%  [k] flush_tlb_func
    +   3.81%  [k] _raw_spin_lock_irqsave
    +   3.78%  [k] generic_smp_call_function_single_interrupt

    swapout throughput is around 1400M/s. So there is around a 7%
    improvement, and total cpu utilization doesn't change.

    Without the patch, cfd_data is shared by all CPUs.
    generic_smp_call_function_interrupt reads and writes cfd_data several
    times, which creates a lot of cache ping-pong. With the patch, the data
    becomes per-cpu, so the ping-pong is avoided. Judging from the perf data,
    this does not cause contention on the call_single_queue lock.
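
    The per-cpu layout argument can be illustrated with a small userspace
    sketch. It is not the kernel code: struct cfd_sketch, NR_CPUS_SKETCH,
    CACHE_LINE_BYTES (assumed to be 64) and this_cfd() are all hypothetical
    names chosen for illustration. The point is only that each CPU gets its
    own cache-line-aligned slot, so writes by one CPU do not invalidate the
    line another CPU is using.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define NR_CPUS_SKETCH   4   /* hypothetical CPU count for this sketch */
#define CACHE_LINE_BYTES 64  /* assumed cache-line size */

/* Hypothetical stand-in for the call-function bookkeeping data. */
struct cfd_sketch {
    alignas(CACHE_LINE_BYTES) unsigned long refs;
    unsigned long cpumask_bits;
};

/* One slot per CPU: each CPU only touches its own cache line. */
static struct cfd_sketch cfd_percpu[NR_CPUS_SKETCH];

/* Return the slot that belongs to the given (simulated) CPU. */
static struct cfd_sketch *this_cfd(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    return &cfd_percpu[cpu];
}

/* Distance in bytes between the slots of two CPUs. */
static size_t slot_distance(int a, int b)
{
    ptrdiff_t d = (char *)this_cfd(b) - (char *)this_cfd(a);
    return (size_t)(d < 0 ? -d : d);
}
```

    In this sketch, slot_distance(0, 1) is at least one cache line, which
    is the property that avoids the ping-pong described above.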

    Next step is to remove generic_smp_call_function_interrupt() from arch
    code.

    Signed-off-by: Shaohua Li
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Jens Axboe
    Cc: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

28 Jan, 2013

1 commit

  • I get the following warning every day with v3.7, once or
    twice a day:

    [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()

    As explained by Linus as well:

    |
    | Once we've done the "list_add_rcu()" to add it to the
    | queue, we can have (another) IPI to the target CPU that can
    | now see it and clear the mask.
    |
    | So by the time we get to actually send the IPI, the mask might
    | have been cleared by another IPI.
    |

    This patch also fixes a system hang: if data->cpumask gets cleared
    after passing this check:

    if (WARN_ONCE(!mask, "empty IPI mask"))
            return;

    then the problem described in commit 83d349f35e1a ("x86: don't send an
    IPI to the empty set of CPU's") happens again.
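
    The message does not spell out how the race is closed. One plausible
    approach, sketched here purely as an assumption, is to take a private
    copy of the mask before other CPUs can see the request, and to send the
    IPI using that copy. The names mask_t, send_ipi_mask() and
    call_many_sketch() are hypothetical, and the "race" is simulated with a
    callback rather than a real second CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long mask_t;  /* hypothetical one-word CPU mask */

static mask_t last_ipi_mask;   /* records what the sketch "sent" */
static int ipi_sends;

/* Mirrors the quoted guard: never send to an empty set of CPUs. */
static bool send_ipi_mask(mask_t mask)
{
    if (!mask)
        return false;
    last_ipi_mask = mask;
    ipi_sends++;
    return true;
}

/*
 * Copy the shared mask first, then let a hook stand in for "another IPI
 * handler clears the shared mask", then send using the private copy.
 */
static bool call_many_sketch(mask_t *shared, void (*racing_clear)(mask_t *))
{
    mask_t snapshot = *shared;  /* copy before anyone else can clear it */

    if (racing_clear != NULL)
        racing_clear(shared);   /* simulated concurrent clearing */

    return send_ipi_mask(snapshot);
}

/* Simulated racing handler: clears every bit in the shared mask. */
static void clear_all(mask_t *m)
{
    *m = 0;
}
```

    With the snapshot, the send step never observes the cleared mask, so
    the empty-mask path is not reached even when the shared copy is zeroed
    in between.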

    Signed-off-by: Wang YanQing
    Acked-by: Linus Torvalds
    Acked-by: Jan Beulich
    Cc: Paul E. McKenney
    Cc: Andrew Morton
    Cc: peterz@infradead.org
    Cc: mina86@mina86.org
    Cc: srivatsa.bhat@linux.vnet.ibm.com
    Cc:
    Link: http://lkml.kernel.org/r/20130126075357.GA3205@udknight
    [ Tidied up the changelog and the comment in the code. ]
    Signed-off-by: Ingo Molnar

    Wang YanQing
     

05 Jun, 2012

1 commit

  • There are no users of those APIs anymore, so just remove them.

    Signed-off-by: Yong Zhang
    Cc: ralf@linux-mips.org
    Cc: sshtylyov@mvista.com
    Cc: david.daney@cavium.com
    Cc: nikunj@linux.vnet.ibm.com
    Cc: paulmck@linux.vnet.ibm.com
    Cc: axboe@kernel.dk
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1338275765-3217-11-git-send-email-yong.zhang0@gmail.com
    Acked-by: Srivatsa S. Bhat
    Acked-by: Peter Zijlstra
    Signed-off-by: Thomas Gleixner

    Yong Zhang
     

08 May, 2012

1 commit


04 May, 2012

1 commit

  • percpu areas are already allocated during boot for each possible cpu.
    percpu idle threads can be considered an extension of the percpu areas,
    so allocate them for each possible cpu during boot as well.

    This will eliminate the need for workqueue based idle thread allocation.
    In the future we can move the idle thread area into the percpu area too.

    [ tglx: Moved the loop into smpboot.c and added an error check when
    the init code failed to allocate an idle thread for a cpu which
    should be onlined ]

    Signed-off-by: Suresh Siddha
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Paul E. McKenney
    Cc: Srivatsa S. Bhat
    Cc: Tejun Heo
    Cc: David Rientjes
    Cc: venki@google.com
    Link: http://lkml.kernel.org/r/1334966930.28674.245.camel@sbsiddha-desk.sc.intel.com
    Signed-off-by: Thomas Gleixner

    Suresh Siddha
     

29 Mar, 2012

2 commits

  • Add the on_each_cpu_cond() function that wraps on_each_cpu_mask() and
    calculates the cpumask of cpus to IPI by calling a function supplied as a
    parameter in order to determine whether to IPI each specific cpu.

    The function works around allocation failure of the cpumask variable
    with CONFIG_CPUMASK_OFFSTACK=y by iterating over the cpus and sending an
    IPI to one cpu at a time via smp_call_function_single().

    The function is useful since it separates the specific code that
    decides in each case whether to IPI a specific cpu for a specific request
    from the common boilerplate code of creating the mask, handling
    failures etc.
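
    The control flow described above can be sketched in userspace C. This
    is an illustration only, not the kernel implementation: the helpers
    run_on_mask() and run_on_single() are hypothetical stand-ins for
    on_each_cpu_mask() and smp_call_function_single(), the mask is a plain
    bool array, and allocation failure is simulated with a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH 8  /* hypothetical CPU count */

typedef bool (*cond_fn)(int cpu, void *info);
typedef void (*work_fn)(void *info);

static int calls_made[NR_CPUS_SKETCH];
static int current_cpu;   /* simulated CPU the work is running on */

/* Stand-in for on_each_cpu_mask(): run func on every CPU in the mask. */
static void run_on_mask(const bool *mask, work_fn func, void *info)
{
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        if (mask[cpu]) {
            current_cpu = cpu;
            func(info);
        }
    }
}

/* Stand-in for smp_call_function_single(): run func on one CPU. */
static void run_on_single(int cpu, work_fn func, void *info)
{
    current_cpu = cpu;
    func(info);
}

static void each_cpu_cond_sketch(cond_fn cond, work_fn func, void *info,
                                 bool simulate_alloc_failure)
{
    bool *mask = simulate_alloc_failure
        ? NULL : calloc(NR_CPUS_SKETCH, sizeof(*mask));

    if (mask != NULL) {
        /* Common case: build the mask once, then one masked call. */
        for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
            mask[cpu] = cond(cpu, info);
        run_on_mask(mask, func, info);
        free(mask);
    } else {
        /* Fallback: no mask available, so go one CPU at a time. */
        for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
            if (cond(cpu, info))
                run_on_single(cpu, func, info);
    }
}

/* Example predicate and work function used to exercise the sketch. */
static bool is_even_cpu(int cpu, void *info) { (void)info; return cpu % 2 == 0; }
static void count_call(void *info) { (void)info; calls_made[current_cpu]++; }
```

    Both paths end up calling the work function on exactly the CPUs that
    the predicate selects; only the delivery mechanism differs.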

    [akpm@linux-foundation.org: s/gfpflags/gfp_flags/]
    [akpm@linux-foundation.org: avoid double-evaluation of `info' (per Michal), parenthesise evaluation of `cond_func']
    [akpm@linux-foundation.org: s/CPU/CPUs, use all 80 cols in comment]
    Signed-off-by: Gilad Ben-Yossef
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Russell King
    Cc: Pekka Enberg
    Cc: Matt Mackall
    Cc: Sasha Levin
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Alexander Viro
    Cc: Avi Kivity
    Acked-by: Michal Nazarewicz
    Cc: Kosaki Motohiro
    Cc: Milton Miller
    Reviewed-by: "Srivatsa S. Bhat"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gilad Ben-Yossef
     
  • We have lots of infrastructure in place to partition multi-core systems
    such that we have a group of CPUs that are dedicated to a specific task:
    cgroups, scheduler and interrupt affinity, and the isolcpus= boot
    parameter.
    Still, kernel code will at times interrupt all CPUs in the system via IPIs
    for various needs. These IPIs are useful and cannot be avoided
    altogether, but in certain cases it is possible to interrupt only specific
    CPUs that have useful work to do and not the entire system.

    This patch set, inspired by discussions with Peter Zijlstra and Frederic
    Weisbecker when testing the nohz task patch set, is a first stab at trying
    to explore doing this by locating the places where such global IPI calls
    are being made and turning the global IPI into an IPI for a specific group
    of CPUs. The purpose of the patch set is to get feedback on whether this
    is the right way to deal with this issue and, indeed, whether the issue
    is even worth dealing with at all. Based on the feedback from this patch
    set I plan to offer further patches that address similar issues in other
    code paths.

    This patch creates an on_each_cpu_mask() and on_each_cpu_cond()
    infrastructure API (the former derived from existing arch specific
    versions in Tile and Arm) and uses them to turn several global IPI
    invocations into per CPU group invocations.

    Core kernel:

    on_each_cpu_mask() calls a function on processors specified by cpumask,
    which may or may not include the local processor.

    You must not call this function with disabled interrupts or from a
    hardware interrupt handler or from a bottom half handler.

    arch/arm:

    Note that the generic version is a little different than the Arm one:

    1. It has the mask as first parameter
    2. It calls the function on the calling CPU with interrupts disabled,
    but this should be OK since the function is called on the other CPUs
    with interrupts disabled anyway.

    arch/tile:

    The API is the same as the tile private one, but the generic version
    also calls the function with interrupts disabled in the UP case.

    This is OK since the function is called on the other CPUs
    with interrupts disabled.
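
    The semantics described in this message can be modelled with a short
    userspace sketch. It is an assumption-laden illustration, not the
    kernel code: each_cpu_mask_sketch(), invoke_with_irqs_off() and the
    one-word mask_t are hypothetical, and "interrupts disabled" is just a
    per-CPU flag. It shows the mask as first parameter, the local CPU being
    called only if it is in the mask, and every invocation running with
    interrupts off.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4       /* hypothetical CPU count */

typedef unsigned long mask_t;  /* hypothetical one-word CPU mask */
typedef void (*work_fn)(void *info);

static bool irqs_off[NR_CPUS_SKETCH];  /* simulated per-CPU irq state */
static int ran_on[NR_CPUS_SKETCH];
static int ran_with_irqs_enabled;
static int executing_cpu;

/* Run func on one simulated CPU with its interrupts marked disabled. */
static void invoke_with_irqs_off(int cpu, work_fn func, void *info)
{
    irqs_off[cpu] = true;
    executing_cpu = cpu;
    func(info);
    irqs_off[cpu] = false;
}

static void each_cpu_mask_sketch(mask_t mask, int this_cpu,
                                 work_fn func, void *info)
{
    /* Remote CPUs: in the kernel these would be reached via IPI. */
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        if (cpu != this_cpu && (mask & (1UL << cpu)))
            invoke_with_irqs_off(cpu, func, info);

    /* Local CPU: called directly, and only if it is in the mask. */
    if (mask & (1UL << this_cpu))
        invoke_with_irqs_off(this_cpu, func, info);
}

/* Example work function: records where and how it was run. */
static void record(void *info)
{
    (void)info;
    ran_on[executing_cpu]++;
    if (!irqs_off[executing_cpu])
        ran_with_irqs_enabled++;
}
```

    Calling each_cpu_mask_sketch(0x5, 0, record, NULL) runs record() on
    simulated CPUs 0 and 2 only, in both cases with the interrupt flag set
    to disabled.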

    Signed-off-by: Gilad Ben-Yossef
    Reviewed-by: Christoph Lameter
    Acked-by: Chris Metcalf
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Russell King
    Cc: Pekka Enberg
    Cc: Matt Mackall
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Sasha Levin
    Cc: Mel Gorman
    Cc: Alexander Viro
    Cc: Avi Kivity
    Acked-by: Michal Nazarewicz
    Cc: Kosaki Motohiro
    Cc: Milton Miller
    Cc: Russell King
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gilad Ben-Yossef
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

17 Jun, 2011

1 commit

  • kdump (the 2nd kernel) sometimes hangs due to a pending IPI from the
    1st kernel. A kernel panic occurs because the IPI arrives before
    call_single_queue is initialized.

    To fix the crash, rename init_call_single_data() to call_function_init()
    and call it in start_kernel() so that call_single_queue can be
    initialized before enabling interrupts.

    The details of the crash are:

    (1) 2nd kernel boots up

    (2) A pending IPI from 1st kernel comes when irqs are first enabled
    in start_kernel().

    (3) Kernel tries to handle the interrupt, but call_single_queue
    is not initialized yet at this point. As a result, in the
    generic_smp_call_function_single_interrupt(), NULL pointer
    dereference occurs when list_replace_init() tries to access
    &q->list.next.

    Therefore this patch changes the name of init_call_single_data()
    to call_function_init() and calls it before local_irq_enable()
    in start_kernel().
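
    The ordering problem can be reproduced in miniature in userspace. The
    sketch below is only a model under stated assumptions: struct list_head
    is redeclared locally, boot_sketch() and handler_would_be_safe() are
    hypothetical names, and instead of actually dereferencing NULL the
    handler just reports whether the list head was usable when the pending
    interrupt was delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal local copy of a doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialized empty list points at itself. */
static void init_list(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Stand-in for the IPI handler: the real code would follow head->next,
 * so a NULL pointer here corresponds to the crash described above.
 */
static bool handler_would_be_safe(const struct list_head *head)
{
    return head->next != NULL;
}

/* Simulate the boot order and report whether the pending IPI was safe. */
static bool boot_sketch(bool init_before_irqs)
{
    struct list_head queue = { NULL, NULL };  /* zeroed, as at early boot */
    bool safe;

    if (init_before_irqs)
        init_list(&queue);  /* fixed order: initialize first */

    /* Point where interrupts are enabled and the pending IPI arrives. */
    safe = handler_would_be_safe(&queue);

    if (!init_before_irqs)
        init_list(&queue);  /* old order: initialized too late */

    return safe;
}
```

    boot_sketch(true) models the fixed ordering and reports a safe
    handler run, while boot_sketch(false) models the original ordering in
    which the handler sees an uninitialized queue.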

    Signed-off-by: Takao Indoh
    Reviewed-by: WANG Cong
    Acked-by: Neil Horman
    Acked-by: Vivek Goyal
    Acked-by: Peter Zijlstra
    Cc: Milton Miller
    Cc: Jens Axboe
    Cc: Paul E. McKenney
    Cc: kexec@lists.infradead.org
    Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.com
    Signed-off-by: Ingo Molnar

    Takao Indoh
     

23 Mar, 2011

1 commit

  • Move setup_nr_cpu_ids(), smp_init() and some other SMP boot parameter
    setup functions from init/main.c to kernel/smp.c, which saves some
    #ifdef CONFIG_SMP blocks.

    Signed-off-by: WANG Cong
    Cc: Rakib Mullick
    Cc: David Howells
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Arnd Bergmann
    Cc: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang