21 Feb, 2008

1 commit


07 Feb, 2008

1 commit

  • calibrate_delay() must be __cpuinit, not __{dev,}init.

    I've verified that this is correct for all users.

    While doing so, I also did the following cleanups:
    - remove pointless additional prototypes in C files
    - ensure all users #include <linux/delay.h>

    This fixes the following section mismatches with CONFIG_HOTPLUG=n,
    CONFIG_HOTPLUG_CPU=y:

    WARNING: vmlinux.o(.text+0x1128d): Section mismatch: reference to .init.text.1:calibrate_delay (between 'check_cx686_slop' and 'set_cx86_reorder')
    WARNING: vmlinux.o(.text+0x25102): Section mismatch: reference to .init.text.1:calibrate_delay (between 'smp_callin' and 'cpu_coregroup_map')
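
    A minimal sketch of the change (calibrate_delay() is declared in
    <linux/delay.h>; shown in isolation here):

        /* Before: __devinit degrades to __init when CONFIG_HOTPLUG=n, so
         * calibrate_delay() lands in .init.text even though its __cpuinit
         * callers stay in .text when CONFIG_HOTPLUG_CPU=y. */
        void __devinit calibrate_delay(void);

        /* After: section placement now tracks CONFIG_HOTPLUG_CPU,
         * matching every caller. */
        void __cpuinit calibrate_delay(void);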

    Signed-off-by: Adrian Bunk
    Cc: Ivan Kokshaysky
    Cc: Richard Henderson
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Christian Zankel
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

12 Dec, 2007

1 commit


05 Dec, 2007

1 commit


27 Oct, 2007

1 commit


17 Oct, 2007

1 commit

  • Convert cpu_sibling_map from a static array sized by NR_CPUS to a per_cpu
    variable. This saves sizeof(cpumask_t) for each unused cpu. Access is
    mostly from startup and CPU HOTPLUG functions.
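
    A sketch of the conversion (the real patch also updates every access
    site across the affected architectures):

        /* Before: NR_CPUS masks exist even for cpus that can never appear. */
        cpumask_t cpu_sibling_map[NR_CPUS];

        /* After: storage is instantiated per possible cpu only. */
        DEFINE_PER_CPU(cpumask_t, cpu_sibling_map);

        /* Access sites change from cpu_sibling_map[cpu] to: */
        per_cpu(cpu_sibling_map, cpu);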

    Signed-off-by: Mike Travis
    Cc: Andi Kleen
    Cc: Christoph Lameter
    Cc: "Siddha, Suresh B"
    Cc: "David S. Miller"
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Travis
     

05 Oct, 2007

1 commit


09 Aug, 2007

1 commit

  • Every time a cpu is added via hotplug, we allocate the per-cpu MONDO
    queues but we never free them up. Freeing isn't easy since the first
    cpu gets this memory from bootmem.

    Therefore, the simplest thing to do to fix this bug is to allocate the
    queues for all possible cpus at boot time.
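
    A minimal sketch of the fix, assuming a hypothetical
    alloc_one_mondo_queue() helper:

        /* Allocate every possible cpu's mondo queues once at boot; hotplug
         * add then never allocates, so nothing ever needs freeing. */
        void __init init_send_mondo_info(void)
        {
                int cpu;

                for_each_possible_cpu(cpu)
                        alloc_one_mondo_queue(cpu);     /* hypothetical */
        }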

    Signed-off-by: David S. Miller

    David S. Miller
     

16 Jul, 2007

7 commits

  • Signed-off-by: David S. Miller

    David S. Miller
     
  • When we hot-plug in new cpus, the core_id and proc_id of existing
    cpus can change. So in order to set the cpu groups correctly we
    need to clear the maps out completely first.
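
    A sketch of the idea, using the map names as they existed at the time
    (the per_cpu conversion of cpu_sibling_map above came later):

        int i;

        /* Wipe the maps completely before rebuilding them, since the
         * core_id/proc_id of already-online cpus may have changed. */
        for_each_possible_cpu(i) {
                cpus_clear(cpu_core_map[i]);
                cpus_clear(cpu_sibling_map[i]);
        }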

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Take a page from the powerpc folks and just calculate the
    delay factor directly.

    Since frequency scaling chips use a system-tick register,
    the value is going to be the same system-wide.
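
    The idea in sketch form (not the literal sparc64 code), assuming
    clock_tick holds the system-tick frequency read from firmware
    properties: with one system-wide frequency there is nothing to
    measure, only a division.

        /* The delay factor follows directly from the tick frequency;
         * no calibration loop is needed. */
        loops_per_jiffy = clock_tick / HZ;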

    Signed-off-by: David S. Miller

    David S. Miller
     
  • With the move of ldom_startcpu_cpuid() into smp.c some other
    things need to follow along:

    1) smp.c is not a driver, so we can't use the "PFX" macro in the
    printk calls.

    2) smp.c now needs asm/io.h and asm/hvtramp.h; ds.c no longer does.

    3) kimage_addr_to_ra() also needs to move into smp.c

    While we're here, update copyright info and my email address
    in smp.c

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Do not select HOTPLUG_CPU from SUN_LDOMS; that causes
    HOTPLUG_CPU to be selected even on non-SMP, which is
    illegal.

    Only build hvtramp.o when SMP, just like trampoline.o

    Protect dr-cpu code in ds.c with HOTPLUG_CPU.

    Likewise move ldom_startcpu_cpuid() to smp.c and protect
    it and the call site with SUN_LDOMS && HOTPLUG_CPU.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Only adding cpus is supported at the moment; removal
    will come next.

    When new cpus are configured, the machine description is
    updated. When we get the configure request we pass in a
    cpu mask of to-be-added cpus to the mdesc CPU node parser
    so it only fetches information for those cpus. That code
    also proceeds to update the SMT/multi-core scheduling bitmaps.

    cpu_up() does all the work and we return the status back
    over the DS channel.
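
    A sketch of that configure path (only cpu_up() is a real interface
    here; the other names are hypothetical):

        static void dr_cpu_configure(cpumask_t mask)
        {
                int cpu, err;

                for_each_cpu_mask(cpu, mask) {
                        err = cpu_up(cpu);
                        dr_cpu_send_status(cpu, err);   /* hypothetical:
                                                         * reply over the
                                                         * DS channel */
                }
        }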

    CPUs via dr-cpu need to be booted straight out of the
    hypervisor, and this requires:

    1) A new trampoline mechanism. CPUs are booted straight
    out of the hypervisor with MMU disabled and running in
    physical addresses with no mappings installed in the TLB.

    The new hvtramp.S code sets up the critical cpu state,
    installs the locked TLB mappings for the kernel, and
    turns the MMU on. It then proceeds to follow the logic
    of the existing trampoline.S SMP cpu bringup code.

    2) All calls into OBP have to be disallowed when domaining
    is enabled. Since cpus boot straight into the kernel from
    the hypervisor, OBP has no state about that cpu and therefore
    cannot handle being invoked on that cpu.

    Luckily it's only a handful of interfaces which can be called
    after the OBP device tree is obtained. For example, rebooting,
    halting, powering-off, and setting options node variables.

    CPU removal support will require some infrastructure changes
    here. Namely we'll have to process the requests via a true
    kernel thread instead of in a workqueue. Workqueues run on
    a per-cpu thread, but when unconfiguring we might need to
    force the thread to execute on another cpu if the current cpu
    is the one being removed. Removal of a cpu also causes the kernel
    to destroy that cpu's workqueue running thread.

    Another issue on removal is that we may have interrupts still
    pointing to the cpu-to-be-removed. So new code will be needed
    to walk the active INO list and retarget those interrupts as needed.

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Jul, 2007

1 commit

  • the SMP load-balancer uses the boot-time migration-cost estimation
    code to attempt to improve the quality of balancing. The reason for
    this code is that the discrete priority queues do not preserve
    the order of scheduling accurately, so the load-balancer skips
    tasks that were running on a CPU 'recently'.

    this code is fundamentally fragile: the boot-time migration cost detector
    doesn't really work on systems with large L3 caches, it caused boot
    delays on large systems, and the whole cache-hot concept made the
    balancing code pretty non-deterministic as well.

    (and hey, i wrote most of it, so i can say it out loud that it sucks ;-)

    under CFS the same purpose of cache affinity can be achieved without
    any special cache-hot special-case: tasks are sorted in the 'timeline'
    tree and the SMP balancer picks tasks from the left side of the
    tree, thus the most cache-cold task is balanced automatically.
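
    A sketch of why no special case is needed (field names per the
    2.6.23-era scheduler):

        /* The leftmost entity in the time-ordered rbtree has waited
         * longest, i.e. it is the most cache-cold candidate, so the
         * balancer can simply start from the left. */
        struct rb_node *left = rb_first(&cfs_rq->tasks_timeline);
        struct sched_entity *se =
                rb_entry(left, struct sched_entity, run_node);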

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

05 Jun, 2007

2 commits


29 May, 2007

2 commits

  • Cheetah systems can have cpuids as large as 1023, although physical
    systems don't have that many cpus.

    Only three limitations existed in the kernel preventing arbitrary
    NR_CPUS values:

    1) dcache dirty cpu state stored in page->flags on
    D-cache aliasing platforms. With some build time
    calculations and some build-time BUG checks on
    page->flags layout, this one was easily solved.

    2) The cheetah XCALL delivery code could only handle
    a cpumask with up to 32 cpus set. Some simple looping
    logic clears that up too.

    3) thread_info->cpu was a u8, easily changed to a u16 (sketched below).

    There are a few spots in the kernel that still put NR_CPUS
    sized arrays on the kernel stack, but that's not a sparc64
    specific problem.
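
    For limitation 3 above, the fix is essentially a one-field change in
    the sparc64 thread_info layout:

        /* Before: cpuids above 255 could not be represented. */
        __u8  cpu;

        /* After: room for cheetah's cpuids up to 1023. */
        __u16 cpu;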

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: David S. Miller

    David S. Miller
     

14 May, 2007

1 commit


09 May, 2007

1 commit


03 May, 2007

1 commit

  • Let's allow page-alignment in general for per-cpu data (wanted by Xen, and
    Ingo suggested KVM as well).

    Because larger alignments can use more room, we increase the max per-cpu
    memory to 64k rather than 32k: it's getting a little tight.
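
    A sketch of the kind of declaration this enables (variable and type
    hypothetical; the attribute is plain GCC):

        /* A per-cpu area that must start on a page boundary, e.g. a page
         * shared with a hypervisor. */
        static DEFINE_PER_CPU(struct shared_page, hv_shared)
                __attribute__((__aligned__(PAGE_SIZE)));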

    Signed-off-by: Rusty Russell
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Acked-by: Ingo Molnar
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Jeremy Fitzhardinge
     

26 Apr, 2007

2 commits

  • I'd like to thank John Stultz and others for helping
    me along the way.

    A lot of cleanups fell out of this. For example, the get_compare()
    tick_op was totally unused, so it was deleted. And the most often used
    tick_op members were grouped together for cache-friendliness.

    The sparc64 TSC is given to the kernel as a one-shot timer.

    tick_ops->init_timer() simply turns off the privileged bit in
    the tick register (when possible), and disables the interrupt
    by setting bit 63 in the compare register. The ->disable_irq()
    op also sets this bit.

    tick_ops->add_compare() is changed to:

    1) Add the given delta to "tick" not to "compare"
    2) Return a boolean which, if true, means that the tick
    value read after writing the compare value was found
    to have incremented past the initial tick value. This
    mirrors logic used in the HPET driver's ->next_event()
    method.

    Each tick_ops implementation also now provides a name string.
    And we feed this into the clocksource and clockevents layers.
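
    A sketch of the ->add_compare() contract described above (the
    register accessors are stand-ins for the privileged reads/writes):

        static int tick_add_compare(unsigned long adj)
        {
                unsigned long orig_tick, new_tick;

                orig_tick = tick_read();                 /* stand-in */
                tick_write_compare(orig_tick + adj);     /* stand-in */
                new_tick = tick_read();

                /* Non-zero: the tick raced past the programmed point;
                 * the caller retries with a larger delta (same idea as
                 * the HPET ->next_event() check). */
                return ((long)(new_tick - (orig_tick + adj))) > 0L;
        }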

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Things were scattered all over the place, split between
    SMP and non-SMP.

    Unify it all so that dyntick support is easier to add.

    Signed-off-by: David S. Miller

    David S. Miller
     

12 Jan, 2007

1 commit

  • Compiling the kernel with CONFIG_HOTPLUG = y and CONFIG_HOTPLUG_CPU = n
    and CONFIG_RELOCATABLE = y generates the following modpost warnings:

    WARNING: vmlinux - Section mismatch: reference to .init.data: from
    .text between '_cpu_up' (at offset 0xc0141b7d) and 'cpu_up'
    WARNING: vmlinux - Section mismatch: reference to .init.data: from
    .text between '_cpu_up' (at offset 0xc0141b9c) and 'cpu_up'
    WARNING: vmlinux - Section mismatch: reference to .init.text:__cpu_up
    from .text between '_cpu_up' (at offset 0xc0141bd8) and 'cpu_up'
    WARNING: vmlinux - Section mismatch: reference to .init.data: from
    .text between '_cpu_up' (at offset 0xc0141c05) and 'cpu_up'
    WARNING: vmlinux - Section mismatch: reference to .init.data: from
    .text between '_cpu_up' (at offset 0xc0141c26) and 'cpu_up'
    WARNING: vmlinux - Section mismatch: reference to .init.data: from
    .text between '_cpu_up' (at offset 0xc0141c37) and 'cpu_up'

    This is because cpu_up, _cpu_up and __cpu_up (in some architectures) are
    defined as __devinit, and __cpu_up calls some __cpuinit functions.

    Since __cpuinit would map to __init with this kind of a configuration,
    we get a .text referring .init.data warning.

    This patch solves the problem by converting all of __cpu_up, _cpu_up
    and cpu_up from __devinit to __cpuinit. The approach is justified since
    the callers of cpu_up are either dependent on CONFIG_HOTPLUG_CPU or
    are of __init type.

    Thus when CONFIG_HOTPLUG_CPU=y, all these cpu up functions would land up
    in .text section, and when CONFIG_HOTPLUG_CPU=n, all these functions would
    land up in .init section.

    Tested on an i386 SMP machine running linux-2.6.20-rc3-mm1.
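
    A sketch of the conversion in kernel/cpu.c (signature simplified):

        /* Before: lands in .init.text when CONFIG_HOTPLUG=n, yet it
         * calls __cpuinit functions. */
        static int __devinit _cpu_up(unsigned int cpu);

        /* After: placement follows CONFIG_HOTPLUG_CPU, like its callees. */
        static int __cpuinit _cpu_up(unsigned int cpu);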

    Signed-off-by: Gautham R Shenoy
    Cc: Vivek Goyal
    Cc: Mikael Starvik
    Cc: Ralf Baechle
    Cc: Kyle McMartin
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gautham R Shenoy
     

18 Dec, 2006

1 commit


09 Oct, 2006

1 commit


24 Jun, 2006

1 commit


11 Jun, 2006

1 commit


31 May, 2006

1 commit


11 Apr, 2006

1 commit

  • for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
    in the past where people were using for_each_cpu() where they should have been
    iterating across only online or present CPUs. This is inefficient and
    possibly buggy.

    We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
    future.

    This patch replaces for_each_cpu() with for_each_possible_cpu() for
    sparc64.
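
    The rename in use (the loop body is hypothetical):

        int i;

        /* Old, ambiguous spelling: */
        for_each_cpu(i)
                setup_cpu_state(i);             /* hypothetical */

        /* New, explicit spelling; identical semantics (all possible cpus): */
        for_each_possible_cpu(i)
                setup_cpu_state(i);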

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

10 Apr, 2006

1 commit


01 Apr, 2006

1 commit

  • switch_mm() changes the mm state and does a tsb_context_switch()
    first; then we do the cpu register state switch, which changes
    current_thread_info() and current().

    So it's safer to check the PGD physical address stored in the
    trap block (which will be updated by the tsb_context_switch() in
    switch_mm()) than current->active_mm.

    Technically we should never run here in between those two
    updates, because interrupts are disabled during the entire
    context switch operation. But some day we might like to leave
    interrupts enabled during the context switch and this change
    allows that to happen without any surprises.
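
    A sketch of the safer test, keyed off the trap block (struct and
    field names per sparc64 of that era):

        struct trap_per_cpu *tp = &trap_block[raw_smp_processor_id()];

        /* tsb_context_switch() keeps tp->pgd_paddr current even in the
         * window before the register-state switch updates current(). */
        if (tp->pgd_paddr == __pa(mm->pgd))
                tsb_context_switch(mm);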

    Signed-off-by: David S. Miller

    David S. Miller
     

26 Mar, 2006

1 commit


23 Mar, 2006

1 commit

  • When we stop allocating percpu memory for not-possible CPUs we must not touch
    the percpu data for not-possible CPUs at all. The correct way of doing this
    is to test cpu_possible() or to use for_each_cpu().

    This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very
    few instances of this bug, if any. But the patch converts lots of
    open-coded tests to use the preferred helper macros.
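
    The pattern the sweep converts, sketched with a hypothetical per-cpu
    counter (imagine a DEFINE_PER_CPU(long, count) somewhere):

        int i;
        long total = 0;

        /* Before: may touch per-cpu data of cpus that can never exist. */
        for (i = 0; i < NR_CPUS; i++)
                total += per_cpu(count, i);

        /* After: only possible cpus, the only ones with per-cpu storage. */
        for_each_cpu(i)
                total += per_cpu(count, i);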

    Cc: Mikael Starvik
    Cc: David Howells
    Acked-by: Kyle McMartin
    Cc: Anton Blanchard
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: William Lee Irwin III
    Cc: Andi Kleen
    Cc: Christian Zankel
    Cc: Philippe Elie
    Cc: Nathan Scott
    Cc: Jens Axboe
    Cc: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

20 Mar, 2006

4 commits

  • The mapping is a simple "(cpuid >> 2) == core" for now.
    Later we'll add more sophisticated code that will walk
    the sun4v machine description and figure this out from
    there.

    We should also add core mappings for jaguar and panther
    processors.
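
    The interim mapping, spelled out (macro name hypothetical; Niagara
    runs four hardware strands per core):

        #define cpuid_to_core(cpuid)    ((cpuid) >> 2)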

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Don't piggyback the SMP receive signal code to do the
    context version change handling.

    Instead allocate another fixed PIL number for this
    asynchronous cross-call. We can't use smp_call_function()
    because this thing is invoked with interrupts disabled
    and a few spinlocks held.

    Also, fix smp_call_function_mask() to count "cpus" correctly.
    There is no guarantee that the local cpu is in the mask,
    yet that is exactly what this code was assuming.
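
    A sketch of the counting fix (cpumask helpers of the time):

        /* Do not assume the local cpu is in the mask: remove it
         * explicitly, then count what actually remains. */
        cpu_clear(smp_processor_id(), mask);
        cpus = cpus_weight(mask);
        if (!cpus)
                goto out_unlock;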

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This cpu mondo sending interface isn't all that easy to
    use correctly...

    We were clearing out the wrong bits from the "mask" after getting
    something other than EOK from the hypervisor.

    It turns out the hypervisor can just be resent the same cpu_list[]
    array, with the 0xffff "done" entries still in there, and it will do
    the right thing.

    So don't update or try to rebuild the cpu_list[] array to condense it.

    This requires the "forward_progress" check to be done slightly
    differently, but this new scheme is less bug-prone than what we were
    doing before.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • There were several bugs in the SUN4V cpu mondo dispatch code.

    In fact, if we ever got an EWOULDBLOCK or other error from
    the hypervisor call, we'd potentially send a cpu mondo multiple
    times to the same cpu, and even worse we could loop until the
    timeout resending the same mondo over and over to such cpus.

    So let's bulletproof this thing as follows:

    1) Implement cpu_mondo_send() and cpu_state() hypervisor calls
    in arch/sparc64/kernel/entry.S, add prototypes to asm/hypervisor.h

    2) Don't build and update the cpulist using inline functions; this
    was causing the cpu mask to not get updated in the caller.

    3) Disable interrupts during the entire mondo send, otherwise our
    cpu list and/or mondo block could get overwritten if we take
    an interrupt and do a cpu mondo send on the current cpu.

    4) Check for all possible error return types from the cpu_mondo_send()
    hypervisor call. In particular:

    HV_EOK)         Our work is done, all cpus have received the mondo.
    HV_ECPUERROR)   One or more of the cpus in the cpu list we passed
                    to the hypervisor are in error state. Use cpu_state()
                    calls over the entries in the cpu list to see which
                    ones. Record them in "error_mask" and report this
                    after we are done sending the mondo to cpus which are
                    not in error state.
    HV_EWOULDBLOCK) We need to keep trying.

    Any other error we consider fatal, we report the event and exit
    immediately.

    5) We only time out if forward progress is not made. Forward progress
    is defined as having at least one cpu get the mondo successfully
    in a given cpu_mondo_send() call. Otherwise we bump a counter
    and delay a little. If the counter hits a limit, we signal an
    error and report the event.

    Also, fix smp_call_function_mask()'s error handling, which reported
    the number of cpus incorrectly.
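
    A sketch of the resulting retry policy (helper names hypothetical;
    the hypervisor interfaces live in asm/hypervisor.h):

        do {
                status = cpu_mondo_send(cnt, cpu_list_pa, mondo_pa);

                if (status == HV_EOK)
                        break;                          /* all delivered */

                if (status == HV_ECPUERROR) {
                        /* Note cpus in error state; keep sending to the
                         * rest and report error_mask at the end. */
                        update_error_mask(cpu_list, cnt, &error_mask);
                } else if (status != HV_EWOULDBLOCK) {
                        goto fatal_mondo_error;         /* anything else */
                }

                /* Time out only without forward progress: at least one
                 * cpu must take the mondo each iteration. */
                if (!made_forward_progress(cpu_list, cnt))
                        retries++;
                udelay(2 * cnt);
        } while (retries < RETRY_LIMIT);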

    Signed-off-by: David S. Miller

    David S. Miller