06 Dec, 2011

1 commit

  • Reduce the startup time for slave cpus.

    Adds hooks for an arch-specific function for clock calibration.
    These hooks are used on x86. If a newly started cpu has the
    same phys_proc_id as a core already active, uses the TSC for the
    delay loop and has a CONSTANT_TSC, use the already-calculated
    value of loops_per_jiffy.
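
    The reuse condition described above can be sketched in plain C. This is
    a user-space illustration only, not the patch itself: struct cpu_info,
    its fields and known_lpj_for() are hypothetical names, and the real x86
    hook may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU record; the field names are illustrative only. */
struct cpu_info {
    int phys_proc_id;               /* physical package (socket) id */
    bool online;                    /* already started and calibrated */
    unsigned long loops_per_jiffy;  /* value calculated for this CPU */
};

/*
 * Return an already-calculated loops_per_jiffy that the CPU at index
 * `cpu` may reuse, or 0 if it has to run its own calibration.
 */
unsigned long known_lpj_for(const struct cpu_info *cpus, size_t ncpus,
                            size_t cpu, bool delay_uses_tsc,
                            bool constant_tsc)
{
    /* Reuse is only considered for a TSC delay loop with a constant TSC. */
    if (!delay_uses_tsc || !constant_tsc)
        return 0;

    for (size_t i = 0; i < ncpus; i++) {
        if (i == cpu || !cpus[i].online)
            continue;
        /* Same package as an active core: take its value. */
        if (cpus[i].phys_proc_id == cpus[cpu].phys_proc_id)
            return cpus[i].loops_per_jiffy;
    }
    return 0;
}
```

    A return value of 0 means "nothing known", so the caller would fall
    back to the normal calibration path.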

    This patch reduces the time required to start slave cpus on a
    4096 cpu system from 465 sec (old) to 62 sec (new).

    This reduces boot time on a 4096p system by almost 7 minutes.
    Nice...

    Signed-off-by: Jack Steiner
    Cc: "H. Peter Anvin"
    Cc: John Stultz
    [fix CONFIG_SMP=n build]
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Jack Steiner
     

26 Jul, 2011

1 commit

  • For each CPU, do the calibration delay only once. For subsequent calls,
    use the cached per-CPU value of loops_per_jiffy.
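
    A minimal sketch of the caching idea, assuming a fixed-size array in
    place of a real per-CPU variable. NR_CPUS_SKETCH, slow_calibrate() and
    calibrate_delay_cached() are hypothetical names, and the value returned
    by slow_calibrate() is a made-up placeholder.

```c
#define NR_CPUS_SKETCH 4

/* 0 means "not calibrated yet" for that CPU. */
static unsigned long cpu_loops_per_jiffy[NR_CPUS_SKETCH];
static int calibration_runs;    /* counts the expensive path, for demo */

/* Stand-in for the expensive measurement; returns a placeholder value. */
static unsigned long slow_calibrate(void)
{
    calibration_runs++;
    return 3403776UL;
}

/* `cpu` is assumed to be a valid index below NR_CPUS_SKETCH. */
unsigned long calibrate_delay_cached(int cpu)
{
    if (cpu_loops_per_jiffy[cpu] == 0)
        cpu_loops_per_jiffy[cpu] = slow_calibrate();
    return cpu_loops_per_jiffy[cpu];
}
```

    Calling calibrate_delay_cached() a second time for the same CPU, as
    would happen on resume, returns the stored value without measuring
    again.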

    This saves about 200ms of resume time on dual core Intel Atom N5xx based
    systems. This helps bring down the kernel resume time on such systems
    from about 500ms to about 300ms.

    [akpm@linux-foundation.org: make cpu_loops_per_jiffy static]
    [akpm@linux-foundation.org: clean up message text]
    [akpm@linux-foundation.org: fix things up after upstream rmk changes]
    Signed-off-by: Sameer Nanda
    Cc: Phil Carmody
    Cc: Andrew Worsley
    Cc: David Daney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sameer Nanda
     

23 Jun, 2011

1 commit

  • Secondary CPU bringup typically calls calibrate_delay() during its
    initialization. However, calibrate_delay() modifies a global variable
    (loops_per_jiffy) used for udelay() and __delay().

    A side effect of 71c696b1 ("calibrate: extract fall-back calculation
    into own helper") introduced in the 2.6.39 merge window means that we
    end up with a substantial period where loops_per_jiffy is zero. This
    causes the spinlock debugging code to malfunction:

        u64 loops = loops_per_jiffy * HZ;
        for (;;) {
                for (i = 0; i < loops; i++) {
                        if (arch_spin_trylock(&lock->raw_lock))
                                return;
                        __delay(1);
                }
                ...
        }

    by never calling arch_spin_trylock() - resulting in the CPU locking
    up in an infinite loop inside __spin_lock_debug().

    Work around this by writing to loops_per_jiffy only once we have
    completed all the calibration decisions.
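
    The failure mode and the workaround can be illustrated outside the
    kernel. In this sketch trylock_attempts() only counts how often the
    inner loop body would run, and publish_final_estimate() and
    published_lpj are hypothetical names standing in for the calibration
    code and the global variable.

```c
#include <stdint.h>

/*
 * Count how many times the inner loop body (where the trylock would be
 * attempted) runs during `outer_passes` passes of the outer loop.
 */
uint64_t trylock_attempts(unsigned long lpj, unsigned int hz,
                          unsigned int outer_passes)
{
    uint64_t loops = (uint64_t)lpj * hz;
    uint64_t attempts = 0;

    for (unsigned int pass = 0; pass < outer_passes; pass++)
        for (uint64_t i = 0; i < loops; i++)
            attempts++;
    return attempts;
}

/* Stand-in for the global; 5000 is an arbitrary earlier value. */
static unsigned long published_lpj = 5000;

/* Keep intermediate estimates in a local and write the global once. */
void publish_final_estimate(const unsigned long *estimates, int n)
{
    unsigned long lpj = 0;      /* scratch value, may be 0 or partial */

    for (int i = 0; i < n; i++)
        lpj = estimates[i];     /* each step refines the estimate */
    if (n > 0)
        published_lpj = lpj;    /* single write when done */
}
```

    With lpj equal to 0 the attempt count stays at 0 no matter how many
    outer passes run, which is the lock-up described above. Readers of
    published_lpj never observe the intermediate values.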

    Tested-by: Santosh Shilimkar
    Signed-off-by: Russell King
    Cc: (2.6.39-stable)
    --
    Better solutions (such as omitting the calibration for secondary CPUs,
    or arranging for calibrate_delay() to return the LPJ value and leave
    it to the caller to decide where to store it) are a possibility, but
    would be much more invasive into each architecture.

    I think this is the best solution for -rc and stable, but it should be
    revisited for the next merge window.

    init/calibrate.c | 14 ++++++++------
    1 files changed, 8 insertions(+), 6 deletions(-)
    Signed-off-by: Linus Torvalds

    Russell King
     

16 Jun, 2011

1 commit

    Remove calibrate_delay_direct()'s KERN_DEBUG printk related to bogomips
    calculation. It appears for every core booted on setups with
    'ignore_loglevel', where people scan dmesg for possible issues. As the
    message doesn't show very useful information to the widest audience of
    kernel boot message gazers, it should be removed.

    Introduced by commit d2b463135f84 ("init/calibrate.c: fix for critical
    bogoMIPS intermittent calculation failure").

    Signed-off-by: Borislav Petkov
    Cc: Andrew Worsley
    Cc: Phil Carmody
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     

25 May, 2011

1 commit

  • A fix to the TSC (Time Stamp Counter) based bogoMIPS calculation used on
    secondary CPUs which has two faults:

    1: Not handling wrapping of the lower 32 bits of the TSC counter on a
    32bit kernel - perhaps TSC is not reset by a warm reset?

    2: TSC and jiffies are not incrementing together properly. Either
    jiffies increment too quickly, or the Time Stamp Counter isn't
    incremented during an SMI while the real time clock is and jiffies are
    incremented.

    Case 1 can result in a value that is too large by a factor of 16, which
    makes udelay() values too small and can cause mysterious driver errors.
    Case 2 appears to give smaller 10-15% errors after averaging, but enough
    to cause occasional failures on my own board.

    I have tested this code on my own branch and attach patch suitable for
    current kernel code. See below for examples of the failures and how the
    fix handles these situations now.

    I reported this issue earlier here:
    Intermittent problem with BogoMIPs calculation on Intel AP CPUs -
    http://marc.info/?l=linux-kernel&m=129947246316875&w=4

    I suspect this issue has been seen by others but as it is intermittent and
    bogoMIPS for secondary CPUs are no longer printed out it might have been
    difficult to identify this as the cause. Perhaps these unresolved issues,
    although quite old, might be relevant as possibly this fault has been
    around for a while. In particular Case 1 may only be relevant to 32bit
    kernels on newer HW (most people run 64bit kernels?). Case 2 is less
    dramatic since the earlier fix in this area and also intermittent.

    Re: bogomips discrepancy on Intel Core2 Quad CPU -
    http://marc.info/?l=linux-kernel&m=118929277524298&w=4
    slow system and bogus bogomips -
    http://marc.info/?l=linux-kernel&m=116791286716107&w=4
    Re: Re: [RFC-PATCH] clocksource: update lpj if clocksource has -
    http://marc.info/?l=linux-kernel&m=128952775819467&w=4

    This issue is masked a little by commit feae3203d711db0a ("timers, init:
    Limit the number of per cpu calibration bootup messages") which only
    prints out the first bogoMIPS value making it much harder to notice other
    values differing. Perhaps it should be changed to only suppress them when
    they are similar values?

    Here are some outputs showing faults occurring and the new code handling
    them properly. See my earlier message for examples of the original
    failure.

    Case 1: A Time Stamp Counter wrap:
    ...
    Calibrating delay loop (skipped), value calculated using timer
    frequency.. 6332.70 BogoMIPS (lpj=31663540)
    ....
    calibrate_delay_direct() timer_rate_max=31666493
    timer_rate_min=31666151 pre_start=4170369255 pre_end=4202035539
    calibrate_delay_direct() timer_rate_max=2425955274
    timer_rate_min=2425954941 pre_start=4265368533 pre_end=2396356387
    calibrate_delay_direct() ignoring timer_rate as we had a TSC wrap
    around start=4265368581 >=post_end=2396356511
    calibrate_delay_direct() timer_rate_max=31666274
    timer_rate_min=31665942 pre_start=2440373374 pre_end=2472039515
    calibrate_delay_direct() timer_rate_max=31666492
    timer_rate_min=31666160 pre_start=2535372139 pre_end=2567038422
    calibrate_delay_direct() timer_rate_max=31666455
    timer_rate_min=31666207 pre_start=2630371084 pre_end=2662037415
    Calibrating delay using timer specific routine.. 6333.28 BogoMIPS (lpj=31666428)
    Total of 2 processors activated (12665.99 BogoMIPS).
    ....
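
    The wrap check reported in the log above ("start=... >=post_end=...")
    can be sketched as follows. The function names are hypothetical, and
    uint32_t is used to model the lower 32 bits of the counter as seen by a
    32bit kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A measurement is bracketed by counter reads `start` (before) and
 * `post_end` (after). In a 32-bit view of the counter, a wrap shows up
 * as the later read not being greater than the earlier one.
 */
bool tsc_wrapped(uint32_t start, uint32_t post_end)
{
    return start >= post_end;
}

/* Raw 32-bit difference between two reads; wraps modulo 2^32. */
uint32_t counter_delta32(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}
```

    Fed with pre_start=4265368533 and post_end=2396356511 from the log,
    counter_delta32() yields 2425955274, the outlying timer_rate_max value
    shown there, and tsc_wrapped() flags that measurement so it can be
    ignored.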

    Case 2: Something (presumably the SMM interrupt?) causing a
    very low increase in the TSC counter for the DELAY_CALIBRATION_TICKS
    increase in jiffies
    ...
    Calibrating delay loop (skipped), value calculated using timer
    frequency.. 6333.25 BogoMIPS (lpj=31666270)
    ...
    calibrate_delay_direct() timer_rate_max=31666483
    timer_rate_min=31666074 pre_start=4199536526 pre_end=4231202809
    calibrate_delay_direct() timer_rate_max=864348 timer_rate_min=864016
    pre_start=2405343672 pre_end=2406207897
    calibrate_delay_direct() timer_rate_max=31666483
    timer_rate_min=31666179 pre_start=2469540464 pre_end=2501206823
    calibrate_delay_direct() timer_rate_max=31666511
    timer_rate_min=31666122 pre_start=2564539400 pre_end=2596205712
    calibrate_delay_direct() timer_rate_max=31666084
    timer_rate_min=31665685 pre_start=2659538782 pre_end=2691204657
    calibrate_delay_direct() dropping min bogoMips estimate 1 = 864348
    Calibrating delay using timer specific routine.. 6333.27 BogoMIPS (lpj=31666390)
    Total of 2 processors activated (12666.53 BogoMIPS).
    ...
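
    Dropping an outlying estimate before averaging can be sketched as
    below. This is a simplified stand-in, not the kernel's algorithm: the
    use of the median, the 1/8 tolerance, MAX_SAMPLES and robust_average()
    are all assumptions made for illustration.

```c
#include <stddef.h>

#define MAX_SAMPLES 16

/*
 * Average the samples that lie within 1/8 (12.5%) of the median.
 * Returns 0 if n is 0 or larger than MAX_SAMPLES.
 */
unsigned long robust_average(const unsigned long *samples, size_t n)
{
    unsigned long sorted[MAX_SAMPLES];
    unsigned long long sum = 0;
    size_t kept = 0;

    if (n == 0 || n > MAX_SAMPLES)
        return 0;

    /* Insertion sort into a scratch copy to find the median. */
    for (size_t i = 0; i < n; i++) {
        size_t j = i;
        while (j > 0 && sorted[j - 1] > samples[i]) {
            sorted[j] = sorted[j - 1];
            j--;
        }
        sorted[j] = samples[i];
    }

    unsigned long median = sorted[n / 2];
    unsigned long tolerance = median / 8;

    for (size_t i = 0; i < n; i++) {
        unsigned long diff = samples[i] > median ? samples[i] - median
                                                 : median - samples[i];
        if (diff <= tolerance) {    /* the median itself always passes */
            sum += samples[i];
            kept++;
        }
    }
    return (unsigned long)(sum / kept);
}
```

    For the five timer_rate_max values in the Case 2 log, the 864348
    sample is dropped and the remaining four average to 31666390, the lpj
    value printed in that log.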

    After 70 boots I saw 2 variations
    Reviewed-by: Phil Carmody
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Worsley
     

23 Mar, 2011

3 commits

  • Systems with unmaskable interrupts such as SMIs may massively
    underestimate loops_per_jiffy, and fail to converge anywhere near the real
    value. A case seen on x86_64 was an initial estimate of 256<<<
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Tested-by: Stephen Boyd
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Phil Carmody
     
  • Binary chop with a jiffy-resync on each step to find an upper bound is
    slow, so just race in a tight-ish loop to find an underestimate.

    If done with lots of individual steps, sometimes several hundreds of
    iterations would be required, which would impose a significant overhead,
    and make the initial estimate very low. By taking slowly increasing steps
    there will be less overhead.
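
    How quickly slowly increasing steps cover ground can be shown with a
    small user-space model. The schedule used here (step length `band`,
    with band b lasting 2^b calls) is an assumption made for illustration,
    and iterations_to_cover() is a hypothetical name. No real delay is
    performed.

```c
/*
 * Count how many delay calls are needed before the total delay reaches
 * `target_units`, measured in units of one small delay. Each call
 * delays for `band` units, and band b is used for 2^b calls.
 */
int iterations_to_cover(unsigned long target_units)
{
    unsigned long covered = 0;
    unsigned long band = 0, trial_in_band = 0;
    int iterations = 0;

    while (covered < target_units) {
        if (++trial_in_band == (1UL << band)) {
            ++band;                 /* move on to longer steps */
            trial_in_band = 0;
        }
        covered += band;            /* stands in for __delay(lpj * band) */
        iterations++;
    }
    return iterations;
}
```

    Under this schedule a target of 644 units is reached after 127 calls,
    in line with the figure of about 130 iterations quoted in this commit
    message.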

    E.g. an x86_64 2.67GHz could have fitted in 613 individual small delays,
    but in reality should have been able to fit in a single delay 644 times
    longer, so underestimated by 31 steps. To reach the equivalent of 644
    small delays with the accelerating scheme now requires about 130
    iterations, so has
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Tested-by: Stephen Boyd
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Phil Carmody
     
  • The motivation for this patch series is that currently our OMAP calibrates
    itself using the trial-and-error binary chop fallback that some other
    architectures no longer need to perform. This is a lengthy process,
    taking 0.2s in an environment where boot time is of great interest.

    Patch 2/4 has two optimisations. Firstly, it replaces the initial
    repeated-doubling to find the relevant power of 2 with a tight loop that
    just does as much as it can in a jiffy. Secondly, it doesn't binary chop
    over an entire power of 2 range, it chooses a much smaller range based on
    how much it squeezed in, and failed to squeeze in, during the first stage.
    Both are significant optimisations, and bring our calibration down from
    23 jiffies to 5, and, in the process, often arrive at a more accurate lpj
    value.
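
    The second optimisation, a binary chop over a narrow range above a
    known underestimate, can be modelled as follows. This is a sketch under
    stated assumptions: chop_lpj() and its parameters are hypothetical, and
    the comparison against true_lpj stands in for "the delay took longer
    than one jiffy", which the real code has to measure.

```c
/*
 * Refine `lpj_base` (an underestimate) by probing successively halved
 * increments, starting at `loopadd`. Stops once the increment drops to
 * lpj_base >> precision_bits or below.
 */
unsigned long chop_lpj(unsigned long lpj_base, unsigned long loopadd,
                       unsigned long true_lpj, unsigned int precision_bits)
{
    unsigned long lpj = lpj_base;
    unsigned long chop_limit = lpj >> precision_bits;

    while (loopadd > chop_limit) {
        lpj += loopadd;
        if (lpj > true_lpj)     /* model of "overran a jiffy" */
            lpj -= loopadd;     /* too big, back the step out */
        loopadd >>= 1;
    }
    return lpj;
}
```

    For example, chop_lpj(3000000, 500000, 3403776, 8) needs six probes
    and returns 3390625. A narrower starting range means fewer probes, and
    each probe costs about a jiffy in the real routine.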

    The 'bands' and 'sub-logarithmic' growth may look over-engineered, but
    they only cost a small level of inaccuracy in the initial guess (for all
    architectures) in order to avoid the very large inaccuracies that appeared
    during testing (on x86_64 architectures, and presumably others with less
    metronomic operation). Note that due to the existence of the TSC and
    other timers, the x86_64 will not typically use this fallback routine, but
    I wanted to code defensively, able to cope with all kinds of processor
    behaviours and kernel command line options.

    Patch 3/4 is an additional trap for the nightmare scenario where the
    initial estimate is very inaccurate, possibly due to things like SMIs.
    It simply retries with a larger bound.

    Stephen said:

    : I tried this patch set out on an MSM7630.
    :
    : Before:
    :
    : Calibrating delay loop... 681.57 BogoMIPS (lpj=3407872)
    :
    : After:
    :
    : Calibrating delay loop... 680.75 BogoMIPS (lpj=3403776)
    :
    : But the really good news is calibration time dropped from ~247ms to ~56ms.
    : Sadly we won't be able to benefit from this should my udelay patches make
    : it into ARM because we would be using calibrate_delay_direct() instead (at
    : least on machines who choose to). Can we somehow reapply the logic behind
    : this to calibrate_delay_direct()? That would be even better, but this is
    : definitely a boot time improvement.
    :
    : Or maybe we could just replace calibrate_delay_direct() with this fallback
    : calculation? If __delay() is a thin wrapper around read_current_timer()
    : it should work just as well (plus patch 3 makes it handle SMIs). I'll try
    : that out.

    This patch:

    ... so that it can be modified more clinically.

    This is almost entirely cosmetic. The only change to the operation
    is that the global variable is only set once after the estimation is
    completed, rather than taking on all the intermediate values. However,
    there are no readers of that variable, so this change is unimportant.

    Signed-off-by: Phil Carmody
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Tested-by: Stephen Boyd
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Phil Carmody
     

11 Feb, 2011

1 commit

  • Fixes a hang when booting as dom0 under Xen, when jiffies can be
    quite large by the time the kernel init gets this far.
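
    Why a plain comparison of jiffies values can misbehave once the
    counter is large is easiest to see with a wrap-safe comparison next to
    a naive one. The two helpers below are hypothetical user-space
    functions. The second follows the signed-difference idea behind
    time_before_eq(), and it relies on the usual two's complement
    conversion from unsigned long to long.

```c
#include <stdbool.h>

/* Naive comparison: breaks when `deadline` has wrapped past zero. */
bool naive_before_eq(unsigned long now, unsigned long deadline)
{
    return now <= deadline;
}

/*
 * Wrap-safe comparison: look at the sign of the difference instead of
 * the raw magnitudes.
 */
bool wrapsafe_before_eq(unsigned long now, unsigned long deadline)
{
    return (long)(now - deadline) <= 0;
}
```

    With a start value six below the wrap point and a deadline ten ticks
    later, the deadline wraps to 4. The naive test then reports the
    deadline as already passed on the very first tick, while the wrap-safe
    test keeps the loop running until the deadline is really reached.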

    Signed-off-by: Tim Deegan
    [jbeulich@novell.com: !time_after() -> time_before_eq() as suggested by Jiri Slaby]
    Signed-off-by: Jan Beulich
    Cc: Jiri Slaby
    Cc: Jeremy Fitzhardinge
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Tim Deegan
     

26 Nov, 2009

1 commit

  • Limit the number of per cpu calibration messages by only
    printing out results for the first cpu to boot.

    Also, don't print "CPUx is down" as this is expected, and we
    don't need 4096 reminders... ;-)

    Signed-off-by: Mike Travis
    Cc: Heiko Carstens
    Cc: Roland Dreier
    Cc: Randy Dunlap
    Cc: Tejun Heo
    Cc: Andi Kleen
    Cc: Greg Kroah-Hartman
    Cc: Yinghai Lu
    Cc: David Rientjes
    Cc: Steven Rostedt
    Cc: Rusty Russell
    Cc: Hidetoshi Seto
    Cc: Jack Steiner
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Travis
     

28 Jul, 2008

1 commit

  • Rabin Vincent noticed that there's a stray KERN_INFO in the BogoMIPS printk:

    > Remove the extra KERN_INFO which causes this:
    > Calibrating delay loop... 179.40 BogoMIPS (lpj=897024)
    > - printk(KERN_INFO "%lu.%02lu BogoMIPS (lpj=%lu)\n",
    > - loops_per_jiffy/(500000/HZ),
    > - (loops_per_jiffy/(5000/HZ)) % 100, loops_per_jiffy);
    > + printk("%lu.%02lu BogoMIPS (lpj=%lu)\n",
    > + loops_per_jiffy/(500000/HZ),
    > + (loops_per_jiffy/(5000/HZ)) % 100, loops_per_jiffy);
    > }

    How about just using KERN_CONT and leaving the whitespace
    for a patch that does the entire file?
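
    The arithmetic in the quoted printk can be tried out in user space.
    struct bogomips and bogomips_from_lpj() are hypothetical names. The
    HZ value of 100 used in the example is inferred from the quoted output
    and is not stated in the message.

```c
/* The two numbers printed as "%lu.%02lu BogoMIPS". */
struct bogomips {
    unsigned long whole;        /* part before the decimal point */
    unsigned long hundredths;   /* two digits after the decimal point */
};

/* `hz` is assumed to be between 1 and 5000 so neither divisor is 0. */
struct bogomips bogomips_from_lpj(unsigned long lpj, unsigned int hz)
{
    struct bogomips b;

    b.whole = lpj / (500000 / hz);
    b.hundredths = (lpj / (5000 / hz)) % 100;
    return b;
}
```

    With hz set to 100, lpj=897024 gives 179 and 40, matching the
    "179.40 BogoMIPS (lpj=897024)" line quoted above.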

    Reported-by: Rabin Vincent

    Joe Perches
     

24 Jun, 2008

2 commits

  • As suggested by Ingo, remove all references to tsc from init/calibrate.c

    TSC is x86 specific, and using tsc in variable names in a generic file should
    be avoided. lpj_tsc is now called lpj_fine, since it is related to fine tuning
    of lpj value. Also tsc_rate_* is called timer_rate_*

    Signed-off-by: Alok N Kataria
    Cc: Arjan van de Ven
    Cc: Daniel Hecht
    Cc: Tim Mann
    Cc: Zach Amsden
    Cc: Sahil Rihan
    Signed-off-by: Ingo Molnar

    Alok Kataria
     
  • On the x86 platform we can use the value of tsc_khz computed during tsc
    calibration to calculate the loops_per_jiffy value. It's very important
    to keep the error in lpj values to a minimum, as any error in that may
    result in a kernel panic in check_timer. In a virtualization environment,
    on a highly overloaded host, the guest delay calibration may sometimes
    result in errors beyond the ~50% that timer_irq_works can handle,
    resulting in the guest panicking.

    Also makes some formatting changes to the lpj_setup code so that a single
    printk prints the bogomips value.

    We do this only for the boot processor because the APs can have
    different base frequencies or the BIOS might boot an AP at a different
    frequency.
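
    The conversion from a timer frequency to loops_per_jiffy is simple
    arithmetic, sketched here under the assumption that one delay loop
    corresponds to one TSC cycle. lpj_from_tsc_khz() is a hypothetical
    name and the frequency in the example is made up.

```c
#include <stdint.h>

/*
 * Cycles per jiffy = (kHz * 1000) / HZ. The multiplication is done in
 * 64 bits so that it cannot overflow on a 32-bit unsigned long.
 * `hz` must be non-zero.
 */
unsigned long lpj_from_tsc_khz(unsigned long tsc_khz, unsigned int hz)
{
    uint64_t cycles_per_sec = (uint64_t)tsc_khz * 1000;

    return (unsigned long)(cycles_per_sec / hz);
}
```

    For instance, a 3166354 kHz counter with HZ=100 gives
    lpj=31663540, with no timing loop and therefore no exposure to the
    host being overloaded while the guest calibrates.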

    Signed-off-by: Alok N Kataria
    Cc: Arjan van de Ven
    Cc: Daniel Hecht
    Cc: Tim Mann
    Cc: Zach Amsden
    Cc: Sahil Rihan
    Signed-off-by: Ingo Molnar

    Alok Kataria
     

07 Feb, 2008

2 commits

  • calibrate_delay() must be __cpuinit, not __{dev,}init.

    I've verified that this is correct for all users.

    While doing the latter, I also did the following cleanups:
    - remove pointless additional prototypes in C files
    - ensure all users #include

    This fixes the following section mismatches with CONFIG_HOTPLUG=n,
    CONFIG_HOTPLUG_CPU=y:

    WARNING: vmlinux.o(.text+0x1128d): Section mismatch: reference to .init.text.1:calibrate_delay (between 'check_cx686_slop' and 'set_cx86_reorder')
    WARNING: vmlinux.o(.text+0x25102): Section mismatch: reference to .init.text.1:calibrate_delay (between 'smp_callin' and 'cpu_coregroup_map')

    Signed-off-by: Adrian Bunk
    Cc: Ivan Kokshaysky
    Cc: Richard Henderson
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Christian Zankel
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • - All implementations can be __devinit

    - The function prototypes were in asm/timex.h but they all must be the same,
    so create a single declaration in linux/timex.h.

    - uninline the sparc64 version to match the other architectures

    - Don't bother #defining ARCH_HAS_READ_CURRENT_TIMER to a particular value.

    [ezk@cs.sunysb.edu: fix build]
    Cc: "David S. Miller"
    Cc: Haavard Skinnemoen
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

17 Oct, 2007

1 commit

  • Optionally add a boot delay after each kernel printk() call, crudely
    measured in milliseconds, with a maximum delay of 10 seconds per printk.

    Enable CONFIG_BOOT_PRINTK_DELAY=y and then add (e.g.):
    "lpj=loops_per_jiffy boot_delay=100"
    to the kernel command line.

    It has been useful in cases like "during boot, my machine just reboots or the
    screen goes black". By slowing down printk (and adding initcall_debug), we can
    usually see the last thing that happened before the lights went out, which is
    usually a valuable clue.

    [akpm@linux-foundation.org: not all architectures implement CONFIG_HZ]
    [akpm@linux-foundation.org: fix lots of stuff]
    [bunk@stusta.de: kernel/printk.c: make 2 variables static]
    [heiko.carstens@de.ibm.com: fix slow down printk on boot compile error]
    Signed-off-by: Randy Dunlap
    Signed-off-by: Dave Jones
    Signed-off-by: Adrian Bunk
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

15 Feb, 2007

1 commit

  • After Al Viro (finally) succeeded in removing the sched.h #include in module.h
    recently, it makes sense again to remove other superfluous sched.h includes.
    There are quite a lot of files which include it but don't actually need
    anything defined in there. Presumably these includes were once needed for
    macros that used to live in sched.h, but moved to other header files in the
    course of cleaning it up.

    To ease the pain, this time I did not fiddle with any header files and only
    removed #includes from .c-files, which tend to cause less trouble.

    Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
    arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
    allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
    configs in arch/arm/configs on arm. I also checked that no new warnings were
    introduced by the patch (actually, some warnings are removed that were emitted
    by unnecessarily included header files).

    Signed-off-by: Tim Schmielau
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

24 Jun, 2005

1 commit

  • Issue:
    Current tsc based delay_calibration can result in significant errors in
    loops_per_jiffy count when platform events like SMIs
    (System Management Interrupts that are non-maskable) are present. This could
    lead to a potential kernel panic(). This issue is becoming more visible with
    the 2.6 kernel (as default HZ is 1000) and on platforms with higher SMI
    handling latencies. During the boot time, SMIs are mostly used by BIOS (for
    things like legacy keyboard emulation).

    Description:
    The pseudocode for current delay calibration with tsc based delay looks like
    (0) Estimate a value for loops_per_jiffy
    (1) While (loops_per_jiffy estimate is not accurate enough)
    (2)   wait for jiffy transition (jiffy1)
    (3)   Note down current tsc (tsc1)
    (4)   loop until tsc becomes tsc1 + loops_per_jiffy
    (5)   check whether jiffy changed since jiffy1 or not and refine
          loops_per_jiffy estimate

    Consider the following cases
    Case 1:
    If SMIs happen between (2) and (3) above, we can end up with a
    loops_per_jiffy value that is too low. This results in shortened delays,
    and the kernel can panic() during boot (mostly at IOAPIC timer
    initialization in timer_irq_works(), as we don't have enough timer
    interrupts in a specified interval).

    Case 2:
    If SMIs happen between (3) and (4) above, then we can end up with a
    loops_per_jiffy value that is too high. And with current i386 code, too
    high an lpj value (greater than 17M) can result in an overflow in
    delay.c:__const_udelay(), again resulting in a shorter delay and panic().
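
    The 17M threshold can be reproduced with a little arithmetic, assuming
    that the overflow comes from a 32-bit multiplication of lpj by HZ/4.
    That factor is an assumption about delay.c and is not stated in this
    message. Both function names are hypothetical.

```c
#include <stdint.h>

/* 32-bit product lpj * (HZ/4); wraps modulo 2^32 on overflow. */
uint32_t scaled_lpj_32(uint32_t lpj, uint32_t hz)
{
    return lpj * (hz / 4);
}

/* Largest lpj for which the product still fits in 32 bits (hz >= 4). */
uint32_t max_safe_lpj(uint32_t hz)
{
    return UINT32_MAX / (hz / 4);
}
```

    With HZ=1000 the limit comes out at 17179869, roughly the 17M quoted
    above. An lpj of 18000000 would wrap to 205032704 under this model,
    a much smaller scaled value and hence a much shorter delay.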

    Solution:
    The patch below makes the calibration routine aware of asynchronous events
    like SMIs. We increase the delay calibration time and also identify any
    significant errors (greater than 12.5%) in the calibration and notify the
    user.
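
    One way to express the 12.5% criterion is to compare the spread
    between the upper and lower bound of a measurement with one eighth of
    the upper bound. This is a sketch of the idea and not necessarily the
    exact test used in the patch. rate_is_consistent() is a hypothetical
    name.

```c
#include <stdbool.h>

/*
 * Accept a measurement only if its upper and lower bounds are less
 * than 12.5% (1/8 of the upper bound) apart.
 */
bool rate_is_consistent(unsigned long rate_max, unsigned long rate_min)
{
    return rate_max >= rate_min &&
           (rate_max - rate_min) < (rate_max >> 3);
}
```

    A measurement disturbed by an SMI between the bracketing reads tends
    to have widely separated bounds and is rejected, while an undisturbed
    one passes and can contribute to the final value.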

    Patch below changes both i386 and x86-64 architectures to use this
    new and improved calibrate_delay_direct() routine.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venkatesh Pallipadi
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds