30 Jan, 2008

40 commits

  • lindent these files:
                                    errors   lines of code   errors/KLOC
    arch/x86/math-emu/                2236            9424         237.2
    arch/x86/math-emu/                 128            8706          14.7

    no other changes. No code changed:

        text    data     bss      dec     hex filename
     5589802  612739 3833856 10036397  9924ad vmlinux.before
     5589802  612739 3833856 10036397  9924ad vmlinux.after

    the intent of this patch is to ease the automated tracking of kernel
    code quality - it's just much easier for us to maintain it if every file
    in arch/x86 is supposed to be clean.

    NOTE: it is a known problem of lindent that it causes some style damage
    of its own, but it's a safe tool (well, except for the gcc array range
    initializers extension), so we did the bulk of the changes via lindent,
    and did the manual fixups in a followup patch.

    the resulting math-emu code has been tested by Thomas Gleixner on a real
    386 DX CPU as well, and it works fine.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Ingo Molnar
     
  • lindent the mach-voyager files to get rid of more than 300 style errors:

                                    errors   lines of code   errors/KLOC
    arch/x86/mach-voyager/ [old]       409            3729         109.6
    arch/x86/mach-voyager/ [new]        71            3678          19.3

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Ingo Molnar
     
  • clean up arch/x86/kernel/aperture_64.c printk()s.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Ingo Molnar
     
  • whitespace cleanup. No code changed:

        text    data     bss      dec     hex filename
        2080      76       4     2160     870 aperture_64.o.before
        2080      76       4     2160     870 aperture_64.o.after

                                    errors   lines of code   errors/KLOC
    arch/x86/kernel/aperture_64.c      114             299         381.2
    arch/x86/kernel/aperture_64.c        0             315             0

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Ingo Molnar
     
  • White space and coding style cleanup.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • White space and coding style cleanup.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • White space and coding style cleanup.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • White space and coding style cleanup.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • White space and coding style cleanup.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • White space and coding style cleanup.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • White space and coding style cleanup.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • White space and coding style cleanup.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • asm/cpufeature.h was already almost unified; this completes the job.

    Signed-off-by: H. Peter Anvin
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    H. Peter Anvin
     
  • Create , with common definitions suitable for assembly
    unification.

    Signed-off-by: H. Peter Anvin
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    H. Peter Anvin
     
  • local_irq_enable() is missing after sched_clock_idle_wakeup_event().

    Signed-off-by: Hiroshi Shimamoto
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Hiroshi Shimamoto
     
  • do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
    in HLT too, not just when going through the ACPI methods.

    (the ACPI idle code already does this.)

    [ update the 64-bit side too, as noticed by Jiri Slaby. ]

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Ingo Molnar
     
  • scale the sched_clock() cyc_2_nsec scaling factor according to
    CPU frequency changes.

    [ mingo@elte.hu: simplified it and fixed it for SMP. ]
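
A minimal sketch of why the scaling factor has to follow the CPU frequency, assuming a fixed-point conversion of the form ns = (cycles * scale) >> shift. The names `cyc2ns_scale()` and `cycles_to_ns()` and the shift value of 10 are assumptions made for this illustration, not taken from the patch; `cpu_khz` must be non-zero.

```c
#include <stdint.h>

#define CYC2NS_SHIFT 10 /* assumed fixed-point precision for this sketch */

/*
 * Nanoseconds per cycle, scaled by 2^CYC2NS_SHIFT:
 *   scale = (10^6 << shift) / cpu_khz
 * The caller must pass a non-zero cpu_khz.
 */
static uint64_t cyc2ns_scale(uint64_t cpu_khz)
{
    return (UINT64_C(1000000) << CYC2NS_SHIFT) / cpu_khz;
}

/* Convert a cycle count to nanoseconds using a precomputed scale. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t scale)
{
    return (cycles * scale) >> CYC2NS_SHIFT;
}
```

In this model, 2000 cycles at 2 GHz are 1000 ns, but the same 2000 cycles at 1 GHz are 2000 ns. A scale computed once at boot would keep reporting the first value after a frequency change, so the scale has to be recomputed whenever the frequency changes.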

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Guillaume Chazarain
     
  • cf http://lkml.org/lkml/2007/10/3/41

    To summarize: on Linux, SA_ONSTACK decides whether you are already on the
    signal stack based on the value of the SP at the time of a signal. If
    you are not already inside the range, you are not "on the signal stack"
    and so the new signal handler frame starts over at the base of the signal
    stack.

    sigaltstack (and sigstack before it) was invented in BSD. There, the
    SA_ONSTACK behavior has always been different. It uses a kernel state
    flag to decide, rather than the SP value. When you first take an
    SA_ONSTACK signal and switch to the alternate signal stack, it sets the
    SS_ONSTACK flag in the thread's sigaltstack state in the kernel.
    Thereafter you are "on the signal stack" and don't switch SP before
    pushing a handler frame no matter what the SP value is. Only when you
    sigreturn from the original handler context do you clear the SS_ONSTACK
    flag so that a new handler frame will start over at the base of the
    alternate signal stack.

    The undesirable effect of the Linux behavior is that an overflow of the
    alternate signal stack can not only go undetected, but lead to a ring
    buffer effect of clobbering the original handler frame at the base of the
    signal stack for each successive signal that comes just after the
    overflow. This is what Shi Weihua's test case demonstrates. Normally
    this does not come up because of the signal mask, but the test case uses
    SA_NODEFER for its SIGSEGV handler.

    The other subtle part of the existing Linux semantics is that a simple
    longjmp out of a signal handler serves to take you off the signal stack
    in a safe and reliable fashion without having used sigreturn (nor having
    just returned from the handler normally, which means the same). After
    the longjmp (or even informal stack switching not via any proper libc or
    kernel interface), the alternate signal stack stands ready to be used
    again.

    A paranoid program would allocate a PROT_NONE red zone around its
    alternate signal stack. Then a small overflow would trigger a SIGSEGV in
    handler setup, and be fatal (core dump) whether or not SIGSEGV is
    blocked. As with thread stack red zones, that cannot catch all overflows
    (or underflows). e.g., a local array as large as page size allocated in
    a function called from a handler, but not actually touched before more
    calls push more stack, could cause an overflow that silently pushes into
    some unrelated allocated pages.
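
A red zone of this kind could be set up from user space roughly as follows. This is a sketch under stated assumptions: `install_guarded_altstack()` is a hypothetical helper name, one guard page is placed on each side, and only the standard `mmap()`, `mprotect()`, `sysconf()` and `sigaltstack()` calls are used.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#endif
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate an alternate signal stack of at least
 * `size` bytes, surrounded by one PROT_NONE guard page on each side,
 * and install it with sigaltstack(). Returns the usable base address,
 * or NULL on failure.
 */
static void *install_guarded_altstack(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    size_t pg = (size_t)page;
    size_t usable = (size + pg - 1) / pg * pg; /* round up to whole pages */
    size_t total = usable + 2 * pg;            /* plus the two guard pages */

    /* Map the whole region inaccessible, then open up only the middle. */
    char *map = mmap(NULL, total, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return NULL;
    if (mprotect(map + pg, usable, PROT_READ | PROT_WRITE) != 0) {
        munmap(map, total);
        return NULL;
    }

    stack_t ss = { .ss_sp = map + pg, .ss_size = usable, .ss_flags = 0 };
    if (sigaltstack(&ss, NULL) != 0) {
        munmap(map, total);
        return NULL;
    }
    return map + pg;
}
```

As the text notes, the guard pages only catch overflows that actually touch them; a large untouched local array can still step over a single guard page.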

    The BSD behavior does not do anything in particular about overflow. But
    it does at least avoid the wraparound or "ring buffer effect", so you'll
    just get a straightforward all-out overflow down your address space past
    the low end of the alternate signal stack. I don't know what the BSD
    behavior is for longjmp out of an SA_ONSTACK handler.

    The POSIX wording relating to sigaltstack is pretty minimal. I don't
    think it speaks to this issue one way or another. (The program that
    overflows its stack is clearly in undefined behavior territory of one
    sort or another anyhow.)

    Given the longjmp issue and the potential for highly subtle complications
    in existing programs relying on this in arcane ways deep in their code, I
    am very dubious about changing the behavior to the BSD style persistent
    flag. I think Shi Weihua's patches have a similar effect by tracking the
    SP used in the last handler setup.

    I think it would be sensible for the signal handler setup code to detect
    when it would itself be causing a stack overflow. Maybe something like
    the following patch (untested). This issue exists in the same way on all
    machines, so ideally they would all do a similar check.

    When it's the handler function itself or its callees that cause the
    overflow, rather than the signal handler frame setup alone crossing the
    boundary, this still won't help. But I don't see any way to distinguish
    that from the valid longjmp case.
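
The check described above can be modeled in user-space C. This is an illustrative sketch, not the patch itself: `struct altstack` and `frame_would_overflow()` are hypothetical names, `on_sig_stack()` here is a local model of the SP-range test, and the stack is assumed to grow downward.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a thread's alternate signal stack. */
struct altstack {
    uintptr_t base; /* lowest address of the stack */
    size_t size;    /* size in bytes */
};

/* Linux-style test: "on the signal stack" means SP lies inside the range. */
static bool on_sig_stack(const struct altstack *ss, uintptr_t sp)
{
    return sp > ss->base && sp - ss->base <= ss->size;
}

/*
 * True when we are already on the alternate stack and pushing a handler
 * frame of frame_size bytes would move SP below the base of the stack,
 * i.e. the frame setup itself would cause the overflow.
 */
static bool frame_would_overflow(const struct altstack *ss, uintptr_t sp,
                                 size_t frame_size)
{
    return on_sig_stack(ss, sp) && sp - ss->base < frame_size;
}
```

When SP is outside the range, the model reports no overflow because the new frame would start again at the top of the alternate stack. That is exactly the case that cannot be told apart from a legitimate longjmp out of a handler.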

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Roland McGrath
     
  • add the DMI strings provided by Islam Amer, for
    the Compaq Presario V6000 (Quanta/30B7).

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Ingo Molnar
     
  • make io_delay=0xed the default. This frees up port 0x80 which is
    a debug port on some machines and locks up certain laptops.

    Testing only for now. Try the io_delay=0x80 boot option if this does not
    work for you.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Ingo Molnar
     
  • various changes to the in_p/out_p delay details:

    - add the io_delay=none method
    - make each method selectable from the kernel config
    - simplify the delay code a bit by getting rid of an indirect function call
    - add the /proc/sys/kernel/io_delay_type sysctl
    - change 'io_delay=standard|alternate' to io_delay=0x80 and io_delay=0xed
    - make the io delay config not depend on CONFIG_DEBUG_KERNEL
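
The selectable methods listed above can be pictured as a small parser for the `io_delay=` values. This is a user-space sketch only: the enum, its constant names, and the helpers `parse_io_delay()` and `io_delay_port()` are hypothetical and do not reproduce the kernel's code.

```c
#include <string.h>

/* Hypothetical names for the four methods mentioned in the commit. */
enum io_delay_type {
    IO_DELAY_0X80,
    IO_DELAY_0XED,
    IO_DELAY_UDELAY,
    IO_DELAY_NONE,
    IO_DELAY_INVALID
};

/* Map the text after "io_delay=" to a method. */
static enum io_delay_type parse_io_delay(const char *arg)
{
    if (!arg)
        return IO_DELAY_INVALID;
    if (!strcmp(arg, "0x80"))
        return IO_DELAY_0X80;
    if (!strcmp(arg, "0xed"))
        return IO_DELAY_0XED;
    if (!strcmp(arg, "udelay"))
        return IO_DELAY_UDELAY;
    if (!strcmp(arg, "none"))
        return IO_DELAY_NONE;
    return IO_DELAY_INVALID;
}

/* Port written by the port-based methods, or -1 if no port is touched. */
static int io_delay_port(enum io_delay_type type)
{
    switch (type) {
    case IO_DELAY_0X80:
        return 0x80;
    case IO_DELAY_0XED:
        return 0xed;
    default:
        return -1;
    }
}
```

Selecting the method with a plain `switch` on a stored type, rather than through a function pointer, is one way to avoid an indirect function call on each delay.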

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Tested-by: "David P. Reed"

    Ingo Molnar
     
  • x86: provide a DMI based port 0x80 I/O delay override.

    Certain (HP) laptops experience trouble from our port 0x80 I/O delay
    writes. This patch provides for a DMI based switch to the "alternate
    diagnostic port" 0xed (as used by some BIOSes as well) for these.

    David P. Reed confirmed that port 0xed works for him and provides a
    proper delay. The symptoms of _not_ working are a hanging machine,
    with "hwclock" use being a direct trigger.

    Earlier versions of this attempted to simply use udelay(2), with the
    2 being a value tested to be a nicely conservative upper bound with
    help from many on the linux-kernel mailing list, but that approach has
    two problems.

    First, pre-loops_per_jiffy calibration (which is post PIT init while
    some implementations of the PIT are actually one of the historically
    problematic devices that need the delay) udelay() isn't particularly
    well-defined. We could initialise loops_per_jiffy conservatively (and
    based on CPU family so as to not unduly delay old machines) which
    would sort of work, but...

    Second, delaying isn't the only effect that a write to port 0x80 has.
    It's also a PCI posting barrier which some devices may be explicitly
    or implicitly relying on. Alan Cox did a survey and found evidence
    that additionally some drivers may be racy on SMP without the bus
    locking outb.

    Switching to an inb() makes the timing too unpredictable and as such,
    this DMI based switch should be the safest approach for now. Any more
    invasive changes should get more rigorous testing first. It's moreover
    only very few machines with the problem and a DMI based hack seems
    to fit that situation.

    This also introduces a command-line parameter "io_delay" to override
    the DMI based choice again:

    io_delay=standard|alternate

    where "standard" means using the standard port 0x80 and "alternate"
    port 0xed.

    This retains the udelay method as a config (CONFIG_UDELAY_IO_DELAY) and
    command-line ("io_delay=udelay") choice for testing purposes as well.

    This does not change the io_delay() in the boot code which is using
    the same port 0x80 I/O delay but those do not appear to be a problem
    as David P. Reed reported the problem was already gone after using the
    udelay version. He moreover reported that booting with "acpi=off" also
    fixed things and seeing as how ACPI isn't touched until after this DMI
    based I/O port switch I believe it's safe to leave the ones in the boot
    code be.

    The DMI strings from David's HP Pavilion dv9000z are in there already
    and we need to get/verify the DMI info from other machines with the
    problem, notably the HP Pavilion dv6000z.

    This patch is partly based on earlier patches from Pavel Machek and
    David P. Reed.

    Signed-off-by: Rene Herman
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Rene Herman
     
  • s2ram recently became useful here, except for the kernel's annoying
    habit of disabling my P4's perfectly good TSC.

    [ 107.894470] CPU 1 is now offline
    [ 107.894474] SMP alternatives: switching to UP code
    [ 107.895832] CPU0 attaching sched-domain:
    [ 107.895836] domain 0: span 1
    [ 107.895838] groups: 1
    [ 107.896097] CPU1 is down
    [ 3.726156] Intel machine check architecture supported.
    [ 3.726165] Intel machine check reporting enabled on CPU#0.
    [ 3.726167] CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
    [ 3.726170] CPU0: Thermal monitoring enabled
    [ 3.726175] Back to C!
    [ 3.726708] Force enabled HPET at resume
    [ 3.726775] Enabling non-boot CPUs ...
    [ 3.727049] CPU0 attaching NULL sched-domain.
    [ 3.727165] SMP alternatives: switching to SMP code
    [ 3.727858] Booting processor 1/1 eip 3000
    [ 3.727862] CPU 1 irqstacks, hard=b042f000 soft=b042d000
    [ 3.738173] Initializing CPU#1
    [ 3.798912] Calibrating delay using timer specific routine.. 5986.12 BogoMIPS (lpj=2993061)
    [ 3.798920] CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 00000000
    [ 3.798931] CPU: Trace cache: 12K uops, L1 D cache: 8K
    [ 3.798934] CPU: L2 cache: 512K
    [ 3.798936] CPU: Physical Processor ID: 0
    [ 3.798938] CPU: After all inits, caps: bfebfbff 00000000 00000000 0000b080 00004400 00000000 00000000 00000000
    [ 3.798946] Intel machine check architecture supported.
    [ 3.798952] Intel machine check reporting enabled on CPU#1.
    [ 3.798955] CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
    [ 3.798959] CPU1: Thermal monitoring enabled
    [ 3.799161] CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 09
    [ 3.799187] checking TSC synchronization [CPU#0 -> CPU#1]:
    [ 3.819181] Measured 63588552840 cycles TSC warp between CPUs, turning off TSC clock.
    [ 3.819184] Marking TSC unstable due to: check_tsc_sync_source failed.

    If check_tsc_warp() is called after initial boot, and the TSC has in the
    meantime been set (BIOS, user, silicon, elves) to a value lower than the
    last stored/stale value, we blame the TSC. Reset to pristine condition
    after every test.
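
A minimal user-space model of the failure and the fix. `struct warp_state`, `warp_sample()` and `warp_reset()` are hypothetical names, not the kernel's; the model only tracks the largest backward step between successive TSC readings.

```c
#include <stdint.h>

/* Hypothetical state kept between TSC readings during a sync test. */
struct warp_state {
    uint64_t last_tsc; /* most recent reading seen */
    uint64_t max_warp; /* largest backward step observed */
};

/* Record one reading; a reading lower than the previous one is a warp. */
static void warp_sample(struct warp_state *st, uint64_t now)
{
    if (now < st->last_tsc) {
        uint64_t warp = st->last_tsc - now;
        if (warp > st->max_warp)
            st->max_warp = warp;
    }
    st->last_tsc = now;
}

/* Return to pristine condition so a later test starts with no history. */
static void warp_reset(struct warp_state *st)
{
    st->last_tsc = 0;
    st->max_warp = 0;
}
```

If the state is not reset and the TSC restarts from a low value across suspend, the first sample of the next test is compared against the stale `last_tsc` and shows up as a huge warp, even though the counters are in sync.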

    Signed-off-by: Mike Galbraith
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Mike Galbraith
     
  • Document the fact that __save_processor_state() has to save all CPU
    registers referred to by the kernel in case a different kernel is
    used to load and restore a hibernation image containing it.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Rafael J. Wysocki
     
  • Michael Opdenacker reported:

    For backward compatibility with earlier (< 2.6.24) kernels,
    arch/i386/boot/bzImage or arch/x86_64/boot/bzImage symbolic links to
    arch/x86/boot/bzImage are created when you build an x86 kernel. The
    arch/i386 or arch/x86_64 directories are then created for this purpose
    only.

    Issue: these generated directories and symbolic links are *not cleaned
    up* when you run "make mrproper" (and thus "make distclean"). This
    disturbs the production of patches, because the source tree is left with
    generated files and directories.

    Sam has an alternative fix:

    The directory is killed during make clean as opposed to make mrproper.

    Reported-by: Michael Opdenacker
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Sam Ravnborg
     
  • Current idle time in kstat is based on jiffies and is coarse grained.
    tick_sched.idle_sleeptime is making some attempt to keep track of idle time
    in a fine grained manner. But, it is not handling the time spent in
    interrupts fully.

    Make tick_sched.idle_sleeptime accurate with respect to time spent on
    handling interrupts and also add tick_sched.idle_lastupdate, which keeps
    track of last time when idle_sleeptime was updated.

    These statistics will be crucial for the cpufreq-ondemand governor, which
    can shed some of the conservative guard band that it uses today while
    setting the frequency. The ondemand changes that use the exact idle time
    are coming soon.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Venki Pallipadi
     
  • I recently noticed on one of my boxes that when synched with an NTP
    server, the drift value reported for the system was ~283ppm. While in
    some cases, clock hardware can be that bad, it struck me as unusual as
    the system was using the acpi_pm clocksource, which is one of the more
    trustworthy and accurate clocksources on x86 hardware.

    I brought up another system and let it sync to the same NTP server, and
    I noticed a similar drift of roughly 280 ppm.

    In looking at the code, I found that the acpi_pm's constant frequency
    was being computed correctly at boot-up, however once the system was up,
    even without the ntp daemon running, the clocksource's frequency was
    being modified by the clocksource_adjust() function.

    Digging deeper, I realized that in the code that keeps track of how much
    the clocksource is skewing from the ntp desired time, we were using
    different lengths to establish how long a time interval was.

    The clocksource was being set up with the following interval:

        NTP_INTERVAL_LENGTH = NSEC_PER_SEC / NTP_INTERVAL_FREQ

    While the ntp code was using the tick_length_base value:

        tick_length_base ~= (tick_usec * NSEC_PER_USEC * USER_HZ)
                            / NTP_INTERVAL_FREQ

    The subtle difference is:

        (tick_usec * NSEC_PER_USEC * USER_HZ) != NSEC_PER_SEC

    This difference in calculation was causing the clocksource correction
    code to apply a correction factor to the clocksource so that the two
    intervals were the same. However, this makes the actual frequency of
    the clocksource incorrect. I believe this difference would
    affect all clocksources, although to differing degrees depending on the
    clocksource resolution.
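
To see how a small disagreement about the length of a second turns into a drift figure, the relative difference can be expressed in parts per million. The helper name `interval_mismatch_ppm()` and the input values in the usage note are hypothetical, chosen only to illustrate the arithmetic; they are not measurements from the report.

```c
#include <stdint.h>

/*
 * Relative difference between two interval lengths, in parts per
 * million (truncated toward zero). Both lengths are in nanoseconds and
 * ref_ns must be non-zero.
 */
static int64_t interval_mismatch_ppm(int64_t actual_ns, int64_t ref_ns)
{
    return (actual_ns - ref_ns) * INT64_C(1000000) / ref_ns;
}
```

For example, if one side assumed 1,000,283,000 ns per second and the other 1,000,000,000 ns, the function returns 283: a mismatch of only 283 microseconds per second is already enough to account for a drift on the order of the one described.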

    The issue was introduced when my HZ free ntp patch landed in 2.6.21-rc1,
    so my apologies for the mistake, and for not noticing it until now.

    The following patch corrects the clocksource's initialization code so
    it uses the same interval length as the code in ntp.c. After applying
    this patch, the drift value for the same system went from ~283ppm to
    only 2.635ppm.

    I believe this patch to be good, however it does affect all arches and
    I've only tested on x86, so some caution is advised. I do think it would
    be a likely candidate for a stable 2.6.24.x release.

    Any thoughts or feedback would be appreciated.

    Signed-off-by: John Stultz
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    john stultz
     
  • Looks like IRQ 31 is assigned to timer 3, even without the patch!
    I wonder who wrote the number 31. But the manual says that it is
    zero by default.

    I think we should check whether the timer has been allocated an IRQ before
    proceeding to assign one to it. Here is a patch that does this.

    Signed-off-by: Balaji Rao
    Tested-by: Yinghai Lu
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Balaji Rao
     
  • The userspace API for the HPET (see Documentation/hpet.txt) did not work. The
    HPET_IE_ON ioctl was failing as there was no IRQ assigned to the timer
    device. This patch fixes it by allocating IRQs to timer blocks in the HPET.

    arch/x86/kernel/hpet.c | 13 +++++--------
    drivers/char/hpet.c | 45 ++++++++++++++++++++++++++++++++++++++-------
    include/linux/hpet.h | 2 +-
    3 files changed, 44 insertions(+), 16 deletions(-)

    Signed-off-by: Balaji Rao
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Balaji Rao
     
  • detect zero event-device multiplicators - they then cause
    division-by-zero crashes if a clockevent has been initialized
    incorrectly.
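
A sketch of where a zero multiplier bites, assuming a tick-to-nanosecond conversion of the form ns = (latch << shift) / mult. The struct and function names are hypothetical, and returning an error is just one possible reaction to the bad value; the actual patch may handle it differently.

```c
#include <stdint.h>

/* Hypothetical stand-in for the conversion fields of a clock event device. */
struct clock_event_model {
    uint32_t mult;
    uint32_t shift;
};

/*
 * Convert a device tick count to nanoseconds. Returns 0 on success and
 * stores the result in *ns, or -1 if the multiplier is zero, which
 * would otherwise divide by zero.
 */
static int delta_to_ns(const struct clock_event_model *evt, uint32_t latch,
                       uint64_t *ns)
{
    if (evt->mult == 0)
        return -1; /* device was not initialized correctly */

    *ns = ((uint64_t)latch << evt->shift) / evt->mult;
    return 0;
}
```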

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Ingo Molnar
     
  • The following scenario might leave PIT as a dysfunctional clock source:

    - PIT is registered as clocksource
    - PM_TIMER is registered as clocksource and enables highres/dyntick mode
    - PIT is switched to oneshot mode
      -> now the readout of PIT is bogus, but the user might select PIT
         via the sysfs override, which would break the box as the time
         readout is unusable.

    Unregister the PIT clocksource when the PIT clock event device is switched
    into shutdown / oneshot mode.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • On x86 the PIT might become an unusable clocksource. Add an unregister
    function to provide a possibility to remove the PIT from the list of
    available clock sources.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • PIT clocksource is registered unconditionally even when HPET is enabled
    or when PIT is replaced by the local APIC timer. In both cases PIT can
    not be used as it is stopped and the readout would be stale.

    Prevent registering PIT in those cases.

    patch depends on:

    x86: offer is_hpet_enabled() on !CONFIG_HPET_TIMER too

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • offer is_hpet_enabled() on !CONFIG_HPET_TIMER too.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Ingo Molnar
     
  • This way it checks if the clocks are synchronized between CPUs too.
    This might be able to detect slowly drifting TSCs which only
    go wrong over a longer time.

    Signed-off-by: Andi Kleen
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Andi Kleen
     
  • clocksource_watchdog can use a deferrable timer - reduces wakeups from
    idle per second.

    Signed-off-by: Parag Warudkar
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Parag Warudkar
     
  • - getnstimeofday() was just a wrapper around __get_realtime_clock_ts()
    - Replace calls to __get_realtime_clock_ts() by calls to getnstimeofday()
    - Fix bogus reference to get_realtime_clock_ts(), which never existed

    Signed-off-by: Geert Uytterhoeven
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Geert Uytterhoeven
     
  • Signed-off-by: Atsushi Nemoto
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Atsushi Nemoto
     
  • clean up tick-broadcast.c

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • I was confused by FSEC = 10^15 NSEC statement, plus small whitespace
    fixes. When there's copyright, there should be GPL.

    Signed-off-by: Pavel Machek
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Pavel Machek