26 Sep, 2006

40 commits

  • Nobody has been setting the mismatch counter and the ifdef was never
    set so remove it.
    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • IO-APIC or local APIC can only be disabled at runtime anyways and
    Kconfig has forced these options on for a long time now.

    The Kconfigs are kept only now for the benefit of the shared acpi
    boot.c code.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Signed-off-by: Andi Kleen

    Andi Kleen
     
  • And move the comment to a proper place.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • - Move them to a pure assembly file. Previously they were in
    a C file that only consisted of inline assembly. Doing it in pure
    assembler is much nicer.
    - Add a frame.i include with FRAME/ENDFRAME macros to easily
    add frame pointers to assembly functions
    - Add dwarf2 annotation to them so that the new dwarf2 unwinder
    doesn't get stuck on them
    - Random cleanups

    Includes feedback from Jan Beulich and a UML build fix from Andrew
    Morton.

    Cc: jbeulich@novell.com
    Cc: jdike@addtoit.com
    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • LOCK_PREFIX is replaced by nops on UP systems, so it has to be a special
    macro. Previously this was only possible from C. Allow it for pure
    assembly files too. Similar to earlier x86-64 patch.
    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Previously it didn't align. Use the same one as the C compiler
    in blended mode, which is good for K8 and Core2 and doesn't hurt
    on P4.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • rwlocks are now out of line, so it nearly never triggers. Also it was
    incompatible with the new dwarf2 unwinder because it had unannotatable
    push/pops.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • - Move the slow path fallbacks to their own assembly files
    This makes them much easier to read and is needed for the next change.
    - Add CFI annotations for unwinding (XXX need review)
    - Remove constant case which can never happen with out of line spinlocks
    - Use patchable LOCK prefixes
    - Don't use lock sections anymore for inline code because they can't
    be expressed by the unwinder (this adds one taken jump to the lock
    fast path)

    Cc: jbeulich@novell.com

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Use knowledge about EFLAGS layout (bits 22:63 are always 0) to distinguish
    an EFLAGS word from a kernel address in the spin lock stack frame.

    Signed-off-by: Andi Kleen

    Andi Kleen
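
    The check described above can be sketched in plain C. This is a minimal
    illustration, not the kernel code: the helper names and the two-slot
    frame layout are assumptions; only the "bits 22:63 are zero" rule comes
    from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* From the commit text: bits 22..63 of a saved EFLAGS word are always 0,
 * while a 64-bit kernel address has high bits set. */
#define EFLAGS_HIGH_SHIFT 22

/* True if the word could be a saved EFLAGS value (no bits above bit 21). */
static bool looks_like_eflags(uint64_t word)
{
    return (word >> EFLAGS_HIGH_SHIFT) == 0;
}

/* Hypothetical helper: given the first two words of a spin lock stack
 * frame, return the one that is a kernel address, or 0 if neither is. */
static uint64_t pick_return_address(const uint64_t *sp)
{
    assert(sp != NULL);
    if (!looks_like_eflags(sp[0]))
        return sp[0];
    if (!looks_like_eflags(sp[1]))
        return sp[1];
    return 0;
}
```

    The same function works whether the flags word was pushed before or
    after the return address, which is the point of the trick.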
     
  • This ports the algorithm from x86-64 (with improvements) to i386.
    Previously this only worked for frame pointer enabled kernels.
    But spinlocks have a very simple stack frame that can be manually
    analyzed. Do this.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Based on a patch from Frank van Maarseveen, but extended.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • A few trivial spelling and grammar mistakes picked up in
    "arch/x86_64/aperture.c", "arch/x86_64/crash.c" and
    "arch/x86_64/apic.c". I think all are correct fixes but am ever aware
    of my fallibility :o) This is my first patch submission so all
    feedback is appreciated, esp. WRT CCing to Linus, Andi and
    trivial@kernel.org, is this correct? And which is the most appropriate
    kernel version to diff against? If any.

    Should apply cleanly to 2.6.18-rc1

    Signed-off-by: Adam Henley
    Signed-off-by: Andi Kleen

    - adam

    Adam Henley
     
  • Virtual addresses don't belong in kernel logs for non-debugging use.

    Cc: clemens@ladisch.de
    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Hello,

    Following my discussion with Andi, here is a patch that introduces
    two new TIF flags to simplify the context switch code in __switch_to().
    The idea is to minimize the number of cache lines accessed in the common
    case, i.e., when neither the debug registers nor the I/O bitmap are used.

    This patch covers the x86-64 modifications. A patch for i386 follows.

    Changelog:
    - add TIF_DEBUG to track when debug registers are active
    - add TIF_IO_BITMAP to track when I/O bitmap is used
    - modify __switch_to() to use the new TIF flags

    : eranian@hpl.hp.com

    Signed-off-by: Andi Kleen

    Stephane Eranian
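
    The idea of the changelog can be sketched as a single flag test that
    decides whether the rare slow path is needed. This is a simplified
    illustration: the bit positions, the struct and the helper name are
    assumptions, and the real __switch_to() does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit positions; the real values live in the kernel headers. */
#define TIF_DEBUG_SKETCH      21
#define TIF_IO_BITMAP_SKETCH  22
#define TIF_DEBUG_MASK        (1UL << TIF_DEBUG_SKETCH)
#define TIF_IO_BITMAP_MASK    (1UL << TIF_IO_BITMAP_SKETCH)
#define TIF_SLOW_SWITCH_MASK  (TIF_DEBUG_MASK | TIF_IO_BITMAP_MASK)

/* Stand-in for the per-thread flags word. */
struct task_sketch {
    unsigned long flags;
};

/* One test on the flags words replaces separate looks at the debug
 * registers and the I/O bitmap, so the common case touches less memory. */
static bool needs_slow_switch(const struct task_sketch *prev,
                              const struct task_sketch *next)
{
    assert(prev != NULL && next != NULL);
    return ((prev->flags | next->flags) & TIF_SLOW_SWITCH_MASK) != 0;
}
```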
     
  • No need to include it from entry.S
    Drop all the #ifdef __ASSEMBLY__

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • For NUMA optimization and some other algorithms it is useful to have a fast
    way to get the current CPU and node numbers in user space.

    x86-64 added a fast way to do this in a vsyscall. This adds a generic
    syscall for other architectures to make it a generic portable facility.

    I expect some of them will also implement it as a faster vsyscall.

    The cache is an optimization for the x86-64 vsyscall. Since what the
    syscall returns is an approximation anyway and user space often wants
    very fast results, it can be cached for some time. The normal methods
    to get this information in user space are relatively slow.

    The vsyscall is in a better position to manage the cache because it has direct
    access to a fast time stamp (jiffies). For the generic syscall the cache
    doesn't help much, but a valid argument is enforced to keep programs
    portable.

    I only added an i386 syscall entry for now. Other architectures can follow
    as needed.

    AK: Also added some cleanups from Andrew Morton

    Signed-off-by: Andi Kleen

    Andi Kleen
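
    The caching scheme described above can be sketched as follows. This is
    a user-space model, not the kernel interface: the cache layout, the
    function names and the fixed values returned by the slow path are all
    assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache layout; the real one is opaque to callers. */
struct getcpu_cache_sketch {
    unsigned long stamp;  /* time stamp at which the entry was filled */
    unsigned cpu;
    unsigned node;
    int valid;            /* 0 until the first fill */
};

/* Counts slow-path calls so the effect of the cache is observable. */
static unsigned slow_lookups;

/* Stand-in for the expensive way of finding the CPU and node. */
static void slow_lookup_sketch(unsigned *cpu, unsigned *node)
{
    slow_lookups++;
    *cpu = 3;
    *node = 1;
}

/* Return cached numbers while the time stamp is unchanged; otherwise take
 * the slow path and refill the cache. The cache pointer may be NULL. */
static void getcpu_sketch(unsigned *cpu, unsigned *node,
                          struct getcpu_cache_sketch *cache,
                          unsigned long now)
{
    assert(cpu != NULL && node != NULL);

    if (cache != NULL && cache->valid && cache->stamp == now) {
        *cpu = cache->cpu;
        *node = cache->node;
        return;
    }

    slow_lookup_sketch(cpu, node);
    if (cache != NULL) {
        cache->stamp = now;
        cache->cpu = *cpu;
        cache->node = *node;
        cache->valid = 1;
    }
}
```

    A stale answer is acceptable here because, as the message notes, the
    result is only an approximation: the thread may migrate right after
    the call returns.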
     
  • This patch adds a vgetcpu vsyscall which, depending on the CPU's RDTSCP
    capability, uses either RDTSCP or CPUID to obtain the CPU and node
    numbers and pass them to the program.

    AK: Lots of changes over Vojtech's original code:
    Better prototype for vgetcpu()
    It's better to pass the cpu / node numbers as separate arguments
    to avoid mistakes when going from SMP to NUMA.
    Also add a fast time stamp based cache using a user supplied
    argument to speed things up further.
    Use fast method from Chuck Ebbert to retrieve node/cpu from
    GDT limit instead of CPUID
    Made sure RDTSCP init is always executed after node is known.
    Drop printk

    Signed-off-by: Vojtech Pavlik
    Signed-off-by: Andi Kleen

    Vojtech Pavlik
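
    Both the RDTSCP auxiliary value and the GDT limit carry the CPU and
    node numbers in a single word, which the vsyscall then splits into the
    two separate arguments. A minimal sketch of such packing follows; the
    12-bit CPU field and the helper names are assumptions for illustration,
    not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed split: low 12 bits hold the CPU number, the rest the node. */
#define CPU_BITS_SKETCH 12u
#define CPU_MASK_SKETCH ((1u << CPU_BITS_SKETCH) - 1u)

/* Combine CPU and node numbers into one 32-bit word. */
static uint32_t pack_cpu_node(unsigned cpu, unsigned node)
{
    assert(cpu <= CPU_MASK_SKETCH);
    assert(node <= (UINT32_MAX >> CPU_BITS_SKETCH));
    return ((uint32_t)node << CPU_BITS_SKETCH) | (uint32_t)cpu;
}

/* Split the word again; either output pointer may be null, mirroring the
 * "separate arguments" prototype described in the message. */
static void unpack_cpu_node(uint32_t packed, unsigned *cpu, unsigned *node)
{
    if (cpu)
        *cpu = packed & CPU_MASK_SKETCH;
    if (node)
        *node = packed >> CPU_BITS_SKETCH;
}
```

    Keeping the two numbers as separate outputs means a program written
    for SMP does not silently misread a combined value on a NUMA machine.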
     
  • This patch adds initialization of the RDTSCP auxiliary values to CPU numbers
    to time.c. If RDTSCP is available, the MSRs are written with the respective
    values. It can be later used to initialize per-cpu timekeeping variables.

    AK: Some cleanups. Move externs into headers and fix CPU hotplug.

    Signed-off-by: Vojtech Pavlik
    Signed-off-by: Andi Kleen

    Vojtech Pavlik
     
  • This patch adds macros for reading tsc via the RDTSCP instruction, as well
    as writing the auxiliary MSR read by RDTSCP to msr.h

    [AK: changed rdtscp definition for old binutils]

    Signed-off-by: Vojtech Pavlik
    Signed-off-by: Andi Kleen

    Vojtech Pavlik
     
  • AK: This redoes the changes I temporarily reverted.

    Intel now has support for Architectural Performance Monitoring Counters
    ( Refer to IA-32 Intel Architecture Software Developer's Manual
    http://www.intel.com/design/pentium4/manuals/253669.htm ). This
    feature is present starting from Intel Core Duo and Intel Core Solo processors.

    What this means is, the performance monitoring counters and some performance
    monitoring events are now defined in an architectural way (using cpuid).
    And there will be no need to check for family/model etc for these architectural
    events.

    Below is the patch to use these performance counters in the nmi watchdog driver.
    Patch handles both i386 and x86-64 kernels.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Andi Kleen

    Venkatesh Pallipadi
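
    "Defined in an architectural way (using cpuid)" means the counter
    properties are read from CPUID leaf 0xA rather than inferred from
    family/model. The sketch below only decodes register values that the
    caller has already obtained; the struct and function names are
    hypothetical, and the field layout follows Intel's published
    description of that leaf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of EAX as returned by CPUID leaf 0xA. */
struct arch_perfmon_info {
    unsigned version;        /* EAX[7:0]   version ID */
    unsigned num_counters;   /* EAX[15:8]  counters per logical CPU */
    unsigned counter_width;  /* EAX[23:16] counter bit width */
    unsigned ebx_length;     /* EAX[31:24] number of valid EBX bits */
};

static void decode_perfmon_eax(uint32_t eax, struct arch_perfmon_info *out)
{
    assert(out != NULL);
    out->version = eax & 0xffu;
    out->num_counters = (eax >> 8) & 0xffu;
    out->counter_width = (eax >> 16) & 0xffu;
    out->ebx_length = (eax >> 24) & 0xffu;
}

/* In EBX a SET bit means the event is NOT available; bit 0 is the
 * unhalted core cycles event, the natural choice for a watchdog. */
static bool has_unhalted_core_cycles(uint32_t eax, uint32_t ebx)
{
    struct arch_perfmon_info info;

    decode_perfmon_eax(eax, &info);
    if (info.version == 0 || info.ebx_length < 1)
        return false;
    return (ebx & 1u) == 0;
}
```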
     
  • I've had good experiences with having this on by default on x86-64.
    It turns nasty hangs into easier to debug oopses.

    Enable the local APIC wdog by default for systems newer than 2004.

    This comes from a strange compromise: according to arjan the reason
    it was off by default was some old IBM systems that corrupted
    registers when an NMI happened in SMI. Can't remember more specifics,
    but >= 2004 should avoid these. It's probably overly broad
    because most older systems should be ok (and the really old systems
    won't be supported by the local apic watchdog anyways)

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • After a crash we should wait for NMI IPI event and not for external NMI or
    NMI watchdog tick.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andi Kleen
    Cc: Don Zickus
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Vivek Goyal
     
  • After a crash we should wait for NMI IPI event and not for external NMI or
    NMI watchdog tick.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andi Kleen
    Cc: Don Zickus
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Vivek Goyal
     
  • This patch makes the following needlessly global functions static:
    - nmi_int.c: profile_exceptions_notify()
    - nmi_timer_int.c: profile_timer_exceptions_notify()

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andi Kleen

    Adrian Bunk
     
  • When an unknown NMI happened the panic would claim an NMI watchdog timeout.
    Also it would check the variable set by nmi_watchdog=panic and panic then.

    Fix up the panic message to be generic.
    Unconditionally panic on unknown NMI when panic on unknown NMI is enabled.

    Noticed by Jan Beulich

    Cc: jbeulich@novell.com

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Making NMI suspend/resume work with SMP. We use CPU hotplug to offline
    APs in SMP suspend/resume. Only BSP executes sysdev's .suspend/.resume
    method. APs should follow CPU hotplug code path.

    And:

    From: Don Zickus

    Makes the start/stop paths of nmi watchdog more robust to handle the
    suspend/resume cases more gracefully.

    AK: I merged the two patches together

    Signed-off-by: Shaohua Li
    Signed-off-by: Andi Kleen
    Cc: Don Zickus
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Shaohua Li
     
  • Clean up some of the output messages on the nmi error paths to make more
    sense when they are displayed. This is mainly a cosmetic fix and
    shouldn't impact any normal code path.

    Signed-off-by: Don Zickus
    Signed-off-by: Andi Kleen

    Don Zickus
     
  • To quote Alan Cox:

    The default Linux behaviour on an NMI of either memory or unknown is to
    continue operation. For many environments such as scientific computing
    it is preferable that the box is taken out and the error dealt with than
    an uncorrected parity/ECC error get propagated.

    A small number of systems do generate NMIs for bizarre random reasons
    such as power management so the default is unchanged. In other respects
    the new proc/sys entry works like the existing panic controls already in
    that directory.

    This is separate to the edac support - EDAC allows supported chipsets to
    handle ECC errors well, this change allows unsupported cases to at least
    panic rather than cause problems further down the line.

    Signed-off-by: Don Zickus
    Signed-off-by: Andi Kleen

    Don Zickus
     
  • Adds a new /proc/sys/kernel/nmi_watchdog call that will enable/disable the
    nmi watchdog.

    By entering a non-zero value here, a user can enable the nmi watchdog to
    monitor the online cpus in the system. By entering a zero value here, a
    user can disable the nmi watchdog and free up a performance counter which
    could then be utilized by the oprofile subsystem, otherwise oprofile may be
    short a counter when in use.

    Signed-off-by: Don Zickus
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Don Zickus
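
    Toggling the entry from a program amounts to writing a number to the
    file named above. A minimal sketch, assuming the entry accepts a
    decimal integer followed by a newline; the helper name is hypothetical
    and writing the real file requires root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Path taken from the commit message. */
#define NMI_WATCHDOG_SYSCTL "/proc/sys/kernel/nmi_watchdog"

/* Write one integer setting to a sysctl-style file.
 * Returns 0 on success and -1 on any failure. */
static int write_int_setting(const char *path, int value)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

    For example, write_int_setting(NMI_WATCHDOG_SYSCTL, 0) would disable
    the watchdog and free its performance counter for oprofile, and a
    non-zero value would enable it again.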
     
  • Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi
    watchdog.

    Signed-off-by: Don Zickus
    Signed-off-by: Andi Kleen

    Don Zickus
     
  • Removes the un/set_nmi_callback and reserve/release_lapic_nmi functions as
    they are no longer needed. The various subsystems are modified to register
    with the die_notifier instead.

    Also includes compile fixes by Andrew Morton.

    Signed-off-by: Don Zickus
    Signed-off-by: Andi Kleen

    Don Zickus
     
  • Needed TIF_RESTORE_SIGMASK first

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • We need TIF_RESTORE_SIGMASK in order to support ppoll() and pselect()
    system calls. This patch originally came from Andi, and was based
    heavily on David Howells' implementation of same on i386. I fixed a typo
    which was causing do_signal() to use the wrong signal mask.

    Signed-off-by: David Woodhouse
    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • This patch cleans up the NMI interrupt path. Instead of being gated on
    whether the 'nmi callback' is set, the interrupt handler now calls everyone who is
    registered on the die_chain and additionally checks the nmi watchdog,
    resetting it if enabled. This allows more subsystems to hook into the NMI if
    they need to (without being blocked by set_nmi_callback).

    Signed-off-by: Don Zickus
    Signed-off-by: Andi Kleen

    Don Zickus
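
    The difference between a single callback slot and a chain of handlers
    can be modelled in a few lines. This is a simplified user-space
    sketch, not the kernel notifier API: the types, return codes and the
    two sample handlers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NOT_MINE 0   /* handler did not recognise this NMI */
#define SKETCH_HANDLED  1   /* handler claimed this NMI */
#define SKETCH_MAX_HANDLERS 8

typedef int (*nmi_handler_fn)(int cpu);

struct nmi_chain {
    nmi_handler_fn handlers[SKETCH_MAX_HANDLERS];
    size_t count;
};

/* Any number of subsystems may register; nobody blocks anybody else. */
static int chain_register(struct nmi_chain *chain, nmi_handler_fn fn)
{
    assert(chain != NULL && fn != NULL);
    if (chain->count == SKETCH_MAX_HANDLERS)
        return -1;
    chain->handlers[chain->count++] = fn;
    return 0;
}

/* Call every registered handler in order until one claims the NMI.
 * Returns 1 if claimed, 0 if the NMI is still unknown afterwards. */
static int chain_dispatch(const struct nmi_chain *chain, int cpu)
{
    size_t i;

    assert(chain != NULL);
    for (i = 0; i < chain->count; i++)
        if (chain->handlers[i](cpu) == SKETCH_HANDLED)
            return 1;
    return 0;
}

/* Sample handlers: each pretends to own NMIs on one particular CPU. */
static int sample_profiler_handler(int cpu)
{
    return cpu == 0 ? SKETCH_HANDLED : SKETCH_NOT_MINE;
}

static int sample_debugger_handler(int cpu)
{
    return cpu == 1 ? SKETCH_HANDLED : SKETCH_NOT_MINE;
}
```

    An NMI that no handler claims falls through to the watchdog and
    unknown-NMI checks, which is where the rest of the series hooks in.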
     
  • This patch includes the changes to make the nmi watchdog on i386 SMP aware.
    A bunch of code was moved around to make it simpler to read. In addition,
    it is now possible to determine if a particular NMI was the result of the
    watchdog or not. This feature allows the kernel to filter out unknown NMIs
    more easily.

    Signed-off-by: Don Zickus
    Signed-off-by: Andi Kleen

    Don Zickus
     
  • This patch includes the changes to make the nmi watchdog on x86_64 SMP
    aware. A bunch of code was moved around to make it simpler to read. In
    addition, it is now possible to determine if a particular NMI was the result
    of the watchdog or not. This feature allows the kernel to filter out
    unknown NMIs more easily.

    Signed-off-by: Don Zickus
    Signed-off-by: Andi Kleen

    Don Zickus
     
  • Incorporates the new performance counter reservation system in oprofile.
    Also cleans up a lot of the initialization code. The code originally zeroed
    out every register associated with performance counters regardless of whether
    those registers were used. This caused issues with the nmi watchdog.
    Now oprofile tries to reserve registers and gives up if it can't get them.

    Cc: levon@movementarian.org
    Cc: oprofile-list@lists.sf.net

    Signed-off-by: Don Zickus
    Signed-off-by: Andi Kleen

    Don Zickus
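
    "Tries to reserve registers and gives up if it can't get them" can be
    sketched as a test-and-set on a shared bitmap. This is a user-space
    model using C11 atomics; the counter count and function names are
    assumptions, not the kernel's reservation interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_COUNTERS_SKETCH 4u

/* Bit n set means counter n is currently owned by some subsystem. */
static atomic_ulong reserved_mask;

/* Try to claim a counter. Returns false if someone else holds it, in
 * which case the caller must leave the register untouched. */
static bool reserve_counter(unsigned idx)
{
    unsigned long bit;

    assert(idx < NUM_COUNTERS_SKETCH);
    bit = 1UL << idx;
    /* atomic_fetch_or returns the previous mask, so the claim succeeds
     * only if the bit was clear before this call. */
    return (atomic_fetch_or(&reserved_mask, bit) & bit) == 0;
}

/* Give a counter back so another subsystem can use it. */
static void release_counter(unsigned idx)
{
    assert(idx < NUM_COUNTERS_SKETCH);
    atomic_fetch_and(&reserved_mask, ~(1UL << idx));
}
```

    With this scheme a profiler that finds counter 0 taken by the watchdog
    simply skips it instead of zeroing a register that is in use.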