13 Feb, 2007

40 commits

  • x86-64 is missing these:

    Signed-off-by: Jeff Garzik
    Signed-off-by: Andi Kleen

    Jeff Garzik
     
  • We trust the e820 table, so explicitly reserving ROMs shouldn't
    be needed.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Should be harmless because there is normally no memory there, but
    technically it was incorrect.

    Pointed out by Leo Duran

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Initialize FS and GS to __KERNEL_DS as well. Their actual values are not
    important, but it is important to reload them in protected mode. At this time,
    they still retain the real mode values from initial boot. VT disallows
    execution of code under such conditions, which means hardware virtualization
    can not be used to boot the kernel on Intel platforms, making the boot time
    painfully slow.

    This requires moving the GS load before the load of GS_BASE, so just move
    all the segments loads there to keep them together in the code.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen

    Zachary Amsden
     
  • The symbol is needed to manipulate page tables, and modules shouldn't
    do that.

    Leftover from 2.4, but no in-tree module should need it now.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • This means that if an illegal value is set for the segment registers,
    ptrace will now error out with an errno instead of silently ignoring
    it.

    Signed-off-by: Andi Kleen

    Andi Kleen
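The idea of the check can be sketched like this (the selector rules and the -EIO error code here are illustrative assumptions, not the exact test in the x86-64 ptrace code):

```c
#include <assert.h>
#include <errno.h>

/* Sketch: reject illegal segment selector values written via ptrace
 * with an error instead of silently dropping them. A user-visible
 * selector is 16 bits wide, and a nonzero one must request ring 3
 * (RPL == 3). Details are illustrative. */
static int putreg_seg(unsigned long value)
{
    if (value >= 0x10000)          /* selectors are 16 bits wide */
        return -EIO;
    if (value && (value & 3) != 3) /* nonzero selector must have RPL 3 */
        return -EIO;
    return 0;                      /* accept and store the value */
}
```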
     
  • Add failsafe mechanism to HPET/TSC clock calibration.

    Signed-off-by: Jack Steiner

    Updated to include failsafe mechanism & additional community feedback.
    Patch built on latest 2.6.20-rc4-mm1 tree.

    Signed-off-by: Andi Kleen

    Jack Steiner
     
  • mtrr: fix size_or_mask and size_and_mask

    This fixes two bugs in the /proc/mtrr interface:
    o If the physical address size crosses the 44-bit boundary,
    size_or_mask is evaluated incorrectly.
    o size_and_mask limits the width of the physical base
    address for an MTRR to less than 44 bits.

    TBD: later patch had one more change, but I think that was bogus.
    TBD: need to double check

    Signed-off-by: Andreas Herrmann
    Signed-off-by: Andi Kleen

    Andreas Herrmann
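A mask like size_or_mask is typically derived from the CPU's physical address width. Computed with a 32-bit shift it overflows once phys_bits - PAGE_SHIFT reaches 32, which with 4 KB pages is exactly the 44-bit boundary. A minimal sketch of the overflow-safe form (the helper is illustrative, not the kernel's exact code):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Mask of address bits above the physical address width, in page
 * units, as used by the MTRR mask registers. Using 1ULL keeps the
 * shift well-defined when (phys_bits - PAGE_SHIFT) reaches 32,
 * i.e. for phys_bits >= 44; a 32-bit "1 <<" would overflow there. */
static uint64_t size_or_mask(unsigned int phys_bits)
{
    return ~((1ULL << (phys_bits - PAGE_SHIFT)) - 1);
}
```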
     
  • Byte-for-byte identical /proc/apm here.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andi Kleen

    Alexey Dobriyan
     
  • Old code was legal standard C, but apparently not sparse-C.

    Signed-off-by: Josef 'Jeff' Sipek
    Signed-off-by: Andi Kleen

    Josef 'Jeff' Sipek
     
  • It will execute cpuid only on the cpu we need.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andi Kleen

    Alexey Dobriyan
     
  • It will execute rdmsr and wrmsr only on the cpu we need.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andi Kleen

    Alexey Dobriyan
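Both this and the cpuid change above follow the usual kernel pattern: bundle the arguments into a small struct and run the access on the target CPU via smp_call_function_single(). A userspace sketch of the pattern (run_on_cpu stands in for the real cross-CPU call, and the MSR contents are faked):

```c
#include <assert.h>
#include <stdint.h>

struct msr_info {
    uint32_t msr_no;
    uint32_t lo, hi;
    int err;
};

/* Stand-in for smp_call_function_single(cpu, fn, info, wait): in the
 * kernel this runs fn(info) on the chosen CPU via an IPI. Here we
 * simply call it locally. */
static void run_on_cpu(int cpu, void (*fn)(void *), void *info)
{
    (void)cpu;
    fn(info);
}

/* The per-CPU callback would execute rdmsr locally; we fake the
 * register contents for illustration. */
static void __rdmsr_on_cpu(void *info)
{
    struct msr_info *rv = info;
    rv->lo = 0xabcd0000u | rv->msr_no;  /* pretend MSR contents */
    rv->hi = 0;
    rv->err = 0;
}

static int rdmsr_on_cpu(int cpu, uint32_t msr_no, uint32_t *lo, uint32_t *hi)
{
    struct msr_info rv = { .msr_no = msr_no };

    run_on_cpu(cpu, __rdmsr_on_cpu, &rv);
    *lo = rv.lo;
    *hi = rv.hi;
    return rv.err;
}
```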
     
  • Some typos in Kconfig.

    Signed-off-by: Nicolas Kaiser
    Signed-off-by: Andi Kleen

    Nicolas Kaiser
     
  • - Remove outdated comment
    - Use cpu_relax() in a busy loop

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Function is dead.

    Signed-off-by: Jan Beulich
    Signed-off-by: Andi Kleen

    Jan Beulich
     
  • When a machine check event is detected (including an AMD RevF threshold
    overflow event), allow a "trigger" program to be run. This allows user
    space to react to such events sooner.

    The trigger is configured using a new trigger entry in the
    machinecheck sysfs interface. It is currently shared between
    all CPUs.

    I also fixed the AMD threshold handler to run the machine
    check polling code immediately to actually log any events
    that might have caused the threshold interrupt.

    Also added some documentation for the mce sysfs interface.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • While debugging an unrelated problem in Xen, I noticed odd reads from
    non-existent MSRs. Having now found time to look into why these happen, I
    came up with the patch below, which
    - prevents accessing MCi_MISCj with j > 0 when the block pointer in
    MCi_MISC0 is zero
    - accesses only contiguous MCi_MISCj until a non-implemented one is
    found
    - doesn't touch unimplemented blocks in mce_threshold_interrupt at all
    - gives names to two bits previously derived from MASK_VALID_HI (it
    took me some time to understand the code without this)

    The first three items, besides being apparently closer to the spec, should
    notably help cut down on the time mce_threshold_interrupt() takes.

    Signed-off-by: Andi Kleen

    Jan Beulich
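The traversal rule can be sketched with the hardware fields modeled as plain flags (the struct members stand in for the block pointer in MCi_MISC0 and the valid bit in each MCi_MISCj; the real code reads MSRs):

```c
#include <assert.h>

struct thr_block {
    int implemented;  /* would be the valid bit in MCi_MISCj */
    int has_blkptr;   /* would be the block pointer field in MCi_MISC0 */
};

/* Count the threshold blocks of one bank the way the patch does: a
 * zero block pointer in MISC0 means there is nothing beyond block 0,
 * and the scan stops at the first unimplemented block instead of
 * probing every possible (possibly non-existent) MSR. */
static int count_blocks(const struct thr_block *b, int max)
{
    int j, n;

    if (!b[0].implemented)
        return 0;
    n = 1;
    if (!b[0].has_blkptr)       /* zero block pointer: block 0 only */
        return n;
    for (j = 1; j < max; j++) {
        if (!b[j].implemented)  /* first hole ends the contiguous run */
            break;
        n++;
    }
    return n;
}
```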
     
  • Remove all parameters from this function that aren't really variable.

    Signed-off-by: Jan Beulich
    Signed-off-by: Andi Kleen

    Jan Beulich
     
  • List x86_64 quilt tree in MAINTAINERS.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andi Kleen

    Randy Dunlap
     
  • Fix typos.
    Lots of whitespace changes for readability and consistency.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andi Kleen

    Randy Dunlap
     
  • Handle these 32 bit perfmon counter MSR writes cleanly in oprofile.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Andi Kleen

    Venkatesh Pallipadi
     
  • Change i386 nmi handler to handle 32 bit perfmon counter MSR writes cleanly.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Andi Kleen

    Venkatesh Pallipadi
     
  • P6 CPUs, and Core/Core 2 CPUs with the 'architectural perf mon' feature,
    only support writes of the low 32 bits of the Performance Monitoring
    Counters. Bits 32..39 are sign-extended from bit 31, and bits 40..63 are
    reserved and should be zero.

    This patch:

    Change x86_64 nmi handler to handle this case cleanly.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Andi Kleen

    Venkatesh Pallipadi
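The write behavior described above can be modeled directly, which shows why the NMI handlers must program the counter through a 32-bit negative value rather than a full 64-bit one (the 40-bit counter width is assumed for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Model of a perfmon counter write on these CPUs: only the low 32
 * bits of the written value are honoured, bits 32..39 are
 * sign-extended from bit 31, and the bits above the implemented
 * counter width (assumed 40 bits here) read back as zero. */
static uint64_t perfctr_after_write(uint32_t lo)
{
    uint64_t v = (uint64_t)(int64_t)(int32_t)lo; /* sign-extend bit 31 */

    return v & ((1ULL << 40) - 1);               /* 40 implemented bits */
}
```

Writing 0x80000000, for example, ends up as 0xFF80000000 in the counter, so a handler that compared against the raw 64-bit value it thought it wrote would be confused.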
     
  • This is a tiny cleanup to increase readability.

    Signed-off-by: Glauber de Oliveira Costa
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Glauber de Oliveira Costa
     
  • Unlike x86, x86_64 already passes arguments in registers. The use of
    regparm attribute makes no difference in produced code, and the use of
    fastcall just bloats the code.

    Signed-off-by: Glauber de Oliveira Costa
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Glauber de Oliveira Costa
     
  • This patch resolves the issue of running with numa=fake=X on the kernel
    command line on x86_64 machines that have a big IO hole. While calculating
    the size of each node, we now account for the total hole size in that range.

    Previously there were nodes that only had IO holes in them causing kernel
    boot problems. We now use the NODE_MIN_SIZE (64MB) as the minimum size of
    memory that any node must have. We reduce the number of allocated nodes if
    the number of nodes specified on kernel command line results in any node
    getting memory smaller than NODE_MIN_SIZE.

    This change allows the extra memory to be added in NODE_MIN_SIZE
    granules and distributed uniformly among as many nodes (called big nodes)
    as possible.

    [akpm@osdl.org: build fix]
    Signed-off-by: David Rientjes
    Signed-off-by: Paul Menage
    Signed-off-by: Rohit Seth
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Rohit Seth
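The node-count reduction can be sketched as simple arithmetic (sizes in MB for readability; the real code works on memory ranges minus holes, this helper is illustrative):

```c
#include <assert.h>

#define NODE_MIN_SIZE 64ULL  /* MB, the minimum any fake node may get */

/* Given the usable memory (total minus IO holes) and the node count
 * requested via numa=fake=X, reduce the count until every node would
 * receive at least NODE_MIN_SIZE of real memory. */
static int usable_fake_nodes(unsigned long long usable_mb, int requested)
{
    while (requested > 1 && usable_mb / requested < NODE_MIN_SIZE)
        requested--;
    return requested;
}
```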
     
  • Add __init to romsignature() (it's only called from probe_roms(), which
    is itself __init), using that as an excuse to submit a pedantic cleanup.

    Signed-off-by: Rene Herman
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Rene Herman
     
  • Clean up sched_clock() on i686: it will use the TSC if available and fall
    back to jiffies only if the user disabled it via notsc or the CPU
    calibration code didn't figure out the right cpu_khz.

    This generally makes the scheduler timestamps more fine-grained, on all
    hardware. (The current scheduler is pretty resistant to asynchronous
    sched_clock() values on different CPUs; it will allow at most up to a jiffy
    of jitter.)

    Also simplify sched_clock()'s check for TSC availability: propagate the
    desire and ability to use the TSC into the tsc_disable flag, previously
    this flag only indicated whether the notsc option was passed. This makes
    the rare low-res sched_clock() codepath a single branch off a read-mostly
    flag.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Ingo Molnar
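The resulting shape of sched_clock() is a single branch off one flag; a sketch with illustrative numbers (the 2 GHz TSC rate, HZ value, and jiffies count are made up, and cycles_2_ns in the kernel uses a precomputed scale factor):

```c
#include <assert.h>
#include <stdint.h>

#define HZ 250

/* One read-mostly flag now covers both "notsc was given" and
 * "calibration failed", so the low-res path is a single branch. */
static int tsc_disable;
static unsigned long long jiffies = 1000;

static unsigned long long cycles_2_ns(unsigned long long cyc)
{
    return cyc / 2;                     /* pretend a 2 GHz TSC */
}

static unsigned long long sched_clock_ns(unsigned long long tsc_now)
{
    if (tsc_disable)                    /* rare low-res fallback */
        return jiffies * (1000000000ULL / HZ);
    return cycles_2_ns(tsc_now);        /* fine-grained TSC path */
}
```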
     
  • Add a notifier mechanism to the low level idle loop. You can register a
    callback function which gets invoked on entry and exit from the low level idle
    loop. The low level idle loop is defined as the polling loop, low-power call,
    or the mwait instruction. Interrupts processed by the idle thread are not
    considered part of the low level loop.

    The notifier can be used to measure precisely how much time is spent in
    useless execution (or low power mode). The perfmon subsystem uses it to
    turn monitoring on and off.

    Signed-off-by: stephane eranian
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Stephane Eranian
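A minimal sketch of the mechanism (the kernel uses atomic notifier chains; a plain array stands in here, and all names are illustrative):

```c
#include <assert.h>

enum idle_event { IDLE_START, IDLE_END };

#define MAX_NOTIFIERS 4
static void (*idle_notifiers[MAX_NOTIFIERS])(enum idle_event);
static int nr_notifiers;

static void register_idle_notifier(void (*fn)(enum idle_event))
{
    idle_notifiers[nr_notifiers++] = fn;
}

static void notify_idle(enum idle_event ev)
{
    for (int i = 0; i < nr_notifiers; i++)
        idle_notifiers[i](ev);
}

/* What the low level idle loop does around its hlt/mwait/polling;
 * interrupts handled inside are not part of the notified region. */
static void idle_once(void)
{
    notify_idle(IDLE_START);
    /* ... hlt / mwait / polling would happen here ... */
    notify_idle(IDLE_END);
}

/* Example callback: account entries and exits. */
static int entered, exited;
static void count_idle(enum idle_event ev)
{
    if (ev == IDLE_START)
        entered++;
    else
        exited++;
}
```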
     
  • Every file should include the headers containing the prototypes for
    its global functions.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Adrian Bunk
     
  • o init() is a non-__init function in the .text section, but it calls many
    functions which are in the .init.text section. Hence MODPOST generates
    lots of cross-reference warnings on i386 if compiled with
    CONFIG_RELOCATABLE=y:

    WARNING: vmlinux - Section mismatch: reference to .init.text:smp_prepare_cpus from .text between 'init' (at offset 0xc0101049) and 'rest_init'
    WARNING: vmlinux - Section mismatch: reference to .init.text:migration_init from .text between 'init' (at offset 0xc010104e) and 'rest_init'
    WARNING: vmlinux - Section mismatch: reference to .init.text:spawn_ksoftirqd from .text between 'init' (at offset 0xc0101053) and 'rest_init'

    o This patch breaks init() down into two parts: one that can go in the
    .init.text section and be freed, and another that has to be non-__init
    (init_post()). Now init() calls init_post(), and init_post() does not call
    any functions present in .init sections, which gets rid of the warnings.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Vivek Goyal
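The shape of the split can be sketched in plain C on an ELF target (the section attribute mirrors the kernel's __init marker; function names and counters are illustrative):

```c
#include <assert.h>

/* Everything that may call .init.text code stays in init(), which
 * itself lives in the discardable section, while init_post() contains
 * only code that must survive after boot and so sits in plain .text
 * with no references into .init. */
#define __init __attribute__((__section__(".init.text")))

static int boot_steps, post_steps;

static void __init smp_prepare(void)
{
    boot_steps++;       /* stands in for smp_prepare_cpus() etc. */
}

static int init_post(void)  /* plain .text: calls nothing in .init */
{
    post_steps++;
    return 0;
}

static int __init init(void)
{
    smp_prepare();      /* fine: caller is in .init.text too */
    return init_post(); /* the only call out of the discardable code */
}
```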
     
  • o The startup_32 entry was in the .text section, but it was accessing some
    init data too, which prompts MODPOST to generate compilation warnings.

    WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
    .text between '_text' (at offset 0xc0100029) and 'startup_32_smp'
    WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
    .text between '_text' (at offset 0xc0100037) and 'startup_32_smp'
    WARNING: vmlinux - Section mismatch: reference to
    .init.data:init_pg_tables_end from .text between '_text' (at offset
    0xc0100099) and 'startup_32_smp'

    o startup_32 can't be moved to .init.text, as this entry point has to be
    at the start of bzImage. Hence startup_32 is moved to a new section,
    .text.head, and MODPOST is instructed not to generate warnings when init
    data is accessed from the .text.head section. This code has been audited.

    o The SMP boot-up code (startup_32_smp) can go into .init.text if CPU
    hotplug is not supported. Otherwise it generates more warnings:

    WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
    .text between 'checkCPUtype' (at offset 0xc0100126) and 'is486'
    WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
    .text between 'checkCPUtype' (at offset 0xc0100130) and 'is486'

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Vivek Goyal
     
  • Deliberate register clobbering around performance-critical inline code is
    great for testing, but bad to leave on by default. Many people ship with
    DEBUG_KERNEL turned on, so stop making DEBUG_PARAVIRT default to on.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen

    Zachary Amsden
     
  • Because timer code moves around, and we might eventually move our init to a
    late_time_init hook, save and restore IRQs around this code because it is
    definitely not interrupt safe.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen

    Zachary Amsden
     
  • Kprobes bugfix for paravirt compatibility: the RPL on the CS when
    inserting BPs must match the running kernel.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen
    CC: Eric Biederman

    Zachary Amsden
     
  • profile_pc was broken when using paravirtualization because the
    assumption that the kernel was running at CPL 0 was violated, causing
    the logic to read a random value off the stack.

    The only way to be in kernel lock functions is to be in kernel
    code, so validate that assumption explicitly by checking the CS
    value. We don't want to be fooled by BIOS / APM segments and
    try to read those stacks, so only match KERNEL_CS.

    I moved some stuff in segment.h to make it prettier.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen

    Zachary Amsden
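The guard can be sketched like this (the selector value, the lock-function range check, and the register struct are all illustrative stand-ins for __KERNEL_CS, in_lock_functions() and pt_regs):

```c
#include <assert.h>

#define KERNEL_CS 0x60  /* illustrative selector value */

struct fake_regs { unsigned long cs, eip; };

/* Stand-in for in_lock_functions(): a simple address-range check. */
static int in_lock_functions(unsigned long pc)
{
    return pc >= 0x1000 && pc < 0x2000;
}

/* Only trust the stack when we are really in kernel code: BIOS/APM
 * segments (and paravirt kernels running above CPL 0) would otherwise
 * make us dereference a bogus stack pointer. */
static unsigned long profile_pc(const struct fake_regs *regs,
                                const unsigned long *stack)
{
    if (regs->cs == KERNEL_CS && in_lock_functions(regs->eip))
        return stack[0];        /* caller's return address */
    return regs->eip;
}
```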
     
  • VMI timer code. It works by taking over the local APIC clock when the
    APIC is configured, which requires a couple of hooks into the APIC code.
    The backend
    timer code could be commonized into the timer infrastructure, but there are
    some pieces missing (stolen time, in particular), and the exact semantics of
    when to do accounting for NO_IDLE need to be shared between different
    hypervisors as well. So for now, VMI timer is a separate module.

    [Adrian Bunk: cleanups]

    Subject: VMI timer patches
    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Cc: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Chris Wright
    Signed-off-by: Andrew Morton

    Zachary Amsden
     
  • Fairly straightforward implementation of VMI backend for paravirt-ops.

    [Adrian Bunk: some cleanups]

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Cc: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Chris Wright
    Signed-off-by: Andrew Morton

    Zachary Amsden
     
  • Add VMI SMP boot hook. We emulate a regular boot sequence and use the same
    APIC IPI initiation; we just poke magic values to load into the CPU state
    when the startup IPI is received, rather than having to jump through a
    real mode trampoline.

    This is all that was needed to get SMP to work.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Cc: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Chris Wright
    Signed-off-by: Andrew Morton

    Zachary Amsden
     
  • I found a clever way to make the extra IOPL switching invisible to
    non-paravirt compiles: kernel_rpl is statically defined to be zero there,
    and only kernels running at a non-zero RPL have a problem restoring IOPL,
    since popf does not restore the IOPL flags unless run at CPL 0.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Cc: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Chris Wright
    Signed-off-by: Andrew Morton

    Zachary Amsden