26 Sep, 2006

40 commits

  • kexec: Avoid overwriting the current pgd (V4, i386)

    This patch upgrades the i386-specific kexec code to avoid overwriting the
    current pgd. Overwriting the current pgd is bad when CONFIG_CRASH_DUMP is used
    to start a secondary kernel that dumps the memory of the previous kernel.

    The code introduces a new set of page tables. These tables are used to provide
    an executable identity mapping without overwriting the current pgd.

    Signed-off-by: Magnus Damm
    Signed-off-by: Andi Kleen

    Magnus Damm
     
  • kexec: Avoid overwriting the current pgd (V4, x86_64)

    This patch upgrades the x86_64-specific kexec code to avoid overwriting the
    current pgd. Overwriting the current pgd is bad when CONFIG_CRASH_DUMP is used
    to start a secondary kernel that dumps the memory of the previous kernel.

    The code introduces a new set of page tables. These tables are used to provide
    an executable identity mapping without overwriting the current pgd.

    Signed-off-by: Magnus Damm
    Signed-off-by: Andi Kleen

    Magnus Damm
     
  • Remove most of the special cases for the debug IST stack. This is a
    follow-on cleanup patch; it requires the bug fix patch that adds
    orig_ist.

    Signed-off-by: Keith Owens
    Signed-off-by: Andi Kleen

    Keith Owens
     
  • The EDD code would scan the command line as a fixed array, without
    taking account of whitespace, null-termination, the old
    command-line protocol, the rule that late options override early
    ones, or the fact that the command line may not be reachable from
    INITSEG.

    This should fix those problems, and enable us to use a longer command
    line.

    Signed-off-by: H. Peter Anvin
    Signed-off-by: Andi Kleen

    H. Peter Anvin
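
    Illustration (not part of the patch): a minimal user-space C sketch of
    the parsing rules listed above, namely whitespace-separated tokens, a
    NUL-terminated string, and the last edd= option winning. The real code
    is boot-time assembly; the function name, the enum, and the accepted
    value prefixes ("of", "sk", "on") are assumptions made for this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
enum edd_mode { EDD_MODE_ON, EDD_MODE_OFF, EDD_MODE_SKIPMBR };

/*
 * Walk a NUL-terminated command line token by token. Only tokens that
 * start with "edd=" are considered, and a later token overrides an
 * earlier one.
 */
static enum edd_mode parse_edd_option(const char *cmdline)
{
    enum edd_mode mode = EDD_MODE_ON;   /* assumed default */
    const char *p = cmdline;

    assert(cmdline != NULL);
    while (*p) {
        /* Skip any run of whitespace between tokens. */
        while (*p && isspace((unsigned char)*p))
            p++;
        const char *tok = p;
        /* Advance to the end of the token or of the string. */
        while (*p && !isspace((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - tok);

        if (len >= 6 && strncmp(tok, "edd=", 4) == 0) {
            const char *val = tok + 4;
            if (strncmp(val, "of", 2) == 0)
                mode = EDD_MODE_OFF;
            else if (strncmp(val, "sk", 2) == 0)
                mode = EDD_MODE_SKIPMBR;
            else if (strncmp(val, "on", 2) == 0)
                mode = EDD_MODE_ON;
        }
    }
    return mode;
}
```

    Matching whole tokens means a string such as "xedd=off" is not
    mistaken for the option, which a fixed-offset scan cannot guarantee.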
     
  • Based on an idea by Jeremy Fitzhardinge:

    Replace the volatiles and memory clobbers in the PDA access with
    telling gcc about access to a proxy PDA structure that doesn't
    actually exist. But the dummy accesses give a defined ordering for
    read/write accesses.

    Also add some memory barriers to the early GS initialization to
    make sure no PDA access is moved before it.

    Advantage is some .text savings (probably most from better
    code for accessing "current"):

       text    data     bss     dec     hex filename
    4845647 1223688  615864 6685199  66020f vmlinux
    4837780 1223688  615864 6677332  65e354 vmlinux-pda

    About 0.16% smaller code (7867 bytes of .text)

    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • They cannot be actually freed because the FACS table has a
    shared-with-the-BIOS lock.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • This patch updates x86_64 linker script to pack any .note.* sections
    into a PT_NOTE segment in the output file.

    To do this, we tell ld that we need a PT_NOTE segment. This requires
    us to start explicitly mapping sections to segments, so we also need
    to explicitly create PT_LOAD segments for text and data, and map the
    sections to them appropriately. Fortunately, each section will
    default to its previous section's segment, so it doesn't take many
    changes to vmlinux.lds.S.

    The corresponding change is already made for i386 in -mm and I'd like
    this patch to join it. The section to segment mappings do change as do
    the segment flags so some time in -mm would be good for that reason as
    well, just in case.

    In particular .data and .bss move from the text segment to the data
    segment and .data.cacheline_aligned .data.read_mostly are put in the
    data segment instead of a separate one.

    I think that it would be possible to exactly match the existing section
    to segment mapping and flags but it would be a more intrusive change and
    I'm not sure there is a reason for the existing layout other than it is
    what you get by default if you don't explicitly specify something else.
    If there is a reason for the existing layout then I will of course make
    the more intrusive change. If there is no reason we could probably drop
    the executable or writable flags from some segments but I don't know how
    much attention is paid to them anyway so it might not be worth the
    effort.

    The vsyscall related sections need to go in a different segment to the
    normal data segment and so I invented a "user" segment to contain them.
    I believe this should appear to be another data segment as far as the
    kernel is concerned so the flags are setup accordingly.

    The notes will be used in the Xen paravirt_ops backend to provide
    additional information to the domain builder. I am in the process of
    converting the xen-unstable kernels and tools over to this scheme at the
    moment to support this in the future.

    It has been suggested to me that the notes segment should have flags 0
    (i.e. not readable) since it is only used by the loader and is not used
    at runtime. For now I went with a readable segment since that is what
    the i386 patch uses.

    AK: dropped NOTES addition right now because the needed infrastructure
    for that is not merged yet

    Signed-off-by: Ian Campbell
    Signed-off-by: Andi Kleen

    Ian Campbell
     
  • In long mode the %cs is largely a relic. However there are a few cases
    like iret where it matters that we have a valid value. Without this
    patch it is possible to enter the kernel in startup_64 without setting
    %cs to a valid value. With this patch we don't care what %cs value
    we enter the kernel with, so long as the cs shadow register indicates
    it is a privileged code segment.

    Thanks to Magnus Damm for finding this problem and posting the
    first workable patch. I have moved the jump to set %cs down a
    few instructions so we don't need to take an extra jump, which
    keeps the code simpler.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andi Kleen

    Eric W. Biederman
     
  • Based on patch from David Rientjes, but
    changed by AK.

    Optimizes the 64-bit hamming weight for x86_64 processors assuming they
    have fast multiplication. Uses five fewer bitops than the generic
    hweight64. Benchmark on one EM64T showed ~25% speedup with 2^24
    consecutive calls.

    Define a new ARCH_HAS_FAST_MULTIPLIER that can be set by other
    architectures that can also multiply fast.

    Signed-off-by: Andi Kleen

    Andi Kleen
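
    Illustration (not the patch itself): a minimal user-space C sketch of
    a multiply-based 64-bit population count of the kind described above.
    The function names are chosen for this sketch; the bit-at-a-time
    version is only a reference to check against.

```c
#include <assert.h>
#include <stdint.h>

/* Reference implementation: count set bits one at a time. */
static unsigned int hweight64_naive(uint64_t w)
{
    unsigned int n = 0;

    while (w) {
        n += (unsigned int)(w & 1);
        w >>= 1;
    }
    return n;
}

/*
 * Multiply-based variant: reduce to per-byte counts, then let one
 * multiplication sum all eight bytes into the top byte.
 */
static unsigned int hweight64_mul(uint64_t w)
{
    w -= (w >> 1) & 0x5555555555555555ULL;            /* 2-bit sums */
    w  = (w & 0x3333333333333333ULL) +
         ((w >> 2) & 0x3333333333333333ULL);          /* 4-bit sums */
    w  = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;      /* 8-bit sums */
    return (unsigned int)((w * 0x0101010101010101ULL) >> 56);
}

/* Cross-check the two implementations on one value. */
static void hweight64_check(uint64_t w)
{
    assert(hweight64_mul(w) == hweight64_naive(w));
}
```

    The final multiply replaces the shift-and-add folding steps of a
    generic implementation, which is why it only pays off where
    multiplication is fast.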
     
  • Drop support for non e820 BIOS calls to get the memory map.

    The boot assembler code still has some support, but not the C code now.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • When compiling a 64-bit kernel on an Ubuntu 6.06 32bit system (whose GCC is also
    a cross-compiler for x86_64) I've seen that head.o is compiled as a 64-bit file
    (which it should not be) and ld complains about this during linking:
    [AK: it happens on all systems with new binutils]

    ld: warning: i386:x86-64 architecture of input file
    `arch/x86_64/boot/compressed/head.o' is incompatible with i386 output

    I've verified that removing -m64 from compilation flags to turn
    "-m64 -traditional -m32" into "-traditional -m32" fixes the issue.

    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andi Kleen

    Paolo 'Blaisorblade' Giarrusso
     
  • NMIs are not supposed to track the irq flags, but TRACE_IRQS_IRETQ
    did it anyways. Add a check.

    Cc: mingo@elte.hu

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Give the printks a consistent prefix.
    Add some missing white space.

    Cc: len.brown@intel.com

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • - Remove a define that was used only once
    - Remove the too large APIC ID check because we always support
    the full 8bit range of APICs.
    - Restructure code a bit to be simpler.

    Cc: len.brown@intel.com

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • ACPI went to great trouble to get the APIC version and CPU capabilities
    of different CPUs before passing them to the mpparser. But all
    that data was used for was printing it out. It even faked some data
    based on the boot cpu, not on the actual CPU being booted.

    Remove all this code because it's not needed.

    Cc: len.brown@intel.com

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Use normal pte accessors in change_page_attr() to access the PSE
    bits.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Fix the pte_exec/mkexec page table accessor functions to really
    use the NX bit. Previously they only checked the USER bit, but
    weren't actually used for anything.

    Then use them in change_page_attr() to manipulate the NX bit
    properly.

    Signed-off-by: Andi Kleen

    Andi Kleen
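
    Illustration (not the patch itself): a minimal user-space C sketch of
    accessors that test and change the NX bit, assuming a 64-bit page
    table entry where bit 63 is the no-execute bit as on x86-64. All names
    carry a sketch_ prefix because they are hypothetical, not the kernel's
    accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 of an x86-64 page table entry is the no-execute (NX) bit. */
#define SKETCH_PAGE_NX (UINT64_C(1) << 63)

/* Hypothetical stand-in for a page table entry. */
typedef struct { uint64_t val; } sketch_pte_t;

/* A page is executable when NX is clear. */
static bool sketch_pte_exec(sketch_pte_t pte)
{
    return (pte.val & SKETCH_PAGE_NX) == 0;
}

/* Make the page executable by clearing NX. */
static sketch_pte_t sketch_pte_mkexec(sketch_pte_t pte)
{
    pte.val &= ~SKETCH_PAGE_NX;
    return pte;
}

/* Make the page non-executable by setting NX. */
static sketch_pte_t sketch_pte_exprotect(sketch_pte_t pte)
{
    pte.val |= SKETCH_PAGE_NX;
    return pte;
}

/* Toggling NX must leave every other bit of the entry untouched. */
static void sketch_pte_roundtrip_check(sketch_pte_t pte)
{
    uint64_t others = pte.val & ~SKETCH_PAGE_NX;

    assert((sketch_pte_exprotect(pte).val & ~SKETCH_PAGE_NX) == others);
    assert(sketch_pte_mkexec(pte).val == others);
}
```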
     
  • It is correct for its only caller right now, but not for possible
    future others.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • And replace all users with ordinary smp_processor_id. The function
    was originally added to get some basic oops information out even
    if the GS register was corrupted. However, that has not worked
    for some time because printk is needed to print the oops
    and it uses smp_processor_id() already. Also GS register corruptions
    are not particularly common anymore.

    This also helps the Xen port which would otherwise need to
    do this in a special way because it can't access the local APIC.

    Cc: Chris Wright

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Detect the situations in which the time after a resume from disk would
    be earlier than the time before the suspend and prevent them from
    happening on x86_64.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Andi Kleen

    Rafael J. Wysocki
     
  • In i386's entry.S, FIX_STACK() needs annotation because it
    replaces the stack pointer. And the rest of nmi() needs
    annotation in order to compile with these new annotations.

    Signed-off-by: Chuck Ebbert
    Signed-off-by: Andi Kleen

    Chuck Ebbert
     
  • Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Andrew Morton
     
  • Apparently IA64 needs it, but i386/x86-64 don't anymore
    since gcc 2.95 support was dropped. Nobody else on linux-arch
    requested keeping it generically

    Cc: tony.luck@intel.com
    Cc: kaos@sgi.com

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • x86-64 inherited code from i386 to force-reserve the 640k-1MB area.
    That was needed on some old systems.

    But we generally trust the e820 map to be correct on 64bit systems
    and to mark all areas that are not memory correctly.

    This patch allows the real memory in that area to be used.

    Or rather the only way to find out if it's still needed is to
    try. So far I'm optimistic.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • This is now automatically included by kbuild.

    Signed-off-by: Dave Jones
    Signed-off-by: Andi Kleen

    Dave Jones
     
  • A kprobe executes IRET early and that could cause NMI recursion and stack
    corruption.

    Note: This problem was originally spotted and solved by Andi Kleen in the
    x86_64 architecture. This patch is an adaption of his patch for i386.

    AK: Merged with current code which was a bit different.
    AK: Removed printk in nmi handler that shouldn't be there in the first place
    AK: Added missing include.
    AK: added KPROBES_END

    Signed-off-by: Fernando Vazquez
    Signed-off-by: Andi Kleen

    Fernando Luis Vázquez Cao
     
  • A kprobe executes IRET early and that could cause NMI recursion and stack
    corruption.

    Note: This problem was originally spotted by Andi Kleen. This patch
    adds fixes not included in his original patch.
    [AK: Jan Beulich originally discovered these classes of bugs]

    Signed-off-by: Fernando Vazquez
    Signed-off-by: Andi Kleen

    Fernando Luis Vázquez Cao
     
  • Mark i386-specific cpu cache functions as __cpuinit. They are all
    only called from arch/i386/common.c:display_cache_info() that already is
    marked as __cpuinit.

    Signed-off-by: Magnus Damm
    Signed-off-by: Andi Kleen

    Magnus Damm
     
  • Mark i386-specific cpu identification functions as __cpuinit. They are all
    only called from arch/i386/common.c:identify_cpu() that already is marked as
    __cpuinit.

    Signed-off-by: Magnus Damm
    Signed-off-by: Andi Kleen

    Magnus Damm
     
  • Mark i386-specific cpu init functions as __cpuinit. They are all
    only called from arch/i386/common.c:identify_cpu() that already is marked as
    __cpuinit. This patch also removes the empty function init_umc().

    Signed-off-by: Magnus Damm
    Signed-off-by: Andi Kleen

    Magnus Damm
     
  • The different cpu_dev structures are all used from __cpuinit callers as far
    as I can tell. So mark them as __cpuinitdata instead of __initdata. I am a
    little bit unsure about arch/i386/common.c:default_cpu, especially when it
    comes to the purpose of this_cpu.

    Signed-off-by: Magnus Damm
    Signed-off-by: Andi Kleen

    Magnus Damm
     
  • The init_amd() function is only called from identify_cpu() which is already
    marked as __cpuinit. So let's mark it as __cpuinit.

    Signed-off-by: Magnus Damm
    Signed-off-by: Andi Kleen

    Magnus Damm
     
  • cpu_dev->c_identify is only called from arch/i386/common.c:identify_cpu(), and
    this after generic_identify() already has been called. There is no need to call
    this function twice and hook it in c_identify - but I may be wrong, please
    double check before applying.

    This patch also removes generic_identify() from cpu.h to avoid unnecessary
    future nesting.

    Signed-off-by: Magnus Damm
    Signed-off-by: Andi Kleen

    Magnus Damm
     
  • Fix for the x86_64 kernel mapping code. Without this patch the update path
    only inits one pmd_page worth of memory and tramples any entries on it. Now
    the calling convention to phys_pmd_init and phys_init is to always pass a
    [pmd/pud] page, not an offset within a page.

    Signed-off-by: Keith Mannthey
    Signed-off-by: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton

    Keith Mannthey
     
  • Implement pause_on_oops() on x86_64.

    AK: I redid the patch to do the oops_enter/exit in the existing
    oops_begin()/end(). This makes it much shorter.

    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Andrew Morton
     
  • Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
    context switch a trap is taken for the first FPU use to restore the FPU
    context lazily. This is of course great for applications that have very
    sporadic or no FPU use (since then you avoid doing the expensive
    save/restore all the time). However for very frequent FPU users... you
    take an extra trap every context switch.

    The patch below adds a simple heuristic to this code: After 5 consecutive
    context switches of FPU use, the lazy behavior is disabled and the context
    gets restored every context switch. If the app indeed uses the FPU, the
    trap is avoided. (The chance of the 6th time slice using the FPU after the
    previous 5 having done so is obviously quite high.)

    After 256 switches, this is reset and lazy behavior is returned (until
    there are 5 consecutive ones again). The reason for this is to give apps
    that do longer bursts of FPU use still the lazy behavior back after some
    time.

    [akpm@osdl.org: place new task_struct field next to jit_keyring to save space]
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Arjan van de Ven
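
    Illustration (not the patch itself): a minimal user-space C sketch of
    the heuristic described above. The struct, the function names, and the
    exact comparison against the threshold are assumptions made for this
    sketch; the 5 and 256 come from the description, with the reset after
    256 modelled here as an 8-bit counter wrapping around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FPU_EAGER_THRESHOLD 5   /* consecutive FPU-using slices */

/* Hypothetical per-task state for this sketch. */
struct task_sketch {
    uint8_t fpu_counter;        /* wraps to 0 after 256 increments */
};

/*
 * Called when the task is switched out. used_fpu says whether it
 * touched the FPU during the time slice that just ended.
 */
static void note_slice_end(struct task_sketch *t, bool used_fpu)
{
    assert(t != NULL);
    if (used_fpu)
        t->fpu_counter++;       /* 255 + 1 wraps to 0: lazy again */
    else
        t->fpu_counter = 0;     /* streak broken */
}

/*
 * Called when the task is switched in. Returns true when the FPU
 * state should be restored right away instead of waiting for the trap.
 */
static bool restore_eagerly(const struct task_sketch *t)
{
    assert(t != NULL);
    return t->fpu_counter >= FPU_EAGER_THRESHOLD;
}
```

    A task that uses the FPU in every slice therefore runs eagerly for
    most of each 256-switch cycle and briefly returns to lazy behavior
    after the wrap.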
     
  • This patch enables ACPI based physical CPU hotplug support for x86_64.
    Implements acpi_map_lsapic() and acpi_unmap_lsapic() to support physical cpu
    hotplug.

    Signed-off-by: Ashok Raj
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Cc: "Brown, Len"
    Signed-off-by: Andrew Morton

    Ashok Raj
     
  • Make an mmconfig warning print the bus id with a regular format.

    Signed-off-by: Brice Goglin
    Signed-off-by: Andi Kleen
    Signed-off-by: Andrew Morton

    Brice Goglin
     
  • cyrix_identify() should be __init because transmeta_identify() is.
    tsc_init() is only called from setup_arch() which is marked as __init.

    These two section mismatches were detected by running modpost on
    a vmlinux image compiled with CONFIG_RELOCATABLE=y.

    Signed-off-by: Magnus Damm
    Signed-off-by: Andi Kleen

    Magnus Damm
     
  • There is no need to duplicate the topology_init() function.

    Signed-off-by: Magnus Damm
    Signed-off-by: Andi Kleen

    Magnus Damm