08 Jul, 2008

40 commits

  • typo fixes from Randy Dunlap and Alan Cox.

    Signed-off-by: Cyrill Gorcunov
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • nmi_watchdog is set to NMI_NONE by default (i.e. disabled) in _any_
    mode, so let's fix the documentation too.

    Signed-off-by: Cyrill Gorcunov
    Cc: "Maciej W. Rozycki"
    Signed-off-by: Ingo Molnar

    Cyrill Gorcunov
     
  • change the enable_local_apic to static force_enable_local_apic for 32bit

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • use PMD_SHIFT to calculate the boundary; also adjust the size of the
    pre-allocated table

    Signed-off-by: Yinghai Lu
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • when 64bit resource is not enabled, we get:

    arch/x86/kernel/e820.c: In function ‘e820_reserve_resources’:
    arch/x86/kernel/e820.c:1217: warning: comparison is always false due to limited range of data type

    because res->start/end is resource_size_t, aka u32 here, so it will overflow.

    Fix it by using a temporary u64 end.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
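
The warning above can be reproduced outside the kernel. The following is a minimal user-space sketch, not the kernel code: `res32_t` is a hypothetical stand-in for a 32-bit `resource_size_t`, and both function names are made up for illustration. It shows why a comparison against the 32-bit limit is always false once the end address has been stored in the narrow type, and how a 64-bit temporary avoids that.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for resource_size_t when 64-bit resources are off. */
typedef uint32_t res32_t;

/* Truncating variant: the end address is stored in the narrow type first,
 * so it can never exceed UINT32_MAX and the check is always false. */
static bool exceeds_32bit_truncated(uint64_t start, uint64_t size)
{
    res32_t end = (res32_t)(start + size - 1);
    return (uint64_t)end > UINT32_MAX;
}

/* Fixed variant: keep the end address in a 64-bit temporary and test that. */
static bool exceeds_32bit(uint64_t start, uint64_t size)
{
    uint64_t end = start + size - 1;
    return end > UINT32_MAX;
}
```

For a range starting at 0xFFFFF000 with size 0x2000, the fixed variant reports that the range crosses the 4 GiB boundary, while the truncating variant silently wraps.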
     
  • some ram-end boundary only has page alignment, instead of 2M alignment.

    v2: make init_memory_mapping more solid: start could be any value other than 0
    v3: fix NON PAE by handling left over in kernel_physical_mapping

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
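
To illustrate the alignment problem, here is a rough sketch of how a page-aligned range might be split into a 4k-mapped head, a 2M-mapped middle and a 4k-mapped tail. It is not the code from init_memory_mapping; `split_range` and `struct range_split` are hypothetical names, and a 2M large-page size (PMD_SHIFT of 21) is assumed.

```c
#include <stdint.h>

#define PMD_SHIFT 21                          /* assumed: 2 MiB large pages */
#define PMD_SIZE  (UINT64_C(1) << PMD_SHIFT)
#define PMD_MASK  (~(PMD_SIZE - 1))

/* [start, big_start) and [big_end, end) use 4k pages,
 * [big_start, big_end) can use 2M pages. */
struct range_split {
    uint64_t big_start;
    uint64_t big_end;
};

static struct range_split split_range(uint64_t start, uint64_t end)
{
    struct range_split s;

    s.big_start = (start + PMD_SIZE - 1) & PMD_MASK;  /* round start up   */
    s.big_end   = end & PMD_MASK;                     /* round end down   */

    /* The range lies inside a single 2M block: map all of it with 4k pages. */
    if (s.big_start > s.big_end) {
        s.big_start = end;
        s.big_end   = end;
    }
    return s;
}
```

With a RAM end of 0x7fe01000, the last 4k page past 0x7fe00000 falls into the tail instead of being rounded away or mapped past the end of RAM.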
     
  • Make sure SWAPGS and PARAVIRT_ADJUST_EXCEPTION_FRAME are properly
    defined when CONFIG_PARAVIRT is off.

    Fixes Ingo's build failure:
    arch/x86/kernel/entry_64.S: Assembler messages:
    arch/x86/kernel/entry_64.S:1201: Error: invalid character '_' in mnemonic
    arch/x86/kernel/entry_64.S:1205: Error: invalid character '_' in mnemonic
    arch/x86/kernel/entry_64.S:1209: Error: invalid character '_' in mnemonic
    arch/x86/kernel/entry_64.S:1213: Error: invalid character '_' in mnemonic

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Nick Piggin
    Cc: Mark McLoughlin
    Cc: xen-devel
    Cc: Eduardo Habkost
    Cc: Vegard Nossum
    Cc: Stephen Tweedie
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • instead of calling it from trap_init()

    also move init ioapic mapping out of apic_32.c

    so 32 bit do same as 64 bit

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • v2: fix print info to cont

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • Ingo Molnar wrote:
    > that fixed the build but now we've got a boot crash with this config:
    >
    > time.c: Detected 2010.304 MHz processor.
    > spurious 8259A interrupt: IRQ7.
    > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    > IP: []
    > PGD 0
    > Thread overran stack, or stack corrupted
    > Oops: 0010 [1] SMP
    > CPU 0
    >

    I don't know if this will fix this bug, but it's definitely a bugfix.
    It was trashing random pages by overwriting them with pagetables...

    Don't trash a large pmd's data when mapping physical memory.
    This is a bugfix for "x86_64: adjust mapping of physical pagetables
    to work with Xen".

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Cc: Vegard Nossum
    Cc: Nick Piggin
    Cc: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • Ingo Molnar wrote:
    > * Jeremy Fitzhardinge wrote:
    >
    >
    >>> It quickly broke the build in testing:
    >>>
    >>> include/asm/pgalloc.h: In function ‘paravirt_pgd_free':
    >>> include/asm/pgalloc.h:14: error: parameter name omitted
    >>> arch/x86/kernel/entry_64.S: In file included from
    >>> arch/x86/kernel/traps_64.c:51:include/asm/pgalloc.h: In function
    >>> ‘paravirt_pgd_free':
    >>> include/asm/pgalloc.h:14: error: parameter name omitted
    >>>
    >>>
    >> No, looks like my fault. The non-PARAVIRT version of
    >> paravirt_pgd_free() is:
    >>
    >> static inline void paravirt_pgd_free(struct mm_struct *mm, pgd_t *) {}
    >>
    >> but C doesn't like missing parameter names, even if unused.
    >>
    >> This should fix it:
    >>
    >
    > that fixed the build but now we've got a boot crash with this config:
    >
    > time.c: Detected 2010.304 MHz processor.
    > spurious 8259A interrupt: IRQ7.
    > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    > IP: []
    > PGD 0
    > Thread overran stack, or stack corrupted
    > Oops: 0010 [1] SMP
    > CPU 0
    >
    > with:
    >
    > http://redhat.com/~mingo/misc/config-Thu_Jun_26_12_46_46_CEST_2008.bad
    >

    Use SWAPGS_UNSAFE_STACK in ia32entry.S in the places where the active
    stack is the usermode stack.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Cc: Vegard Nossum
    Cc: Nick Piggin
    Cc: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • do that in init_memory_mapping

    also remove one init_ohci1394_dma_on_all_controllers

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • asm-x86/paravirt.h already has protection with CONFIG_PARAVIRT inside

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • This patch brings back limiting of the E820 map when a user-defined
    E820 map is specified. While the behaviour of i386 (32 bit) was to limit
    the E820 map (and /proc/iomem), the behaviour of x86-64 (64 bit) was not to
    limit.

    This patch limits the E820 map again for both x86 architectures.

    Code was tested for compilation and booting on a 32 bit and 64 bit system.

    Signed-off-by: Bernhard Walle
    Acked-by: Yinghai Lu
    Cc: kexec@lists.infradead.org
    Cc: vgoyal@redhat.com
    Cc: Bernhard Walle
    Signed-off-by: Ingo Molnar

    Bernhard Walle
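
As a rough picture of what "limiting" a memory map means, the sketch below clips a list of regions to an upper bound. It is an illustration only, not the kernel's E820 code: `struct region` and `limit_regions` are hypothetical names, and region types are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified map entry (no type field). */
struct region {
    uint64_t addr;
    uint64_t size;
};

/* Clip the map to [0, limit) in place and return the new entry count.
 * Entries entirely above the limit are dropped; an entry that crosses
 * the limit is shortened. */
static size_t limit_regions(struct region *map, size_t n, uint64_t limit)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        struct region r = map[i];

        if (r.addr >= limit)
            continue;                      /* entirely above the limit */
        if (r.size > limit - r.addr)
            r.size = limit - r.addr;       /* crosses the limit: shorten */
        map[out++] = r;
    }
    return out;
}
```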
     
  • The patch "x86: introduce init_memory_mapping for 32bit" does not allocate
    enough space for PTEs if the CPU does not implement PSE.

    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • Signed-off-by: Eduardo Habkost
    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • 64-bit Xen pushes a couple of extra words onto an exception frame.
    Add a hook to deal with them.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • It's never safe to call a swapgs pvop when the user stack is current -
    it must be inline replaced. Rather than making a call, the
    SWAPGS_UNSAFE_STACK pvop always just puts "swapgs" as a placeholder,
    which must either be replaced inline or trap'n'emulated (somehow).

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • Replace privileged instructions with the corresponding pvops in
    ia32entry.S.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • In a 64-bit system, we need separate sysret/sysexit operations to
    return to a 32-bit userspace.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • There's no need to combine restoring the user rsp within the sysret
    pvop, so split it out. This makes the pvop's semantics closer to the
    machine instruction.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • Don't conflate sysret and sysexit; they're different instructions with
    different semantics, and may be in use at the same time (at least
    within the same kernel, depending on whether it's an Intel or AMD
    system).

    sysexit - just return to userspace, does no register restoration of
    any kind; must explicitly atomically enable interrupts.

    sysret - reloads flags from r11, so no need to explicitly enable
    interrupts on 64-bit, responsible for restoring usermode %gs

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • This is needed when the kernel is running on RING3, such as under Xen.
    x86_64 has a weird feature that makes it #GP on iret when SS is a null
    descriptor.

    This needs to be tested on bare metal to make sure it doesn't cause any
    problems. AMD specs say SS is always ignored (except on iret?).

    Signed-off-by: Eduardo Habkost
    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • We must do this because load_TLS() may need to clear %fs and %gs
    (e.g. under Xen).

    Signed-off-by: Eduardo Habkost
    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • We must leave lazy mode before switching the %fs and %gs selectors.

    Signed-off-by: Eduardo Habkost
    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • We will need to set a pte on l3_user_pgt. Extract set_pte_vaddr_pud()
    from set_pte_vaddr(), that will accept the l3 page table as parameter.

    This change should be a no-op for existing code.

    Signed-off-by: Eduardo Habkost
    Signed-off-by: Mark McLoughlin
    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Eduardo Habkost
     
  • Because Xen doesn't support PSE mappings in guests, all code which
    assumed the presence of PSE has been changed to fall back to smaller
    mappings if necessary. As a result, PSE is optional rather than
    required (though still used wherever possible).

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • If PSE is not available, then fall back to 4k page mappings for the
    vmemmap area.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • This makes a few changes to the construction of the initial
    pagetables to work better with paravirt_ops/Xen. The main areas
    are:

    1. Support non-PSE mapping of memory, since Xen doesn't currently
    allow 2M pages to be mapped in guests.

    2. Make sure that the ioremap aliases of all pages are dropped before
    attaching the new page to the pagetable. This avoids having
    writable aliases of pagetable pages.

    3. Preserve existing pagetable entries, rather than overwriting. It's
    possible that a fair amount of pagetable has already been constructed,
    so reuse what's already in place rather than ignoring and overwriting it.

    The algorithm relies on the invariant that any page which is part of
    the kernel pagetable is itself mapped in the linear memory area. This
    way, it can avoid using ioremap on a pagetable page.

    The invariant holds because it maps memory from low to high addresses,
    and also allocates memory from low to high. Each allocated page can
    map at least 2M of address space, so the mapped area will always
    progress much faster than the allocated area. It relies on the early
    boot code mapping enough pages to get started.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
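
The "each allocated page can map at least 2M" step in the argument above is plain arithmetic, sketched here under the assumption of 4096-byte pages and 8-byte pagetable entries. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE          UINT64_C(4096)
#define PTES_PER_PAGE      UINT64_C(512)   /* 4096-byte page / 8-byte entries */
#define BYTES_PER_PTE_PAGE (PAGE_SIZE * PTES_PER_PAGE)   /* 2 MiB per PTE page */

/* Number of PTE pages needed to map `bytes` using only 4k mappings. */
static uint64_t pte_pages_for(uint64_t bytes)
{
    return (bytes + BYTES_PER_PTE_PAGE - 1) / BYTES_PER_PTE_PAGE;
}

/* Memory consumed by those PTE pages. */
static uint64_t pte_bytes_for(uint64_t bytes)
{
    return pte_pages_for(bytes) * PAGE_SIZE;
}
```

Mapping 1 GiB with 4k pages therefore consumes 512 PTE pages, or 2 MiB of pagetable, a ratio of 512 to 1. That ratio is why the mapped area stays well ahead of the area being allocated from.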
     
  • Split x86_64_start_kernel() into two pieces:

    The first essentially cleans up after head_64.S. It clears the
    bss, zaps low identity mappings, sets up some early exception
    handlers.

    The second part preserves the boot data, reserves the kernel's
    text/data/bss, pagetables and ramdisk, and then starts the kernel
    proper.

    This split is so that Xen can call the second part to do the set up it
    needs done. It doesn't need any of the first part's setup, because it
    doesn't boot via head_64.S, and that setup is redundant or actively damaging.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • This matches 32 bit.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • Set __PAGE_OFFSET to the most negative possible address +
    16*PGDIR_SIZE. The gap is to allow a space for a hypervisor to fit.
    The gap is more or less arbitrary, but it's what Xen needs.

    When booting native, kernel/head_64.S has a set of compile-time
    generated pagetables used at boot time. This patch removes their
    absolutely hard-coded layout, and makes it parameterised on
    __PAGE_OFFSET (and __START_KERNEL_map).

    Signed-off-by: Eduardo Habkost
    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Eduardo Habkost
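
The address described above can be worked out by hand. The sketch below assumes 4-level paging with 48-bit virtual addresses, so that the most negative canonical address is 0xffff800000000000 and PGDIR_SHIFT is 39. The `_sketch` function names and `MOST_NEGATIVE` are hypothetical.

```c
#include <stdint.h>

#define PGDIR_SHIFT   39                            /* assumed: 4-level paging */
#define PGDIR_SIZE    (UINT64_C(1) << PGDIR_SHIFT)  /* 512 GiB per pgd entry   */
#define PTRS_PER_PGD  512
#define MOST_NEGATIVE UINT64_C(0xffff800000000000)  /* assumed: 48-bit VA      */

/* Most negative address plus a 16-entry gap for a hypervisor. */
static uint64_t page_offset_sketch(void)
{
    return MOST_NEGATIVE + 16 * PGDIR_SIZE;
}

/* Index of the top-level pagetable entry that covers `addr`. */
static unsigned pgd_index_sketch(uint64_t addr)
{
    return (unsigned)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}
```

Under these assumptions the result is 0xffff880000000000, which lands in pgd slot 272, that is, 16 slots above slot 256 where the kernel half of the address space begins.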
     
  • On 32-bit it's best to use a %cs: prefix to access memory where the
    other segments may not be set up properly yet. On 64-bit it's best
    to use a rip-relative addressing mode. Define PARA_INDIRECT() to
    abstract this and generate the proper addressing mode in each case.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • Rather than just jumping to 0 when there's a missing operation, raise a BUG.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
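
The same idea can be shown with an ordinary table of function pointers. This is a user-space analogy, not the paravirt patching code: `op_fn`, `missing_op`, `real_op` and `fill_missing_ops` are hypothetical names. Empty slots are pointed at a stub that reports the problem and stops, so a missing operation fails loudly instead of jumping to address 0.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*op_fn)(void);

/* An ordinary, present operation (counts its calls for demonstration). */
static int real_op_calls;
static void real_op(void)
{
    real_op_calls++;
}

/* Stub installed for missing operations: report and stop immediately. */
static void missing_op(void)
{
    fputs("BUG: missing operation\n", stderr);
    abort();
}

/* Replace every NULL slot with the stub; returns how many were replaced. */
static size_t fill_missing_ops(op_fn *ops, size_t n)
{
    size_t filled = 0;

    for (size_t i = 0; i < n; i++) {
        if (ops[i] == NULL) {
            ops[i] = missing_op;
            filled++;
        }
    }
    return filled;
}
```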
     
  • Jan Beulich points out that vmalloc_sync_all() assumes that the
    kernel's pmd is always expected to be present in the pgd. The current
    pgd construction code will add the pgd to the pgd_list before its pmds
    have been pre-populated, thereby making it visible to
    vmalloc_sync_all().

    However, because pgd_prepopulate_pmd also does the allocation, it may
    block and cannot be done under spinlock.

    The solution is to preallocate the pmds out of the spinlock, then
    populate them while holding the pgd_list lock.

    This patch also pulls the pmd preallocation and mop-up functions out
    to be common, assuming that the compiler will generate no code for
    them when PREALLOCATED_PMDS is 0. Also, there's no need for pgd_ctor
    to clear the pgd again, since it's allocated as a zeroed page.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar
    Cc: Jan Beulich

    Jeremy Fitzhardinge
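
The "allocate outside the lock, populate under the lock" pattern described above can be sketched in user space as follows. This is an analogy rather than the kernel code: the spinlock is a C11 `atomic_flag`, `calloc` stands in for a page allocator that may block, and `NPMDS`, `struct pgd_sketch` and all function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define NPMDS 4   /* hypothetical number of preallocated pmds per pgd */

struct pgd_sketch {
    void *pmds[NPMDS];
    struct pgd_sketch *next;
};

static atomic_flag pgd_list_lock = ATOMIC_FLAG_INIT;
static struct pgd_sketch *pgd_list;

static void list_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&pgd_list_lock,
                                             memory_order_acquire))
        ;   /* spin */
}

static void list_unlock(void)
{
    atomic_flag_clear_explicit(&pgd_list_lock, memory_order_release);
}

static struct pgd_sketch *pgd_alloc_prepopulated(void)
{
    void *pmds[NPMDS] = { 0 };
    struct pgd_sketch *pgd = calloc(1, sizeof(*pgd));

    if (!pgd)
        return NULL;

    /* Step 1: every allocation that may block happens with no lock held. */
    for (int i = 0; i < NPMDS; i++) {
        pmds[i] = calloc(1, 4096);
        if (!pmds[i]) {
            while (i-- > 0)
                free(pmds[i]);
            free(pgd);
            return NULL;
        }
    }

    /* Step 2: populate and publish under the lock, so anyone walking the
     * list only ever sees a fully populated pgd. */
    list_lock();
    for (int i = 0; i < NPMDS; i++)
        pgd->pmds[i] = pmds[i];
    pgd->next = pgd_list;
    pgd_list = pgd;
    list_unlock();

    return pgd;
}
```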
     
  • Add hooks which are called at pgd_alloc/free time. The pgd_alloc hook
    may return an error code, which if non-zero, causes the pgd allocation
    to fail. The hooks may be used to allocate/free auxiliary
    per-pgd information.

    also fix:

    > * Ingo Molnar wrote:
    >
    > include/asm/pgalloc.h: In function ‘paravirt_pgd_free':
    > include/asm/pgalloc.h:14: error: parameter name omitted
    > arch/x86/kernel/entry_64.S: In file included from
    > arch/x86/kernel/traps_64.c:51:include/asm/pgalloc.h: In function ‘paravirt_pgd_free':
    > include/asm/pgalloc.h:14: error: parameter name omitted

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
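
A minimal user-space sketch of the hook contract described above: the allocation hook returns non-zero to make the pgd allocation fail, and the free hook releases the per-pgd auxiliary data. All types and names here (`struct mm_sketch`, `pgd_entry_t`, the `hook_*` functions) are hypothetical stand-ins, not the kernel's. The free hook also shows the form of the build fix: the unused parameter still carries a name.

```c
#include <stdlib.h>

struct mm_sketch {
    void *aux;                      /* auxiliary per-pgd information */
};
typedef unsigned long pgd_entry_t;

/* Allocation hook: 0 on success, non-zero on failure. */
static int hook_alloc_aux(struct mm_sketch *mm)
{
    mm->aux = malloc(64);
    return mm->aux ? 0 : -1;
}

/* A hook that always refuses, to demonstrate the failure path. */
static int hook_always_fails(struct mm_sketch *mm)
{
    (void)mm;
    return -1;
}

/* Free hook: the pgd parameter is unused but must still be named in C. */
static void hook_free_aux(struct mm_sketch *mm, pgd_entry_t *pgd)
{
    (void)pgd;
    free(mm->aux);
    mm->aux = NULL;
}

static pgd_entry_t *pgd_alloc_with_hook(struct mm_sketch *mm,
                                        int (*hook)(struct mm_sketch *))
{
    pgd_entry_t *pgd = calloc(512, sizeof(*pgd));

    if (!pgd)
        return NULL;
    if (hook && hook(mm) != 0) {    /* non-zero: fail the whole allocation */
        free(pgd);
        return NULL;
    }
    return pgd;
}

static void pgd_free_with_hook(struct mm_sketch *mm, pgd_entry_t *pgd)
{
    hook_free_aux(mm, pgd);
    free(pgd);
}
```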
     
  • vmalloc_sync_all() is only called from register_die_notifier and
    alloc_vm_area. Neither is on any performance-critical paths, so
    vmalloc_sync_all() itself is not on any hot paths.

    Given that the optimisations in vmalloc_sync_all add a fair amount of
    code and complexity, and are fairly hard to evaluate for correctness,
    it's better to just remove them to simplify the code rather than worry
    about its absolute performance.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • Add sync_cmpxchg to match 32-bit's sync_cmpxchg.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
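
For readers unfamiliar with the cmpxchg calling convention, here is a portable C11 sketch of it. It is not the kernel's sync_cmpxchg, which is written in inline assembly; as far as I understand it, the point of the sync_ variants is that they stay locked even in a uniprocessor build, which this sketch does not model. `cmpxchg_sketch` is a hypothetical name.

```c
#include <stdatomic.h>

/* Atomically: if *ptr == expected, store desired. In either case return
 * the value that was found in *ptr, so the caller can tell whether the
 * exchange happened by comparing the result with `expected`. */
static unsigned long cmpxchg_sketch(_Atomic unsigned long *ptr,
                                    unsigned long expected,
                                    unsigned long desired)
{
    /* On failure, C11 writes the current value back into `expected`;
     * on success, `expected` already equals the old value. */
    atomic_compare_exchange_strong(ptr, &expected, desired);
    return expected;
}
```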
     
  • Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • fix:

    In file included from arch/x86/kernel/setup.c:118:
    include/asm/highmem.h:64: error: expected identifier or ‘(' before ‘do'
    include/asm/highmem.h:64: error: expected identifier or ‘(' before ‘while'
    include/asm/highmem.h:67: error: expected identifier or ‘(' before ‘do'
    include/asm/highmem.h:67: error: expected identifier or ‘(' before ‘while'

    Signed-off-by: Ingo Molnar

    Ingo Molnar