11 Feb, 2009

1 commit

  • Impact: fix x86_32 stack protector

    Brian Gerst found out that %gs was being initialized to stack_canary
    instead of stack_canary - 20, which basically gave the same canary
    value for all threads. Fixing this also exposed the following bugs.

    * cpu_idle() didn't call boot_init_stack_canary()

    * stack canary switching in switch_to() was being done too late, making
    the initial run of a new thread use the old stack canary value.

    Fix all of them and while at it update comment in cpu_idle() about
    calling boot_init_stack_canary().

    Reported-by: Brian Gerst
    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

10 Feb, 2009

10 commits

  • Impact: stack protector for x86_32

    Implement stack protector for x86_32. GDT entry 28 is used for it.
    It's set to point to stack_canary-20 and has a length of 24 bytes.
    CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs
    to the stack canary segment on entry. As %gs is otherwise unused by
    the kernel, the canary can be anywhere. It's defined as a percpu
    variable.
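
    The offset arithmetic above can be sketched in plain C. This is only an
    illustration, assuming gcc's x86_32 stack protector reads the canary from
    %gs:20 (hence the stack_canary-20 segment base) and that the canary is
    4 bytes wide. The helper names are hypothetical, not kernel functions.

```c
#include <stdint.h>

/* Assumption: gcc's x86_32 stack protector loads the canary from %gs:20. */
#define GCC_CANARY_OFFSET 20u
/* 20 bytes of offset plus a 4-byte canary gives the 24-byte segment length. */
#define CANARY_SEGMENT_LEN (GCC_CANARY_OFFSET + (unsigned)sizeof(uint32_t))

/* Hypothetical helper: segment base that makes %gs:20 land on the canary. */
static uintptr_t canary_segment_base(uintptr_t canary_addr)
{
    return canary_addr - GCC_CANARY_OFFSET;
}

/* Linear address that a %gs-relative access at `offset` resolves to. */
static uintptr_t gs_resolve(uintptr_t segment_base, uintptr_t offset)
{
    return segment_base + offset;
}
```

    If the base were the canary's own address rather than the address minus
    20, gs_resolve() would land 20 bytes past the canary.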

    x86_32 exception handlers take register frame on stack directly as
    struct pt_regs. With -fstack-protector turned on, gcc copies the
    whole structure after the stack canary and (of course) doesn't copy
    back on return, thus losing all changes. For now, -fno-stack-protector
    is added to all files which contain those functions. We definitely
    need something better.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • Impact: pt_regs changed, lazy gs handling made optional, add slight
    overhead to SAVE_ALL, simplifies error_code path a bit

    On x86_32, %gs hasn't been used by the kernel and is handled lazily.
    pt_regs has no slot for it and %gs is saved/loaded only when necessary.
    In preparation for stack protector support, this patch makes lazy %gs
    handling optional by doing the following.

    * Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs.

    * Save and restore %gs along with other registers in entry_32.S unless
    LAZY_GS. Note that this unfortunately adds "pushl $0" on SAVE_ALL
    even when LAZY_GS. However, it adds no overhead to common exit path
    and simplifies entry path with error code.

    * Define different user_gs accessors depending on LAZY_GS and add
    lazy_save_gs() and lazy_load_gs() which are noop if !LAZY_GS. The
    lazy_*_gs() ops are used to save, load and clear %gs lazily.

    * Define ELF_CORE_COPY_KERNEL_REGS() which always reads %gs directly.

    xen and lguest changes need to be verified.

    Signed-off-by: Tejun Heo
    Cc: Jeremy Fitzhardinge
    Cc: Rusty Russell

    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • Impact: cleanup

    On x86_32, %gs is handled lazily. It's not saved and restored on
    kernel entry/exit but only when necessary, which is usually during task
    switch, although there are a few other places. Currently, it's done by
    calling savesegment() and loadsegment() explicitly. Define
    get_user_gs(), set_user_gs() and task_user_gs() and use them instead.

    While at it, clean up register access macros in signal.c.

    This cleans up code a bit and will help future changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • Impact: cleanup

    Use .macro instead of cpp #define where appropriate. This cleans up
    code and will ease future changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • Impact: avoid crash on vsyscall

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • Impact: no default -fno-stack-protector if stackp is enabled, cleanup

    Stackprotector make rules had the following problems.

    * cc support test and warning are scattered across makefile and
    kernel/panic.c.

    * -fno-stack-protector was always added regardless of configuration.

    Update such that cc support test and warning are contained in makefile
    and -fno-stack-protector is added iff stackp is turned off. While at
    it, prepare for 32bit support.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • Impact: misc update

    * wrap content with CONFIG_CC_STACKPROTECTOR so that other arch files
    can include it directly

    * add missing includes

    This will help future changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • Conflicts:
    arch/x86/kernel/acpi/boot.c

    Ingo Molnar
     
  • Ingo Molnar
     
  • do_device_not_available() is the handler for #NM and it declares that
    it takes an unsigned long and calls math_emulate(), which takes a long
    argument and surprisingly expects the stack frame starting at the zeroth
    argument to match struct math_emu_info, which isn't true regardless
    of configuration in the current code.

    This patch makes do_device_not_available() take struct pt_regs like
    other exception handlers, initialize struct math_emu_info with a
    pointer to it, and pass a pointer to the math_emu_info to math_emulate()
    like normal C functions do. This way, unless gcc makes a copy of
    struct pt_regs in do_device_not_available(), the register frame is
    correctly accessed regardless of kernel configuration or compiler
    used.

    This doesn't fix all math_emu problems but it at least gets it
    somewhat working.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

09 Feb, 2009

9 commits

  • Conflicts:
    arch/x86/mach-voyager/voyager_smp.c
    arch/x86/mm/fault.c

    Ingo Molnar
     
  • Impact: cleanup

    * Come on, struct info? s/struct info/struct math_emu_info/

    * Use struct pt_regs and kernel_vm86_regs instead of defining its own
    register frame structure.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • Impact: dump the correct %gs into a.out core dump

    aout_dump_thread() read %gs but didn't include it in core dump. Fix
    it.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • Commit 6194ba6ff6ccf8d5c54c857600843c67aa82c407 ("x86: don't special-case
    pmd allocations as much") made changes to the way we handle pmd allocations,
    and while doing that it dropped a call to paravirt_release_pd on the
    pgd page from the pgd_dtor code path.

    Because of this missing release, the hypervisor is unaware that the pgd
    page has been freed, and it keeps tracking the page as a page table page.

    After this the guest may start using the same page for other purposes, and
    depending on what use the page is put to, it may result in various performance
    and/or functional issues (hangs, reboots).

    Since this release is only required for VMI, I now release the pgd page from
    the (vmi)_pgd_free hook.

    Signed-off-by: Alok N Kataria
    Acked-by: Jeremy Fitzhardinge
    Signed-off-by: Ingo Molnar
    Cc:

    Alok Kataria
     
  • Impact: find right nr_irqs_gsi on some systems.

    One test system has a gap between GSIs:

    [ 0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
    [ 0.000000] IOAPIC[0]: apic_id 4, version 0, address 0xfec00000, GSI 0-23
    [ 0.000000] ACPI: IOAPIC (id[0x05] address[0xfeafd000] gsi_base[48])
    [ 0.000000] IOAPIC[1]: apic_id 5, version 0, address 0xfeafd000, GSI 48-54
    [ 0.000000] ACPI: IOAPIC (id[0x06] address[0xfeafc000] gsi_base[56])
    [ 0.000000] IOAPIC[2]: apic_id 6, version 0, address 0xfeafc000, GSI 56-62
    ...
    [ 0.000000] nr_irqs_gsi: 38

    So nr_irqs_gsi is not right, and some IRQs for MSI will overlap with the
    io_apic ones.

    We need to get the value from acpi_probe_gsi when the ACPI io_apic is used.
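
    The mismatch can be reproduced with a little arithmetic. The sketch below
    is a user-space illustration only: struct gsi_range and both functions are
    hypothetical names, and it assumes the correct value is the highest GSI
    plus one, which may differ from what acpi_probe_gsi actually computes.

```c
#include <stddef.h>

/* Hypothetical description of one IOAPIC's GSI range (inclusive bounds). */
struct gsi_range {
    int first;
    int last;
};

/* Counts routing entries only; gaps between ranges are ignored. */
static int count_gsi_entries(const struct gsi_range *r, size_t n)
{
    int total = 0;

    for (size_t i = 0; i < n; i++)
        total += r[i].last - r[i].first + 1;
    return total;
}

/* Highest GSI in use plus one; gaps between ranges are accounted for. */
static int highest_gsi_plus_one(const struct gsi_range *r, size_t n)
{
    int top = 0;

    for (size_t i = 0; i < n; i++)
        if (r[i].last + 1 > top)
            top = r[i].last + 1;
    return top;
}
```

    For the three ranges shown in the log (0-23, 48-54, 56-62), the first
    function gives 38, matching the reported nr_irqs_gsi, while the second
    gives 63.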

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • For Intel 7400 series CPUs, the recommendation is to use a clflush on the
    monitored address just before the monitor and mwait pair [1].

    This clflush makes sure that there are no false wakeups from mwait when the
    monitored address was recently written to.

    [1] "MONITOR/MWAIT Recommendations for Intel Xeon Processor 7400 series"
    section in specification update document of 7400 series
    http://download.intel.com/design/xeon/specupdt/32033601.pdf

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Ingo Molnar

    Pallipadi, Venkatesh
     
  • Impact: bug fix

    Don't use per_cpu_offset() to determine if it is valid to access a
    per-cpu variable for a given cpu number. It is not a valid assumption
    on x86-64 anymore. Use cpu_possible() instead.

    Signed-off-by: Brian Gerst
    Signed-off-by: Ingo Molnar

    Brian Gerst
     
  • Impact: cleanup and bug fix

    Use the linker to create symbols for certain per-cpu variables
    that are offset by __per_cpu_load. This allows the removal of
    the runtime fixup of the GDT pointer, which fixes a bug with
    resume reported by Jiri Slaby.

    Reported-by: Jiri Slaby
    Signed-off-by: Brian Gerst
    Acked-by: Jiri Slaby
    Signed-off-by: Ingo Molnar

    Brian Gerst
     
  • Impact: bug fix

    IA-64 needs to put percpu data in a separate section even on UP.
    Fixes regression caused by "percpu: refactor percpu.h"

    Signed-off-by: Brian Gerst
    Acked-by: Tony Luck
    Signed-off-by: Ingo Molnar

    Brian Gerst
     

08 Feb, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
    PCI PM: make the PM core more careful with drivers using the new PM framework
    PCI PM: Read power state from device after trying to change it on resume
    PCI PM: Do not disable and enable bridges during suspend-resume
    PCI: PCIe portdrv: Simplify suspend and resume
    PCI PM: Fix saving of device state in pci_legacy_suspend
    PCI PM: Check if the state has been saved before trying to restore it
    PCI PM: Fix handling of devices without drivers
    PCI: return error on failure to read PCI ROMs
    PCI: properly clean up ASPM link state on device remove

    Linus Torvalds
     

07 Feb, 2009

2 commits

  • …isc', 'printk' and 'processor' into release

    Len Brown
     
  • One of my past fixes to this code introduced a different new bug.
    When using 32-bit "int $0x80" entry for a bogus syscall number,
    the return value is not correctly set to -ENOSYS. This only happens
    when neither syscall-audit nor syscall tracing is enabled (i.e., never
    seen if auditd ever started). Test program:

    /* gcc -o int80-badsys -m32 -g int80-badsys.c
       Run on x86-64 kernel.
       Note to reproduce the bug you need auditd never to have started. */

    #include <errno.h>
    #include <stdio.h>

    int
    main (void)
    {
      long res;
      asm ("int $0x80" : "=a" (res) : "0" (99999));
      printf ("bad syscall returns %ld\n", res);
      return res != -ENOSYS;
    }

    The fix makes the int $0x80 path match the sysenter and syscall paths.

    Reported-by: Dmitry V. Levin
    Signed-off-by: Roland McGrath

    Roland McGrath
     

06 Feb, 2009

4 commits

  • Prevent kprobes from catching spurious faults, which would cause infinitely
    recursive page faults and memory corruption through stack overflow.

    Signed-off-by: Masami Hiramatsu
    Cc: [2.6.28.x]
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     
  • * 'sh/for-2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    sh: Fix up T-bit error handling in SH-4A mutex fastpath.
    sh: Fix up spurious syscall restarting.
    sh: fcnvds fix with denormalized numbers on SH-4 FPU.
    sh: Only reserve memory under CONFIG_ZERO_PAGE_OFFSET when it != 0.
    sh: Handle calling csum_partial with misaligned data
    sh: ap325rxa: Enable ov772x in defconfig.
    sh: ap325rxa: Add ov772x support.
    sh: ap325rxa: control camera power toggling.
    sh: mach-migor: Enable ov772x and tw9910 in defconfig.

    Linus Torvalds
     
  • Do usual do {} while (0) dance, otherwise

    fs/gfs2/util.c:99: error: expected expression before 'else'
    drivers/scsi/lpfc/lpfc_sli.c:363: error: expected expression before 'else'
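
    For readers unfamiliar with the idiom, here is a minimal sketch of why a
    statement-like macro is wrapped in do {} while (0). LOG_DO_WHILE and
    classify() are hypothetical names for illustration; this is not the macro
    the commit changed and it does not reproduce the exact errors above.

```c
#include <stdio.h>

/*
 * A macro written as a bare block, e.g.
 *     #define LOG_BRACES(msg) { puts(msg); }
 * breaks when used as "if (c) LOG_BRACES("x"); else ...": the ';' after
 * the closing brace ends the if statement, leaving the else dangling.
 *
 * Wrapping the body in do { } while (0) makes the expansion a single
 * statement that needs the trailing ';', so it is safe in if/else.
 */
#define LOG_DO_WHILE(msg) do { puts(msg); } while (0)

static int classify(int x)
{
    if (x > 0)
        LOG_DO_WHILE("positive");   /* one statement; ';' completes it */
    else
        return -1;
    return 1;
}
```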

    Signed-off-by: Alexey Dobriyan
    Acked-by: Ivan Kokshaysky
    Cc: Richard Henderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Make the following style cleanups:

    * drop unnecessary //#include from xen-asm_32.S
    * compulsive adding of space after comma
    * reformat multiline comments

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

05 Feb, 2009

13 commits

  • There are recurring issues with DMAR support on certain platforms, and
    a number of filesystem corruption incidents have been reported:

    https://bugzilla.redhat.com/show_bug.cgi?id=479996
    http://bugzilla.kernel.org/show_bug.cgi?id=12578

    Provide a Kconfig option to change whether it is enabled by
    default.

    If disabled, it can still be reenabled by passing intel_iommu=on to the
    kernel. Keep the .config option off by default.

    Signed-off-by: Kyle McMartin
    Signed-off-by: Andrew Morton
    Acked-By: David Woodhouse
    Signed-off-by: Ingo Molnar

    Kyle McMartin
     
  • On an x86 system which doesn't support global mappings,
    __supported_pte_mask has _PAGE_GLOBAL clear, to make sure it never
    appears in the PTE. pfn_pte() and so on will enforce it with:

    static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
    {
            return __pte((((phys_addr_t)page_nr << PAGE_SHIFT) |
                          pgprot_val(pgprot)) & __supported_pte_mask);
    }

    However, we overload _PAGE_GLOBAL with _PAGE_PROTNONE on non-present
    ptes to distinguish them from swap entries, so applying
    __supported_pte_mask indiscriminately will clear the bit and corrupt the
    pte.

    I guess the best fix is to only apply __supported_pte_mask to present
    ptes. This seems like the right solution to me, as it means we can
    completely ignore the issue of overlaps between the present pte bits and
    the non-present pte-as-swap entry use of the bits.

    __supported_pte_mask contains the set of flags we support on the
    current hardware. We also use bits in the pte for things like
    logically present ptes with no permissions, and swap entries for
    swapped out pages. We should only apply __supported_pte_mask to
    present ptes, because otherwise we may destroy other information being
    stored in the ptes.
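
    The idea of masking only present ptes can be sketched in user-space C.
    This is an illustration under stated assumptions, not the actual kernel
    change: make_pte() and supported_pte_mask are hypothetical stand-ins, and
    the bit positions (present = bit 0, global = bit 8) follow the usual x86
    layout.

```c
#include <stdint.h>

typedef uint64_t pteval_t;

#define PAGE_SHIFT      12
#define _PAGE_PRESENT   ((pteval_t)1 << 0)
#define _PAGE_GLOBAL    ((pteval_t)1 << 8)
/* Same bit as _PAGE_GLOBAL; only meaningful when the pte is not present. */
#define _PAGE_PROTNONE  _PAGE_GLOBAL

/* Stand-in for hardware without global pages: _PAGE_GLOBAL is cleared. */
static pteval_t supported_pte_mask = ~_PAGE_GLOBAL;

/* Hypothetical pfn_pte() variant: filter flags only for present ptes. */
static pteval_t make_pte(uint64_t page_nr, pteval_t prot)
{
    pteval_t val = (page_nr << PAGE_SHIFT) | prot;

    if (val & _PAGE_PRESENT)
        val &= supported_pte_mask;  /* drop unsupported hardware flags */
    return val;                     /* non-present ptes keep all bits */
}
```

    With this shape, a non-present PROTNONE pte keeps its marker bit, while a
    present pte still has the unsupported global bit stripped.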

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: H. Peter Anvin

    Jeremy Fitzhardinge
     
  • Impact: cleanup

    In the __put_user_size() macro, errret is used for the error value.
    But if size is 8, errret isn't passed to __put_user_asm_u64().
    This behavior is inconsistent.

    Signed-off-by: Hiroshi Shimamoto
    Signed-off-by: H. Peter Anvin

    Hiroshi Shimamoto
     
  • Enable the use of the direct vcpu-access operations on 64-bit.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: H. Peter Anvin

    Jeremy Fitzhardinge
     
  • Now that x86-64 has directly accessible percpu variables, it can also
    implement the direct versions of these operations, which operate on a
    vcpu_info structure directly embedded in the percpu area.

    In fact, the 64-bit versions are more or less identical to the 32-bit
    ones, and so can be shared. The only two differences are:
    1. xen_restore_fl_direct takes its argument in eax on 32-bit, and rdi on
    64-bit. Unfortunately it isn't possible to refer to the second least
    significant byte of rdi directly (as you can with %ah), so the code
    isn't quite as dense.
    2. check_events needs two variants to save different registers.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: H. Peter Anvin

    Jeremy Fitzhardinge
     
  • We need to access percpu data fairly early, so set up the percpu
    registers as soon as possible. We only need to load the appropriate
    segment register. We already have a GDT, but it's hard to change it
    early because we need to manipulate the pagetable to do so, and that
    hasn't been set up yet.

    Also, set the kernel stack when bringing up secondary CPUs. If we
    don't they all end up sharing the same stack...

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: H. Peter Anvin

    Jeremy Fitzhardinge
     
  • This patch makes the ROM reading code return an error to user space if
    the size of the ROM read is equal to 0.

    The patch also emits a warning if the contents of the ROM are invalid,
    and documents the effects of the "enable" file on ROM reading.

    Signed-off-by: Timothy S. Nelson
    Acked-by: Alex Villacis-Lasso
    Signed-off-by: Jesse Barnes

    Timothy S. Nelson
     
  • H. Peter Anvin
     
  • Moving the mmu code from enlighten.c to mmu.c inadvertently broke the
    32-bit build. Fix it.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: H. Peter Anvin

    Jeremy Fitzhardinge
     
  • Fix user-visible grammo.

    Signed-off-by: Alex Chiang
    Signed-off-by: Ingo Molnar

    Alex Chiang
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: APIC: enable workaround on AMD Fam10h CPUs
    xen: disable interrupts before saving in percpu
    x86: add x86@kernel.org to MAINTAINERS
    x86: push old stack address on irqstack for unwinder
    irq, x86: fix lock status with numa_migrate_irq_desc
    x86: add cache descriptors for Intel Core i7
    x86/Voyager: make it build and boot

    Linus Torvalds
     
  • Impact: cleanup

    Some lines exceed the 80 char width making them unreadable.

    Signed-off-by: Borislav Petkov
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • This patch echoes what we already do on 32-bit since
    90f7d25c6b672137344f447a30a9159945ffea72, and prints the DMI
    product name in show_regs, so that system specific problems can be
    easily identified.

    Signed-off-by: Kyle McMartin
    Signed-off-by: Ingo Molnar

    Kyle McMartin