18 May, 2010

2 commits

  • …el/git/tip/linux-2.6-tip

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    lockdep: Reduce stack_trace usage
    lockdep: No need to disable preemption in debug atomic ops
    lockdep: Actually _dec_ in debug_atomic_dec
    lockdep: Provide off case for redundant_hardirqs_on increment
    lockdep: Simplify debug atomic ops
    lockdep: Fix redundant_hardirqs_on incremented with irqs enabled
    lockstat: Make lockstat counting per cpu
    i8253: Convert i8253_lock to raw_spinlock

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86/amd-iommu: Add amd_iommu=off command line option
    iommu-api: Remove iommu_{un}map_range functions
    x86/amd-iommu: Implement ->{un}map callbacks for iommu-api
    x86/amd-iommu: Make amd_iommu_iova_to_phys aware of multiple page sizes
    x86/amd-iommu: Make iommu_unmap_page and fetch_pte aware of page sizes
    x86/amd-iommu: Make iommu_map_page and alloc_pte aware of page sizes
    kvm: Change kvm_iommu_map_pages to map large pages
    VT-d: Change {un}map_range functions to implement {un}map interface
    iommu-api: Add ->{un}map callbacks to iommu_ops
    iommu-api: Add iommu_map and iommu_unmap functions
    iommu-api: Rename ->{un}map function pointers to ->{un}map_range

    Linus Torvalds
     

17 May, 2010

1 commit


15 May, 2010

5 commits

  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, mrst: Don't blindly access extended config space

    Linus Torvalds
     
  • Do not blindly access extended configuration space unless we actively
    know we're on a Moorestown platform. The fixed-size BAR capability
    lives in the extended configuration space, and thus is not applicable
    if the configuration space isn't appropriately sized.

    This fixes booting certain VMware configurations with CONFIG_MRST=y.

    Moorestown will add a fake PCI-X 266 capability to advertise the
    presence of extended configuration space.

    Reported-and-tested-by: Petr Vandrovec
    Signed-off-by: H. Peter Anvin
    Acked-by: Jacob Pan
    Acked-by: Jesse Barnes
    LKML-Reference:

    H. Peter Anvin
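A minimal sketch of the kind of guard this describes, assuming the device's configuration space size is already known: an access is allowed only if it fits inside that space, so offsets at or above 0x100 (the extended space) are refused on a 256-byte device. `cfg_access_ok()` is a hypothetical helper, not the kernel's code; the two sizes mirror the kernel's PCI_CFG_SPACE_SIZE and PCI_CFG_SPACE_EXP_SIZE.

```c
#include <stdbool.h>

/* Conventional and PCIe extended config space sizes, in bytes. */
#define CFG_SPACE_SIZE     256u
#define CFG_SPACE_EXP_SIZE 4096u

/* Hypothetical guard: permit an access of 'len' bytes at 'offset' only
 * if it lies entirely inside the config space the device really has. */
static bool cfg_access_ok(unsigned int cfg_size, unsigned int offset,
                          unsigned int len)
{
    return len != 0 && offset < cfg_size && len <= cfg_size - offset;
}
```

With this shape, a search for a capability in the extended space simply fails on a device that only advertises 256 bytes, instead of reading past the end.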
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
    x86, k8: Fix build error when K8_NB is disabled
    x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs
    x86: Fix fake apicid to node mapping for numa emulation

    Linus Torvalds
     
    When running a guest kernel on Xen, we get:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
    IP: [] cpuid4_cache_lookup_regs+0x2ca/0x3df
    PGD 0
    Oops: 0000 [#1] SMP
    last sysfs file:
    CPU 0
    Modules linked in:

    Pid: 0, comm: swapper Tainted: G W 2.6.34-rc3 #1 /HVM domU
    RIP: 0010:[] [] cpuid4_cache_lookup_regs+0x2ca/0x3df
    RSP: 0018:ffff880002203e08 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
    RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
    RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
    R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
    R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
    FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
    Stack:
    ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
    ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
    0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
    Call Trace:

    [] ? sync_supers_timer_fn+0x0/0x1c
    [] ? mod_timer+0x23/0x25
    [] ? arm_supers_timer+0x34/0x36
    [] ? hrtimer_get_next_event+0xa7/0xc3
    [] ? get_next_timer_interrupt+0x19a/0x20d
    [] get_cpu_leaves+0x5c/0x232
    [] ? sched_clock_local+0x1c/0x82
    [] ? sched_clock_tick+0x75/0x7a
    [] generic_smp_call_function_single_interrupt+0xae/0xd0
    [] smp_call_function_single_interrupt+0x18/0x27
    [] call_function_single_interrupt+0x13/0x20

    [] ? notifier_call_chain+0x14/0x63
    [] ? native_safe_halt+0xc/0xd
    [] ? default_idle+0x36/0x53
    [] cpu_idle+0xaa/0xe4
    [] rest_init+0x7e/0x80
    [] start_kernel+0x40e/0x419
    [] x86_64_start_reservations+0xb3/0xb7
    [] x86_64_start_kernel+0xf8/0x107
    Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
    00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 70 38 48 8d 8d 5c ff
    ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
    RIP [] cpuid4_cache_lookup_regs+0x2ca/0x3df
    RSP
    CR2: 0000000000000038
    ---[ end trace a7919e7f17c0a726 ]---

    The L3 cache index disable feature of AMD CPUs has to be disabled if the
    kernel is running as a guest on top of a hypervisor because northbridge
    devices are not available to the guest. Currently, this fixes a boot
    crash on top of Xen. In the future this will become an issue on KVM as
    well.

    Check if northbridge devices are present and do not enable the feature
    if there are none.

    [ hpa: backported to 2.6.34 ]

    Signed-off-by: Frank Arnold
    LKML-Reference:
    Acked-by: Borislav Petkov
    Signed-off-by: H. Peter Anvin
    Cc:

    Frank Arnold
     
    K8_NB depends on PCI, and when the latter is disabled (allnoconfig), we
    fail at the final linking stage due to the missing export of
    num_k8_northbridges. Add a header stub for that.

    Signed-off-by: Borislav Petkov
    LKML-Reference:
    Signed-off-by: H. Peter Anvin
    Cc:

    Borislav Petkov
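One common shape for such a header stub, shown as a sketch rather than the exact patch: when CONFIG_K8_NB is not set, the count becomes a compile-time 0, so callers still compile and no undefined symbol reaches the linker. `have_northbridges()` is a hypothetical caller, and the `int` type of the extern declaration is an assumption.

```c
/* Stub pattern: a real declaration when the feature is built in,
 * a constant fallback when it is configured out. */
#ifdef CONFIG_K8_NB
extern int num_k8_northbridges;   /* defined by the K8 northbridge code */
#else
#define num_k8_northbridges 0     /* no PCI, so no northbridges found */
#endif

/* Hypothetical caller that builds and links in both configurations. */
static int have_northbridges(void)
{
    return num_k8_northbridges > 0;
}
```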
     

14 May, 2010

1 commit


13 May, 2010

3 commits

  • As the processor may not consider GUEST_INTR_STATE_STI as a reason for
    blocking NMI, it could return immediately with EXIT_REASON_NMI_WINDOW
    when we asked for it. But as we consider this state as NMI-blocking, we
    can run into an endless loop.

    Resolve this by allowing NMI injection if just GUEST_INTR_STATE_STI is
    active (originally suggested by Gleb). Intel confirmed that this is
    safe: the processor will never complain about NMI injection in this
    state.

    Signed-off-by: Jan Kiszka
    KVM-Stable-Tag
    Acked-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
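A sketch of the changed test, modelled loosely on KVM's VMX code (the function names here are hypothetical). The bit values are those of the VMX guest interruptibility-state field; the fix amounts to dropping STI blocking from the set of states that defer NMI injection.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the VMX guest interruptibility-state field. */
#define GUEST_INTR_STATE_STI    0x00000001u  /* blocking by STI */
#define GUEST_INTR_STATE_MOV_SS 0x00000002u  /* blocking by MOV SS */
#define GUEST_INTR_STATE_SMI    0x00000004u  /* blocking by SMI */
#define GUEST_INTR_STATE_NMI    0x00000008u  /* blocking by NMI */

/* Before: STI blocking also deferred the NMI, so an NMI-window exit
 * taken with only STI set could repeat endlessly. */
static bool nmi_allowed_old(uint32_t intr_state)
{
    return !(intr_state & (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS |
                           GUEST_INTR_STATE_NMI));
}

/* After: STI blocking alone no longer prevents NMI injection. */
static bool nmi_allowed_new(uint32_t intr_state)
{
    return !(intr_state & (GUEST_INTR_STATE_MOV_SS | GUEST_INTR_STATE_NMI));
}
```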
     
    cpuid_update may operate on the VMCS, so vcpu_load() and vcpu_put()
    should be called around it to ensure correctness.

    Signed-off-by: Dongxiao Xu
    Signed-off-by: Marcelo Tosatti

    Dongxiao Xu
     
    This patch makes KVM on 32-bit SVM work again by
    correcting the masks used for iret interception. With the
    wrong masks, the upper 32 bits of the intercepts are masked
    out, which leaves vmrun unintercepted. This is not legal on
    SVM, and the vmrun fails.
    The bug was introduced by commits 95ba827313 and 3cfc3092.

    Cc: Jan Kiszka
    Cc: Gleb Natapov
    Cc: stable@kernel.org
    Signed-off-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Joerg Roedel
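A sketch of how a 32-bit mask silently clears the upper half of a 64-bit intercept vector. The bit positions (IRET at 20, VMRUN at 32) are assumptions chosen to mirror the SVM intercept layout, and `uint32_t` stands in for a 32-bit `unsigned long`; building the mask from a 64-bit constant is the usual shape of such a fix, not necessarily the exact patch.

```c
#include <stdint.h>

/* Assumed bit positions in the 64-bit SVM intercept vector. */
#define INTERCEPT_IRET  20
#define INTERCEPT_VMRUN 32

/* Buggy on 32-bit: ~(1UL << n) is a 32-bit value there. Zero-extended
 * to 64 bits, it also clears bits 32..63, including the VMRUN intercept.
 * uint32_t models the 32-bit unsigned long. */
static uint64_t clear_iret_buggy(uint64_t intercept)
{
    return intercept & ~(uint32_t)(1u << INTERCEPT_IRET);
}

/* Fixed: the mask is built in 64 bits, so only the IRET bit is cleared. */
static uint64_t clear_iret_fixed(uint64_t intercept)
{
    return intercept & ~(1ULL << INTERCEPT_IRET);
}
```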
     

11 May, 2010

3 commits

  • Conflicts:
    arch/x86/kernel/amd_iommu.c

    Joerg Roedel
     
  • This patch adds a command line option to tell the AMD IOMMU
    driver to not initialize any IOMMU it finds.

    Signed-off-by: Joerg Roedel

    Joerg Roedel
     
    Fix kprobe/x86 to check for a removed int3 when it fails to get a
    kprobe from the hlist. Since there is a time window between checking
    that an int3 exists at the probed address and getting the kprobe for
    that address, the following scenario can occur:

    -------
    CPU1                        CPU2
    hit int3
    check int3 exists
                                remove int3
                                remove kprobe from hlist
    get kprobe from hlist
    no kprobe->OOPS!
    -------

    This patch fixes the problem by moving the int3 check so that it is
    done only if there is no kprobe at that address, as follows:

    ------
    CPU1                        CPU2
    hit int3
                                remove int3
                                remove kprobe from hlist
    get kprobe from hlist
    no kprobe->check int3 exists
    ->rollback&retry
    ------

    Signed-off-by: Masami Hiramatsu
    Acked-by: Ananth N Mavinakayanahalli
    Cc: systemtap
    Cc: DLE
    Cc: Dave Anderson
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
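The reordered check can be sketched as follows. Everything here is hypothetical: a flat array stands in for the kprobe hlist, and `insn` points at the byte currently at the probed address. What matters is the order: the lookup happens first, and the int3 test only runs once the lookup has failed, so a concurrent removal shows up as a missing int3 and triggers a retry instead of an oops.

```c
#include <stddef.h>
#include <stdint.h>

#define BREAKPOINT_INSN 0xcc  /* x86 int3 opcode */

enum int3_result { INT3_HANDLED, INT3_RETRY, INT3_NOT_OURS };

/* Hypothetical probe record; a flat table stands in for the hlist. */
struct probe { uintptr_t addr; };

static struct probe *lookup_probe(struct probe *table, size_t n,
                                  uintptr_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].addr == addr)
            return &table[i];
    return NULL;
}

/* Look up first; only on failure re-read the instruction byte. */
static enum int3_result handle_int3(struct probe *table, size_t n,
                                    uintptr_t addr, const uint8_t *insn)
{
    if (lookup_probe(table, n, addr))
        return INT3_HANDLED;
    if (*insn != BREAKPOINT_INSN)
        return INT3_RETRY;    /* removed concurrently: rollback & retry */
    return INT3_NOT_OURS;     /* an int3 that no probe owns */
}
```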
     

06 May, 2010

1 commit

  • With NUMA emulation, it's possible for a single cpu to be bound
    to multiple nodes since more than one may have affinity if
    allocated on a physical node that is local to the cpu.

    APIC ids must therefore be mapped to the lowest node ids to
    maintain generic kernel use of functions such as cpu_to_node()
    that determine device affinity. For example, if a device has
    proximity to physical node 1 and a cpu happens to
    be mapped to a higher emulated node id 8, the proximity may not
    be correctly determined by comparison in generic code even
    though the cpu may be truly local and allocated on physical node 1.

    When this happens, the true topology of the machine isn't
    accurately represented in the emulated environment; although
    this isn't critical to the system's uptime, any generic code
    that is NUMA aware benefits from the physical topology being
    accurately represented.

    This can affect any system that maps multiple APIC ids to a
    single node and is booted with numa=fake=N where N is greater
    than the number of physical nodes.

    Signed-off-by: David Rientjes
    Cc: Yinghai Lu
    Cc: Suresh Siddha
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    David Rientjes
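A minimal sketch of the "lowest node id" rule, assuming a hypothetical table `phys_of_emu[]` that records which physical node backs each emulated node; the kernel's actual data structures differ.

```c
/* Map a CPU on physical node 'pnode' to the lowest-numbered emulated
 * node backed by that physical node; -1 if there is none. */
static int lowest_emu_node(const int *phys_of_emu, int nr_emu, int pnode)
{
    for (int i = 0; i < nr_emu; i++)
        if (phys_of_emu[i] == pnode)
            return i;
    return -1;
}
```

For example, with numa=fake=8 on a two-node machine, emulated nodes might alternate between physical nodes 0 and 1; a CPU local to physical node 1 then maps to emulated node 1 rather than to a higher id such as 7.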
     

05 May, 2010

2 commits


03 May, 2010

3 commits


01 May, 2010

1 commit


30 Apr, 2010

1 commit

    When specifying the 'reservetop=0xbadc0de' kernel parameter,
    the kernel will stop booting due to an early_ioremap bug that
    relates to commit 8827247ff.

    The root cause of the boot failure is that the value of
    'slot_virt[i]' is initialized in setup_arch->early_ioremap_init(),
    but later in setup_arch, the function 'parse_early_param' will
    modify 'FIXADDR_TOP' when 'reservetop=0xbadc0de' is specified.

    The simplest fix might be to use __fix_to_virt(idx0) to get the
    updated value of 'FIXADDR_TOP' in '__early_ioremap' instead of
    referencing the old value from slot_virt[slot] directly.

    Changelog since v0:

    -v1: When reservetop is handled, FIXADDR_TOP gets adjusted;
    hence check prev_map, then re-initialize slot_virt and the
    PMD based on the new FIXADDR_TOP.

    -v2: place fixup_early_ioremap, which calls early_ioremap_init, in
    reserve_top_address to re-initialize slot_virt and the
    corresponding PMD when parsing reservetop

    -v3: move fixup_early_ioremap out of reserve_top_address to make
    sure other clients of reserve_top_address, like xen/lguest, won't
    break

    Signed-off-by: Liang Li
    Tested-by: Konrad Rzeszutek Wilk
    Acked-by: Yinghai Lu
    Acked-by: Jeremy Fitzhardinge
    Cc: Wang Chen
    Cc: "H. Peter Anvin"
    Cc: Andrew Morton
    LKML-Reference:
    [ fixed three small cleanliness details in fixup_early_ioremap() ]
    Signed-off-by: Ingo Molnar

    Liang Li
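A sketch of why a value cached at init time goes stale. The formula matches the kernel's `__fix_to_virt()` (fixmap slots sit one page apart, counting down from FIXADDR_TOP), but `fixaddr_top`, `slot_virt_cached`, and `early_init()` are hypothetical stand-ins for FIXADDR_TOP, `slot_virt[]`, and early_ioremap_init().

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical stand-in for FIXADDR_TOP; reservetop= may lower it later. */
static uintptr_t fixaddr_top = 0xfffff000u;

/* Same formula as __fix_to_virt(): slot idx sits idx pages below the top. */
static uintptr_t fix_to_virt(unsigned int idx)
{
    return fixaddr_top - ((uintptr_t)idx << PAGE_SHIFT);
}

/* Hypothetical cache filled once at init, like slot_virt[]. */
static uintptr_t slot_virt_cached;

static void early_init(unsigned int idx0)
{
    slot_virt_cached = fix_to_virt(idx0);
}
```

If the top moves after `early_init()` runs, the cached address and the recomputed one differ by exactly the distance the top moved, which is why the slots must either be recomputed at use time or re-initialized after reservetop is parsed.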
     

29 Apr, 2010

2 commits

  • …git/x86/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
    x86: Disable large pages on CPUs with Atom erratum AAE44
    x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero
    x86, mrst: Conditionally register cpu hotplug notifier for apbt

    Linus Torvalds
     
  • ACPI _CRS Address Space Descriptors have _MIN, _MAX, and _LEN. Linux has
    been computing Address Spaces as [_MIN to _MIN + _LEN - 1]. Based on the
    tests in the bug reports below, Windows apparently uses [_MIN to _MAX].

    Per spec (ACPI 4.0, Table 6-40), for _CRS fixed-size, fixed location
    descriptors, "_LEN must be (_MAX - _MIN + 1)", and when that's true, it
    doesn't matter which way we compute the end. But of course, there are
    BIOSes that don't follow this rule, and we're better off if Linux handles
    those exceptions the same way as Windows.

    This patch makes Linux use [_MIN to _MAX], as Windows seems to do. This
    effectively reverts d558b483d5 and 03db42adfe and replaces them with
    simpler code.

    https://bugzilla.kernel.org/show_bug.cgi?id=14337 (round)
    https://bugzilla.kernel.org/show_bug.cgi?id=15480 (truncate)

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
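A sketch of the two ways of computing the end of a window; the struct and function names are hypothetical. For a descriptor that obeys the spec rule _LEN = _MAX - _MIN + 1 both agree; they only diverge on BIOSes that break it.

```c
#include <stdint.h>

/* The ACPI address space descriptor fields that matter here. */
struct addr_desc { uint64_t min, max, len; };

/* Old Linux behaviour: end derived from _MIN and _LEN. */
static uint64_t crs_end_old(const struct addr_desc *d)
{
    return d->min + d->len - 1;
}

/* New behaviour, matching what Windows appears to do: end is _MAX. */
static uint64_t crs_end_new(const struct addr_desc *d)
{
    return d->max;
}
```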
     

27 Apr, 2010

1 commit

  • When we move a PCI device or assign resources to a device not configured
    by the BIOS, we want to avoid the BIOS region below 1MB. Note that if the
    BIOS places devices below 1MB, we leave them there.

    See https://bugzilla.kernel.org/show_bug.cgi?id=15744
    and https://bugzilla.kernel.org/show_bug.cgi?id=15841

    Tested-by: Andy Isaacson
    Tested-by: Andy Bailey
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
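A minimal sketch of the policy, with a hypothetical helper: when the kernel itself picks a place for a resource, the lower bound of the search is raised to 1MB. Resources that the BIOS placed below 1MB are assumed never to reach this helper, which is how they stay where they are.

```c
#include <stdint.h>

#define BIOS_END 0x100000u  /* 1MB: end of the legacy BIOS region */

/* Hypothetical allocator hook: never start a new placement below 1MB. */
static uint64_t clamp_alloc_start(uint64_t start)
{
    return start < BIOS_END ? BIOS_END : start;
}
```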
     

25 Apr, 2010

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
    PCI: Ensure we re-enable devices on resume
    x86/PCI: parse additional host bridge window resource types
    PCI: revert broken device warning
    PCI aerdrv: use correct bit defines and add 2ms delay to aer_root_reset
    x86/PCI: ignore Consumer/Producer bit in ACPI window descriptions

    Linus Torvalds
     
    This is a standalone version of the VMware Balloon driver. Ballooning is
    a technique that allows the hypervisor to dynamically limit the amount of
    memory available to the guest (with guest cooperation). In the overcommit
    scenario, when the hypervisor detects that it needs to shuffle some
    memory, it instructs the driver to allocate a certain number of pages,
    and the underlying memory gets returned to the hypervisor. Later, the
    hypervisor may return memory to the guest by reattaching memory to the
    pageframes and instructing the driver to "deflate" the balloon.

    We are submitting a standalone driver because the KVM maintainer (Avi
    Kivity) expressed the opinion (rightly) that our transport does not fit
    well into the virtqueue paradigm, and thus it does not make much sense to
    integrate with virtio.

    There were also some concerns about whether the current ballooning
    technique is the right thing. If a better framework for achieving this
    appears, we are prepared to evaluate and switch to using it, but in the
    meantime we'd like to get this driver upstream.

    We want to get the driver accepted in distributions so that users do not
    have to deal with an out-of-tree module, and many distributions have an
    "upstream first" requirement.

    The driver has been shipping for a number of years, and users running on
    the VMware platform will have it installed as part of VMware Tools even
    if it does not come from a distribution; thus there should not be
    additional risk in pulling the driver into mainline. The driver will only
    activate if the host is VMware, so everyone else should not be affected
    at all.

    Signed-off-by: Dmitry Torokhov
    Cc: Avi Kivity
    Cc: Jeremy Fitzhardinge
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Torokhov
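A toy model of inflating and deflating, purely to illustrate the mechanism described above; it is not the driver's code, and the struct and function are hypothetical. A real balloon driver talks to the hypervisor through a hypervisor-specific interface and allocates actual pages; here only page counts move.

```c
#include <stddef.h>

/* 'size': pages currently held by the driver (memory given to the host).
 * 'target': size requested by the host. 'guest_free': pages available. */
struct balloon { size_t size; size_t target; size_t guest_free; };

/* Move 'size' toward 'target'; returns the number of pages moved. */
static size_t balloon_step(struct balloon *b)
{
    size_t moved = 0;

    while (b->size < b->target && b->guest_free > 0) {
        b->guest_free--;  /* inflate: driver takes a free guest page */
        b->size++;        /* and its backing memory goes to the host */
        moved++;
    }
    while (b->size > b->target) {
        b->size--;        /* deflate: page handed back to the guest */
        b->guest_free++;
        moved++;
    }
    return moved;
}
```

Inflation stops early if the guest runs out of free pages, which is the "with guest cooperation" part: the balloon can only take what the guest can spare.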
     

24 Apr, 2010

2 commits

  • Atom erratum AAE44/AAF40/AAG38/AAH41:

    "If software clears the PS (page size) bit in a present PDE (page
    directory entry), that will cause linear addresses mapped through this
    PDE to use 4-KByte pages instead of using a large page after old TLB
    entries are invalidated. Due to this erratum, if a code fetch uses
    this PDE before the TLB entry for the large page is invalidated then
    it may fetch from a different physical address than specified by
    either the old large page translation or the new 4-KByte page
    translation. This erratum may also cause speculative code fetches from
    incorrect addresses."

    [http://download.intel.com/design/processor/specupdt/319536.pdf]

    Although commit 211b3d03c7400f48a781977a50104c9d12f4e229 seems to
    work around erratum AAH41 (mixed 4K TLBs), it only reduces the window of
    opportunity for the bug to occur and does not totally remove it. This
    patch disables mixed 4K/4MB page tables entirely, avoiding the page
    splitting and not tripping this processor issue.

    This is based on an original patch by Colin King.

    Originally-by: Colin Ian King
    Cc: Colin Ian King
    Cc: Ingo Molnar
    Signed-off-by: H. Peter Anvin
    LKML-Reference:
    Cc:

    H. Peter Anvin
     
  • When we do a thread switch, we clear the outgoing FS/GS base if the
    corresponding selector is nonzero. This is taken by __switch_to() as
    an entry invariant; it does not verify that it is true on entry.
    However, copy_thread() doesn't enforce this constraint, which can
    result in inconsistent results after fork().

    Make copy_thread() match the behavior of __switch_to().

    Reported-and-tested-by: Samuel Thibault
    Signed-off-by: H. Peter Anvin
    LKML-Reference:
    Cc:

    H. Peter Anvin
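A sketch of the invariant that copy_thread() is made to match, with a hypothetical struct standing in for the thread's saved segment state: if a selector is nonzero, the corresponding saved 64-bit base is cleared.

```c
#include <stdint.h>

/* Hypothetical subset of a thread's saved FS/GS state. */
struct seg_state {
    uint16_t fsindex;
    uint64_t fsbase;
    uint16_t gsindex;
    uint64_t gsbase;
};

/* Entry invariant assumed by __switch_to(): a nonzero selector
 * implies a zero saved base. Applied to the child on fork. */
static void enforce_base_invariant(struct seg_state *t)
{
    if (t->fsindex)
        t->fsbase = 0;
    if (t->gsindex)
        t->gsbase = 0;
}
```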
     

23 Apr, 2010

1 commit

  • This adds support for Memory24, Memory32, and Memory32Fixed descriptors in
    PCI host bridge _CRS.

    I experimentally determined that Windows (2008 R2) accepts these descriptors
    and treats them as windows that are forwarded to the PCI bus, e.g., if
    it finds any PCI devices with BARs outside the windows, it moves them into
    the windows.

    I don't know whether any machines actually use these descriptors in PCI
    host bridge _CRS methods, but if any exist and they're new enough that we
    automatically turn on "pci=use_crs", they will work with Windows but not
    with Linux.

    Here are the details: https://bugzilla.kernel.org/show_bug.cgi?id=15817

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
     

22 Apr, 2010

1 commit

  • * 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: x86: Fix TSS size check for 16-bit tasks
    KVM: Add missing srcu_read_lock() for kvm_mmu_notifier_release()
    KVM: Increase NR_IOBUS_DEVS limit to 200
    KVM: fix the handling of dirty bitmaps to avoid overflows
    KVM: MMU: fix kvm_mmu_zap_page() and its calling path
    KVM: VMX: Save/restore rflags.vm correctly in real mode
    KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL
    KVM: Don't spam kernel log when injecting exceptions due to bad cr writes
    KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails
    KVM: take srcu lock before call to complete_pio()

    Linus Torvalds
     

21 Apr, 2010

4 commits

  • A 16-bit TSS is only 44 bytes long. So make sure to test for the correct
    size on task switch.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
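A sketch of a size check that accounts for both TSS formats (hypothetical helper name). A 32-bit TSS is 104 bytes, so its descriptor limit must be at least 0x67; a 16-bit TSS is 44 bytes, so 0x2b suffices. Bit 3 of the descriptor type distinguishes the two (type 0x9 is an available 32-bit TSS, 0x1 an available 16-bit one).

```c
#include <stdbool.h>

/* Minimum limit: 104 - 1 for a 32-bit TSS, 44 - 1 for a 16-bit TSS. */
static bool tss_limit_ok(unsigned int desc_type, unsigned int desc_limit)
{
    bool is_32bit = (desc_type & 8) != 0;

    return desc_limit >= (is_32bit ? 0x67u : 0x2bu);
}
```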
     
    The APB timer is used on Moorestown platforms but not on a standard PC.
    If APB timer code is compiled in but not initialized at run time due to
    the lack of a firmware-reported SFI table, the kernel would panic when
    the non-boot CPUs are offlined and the notifier is called.

    https://bugzilla.kernel.org/show_bug.cgi?id=15786

    This patch ensures CPU hotplug notifier for APB timer is only registered
    when the APBT timer block is initialized.

    Signed-off-by: Jacob Pan
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Jacob Pan
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: Fix unsafe frame rewinding with hot regs fetching

    Linus Torvalds
     
  • Before commit e28cbf22933d0c0ccaf3c4c27a1a263b41f73859 ("improve
    sys_newuname() for compat architectures") 64-bit x86 had a private
    implementation of sys_uname which was just called sys_uname, which other
    architectures used for the old uname.

    Due to some merge issues with the uname refactoring patches, we ended up
    calling the old uname version for both the old and new system call
    slots, which led to the domainname field never being set, which caused
    failures with libnss_nis.

    Reported-and-tested-by: Andy Isaacson
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

20 Apr, 2010

4 commits

    An int is not large enough to store the size of a dirty bitmap.

    This patch fixes this problem with the introduction of a wrapper
    function to calculate the sizes of dirty bitmaps.

    Note: in mark_page_dirty(), we have to consider the fact that
    __set_bit() takes the offset as int, not long.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
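A sketch of such a wrapper, assuming 64-bit bitmap words (KVM rounds to BITS_PER_LONG) and using a hypothetical name: round the page count up to a whole number of words, convert bits to bytes, and return a 64-bit result so that large slots cannot overflow an int.

```c
#include <stdint.h>

/* Assumed bitmap word size. */
#define BITMAP_WORD_BITS 64u

/* One bit per page, rounded up to whole words, expressed in bytes. */
static uint64_t dirty_bitmap_bytes(uint64_t npages)
{
    uint64_t bits = (npages + BITMAP_WORD_BITS - 1) / BITMAP_WORD_BITS
                    * BITMAP_WORD_BITS;

    return bits / 8;
}
```

A slot of 2^35 pages (128 TiB with 4 KiB pages) needs 2^32 bytes of bitmap, which already exceeds INT_MAX.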
     
    This patch fixes the following:

    - calculate the zapped page number properly in mmu_zap_unsync_children()
    - calculate the freed page number properly in kvm_mmu_change_mmu_pages()
    - if children pages were zapped, restart the hlist walk

    KVM-Stable-Tag.
    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
     
  • Currently we set eflags.vm unconditionally when entering real mode emulation
    through virtual-8086 mode, and clear it unconditionally when we enter protected
    mode. This means that the following sequence

    KVM_SET_REGS (rflags.vm=1)
    KVM_SET_SREGS (cr0.pe=1)

    ends up with rflags.vm clear due to KVM_SET_SREGS triggering enter_pmode().

    Fix by shadowing rflags.vm (and rflags.iopl) correctly while in real mode:
    reads and writes to those bits access a shadow register instead of the actual
    register.

    Signed-off-by: Avi Kivity
    Signed-off-by: Marcelo Tosatti

    Avi Kivity
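A sketch of shadowing the VM and IOPL bits while in emulated real mode. The struct and accessors are hypothetical; the flag values are the architectural RFLAGS bits (IOPL in bits 12-13, VM in bit 17). Guest reads see the shadow, while the hardware value keeps VM and IOPL set so that vm86-based emulation keeps working.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_IOPL 0x00003000u  /* bits 12-13 */
#define X86_EFLAGS_VM   0x00020000u  /* bit 17 */
#define SHADOWED_BITS   (X86_EFLAGS_VM | X86_EFLAGS_IOPL)

/* Hypothetical vcpu state: 'hw' is what the CPU runs with, 'shadow'
 * holds the guest-visible values of the shadowed bits. */
struct vcpu { bool rmode; uint32_t hw; uint32_t shadow; };

static uint32_t get_rflags(const struct vcpu *v)
{
    if (!v->rmode)
        return v->hw;
    /* Replace the forced VM/IOPL bits with the guest's own view. */
    return (v->hw & ~SHADOWED_BITS) | (v->shadow & SHADOWED_BITS);
}

static void set_rflags(struct vcpu *v, uint32_t val)
{
    if (v->rmode) {
        v->shadow = val & SHADOWED_BITS;  /* remember the guest's view */
        val |= SHADOWED_BITS;             /* hardware still needs vm86 */
    }
    v->hw = val;
}
```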
     
    There is a quirk for AMD K8 CPUs in many Linux kernels (see
    arch/x86/kernel/cpu/mcheck/mce.c:__mcheck_cpu_apply_quirks()) that
    clears bit 10 in the MCE-related MSR MSR_IA32_MC4_CTL. KVM can only
    cope with all zeros or all ones, so it will inject a #GP into the
    guest, which will make it panic.
    So let's add a quirk to the quirk and ignore this single cleared bit.
    This fixes -cpu kvm64 on all machines and -cpu host on K8 machines
    with some guest Linux kernels.

    Signed-off-by: Andre Przywara
    Signed-off-by: Avi Kivity

    Andre Przywara
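A sketch of the relaxed check (hypothetical function name): besides all zeros and all ones, accept all ones with only bit 10 cleared. It does not model which MSRs the check is applied to.

```c
#include <stdbool.h>
#include <stdint.h>

/* Accept 0, ~0, or ~0 with only bit 10 cleared (the K8 quirk value). */
static bool mc_ctl_write_ok(uint64_t data)
{
    return data == 0 || (data | (1ULL << 10)) == ~(uint64_t)0;
}
```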