27 Oct, 2012

1 commit

  • Pull x86 fixes from Ingo Molnar:
    "This fixes a couple of nasty page table initialization bugs which were
    causing kdump regressions. A clean rearchitecting of the code is in
    the works - meanwhile these are reverts that restore the
    best-known-working state of the kernel.

    There are also EFI fixes and other small fixes."

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, mm: Undo incorrect revert in arch/x86/mm/init.c
    x86: efi: Turn off efi_enabled after setup on mixed fw/kernel
    x86, mm: Find_early_table_space based on ranges that are actually being mapped
    x86, mm: Use memblock memory loop instead of e820_RAM
    x86, mm: Trim memory in memblock to be page aligned
    x86/irq/ioapic: Check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
    x86/efi: Fix oops caused by incorrect set_memory_uc() usage
    x86-64: Fix page table accounting
    Revert "x86/mm: Fix the size calculation of mapping tables"
    MAINTAINERS: Add EFI git repository location

    Linus Torvalds
     

26 Oct, 2012

3 commits

  • …g/efi into x86/urgent

    Pull EFI fixes from Matt Fleming:

    "Fix oops with EFI variables on mixed 32/64-bit firmware/kernels and
    document EFI git repository location on kernel.org."

    Conflicts:
    arch/x86/include/asm/efi.h

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Commit

    844ab6f9 x86, mm: Find_early_table_space based on ranges that are actually being mapped

    wrongly added back some lines that had been removed in commit

    7b16bbf97 Revert "x86/mm: Fix the size calculation of mapping tables"

    Remove them again.

    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/CAE9FiQW_vuaYQbmagVnxT2DGsYc=9tNeAbdBq53sYkitPOwxSQ@mail.gmail.com
    Acked-by: Jacob Shin
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     
  • When 32-bit EFI is used with a 64-bit kernel (or vice versa), turn off
    efi_enabled once setup is done. Beyond setup, it is normally used to
    determine if runtime services are available, and we will have none.

    This will resolve issues stemming from efivars modprobe panicking on a
    32/64-bit setup, as well as some reboot issues on similar setups.

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=45991

    Reported-by: Marko Kohtala
    Reported-by: Maxim Kammerer
    Signed-off-by: Olof Johansson
    Acked-by: Maarten Lankhorst
    Cc: stable@kernel.org # 3.4 - 3.6
    Cc: Matthew Garrett
    Signed-off-by: Matt Fleming

    Olof Johansson
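    A minimal sketch of the rule described above, assuming hypothetical
    names (efi_keep_enabled() and its parameters are not kernel
    identifiers): EFI stays enabled after setup only when firmware and
    kernel bitness match, since only then can runtime services be called.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not the kernel's code: decide whether efi_enabled
 * should remain set once setup is done. Runtime services are unusable when
 * a 32-bit firmware is paired with a 64-bit kernel (or vice versa).
 */
static bool efi_keep_enabled(bool efi_enabled, bool fw_is_64bit,
                             bool kernel_is_64bit)
{
    if (!efi_enabled)
        return false;
    return fw_is_64bit == kernel_is_64bit;
}
```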
     

25 Oct, 2012

3 commits

  • The current logic finds enough space for direct mapping page tables from 0
    to end. Instead, we only need to find enough space to cover mr[0].start
    to mr[nr_range].end -- the range that is actually being mapped by
    init_memory_mapping().

    This is needed after commit 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a to address
    the panic reported here:

    https://lkml.org/lkml/2012/10/20/160
    https://lkml.org/lkml/2012/10/21/157

    Signed-off-by: Jacob Shin
    Link: http://lkml.kernel.org/r/20121024195311.GB11779@jshin-Toonie
    Tested-by: Tom Rini
    Signed-off-by: H. Peter Anvin

    Jacob Shin
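    To see why sizing from 0 overestimates, the following simplified sketch
    counts the x86-64 page-table pages needed to map a range with 4 KiB
    pages only (ignoring 2 MiB and 1 GiB mappings). The helpers are
    hypothetical illustrations, not find_early_table_space() itself.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M   (UINT64_C(1) << 21)   /* span covered by one PTE table */
#define SZ_1G   (UINT64_C(1) << 30)   /* span covered by one PMD table */
#define SZ_512G (UINT64_C(1) << 39)   /* span covered by one PUD table */

/* Number of span-aligned blocks touched by [start, end); requires end > start. */
static uint64_t blocks_touched(uint64_t start, uint64_t end, uint64_t span)
{
    uint64_t first = start / span;
    uint64_t last = (end - 1) / span;

    return last - first + 1;
}

/*
 * Hypothetical estimate of the table pages needed to map [start, end)
 * with 4 KiB pages: one PTE page per 2 MiB block, one PMD page per
 * 1 GiB block and one PUD page per 512 GiB block.
 */
static uint64_t table_pages(uint64_t start, uint64_t end)
{
    return blocks_touched(start, end, SZ_2M) +
           blocks_touched(start, end, SZ_1G) +
           blocks_touched(start, end, SZ_512G);
}
```

    For instance, mapping only the 2 MiB at 4 GiB needs 3 table pages,
    whereas sizing from 0 to the same end yields 2055.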
     
  • We need to handle E820_RAM and E820_RESERVED_KERNEL at the same time.

    Also, memblock has page-aligned ranges for RAM, so we can avoid mapping
    partial pages.

    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com
    Acked-by: Jacob Shin
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     
  • We will not map partial pages, so we need to make sure that memblock
    allocation will not hand out those bytes.

    Also, use for_each_mem_pfn_range() to loop over the memory ranges to
    be mapped, to keep the two consistent.

    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com
    Acked-by: Jacob Shin
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
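    The inward trimming described in the two commits above can be sketched
    as follows, assuming 4 KiB pages and a hypothetical helper that works on
    a single [base, base + size) range rather than on memblock's region
    arrays:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K UINT64_C(4096)

/* Round x up/down to a multiple of the power-of-two 'align'. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
    return x & ~(align - 1);
}

/*
 * Hypothetical helper: shrink [*base, *base + *size) inward to whole
 * pages so that no partial page is left for allocation or mapping.
 * Returns 1 on success, or 0 (with *size set to 0) if no full page remains.
 */
static int trim_to_pages(uint64_t *base, uint64_t *size)
{
    uint64_t start = round_up_pow2(*base, PAGE_SIZE_4K);
    uint64_t end = round_down_pow2(*base + *size, PAGE_SIZE_4K);

    if (end <= start) {
        *size = 0;
        return 0;
    }
    *base = start;
    *size = end - start;
    return 1;
}
```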
     

24 Oct, 2012

15 commits

  • Posting this patch to fix an issue concerning sparse IRQs that
    I raised a while back. There was discussion about adding
    refcounting to sparse IRQs (to fix other potential race
    conditions), but that does not appear to have been addressed
    yet. This covers the only issue of this type that I've
    encountered in this area.

    A NULL pointer dereference can occur in
    smp_irq_move_cleanup_interrupt() if we haven't yet set up the
    irq_cfg pointer in the irq_desc.irq_data.chip_data.

    In create_irq_nr() there is a window where we have set
    vector_irq in __assign_irq_vector(), but not yet called
    irq_set_chip_data() to set the irq_cfg pointer.

    Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during
    this time, smp_irq_move_cleanup_interrupt() will attempt to
    process the aforementioned irq, but panic when accessing
    irq_cfg.

    Only continue processing the irq if irq_cfg is non-NULL.

    Signed-off-by: Dimitri Sivanich
    Cc: Suresh Siddha
    Cc: Joerg Roedel
    Cc: Yinghai Lu
    Cc: Alexander Gordeev
    Link: http://lkml.kernel.org/r/20121016125021.GA22935@sgi.com
    Signed-off-by: Ingo Molnar

    Dimitri Sivanich
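    A minimal sketch of the guarded cleanup loop, using hypothetical
    stand-ins (struct irq_cfg_sketch and a flat vector-to-IRQ table) for the
    kernel's per-CPU vector_irq and irq_data chip_data:

```c
#include <assert.h>
#include <stddef.h>

#define NR_VECTORS_SKETCH 8   /* hypothetical, small for illustration */

/* Hypothetical stand-in for struct irq_cfg. */
struct irq_cfg_sketch {
    int vector;
};

/*
 * Hypothetical cleanup pass: vector_irq[] maps vectors to IRQ numbers
 * (-1 if unused) and cfgs[] maps IRQ numbers to their irq_cfg, which may
 * still be NULL while the IRQ is being created. Returns how many IRQs
 * were processed.
 */
static int move_cleanup(const int *vector_irq, struct irq_cfg_sketch **cfgs)
{
    int processed = 0;

    for (int vector = 0; vector < NR_VECTORS_SKETCH; vector++) {
        int irq = vector_irq[vector];
        struct irq_cfg_sketch *cfg;

        if (irq < 0)
            continue;

        cfg = cfgs[irq];
        /*
         * The vector can be assigned before the irq_cfg pointer is set,
         * so skip the IRQ instead of dereferencing NULL.
         */
        if (!cfg)
            continue;

        /* Real code would inspect cfg here; we only count. */
        processed++;
    }
    return processed;
}
```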
     
  • The variable 'port' is initialized but otherwise never used,
    so remove it.

    The dpatch engine was used to auto-generate this patch.
    (https://github.com/weiyj/dpatch)

    Signed-off-by: Wei Yongjun
    Cc: Yan, Zheng
    Cc: a.p.zijlstra@chello.nl
    Cc: paulus@samba.org
    Cc: acme@ghostprotocols.net
    Link: http://lkml.kernel.org/r/CAPgLHd8NZkYSkZm22FpZxiEh6HcA0q-V%3D29vdnheiDhgrJZ%2Byw@mail.gmail.com
    Signed-off-by: Ingo Molnar

    Wei Yongjun
     
  • Calling __pa() with an ioremap'd address is invalid. If we
    encounter an efi_memory_desc_t without EFI_MEMORY_WB set in
    ->attribute we currently call set_memory_uc(), which in turn
    calls __pa() on a potentially ioremap'd address.

    On CONFIG_X86_32 this results in the following oops:

    BUG: unable to handle kernel paging request at f7f22280
    IP: [] reserve_ram_pages_type+0x89/0x210
    *pdpt = 0000000001978001 *pde = 0000000001ffb067 *pte = 0000000000000000
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in:

    Pid: 0, comm: swapper Not tainted 3.0.0-acpi-efi-0805 #3
    EIP: 0060:[] EFLAGS: 00010202 CPU: 0
    EIP is at reserve_ram_pages_type+0x89/0x210
    EAX: 0070e280 EBX: 38714000 ECX: f7814000 EDX: 00000000
    ESI: 00000000 EDI: 38715000 EBP: c189fef0 ESP: c189fea8
    DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    Process swapper (pid: 0, ti=c189e000 task=c18bbe60 task.ti=c189e000)
    Stack:
    80000200 ff108000 00000000 c189ff00 00038714 00000000 00000000 c189fed0
    c104f8ca 00038714 00000000 00038715 00000000 00000000 00038715 00000000
    00000010 38715000 c189ff48 c1025aff 38715000 00000000 00000010 00000000
    Call Trace:
    [] ? page_is_ram+0x1a/0x40
    [] reserve_memtype+0xdf/0x2f0
    [] set_memory_uc+0x49/0xa0
    [] efi_enter_virtual_mode+0x1c2/0x3aa
    [] start_kernel+0x291/0x2f2
    [] ? loglevel+0x1b/0x1b
    [] i386_start_kernel+0xbf/0xc8

    The only time we can call set_memory_uc() for a memory region is
    when it is part of the direct kernel mapping. For the case where
    we ioremap a memory region we must leave it alone.

    This patch reimplements the fix from e8c7106280a3 ("x86, efi:
    Calling __pa() with an ioremap()ed address is invalid") which
    was reverted in e1ad783b12ec because it caused a regression on
    some MacBooks (they hung at boot). The regression was caused
    because the commit only marked EFI_RUNTIME_SERVICES_DATA as
    E820_RESERVED_EFI, when it should have marked all regions that
    have the EFI_MEMORY_RUNTIME attribute.

    Despite first impressions, it's not possible to use
    ioremap_cache() to map all cached memory regions on
    CONFIG_X86_64 because of the way that the memory map might be
    configured as detailed in the following bug report,

    https://bugzilla.redhat.com/show_bug.cgi?id=748516

    e.g. some of the EFI memory regions *need* to be mapped as part
    of the direct kernel mapping.

    Signed-off-by: Matt Fleming
    Cc: Matthew Garrett
    Cc: Zhang Rui
    Cc: Huang Ying
    Cc: Keith Packard
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1350649546-23541-1-git-send-email-matt@console-pimps.org
    Signed-off-by: Ingo Molnar

    Matt Fleming
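    The rule above (call set_memory_uc() only for regions inside the direct
    mapping) can be sketched as a small decision function. The enum, the
    direct_map_end parameter and the helper are hypothetical; EFI_MEMORY_WB
    is taken as 0x8 per the UEFI specification.

```c
#include <assert.h>
#include <stdint.h>

#define EFI_MEMORY_WB_SKETCH UINT64_C(0x8)   /* UEFI "write-back" attribute bit */

enum efi_map_action {
    EFI_USE_DIRECT_MAP,         /* direct-mapped and cacheable: leave as is */
    EFI_USE_DIRECT_MAP_SET_UC,  /* direct-mapped, not WB: set_memory_uc() is valid */
    EFI_USE_IOREMAP,            /* outside the direct mapping: never set_memory_uc() */
};

/*
 * Hypothetical decision helper: phys_end is the end of the EFI region,
 * direct_map_end the end of the kernel's direct mapping.
 */
static enum efi_map_action efi_choose_mapping(uint64_t phys_end,
                                              uint64_t attribute,
                                              uint64_t direct_map_end)
{
    if (phys_end > direct_map_end)
        return EFI_USE_IOREMAP;
    if (!(attribute & EFI_MEMORY_WB_SKETCH))
        return EFI_USE_DIRECT_MAP_SET_UC;
    return EFI_USE_DIRECT_MAP;
}
```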
     
  • Although based on the Intel P6 design, the interrupt mechanism
    for KNC more closely resembles the Intel architectural
    perfmon one.

    We can't just re-use that code though, because KNC has different
    MSR numbers for the status and ack registers.

    In this case we just cut and paste from perf_event_intel.c
    with some minor changes, as it looks like it would not be
    worth the trouble to change that code to be MSR-configurable.

    Signed-off-by: Vince Weaver
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: eranian@gmail.com
    Cc: Meadows Lawrence F
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171304410.23243@vincent-weaver-1.um.maine.edu
    [ Small stylistic edits. ]
    Signed-off-by: Ingo Molnar

    Vince Weaver
     
  • x86_pmu.enable() is called from x86_pmu_enable() with
    cpuc->enabled set to 0. This means we weren't re-enabling the
    counters after a context switch.

    This patch just removes the check, as it shouldn't be necessary
    (and the equivalent generic x86_* code does not have the checks).

    The origin of this problem is the KNC driver being based on the
    P6 one. The P6 driver also has this issue, but works anyway
    due to various lucky accidents.

    Signed-off-by: Vince Weaver
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: eranian@gmail.com
    Cc: Meadows Lawrence F
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171303290.23243@vincent-weaver-1.um.maine.edu
    Signed-off-by: Ingo Molnar

    Vince Weaver
     
  • Early versions of Intel KNC chips have a bug where bits above 32
    were not properly set. We worked around this by only using the
    bottom 32 bits (out of 40 that should be available).

    It turns out this workaround breaks overflow handling.

    The buggy silicon will in theory never be used in production
    systems, so remove this workaround so we get proper overflow
    support.

    Signed-off-by: Vince Weaver
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: eranian@gmail.com
    Cc: Meadows Lawrence F
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171302140.23243@vincent-weaver-1.um.maine.edu
    Signed-off-by: Ingo Molnar

    Vince Weaver
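    Counter width matters because the delta between two raw counter
    readings is computed modulo 2^width. The sketch below uses a shift
    trick similar to the one in the generic x86 perf code; the helper name
    is hypothetical. Reading a 40-bit counter as if it were 32 bits silently
    drops whole multiples of 2^32 events.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Difference between two raw readings of a width-bit counter
 * (1 <= width <= 64), computed modulo 2^width so that a wrap between
 * the readings is handled. Shifting left discards bits above the
 * counter width; the unsigned right shift restores the scale.
 */
static uint64_t counter_delta(uint64_t prev, uint64_t now, unsigned width)
{
    unsigned shift = 64 - width;

    return ((now << shift) - (prev << shift)) >> shift;
}
```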
     
  • This, beyond handling corner cases, also fixes some build warnings:

    arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_disable_box’:
    arch/x86/kernel/cpu/perf_event_intel_uncore.c:124:9: warning: ‘config’ is used uninitialized in this function [-Wuninitialized]
    arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_enable_box’:
    arch/x86/kernel/cpu/perf_event_intel_uncore.c:135:9: warning: ‘config’ is used uninitialized in this function [-Wuninitialized]
    arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_read_counter’:
    arch/x86/kernel/cpu/perf_event_intel_uncore.c:164:2: warning: ‘count’ is used uninitialized in this function [-Wuninitialized]

    Signed-off-by: Yan, Zheng
    Cc: a.p.zijlstra@chello.nl
    Link: http://lkml.kernel.org/r/1351068140-13456-1-git-send-email-zheng.z.yan@intel.com
    Signed-off-by: Ingo Molnar

    Yan, Zheng
     
  • Commit 20167d3421a089a1bf1bd680b150dc69c9506810 ("x86-64: Fix
    accounting in kernel_physical_mapping_init()") went a little too
    far by entirely removing the counting of pre-populated page
    tables: this should be done at boot time (to cover the page
    tables set up in early boot code), but shouldn't be done during
    memory hot add.

    Hence, re-add the removed increments of "pages", but make them
    and the one in phys_pte_init() conditional upon !after_bootmem.

    Reported-Acked-and-Tested-by: Hugh Dickins
    Signed-off-by: Jan Beulich
    Link: http://lkml.kernel.org/r/506DAFBA020000780009FA8C@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • Between 2.6.33 and 2.6.34 the PMU code was made modular.

    The x86_pmu_enable() call was extended to disable cpuc->enabled
    and iterate the counters, enabling one at a time, before calling
    enable_all() at the end, followed by re-enabling cpuc->enabled.

    Since cpuc->enabled was set to 0, that change effectively caused
    the "val |= ARCH_PERFMON_EVENTSEL_ENABLE;" code in p6_pmu_enable_event()
    and p6_pmu_disable_event() to be dead code that was never called.

    This change removes this code (which was confusing) and adds some
    extra commentary to make it more clear what is going on.

    Signed-off-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191732000.14552@vincent-weaver-1.um.maine.edu
    Signed-off-by: Ingo Molnar

    Vince Weaver
     
  • This patch updates the generic events on p6, including some new
    extended cache events.

    Values for these events were taken from the equivalent PAPI
    predefined events.

    Tested on a Pentium II.

    Signed-off-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191730080.14552@vincent-weaver-1.um.maine.edu
    Signed-off-by: Ingo Molnar

    Vince Weaver
     
  • According to Intel SDM Volume 3B, FP_ASSIST is limited to Counter 1 only,
    not Counter 0.

    Tested on a Pentium II.

    Signed-off-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191728570.14552@vincent-weaver-1.um.maine.edu
    Signed-off-by: Ingo Molnar

    Vince Weaver
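    Per-event counter restrictions like this are commonly expressed as a
    bitmask of allowed counters. A sketch with hypothetical types; the P6
    event codes listed are assumptions for illustration, with FP_ASSIST
    restricted to counter 1 (mask 0x2):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a per-event counter constraint. */
struct event_constraint_sketch {
    unsigned int event_code;
    unsigned int counter_mask;   /* bit n set => counter n allowed */
};

/* Assumed P6 FP event codes, for illustration only. */
static const struct event_constraint_sketch p6_fp_constraints[] = {
    { 0xc1, 0x1 },   /* FLOPS: counter 0 only */
    { 0x10, 0x1 },   /* FP_COMP_OPS_EXE: counter 0 only */
    { 0x11, 0x2 },   /* FP_ASSIST: counter 1 only */
};

/* Return true if event_code may be scheduled on counter idx. */
static bool counter_allowed(unsigned int event_code, unsigned int idx)
{
    size_t n = sizeof(p6_fp_constraints) / sizeof(p6_fp_constraints[0]);

    for (size_t i = 0; i < n; i++) {
        if (p6_fp_constraints[i].event_code == event_code)
            return (p6_fp_constraints[i].counter_mask >> idx) & 1u;
    }
    return idx < 2;   /* unconstrained events may use either P6 counter */
}
```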
     
  • Commit:

    722bc6b16771 x86/mm: Fix the size calculation of mapping tables

    tried to address the issue that the first 2/4M should use 4k pages
    if PSE is enabled, but the extra counts should only be valid for x86_32.

    This commit caused a kdump regression: the kdump kernel hangs.

    Work is in progress to fundamentally fix the various page table
    initialization issues that we have, via the design suggested
    by H. Peter Anvin, but it's not ready yet to be merged.

    So, to get a working kdump, revert to the last known working version,
    which is the revert of this commit and of a followup fix (which was
    incomplete):

    bd2753b2dda7 x86/mm: Only add extra pages count for the first memory range during pre-allocation

    Tested kdump on physical and virtual machines.

    Signed-off-by: Dave Young
    Acked-by: Yinghai Lu
    Acked-by: Cong Wang
    Acked-by: Flavio Leitner
    Tested-by: Flavio Leitner
    Cc: Dan Carpenter
    Cc: Cong Wang
    Cc: Flavio Leitner
    Cc: Tejun Heo
    Cc: ianfang.cn@gmail.com
    Cc: Vivek Goyal
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Dave Young
     
  • In check_hw_exists() we try to detect non-emulated MSR accesses
    by writing an arbitrary value into one of the PMU registers
    and checking whether its value is still the same after a readout.
    This algorithm silently assumes that the register does not already
    contain the magic value, which is wrong in at least one situation.

    Fix the algorithm to really do a read-modify-write cycle. This fixes
    a warning under Xen under some circumstances on AMD family 10h CPUs.

    The reasons, in more detail, actually sound like a story from
    Believe It or Not!:

    First you need an AMD family 10h/12h CPU. These do not reset the
    PERF_CTR registers on a reboot.
    Now you boot bare metal Linux, which goes successfully through this
    check, but leaves the magic value of 0xabcd in the register. You
    don't use the performance counters, but do a reboot (warm reset).
    Then you choose to boot Xen. The check will be triggered with a
    recent Linux kernel as Dom0 again, trying to write 0xabcd into the
    MSR. Xen silently drops the write (expected), but the subsequent read
    will return the value in the register, which just happens to be the
    expected magic value. Thus the test misleadingly succeeds, leaving
    the kernel in the belief that the PMU is available. This will trigger
    the following message:

    [ 0.020294] ------------[ cut here ]------------
    [ 0.020311] WARNING: at arch/x86/xen/enlighten.c:730 xen_apic_write+0x15/0x17()
    [ 0.020318] Hardware name: empty
    [ 0.020323] Modules linked in:
    [ 0.020334] Pid: 1, comm: swapper/0 Not tainted 3.3.8 #7
    [ 0.020340] Call Trace:
    [ 0.020354] [] warn_slowpath_common+0x80/0x98
    [ 0.020369] [] warn_slowpath_null+0x15/0x17
    [ 0.020378] [] xen_apic_write+0x15/0x17
    [ 0.020392] [] perf_events_lapic_init+0x2e/0x30
    [ 0.020410] [] init_hw_perf_events+0x250/0x407
    [ 0.020419] [] ? check_bugs+0x2d/0x2d
    [ 0.020430] [] do_one_initcall+0x7a/0x131
    [ 0.020444] [] kernel_init+0x91/0x15d
    [ 0.020456] [] kernel_thread_helper+0x4/0x10
    [ 0.020471] [] ? retint_restore_args+0x5/0x6
    [ 0.020481] [] ? gs_change+0x13/0x13
    [ 0.020500] ---[ end trace a7919e7f17c0a725 ]---

    The new code flips each of the 16 low bits of the value read from
    the register, then writes that modified number to the MSR and reads
    it back.

    Signed-off-by: Andre Przywara
    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Avi Kivity
    Link: http://lkml.kernel.org/r/1349797115-28346-2-git-send-email-andre.przywara@amd.com
    Signed-off-by: Ingo Molnar

    Andre Przywara
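    The difference between the two probing strategies can be reproduced
    with a simulated MSR that silently drops writes, as Xen does here.
    Everything below (struct fake_msr and the helpers) is a hypothetical
    model, not the kernel's check_hw_exists():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated MSR; a hypervisor may silently drop writes. */
struct fake_msr {
    uint64_t value;
    bool drops_writes;
};

static uint64_t msr_read(const struct fake_msr *m)
{
    return m->value;
}

static void msr_write(struct fake_msr *m, uint64_t v)
{
    if (!m->drops_writes)
        m->value = v;
}

/* Old approach: write a fixed magic value and compare on readback. */
static bool pmu_works_fixed_value(struct fake_msr *m)
{
    const uint64_t magic = 0xabcd;

    msr_write(m, magic);
    return msr_read(m) == magic;
}

/*
 * New approach: read, flip the low 16 bits, write, compare. The value
 * written always differs from the current contents, so a dropped write
 * can no longer pass by coincidence.
 */
static bool pmu_works_rmw(struct fake_msr *m)
{
    uint64_t val = msr_read(m);

    val ^= 0xffffu;
    msr_write(m, val);
    return msr_read(m) == val;
}
```

    With a stale 0xabcd left in a write-dropping register,
    pmu_works_fixed_value() wrongly reports success while pmu_works_rmw()
    reports failure.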
     
  • Pull xen bug-fixes from Konrad Rzeszutek Wilk:
    - Fix mysterious SIGSEGV or SIGKILL in applications due to corruption
    of %eip when returning from a signal handler.
    - Fix various ARM compile issues after the merge fallout.
    - Continue on making more of the Xen generic code usable by ARM
    platform.
    - Fix SR-IOV passthrough to mirror multifunction PCI devices.
    - Fix various compile warnings.
    - Remove hypercalls that don't exist anymore.

    * tag 'stable/for-linus-3.7-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xen: dbgp: Fix warning when CONFIG_PCI is not enabled.
    xen: arm: comment on why 64-bit xen_pfn_t is safe even on 32 bit
    xen: balloon: use correct type for frame_list
    xen/x86: don't corrupt %eip when returning from a signal handler
    xen: arm: make p2m operations NOPs
    xen: balloon: don't include e820.h
    xen: grant: use xen_pfn_t type for frame_list.
    xen: events: pirq_check_eoi_map is X86 specific
    xen: XENMEM_translate_gpfn_list was remove ages ago and is unused.
    xen: sysfs: fix build warning.
    xen: sysfs: include err.h for PTR_ERR etc
    xen: xenbus: quirk uses x86 specific cpuid
    xen PV passthru: assign SR-IOV virtual functions to separate virtual slots
    xen/xenbus: Fix compile warning.
    xen/x86: remove duplicated include from enlighten.c

    Linus Torvalds
     
  • Pull kvm fixes from Avi Kivity:
    "KVM updates for 3.7-rc2"

    * tag 'kvm-3.7-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT
    KVM: apic: fix LDR calculation in x2apic mode
    KVM: MMU: fix release noslot pfn

    Linus Torvalds
     

23 Oct, 2012

3 commits

  • KVM_PV_REASON_PAGE_NOT_PRESENT kicks the CPU out of idleness, but we
    haven't marked that spot as an exit from idleness.

    Not doing so can cause RCU warnings such as:

    [ 732.788386] ===============================
    [ 732.789803] [ INFO: suspicious RCU usage. ]
    [ 732.790032] 3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63 Tainted: G W
    [ 732.790032] -------------------------------
    [ 732.790032] include/linux/rcupdate.h:738 rcu_read_lock() used illegally while idle!
    [ 732.790032]
    [ 732.790032] other info that might help us debug this:
    [ 732.790032]
    [ 732.790032]
    [ 732.790032] RCU used illegally from idle CPU!
    [ 732.790032] rcu_scheduler_active = 1, debug_locks = 1
    [ 732.790032] RCU used illegally from extended quiescent state!
    [ 732.790032] 2 locks held by trinity-child31/8252:
    [ 732.790032] #0: (&rq->lock){-.-.-.}, at: [] __schedule+0x178/0x8f0
    [ 732.790032] #1: (rcu_read_lock){.+.+..}, at: [] cpuacct_charge+0xe/0x200
    [ 732.790032]
    [ 732.790032] stack backtrace:
    [ 732.790032] Pid: 8252, comm: trinity-child31 Tainted: G W 3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63
    [ 732.790032] Call Trace:
    [ 732.790032] [] lockdep_rcu_suspicious+0x10b/0x120
    [ 732.790032] [] cpuacct_charge+0x90/0x200
    [ 732.790032] [] ? cpuacct_charge+0xe/0x200
    [ 732.790032] [] update_curr+0x1a3/0x270
    [ 732.790032] [] dequeue_entity+0x2a/0x210
    [ 732.790032] [] dequeue_task_fair+0x45/0x130
    [ 732.790032] [] dequeue_task+0x89/0xa0
    [ 732.790032] [] deactivate_task+0x1e/0x20
    [ 732.790032] [] __schedule+0x879/0x8f0
    [ 732.790032] [] ? trace_hardirqs_off+0xd/0x10
    [ 732.790032] [] ? kvm_async_pf_task_wait+0x1d5/0x2b0
    [ 732.790032] [] schedule+0x55/0x60
    [ 732.790032] [] kvm_async_pf_task_wait+0x1f4/0x2b0
    [ 732.790032] [] ? abort_exclusive_wait+0xb0/0xb0
    [ 732.790032] [] ? prepare_to_wait+0x25/0x90
    [ 732.790032] [] do_async_page_fault+0x56/0xa0
    [ 732.790032] [] async_page_fault+0x28/0x30

    Signed-off-by: Sasha Levin
    Acked-by: Gleb Natapov
    Acked-by: Paul E. McKenney
    Signed-off-by: Avi Kivity

    Sasha Levin
     
  • Signed-off-by: Gleb Natapov
    Reviewed-by: Chegu Vinod
    Tested-by: Chegu Vinod
    Signed-off-by: Avi Kivity

    Gleb Natapov
     
  • We cannot directly call kvm_release_pfn_clean() to release the pfn,
    since we may encounter a noslot pfn, which is used to cache MMIO info
    in the spte.

    Signed-off-by: Xiao Guangrong
    Cc: stable@vger.kernel.org
    Signed-off-by: Avi Kivity

    Xiao Guangrong
     

22 Oct, 2012

2 commits


20 Oct, 2012

8 commits

  • Initializing the uncore PMU on a virtualized CPU may hang the kernel.
    This is because KVM does not emulate the entire hardware. There
    are lots of uncore-related MSRs, and making KVM enumerate them all
    is a non-trivial task. So just disable uncore on virtualized CPUs.

    Signed-off-by: Yan, Zheng
    Tested-by: Pekka Enberg
    Cc: a.p.zijlstra@chello.nl
    Cc: eranian@google.com
    Cc: andi@firstfloor.org
    Cc: avi@redhat.com
    Link: http://lkml.kernel.org/r/1345540117-14164-1-git-send-email-zheng.z.yan@intel.com
    Signed-off-by: Ingo Molnar

    Yan, Zheng
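    A minimal sketch of such a gate, assuming it keys off the
    hypervisor-present bit (CPUID leaf 1, ECX bit 31, which the kernel
    exposes as X86_FEATURE_HYPERVISOR); the helper name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.1:ECX bit 31 is set by hypervisors to signal a virtualized CPU. */
#define CPUID1_ECX_HYPERVISOR (UINT32_C(1) << 31)

/* Hypothetical gate: only probe uncore PMUs when running on bare metal. */
static bool should_init_uncore(uint32_t cpuid1_ecx)
{
    return !(cpuid1_ecx & CPUID1_ECX_HYPERVISOR);
}
```

    Taking the raw ECX value as a parameter keeps the decision testable;
    the caller would obtain it from the CPUID instruction.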
     
  • Pull perf fixes from Ingo Molnar:
    "Assorted small fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf python: Properly link with libtraceevent
    perf hists browser: Add back callchain folding symbol
    perf tools: Fix build on sparc.
    perf python: Link with libtraceevent
    perf python: Initialize 'page_size' variable
    tools lib traceevent: Fix missed freeing of subargs in free_arg() in filter
    lib tools traceevent: Add back pevent assignment in __pevent_parse_format()
    perf hists browser: Fix off-by-two bug on the first column
    perf tools: Remove warnings on JIT samples for srcline sort key
    perf tools: Fix segfault when using srcline sort key
    perf: Require exclude_guest to use PEBS - kernel side enforcement
    perf tool: Precise mode requires exclude_guest

    Linus Torvalds
     
  • …it/acme/linux into perf/urgent

    Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

    * The python binding needs to link with libtraceevent and to initialize
    the 'page_size' variable so that mmapping works again.

    * The callchain folding character that appears on the TUI just before
    the overhead had disappeared due to recent changes, add it back.

    * Intel PEBS in VT-x context uses the DS address as a guest linear address,
    even though it's programmed by the host as a host linear address. This either
    results in guest memory corruption and/or the hardware faulting and 'crashing'
    the virtual machine. Therefore we have to disable PEBS on VT-x enter and
    re-enable it on VT-x exit, enforcing a strict exclude_guest.

    Kernel side enforcement fix by Peter Zijlstra, tooling side fix by David Ahern.

    * Fix build on sparc due to UAPI, fix from David Miller.

    * Fixes for the srcline sort key for unresolved symbols and when processing
    samples in JITted code, where we don't have an ELF file, just a special
    symbol table, fixes from Namhyung Kim.

    * Fix some leaks in libtraceevent, from Steven Rostedt.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
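    The kernel-side enforcement described above amounts to rejecting
    precise (PEBS) events that do not set exclude_guest. A sketch with a
    hypothetical struct standing in for the relevant perf_event_attr
    fields:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical subset of the perf_event_attr fields relevant here. */
struct attr_sketch {
    unsigned int precise_ip;     /* 0 = not precise; > 0 requests PEBS */
    unsigned int exclude_guest;  /* 1 = do not count while in a guest */
};

/*
 * Reject precise events that would keep counting inside a VT-x guest,
 * where PEBS would treat the host-programmed DS address as a guest
 * linear address.
 */
static int validate_precise(const struct attr_sketch *attr)
{
    if (attr->precise_ip && !attr->exclude_guest)
        return -EOPNOTSUPP;
    return 0;
}
```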
     
  • * commit 'v3.7-rc1': (10892 commits)
    Linux 3.7-rc1
    x86, boot: Explicitly include autoconf.h for hostprogs
    perf: Fix UAPI fallout
    ARM: config: make sure that platforms are ordered by option string
    ARM: config: sort select statements alphanumerically
    UAPI: (Scripted) Disintegrate include/linux/byteorder
    UAPI: (Scripted) Disintegrate include/linux
    UAPI: Unexport linux/blk_types.h
    UAPI: Unexport part of linux/ppp-comp.h
    perf: Handle new rbtree implementation
    procfs: don't need a PATH_MAX allocation to hold a string representation of an int
    vfs: embed struct filename inside of names_cache allocation if possible
    audit: make audit_inode take struct filename
    vfs: make path_openat take a struct filename pointer
    vfs: turn do_path_lookup into wrapper around struct filename variant
    audit: allow audit code to satisfy getname requests from its names_list
    vfs: define struct filename and have getname() return it
    btrfs: Fix compilation with user namespace support enabled
    userns: Fix posix_acl_file_xattr_userns gid conversion
    userns: Properly print bluetooth socket uids
    ...

    Konrad Rzeszutek Wilk
     
  • In 32 bit guests, if a userspace process has %eax == -ERESTARTSYS
    (-512) or -ERESTARTNOINTR (-513) when it is interrupted by an event
    /and/ the process has a pending signal then %eip (and %eax) are
    corrupted when returning to the main process after handling the
    signal. The application may then crash with SIGSEGV or SIGILL, or it
    may have subtly incorrect behaviour (depending on what instruction it
    returned to).

    This occurs because handle_signal() incorrectly thinks that there
    is a system call that needs to be restarted, so it adjusts %eip and %eax
    to re-execute the system call instruction (even though user space had
    not made a system call).

    If %eax == -ERESTARTNOHAND (-514) or -ERESTART_RESTARTBLOCK (-516),
    then handle_signal() only corrupted %eax (by setting it to -EINTR).
    This may cause the application to crash or have incorrect behaviour.

    handle_signal() assumes that regs->orig_ax >= 0 means a system call so
    any kernel entry point that is not for a system call must push a
    negative value for orig_ax. For example, for physical interrupts on
    bare metal the inverse of the vector is pushed and page_fault() sets
    regs->orig_ax to -1, overwriting the hardware provided error code.

    xen_hypervisor_callback() was incorrectly pushing 0 for orig_ax
    instead of -1.

    Classic Xen kernels pushed %eax, which works because %eax cannot be both
    non-negative and -ERESTARTSYS (etc.), but using -1 is consistent with
    other non-system-call entry points and avoids some of the tests in
    handle_signal().

    There were similar bugs in xen_failsafe_callback() of both 32 and
    64-bit guests. If the fault was corrected and the normal return path
    was used then 0 was incorrectly pushed as the value for orig_ax.

    Signed-off-by: David Vrabel
    Acked-by: Jan Beulich
    Acked-by: Ian Campbell
    Cc: stable@vger.kernel.org
    Signed-off-by: Konrad Rzeszutek Wilk

    David Vrabel
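    A simplified model of the restart logic in handle_signal(), showing why
    a non-syscall entry point must record a negative orig_ax. The struct,
    the function and the sa_restart flag are hypothetical; the 2-byte rewind
    corresponds to re-executing the 2-byte system call instruction:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal restart codes (values as given in the commit message). */
#define ERESTARTSYS            512
#define ERESTARTNOINTR         513
#define ERESTARTNOHAND         514
#define ERESTART_RESTARTBLOCK  516

/* Hypothetical subset of the saved registers. */
struct regs_sketch {
    long ax;
    long orig_ax;        /* >= 0 means "entered via a system call" */
    unsigned long ip;
};

static void maybe_restart(struct regs_sketch *regs, bool sa_restart)
{
    if (regs->orig_ax < 0)
        return;   /* not a system call: registers must be left alone */

    switch (regs->ax) {
    case -ERESTART_RESTARTBLOCK:
    case -ERESTARTNOHAND:
        regs->ax = -EINTR;
        break;
    case -ERESTARTSYS:
        if (!sa_restart) {
            regs->ax = -EINTR;
            break;
        }
        /* fall through */
    case -ERESTARTNOINTR:
        regs->ax = regs->orig_ax;   /* reload the syscall number */
        regs->ip -= 2;              /* re-execute the syscall instruction */
        break;
    }
}
```

    With orig_ax == 0 (the bug), an interrupted process whose %eax merely
    happens to hold -513 gets %eip rewound; with orig_ax == -1 nothing
    changes.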
     
  • This correctly sizes it as 64 bit on ARM but leaves it as unsigned
    long on x86 (therefore no intended change on x86).

    The long and ulong guest handles are now unused (and a bit dangerous),
    so remove them.

    Acked-by: Stefano Stabellini
    Signed-off-by: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk

    Ian Campbell
     
  • Define PRI macros for xen_ulong_t and xen_pfn_t and use to fix:
    drivers/xen/sys-hypervisor.c:288:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'xen_ulong_t' [-Wformat]

    Ideally this would use PRIx64 on ARM but these (or equivalent) don't
    seem to be available in the kernel.

    Acked-by: Stefano Stabellini
    Signed-off-by: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk

    Ian Campbell
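    A sketch of the PRI-macro pattern, with hypothetical names
    (xen_ulong_sketch_t, PRI_xen_ulong_sketch): it selects a 64-bit type on
    ARM and unsigned long elsewhere, and string-literal concatenation builds
    the matching format string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-architecture type and matching conversion specifier. */
#if defined(__arm__) || defined(__aarch64__)
typedef unsigned long long xen_ulong_sketch_t;
#define PRI_xen_ulong_sketch "llx"
#else
typedef unsigned long xen_ulong_sketch_t;
#define PRI_xen_ulong_sketch "lx"
#endif

/* Format v as hex using the width-correct specifier. */
static int format_xen_ulong(char *buf, size_t len, xen_ulong_sketch_t v)
{
    return snprintf(buf, len, "%" PRI_xen_ulong_sketch, v);
}
```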
     
  • Remove duplicated include.

    The dpatch engine was used to auto-generate this patch.
    (https://github.com/weiyj/dpatch)

    CC: stable@vger.kernel.org
    Signed-off-by: Wei Yongjun
    Signed-off-by: Konrad Rzeszutek Wilk

    Wei Yongjun
     

19 Oct, 2012

3 commits

  • From Borislav Petkov:

    Below is a RAS fix which reverts the addition of a sysfs attribute
    that we agreed, after the fact, is not needed. This should go in now
    because otherwise that sysfs attribute will end up in 3.7 and thus be
    exposed to userspace; removing it then would be a lot harder.

    This is done as a merge rather than a simple patch/cherry-pick since
    the baseline for this patch was not in the previous x86/urgent.

    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     
  • 450cc201038f3 ("x86/mce: Provide boot argument to honour bios-set CMCI
    threshold") added the bios_cmci_threshold sysfs attribute which was
    supposed to communicate to userspace tools that BIOS CMCI threshold has
    been honoured.

    However, this info is not of any importance to userspace; rather, it
    should get the actual error count, which has already been thresholded,
    from MCi_STATUS[38:52].

    So drop this before it becomes a used interface (good thing we caught
    this early in 3.7-rc1, right after the merge window closed).

    Cc: Naveen N. Rao
    Acked-by: Tony Luck
    Link: http://lkml.kernel.org/r/20121017105940.GA14590@x1.osrc.amd.com
    Signed-off-by: Borislav Petkov

    Borislav Petkov
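    For reference, reading the corrected error count from MCi_STATUS is a
    plain bit-field extraction of bits 52:38 (a 15-bit field when CMCI is
    supported, per the Intel SDM). The helpers are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Extract the inclusive bit field [hi:lo] from a 64-bit register value. */
static uint64_t get_bits(uint64_t reg, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1;
    uint64_t mask = (width >= 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);

    return (reg >> lo) & mask;
}

/* Corrected error count: MCi_STATUS bits 52:38. */
static uint64_t mci_status_corrected_count(uint64_t status)
{
    return get_bits(status, 52, 38);
}
```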
     
  • The calling conventions for internal functions and 'asmlinkage' functions
    differ on x86-32. Therefore, do not directly cast aesni_enc as the XTS tweak
    function, but use a wrapper function in between. This fixes a crash with the
    "XTS + aesni_intel + x86-32" combination.

    Cc: stable@vger.kernel.org
    Reported-by: Krzysztof Kolasa
    Signed-off-by: Jussi Kivilinna
    Acked-by: David S. Miller
    Signed-off-by: Linus Torvalds

    Jussi Kivilinna
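    The wrapper pattern can be sketched as follows. asm_encrypt_block() is a
    hypothetical C stand-in for an asmlinkage assembly routine (on x86-32
    the kernel is built with -mregparm=3, while asmlinkage functions take
    their arguments on the stack), and the toy XOR "cipher" only exists to
    make the sketch runnable:

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

/* Hypothetical signature expected for an XTS tweak function. */
typedef void (*tweak_fn_t)(void *ctx, u8 *dst, const u8 *src);

/*
 * Stand-in for an asmlinkage assembly routine. Casting such a routine
 * directly to tweak_fn_t would make a register-convention caller pass
 * arguments where the callee does not look for them.
 */
static void asm_encrypt_block(void *ctx, u8 *dst, const u8 *src)
{
    const u8 key = *(const u8 *)ctx;   /* toy cipher: XOR with one key byte */

    for (int i = 0; i < 16; i++)
        dst[i] = src[i] ^ key;
}

/* Ordinary C wrapper: the compiler emits a correctly-conventioned call. */
static void xts_tweak_wrapper(void *ctx, u8 *dst, const u8 *src)
{
    asm_encrypt_block(ctx, dst, src);
}

static const tweak_fn_t xts_tweak = xts_tweak_wrapper;
```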
     

18 Oct, 2012

2 commits

  • When booting on a federated multi-server system (NumaScale), the
    processor Northbridge lookup returns NULL; add guards to prevent this
    from causing an oops.

    On those systems, the northbridge is accessed through MMIO and the
    "normal" northbridge enumeration in amd_nb.c doesn't work, since we're
    generating the northbridge ID from the initial APIC ID and the latter
    is not unique on those systems. Long story short, we end up without
    northbridge descriptors.

    Signed-off-by: Daniel J Blueman
    Cc: stable@vger.kernel.org # 3.6
    Link: http://lkml.kernel.org/r/1349073725-14093-1-git-send-email-daniel@numascale-asia.com
    [ Boris: beef up commit message ]
    Signed-off-by: Borislav Petkov
    Signed-off-by: H. Peter Anvin

    Daniel J Blueman
     
  • On systems with very large memory (1 TB in our case), BIOS may report a
    reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
    these from the direct mapping.

    [ hpa: this should be done not just for > 4 GB but for everything above the legacy
    region (1 MB), at the very least. That, however, turns out to require significant
    restructuring. That work is well underway, but is not suitable for rc/stable. ]

    Cc: stable@kernel.org # > 2.6.32
    Signed-off-by: Jacob Shin
    Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com
    Signed-off-by: H. Peter Anvin

    Jacob Shin
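    A sketch of the above-4 GiB rule, assuming a simplified E820 entry
    layout and hypothetical names: only E820 RAM ranges above 4 GiB count
    toward the direct mapping there, so holes and reserved regions are
    skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM_SKETCH       1   /* legacy E820 type values */
#define E820_RESERVED_SKETCH  2
#define SZ_4G (UINT64_C(1) << 32)

/* Simplified E820 entry. */
struct e820_entry_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Hypothetical: bytes above 4 GiB that the direct mapping would cover. */
static uint64_t direct_map_bytes_above_4g(const struct e820_entry_sketch *map,
                                          size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t start = map[i].addr;
        uint64_t end = map[i].addr + map[i].size;

        if (map[i].type != E820_RAM_SKETCH || end <= SZ_4G)
            continue;          /* skip non-RAM and anything below 4 GiB */
        if (start < SZ_4G)
            start = SZ_4G;     /* clip the part below 4 GiB */
        total += end - start;
    }
    return total;
}
```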