14 Feb, 2012

4 commits

  • commit 8ef5d844cc3a644ea6f7665932a4307e9fad01fa upstream.

    The following statement can only change the device size from 8-bit (0)
    to 16-bit (1), but not vice versa:

    regval |= GPMC_CONFIG1_DEVICESIZE(wval);

    Since this field has one reserved bit that could be used in the future,
    clear both bits and then OR in the desired value.

    Signed-off-by: Yegor Yefremov
    Signed-off-by: Tony Lindgren
    Signed-off-by: Greg Kroah-Hartman

    Yegor Yefremov
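
A minimal sketch of the read-modify-write problem described in the commit above. The field position (bits 13:12) and all names are assumptions made for illustration, not the kernel's definitions:

```c
#include <stdint.h>

/* Assumed layout for this sketch: a 2-bit device-size field at bits 13:12. */
#define DEVICESIZE_SHIFT 12
#define DEVICESIZE_MASK  (UINT32_C(3) << DEVICESIZE_SHIFT)
#define DEVICESIZE(val)  (((uint32_t)(val) & 3) << DEVICESIZE_SHIFT)

/* OR-only update: it can set bits in the field but never clear them,
 * so going from 16-bit (1) back to 8-bit (0) has no effect. */
uint32_t set_size_or_only(uint32_t regval, unsigned wval)
{
    return regval | DEVICESIZE(wval);
}

/* Clear the whole field first, then OR in the new value. */
uint32_t set_size_masked(uint32_t regval, unsigned wval)
{
    regval &= ~DEVICESIZE_MASK;
    return regval | DEVICESIZE(wval);
}
```

With a register that already holds the 16-bit setting, the OR-only variant returns the value unchanged when asked for 8-bit, while the masked variant clears the field as intended.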
     
  • commit 8130b9d7b9d858aa04ce67805e8951e3cb6e9b2f upstream.

    If we are context switched whilst copying into a thread's
    vfp_hard_struct then the partial copy may be corrupted by the VFP
    context switching code (see "ARM: vfp: flush thread hwstate before
    restoring context from sigframe").

    This patch updates the ptrace VFP set code so that the thread state is
    flushed before the copy, therefore disabling VFP and preventing
    corruption from occurring.

    Signed-off-by: Will Deacon
    Signed-off-by: Russell King
    Signed-off-by: Greg Kroah-Hartman

    Will Deacon
     
  • commit 247f4993a5974e6759606c4d380748eecfd273ff upstream.

    In a preemptible kernel, vfp_set() can be preempted, causing the
    hardware VFP context to be switched while the thread vfp state is
    being read and modified. This leads to a race condition which can
    cause the thread vfp state to become corrupted if lazy VFP context
    save occurs due to preemption in between the time thread->vfpstate
    is read and the time the modified state is written back.

    This may occur if preemption occurs during the execution of a
    ptrace() call which modifies the VFP register state of a thread.
    Such instances should be very rare in most realistic scenarios --
    none has been reported, so far as I am aware. Only uniprocessor
    systems should be affected, since VFP context save is not currently
    lazy in SMP kernels.

    The problem was introduced by my earlier patch migrating to use
    regsets to implement ptrace.

    This patch does a vfp_sync_hwstate() before reading
    thread->vfpstate, to make sure that the thread's VFP state is not
    live in the hardware registers while the registers are modified.

    Thanks to Will Deacon for spotting this.

    Signed-off-by: Dave Martin
    Signed-off-by: Will Deacon
    Signed-off-by: Russell King
    Signed-off-by: Greg Kroah-Hartman

    Dave Martin
     
  • commit 2af276dfb1722e97b190bd2e646b079a2aa674db upstream.

    Following execution of a signal handler, we currently restore the VFP
    context from the ucontext in the signal frame. This involves copying
    from the user stack into the current thread's vfp_hard_struct and then
    flushing the new data out to the hardware registers.

    This is problematic when using a preemptible kernel because we could be
    context switched whilst updating the vfp_hard_struct. If the current
    thread has made use of VFP since the last context switch, the VFP
    notifier will copy from the hardware registers into the vfp_hard_struct,
    overwriting any data that had been partially copied by the signal code.

    Disabling preemption across copy_from_user calls is a terrible idea, so
    instead we move the VFP thread flush *before* we update the
    vfp_hard_struct. Since the flushing is performed lazily, this has the
    effect of disabling VFP and clearing the CPU's VFP state pointer,
    therefore preventing the thread from being updated with stale data on
    the next context switch.

    Tested-by: Peter Maydell
    Signed-off-by: Will Deacon
    Signed-off-by: Russell King
    Signed-off-by: Greg Kroah-Hartman

    Will Deacon
     

04 Feb, 2012

10 commits

  • commit 2ab1159e80e8f416071e9f51e4f77b9173948296 upstream.

    MMC_CAP_SD_HIGHSPEED is not supported on the Snowball board, resulting
    in initialization errors.

    Signed-off-by: Mathieu Poirier
    Signed-off-by: Fredrik Soderstedt
    Signed-off-by: Philippe Langlais
    Signed-off-by: Linus Walleij

    Philippe Langlais
     
  • [ Upstream commit d00a9dd21bdf7908b70866794c8313ee8a5abd5c ]

    This patch fixes several problems:

    1) The target of the conditional jump taken when a BPF program
    performs a divide by 0 is wrong.

    2) We must 'generate' the full function prologue/epilogue at pass=0,
    or else we can stop too early in pass=1 if the proglen doesn't change
    (if the increase in prologue/epilogue length equals the decrease in
    total instruction length because some jumps are converted to near jumps).

    3) Change the wrong length detection at the end of code generation to
    issue a more explicit message, no need for a full stack trace.

    Reported-by: Phil Oester
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • commit 7a7546b377bdaa25ac77f33d9433c59f259b9688 upstream.

    If NR_CPUS < 256 then arch_spinlock_t is only 16 bits wide but struct
    xen_spinlock is 32 bits. When a spin lock is contended and
    xl->spinners is modified the two bytes immediately after the spin lock
    would be corrupted.

    This is a regression caused by 84eb950db13ca40a0572ce9957e14723500943d6
    (x86, ticketlock: Clean up types and accessors) which reduced the size
    of arch_spinlock_t.

    Fix this by making xl->spinners a u8 if NR_CPUS < 256. A
    BUILD_BUG_ON() is also added to check the sizes of the two structures
    are compatible.

    In many cases this was not noticeable as there would often be padding
    bytes after the lock (e.g., if any of CONFIG_GENERIC_LOCKBREAK,
    CONFIG_DEBUG_SPINLOCK, or CONFIG_DEBUG_LOCK_ALLOC were enabled).

    The bnx2 driver is affected. In struct bnx2, phy_lock and
    indirect_lock may have no padding after them. Contention on phy_lock
    would corrupt indirect_lock making it appear locked and the driver
    would deadlock.

    Signed-off-by: David Vrabel
    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Greg Kroah-Hartman

    David Vrabel
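
The size mismatch described in the commit above can be sketched with stand-in types. The structure names and the NR_CPUS value below are assumptions for illustration; `_Static_assert` plays the role that BUILD_BUG_ON() plays in the kernel:

```c
#include <stdint.h>

#define NR_CPUS 64  /* assumed build-time value for this sketch */

#if NR_CPUS < 256
typedef uint8_t ticket_t;    /* small systems: 8-bit tickets */
typedef uint8_t spinners_t;  /* fixed: counter shrinks with the lock */
#else
typedef uint16_t ticket_t;
typedef uint16_t spinners_t;
#endif

/* Stand-in for the generic ticket lock: 16 bits when NR_CPUS < 256. */
struct ticket_lock { ticket_t head, tail; };

/* Old overlay: the 16-bit counter makes this 32 bits on typical ABIs,
 * so writes to it can spill past a 16-bit ticket lock. */
struct pv_lock_old { uint8_t lock; uint16_t spinners; };

/* New overlay: sized to fit inside the generic lock. */
struct pv_lock { uint8_t lock; spinners_t spinners; };

/* Compile-time check that the overlay never exceeds the generic lock. */
_Static_assert(sizeof(struct pv_lock) <= sizeof(struct ticket_lock),
               "overlay must fit inside the generic lock");
```

Applying the same assertion to `struct pv_lock_old` would fail to compile on typical ABIs, which is the kind of early failure the added build-time check is meant to provide.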
     
  • commit 612539e81f655f6ac73c7af1da8701c1ee618aee upstream.

    On v7, we use the same cache maintenance instructions for data lines
    as for unified lines. This was not the case for v6, where HARVARD_CACHE
    was defined to indicate the L1 cache topology.

    This patch removes the erroneous compile-time check for HARVARD_CACHE in
    proc-v7.S, ensuring that we perform I-side invalidation at boot.

    Reported-and-Acked-by: Shawn Guo

    Acked-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Russell King
    Signed-off-by: Greg Kroah-Hartman

    Will Deacon
     
  • commit d65015f7c5c5be9fd3f5e567889c844ba81bdc9c upstream.

    This applies ARM errata 764369 for all ux500 platforms.

    Signed-off-by: Srinidhi Kasagar
    Signed-off-by: Linus Walleij
    Signed-off-by: Greg Kroah-Hartman

    Srinidhi KASAGAR
     
  • commit 3e90772f76010c315474bde59eaca7cc4c94d645 upstream.

    Currently setting it to PQFP changes subtype to BGA as subtypes are
    swapped in at91rm9200_set_type().

    Wrong subtype causes GPIO bank D not to work at all.

    After this fix, the subtype is still set as unknown, but board code
    should fill it in with the proper value. A different message is
    therefore printed.

    Bug discovery and first implementation made by Veli-Pekka Peltola.

    Signed-off-by: Nicolas Ferre
    Acked-by: Jean-Christophe PLAGNIOL-VILLARD
    Signed-off-by: Greg Kroah-Hartman

    Nicolas Ferre
     
  • commit 2a3535069e33d8b416f406c159ce924427315303 upstream.

    Passing the address of a variable as an operand to an asm statement
    doesn't mark the value of this variable as used, so gcc may optimize its
    initialisation away. Fix this by using the "m" constraint instead.

    Signed-off-by: Andreas Schwab
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Greg Kroah-Hartman

    Andreas Schwab
     
  • commit 5b68edc91cdc972c46f76f85eded7ffddc3ff5c2 upstream.

    We've decided to provide CPU family specific container files
    (starting with CPU family 15h). E.g. for family 15h we have to
    load microcode_amd_fam15h.bin instead of microcode_amd.bin

    Rationale is that starting with family 15h patch size is larger
    than 2KB which was hard coded as maximum patch size in various
    microcode loaders (not just Linux).

    Container files which include patches larger than 2KB cause
    different kinds of trouble with such old patch loaders. Thus we
    have to ensure that the default container file provides only
    patches with size less than 2KB.

    Signed-off-by: Andreas Herrmann
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/20120120164412.GD24508@alberich.amd.com
    [ documented the naming convention and tidied the code a bit. ]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Andreas Herrmann
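
A rough sketch of the naming convention described in the commit above. Only the two file names come from the commit text; the helper name and its signature are hypothetical, and any directory prefix used by a real loader is omitted:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: pick the container file name for a CPU family.
 * Families from 15h onward get a family-specific file; older families
 * keep using the default container. Returns the name length. */
int ucode_container_name(unsigned family, char *buf, size_t len)
{
    if (family >= 0x15)
        return snprintf(buf, len, "microcode_amd_fam%.2xh.bin", family);
    return snprintf(buf, len, "microcode_amd.bin");
}
```

Keeping the default file limited to small patches means loaders with a hard-coded 2KB limit never encounter a larger patch in it.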
     
  • commit 5a51467b146ab7948d2f6812892eac120a30529c upstream.

    uv_gpa_to_soc_phys_ram() was inadvertently ignoring the
    shift values. This fix takes the shift into account.

    Signed-off-by: Russ Anderson
    Link: http://lkml.kernel.org/r/20120119020753.GA7228@sgi.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Russ Anderson
     
  • commit d2ebc71d472020bc30e29afe8c4d2a85a5b41f56 upstream.

    Initialize two spinlocks in tlb_uv.c and also properly define/initialize
    the uv_irq_lock.

    The lack of explicit initialization seems to be functionally
    harmless, but it is diagnosed when these are turned on:

    CONFIG_DEBUG_SPINLOCK=y
    CONFIG_DEBUG_MUTEXES=y
    CONFIG_DEBUG_LOCK_ALLOC=y
    CONFIG_LOCKDEP=y

    Signed-off-by: Cliff Wickman
    Cc: Dimitri Sivanich
    Link: http://lkml.kernel.org/r/E1RnXd1-0003wU-PM@eag09.americas.sgi.com
    [ Added the uv_irq_lock initialization fix by Dimitri Sivanich ]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Cliff Wickman
     

26 Jan, 2012

11 commits

  • commit c25a785d6647984505fa165b5cd84cfc9a95970b upstream.

    If the provided system call number is equal to __NR_syscalls, the
    current check will pass and a function pointer just after the system
    call table may be called, since sys_call_table is an array with total
    size __NR_syscalls.

    Whether or not this is a security bug depends on what the compiler puts
    immediately after the system call table. It's likely that this won't do
    anything bad because there is an additional NULL check on the syscall
    entry, but if there happens to be a non-NULL value immediately after the
    system call table, this may result in local privilege escalation.

    Signed-off-by: Dan Rosenberg
    Cc: Chen Liqin
    Cc: Lennox Wu
    Cc: Eugene Teo
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Dan Rosenberg
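
The off-by-one described in the commit above can be illustrated with a small table. The table size, the entries, and the -1 error value are assumptions for this sketch, not the architecture's actual code:

```c
#include <stddef.h>

#define NR_SYSCALLS 4  /* assumed table size for this sketch */

typedef long (*syscall_fn)(void);

static long sys_demo(void) { return 42; }

/* Valid indices are 0 .. NR_SYSCALLS - 1. */
static const syscall_fn table[NR_SYSCALLS] = {
    sys_demo, sys_demo, NULL, sys_demo
};

/* Buggy check: nr == NR_SYSCALLS is accepted, one slot past the end. */
int index_ok_buggy(unsigned long nr) { return nr <= NR_SYSCALLS; }

/* Fixed check: strictly less than the table size. */
int index_ok_fixed(unsigned long nr) { return nr < NR_SYSCALLS; }

/* Dispatch with the fixed bound plus the NULL-entry check. */
long dispatch(unsigned long nr)
{
    if (!index_ok_fixed(nr))
        return -1;          /* out of range */
    if (table[nr] == NULL)
        return -1;          /* unimplemented slot */
    return table[nr]();
}
```

The NULL check alone does not make the buggy bound safe, because whatever the compiler places after the table is not guaranteed to be NULL.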
     
  • commit c5d35d399e685acccc85a675e8765c26b2a9813a upstream.

    This patch implements a workaround for a UV2 hardware bug.
    The bug is a non-atomic update of a memory-mapped register. When
    hardware message delivery and software message acknowledge occur
    simultaneously the pending message acknowledge for the arriving
    message may be lost. This causes the sender's message status to
    stay busy.

    Part of the workaround is to not acknowledge a completed message
    until it is verified that no other message is actually using the
    resource that is mistakenly recorded in the completed message.

    Part of the workaround is to test for long elapsed time in such
    a busy condition, then handle it by using a spare sending
    descriptor. The stay-busy condition is eventually timed out by
    hardware, and then the original sending descriptor can be
    re-used. Most of that logic change is in keeping track of the
    current descriptor and the state of the spares.

    The occurrences of the workaround are added to the BAU
    statistics.

    Signed-off-by: Cliff Wickman
    Link: http://lkml.kernel.org/r/20120116211947.GC5767@sgi.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Cliff Wickman
     
  • commit d059f9fa84a30e04279c6ff615e9e2cf3b260191 upstream.

    Move the call to enable_timeouts() forward so that
    BAU_MISC_CONTROL is initialized before using it in
    calculate_destination_timeout().

    Fix the calculation of a BAU destination timeout
    for UV2 (in calculate_destination_timeout()).

    Signed-off-by: Cliff Wickman
    Link: http://lkml.kernel.org/r/20120116211848.GB5767@sgi.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Cliff Wickman
     
  • commit da87c937e5a2374686edd58df06cfd5050b125fa upstream.

    Update the use of the Broadcast Assist Unit on SGI Altix UV2 to
    the use of native UV2 mode on new hardware (not the legacy mode).

    UV2 native mode has a different format for a broadcast message.
    We also need quick differentiation between UV1 and UV2.

    Signed-off-by: Cliff Wickman
    Link: http://lkml.kernel.org/r/20120116211750.GA5767@sgi.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Cliff Wickman
     
  • commit 9f10f6a520deb3639fac78d81151a3ade88b4e7f upstream.

    In SRAT v1, we had 8-bit proximity domain (PXM) fields; SRAT v2 provides
    32 bits for these. The new fields were reserved before.
    According to the ACPI spec, the OS must disregard reserved fields.

    ia64 did handle the PXM fields almost consistently, but depending on
    sgi's sn2 platform. This patch leaves the sn2 logic in, but does also
    use 16/32 bits for PXM if the SRAT has rev 2 or higher.

    The patch also adds __init to the two pxm accessor functions, as they
    access __initdata now and are called from an __init function only anyway.

    Note that the code only uses 16 bits for the PXM field in the processor
    proximity field; the patch does not address this as 16 bits are more than
    enough.

    Signed-off-by: Kurt Garloff
    Signed-off-by: Len Brown
    Signed-off-by: Greg Kroah-Hartman

    Kurt Garloff
     
  • commit cd298f60a2451a16e0f077404bf69b62ec868733 upstream.

    In SRAT v1, we had 8-bit proximity domain (PXM) fields; SRAT v2 provides
    32 bits for these. The new fields were reserved before.
    According to the ACPI spec, the OS must disregard reserved fields.

    x86/x86-64 was rather inconsistent prior to this patch; it used 8 bits
    for the pxm field in cpu_affinity, but 32 bits in mem_affinity.
    This patch makes it consistent: Either use 8 bits consistently (SRAT
    rev 1 or lower) or 32 bits (SRAT rev 2 or higher).

    cc: x86@kernel.org
    Signed-off-by: Kurt Garloff
    Signed-off-by: Len Brown
    Signed-off-by: Greg Kroah-Hartman

    Kurt Garloff
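
One way to picture the revision-dependent PXM handling from the commit above. The structure layout and field names are assumptions for illustration, not the kernel's ACPI definitions:

```c
#include <stdint.h>

/* Assumed layout: a low byte present since SRAT v1, plus three
 * high bytes that were reserved in v1 and carry data from v2 on. */
struct cpu_affinity_entry {
    uint8_t pxm_lo;
    uint8_t pxm_hi[3];
};

/* Use 8 bits for SRAT rev 1 or lower, 32 bits for rev 2 or higher. */
uint32_t cpu_pxm(const struct cpu_affinity_entry *e, int srat_rev)
{
    uint32_t pxm = e->pxm_lo;

    if (srat_rev >= 2) {
        pxm |= (uint32_t)e->pxm_hi[0] << 8;
        pxm |= (uint32_t)e->pxm_hi[1] << 16;
        pxm |= (uint32_t)e->pxm_hi[2] << 24;
    }
    return pxm;
}
```

Ignoring the high bytes for revision 1 tables follows the rule that reserved fields must be disregarded, since firmware may leave arbitrary values there.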
     
  • commit da517a08ac5913cd80ce3507cddd00f2a091b13c upstream.

    SGI UV systems print a message during boot:

    UV: Found blades

    Due to packaging changes, the blade count is not accurate on the
    next generation of the platform. This patch corrects the count.

    Signed-off-by: Jack Steiner
    Link: http://lkml.kernel.org/r/20120106191900.GA19772@sgi.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Jack Steiner
     
  • commit 9af0c7a6fa860698d080481f24a342ba74b68982 upstream.

    On x86_32 casting the unsigned int result of get_random_int() to
    long may result in a negative value. On x86_32 the range of
    mmap_rnd() therefore was -255 to 255. The 32bit mode on x86_64
    used 0 to 255 as intended.

    The bug was introduced by 675a081 ("x86: unify mmap_{32|64}.c")
    in January 2008.

    Signed-off-by: Ludwig Nussel
    Cc: Linus Torvalds
    Cc: harvey.harrison@gmail.com
    Cc: "H. Peter Anvin"
    Cc: Harvey Harrison
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/201111152246.pAFMklOB028527@wpaz5.hot.corp.google.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Ludwig Nussel
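
A small sketch of the sign problem described in the commit above. Here `int32_t` stands in for a 32-bit `long`, the function names are hypothetical, and the out-of-range conversion to a signed type is implementation-defined (this assumes the usual two's-complement wrap):

```c
#include <stdint.h>

/* Buggy: the random value is converted to a signed 32-bit type first,
 * so values with the top bit set become negative and the remainder
 * can be anywhere in -255 .. 255. */
int32_t rnd_buggy(uint32_t r)
{
    return (int32_t)r % 256;
}

/* Fixed: take the remainder on the unsigned value, giving 0 .. 255. */
uint32_t rnd_fixed(uint32_t r)
{
    return r % 256;
}
```

For example, the input 0x80000001 yields -255 from the buggy variant and 1 from the fixed one.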
     
  • commit 5cf9a4e69c1ff0ccdd1d2b7404f95c0531355274 upstream.

    We only need amd_bus.o for AMD systems with PCI. arch/x86/pci/Makefile
    already depends on CONFIG_PCI=y, so this patch just adds the dependency
    on CONFIG_AMD_NB.

    Cc: Yinghai Lu
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Bjorn Helgaas
     
  • commit 24d25dbfa63c376323096660bfa9ad45a08870ce upstream.

    This factors out the AMD native MMCONFIG discovery so we can use it
    outside amd_bus.c.

    amd_bus.c reads AMD MSRs so it can remove the MMCONFIG area from the
    PCI resources. We may also need the MMCONFIG information to work
    around BIOS defects in the ACPI MCFG table.

    Cc: Borislav Petkov
    Cc: Yinghai Lu
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes
    Signed-off-by: Greg Kroah-Hartman

    Bjorn Helgaas
     
  • commit ae5cd86455381282ece162966183d3f208c6fad7 upstream.

    This assures that a _CRS reserved host bridge window or window region is
    not used if it is not addressable by the CPU. The new code either trims
    the window to exclude the non-addressable portion or totally ignores the
    window if the entire window is non-addressable.

    The current code has been shown to be problematic with 32-bit non-PAE
    kernels on systems where _CRS reserves resources above 4GB.

    Signed-off-by: Gary Hade
    Reviewed-by: Bjorn Helgaas
    Cc: Thomas Renninger
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Jesse Barnes
    Signed-off-by: Greg Kroah-Hartman

    Gary Hade
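
The trim-or-ignore behaviour described in the commit above might look roughly like this. The structure, the function name, and the inclusive-range convention are assumptions for this sketch:

```c
#include <stdint.h>
#include <stdbool.h>

/* A host bridge window as an inclusive address range. */
struct window {
    uint64_t start;
    uint64_t end;
};

/* Clip a window to the highest CPU-addressable address.
 * Returns false if the whole window is out of reach and should be
 * ignored; otherwise trims the end and returns true. */
bool clip_window(struct window *w, uint64_t max_addr)
{
    if (w->start > max_addr)
        return false;        /* entirely non-addressable */
    if (w->end > max_addr)
        w->end = max_addr;   /* drop the non-addressable tail */
    return true;
}
```

On a 32-bit non-PAE kernel `max_addr` would be 0xFFFFFFFF, so a window that straddles 4GB is shortened and one that lies entirely above it is dropped.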
     

13 Jan, 2012

2 commits

  • commit e4f387d8db3ba3c2dae4d8bdfe7bb5f4fe1bcb0d upstream.

    Unpaired calls to probe_hcall_entry and probe_hcall_exit might happen
    as follows, which could cause an incorrect preempt count.

    __trace_hcall_entry => trace_hcall_entry -> probe_hcall_entry =>
    get_cpu_var => preempt_disable

    __trace_hcall_exit => trace_hcall_exit -> probe_hcall_exit =>
    put_cpu_var => preempt_enable

    where:
    A => B and A -> B means A calls B, but
    => means A will call B through function name, and B will definitely be
    called.
    -> means A will call B through function pointer, so B might not be
    called if the function pointer is not set.

    So an error occurs when only one of probe_hcall_entry and
    probe_hcall_exit gets called during an hcall.

    This patch moves the preempt count operations from
    probe_hcall_entry and probe_hcall_exit to their callers.

    Reported-by: Paul E. McKenney
    Signed-off-by: Li Zhong
    Tested-by: Paul E. McKenney
    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Greg Kroah-Hartman

    Li Zhong
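
A toy model of the imbalance described in the commit above. The counter and all function names are stand-ins for illustration; a zero pointer models a probe that has not been registered:

```c
typedef void (*probe_fn)(void);

int preempt_count;  /* 0 means balanced */

void preempt_disable(void) { preempt_count++; }
void preempt_enable(void)  { preempt_count--; }

/* Before: the probes themselves adjust the count, so a missing
 * probe on one side leaves the count unbalanced. */
void probe_entry_old(void) { preempt_disable(); }
void probe_exit_old(void)  { preempt_enable(); }
void trace_entry_old(probe_fn p) { if (p) p(); }
void trace_exit_old(probe_fn p)  { if (p) p(); }

/* After: the callers, which always run, own the disable/enable pair;
 * the probes no longer touch the count. */
void probe_noop(void) { }
void trace_entry_new(probe_fn p) { preempt_disable(); if (p) p(); }
void trace_exit_new(probe_fn p)  { if (p) p(); preempt_enable(); }
```

With only the entry probe registered, the old arrangement ends an hcall with the count at 1, while the new one returns it to 0 regardless of which probes are set.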
     
  • commit 37fb9a0231ee43d42d069863bdfd567fca2b61af upstream.

    When re-enabling interrupts we have code to handle edge sensitive
    decrementers by resetting the decrementer to 1 whenever it is negative.
    If interrupts were disabled long enough that the decrementer wrapped to
    positive we do nothing. This means interrupts can be delayed for a long
    time until it finally goes negative again.

    While we hope interrupts are never disabled long enough for the
    decrementer to go positive, we have a very good test team that can
    drive any kernel into the ground. The softlockup data we get back
    from these fails could be seconds in the future, completely missing
    the cause of the lockup.

    We already keep track of the timebase of the next event so use that
    to work out if we should trigger a decrementer exception.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Greg Kroah-Hartman

    Anton Blanchard
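
A simplified model of why the sign test alone can miss a pending event, as described in the commit above. It assumes a 32-bit down-counter and hypothetical function names; the real code involves more state than shown here:

```c
#include <stdint.h>
#include <stdbool.h>

/* Value of a 32-bit down-counter after 'elapsed' ticks. */
uint32_t dec_value(uint32_t initial, uint64_t elapsed)
{
    return initial - (uint32_t)elapsed;
}

/* Old test: an event is pending only while the counter reads negative. */
bool due_by_sign(uint32_t dec)
{
    return (int32_t)dec < 0;
}

/* New test: compare the current timebase with the recorded time of
 * the next event, which does not wrap in the same way. */
bool due_by_timebase(uint64_t now_tb, uint64_t next_event_tb)
{
    return now_tb >= next_event_tb;
}
```

If the counter was loaded with 1000 and far more than 2^31 ticks pass, it wraps back to a positive value, so the sign test reports nothing pending while the timebase comparison still sees that the deadline has passed.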
     

31 Dec, 2011

3 commits


30 Dec, 2011

2 commits

  • * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86: Fix raw_spin_unlock_irqrestore() usage
    oprofile, arm/sh: Fix oprofile_arch_exit() linkage issue

    Linus Torvalds
     
  • Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time
    for nohz") did not take into account that on some architectures jiffies
    and cputime use different units.

    This causes get_idle_time() to return numbers in the wrong units, making
    the idle time fields in /proc/stat wrong.

    Instead of converting the usec value returned by
    get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
    usecs_to_cputime64 to convert it to the correct unit of cputime64_t.

    Signed-off-by: Andreas Schwab
    Acked-by: Michal Hocko
    Cc: Arnd Bergmann
    Cc: "Artem S. Tashkinov"
    Cc: Dave Jones
    Cc: Alexey Dobriyan
    Cc: Thomas Gleixner
    Cc: "Luck, Tony"
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Schwab
     

28 Dec, 2011

2 commits

  • SROMC static memory mapping is included in the common s5p initialization
    code. Hence, remove the duplicated SROMC static memory mapping for EXYNOS.

    Signed-off-by: Thomas Abraham
    Cc: stable@kernel.org
    Signed-off-by: Kukjin Kim

    Thomas Abraham
     
  • The following occurs when CONFIG_CPU_FREQ_S3C24XX_DEBUGFS
    is selected without building the s3c2410-iotiming.c file:

    arch/arm/mach-s3c2440/built-in.o:(.data+0x38c): undefined reference to `s3c2410_iotiming_debugfs

    CONFIG_S3C2410_IOTIMING is not selected for MACH_MINI2440, so
    s3c2410-iotiming.c is never compiled, and enabling the
    CONFIG_CPU_FREQ_S3C24XX_DEBUGFS option causes an undefined
    reference to s3c2410_iotiming_debugfs(), which is defined in
    that file. s3c2410_iotiming_debugfs is now defined as NULL
    in this case.

    Signed-off-by: Denis Kuzmenko
    Cc: stable@kernel.org
    [kgene.kim@samsung.com: removed useless changes]
    Signed-off-by: Kukjin Kim

    Denis Kuzmenko
     

26 Dec, 2011

5 commits

  • This is required for THIS_MODULE. We recently stopped acquiring
    it via some other header.

    Signed-off-by: Scott Wood
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Currently kvmppc_start_thread() tries to wake other SMT threads via
    xics_wake_cpu(). Unfortunately xics_wake_cpu only exists when
    CONFIG_SMP=Y so when compiling with CONFIG_SMP=N we get:

    arch/powerpc/kvm/built-in.o: In function `.kvmppc_start_thread':
    book3s_hv.c:(.text+0xa1e0): undefined reference to `.xics_wake_cpu'

    The following should be fine since kvmppc_start_thread() shouldn't be
    called to start non-zero threads when SMP=N since threads_per_core=1.

    Signed-off-by: Michael Neuling
    Signed-off-by: Alexander Graf

    Michael Neuling
     
  • kvmppc_h_pr is only available if CONFIG_KVM_BOOK3S_64_PR.

    Signed-off-by: Andreas Schwab
    Signed-off-by: Alexander Graf

    Andreas Schwab
     
  • compute_tlbie_rb is only used on ppc64 and cannot be compiled on ppc32.

    Signed-off-by: Andreas Schwab
    Signed-off-by: Alexander Graf

    Andreas Schwab
     
  • Unlike all of the other cpuid bits, the TSC deadline timer bit is set
    unconditionally, regardless of what userspace wants.

    This is broken in several ways:
    - if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC
    deadline timer feature, a guest that uses the feature will break
    - live migration to older host kernels that don't support the TSC deadline
    timer will cause the feature to be pulled from under the guest's feet;
    breaking it
    - guests that are broken wrt the feature will fail.

    Fix by not enabling the feature automatically; instead report it to userspace.
    Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee
    will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not
    KVM_GET_SUPPORTED_CPUID.

    Fixes the Illumos guest kernel, which uses the TSC deadline timer feature.

    [avi: add the KVM_CAP + documentation]

    Reported-by: Alexey Zaytsev
    Tested-by: Alexey Zaytsev
    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     

25 Dec, 2011

1 commit

  • User space may create the PIT and forget to set up the irqchips.
    In that case, firing PIT IRQs will crash the host:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
    IP: [] kvm_set_irq+0x30/0x170 [kvm]
    ...
    Call Trace:
    [] pit_do_work+0x51/0xd0 [kvm]
    [] process_one_work+0x111/0x4d0
    [] worker_thread+0x152/0x340
    [] kthread+0x7e/0x90
    [] kernel_thread_helper+0x4/0x10

    Prevent this by checking the irqchip mode before starting a timer. We
    can't deny creating the PIT if the irqchips aren't set up yet as
    current user land expects this order to work.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
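
The guard described in the commit above can be pictured as follows. The structure and function names are hypothetical stand-ins, not KVM's internal API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VM state: the irqchip pointer stays NULL until user
 * space sets the interrupt controllers up. */
struct vm {
    void *irqchip;
    bool  timer_running;
};

/* Creating the PIT always succeeds, whatever the setup order was;
 * only arming the timer is refused while no irqchip exists, so an
 * IRQ can never be delivered through a NULL pointer. */
bool pit_start_timer(struct vm *vm)
{
    if (vm->irqchip == NULL)
        return false;
    vm->timer_running = true;
    return true;
}
```

Placing the check at timer start rather than at PIT creation keeps the existing user-space ordering working, as the commit message requires.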