20 Jan, 2021

1 commit

  • commit ad0a6bad44758afa3b440c254a24999a0c7e35d5 upstream.

    We've observed crashes due to an empty cpu mask in
    hyperv_flush_tlb_others. Obviously the cpu mask in question is changed
    between the cpumask_empty call at the beginning of the function and when
    it is actually used later.

    One theory is that an interrupt comes in between and a code path ends up
    changing the mask. Move the check after interrupt has been disabled to
    see if it fixes the issue.

    Signed-off-by: Wei Liu
    Cc: stable@kernel.org
    Link: https://lore.kernel.org/r/20210105175043.28325-1-wei.liu@kernel.org
    Reviewed-by: Michael Kelley
    Signed-off-by: Greg Kroah-Hartman

    Wei Liu
     

06 Nov, 2020

1 commit


27 Oct, 2020

1 commit

  • The comment about Hyper-V accessors is unclear regarding their
    potential use in x2apic mode, as is the associated commit message
    in e211288b72f1. Clarify that while the architectural and
    synthetic MSRs are equivalent in x2apic mode, the full set of xapic
    accessors cannot be used because of register layout differences.

    Fixes: e211288b72f1 ("x86/hyperv: Make vapic support x2apic mode")
    Signed-off-by: Michael Kelley
    Link: https://lore.kernel.org/r/1603723972-81303-1-git-send-email-mikelley@microsoft.com
    Signed-off-by: Wei Liu

    Michael Kelley
     

27 Sep, 2020

1 commit

  • In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b
    removed the "X64" in the symbol names so they would make sense for both x86 and
    ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h
    so that existing x86 code would continue to compile.

    As a cleanup, update the x86 code to use the symbols without the "X64", then remove
    the aliases. There's no functional change.

    Signed-off-by: Joseph Salisbury
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Michael Kelley
    Acked-by: Paolo Bonzini
    Link: https://lore.kernel.org/r/1601130386-11111-1-git-send-email-jsalisbury@linux.microsoft.com

    Joseph Salisbury
     

04 Jul, 2020

1 commit

  • Fix the recently added new __vmalloc_node_range callers to pass the
    correct values as the owner for display in /proc/vmallocinfo.

    Fixes: 800e26b81311 ("x86/hyperv: allocate the hypercall page with only read and execute bits")
    Fixes: 10d5e97c1bf8 ("arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page")
    Fixes: 7a0e27b2a0ce ("mm: remove vmalloc_exec")
    Reported-by: Ard Biesheuvel
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200627075649.2455097-1-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

26 Jun, 2020

1 commit

  • Patch series "fix a hyperv W^X violation and remove vmalloc_exec"

    Dexuan reported a W^X violation in the Hyper-V hypercall page, caused by
    switching it to be allocated using vmalloc_exec.

    The problem is that PAGE_KERNEL_EXEC as used by vmalloc_exec actually
    sets writable permissions in the pte. This series fixes the issue by
    switching to the low-level __vmalloc_node_range interface that allows
    specifying more detailed permissions instead. It then also open codes
    the other two callers and removes the somewhat confusing vmalloc_exec
    interface.

    Peter noted that the Hyper-V hypercall page allocation also has another
    long-standing issue in that it shouldn't use the full vmalloc but just
    the module space. This issue is so far theoretical as the allocation is
    done early in the boot process. I plan to fix it with another bigger
    series for 5.9.

    This patch (of 3):

    Avoid a W^X violation caused by the fact that PAGE_KERNEL_EXEC includes
    the writable bit.

    For this resurrect the removed PAGE_KERNEL_RX definition, but as
    PAGE_KERNEL_ROX to match arm64 and powerpc.

    Link: http://lkml.kernel.org/r/20200618064307.32739-2-hch@lst.de
    Fixes: 78bb17f76edc ("x86/hyperv: use vmalloc_exec for the hypercall page")
    Signed-off-by: Christoph Hellwig
    Reported-by: Dexuan Cui
    Tested-by: Vitaly Kuznetsov
    Acked-by: Wei Liu
    Acked-by: Peter Zijlstra (Intel)
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Jessica Yu
    Cc: David Hildenbrand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
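
The W^X problem described in the commit above can be illustrated with a small user-space sketch. The bit values and the `SKETCH_*` names below are simplified, hypothetical stand-ins for the real x86 page-protection definitions; the point is only that a protection value with the writable bit set and the no-execute bit clear is both writable and executable, while a read-plus-execute value is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for page-table bits; names and layout are assumptions. */
#define SKETCH_PAGE_PRESENT (UINT64_C(1) << 0)
#define SKETCH_PAGE_RW      (UINT64_C(1) << 1)
#define SKETCH_PAGE_NX      (UINT64_C(1) << 63)

/* Executable mapping that still carries the writable bit. */
#define SKETCH_PAGE_KERNEL_EXEC (SKETCH_PAGE_PRESENT | SKETCH_PAGE_RW)
/* Read + execute only: neither the RW bit nor the NX bit is set. */
#define SKETCH_PAGE_KERNEL_ROX  (SKETCH_PAGE_PRESENT)

/* A mapping is executable when the no-execute bit is clear. */
bool sketch_is_executable(uint64_t prot)
{
    return (prot & SKETCH_PAGE_NX) == 0;
}

/* A mapping is writable when the RW bit is set. */
bool sketch_is_writable(uint64_t prot)
{
    return (prot & SKETCH_PAGE_RW) != 0;
}

/* W^X is violated when a mapping is writable and executable at once. */
bool sketch_violates_wx(uint64_t prot)
{
    return sketch_is_writable(prot) && sketch_is_executable(prot);
}
```

With these definitions, `sketch_violates_wx(SKETCH_PAGE_KERNEL_EXEC)` is true and `sketch_violates_wx(SKETCH_PAGE_KERNEL_ROX)` is false, which mirrors why the series allocates the hypercall page with read and execute permissions only.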
     

11 Jun, 2020

1 commit

  • Convert various hypervisor vectors to IDTENTRY_SYSVEC:

    - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
    - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
    - Remove the ASM idtentries in 64-bit
    - Remove the BUILD_INTERRUPT entries in 32-bit
    - Remove the old prototypes

    No functional change.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Acked-by: Andy Lutomirski
    Reviewed-by: Wei Liu
    Link: https://lore.kernel.org/r/20200521202119.647997594@linutronix.de

    Thomas Gleixner
     

03 Jun, 2020

2 commits

  • The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Reviewed-by: Michael Kelley [hyperv]
    Acked-by: Gao Xiang [erofs]
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Wei Liu
    Cc: Christian Borntraeger
    Cc: Christophe Leroy
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Johannes Weiner
    Cc: "K. Y. Srinivasan"
    Cc: Laura Abbott
    Cc: Mark Rutland
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Robin Murphy
    Cc: Sakari Ailus
    Cc: Stephen Hemminger
    Cc: Sumit Semwal
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Heiko Carstens
    Cc: Paul Mackerras
    Cc: Vasily Gorbik
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Patch series "decruft the vmalloc API", v2.

    Peter noticed that with some dumb luck you can toast the kernel address
    space with exported vmalloc symbols.

    I used this as an opportunity to decruft the vmalloc.c API and make it
    much more systematic. This also removes any chance to create vmalloc
    mappings outside the designated areas or using executable permissions
    from modules. Besides that it removes more than 300 lines of code.

    This patch (of 29):

    Use the designated helper for allocating executable kernel memory, and
    remove the now unused PAGE_KERNEL_RX define.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Reviewed-by: Michael Kelley
    Acked-by: Wei Liu
    Acked-by: Peter Zijlstra (Intel)
    Cc: Christian Borntraeger
    Cc: Christophe Leroy
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Gao Xiang
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Johannes Weiner
    Cc: "K. Y. Srinivasan"
    Cc: Laura Abbott
    Cc: Mark Rutland
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Robin Murphy
    Cc: Sakari Ailus
    Cc: Stephen Hemminger
    Cc: Sumit Semwal
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Heiko Carstens
    Cc: Vasily Gorbik
    Link: http://lkml.kernel.org/r/20200414131348.444715-1-hch@lst.de
    Link: http://lkml.kernel.org/r/20200414131348.444715-2-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

13 May, 2020

1 commit

  • Errors during hibernation with reenlightenment notifications enabled were
    reported:

    [ 51.730435] PM: hibernation entry
    [ 51.737435] PM: Syncing filesystems ...
    ...
    [ 54.102216] Disabling non-boot CPUs ...
    [ 54.106633] smpboot: CPU 1 is now offline
    [ 54.110006] unchecked MSR access error: WRMSR to 0x40000106 (tried to
    write 0x47c72780000100ee) at rIP: 0xffffffff90062f24
    (native_write_msr+0x4/0x20)
    [ 54.110006] Call Trace:
    [ 54.110006] hv_cpu_die+0xd9/0xf0
    ...

    Normally, hv_cpu_die() just reassigns reenlightenment notifications to some
    other CPU when the CPU receiving them goes offline. Upon hibernation, there
    is no other CPU which is still online so cpumask_any_but(cpu_online_mask)
    returns >= nr_cpu_ids and using it as hv_vp_index index is incorrect.
    Disable the feature when cpumask_any_but() fails.

    Also, as we now disable reenlightenment notifications upon hibernation we
    need to restore them on resume. Check if hv_reenlightenment_cb was
    previously set and restore from hv_resume().

    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Dexuan Cui
    Reviewed-by: Tianyu Lan
    Link: https://lore.kernel.org/r/20200512160153.134467-1-vkuznets@redhat.com
    Signed-off-by: Wei Liu

    Vitaly Kuznetsov
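
A minimal user-space sketch of the check described in the commit above, assuming a 64-bit online mask in place of the kernel's cpumask. All `sketch_*` names are hypothetical; only the idea is taken from the commit message: when no other online CPU exists, the lookup returns a value at or beyond the CPU-id limit, and that value must not be used as an index.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed CPU-id limit for this sketch; the kernel uses nr_cpu_ids. */
#define SKETCH_NR_CPU_IDS 64u

/*
 * Return the first CPU set in online_mask other than 'cpu', or a value
 * >= SKETCH_NR_CPU_IDS when there is none (modelled on cpumask_any_but()).
 */
unsigned int sketch_cpumask_any_but(uint64_t online_mask, unsigned int cpu)
{
    for (unsigned int i = 0; i < SKETCH_NR_CPU_IDS; i++) {
        if (i != cpu && (online_mask & (UINT64_C(1) << i)))
            return i;
    }
    return SKETCH_NR_CPU_IDS;
}

/*
 * Pick a new target CPU for the notifications when 'dying_cpu' goes offline.
 * Returns the new CPU number, or -1 to signal that the feature has to be
 * disabled because no other CPU is online (the hibernation case).
 */
int sketch_pick_reenlightenment_target(uint64_t online_mask,
                                       unsigned int dying_cpu)
{
    unsigned int new_cpu = sketch_cpumask_any_but(online_mask, dying_cpu);

    if (new_cpu >= SKETCH_NR_CPU_IDS)
        return -1; /* do not index any per-CPU table with this value */
    return (int)new_cpu;
}
```

For example, with CPUs 0 and 1 online and CPU 1 going down, the sketch returns 0; with only CPU 0 online and CPU 0 going down, it returns -1.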
     

21 Apr, 2020

1 commit

  • Unlike the other CPUs, CPU0 is never offlined during hibernation, so in the
    resume path, the "new" kernel's VP assist page is not suspended (i.e. not
    disabled), and later when we jump to the "old" kernel, the page is not
    properly re-enabled for CPU0 with the allocated page from the old kernel.

    So far, the VP assist page is used by hv_apic_eoi_write(), and is also
    used in the case of nested virtualization (running KVM atop Hyper-V).

    For hv_apic_eoi_write(), when the page is not properly re-enabled,
    hvp->apic_assist is always 0, so the HV_X64_MSR_EOI MSR is always written.
    This is not ideal with respect to performance, but Hyper-V can still
    correctly handle this according to the Hyper-V spec; nevertheless, Linux
    still must update the Hyper-V hypervisor with the correct VP assist page
    to prevent Hyper-V from writing to the stale page, which causes guest
    memory corruption and consequently may have caused the hangs and triple
    faults seen during non-boot CPUs resume.

    Fix the issue by calling hv_cpu_die()/hv_cpu_init() in the syscore ops.
    Without the fix, hibernation can fail at a rate of 1/300 ~ 1/500.
    With the fix, hibernation can pass a long-haul test of 2000 runs.

    In the case of nested virtualization, disabling/reenabling the assist
    page upon hibernation may be unsafe if there are active L2 guests.
    It looks like KVM should be enhanced to abort the hibernation request if
    there is any active L2 guest.

    Fixes: 05bd330a7fd8 ("x86/hyperv: Suspend/resume the hypercall page for hibernation")
    Cc: stable@vger.kernel.org
    Signed-off-by: Dexuan Cui
    Link: https://lore.kernel.org/r/1587437171-2472-1-git-send-email-decui@microsoft.com
    Signed-off-by: Wei Liu

    Dexuan Cui
     

12 Apr, 2020

1 commit

  • When an oops happens with panic_on_oops unset, the oops
    thread is killed by die() and the system continues to run.
    In that case, the guest should not report crash register
    data to the host, since the system is still running. Check
    panic_on_oops and return directly from hyperv_report_panic()
    when it is called from die() and panic_on_oops is unset.

    Fixes: 7ed4325a44ea ("Drivers: hv: vmbus: Make panic reporting to be more useful")
    Signed-off-by: Tianyu Lan
    Reviewed-by: Michael Kelley
    Link: https://lore.kernel.org/r/20200406155331.2105-7-Tianyu.Lan@microsoft.com
    Signed-off-by: Wei Liu

    Tianyu Lan
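
The early return described in the commit above could look roughly like the following user-space sketch. The `sketch_*` names, the `in_die` flag, and the counter that stands in for the actual crash-register reporting are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's panic_on_oops setting. */
int sketch_panic_on_oops = 0;
/* Counts how many times crash data would have been reported to the host. */
int sketch_crash_reports = 0;

/*
 * Report crash data unless we were called from the die() path while
 * panic_on_oops is unset, in which case the system keeps running.
 */
void sketch_report_panic(bool in_die)
{
    if (in_die && !sketch_panic_on_oops)
        return;

    sketch_crash_reports++;
}
```

A call with `in_die` set and `sketch_panic_on_oops` at 0 leaves the counter untouched; a call from the panic path, or with `sketch_panic_on_oops` set, increments it.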
     

01 Feb, 2020

1 commit

  • For hibernation the hypercall page must be disabled before the hibernation
    image is created so that subsequent hypercall operations fail safely. On
    resume the hypercall page has to be restored and reenabled to ensure proper
    operation of the resumed kernel.

    Implement the necessary suspend/resume callbacks.

    [ tglx: Decrypted changelog ]

    Signed-off-by: Dexuan Cui
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Michael Kelley
    Link: https://lore.kernel.org/r/1578350559-130275-1-git-send-email-decui@microsoft.com

    Dexuan Cui
     

01 Dec, 2019

1 commit

  • Pull Hyper-V updates from Sasha Levin:

    - support for new VMBus protocols (Andrea Parri)

    - hibernation support (Dexuan Cui)

    - latency testing framework (Branden Bonaby)

    - decoupling Hyper-V page size from guest page size (Himadri Pandya)

    * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits)
    Drivers: hv: vmbus: Fix crash handler reset of Hyper-V synic
    drivers/hv: Replace binary semaphore with mutex
    drivers: iommu: hyperv: Make HYPERV_IOMMU only available on x86
    HID: hyperv: Add the support of hibernation
    hv_balloon: Add the support of hibernation
    x86/hyperv: Implement hv_is_hibernation_supported()
    Drivers: hv: balloon: Remove dependencies on guest page size
    Drivers: hv: vmbus: Remove dependencies on guest page size
    x86: hv: Add function to allocate zeroed page for Hyper-V
    Drivers: hv: util: Specify ring buffer size using Hyper-V page size
    Drivers: hv: Specify receive buffer size using Hyper-V page size
    tools: hv: add vmbus testing tool
    drivers: hv: vmbus: Introduce latency testing
    video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
    video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
    hv_netvsc: Add the support of hibernation
    hv_sock: Add the support of hibernation
    video: hyperv_fb: Add the support of hibernation
    scsi: storvsc: Add the support of hibernation
    Drivers: hv: vmbus: Add module parameter to cap the VMBus version
    ...

    Linus Torvalds
     

22 Nov, 2019

2 commits

  • The API will be used by the hv_balloon and hv_vmbus drivers.

    Balloon up/down and hot-add of memory must not be active if the user
    wants the Linux VM to support hibernation, because they are incompatible
    with hibernation according to Hyper-V team, e.g. upon suspend the
    balloon VSP doesn't save any info about the ballooned-out pages (if any);
    so, after Linux resumes, Linux balloon VSC expects that the VSP will
    return the pages if Linux is under memory pressure, but the VSP will
    never do that, since the VSP thinks it never stole the pages from the VM.

    So, if the user wants Linux VM to support hibernation, Linux must forbid
    balloon up/down and hot-add, and the only functionality of the balloon VSC
    driver is reporting the VM's memory pressure to the host.

    Ideally, when Linux detects that the user wants it to support hibernation,
    the balloon VSC should tell the VSP that it does not support ballooning
    and hot-add. However, the current version of the VSP requires the VSC
    should support these capabilities, otherwise the capability negotiation
    fails and the VSC can not load at all, so with the later changes to the
    VSC driver, Linux VM still reports to the VSP that the VSC supports these
    capabilities, but the VSC ignores the VSP's requests of balloon up/down
    and hot add, and reports an error to the VSP, when applicable. BTW, in
    the future the balloon VSP driver will allow the VSC to not support the
    capabilities of balloon up/down and hot add.

    The ACPI S4 state is not a must for hibernation to work, because Linux is
    able to hibernate as long as the system can shut down. However in practice
    we decide to artificially use the presence of the virtual ACPI S4 state as
    an indicator of the user's intent of using hibernation, because Linux VM
    must find a way to know if the user wants to use the hibernation feature
    or not.

    By default, Hyper-V does not enable the virtual ACPI S4 state; on recent
    Hyper-V hosts (e.g. RS5, 19H1), the administrator is able to enable the
    state for a VM by WMI commands.

    Once all the vmbus and VSC patches for the hibernation feature are
    accepted, an extra patch will be submitted to forbid hibernation if the
    virtual ACPI S4 state is absent, i.e. hv_is_hibernation_supported() is
    false.

    Signed-off-by: Dexuan Cui
    Reviewed-by: Michael Kelley
    Acked-by: Thomas Gleixner
    Signed-off-by: Sasha Levin

    Dexuan Cui
     
  • Hyper-V assumes page size to be 4K. While this assumption holds true on
    x86 architecture, it might not be true for ARM64 architecture. Hence
    define hyper-v specific function to allocate a zeroed page which can
    have a different implementation on ARM64 architecture to handle the
    conflict between hyper-v's assumed page size and actual guest page size.

    Signed-off-by: Himadri Pandya
    Reviewed-by: Michael Kelley
    Signed-off-by: Sasha Levin

    Himadri Pandya
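
As a rough user-space analogue of the idea in the commit above, the sketch below allocates a zeroed block with the 4K size and alignment that Hyper-V assumes, independent of whatever page size the guest uses. The `sketch_*` names are hypothetical, and `aligned_alloc()` stands in for the kernel's page allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Page size assumed by Hyper-V according to the commit message. */
#define SKETCH_HV_HYP_PAGE_SIZE 4096u

/* Allocate one zero-filled, 4K-aligned block; returns NULL on failure. */
void *sketch_alloc_hyperv_zeroed_page(void)
{
    void *page = aligned_alloc(SKETCH_HV_HYP_PAGE_SIZE,
                               SKETCH_HV_HYP_PAGE_SIZE);

    if (page)
        memset(page, 0, SKETCH_HV_HYP_PAGE_SIZE);
    return page;
}

/* Release a block obtained from sketch_alloc_hyperv_zeroed_page(). */
void sketch_free_hyperv_page(void *page)
{
    free(page);
}
```

Keeping the allocation behind one helper means an architecture with a larger guest page size can swap in a different implementation without touching the callers.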
     

15 Nov, 2019

2 commits

  • Hyper-V has historically initialized stimer-based clockevents late in the
    process of onlining a CPU because clockevents depend on stimer
    interrupts. In the original Hyper-V design, stimer interrupts generate a
    VMbus message, so the VMbus machinery must be running first, and VMbus
    can't be initialized until relatively late. On x86/64, LAPIC timer based
    clockevents are used during early initialization before VMbus and
    stimer-based clockevents are ready, and again during CPU offlining after
    the stimer clockevents have been shut down.

    Unfortunately, this design creates problems when offlining CPUs for
    hibernation or other purposes. stimer-based clockevents are shut down
    relatively early in the offlining process, so clockevents_unbind_device()
    must be used to fall back to the LAPIC-based clockevents for the remainder
    of the offlining process. Furthermore, the late initialization and early
    shutdown of stimer-based clockevents doesn't work well on ARM64 since there
    is no other timer like the LAPIC to fall back to. So CPU onlining and
    offlining don't work properly.

    Fix this by recognizing that stimer Direct Mode is the normal path for
    newer versions of Hyper-V on x86/64, and the only path on other
    architectures. With stimer Direct Mode, stimer interrupts don't require any
    VMbus machinery. stimer clockevents can be initialized and shut down
    consistent with how it is done for other clockevent devices. While the old
    VMbus-based stimer interrupts must still be supported for backward
    compatibility on x86, that mode of operation can be treated as legacy.

    So add a new Hyper-V stimer entry in the CPU hotplug state list, and use
    that new state when in Direct Mode. Update the Hyper-V clocksource driver
    to allocate and initialize stimer clockevents earlier during boot. Update
    Hyper-V initialization and the VMbus driver to use this new design. As a
    result, the LAPIC timer is no longer used during boot or CPU
    onlining/offlining and clockevents_unbind_device() is not called. But
    retain the old design as a legacy implementation for older versions of
    Hyper-V that don't support Direct Mode.

    Signed-off-by: Michael Kelley
    Signed-off-by: Thomas Gleixner
    Tested-by: Dexuan Cui
    Reviewed-by: Dexuan Cui
    Link: https://lkml.kernel.org/r/1573607467-9456-1-git-send-email-mikelley@microsoft.com

    Michael Kelley
     
  • Pick up upstream fixes to avoid conflicts.

    Thomas Gleixner
     

12 Nov, 2019

1 commit

  • When sending an IPI to a single CPU there is no need to deal with cpumasks.

    On a 2-CPU guest on WS2019, a minor improvement (about 3%, 8043 -> 7761 CPU
    cycles) can be seen with a smp_call_function_single() loop benchmark. The
    optimization, however, is tiny and straightforward. Also, send_ipi_one() is
    important for PV spinlock kick.

    Switching to the regular APIC IPI send for the CPU > 64 case does not make
    sense as it is twice as expensive (12650 CPU cycles for a __send_ipi_mask_ex()
    call, 26000 for orig_apic.send_IPI(cpu, vector)).

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Michael Kelley
    Reviewed-by: Roman Kagan
    Link: https://lkml.kernel.org/r/20191027151938.7296-1-vkuznets@redhat.com

    Vitaly Kuznetsov
     

15 Oct, 2019

1 commit

  • Now that there's a Hyper-V IOMMU driver, Linux can switch to x2apic mode
    when supported by the vcpus.

    However, the apic access functions for Hyper-V enlightened apic assume
    xapic mode only.

    As a result, Linux fails to bring up secondary cpus when run as a guest
    in QEMU/KVM with both hv_apic and x2apic enabled.

    According to Michael Kelley, when in x2apic mode, the Hyper-V synthetic
    apic MSRs behave exactly the same as the corresponding architectural
    x2apic MSRs, so there's no need to override the apic accessors. The
    only exception is hv_apic_eoi_write, which benefits from lazy EOI when
    available; however, its implementation works for both xapic and x2apic
    modes.

    Fixes: 29217a474683 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver")
    Fixes: 6b48cb5f8347 ("X86/Hyper-V: Enlighten APIC access")
    Suggested-by: Michael Kelley
    Signed-off-by: Roman Kagan
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Vitaly Kuznetsov
    Reviewed-by: Michael Kelley
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20191010123258.16919-1-rkagan@virtuozzo.com

    Roman Kagan
     

18 Sep, 2019

1 commit

  • Pull core timer updates from Thomas Gleixner:
    "Timers and timekeeping updates:

    - A large overhaul of the posix CPU timer code which is a preparation
    for moving the CPU timer expiry out into task work so it can be
    properly accounted on the task/process.

    An update to the bogus permission checks will come later during the
    merge window as feedback was not complete before heading off for
    travel.

    - Switch the timerqueue code to use cached rbtrees and get rid of the
    homebrewed caching of the leftmost node.

    - Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
    single function

    - Implement the separation of hrtimers to be forced to expire in hard
    interrupt context even when PREEMPT_RT is enabled and mark the
    affected timers accordingly.

    - Implement a mechanism for hrtimers and the timer wheel to protect
    RT against priority inversion and live lock issues when a (hr)timer
    which should be canceled is currently executing the callback.
    Instead of infinitely spinning, the task which tries to cancel the
    timer blocks on a per cpu base expiry lock which is held and
    released by the (hr)timer expiry code.

    - Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
    resulting in faster access to timekeeping functions.

    - Updates to various clocksource/clockevent drivers and their device
    tree bindings.

    - The usual small improvements all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
    posix-cpu-timers: Fix permission check regression
    posix-cpu-timers: Always clear head pointer on dequeue
    hrtimer: Add a missing bracket and hide `migration_base' on !SMP
    posix-cpu-timers: Make expiry_active check actually work correctly
    posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
    tick: Mark sched_timer to expire in hard interrupt context
    hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
    x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
    posix-cpu-timers: Utilize timerqueue for storage
    posix-cpu-timers: Move state tracking to struct posix_cputimers
    posix-cpu-timers: Deduplicate rlimit handling
    posix-cpu-timers: Remove pointless comparisons
    posix-cpu-timers: Get rid of 64bit divisions
    posix-cpu-timers: Consolidate timer expiry further
    posix-cpu-timers: Get rid of zero checks
    rlimit: Rewrite non-sensical RLIMIT_CPU comment
    posix-cpu-timers: Respect INFINITY for hard RTTIME limit
    posix-cpu-timers: Switch thread group sampling to array
    posix-cpu-timers: Restructure expiry array
    posix-cpu-timers: Remove cputime_expires
    ...

    Linus Torvalds
     

17 Sep, 2019

1 commit


03 Sep, 2019

1 commit

  • When the 'start' parameter is >= 0xFF000000 on 32-bit
    systems, or >= 0xFFFFFFFF'FF000000 on 64-bit systems,
    fill_gva_list() gets into an infinite loop.

    With such inputs, 'cur' overflows after adding HV_TLB_FLUSH_UNIT
    and always compares as less than end. Memory is filled with
    guest virtual addresses until the system crashes.

    Fix this by never incrementing 'cur' to be larger than 'end'.

    Reported-by: Jong Hyun Park
    Signed-off-by: Tianyu Lan
    Reviewed-by: Michael Kelley
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 2ffd9e33ce4a ("x86/hyper-v: Use hypercall for remote TLB flush")
    Signed-off-by: Ingo Molnar

    Tianyu Lan
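
The overflow and its fix described in the commit above can be sketched in user space as follows. This is an illustration under assumptions, not the kernel's code: the `sketch_*` names, the `max_entries` bound, and the entry layout (page-aligned address in the upper bits, number of additional pages in the low 12 bits) are chosen for the example. What it takes from the commit message is the rule that `cur` is never advanced past `end`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT     12
#define SKETCH_PAGE_SIZE      (UINT64_C(1) << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK      (~(SKETCH_PAGE_SIZE - 1))
/* Assumed flush unit: the most one list entry can cover (4096 pages). */
#define SKETCH_TLB_FLUSH_UNIT (4096 * SKETCH_PAGE_SIZE)

/*
 * Fill gva_list with entries covering [start, end). 'cur' only advances by
 * a full unit when at least a full unit remains; otherwise it is set to
 * 'end'. It therefore cannot wrap around and compare as less than 'end'.
 * Returns the number of entries written (at most max_entries).
 */
size_t sketch_fill_gva_list(uint64_t *gva_list, size_t max_entries,
                            uint64_t start, uint64_t end)
{
    size_t n = 0;
    uint64_t cur = start;

    do {
        uint64_t diff;

        if (n == max_entries)
            break;

        diff = end > cur ? end - cur : 0;
        gva_list[n] = cur & SKETCH_PAGE_MASK;

        if (diff >= SKETCH_TLB_FLUSH_UNIT) {
            /* Full unit: mark the maximum number of additional pages. */
            gva_list[n] |= ~SKETCH_PAGE_MASK;
            cur += SKETCH_TLB_FLUSH_UNIT;
        } else if (diff) {
            /* Partial unit: encode the remaining pages and stop at end. */
            gva_list[n] |= (diff - 1) >> SKETCH_PAGE_SHIFT;
            cur = end;
        }
        n++;
    } while (cur < end);

    return n;
}
```

With `start` at 0xFFFFFFFFFF000000 on a 64-bit system, adding a full flush unit would wrap `cur` to 0; in the sketch the remaining range is smaller than one unit, so a single entry is written and the loop ends.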
     

23 Aug, 2019

1 commit

  • Hyper-V guests use the default native_sched_clock() in
    pv_ops.time.sched_clock on x86. But native_sched_clock() directly uses the
    raw TSC value, which can be discontinuous in a Hyper-V VM.

    Add the generic hv_setup_sched_clock() to set the sched clock function
    appropriately. On x86, this sets pv_ops.time.sched_clock to read the
    Hyper-V reference TSC value that is scaled and adjusted to be continuous.

    Also move the Hyper-V reference TSC initialization much earlier in the boot
    process so no discontinuity is observed when pv_ops.time.sched_clock
    calculates its offset.

    [ tglx: Folded build fix ]

    Signed-off-by: Tianyu Lan
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Michael Kelley
    Link: https://lkml.kernel.org/r/20190814123216.32245-3-Tianyu.Lan@microsoft.com

    Tianyu Lan
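
To make "scaled and adjusted" in the commit above concrete, here is a hedged sketch of how a reference time can be derived from a raw TSC value with a scale and an offset. The formula (multiply, keep the upper 64 bits of the 128-bit product, add the offset) reflects my reading of the Hyper-V reference TSC page description and is not stated in the commit message; the function name is hypothetical, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a raw TSC reading into a reference time:
 *   ((tsc * scale) >> 64) + offset
 * The 128-bit intermediate keeps the multiplication from overflowing.
 */
uint64_t sketch_hv_ref_time(uint64_t tsc, uint64_t scale, int64_t offset)
{
    unsigned __int128 product = (unsigned __int128)tsc * scale;

    return (uint64_t)(product >> 64) + (uint64_t)offset;
}
```

Because the scale and offset are supplied by the hypervisor, the resulting value can stay continuous even when the raw TSC does not, which is the property the commit relies on for sched_clock.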
     

22 Jul, 2019

1 commit

  • Introduce two new functions, hv_alloc_hyperv_page() and
    hv_free_hyperv_page(), to allocate/deallocate memory with the size and
    alignment that Hyper-V expects as a page. Although currently they are not
    used, they are ready to be used to allocate/deallocate memory on x86 when
    their ARM64 counterparts are implemented, keeping symmetry between
    architectures with potentially different guest page sizes.

    Signed-off-by: Maya Nakamura
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Michael Kelley
    Reviewed-by: Vitaly Kuznetsov
    Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.1906272334560.32342@nanos.tec.linutronix.de/
    Link: https://lore.kernel.org/lkml/87muindr9c.fsf@vitty.brq.redhat.com/
    Link: https://lkml.kernel.org/r/706b2e71eb3e587b5f8801e50f090fae2a00e35d.1562916939.git.m.maya.nakamura@gmail.com

    Maya Nakamura
     

19 Jul, 2019

1 commit

  • The VP ASSIST PAGE is an "overlay" page (see Hyper-V TLFS's Section
    5.2.1 "GPA Overlay Pages" for the details) and here is an excerpt:

    "The hypervisor defines several special pages that "overlay" the guest's
    Guest Physical Addresses (GPA) space. Overlays are addressed GPA but are
    not included in the normal GPA map maintained internally by the hypervisor.
    Conceptually, they exist in a separate map that overlays the GPA map.

    If a page within the GPA space is overlaid, any SPA page mapped to the
    GPA page is effectively "obscured" and generally unreachable by the
    virtual processor through processor memory accesses.

    If an overlay page is disabled, the underlying GPA page is "uncovered",
    and an existing mapping becomes accessible to the guest."

    SPA = System Physical Address = the final real physical address.

    When a CPU (e.g. CPU1) is onlined, hv_cpu_init() allocates the VP ASSIST
    PAGE and enables the EOI optimization for this CPU by writing the MSR
    HV_X64_MSR_VP_ASSIST_PAGE. From now on, hvp->apic_assist belongs to the
    special SPA page, and this CPU *always* uses hvp->apic_assist (which is
    shared with the hypervisor) to decide if it needs to write the EOI MSR.

    When a CPU is offlined then on the outgoing CPU:
    1. hv_cpu_die() disables the EOI optimization for this CPU, and from
    now on hvp->apic_assist belongs to the original "normal" SPA page;
    2. the remaining work of stopping this CPU is done
    3. this CPU is completely stopped.

    Between 1 and 3, this CPU can still receive interrupts (e.g. reschedule
    IPIs from CPU0, and Local APIC timer interrupts), and this CPU *must* write
    the EOI MSR for every interrupt received, otherwise the hypervisor may not
    deliver further interrupts, which may be needed to completely stop the CPU.

    So, after the EOI optimization is disabled in hv_cpu_die(), it's required
    that the hvp->apic_assist's bit0 is zero, which is not guaranteed by the
    current allocation mode because it lacks __GFP_ZERO. As a consequence the
    bit might be set and interrupt handling would not write the EOI MSR causing
    interrupt delivery to become stuck.

    Add the missing __GFP_ZERO to the allocation.

    Note 1: after the "normal" SPA page is allocated and zeroed out, neither the
    hypervisor nor the guest writes into the page, so the page remains with
    zeros.

    Note 2: see Section 10.3.5 "EOI Assist" for the details of the EOI
    optimization. When the optimization is enabled, the guest can still write
    the EOI MSR register irrespective of the "No EOI required" value, but
    that's slower than the optimized assist based variant.

    Fixes: ba696429d290 ("x86/hyper-v: Implement EOI assist")
    Signed-off-by: Dexuan Cui
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/PU1P153MB0169B716A637FABF07433C04BFCB0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM

    Dexuan Cui
     

03 Jul, 2019

1 commit

  • Continue consolidating Hyper-V clock and timer code into an ISA
    independent Hyper-V clocksource driver.

    Move the existing clocksource code under drivers/hv and arch/x86 to the new
    clocksource driver while separating out the ISA dependencies. Update
    Hyper-V initialization to call initialization and cleanup routines since
    the Hyper-V synthetic clock is not independently enumerated in ACPI.

    Update Hyper-V clocksource users in KVM and VDSO to get definitions from
    the new include file.

    No behavior is changed and no new functionality is added.

    Suggested-by: Marc Zyngier
    Signed-off-by: Michael Kelley
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Vitaly Kuznetsov
    Cc: "bp@alien8.de"
    Cc: "will.deacon@arm.com"
    Cc: "catalin.marinas@arm.com"
    Cc: "mark.rutland@arm.com"
    Cc: "linux-arm-kernel@lists.infradead.org"
    Cc: "gregkh@linuxfoundation.org"
    Cc: "linux-hyperv@vger.kernel.org"
    Cc: "olaf@aepfle.de"
    Cc: "apw@canonical.com"
    Cc: "jasowang@redhat.com"
    Cc: "marcelo.cerri@canonical.com"
    Cc: Sunil Muthuswamy
    Cc: KY Srinivasan
    Cc: "sashal@kernel.org"
    Cc: "vincenzo.frascino@arm.com"
    Cc: "linux-arch@vger.kernel.org"
    Cc: "linux-mips@vger.kernel.org"
    Cc: "linux-kselftest@vger.kernel.org"
    Cc: "arnd@arndb.de"
    Cc: "linux@armlinux.org.uk"
    Cc: "ralf@linux-mips.org"
    Cc: "paul.burton@mips.com"
    Cc: "daniel.lezcano@linaro.org"
    Cc: "salyzyn@android.com"
    Cc: "pcc@google.com"
    Cc: "shuah@kernel.org"
    Cc: "0x7f454c46@gmail.com"
    Cc: "linux@rasmusvillemoes.dk"
    Cc: "huw@codeweavers.com"
    Cc: "sfr@canb.auug.org.au"
    Cc: "pbonzini@redhat.com"
    Cc: "rkrcmar@redhat.com"
    Cc: "kvm@vger.kernel.org"
    Link: https://lkml.kernel.org/r/1561955054-1838-3-git-send-email-mikelley@microsoft.com

    Michael Kelley
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose good title or non infringement see
    the gnu general public license for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 9 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141900.459653302@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


19 Apr, 2019

1 commit

  • This function is referenced from assembler, so it needs to be marked
    visible for LTO.

    Fixes: 3a025de64bf8 ("x86/hyperv: Enable PV qspinlock for Hyper-V")
    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Yi Sun
    Cc: kys@microsoft.com
    Cc: haiyangz@microsoft.com
    Link: https://lkml.kernel.org/r/20190330004743.29541-6-andi@firstfloor.org

    Andi Kleen
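
    The visibility annotation described in this commit can be illustrated
    with a minimal sketch. The macro name sketch_visible and the function
    asm_only_callee are hypothetical; the sketch assumes GCC's
    externally_visible attribute and falls back to "used" on clang.

```c
#include <assert.h>

/*
 * Assumption: sketch_visible is a hypothetical stand-in for the kernel's
 * __visible macro. GCC understands externally_visible; on clang this
 * sketch falls back to the "used" attribute.
 */
#if defined(__GNUC__) && !defined(__clang__)
#define sketch_visible __attribute__((externally_visible))
#else
#define sketch_visible __attribute__((used))
#endif

/*
 * Hypothetical function that, in a real build, would be called only from
 * assembly. Without the annotation, whole-program optimization may see no
 * C caller and drop or localize the symbol.
 */
sketch_visible int asm_only_callee(int x)
{
	return x + 1;
}

/* Self-check: the annotation does not change how the function behaves. */
static void visible_self_check(void)
{
	assert(asm_only_callee(41) == 42);
}
```

    The annotation only affects what the optimizer may assume about
    callers; the generated function body is unchanged.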
     

16 Apr, 2019

1 commit

  • Hyper-V TLFS suggests an optimization to avoid imminent VMExit on EOI:
    "The OS performs an EOI by atomically writing zero to the EOI Assist field
    of the virtual VP assist page and checking whether the "No EOI required"
    field was previously zero. If it was, the OS must write to the
    HV_X64_APIC_EOI MSR thereby triggering an intercept into the hypervisor."

    Implement the optimization in Linux.

    Tested-by: Long Li
    Signed-off-by: Vitaly Kuznetsov
    Cc: Borislav Petkov
    Cc: Haiyang Zhang
    Cc: K. Y. Srinivasan
    Cc: Linus Torvalds
    Cc: Michael Kelley (EOSG)
    Cc: Peter Zijlstra
    Cc: Sasha Levin
    Cc: Simon Xiao
    Cc: Stephen Hemminger
    Cc: Thomas Gleixner
    Cc: linux-hyperv@vger.kernel.org
    Link: http://lkml.kernel.org/r/20190403170309.4107-1-vkuznets@redhat.com
    Signed-off-by: Ingo Molnar

    Vitaly Kuznetsov
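
    The EOI assist check quoted from the TLFS above can be sketched in user
    space with C11 atomics. This is not the kernel implementation: the
    struct, the MSR stub, and the assumption that "No EOI required" is
    bit 0 of the assist field are all illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NO_EOI_REQUIRED 0x1u  /* assumed: bit 0 of the assist field */

/* Hypothetical, reduced model of the per-CPU VP assist page. */
struct sketch_vp_assist_page {
	_Atomic uint32_t apic_assist;
};

static unsigned int sketch_eoi_msr_writes;  /* counts simulated MSR writes */

/* Stand-in for the EOI MSR write that intercepts into the hypervisor. */
static void sketch_write_eoi_msr(void)
{
	sketch_eoi_msr_writes++;
}

/* Returns true if the MSR write was needed, false if it was skipped. */
static bool sketch_eoi(struct sketch_vp_assist_page *page)
{
	/* Atomically clear the field and inspect its previous value. */
	if (page &&
	    (atomic_exchange(&page->apic_assist, 0) & SKETCH_NO_EOI_REQUIRED))
		return false;

	sketch_write_eoi_msr();
	return true;
}

static void eoi_self_check(void)
{
	struct sketch_vp_assist_page page;

	atomic_init(&page.apic_assist, SKETCH_NO_EOI_REQUIRED);
	sketch_eoi_msr_writes = 0;

	assert(!sketch_eoi(&page));           /* hint set: write skipped */
	assert(sketch_eoi_msr_writes == 0);
	assert(atomic_load(&page.apic_assist) == 0);

	assert(sketch_eoi(&page));            /* hint clear: write needed */
	assert(sketch_eoi_msr_writes == 1);

	assert(sketch_eoi(NULL));             /* no assist page: always write */
	assert(sketch_eoi_msr_writes == 2);
}
```

    The exchange matters because the hypervisor may update the field
    concurrently; a separate read followed by a write could lose a hint.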
     

21 Mar, 2019

1 commit

  • The page allocation in hv_cpu_init() can fail, but the code does not
    have a check for that.

    Add a check and return -ENOMEM when the allocation fails.

    [ tglx: Massaged changelog ]

    Signed-off-by: Kangjie Lu
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Mukesh Ojha
    Acked-by: "K. Y. Srinivasan"
    Cc: pakki001@umn.edu
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Sasha Levin
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: linux-hyperv@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190314054651.1315-1-kjlu@umn.edu

    Kangjie Lu
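
    The shape of the added check can be shown with a small user-space
    sketch. The function names, the injected allocator, and the page size
    are assumptions for illustration; only the "check the allocation and
    return -ENOMEM" pattern comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for the sketch */

/* Allocator signature, so a failing allocator can be injected. */
typedef void *(*sketch_alloc_fn)(size_t size);

/* Hypothetical allocator that always fails. */
static void *sketch_failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Reduced model of a per-CPU init step: allocate one page and publish it
 * through *input_arg. Returns 0 on success or -ENOMEM on failure.
 */
static int sketch_cpu_init(void **input_arg, sketch_alloc_fn alloc)
{
	void *pg = alloc(SKETCH_PAGE_SIZE);

	if (!pg)
		return -ENOMEM;  /* the kind of check the commit adds */

	*input_arg = pg;
	return 0;
}

static void cpu_init_self_check(void)
{
	void *arg = NULL;

	assert(sketch_cpu_init(&arg, sketch_failing_alloc) == -ENOMEM);
	assert(arg == NULL);  /* nothing is published on failure */

	assert(sketch_cpu_init(&arg, malloc) == 0);
	assert(arg != NULL);
	free(arg);
}
```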
     

11 Mar, 2019

1 commit

  • Pull x86 fixes from Thomas Gleixner:
    "A set of fixes for x86:

    - Make the unwinder more robust when it encounters a NULL pointer
    call, so the backtrace becomes more useful

    - Fix the bogus ORC unwind table alignment

    - Prevent kernel panic during kexec on HyperV caused by a cleared but
    not disabled hypercall page.

    - Remove the now pointless stacksize increase for KASAN_EXTRA, as
    KASAN_EXTRA is gone.

    - Remove unused variables from the x86 memory management code"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/hyperv: Fix kernel panic when kexec on HyperV
    x86/mm: Remove unused variable 'old_pte'
    x86/mm: Remove unused variable 'cpu'
    Revert "x86_64: Increase stack size for KASAN_EXTRA"
    x86/unwind: Add hardcoded ORC entry for NULL
    x86/unwind: Handle NULL pointer calls better in frame unwinder
    x86/unwind/orc: Fix ORC unwind table alignment

    Linus Torvalds
     

07 Mar, 2019

1 commit

  • After commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments"),
    kexec fails with a kernel panic:

    kexec_core: Starting new kernel
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
    RIP: 0010:0xffffc9000001d000

    Call Trace:
    ? __send_ipi_mask+0x1c6/0x2d0
    ? hv_send_ipi_mask_allbutself+0x6d/0xb0
    ? mp_save_irq+0x70/0x70
    ? __ioapic_read_entry+0x32/0x50
    ? ioapic_read_entry+0x39/0x50
    ? clear_IO_APIC_pin+0xb8/0x110
    ? native_stop_other_cpus+0x6e/0x170
    ? native_machine_shutdown+0x22/0x40
    ? kernel_kexec+0x136/0x156

    That happens if hypercall based IPIs are used because the hypercall page is
    reset very early upon kexec reboot, but kexec sends IPIs to stop CPUs,
    which invokes the hypercall and dereferences the unusable page.

    To fix this, reset hv_hypercall_pg to NULL before the page is reset. This
    avoids any misuse: IPI sending falls back to the non-hypercall based
    method. This only happens on kexec / kdump, so just setting the pointer to
    NULL is good enough.

    Fixes: 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments")
    Signed-off-by: Kairui Song
    Signed-off-by: Thomas Gleixner
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Sasha Levin
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Vitaly Kuznetsov
    Cc: Dave Young
    Cc: devel@linuxdriverproject.org
    Link: https://lkml.kernel.org/r/20190306111827.14131-1-kasong@redhat.com

    Kairui Song
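
    The ordering this fix relies on can be sketched as a small user-space
    model. Every name below is hypothetical; sketch_hypercall_pg merely
    stands in for hv_hypercall_pg, and the counters replace real IPI
    delivery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*sketch_hypercall_fn)(int cpu);

static sketch_hypercall_fn sketch_hypercall_pg; /* stands in for hv_hypercall_pg */
static bool sketch_page_enabled;                /* models the hypercall page state */
static unsigned int sketch_hypercall_ipis;
static unsigned int sketch_fallback_ipis;

/* Models an IPI sent through the hypercall page. */
static bool sketch_hypercall_ipi(int cpu)
{
	(void)cpu;
	assert(sketch_page_enabled); /* using a reset page is the original bug */
	sketch_hypercall_ipis++;
	return true;
}

/* Use the hypercall path if it is available, else the fallback method. */
static void sketch_send_ipi(int cpu)
{
	sketch_hypercall_fn fn = sketch_hypercall_pg;

	if (fn && fn(cpu))
		return;
	sketch_fallback_ipis++;
}

static void sketch_setup(void)
{
	sketch_page_enabled = true;
	sketch_hypercall_pg = sketch_hypercall_ipi;
}

/* Cleanup before kexec: clear the pointer first, then reset the page. */
static void sketch_cleanup_for_kexec(void)
{
	sketch_hypercall_pg = NULL;
	sketch_page_enabled = false;
}

static void kexec_self_check(void)
{
	sketch_hypercall_ipis = 0;
	sketch_fallback_ipis = 0;

	sketch_setup();
	sketch_send_ipi(1);
	assert(sketch_hypercall_ipis == 1 && sketch_fallback_ipis == 0);

	sketch_cleanup_for_kexec();
	sketch_send_ipi(1); /* must not touch the reset page */
	assert(sketch_hypercall_ipis == 1 && sketch_fallback_ipis == 1);
}
```

    If the page were reset while the pointer still referenced it, the
    assertion inside sketch_hypercall_ipi would fire, which mirrors the
    panic shown in the trace.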
     

01 Mar, 2019

1 commit


21 Dec, 2018

1 commit


24 Oct, 2018

1 commit

  • Pull x86 paravirt updates from Ingo Molnar:
    "Two main changes:

    - Remove no longer used parts of the paravirt infrastructure and put
    large quantities of paravirt ops under a new config option
    PARAVIRT_XXL=y, which is selected by XEN_PV only. (Joergen Gross)

    - Enable PV spinlocks on Hyperv (Yi Sun)"

    * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/hyperv: Enable PV qspinlock for Hyper-V
    x86/hyperv: Add GUEST_IDLE_MSR support
    x86/paravirt: Clean up native_patch()
    x86/paravirt: Prevent redefinition of SAVE_FLAGS macro
    x86/xen: Make xen_reservation_lock static
    x86/paravirt: Remove unneeded mmu related paravirt ops bits
    x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella
    x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella
    x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella
    x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrella
    x86/paravirt: Introduce new config option PARAVIRT_XXL
    x86/paravirt: Remove unused paravirt bits
    x86/paravirt: Use a single ops structure
    x86/paravirt: Remove clobbers from struct paravirt_patch_site
    x86/paravirt: Remove clobbers parameter from paravirt patch functions
    x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() static
    x86/xen: Add SPDX identifier in arch/x86/xen files
    x86/xen: Link platform-pci-unplug.o only if CONFIG_XEN_PVHVM
    x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c
    x86/xen: Move pv irq related functions under CONFIG_XEN_PV umbrella

    Linus Torvalds
     

09 Oct, 2018

1 commit

  • Implement the required wait and kick callbacks to support PV spinlocks in
    Hyper-V guests.

    [ tglx: Document the requirement for disabling interrupts in the wait()
    callback. Remove goto and unnecessary includes. Add prototype
    for hv_vcpu_is_preempted(). Adapted to pending paravirt changes. ]

    Signed-off-by: Yi Sun
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Juergen Gross
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Michael Kelley (EOSG)
    Cc: chao.p.peng@intel.com
    Cc: chao.gao@intel.com
    Cc: isaku.yamahata@intel.com
    Cc: tianyu.lan@microsoft.com
    Link: https://lkml.kernel.org/r/1538987374-51217-3-git-send-email-yi.y.sun@linux.intel.com

    Yi Sun
     

28 Sep, 2018

2 commits

  • Remove an include that is not needed.

    Signed-off-by: YueHaibing
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Michael Kelley
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: "H. Peter Anvin"
    Link: https://lkml.kernel.org/r/1537690822-97455-1-git-send-email-yuehaibing@huawei.com

    YueHaibing
     
  • A Generation-2 Linux VM on Hyper-V doesn't have the legacy PCI bus, and
    users always see the scary warning, which is actually harmless.

    Suppress it.

    Signed-off-by: Dexuan Cui
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Michael Kelley
    Cc: "H. Peter Anvin"
    Cc: KY Srinivasan
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: "devel@linuxdriverproject.org"
    Cc: Olaf Aepfle
    Cc: Andy Whitcroft
    Cc: Jason Wang
    Cc: Vitaly Kuznetsov
    Cc: Marcelo Cerri
    Cc: Josh Poulson
    Link: https://lkml.kernel.org/r/KU1P153MB0166D977DC930996C4BF538ABF1D0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM

    Dexuan Cui