Eric Lee / smarc-fsl-linux-kernel

12 Sep, 2024

1 commit

6ed45748c Drivers: hv: vmbus: Fix rescind handling in uio_hv_generic ... Browse Code »

commit 6fd28941447bf2c8ca0f26fda612a1cabc41663f upstream.

Rescind offer handling relies on rescind callbacks for some of the
resources cleanup, if they are registered. It does not unregister
vmbus device for the primary channel closure, when callback is
registered. Without it, next onoffer does not come, rescind flag
remains set and device goes to unusable state.

Add logic to unregister vmbus for the primary channel in rescind callback
to ensure channel removal and relid release, and to ensure that next
onoffer can be received and handled properly.

Cc: stable@vger.kernel.org
Fixes: ca3cda6fcf1e ("uio_hv_generic: add rescind support")
Signed-off-by: Naman Jain
Reviewed-by: Saurabh Sengar
Link: https://lore.kernel.org/r/20240829071312.1595-3-namjain@linux.microsoft.com
Signed-off-by: Greg Kroah-Hartman

Naman Jain
2024-09-12 17:11:41 +0800

17 May, 2024

3 commits

82f9e213b Drivers: hv: vmbus: Don't free ring buffers that couldn't be re-encrypted ... Browse Code »

[ Upstream commit 30d18df6567be09c1433e81993e35e3da573ac48 ]

In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.

The VMBus ring buffer code could free decrypted/shared pages if
set_memory_decrypted() fails. Check the decrypted field in the struct
vmbus_gpadl for the ring buffers to decide whether to free the memory.

Signed-off-by: Michael Kelley
Reviewed-by: Kuppuswamy Sathyanarayanan
Acked-by: Kirill A. Shutemov
Link: https://lore.kernel.org/r/20240311161558.1310-6-mhklinux@outlook.com
Signed-off-by: Wei Liu
Message-ID:
Signed-off-by: Sasha Levin

Michael Kelley
2024-05-17 18:02:17 +0800
8e62341f5 Drivers: hv: vmbus: Track decrypted status in vmbus_gpadl ... Browse Code »

[ Upstream commit 211f514ebf1ef5de37b1cf6df9d28a56cfd242ca ]

In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.

In order to make sure callers of vmbus_establish_gpadl() and
vmbus_teardown_gpadl() don't return decrypted/shared pages to
allocators, add a field in struct vmbus_gpadl to keep track of the
decryption status of the buffers. This will allow the callers to
know if they should free or leak the pages.

Signed-off-by: Rick Edgecombe
Signed-off-by: Michael Kelley
Reviewed-by: Kuppuswamy Sathyanarayanan
Acked-by: Kirill A. Shutemov
Link: https://lore.kernel.org/r/20240311161558.1310-3-mhklinux@outlook.com
Signed-off-by: Wei Liu
Message-ID:
Signed-off-by: Sasha Levin

Rick Edgecombe
2024-05-17 18:02:17 +0800
6123a4e8e Drivers: hv: vmbus: Leak pages if set_memory_encrypted() fails ... Browse Code »

[ Upstream commit 03f5a999adba062456c8c818a683beb1b498983a ]

In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.

VMBus code could free decrypted pages if set_memory_encrypted()/decrypted()
fails. Leak the pages if this happens.

Signed-off-by: Rick Edgecombe
Signed-off-by: Michael Kelley
Reviewed-by: Kuppuswamy Sathyanarayanan
Acked-by: Kirill A. Shutemov
Link: https://lore.kernel.org/r/20240311161558.1310-2-mhklinux@outlook.com
Signed-off-by: Wei Liu
Message-ID:
Signed-off-by: Sasha Levin

Rick Edgecombe
2024-05-17 18:02:17 +0800

27 Mar, 2024

1 commit

3d6af30f1 x86/hyperv: Use per cpu initial stack for vtl context ... Browse Code »

[ Upstream commit 2b4b90e053a29057fb05ba81acce26bddce8d404 ]

Currently, the secondary CPUs in Hyper-V VTL context lack support for
parallel startup. Therefore, relying on the single initial_stack fetched
from the current task structure suffices for all vCPUs.

However, common initial_stack risks stack corruption when parallel startup
is enabled. In order to facilitate parallel startup, use the initial_stack
from the per CPU idle thread instead of the current task.

Fixes: 3be1bc2fe9d2 ("x86/hyperv: VTL support for Hyper-V")
Signed-off-by: Saurabh Sengar
Reviewed-by: Michael Kelley
Link: https://lore.kernel.org/r/1709452896-13342-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu
Message-ID:
Signed-off-by: Sasha Levin

Saurabh Sengar
2024-03-27 06:20:06 +0800

05 Sep, 2023

1 commit

0b90c5637 Merge tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/ke… ... Browse Code »

…rnel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

- Support for SEV-SNP guests on Hyper-V (Tianyu Lan)

- Support for TDX guests on Hyper-V (Dexuan Cui)

- Use SBRM API in Hyper-V balloon driver (Mitchell Levy)

- Avoid dereferencing ACPI root object handle in VMBus driver (Maciej
Szmigiero)

- A few misecllaneous fixes (Jiapeng Chong, Nathan Chancellor, Saurabh
Sengar)

* tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits)
x86/hyperv: Remove duplicate include
x86/hyperv: Move the code in ivm.c around to avoid unnecessary ifdef's
x86/hyperv: Remove hv_isolation_type_en_snp
x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor
Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor
x86/hyperv: Introduce a global variable hyperv_paravisor_present
Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM
x86/hyperv: Fix serial console interrupts for fully enlightened TDX guests
Drivers: hv: vmbus: Support fully enlightened TDX guests
x86/hyperv: Support hypercalls for fully enlightened TDX guests
x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
x86/hyperv: Fix undefined reference to isolation_type_en_snp without CONFIG_HYPERV
x86/hyperv: Add missing 'inline' to hv_snp_boot_ap() stub
hv: hyperv.h: Replace one-element array with flexible-array member
Drivers: hv: vmbus: Don't dereference ACPI root object handle
x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES
x86/hyperv: Add smp support for SEV-SNP guest
clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp enlightened guest
x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest
drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP enlightened guest
...

Linus Torvalds
2023-09-05 02:26:29 +0800

25 Aug, 2023

7 commits

e3131f1c8 x86/hyperv: Remove hv_isolation_type_en_snp ... Browse Code »

In ms_hyperv_init_platform(), do not distinguish between a SNP VM with
the paravisor and a SNP VM without the paravisor.

Replace hv_isolation_type_en_snp() with
!ms_hyperv.paravisor_present && hv_isolation_type_snp().

The hv_isolation_type_en_snp() in drivers/hv/hv.c and
drivers/hv/hv_common.c can be changed to hv_isolation_type_snp() since
we know !ms_hyperv.paravisor_present is true there.

Signed-off-by: Dexuan Cui
Reviewed-by: Michael Kelley
Reviewed-by: Tianyu Lan
Signed-off-by: Wei Liu
Link: https://lore.kernel.org/r/20230824080712.30327-10-decui@microsoft.com

Dexuan Cui
2023-08-25 08:04:57 +0800
233782950 Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor ... Browse Code »

The post_msg_page was removed in
commit 9a6b1a170ca8 ("Drivers: hv: vmbus: Remove the per-CPU post_msg_page")

However, it turns out that we need to bring it back, but only for a TDX VM
with the paravisor: in such a VM, the hyperv_pcpu_input_arg is not decrypted,
but the HVCALL_POST_MESSAGE in such a VM needs a decrypted page as the
hypercall input page: see the comments in hyperv_init() for a detailed
explanation.

Except for HVCALL_POST_MESSAGE and HVCALL_SIGNAL_EVENT, the other hypercalls
in a TDX VM with the paravisor still use hv_hypercall_pg and must use the
hyperv_pcpu_input_arg (which is encrypted in such a VM), when a hypercall
input page is used.

Signed-off-by: Dexuan Cui
Reviewed-by: Tianyu Lan
Reviewed-by: Michael Kelley
Signed-off-by: Wei Liu
Link: https://lore.kernel.org/r/20230824080712.30327-8-decui@microsoft.com

Dexuan Cui
2023-08-25 08:04:57 +0800
d3a9d7e49 x86/hyperv: Introduce a global variable hyperv_paravisor_present ... Browse Code »

The new variable hyperv_paravisor_present is set only when the VM
is a SNP/TDX VM with the paravisor running: see ms_hyperv_init_platform().

We introduce hyperv_paravisor_present because we can not use
ms_hyperv.paravisor_present in arch/x86/include/asm/mshyperv.h:

struct ms_hyperv_info is defined in include/asm-generic/mshyperv.h, which
is included at the end of arch/x86/include/asm/mshyperv.h, but at the
beginning of arch/x86/include/asm/mshyperv.h, we would already need to use
struct ms_hyperv_info in hv_do_hypercall().

We use hyperv_paravisor_present only in include/asm-generic/mshyperv.h,
and use ms_hyperv.paravisor_present elsewhere. In the future, we'll
introduce a hypercall function structure for different VM types, and
at boot time, the right function pointers would be written into the
structure so that runtime testing of TDX vs. SNP vs. normal will be
avoided and hyperv_paravisor_present will no longer be needed.

Call hv_vtom_init() when it's a VBS VM or when ms_hyperv.paravisor_present
is true, i.e. the VM is a SNP VM or TDX VM with the paravisor.

Enhance hv_vtom_init() for a TDX VM with the paravisor.

In hv_common_cpu_init(), don't decrypt the hyperv_pcpu_input_arg
for a TDX VM with the paravisor, just like we don't decrypt the page
for a SNP VM with the paravisor.

Signed-off-by: Dexuan Cui
Reviewed-by: Tianyu Lan
Reviewed-by: Michael Kelley
Signed-off-by: Wei Liu
Link: https://lore.kernel.org/r/20230824080712.30327-7-decui@microsoft.com

Dexuan Cui
2023-08-25 08:04:57 +0800
cceb4e081 Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM ... Browse Code »

Don't set *this_cpu_ptr(hyperv_pcpu_input_arg) before the function
set_memory_decrypted() returns, otherwise we run into this ticky issue:

For a fully enlightened TDX/SNP VM, in hv_common_cpu_init(),
*this_cpu_ptr(hyperv_pcpu_input_arg) is an encrypted page before
the set_memory_decrypted() returns.

When such a VM has more than 64 VPs, if the hyperv_pcpu_input_arg is not
NULL, hv_common_cpu_init() -> set_memory_decrypted() -> ... ->
cpa_flush() -> on_each_cpu() -> ... -> hv_send_ipi_mask() -> ... ->
__send_ipi_mask_ex() tries to call hv_do_rep_hypercall() with the
hyperv_pcpu_input_arg as the hypercall input page, which must be a
decrypted page in such a VM, but the page is still encrypted at this
point, and a fatal fault is triggered.

Fix the issue by setting *this_cpu_ptr(hyperv_pcpu_input_arg) after
set_memory_decrypted(): if the hyperv_pcpu_input_arg is NULL,
__send_ipi_mask_ex() returns HV_STATUS_INVALID_PARAMETER immediately,
and hv_send_ipi_mask() falls back to orig_apic.send_IPI_mask(),
which can use x2apic_send_IPI_all(), which may be slightly slower than
the hypercall but still works correctly in such a VM.

Reviewed-by: Michael Kelley
Reviewed-by: Tianyu Lan
Signed-off-by: Dexuan Cui
Signed-off-by: Wei Liu
Link: https://lore.kernel.org/r/20230824080712.30327-6-decui@microsoft.com

Dexuan Cui
2023-08-25 08:04:56 +0800
68f2f2bc1 Drivers: hv: vmbus: Support fully enlightened TDX guests ... Browse Code »

Add Hyper-V specific code so that a fully enlightened TDX guest (i.e.
without the paravisor) can run on Hyper-V:
Don't use hv_vp_assist_page. Use GHCI instead.
Don't try to use the unsupported HV_REGISTER_CRASH_CTL.
Don't trust (use) Hyper-V's TLB-flushing hypercalls.
Don't use lazy EOI.
Share the SynIC Event/Message pages with the hypervisor.
Don't use the Hyper-V TSC page for now, because non-trivial work is
required to share the page with the hypervisor.

Reviewed-by: Michael Kelley
Signed-off-by: Dexuan Cui
Signed-off-by: Wei Liu
Link: https://lore.kernel.org/r/20230824080712.30327-4-decui@microsoft.com

Dexuan Cui
2023-08-25 08:04:56 +0800
d6e0228d2 x86/hyperv: Support hypercalls for fully enlightened TDX guests ... Browse Code »

A fully enlightened TDX guest on Hyper-V (i.e. without the paravisor) only
uses the GHCI call rather than hv_hypercall_pg. Do not initialize
hypercall_pg for such a guest.

In hv_common_cpu_init(), the hyperv_pcpu_input_arg page needs to be
decrypted in such a guest.

Reviewed-by: Kuppuswamy Sathyanarayanan
Reviewed-by: Michael Kelley
Reviewed-by: Tianyu Lan
Signed-off-by: Dexuan Cui
Signed-off-by: Wei Liu
Link: https://lore.kernel.org/r/20230824080712.30327-3-decui@microsoft.com

Dexuan Cui
2023-08-25 08:04:56 +0800
08e9d1207 x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests ... Browse Code »

No logic change to SNP/VBS guests.

hv_isolation_type_tdx() will be used to instruct a TDX guest on Hyper-V to
do some TDX-specific operations, e.g. for a fully enlightened TDX guest
(i.e. without the paravisor), hv_do_hypercall() should use
__tdx_hypercall() and such a guest on Hyper-V should handle the Hyper-V
Event/Message/Monitor pages specially.

Reviewed-by: Michael Kelley
Reviewed-by: Tianyu Lan
Signed-off-by: Dexuan Cui
Signed-off-by: Wei Liu
Link: https://lore.kernel.org/r/20230824080712.30327-2-decui@microsoft.com

Dexuan Cui
2023-08-25 08:04:56 +0800

22 Aug, 2023

4 commits

78e04bbff Drivers: hv: vmbus: Don't dereference ACPI root object handle ... Browse Code »

Since the commit referenced in the Fixes: tag below the VMBus client driver
is walking the ACPI namespace up from the VMBus ACPI device to the ACPI
namespace root object trying to find Hyper-V MMIO ranges.

However, if it is not able to find them it ends trying to walk resources of
the ACPI namespace root object itself.
This object has all-ones handle, which causes a NULL pointer dereference
in the ACPI code (from dereferencing this pointer with an offset).

This in turn causes an oops on boot with VMBus host implementations that do
not provide Hyper-V MMIO ranges in their VMBus ACPI device or its
ancestors.
The QEMU VMBus implementation is an example of such implementation.

I guess providing these ranges is optional, since all tested Windows
versions seem to be able to use VMBus devices without them.

Fix this by explicitly terminating the lookup at the ACPI namespace root
object.

Note that Linux guests under KVM/QEMU do not use the Hyper-V PV interface
by default - they only do so if the KVM PV interface is missing or
disabled.

Example stack trace of such oops:
[ 3.710827] ? __die+0x1f/0x60
[ 3.715030] ? page_fault_oops+0x159/0x460
[ 3.716008] ? exc_page_fault+0x73/0x170
[ 3.716959] ? asm_exc_page_fault+0x22/0x30
[ 3.717957] ? acpi_ns_lookup+0x7a/0x4b0
[ 3.718898] ? acpi_ns_internalize_name+0x79/0xc0
[ 3.720018] acpi_ns_get_node_unlocked+0xb5/0xe0
[ 3.721120] ? acpi_ns_check_object_type+0xfe/0x200
[ 3.722285] ? acpi_rs_convert_aml_to_resource+0x37/0x6e0
[ 3.723559] ? down_timeout+0x3a/0x60
[ 3.724455] ? acpi_ns_get_node+0x3a/0x60
[ 3.725412] acpi_ns_get_node+0x3a/0x60
[ 3.726335] acpi_ns_evaluate+0x1c3/0x2c0
[ 3.727295] acpi_ut_evaluate_object+0x64/0x1b0
[ 3.728400] acpi_rs_get_method_data+0x2b/0x70
[ 3.729476] ? vmbus_platform_driver_probe+0x1d0/0x1d0 [hv_vmbus]
[ 3.730940] ? vmbus_platform_driver_probe+0x1d0/0x1d0 [hv_vmbus]
[ 3.732411] acpi_walk_resources+0x78/0xd0
[ 3.733398] vmbus_platform_driver_probe+0x9f/0x1d0 [hv_vmbus]
[ 3.734802] platform_probe+0x3d/0x90
[ 3.735684] really_probe+0x19b/0x400
[ 3.736570] ? __device_attach_driver+0x100/0x100
[ 3.737697] __driver_probe_device+0x78/0x160
[ 3.738746] driver_probe_device+0x1f/0x90
[ 3.739743] __driver_attach+0xc2/0x1b0
[ 3.740671] bus_for_each_dev+0x70/0xc0
[ 3.741601] bus_add_driver+0x10e/0x210
[ 3.742527] driver_register+0x55/0xf0
[ 3.744412] ? 0xffffffffc039a000
[ 3.745207] hv_acpi_init+0x3c/0x1000 [hv_vmbus]

Fixes: 7f163a6fd957 ("drivers:hv: Modify hv_vmbus to search for all MMIO ranges available.")
Signed-off-by: Maciej S. Szmigiero
Reviewed-by: Michael Kelley
Signed-off-by: Wei Liu
Link: https://lore.kernel.org/r/fd8e64ceeecfd1d95ff49021080cf699e88dbbde.1691606267.git.maciej.szmigiero@oracle.com

Maciej S. Szmigiero
2023-08-22 09:18:32 +0800
193061ea0 drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP enlightened guest ... Browse Code »

Hypervisor needs to access input arg, VMBus synic event and
message pages. Mark these pages unencrypted in the SEV-SNP
guest and free them only if they have been marked encrypted
successfully.

Reviewed-by: Dexuan Cui
Reviewed-by: Michael Kelley
Signed-off-by: Tianyu Lan
Signed-off-by: Wei Liu
Link: https://lore.kernel.org/r/20230818102919.1318039-5-ltykernel@gmail.com

Tianyu Lan
2023-08-22 08:38:20 +0800
8387ce06d x86/hyperv: Set Virtual Trust Level in VMBus init message ... Browse Code »

SEV-SNP guests on Hyper-V can run at multiple Virtual Trust
Levels (VTL). During boot, get the VTL at which we're running
using the GET_VP_REGISTERs hypercall, and save the value
for future use. Then during VMBus initialization, set the VTL
with the saved value as required in the VMBus init message.

Reviewed-by: Dexuan Cui
Reviewed-by: Michael Kelley
Signed-off-by: Tianyu Lan
Signed-off-by: Wei Liu
Link: https://lore.kernel.org/r/20230818102919.1318039-3-ltykernel@gmail.com

Tianyu Lan
2023-08-22 08:38:20 +0800
d6e2d6524 x86/hyperv: Add sev-snp enlightened guest static key ... Browse Code »

Introduce static key isolation_type_en_snp for enlightened
sev-snp guest check.

Reviewed-by: Dexuan Cui
Reviewed-by: Michael Kelley
Signed-off-by: Tianyu Lan
Signed-off-by: Wei Liu
Link: https://lore.kernel.org/r/20230818102919.1318039-2-ltykernel@gmail.com

Tianyu Lan
2023-08-22 08:38:20 +0800

12 Aug, 2023

1 commit

4f74fb30e hv_balloon: Update the balloon driver to use the SBRM API ... Browse Code »

This patch is intended as a proof-of-concept for the new SBRM
machinery[1]. For some brief background, the idea behind SBRM is using
the __cleanup__ attribute to automatically unlock locks (or otherwise
release resources) when they go out of scope, similar to C++ style RAII.
This promises some benefits such as making code simpler (particularly
where you have lots of goto fail; type constructs) as well as reducing
the surface area for certain kinds of bugs.

The changes in this patch should not result in any difference in how the
code actually runs (i.e., it's purely an exercise in this new syntax
sugar). In one instance SBRM was not appropriate, so I left that part
alone, but all other locking/unlocking is handled automatically in this
patch.

[1] https://lore.kernel.org/all/20230626125726.GU4253@hirez.programming.kicks-ass.net/

Suggested-by: Boqun Feng
Signed-off-by: "Mitchell Levy (Microsoft)"
Reviewed-by: Boqun Feng
Signed-off-by: Wei Liu
Link: https://lore.kernel.org/r/20230807-sbrm-hyperv-v2-1-9d2ac15305bd@gmail.com

Mitchell Levy
2023-08-12 05:04:42 +0800

29 Jun, 2023

2 commits

55e544e1a x86/hyperv: Improve code for referencing hyperv_pcpu_input_arg ... Browse Code »

Several places in code for Hyper-V reference the
per-CPU variable hyperv_pcpu_input_arg. Older code uses a multi-line
sequence to reference the variable, and usually includes a cast.
Newer code does a much simpler direct assignment. The latter is
preferable as the complexity of the older code is unnecessary.

Update older code to use the simpler direct assignment.

Signed-off-by: Nischala Yelchuri
Link: https://lore.kernel.org/r/1687286438-9421-1-git-send-email-niyelchu@linux.microsoft.com
Signed-off-by: Wei Liu

Nischala Yelchuri
2023-06-29 01:53:25 +0800
a6fe04388 Drivers: hv: Change hv_free_hyperv_page() to take void * argument ... Browse Code »

Currently hv_free_hyperv_page() takes an unsigned long argument, which
is inconsistent with the void * return value from the corresponding
hv_alloc_hyperv_page() function and variants. This creates unnecessary
extra casting.

Change the hv_free_hyperv_page() argument type to void *.
Also remove redundant casts from invocations of
hv_alloc_hyperv_page() and variants.

Signed-off-by: Kameron Carr
Reviewed-by: Nuno Das Neves
Reviewed-by: Dexuan Cui
Link: https://lore.kernel.org/r/1687558189-19734-1-git-send-email-kameroncarr@linux.microsoft.com
Signed-off-by: Wei Liu

Kameron Carr
2023-06-29 01:51:18 +0800

18 Jun, 2023

1 commit

9636be85c x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline ... Browse Code »

These commits

a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg")
2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs")

update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg
because that memory will be correctly marked as decrypted or encrypted
for all VM types (CoCo or normal). But problems ensue when CPUs in the
VM go online or offline after virtual PCI devices have been configured.

When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is
initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN.
But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which
may call the virtual PCI driver and fault trying to use the as yet
uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo
VM if the MMIO read and write hypercalls are used from state
CPUHP_AP_IRQ_AFFINITY_ONLINE.

When a CPU is taken offline, IRQs may be reassigned in state
CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to
use the hyperv_pcpu_input_arg that has already been freed by a
higher state.

Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE
immediately after CPUHP_AP_ONLINE_IDLE (similar to CPUHP_AP_KVM_ONLINE)
and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for
Hyper-V initialization so that hyperv_pcpu_input_arg is allocated
early enough.

Fix the offlining problem by not freeing hyperv_pcpu_input_arg when
a CPU goes offline. Retain the allocated memory, and reuse it if
the CPU comes back online later.

Signed-off-by: Michael Kelley
Reviewed-by: Vitaly Kuznetsov
Acked-by: Borislav Petkov (AMD)
Reviewed-by: Dexuan Cui
Link: https://lore.kernel.org/r/1684862062-51576-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu

Michael Kelley
2023-06-18 07:09:47 +0800

24 May, 2023

1 commit

320805ab6 Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs ... Browse Code »

vmbus_wait_for_unload() may be called in the panic path after other
CPUs are stopped. vmbus_wait_for_unload() currently loops through
online CPUs looking for the UNLOAD response message. But the values of
CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used
to stop the other CPUs, and in one of the paths the stopped CPUs
are removed from cpu_online_mask. This removal happens in both
x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload()
only checks the panic'ing CPU, and misses the UNLOAD response message
except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload()
eventually times out, but only after waiting 100 seconds.

Fix this by looping through *present* CPUs in vmbus_wait_for_unload().
The cpu_present_mask is not modified by stopping the other CPUs in the
panic path, nor should it be.

Also, in a CoCo VM the synic_message_page is not allocated in
hv_synic_alloc(), but is set and cleared in hv_synic_enable_regs()
and hv_synic_disable_regs() such that it is set only when the CPU is
online. If not all present CPUs are online when vmbus_wait_for_unload()
is called, the synic_message_page might be NULL. Add a check for this.

Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios")
Cc: stable@vger.kernel.org
Reported-by: John Starks
Signed-off-by: Michael Kelley
Reviewed-by: Vitaly Kuznetsov
Link: https://lore.kernel.org/r/1684422832-38476-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu

Michael Kelley
2023-05-24 02:53:16 +0800

09 May, 2023

1 commit

ec97e1129 Drivers: hv: vmbus: Call hv_synic_free() if hv_synic_alloc() fails ... Browse Code »

Commit 572086325ce9 ("Drivers: hv: vmbus: Cleanup synic memory free path")
says "Any memory allocations that succeeded will be freed when the caller
cleans up by calling hv_synic_free()", but if the get_zeroed_page() in
hv_synic_alloc() fails, currently hv_synic_free() is not really called
in vmbus_bus_init(), consequently there will be a memory leak, e.g.
hv_context.hv_numa_map is not freed in the error path. Fix this by
updating the goto labels.

Cc: stable@kernel.org
Signed-off-by: Dexuan Cui
Fixes: 4df4cb9e99f8 ("x86/hyperv: Initialize clockevents earlier in CPU onlining")
Reviewed-by: Michael Kelley
Link: https://lore.kernel.org/r/20230504224155.10484-1-decui@microsoft.com
Signed-off-by: Wei Liu

Dexuan Cui
2023-05-09 01:37:20 +0800

28 Apr, 2023

3 commits

da46b58ff Merge tag 'hyperv-next-signed-20230424' of git://git.kernel.org/pub/scm/linux/ke… ... Browse Code »

…rnel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

- PCI passthrough for Hyper-V confidential VMs (Michael Kelley)

- Hyper-V VTL mode support (Saurabh Sengar)

- Move panic report initialization code earlier (Long Li)

- Various improvements and bug fixes (Dexuan Cui and Michael Kelley)

* tag 'hyperv-next-signed-20230424' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits)
PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg
Drivers: hv: move panic report code from vmbus to hv early init code
x86/hyperv: VTL support for Hyper-V
Drivers: hv: Kconfig: Add HYPERV_VTL_MODE
x86/hyperv: Make hv_get_nmi_reason public
x86/hyperv: Add VTL specific structs and hypercalls
x86/init: Make get/set_rtc_noop() public
x86/hyperv: Exclude lazy TLB mode CPUs from enlightened TLB flushes
x86/hyperv: Add callback filter to cpumask_to_vpset()
Drivers: hv: vmbus: Remove the per-CPU post_msg_page
clocksource: hyper-v: make sure Invariant-TSC is used if it is available
PCI: hv: Enable PCI pass-thru devices in Confidential VMs
Drivers: hv: Don't remap addresses that are above shared_gpa_boundary
hv_netvsc: Remove second mapping of send and recv buffers
Drivers: hv: vmbus: Remove second way of mapping ring buffers
Drivers: hv: vmbus: Remove second mapping of VMBus monitor pages
swiotlb: Remove bounce buffer remapping for Hyper-V
Driver: VMBus: Add Devicetree support
dt-bindings: bus: Add Hyper-V VMBus
Drivers: hv: vmbus: Convert acpi_device to more generic platform_device
...

Linus Torvalds
2023-04-28 08:17:12 +0800
888d3c9f7 Merge tag 'sysctl-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux ... Browse Code »

Pull sysctl updates from Luis Chamberlain:
"This only does a few sysctl moves from the kernel/sysctl.c file, the
rest of the work has been put towards deprecating two API calls which
incur recursion and prevent us from simplifying the registration
process / saving memory per move. Most of the changes have been
soaking on linux-next since v6.3-rc3.

I've slowed down the kernel/sysctl.c moves due to Matthew Wilcox's
feedback that we should see if we could *save* memory with these moves
instead of incurring more memory. We currently incur more memory since
when we move a syctl from kernel/sysclt.c out to its own file we end
up having to add a new empty sysctl used to register it. To achieve
saving memory we want to allow syctls to be passed without requiring
the end element being empty, and just have our registration process
rely on ARRAY_SIZE(). Without this, supporting both styles of sysctls
would make the sysctl registration pretty brittle, hard to read and
maintain as can be seen from Meng Tang's efforts to do just this [0].
Fortunately, in order to use ARRAY_SIZE() for all sysctl registrations
also implies doing the work to deprecate two API calls which use
recursion in order to support sysctl declarations with subdirectories.

And so during this development cycle quite a bit of effort went into
this deprecation effort. I've annotated the following two APIs are
deprecated and in few kernel releases we should be good to remove
them:

- register_sysctl_table()
- register_sysctl_paths()

During this merge window we should be able to deprecate and unexport
register_sysctl_paths(), we can probably do that towards the end of
this merge window.

Deprecating register_sysctl_table() will take a bit more time but this
pull request goes with a few example of how to do this.

As it turns out each of the conversions to move away from either of
these two API calls *also* saves memory. And so long term, all these
changes *will* prove to have saved a bit of memory on boot.

The way I see it then is if remove a user of one deprecated call, it
gives us enough savings to move one kernel/sysctl.c out from the
generic arrays as we end up with about the same amount of bytes.

Since deprecating register_sysctl_table() and register_sysctl_paths()
does not require maintainer coordination except the final unexport
you'll see quite a bit of these changes from other pull requests, I've
just kept the stragglers after rc3"

Link: https://lkml.kernel.org/r/ZAD+cpbrqlc5vmry@bombadil.infradead.org [0]

* tag 'sysctl-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (29 commits)
fs: fix sysctls.c built
mm: compaction: remove incorrect #ifdef checks
mm: compaction: move compaction sysctl to its own file
mm: memory-failure: Move memory failure sysctls to its own file
arm: simplify two-level sysctl registration for ctl_isa_vars
ia64: simplify one-level sysctl registration for kdump_ctl_table
utsname: simplify one-level sysctl registration for uts_kern_table
ntfs: simplfy one-level sysctl registration for ntfs_sysctls
coda: simplify one-level sysctl registration for coda_table
fs/cachefiles: simplify one-level sysctl registration for cachefiles_sysctls
xfs: simplify two-level sysctl registration for xfs_table
nfs: simplify two-level sysctl registration for nfs_cb_sysctls
nfs: simplify two-level sysctl registration for nfs4_cb_sysctls
lockd: simplify two-level sysctl registration for nlm_sysctls
proc_sysctl: enhance documentation
xen: simplify sysctl registration for balloon
md: simplify sysctl registration
hv: simplify sysctl registration
scsi: simplify sysctl registration with register_sysctl()
csky: simplify alignment sysctl registration
...

Linus Torvalds
2023-04-28 07:52:33 +0800
556eb8b79 Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core ... Browse Code »

Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.4-rc1.

Once again, a busy development cycle, with lots of changes happening
in the driver core in the quest to be able to move "struct bus" and
"struct class" into read-only memory, a task now complete with these
changes.

This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules
for all busses and classes in the kernel.

The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most
of them actually did so.

Other than those changes, included in here are a small set of other
things:

- kobject logging improvements

- cacheinfo improvements and updates

- obligatory fw_devlink updates and fixes

- documentation updates

- device property cleanups and const * changes

- firwmare loader dependency fixes.

All of these have been in linux-next for a while with no reported
problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
device property: make device_property functions take const device *
driver core: update comments in device_rename()
driver core: Don't require dynamic_debug for initcall_debug probe timing
firmware_loader: rework crypto dependencies
firmware_loader: Strip off \n from customized path
zram: fix up permission for the hot_add sysfs file
cacheinfo: Add use_arch[|_cache]_info field/function
arch_topology: Remove early cacheinfo error message if -ENOENT
cacheinfo: Check cache properties are present in DT
cacheinfo: Check sib_leaf in cache_leaves_are_shared()
cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
cacheinfo: Add arm64 early level initializer implementation
cacheinfo: Add arch specific early level initializer
tty: make tty_class a static const structure
driver core: class: remove struct class_interface * from callbacks
driver core: class: mark the struct class in struct class_interface constant
driver core: class: make class_register() take a const *
driver core: class: mark class_release() as taking a const *
driver core: remove incorrect comment for device_create*
MIPS: vpe-cmp: remove module owner pointer from struct class usage.
...

Linus Torvalds
2023-04-28 02:53:57 +0800

26 Apr, 2023

1 commit

bc1bb2a49 Merge tag 'x86_sev_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 SEV updates from Borislav Petkov:

- Add the necessary glue so that the kernel can run as a confidential
SEV-SNP vTOM guest on Hyper-V. A vTOM guest basically splits the
address space in two parts: encrypted and unencrypted. The use case
being running unmodified guests on the Hyper-V confidential computing
hypervisor

- Double-buffer messages between the guest and the hardware PSP device
so that no partial buffers are copied back'n'forth and thus potential
message integrity and leak attacks are possible

- Name the return value the sev-guest driver returns when the hw PSP
device hasn't been called, explicitly

- Cleanups

* tag 'x86_sev_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Change vTOM handling to use standard coco mechanisms
init: Call mem_encrypt_init() after Hyper-V hypercall init is done
x86/mm: Handle decryption/re-encryption of bss_decrypted consistently
Drivers: hv: Explicitly request decrypted in vmap_pfn() calls
x86/hyperv: Reorder code to facilitate future work
x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM
x86/sev: Change snp_guest_issue_request()'s fw_err argument
virt/coco/sev-guest: Double-buffer messages
crypto: ccp: Get rid of __sev_platform_init_locked()'s local function pointer
crypto: ccp - Name -1 return value as SEV_RET_NO_FW_CALL

Linus Torvalds
2023-04-26 01:48:08 +0800

21 Apr, 2023

1 commit

9c318a1d9 Drivers: hv: move panic report code from vmbus to hv early init code ... Browse Code »

The panic reporting code was added in commit 81b18bce48af
("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic")

It was added to the vmbus driver. The panic reporting has no dependence
on vmbus, and can be enabled at an earlier boot time when Hyper-V is
initialized.

This patch moves the panic reporting code out of vmbus. There is no
functionality changes. During moving, also refactored some cleanup
functions into hv_kmsg_dump_unregister().

Signed-off-by: Long Li
Reviewed-by: Michael Kelley
Link: https://lore.kernel.org/r/1682030946-6372-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Wei Liu

Long Li
2023-04-21 07:00:37 +0800

19 Apr, 2023

1 commit

d01b9a9f2 Drivers: hv: Kconfig: Add HYPERV_VTL_MODE ... Browse Code »

Add HYPERV_VTL_MODE Kconfig flag for VTL mode.

Signed-off-by: Saurabh Sengar
Reviewed-by: Michael Kelley
Link: https://lore.kernel.org/r/1681192532-15460-5-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu

Saurabh Sengar
2023-04-19 01:29:52 +0800

18 Apr, 2023

9 commits

9a6b1a170 Drivers: hv: vmbus: Remove the per-CPU post_msg_page ... Browse Code »

The post_msg_page was introduced in 2014 in
commit b29ef3546aec ("Drivers: hv: vmbus: Cleanup hv_post_message()")

Commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments") introduced
the hyperv_pcpu_input_arg in 2018, which can be used in hv_post_message().

Remove post_msg_page to simplify the code a little bit.

Signed-off-by: Dexuan Cui
Reviewed-by: Jinank Jain
Link: https://lore.kernel.org/r/20230408213441.15472-1-decui@microsoft.com
Signed-off-by: Wei Liu

Dexuan Cui
2023-04-18 03:19:05 +0800
2c6ba4216 PCI: hv: Enable PCI pass-thru devices in Confidential VMs ... Browse Code »

For PCI pass-thru devices in a Confidential VM, Hyper-V requires
that PCI config space be accessed via hypercalls. In normal VMs,
config space accesses are trapped to the Hyper-V host and emulated.
But in a confidential VM, the host can't access guest memory to
decode the instruction for emulation, so an explicit hypercall must
be used.

Add functions to make the new MMIO read and MMIO write hypercalls.
Update the PCI config space access functions to use the hypercalls
when such use is indicated by Hyper-V flags. Also, set the flag to
allow the Hyper-V PCI driver to be loaded and used in a Confidential
VM (a.k.a., "Isolation VM"). The driver has previously been hardened
against a malicious Hyper-V host[1].

[1] https://lore.kernel.org/all/20220511223207.3386-2-parri.andrea@gmail.com/

Co-developed-by: Dexuan Cui
Signed-off-by: Dexuan Cui
Signed-off-by: Michael Kelley
Reviewed-by: Boqun Feng
Reviewed-by: Haiyang Zhang
Link: https://lore.kernel.org/r/1679838727-87310-13-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu

Michael Kelley
2023-04-18 03:19:04 +0800
6afd9dc1a Drivers: hv: Don't remap addresses that are above shared_gpa_boundary ... Browse Code »

With the vTOM bit now treated as a protection flag and not part of
the physical address, avoid remapping physical addresses with vTOM set
since technically such addresses aren't valid. Use ioremap_cache()
instead of memremap() to ensure that the mapping provides decrypted
access, which will correctly set the vTOM bit as a protection flag.

While this change is not required for correctness with the current
implementation of memremap(), for general code hygiene it's better to
not depend on the mapping functions doing something reasonable with
a physical address that is out-of-range.

While here, fix typos in two error messages.

Signed-off-by: Michael Kelley
Reviewed-by: Tianyu Lan
Link: https://lore.kernel.org/r/1679838727-87310-12-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu

Michael Kelley
2023-04-18 03:19:04 +0800
25727aaed hv_netvsc: Remove second mapping of send and recv buffers ... Browse Code »

With changes to how Hyper-V guest VMs flip memory between private
(encrypted) and shared (decrypted), creating a second kernel virtual
mapping for shared memory is no longer necessary. Everything needed
for the transition to shared is handled by set_memory_decrypted().

As such, remove the code to create and manage the second
mapping for the pre-allocated send and recv buffers. This mapping
is the last user of hv_map_memory()/hv_unmap_memory(), so delete
these functions as well. Finally, hv_map_memory() is the last
user of vmap_pfn() in Hyper-V guest code, so remove the Kconfig
selection of VMAP_PFN.

Signed-off-by: Michael Kelley
Reviewed-by: Tianyu Lan
Link: https://lore.kernel.org/r/1679838727-87310-11-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu

Michael Kelley
2023-04-18 03:19:04 +0800
bb862397f Drivers: hv: vmbus: Remove second way of mapping ring buffers ... Browse Code »

With changes to how Hyper-V guest VMs flip memory between private
(encrypted) and shared (decrypted), it's no longer necessary to
have separate code paths for mapping VMBus ring buffers for
for normal VMs and for Confidential VMs.

As such, remove the code path that uses vmap_pfn(), and set
the protection flags argument to vmap() to account for the
difference between normal and Confidential VMs.

Signed-off-by: Michael Kelley
Reviewed-by: Tianyu Lan
Link: https://lore.kernel.org/r/1679838727-87310-10-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu

Michael Kelley
2023-04-18 03:19:04 +0800
a5ddb7458 Drivers: hv: vmbus: Remove second mapping of VMBus monitor pages ... Browse Code »

With changes to how Hyper-V guest VMs flip memory between private
(encrypted) and shared (decrypted), creating a second kernel virtual
mapping for shared memory is no longer necessary. Everything needed
for the transition to shared is handled by set_memory_decrypted().

As such, remove the code to create and manage the second
mapping for VMBus monitor pages. Because set_memory_decrypted()
and set_memory_encrypted() are no-ops in normal VMs, it's
not even necessary to test for being in a Confidential VM
(a.k.a., "Isolation VM").

Signed-off-by: Michael Kelley
Reviewed-by: Tianyu Lan
Link: https://lore.kernel.org/r/1679838727-87310-9-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu

Michael Kelley
2023-04-18 03:19:04 +0800
21eb596fc Merge remote-tracking branch 'tip/x86/sev' into hyperv-next ... Browse Code »

Merge the following 6 patches from tip/x86/sev, which are taken from
Michael Kelley's series [0]. The rest of Michael's series depend on
them.

x86/hyperv: Change vTOM handling to use standard coco mechanisms
init: Call mem_encrypt_init() after Hyper-V hypercall init is done
x86/mm: Handle decryption/re-encryption of bss_decrypted consistently
Drivers: hv: Explicitly request decrypted in vmap_pfn() calls
x86/hyperv: Reorder code to facilitate future work
x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM

0: https://lore.kernel.org/linux-hyperv/1679838727-87310-1-git-send-email-mikelley@microsoft.com/

Wei Liu
2023-04-18 03:18:13 +0800
f83705a51 Driver: VMBus: Add Devicetree support ... Browse Code »

Update the driver to support Devicetree boot as well along with ACPI.
At present the Devicetree parsing only provides the mmio region info
and is not the exact copy of ACPI parsing. This is sufficient to cater
all the current Devicetree usecases for VMBus.

Currently Devicetree is supported only for x86 systems.

Signed-off-by: Saurabh Sengar
Reviewed-by: Michael Kelley
Link: https://lore.kernel.org/r/1679298460-11855-6-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu

Saurabh Sengar
2023-04-18 03:16:22 +0800
9c8434238 Drivers: hv: vmbus: Convert acpi_device to more generic platform_device ... Browse Code »

VMBus driver code currently has direct dependency on ACPI and struct
acpi_device. As a staging step toward optionally configuring based on
Devicetree instead of ACPI, use a more generic platform device to reduce
the dependency on ACPI where possible, though the dependency on ACPI
is not completely removed. Also rename the function vmbus_acpi_remove()
to the more generic vmbus_mmio_remove().

Signed-off-by: Saurabh Sengar
Reviewed-by: Michael Kelley
Link: https://lore.kernel.org/r/1679298460-11855-4-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu

Saurabh Sengar
2023-04-18 03:16:22 +0800

14 Apr, 2023

1 commit

525f23fe5 hv: simplify sysctl registration ... Browse Code »

register_sysctl_table() is a deprecated compatibility wrapper.
register_sysctl() can do the directory creation for you so just use
that.

Signed-off-by: Luis Chamberlain
Reviewed-by: Michael Kelley
Reviewed-by: Wei Liu

Luis Chamberlain
2023-04-14 02:49:20 +0800

27 Mar, 2023

1 commit

812b0597f x86/hyperv: Change vTOM handling to use standard coco mechanisms ... Browse Code »

Hyper-V guests on AMD SEV-SNP hardware have the option of using the
"virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP
architecture. With vTOM, shared vs. private memory accesses are
controlled by splitting the guest physical address space into two
halves.

vTOM is the dividing line where the uppermost bit of the physical
address space is set; e.g., with 47 bits of guest physical address
space, vTOM is 0x400000000000 (bit 46 is set). Guest physical memory is
accessible at two parallel physical addresses -- one below vTOM and one
above vTOM. Accesses below vTOM are private (encrypted) while accesses
above vTOM are shared (decrypted). In this sense, vTOM is like the
GPA.SHARED bit in Intel TDX.

Support for Hyper-V guests using vTOM was added to the Linux kernel in
two patch sets[1][2]. This support treats the vTOM bit as part of
the physical address. For accessing shared (decrypted) memory, these
patch sets create a second kernel virtual mapping that maps to physical
addresses above vTOM.

A better approach is to treat the vTOM bit as a protection flag, not
as part of the physical address. This new approach is like the approach
for the GPA.SHARED bit in Intel TDX. Rather than creating a second kernel
virtual mapping, the existing mapping is updated using recently added
coco mechanisms.

When memory is changed between private and shared using
set_memory_decrypted() and set_memory_encrypted(), the PTEs for the
existing kernel mapping are changed to add or remove the vTOM bit in the
guest physical address, just as with TDX. The hypercalls to change the
memory status on the host side are made using the existing callback
mechanism. Everything just works, with a minor tweak to map the IO-APIC
to use private accesses.

To accomplish the switch in approach, the following must be done:

* Update Hyper-V initialization to set the cc_mask based on vTOM
and do other coco initialization.

* Update physical_mask so the vTOM bit is no longer treated as part
of the physical address

* Remove CC_VENDOR_HYPERV and merge the associated vTOM functionality
under CC_VENDOR_AMD. Update cc_mkenc() and cc_mkdec() to set/clear
the vTOM bit as a protection flag.

* Code already exists to make hypercalls to inform Hyper-V about pages
changing between shared and private. Update this code to run as a
callback from __set_memory_enc_pgtable().

* Remove the Hyper-V special case from __set_memory_enc_dec()

* Remove the Hyper-V specific call to swiotlb_update_mem_attributes()
since mem_encrypt_init() will now do it.

* Add a Hyper-V specific implementation of the is_private_mmio()
callback that returns true for the IO-APIC and vTPM MMIO addresses

[1] https://lore.kernel.org/all/20211025122116.264793-1-ltykernel@gmail.com/
[2] https://lore.kernel.org/all/20211213071407.314309-1-ltykernel@gmail.com/

[ bp: Touchups. ]

Signed-off-by: Michael Kelley
Signed-off-by: Borislav Petkov (AMD)
Link: https://lore.kernel.org/r/1679838727-87310-7-git-send-email-mikelley@microsoft.com

Michael Kelley
2023-03-27 15:31:43 +0800