Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backends/igvm: Fix memory leak #12

Open
wants to merge 82 commits into
base: svsm-igvm
Choose a base branch
from

Conversation

peterfang
Copy link

Always pair igvm_get_buffer() with igvm_free_buffer() before exiting the loop in supported_platform_compat_mask().

calmisi and others added 30 commits January 9, 2024 08:22
pc_machine_kvm_type() was introduced by commit e21be72 ("i386/xen:
add pc_machine_kvm_type to initialize XEN_EMULATE mode") to do Xen
specific initialization by utilizing kvm_type method.

commit eeedfe6 ("hw/xen: Simplify emulated Xen platform init")
moves the Xen specific initialization to pc_basic_device_init().

There is no need to keep the PC specific kvm_type() implementation
anymore. On the other hand, later patch will implement kvm_type()
method for all x86/i386 machines to support KVM_X86_SW_PROTECTED_VM.

Signed-off-by: Xiaoyao Li <[email protected]>
Reviewed-by: Isaku Yamahata <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Decrease array index cpuid_i when CPUID leaf 1F is skipped, otherwise it
will get an all zero'ed CPUID entry with leaf 0 and subleaf 0. It
conflicts with correct leaf 0.

Signed-off-by: Xiaoyao Li <[email protected]>
No such constraint that subleaf index needs to be less than 64.

Signed-off-by: Xiaoyao Li <[email protected]>
…ry()

The upper 16 bits of kvm_userspace_memory_region::slot are
address space id. Parse it separately in trace_kvm_set_user_memory().

Signed-off-by: Xiaoyao Li <[email protected]>
This patch needs to be updated by script

	scripts/update-linux-headers.sh

once gmem fd support is upstreamed in Linux kernel.

Signed-off-by: Xiaoyao Li <[email protected]>
Add KVM guest_memfd support to RAMBlock so both normal hva based memory
and kvm guest memfd based private memory can be associated in one RAMBlock.

Introduce new flag RAM_GUEST_MEMFD. When it's set, it calls KVM ioctl to
create private guest_memfd during RAMBlock setup.

Note, RAM_GUEST_MEMFD is supposed to be set for memory backends of
confidential guests, such as TDX VM. How and when to set it for memory
backends will be implemented in the following patches.

Introduce memory_region_has_guest_memfd() to query if the MemoryRegion has
KVM guest_memfd allocated.

Signed-off-by: Xiaoyao Li <[email protected]>
---
Changes in v3:
- rename gmem to guest_memfd;
- close(guest_memfd) when RAMBlock is released; (Daniel P. Berrangé)
- Suqash the patch that introduces memory_region_has_guest_memfd().
KVM allows KVM_GUEST_MEMFD_ALLOW_HUGEPAGE for guest memfd. When the
flag is set, KVM tries to allocate memory with transparent hugeapge at
first and falls back to non-hugepage on failure.

However, KVM defines one restriction that size must be hugepage size
aligned when KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is set.

Signed-off-by: Xiaoyao Li <[email protected]>
---
v3:
 - New one in v3.
Add a new member "require_guest_memfd" to memory backends. When it's set
to true, it enables RAM_GUEST_MEMFD in ram_flags, thus private kvm
guest_memfd will be allocated during RAMBlock allocation.

Memory backend's @require_guest_memfd is wired with @require_guest_memfd
field of MachineState. MachineState::require_guest_memfd is supposed to
be set by any VMs that requires KVM guest memfd as private memory, e.g.,
TDX VM.

Signed-off-by: Xiaoyao Li <[email protected]>
Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM.

With KVM_SET_USER_MEMORY_REGION2, QEMU can set up memory region that
backend'ed both by hva-based shared memory and guest memfd based private
memory.

Signed-off-by: Chao Peng <[email protected]>
Co-developed-by: Xiaoyao Li <[email protected]>
Signed-off-by: Xiaoyao Li <[email protected]>
Introduce the helper functions to set the attributes of a range of
memory to private or shared.

This is necessary to notify KVM the private/shared attribute of each gpa
range. KVM needs the information to decide the GPA needs to be mapped at
hva-based shared memory or guest_memfd based private memory.

Signed-off-by: Xiaoyao Li <[email protected]>
…card_range()

Commit d3a5038 ("exec: ram_block_discard_range") introduced
ram_block_discard_range() which grabs some code from
ram_discard_range(). However, during code movement, it changed alignment
check of host_startaddr from qemu_host_page_size to rb->page_size.

When ramblock is back'ed by hugepage, it requires the startaddr to be
huge page size aligned, which is a overkill. e.g., TDX's private-shared
page conversion is done at 4KB granularity. Shared page is discarded
when it gets converts to private and when shared page back'ed by
hugepage it is going to fail on this check.

So change to alignment check back to qemu_host_page_size.

Signed-off-by: Xiaoyao Li <[email protected]>
---
Changes in v3:
 - Newly added in v3;
…ge()

Use __func__ to avoid hard-coded function name.

Signed-off-by: Xiaoyao Li <[email protected]>
It's used for discarding opposite memory after memory conversion, for
confidential guest.

When page is converted from shared to private, the original shared
memory can be discarded via ram_block_discard_range();

When page is converted from private to shared, the original private
memory is back'ed by guest_memfd. Introduce
ram_block_discard_guest_memfd_range() for discarding memory in
guest_memfd.

Originally-from: Isaku Yamahata <[email protected]>
Codeveloped-by: Xiaoyao Li <[email protected]>
Signed-off-by: Xiaoyao Li <[email protected]>
Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
KVM_EXIT_MEMORY_FAULT happens. It indicates userspace needs to do
the memory conversion on the RAMBlock to turn the memory into desired
attribute, i.e., private/shared.

Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has
guest_memfd memory backend.

Note, KVM_EXIT_MEMORY_FAULT returns with -EFAULT, so special handling is
added.

Signed-off-by: Chao Peng <[email protected]>
Co-developed-by: Xiaoyao Li <[email protected]>
Signed-off-by: Xiaoyao Li <[email protected]>
Implement mc->kvm_type() for i386 machines. It provides a way for user
to create SW_PROTECTE_VM.

Also store the vm_type in machinestate to other code to query what the
VM type is.

Signed-off-by: Xiaoyao Li <[email protected]>
TDX VM requires VM type KVM_X86_TDX_VM to be passed to
kvm_ioctl(KVM_CREATE_VM).

If tdx-guest object is specified to confidential-guest-support, like,

  qemu -machine ...,confidential-guest-support=tdx0 \
       -object tdx-guest,id=tdx0,...

it parses VM type as KVM_X86_TDX_VM.

Signed-off-by: Xiaoyao Li <[email protected]>
Introduce a separate function kvm_confidential_guest_init(), which
dispatches specific confidential guest initialization function by
ms->cgs type.

Signed-off-by: Xiaoyao Li <[email protected]>
Acked-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
…ivate value

Introduce new flag RAM_DEFAULT_PRIVATE for RAMBlock. It's used to
indicate the default attribute,  private or not.

Set the RAM range to private explicitly when it's default private.

Originated-from: Isaku Yamahata <[email protected]>
Signed-off-by: Xiaoyao Li <[email protected]>
Introduce memory_region_init_ram_guest_memfd() to allocate private
guset memfd on the MemoryRegion initialization. It's for the use case of
TDVF, which must be private on TDX case.

Signed-off-by: Xiaoyao Li <[email protected]>
In mch_realize(), process PAM initialization before SMRAM initialization so
that later patch can skill all the SMRAM related with a single check.

Signed-off-by: Isaku Yamahata <[email protected]>
Signed-off-by: Xiaoyao Li <[email protected]>
Add a q35 property to check whether or not SMM ranges, e.g. SMRAM, TSEG,
etc... exist for the target platform.  TDX doesn't support SMM and doesn't
play nice with QEMU modifying related guest memory ranges.

Signed-off-by: Isaku Yamahata <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Xiaoyao Li <[email protected]>
This reverts commit a1a6b66aa83d26bab1628b92b7720d5e4b672e7e.
TODO: squash into:
        kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot

Signed-off-by: Michael Roth <[email protected]>
Currently all SEV/SEV-ES functionality is managed through a single
'sev-guest' QOM type. With upcoming support for SEV-SNP, taking this
same approach won't work well since some of the properties/state
managed by 'sev-guest' is not applicable to SEV-SNP, which will instead
rely on a new QOM type with its own set of properties/state.

To prepare for this, this patch moves common state into an abstract
'sev-common' parent type to encapsulate properties/state that is
common to both SEV/SEV-ES and SEV-SNP, leaving only SEV/SEV-ES-specific
properties/state in the current 'sev-guest' type. This should not
affect current behavior or command-line options.

As part of this patch, some related changes are also made:

  - a static 'sev_guest' variable is currently used to keep track of
    the 'sev-guest' instance. SEV-SNP would similarly introduce an
    'sev_snp_guest' static variable. But these instances are now
    available via qdev_get_machine()->cgs, so switch to using that
    instead and drop the static variable.

  - 'sev_guest' is currently used as the name for the static variable
    holding a pointer to the 'sev-guest' instance. Re-purpose the name
    as a local variable referring the 'sev-guest' instance, and use
    that consistently throughout the code so it can be easily
    distinguished from sev-common/sev-snp-guest instances.

  - 'sev' is generally used as the name for local variables holding a
    pointer to the 'sev-guest' instance. In cases where that now points
    to common state, use the name 'sev_common'; in cases where that now
    points to state specific to 'sev-guest' instance, use the name
    'sev_guest'

Signed-off-by: Michael Roth <[email protected]>
SEV-SNP support relies on a different set of properties/state than the
existing 'sev-guest' object. This patch introduces the 'sev-snp-guest'
object, which can be used to configure an SEV-SNP guest. For example,
a default-configured SEV-SNP guest with no additional information
passed in for use with attestation:

  -object sev-snp-guest,id=sev0

or a fully-specified SEV-SNP guest where all spec-defined binary
blobs are passed in as base64-encoded strings:

  -object sev-snp-guest,id=sev0, \
    policy=0x30000, \
    init-flags=0, \
    id-block=YWFhYWFhYWFhYWFhYWFhCg==, \
    id-auth=CxHK/OKLkXGn/KpAC7Wl1FSiisWDbGTEKz..., \
    auth-key-enabled=on, \
    host-data=LNkCWBRC5CcdGXirbNUV1OrsR28s..., \
    guest-visible-workarounds=AA==, \

See the QAPI schema updates included in this patch for more usage
details.

In some cases these blobs may be up to 4096 characters, but this is
generally well below the default limit for linux hosts where
command-line sizes are defined by the sysconf-configurable ARG_MAX
value, which defaults to 2097152 characters for Ubuntu hosts, for
example.

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
When SEV-SNP is enabled, the KVM_SNP_INIT command is used to initialize
the platform. The command checks whether SNP is enabled in the KVM, if
enabled then it allocates a new ASID from the SNP pool and calls the
firmware to initialize the all the resources.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
The SNP_LAUNCH_START is called first to create a cryptographic launch
context within the firmware.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
The KVM_SEV_SNP_LAUNCH_UPDATE command is used for encrypting the bios
image used for booting the SEV-SNP guest.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
mdroth and others added 28 commits January 9, 2024 08:22
Current SNP guest kernels will attempt to access these regions with
with C-bit set, so guest_memfd is needed to handle that. Whether guests
should actually try to access ROM regions in this way (or need to deal
with legacy ROM regions at all), is a separate issue to be addressed on
kernel side, but current SNP kernels will exhibit this behavior and so
need some sort of handling.

Signed-off-by: Michael Roth <[email protected]>
Test kernel now uses hugepages by default.

Signed-off-by: Michael Roth <[email protected]>
SEV uses these to pin pages, SNP doesn't need this since it relies on
gmem for any private pages.

Signed-off-by: Michael Roth <[email protected]>
The IGVM library allows Independent Guest Virtual Machine files to be
parsed and processed. IGVM files are used to configure guest memory
layout, initial processor state and other configuration pertaining to
secure virtual machines.

This adds the --enable-igvm configure option, enabled by default, which
attempts to locate and link against the IGVM library via pkgconfig and
sets CONFIG_IGVM if found.

The library is added to the system_ss target in backends/meson.build
where the IGVM parsing will be performed by the ConfidentialGuestSupport
object.

Signed-off-by: Roy Hopkins <[email protected]>
In order to add support for parsing IGVM files for secure virtual
machines, a the path to an IGVM file needs to be specified as
part of the guest configuration. It makes sense to add this to
the ConfidentialGuestSupport object as this is common to all secure
virtual machines that potentially could support IGVM based
configuration.

This patch allows the filename to be configured via the QEMU
object model in preparation for subsequent patches that will read and
parse the IGVM file.

Signed-off-by: Roy Hopkins <[email protected]>
In preparation for supporting the processing of IGVM files to configure
guests, this adds a set of functions to ConfidentialGuestSupport
allowing configuration of secure virtual machines that can be
implemented for each supported isolation platform type such as Intel TDX
or AMD SEV-SNP. These functions will be called by IGVM processing code
in subsequent patches.

This commit provides a default implementation of the functions that
either perform no action or generate a warning or error when they are
called. Targets that support ConfidentalGuestSupport should override
these implementations.

Signed-off-by: Roy Hopkins <[email protected]>
This commit adds an implementation of an IGVM loader which parses the
file specified as a pararameter to ConfidentialGuestSupport and provides
a function that uses the interface in the same object to configure and
populate guest memory based on the contents of the file.

The IGVM file is parsed when a filename is provided but the code to
process the IGVM file is not yet hooked into target systems. This will
follow in a later commit.

Signed-off-by: Roy Hopkins <[email protected]>
An IGVM file contains configuration of a guest that supports
confidential computing hardware. As part of the PC system
initialization, the IGVM needs to be processed to apply this
configuration before the guest is started.

This patch introduces processing of a provided IGVM file at the end of
the current PC initialization steps. If an IGVM file has been provided
then the directives in the file are processed completing the
initialization of the target.

If no IGVM file has been specified by the user then no there is no
intended consequences in these changes.

Signed-off-by: Roy Hopkins <[email protected]>
…h IGVM

When using an IGVM file the configuration of the system firmware is
defined by IGVM directives contained in the file. In this case the user
should not configure any pflash devices.

This commit skips initialization of the ROM mode when pflash0 is not set
then checks to ensure no pflash devices have been configured when using
IGVM, exiting with an error message if this is not the case.

Signed-off-by: Roy Hopkins <[email protected]>
The segment registers should be updated via a call to
cpu_x86_load_seg_cache(). However, this only works for the general
purpose segment registers and not LDTR or TR. In these cases, the
function call results in an out-of-bounds access on the env->segs array.
Fortunately the out-of-bounds access actually overflows into the correct
segment registers.

This commit fixes the handling of those segment registers.

Signed-off-by: Roy Hopkins <[email protected]>
When an SEV guest is started, the reset vector and state are
extracted from metadata that is contained in the firmware volume.

In preparation for using IGVM to setup the initial CPU state,
the code has been refactored to populate vmcb_save_area for each
CPU which is then applied during guest startup and CPU reset.

Signed-off-by: Roy Hopkins <[email protected]>
The ConfidentialGuestSupport object defines a number of virtual
functions that are called during processing of IGVM directives to query
or configure initial guest state. In order to support processing of IGVM
files, these functions need to be implemented by relevant isolation
hardware support code such as SEV.

This commit implements the required functions for SEV-ES and adds
support for processing IGVM files for configuring the guest.

Signed-off-by: Roy Hopkins <[email protected]>
IGVM support has been implemented for Confidential Guests that support
AMD SEV and AMD SEV-ES. Add some documentation that gives some
background on the IGVM format and how to use it to configure a
confidential guest.

Signed-off-by: Roy Hopkins <[email protected]>
When sev_get_capabilities() is called on non-SEV platforms, an error is
correctly generated stating it is not supported. The function should
return NULL at this point but instead continues to further check for
supported features, subsequently causing a crash.

This commit adds the missing NULL return.

Signed-off-by: Roy Hopkins <[email protected]>
For confidential guests a policy can be provided that defines the
security level, debug status, expected launch measurement and other
parameters that define the configuration of the confidential platform.

This commit adds a new function named set_guest_policy() that can be
implemented by each confidential platform, such as AMD SEV to set the
policy. This will allow configuration of the policy from a
multi-platform resource such as an IGVM file without the IGVM processor
requiring specific implementation details for each platform.

Signed-off-by: Roy Hopkins <[email protected]>
The initialization secions in IGVM files contains configuration that
should be applied to the guest platform before it is started. This
includes guest policy and other information that can affect the security
level and measurement of a confidential guest.

This commit refactors the existing section processing code to prepare it
for handling initialization sections as well as directives and iterates
the initialization sections in the IGVM file.

Signed-off-by: Roy Hopkins <[email protected]>
Adds a handler for the guest policy initialization IGVM section and
builds an SEV policy based on this information and the ID block
directive if present. The policy is applied using by calling
'set_guest_policy()' on the ConfidentialGuestSupport object.

Signed-off-by: Roy Hopkins <[email protected]>
The new cgs_set_guest_policy() function is provided to receive the guest
policy flags, SNP ID block and SNP ID authentication from guest
configuration such as an IGVM file and apply it to the platform prior to
launching the guest.

The policy is used to populate values for the existing 'policy',
'id_block' and 'id_auth' parameters. When provided, the guest policy is
applied and the ID block configuration is used to verify the launch
measurement and signatures. The guest is only successfully started if
the expected launch measurements match the actual measurements and the
signatures are valid.

Signed-off-by: Roy Hopkins <[email protected]>
This fixes the handling of return codes from the
KVM_SEV_SNP_LAUNCH_FINISH ioctl. The ioctl returns a success code if the
command was successfully sent to the SEV PSP regardless of whether the
command itself was successful or not. Any error from the PSP is returned
instead via the '&error' parameter.

This commit checks 'error' to see if the processing resulted in an error
and reports it as required.

Signed-off-by: Roy Hopkins <[email protected]>
The synchronization between VMSA and CPU state did not include the GDT,
IDT, LDTR or TR registers. These registers had default, non-zero values,
therefore if the VMSA did not match these values then a difference in
launch measurement is observed.

This change adds synchronization of these registers and adds them to the
VMSA field check.

Signed-off-by: Roy Hopkins <[email protected]>
Fix compile error due to no return statement in function
returning non-void.

Signed-off-by: Vijay Dhanraj <[email protected]>
Fix compile error when built without --enable-igvm
…_idblock

Implement IGVM policy and ID block support
Always pair igvm_get_buffer() with igvm_free_buffer() before exiting the
loop in supported_platform_compat_mask().

Signed-off-by: Peter Fang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.