
virtio: Enable full disk caching #76

Merged (1 commit) on Jan 9, 2024

Conversation

cgwalters (Contributor)

We're seeing highly reproducible disk corruption in podman machine with the default configuration, and this change fixes it for me.

This looks like the same issue addressed by lima-vm/lima@488c95c.

openshift-ci bot commented Jan 6, 2024

Hi @cgwalters. Thanks for your PR.

I'm waiting for a crc-org member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@@ -275,7 +275,9 @@ func (config *StorageConfig) toVz() (vz.StorageDeviceAttachment, error) {
 	if config.ImagePath == "" {
 		return nil, fmt.Errorf("missing mandatory 'path' option for %s device", config.DevName)
 	}
-	return vz.NewDiskImageStorageDeviceAttachment(config.ImagePath, config.ReadOnly)
+	syncMode := vz.DiskImageSynchronizationModeFsync
cgwalters (Contributor, Author)

In the future it would make sense to plumb these options through as API as well, but given the current foot-gun nature of things, let's hardcode them for now.
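For context, the diff excerpt above is cut off at the line the review comment is attached to. Based on the vz Go bindings vfkit uses (Code-Hex/vz), the hardcoded attachment construction plausibly ends up looking roughly like the sketch below; the Cached caching mode and the NewDiskImageStorageDeviceAttachmentWithCacheAndSync constructor are my assumptions based on the PR title and the lima change it references, not something visible in the excerpt.

```go
package storage

import (
	"github.com/Code-Hex/vz/v3"
)

// newDiskAttachment is a hypothetical helper mirroring what StorageConfig.toVz()
// plausibly does after this change: attach the disk image with host page
// caching enabled and with guest flush requests translated into fsync(2).
// Only the Fsync synchronization mode appears in the visible diff; the Cached
// caching mode and the ...WithCacheAndSync constructor are assumptions.
func newDiskAttachment(imagePath string, readOnly bool) (*vz.DiskImageStorageDeviceAttachment, error) {
	syncMode := vz.DiskImageSynchronizationModeFsync // flush to disk when the guest asks for it
	caching := vz.DiskImageCachingModeCached         // "full disk caching", per the PR title
	return vz.NewDiskImageStorageDeviceAttachmentWithCacheAndSync(imagePath, readOnly, caching, syncMode)
}
```

(The vz package builds only on macOS, since it wraps Virtualization.framework.)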

@cgwalters (Contributor, Author)

We had an in-person chat about this, and I can say definitively that I can't reproduce any corruption with this change. I tried some I/O stress testing and things seemed fine. There's a lot of discussion related to this in utmapp/UTM#4840, by the way.
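For anyone who wants to run a similar sanity check inside a guest, a minimal write-and-verify pass might look like the sketch below. This is purely illustrative (the file path, block size, and count are arbitrary), not the stress testing described above; tools like fio or stress-ng are the more serious option.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"math/rand"
	"os"
)

// Write deterministic pseudo-random blocks to a file on the guest's virtio
// disk, fsync, then re-read and compare. Any mismatch indicates corruption.
func main() {
	const (
		path      = "/var/tmp/io-check.dat" // hypothetical target on the guest disk
		blockSize = 1 << 20                 // 1 MiB per block
		blocks    = 256                     // 256 MiB total
	)

	f, err := os.Create(path)
	if err != nil {
		panic(err)
	}
	rng := rand.New(rand.NewSource(42)) // fixed seed so the stream is reproducible
	buf := make([]byte, blockSize)
	for i := 0; i < blocks; i++ {
		rng.Read(buf)
		if _, err := f.Write(buf); err != nil {
			panic(err)
		}
	}
	if err := f.Sync(); err != nil { // the flush that the Fsync sync mode should honor
		panic(err)
	}
	f.Close()

	// Re-read and compare against the same pseudo-random stream.
	f, err = os.Open(path)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	rng = rand.New(rand.NewSource(42))
	want := make([]byte, blockSize)
	got := make([]byte, blockSize)
	for i := 0; i < blocks; i++ {
		rng.Read(want)
		if _, err := io.ReadFull(f, got); err != nil {
			panic(err)
		}
		if !bytes.Equal(want, got) {
			panic(fmt.Sprintf("corruption detected in block %d", i))
		}
	}
	fmt.Println("all blocks verified")
}
```

In practice you would also force-stop and restart the VM between the write and the verify pass to exercise the flush path; the sketch only covers an in-guest read-back.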

cgwalters added a commit to cgwalters/podman that referenced this pull request Jan 9, 2024
This depends on crc-org/vfkit#78 and is an alternative to crc-org/vfkit#76 that I like better for fixing containers#21160.

It looks like UTM, for example, switched to NVMe by default for Linux guests.

[NO NEW TESTS NEEDED]

Signed-off-by: Colin Walters <[email protected]>
@gbraad commented Jan 9, 2024

Thanks. We also spoke with Sergio Lopez, and he confirmed that this is most likely caused by caching/not flushing in time. We have had several reports that this fix works.

openshift-ci bot commented Jan 9, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gbraad
Once this PR has been reviewed and has the lgtm label, please assign baude for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gbraad commented Jan 9, 2024

/ok-to-test
/lgtm

gbraad merged commit 732de99 into crc-org:main on Jan 9, 2024 (4 of 5 checks passed)
@cfergeau (Collaborator) commented Jan 9, 2024

I was wondering about the performance impact, but the tests in lima-vm/lima#2026 (comment) say there is no impact.

@cfergeau (Collaborator) commented Jan 9, 2024

lima-vm/lima#1957 and utmapp/UTM#4840 contain a lot of useful information.

@jorhett commented Jan 13, 2024

This is a very important fix. When can we expect a release?

@cfergeau (Collaborator)

> This is a very important fix. When can we expect a release?

I'm aiming to cut a release this week. In the meantime, I've already added the patch to this brew recipe: https://github.com/cfergeau/homebrew-crc/blob/main/vfkit.rb

Are you also hitting this bug?

@jorhett commented Jan 16, 2024

> Are you also hitting this bug?

No. I just recently discovered, from a comment on another bug, that applehv support is now available in Podman. I was poking around to see why it wasn't announced or visible in the docs (other than a mention of it as a valid provider) and stumbled on this bug. I figured I should wait for this fix to be in a release before recommending that our engineers give it a try. If it's already in the brew recipe, that may suffice.

What's your feeling about stability versus QEMU? Would you turn a few hundred engineers loose on this?

@cfergeau (Collaborator)

> What's your feeling about stability versus QEMU? Would you turn a few hundred engineers loose on this?

Podman's applehv support is still being worked on, which is why it's not on by default ;)
There's also containers/gvisor-tap-vsock#309, which is being fixed; it causes failures in some cases on podman machine start.

@gbraad commented Jan 17, 2024

I wanted to add a comment in answer to:

> just recently discovered from a comment on another bug that applehv support was now available in Podman

but saw Christophe already did. Some race conditions with applehv are still being fixed, as the implementation differs slightly from CRC's. We hope to resolve this soon and converge; just note that this also needs a gvproxy release, which is coordinated more closely with Podman's release schedule.
