agent: retry mount on ENOMEM #788

Merged
merged 1 commit into from
May 23, 2024

Conversation

heynemax
Contributor

There is a race between firecracker-containerd replacing the stub drive with the actual drive and mounting this drive. When the disk is replaced, the guest kernel schedules asynchronous work in virtblk_config_changed. In the meantime, firecracker-containerd can proceed and send a mount command to the agent running in the guest before that work has completed. This mount operation will then fail because the guest kernel still sees the stub drive, which is only 512 bytes in size. The resulting error code in this case is ENOMEM. This commit therefore adds ENOMEM as a retryable error code to accommodate this situation.

The issue can be reproduced when an artificial msleep(1000) is added in virtblk_config_changed_work. This produced the following error:

error="failed to get stub drive for task "test": failed to mount
drive inside vm: failed to mount newly patched drive: rpc error: code
= Unknown desc = non-retryable failure mounting drive from
"/dev/vdb" to "/container/test/rootfs": cannot allocate memory"

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Signed-off-by: Maximilian Heyne <[email protected]>
@heynemax heynemax requested a review from a team as a code owner May 22, 2024 13:59
Member

@henry118 henry118 left a comment

LGTM

@henry118 henry118 merged commit f712b69 into firecracker-microvm:main May 23, 2024
5 checks passed