We've been running into trouble while trying to boot Flatcar on AWS instances which have an instance store. I've seen ignition-disks.service fail and I've seen GRUB itself fail to select a partition to boot. We've been using the same version of Flatcar for months and only started seeing these failures after switching to instances with an instance store. I'm filing this here mainly for visibility. We're switching back to EBS-only instances since the performance of these stores is, uh, not great (p99.9 latency of 17 s).
Environment and steps to reproduce
Boot Flatcar 3941.1.0 on an m6gd.medium or c5d.large AWS instance in us-west-2
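For anyone who wants to script the repro step above, here is a minimal Python sketch using boto3's EC2 client. It assumes boto3 is installed and AWS credentials are configured. The AMI IDs and the config file name are hypothetical placeholders, not values from this report. Since m6gd is an arm64 (Graviton) family and c5d is x86_64, each instance type needs the Flatcar AMI for its own architecture.

```python
# Hypothetical repro helper: launches one Flatcar instance per instance type
# from the report. Assumes boto3 is installed and AWS credentials are set up.

REGION = "us-west-2"

# Instance types from the report: m6gd is Graviton (arm64), c5d is x86_64.
INSTANCE_ARCH = {
    "m6gd.medium": "arm64",
    "c5d.large": "x86_64",
}

# Placeholder AMI IDs (hypothetical): substitute the Flatcar 3941.1.0 AMIs
# for each architecture in us-west-2.
AMI_BY_ARCH = {
    "arm64": "ami-PLACEHOLDER-ARM64",
    "x86_64": "ami-PLACEHOLDER-X86-64",
}


def build_launch_params(instance_type: str, ignition_config: str) -> dict:
    """Keyword arguments for one EC2 run_instances call."""
    arch = INSTANCE_ARCH[instance_type]
    return {
        "ImageId": AMI_BY_ARCH[arch],
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # Flatcar on EC2 reads its Ignition config from instance user data.
        "UserData": ignition_config,
    }


def launch_test_instances(config_path: str) -> list[str]:
    """Launch one instance per type and return the new instance IDs."""
    import boto3  # deferred so build_launch_params works without boto3

    ec2 = boto3.client("ec2", region_name=REGION)
    with open(config_path) as f:
        ignition_config = f.read()
    instance_ids = []
    for instance_type in INSTANCE_ARCH:
        resp = ec2.run_instances(**build_launch_params(instance_type, ignition_config))
        instance_ids.append(resp["Instances"][0]["InstanceId"])
    return instance_ids
```

Calling `launch_test_instances("config.ign")` (file name hypothetical) several times and checking each instance's console output would approximate the repro described here.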
Expected behavior
It boots successfully.
Actual behavior
There are two different manifestations. Sometimes we see:
error: file `/flatcar/grub/arm64-efi/all_video.mod' not found.
error: no such device: OEM.
error: invalid GPT signature.
Reading or updating the GPT failed!
Please file a bug with any messages above to Flatcar:
https://issues.flatcar.org/
Aborted. Press enter to exit GRUB.
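The "invalid GPT signature" message means GRUB did not find a valid GPT header on the disk it read. A minimal Python sketch of that check, for inspecting a raw disk or image by hand: the primary GPT header sits at LBA 1, starts with the 8-byte signature `EFI PART`, stores its own size at byte offset 12 and its CRC32 at offset 16. The function name is hypothetical, and it assumes 512-byte logical sectors unless told otherwise.

```python
import struct
import zlib

GPT_SIGNATURE = b"EFI PART"


def check_gpt_header(disk: bytes, sector_size: int = 512) -> str:
    """Return a short verdict on the primary GPT header in raw disk bytes."""
    # The primary GPT header lives in LBA 1.
    header = disk[sector_size : 2 * sector_size]
    if len(header) < 92:
        return "too short"
    if header[:8] != GPT_SIGNATURE:
        return "invalid GPT signature"
    header_size, stored_crc = struct.unpack_from("<II", header, 12)
    if header_size < 92 or header_size > sector_size:
        return "bad header size"
    # The CRC32 covers header_size bytes with the CRC field itself zeroed.
    zeroed = header[:16] + b"\x00\x00\x00\x00" + header[20:header_size]
    if zlib.crc32(zeroed) & 0xFFFFFFFF != stored_crc:
        return "header CRC mismatch"
    return "ok"
```

For example, `check_gpt_header(open(dev, "rb").read(4096))`, where `dev` is whichever NVMe device is the boot disk, distinguishes a missing header from a corrupted one. Only the primary header is checked here; the backup header at the end of the disk is not.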
And if GRUB finishes, we sometimes get stuck in the initrd with the following failure:
# systemctl status --failed --no-pager -l
× ignition-disks.service - Ignition (disks)
Loaded: loaded (/usr/lib/systemd/system/ignition-disks.service; static)
Active: failed (Result: signal) since Wed 2024-11-13 19:49:53 UTC; 8min ago
Docs: https://github.com/coreos/ignition
Process: 1195 ExecStart=/usr/bin/ignition --root=/sysroot --platform=${PLATFORM_ID} --stage=disks (code=killed, signal=TERM)
Main PID: 1195 (code=killed, signal=TERM)
Nov 13 19:49:53 localhost ignition[1195]: disks: createPartitions: created device alias for "/dev/nvme1n1": "/run/ignition/dev_aliases/dev/nvme1n1" -> "/dev/nvme1n1"
Nov 13 19:49:53 localhost ignition[1195]: disks: createPartitions: op(2): [started] partitioning "/run/ignition/dev_aliases/dev/nvme1n1"
Nov 13 19:49:53 localhost ignition[1195]: disks: createPartitions: op(2): wiping partition table requested on "/run/ignition/dev_aliases/dev/nvme1n1"
Nov 13 19:49:53 localhost ignition[1195]: disks: createPartitions: op(2): running sgdisk with options: [--zap-all /run/ignition/dev_aliases/dev/nvme1n1]
Nov 13 19:49:53 localhost ignition[1195]: disks: createPartitions: op(2): op(3): [started] deleting 0 partitions and creating 0 partitions on "/run/ignition/dev_aliases/dev/nvme1n1"
Nov 13 19:49:53 localhost ignition[1195]: disks: createPartitions: op(2): op(3): executing: "sgdisk" "--zap-all" "/run/ignition/dev_aliases/dev/nvme1n1"
Nov 13 19:49:53 localhost systemd[1]: ignition-disks.service: Main process exited, code=killed, status=15/TERM
Nov 13 19:49:53 localhost systemd[1]: ignition-disks.service: Failed with result 'signal'.
Nov 13 19:49:53 localhost systemd[1]: Stopped ignition-disks.service - Ignition (disks).
Nov 13 19:49:53 localhost systemd[1]: ignition-disks.service: Triggering OnFailure= dependencies.
(/dev/nvme1n1 is the instance store)
I'd estimate that we see one of these two failures 10% of the time.
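At roughly one failure in ten boots, a single successful test boot says little. Assuming each boot fails independently with probability 0.1 (an assumption; the figure above is only a rough estimate), the chance of seeing no failure in n boots is (1 - p)^n, so a repro attempt needs the smallest n with (1 - p)^n ≤ 1 - confidence. A small sketch of that arithmetic:

```python
import math


def boots_needed(failure_rate: float, confidence: float) -> int:
    """Smallest n with P(at least one failure in n boots) >= confidence.

    Assumes boots fail independently with the same probability.
    """
    # P(no failure in n boots) = (1 - p)^n; solve (1 - p)^n <= 1 - confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


print(boots_needed(0.10, 0.95))  # → 29
```

So about 29 boots would be needed to have a 95% chance of hitting at least one of these failures.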
Thanks @crawford for the report. Do you by any chance have a Terraform snippet for reproducing this? I'd be interested to see whether we can reproduce it with Fedora CoreOS too, as I suspect there are two issues: one with Ignition and the other with the boot itself.
@tormath1 I don't, unfortunately. I think my days of Terraform are behind me. Here's the relevant portion of the Ignition config that we've been using though (sorry for forgetting to include it):