invalid GPT signature when booting with AWS instance store #1581

Open
crawford opened this issue Nov 13, 2024 · 2 comments
Labels
kind/bug Something isn't working platform/AWS

Comments

@crawford

Description

We've been running into trouble while trying to boot Flatcar on AWS instances which have an instance store. I've seen ignition-disks.service fail and I've seen GRUB itself fail to select a partition to boot. We've been using the same version of Flatcar for months and only started seeing these failures after switching to instances with an instance store. I'm filing this here mainly for visibility. We're switching back to EBS-only instances since the performance of these stores is, uh, not great (p99.9 latency of 17 s).

Environment and steps to reproduce

  1. Boot Flatcar 3941.1.0 on an m6gd.medium or c5d.large AWS instance in us-west-2

Expected behavior

It boots successfully.

Actual behavior

There are two different manifestations. Sometimes we see:

error: file `/flatcar/grub/arm64-efi/all_video.mod' not found.
error: no such device: OEM.

error: invalid GPT signature.
Reading or updating the GPT failed!
Please file a bug with any messages above to Flatcar:

 https://issues.flatcar.org/

Aborted. Press enter to exit GRUB.

And if GRUB finishes, we sometimes get stuck in the initrd with the following failure:

# systemctl status --failed --no-pager -l
× ignition-disks.service - Ignition (disks)
     Loaded: loaded (/usr/lib/systemd/system/ignition-disks.service; static)
     Active: failed (Result: signal) since Wed 2024-11-13 19:49:53 UTC; 8min ago
       Docs: https://github.com/coreos/ignition
    Process: 1195 ExecStart=/usr/bin/ignition --root=/sysroot --platform=${PLATFORM_ID} --stage=disks (code=killed, signal=TERM)
   Main PID: 1195 (code=killed, signal=TERM)

Nov 13 19:49:53 localhost ignition[1195]: disks: createPartitions: created device alias for "/dev/nvme1n1": "/run/ignition/dev_aliases/dev/nvme1n1" -> "/dev/nvme1n1"
Nov 13 19:49:53 localhost ignition[1195]: disks: createPartitions: op(2): [started]  partitioning "/run/ignition/dev_aliases/dev/nvme1n1"
Nov 13 19:49:53 localhost ignition[1195]: disks: createPartitions: op(2): wiping partition table requested on "/run/ignition/dev_aliases/dev/nvme1n1"
Nov 13 19:49:53 localhost ignition[1195]: disks: createPartitions: op(2): running sgdisk with options: [--zap-all /run/ignition/dev_aliases/dev/nvme1n1]
Nov 13 19:49:53 localhost ignition[1195]: disks: createPartitions: op(2): op(3): [started]  deleting 0 partitions and creating 0 partitions on "/run/ignition/dev_aliases/dev/nvme1n1"
Nov 13 19:49:53 localhost ignition[1195]: disks: createPartitions: op(2): op(3): executing: "sgdisk" "--zap-all" "/run/ignition/dev_aliases/dev/nvme1n1"
Nov 13 19:49:53 localhost systemd[1]: ignition-disks.service: Main process exited, code=killed, status=15/TERM
Nov 13 19:49:53 localhost systemd[1]: ignition-disks.service: Failed with result 'signal'.
Nov 13 19:49:53 localhost systemd[1]: Stopped ignition-disks.service - Ignition (disks).
Nov 13 19:49:53 localhost systemd[1]: ignition-disks.service: Triggering OnFailure= dependencies.

(/dev/nvme1n1 is the instance store)

I'd estimate that we see one of these two failures 10% of the time.
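For anyone trying to confirm which NVMe device is actually the instance store on a given boot, here is a minimal Python sketch. It rests on assumptions that are not from this report: that the sysfs model attribute at /sys/block/<dev>/device/model reads "Amazon EC2 NVMe Instance Storage" for instance store volumes (EBS volumes report "Amazon Elastic Block Store"), and that the helper names are purely illustrative. It does not establish that device naming is the cause of either failure; it only makes it possible to check whether /dev/nvme1n1 is the device you expect.

```python
from pathlib import Path

# Assumption: model string reported by AWS instance store NVMe devices.
INSTANCE_STORE_MODEL = "Amazon EC2 NVMe Instance Storage"


def instance_store_devices(models):
    """Return the sorted device names whose model marks them as instance store.

    `models` maps a block device name (e.g. "nvme1n1") to its raw model string.
    """
    return sorted(
        name
        for name, model in models.items()
        if model.strip() == INSTANCE_STORE_MODEL
    )


def read_nvme_models(sys_block=Path("/sys/block")):
    """Collect the model string of every NVMe namespace found under sysfs."""
    models = {}
    for dev in sys_block.glob("nvme*n*"):
        model_file = dev / "device" / "model"
        if model_file.is_file():
            models[dev.name] = model_file.read_text()
    return models


if __name__ == "__main__":
    # On a non-AWS machine this simply prints an empty list.
    print(instance_store_devices(read_nvme_models()))
```

Logging this output across a few boots would show whether the instance store always appears as nvme1n1 or not.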

@tormath1
Contributor

Thanks @crawford for the report. Do you by any chance have a Terraform snippet for the repro? I'd be interested to see whether we can reproduce this with Fedora CoreOS too, as I guess there are two issues: one with Ignition and the other with the boot itself.

@crawford
Author

@tormath1 I don't, unfortunately. I think my days of Terraform are behind me. Here's the relevant portion of the Ignition config that we've been using though (sorry for forgetting to include it):

{
  "ignition": {
    "version": "3.4.0"
  },
  "storage": {
    "disks": [{
      "device": "/dev/nvme1n1",
      "partitions": [{
        "label": "SWAP",
        "sizeMiB": 10240
      }, {
        "label": "DOCKER"
      }],
      "wipeTable": true
    }],
    "filesystems": [{
      "device": "/dev/disk/by-partlabel/SWAP",
      "format": "swap",
      "label": "SWAP",
      "wipeFilesystem": true
    }, {
      "device": "/dev/disk/by-partlabel/DOCKER",
      "format": "btrfs",
      "label": "DOCKER",
      "wipeFilesystem": true
    }]
  },
  "systemd": {
    "units": [{
      "contents": "\n[Mount]\nWhat=/dev/disk/by-label/DOCKER\nWhere=/var/lib/docker\n\n[Install]\nWantedBy=local-fs.target\n",
      "enabled": true,
      "name": "var-lib-docker.mount"
    }, {
      "mask": true,
      "name": "update-engine.service"
    }, {
      "mask": true,
      "name": "locksmithd.service"
    }, {
      "mask": true,
      "name": "sshkeys.service"
    }, {
      "mask": true,
      "name": "amazon-ssm-agent.service"
    }]
  }
}
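As a quick sanity check before booting with a config like this, a small Python sketch can parse the JSON and list every disk whose partition table Ignition is asked to wipe. The function name `wiped_disks` is hypothetical and this is not part of Ignition; it only reads the `storage.disks[].device` and `wipeTable` fields that appear in the config above. `json.loads` raises `json.JSONDecodeError` on malformed input, so it also catches paste errors.

```python
import json


def wiped_disks(config_text):
    """Return the device paths of all disks that have wipeTable set to true."""
    config = json.loads(config_text)  # raises json.JSONDecodeError if malformed
    disks = config.get("storage", {}).get("disks", [])
    return [disk["device"] for disk in disks if disk.get("wipeTable")]


if __name__ == "__main__":
    example = """
    {
      "ignition": {"version": "3.4.0"},
      "storage": {
        "disks": [{"device": "/dev/nvme1n1", "wipeTable": true}]
      }
    }
    """
    print(wiped_disks(example))
```

Any path this prints will be passed to `sgdisk --zap-all` during the disks stage, as the log above shows, so it is worth confirming that it points at the intended device.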

Projects
Status: 📝 Needs Triage