Machines in a bare-metal cluster freeze randomly with no reaction until rebooted #2632

smattheis · 2019-11-11T18:42:23Z

Issue Report

Bug

In a bare-metal cluster of 24 desktop machines that boot CoreOS via PXE, machines freeze at random from time to time. Every machine has been affected at some point, and the failure looks exactly the same on each one. See below for a detailed description of the freeze.

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=2317.0.1
VERSION_ID=2317.0.1
BUILD_ID=2019-11-06-2121
PRETTY_NAME="Container Linux by CoreOS 2317.0.1 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

The cluster is configured to host Kubernetes. All 24 machines have an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 32 GB of memory, and a 220 GB SSD. The machines are booted via PXE with the following Ignition config:

passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-rsa AAAAB3<...> core
storage:
  disks:
    - device: /dev/sda
      wipe_table: true
      partitions:
      - label: ROOT
  filesystems:
    - mount:
        device: /dev/disk/by-partlabel/ROOT
        format: ext4
        wipe_filesystem: true
        label: ROOT
  directories:
    - path: /var/log/journal
      filesystem: root
      mode: 0755
  files:
    - path: /etc/systemd/timesyncd.conf
      filesystem: root
      mode: 0644
      contents:
        inline: |
          [Time]
          NTP=172.16.0.1
    - path: /etc/docker/daemon.json
      filesystem: root
      mode: 0644
      contents:
        inline: |
          {
            "exec-opts": ["native.cgroupdriver=systemd"],
            "insecure-registries" : ["registry.lab:5000"],
            "log-driver": "json-file",
            "log-opts": {
              "max-size": "100m"
            },
            "storage-driver": "overlay2"
          }
    - path: /home/core/.toolboxrc
      filesystem: root
      mode: 0644
      contents:
        inline: |
          #TOOLBOX_DOCKER_IMAGE=registry.lab:5000/toolbox
          #TOOLBOX_DOCKER_TAG=latest
systemd:
  units:
    - name: docker.service
      enabled: true

Expected Behavior

No freezes.

Actual Behavior

When a machine freezes, it stays powered on and the screen shows fully static output (usually some log messages and the login screen). There is no network connectivity, although the NIC is powered and blinks, and the machine does not react to plugging in a keyboard or to keyboard input. Only after a reboot does the machine come up again and behave normally. journald logging stops completely at the moment of the freeze, and there is no kernel core dump or journal entry that indicates the problem.

Reproduction Steps

I don't know whether anyone else can reproduce the problem. However, I can reproduce the failure with about 95% probability by deploying an Apache Cassandra cluster on the Kubernetes cluster and running a standard batch data ingest of a few gigabytes, which takes about one hour. One or two machines usually fail as described during that ingest. Since I have no idea how to proceed, I need some help or guidance to find the problem.

Other Information

I have tried various configurations and options:

I have tried the following versions of CoreOS: 1520.9.0, 2079.3.0, 2247.5.0, 2303.0.0, 2317.0.1 (The problem is the same for all of them.)
I have tried different cgroup drivers for Docker: systemd and cgroupfs, as recommended by Kubernetes. (The failure occurs with both.)
I have tried different CNI plugins for Kubernetes: flannel and calico. (The failure occurs with both.)
I have added the debug boot option to get some information about the failure. However, I can't find any suspicious messages. A snippet of the journal log follows; the last message before the reboot marks roughly the time of the freeze:

journalctl --since "2019-11-08 18:42:29" --lines 250

-- Logs begin at Fri 2019-11-08 16:23:25 UTC, end at Mon 2019-11-11 18:25:28 UTC. --
Nov 08 18:42:29 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:42:29 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:42:59 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:42:59 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:43:22 node-17 systemd[1]: systemd-resolved.service: Got notification message from PID 688 (WATCHDOG=1)
Nov 08 18:43:24 node-17 systemd-networkd[591]: DHCP CLIENT (0x3ed4862b): DISCOVER
Nov 08 18:43:24 node-17 systemd[1]: systemd-networkd.service: Got notification message from PID 591 (WATCHDOG=1)
Nov 08 18:43:29 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:43:29 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:43:29 node-17 systemd[1]: systemd-journald.service: Got notification message from PID 558 (WATCHDOG=1)
Nov 08 18:43:59 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:43:59 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:44:22 node-17 systemd[1]: systemd-logind.service: Got notification message from PID 724 (WATCHDOG=1)
Nov 08 18:44:22 node-17 systemd[1]: systemd-udevd.service: Got notification message from PID 581 (WATCHDOG=1)
Nov 08 18:44:22 node-17 systemd[1]: systemd-timesyncd.service: Got notification message from PID 689 (WATCHDOG=1)
Nov 08 18:44:28 node-17 systemd-networkd[591]: DHCP CLIENT (0x3ed4862b): DISCOVER
Nov 08 18:44:28 node-17 systemd[1]: systemd-networkd.service: Got notification message from PID 591 (WATCHDOG=1)
Nov 08 18:44:29 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:44:29 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:44:59 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:44:59 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:44:59 node-17 systemd[1]: systemd-journald.service: Got notification message from PID 558 (WATCHDOG=1)
Nov 08 18:45:22 node-17 systemd[1]: systemd-resolved.service: Got notification message from PID 688 (WATCHDOG=1)
Nov 08 18:45:29 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:45:29 node-17 systemd[1]: systemd-networkd.service: Got notification message from PID 591 (WATCHDOG=1)
Nov 08 18:45:29 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:45:33 node-17 systemd-networkd[591]: DHCP CLIENT (0x3ed4862b): DISCOVER
Nov 08 18:45:59 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:45:59 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:46:22 node-17 systemd[1]: systemd-logind.service: Got notification message from PID 724 (WATCHDOG=1)
Nov 08 18:46:22 node-17 systemd[1]: systemd-udevd.service: Got notification message from PID 581 (WATCHDOG=1)
Nov 08 18:46:22 node-17 systemd[1]: systemd-timesyncd.service: Got notification message from PID 689 (WATCHDOG=1)
Nov 08 18:46:29 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:46:29 node-17 systemd[1]: systemd-networkd.service: Got notification message from PID 591 (WATCHDOG=1)
Nov 08 18:46:29 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:46:29 node-17 systemd[1]: systemd-journald.service: Got notification message from PID 558 (WATCHDOG=1)
Nov 08 18:46:37 node-17 systemd-networkd[591]: DHCP CLIENT (0x3ed4862b): DISCOVER
Nov 08 18:46:59 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:46:59 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:47:22 node-17 systemd[1]: systemd-resolved.service: Got notification message from PID 688 (WATCHDOG=1)
Nov 08 18:47:29 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:47:29 node-17 systemd[1]: systemd-networkd.service: Got notification message from PID 591 (WATCHDOG=1)
Nov 08 18:47:29 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:47:40 node-17 systemd-networkd[591]: DHCP CLIENT (0x3ed4862b): DISCOVER
Nov 08 18:47:59 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:47:59 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:47:59 node-17 systemd[1]: systemd-journald.service: Got notification message from PID 558 (WATCHDOG=1)
Nov 08 18:48:22 node-17 systemd[1]: systemd-logind.service: Got notification message from PID 724 (WATCHDOG=1)
Nov 08 18:48:22 node-17 systemd[1]: systemd-timesyncd.service: Got notification message from PID 689 (WATCHDOG=1)
Nov 08 18:48:22 node-17 systemd[1]: systemd-udevd.service: Got notification message from PID 581 (WATCHDOG=1)
Nov 08 18:48:29 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:48:29 node-17 systemd[1]: systemd-networkd.service: Got notification message from PID 591 (WATCHDOG=1)
Nov 08 18:48:29 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:48:44 node-17 systemd-networkd[591]: DHCP CLIENT (0x3ed4862b): DISCOVER
Nov 08 18:48:59 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:48:59 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:49:22 node-17 systemd[1]: systemd-resolved.service: Got notification message from PID 688 (WATCHDOG=1)
Nov 08 18:49:29 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:49:29 node-17 systemd[1]: systemd-networkd.service: Got notification message from PID 591 (WATCHDOG=1)
Nov 08 18:49:29 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:49:29 node-17 systemd[1]: systemd-journald.service: Got notification message from PID 558 (WATCHDOG=1)
Nov 08 18:49:47 node-17 systemd-networkd[591]: DHCP CLIENT (0x3ed4862b): DISCOVER
Nov 08 18:49:59 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:49:59 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:50:22 node-17 systemd[1]: systemd-logind.service: Got notification message from PID 724 (WATCHDOG=1)
Nov 08 18:50:22 node-17 systemd[1]: systemd-udevd.service: Got notification message from PID 581 (WATCHDOG=1)
Nov 08 18:50:22 node-17 systemd[1]: systemd-timesyncd.service: Got notification message from PID 689 (WATCHDOG=1)
Nov 08 18:50:29 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:50:29 node-17 systemd[1]: systemd-networkd.service: Got notification message from PID 591 (WATCHDOG=1)
Nov 08 18:50:29 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:50:52 node-17 systemd-networkd[591]: DHCP CLIENT (0x3ed4862b): DISCOVER
Nov 08 18:50:59 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:50:59 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:50:59 node-17 systemd[1]: systemd-journald.service: Got notification message from PID 558 (WATCHDOG=1)
Nov 08 18:51:22 node-17 systemd[1]: systemd-resolved.service: Got notification message from PID 688 (WATCHDOG=1)
Nov 08 18:51:29 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:51:29 node-17 systemd[1]: systemd-networkd.service: Got notification message from PID 591 (WATCHDOG=1)
Nov 08 18:51:29 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:51:56 node-17 systemd-networkd[591]: DHCP CLIENT (0x3ed4862b): DISCOVER
Nov 08 18:51:59 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:51:59 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:52:22 node-17 systemd[1]: systemd-logind.service: Got notification message from PID 724 (WATCHDOG=1)
Nov 08 18:52:22 node-17 systemd[1]: systemd-udevd.service: Got notification message from PID 581 (WATCHDOG=1)
Nov 08 18:52:22 node-17 systemd[1]: systemd-timesyncd.service: Got notification message from PID 689 (WATCHDOG=1)
Nov 08 18:52:29 node-17 systemd-networkd[591]: LLDP: Invoking callback for 'refreshed' event.
Nov 08 18:52:29 node-17 systemd[1]: systemd-networkd.service: Got notification message from PID 591 (WATCHDOG=1)
Nov 08 18:52:29 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.
Nov 08 18:52:29 node-17 systemd[1]: systemd-journald.service: Got notification message from PID 558 (WATCHDOG=1)
-- Reboot --
Nov 11 16:54:08 localhost kernel: microcode: microcode updated early to revision 0x21, date = 2019-02-13
Nov 11 16:54:08 localhost kernel: Linux version 4.19.81-coreos-r1 (jenkins@ip-10-7-32-103) (gcc version 8.3.0 (Gentoo Hardened 8.3.0-r1 p1.1)) #1 SMP Wed Nov 6 20:47:30 -00 2019
Nov 11 16:54:08 localhost kernel: Command line: BOOT_IMAGE=coreos_production_pxe.vmlinuz root=LABEL=ROOT coreos.config.url=http://172.16.0.1:8080/pxe/config.ign debug initrd=coreos_production_pxe_image.cpio.gz
Nov 11 16:54:08 localhost kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Nov 11 16:54:08 localhost kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Nov 11 16:54:08 localhost kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Nov 11 16:54:08 localhost kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Nov 11 16:54:08 localhost kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
Nov 11 16:54:08 localhost kernel: BIOS-provided physical RAM map:
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009e7ff] usable
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x000000000009e800-0x000000000009ffff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x0000000000100000-0x000000001fffffff] usable
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x0000000020000000-0x00000000201fffff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x0000000020200000-0x0000000040003fff] usable
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x0000000040004000-0x0000000040004fff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x0000000040005000-0x00000000cc603fff] usable
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000cc604000-0x00000000cca86fff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000cca87000-0x00000000cca97fff] ACPI data
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000cca98000-0x00000000ccbbffff] ACPI NVS
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000ccbc0000-0x00000000cd7f3fff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000cd7f4000-0x00000000cd7f4fff] usable
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000cd7f5000-0x00000000cd837fff] ACPI NVS
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000cd838000-0x00000000cdc5dfff] usable
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000cdc5e000-0x00000000cdff3fff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000cdff4000-0x00000000cdffffff] usable
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000cf000000-0x00000000df1fffff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000fed00000-0x00000000fed03fff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
Nov 11 16:54:08 localhost kernel: BIOS-e820: [mem 0x0000000100000000-0x000000081fdfffff] usable
Nov 11 16:54:08 localhost kernel: NX (Execute Disable) protection: active
Nov 11 16:54:08 localhost kernel: SMBIOS 2.7 present.
Nov 11 16:54:08 localhost kernel: DMI: System manufacturer System Product Name/P8H77-M PRO, BIOS 1101 02/04/2013
Nov 11 16:54:08 localhost kernel: tsc: Fast TSC calibration using PIT
Nov 11 16:54:08 localhost kernel: tsc: Detected 3400.325 MHz processor
Nov 11 16:54:08 localhost kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Nov 11 16:54:08 localhost kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
Nov 11 16:54:08 localhost kernel: last_pfn = 0x81fe00 max_arch_pfn = 0x400000000
Nov 11 16:54:08 localhost kernel: MTRR default type: uncachable
Nov 11 16:54:08 localhost kernel: MTRR fixed ranges enabled:
Nov 11 16:54:08 localhost kernel:   00000-9FFFF write-back
Nov 11 16:54:08 localhost kernel:   A0000-BFFFF uncachable
Nov 11 16:54:08 localhost kernel:   C0000-D3FFF write-protect
Nov 11 16:54:08 localhost kernel:   D4000-E7FFF uncachable
Nov 11 16:54:08 localhost kernel:   E8000-FFFFF write-protect
Nov 11 16:54:08 localhost kernel: MTRR variable ranges enabled:
Nov 11 16:54:08 localhost kernel:   0 base 000000000 mask 800000000 write-back
Nov 11 16:54:08 localhost kernel:   1 base 800000000 mask FE0000000 write-back
Nov 11 16:54:08 localhost kernel:   2 base 0E0000000 mask FE0000000 uncachable
Nov 11 16:54:08 localhost kernel:   3 base 0D0000000 mask FF0000000 uncachable
Nov 11 16:54:08 localhost kernel:   4 base 0CF000000 mask FFF000000 uncachable
Nov 11 16:54:08 localhost kernel:   5 base 81FE00000 mask FFFE00000 uncachable
Nov 11 16:54:08 localhost kernel:   6 disabled
Nov 11 16:54:08 localhost kernel:   7 disabled
Nov 11 16:54:08 localhost kernel:   8 disabled
Nov 11 16:54:08 localhost kernel:   9 disabled
Nov 11 16:54:08 localhost kernel: x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
Nov 11 16:54:08 localhost kernel: total RAM covered: 32494M
Nov 11 16:54:08 localhost kernel: Found optimal setting for mtrr clean up
Nov 11 16:54:08 localhost kernel:  gran_size: 64K         chunk_size: 32M         num_reg: 9          lose cover RAM: 0G
Nov 11 16:54:08 localhost kernel: e820: update [mem 0xcf000000-0xffffffff] usable ==> reserved
Nov 11 16:54:08 localhost kernel: last_pfn = 0xce000 max_arch_pfn = 0x400000000
Nov 11 16:54:08 localhost kernel: BRK [0x1bb001000, 0x1bb001fff] PGTABLE
Nov 11 16:54:08 localhost kernel: BRK [0x1bb002000, 0x1bb002fff] PGTABLE
Nov 11 16:54:08 localhost kernel: BRK [0x1bb003000, 0x1bb003fff] PGTABLE
Nov 11 16:54:08 localhost kernel: BRK [0x1bb004000, 0x1bb004fff] PGTABLE
Nov 11 16:54:08 localhost kernel: BRK [0x1bb005000, 0x1bb005fff] PGTABLE
Nov 11 16:54:08 localhost kernel: BRK [0x1bb006000, 0x1bb006fff] PGTABLE
Nov 11 16:54:08 localhost kernel: BRK [0x1bb007000, 0x1bb007fff] PGTABLE
Nov 11 16:54:08 localhost kernel: BRK [0x1bb008000, 0x1bb008fff] PGTABLE
Nov 11 16:54:08 localhost kernel: BRK [0x1bb009000, 0x1bb009fff] PGTABLE
Nov 11 16:54:08 localhost kernel: BRK [0x1bb00a000, 0x1bb00afff] PGTABLE
Nov 11 16:54:08 localhost kernel: BRK [0x1bb00b000, 0x1bb00bfff] PGTABLE
Nov 11 16:54:08 localhost kernel: BRK [0x1bb00c000, 0x1bb00cfff] PGTABLE
Nov 11 16:54:08 localhost kernel: RAMDISK: [mem 0x66b3e000-0x7fffffff]
Nov 11 16:54:08 localhost kernel: ACPI: Early table checksum verification disabled
Nov 11 16:54:08 localhost kernel: ACPI: RSDP 0x00000000000F0490 000024 (v02 ALASKA)
Nov 11 16:54:08 localhost kernel: ACPI: XSDT 0x00000000CCA8A080 000074 (v01 ALASKA A M I    01072009 AMI  00010013)
Nov 11 16:54:08 localhost kernel: ACPI: FACP 0x00000000CCA95568 00010C (v05 ALASKA A M I    01072009 AMI  00010013)
Nov 11 16:54:08 localhost kernel: ACPI: DSDT 0x00000000CCA8A188 00B3D9 (v02 ALASKA A M I    00000022 INTL 20051117)
Nov 11 16:54:08 localhost kernel: ACPI: FACS 0x00000000CCBBE080 000040
Nov 11 16:54:08 localhost kernel: ACPI: APIC 0x00000000CCA95678 000092 (v03 ALASKA A M I    01072009 AMI  00010013)
Nov 11 16:54:08 localhost kernel: ACPI: FPDT 0x00000000CCA95710 000044 (v01 ALASKA A M I    01072009 AMI  00010013)
Nov 11 16:54:08 localhost kernel: ACPI: MCFG 0x00000000CCA95758 00003C (v01 ALASKA A M I    01072009 MSFT 00000097)
Nov 11 16:54:08 localhost kernel: ACPI: HPET 0x00000000CCA95798 000038 (v01 ALASKA A M I    01072009 AMI. 00000005)
Nov 11 16:54:08 localhost kernel: ACPI: SSDT 0x00000000CCA957D0 000495 (v01 IdeRef IdeTable 00001000 INTL 20091112)
Nov 11 16:54:08 localhost kernel: ACPI: SSDT 0x00000000CCA95C68 0009AA (v01 PmRef  Cpu0Ist  00003000 INTL 20051117)
Nov 11 16:54:08 localhost kernel: ACPI: SSDT 0x00000000CCA96618 000A92 (v01 PmRef  CpuPm    00003000 INTL 20051117)
Nov 11 16:54:08 localhost kernel: ACPI: BGRT 0x00000000CCA971C0 000038 (v00 ALASKA A M I    01072009 AMI  00010013)
Nov 11 16:54:08 localhost kernel: ACPI: DMAR 0x00000000CCA97108 0000B8 (v01 INTEL  SNB      00000001 INTL 00000001)
Nov 11 16:54:08 localhost kernel: ACPI: Local APIC address 0xfee00000
Nov 11 16:54:08 localhost kernel: No NUMA configuration found
Nov 11 16:54:08 localhost kernel: Faking a node at [mem 0x0000000000000000-0x000000081fdfffff]
Nov 11 16:54:08 localhost kernel: NODE_DATA(0) allocated [mem 0x81fddc000-0x81fde1fff]
Nov 11 16:54:08 localhost kernel: Zone ranges:
Nov 11 16:54:08 localhost kernel:   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
Nov 11 16:54:08 localhost kernel:   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
Nov 11 16:54:08 localhost kernel:   Normal   [mem 0x0000000100000000-0x000000081fdfffff]
Nov 11 16:54:08 localhost kernel: Movable zone start for each node
Nov 11 16:54:08 localhost kernel: Early memory node ranges
Nov 11 16:54:08 localhost kernel:   node   0: [mem 0x0000000000001000-0x000000000009dfff]
Nov 11 16:54:08 localhost kernel:   node   0: [mem 0x0000000000100000-0x000000001fffffff]
Nov 11 16:54:08 localhost kernel:   node   0: [mem 0x0000000020200000-0x0000000040003fff]
Nov 11 16:54:08 localhost kernel:   node   0: [mem 0x0000000040005000-0x00000000cc603fff]
Nov 11 16:54:08 localhost kernel:   node   0: [mem 0x00000000cd7f4000-0x00000000cd7f4fff]
Nov 11 16:54:08 localhost kernel:   node   0: [mem 0x00000000cd838000-0x00000000cdc5dfff]
Nov 11 16:54:08 localhost kernel:   node   0: [mem 0x00000000cdff4000-0x00000000cdffffff]
Nov 11 16:54:08 localhost kernel:   node   0: [mem 0x0000000100000000-0x000000081fdfffff]
Nov 11 16:54:08 localhost kernel: Reserved but unavailable: 99 pages
Nov 11 16:54:08 localhost kernel: Initmem setup node 0 [mem 0x0000000000001000-0x000000081fdfffff]
Nov 11 16:54:08 localhost kernel: On node 0 totalpages: 8308179
Nov 11 16:54:08 localhost kernel:   DMA zone: 64 pages used for memmap
Nov 11 16:54:08 localhost kernel:   DMA zone: 22 pages reserved
Nov 11 16:54:08 localhost kernel:   DMA zone: 3997 pages, LIFO batch:0
Nov 11 16:54:08 localhost kernel:   DMA32 zone: 13025 pages used for memmap
Nov 11 16:54:08 localhost kernel:   DMA32 zone: 833590 pages, LIFO batch:63
Nov 11 16:54:08 localhost kernel:   Normal zone: 116728 pages used for memmap
Nov 11 16:54:08 localhost kernel:   Normal zone: 7470592 pages, LIFO batch:63
Nov 11 16:54:08 localhost kernel: Reserving Intel graphics memory at [mem 0xcf200000-0xdf1fffff]
Nov 11 16:54:08 localhost kernel: ACPI: PM-Timer IO Port: 0x408
Nov 11 16:54:08 localhost kernel: ACPI: Local APIC address 0xfee00000
Nov 11 16:54:08 localhost kernel: ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
Nov 11 16:54:08 localhost kernel: IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
Nov 11 16:54:08 localhost kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Nov 11 16:54:08 localhost kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Nov 11 16:54:08 localhost kernel: ACPI: IRQ0 used by override.
Nov 11 16:54:08 localhost kernel: ACPI: IRQ9 used by override.
Nov 11 16:54:08 localhost kernel: Using ACPI (MADT) for SMP configuration information
Nov 11 16:54:08 localhost kernel: ACPI: HPET id: 0x8086a701 base: 0xfed00000
Nov 11 16:54:08 localhost kernel: smpboot: Allowing 8 CPUs, 0 hotplug CPUs
Nov 11 16:54:08 localhost kernel: [mem 0xdf200000-0xf7ffffff] available for PCI devices
Nov 11 16:54:08 localhost kernel: Booting paravirtualized kernel on bare hardware
Nov 11 16:54:08 localhost kernel: clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
Nov 11 16:54:08 localhost kernel: random: get_random_bytes called from start_kernel+0x93/0x522 with crng_init=0
Nov 11 16:54:08 localhost kernel: setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:8 nr_node_ids:1
Nov 11 16:54:08 localhost kernel: percpu: Embedded 44 pages/cpu s143256 r8192 d28776 u262144
Nov 11 16:54:08 localhost kernel: pcpu-alloc: s143256 r8192 d28776 u262144 alloc=1*2097152
Nov 11 16:54:08 localhost kernel: pcpu-alloc: [0] 0 1 2 3 4 5 6 7 
Nov 11 16:54:08 localhost kernel: Built 1 zonelists, mobility grouping on.  Total pages: 8178340
Nov 11 16:54:08 localhost kernel: Policy zone: Normal
Nov 11 16:54:08 localhost kernel: Kernel command line: rootflags=rw mount.usrflags=ro BOOT_IMAGE=coreos_production_pxe.vmlinuz root=LABEL=ROOT coreos.config.url=http://172.16.0.1:8080/pxe/config.ign debug initrd=coreos_production_pxe_image.cpio.gz
Nov 11 16:54:08 localhost kernel: Memory: 32164632K/33232716K available (10252K kernel code, 1217K rwdata, 5488K rodata, 41396K init, 1660K bss, 1068084K reserved, 0K cma-reserved)
Nov 11 16:54:08 localhost kernel: SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
Nov 11 16:54:08 localhost kernel: Kernel/User page tables isolation: enabled
Nov 11 16:54:08 localhost kernel: ftrace: allocating 29194 entries in 115 pages
Nov 11 16:54:08 localhost kernel: rcu: Hierarchical RCU implementation.
Nov 11 16:54:08 localhost kernel: rcu:         RCU event tracing is enabled.
Nov 11 16:54:08 localhost kernel: rcu:         RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=8.
Nov 11 16:54:08 localhost kernel: rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
Nov 11 16:54:08 localhost kernel: NR_IRQS: 33024, nr_irqs: 488, preallocated irqs: 16
Nov 11 16:54:08 localhost kernel: Console: colour VGA+ 80x25
Nov 11 16:54:08 localhost kernel: console [tty0] enabled
Nov 11 16:54:08 localhost kernel: ACPI: Core revision 20180810
Nov 11 16:54:08 localhost kernel: clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
Nov 11 16:54:08 localhost kernel: hpet clockevent registered
Nov 11 16:54:08 localhost kernel: APIC: Switch to symmetric I/O mode setup
Nov 11 16:54:08 localhost kernel: DMAR: Host address width 36
Nov 11 16:54:08 localhost kernel: DMAR: DRHD base: 0x000000fed90000 flags: 0x0
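
Since the last journal entry before each "-- Reboot --" marker is the only available estimate of when a machine froze, it may help to extract those entries automatically across all nodes. The following is a minimal sketch in Python, not part of the original setup. It assumes the default short journalctl output format shown above, which carries no year, so the year is passed in explicitly. The function name last_entries_before_reboot is hypothetical, and the sample lines are copied from the log above.

```python
import re
from datetime import datetime

REBOOT_MARKER = "-- Reboot --"

# Short-format journalctl lines start like "Nov 08 18:52:29 node-17 ...".
LINE_RE = re.compile(r"^([A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}) (\S+) ")


def last_entries_before_reboot(lines, year):
    """Return (timestamp, host, line) for the last entry before each reboot marker.

    The short journalctl format omits the year, so it is supplied by the caller
    (an assumption of this sketch).
    """
    results = []
    previous = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == REBOOT_MARKER:
            if previous is not None:
                results.append(previous)
            previous = None
            continue
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            previous = (stamp, match.group(2), line)
    return results


# Sample lines taken from the journal snippet above.
sample = [
    "Nov 08 18:52:29 node-17 systemd-networkd[591]: LLDP: Successfully processed LLDP datagram.",
    "Nov 08 18:52:29 node-17 systemd[1]: systemd-journald.service: Got notification message from PID 558 (WATCHDOG=1)",
    "-- Reboot --",
    "Nov 11 16:54:08 localhost kernel: SMBIOS 2.7 present.",
]

for stamp, host, _line in last_entries_before_reboot(sample, 2019):
    print(f"{host} last logged at {stamp.isoformat()}")
```

Saving the journalctl output of each node to a file and passing its lines to this function would give one estimated freeze time per reboot, which could then be compared against the timeline of the Cassandra ingest.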
