spike-01 throwing an Uncorrectable Machine Check Exception #278

Closed
Firefishy opened this issue Feb 17, 2019 · 6 comments

Comments

@Firefishy
Member

Firefishy commented Feb 17, 2019

Server spike-01 is occasionally throwing an Uncorrectable Machine Check Exception.

I have been unable to work out which component is at fault. We have previously had issues with this server (see #89).

1346 Critical       04:07  11/05/2018 04:07  11/05/2018 0001
LOG: Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000010, Bank 0x00000005, Status 0xBA000000'00400405, Address 0x00000000'00000000, Misc 0x00000000'00004280)

1347 Critical       04:07  11/05/2018 04:07  11/05/2018 0001
LOG: Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000011, Bank 0x00000005, Status 0xBA000000'00400405, Address 0x00000000'00000000, Misc 0x00000000'00004280)

1348 Critical       09:42  12/19/2018 09:42  12/19/2018 0001
LOG: Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000010, Bank 0x00000005, Status 0xBA000000'00400405, Address 0x00000000'00000000, Misc 0x00000000'00000080)

1349 Critical       09:42  12/19/2018 09:42  12/19/2018 0001
LOG: Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000011, Bank 0x00000005, Status 0xBA000000'00400405, Address 0x00000000'00000000, Misc 0x00000000'00000080)

1350 Critical       01:53  01/16/2019 01:53  01/16/2019 0001
LOG: Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000010, Bank 0x00000005, Status 0xBA000000'00400405, Address 0x00000000'00000000, Misc 0x00000000'00004300)

1351 Critical       01:53  01/16/2019 01:53  01/16/2019 0001
LOG: Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000011, Bank 0x00000005, Status 0xBA000000'00400405, Address 0x00000000'00000000, Misc 0x00000000'00004300)
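
The Status value in these entries can be unpacked to see which flag bits the hardware set. The following is a minimal sketch, assuming the standard x86 machine-check status register layout (flag bits 63 down to 57, MCA error code in bits 15:0, model-specific code in bits 31:16). The function names and the regular expression are illustrative and not part of any existing tool, and the sketch does not attempt to interpret the error codes themselves.

```python
import re

# Architectural flag bits in the upper part of an x86 MCi_STATUS register
# (assumed layout; check the CPU vendor's manual for the exact model).
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # Misc register holds valid data
    58: "ADDRV",  # Address register holds valid data
    57: "PCC",    # processor context may be corrupt
}

# Hypothetical pattern matching the fields of the log lines quoted above.
LOG_RE = re.compile(
    r"APIC ID (0x[0-9A-Fa-f]+), Bank (0x[0-9A-Fa-f]+), "
    r"Status (0x[0-9A-Fa-f']+), Address (0x[0-9A-Fa-f']+), "
    r"Misc (0x[0-9A-Fa-f']+)"
)


def parse_hex(text):
    """Convert a hex string such as 0xBA000000'00400405 to an int."""
    return int(text.replace("'", ""), 16)


def decode_status(status):
    """Split a 64-bit status value into flag names and error-code fields."""
    return {
        "flags": [name for bit, name in STATUS_FLAGS.items()
                  if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }


def parse_line(line):
    """Return the decoded fields of one log line, or None if it does not match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    apic, bank, status, address, misc = (parse_hex(g) for g in match.groups())
    return {"apic_id": apic, "bank": bank, "address": address,
            "misc": misc, **decode_status(status)}
```

Applied to the entries above, this reports VAL, UC, EN, MISCV and PCC as set and ADDRV as clear, which is consistent with the all-zero Address field in every entry.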
@Firefishy
Member Author

Options available:

  1. Replace all RAM.
  2. Replace CPUs.
  3. Replace Motherboard.
  4. Replace Machine.

@pnorman
Collaborator

pnorman commented Feb 17, 2019

There is a related BIOS upgrade.

If that doesn't work, what makes the most sense will depend on when we plan to refresh these machines, so we need to decide that.

@Firefishy
Member Author

We are running the most recent BIOS version:

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: HP
        Version: P64
        Release Date: 05/21/2018

@pnorman
Collaborator

pnorman commented Feb 18, 2019

We've discussed using this as another remote hands test. Let's do it - the RAM isn't that expensive and it's the next logical thing to test with them. Do you want to place the order and I'll do the portal tickets?

@pnorman
Collaborator

pnorman commented Feb 21, 2019

Added @Firefishy to the assignee list for placing the RAM order.

@pnorman
Collaborator

pnorman commented Jun 20, 2019

spike-01 is gone.
