loader.efi netboot regression in 2023.11 #1970
If it's dying in loader that's an upstream bug, we don't turn anything CHERI on there. Can you please verify it's reproducible with 14.0-RELEASE and/or recent 15-CURRENT? |
Yes, this is an upstream regression. Bisection (full sequence of tests below) points at 75b7d39e ("stand: efi_fmtdev can be reduced to devformat") being the culprit, though at a glance it's hard to see why.
|
It's a bit odd that you've hit the MFC commit to stable/13 rather than the one to main... I will say it's not clear those functions are equivalent except in the simplest cases and the commit doesn't explain why only those cases can happen. |
devformat should produce exactly the same results. If not, it's a bug in the dev->d_dev formatting routine (which should fall through to the default: case for HTTP booting). devformat uses devsw->dv_fmtdev if it exists, and defaults to
if not, which is the same as the default case which was removed in 75b7d3...
I think only DEVT_NONE type devices are different. Since only a few devices in the devsw have fmtdev, they should be the same. disk has a different one, and zfs has a different one as well... And I know others have network booted. One could revert that one change, and then unrevert the replaced calls to efi_fmtdev one at a time (there are only 3) to see which one goes south, and what the devdesc that's passed into the devformat() function looks like. |
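For readers following along, the dispatch described in the comment above can be sketched roughly as below. This is a minimal illustration, not the loader's actual source: the `sketch_` types are hypothetical, cut-down stand-ins for the loader's devsw and devdesc, the fallback format (name, unit, colon) is an assumption because the thread does not show it, and the `p1` suffix in the device-specific formatter is invented purely to make the two paths distinguishable.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the loader's devsw/devdesc, reduced to
 * the fields this sketch needs. */
struct sketch_devdesc;

struct sketch_devsw {
    const char *dv_name;
    /* Optional device-specific formatter; NULL for most devices. */
    char *(*dv_fmtdev)(struct sketch_devdesc *);
};

struct sketch_devdesc {
    struct sketch_devsw *d_dev;
    int d_unit;
};

/* Use the device's own formatter when it has one, otherwise fall back
 * to a generic form. The "<name><unit>:" fallback is an assumption. */
static char *
sketch_devformat(struct sketch_devdesc *d)
{
    static char name[32];

    if (d->d_dev->dv_fmtdev != NULL)
        return (d->d_dev->dv_fmtdev(d));
    snprintf(name, sizeof(name), "%s%d:", d->d_dev->dv_name, d->d_unit);
    return (name);
}

/* Illustrative device-specific formatter, standing in for the kind
 * that disk and zfs are said to provide. The "p1" suffix is made up. */
static char *
sketch_disk_fmtdev(struct sketch_devdesc *d)
{
    static char buf[32];

    snprintf(buf, sizeof(buf), "%s%dp1:", d->d_dev->dv_name, d->d_unit);
    return (buf);
}
```

If the real code has this shape, a network device with no dv_fmtdev should take the fallback branch, so the string itself would not be expected to differ from the removed default: case. Both paths return a pointer to a static buffer, which is worth keeping in mind when checking how long each caller holds on to the result.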
Also, it would be nice to get a symbolic traceback of what's happening. I have some libtraceback code written, but it doesn't quite work, so I've not pushed it into upstream FreeBSD... but knowing which of these calls to efi_fmtdev dies might suffice. |
Manually symbolizing...
In particular,
but |
So the exception is at 0x0000F327C740, which is well outside the range of all the other addresses on the stack. But it has survived the call to devformat... I'm not sure what the device should be... but one test would be to set vfs.root.mountfrom to what you think it should be before issuing 'boot'. This would let us know if it was the return value (and/or the function itself) that's causing this or something else. Worst case, if you set this to something nonsensical, the kernel will just not be able to find /, so it's safe to test whether the return value from devformat() changes in a way that runs off the cliff we hit. I'm not familiar enough with CHERI to know, but are we running with capabilities, which trap on buffer overflows, or are we still in mixed mode where they might be possible? |
loader is built as a plain AArch64 binary, no capabilities allowed, since the firmware doesn't save/restore them currently on traps so they would be clobbered at arbitrary points. We only enable capability use in the kernel's locore. |
OK, so we can't rule out an overflow or something similar.... |
I don't think that's right; the return address is 0x0000F7D310DC, which means that |
With apologies, I hadn't tested netbooting this release on the MSRC cluster. It looks like we've got a regression. Using the https://github.com/microsoft/msr-morello-automation tooling with HTTP netboot, 2022.12 boots fine, but 2023.11 (built locally) dies at the very end of loader.efi with the below wad of complaints. I have tested on both the 1.5 and 1.7 releases of Morello firmware, with no apparent difference. Fortunately, loader_lua.efi from 2022.12 continues to function, so that's pretty convenient as workarounds go.