Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle network file not found like disks #695

Merged
merged 3 commits into from
Jan 15, 2025

Conversation

dbnicholson
Copy link
Contributor

When the second stage can't be found, shim will try the default second stage. That only work when reading the image from disk since the network protocols don't return EFI_NOT_FOUND. Convert the appropriate network error codes to EFI_NOT_FOUND so they're treated the same way.

See #649.

@dbnicholson
Copy link
Contributor Author

I haven't tested this, but it looks appropriate to me.

@mikebeaton
Copy link
Contributor

@dbnicholson - Is this right? Just a flyby comment, but I would guess that most (all?) failures when connecting to TFTP or HTTP should be equivalent to Not Found for the purposes of failing over to the default second stage?

@nathan-omeara
Copy link

This doesn't seem to work. I tested in my lab, and shim does not try the fallback default loader. I pulled up a wireshark capture to make sure the TFTP response set the error code right, and it appears right.

image

@dbnicholson
Copy link
Contributor Author

@mikebeaton Personally I would want to keep it narrow. Shim has a case to handle file not found and TFTP has an error for file not found. More specifically, when loading images from disk, shim only does the fallback for EFI_INVALID_PARAMETER and EFI_NOT_FOUND. For reference, there are lots of other errors that could be encountered:

Shim doesn't try to handle EFI_NO_MEDIA or EFI_DEVICE_ERROR, for example. You may notice that none of those functions return EFI_INVALID_PARAMETER. That only happens when shim sets the error that way. So, when loading images off disk, you really only get the fallback to the default loader when the volume or file can't be found. Not for anything else, and I don't think network booting should be different unless the disk rules change, too.

If I was to revise my PR, it would be to map more error codes to simple file equivalents. In other words, make the interface for reading an image from network the same as reading from disk. If that was the case, you could discuss expanding the cases where shim automatically tries the default loader instead of failing.

@dbnicholson
Copy link
Contributor Author

This doesn't seem to work. I tested in my lab, and shim does not try the fallback default loader. I pulled up a wireshark capture to make sure the TFTP response set the error code right, and it appears right.

Thanks for trying that. Maybe I'll have to spin up a TFTP server to see what the error looks like from the firmware side.

@nathan-omeara
Copy link

nathan-omeara commented Oct 3, 2024

I added some debug prints.. TftpErrorReceived is 0, TftpError.ErrorCode is 0.

the error isn't getting passed back correctly, I guess?

netboot.c Outdated
pxe->Mode->TftpError.ErrorString);

switch (pxe->Mode->TftpError.ErrorCode) {
case TFTP_ERROR_NOT_FOUND:
Copy link

@nathan-omeara nathan-omeara Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think TFTP_ERROR_ACCESS would also make sense to fall back on.. honestly, almost any error from the server side should probably trigger a retry with the default filename.

Same on the HTTP side, 'not found/404' is the obvious one, but 403s, 50Xs, other things could also be dependent on the filename requested, so it would make sense to retry with the default loader name with almost any error returned from the server.

On the other hand, network layer errors don't make sense to retry with a different filename - but I'm not sure how to differentiate those here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are really 2 issues here:

  1. The shim frontend doesn't handle the errors raised by the network backends. That's what I'm trying to address here. I've chosen to translate the network backend errors to the file backend errors (although I only added file not found here). That way the shim frontend doesn't need to special case the errors returned from the network backends.
  2. Shim should fallback to the default loader in more cases. I agree shim should try the default loader for an access denied error. While it's likely reading the default loader will also fail, trying is better than failing outright.

I'm going to rework the patches a bit more so more errors are mapped and then I'll add a commit that also tries the default loader on access denied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the patches to map all TFTP and HTTP error codes to EFI status codes, but I decided not to change the types of errors that shim falls back to the default loader on. The policy that's there is from 50732ee and it's to cover the case where shim has incorrectly decided what the second stage loader filename is. If you want the policy to be more like "always try the default second stage if loading the specified one fails", then that's really a separate thing. FWIW, I wrote a patch to add EFI_ACCESS_DENIED here.

@mikebeaton
Copy link
Contributor

@mikebeaton Personally I would want to keep it narrow. Shim has a case to handle file not found and TFTP has an error for file not found.

Hi again - Just to be clear what I was thinking of, I was especially considering HTTP, where 404 Not Found is very specific - it means everything regarding connecting to the site was successful, but that the specific named document is not on the site. For instance, the site itself might not be found, and that is not a not found error. So I guess at least the site itself not being found is quite similar to the volume not being found? That was the thinking behind the suggestion. I'm not sure if that realistically adds any errors to your list, since I don't think you'd even get an HTTP error in that case.

@dbnicholson
Copy link
Contributor Author

I added some debug prints.. TftpErrorReceived is 0, TftpError.ErrorCode is 0.

the error isn't getting passed back correctly, I guess?

That's too bad. I think the firmware is responsible for filling that in, but maybe you have to tell it to first. I'll look closer but I suppose you could just map EFI_TFTP_ERROR to EFI_NOT_FOUND right there.

@dbnicholson
Copy link
Contributor Author

Hi again - Just to be clear what I was thinking of, I was especially considering HTTP, where 404 Not Found is very specific - it means everything regarding connecting to the site was successful, but that the specific named document is not on the site. For instance, the site itself might not be found, and that is not a not found error. So I guess at least the site itself not being found is quite similar to the volume not being found? That was the thinking behind the suggestion. I'm not sure if that realistically adds any errors to your list, since I don't think you'd even get an HTTP error in that case.

Yeah, that's exactly what I was saying about the disk case. EFI_NOT_FOUND is only returned by Open() when you already opened the volume. All the other errors you'd encounter trying to load an image from disk will skip the fallback to the default loader. An HTTP 403, for example, doesn't mean the file wasn't found. It means that you're not authorized to access it. In the disk case you'd get EFI_ACCESS_DENIED. I think that if you couldn't access the site at all (DNS resolution failure, no route, etc), I think you'd end up with EFI_DEVICE_ERROR in the HTTP request. That would be very similar to the EFI_NO_MEDIA case for OpenVolume(). That still wouldn't get you to the fallback case, though.

@dbnicholson
Copy link
Contributor Author

I added some debug prints.. TftpErrorReceived is 0, TftpError.ErrorCode is 0.

the error isn't getting passed back correctly, I guess?

I also got this with OVMF and it was a simple byte swapping issue. See tianocore/edk2#6287. I imagine that most UEFI firmwares use this TFTP code rather than roll their own. It's largely unchanged from when Intel contributed it 14 years ago.

With that change, shim in verbose mode reports:

Fetching Netboot Image /grubx64.efi
TFTP error 1: File not found
Unable to fetch TFTP image: Not Found
start_image() returned Not Found, falling back to default loader

Since this is likely to affect a lot of people, I guess I'll just include a fallback to map EFI_TFTP_ERROR to EFI_NOT_FOUND.

@nathan-omeara
Copy link

tested this branch on my latitude 5300 again - this updated branch does correctly fallback to default loader.

The define can be dropped when gnu-efi is updated to include
de6f9259e8476495c78babbc25250a59de7f3942.

Signed-off-by: Dan Nicholson <[email protected]>
This allows the caller to make a more informed decision about how to
handle the error. The error code is available in the TftpError field of
the Mode interface. In particular, by mapping the TFTP errors to
EFI_INVALID_PARAMTER and EFI_NOT_FOUND, shim will try to fetch the
default second stage image like it does when loading images from disk.

Unfortunately, some firmware doesn't fill in the error fields, so a
generic EFI_TFTP_ERROR to EFI_NOT_FOUND conversion is included.

Signed-off-by: Dan Nicholson <[email protected]>
This allows the caller to make a more informed decision about how to
handle the error. In particular, by mapping HTTP errors to
EFI_INVALID_PARAMETER and EFI_NOT_FOUND, shim will try to fetch the
default second stage image like it does when loading images from disk.

Note that this also changes the default to return EFI_HTTP_ERROR instead
of EFI_ABORTED. This should not change any behavior as EFI_ABORTED
wasn't being handled before, but letting the caller decide what to do
with an unknown HTTP error is more appropriate.

Signed-off-by: Dan Nicholson <[email protected]>
@vathpela vathpela merged commit ef8e729 into rhboot:main Jan 15, 2025
1 check passed
@dbnicholson dbnicholson deleted the netboot-not-found branch January 15, 2025 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants