Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework network_socket_close error paths. #22

Merged
merged 1 commit into from
May 10, 2024
Merged

Conversation

hlef
Copy link
Collaborator

@hlef hlef commented Apr 19, 2024

49f42c1 forgot to add unlocks on error paths in network_socket_close. Address this.

At the same time, clarify that we have two types of errors here: those where the call can be retried (e.g., called with the wrong malloc capability, or an invalid timeout), and those which cannot be retried (the socket close failed due to an unspecified error and we went ahead to free it). Both are currently under -EAGAIN. I think that we should differentiate the two to give a chance to the caller to call again with the right arguments. Address this by moving unrecoverable errors to -ENOTRECOVERABLE and update the documentation accordingly.

Finally, a socket close return value of 0 is unrecoverable and most likely due to an internal error, so in that case we should go ahead and free everything instead of returning -EAGAIN.

Note: Since all errors currently under -ENOTRECOVERABLE are internal, not supposed to happen (modulo bug), and not acted upon (we're leaking memory, but there really isn't anything we can do about that apart from maybe reseting, once we have it working?), we could also simply move them as debug asserts and ignore the failure.

@hlef hlef requested a review from davidchisnall April 19, 2024 08:22
lib/tcpip/network_wrapper.cc Outdated Show resolved Hide resolved
// Don't return now, try to at least free the token.
ret = -EINVAL;
ret = -ENOTRECOVERABLE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the only way to reach this is to have insufficient stack to call heap_free. Is there any other way that it can happen?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Of course there is also the possibility that the caller frees it concurrently as it owns the heap capability but this has no impact here. And there is also the possibility of a bug somewhere, but in theory it shouldn't happen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can check for the former case by checking -ECOMPARTMENTFAIL on the heap_free return and the latter by checking the tag bit.

In the case where there's a concurrent free, we should continue freeing the other things.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case where there's a concurrent free, we should continue freeing the other things.

This is already what we do?

We can check for the former case by checking -ECOMPARTMENTFAIL on the heap_free return and the latter by checking the tag bit.

What should we check it for exactly? To provide a more explicit error code?

lib/tcpip/network_wrapper.cc Outdated Show resolved Hide resolved
lib/tcpip/network_wrapper.cc Outdated Show resolved Hide resolved
lib/tcpip/network_wrapper.cc Outdated Show resolved Hide resolved
// Don't return now, try to at least free the token.
ret = -EINVAL;
ret = -ENOTRECOVERABLE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can check for the former case by checking -ECOMPARTMENTFAIL on the heap_free return and the latter by checking the tag bit.

In the case where there's a concurrent free, we should continue freeing the other things.

@hlef hlef force-pushed the hlefeuvre/fix-locking-close branch 2 times, most recently from e314486 to b7412d8 Compare May 1, 2024 23:07
@hlef
Copy link
Collaborator Author

hlef commented May 1, 2024

I have fixed the comments. Note that this will fail until we merge CHERIoT-Platform/cheriot-rtos#213

49f42c1 forgot to add unlocks on error
paths in `network_socket_close`. Address this by using a `LockGuard` and
the new `release` API.

At the same time, clarify that we have two types of errors here: those
where the call can be retried (e.g., called with the wrong malloc
capability, or an invalid timeout), and those which cannot be retried
(the socket close failed due to an unspecified error and we went ahead
to free it). Both are currently under -EAGAIN. We should differentiate
the two to give a chance to the caller to call again with the right
arguments. Address this by moving unrecoverable errors to
-ENOTRECOVERABLE and update the documentation accordingly.

Finally, a socket close return value of 0 is unrecoverable and most
likely due to an internal error, so in that case we should go ahead and
free everything instead of returning -EAGAIN.

Signed-off-by: Hugo Lefeuvre <[email protected]>
@hlef hlef force-pushed the hlefeuvre/fix-locking-close branch from b7412d8 to 8306872 Compare May 2, 2024 16:21
@hlef hlef merged commit 7142e42 into main May 10, 2024
2 checks passed
@hlef hlef deleted the hlefeuvre/fix-locking-close branch May 10, 2024 07:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants