Rework `network_socket_close` error paths. #22

hlef · 2024-04-19T08:22:03Z

49f42c1 forgot to add unlocks on error paths in `network_socket_close`. Address this.

At the same time, clarify that we have two types of errors here: those where the call can be retried (e.g., called with the wrong malloc capability, or an invalid timeout), and those which cannot be retried (the socket close failed due to an unspecified error and we went ahead to free it). Both are currently under -EAGAIN. I think that we should differentiate the two to give a chance to the caller to call again with the right arguments. Address this by moving unrecoverable errors to -ENOTRECOVERABLE and update the documentation accordingly.

Finally, a socket close return value of 0 is unrecoverable and most likely due to an internal error, so in that case we should go ahead and free everything instead of returning -EAGAIN.
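For illustration only (this is not the code in this PR): a minimal sketch of the proposed split between retryable and unrecoverable errors. `FakeSocket` and `close_socket_sketch` are hypothetical names, and `-EINVAL` is used here as an example of a retryable code; the real function's signature and codes may differ.

```cpp
#include <cerrno>

// HYPOTHETICAL stand-in for a socket, not the real network stack type.
struct FakeSocket
{
    bool closeSucceeds; // whether the underlying close reports success
    bool freed;         // set once the socket's resources are released
};

// Sketch of the proposed contract:
//  - bad arguments: nothing is touched, so the caller can retry;
//  - internal close failure: resources are freed anyway and
//    -ENOTRECOVERABLE tells the caller that retrying is pointless.
int close_socket_sketch(FakeSocket *socket, bool argumentsValid)
{
    if (socket == nullptr || !argumentsValid)
    {
        return -EINVAL; // retryable: call again with the right arguments
    }
    int ret = 0;
    if (!socket->closeSucceeds)
    {
        // Don't return now, still free everything below.
        ret = -ENOTRECOVERABLE;
    }
    socket->freed = true;
    return ret;
}
```

The point of the split is that a caller can distinguish "fix the arguments and call again" from "the socket is gone, do not touch it again".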

Note: Since all errors currently under -ENOTRECOVERABLE are internal, not supposed to happen (modulo bugs), and not acted upon (we're leaking memory, but there really isn't anything we can do about that apart from maybe resetting, once we have it working?), we could also simply turn them into debug asserts and ignore the failure.
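A rough sketch of that alternative, using the standard `assert` macro as a stand-in for whatever debug-assert facility the project would actually use. `close_with_debug_asserts` and its step callbacks are hypothetical names for illustration.

```cpp
#include <cassert>

// HYPOTHETICAL: each step returns 0 on success and nonzero on an
// internal failure that "should not happen".
using CleanupStep = int (*)();

// Debug builds (NDEBUG undefined) stop at the first internal failure.
// Release builds ignore it, keep cleaning up, and report success.
int close_with_debug_asserts(CleanupStep closeStep, CleanupStep freeStep)
{
    int closeResult = closeStep();
    assert(closeResult == 0 && "internal error while closing the socket");
    (void)closeResult; // silence unused-variable warnings in release builds

    int freeResult = freeStep();
    assert(freeResult == 0 && "internal error while freeing the socket");
    (void)freeResult;

    return 0;
}
```

The trade-off is that callers no longer see these failures at all, which only makes sense if there is truly nothing they could do about them.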

lib/tcpip/network_wrapper.cc

davidchisnall · 2024-04-19T10:11:54Z

lib/tcpip/network_wrapper.cc

 			  // Don't return now, try to at least free the token.
-			  ret = -EINVAL;
+			  ret = -ENOTRECOVERABLE;


I believe the only way to reach this is to have insufficient stack to call `heap_free`. Is there any other way that it can happen?

hlef · Apr 19, 2024

Agreed. There is also the possibility that the caller frees it concurrently, since it owns the heap capability, but this has no impact here. A bug somewhere could also cause it, but in theory that shouldn't happen.

davidchisnall · 2024-04-29T10:57:26Z

We can check for the former case by checking for `-ECOMPARTMENTFAIL` on the `heap_free` return and the latter by checking the tag bit.

In the case where there's a concurrent free, we should continue freeing the other things.

hlef · May 1, 2024

> In the case where there's a concurrent free, we should continue freeing the other things.

This is already what we do?

> We can check for the former case by checking for `-ECOMPARTMENTFAIL` on the `heap_free` return and the latter by checking the tag bit.

What should we check it for exactly? To provide a more explicit error code?
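One possible reading of the suggested checks, sketched with stand-in types so that it compiles anywhere. `SketchECompartmentFail` is a placeholder value (the platform's real `ECOMPARTMENTFAIL` is not part of standard `<cerrno>`), and `FakeCapability::tag` only models the hardware tag bit; neither is the real API.

```cpp
// HYPOTHETICAL placeholder for the platform's ECOMPARTMENTFAIL value.
constexpr int SketchECompartmentFail = 1000;

// HYPOTHETICAL stand-in for a capability; `tag` models the tag bit,
// which is assumed to be clear once the object has been freed.
struct FakeCapability
{
    bool tag;
};

enum class FreeOutcome
{
    Freed,            // the free call succeeded
    CouldNotCallFree, // e.g. insufficient stack for the cross-compartment call
    AlreadyFreed,     // tag cleared: someone else freed it concurrently
    Unexplained       // anything else, most likely a bug
};

// Classify why a free call failed, following the two checks discussed
// in the thread: the return value first, then the tag bit.
FreeOutcome classify_free(int freeResult, const FakeCapability &object)
{
    if (freeResult == 0)
    {
        return FreeOutcome::Freed;
    }
    if (freeResult == -SketchECompartmentFail)
    {
        return FreeOutcome::CouldNotCallFree;
    }
    if (!object.tag)
    {
        return FreeOutcome::AlreadyFreed;
    }
    return FreeOutcome::Unexplained;
}
```

Whatever the outcome, the thread agrees that the remaining objects should still be freed; the classification would only affect which error code is reported, which is the open question above.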


hlef · 2024-05-01T23:11:57Z

I have fixed the comments. Note that this will fail until we merge CHERIoT-Platform/cheriot-rtos#213

49f42c1 forgot to add unlocks on error paths in `network_socket_close`. Address this by using a `LockGuard` and the new `release` API.

At the same time, clarify that we have two types of errors here: those where the call can be retried (e.g., called with the wrong malloc capability, or an invalid timeout), and those which cannot be retried (the socket close failed due to an unspecified error and we went ahead to free it). Both are currently under -EAGAIN. We should differentiate the two to give a chance to the caller to call again with the right arguments. Address this by moving unrecoverable errors to -ENOTRECOVERABLE and update the documentation accordingly.

Finally, a socket close return value of 0 is unrecoverable and most likely due to an internal error, so in that case we should go ahead and free everything instead of returning -EAGAIN.

Signed-off-by: Hugo Lefeuvre <[email protected]>
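To illustrate the guard-plus-release pattern in general terms, here is a sketch using the standard `std::unique_lock`, whose `release()` gives up ownership without unlocking. This is an analogy only: the RTOS `LockGuard` and its `release` API may behave differently, and `FlagLock` and `close_locked_sketch` are hypothetical names. `FlagLock` is a single-threaded toy that just records its state.

```cpp
#include <cerrno>
#include <mutex>

// HYPOTHETICAL toy lock: records whether it is held, no real blocking.
struct FlagLock
{
    bool locked = false;
    void lock() { locked = true; }
    void unlock() { locked = false; }
};

int close_locked_sketch(FlagLock &socketLock, bool argumentsValid)
{
    // The guard unlocks automatically on every early return below.
    std::unique_lock<FlagLock> guard(socketLock);

    if (!argumentsValid)
    {
        return -EINVAL; // error path: no manual unlock needed
    }

    // Past the point of no return: take manual control of the lock so
    // the guard's destructor does not touch it after teardown.
    FlagLock *owned = guard.release();

    // ... tear down the socket here ...

    owned->unlock();
    return 0;
}
```

With a guard, forgetting an unlock on an error path (the bug this PR fixes) becomes impossible by construction; the explicit release covers the one path where automatic unlocking would be wrong.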

hlef requested a review from davidchisnall April 19, 2024 08:22

davidchisnall reviewed Apr 19, 2024


davidchisnall reviewed Apr 29, 2024


hlef force-pushed the hlefeuvre/fix-locking-close branch 2 times, most recently from e314486 to b7412d8 Compare May 1, 2024 23:07

hlef force-pushed the hlefeuvre/fix-locking-close branch from b7412d8 to 8306872 Compare May 2, 2024 16:21

davidchisnall approved these changes May 10, 2024


hlef merged commit 7142e42 into main May 10, 2024
2 checks passed

hlef deleted the hlefeuvre/fix-locking-close branch May 10, 2024 07:41


hlef commented Apr 19, 2024

davidchisnall Apr 19, 2024

hlef Apr 19, 2024

davidchisnall Apr 29, 2024

hlef May 1, 2024

davidchisnall Apr 29, 2024

hlef commented May 1, 2024
