fix(bigtable): retry unexpected EOF errors #11276

Closed


martin-sucha

We see occasional unexpected EOF errors returned from ReadRows. These errors are coming from the stream.RecvMsg call.

We have seen similar error messages also from some GCS calls. I assume the underlying TCP connection is terminated while gRPC is reading a message from the wire.

I don't know exactly where in the gRPC library the error originates, but considering that RST_STREAM is retried, it seems that we should also retry unexpected EOF.

@martin-sucha martin-sucha requested review from a team as code owners December 12, 2024 10:00
@product-auto-label product-auto-label bot added the api: bigtable Issues related to the Bigtable API. label Dec 12, 2024
@martin-sucha
Author

Please let me know if you think ErrUnexpectedEOF should be checked by value instead of by the error message.

@mutianf
Contributor

mutianf commented Jan 7, 2025

We know when RST_STREAM will occur, so it's OK to check for that error. However, I'm not sure when ErrUnexpectedEOF will be thrown, so I'm not sure we should blindly retry it. Is there any way for us to reproduce this error?

@mutianf
Contributor

mutianf commented Jan 8, 2025

Looks like compute and bigquery also retry the ErrUnexpectedEOF error. I'm OK with this fix, but I think we should check for ErrUnexpectedEOF by value instead of checking the message.

@martin-sucha
Author

I don't know how to reproduce.

However, I have found another error in the logs that might be related: received 4294967294-bytes data exceeding the limit 131070 bytes. It seems that grpc-go closes the stream with io.EOF in that case:

https://github.com/grpc/grpc-go/blob/d0bf90aeb9b5bdf4031d812dbb743b0eb616c7b2/internal/transport/http2_client.go#L1203-L1203

So it could theoretically be connected. Will try to get more information, but it could take a few days until the error appears again.

@mutianf
Contributor

mutianf commented Jan 9, 2025

If ErrUnexpectedEOF could be coming from received 4294967294-bytes data exceeding the limit 131070 bytes, it shouldn't be retried, because that means a read failed and it's not a transient network error.

@mutianf
Contributor

mutianf commented Jan 10, 2025

Closing this PR, because I think that in Bigtable's case ErrUnexpectedEOF could be returned for an actual read error, and we shouldn't hide the error message.

@mutianf mutianf closed this Jan 10, 2025