Retry refactor to allow for batching to retry. #617

Open
wants to merge 1 commit into
base: main
from

Conversation

nicklas-dohrn
Contributor

@nicklas-dohrn nicklas-dohrn commented Oct 10, 2024

Description

This change mainly moves the place where the retry logic is applied.
Looking at what actually needs to be retried, retries are likely only needed for networking failures.
So the retry should happen after the data has been serialised, for every protocol, which improves performance and keeps the fix close to where the problem occurs.
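
As a minimal sketch of the ordering described above (not the code in this PR): the message is serialised once, and only the network send is retried. The names `serialize`, `sendFunc`, `writeWithRetry` and `errTransient` are hypothetical and do not come from the repository.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical stand-in for a networking failure.
var errTransient = errors.New("transient network error")

// sendFunc stands in for the protocol-specific network call (assumption).
type sendFunc func(payload []byte) error

// serialize stands in for the protocol-specific encoding step (assumption).
func serialize(msg string) ([]byte, error) {
	if msg == "" {
		return nil, errors.New("empty message")
	}
	return []byte(msg + "\n"), nil
}

// writeWithRetry serialises exactly once and retries only the send step.
func writeWithRetry(msg string, send sendFunc, maxAttempts int, backoff time.Duration) error {
	payload, err := serialize(msg)
	if err != nil {
		// A serialisation error is not a networking problem, so it is not retried.
		return err
	}
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = send(payload); lastErr == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(backoff)
		}
	}
	return fmt.Errorf("send failed after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	calls := 0
	flaky := func(payload []byte) error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	err := writeWithRetry("hello", flaky, 5, time.Millisecond)
	fmt.Println("error:", err, "send calls:", calls)
}
```

With this ordering, the cost of encoding is paid once per message regardless of how many send attempts are needed.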

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Testing performed?

  • Unit tests
  • Integration tests
  • Acceptance tests

Checklist:

  • This PR is being made against the main branch, or relevant version branch
  • I have made corresponding changes to the documentation
  • I have added testing for my changes

If you have any questions, or want to get attention for a PR or issue, please reach out on the #logging-and-metrics channel in the Cloud Foundry Slack.

@nicklas-dohrn nicklas-dohrn requested a review from a team as a code owner October 10, 2024 07:12
Member

@ctlong ctlong left a comment

I have a number of stylistic concerns that I've left as comments throughout the code.

In general your approach seems sound. However, I would really like to see some unit tests added for this retry logic in order to validate that it performs as expected.

@@ -15,6 +15,7 @@ const BATCHSIZE = 256 * 1024

type HTTPSBatchWriter struct {
	HTTPSWriter
	*Retryer
Member

Why embed *Retryer rather than using a named field to encapsulate the new struct via composition?

Member

I think that I would prefer composition in this case because I don't see a good reason to expose the fields and methods of *Retryer directly.

Contributor Author

I used a pointer here so that I could create the retryer in the writer factory layer and inject it. That way the implementation does not rely on the writers themselves to propagate the settings through to the retryer.
This might be a shortcoming due to my limited understanding of idiomatic Go.
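
To illustrate the trade-off discussed in this thread, here is a minimal sketch. It is not the PR's code: the `Retryer` fields, the `Retry` method, `embeddedWriter`, `composedWriter`, `newComposedWriter` and `errSend` are all simplified, hypothetical stand-ins. It suggests that a pointer created in a factory can be injected through a named field just as well as through an embedded one, and that the difference lies in what gets promoted onto the writer.

```go
package main

import (
	"errors"
	"fmt"
)

// errSend is a hypothetical send failure used for demonstration.
var errSend = errors.New("send failed")

// Retryer is a simplified stand-in for the struct under review.
type Retryer struct {
	MaxRetries int
}

// Retry calls fn once and then up to MaxRetries more times while it fails.
func (r *Retryer) Retry(fn func() error) error {
	err := fn()
	for attempt := 0; err != nil && attempt < r.MaxRetries; attempt++ {
		err = fn()
	}
	return err
}

// embeddedWriter embeds *Retryer, so Retry and MaxRetries are promoted
// and become reachable directly on the writer.
type embeddedWriter struct {
	*Retryer
}

// composedWriter keeps the same pointer in a named, unexported field,
// so callers only see the methods the writer chooses to offer.
type composedWriter struct {
	retryer *Retryer
}

// newComposedWriter shows that the factory can still create the retryer
// and inject it; injection does not require embedding.
func newComposedWriter(r *Retryer) *composedWriter {
	return &composedWriter{retryer: r}
}

// Write delegates the retry decision to the injected retryer.
func (w *composedWriter) Write(send func() error) error {
	return w.retryer.Retry(send)
}

func main() {
	shared := &Retryer{MaxRetries: 2}

	e := embeddedWriter{Retryer: shared}
	fmt.Println("promoted field on embedded writer:", e.MaxRetries)

	c := newComposedWriter(shared)
	attempts := 0
	err := c.Write(func() error { attempts++; return errSend })
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Either form keeps the settings in the factory layer; the named field simply avoids widening the writer's method set.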

)

// RetryWriter wraps a WriteCloser and will retry writes if the first fails.
type Retryer struct {
Member

Why not just represent Retryer as some new fields and methods in HTTPSBatchWriter since that's the only writer set to use it?

Member

I don't have a strong opinion on this, but I think I lean toward just putting all this code into HTTPSBatchWriter so that it's all more self-contained.

Contributor Author

I thought about that as well, but introducing the retryer the way I built it here has more than one benefit:

  1. Retrying after stringifying the rfsyslog message seemed beneficial performance-wise.
  2. Connection-aware retry logic for the TLS-based writers might resolve connectivity issues more efficiently, since retries are only needed for connectivity issues.
  3. The problem described in "the retry_writer retries parsing issues for syslog messages #612" is also present for the TLS/TCP writers. This could be fixed with a dedicated PR.
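
The second and third points above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: treating any error that satisfies the standard library's `net.Error` interface as retryable is an assumption made here, and `isRetryable`, `retryNetworkOnly`, `newDialError` and `errParse` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// errParse is a hypothetical stand-in for a message parsing failure.
var errParse = errors.New("malformed syslog message")

// newDialError builds a network-level error for demonstration purposes.
func newDialError() error {
	return &net.OpError{Op: "dial", Net: "tcp", Err: errors.New("connection refused")}
}

// isRetryable reports whether err looks like a connectivity problem.
// Assumption: anything implementing net.Error counts as retryable.
func isRetryable(err error) bool {
	var netErr net.Error
	return errors.As(err, &netErr)
}

// retryNetworkOnly runs op up to attempts times, but stops immediately on
// success or on an error that is not a connectivity problem.
// It returns the number of calls made and the last error.
func retryNetworkOnly(attempts int, op func() error) (int, error) {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for n := 1; n <= attempts; n++ {
		err = op()
		if err == nil || !isRetryable(err) {
			return n, err
		}
	}
	return attempts, err
}

func main() {
	n, err := retryNetworkOnly(3, func() error { return errParse })
	fmt.Println("parse failure calls:", n, "err:", err)

	n, err = retryNetworkOnly(3, func() error { return newDialError() })
	fmt.Println("dial failure calls:", n, "err:", err)
}
```

A classification step like this would keep parsing failures from being retried, whichever writer hosts the retry loop.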

src/pkg/egress/syslog/retryer.go (resolved)
Labels
None yet
Projects
Status: Waiting for Changes | Open for Contribution
2 participants