Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

output: Retry document-level 429s by default #13620

Merged

Conversation

marclop
Copy link
Contributor

@marclop marclop commented Jul 8, 2024

Motivation/summary

Updates the APM Server to automatically retry document-level 429s from Elasticsearch to avoid dropping data. It can be configured/overwritten by output.elasticsearch.max_retries, and defaults to 3.

It uses the default backoff configuration, which could wait up to 1m if enough retries are configured, but can be overwritten as well.

This reduces data loss whenever Elasticsearch is overhelmed and reduces the chances of data loss due to ES push-back.

Checklist

How to test these changes

This may need a bit of setup. It's thoroughly tested in the go-docappender, but we may want to verify that retries happen when document-level 429s happen.

I'd suggest writing a small "proxy" that returns 429s randomly and test that things are indexed (as duplicates).

Related issues

Closes #13619

Updates the APM Server to automatically retry document-level `429`s from
Elasticsearch to avoid dropping data. It can be configured/overwritten
by `output.elasticsearch.max_retries`, and defaults to `3`.

It uses the default backoff configuration, which could wait up to 1m if
enough retries are configured, but can be overwritten as well.

Signed-off-by: Marc Lopez Rubio <[email protected]>
@marclop marclop requested a review from a team as a code owner July 8, 2024 06:51
Copy link
Contributor

mergify bot commented Jul 8, 2024

This pull request does not have a backport label. Could you fix it @marclop? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Jul 8, 2024
Copy link
Member

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retrying on document level 429 is the default if RetryOnDcoumentStatus is empty: https://github.com/elastic/go-docappender/blob/main/config.go#L76

@marclop
Copy link
Contributor Author

marclop commented Jul 8, 2024

@carsonip yes, I'm aware. I wanted to make it explicit to avoid any potential breaking changes going forward. It's also clearer when reading the code what will happen.

Copy link
Member

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this PR is a no-op, but I was wrong. MaxDocumentRetries was 0 and it is now non-zero. Changes lgtm but it will need a changelog.

Signed-off-by: Marc Lopez Rubio <[email protected]>
changelogs/head.asciidoc Outdated Show resolved Hide resolved
@marclop marclop added backport-8.15 Automated backport with mergify and removed backport-skip Skip notification from the automated backport with mergify labels Jul 8, 2024
@marclop marclop merged commit 28436dc into elastic:main Jul 8, 2024
14 checks passed
@marclop marclop deleted the f/retry-document-level-429s-by-default branch July 8, 2024 08:56
mergify bot pushed a commit that referenced this pull request Jul 8, 2024
Updates the APM Server to automatically retry document-level `429`s from
Elasticsearch to avoid dropping data. It can be configured/overwritten
by `output.elasticsearch.max_retries`, and defaults to `3`.

It uses the default backoff configuration, which could wait up to 1m if
enough retries are configured, but can be overwritten as well.

---------

Signed-off-by: Marc Lopez Rubio <[email protected]>
Co-authored-by: Carson Ip <[email protected]>
(cherry picked from commit 28436dc)

# Conflicts:
#	changelogs/head.asciidoc
mergify bot added a commit that referenced this pull request Jul 15, 2024
… (#13621)

* output: Retry document-level `429`s by default (#13620)

Updates the APM Server to automatically retry document-level `429`s from
Elasticsearch to avoid dropping data. It can be configured/overwritten
by `output.elasticsearch.max_retries`, and defaults to `3`.

It uses the default backoff configuration, which could wait up to 1m if
enough retries are configured, but can be overwritten as well.

---------

Signed-off-by: Marc Lopez Rubio <[email protected]>
Co-authored-by: Carson Ip <[email protected]>
(cherry picked from commit 28436dc)

# Conflicts:
#	changelogs/head.asciidoc

* Fix conflicts, update changelog

Signed-off-by: Marc Lopez Rubio <[email protected]>

---------

Signed-off-by: Marc Lopez Rubio <[email protected]>
Co-authored-by: Marc Lopez Rubio <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.15 Automated backport with mergify enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable document-level retries by default
2 participants