
[enhancement](plugin) logstash: add retry queue without blocking tasks #44999

Open · wants to merge 7 commits into master
Conversation

joker-star-l (Contributor) commented Dec 4, 2024

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

  1. Retry tasks now enter a separate retry queue and no longer block the normal queue;
  2. Failures caused by data quality issues are not retried.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

doris-robot commented:

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error messages) and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was modified, and what impacts it might have.
  3. What features were added and why.
  4. Which code was refactored and why.
  5. Which functions were optimized, and what is the difference before and after the optimization?

joker-star-l (Contributor, Author) commented:

run buildall

qidaye previously approved these changes Dec 6, 2024

qidaye (Contributor) left a comment:

LGTM

@github-actions bot added the "approved" label (indicates the PR has been approved by one committer) on Dec 6, 2024

github-actions bot commented Dec 6, 2024:

PR approved by at least one committer and no changes requested.

github-actions bot commented Dec 6, 2024:

PR approved by anyone and no changes requested.

end

def run
  @retry_queue << @event
A reviewer (Contributor) commented:

  1. Is there any limit on the length of retry_queue?
  2. Why not just add the event to retry_queue directly, instead of using a task and a timer?

joker-star-l (Contributor, Author) replied:

  1. No, because there won't be too many failures.
  2. Because we need a timer to make the event wait for a certain amount of time before it is re-queued and retried, rather than retrying immediately.
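
For illustration, a minimal sketch of that pattern under JRuby (Logstash runs on the JVM, so java.util classes are directly reachable). The class name, the constructor's argument shape, and the body of run come from snippets elsewhere in this thread; releasing the counting slot inside run is an assumption:

    # A TimerTask subclass that java.util.Timer can schedule; when the
    # backoff delay elapses, run() puts the failed batch back on the
    # retry queue, so waiting never blocks the normal pipeline.
    class RetryTimerTask < java.util.TimerTask
      def initialize(retry_queue, count_block_queue, event)
        super()
        @retry_queue = retry_queue
        @count_block_queue = count_block_queue
        @event = event
      end

      # Invoked on the Timer's daemon thread after the scheduled delay.
      def run
        @retry_queue << @event
        @count_block_queue.poll  # assumed: release the slot acquired at scheduling time
      end
    end

    # Scheduling side (delay in milliseconds):
    # @timer.schedule(RetryTimerTask.new(@retry_queue, @count_block_queue, event), sleep_for * 1000)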

extension/logstash/lib/logstash/outputs/doris.rb (outdated review thread, resolved)
joker-star-l (Contributor, Author) commented:

run buildall

@github-actions bot removed the "approved" label on Dec 10, 2024
joker-star-l (Contributor, Author) commented:

run buildall

joker-star-l (Contributor, Author) commented:

run buildall

joker-star-l (Contributor, Author) commented:

run buildall

qidaye previously approved these changes Dec 24, 2024

@github-actions bot added the "approved" label on Dec 24, 2024

github-actions bot commented:

PR approved by at least one committer and no changes requested.

# Run the named Timer as a daemon thread
@timer = java.util.Timer.new("Doris Output #{self.params['id']}", true)
# The Timer's internal queue is unbounded and uncontrollable, so use a
# separate bounded queue to cap the number of pending retry tasks
@count_block_queue = java.util.concurrent.ArrayBlockingQueue.new(128)
A reviewer (Contributor) commented:

I think you should use LinkedBlockingQueue here, since inserts and deletes are very frequent.
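
For reference, the suggested swap is a one-line change (a sketch; whether it matters here is debatable, since the queue is small and bounded):

    # LinkedBlockingQueue allocates a node per element but uses separate
    # put/take locks, so producers and consumers do not contend with each
    # other; ArrayBlockingQueue is a preallocated ring buffer guarded by
    # a single lock. Both JDK classes are usable from JRuby as shown:
    @count_block_queue = java.util.concurrent.LinkedBlockingQueue.new(128)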

-  documents << event_body(event) << "\n"
-  event_num += 1
-end
+documents = events.map { |event| event_body(event) }.join("\n")
A reviewer (Contributor) commented:

just refactor?

joker-star-l (Contributor, Author) replied:

It reduces the number of string allocations.
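
To illustrate the point (a sketch; event_body is the plugin's serializer and is assumed here to return a String per event):

    # Old: grow one buffer with repeated appends; the buffer may be
    # reallocated several times as it outgrows its capacity, and a
    # trailing "\n" is left after the last event.
    documents = ""
    event_num = 0
    events.each do |event|
      documents << event_body(event) << "\n"
      event_num += 1
    end

    # New: build all the pieces first, then join them in a single pass;
    # note that join also omits the old trailing newline.
    documents = events.map { |event| event_body(event) }.join("\n")
    event_num = events.size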

begin
  response_json = JSON.parse(response.body)
rescue => _
  @logger.warn("doris stream load response: #{response} is not a valid JSON")
A reviewer (Contributor) commented:

It should return or do something else here instead of just continuing.
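
One way to read the suggestion (a sketch): treat an unparseable body as a failure and bail out, rather than falling through with response_json unset:

    begin
      response_json = JSON.parse(response.body)
    rescue => _
      @logger.warn("doris stream load response: #{response} is not a valid JSON")
      # Stop here: the code below reads fields such as
      # response_json['Message'] and would fail on nil.
      return
    end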


@logger.warn("FAILED doris stream load response:\n#{response}")
# if there are data quality issues, we do not retry
if (status == 'Fail' && response_json['Message'].start_with?("[DATA_QUALITY_ERROR]")) || (@max_retries >= 0 && req_count > @max_retries)
A reviewer (Contributor) commented:

It's not necessary to do a special check for DATA_QUALITY_ERROR, since it should be handled by the max_filter_ratio config.

req_count += 1
@logger.warn("Will do retry #{req_count} after #{sleep_for} secs.")
timer_task = RetryTimerTask.new(@retry_queue, @count_block_queue, [documents, http_headers, event_num, req_count])
@count_block_queue.put(0)
A reviewer (Contributor) commented:

Why put 0 into count_block_queue?
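
The author did not answer this in the visible thread. A plausible reading, given the comment where the queue is created, is that the bounded queue acts as a counting semaphore and the 0 is only a placeholder token (an assumption, not confirmed in the PR):

    # put blocks once 128 tokens are outstanding, which caps how many
    # retry tasks can sit in the Timer's otherwise unbounded queue;
    # the value 0 itself carries no meaning.
    @count_block_queue.put(0)                      # acquire a slot; may block
    @timer.schedule(timer_task, sleep_for * 1000)  # fires after the backoff
    # ...and inside the task, after re-enqueueing the event:
    # @count_block_queue.poll                      # release the slot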

joker-star-l (Contributor, Author) commented:

run buildall

@github-actions bot removed the "approved" label on Dec 27, 2024
req_count += 1
@logger.warn("Will do the #{req_count-1}th retry after #{sleep_for} secs.")
delay_event = DelayEvent.new(sleep_for, [documents, http_headers, event_num, req_count])
add_event_to_retry_queue(delay_event, req_count <= 1)
A reviewer (Contributor) commented:

Why block only when req_count <= 1?
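
The implementation of add_event_to_retry_queue is not shown in the thread; the following is a hypothetical sketch of what the flag might control, with one plausible rationale: first-time failures (req_count <= 1) arrive on pipeline worker threads, where blocking exerts backpressure on new input, while later retries come from the retry machinery itself, which must not block on its own queue or the retry loop could stall:

    # Hypothetical; everything beyond the method name is an assumption.
    def add_event_to_retry_queue(delay_event, block = true)
      if block
        @count_block_queue.put(0)    # wait for a free slot (backpressure)
      else
        @count_block_queue.offer(0)  # never blocks, so retries always proceed
      end
      @retry_queue.add(delay_event)
    end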

@@ -72,6 +73,7 @@ class LogStash::Outputs::Doris < LogStash::Outputs::Base

config :log_progress_interval, :validate => :number, :default => 10

config :retry_queue_size, :validate => :number, :default => 128
A reviewer (Contributor) commented:

128 may be too large if the request batch size is large, e.g. 100 MB. So you should limit by queued bytes instead of item count.
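
A sketch of the byte-based limit the reviewer suggests (all names here are hypothetical, not from the PR; @bytes_mutex is a Mutex and @bytes_freed a ConditionVariable, and the drain side would decrement @queued_bytes and signal @bytes_freed after each load):

    # Admit retries by payload size rather than by count, so a few
    # 100 MB batches cannot pin gigabytes of memory in the queue.
    config :retry_queue_bytes, :validate => :number, :default => 128 * 1024 * 1024

    def add_event_to_retry_queue(delay_event)
      @bytes_mutex.synchronize do
        while @queued_bytes + delay_event.bytesize > @retry_queue_bytes
          @bytes_freed.wait(@bytes_mutex)  # block until a batch drains
        end
        @queued_bytes += delay_event.bytesize
      end
      @retry_queue.add(delay_event)
    end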

sleep(sleep_rand)
@logger.warn("FAILED doris stream load response:\n#{response}")
# if there are data quality issues, we do not retry
if (status == 'Fail' && response_json['Message'].start_with?("[DATA_QUALITY_ERROR]")) || (@max_retries >= 0 && req_count-1 > @max_retries)
A reviewer (Contributor) commented:

DATA_QUALITY_ERROR should be handled by setting max_filter_ratio instead of being hard-coded here.
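
For context, Doris stream load accepts a max_filter_ratio header that tolerates a fraction of filtered (bad-quality) rows per load. A sketch of setting it on the request instead of special-casing the error (how http_headers is assembled is assumed from the snippets above):

    # Tolerate up to 10% of rows being filtered for data quality; beyond
    # that the load fails and the normal retry policy applies.
    http_headers["max_filter_ratio"] = "0.1"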

begin
  response_json = JSON.parse(response.body)
rescue => _
  @logger.warn("doris stream load response is not a valid JSON:\n#{response}")
A reviewer (Contributor) commented:

Do more exception handling here instead of just logging a warning.
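
One possible shape of that handling (a sketch; DelayEvent and add_event_to_retry_queue are taken from the earlier snippets, their exact signatures assumed):

    begin
      response_json = JSON.parse(response.body)
    rescue => _
      @logger.warn("doris stream load response is not a valid JSON:\n#{response}")
      # Treat an unparseable body like any other failed attempt: schedule
      # a retry (or route the batch to a dead-letter path) and return,
      # instead of reading response_json fields below.
      delay_event = DelayEvent.new(sleep_for, [documents, http_headers, event_num, req_count + 1])
      add_event_to_retry_queue(delay_event, req_count <= 1)
      return
    end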
