Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[backport to 2.x] backport PR 1316 and fix flaky test #1319

Merged
merged 1 commit into from
Sep 30, 2024

Conversation

kaituo
Copy link
Collaborator

@kaituo kaituo commented Sep 26, 2024

Description

This PR backport #1316 to 2.x

Also, there is a flaky test org.opensearch.ad.e2e.RealTimeRuleModelPerfIT.testRule exposed by the PR. The test intermittently fails with an "incorrect ordering of time" error:

? ERROR][o.o.t.m.ModelManager     ] [integTest-0] Fail to score for [componentName=Phoenix] at [1695121800]: model Id [SJJ-MJIBXTsDe_KkqmOm_entity_xdZvcSGAl1UW0rzpOU5XxA], feature [[6810.0]]
?  java.lang.IllegalArgumentException: incorrect ordering of time
?  	at com.amazon.randomcutforest.CommonUtils.checkArgument(CommonUtils.java:44) ~[randomcutforest-core-4.1.0.jar:?]
?  	at com.amazon.randomcutforest.preprocessor.ImputePreprocessor.getScaledShingledInput(ImputePreprocessor.java:59) ~[randomcutforest-core-4.1.0.jar:?]
?  	at com.amazon.randomcutforest.parkservices.ThresholdedRandomCutForest.initialSetup(ThresholdedRandomCutForest.java:474) ~[randomcutforest-parkservices-4.1.0.jar:?]
?  	at com.amazon.randomcutforest.parkservices.ThresholdedRandomCutForest.augment(ThresholdedRandomCutForest.java:213) ~[randomcutforest-parkservices-4.1.0.jar:?]
?  	at com.amazon.randomcutforest.parkservices.ThresholdedRandomCutForest.process(ThresholdedRandomCutForest.java:261) ~[randomcutforest-parkservices-4.1.0.jar:?]
?  	at org.opensearch.timeseries.ml.ModelManager.score(ModelManager.java:182) ~[opensearch-anomaly-detection-2.18.0.0-SNAPSHOT.jar:2.18.0.0-SNAPSHOT]
?  	at org.opensearch.timeseries.ml.ModelManager.score(ModelManager.java:155) [opensearch-anomaly-detection-2.18.0.0-SNAPSHOT.jar:2.18.0.0-SNAPSHOT]
?  	at org.opensearch.timeseries.ml.ModelManager.getResult(ModelManager.java:103) [opensearch-anomaly-detection-2.18.0.0-SNAPSHOT.jar:2.18.0.0-SNAPSHOT]
?  	at org.opensearch.timeseries.ml.RealTimeInferencer.tryProcess(RealTimeInferencer.java:139) [opensearch-anomaly-detection-2.18.0.0-SNAPSHOT.jar:2.18.0.0-SNAPSHOT]
?  	at org.opensearch.timeseries.ml.RealTimeInferencer.processWithTimeout(RealTimeInferencer.java:112) [opensearch-anomaly-detection-2.18.0.0-SNAPSHOT.jar:2.18.0.0-SNAPSHOT]
?  	at org.opensearch.timeseries.ml.RealTimeInferencer.lambda$processWithTimeout$1(RealTimeInferencer.java:126) [opensearch-anomaly-detection-2.18.0.0-SNAPSHOT.jar:2.18.0.0-SNAPSHOT]
?  	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:946) [opensearch-2.18.0-SNAPSHOT.jar:2.18.0-SNAPSHOT]
?  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
?  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
?  	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
? WARN ][o.o.t.m.RealTimeInferencer] [integTest-0] incorrect ordering of time for config SJJ-MJIBXTsDe_KkqmOm model SJJ-MJIBXTsDe_KkqmOm_entity_xdZvcSGAl1UW0rzpOU5XxA at data end time 1695121800000

Root Cause:

  • When a lock is held during processing, retries are scheduled.
  • In the meantime, another thread with a future sample might acquire the lock and complete scoring.
  • As a result, when the retry occurs, it processes an older sample after a newer one, causing the time ordering error.

Fix Implemented:

  1. Sample Ordering:

    • Introduced a PriorityQueue<Sample> in RealTimeInferencer to maintain samples ordered by their data end time.
    • Ensures samples are processed in chronological order, preventing out-of-order processing when retries occur.
  2. Time Comparison Adjustment:

    • Previously, we imputed data when the current execution end time was larger than the model state's last seen execution end time.
    • In test environments, data end times can be set in the future, causing discrepancies because the last seen execution time is based on the current clock.
    • Updated the logic to compare the current data end time with the model state's last seen data end time.
    • This aligns the comparison with the source of the timestamps, ensuring consistent behavior even with future-dated samples in tests.

Detailed changes can be found here: https://github.com/opensearch-project/anomaly-detection/compare/3445d6d7363f31c936ad2949796952c8f01c5145..519640ff228b97ff6f221821ec55272079cce882

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

codecov bot commented Sep 26, 2024

Codecov Report

Attention: Patch coverage is 82.69231% with 18 lines in your changes missing coverage. Please review.

Project coverage is 79.59%. Comparing base (cbe1f33) to head (519640f).
Report is 3 commits behind head on 2.x.

Files with missing lines Patch % Lines
...g/opensearch/timeseries/ml/RealTimeInferencer.java 78.04% 7 Missing and 2 partials ⚠️
.../rest/handler/AbstractTimeSeriesActionHandler.java 25.00% 1 Missing and 2 partials ⚠️
.../org/opensearch/timeseries/util/ExpiringValue.java 83.33% 2 Missing ⚠️
.../handler/AbstractAnomalyDetectorActionHandler.java 50.00% 0 Missing and 1 partial ⚠️
...search/ad/transport/ADHCImputeTransportAction.java 66.66% 0 Missing and 1 partial ⚠️
...opensearch/forecast/ForecastTaskProfileRunner.java 66.66% 0 Missing and 1 partial ⚠️
.../rest/handler/AbstractForecasterActionHandler.java 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##                2.x    #1319      +/-   ##
============================================
+ Coverage     77.63%   79.59%   +1.95%     
- Complexity     5465     5626     +161     
============================================
  Files           533      534       +1     
  Lines         23334    23408      +74     
  Branches       2311     2322      +11     
============================================
+ Hits          18115    18631     +516     
+ Misses         4160     3669     -491     
- Partials       1059     1108      +49     
Flag Coverage Δ
plugin 79.59% <82.69%> (+1.95%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ava/org/opensearch/ad/ml/ADRealTimeInferencer.java 100.00% <ø> (ø)
src/main/java/org/opensearch/ad/model/ADTask.java 98.00% <100.00%> (+<0.01%) ⬆️
.../java/org/opensearch/ad/model/AnomalyDetector.java 86.33% <100.00%> (+0.25%) ⬆️
...search/ad/rest/RestIndexAnomalyDetectorAction.java 100.00% <100.00%> (ø)
...search/forecast/ml/ForecastRealTimeInferencer.java 100.00% <ø> (ø)
...va/org/opensearch/forecast/model/ForecastTask.java 98.00% <100.00%> (+<0.01%) ⬆️
...java/org/opensearch/forecast/model/Forecaster.java 78.48% <100.00%> (+0.38%) ⬆️
...earch/forecast/rest/RestIndexForecasterAction.java 79.48% <100.00%> (+9.21%) ⬆️
...ensearch/timeseries/TimeSeriesAnalyticsPlugin.java 99.36% <100.00%> (+<0.01%) ⬆️
...g/opensearch/timeseries/caching/PriorityCache.java 69.54% <100.00%> (ø)
... and 15 more

... and 50 files with indirect coverage changes

@kaituo kaituo changed the title [backport to 2.x] Support forecast tasks in profile API; enable index field modifications [backport to 2.x] backport PR 1316 and fix flaky test Sep 27, 2024
@kaituo kaituo merged commit f532a14 into opensearch-project:2.x Sep 30, 2024
28 of 29 checks passed
kaituo added a commit to kaituo/anomaly-detection-1 that referenced this pull request Oct 1, 2024
kaituo added a commit to kaituo/anomaly-detection-1 that referenced this pull request Oct 1, 2024
kaituo added a commit to kaituo/anomaly-detection-1 that referenced this pull request Oct 1, 2024
kaituo added a commit that referenced this pull request Oct 1, 2024
kaituo added a commit to kaituo/anomaly-detection-1 that referenced this pull request Oct 2, 2024
jackiehanyang pushed a commit that referenced this pull request Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants