Too many time series warning #2901
Conversation
Force-pushed from 43a0ba2 to 8509d21
Codecov Report
@@ Coverage Diff @@
## master #2901 +/- ##
==========================================
+ Coverage 76.84% 76.94% +0.09%
==========================================
Files 228 228
Lines 16988 17018 +30
==========================================
+ Hits 13055 13094 +39
+ Misses 3087 3081 -6
+ Partials 846 843 -3
Flags with carried forward coverage won't be shown.
... and 6 files with indirect coverage changes
Nice work 👍
I have some suggestions, particularly around testing.
metrics/engine/ingester_test.go (Outdated)
ingester.seenTimeSeries[metrics.TimeSeries{
    Metric: testMetric,
    Tags:   piState.Registry.RootTagSet().With("a", "2"),
}] = struct{}{}
ingester.warnTimeSeriesLimit()
I think this test is OK as a warnTimeSeriesLimit() unit test, but you're not testing that flushMetrics() will actually run this check and emit the warning. So I'd rather see a higher-level test at the output level, but feel free to keep this one as well.
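For illustration, a higher-level test along those lines might look roughly like the sketch below. It assumes the warning goes through a logrus logger that can be captured with a test hook; newTestOutputIngester and the limit of 2 are hypothetical helpers, not code from this PR:

// Sketch only; assumes the usual test imports (testing, testify's assert/require,
// logrus and logrus/hooks/test, go.k6.io/k6/metrics) and a hypothetical
// newTestOutputIngester helper that wires up the logger and a low limit.
func TestFlushMetricsWarnsOnTooManyTimeSeries(t *testing.T) {
    t.Parallel()

    logger, hook := test.NewNullLogger() // captures emitted log entries
    ingester := newTestOutputIngester(t, logger, 2 /* assumed time series limit */)

    // Add samples for three distinct time series, one more than the assumed limit.
    for _, v := range []string{"a", "b", "c"} {
        ingester.AddMetricSamples([]metrics.SampleContainer{metrics.Sample{
            TimeSeries: metrics.TimeSeries{
                Metric: testMetric,
                Tags:   piState.Registry.RootTagSet().With("tag", v),
            },
            Value: 1,
        }})
    }

    ingester.flushMetrics()

    // Expect a single warning mentioning the time series count.
    require.Len(t, hook.Entries, 1)
    assert.Equal(t, logrus.WarnLevel, hook.Entries[0].Level)
    assert.Contains(t, hook.Entries[0].Message, "time series")
}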
This is still not covered. Can you add another test that calls flushMetrics() and confirms the warning is logged?
Added
Force-pushed from b6856fc to d81ad67
@imiric @na-- I pushed a HyperLogLog implementation; the last commit contains a benchmark comparing a map with a potential HyperLogLog. The map uses ~4x more B/op. That saving is good, but it comes at a cost: the obvious added complexity of the implementation, and also a different CPU effort.
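For readers without access to that commit, a benchmark of that shape could look roughly like the sketch below. It assumes the github.com/axiomhq/hyperloglog package and a simple string key; the actual benchmark in the commit may be structured differently:

// Rough map-vs-HLL cardinality benchmark sketch (imports: testing, strconv,
// github.com/axiomhq/hyperloglog). Key shape and sizes are assumptions.
func BenchmarkSeenTimeSeriesMap(b *testing.B) {
    b.ReportAllocs()
    seen := make(map[string]struct{})
    for i := 0; i < b.N; i++ {
        seen[strconv.Itoa(i%100000)] = struct{}{} // exact count via len(seen)
    }
}

func BenchmarkSeenTimeSeriesHLL(b *testing.B) {
    b.ReportAllocs()
    sketch := hyperloglog.New14() // fixed-size sketch, approximate count via sketch.Estimate()
    for i := 0; i < b.N; i++ {
        sketch.Insert([]byte(strconv.Itoa(i % 100000)))
    }
}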
Force-pushed from d81ad67 to 5c7058f
Nicely done! Almost LGTM, but I think we're missing a flushMetrics() test.
The HLL lib is a large dependency, but I agree that it's worth it in this case.
metrics/engine/ingester_test.go (Outdated)
ingester.seenTimeSeries[metrics.TimeSeries{
    Metric: testMetric,
    Tags:   piState.Registry.RootTagSet().With("a", "2"),
}] = struct{}{}
ingester.warnTimeSeriesLimit()
This is still not covered. Can you add another test that calls flushMetrics() and confirms the warning is logged?
Sorry for the late review 😞

Honestly, I think I am leaning towards the map solution, despite being the one to propose HLL... 🤔 For 100k elements, we get 4x the memory usage, but that is still only 3MB, and it's faster. For 1 million elements, we get a slightly better result, 50MB for a map vs 8MB for HLL (and HLL is a bit faster). But 50MB is still basically nothing, especially compared to the actual memory usage for the strings for these 1M time series, which will probably be several GB 😅 I don't see a situation where the memory usage of this counter would even remotely matter, compared to everything else.

If we were already using a HLL library for something else in the k6 codebase, it might be worth it to use it here as well. But I don't think it's worth the huge dependency just for this. Besides, the map solution provides us with an exact count of the time series, while the HLL variant will only have an approximate count, so its UX would be a bit worse.

TLDR: I vote for the map
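As a rough sanity check of the 50MB figure, a back-of-the-envelope calculation (the constants below are assumptions for illustration, not measurements from this PR):

package main

import "fmt"

// Assumes a metrics.TimeSeries key is essentially two pointers (16 bytes on a
// 64-bit platform) and that Go maps add roughly 3x per-entry overhead for
// buckets and hashing; both figures are assumptions.
func main() {
    const entries, keyBytes, overhead = 1_000_000, 16, 3
    fmt.Println(entries*keyBytes*overhead/(1<<20), "MiB") // ~45 MiB, the same ballpark as the ~50MB above
}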
Force-pushed from 7328582 to dc024b6
Force-pushed from dc024b6 to 47e3d08
Force-pushed from c6a1857 to ea235c0
Force-pushed from ea235c0 to 1902a96
// We don't care about overflow;
// the process should already be OOM
// if the number of generated time series goes higher than N-hundred-million(s).
cc.timeSeriesLimit *= 2
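For context in this thread, the surrounding check works conceptually like the sketch below; the field and method names are taken from the snippets quoted here, but the body and warning message are assumed rather than copied from the PR:

// Hypothetical container combining names seen in this thread; the real PR
// may organize these fields differently.
type cardinalityControl struct {
    logger          logrus.FieldLogger
    seenTimeSeries  map[metrics.TimeSeries]struct{}
    timeSeriesLimit int
}

func (cc *cardinalityControl) warnTimeSeriesLimit() {
    if len(cc.seenTimeSeries) <= cc.timeSeriesLimit {
        return
    }
    cc.logger.Warnf("generated %d unique time series, which is above the configured limit of %d",
        len(cc.seenTimeSeries), cc.timeSeriesLimit)
    // Doubling means the warning repeats at 100k, 200k, 400k, ... rather than
    // on every flush; overflow is irrelevant because the process would be OOM
    // long before the counter wraps.
    cc.timeSeriesLimit *= 2
}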
Is the idea that we want them to get the message multiple times as they continue to create more time series?
I am not really against that; I'm just noting that if we forgo this, we can actually stop counting once the limit is hit and even throw away the seen map.
I'm not really certain that users will be better off seeing this at 100k, 200k, 400k, 800k, 1.6m and so on time series.
🤔 yeah, maybe we can show that warning 5 times and then stop tracking the time series count 🤔 not that such an optimization will help very much at 3.2 million time series, but still 😅
I am mostly arguing for doing it only once.
I doubt that, if this warning is ignored or not seen the first time, having it appear 3-4 more times will be much more beneficial, if at all.
But this can definitely be done later.
I'd argue that it would be useful to see it a few times, since the time between warnings would give information about how quickly the number of time series grows, which is impossible to see or guess otherwise with local k6 runs, and not always easy even with an external output.
I think it is also helpful for us, in the case of reports, to see what the common order of magnitude is. We should act differently if users often hit around millions instead of thousands.
Closes #2765
At the moment the implementation doesn't support turning the limit off; we may consider using 0 for that in the future.
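If such an option is added later, the check could treat a zero value as "disabled", along the lines of this sketch (reusing the hypothetical cardinalityControl names from the earlier sketch; this is not code from the PR):

func (cc *cardinalityControl) maybeWarnTimeSeriesLimit() {
    // A timeSeriesLimit of 0 would mean "never warn" (hypothetical future option).
    if cc.timeSeriesLimit > 0 && len(cc.seenTimeSeries) > cc.timeSeriesLimit {
        cc.warnTimeSeriesLimit()
    }
}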