Fix metrics data race in the Engine test run finalization #1888

Merged
merged 1 commit into from
Mar 8, 2021

Member

@na-- na-- commented Mar 8, 2021

There was a race between these two branches of the select: https://github.com/loadimpact/k6/blob/f1413e601b021d19f06d4d72ee14714a46344162/core/engine.go#L330-L339

I didn't add a test, since apparently the xk6 one I added in #1885 already covers it - yay for integration tests... 🎉 😅 This should close #1887
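
For readers unfamiliar with this class of bug, here is a minimal sketch of how such a race can arise. It is not the k6 code: `collectRacy`, `collectDrained`, `fill`, `samples`, and `done` are hypothetical names used only for illustration. When a buffered channel and a done signal are both ready, Go's `select` picks one of the ready cases pseudo-randomly, so the done branch can win while samples are still buffered. Emptying the channel with a non-blocking loop after the done signal avoids losing them.

```go
package main

import "fmt"

// collectRacy reads from samples until done is signalled. If done is ready
// while samples still holds buffered values, select may choose the done case
// and the remaining samples are never read.
func collectRacy(samples <-chan int, done <-chan struct{}) []int {
	var got []int
	for {
		select {
		case s := <-samples:
			got = append(got, s)
		case <-done:
			return got
		}
	}
}

// collectDrained does the same, but once done is signalled it first empties
// whatever is still buffered, using a non-blocking receive.
func collectDrained(samples <-chan int, done <-chan struct{}) []int {
	var got []int
	for {
		select {
		case s := <-samples:
			got = append(got, s)
		case <-done:
			for {
				select {
				case s := <-samples:
					got = append(got, s)
				default:
					return got // channel is empty, nothing was lost
				}
			}
		}
	}
}

// fill returns a channel pre-loaded with n samples and an already closed done
// channel, so both select cases are ready from the very first iteration.
func fill(n int) (chan int, chan struct{}) {
	samples := make(chan int, n)
	for i := 0; i < n; i++ {
		samples <- i
	}
	done := make(chan struct{})
	close(done)
	return samples, done
}

func main() {
	s1, d1 := fill(100)
	// The racy count varies from run to run and is usually well below 100.
	fmt.Println("racy:   ", len(collectRacy(s1, d1)))

	s2, d2 := fill(100)
	fmt.Println("drained:", len(collectDrained(s2, d2)))
}
```

The drained variant relies on nothing sending to the channel after the done signal; with a producer that keeps sending, the inner loop would not terminate.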

@na-- na-- added this to the v0.31.0 milestone Mar 8, 2021
@na-- na-- requested review from mstoykov and imiric March 8, 2021 07:35
Comment on lines +331 to +339
getCachedMetrics:
	for {
		select {
		case sc := <-e.Samples:
			sampleContainers = append(sampleContainers, sc)
		default:
			break getCachedMetrics
		}
	}
Contributor

From the looks of things, this is only called after all the VUs have stopped, so it should at least not be possible to get an endless loop because something keeps adding samples.

Obviously, this won't prevent some extension or badly written internal code from continuing to send metrics from a separate goroutine without checking the context, but I don't think that is such a big problem.

I would still somewhat prefer a timeout here, just in case we are wrong or something changes in the future. But I think this can wait.

Member Author

I am not convinced a timeout is a good idea here... It shouldn't be needed, and we don't have timeouts for other things that process metrics and might be overwhelmed, so this probably shouldn't have one either. That said, I added this issue and mentioned the potential problem in it: #1889

But for now, unless we observe some issues stemming from this, I'm in favor of not touching it 😅

Contributor

@imiric imiric left a comment

I can't think of a more elegant way to flush e.Samples, though I agree with Mihail that a timeout would be good to have.

@imiric imiric merged commit 2036bae into master Mar 8, 2021
@imiric imiric deleted the fix-summary-race branch March 8, 2021 09:03
Development

Successfully merging this pull request may close these issues.

handleSummary called before all metrics are crunched
3 participants