Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[chore][tracker]: save most recent (archive) write index to disk #36799
base: main
Are you sure you want to change the base?
[chore][tracker]: save most recent (archive) write index to disk #36799
Changes from 17 commits
25105a4
b586d84
7cdba33
43c4298
a7d6903
c8a4c51
ab6bdd1
c8171a4
5d5f12d
1a66069
f6f6815
2c578f5
b4adbf5
b9e55ba
a843a3b
106bb5b
844b7fe
b528edf
c185d43
fafb26b
3d7dbe6
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Check failure on line 13 in pkg/stanza/fileconsumer/internal/checkpoint/checkpoint.go
GitHub Actions / govulncheck (pkg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For existing usage, this will be a no-op.
Check failure on line 14 in pkg/stanza/fileconsumer/internal/tracker/tracker.go
GitHub Actions / govulncheck (pkg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea to check for this case.
However, I wonder if we can handle it better than restarting from zero. What would it take to search the archive for the most recently updated?
I think we could maintain some kind of data structure which notes the time each archive was written. Maybe just
map[index]time.Time
. Then when we first create the tracker, we can load this up and find the most recent timestamp. We can also check for the case wherepollsToArchive
has changed and then rewrite the storage to align with the new value.For example, if we previously saved 10 archives and find that
pollsToArchive
is now 5, we can find the 5 most recent indices based on the timestamp structure, then rewrite the archive files so that these are 0-4. We should probably even delete the extras from storage as well.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djaglowski This solution does makes sense to me, but it becomes tricky when we eventually overwrite old archive data, as it is a ring buffer.
We might need to load the filesets in memory.
I'll find a few ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate?
If it's more than one at a time then it defeats the point of the archive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this archive,
We've rolled over once and the latest data is at index
4
andarchiveIndex
(i.e. where the next data will be written) is at index5
.Let's suppose that new
polls_to_archive
is 7.We now need to construct a new, smaller archive with 7 most recent elements.
These elements are (from most recent to least recent):
14, 13, 12, 11, 10, 9, 8
We cannot simply rewrite archive in-place without caching values.
It would be much simpler to convert archive like following image,
and we would delete excess data.
Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would always be data stored at
archiveIndex-1
index. We will storearchiveIndex
on disk, so in next collector run, we would load that value and we can find most recent data.archiveIndex
points at the next location where data will be written.This can point to either of following:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if
pollsToArchive < archiveIndex-1
?Anyways, this is an edge case we can worry about later. I think we have bigger problems to work through first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correcting my previous comment,
Most recently written index is
(archiveIndex-1) % previouspollsToArchive
Check failure on line 10 in pkg/stanza/operator/persister.go
GitHub Actions / govulncheck (pkg)
Check failure on line 10 in pkg/stanza/operator/persister.go
GitHub Actions / govulncheck (processor-0)
Check failure on line 10 in pkg/stanza/operator/persister.go
GitHub Actions / govulncheck (receiver-1)
Check failure on line 10 in pkg/stanza/operator/persister.go
GitHub Actions / govulncheck (exporter-1)
Check failure on line 10 in pkg/stanza/operator/persister.go
GitHub Actions / govulncheck (receiver-0)
Check failure on line 10 in pkg/stanza/operator/persister.go
GitHub Actions / govulncheck (receiver-3)
Check failure on line 11 in pkg/stanza/testutil/util.go
GitHub Actions / govulncheck (pkg)