feat: split large incremental snapshots #1307
Conversation
Size Change: +4.3 kB (+0.37%) Total Size: 1.18 MB
@@ -165,27 +166,154 @@ export function truncateLargeConsoleLogs(_event: eventWithTime) {

export const SEVEN_MEGABYTES = 1024 * 1024 * 7 * 0.9 // ~7mb (with some wiggle room)

// recursively splits large buffers into smaller ones
function sliceList(list: any[], sizeLimit: number): any[][] {
If the list length is 2 and both items are bigger than the limit, what does this do?
Should we possibly try recursively slicing the chunks in https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5R178 as we used to do in https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5L172-L188
So: I used to split in two, and then recursively split each half in two until every part was below the size limit.
Now we calculate how many chunks we need so that every chunk is under the limit, and split into that many chunks in one pass.
(Recursion always feels too "clever" to me :))
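The chunk-count approach described above can be sketched roughly like this. This is a minimal sketch, not the PR's actual code: `estimateSize` here is a hypothetical stand-in that uses JSON string length, and the chunk arithmetic is illustrative.

```typescript
// Hypothetical stand-in for the library's size estimator: JSON byte-ish length.
function estimateSize(x: unknown): number {
    return JSON.stringify(x).length
}

// Split `list` into enough roughly equal-length chunks that each chunk's
// estimated size falls under `sizeLimit`, instead of recursively halving.
function sliceList<T>(list: T[], sizeLimit: number): T[][] {
    const size = estimateSize(list)
    if (size <= sizeLimit || list.length <= 1) {
        return [list]
    }
    // how many chunks we need so that every chunk is (roughly) under the limit
    const chunkCount = Math.min(Math.ceil(size / sizeLimit), list.length)
    const chunkLength = Math.ceil(list.length / chunkCount)
    const chunks: T[][] = []
    for (let i = 0; i < list.length; i += chunkLength) {
        chunks.push(list.slice(i, i + chunkLength))
    }
    return chunks
}
```

Note the edge case raised above: with two items that are each individually over the limit, this splits into two single-item chunks that are still oversized, since a single item can never be split further at this level.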
function sliceList(list: any[], sizeLimit: number): any[][] {
    const size = estimateSize(list)
We compute the size in this function but then always need it again in the calling functions. Two examples:
https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5R187
https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5R243
If estimating the size is an expensive operation, perhaps we should return the value so it doesn't need to be recomputed.
I think this has either changed enough that the comment no longer applies, or I don't understand the comment 🙈 🤣
It's not a biggie; I just noticed that we estimate the size while slicing the list but then re-estimate it in https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5R184-R186. Could possibly just return the already-estimated size.
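One way to address this review note, sketched under assumptions: return the already-computed estimate alongside the chunks so callers don't re-estimate. The `SlicedList` shape, `sliceListWithSize` name, and JSON-length `estimateSize` are invented here for illustration, not taken from the PR.

```typescript
// Hypothetical result shape carrying the estimate with the chunks.
interface SlicedList<T> {
    size: number
    chunks: T[][]
}

// Illustrative stand-in estimator: JSON string length.
function estimateSize(x: unknown): number {
    return JSON.stringify(x).length
}

// Same chunk-count splitting as before, but the size computed here is
// returned so the caller can reuse it instead of recomputing.
function sliceListWithSize<T>(list: T[], sizeLimit: number): SlicedList<T> {
    const size = estimateSize(list)
    if (size <= sizeLimit || list.length <= 1) {
        return { size, chunks: [list] }
    }
    const chunkCount = Math.min(Math.ceil(size / sizeLimit), list.length)
    const chunkLength = Math.ceil(list.length / chunkCount)
    const chunks: T[][] = []
    for (let i = 0; i < list.length; i += chunkLength) {
        chunks.push(list.slice(i, i + chunkLength))
    }
    return { size, chunks }
}
```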
OK, I am as sure as I can be from local testing that this works. Sleeping on it.
Looks good, but agreed it's hard to reason about all cases without some production data flowing.
Could you hook it up locally and test it on some large-payload sites? Otherwise maybe we need to figure out a way to roll it out slowly in production.
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the
This PR was closed due to lack of activity. Feel free to reopen if it's still relevant.
We see very large incremental snapshots, large enough that we can't ingest them. They often have large arrays of adds or attribute mutations. Those mutations need to be applied in the same order, but they don't need to be in one snapshot.
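As a rough illustration of that idea (not the PR's code), an oversized mutation payload can be split across several events; replaying the resulting events in sequence applies the same mutations in the same order. The `MutationEvent` shape below is a simplified approximation of rrweb's incremental snapshot data, with illustrative field types.

```typescript
// Simplified, assumed approximation of an rrweb incremental snapshot event.
interface MutationEvent {
    type: number // rrweb IncrementalSnapshot
    timestamp: number
    data: { source: number; adds: unknown[] }
}

// Split the `adds` array across copies of the event. Order is preserved:
// concatenating the resulting `adds` arrays reproduces the original array.
function splitMutationEvent(event: MutationEvent, maxAddsPerEvent: number): MutationEvent[] {
    const { adds } = event.data
    if (adds.length <= maxAddsPerEvent) {
        return [event]
    }
    const events: MutationEvent[] = []
    for (let i = 0; i < adds.length; i += maxAddsPerEvent) {
        events.push({
            ...event,
            data: { ...event.data, adds: adds.slice(i, i + maxAddsPerEvent) },
        })
    }
    return events
}
```

In this sketch each piece keeps the original timestamp and source, so a player should apply them back-to-back exactly as it would have applied the single large snapshot.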
todo