Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: split large incremental snapshots #1307

Closed
wants to merge 8 commits into from

Conversation

pauldambra
Copy link
Member

@pauldambra pauldambra commented Jul 16, 2024

we see very large incremental snapshots
large enough we can't ingest them
they often have large arrays of adds or attribute mutations
those mutations need to be applied in the same order but they don't need to be in one snapshot

todo

  • run locally with an artificially low size limit so we can see it work
  • do we need to do any timestamp fangling to preserve order

Copy link

vercel bot commented Jul 16, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
posthog-js ✅ Ready (Inspect) Visit Preview Sep 5, 2024 3:48pm

Copy link

github-actions bot commented Jul 16, 2024

Size Change: +4.3 kB (+0.37%)

Total Size: 1.18 MB

Filename Size Change
dist/array.full.js 337 kB +1.08 kB (+0.32%)
dist/array.js 157 kB +1.07 kB (+0.69%)
dist/main.js 158 kB +1.07 kB (+0.69%)
dist/module.js 157 kB +1.07 kB (+0.69%)
ℹ️ View Unchanged
Filename Size
dist/exception-autocapture.js 10.4 kB
dist/recorder-v2.js 110 kB
dist/recorder.js 111 kB
dist/surveys-preview.js 59.8 kB
dist/surveys.js 66 kB
dist/tracing-headers.js 8.26 kB
dist/web-vitals.js 5.79 kB

compressed-size-action

@@ -165,27 +166,154 @@ export function truncateLargeConsoleLogs(_event: eventWithTime) {

export const SEVEN_MEGABYTES = 1024 * 1024 * 7 * 0.9 // ~7mb (with some wiggle room)

// recursively splits large buffers into smaller ones
function sliceList(list: any[], sizeLimit: number): any[][] {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the list length is 2 and they are both bigger than the limit what does this do

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@pauldambra pauldambra Jul 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, i used to split in two, and then recursively split each half in two until every part was below the size limit

now we calculate how many chunks we need to split into so that every chunk is under the limit and then split into that many chunks

(recursion always feels too "clever" to me :))

@@ -165,27 +166,154 @@ export function truncateLargeConsoleLogs(_event: eventWithTime) {

export const SEVEN_MEGABYTES = 1024 * 1024 * 7 * 0.9 // ~7mb (with some wiggle room)

// recursively splits large buffers into smaller ones
function sliceList(list: any[], sizeLimit: number): any[][] {
const size = estimateSize(list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We compute the size in this function but then always need it again in the calling functions. Two examples:
https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5R187
https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5R243

If estimating the size is an expensive operation perhaps we should look to return the value so it doesn't need to be recomputed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this has either changed enough the comment doesn't apply any more, or I don't understand the comment 🙈 🤣

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a biggie, just noticed that we estimated the size while slicing the list but they re-estimate in https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5R184-R186. Could possibly just return the already estimated size

@@ -165,27 +166,154 @@ export function truncateLargeConsoleLogs(_event: eventWithTime) {

export const SEVEN_MEGABYTES = 1024 * 1024 * 7 * 0.9 // ~7mb (with some wiggle room)

// recursively splits large buffers into smaller ones
function sliceList(list: any[], sizeLimit: number): any[][] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/extensions/replay/sessionrecording-utils.ts Outdated Show resolved Hide resolved
src/extensions/replay/sessionrecording-utils.ts Outdated Show resolved Hide resolved
@pauldambra
Copy link
Member Author

OK, I am as sure as I can be from local testing that this works
But i'm just uncomfortable editing rrweb data because we've seen playback be intolerant to data edits in ways we didn't expect

am sleeping on this

Copy link
Contributor

@daibhin daibhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but agreed it's hard to reason about for all cases without some production data flowing.

Could you hook it up locally and test it out for some large payload sites? Otherwise maybe we need to figure out a way to roll it out slowing in production

@posthog-bot
Copy link
Collaborator

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@posthog-bot
Copy link
Collaborator

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@posthog-bot
Copy link
Collaborator

This PR was closed due to lack of activity. Feel free to reopen if it's still relevant.

@pauldambra pauldambra reopened this Sep 5, 2024
@posthog-bot
Copy link
Collaborator

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@posthog-bot
Copy link
Collaborator

This PR was closed due to lack of activity. Feel free to reopen if it's still relevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants