feat: split large incremental snapshots #1307

pauldambra · 2024-07-16T18:23:30Z

we see very large incremental snapshots, large enough that we can't ingest them
they often have large arrays of adds or attribute mutations
those mutations need to be applied in the same order, but they don't need to be in one snapshot

todo

run locally with an artificially low size limit so we can see it work
do we need to do any timestamp fangling to preserve order
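
A minimal sketch of the idea in the description, not the code in this PR: one oversized mutation event is cut into several events that each carry a contiguous slice of `adds`, emitted in the original order. The `MutationLike` shape and the `splitAdds` helper are hypothetical simplifications; real rrweb events carry more fields. Whether reusing the original timestamp is enough to preserve order is exactly the open todo above, so the sketch simply reuses it and flags that as an assumption.

```typescript
// Hypothetical, simplified event shape for illustration only (not the rrweb types).
interface MutationLike {
    timestamp: number
    adds: unknown[]
}

// Split one event into at most `parts` events holding contiguous, in-order slices of `adds`.
function splitAdds(event: MutationLike, parts: number): MutationLike[] {
    if (event.adds.length === 0) {
        return [event]
    }
    const count = Math.max(1, Math.floor(parts))
    const perPart = Math.ceil(event.adds.length / count)
    const result: MutationLike[] = []
    for (let start = 0; start < event.adds.length; start += perPart) {
        result.push({
            timestamp: event.timestamp, // assumption: the timestamp is reused unchanged
            adds: event.adds.slice(start, start + perPart),
        })
    }
    return result
}
```

Concatenating the `adds` of the returned events in array order gives back the original list, which is the property the description relies on.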

vercel · 2024-07-16T18:23:33Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated (UTC)
posthog-js	✅ Ready (Inspect)	Visit Preview	Sep 5, 2024 3:48pm

github-actions · 2024-07-16T18:26:24Z

Size Change: +4.3 kB (+0.37%)

Total Size: 1.18 MB

Filename	Size	Change
`dist/array.full.js`	337 kB	+1.08 kB (+0.32%)
`dist/array.js`	157 kB	+1.07 kB (+0.69%)
`dist/main.js`	158 kB	+1.07 kB (+0.69%)
`dist/module.js`	157 kB	+1.07 kB (+0.69%)

ℹ️ View Unchanged

Filename	Size
`dist/exception-autocapture.js`	10.4 kB
`dist/recorder-v2.js`	110 kB
`dist/recorder.js`	111 kB
`dist/surveys-preview.js`	59.8 kB
`dist/surveys.js`	66 kB
`dist/tracing-headers.js`	8.26 kB
`dist/web-vitals.js`	5.79 kB

pauldambra · 2024-07-17T10:15:43Z

src/extensions/replay/sessionrecording-utils.ts

@@ -165,27 +166,154 @@ export function truncateLargeConsoleLogs(_event: eventWithTime) {

 export const SEVEN_MEGABYTES = 1024 * 1024 * 7 * 0.9 // ~7mb (with some wiggle room)

-// recursively splits large buffers into smaller ones
+function sliceList(list: any[], sizeLimit: number): any[][] {


If the list length is 2 and both items are bigger than the limit, what does this do?

daibhin: Should we possibly try recursively slicing the chunks in https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5R178 as we used to do in https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5L172-L188?

pauldambra: So, I used to split in two, and then recursively split each half in two until every part was below the size limit.

Now we calculate how many chunks we need so that every chunk is under the limit, and then split into that many chunks.

(recursion always feels too "clever" to me :))
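
For readers following the thread, here is a rough sketch of the "calculate the chunk count up front" approach described in the reply. It is not the PR's `sliceList`: `approxSize` is a stand-in that assumes serialized length approximates payload size, and `sliceIntoChunks` is a hypothetical name.

```typescript
// Stand-in for the PR's estimateSize; assumption: JSON length approximates size in bytes.
function approxSize(value: unknown): number {
    return JSON.stringify(value).length
}

// Work out how many chunks are needed from the total size, then split once, in order.
function sliceIntoChunks<T>(list: T[], sizeLimit: number): T[][] {
    const size = approxSize(list)
    if (size <= sizeLimit || list.length <= 1) {
        return [list]
    }
    // Never ask for more chunks than there are items.
    const chunkCount = Math.min(list.length, Math.ceil(size / sizeLimit))
    const perChunk = Math.ceil(list.length / chunkCount)
    const chunks: T[][] = []
    for (let start = 0; start < list.length; start += perChunk) {
        chunks.push(list.slice(start, start + perChunk))
    }
    return chunks
}
```

In this sketch, a list of two items that are each over the limit comes back as two single-item chunks, and each chunk is still over the limit. Splitting by count assumes items are roughly equal in size, and neither this nor recursive halving can shrink a single oversized item.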

daibhin · 2024-07-18T10:16:17Z

src/extensions/replay/sessionrecording-utils.ts

@@ -165,27 +166,154 @@ export function truncateLargeConsoleLogs(_event: eventWithTime) {

 export const SEVEN_MEGABYTES = 1024 * 1024 * 7 * 0.9 // ~7mb (with some wiggle room)

-// recursively splits large buffers into smaller ones
+function sliceList(list: any[], sizeLimit: number): any[][] {
+    const size = estimateSize(list)


We compute the size in this function but then always need it again in the calling functions. Two examples:
https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5R187
https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5R243

If estimating the size is an expensive operation, perhaps we should return the value so it doesn't need to be recomputed.

pauldambra: I think this has either changed enough that the comment doesn't apply any more, or I don't understand the comment 🙈 🤣

daibhin: It's not a biggie, I just noticed that we estimate the size while slicing the list but then re-estimate it in https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5R184-R186. We could possibly just return the already estimated size.
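
One way the suggestion could look, sketched under assumptions rather than taken from the PR: the slicing helper returns the estimate it already computed so the caller can branch on it. `approxSizeCounted`, `SlicedList`, and `sliceListWithSize` are hypothetical names, and the call counter exists only to make the saving visible.

```typescript
// Hypothetical stand-in for the PR's estimateSize; counts calls so the saving is visible.
let estimateCalls = 0
function approxSizeCounted(value: unknown): number {
    estimateCalls += 1
    return JSON.stringify(value).length
}

interface SlicedList<T> {
    chunks: T[][]
    estimatedSize: number // size of the whole input list, computed once
}

function sliceListWithSize<T>(list: T[], sizeLimit: number): SlicedList<T> {
    const estimatedSize = approxSizeCounted(list)
    if (estimatedSize <= sizeLimit || list.length <= 1) {
        return { chunks: [list], estimatedSize }
    }
    const chunkCount = Math.min(list.length, Math.ceil(estimatedSize / sizeLimit))
    const perChunk = Math.ceil(list.length / chunkCount)
    const chunks: T[][] = []
    for (let start = 0; start < list.length; start += perChunk) {
        chunks.push(list.slice(start, start + perChunk))
    }
    return { chunks, estimatedSize }
}

// The caller reads the returned size instead of estimating the same list again.
const sliced = sliceListWithSize(['aaaa', 'bbbb', 'cccc'], 10)
const wasSplit = sliced.chunks.length > 1
```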

daibhin · 2024-07-18T10:19:14Z

src/extensions/replay/sessionrecording-utils.ts

@@ -165,27 +166,154 @@ export function truncateLargeConsoleLogs(_event: eventWithTime) {

 export const SEVEN_MEGABYTES = 1024 * 1024 * 7 * 0.9 // ~7mb (with some wiggle room)

-// recursively splits large buffers into smaller ones
+function sliceList(list: any[], sizeLimit: number): any[][] {


Should we possibly try recursively slicing the chunks in https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5R178 as we used to do in https://github.com/PostHog/posthog-js/pull/1307/files#diff-deb180b570d4a9e728cd83fd418e54a748f69492d07c9809924a3821ca4de4d5L172-L188

pauldambra · 2024-07-20T10:32:37Z

OK, I am as sure as I can be from local testing that this works.
But I'm just uncomfortable editing rrweb data, because we've seen playback be intolerant of data edits in ways we didn't expect.

am sleeping on this

daibhin

Looks good but agreed it's hard to reason about for all cases without some production data flowing.

Could you hook it up locally and test it out on some large-payload sites? Otherwise, maybe we need to figure out a way to roll it out slowly in production.

posthog-bot · 2024-07-30T09:31:56Z

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

posthog-bot · 2024-08-07T09:32:05Z

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

posthog-bot · 2024-08-14T09:32:07Z

This PR was closed due to lack of activity. Feel free to reopen if it's still relevant.

posthog-bot · 2024-09-16T09:35:05Z

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

posthog-bot · 2024-09-24T09:33:30Z

This PR was closed due to lack of activity. Feel free to reopen if it's still relevant.

feat: split large incremental snapshots

04e8ff3

pauldambra requested a review from a team July 16, 2024 18:23

vercel bot deployed to Preview July 16, 2024 18:23 View deployment

account for timestamp

19b75f9

vercel bot deployed to Preview July 16, 2024 19:51 View deployment

pauldambra commented Jul 17, 2024

daibhin reviewed Jul 18, 2024

pauldambra added 2 commits July 20, 2024 11:14

after testing locally with every incremental being processed

35b1d85

order

dc488af

vercel bot deployed to Preview July 20, 2024 10:16 View deployment

refactor

3f41b34

vercel bot deployed to Preview July 20, 2024 10:25 View deployment

one fewer estimate

54878a2

vercel bot deployed to Preview July 20, 2024 10:33 View deployment

we already have a max message size, just use that

eab4488

pauldambra requested a review from daibhin July 20, 2024 10:38

vercel bot deployed to Preview July 20, 2024 10:39 View deployment

daibhin approved these changes Jul 22, 2024

posthog-bot added the stale label Jul 30, 2024

pauldambra added waiting and removed stale labels Jul 30, 2024

posthog-bot added the stale label Aug 7, 2024

posthog-bot closed this Aug 14, 2024

pauldambra reopened this Sep 5, 2024

Merge branch 'main' into feat/pd/split-cinremental

ef97495

vercel bot deployed to Preview September 5, 2024 15:48 View deployment

posthog-bot removed the stale label Sep 6, 2024

posthog-bot added the stale label Sep 16, 2024

posthog-bot closed this Sep 24, 2024
