feat: dynamic dag traversal #163

Open · wants to merge 12 commits into base: main
Conversation

@SgtPooki (Member) commented Dec 9, 2024

  • refactor: move set-content-type function
  • feat: enhanced dag traversal
  • feat: improve enhanced-dag-traversal

Title

feat: dynamic dag traversal

Description

This PR consolidates DAG traversal and streaming when fetching dag-pb/unixfs content. If the content type is an image or video, it attempts to return a response more quickly by doing a DFS traversal of the DAG and streaming the content as it is found. This is an enhancement to the existing DAG traversal and streaming mechanism.

A few callouts:

  • We no longer need `getStreamFromAsyncIterable`
  • `firstChunk` retrieval is done eagerly in `enhancedDagTraversal`
  • All other chunks are fetched lazily in `enhancedDagTraversal` (a sketch of this pattern follows this list)
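
For illustration, here is a minimal sketch of that eager-first-chunk / lazy-rest pattern. The function name `enhancedDagTraversal` comes from this PR, but the signature, options and return shape below are assumptions for illustration, not the merged code; it is written against the chunk stream returned by ipfs-unixfs-exporter's `content()`.

// Hypothetical sketch: eagerly pull the first chunk of a file's content (so the
// content type can be sniffed and a Response returned quickly), then lazily
// yield that chunk plus the rest of the file only as the consumer reads it.
async function enhancedDagTraversal (content: AsyncGenerator<Uint8Array>): Promise<{ firstChunk: Uint8Array, stream: AsyncGenerator<Uint8Array> }> {
  const first = await content.next()
  const firstChunk = first.done === true ? new Uint8Array(0) : first.value

  async function * stream (): AsyncGenerator<Uint8Array> {
    if (firstChunk.byteLength > 0) {
      yield firstChunk
    }
    // remaining blocks are only fetched when the consumer pulls them
    yield * content
  }

  return { firstChunk, stream: stream() }
}

A caller could then sniff `firstChunk` to pick the Content-Type and hand `stream` to the Response body, e.g. `const { firstChunk, stream } = await enhancedDagTraversal(entry.content({ signal, offset, length }))`.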

Fixes #52

Notes & open questions

It seems like we get a small performance boost with these changes. See ipfs/service-worker-gateway#529 and the hyperfine benchmarking below. Note that the first hyperfine run was querying public endpoints (without instantiating Helia), and the second was querying a local endpoint.

> hyperfine --parameter-list branch main,52-video-streaming-performance --setup "git switch {branch} && npm run reset && npm i && npm run build && cd packages/verified-fetch" -i --runs 1000 "node test-time-to-first-byte.js"
Benchmark 1: node test-time-to-first-byte.js (branch = main)
  Time (mean ± σ):      24.0 ms ±   2.7 ms    [User: 17.8 ms, System: 3.5 ms]
  Range (min … max):    22.0 ms …  68.6 ms    1000 runs
 
  Warning: Ignoring non-zero exit code.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: node test-time-to-first-byte.js (branch = 52-video-streaming-performance)
  Time (mean ± σ):      23.4 ms ±   2.1 ms    [User: 17.7 ms, System: 3.4 ms]
  Range (min … max):    21.3 ms …  58.1 ms    1000 runs
 
  Warning: Ignoring non-zero exit code.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  node test-time-to-first-byte.js (branch = 52-video-streaming-performance) ran
    1.03 ± 0.15 times faster than node test-time-to-first-byte.js (branch = main)

> hyperfine --parameter-list branch main,52-video-streaming-performance --setup "git switch {branch} && npm run reset && npm i && npm run build" "cd packages/verified-fetch && node test-time-to-first-byte.js" --runs 5000
Benchmark 1: cd packages/verified-fetch && node test-time-to-first-byte.js (branch = main)
  Time (mean ± σ):     340.8 ms ±  15.3 ms    [User: 367.6 ms, System: 87.3 ms]
  Range (min … max):   320.7 ms … 638.8 ms    5000 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (638.8 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Benchmark 2: cd packages/verified-fetch && node test-time-to-first-byte.js (branch = 52-video-streaming-performance)
  Time (mean ± σ):     341.5 ms ±  11.5 ms    [User: 368.4 ms, System: 87.4 ms]
  Range (min … max):   322.6 ms … 599.5 ms    5000 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (561.6 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Summary
  cd packages/verified-fetch && node test-time-to-first-byte.js (branch = main) ran
    1.00 ± 0.06 times faster than cd packages/verified-fetch && node test-time-to-first-byte.js (branch = 52-video-streaming-performance)

The contents of `test-time-to-first-byte.js` are:

import { trustlessGateway } from '@helia/block-brokers'
import { createHeliaHTTP } from '@helia/http'
import { httpGatewayRouting } from '@helia/routers'
import browserReadableStreamToIt from 'browser-readablestream-to-it'
import first from 'it-first'
import { createVerifiedFetch } from './dist/src/index.js'

const controller = new AbortController()
const signal = controller.signal
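// time-to-response (TTR): measured from before Helia is instantiated until verifiedFetch resolves with a Response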
const start = performance.now()
const helia = await createHeliaHTTP({
  blockBrokers: [trustlessGateway({ allowLocal: true })],
  routers: [httpGatewayRouting({ gateways: ['http://127.0.0.1:8080'] })]
})
const verifiedFetch = await createVerifiedFetch(helia)
const response = await verifiedFetch('bafybeidsp6fva53dexzjycntiucts57ftecajcn5omzfgjx57pqfy3kwbq', { signal })

const end = performance.now()
const timeToResponse = end - start
if (response.body == null) {
  throw new Error('response.body is null')
}
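// time-to-first-byte (TTFB): measured from here until the first chunk of the body has been read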
const startByte = performance.now()
await first(browserReadableStreamToIt(response.body))
const endByte = performance.now()
const timeToFirstByte = endByte - startByte
// expect(timeToFirstByte).to.be.lessThan(1000)
// eslint-disable-next-line no-console
console.log('TTR: %s, TTFB: %s', timeToResponse, timeToFirstByte)

await verifiedFetch.stop()
controller.abort()

Things still to do:

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if necessary (this includes comments as well)
  • I have added tests that prove my fix is effective or that my feature works

@SgtPooki linked an issue Dec 9, 2024 that may be closed by this pull request
@SgtPooki (Member, Author) left a comment:

Some comments after chatting with Alex in the Helia WG.

Comment on lines 42 to 48
const dfsIter = dfsEntry.content({
  signal,
  onProgress,
  offset,
  length
})

@SgtPooki (Member, Author):

Set `blockReadConcurrency: 1` here.

Set `offset: 0` and `length` to the minimum number of bytes needed to determine the file type, to ensure we only request what's needed (see the sketch below).
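
A sketch of what that could look like, reusing the identifiers from the quoted snippet above. The 512-byte limit mirrors Go's DetectContentType convention; the actual minimum used by the PR may differ, and passing `blockReadConcurrency` here assumes the exporter's `content()` accepts it, as discussed in this thread:

// Sketch: request only the bytes needed to sniff the content type, fetching
// blocks one at a time so nothing extra is downloaded up front.
const sniffIter = dfsEntry.content({
  signal,
  onProgress,
  offset: 0,
  length: 512, // at most the first 512 bytes, as in Go's http.DetectContentType
  blockReadConcurrency: 1 // fetch blocks sequentially rather than in parallel
})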

@SgtPooki (Member, Author):

Updated this to follow the same pattern as https://pkg.go.dev/net/http#DetectContentType for detecting the content type; a sketch of that pattern follows.
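
Roughly, that pattern reads at most the first 512 bytes before deciding. A self-contained sketch, where `detectContentType` is a hypothetical stand-in for whatever content-type parser is configured:

// Sketch: gather at most the first 512 bytes of a chunk stream, then sniff the
// content type from them, mirroring Go's http.DetectContentType behaviour.
const SNIFF_BYTES = 512

async function sniffContentType (chunks: AsyncIterable<Uint8Array>, detectContentType: (bytes: Uint8Array) => Promise<string>): Promise<string> {
  const buf = new Uint8Array(SNIFF_BYTES)
  let offset = 0

  for await (const chunk of chunks) {
    // copy only as many bytes as we still need
    const slice = chunk.subarray(0, SNIFF_BYTES - offset)
    buf.set(slice, offset)
    offset += slice.byteLength

    if (offset >= SNIFF_BYTES) {
      break
    }
  }

  return detectContentType(buf.subarray(0, offset))
}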

Comment on lines 92 to 100
  for await (const chunk of exporterEntry.content({
    signal,
    onProgress,
    offset,
    length
  })) {
    yield chunk
  }
}
@SgtPooki (Member, Author):

Test whether we need to set `blockReadConcurrency` here, based on either `offset === 0` or `isImageOrVideo`; one possible reading is sketched below.
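
A sketch of one reading of that comment, reusing the identifiers from the quoted snippet above (`isImageOrVideo` is the PR's own flag; whether concurrency should be limited in exactly these cases is what the comment proposes to test):

// Sketch: serialise block reads when streaming from the start of the file or when
// the content is an image/video, i.e. when time-to-first-byte matters more than
// total throughput; otherwise let the exporter use its default concurrency.
for await (const chunk of exporterEntry.content({
  signal,
  onProgress,
  offset,
  length,
  blockReadConcurrency: (offset === 0 || isImageOrVideo) ? 1 : undefined
})) {
  yield chunk
}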

packages/verified-fetch/src/verified-fetch.ts (review thread outdated; resolved)
@SgtPooki marked this pull request as ready for review December 12, 2024 20:56
@SgtPooki requested a review from a team as a code owner December 12, 2024 20:56

Successfully merging this pull request may close these issues: Video streaming performance.