Braingeneers NRP bucket crawler with experiment and file explorer hosted at search.braingeneers.gi.ucsc.edu
NOTE: 2023-04-02-e-hc328_unperturbed contains primary and spike-sorted NWB files
pip install -r requirements.txt
First create a small crawl database
python crawl.py --count 10
Then run the server locally in debug and auto-reload mode
make debug-server
Build the Docker images and start the services using Docker Compose
make build
make up
make follow
NOTE: docker-compose.yml is configured to be run from the braingeneers server so that it integrates into the mission control reverse proxy, which exposes this app as search.braingeneers.gi.ucsc.edu
h5wasm enables the full HDF5 library to run natively in the browser. Emscripten's FS.createLazyFile makes it possible to give h5wasm a virtual file backed by HTTP, which uses range requests to read the h5 file incrementally over the wire. This paves the way for providing a presigned S3 URL so that a browser-based app can directly access an h5 file in a cloud store.

Unfortunately, a presigned URL can only be generated for a single HTTP method, and h5wasm performs a HEAD to discover capabilities (such as range requests) before making a GET. To work around this, the Flask server in this repo responds to the HEAD request directly and then redirects the GET request to a presigned URL, so that the browser pulls data directly from S3. This requires that the headers in the HEAD response advertise the right capabilities. This approach has the downside of a redirect from the proxy to the client for every chunk. Another approach, taken by Flatiron's dendro, is to fork h5wasm and use an aborted fetch just to get the content length.

Here is the detailed sequence of requests and responses that h5wasm makes and that leads to this incremental reading:
HEAD /data/aff5f64d-9a69-4ff3-a6fe-13a3f30dca50 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Host: localhost:5282
Referer: http://localhost:5282/static/worker.js
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15
HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=3600
Connection: keep-alive
Content-Length: 4966709395
Content-Type: application/octet-stream
Date: Mon, 18 Mar 2024 15:17:34 GMT
ETag: W/"11795069-4966709395-2024-03-11T19:11:17.335Z"
Keep-Alive: timeout=5
Last-Modified: Mon, 11 Mar 2024 19:11:17 GMT
GET /data/aff5f64d-9a69-4ff3-a6fe-13a3f30dca50 HTTP/1.1
Accept: */*
Accept-Encoding: identity
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Host: localhost:5282
Range: bytes=0-1048575
Referer: http://localhost:5282/static/worker.js
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Cache-Control: max-age=3600
Connection: keep-alive
Content-Length: 1048576
Content-Range: bytes 0-1048575/4966709395
Content-Type: application/octet-stream
Date: Mon, 18 Mar 2024 15:17:34 GMT
ETag: W/"11795069-4966709395-2024-03-11T19:11:17.335Z"
Keep-Alive: timeout=5
Last-Modified: Mon, 11 Mar 2024 19:11:17 GMT
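The HEAD-then-redirect workaround described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repo: it uses Python's standard-library http.server in place of Flask so that it runs without dependencies, and the OBJECTS table, the presign_get helper, the fake S3 host, and the choice of a 302 status are all assumptions made for the example. A real server would look up the object in its crawl database and ask its S3 client for a time-limited GET URL.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical catalog: object id -> (S3 key, size in bytes).
# A real server would read this from its crawl database.
OBJECTS = {
    "example-id": ("ephys/example/shared/example.nwb", 4966709395),
}


def presign_get(key: str) -> str:
    """Stand-in for a presigned-URL generator (hypothetical helper).

    Returns a fake URL so the sketch runs offline; a real version
    would request a GET-only presigned URL from the S3 client.
    """
    return f"https://s3.example.invalid/braingeneers/{key}?X-Amz-Signature=FAKE"


class DataHandler(BaseHTTPRequestHandler):
    def _lookup(self):
        # Paths are assumed to look like /data/<object id>.
        prefix = "/data/"
        if not self.path.startswith(prefix):
            return None
        return OBJECTS.get(self.path[len(prefix):])

    def do_HEAD(self):
        # Answer HEAD locally: a GET-only presigned URL cannot serve it.
        entry = self._lookup()
        if entry is None:
            self.send_error(404)
            return
        _key, size = entry
        self.send_response(200)
        self.send_header("Accept-Ranges", "bytes")    # advertise range support
        self.send_header("Content-Length", str(size))  # full object size
        self.send_header("Content-Type", "application/octet-stream")
        self.end_headers()

    def do_GET(self):
        # Redirect every GET (including ranged ones) to a presigned URL,
        # so the bytes flow from S3 to the browser, not through this server.
        entry = self._lookup()
        if entry is None:
            self.send_error(404)
            return
        key, _size = entry
        self.send_response(302)
        self.send_header("Location", presign_get(key))
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet
```

To try it locally, something like ThreadingHTTPServer(("127.0.0.1", 5282), DataHandler).serve_forever() would serve the handler on the port seen in the trace above. The sketch leaves the Range header untouched and relies on the client repeating it when it follows the redirect.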
s3://braingeneers/ephys/2023-04-02-e-hc328_unperturbed/shared/hc3.28_hckcr1_chip16835_plated34.2_rec4.2.nwb
s3://braingeneers/ephys/2023-04-02-e-hc328_unperturbed/shared/hc3.28_hckcr1_chip16835_plated34.2_rec4.2_kilosort2_curated_s1.nwb
Quick full-text search using SQLite
Neurodata Without Borders (NWB)
h5wasm wrapper for h5 from http
How h5wasm accesses files over http via Emscripten lazy loading
GitHub thread on accessing h5 via range requests
Chunking and indexing note in an issue
React components to visualize and graph h5 data (uses h5wasm)