parse in batches or using workers #107

Open
mapsgeek opened this issue Feb 24, 2024 · 1 comment
Comments

@mapsgeek

This is more of a help request than an issue. I would like an example of, or some help with, reading/parsing files in batches, similar to @loaders.gl.
For example, with loaders.gl we can do this:

const batchIterator = await parseInBatches(file, loaders, {
  worker: true,
  batchSize: 4000,
  batchDebounceMs: 50,
  metadata: true,
});

for await (const batch of batchIterator) {
  for (let i = 0; i < batch.data.length; i += 1) {
    batches.push(batch.data[i] as never);
  }
}

That way the file gets loaded and parsed without blocking the main thread. I have been exploring JS web workers, so my solution would be something like this, with the worker called on the file upload event:

onmessage = function (event) {
  // console.log('Received message from the main thread:', event.data);

  const wasmTable = readGeoParquet(new Uint8Array(event.data));
  const jsTable = tableFromIPC(wasmTable.intoTable().intoIPCStream());
};

I'm not sure yet whether that's the right approach. I'm also confused about the earcutWorkerPoolSize and earcutWorkerUrl layer options, and whether they could be a more effective way to solve this, so more information about them would be helpful.

Thanks

@kylebarron
Member

You can create one layer per arrow batch. So if you have an incoming stream of Arrow batches, you can create an async iterable of layers, and deck should be able to handle that.
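
As a rough sketch of the "async iterable of layers" idea: the helper below is hypothetical and not part of any library. It takes any async iterable of batches plus a caller-supplied `makeLayer` function (an assumption here, standing in for whatever constructs a layer from one batch) and yields the growing list of layers after each batch arrives.

```typescript
// Hypothetical helper, not part of deck.gl or @geoarrow/deck.gl-layers.
// B is the batch type (e.g. one Arrow record batch), L is the layer type.
async function* layersFromBatches<B, L>(
  batches: AsyncIterable<B>,
  makeLayer: (batch: B, index: number) => L,
): AsyncGenerator<L[]> {
  const layers: L[] = [];
  let index = 0;
  for await (const batch of batches) {
    // One layer per batch; the index lets the caller derive a unique layer id.
    layers.push(makeLayer(batch, index));
    index += 1;
    // Yield a fresh array each time so consumers that compare by reference
    // see that the layer list changed.
    yield [...layers];
  }
}
```

Each yielded array could then be handed to deck, for example with `deck.setProps({ layers })` inside a `for await` loop. Because deck.gl needs each layer to have a unique `id`, `makeLayer` would typically build one from the index, such as `batch-${index}`.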

The earcut worker is separate and handles polygon triangulation.

At some point geoparquet-wasm should be able to expose a stream of batches. It was already mostly implemented for the non-spatial case in parquet-wasm. I don't know when I'll have time to get to that, though. See also the discussion in geoarrow/geoarrow-rs#283.
