This repository has been archived by the owner on Aug 11, 2021. It is now read-only.

IPLD with multithreading constraints #283

Open

Gozala opened this issue Jun 3, 2020 · 1 comment

@Gozala

Gozala commented Jun 3, 2020

I would like to start a discussion around the problems I've encountered while moving JS-IPFS into a SharedWorker. I'm posting a summary here, but more details can be found at ipfs/js-ipfs#3022 (comment)

Context

Teams like 3box and Textile primarily use the ipfs.dag.* subset of JS-IPFS in browsers. They also need custom codecs like dag-jose and dag-cose.

Currently the ipfs.dag.put API accepts a dagNode in several representations (illustrated in the sketch after this list):

  1. Plain data structures, i.e. structured-clone-able JS values, plus options about the desired encoding.
  2. Binary-encoded data (an ArrayBuffer or some typed array / Node Buffer view).
  3. Some form of DAGNode class instance (e.g. one exposed by ipld-dag-pb)
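
As a rough illustration of those three shapes (a sketch; option names follow the js-ipfs API of that era and may differ between versions, and `ipfs` / `encodedBytes` are assumed to exist):

```js
// Rough sketch of the three shapes described above. Option names follow the
// js-ipfs API of that era and may differ between versions; `ipfs` is assumed
// to be an initialized js-ipfs instance.

// 1. Plain, structured-clone-able data plus options about the desired encoding.
const cid1 = await ipfs.dag.put({ hello: 'world' }, { format: 'dag-cbor', hashAlg: 'sha2-256' })

// 2. Pre-encoded bytes; the caller states which codec produced them.
//    `encodedBytes` stands in for a Uint8Array produced by e.g. a dag-jose encoder.
const cid2 = await ipfs.dag.put(encodedBytes, { format: 'dag-jose', hashAlg: 'sha2-256' })

// 3. A codec-specific class instance, e.g. DAGNode exposed by ipld-dag-pb.
const { DAGNode } = require('ipld-dag-pb')
const node = new DAGNode(Uint8Array.from([1, 2, 3]))
const cid3 = await ipfs.dag.put(node, { format: 'dag-pb', hashAlg: 'sha2-256' })
```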

Problem

The last representation is problematic, especially for custom formats, as there is no generic interface for turning it into a representation adequate for moving across threads or processes. E.g. ipfs-http-client just assumes that dagNode is already binary-encoded if the format is not dag-pb or dag-cbor:

https://github.com/ipfs/js-ipfs/blob/5e0a18aa07c1852d2a7589899661761db50eeda9/packages/ipfs-http-client/src/dag/put.js#L39-L46
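
The referenced lines roughly take the following shape (a paraphrase with an illustrative function name; the linked source above is authoritative):

```js
// Paraphrase of the linked logic (approximate; the linked source is
// authoritative). The function name is illustrative.
const dagCBOR = require('ipld-dag-cbor')

function serializeForPut (dagNode, options) {
  if (options.format === 'dag-cbor') {
    return dagCBOR.util.serialize(dagNode)
  }
  if (options.format === 'dag-pb') {
    return dagNode.serialize() // DAGNode instances encode themselves
  }
  // Any other codec, including dag-jose / dag-cose: dagNode is assumed to
  // already be encoded bytes.
  return dagNode
}
```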

This implies that passing a DAGNode instance for dag-jose / dag-cose would not work with ipfs-http-client. That may be acceptable for js-ipfs-http-client; however, it is problematic for the SharedWorker-based use case, because the client on the main thread needs to be able to:

  1. Differentiate between representations 1 and 3.
  2. Have a way to turn it into a representation adequate for moving across threads without introducing unnecessary overhead (which is not necessarily the binary representation; more on that below).
  3. Avoid loading codecs (ideally all encoding / decoding should happen in the worker thread).

I understand that the new Block API might address some of this; however, I would like to:

  1. Not be blocked, i.e. not have to wait for the new API to eventually make its way into JS-IPFS. Ideally the current DAGNode interface could be extended so that:
    1. It could be distinguished from the plain data representation, e.g. dagNode instanceof DAGNode.
    2. It has a method to turn it into a representation that can be sent across threads (in some cases that could be binary, in others structured-clone-able values).
  2. Invite you to incorporate the unique constraints introduced by a multithreaded setup (see the sketch after this list), e.g.:
    1. Not having to load codecs for block assembly, deferring that to the worker thread instead.
    2. Consider ownership: ArrayBuffers can be transferred across threads without copying, but that detaches (empties) the representation in the originating thread. That is to say, some thought needs to be put into the API so that the intent of copying vs. transferring can be expressed.
    3. Consider an intermediate representation, that is, a structured-clone-able Block representation that would allow transferring encoded pieces and copying raw ones.
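
To make the above concrete, here is a purely hypothetical sketch, with made-up names (`BaseNode`, `toTransferable`, `putInWorker`), of how an extended node interface and a structured-clone-able intermediate representation might fit together:

```js
// Hypothetical sketch only: `BaseNode`, `toTransferable` and `putInWorker`
// are illustrative names, not an existing js-ipfs or IPLD API.

class BaseNode {
  // 1.1 Plain objects fail `value instanceof BaseNode`, which lets the
  //     client tell representation 3 apart from representation 1.

  // 1.2 / 2.3 Codec-specific subclasses return a structured-clone-able
  //     description of the node: the codec name plus either encoded bytes
  //     or plain values. Naming the codec lets decoding happen in the
  //     worker instead of the main thread (2.1).
  toTransferable () {
    throw new Error('implemented by codec-specific subclasses')
  }
}

// 2.2 Main-thread side: post the intermediate representation, detaching the
// underlying ArrayBuffer only when the caller opts in to transferring it.
// (With a SharedWorker this would go through `worker.port.postMessage`.)
function putInWorker (worker, value, { transfer = false } = {}) {
  const repr = value instanceof BaseNode
    ? value.toTransferable()        // representation 3
    : { format: 'dag-cbor', value } // representation 1: plain data
  const transferList = transfer && repr.bytes ? [repr.bytes.buffer] : []
  worker.postMessage({ type: 'dag.put', repr }, transferList)
}
```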
@mikeal

mikeal commented Jun 3, 2020

Some form of DAGNode class instance (e.g. one exposed by ipld-dag-pb)

The good news is, we stopped doing this :) The latest dag-pb codec in Go was done in the pure data model, and we now have a schema for dag-pb in the IPLD Data Model.
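
For reference, a decoded dag-pb node under that new representation is just plain data model values shaped roughly like this (field names per the dag-pb IPLD schema; a sketch, not the exact objects any particular library returns):

```js
// Approximate data-model shape of a decoded dag-pb node under the new
// representation (field names per the dag-pb IPLD schema; a sketch, not the
// exact objects any particular library returns). `someCid` stands in for a
// CID instance.
const node = {
  Data: new Uint8Array([/* payload bytes, optional */]),
  Links: [
    {
      Hash: someCid, // link to the child block
      Name: 'child', // optional string
      Tsize: 123     // optional cumulative size of the linked DAG
    }
  ]
}
```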

The DAGNode class is an unfortunate bit of legacy code that pre-dates us even having defined a consistent data model.

We haven’t updated the JS codecs to this representation, but if you want to get past this, that might be your best option. However, it’s a substantial breaking change to dag-pb and is likely out of scope for the amount of time/work you’ve slated for this feature. You should be able to work around this if you’re only doing dag-jose and dag-cose, since dag-json and dag-cbor don’t have this problem.

Going forward, all codecs will be decoded into pure data model. This is actually an important requirement for IPLD Selectors to work across languages and the current selectors used by graphsync in Go rely on this new dag-pb data model representation.

Not be blocked, i.e. not have to wait for the new API to eventually make its way into JS-IPFS. Ideally the current DAGNode interface could be extended so that:

js-ipfs is the primary consumer of dag-pb, so you’re welcome to do what you need to in order to make it work. As I said above, this will eventually need to be replaced with a codec that uses the new representation anyway.

Invite you to incorporate the unique constraints introduced by a multithreaded setup, e.g.:

All of the codecs are being moved to the js-multiformats interface so that they can be loaded as needed by the consumer in order to control the bundle size. I’d love any feedback on what would make it easier to stick the interface in a worker as needed.
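
As one example of what that could look like from a worker, codec loading and encoding might live entirely on the worker side along these lines (module names and API reflect later releases of `multiformats` and `@ipld/dag-cbor`, so treat the specifics as assumptions):

```js
// Worker-side sketch: codecs are only ever loaded here, never on the main
// thread. Module names and API reflect later releases of `multiformats` and
// `@ipld/dag-cbor`; treat the specifics as assumptions. (In a SharedWorker
// the handler would hang off each connected port instead of `self`.)
import * as dagCbor from '@ipld/dag-cbor'
import { sha256 } from 'multiformats/hashes/sha2'
import { CID } from 'multiformats/cid'

self.onmessage = async ({ data }) => {
  if (data.type !== 'dag.put') return
  // All encoding and hashing happens in the worker thread.
  const bytes = dagCbor.encode(data.repr.value)
  const hash = await sha256.digest(bytes)
  const cid = CID.create(1, dagCbor.code, hash)
  // ...store `bytes` under `cid`, then report back to the client...
  self.postMessage({ cid: cid.toString() })
}
```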
