Refactor externalData in protocol #413

jthrilly · 2018-03-06T17:33:32Z

New approach for loading externalData

Mirrors the way we load protocols: data can be loaded either locally (from a file in the asset folder of the protocol), or remotely (from a URL).
Interface configuration objects will continue to contain a pointer to an externalData item, but when accessing and loading this data, interfaces will use a special loader method.
Requesting the data via this method will return a promise, which will resolve with the contents of the file, or an error.
In the case of remote assets, the component should display whatever loading state (spinner etc.) is appropriate during the wait for the promise to resolve/error.
When requested for the first time, externalData with a remote URL is fetched with the result stored in app storage, with a filename the same as the key.
On subsequent requests, externalData with a remote URL has its lastFetched and cache properties evaluated, with data refreshed if needed. This is done by the loading function, not by the component consuming the data.

// redux protocol.externalData (same as protocol.js but with additional state properties)
{
  "externalData": {
    "clinics": {
      "friendly_name": "Available Clinics", // Used in architect to help user identify external data items
      "isLoading": false, // Used for loading state
      "isLoaded": false, // Used in conjunction with cache for skipping http request
      "url": "https://somewhere/clinics.json", 
      "cache": 3600 // Time in seconds before needing to re-fetch data
    },
    "hospitals": { // Example of remote asset that has already been fetched.
      "friendly_name": "Other available hospitals",
      "isLoading": false,
      "isLoaded": true, 
      "url": "https://somewhere/hospitals.csv", // requesting this data will resolve to asset: "appData/hospitals.json" because this has already been downloaded
      "lastFetched": 2342233, 
      "cache": 3600
    },
    "school_class": {
      "friendly_name": "School dataset",
      "asset": "class.json", // A local asset
    }
  }
}

Questions

Is the above a sensible approach? Can it be improved?
Where should isLoading and isLoaded go? I copied the above from the protocol in redux.
Should we use a separate folder (data?) for these assets?

The text was updated successfully, but these errors were encountered:

wwqrd · 2018-03-07T12:04:37Z

This looks good to me.

Might not need a lastFetched key, we could probably get that from the file system, although it could make the logic a lot simpler.

I don't think the protocol object should really be manipulated, since it's a configuration object. So these properties should probably want to like elsewhere, say in the session data, perhaps in a parallel data structure. (maybe the same goes for the top-level isLoading property):

"isLoading": false,
"isLoaded": true,
"lastFetched": 2342233,

How is this intended to work with our UIDs?

jthrilly · 2018-03-07T13:11:04Z

How is this intended to work with our UIDs?

This is independent as I see it. Once the data is loaded, an interface will determine if it requires UIDs to be present, and generate them with the logic set out in #415. Individual nodes in an external data file (even when loaded via URL) should hash to the same UID because its deterministic, right?

rebeccamadsen · 2018-03-07T14:56:49Z

Makes sense to me. I agree we shouldn't manipulate the protocol object.

We don't allow duplicate keys in json, right? So there shouldn't be a name clash with the downloaded data.

bryfox · 2018-07-23T19:11:42Z

(1) The wiki describes that data may be "dynamically loaded and created by other systems in response to interview variables" (e.g., participant ID). Are we working here with that goal in mind?

(2) Once loaded, is external data part of the redux state? "interfaces will use a special loader method" which "resolve with the contents of the file" implies not, but the "isLoading"/"isLoaded" flags in redux imply that it would be.

(3) If external data is in redux, it looks like it would need knowledge of other state branches (state.session, state.sessions, and state.protocol) to load data for panels and prompts; is that correct?

(4) Do we want to unload data that isn't needed for a stage and/or prompt? That seems wise if there are multiple large data sources, but (assuming data is in redux and not resolved directly to interfaces), we'll probably need maintain more state to know what can be expunged. (And if we don't care about this, then might it be preferable to load all external data in parallel up front with the protocol, so there's only a single "protocol loaded" event as far as the user in concerned?)

(5) Time-based cache expiry makes things interesting: data may change during an interview; if data is in redux, we can't caching selectors (at least with the default memoizers). Would an explicit 'clear cache' button in settings make for a more predictable UX?

(6) Does data loading also respect whatever cache headers are set by the server (used by the browser)?

wwqrd · 2018-07-26T10:22:27Z

I don't have any clear answers for this, but my understanding is we need to consider the following:

It should support different sources:

External files (remote and local asset)
APIs
json store in the protocol (or perhaps not, since we can support local .json assets)

It should support different types:

JSON
CSV
geo formats?

It may be utilised differently depending on the consumer:

As a supply of nodes
Could be specialised data for an interface

The spec in the issue describes how this might look in the protocol config: but in reality that wouldn't include things like loading state or caching info. It might not even make sense to put this data into redux, maybe it would be preferable to cache to local files. Perhaps the implementation could be a HOC which manages fetching / streaming / holding the data in state - whatever makes most sense for achieving the above features.

- Add a workerAgent to multiplex messages - Make the user functions ignorant of Worker boilerplate - Remove externalData for now; see #413 - Formally make Node a container; avoid sending label props through ancestors

jthrilly · 2018-08-01T14:26:59Z

(1) The wiki describes that data may be "dynamically loaded and created by other systems in response to interview variables" (e.g., participant ID). Are we working here with that goal in mind?

Yes. The idea would be to have some form of templating system for use in the URL.

http://myserver.com/myscript?participant=${first_name}+${last_name}

(2) Once loaded, is external data part of the redux state? "interfaces will use a special loader method" which "resolve with the contents of the file" implies not, but the "isLoading"/"isLoaded" flags in redux imply that it would be.

Sorry - this is me being lax with my language use. I think the data structure described in my post is what would be in the redux state. Up until now, we have directly embedded external data within protocol.js. After this change, all external data will either be loaded either via a file load operation (for a local resource bundled with the protocol), or a http request, followed by a file load operation (for a cached remote resource).

(4) Do we want to unload data that isn't needed for a stage and/or prompt? That seems wise if there are multiple large data sources, but (assuming data is in redux and not resolved directly to interfaces), we'll probably need maintain more state to know what can be expunged. (And if we don't care about this, then might it be preferable to load all external data in parallel up front with the protocol, so there's only a single "protocol loaded" event as far as the user in concerned?)

I am not sure what the repercussions of this would be, but the idea I had was to "load" (that is, read and parse) external data as it is requested by an interface. This would be similar to how the information interface loads media such as video or audio. We could also 'preload' (as discussed in #621), but might this have implications for RAM usage? Not sure - genuinely curious.

The other consideration is that given that session data might be passed to the endpoint in the URL (see 1), some external data cannot be meaningfully resolved until a specific point in the interview.

(5) Time-based cache expiry makes things interesting: data may change during an interview; if data is in redux, we can't caching selectors (at least with the default memoizers). Would an explicit 'clear cache' button in settings make for a more predictable UX?

The resolution to this depends to an extent on how we resolve 4. For example, if we load data at interview session start, we no longer need to track how stale the data is, or worry about it changing within the interview.

The real reason I liked this, is that it would give the researcher some control over how often data is expected to meaningfully change. If it effectively needs to be fetched in response to interview session values, the cache property would seem to be redundant.

Let's start by fetching data every time for now, and refine this as we go.

(6) Does data loading also respect whatever cache headers are set by the server (used by the browser)?

This was another thought I had while writing the above. Perhaps we can solve this problem at the server end, and use existing transport level mechanisms for cache. Something to consider.

bryfox · 2018-11-01T18:42:15Z

Two changes since this issue was opened:

network data is stored in an attributes property on each node
node types and variables are identified by an ID, not name.

For other parts of the app (e.g., selectors) to work seamlessly with external and new network data, we probably want to present the transformed data to a requesting component ~~, and we may want to store the transformed version rather than the original contents~~. Edit: going by the HIVServices development example, type may be defined by the subject and not external data. This implies that the same external source could be used for multiple types, which are defined separately in the registry, and so the source needs to be preserved rather than the transposed version.

bryfox · 2018-11-14T21:29:42Z

Per https://github.com/codaco/Network-Canvas/issues/718#issuecomment-438765777, when transposing, any variables with names that don't exist in the registry for the given [stage] subject should be ignored.

jthrilly · 2019-01-22T12:48:37Z

This was implemented by @wwqrd in https://github.com/codaco/Network-Canvas/pull/840

Still left to do:

Implement async loading of remote assets (Async loading of remote assets #847)

jthrilly added this to the Alpha.4 milestone Mar 6, 2018

jthrilly mentioned this issue Mar 6, 2018

Components that consume external data as a network must generate UUIDs for nodes #415

Closed

jthrilly added enhancement discussion item refactor labels Mar 6, 2018

rebeccamadsen mentioned this issue Apr 12, 2018

Feature/display label defaults #443

Merged

jthrilly modified the milestones: Alpha.4 - Gresley, Alpha.6 May 28, 2018

This was referenced Jul 25, 2018

Ensure uids exist for all nodes in app #615

Merged

Requirements for externalData #616

Closed

This was referenced Jul 26, 2018

Arbitrary node labeling #613

Merged

Load dependencies before displaying protocol #621

Open

jthrilly added the NC label Sep 14, 2018

jthrilly removed this from the Alpha.6 milestone Sep 14, 2018

bryfox mentioned this issue Dec 19, 2018

Scope node IDs to session complexdatacollective/Server#217

Closed

jthrilly closed this as completed Feb 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor externalData in protocol #413

Refactor externalData in protocol #413

jthrilly commented Mar 6, 2018 •

edited

Loading

wwqrd commented Mar 7, 2018 •

edited

Loading

jthrilly commented Mar 7, 2018

rebeccamadsen commented Mar 7, 2018

bryfox commented Jul 23, 2018

wwqrd commented Jul 26, 2018

jthrilly commented Aug 1, 2018

bryfox commented Nov 1, 2018 •

edited

Loading

bryfox commented Nov 14, 2018

jthrilly commented Jan 22, 2019

Refactor externalData in protocol #413

Refactor externalData in protocol #413

Comments

jthrilly commented Mar 6, 2018 • edited Loading

New approach for loading externalData

Questions

wwqrd commented Mar 7, 2018 • edited Loading

jthrilly commented Mar 7, 2018

rebeccamadsen commented Mar 7, 2018

bryfox commented Jul 23, 2018

wwqrd commented Jul 26, 2018

jthrilly commented Aug 1, 2018

bryfox commented Nov 1, 2018 • edited Loading

bryfox commented Nov 14, 2018

jthrilly commented Jan 22, 2019

jthrilly commented Mar 6, 2018 •

edited

Loading

wwqrd commented Mar 7, 2018 •

edited

Loading

bryfox commented Nov 1, 2018 •

edited

Loading