Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor externalData in protocol #413

Closed
jthrilly opened this issue Mar 6, 2018 · 9 comments
Closed

Refactor externalData in protocol #413

jthrilly opened this issue Mar 6, 2018 · 9 comments

Comments

@jthrilly
Copy link
Member

jthrilly commented Mar 6, 2018

New approach for loading externalData

  • Mirrors the way we load protocols: data can be loaded either locally (from a file in the asset folder of the protocol), or remotely (from a URL).
  • Interface configuration objects will continue to contain a pointer to an externalData item, but when accessing and loading this data, interfaces will use a special loader method.
  • Requesting the data via this method will return a promise, which will resolve with the contents of the file, or an error.
  • In the case of remote assets, the component should display whatever loading state (spinner etc.) is appropriate during the wait for the promise to resolve/error.
  • When requested for the first time, externalData with a remote URL is fetched with the result stored in app storage, with a filename the same as the key.
  • On subsequent requests, externalData with a remote URL has its lastFetched and cache properties evaluated, with data refreshed if needed. This is done by the loading function, not by the component consuming the data.
// redux protocol.externalData (same as protocol.js but with additional state properties)
{
  "externalData": {
    "clinics": {
      "friendly_name": "Available Clinics", // Used in architect to help user identify external data items
      "isLoading": false, // Used for loading state
      "isLoaded": false, // Used in conjunction with cache for skipping http request
      "url": "https://somewhere/clinics.json", 
      "cache": 3600 // Time in seconds before needing to re-fetch data
    },
    "hospitals": { // Example of remote asset that has already been fetched.
      "friendly_name": "Other available hospitals",
      "isLoading": false,
      "isLoaded": true, 
      "url": "https://somewhere/hospitals.csv", // requesting this data will resolve to asset: "appData/hospitals.json" because this has already been downloaded
      "lastFetched": 2342233, 
      "cache": 3600
    },
    "school_class": {
      "friendly_name": "School dataset",
      "asset": "class.json", // A local asset
    }
  }
}

Questions

  • Is the above a sensible approach? Can it be improved?
  • Where should isLoading and isLoaded go? I copied the above from the protocol in redux.
  • Should we use a separate folder (data?) for these assets?
@wwqrd
Copy link
Contributor

wwqrd commented Mar 7, 2018

This looks good to me.

Might not need a lastFetched key, we could probably get that from the file system, although it could make the logic a lot simpler.

I don't think the protocol object should really be manipulated, since it's a configuration object. So these properties should probably want to like elsewhere, say in the session data, perhaps in a parallel data structure. (maybe the same goes for the top-level isLoading property):

"isLoading": false,
"isLoaded": true,
"lastFetched": 2342233, 

How is this intended to work with our UIDs?

@jthrilly
Copy link
Member Author

jthrilly commented Mar 7, 2018

How is this intended to work with our UIDs?

This is independent as I see it. Once the data is loaded, an interface will determine if it requires UIDs to be present, and generate them with the logic set out in #415. Individual nodes in an external data file (even when loaded via URL) should hash to the same UID because its deterministic, right?

@rebeccamadsen
Copy link
Contributor

Makes sense to me. I agree we shouldn't manipulate the protocol object.

We don't allow duplicate keys in json, right? So there shouldn't be a name clash with the downloaded data.

@bryfox
Copy link
Contributor

bryfox commented Jul 23, 2018

(1) The wiki describes that data may be "dynamically loaded and created by other systems in response to interview variables" (e.g., participant ID). Are we working here with that goal in mind?

(2) Once loaded, is external data part of the redux state? "interfaces will use a special loader method" which "resolve with the contents of the file" implies not, but the "isLoading"/"isLoaded" flags in redux imply that it would be.

(3) If external data is in redux, it looks like it would need knowledge of other state branches (state.session, state.sessions, and state.protocol) to load data for panels and prompts; is that correct?

(4) Do we want to unload data that isn't needed for a stage and/or prompt? That seems wise if there are multiple large data sources, but (assuming data is in redux and not resolved directly to interfaces), we'll probably need maintain more state to know what can be expunged. (And if we don't care about this, then might it be preferable to load all external data in parallel up front with the protocol, so there's only a single "protocol loaded" event as far as the user in concerned?)

(5) Time-based cache expiry makes things interesting: data may change during an interview; if data is in redux, we can't caching selectors (at least with the default memoizers). Would an explicit 'clear cache' button in settings make for a more predictable UX?

(6) Does data loading also respect whatever cache headers are set by the server (used by the browser)?

@wwqrd
Copy link
Contributor

wwqrd commented Jul 26, 2018

I don't have any clear answers for this, but my understanding is we need to consider the following:

It should support different sources:

  • External files (remote and local asset)
  • APIs
  • json store in the protocol (or perhaps not, since we can support local .json assets)

It should support different types:

  • JSON
  • CSV
  • geo formats?

It may be utilised differently depending on the consumer:

  • As a supply of nodes
  • Could be specialised data for an interface

The spec in the issue describes how this might look in the protocol config: but in reality that wouldn't include things like loading state or caching info. It might not even make sense to put this data into redux, maybe it would be preferable to cache to local files. Perhaps the implementation could be a HOC which manages fetching / streaming / holding the data in state - whatever makes most sense for achieving the above features.

bryfox added a commit that referenced this issue Jul 26, 2018
- Add a workerAgent to multiplex messages
- Make the user functions ignorant of Worker boilerplate
- Remove externalData for now; see #413
- Formally make Node a container; avoid sending label props
  through ancestors
@jthrilly
Copy link
Member Author

jthrilly commented Aug 1, 2018

(1) The wiki describes that data may be "dynamically loaded and created by other systems in response to interview variables" (e.g., participant ID). Are we working here with that goal in mind?

Yes. The idea would be to have some form of templating system for use in the URL.

http://myserver.com/myscript?participant=${first_name}+${last_name}

(2) Once loaded, is external data part of the redux state? "interfaces will use a special loader method" which "resolve with the contents of the file" implies not, but the "isLoading"/"isLoaded" flags in redux imply that it would be.

Sorry - this is me being lax with my language use. I think the data structure described in my post is what would be in the redux state. Up until now, we have directly embedded external data within protocol.js. After this change, all external data will either be loaded either via a file load operation (for a local resource bundled with the protocol), or a http request, followed by a file load operation (for a cached remote resource).

(4) Do we want to unload data that isn't needed for a stage and/or prompt? That seems wise if there are multiple large data sources, but (assuming data is in redux and not resolved directly to interfaces), we'll probably need maintain more state to know what can be expunged. (And if we don't care about this, then might it be preferable to load all external data in parallel up front with the protocol, so there's only a single "protocol loaded" event as far as the user in concerned?)

I am not sure what the repercussions of this would be, but the idea I had was to "load" (that is, read and parse) external data as it is requested by an interface. This would be similar to how the information interface loads media such as video or audio. We could also 'preload' (as discussed in #621), but might this have implications for RAM usage? Not sure - genuinely curious.

The other consideration is that given that session data might be passed to the endpoint in the URL (see 1), some external data cannot be meaningfully resolved until a specific point in the interview.

(5) Time-based cache expiry makes things interesting: data may change during an interview; if data is in redux, we can't caching selectors (at least with the default memoizers). Would an explicit 'clear cache' button in settings make for a more predictable UX?

The resolution to this depends to an extent on how we resolve 4. For example, if we load data at interview session start, we no longer need to track how stale the data is, or worry about it changing within the interview.

The real reason I liked this, is that it would give the researcher some control over how often data is expected to meaningfully change. If it effectively needs to be fetched in response to interview session values, the cache property would seem to be redundant.

Let's start by fetching data every time for now, and refine this as we go.

(6) Does data loading also respect whatever cache headers are set by the server (used by the browser)?

This was another thought I had while writing the above. Perhaps we can solve this problem at the server end, and use existing transport level mechanisms for cache. Something to consider.

@jthrilly jthrilly added the NC label Sep 14, 2018
@jthrilly jthrilly removed this from the Alpha.6 milestone Sep 14, 2018
@bryfox
Copy link
Contributor

bryfox commented Nov 1, 2018

Two changes since this issue was opened:

  • network data is stored in an attributes property on each node
  • node types and variables are identified by an ID, not name.

For other parts of the app (e.g., selectors) to work seamlessly with external and new network data, we probably want to present the transformed data to a requesting component , and we may want to store the transformed version rather than the original contents. Edit: going by the HIVServices development example, type may be defined by the subject and not external data. This implies that the same external source could be used for multiple types, which are defined separately in the registry, and so the source needs to be preserved rather than the transposed version.

@bryfox
Copy link
Contributor

bryfox commented Nov 14, 2018

Per https://github.com/codaco/Network-Canvas/issues/718#issuecomment-438765777, when transposing, any variables with names that don't exist in the registry for the given [stage] subject should be ignored.

@jthrilly
Copy link
Member Author

This was implemented by @wwqrd in https://github.com/codaco/Network-Canvas/pull/840

Still left to do:

@jthrilly jthrilly closed this as completed Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants