Yannick Marcon edited this page Jul 24, 2018 · 9 revisions

Summary

Searching variables, datasets, studies and networks is the first step in data exploration. The second step is the exploitation of the documents found: saving sets of variables, exporting data dictionaries, composing variable sets, getting taxonomy coverage statistics, binding a variable set to a data access request, searching entities matching the variables of a variable set, etc.

Adding documents to a cart and saving them in sets are features available to anyone, i.e. authentication is not required.

See also GDC Save Sets Specification As An Example.

Rationale

Make search in the web data portal more useful.

User Stories

Describe simply who is doing what and how to obtain the result.

# Who What How Result
1
2
...

Scope

Server and (js) client.

Design and Implementation Plan

Domain

Set

A document (variables etc.) set is a set of documents that is:

  • explicitly described by an enumerated list of variable identifiers OR described by a composition of several sets
  • associated to a user
  • uniquely identified
  • described by a human readable name
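
The properties listed above can be sketched as a small data model. This is only an illustration: the class name `DocumentSet` and its fields (`identifiers`, `composed_of`, `username`) are hypothetical and are not taken from the actual code base, and the "OR" between the enumerated and composed forms is assumed here to be exclusive.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import uuid


@dataclass
class DocumentSet:
    """Hypothetical model of a document set (all names are illustrative)."""

    name: Optional[str] = None      # human-readable name
    username: Optional[str] = None  # associated user; None for an anonymous user
    identifiers: List[str] = field(default_factory=list)  # enumerated document ids
    composed_of: List[str] = field(default_factory=list)  # ids of the composed sets
    id: str = field(default_factory=lambda: uuid.uuid4().hex)  # unique identifier

    def __post_init__(self) -> None:
        # Assumption: a set is either enumerated or composed, never both.
        if self.identifiers and self.composed_of:
            raise ValueError("a set is either enumerated or composed, not both")

    @property
    def is_composed(self) -> bool:
        return bool(self.composed_of)


enumerated = DocumentSet(name="Smoking variables", username="alice",
                         identifiers=["ds1:smoke", "ds1:age"])
composed = DocumentSet(name="S1 and S2", username="alice",
                       composed_of=["S1", "S2"])
```

Both forms share the same identity and naming fields; only the way the content is described differs.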

Although it would be very convenient to store document sets on the client side, the storage limits of the browser database mean that document sets must always be persisted on the server side (even for anonymous users); the client only handles the set metadata (name, number of documents, etc.). Document set operations are also performed on the server side: union, intersection, complement, export, etc.

Document Set

Document set persistence is done in two parts:

  • the document set metadata (id, name, creation date, etc.) is stored in MongoDB,
  • the document set identifiers are stored in the targeted document: for instance, the class DatasetVariable has a dedicated field, sets, which is an array of the identifiers of the variable sets to which the variable belongs.

Storing the association between a document and one or more sets in the search engine makes it possible to combine document search criteria with set membership, in order to:

  • display documents from a set in the search page,
  • count documents in the subsets when preparing variable set composition.

As the search criteria are expressed using a taxonomy (the one describing the document properties), a vocabulary that represents the sets to which a document belongs is to be added: exact match queries will be performed on this field to extract the documents belonging to one or more sets.

When a document is indexed (after a dataset has been updated, for instance), the indexing process must enrich the document with the sets it belongs to. This requires, for each document, a MongoDB query to find the sets that contain the document identifier; these set identifiers are then added to the sets field for indexing. This way the document index is still usable for filtering documents by the sets they belong to, even after a re-publication.

In case some documents have been removed (after a document update), the count of documents in a set must be extracted from the document index (not from the MongoDB object).
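
The enrichment step and the index-based count described above can be sketched as follows. This is a minimal in-memory illustration, not the actual indexing code: the function names are hypothetical, a plain dictionary stands in for the MongoDB collection, and a list of dictionaries stands in for the search index.

```python
from typing import Dict, Iterable, List, Set


def enrich_with_sets(documents: Iterable[dict],
                     stored_sets: Dict[str, Set[str]]) -> List[dict]:
    """Return copies of the documents with a `sets` field added before indexing.

    `stored_sets` stands in for MongoDB: it maps a set id to the document
    identifiers enumerated by that set.
    """
    enriched = []
    for doc in documents:
        # Stand-in for the per-document MongoDB query.
        set_ids = sorted(sid for sid, ids in stored_sets.items() if doc["id"] in ids)
        enriched.append({**doc, "sets": set_ids})
    return enriched


def count_in_index(indexed_documents: Iterable[dict], set_id: str) -> int:
    """Count the indexed documents that belong to the given set."""
    return sum(1 for doc in indexed_documents if set_id in doc["sets"])


# S1 still enumerates "v9", but that variable is no longer published.
stored_sets = {"S1": {"v1", "v2", "v9"}, "S2": {"v2", "v3"}}
published = [{"id": "v1"}, {"id": "v2"}, {"id": "v3"}]
index = enrich_with_sets(published, stored_sets)
```

In this example the stored set S1 lists three identifiers, while `count_in_index(index, "S1")` only sees the two variables that still exist, which is why the count is taken from the index.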

Set Operation

A set operation is a list of set compositions. These compositions are expressed by an RQL query string.

Composed Set

A composed set is automatically created (and persisted in MongoDB within a Set Operation) when the user performs operations on sets. A composed set does not explicitly list the identifiers of the associated documents; instead, it provides:

  • the list of the set ids that are involved in the set operation,
  • the query (RQL) that is used to expand the user query coming from the search page.

For instance, the user query on the composed set:

in(Mica_variable.sets,inter_s1_s2)

is expanded before being submitted to the search engine as:

and(in(sets,S1),in(sets,S2))
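
The substitution shown above can be sketched with a simple text rewrite. This is an assumption-laden illustration: the lookup table `COMPOSED_QUERIES` and the function `expand_composed_sets` are hypothetical names, and a regular expression stands in for a real RQL parser.

```python
import re
from typing import Dict

# Hypothetical store of composed sets: composed set id -> RQL query to substitute.
COMPOSED_QUERIES: Dict[str, str] = {
    "inter_s1_s2": "and(in(sets,S1),in(sets,S2))",
}

# Matches in(Mica_variable.sets,<id>) with a single set identifier.
_SETS_CRITERION = re.compile(r"in\(Mica_variable\.sets,([^(),]+)\)")


def expand_composed_sets(user_query: str, composed: Dict[str, str]) -> str:
    """Replace each criterion on a composed set by the query stored for it."""

    def substitute(match):
        set_id = match.group(1)
        # Criteria on sets that are not composed are left untouched.
        return composed.get(set_id, match.group(0))

    return _SETS_CRITERION.sub(substitute, user_query)


expanded = expand_composed_sets("in(Mica_variable.sets,inter_s1_s2)", COMPOSED_QUERIES)
```

Because the rewrite is applied to each criterion independently, a composed set can appear inside a larger user query alongside other criteria.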

Cart

The cart can contain sets of documents with different types.

Documents can be added to the cart when browsing the repository (network/study/dataset/variable pages) or when searching documents. From the server point of view, a document cart is a document set without a name. The content of this document set can be updated (addition/deletion of documents). Saving the document cart simply gives a name to this set (and applies the current user name if the user has logged in in the meantime).
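
The cart life cycle described above can be illustrated with a short sketch. The class `VariableSet` and its methods are hypothetical; the only behavior taken from the text is that a cart is an unnamed set whose content can change, and that saving it assigns a name and, if available, the current user.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class VariableSet:
    """Hypothetical server-side set; a cart is simply a set without a name."""

    identifiers: Set[str] = field(default_factory=set)
    name: Optional[str] = None
    username: Optional[str] = None

    @property
    def is_cart(self) -> bool:
        return not self.name

    def add(self, *ids: str) -> None:
        self.identifiers.update(ids)

    def remove(self, *ids: str) -> None:
        self.identifiers.difference_update(ids)

    def save_as(self, name: str, current_user: Optional[str] = None) -> None:
        """Give the cart a name and apply the logged-in user, if any."""
        if not name:
            raise ValueError("a saved set needs a non-empty name")
        self.name = name
        if current_user is not None:
            self.username = current_user


cart = VariableSet()                 # anonymous, unnamed: this is a cart
cart.add("v1", "v2", "v3")
cart.remove("v2")
was_cart = cart.is_cart
cart.save_as("My variables", current_user="alice")
```

After `save_as`, the same object is no longer a cart: no documents are copied, only the name and the user are set.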

Actions

Creation

A document set can be created by:

  • getting the list of documents from the cart,
  • saving a search query results,
  • composing several document sets,
  • importing a list of document identifiers.

Operation

Several document sets can be composed. The result of this operation can be used to:

  • create a new document set,
  • download the documents.

Operations that can be performed on document sets are (see Basic operations on Sets):

  • ∪ : union
  • ∩ : intersection
  • \ : difference (relative complement)

The document set statements can be expressed in RQL:

  • union(S1,S2,S3)
  • inter(S1,S2,S3)
  • diff(inter(S1,S2),S3)
  • diff(S1,union(S2,S3))
  • etc.
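
To make the semantics of these statements concrete, the sketch below parses such an expression and evaluates it on in-memory Python sets. It is only an illustration under stated assumptions: the function names are hypothetical, this is not a full RQL parser, and `diff` with more than two operands is assumed to mean the first operand minus all the following ones.

```python
from typing import Dict, List, Set, Tuple, Union

Node = Union[str, Tuple[str, list]]


def parse(expr: str, pos: int = 0) -> Tuple[Node, int]:
    """Parse `name` or `name(arg,...)` from `pos`; return (node, next position)."""
    start = pos
    while pos < len(expr) and expr[pos] not in "(),":
        pos += 1
    name = expr[start:pos]
    if not name:
        raise ValueError(f"expected a name at position {start}")
    if pos == len(expr) or expr[pos] != "(":
        return name, pos  # a plain set identifier
    args: List[Node] = []
    pos += 1  # skip "("
    while True:
        arg, pos = parse(expr, pos)
        args.append(arg)
        if pos < len(expr) and expr[pos] == ",":
            pos += 1
        elif pos < len(expr) and expr[pos] == ")":
            return (name, args), pos + 1
        else:
            raise ValueError(f"expected ',' or ')' at position {pos}")


def _apply(node: Node, sets: Dict[str, Set[str]]) -> Set[str]:
    if isinstance(node, str):
        return set(sets[node])
    operator, args = node
    values = [_apply(arg, sets) for arg in args]
    if operator == "union":
        return set().union(*values)
    if operator == "inter":
        return set.intersection(*values)
    if operator == "diff":
        # Assumption: first operand minus every following operand.
        return values[0].difference(*values[1:])
    raise ValueError(f"unknown operator: {operator}")


def evaluate(expr: str, sets: Dict[str, Set[str]]) -> Set[str]:
    """Evaluate a set statement such as diff(inter(S1,S2),S3)."""
    compact = "".join(expr.split())  # ignore whitespace
    node, end = parse(compact)
    if end != len(compact):
        raise ValueError(f"unexpected trailing text at position {end}")
    return _apply(node, sets)


SETS = {"S1": {"a", "b", "c"}, "S2": {"b", "c", "d"}, "S3": {"c", "e"}}
```

With these three example sets, `evaluate("diff(inter(S1,S2),S3)", SETS)` keeps only the documents shared by S1 and S2 that are not in S3.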

Download

The document set can be downloaded as a CSV/TSV file.

Export

The list of the document identifiers of the set can be downloaded.

Import

A file containing the document identifiers can be uploaded to build a new enumerated document set.
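
Reading the identifiers from an uploaded file could look like the sketch below. It relies on assumptions that the specification does not state: the identifiers are in the first column, the file has no header row, a tab in the first line means TSV, and duplicates are dropped. The function name `read_identifiers` is hypothetical.

```python
import csv
import io
from typing import List


def read_identifiers(text: str) -> List[str]:
    """Read document identifiers from the first column of CSV or TSV content."""
    # Assumption: a tab in the first line means the content is tab-separated.
    first_line = text.split("\n", 1)[0]
    delimiter = "\t" if "\t" in first_line else ","
    identifiers: List[str] = []
    seen = set()
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        value = row[0].strip()
        if value and value not in seen:
            seen.add(value)
            identifiers.append(value)  # keep the order of first appearance
    return identifiers


tsv_ids = read_identifiers("ds1:age\tAge\nds1:sex\tSex\n\nds1:age\tAge\n")
csv_ids = read_identifiers("ds1:age,Age\nds1:bmi,Body mass index\n")
```

The resulting list is what an enumerated document set would be built from.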

Deletion

A document set can be deleted.

List

Document sets of a user can be listed. When several sets are selected in the list, the possible actions are: operation, download and deletion. The JS client will only list the sets that are in the browser local store.

Web Services

Some REST resources to manage variable sets.

| REST | Description |
|------|-------------|
| GET /variables/sets | Get the variable sets associated to the current user |
| GET /variables/sets?id=xxx&id=xxx | List the variable sets matching the provided identifiers |
| POST /variables/sets/operations?s1=xxx&s2=xxx&s3=xxx | Create a set operation from a list of sets (maximum of three) |
| GET /variables/sets/operation/xxx | Get a set operation with count of documents for each of the compositions |
| DELETE /variables/sets/operation/xxx | Delete a set operation |
| POST /variables/sets?name=xxx | Create a variable set from: a variable RQL query, or a set RQL query (to compose variable sets), or a cart identifier. Name can be empty (this makes a variable cart). Current user is automatically associated to the set. |
| POST /variables/sets/_import?name=xxx | Create a variable set by uploading a CSV/TSV file containing variable identifiers (in the first column) |
| GET /variables/set/id | Get a variable set meta-data |
| GET /variables/set/id/_list?offset=0&limit=20 | Page on variables of a set |
| GET /variables/set/id/_export | Download the variable identifiers list |
| GET /variables/set/id/_download | Download the variable list of a set as a CSV/TSV file |
| POST /variables/set/id/variables | Add variables from: a variable RQL query, or a set RQL query (to compose variable sets), or a list of variable identifiers |
| DELETE /variables/set/id/variables | Delete all variables of the set. |
| POST /variables/set/id/variables/_delete | Delete a specified list of variables. |
| PUT /variables/set/id?name=xxx | Update/set the name of the variable set. If the variable set has no name, it is a cart and the current user name is also applied. |
| PUT /variables/set/id/_delete | Mark the variable set for removal. |
| DELETE /variables/set/id/_delete | Unmark the variable set for removal. |
| DELETE /variables/set/id | Delete a variable set (forced). All set operations in which the set is involved will be deleted as well. |
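
As an illustration of how a client might address some of these resources, the sketch below only builds request URLs from the paths and query parameters listed above. The base URL and the helper names are hypothetical, and requiring at least one set for a set operation is an assumption (the specification only states a maximum of three).

```python
from typing import Sequence
from urllib.parse import quote, urlencode

BASE_URL = "https://example.org"  # hypothetical server address


def sets_url(ids: Sequence[str] = ()) -> str:
    """GET /variables/sets, optionally restricted to the given identifiers."""
    query = urlencode([("id", set_id) for set_id in ids])
    return f"{BASE_URL}/variables/sets" + (f"?{query}" if query else "")


def set_operation_url(set_ids: Sequence[str]) -> str:
    """POST /variables/sets/operations with the parameters s1, s2 and s3."""
    if not 1 <= len(set_ids) <= 3:
        raise ValueError("a set operation involves at most three sets")
    query = urlencode([(f"s{n}", sid) for n, sid in enumerate(set_ids, start=1)])
    return f"{BASE_URL}/variables/sets/operations?{query}"


def set_list_url(set_id: str, offset: int = 0, limit: int = 20) -> str:
    """GET /variables/set/id/_list with paging parameters."""
    query = urlencode({"offset": offset, "limit": limit})
    return f"{BASE_URL}/variables/set/{quote(set_id, safe='')}/_list?{query}"


listing = sets_url(["a", "b"])
operation = set_operation_url(["x", "y"])
page = set_list_url("abc")
```

Repeating the `id` parameter, as in `GET /variables/sets?id=xxx&id=xxx`, is handled here by encoding a list of pairs rather than a dictionary.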

UI Mockups

Test/Demo Plan

How the feature can be tested or demonstrated. It is important to describe this in enough detail that anyone can perform the demo or test.

Unresolved Issues

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself; since any specification with problems cannot be approved.
