The data API provides access to querying, retrieval, and indexing of public BV-BRC data as well as private annotated data. The API exposes a REST interface to the rich data BV-BRC provides. Data can be retrieved directly by ID, or it can be queried using Resource Query Language (RQL) syntax or Solr syntax. As queries are submitted to the API, they are modified and passed to the backend data source (Solr) so that only data visible to the user is retrieved. Users are able to view public data, any data they own, and any data that another user has shared with them.
The BV-BRC API is a Node.js application built on the Express framework. It currently requires Node.js version 14 or later. The source code is available on GitHub at https://github.com/BV-BRC/BV-BRC-API.
To install, run the following:
git clone https://github.com/BV-BRC/BV-BRC-API
cd BV-BRC-API
npm install
cp p3api.conf.sample p3api.conf # modify as appropriate
The configuration file p3api.conf defines a number of site-specific variables; the most important is the location of the BV-BRC Solr service.
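The exact contents depend on your site; the fragment below is only a hypothetical sketch (the key names are assumptions, not the real schema; consult p3api.conf.sample for the actual options):
# hypothetical sketch only; the real key names are in p3api.conf.sample
http_port=3001                          # port the API service listens on (assumed key name)
solr_url=http://localhost:8983/solr     # location of the BV-BRC Solr service (assumed key name)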
npm start                          # start the service in the foreground
DEBUG=p3api-server npm start       # start with debug logging enabled
# run under the pm2 process manager with three instances
./node_modules/pm2/bin/pm2 start app.js -i 3 --name "bvbrc-api-service" --merge-logs -o bvbrc_api_service.out.log -e bvbrc_api_service.err.log
For the latest documentation on setting up a test environment and running/writing tests, see here.
The BV-BRC API service enables direct retrieval of objects from the data source through HTTP GET requests using the unique ID for each data type (e.g., genome_id for the Genome collection), as well as querying the data sources using either RQL syntax or Solr query syntax.
Genome Retrieval Example:
http://HOST:PORT/genome/GENOME_ID
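For example, a single record can be fetched with curl, requesting JSON explicitly via the Accept header (HOST, PORT, and GENOME_ID are placeholders):
# retrieve one genome record by its unique ID, as JSON
curl -H "Accept: application/json" "http://HOST:PORT/genome/GENOME_ID"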
Queries can be submitted as GET requests (with the query in the URL) or as POST requests with the query contained in the request body. The latter is useful for large queries which would exceed the maximum length of URLs supported by browsers/servers.
Genome Feature Query Example:
http://HOST:PORT/genome_feature/?eq(annotation,PATRIC)&select(genome_id,genome_name,annotation)&limit(10)&http_accept=application/json
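The same query can also be sent as a POST request with the query in the body, which is the better route for very long queries. The sketch below assumes curl; the Content-Type value shown is an assumption based on common usage of this API, so verify it against your deployment:
# POST the RQL query in the request body instead of the URL
curl -X POST "http://HOST:PORT/genome_feature/" \
  -H "Accept: application/json" \
  -H "Content-Type: application/rqlquery+x-www-form-urlencoded" \
  --data "eq(annotation,PATRIC)&select(genome_id,genome_name,annotation)&limit(10)"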
Responses from queries are available in a number of formats:
- application/json : Returns an array of response objects
- application/solr+json : Returns objects in the Solr response format
- text/csv : Returns objects in comma-separated values format. Columns are separated by ",", multi-valued columns are separated by ";", and rows are separated by "\n"
- text/tsv : Returns objects in tab-separated values format. Columns are separated by "\t", multi-valued columns are separated by ";", and rows are separated by "\n"
- application/vnd.openxmlformats : Returns objects for use in MS Excel
- application/dna+fasta : Returns DNA sequences for queries in FASTA format (this currently only makes sense for the 'genome_feature' collection)
- application/protein+fasta : Returns protein sequences for queries in FASTA format (this currently only makes sense for the 'genome_feature' collection)
- application/gff : Returns genomic features in GFF format (this only makes sense for the 'genome_feature' collection)
The response format is determined by passing the desired type in the HTTP Accept header of the request. In cases where it is not possible to supply HTTP headers, the accept header can be specified by adding &http_accept=FORMAT to the URL itself (e.g., &http_accept=application/json).
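For example, the feature query above can be returned as TSV either way (HOST and PORT are placeholders):
# request TSV via the Accept header
curl -H "Accept: text/tsv" "http://HOST:PORT/genome_feature/?eq(annotation,PATRIC)&select(genome_id,genome_name,annotation)&limit(10)"
# or, where headers cannot be set, via the http_accept URL parameter
curl "http://HOST:PORT/genome_feature/?eq(annotation,PATRIC)&select(genome_id,genome_name,annotation)&limit(10)&http_accept=text/tsv"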
The following operators are available for RQL queries (a combined example follows the list):
- eq(FIELD,VALUE) : Equals
- ne(FIELD,VALUE) : Not Equals
- gt(FIELD,VALUE) : Greater than
- lt(FIELD,VALUE) : Less than
- keyword(VALUE) : Text search (the specific fields that will be searched depends on the data sources' configuration)
- in(FIELD,(VALUE1,VALUE2,VALUE3)) : Returns objects whose FIELD contains any of the provided values
- and(EXPRESSION,EXPRESSION,...) : ANDs two or more expressions together
- or(EXPRESSION,EXPRESSION,...) : ORs two or more expressions together
- select(FIELD1,FIELD2,FIELD3,....) : Returns only the specified fields from result objects
- sort([+|-]FIELD,[+|-]FIELD2) : Sorts result data by field. Specify + or - to sort the results ascending or descending
- limit(COUNT,START) : Specifies a limit on the query, where COUNT is the total number of objects to return and START is the starting index within the result set to return from
- GenomeGroup(WORKSPACE_PATH) : Retrieves the GenomeGroup from WORKSPACE_PATH for use in a query (e.g., &in(genome_id,GenomeGroup(/path/to/my/group)) )
- FeatureGroup(WORKSPACE_PATH) : Retrieves the FeatureGroup from WORKSPACE_PATH for use in a query (e.g., &in(feature_id,FeatureGroup(/path/to/my/group)) )
- facet((FACET_PROPERTY,PROPERTY_VALUE),(FACET_PROPERTY,PROPERTY_VALUE),...) : Allows facets to be specified along with a query. Facet results are included in the HTTP response header when the response content-type is application/json and included in the response body for application/solr+json
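Putting several operators together, a combined query might look like the following (the field names are those used in the examples above and the keyword value is illustrative; adjust both to the collection being queried):
http://HOST:PORT/genome_feature/?and(eq(annotation,PATRIC),keyword(kinase))&select(genome_id,genome_name,feature_id)&sort(+genome_id)&limit(25,0)&http_accept=application/json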
HTTP headers can be supplied normally or in a URL by prefixing the header name with "http_" (e.g., &http_accept=application/json).
Requests can force the server to set Content-Disposition (thereby forcing a browser to download the result as a file) by adding &http_download=true to the URL. This must be used in combination with sort(+UNIQUE_KEY) to increase the download limit to 25 million records.
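For example (the query itself is only illustrative), a full TSV download from the genome collection, whose unique key is genome_id, could be requested as:
http://HOST:PORT/genome/?keyword(Brucella)&sort(+genome_id)&http_accept=text/tsv&http_download=true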
These instructions describe how to build a Singularity container for the BV-BRC API and deploy it. The process requires singularity and jq. To build the image, run:
./buildImage.sh
or
npm run build-image
Both commands generate a file named bvbrc_api-<VERSION>.sif.
The deployment requires two folders: a configuration folder and a log folder. One can be a child of the other if desired. To bootstrap the deployment, run the following command:
singularity instance start \
--bind /PATH/TO/CONFIG/FOLDER:/config \
--bind /PATH/TO/LOG/FOLDER:/logs \
--bind /PATH/TO/TREES/FOLDER:/trees \
--bind /PATH/TO/PUBLIC/GENOMES/FOLDER:/genomes \
--bind /PATH/TO/QUEUE/FOLDER:/queue \
/path/to/bvbrc_api-x.x.x.sif bvbrc_api bvbrc_api
NOTE: The last two parameters specify the singularity instance name. Both must be supplied and they should ALWAYS be the same.
This command will start an instance of bvbrc_api with a default config (which may fail to run). It will also populate the configuration folder with a number of additional files. The p3_api.conf and pm2.config.js files are the bvbrc_api configuration file and a configuration file that tells pm2 how to behave within the container. Both may be edited and will not be replaced if they already exist. An existing p3_api.conf should be directly usable for the most part, but the paths pointing at the tree folder, public genomes folder, and indexer queue folder will need to be updated to match the container-internal mount points (/trees, /genomes, /queue). You may copy an existing p3_api.conf file into the configuration folder before running the above command (with the aforementioned changes), and it will be used from the start (see the sketch after the script list below).

A number of shell scripts for controlling the application are generated the first time the command is run (or whenever start.sh doesn't exist):
- start.sh : Starts the singularity container and the process manager within
- stop.sh : Stops the process manager and then stops the container
- restart.sh: Calls ./stop.sh && ./start.sh
- start-indexer.sh: Starts the indexer
- stop-indexer.sh: Stops just the indexer
- reload.sh : Calls "reload" on the process manager, for a graceful reload after modifying the configuration file (or for any other reason)
- reload-api.sh: Gracefully reload the api only.
- scale.sh : Modifies the number of running instances in the process manager to the specified count
- pm2.sh : This is a simple wrapper around the pm2 process manager running inside the container
- shell.sh : This is a simple wrapper around the shell command to connect to the instance
- p3-check-history.sh
- p3-check-integrity.sh
- p3-clear-index-queue.sh
- p3-index-completed.sh
- p3-index-count.sh
- p3-rebuild-history.sh
- p3-reindex.sh
- p3-update-history.sh
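As noted above, an existing p3_api.conf can be seeded into the configuration folder before the first start; a minimal sketch follows (paths are placeholders):
# copy an existing configuration into the bound config folder before bootstrapping
cp /existing/p3_api.conf /PATH/TO/CONFIG/FOLDER/p3_api.conf
# then edit it so the tree, public genome, and indexer queue paths point at /trees, /genomes, /queue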
You will also notice an instance.vars file. This file contains variables pointing at the singularity image, instance name, and bind parameters so that they do not need to be provided again. When a new image arrives, modify instance.vars to point at the new image, stop the existing service (./stop.sh), and then run start.sh to start again with the new image.
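A minimal sketch of that upgrade flow:
# upgrade to a new image without changing the configuration
$EDITOR instance.vars   # point the image variable at the new bvbrc_api-x.x.x.sif
./stop.sh               # stop the running service
./start.sh              # start again; the new image path is picked up from instance.vars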
- The same image may be used for multiple configuration files. Deploy an image to alpha (by pointing at the alpha configuration) and when all is good, simply use the same image for beta and then production.
- A configuration folder must NOT be used by multiple instances concurrently. The configuration folder holds the pm2 specifics for that instance and will conflict if two instances use the same folder.
- The log folder can be shared between multiple applications provided that the log file names themselves are unique.