NIH STRIDES-Codes/subject-sample-search

Welcome to NIH STRIDES-Codes/subject-sample-search.

This codeathon project will explore existing technologies for searching subject and sample data based on clinical, phenotypic and other attributes of subjects and samples. We hope to explore the GA4GH Discovery Search API, BigQuery and FHIR.

The fasp_in branch contains the working code for the codeathon. It was created from a branch in fasp-scripts. That project provides some context for how a Search API might be used with file access and workflow execution services to compute on biomedical data.

GA4GH Search

GA4GH Search is a new API specification that has recently been submitted for approval. A reference implementation is available from DNAStack. Search provides the capability to make data available from multiple data technologies such as JSON, FHIR, Phenopackets and SQL databases.

Options for the codeathon might include:

Exploring and querying the existing data sources
Adding data sources e.g. as BigQuery Tables
Installing and running a GA4GH Server
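
As a starting point for exploring the existing data sources, a query can be sent to a Search server as a small JSON document. The following is a minimal sketch, assuming the server follows the draft Search specification in accepting a POST to a `/search` path with a body of the form `{"query": "<SQL>"}`. The base URL and table name are placeholders, not real endpoints, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_search_request(base_url: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a SQL query.

    Assumes the server exposes a `/search` path that accepts a JSON
    body of the form {"query": "<SQL>"}; check the server you use.
    """
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder server and table name, for illustration only.
request = build_search_request(
    "https://search.example.org/",
    "SELECT * FROM example_catalog.example_schema.subjects LIMIT 10",
)
print(request.get_method(), request.full_url)
```

Against a real server the request could be sent with `urllib.request.urlopen(request)` and the response parsed as JSON. Results may be paginated, so the response format should be checked against the specification before relying on it.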

BigQuery

BigQuery tables are made available by SRA and by the Institute for Systems Biology Cancer Genomics Cloud.

These allow search of subject and sample data from projects with corresponding genomic data. Searches may be run through the Google Cloud Platform console or the BigQuery API, or the tables may be set up to be queried by the GA4GH Search reference server.
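
For the BigQuery API route, a search might look like the sketch below. It assumes the `google-cloud-bigquery` Python client is installed and that Google Cloud credentials are configured. The table name `my-project.my_dataset.samples` and the columns `sample_id` and `organism` are hypothetical, so substitute those of the table you are actually querying.

```python
def build_sample_query(table: str, max_rows: int = 10) -> str:
    """Return SQL selecting samples for one organism.

    `@organism` is a BigQuery named query parameter, so the search value
    is never pasted into the SQL text. Column names are assumptions.
    """
    return (
        "SELECT sample_id, organism\n"
        f"FROM `{table}`\n"
        "WHERE organism = @organism\n"
        f"LIMIT {int(max_rows)}"
    )


def run_sample_query(table: str, organism: str, max_rows: int = 10) -> list:
    """Run the query; needs google-cloud-bigquery and GCP credentials."""
    from google.cloud import bigquery  # imported lazily: optional dependency

    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("organism", "STRING", organism),
        ]
    )
    client = bigquery.Client()
    job = client.query(build_sample_query(table, max_rows), job_config=job_config)
    return list(job.result())


# Hypothetical table name, for illustration only.
sql = build_sample_query("my-project.my_dataset.samples")
print(sql)
```

Only `build_sample_query` runs without credentials. `run_sample_query` is kept separate so the SQL can be inspected or pasted into the console before any query is billed.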

FHIR

The Kids First FHIR Server provides a data dashboard and an API endpoint to query data from the Kids First initiative. Query results can include links to GA4GH DRS ids, which locate the corresponding genomic data files.

Some search queries that may be useful here: https://docs.google.com/presentation/d/1Vdd1uVitm4H0yx3OkCODJir8dIltki2IGJtZpxddtxw/edit#slide=id.g88f2892937_5_26.

The full FHIR search spec is here: https://www.hl7.org/fhir/search.html.
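
FHIR searches are expressed as URLs of the form `[base]/[resource type]?parameter=value`. The sketch below builds two such URLs with the standard library only. The base URL and the patient id `123` are placeholders, and a real server such as Kids First will normally also require authentication, which is not shown.

```python
from urllib.parse import urlencode


def fhir_search_url(base_url: str, resource_type: str, **params) -> str:
    """Build a FHIR search URL of the form [base]/[type]?name=value&..."""
    query = urlencode(params)
    url = f"{base_url.rstrip('/')}/{resource_type}"
    return f"{url}?{query}" if query else url


BASE = "https://fhir.example.org"  # placeholder, not a real server

# Female patients, ten results per page (`_count` is a standard FHIR parameter).
patients_url = fhir_search_url(BASE, "Patient", gender="female", _count=10)

# Specimens belonging to one (hypothetical) patient.
specimens_url = fhir_search_url(BASE, "Specimen", subject="Patient/123")

print(patients_url)
print(specimens_url)
```

The response to such a search is a FHIR Bundle, so following its `next` link is the usual way to page through larger result sets.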

dbGaP

As a rich source of diverse datasets, dbGaP is a good test for the use cases GA4GH Search and FHIR are trying to address. To work with it effectively, a data scientist needs to be able to discover the fields, codes and structure of a data set. The accompanying diagram shows how scrambled representations of dbGaP data have been made available through GA4GH Search, making use of the machine-readable descriptions (schema) of the data provided by the submitters.
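
To illustrate what discovering the fields of a data set can look like in code, here is a minimal sketch that lists the fields described by a schema. It assumes the schema is delivered as a JSON Schema-style document with a `properties` map, which is an assumption about the server rather than something stated here. The example schema is made up and is not taken from any dbGaP study.

```python
def list_fields(data_model: dict) -> list:
    """Return (name, type, description) for each property of a JSON Schema."""
    fields = []
    for name, spec in data_model.get("properties", {}).items():
        fields.append(
            (name, spec.get("type", "unknown"), spec.get("description", ""))
        )
    return fields


# Made-up schema for illustration only; not real dbGaP content.
example_model = {
    "description": "Example subject phenotypes table",
    "properties": {
        "subject_id": {"type": "string", "description": "De-identified subject id"},
        "sex": {"type": "string", "description": "Sex of the subject", "enum": ["F", "M"]},
        "age": {"type": "integer", "description": "Age at enrollment, in years"},
    },
}

for name, field_type, description in list_fields(example_model):
    print(f"{name:12} {field_type:8} {description}")
```

Coded values, such as the `enum` entry for `sex` in the example, could be surfaced the same way by reading further keys from each property.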
