Skip to content

STRIDES-Codes/subject-sample-search

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NIH STRIDES-Codes/subject-sample-search

Welscome to NIH STRIDES-Codes/subject-sample-search.

This codeathon project will explore existing technologies to search subjects and sample data based clinical, phenotypic and other attributes of subjects and sampes. We hope to explore the GA4GH Discovery Search API, BigQuery and FHIR.

The fasp_in branch contains the working code for the codeathon. This was created from a branch in fasp-scripts. That project provides some context for how a Search API might be used with file access and workflow executiion services to compute on biomedical data.

GA4GH Search

GA4GH Search is a new API specification recently been submitted for approval. A reference implementation is available from DNAStack. Search provides the capability to make available data from multiple data technologies such as json, FHIR, Phenopackets and sql databases.

Options for the codeathon might include

  • Exploring and querying the existing data sources
  • Adding data sources e.g. as BigQuery Tables
  • Installing and running a GA4GH Server

BigQuery

BiqQuery tables are made available by SRA and by the Institute for Systems Biology Cancer Genomics Cloud.

These allow search of subject and sample data from projects with corresponding genomic data. Searches may be conducted via the Google Cloud Platform console, the BigQuery API, or be set up to be queried by the GA4GH Search reference server.

FHIR

The Kids First FHIR Server provides a data dashboard and API endpoint to query data from the Kids First initiative. Queries allow links to GA4GH DRS ids to locate genomic data files.

Some search queries that may be useful here: https://docs.google.com/presentation/d/1Vdd1uVitm4H0yx3OkCODJir8dIltki2IGJtZpxddtxw/edit#slide=id.g88f2892937_5_26.

The full FHIR search spec is here: https://www.hl7.org/fhir/search.html.

dbGaP

As a database with a rich source of diverse datasets dbGaP is a good test for the use cases GA4GH Search and FHIR are trying to address. To work with it effectively a data scientist needs to be able to discover the fields, codes and structure of data set. The following diagram shows how scrambled representations of dbGaP data have been made available through GA4GH Search making use of the machine readable descriptions (schema) of the data provided by the submitters.

schemaflow

About

No description or website provided.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •  

Languages