This repository has been archived by the owner on Oct 17, 2020. It is now read-only.

Potential Case Study: VA #118

Open
mheadd opened this issue Feb 12, 2019 · 4 comments

@mheadd
Contributor

mheadd commented Feb 12, 2019

Case study

  • Agency / office: U.S. Department of Veterans Affairs (VA)
  • Contact: Gil Alterovitz ([email protected])
  • Type of data:
    Electronic medical record data, administrative and demographic information, genetic test and sequence data, imaging, benefit and cemetery information, patient-reported data, and other data linked to the Veteran population. These contain data from various sources, either collected or received while providing clinical or administrative services to Veterans, or collected or received as part of clinical trials or other research studies.
  • Reference: VA Data Commons Plan draft (Private link)

Overview of the opportunity

The opportunity is to collaborate with the team in the Office of Research and Development (ORD) of the VA working on the pilot and implementation of the "VA Data Commons," which is working towards the ORD strategic priority of “Transforming VA data into a National Resource.”

ORD is responsible for over 4,000 active research projects across over 100 sites. Based on current and past work on Veteran preferences, Veterans have expressed interest in engaging in and sharing their information with others. Better understanding and adhering to Veteran interest in sharing other types of clinical and research data is a goal of the ORD committee called Guiding Use, Authority, and Release of Data according to Veteran Preference (GUARD).

A pilot for a VA Data Commons will develop three items:

  1. A regulatory framework to securely transfer, store, and use VA data within a VA Data Commons.
  2. Technical requirements and software applications to conduct common research functions using VA data within a VA Data Commons.
  3. An estimate of the anticipated costs needed to maintain an on-premises and/or cloud-based platform for a VA Data Commons capable of scaling to support thousands of approved research studies.

VA Data Commons principles include:

  • Managing access to VA data outside of the VA
  • Ensuring security and ease of information exchange
  • Ensuring scalability, usability, and sustainability

The principles of usability and scalability seem particularly relevant here:

  • Scalability: A VA Data Commons should be scalable in terms of architecture, local application software, data access, and accessibility across multiple simultaneous users.
  • Usability: A VA Data Commons should have a usable and useful environment with tools, APIs, and features that facilitate efficient and reproducible research.

They have identified a number of use cases for the pilot; we are called out in one of them!

  • Distributable Data: There are several data sets that have been created and consented with the express purpose of sharing with trusted partners and investigators outside VA. These include a subset of the Precision Oncology Program participants who have consented for additional release and VA Cooperative Studies Program (CSP) Integrated Veteran Epidemiologic Study Data Resource (INVESTD-R). The benefit/risk and business model for distributing data versus allowing access to a secure hosted resource will be compared. Other datasets that fall into this category are analytic datasets required by journals to be made available after publication. We may collaborate with the U.S. General Services Administration (GSA) for federal agencies that review distributed data strategy and data federation (https://federation.data.gov/).

Other relevant use cases include:

  • Aggregate Data Exploration: One of the most common and useful steps before starting a study, or while getting started, is to explore the characteristics of the data using tools that describe and visualize it. Several of these tools exist in open communities, such as those in the OMOP community and those built for the NCI Genomic Data Commons. Relevant because we may be able to help with aggregating data.
  • Primary Data Collection: Many studies rely on patients and providers filling out surveys; reviewing medical record data; recording, transcribing, and coding interviews; and otherwise collecting potentially sensitive information that is not recorded in a medical record. A Data Commons would support the secure collection and import of research data, along with the ability to combine it with other VA data, such as medical record data, to analyze and conduct research. Relevant because it's about data collection from many individuals.
  • Genetic Imputation and Quality Control: When MVP genotype data are returned from vendors, the GenISIS team performs imputation to take the ~700k SNP chip data to approximately 2 million imputed variants. This task is followed by quality control both within and across batches. These are tasks that may lend themselves to scalability in a VA Data Commons. We would replicate the imputation and quality control processes in the VA Data Commons. Relevant because it's about data validation/quality control.
@mheadd
Contributor Author

mheadd commented Feb 12, 2019

@juliaklindpaintner Can you take a crack at fleshing this out a bit more with some additional details on the opportunity?

@juliaklindpaintner
Member

Updated the Issue description with more context! After reading through the Data Commons plan I'm more optimistic about this as a potential use case.

@mheadd
Contributor Author

mheadd commented Feb 13, 2019

There is (or has been) a USDS team working with the VA. Not sure how close they are to this work, but it might make sense to ping them if this develops.

@juliaklindpaintner
Member

Scheduled follow-up conversation with Gil and Justin for 3/18/19.
