ericaVoss/omop_harvest

 
 

OMOP + Harvest

Explore simulated data in the OMOP data model using Harvest

Query and explore one of the OSIM2 datasets by clicking the "Explore OMOP Harvest" link below.

Registration is required prior to use of the site.

About OMOP and OSIM2:

The Observational Medical Outcomes Partnership (OMOP) has created a Common Data Model (CDM) to standardize the format and content of observational data, with the goal of enabling the application of standardized tools and methods to it. The database underlying this web application implements the fourth version of the OMOP CDM.

OMOP has also created the Observational Medical Dataset Simulator Generation 2 (OSIM2) and released some datasets for general use. The data you find here is a subset of the OSIM2_10M_MSLR_MEDDRA_11 dataset.

About Harvest:

Harvest is a toolkit for building web applications that facilitates integrating, discovering, and reporting on data. It is developed and maintained by the Center for Biomedical Informatics (CBMi). It is nationally funded, designed for biomedical data first, open source and available on GitHub, and it comes with an HTML5 web client.

For best performance, please use Chrome, Firefox or Safari.

Install locally:

You can install this project on your local server, using any dataset in OMOP CDMV4 format or produced by OSIM2. These instructions are intended for use with a PostgreSQL database.

1. Make a Python virtualenv for the project (optional but highly recommended)

virtualenv omop_harvest_env

2. Clone the project repository

cd omop_harvest_env
git clone http://github.research.chop.edu/cbmi/omop_harvest.git

3. Install project requirements

source bin/activate
cd omop_harvest
pip install -U -r requirements.txt

4. Load data into your database

  • Using an OMOP released OSIM2 dataset (please review etl/KNOWN_ISSUES.md before filing issues related to this process)

    • Download and Extract Data

      • Download the Vocabulary data files from http://omop.org/Vocabularies (look for them in the right hand column)
      • Download the OSIM2 data using SFTP following the instructions at http://omop.org/OSIM2 (OSIM2_10M_MSLR_MEDDRA_11 is known to work, but the others should as well)
      • Place both downloads in the etl folder and expand the vocabulary package there and the OSIM2 data package into etl/OSIM2_10M_MSLR_MEDDRA_11/
      • If you don't want the entire 10M patient OSIM2 dataset, we have created a one-fifth size subset of that data, which is available on request
    • Pre-process the Data

      • Run the etl/remove_pipes_and_double_quotes.sh script. It expects the downloaded data to be in the locations described above, and it will take a while to finish.
    • Create the Database Tables

      • Start a psql session and source both the etl/CDMV4_OSIM2_tables_postgres.ddl and etl/VocabV4_tables_postgres.ddl files (use the \i psql command)
    • Load the Data

      • Make sure your psql session is running in the etl folder (the \! pwd and \cd commands will help), and then source both the OSIM2_data_load.psql (or OSIM2_fth_load.psql if you have the fifth-size dataset) and VocabV4_data_load.psql files using the \ir command to take advantage of the relative path names in the scripts. This will take a long time.
    • Create Database Relationships

      • Source both the etl/CDMV4_OSIM2_relations_postgres.ddl and etl/VocabV4_relations_postgres.ddl files (using \i again). This takes less time than the previous step, but it is still slow.
  • Using your own dataset

    • If your dataset is not yet in a database, adapt the preprocessing, table creation, data loading, and relationship creation scripts found in the etl folder. They will likely have to be modified depending on the format of your data and the target database.

    • If your dataset is in a database, then you may have to modify it to work with the Django ORM. For example, each table must have a single integer primary key field. Something like this might work:

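        -- Add an auto-incrementing integer column (foo stands for your table name)
        ALTER TABLE foo ADD COLUMN id SERIAL;
        -- Assign a sequence value to every existing row
        UPDATE foo SET id = DEFAULT;
        -- Make the new column the table's single primary key
        ALTER TABLE foo ADD PRIMARY KEY (id);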
      

5. Configure local settings and connect Harvest to your database

cp omop_harvest/local_settings.py.sample omop_harvest/local_settings.py

Edit the omop_harvest/local_settings.py file, specifically:
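
For the database connection, a minimal sketch follows. It assumes local_settings.py uses Django's standard DATABASES dictionary and the PostgreSQL backend. The database name, user, password, host, and port are placeholders, not values taken from this project.

```python
# Sketch of the database block in omop_harvest/local_settings.py.
# Assumption: the file uses Django's standard DATABASES setting.
# Every value below is a placeholder; substitute your own.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',  # PostgreSQL backend
        'NAME': 'omop',            # placeholder database name
        'USER': 'omop_user',       # placeholder database role
        'PASSWORD': 'change-me',   # placeholder password
        'HOST': 'localhost',       # placeholder host
        'PORT': '5432',            # PostgreSQL's default port
    }
}
```

Compare the key names with those in local_settings.py.sample before copying anything; the sample file is the authority on what this project expects.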

6. Check concept table normalization SQL code

Especially if you are using your own data or any database other than PostgreSQL, check the SQL found on lines #16-22 and #30-36 of omop_harvest/migrations/0002_views.py for compatibility. This code creates tables, each of which represents Vocabulary concepts that apply to a particular field in the data model. This further normalization of the data model is necessary in order for Harvest to make these concepts available as expected. Make sure the SQL is compatible with your database and the tables which are referenced actually exist.
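
To illustrate the kind of per-field concept table described above, here is a small self-contained sketch using Python's built-in sqlite3 module. It is not the SQL from 0002_views.py. The table name person_gender_concept and the sample rows are assumptions made for the example, and the real migration may be structured differently.

```python
import sqlite3

# In-memory database so the sketch runs without any setup.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, gender_concept_id INTEGER);

    -- Sample rows for illustration only.
    INSERT INTO concept VALUES (0, 'No matching concept'), (8507, 'MALE'), (8532, 'FEMALE');
    INSERT INTO person VALUES (1, 8507), (2, 8532), (3, 8507);

    -- Hypothetical per-field table: only the concepts that
    -- person.gender_concept_id actually refers to.
    CREATE TABLE person_gender_concept AS
        SELECT DISTINCT c.concept_id, c.concept_name
        FROM concept c
        JOIN person p ON p.gender_concept_id = c.concept_id;
""")

rows = conn.execute(
    "SELECT concept_id, concept_name FROM person_gender_concept ORDER BY concept_id"
).fetchall()
print(rows)  # [(8507, 'MALE'), (8532, 'FEMALE')]
```

The unused concept is left out of the derived table, which is the point of the normalization: each field gets a table holding only the concepts that apply to it.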

7. Run South migrations

# Fake the initial project model migration, as we already created the tables
python bin/manage.py migrate --fake omop_harvest 0001
# Migrate all apps forward; this will run the SQL code reviewed in the last step, so it may take a while
python bin/manage.py syncdb --migrate 

8. Check the Django models

Again, if you are using your own data or a non-PostgreSQL database, it is especially important to check the Django models defined in omop_harvest/models.py. Make sure the db_table setting of each model references a real table and that each field references a real column on that table. Note that ForeignKey fields are named without the trailing _id of their column, because Django appends _id to the field name to find the column. The models with managed = False should reference the tables created in step 4 above.
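
The naming rule for ForeignKey columns can be checked by hand with a few lines of plain Python. This is a rough sketch and not part of the project. The helper names expected_column and missing_columns are hypothetical, and the field and column lists are illustrative and would need to be replaced with your own.

```python
def expected_column(field_name, is_foreign_key=False, db_column=None):
    """Return the column name Django uses for a model field by default."""
    if db_column is not None:
        return db_column  # an explicit db_column overrides the default
    return field_name + "_id" if is_foreign_key else field_name


def missing_columns(fields, table_columns):
    """List the expected columns that are absent from the real table."""
    expected = [expected_column(name, is_fk) for name, is_fk in fields]
    return [col for col in expected if col not in table_columns]


# Illustrative field list: (field name, is it a ForeignKey?)
fields = [
    ("condition_occurrence_id", False),
    ("person", True),             # ForeignKey, so the column is person_id
    ("condition_concept", True),  # ForeignKey, so the column is condition_concept_id
    ("condition_start_date", False),
]

# Illustrative column names, as you might copy them from psql's \d output.
table_columns = {
    "condition_occurrence_id",
    "person_id",
    "condition_concept_id",
    "condition_start_date",
}

print(missing_columns(fields, table_columns))  # []
```

An empty list means every field maps to a real column. Any name that is printed points to a field or column that needs attention.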

9. Configure your server settings

We run our apps on an nginx server that passes requests to uWSGI processes managed by supervisord, and the server directory includes settings to that effect. You should do whatever is most comfortable for you. If you don't want to bother with a production-type server environment right now, run python bin/manage.py runserver 5678 and then open http://localhost:5678 in your browser.

About

OMOP CDMV4 data model with OSIM2 data in Harvest
