Registration is required prior to use of the site.
The Observational Medical Outcomes Partnership (OMOP) has created a Common Data Model (CDM) to standardize the format and content of observational data, with the goal of enabling the application of standardized tools and methods to it. The database underlying this web application implements the fourth version of the OMOP CDM.
OMOP has also created the Observational Medical Dataset Simulator Generation 2 (OSIM2) and released some datasets for general use. The data you find here is a subset of the OSIM2_10M_MSLR_MEDDRA_11 dataset.
Harvest is a toolkit for building web applications that facilitates integrating, discovering, and reporting on data. It is developed and maintained by the Center for Biomedical Informatics (CBMi). It is nationally funded, designed for biomedical data first, open source and available on GitHub, and it comes with an HTML5 web client.
For best performance, please use Chrome, Firefox or Safari.
You can install this project on your local server, using any dataset in OMOP CDMV4 format or produced by OSIM2. These instructions are intended for use with a PostgreSQL database.
virtualenv omop_harvest_env
cd omop_harvest_env
git clone http://github.research.chop.edu/cbmi/omop_harvest.git
source bin/activate
cd omop_harvest
pip install -U -r requirements.txt
Using an OMOP-released OSIM2 dataset (please review etl/KNOWN_ISSUES.md before filing issues related to this process)
Download and Extract Data
- Download the Vocabulary data files from http://omop.org/Vocabularies (look for them in the right hand column)
- Download the OSIM2 data using SFTP following the instructions at http://omop.org/OSIM2 (OSIM2_10M_MSLR_MEDDRA_11 is known to work, but the others should as well)
- Place both downloads in the etl folder, then expand the vocabulary package there and the OSIM2 data package into etl/OSIM2_10M_MSLR_MEDDRA_11/
- If you don't want the entire 10M-patient OSIM2 dataset, we have created a one-fifth-size subset of that data, which can be provided on request
Pre-process the Data
- Run the etl/remove_pipes_and_double_quotes.sh script. It should just work if you put all the downloaded data in the right place. It will take a while.
Create the Database Tables
- Start a psql session and source both the etl/CDMV4_OSIM2_tables_postgres.ddl and etl/VocabV4_tables_postgres.ddl files (use the \i psql command)
Load the Data
- Make sure your psql session is running in the etl folder (the \! pwd and \cd commands will help), and then source both the OSIM2_data_load.psql (or OSIM2_fth_load.psql if you have the fifth-size dataset) and VocabV4_data_load.psql files using the \ir command to take advantage of the relative path names in the scripts. This will take a long time.
Create Database Relationships
- Source both the etl/CDMV4_OSIM2_relations_postgres.ddl and etl/VocabV4_relations_postgres.ddl files (using \i again). This will take less time than the last step, but still a lot of time.
Using your own dataset
- If your dataset is not yet in a database, use some version of the preprocessing, table creation, data loading, and relationship creation scripts found in the etl folder. They will likely have to be modified depending on the format of your data and the target database.
- If your dataset is in a database, then you may have to modify it to work with the Django ORM. For example, each table must have a single integer primary key field. Something like this might work:
ALTER TABLE foo ADD COLUMN id SERIAL; UPDATE foo SET id = DEFAULT; ALTER TABLE foo ADD PRIMARY KEY (id);
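If several tables need a surrogate key, the statements can be generated rather than typed by hand. The following is a minimal sketch, not part of the project: the helper name `surrogate_key_sql` and the table names are hypothetical, and it simply reproduces the three statements shown above for each table. Tables that already have a primary key would need different handling.

```python
import re

# Accept only plain identifiers so that table names cannot smuggle in extra SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def surrogate_key_sql(table, column="id"):
    """Return the statements that add an integer primary key to one table.

    Hypothetical helper: it mirrors the PostgreSQL statements given in the
    text and does not connect to a database.
    """
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError("not a plain SQL identifier: %r" % (name,))
    return [
        "ALTER TABLE {0} ADD COLUMN {1} SERIAL;".format(table, column),
        "UPDATE {0} SET {1} = DEFAULT;".format(table, column),
        "ALTER TABLE {0} ADD PRIMARY KEY ({1});".format(table, column),
    ]


if __name__ == "__main__":
    # Placeholder table names; substitute the tables in your own dataset.
    for table_name in ["foo", "bar"]:
        print("\n".join(surrogate_key_sql(table_name)))
```

The printed statements can be saved to a file and sourced from psql with \i, in the same way as the DDL files above.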
cp omop_harvest/local_settings.py.sample omop_harvest/local_settings.py
Edit the omop_harvest/local_settings.py file, specifically:
- Insert a unique secret key (can be generated at http://www.miniwebtool.com/django-secret-key-generator/)
- Set up your database (documented at https://docs.djangoproject.com/en/1.5/ref/settings/#databases)
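As an alternative to a web-based generator, a secret key can be produced locally. The sketch below also shows what a PostgreSQL entry in the standard Django `DATABASES` setting looks like. The function name `generate_secret_key`, the character set, and every database value (name, user, password, host, port) are placeholders chosen for illustration, not values the project ships with.

```python
import random
import string

# Illustrative character set; any sufficiently long random string will do.
KEY_CHARS = string.ascii_letters + string.digits + "!@#$%^&*(-_=+)"


def generate_secret_key(length=50):
    """Return a random string drawn from the operating system's entropy source."""
    rng = random.SystemRandom()
    return "".join(rng.choice(KEY_CHARS) for _ in range(length))


# Example shape of the database setting for PostgreSQL.
# All values below are placeholders; replace them with your own.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "omop_harvest",
        "USER": "omop_user",
        "PASSWORD": "change-me",
        "HOST": "localhost",
        "PORT": "5432",
    }
}

if __name__ == "__main__":
    # Run once and paste the output into local_settings.py as SECRET_KEY.
    print(generate_secret_key())
```

Generate the key once and store it as a fixed string. Regenerating it on every start would invalidate existing sessions.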
Especially if you are using your own data or any database other than PostgreSQL, check the SQL found on lines 16-22 and 30-36 of omop_harvest/migrations/0002_views.py for compatibility.
This code creates tables, each of which represents Vocabulary concepts that apply to a particular field in the data model. This further normalization of the data model is necessary in order for Harvest to make these concepts available as expected.
Make sure the SQL is compatible with your database and the tables which are referenced actually exist.
# Fake the initial project model migration, as we already created the tables
python bin/manage.py migrate --fake omop_harvest 0001
# Migrate all apps forward; this will run the SQL code reviewed in the last step, so it may take a while
python bin/manage.py syncdb --migrate
Again, if you are using your own data or a non-PostgreSQL database, it is especially important that you check the Django models defined in omop_harvest/models.py.
Specifically, check that the db_table setting references a real table for each model and that each field references a real column on that table. Note, however, that ForeignKey fields are named without the _id suffix that appears at the end of the underlying column name.
The models with managed = False should reference the tables created in step 4 above.
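The naming rule for ForeignKey fields can be made concrete with a small check. This is a sketch under stated assumptions: `expected_column` and `missing_columns` are hypothetical helpers, not part of Django or of this project, and the field list is an illustrative example rather than a copy of omop_harvest/models.py. It relies only on Django's default behaviour of appending `_id` to a ForeignKey field name unless `db_column` is set.

```python
def expected_column(field_name, is_foreign_key=False, db_column=None):
    """Return the column name Django would use for a field by default.

    An explicit db_column wins; otherwise a ForeignKey field maps to
    <field_name>_id and any other field maps to its own name.
    """
    if db_column is not None:
        return db_column
    return field_name + "_id" if is_foreign_key else field_name


def missing_columns(fields, table_columns):
    """List the expected columns that are absent from the table.

    fields is a list of (field_name, is_foreign_key) pairs and
    table_columns is a collection of actual column names.
    """
    wanted = [expected_column(name, is_fk) for name, is_fk in fields]
    return [column for column in wanted if column not in table_columns]


if __name__ == "__main__":
    # Hypothetical fields for a model on the person table.
    person_fields = [("gender_concept", True), ("year_of_birth", False)]
    person_columns = {"person_id", "gender_concept_id", "year_of_birth"}
    print(missing_columns(person_fields, person_columns))
```

An empty list means every field has a matching column. Any name that is returned points at a field, or a db_table setting, that needs adjusting.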
We run our apps using an nginx server that passes requests to uWSGI processes managed by supervisord, and we include server settings to that effect in the server directory. You should do whatever is most comfortable for you.
If you don't want to bother with a production-type server environment right now, just run python bin/manage.py runserver 5678 and then open your browser and navigate to http://localhost:5678.