This project is a Python-based ETL (Extract, Transform, Load) tool designed to fetch data from a REDCap server and store it in a SQLite database. The ETL process involves the following steps:
- Extract: Data is extracted from the REDCap server via the REDCap API or, if it has already been extracted, read from the previously extracted files (a minimal sketch of the API call is shown below).
- Transform: The extracted data is transformed into a format suitable for insertion into the SQLite database and stored as SQL files.
- Load: The transformed data (one SQL file per patient) is loaded into the SQLite database.
This tool is particularly useful for users needing to transfer large amounts of data from REDCap to SQLite in a reliable and efficient manner. You can customize the data model and mapping tables to suit your specific requirements. The tool can be easily tested without having access to a REDCap project token by using the provided example data.
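For illustration, the Extract step boils down to a call against the REDCap record-export API. The following is a minimal sketch assuming the standard "Export Records" endpoint; the URL, token placeholder, and output path are illustrative only, and the project's own extraction code may differ.

```python
import requests  # HTTP client; assumed to be available in this project's environment

# Minimal sketch of the Extract step using the standard REDCap
# "Export Records" API call. All names below are placeholders.
REDCAP_URL = "https://redcap.example.org/api/"
payload = {
    "token": "YOUR_REDCAP_API_TOKEN",
    "content": "record",
    "format": "csv",
    "type": "flat",
}

response = requests.post(REDCAP_URL, data=payload, timeout=60)
response.raise_for_status()

# Store the raw export so that later runs can read it from disk
# instead of calling the API again.
with open("records_export.csv", "w", encoding="utf-8") as f:
    f.write(response.text)
```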
## Getting Started with Docker

- Ensure Docker and Docker Compose are installed on your system. You can verify this by running `docker --version` and `docker-compose --version` in your terminal.
- Clone this repository to your local machine.
- Navigate to the root directory of the cloned repository.
- Define the Data Model and the Mapping to Target Data Model as described in the "Data Model Definition" and "Mapping to Target Data Model" sections below.
- Create a `config_docker.json` file in the root directory. You can use the `config_docker_example.json` file as a template. Define the parameters as described in the "Config File Setup" section below.
- Run the docker-compose file by running `docker-compose up`.
- The ETL process will start automatically and repeat every 6 hours.
- To stop the ETL process, run `docker-compose down`.
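How the 6-hour repetition is implemented inside the container is not detailed here; the sketch below shows one straightforward way such a loop could look. It is an illustration only, not necessarily how the provided image works.

```python
import subprocess
import time

# Sketch of a "run every 6 hours" loop. The actual container may use a
# different scheduling mechanism; this is only an illustration.
SIX_HOURS = 6 * 60 * 60

while True:
    subprocess.run(["python", "workflow.py"], check=False)  # one full ETL run
    time.sleep(SIX_HOURS)
```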
## Getting Started without Docker

- Ensure Python 3.x is installed on your system. You can verify this by running `python --version` in your terminal.
- Clone this repository to your local machine.
- Navigate to the root directory of the cloned repository.
- It is recommended to create a virtual environment to isolate the project dependencies. You can do this by running `python -m venv venv`.
- Activate the virtual environment by running `source venv/bin/activate` (Linux/Mac) or `venv\Scripts\activate` (Windows).
- Install the required Python packages by running `pip install -r requirements.txt`.
- Define the Data Model and the Mapping to Target Data Model as described in the "Data Model Definition" and "Mapping to Target Data Model" sections below.
- Create a `config.json` file in the root directory. You can use the `config_example.json` file as a template. Define the parameters as described in the "Config File Setup" section below.
- Run the ETL process by running `python workflow.py`.
## Data Model Definition

- Define your data model in a separate SQL file. This file should contain the tables and fields that will be present in your SQLite database.
- Set the path to this file in the `config.json` file under the `db_schema` parameter.
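For illustration, the file referenced by `db_schema` is plain SQL that can be executed against a fresh SQLite database. Below is a minimal sketch with placeholder file names; `workflow.py` may handle database creation differently.

```python
import sqlite3

# Apply the data-model file to a new SQLite database.
# "data_model.sql" and "project.db" are placeholder names.
with open("data_model.sql", encoding="utf-8") as f:
    schema_sql = f.read()

conn = sqlite3.connect("project.db")
conn.executescript(schema_sql)  # runs every CREATE TABLE statement in the schema
conn.commit()
conn.close()
```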
## Mapping to Target Data Model

- Write mapping files in your mapping folder (defined in the `config.json` file) to map the extracted data to the target data model. These CSV files are used to transform the extracted REDCap data into the format of your SQLite data model (a minimal sketch of this transformation is shown below).
- The mapping files should have the following columns:
  - `Table`: The name of the table in the target data model.
  - `Attribute`: The name of the attribute in the target data model.
  - `NotNull`: NOT NULL if the attribute is required, empty otherwise.
  - `field_name`: The name of the field in the REDCap data.
- For further information, see the `Mapping.md` file and the example mapping files in the `mappingtables` folder of the ClassicDB example.
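For illustration, this is roughly how a mapping table with these four columns could be applied to a single extracted REDCap record. The file name and record contents are made up for the example, and the actual transformation in this project may differ.

```python
import csv

# Apply a mapping table (columns: Table, Attribute, NotNull, field_name)
# to one extracted REDCap record. All names below are illustrative.
redcap_record = {"record_id": "001", "pat_birthdate": "1980-01-01"}

rows_by_table = {}
with open("mappingtables/patient.csv", newline="", encoding="utf-8") as f:
    for mapping in csv.DictReader(f):
        value = redcap_record.get(mapping["field_name"])
        if value is None and mapping["NotNull"].strip() == "NOT NULL":
            raise ValueError(f"Required field {mapping['field_name']} is missing")
        rows_by_table.setdefault(mapping["Table"], {})[mapping["Attribute"]] = value

# rows_by_table now holds, per target table, the attribute/value pairs
# that can be turned into INSERT statements for the per-patient SQL file.
print(rows_by_table)
```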
## Config File Setup

- Create a `config.json` file in the root directory. This file stores the configuration parameters for the ETL process. You can use the `config_example.json` file as a template.
- Define the following parameters in the `config.json` file (a quick check for them is sketched after this list):
  - `repository_root`: The path to the root directory of the cloned repository.
  - `extract_redcap`: True if data should be extracted from REDCap, False otherwise.
  - `redcap_api_address`: The URL of the REDCap API.
  - `redcap_project`: The name of the REDCap project.
  - `redcap_api_token`: The API token for accessing the REDCap project.
  - `extraction_path`: The path where the extracted data should be stored.
  - `data_path`: The path where the data files are stored.
  - `mapping_path`: The path to the mapping tables.
  - `db_creation`: True if the database should be created, False otherwise.
  - `db_wipe`: True if the database should be wiped before loading data, False otherwise.
  - `db_path`: The path where the SQLite database should be stored.
  - `db_schema`: The path to the data model (SQL schema) file.
  - `db_load_data`: True if data should be loaded into the database, False otherwise.
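As a quick sanity check after editing the file, the parameters above can be verified with a few lines of Python. This is only a sketch and not part of the project; `workflow.py` may validate its configuration differently.

```python
import json

# Check that config.json contains the parameters listed above.
EXPECTED_KEYS = {
    "repository_root", "extract_redcap", "redcap_api_address",
    "redcap_project", "redcap_api_token", "extraction_path",
    "data_path", "mapping_path", "db_creation", "db_wipe",
    "db_path", "db_schema", "db_load_data",
}

with open("config.json", encoding="utf-8") as f:
    config = json.load(f)

missing = EXPECTED_KEYS - config.keys()
if missing:
    raise KeyError(f"config.json is missing parameters: {sorted(missing)}")
```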
## Example Data

- The `ClassicDB_example` folder contains example data, a data model (see figure), and corresponding mapping tables (one possible mapping of patient data) that can be used to test the ETL process without access to a REDCap project.
- The example data is stored in a CSV file and can be used to simulate the extraction process.
- The example data can also be used to test the transformation and loading processes; a sketch of a configuration for such a local test run is shown below.
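As an illustration, a configuration pointing at the example folder might look roughly like the following. Every path, file name, and value type here is an assumption; check `config_example.json` and the actual contents of `ClassicDB_example` for the real ones.

```python
import json

# Illustrative configuration for a local test run against the ClassicDB
# example, with extraction from REDCap disabled. All paths below are
# placeholders and should be checked against the repository layout.
example_config = {
    "repository_root": ".",
    "extract_redcap": False,  # use the local example files instead of the API
    "redcap_api_address": "",
    "redcap_project": "ClassicDB_example",
    "redcap_api_token": "",
    "extraction_path": "ClassicDB_example/",
    "data_path": "ClassicDB_example/",
    "mapping_path": "ClassicDB_example/mappingtables/",
    "db_creation": True,
    "db_wipe": True,
    "db_path": "classicdb_test.db",
    "db_schema": "ClassicDB_example/data_model.sql",
    "db_load_data": True,
}

with open("config.json", "w", encoding="utf-8") as f:
    json.dump(example_config, f, indent=2)
```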
## Quick Start with Docker

- Copy and rename the `config_docker_example.json` file to `config_docker.json`.

  ```bash
  cp config_docker_example.json config_docker.json
  ```

- Run the docker-compose file.

  ```bash
  docker-compose up -d
  ```

- The ETL process will start automatically and repeat every 6 hours.

- (optional) To stop the ETL process, run the following command.

  ```bash
  docker-compose down
  ```
## Quick Start without Docker

- Rename the `config_example.json` file to `config.json`.

  ```bash
  mv config_example.json config.json
  ```

- Run the ETL process.

  ```bash
  python workflow.py
  ```
## License

This project is licensed under the MIT License. This means you are free to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software, on the condition that you include the following:
- A copy of the original MIT License in any redistributed copies or substantial portions of the software.
- A clear acknowledgement of the original source of the software.
For more details, please see the LICENSE file in the project root.
Contributions, issues, and feature requests are welcome!
Enjoy transferring data from REDCap to SQLite efficiently and reliably!