---
description: >-
  Lists the steps to create an adapter / ETL pipeline to ingest data from the
  adopter's data sources into cQube V 5.0
---

# Step-wise Adapter Creation Process

## Overview

A cQube adapter is an ETL (Extract, Transform and Load) pipeline that moves data from the adopter's database into multiple CSV files, applying the required transformations along the way. An adapter is needed because cQube expects data in a specific format; the adapter's output CSVs can be ingested directly into cQube to generate the programs, reports and indicators.

## Architecture

Flow of data from Adopter / State DB into CSV Files as per cQube Schema via adapter


## Working of an Adapter

  1. The adapter makes a connection with the state / source database.
  2. It then fetches the data from the database tables.
  3. It performs the transformations needed to generate the Dimension and Event (Fact) CSV files. The desired format and list of output columns in the dimension and event files for each program can be found here.
  4. Output dimension CSV files are stored inside the AWS S3 bucket / Minio / Azure storage in the input-bucket/dimensions/<dimension_name>.data.csv format.

Dimension Files in Minio

  5. Output event CSV files are stored inside the AWS S3 bucket / Minio / Azure storage in the input-bucket/combined_input/<program_name>/<event_name>.data.csv format.

Event Files in Minio

  6. The adapter ETL pipeline runs at a specific frequency so that the output CSV data is refreshed and the latest data is ingested into the system. To schedule your adapter, you can use Apache Airflow; learn how to use Apache Airflow here. A minimal Python sketch of steps 1-5 follows this list.
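The sketch below walks through steps 1-5 for a single dimension file. It assumes a PostgreSQL source database, an S3-compatible store (AWS S3 or Minio) with a bucket named input-bucket, and an illustrative district dimension; the table name, column names, credentials and endpoint are placeholders rather than part of the cQube specification. Event files follow the same pattern, with the object key changed to the combined_input/<program_name>/<event_name>.data.csv format.

```python
# A minimal adapter sketch, not the official cQube implementation.
# All table names, column names, credentials and endpoints below are
# placeholders; replace them with values from your own source system.
import csv
import io

import boto3      # S3 API client; also works with Minio (S3-compatible)
import psycopg2   # assumed PostgreSQL driver; swap for your database

# 1. Connect to the state / source database.
conn = psycopg2.connect(host="state-db-host", dbname="state_db",
                        user="etl_user", password="secret")

# 2. Fetch the data from a source table (hypothetical table).
with conn.cursor() as cur:
    cur.execute("SELECT district_id, district_name FROM districts")
    rows = cur.fetchall()
conn.close()

# 3. Transform the rows into the dimension CSV layout expected by cQube
#    (column names here are illustrative; use the official schema).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["district_id", "district_name"])
writer.writerows(rows)

# 4./5. Upload the CSV to the input bucket in the required path format:
#       input-bucket/dimensions/<dimension_name>.data.csv
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio:9000",        # omit for AWS S3
    aws_access_key_id="minio_access_key",
    aws_secret_access_key="minio_secret_key",
)
s3.put_object(Bucket="input-bucket",
              Key="dimensions/district.data.csv",
              Body=buf.getvalue().encode("utf-8"))
```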

## Technology

A cQube adopter can use any system, programming language or ETL tool to develop the cQube adapter.

For example:

  1. Python scripts can be used to extract data from the source / state database, transform it, and finally export the CSV files to the AWS S3 bucket or whichever cloud storage is being used. Apache Airflow can be used to schedule the Python scripts (a minimal DAG sketch follows this list).
  2. Alternatively, Apache NiFi can be used to create the end-to-end ETL pipeline.
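As a rough illustration of the Airflow option, the sketch below wraps the adapter logic in a daily DAG, assuming Airflow 2.4 or later; the DAG id, schedule and run_adapter() function are hypothetical and should be adapted to your own pipeline.

```python
# A minimal scheduling sketch (Airflow 2.4+), not the official cQube setup.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_adapter():
    # Placeholder: call your extract / transform / upload logic here,
    # e.g. the steps shown in the Python sketch above.
    ...


with DAG(
    dag_id="cqube_adapter_pipeline",   # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # how often the output CSVs are refreshed
    catchup=False,
) as dag:
    PythonOperator(task_id="run_adapter", python_callable=run_adapter)
```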

The only requirement is that the adapter-generated CSV files have the same column names and data format as mentioned here.
