This project generates dummy data for seeding machine learning pipelines before they are scaled up. It is designed to be deployed on AWS using Fargate.
The project consists of:
- An API and schemas built with FastAPI
- Python scripts for creating dummy data. Supported data types:
  - Numerical Range
  - Float Range
  - Date Range
  - Categorical Random Choice
  - Random Text Generation
- Unit and API tests using pytest
- A Dockerfile
- A CI/CD script for CircleCI, which requires the following variables to be defined:
  - AWS_ACCESS_KEY_ID
  - AWS_ACCOUNT_ID
  - AWS_DEFAULT_REGION
  - AWS_SECRET_ACCESS_KEY
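
Before triggering a deployment, it can help to confirm that all four variables are present. The following is a minimal sketch, not part of the project itself: the helper name `missing_variables` is hypothetical, and it assumes the variables are exposed as ordinary environment variables, which is how CircleCI project settings and contexts provide them to a job.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_ACCOUNT_ID",
    "AWS_DEFAULT_REGION",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_variables(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_variables()` with no arguments inspects the current environment. An empty list means every required variable is set.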

Future improvements:
- Specific text generation
- Record limit increase above 10,000
- DB audit trails
- Latency improvements
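
To illustrate the five supported data types listed above, here is a minimal sketch using only the Python standard library. It is not the project's actual implementation: every function name, field name, and parameter below is hypothetical, and the `MAX_RECORDS` cap is an assumption inferred from the 10,000-record limit mentioned in this README.

```python
import random
import string
from datetime import date, timedelta

# Assumed cap, inferred from the 10,000-record limit mentioned in the README.
MAX_RECORDS = 10_000


def numerical_range(rng, low, high):
    """Numerical Range: random integer in [low, high]."""
    return rng.randint(low, high)


def float_range(rng, low, high):
    """Float Range: random float between low and high."""
    return rng.uniform(low, high)


def date_range(rng, start, end):
    """Date Range: random date between start and end, inclusive."""
    span_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_days))


def categorical_choice(rng, choices):
    """Categorical Random Choice: one item picked uniformly at random."""
    return rng.choice(choices)


def random_text(rng, length):
    """Random Text Generation: string of random lowercase letters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def generate_records(n, seed=None):
    """Build n dummy records; the field names are purely illustrative."""
    if not 0 <= n <= MAX_RECORDS:
        raise ValueError(f"n must be between 0 and {MAX_RECORDS}")
    rng = random.Random(seed)  # a seeded generator keeps output reproducible
    return [
        {
            "age": numerical_range(rng, 18, 90),
            "score": float_range(rng, 0.0, 1.0),
            "signup_date": date_range(rng, date(2020, 1, 1), date(2020, 12, 31)),
            "plan": categorical_choice(rng, ["free", "pro", "enterprise"]),
            "note": random_text(rng, 12),
        }
        for _ in range(n)
    ]
```

Passing a fixed `seed` makes the generated records repeatable, which is convenient when the dummy data feeds tests or pipeline dry runs.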