The ModelTrainer endpoint is a continuous training pipeline. I have added a high-configuration Paperspace GPU instance as its runner. Because the training endpoint is expensive, we can't keep it live all the time, so the instance is always kept in the off state. Training starts only when the workflow is triggered manually.
- GPU access on Paperspace
- AWS S3 bucket for the model registry and data
- Update and upgrade the machine
- Install the Paperspace CLI
- Register the GPU machine as a runner
- Add secrets
- Done
export ACCESS_KEY_ID=<access-key>
export AWS_SECRET_KEY=<secret-key>
export AWS_REGION=<aws-region>
export DATABASE_USERNAME=<username>
export DATABASE_PASSWORD=<password>
export API_KEY=<api-key>
export MACHINE_ID=<machine-id>
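
Before triggering a training run, it can help to confirm that every variable above is actually set on the runner. The following is a minimal sketch, not part of the project: the helper name `missing_env_vars` is hypothetical, and only the variable names come from the export list above.

```python
import os

# Variable names copied from the export list above.
REQUIRED_VARS = (
    "ACCESS_KEY_ID",
    "AWS_SECRET_KEY",
    "AWS_REGION",
    "DATABASE_USERNAME",
    "DATABASE_PASSWORD",
    "API_KEY",
    "MACHINE_ID",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to the current process environment; passing a
    plain dict makes the check easy to try out without exporting anything.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running the script in the same shell where the exports were made prints either the missing names or a confirmation. Empty values are treated as missing, on the assumption that a blank secret is a misconfiguration.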
If the build fails with `fatal error: Python.h: No such file or directory`, install the Python development headers:
sudo apt install libpython3.8-dev
# AWS S3
S3 storage: $0.025 per GB per month (first 50 TB)
S3 PUT: $0.005 per 1,000 requests
S3 GET: $0.0004 per 1,000 requests
# Paperspace
GPU machine:
RAM: 30 GB
CPUs: 8
Storage: 50 GB
GPU memory: 8 GB
Price: $0.462/hour
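
To see what these rates mean in practice, the arithmetic can be sketched as a small estimator. This is an illustration only: the function name `estimate_monthly_cost` and the example usage figures are assumptions, and the prices are simply the ones listed above, which may not match current provider pricing.

```python
# Prices copied from the figures listed above (USD).
# Assumption: they are still current; check the providers' pricing pages.
GPU_PRICE_PER_HOUR = 0.462
S3_STORAGE_PER_GB_MONTH = 0.025
S3_PUT_PER_1000 = 0.005
S3_GET_PER_1000 = 0.0004


def estimate_monthly_cost(gpu_hours, stored_gb, put_requests=0, get_requests=0):
    """Estimate one month's cost from GPU hours and S3 usage.

    Returns a dict with the GPU, storage, and request components
    plus their total, all in USD.
    """
    gpu = gpu_hours * GPU_PRICE_PER_HOUR
    storage = stored_gb * S3_STORAGE_PER_GB_MONTH
    requests = (
        (put_requests / 1000) * S3_PUT_PER_1000
        + (get_requests / 1000) * S3_GET_PER_1000
    )
    return {
        "gpu": gpu,
        "storage": storage,
        "requests": requests,
        "total": gpu + storage + requests,
    }


if __name__ == "__main__":
    # Hypothetical month: 10 GPU hours, 20 GB stored, 1,000 PUTs, 10,000 GETs.
    on_demand = estimate_monthly_cost(10, 20, put_requests=1000, get_requests=10000)
    # Same machine left running for a 30-day month (720 hours), no S3 usage counted.
    always_on = estimate_monthly_cost(720, 0)
    print(f"On-demand total: ${on_demand['total']:.2f}")
    print(f"Always-on GPU:   ${always_on['total']:.2f}")
```

At the listed rates, the hypothetical on-demand month comes to roughly $5.13, while leaving the GPU machine on for 720 hours costs about $332.64. That gap is the reason the instance is kept off between manually triggered runs.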