PyTorch Lightning - a lightweight PyTorch wrapper for high-performance AI research. Think of it as a framework for organizing your PyTorch code.
Hydra - a framework for elegantly configuring complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line.
DVC - a tool for handling large datasets and machine learning models in a version-controlled workflow.
TensorBoard - a tool that provides visualization and debugging capabilities for TensorFlow and PyTorch experiments. It’s a popular choice for monitoring machine learning training processes in real time.
AWS (EC2, S3, Lambda, ECR) - Amazon Web Services. Elastic Compute Cloud (EC2) provides scalable virtual computing resources in the cloud; S3 is object storage, Lambda runs serverless functions, and ECR is a container image registry.
Docker - A platform for creating, deploying, and managing lightweight, portable, and scalable containers.
Gradio - A Python library for building simple, interactive web interfaces for machine learning models and APIs.
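The composition-and-override idea described for Hydra above can be illustrated in plain Python. This is only a sketch of the concept, not Hydra's API: real Hydra composes YAML files through a defaults list and OmegaConf. The helper names (`deep_merge`, `apply_override`) and the config keys (`accelerator`, `max_epochs`, `batch_size`) are hypothetical and not taken from this repository.

```python
from copy import deepcopy


def deep_merge(base: dict, update: dict) -> dict:
    """Return a new dict in which nested keys from `update` override `base`."""
    merged = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def apply_override(config: dict, override: str) -> dict:
    """Apply one `dotted.key=value` override, as it might be typed on a command line."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    result = deepcopy(config)
    node = result
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = raw_value  # values stay strings in this sketch
    return result


# Hypothetical config groups, loosely modelled on the folder names under configs/.
trainer_default = {"trainer": {"accelerator": "cpu", "max_epochs": 10}}
trainer_gpu = {"trainer": {"accelerator": "gpu"}}
data_dogs = {"data": {"name": "dogs", "batch_size": 32}}

# Compose the groups in order, then apply a command-line style override.
config = deep_merge(deep_merge(trainer_default, trainer_gpu), data_dogs)
config = apply_override(config, "trainer.max_epochs=5")
print(config)
```

Later groups win over earlier ones, and the override is applied last, which mirrors the order in which a composed configuration is usually resolved.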
├── .devcontainer <- vscode
│ └── devcontainer.json
├── .github <- Github Actions workflows
│ ├── ci-eval.yml
│ ├── ci-codecov.yml
│ ├── ci-test.yml
│ ├── ci-train.yml
│ └── ci-deploy.yml
├── assets
│ ├── hparams-artifacts.png
│ ├── MambaOutHparamSearch.png
│ ├── MambaOutHparamsTestScores.png
│ ├── OptunaHparams.png
│ ├── runner-ec2-training.png
│ └── self-hosted-runners.png
├── configs <- Hydra configs
│ ├── callbacks <- callback config
│ │ ├── default.yaml
│ │ ├── early_stopping.yaml
│ │ ├── learning_rate_monitor.yaml
│ │ ├── model_checkpoint.yaml
│ │ ├── model_summary.yaml
│ │ ├── none.yaml
│ │ └── rich_progress_bar.yaml
│ ├── data <- data config
│ │ └── dogs.yaml
│ ├── debug <- debug config
│ │ ├── default.yaml
│ │ ├── fdr.yaml
│ │ ├── limit.yaml
│ │ ├── overfit.yaml
│ │ └── profiler.yaml
│ ├── experiment <- experiment config
│ │ └── finetune.yaml
│ ├── extras <- extras config
│ │ └── default.yaml
│ ├── hparams_search <- hparams config
│ │ └── mnist_optuna.yaml
│ ├── hydra <- hydra config
│ │ └── default.yaml
│ ├── logger <- logger config
│ │ ├── aim.yaml
│ │ ├── comet.yaml
│ │ ├── csv.yaml
│ │ ├── default.yaml
│ │ ├── many_loggers.yaml
│ │ ├── mlflow.yaml
│ │ ├── neptune.yaml
│ │ ├── tensorboard.yaml
│ │ └── wandb.yaml
│ ├── model <- model config
│ │ ├── mamba.yaml
│ │ ├── mnist.yaml
│ │ └── timm_classify.yaml
│ ├── paths <- path config
│ │ └── default.yaml
│ ├── trainer <- trainer config
│ │ ├── cpu.yaml
│ │ ├── ddp_sim.yaml
│ │ ├── ddp.yaml
│ │ ├── default.yaml
│ │ ├── gpu.yaml
│ │ └── mps.yaml
│ ├──
│ ├── eval.yaml <- evaluation config
│ └── train.yaml <- training config
├── data <- DATASET
│ ├── dogs_dataset
│ │ ├── test
│ │ ├── train
│ │ └── validation
│ └── dogs_dataset.dvc
├── dvc.lock
├── dvc.yaml <- DVC
├── environment.yaml <- conda export `conda env export|grep -v "^prefix: " > environment.yml`
├── logs <- Logs generated by hydra and lightning loggers
├── multirun <- Logs for Hparams Search
├── outputs <- Logs for eval/fastrun
├── notebooks <- Jupyter notebooks
├── reports
│ ├── lr-Adam.png
│ ├── test-report.png
│ ├── train-report.png
│ └── val-report.png
├── samples <- inference
│ ├── checkpoints
│ │ └── epoch_019.ckpt
│ ├── inputs
│ │ ├── guess1.jpg
│ │ └── guess2.jpg
│ └── outputs
├── scripts <- Shell scripts
├── src
│ ├── datamodules
│ │ └──
│ ├── models
│ │ └──
│ ├── utils
│ │ ├──
│ │ ├──
│ │ ├──
│ │ ├──
│ │ ├──
│ │ └──
│ ├──
│ ├──
│ ├──
│ └──
├── gradio <- Gradio app for Hugging Face Spaces
│ ├── .gradio/worflows
│ │ └── update-space.yaml
│ ├── examples <- examples
│ │ ├── guess1.jpg
│ │ └── guess2.jpg
│ ├──
│ ├──
│ ├── dvc.lock
│ ├──
│ └── requirements.txt
├── tests <- Pytest
│ ├── datamodules
│ │ └──
│ ├── models
│ │ └──
│ ├──
│ └──
├── Makefile
├── requirements.txt <- requirements+GPU
├── requirements.txt.cpu <- requirements+CPU
├── Dockerfile <- Dockerfile+GPU
├── Dockerfile.cpu <- Dockerfile+CPU
├── compose.yml <- docker-compose
├── pyproject.toml
├── ruff.toml <- ruff check --fix
├── pytest.ini <- pytest config
├── .env
├── coverage.xml
79 directories, 107 files
Hydra creates a new output directory for every executed run.
Default logging structure:
├── logs
│ ├── task_name
│ │ ├── runs # Logs generated by single runs
│ │ │ ├── YYYY-MM-DD_HH-MM-SS # Datetime of the run
│ │ │ │ ├── .hydra # Hydra logs
│ │ │ │ ├── csv # Csv logs
│ │ │ │ ├── wandb # Weights&Biases logs
│ │ │ │ ├── checkpoints # Training checkpoints
│ │ │ │ └── ... # Any other thing saved during training
│ │ │ └── ...
│ │ │
│ │ └── multiruns # Logs generated by multiruns
│ │ ├── YYYY-MM-DD_HH-MM-SS # Datetime of the multirun
│ │ │ ├── 1 # Multirun job number
│ │ │ ├── 2
│ │ │ └── ...
│ │ └── ...
│ │
│ └── debugs # Logs generated when debugging config is attached
│ └── ...
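The layout above can be sketched as a small path-building helper. This is an illustration of the naming scheme only, not how Hydra builds the directory internally; the function name `run_dir` and the task name `train` are assumptions made for the example.

```python
from datetime import datetime
from pathlib import Path


def run_dir(log_root: str, task_name: str, started: datetime, multirun: bool = False) -> Path:
    """Build logs/<task_name>/<runs|multiruns>/<YYYY-MM-DD_HH-MM-SS>."""
    kind = "multiruns" if multirun else "runs"
    stamp = started.strftime("%Y-%m-%d_%H-%M-%S")
    return Path(log_root) / task_name / kind / stamp


started = datetime(2024, 11, 10, 20, 22, 17)

# A single run gets one timestamped directory.
single = run_dir("logs", "train", started)

# A multirun gets one timestamped directory with a numbered subdirectory per job.
job = run_dir("logs", "train", started, multirun=True) / "1"

print(single.as_posix())
print(job.as_posix())
```

Because the timestamp is part of the path, two runs started at different times do not overwrite each other's checkpoints or logs.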
make trash
make clean
training simple model
make fastrun
make sshow
- Train DataLoader
- Val DataLoader
make test
============================================================================== test session starts ==============================================================================
platform linux -- Python 3.11.9, pytest-8.3.3, pluggy-1.5.0
rootdir: /home/muthu/GitHub/DogBreedsClassifier
configfile: pytest.ini
plugins: cov-5.0.0, anyio-3.7.1, time-machine-2.15.0, hydra-core-1.3.2
collected 6 items
tests/datamodules/ ... [ 50%]
tests/models/ . [ 66%]
tests/ . [ 83%]
tests/ . [100%]
=========================================================================================== warnings summary ============================================================================================
/home/muthu/miniconda3/envs/venv/lib/python3.11/site-packages/jupyter_client/ DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
given by the platformdirs library. To remove this warning and
see the appropriate new directories, set the environment variable
`JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
The use of platformdirs will be the default in `jupyter_core` v6
from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write
/home/muthu/miniconda3/envs/venv/lib/python3.11/site-packages/lightning/fabric/ `precision=16` is supported for historical reasons but its usage is discouraged. Please set your precision to 16-mixed instead!
/home/muthu/miniconda3/envs/venv/lib/python3.11/site-packages/lightning/pytorch/loops/ The number of training batches (8) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
/home/muthu/miniconda3/envs/venv/lib/python3.11/site-packages/torch/optim/ UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at
-- Docs:
==================================================================================== 6 passed, 5 warnings in 33.11s =====================================================================================
Train Matrix | Val Matrix | Test Matrix |
--output_folder # where to save
make trash
make clean
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:22 - Starting load_image
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished load_image
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:22 - Starting infer
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished infer
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:22 - Starting save_prediction_image
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished save_prediction_image
<class 'omegaconf.listconfig.ListConfig'> "conv_ratio": 1.2
"depths": [3, 3, 15, 3]
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:22 - Starting load_image
"dims": [6, 12, 24, 36]
"head_fn": default
"in_chans": 3
"lr": 0.001
"min_lr": 1e-06
"model_name": Mamba
"num_classes": 10
"pretrained": False
"scheduler_factor": 0.1
"scheduler_patience": 5
"trainable": False
"weight_decay": 1e-05
Processed guess2.jpg: Poodle (0.89)
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished load_image
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:22 - Starting infer
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished infer
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:22 - Starting save_prediction_image
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished save_prediction_image
Processed guess1.jpg: Boxer (0.96)
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished main
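The paired "Starting ..." and "Finished ..." lines in the log above are the kind of output a logging decorator produces. Below is a minimal sketch of such a decorator using the standard `logging` module. It is an assumption about the general pattern, not the project's actual `utils.logging_utils` code; the names `log_calls` and `infer` and the score values are illustrative.

```python
import functools
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(module)s:%(funcName)s:%(lineno)d - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
logger = logging.getLogger(__name__)


def log_calls(func):
    """Log a message before and after each call to `func`."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        logger.info("Starting %s", func.__name__)
        result = func(*args, **kwargs)
        # Only reached when the call succeeds; an exception skips this line.
        logger.info("Finished %s", func.__name__)
        return result

    return wrapper


@log_calls
def infer(scores):
    """Return the (label, score) pair with the highest score."""
    return max(scores.items(), key=lambda item: item[1])


# Made-up scores for illustration only.
label, score = infer({"Boxer": 0.96, "Poodle": 0.04})
print(f"{label} ({score:.2f})")
```

Since the log calls are made inside `wrapper`, the function-name field of each record reads `wrapper`, which is consistent with the `wrapper:22` and `wrapper:25` entries in the log above.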