PyTorch Lightning - a lightweight PyTorch wrapper for high-performance AI research. Think of it as a framework for organizing your PyTorch code.
Hydra - a framework for elegantly configuring complex applications. Its key feature is the ability to compose a hierarchical configuration dynamically and override it through config files and the command line (see the sketch after this list).
DVC - a tool for versioning large datasets and machine learning models alongside code in a Git workflow.
TensorBoard - a visualization and debugging tool for TensorFlow and PyTorch experiments; a popular choice for monitoring training runs in real time.
AWS | EC2 | S3 | Lambda | ECR - the AWS services used in this project: Elastic Compute Cloud (EC2) provides scalable virtual machines, S3 provides object storage, Lambda runs serverless functions, and Elastic Container Registry (ECR) hosts Docker images.
Docker - A platform for creating, deploying, and managing lightweight, portable, and scalable containers.
Gradio - A Python library for building simple, interactive web interfaces for machine learning models and APIs.
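A minimal sketch of how these pieces typically fit together in this template; the file location, config names, and exact calls are illustrative, not necessarily the repo's exact code:

```python
# src/train.py (illustrative sketch, not the repo's exact code)
import hydra
import lightning as L
from hydra.utils import instantiate
from omegaconf import DictConfig


@hydra.main(version_base="1.3", config_path="../configs", config_name="train.yaml")
def main(cfg: DictConfig) -> None:
    # Hydra composes train.yaml from the config groups below (data, model, trainer, ...)
    datamodule: L.LightningDataModule = instantiate(cfg.data)
    model: L.LightningModule = instantiate(cfg.model)
    trainer: L.Trainer = instantiate(cfg.trainer)
    trainer.fit(model, datamodule=datamodule)


if __name__ == "__main__":
    main()
```

Any config group can then be swapped or overridden from the command line, e.g. `python src/train.py model=mamba trainer=gpu trainer.max_epochs=10`.

Project structure: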
.
├── .devcontainer <- VS Code dev container
│ └── devcontainer.json
|
├── .github <- GitHub Actions workflows
│ ├── ci-eval.yml
│ ├── ci-codecov.yml
│ ├── ci-test.yml
│ ├── ci-train.yml
│ └── ci-deploy.yml
├── assets
│ ├── hparams-artifacts.png
│ ├── MambaOutHparamSearch.png
│ ├── MambaOutHparamsTestScores.png
│ ├── OptunaHparams.png
│ ├── runner-ec2-training.png
│ └── self-hosted-runners.png
|
├── configs <- Hydra configs
│ ├── callbacks <- callback config
│ │ ├── default.yaml
│ │ ├── early_stopping.yaml
│ │ ├── learning_rate_monitor.yaml
│ │ ├── model_checkpoint.yaml
│ │ ├── model_summary.yaml
│ │ ├── none.yaml
│ │ └── rich_progress_bar.yaml
│ ├── data <- data config
│ │ └── dogs.yaml
│ ├── debug <- debug config
│ │ ├── default.yaml
│ │ ├── fdr.yaml
│ │ ├── limit.yaml
│ │ ├── overfit.yaml
│ │ └── profiler.yaml
│ ├── experiment <- experiment config
│ │ └── finetune.yaml
│ ├── extras <- extras config
│ │ └── default.yaml
│ ├── hparams_search <- hparams config
│ │ └── mnist_optuna.yaml
│ ├── hydra <- hydra config
│ │ └── default.yaml
│ ├── logger <- logger config
│ │ ├── aim.yaml
│ │ ├── comet.yaml
│ │ ├── csv.yaml
│ │ ├── default.yaml
│ │ ├── many_loggers.yaml
│ │ ├── mlflow.yaml
│ │ ├── neptune.yaml
│ │ ├── tensorboard.yaml
│ │ └── wandb.yaml
│ ├── model <- model config
│ │ ├── mamba.yaml
│ │ ├── mnist.yaml
│ │ └── timm_classify.yaml
│ ├── paths <- path config
│ │ └── default.yaml
│ ├── trainer <- trainer config
│ │ ├── cpu.yaml
│ │ ├── ddp_sim.yaml
│ │ ├── ddp.yaml
│ │ ├── default.yaml
│ │ ├── gpu.yaml
│ │ └── mps.yaml
│ ├── __init__.py
│ ├── eval.yaml <- evaluation config
│ └── train.yaml <- training config
|
├── data <- DATASET
│ ├── dogs_dataset
│ │ ├── test
│ │ ├── train
│ │ └── validation
│ └── dogs_dataset.dvc
├── dvc.lock
├── dvc.yaml <- DVC
├── environment.yaml <- conda export: `conda env export | grep -v "^prefix: " > environment.yaml`
|
|
├── LICENSE
├── logs <- Logs generated by hydra and lightning loggers
├── multirun <- Logs for Hparams Search
├── outputs <- Logs for eval/fastrun
|
├── notebooks <- Jupyter notebooks
|
├── reports
│ ├── lr-Adam.png
│ ├── test-report.png
│ ├── train-report.png
│ └── val-report.png
|
|
├── samples <- inference
│ ├── checkpoints
│ │ └── epoch_019.ckpt
│ ├── inputs
│ │ ├── guess1.jpg
│ │ └── guess2.jpg
│ └── outputs
|
├── scripts <- Shell scripts
├── setup.py
|
├── src
│ ├── datamodules
│ │ └── dogs_datamodule.py
│ ├── models
│ │ └── dogs_classifier.py
│ ├── utils
│ │ ├── __init__.py
│ │ ├── instantiators.py
│ │ ├── logging_utils.py
│ │ ├── pylogger.py
│ │ ├── rich_utils.py
│ │ └── utils.py
│ ├── __init__.py
│ ├── inference.py
│ ├── train.py
│ └── eval.py
|
├── gradio <- Gradio app for a Hugging Face Space
│ ├── .gradio/workflows
│ │ └── update-space.yaml
│ ├── examples <- examples
│ │ ├── guess1.jpg
│ │ └── guess2.jpg
│ ├── app.py
│ ├── best_model.pt
│ ├── dvc.lock
│ ├── README.md
│ └── requirements.txt
|
├── tests <- Pytest
│ ├── datamodules
│ │ └── test_dogs_datamodule.py
│ ├── models
│ │ └── test_dogs_classifier.py
│ ├── test_eval.py
│ └── test_train.py
│
├── Makefile
├── requirements.txt <- requirements+GPU
├── requirements.txt.cpu <- requirements+CPU
|
├── Dockerfile <- Dockerfile+GPU
├── Dockerfile.cpu <- Dockerfile+CPU
├── compose.yml <- docker-compose
│
├── pyproject.toml
├── ruff.toml <- ruff check --fix
├── pytest.ini <- pytest config
|
├── .env
├── coverage.xml
|
└── README.md
79 directories, 107 files
Hydra creates a new output directory for every executed run.
Default logging structure:
├── logs
│ ├── task_name
│ │ ├── runs # Logs generated by single runs
│ │ │ ├── YYYY-MM-DD_HH-MM-SS # Datetime of the run
│ │ │ │ ├── .hydra # Hydra logs
│ │ │ │ ├── csv # Csv logs
│ │ │ │ ├── wandb # Weights&Biases logs
│ │ │ │ ├── checkpoints # Training checkpoints
│ │ │ │ └── ... # Any other thing saved during training
│ │ │ └── ...
│ │ │
│ │ └── multiruns # Logs generated by multiruns
│ │ ├── YYYY-MM-DD_HH-MM-SS # Datetime of the multirun
│ │ │ ├── 1 # Multirun job number
│ │ │ ├── 2
│ │ │ └── ...
│ │ └── ...
│ │
│ └── debugs # Logs generated when debugging config is attached
│ └── ...
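These paths are typically wired up in `configs/hydra/default.yaml`; a sketch of the relevant keys (the repo's exact values may differ):

```yaml
# Sketch of configs/hydra/default.yaml; values are illustrative
run:
  dir: ${paths.log_dir}/${task_name}/runs/${now:%Y-%m-%d}_${now:%H-%M-%S}
sweep:
  dir: ${paths.log_dir}/${task_name}/multiruns/${now:%Y-%m-%d}_${now:%H-%M-%S}
  subdir: ${hydra.job.num}
```

The `.env` file at the project root holds the credentials used by the CI and deployment workflows: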
#AWS
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
#Dockerhub
DOCKER_USERNAME=
DOCKER_PASSWORD=
#Code-coverage
CODECOV=
#HF
HF_TOKEN=
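These variables can be exported in the shell or read from `.env`; a minimal sketch assuming the `python-dotenv` package (the repo itself may load them differently, e.g. via rootutils or CI secrets):

```python
# Minimal sketch, assuming python-dotenv is installed
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment
print(os.environ.get("AWS_ACCESS_KEY_ID"))
```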
make trash
make clean
Train a simple model:
make fastrun
make sshow
- Train DataLoader
- Val DataLoader
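An illustrative sketch of the kind of LightningDataModule behind those dataloaders, assuming a torchvision `ImageFolder` layout matching `data/dogs_dataset/`; the repo's actual `src/datamodules/dogs_datamodule.py` may differ:

```python
# Illustrative sketch only; see src/datamodules/dogs_datamodule.py for the real code
import lightning as L
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


class DogsDataModule(L.LightningDataModule):
    def __init__(self, data_dir: str = "data/dogs_dataset", batch_size: int = 32):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.transform = transforms.Compose(
            [transforms.Resize((224, 224)), transforms.ToTensor()]
        )

    def setup(self, stage=None):
        # one ImageFolder per split, mirroring data/dogs_dataset/{train,validation}
        self.train_ds = datasets.ImageFolder(f"{self.data_dir}/train", self.transform)
        self.val_ds = datasets.ImageFolder(f"{self.data_dir}/validation", self.transform)

    def train_dataloader(self):
        return DataLoader(self.train_ds, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_ds, batch_size=self.batch_size)
```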
make test
============================================================================== test session starts ==============================================================================
platform linux -- Python 3.11.9, pytest-8.3.3, pluggy-1.5.0
rootdir: /home/muthu/GitHub/DogBreedsClassifier
configfile: pytest.ini
plugins: cov-5.0.0, anyio-3.7.1, time-machine-2.15.0, hydra-core-1.3.2
collected 6 items
tests/datamodules/test_dogs_datamodule.py ... [ 50%]
tests/models/test_dogs_classifier.py . [ 66%]
tests/test_eval.py . [ 83%]
tests/test_train.py . [100%]
=========================================================================================== warnings summary ============================================================================================
../../miniconda3/envs/venv/lib/python3.11/site-packages/jupyter_client/connect.py:22
/home/muthu/miniconda3/envs/venv/lib/python3.11/site-packages/jupyter_client/connect.py:22: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
given by the platformdirs library. To remove this warning and
see the appropriate new directories, set the environment variable
`JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
The use of platformdirs will be the default in `jupyter_core` v6
from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write
tests/test_eval.py::test_catdog_ex_testing
tests/test_train.py::test_catdog_ex_training
/home/muthu/miniconda3/envs/venv/lib/python3.11/site-packages/lightning/fabric/connector.py:571: `precision=16` is supported for historical reasons but its usage is discouraged. Please set your precision to 16-mixed instead!
tests/test_train.py::test_catdog_ex_training
/home/muthu/miniconda3/envs/venv/lib/python3.11/site-packages/lightning/pytorch/loops/fit_loop.py:298: The number of training batches (8) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
tests/test_train.py::test_catdog_ex_training
/home/muthu/miniconda3/envs/venv/lib/python3.11/site-packages/torch/optim/lr_scheduler.py:224: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================================================== 6 passed, 5 warnings in 33.11s =====================================================================================
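Individual test files can also be run directly with pytest's standard CLI, e.g. `pytest tests/test_train.py -v`.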
| Train Matrix | Val Matrix | Test Matrix |
|---|---|---|
| ![train](reports/train-report.png) | ![val](reports/val-report.png) | ![test](reports/test-report.png) |
args:
--input_folder # folder of images to run inference on
--output_folder # where to save the prediction images
--ckpt_path # trained model checkpoint to load
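For example, using the files under `samples/` shown in the tree above:

```
python src/inference.py --input_folder samples/inputs --output_folder samples/outputs --ckpt_path samples/checkpoints/epoch_019.ckpt
```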
make trash
make clean
"conv_ratio": 1.2
"depths": [3, 3, 15, 3]
"dims": [6, 12, 24, 36]
"head_fn": default
"in_chans": 3
"lr": 0.001
"min_lr": 1e-06
"model_name": Mamba
"num_classes": 10
"pretrained": False
"scheduler_factor": 0.1
"scheduler_patience": 5
"trainable": False
"weight_decay": 1e-05
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:22 - Starting load_image
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished load_image
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:22 - Starting infer
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished infer
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:22 - Starting save_prediction_image
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished save_prediction_image
Processed guess2.jpg: Poodle (0.89)
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:22 - Starting load_image
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished load_image
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:22 - Starting infer
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished infer
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:22 - Starting save_prediction_image
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished save_prediction_image
Processed guess1.jpg: Boxer (0.96)
2024-11-10 20:22:17 | INFO | utils.logging_utils:wrapper:25 - Finished main
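A hypothetical sketch of the decorator behind the "Starting <fn> / Finished <fn>" lines above; the actual implementation lives in `src/utils/logging_utils.py` and may differ (loguru is an assumption, based on the log format):

```python
# Hypothetical sketch; see src/utils/logging_utils.py for the real code
import functools

from loguru import logger


def log_calls(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        logger.info(f"Starting {fn.__name__}")
        result = fn(*args, **kwargs)
        logger.info(f"Finished {fn.__name__}")
        return result

    return wrapper
```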