Deep Learning Files (backend.dl)
This page contains documentation for the files in the /backend/dl directory of the GitHub repo. This page is regularly updated as changes are made to these files.
dl_eval.py
This file contains functions to calculate evaluation metrics. Right now, we support computing accuracy for classification problems. The evaluation functions in dl_eval.py are called from dl_trainer.py. Keep in mind that a function like compute_accuracy() takes the actual vs. predicted values in torch.Tensor format. If your data happens to be in the form of NumPy arrays, it is relatively straightforward to convert them to torch.Tensor.
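For instance, the conversion is a one-liner with torch.from_numpy(). The snippet below is a minimal sketch, assuming compute_accuracy() works by comparing argmax class predictions against ground-truth labels; see dl_eval.py for the actual implementation.

```python
import numpy as np
import torch

# Convert numpy arrays to torch.Tensor before calling the evaluation functions
y_true = torch.from_numpy(np.array([0, 2, 1, 2]))   # ground-truth labels, shape (N,)
y_pred = torch.tensor([[0.9, 0.05, 0.05],           # model outputs, shape (N, num_classes)
                       [0.1, 0.2, 0.7],
                       [0.2, 0.6, 0.2],
                       [0.1, 0.1, 0.8]])

# Illustrative accuracy computation: argmax over class scores vs. labels
accuracy = (y_pred.argmax(dim=1) == y_true).float().mean().item()
print(accuracy)  # 1.0 for this toy example
```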
dl_model_parser.py
Endpoint that takes in the user-specified DL architecture (via the drag-and-drop endpoint) and parses the raw array of strings into actual torch.nn objects, which get passed to /backend/dl/dl_model.py.
```python
parse_deep_user_architecture(
    [
        "nn.Linear(in_features=50, out_features=10)",
        "nn.Conv2d(in_channels=3, out_channels=36, kernel_size=3, stride=2)",
    ]
)
# returns:
# [nn.Linear(in_features=50, out_features=10),
#  nn.Conv2d(in_channels=3, out_channels=36, kernel_size=3, stride=2)]
```
Look carefully at the output. Do you see how the output is different from the input into parse_deep_user_architecture()? The input is a list of strings, while the output is a list of instantiated torch.nn objects.
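To make the transformation concrete, here is a minimal sketch of how such string-to-module parsing could work. The eval-based approach and the function name parse_deep_user_architecture_sketch are assumptions for illustration, not the exact implementation in this file.

```python
import torch.nn as nn  # needed so the "nn.<Layer>(...)" strings can resolve

def parse_deep_user_architecture_sketch(layer_strings):
    # Illustrative only: evaluate each "nn.<Layer>(...)" string into a module object
    return [eval(s) for s in layer_strings]

layers = parse_deep_user_architecture_sketch(
    ["nn.Linear(in_features=50, out_features=10)", "nn.ReLU()"]
)
print(layers)  # [Linear(in_features=50, out_features=10, bias=True), ReLU()]
```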
dl_model.py
This file contains the logic that takes the user-specified DL architecture (via the drag-and-drop endpoint) and builds the PyTorch representation of the model in the form of an nn.Sequential() container. One thing that would be cool to support (but will take some work) is to allow users to create custom architectures (like Bottleneck in ResNet), drag and drop those, and have this file "smartly" parse the architecture and build it.
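Conceptually, once the parsed list of modules is available, the model can be assembled by unpacking the list into nn.Sequential. A minimal sketch (the actual file wraps this in more handling):

```python
import torch
import torch.nn as nn

parsed_layers = [nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 3)]

# nn.Sequential takes modules as positional arguments, so unpack the list
model = nn.Sequential(*parsed_layers)
print(model(torch.randn(2, 4)).shape)  # torch.Size([2, 3])
```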
dl_trainer.py
This is one of the most important files in our codebase: it orchestrates the process of training deep learning models for classification and regression problems. The train_deep_model() function provides a general implementation for training a deep learning model in PyTorch epoch by epoch. For each epoch, the following happens (see the sketch after this list):
- Start the timer
- For each batch in the train loader, do the following:
  - Set gradients to zero (for gradient descent purposes)
  - Make a prediction on the input
  - Evaluate the loss criterion on the prediction vs. the actual output
  - Backpropagate
  - Update the weights
- Stop the timer
- Update the running lists of epoch train time, train loss, and test loss
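A simplified sketch of one epoch of this loop (illustrative only; the actual train_deep_model() also evaluates test loss and records the results):

```python
import time
import torch

def train_one_epoch_sketch(model, train_loader, optimizer, criterion):
    """Simplified sketch of the per-epoch steps listed above."""
    start = time.time()                         # start timer
    model.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        optimizer.zero_grad()                   # set gradients to zero
        predictions = model(inputs)             # make prediction on the input
        loss = criterion(predictions, labels)   # loss on prediction vs. actual output
        loss.backward()                         # backpropagation
        optimizer.step()                        # update weights
        running_loss += loss.item()
    epoch_time = time.time() - start            # stop timer
    return running_loss / len(train_loader), epoch_time
```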
The dl_trainer is capable of exporting a file called dl_results.csv, a table that records, for each epoch, the train loss, test loss, and time taken. We also store the trained model weights + architecture in the form of ONNX and .pt files. ONNX files allow a user to visualize the model architecture on netron.app.
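For reference, the .pt and ONNX exports can be produced with standard PyTorch calls, roughly as below. The stand-in model and file names here are illustrative, not the exact paths used by dl_trainer:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 3)              # stand-in for the trained model
dummy_input = torch.randn(1, 4)      # one example input with the model's input shape

torch.save(model, "my_deep_learning_model.pt")                        # weights + architecture
torch.onnx.export(model, dummy_input, "my_deep_learning_model.onnx")  # viewable on netron.app
```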
Example usage:

```python
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split

# Note: nn.CrossEntropyLoss expects raw logits, so the trailing Softmax is optional
model = nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 3), nn.Softmax(dim=1))
# train_test_split(X, y, ...) returns four arrays, not two
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
train_loader, test_loader = get_dataloaders(X_train, X_test, y_train, y_test)  # helper in this file
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)  # SGD requires the parameters and a learning rate
criterion = nn.CrossEntropyLoss()  # torch.criterion.CELoss does not exist
epochs = 10
problem_type = "CLASSIFICATION"
train_deep_model(model, train_loader, test_loader, optimizer, criterion, epochs, problem_type)
```
pretrained.py
This file provides the functionality to use established model architectures (like ResNet, VGG) along with already-obtained weights and biases to train on the user's image dataset. The training metrics are written to backend/dl_results.csv, and a .pt file is generated at frontend/playground-frontend/src/backend_outputs/my_deep_learning_model.pt.
The image dataset needs to be in a zipped folder with the following structure:

```
root/train/class1/xxx.png
root/train/class2/yyy.png
root/valid/class1/123.png
root/valid/class2/img.png
...
```

An example zipped file for this is tests/zip_files/double_zipped.zip.
A pretrained model is created by cutting an already established architecture (like ResNet) into two or more parts. When the model is cut, it is divided into a head and a body. The parameters of the body are kept the same as in the loaded model, and only the head is trained.
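A minimal sketch of the freeze-body / train-head idea, using torchvision's ResNet-18 as a stand-in. The real train() in this file cuts the model at a user-specified layer rather than simply swapping the final layer:

```python
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# Body: freeze the pretrained parameters so they keep their loaded values
for param in model.parameters():
    param.requires_grad = False

# Head: replace the final layer with a fresh, trainable one for 2 classes;
# only these parameters will be updated during training
model.fc = nn.Linear(model.fc.in_features, 2)
```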
- train(): trains the model, saves train_loss and valid_loss, and outputs a .pth file
- get_all(): returns all supported models
Example call:

```python
train(
    zipped_file="../tests/zip_files/double_zipped.zip",
    model_name="xcit_small_12_p8_224_dist",
    batch_size=2,
    loss_func=torch.nn.CrossEntropyLoss(),
    n_epochs=3,
    shuffle=False,
    optimizer=SGD,
    lr=3e-4,
    n_classes=2,
    # Do NOT wrap these transforms in torchvision.transforms.Compose()
    train_transform=[
        torchvision.transforms.Resize((224, 224)),
        torchvision.transforms.ToTensor(),
    ],
    # Cut the model at the second layer; the second layer goes to the head
    # (this can also be a list)
    cut=2,
)
```
Common issues:
- train_loss or valid_loss have NaN values => the batch size is greater than the size of the dataset
- Some errors can be traced here
- ViT models are not fully compatible YET

If you face any other error, please tag @vidushiMaheshwari on it.