Commit

updated docs
deepaksood619 committed Sep 17, 2024
1 parent e2ae0d9 commit ce4e8d5
Showing 23 changed files with 327 additions and 124 deletions.
37 changes: 37 additions & 0 deletions docs/ai/computer-vision-cv/convolutional-neural-network-cnn.md
@@ -0,0 +1,37 @@
# Convolutional Neural Network (CNN)

## Neural Networks

Among [deep neural networks (DNN)](https://viso.ai/deep-learning/deep-neural-network-three-popular-types/), the [convolutional neural network (CNN)](https://viso.ai/deep-learning/convolutional-neural-networks/) has demonstrated excellent results in computer vision tasks, especially image classification. CNNs are a special type of multi-layer neural network inspired by the way the human visual system processes images.

In 2012, a large deep convolutional neural network called [AlexNet](https://viso.ai/deep-learning/alexnet/) won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a wide margin. This marked the start of the broad use and development of CNN models such as [VGGNet](https://viso.ai/deep-learning/vgg-very-deep-convolutional-networks/), [GoogLeNet](https://viso.ai/deep-learning/googlenet-explained-the-inception-model-that-won-imagenet/), [ResNet](https://viso.ai/deep-learning/resnet-residual-neural-network/), DenseNet, and many more.

## Convolutional Neural Network (CNN)

A CNN is a neural network architecture trained with standard machine learning methods. Rather than relying on hand-engineered features, a CNN learns the features it needs directly from labeled training data.

CNNs need relatively little preprocessing. They learn and adapt their own image filters, whereas many classical algorithms require these filters to be carefully hand-coded. A CNN is built from a stack of layers, each performing a particular function.

## CNN Architecture

The basic unit of a CNN is the neuron. Artificial neurons are loosely modeled on biological neurons, which fire when they are sufficiently stimulated (see [neuron activation](https://viso.ai/deep-learning/neuron-activation/)). Each artificial neuron computes a weighted sum of its inputs plus a bias and applies an activation function to the result. A layer is a group of neurons, and each layer performs a particular function.
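A single neuron, and a layer of them, can be sketched in a few lines of NumPy. The input values, weights, bias, and the choice of ReLU as the activation are illustrative assumptions, not values taken from any particular model.

```python
import numpy as np

def neuron(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """One artificial neuron: weighted sum of inputs plus bias, then an activation."""
    z = float(np.dot(w, x) + b)  # pre-activation: w . x + b
    return max(0.0, z)           # ReLU activation; sigmoid and tanh are other common choices

def layer(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """A layer of neurons: each row of W holds one neuron's weights."""
    return np.maximum(0.0, W @ x + b)

# Illustrative values only
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, 0.1])
print(neuron(x, w, b=0.1))  # approximately 0.2 (0.2 - 0.3 + 0.2 + 0.1)
```

Stacking rows of weights into a matrix is what lets a whole layer be computed with one matrix-vector product.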


![Concept of a neural network](../../media/Pasted%20image%2020240917123040.png)

## CNN Layers

A CNN may have anywhere from 3 to 150 or more layers; the “deep” in deep neural network refers to this number of layers. Each layer's output serves as the next layer's input. Examples of deep multi-layer networks include [ResNet-50 (50 layers) and ResNet-101 (101 layers)](https://viso.ai/deep-learning/resnet-residual-neural-network/).

![Concept of a Convolutional Neural Network (CNN)](../../media/Pasted%20image%2020240917123109.png)

CNN layers are of four main types: Convolution Layer, ReLU Layer, Pooling Layer, and Fully-Connected Layer.

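- **Convolution Layer:** A convolution slides a small filter (kernel) across the input and computes a dot product at each position, producing an activation (feature) map. The convolution layer has a set of trainable filters, each with a small spatial receptive field that extends through the full depth (all channels) of the input. Convolution layers are the main building blocks of convolutional neural networks.
- **ReLU Layer:** ReLU (rectified linear unit) layers apply the activation function f(x) = max(0, x) element-wise, adding non-linearity to the network. Because ReLU does not saturate for positive inputs, it mitigates the vanishing-gradient problem, so models with these layers are typically easier and faster to train than those using sigmoid or tanh. Reducing [overfitting](https://viso.ai/computer-vision/what-is-overfitting/) is mainly the job of regularization techniques such as dropout, not of ReLU itself.
- **Pooling Layer:** This layer downsamples the feature maps produced by the preceding layer by summarizing each small neighborhood, for example by taking the maximum (max pooling) or the average (average pooling) of each 2×2 window. This reduces the number of values later layers must process and makes the output less sensitive to small shifts in the input.
- **Fully-Connected Layer:** Fully-connected layers usually come at the end of a CNN. The feature maps from the preceding layers are flattened into a vector, every neuron is connected to every element of that vector, and the final fully-connected layer produces the output (for example, one score per class).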
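The four layer types can be chained into a minimal forward pass. This NumPy sketch assumes a single-channel 8×8 input, one random 3×3 filter, and 4 output classes, all hypothetical; real frameworks process many channels and filters at once and learn the weights instead of drawing them at random.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2D convolution as used in CNNs (cross-correlation, stride 1, no padding)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Dot product between the kernel and the image patch under it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x: np.ndarray) -> np.ndarray:
    """Element-wise max(0, x)."""
    return np.maximum(0.0, x)

def max_pool(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling; rows/cols that don't fill a whole window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))

def fully_connected(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Flatten the feature map and connect every element to every output unit."""
    return W @ x.ravel() + b

rng = np.random.default_rng(0)
image = rng.random((8, 8))                        # hypothetical 8x8 grayscale input
kernel = rng.standard_normal((3, 3))              # one filter; learned in a real CNN
features = max_pool(relu(conv2d(image, kernel)))  # 8x8 -> 6x6 -> 6x6 -> 3x3
W = rng.standard_normal((4, features.size))       # 4 hypothetical output classes
b = np.zeros(4)
scores = fully_connected(features, W, b)
print(features.shape, scores.shape)  # (3, 3) (4,)
```

The explicit loops in `conv2d` are for readability; libraries implement the same operation with vectorized or GPU kernels.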

## Links

- [A Complete Guide to Image Classification in 2024 - viso.ai](https://viso.ai/computer-vision/image-classification/)
14 changes: 5 additions & 9 deletions docs/ai/computer-vision-cv/intro.md
@@ -82,12 +82,8 @@ https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio

## References

[Self Driving Nanodegree](courses/self-driving-nanodegree.md)

https://towardsdatascience.com/understanding-ssd-multibox-real-time-object-detection-in-deep-learning-495ef744fab

https://www.freecodecamp.org/news/advanced-computer-vision-with-python

[Comic book panel segmentation • Max Halford](https://maxhalford.github.io/blog/comic-book-panel-segmentation/)

[Unbelievable Face Swapping with 5 Lines Code - YouTube](https://www.youtube.com/watch?v=a8vFMaH2aDw)
- [Self Driving Nanodegree](courses/self-driving-nanodegree.md)
- https://towardsdatascience.com/understanding-ssd-multibox-real-time-object-detection-in-deep-learning-495ef744fab
- https://www.freecodecamp.org/news/advanced-computer-vision-with-python
- [Comic book panel segmentation • Max Halford](https://maxhalford.github.io/blog/comic-book-panel-segmentation/)
- [Unbelievable Face Swapping with 5 Lines Code - YouTube](https://www.youtube.com/watch?v=a8vFMaH2aDw)
96 changes: 96 additions & 0 deletions docs/ai/computer-vision-cv/model-building-stages.md
@@ -0,0 +1,96 @@
# Model Building Stages

## 1. Define the Problem

Clearly define the goal of the project: **to build a CV model that detects aflatoxin contamination levels in corn samples through image analysis**. The contamination levels will be categorized into predefined bands such as 0-30 ppb, 31-50 ppb, etc.

- **Output**: Classification of aflatoxin levels into one of the specified categories.
- **Performance Target**: Achieve at least 80% accuracy in classifying contamination levels.

## 2. Collect and Label Data

The success of a CV model depends heavily on the quality and quantity of data:

- **Image Dataset**: Obtain a dataset of corn images provided by the client, with images labeled based on the aflatoxin contamination levels.
- **Data Labels**: Ensure that each image has a label that specifies the contamination level (in ppb). These will serve as the ground truth for training the model.
- **Data Size**: Ensure the dataset is large enough to prevent overfitting. If the dataset is small, consider techniques like **data augmentation** to artificially increase the dataset size.
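Since the labels are contamination bands rather than raw ppb values, each lab measurement has to be mapped to a class index before training. A minimal sketch: only the 0-30 ppb and 31-50 ppb bands come from the problem definition, so the open-ended "> 50 ppb" catch-all below is a placeholder assumption to be replaced by the project's actual band list.

```python
from bisect import bisect_left

# Inclusive upper bound (ppb) of each band. Only the 0-30 and 31-50 ppb bands come from
# the problem definition; the open-ended "> 50 ppb" catch-all is a placeholder assumption.
BAND_UPPER_BOUNDS = [30, 50]
BAND_NAMES = ["0-30 ppb", "31-50 ppb", "> 50 ppb"]

def ppb_to_class(ppb: float) -> int:
    """Map a lab-measured aflatoxin level (ppb) to the class index used as the image label."""
    if ppb < 0:
        raise ValueError("aflatoxin level cannot be negative")
    # bisect_left keeps each upper bound inside its own band, e.g. 30 -> class 0
    return bisect_left(BAND_UPPER_BOUNDS, ppb)

# Hypothetical measurements for four images
labels = [ppb_to_class(v) for v in [12, 30, 45, 80]]
print(labels)  # [0, 0, 1, 2]
```

Keeping the band boundaries in one list makes it easy to change the banding later without touching the rest of the pipeline.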

## 3. Preprocess the Data

Preprocessing the images is essential to standardize the input data for the model:

- **Normalization**: Scale pixel values to a range of `[0, 1]` or `[-1, 1]` to help the model converge faster.
- **Resizing**: Resize all images to a fixed resolution (e.g., 224x224 pixels) to ensure consistency in input size.
- **Augmentation**: Apply image augmentation techniques (e.g., rotation, flipping, zoom, brightness adjustments) to make the model more robust to variations in real-world conditions.
- **Train-Validation Split**: Split the dataset into training and validation sets (e.g., 80% training, 20% validation) to evaluate model performance during development.
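The preprocessing steps above can be sketched in plain NumPy. This is illustrative rather than production code: the nearest-neighbour resize and single horizontal-flip augmentation stand in for what OpenCV (`cv2.resize`) or Albumentations would normally do, and the random array is a placeholder for a real corn image.

```python
import numpy as np

def normalize(img: np.ndarray, to_range: str = "0_1") -> np.ndarray:
    """Scale uint8 pixels (0-255) to [0, 1], or to [-1, 1] for any other to_range value."""
    x = img.astype(np.float32) / 255.0
    return x if to_range == "0_1" else x * 2.0 - 1.0

def resize_nearest(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resize of an HxW or HxWxC image to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]

def random_flip(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Augmentation: horizontal flip with probability 0.5."""
    return img[:, ::-1] if rng.random() < 0.5 else img

def train_val_split(n: int, val_frac: float = 0.2, seed: int = 0):
    """Shuffle sample indices and split them, e.g. 80% train / 20% validation."""
    idx = np.random.default_rng(seed).permutation(n)
    n_val = int(round(n * val_frac))
    return idx[n_val:], idx[:n_val]

rng = np.random.default_rng(42)
raw = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)  # placeholder image
x = random_flip(normalize(resize_nearest(raw)), rng)
train_idx, val_idx = train_val_split(100)
print(x.shape, len(train_idx), len(val_idx))  # (224, 224, 3) 80 20
```

Augmentation is applied only to training samples; validation and inference images should go through the same resize and normalization but no random transforms.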

## 4. Choose a Model Architecture

For image classification tasks, **Convolutional Neural Networks (CNNs)** are the most commonly used architectures:

- **Pre-trained Models (Transfer Learning)**:
  - Use pre-trained models like **ResNet**, **MobileNet**, or **EfficientNet** to leverage knowledge from large datasets like ImageNet. This can reduce training time and improve accuracy.
  - **Transfer Learning**: Fine-tune the pre-trained model on your specific dataset by replacing the final layer(s) to output the aflatoxin contamination categories.
- **Custom CNN Architecture**:
  - If transfer learning isn’t sufficient, a custom CNN architecture can be built. Design layers that fit the complexity of your data, including convolutional layers, pooling layers, and fully connected layers.
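The "frozen backbone, new head" idea behind transfer learning can be shown with a NumPy-only sketch. The backbone here is a fixed random projection standing in for a real pre-trained network, and the sizes, toy data, and number of classes are hypothetical. In practice you would load real pre-trained weights (in PyTorch, for instance, a torchvision ResNet whose final `fc` layer is replaced by a new `nn.Linear` sized for the contamination bands); only the mechanics are shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
INPUT_DIM, FEATURE_DIM, NUM_CLASSES = 48, 16, 3  # hypothetical sizes

# Stand-in for a pre-trained backbone. Its weights are frozen: used in the
# forward pass but never updated during fine-tuning.
W_backbone = rng.standard_normal((FEATURE_DIM, INPUT_DIM)) / np.sqrt(INPUT_DIM)

def extract_features(x: np.ndarray) -> np.ndarray:
    """Frozen feature extractor: (N, INPUT_DIM) -> (N, FEATURE_DIM)."""
    return np.maximum(0.0, x @ W_backbone.T)

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_head(x: np.ndarray, y: np.ndarray, epochs: int = 200, lr: float = 0.1):
    """Train only the new classification head (softmax regression) on frozen features."""
    feats = extract_features(x)
    W = np.zeros((FEATURE_DIM, NUM_CLASSES))
    b = np.zeros(NUM_CLASSES)
    onehot = np.eye(NUM_CLASSES)[y]
    for _ in range(epochs):
        p = softmax(feats @ W + b)
        grad = (p - onehot) / len(x)  # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * (feats.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

# Hypothetical toy data in place of real corn images
x = rng.standard_normal((60, INPUT_DIM))
y = rng.integers(0, NUM_CLASSES, size=60)
W_before = W_backbone.copy()
W_head, b_head = train_head(x, y)
```

Training only the head is the cheapest form of transfer learning; unfreezing some of the last backbone layers with a small learning rate is a common next step when the head alone is not accurate enough.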

## 5. Train the Model

Now that the data is prepared and the model architecture is selected, proceed to training:

- **Loss Function**: Use **categorical cross-entropy** as the loss function since this is a multi-class classification problem.
- **Optimizer**: Use an optimizer such as **Adam** (which adapts the step size per parameter) or **SGD** with momentum to improve convergence.
- **Batch Size & Epochs**: Experiment with different batch sizes (e.g., 32, 64) and run multiple epochs (e.g., 50-100 epochs). Monitor overfitting using early stopping techniques.
- **Hyperparameter Tuning**: Fine-tune hyperparameters like learning rate, dropout rate, and number of layers to optimize performance.
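The loss function and early-stopping rule from this step can be sketched as follows. This is a minimal illustration, assuming class-index labels and a per-epoch validation loss; the `patience` value and the loss sequence are hypothetical. Keras offers the same behaviour through `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=...)`.

```python
import math
import numpy as np

def categorical_cross_entropy(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """Mean negative log-probability assigned to the true class (labels are class indices)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + eps)))

class EarlyStopping:
    """Stop when validation loss has not improved by min_delta for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record this epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Hypothetical validation losses, one per epoch
stopper = EarlyStopping(patience=3)
for epoch, val_loss in enumerate([1.0, 0.8, 0.7, 0.71, 0.72, 0.69, 0.70, 0.75, 0.8]):
    if stopper.step(val_loss):
        print(f"early stop at epoch {epoch}, best val loss {stopper.best}")
        break
# prints: early stop at epoch 8, best val loss 0.69
```

In a real training loop you would also save the weights from the best epoch and restore them when stopping.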

## 6. Evaluate the Model

After training, evaluate the model to ensure it meets the desired performance criteria:

- **Confusion Matrix**: Generate a confusion matrix to analyze how well the model performs across all contamination bands (e.g., 0-30 ppb, 31-50 ppb).
- **Performance Metrics**: Evaluate key metrics like accuracy, precision, recall, F1-score for each class. For imbalanced datasets, consider using **weighted precision/recall**.
- **Cross-Validation**: Perform **k-fold cross-validation** to ensure that the model generalizes well across different subsets of the data.
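The evaluation metrics above can be computed directly from a confusion matrix. This NumPy sketch is for illustration; scikit-learn provides equivalents (`sklearn.metrics.confusion_matrix`, `sklearn.metrics.classification_report`, and `sklearn.model_selection.StratifiedKFold`, which keeps class proportions per fold and suits imbalanced bands). The fold count and seed are arbitrary choices.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes: int) -> np.ndarray:
    """cm[i, j] = number of samples with true class i that were predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def per_class_metrics(cm: np.ndarray):
    """Per-class precision, recall and F1; classes with no predictions/samples get 0."""
    tp = np.diag(cm).astype(float)
    pred_counts, true_counts = cm.sum(axis=0), cm.sum(axis=1)
    precision = np.divide(tp, pred_counts, out=np.zeros_like(tp), where=pred_counts > 0)
    recall = np.divide(tp, true_counts, out=np.zeros_like(tp), where=true_counts > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

def weighted_average(metric: np.ndarray, cm: np.ndarray) -> float:
    """Average a per-class metric weighted by each class's number of true samples."""
    return float(np.average(metric, weights=cm.sum(axis=1)))

def k_fold_indices(n: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    for i in range(k):
        yield np.concatenate(folds[:i] + folds[i + 1:]), folds[i]

cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
precision, recall, f1 = per_class_metrics(cm)
```

Reading the off-diagonal cells of `cm` shows which contamination bands the model confuses, which matters more here than overall accuracy when neighbouring bands are easy to mix up.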

## 7. Improve the Model

If the model does not meet the performance goals, several techniques can be used to improve it:

- **Data Augmentation**: Further enhance the dataset by introducing more variability in the training data.
- **Model Regularization**: Use techniques like **dropout**, **batch normalization**, or **L2 regularization** to prevent overfitting.
- **Hyperparameter Tuning**: Use methods like **grid search** or **random search** to find optimal values for hyperparameters (e.g., learning rate, batch size).
- **Ensemble Methods**: Combine multiple models (e.g., bagging or boosting) to improve prediction accuracy.
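Two of the regularization techniques named above, inverted dropout and an L2 weight penalty, are sketched below in NumPy. This is illustrative only; frameworks provide these built in (for example `tf.keras.layers.Dropout` or `torch.nn.Dropout`). The plain `lam * sum(w**2)` scaling is an assumption: some texts use `lam / 2`.

```python
import numpy as np

def dropout(x: np.ndarray, rate: float, rng: np.random.Generator, training: bool = True) -> np.ndarray:
    """Inverted dropout: during training, zero a fraction `rate` of activations and
    scale the survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return x  # at inference time dropout is a no-op
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def l2_penalty(weights, lam: float) -> float:
    """L2 regularization term added to the training loss: lam * sum of squared weights."""
    return lam * sum(float(np.sum(w ** 2)) for w in weights)

rng = np.random.default_rng(0)
activations = np.ones(10)
print(dropout(activations, rate=0.5, rng=rng))  # each entry is either 0.0 or 2.0
```

Because of the rescaling during training, the same network can be used unchanged at inference time by calling `dropout(..., training=False)`.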

## 8. Test the Model

Once the model is fine-tuned and evaluated, test its performance on a **holdout test set** or new data provided by the client:

- **Validation on New Data**: Use unseen images from the client’s dataset to ensure that the model generalizes well to real-world samples.
- **Performance Metrics Report**: Document the model's final accuracy, confusion matrix, and other performance metrics.

## 9. Deploy the Model (For POC)

For the POC phase, the model will be deployed in a hosted environment (cloud or on-prem):

- **Deploy on Vendor's Environment**: Host the model on a cloud server (e.g., AWS, Azure) where it can accept image inputs and return aflatoxin contamination levels via an API.
- **Performance Monitoring**: Set up tools to monitor inference time, model accuracy, and resource utilization to ensure smooth operation.

## 10. Document and Report Results

After deployment, prepare a comprehensive report to present to the client:

- **POC Results**: Include detailed results of the model’s performance (e.g., accuracy, confusion matrix).
- **Recommendations for Future Phases**: Provide insights on how the model can be scaled and improved further in Phase 2 (e.g., mobile app integration, on-device inference).

## Tools and Technologies for Each Step

1. **Preprocessing & Data Augmentation**:
   - Tools: **OpenCV**, **Keras ImageDataGenerator**, **Albumentations**
2. **Model Development**:
   - Tools: **TensorFlow**, **Keras**, **PyTorch** (for building CNNs and transfer learning)
3. **Training & Optimization**:
   - Optimizers: **Adam**, **SGD**
   - Techniques: Early stopping, learning rate scheduling
4. **Evaluation**:
   - Tools: **scikit-learn** (for confusion matrices and performance metrics)
5. **Deployment**:
   - Tools: **AWS SageMaker**, **Azure ML**, or **Google AI Platform**
9 changes: 8 additions & 1 deletion docs/ai/computer-vision-cv/pre-trained-models.md
@@ -109,7 +109,7 @@ Here's how YOLO works:

- **Grid:** YOLO's CNN divides an image into a grid.
- **Bounding boxes:** Each cell in the grid predicts a number of bounding boxes.
- **Class probabilities:** Each cell also predicts a class probability, which indicates the likelihood of an object being present in the box. 
- **Class probabilities:** Each cell also predicts a class probability, which indicates the likelihood of an object being present in the box.
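The grid idea can be made concrete with the original YOLO (v1) formulation, where the cell containing an object's centre is responsible for detecting it and the network outputs an S × S × (B·5 + C) tensor. The defaults below (S = 7, B = 2, C = 20) are the YOLOv1 settings for PASCAL VOC; later YOLO versions use anchor boxes and multiple scales, so treat this as a sketch of the basic idea only.

```python
def responsible_cell(cx: float, cy: float, img_w: float, img_h: float, S: int = 7):
    """Return (row, col) of the grid cell containing the box centre (cx, cy), in pixels."""
    col = min(int(cx / img_w * S), S - 1)  # clamp so a centre on the right edge stays in range
    row = min(int(cy / img_h * S), S - 1)
    return row, col

def output_shape(S: int = 7, B: int = 2, C: int = 20):
    """YOLOv1-style prediction tensor: per cell, B boxes x (x, y, w, h, confidence) + C class probabilities."""
    return (S, S, B * 5 + C)

print(responsible_cell(224, 224, 448, 448))  # (3, 3)
print(output_shape())                        # (7, 7, 30)
```

Because every cell makes its predictions in the same single forward pass, the whole image is processed at once, which is where YOLO's real-time speed comes from.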

YOLO is popular because of its single-stage architecture, real-time performance, and accuracy. It's well-suited for real-time applications like self-driving cars, video surveillance, and augmented reality.

@@ -139,6 +139,12 @@ YOLO is popular because of its single-stage architecture, real-time performance,
| BiT-L (ResNet) | **928 M** | 87.54 % | 2019 |
| NoisyStudent EfficientNet-L2 | **480 M** | **88.4** % | 2020 |
| Meta Pseudo Labels | **480 M** | **90.2** % | 2021 |
| CoCa (finetuned) | **2100 M** | 91.0 % | 2022 |
| OmniVec (ViT) | | 92.4 % | 2023 |

Leaderboard - [ImageNet Benchmark (Image Classification) | Papers With Code](https://paperswithcode.com/sota/image-classification-on-imagenet)

Models - [Models - Hugging Face](https://huggingface.co/models?pipeline_tag=image-classification)

- [CNN Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet and more… | by Siddharth Das | Analytics Vidhya | Medium](https://medium.com/analytics-vidhya/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5)
- [Difference between AlexNet, VGGNet, ResNet, and Inception | by Aqeel Anwar | Towards Data Science](https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96)
@@ -160,3 +166,4 @@ YOLO is popular because of its single-stage architecture, real-time performance,
- [Top Pre-Trained Models for Image Classification - GeeksforGeeks](https://www.geeksforgeeks.org/top-pre-trained-models-for-image-classification/)
- [Top 4 Pre-Trained Models for Image Classification + Python Code](https://www.analyticsvidhya.com/blog/2020/08/top-4-pre-trained-models-for-image-classification-with-python-code/)
- [Best deep CNN architectures and their principles: from AlexNet to EfficientNet | AI Summer](https://theaisummer.com/cnn-architectures/)
- [7 Best Image Classification Models You Should Know in 2023 - Jonas Cleveland](https://jonascleveland.com/best-image-classification-models/)
2 changes: 2 additions & 0 deletions docs/ai/computer-vision-cv/readme.md
@@ -2,6 +2,8 @@

- [Computer Vision (CV) Intro](ai/computer-vision-cv/intro.md)
- [Pre-Trained Models](ai/computer-vision-cv/pre-trained-models.md)
- [Convolutional Neural Network (CNN)](ai/computer-vision-cv/convolutional-neural-network-cnn.md)
- [Model Building Stages](ai/computer-vision-cv/model-building-stages.md)
- [Image / Data Labeling Tools](image-data-labeling-tools)
- [Image Formats](image-formats)
- [MNIST for ML Beginners | TensorFlow](mnist-for-ml-beginners-tensorflow)
2 changes: 2 additions & 0 deletions docs/ai/data-science/datasets.md
@@ -1,5 +1,7 @@
# Datasets

[Home - Data Commons](https://datacommons.org/)

https://www.kaggle.com/dalpozz/creditcardfraud

[20+ Amazing (And Free) Data Sources Anyone Can Use To Build AIs](https://www.forbes.com/sites/bernardmarr/2023/05/17/20-amazing-and-free-data-sources-anyone-can-use-to-build-ais/?sh=17c13eec617f)
2 changes: 1 addition & 1 deletion docs/ai/deep-learning/commands.md
@@ -63,7 +63,7 @@ def L1(yhat, y):
def L2(yhat, y):
    loss = np.sum(np.dot((y-yhat),(y-yhat)))
    return loss

A trick when you want to flatten a matrix X of shape (a, b, c, d) to a matrix X_flatten of shape (b*c*d, a) is to use:
# A trick when you want to flatten a matrix X of shape (a, b, c, d) to a matrix X_flatten of shape (b*c*d, a) is to use:

X_flatten = X.reshape(X.shape[0], -1).T
```
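A quick runnable check of the snippets above. The body of `L1` is not shown in this hunk, so the version below uses the conventional definition (sum of absolute errors) as an assumption; the test values and array sizes are illustrative.

```python
import numpy as np

def L1(yhat: np.ndarray, y: np.ndarray) -> float:
    """L1 loss: sum of absolute differences (assumed definition)."""
    return float(np.sum(np.abs(y - yhat)))

def L2(yhat: np.ndarray, y: np.ndarray) -> float:
    """L2 loss: sum of squared differences, i.e. the error vector dotted with itself."""
    return float(np.dot(y - yhat, y - yhat))

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])
print(L1(yhat, y), L2(yhat, y))  # approximately 1.1 and 0.43

X = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)  # shape (a, b, c, d) = (2, 3, 4, 5)
X_flatten = X.reshape(X.shape[0], -1).T          # shape (b*c*d, a) = (60, 2)
print(X_flatten.shape)  # (60, 2)
```

Each column of `X_flatten` is one example flattened into a vector, which is the layout expected when examples are stacked column-wise.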