`docs/ai/computer-vision-cv/convolutional-neural-network-cnn.md`
# Convolutional Neural Network (CNN)

## Neural Networks

Among [deep neural networks (DNN)](https://viso.ai/deep-learning/deep-neural-network-three-popular-types/), the [convolutional neural network (CNN)](https://viso.ai/deep-learning/convolutional-neural-networks/) has demonstrated excellent results in computer vision tasks, especially image classification. CNNs are a special type of multi-layer neural network, inspired by the mechanism of the human visual system.

In 2012, a large deep convolutional neural network called [AlexNet](https://viso.ai/deep-learning/alexnet/) showed excellent performance on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This marked the start of the broad use and development of CNN models such as [VGGNet](https://viso.ai/deep-learning/vgg-very-deep-convolutional-networks/), [GoogleNet](https://viso.ai/deep-learning/googlenet-explained-the-inception-model-that-won-imagenet/), [ResNet](https://viso.ai/deep-learning/resnet-residual-neural-network/), DenseNet, and many more.
## Convolutional Neural Network (CNN)

A CNN is a framework built on machine learning concepts: it can learn and train from data on its own, without the need for human intervention.

CNNs require only minimal pre-processing. They develop and adapt their own image filters, which in most other algorithms and models have to be carefully hand-coded. A CNN framework consists of a set of layers, each performing a particular function.

## CNN Architecture
The basic unit of a CNN framework is the neuron, a concept modeled on biological neurons, which fire through [neuron activation](https://viso.ai/deep-learning/neuron-activation/). An artificial neuron is a statistical function that computes a weighted sum of its inputs and applies an activation function to the result. Layers are clusters of neurons, with each layer having a particular function.
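
As a minimal sketch (an illustration, not code from the original text), a single artificial neuron computes a weighted sum of its inputs plus a bias and passes it through a ReLU activation:

```python
import numpy as np

def neuron(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """One artificial neuron: weighted sum of inputs plus bias,
    passed through a ReLU activation (illustrative sketch)."""
    z = float(np.dot(w, x) + b)  # weighted sum of the inputs
    return max(0.0, z)           # ReLU activation: max(0, z)

# Example with three hypothetical inputs and weights
print(neuron(np.array([0.5, 0.2, 0.1]), np.array([0.4, 0.3, -0.2]), b=0.1))  # ~0.34
```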
![Concept of a neural network](../../media/Pasted%20image%2020240917123040.png)

## CNN Layers
A CNN may have anywhere from 3 to 150 or even more layers; the “deep” in deep neural network refers to the number of layers. One layer’s output acts as the next layer’s input. Deep multi-layer neural networks include [ResNet50 (50 layers) and ResNet101 (101 layers)](https://viso.ai/deep-learning/resnet-residual-neural-network/).

![Concept of a Convolutional Neural Network (CNN)](../../media/Pasted%20image%2020240917123109.png)
CNN layers are of four main types: the convolution layer, the ReLU layer, the pooling layer, and the fully-connected layer. A minimal model combining all four follows the list below.

- **Convolution Layer:** A convolution is the application of a filter to an input, producing an activation. The convolution layer holds a set of trainable filters, each with a small receptive field that extends through the full depth of the input. Convolution layers are the major building blocks of convolutional neural networks.
- **ReLU Layer:** ReLU (rectified linear unit) layers apply an activation function that improves a CNN’s accuracy and effectiveness while helping to limit [overfitting](https://viso.ai/computer-vision/what-is-overfitting/). Models with these layers are easier to train and produce more accurate results.
- **Pooling Layer:** This layer aggregates the results of the neurons in the layer preceding it. Its primary task is to reduce the number of parameters under consideration and give a streamlined output.
- **Fully-Connected Layer:** This is the final output layer of a CNN model; it flattens the input data received from the layers before it and produces the result.
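
As referenced above, here is a minimal sketch combining all four layer types, assuming the Keras API; the input size and class count are illustrative assumptions, not a tuned architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Minimal CNN using all four layer types (illustrative sketch)
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),                    # RGB input (assumed size)
    layers.Conv2D(32, kernel_size=3, activation="relu"),  # convolution + ReLU
    layers.MaxPooling2D(pool_size=2),                     # pooling
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                                     # flatten for the dense head
    layers.Dense(10, activation="softmax"),               # fully-connected output (10 classes assumed)
])
model.summary()
```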
## Links

- [A Complete Guide to Image Classification in 2024 - viso.ai](https://viso.ai/computer-vision/image-classification/)
# Model Building Stages

## 1. Define the Problem

Clearly define the goal of the project: **to build a CV model that detects aflatoxin contamination levels in corn samples through image analysis**. The contamination levels will be categorized into predefined bands such as 0-30 ppb, 31-50 ppb, etc.

- **Output**: Classification of aflatoxin levels into one of the specified categories (a minimal band-binning sketch follows this list).
- **Performance Target**: Achieve at least 80% accuracy in classifying contamination levels.
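
As a small illustration of the banding (the edges beyond 50 ppb are hypothetical, not client-specified), measured ppb values map to class labels like so:

```python
import numpy as np

BAND_EDGES = [30, 50, 100]  # hypothetical upper edges; the real bands come from the client
BAND_LABELS = ["0-30 ppb", "31-50 ppb", "51-100 ppb", ">100 ppb"]

def ppb_to_band(ppb: float) -> str:
    """Map a ppb measurement to its contamination band (illustrative)."""
    return BAND_LABELS[int(np.digitize(ppb, BAND_EDGES, right=True))]

print(ppb_to_band(42.0))  # -> "31-50 ppb"
```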
## 2. Collect and Label Data

The success of a CV model depends heavily on the quality and quantity of data:

- **Image Dataset**: Obtain a dataset of corn images provided by the client, with images labeled based on the aflatoxin contamination levels.
- **Data Labels**: Ensure that each image has a label that specifies the contamination level (in ppb). These will serve as the ground truth for training the model.
- **Data Size**: Ensure the dataset is large enough to prevent overfitting. If the dataset is small, consider techniques like **data augmentation** to artificially increase the dataset size.
## 3. Preprocess the Data

Preprocessing the images is essential to standardize the input data for the model:

- **Normalization**: Scale pixel values to a range of `[0, 1]` or `[-1, 1]` to help the model converge faster.
- **Resizing**: Resize all images to a fixed resolution (e.g., 224x224 pixels) to ensure consistency in input size.
- **Augmentation**: Apply image augmentation techniques (e.g., rotation, flipping, zoom, brightness adjustments) to make the model more robust to variations in real-world conditions.
- **Train-Validation Split**: Split the dataset into training and validation sets (e.g., 80% training, 20% validation) to evaluate model performance during development. A combined sketch of these steps follows this list.
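
The list above maps almost one-to-one onto **Keras ImageDataGenerator** (one of the tools named at the end of this document). A minimal sketch, assuming a hypothetical `data/corn_images` directory with one subfolder per contamination band:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Normalization, augmentation, and an 80/20 train-validation split in one generator
datagen = ImageDataGenerator(
    rescale=1.0 / 255,            # normalization to [0, 1]
    rotation_range=20,            # augmentation: rotation
    horizontal_flip=True,         # augmentation: flipping
    zoom_range=0.2,               # augmentation: zoom
    brightness_range=(0.8, 1.2),  # augmentation: brightness adjustments
    validation_split=0.2,         # 20% held out for validation
)

train_gen = datagen.flow_from_directory(
    "data/corn_images",           # hypothetical path; one subfolder per band
    target_size=(224, 224),       # resizing to a fixed resolution
    class_mode="categorical",
    subset="training",
)
val_gen = datagen.flow_from_directory(
    "data/corn_images",
    target_size=(224, 224),
    class_mode="categorical",
    subset="validation",
    shuffle=False,                # keep order stable for evaluation later
)
```

In practice, a separate augmentation-free generator is often used for the validation subset; the single-generator version above keeps the sketch short.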
## 4. Choose a Model Architecture

For image classification tasks, **Convolutional Neural Networks (CNNs)** are the most commonly used architectures:

- **Pre-trained Models (Transfer Learning)**:
    - Use pre-trained models like **ResNet**, **MobileNet**, or **EfficientNet** to leverage knowledge from large datasets like ImageNet. This can reduce training time and improve accuracy.
    - **Transfer Learning**: Fine-tune the pre-trained model on your specific dataset by replacing the final layer(s) to output the aflatoxin contamination categories (see the sketch after this list).
- **Custom CNN Architecture**:
    - If transfer learning isn’t sufficient, a custom CNN architecture can be built. Design layers that fit the complexity of your data, including convolutional layers, pooling layers, and fully connected layers.
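
A sketch of the transfer-learning option, assuming Keras with a **MobileNetV2** backbone and four contamination bands (both assumptions, not client specifications):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_BANDS = 4  # assumed number of contamination categories

# Pre-trained ImageNet backbone, frozen for the first training phase
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False

# Replace the final layer(s) with a head sized for the aflatoxin bands
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(NUM_BANDS, activation="softmax"),
])
```

Unfreezing some of the top backbone layers at a lower learning rate is the usual second fine-tuning phase.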
## 5. Train the Model

Now that the data is prepared and the model architecture is selected, proceed to training:

- **Loss Function**: Use **categorical cross-entropy** as the loss function since this is a multi-class classification problem.
- **Optimizer**: Use optimizers like **Adam** or **SGD** with momentum to adjust learning rates and improve convergence.
- **Batch Size & Epochs**: Experiment with different batch sizes (e.g., 32, 64) and run multiple epochs (e.g., 50-100 epochs). Monitor overfitting using early stopping techniques.
- **Hyperparameter Tuning**: Fine-tune hyperparameters like learning rate, dropout rate, and number of layers to optimize performance. A training sketch follows this list.
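
Continuing the sketches above (`model`, `train_gen`, `val_gen`), training with categorical cross-entropy, Adam, and early stopping might look like this; the epoch count and patience are illustrative values:

```python
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",  # multi-class classification
    metrics=["accuracy"],
)

# Stop when validation loss plateaus and keep the best weights seen so far
early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

history = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=50,                # e.g., 50-100; early stopping usually ends training sooner
    callbacks=[early_stop],
)
```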
## 6. Evaluate the Model

After training, evaluate the model to ensure it meets the desired performance criteria:

- **Confusion Matrix**: Generate a confusion matrix to analyze how well the model performs across all contamination bands (e.g., 0-30 ppb, 31-50 ppb).
- **Performance Metrics**: Evaluate key metrics like accuracy, precision, recall, and F1-score for each class. For imbalanced datasets, consider using **weighted precision/recall**.
- **Cross-Validation**: Perform **k-fold cross-validation** to ensure that the model generalizes well across different subsets of the data. An evaluation sketch follows this list.
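
A sketch of the evaluation step with **scikit-learn** (named in the tools list), reusing the non-shuffled validation generator from the preprocessing sketch:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Predicted class per validation image (val_gen was created with shuffle=False)
probs = model.predict(val_gen)
y_pred = np.argmax(probs, axis=1)
y_true = val_gen.classes  # ground-truth indices in the same order

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=list(val_gen.class_indices)))
```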
## 7. Improve the Model

If the model does not meet the performance goals, several techniques can be used to improve it:

- **Data Augmentation**: Further enhance the dataset by introducing more variability in the training data.
- **Model Regularization**: Use techniques like **dropout**, **batch normalization**, or **L2 regularization** to prevent overfitting.
- **Hyperparameter Tuning**: Use methods like **grid search** or **random search** to find optimal values for hyperparameters (e.g., learning rate, batch size).
- **Ensemble Methods**: Combine multiple models (e.g., bagging or boosting) to improve prediction accuracy. A regularization sketch follows this list.
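
As a sketch of the regularization options mentioned above (values are illustrative, not tuned), a classification head combining L2 weight decay, batch normalization, and dropout:

```python
from tensorflow.keras import layers, models, regularizers

head = models.Sequential([
    layers.Dense(
        128,
        activation="relu",
        kernel_regularizer=regularizers.l2(1e-4),  # L2 regularization on weights
    ),
    layers.BatchNormalization(),            # normalizes activations, stabilizing training
    layers.Dropout(0.5),                    # randomly zeroes units to reduce overfitting
    layers.Dense(4, activation="softmax"),  # assumed 4 contamination bands
])
```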
## 8. Test the Model

Once the model is fine-tuned and evaluated, test its performance on a **holdout test set** or new data provided by the client:

- **Validation on New Data**: Use unseen images from the client’s dataset to ensure that the model generalizes well to real-world samples.
- **Performance Metrics Report**: Document the model's final accuracy, confusion matrix, and other performance metrics.
## 9. Deploy the Model (For POC)

For the POC phase, the model will be deployed in a hosted environment (cloud or on-prem):

- **Deploy on Vendor's Environment**: Host the model on a cloud server (e.g., AWS, Azure) where it can accept image inputs and return aflatoxin contamination levels via an API (a minimal API sketch follows this list).
- **Performance Monitoring**: Set up tools to monitor inference time, model accuracy, and resource utilization to ensure smooth operation.
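
One possible shape for the POC endpoint, sketched with **FastAPI** (the framework, model filename, and band labels are all assumptions; any HTTP framework would do):

```python
import io

import numpy as np
from fastapi import FastAPI, File, UploadFile
from PIL import Image
from tensorflow.keras.models import load_model

app = FastAPI()
model = load_model("aflatoxin_model.keras")  # hypothetical saved-model path
BANDS = ["0-30 ppb", "31-50 ppb", "51-100 ppb", ">100 ppb"]  # assumed labels

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    """Accept an image upload and return the predicted contamination band."""
    img = Image.open(io.BytesIO(await file.read())).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype="float32")[None, ...] / 255.0  # match training preprocessing
    probs = model.predict(x)[0]
    return {"band": BANDS[int(np.argmax(probs))], "confidence": float(np.max(probs))}
```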
## 10. Document and Report Results

After deployment, prepare a comprehensive report to present to the client:

- **POC Results**: Include detailed results of the model’s performance (e.g., accuracy, confusion matrix).
- **Recommendations for Future Phases**: Provide insights on how the model can be scaled and improved further in Phase 2 (e.g., mobile app integration, on-device inference).
## Tools and Technologies for Each Step

1. **Preprocessing & Data Augmentation**:
    - Tools: **OpenCV**, **Keras ImageDataGenerator**, **Albumentations**
2. **Model Development**:
    - Tools: **TensorFlow**, **Keras**, **PyTorch** (for building CNNs and transfer learning)
3. **Training & Optimization**:
    - Optimizers: **Adam**, **SGD**
    - Techniques: Early stopping, learning rate scheduling
4. **Evaluation**:
    - Tools: **scikit-learn** (for confusion matrices and performance metrics)
5. **Deployment**:
    - Tools: **AWS SageMaker**, **Azure ML**, or **Google AI Platform**