updated docs

deepaksood619 · Sep 17, 2024 · ce4e8d5
1 parent e2ae0d9
commit ce4e8d5

Showing 23 changed files with 327 additions and 124 deletions.
diff --git a/docs/ai/computer-vision-cv/convolutional-neural-network-cnn.md b/docs/ai/computer-vision-cv/convolutional-neural-network-cnn.md
@@ -0,0 +1,37 @@
+# Convolutional Neural Network (CNN)
+
+## Neural Networks
+
+Among [deep neural networks (DNNs)](https://viso.ai/deep-learning/deep-neural-network-three-popular-types/), the [convolutional neural network (CNN)](https://viso.ai/deep-learning/convolutional-neural-networks/) has demonstrated excellent results in computer vision tasks, especially image classification. CNNs are a special type of multi-layer neural network inspired by the human visual system.
+
+In 2012, a large deep convolutional neural network called [AlexNet](https://viso.ai/deep-learning/alexnet/) won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a wide margin. This marked the start of the broad use and development of CNN models such as [VGGNet](https://viso.ai/deep-learning/vgg-very-deep-convolutional-networks/), [GoogLeNet](https://viso.ai/deep-learning/googlenet-explained-the-inception-model-that-won-imagenet/), [ResNet](https://viso.ai/deep-learning/resnet-residual-neural-network/), DenseNet, and many more.
+
+## Convolutional Neural Network (CNN)
+
+A CNN is a neural network architecture that learns its feature extractors directly from training data instead of relying on hand-engineered features.
+
+CNNs need relatively little pre-processing. They learn and adapt their own image filters, which in most traditional algorithms and models have to be carefully designed by hand. A CNN is built from a stack of layers, each of which performs a particular function.
+
+## CNN Architecture
+
+The basic unit of a CNN is a neuron. The concept is loosely based on biological neurons and their [activation](https://viso.ai/deep-learning/neuron-activation/). An artificial neuron is a mathematical function that computes a weighted sum of its inputs (plus a bias) and applies an activation function to the result. A layer is a group of neurons, and each layer has a particular function.
+
+
+![Concept of a neural network](../../media/Pasted%20image%2020240917123040.png)
+
+## CNN Layers
+
+A CNN may have anywhere from 3 to 150 or more layers; the “deep” in deep neural networks refers to the number of layers. Each layer’s output serves as the next layer’s input. Examples of deep multi-layer neural networks include [ResNet50 (50 layers) and ResNet101 (101 layers)](https://viso.ai/deep-learning/resnet-residual-neural-network/).
+
+![Concept of a Convolutional Neural Network (CNN)](../../media/Pasted%20image%2020240917123109.png)
+
+CNN layers can be of four main types: Convolution Layer, ReLU Layer, Pooling Layer, and Fully-Connected Layer.
+
+- **Convolution Layer:** A convolution slides a filter over the input and computes a weighted sum at each position, producing an activation map. The convolution layer has a set of trainable filters that each cover a small receptive field but extend through the full depth of the input. Convolution layers are the major building blocks of convolutional neural networks.
+- **ReLU Layer:** ReLU (rectified linear unit) layers apply the activation function `max(0, x)` to every value. This introduces non-linearity and, compared with saturating activations such as sigmoid, reduces the vanishing-gradient problem, so models that use ReLU are generally easier to train.
+- **Pooling Layer:** This layer downsamples the output of the preceding layer by summarizing each small region, for example with its maximum or average. Its primary task is to reduce the number of values passed on to later layers and to give a more compact output.
+- **Fully-Connected Layer:** This layer usually comes last in a CNN: it flattens the feature maps received from the preceding layers and maps them to the final result, such as class scores.
+
+## Links
+
+- [A Complete Guide to Image Classification in 2024 - viso.ai](https://viso.ai/computer-vision/image-classification/)
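The four layer types listed under "CNN Layers" in `convolutional-neural-network-cnn.md` can be illustrated with a minimal pure-Python sketch. It is not how any particular library implements them: the function names, the 5x5 input, the 2x2 filter, and the fully-connected weights are all made up for illustration, and the "convolution" is the unpadded, stride-1 cross-correlation that deep learning libraries typically use.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and take weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Apply max(0, x) to every value."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; edge values that do not fill a window are dropped."""
    h = len(feature_map) // size
    w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(w)
        ]
        for i in range(h)
    ]


def fully_connected(feature_map, weights, biases):
    """Flatten the feature map, then compute one weighted sum plus bias per output."""
    flat = [v for row in feature_map for v in row]
    return [sum(w * x for w, x in zip(ws, flat)) + b
            for ws, b in zip(weights, biases)]


# Hypothetical 5x5 single-channel image and a 2x2 diagonal-difference filter.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 0],
    [2, 1, 0, 1, 1],
    [0, 3, 1, 0, 2],
]
kernel = [[1, 0], [0, -1]]

activation = relu(conv2d(image, kernel))   # 4x4 feature map, negatives clipped to 0
pooled = max_pool(activation, size=2)      # 2x2 after downsampling
scores = fully_connected(
    pooled,
    weights=[[1, 1, 1, 1], [0, 2, 0, -1]],  # made-up weights for two output classes
    biases=[0, 1],
)
print(pooled)  # [[0, 3], [0, 1]]
print(scores)  # [4, 6]
```

In a real CNN the filter values and fully-connected weights are learned during training rather than written by hand, and each layer operates on many channels at once.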
diff --git a/docs/ai/computer-vision-cv/intro.md b/docs/ai/computer-vision-cv/intro.md
@@ -82,12 +82,8 @@ https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
 
 ## References
 
-[Self Driving Nanodegree](courses/self-driving-nanodegree.md)
-
-https://towardsdatascience.com/understanding-ssd-multibox-real-time-object-detection-in-deep-learning-495ef744fab
-
-https://www.freecodecamp.org/news/advanced-computer-vision-with-python
-
-[Comic book panel segmentation • Max Halford](https://maxhalford.github.io/blog/comic-book-panel-segmentation/)
-
-[Unbelievable Face Swapping with 5 Lines Code - YouTube](https://www.youtube.com/watch?v=a8vFMaH2aDw)
+- [Self Driving Nanodegree](courses/self-driving-nanodegree.md)
+- https://towardsdatascience.com/understanding-ssd-multibox-real-time-object-detection-in-deep-learning-495ef744fab
+- https://www.freecodecamp.org/news/advanced-computer-vision-with-python
+- [Comic book panel segmentation • Max Halford](https://maxhalford.github.io/blog/comic-book-panel-segmentation/)
+- [Unbelievable Face Swapping with 5 Lines Code - YouTube](https://www.youtube.com/watch?v=a8vFMaH2aDw)
diff --git a/docs/ai/computer-vision-cv/model-building-stages.md b/docs/ai/computer-vision-cv/model-building-stages.md
@@ -0,0 +1,96 @@
+# Model Building Stages
+
+## 1. Define the Problem
+
+Clearly define the goal of the project: **to build a CV model that detects aflatoxin contamination levels in corn samples through image analysis**. The contamination levels will be categorized into predefined bands such as 0-30 ppb, 31-50 ppb, etc.
+
+- **Output**: Classification of aflatoxin levels into one of the specified categories.
+- **Performance Target**: Achieve at least 80% accuracy in classifying contamination levels.
+
+## 2. Collect and Label Data
+
+The success of a CV model depends heavily on the quality and quantity of data:
+
+- **Image Dataset**: Obtain a dataset of corn images provided by the client, with images labeled based on the aflatoxin contamination levels.
+- **Data Labels**: Ensure that each image has a label that specifies the contamination level (in ppb). These will serve as the ground truth for training the model.
+- **Data Size**: Ensure the dataset is large enough to prevent overfitting. If the dataset is small, consider techniques like **data augmentation** to artificially increase the dataset size.
+
+## 3. Preprocess the Data
+
+Preprocessing the images is essential to standardize the input data for the model:
+
+- **Normalization**: Scale pixel values to a range of `[0, 1]` or `[-1, 1]` to help the model converge faster.
+- **Resizing**: Resize all images to a fixed resolution (e.g., 224x224 pixels) to ensure consistency in input size.
+- **Augmentation**: Apply image augmentation techniques (e.g., rotation, flipping, zoom, brightness adjustments) to make the model more robust to variations in real-world conditions.
+- **Train-Validation Split**: Split the dataset into training and validation sets (e.g., 80% training, 20% validation) to evaluate model performance during development.
+
+## 4. Choose a Model Architecture
+
+For image classification tasks, **Convolutional Neural Networks (CNNs)** are the most commonly used architectures:
+
+- **Pre-trained Models (Transfer Learning)**:
+    - Use pre-trained models like **ResNet**, **MobileNet**, or **EfficientNet** to leverage knowledge from large datasets like ImageNet. This can reduce training time and improve accuracy.
+    - **Transfer Learning**: Fine-tune the pre-trained model on your specific dataset by replacing the final layer(s) to output the aflatoxin contamination categories.
+- **Custom CNN Architecture**:
+    - If transfer learning isn’t sufficient, a custom CNN architecture can be built. Design layers that fit the complexity of your data, including convolutional layers, pooling layers, and fully connected layers.
+
+## 5. Train the Model
+
+Now that the data is prepared and the model architecture is selected, proceed to training:
+
+- **Loss Function**: Use **categorical cross-entropy** as the loss function since this is a multi-class classification problem.
+- **Optimizer**: Use an optimizer such as **Adam** or **SGD** with momentum to update the model weights. Adam adapts the learning rate per parameter, and momentum helps SGD converge faster.
+- **Batch Size & Epochs**: Experiment with different batch sizes (e.g., 32, 64) and run multiple epochs (e.g., 50-100 epochs). Monitor the validation loss and use early stopping to limit overfitting.
+- **Hyperparameter Tuning**: Fine-tune hyperparameters like learning rate, dropout rate, and number of layers to optimize performance.
+
+## 6. Evaluate the Model
+
+After training, evaluate the model to ensure it meets the desired performance criteria:
+
+- **Confusion Matrix**: Generate a confusion matrix to analyze how well the model performs across all contamination bands (e.g., 0-30 ppb, 31-50 ppb).
+- **Performance Metrics**: Evaluate key metrics like accuracy, precision, recall, F1-score for each class. For imbalanced datasets, consider using **weighted precision/recall**.
+- **Cross-Validation**: Perform **k-fold cross-validation** to ensure that the model generalizes well across different subsets of the data.
+
+## 7. Improve the Model
+
+If the model does not meet the performance goals, several techniques can be used to improve it:
+
+- **Data Augmentation**: Further enhance the dataset by introducing more variability in the training data.
+- **Model Regularization**: Use techniques like **dropout**, **batch normalization**, or **L2 regularization** to prevent overfitting.
+- **Hyperparameter Tuning**: Use methods like **grid search** or **random search** to find optimal values for hyperparameters (e.g., learning rate, batch size).
+- **Ensemble Methods**: Combine multiple models (e.g., bagging or boosting) to improve prediction accuracy.
+
+## 8. Test the Model
+
+Once the model is fine-tuned and evaluated, test its performance on a **holdout test set** or new data provided by the client:
+
+- **Validation on New Data**: Use unseen images from the client’s dataset to ensure that the model generalizes well to real-world samples.
+- **Performance Metrics Report**: Document the model's final accuracy, confusion matrix, and other performance metrics.
+
+## 9. Deploy the Model (For POC)
+
+For the POC phase, the model will be deployed in a hosted environment (cloud or on-prem):
+
+- **Deploy on Vendor's Environment**: Host the model on a cloud server (e.g., AWS, Azure) where it can accept image inputs and return aflatoxin contamination levels via an API.
+- **Performance Monitoring**: Set up tools to monitor inference time, model accuracy, and resource utilization to ensure smooth operation.
+
+## 10. Document and Report Results
+
+After deployment, prepare a comprehensive report to present to the client:
+
+- **POC Results**: Include detailed results of the model’s performance (e.g., accuracy, confusion matrix).
+- **Recommendations for Future Phases**: Provide insights on how the model can be scaled and improved further in Phase 2 (e.g., mobile app integration, on-device inference).
+
+## Tools and Technologies for Each Step
+
+1. **Preprocessing & Data Augmentation**:
+    - Tools: **OpenCV**, **Keras ImageDataGenerator**, **Albumentations**
+2. **Model Development**:
+    - Tools: **TensorFlow**, **Keras**, **PyTorch** (for building CNNs and transfer learning)
+3. **Training & Optimization**:
+    - Optimizers: **Adam**, **SGD**
+    - Techniques: Early stopping, learning rate scheduling
+4. **Evaluation**:
+    - Tools: **scikit-learn** (for confusion matrices and performance metrics)
+5. **Deployment**:
+    - Tools: **AWS SageMaker**, **Azure ML**, or **Google AI Platform**
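The labeling and evaluation steps in `model-building-stages.md` (stages 1, 2, and 6) can be sketched roughly as follows. This is only an illustration under stated assumptions: the doc names scikit-learn for metrics, but the sketch uses the standard library so it stays self-contained; the helper names are hypothetical; the readings and predictions are invented; and because the doc elides the bands after 31-50 ppb with "etc.", a single catch-all band stands in for them.

```python
from bisect import bisect_left

# Upper edges (ppb) of the two bands named in the doc. The last band is a
# hypothetical catch-all standing in for the bands the doc elides with "etc.".
BAND_UPPER_EDGES = [30, 50]
BAND_NAMES = ["0-30 ppb", "31-50 ppb", ">50 ppb (hypothetical)"]


def ppb_to_band(ppb):
    """Map a measured aflatoxin level (ppb) to a class index."""
    return bisect_left(BAND_UPPER_EDGES, ppb)


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def precision_recall(matrix, cls):
    """Per-class precision and recall; 0.0 when the denominator is empty."""
    tp = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)
    actual = sum(matrix[cls])
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Invented lab readings (ppb) and invented model predictions, for illustration only.
true_ppb = [5, 12, 30, 31, 45, 50, 60, 80, 120, 28]
y_true = [ppb_to_band(v) for v in true_ppb]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=len(BAND_NAMES))
acc = accuracy(cm)
print(cm)           # [[3, 1, 0], [0, 2, 1], [0, 0, 3]]
print(acc >= 0.80)  # True: 8 of 10 correct meets the 80% target

for cls, name in enumerate(BAND_NAMES):
    p, r = precision_recall(cm, cls)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Note that the bands as written (0-30, 31-50) leave readings such as 30.5 ppb unspecified; this sketch assigns them to the higher band, which is an assumption the client would need to confirm.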
diff --git a/docs/ai/computer-vision-cv/pre-trained-models.md b/docs/ai/computer-vision-cv/pre-trained-models.md
@@ -109,7 +109,7 @@ Here's how YOLO works:
 
 - **Grid:** YOLO's CNN divides an image into a grid.
 - **Bounding boxes:** Each cell in the grid predicts a number of bounding boxes.
-- **Class probabilities:** Each cell also predicts a class probability, which indicates the likelihood of an object being present in the box. 
+- **Class probabilities:** Each cell also predicts a class probability, which indicates the likelihood of an object being present in the box.
 
 YOLO is popular because of its single-stage architecture, real-time performance, and accuracy. It's well-suited for real-time applications like self-driving cars, video surveillance, and augmented reality.
 
@@ -139,6 +139,12 @@ YOLO is popular because of its single-stage architecture, real-time performance,
 | BiT-L (ResNet)               | **928 M**                       | 87.54 %                 | 2019 |
 | NoisyStudent EfficientNet-L2 | **480 M**                       | **88.4** %              | 2020 |
 | Meta Pseudo Labels           | **480 M**                       | **90.2** %              | 2021 |
+| CoCa (finetuned)             | 2100 M                          | 91.0 %                  | 2022 |
+| OmniVec (ViT)                |                                 | 92.4 %                  | 2023 |
+
+Leaderboard - [ImageNet Benchmark (Image Classification) | Papers With Code](https://paperswithcode.com/sota/image-classification-on-imagenet)
+
+Models - [Models - Hugging Face](https://huggingface.co/models?pipeline_tag=image-classification)
 
 - [CNN Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet and more… | by Siddharth Das | Analytics Vidhya | Medium](https://medium.com/analytics-vidhya/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5)
 - [Difference between AlexNet, VGGNet, ResNet, and Inception | by Aqeel Anwar | Towards Data Science](https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96)
@@ -160,3 +166,4 @@ YOLO is popular because of its single-stage architecture, real-time performance,
 - [Top Pre-Trained Models for Image Classification - GeeksforGeeks](https://www.geeksforgeeks.org/top-pre-trained-models-for-image-classification/)
 - [Top 4 Pre-Trained Models for Image Classification + Python Code](https://www.analyticsvidhya.com/blog/2020/08/top-4-pre-trained-models-for-image-classification-with-python-code/)
 - [Best deep CNN architectures and their principles: from AlexNet to EfficientNet | AI Summer](https://theaisummer.com/cnn-architectures/)
+- [7 Best Image Classification Models You Should Know in 2023 - Jonas Cleveland](https://jonascleveland.com/best-image-classification-models/)
diff --git a/docs/ai/computer-vision-cv/readme.md b/docs/ai/computer-vision-cv/readme.md
@@ -2,6 +2,8 @@
 
 - [Computer Vision (CV) Intro](ai/computer-vision-cv/intro.md)
 - [Pre-Trained Models](ai/computer-vision-cv/pre-trained-models.md)
+- [Convolutional Neural Network (CNN)](ai/computer-vision-cv/convolutional-neural-network-cnn.md)
+- [Model Building Stages](ai/computer-vision-cv/model-building-stages.md)
 - [Image / Data Labeling Tools](image-data-labeling-tools)
 - [Image Formats](image-formats)
 - [MNIST for ML Beginners | Tensorflow](mnist-for-ml-beginners-tensorflow)

diff --git a/docs/ai/data-science/datasets.md b/docs/ai/data-science/datasets.md
@@ -1,5 +1,7 @@
 # Datasets
 
+[Home - Data Commons](https://datacommons.org/)
+
 https://www.kaggle.com/dalpozz/creditcardfraud
 
 [20+ Amazing (And Free) Data Sources Anyone Can Use To Build AIs](https://www.forbes.com/sites/bernardmarr/2023/05/17/20-amazing-and-free-data-sources-anyone-can-use-to-build-ais/?sh=17c13eec617f)

diff --git a/docs/ai/deep-learning/commands.md b/docs/ai/deep-learning/commands.md
@@ -63,7 +63,7 @@ def L1(yhat, y):
 def L2(yhat, y):
     loss = np.sum(np.dot((y-yhat),(y-yhat)))
 
-A trick when you want to flatten a matrix X of shape (a, b, c, d) to a matrix X_flatten of shape (b*c*d, a) is to use:
+# A trick when you want to flatten a matrix X of shape (a, b, c, d) to a matrix X_flatten of shape (b*c*d, a) is to use:
 
 X_flatten = X.reshape(X.shape[0], -1).T
 ```
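As a quick check of the flatten trick in `commands.md`, the following sketch applies it to a small made-up array. The shape `(2, 3, 4, 5)` is an arbitrary choice for illustration and is not taken from the original notes.

```python
import numpy as np

# Hypothetical batch: a = 2 samples, each of shape (b, c, d) = (3, 4, 5).
X = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# Keep the first axis, collapse the remaining axes into one, then transpose
# so that each column holds one flattened sample.
X_flatten = X.reshape(X.shape[0], -1).T

print(X_flatten.shape)  # (60, 2), i.e. (b*c*d, a)

# Column i is sample i unrolled in row-major order.
assert np.array_equal(X_flatten[:, 0], X[0].ravel())
```

Passing `-1` lets NumPy infer the collapsed dimension (here 3 * 4 * 5 = 60), so the same line works for any sample shape.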