updated docs

deepaksood619 · Oct 18, 2024 · 18ba102
1 parent 01d54c1
Showing 71 changed files with 1,015 additions and 121 deletions.
diff --git a/docs/.obsidian/graph.json b/docs/.obsidian/graph.json
@@ -17,6 +17,6 @@
   "repelStrength": 10,
   "linkStrength": 1,
   "linkDistance": 250,
-  "scale": 0.1722256118200868,
+  "scale": 0.1279275687532539,
   "close": true
-}
+}
diff --git a/docs/about-me/projects/45-traditional-ai-case-studies.md b/docs/about-me/projects/45-traditional-ai-case-studies.md
@@ -0,0 +1,86 @@
+# Traditional AI Case Studies
+
+## Financial Technology (FinTech) Fraud Detection Case Study
+
+### Challenges
+
+A FinTech company was struggling with real-time fraud detection as the transaction volumes grew. Their current systems failed to maintain high accuracy, which posed a risk of financial losses and reduced customer trust.
+
+### Solution by Opstree
+
+Opstree partnered with the company to deliver a tailored AI-powered fraud detection solution. The solution leveraged **Amazon SageMaker** for model training and deployment, **Amazon Fraud Detector** for identifying suspicious activities, and proprietary models that combined supervised and unsupervised learning techniques, along with **scikit-learn** and **PyTorch** for custom fraud detection algorithms.
+
+Opstree further integrated **Amazon Textract** and **Amazon Rekognition** to strengthen document verification and identity validation processes, while adding advanced clustering techniques to uncover fraud networks. Continuous retraining pipelines were also implemented using **Amazon SageMaker**, ensuring the model stayed accurate over time.
+
+### Results
+
+Opstree's AI solution led to enhanced fraud detection with fewer false positives and negatives. The real-time detection capabilities and use of advanced services helped the company quickly identify fraudulent patterns and mitigate financial risks.
+
+- Reduced NPAs from 9% to 6%.
+- Decreased monthly financial losses by ₹2 Cr.
+- Faster fraud detection enabled quick action in high-risk areas.
+
+### Tools Used
+
+- **Amazon SageMaker**
+- **Amazon Fraud Detector**
+- **Amazon Textract**
+- **Amazon Rekognition**
+- Python (scikit-learn, PyTorch)
+
+---
+
+## Financial Technology (FinTech) Credit Risk Analysis and Modeling Case Study
+
+### Challenges
+
+The FinTech company needed to expedite loan approvals while reducing Non-Performing Assets (NPAs). Traditional credit risk methods resulted in slower processing times and suboptimal loan terms, negatively impacting customer satisfaction.
+
+### Solution by Opstree
+
+Opstree developed an AI-based credit risk analysis solution built on supervised models, including **logistic regression**, with **PSI/CSI tracking** (Population Stability Index and Characteristic Stability Index) to detect drift that would erode model accuracy. Using **scikit-learn**, Opstree trained custom credit risk models and deployed them using **Python (FastAPI)**, which enabled efficient real-time decision-making.
+
+Additionally, **Amazon SageMaker** was used to automate model retraining, while **PowerBI** provided detailed analytics and dashboards to track credit risk performance. Opstree also incorporated clustering algorithms to assess borrower risk profiles based on alternative data sources such as transactional patterns and social profiles.
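
The PSI/CSI tracking mentioned in the solution can be illustrated with a small sketch. This is not Opstree's implementation; it is a minimal example of the standard Population Stability Index formula, the sum over score bins of (actual - expected) * ln(actual / expected). The function name `population_stability_index`, the ten equal-width bins, and the `eps` floor for empty bins are illustrative assumptions, not details from the case study. Applying the same calculation to a single input feature instead of the model score gives the Characteristic Stability Index (CSI).

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a baseline sample (expected) and a recent sample (actual).

    Bins are equal-width and derived from the baseline range (an assumption;
    quantile bins are another common choice).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # degenerate baseline: everything falls into bin 0

    def bin_proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    e = bin_proportions(expected)
    a = bin_proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical scores: a training-time baseline and a shifted recent sample.
baseline_scores = [i / 100 for i in range(100)]
recent_scores = [min(s + 0.3, 0.99) for s in baseline_scores]
drift = population_stability_index(baseline_scores, recent_scores)
```

A commonly cited rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift that may warrant retraining, although suitable thresholds depend on the portfolio.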
+
+### Results
+
+The credit risk AI solution revolutionized the company's lending operations, improving loan approval speed and accuracy while reducing NPAs. The detailed analytics allowed the company to optimize lending terms and improve customer satisfaction.
+
+- Reduced NPAs from 9% to 6%.
+- Reduced delinquency rates by 5%.
+- Reduced reliance on external credit scores by 25% through in-house AI models.
+
+### Tools Used
+
+- **Python (FastAPI)**
+- **scikit-learn**
+- **Amazon SageMaker**
+- **PowerBI** for analytics
+
+---
+
+## IoT Predictive Maintenance for HVAC Systems Case Study
+
+### Challenges
+
+An HVAC company needed a proactive maintenance strategy to reduce equipment downtime and operational costs. Their traditional reactive approach led to frequent breakdowns and costly repairs.
+
+### Solution by Opstree
+
+Opstree implemented an AI-driven predictive maintenance solution based on **time-series modeling**. Using **ARIMA** models developed with **scikit-learn**, Opstree analyzed data from IoT sensors monitoring system performance. The solution predicted potential failures, allowing the company to address issues before they escalated.
+
+By leveraging real-time monitoring with integrated alerts, the maintenance team could take preventive actions, reducing equipment breakdowns and optimizing overall system performance.
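
As a rough illustration of the time-series idea, and not the production model, the sketch below fits an ARIMA(1,1,0) model without a constant term by conditional least squares: it differences the sensor series once and regresses each change on the previous change. It is written in plain Python so it runs without dependencies; in practice ARIMA models are usually fitted with a dedicated library such as statsmodels (scikit-learn itself does not include an ARIMA estimator). The function names, the sample readings, and the fixed `tolerance` alert rule are hypothetical.

```python
def fit_ar1_on_differences(series):
    """Estimate phi in (y[t] - y[t-1]) = phi * (y[t-1] - y[t-2]) + noise."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    numerator = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    denominator = sum(prev * prev for prev in diffs[:-1])
    return numerator / denominator if denominator else 0.0


def forecast_next(series, phi):
    """One-step-ahead forecast: last value plus phi times the last change."""
    return series[-1] + phi * (series[-1] - series[-2])


def is_anomalous(series, new_reading, phi, tolerance):
    """Flag a reading that deviates from the forecast by more than tolerance."""
    return abs(new_reading - forecast_next(series, phi)) > tolerance


# Hypothetical sensor readings that rise steadily by 2 units per interval.
readings = [0.0, 2.0, 4.0, 6.0, 8.0]
phi = fit_ar1_on_differences(readings)
expected_next = forecast_next(readings, phi)
```

An alerting layer could call `is_anomalous` on each incoming reading and notify the maintenance team when it returns `True`; a real system would more likely derive the tolerance from the residual variance than use a fixed value.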
+
+### Results
+
+With Opstree’s predictive maintenance solution, the company achieved significant reductions in operational costs and equipment failures, while optimizing resource allocation and improving customer satisfaction.
+
+- Reduced equipment downtime by 25%.
+- Increased energy savings by 20%.
+- Improved compliance rates by 30%.
+
+### Tools Used
+
+- **scikit-learn** (ARIMA)
+- **Amazon SageMaker**
+- **Time-series analysis** models
diff --git a/...me/projects/46-podcast-streamlining-cloud-migration-through-data-engineering.md b/...me/projects/46-podcast-streamlining-cloud-migration-through-data-engineering.md
@@ -7,7 +7,7 @@ From designing robust data pipelines to managing complex transformations and opt
 ### Podcast Details
 
 - Speaker - Deepak Sood, Sr. AI, Data & DevOps Architect, OpsTree Solutions
-- Date - 15th October 2024
+- Date - Tuesday, 15th October 2024
 - Time - 6:30 pm – 8:00 pm (IST)
 
 ### Workshop Agenda

diff --git a/docs/about-me/projects/47-genai-case-study-careers360.md b/docs/about-me/projects/47-genai-case-study-careers360.md
@@ -44,3 +44,17 @@ Opstree’s solution empowered Careers360’s content team to focus on higher-le
 ## Conclusion
 
 With Opstree’s GenAI-driven solution using Amazon Bedrock, Careers360 was able to transform its content operations, achieving greater efficiency and saving significant research time for its large team of content creators.
+
+## Why Choose OpsTree
+
+### 1. Expertise in Custom GenAI Solutions
+
+OpsTree specializes in designing and deploying tailored GenAI solutions, like Retrieval-Augmented Generation (RAG), that cater to your unique business needs. Leveraging platforms like **Amazon Bedrock**, we create models fine-tuned to your data and workflows, ensuring the AI-generated insights are relevant, actionable, and optimized for your specific use cases.
+
+### 2. Seamless Integration with AWS Ecosystem
+
+As AWS experts, OpsTree excels in integrating GenAI models with existing AWS services such as S3, Lambda, SageMaker, and more. This enables end-to-end automation, real-time updates, and seamless scalability, making your GenAI solution robust and future-proof while benefiting from AWS’s secure infrastructure.
+
+### 3. Proven Track Record with Scalable AI Solutions
+
+OpsTree has a proven history of successfully delivering AI-driven projects, including large-scale content research systems like the one implemented for Careers360. Our expertise in deploying **scalable, cost-optimized** AI solutions ensures that you receive a high-performance, enterprise-grade system that can handle evolving data demands efficiently.
diff --git a/docs/about-me/projects/readme.md b/docs/about-me/projects/readme.md
@@ -35,6 +35,7 @@
 - [MLOps Case Studies](about-me/projects/64-mlops-case-studies.md)
 - [Case Study: Anomaly Detection in Metric Data using Isolation Forest](about-me/projects/51-case-study-anomaly-detection.md)
 - [GenAI Case Study - Careers360](about-me/projects/47-genai-case-study-careers360.md)
+- [Traditional AI Case Studies](about-me/projects/45-traditional-ai-case-studies.md)
 
 ### Bake.io
 

diff --git a/docs/ai/big-data/tools.md b/docs/ai/big-data/tools.md
@@ -48,6 +48,14 @@ https://www.talend.com
 
 https://www.youtube.com/watch?v=bqa0kB59SUc
 
+## Data on EKS
+
+![Data on EKS](../../media/Pasted%20image%2020241017195034.jpg)
+
+- [Hello from Data on EKS | Data on EKS](https://awslabs.github.io/data-on-eks/)
+- [GitHub - awslabs/data-on-eks: DoEKS is a tool to build, deploy and scale Data & ML Platforms on Amazon EKS](https://github.com/awslabs/data-on-eks)
+- [Introducing Data on EKS – Modernize Data Workloads on Amazon EKS | Containers](https://aws.amazon.com/blogs/containers/introducing-data-on-eks-modernize-data-workloads-on-amazon-eks/)
+
 ## SAAS
 
 - [Atlan](https://atlan.com/) (Enterprise Data Catalogs for DataOps)

diff --git a/docs/ai/data-science/datasets.md b/docs/ai/data-science/datasets.md
@@ -68,3 +68,7 @@ H and DS use similar datasets, and DS is basically the next-gen version of H. Wh
     - [Croissant: a metadata format for ML-ready datasets](https://research.google/blog/croissant-a-metadata-format-for-ml-ready-datasets/)
     - [GitHub - mlcommons/croissant: Croissant is a high-level format for machine learning datasets that brings together four rich layers.](https://github.com/mlcommons/croissant)
 - Cat / Dog - https://bit.ly/ImgClsKeras
+
+## Links
+
+- [5 Free Datasets to Kickstart Your Machine Learning Projects Today - MachineLearningMastery.com](https://machinelearningmastery.com/5-free-datasets-to-kickstart-your-machine-learning-projects-today/)
diff --git a/docs/ai/data-visualization/bi-tools.md b/docs/ai/data-visualization/bi-tools.md
@@ -78,6 +78,15 @@ Amazon QuickSight is built with "SPICE" -- a Super-fast, Parallel, In-memory Cal
 - [Amazon Q in QuickSight: Hands-On Demo for Generative BI and Real-Time Insights | Amazon Web Services - YouTube](https://www.youtube.com/watch?v=CFBlREfSItc)
 - [Amazon Q in QuickSight: 2024 Amazon QuickSight Learning Series - YouTube](https://www.youtube.com/watch?v=ioS4BZyxEK4)
 
+### Topics
+
+[Getting started with Amazon QuickSight Q - Amazon QuickSight](https://docs.aws.amazon.com/quicksight/latest/user/quicksight-q-get-started.html)
+
+- Exclude unused fields
+- Verify friendly field names
+- Add synonyms to fields
+- Review field configurations
+
 ## DataIQ
 
 DataIQ is a business intelligence platform designed to help organizations manage, analyze, and derive insights from their data. It typically combines data governance, analytics, and data science capabilities to enable companies to become more data-driven. DataIQ allows users to access and analyze large datasets, create predictive models, and generate actionable insights, often with a focus on improving business outcomes. It can integrate with various data sources and tools to provide a unified view of data across an organization. The platform is used by data professionals, including data scientists, analysts, and business users, to streamline data operations and decision-making processes.

diff --git a/docs/ai/deep-learning/roadmap.md b/docs/ai/deep-learning/roadmap.md
@@ -3,7 +3,7 @@
 ![complete roadmap to prepare for deep learning](../../media/Screenshot%202024-09-20%20at%2011.18.50%20PM.jpg)
 
 - Foundational - Introduction to Neural Network, Loss Function, Optimizers - Gradient Descent, SGD, Adagrad, RMSProp, Adam
-   	- Everyone is using Adam optimizer, since it is able to change the momentum i.e. the learning rate as your training is going on
+	- Adam is the most widely used optimizer because it combines momentum with a per-parameter learning rate that adapts as training progresses
 - Activation function - ReLU, Sigmoid, Tanh
 - Geoffrey Hinton - popularized the backpropagation algorithm (with Rumelhart and Williams, 1986)
 - Inputs, weights, bias
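
The note about Adam in the list above can be made concrete with a single-parameter sketch of its update rule. Adam keeps a running mean of the gradient (momentum) and a running mean of the squared gradient, and divides one by the square root of the other, so the effective step size adapts during training. The values of `beta1`, `beta2`, and `eps` are the commonly cited defaults; the learning rate of 0.1, the function name `adam_minimize`, and the toy quadratic objective are assumptions chosen for illustration.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a one-dimensional function with Adam, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g      # first moment: momentum
        v = beta2 * v + (1 - beta2) * g * g  # second moment: squared gradient
        m_hat = m / (1 - beta1 ** t)         # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def grad_quadratic(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


x_final = adam_minimize(grad_quadratic, x0=0.0)
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude; this insensitivity to gradient scale is one reason Adam needs less learning-rate tuning than plain SGD.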
@@ -28,8 +28,8 @@
 
 - NLP
 - Sequence to sequence data
-   	- Sentence
-   	- Sales Forecasting
+	- Sentence
+	- Sales Forecasting
 - Neural language translation
 - HuggingFace, Ktrain
 

diff --git a/docs/ai/libraries/deep-learning-frameworks.md b/docs/ai/libraries/deep-learning-frameworks.md
@@ -1,23 +1,5 @@
 # Deep Learning Frameworks
 
-## Apache MXNet
-
-A scalable deep learning framework. Extremely fast and efficient. Capable of scaling across multiple GPUs and multiple machines.
-
-Apache MXNet is an open-source deep learning software framework that trains and deploys deep neural networks. It aims to be scalable, allows fast model training, and supports a flexible programming model and multiple programming languages (including C++, Python, Java, Julia, MATLAB, JavaScript, Go, R, Scala, Perl, and Wolfram Language). The MXNet library is portable and can scale to multiple GPUs and machines. It was co-developed by Carlos Guestrin at the University of Washington, along with GraphLab.
-
-As of September 2023, **it is no longer actively developed**. Apache MXNet was effectively abandoned due to a combination of factors including lack of significant contributions, outdated builds, and a shift in focus by its major backer, Amazon, towards other frameworks like PyTorch. The project saw no new releases for over a year, and there were very few pull requests or updates from contributors, leading to its move to the Apache Attic in 2023. Additionally, the community began migrating to other frameworks that offered more robust support and development activity.
-
-https://en.wikipedia.org/wiki/Apache_MXNet
-
-### MXNet Model Server
-
-Model Server for Apache MXNet is a tool for serving neural net models for inference
-
-Model Server for Apache MXNet (MMS) is a flexible and easy to use tool for serving deep learning models exported from [MXNet](http://mxnet.io/) or the Open Neural Network Exchange ([ONNX](http://onnx.ai/)).
-
-https://github.com/awslabs/mxnet-model-server
-
 ## Pytorch
 
 PyTorch (tensors and dynamic neural networks in Python with strong GPU acceleration)
@@ -51,7 +33,6 @@ Open source machine learning library. Often used for neural networks, deep learn
 - **Keras** is not for beginners; it's for rapid deployment and production, and is meant for people who already understand the technology
 - **PyTorch** is great for research implementations, but it is unnecessarily hard to deploy your model into production
 - **TensorFlow** is another great framework for deep learning, but it is slow and memory hungry
-- After using Pytorch/Keras/Tensorflow 2.0, I finally decided that MXNet would be my frameworks of choice for Deep Learning.
 
 ### 1. PyTorch
 
@@ -152,6 +133,24 @@ If you are building a **deep learning computer vision model**, **PyTorch** or **
 
 For traditional machine learning tasks (e.g., using support vector machines or decision trees), **scikit-learn** is a better fit.
 
+## Apache MXNet
+
+A scalable deep learning framework. Extremely fast and efficient. Capable of scaling across multiple GPUs and multiple machines.
+
+Apache MXNet is an open-source deep learning software framework that trains and deploys deep neural networks. It aims to be scalable, allows fast model training, and supports a flexible programming model and multiple programming languages (including C++, Python, Java, Julia, MATLAB, JavaScript, Go, R, Scala, Perl, and Wolfram Language). The MXNet library is portable and can scale to multiple GPUs and machines. It was co-developed by Carlos Guestrin at the University of Washington, along with GraphLab.
+
+As of September 2023, **it is no longer actively developed**. Apache MXNet was effectively abandoned due to a combination of factors including lack of significant contributions, outdated builds, and a shift in focus by its major backer, Amazon, towards other frameworks like PyTorch. The project saw no new releases for over a year, and there were very few pull requests or updates from contributors, leading to its move to the Apache Attic in 2023. Additionally, the community began migrating to other frameworks that offered more robust support and development activity.
+
+https://en.wikipedia.org/wiki/Apache_MXNet
+
+### MXNet Model Server
+
+Model Server for Apache MXNet is a tool for serving neural net models for inference
+
+Model Server for Apache MXNet (MMS) is a flexible and easy to use tool for serving deep learning models exported from [MXNet](http://mxnet.io/) or the Open Neural Network Exchange ([ONNX](http://onnx.ai/)).
+
+https://github.com/awslabs/mxnet-model-server
+
 ## Links
 
 - https://www.kaggle.com/learn-forum/90594

diff --git a/docs/ai/libraries/mlops-model-deployment.md b/docs/ai/libraries/mlops-model-deployment.md
@@ -156,3 +156,4 @@ https://www.seldon.io
 - [Let’s Architect! Learn About Machine Learning on AWS | AWS Architecture Blog](https://aws.amazon.com/blogs/architecture/lets-architect-learn-about-machine-learning-on-aws/)
 - [AWS re:Invent 2023 - Introduction to MLOps engineering on AWS (TNC215) - YouTube](https://www.youtube.com/watch?v=2kzJPhgDkDE)
 - [AWS re:Invent 2023 - Zero to machine learning: Jump-start your data-driven journey (SMB204) - YouTube](https://www.youtube.com/watch?v=-CSrOKo8Qgs)
+- [Step-by-Step Guide to Deploying ML Models with Docker](https://www.kdnuggets.com/step-by-step-guide-to-deploying-ml-models-with-docker)
diff --git a/docs/ai/libraries/tools.md b/docs/ai/libraries/tools.md
@@ -186,6 +186,14 @@ https://www.cortex.dev
 - https://explosion.ai/software
 - https://web.superquery.io
 - [Announcing New Tools for Building with Generative AI on AWS | AWS Machine Learning Blog](https://aws.amazon.com/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/)
+- [7 Free Machine Learning Tools Every Beginner Should Master in 2024 - MachineLearningMastery.com](https://machinelearningmastery.com/7-free-machine-learning-tools-every-beginner-should-master-in-2024)
+	- Scikit-learn
+	- Great Expectations
+	- MLflow
+	- DVC (Data Version Control)
+	- SHAP (SHapley Additive exPlanations)
+	- FastAPI
+	- Docker
 
 ## SAAS Tools
 

diff --git a/docs/ai/llm/llm-building.md b/docs/ai/llm/llm-building.md
@@ -199,3 +199,4 @@ An LLM Agent is a software entity capable of reasoning and autonomously executin
 - [Let's reproduce GPT-2 (124M) - YouTube](https://www.youtube.com/watch?v=l8pRSuU81PU)
 - [Scaling and Reliability Challenges of LLama3](https://mlops.substack.com/p/scaling-and-reliability-challenges)
 - [Building LLMs from the Ground Up: A 3-hour Coding Workshop - YouTube](https://www.youtube.com/watch?v=quh7z1q7-uc&ab_channel=SebastianRaschka)
+- [How AWS engineers infrastructure to power generative AI](https://www.aboutamazon.com/news/aws/aws-infrastructure-generative-ai)
diff --git a/docs/ai/llm/models.md b/docs/ai/llm/models.md
@@ -109,6 +109,11 @@ Emotional prompting example - You are Dolphin, an uncensored and unbiased Al ass
 - [GitHub - tatsu-lab/alpaca\_eval: An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.](https://github.com/tatsu-lab/alpaca_eval)
 - [A Gentle Introduction to LLM Evaluations - Elena Samuylova - YouTube](https://www.youtube.com/live/ac6ZB5QEwGU)
 - [Eureka: OSS Framework to evaluate LLMs - by Bugra Akyildiz](https://mlops.substack.com/p/eureka-oss-framework-to-evaluate)
+- [The Needle In a Haystack Test. Evaluating the performance of RAG… | by Aparna Dhinakaran | Towards Data Science](https://towardsdatascience.com/the-needle-in-a-haystack-test-a94974c1ad38)
+	- [GitHub - gkamradt/LLMTest\_NeedleInAHaystack: Doing simple retrieval from LLM models at various context lengths to measure accuracy](https://github.com/gkamradt/LLMTest_NeedleInAHaystack)
+	- [The Needle In a Haystack Test: Evaluating the Performance of LLM RAG Systems - Arize AI](https://arize.com/blog-course/the-needle-in-a-haystack-test-evaluating-the-performance-of-llm-rag-systems/)
+	- [Unlocking precision: The "Needle-in-a-Haystack" test for LLM evaluation](https://labelbox.com/guides/unlocking-precision-the-needle-in-a-haystack-test-for-llm-evaluation/)
+	- [The Needle in the Haystack Test and How Gemini Pro Solves It | Google Cloud Blog](https://cloud.google.com/blog/products/ai-machine-learning/the-needle-in-the-haystack-test-and-how-gemini-pro-solves-it)
 
 ### Tools
 

diff --git a/docs/ai/llm/tools.md b/docs/ai/llm/tools.md
@@ -20,6 +20,7 @@
 - [The Amazing AI Super Tutor for Students and Teachers | Sal Khan | TED - YouTube](https://www.youtube.com/watch?v=hJP5GqnTrNo&ab_channel=TED)
 - [WebChatGPT: ChatGPT with internet access | Chrome Web Store - Extensions](https://chrome.google.com/webstore/detail/webchatgpt-chatgpt-with-i/lpfemeioodjbpieminkklglpmhlngfcn/related)
 - [NotebookLM | Note Taking & Research Assistant Powered by AI](https://notebooklm.google/)
+	- [How to Create YouTube Video Study Guides with NotebookLM - KDnuggets](https://www.kdnuggets.com/how-to-create-youtube-video-study-guides-with-notebooklm)
 
 ## AI Generators
 

diff --git a/docs/ai/ml-fundamentals/intro.md b/docs/ai/ml-fundamentals/intro.md
@@ -124,3 +124,4 @@ https://betterexplained.com/articles/intuitive-convolution
 - https://www.toptal.com/machine-learning/interview-questions
 - [Mathematics of Machine Learning](https://www.youtube.com/watch?v=8onB7rPG4Pk)
 - [A friendly introduction to linear algebra for ML (ML Tech Talks) - YouTube](https://www.youtube.com/watch?v=LlKAna21fLE)
+- [A Roadmap for Your Machine Learning Career - MachineLearningMastery.com](https://machinelearningmastery.com/a-roadmap-for-your-machine-learning-career/)
diff --git a/docs/ai/nlp/word-embedding-to-transformers.md b/docs/ai/nlp/word-embedding-to-transformers.md
@@ -684,3 +684,8 @@ Mamba is a linear-time language model that outperforms Transformers on various t
 [Mamba: The Easy Way](https://jackcook.com/2024/02/23/mamba.html)
 
 [Mamba Explained](https://thegradient.pub/mamba-explained/)
+
+## Links
+
+- [State Space Sequence Models over Transformers?](https://mlops.substack.com/p/state-space-sequence-models-over)
+- [Introduction - Hugging Face NLP Course](https://huggingface.co/learn/nlp-course/chapter1/1)