Commit

updated docs
deepaksood619 committed Feb 28, 2024
1 parent 3d72c22 commit 96d4dec
Showing 22 changed files with 629 additions and 1,071 deletions.
30 changes: 30 additions & 0 deletions docs/ai/data-science/data-governance.md
@@ -0,0 +1,30 @@
# Data Governance

Data governance (DG) is the process of managing the availability, usability, integrity and security of the [data](https://searchdatamanagement.techtarget.com/definition/data) in enterprise systems, based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and doesn't get misused. It's increasingly critical as organizations face new data privacy regulations and rely more and more on data analytics to help optimize operations and drive business decision-making.

Ethical Principles around Data

1. Autonomy - The right to control your data, possibly via surrogates
2. Informed consent - You should explicitly approve use of your data based on understanding
3. Beneficence - People using your data should do it for your benefit
4. Non-maleficence - Do no harm

## ODPi

ODPi creates open source standards to help you use and understand data across all platforms.

https://www.odpi.org

https://searchdatamanagement.techtarget.com/definition/data-governance

https://en.wikipedia.org/wiki/Data_governance

https://www.oreilly.com/content/data-governance-and-the-death-of-schema-on-read

![managing sensitive data](../../media/Pasted%20image%2020240228190110.png)

![Data Governance](../../../media/Pasted%20image%2020240213122425.png)

## Links

[Designing Data Governance from the Ground Up • Lauren Maffeo & Samia Rahman • GOTO 2023 - YouTube](https://www.youtube.com/watch?v=A8dVHjRENBQ)
42 changes: 0 additions & 42 deletions docs/ai/data-science/intro.md
@@ -85,10 +85,6 @@ Data handling - querying, slicing, joining

https://brohrer.github.io/data_science_archetypes.html

## Interview Questions

https://www.toptal.com/data-science#hiring-guide

## Questions for any new data science project

1. What is the question you are trying to answer?
@@ -115,53 +111,15 @@ https://www.toptal.com/data-science#hiring-guide
8. Numba (Code Optimization - convert to LLVM)
9. Cython (Code Optimization - compiles to C)

## Data Governance

Data governance (DG) is the process of managing the availability, usability, integrity and security of the [data](https://searchdatamanagement.techtarget.com/definition/data) in enterprise systems, based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and doesn't get misused. It's increasingly critical as organizations face new data privacy regulations and rely more and more on data analytics to help optimize operations and drive business decision-making.

Ethical Principles around Data

1. Autonomy

The right to control your data, possibly via surrogates

2. Informed consent

You should explicitly approve use of your data based on understanding

3. Beneficence

People using your data should do it for your benefit

4. Non-maleficence

Do no harm

## ODPi

ODPi creates open source standards to help you use and understand data across all platforms.

https://www.odpi.org

https://searchdatamanagement.techtarget.com/definition/data-governance

https://en.wikipedia.org/wiki/Data_governance

https://www.oreilly.com/content/data-governance-and-the-death-of-schema-on-read

## Topic Models

The grouping of relevant words is highly suggestive of an abstract theme which is called a topic. Based on the assumption that words that are in the same topic are more likely to occur together, it is possible to attribute phrases or keywords to a particular topic. This allows us to alias a particular topic with a number of phrases and words.

- A topic example - tobacco, farm, crops
- Steps to perform topic modeling

- Remove stop words

- Stripping punctuation

- Bigram collocation detection

- Lemmatization
- Topic dendrograms (for hierarchical clustering)
- Topic graphs
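
The preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: `STOP_WORDS` and `LEMMAS` are tiny hypothetical stand-ins for a real stop-word list and lemmatizer, bigram detection uses raw counts rather than a statistical collocation score, and punctuation is stripped before stop words are removed so that tokens such as `farm,` match correctly.

```python
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real projects use a fuller list).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "on", "is", "are", "to", "for"}

# Toy lemma lookup standing in for a real lemmatizer (hypothetical mapping).
LEMMAS = {"crops": "crop", "farms": "farm", "farmers": "farmer", "grows": "grow"}


def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, then lemmatize."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in cleaned.split() if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]


def frequent_bigrams(docs, min_count=2):
    """Return adjacent token pairs that occur at least min_count times."""
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return {pair for pair, n in counts.items() if n >= min_count}


def merge_bigrams(tokens, bigrams):
    """Join each detected bigram into a single token such as 'tobacco_farm'."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in bigrams:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

Calling `preprocess` on each document, then `frequent_bigrams` on the results, and finally `merge_bigrams` per document yields token lists that could be passed to a topic model of your choice.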
1 change: 1 addition & 0 deletions docs/ai/data-science/readme.md
@@ -3,6 +3,7 @@
- [Intro](ai/data-science/intro.md)
- [Data Mining](data-mining)
- [Data Analysis](data-analysis)
- [Data Governance](ai/data-science/data-governance.md)
- [Datasets](ai/data-science/datasets.md)
- [Recommender System](recommender-system)
- [Topics](topics)
2 changes: 2 additions & 0 deletions docs/ai/libraries/mlops-model-deployment.md
@@ -57,6 +57,8 @@ https://github.com/kubeflow/kubeflow

[sig-mlops/roadmap/2022/MLOpsRoadmap2022.md at main · cdfoundation/sig-mlops · GitHub](https://github.com/cdfoundation/sig-mlops/blob/main/roadmap/2022/MLOpsRoadmap2022.md)

[MLOps Roadmap](https://roadmap.sh/mlops)

### Examples

[GitHub - sayakpaul/ml-deployment-k8s-fastapi: This project shows how to serve an ONNX-optimized image classification model as a web service with FastAPI, Docker, and Kubernetes.](https://github.com/sayakpaul/ml-deployment-k8s-fastapi)
2 changes: 2 additions & 0 deletions docs/ai/readme.md
@@ -18,6 +18,8 @@
- [Courses](courses/readme.md)
- [Others / Resources / Interview](ai/others-resources-interview.md)
- [Hackathons](ai/hackathons.md)
- [Solutions](ai/solutions.md)
- [Social Media Analytics Solution](ai/social-media-analytics-solution.md)

## AGI (Artificial General Intelligence)

23 changes: 23 additions & 0 deletions docs/ai/social-media-analytics-solution.md
@@ -0,0 +1,23 @@
# Social Media Analytics Solution

[Build and deploy a social media analytics solution - Azure Architecture Center | Microsoft Learn](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/architecture/build-deploy-social-media-analytics-solution)

[Social media analysis with Azure Stream Analytics - Azure Stream Analytics | Microsoft Learn](https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-twitter-sentiment-analysis-trends)

![social media analytics solution architecture](../media/Pasted%20image%2020240227211925.png)

### Dataflow

1. Azure Synapse Analytics pipelines ingest external data and store that data in Azure Data Lake. One pipeline ingests data from news APIs. The other pipeline ingests data from the Twitter API.
2. Apache Spark pools in Azure Synapse Analytics are used to process and enrich the data.
3. The Spark pools use the following services:
    - Azure Cognitive Service for Language, for named entity recognition (NER), key phrase extraction, and sentiment analysis
    - Azure Cognitive Services Translator, to translate text
    - Azure Maps, to link data to geographical coordinates
4. The enriched data is stored in Data Lake.
5. A serverless SQL pool in Azure Synapse Analytics makes the enriched data available to Power BI.
6. Power BI Desktop dashboards provide insights into the data.
7. As an alternative to the previous step, Power BI dashboards that are embedded in Azure App Service web apps provide web and mobile app users with insights into the data.
8. As an alternative to steps 5 through 7, the enriched data is used to train a custom machine learning model in Azure Machine Learning.
9. The model is deployed to a Machine Learning endpoint.
10. A managed online endpoint is used for online, real-time inferencing, for instance, on a mobile app (**A**). Alternatively, a batch endpoint is used for offline model inferencing (**B**).
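
The ingest, enrich, and serve portion of the dataflow above (roughly steps 1 through 5) can be pictured with a small Python sketch. Everything here is hypothetical: `Document`, `ingest`, `enrich`, `run_pipeline`, and the `fake_*` helpers are placeholder names invented for illustration, the sample texts are made up, and no real Azure SDK call is used. Each `fake_*` function only marks the place where a managed service would be called.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One ingested item, e.g. a news article or a tweet."""
    source: str
    text: str
    enrichments: dict = field(default_factory=dict)


def ingest():
    """Step 1: stand-in for the two ingestion pipelines (news APIs and Twitter API)."""
    return [
        Document("news", "Contoso opens a new office in Paris"),
        Document("twitter", "Great service from Contoso today"),
    ]


# Step 3: placeholder enrichers. Each one stands in for a call to a managed
# service (language analysis, translation, geocoding); none is a real SDK call.
def fake_sentiment(text):
    return "positive" if "great" in text.lower() else "neutral"


def fake_translate(text):
    return text  # the sample text is already English, so this is a no-op


def fake_geocode(text):
    return (48.8566, 2.3522) if "Paris" in text else None


def enrich(doc):
    """Steps 2-4: process one document and attach the enrichment results."""
    doc.enrichments = {
        "sentiment": fake_sentiment(doc.text),
        "translated": fake_translate(doc.text),
        "coordinates": fake_geocode(doc.text),
    }
    return doc


def run_pipeline():
    """Ingest, enrich, and return the rows a reporting layer would query (step 5)."""
    return [enrich(doc) for doc in ingest()]
```

Keeping each enricher behind its own function mirrors the architecture's separation of services, so one stand-in could be swapped for a real client without touching the rest of the flow.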
158 changes: 158 additions & 0 deletions docs/ai/solutions.md
@@ -0,0 +1,158 @@
# Solutions

[Artificial intelligence (AI) architecture - Azure Architecture Center | Microsoft Learn](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/)

- Explore ideas about
- Document processing
- Content tagging with NLP
- Knowledge mining for customer feedback
- Large-scale custom NLP
- Image processing
- Image classification with CNNs
- Retail assistant with visual capabilities
- Visual assistant
- Vision classifier model
- Audio processing
- Keyword digital text processing
- Predictive analytics
- Customer churn prediction
- Personalized offers
- Marketing optimization
- Personalized marketing solutions
- Chat bots
- Search and query a knowledge base
- AI at the edge
- AI at the edge with Azure Stack Hub
- Disconnected AI at the edge with Azure Stack Hub
- Video ingestion and object detection on the edge
- Document enrichment
- AI enrichment with Cognitive Search
- MLOps
- Model deployment to AKS
- Orchestrate MLOps with Azure Databricks
- Deploy AI and ML at the edge
- Many models ML with Spark
- Many models with Machine Learning
- Other ideas
- Azure Machine Learning architecture
- Autonomous systems
- Data science and machine learning
- Design architectures
- Chat bots
- Baseline end-to-end chat with OpenAI
- Document processing
- Automate document classification
- Automate document processing
- Automate PDF form processing
- Build custom document processing models
- Multiple indexers with Azure Cognitive Search
- Video and image classification
- Automate video analysis
- Image classification
- Audio processing
- Speech transcription pipeline
- Extract and analyze call center data
- Predictive analytics
- Determine customer lifetime and churn
- Batch scoring
- Batch scoring for deep learning
- Batch scoring with Python
- Batch scoring with R
- Batch scoring with Spark on Databricks
- Recommendations
- Real-time recommendation API
- [Social media analytics solution](ai/social-media-analytics-solution.md)
- Monitoring
- Monitor OpenAI models
- Regulatory
- Secure research for regulated data
- Apply guidance
- Machine learning options
- Document processing
- OpenAI GPT-3 summarization
- Build language model pipelines
- Audio processing
- Custom speech-to-text overview
- Custom speech-to-text
- Conversation summarization
- MLOps
- Machine learning operations (MLOps) v2
- MLOps for Python models
- Network security for MLOps
- MLOps maturity model
- Upscale ML lifecycle with MLOps
- Team Data Science Process
- Overview
- Lifecycle
- Overview
- 1. Business understanding
- 2. Data acquisition and understanding
- 3. Modeling
- 4. Deployment
- 5. Customer acceptance
- Roles and tasks
- Overview
- Group manager
- Team lead
- Project lead
- Individual contributor
- Development
- Agile development
- Collaborative coding with Git
- Execute data science tasks
- Code testing
- Track progress
- Operationalization
- DevOps - CI/CD
- Training
- For data scientists
- How To
- Set up data science environments
- Environment setup
- Platforms and tools
- Analyze business needs
- Identify your scenario
- Acquire and understand data
- Ingest data
- Overview
- Move to/from Blob storage
- Overview
- Use Storage Explorer
- Use SSIS
- Move to SQL on a VM
- Move to Azure SQL Database
- Move to Hive tables
- Move to SQL partitioned tables
- Move from on-premises SQL
- Explore and visualize data
- Prepare data
- Explore data
- Overview
- Explore Azure Blob Storage
- Sample data
- Overview
- Use Blob Storage
- Use SQL Server
- Process data
- Access with Python
- Use Azure Data Lake
- Use SQL VM
- Use data pipeline
- Use Spark
- Use Scala and Spark
- Develop models
- Engineer features
- Overview
- Deploy models in production
- Build and deploy a model using Azure Synapse Analytics
- OpenAI
- Explore ideas about
- Search and query a knowledge base
- Design architectures
- Baseline end-to-end chat with OpenAI
- Extract and analyze call center data
- Monitor OpenAI models
- Apply guidance
- Build language model pipelines
- OpenAI GPT-3 summarization
- Conversation summarization
2 changes: 1 addition & 1 deletion docs/book-summaries/thinking-in-systems.md
@@ -22,5 +22,5 @@ By Donella H. Meadows
- By changing **buffers**, **system design** and **delays**, we can produce more effective systems
- Systems can be made even more efficient by adjusting their internal mechanisms and rules
- Paying attention to the inner workings of systems will help you better understand the world
- **Emergence:** Emergence is a simple but powerful concept. It means that when things come together, somethingnew and unexpectedhappens. And this new thing isn't present in the individual elements. It's biological as much as social. A caterpillar becomes a butterfly.
- **Emergence:** Emergence is a simple but powerful concept. It means that when things come together, something new and unexpected happens. And this new thing isn't present in the individual elements. It's biological as much as social. A caterpillar becomes a butterfly.
- Actionable advice - Always expect a positive outcome, not a negative one.
5 changes: 1 addition & 4 deletions docs/cloud/aws/security-identity-compliance/compliance.md
@@ -148,12 +148,9 @@ REGULATIONS

[Indian Institute of Banking & Finance (IIBF)](https://www.iibf.org.in/)

## Data Governance

![Data Governance](../../../media/Pasted%20image%2020240213122425.png)

## Others

- [Data Governance](ai/data-science/data-governance.md)
- CISA Certification - Certified Information Systems Auditor
- CISO - Chief Information Security Officer
- CMMI Level 3 - An appraisal at maturity level 3 **indicates an organization is performing at a “defined” level**. At this level, processes are well characterized and understood and are described in standards, procedures, tools, and methods.
7 changes: 7 additions & 0 deletions docs/cloud/others/azure/readme.md
@@ -2,6 +2,13 @@

[Azure Portal "How To" Series](https://www.youtube.com/playlist?list=PLLasX02E8BPBKgXP4oflOL29TtqTzwhxR)

[social-media-analytics-solution](ai/social-media-analytics-solution.md)

## Tools

- Azure Data Factory (ADF) - [Azure Data Factory - Data Integration Service | Microsoft Azure](https://azure.microsoft.com/en-in/products/data-factory/)
- Azure Databricks (ADB)

## Commands

```bash