updated docs
deepaksood619 committed Feb 8, 2024
1 parent b5c5f60 commit 8ce2c0a
Showing 56 changed files with 440 additions and 288 deletions.
4 changes: 2 additions & 2 deletions docs/ai/computer-vision-cv/image-formats.md
@@ -29,7 +29,7 @@ https://www.exifdata.com

In a number of tests by [Netflix](https://en.wikipedia.org/wiki/Netflix "Netflix") in 2020, AVIF showed better compression efficiency than [JPEG](https://en.wikipedia.org/wiki/JPEG "JPEG") as well as better detail preservation, fewer blocking artifacts and less [color bleeding](https://en.wikipedia.org/wiki/Color_bleeding_(printing) "Color bleeding (printing)") around hard edges in composites of natural images, text, and graphics.

- AV1 Image File Format (AVIF) is an encoding based on the open source AV1 video codec. AVIF is [even newer](https://caniuse.com/avif)—than WebP, only supported in Chrome and Opera since 2020, Firefox in 2021, and Safari in 2022. As with WebP, AVIF aims to address every conceivable use case for raster images on the web: GIF-like animation, PNG-like transparency, and improved perceptual quality at file sizes smaller than JPEG or WebP.
+ AV1 Image File Format (AVIF) is an encoding based on the open source AV1 video codec. AVIF is [even newer](https://caniuse.com/avif)-than WebP, only supported in Chrome and Opera since 2020, Firefox in 2021, and Safari in 2022. As with WebP, AVIF aims to address every conceivable use case for raster images on the web: GIF-like animation, PNG-like transparency, and improved perceptual quality at file sizes smaller than JPEG or WebP.

- No progressive rendering
- AVIF + Blur is good
@@ -83,4 +83,4 @@ http://www.libpng.org/pub/png/apps/pngcheck.html

[How are Images Compressed? [46MB ↘↘ 4.07MB] JPEG In Depth - YouTube](https://www.youtube.com/watch?v=Kv1Hiv3ox8I)

- [JPEG vs PNG vs GIF — which image format to use and when? | by Rahul Nanwani | Blog | ImageKit.io](https://blog.imagekit.io/jpeg-vs-png-vs-gif-which-image-format-to-use-and-when-c8913ae3e01d)
+ [JPEG vs PNG vs GIF - which image format to use and when? | by Rahul Nanwani | Blog | ImageKit.io](https://blog.imagekit.io/jpeg-vs-png-vs-gif-which-image-format-to-use-and-when-c8913ae3e01d)
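The format comparisons in this doc assume you know what a file actually contains. A small helper (an assumed illustration, not part of these docs) can sniff the format from the file's magic bytes, which is handy when deciding whether a server is really delivering AVIF or has fallen back to JPEG:

```python
def sniff_image_format(data: bytes) -> str:
    """Guess a raster image format from its leading magic bytes."""
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP lives in a RIFF container: 'RIFF' <size> 'WEBP'
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    # AVIF lives in an ISO-BMFF container: an 'ftyp' box with an AVIF brand
    if data[4:8] == b"ftyp" and data[8:12] in (b"avif", b"avis"):
        return "avif"
    return "unknown"
```

The container checks (RIFF for WebP, ISO-BMFF `ftyp` for AVIF) are why file extensions alone are unreliable: both formats reuse generic container layouts.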
2 changes: 1 addition & 1 deletion docs/ai/data-science/tableau/dashboarding.md
@@ -45,7 +45,7 @@ Data source filters in Tableau are mainly used to restrict sensitive data from v

## Overlays

- [Create a dashboard overlay — ENTIRELY in Tableau | by Brittany Rosenau | Aug, 2023 | Medium](https://brittanyrosenau.medium.com/create-a-dashboard-overlay-entirely-in-tableau-a8e9543979e5)
+ [Create a dashboard overlay - ENTIRELY in Tableau | by Brittany Rosenau | Aug, 2023 | Medium](https://brittanyrosenau.medium.com/create-a-dashboard-overlay-entirely-in-tableau-a8e9543979e5)

## Legends

2 changes: 1 addition & 1 deletion docs/ai/libraries/mlops-model-deployment.md
@@ -67,7 +67,7 @@ https://github.com/kubeflow/kubeflow

### Courses

- [The Full Stack 7-Steps MLOps Framework — Paul Iusztin](https://www.pauliusztin.me/courses/the-full-stack-7-steps-mlops-framework)
+ [The Full Stack 7-Steps MLOps Framework - Paul Iusztin](https://www.pauliusztin.me/courses/the-full-stack-7-steps-mlops-framework)

[MLOps Course - Made With ML](https://madewithml.com/courses/mlops/)

6 changes: 3 additions & 3 deletions docs/ai/llm/design-patterns.md
@@ -12,11 +12,11 @@ At a very high level, the workflow can be divided into three stages:

- **Data preprocessing / embedding:** This stage involves storing private data (legal documents, in our example) to be retrieved later. Typically, the documents are broken into chunks, passed through an embedding model, then stored in a specialized database called a vector database.
- **Prompt construction / retrieval:** When a user submits a query (a legal question, in this case), the application constructs a series of prompts to submit to the language model. A compiled prompt typically combines a prompt template hard-coded by the developer; examples of valid outputs called few-shot examples; any necessary information retrieved from external APIs; and a set of relevant documents retrieved from the vector database.
- - **Prompt execution / inference:** Once the prompts have been compiled, they are submitted to a pre-trained LLM for inference—including both proprietary model APIs and open-source or self-trained models. Some developers also add operational systems like logging, caching, and validation at this stage.
+ - **Prompt execution / inference:** Once the prompts have been compiled, they are submitted to a pre-trained LLM for inference-including both proprietary model APIs and open-source or self-trained models. Some developers also add operational systems like logging, caching, and validation at this stage.
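The three stages described in this doc (preprocess/embed, retrieve and build the prompt, infer) can be sketched end to end. Everything below is a stand-in: the character-frequency "embedding" replaces a real embedding model, and the in-memory list replaces a vector database.

```python
from math import sqrt

def embed(text: str) -> list[float]:
    # Toy embedding: a 26-dim character-frequency vector (stand-in for a model).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha() and ch.isascii():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Stage 1: data preprocessing / embedding — chunk docs, store (chunk, vector).
def index_documents(docs: list[str], chunk_size: int = 200):
    chunks = [d[i:i + chunk_size] for d in docs for i in range(0, len(d), chunk_size)]
    return [(c, embed(c)) for c in chunks]

# Stage 2: prompt construction / retrieval — top-k chunks + a prompt template.
def build_prompt(query: str, store, k: int = 2) -> str:
    q = embed(query)
    top = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)[:k]
    context = "\n".join(chunk for chunk, _ in top)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Stage 3 (prompt execution / inference) would submit the compiled
# prompt to a pre-trained LLM — proprietary API or self-hosted model.
```

A real pipeline swaps `embed` for a model call and the list for a vector database, but the shape of the flow is the same.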

- This looks like a lot of work, but it’s usually easier than the alternative: training or fine-tuning the LLM itself. You don’t need a specialized team of ML engineers to do in-context learning. You also don’t need to host your own infrastructure or buy an expensive dedicated instance from OpenAI. This pattern effectively reduces an AI problem to a data engineering problem that most startups and big companies already know how to solve. It also tends to outperform fine-tuning for relatively small datasets—since a specific piece of information needs to occur at least ~10 times in the training set before an LLM will remember it through fine-tuning—and can incorporate new data in near real time.
+ This looks like a lot of work, but it’s usually easier than the alternative: training or fine-tuning the LLM itself. You don’t need a specialized team of ML engineers to do in-context learning. You also don’t need to host your own infrastructure or buy an expensive dedicated instance from OpenAI. This pattern effectively reduces an AI problem to a data engineering problem that most startups and big companies already know how to solve. It also tends to outperform fine-tuning for relatively small datasets-since a specific piece of information needs to occur at least ~10 times in the training set before an LLM will remember it through fine-tuning-and can incorporate new data in near real time.

- One of the biggest questions around in-context learning is: What happens if we just change the underlying model to increase the context window? This is indeed possible, and it is an active area of research (e.g., see the [Hyena paper](https://arxiv.org/abs/2302.10866) or this [recent post](https://blog.gopenai.com/how-to-speed-up-llms-and-use-100k-context-window-all-tricks-in-one-place-ffd40577b4c)). But this comes with a number of tradeoffs—primarily that cost and time of inference scale quadratically with the length of the prompt. Today, even linear scaling (the best theoretical outcome) would be cost-prohibitive for many applications. A single GPT-4 query over 10,000 pages would cost hundreds of dollars at current API rates. So, we don’t expect wholesale changes to the stack based on expanded context windows.
+ One of the biggest questions around in-context learning is: What happens if we just change the underlying model to increase the context window? This is indeed possible, and it is an active area of research (e.g., see the [Hyena paper](https://arxiv.org/abs/2302.10866) or this [recent post](https://blog.gopenai.com/how-to-speed-up-llms-and-use-100k-context-window-all-tricks-in-one-place-ffd40577b4c)). But this comes with a number of tradeoffs-primarily that cost and time of inference scale quadratically with the length of the prompt. Today, even linear scaling (the best theoretical outcome) would be cost-prohibitive for many applications. A single GPT-4 query over 10,000 pages would cost hundreds of dollars at current API rates. So, we don’t expect wholesale changes to the stack based on expanded context windows.
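The quadratic-scaling tradeoff noted above is easy to make concrete with assumed numbers; the unit cost below is illustrative, not a real API rate.

```python
def attention_cost(prompt_tokens: int, unit_cost: float = 1e-6) -> float:
    """Cost of self-attention over a prompt, assuming O(n^2) scaling.

    `unit_cost` is a made-up constant for illustration only.
    """
    return unit_cost * prompt_tokens * prompt_tokens

# Doubling the prompt quadruples the cost; 10x the prompt is 100x the cost,
# which is why simply widening the context window gets expensive fast.
```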

[Emerging Architectures for LLM Applications | Andreessen Horowitz](https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/)

2 changes: 1 addition & 1 deletion docs/ai/llm/interview-questions.md
@@ -28,7 +28,7 @@ Finally, the **decoder** receives the output of the **encoder component** and al

During training, the model is presented with pairs of sentences, some of which are consecutive in the original text, and some of which are not. The model is then trained to predict whether a given pair of sentences are adjacent or not. This allows the model to **understand longer-term dependencies across sentences**.

- Researchers have found that without **NSP**, **BERT** performs worse on every single metric — so its use is relevant to language modeling.
+ Researchers have found that without **NSP**, **BERT** performs worse on every single metric - so its use is relevant to language modeling.
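The NSP training setup described above amounts to a pair-construction step over a corpus: roughly half the pairs are truly consecutive sentences (labeled `IsNext`), the other half pair a sentence with a random non-adjacent one (`NotNext`). The helper below is an assumed illustration, not BERT's actual data pipeline.

```python
import random

def make_nsp_pairs(sentences: list[str], seed: int = 0):
    """Build (sentence_a, sentence_b, label) triples for NSP-style training.

    label 1 = sentence_b truly follows sentence_a; label 0 = random pairing.
    """
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            pairs.append((sentences[i], sentences[i + 1], 1))  # IsNext
        else:
            j = rng.randrange(len(sentences))
            while j == i + 1:  # never accidentally pick the true next sentence
                j = rng.randrange(len(sentences))
            pairs.append((sentences[i], sentences[j], 0))  # NotNext
    return pairs
```

The model then learns a binary classifier over these pairs, which is what pushes it to capture cross-sentence dependencies.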

## How can you _evaluate the performance_ of Language Models?

