Commit

updated docs
deepaksood619 committed Dec 18, 2024
1 parent b2e72d4 commit 8c6e5d1
Showing 18 changed files with 152 additions and 101 deletions.
2 changes: 1 addition & 1 deletion docs/ai/llm/llm-building.md
@@ -144,7 +144,7 @@ The key distinction lies in their approaches: LangChain prioritizes customizatio

No matter what AI framework you pick, I always recommend using a robust data platform like SingleStore that supports not just vector storage but also hybrid search, low latency, fast data ingestion, all data types, AI frameworks integration, and much more.

-![](../../media/Pasted%20image%2020241118181518.jpg)
+![image](../../media/Pasted%20image%2020241118181518.jpg)

[A Beginner’s Guide to Building LLM-Powered Applications with LangChain! - DEV Community](https://dev.to/pavanbelagatti/a-beginners-guide-to-building-llm-powered-applications-with-langchain-2d6e)

@@ -40,7 +40,7 @@ Uber's real-time data infrastructure is powered by a combination of advanced ope

The diagram below shows the overall landscape.

-![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e74a5c9-a041-4657-a3e4-39017b238e76_1600x1017.png)
+![image](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e74a5c9-a041-4657-a3e4-39017b238e76_1600x1017.png)

Let’s take a closer look at the key technologies Uber relies on, how they work, and the unique tweaks that make them fit Uber's requirements.

@@ -50,7 +50,7 @@ Kafka is the backbone of Uber’s data streaming.

It handles trillions of messages and petabytes of data daily, helping to transport information from user apps (like driver and rider apps) and microservices. Kafka’s key role is to move this streaming data to batch and real-time systems.

-![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35bab385-a2ed-4c4f-958d-66e20e5d269b_1600x813.png)
+![image](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35bab385-a2ed-4c4f-958d-66e20e5d269b_1600x813.png)

At Uber, Kafka was heavily customized to meet its large-scale needs. Some of the key features are as follows:

@@ -75,7 +75,7 @@ By implementing these changes, Uber has made Flink more reliable and easier to u

See the diagram below that shows the Unified Flink Architecture at Uber.

-![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9e8a845-940c-468d-a19c-f39f1a8cc4b4_1600x1017.png)
+![image](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9e8a845-940c-468d-a19c-f39f1a8cc4b4_1600x1017.png)

### Apache Pinot for Real-Time OLAP

@@ -167,7 +167,7 @@ For example, surge pricing calculations, which depend on real-time supply and de

See the diagram below:

-![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f0c703-4ef5-4a6e-bc5e-82c3a6c86db6_1600x1141.png)
+![image](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f0c703-4ef5-4a6e-bc5e-82c3a6c86db6_1600x1141.png)

This setup requires careful synchronization of data between regions. Uber uses uReplicator, a tool they developed to replicate Kafka messages across clusters, ensuring the system remains redundant and reliable. Even if one region goes down, the data is preserved and can be quickly restored in the backup region, minimizing disruption to the service.

@@ -181,7 +181,7 @@ If the primary region fails, the system fails over to a backup (passive) region,

See the diagram below that shows the Active-Passive setup.

-![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bd81bc0-b086-4fa9-bde0-b16c1fe32634_1600x961.png)
+![image](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bd81bc0-b086-4fa9-bde0-b16c1fe32634_1600x961.png)

The key challenge in Active-Passive setups is offset synchronization—ensuring that the consumer in the backup region starts processing from the same point as the primary region.
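
The offset synchronization problem described above can be sketched as a lookup over periodic checkpoints. This is only an illustration under assumptions: the `OffsetMapping` class, its method names, and the idea of a checkpoint table pairing primary-region offsets with backup-region offsets are hypothetical and are not taken from the text or from any Uber or Kafka API.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class OffsetMapping:
    """Hypothetical checkpoint table pairing primary offsets with backup offsets."""

    primary: list[int] = field(default_factory=list)
    backup: list[int] = field(default_factory=list)

    def record(self, primary_offset: int, backup_offset: int) -> None:
        # Assumption: checkpoints arrive in increasing offset order.
        self.primary.append(primary_offset)
        self.backup.append(backup_offset)

    def resume_offset(self, committed_primary_offset: int) -> int:
        # Pick the latest checkpoint at or before the consumer's committed
        # position in the primary region, and resume from its backup offset.
        idx = bisect_right(self.primary, committed_primary_offset) - 1
        if idx < 0:
            return 0  # no usable checkpoint: replay from the beginning
        return self.backup[idx]


mapping = OffsetMapping()
mapping.record(100, 40)
mapping.record(200, 145)
mapping.record(300, 250)

# The consumer had committed offset 250 in the primary region before failover.
print(mapping.resume_offset(250))
```

Resuming from the nearest earlier checkpoint means the backup consumer may reprocess some messages after failover, but it does not skip any; in this sketch that trade-off is a design assumption rather than something the text specifies.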

@@ -139,7 +139,7 @@ Arbitrage - Buy and sell commodities and make a safe profit, while the price adj

In the most intuitive sense, stationarity means that the statistical properties of a process generating a time series do not change over time. It does not mean that the series does not change over time, just that the way it changes does not itself change over time. The algebraic equivalent is thus a linear function, perhaps, and not a constant one; the value of a linear function changes as 𝒙 grows, but the way it changes remains constant - it has a constant slope; one value that captures that rate of change.

-![](../../media/Pasted%20image%2020241011132306.png)
+![image](../../media/Pasted%20image%2020241011132306.png)

Figure 1: Time series generated by a stationary (top) and a non-stationary (bottom) processes.
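
The distinction can be made concrete with a small simulation. The two processes below are illustrative choices, not necessarily the ones plotted in Figure 1: Gaussian white noise stands in for a stationary process, and a random walk built from the same kind of steps stands in for a non-stationary one. The function names are hypothetical.

```python
import random
from statistics import mean


def white_noise(n, rng):
    """Stationary: every value is drawn from the same N(0, 1) distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(n, rng):
    """Non-stationary: each value is the previous value plus an N(0, 1) step."""
    series, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        series.append(level)
    return series


def half_means(series):
    """Return the mean of the first half and the mean of the second half."""
    mid = len(series) // 2
    return mean(series[:mid]), mean(series[mid:])


rng = random.Random(0)  # fixed seed so repeated runs give the same series
noise = white_noise(2000, rng)
walk = random_walk(2000, rng)

print("white noise half means:", half_means(noise))
print("random walk half means:", half_means(walk))
```

For the white noise, the two half means should both sit close to zero, because the generating distribution is the same at every step. For the random walk, the two halves will typically differ noticeably, because the spread of the level grows with time even though each individual step is drawn the same way.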

@@ -106,7 +106,7 @@

## State of Data Engineering 2024

-![](../../../media/Screenshot%202024-07-15%20at%2012.16.36%20AM.jpg)
+![image](../../../media/Screenshot%202024-07-15%20at%2012.16.36%20AM.jpg)

[State of Data Engineering 2024](https://8040338.fs1.hubspotusercontent-na1.net/hubfs/8040338/lakeFS%20State%20of%20Data%20Engineering%202024.pdf)


0 comments on commit 8c6e5d1
