Docs: Add Content Library Page to the docs #13335
- **2020-02-27**: How Query Engines Work [Online Book](https://andygrove.io/2020/02/how-query-engines-work/)

## ✨ Good Reads
Rendering of the Notion page is much nicer:
I started trying to replicate the formatting with ChatGPT but it still needs cleaning up.
Here is the raw markdown from Notion (from exporting the Notion site as markdown):
📚 DF Content Library
🧭 Foundational Contents
- 2024-06-13 2024 ACM SIGMOD International Conference on Management of Data Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (talk) slides, recording, paper
- 2024-06-07 https://www.youtube.com/watch?v=-DpKcPfnNms&t=5s
- 2023-04-05 The Apache Arrow DataFusion Architecture Part 3: Physical Plan and Execution. slides, recording
- 2023-04-04 The Apache Arrow DataFusion Architecture Part 2: Logical Plans and Expressions. slides, recording
- 2023-03-31 The Apache Arrow DataFusion Architecture Part 1: Query Engines. slides, recording
- 2020-02-27 https://andygrove.io/2020/02/how-query-engines-work/
✨ Good Reads
- 2024-10-16 https://www.letsql.com/posts/candle-image-segmentation/
- 2024-09-23 → 2024-12-02 Carnegie Mellon University: Database Building Blocks Seminar Series - Fall 2024
- 2024-10-28 https://www.youtube.com/watch?v=fltZMO8EGl0&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=6
- 2024-10-21 https://www.youtube.com/watch?v=tyM-ec1lKfU&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=5
- 2024-10-07 https://www.youtube.com/watch?v=Vxb8TELNM98&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=4
- 2024-09-23 https://www.youtube.com/watch?v=iJhRbDFJjbg&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=2
- 2024-09-30 https://www.youtube.com/watch?v=o59s0d3HE1k&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=3
- 2024-09-17 https://www.youtube.com/watch?v=2z11xtYw_xs
- 2024-08-25 Pydantic/logfire: We're changing database
- 2024-08-15 https://www.youtube.com/watch?v=RVLshX6fbds
- 2024-08-14 https://uwheel.rs/post/datafusion_uwheel/
- 2024-06-17 https://blog.lancedb.com/columnar-file-readers-in-depth-apis-and-fusion/
- 2024-06-14 2024 Simplicity in Management of Data (SiMOD) DataFusion: The Case for Building Open Data Systems (keynote) slides
- 2024-05-29 https://cube.dev/blog/query-push-down-in-cubes-semantic-layer
- 2024-03-26 → 2024-06-26 Building InfluxDB 3.0 (and other systems) without starting from "scratch" with Apache DataFusion
- 2024-06-26 Microsoft Gray Systems Lab: Building InfluxDB 3.0 (and other systems) without starting from "scratch" with Apache DataFusion slides
- 2024-03-26 DataCouncil 2024: Building InfluxDB 3.0 with Apache Arrow, DataFusion, Flight and Parquet. slides, recording
- 2024-03-20 https://www.youtube.com/watch?v=P3dXH61Kr5U
- 2024-03-18 https://www.influxdata.com/blog/making-recent-value-queries-hundreds-times-faster/
- 2023-10-25 https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/
- 2023-09-26 https://www.kamu.dev/blog/2023-09-datafusion-flightsql/
- 2023-08-15 https://www.synnada.ai/blog/running-window-query-in-stream-processing
- 2023-08-05 InfluxData: Aggregating Millions of Groups Fast in Apache Arrow DataFusion. InfluxData, DataFusion.
- 2023-07-28 https://www.synnada.ai/blog/sliding-window-hash-join-swhj
- 2023-07-13 https://www.synnada.ai/blog/probabilistic-data-structures-in-streaming-count-min-sketch
- 2023-05-25 https://www.youtube.com/watch?v=NEL6DluUxgw
- 2023-02-20 https://www.synnada.ai/blog/general-purpose-stream-joins-via-pruning-symmetric-hash-joins
- 2023-02-15 → 2023-09-27 Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust
- 2023-09-27 MIT Database Group: Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust. slides
- 2023-06-02 [Dutch Seminar on Database System Design]: Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust. slides, recording
- 2023-02-15 [Invited Talk at Optum Labs]: Building a new time series database "from scratch" Using Apache Arrow, Parquet, DataFusion and Rust slides
- 2023-01-01 https://andygrove.io/2023/01/what-i-want-from-datafusion-2023/
- 2022-12-07 https://www.influxdata.com/blog/querying-parquet-millisecond-latency/
- 2022-06-27 [DataBricks Data+AI Summit]: DataFusion and Arrow: Supercharge Your Data Analytical Tool with a Rusty Query Engine. slides, recording
- 2022-05-23 [The Data Thread 2022]: Apache Arrow and DataFusion: Changing the Game for Implementing Database Systems. slides, recording
- 2021-03-10 [InfluxData Tech Talk]: Query Engine Design and the Rust-Based DataFusion in Apache Arrow. slides (Google Slides, Slideshare), recording
📅 Release Notes & Updates
- 2024-07-24 https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/
- 2024-01-19 https://datafusion.apache.org/blog/2024/01/19/datafusion-34.0.0/
- 2023-06-24 https://arrow.apache.org/blog/2023/06/24/datafusion-25.0.0/
- 2023-01-19 https://arrow.apache.org/blog/2023/01/19/datafusion-16.0.0/
- 2023-01-01 https://andygrove.io/2023/01/what-i-want-from-datafusion-2023/
- 2022-10-25 https://arrow.apache.org/blog/2022/10/25/datafusion-13.0.0/
- 2022-05-16 https://arrow.apache.org/blog/2022/05/16/datafusion-8.0.0/
- 2022-02-28 https://arrow.apache.org/blog/2022/02/28/datafusion-7.0.0/
- 2021-11-19 https://arrow.apache.org/blog/2021/11/19/datafusion-6.0.0/
- 2021-08-18 https://arrow.apache.org/blog/2021/08/18/datafusion-5.0.0/
- 2019-09-22 https://andygrove.io/2019/09/datafusion-0.15.0-release-notes/
🌎 Community Events
- 2025-01-15 (Upcoming) Boston Apache DataFusion Meetup
- 2024-12-18 (Upcoming) Chicago Apache DataFusion Meetup
- 2024-09-27 Belgrade Apache DataFusion Meetup, recap, slides, recordings
- 2024-06-26 New York City Apache DataFusion Meetup. slides
- 2024-06-25 San Francisco Bay Area Apache DataFusion Meetup. slides
- 2024-03-27 Austin Apache DataFusion Meetup. slides, recording
Source
# 📚 DF Content Library
# 🧭 Foundational Contents
- **2024-06-13** 2024 ACM SIGMOD International Conference on Management of Data Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (talk) [slides](https://docs.google.com/presentation/d/1gqcxSNLGVwaqN0_yJtCbNm19-w5pqPuktII5_EDA6_k/edit#slide=id.p), [recording](https://youtu.be/-DpKcPfnNms), [paper](https://dl.acm.org/doi/10.1145/3626246.3653368)
- **2024-06-07** https://www.youtube.com/watch?v=-DpKcPfnNms&t=5s
- **2023-04-05** The Apache Arrow DataFusion Architecture Part 3: Physical Plan and Execution. [slides](https://docs.google.com/presentation/d/1cA2WQJ2qg6tx6y4Wf8FH2WVSm9JQ5UgmBWATHdik0hg), [recording](https://youtu.be/2jkWU3_w6z0)
- **2023-04-04** The Apache Arrow DataFusion Architecture Part 2: Logical Plans and Expressions. [slides](https://docs.google.com/presentation/d/1ypylM3-w60kVDW7Q6S99AHzvlBgciTdjsAfqNP85K30), [recording](https://youtu.be/EzZTLiSJnhY)
- **2023-03-31** The Apache Arrow DataFusion Architecture Part 1: Query Engines. [slides](https://docs.google.com/presentation/d/1D3GDVas-8y0sA4c8EOgdCvEjVND4s2E7I6zfs67Y4j8), [recording](https://youtu.be/NVKujPxwSBA)
- **2020-02-27** https://andygrove.io/2020/02/how-query-engines-work/
# ✨ Good Reads
- **2024-10-16** https://www.letsql.com/posts/candle-image-segmentation/
- **2024-09-23 → 2024-12-02** [Carnegie Mellon University: Database Building Blocks Seminar Series - Fall 2024](https://db.cs.cmu.edu/seminar2024/)
- **2024-10-28** https://www.youtube.com/watch?v=fltZMO8EGl0&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=6
- **2024-10-21** https://www.youtube.com/watch?v=tyM-ec1lKfU&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=5
- **2024-10-07** https://www.youtube.com/watch?v=Vxb8TELNM98&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=4
- **2024-09-23** https://www.youtube.com/watch?v=iJhRbDFJjbg&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=2
- **2024-09-30** https://www.youtube.com/watch?v=o59s0d3HE1k&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=3
- **2024-09-17** https://www.youtube.com/watch?v=2z11xtYw_xs
- **2024-08-25** [Pydantic/logfire: We're changing database](https://github.com/pydantic/logfire/issues/408)
- **2024-08-15** https://www.youtube.com/watch?v=RVLshX6fbds
- **2024-08-14** https://uwheel.rs/post/datafusion_uwheel/
- **2024-06-17** https://blog.lancedb.com/columnar-file-readers-in-depth-apis-and-fusion/
- **2024-06-14** [2024 Simplicity in Management of Data (SiMOD)](https://sfu-dis.github.io/simod/) DataFusion: The Case for Building Open Data Systems (keynote) [slides](https://docs.google.com/presentation/d/1K3EdknzkqU2LhWi_eNKXdcvNk0OEvk9AqTLqhZkPxuI/edit)
- **2024-05-29** https://cube.dev/blog/query-push-down-in-cubes-semantic-layer
- **2024-03-26 → 2024-06-26** Building InfluxDB 3.0 (and other systems) without starting from "scratch" with Apache DataFusion
- **2024-06-26** [Microsoft Gray Systems Lab:](https://www.microsoft.com/en-us/research/group/gray-systems-lab) Building InfluxDB 3.0 (and other systems) without starting from "scratch" with Apache DataFusion [slides](https://docs.google.com/presentation/d/1a4wHZij_69drdmD32TPombQ9zSaE6l26LZ87DAz2New/edit#slide=id.p)
- **2024-03-26** [DataCouncil 2024:](https://www.datacouncil.ai/talks24/building-influxdb-30-with-apache-arrow-datafusion-flight-and-parquet?hsLang=en) Building InfluxDB 3.0 with Apache Arrow, DataFusion, Flight and Parquet. [slides](https://docs.google.com/presentation/d/12kdYHLyH79B5__9xs3de_hZyG9geW4jC3vUpiy39VA0), [recording](https://www.youtube.com/watch?v=I-Z7kFGsYRI)
- **2024-03-20** https://www.youtube.com/watch?v=P3dXH61Kr5U
- **2024-03-18** https://www.influxdata.com/blog/making-recent-value-queries-hundreds-times-faster/
- **2023-10-25** https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/
- **2023-09-26** https://www.kamu.dev/blog/2023-09-datafusion-flightsql/
- **2023-08-15** https://www.synnada.ai/blog/running-window-query-in-stream-processing
- **2023-08-05** InfluxData: Aggregating Millions of Groups Fast in Apache Arrow DataFusion. [InfluxData](https://www.influxdata.com/blog/aggregating-millions-groups-fast-apache-arrow-datafusion/), [DataFusion](https://arrow.apache.org/blog/2023/08/05/datafusion_fast_grouping/).
- **2023-07-28** https://www.synnada.ai/blog/sliding-window-hash-join-swhj
- **2023-07-13** https://www.synnada.ai/blog/probabilistic-data-structures-in-streaming-count-min-sketch
- **2023-05-25** https://www.youtube.com/watch?v=NEL6DluUxgw
- **2023-02-20** https://www.synnada.ai/blog/general-purpose-stream-joins-via-pruning-symmetric-hash-joins
- **2023-02-15 → 2023-09-27** Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust
- **2023-09-27** MIT Database Group: Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust. [slides](https://docs.google.com/presentation/d/1_JXxapY2jksCOm5hePK8FIjO3buDzsrBBy0jUEpJR4A)
- **2023-06-02** [[Dutch Seminar on Database System Design]](https://dsdsd.da.cwi.nl/past_talks/post_talks/Andrew-Lamb/): Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust. [slides](https://docs.google.com/presentation/d/1XTsO2zsHkgBCF6C0YVwk0BnhZzLBrm39oeapOBb-s9A), [recording](https://youtu.be/Y5K2Ik2oo-8)
- **2023-02-15** [Invited Talk at Optum Labs]: Building a new time series database "from scratch" Using Apache Arrow, Parquet, DataFusion and Rust [slides](https://docs.google.com/presentation/d/1SzqgTtSKVqpuFUDdOHhRNC3mLmJ7oyVp0OyrYwHvgPA)
- **2023-01-01** https://andygrove.io/2023/01/what-i-want-from-datafusion-2023/
- **2022-12-07** https://www.influxdata.com/blog/querying-parquet-millisecond-latency/
- **2022-06-27** [DataBricks Data+AI Summit]: DataFusion and Arrow: Supercharge Your Data Analytical Tool with a Rusty Query Engine. [slides](https://docs.google.com/presentation/d/1wLORMn23RD_sQ84W2w51s-Xysly5S8F5mGXzaeJ4QWY), [recording](https://www.databricks.com/dataaisummit/session/datafusion-and-arrow-supercharge-your-data-analytical-tool-rusty-query-engine)
- **2022-05-23** [The Data Thread 2022]: Apache Arrow and DataFusion: Changing the Game for Implementing Database Systems. [slides](https://docs.google.com/presentation/d/1Tkjfup5z_nsrBWIO7dXscEzC5toTQCXj0IsZeO3endc), [recording](https://www.youtube.com/watch?v=rb61lVH2vYc)
- **2021-03-10** [InfluxData Tech Talk]: Query Engine Design and the Rust-Based DataFusion in Apache Arrow. slides ([Google Slides](https://docs.google.com/presentation/d/1z_bmjqQk_WKsyQMfmIYssjJNYwLEkjGcoCAsv8D7XO0), [Slideshare](https://www.slideshare.net/influxdata/influxdb-iox-tech-talks-query-engine-design-and-the-rustbased-datafusion-in-apache-arrow-244161934)), [recording](https://www.youtube.com/watch?v=K6eCAVEk4kU)
# 📅 Release Notes & Updates
- **2024-07-24** https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/
- **2024-01-19** https://datafusion.apache.org/blog/2024/01/19/datafusion-34.0.0/
- **2023-06-24** https://arrow.apache.org/blog/2023/06/24/datafusion-25.0.0/
- **2023-01-19** https://arrow.apache.org/blog/2023/01/19/datafusion-16.0.0/
- **2023-01-01** https://andygrove.io/2023/01/what-i-want-from-datafusion-2023/
- **2022-10-25** https://arrow.apache.org/blog/2022/10/25/datafusion-13.0.0/
- **2022-05-16** https://arrow.apache.org/blog/2022/05/16/datafusion-8.0.0/
- **2022-02-28** https://arrow.apache.org/blog/2022/02/28/datafusion-7.0.0/
- **2021-11-19** https://arrow.apache.org/blog/2021/11/19/datafusion-6.0.0/
- **2021-08-18** https://arrow.apache.org/blog/2021/08/18/datafusion-5.0.0/
- **2019-09-22** https://andygrove.io/2019/09/datafusion-0.15.0-release-notes/
# 🌎 Community Events
- **2025-01-15** (Upcoming) [Boston Apache DataFusion Meetup](https://github.com/apache/datafusion/discussions/13165)
- **2024-12-18** (Upcoming) [Chicago Apache DataFusion Meetup](https://lu.ma/eq5myc5i)
- **2024-09-27** [Belgrade Apache DataFusion Meetup](https://lu.ma/tmwuz4lg), [recap](https://github.com/apache/datafusion/discussions/11431#discussioncomment-10832070), [slides](https://github.com/apache/datafusion/discussions/11431#discussioncomment-10826169), [recordings](https://www.youtube.com/watch?v=4huEsFFv6bQ&list=PLrhIfEjaw9ilQEczOQlHyMznabtVRptyX)
- **2024-06-26** [New York City Apache DataFusion Meetup](https://lu.ma/2iwba0xm). [slides](https://docs.google.com/presentation/d/1dOLPAFPEMLhLv4NN6O9QSDIyyeiIySqAjky5cVgdWAE/edit#slide=id.g26bebde4fcc_3_7)
- **2024-06-25** [San Francisco Bay Area Apache DataFusion Meetup](https://lu.ma/6bphole2). [slides](https://docs.google.com/presentation/d/1Oz2yGllrWBkNGyiRMLr8qXTt4vmvtJWuI_weGThaZak/edit#slide=id.g26bebde4fcc_3_7)
- **2024-03-27** [Austin Apache DataFusion Meetup](https://github.com/apache/datafusion/discussions/8522). [slides](https://docs.google.com/presentation/d/1S51TK8waxHEJaxi_-uiSMrgQZ09m_hfaasPk5X5ExEY), [recording](https://www.youtube.com/watch?v=q1N3pH3tFw8)
Oh yeah, my only suggestion is to rename Content Library to Database Concepts Library or similar.
Sounds good -- thanks @comphead -- I will try and get this first draft in the next day or two
Let's merge this one in and we can continue iterating on the content as follow on PRs
Thanks again @comphead and @SamSynnada
- **2024-06-26** [New York City Apache DataFusion Meetup](https://lu.ma/2iwba0xm). [slides](https://docs.google.com/presentation/d/1dOLPAFPEMLhLv4NN6O9QSDIyyeiIySqAjky5cVgdWAE/edit#slide=id.g26bebde4fcc_3_7)
- **2024-06-25** [San Francisco Bay Area Apache DataFusion Meetup](https://lu.ma/6bphole2). [slides](https://docs.google.com/presentation/d/1Oz2yGllrWBkNGyiRMLr8qXTt4vmvtJWuI_weGThaZak/edit#slide=id.g26bebde4fcc_3_7)
- **2024-03-27** [Austin Apache DataFusion Meetup](https://github.com/apache/datafusion/discussions/8522). [slides](https://docs.google.com/presentation/d/1S51TK8waxHEJaxi_-uiSMrgQZ09m_hfaasPk5X5ExEY), [recording](https://www.youtube.com/watch?v=q1N3pH3tFw8)
- **2024-03-26** [Seattle Apache DataFusion Meetup](
This event seems incomplete and lacks links.
Thank you. Nice catch. -- PR to fix: #13445
Looking for help
Does anyone know how to automatically format the content links in the same / similar manner as the Notion page?
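One possible starting point, offered as a rough sketch rather than a tested part of this PR: a small Python helper that rewrites bullets of the form `- **DATE** URL` (or `- **DATE**URL`, as some exported lines appear) into `- **DATE** [title](url)`. The names `BARE_ENTRY`, `title_from_url`, and `format_entry` are hypothetical, and the title is only a placeholder guessed from the URL slug. Reproducing Notion's real page titles would need a separate lookup, and slug-less links such as YouTube URLs come out with poor placeholders.

```python
import re
from urllib.parse import urlparse

# Matches a bullet whose only content after the bold date is a bare URL,
# with or without a space between the closing "**" and the URL.
BARE_ENTRY = re.compile(r"^(\s*- \*\*[^*]+\*\*)\s*(https?://\S+)\s*$")


def title_from_url(url: str) -> str:
    """Guess a placeholder title from the last path segment (assumption:
    the slug resembles the page title); fall back to the host name."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    if not segments:
        return parsed.netloc
    # Turn "sliding-window-hash-join" into "Sliding window hash join".
    return re.sub(r"[-_]+", " ", segments[-1]).strip().capitalize()


def format_entry(line: str) -> str:
    """Wrap a bare URL in a markdown link; leave every other line unchanged."""
    match = BARE_ENTRY.match(line)
    if match is None:
        return line
    prefix, url = match.groups()
    return f"{prefix} [{title_from_url(url)}]({url})"


if __name__ == "__main__":
    sample = "- **2023-07-28**https://www.synnada.ai/blog/sliding-window-hash-join-swhj"
    print(format_entry(sample))
```

Entries that already contain a markdown link do not match the pattern and pass through untouched, so the helper could be run over the whole exported file line by line.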
Which issue does this PR close?
Related to
Rationale for this change
@SamSynnada created a wonderful list of DataFusion-related content here, and I think posting it to the DataFusion website would be great.
What changes are included in this PR?
Add a new page with the content in the 👉 DF Content Library
Are these changes tested?
N/A
Are there any user-facing changes?