jonathanc-n / datapapers

Papers on Data Warehouses, Lakes, Lakehouses

Contents

Data Warehouses
Data Lakes
Data Lakehouses

Data Warehouses

The Snowflake Elastic Data Warehouse (Snowflake)
Yellowbrick: An Elastic Data Warehouse on Kubernetes (Yellowbrick Data)
Amazon Redshift Re-invented (Amazon Web Services)
Napa: Powering Scalable Data Warehousing with Robust Query Performance at Google (Google)
WarpGate: A Semantic Join Discovery System for Cloud Data Warehouses (Sigma Computing)
Predicate Caching: Query-Driven Secondary Indexing for Cloud Data Warehouses (Amazon Web Services)
ByteCard: Enhancing ByteDance’s Data Warehouse with Learned Cardinality Estimation (ByteDance)
Amazon Redshift and the Case for Simpler Data Warehouses (Amazon Web Services)
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing (Hortonworks)
Presto: SQL on Everything (Meta)

Data Lakes

Data lake: a new ideology in big data era (USTB)
Discovering Related Data At Scale (Microsoft)
Data Wrangling: The Challenging Journey from the Wild to the Lake (IBM)
Amalur: Next-generation Data Integration in Data Lakes (TU Delft)
Accelerating Raw Data Analysis with the ACCORDA Software and Hardware Architecture (UChicago)
BtrBlocks: Efficient Columnar Compression for Data Lakes (FAU, TUM)
JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes (UofT)

Data Lakehouses

Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics (Databricks)
Shared Foundations: Modernizing Meta’s Data Lakehouse (Meta)
Photon: A Fast Query Engine for Lakehouse Systems (Databricks)
Analyzing and Comparing Lakehouse Storage Systems (Databricks)
Deep Lake: a Lakehouse for Deep Learning (Activeloop)
BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse (Google)
Adaptive and Robust Query Execution for Lakehouses at Scale (Databricks)
Petabyte-Scale Row-Level Operations in Data Lakehouses (Apple)