dbt + Trino: Starburst Galaxy COVID-19 tutorial!

There's a not-insignificant amount of setup work. The entire value prop of Trino and Galaxy is being able to grab and transform data regardless of where it lives. To demo this, you have to create at least one place for data to live and put data into it. Then you must set up a Galaxy account and give it access to the external data stores as well as to the location where output data will be stored. The silver lining is that you only have to do this once, ever!

For the data source setup required for this tutorial, please see INFRA_SETUP.MD.

This demo can be used with either dbt Core or dbt Cloud. Both require you to complete the steps in INFRA_SETUP.MD to set up the appropriate data sources.

What you'll need:

A Starburst Galaxy account. This is the easiest way to get up and running with Trino and see the power of Trino + dbt.
An AWS account to connect a catalog to S3. AWS will act as both a source and a target catalog in this example.
Any Snowflake login. Sign up for a free account. You don't strictly need Snowflake for the demo, but skipping it means you will have to alter some models yourself.

Why are we using so many data sources? Well, for this data lakehouse tutorial we will take you through all the steps of creating a reporting structure, including the steps to get your sources into your land layer in S3. Starburst Galaxy's superpower with dbt is being able to federate data from multiple sources into one dbt repository. Showing multiple sources demonstrates this use case in addition to the data lakehouse use case. If you are only interested in using S3, you can run all the TPCH and AWS models without creating a Snowflake login. The Snowflake section will fail, but the rest should complete.

You will also need:

A dbt installation of your choosing (Core or Cloud).
For Core: I used a virtual environment on my M1 Mac because that was the most commonly recommended approach. I'll add the steps below in this readme. Review the other dbt Core installation information to pick what works best for you.
For Cloud: I registered for a free account and used this repository in dbt Cloud. This option requires fewer first-time setup steps. If you don't know what to pick, use this.

Tutorial Information

The goal of this tutorial is to showcase the power of dbt + Starburst Galaxy together. It aims to demonstrate two superpowers:

Query federation across multiple data sources - dbt specializes as a transform tool, so on its own it can only be used after the data has landed in a storage solution. Starburst Galaxy removes that limitation by letting you query your data across multiple sources where it already lives.
Data lakehouse analytics - In this lab, we are going to build our lakehouse reporting structure in S3 and use slightly different names from the traditional Land, Structure, and Consume layers to accommodate dbt standards. Land = Stage, Structure = Intermediate, Consume = Aggregate. For more information about the Starburst data lakehouse, visit this blog.
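
To make the two ideas above a bit more concrete, here is a minimal sketch of what an Intermediate (Structure) layer dbt model could look like when it joins two Stage (Land) models that read from different catalogs. Everything named here is an assumption for illustration only: the file name, the staging models stg_aws_covid_cases and stg_tpch_nation, and all column names are hypothetical and are not taken from this repository's models.

```sql
-- models/intermediate/int_covid_cases_by_nation.sql  (hypothetical file name)
-- Sketch only: model and column names below are illustrative assumptions.

{{ config(materialized='table') }}

with cases as (

    -- hypothetical Stage model built on data landed in the S3 catalog
    select
        country_name,
        confirmed_cases
    from {{ ref('stg_aws_covid_cases') }}

),

nations as (

    -- hypothetical Stage model built on the TPCH catalog
    select
        nation_name
    from {{ ref('stg_tpch_nation') }}

)

-- One query joins data that originated in two different sources;
-- Trino resolves each side against its own catalog.
select
    n.nation_name,
    sum(c.confirmed_cases) as total_confirmed_cases
from cases as c
inner join nations as n
    on c.country_name = n.nation_name
group by
    n.nation_name
```

An Aggregate (Consume) layer model would then select from this intermediate model with another ref call, so each layer only depends on the one beneath it.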

dbt Core

For the dbt Core tutorial, visit this blog for more information. Use CORE.MD as the README to run this demo using dbt Core.

dbt Cloud

For the dbt Cloud tutorial, visit this blog for more information. Use CLOUD.MD as the README to run this demo using dbt Cloud.

Shoutouts

Shout out to @dataders for his awesome help! Inspired by the Cinco de Trino repo by @jtcohen6!
