If you are new to dlt, complete the Getting started guide and the Walkthroughs so you get a feel for what dlt is and how people will use your sources and example pipelines.
We strongly suggest that you build your sources out of existing building blocks:
- Declare your resources and group them in sources using Python decorators.
- Connect the transformers to the resources to load additional data or enrich it
- Create your resources dynamically from data
- Append, replace and merge your tables
- Transform your data before loading and see some examples of customizations like column renames and anonymization
- Set up "last value" incremental loading
- Dispatch data to several tables from a single resource
- Set primary and merge keys, define the columns nullability and data types
- Pass config and credentials into your sources and resources
- Use Google OAuth2 and service account credentials, use database connection strings, and define your own complex credentials: see the examples below
Concepts to grasp:
- Credentials and how they work "under the hood"
- Schemas, naming conventions and data normalization.
- How we distribute sources to our users
Building blocks used right:
- Create dynamic resources for tables by reflecting a whole database
- Incrementally dispatch GitHub events to separate tables
- Read the participants for each deal using transformers and the pipe operator
- Read the events for each ticket by attaching a transformer to a resource explicitly
- Set the `tags` column data type to complex to load them as JSON/struct
- Typical use of `merge` with incremental loading for endpoints returning a list of updates to entities in the Shopify source
- A `dlt` mega-combo in the `pipedrive` source, where the deals from the `deal` endpoint are fed into the `deals_flow` resource to obtain events for a particular deal. Both resources use the `merge` write disposition and incremental load to get just the newest updates. The `deals_flow` resource dispatches different event types to separate tables with `dlt.mark.with_table_name`
- An example of using a JSONPath expression to get the cursor value for incremental loading. In pipedrive some objects have a `timestamp` property and others `update_time`. The `dlt.sources.incremental('update_time|modified')` expression lets you bind the incremental to either
- If your source/resource needs Google credentials, just use the `dlt` built-in credentials, as we do in the google sheets and google analytics sources. Also note how `credentials.to_native_credentials()` is used to initialize the Google API client
- If your source/resource accepts several different credential types, see how we deal with 3 different types of Zendesk credentials
- See database connection string credentials applied to the sql_database source