If you are new to dlt, complete the Getting started guide and the Walkthroughs so you get a feel for what dlt is and how people will use your sources and example pipelines.
We strongly suggest that you build your sources out of existing building blocks:
- Declare your resources and group them in sources using Python decorators.
- Connect the transformers to the resources to load additional data or enrich it
- Create your resources dynamically from data
- Append, replace and merge your tables
- Transform your data before loading and see some examples of customizations like column renames and anonymization
- Set up "last value" incremental loading
- Dispatch data to several tables from a single resource
- Set primary and merge keys, define the columns nullability and data types
- Pass config and credentials into your sources and resources
- Use Google OAuth2 and service account credentials, use database connection strings, and define your own complex credentials: see the examples below
Concepts to grasp:
- Credentials and how they work "under the hood"
- Schemas, naming conventions and data normalization.
- How we distribute sources to our users
Building blocks used right:
- Create dynamic resources for tables by reflecting a whole database
- Incrementally dispatch GitHub events to separate tables
- Read the participants for each deal using transformers and the pipe operator
- Read the events for each ticket by attaching a transformer to a resource explicitly
- Set the `tags` column data type to complex to load them as JSON/struct
- Typical use of `merge` with incremental loading for endpoints returning a list of updates to entities in the Shopify source
- A `dlt` mega-combo in the `pipedrive` source, where the deals from the `deal` endpoint are fed into the `deals_flow` resource to obtain events for a particular deal. Both resources use the `merge` write disposition and incremental load to get just the newest updates. The `deals_flow` resource dispatches different event types to separate tables with `dlt.mark.with_table_name`
- An example of using a JSONPath expression to get the cursor value for incremental loading. In pipedrive some objects have a `timestamp` property and others `update_time`. The `dlt.sources.incremental('update_time|modified')` expression lets you bind the incremental to either
- If your source/resource needs Google credentials, just use the `dlt` built-in credentials, as we do in the google sheets and google analytics sources. Also note how `credentials.to_native_credentials()` is used to initialize the Google API client
- If your source/resource accepts several different credential types, see how we deal with 3 different types of Zendesk credentials
- See database connection string credentials applied to the sql_database source