PR #13 includes the following updates:
- Added the HubSpot `engagement` source table to the package and made the following updates:
  - Added the `stg_rag_hubspot__engagement` model as part of the HubSpot staging models and updated relevant documentation.
  - Updated the `int_rag_hubspot__deal_document` joins so that the `stg_rag_hubspot__engagement` table is joined first, ahead of the `stg_rag_hubspot__engagement_contact` and `stg_rag_hubspot__engagement_company` tables, to bring in all necessary engagement records.
  - Updated `int_rag_hubspot__deal_document` to retrieve `engagement_type` from the HubSpot `engagement` table, as opposed to the `engagement_email` and `engagement_note` tables. References to those two tables were removed because they are no longer used in this model.
- Updated the `unique_id` in `rag__unified_document` to include `chunk_index`. Previously, the unique key was a combination of only `document_id`, `platform`, and `source_relation`, which was potentially inaccurate if there were multiple chunks associated with a document.
- Updated the hubspot_x seed data and get_hubspot_x_columns macros with the new `category` field where relevant.
- Updated missing field descriptions in the HubSpot documentation.
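To confirm that the revised `unique_id` is unique at the chunk level, a validation query along the following lines can be run against the end model. This is a minimal sketch rather than a test shipped with the package: it assumes the model is reachable as `rag__unified_document` in your current schema, and it uses only the columns named in the note above.

```sql
-- Any rows returned are unique_id values shared by more than one record.
select
    unique_id,
    count(*) as record_count
from rag__unified_document
group by unique_id
having count(*) > 1;

-- The same check at the grain the key is described as covering.
select
    document_id,
    platform,
    source_relation,
    chunk_index,
    count(*) as record_count
from rag__unified_document
group by document_id, platform, source_relation, chunk_index
having count(*) > 1;
```

If the key behaves as described, both queries should return no rows. Dropping `chunk_index` from the second query shows how many documents are split into multiple chunks.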
PR #9 includes the following updates:
- Updated the `url` logic in `stg_rag_zendesk__ticket` to provide the proper clickable URL to Zendesk tickets. This way, the `url_reference` in the `rag__unified_document` properly generates a hyperlink for Zendesk documents.
  - As this is updating underlying data flowing into the incremental model, a full refresh is required.
PR #7 includes the following updates:
- For Snowflake destinations, we have removed the post-hook from the `rag__unified_document` which generated the `rag__unified_search` Cortex Search Service.
  - While the Search Service worked when deployed locally, there were issues identified when deploying and running via Fivetran Quickstart. In order to ensure Snowflake users are still able to take advantage of the `rag__unified_document` end model, we have removed the Search Service from execution until we are able to verify it works as expected on all supported orchestration methods.
  - If you would like, you can generate your own Snowflake Cortex Search Service by following the Create Cortex Search Service guidelines provided by Snowflake. For additional assistance, you can structure your Cortex Search Service off of the below query to effectively leverage the `rag__unified_document` generated from this data model.

    ```sql
    -- Cortex Search Service created using the rag__unified_document model
    create cortex search service if not exists <your_schema>.<your_new_search_service_name>
        on chunk
        attributes unique_id
        warehouse = <your_warehouse>
        target_lag = '1 days' -- You can specify this to your liking
        as (
            select *
            from rag__unified_document
        )
    ```
- Adjusted the `cluster_by` configuration within the `dbt__unified_rag` to cluster by the `update_date` (previously `unique_id`) for improved Snowflake performance.
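For readers unfamiliar with how clustering is set in dbt on Snowflake, the change can be pictured as a model-level config block like the one below. This is an illustrative sketch, not the package's actual source: only the `cluster_by` value comes from the note above, while the `materialized` and `unique_key` settings are assumptions, and the model SQL itself is omitted.

```sql
-- Hypothetical config block; only cluster_by reflects the change described above.
{{
    config(
        materialized='incremental',    -- assumption: the end model is described as incremental
        unique_key='unique_id',        -- assumption for illustration
        cluster_by=['update_date']     -- previously clustered by unique_id
    )
}}
```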
This is the initial release of the Unified RAG dbt package!
The main focus of this dbt package is to generate an end model and Cortex Search Service (for Snowflake destinations only) which contains the below relevant unstructured document data to be used for Retrieval Augmented Generation (RAG) applications leveraging Large Language Models (LLMs):
The following table provides a detailed list of all models materialized within this package by default.
TIP: See more details about these models in the package's dbt docs site.
Table | Description |
---|---|
rag__unified_document | Each record represents a chunk of text prepared for semantic-search and additional fields for use in LLM workflows. |
Additionally, for Snowflake destinations, a Cortex Search Service will be generated as a result of this data model. The Cortex Search Service uses the results of the `rag__unified_document` and enables Snowflake users to take advantage of low-latency, high quality "fuzzy" search over their data for use in RAG applications leveraging LLMs. See the below table for details.
Snowflake Cortex Search Service | Description |
---|---|
rag__unified_search | Generates a Snowflake Cortex Search Service via the `search_generation` macro as a post-hook for Snowflake destinations. This Cortex Search Service is currently configured with a target lag of 1 day. Please be aware that this search service will refresh automatically once a day, even outside of this data model's execution. To understand more about the Cortex Search Service, you can run `SHOW CORTEX SEARCH SERVICES` in the Snowflake database and schema in which `rag__unified_document` is materialized. See here for other relevant commands to use for understanding the nature of the Search Service, and here for helpful commands to use when leveraging the results of the Cortex Search Service in your LLM applications. |
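
The inspection commands mentioned in the table can be sketched as follows. This is a hedged example, not output from the package: the database and schema names are placeholders, the query text is invented for illustration, and the `chunk` and `unique_id` columns assume the service is defined on `chunk` with `unique_id` as an attribute.

```sql
-- List the Cortex Search Services in the schema where the model is materialized.
show cortex search services in schema <your_database>.<your_schema>;

-- Inspect the definition of a specific service.
describe cortex search service <your_database>.<your_schema>.rag__unified_search;

-- Preview results for a test query.
-- Assumption: the service exposes the chunk and unique_id columns.
select parse_json(
    snowflake.cortex.search_preview(
        '<your_database>.<your_schema>.rag__unified_search',
        '{"query": "refund request", "columns": ["chunk", "unique_id"], "limit": 5}'
    )
)['results'] as results;
```

The same commands apply to a service you create yourself; substitute its name for `rag__unified_search`.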