
Commit

Merge branch 'main' into Release/MediaPlayer/0.9.0
ilias1111 authored Oct 15, 2024
2 parents 30be731 + 8a19edd commit 92e477b
Showing 13 changed files with 1,381 additions and 10 deletions.
@@ -14,9 +14,10 @@ import {versions} from '@site/src/componentVersions';


<ReactMarkdown children={`
| snowplow-unified version | dbt versions | BigQuery | Databricks | Redshift | Snowflake | Postgres |
| -------------------------- | ------------------- | :------: | :--------: | :------: | :-------: | :------: |
| ${versions.dbtSnowplowUnified} | >=1.6.0 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ✅ |
| snowplow-unified version | dbt versions | BigQuery | Databricks | Redshift | Snowflake | Postgres | Spark |
| -------------------------- | ------------------- | :------: | :--------: | :------: | :-------: | :------: | :---: |
| ${versions.dbtSnowplowUnified} | >=1.6.0 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| 0.4.5 | >=1.6.0 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
`} remarkPlugins={[remarkGfm]} />


@@ -44,6 +44,23 @@ To optimize performance of large Postgres datasets you can create [indexes](http
}}
```

### Spark

For Spark environments, Iceberg is currently the supported file format for external tables. We have successfully tested this setup using both Glue and Thrift as connection methods. To use these models, create an external table over your Iceberg data in Spark and point your dbt models at this table.

Here's an example `profiles.yml` configuration for Spark using Thrift:
``` yaml
spark:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift          # connect via the Spark Thrift server
      host: localhost
      port: 10000             # default Thrift server port
      schema: default         # schema (database) where dbt builds the models
```
In your `dbt_project.yml`, the `file_format` is set to `iceberg` by default for Spark. While you can override this in your project's dbt YAML file to use a different file format, please note that Iceberg is currently the only officially supported format.
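
As a rough sketch of where that setting lives, the following pins the file format explicitly; the `snowplow_unified` model scope is an assumption here, so adjust it to the package you are actually running:
``` yaml
# Hypothetical dbt_project.yml excerpt; the `snowplow_unified` scope is assumed.
# Iceberg is the default (and currently the only officially supported) format on Spark.
models:
  snowplow_unified:
    +file_format: iceberg
```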


### Databricks

You can connect to Databricks using either the `dbt-spark` or the `dbt-databricks` connector. The `dbt-spark` adapter does not let dbt take advantage of certain features that are unique to Databricks and that are available through the `dbt-databricks` adapter. Where possible, we recommend using the `dbt-databricks` adapter.
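
For reference, below is a minimal `profiles.yml` sketch for the `dbt-databricks` adapter; the catalog, schema, host, HTTP path, and token values are placeholders, and the exact settings depend on your workspace:
``` yaml
# Illustrative dbt-databricks profile; all values below are placeholders.
databricks:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: hive_metastore                        # or a Unity Catalog name
      schema: analytics
      host: <your-workspace>.cloud.databricks.com
      http_path: /sql/1.0/warehouses/<warehouse-id>
      token: <personal-access-token>
```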
@@ -51,16 +51,14 @@ While using any entity in our packages is possible thanks to [modeling entities]
| [YAUAA](/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol/ootb-data/device-and-browser/index.md#yauaa-context-for-user-agent-parsing) | web | snowplow__enable_yauaa |
| [IAB](/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol/ootb-data/device-and-browser/index.md#iab-context-for-spiders-and-robots) | web | snowplow__enable_iab |
| [UA](/docs/enriching-your-data/available-enrichments/ua-parser-enrichment/index.md) | web | snowplow__enable_ua |
| [Browser](/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol/ootb-data/device-and-browser/index.md#browser-context) | web | snowplow__enable_browser_context |
| [Browser](/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol/ootb-data/device-and-browser/index.md#browser-context) | web | snowplow\__enable_browser_context, snowplow\__enable_browser_context_2 (depending on the schema versions tracked; when both are enabled, the values are coalesced) |
| [Mobile](/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol/ootb-data/device-and-browser/index.md#mobile-context) | mobile | snowplow__enable_mobile_context |
| [Geolocation](/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol/ootb-data/geolocation/index.md#geolocation-context-entity-tracked-in-apps) | mobile | snowplow__enable_geolocation_context |
| [Application](/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol/ootb-data/app-information/index.md#application-context-entity-on-mobile-apps) | mobile | snowplow__enable_application_context |
| [Screen](/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol/ootb-data/page-and-screen-view-events/index.md#screen-view-events) | mobile | snowplow__enable_screen_context |
| [Deep Links](/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol/ootb-data/links-and-referrers/index.md#context-entity-attached-to-screen-view-events) | mobile | snowplow__enable_deep_link_context |
| [Screen Summary](/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol/ootb-data/page-activity-tracking/index.md#screen-summary-entity) | mobile | snowplow__enable_screen_summary_context |
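
As a hedged illustration, these flags are typically switched on as vars in your `dbt_project.yml`; the `snowplow_unified` scope below is an assumption, and you should only enable the contexts you actually track:
``` yaml
# Illustrative sketch only; the variable names come from the table above,
# the `snowplow_unified` scope is an assumption.
vars:
  snowplow_unified:
    snowplow__enable_yauaa: true
    snowplow__enable_browser_context: true
    snowplow__enable_mobile_context: true
```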



### Optional Modules
| Module | Docs | Enabled via Variable |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------- |
@@ -27,7 +27,7 @@ At time of writing, `Iceberg` is the preferred file format for Snowflake [iceber
Note that, compared to the other loaders for Snowflake, field names in Self-describing events and Entities are converted to `snake_case` format (the other loaders retain the format used in the schema, often `camelCase`); for example, a schema field `sessionId` would appear as `session_id`. You will need to adjust other variables and inputs accordingly compared to what you may find in the docs.

# Spark
Currently using spark directly as a compute engine is not supported for our packages.
At time of writing, `Iceberg` is the supported file format for Spark external tables. We've tested this using Glue and Thrift as connection methods. If you have your event data in Iceberg format in a lake, you should be able to run the models by pointing the packages at a Spark deployment connected to that lake. For more information on setting up dbt with Spark using Thrift, please refer to the [dbt Spark documentation on Thrift](https://docs.getdbt.com/docs/core/connect-data-platform/spark-setup#thrift).
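
As a rough sketch (the variable names, scope, and values below are assumptions rather than confirmed package defaults; check the package's own `dbt_project.yml` for the exact vars), pointing a package at an external Iceberg events table might look like this:
``` yaml
# Illustrative only; the `snowplow_unified` scope, var names, and values are assumptions.
vars:
  snowplow_unified:
    snowplow__atomic_schema: atomic    # schema containing the external Iceberg events table
    snowplow__events_table: events     # external table created in Spark over the Iceberg data
```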

# Redshift (spectrum)
Currently using Redshift Spectrum tables is not supported for our packages due to [limitations](https://docs.aws.amazon.com/redshift/latest/dg/nested-data-restrictions.html) with the platform.
(5 changed files in this commit could not be rendered in the diff view.)
@@ -56,4 +56,50 @@ Data Products created prior to the release of [Source Applications](../../organi

Event Specifications which contain previously added application IDs will need to be updated to use the identifiers inherited from the Source Applications selected at the Data Product level. This process can be done manually, but you can also reach out to our Support team for help by either logging a request through our Snowplow [BDP Console](https://console.snowplowanalytics.com/) or emailing [[email protected]](mailto:[email protected]) directly.

![Updating existing Event Specifications](images/edit-existing-event-specification.png)

## Upgrading Event Specification Instructions

When working with the Event Specifications of a Data Product, it’s essential to account for the evolution of the underlying [Data Structures](../../managing-your-data-structures/index.md). Data Structures define reusable JSON schemas, which can be referenced by different Event Specifications (events and entities). Each Event Specification may contain instructions that rely on a specific version of a Data Structure, adding another layer that specializes or constrains Event Specifications in a more granular way.

### Versioning of Data Structures

As data and events evolve, Data Structures may be updated to new versions, which can be either compatible or incompatible with previous ones. These changes may conflict with the instructions in Event Specifications (both events and entities) that reference an older version of the Data Structure.

### Semi-Automatic Upgrade of Event Specifications via the UI

To streamline the process of upgrading an Event Specification to the latest version of a Data Structure, we’ve implemented a mechanism that allows you to update Event Specification instructions through the UI. Here’s how it works:

When a new version of a Data Structure becomes available, the system will indicate that the events or entities referencing that Data Structure have a new version available, showing an **'Upgrade'** button in the UI.

![Upgrade Event Specification warning](images/upgrade-event-specification-warning.png)

Clicking the button navigates to a new page that informs the user of the new version they are upgrading to, along with a **'View Changes'** button.

![Upgrade Event Specification page](images/upgrade-event-specification-page.png)

When clicked, this shows the differences between the current version of the Data Structure and the one the user intends to upgrade to.

![Upgrade Event Specification diff](images/upgrade-event-specification-diff.png)

At the bottom, a button will allow users to confirm the upgrade. One of two things can happen when the upgrade is confirmed:

#### 1. Successful automatic upgrade

- If the Event Specification instructions are compatible with the new Data Structure version, the system will automatically upgrade the Event Specification to the latest version of the Data Structure.
- All instructions will be updated seamlessly without any further user intervention.

![Automatic upgrade Event Specification](images/success_upgrade.png)

#### 2. Conflict detection and resolution

If the new version of the Data Structure introduces incompatibilities with the existing Event Specification instructions, the system will flag the conflicting properties.

- The UI will prompt the user to resolve these conflicts before the Event Specification can be upgraded.
- The conflict resolution UI provides options for the user to modify or delete each instruction, depending on the type of incompatibility:
- **Remove conflicting instructions**: If a specific property is no longer present in the new Data Structure.
- **Modify conflicting instructions**: If a property in the new Data Structure has been changed in an incompatible way (e.g., type change, added/removed enum values, added pattern, etc.).

![Conflict resolution Event Specification](images/conflict_resolution.png)

This mechanism ensures that teams can benefit from updated Data Structures while maintaining the integrity and accuracy of their Event Specifications. Users are empowered to make informed decisions during the upgrade process, with clear visual cues and options to handle conflicts effectively.
src/componentVersions.js (4 changes: 2 additions & 2 deletions)
@@ -42,11 +42,11 @@ export const versions = {
// Data Modelling
// dbt
dbtSnowplowAttribution: '0.3.0',
dbtSnowplowUnified: '0.4.5',
dbtSnowplowUnified: '0.5.0',
dbtSnowplowWeb: '1.0.1',
dbtSnowplowMobile: '1.0.0',
dbtSnowplowUtils: '0.16.8',
dbtSnowplowMediaPlayer: '0.9.0',
dbtSnowplowUtils: '0.17.0',
dbtSnowplowNormalize: '0.3.5',
dbtSnowplowFractribution: '0.3.6',
dbtSnowplowEcommerce: '0.8.2',