Release/snowplow media player/0.7.0 #66

Merged
merged 7 commits into from
Dec 7, 2023
4 changes: 1 addition & 3 deletions .github/CODEOWNERS
@@ -1,3 +1 @@
* @emielver
* @agnessnowplow
* @georgewoodhead
* @snowplow/com-snowplowanalytics-engineering-datavalue-integrations
24 changes: 24 additions & 0 deletions CHANGELOG
@@ -1,3 +1,27 @@
snowplow-media-player 0.7.0 (2023-12-07)
---------------------------------------
## Summary
This version adds new features powered by a complete refactor of the package's core processing, which now uses the new `base` macro functionality provided in `snowplow_utils`. Users can now specify custom fields for sessionization and user identification, add custom entity/SDE fields to the base events table on Redshift/Postgres, and add passthrough fields to the derived tables, making it easier to add your own fields to our tables.

The default session identifier has been changed from the domain_sessionid to the media session id (or the page/screen view id if the media session entity is not set). Previously, media events from a play that overlapped into a new domain_sessionid were discarded; this update ensures the complete media play is modeled. It is still possible to perform the original session-level analysis using the new `domain_sessionid_array` field.

In addition this release adds a more robust unique media identifier. This fixes an issue where duplicate `media_id` values could occur in the media stats table as a result of incorrect tracking implementation (e.g. sharing the same media label across different media types). This release also fixes the incremental materialization of the media_ad_views table by adding a unique primary key.

## Features
- Migrate base models to the new `base` macros for flexibility and consistency
- Update the default session identifier to be the media session id (or page/screen view id if the media session entity is not set)
- Add ability to pass fields through to derived media base and ad views tables
- Add new field `domain_sessionid_array` to derived tables (where applicable)

## Fixes
- Add unique media identifier (close #59)
- Add missing primary key to media_ad_views
- Fix field names in custom session stats model yaml (close #63)
- Fix playback_quality_field macro (close #60)

## 🚨 Breaking Changes 🚨
This version requires a full refresh run if you have been using any previous versions. You will not be able to upgrade and have the package work without doing a full refresh. Check out the [migration guide](https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/migration-guides/media-player/) for more information when you upgrade.
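
The session and user identifier overrides mentioned in the summary are set through project variables. As a minimal sketch, assuming the list-of-dictionaries shape with `schema` and `field` keys used by the `snowplow_utils` base macros (treat the key names and the `atomic` schema value as assumptions to verify against the package documentation), a project that wants to keep sessionizing on `domain_sessionid` might configure something like:

```yaml
# dbt_project.yml of the consuming project (illustrative only)
vars:
  snowplow_media_player:
    # Assumption: "atomic" refers to a column on the events table itself,
    # and each entry names the field holding the identifier.
    snowplow__session_identifiers: [{"schema": "atomic", "field": "domain_sessionid"}]
    snowplow__user_identifiers: [{"schema": "atomic", "field": "domain_userid"}]
```

Changing the identifier changes how events are grouped into sessions, so a change like this would presumably need the same full refresh as the upgrade itself.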

snowplow-media-player 0.6.1 (2023-10-04)
---------------------------------------
## Summary
6 changes: 4 additions & 2 deletions README.md
@@ -4,7 +4,7 @@

# dbt-snowplow-media-player

A fully incremental model that transforms media player event data into derived tables for easier querying generated by the Snowplow [JavaScript tracker][javascript-tracker] in combination with media tracking specific plugins such as the [Media Tracking plugin][media-tracking] or the [YouTube Tracking plugin][youtube-tracking].
A fully incremental model that transforms media player event data into derived tables for easier querying generated by the Snowplow [JavaScript tracker][javascript-tracker] in combination with media tracking specific plugins such as the [Media Tracking plugin][media-tracking] or the [YouTube Tracking plugin][youtube-tracking]. The package also supports media events generated by the Snowplow [iOS and Android trackers][mobile-media-tracker-docs].

Please refer to the [doc site][snowplow-media-player-docs] for a full breakdown of the package.

@@ -57,7 +57,7 @@ The package contains multiple staging models however the mart models are as foll
|------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| snowplow_media_player_base | A table summarizing media player events by media and pageview including impressions. |
| snowplow_media_player_plays_by_pageview | A view summarizing media plays by media on a pageview level. |
| snowplow_media_player_media_stats | An aggregated table of media metrics on a media_id level. |
| snowplow_media_player_media_stats | An aggregated table of media metrics on a media_identifier level. |
| snowplow_media_player_media_ad_views | A view summarizing each ad viewed within a media playback (only for v2 schemas, see above). |
| snowplow_media_player_media_ads | An aggregated table of ad metrics for each ad played within each media content (only for v2 schemas, see above). |

@@ -122,3 +122,5 @@ limitations under the License.

[snowplow-media-player-docs-dbt]: https://snowplow.github.io/dbt-snowplow-media-player/#!/overview/snowplow_media_player
[snowplow-media-player-docs]: https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-models/dbt-media-player-data-model/

[mobile-media-tracker-docs]: https://docs.snowplow.io/docs/collecting-data/collecting-from-own-applications/mobile-trackers/tracking-events/media-tracking/
31 changes: 17 additions & 14 deletions dbt_project.yml
@@ -1,5 +1,5 @@
name: 'snowplow_media_player'
version: '0.6.1'
version: '0.7.0'
config-version: 2

require-dbt-version: ['>=1.4.0', '<2.0.0']
@@ -32,6 +32,7 @@ vars:
snowplow__dev_target_name: 'dev'
# snowplow__atomic_schema: 'atomic' # Only set if not using 'atomic' schema for Snowplow events data
# snowplow__database: # Only set if not using target.database for Snowplow events data -- WILL BE IGNORED FOR DATABRICKS
# snowplow__events_table: 'events' # Only set if not using 'events' table for Snowplow events data

# Variables - Operation and logic
snowplow__complete_play_rate: 0.99
@@ -48,8 +49,12 @@ vars:
snowplow__upsert_lookback_days: 30
snowplow__allow_refresh: false
snowplow__app_id: []

snowplow__session_timestamp: collector_tstamp # Used to manage utils version higher than 0.15.1, do not change until new base macro is used
snowplow__session_timestamp: collector_tstamp
# please refer to the macros within identifiers.sql for default session and user values
snowplow__session_identifiers: []
# snowplow__session_sql: 'sc.session_id' # takes priority over session_identifiers
snowplow__user_identifiers: []
# snowplow__user_sql: 'sc.user_id' # takes priority over user identifiers

# Variables - Contexts, filters, and logs
# please set any of the below three variables to true if the related context schemas are enabled for your warehouse, please note it cannot be used to filter the data:
@@ -67,6 +72,9 @@ vars:
snowplow__enable_web_events: true
snowplow__enable_mobile_events: false
snowplow__enable_ad_quartile_event: false
# add extra custom fields:
snowplow__base_passthroughs: []
snowplow__ad_views_passthroughs: []

# Variables - Warehouse Specific
snowplow__media_player_event_context: 'com_snowplowanalytics_snowplow_media_player_event_1'
@@ -85,17 +93,20 @@ vars:
snowplow__derived_tstamp_partitioned: true
snowplow__query_tag: 'snowplow_dbt'
snowplow__enable_load_tstamp: true
snowplow__entities_or_sdes: []
# Databricks Only
# Depending on the use case it should either be the catalog (for Unity Catalog users from databricks connector 1.1.1 onwards) or the same value as your snowplow__atomic_schema (unless changed it should be 'atomic')
# snowplow__databricks_catalog: 'hive_metastore'

# Completely or partially remove models from the manifest during run start.
on-run-start:
- '{{ snowplow_media_player_delete_from_manifest(var("models_to_remove",[])) }}'
- '{{ snowplow_utils.snowplow_delete_from_manifest(var("models_to_remove",[])) }}'
# Check inconsistencies within the variable setup.
- '{{ snowplow_media_player.config_check() }}'

# Update manifest table with last event consumed per sucessfully executed node/model
# Update manifest table with last event consumed per successfully executed node/model
on-run-end:
- '{{ snowplow_utils.snowplow_incremental_post_hook("snowplow_media_player") }}'
- '{{ snowplow_utils.snowplow_incremental_post_hook("snowplow_media_player", "snowplow_media_player_incremental_manifest", "snowplow_media_player_base_events_this_run", var("snowplow__session_tstamp", "collector_tstamp")) }}'

models:
snowplow_media_player:
@@ -104,14 +115,6 @@ models:
base:
manifest:
+schema: "snowplow_manifest"
bigquery:
+enabled: "{{ target.type == 'bigquery' | as_bool() }}"
databricks:
+enabled: "{{ target.type in ['databricks', 'spark'] | as_bool() }}"
default:
+enabled: "{{ target.type in ['redshift', 'postgres'] | as_bool() }}"
snowflake:
+enabled: "{{ target.type == 'snowflake' | as_bool() }}"
scratch:
+schema: 'scratch'
+tags: 'scratch'
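
The new `snowplow__base_passthroughs`, `snowplow__ad_views_passthroughs` and `snowplow__entities_or_sdes` variables default to empty lists in the diff above. A hedged sketch of how a consuming project might populate them follows. The value shapes (a plain column name or a `sql`/`alias` dictionary for passthroughs; `schema`, `prefix`, `alias` and `single_entity` keys for entities) are assumptions based on how other Snowplow dbt packages document these variables, and every column, table and alias name shown is hypothetical:

```yaml
# dbt_project.yml of the consuming project (illustrative only)
vars:
  snowplow_media_player:
    # Assumption: each entry is either an existing column name
    # or a {sql, alias} dictionary defining a computed column.
    snowplow__base_passthroughs: ["v_collector", {"sql": "app_id || '-' || platform", "alias": "app_platform"}]
    snowplow__ad_views_passthroughs: ["v_collector"]
    # Redshift/Postgres only, per the changelog; the table name,
    # prefix and alias below are made up for illustration.
    snowplow__entities_or_sdes: [{"schema": "contexts_com_mycompany_click_1", "prefix": "click", "alias": "mc", "single_entity": true}]
```

Check the package documentation for the exact keys before relying on this shape.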
44 changes: 30 additions & 14 deletions docs/markdown/snowplow_media_player_common_cols.md
@@ -2,20 +2,36 @@
A UUID for each event e.g. `c6ef3124-b53a-4b13-a233-0088f79dcbcb`.
{% enddocs %}

{% docs col_media_id %}
The unique identifier of a specific media element. It is the `player_id` in case of YouTube and `html_id` in case of HTML5.
{% docs col_media_identifier %}
The surrogate key generated from `player_id`, `media_label`, `media_type` and `media_player_type` to create a unique media element identifier.
{% enddocs %}

{% docs col_player_id %}
The HTML id attribute of the media content. It is the `player_id` in case of YouTube and `html_id` in case of HTML5.
{% enddocs %}

{% docs col_play_id %}
The surrogate key generated from `page_view_id` and `media_id `to create a unique play event identifier.
The surrogate key generated from `page_view_id`, `player_id`, `media_label`, `media_type` and `media_player_type` to create a unique play event identifier.
{% enddocs %}

{% docs col_page_view_id %}
A UUID for each page view e.g. `c6ef3124-b53a-4b13-a233-0088f79dcbcb`.
{% enddocs %}

{% docs col_session_identifier %}
A visit / session UUID e.g. `c6ef3124-b53a-4b13-a233-0088f79dcbcb`.
The session identifier as defined in your project variables. Defaults to the media_session_id, or to the page_view_id if the media session entity is not enabled.
{% enddocs %}

{% docs col_original_session_identifier %}
The session identifier set by Snowplow using a first-party cookie. This is the domain_sessionid, or the session_id from the mobile session context.
{% enddocs %}

{% docs col_domain_sessionid_array %}
All domain_sessionids seen for a play_id.
{% enddocs %}

{% docs col_user_identifier %}
The user identifier as defined in your project variables. Defaults to domain_userid.
{% enddocs %}

{% docs col_domain_userid %}
@@ -199,7 +215,7 @@ Average playback rate (1 is normal speed).
{% enddocs %}

{% docs col_play_rate %}
Total plays divided by impressions. Please note that as the base for media plays is pageview / media_id, in case the same video is played multiple times within the same pageview, it will still count as one play.
Total plays divided by impressions. Please note that as the base for media plays is pageview / media_identifier, in case the same video is played multiple times within the same pageview, it will still count as one play.
{% enddocs %}

{% docs col_complete_plays %}
@@ -297,7 +313,7 @@ The number of pageviews with audio plays of any duration.
{% enddocs %}

{% docs col_last_base_tstamp %}
The start_tstamp of the last processed page_view across all media_ids to be used as a lower limit for subsequent incremental runs.
The start_tstamp of the last processed page_view across all media_identifiers to be used as a lower limit for subsequent incremental runs.
{% enddocs %}

{% docs col_player_current_time %}
@@ -905,7 +921,7 @@ The index of the event in the corresponding session.
{% enddocs %}

{% docs col_media_ad_id %}
Generated identifier that identifies an ad (identified using the ad_id) played with a specific media (identified using the media_id) and on a specific platform (based on the platform property).
Generated identifier that identifies an ad (identified using the ad_id) played with a specific media (identified using the media_identifier) and on a specific platform (based on the platform property).
{% enddocs %}

{% docs col_ad_id %}
@@ -1001,31 +1017,31 @@ Datetime of the last event.
{% enddocs %}

{% docs col_views_unique %}
Number of users that viewed the ad (identified by their domain_userid).
Number of users that viewed the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_clicked_unique %}
Number of users that clicked on the ad (identified by their domain_userid).
Number of users that clicked on the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_skipped_unique %}
Number of users that skipped the ad (identified by their domain_userid).
Number of users that skipped the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_percent_reached_25_unique %}
Number of users that watched 25% of the ad (identified by their domain_userid).
Number of users that watched 25% of the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_percent_reached_50_unique %}
Number of users that watched 50% of the ad (identified by their domain_userid).
Number of users that watched 50% of the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_percent_reached_75_unique %}
Number of users that watched 75% of the ad (identified by their domain_userid).
Number of users that watched 75% of the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_percent_reached_100_unique %}
Number of users that watched 100% of the ad (identified by their domain_userid).
Number of users that watched 100% of the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_media_session_id %}