Define `event_monitoring_live_v1` views in `view.sql` files #4576
Define these views in `view.sql` files so they get automatically deployed by the `bqetl_artifact_deployment.publish_views` Airflow task.
bigquery_etl/view/__init__.py (outdated)

```python
)
if is_view_statement:
    target_view = str(tokens[2]).strip().split()[0]
    view_id_token = tokens[2] if tokens[1].normalized == "VIEW" else tokens[3]
```
sqlparse doesn't currently recognize `MATERIALIZED` as a keyword, so `normalized` doesn't automatically uppercase it.
Might be worth opening a bug for this. We've opened a couple in the past (like this one), and the response/fix time is usually super fast.
Good idea: andialbrecht/sqlparse#752
sql_generators/glean_usage/templates/event_monitoring_live_v1.view.sql (outdated, resolved)
…eployment. BigQuery doesn't currently allow us to replace existing materialized views.
LGTM. Optional suggestion: maybe add a comment.
bigquery_etl/view/__init__.py (outdated)

```python
view_kw_index = 1 if tokens[1].normalized == "VIEW" else 2
if (
    " ".join(
        t.normalized for t in tokens[view_kw_index + 1 : view_kw_index + 4]
    )
    == "IF NOT EXISTS"
):
    view_id_token = tokens[view_kw_index + 4]
else:
    view_id_token = tokens[view_kw_index + 1]
target_view = str(view_id_token).replace("`", "").strip().split()[0]
try:
    [project_id, dataset_id, view_id] = target_view.split(".")
```
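To make the index arithmetic above easier to follow, here is a simplified, commented stand-in. It is a sketch, not the code in `bigquery_etl/view/__init__.py`: it assumes a comment-free statement, works on plain whitespace-separated words rather than sqlparse tokens, finds the `VIEW` keyword by search instead of by fixed position, and uppercases words itself so it does not depend on sqlparse normalizing `MATERIALIZED`. The function name `view_id_from_words` is hypothetical.

```python
from typing import List, Tuple


def view_id_from_words(sql: str) -> Tuple[str, str, str]:
    """Return (project_id, dataset_id, view_id) from a CREATE ... VIEW statement.

    Hypothetical helper: splits on whitespace instead of using sqlparse tokens.
    Raises ValueError if there is no VIEW keyword or the ID isn't fully qualified.
    """
    words: List[str] = sql.split()
    # Uppercase ourselves rather than relying on a tokenizer's normalization.
    upper = [w.upper() for w in words]
    # Searching for VIEW covers CREATE VIEW, CREATE OR REPLACE VIEW and
    # CREATE MATERIALIZED VIEW without per-form index offsets.
    view_kw_index = upper.index("VIEW")
    # Skip an optional IF NOT EXISTS clause that follows the VIEW keyword.
    if upper[view_kw_index + 1 : view_kw_index + 4] == ["IF", "NOT", "EXISTS"]:
        id_index = view_kw_index + 4
    else:
        id_index = view_kw_index + 1
    # Drop backtick quoting, e.g. `project.dataset.view` or `p`.`d`.`v`.
    target_view = words[id_index].replace("`", "")
    project_id, dataset_id, view_id = target_view.split(".")
    return project_id, dataset_id, view_id
```

For example, `view_id_from_words("create materialized view if not exists `p.d.v` AS SELECT 1")` returns `("p", "d", "v")`.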
This took me a minute to parse. Not sure if some comments might make sense here?
Yeah, I was annoyed at how verbose this logic ended up being. I was hoping we could just grab the first identifier token, but unfortunately sqlparse currently misclassifies the `MATERIALIZED` keyword as an identifier.
Actually, since this isn't a pressing issue, I think I'll wait for them to fix the bug I submitted, then this can be simplified to just look for the first identifier token.
Or an alternate approach would be using sqlparse just to remove comments from the SQL, then using regular expressions to extract the view ID.
> Or an alternate approach would be using sqlparse just to remove comments from the SQL, then using regular expressions to extract the view ID.
I've implemented this alternate approach.
This simplifies the logic and avoids a sqlparse bug where it doesn't recognize the `MATERIALIZED` keyword.
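The regular-expression approach can be sketched roughly as follows. This assumes comments have already been stripped (for example with `sqlparse.format(sql, strip_comments=True)`); the pattern and the `extract_view_id` name are illustrative assumptions, not the exact code that landed in the PR.

```python
import re
from typing import Optional, Tuple

# Illustrative pattern: optional OR REPLACE, MATERIALIZED and IF NOT EXISTS,
# then a possibly backtick-quoted, dot-separated view ID.
VIEW_ID_RE = re.compile(
    r"""
    ^\s*CREATE\s+
    (?:OR\s+REPLACE\s+)?
    (?:MATERIALIZED\s+)?
    VIEW\s+
    (?:IF\s+NOT\s+EXISTS\s+)?
    (?P<view_id>[\w.`-]+)
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_view_id(sql: str) -> Optional[Tuple[str, str, str]]:
    """Return (project_id, dataset_id, view_id), or None if not a fully qualified view."""
    match = VIEW_ID_RE.match(sql)
    if match is None:
        return None
    # Remove backticks so `p.d.v` and `p`.`d`.`v` are treated the same.
    parts = match.group("view_id").replace("`", "").split(".")
    if len(parts) != 3:
        return None
    project_id, dataset_id, view_id = parts
    return project_id, dataset_id, view_id
```

Because keyword matching is done with `re.IGNORECASE`, the sketch does not depend on how sqlparse classifies `MATERIALIZED`.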
Integration report for "Merge branch 'main' into event_monitoring_live_v1-views"
Ah, it turns out this is breaking view deploys: bigquery-etl/bigquery_etl/view/__init__.py Line 258 in 6e2e8f6
Materialized views don't have a
* Define `event_monitoring_live_v1` views in `view.sql` files, so they get automatically deployed by the `bqetl_artifact_deployment.publish_views` Airflow task.
* Support materialized views in view naming validation.
* Handle `IF NOT EXISTS` in view naming validation.
* Use a regular expression to extract the view ID in view naming validation. This simplifies the logic and avoids a sqlparse bug where it doesn't recognize the `MATERIALIZED` keyword.
* Update other view regular expressions to allow for materialized views.
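As a rough illustration of what view naming validation checks, here is a sketch that assumes the `sql/<project>/<dataset>/<view>/view.sql` directory layout: the view created by the file must match its path. The pattern and the `has_valid_view_name` helper are hypothetical; they only show how allowing `MATERIALIZED` and `IF NOT EXISTS` fits into such a check.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern accepting plain and materialized views, with or
# without OR REPLACE / IF NOT EXISTS.
_CREATE_VIEW_RE = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?(?:MATERIALIZED\s+)?VIEW\s+"
    r"(?:IF\s+NOT\s+EXISTS\s+)?([\w.`-]+)",
    re.IGNORECASE,
)


def has_valid_view_name(path: str, sql: str) -> bool:
    """Check that the view created in `sql` matches <project>/<dataset>/<view>/view.sql."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4 or parts[-1] != "view.sql":
        return False
    # The three directories above view.sql form the expected view ID.
    expected = ".".join(parts[-4:-1])
    match = _CREATE_VIEW_RE.search(sql)
    if match is None:
        return False
    return match.group(1).replace("`", "") == expected
```

Keeping the check tied to the file path means a copied `view.sql` that still names its original view is caught before deployment.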
internet_outages.global_outages_v1 (#4639) * Speed up glean_usage generation by caching the table getter (#4644) `get_tables` is deterministic under the assumption that the tables don't change in between invocations. Which I hope holds here. We therefore can just cache that value so that subsequent runs quickly return without needing a roundtrip to BigQuery again. * fixing broken test for firefox_ios_derived.baseline_clients_yearly_v1 (#4645) * Feat/deng 2046/migrating telemetry derived active users aggregates v1 dim checks to etl checks (#4641) * Migrated DIM checks over to ETL checks for telemetry_derived.active_users_aggregates_v1 * rewrite * code review suggestions * add doc * rename --------- Co-authored-by: kik-kik <[email protected]> * Minimize previous PR diff comments when CI posts a new diff comment (#4635) * Minimize previous PR diff comments when CI posts a new diff comment. * Update Node image to latest version available from CircleCI and pin Node packages. * GLAM avoid scientific notation for big sample counts (#4647) * GLAM avoid scientific notation for big sample counts * Cast to bignumeric instead of numeric * feat(DENG-2083): added firefox_ios_derived.clients_activation_v1 and corresponding view (#4631) * added firefox_ios_derived.clients_activation_v1 and corresponding view * fixing a missing seperator in firefox_ios_derived.clients_activation_v1 checks * adding firefox_ios_derived.clients_activation_v1 to shredder configuration * removed is_suspicious_device_client as it should not be there, thanks bani for pointing this out * fixed black formatting error inside shredder/config.py * applied bqetl formatting * minor styling tweak as suggested by bani in PR#4631 * Remove baseline_clients_daily DAG dependency for FF ios baseline clients yearly (#4651) * Support offset backfills, require metadata (#4627) * Skip backfills for queries without metadata.yaml * Support date_partition_offset * Fixed exclude, modified exception * Add test for offset 
backfill * Apply suggestions from code review Co-authored-by: Frank Bertsch <[email protected]> * Formatting --------- Co-authored-by: Frank Bertsch <[email protected]> * add dau_clients_days_since_seen to CTE and num_clients_dau_on_day column to table in query and schema (#4652) * Docs: Avoid newline in link mkdocs doesn't like that newline and will treat the URL as a relative URL, thus breaking the link * Docs: Use 3rd level heading for UDFs mkdocs' ToC generator will stop when the header level goes up again. Because the UDF name itself is generated as a first level heading, any UDF with a first-level header documentation will thus stop rendering any subsequent headers. Most notably on /mozfun/hist where only the very first UDF got a ToC entry. * Docs: Link to section on the same page The separate chapter was removed in #4293 * Migrated DIM checks over to ETL checks for telemetry_derived.unified_metrics_v1 (#4649) * feat(DENG-2120): migrated over checks defined in DIM for baseline_clients_last_seen fenix. 
(#4656) * migrated over checks defined in DIM for this type of dataset * Update sql_generators/glean_usage/templates/baseline_clients_last_seen_v1.checks.sql Co-authored-by: Anna Scholtz <[email protected]> --------- Co-authored-by: Anna Scholtz <[email protected]> * Create tables that have state values per day (#4634) * Create tables that have state values per day * Change Airflow DAG * Move markov states to cols rather than array * Move bot/bad client filter to materialized table * Add install_source and consecutive_days_seen features * Add field to CTE * Use jinja vars instead of sql variables * Use correct UDF incantation * Use live tables for structured error counts (#4598) * Use live tables for structured error counts * Prevent from old records being deleted * Fix structured_error_counts query (#4659) * Authorize view and add workgroup access for taskcluster (#4661) * Add metadata.yaml for socorro_crash_v2 (#4664) * Temporarily add curtis to CODEOWNERS until he can be added to group (#4665) * Add clients_daily_joined view (#4660) * add view.sql to telemetry and desktop_cohort_daily_retention view (#4666) * Skip accounts_db.fxa_oauth_clients in view validation (#4667) * Public GLAM datasets (#4606) * Public GLAM datasets * Remove Fenix GLAM datasets * DENG-1352 - Migrate contextual services ETL to desktop glean pings (#4474) * Have `bqetl query` commands fail if they don't find a matching query (#4662) * Have `bqetl query` commands fail if they don't find a matching query. * Update `test_run_query_no_query_file` test. * Skip accounts_db.fxa_oauth_clients dryrun (#4671) * Remove referenced_table from firefox_android_clients (#4674) * Define `event_monitoring_live_v1` views in `view.sql` files (#4576) * Define `event_monitoring_live_v1` views in `view.sql` files. So they get automatically deployed by the `bqetl_artifact_deployment.publish_views` Airflow task. * Support materialized views in view naming validation. 
* Handle `IF NOT EXISTS` in view naming validation. * Use regular expression to extract view ID in view naming validation. This simplifies the logic and avoids a sqlparse bug where it doesn't recognize the `MATERIALIZED` keyword. * Update other view regular expressions to allow for materialized views. * Add state location for US & Canadian VPN subscriptions (DENG-2099) (#4675) * add triage/confidential tag to docs (#4678) * feat(DENG-2156): added value_length check and updated some of the ETL checks to use the macro (#4672) * added value_length check and updated some of the ETL checks to use the macro * added the new check macro to the data checks docs * implemented lelilia feedback from PR#4672 * simplified the sql logic for the value_length check * Skipping copying checks for baseline tables for apps marked as not receiving the baseline ping (#4670) Co-authored-by: Frank Bertsch <[email protected]> * Revert "Define `event_monitoring_live_v1` views in `view.sql` files (#4576)" (#4680) This reverts commit 2c4cc5e. * Change directory to generate private DAGs so `sql_file_path` values are relative to the repo root. (#4668) * `cd` into `private-bigquery-etl` repo when generating DAGs. To avoid generated DAGs having incorrect absolute paths for ETLs using SQL scripts. * Revert "Temporarily add curtis to CODEOWNERS until he can be added to group (#4665)" (#4669) This reverts commit 8d94a86. 
* ci-fix Ignore dataset.update required permissions when dryrunning authorized views (#4681) * Refactor, add typehint * Add datasets.update clause denied for authorized views * add country dimension * remove generated and old files * delete genertated files * regenerate sql and delete more files * last edits to android funnel before review * change description fields * modify config to add retention outcomes --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Anna Scholtz <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Sergio E. Betancourt <[email protected]> Co-authored-by: Curtis Morales <[email protected]> Co-authored-by: Frank Bertsch <[email protected]> Co-authored-by: m-d-bowerman <[email protected]> Co-authored-by: akkomar <[email protected]> Co-authored-by: Rebecca BurWei <[email protected]> Co-authored-by: Alekhya Kommasani <[email protected]> Co-authored-by: Alekhya <[email protected]> Co-authored-by: Alexander <[email protected]> Co-authored-by: wil stuckey <[email protected]> Co-authored-by: Daniel Thorn <[email protected]> Co-authored-by: Leli <[email protected]> Co-authored-by: Jan-Erik Rediger <[email protected]> Co-authored-by: Lucia <[email protected]> Co-authored-by: Lucia Vargas <[email protected]> Co-authored-by: kik-kik <[email protected]> Co-authored-by: Marlene Hirose <[email protected]> Co-authored-by: David Zeber <[email protected]> Co-authored-by: betling <[email protected]> Co-authored-by: Sean Rose <[email protected]> Co-authored-by: Linh Nguyen <[email protected]> Co-authored-by: Mike Williams <[email protected]> Co-authored-by: ksiegler1 <[email protected]> Co-authored-by: Kimberly Siegler <[email protected]> Co-authored-by: Eduardo Filho <[email protected]> Co-authored-by: richard baffour <[email protected]> Co-authored-by: gkabbz <[email protected]> Co-authored-by: Ksenia <[email protected]> Co-authored-by: kik-kik <[email protected]>
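The #4576 entries above describe switching view naming validation from picking a sqlparse token by position to a regular expression, so that `CREATE MATERIALIZED VIEW` and `IF NOT EXISTS` are handled without relying on sqlparse recognizing `MATERIALIZED`. The following is a minimal Python sketch of that approach. `VIEW_ID_PATTERN` and `extract_view_id` are hypothetical names, and the pattern is an illustration, not the regex actually committed to `bigquery_etl/view/__init__.py`. It assumes the view ID is either unquoted or wrapped in a single pair of backticks.

```python
import re
from typing import Optional

# Hypothetical pattern, not the regex committed in bigquery-etl.
# Matches: CREATE [OR REPLACE] [MATERIALIZED] VIEW [IF NOT EXISTS] <view_id>
# case-insensitively. Assumes <view_id> is unquoted or wrapped in one pair
# of backticks (e.g. `project.dataset.view`), not quoted per part.
VIEW_ID_PATTERN = re.compile(
    r"""
    \bCREATE\s+
    (?:OR\s+REPLACE\s+)?          # optional OR REPLACE
    (?:MATERIALIZED\s+)?          # optional MATERIALIZED
    VIEW\s+
    (?:IF\s+NOT\s+EXISTS\s+)?     # optional IF NOT EXISTS
    `?(?P<view_id>[\w.-]+)`?      # view ID: letters, digits, _, ., -
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_view_id(sql: str) -> Optional[str]:
    """Return the ID of the first CREATE ... VIEW statement in `sql`, or None."""
    match = VIEW_ID_PATTERN.search(sql)
    return match.group("view_id") if match else None
```

Because the optional clauses are matched by the pattern itself, the result doesn't depend on whether sqlparse normalizes `MATERIALIZED` as a keyword, which is the issue raised in the review thread. The sketch does not handle per-part quoting, where project, dataset, and view are each wrapped in their own backticks.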
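The #4644 entry above caches `get_tables` on the assumption that its result doesn't change between invocations within a run. The following is a minimal sketch of that memoization using `functools.lru_cache`. Here `get_tables` is a hypothetical stand-in that returns fixed table names and counts its calls instead of querying BigQuery, and its `(project, dataset)` signature is an assumption, not the bigquery-etl function's actual signature.

```python
from __future__ import annotations

import functools

# Tracks how often the (simulated) expensive lookup really executes.
CALL_COUNT = {"get_tables": 0}


@functools.lru_cache(maxsize=None)  # unbounded cache, same as functools.cache
def get_tables(project: str, dataset: str) -> tuple[str, ...]:
    """Hypothetical stand-in for listing tables in a BigQuery dataset.

    Arguments must be hashable to serve as the cache key. Returning a tuple
    keeps the cached value immutable, so callers can't corrupt it.
    """
    CALL_COUNT["get_tables"] += 1
    # A real implementation would make a roundtrip to BigQuery here.
    return (
        f"{project}.{dataset}.baseline_v1",
        f"{project}.{dataset}.events_v1",
    )
```

Repeated calls with the same `(project, dataset)` return the cached tuple without running the function body again. If tables could change during a long-lived process, the assumption behind the cache breaks, and `get_tables.cache_clear()` resets it.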
So they get automatically deployed by the `bqetl_artifact_deployment.publish_views` Airflow task and we don't get failures like bug 1864961.

Checklist for reviewer:
`<username>:<branch>` of the fork as parameter. The parameter will also show up in the logs of the `manual-trigger-required-for-fork` CI task together with more detailed instructions.

For modifications to schemas in restricted namespaces (see `CODEOWNERS`):

┆Issue is synchronized with this Jira Task