Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CTEs and optimize fct_daily_rt_feed_validation_notices #3509

Merged
merged 1 commit into from
Oct 18, 2024

Conversation

erikamov
Copy link
Contributor

Description

This PR is a second pass on optimizing fct_daily_rt_feed_validation_notices to fix issue #3481.

@ohrite and I split the query into 4 CTEs

  • outcomes -> notices
  • daily_feeds x codes
  • daily_feed aggregate calculations per day/url
  • outcome aggregate calculations per day/url

This reduced the runtime to < 1m. Since the restructure represents a fundamental change in how the data is aggregated, we are working on validating the output.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

We've queried both production and staging tables, compared row values for 2023-12-28, and generated csv files for 2024-10-16, comparing both files through DIFF and had zero differences.

Used SQLFluff to check the query:

sqlfluff lint warehouse/models/mart/gtfs_quality/fct_daily_rt_feed_validation_notices.sql
All Finished 📜 🎉!

The table was successfully created on staging:

❯ poetry run dbt run -s +"models/mart/gtfs_quality/fct_daily_rt_feed_validation_notices.sql"
21:48:05  Running with dbt=1.5.1
21:48:07  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.mart.ad_hoc
21:48:07  Found 420 models, 949 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
21:48:07
21:48:10  Concurrency: 8 threads (target='dev')
21:48:10
21:48:10  1 of 28 START sql view model erika_staging.stg_gtfs_quality__rt_validation_code_descriptions  [RUN]
21:48:10  2 of 28 START sql view model erika_staging.stg_gtfs_rt__service_alerts_outcomes  [RUN]
21:48:10  3 of 28 START sql view model erika_staging.stg_gtfs_rt__service_alerts_validation_notices  [RUN]
21:48:10  4 of 28 START sql view model erika_staging.stg_gtfs_rt__service_alerts_validation_outcomes  [RUN]
21:48:10  5 of 28 START sql view model erika_staging.stg_gtfs_rt__trip_updates_outcomes .. [RUN]
21:48:10  6 of 28 START sql view model erika_staging.stg_gtfs_rt__trip_updates_validation_notices  [RUN]
21:48:10  7 of 28 START sql view model erika_staging.stg_gtfs_rt__trip_updates_validation_outcomes  [RUN]
21:48:10  8 of 28 START sql view model erika_staging.stg_gtfs_rt__vehicle_positions_outcomes  [RUN]
21:48:11  7 of 28 OK created sql view model erika_staging.stg_gtfs_rt__trip_updates_validation_outcomes  [CREATE VIEW (0 processed) in 0.99s]
21:48:11  9 of 28 START sql view model erika_staging.stg_gtfs_rt__vehicle_positions_validation_notices  [RUN]
21:48:11  2 of 28 OK created sql view model erika_staging.stg_gtfs_rt__service_alerts_outcomes  [CREATE VIEW (0 processed) in 1.02s]
21:48:11  6 of 28 OK created sql view model erika_staging.stg_gtfs_rt__trip_updates_validation_notices  [CREATE VIEW (0 processed) in 1.02s]
21:48:11  10 of 28 START sql view model erika_staging.stg_gtfs_rt__vehicle_positions_validation_outcomes  [RUN]
21:48:11  11 of 28 START sql view model erika_staging.stg_gtfs_schedule__agency .......... [RUN]
21:48:11  8 of 28 OK created sql view model erika_staging.stg_gtfs_rt__vehicle_positions_outcomes  [CREATE VIEW (0 processed) in 1.05s]
21:48:11  4 of 28 OK created sql view model erika_staging.stg_gtfs_rt__service_alerts_validation_outcomes  [CREATE VIEW (0 processed) in 1.05s]
21:48:11  3 of 28 OK created sql view model erika_staging.stg_gtfs_rt__service_alerts_validation_notices  [CREATE VIEW (0 processed) in 1.06s]
21:48:11  12 of 28 START sql view model erika_staging.stg_gtfs_schedule__download_outcomes  [RUN]
21:48:11  13 of 28 START sql view model erika_staging.stg_gtfs_schedule__file_parse_outcomes  [RUN]
21:48:11  14 of 28 START sql view model erika_staging.stg_gtfs_schedule__unzip_outcomes .. [RUN]
21:48:11  1 of 28 OK created sql view model erika_staging.stg_gtfs_quality__rt_validation_code_descriptions  [CREATE VIEW (0 processed) in 1.10s]
21:48:11  15 of 28 START sql view model erika_staging.stg_transit_database__gtfs_datasets  [RUN]
21:48:11  5 of 28 OK created sql view model erika_staging.stg_gtfs_rt__trip_updates_outcomes  [CREATE VIEW (0 processed) in 1.18s]
21:48:12  16 of 28 START sql incremental model erika_staging.int_gtfs_rt__unioned_parse_outcomes  [RUN]
21:48:12  14 of 28 OK created sql view model erika_staging.stg_gtfs_schedule__unzip_outcomes  [CREATE VIEW (0 processed) in 0.72s]
21:48:12  10 of 28 OK created sql view model erika_staging.stg_gtfs_rt__vehicle_positions_validation_outcomes  [CREATE VIEW (0 processed) in 0.78s]
21:48:12  17 of 28 START sql incremental model erika_staging.int_gtfs_quality__rt_validation_outcomes  [RUN]
21:48:12  9 of 28 OK created sql view model erika_staging.stg_gtfs_rt__vehicle_positions_validation_notices  [CREATE VIEW (0 processed) in 0.95s]
21:48:12  18 of 28 START sql incremental model erika_staging.int_gtfs_quality__rt_validation_notices  [RUN]
21:48:12  11 of 28 OK created sql view model erika_staging.stg_gtfs_schedule__agency ..... [CREATE VIEW (0 processed) in 0.93s]
21:48:12  12 of 28 OK created sql view model erika_staging.stg_gtfs_schedule__download_outcomes  [CREATE VIEW (0 processed) in 0.99s]
21:48:12  13 of 28 OK created sql view model erika_staging.stg_gtfs_schedule__file_parse_outcomes  [CREATE VIEW (0 processed) in 1.02s]
21:48:12  19 of 28 START sql view model erika_staging.int_gtfs_schedule__grouped_feed_file_parse_outcomes  [RUN]
21:48:12  15 of 28 OK created sql view model erika_staging.stg_transit_database__gtfs_datasets  [CREATE VIEW (0 processed) in 1.04s]
21:48:12  20 of 28 START sql table model erika_staging.int_transit_database__gtfs_datasets_dim  [RUN]
21:48:14  19 of 28 OK created sql view model erika_staging.int_gtfs_schedule__grouped_feed_file_parse_outcomes  [CREATE VIEW (0 processed) in 1.19s]
21:48:14  21 of 28 START sql view model erika_staging.int_gtfs_schedule__joined_feed_outcomes  [RUN]
21:48:15  21 of 28 OK created sql view model erika_staging.int_gtfs_schedule__joined_feed_outcomes  [CREATE VIEW (0 processed) in 1.22s]
21:48:15  22 of 28 START sql table model erika_mart_gtfs.dim_schedule_feeds .............. [RUN]
21:48:17  20 of 28 OK created sql table model erika_staging.int_transit_database__gtfs_datasets_dim  [CREATE TABLE (4.9k rows, 4.9 GiB processed) in 4.30s]
21:48:17  23 of 28 START sql table model erika_mart_transit_database.bridge_schedule_dataset_for_validation  [RUN]
21:48:17  24 of 28 START sql table model erika_mart_transit_database.dim_gtfs_datasets ... [RUN]
21:48:19  23 of 28 OK created sql table model erika_mart_transit_database.bridge_schedule_dataset_for_validation  [CREATE TABLE (2.8k rows, 509.1 KiB processed) in 2.19s]
21:48:19  24 of 28 OK created sql table model erika_mart_transit_database.dim_gtfs_datasets  [CREATE TABLE (4.9k rows, 1.6 MiB processed) in 2.76s]
21:48:19  25 of 28 START sql table model erika_staging.int_transit_database__urls_to_gtfs_datasets  [RUN]
21:48:22  25 of 28 OK created sql table model erika_staging.int_transit_database__urls_to_gtfs_datasets  [CREATE TABLE (4.9k rows, 865.1 KiB processed) in 2.21s]
21:48:36  17 of 28 OK created sql incremental model erika_staging.int_gtfs_quality__rt_validation_outcomes  [SCRIPT (4.6 GiB processed) in 24.07s]
21:48:37  16 of 28 OK created sql incremental model erika_staging.int_gtfs_rt__unioned_parse_outcomes  [SCRIPT (3.8 GiB processed) in 24.60s]
21:49:27  22 of 28 OK created sql table model erika_mart_gtfs.dim_schedule_feeds ......... [CREATE TABLE (13.9k rows, 11.3 GiB processed) in 72.60s]
21:49:27  26 of 28 START sql table model erika_mart_gtfs.fct_daily_schedule_feeds ........ [RUN]
21:49:32  26 of 28 OK created sql table model erika_mart_gtfs.fct_daily_schedule_feeds ... [CREATE TABLE (277.4k rows, 2.8 MiB processed) in 4.14s]
21:49:32  27 of 28 START sql incremental model erika_mart_gtfs.fct_daily_rt_feed_files ... [RUN]
21:49:35  27 of 28 OK created sql incremental model erika_mart_gtfs.fct_daily_rt_feed_files  [MERGE (402.0 rows, 175.3 MiB processed) in 3.44s]
21:52:07  18 of 28 OK created sql incremental model erika_staging.int_gtfs_quality__rt_validation_notices  [SCRIPT (15.3 GiB processed) in 234.48s]
21:52:07  28 of 28 START sql table model erika_mart_gtfs_quality.fct_daily_rt_feed_validation_notices  [RUN]
21:52:20  28 of 28 OK created sql table model erika_mart_gtfs_quality.fct_daily_rt_feed_validation_notices  [CREATE TABLE (206.2k rows, 39.2 GiB processed) in 13.07s]
21:52:20
21:52:20  Finished running 17 view models, 7 table models, 4 incremental models in 0 hours 4 minutes and 12.53 seconds (252.53s).
21:52:20
21:52:20  Completed successfully
21:52:20
21:52:20  Done. PASS=28 WARN=0 ERROR=0 SKIP=0 TOTAL=28\

Post-merge follow-ups

  • No action required
  • Actions required (specified below)

Watch the next time the DAG runs to see the results and no more errors.

Copy link
Contributor

@ohrite ohrite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍐 pairing with @erikamov

Copy link

Warehouse report 📦

DAG

Legend (in order of precedence)

Resource type Indicator Resolution
Large table-materialized model Orange Make the model incremental
Large model without partitioning or clustering Orange Add partitioning and/or clustering
View with more than one child Yellow Materialize as a table or incremental
Incremental Light green
Table Green
View White

@erikamov erikamov merged commit b153ba0 into main Oct 18, 2024
4 checks passed
@erikamov erikamov deleted the 3481-validation-notices branch October 18, 2024 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants