Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Research Request - NTD ID to source_record_id crosswalk #1121

Closed
tiffanychu90 opened this issue May 29, 2024 · 2 comments
Closed

Research Request - NTD ID to source_record_id crosswalk #1121

tiffanychu90 opened this issue May 29, 2024 · 2 comments
Assignees
Labels
ntd National Transit Database research request Issues that serve as a request for research (summary and handoff)

Comments

@tiffanychu90
Copy link
Member

tiffanychu90 commented May 29, 2024

Complete the below when receiving a research request, and continue to add to this issue as you receive additional details and produce deliverables. Be sure to also add the appropriate project-level label to this issue (eg gtfs-rt, DLA).

Research Question

Single sentence description: We want a bridge table to connect dim_organizations, specifically source_record_id (which is the GTFS operator identifier across quartets) to 2022 NTD ID. Our 2022 NTD table is here.

Detailed description: Shweta's worked on an earlier version of the crosswalk that looked at which NTD IDs are new / changed from past years. We need source_record_id added to all the NTD IDs in 2022.

How will this research be used?

Vivek will use this to update Airtable and update dim_organizations in the warehouse so we can have the 2022 NTD IDs.

Stakeholders & End-Users

Internal use

Data sources

  • Cal-ITP data sources:
  • mart_transit_database.dim_organizations
  • mart_ntd.dim_annual_ntd_agency_service

Deliverables

csv with 2 columns, saved in Google Cloud Storage for @vevetron to use

Timeline of deliverables

June

@tiffanychu90 tiffanychu90 added research request Issues that serve as a request for research (summary and handoff) ntd National Transit Database labels May 29, 2024
@csuyat-dot
Copy link
Contributor

csuyat-dot commented May 29, 2024

Joined dim_annual_ntd_agency_service and dim_organizations in Metabase. Currently stored in my personal collection on Metabase

Noticed on dim_orgs that some ntd IDs had pre-fixes (9R02-#####), so sub-string'd ntd ID to only show the last 5 digits. Joined substring ntd_id to ntd_id to get source_record_id for the 2022 NTD IDs. 210 NTD IDs from the 2022 NTD matched to a source record ID.
Data is saved to GSC in calitp-analytics-data/data-analyses/ntd

2 files:

  • ntd_id_2022_to_source_record_id_crosswalk_cleaned 2 columns, ntd id and source record id
  • ntd_id_2022_to_source_record_id_crosswalk_raw

@csuyat-dot
Copy link
Contributor

csuyat-dot commented Jun 3, 2024

Discussed results with Vikek.

  • Confirmed that this specific crosswalk does not exist in the mart or airtable.
  • Confirmed the crosswalk was not made by hand. (join was done via metabase)
  • Explained the methodology is sub-string'ing the ntd id from dim organizations to capture the last 5 digits. Vivek determined that the same can be done SQL (see Adds a column for the 2022 ntd ids data-infra#3364)
  • Vivek identified 27 orgs from dim_organizations that did not appear in the crosswalk. After examination, the 27 orgs did not appear in dim_annual_ntd_agency_service table, therefore was not part of the crosswalk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ntd National Transit Database research request Issues that serve as a request for research (summary and handoff)
Projects
Status: Done
Development

No branches or pull requests

3 participants