Research Request - Suppress private datasets from being published #1220

Open
4 of 5 tasks
tiffanychu90 opened this issue Sep 11, 2024 · 0 comments
Assignees: tiffanychu90
Labels:
  • open-data: Work related to publishing, ingesting open data
  • research request: Issues that serve as a request for research (summary and handoff)

Comments

Member

tiffanychu90 commented Sep 11, 2024

Complete the sections below when you receive a research request, and continue to add to this issue as you receive additional details and produce deliverables. Be sure to also add the appropriate project-level label to this issue (e.g., gtfs-rt, DLA).

Research Question

Single sentence description: All our published analyses (datasets, data products, etc.) should exclude private datasets. The feeds will still go through data processing in our pipelines; we simply exclude them at the end. Add a function in shared_utils, in either gtfs_utils_v2 or publish_utils, that handles the list of datasets we can include.
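
A minimal sketch of what such a helper could look like. The function name `filter_to_public_dataset_keys` and the column names `private_dataset` and `gtfs_dataset_key` are assumptions for illustration; this issue does not name the new column or the final function. Plain Python records are used so the sketch runs without dependencies, though the real helper would more likely operate on a DataFrame.

```python
from typing import Iterable, Mapping

# Hypothetical column names; the issue does not name the new column.
PRIVATE_COL = "private_dataset"
KEY_COL = "gtfs_dataset_key"


def filter_to_public_dataset_keys(rows: Iterable[Mapping]) -> list[str]:
    """Return the sorted, unique dataset keys that are not flagged private.

    Assumption: a missing or None flag means the dataset is public.
    """
    return sorted({row[KEY_COL] for row in rows if not row.get(PRIVATE_COL)})


# Made-up rows standing in for dim_gtfs_datasets.
datasets = [
    {"gtfs_dataset_key": "aaa", "name": "Operator A Schedule", "private_dataset": None},
    {"gtfs_dataset_key": "bbb", "name": "Operator B Schedule", "private_dataset": True},
    {"gtfs_dataset_key": "ccc", "name": "Operator C Schedule", "private_dataset": False},
]

public_keys = filter_to_public_dataset_keys(datasets)
print(public_keys)  # ['aaa', 'ccc']
```

Returning a list of allowed keys (rather than a list of excluded ones) matches the "list we can include" wording above and makes the final filter a simple membership check.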

Detailed description:

  • See the data-infra GH issue; the PR has been merged.
  • The relevant table is mart_transit_database.dim_gtfs_datasets, which gains an additional column. We create a crosswalk in gtfs_funnel and bring that crosswalk in at the last stages of the analytics pipeline (when we add columns like caltrans_district, ntd_id, etc.), so this is the step where we want to exclude private datasets.
  • Note: I think we grab 2 feeds for certain operators (Big Blue Bus has its own feed, and Swiftly has a feed), so we would still be able to display Big Blue Bus analysis, but only the one derived from its own feed.
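
The exclusion at the crosswalk step could be sketched as follows. Everything named here is hypothetical: the function `exclude_private_datasets`, the key column `schedule_gtfs_dataset_key`, the dataset keys, and the sample values are made up for illustration, and the sketch assumes the Swiftly feed is the one flagged private. It uses plain Python records so it runs without dependencies.

```python
def exclude_private_datasets(results, crosswalk, public_keys,
                             key_col="schedule_gtfs_dataset_key"):
    """Attach crosswalk columns to results, keeping only public datasets.

    Rows whose key is not in public_keys, or has no crosswalk entry,
    are dropped from the published output.
    """
    public = set(public_keys)
    crosswalk_by_key = {row[key_col]: row for row in crosswalk}
    published = []
    for row in results:
        key = row[key_col]
        if key in public and key in crosswalk_by_key:
            # Later dict wins on overlapping fields; only key_col overlaps here.
            published.append({**row, **crosswalk_by_key[key]})
    return published


# Made-up pipeline output: one operator processed from two feeds.
results = [
    {"schedule_gtfs_dataset_key": "bbb_own", "feed": "own", "avg_speed_mph": 11.2},
    {"schedule_gtfs_dataset_key": "bbb_swiftly", "feed": "swiftly", "avg_speed_mph": 11.4},
]
# Made-up crosswalk rows with the columns added at the last stage.
crosswalk = [
    {"schedule_gtfs_dataset_key": "bbb_own", "caltrans_district": "07", "ntd_id": "00000"},
    {"schedule_gtfs_dataset_key": "bbb_swiftly", "caltrans_district": "07", "ntd_id": "00000"},
]
public_keys = ["bbb_own"]  # the Swiftly feed is assumed to be private

published = exclude_private_datasets(results, crosswalk, public_keys)
print([row["feed"] for row in published])  # ['own']
```

Both feeds still flow through processing; only the final published table drops the private one, so the operator remains represented by its own feed.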

Update these references:

  • geoportal - high quality transit areas: ca_hq_transit_areas, ca_hq_transit_stops
  • geoportal - GTFS schedule: ca_transit_routes / ca_transit_stops
  • geoportal - speeds: speeds_by_stop_segments, speeds_by_route_timeofday
  • portfolio - GTFS digest
  • portfolio - speedmaps
tiffanychu90 added the open-data and research request labels Sep 11, 2024
tiffanychu90 self-assigned this Sep 11, 2024