From d81d53bbcdcce6bf24d992d83ffb1f4fc1879fd1 Mon Sep 17 00:00:00 2001
From: Laurie <55149902+lauriemerrell@users.noreply.github.com>
Date: Thu, 10 Aug 2023 11:29:01 -0500
Subject: [PATCH] Airtable docs updates (#2868)

* wip updates

* foreign key docs and other updates

* rearrange navigation

* remove legacy docs section

* address failures in docs build - remove unused airflow page and fix toc

* rename airtable page

* remove references to contacting charlie

* update link to refactored architecture data page

* phrasing update per pr review and add link to the google sheet
---
 docs/_toc.yml                                 |   9 +-
 docs/airflow/dags-maintenance.md              |   4 +-
 docs/airflow/overview.md                      |   6 -
 docs/analytics_onboarding/overview.md         |   2 +-
 docs/analytics_tools/jupyterhub.md            |   2 +-
 docs/contribute/contribute-best-practices.md  |   2 +-
 docs/datasets_and_tables/transitdatabase.md   | 131 ------------------
 docs/transit_database/transitdatabase.md      |  80 +++++++++++
 .../navigating_dbt_docs.md}                   |   7 -
 9 files changed, 88 insertions(+), 155 deletions(-)
 delete mode 100644 docs/airflow/overview.md
 delete mode 100644 docs/datasets_and_tables/transitdatabase.md
 create mode 100644 docs/transit_database/transitdatabase.md
 rename docs/{datasets_and_tables/overview.md => warehouse/navigating_dbt_docs.md} (81%)

diff --git a/docs/_toc.yml b/docs/_toc.yml
index 9e741068bf..c5e4a6dc0c 100644
--- a/docs/_toc.yml
+++ b/docs/_toc.yml
@@ -31,13 +31,11 @@ parts:
       - file: warehouse/overview
         sections:
           - file: warehouse/warehouse_starter_kit
+          - file: warehouse/navigating_dbt_docs
           - file: warehouse/what_is_agency
           - file: warehouse/developing_dbt_models
           - file: warehouse/adding_oneoff_data
           - file: warehouse/what_is_gtfs
-      - file: datasets_and_tables/overview
-        sections:
-          - file: datasets_and_tables/transitdatabase
       - file: publishing/overview
         sections:
           - glob: publishing/sections/*
@@ -47,9 +45,8 @@ parts:
         sections:
           - file: architecture/services
           - file: architecture/data
-      - file: airflow/overview
-        sections:
-          - file: airflow/dags-maintenance
+      - file: airflow/dags-maintenance
+      - file: transit_database/transitdatabase
       - file: kubernetes/README
         sections:
           - file: kubernetes/JupyterHub
diff --git a/docs/airflow/dags-maintenance.md b/docs/airflow/dags-maintenance.md
index 5303e934c4..b2a4549ef6 100644
--- a/docs/airflow/dags-maintenance.md
+++ b/docs/airflow/dags-maintenance.md
@@ -1,7 +1,7 @@
 (dags-maintenance)=
-# Production DAGs Maintenance
+# Airflow Operational Considerations
 
-We use [Airflow](https://airflow.apache.org/) to orchestrate our data ingest processes. This page describes how to handle cases where an Airflow [DAG task](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html) fails.
+We use [Airflow](https://airflow.apache.org/) to orchestrate our data ingest processes. This page describes how to handle cases where an Airflow [DAG task](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html) fails. For general information about Airflow development, see the [Airflow README in the data-infra GitHub repo](https://github.com/cal-itp/data-infra/blob/main/airflow/README.md).
 
 ## Monitoring DAGs
 
diff --git a/docs/airflow/overview.md b/docs/airflow/overview.md
deleted file mode 100644
index 553a8a0207..0000000000
--- a/docs/airflow/overview.md
+++ /dev/null
@@ -1,6 +0,0 @@
-(airflow)=
-# Airflow
-
-Cal-ITP has a managed instance of Airflow (called Google Cloud Composer) used to orchestrate various pieces of our data pipeline.
-
-See the [Airflow README](https://github.com/cal-itp/data-infra/blob/main/airflow/README.md) for local development and testing instructions.
diff --git a/docs/analytics_onboarding/overview.md b/docs/analytics_onboarding/overview.md
index 178b76ba5e..4cb510c2fb 100644
--- a/docs/analytics_onboarding/overview.md
+++ b/docs/analytics_onboarding/overview.md
@@ -42,7 +42,7 @@
 &nbsp;
 (get-help)=
 ```{admonition} Still need access to a non-Caltrans tool above?
-DM Charlie on Cal-ITP Slack using this link, or by email.
+Ask on the `#services-team` channel in the Cal-ITP Slack.
 ```
 
 ## New Analyst Training Curriculum
diff --git a/docs/analytics_tools/jupyterhub.md b/docs/analytics_tools/jupyterhub.md
index 0e230c060d..601b9c712f 100644
--- a/docs/analytics_tools/jupyterhub.md
+++ b/docs/analytics_tools/jupyterhub.md
@@ -28,7 +28,7 @@ This avoids the need to set up a local environment, provides dedicated storage,
 
 JupyterHub currently lives at [notebooks.calitp.org](https://notebooks.calitp.org/).
 
-Note: you will need to have been added to the Cal-ITP organization on GitHub to obtain access. If you have yet to be added to the organization and need to be, DM Charlie on Cal-ITP Slack using this link.
+Note: you will need to have been added to the Cal-ITP organization on GitHub to obtain access. If you have yet to be added to the organization and need to be, ask in the `#services-team` channel in Slack.
 
 (connecting-to-warehouse)=
 ### Connecting to the Warehouse
diff --git a/docs/contribute/contribute-best-practices.md b/docs/contribute/contribute-best-practices.md
index 921c827067..9aa63913a2 100644
--- a/docs/contribute/contribute-best-practices.md
+++ b/docs/contribute/contribute-best-practices.md
@@ -33,7 +33,7 @@ If you feel a new section is warranted, make sure you follow Jupyter Book's guid
 
 (new-pages)=
 ### New Pages and Chapters
-Add new pages and chapters only as truly needed. If you're unsure of whether a new page or chapter is necessary, reach out to `@Charlie Costanzo` on `Cal-ITP Slack`.
+Add new pages and chapters only as truly needed.
 
 If you are adding new pages or chapters, you will need to also update the `_toc.yml` file. You can find more information at Jupyter Book's resource [Structure and organize content](https://jupyterbook.org/basics/organize.html).
 
diff --git a/docs/datasets_and_tables/transitdatabase.md b/docs/datasets_and_tables/transitdatabase.md
deleted file mode 100644
index 2dd6080122..0000000000
--- a/docs/datasets_and_tables/transitdatabase.md
+++ /dev/null
@@ -1,131 +0,0 @@
-# Transit Database
-
-The Cal-ITP Airtable Transit Database stores key relationships about how transit services are organized and operated in California as well as how well they are performing. See Evan or Hunter to get a link and gain access.
-
-We have chosen to group and maintain the tables into the following Airtable bases as follows:
-
-| **Table Set** | **Description** | **Data Maintainer** |
-| :------------ | :-------------- | :------------------ |
-| [**California Transit**](#california-transit) | Defines key organizational relationships and properties. Organizations, geography, funding programs, transit services, service characteristics, transit datasets such as GTFS, and the intersection between transit datasets and services. | *Elizabeth*<br>Evan handling uptake to warehouse |
-| [**Transit Data Assessments**](#transit-data-assessments) | Articulates data performance metrics and assessments.| *Elizabeth*<br>*Evan* handling uptake to warehouse<br>*Olivia* a key User Advocate. |
-| [**Transit Technology Stacks**](#transit-technology-stacks) | Defines operational setups at transit provider organizations. Defines relationships between vendor organizations, transit provider and operator organizations, products, contracts to provide products, transit stack components, and how they relate to one-another. Structure still somewhat a `WIP`. | *Elizabeth*<br>No warehouse uptake for time being. |
-
-While `organizations` and `services` are central to many of the tables, we have chosen to maintain them as part of the California Transit Base which will be referenced by the other two.
-
-## Airtable things
-
-### Primary Keys
-
-Airtable forces the use of the left-most field as the primary key of the database: the field that must be referenced in other tables, similar to a VLOOKUP in a spreadsheet. Unlike many databases, Airtable doesn't enforce uniqueness in the values of the primary key field. Instead, it assigns it an underlying and mostly hidden unique [`RECORD ID`](https://support.airtable.com/hc/en-us/articles/360051564873-Record-ID), which can be exposed by creating a formula field to reference it.
-
-For the sake of this documentation, we've noted the [`Primary Field`](https://support.airtable.com/hc/en-us/articles/202624179-The-primary-field), which is not guaranteed to be unique. Some tables additionally expose the unique [`RECORD ID`](https://support.airtable.com/hc/en-us/articles/360051564873-Record-ID) as well.
-
-### Full Documentation of Fields
-
-AirTable does not currently have an effective mechanism to programmatically download your data schema (they have currently paused issuing keys to their metadata API). Rather than manually type-out and export each individual field definition from AirTable, please see the [AirTable-based documentation of fields](https://airtable.com/appPnJWrQ7ui4UmIl/api/docs) which is produced as a part of their API documentation. Note that you must be authenticated with access to the base to reach this link.
-
-## California Transit
-
-| **Name**<br>*Key(s)*| **Description** |
-| :------------- | :-------------- |
-| `organizations`<br><br>*Primary Field*: `Name` | Records are legal organizations, including companies, governmental bodies, or non-profits.<br><br>Table includes information on organizational properties (i.e. locale, type) as well as summarizations of its various relationships (e.g. `services` for a transit provider, or `products` for a vendor).<br><br>An organization MAY:
-| `services`<br><br>*Primary Field*: `Name` | Each record defines a transit service and its properties.<br><br>While there are a small number of exceptions (e.g. Solano Express, which is jointly managed by Solano and Napa), generally each transit service is managed by a single organization. Transit services are differentiated from each other by variation (or the potentiality of variation) in one or more of the following:<br>Services MAY: