Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iss417 #440

Open
wants to merge 26 commits into
base: main
Choose a base branch
from
Open

Iss417 #440

wants to merge 26 commits into from

Conversation

tinatinc
Copy link
Collaborator

Mobility metric pull request template

Please include the following points in your PR:

  1. A link to the issue that this PR relates to: Issue 417

  2. A description of the content in this pull request.

  • What was changed?
    I added/appended the 2020 data for both the transit cost and transit trip metrics (both overall and subgroup files). Previously, we only had 2015 and 2019 data.

  • What should the reviewer be focusing on?
    Any errors I may have made in replicating the same calculations for 2020. Specifically, tract/county differences in crosswalks (since 2020 is a decade change, and there were jurisdictional changes in both counties and tract numbers).

  • Is there a logical order to review the files in?
    I would start with transit_cost_county, then transit_trips_county, these two calculate the two metrics at the county level. Then transit_trips_cost_city, which calculates both metrics in one file instead, now at the city (place) level. Then, transit_trips_cost_county_subgroups, which creates the subgroup data at the county level for both metrics. Then transit_trips_cost_city_subgroups, which creates the subgroup data at the city (place) level for both metrics.

  1. Detail on any issues or flags that the metric reviewer/data-team should be aware of.
    None that I can think of, but I may have messed up on when/if to include the 8 CT counties that became defunct recently. I need to double check on the crosswalks

cdsolari and others added 8 commits October 17, 2024 12:19
Code & final output file updated
2020 data added to code and output files for both county and city level transit trips data
Update to code and output file for transit trips - county, nonsubgroup
Code and output files updated to include 2020 data for transit cost & transit trips at the COUNTY level
Code and output files updates for the city (place) -level data for both transit trips and transit costs with race subgroups
@tinatinc tinatinc marked this pull request as draft December 30, 2024 06:12
@cdsolari cdsolari requested a review from jwalsh28 December 31, 2024 15:00
Copy link
Collaborator

@jwalsh28 jwalsh28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @tinatinc - I'm about halfway through the review but wanted to submit my comments so far to give you more time. Overall the changes look good and I do not see anything wrong with the code but have made some suggestions for improvemnets.

Two big things:

  1. A folder for "data" does not currently exist in the transportation folder when I checkout to your branch. This can cuase confusion because your instructions in the program state to download the raw data in the "data" folder. I think you should either add one or update the insturctions to state that you need to create a data folder.
  2. The historgrams in the city trips_cost code are currently not showing anything, I believe because the bin widths are set to high. These are important as they are the only viz of the post weighted data, would be key to improve these.

@@ -46,6 +46,7 @@ repository folder

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that the "data" folder does not exist inside the Transportation folder, either add the data folder or instruct reviewers to create it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added instruction for users/reviewers to create this folder to use it

filter(year == 2020)
```

The 2015 and 2019 files have the same number of observations (3134, down from 3142 due to removing the 8 CT counties). 2020 file has 3,143 for due to the Alaska county split. Checking that's the case below:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the above code chunk can you add a count argument so that the count of the dataframes prints? This would make it easier to see what you are stating here in the text.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

}
```

No missing values for 2020.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I ran this I am seeing there is one missing value: State 48, County 243

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Thank you for catching - 1 missing value for 2020

```

Combined file has 9427 observations, which is correct (3142+3142+3143)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, I would recommend adding a count argument so the number of observations is printed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

```

Combined file has 9427 observations, which is correct (3142+3142+3143)
Keep variables of interest and order them appropriately also rename to correct var names
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to see the distriubtion of transit costs by county for all three years visualized together to observe similarity or movements.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added! Added commentary as well -- TLDR, the distributions are comparable, but costs increased from 2015 to 2019, and then decreased a lot in 2020 to below 2015 levels (which tracks, given this was the COVID year)

transit_trips_tracts_2020 <- transport_tracts_2020 %>%
select(GEOID, state, county, tract, blkgrps, population, households, transit_trips_80ami)

transit_cost_tracts_2020 <- transport_tracts_2020 %>%
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies to all years - it would be great to have some test or viz here that confirms all files have been read in properly. Is there a count of tracts we should expect?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really - tract data variable each year. But I have added a counter for each, and have checked for duplicates in the final outputs. Added distribution graphs for comparison and visual checking.


Collapse to places and also create data quality marker

```{r}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add to the comment above this calculation. This is an important and relatively complicated step, please explain what the weighting process does and why we multiply afact and hhwt.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@@ -371,13 +505,17 @@ Examine outliers
ggplot(transit_cost_city_2015, aes(x=index_transportation_cost)) + geom_histogram(binwidth=10) + labs(y="number of places", x="Annual Transit Cost for the Regional Moderate Income Household, 2015")

ggplot(transit_cost_city_2019, aes(x=index_transportation_cost)) + geom_histogram(binwidth=10) + labs(y="number of places", x="Annual Transit Cost for the Regional Moderate Income Household, 2019")

ggplot(transit_cost_city_2020, aes(x=index_transportation_cost)) + geom_histogram(binwidth=10) + labs(y="number of places", x="Annual Transit Cost for the Regional Moderate Income Household, 2020")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way these ggplot visuals are turning out currently does not offer much information, I think the bin widths are set too wide - it is just appearing as one large grey square.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, yes -- moved it to 0.01 bin widths, much clearer. Distributions look normal

awunderground and others added 13 commits January 2, 2025 15:36
Added 2014 PEP population data into the crosswalk manually since the API is limited
…gh 2021

As discussed, aligning with the data team decision to maintain the original 8 CT counties through 2021.
Also manually adding PEP population data for 2014 to complete previously missing data.
Kicked off more specific documentation about the crosswalks in the README - will revisit as needed
Updates made based on JP's feedback
Changes made according to JP's initial review
Applying JP's suggested changes across similar code(s) & rerunning the data because the CT crosswalks were updated
Moved over all the same edits JP suggested on earlier files. Re-ran with updated crosswalk files
@tinatinc tinatinc marked this pull request as ready for review January 14, 2025 05:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants