Iss417 #440

Open

tinatinc wants to merge 26 commits into main from Iss417

Collaborator

tinatinc commented Dec 30, 2024

Mobility metric pull request template

Please include the following points in your PR:

A link to the issue that this PR relates to: Issue 417
A description of the content in this pull request.

What was changed?
I added/appended the 2020 data for both the transit cost and transit trip metrics (both overall and subgroup files). Previously, we only had 2015 and 2019 data.
What should the reviewer be focusing on?
Any errors I may have made in replicating the same calculations for 2020. Specifically, tract/county differences in crosswalks (since 2020 is a decade change, and there were jurisdictional changes in both counties and tract numbers).
Is there a logical order to review the files in?
I would start with transit_cost_county, then transit_trips_county; these two calculate the two metrics at the county level. Next is transit_trips_cost_city, which calculates both metrics in one file, now at the city (place) level. Then transit_trips_cost_county_subgroups, which creates the subgroup data at the county level for both metrics, and finally transit_trips_cost_city_subgroups, which creates the subgroup data at the city (place) level for both metrics.

Detail on any issues or flags that the metric reviewer/data-team should be aware of.
None that I can think of, but I may have made a mistake on when/if to include the 8 CT counties that recently became defunct. I need to double-check the crosswalks.

cdsolari and others added 8 commits

October 17, 2024 12:19


          Update README.md

a378739


          Resolve merge conflict by accepting ReadMe suggestions

98c0c50


          transit cost - county, nonsubgroup update

e5c982a

Code & final output file updated


          Transit Trips County + City nonsubgroup update

757928d

2020 data added to code and output files for both county and city level transit trips data


          Transit trips county nonsubgroup update

5de2b7d

Update to code and output file for transit trips - county, nonsubgroup


          changing the year of the update at the top of the code files

d1e5c36

forgot to do this earlier


          County - Transit Trips Subgroups + Transit Cost Subgroups update

0b4c803

Code and output files updated to include 2020 data for transit cost & transit trips at the COUNTY level


          City - Transit Cost & Trips Subgroups update adding 2020

0a58a38

Code and output files updates for the city (place) -level data for both transit trips and transit costs with race subgroups

tinatinc marked this pull request as draft

December 30, 2024 06:12

cdsolari requested a review from jwalsh28

December 31, 2024 15:00

jwalsh28 requested changes

Collaborator

jwalsh28 left a comment

Hey @tinatinc - I'm about halfway through the review but wanted to submit my comments so far to give you more time. Overall the changes look good and I do not see anything wrong with the code, but I have made some suggestions for improvements.

Two big things:

A folder for "data" does not currently exist in the transportation folder when I check out your branch. This can cause confusion because the instructions in the program say to download the raw data into the "data" folder. I think you should either add one or update the instructions to state that you need to create a data folder.
The histograms in the city trips_cost code are currently not showing anything, I believe because the bin widths are set too high. These are important because they are the only viz of the post-weighted data, so it would be key to improve them.

06_neighborhoods/Transportation/transit_cost_county.qmd

		@@ -46,6 +46,7 @@ repository folder

Collaborator

jwalsh28 Dec 31, 2024

Noting that the "data" folder does not exist inside the Transportation folder. Either add the data folder or instruct reviewers to create it.

Collaborator Author

tinatinc Jan 14, 2025

Added instruction for users/reviewers to create this folder to use it

06_neighborhoods/Transportation/transit_cost_county.qmd Outdated

+                filter(year == 2020)
+              ```
+              The 2015 and 2019 files have the same number of observations (3134, down from 3142 due to removing the 8 CT counties). 2020 file has 3,143 for due to the Alaska county split. Checking that's the case below:

Collaborator

jwalsh28 Dec 31, 2024

In the above code chunk can you add a count argument so that the count of the dataframes prints? This would make it easier to see what you are stating here in the text.

Collaborator Author

tinatinc Jan 14, 2025

Done!
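
The kind of printed count requested above could look like the following minimal sketch in base R. The data frame `transit_cost_toy` and its values are hypothetical stand-ins, not the PR's data; the PR's own code uses dplyr, where `count(year)` would serve the same purpose.

```r
# Toy stand-in for the combined county file; names and values are hypothetical.
transit_cost_toy <- data.frame(
  year   = c(2015, 2015, 2019, 2019, 2020, 2020, 2020),
  county = c("A", "B", "A", "B", "A", "C", "D")
)

# Number of rows per year, printed so a reviewer can compare it with the text.
rows_per_year <- table(transit_cost_toy$year)
print(rows_per_year)

# Fail loudly if a year does not have the expected number of rows.
expected <- c("2015" = 2, "2019" = 2, "2020" = 3)
stopifnot(all(rows_per_year[names(expected)] == expected))
```

Printing the count next to the prose claim lets a reviewer confirm the number without re-running anything by hand, and the `stopifnot()` line turns the claim into a check that breaks if the input files change.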

06_neighborhoods/Transportation/transit_cost_county.qmd Outdated

+              }
+              ```
+              No missing values for 2020.

Collaborator

jwalsh28 Dec 31, 2024

When I ran this I am seeing there is one missing value: State 48, County 243

Collaborator Author

tinatinc Jan 14, 2025

Yes! Thank you for catching - 1 missing value for 2020
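
A missing-value check that prints the affected rows, rather than relying on a statement in the text, might look like this base R sketch. The data frame and its numbers are hypothetical; only the state and county codes of the flagged row come from the review comment.

```r
# Toy county-level data; values are hypothetical, with one NA planted
# at state 48, county 243 to mirror the case flagged in the review.
transit_cost_2020_toy <- data.frame(
  state  = c("48", "48", "06"),
  county = c("241", "243", "001"),
  index_transportation_cost = c(61.2, NA, 48.9)
)

# Keep only the rows where the metric is missing and print them.
missing_rows <- transit_cost_2020_toy[
  is.na(transit_cost_2020_toy$index_transportation_cost),
  c("state", "county")
]
print(missing_rows)
cat("Missing values:", nrow(missing_rows), "\n")
```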

06_neighborhoods/Transportation/transit_cost_county.qmd

		```

		Combined file has 9427 observations, which is correct (3142+3142+3143)

Collaborator

jwalsh28 Dec 31, 2024

Similar to above, I would recommend adding a count argument so the number of observations is printed

Collaborator Author

tinatinc Jan 14, 2025

Done

06_neighborhoods/Transportation/transit_cost_county.qmd

               ```
+              Combined file has 9427 observations, which is correct (3142+3142+3143)
               Keep variables of interest and order them appropriately also rename to correct var names

Collaborator

jwalsh28 Dec 31, 2024

It would be helpful to see the distribution of transit costs by county for all three years visualized together to observe similarities or movements.

Collaborator Author

tinatinc Jan 14, 2025

Added! Added commentary as well -- TLDR, the distributions are comparable, but costs increased from 2015 to 2019, and then decreased a lot in 2020 to below 2015 levels (which tracks, given this was the COVID year)
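
One way to put the three years on a single axis is sketched below in base R. The values are made up for illustration and only mimic the pattern described in the reply; the PR's actual plots are not shown in this thread and presumably use ggplot2.

```r
# Toy county costs for three years; all numbers are made up for illustration.
costs_toy <- data.frame(
  year = rep(c(2015, 2019, 2020), each = 4),
  index_transportation_cost = c(40, 50, 60, 70,
                                45, 55, 65, 75,
                                30, 40, 50, 60)
)

# Median cost per year as a quick numeric comparison.
median_by_year <- tapply(costs_toy$index_transportation_cost,
                         costs_toy$year, median)
print(median_by_year)

# Side-by-side boxplots put all three distributions on one axis.
boxplot(index_transportation_cost ~ year, data = costs_toy,
        xlab = "Year", ylab = "Transit cost (toy values)")
```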

06_neighborhoods/Transportation/transit_trips_cost_city.qmd

+              transit_trips_tracts_2020 <- transport_tracts_2020 %>%
+                select(GEOID, state, county, tract, blkgrps, population, households, transit_trips_80ami)
+              transit_cost_tracts_2020 <- transport_tracts_2020 %>%

Collaborator

jwalsh28 Dec 31, 2024

This applies to all years - it would be great to have some test or viz here that confirms all files have been read in properly. Is there a count of tracts we should expect?

Collaborator Author

tinatinc Jan 14, 2025

Not really - tract data vary each year. But I have added a counter for each file and have checked for duplicates in the final outputs. I also added distribution graphs for comparison and visual checking.
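
A row counter plus a duplicate check of the kind mentioned here can be sketched in base R as follows. The GEOIDs and values are hypothetical, and one duplicate is planted on purpose so the check has something to report.

```r
# Toy tract-level file; GEOIDs are hypothetical, with one deliberate duplicate.
tracts_toy <- data.frame(
  GEOID = c("T001", "T002", "T002", "T003"),
  transit_trips_80ami = c(10, 25, 25, 40)
)

# Counter: how many rows were read in.
cat("Rows read:", nrow(tracts_toy), "\n")

# Duplicate check: list every GEOID that appears more than once.
dup_ids <- unique(tracts_toy$GEOID[duplicated(tracts_toy$GEOID)])
print(dup_ids)
```

When no fixed tract count is expected, a check like this at least confirms that each file loaded with a plausible number of rows and that no geography is counted twice.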

06_neighborhoods/Transportation/transit_trips_cost_city.qmd


		Collapse to places and also create data quality marker

		```{r}

Collaborator

jwalsh28 Dec 31, 2024

Can you add to the comment above this calculation? This is an important and relatively complicated step, so please explain what the weighting process does and why we multiply afact and hhwt.

Collaborator Author

tinatinc Jan 14, 2025

Added
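
The calculation itself is not shown in this thread, so the following base R sketch only illustrates one common reading of the step. It assumes `afact` is the share of a tract allocated to a place (as in a typical tract-to-place crosswalk) and `hhwt` is the tract's household count; both interpretations, and all values, are assumptions rather than the PR's definitions.

```r
# Toy tract-to-place crosswalk rows. All values are hypothetical.
# afact: assumed share of the tract that falls inside the place (0 to 1).
# hhwt:  assumed number of households in the tract.
tract_place_toy <- data.frame(
  place = c("P1", "P1", "P2"),
  afact = c(1.0, 0.5, 0.5),
  hhwt  = c(100, 200, 200),
  index_transportation_cost = c(50, 70, 70)
)

# afact * hhwt approximates the households a tract contributes to a place.
tract_place_toy$weight <- tract_place_toy$afact * tract_place_toy$hhwt

# Household-weighted mean of the tract values within each place.
place_cost <- sapply(
  split(tract_place_toy, tract_place_toy$place),
  function(d) weighted.mean(d$index_transportation_cost, d$weight)
)
print(place_cost)
```

Under these assumptions, a tract that is only half inside a place counts with half of its households, so large tracts that barely overlap a place do not dominate that place's value.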

06_neighborhoods/Transportation/transit_trips_cost_city.qmd Outdated

@@ -371,13 +505,17 @@ Examine outliers
               ggplot(transit_cost_city_2015, aes(x=index_transportation_cost)) + geom_histogram(binwidth=10) + labs(y="number of places", x="Annual Transit Cost for the Regional Moderate Income Household, 2015")
               ggplot(transit_cost_city_2019, aes(x=index_transportation_cost)) + geom_histogram(binwidth=10) + labs(y="number of places", x="Annual Transit Cost for the Regional Moderate Income Household, 2019")
+              ggplot(transit_cost_city_2020, aes(x=index_transportation_cost)) + geom_histogram(binwidth=10) + labs(y="number of places", x="Annual Transit Cost for the Regional Moderate Income Household, 2020")

Collaborator

jwalsh28 Dec 31, 2024

The way these ggplot visuals currently turn out does not offer much information. I think the bin widths are set too wide; each plot just appears as one large grey square.

Collaborator Author

tinatinc Jan 14, 2025

Good catch, yes -- moved it to 0.01 bin widths, much clearer. Distributions look normal
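
In `geom_histogram()`, `binwidth` is expressed in the units of the x variable, so a width larger than the data's range collapses the plot into a single bar. The base R sketch below shows the effect with hypothetical values on a narrow scale; `bins_needed()` is an illustrative helper, not part of the PR or of ggplot2.

```r
# Hypothetical metric values on a narrow scale; not taken from the PR's data.
values <- c(0.02, 0.05, 0.11, 0.18, 0.26, 0.31, 0.47, 0.52, 0.68, 0.93)

# Approximate number of histogram bars a given bin width needs
# to cover the range of the data (always at least one).
bins_needed <- function(x, binwidth) {
  max(1, ceiling(diff(range(x)) / binwidth))
}

# A bin width of 10 is wider than the whole range, so one bar holds everything.
print(bins_needed(values, 10))

# A bin width of 0.01 splits the same range into many narrow bars.
print(bins_needed(values, 0.01))
```

Comparing the bin width against `range()` of the plotted variable before plotting is a quick way to catch this kind of mismatch.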

awunderground and others added 13 commits

January 2, 2025 15:36


          Add folder for final forms

0db2a88


          Merge pull request #442 from UI-Research/forms_folder

d6b67a1

Add folder for final forms


          homeless and ela county files

f699e67


          updating place-populations crosswalk to add 2014 PEP data

ef80eaa

Added 2014 PEP population data into the crosswalk manually since the API is limited


          Update create-place-populations.qmd

ad4b566


          Adding 2014 PEP population data and re-adding the 8 CT counties through 2021

f431c24

As discussed, aligning with the data team decision to maintain the original 8 CT counties through 2021.
Also manually adding PEP population data for 2014 to complete previously missing data.


          Updates to the README

1028be4

Kicked off more specific documentation about the crosswalks in the README - will revisit as needed


          Merge pull request #443 from UI-Research/Iss425

6ceac83

Iss425


          Merge branch 'version2025' of https://github.com/UI-Research/mobility-from-poverty into Iss417

2bee287


          transit_cost_county code and output update

fcf6878

Updates made based on JP's feedback


          transit_trips and transit_cost code and files updated for CITY

bc3e183

Changes made according to JP's initial review


          transit_trips_county code and output files updated

819b87d

Applying JP's suggested changes across similar code(s) & rerunning the data because the CT crosswalks were updated


          transit_trips_cost_county_subgroups code and files update

46b4021

Moved over all the same edits JP suggested on earlier files. Re-ran with updated crosswalk files

tinatinc marked this pull request as ready for review

January 14, 2025 05:31

tinatinc added 5 commits

January 14, 2025 00:32


          transit_cost_all_subgroups_city code and files update

d1ada2f

Updated based on JP's feedback on other codes. Reran the data output accordingly


          Evaluation form for transit metrics added

e0366ad


          removed final evaluation form due to confusion

7515a79


          evaluation forms

6d0c9c0


          Adding final evaluation to 2 of the code files (all, county)

ad631d1

cdsolari assigned jwalsh28
