
9_Final File Output

jwalsh28 edited this page Oct 17, 2024 · 1 revision

Overview

The code for every mobility metric creates one or more final files. These should be CSV files that contain the metric calculated for all available geographies (at either the place or the county level) for a selection of years, along with the other required variables in the format described in the Data Expectations section.

Each final CSV file should be named clearly (including whether or not it contains subgroups) and written out to a designated folder within the folder of the domain that the metric falls under. At the end of an update cycle, the metric data team collects the final files and combines them to create the Mobility Metric Data that we offer via the Urban Data Catalog. Making your final files match the expectations described in this document helps the combined files come together smoothly and avoids tedious corrections. This section describes how these files should be organized and tested before they are submitted for review.

Final File Folders

After finishing the code for a metric, you will need to write out the results of that program as a CSV. These final files should be written to a folder inside the broader folder for the domain your metric falls under (for example, “08_education/data/final”).

Although the folders containing final data have gone by different names over time, we encourage the following format for this folder:

“[Domain Folder]/data/final/[final_data_files.csv]”

If a “data/final” folder does not already exist inside your domain folder, please create one. Data for this metric has likely already been output to a different folder. If your update replaces that data, please leave the older versions in place; the data team will clean up these historical files over time. For questions on managing historical files, please see the Years and Version History tab in this wiki.
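The folder convention above can be sketched as a small helper that builds “[Domain Folder]/data/final” and creates it only if it is missing. This is an illustration written in Python, not code from the repository (which uses R); the helper name final_data_dir and its root parameter are hypothetical.

```python
from pathlib import Path


def final_data_dir(domain_folder: str, root: str = ".") -> Path:
    """Return "[Domain Folder]/data/final", creating it if it does not exist.

    Hypothetical helper that mirrors the folder convention described above.
    Existing files (including older versions of final data) are left untouched.
    """
    final_dir = Path(root) / domain_folder / "data" / "final"
    # parents=True also creates "data" if needed; exist_ok=True makes this a
    # no-op when the folder already exists.
    final_dir.mkdir(parents=True, exist_ok=True)
    return final_dir
```

For example, final_data_dir("08_education") would return (and, if needed, create) 08_education/data/final relative to the current working directory. Calling it repeatedly is safe because it never deletes or overwrites anything.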

Final File Titles and Organization

Final files should match the format of the Mobility Metric Data file they will ultimately become part of as closely as possible. If you are unsure which files contain which data, the Data Catalog page that hosts the Mobility Metric Data provides helpful information.

The final files should be organized so that there is a separate file for the overall metric and for each subgroup calculation of the metric, split by place and county (when both are available). For example, the college readiness metric has county and place data and offers subgroup information on race-ethnicity, gender, and disability. The correct final file organization would therefore include the 8 CSV files below (2 geographies × 4 files: the overall metric plus 3 subgroups):

  • college_county_all_longitudinal.csv

  • college_county_disability_longitudinal.csv

  • college_county_gender_longitudinal.csv

  • college_county_race-ethnicity_longitudinal.csv

  • college_place_all_longitudinal.csv

  • college_place_disability_longitudinal.csv

  • college_place_gender_longitudinal.csv

  • college_place_race-ethnicity_longitudinal.csv
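Generating these names programmatically rather than typing them by hand helps avoid misspelled file names. The Python sketch below (the repository itself uses R) assumes the pattern "{metric}_{geography}_{subgroup}_longitudinal.csv", which is inferred from the college readiness example; the function name final_file_names is hypothetical.

```python
from itertools import product


def final_file_names(metric, geographies, subgroups, suffix="longitudinal"):
    """Build one CSV name per geography for the overall metric ("all") and each subgroup.

    The naming pattern is inferred from the example above; it is an
    assumption, not a documented rule.
    """
    groups = ["all", *subgroups]
    # product() iterates geographies in the outer loop, so all county files
    # come before all place files, matching the list above.
    return [
        f"{metric}_{geo}_{group}_{suffix}.csv"
        for geo, group in product(geographies, groups)
    ]


names = final_file_names(
    "college", ["county", "place"], ["disability", "gender", "race-ethnicity"]
)
for name in names:
    print(name)
```

With the college readiness inputs, this prints the 8 file names listed above in the same order.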

Final File Evaluation

Every metric update must pass a final data evaluation function created by the Upward Mobility data team. This function must be applied to every final data file written out by a program in the mobility from poverty repository.

The function tests a series of baseline requirements for how the final data should be written out (see Data Expectations in this wiki). It also checks the information in the final_data_expectations form, which a metric lead fills out at the start of an update and which their technical reviewer must approve (see final data expectations form).

The function itself lives in the "functions/testing" folder of the repository, in an R script titled "evaluate_final_data.R". The template for the final data expectations form is also in "functions/testing", as a CSV file titled "final_data_evaluation_form.csv". For an example of using this function and filling out the data expectations form, please see functions/testing/Final Data Evaluation Text Example.html. This file walks through a real example using the housing accessibility and affordability metric.

Function Arguments  

Metric leads must call the function with the correct arguments to evaluate their final data. The metric lead should already know these values, as they relate directly to the expectations for the final data.

  1. exp_form_path: The file path to the final_data_evaluation_form that the metric lead has filled out for this update.
  2. data: The name of the data frame in the program that will be written out as the final data file.
  3. geography: Either county or place, depending on the content of the data file. Note that the function defaults to county.
  4. subgroups: A logical (TRUE or FALSE) argument that tells the function whether the final data contains subgroups.
  5. confidence_intervals: A logical (TRUE or FALSE) argument that tells the function whether the final data contains confidence intervals.
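Because the function is applied to each final file separately, the geography and subgroups arguments can usually be read off the file name. The Python sketch below only records and validates the five argument values for one file; it does not call evaluate_final_data.R, which is run from R. The class EvaluationArgs, the helper args_for_file, and the file-name pattern it parses are hypothetical and assume the naming convention from the college readiness example.

```python
from dataclasses import dataclass

VALID_GEOGRAPHIES = {"county", "place"}


@dataclass
class EvaluationArgs:
    """Hypothetical record of the arguments passed to evaluate_final_data()."""

    exp_form_path: str
    data: str  # name of the data frame that will be written out
    subgroups: bool
    confidence_intervals: bool
    geography: str = "county"  # mirrors the function's documented default

    def __post_init__(self):
        if self.geography not in VALID_GEOGRAPHIES:
            raise ValueError(f"geography must be 'county' or 'place', got {self.geography!r}")
        for flag in ("subgroups", "confidence_intervals"):
            if not isinstance(getattr(self, flag), bool):
                raise TypeError(f"{flag} must be True or False")


def args_for_file(file_name, exp_form_path, data, confidence_intervals):
    """Derive geography and subgroups from a name like 'college_place_gender_longitudinal.csv'."""
    # Assumes "{metric}_{geography}_{subgroup}_..." with no underscores in the
    # metric name; other naming schemes would need a different split.
    _, geography, subgroup, *_ = file_name.removesuffix(".csv").split("_")
    return EvaluationArgs(
        exp_form_path=exp_form_path,
        data=data,
        subgroups=subgroup != "all",
        confidence_intervals=confidence_intervals,
        geography=geography,
    )
```

For example, args_for_file("college_place_gender_longitudinal.csv", "final_data_evaluation_form.csv", "college_place_gender", True) yields geography "place" and subgroups True, while any "_all_" file yields subgroups False. In R, the same values would be passed to the function with TRUE/FALSE for the logical arguments.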