2_Data Structure and Expectations 

Overview

The goal is to combine the separate metric datasets seamlessly into a set of main combined datasets that follow the file formats described below. The structure of these combined files sets the standards for each individual final metric dataset.

The final files are split by geographic level (city/place vs. county) and by whether the file includes a subgroup (e.g., race/ethnicity). The combined metric files use a "long" rather than a "wide" format: because every file contains multiple years of data, each unique geography appears in more than one row (one row per year). These data are hosted publicly on the Urban Institute data catalog.
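To make the long layout concrete, here is a minimal Python sketch that reshapes a hypothetical wide table (one column per year, named like Var1_2014) into one row per county per year. The column naming and the use of plain dicts are illustrative assumptions, not the format of any existing input file.

```python
# Hypothetical wide input: one row per county, one column per year.
wide_rows = [
    {"state": "01", "county": "001", "Var1_2014": "0.52", "Var1_2015": "0.55"},
    {"state": "01", "county": "003", "Var1_2014": "0.61", "Var1_2015": "NA"},
]


def wide_to_long(rows, metric, years):
    """Return one row per geography per year (the long layout)."""
    long_rows = []
    for row in rows:
        for year in years:
            long_rows.append({
                "year": year,
                "state": row["state"],
                "county": row["county"],
                metric: row[f"{metric}_{year}"],
            })
    # Order by year, then state, then county, as the combined files expect.
    long_rows.sort(key=lambda r: (r["year"], r["state"], r["county"]))
    return long_rows


long_rows = wide_to_long(wide_rows, "Var1", [2014, 2015])
for r in long_rows:
    print(r)
```

Each county now appears twice, once per year, which is why the row count of a long file grows with the number of years.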

The first three variables in every file (both overall and subgroup files) should be year, state, and county or place. The variable year should be a four-digit numeric variable.

state should be a two-character FIPS code (e.g., 01).
county should be a three-character FIPS code (e.g., 001).
place should be the five-digit Census place FIPS code (e.g., 03076). Intermediate files at the tract level should include tract as the fourth variable.
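The identifier rules above can be checked mechanically. The sketch below is a minimal Python example, assuming each file has been read as a list of dicts with FIPS codes kept as text so leading zeros survive; check_id_columns is a hypothetical helper, not part of any existing pipeline.

```python
import re

# Formats from the rules above. Values are compared as text, so FIPS codes
# must be stored as strings for their leading zeros to survive.
ID_PATTERNS = {
    "year": re.compile(r"\d{4}"),
    "state": re.compile(r"\d{2}"),
    "county": re.compile(r"\d{3}"),
    "place": re.compile(r"\d{5}"),
}


def check_id_columns(columns, rows, geo):
    """Return a list of problems with the leading ID columns (empty if none).

    geo is "county" or "place".
    """
    problems = []
    expected = ["year", "state", geo]
    if columns[:3] != expected:
        problems.append(f"first three columns are {columns[:3]}, expected {expected}")
    for i, row in enumerate(rows):
        for col in expected:
            value = str(row.get(col, ""))
            if not ID_PATTERNS[col].fullmatch(value):
                problems.append(f"row {i}: {col}={value!r} has the wrong format")
    return problems


columns = ["year", "state", "county", "Var1"]
rows = [
    {"year": 2014, "state": "01", "county": "001", "Var1": "0.5"},
    {"year": 2014, "state": "1", "county": "003", "Var1": "0.4"},  # state lost its leading zero
]
print(check_id_columns(columns, rows, "county"))
```

The second row is flagged because a state code read as a number (1 instead of 01) no longer has two characters.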

Overall

The overall county and place files contain every mobility metric for all available years. These files have exactly one row per county/place per year. As a metric lead, you should create an overall file for your metric covering all available geographies, in a format that matches the tables below. Not all variables have confidence intervals (CIs), but we encourage adding them when possible. For metrics without CIs, the _lb and _ub columns are not required.

Example data: County level

year	state	county	Var1	Var1_lb	Var1_ub	Var1_quality
2014	01	001	v	v_lb	v_ub	v_quality
2014	01	003	v	v_lb	v_ub	v_quality
2014	01	005	v	v_lb	v_ub	v_quality
2014	01	007	v	v_lb	v_ub	v_quality

There should be a row in the overall data files for every county in each available year. If the file is created correctly, the row count will equal the number of counties multiplied by the number of years (or, if the county count changes from year to year, the sum of the per-year counts). For the number of counties you should have in each year, consult the crosswalks section.
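A minimal Python sketch of this check, assuming the expected county-years are available as a hypothetical expected_geos_by_year mapping built from the crosswalks. It reports duplicates and missing keys as well as the row count, because a matching count alone can hide a duplicate that offsets a missing county, as the usage example shows.

```python
from collections import Counter


def check_one_row_per_geo_year(rows, geo, expected_geos_by_year):
    """Compare an overall file's keys with the expected geography-years.

    expected_geos_by_year is a hypothetical mapping of year -> set of
    (state, geo) pairs, standing in for the crosswalk counts.
    """
    counts = Counter((r["year"], r["state"], r[geo]) for r in rows)
    expected = {
        (year, state, geo_id)
        for year, geos in expected_geos_by_year.items()
        for state, geo_id in geos
    }
    return {
        "row_count_ok": len(rows) == len(expected),
        "duplicates": sorted(k for k, n in counts.items() if n > 1),
        "missing": sorted(expected - set(counts)),
        "extra": sorted(set(counts) - expected),
    }


expected = {
    2014: {("01", "001"), ("01", "003")},
    2015: {("01", "001"), ("01", "003")},
}
rows = [
    {"year": 2014, "state": "01", "county": "001"},
    {"year": 2014, "state": "01", "county": "003"},
    {"year": 2015, "state": "01", "county": "001"},
    {"year": 2015, "state": "01", "county": "001"},  # duplicate; 003 is missing
]
print(check_one_row_per_geo_year(rows, "county", expected))
```

Here the row count matches (4 rows expected, 4 present), yet the file is wrong: county 001 appears twice in 2015 and county 003 is absent.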

Example data: Place level

year	state	place	Var1	Var1_lb	Var1_ub	Var1_quality
2015	01	03076	v	v_lb	v_ub	v_quality
2015	01	07000	v	v_lb	v_ub	v_quality
2015	01	35896	v	v_lb	v_ub	v_quality
2015	01	37000	v	v_lb	v_ub	v_quality

There should be a row in the overall data files for every place in each available year. If the file is created correctly, the row count will equal the number of places multiplied by the number of years (or, if the place count changes from year to year, the sum of the per-year counts). For the number of places you should have in each year, consult the crosswalks section.
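One way to guarantee a row for every place-year is to start from the crosswalk and fill in the computed values, leaving "NA" where a place has no estimate. A minimal Python sketch, assuming a hypothetical crosswalk list of (year, state, place) tuples and hypothetical Var1 column names:

```python
# Hypothetical metric column names for one metric.
METRIC_COLUMNS = ["Var1", "Var1_lb", "Var1_ub", "Var1_quality"]


def complete_to_crosswalk(rows, geo, crosswalk, metric_columns=METRIC_COLUMNS):
    """Return exactly one row per (year, state, geo) listed in the crosswalk.

    crosswalk is a hypothetical list of (year, state, geo) tuples. Geographies
    without a computed value keep their ID columns and get "NA" metrics.
    """
    by_key = {(r["year"], r["state"], r[geo]): r for r in rows}
    completed = []
    for year, state, geo_id in sorted(set(crosswalk)):
        found = by_key.get((year, state, geo_id), {})
        row = {"year": year, "state": state, geo: geo_id}
        for col in metric_columns:
            row[col] = found.get(col, "NA")
        completed.append(row)
    return completed


crosswalk = [(2015, "01", "03076"), (2015, "01", "07000")]
rows = [{"year": 2015, "state": "01", "place": "03076",
         "Var1": "0.4", "Var1_lb": "0.3", "Var1_ub": "0.5", "Var1_quality": "1"}]
for row in complete_to_crosswalk(rows, "place", crosswalk):
    print(row)
```

A computed row for a geography that is not in the crosswalk is silently dropped by this approach, so compare row counts before and after completing to catch geography mismatches.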

Subgroups

The subgroup files follow a structure similar to the overall file, but unlike the overall file they have multiple observations per county/place per year because of the subgroup values (e.g., race/ethnicity, poverty status). There are currently nine subgroup types, each with its own combined subgroup dataset. The table below lists these subgroup types and their values. If you are updating a metric that includes one of these subgroups, the subgroup values in your final data should be drawn from the values listed below.

Subgroup type	Values
Race-ethnicity	Black; Black, Non-Hispanic; White; White, Non-Hispanic; Other Races and Ethnicities; Hispanic
Race-share	Neighborhoods of color; White neighborhoods; Mixed neighborhoods
Income	Economically Disadvantaged; Not Economically Disadvantaged; Less than $50,000; $50,000 or More; High-Poverty; Not High-Poverty
Age	Under Age 45; Age 45 and Over; Age 10 to 14; Age 15 to 17
Gender	Male; Female
Tenure	Renter; Owner
Disability	With Disability; Without Disability
Industry	Goods Producing; Public Administration; Trade, Transit, Utilities; Information Services; Professional Services; Education and Health; Leisure and Other
Mother’s education	Less than High School; GED/High School Degree; Some College; College Degree or Higher
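The allowed values can be encoded as a lookup and checked against a file. In this minimal Python sketch the values are copied from the subgroup table, but the subgroup_type keys other than "tenure" are hypothetical labels; confirm the exact strings against the combined files.

```python
# Allowed subgroup values from the table. Keys other than "tenure" are
# hypothetical subgroup_type labels.
ALLOWED_SUBGROUPS = {
    "race-ethnicity": {"Black", "Black, Non-Hispanic", "White", "White, Non-Hispanic",
                       "Other Races and Ethnicities", "Hispanic"},
    "race-share": {"Neighborhoods of color", "White neighborhoods", "Mixed neighborhoods"},
    "income": {"Economically Disadvantaged", "Not Economically Disadvantaged",
               "Less than $50,000", "$50,000 or More", "High-Poverty", "Not High-Poverty"},
    "age": {"Under Age 45", "Age 45 and Over", "Age 10 to 14", "Age 15 to 17"},
    "gender": {"Male", "Female"},
    "tenure": {"Renter", "Owner"},
    "disability": {"With Disability", "Without Disability"},
    "industry": {"Goods Producing", "Public Administration", "Trade, Transit, Utilities",
                 "Information Services", "Professional Services", "Education and Health",
                 "Leisure and Other"},
    "mothers-education": {"Less than High School", "GED/High School Degree",
                          "Some College", "College Degree or Higher"},
}


def invalid_subgroup_values(rows):
    """Return sorted (subgroup_type, subgroup) pairs not in the allowed lists."""
    bad = set()
    for r in rows:
        stype, value = r["subgroup_type"], r["subgroup"]
        if stype == "all" and value == "All":
            continue  # the overall-population row every subgroup file carries
        if value not in ALLOWED_SUBGROUPS.get(stype, set()):
            bad.add((stype, value))
    return sorted(bad)


rows = [
    {"subgroup": "All", "subgroup_type": "all"},
    {"subgroup": "Renter", "subgroup_type": "tenure"},
    {"subgroup": "Owner", "subgroup_type": "tenure"},
    {"subgroup": "Renters", "subgroup_type": "tenure"},  # typo: not an allowed value
]
print(invalid_subgroup_values(rows))
```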

The first five variables in every subgroup file should be year, state, county/place, subgroup, and subgroup_type. The tables below show an example for the Tenure subgroup type.

Example data: County level

year	state	county	subgroup	subgroup_type	Var1	Var1_lb	Var1_ub	Var1_quality
2014	01	001	All	all	v	v_lb	v_ub	v_quality
2014	01	001	Owner	tenure	v	v_lb	v_ub	v_quality
2014	01	001	Renter	tenure	v	v_lb	v_ub	v_quality

There should be a row in the subgroup data for every county in each available year for each subgroup value. If the file is created correctly, the row count will equal the product of the number of counties, the number of years, and the number of subgroup values in the subgroup_type (including All). For the number of counties you should have in each year, consult the crosswalks section.
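The expected set of rows can be generated as a cross product of geography-years and subgroup values, with All added. A minimal Python sketch, assuming a hypothetical geo_years list of (year, state, county) tuples from the crosswalk; subtracting the set of keys found in your file from this set lists exactly which rows are missing.

```python
from itertools import product


def expected_subgroup_keys(geo_years, subgroup_type, values):
    """Return every (year, state, geo, subgroup_type, subgroup) key a file needs.

    geo_years is a hypothetical list of (year, state, geo) tuples taken from
    the crosswalk; values are the subgroup values used for this subgroup_type.
    """
    groups = [("all", "All")] + [(subgroup_type, v) for v in values]
    return {
        (year, state, geo, stype, sub)
        for (year, state, geo), (stype, sub) in product(geo_years, groups)
    }


geo_years = [(2014, "01", "001"), (2014, "01", "003"),
             (2015, "01", "001"), (2015, "01", "003")]
keys = expected_subgroup_keys(geo_years, "tenure", ["Owner", "Renter"])
print(len(keys))  # 4 county-years x (2 tenure values + All) = 12
```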

Example data: Place level  

year	state	place	subgroup	subgroup_type	Var1	Var1_lb	Var1_ub	Var1_quality
2015	01	03076	All	all	v	v_lb	v_ub	v_quality
2015	01	03076	Owner	tenure	v	v_lb	v_ub	v_quality
2015	01	03076	Renter	tenure	v	v_lb	v_ub	v_quality

There should be a row in the subgroup data for every place in each available year for each subgroup value. If the file is created correctly, the row count will equal the product of the number of places, the number of years, and the number of subgroup values in the subgroup_type (including All). For the number of places you should have in each year, consult the crosswalks section.

Subgroup “All” Values 

Subgroup files should contain an “All” row with the metric value for the overall population. Represent this row with a capitalized “All” in the subgroup variable and a lowercase “all” in the subgroup_type variable.

Calculate the All values in a subgroup file from the same data you use to calculate the other subgroup values. As a result, the All value in a subgroup file may differ from the All value in the overall file; this is expected when the two are built from different input data.
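The sketch below illustrates this rule with a simple share metric. It assumes hypothetical person-level records with a 0/1 outcome and treats "the same data" as the records that have a tenure value; both are illustrative assumptions, not how any particular metric is computed.

```python
from collections import defaultdict


def share_by_subgroup(records, subgroup_field, subgroup_type):
    """Compute a share per subgroup plus the "All" row from the same records.

    records are hypothetical person-level dicts with a 0/1 "outcome" field.
    Records with no value for subgroup_field are left out of every row,
    including "All", so "All" uses exactly the data behind the subgroups.
    """
    usable = [r for r in records if r.get(subgroup_field) is not None]
    if not usable:
        return []
    outcomes_by_group = defaultdict(list)
    for r in usable:
        outcomes_by_group[r[subgroup_field]].append(r["outcome"])
    result = [{"subgroup": "All", "subgroup_type": "all",
               "value": sum(r["outcome"] for r in usable) / len(usable)}]
    for name in sorted(outcomes_by_group):
        outcomes = outcomes_by_group[name]
        result.append({"subgroup": name, "subgroup_type": subgroup_type,
                       "value": sum(outcomes) / len(outcomes)})
    return result


records = [
    {"tenure": "Renter", "outcome": 1},
    {"tenure": "Renter", "outcome": 0},
    {"tenure": "Owner", "outcome": 1},
    {"tenure": None, "outcome": 1},  # no tenure: excluded from the subgroup file
]
for row in share_by_subgroup(records, "tenure", "tenure"):
    print(row)
```

Here the All row is 2/3, while a share over all four records (as an overall file might use) would be 3/4, which is the kind of expected difference described above.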

Sorting

All files should be sorted by year, state, and county/place, the first three variables in every file. At every geographic level, sort by year first and then by geography, from the largest level (state) to the smallest (e.g., county, place, or tract).

Subgroup files should be sorted by year, state, county/place, subgroup_type, and subgroup. All sorting should be alphanumeric. Importantly, the race/ethnicity groups should be sorted alphabetically so that "Black, Non-Hispanic" appears first and "White, Non-Hispanic" appears last.   
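A minimal Python sketch of this ordering, assuming year is stored as a number and FIPS codes as fixed-width text, so plain tuple sorting gives the intended order. The "race-ethnicity" subgroup_type label is hypothetical. Python compares strings by code point, which matches alphabetical order for the capitalized values used here but would place uppercase letters before lowercase ones in general.

```python
def subgroup_sort_key(geo):
    """Build a sort key: year, state, geography, subgroup_type, subgroup."""
    def key(row):
        return (row["year"], row["state"], row[geo],
                row["subgroup_type"], row["subgroup"])
    return key


rows = [
    {"year": 2014, "state": "01", "county": "001",
     "subgroup_type": "race-ethnicity", "subgroup": "White, Non-Hispanic"},
    {"year": 2014, "state": "01", "county": "001",
     "subgroup_type": "race-ethnicity", "subgroup": "Black, Non-Hispanic"},
    {"year": 2014, "state": "01", "county": "001",
     "subgroup_type": "all", "subgroup": "All"},
    {"year": 2014, "state": "01", "county": "001",
     "subgroup_type": "race-ethnicity", "subgroup": "Hispanic"},
]
rows.sort(key=subgroup_sort_key("county"))
print([r["subgroup"] for r in rows])
```

In this example "all" sorts before "race-ethnicity", so the All row comes first, followed by the race/ethnicity values from "Black, Non-Hispanic" to "White, Non-Hispanic".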

Missing Values

If a metric value is missing, report it as a character NA value. When the metric is NA, the variables describing its quality (_quality) and its confidence interval (_lb, _ub) must also be NA. Likewise, a quality value should never be reported for a missing metric, and a metric value should never be reported when its quality is missing; the two must always align. Keep the geography and subgroup information in rows with missing metrics so it is clear what is missing.
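These rules can be checked row by row. A minimal Python sketch, assuming missing values are stored as the string "NA" (as when a CSV is read without converting NA), that the _quality column is present, and that the CI columns may be absent; misaligned_na_rows is a hypothetical helper.

```python
def misaligned_na_rows(rows, metric):
    """Return indexes of rows that break the missing-value rules for one metric.

    Assumes missing values are stored as the string "NA". The _quality column
    is assumed present; the _lb/_ub columns are checked only if they exist.
    """
    bad = []
    for i, row in enumerate(rows):
        value_missing = row[metric] == "NA"
        quality_missing = row[f"{metric}_quality"] == "NA"
        ci_values = [row[c] for c in (f"{metric}_lb", f"{metric}_ub") if c in row]
        if value_missing != quality_missing:
            bad.append(i)  # value and quality must be missing together
        elif value_missing and any(v != "NA" for v in ci_values):
            bad.append(i)  # a CI bound is reported for a missing metric
    return bad


rows = [
    {"Var1": "0.5", "Var1_lb": "0.4", "Var1_ub": "0.6", "Var1_quality": "1"},
    {"Var1": "NA", "Var1_lb": "NA", "Var1_ub": "NA", "Var1_quality": "NA"},
    {"Var1": "NA", "Var1_lb": "NA", "Var1_ub": "NA", "Var1_quality": "2"},
    {"Var1": "NA", "Var1_lb": "0.1", "Var1_ub": "NA", "Var1_quality": "NA"},
]
print(misaligned_na_rows(rows, "Var1"))
```

Rows 2 and 3 are flagged: one reports a quality value for a missing metric, the other a lower bound for a missing metric.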
