GitHub - mansueto-institute/kblock-analysis: Replication code and data for https://arxiv.org/abs/2307.16328

Analysis of street access and development in sub-Saharan Africa

Block complexity aggregation and visualization:

complexity-analysis.R aggregates the block level data and generates several visualizations used in the analysis. The script uses aggregation_func.R to facilitate aggregation from the the block level to higher geographic scales.
complexity-analysis.R will automatically download several GB of block level data and add it to folders within the top level of the repo. The database is available at the website millionneighborhoods.africa with a data dictionary available here. The script also uses the following files in this repo: data /buildings_lusaka.geojson, data/streets_lusaka.geojson. Data sources and methods for generating the block level population and informality data are available in the mansueto-institute/kblock repo.

Demographic and Health Survey (DHS) statistical analysis:

dhs-analysis.R downloads DHS data via an API connection, spatial joins DHS administrative regions to the block level database, aggregates blocks statistics to the DHS regions, and then performs a statistical analysis that runs correlations, PCA, and regressions on the relationships between block statistics and social development indicators corresponding to human well-being and household characteristics.
dhs-analysis.R must be run after the step in the complexity-analysis.R that downloads the block level database. When running on your own computer please be sure to register and add your own DHS login credentials to the script (this requires undergoing a dataset access approval process). The pre-processing steps that involve API calls to the DHS website and spatial joins may take an hour of processing time. This script also uses data downloaded from data/un-habitat. To account for street densities this script uses streets_dhs_regions.csv.

Block complexity graph visualizations:

graph-viz.R visualizes block complexity in the format of a network graph. This script uses the graph_funcs.R to generate a complexity graph. These functions were specifically designed for the R programming language to replicate the basic functionality of the kblock repo, which performs the actual method used in the scientific article and associated datasets.

Below are functions applied in the graph-viz.R script which demonstrate how to compute block complexity graphs and visualize parcel level diagrams. Again these methods were designed for heuristic purposes and not used in production at scale.

# Function to generate graph using building parcels (graph_polys) and on-network streets that connect interior buildings to exterior roads (graph_exclude)
graph_lines <- generate_graph(graph_polys = building_parcels_polygons, graph_exclude = street_linestrings)
# Function to convert graph to polygon representation
graph_polys <- convert_to_poly(lines = graph_lines)
# Function to compute block complexity at the parcel level such that each row to the output represents each subsequent interior layer of buildings. 
graph_duals <- extract_duals(graph_sf = graph_polys)

The script uses data/layer_lusaka.geojson, which contains data for a community area in Lusaka, Zambia.

Validation analysis:

To ground-truth block complexity statistics use sdi-analysis.R which compares block geometries to SDI settlements. Note, the data requires special access permissions. Contact authors for more information.
To validate the completeness of OSM street data use street-analysis.R which utilizes three data files: streets_metrics.parquet, streets_summary.csv, and streets_validation.csv.
To compare block complexity performance to alternative predictors of wealth, see these statistical comparisons with Blumenstock et al. (2022) and Yeh et al. (2020)
To test the fit of the following distributions to different geographies use distribution-analysis.R.

Workflow diagram:

graph LR
G[DHS API] --> C
A[africa_data.parquet] --> B[Step 1: complexity-analysis.R]
M[africa_geodata.parquet] --> B
M[africa_geodata.parquet] --> C

Q[Step 0: kblock.git] --> M

E[aggregation_func.R] --> B
A --> C[Step 2: dhs-analysis.R]
B --> Z[Summary stats and<br>visualizations]
C --> X[Statistical analysis and<br>visualizations]
Q[Step 0: kblock.git] --> A

F[graph_funcs.R] --> D[Step 3: graph-viz.R]
J[layer_lusaka.geojson] --> D

K[buildings_lusaka.geojson] --> B
L[streets_lusaka.geojson] --> B

D --> Y[Complexity graph<br>visualization]

style A fill:#eba8d3
style G fill:#eba8d3
style J fill:#eba8d3
style K fill:#eba8d3
style L fill:#eba8d3

style Z fill: #f7f5eb
style X fill: #f7f5eb
style Y fill: #f7f5eb

style F fill: #ADD8E6
style E fill: #ADD8E6

style Q fill: #b1d8b7

Details on R environment for replication

R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 13.4.1

Package	Version	Package	Version	Package	Version
tidyverse	1.3.1	broom	1.0.3	readxl	1.3.1
sf	1.0-8	betareg	3.1-4	osmdata	0.1.9
units	0.8-0	ggpmisc	0.5.2	Hmisc	4.8-0
viridis	0.6.2	kableExtra	1.3.4	scatterpie	0.1.7
patchwork	1.1.1	xtable	1.8-4	tidymodels	0.2.0
scales	1.2.1	ggplot2	3.4.1	ggrepel	0.9.3
rdhs	0.7.5	tidygeocoder	1.0.5	readr	2.1.1
rmapshaper	0.4.6	writexl	1.4.0	DescTools	0.99.48
arrow	7.0.0	ggsn	0.5.0	ggpubr	0.6.0
sfarrow	0.4.1	geoarrow	0.0.0.9000

Contact

Nicholas Marchio, data scientist at the Mansueto Institute. For any technical inquiries please feel free to create a Git issue and tag nmarchio.

For related technical work see the following repos:

kblock: Python codebase for generating the underlying block complexity and population data available at millionneighborhoods.africa.
geopull: Python package for extracting OSM data and generating street block delineations.
cloudtile: CLI for converting (Geo)Parquet files to PMTiles on AWS.
prclz: Python codebase from previous iteration of block complexity research (see kblock for more recent version).

Restricted data dependencies for SDI and DHS (contact author for more info)

Google drive with restricted data for full reproducibility for analysis.
UChicago Box with restricted data for full reproducibility for analysis.
Google drive with data for minimal reproducible example of block complexity metrics.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Analysis of street access and development in sub-Saharan Africa

Block complexity aggregation and visualization:

Demographic and Health Survey (DHS) statistical analysis:

Block complexity graph visualizations:

Validation analysis:

Workflow diagram:

Details on R environment for replication

Contact

Restricted data dependencies for SDI and DHS (contact author for more info)

About

Releases 3

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 109 Commits
data		data
.gitignore		.gitignore
LICENSE		LICENSE
ReadMe.md		ReadMe.md
aggregation_func.R		aggregation_func.R
complexity-analysis.R		complexity-analysis.R
dhs-analysis.R		dhs-analysis.R
distribution-analysis.R		distribution-analysis.R
graph-viz.R		graph-viz.R
graph_funcs.R		graph_funcs.R
sdi-analysis.R		sdi-analysis.R
street-analysis.R		street-analysis.R

License

mansueto-institute/kblock-analysis

Folders and files

Latest commit

History

Repository files navigation

Analysis of street access and development in sub-Saharan Africa

Block complexity aggregation and visualization:

Demographic and Health Survey (DHS) statistical analysis:

Block complexity graph visualizations:

Validation analysis:

Workflow diagram:

Details on R environment for replication

Contact

Restricted data dependencies for SDI and DHS (contact author for more info)

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 3

Packages 0

Languages

Packages