Dataset Alignment

This page documents the dataset alignment process performed for this project.

Gridding

We preprocess the datasets used in this project onto a 150 m grid. The grid is generated using the Bing Maps Tile System. Its main advantage is the "quadkey" indexing scheme, which lets us determine the x/y location of a grid tile directly from its quadkey ID.
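As a minimal sketch of how a quadkey encodes tile position, the functions below convert between a quadkey string and tile x/y coordinates using the bit-interleaving scheme of the Bing Maps Tile System. The function names `quadkey_to_tile` and `tile_to_quadkey` are illustrative and not taken from this project's code.

```python
def quadkey_to_tile(quadkey: str) -> tuple[int, int, int]:
    """Decode a quadkey into (tile_x, tile_y, zoom).

    Each quadkey digit (0-3) contributes one bit to tile_x and one bit
    to tile_y, starting from the most significant bit.
    """
    tile_x = tile_y = 0
    zoom = len(quadkey)
    for i, digit in enumerate(quadkey):
        mask = 1 << (zoom - i - 1)
        if digit == "1":
            tile_x |= mask
        elif digit == "2":
            tile_y |= mask
        elif digit == "3":
            tile_x |= mask
            tile_y |= mask
        elif digit != "0":
            raise ValueError(f"invalid quadkey digit: {digit!r}")
    return tile_x, tile_y, zoom


def tile_to_quadkey(tile_x: int, tile_y: int, zoom: int) -> str:
    """Encode tile x/y at a given zoom level as a quadkey string."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


print(quadkey_to_tile("213"))    # (3, 5, 3)
print(tile_to_quadkey(3, 5, 3))  # 213
```

For reference, a tile at zoom level 18 is about 153 m wide at the equator, which is close to the 150 m grid described here. The exact zoom level used by the project is not stated on this page.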

Dataset Alignment

Each dataset, whether raster or vector, is aligned to the grid using the process below that matches its type:

Raster Data

Raster data and gridded vector data are aligned by taking the zonal statistics of the dataset (i.e. min, max, mean, median, and count) over each 150 m grid tile.
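The zonal statistics step can be illustrated with a small standard-library sketch. It assumes each raster pixel has already been assigned to a grid tile and is passed in as a hypothetical `(tile_id, value)` pair, with `None` marking nodata. In practice a geospatial library would normally handle the pixel-to-tile assignment; this only shows the aggregation.

```python
from collections import defaultdict
from statistics import mean, median


def zonal_stats(pixels):
    """Aggregate pixel values per grid tile.

    pixels: iterable of (tile_id, value) pairs; value is None for nodata.
    Returns {tile_id: {"min", "max", "mean", "median", "count"}}.
    """
    values_by_tile = defaultdict(list)
    for tile_id, value in pixels:
        if value is not None:  # skip nodata pixels
            values_by_tile[tile_id].append(value)

    return {
        tile_id: {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "median": median(values),
            "count": len(values),
        }
        for tile_id, values in values_by_tile.items()
    }


pixels = [("a", 1.0), ("a", 3.0), ("a", 2.0), ("b", 10.0), ("b", None)]
stats = zonal_stats(pixels)
print(stats["a"])
```

Tiles whose pixels are all nodata do not appear in the result, so a caller would need to decide how to fill them.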

Vector Categorical Features

For categorical vector datasets such as soil type and lithology, we assign each grid tile the value of the polygon that has the largest intersection with that tile.
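A simplified sketch of the largest-intersection rule follows. To stay self-contained it assumes every shape is an axis-aligned box `(min_x, min_y, max_x, max_y)` in projected metres; real soil or lithology polygons would need a geometry library to compute the intersection area. The names `overlap_area` and `dominant_category` are illustrative.

```python
def overlap_area(a, b):
    """Intersection area of two boxes given as (min_x, min_y, max_x, max_y)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def dominant_category(tile, polygons):
    """Return the category of the polygon with the largest overlap.

    polygons: list of (category, box) pairs.
    Returns None when no polygon intersects the tile.
    """
    best_category, best_area = None, 0.0
    for category, box in polygons:
        area = overlap_area(tile, box)
        if area > best_area:
            best_category, best_area = category, area
    return best_category


tile = (0, 0, 150, 150)
soils = [("clay", (-50, 0, 50, 150)), ("loam", (50, 0, 300, 150))]
print(dominant_category(tile, soils))  # loam
```

This picks the single polygon with the largest overlap. Summing overlap per category before choosing is a possible variant when one category is split across several polygons.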

Vector Distance Features

For some vector datasets such as rivers and roads, we take the distance from each grid tile to the nearest feature.
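The distance feature can be sketched as a point-to-polyline distance. This example assumes the distance is measured from the tile centroid, that coordinates are in a projected system in metres, and that each river or road is a list of `(x, y)` vertices. None of these details are stated on this page, and the function names are illustrative.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_nearest(point, lines):
    """Distance from point to the closest segment of any polyline."""
    return min(
        (
            point_segment_distance(point, a, b)
            for line in lines
            for a, b in zip(line, line[1:])
        ),
        default=math.inf,  # no features at all
    )


river = [(0, 0), (100, 0)]
road = [(0, 50), (0, 200)]
print(distance_to_nearest((75, 30), [river, road]))  # 30.0
```

A brute-force scan like this is quadratic in tiles times segments, so a spatial index would likely be needed at full grid scale.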

Census Data

For census data, which is provided at the block level, we disaggregate each block's values across its grid tiles in proportion to the number of households present in each tile.
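One way to read this step is as a proportional allocation, sketched below. It assumes a per-tile household count is already available for every tile in a block (the source of that count is not described here) and splits a block total by each tile's share. The name `disaggregate_block` is hypothetical.

```python
def disaggregate_block(block_total, households_by_tile):
    """Split a census block total across tiles by household share.

    households_by_tile: {tile_id: household count} for tiles in the block.
    Returns {tile_id: allocated value}; allocations sum to block_total
    unless the block has no households, in which case all are 0.0.
    """
    total_households = sum(households_by_tile.values())
    if total_households == 0:
        return {tile_id: 0.0 for tile_id in households_by_tile}
    return {
        tile_id: block_total * count / total_households
        for tile_id, count in households_by_tile.items()
    }


allocation = disaggregate_block(1000, {"t1": 10, "t2": 30, "t3": 60})
print(allocation)  # {'t1': 100.0, 't2': 300.0, 't3': 600.0}
```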

Travel Time to Healthcare Centers

For healthcare centers, we calculate isochrones from each health center, indicating the areas that can be reached within 15, 30, 45, and 60 minutes of travel time.
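This page does not say how the isochrones are computed. As a hedged illustration only, the sketch below assumes a road network represented as a hypothetical adjacency dictionary with edge weights in minutes, runs Dijkstra's algorithm from one health center, and assigns each reachable node to the smallest of the 15/30/45/60 minute bands that contains it.

```python
import heapq


def travel_times(graph, source):
    """Shortest travel time in minutes from source to every reachable node.

    graph: {node: [(neighbor, minutes), ...]}
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        minutes, node = heapq.heappop(queue)
        if minutes > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, cost in graph.get(node, []):
            candidate = minutes + cost
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


def isochrone_band(minutes, bands=(15, 30, 45, 60)):
    """Smallest band (in minutes) containing the travel time, else None."""
    for limit in bands:
        if minutes <= limit:
            return limit
    return None


graph = {
    "clinic": [("a", 10), ("b", 20)],
    "a": [("b", 5), ("c", 25)],
    "b": [("d", 40)],
}
times = travel_times(graph, "clinic")
bands = {node: isochrone_band(t) for node, t in times.items()}
print(bands)
```

With several health centers, a tile would presumably take the smallest band over all centers, which can be done by running the search once per center and keeping the minimum time.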

Lattice Features

After dataset alignment, we also calculate lattice features, which are the average feature values of the surrounding grid tiles. This allows the susceptibility model to take into account the properties of the surrounding area.
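A minimal sketch of a lattice feature is shown below. It assumes tiles are keyed by integer `(x, y)` tile coordinates, that the tile itself is excluded from the average, and that the neighborhood is a square window controlled by a hypothetical `radius` parameter. The window sizes actually used by the project are not stated here.

```python
def lattice_feature(values, radius=1):
    """Average feature value of the tiles surrounding each tile.

    values: {(x, y): feature value} keyed by tile coordinates.
    Neighbors missing from `values` (e.g. at the grid edge) are ignored.
    Returns {(x, y): mean of neighbors, or None if there are none}.
    """
    result = {}
    for (x, y) in values:
        neighbors = [
            values[(x + dx, y + dy)]
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)
            if (dx, dy) != (0, 0) and (x + dx, y + dy) in values
        ]
        result[(x, y)] = sum(neighbors) / len(neighbors) if neighbors else None
    return result


# 3x3 grid with values 0..8, laid out row by row.
grid = {(x, y): float(x + 3 * y) for x in range(3) for y in range(3)}
lattice = lattice_feature(grid)
print(lattice[(1, 1)])  # 4.0
```

Because the tile x/y coordinates can be derived from the quadkey, neighbor lookup reduces to simple integer offsets.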

(Figures: lattice1, lattice2)
