Dataset Alignment

JC Nacpil edited this page Jun 5, 2024 · 5 revisions

This page documents the dataset alignment process performed for this project.

Gridding

We preprocess the datasets used in this project onto a 150m grid. The grid is generated using the Bing Maps Tile System. The main advantage of Bing tiles is their "quadkeys", an indexing scheme that lets us recover the x/y position (and zoom level) of a grid tile directly from its quadkey ID.
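As a minimal sketch of how a quadkey encodes tile position, the functions below follow the published Bing Maps Tile System scheme, in which each quadkey digit interleaves one bit of the tile x and y coordinates. The function names are illustrative and not taken from this project's code, and the zoom level used by the project is not assumed here.

```python
def quadkey_to_tile(quadkey):
    """Decode a quadkey string into (tile_x, tile_y, zoom)."""
    tile_x = tile_y = 0
    zoom = len(quadkey)  # one digit per zoom level
    for i, digit in enumerate(quadkey):
        mask = 1 << (zoom - i - 1)
        if digit == "1":
            tile_x |= mask
        elif digit == "2":
            tile_y |= mask
        elif digit == "3":
            tile_x |= mask
            tile_y |= mask
        elif digit != "0":
            raise ValueError(f"invalid quadkey digit: {digit!r}")
    return tile_x, tile_y, zoom


def tile_to_quadkey(tile_x, tile_y, zoom):
    """Encode tile coordinates at a zoom level as a quadkey string."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


# Tile (3, 5) at zoom 3 is the worked example in the Bing documentation.
example_tile = quadkey_to_tile("213")
example_key = tile_to_quadkey(3, 5, 3)
```

Because neighbouring tiles differ by one in x or y, decoding a quadkey, shifting the coordinates, and re-encoding is enough to find a tile's neighbours.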

Dataset Alignment

Each dataset, which can be raster or vector, is aligned to the grid using one of the following processes:

Raster Data

Raster data and gridded vector data are aligned by taking the zonal statistics of the dataset (i.e. min, max, mean, median, and count) over each 150m grid tile.
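The sketch below illustrates the per-tile summary in plain Python. It assumes the raster pixels have already been grouped by grid tile (the `pixels_by_tile` mapping is hypothetical) and that nodata pixels are marked with `None`; in practice a geospatial library would usually handle the pixel-to-tile assignment.

```python
import statistics


def zonal_stats(values):
    """Summarise the raster pixel values that fall inside one grid tile."""
    valid = [v for v in values if v is not None]  # drop nodata pixels
    if not valid:
        return {"min": None, "max": None, "mean": None, "median": None, "count": 0}
    return {
        "min": min(valid),
        "max": max(valid),
        "mean": statistics.fmean(valid),
        "median": statistics.median(valid),
        "count": len(valid),
    }


# Hypothetical pixel values keyed by tile quadkey.
pixels_by_tile = {"213": [1.0, 2.0, 4.0, None], "212": []}
stats_by_tile = {key: zonal_stats(vals) for key, vals in pixels_by_tile.items()}
```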

Vector Categorical Features

For categorical vector datasets such as soil type and lithology, we assign each grid tile the value of the polygon that has the largest intersection with that tile.
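To illustrate the largest-intersection rule, here is a simplified sketch that represents both the grid tile and the polygons as axis-aligned boxes `(minx, miny, maxx, maxy)`. This is an assumption made to keep the example self-contained; real soil or lithology polygons would need a proper geometry library for the intersection step, and the function names are illustrative.

```python
def box_intersection_area(a, b):
    """Overlap area of two axis-aligned boxes given as (minx, miny, maxx, maxy)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def dominant_category(tile, polygons):
    """Return the category of the polygon overlapping the tile most, or None."""
    best_category, best_area = None, 0.0
    for bounds, category in polygons:
        area = box_intersection_area(tile, bounds)
        if area > best_area:
            best_category, best_area = category, area
    return best_category


tile = (0.0, 0.0, 150.0, 150.0)
soil_polygons = [
    ((-100.0, -100.0, 50.0, 200.0), "clay"),  # overlaps a 50 x 150 strip
    ((50.0, -100.0, 300.0, 200.0), "loam"),   # overlaps a 100 x 150 strip
]
tile_soil = dominant_category(tile, soil_polygons)
```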

Vector Distance Features

For some vector datasets such as rivers and roads, we take the distance from each grid tile to the nearest feature.
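A minimal sketch of the distance calculation follows. It assumes distances are measured from the tile centroid, that line features are lists of vertices, and that all coordinates are in a projected system with metre units; none of these details are confirmed by this page.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_nearest(point, lines):
    """Minimum distance from a point to any segment of any line feature."""
    return min(
        point_segment_distance(point, line[i], line[i + 1])
        for line in lines
        for i in range(len(line) - 1)
    )


# Hypothetical road geometries and tile centroid.
roads = [[(0.0, 0.0), (100.0, 0.0)], [(0.0, 50.0), (0.0, 200.0)]]
centroid = (30.0, 40.0)
nearest_road_m = distance_to_nearest(centroid, roads)
```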

Census Data

For census data, which is given in block form, we disaggregate each block's values across the grid tiles according to the number of households present in each tile.
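One way to read this step is as a proportional allocation: each tile receives a share of the block total equal to its share of the block's households. The sketch below shows that interpretation; the proportional weighting, the function name, and the sample numbers are assumptions for illustration.

```python
def disaggregate_block(block_value, households_by_tile):
    """Split a census block total across tiles in proportion to household counts."""
    total_households = sum(households_by_tile.values())
    if total_households == 0:
        # No households detected in the block: nothing to allocate.
        return {tile: 0.0 for tile in households_by_tile}
    return {
        tile: block_value * count / total_households
        for tile, count in households_by_tile.items()
    }


# Hypothetical block with a population of 200 spread over three tiles.
households = {"tile_a": 10, "tile_b": 30, "tile_c": 0}
population = disaggregate_block(200, households)
```

With this weighting the tile values always sum back to the block total, so aggregate census figures are preserved.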

Travel time to healthcare centers

For healthcare centers, we calculate isochrones from each health center, indicating the areas that can be reached within 15, 30, 45, and 60 minutes of travel time.
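Computing the isochrones themselves requires a routing engine, which is outside the scope of a short example. The sketch below only shows how a travel time could be mapped to the smallest of the four bands that contains it; how the project actually converts isochrones into grid features is not stated here, so treat this as an assumption.

```python
from bisect import bisect_left

BANDS_MIN = [15, 30, 45, 60]  # isochrone thresholds in minutes


def travel_time_band(minutes):
    """Return the smallest band (in minutes) containing the travel time.

    Returns None when the location is more than 60 minutes away.
    """
    index = bisect_left(BANDS_MIN, minutes)
    return BANDS_MIN[index] if index < len(BANDS_MIN) else None


# Hypothetical travel times from tiles to their nearest health center.
bands = [travel_time_band(t) for t in (5, 15, 16, 60, 61)]
```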

Lattice features

After dataset alignment, we also calculate lattice features, which are the average feature values of the grid tiles surrounding each tile. This allows the susceptibility model to take into account the properties of the surrounding area.
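A minimal sketch of a lattice feature is shown below. It assumes the neighbourhood is the eight tiles immediately around the target tile, that the target tile itself is excluded from the average, and that tiles are keyed by their integer (x, y) tile coordinates; the actual neighbourhood size used by the project may differ.

```python
def lattice_feature(values, x, y):
    """Average a feature over the up-to-eight tiles surrounding tile (x, y)."""
    neighbours = [
        values[(x + dx, y + dy)]
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0) and (x + dx, y + dy) in values
    ]
    # Tiles on the edge of the grid simply have fewer neighbours.
    return sum(neighbours) / len(neighbours) if neighbours else None


# Hypothetical 3 x 3 grid of feature values 0.0 to 8.0.
grid = {(x, y): float(x + 3 * y) for x in range(3) for y in range(3)}
centre_lattice = lattice_feature(grid, 1, 1)
corner_lattice = lattice_feature(grid, 0, 0)
```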
