hindi-fuzzy-merge

This repository contains customizable fuzzy-matching scripts written in STATA and Python. They are especially useful for datasets containing Hindi text transliterated into English.

Overview

This algorithm is motivated by the fact that Hindi names written in Devanagari script are not transliterated consistently into Latin script. For example, the same name may appear as "Geeta" in one dataset and "Gita" in another. Fuzzy-matching programs exist, but most are optimized for text originally written in Latin script, so they perform poorly on transliterated Hindi names.
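To make the problem concrete, here is a minimal Python sketch of one way to make a generic string comparison more tolerant of transliteration variants: rewrite common spelling variants before scoring. The rules in `VARIANT_RULES` and the `normalize` and `similarity` helpers are hypothetical illustrations, not the rules used in this repository. `difflib.SequenceMatcher` from the Python standard library stands in for whatever fuzzy-matching score a real pipeline would use, and the sketch assumes names have already been romanized.

```python
import re
from difflib import SequenceMatcher

# Hypothetical rewrite rules for common romanization variants of Hindi names.
# They are applied in order to a lowercased name (not this repository's rules).
VARIANT_RULES = [
    (r"aa", "a"),           # long "a" written doubled: "Raam" -> "ram"
    (r"ee", "i"),           # long "i" written as "ee": "Geeta" -> "gita"
    (r"oo", "u"),           # long "u" written as "oo": "Pooja" -> "puja"
    (r"w", "v"),            # "w"/"v" for the same consonant: "Wikas" -> "vikas"
    (r"ph", "f"),           # "ph"/"f": "Phool" -> "ful"
    (r"([a-z])\1", r"\1"),  # collapse any remaining doubled letter: "Pappu" -> "papu"
]


def normalize(name: str) -> str:
    """Lowercase, drop characters other than a-z and spaces, apply the variant rules."""
    s = re.sub(r"[^a-z ]", "", name.lower())
    for pattern, replacement in VARIANT_RULES:
        s = re.sub(pattern, replacement, s)
    return " ".join(s.split())  # collapse repeated whitespace


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


if __name__ == "__main__":
    pairs = [("Geeta", "Gita"), ("Pooja", "Puja"), ("Wikas", "Vikaas")]
    for a, b in pairs:
        raw = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        print(f"{a} vs {b}: raw={raw:.2f}, normalized={similarity(a, b):.2f}")
```

Rules like these trade precision for recall: collapsing variants also merges some genuinely different names, which is one reason to apply looser comparisons only after stricter ones.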

We also found that match rates improved substantially with a stepwise approach: start with the most certain matches and progressively loosen the restrictions. This matters because false matches in fuzzy-matching algorithms propagate. An early false match incorrectly removes an individual from the match pool, which leads the algorithm to make further false matches with other individuals in later steps.

By completing more certain matches before moving on to less certain ones, our stepwise algorithm produced lower false match rates than running a fuzzy-matching program a single time.
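The stepwise idea can be sketched in Python as a series of passes with decreasing similarity thresholds. This is a minimal illustration under stated assumptions, not the implementation in `fuzzymerge-python` or `fuzzymerge-stata`: the `stepwise_match` function, its threshold values, and the choice to skip ties and contested candidates are all hypothetical, and `difflib.SequenceMatcher` stands in for the actual matching score. Each pass accepts only one-to-one matches above its threshold and removes matched records from both pools before the next, looser pass, so an uncertain pass can never claim a record that a stricter pass already matched.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in for a real fuzzy score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def stepwise_match(left, right, thresholds=(1.0, 0.8, 0.7)):
    """Hypothetical stepwise matcher.

    left, right: dicts mapping record id -> name.
    thresholds: similarity cut-offs, strictest first.
    Returns a dict mapping left id -> (right id, threshold of the accepting pass).
    """
    pool_left, pool_right = dict(left), dict(right)
    matches = {}
    for threshold in thresholds:
        proposals = {}
        for lid, lname in pool_left.items():
            # Score every remaining right record, best first, then keep those
            # that clear this pass's threshold.
            scored = sorted(
                ((similarity(lname, rname), rid) for rid, rname in pool_right.items()),
                reverse=True,
            )
            scored = [(s, rid) for s, rid in scored if s >= threshold]
            if not scored:
                continue
            # Skip ambiguous cases: two candidates tied for the best score.
            if len(scored) > 1 and scored[0][0] == scored[1][0]:
                continue
            proposals[lid] = scored[0][1]
        # Accept only one-to-one proposals; a right record claimed by several
        # left records is left unmatched in this pass.
        claims = Counter(proposals.values())
        for lid, rid in proposals.items():
            if claims[rid] == 1:
                matches[lid] = (rid, threshold)
                del pool_left[lid]
                del pool_right[rid]
    return matches


if __name__ == "__main__":
    # Toy data for illustration only.
    household = {"h1": "Geeta Devi", "h2": "Ramesh Kumar", "h3": "Sunita"}
    school = {"s1": "Gita Devi", "s2": "Ramesh Kumar", "s3": "Suneeta"}
    for lid, (rid, t) in stepwise_match(household, school).items():
        print(lid, "->", rid, "at threshold", t)
```

In practice each pass could also tighten or relax restrictions on other fields (for example, requiring the same village or household), which this sketch omits.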

Directory Structure

.
|-- hindi-fuzzy-merge
     |-- fuzzymerge-python # Directory with an example of the algorithm implemented in Python for matching household survey results with data collected from school registers
     |-- fuzzymerge-stata # Directory with an example of the algorithm implemented in STATA for matching household census data with voter rolls
     |-- transliteration # Directory with example code for transliteration of Devanagari script into English using the Polyglot Python package

Repository: IDinsight/hindi-fuzzy-merge

Folders and files

Latest commit

History

Repository files navigation

hindi-fuzzy-merge

Overview

Directory Structure

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages