Butch Landingin edited this page Dec 15, 2022 · 1 revision

DHS Notes

File formats:

  • *.DO files are dictionaries that translate the original column names in the Stata file (.DTA) into human-understandable column names.
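A minimal sketch of how such a dictionary could be applied. It assumes the .DO file holds Stata `label variable NAME "LABEL"` statements, which these notes do not confirm. The function name `parse_do_labels` and the sample text are hypothetical.

```python
import re

# Assumed line shape (not confirmed by these notes):
#   label variable hv001 "Cluster number"
LABEL_RE = re.compile(r'^\s*label\s+variable\s+(\S+)\s+"(.*)"\s*$')


def parse_do_labels(do_text):
    """Return {raw column name: human-readable label} from .DO file text."""
    labels = {}
    for line in do_text.splitlines():
        match = LABEL_RE.match(line)
        if match:
            labels[match.group(1)] = match.group(2)
    return labels


# Hypothetical sample content, for illustration only.
sample_do = '''
label variable hhid "Case Identification"
label variable hv001 "Cluster number"
'''

labels = parse_do_labels(sample_do)
raw_columns = ["hhid", "hv001", "hv999"]
# Columns without a label keep their raw name.
readable = [labels.get(col, col) for col in raw_columns]
print(readable)
```

Falling back to the raw name keeps every column addressable even when the dictionary is incomplete.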

Reading DHS '.DTA' files

  • povertymapping.utils.data_utils.get_base_dhs_df and geowrangler.dhs.load_dhs_file do exactly the same thing, except that the geowrangler version returns column names in lower case.

  • There are a lot of duplicate column names in the loaded data frame.

    • For TL (TLHR17FL.DO and TLHR17FL.DTA) there are 3658 columns, of which 316 are unique. (Note: the data utils version, which doesn't lower-case the columns, has 317 unique columns, since two column names, Index to Household Schedule and Index to household schedule, differ only in case and map to the same lower-case column name.)
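The counts above can be checked with a short sketch like the one below. The helper names `summarize_columns` and `case_collisions` are hypothetical, and the column list is a toy example: only the two "Index to ..." names come from these notes. With a loaded data frame, one would pass `list(df.columns)` instead.

```python
def summarize_columns(columns):
    """Return (total, unique, unique after lower-casing) column counts."""
    total = len(columns)
    unique = len(set(columns))
    unique_lower = len({name.lower() for name in columns})
    return total, unique, unique_lower


def case_collisions(columns):
    """Map each lower-cased name to the distinct spellings that share it."""
    groups = {}
    for name in set(columns):
        groups.setdefault(name.lower(), set()).add(name)
    # Keep only names that have more than one distinct spelling.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Toy data for illustration; "Cluster number" is a made-up duplicate.
columns = [
    "Index to Household Schedule",
    "Index to household schedule",
    "Cluster number",
    "Cluster number",
]

print(summarize_columns(columns))  # (4, 3, 2)
print(case_collisions(columns))
```

The gap between the second and third counts is the number of names lost to case folding, which is the 317 versus 316 difference noted above.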