# DHS Notes

## File formats

* `*.DO` files are dictionaries that map the original column names in the Stata file (`*.DTA`) to human-readable column names.

## Reading DHS `.DTA` files

* `povertymapping.utils.data_utils.get_base_dhs_df` and `geowrangler.dhs.load_dhs_file` do exactly the same thing, except that geowrangler returns the column names in lower case.
* The loaded data frame contains many duplicate column names.
  - For TL (`TLHR17FL.DO` and `TLHR17FL.DTA`), there are `3658` columns, of which `316` are unique.
  - The data utils version, which does not lower-case the column names, has `317` unique columns. Two of its column names, `Index to Household Schedule` and `Index to household schedule`, differ only in case, so they collapse into a single name once lower-cased.
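The total, unique, and case-insensitive column counts above can be checked with a small pandas helper. This is a minimal sketch, not part of `povertymapping` or `geowrangler`. `summarize_column_names` is a hypothetical name, and the toy DataFrame is an assumption that stands in for a frame returned by either loader.

```python
from collections import Counter

import pandas as pd


def summarize_column_names(df: pd.DataFrame) -> dict:
    """Count total, unique, and case-insensitive-unique column names (hypothetical helper)."""
    cols = [str(c) for c in df.columns]

    # Group the original spellings by their lower-cased form.
    lower_groups: dict[str, set[str]] = {}
    for c in cols:
        lower_groups.setdefault(c.lower(), set()).add(c)

    # Lower-cased names that come from more than one distinct original spelling.
    case_collisions = {k: sorted(v) for k, v in lower_groups.items() if len(v) > 1}

    # Exact (case-sensitive) names that appear more than once.
    duplicated = {name: n for name, n in Counter(cols).items() if n > 1}

    return {
        "total": len(cols),
        "unique": len(set(cols)),
        "unique_lower": len(lower_groups),
        "duplicated": duplicated,
        "case_collisions": case_collisions,
    }


# Toy frame (assumption): one case-only collision plus one exact duplicate.
toy = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=[
        "Index to Household Schedule",
        "Index to household schedule",
        "Cluster number",
        "Cluster number",
    ],
)
summary = summarize_column_names(toy)
print(summary)
```

On a real loaded frame, the gap between `unique` and `unique_lower` should match the one-column difference noted above for TL, and `case_collisions` should list the `Index to Household Schedule` pair.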