This repository contains the cleaned data generated from the latest Household Consumption and Expenditure Survey (HCES), conducted by the NSS. The raw microdata is publicly accessible and can be downloaded here.
The repository consists of the following folders:
- `01_docs`: All the survey documentation, including the questionnaire, file structure, caveats, state codes etc.
- `02_code`: Stata do-files which generate the clean data files.
- `03_raw`: All the raw files, taken from the NSS site. The user is supposed to create this folder.
- `04_clean`: All the level-wise cleaned files, in Parquet format.
The folder `02_code` has the following files:
- `0_master.do`: Sets the root directory, folder names etc.
- `1_clean.do`: Extracts all variables from the raw txt files, level by level, and does preliminary cleaning.
- `2_mpce.do`: Code to calculate and replicate sector-wise and fractile-wise MPCE.
- `convert_csv.ipynb`: Python code to convert csv to Parquet and back.
To generate the files in `04_clean`, do the following:
- Download and unzip the repo.
- Create a folder called `03_raw` and add the raw txt files from the NSS site, available here. You should get 15 .txt files.
- Open the Stata project `hces.stpr`. From within the project, first run `0_master`, followed by `1_clean`.
- This creates Stata and csv files for each level.
- [OPTIONAL] If you want Parquet files, run the `convert_csv` file (change the directory as needed).
- [FOR NON-STATA USERS] To convert the Parquet files to csv, open the `convert_csv` file and run the section which converts Parquet to csv.
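Before running `1_clean`, it can help to confirm that the raw folder is complete. The sketch below is not part of the repository: the function names are hypothetical, and the folder path is whatever name `0_master` expects on your machine. It only counts the `.txt` files in that folder and compares the count with the 15 files mentioned in the steps above.

```python
from pathlib import Path

EXPECTED_TXT_FILES = 15  # count stated in the setup steps


def list_raw_txt_files(raw_dir):
    """Return the sorted .txt files found directly inside raw_dir."""
    folder = Path(raw_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Raw folder not found: {folder}")
    # Compare the extension case-insensitively, since file-name case may vary.
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".txt"
    )


def raw_folder_is_complete(raw_dir, expected=EXPECTED_TXT_FILES):
    """True when raw_dir holds exactly the expected number of .txt files."""
    return len(list_raw_txt_files(raw_dir)) == expected
```

For example, calling `raw_folder_is_complete("path/to/raw/folder")` returns `False` if a file was missed during download or unzipping, and `list_raw_txt_files` shows which files are actually present.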
To replicate the MPCE numbers by sector and fractile from the report, follow the steps below:
- Open `hces.stpr`.
- Run the whole `2_mpce.do` file after running `0_master`, or run it in parts:
  - Run till line 160 to get MPCE by rural and urban sectors.
  - Run till line 223 to get MPCE by rural/urban for various fractiles.

These match the report numbers up to 2 decimal places.
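For readers working from the csv or Parquet files instead of Stata, the sector-wise averages can be sketched as a weighted mean. The snippet below is an illustration, not a port of `2_mpce.do`. The field names (`sector`, `mpce`, `hh_size`, `weight`) are hypothetical. Weighting each household's MPCE by its survey weight times its household size is an assumption about the estimator, so check it against the do-file and the survey documentation before relying on it.

```python
from collections import defaultdict


def weighted_mpce_by_sector(households):
    """Population-weighted mean MPCE per sector.

    Each household is a dict with hypothetical keys:
    sector, mpce, hh_size, weight.
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for hh in households:
        # Assumed estimator: persons represented = survey weight x household size.
        persons = hh["weight"] * hh["hh_size"]
        numerator[hh["sector"]] += persons * hh["mpce"]
        denominator[hh["sector"]] += persons
    # Skip sectors whose total weight is zero to avoid dividing by zero.
    return {s: numerator[s] / denominator[s]
            for s in numerator if denominator[s] > 0}


# Toy records, made up purely for illustration (not survey data).
sample = [
    {"sector": "rural", "mpce": 3000.0, "hh_size": 5, "weight": 100.0},
    {"sector": "rural", "mpce": 5000.0, "hh_size": 3, "weight": 100.0},
    {"sector": "urban", "mpce": 6000.0, "hh_size": 4, "weight": 50.0},
]
print(weighted_mpce_by_sector(sample))
```

With the toy records, the rural figure is (500 × 3000 + 300 × 5000) / 800 = 3750, which shows how larger households pull the average towards their own MPCE.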