Values of dataframe in psid_data_setup.py #20

Closed
prrathi opened this issue Feb 6, 2021 · 7 comments · Fixed by #24

Comments

@prrathi
Contributor

prrathi commented Feb 6, 2021

The final dataframe panel_li of psid_data_setup.py seems to have issues with the values in its columns. A couple that I noticed while working with some of them (through the pickle file that the code stores) were:

  • many of the values in column num_children were null
  • the values in column spouse_age were 0

I suspect this is the case for several other columns as well. These columns also weren't manipulated in psid_data_setup.py, so @MaxGhenis suggested looking back at the psidR package.
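
A quick way to see which columns are affected is to compute the share of null and zero values per column. This is a minimal sketch, assuming the pickle loads into a pandas DataFrame. The toy frame below is made-up data standing in for panel_li, and `summarize_missing` is a hypothetical helper name, not something in the repo:

```python
import pandas as pd


def summarize_missing(df):
    """Return the share of null and of zero values in each column."""
    return pd.DataFrame({
        "share_null": df.isna().mean(),
        "share_zero": df.eq(0).mean(),
    })


# Made-up rows standing in for panel_li (not real PSID values).
toy = pd.DataFrame({
    "year": [2005, 2005, 2007, 2007],
    "num_children": [None, None, 2, 0],
    "spouse_age": [0, 41, 0, 38],
})

summary = summarize_missing(toy)
print(summary)
```

On the real data the same helper could be applied to the frame returned by `pd.read_pickle` on the pickle file that psid_data_setup.py stores.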

@prrathi
Contributor Author

prrathi commented Feb 7, 2021

Just experimented more with the variables: a lot of the columns were only used for specific years or have been discontinued.
In the example I gave, num_children was only used in 2007, so it's null for the majority of instances. Another example of this is fam_smpl_wgt_core, which is discontinued after 1992. Also, spouse_age isn't an issue; it's intended to be 0, sorry about that.
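
For reference, the per-year coverage of a column can be checked roughly like this. It is only a sketch: the `year` column name and the helper `years_with_data` are assumptions for illustration, and the rows are invented to mimic the pattern described above rather than taken from the PSID:

```python
import pandas as pd


def years_with_data(df, column, year_col="year"):
    """List the years in which `column` has at least one non-null value."""
    has_data = df[column].notna().groupby(df[year_col]).any()
    return sorted(has_data[has_data].index.tolist())


# Invented rows: num_children only filled in 2007, the weight only in 1992.
toy = pd.DataFrame({
    "year": [1992, 1992, 2005, 2007, 2007],
    "num_children": [None, None, None, 2, 0],
    "fam_smpl_wgt_core": [10.5, 12.0, None, None, None],
})

print(years_with_data(toy, "num_children"))
print(years_with_data(toy, "fam_smpl_wgt_core"))
```

A column that comes back with only one or two years is a candidate for being a year-specific or discontinued variable rather than a data error.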

@jdebacker
Member

@prrathi Let me know if anything here is a major issue. Note that in the psid_download.R script, I'm pulling some variables we may not currently be using, just in case we need them for future calibration efforts. I was also liberal in pulling variables related to sampling weights, since I am not a PSID expert and wasn't sure which would be most useful in different contexts.

@prrathi
Contributor Author

prrathi commented Feb 7, 2021

@jdebacker I believe '# of Individual Records', whose most recent name is ER52168, would be a beneficial addition. It has data since 1981 on the total number of individuals in each household, which I don't think is part of the variables currently used.

@MaxGhenis
Contributor

> In the example I gave, num_children was only used in 2007, so it's null for the majority of instances. Another example of this is fam_smpl_wgt_core which is discontinued after 1992.

So there's no associated variable for num_children in other years in the crosswalk spreadsheet?

> Also, the spouse_age isn't an issue it's intended to be 0 sorry about that.

Cool, so it's zero for unmarried people and nonzero for married people?

@jdebacker
Member

@prrathi That probably is a useful variable to pull. Please feel free to open a PR to add that (or any other variables you think would be useful).

@prrathi
Contributor Author

prrathi commented Feb 7, 2021

> So there's no associated variable for num_children in other years in the crosswalk spreadsheet?

That specific variable is for children aged 30 and under, and it has no associated variables in other years. num_children_under18 is present across the years.

> Cool, so it's zero for unmarried people and nonzero for married people?

yes
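
If anyone wants to confirm this on the data, a check along these lines should work. It is a sketch under assumptions: `married` is a hypothetical indicator column (the actual marital status variable in the panel may be named and coded differently), `zero_age_matches_unmarried` is an illustrative helper, and the rows are invented:

```python
import pandas as pd


def zero_age_matches_unmarried(df, age_col="spouse_age", married_col="married"):
    """True if spouse_age is 0 exactly on the rows flagged as unmarried."""
    is_zero = df[age_col] == 0
    unmarried = ~df[married_col].astype(bool)
    return bool((is_zero == unmarried).all())


# Invented rows; `married` is a hypothetical column name.
toy = pd.DataFrame({
    "married": [False, True, False, True],
    "spouse_age": [0, 41, 0, 38],
})
consistent = zero_age_matches_unmarried(toy)
print(consistent)
```

A False result would point to rows where spouse_age is 0 for a married respondent or nonzero for an unmarried one, which would be worth inspecting by hand.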

@MaxGhenis
Contributor

Ah, I see I found that about num_children in #6 (comment). Removing num_children from the data pull probably makes sense in that case.


3 participants