add tutorial about how to merge cps basic, asec, supplements #216

Open
ajdamico opened this issue Mar 25, 2017 · 2 comments

@ajdamico
Owner

this merge does not work perfectly in 2016 because of the parallel sample, but you can proportionally upweight everyone who does match
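One possible reading of "proportionally upweight" is a single flat ratio adjustment: divide the weighted total of the basic file by the weighted total of the records that merged, then multiply every merged weight by that ratio. The sketch below uses toy data with simplified keys. The `pwsswgt_adj` column name is hypothetical, and the approach assumes the unmatched records are missing roughly at random, which may not hold for every subgroup.

```r
# toy stand-ins for the march basic file and the asec (simplified keys, made-up values)
basic <- data.frame(
	hrhhid = c( "A" , "A" , "B" , "C" ) ,
	pulineno = c( 1 , 2 , 1 , 1 ) ,
	pwsswgt = c( 1000 , 1500 , 2000 , 500 )
)

# pretend household "C" was routed to the parallel sample and is absent from the asec
asec <- data.frame(
	hrhhid = c( "A" , "A" , "B" ) ,
	pulineno = c( 1 , 2 , 1 ) ,
	insured = c( TRUE , FALSE , TRUE )
)

x <- merge( asec , basic , by = c( "hrhhid" , "pulineno" ) )

# one flat ratio: weighted total in the basic file over weighted total among merged records
adjustment <- sum( basic[ basic$pwsswgt > 0 , "pwsswgt" ] ) / sum( x[ x$pwsswgt > 0 , "pwsswgt" ] )

# hypothetical column name for the rescaled weight
x$pwsswgt_adj <- x$pwsswgt * adjustment

# the rescaled weights sum back to the basic file's weighted total
sum( x$pwsswgt_adj )
```

A finer version could compute the ratio within cells (for example by state or age group) rather than once for the whole file, at the cost of noisier adjustments in small cells.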

Hi CPS team, I am trying to merge health insurance information from the March CPS-ASEC 2011-2016 files onto the March Monthly Basic CPS files of the same years. When I match on the ASEC variables "h_idnum1" , "h_idnum2" , "a_lineno" with the CPS Basic variables "hrhhid" , "hrhhid2" , "pulineno" I match 100% of the CPS Basic records with non-zero final weight ( PWSSWGT > 0 ) for 2011-2015. However, in 2016, I only match 93% of records. I am finding 8,606 person-records in the 2016 March CPS Basic file with ( PWSSWGT > 0 ) who do not have a matching record in the CPS-ASEC March 2016.

Has something changed in the structure of the 2016 microdata that would prevent this merge from matching as of 2016?

Some notes about what I've attempted:

When I perform this merge for 2011, 2012, and 2013, I precisely match the 132,275 // 131,372 // 130,534 statistics at the bottom of PDF page 25 of the IPUMS-CPS working paper on the topic of merging the ASEC with the Basic files. https://cps.ipums.org/cps/resources/linking/4.workingpaper16.pdf#page=25

I checked the CPS-ASEC technical documentation's section on this merge on PDF page 14, but it didn't mention anything specific about 2016 (only 2014, which merged cleanly). http://www2.census.gov/programs-surveys/cps/techdocs/cpsmar16.pdf#page=14

Here's some pseudo-code to outline the straightforward merge that I'm attempting:

for( year in 11:16 ){

	# each .rda file is assumed to hold one data.frame, named `basic` or `asec`
	load( paste0( "C:/Path/To/CPS/20" , year , "_03_cps_basic.rda" ) )

	load( paste0( "C:/Path/To/CPS/20" , year , "_03_cps_asc.rda" ) )

	# merge across the two data sets
	x <- merge( asec , basic , by.x = c( "h_idnum1" , "h_idnum2" , "a_lineno" ) , by.y = c( "hrhhid" , "hrhhid2" , "pulineno" ) )
	
	# in 2011, 2012, 2013, this number exactly matches the ipums count in their working paper
	print( nrow( x ) )

	# in 2011-2015, this test/check succeeds.
	# in 2016, this test fails.
	# the merged file contains 117,596 records with nonzero weight, compared to 126,202 in the basic file
	stopifnot( nrow( subset( x , pwsswgt > 0 ) ) == nrow( subset( basic , pwsswgt > 0 ) ) )

}

Thanks!

@ajdamico ajdamico added this to the v0.2.0 milestone Mar 25, 2017
@ajdamico ajdamico self-assigned this Mar 25, 2017
@raheem03

Had the same problem and stumbled on this thread while Googling. Flood and Pacas (IPUMS) have written about this problem and it looks like it is yet another consequence of Census messing up the 2014 redesign (the gift that keeps on giving). See Section 2d of this link:

"To later evaluate the effects of the redesigned health insurance questions, the Census Bureau used a split-path assignment to randomly select about 6,000 households from the 2016 and 2017 March Basic Monthly sample to answer the complete pre-2014 ASEC questionnaire [17]. As a result, we cannot locate a subset of March Basic respondents in the ASEC for 2016 and 2017 (see Table 3)."

Table 3 from that link suggests that 8,638 observations will not match for March 2016 (a count I was able to reproduce).
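To count the basic records that have no ASEC partner, one option is a left merge that keeps every basic record and flags the ones that found a match. This is a minimal sketch on toy data: the key names follow the original post, while the `in_asec` flag column and all values are made up for illustration.

```r
# toy basic file: five person-records, one with a zero final weight
basic <- data.frame(
	hrhhid = c( "001" , "001" , "002" , "003" , "004" ) ,
	hrhhid2 = c( "11" , "11" , "11" , "11" , "11" ) ,
	pulineno = c( 1 , 2 , 1 , 1 , 1 ) ,
	pwsswgt = c( 1200 , 900 , 0 , 1500 , 700 )
)

# toy asec file: households "003" and "004" are missing
asec <- data.frame(
	h_idnum1 = c( "001" , "001" , "002" ) ,
	h_idnum2 = c( "11" , "11" , "11" ) ,
	a_lineno = c( 1 , 2 , 1 ) ,
	in_asec = TRUE
)

# all.x = TRUE keeps every basic record, whether or not the asec has a partner for it
y <- merge( basic , asec , by.x = c( "hrhhid" , "hrhhid2" , "pulineno" ) , by.y = c( "h_idnum1" , "h_idnum2" , "a_lineno" ) , all.x = TRUE )

# basic records with a nonzero final weight and no asec partner
unmatched <- subset( y , pwsswgt > 0 & is.na( in_asec ) )

nrow( unmatched )

# share of nonzero-weight basic records that did match
match_rate <- mean( !is.na( subset( y , pwsswgt > 0 )$in_asec ) )

match_rate
```

Keeping the unmatched rows in hand, rather than only counting them, makes it possible to compare their characteristics against the matched records.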

@ajdamico
Owner Author

the 2019 merge is slightly different (and i haven't tested the research or bridge files)

asec19 <- readRDS( paste0( tempdir() , "/2019 cps asec.rds" ) )
basic19 <- readRDS( paste0( tempdir() , "/2019 03 cps basic.rds" ) )

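# build a single 20-character household id on the basic file:
# hrhhid zero-padded to 15 characters, then hrhhid2 zero-padded to 5
basic19[ , 'newid' ] <- 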
	paste0( 
		stringr::str_pad( basic19[ , 'hrhhid' ] , 15 , pad = '0' ) ,
		stringr::str_pad( basic19[ , 'hrhhid2' ] , 5 , pad = '0' )
	)

x19 <- merge( asec19 , basic19 , by.x = c( 'h_idnum' , 'a_lineno' ) , by.y = c( 'newid' , 'pulineno' ) )

nrow( x19 ) # about 118k
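Before merging on a constructed key like this, it may be worth confirming that every key has the expected width and that the person-level keys are unique. The sketch below does the padding in base R on toy data. `pad_left` is a hypothetical helper, not part of any package, and it assumes the ids are stored as character, since long numeric ids can be converted to scientific notation by `as.character()`.

```r
# hypothetical helper: left-pad character ids with zeros using base R only
pad_left <- function( x , width ){
	x <- as.character( x )
	paste0( strrep( "0" , pmax( 0 , width - nchar( x ) ) ) , x )
}

# toy stand-in for the 2019 march basic file (made-up ids)
basic19 <- data.frame(
	hrhhid = c( "123456789012345" , "4567" ) ,
	hrhhid2 = c( "11011" , "9011" ) ,
	pulineno = c( 1 , 1 )
)

basic19[ , 'newid' ] <- paste0( pad_left( basic19[ , 'hrhhid' ] , 15 ) , pad_left( basic19[ , 'hrhhid2' ] , 5 ) )

# every constructed key should be exactly 20 characters wide
stopifnot( all( nchar( basic19[ , 'newid' ] ) == 20 ) )

# person-level keys should be unique, otherwise merge() multiplies rows
stopifnot( !any( duplicated( basic19[ , c( 'newid' , 'pulineno' ) ] ) ) )

basic19[ , 'newid' ]
```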
