Data Quality Comparison #152

Open
gwenbeebe opened this issue Apr 8, 2021 · 3 comments

Comments

@gwenbeebe
Contributor

At 585, we're calculating the data quality flags with this chunk here:

data_quality_flags_detail <- pe_validation_summary %>%
  left_join(dq_flags_staging, by = "AltProjectName") %>%
  mutate(General_DQ = if_else(GeneralFlagTotal/ClientsServed >= .02, 1, 0),
         Benefits_DQ = if_else(BenefitsFlagTotal/AdultsEntered >= .02, 1, 0),
         Income_DQ = if_else(IncomeFlagTotal/AdultsEntered >= .02, 1, 0),
         LoTH_DQ = if_else(LoTHFlagTotal/HoHsServed >= .02, 1, 0))

but the dq_flags_staging dataframe doesn't have the flags filtered by when folks entered. If we create dq_flags_staging but restrict the benefits and income flags to just the clients entering the program in that time period, we end up with fewer programs in the detail dataframe.

I think that means that we can have entries in the numerator that aren't necessarily included in the denominator--is that what we want? It seems like we might want to restrict our flags to the entries that we're flagging as relevant to the time period for that flag type.
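To make the mismatch concrete, here is a minimal sketch on toy data. Everything in it is an assumption for illustration: the single project, the `EnrollmentID` and `EnteredInPeriod` columns, and the helper names `summarise_flags` and `score` are hypothetical and not taken from the actual script. It only shows how a flag from an enrollment outside the entry window can push a project over the 2% threshold when the denominator counts only adults who entered in the period.

```r
library(dplyr)

# Hypothetical toy data: one project with two adults entering in the period.
pe_validation_summary <- tibble(
  AltProjectName = "Project A",
  AdultsEntered = 2
)

# Enrollment 3 entered before the reporting period, so it is not counted
# in AdultsEntered, yet its income flag still exists in the flag data.
income_flags <- tibble(
  AltProjectName = "Project A",
  EnrollmentID = c(1, 2, 3),
  EnteredInPeriod = c(TRUE, TRUE, FALSE),
  IncomeFlag = c(0, 0, 1)
)

# Collapse enrollment-level flags to one total per project.
summarise_flags <- function(flags) {
  flags %>%
    group_by(AltProjectName) %>%
    summarise(IncomeFlagTotal = sum(IncomeFlag), .groups = "drop")
}

# Same join and threshold as the chunk above, reduced to the income flag.
score <- function(staging) {
  pe_validation_summary %>%
    left_join(staging, by = "AltProjectName") %>%
    mutate(Income_DQ = if_else(IncomeFlagTotal / AdultsEntered >= .02, 1, 0))
}

# Numerator includes all flags vs. only flags for entries in the period.
unfiltered <- score(summarise_flags(income_flags))
filtered <- score(summarise_flags(filter(income_flags, EnteredInPeriod)))
```

In this toy case `unfiltered` flags the project and `filtered` does not, because the only flag comes from an enrollment that the denominator never counted.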

@kiadso
Contributor

kiadso commented Apr 8, 2021

I agree with you. I think I realized this last year but didn't have time to correct it, and it felt like a low priority fix. I'm going to leave this open: if we're able to fix it this year, good; otherwise it's something we can fix for next year.

@gwenbeebe
Contributor Author

Would it cause problems elsewhere if we added entry exit IDs to our list of variables to keep in the data quality script? If not, this should be a relatively easy fix and I can knock it out!

@kiadso
Contributor

kiadso commented Jul 16, 2021

I think it should be ok to add EnrollmentID to that. Sorry I did not see this question until today!
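For reference, one possible shape of that fix, sketched under assumptions: the data frames `dq_flags_detail` and `entered_in_period` and the `BenefitsFlag` column are hypothetical stand-ins, not names from the actual script. The idea is simply to carry `EnrollmentID` through the flag data so flags can be restricted to enrollments with an entry in the period before totals are computed.

```r
library(dplyr)

# Hypothetical enrollment-level flags that keep EnrollmentID.
dq_flags_detail <- tibble(
  AltProjectName = c("Project A", "Project A", "Project B"),
  EnrollmentID = c(101, 102, 201),
  BenefitsFlag = c(1, 1, 1)
)

# Hypothetical list of enrollments that entered during the reporting period.
entered_in_period <- tibble(EnrollmentID = c(101, 201))

# Keep only flags for in-period entries, then total them per project.
benefits_flags_staging <- dq_flags_detail %>%
  semi_join(entered_in_period, by = "EnrollmentID") %>%
  group_by(AltProjectName) %>%
  summarise(BenefitsFlagTotal = sum(BenefitsFlag), .groups = "drop")
```

One thing to watch: a project whose flags are all filtered out drops out of the staging frame entirely, so after the left join its total would be `NA` rather than 0. Wrapping the totals in `coalesce(..., 0)` before dividing is one way to handle that, if treating "no remaining flags" as zero is the intended behavior.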
