quality_data: overview: Fix incorrect field summing passing grants #247

michaelwood · 2024-12-13T17:47:33Z

When calculating the overview stats for /api/dashboard/overview?mode=grants, the file-level pass/fail logic was being used to count the number of grants which passed: it simply used the total number of grants in the source file.

This is only correct some of the time. It so happens that a publisher who is missing a field (such as company number) will usually miss it out of all of their datasets, so the calculation is coincidentally correct. Some publishers, however, do add such fields for some of their grants, which makes this calculation completely incorrect for certain metrics.

This change makes sure we use the total count from the DQT (library) rather than the aggregated data.

Updates basic test data to corrected value.

Fixes: #246
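
To illustrate the difference described above, here is a minimal sketch of the two ways of counting, based on one reading of the description. It is not the code from `quality_data.py`: the `SourceFile` class, its field names, the function names and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourceFile:
    """Hypothetical per-file summary; names are illustrative only."""
    total_grants: int       # number of grants in the source file
    grants_with_field: int  # per-grant count, as a quality tool might report it


def percent_file_level(files):
    """File-level pass/fail: every grant in a 'passing' file counts as passing."""
    total = sum(f.total_grants for f in files)
    passing = sum(f.total_grants for f in files if f.grants_with_field > 0)
    return round(100 * passing / total)


def percent_grant_level(files):
    """Grant-level count: only grants that actually have the field count."""
    total = sum(f.total_grants for f in files)
    passing = sum(f.grants_with_field for f in files)
    return round(100 * passing / total)


files = [
    SourceFile(total_grants=100, grants_with_field=100),  # field always present
    SourceFile(total_grants=100, grants_with_field=0),    # field never present
    SourceFile(total_grants=100, grants_with_field=10),   # field sometimes present
]

print(percent_file_level(files))   # 67
print(percent_grant_level(files))  # 37
```

The first two files give the same answer either way; only the third, where the field is present for some grants, pulls the two results apart.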

michaelwood · 2024-12-13T17:54:52Z

@mariongalley the impact of this fix is that three metrics will be corrected significantly on https://qualitydashboard.threesixtygiving.org/alldata#grants :

"Includes recipient locations codes" goes to 65% from 80%
"Includes at least one charity or company no." goes to 54% from 108%
"Includes grant duration" goes to 47% from 52%

michaelwood · 2024-12-13T17:59:56Z

I've created #248, as we need some worse-quality test data to get better nuance in the automated quality tests in the datastore for this. Currently the totals generally add up to either 0% or 100%, which can too easily happen by coincidence.
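
As a rough sketch of why all-or-nothing test data cannot catch this kind of bug, the example below compares a fixture where every file is at 0% or 100% with one that includes a partially complete file. The `(total_grants, grants_with_field)` tuples, the function names and the numbers are assumptions made for illustration; they are not taken from the datastore tests.

```python
def file_level(files):
    """Count every grant in a file as passing if any grant in it has the field."""
    total = sum(t for t, _ in files)
    return 100 * sum(t for t, p in files if p > 0) / total


def grant_level(files):
    """Count only the grants that actually have the field."""
    total = sum(t for t, _ in files)
    return 100 * sum(p for _, p in files) / total


# Each tuple is (total_grants, grants_with_field); hypothetical fixture data.
all_or_nothing = [(50, 50), (50, 0)]
mixed = [(50, 50), (50, 5)]

# With all-or-nothing data both calculations agree, so a test cannot tell them apart.
print(file_level(all_or_nothing), grant_level(all_or_nothing))  # 50.0 50.0

# With a partially complete file the two calculations diverge.
print(file_level(mixed), grant_level(mixed))  # 100.0 55.0
```

A fixture like `mixed` would make a test fail if the file-level calculation were used by mistake.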

datastore/data_quality/quality_data.py

R2ZER0

Looks good 👍

michaelwood · 2024-12-16T14:37:20Z

deployed

michaelwood requested review from R2ZER0 and mariongalley December 13, 2024 17:47

michaelwood mentioned this pull request Dec 13, 2024

Add worse quality test data to better test nuances in calculations for quality data #248 (Open)

R2ZER0 reviewed Dec 16, 2024

datastore/data_quality/quality_data.py (outdated, resolved)

michaelwood force-pushed the mw/fix_quality_grants_overview_stats branch from 35ad43a to f0beb5c Compare December 16, 2024 14:25

michaelwood requested a review from R2ZER0 December 16, 2024 14:26

R2ZER0 approved these changes Dec 16, 2024

michaelwood merged commit 8f6cdb0 into live Dec 16, 2024
4 of 6 checks passed

michaelwood deleted the mw/fix_quality_grants_overview_stats branch December 16, 2024 14:35
