This is a good first issue once you get to the QC part of the workflow, though it's not the easiest first new task, so I wouldn't recommend starting with it.
Over different versions of QCing and gapfilling temp and precip, I've tried different ways of setting up QC functions (e.g., functions written for wide-format data vs. functions written for tidy [long] format data). I was often under time crunches when doing QC, so there was always a pull between writing something generic that can be recycled and applied easily in the future vs. writing code to get the thing done now (which isn't necessarily code that recycles poorly). All of this is to say: the functions needed to do the checks described below already exist in some version, they are just scattered across different iterations of QC scripts that live in NWT_climate_infilling/daily_met/[v1 and v0 example scripts].
I will describe the checks needed here, and document in more detail in a wiki how these checks have been attempted and where preexisting code lives.
QC checks needed (listed in the suggested order of application):
Within station
Timestamp check: Are all expected date-times present? [If not, add them in.] Are any date-times duplicated? [If yes, flag them and either resolve by asking Jen/checking metadata, or determine another action for keeping both records with a flag.]
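For reference, a minimal sketch of this check in base R, assuming a long-format data frame with one row per date and a Date-class column named `date` (placeholder names, not necessarily what the existing scripts use):

```r
# Minimal sketch of the timestamp check, assuming `dat` has one row per date
# and a Date-class column named by `date_col` (placeholder assumptions).
check_timestamps <- function(dat, date_col = "date") {
  dates <- dat[[date_col]]
  full_seq <- seq(min(dates, na.rm = TRUE), max(dates, na.rm = TRUE), by = "day")
  list(
    # expected dates with no record at all (candidates to add back in as NA rows)
    missing = as.Date(setdiff(full_seq, dates), origin = "1970-01-01"),
    # dates that appear more than once (flag, then resolve with Jen/metadata)
    duplicated = unique(dates[duplicated(dates)])
  )
}

# Example of adding missing dates back in:
# full <- data.frame(date = seq(min(dat$date), max(dat$date), by = "day"))
# dat_complete <- merge(full, dat, by = "date", all.x = TRUE)
```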
Logical/plausible value check: Are there any nonsensical values (e.g., negative precip), or values outside the detection limits of the instrument? (Check the instrument manual for those limits, but they usually turn up in a Google query and I may already have them described in the metadata for the climate datasets published on EDI.) Anything that fails these checks gets automatically removed, with the reason [flag] noted in the final dataset.
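A rough sketch of what this screen could look like, assuming tidy data with `metric` and `value` columns; the limits in the table are made-up placeholders and should be replaced with the actual instrument detection limits from the manuals/EDI metadata:

```r
library(dplyr)
library(tibble)

# hypothetical limits table; real limits come from instrument manuals/EDI metadata
limits <- tribble(
  ~metric, ~min_valid, ~max_valid,
  "tmax",  -50,        45,
  "tmin",  -50,        45,
  "ppt",     0,       250
)

flag_implausible <- function(dat, limits) {
  dat %>%
    left_join(limits, by = "metric") %>%
    mutate(
      qc_flag = case_when(
        is.na(value) ~ NA_character_,
        value < min_valid | value > max_valid ~ "removed: outside plausible/instrument range",
        TRUE ~ NA_character_
      ),
      # automatic removal, with the reason retained in qc_flag
      value = ifelse(!is.na(qc_flag), NA_real_, value)
    )
}
```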
Another form of the check above is a pre-screen of possible suspect values per Jen's opinion (i.e., values Jen wouldn't expect to see outside of [range]). These aren't automatic removals, but something to follow as you go through the QC process. If they are truly extreme outliers and were artificially caused (e.g., instrument failure), they will likely fail the absolute value check and/or rate change check and/or comparative station check, so it's not strictly necessary to write a separate check for this.
Observed value check: This screens the observed daily value, typically using some statistical method, and flags it if it exceeds a tolerance threshold. A popular form of this check (and the one I've used most) uses the number of standard deviations from the mean, i.e., the z-score (e.g., a daily observed value is flagged if it is more than 5 or 6 SDs away from the mean). This should be checked for each temp metric (e.g., tmax, tmean, tmin, and maybe diurnal), as in the sketch after this list:
globally (calculated using all observations for a metric in time series)
monthly (calculated using all obs for metric just in that month in time series)
A thorough QC procedure reviews the top 10 observed values (10 most positive and 10 most negative) for each of those as well by default.
Anything that gets flagged in this check isn't automatically removed, but should be considered alongside the rate change and comparative station check results. Be conservative (non-greedy with flagging) if you run this on diurnal temp, since the statistical distribution of diurnal temp is a little different from something like tmax, tmean, or tmin. You also want to use a fairly generous threshold to allow for natural, wide variation in alpine temps.
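Here is the sketch referenced above: a z-score flagging function assuming long data with `date`, `metric`, and `value` columns (placeholder names), applying both the global and monthly versions with a deliberately generous default threshold:

```r
library(dplyr)

flag_zscore <- function(dat, sd_threshold = 6) {
  dat %>%
    # global z-score: all observations for a metric across the time series
    group_by(metric) %>%
    mutate(z_global = (value - mean(value, na.rm = TRUE)) / sd(value, na.rm = TRUE)) %>%
    # monthly z-score: only observations for that metric in that calendar month
    group_by(metric, month = format(date, "%m")) %>%
    mutate(z_monthly = (value - mean(value, na.rm = TRUE)) / sd(value, na.rm = TRUE)) %>%
    ungroup() %>%
    mutate(flag_obs = abs(z_global) > sd_threshold | abs(z_monthly) > sd_threshold)
}

# Reviewing the 10 most extreme values per metric for manual inspection:
# dat %>% group_by(metric) %>% slice_max(value, n = 10)
# dat %>% group_by(metric) %>% slice_min(value, n = 10)
```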
Another form of this check I wanted to write for temp, and for which code is already developed (in the hrly_met workflow in this repo), is a check from Meek and Hatfield that uses the sine-cosine form of temperature. I would write this to flag anything that lies outside a 90 or 95% prediction interval. I'd say build the standard deviation check first and, if there's time, add the Meek and Hatfield check (so I'll make a separate issue for it as a nice-to-have rather than an immediate need).
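For orientation only, a very rough sketch of the sine-cosine idea: fit an annual harmonic and flag values outside a prediction interval. This is not a faithful reproduction of the Meek and Hatfield check (the developed version lives in the hrly_met workflow), and the `date`/`value` column names are placeholders:

```r
flag_seasonal_fit <- function(dat, level = 0.95) {
  dat$doy <- as.numeric(format(dat$date, "%j"))  # day of year
  # single annual harmonic; Meek and Hatfield's formulation may differ
  fit <- lm(value ~ sin(2 * pi * doy / 365.25) + cos(2 * pi * doy / 365.25),
            data = dat)
  pred <- predict(fit, newdata = dat, interval = "prediction", level = level)
  dat$flag_seasonal <- dat$value < pred[, "lwr"] | dat$value > pred[, "upr"]
  dat
}
```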
Rate change check: For each metric, does the day-to-day [or timestep-to-timestep] rate of change exceed a tolerance threshold? This is similar to the absolute value check, but instead of checking the observed daily value, you apply the check to the difference between [value at time x] and [value at time x - 1 timestep]. There are different ways to write this, but I've always used z-scores.
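A minimal sketch, assuming the same long-format columns as above and that the timestamp check has already filled date gaps so `lag()` really is the previous day:

```r
library(dplyr)

flag_rate_change <- function(dat, sd_threshold = 6) {
  dat %>%
    arrange(metric, date) %>%
    group_by(metric) %>%
    mutate(
      rate = value - lag(value),  # day-to-day difference
      z_rate = (rate - mean(rate, na.rm = TRUE)) / sd(rate, na.rm = TRUE),
      flag_rate = abs(z_rate) > sd_threshold
    ) %>%
    ungroup()
}
```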
An important type of rate change check that applies to chart data (mechanical instrumentation) is a flatline check (x consecutive days of 0 change, where the user specifies x). This would not be an automatic removal, but it is important to consider alongside the comparative rate change check (i.e., did nearby stations also record little change?). It's helpful to compare the flagged data with rate changes for other temp metrics within the station (i.e., if tmin flatlined, did tmax also flatline in that same period?), and to consider the time of year (a flatline in winter is more likely because the instrument might freeze up [and that might even be recorded in the chart notes already!]; a flatline in summer months would be more unexpected). Finally, consider the instrument: it would be odd to see a flatline in an electronic logger, but it is more allowable for mechanical instruments if nearby stations with mechanical instruments also recorded little day-to-day change.
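A minimal sketch of a flatline check on a single station/metric vector, where `x` is the user-specified number of consecutive zero-change days (the flag is for review, not automatic removal):

```r
flag_flatline <- function(values, x = 5) {
  no_change <- c(FALSE, diff(values) == 0)   # TRUE where today equals yesterday
  no_change[is.na(no_change)] <- FALSE       # treat NA gaps as breaks in a run
  runs <- rle(no_change)
  run_flag <- rep(runs$values & runs$lengths >= x, runs$lengths)
  # a run of n zero diffs spans n + 1 identical observations, so extend the
  # flag back one day to cover the start of the flat stretch
  run_flag | c(run_flag[-1], FALSE)
}
```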
Comparative (target vs. regional stations)
Pairwise observed difference check: This is like the within-station observed value check, but instead of looking at z-scores for the observed value at one station, you calculate z-scores for the daily pairwise difference between the target and a comparator station. Because of localized weather at NWT, use at least several comparative stations for this check (e.g., if there is an artificially aberrant value to flag, it will probably be notable for the target compared with at least a handful of comparators). For this check, having a hierarchy of stations for comparison (by geographic proximity and topographic and instrument similarity) is helpful, although you can compare the target with more of the regional stations too (I did this for the most recently published SDL temp dataset, partly because I was on a time crunch).
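A minimal sketch of this check, assuming wide-by-station data with one column per station (`sdl`, `c1`, `d1` are placeholder names, with `sdl` as the target) plus `date` and `metric` columns; a value is only treated as suspect if it is flagged against multiple comparators:

```r
flag_pairwise_diff <- function(dat, target = "sdl", comparators = c("c1", "d1"),
                               sd_threshold = 5) {
  out <- dat
  for (comp in comparators) {
    d <- dat[[target]] - dat[[comp]]
    # monthly z-score of the target-minus-comparator difference, by metric
    z <- ave(d, dat$metric, format(dat$date, "%m"),
             FUN = function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE))
    out[[paste0("flag_vs_", comp)]] <- abs(z) > sd_threshold
  }
  # only treat a value as suspect if flagged against at least 2 comparators
  # (adjust the cutoff for however many comparators are available)
  out$flag_pairwise <- rowSums(out[paste0("flag_vs_", comparators)], na.rm = TRUE) >= 2
  out
}
```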
Pairwise rate change check: Same tolerance threshold idea, but comparing the daily rate change at the target with the rate change at comparative stations. I forget whether I first took a pairwise difference of differences (e.g., differencing the target rate change from the comparator rate change) or just calculated z-scores on the rate changes themselves (e.g., the 'mean' is the average of rate changes at C1, D1, and SDL for January, and a z-score is then calculated for each rate change value). I may have tried it both ways, or maybe these two approaches give the same thing mathematically [my brain is blanking right now]. Check the code, and I will re-check the literature for best practice.
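For the record, a sketch of the first option described above (difference of rate changes, then z-score by metric and month), using the same placeholder wide-by-station column names as the previous sketch; whether this or the pooled-rate-change version is preferable still needs the literature check mentioned above:

```r
library(dplyr)

flag_pairwise_rate <- function(dat, target = "sdl", comparator = "c1",
                               sd_threshold = 5) {
  dat %>%
    arrange(metric, date) %>%
    group_by(metric) %>%
    mutate(
      rate_target = .data[[target]] - lag(.data[[target]]),
      rate_comp   = .data[[comparator]] - lag(.data[[comparator]]),
      rate_diff   = rate_target - rate_comp   # difference of day-to-day changes
    ) %>%
    group_by(metric, month = format(date, "%m")) %>%
    mutate(
      z_rate_diff = (rate_diff - mean(rate_diff, na.rm = TRUE)) /
        sd(rate_diff, na.rm = TRUE),
      flag_rate_vs_comp = abs(z_rate_diff) > sd_threshold
    ) %>%
    ungroup()
}
```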