You make a systematic error when calculating rolling averages in the variant graph at the beginning of the time-window.
You always overestimate the variant share at the beginning.
Why? The window starts off with a length of only 1 at the beginning, so it does not take into account the sequences from before the first occurrence, none of which contained the variant. This biases the variant share systematically upward, by quite a lot, and produces misleading graphs. This is a real methodological problem.
I can understand why you're doing this, because you only send the data starting from the first occurrence, but this is not ok.
Two ways to fix it: start the rolling average only once the window has reached its full length (7 days), or include the 7 days before the first occurrence in the calculation of the rolling average.
Either way, this is a high priority issue in my eyes because it is causing a real systematic error that leads to misinterpretation of the data that the naive viewer is not aware of (at least I wasn't until now, having looked at probably a hundred of your graphs, which I love by the way, don't get me wrong!).
Here you can see the problem documented. Look at a couple of graphs and you'll notice that the line always starts high and then goes down, on every single black line on every graph. The numbers show why: the window starts only on day 1, not on day -6 as it should.
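To make the effect concrete, here is a minimal sketch with made-up numbers. It is not the site's actual code: the function name `rolling_share`, the counts, and the choice of a trailing 7-day window computed as summed variant counts over summed total counts are all assumptions for illustration.

```python
from typing import List, Optional


def rolling_share(variant: List[int], total: List[int], window: int = 7,
                  require_full: bool = False) -> List[Optional[float]]:
    """Trailing rolling share: sum of variant counts / sum of total counts.

    If require_full is True, days whose window is shorter than `window`
    yield None instead of a share computed from a partial window.
    """
    shares: List[Optional[float]] = []
    for i in range(len(total)):
        first = i - window + 1
        if require_full and first < 0:
            shares.append(None)  # window not yet full: draw no line here
            continue
        start = max(0, first)
        v = sum(variant[start:i + 1])
        t = sum(total[start:i + 1])
        shares.append(v / t if t else None)
    return shares


# Hypothetical data: 100 sequences per day, variant first seen on the 7th day.
full_variant = [0, 0, 0, 0, 0, 0, 5, 1, 0, 2, 1, 0, 3]
full_total = [100] * len(full_variant)

# What the graph gets today: data truncated at the first occurrence.
trunc_variant = full_variant[6:]
trunc_total = full_total[6:]

# Current behaviour: the window grows from length 1, so the line starts too high.
biased = rolling_share(trunc_variant, trunc_total)

# Fix 1: only start the line once the window is full.
fix_full_window = rolling_share(trunc_variant, trunc_total, require_full=True)

# Fix 2: include the days before the first occurrence, then cut the output.
fix_padded = rolling_share(full_variant, full_total)[6:]

print("biased:     ", biased)
print("full window:", fix_full_window)
print("padded:     ", fix_padded)
```

In this toy data the biased line starts at 5/100 on the first day, while the padded calculation gives 5/700 for the same day, and the two fixes agree from the first day on which the truncated window is full.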
Thank you for raising this issue. I will look into it this week.
As you correctly pointed out, we would need to change the way that window is computed for the first day of detection.
You'd need to pull 6 extra days, or just start the line on day 7 instead of day 1, which would be the obvious quick fix.
I'd say it may be worth doing the quick fix, not showing the line for the first 6 days, until a permanent solution is found. Otherwise the graphs are systematically biased, which is not good for the sake of science, trust, etc.
Any progress on this? It's a real bug that makes the graphs systematically wrong and makes people draw wrong conclusions - therefore in my view high priority.
I've already submitted a PR that should, with a fairly high chance, fix the issue immediately.