Skip values (moreso, RandomRuns) we have already tested #207
base: master
Conversation
Do we have any concerns about memory use, particularly for large run count scenarios?
`src/Fuzz/Internal.elm` (outdated)
```elm
gen : List Int -> Fuzzer a -> GenResult a
gen randomRun fuzzer =
    generate (PRNG.hardcoded (RandomRun.fromList randomRun)) fuzzer


genR : Int -> Fuzzer a -> GenResult a
genR seed fuzzer =
```
I think these could use a more descriptive name 😛
Sorry, these shouldn't be in this PR 😅 Just wanted to check stuff in my REPL, and forgot about them when `git add .`-ing.
Hopefully they're now all gone!
```diff
@@ -282,7 +309,7 @@ allSufficientlyCovered c state normalizedDistributionCount =
                 True

             AtLeast n ->
-                Test.Distribution.Internal.sufficientlyCovered state.runsElapsed count (n / 100)
+                Test.Distribution.Internal.sufficientlyCovered (state.runsElapsed + state.runsSkipped) count (n / 100)
```
Does it make sense to include the skipped runs in the distribution coverage? I'm not saying no, but I think it's a question with a non-obvious answer...
I think the answer should be yes: we already know what the fuzzed value was when skipping, so we can count it into the fuzzer distribution. What we skipped was running the test function.
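A language-agnostic sketch of that idea (Python here; `classify`, `seen`, and the loop shape are illustrative, not elm-test's actual implementation): generation always happens and feeds the distribution tally, while only the test function is skipped for values we've already seen.

```python
import random

def fuzz_loop(generate, classify, test_fn, runs_needed, skips_allowed):
    """Sketch: a skipped run still counts toward the distribution tally."""
    seen = set()
    distribution = {}  # classification label -> count of generated values
    runs_elapsed = 0
    runs_skipped = 0
    while runs_elapsed < runs_needed and runs_skipped < skips_allowed:
        value = generate()
        # We always know the fuzzed value, so it always feeds the tally...
        label = classify(value)
        distribution[label] = distribution.get(label, 0) + 1
        if value in seen:
            # ...but we skip re-running the (possibly expensive) test function.
            runs_skipped += 1
            continue
        seen.add(value)
        test_fn(value)
        runs_elapsed += 1
    return distribution, runs_elapsed, runs_skipped

rng = random.Random(0)
dist, ran, skipped = fuzz_loop(
    generate=lambda: rng.randint(1, 10),
    classify=lambda v: "even" if v % 2 == 0 else "odd",
    test_fn=lambda v: None,
    runs_needed=100,
    skips_allowed=200,
)
# Every generated value is tallied, whether its test ran or was skipped.
assert sum(dist.values()) == ran + skipped
```

With a 10-value domain the loop above can only ever elapse 10 runs, so it stops when the skip budget runs out; the distribution still reflects every generated value.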
But is it valuable information? Imagine that I want to classify even/odd, and my generator generates the following:
0, 1, 2, 4, 1, 6, 1, 8, 1, 10, 1, 12, 1
Then I'll see that I had a 50%/50% distribution of even/odd, but does it give me any more confidence about what I'm testing?
I believe fuzzer distribution is about classifying what the fuzzer gives you, regardless of the test function (`... -> Expectation`). I.e. you could just as well run `Fuzz.examples 10000 myFuzzer` in the REPL, do some statistics on that, and get the same result.
Test skipping only deals with skipping the test function, not skipping the fuzzing. This PR shouldn't change the behaviour of the fuzzer distribution reporting.
I believe I understand your issue, but it seems to me it's unrelated to this PR.
> it seems to me it's unrelated to this PR
That's fair enough. I think the main relation here is that the implementation here could implement both concerns by simultaneously being simpler. However, if you don't feel like wading into those waters in this PR, that's fine.
> I think the main relation here is that the implementation here could implement both concerns by simultaneously being simpler.
I think you're suggesting we remove skipped values from the distribution counts. But wouldn't that invalidate the claim that the distribution counts deal with what the fuzzer returns (again, thinking about `Fuzz.examples 10000 myFuzzer`)? That would change it into "unique values the fuzzer returns". I don't know if that's helpful.
It would basically change

```
Distribution report:
====================
2-19: 90.1% (27007x) ███████████████████████████░░░
1: 5% (1485x) █░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
20: 4.9% (1468x) █░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
<1, >20: 0% (0x) ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
```

into

```
Distribution report:
====================
2-19: 90% (18x) ███████████████████████████░░░
1: 5% (1x) █░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
20: 5% (1x) █░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
<1, >20: 0% (0x) ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
```
That to me is a different feature 🤷
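The distinction between those two tables boils down to which multiset you tally: every generated value, or each distinct value once. A minimal sketch (Python here, with a hypothetical `classify` standing in for the report's distribution labels):

```python
from collections import Counter

def classify(v):
    # Hypothetical labels mirroring the report buckets above.
    if v == 1:
        return "1"
    if 2 <= v <= 19:
        return "2-19"
    if v == 20:
        return "20"
    return "<1, >20"

values = [2, 1, 2, 4, 1, 6, 20, 2]  # pretend fuzzer output, with duplicates

raw = Counter(classify(v) for v in values)          # tally every generated value
unique = Counter(classify(v) for v in set(values))  # tally each distinct value once

# raw:    {"2-19": 5, "1": 2, "20": 1}
# unique: {"2-19": 3, "1": 1, "20": 1}
```

Same bucketing function, different input multiset: that's the whole difference between the two proposed reports.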
Unique values are what we care about though, isn't it? So the distribution of unique values is the valuable thing we should be highlighting to developers.
Also I disagree that distributions are just about the Fuzzer - they are about giving you confidence that your fuzz test is adequately sampling the problem space and not just focusing on some particular corner of it.
Additionally there could be some nice highlighting of overproduction of duplicates:

```
Distribution report:
====================
2-19: 90% (18x) ███████████████████████████░░░
1: 5% (1x) █░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ⚠️ Fuzzer produces 1480 duplicated values
20: 5% (1x) █░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
<1, >20: 0% (0x) ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
```

or some such, but that's perhaps just overcomplicating things for little benefit.
> Unique values is what we care about though isn't it? So distribution of unique values is the valuable thing that we should be highlighting to developers.
That's where our understanding currently differs -- I'll think about it and give the idea a chance! Perhaps you're right.
```diff
             }
         , failure = Just failure
         }

         Nothing ->
-            if state.runsElapsed < c.runsNeeded then
+            if state.runsElapsed < c.runsNeeded && state.runsSkipped < c.skipsAllowed then
```
Shouldn't there be some sort of distinct failure when the `skipsAllowed` is exceeded? That basically means your `Fuzzer` is kind of rubbish, no?
`skipsAllowed` being exceeded could mean you've exhausted all possible values (or at least, finding new values is improbable). Consider `Fuzz.bool` or `Fuzz.intRange 1 10`. Over 100 runs those could skip 98 times and 90 times respectively (depending on your settings).
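Those skip counts follow directly from the domain sizes: of 100 draws from a 2-value domain at most 2 can be new, so at least 98 are repeats, and similarly at least 90 for a 10-value domain. A quick sketch (Python; the helper name is made up):

```python
import random

def count_repeats(domain_size, draws, seed=0):
    """Of `draws` uniform picks, how many repeat an already-seen value?"""
    rng = random.Random(seed)
    seen = set()
    repeats = 0
    for _ in range(draws):
        v = rng.randrange(domain_size)
        if v in seen:
            repeats += 1
        else:
            seen.add(v)
    return repeats

# Fuzz.bool-like domain: at most 2 of 100 draws can be novel.
assert count_repeats(2, 100) >= 98
# Fuzz.intRange 1 10-like domain: at most 10 of 100 draws can be novel.
assert count_repeats(10, 100) >= 90
```

So for small domains a high skip count is expected behaviour, not a sign of a bad fuzzer.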
For this reason I don't think we should fail. If anything, we could stop early, but let's do that when implementing exhaustiveness? This way skipping basically only lets you fuzz more diverse inputs / not waste time on things you already know the answer to.
I guess we can punt on this question until we implement exhaustiveness. Because in those cases the test runner should just switch to exhaustive checking, this kind of failure should be reserved for more pathological fuzzers:

```elm
Fuzz.floatRange 0 1
    |> Fuzz.andThen
        (\p ->
            if p < 0.9 then
                Fuzz.constant 0

            else
                Fuzz.intRange 0 100
        )
```

which would just be kind of bad, but not really exhaustive...
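To see why such a fuzzer is awkward for skipping without ever being exhaustible, here's a rough Python simulation of that shape (the 90%/10% split from the Elm snippet above; helper names are made up): duplicates dominate long before the 101-value domain runs out.

```python
import random

def pathological(rng):
    """90% of the time yield the constant 0, otherwise a uniform int in 0..100."""
    if rng.random() < 0.9:
        return 0
    return rng.randint(0, 100)

def skip_stats(draws, seed=0):
    rng = random.Random(seed)
    seen = set()
    skips = 0
    for _ in range(draws):
        v = pathological(rng)
        if v in seen:
            skips += 1
        else:
            seen.add(v)
    return skips, len(seen)

skips, unique = skip_stats(1000)
# Roughly 900 of 1000 draws are the constant 0, so repeats dominate
# while the 101-value domain is still far from exhausted.
```

Any skip budget proportional to the run count gets eaten by the constant branch, yet exhaustive checking never kicks in because the domain isn't actually small.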
```diff
@@ -75,25 +75,19 @@ fromExpectation labels expectation summary =
         expectation
             |> Test.Runner.getDistributionReport
             |> Runner.String.Distribution.report labels


 summaryWithDistribution : Summary
```
Changes in this module improve the visuals of how we present the fuzzer distribution tables in our homebrew test runner. (This is not exposed to users in any way)
I can try to measure that.

For now here are runtime counts: I've tried over multiple repos and the result is roughly the same everywhere: skipping doesn't give any performance benefit (it adds an overhead if anything); all it does is give you more diverse inputs.
What number of runs are those charts for?

@gampleman 50 runs per configuration
I wonder if 50 is too low to: a) manifest all that much duplication. Could we also test with, say, 10,000 runs?
Each run of the elm-test test suite (where various fuzzers differ, some have

EDIT: sorry if I previously led you to believe the number 50 was for the
Ah OK, that makes sense. I was a bit surprised you were doing fewer than the default 100...

Out of curiosity I'll try another set of
Yeah, I think we would need some nice test for the F-metric to be able to also see the upside of a PR like this. FWIW I don't think that's too bad a perf degradation, and it ultimately helps with the mission of actually finding bugs, so 👍

I think I vaguely recall this being mentioned in our discussion of quasirandom numbers? I'm a stats noob; could you please share some Wikipedia/paper links for its definition etc.? My googling is returning F-score (not sure that's what you're talking about) and F-metric being some kind of web development stuff (that definitely isn't it) :)
It's a horrible name, since it's almost impossible to google for. I mentioned it in here:

So I suspect we could make a benchmark of Fuzz tests that fail in various circumstances and then run them with a high number of

You could also not report out
WIP
Closes #11 without the need for a major version, since we don't add a new `Failure` variant.
Allows `FuzzOptions.runs * 2` skips of values already seen. This, ironically, speeds up our fuzzer distribution machinery as well.
Exhaustive checking (#188) would help us save even more unneeded work (we'd know we can stop skipping) -- and I want to do that -- but I'm not trying that in this PR. The skipping works reasonably well even without it.
Should be working already! Now just measuring whether this has any noticeable impact, with `tests/randomized-tests.sh`.