Dearbitrary #187

Draft: wants to merge 7 commits into main

@LeoDog896 commented Aug 4, 2024

Resolves #44.

I would greatly appreciate feedback on naming the structs & associated methods with UnstructuredBuilder.

Todo:

  • Write rest of kani proofs
  • Set up fuzzing for methods that can't be verified (or that take too long to verify)
  • UnstructuredBuilder Documentation
  • Dearbitrary Documentation
  • Uncomment the u16 range test. It currently takes 10 minutes to verify, so I've commented it out for the time being to save testing time.
  • Derive macro

Notes:

  • Should dearbitrary be hidden behind a feature?
  • Int is currently a breaking API change. Can this be fixed without manually implementing from/to le bytes?
  • While writing this, I noticed that the f32/f64 representation isn't portable across platforms if MIPS is among the target architectures.
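
For readers new to the idea, here is a minimal sketch of what "reversing" arbitrary means: a builder writes bytes such that a reader consuming those bytes yields the original values. `ToyReader`, `ToyBuilder`, and `round_trip` are hypothetical names made up for this illustration; they are not the `UnstructuredBuilder` API from this PR or anything from the `arbitrary` crate. The sketch assumes a fixed little-endian layout and does not try to address the MIPS concern noted above.

```rust
// Hypothetical toy types for illustration only; NOT the API from this PR
// or from the `arbitrary` crate.

/// Reads values from the front of a byte slice.
struct ToyReader<'a> {
    data: &'a [u8],
}

impl<'a> ToyReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        ToyReader { data }
    }

    /// Take `N` bytes from the front, padding with zeros if the input runs out.
    fn take<const N: usize>(&mut self) -> [u8; N] {
        let mut buf = [0u8; N];
        let n = N.min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        buf
    }

    fn read_u16(&mut self) -> u16 {
        u16::from_le_bytes(self.take::<2>())
    }

    fn read_f32(&mut self) -> f32 {
        f32::from_bits(u32::from_le_bytes(self.take::<4>()))
    }
}

/// Appends values so that a `ToyReader` would read them back in the same order.
#[derive(Default)]
struct ToyBuilder {
    bytes: Vec<u8>,
}

impl ToyBuilder {
    fn write_u16(&mut self, v: u16) {
        self.bytes.extend_from_slice(&v.to_le_bytes());
    }

    fn write_f32(&mut self, v: f32) {
        self.bytes.extend_from_slice(&v.to_bits().to_le_bytes());
    }
}

/// Build bytes from two values, then read the same values back.
fn round_trip(a: u16, b: f32) -> (u16, f32) {
    let mut builder = ToyBuilder::default();
    builder.write_u16(a);
    builder.write_f32(b);
    let mut reader = ToyReader::new(&builder.bytes);
    (reader.read_u16(), reader.read_f32())
}
```

The property worth checking in proofs or fuzzing is that reading what was just written returns the same values, which is what `round_trip` exercises.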

@fitzgen (Member) commented Aug 9, 2024

Apologies for how long it's taken me to respond to this PR.

I'm concerned about the complexity that this feature brings. I'm not convinced that it is worth it, just to be able to seed a corpus.

If seeding a corpus is an important use case, it can be done today by changing how the structure-aware fuzzing is done from a generative paradigm to a mutation paradigm:

  • take bytes as the input to the fuzz target

  • try to deserialize the bytes into your input type with bincode::deserialize::<MyInputType>(fuzzer_input) or similar (protobuf, etc...)

    • if the deserialization fails, continue to the next test case

    • if it succeeds, then run the structured input through your oracles, same as you otherwise would with the arbitrary crate

  • additionally define a fuzz_mutator! that does something like

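    // `deserialize` returns a `Result`, so the fallback closure takes the error.
    let mut input = bincode::deserialize::<MyInputType>(input_bytes)
        .unwrap_or_else(|_| MyInputType::default());

    // Mutate a test case. Not any more difficult to implement
    // than a by-hand `Arbitrary` implementation.
    input.mutate(seed);

    // `serialize` borrows its argument and also returns a `Result`.
    return bincode::serialize(&input);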
  • now you can seed your corpus by manually building a bunch of MyInputTypes and then bincode::serialize them into the corpus directory
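
The steps above can be sketched end to end using only the standard library. This is an illustration under stated assumptions, not a definitive implementation: `MyInputType`, its fields, and its `mutate` method are hypothetical; `to_bytes`/`from_bytes` are a hand-rolled stand-in for `bincode::serialize`/`bincode::deserialize`; and `mutate_bytes` and `fuzz_one` stand in for the bodies of a custom mutator and a fuzz target.

```rust
/// Hypothetical input type; a real one would be whatever the fuzz target consumes.
#[derive(Debug, Default, Clone, PartialEq)]
struct MyInputType {
    width: u16,
    payload: Vec<u8>,
}

impl MyInputType {
    /// Stand-in for `bincode::serialize`: 2-byte little-endian width, then payload.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = self.width.to_le_bytes().to_vec();
        out.extend_from_slice(&self.payload);
        out
    }

    /// Stand-in for `bincode::deserialize`: `None` if the bytes are too short.
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < 2 {
            return None;
        }
        Some(MyInputType {
            width: u16::from_le_bytes([bytes[0], bytes[1]]),
            payload: bytes[2..].to_vec(),
        })
    }

    /// A tiny structure-aware mutation, chosen deterministically from `seed`.
    fn mutate(&mut self, seed: u32) {
        match seed % 3 {
            0 => self.width = self.width.wrapping_add(1),
            1 => self.payload.push(seed as u8),
            _ => {
                self.payload.pop();
            }
        }
    }
}

/// What a custom mutator body might do: decode (or fall back to the default),
/// mutate the structured value, and re-encode it.
fn mutate_bytes(input_bytes: &[u8], seed: u32) -> Vec<u8> {
    let mut input = MyInputType::from_bytes(input_bytes).unwrap_or_default();
    input.mutate(seed);
    input.to_bytes()
}

/// What the fuzz target might do: skip inputs that don't decode.
fn fuzz_one(fuzzer_input: &[u8]) -> bool {
    match MyInputType::from_bytes(fuzzer_input) {
        None => false, // not a valid encoding; move on to the next test case
        Some(input) => {
            // Oracles would run here; this placeholder only checks a round trip.
            assert_eq!(MyInputType::from_bytes(&input.to_bytes()), Some(input));
            true
        }
    }
}
```

Seeding the corpus then amounts to constructing a few `MyInputType` values by hand and writing the result of `to_bytes()` for each into the corpus directory.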

(But backing up: it should also be noted that randomly generating a corpus before you start fuzzing shouldn't be expected to do any better than incrementally building the corpus from nothing. Seeding the corpus is only useful if you already have inputs that you know are interesting for other reasons; for example, they've triggered bugs/crashes before, or they are real-world snippets of your programming language, etc.)

@fitzgen (Member) commented Aug 9, 2024

In general, the mutation paradigm has another benefit over the generative paradigm as well: if you add knobs to only do shrinking/simplifying mutations, then it is trivial to build a test case minimizer on top of that which performs better than cargo fuzz tmin. And for things like programming languages, if you add a semantics-preserving mutation mode, then you can do differential fuzzing where you check the correctness of your various compiler passes and optimizations like

    let input = the_fuzzer_input();
    let result = run(&input);

    let alt_input = input.mutate_preserving_semantics();
    let alt_result = run(&alt_input);

    assert_eq!(
        result,
        alt_result,
        "running two semantically-equivalent programs should produce the same results",
    );
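
The minimizer idea mentioned above can be sketched in the same spirit. This is a rough illustration, assuming a hypothetical `Input` type (a list of numbers) and a caller-supplied `still_fails` predicate that reruns the failing check; `shrink_candidates` and `minimize` are made-up names, not part of `cargo fuzz` or any rust-fuzz crate.

```rust
/// Hypothetical structured input: a list of numbers fed to the code under test.
type Input = Vec<u32>;

/// Shrinking-only mutations: every candidate is strictly "smaller" than `input`
/// (one element removed, or one nonzero element halved).
fn shrink_candidates(input: &Input) -> Vec<Input> {
    let mut out = Vec::new();
    for i in 0..input.len() {
        let mut removed = input.clone();
        removed.remove(i);
        out.push(removed);

        if input[i] > 0 {
            let mut halved = input.clone();
            halved[i] /= 2;
            out.push(halved);
        }
    }
    out
}

/// Greedy minimizer: keep applying the first shrinking mutation that still
/// reproduces the failure, until none does.
fn minimize(mut input: Input, still_fails: impl Fn(&Input) -> bool) -> Input {
    loop {
        match shrink_candidates(&input).into_iter().find(|c| still_fails(c)) {
            Some(smaller) => input = smaller,
            None => return input,
        }
    }
}
```

Because every candidate is strictly smaller than its parent, the loop always terminates; the result is a local minimum with respect to these particular mutations, not necessarily the smallest possible failing input.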

@fitzgen (Member) commented Aug 9, 2024

(we should probably have a crate that is the mutation-paradigm-equivalent of the arbitrary crate in the rust-fuzz org)

@LeoDog896 (Author) commented Aug 10, 2024

By the way, my use of the term "seeding" was incorrect. I meant it more for large arbitrary-derived structures, where developing an initial corpus by hand is painstaking. Without a corpus, the initial fuzzing warm-up takes longer, and the inconvenience of building one means that some libraries go without a corpus for those types of structures.

Although I only have a limited understanding of fuzzers, aren't most fuzzers already mutative rather than generative? Wouldn't methods like mutate or mutate_preserving_semantics have to explore much more slowly, given that their inputs depend on the structure of the input?


Successfully merging this pull request may close these issues.

Reversed arbitrary
2 participants