Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hitting a recursion limit while running cargo semver-check on diesel #108

Open
weiznich opened this issue Aug 30, 2022 · 14 comments
Open

Hitting a recursion limit while running cargo semver-check on diesel #108

weiznich opened this issue Aug 30, 2022 · 14 comments
Labels
C-bug Category: doesn't meet expectations

Comments

@weiznich
Copy link

Steps to reproduce the bug with the above code

git clone https://github.com/diesel-rs/diesel
cd diesel
git checkout v2.0.0
cargo semver-checks check-release --manifest-path diesel/Cargo.toml --baseline-version 1.4.8

Actual Behaviour

cargo semver-checks check-release --manifest-path diesel/Cargo.toml --baseline-version 1.4.8
    Updating index
     Parsing diesel v2.0.0 (current)
     Parsing diesel 1.4.8 (baseline)
Error: Failed to parse rustdoc JSON output file "/home/weiznich/Documents/rust/diesel/target/semver-checks/target/doc/diesel.json"

Caused by:
    recursion limit exceeded at line 1 column 203398

(The column number varies between different runs)

Expected Behaviour

cargo semver-checks check-release reports a number of API breaking changes between diesel 1.4 and diesel 2.0

Additional Context

No response

Debug Output

Software version

cargo-semver-checks 0.9.1

Operating system

Linux 5.18.18-200.fc36.x86_64

Command-line

/home/weiznich/.cargo/bin/cargo-semver-checks semver-checks --bugreport 

cargo nightly version

> cargo +nightly -V 
cargo 1.65.0-nightly (4ed54cecc 2022-08-27)

Compile time information

  • Profile: release
  • Target triple: x86_64-unknown-linux-gnu
  • Family: unix
  • OS: linux
  • Architecture: x86_64
  • Pointer width: 64
  • Endian: little
  • CPU features: fxsr,llvm14-builtins-abi,sse,sse2
  • Host: x86_64-unknown-linux-gnu
@weiznich weiznich added the C-bug Category: doesn't meet expectations label Aug 30, 2022
@obi1kenobi
Copy link
Owner

Thanks for the detailed report. I was worried something like this might become a problem, and I'd love your thoughts on the possible options here.

The underlying issue is that we're trying to deserialize a very deeply nested JSON object, and exceeding serde's hardcoded recursion limit of 128.

  • The issue about making this limit configurable has been closed, seemingly as "won't fix": Add configurable recursion limit serde-rs/json#162
  • serde's unbounded_depth feature appears potentially-dangerous, and I'd be hesitant to begin using it without guidance and an in-depth code review from someone with substantially more Rust experience than me.
  • We're currently committed to serde as a deserializer because we depend on rustdoc-types which is serde-only. Any change to a different deserializer approach would almost certainly need to bubble up to rustdoc-types or a similar library, so we don't have to maintain the rustdoc types ourselves as well.
  • It's not clear what deserializer we'd switch to, even if given the chance. The miniserde library is a candidate since it does not use recursion, but its README says it is not production-ready: https://github.com/dtolnay/miniserde

@epage
Copy link
Collaborator

epage commented Aug 30, 2022

This might be an opportunity to provide feedback to rustdoc folks on the design of the data format.

I'm somewhat tempted to say we allow unbounded depth considering the source is, in theory, trusted.

@aDotInTheVoid
Copy link

WRT miniserde, It only supports enums with unit variants, which makes it a non starter for rustdoc-types

WRT rustdoc-types (the crate, not the upstream version), I'd be happy to merge a change that added an additional (de)?serialization trait impls. Note that this would need to be done in the update script and not the code directly, and probably also controlled via a cargo feature.

The best place to fix this seems like changing rustdoc-json-types (the crate in rust-lang/rust). I'd love an issue in that repo for this, especially if it had an MCVE

@obi1kenobi
Copy link
Owner

obi1kenobi commented Aug 31, 2022

This might be an opportunity to provide feedback to rustdoc folks on the design of the data format.

It's not really clear to me that the rustdoc JSON format is lacking and needs to change, to be honest. If the folks in charge of the rustdoc JSON format pushed back on this, I wouldn't think it unreasonable.

Diesel uses many more (and more complex) generics than the median crate — in fact I'd bet it's past the 99.9%ile on that distribution. We are privileged because Trustfall and its adapter will absorb ~any change to the JSON format. But for most rustdoc JSON users, it might not make sense to make the format harder to use in exchange for supporting more crates at the extreme end of that distribution.

I'm somewhat tempted to say we allow unbounded depth considering the source is, in theory, trusted.

It isn't clear to me that anything guarantees fail-stop behavior if a bad JSON file is deserialized. Panics would be unfortunate, but exploitable stack overflow causing a security vulnerability would be way worse.

I'm worried about paths away from the happy ones:

  • cargo-semver-checks allows checking JSON files directly. In principle, someone may slip a corrupted file as e.g. the "current" JSON.
  • cargo-semver-checks can cause JSON to be generated. However, that file is created and saved to disk first, and then separately read and used later. This leaves an opportunity for a TOCTOU-style attack where an attacker waits for the file to be generated and then swaps it for a malicious JSON before it's used.
  • If docs.rs starts hosting JSON files, cargo-semver-checks may download a previous version's JSON file, which would be much faster than regenerating it. But then we'd have to worry about those files as well: compromising docs.rs might lead to CI or local compromise as well. Etc.

@epage
Copy link
Collaborator

epage commented Aug 31, 2022

It's not really clear to me that the rustdoc JSON format is lacking and needs to change, to be honest. If the folks in charge of the rustdoc JSON format pushed back on this, I wouldn't think it unreasonable.

If it nests that deeply, another strategy should at least be considered for the schema.

It isn't clear to me that anything guarantees fail-stop behavior if a bad JSON file is deserialized. Panics would be unfortunate, but exploitable stack overflow causing a security vulnerability would be way worse.

I see this as being more of a short range solution while something else is worked out.

@obi1kenobi
Copy link
Owner

I see this as being more of a short range solution while something else is worked out.

Perhaps an optional "use at own risk" feature that enables the corresponding serde feature, then? That seems like it could be a reasonable, measured solution for the time being.

@epage
Copy link
Collaborator

epage commented Aug 31, 2022

Maybe a -Z unlimited-recursion

@weiznich
Copy link
Author

Thanks for all your discussions here. Let me add a few arguments here:

Diesel uses many more (and more complex) generics than the median crate — in fact I'd bet it's past the 99.9%ile on that distribution.

I do not feel that's a good argument. Yes diesel is more complex that the average crate, but also it's much more popular than the average crate. To be clear here: It's fine to say: That's to complex we will not support that as long as this tool is not part of the official rust tool chain. As soon as you talk about the inclusion in cargo as discussed in #61 that's not an option anymore in my opinion. I would expect that it just works without additional flags then.

It's not really clear to me that the rustdoc JSON format is lacking and needs to change, to be honest. If the folks in charge of the rustdoc JSON format pushed back on this, I wouldn't think it unreasonable.

I personally would consider this an issue with the JSON emitted by rustdoc. At very least I would assume that rustdoc-json-types is able to load any JSON document produced by rustdoc. If that's not possible either rustdoc-json-types or the format needs to change in my opinion. As the JSON backend is not stable yet according to rust-lang/rust#76578 I do not really see a reason not to change the format even if others depend on the current format. That's exactly the reason why it is not stable yet.

The best place to fix this seems like changing rustdoc-json-types (the crate in rust-lang/rust). I'd love an issue in that repo for this, especially if it had an MCVE

What exactly do you expect as MCVE? Minimizing diesel is something that's really hard. I can at least provide the JSON document that's currently produced by rustdoc.

@obi1kenobi
Copy link
Owner

obi1kenobi commented Aug 31, 2022

What exactly do you expect as MCVE? Minimizing diesel is something that's really hard. I can at least provide the JSON document that's currently produced by rustdoc.

A few jq commands should be able to reveal the deepest parts of the JSON document, which I think would be helpful both for the MCVE and for a potential fix.

Assuming Diesel's rustdoc JSON file is diesel.json:

  • jq -r 'path(..) | length' diesel.json | sort -g | tail -1 will show in how many levels of nesting (counting both array indexing and object nesting) the deepest element can be found.
  • jq -r 'path(..) | select(length >= 21)' diesel.json will show the paths that are 21 levels of nesting or deeper. That will produce a path like: ["index", "2:13578:54362", "inner", "decl", "output", "inner", "args", "angle_bracketed", "args", 0, "type", "inner", "type", "inner", "trait", "args", "angle_bracketed", "args", 0, "type", "inner" ] — this already gives us an idea of how we got to the deepest part of the path.
  • getpath can be used to output the values given a path, etc.

It would be helpful to include at least the max-depth number + some of the paths that are at that depth together with the file, just to make it easier to minimize the file and narrow in on the problem from there.

Maybe a -Z unlimited-recursion

I'm happy to put a branch together that uses unlimited recursion — I'd appreciate pointers / PR review on the clap magic necessary to parse the options in the most ergonomic way 😊 I can clone Diesel and generate the rustdoc files.

@weiznich
Copy link
Author

weiznich commented Sep 1, 2022

Thanks for the pointers on how to extract these information. I'm out for holidays in the next weeks, but I can try to gather that information after I'm back.

@weiznich
Copy link
Author

I finally had some time to get the requested numbers:

  • jq -r 'path(..) | length' diesel.json | sort -g | tail -1 will show in how many levels of nesting (counting both array indexing and object nesting) the deepest element can be found.
    j* q -r 'path(..) | select(length >= 21)' diesel.json will show the paths that are 21 levels of nesting or deeper. That will produce a path like: ["index", "2:13578:54362", "inner", "decl", "output", "inner", "args", "angle_bracketed", "args", 0, "type", "inner", "type", "inner", "trait", "args", "angle_bracketed", "args", 0, "type", "inner" ] — this already gives us an idea of how we got to the deepest part of the path.
  • getpath can be used to output the values given a path, etc.

Funnily that fails with parse error: Exceeds depth limit for parsing at line 1, column 1478729

The generated json file is ~500MB large so it's quite hard to tell what's wrong there. For reference the generated json file as zip: diesel.zip

I did hack together some small serde_json script that disables the recursion limit. It seems like the max depth is 903 based on that. The longest path seems to be:

["index", "0:40213", "inner", "generics", "where_predicates", "254", "bound_predicate", "bounds", "0", "trait_bound", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "trait", "args", "angle_bracketed", "args", "0", "type", "inner", "args", "angle_bracketed", "args"]

The other information for the item with the id "0:40213" indicate that it is some impl generated here: https://github.com/diesel-rs/diesel/blob/master/diesel/src/query_builder/insert_statement/insert_with_default_for_sqlite.rs#L368-L375
The linked impl is an internal trait for diesel. It's not part of the public API. When build with default features it looks like that:
image
The index 254 in the path indicates that cargo semver-check builds crates with --all-features enabled, which for diesel will enable the 128-column-tables feature flag, which in turn generates tuple impls for tuples with up to 128 fields. (That last impl is that large that firefox refuses to provide an screenshot of, but I think you get the idea based on the impl for 32 fields).

So this seems to be easily fixable if cargo semver-check would provide a way to specify the feature flags to check or if it would use the default features.

The script used to get these results: https://gist.github.com/weiznich/78a541904ea08c0375c64f2cf7a96eb0

@obi1kenobi
Copy link
Owner

Thanks for putting this together, it's very useful!

By default the build is indeed --all-features but it has proved less than ideal in a few other folks' workloads as well: for example, #315.

@tonowak and I came up with a viable plan that should allow only checking user-specified features, and I believe some progress on it has already been made.

@Enselic
Copy link

Enselic commented Sep 9, 2023

Couldn't you just do what cargo public-api has done for a long time? It has not caused any reported problems so far. We have a small private wrapper function called deserialize_without_recursion_limit():

https://github.com/Enselic/cargo-public-api/blob/783450272d7745318717c42eb3795edb26d18450/public-api/src/lib.rs#L277

@obi1kenobi
Copy link
Owner

I think serde-stacker might be the safer long-term option: https://github.com/dtolnay/serde-stacker

To ensure the long-term health of the project, I have to be very conservative with adopting things that may lead to an increased maintenance burden on me. This is especially the case for things that increase complexity but don't come with ongoing monetary sponsorship. Disabling the recursion limit felt like a taking a huge risk to satisfy a rare edge case, which might endanger my ability to build the many other things that would be more widely applicable.

Also, since cargo-semver-checks now allows selecting features specifically, diesel can now be semver-checked by disabling the couple of features that cause the recursive blowup in the JSON file: https://www.reddit.com/r/rust/comments/16cj1mo/comment/jzn8ggb/

So that seems to make this issue an even lower priority for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: doesn't meet expectations
Projects
None yet
Development

No branches or pull requests

5 participants