Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype of facts compression #80

Closed
wants to merge 3 commits into from

Conversation

lqd
Copy link
Member

@lqd lqd commented Oct 2, 2018

This is a WIP PR to attempt tackling #20.

This tries to contract the CFG to simplify the subset computation while still keeping it equivalent (in our limited tests and benchmarks) with regards to the live borrows they propagate.

Marked as WIP because:

  • I'd like to verify the approach is sound, and the comments correct (unlikely).
  • I've completely separated the compression code for now in order to test it independently, and whenever it's ready, see how to best integrate it (but this creates "tension" as will be mentioned in another point)
  • more iteration is needed on how and where do this work, how to pass the additional input data where it is needed.
  • while the borrow_live_at are still correct, completely separating compression from the variant prevents correctly computing errors in the optimized variant, so while tests and our datasets work: it's unknown what this can break. (Even though I'm more and more thinking we should revert the errors computation in the optimized variant, until we know more of what/when/how Polonius will work and which parts should be done here or in rustc. And then when the system is more connected, remove the possible impedance mismatch between the 2, remove duplicated/unneeded work, etc.)
  • as such, more tests would be needed, and which kind of tests (whether or not they should test errors or not, etc) I'm not sure.
  • I see this as a prototype we can improve, for example it's using datafrog for parts of the computation, out of convenience and I probably have missed other/better ways of doing things.

There are comments explaining the approach, but it is close to the one described in #20 except in a key point, the outlives constraint specified there is ignored: in the polonius analysis, subsets between two points don't interact with the outlives facts, only region_live_at between the 2 points. (In fact, adding this "empty outlives" constraint wouldn't find any of the CFG edges in clap as compressible. At one point, I also had a constraint where the outlives sets for each point had to be equal, before I realized they were not used in the datalog rules, and it removed 5000 additional edges from clap).

Some numbers, contracting the CFG looks like this:

  • on clap: 51 896 cfg_edge tuples reduced to 11 027
  • on issue-47680: 61 cfg_edge tuples reduced to 19

Once the CFG is contracted, the other facts are also pruned, so that we don't spend time computing intermediary results for points absent from the CFG (some of these might be needed in cases I wasn't able to test yet, I'm just describing the WIP state).

Pruning looks like this on clap:

  • borrow_region: unchanged, 1 886 tuples
  • killed: removed 252 tuples out of 980
  • outlives: removed 408 802 tuples out of 53 4327
  • region_live_at: removed 896 790 tuples out of 1 076 158

There is a --compress-cfg flag to the CLI to be able to test on different inputs, and a new dataset input (extracted from rustc) containing code Polonius should reject — useful to test errors manually, until final integration where it will become part of the unit tests.

I have tested manually:

  • that our datasets still compute the same live loans
  • that the incorrect examples in the smoke-test fail with the same errors
  • that the rustc ui tests in compare-mode=polonius have the same behaviour compressed and uncompressed (by creating a temporary variant -- absent from this PR -- forcing the facts compression and decompressing the live loans before computing the errors) for the Opt computation (as well as the Naive variant which I had modified to produce errors instead of live loans for the manual tests)

@lqd lqd force-pushed the new-cfg-compression branch 2 times, most recently from 8beba71 to 6340eb8 Compare October 4, 2018 22:19
lqd and others added 3 commits October 5, 2018 00:21
This test from rustc contains examples of code Polonius accepts but MIR borrowck doesn't, and examples of code Polonius must reject (and it's our only dataset containing errors)
@lqd lqd changed the title [WIP] Prototype of facts compression Prototype of facts compression Oct 10, 2018
@lqd
Copy link
Member Author

lqd commented Oct 10, 2018

With the recent rustc tests, there's not really WIP-ness outside of the open questions (even if they will likely need modifying the variants API to produce errors) and discussion.

@lqd
Copy link
Member Author

lqd commented Sep 18, 2019

Fact compression is full of potential.

It was a big 6x improvement even though I hoped it would be a 10x improvement. Other prototyped improvements have achieved the 10x speedup on some benchmarks, and combining those to fact compression, down the line, is also likely to be interesting.

Alas it won't be merged as is, so I'll close this until/if I pick it up again.

@lqd lqd closed this Sep 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant