Prototype of facts compression #80

lqd · 2018-10-02T17:08:29Z

This is a WIP PR to attempt tackling #20.

This tries to contract the CFG to simplify the subset computation while still keeping it equivalent (in our limited tests and benchmarks) with regards to the live borrows they propagate.

Marked as WIP because:

I'd like to verify the approach is sound, and the comments correct (unlikely).
I've completely separated the compression code for now in order to test it independently, and whenever it's ready, see how to best integrate it (but this creates "tension" as will be mentioned in another point)
more iteration is needed on how and where do this work, how to pass the additional input data where it is needed.
while the borrow_live_at are still correct, completely separating compression from the variant prevents correctly computing errors in the optimized variant, so while tests and our datasets work: it's unknown what this can break. (Even though I'm more and more thinking we should revert the errors computation in the optimized variant, until we know more of what/when/how Polonius will work and which parts should be done here or in rustc. And then when the system is more connected, remove the possible impedance mismatch between the 2, remove duplicated/unneeded work, etc.)
as such, more tests would be needed, and which kind of tests (whether or not they should test errors or not, etc) I'm not sure.
I see this as a prototype we can improve, for example it's using datafrog for parts of the computation, out of convenience and I probably have missed other/better ways of doing things.

There are comments explaining the approach, but it is close to the one described in #20 except in a key point, the outlives constraint specified there is ignored: in the polonius analysis, subsets between two points don't interact with the outlives facts, only region_live_at between the 2 points. (In fact, adding this "empty outlives" constraint wouldn't find any of the CFG edges in clap as compressible. At one point, I also had a constraint where the outlives sets for each point had to be equal, before I realized they were not used in the datalog rules, and it removed 5000 additional edges from clap).

Some numbers, contracting the CFG looks like this:

on clap: 51 896 cfg_edge tuples reduced to 11 027
on issue-47680: 61 cfg_edge tuples reduced to 19

Once the CFG is contracted, the other facts are also pruned, so that we don't spend time computing intermediary results for points absent from the CFG (some of these might be needed in cases I wasn't able to test yet, I'm just describing the WIP state).

Pruning looks like this on clap:

borrow_region: unchanged, 1 886 tuples
killed: removed 252 tuples out of 980
outlives: removed 408 802 tuples out of 53 4327
region_live_at: removed 896 790 tuples out of 1 076 158

There is a --compress-cfg flag to the CLI to be able to test on different inputs, and a new dataset input (extracted from rustc) containing code Polonius should reject — useful to test errors manually, until final integration where it will become part of the unit tests.

I have tested manually:

that our datasets still compute the same live loans
that the incorrect examples in the smoke-test fail with the same errors
that the rustc ui tests in compare-mode=polonius have the same behaviour compressed and uncompressed (by creating a temporary variant -- absent from this PR -- forcing the facts compression and decompressing the live loans before computing the errors) for the Opt computation (as well as the Naive variant which I had modified to produce errors instead of live loans for the manual tests)

This test from rustc contains examples of code Polonius accepts but MIR borrowck doesn't, and examples of code Polonius must reject (and it's our only dataset containing errors)

lqd · 2018-10-10T10:10:54Z

With the recent rustc tests, there's not really WIP-ness outside of the open questions (even if they will likely need modifying the variants API to produce errors) and discussion.

lqd · 2019-09-18T11:35:20Z

Fact compression is full of potential.

It was a big 6x improvement even though I hoped it would be a 10x improvement. Other prototyped improvements have achieved the 10x speedup on some benchmarks, and combining those to fact compression, down the line, is also likely to be interesting.

Alas it won't be merged as is, so I'll close this until/if I pick it up again.

lqd force-pushed the new-cfg-compression branch 2 times, most recently from 8beba71 to 6340eb8 Compare October 4, 2018 22:19

lqd and others added 3 commits October 5, 2018 00:21

Compress "facts" and decompress "results"

620b6bc

Support cfg compression as a CLI flag

7762bcf

Add the polonius-smoke-test dataset

8cd0000

This test from rustc contains examples of code Polonius accepts but MIR borrowck doesn't, and examples of code Polonius must reject (and it's our only dataset containing errors)

lqd force-pushed the new-cfg-compression branch from 6340eb8 to 8cd0000 Compare October 4, 2018 22:22

lqd changed the title ~~[WIP] Prototype of facts compression~~ Prototype of facts compression Oct 10, 2018

lqd closed this Sep 18, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prototype of facts compression #80

Prototype of facts compression #80

lqd commented Oct 2, 2018 •

edited

Loading

lqd commented Oct 10, 2018

lqd commented Sep 18, 2019

Prototype of facts compression #80

Prototype of facts compression #80

Conversation

lqd commented Oct 2, 2018 • edited Loading

lqd commented Oct 10, 2018

lqd commented Sep 18, 2019

lqd commented Oct 2, 2018 •

edited

Loading