Use solver result caching in CI #1908

m-yac · 2023-08-14T22:12:14Z

This PR makes three changes:

Enables solver result caching on the CI for the integration tests and all the s2n tests
Adds a --clean-solver-cache command-line option to saw which provides a direct way to remove cache entries with out-of-date solver versions (this is run every time a cache is loaded on CI)
Uses the commit hash of the last commit to modify the rme directory as the version information for rme calls, instead of the commit hash of the entire repo (this ensures cached rme calls are actually used when CI is run on a later commit which does not modify the rme directory)

The result is an average of a 1.57x speedup of the relevant parts of the CI. Because all the tests run in parallel, the only one which affects the overall runtime of the CI is the awslc test. Including solver caching consistently saves 35-40 minutes on this test. For two particular runs, the full breakdown is below – with the most dramatic speedups in bold.

	intTests	hmac	drbg	sike	bike	tls	hmac-f	awslc	blst
Current	14m 1s	10m 13s	3m 12s	4m 26s	9m 14s	8m 21s	5m 11s	77m 20s	17m 21s
w/ Caching	12m 14s	6m 58s	2m 44s	4m 14s	7m 53s	2m 55s	4m 15s	41m 13s	12m 55s
Speedup	1.15x	1.47x	1.17x	1.05x	1.17x	2.86x	1.22x	1.88x	1.34x
Cache Hits	546	1518	18	913	3893	28	80	991	1117

Note that there appears to be no correlation between the number of cache hits and the speedup, likely due to the fact that a removing a few very "difficult" solver queries results in much more speedup than removing lots of "easy" solver queries.

…g(?)

m-yac · 2023-08-18T14:29:28Z

I am on vacation for the rest of the month, so unfortunately I won't be able to work on this again until September. Here's a summary of the status of the changes on this PR.

Non-solver cache related

In an attempt to figure out why non-GHC 9.9.4 builds still take 30+ minutes even when there are no haskell code changes, I added ${{ github.ref }} to the keys for the cabal cache. Unfortunately that didn't fix it, but it seems reasonable to add: without it github could choose to restore the cabal cache from master, or from somebody else's branch, rather than from your branch.

I fixed a typo in the cabal cache save step where most of the key name was accidentally duplicated. For example, before the fix keys would be saved as "1-cabal-macos-12-9.4.4-02b...601-e06...60a 1-cabal-macos-12-9.4.4-02b...601- " but with the fix it would just be "1-cabal-macos-12-9.4.4-02b...601-e06...60a ".

- key: ${{ env.CACHE_VERSION }}-cabal-${{ matrix.os }}-${{ matrix.ghc }}-${{ hashFiles(format('cabal.GHC-{0}.config', matrix.ghc)) }}-${{ github.sha }}
-   ${{ env.CACHE_VERSION }}-cabal-${{ matrix.os }}-${{ matrix.ghc }}-${{ hashFiles(format('cabal.GHC-{0}.config', matrix.ghc)) }}-
+ key: ${{ env.CACHE_VERSION }}-cabal-${{ matrix.os }}-${{ matrix.ghc }}-${{ hashFiles(format('cabal.GHC-{0}.config', matrix.ghc)) }}-${{ github.ref }}-${{ github.sha }}

Solver caching

Solver caching for the integration_tests is working! It required setting SAW_SOLVER_CACHE_PATH in $GITHUB_ENV in ci.yml, and it required modifying IntegrationTest.hs to capture the value of SAW_SOLVER_CACHE_PATH when running the tests.
Solver caching for the s2n tests is not working yet. The complexity of this piece is that the actual tests are run on docker images. So far, I've needed to set SAW_SOLVER_CACHE_PATH in the -entrypoint.sh scripts, add the cache directory as a volume in s2nTests/docker-compose.yml, and give the cache directory write permissions in ci.yml. Despite all this seemingly working correctly based on my debugging, the cache/save action was not finding the cache directory. However, I just noticed that the step which actually runs the s2n tests happens in a different working-directory. In my latest commit I tried to fix this - so maybe the cache file will actually be saved now! 🤞

RyanGlScott · 2023-08-18T23:05:12Z

figure out why non-GHC 9.9.4 builds still take 30+ minutes even when there are no haskell code changes

I don't think this is the fault of this PR. GHC 9.4+ uses a different approach to recompilation checking (see this blog post) that allows most of the local Haskell dependencies to be reused directly from the cache rather than rebuilding. Indeed, if you compare this GHC 9.4 CI job versus this GHC 9.2 CI job, the main thing that accounts for the 9.2 job taking so much longer is the Run .github/ci.sh build step, which spends 20+ minutes rebuilding all of the local dependencies.

kquick · 2023-08-18T23:17:46Z

I added ${{ github.ref }} to the keys for the cabal cache. Unfortunately that didn't fix it, but it seems reasonable to add: without it github could choose to restore the cabal cache from master, or from somebody else's branch, rather than from your branch.

This shouldn't be necessary, as caching is already limited to the current branch: https://github.com/actions/cache#cache-scopes

kquick · 2023-08-18T23:22:49Z

Solver caching for the s2n tests is not working yet

@m-yac: what are the indicators that we would be looking for to tell if this is working?

m-yac · 2023-09-05T18:38:54Z

Solver caching for the s2n tests is not working yet

@m-yac: what are the indicators that we would be looking for to tell if this is working?

Oh, I just meant that the "Save SMT solver result cache" step of the s2n tests is failing. On the most recent run it failed with:

/usr/bin/tar: s2nTests/cache/lock.mdb: Cannot open: Permission denied
/usr/bin/tar: s2nTests/cache/data.mdb: Cannot open: Permission denied
/usr/bin/tar: Exiting with failure status due to previous errors
Warning: Failed to save: "/usr/bin/tar" failed with error: The process '/usr/bin/tar' failed with exit code 2
Warning: Cache save failed.

which looks like an easy-to-fix permissions issue. For some reason this doesn't give a red X, unlike previous times this step failed, which is maybe the source of the confusion.

…ests

RyanGlScott

If the table in the PR description is to be believed, then this work will end up saving a ton of time on CI rebuilds in the future. Great work, @m-yac!

(Out of curiosity, how did you obtain the figures in that table? Perhaps this info was dumped somewhere in the CI logs, but I couldn't easily find it.)

.github/workflows/ci.yml

s2nTests/scripts/awslc-entrypoint.sh

m-yac · 2023-09-12T15:37:16Z

Thanks for the review, @RyanGlScott! With the exception of the number of cache hits, which was from debugging information which is now turned off, I got the numbers in the table directly from Github's web interface. Specifically, from the right hand side when a job is selected. The specific CI runs I used are linked in the first column. For example, after going to this CI run from master, this where I got the upper-left-most time:

m-yac · 2023-09-12T17:48:24Z

FYI: Looks like Github evicted the previously built solver caches due to age, thus the first attempt at running the CI on the latest commit latest had to rebuild all the caches from scratch (and so took longer than expected). I'm re-running the CI now that there are some caches to use just to make sure everything still works.

In this future, when this PR is merged, this hopefully won't be a problem because we should always have fresh caches from master to build off of.

m-yac · 2023-09-12T19:46:49Z

Also note that I forgot all the tests run in parallel when I made my original analysis of the speedup. Taking into account that the awslc test is always the last to finish, the correct figure is that solver caching saves 35-40 minutes overall. Between the two most recent runs mentioned above, solver caching reduced the runtime of that test from 1h 18m 45s to 38m 16s (40m saved).

first try at turning on solver caching for intTests and s2nTests in CI

0fed8d7

m-yac added the tooling: CI Issues involving CI/CD scripts or processes label Aug 14, 2023

m-yac added 18 commits August 14, 2023 20:28

add intTest which tests whether solver caching is even enabled

7fdbd5f

use absolute paths for solver caching on CI

9be89d0

fix dummy CI test file so it actually runs

ba87f71

properly set SAW_SOLVER_CACHE_PATH env var in CI

d68652a

another attempt at setting SAW_SOLVER_CACHE_PATH on CI

c706986

use github.ref instead of .sha for caching in CI, fix repeated key bu…

6f76bea

…g(?)

make integration tests aware of SAW_SOLVER_CACHE_PATH env var

1993013

enable solver caching for s2n tests; fix typo in ci.yml

9e5941e

clean up solver caching bits in ci.yml, add debugging to blst

38b97b0

remove debugging from intTests, tweak debugging for s2n-blst

4b3e2b7

more tweaks to CI blst debugging and cache keys

db0e530

more CI solver caching debugging for s2n

325bb49

another attempt at enabling solver caching for s2n tests on CI

91b4211

add debugging back to s2n CI tests

a885f60

even more debugging for s2n CI solver caching

09042ad

tweak debugging for s2n CI solver caching

2762cdf

the tweaks to s2n CI solver caching just keep coming...

69d327e

point the CI solver cache restore/save to the right place for s2n tests

358c0dc

m-yac added 5 commits September 5, 2023 16:58

remove github.ref from cache keys, try to fix s2n cache permissions

fd847c8

give read permissions to newly created s2n caches

c68fcd2

a different strategy for getting permissions for s2n solver cache

edc3b2d

add -p to mkdir in s2n solver caching

310e34b

Merge remote-tracking branch 'origin/master' into solver-caching-on-ci

55008bd

m-yac added 7 commits September 6, 2023 17:13

fix typo preventing s2n solver caches from getting restored on CI

d54e304

set solver cache timeout to 2s, add print_solver_cache_stats to s2n t…

9a9763f

…ests

change debugging method for solver caching on CI

289adc6

remove timeout in solver caching for debugging purposes

9707e4d

add --clean-solver-cache CLI opt, clean up CI debugging

a6d4bcb

update manual with --clean-solver-cache

4fe74e2

use RME-specific hash for solver versioning

3c71516

m-yac marked this pull request as ready for review September 11, 2023 17:16

m-yac requested review from kquick and RyanGlScott September 11, 2023 17:16

RyanGlScott reviewed Sep 11, 2023

View reviewed changes

.github/workflows/ci.yml Outdated Show resolved Hide resolved

s2nTests/scripts/awslc-entrypoint.sh Outdated Show resolved Hide resolved

m-yac added 2 commits September 12, 2023 11:16

add docs to CACHE_VERSION env vars, add CABAL_ to original for clarity

b042706

add "mismatched_versions" to "clean_solver_cache" command name

a6600cd

rename cleanSolverCache, cleanCacheOpt to mention mismatched versions

498d5d6

RyanGlScott approved these changes Sep 13, 2023

View reviewed changes

m-yac merged commit 5b6db1f into master Sep 14, 2023
36 checks passed

mergify bot deleted the solver-caching-on-ci branch September 14, 2023 16:23

RyanGlScott mentioned this pull request Jan 19, 2024

cabal repl saw fails if cabal build isn't run beforehand #1004

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use solver result caching in CI #1908

Use solver result caching in CI #1908

m-yac commented Aug 14, 2023 •

edited

Loading

m-yac commented Aug 18, 2023

RyanGlScott commented Aug 18, 2023

kquick commented Aug 18, 2023

kquick commented Aug 18, 2023 •

edited

Loading

m-yac commented Sep 5, 2023

RyanGlScott left a comment •

edited

Loading

m-yac commented Sep 12, 2023

m-yac commented Sep 12, 2023 •

edited

Loading

m-yac commented Sep 12, 2023

Use solver result caching in CI #1908

Use solver result caching in CI #1908

Conversation

m-yac commented Aug 14, 2023 • edited Loading

m-yac commented Aug 18, 2023

Non-solver cache related

Solver caching

RyanGlScott commented Aug 18, 2023

kquick commented Aug 18, 2023

kquick commented Aug 18, 2023 • edited Loading

m-yac commented Sep 5, 2023

RyanGlScott left a comment • edited Loading

Choose a reason for hiding this comment

m-yac commented Sep 12, 2023

m-yac commented Sep 12, 2023 • edited Loading

m-yac commented Sep 12, 2023

m-yac commented Aug 14, 2023 •

edited

Loading

kquick commented Aug 18, 2023 •

edited

Loading

RyanGlScott left a comment •

edited

Loading

m-yac commented Sep 12, 2023 •

edited

Loading