Commit

Reapply reproducibility improvements + unit tests
debug

reproducibility_tests/print_new_mse.jl -> mse_summary
charleskawczynski committed Dec 18, 2024
1 parent 15469de commit 4edab0a
Showing 15 changed files with 1,519 additions and 829 deletions.
8 changes: 4 additions & 4 deletions .buildkite/pipeline.yml
@@ -56,11 +56,11 @@ steps:

   - wait

-  - group: "Reproducibility tests"
+  - group: "Reproducibility infrastructure"
     steps:

-      - label: ":computer: Ensure mse tables are reset when necessary"
-        command: "julia --color=yes --project=examples reproducibility_tests/test_reset.jl"
+      - label: ":computer: Test reproducibility infrastructure"
+        command: "julia --color=yes --project=examples test/unit_reproducibility_infra.jl"

   - group: "Radiation"
     steps:
@@ -1211,7 +1211,7 @@ steps:
     continue_on_failure: true

   - label: ":robot_face: Print new mse tables"
-    command: "julia --color=yes --project=examples reproducibility_tests/print_new_mse.jl"
+    command: "julia --color=yes --project=examples reproducibility_tests/mse_summary.jl"

   - label: ":robot_face: Print new reference counter"
     command: "julia --color=yes --project=examples reproducibility_tests/print_new_ref_counter.jl"
27 changes: 13 additions & 14 deletions examples/hybrid/driver.jl
@@ -85,7 +85,13 @@ end

 # Check if selected output has changed from the previous recorded output (bit-wise comparison)
 include(
-    joinpath(@__DIR__, "..", "..", "reproducibility_tests", "mse_tables.jl"),
+    joinpath(
+        @__DIR__,
+        "..",
+        "..",
+        "reproducibility_tests",
+        "reproducibility_test_job_ids.jl",
+    ),
 )
 if config.parsed_args["reproducibility_test"]
     # Test results against main branch
@@ -95,21 +101,14 @@ if config.parsed_args["reproducibility_test"]
             "..",
             "..",
             "reproducibility_tests",
-            "reproducibility_tests.jl",
+            "reproducibility_tools.jl",
         ),
     )
-    @testset "Test reproducibility table entries" begin
-        mse_keys = sort(collect(keys(all_best_mse[simulation.job_id])))
-        pcs = collect(Fields.property_chains(sol.u[end]))
-        for prop_chain in mse_keys
-            @test prop_chain in pcs
-        end
-    end
-    perform_reproducibility_tests(
-        simulation.job_id,
+    export_reproducibility_results(
         sol.u[end],
-        all_best_mse,
-        simulation.output_dir,
+        config.comms_ctx;
+        job_id = simulation.job_id,
+        computed_dir = simulation.output_dir,
     )
 end

@@ -145,7 +144,7 @@ if ClimaComms.iamroot(config.comms_ctx)
         ),
     )
     @info "Plotting"
-    paths = latest_comparable_paths() # __build__ path (not job path)
+    paths = latest_comparable_dirs() # __build__ path (not job path)
     if isempty(paths)
         make_plots(Val(Symbol(reference_job_id)), simulation.output_dir)
     else
8 changes: 4 additions & 4 deletions reproducibility_tests/README.md
@@ -57,7 +57,7 @@ To update the mse tables:
 - Click the *Print new mse tables* buildkite job
 - Click the *Running commands* entry in the *Log* tab
 - Copy this output until `-- DO NOT COPY --`
-- Paste these contents into `reproducibility_tests/mse_tables.jl`
+- Paste these contents into `reproducibility_tests/reproducibility_test_job_ids.jl`
 - Add, commit, and push these changes.

 ## Adding a new reproducibility test
@@ -66,7 +66,7 @@ To add a new reproducibility test:

 - Set the command-line `reproducibility_test` to true, and add `julia --color=yes --project=examples reproducibility_tests/test_mse.jl --job_id [job_id] --out_dir [job_id]` as a separate command for the new (or existing) job
 - Copy the `all_best_mse` dict template from the job's log
-- Paste the `all_best_mse` dict template into `reproducibility_test/mse_tables.jl`
+- Paste the `all_best_mse` dict template into `reproducibility_test/reproducibility_test_job_ids.jl`

 <!-- TODO: improve names / mark off sections for all_best_mse dict -->

@@ -93,15 +93,15 @@ We cannot (easily) compare the output with a reference if we change the spatial
 Reprodicibility tests are performed at the end of `examples/hybrid/driver.jl`, after a simulation completes, and relies on a unique job id (`job_id`). Here is an outline of the reproducibility test procedure:

 0) Run a simulation, with a particular `job_id`, to the final time.
-1) Load a dictionary, `all_best_mse`, of previous "best" mean-squared errors from `mse_tables.jl` and extract the mean squared errors for the given `job_id` (store in job-specific dictionary, `best_mse`).
+1) Load a list of job IDs in `reproducibility_test_job_ids.jl`.
 2) Export the solution (a `FieldVector`) at the final simulation time to an `NCDataset` file.
 3) Compute the errors between the exported solution and the exported solution from the reference `NCDataset` files (which are saved in a dedicated folders on the Caltech Central cluster) and save into a dictionary, called `computed_mse`.
 4) Export this dictionary (`computed_mse`) to the output folder
 5) Test that `computed_mse` is no worse than `best_mse` (determines if reproducibility test passes or not).

 After these steps are performed at the end of the driver, additional jobs are run:

-1) Print `computed_mse` for all jobs to make updating `reproducibility_tests/mse_tables.jl` easy
+1) Print `computed_mse` for all jobs
 2) If we're on the github queue merging branch (all tests have passed, and the PR is effectively merging), move the `NCDataset`s from the scratch directory onto the dedicated folder on the Caltech Central cluster.

 ## How we track which dataset to compare against
214 changes: 0 additions & 214 deletions reproducibility_tests/compute_mse.jl

This file was deleted.
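
For orientation, the comparison described in the README outline (steps 3 and 5: compute `computed_mse` against a reference, then check that it is no worse than `best_mse`) can be sketched roughly as follows. This is only an illustration, not the contents of the deleted file or of the new tooling: the function names `mse`, `compute_mse_table`, and `no_worse_than` are hypothetical, and plain `Dict`s of vectors stand in for the `NCDataset` files.

```julia
# Hypothetical sketch; none of these function names come from the repository.
# A Dict mapping variable name => Vector{Float64} stands in for an NCDataset.

# Mean-squared error between two equally sized arrays.
function mse(computed::AbstractArray, reference::AbstractArray)
    size(computed) == size(reference) ||
        throw(DimensionMismatch("computed and reference differ in size"))
    return sum(abs2, computed .- reference) / length(computed)
end

# Per-variable MSE for every variable present in both datasets.
function compute_mse_table(computed::Dict, reference::Dict)
    shared = sort(collect(intersect(keys(computed), keys(reference))))
    return Dict(name => mse(computed[name], reference[name]) for name in shared)
end

# True when no variable's error exceeds its previously recorded best.
# Assumes every key of best_mse also appears in computed_mse.
function no_worse_than(computed_mse::Dict, best_mse::Dict)
    return all(computed_mse[name] <= best_mse[name] for name in keys(best_mse))
end

# Toy data: "rho" reproduces exactly, "T" differs in one entry.
reference = Dict("rho" => [1.0, 2.0, 3.0], "T" => [280.0, 290.0])
computed = Dict("rho" => [1.0, 2.0, 3.0], "T" => [280.0, 292.0])

computed_mse = compute_mse_table(computed, reference)
best_mse = Dict("rho" => 0.0, "T" => 0.0)

println(computed_mse)
println("passes: ", no_worse_than(computed_mse, best_mse))
```

With these toy inputs the check fails, because `T` no longer matches the reference bit-for-bit while its recorded best error is zero.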
