Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add serialised data to ci #338

Merged
merged 18 commits into from
Jan 5, 2024
Merged

Add serialised data to ci #338

merged 18 commits into from
Jan 5, 2024

Conversation

samkellerhals
Copy link
Contributor

@samkellerhals samkellerhals commented Dec 14, 2023

Description

Use the icon grid as the mesh when running cpu and gpu benchmarks.

Note:
A small number of stencils are skipped as when running them with the icon_grid a Fatal Python Error occurs, most likely as the execution domain needs to be restricted or boundary taken into account for those stencils. Also a few datatests are currently failing (maybe due to outdated serialised data?), we should address this in a future PR.

Other changes

  • One precommit yaml file with hooks to run all other precommit configs in each namespace package.

@samkellerhals
Copy link
Contributor Author

cscs-ci run

@samkellerhals
Copy link
Contributor Author

cscs-ci run

@samkellerhals
Copy link
Contributor Author

cscs-ci run

@samkellerhals
Copy link
Contributor Author

cscs-ci run

@samkellerhals
Copy link
Contributor Author

cscs-ci run

@samkellerhals
Copy link
Contributor Author

cscs-ci run

@samkellerhals
Copy link
Contributor Author

cscs-ci run

@samkellerhals
Copy link
Contributor Author

cscs-ci run

@samkellerhals
Copy link
Contributor Author

launch jenkins spack

ci/cscs.yml Outdated
extends: .test_template
stage: benchmark
script:
- pip install dace==$DACE_VERSION
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_cpu --grid=simple_grid
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_cpu --grid=icon_grid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that we use the same test command (stencil_tests) for both test and benchmark stages. This test command was added in your previous PR and contains the flags --cov --cov-append. I think that these coverage flags should not be used in benchmark mode. Maybe, --verbose could also be skipped on this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new command for benchmarks now.

if env_base_path:
return Path(env_base_path)
else:
return common_path.parent.joinpath("serialized_data")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are changing the default path of serialized data (was testdata before). I did a quick git grep testdata and found some places (e.g. README, .gitignore and code comments) that need to be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch thanks, I've updated it.

@edopao
Copy link
Contributor

edopao commented Dec 20, 2023

Another problem I have observed (in baseline benchmarks) is that the benchmark executed a single run per stencil (check columns Rounds).

@edopao
Copy link
Contributor

edopao commented Dec 20, 2023

Another problem I have observed (in baseline benchmarks) is that the benchmark executed a single run per stencil (check columns Rounds).

I found why... we are using benchmark.pedantic with overriding of iterations and rounds here: model/common/src/icon4py/model/common/test_utils/helpers.py

def _test_execution_benchmark(self, pytestconfig, grid, backend, input_data, benchmark):
    if pytestconfig.getoption(
        "--benchmark-disable"
    ):  # skipping as otherwise program calls are duplicated in tests.
        pytest.skip("Test skipped due to 'benchmark-disable' option.")
    else:
        input_data = allocate_data(backend, input_data)
        benchmark.pedantic(
            self.PROGRAM.with_backend(backend),
            args=(),
            kwargs={**input_data, "offset_provider": grid.get_all_offset_providers()},
            iterations=1,
            rounds=1,
        )

We should probably try to avoid benchmark.pedantic and instead use normal benchmark:

diff --git a/model/common/src/icon4py/model/common/test_utils/helpers.py b/model/common/src/icon4py/model/common/test_utils/helpers.py
index 883a0aa7..f7c1445a 100644
--- a/model/common/src/icon4py/model/common/test_utils/helpers.py
+++ b/model/common/src/icon4py/model/common/test_utils/helpers.py
@@ -176,12 +176,10 @@ if pytest_benchmark:
             pytest.skip("Test skipped due to 'benchmark-disable' option.")
         else:
             input_data = allocate_data(backend, input_data)
-            benchmark.pedantic(
+            benchmark(
                 self.PROGRAM.with_backend(backend),
-                args=(),
-                kwargs={**input_data, "offset_provider": grid.get_all_offset_providers()},
-                iterations=1,
-                rounds=1,
+                **input_data,
+                offset_provider=grid.get_all_offset_providers()
             )
 
 else:

@samkellerhals
Copy link
Contributor Author

cscs-ci run

@samkellerhals
Copy link
Contributor Author

Another problem I have observed (in baseline benchmarks) is that the benchmark executed a single run per stencil (check columns Rounds).

I found why... we are using benchmark.pedantic with overriding of iterations and rounds here: model/common/src/icon4py/model/common/test_utils/helpers.py

def _test_execution_benchmark(self, pytestconfig, grid, backend, input_data, benchmark):
    if pytestconfig.getoption(
        "--benchmark-disable"
    ):  # skipping as otherwise program calls are duplicated in tests.
        pytest.skip("Test skipped due to 'benchmark-disable' option.")
    else:
        input_data = allocate_data(backend, input_data)
        benchmark.pedantic(
            self.PROGRAM.with_backend(backend),
            args=(),
            kwargs={**input_data, "offset_provider": grid.get_all_offset_providers()},
            iterations=1,
            rounds=1,
        )

We should probably try to avoid benchmark.pedantic and instead use normal benchmark:

diff --git a/model/common/src/icon4py/model/common/test_utils/helpers.py b/model/common/src/icon4py/model/common/test_utils/helpers.py
index 883a0aa7..f7c1445a 100644
--- a/model/common/src/icon4py/model/common/test_utils/helpers.py
+++ b/model/common/src/icon4py/model/common/test_utils/helpers.py
@@ -176,12 +176,10 @@ if pytest_benchmark:
             pytest.skip("Test skipped due to 'benchmark-disable' option.")
         else:
             input_data = allocate_data(backend, input_data)
-            benchmark.pedantic(
+            benchmark(
                 self.PROGRAM.with_backend(backend),
-                args=(),
-                kwargs={**input_data, "offset_provider": grid.get_all_offset_providers()},
-                iterations=1,
-                rounds=1,
+                **input_data,
+                offset_provider=grid.get_all_offset_providers()
             )
 
 else:

So the reason I fixed the iterations to 1 was to see whether that would decrease the runtime of the benchmarks as they seemed to be taking very long to run. But I agree that having only limited iterations makes benchmarks less useful. I think we still need to find a way to decrease the runtime, for example by making sure we only compile stencils once. Let's address that soon.

@samkellerhals
Copy link
Contributor Author

cscs-ci run

.gitignore Outdated
@@ -4,7 +4,7 @@ _local
_external_src
_reports
tmp
testdata
serialized_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are also the grid files that are used in the test_gridmanager.pyin that same folder: serialized data get downloaded to testdata/ser_icondata, grid files to testdata/grids, so I would rather not refer to serialized... in the top level folder name.

Copy link
Contributor Author

@samkellerhals samkellerhals Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving it as testdata then makes sense, wasn't aware we store anything else other than the serialized test data.

extends: .test_template
stage: benchmark
script:
- pip install dace==$DACE_VERSION
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_cpu --grid=simple_grid
- tox -r -e run_benchmarks -c model/ -- --benchmark-only --backend=dace_cpu --grid=icon_grid
only:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering: does our cscs-ci run count as manual trigger? I guess so... Also what does the only: -main refer to the target branch of the PR or are these options ignored in our setup since we run it from outside gitlab?

(don't know how it works and currentlyit runs always all of the jobs and the benchmarks take quite long... once we add the datatest that will get worse, response time wise...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to be honest since @edopao added these dace jobs, maybe he can explain more. I would assume these benchmarks run only on main but have to be manually triggered, how I am not sure. Currently the dace jobs seem to be not run when using cscs-ci run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As commented in today's standup meeting, the intention of this setting was to run the dace benchmark on main after PR is merged. However, this setting is ignored in our setup, as also noted above. I agree that we could have a separate CI pipeline for benchmarking, automatically triggered after PR is merged or by a daily job.

@@ -81,7 +81,12 @@ def reference(
return dict(theta_v=theta_v, exner=exner)

@pytest.fixture
def input_data(self, grid):
def input_data(self, grid, uses_icon_grid_with_otf):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have merged the verification of the global (EXCLAIM Aquaplanet) run, that means there is an additional serialized dataset (which for the datatest we would need to upload to the server) but it contains a global grid. Maybe using that one instead of the mch_ch_r04b09 local experiment would solve some of these issues. Lets discuss this...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sounds good, let's discuss it tomorrow, uploading to the server should be relatively straightforward, Andreas can help us.

@@ -35,21 +35,6 @@ jobs:
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r ./requirements-dev.txt
python -m pip list
- name: Run checks in icon4pytools
- name: Run checks
run: |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice...

ci/cscs.yml Outdated
extends: .test_template
stage: benchmark
script:
- pip install dace==$DACE_VERSION
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_gpu --grid=simple_grid
- tox -r -e run_benchmarks -c model/ -- --benchmark-only --backend=dace_gpu --grid=icon_grid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add the --benchmark-only for the run_benchmarks and the --benchmark-skip for the run_stencil_tests in the tox.ini file instead of having the repeatedly here in CI jobs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@samkellerhals
Copy link
Contributor Author

cscs-ci run

@halungge
Copy link
Contributor

halungge commented Jan 3, 2024

So the reason I fixed the iterations to 1 was to see whether that would decrease the runtime of the benchmarks as they seemed to be taking very long to run. But I agree that having only limited iterations makes benchmarks less useful. I think we still need to find a way to decrease the runtime, for example by making sure we only compile stencils once. Let's address that soon.

Agree. Maybe we could also split the pipelines? what test jobs does it make sense to run for each PR? benchmarks? how many backends, ... Are there some that we could run in scheduled daily builds?

@samkellerhals
Copy link
Contributor Author

cscs-ci run

@samkellerhals
Copy link
Contributor Author

cscs-ci run

Copy link

github-actions bot commented Jan 3, 2024

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run
  • launch jenkins spack

Optional Tests

In case your change might affect downstream icon-exclaim, please consider running

  • launch jenkins icon

For more detailed information please look at CI in the EXCLAIM universe.

@samkellerhals
Copy link
Contributor Author

cscs-ci run

@samkellerhals
Copy link
Contributor Author

launch jenkins spack

@samkellerhals
Copy link
Contributor Author

samkellerhals commented Jan 4, 2024

@edopao @halungge I've opened this issue to keep track of the recompilation problem which I think we should solve at some point regardless of whether we move the benchmarking into a separate daily CI pipeline or not.

Copy link
Contributor

@halungge halungge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small comment. There are some -- without option in the cscs.yml, I guess they can be deleted, or do they serve any purpose?

extends: .test_template
stage: benchmark
script:
- pip install dace==$DACE_VERSION
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_gpu --grid=simple_grid
- tox -r -e run_benchmarks -c model/ -- --backend=dace_gpu --grid=icon_grid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need this double double-dashes? Or did you simply forget to delete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the double dashes are used to denote the end of arguments passed to tox itself, and that any subsequent arguments are to be treated as positional arguments passed to whatever command tox invokes, in this case pytest

@samkellerhals samkellerhals requested a review from halungge January 4, 2024 16:45
@@ -49,14 +52,14 @@ test_model_job_roundtrip_simple_grid:
extends: .test_template
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving my 2 cents here: All of these jobs could be easily expressed using https://docs.gitlab.com/ee/ci/yaml/#needsparallelmatrix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try this in a new PR.

Copy link
Contributor

@edopao edopao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving this PR, since its original objective (adding serialized data) is achieved. Agreed to open a new PR, after this one is merged, to pick up some leftovers:

  • create a separate CI pipeline for the benchmark jobs
  • remove only:main from dace jobs
  • implement Till’s comment about using needs:parallel:matrix

Copy link
Contributor

@halungge halungge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, let's add further improvements in a follow up PR.

@samkellerhals samkellerhals merged commit 588987f into main Jan 5, 2024
5 checks passed
@samkellerhals samkellerhals deleted the serialised-data-ci branch January 5, 2024 08:48
OngChia pushed a commit that referenced this pull request Jan 14, 2025
Modified CI pipeline to run benchmarks on the icon grid using serialised data, and simplified pre-commit execution.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants