O1.3.4a Ensure that model caches are diagnostic and avoid complicated interdependencies #1098

Open
5 tasks
juliasloan25 opened this issue Nov 23, 2024 · 1 comment
Labels
🏅 SDI Software Design Issue

juliasloan25 commented Nov 23, 2024

The Climate Modeling Alliance

Software Design Issue 📜

Purpose

Currently, both ClimaAtmos and ClimaLand have variables in their caches that, when the models are coupled, depend on the cache of the other model. This complicates coupling: the cache variables are computed correctly only if they are updated in a particular order, and that order is currently ill-defined. Unless these orderings are handled correctly, the component models will be inconsistent with one another at the beginning of our coupled simulations.

In typical simulations we may accept this as spin-up time that we can disregard later on, but for restarts it leads to inexactness: a restarted simulation will produce different results from one that ran without interruption. As we look towards long, computationally intensive runs on Derecho (where we have a walltime limit per run), we will need reliable and exact restarts.
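The restart problem can be illustrated with a toy sketch, written in Python for brevity. Every name, update rule, and coefficient below is hypothetical and is not taken from ClimaAtmos, ClimaLand, or the coupler; the only assumption that matters is that the atmosphere cache reads the land cache left over from the previous update. A checkpoint that stores only the states then has to guess that leftover value on restart, and the trajectories diverge.

```python
def update_caches(atmos_state, land_state, land_cache_prev):
    # Hypothetical rule: the atmosphere cache reads the land cache
    # from the PREVIOUS update, so the result depends on update history.
    atmos_cache = atmos_state - 0.5 * land_cache_prev
    land_cache = land_state + 0.5 * atmos_cache
    return atmos_cache, land_cache


def run(n_steps, atmos_state, land_state, land_cache_guess=0.0):
    # At initialization (or restart) there is no previous land cache,
    # so a placeholder guess has to be used.
    atmos_cache, land_cache = update_caches(atmos_state, land_state, land_cache_guess)
    for _ in range(n_steps):
        # Advance each state using its own cache ...
        atmos_state += 0.1 * atmos_cache
        land_state += 0.1 * land_cache
        # ... then refresh both caches, reusing the old land cache.
        atmos_cache, land_cache = update_caches(atmos_state, land_state, land_cache)
    return atmos_state, land_state


# One uninterrupted 4-step run.
continuous = run(4, 1.0, 2.0)

# Two 2-step runs joined by a checkpoint that stores only the states.
checkpoint = run(2, 1.0, 2.0)
restarted = run(2, *checkpoint)

print("continuous:", continuous)
print("restarted: ", restarted)
```

In this sketch the two final states differ because the restarted run rebuilds its caches from a guess rather than from the value the continuous run carried forward.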

Note that this issue does not come up when restarting individual component models, because we can first read initial conditions from forcing data (which does not rely on the model being run), then use these to compute the state and cache of the model. In this case, all cache variables depend on the known forcing data and state, so they can be computed correctly.

This SDI essentially communicates a new guideline for the role of a cache in our models: where possible, we should remove variables from our caches and replace accesses to them with on-the-fly functional computation, so that whatever remains in a cache is purely diagnostic. This is also better suited to running simulations on GPUs, which have strong computational ability but are memory limited.
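A minimal sketch of the guideline, again in Python and with hypothetical names and coefficients (`surface_flux`, `exchange_coeff`, and the update rule are illustrative assumptions, not the models' actual formulation): the coupled quantity is never stored, it is recomputed from the current states whenever it is needed. Because nothing besides the states carries information between steps, a run split at a checkpoint repeats exactly the same arithmetic as an uninterrupted run.

```python
def surface_flux(atmos_T, land_T, exchange_coeff=0.2):
    # Computed on the fly from the two states; nothing is cached.
    return exchange_coeff * (land_T - atmos_T)


def step(atmos_T, land_T, dt=0.1):
    flux = surface_flux(atmos_T, land_T)
    # The same flux warms the atmosphere and cools the land.
    return atmos_T + dt * flux, land_T - dt * flux


def run(n_steps, atmos_T, land_T):
    for _ in range(n_steps):
        atmos_T, land_T = step(atmos_T, land_T)
    return atmos_T, land_T


# One uninterrupted 4-step run versus two 2-step runs joined by a
# checkpoint that stores only the states.
continuous = run(4, 280.0, 290.0)
restarted = run(2, *run(2, 280.0, 290.0))

print("exact restart:", continuous == restarted)
```

The trade-off is the one listed under costs below: the flux is recomputed at every use instead of being read from memory, which may be slower on CPUs.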

Cost/Benefits/Risks

Costs: developer time, potentially worse CPU performance
Benefits: simplified coupling and model structure, potentially better GPU performance
Risks: we may still end up with interdependence even after cleaning up what we can

People and Personnel

Lead: @juliasloan25 @Sbozzolo
Collaborators: @charleskawczynski @kmdeck @szy21 @trontrytel

Components

  • Reduce ClimaAtmos and ClimaLand caches where possible
  • Identify existing interdependencies between ClimaAtmos and ClimaLand cache variables
    • Note that some interdependencies are inevitable because they are rooted in the physical processes being modeled (e.g. surface albedo and atmospheric radiation feedbacks). We should let the physics guide the order of operations when an ordering is necessary.
  • Identify the minimal set of interdependencies between cache variables
  • Identify roadblocks for coupling within the caches
  • Track how these changes affect performance
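One possible way to approach the identification steps above is to record the interdependencies as a directed graph and let a topological sort either produce an update order or report a cycle. The sketch below uses Python's standard-library `graphlib`; the variable names and dependency edges are hypothetical placeholders, not an inventory of the actual ClimaAtmos or ClimaLand caches.

```python
from graphlib import CycleError, TopologicalSorter


def update_order(dependencies):
    """Return (order, cycle) for a mapping {variable: set of variables it reads}.

    Exactly one of the two returned values is None.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order()), None
    except CycleError as err:
        # graphlib stores the offending cycle as the second exception argument.
        return None, err.args[1]


# Hypothetical acyclic case: albedo -> radiation -> surface energy balance.
acyclic = {
    "land.surface_albedo": set(),
    "atmos.radiative_flux": {"land.surface_albedo"},
    "land.surface_energy_balance": {"atmos.radiative_flux"},
}

# Hypothetical cyclic case: each cache variable reads the other.
cyclic = {
    "atmos.cache_variable": {"land.cache_variable"},
    "land.cache_variable": {"atmos.cache_variable"},
}

order, _ = update_order(acyclic)
_, cycle = update_order(cyclic)

print("update order:", order)
print("cycle found: ", cycle)
```

A reported cycle marks a place where the physics has to decide which variable is updated first, or where one of the variables should be computed on the fly instead of cached.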

Inputs

  • Existing models and coupling

Results and Deliverables

  • Schematic demonstrating the interdependencies between ClimaAtmos and ClimaLand cache variables
  • Identified minimal set of required interdependencies
  • Quantitative results of performance changes on both CPU and GPU due to these changes

SDI Revision Log

22 Nov 2024: SDI created by @juliasloan25

CC

@tapios @sriharshakandala @charleskawczynski @cmbengue

Scope of Work

Tasks

tapios commented Nov 23, 2024

This looks necessary and valuable, and I am looking forward to seeing these changes realized. Please keep me in the loop on conceptual questions (e.g., what we should cache or not, and questions involving order of operations).
