HPC modifications from first run #22

Open

vanderwb opened this issue Feb 7, 2023 · 2 comments

vanderwb (Collaborator) commented Feb 7, 2023

After the first run of this tutorial, the following modifications seem useful in the HPC section:

  1. Make sure viewers can run through the example without YAML config files!
  2. Show a comparison of various spill ratios on Casper and give guidance on recommended values (a minimal sketch of setting these thresholds in code follows this list).
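
For item 2, here is a minimal sketch of setting the spill-related worker memory thresholds in code rather than YAML (consistent with item 1). The fractions below are just the distributed defaults used as placeholders, not the Casper recommendations the comparison would produce:

```python
import dask

# Worker memory thresholds, expressed as fractions of the per-worker
# memory limit. These values are placeholders (the distributed defaults),
# not tuned recommendations for Casper.
dask.config.set({
    "distributed.worker.memory.target": 0.60,     # start spilling managed data to disk
    "distributed.worker.memory.spill": 0.70,      # spill based on process memory
    "distributed.worker.memory.pause": 0.80,      # pause scheduling new tasks
    "distributed.worker.memory.terminate": 0.95,  # restart the worker
})
```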

It would also be good to have a new section on analyzing performance metrics in more depth (a case study of a real workflow).

More to come!

dcherian (Contributor) commented Feb 7, 2023

That was a great tutorial. Here are some notes I made:

  • On resource allocation, one thing to think about is thread-based parallelism with Numba and NumPy. It should be possible to request more cores (say 36), use 9 Dask workers with 1 thread each, and then set NUMBA_NUM_THREADS=4 to enable thread parallelism with Numba. I've done this with a LocalCluster on a cheyenne compute node to good effect (see the sketch after these notes). Perhaps this should be in an "Advanced Examples" section somewhere.

    • I see the Nanny now sets these variables to 1, so perhaps we should demo how to override that when you want thread-based parallelism with Numba or NumPy on each Dask worker.
    • I think I did this initially to avoid reading too much data into memory (by limiting the number of Dask workers), and then crunch through the read data quickly since I had so many cores lying idle. This may not be necessary anymore with the scheduling improvements.
  • It may be a good idea to resurrect ncar_jobqueue and add NCARCluster.analyze to make the memory/CPU-time plots you were showing, and NCARCluster.validate to check for known misconfigurations (e.g., more Dask threads than the number of cores requested in the resource spec).
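
A minimal sketch of the thread-parallelism pattern from the first note, assuming a recent dask.distributed where the Nanny's pre-spawn environment is configurable via distributed.nanny.pre-spawn-environ (older releases used the distributed.nanny.environ key instead); the worker and thread counts match the 36-core example above:

```python
import dask
from dask.distributed import Client, LocalCluster

# The Nanny pins thread-count variables (OMP_NUM_THREADS, MKL_NUM_THREADS,
# OPENBLAS_NUM_THREADS) to 1 before spawning each worker. Overriding the
# pre-spawn environment lets Numba kernels fan out within each worker.
dask.config.set({
    "distributed.nanny.pre-spawn-environ.NUMBA_NUM_THREADS": 4,
})

# 9 single-threaded Dask workers x 4 Numba threads each ~= 36 cores,
# i.e. one full cheyenne compute node.
cluster = LocalCluster(n_workers=9, threads_per_worker=1)
client = Client(cluster)
```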

vanderwb (Collaborator, author) commented Feb 7, 2023

Thanks for the notes, Deepak - these look like excellent suggestions. From follow-up questions we've received, it seems we would want to spend more time discussing the following too:

  1. The distinction between cores, ncpus, processes, and workers when running a batch cluster (see the first sketch after this list).
  2. Chunking that spans the time dimension across multiple files with Xarray.
  3. Using blocks that have ghost cells from neighboring blocks (e.g., via map_overlap) - see the second sketch below.
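
For item 1, a hedged sketch of how the dask-jobqueue PBSCluster terms map onto a PBS resource spec; the queue name, sizes, and walltime are placeholders:

```python
from dask_jobqueue import PBSCluster

cluster = PBSCluster(
    cores=4,         # threads Dask may use per PBS job
    processes=4,     # Dask worker processes per job -> 1 thread per worker
    memory="20GB",   # memory per job, divided among its workers
    resource_spec="select=1:ncpus=4:mem=20GB",  # what PBS actually allocates
    queue="casper",             # placeholder queue name
    walltime="01:00:00",
)
cluster.scale(jobs=2)  # 2 PBS jobs x 4 processes = 8 single-threaded workers
```

For items 2 and 3, a sketch using a hypothetical file pattern and variable name. Note that the chunks argument to open_mfdataset is applied file by file, so rechunking after opening is what actually produces time chunks that span file boundaries:

```python
import dask.array as da
import xarray as xr

# Open a multi-file time series; rechunk afterwards so a single chunk can
# span the "time" dimension across file boundaries.
ds = xr.open_mfdataset("ocean_temp_*.nc", combine="by_coords")
ds = ds.chunk({"time": 365})

# map_overlap pads each block with `depth` ghost cells from its neighbors,
# applies the function, then trims the halo off the result.
smoothed = da.map_overlap(
    lambda block: block,   # placeholder for a real stencil computation
    ds["temp"].data,       # hypothetical (time, lat, lon) variable
    depth={1: 1, 2: 1},    # one ghost cell along each spatial axis
    boundary="nearest",
)
```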
