Merge pull request #21 from vanderwb/main
Final updates for HPC notebooks
vanderwb authored Feb 6, 2023
2 parents 2cf94b7 + 2f38ee6 commit 54a76a6
Showing 2 changed files with 20 additions and 9 deletions.
16 changes: 8 additions & 8 deletions notebooks/05-dask-hpc.ipynb
@@ -55,7 +55,7 @@
"\n",
"Using Dask on an HPC system is no different - we need to interact with the scheduler to provide Dask with ample compute resources. We *could* first start a job with multiple cores and a large amount of memory, and then use the **LocalCluster** to spawn workers. However, this approach only scales to a single node.\n",
"\n",
"The typical approach is to let Dask request resources directly from the job scheduler via a scheduler-specific cluster type. Such clusters are provided by the add-on `dask-joqueue` package."
"The typical approach is to let Dask request resources directly from the job scheduler via a scheduler-specific cluster type. Such clusters are provided by the add-on `dask-jobqueue` package."
]
},
{
@@ -65,7 +65,7 @@
"source": [
"### Creating a scheduled-cluster\n",
"\n",
"Since we use the PBS Pro scheduler at NCAR, we will use the **PBSCluster** Dask scheduler from `dask-joqueue`. Initialization is similar to a **LocalCluster**, but with unique parameters specific to creating batch jobs."
"Since we use the PBS Pro scheduler at NCAR, we will use the **PBSCluster** Dask scheduler from `dask-jobqueue`. Initialization is similar to a **LocalCluster**, but with unique parameters specific to creating batch jobs."
]
},
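For context, here is a minimal sketch of the **PBSCluster** setup this cell describes; the queue, account code, and resource values below are placeholders, not the notebook's actual settings.

```python
# Minimal PBSCluster sketch (queue/account/resource values are placeholders)
from dask_jobqueue import PBSCluster
from dask.distributed import Client

cluster = PBSCluster(
    queue="casper",           # PBS queue for worker jobs (site-specific)
    account="PROJECT0001",    # hypothetical project/account code
    cores=1,                  # cores per worker
    memory="4GiB",            # memory per worker
    processes=1,              # Dask workers spawned per PBS job
    walltime="01:00:00",      # wall-clock limit of each worker job
)

cluster.scale(4)              # submit enough PBS jobs to run 4 workers
client = Client(cluster)
```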
{
@@ -132,7 +132,7 @@
"* Can speed up file-reads in some situations\n",
"\n",
"**Using more workers per job will:**\n",
"* Allow for multithreading speedups in thread-friendly workflows\n",
"* Less overhead in thread-friendly workflows\n",
"* May allow for slightly higher memory thresholds since they will share a pool"
]
},
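In `dask-jobqueue`, this tradeoff is controlled by the `processes` argument relative to `cores`; a sketch under assumed resource values, with other required arguments omitted for brevity:

```python
# More workers per job: 4 single-threaded workers sharing one PBS job's pool
many_workers = PBSCluster(cores=4, processes=4, memory="16GiB")

# Fewer workers per job: 1 worker with 4 threads, for thread-friendly workflows
many_threads = PBSCluster(cores=4, processes=1, memory="16GiB")
```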
@@ -215,7 +215,7 @@
"source": [
"### Live Performance Monitoring\n",
"\n",
"Using `dask.distributed` provides us with a powerful diagnostic tool you have already seen: the *Dashboard*. The Dashboard can be integrated into your Jupyter environment in two ways - either with a separate website accessible from the Client widgit, or as tabs in your JupyterLab interface via the `dask-labextension` add-on.\n",
"Using `dask.distributed` provides us with a powerful diagnostic tool you have already seen: the *Dashboard*. The Dashboard can be integrated into your Jupyter environment in two ways - either with a separate website accessible from the Client widget, or as tabs in your JupyterLab interface via the `dask-labextension` add-on.\n",
"\n",
"**Tip:** JupyterLab Dashboard tabs can be saved as a \"workspace\" and loaded back in future sessions.\n",
"\n",
@@ -303,7 +303,7 @@
"source": [
"#### Dashboard demo: multi-file Xarray data analysis\n",
"\n",
"To demonstrate how the Dashboard can be useful, let's do some simply analysis of data files using Xarray. Here we load 8 days of GOES5 data, and compute the mean near-surface temperature across the western US."
"To demonstrate how the Dashboard can be useful, let's do some simple analysis of data files using Xarray. Here we load 19 days of GOES5 data, and compute the mean near-surface temperature across the western US."
]
},
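The data-loading cell itself is collapsed in this diff; as a sketch, a lazy multi-file open might look like the following (the file glob is hypothetical):

```python
# Open many daily files lazily as one dataset (the path pattern is made up)
import xarray as xr

ds = xr.open_mfdataset(
    "/glade/scratch/USER/goes5/*.nc",  # hypothetical location of the daily files
    combine="by_coords",
    parallel=True,                     # let Dask workers open files concurrently
)
```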
{
@@ -362,7 +362,7 @@
"id": "37816544-50c9-42b2-835c-ce24a650f3c5",
"metadata": {},
"source": [
"It looks like our data will fit into RAM, but we can verify using the Dashboard. Let's construct our computation. Here we do the following:\n",
"It looks like our data chunks will fit into RAM, but we can verify using the Dashboard. Let's construct our computation. Here we do the following:\n",
"1. Subset the \"western US\" from the data via lat/lon slices\n",
"2. Take the mean of temperature values across our western US box\n",
"3. Select the near-surface level (0)\n",
@@ -482,7 +482,7 @@
"source": [
"%%time\n",
"# Since metrics are captured live anyway, the overhead from the report is small\n",
"with performance_report(filename=\"dask-pr.html\"):\n",
"with performance_report(filename=\"dask-report.html\"):\n",
" result = sfc_mean_graph.compute()"
]
},
@@ -600,7 +600,7 @@
"\n",
"Sometimes you will need to compute multiple parameters on data from Dask objects. Using `.persist()` to store intermediate data in worker memory can save computational time if used appropriately. The raw data can be persisted too, of course, but watch out for exhausting worker memory.\n",
"\n",
"Here we compare the time it takes - with and without persisting intermediate results - to compute our level-0 mean and a mean across all model levels.\n",
"Here we compare the time it takes - with and without persisting intermediate results - to compute our level-0 mean, a level-10 mean, and a mean across all model levels.\n",
"\n",
"We will also introduce another diagnostic tool here, the `MemorySampler` context manager."
]
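As a sketch of that comparison (the temperature variable name is assumed), `MemorySampler` records worker memory for each labeled block:

```python
# Persist the data once, then reuse it for multiple reductions
from distributed.diagnostics import MemorySampler

ms = MemorySampler()
temps = ds["T"].persist()          # intermediate data held in worker memory

with ms.sample("with-persist"):
    lev0_mean = temps.isel(lev=0).mean().compute()
    lev10_mean = temps.isel(lev=10).mean().compute()
    all_lev_mean = temps.mean().compute()

ms.plot(align=True)                # compare memory traces across sampled runs
```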
13 changes: 12 additions & 1 deletion notebooks/06-dask-chunking.ipynb
@@ -505,7 +505,8 @@
"outputs": [],
"source": [
"ds = xr.open_dataset(my_file, chunks = {'Time' : 1, \"num_metgrid_levels\" : 16,\n",
" \"south_north\" : 355, \"east_west\" : 355})"
" \"south_north\" : 355, \"east_west\" : 355})\n",
"ds.CLDFRA"
]
},
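Displaying `ds.CLDFRA` shows the resulting chunk layout; as a sketch, the same information is available programmatically:

```python
# Inspect the chunking chosen at open time
print(ds.CLDFRA.chunks)              # chunk sizes along each dimension
print(ds.CLDFRA.nbytes / 1e9, "GB")  # total size, to weigh against worker memory
```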
{
@@ -785,6 +786,16 @@
"mse_graph.compute()"
]
},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "be018325-ef95-424a-9bf2-24d33a26c640",
+"metadata": {},
+"outputs": [],
+"source": [
+"client.shutdown()"
+]
+},
{
"cell_type": "markdown",
"id": "024686ad-6bfa-48ba-bd85-d005f9477902",
