Merge pull request #21 from vanderwb/main
Final updates for HPC notebooks
vanderwb authored Feb 6, 2023
2 parents 2cf94b7 + 2f38ee6 commit 54a76a6
Showing 2 changed files with 20 additions and 9 deletions.
16 changes: 8 additions & 8 deletions notebooks/05-dask-hpc.ipynb
@@ -55,7 +55,7 @@
"\n",
"Using Dask on an HPC system is no different - we need to interact with the scheduler to provide Dask with ample compute resources. We *could* first start a job with multiple cores and a large amount of memory, and then use the **LocalCluster** to spawn workers. However, this approach only scales to a single node.\n",
"\n",
"The typical approach is to let Dask request resources directly from the job scheduler via a scheduler-specific cluster type. Such clusters are provided by the add-on `dask-joqueue` package."
"The typical approach is to let Dask request resources directly from the job scheduler via a scheduler-specific cluster type. Such clusters are provided by the add-on `dask-jobqueue` package."
]
},
{
@@ -65,7 +65,7 @@
"source": [
"### Creating a scheduled-cluster\n",
"\n",
"Since we use the PBS Pro scheduler at NCAR, we will use the **PBSCluster** Dask scheduler from `dask-joqueue`. Initialization is similar to a **LocalCluster**, but with unique parameters specific to creating batch jobs."
"Since we use the PBS Pro scheduler at NCAR, we will use the **PBSCluster** Dask scheduler from `dask-jobqueue`. Initialization is similar to a **LocalCluster**, but with unique parameters specific to creating batch jobs."
]
},
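For context, here is a minimal sketch of the **PBSCluster** setup this cell describes; the queue, account code, and resource values below are placeholders, not the notebook's actual settings.

```python
# Minimal PBSCluster sketch (queue/account/resource values are placeholders)
from dask_jobqueue import PBSCluster
from dask.distributed import Client

cluster = PBSCluster(
    queue="casper",           # PBS queue for worker jobs (site-specific)
    account="PROJECT0001",    # hypothetical project/account code
    cores=1,                  # cores per worker
    memory="4GiB",            # memory per worker
    processes=1,              # Dask workers spawned per PBS job
    walltime="01:00:00",      # wall-clock limit of each worker job
)

cluster.scale(4)              # submit enough PBS jobs to run 4 workers
client = Client(cluster)
```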
{
@@ -132,7 +132,7 @@
"* Can speed up file-reads in some situations\n",
"\n",
"**Using more workers per job will:**\n",
"* Allow for multithreading speedups in thread-friendly workflows\n",
"* Less overhead in thread-friendly workflows\n",
"* May allow for slightly higher memory thresholds since they will share a pool"
]
},
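In `dask-jobqueue`, this tradeoff is controlled by the `processes` argument relative to `cores`; a sketch under assumed resource values, with other required arguments omitted for brevity:

```python
# More workers per job: 4 single-threaded workers sharing one PBS job's pool
many_workers = PBSCluster(cores=4, processes=4, memory="16GiB")

# Fewer workers per job: 1 worker with 4 threads, for thread-friendly workflows
many_threads = PBSCluster(cores=4, processes=1, memory="16GiB")
```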
@@ -215,7 +215,7 @@
"source": [
"### Live Performance Monitoring\n",
"\n",
"Using `dask.distributed` provides us with a powerful diagnostic tool you have already seen: the *Dashboard*. The Dashboard can be integrated into your Jupyter environment in two ways - either with a separate website accessible from the Client widgit, or as tabs in your JupyterLab interface via the `dask-labextension` add-on.\n",
"Using `dask.distributed` provides us with a powerful diagnostic tool you have already seen: the *Dashboard*. The Dashboard can be integrated into your Jupyter environment in two ways - either with a separate website accessible from the Client widget, or as tabs in your JupyterLab interface via the `dask-labextension` add-on.\n",
"\n",
"**Tip:** JupyterLab Dashboard tabs can be saved as a \"workspace\" and loaded back in future sessions.\n",
"\n",
@@ -303,7 +303,7 @@
"source": [
"#### Dashboard demo: multi-file Xarray data analysis\n",
"\n",
"To demonstrate how the Dashboard can be useful, let's do some simply analysis of data files using Xarray. Here we load 8 days of GOES5 data, and compute the mean near-surface temperature across the western US."
"To demonstrate how the Dashboard can be useful, let's do some simple analysis of data files using Xarray. Here we load 19 days of GOES5 data, and compute the mean near-surface temperature across the western US."
]
},
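The data-loading cell itself is collapsed in this diff; as a sketch, a lazy multi-file open might look like the following (the file glob is hypothetical):

```python
# Open many daily files lazily as one dataset (the path pattern is made up)
import xarray as xr

ds = xr.open_mfdataset(
    "/glade/scratch/USER/goes5/*.nc",  # hypothetical location of the daily files
    combine="by_coords",
    parallel=True,                     # let Dask workers open files concurrently
)
```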
{
@@ -362,7 +362,7 @@
"id": "37816544-50c9-42b2-835c-ce24a650f3c5",
"metadata": {},
"source": [
"It looks like our data will fit into RAM, but we can verify using the Dashboard. Let's construct our computation. Here we do the following:\n",
"It looks like our data chunks will fit into RAM, but we can verify using the Dashboard. Let's construct our computation. Here we do the following:\n",
"1. Subset the \"western US\" from the data via lat/lon slices\n",
"2. Take the mean of temperature values across our western US box\n",
"3. Select the near-surface level (0)\n",
@@ -482,7 +482,7 @@
"source": [
"%%time\n",
"# Since metrics are captured live anyway, the overhead from the report is small\n",
"with performance_report(filename=\"dask-pr.html\"):\n",
"with performance_report(filename=\"dask-report.html\"):\n",
" result = sfc_mean_graph.compute()"
]
},
@@ -600,7 +600,7 @@
"\n",
"Sometimes you will need to compute multiple parameters on data from Dask objects. Using `.persist()` to store intermediate data in worker memory can save computational time if used appropriately. The raw data can be persisted too, of course, but watch out for exhausting worker memory.\n",
"\n",
"Here we compare the time it takes - with and without persisting intermediate results - to compute our level-0 mean and a mean across all model levels.\n",
"Here we compare the time it takes - with and without persisting intermediate results - to compute our level-0 mean, a level-10 mean, and a mean across all model levels.\n",
"\n",
"We will also introduce another diagnostic tool here, the `MemorySampler` context manager."
]
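As a sketch of that comparison (the temperature variable name is assumed), `MemorySampler` records worker memory for each labeled block:

```python
# Persist the data once, then reuse it for multiple reductions
from distributed.diagnostics import MemorySampler

ms = MemorySampler()
temps = ds["T"].persist()          # intermediate data held in worker memory

with ms.sample("with-persist"):
    lev0_mean = temps.isel(lev=0).mean().compute()
    lev10_mean = temps.isel(lev=10).mean().compute()
    all_lev_mean = temps.mean().compute()

ms.plot(align=True)                # compare memory traces across sampled runs
```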
13 changes: 12 additions & 1 deletion notebooks/06-dask-chunking.ipynb
@@ -505,7 +505,8 @@
"outputs": [],
"source": [
"ds = xr.open_dataset(my_file, chunks = {'Time' : 1, \"num_metgrid_levels\" : 16,\n",
" \"south_north\" : 355, \"east_west\" : 355})"
" \"south_north\" : 355, \"east_west\" : 355})\n",
"ds.CLDFRA"
]
},
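Displaying `ds.CLDFRA` shows the resulting chunk layout; as a sketch, the same information is available programmatically:

```python
# Inspect the chunking chosen at open time
print(ds.CLDFRA.chunks)              # chunk sizes along each dimension
print(ds.CLDFRA.nbytes / 1e9, "GB")  # total size, to weigh against worker memory
```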
{
@@ -785,6 +786,16 @@
"mse_graph.compute()"
]
},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "be018325-ef95-424a-9bf2-24d33a26c640",
+"metadata": {},
+"outputs": [],
+"source": [
+"client.shutdown()"
+]
+},
{
"cell_type": "markdown",
"id": "024686ad-6bfa-48ba-bd85-d005f9477902",
