Commit 95f1f71

Merge pull request #101 from dfe-analytical-services/databricks-personal-clusters

Adding updated cluster guidance for consistency with SQL warehouse guidance

jen-machin authored Sep 18, 2024 · 2 parents a5bbc11 + 9630bb3

Showing changes to ADA/databricks_fundamentals.qmd, one of 4 changed files with 182 additions and 98 deletions.
- [Setup Databricks SQL Warehouse with RStudio](/ADA/databricks_rstudio_sql_warehouse.html)
- [Setup Databricks Personal Compute cluster with RStudio](/ADA/databricks_rstudio_personal_cluster.html)

---

### Creating a personal compute resource

------------------------------------------------------------------------

1. To create your own personal compute resource, click the 'Create with DfE Personal Compute' button on the 'Compute' page\

![](/images/ada-compute-personal.png)

2. You'll then be presented with a screen to configure the cluster. There are two options under the 'Performance' section which you will want to pay attention to: the Databricks runtime version and the node type\
\
**Databricks runtime version** - This is the version of the Databricks software that will be present on your compute resource. Generally, it is recommended that you use the latest LTS (long-term support) version. At the time of writing this is '15.4 LTS'\
\
**Node type** - This option determines how powerful your cluster is. There are two options available by default:\

- Standard 14GB 4-Core Nodes\
- Large 28GB 8-Core Nodes\
\
If you require a larger personal cluster, this can be requested from the ADA team.\
\
![](/images/ada-compute-personal-create.png)

3. Click the 'Create compute' button at the bottom of the page. This will create your personal cluster and begin starting it up. This usually takes around 5 minutes\
\
![](/images/ada-compute-personal-create-button.png)

4. Once the cluster is up and running, the icon under the 'State' header on the 'Compute' page will appear as a green tick\
\
![](/images/ada-compute-ready.png)
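The UI steps above can also be expressed as a cluster specification for the Databricks Clusters REST API. The sketch below is illustrative only: the field names follow the Clusters API, but the specific values (cluster name, runtime version string, node type ID) are assumptions and will differ in your workspace, where the DfE Personal Compute policy sets most of them for you.

```python
import json

# Hypothetical cluster spec mirroring the UI choices above.
# Values are illustrative assumptions, not DfE-confirmed settings.
cluster_spec = {
    "cluster_name": "my-personal-compute",   # assumption: any name you choose
    "spark_version": "15.4.x-scala2.12",     # the '15.4 LTS' runtime (assumed ID format)
    "node_type_id": "Standard_DS3_v2",       # assumption: a 'Standard 14GB 4-Core' equivalent
    "num_workers": 0,                        # personal compute runs as a single node
    "autotermination_minutes": 60,           # shut down after an hour of inactivity
}

payload = json.dumps(cluster_spec, indent=2)
print(payload)
```

In practice the policy-driven UI is the supported route; this payload is only meant to make explicit what the 'Create compute' button configures on your behalf.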

::: callout-note
## Clusters will shut down after being idle for an hour

Use of compute resources is charged by the hour, so personal clusters have been set to shut down after being unused for an hour to prevent unnecessary cost to the Department.
:::

::: callout-important
## Packages and libraries

As mentioned above, compute resources have no storage of their own. This means that if you install libraries or packages onto a cluster, they will only remain installed until the cluster is stopped. Once restarted, those libraries will need to be installed again.

An alternative is to specify packages/libraries to be installed on the cluster at start-up. To do this, click the name of your cluster on the 'Compute' page, then go to the 'Libraries' tab and click the 'Install new' button.

Certain packages are installed by default on personal clusters and do not need to be installed manually. The specific packages installed depend on the Databricks Runtime (DBR) version your cluster is set up with. A comprehensive list of packages included in each DBR version is available in the [Databricks documentation](https://learn.microsoft.com/en-us/azure/databricks/release-notes/runtime/).
:::
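For reference, what the 'Install new' button configures can be described as a request body for the Databricks Libraries API (`POST /api/2.0/libraries/install`). This is a sketch under assumptions: the `cluster_id` and the package names are placeholders, not values taken from this guidance.

```python
import json

# Hypothetical request body for the Databricks Libraries API.
# The cluster_id and package names below are placeholders.
install_request = {
    "cluster_id": "0918-123456-abcdefgh",  # assumption: your cluster's ID from the 'Compute' page
    "libraries": [
        {"cran": {"package": "dplyr"}},    # an R package from CRAN
        {"pypi": {"package": "pandas"}},   # a Python package from PyPI
    ],
}

print(json.dumps(install_request, indent=2))
```

Libraries registered this way are re-applied each time the cluster starts, which is the same effect as adding them on the 'Libraries' tab.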

Once you have a compute resource you can begin using Databricks. You can do this either by connecting to Databricks from RStudio, or by coding in the Databricks platform using scripts or [Notebooks](/ADA/databricks_notebooks.html).