---
subcategory: "Compute"
---
Installs a library on a databricks_cluster. Each type of library has a slightly different syntax. Only one type of library can be set within a single resource; otherwise, the plan will fail with an error.
-> Note The databricks_library resource always starts the associated cluster if it is not running, so make sure to have auto-termination configured. It's not possible to atomically change the version of the same library without a cluster restart. Libraries are fully removed from the cluster only after a restart.
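For context, a minimal cluster definition with auto-termination configured might look like the sketch below; the cluster name, Spark version, and node type are illustrative assumptions, not values prescribed by this page. Later examples refer to this cluster as databricks_cluster.this.

```hcl
resource "databricks_cluster" "this" {
  cluster_name            = "shared-libraries"  # assumption: any name works
  spark_version           = "15.4.x-scala2.12"  # assumption: pick a supported runtime
  node_type_id            = "i3.xlarge"         # assumption: cloud-specific node type
  num_workers             = 1
  autotermination_minutes = 20 # terminates the idle cluster started by databricks_library
}
```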
You can install libraries on all clusters with the help of the databricks_clusters data source:
data "databricks_clusters" "all" {
}
resource "databricks_library" "cli" {
for_each = data.databricks_clusters.all.ids
cluster_id = each.key
pypi {
package = "databricks-cli"
}
}
resource "databricks_dbfs_file" "app" {
source = "${path.module}/app-0.0.1.jar"
path = "/FileStore/app-0.0.1.jar"
}
resource "databricks_library" "app" {
cluster_id = databricks_cluster.this.id
jar = databricks_dbfs_file.app.dbfs_path
}
Installing artifacts from a Maven repository. You can also optionally specify a repo parameter for a custom Maven-style repository, which should be accessible without any authentication. Maven libraries are resolved in the Databricks Control Plane, so the repo should be accessible from it. It can even be a properly configured Maven S3 wagon, AWS CodeArtifact, or Azure Artifacts.
resource "databricks_library" "deequ" {
cluster_id = databricks_cluster.this.id
maven {
coordinates = "com.amazon.deequ:deequ:1.0.4"
// exclusions block is optional
exclusions = ["org.apache.avro:avro"]
}
}
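As a sketch of the optional repo parameter, the coordinates and repository URL below are hypothetical placeholders for an internal Maven-style repository reachable from the control plane without authentication:

```hcl
resource "databricks_library" "internal_lib" {
  cluster_id = databricks_cluster.this.id
  maven {
    // hypothetical coordinates and custom repository
    coordinates = "com.example:internal-lib:1.2.3"
    repo        = "https://artifacts.example.com/maven"
  }
}
```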
resource "databricks_dbfs_file" "app" {
source = "${path.module}/baz.whl"
path = "/FileStore/baz.whl"
}
resource "databricks_library" "app" {
cluster_id = databricks_cluster.this.id
whl = databricks_dbfs_file.app.dbfs_path
}
Installing Python PyPI artifacts. You can also optionally specify the repo parameter for a custom PyPI mirror, which should be accessible without any authentication for the network that the cluster runs in.
-> Note The repo host should be accessible from the Internet by the Databricks control plane. If connectivity to custom PyPI repositories is required, modify the cluster-node /etc/pip.conf through databricks_global_init_script.
resource "databricks_library" "fbprophet" {
cluster_id = databricks_cluster.this.id
pypi {
package = "fbprophet==0.6"
// repo can also be specified here
}
}
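Following the note above, here is a rough sketch of a global init script that points pip on every cluster node at an internal mirror; the script name and mirror URL are assumptions for illustration only.

```hcl
resource "databricks_global_init_script" "pip_conf" {
  name    = "configure-pip-mirror" # assumption: any descriptive name
  enabled = true
  content_base64 = base64encode(<<-EOT
    #!/bin/bash
    # point pip on every cluster node at an internal mirror (hypothetical URL)
    cat > /etc/pip.conf <<'CONF'
    [global]
    index-url = https://pypi.example.com/simple
    CONF
    EOT
  )
}
```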
Installing Python libraries listed in the requirements.txt file. Only Workspace paths and Unity Catalog Volumes paths are supported. Requires a cluster with DBR 15.0+.
resource "databricks_library" "libraries" {
cluster_id = databricks_cluster.this.id
requirements = "/Workspace/path/to/requirements.txt"
}
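A Unity Catalog Volume path works the same way; the volume path below is a hypothetical placeholder.

```hcl
resource "databricks_library" "volume_requirements" {
  cluster_id   = databricks_cluster.this.id
  requirements = "/Volumes/main/default/libs/requirements.txt" # hypothetical volume path
}
```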
resource "databricks_dbfs_file" "app" {
source = "${path.module}/foo.egg"
path = "/FileStore/foo.egg"
}
resource "databricks_library" "app" {
cluster_id = databricks_cluster.this.id
egg = databricks_dbfs_file.app.dbfs_path
}
Installing artifacts from CRAN. You can also optionally specify a repo parameter for a custom CRAN mirror.
resource "databricks_library" "rkeops" {
cluster_id = databricks_cluster.this.id
cran {
package = "rkeops"
}
}
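A sketch with the optional repo parameter set to a custom CRAN mirror; the mirror URL is a hypothetical placeholder.

```hcl
resource "databricks_library" "rkeops_mirror" {
  cluster_id = databricks_cluster.this.id
  cran {
    package = "rkeops"
    repo    = "https://cran.example.com" # hypothetical CRAN mirror
  }
}
```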
-> Note Importing this resource is not currently supported.
The following resources are often used in the same context:
- End to end workspace management guide.
- databricks_clusters data source to retrieve a list of databricks_cluster ids.
- databricks_cluster to create Databricks Clusters.
- databricks_cluster_policy to create a databricks_cluster policy, which limits the ability to create clusters based on a set of rules.
- databricks_dbfs_file data source to get file content from Databricks File System (DBFS).
- databricks_dbfs_file_paths data source to get the list of file names from Databricks File System (DBFS).
- databricks_dbfs_file to manage relatively small files on Databricks File System (DBFS).
- databricks_global_init_script to manage global init scripts, which are run on all databricks_cluster and databricks_job.
- databricks_job to manage Databricks Jobs to run non-interactive code in a databricks_cluster.
- databricks_mount to mount your cloud storage on dbfs:/mnt/name.
- databricks_pipeline to deploy Delta Live Tables.
- databricks_repo to manage Databricks Repos.