[dagster-azure] fix azure compute log manager credentialing (#26157)
Using this PR to accumulate downstack before landing.

ORIGINAL PR MESSAGE:
## Summary & Motivation
I think the existing credentialing here is broken: `BlobServiceClient` no longer accepts a bare secret string. To pass a secret, you now need a fully qualified credential dictionary.

Technically this is a breaking change but I don't see how it could be working right now anyway.
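A minimal sketch of the shape of the fix (illustrative only, not the exact code in this diff): rather than handing `BlobServiceClient` a bare secret string, the secret travels together with the tenant and client IDs it belongs to.

```python
# Illustrative sketch only -- not the exact code from this PR.
# A bare secret string is no longer a usable `credential` for BlobServiceClient;
# the secret has to be bundled with the tenant and client it belongs to.


def build_credential_kwargs(tenant_id: str, client_id: str, client_secret: str) -> dict:
    """Bundle a bare secret into the fully qualified mapping the SDK expects."""
    return {
        "tenant_id": tenant_id,
        "client_id": client_id,
        "client_secret": client_secret,
    }


creds = build_credential_kwargs("my-tenant", "my-client", "my-secret")
print(sorted(creds))  # ['client_id', 'client_secret', 'tenant_id']

# With azure-identity installed, these kwargs would feed ClientSecretCredential:
#   credential = ClientSecretCredential(**creds)
#   client = BlobServiceClient(account_url="https://<account>.blob.core.windows.net",
#                              credential=credential)
```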

## How I Tested These Changes
Tested manually with a local dagster installation running the compute log manager live. Downstream I'll add actual integration tests.
dpeng817 authored Dec 5, 2024
1 parent 5595598 commit 5da96ff
Showing 29 changed files with 692 additions and 43 deletions.
@@ -64,6 +64,17 @@ def build_dagster_oss_nightly_steps() -> List[BuildkiteStep]:
                "KS_DBT_CLOUD_DISCOVERY_API_URL",
            ],
        ),
        PackageSpec(
            "integration_tests/test_suites/dagster-azure-live-tests",
            name="azure-live-tests",
            env_vars=[
                "TEST_AZURE_TENANT_ID",
                "TEST_AZURE_CLIENT_ID",
                "TEST_AZURE_CLIENT_SECRET",
                "TEST_AZURE_STORAGE_ACCOUNT_ID",
                "TEST_AZURE_CONTAINER_ID",
            ],
        ),
    ]
)

@@ -6,6 +6,7 @@
    GCP_CREDS_LOCAL_FILE,
    LATEST_DAGSTER_RELEASE,
)
from dagster_buildkite.git import ChangedFiles
from dagster_buildkite.package_spec import (
    PackageSpec,
    PytestExtraCommandsFunction,
@@ -42,6 +43,7 @@ def build_integration_steps() -> List[BuildkiteStep]:
    steps += build_k8s_suite_steps()
    steps += build_daemon_suite_steps()
    steps += build_auto_materialize_perf_suite_steps()
    steps += build_azure_live_test_suite_steps()

    return steps

@@ -165,6 +167,29 @@ def build_auto_materialize_perf_suite_steps():
)


def skip_if_not_azure_commit():
    """If no dagster-azure files are changed, skip the azure live tests."""
    return (
        None
        if any("dagster-azure" in str(path) for path in ChangedFiles.all)
        else "Not a dagster-azure commit"
    )


def build_azure_live_test_suite_steps() -> List[BuildkiteTopLevelStep]:
    return PackageSpec(
        os.path.join("integration_tests", "test_suites", "dagster-azure-live-tests"),
        skip_if=skip_if_not_azure_commit,
        env_vars=[
            "TEST_AZURE_TENANT_ID",
            "TEST_AZURE_CLIENT_ID",
            "TEST_AZURE_CLIENT_SECRET",
            "TEST_AZURE_STORAGE_ACCOUNT_ID",
            "TEST_AZURE_CONTAINER_ID",
        ],
    ).build_steps()
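As a standalone illustration of the `skip_if` contract used above (return `None` to run the step, or a human-readable reason string to skip it), here is a sketch with a plain list of paths standing in for Buildkite's `ChangedFiles.all` (the example path is hypothetical):

```python
from pathlib import Path
from typing import List, Optional


def skip_reason(changed_files: List[Path]) -> Optional[str]:
    """Return None to run the azure live tests, or a reason string to skip them."""
    if any("dagster-azure" in str(path) for path in changed_files):
        return None
    return "Not a dagster-azure commit"


print(skip_reason([Path("python_modules/libraries/dagster-azure/setup.py")]))  # None
print(skip_reason([Path("README.md")]))  # Not a dagster-azure commit
```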


def daemon_pytest_extra_cmds(version: AvailablePythonVersion, _):
    return [
        "export DAGSTER_DOCKER_IMAGE_TAG=$${BUILDKITE_BUILD_ID}-" + version.value,
18 changes: 18 additions & 0 deletions docs/content/_navigation.json
@@ -548,6 +548,24 @@
{
  "title": "Setting environment variables",
  "path": "/dagster-plus/managing-deployments/setting-environment-variables-agents"
},
{
  "title": "Deploying on Azure: Full guide",
  "path": "/dagster-plus/deployment/azure/overview",
  "children": [
    {
      "title": "Part 1: Setting up an AKS agent",
      "path": "/dagster-plus/deployment/azure/aks-agent"
    },
    {
      "title": "Part 2: Deploy user code using ACR",
      "path": "/dagster-plus/deployment/azure/acr-user-code"
    },
    {
      "title": "Part 3: Storing compute logs in Azure Blob Storage / Azure Data Lake",
      "path": "/dagster-plus/deployment/azure/blob-compute-logs"
    }
  ]
}
]
}
135 changes: 135 additions & 0 deletions docs/content/dagster-plus/deployment/azure/acr-user-code.mdx
@@ -0,0 +1,135 @@
# Deploying user code in Azure Container Registry (ACR) with Dagster+

This quickstart guide will walk you through setting up a new repository for your Dagster code, setting up CI/CD with GitHub Actions backed by Azure Container Registry (ACR), and deploying your code to your Azure Kubernetes Service (AKS) cluster.

This guide assumes you already have an AKS agent running. You can follow along [here](/dagster-plus/deployment/azure/aks-agent) if you still need to set up an AKS agent.

## Prerequisites

This guide uses a GitHub repository to store the Dagster code, and GitHub Actions to deploy it to Azure Container Registry. If you use another CI/CD provider, such as Azure DevOps, you'll need to adapt these steps. Check out our guide on configuring CI/CD using the `dagster-cloud` CLI [here](/dagster-plus/getting-started#step-4-configure-cicd-for-your-project).

- The Azure CLI installed on your machine. You can download it [here](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli).
- A GitHub account, and the ability to run GitHub Actions workflows in a repository.

## Step 1: Creating a repository for Dagster code

We'll create a new repository based on the [Dagster+ hybrid quickstart repository](https://github.com/dagster-io/dagster-cloud-hybrid-quickstart). We'll go through these steps using a brand new repository in GitHub, but you should be able to adapt these steps to an existing repository or other version control systems.

First, [create a new repository in GitHub](https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-new-repository). Going forward, we'll refer to this repository as `dagster-plus-code`.

Next, we'll run a few commands which clone both our new repository and the Dagster+ hybrid quickstart repository to our local machine.

```bash
git clone <your-repo-url> dagster-plus-code
git clone git@github.com:dagster-io/dagster-cloud-hybrid-quickstart.git
```

We'll copy the contents of the `dagster-cloud-hybrid-quickstart` repository into our `dagster-plus-code` repository, and commit the changes.

```bash
rsync -av --exclude='.git' dagster-cloud-hybrid-quickstart/ dagster-plus-code/
cd dagster-plus-code
git add .
git commit -m "Initial commit"
git push
```

### Project structure

The project has the following structure:

```plaintext
├── .github
│   └── workflows
│       └── dagster-cloud-deploy.yml  # GitHub Actions workflow for re-deploying the code location
├── .vscode               # Standard VSCode settings for working with a Dagster repository
├── Dockerfile            # Dockerfile for building the user code image
├── README.md
├── dagster_cloud.yaml    # Configuration file describing all code locations in the repository
├── pyproject.toml        # Python project configuration file for the code location
├── quickstart_etl        # Python package containing the user code
│   ├── __init__.py
│   ├── assets
│   │   ├── __init__.py
│   │   └── hackernews.py
│   └── definitions.py
├── quickstart_etl_tests  # User code tests
│   ├── __init__.py
│   └── test_assets.py
├── setup.cfg
└── setup.py
```

## Step 2: Setting up an Azure Container Registry

Next, we'll set up an Azure Container Registry to store our Docker images. We'll use the Azure CLI to create the registry.

```bash
az login
az acr create --resource-group <your_resource_group> --name <your-acr-name> --sku Basic
```

Then, we'll make images from our ACR available to our AKS cluster.

```bash
az aks update -n <your-cluster-name> -g <your_resource_group> --attach-acr <your-acr-name>
```

## Step 3: Setting up GitHub Actions

Now, we'll set up a GitHub Actions workflow to build and push our Docker image to Azure Container Registry.

We already have a GitHub Actions workflow in our repository, located at `.github/workflows/dagster-cloud-deploy.yml`. This workflow will build the Docker image, push it to ACR, and update the code location in Dagster+. To get it working with your repository, you'll need to do a few things.

### Generate Azure credentials

First, we'll need to generate a service principal for GitHub Actions to use to authenticate with Azure. We'll use the Azure CLI to create the service principal.

```bash
az ad sp create-for-rbac --name "github-actions-acr" --role contributor --scopes /subscriptions/<your_azure_subscription_id>/resourceGroups/<your_resource_group>/providers/Microsoft.ContainerRegistry/registries/<your_acr_name>
```

This command will output a JSON object with the service principal details. Make sure to save the `appId`, `password`, and `tenant` values - we'll use them in the next step.
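The output should look roughly like this (the values below are placeholders, not real credentials):

```json
{
  "appId": "00000000-0000-0000-0000-000000000000",
  "displayName": "github-actions-acr",
  "password": "<generated-client-secret>",
  "tenant": "11111111-1111-1111-1111-111111111111"
}
```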

### Add secrets to your repository

We'll add the service principal details as secrets in our repository. Go to your repository in GitHub, and navigate to `Settings` -> `Secrets`. Add the following secrets:

- `AZURE_CLIENT_ID`: The `appId` from the service principal JSON object.
- `AZURE_CLIENT_SECRET`: The `password` from the service principal JSON object.

### Update the workflow

Finally, we'll update the workflow to use the service principal details. Open `.github/workflows/dagster-cloud-deploy.yml` in your repository, and uncomment the section on Azure Container Registry. It should look like this:

```yaml
# Azure Container Registry (ACR)
# https://github.com/docker/login-action#azure-container-registry-acr
- name: Login to Azure Container Registry
if: steps.prerun.outputs.result != 'skip'
uses: docker/login-action@v3
with:
registry: ${{ env.IMAGE_REGISTRY }}
username: ${{ secrets.AZURE_CLIENT_ID }}
password: ${{ secrets.AZURE_CLIENT_SECRET }}
```

### Push and run the workflow

Now, commit and push the changes to your repository. The GitHub Actions workflow should run automatically. You can check the status of the workflow in the `Actions` tab of your repository.

<Image
src="/images/dagster-cloud/azure/github-actions-workflow.png"
alt="GitHub Actions workflow for deploying user code to Azure Container Registry"
width={970}
height={794}
/>

When the workflow completes, you should see the new code location in Dagster+. Navigate to the `Status` page, and click the `Code Locations` tab. You should see your new code location listed.

<Image
src="/images/dagster-cloud/azure/dagster-cloud-code-locations.png"
alt="Dagster+ code locations page showing the new code location"
width={1152}
height={320}
/>
44 changes: 44 additions & 0 deletions docs/content/dagster-plus/deployment/azure/aks-agent.mdx
@@ -0,0 +1,44 @@
# Dagster+ on Azure - Quickstart Guide

This guide will walk you through deploying a Dagster+ agent on an Azure Kubernetes Service (AKS) cluster.

This guide is intended to be a quickstart, and you should always defer to organization-specific guidelines for creating and managing new infrastructure.

We'll start from a brand new organization in Dagster+, and finish with a full hybrid deployment of Dagster+ using Azure infrastructure.

## Prerequisites

To complete the steps in this guide, you'll need:

- The Azure CLI installed on your machine. You can download it [here](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli).
- `kubectl` installed on your machine. You can download it [here](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
- `helm` installed on your machine. You can download it [here](https://helm.sh/docs/intro/install/).
- An existing AKS cluster. If you need to create a new AKS cluster, refer to the [Azure documentation](https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-portal?tabs=azure-cli).
- A Dagster+ organization, with an agent token for that organization.

## Step 1: Generate a Dagster+ agent token.

<GenerateAgentToken />

## Step 2: Log in to your AKS cluster.

We'll use the Azure CLI to log in to your AKS cluster. Run the following commands and follow the prompts to log in.

```bash
az login
az aks get-credentials --resource-group <your-resource-group> --name <your-aks-cluster>
```

We can now verify the installation by printing the current context of our `kubectl` configuration; it should output the name of the AKS cluster.

```bash
kubectl config current-context
```

## Step 3: Install the Dagster+ agent on the AKS cluster.

Next, we'll install the agent helm chart. You should be able to follow the guide [here](/dagster-plus/deployment/agents/kubernetes/configuring-running-kubernetes-agent) to install the agent on the AKS cluster.

## Next steps

Now that you have an agent running on your AKS cluster, you can start deploying Dagster code to it. You can follow the guide [here](/dagster-plus/deployment/azure/acr-user-code) to deploy user code to your AKS cluster backed by Azure Container Registry (ACR).
127 changes: 127 additions & 0 deletions docs/content/dagster-plus/deployment/azure/blob-compute-logs.mdx
@@ -0,0 +1,127 @@
# Storing compute logs in Azure Blob Storage/Azure Data Lake Storage

In this guide, we'll walk through how to store compute logs in Azure Blob Storage or Azure Data Lake Storage. This guide assumes you have already set up an Azure Kubernetes Service (AKS) agent and deployed user code in Azure Container Registry (ACR).

This guide focuses on Azure Blob Storage, but the same steps should apply to Azure Data Lake Storage.

If you have not yet set up an AKS agent, you can follow the [Deploy an Azure Kubernetes Service (AKS) agent guide](/dagster-plus/deployment/azure/aks-agent). If you have not yet deployed user code in ACR, you can follow the [Deploy user code in Azure Container Registry (ACR) guide](/dagster-plus/deployment/azure/acr-user-code).

## Prerequisites

To complete the steps in this guide, you'll need:

- The Azure CLI installed on your machine. You can download it [here](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli).
- An Azure account with the ability to create resources in Azure Blob Storage or Azure Data Lake Storage.
- An Azure container in Azure Blob Storage or Azure Data Lake Storage where you want to store logs.
- Either the `quickstart_etl` module from the [hybrid quickstart repo](https://github.com/dagster-io/dagster-cloud-hybrid-quickstart), or any other code location successfully imported, which contains at least one asset or job that will generate logs for you to test against.

## Step 1: Give AKS agent access to blob storage account

We need to ensure that the AKS agent has the necessary permissions to write logs to Azure Blob Storage or Azure Data Lake Storage. We'll do this with a few Azure CLI commands.

First, we'll enable the cluster to use workload identity. This will allow the AKS agent to use a managed identity to access Azure resources.

```bash
az aks update --resource-group <resource-group> --name <cluster-name> --enable-workload-identity
```

Then, we'll create a new managed identity for the AKS agent, and a new service account in our AKS cluster.

```bash
az identity create --resource-group <resource-group> --name agent-identity
kubectl create serviceaccount dagster-agent-service-account --namespace dagster-agent
```

Now we need to federate the managed identity with the service account.

```bash
az identity federated-credential create \
  --name dagster-agent-federated-id \
  --identity-name agent-identity \
  --resource-group <resource-group> \
  --issuer $(az aks show -g <resource-group> -n <aks-cluster-name> --query "oidcIssuerProfile.issuerUrl" -otsv) \
  --subject system:serviceaccount:dagster-agent:dagster-agent-service-account
```

Finally, we'll edit our AKS agent deployment to use the new service account.

```bash
kubectl edit deployment <your-user-cloud-deployment> -n dagster-agent
```

In the deployment manifest, add the following lines:

```yaml
metadata:
  ...
  labels:
    ...
    azure.workload.identity/use: "true"
spec:
  ...
  template:
    ...
    spec:
      ...
      serviceAccountName: dagster-agent-service-account
```

If everything is set up correctly, you should be able to run the following command and see an access token returned:

```bash
kubectl exec -n dagster-agent -it <pod-in-cluster> -- bash
# in the pod
curl -H "Metadata:true" "http://169.254.169.254/metadata/identity/oauth2/token?resource=https://storage.azure.com/"
```
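If the identity is wired up correctly, the metadata endpoint returns a JSON token payload roughly of this shape (fields abbreviated, values are placeholders):

```json
{
  "access_token": "eyJ0eXAiOiJKV1QiLCJh...",
  "expires_in": "86399",
  "resource": "https://storage.azure.com/",
  "token_type": "Bearer"
}
```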

## Step 2: Configure Dagster to use Azure Blob Storage

Now, you need to update the helm values to use Azure Blob Storage for logs. You can do this by editing the `values.yaml` file for your user-cloud deployment.

Pull down the current values for your deployment:

```bash
helm get values user-cloud > current-values.yaml
```

Then, edit the `current-values.yaml` file to include the following lines:

```yaml
computeLogs:
  enabled: true
  custom:
    module: dagster_azure.blob.compute_log_manager
    class: AzureBlobComputeLogManager
    config:
      storage_account: mystorageaccount
      container: mycontainer
      default_azure_credential:
        exclude_environment_credential: false
      prefix: dagster-logs-
      local_dir: "/tmp/cool"
      upload_interval: 30
```

Finally, update your deployment with the new values:

```bash
helm upgrade user-cloud dagster-cloud/dagster-cloud-agent -n dagster-agent -f current-values.yaml
```

## Step 3: Verify logs are being written to Azure Blob Storage

It's time to kick off a run in Dagster to test your new configuration. If you're following along with the quickstart repo, launch a run of the `all_assets_job`, which will generate logs for you to test against. Otherwise, use any job that emits logs. In the stdout/stderr window of the run page, you should see a log file that directs you to the Azure Blob Storage container.

<Image
src="/images/dagster-cloud/azure/azure-blob-storage-logs.png"
alt="Azure Blob Storage logs in Dagster"
width={970}
height={794}
/>

<Note>
  Whether the URL is clickable depends on whether your logs are public or
  private. If they are private, clicking the link directly won't work; instead,
  use the Azure CLI or the Azure Portal to access the logs at that URL.
</Note>

1 comment on commit 5da96ff

@github-actions


Deploy preview for dagster-docs ready!

✅ Preview
https://dagster-docs-ifwoiwjjx-elementl.vercel.app
https://master.dagster.dagster-docs.io

Built with commit 5da96ff.
This pull request is being automatically deployed with vercel-action
