sfc-gh-kjilla/airflow-snowparkml-demo

 
 

Intro

Snowpark ML (in public preview) is a Python framework for machine learning workloads with Snowpark. Snowpark ML currently provides a model registry (storing ML tracking data and models in Snowflake tables and stages), feature engineering primitives similar to scikit-learn (e.g. LabelEncoder, OneHotEncoder), and support for training certain model types and deploying them as user-defined functions (UDFs).

This guide demonstrates how to use Apache Airflow to orchestrate a machine learning pipeline leveraging Snowpark ML for feature engineering as well as model training and scoring.

This demo also shows the use of the Snowflake XCom backend, which reinforces security and governance by serializing all task inputs and outputs to Snowflake tables and stages and storing only a URI pointer to the data in the Airflow XCom table.
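The pointer pattern described above can be sketched in plain Python. This is a minimal illustration, not the Snowflake XCom backend's actual implementation: `FAKE_STAGE`, `URI_PREFIX`, `serialize_value`, and `deserialize_value` are hypothetical names, and an in-memory dictionary stands in for a Snowflake stage.

```python
import json
import uuid

# Hypothetical stand-in for a Snowflake stage; a real backend would write to Snowflake.
FAKE_STAGE = {}

# Illustrative URI scheme only; the real backend's URI format may differ.
URI_PREFIX = "snowflake://demo/demo/xcom_stage/"


def serialize_value(value):
    """Store the payload in the external store and return only a URI pointer."""
    key = f"{uuid.uuid4().hex}.json"
    FAKE_STAGE[key] = json.dumps(value)
    return URI_PREFIX + key


def deserialize_value(uri):
    """Follow the URI pointer and load the payload from the external store."""
    if not uri.startswith(URI_PREFIX):
        raise ValueError(f"not a stage URI: {uri}")
    return json.loads(FAKE_STAGE[uri[len(URI_PREFIX):]])


# The orchestrator's metadata table would hold only `pointer`, never the payload.
pointer = serialize_value({"rows": 3, "columns": ["A", "B"]})
restored = deserialize_value(pointer)
```

The point of the design is that the task data itself never lands in the Airflow metadata database; only the short pointer string does, so access to the data stays under Snowflake's controls.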

Prerequisites

Setup

  1. Install Astronomer's Astro CLI. The Astro CLI is an Apache 2.0 licensed, open-source tool for building Airflow instances and is the fastest and easiest way to be up and running with Airflow in minutes. Open a terminal window and run:

For macOS

brew install astro

For Linux

curl -sSL install.astronomer.io | sudo bash -s
  2. Clone this repository:
git clone https://github.com/astronomer/airflow-snowparkml-demo
cd airflow-snowparkml-demo
  3. Open the .env file in an editor and update the following variables with your account information. This demo assumes a new Snowflake trial account with admin privileges. The DAG creates a database named 'DEMO' and a schema named 'DEMO'. Running this demo without admin privileges, or with an existing database or schema, requires further updates to the .env file.
  • AIRFLOW_CONN_SNOWFLAKE_DEFAULT
    -- login
    -- password
    -- account **

** The account field of the Snowflake connection should use the new ORG_NAME-ACCOUNT_NAME format, per Snowflake's account identifier policies. The ORG and ACCOUNT names can be found in the confirmation email or in the Snowflake login link (e.g. https://xxxxxxx-yyy11111.snowflakecomputing.com/console/login). Do not specify a region when using this format.

NOTE: Database and schema names should be UPPERCASE due to a bug in Snowpark ML.
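As a rough aid for filling in the connection, the sketch below derives the ORG_NAME-ACCOUNT_NAME identifier from a login link and assembles a JSON connection value with uppercase database and schema names. The helper names `account_from_login_url` and `build_connection` are hypothetical, and the JSON layout (account and database under `extra`) is an assumption about how the Snowflake connection is structured; the repository's own .env file remains the authority on the exact format.

```python
import json
from urllib.parse import urlparse

SNOWFLAKE_SUFFIX = ".snowflakecomputing.com"


def account_from_login_url(url):
    """Return the ORG_NAME-ACCOUNT_NAME part of a Snowflake login URL."""
    host = urlparse(url).hostname or ""
    if not host.endswith(SNOWFLAKE_SUFFIX):
        raise ValueError(f"unexpected Snowflake host: {host!r}")
    account = host[: -len(SNOWFLAKE_SUFFIX)]
    # A dot here would mean a region-qualified (legacy) identifier, which
    # should not be combined with the ORG_NAME-ACCOUNT_NAME format.
    if "." in account or "-" not in account:
        raise ValueError(f"not in ORG_NAME-ACCOUNT_NAME format: {account!r}")
    return account


def build_connection(login, password, account, database="DEMO", schema="DEMO"):
    """Build a JSON connection value (assumed layout) with uppercase names."""
    return json.dumps(
        {
            "conn_type": "snowflake",
            "login": login,
            "password": password,
            "schema": schema.upper(),
            "extra": {"account": account, "database": database.upper()},
        }
    )


account = account_from_login_url(
    "https://xxxxxxx-yyy11111.snowflakecomputing.com/console/login"
)
connection_value = build_connection("my_login", "my_password", account)
```

The resulting string could be pasted as the value of AIRFLOW_CONN_SNOWFLAKE_DEFAULT, assuming the .env file expects a JSON-style connection.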

  4. Start Apache Airflow:

    astro dev start
  5. Run the Snowpark ML Demo DAG:

astro dev run dags unpause snowpark_ml_demo
astro dev run dags trigger snowpark_ml_demo
  6. Connect to the local Airflow UI and log in with admin/admin.

  7. While waiting for the DAG run to complete, examine the DAG code by opening the file include/dags/snowpark_ml_demo.py. Each task function includes a docstring explaining what it does.

For a more advanced example, see the Customer Analytics Demo.
