Added notebooks for quickstart #44

Merged
merged 2 commits into from
Jul 23, 2024
@@ -0,0 +1,213 @@
{
"metadata": {
"kernelspec": {
"display_name": "Streamlit Notebook",
"name": "streamlit"
}
},
"nbformat_minor": 5,
"nbformat": 4,
"cells": [
{
"cell_type": "markdown",
"id": "61f3eed1-51a2-4810-bd35-4e116de35cb3",
"metadata": {
"name": "getting_started",
"collapsed": false
},
"source": "Welcome to Snowflake! This guide helps you get started with Snowpark for data exploration and analysis. In this exercise, you will:\n\n * Load data from Snowflake tables into Snowpark DataFrames\n * Perform exploratory data analysis on Snowpark DataFrames\n * Pivot and join data from multiple tables using Snowpark DataFrames\n * Save transformed data into a Snowflake table"
},
{
"cell_type": "markdown",
"id": "754efa75-b08b-4134-bdf1-cd0ae8fe6fef",
"metadata": {
"name": "step_1",
"collapsed": false
},
"source": "## Import Snowpark and create Snowpark session"
},
{
"cell_type": "code",
"id": "cde5f575-ee0e-4761-9f6e-11936d520c09",
"metadata": {
"language": "python",
"name": "imports",
"collapsed": false,
"codeCollapsed": false
},
"outputs": [],
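"source": "import snowflake.snowpark as snowpark\n# Note: this `sum` is the Snowpark aggregate function and shadows Python's built-in sum() in this notebook.\nfrom snowflake.snowpark.functions import month, year, col, sum",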
"execution_count": null
},
{
"cell_type": "code",
"id": "3775908f-ca36-4846-8f38-5adca39217f2",
"metadata": {
"language": "python",
"name": "snowpark_session",
"collapsed": false,
"codeCollapsed": false
},
"source": "from snowflake.snowpark.context import get_active_session\nsession = get_active_session()",
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "43ac9528-f167-4842-b4ca-dd7c3af2dc48",
"metadata": {
"name": "step_2",
"collapsed": false
},
"source": "## Load `campaign_spend` and `monthly_revenue` tables into Snowpark DataFrames"
},
{
"cell_type": "code",
"id": "8d50cbf4-0c8d-4950-86cb-114990437ac9",
"metadata": {
"language": "python",
"name": "read_from_table",
"collapsed": false
},
"source": "snow_df_spend = session.table('campaign_spend')\nsnow_df_revenue = session.table('monthly_revenue')",
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "6146fbe8-e88b-4d8f-8322-505f7e7a5c53",
"metadata": {
"name": "step_3",
"collapsed": false
},
"source": "## Total Spend per Year and Month For All Channels\nLet's transform the campaign spend data to show the total cost per year and month for each channel, using the Snowpark DataFrame functions `group_by()` and `agg()`."
},
{
"cell_type": "code",
"id": "c695373e-ac74-4b62-a1f1-08206cbd5c81",
"metadata": {
"language": "python",
"name": "spend_per_channel",
"collapsed": false
},
"source": "snow_df_spend_per_channel = snow_df_spend.group_by(year('DATE'), month('DATE'),'CHANNEL').\\\n agg(sum('TOTAL_COST').as_('TOTAL_COST')).\\\n with_column_renamed('\"YEAR(DATE)\"',\"YEAR\").\\\n with_column_renamed('\"MONTH(DATE)\"',\"MONTH\").\\\n sort('YEAR','MONTH')\nsnow_df_spend_per_channel.show()",
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "ccc9fe0a-e400-4c22-90dc-7dd6820ad0a4",
"metadata": {
"name": "step_4",
"collapsed": false
},
"source": "## Total Spend per Year and Month\nLet's further transform the campaign spend data by pivoting on the `channel` dimension. This gives one row per year and month, with each channel's spend in its own column."
},
{
"cell_type": "code",
"id": "30e5ae75-40a6-4d0f-a456-07c4e23a6eda",
"metadata": {
"language": "python",
"name": "spend_per_month",
"codeCollapsed": false,
"collapsed": false
},
"outputs": [],
"source": "snow_df_spend_per_month = snow_df_spend_per_channel.pivot('CHANNEL',['search_engine','social_media','video','email']).\\\n sum('TOTAL_COST').\\\n sort('YEAR','MONTH')\n\nsnow_df_spend_per_month = snow_df_spend_per_month.select(\n col(\"YEAR\"),\n col(\"MONTH\"),\n col(\"'search_engine'\").as_(\"SEARCH_ENGINE\"),\n col(\"'social_media'\").as_(\"SOCIAL_MEDIA\"),\n col(\"'video'\").as_(\"VIDEO\"),\n col(\"'email'\").as_(\"EMAIL\")\n )\n\nsnow_df_spend_per_month.show()",
"execution_count": null
},
{
"cell_type": "markdown",
"id": "35572f55-82f9-4b54-9622-304d03d5f7d7",
"metadata": {
"name": "step_5",
"collapsed": false
},
"source": "## Total Revenue per Year and Month\nNow let's aggregate the revenue data into revenue per year and month using the `group_by()` and `agg()` functions."
},
{
"cell_type": "code",
"id": "459ce8e4-38e3-4a9c-aea5-d879ba3c7a2b",
"metadata": {
"language": "python",
"name": "revenue_per_month",
"collapsed": false
},
"outputs": [],
"source": "snow_df_revenue_per_month = snow_df_revenue.group_by('YEAR','MONTH').\\\n agg(sum('REVENUE')).\\\n sort('YEAR','MONTH').\\\n with_column_renamed('SUM(REVENUE)','REVENUE')\nsnow_df_revenue_per_month.show()",
"execution_count": null
},
{
"cell_type": "markdown",
"id": "05e7c9d6-22b4-4343-927a-6bb330b169cb",
"metadata": {
"name": "step_6",
"collapsed": false
},
"source": "## Join Total Spend and Total Revenue per Year and Month Across All Channels\nNext, let's `join` this `revenue` data with the transformed `campaign spend` data on year and month so we can analyze spend and revenue side by side."
},
{
"cell_type": "code",
"id": "471359bc-0ac7-4fe1-83c2-644a81f9d135",
"metadata": {
"language": "python",
"name": "join_spend_and_revenue_per_month",
"collapsed": false
},
"outputs": [],
"source": "snow_df_spend_and_revenue_per_month = snow_df_spend_per_month.join(snow_df_revenue_per_month, [\"YEAR\",\"MONTH\"])\nsnow_df_spend_and_revenue_per_month.show()",
"execution_count": null
},
{
"cell_type": "markdown",
"id": "fc7f8bf0-a28a-4b33-bad3-c24bb2bab87a",
"metadata": {
"name": "step_7",
"collapsed": false
},
"source": "## Examine DataFrame Explain Plan\nSnowpark lets you inspect a DataFrame's query and execution plan with the `explain()` Snowpark DataFrame function."
},
{
"cell_type": "code",
"id": "04f05b39-dbc2-435f-9fc7-50e6cae31ebc",
"metadata": {
"language": "python",
"name": "dataframe_explain",
"collapsed": false
},
"outputs": [],
"source": "snow_df_spend_and_revenue_per_month.explain()",
"execution_count": null
},
{
"cell_type": "markdown",
"id": "b78b1ff6-ca5f-41a1-8373-3665d7db50b9",
"metadata": {
"name": "step_8",
"collapsed": false
},
"source": "## Save Transformed Data into Snowflake Table\nLet's save the transformed data into the Snowflake table `SPEND_AND_REVENUE_PER_MONTH`, overwriting the table if it already exists."
},
{
"cell_type": "code",
"id": "2a75cfbc-be9e-4915-b2cc-03df0670583f",
"metadata": {
"language": "python",
"name": "write_to_table",
"collapsed": false
},
"outputs": [],
"source": "snow_df_spend_and_revenue_per_month.write.mode('overwrite').save_as_table('SPEND_AND_REVENUE_PER_MONTH')",
"execution_count": null
},
{
"cell_type": "markdown",
"id": "d1f5ce2c-6c76-4222-8563-ad012a5ba0f3",
"metadata": {
"name": "additional_resources",
"collapsed": false
},
"source": "## Continue your learning!\n\nThis notebook is a `Hello World` introduction to `Data Engineering with Snowpark`. To learn more advanced data engineering with Snowflake, continue at https://quickstarts.snowflake.com/guide/data_engineering_with_notebooks/index.html."
}
]
}
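
The pivot and join steps in the notebook above can be pictured with a small sketch that needs no Snowflake connection. It uses only the Python standard library and does not call Snowpark; the sample rows, the helper names `pivot_spend` and `join_on_month`, and the use of `None` to stand in for SQL `NULL` are all assumptions made for illustration, not part of this PR.

```python
from collections import defaultdict

# Hypothetical sample rows (NOT taken from the real campaign_spend table):
# (year, month, channel, total_cost)
spend_rows = [
    (2024, 1, "search_engine", 100.0),
    (2024, 1, "email", 40.0),
    (2024, 1, "email", 10.0),
    (2024, 2, "video", 70.0),
]
# Hypothetical revenue already aggregated per month: (year, month, revenue)
revenue_rows = [(2024, 1, 900.0), (2024, 2, 650.0)]

CHANNELS = ["search_engine", "social_media", "video", "email"]


def pivot_spend(rows, channels):
    """Sum cost per (year, month) and spread the channels into columns.

    Channels with no rows in a month stay None, standing in for NULL.
    """
    totals = defaultdict(lambda: dict.fromkeys(channels))
    for year, month, channel, cost in rows:
        bucket = totals[(year, month)]
        if channel in bucket:
            current = bucket[channel]
            bucket[channel] = cost if current is None else current + cost
    return dict(totals)


def join_on_month(pivoted, revenue):
    """Inner join on (year, month), sorted by year and month."""
    revenue_by_month = {(y, m): r for y, m, r in revenue}
    joined = []
    for (y, m), cols in sorted(pivoted.items()):
        if (y, m) in revenue_by_month:
            row = {"YEAR": y, "MONTH": m}
            row.update({name.upper(): value for name, value in cols.items()})
            row["REVENUE"] = revenue_by_month[(y, m)]
            joined.append(row)
    return joined


result = join_on_month(pivot_spend(spend_rows, CHANNELS), revenue_rows)
for row in result:
    print(row)
```

Each printed row has the same shape as a row of `snow_df_spend_and_revenue_per_month`: one year and month, one column per channel, and the revenue for that month.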
@@ -0,0 +1,5 @@
name: app_environment
channels:
- snowflake
dependencies:
- snowflake=0.8.0