diff --git a/Notebooks/TSQL/Jupiter/_content/quickstarts/csv.ipynb b/Notebooks/TSQL/Jupiter/_content/quickstarts/csv.ipynb deleted file mode 100644 index 27085b1..0000000 --- a/Notebooks/TSQL/Jupiter/_content/quickstarts/csv.ipynb +++ /dev/null @@ -1,145 +0,0 @@ -{ - "metadata": { - "kernelspec": { - "name": "SQL", - "display_name": "SQL", - "language": "sql" - }, - "language_info": { - "name": "sql", - "version": "" - } - }, - "nbformat_minor": 2, - "nbformat": 4, - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# Query CSV files\n", - "\n", - "Serverless Synapse SQL pool enables you to read CSV files from Azure storage (Data Lake or Blob storage).\n", - "\n", - "## Read CSV file\n", - "\n", - "The easiest way to see the content of your `CSV` file is to provide the file URL to the `OPENROWSET` function and specify the `CSV` format (`firstrow = 2` skips the header row). If the file is publicly available or if your Azure AD identity can access this file, you should be able to see the content of the file using a query like the one shown in the following example:" - ], - "metadata": { - "azdata_cell_guid": "e01663cc-427c-457f-84db-b16d0fca3a90" - } - }, - { - "cell_type": "code", - "source": [ - "select top 10 *\r\n", - "from openrowset(\r\n", - " bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.csv',\r\n", - " format = 'csv',\r\n", - " parser_version = '2.0',\r\n", - " firstrow = 2 ) as rows" - ], - "metadata": { - "azdata_cell_guid": "dbc4f12e-388c-49fa-9d85-0fbea3b19d1b" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "## Data source usage\n", - "\n", - "The previous example uses the full path to the file.
As an alternative, you can create an external data source with a location that points to the root folder of the storage:" - ], - "metadata": { - "azdata_cell_guid": "a373fa76-bfdf-4bb6-8098-73c9ef436eb8" - } - }, - { - "cell_type": "code", - "source": [ - "create external data source covid\r\n", - "with (\r\n", - " location = 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases'\r\n", - ");" - ], - "metadata": { - "azdata_cell_guid": "48b6ee55-09ec-47df-bea5-707dc2f42aa8" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "Once you create a data source, you can use that data source and the relative path to the file in the `OPENROWSET` function:" - ], - "metadata": { - "azdata_cell_guid": "c4145b77-8663-4e59-914b-721955a02635" - } - }, - { - "cell_type": "code", - "source": [ - "select\r\n", - " top 10 *\r\n", - "from\r\n", - " openrowset(\r\n", - " bulk 'latest/ecdc_cases.csv',\r\n", - " data_source = 'covid',\r\n", - " format = 'csv',\r\n", - " parser_version ='2.0',\r\n", - " firstrow = 2\r\n", - " ) as rows" - ], - "metadata": { - "azdata_cell_guid": "f3da158c-c168-45b0-8e38-7ee2d430420f" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "## Explicitly specify schema\n", - "\n", - "`OPENROWSET` enables you to explicitly specify which columns you want to read from the file using the `WITH` clause:" - ], - "metadata": { - "azdata_cell_guid": "745b2c81-01eb-4bf5-9cad-47a03dcff194" - } - }, - { - "cell_type": "code", - "source": [ - "select\r\n", - " top 10 *\r\n", - "from\r\n", - " openrowset(\r\n", - " bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.csv',\r\n", - " format = 'csv',\r\n", - " parser_version ='2.0',\r\n", - " firstrow = 2\r\n", - " ) with (\r\n", - " date_rep date 1,\r\n", - " cases int 5,\r\n", - " geo_id varchar(6) 8\r\n", - " ) as rows" - ], - "metadata": { -
"azdata_cell_guid": "e7bacd03-45d4-4b0b-b1d0-9522e1a54436" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "The number after each data type in the `WITH` clause is the 1-based position (ordinal) of that column in the CSV file." - ], - "metadata": { - "azdata_cell_guid": "4397f453-4b20-4083-ae0e-4966d789993f" - } - } - ] -} \ No newline at end of file diff --git a/Notebooks/TSQL/Jupiter/_content/quickstarts/json.ipynb b/Notebooks/TSQL/Jupiter/_content/quickstarts/json.ipynb deleted file mode 100644 index 330c2c9..0000000 --- a/Notebooks/TSQL/Jupiter/_content/quickstarts/json.ipynb +++ /dev/null @@ -1,158 +0,0 @@ -{ - "metadata": { - "kernelspec": { - "name": "SQL", - "display_name": "SQL", - "language": "sql" - }, - "language_info": { - "name": "sql", - "version": "" - } - }, - "nbformat_minor": 2, - "nbformat": 4, - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# Query JSON files\n", - "\n", - "Serverless Synapse SQL pool enables you to read JSON files from Azure storage (Data Lake or Blob storage).\n", - "\n", - "## Read line-delimited JSON file\n", - "\n", - "One of the most common formats used to store JSON documents is line-delimited JSON (JSON Lines), where every JSON document is placed on a separate line, delimited by a newline character. These files usually have the extension `jsonl`, `ldjson`, or `ndjson`.\n", - "\n", - "The easiest way to see the content of your `jsonl` file is to provide the file URL to the `OPENROWSET` function and specify the `CSV` format. Setting `fieldterminator` and `fieldquote` to `0x0b` (a vertical tab, which is unlikely to appear in the documents) makes each line come back as a single text column.
If the file is publicly available or if your Azure AD identity can access this file, you should be able to see the content of the file using a query like the one shown in the following example:" - ], - "metadata": { - "azdata_cell_guid": "e01663cc-427c-457f-84db-b16d0fca3a90" - } - }, - { - "cell_type": "code", - "source": [ - "select top 10 *\r\n", - "from openrowset(\r\n", - " bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.jsonl',\r\n", - " format = 'csv',\r\n", - " fieldterminator ='0x0b',\r\n", - " fieldquote = '0x0b'\r\n", - " ) with (doc nvarchar(max)) as rows" - ], - "metadata": { - "azdata_cell_guid": "dbc4f12e-388c-49fa-9d85-0fbea3b19d1b" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "This query returns one row for every JSON document, that is, one row per line of the file." - ], - "metadata": { - "azdata_cell_guid": "319c9414-ef30-440f-a3dd-f360a91fa145" - } - }, - { - "cell_type": "markdown", - "source": [ - "## Read JSON file\n", - "\n", - "Synapse SQL enables you to read the entire content of a JSON file as a single text field:" - ], - "metadata": { - "azdata_cell_guid": "d2597313-6223-4b1e-a6fa-979be1a3ce6e" - } - }, - { - "cell_type": "code", - "source": [ - "select top 10 *\r\n", - "from openrowset(\r\n", - " bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.json',\r\n", - " format = 'csv',\r\n", - " fieldterminator ='0x0b',\r\n", - " fieldquote = '0x0b',\r\n", - " rowterminator = '0x0b' --> You need to override rowterminator to read classic JSON\r\n", - " ) with (doc nvarchar(max)) as rows" - ], - "metadata": { - "azdata_cell_guid": "2e5d0ff7-c4ca-45d6-b2fd-cfb83989af1d" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "## Parse JSON document\n", - "\n", - "The query below shows how to use
[JSON\\_VALUE](https://docs.microsoft.com/en-us/sql/t-sql/functions/json-value-transact-sql?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json&view=azure-sqldw-latest \"https://docs.microsoft.com/en-us/sql/t-sql/functions/json-value-transact-sql?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json&view=azure-sqldw-latest\") to retrieve scalar values (date reported, country, number of cases) from the JSON documents. It reads the file through the `covid` external data source, which is created in the CSV and Parquet quickstarts:" - ], - "metadata": { - "azdata_cell_guid": "a373fa76-bfdf-4bb6-8098-73c9ef436eb8" - } - }, - { - "cell_type": "code", - "source": [ - "select\r\n", - " JSON_VALUE(doc, '$.date_rep') AS date_reported,\r\n", - " JSON_VALUE(doc, '$.countries_and_territories') AS country,\r\n", - " JSON_VALUE(doc, '$.cases') as cases,\r\n", - " doc\r\n", - "from openrowset(\r\n", - " bulk 'latest/ecdc_cases.jsonl',\r\n", - " data_source = 'covid',\r\n", - " format = 'csv',\r\n", - " fieldterminator ='0x0b',\r\n", - " fieldquote = '0x0b'\r\n", - " ) with (doc nvarchar(max)) as rows\r\n", - "order by JSON_VALUE(doc, '$.geo_id') desc" - ], - "metadata": { - "azdata_cell_guid": "48b6ee55-09ec-47df-bea5-707dc2f42aa8" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "As an alternative, you can use the `OPENJSON` function to parse the documents:" - ], - "metadata": { - "azdata_cell_guid": "c4145b77-8663-4e59-914b-721955a02635" - } - }, - { - "cell_type": "code", - "source": [ - "select\r\n", - " *\r\n", - "from openrowset(\r\n", - " bulk 'latest/ecdc_cases.jsonl',\r\n", - " data_source = 'covid',\r\n", - " format = 'csv',\r\n", - " fieldterminator ='0x0b',\r\n", - " fieldquote = '0x0b'\r\n", - " ) with (doc nvarchar(max)) as rows\r\n", - " cross apply openjson (doc)\r\n", - " with ( date_rep datetime2,\r\n", - " cases int,\r\n", - " fatal int '$.deaths',\r\n", - " country varchar(100) '$.countries_and_territories')\r\n", - "where country = 'Serbia'\r\n", - "order by country, date_rep desc;" -
], - "metadata": { - "azdata_cell_guid": "f3da158c-c168-45b0-8e38-7ee2d430420f" - }, - "outputs": [], - "execution_count": null - } - ] -} \ No newline at end of file diff --git a/Notebooks/TSQL/Jupiter/_content/quickstarts/parquet.ipynb b/Notebooks/TSQL/Jupiter/_content/quickstarts/parquet.ipynb deleted file mode 100644 index b49c7e4..0000000 --- a/Notebooks/TSQL/Jupiter/_content/quickstarts/parquet.ipynb +++ /dev/null @@ -1,169 +0,0 @@ -{ - "metadata": { - "kernelspec": { - "name": "SQL", - "display_name": "SQL", - "language": "sql" - }, - "language_info": { - "name": "sql", - "version": "" - } - }, - "nbformat_minor": 2, - "nbformat": 4, - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# Query PARQUET files\n", - "\n", - "Serverless Synapse SQL pool enables you to read PARQUET files from Azure storage (Data Lake or Blob storage).\n", - "\n", - "## Read parquet file\n", - "\n", - "The easiest way to see the content of your `PARQUET` file is to provide the file URL to the `OPENROWSET` function and specify `parquet` as the `FORMAT`. If the file is publicly available or if your Azure AD identity can access this file, you should be able to see the content of the file using a query like the one shown in the following example:" - ], - "metadata": { - "azdata_cell_guid": "e01663cc-427c-457f-84db-b16d0fca3a90" - } - }, - { - "cell_type": "code", - "source": [ - "select top 10 *\r\n", - "from openrowset(\r\n", - " bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',\r\n", - " format = 'parquet') as rows" - ], - "metadata": { - "azdata_cell_guid": "dbc4f12e-388c-49fa-9d85-0fbea3b19d1b" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "## Data source usage\n", - "\n", - "The previous example uses the full path to the file.
As an alternative, you can create an external data source with a location that points to the root folder of the storage, and use that data source and the relative path to the file in the `OPENROWSET` function.\n", - "\n", - "First, create an `EXTERNAL DATA SOURCE` in a database:" - ], - "metadata": { - "azdata_cell_guid": "a373fa76-bfdf-4bb6-8098-73c9ef436eb8" - } - }, - { - "cell_type": "code", - "source": [ - "create external data source covid\r\n", - "with ( location = 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases' );" - ], - "metadata": { - "azdata_cell_guid": "48b6ee55-09ec-47df-bea5-707dc2f42aa8" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "Make sure that you create the `EXTERNAL DATA SOURCE` in a database other than `master`. If the storage is protected, you might need to create a database scoped credential and associate it with the data source.\r\n", - "\r\n", - "Once you have configured the data source, you can use it in the `OPENROWSET` function:" - ], - "metadata": { - "azdata_cell_guid": "4e487b5a-657a-4c2d-b24c-d9c755d27e4c" - } - }, - { - "cell_type": "code", - "source": [ - "select top 10 *\r\n", - "from openrowset(\r\n", - " bulk 'latest/ecdc_cases.parquet',\r\n", - " data_source = 'covid',\r\n", - " format = 'parquet'\r\n", - " ) as rows" - ], - "metadata": { - "azdata_cell_guid": "6ab5dd60-2dfe-4c19-a390-0c1505b0bde9" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "## Explicitly specify schema\n", - "\n", - "`OPENROWSET` enables you to explicitly specify the types of the columns that you want to read from the file using the `WITH` clause:" - ], - "metadata": { - "azdata_cell_guid": "c4145b77-8663-4e59-914b-721955a02635" - } - }, - { - "cell_type": "code", - "source": [ - "select top 10 *\r\n", - "from openrowset(\r\n", - " bulk
'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',\r\n", - " format = 'parquet'\r\n", - " ) with ( date_rep date, cases int, geo_id varchar(6) ) as rows" - ], - "metadata": { - "azdata_cell_guid": "f3da158c-c168-45b0-8e38-7ee2d430420f" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "PARQUET data types are by default mapped to SQL types. The following table describes how Parquet types are mapped to SQL native types.\n", - "\n", - "| Parquet type | Parquet logical type (annotation) | SQL data type |\n", - "| --- | --- | --- |\n", - "| BOOLEAN | | bit |\n", - "| BINARY / BYTE\\_ARRAY | | varbinary |\n", - "| DOUBLE | | float |\n", - "| FLOAT | | real |\n", - "| INT32 | | int |\n", - "| INT64 | | bigint |\n", - "| INT96 | | datetime2 |\n", - "| FIXED\\_LEN\\_BYTE\\_ARRAY | | binary |\n", - "| BINARY | UTF8 | varchar \\*(UTF8 collation) |\n", - "| BINARY | STRING | varchar \\*(UTF8 collation) |\n", - "| BINARY | ENUM | varchar \\*(UTF8 collation) |\n", - "| BINARY | UUID | uniqueidentifier |\n", - "| BINARY | DECIMAL | decimal |\n", - "| BINARY | JSON | varchar(max) \\*(UTF8 collation) |\n", - "| BINARY | BSON | varbinary(max) |\n", - "| FIXED\\_LEN\\_BYTE\\_ARRAY | DECIMAL | decimal |\n", - "| BYTE\\_ARRAY | INTERVAL | varchar(max), serialized into standardized format |\n", - "| INT32 | INT(8, true) | smallint |\n", - "| INT32 | INT(16, true) | smallint |\n", - "| INT32 | INT(32, true) | int |\n", - "| INT32 | INT(8, false) | tinyint |\n", - "| INT32 | INT(16, false) | int |\n", - "| INT32 | INT(32, false) | bigint |\n", - "| INT32 | DATE | date |\n", - "| INT32 | DECIMAL | decimal |\n", - "| INT32 | TIME (MILLIS ) | time |\n", - "| INT64 | INT(64, true) | bigint |\n", - "| INT64 | INT(64, false ) | decimal(20,0) |\n", - "| INT64 | DECIMAL | decimal |\n", - "| INT64 | TIME (MICROS / NANOS) | time |\n", - "| INT64 | TIMESTAMP (MILLIS / MICROS / NANOS) | 
datetime2 |\n", - "| [Complex type](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists \"https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists\") | LIST | varchar(max), serialized into JSON |\n", - "| [Complex type](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps \"https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps\") | MAP | varchar(max), serialized into JSON |" - ], - "metadata": { - "azdata_cell_guid": "745b2c81-01eb-4bf5-9cad-47a03dcff194" - } - } - ] -} \ No newline at end of file diff --git a/Notebooks/TSQL/Jupiter/_content/quickstarts/readme.md b/Notebooks/TSQL/Jupiter/_content/quickstarts/readme.md deleted file mode 100644 index e8ef14d..0000000 --- a/Notebooks/TSQL/Jupiter/_content/quickstarts/readme.md +++ /dev/null @@ -1,8 +0,0 @@ -# Azure Synapse Analytics quick-starts - -This book contains quick-start samples that demonstrate how to query the following types of files: -- [PARQUET](parquet.ipynb) -- [CSV](csv.ipynb) -- [JSON](json.ipynb) - -Open a notebook, select the SQL kernel, and connect to your Synapse SQL endpoint. Follow the instructions in the quick-start samples. diff --git a/Notebooks/TSQL/Jupiter/_content/readme.md b/Notebooks/TSQL/Jupiter/_content/readme.md deleted file mode 100644 index 86fae14..0000000 --- a/Notebooks/TSQL/Jupiter/_content/readme.md +++ /dev/null @@ -1,26 +0,0 @@ -# Azure Synapse Analytics - -This book contains tutorials that demonstrate how to use the serverless Synapse SQL pool to analyze data in Azure Storage. - -## Prerequisites - -To run the tutorials, you need a Synapse Analytics workspace. - -If you don't have one, you can deploy a workspace with underlying Data Lake Storage. Select the **Deploy to Azure** button to deploy the workspace. The template opens in the Azure portal. - -Deploy to Azure - -If you don't have an Azure subscription, create a free account before you begin.

- -The template defines two resources: -- Storage account -- Workspace - -## Samples - -This book contains the following samples: - -- Quick-start samples - reading [PARQUET](quickstarts/parquet.ipynb), [CSV](quickstarts/csv.ipynb) and [JSON](quickstarts/json.ipynb) -- Tutorials - [Analyze COVID data set provided by ECDC](tutorials/covid-ecdc.ipynb) and [Analyze NY Taxi rides](tutorials/ny-taxi.ipynb) - -Open one of these notebooks, select the SQL kernel, and connect to your Synapse SQL endpoint. Follow the instructions in the tutorials to run the samples. diff --git a/Notebooks/TSQL/Jupiter/_content/tutorials/covid-ecdc.ipynb b/Notebooks/TSQL/Jupiter/_content/tutorials/covid-ecdc.ipynb deleted file mode 100644 index 61b20a2..0000000 --- a/Notebooks/TSQL/Jupiter/_content/tutorials/covid-ecdc.ipynb +++ /dev/null @@ -1,257 +0,0 @@ -{ - "metadata": { - "kernelspec": { - "name": "SQL", - "display_name": "SQL", - "language": "sql" - }, - "language_info": { - "name": "sql", - "version": "" - } - }, - "nbformat_minor": 2, - "nbformat": 4, - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# Analyze ECDC COVID data using Azure Synapse serverless SQL pool\n", - "\n", - "In this notebook, you will see how you can analyze the distribution of COVID cases reported in Serbia (Europe) using the serverless Synapse SQL endpoint in Synapse Analytics. The Synapse SQL engine is well suited for ad-hoc analytics by data analysts with T-SQL skills. The data set is stored in [Azure storage](https://azure.microsoft.com/en-us/services/open-datasets/catalog/ecdc-covid-19-cases/ \"https://azure.microsoft.com/en-us/services/open-datasets/catalog/ecdc-covid-19-cases/\") in Parquet format.
\n", - "\n", - "## Explore your data\n", - "\n", - "As a first step, we need to explore the data in the file placed in Azure storage using the `OPENROWSET` function:" - ], - "metadata": { - "azdata_cell_guid": "b0bfbef2-8271-48da-be46-9d102c04ae3e" - } - }, - { - "cell_type": "code", - "source": [ - "select top 10 *\r\n", - "from openrowset(bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',\r\n", - " format='parquet') as a" - ], - "metadata": { - "azdata_cell_guid": "fef36ba3-39d7-4dda-a544-29d301e85724" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "Here we can see that some of the columns that are interesting for analysis are `DATE_REP` and `CASES`. We want to analyze the number of cases reported in Serbia, so we need to filter the results using the `GEO_ID` column.\n", - "\n", - "We are not sure what the `geo_id` value for Serbia is, so we will find all distinct countries and `geo_id` values where the country name contains `ser`:" - ], - "metadata": { - "azdata_cell_guid": "892b5348-a006-45bb-bbab-9d3eb935c643" - } - }, - { - "cell_type": "code", - "source": [ - "select distinct countries_and_territories, geo_id\r\n", - "from openrowset(bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',\r\n", - " format='parquet') as a\r\n", - "where countries_and_territories like '%ser%'" - ], - "metadata": { - "azdata_cell_guid": "d9a34e8f-8ee8-498a-87c0-d0210ba08bd0" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "Since we see that `GEO_ID` for Serbia is `RS`, we can find the daily number of cases in Serbia:" - ], - "metadata": { - "azdata_cell_guid": "934198a6-a06b-42cb-a027-3f30614ca0f6" - } - }, - { - "cell_type": "code", - "source": [ - "select DATE_REP, CASES\r\n", - "from openrowset(bulk
'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',\r\n", - " format='parquet') as a\r\n", - "where geo_id = 'RS'\r\n", - "order by date_rep" - ], - "metadata": { - "azdata_cell_guid": "c399cbd1-bf74-4ea4-8fa4-72c486270ae7" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "We can switch to the chart view to see the trend of reported COVID cases in Serbia. Looking at this chart, we can see that the peak is somewhere between 15th and 20th April, and the peak of the second wave is in the second half of July.\n", - "\n", - "The points on the time series chart are shown on a daily basis, so the curve has a lot of day-to-day variation. To smooth it, you might want to show average values calculated over a window of +/- 2 days. T-SQL lets you calculate such moving averages by specifying a window frame:\n", - "\n", - "```\n", - "AVG(CASES) OVER(order by date_rep ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING )\n", - "```\n", - "\n", - "You need to specify how to order the data within the window and how many preceding/following rows the AVG function should use to calculate the average value. Note that `ROWS` counts rows rather than days, so this assumes there is one row per day.
The time series query that uses average values is shown in the following code:" - ], - "metadata": { - "azdata_cell_guid": "91e78e94-3aa0-46a4-ba9c-2b0a046f0f6c" - } - }, - { - "cell_type": "code", - "source": [ - "select DATE_REP,\r\n", - " CASES_AVG = AVG(CASES) OVER(ORDER BY date_rep ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING )\r\n", - "from openrowset(bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet', format='parquet') as a\r\n", - "where geo_id = 'RS'\r\n", - "order by date_rep" - ], - "metadata": { - "azdata_cell_guid": "75f965dd-4565-4561-9dcd-fbf517bb5250" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "We can also show cumulative values to see how the number of cases increases over time (this is known as a running total):" - ], - "metadata": { - "azdata_cell_guid": "3d861e0d-a2a6-4599-a84d-620479c66c68" - } - }, - { - "cell_type": "code", - "source": [ - "select DATE_REP,\r\n", - " CUMULATIVE = SUM(CASES) OVER (ORDER BY date_rep)\r\n", - "from openrowset(bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',\r\n", - " format='parquet') as a\r\n", - "where geo_id = 'RS'\r\n", - "order by date_rep" - ], - "metadata": { - "azdata_cell_guid": "f7b7457e-2e58-43f3-908e-0e564b262a7b" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "If we switch to the chart view, we can see the cumulative number of cases reported since the first COVID case.\n", - "\n", - "T-SQL enables us to easily look up the number of cases reported a given number of days before or after the current day using the LAG and LEAD functions.
The following query returns, next to each day's count, the number of cases reported 7 days earlier:" - ], - "metadata": { - "azdata_cell_guid": "2dc5f582-3371-41fb-9e83-293a20876beb" - } - }, - { - "cell_type": "code", - "source": [ - "select TOP 10 date_rep,\r\n", - " cases,\r\n", - " prev = LAG(CASES, 7) OVER(partition by geo_id order by date_rep )\r\n", - "from openrowset(bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',\r\n", - " format='parquet') as a\r\n", - "where geo_id = 'RS'\r\n", - "order by date_rep desc;" - ], - "metadata": { - "azdata_cell_guid": "3ce22b19-d0c8-4e33-ac99-65616680fbb7" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "Notice in the result that the `prev` column lags 7 days behind the `cases` column. Now we can easily compare the current number of reported cases with the number reported a week earlier, for example as a week-over-week (WoW) increase:\n", - "\n", - "```\n", - "WoW% = (cases - prev) / prev\n", - " = cases/prev - 1\n", - "```\n", - "\n", - "Instead of simply comparing the current and previous values, we can make the comparison more reliable by first calculating the average values over 7-day windows and then calculating the increase from these averages:" - ], - "metadata": { - "azdata_cell_guid": "6a888617-355b-4b9d-ac22-352aa0661a20" - } - }, - { - "cell_type": "code", - "source": [ - "with ecdc as (\r\n", - " select\r\n", - " date_rep,\r\n", - " cases = AVG(CASES) OVER(partition by geo_id order by date_rep ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ),\r\n", - " prev = AVG(CASES) OVER(partition by geo_id order by date_rep ROWS BETWEEN 13 PRECEDING AND 7 PRECEDING )\r\n", - " from\r\n", - " openrowset(bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',\r\n", - " format='parquet') as a\r\n", - " where\r\n", - " geo_id = 'RS'\r\n", - ")\r\n", - "select date_rep, cases, prev, [WoW%] =
100*(1.0*cases/prev - 1)\r\n", - "from ecdc\r\n", - "where prev > 10\r\n", - "order by date_rep desc;" - ], - "metadata": { - "azdata_cell_guid": "c5828567-0d08-4f43-bfd1-34c78e643343" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "This query calculates the average number of cases in each 7-day window and the week-over-week change.\n", - "\n", - "We can go a step further and run the same analysis across all countries in the world to calculate weekly changes and find the countries with the highest increase in COVID cases compared to the previous week." - ], - "metadata": { - "azdata_cell_guid": "08ea9412-2c60-4a7c-affc-2eb6115f20b3" - } - }, - { - "cell_type": "code", - "source": [ - "with weekly_cases as (\r\n", - " select geo_id, date_rep, country = countries_and_territories,\r\n", - " current_avg = AVG(CASES) OVER(partition by geo_id order by date_rep ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ),\r\n", - " prev_avg = AVG(CASES) OVER(partition by geo_id order by date_rep ROWS BETWEEN 13 PRECEDING AND 7 PRECEDING )\r\n", - " from openrowset(bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',\r\n", - " format='parquet') as a \r\n", - ")\r\n", - "select top 10 \r\n", - " country, \r\n", - " current_avg,\r\n", - " prev_avg, \r\n", - " [WoW%] = CAST((100*(1.* current_avg / prev_avg - 1)) AS smallint)\r\n", - "from weekly_cases\r\n", - "where date_rep = CONVERT(date, DATEADD(DAY, -1, GETDATE()), 23)\r\n", - "and prev_avg > 100\r\n", - "order by (1.
* current_avg / prev_avg -1) desc" - ], - "metadata": { - "azdata_cell_guid": "f338436e-d98d-4c1b-8c61-3ad838059cc0" - }, - "outputs": [], - "execution_count": null - } - ] -} \ No newline at end of file diff --git a/Notebooks/TSQL/Jupiter/_content/tutorials/ny-taxi.ipynb b/Notebooks/TSQL/Jupiter/_content/tutorials/ny-taxi.ipynb deleted file mode 100644 index f74d790..0000000 --- a/Notebooks/TSQL/Jupiter/_content/tutorials/ny-taxi.ipynb +++ /dev/null @@ -1,270 +0,0 @@ -{ - "metadata": { - "kernelspec": { - "name": "SQL", - "display_name": "SQL", - "language": "sql" - }, - "language_info": { - "name": "sql", - "version": "" - } - }, - "nbformat_minor": 2, - "nbformat": 4, - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# Analyze NY taxi data\n", - "In this tutorial, we will perform exploratory data analysis by combining different Azure Open Datasets using serverless SQL and then visualizing the results in Azure Data Studio. In particular, you analyze the [New York City (NYC) Taxi dataset](https://azure.microsoft.com/services/open-datasets/catalog/nyc-taxi-limousine-commission-yellow-taxi-trip-records/ \"https://azure.microsoft.com/services/open-datasets/catalog/nyc-taxi-limousine-commission-yellow-taxi-trip-records/\").\n", - "You can learn more about the meaning of the individual columns in the descriptions of the [NYC Taxi](https://azure.microsoft.com/services/open-datasets/catalog/nyc-taxi-limousine-commission-yellow-taxi-trip-records/ \"https://azure.microsoft.com/services/open-datasets/catalog/nyc-taxi-limousine-commission-yellow-taxi-trip-records/\"), [Public Holidays](https://azure.microsoft.com/services/open-datasets/catalog/public-holidays/ \"https://azure.microsoft.com/services/open-datasets/catalog/public-holidays/\"), and [Weather Data](https://azure.microsoft.com/services/open-datasets/catalog/noaa-integrated-surface-data/ \"https://azure.microsoft.com/services/open-datasets/catalog/noaa-integrated-surface-data/\") Azure open datasets.\n", 
- "Let's first get familiar with the NYC Taxi data by running the following query:" - ], - "metadata": { - "azdata_cell_guid": "e01663cc-427c-457f-84db-b16d0fca3a90" - } - }, - { - "cell_type": "code", - "source": [ - "SELECT TOP 10 * FROM\r\n", - " OPENROWSET(\r\n", - " BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=*/puMonth=*/*.parquet',\r\n", - " FORMAT='PARQUET'\r\n", - " ) AS [nyc]" - ], - "metadata": { - "azdata_cell_guid": "dbc4f12e-388c-49fa-9d85-0fbea3b19d1b" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "Similarly, we can explore the Public Holidays dataset:" - ], - "metadata": { - "azdata_cell_guid": "a373fa76-bfdf-4bb6-8098-73c9ef436eb8" - } - }, - { - "cell_type": "code", - "source": [ - "SELECT TOP 10 * FROM\r\n", - " OPENROWSET(\r\n", - " BULK 'https://azureopendatastorage.blob.core.windows.net/holidaydatacontainer/Processed/*.parquet',\r\n", - " FORMAT='PARQUET'\r\n", - " ) AS [holidays]" - ], - "metadata": { - "azdata_cell_guid": "48b6ee55-09ec-47df-bea5-707dc2f42aa8" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "Lastly, we can also explore the Weather Data dataset by using the following query:" - ], - "metadata": { - "azdata_cell_guid": "c4145b77-8663-4e59-914b-721955a02635" - } - }, - { - "cell_type": "code", - "source": [ - "SELECT\r\n", - " TOP 10 *\r\n", - "FROM \r\n", - " OPENROWSET(\r\n", - " BULK 'https://azureopendatastorage.blob.core.windows.net/isdweatherdatacontainer/ISDWeather/year=*/month=*/*.parquet',\r\n", - " FORMAT='PARQUET'\r\n", - " ) AS [weather]" - ], - "metadata": { - "azdata_cell_guid": "f3da158c-c168-45b0-8e38-7ee2d430420f" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "## Time series, seasonality, and outlier analysis\n", - "\n", - "We can easily summarize the yearly number of taxi rides by using the following query:" - ], - 
"metadata": { - "azdata_cell_guid": "745b2c81-01eb-4bf5-9cad-47a03dcff194" - } - }, - { - "cell_type": "code", - "source": [ - "SELECT\r\n", - " YEAR(tpepPickupDateTime) AS current_year,\r\n", - " COUNT(*) AS rides_per_year\r\n", - "FROM\r\n", - " OPENROWSET(\r\n", - " BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=*/puMonth=*/*.parquet',\r\n", - " FORMAT='PARQUET'\r\n", - " ) AS [nyc]\r\n", - "WHERE nyc.filepath(1) >= '2009' AND nyc.filepath(1) <= '2019'\r\n", - "GROUP BY YEAR(tpepPickupDateTime)\r\n", - "ORDER BY 1 ASC" - ], - "metadata": { - "azdata_cell_guid": "e7bacd03-45d4-4b0b-b1d0-9522e1a54436" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "The data can be visualized in Azure Data Studio by switching from the Table to the Chart view. You can choose among different chart types, such as Area, Bar, Column, Line, Pie, and Scatter.  \n", - "\n", - "From this visualization, a trend of a decreasing number of rides over years can be clearly seen. Presumably, this decrease is due to the recent increased popularity of ride-sharing companies.\n", - "\n", - "Next, let's focus the analysis on a single year, for example, 2016. 
The following query returns the daily number of rides during that year:" - ], - "metadata": { - "azdata_cell_guid": "4397f453-4b20-4083-ae0e-4966d789993f" - } - }, - { - "cell_type": "code", - "source": [ - "SELECT\r\n", - " CAST([tpepPickupDateTime] AS DATE) AS [current_day],\r\n", - " COUNT(*) as rides_per_day\r\n", - "FROM\r\n", - " OPENROWSET(\r\n", - " BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=*/puMonth=*/*.parquet',\r\n", - " FORMAT='PARQUET'\r\n", - " ) AS [nyc]\r\n", - "WHERE nyc.filepath(1) = '2016'\r\n", - "GROUP BY CAST([tpepPickupDateTime] AS DATE)\r\n", - "ORDER BY 1 ASC" - ], - "metadata": { - "azdata_cell_guid": "e01007a4-adda-460a-83e0-e45a789c80cb" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "Again, we can easily visualize data by plotting the Column chart with the Category column set to `current_day` and the Legend (series) column set to `rides_per_day`.\n", - "\n", - "From the plot chart, you can see there's a weekly pattern, with Saturdays as the peak day. During summer months, there are fewer taxi rides because of vacations. 
Also, notice some significant drops in the number of taxi rides without a clear pattern of when and why they occur.\n", - "\n", - "Next, let's see if the drops correlate with public holidays by joining the NYC Taxi rides dataset with the Public Holidays dataset:" - ], - "metadata": { - "azdata_cell_guid": "455f1a05-87bc-4a83-8856-7ac9adc76af0" - } - }, - { - "cell_type": "code", - "source": [ - "WITH\r\n", - "taxi_rides AS\r\n", - "(\r\n", - " SELECT\r\n", - " CAST([tpepPickupDateTime] AS DATE) AS [current_day],\r\n", - " COUNT(*) as rides_per_day\r\n", - " FROM \r\n", - " OPENROWSET(\r\n", - " BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=*/puMonth=*/*.parquet',\r\n", - " FORMAT='PARQUET'\r\n", - " ) AS [nyc]\r\n", - " WHERE nyc.filepath(1) = '2016'\r\n", - " GROUP BY CAST([tpepPickupDateTime] AS DATE)\r\n", - "),\r\n", - "public_holidays AS\r\n", - "(\r\n", - " SELECT\r\n", - " 500000 as holiday,\r\n", - " date\r\n", - " FROM\r\n", - " OPENROWSET(\r\n", - " BULK 'https://azureopendatastorage.blob.core.windows.net/holidaydatacontainer/Processed/*.parquet',\r\n", - " FORMAT='PARQUET'\r\n", - " ) AS [holidays]\r\n", - " WHERE countryorregion = 'United States' AND YEAR(date) = 2016\r\n", - ")\r\n", - "SELECT\r\n", - "*\r\n", - "FROM taxi_rides t\r\n", - "LEFT OUTER JOIN public_holidays p on t.current_day = p.date\r\n", - "ORDER BY current_day ASC" - ], - "metadata": { - "azdata_cell_guid": "710cb813-a14d-4daa-8436-7a0086e4381f" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "From the plot chart, you can see that during public holidays the number of taxi rides is lower. There's still one unexplained large drop on January 23. 
Let's check the weather in NYC on that day by querying the Weather Data dataset:" - ], - "metadata": { - "azdata_cell_guid": "a2160c53-23d2-4243-94ad-7c3408c49080" - } - }, - { - "cell_type": "code", - "source": [ - "SELECT\r\n", - " AVG(windspeed) AS avg_windspeed,\r\n", - " MIN(windspeed) AS min_windspeed,\r\n", - " MAX(windspeed) AS max_windspeed,\r\n", - " AVG(temperature) AS avg_temperature,\r\n", - " MIN(temperature) AS min_temperature,\r\n", - " MAX(temperature) AS max_temperature,\r\n", - " AVG(sealvlpressure) AS avg_sealvlpressure,\r\n", - " MIN(sealvlpressure) AS min_sealvlpressure,\r\n", - " MAX(sealvlpressure) AS max_sealvlpressure,\r\n", - " AVG(precipdepth) AS avg_precipdepth,\r\n", - " MIN(precipdepth) AS min_precipdepth,\r\n", - " MAX(precipdepth) AS max_precipdepth,\r\n", - " AVG(snowdepth) AS avg_snowdepth,\r\n", - " MIN(snowdepth) AS min_snowdepth,\r\n", - " MAX(snowdepth) AS max_snowdepth\r\n", - "FROM\r\n", - " OPENROWSET(\r\n", - " BULK 'https://azureopendatastorage.blob.core.windows.net/isdweatherdatacontainer/ISDWeather/year=*/month=*/*.parquet',\r\n", - " FORMAT='PARQUET'\r\n", - " ) AS [weather]\r\n", - "WHERE countryorregion = 'US' AND CAST([datetime] AS DATE) = '2016-01-23' AND stationname = 'JOHN F KENNEDY INTERNATIONAL AIRPORT'" - ], - "metadata": { - "azdata_cell_guid": "aa7087b1-fb8d-43cc-a580-afa1ff6d9741" - }, - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "source": [ - "The results of the query indicate that the drop in the number of taxi rides occurred because:\n", - "\n", - "- There was a blizzard on that day in NYC with heavy snow (~30 cm).\n", - "- It was cold (temperature was below zero degrees Celsius).\n", - "- It was windy (~10 m/s).\n", - "\n", - "This tutorial has shown how a data analyst can quickly perform exploratory data analysis, easily combine different datasets by using serverless Synapse SQL pool, and visualize the results by using Azure Data Studio." 
- ], - "metadata": { - "azdata_cell_guid": "be7f6d54-0301-4fbd-be89-56472649e6f9" - } - } - ] -} \ No newline at end of file diff --git a/Notebooks/TSQL/Jupiter/_content/tutorials/readme.md b/Notebooks/TSQL/Jupiter/_content/tutorials/readme.md deleted file mode 100644 index ac164c1..0000000 --- a/Notebooks/TSQL/Jupiter/_content/tutorials/readme.md +++ /dev/null @@ -1,8 +0,0 @@ -# Azure Synapse Analytics tutorials - -This book contains tutorials that demo how to use serverless Synapse SQL pool to analyze data on Azure Storage. This book contains two tutorials: - -- [Analyze COVID ECDC data set](covid-ecdc.ipynb) - In this tutorial, we will perform exploratory data analysis by combining different Azure Open Datasets using serverless SQL and then visualizing the results in Azure Data Studio. In particular, you analyze the New York City (NYC) Taxi dataset. You can learn more about the meaning of the individual columns in the descriptions of the NYC Taxi, Public Holidays, and Weather Data Azure open datasets. -- [Analyze NY Taxi rides](ny-taxi.ipynb) - In this tutorial, you learn how to perform exploratory data analysis by combining different Azure Open Datasets using serverless Synapse SQL pool and then visualizing the results in Azure Synapse Studio. - -Open a notebook, select SQL kernel and connect to your Synapse SQL endpoint. Follow the instructions in the tutorials. 
diff --git a/Notebooks/TSQL/Jupiter/_data/toc.yml b/Notebooks/TSQL/Jupiter/_data/toc.yml index 2521fd7..aef4d86 100644 --- a/Notebooks/TSQL/Jupiter/_data/toc.yml +++ b/Notebooks/TSQL/Jupiter/_data/toc.yml @@ -3,6 +3,19 @@ - title: Search search: true +- title: Quick-starts + url: /quickstarts/readme + not_numbered: true + expand_sections: true + sections: + - title: Read Parquet files + url: /quickstarts/parquet + - title: Read CSV files + url: /quickstarts/csv + - title: Read Delta Lake folders + url: /quickstarts/delta-lake + - title: Read JSON files + url: /quickstarts/json - title: Tutorials url: /tutorials/readme not_numbered: true diff --git a/Notebooks/TSQL/Jupiter/content/quickstarts/delta-lake.ipynb b/Notebooks/TSQL/Jupiter/content/quickstarts/delta-lake.ipynb new file mode 100644 index 0000000..d0a9575 --- /dev/null +++ b/Notebooks/TSQL/Jupiter/content/quickstarts/delta-lake.ipynb @@ -0,0 +1,1852 @@ +{ + "metadata": { + "kernelspec": { + "name": "SQL", + "display_name": "SQL", + "language": "sql" + }, + "language_info": { + "name": "sql", + "version": "" + } + }, + "nbformat_minor": 2, + "nbformat": 4, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Query Delta Lake folders\n", + "\n", + "Serverless Synapse SQL pool enables you to read Delta Lake files from Azure storage (DataLake or blob storage).\n", + "\n", + "![Delta Lake folder](img/covid-delta-lake-studio.png)\n", + "\n", + "## Read Delta Lake folder\n", + "\n", + "The easiest way to see the content of your Delta Lake folder is to provide the Delta Lake folder URL to the `OPENROWSET` function and specify format `DELTA`.
If the folder is publicly available or your Azure AD identity can access it, you should be able to see its content by using a query like the one in the following example:" + ], + "metadata": { + "azdata_cell_guid": "e01663cc-427c-457f-84db-b16d0fca3a90" + } + }, + { + "cell_type": "code", + "source": [ + "select top 10 *\r\n", + "from openrowset(\r\n", + " bulk 'https://sqlondemandstorage.blob.core.windows.net/delta-lake/covid/',\r\n", + " format = 'delta') as rows" + ], + "metadata": { + "azdata_cell_guid": "dbc4f12e-388c-49fa-9d85-0fbea3b19d1b" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/html": "Statement ID: {9D076051-65F3-4173-803E-965E3F63F229} | Query hash: 0x2607A6E4C1BCC82C | Distributed request ID: {F61831C7-0B1C-4C10-8A10-D3D05FE31CE6}. Total size of data scanned is 1 megabytes, total size of data moved is 1 megabytes, total size of data written is 0 megabytes." + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/html": "(10 rows affected)" + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/html": "Total execution time: 00:00:02.412" + }, + "metadata": {} + }, + { + "output_type": "execute_result", + "metadata": { + "resultSet": { + "id": 0, + "batchId": 0, + "rowCount": 10, + "complete": true, + "columnInfo": [ + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 31, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "date_rep", + "columnOrdinal": 0, + "columnSize": 3, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly":
false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.DateTime, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "date" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 16, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "day", + "columnOrdinal": 1, + "columnSize": 2, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 5, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.Int16, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "smallint" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 16, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "month", + "columnOrdinal": 2, + "columnSize": 2, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 5, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.Int16, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "smallint" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 16, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "year", + "columnOrdinal": 3, + "columnSize": 2, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 5, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.Int16, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "smallint" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 16, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "cases", + "columnOrdinal": 4, + "columnSize": 2, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 5, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.Int16, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "smallint" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 
16, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "deaths", + "columnOrdinal": 5, + "columnSize": 2, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 5, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.Int16, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "smallint" + }, + { + "isBytes": false, + "isChars": true, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 22, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "countries_and_territories", + "columnOrdinal": 6, + "columnSize": 8000, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.String, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "varchar" + }, + { + "isBytes": false, + "isChars": true, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 22, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + 
"baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "geo_id", + "columnOrdinal": 7, + "columnSize": 8000, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.String, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "varchar" + }, + { + "isBytes": false, + "isChars": true, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 22, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "country_territory_code", + "columnOrdinal": 8, + "columnSize": 8000, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.String, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "varchar" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 8, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "pop_data_2018", + "columnOrdinal": 9, + "columnSize": 4, + "isAliased": null, + 
"isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 10, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.Int32, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "int" + }, + { + "isBytes": false, + "isChars": true, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 22, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "continent_exp", + "columnOrdinal": 10, + "columnSize": 8000, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.String, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "varchar" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 33, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "load_date", + "columnOrdinal": 11, + "columnSize": 6, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + 
"numericPrecision": 255, + "numericScale": 0, + "udtAssemblyQualifiedName": null, + "dataType": "System.DateTime, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "datetime2" + }, + { + "isBytes": false, + "isChars": true, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 22, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "iso_country", + "columnOrdinal": 12, + "columnSize": 8000, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.String, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "varchar" + } + ], + "specialAction": { + "none": true, + "expectYukonXMLShowPlan": false + } + } + }, + "execution_count": 7, + "data": { + "application/vnd.dataresource+json": { + "schema": { + "fields": [ + { + "name": "date_rep" + }, + { + "name": "day" + }, + { + "name": "month" + }, + { + "name": "year" + }, + { + "name": "cases" + }, + { + "name": "deaths" + }, + { + "name": "countries_and_territories" + }, + { + "name": "geo_id" + }, + { + "name": "country_territory_code" + }, + { + "name": "pop_data_2018" + }, + { + "name": "continent_exp" + }, + { + "name": "load_date" + }, + { + "name": "iso_country" + } + ] + }, + "data": [ + { + "0": "2020-12-14", + "1": "14", + "2": "12", + "3": "2020", + "4": "746", + "5": "6", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, 
+ { + "0": "2020-12-13", + "1": "13", + "2": "12", + "3": "2020", + "4": "298", + "5": "9", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-12", + "1": "12", + "2": "12", + "3": "2020", + "4": "113", + "5": "11", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-11", + "1": "11", + "2": "12", + "3": "2020", + "4": "63", + "5": "10", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-10", + "1": "10", + "2": "12", + "3": "2020", + "4": "202", + "5": "16", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-09", + "1": "9", + "2": "12", + "3": "2020", + "4": "135", + "5": "13", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-08", + "1": "8", + "2": "12", + "3": "2020", + "4": "200", + "5": "6", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-07", + "1": "7", + "2": "12", + "3": "2020", + "4": "210", + "5": "26", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-06", + "1": "6", + "2": "12", + "3": "2020", + "4": "234", + "5": "10", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-05", + "1": "5", + "2": "12", + "3": "2020", + "4": "235", + "5": "18", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + } + ] + }, + 
"text/html": [ + "", + "", + "", + "", + "", + "", + "", + "", + "", + "", + "", + "", + "
" + ] + } + } + ], + "execution_count": 7 + }, + { + "cell_type": "markdown", + "source": [ + "## Data source usage\n", + "\n", + "The previous example uses the full path to the Delta Lake folder. As an alternative, you can create an external data source whose location points to the root folder of the storage, and then pass that data source together with the path relative to it to the `OPENROWSET` function.\n", + "\n", + "First, create an `EXTERNAL DATA SOURCE` in a database:" + ], + "metadata": { + "azdata_cell_guid": "a373fa76-bfdf-4bb6-8098-73c9ef436eb8" + } + }, + { + "cell_type": "code", + "source": [ + "create external data source DeltaLakeStorage\r\n", + "with ( location = 'https://sqlondemandstorage.blob.core.windows.net/delta-lake/' );" + ], + "metadata": { + "azdata_cell_guid": "48b6ee55-09ec-47df-bea5-707dc2f42aa8" + }, + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "source": [ + "Make sure that you create the `EXTERNAL DATA SOURCE` in a database other than `master`. If the storage location is protected, you might need to create a credential and associate it with the data source.\r\n", + "\r\n", + "Once the data source is configured, you can use it in the `OPENROWSET` function:" + ], + "metadata": { + "azdata_cell_guid": "4e487b5a-657a-4c2d-b24c-d9c755d27e4c" + } + }, + { + "cell_type": "code", + "source": [ + "select top 10 *\r\n", + "from openrowset(\r\n", + " bulk 'covid',\r\n", + " data_source = 'DeltaLakeStorage',\r\n", + " format = 'delta'\r\n", + " ) as rows" + ], + "metadata": { + "azdata_cell_guid": "6ab5dd60-2dfe-4c19-a390-0c1505b0bde9" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/html": "Statement ID: {541C6B53-20E8-4B96-8EAD-B2CC3C6025F2} | Query hash: 0x2607A6E4C1BCC82C | Distributed request ID: {FEE63FA3-871C-4E0E-83C4-B5587C2AF801}. Total size of data scanned is 1 megabytes, total size of data moved is 1 megabytes, total size of data written is 0 megabytes."
+ }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/html": "(10 rows affected)" + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/html": "Total execution time: 00:00:02.790" + }, + "metadata": {} + }, + { + "output_type": "execute_result", + "metadata": { + "resultSet": { + "id": 0, + "batchId": 0, + "rowCount": 10, + "complete": true, + "columnInfo": [ + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 31, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "date_rep", + "columnOrdinal": 0, + "columnSize": 3, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.DateTime, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "date" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 16, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "day", + "columnOrdinal": 1, + "columnSize": 2, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + 
"numericPrecision": 5, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.Int16, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "smallint" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 16, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "month", + "columnOrdinal": 2, + "columnSize": 2, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 5, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.Int16, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "smallint" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 16, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "year", + "columnOrdinal": 3, + "columnSize": 2, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 5, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.Int16, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + 
"dataTypeName": "smallint" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 16, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "cases", + "columnOrdinal": 4, + "columnSize": 2, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 5, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.Int16, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "smallint" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 16, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "deaths", + "columnOrdinal": 5, + "columnSize": 2, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 5, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.Int16, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "smallint" + }, + { + "isBytes": false, + "isChars": true, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 22, + "isHierarchyId": false, + 
"isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "countries_and_territories", + "columnOrdinal": 6, + "columnSize": 8000, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.String, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "varchar" + }, + { + "isBytes": false, + "isChars": true, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 22, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "geo_id", + "columnOrdinal": 7, + "columnSize": 8000, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.String, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "varchar" + }, + { + "isBytes": false, + "isChars": true, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 22, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + 
"baseServerName": null, + "baseTableName": null, + "columnName": "country_territory_code", + "columnOrdinal": 8, + "columnSize": 8000, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.String, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "varchar" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 8, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "pop_data_2018", + "columnOrdinal": 9, + "columnSize": 4, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 10, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.Int32, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "int" + }, + { + "isBytes": false, + "isChars": true, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 22, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "continent_exp", + "columnOrdinal": 10, + "columnSize": 8000, + "isAliased": null, + "isAutoIncrement": false, + 
"isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.String, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "varchar" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 33, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "load_date", + "columnOrdinal": 11, + "columnSize": 6, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 0, + "udtAssemblyQualifiedName": null, + "dataType": "System.DateTime, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "datetime2" + }, + { + "isBytes": false, + "isChars": true, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 22, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "iso_country", + "columnOrdinal": 12, + "columnSize": 8000, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + 
"numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.String, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "varchar" + } + ], + "specialAction": { + "none": true, + "expectYukonXMLShowPlan": false + } + } + }, + "execution_count": 5, + "data": { + "application/vnd.dataresource+json": { + "schema": { + "fields": [ + { + "name": "date_rep" + }, + { + "name": "day" + }, + { + "name": "month" + }, + { + "name": "year" + }, + { + "name": "cases" + }, + { + "name": "deaths" + }, + { + "name": "countries_and_territories" + }, + { + "name": "geo_id" + }, + { + "name": "country_territory_code" + }, + { + "name": "pop_data_2018" + }, + { + "name": "continent_exp" + }, + { + "name": "load_date" + }, + { + "name": "iso_country" + } + ] + }, + "data": [ + { + "0": "2020-12-14", + "1": "14", + "2": "12", + "3": "2020", + "4": "746", + "5": "6", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-13", + "1": "13", + "2": "12", + "3": "2020", + "4": "298", + "5": "9", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-12", + "1": "12", + "2": "12", + "3": "2020", + "4": "113", + "5": "11", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-11", + "1": "11", + "2": "12", + "3": "2020", + "4": "63", + "5": "10", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-10", + "1": "10", + "2": "12", + "3": "2020", + "4": "202", + "5": "16", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-09", + "1": "9", + "2": "12", + 
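"3": "2020", + "4": "135", + "5": "13", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-08", + "1": "8", + "2": "12", + "3": "2020", + "4": "200", + "5": "6", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-07", + "1": "7", + "2": "12", + "3": "2020", + "4": "210", + "5": "26", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-06", + "1": "6", + "2": "12", + "3": "2020", + "4": "234", + "5": "10", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + }, + { + "0": "2020-12-05", + "1": "5", + "2": "12", + "3": "2020", + "4": "235", + "5": "18", + "6": "Afghanistan", + "7": "AF", + "8": "AFG", + "9": "NULL", + "10": "Asia", + "11": "2021-05-11 00:07:13", + "12": "AF" + } + ] + }, + "text/html": [ + "<table>",
+ "<tr><th>date_rep</th><th>day</th><th>month</th><th>year</th><th>cases</th><th>deaths</th><th>countries_and_territories</th><th>geo_id</th><th>country_territory_code</th><th>pop_data_2018</th><th>continent_exp</th><th>load_date</th><th>iso_country</th></tr>",
+ "<tr><td>2020-12-14</td><td>14</td><td>12</td><td>2020</td><td>746</td><td>6</td><td>Afghanistan</td><td>AF</td><td>AFG</td><td>NULL</td><td>Asia</td><td>2021-05-11 00:07:13</td><td>AF</td></tr>",
+ "<tr><td>2020-12-13</td><td>13</td><td>12</td><td>2020</td><td>298</td><td>9</td><td>Afghanistan</td><td>AF</td><td>AFG</td><td>NULL</td><td>Asia</td><td>2021-05-11 00:07:13</td><td>AF</td></tr>",
+ "<tr><td>2020-12-12</td><td>12</td><td>12</td><td>2020</td><td>113</td><td>11</td><td>Afghanistan</td><td>AF</td><td>AFG</td><td>NULL</td><td>Asia</td><td>2021-05-11 00:07:13</td><td>AF</td></tr>",
+ "<tr><td>2020-12-11</td><td>11</td><td>12</td><td>2020</td><td>63</td><td>10</td><td>Afghanistan</td><td>AF</td><td>AFG</td><td>NULL</td><td>Asia</td><td>2021-05-11 00:07:13</td><td>AF</td></tr>",
+ "<tr><td>2020-12-10</td><td>10</td><td>12</td><td>2020</td><td>202</td><td>16</td><td>Afghanistan</td><td>AF</td><td>AFG</td><td>NULL</td><td>Asia</td><td>2021-05-11 00:07:13</td><td>AF</td></tr>",
+ "<tr><td>2020-12-09</td><td>9</td><td>12</td><td>2020</td><td>135</td><td>13</td><td>Afghanistan</td><td>AF</td><td>AFG</td><td>NULL</td><td>Asia</td><td>2021-05-11 00:07:13</td><td>AF</td></tr>",
+ "<tr><td>2020-12-08</td><td>8</td><td>12</td><td>2020</td><td>200</td><td>6</td><td>Afghanistan</td><td>AF</td><td>AFG</td><td>NULL</td><td>Asia</td><td>2021-05-11 00:07:13</td><td>AF</td></tr>",
+ "<tr><td>2020-12-07</td><td>7</td><td>12</td><td>2020</td><td>210</td><td>26</td><td>Afghanistan</td><td>AF</td><td>AFG</td><td>NULL</td><td>Asia</td><td>2021-05-11 00:07:13</td><td>AF</td></tr>",
+ "<tr><td>2020-12-06</td><td>6</td><td>12</td><td>2020</td><td>234</td><td>10</td><td>Afghanistan</td><td>AF</td><td>AFG</td><td>NULL</td><td>Asia</td><td>2021-05-11 00:07:13</td><td>AF</td></tr>",
+ "<tr><td>2020-12-05</td><td>5</td><td>12</td><td>2020</td><td>235</td><td>18</td><td>Afghanistan</td><td>AF</td><td>AFG</td><td>NULL</td><td>Asia</td><td>2021-05-11 00:07:13</td><td>AF</td></tr>",
+ "</table>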
" + ] + } + } + ], + "execution_count": 5 + }, + { + "cell_type": "markdown", + "source": [ + "## Explicitly specify schema\n", + "\n", + "The `OPENROWSET` function lets you explicitly specify which columns to read from the file, and their types, by using the `WITH` clause:" + ], + "metadata": { + "azdata_cell_guid": "c4145b77-8663-4e59-914b-721955a02635" + } + }, + { + "cell_type": "code", + "source": [ + "select top 10 *\r\n", + "from openrowset(\r\n", + " bulk 'covid',\r\n", + " data_source = 'DeltaLakeStorage',\r\n", + " format = 'delta'\r\n", + " )\r\n", + " with ( date_rep date,\r\n", + " cases int,\r\n", + " geo_id varchar(6)\r\n", + " ) as rows" + ], + "metadata": { + "azdata_cell_guid": "f3da158c-c168-45b0-8e38-7ee2d430420f" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/html": "Statement ID: {095034CD-EC65-4F33-A714-2A3E8FD551F8} | Query hash: 0x72CAC9F6D331E65C | Distributed request ID: {1353D55C-894D-44CE-BCA8-0926313CD889}. Total size of data scanned is 1 megabytes, total size of data moved is 1 megabytes, total size of data written is 0 megabytes."
+ }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/html": "(10 rows affected)" + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/html": "Total execution time: 00:00:01.915" + }, + "metadata": {} + }, + { + "output_type": "execute_result", + "metadata": { + "resultSet": { + "id": 0, + "batchId": 0, + "rowCount": 10, + "complete": true, + "columnInfo": [ + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 31, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "date_rep", + "columnOrdinal": 0, + "columnSize": 3, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.DateTime, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "date" + }, + { + "isBytes": false, + "isChars": false, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 8, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "cases", + "columnOrdinal": 1, + "columnSize": 4, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + 
"numericPrecision": 10, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.Int32, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "int" + }, + { + "isBytes": false, + "isChars": true, + "isSqlVariant": false, + "isUdt": false, + "isXml": false, + "isJson": false, + "sqlDbType": 22, + "isHierarchyId": false, + "isSqlXmlType": false, + "isUnknownType": false, + "isUpdatable": true, + "allowDBNull": true, + "baseCatalogName": null, + "baseColumnName": null, + "baseSchemaName": null, + "baseServerName": null, + "baseTableName": null, + "columnName": "geo_id", + "columnOrdinal": 2, + "columnSize": 6, + "isAliased": null, + "isAutoIncrement": false, + "isExpression": null, + "isHidden": null, + "isIdentity": false, + "isKey": null, + "isLong": false, + "isReadOnly": false, + "isUnique": false, + "numericPrecision": 255, + "numericScale": 255, + "udtAssemblyQualifiedName": null, + "dataType": "System.String, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e", + "dataTypeName": "varchar" + } + ], + "specialAction": { + "none": true, + "expectYukonXMLShowPlan": false + } + } + }, + "execution_count": 6, + "data": { + "application/vnd.dataresource+json": { + "schema": { + "fields": [ + { + "name": "date_rep" + }, + { + "name": "cases" + }, + { + "name": "geo_id" + } + ] + }, + "data": [ + { + "0": "2020-12-14", + "1": "746", + "2": "AF" + }, + { + "0": "2020-12-13", + "1": "298", + "2": "AF" + }, + { + "0": "2020-12-12", + "1": "113", + "2": "AF" + }, + { + "0": "2020-12-11", + "1": "63", + "2": "AF" + }, + { + "0": "2020-12-10", + "1": "202", + "2": "AF" + }, + { + "0": "2020-12-09", + "1": "135", + "2": "AF" + }, + { + "0": "2020-12-08", + "1": "200", + "2": "AF" + }, + { + "0": "2020-12-07", + "1": "210", + "2": "AF" + }, + { + "0": "2020-12-06", + "1": "234", + "2": "AF" + }, + { + "0": "2020-12-05", + "1": "235", + "2": "AF" + } + ] + }, + 
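"text/html": [ + "<table>",
+ "<tr><th>date_rep</th><th>cases</th><th>geo_id</th></tr>",
+ "<tr><td>2020-12-14</td><td>746</td><td>AF</td></tr>",
+ "<tr><td>2020-12-13</td><td>298</td><td>AF</td></tr>",
+ "<tr><td>2020-12-12</td><td>113</td><td>AF</td></tr>",
+ "<tr><td>2020-12-11</td><td>63</td><td>AF</td></tr>",
+ "<tr><td>2020-12-10</td><td>202</td><td>AF</td></tr>",
+ "<tr><td>2020-12-09</td><td>135</td><td>AF</td></tr>",
+ "<tr><td>2020-12-08</td><td>200</td><td>AF</td></tr>",
+ "<tr><td>2020-12-07</td><td>210</td><td>AF</td></tr>",
+ "<tr><td>2020-12-06</td><td>234</td><td>AF</td></tr>",
+ "<tr><td>2020-12-05</td><td>235</td><td>AF</td></tr>",
+ "</table>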
" + ] + } + } + ], + "execution_count": 6 + }, + { + "cell_type": "markdown", + "source": [ + "By default, Delta Lake data types are mapped to SQL types. Because Delta Lake stores its data in Parquet files, the following table describes how Parquet types are mapped to native SQL types.\n", + "\n", + "| Parquet type | Parquet logical type (annotation) | SQL data type |\n", + "| --- | --- | --- |\n", + "| BOOLEAN | | bit |\n", + "| BINARY / BYTE\\_ARRAY | | varbinary |\n", + "| DOUBLE | | float |\n", + "| FLOAT | | real |\n", + "| INT32 | | int |\n", + "| INT64 | | bigint |\n", + "| INT96 | | datetime2 |\n", + "| FIXED\\_LEN\\_BYTE\\_ARRAY | | binary |\n", + "| BINARY | UTF8 | varchar \\*(UTF8 collation) |\n", + "| BINARY | STRING | varchar \\*(UTF8 collation) |\n", + "| BINARY | ENUM | varchar \\*(UTF8 collation) |\n", + "| BINARY | UUID | uniqueidentifier |\n", + "| BINARY | DECIMAL | decimal |\n", + "| BINARY | JSON | varchar(max) \\*(UTF8 collation) |\n", + "| BINARY | BSON | varbinary(max) |\n", + "| FIXED\\_LEN\\_BYTE\\_ARRAY | DECIMAL | decimal |\n", + "| BYTE\\_ARRAY | INTERVAL | varchar(max), serialized into standardized format |\n", + "| INT32 | INT(8, true) | smallint |\n", + "| INT32 | INT(16, true) | smallint |\n", + "| INT32 | INT(32, true) | int |\n", + "| INT32 | INT(8, false) | tinyint |\n", + "| INT32 | INT(16, false) | int |\n", + "| INT32 | INT(32, false) | bigint |\n", + "| INT32 | DATE | date |\n", + "| INT32 | DECIMAL | decimal |\n", + "| INT32 | TIME (MILLIS) | time |\n", + "| INT64 | INT(64, true) | bigint |\n", + "| INT64 | INT(64, false) | decimal(20,0) |\n", + "| INT64 | DECIMAL | decimal |\n", + "| INT64 | TIME (MICROS / NANOS) | time |\n", + "| INT64 | TIMESTAMP (MILLIS / MICROS / NANOS) | datetime2 |\n", + "| [Complex type](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists) | LIST | varchar(max), serialized into JSON |\n", + "| [Complex type](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps) | MAP | varchar(max), serialized into JSON |" + 
], + "metadata": { + "azdata_cell_guid": "745b2c81-01eb-4bf5-9cad-47a03dcff194" + } + } + ] +} \ No newline at end of file diff --git a/Notebooks/TSQL/Jupiter/content/quickstarts/img/covid-delta-lake-studio.png b/Notebooks/TSQL/Jupiter/content/quickstarts/img/covid-delta-lake-studio.png new file mode 100644 index 0000000..4d79c97 Binary files /dev/null and b/Notebooks/TSQL/Jupiter/content/quickstarts/img/covid-delta-lake-studio.png differ diff --git a/Notebooks/TSQL/Jupiter/content/quickstarts/readme.md b/Notebooks/TSQL/Jupiter/content/quickstarts/readme.md index 642f042..c21719e 100644 --- a/Notebooks/TSQL/Jupiter/content/quickstarts/readme.md +++ b/Notebooks/TSQL/Jupiter/content/quickstarts/readme.md @@ -3,6 +3,7 @@ This book contains quick-start sample that demonstrate how to query the following types of files: - [PARQUET](parquet.ipynb) - [CSV](csv.ipynb) +- [Delta Lake](delta-lake.ipynb) - [JSON](json.ipynb) diff --git a/Notebooks/TSQL/Jupiter/content/readme.md b/Notebooks/TSQL/Jupiter/content/readme.md index 86fae14..20039bc 100644 --- a/Notebooks/TSQL/Jupiter/content/readme.md +++ b/Notebooks/TSQL/Jupiter/content/readme.md @@ -20,7 +20,7 @@ The template defines two resources: This book contains the following samples: -- Quick-start samples - reading [PARQUET](quickstarts/parquet.ipynb), [CSV](quickstarts/csv.ipynb) and [JSON](quickstarts/json.ipynb) +- Quick-start samples - reading [PARQUET](quickstarts/parquet.ipynb), [CSV](quickstarts/csv.ipynb), [Delta Lake](quickstarts/delta-lake.ipynb), and [JSON](quickstarts/json.ipynb) - Tutorials - [Analyze COVID data set provided by ECDC](tutorials/covid-ecdc.ipynb) and [Analyze NY Taxi rides](tutorials/ny-taxi.ipynb) -Open some of these notebooks, select SQL kernel and connect to your Synapse SQL endpoint. Follow the instructions in tutorials to run the samples. +Open any of these notebooks, select the SQL kernel, and connect to your serverless Synapse SQL endpoint.
Follow the instructions in the tutorials to run the samples.