Skip to content

Commit

Permalink
Add files via upload
Browse files Browse the repository at this point in the history
  • Loading branch information
jovanpop-msft authored Oct 21, 2020
1 parent 4b6b0a2 commit f053792
Show file tree
Hide file tree
Showing 4 changed files with 481 additions and 0 deletions.
145 changes: 145 additions & 0 deletions Notebooks/TSQL/Jupiter/content/quickstarts/csv.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
{
"metadata": {
"kernelspec": {
"name": "SQL",
"display_name": "SQL",
"language": "sql"
},
"language_info": {
"name": "sql",
"version": ""
}
},
"nbformat_minor": 2,
"nbformat": 4,
"cells": [
{
"cell_type": "markdown",
"source": [
"# Query CSV files\n",
"\n",
"Serverless Synapse SQL pool enables you to read CSV files from Azure storage (DataLake or blob storage).\n",
"\n",
"## Read csv file\n",
"\n",
"The easiest way to see to the content of your `CSV` file is to provide file URL to `OPENROWSET` function and specify format `CSV`. If the file is publicly available or if your Azure AD identity can access this file, you should be able to see the content of the file using the query like the one shown in the following example:"
],
"metadata": {
"azdata_cell_guid": "e01663cc-427c-457f-84db-b16d0fca3a90"
}
},
{
"cell_type": "code",
"source": [
"select top 10 *\r\n",
"from openrowset(\r\n",
" bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.csv',\r\n",
" format = 'csv',\r\n",
" parser_version = '2.0',\r\n",
" firstrow = 2 ) as rows"
],
"metadata": {
"azdata_cell_guid": "dbc4f12e-388c-49fa-9d85-0fbea3b19d1b"
},
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"source": [
"## Data source usage\n",
"\n",
"Previous example uses full path to the file. As an alternative, you can create an external data source with the location that points to the root folder of the storage:"
],
"metadata": {
"azdata_cell_guid": "a373fa76-bfdf-4bb6-8098-73c9ef436eb8"
}
},
{
"cell_type": "code",
"source": [
"create external data source covid\r\n",
"with (\r\n",
" location = 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases'\r\n",
");"
],
"metadata": {
"azdata_cell_guid": "48b6ee55-09ec-47df-bea5-707dc2f42aa8"
},
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"source": [
"Once you create a data source, you can use that data source and the relative path to the file in `OPENROWSET` function:"
],
"metadata": {
"azdata_cell_guid": "c4145b77-8663-4e59-914b-721955a02635"
}
},
{
"cell_type": "code",
"source": [
"select\r\n",
" top 10 *\r\n",
"from\r\n",
" openrowset(\r\n",
" bulk 'latest/ecdc_cases.csv',\r\n",
" data_source = 'covid',\r\n",
" format = 'csv',\r\n",
" parser_version ='2.0',\r\n",
" firstrow = 2\r\n",
" ) as rows"
],
"metadata": {
"azdata_cell_guid": "f3da158c-c168-45b0-8e38-7ee2d430420f"
},
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"source": [
"## Explicitly specify schema \n",
"\n",
"`OPENROWSET` enables you to explicitly specify what columns you want to read from the file using `WITH` clause:"
],
"metadata": {
"azdata_cell_guid": "745b2c81-01eb-4bf5-9cad-47a03dcff194"
}
},
{
"cell_type": "code",
"source": [
"select\r\n",
" top 10 *\r\n",
"from\r\n",
" openrowset(\r\n",
" bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.csv',\r\n",
" format = 'csv',\r\n",
" parser_version ='2.0',\r\n",
" firstrow = 2\r\n",
" ) with (\r\n",
" date_rep date 1,\r\n",
" cases int 5,\r\n",
" geo_id varchar(6) 8\r\n",
" ) as rows"
],
"metadata": {
"azdata_cell_guid": "e7bacd03-45d4-4b0b-b1d0-9522e1a54436"
},
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"source": [
"The numbers after a data type in the `WITH` clause represent column index in the CSV file."
],
"metadata": {
"azdata_cell_guid": "4397f453-4b20-4083-ae0e-4966d789993f"
}
}
]
}
158 changes: 158 additions & 0 deletions Notebooks/TSQL/Jupiter/content/quickstarts/json.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,158 @@
{
"metadata": {
"kernelspec": {
"name": "SQL",
"display_name": "SQL",
"language": "sql"
},
"language_info": {
"name": "sql",
"version": ""
}
},
"nbformat_minor": 2,
"nbformat": 4,
"cells": [
{
"cell_type": "markdown",
"source": [
"# Query JSON files\n",
"\n",
"Serverless Synapse SQL pool enables you to read JSON files from Azure storage (DataLake or blob storage).\n",
"\n",
"## Read json line-delimited file\n",
"\n",
"One of the most common format that is used to store JSON documents is line-delimited JSON format (or JSON lines) where every JSOn document is placed in separate line separated with newline character. These files have extensions `jsonl`, `ldjson`, or `ndjson`.\n",
"\n",
"The easiest way to see to the content of your `jsonl` file is to provide file URL to `OPENROWSET` function and specify format `CSV`. If the file is publicly available or if your Azure AD identity can access this file, you should be able to see the content of the file using the query like the one shown in the following example:"
],
"metadata": {
"azdata_cell_guid": "e01663cc-427c-457f-84db-b16d0fca3a90"
}
},
{
"cell_type": "code",
"source": [
"select top 10 *\r\n",
"from openrowset(\r\n",
" bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.jsonl',\r\n",
" format = 'csv',\r\n",
" fieldterminator ='0x0b',\r\n",
" fieldquote = '0x0b'\r\n",
" ) with (doc nvarchar(max)) as rows"
],
"metadata": {
"azdata_cell_guid": "dbc4f12e-388c-49fa-9d85-0fbea3b19d1b"
},
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"source": [
"This query will return one row for every JSON document placed in separate line."
],
"metadata": {
"azdata_cell_guid": "319c9414-ef30-440f-a3dd-f360a91fa145"
}
},
{
"cell_type": "markdown",
"source": [
"## Read json file\n",
"\n",
"Synapse SQL enables you to read entire content of JSON file as single text field:"
],
"metadata": {
"azdata_cell_guid": "d2597313-6223-4b1e-a6fa-979be1a3ce6e"
}
},
{
"cell_type": "code",
"source": [
"select top 10 *\r\n",
"from openrowset(\r\n",
" bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.json',\r\n",
" format = 'csv',\r\n",
" fieldterminator ='0x0b',\r\n",
" fieldquote = '0x0b',\r\n",
" rowterminator = '0x0b' --> You need to override rowterminator to read classic JSON\r\n",
" ) with (doc nvarchar(max)) as rows"
],
"metadata": {
"azdata_cell_guid": "2e5d0ff7-c4ca-45d6-b2fd-cfb83989af1d"
},
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"source": [
"## Parse JSON document\n",
"\n",
"The query below shows you how to use [JSON\\_VALUE](https://docs.microsoft.com/en-us/sql/t-sql/functions/json-value-transact-sql?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json&view=azure-sqldw-latest \"https://docs.microsoft.com/en-us/sql/t-sql/functions/json-value-transact-sql?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json&view=azure-sqldw-latest\") to retrieve scalar values (title, publisher) from the JSON documents:"
],
"metadata": {
"azdata_cell_guid": "a373fa76-bfdf-4bb6-8098-73c9ef436eb8"
}
},
{
"cell_type": "code",
"source": [
"select\r\n",
" JSON_VALUE(doc, '$.date_rep') AS date_reported,\r\n",
" JSON_VALUE(doc, '$.countries_and_territories') AS country,\r\n",
" JSON_VALUE(doc, '$.cases') as cases,\r\n",
" doc\r\n",
"from openrowset(\r\n",
" bulk 'latest/ecdc_cases.jsonl',\r\n",
" data_source = 'covid',\r\n",
" format = 'csv',\r\n",
" fieldterminator ='0x0b',\r\n",
" fieldquote = '0x0b'\r\n",
" ) with (doc nvarchar(max)) as rows\r\n",
"order by JSON_VALUE(doc, '$.geo_id') desc"
],
"metadata": {
"azdata_cell_guid": "48b6ee55-09ec-47df-bea5-707dc2f42aa8"
},
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"source": [
"As an alternative, you can use `OPENJSON` function to parse documents:"
],
"metadata": {
"azdata_cell_guid": "c4145b77-8663-4e59-914b-721955a02635"
}
},
{
"cell_type": "code",
"source": [
"select\r\n",
" *\r\n",
"from openrowset(\r\n",
" bulk 'latest/ecdc_cases.jsonl',\r\n",
" data_source = 'covid',\r\n",
" format = 'csv',\r\n",
" fieldterminator ='0x0b',\r\n",
" fieldquote = '0x0b'\r\n",
" ) with (doc nvarchar(max)) as rows\r\n",
" cross apply openjson (doc)\r\n",
" with ( date_rep datetime2,\r\n",
" cases int,\r\n",
" fatal int '$.deaths',\r\n",
" country varchar(100) '$.countries_and_territories')\r\n",
"where country = 'Serbia'\r\n",
"order by country, date_rep desc;"
],
"metadata": {
"azdata_cell_guid": "f3da158c-c168-45b0-8e38-7ee2d430420f"
},
"outputs": [],
"execution_count": null
}
]
}
Loading

0 comments on commit f053792

Please sign in to comment.