Name	Name	Last commit message	Last commit date
Latest commit jaymedina grant ownership to CLONE database Nov 7, 2024 517525f · Nov 7, 2024 History 840 Commits
.github	.github	grant ownership to CLONE database	Nov 7, 2024
.vscode	.vscode	Newline	Aug 14, 2024
_tf_archive	_tf_archive	Move files into tf archive	May 18, 2024
admin	admin	remove clone_db.sql	Oct 24, 2024
analytics	analytics	Merge pull request #31 from Sage-Bionetworks/synsd-893-disjointed-fil…	May 14, 2024
data_validation	data_validation	Add in validation rules	May 20, 2024
genie	genie	Update genie/genie_elt.py	Apr 16, 2024
recover	recover	Add grants, add sage and recover folders, and add paths to ignore	Oct 31, 2023
sage	sage	Add samia to snowflake	Sep 26, 2024
synapse_data_warehouse	synapse_data_warehouse	.	Nov 5, 2024
transforms	transforms	Remove genie	Dec 21, 2023
.env_template	.env_template	Add in .env template	Sep 12, 2023
.gitignore	.gitignore	Separate .gitignore	Aug 13, 2024
.pre-commit-config.yaml	.pre-commit-config.yaml	Lock down permissions	May 14, 2024
.sqlfluffignore	.sqlfluffignore	Add external stages	Oct 30, 2023
CONTRIBUTING.md	CONTRIBUTING.md	rephrasing	Oct 31, 2024
LICENSE	LICENSE	Initial commit	Sep 7, 2023
README.md	README.md	new contributing.md	Oct 28, 2024

Repository files navigation

snowflake

This repository will house all the configuration, and data exploration of data that will be often executed. Configurations include warehouse, database, schema, table, user, role and other configurations.

Connecting to Snowflake

Important

The Sage snowflake overlords see all, please make sure you are careful with your queries. Always use limit when you are exploring tables!

Follow the steps below to get access to snowflake: https://mqzfhld-vp00034.snowflakecomputing.com

Ask internally at Sage for snowflake access. You are permitted to explore in the UI, but long standing views and tables should be created via PR and will be created via the SYSADMIN role by users that can assume the sysadmin privilege.
Enable vscode snowflake extension. Follow instructions here https://docs.snowflake.com/en/user-guide/vscode-ext. Note: the worksheets that you create in snowflake do not automatically get transfered into github, so we encourage using this extension to add version control to your SQL.
Follow the instructions in the vscode extension to connect to snowflake. I recommend setting up your ~/.snowflake/config file.
Look at examples in the analytics/exploration.sql to see how you can query the warehouse.

Contributing

If there is a query you expect to run frequently, lets contribute it to the analytics folder!

Administration

Please refer to the admin README for detailed information.

Synapse Data Warehouse

Data Architecture

This is just prototype, but I am going to attempt to follow the medallion data architecture. That said, AWS glue provides an in-between state of bronze and silver layer. The data warehouse data exposed as "raw" tables in snowflake does not correspond to the source system table structures, but is already transformed.

graph TD;
    A[Bronze - Raw JSON in S3]-->B[Silver - Raw JSON ETL Parquet in S3];
    B-->C[Silver - Parquet in S3 ELT Snowflake];
    C-->D[Gold - Snowflake];

RECOVER (PoC)

The RECOVER data is processed via AWS and is compressed to parquet datasets. The parquet datasets are then ingested into snowflake for easy querying and validation.

GENIE (PoC)

The GENIE public releases are loaded into snowflake via this script. You must have a snowflake connection section, please copy and fill out this template with your username and password.

Portals data (PoC)

snowflake, synapseclient and dotenv must be installed as dependencies.

pip install "snowflake-connector-python[pandas]" "synapseclient[pandas]" python-dotenv

Contributing

For contribution guidelines, please see the CONTRIBUTING.md file in this repository.

Visualizing with Streamlit

Users can customize their own data visualization dashboards from the data available on Snowflake by using the streamlit-snowflake-template. This is a template repository that leverages Streamlit to create and deploy internal applications for Synapse-derived data insight and analysis.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

snowflake

Connecting to Snowflake

Contributing

Administration

Synapse Data Warehouse

Data Architecture

RECOVER (PoC)

GENIE (PoC)

Portals data (PoC)

Contributing

Visualizing with Streamlit

About

Releases 2

Contributors 8

Languages

License

Sage-Bionetworks/snowflake

Folders and files

Latest commit

History

Repository files navigation

snowflake

Connecting to Snowflake

Contributing

Administration

Synapse Data Warehouse

Data Architecture

RECOVER (PoC)

GENIE (PoC)

Portals data (PoC)

Contributing

Visualizing with Streamlit

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 2

Contributors 8

Languages