You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am literally copying this from the ORM question because it's virtually identical but the ORM approach doesn't work for CORE.
Question
Hi there, in Snowflake we often have multiple databases, with their own set of schemas, and tables inside those schemas. Ideally data shouldn't have to move between these databases nor schemas, but in practice they often do.
One use case is when there might be valuable data between two different databases. In my example, I'm going to reference database A and database B which has schemas ant and baboons, which has tables apples and bananas, respectively.
Through a direct query using Snowflake SQL, you could implement a join like
select*fromA.ant.apples a
joinB.baboons.bananas b
ona.id=b.id
How can you implement a cross database join like the above example using a single session through the CORE interface? My understanding is that a session can only point towards one database. I'm not creating an app and the tables already exist so ideally this would all happen through SQLAlchemy MetaData object reflection. Below are some example of what I've tried.
The ideal approach would look like the following:
importosfromsqlalchemyimportcreate_engine, MetaData, text, Tablefromsnowflake.sqlalchemyimportURLfromsqlalchemy.sqlimportselect, and_, or_, case, func, insert# Note: no database, schema and warehouse are being selected at this time. They're not needed yet.engine=create_engine(URL(
account="my_awesome_account",
user="my_user",
password="my_password",
role="my_awesome_role",
),
)
metadata=MetaData()
metadata.reflect(engine, schema='a.ant', only=['apples',]) # doesn't workmetadata.reflect(engine, schema='a.ant', only=['a.ant.apples',]) # doesn't workmetadata.reflect(engine, only=['a.ant.apples',]) # doesn't workTable("apples", metadata, schema='a.ant', autoload_with=engine) # doesn't workTable("a.ant.apples", metadata, schema='a.ant', autoload_with=engine) # doesn't work
I've tried all types of combinations of cases with the the database name and schema names too. None of them work.
What I find silly about the above is I often get StackTraces saying I have to provide a fully qualified name since I'm not declaring a database. But I AM providing a fully qualified name. And not explicitly choosing a database is the point. I need to pull data from tables in multiple databases so the reflected tables NEED TO BE FULLY QUALIFIED.
Can this be done through the CORE interface without dropping to SQL? If so, what's the best practice in doing this?
Please answer these questions before submitting your issue. Thanks!
What version of Python are you using?
Python 3.9.6
What operating system and processor architecture are you using?
macOS-14.0-arm64-arm-64bit
What are the component versions in the environment (pip freeze)?
Ideally, the fully qualified table objects, including the database, would have been instantiated as SQLAlchemy Table objects. In this way, I could then create a SQLAlchemy select statement such as select(a).select_from(a.join(b, on=[a.c.col1 == b.c.col1]) and the corresponding SQL generated would be SELECT A.ant.apple.* FROM A.ant.apple JOIN B.baboons.bananas ON A.ant.apple.col1 = B.baboons.bananas.col1;.
Instead, the reflection step fails forcing me to choose a database which then prevents me from joining tables in different databases.
Can you set logging to DEBUG and collect the logs?
import logging
import os
for logger_name in ['snowflake.sqlalchemy', 'snowflake.connector']:
logger = logging.getLogger(logger_name)
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
logger.addHandler(ch)
I'm happy to provide this info but in private. It's on a company computer with company backend stuff that's not for the public eye.
The text was updated successfully, but these errors were encountered:
github-actionsbot
changed the title
Best practice for doing cross database joins using CORE
SNOW-948287: Best practice for doing cross database joins using CORE
Oct 23, 2023
This article provides alternatives for How To get around this using session variables and optionally identifier()
Details
The need is to join data from two databases (which are environment specific) such as:
SELECT
cd.cust_id,
ff.measure_nbr,
...
FROM ent_dev_edw.bas.cust_d cd
JOIN lcl_dev_wrk.aml.fact_f ff ON ff.cust_sk = cd.cust_sk
WHERE
...
Use Variables
set dbEntNm = 'ent_dev_edw.';
set dbLclNm = 'lcl_dev_wrk.';
set cdNm = $dbEntNm || 'bas.cust_d';
set ffNm = $dbLclNm || 'amt.fact_f';
...
FROM $cdNm cd -- or use this syntax FROM TABLE($cdNm) or this syntax FROM IDENTIFIER($cdNm)
JOIN $ffNm ff ON ff.cust_sk = cd.cust_sk
...
Honestly, it's a pain when you want flexibility for a lot of tables.
We do have a Synonym jira ticket out there and I will add your company to it.
I hacked my way around it but because of how our org is set up, I'm unable to run the test suite to verify it doesn't break anything. If someone could help out, that'd be wonderful. All edits were made to snowdialect.py, attached here. snowdialect.py.zip
Bottom line, there are two sections in this file around reflection. I added a bit of code to check to see if the schema has a . in it. And if so, split the schema into database and schema. The information_schema is also being fully qualified in the code by changing the subsequent query to an f-string and including the database. Just run git diff on the attached zip file to see all the changes.
I am literally copying this from the ORM question because it's virtually identical but the ORM approach doesn't work for CORE.
Question
Hi there, in Snowflake we often have multiple databases, with their own set of schemas, and tables inside those schemas. Ideally data shouldn't have to move between these databases nor schemas, but in practice they often do.
One use case is when there might be valuable data between two different databases. In my example, I'm going to reference database
A
and databaseB
which has schemasant
andbaboons
, which has tablesapples
andbananas
, respectively.Through a direct query using Snowflake SQL, you could implement a join like
How can you implement a cross database join like the above example using a single session through the CORE interface? My understanding is that a session can only point towards one database. I'm not creating an app and the tables already exist so ideally this would all happen through SQLAlchemy MetaData object reflection. Below are some example of what I've tried.
The ideal approach would look like the following:
I've tried all types of combinations of cases with the the database name and schema names too. None of them work.
What I find silly about the above is I often get StackTraces saying I have to provide a fully qualified name since I'm not declaring a database. But I AM providing a fully qualified name. And not explicitly choosing a database is the point. I need to pull data from tables in multiple databases so the reflected tables NEED TO BE FULLY QUALIFIED.
Can this be done through the CORE interface without dropping to SQL? If so, what's the best practice in doing this?
Please answer these questions before submitting your issue. Thanks!
What version of Python are you using?
Python 3.9.6
What operating system and processor architecture are you using?
macOS-14.0-arm64-arm-64bit
What are the component versions in the environment (
pip freeze
)?appnope==0.1.3 asn1crypto==1.5.1 asttokens==2.4.0 backcall==0.2.0 certifi==2023.7.22 cffi==1.16.0 charset-normalizer==3.3.0 comm==0.1.4 cryptography==41.0.4 debugpy==1.8.0 decorator==5.1.1 exceptiongroup==1.1.3 executing==2.0.0 filelock==3.12.4 idna==3.4 importlib-metadata==6.8.0 ipykernel==6.25.2 ipython==8.16.1 jaraco.classes==3.3.0 jedi==0.19.1 jupyter_client==8.4.0 jupyter_core==5.4.0 keyring==24.2.0 matplotlib-inline==0.1.6 more-itertools==10.1.0 nest-asyncio==1.5.8 numpy==1.26.1 oscrypto==1.3.0 packaging==23.2 pandas==2.0.3 parso==0.8.3 pexpect==4.8.0 pickleshare==0.7.5 platformdirs==3.11.0 prompt-toolkit==3.0.39 psutil==5.9.6 ptyprocess==0.7.0 pure-eval==0.2.2 pyarrow==10.0.1 pycparser==2.21 pycryptodomex==3.19.0 Pygments==2.16.1 PyJWT==2.8.0 pyOpenSSL==23.2.0 python-dateutil==2.8.2 pytz==2023.3.post1 pyzmq==25.1.1 requests==2.31.0 six==1.16.0 snowflake-connector-python==3.3.1 snowflake-sqlalchemy==1.5.0 sortedcontainers==2.4.0 SQLAlchemy==1.4.49 sqlparse==0.4.4 stack-data==0.6.3 tomlkit==0.12.1 tornado==6.3.3 traitlets==5.11.2 typing_extensions==4.8.0 tzdata==2023.3 urllib3==1.26.18 wcwidth==0.2.8 zipp==3.17.0
What did you do?
The above example should be sufficient.
What did you expect to see?
Ideally, the fully qualified table objects, including the database, would have been instantiated as SQLAlchemy Table objects. In this way, I could then create a SQLAlchemy
select
statement such asselect(a).select_from(a.join(b, on=[a.c.col1 == b.c.col1])
and the corresponding SQL generated would beSELECT A.ant.apple.* FROM A.ant.apple JOIN B.baboons.bananas ON A.ant.apple.col1 = B.baboons.bananas.col1;
.Instead, the reflection step fails forcing me to choose a database which then prevents me from joining tables in different databases.
Can you set logging to DEBUG and collect the logs?
I'm happy to provide this info but in private. It's on a company computer with company backend stuff that's not for the public eye.
The text was updated successfully, but these errors were encountered: