Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-992587: Unable to select metadata columns #1185

Closed
gnilrets opened this issue Dec 15, 2023 · 2 comments
Closed

SNOW-992587: Unable to select metadata columns #1185

gnilrets opened this issue Dec 15, 2023 · 2 comments
Labels

Comments

@gnilrets
Copy link

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

3.9.12

  1. What operating system and processor architecture are you using?

Linux-5.10.197-186.748.amzn2.x86_64-x86_64-with-glibc2.31

  1. What are the component versions in the environment (pip freeze)?

absl-py==1.1.0
acryl-datahub==0.12.1.0
adagio==0.2.4
adjustText==0.8
aenum==3.1.11
aeppl==0.0.34
aesara==2.7.9
affine==2.3.0
agate==1.7.1
aiobotocore==2.5.4
aiohttp==3.8.5
aioitertools==0.11.0
aiosignal==1.2.0
alembic==1.11.3
altair==5.1.1
altair-transform==0.2.0
angel-cd==1.0.3
ansi2html==1.8.0
antlr4-python3-runtime==4.11.1
anyio==3.6.2
apify-client==1.6.1
apify-shared==1.1.0
appdirs==1.4.4
argon2-cffi==21.1.0
argparse==1.4.0
arrow==1.2.2
arviz==0.12.1
asn1crypto==1.4.0
astor==0.8.1
astunparse==1.6.3
async-timeout==4.0.2
atomicwrites==1.4.1
atpublic==4.0
attrs==23.1.0
Authlib==1.2.1
autograd==1.5
autograd-gamma==0.5.0
autopep8==1.5.7
avro==1.11.3
avro-gen3==0.7.11
awswrangler==3.3.0
Babel==2.12.1
backcall==0.2.0
backoff==2.1.2
bayesian-optimization==1.2.0
bcrypt==3.2.0
beautifulsoup4==4.10.0
bertopic==0.15.0
bidict==0.22.1
bigframes==0.12.0
bimlpa==0.1.2
biopython==1.80
bitarray==2.5.1
black==22.12.0
bleach==4.1.0
blis==0.7.10
blosc2==2.0.0
bokeh==2.4.3
boto==2.49.0
boto3==1.28.17
botocore==1.31.17
boxsdk==3.9.2
branca==0.4.2
build==0.10.0
cached-property==1.5.2
cachetools==5.3.0
catalogue==2.0.6
catboost==1.2
cattrs==1.10.0
cdlib==0.2.6
certifi==2021.10.8
cffi==1.14.6
cftime==1.5.1
chainladder==0.8.18
charset-normalizer==2.0.6
chinese-whispers==0.8.0
chroma-hnswlib==0.7.1
chromadb==0.4.1
click==8.0.4
click-default-group==1.2.4
click-plugins==1.1.1
click-spinner==0.1.10
clickhouse-driver==0.2.3
cligj==0.7.2
clikit==0.6.2
cloudpickle==2.0.0
cma==3.2.2
cmdstanpy==1.0.4
colorama==0.4.6
colorcet==3.0.0
coloredlogs==15.0.1
colour==0.1.5
comm==0.1.4
confection==0.1.1
cons==0.4.5
contextily==1.1.0
convertdate==2.3.2
convoys==0.2.1
crashtest==0.3.1
cryptography==40.0.2
cvxpy==1.3.1
cycler==0.10.0
cymem==2.0.5
Cython==0.29.24
cytoolz==0.11.2
dash==2.9.2
dash-core-components==2.0.0
dash-html-components==2.0.0
dash-table==5.0.0
dask==2022.4.0
databricks-cli==0.17.7
dataclasses-json==0.5.9
datadog==0.35.0
datatable==1.0.0
db-dtypes==1.1.1
dbt-core==1.7.3
dbt-extractor==0.5.1
dbt-semantic-interfaces==0.4.2
dbt-snowflake==1.7.1
debugpy==1.6.7.post1
decorator==5.1.0
deepdish==0.3.7
defusedxml==0.7.1
demon==2.0.6
Deprecated==1.2.13
descartes==1.1.0
df2gspread==1.0.4
dill==0.3.4
distro==1.8.0
dm-tree==0.1.6
dnspython==2.2.0
docker==6.1.3
docutils==0.16
dropbox==11.36.2
duckdb==0.8.1
dulwich==0.21.7
dynetx==0.3.1
easychart==0.1.16
easypost==5.1.3
easytree==0.1.12
ecos==2.0.12
emcee==3.1.2
emoji==2.4.0
entrypoints==0.3
ephem==4.1
et-xmlfile==1.1.0
eth-abi==4.1.0
eth-account==0.9.0
eth-hash==0.5.2
eth-keyfile==0.6.1
eth-keys==0.4.0
eth-rlp==0.3.0
eth-typing==3.4.0
eth-utils==2.2.0
etils==0.9.0
etuples==0.3.5
eva-lcd==0.1.1
expandvars==0.12.0
fastapi==0.99.1
fastdiff==0.3.0
fastprogress==1.0.0
fbprophet @ git+https://github.com/hex-inc/prophet.git@f7cd9b9fa71f5a941421f540cb10b751e59ae2ba
filelock==3.8.0
Fiona==1.8.20
flake8==4.0.1
Flask==2.1.3
flatbuffers==2.0.7
folium==0.12.1.post1
fonttools==4.34.4
formulaic==0.5.2
fredapi==0.5.0
frozenlist==1.3.0
fs==2.4.16
fsspec==2023.9.0
fugue==0.8.3
fugue-sql-antlr==0.1.6
funcy==1.16
future==0.18.2
fuzzywuzzy==0.18.0
gast==0.4.0
gcsfs==2023.9.0
gensim==4.3.2
geographiclib==1.52
geopandas==0.12.2
geopy==2.2.0
gitdb==4.0.9
GitPython==3.1.27
google-api-core==2.11.0
google-api-python-client==1.6.7
google-api-support==0.1.3
google-auth==2.17.1
google-auth-httplib2==0.1.0
google-auth-oauthlib==1.0.0
google-cloud-aiplatform==1.36.0
google-cloud-appengine-logging==1.1.1
google-cloud-audit-log==0.2.0
google-cloud-bigquery==3.13.0
google-cloud-bigquery-connection==1.13.2
google-cloud-bigquery-storage==2.19.1
google-cloud-billing==1.5.1
google-cloud-core==2.1.0
google-cloud-functions==1.13.3
google-cloud-iam==2.12.2
google-cloud-logging==3.0.0
google-cloud-resource-manager==1.10.4
google-cloud-storage==2.1.0
google-crc32c==1.3.0
google-pasta==0.2.0
google-resumable-media==2.0.3
googleapis-common-protos==1.59.0
googlemaps==4.5.3
gql==3.4.0
graphql-core==3.2.0
graphviz==0.20.1
greenlet==1.1.2
gremlinpython==3.6.4
grpc-google-iam-v1==0.12.6
grpcio==1.50.0
grpcio-status==1.41.0
gspread==5.7.1
gspread-dataframe==3.3.0
gspread-pandas==3.2.2
gunicorn==21.2.0
h11==0.14.0
h3==3.7.6
h5py==3.1.0
hdbscan==0.8.33

Editable install with no version control (hex-api==1.0.0)

-e /python-sdk

Editable install with no version control (hex-data-service==0.1.0)

-e /data-service-python

Editable install with no version control (hex-lazy-installer==0.1.0)

-e /hex-lazy-installer

Editable install with no version control (hex-packages==0.1.0)

-e /python-kernel-packages

Editable install with no version control (hex-shared==0.1.0)

-e /python-shared
hexbytes==0.2.2

Editable install with no version control (hextoolkit==0.1.0)

-e /python-api
hijri-converter==2.2.2
holidays==0.14.2
holoviews==1.15.0
httpcore==1.0.2
httplib2==0.20.1
httpstan==4.8.1
httptools==0.6.0
httpx==0.25.2
hubspot-api-client==8.1.1
huggingface-hub==0.11.1
humanfriendly==10.0
humanize==4.8.0
hyperopt==0.2.7
ibis-framework==6.2.0
idna==3.2
igraph==0.9.11
ijson==3.2.3
imageio==2.9.0
imbalanced-learn==0.9.1
importlib-metadata==6.8.0
importlib-resources==5.10.0
interface-meta==1.3.0
ipykernel==6.25.1
ipython==7.32.0
ipython-genutils==0.2.0
ipywidgets==7.8.1
isodate==0.6.1
itsdangerous==2.1.2
jaraco.classes==3.3.0
jax==0.3.23
jaxlib==0.3.22
jedi==0.17.2
jeepney==0.8.0
Jinja2==3.1.2
jinjasql==0.1.8
jmespath==0.10.0
joblib==1.2.0
jsonpatch==1.33
jsonpath-ng==1.5.3
jsonpointer==2.4
jsonref==1.1.0
jsonschema==4.0.1
jupyter-client==7.0.6
jupyter-dash==0.4.2
jupyter_core==5.3.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.2
kaleido==0.2.1
keplergl==0.1.2
keras==2.12.0
Keras-Applications==1.0.8
keyring==24.3.0
kiwisolver==1.3.2
korean-lunar-calendar==0.2.1
langchain==0.0.347
langchain-core==0.0.11
langcodes==3.3.0
langsmith==0.0.69
leather==0.3.4
libclang==14.0.1
lifelines==0.27.4
lightfm==1.17
lightgbm==3.3.5
littleutils==0.2.2
llvmlite==0.40.1
locket==0.2.1
Logbook==1.5.3
logical-unification==0.4.5
loguru==0.6.0
looker-sdk==22.2.1
lru-dict==1.1.7
LunarCalendar==0.0.9
lxml==4.8.0
Mako==1.2.4
Markdown==3.3.4
markdown-it-py==2.2.0
markov-clustering==0.0.6.dev0
MarkupSafe==2.0.1
marshmallow==3.17.0
marshmallow-enum==1.5.1
mashumaro==3.11
matplotlib==3.5.2
matplotlib-inline==0.1.3
matplotlib-venn==0.11.7
mccabe==0.6.1
mdurl==0.1.2
mercantile==1.2.1
miniKanren==1.0.3
minimal-snowplow-tracker==0.0.2
mistune==0.8.4
mixpanel==4.10.0
mizani==0.7.4
mlflow==2.6.0
modelbit==0.29.0
monotonic==1.6
more-itertools==8.10.0
mpmath==1.3.0
msgpack==1.0.5
multidict==6.0.2
multipledispatch==0.6.0
multiprocess==0.70.12.2
multitasking==0.0.11
munch==2.5.0
murmurhash==1.0.5
mypy==0.961
mypy-extensions==0.4.3
natsort==8.4.0
nbclient==0.5.4
nbconvert==6.2.0
nbformat==5.1.3
nest-asyncio==1.5.1
netCDF4==1.5.7
networkx==2.6.3
nevergrad==0.5.0
nf1==0.0.4
nltk==3.7
notebook==6.4.12
numba==0.57.1
numexpr==2.8.5
numpy==1.23.4
numpy-financial==1.0.0
numpyro==0.10.1
oauth2client==4.1.3
oauthlib==3.1.1
onnxruntime==1.15.1
openai==1.3.7
openapi-schema-pydantic==1.2.4
opencv-python==4.8.0.74
openpyxl==3.0.9
opensearch-py==1.1.0
opt-einsum==3.3.0
optbinning==0.17.3
orbit-ml==1.1.4.2
orjson==3.8.9
ortools==9.4.1874
oscrypto==1.2.1
osqp==0.6.2.post9
outdated==0.2.2
overrides==7.4.0
packaging==21.3
palettable==3.3.0
pandas==1.5.3
pandas-flavor==0.6.0
pandas-gbq==0.19.1
pandasql==0.7.3
pandocfilters==1.5.0
panel==0.13.1
param==1.12.2
paramiko==2.10.2
parsedatetime==2.6
parsimonious==0.9.0
parso==0.7.1
parsy==2.1
partd==1.2.0
pastel==0.2.1
pathlib-mate==1.0.1
pathspec==0.9.0
pathy==0.10.2
patsy==0.5.2
pexpect==4.8.0
pg8000==1.29.2
pickleshare==0.7.5
Pillow==9.1.1
pinecone-client==2.2.4
pingouin==0.5.3
pip==23.3.1
pkginfo==1.9.6
platformdirs==3.0.0
plotly==5.13.0
plotly-resampler==0.8.3.2
plotnine==0.9.0
pluggy==0.13.1
ply==3.11
polars==0.19.2
pooch==1.6.0
posthog==3.0.2
preshed==3.0.5
presto-python-client==0.7.0
progressbar2==4.0.0
prometheus-client==0.11.0
prompt-toolkit==3.0.20
prophet==1.1
proto-plus==1.22.1
protobuf==4.24.1
psutil==5.8.0
psycopg2==2.9.1
ptyprocess==0.7.0
PuLP==2.6.0
pulsar-client==3.2.0
py==1.10.0
py-cpuinfo==9.0.0
py4j==0.10.9.7
pyarrow==10.0.1
pyarrow-hotfix==0.6
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyclustering==0.10.1.2
pycodestyle==2.8.0
pycparser==2.20
pycryptodome==3.14.1
pycryptodomex==3.11.0
pyct==0.4.8
pydantic==1.10.12
pydata-google-auth==1.8.2
pydeck==0.7.1
pydot==1.4.2
pyerf==1.0.1
pyflakes==2.4.0
Pygments==2.10.0
pygraphviz==1.10
pygsheets==2.0.5
pyjanitor==0.25.0
PyJWT==2.4.0
pyLDAvis==3.2.2
pylev==1.4.0
pymc==4.1.5
pymc3==3.11.4
PyMeeus==0.5.11
pymer4==0.8.0
pymongo==4.0.1
pymsteams==0.2.2
PyMySQL==1.0.2
PyNaCl==1.5.0
pynmeagps==1.0.20
pynndescent==0.5.4
pyodbc==4.0.32
pyOpenSSL==23.2.0
pyparsing==2.4.7
PyPika==0.48.9
pyproj==3.6.1
pyproject_hooks==1.0.0
pyro-api==0.1.2
pyro-ppl==1.8.4
pyrsistent==0.18.0
pysftp==0.2.9
pyshp==2.1.3
pysimdjson==3.2.0
pystan==3.5.0
pytest==5.4.3
python-box==7.0.1
python-dateutil==2.8.2
python-dotenv==1.0.0
python-igraph==0.9.11
python-json-logger==2.0.7

Editable install with no version control (python-kernel-startup==0.1.0)

-e /python-kernel-startup
python-Levenshtein==0.12.2
python-louvain==0.16
python-slugify==8.0.1

Editable install with no version control (python-universal-dataframe==0.1.0)

-e /python-universal-dataframe
python-utils==3.3.3
pytimeparse==1.1.8
pytorch-ignite==0.4.6
pytrends==4.7.3
pytz==2023.3.post1
pyunormalize==15.0.0
pyviz-comms==2.2.0
PyWavelets==1.1.1
PyYAML==6.0.1
pyzmq==22.3.0
qdldl==0.1.7
qpd==0.4.0
querystring-parser==1.2.4
rasterio==1.2.9
redmail==0.4.0
redshift-connector==2.0.915
regex==2023.5.5
requests==2.28.0
requests-aws4auth==1.2.3
requests-file==1.5.1
requests-oauthlib==1.3.0
requests-toolbelt==0.9.1
retrying==1.3.4
rich==13.2.0
rlp==3.0.0
ropwr==1.0.0
rpy2==3.5.13
rsa==4.7.2
Rtree==0.9.7
ruamel.yaml==0.17.26
ruamel.yaml.clib==0.2.7
s3fs==2023.9.0
s3transfer==0.6.0
SALib==1.4.7
scikit-image==0.18.3
scikit-learn==1.2.2
scikits.bootstrap==1.1.0
scipy==1.10.1
scramp==1.4.1
scs==3.2.3
seaborn==0.12.1
SecretStorage==3.3.3
selenium==3.141.0
semver==2.13.0
Send2Trash==1.8.0
sentence-transformers==2.2.2
sentencepiece==0.1.97
sentry-sdk==1.39.0
setuptools==68.2.2
setuptools-git==1.2
shap==0.41.0
shapely==2.0.1
sidetable==0.9.0
simple-salesforce==1.11.4
simplejson==3.18.1
six==1.15.0
sklearn-pandas==1.8.0
sktime==0.16.1
slack-sdk==3.18.3
slicer==0.0.7
smart-open==5.2.1
smmap==5.0.0
snapshottest==0.6.0
sniffio==1.3.0
snowflake-connector-python==3.1.0
snowflake-ml-python==1.0.7
snowflake-snowpark-python==1.6.1
snowflake-sqlalchemy==1.4.7
snuggs==1.4.7
sortedcontainers==2.4.0
soupsieve==2.2.1
spacy==3.6.0
spacy-legacy==3.0.12
spacy-loggers==1.0.4
sparse==0.14.0
splunk-sdk==1.6.18
sql-metadata==2.5.0
SQLAlchemy==1.4.25
sqlalchemy-redshift==0.7.9
sqlalchemy2-stubs==0.0.2a32
sqlglot==11.4.5
sqlmodel==0.0.8
sqlparse==0.4.2
srsly==2.4.7
starlette==0.27.0
statsforecast==1.5.0
statsmodels==0.14.0
stone==3.2.1
stripe==2.60.0
styleframe==4.1
sympy==1.12
tableauserverclient==0.24
tables==3.8.0
tabulate==0.8.9
tenacity==8.2.2
tensorboard==2.12.1
tensorboard-data-server==0.7.0
tensorboard-plugin-wit==1.8.0
tensorflow==2.12.0
tensorflow-decision-forests==1.3.0
tensorflow-estimator==2.12.0
tensorflow-hub==0.14.0
tensorflow-io-gcs-filesystem==0.26.0
tensorflow-probability==0.19.0
termcolor==1.1.0
terminado==0.12.1
testpath==0.5.0
text-unidecode==1.3
textblob==0.15.3
texttable==1.6.4
tfcausalimpact==0.0.13
Theano-PyMC==1.1.2
thinc==8.1.10
threadpoolctl==3.0.0
thresholdclustering==1.1
tifffile==2021.8.30
tiktoken==0.5.2
tokenizers==0.13.2
toml==0.10.2
tomli==2.0.1
tomlkit==0.12.1
toolz==0.11.1
torch==1.12.1
torchvision==0.13.1
tornado==6.3.3
tqdm==4.65.0
trace-updater==0.0.9
traitlets==5.9.0
traittypes==0.2.1
transformers==4.25.1
triad==0.8.4
typed-ast==1.5.1
typer==0.4.0
types-cachetools==5.3.0.5
types-pkg-resources==0.1.3
types-PyYAML==6.0.12
types-requests==2.28.2
types-urllib3==1.26.15
typing-inspect==0.9.0
typing_extensions==4.7.1
tzlocal==3.0
ua-parser==0.10.0
ujson==5.4.0
umap-learn==0.5.1
UpSetPlot==0.6.1
uritemplate==3.0.1
urllib3==1.26.16
user-agents==2.2.0
uszipcode==0.2.6
uvicorn==0.23.2
uvloop==0.17.0
validators==0.20.0
vegafusion==1.5.0
vegafusion-python-embed==1.5.0
vl-convert-python==0.13.1
wasabi==1.1.2
wasmer==1.1.0
wasmer_compiler_cranelift==1.1.0
watchfiles==0.19.0
wcwidth==0.2.5
web3==6.8.0
webargs==8.2.0
webencodings==0.5.1
websocket-client==1.6.1
websockets==11.0.3
Werkzeug==2.0.2
wheel==0.38.4
widgetsnbextension==3.6.6
wordcloud==1.8.1
wrapt==1.12.1
wurlitzer==3.0.2
xarray==0.19.0
xarray-einstats==0.3.0
xgboost==1.7.3
xlrd==2.0.1
xxhash==3.4.1
yarl==1.7.2
yfinance==0.1.87
youtube-data-api==0.0.21
zipp==3.10.0
zstandard==0.21.0

  1. What did you do?

I have a stage with a few CSV files in it. I wanted to create a Snowpark dataframe that read the CSV files and placed the filename as a column in the dataframe. I was then trying to use the filename column downstream but I got a lot of strange errors.

Here's what I started with:

import snowflake.snowpark.functions as F  # noqa: N812
import snowflake.snowpark.types as SPT  # noqa: N812

csv_spdf = (
    session.read
    .options({
        "skip_header": 21
    })
    .with_metadata(F.col("metadata$filename").alias("FILENAME"))
    .schema(
        SPT.StructType([
            SPT.StructField("TIME", SPT.FloatType()),
            SPT.StructField("CH1", SPT.FloatType()),
            SPT.StructField("CH2", SPT.FloatType()),
        ])
    )
    .csv("@DEV_STERLINGP_RAW.PUBLIC.POC")
)

When I convert it to a pandas dataframe and look at the columns, I see everything I expect:

csv_spdf.limit(100).to_pandas()
# Index(['FILENAME', 'TIME', 'CH1', 'CH2'], dtype='object')

However, if I look at the columns in the Snowpark dataframe, the "FILENAME" column doesn't exist:

csv_spdf.columns
# ['TIME', 'CH1', 'CH2']

If I include the FILENAME in the schema section (SPT.StructField("FILENAME", SPT.StringType())), then it does show up in the csv_spdf.columns. However, when I try to convert it to a pandas dataframe I get an error that there are duplicate columns: SnowparkFetchDataException: (1406): Failed to fetch a Pandas Dataframe. The error is: Found non-unique column index

Some other strange things that happen here is that if I try to group by the FILENAME column, it can execute the query and return the results:

# This works and I get the expected pandas dataframe:
(
    csv_spdf
    .group_by(F.col("FILENAME"))
    .count()
).to_pandas()

However, if I try to ask for the columns that are in the above dataframe expression, I get an error:

(
    csv_spdf
    .group_by(F.col("FILENAME"))
    .count()
).columns
# Returns SQL compilation error: error line 1 at position 7 invalid identifier 'FILENAME'
@gnilrets gnilrets added bug Something isn't working needs triage Initial RCA is required labels Dec 15, 2023
@github-actions github-actions bot changed the title Unable to select metadata columns SNOW-992587: Unable to select metadata columns Dec 15, 2023
@sfc-gh-aalam
Copy link
Contributor

@gnilrets can you retry using snowpark-python version 1.11.1. A bug similar to this was addressed in the latest release

@sfc-gh-aalam sfc-gh-aalam added triaged and removed bug Something isn't working needs triage Initial RCA is required labels Dec 15, 2023
@gnilrets
Copy link
Author

Yep, that did it. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants