SNOW-1622029: Table.update() raises TypeError if table contains any VariantType columns #2067

Open
djfletcher opened this issue Aug 12, 2024 · 2 comments
Labels: bug, local testing, needs triage, status-triage_done


@djfletcher

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

Python 3.9.6 (default, Feb 3 2024, 15:58:27)
[Clang 15.0.0 (clang-1500.3.9.4)]

  2. What are the Snowpark Python and pandas versions in the environment?

pandas==2.2.2
snowflake-snowpark-python==1.20.0

  3. What did you do?

I am updating a Table row in my tests. I can reproduce the problem with the same code as in https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/latest/snowpark/api/snowflake.snowpark.Table.update but with one extra variant column. Updating any column, even one that is not the VariantType column, raises a TypeError:

from snowflake.snowpark import Session

session = Session.builder.config("local_testing", True).create()
target_df = session.create_dataframe([(1, 1, {}), (1, 2, {}), (2, 1, {}), (2, 2, {}), (3, 1, {}), (3, 2, {})], schema=["a", "b", "c"])
target_df.write.save_as_table("my_table", mode="overwrite", table_type="temporary")
t = session.table("my_table")
# Only column "b" is updated, yet the dict values in column "c" trigger the error.
t.update({"b": 0}, t["a"] == 1)

Here is the stacktrace:

venv/lib/python3.9/site-packages/snowflake/snowpark/table.py:470: in update
    result = new_df._internal_collect_with_tag(
venv/lib/python3.9/site-packages/snowflake/snowpark/_internal/telemetry.py:150: in wrap
    result = func(*args, **kwargs)
venv/lib/python3.9/site-packages/snowflake/snowpark/dataframe.py:644: in _internal_collect_with_tag_no_telemetry
    return self._session._conn.execute(
venv/lib/python3.9/site-packages/snowflake/snowpark/mock/_connection.py:559: in execute
    res = execute_mock_plan(plan, plan.expr_to_alias)
venv/lib/python3.9/site-packages/snowflake/snowpark/mock/_plan.py:1166: in execute_mock_plan
    matched_count = intermediate[target.columns].value_counts(dropna=False)[
venv/lib/python3.9/site-packages/pandas/core/frame.py:7509: in value_counts
    counts = self.groupby(subset, dropna=dropna, observed=False)._grouper.size()
venv/lib/python3.9/site-packages/pandas/core/groupby/ops.py:705: in size
    ids, _, ngroups = self.group_info
properties.pyx:36: in pandas._libs.properties.CachedProperty.__get__
    ???
venv/lib/python3.9/site-packages/pandas/core/groupby/ops.py:745: in group_info
    comp_ids, obs_group_ids = self._get_compressed_codes()
venv/lib/python3.9/site-packages/pandas/core/groupby/ops.py:764: in _get_compressed_codes
    group_index = get_group_index(self.codes, self.shape, sort=True, xnull=True)
venv/lib/python3.9/site-packages/pandas/core/groupby/ops.py:690: in codes
    return [ping.codes for ping in self.groupings]
venv/lib/python3.9/site-packages/pandas/core/groupby/ops.py:690: in <listcomp>
    return [ping.codes for ping in self.groupings]
venv/lib/python3.9/site-packages/pandas/core/groupby/grouper.py:691: in codes
    return self._codes_and_uniques[0]
properties.pyx:36: in pandas._libs.properties.CachedProperty.__get__
    ???
venv/lib/python3.9/site-packages/pandas/core/groupby/grouper.py:835: in _codes_and_uniques
    codes, uniques = algorithms.factorize(  # type: ignore[assignment]
venv/lib/python3.9/site-packages/pandas/core/algorithms.py:795: in factorize
    codes, uniques = factorize_array(
venv/lib/python3.9/site-packages/pandas/core/algorithms.py:595: in factorize_array
    uniques, codes = table.factorize(
pandas/_libs/hashtable_class_helper.pxi:7281: in pandas._libs.hashtable.PyObjectHashTable.factorize
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: unhashable type: 'dict'

pandas/_libs/hashtable_class_helper.pxi:7195: TypeError

  4. What did you expect to see?

The in-memory table should have been updated without raising a TypeError.
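
The last frames of the traceback show pandas hashing each cell while it factorizes the grouping columns for value_counts, and a Python dict cannot be hashed. Here is a minimal sketch of the same failure mode using only the standard library. collections.Counter stands in for the hash-based counting; this is an analogy, not the actual Snowpark or pandas code path, and count_rows is a name made up for illustration:

```python
from collections import Counter

# Same rows as the reproduction: two ints plus an empty dict for the variant column.
rows = [(1, 1, {}), (1, 2, {}), (2, 1, {}), (2, 2, {}), (3, 1, {}), (3, 2, {})]


def count_rows(rows):
    """Count identical rows with a hash-based container (illustrative helper)."""
    return Counter(rows)


try:
    count_rows(rows)
    error_message = None
except TypeError as exc:
    # Hashing a tuple hashes each element, so the dict makes the whole row unhashable.
    error_message = str(exc)

print(error_message)

# Leaving the dict column out makes every row hashable again.
counts_without_variant = count_rows([(a, b) for a, b, _ in rows])
print(counts_without_variant[(1, 1)])
```

This is consistent with the error appearing even when the updated column is not the variant one: judging by the traceback, all of the target table's columns take part in the counting step.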

@djfletcher djfletcher added bug Something isn't working local testing Local Testing issues/PRs needs triage Initial RCA is required labels Aug 12, 2024
@github-actions bot changed the title from "Table.update() raises TypeError if table contains any VariantType columns" to "SNOW-1622029: Table.update() raises TypeError if table contains any VariantType columns" Aug 12, 2024
@djfletcher
Author

Per the documentation: https://docs.snowflake.com/en/developer-guide/snowpark/python/testing-locally#limitations

For Table.merge and Table.update, the session parameters ERROR_ON_NONDETERMINISTIC_UPDATE and ERROR_ON_NONDETERMINISTIC_MERGE must be set to False. This means that for multi-joins, one of the matched rows is updated.

Adding these params has no effect:

statement_params = {"ERROR_ON_NONDETERMINISTIC_UPDATE": False, "ERROR_ON_NONDETERMINISTIC_MERGE": False}
t.update({"b": 0}, t["a"] == 1, statement_params=statement_params)

E   TypeError: unhashable type: 'dict'
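
If the failure really does occur while rows are being hashed for counting, as the traceback suggests, it would make sense that the statement parameters do not change the outcome. One way hash-based counting could tolerate dict cells is to map each cell to a hashable stand-in first. The sketch below is purely illustrative: hashable_key and count_rows are hypothetical helpers, not part of Snowpark or pandas, and this is not the maintainers' fix.

```python
import json
from collections import Counter


def hashable_key(value):
    """Return a hashable stand-in for a cell value (hypothetical helper)."""
    if isinstance(value, (dict, list)):
        # sort_keys makes equal dicts serialize to the same string.
        return json.dumps(value, sort_keys=True)
    return value


def count_rows(rows):
    """Count identical rows after converting every cell to a hashable key."""
    return Counter(tuple(hashable_key(cell) for cell in row) for row in rows)


rows = [(1, 1, {}), (1, 2, {}), (2, 1, {}), (2, 2, {}), (3, 1, {}), (3, 2, {})]
counts = count_rows(rows)

# Number of rows matching the condition a == 1 from the reproduction.
matched = sum(n for key, n in counts.items() if key[0] == 1)
print(matched)
```

One caveat: a string cell containing the text {} would collide with an empty dict under this scheme, so a real fix would need to keep the value types distinct.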

@sfc-gh-sghosh self-assigned this Aug 13, 2024
@sfc-gh-sghosh

Hello @djfletcher,

Thanks for raising the issue. Yes, the problem occurs in local testing when updating the table; it works fine with a regular session. We will work on fixing it.

session = Session.builder.config("local_testing", True).create()
target_df = session.create_dataframe([(1, 1, {}), (1, 2, {}), (2, 1, {}), (2, 2, {}), (3, 1, {}), (3, 2, {})], schema=["a", "b", "c"])
target_df.write.save_as_table("my_table", mode="overwrite", table_type="temporary")
t = session.table("my_table")
t.show()
t.update({"b": 0}, t["a"] == 1)
t.show()

Output and Error:

-------------------
|"A"  |"B"  |"C"  |
-------------------
|1    |1    |{}   |
|1    |2    |{}   |
|2    |1    |{}   |
|2    |2    |{}   |
|3    |1    |{}   |
|3    |2    |{}   |
-------------------

TypeError: unhashable type: 'dict'

Regards,
Sujan

@sfc-gh-sghosh added the status-triage_done label Aug 13, 2024