Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-958008: fix local test division datatypes #1127

Merged
merged 9 commits into from
Nov 22, 2023

Conversation

sfc-gh-yixie
Copy link
Collaborator

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes #NNNN

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

@sfc-gh-yixie sfc-gh-yixie requested a review from a team as a code owner November 3, 2023 17:34
@sfc-gh-yixie sfc-gh-yixie requested review from sfc-gh-jdu, sfc-gh-sfan, sfc-gh-stan and sfc-gh-aling and removed request for a team, sfc-gh-jdu and sfc-gh-sfan November 3, 2023 17:34
@@ -80,7 +80,7 @@ def __init__(self, length: Optional[int] = None) -> None:
self.length = length

def __repr__(self) -> str:
if self.length:
if self.length and self.length != self._MAX_LENGTH:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfc-gh-aalam Are you good with this change?

@@ -214,6 +214,8 @@ def __init__(self, value: Any, datatype: Optional[DataType] = None) -> None:
self.datatype = datatype
else:
self.datatype = infer_type(value)
if isinstance(self.datatype, StringType):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfc-gh-aalam Are you good with this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. approved the other PR which adds this change

@@ -946,15 +946,15 @@ def calculate_expression(
)
return ColumnEmulator(
data=[bool(data is None) for data in child_column],
sf_type=ColumnType(BooleanType(), False),
sf_type=ColumnType(BooleanType(), True),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We observe that boolean type columns are always nullable from live connection.

@@ -989,6 +989,8 @@ def calculate_expression(
if isinstance(exp, BinaryExpression):
left = calculate_expression(exp.left, input_data, analyzer, expr_to_alias)
right = calculate_expression(exp.right, input_data, analyzer, expr_to_alias)
# TODO: Address mixed type calculation here. For instance Snowflake allows to add a date to a number, but
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be enabled by type-coercion.

@@ -182,7 +191,7 @@ def calculate_type(c1: ColumnType, c2: Optional[ColumnType], op: Union[str]):
elif isinstance(t1, (FloatType, DoubleType)) or isinstance(
t2, (FloatType, DoubleType)
):
return ColumnType(DoubleType(), nullable)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could switch to using normalize_output_sf_type

if isinstance(sf_type.datatype, DecimalType):
result = result.astype("double").round(sf_type.datatype.scale)
elif isinstance(sf_type.datatype, (FloatType, DoubleType)):
result = result.astype("double").round(16)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We observe that Snowflake returns 16 digits after the floating point. cc: @sfc-gh-yixie

@@ -435,25 +446,28 @@ def __round__(self, n=None):

def __rpow__(self, other):
result = super().__rpow__(other)
result.sf_type = ColumnType(DoubleType(), self.sf_type.nullable)
result.sf_type = ColumnType(FloatType(), True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as Sophie's to use normalize_output_sf_type

return result

def __pow__(self, power):
result = super().__pow__(power)
result.sf_type = ColumnType(DoubleType(), self.sf_type.nullable)
result.sf_type = ColumnType(
FloatType(), self.sf_type.nullable or power.sf_type.nullable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are all the FloatType in the code usage ultimately be converted to DoubleType via normalize_output_sf_type at some point?
and from user perspective will they get a chance to see the FloatType at all?

it seems the two types are interchangeable and we want DoubleType to be in the final output. can we just use DoubleType everywhere?

@sfc-gh-stan
Copy link
Collaborator

The code changes LGTM, let's address the test failures in Test Local Testing Module macos merge gates.

@@ -1661,7 +1670,9 @@ def calculate_expression(
).sf_type
if not calculated_sf_type:
calculated_sf_type = cur_windows_sf_type
elif calculated_sf_type.datatype != cur_windows_sf_type.datatype:
elif not is_datatype_compatible(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically allows coercion to happen but doesn't set the result type properly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i removed it

l2 = expr2.sf_type.datatype.length or StringType._MAX_LENGTH
sf_data_type = StringType(max(l1, l2))
else:
sf_data_type = (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we not using calculate_datatype to compute the return type?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I finally wanted to limit the scope and fixed the problem locally.

@sfc-gh-aling sfc-gh-aling merged commit 10f0af0 into dev/local-testing Nov 22, 2023
7 of 38 checks passed
@sfc-gh-aling sfc-gh-aling deleted the yixie-SNOW-958008-fix-datatypes branch November 22, 2023 02:30
@github-actions github-actions bot locked and limited conversation to collaborators Nov 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants