Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1637096: When using order_by, the resulting query does not use the correct defaults for NULL [FIRST | LAST] as per the docs #2150

Open
samuelsongsr opened this issue Aug 22, 2024 · 1 comment
Assignees
Labels
bug Something isn't working status-triage_done Initial triage done, will be further handled by the driver team

Comments

@samuelsongsr
Copy link

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    Python 3.10.4 (main, May 26 2022, 13:33:07) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)

  2. What operating system and processor architecture are you using?

    Linux-3.10.0-1127.19.1.el7.x86_64-x86_64-with-glibc2.17

  3. What are the component versions in the environment (pip freeze)?

    asn1crypto==1.5.1
    astroid==2.15.4
    asttokens==2.4.0
    async-timeout==4.0.2
    backcall==0.2.0
    boto3==1.26.126
    botocore==1.29.126
    cachetools==5.3.3
    certifi==2022.12.7
    cffi==1.15.1
    charset-normalizer==3.1.0
    click==7.1.2
    cloudpickle==2.2.1
    colorama==0.4.6
    comm==0.1.4
    coverage==7.2.5
    cryptography==3.4.8
    debugpy==1.8.0
    decorator==5.1.1
    Deprecated==1.2.13
    dill==0.3.6
    exceptiongroup==1.1.1
    executing==2.0.0
    factory-boy==3.3.0
    Faker==25.8.0
    filelock==3.15.1
    frozendict==2.4.4
    greenlet==2.0.2
    hvac==2.2.0
    idna==3.4
    iniconfig==2.0.0
    ipykernel==6.25.2
    ipython==8.16.1
    isort==5.12.0
    jedi==0.19.1
    jmespath==1.0.1
    jupyter_client==8.3.1
    jupyter_core==5.3.2
    lazy-object-proxy==1.9.0
    lxml==4.9.2
    Mako==1.2.4
    MarkupSafe==2.0.1
    marshmallow_dataclass==7.6.0
    marshmallow-enum==1.5.1
    marshmallow==3.21.3
    matplotlib-inline==0.1.6
    mccabe==0.7.0
    multimethod==1.10
    mypy-extensions==1.0.0
    mypy==1.2.0
    nest-asyncio==1.5.8
    numpy==1.24.3
    oauthlib==3.2.2
    packaging==23.1
    pandas==2.2.2
    parso==0.8.3
    pexpect==4.8.0
    pickleshare==0.7.5
    platformdirs==3.5.0
    pluggy==1.0.0
    prompt-toolkit==3.0.39
    psutil==5.9.5
    ptyprocess==0.7.0
    pure-eval==0.2.2
    pyarrow==10.0.1
    pycparser==2.21
    pydantic==1.10.13
    Pygments==2.16.1
    PyJWT==2.0.1
    pylint==2.17.3
    pyOpenSSL==21.0.0
    pytest-cov==2.12.1
    pytest-mock==2.0.0
    pytest==7.3.1
    python-dateutil==2.8.2
    python-gnupg==0.5.0
    python-json-logger==2.0.7
    pytz==2021.3
    PyYAML==6.0.1
    pyzmq==25.1.1
    redis==4.5.4
    requests-oauthlib==1.3.1
    requests==2.29.0
    s3transfer==0.6.0
    six==1.16.0
    snowflake-connector-python==3.12.1
    snowflake-snowpark-python==1.18.0
    sortedcontainers==2.4.0
    stack-data==0.6.3
    stringcase==1.2.0
    structlog==21.5.0
    tabulate==0.9.0
    tenacity==8.2.3
    toml==0.10.2
    tomli==2.0.1
    tomlkit==0.11.8
    tornado==6.3.3
    traitlets==5.11.1
    typeguard==4.0.0
    types-click==7.1.8
    types-python-dateutil==2.8.19.12
    types-requests==2.29.0.0
    types-tabulate==0.9.0.20240106
    types-urllib3==1.26.25.12
    typing_extensions==4.5.0
    typing-inspect==0.8.0
    tzdata==2024.1
    urllib3==1.26.15
    wcwidth==0.2.8
    Werkzeug==1.0.1
    wrapt==1.15.0

  4. What did you do?
    -- this query generates: SELECT "A", "B" FROM TEST_TABLE ORDER BY "A" ASC NULLS FIRST
    asc_query = session.table("TEST_TABLE").select("A", "B").order_by(col("A").asc())

    -- this query generates: SELECT "A", "B" FROM TEST_TABLE ORDER BY "A" DESC NULLS LAST
    desc_query = session.table("TEST_TABLE").select("A", "B").order_by(col("a").desc())

  5. What did you expect to see?
    I expected the ASC sort to either use no option for the ordering of null values or use NULLS LAST to keep the default sorting semantics of Snowflake.

I expected the DESC sort to either use no option for the ordering of null values or use NULLS LAST to keep the default sorting semantics of Snowflake.

Expectations based on the docs: Order By

  • Unless specified otherwise, NULL values are considered to be higher than any non-NULL values. As a result, the ordering for NULLS depends on the sort order:
    * If the sort order is ASC, NULLS are returned last; to force NULLS to be first, use NULLS FIRST.
    * If the sort order is DESC, NULLS are returned first; to force NULLS to be last, use NULLS LAST.
@samuelsongsr samuelsongsr added bug Something isn't working needs triage Initial RCA is required labels Aug 22, 2024
@github-actions github-actions bot changed the title When using order_by, the resulting query does not use the correct defaults for NULL [FIRST | LAST] as per the docs SNOW-1637096: When using order_by, the resulting query does not use the correct defaults for NULL [FIRST | LAST] as per the docs Aug 22, 2024
@sfc-gh-sghosh sfc-gh-sghosh self-assigned this Aug 23, 2024
@sfc-gh-sghosh
Copy link

sfc-gh-sghosh commented Aug 23, 2024

Hello @samuelsongsr ,

Thanks for raising the issue, you are right, as per documentation

Unless specified otherwise, NULL values are considered to be higher than any non-NULL values. As a result, the ordering for NULLS depends on the sort order:

If the sort order is ASC, NULLS are returned last; to force NULLS to be first, use NULLS FIRST.

If the sort order is DESC, NULLS are returned first; to force NULLS to be last, use NULLS LAST.

The output for below query should be NULL at last for ASC order

asc_query = session.table("TEST_TABLE").select("A", "B").order_by(col("A").asc())
asc_results = asc_query.collect()


print("Results for ASC Query:")
for row in asc_results:
    print(row)

The output is similar to using "asc_nulls_first"
asc_query1 = session.table("TEST_TABLE").select("A", "B").order_by(col("A").asc_nulls_first())

The same goes for DESC also,
As per the doc, the NULL should appear first, but it's coming as LAST

session.table("TEST_TABLE").select("A", "B").order_by(col("A").desc())
its behaving like session.table("TEST_TABLE").select("A", "B").order_by(col("A").desc_nulls_last())

Code:

# Define a schema for the table
schema = StructType([
    StructField("A", StringType()),
    StructField("B", StringType())
])

# Create the table with some example data
data = [("1", "Apple"), ("3", "Banana"), ("2", "Orange"), (None, "Peach")]
df = session.create_dataframe(data, schema)

# Write the DataFrame to a table
df.write.mode("overwrite").save_as_table("TEST_TABLE")

# Query 1: SELECT "A", "B" FROM TEST_TABLE ORDER BY "A" ASC NULLS FIRST
asc_query = session.table("TEST_TABLE").select("A", "B").order_by(col("A").asc())
asc_results = asc_query.collect()

# Print the results of the ASC query
print("Results for ASC Query:")
for row in asc_results:
    print(row)

# Query 2:
asc_query1 = session.table("TEST_TABLE").select("A", "B").order_by(col("A").asc_nulls_first())
asc_results1 = asc_query1.collect()

# Print the results of the ASC query
print("Results for ASC Query:")
for row in asc_results1:
    print(row)

# Query 2: SELECT "A", "B" FROM TEST_TABLE ORDER BY "A" DESC NULLS LAST
desc_query = session.table("TEST_TABLE").select("A", "B").order_by(col("A").desc())
desc_results = desc_query.collect()

# Print the results of the DESC query
print("\nResults for DESC Query:")
for row in desc_results:
    print(row)

# Query 2: 
desc_query2 = session.table("TEST_TABLE").select("A", "B").order_by(col("A").desc_nulls_last())
desc_results2 = desc_query2.collect()

# Print the results of the DESC query
print("\nResults for DESC Query:")
for row in desc_results2:
    print(row)
    
 Output:
Results for ASC Query:
Row(A=None, B='Peach')
Row(A='1', B='Apple')
Row(A='2', B='Orange')
Row(A='3', B='Banana')
Results for ASC Query:
Row(A=None, B='Peach')
Row(A='1', B='Apple')
Row(A='2', B='Orange')
Row(A='3', B='Banana')

Results for DESC Query:
Row(A='3', B='Banana')
Row(A='2', B='Orange')
Row(A='1', B='Apple')
Row(A=None, B='Peach')

Results for DESC Query:
Row(A='3', B='Banana')
Row(A='2', B='Orange')
Row(A='1', B='Apple')
Row(A=None, B='Peach')



Will work on eliminating it, will update.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added status-triage_done Initial triage done, will be further handled by the driver team and removed needs triage Initial RCA is required labels Aug 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working status-triage_done Initial triage done, will be further handled by the driver team
Projects
None yet
Development

No branches or pull requests

3 participants