
SNOW-1545867: COPY INTO command doesn't get submitted asynchronously by using collect_nowait #1941

Open
ankitsr92 opened this issue Jul 19, 2024 · 7 comments
Labels
status-triage_done Initial triage done, will be further handled by the driver team

Comments

@ankitsr92

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

3.8

  2. What operating system and processor architecture are you using?

Snowpark Snowflake Stored Procedure

  3. What are the component versions in the environment (pip freeze)?

Snowpark Snowflake Stored Procedure

  4. What did you do?

When submitting COPY INTO commands as async jobs, the COPY commands still run in sequence, whereas other SQL statements (e.g. CTAS) do get submitted asynchronously:

async_jobs = []
for table in table_list:
    sql_command = ...  # the COPY INTO statement for this table (elided in the original report)
    async_job = session.sql(sql_command).collect_nowait()
    async_jobs.append(async_job)

  5. What did you expect to see?

COPY INTO should be submitted asynchronously, and I should see multiple COPY commands running in parallel.

  6. Can you set logging to DEBUG and collect the logs?

Running within Snowflake
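The behaviour the reporter expects can be illustrated without a Snowflake account. The sketch below is a local analogy built on Python's standard concurrent.futures module, not Snowpark's implementation: `pool.submit()` stands in for `collect_nowait()`, `Future.result()` stands in for `AsyncJob.result()`, and `slow_query` / `run_nonblocking` are hypothetical names. If submission is truly non-blocking, the submit loop finishes almost instantly and the total wall time is close to the longest single job rather than the sum of all jobs.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def slow_query(seconds: float) -> str:
    # Stand-in for a long-running server-side statement such as SYSTEM$WAIT or COPY INTO.
    time.sleep(seconds)
    return f"waited {seconds}s"


def run_nonblocking(n_jobs: int, seconds: float) -> Tuple[float, float, List[str]]:
    """Submit n_jobs without waiting, then collect every result.

    Returns (seconds spent submitting, total seconds, results).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # submit() returns a Future immediately, analogous to collect_nowait() returning an AsyncJob.
        futures = [pool.submit(slow_query, seconds) for _ in range(n_jobs)]
        submit_elapsed = time.perf_counter() - start
        # Blocking happens only here, analogous to AsyncJob.result().
        results = [f.result() for f in futures]
    total_elapsed = time.perf_counter() - start
    return submit_elapsed, total_elapsed, results
```

For example, `run_nonblocking(5, 0.2)` should report a near-zero submit time and a total of roughly 0.2 s rather than 1 s. A blocking submit, which is the behaviour reported here for COPY INTO, would instead push the submit time itself toward the sum of all jobs.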

@ankitsr92 ankitsr92 added bug Something isn't working needs triage Initial RCA is required labels Jul 19, 2024
@github-actions github-actions bot changed the title COPY INTO command doesn't get submitted asynchronously by using collect_nowait SNOW-1545867: COPY INTO command doesn't get submitted asynchronously by using collect_nowait Jul 19, 2024
@sfc-gh-sghosh sfc-gh-sghosh self-assigned this Jul 23, 2024
@sfc-gh-sghosh

Hello @ankitsr92 ,

Thanks for raising the issue. We are looking into it and will post an update.

Regards,
Sujan

@sfc-gh-sghosh

Hello @ankitsr92 ,

A Snowpark stored procedure always executes procedurally; it doesn't execute statements in parallel. That's why the COPY commands inside it run one after another.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added status-triage_done Initial triage done, will be further handled by the driver team and removed bug Something isn't working needs triage Initial RCA is required labels Jul 26, 2024
@ankitsr92
Author

ankitsr92 commented Jul 26, 2024

@sfc-gh-sghosh I'm not sure you understood the question.
I am using collect_nowait(), which means the procedure should not wait for the SQL statement to complete before moving on. The problem is that when I run a COPY INTO statement with collect_nowait(), it still waits for the COPY to complete before moving to the next iteration of the loop.

If I use any other SQL instead (e.g. CTAS, INSERT INTO, or just a simple SYSTEM$WAIT), collect_nowait() works fine and moves on to the next statement without waiting for it to complete. Try this, for example:

import snowflake.snowpark as snowpark


def main(session: snowpark.Session):
    # Submit ten 10-second waits without blocking on any of them
    async_jobs = []
    for _ in range(10):
        sql_command = "SELECT SYSTEM$WAIT(10)"
        async_job = session.sql(sql_command).collect_nowait()
        async_jobs.append(async_job)

    # Block only here, collecting the rows returned by each job in turn
    results = []
    for job in async_jobs:
        result = job.result()
        results.append(result)

    return "Success"
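One way to make the blocking observable is to time each collect_nowait() call. The sketch below assumes only that the session exposes `sql(...).collect_nowait()` and that the returned AsyncJob has `is_done()`, both part of the Snowpark API; `submit_and_time` and `summarize` are hypothetical helper names. If collect_nowait() is non-blocking, every submit should return in a fraction of a second even for SYSTEM$WAIT(10). A COPY INTO submit that blocks for the full load time, or a job that already reports done right after submission, would point to the behaviour described in this issue.

```python
import time
from typing import Any, List, Sequence, Tuple


def submit_and_time(session: Any, statements: Sequence[str]) -> List[Tuple[Any, float, bool]]:
    """Submit each statement with collect_nowait() and record how long the call blocked.

    Returns one (job, seconds_blocked, done_right_after_submit) tuple per statement.
    """
    timed = []
    for sql_command in statements:
        start = time.perf_counter()
        job = session.sql(sql_command).collect_nowait()
        blocked = time.perf_counter() - start
        # A long query that is already finished here suggests the submit call waited for it.
        timed.append((job, blocked, job.is_done()))
    return timed


def summarize(timed: List[Tuple[Any, float, bool]]) -> str:
    # Longest single submit time and how many jobs had already finished at submit time.
    worst = max((blocked for _, blocked, _ in timed), default=0.0)
    finished = sum(1 for _, _, done in timed if done)
    return f"max submit time {worst:.2f}s, {finished}/{len(timed)} already done"
```

Inside the handler, something like `return summarize(submit_and_time(session, copy_statements))`, where `copy_statements` is a hypothetical list of COPY INTO strings, can be compared against the same call made with the SYSTEM$WAIT statements.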

@sfc-gh-sghosh

Thank you, @ankitsr92, for pointing that out.
Let me check and get back to you.

Regards,
Sujan

@nickhealy

@sfc-gh-sghosh, do you have any updates on this? I am also facing the same issue.

@sfc-gh-sghosh

Hello @nickhealy @ankitsr92 ,

The team is working on a fix and will post an update.

Regards,
Sujan

@sfc-gh-aalam
Contributor

Hi @nickhealy @ankitsr92, can you please share which Snowpark version you are using and what side effect makes you think the async jobs are not being submitted asynchronously? I am not able to reproduce this on my end.
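A concrete side effect to check is whether the COPY queries' execution windows overlap. Assuming start and end timestamps are available for each query (for instance the START_TIME and END_TIME columns of Snowflake's INFORMATION_SCHEMA.QUERY_HISTORY table function, matched on each AsyncJob's `query_id`), the hypothetical helper below computes the peak number of queries running at the same instant with a simple sweep over start and end events. A peak of 1 means the queries ran strictly in sequence; anything higher means some ran in parallel.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def max_concurrency(intervals: Iterable[Tuple[datetime, datetime]]) -> int:
    """Return the largest number of (start, end) intervals that overlap at any instant."""
    events: List[Tuple[datetime, int]] = []
    for start, end in intervals:
        events.append((start, 1))   # a query begins
        events.append((end, -1))    # a query ends
    # At equal timestamps, ends (-1) sort before starts (+1), so back-to-back
    # queries are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak
```

Sorting the events once makes this O(n log n), which is more than fast enough for the handful of COPY statements a procedure like this would issue.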

Inside a stored procedure environment, you can set the version in the PACKAGES clause:

create or replace procedure my_python_sp()
returns STRING
language python
handler='my_handler_func'
runtime_version=3.8
packages=('snowflake-snowpark-python=1.20.0')
as $$
...

Alternatively, you can choose it from the Packages dropdown.
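Relatedly, a procedure can report the Snowpark version it actually resolved at run time, which answers the version question directly. This sketch assumes the package metadata for snowflake-snowpark-python is visible inside the stored-procedure sandbox; it relies only on the standard library's importlib.metadata (available from Python 3.8), and `snowpark_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Any, Optional


def snowpark_version() -> Optional[str]:
    """Return the installed snowflake-snowpark-python version, or None if it is not found."""
    try:
        return version("snowflake-snowpark-python")
    except PackageNotFoundError:
        return None


def main(session: Any) -> str:
    # Report the Snowpark version this procedure is running with.
    return f"snowflake-snowpark-python {snowpark_version() or 'not found'}"
```

Returning this string from the handler is a quick way to include the exact version in a bug report.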


5 participants