SNOW-1446046: Support `glob()` for `additional_source_files` in `streamlit` deployment. #1108

dreeves-battery · 2024-05-24T02:54:53Z

Description

It would be nice if, instead of looping over additional source files like this:

if additional_source_files:
    for file in additional_source_files:
        ...

the code looped over them like this instead:

from glob import glob

# ...

if additional_source_files:
    for file_list in additional_source_files:
        for file in glob(file_list):
            ...

So that users don't need to specify each individual file as a project grows.

This should be fully backwards compatible.
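
A minimal sketch of why this can stay backwards compatible, assuming patterns are resolved relative to the current working directory: a plain path that exists on disk expands to itself, so existing `snowflake.yml` entries behave as before. The helper name `expand_source_files` is hypothetical and not part of snowflake-cli. One caveat is that `glob()` returns an empty list for a literal path that does not exist, so the sketch passes unmatched entries through unchanged and leaves it to later steps to report the missing file.

```python
from glob import glob


def expand_source_files(entries):
    """Expand each entry as a glob pattern, keeping order and dropping duplicates.

    Hypothetical helper for illustration only; not a snowflake-cli function.
    """
    expanded = []
    for entry in entries or []:
        # glob() gives no ordering guarantee, so sort for deterministic output.
        matches = sorted(glob(entry))
        # No match: keep the entry untouched so a missing file can still be
        # reported by whatever consumes the list.
        for path in matches or [entry]:
            if path not in expanded:
                expanded.append(path)
    return expanded
```

With this shape, `additional_source_files` entries such as `common/hello.py` and `common/*.py` can be mixed freely in the same list.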

Context

No response

sfc-gh-vtimofeenko · 2024-07-02T21:44:29Z

I did a bit of digging, and it looks like cli.plugins.streamlit.manager hands the files off to cli.plugins.stage.manager, which performs the actual PUT through snowflake.connector.cursor. A wildcard passes through the code in snowflake-cli untouched, and an error is returned from the cursor if it tries to upload a directory.

The experiment was conducted on this file structure:


├── common
│   ├── hello.py
│   └── mymodule
│       ├── __init__.py
│       └── submodule
│           └── __init__.py
├── environment.yml
├── pages
│   └── my_page.py
├── snowflake.yml
└── streamlit_app.py

Code in mymodule is not terribly important. Trying out different permutations of wildcards in snowflake.yml:

definition_version: "1.1"
streamlit:
  name: streamlit_app_snowcli
  stage: my_streamlit_stage_snowcli
  query_warehouse: ADHOC
  main_file: streamlit_app.py
  env_file: environment.yml
  pages_dir: pages/
  additional_source_files:
    - common/*.py # OK, produces common/hello.py
    # - common/**  # NOK, Produces "not a file but a directory" for mymodule
    - common/mymodule/*.py # OK, uploads __init__.py
    # - common/mymodule/**/*.py # NOK, produces "my_streamlit_stage_snowcli/streamlit_app_snowcli/common/mymodule/**/__init__.py"
    # - common/mymodule/*/*.py # NOK, produces "my_streamlit_stage_snowcli/streamlit_app_snowcli/common/mymodule/*/__init__.py"
    # - common/mymodule/**/*.py # NOK, produces "my_streamlit_stage_snowcli/streamlit_app_snowcli/common/mymodule/**/__init__.py"
    # - common/mymodule/***/*.py # NOK, produces "my_streamlit_stage_snowcli/streamlit_app_snowcli/common/mymodule/***/__init__.py"
    # - common/mymodule/**/* # NOK, produces "my_streamlit_stage_snowcli/streamlit_app_snowcli/common/mymodule/***/__init__.py"
    - common/mymodule/submodule/* # OK, produces what is expected, this would need to be repeated for every "leaf" subdirectory

Looks like nested wildcards (*/*-like patterns) result in files like "my_streamlit_stage_snowcli/streamlit_app_snowcli/common/mymodule/*/__init__.py" (asterisk is literal) which breaks the Streamlit application.

It's possible to implement something like the original comment suggests by expanding the glob on the snowflake-cli side (take the glob, turn it into an explicit list of files, and pass that list to snowflake.connector). The potential downside of that approach is that it would deparallelize the uploads, effectively making them single-threaded. That may not be a big deal, as the file count is probably not large enough for it to matter.
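
The client-side expansion described above could look roughly like the following. This is a sketch under assumptions, not the snowflake-cli implementation: `plan_uploads` is a hypothetical helper, and the idea that each file lands in its parent directory relative to the project root is inferred from the stage paths seen in the experiment. Directories are skipped because uploading one is what produced the "not a file but a directory" error, and each destination is derived from the matched path rather than from the pattern, so no literal asterisk can end up in a stage path.

```python
from pathlib import Path


def plan_uploads(project_root, patterns):
    """Return sorted (local_file, destination_dir) pairs for the glob patterns.

    Hypothetical helper for illustration only. Both values are POSIX-style
    paths relative to project_root.
    """
    root = Path(project_root)
    plan = set()  # a set, so overlapping patterns do not produce duplicates
    for pattern in patterns:
        # In Path.glob, "**" matches the directory itself and all subdirectories.
        for match in root.glob(pattern):
            if match.is_file():  # skip directories matched by "*" or "**"
                relative = match.relative_to(root)
                plan.add((relative.as_posix(), relative.parent.as_posix()))
    return sorted(plan)
```

For the file structure above, a single `common/**/*.py` entry would cover `hello.py` and both `__init__.py` files, instead of one entry per leaf directory.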

dreeves-battery · 2024-07-18T21:11:29Z

> It's possible to implement something like the original comment suggests by expanding the glob on the snowflake-cli side (take the glob, turn it into an explicit list of files, and pass that list to snowflake.connector). The potential downside of that approach is that it would deparallelize the uploads, effectively making them single-threaded. That may not be a big deal, as the file count is probably not large enough for it to matter.

@sfc-gh-vtimofeenko My 2 cents:

First: If you have enough files for parallelization to matter for performance of the deploy step, then you have enough files for globs to matter for the maintainability of your deployment spec.

Second: if the existing code is parallelized, my strong intuition is that there is surely a way to parallelize this too. A glob() call that returns a.py, b.py and c.py is not fundamentally different from passing those files directly in a list. There is just one extra step between the two, and it can be performed on the local machine.
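
To illustrate the point, here is a hedged sketch of uploading an already-expanded file list concurrently with the standard library. `upload_all` and `upload_one` are hypothetical names: `upload_one` stands in for whatever performs a single PUT, and whether the real upload path (for example a shared connector cursor) is safe to call from several threads is an assumption this thread does not settle.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_all(files, upload_one, max_workers=4):
    """Call upload_one(path) for every path concurrently.

    Hypothetical helper for illustration only. Results come back in the same
    order as the input list.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order; consuming it re-raises the first
        # exception raised by any upload_one call.
        return list(pool.map(upload_one, files))
```

The list passed in can come from a glob expansion or from explicit entries; the upload step does not need to know which.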

github-actions bot changed the title ~~Support glob() for additional_source_files in streamlit deployment.~~ SNOW-1446046: Support glob() for additional_source_files in streamlit deployment. May 24, 2024

sfc-gh-mraba added the enhancement New feature or request label May 27, 2024

sfc-gh-turbaszek added the streamlit label May 28, 2024
