scan_csv does not work on s3 path while read_csv works #13115

Closed
2 tasks done
michael72 opened this issue Dec 19, 2023 · 4 comments
Labels
A-io-csv Area: reading/writing CSV files bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@michael72

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

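import polars as pl

some_s3_file = f"s3://{BUCKET}/data.csv"  # BUCKET: the S3 bucket name (redacted in the log below)
pl.read_csv(some_s3_file)  # works
pl.scan_csv(some_s3_file)  # doesn't work: fails with "No such file or directory (os error 2)"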

Log output

2023-12-19 07:43:57,983 - analysis_helper - ERROR - No such file or directory (os error 2): s3://*****************/data.csv

Issue description

scan_csv fails to read data from an S3 path, while read_csv reads the same path without error.

Expected behavior

scan_csv should support at least the same locations as read_csv.

Installed versions

--------Version info---------
Polars:               0.20.1
Index type:           UInt32
Platform:             Linux-5.19.0-46-generic-x86_64-with-glibc2.35
Python:               3.11.4 (main, Jun  7 2023, 12:45:48) [GCC 11.3.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.2.1
connectorx:           <not installed>
deltalake:            <not installed>
fsspec:               2023.12.2
gevent:               <not installed>
matplotlib:           3.7.2
numpy:                1.26.0
openpyxl:             <not installed>
pandas:               2.0.3
pyarrow:              14.0.1
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
michael72 added the bug and python labels on Dec 19, 2023
@ritchie46
Member

ritchie46 commented Dec 19, 2023

scan_csv doesn't support reading from the cloud yet.

stinodego added the needs triage label on Jan 13, 2024
stinodego added the A-io-csv label on Jan 21, 2024
@edblackburn

edblackburn commented Jan 23, 2024

Hi @ritchie46, is there a plan to support scanning CSV files from object storage such as S3? Is it on a roadmap, and is there a roadmap I can look at? Sorry if this is in neon lights elsewhere and I missed it. Are there any significant blockers, and is there anything the community can do to help? Sorry for so many questions!

@edblackburn

Following up on my previous comment: would a piecemeal approach to adding object storage features be possible? Specifically, could scan_csv follow the route already taken for Parquet files, where storage options are accepted as arguments to scan_parquet?

If parity between scan and read for CSV files is a stated objective, I assume that represents a substantial engineering endeavour. However, a phased approach could build the capability incrementally without deviating too far from the goal of parity between read_csv and scan_csv. I note this was requested previously in issue #7225.

Are there any significant blockers, or is there anything the community can contribute to help? Please let us know.

Thank you, Ed
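The storage-options route proposed in the comment above mirrors how scan_parquet accepts a `storage_options` dict for cloud credentials. A minimal sketch of building such a dict, assuming credentials live in the standard AWS environment variables and that the object_store-style key names below (`aws_access_key_id`, `aws_secret_access_key`, `aws_session_token`, `aws_region`) are the ones your Polars version accepts; the helper `s3_storage_options` is hypothetical.

```python
import os
from collections.abc import Mapping

# Hypothetical mapping from standard AWS environment variables to the
# object_store-style keys used in storage_options (assumed key names;
# check the object_store S3 configuration docs for your version).
_ENV_TO_OPTION = {
    "AWS_ACCESS_KEY_ID": "aws_access_key_id",
    "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
    "AWS_SESSION_TOKEN": "aws_session_token",
    "AWS_REGION": "aws_region",
}


def s3_storage_options(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build a storage_options dict from whichever AWS variables are set (non-empty) in env."""
    return {option: env[var] for var, option in _ENV_TO_OPTION.items() if env.get(var)}
```

Usage might look like `pl.scan_parquet(f"s3://{BUCKET}/data.parquet", storage_options=s3_storage_options())`. On a Polars release that includes #16674, the same dict could presumably be passed to scan_csv as well; check the scan_csv signature of your installed version before relying on it.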

@nameexhaustion
Collaborator

Closed by #16674
