Reading Large CSV #6833
Hi team, just curious to know what the different ways are to read a large CSV file (typically 10-20GB) into ibis, and how seamlessly we can achieve this using ibis. Thanks!
@vmsaipreeth it's going to be very dependent on your backend -- typically a large CSV like that will expand to some multiple of its on-disk size in memory, so you could hit out-of-memory (OOM) errors. You can try with …
@vmsaipreeth Hey, this is still on our radar! We were at EuroSciPy last week, so a few of us were out.
For reading CSVs into ibis there are a few options, mostly breaking down into local versus remote:
Local Backends
Backends like DuckDB, Polars, and DataFusion can read CSV files directly and typically handle them well. If you're prototyping an analysis, I would start with one of these backends.
Remote Backends
Support for reading CSV files varies quite a bit here. Some backends like Snowflake and ClickHouse have good support, while Trino does not support reading CSV files at all. More traditional DBMSs like PostgreSQL and MySQL do not support the `read_csv` API. Trino also doesn't support re…