Replies: 3 comments 2 replies
-
Hey @aborruso, Rill uses DuckDB under the hood to process data. While a 1.5 GB Parquet file is not usually a problem for DuckDB, something about this particular Parquet file seems to cause it to generate a large amount of temporary data during ingestion. One possible workaround is to disable insertion order preservation. Can you try starting Rill with these flags and see if that solves the issue: `rill start --reset --var connector.duckdb.boot_queries='SET preserve_insertion_order TO false'` Also, can you tell me what version of Rill you're running?
-
If you're also planning to deploy to Rill Cloud, you may want to consider uploading the Parquet file to object storage instead (such as S3 or GCS). Besides what Benjamin mentioned (we use DuckDB under the hood, and these tmp files are related to the ingestion), you will notice that local files are added to your project. At a high level, when it comes to working with larger files, using object storage (if that's a possibility) will generally be more efficient from a storage standpoint and more performant and scalable. This also aligns with general best practices and what we would recommend, especially for production and/or if you wish to share dashboards with others (using Rill Cloud).
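For reference, pointing Rill at object storage is usually just a small source definition. This is a rough sketch with a hypothetical bucket and file path — the exact keys may differ between Rill versions, so check the source documentation for yours:

```yaml
# sources/opencup.yaml -- hypothetical names for illustration
type: s3
uri: s3://my-bucket/opencup/0000.parquet
```

Once the file lives in a bucket, the same source works identically for local development and for a deployment to Rill Cloud.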
-
Hi @begelundmuller and @AndrewRTsao, first of all, thank you very much for your replies. And thank you also for Rill, it seems great to me! I have tried @begelundmuller's suggestion: I no longer get those huge temporary files, but nothing is displayed and it crashes after a while. The file I'm using is here: https://huggingface.co/datasets/aborruso/opencup/blob/refs%2Fconvert%2Fparquet/default/train/0000.parquet Thank you
-
Hi,
I have a 1.5 GB Parquet file: almost 20 million rows and 100 columns.
I add it as a local file. After 15 minutes of processing, it is still not finished, and very large temporary files are created.
So I end up stopping everything, because I think something has gone wrong. Is there any pre-processing I could do?
Do you have any tips for using Rill with files of this size?
Thank you