My understanding is that DataFusion is primarily an extensible query engine for engineers looking to build database systems (Influx and so on) without reinventing the wheel.
Having said that, I can see it has a Rust-based DataFrame API and SQL context at a high enough abstraction layer that it's tempting to start building Data Engineering pipelines in pure Rust. 😄
An example of something I'd love to be able to do with DataFusion (I know some of this is already possible):
1. Using features and methods provided by DataFusion, query Delta Lake/Iceberg/MySQL/Postgres/ClickHouse/Influx and so on into DataFrame(s).
2. Transform the data.
3. Load the data into any of the above systems from DataFusion.
4. If memory becomes a bottleneck, switch to a Ray cluster for distributed computing with relatively simple configuration.
Ideally, the above should be possible for a Data Engineer who has no domain knowledge of database internals. I also appreciate that, because DataFusion uses Arrow as its memory format, some of this might be wishful thinking (or at least not straightforward to implement).
Is there a chance DataFusion will evolve in this direction, or will the focus remain on database systems? Is anyone else in the community using DataFusion in Rust for Data Engineering and, if so, what has your experience been?