Arbitrary functions and SQL's limitations #56
-
I think Arroyo is a long way off from what I would consider a "modular" system, but I think modularity will be important to its success. Of course we can write pipelines in Rust, but I believe that by emphasizing SQL we're enabling users to take a simpler approach.

I've spent many years working with Airflow. Because it's based on Python, and everything is written in Python, it's highly extensible. Users can take anything they could write in Python and turn it into an operator, executable code for a task, a UI component, a plugin, and so on. Many companies have two distinct user groups: DAG authors and system maintainers. If the system engineers (those writing operators and plugins) are doing their job right, the DAG authors can simply use the code that the other engineers have abstracted for them. This is even more important with SQL, because it enables analysts and other users who are experienced with SQL, but less so with more advanced languages such as Rust, to develop powerful pipelines.

I can see a system like this being important to implement early on for Arroyo. It could be bolted on later, but in my experience the result is more elegant and gives a better user experience when it is considered from the start.
-
Totally agree that flexibility is very important, and we're still very short of that. Our short-term plan there is to support SQL UDFs compiled via WASM. We already have support for running WASM functions as part of the pipeline (see arroyo/arroyo-worker/src/operators/mod.rs, line 300 at commit 4201d4b). To expose this to SQL, we just need to add a compilation rule that turns UDFs into WASM operators, which won't be a lot of work. Ultimately we want to allow UDFs to be written in other languages, like Python, as well.
-
I have a case that I was thinking about, and I was trying to come up with the best architecture to support it. Consider q1 from Nexmark's benchmark query suite, the currency-conversion query that multiplies each bid price by a fixed dollar-to-euro rate.
If I wanted to pull the conversion rate from an API instead, I would currently have to write a Rust pipeline, rather than SQL, to perform that operation.
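To make the case concrete, here is a rough sketch of the kind of logic I would want to wrap in a SQL-callable function. Everything in it is hypothetical and not part of Arroyo: `RateSource`, `FixedRate`, `CachedRate`, and `convert_price` are names I made up for illustration, and the API call is replaced by a trait so the sketch stays standard-library only. The idea is that the rate lookup is cached for a short time so the source is not queried once per record.

```rust
use std::time::{Duration, Instant};

/// Hypothetical abstraction over "wherever the rate comes from".
/// A real implementation would call an HTTP API; this is not an Arroyo interface.
pub trait RateSource {
    fn fetch_rate(&mut self) -> Result<f64, String>;
}

/// Fixed-rate source, standing in for the hard-coded multiplier in q1.
pub struct FixedRate(pub f64);

impl RateSource for FixedRate {
    fn fetch_rate(&mut self) -> Result<f64, String> {
        Ok(self.0)
    }
}

/// Caches the last fetched rate so the source is not hit for every record.
pub struct CachedRate<S: RateSource> {
    source: S,
    ttl: Duration,
    cached: Option<(f64, Instant)>,
}

impl<S: RateSource> CachedRate<S> {
    pub fn new(source: S, ttl: Duration) -> Self {
        Self { source, ttl, cached: None }
    }

    /// Returns the cached rate while it is younger than `ttl`, otherwise refetches.
    pub fn rate(&mut self) -> Result<f64, String> {
        if let Some((rate, fetched_at)) = self.cached {
            if fetched_at.elapsed() < self.ttl {
                return Ok(rate);
            }
        }
        let rate = self.source.fetch_rate()?;
        self.cached = Some((rate, Instant::now()));
        Ok(rate)
    }
}

/// The scalar function a SQL UDF would wrap: price times rate, rounded to the nearest unit.
pub fn convert_price(price: u64, rate: f64) -> u64 {
    (price as f64 * rate).round() as u64
}
```

With something like this available as a function, the query itself could stay a plain `SELECT` over the bid stream, and only the implementation of `RateSource` would need to know about the external API.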
With SQL, what type of extensibility do users have to keep their queries readable, declarative, and optimizable by the query planner, while also implementing arbitrary functionality? Are there plans to support users extending the functionality with custom functions, types, and operators like Postgres extensions or SQL Server extensibility? Does this exist now?