Arbitrary functions and SQL's limitations #56

akennedy4155 · 2023-04-19T00:22:15Z

akennedy4155
Apr 19, 2023

I have a case that I was thinking about, and I was trying to come up with the best architecture to support it. Consider q1 from Nexmark's benchmark query suite:

Convert each bid value from dollars to euros

If I wanted to pull the conversion rate via an API, I would currently have to write a Rust pipeline, rather than a SQL one, to perform that operation.
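To make the case concrete, here is a minimal sketch of the conversion in Rust. Everything in it is an assumption for illustration: the `Bid` struct (fields loosely modeled on Nexmark's bid schema), the `RateSource` trait, and `FixedRate`, which stands in for whatever would actually call the API. None of it is Arroyo's real API.

```rust
/// Hypothetical bid record; field names loosely follow Nexmark's bid schema.
#[derive(Debug, Clone, PartialEq)]
pub struct Bid {
    pub auction: u64,
    pub bidder: u64,
    /// Price in dollars on input, euros after conversion.
    pub price: f64,
}

/// Hypothetical abstraction over "where the conversion rate comes from".
/// A real implementation might cache the result of an HTTP call.
pub trait RateSource {
    fn usd_to_eur(&self) -> f64;
}

/// Stand-in for an API-backed source: always returns the same rate.
pub struct FixedRate(pub f64);

impl RateSource for FixedRate {
    fn usd_to_eur(&self) -> f64 {
        self.0
    }
}

/// Returns a copy of `bid` with its price converted from dollars to euros.
pub fn to_euros<R: RateSource>(bid: &Bid, rates: &R) -> Bid {
    Bid {
        price: bid.price * rates.usd_to_eur(),
        ..bid.clone()
    }
}
```

The conversion itself is one line; the part SQL cannot express today is the `RateSource`, which is where arbitrary user code would have to plug in.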

With SQL, what type of extensibility do users have to keep their queries readable, declarative, and optimizable by the query planner, while also implementing arbitrary functionality? Are there plans to support users extending the functionality with custom functions, types, and operators like Postgres extensions or SQL Server extensibility? Does this exist now?

akennedy4155 · 2023-04-19T00:23:47Z

akennedy4155
Apr 19, 2023
Author

https://www.postgresql.org/docs/current/extend.html

0 replies

akennedy4155 · 2023-04-19T00:29:45Z

akennedy4155
Apr 19, 2023
Author

I think Arroyo is a long way off from what I would consider a "modular" system, but I think modularity will be important to its success. Of course we can write pipelines in Rust, but I believe that by emphasizing SQL we're enabling users to take a simpler approach. I've spent a lot of years working with Airflow. Because it's based on Python, and everything is written in Python, it's super extensible. Users can take anything they could write in Python and turn it into an operator, executable code for a task, UI components, plugins, etc. Many companies have a couple of distinct "user groups": DAG authors and system maintainers. If the system engineers (those writing operators and plugins) are doing their job right, the DAG authors can just use the code that the other engineers have abstracted for them. This is even more important with SQL because it enables analysts and other users who are experienced with SQL, but less so with more advanced languages such as Rust, to develop powerful pipelines.

I can see a system like this being important to implement starting early on for Arroyo. It could be bolted on later, but in my experience it's always going to be more elegant and a better user experience if this is considered from the start.
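As a rough illustration of that split between maintainers and authors, here is a sketch of a function registry in Rust. The names (`FunctionRegistry`, `ScalarFn`, `register`, `call`) are hypothetical and not part of Arroyo; the point is only that one group registers functions by name and the other group refers to them by name, as a SQL query would.

```rust
use std::collections::HashMap;

/// A scalar function over f64 arguments. A real system would support
/// richer types; f64 keeps the sketch short.
pub type ScalarFn = Box<dyn Fn(&[f64]) -> Result<f64, String> + Send + Sync>;

/// Hypothetical registry: maintainers add functions, query authors
/// look them up by (case-insensitive) name.
#[derive(Default)]
pub struct FunctionRegistry {
    functions: HashMap<String, ScalarFn>,
}

impl FunctionRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Maintainer side: register an implementation under a name.
    pub fn register<F>(&mut self, name: &str, f: F)
    where
        F: Fn(&[f64]) -> Result<f64, String> + Send + Sync + 'static,
    {
        self.functions.insert(name.to_lowercase(), Box::new(f));
    }

    /// Author side: call a function by name, as a planner would when it
    /// meets `name(args...)` in a query.
    pub fn call(&self, name: &str, args: &[f64]) -> Result<f64, String> {
        let f = self
            .functions
            .get(&name.to_lowercase())
            .ok_or_else(|| format!("unknown function: {}", name))?;
        f(args)
    }
}
```

With something like this, a maintainer could register `usd_to_eur` once, and an analyst would only ever write `usd_to_eur(price)` in SQL.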

0 replies

mwylde · 2023-04-19T01:23:17Z

mwylde
Apr 19, 2023
Maintainer

Totally agree that flexibility is very important, and we're still well short of that. Our short-term plan there is to support SQL UDFs compiled via WASM. We already have support for running WASM functions as part of the pipeline (arroyo/arroyo-worker/src/operators/mod.rs, line 300 in 4201d4b):

    pub struct WasmOperator<InKey: Key, InT: Data, OutK: Key + 'static, OutT: Data + 'static> {

This is leveraged by the Rust dataflow API. (Fun fact: the user code in the Rust API is actually compiled to WASM, and that's what's run.)

To expose this to SQL, we just need to add a compilation rule to turn UDFs into WASM operators. It won't be a lot of work.

Ultimately we want to allow UDFs to be written in other languages, like Python, as well.
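A compilation rule of that kind could look roughly like the sketch below. This is not Arroyo's planner: `Expr`, `Operator`, and `plan_expr` are hypothetical, heavily simplified stand-ins, and the `.wasm` module naming is an assumption made for the example. The sketch only handles a UDF call at the top level of an expression.

```rust
/// Hypothetical, simplified SQL expression tree.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(f64),
    UdfCall { name: String, args: Vec<Expr> },
}

/// Hypothetical physical operators a planner could emit.
#[derive(Debug, Clone, PartialEq)]
pub enum Operator {
    /// Expression evaluated natively by the engine.
    Project(Expr),
    /// Expression evaluated by a compiled WASM module.
    Wasm { module: String, inputs: Vec<Expr> },
}

/// The "compilation rule": a top-level UDF call becomes a WASM operator,
/// and every other expression stays a native projection.
pub fn plan_expr(expr: Expr) -> Operator {
    match expr {
        Expr::UdfCall { name, args } => Operator::Wasm {
            // Assumed naming convention: one module per UDF.
            module: format!("{}.wasm", name),
            inputs: args,
        },
        other => Operator::Project(other),
    }
}
```

The query itself stays declarative; only the planner needs to know that `usd_to_eur(price)` is backed by a WASM module.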

1 reply

akennedy4155 Apr 19, 2023
Author

This may be a naïve question.... Why WASM?

My understanding is that it's a way for browsers to call "native code", with bindings written between something like Rust or C++ and JS, mostly used for speeding up processing. That's about as far as I've made it there.

The one reason I can think of for using WASM is that it offers a sandbox, and therefore more security, for user code, which is verrrrry important when you can run arbitrary code. This isolation is definitely a necessity, but besides that, I'm not sure why we would run

Am I missing anything there?

Actually..... Just thought about it for another minute. Another reason to compile to WASM is that the code generated from SQL can be compiled separately at runtime and then used somewhat as a "plugin", right? Rather than compiling everything into a single Rust binary, the system can take the WASM build and execute it as a hot-loaded module.

That's just an educated guess, and I'm not 100% sure how that works in practice.
