Skip to content
This repository has been archived by the owner on Jun 4, 2024. It is now read-only.

Latest commit

 

History

History
110 lines (80 loc) · 3.37 KB

README.md

File metadata and controls

110 lines (80 loc) · 3.37 KB

Datafusion WASM User Defined Functions

Very simplistic datafusion user defined functions written in WASM.

POC has been built on top of Wasmedge library. It is not terribly performant, with lot of coping and serializing data.

It has been implemented to demonstrate DataFusion FunctionFactory functionality (arrow-datafusion/pull#9333) & WASM UDF (arrow-datafusion/pull#9326).

Other project in FunctionFactory series:

Note

  • It has not been envisaged as a actively maintained library.

Installation

In order to be able to compile project WasmEdge library should be installed.

or using brew:

brew install wasmedge

Why WasmEdge? It provided good enough examples to pick up wasm/rust integration.

Define Function

Define a rust function (wasm_function) like:

// expose function f1 as external function
// add required bindgen, and required serialization/deserialization
wasm_udf::export_udf_function!(f1);

/// standard datafusion udf ... kind of 
/// should return ArrayRef or ArrowError (or any error implementing to_string)
fn f1(args: &[ArrayRef]) -> Result<ArrayRef,ArrowError> {
    let base = args[0]
        .as_any()
        .downcast_ref::<Float64Array>()
        .expect("cast 0 failed");
    let exponent = args[1]
        .as_any()
        .downcast_ref::<Float64Array>()
        .expect("cast 1 failed");

    let array = base
        .iter()
        .zip(exponent.iter())
        .map(|(base, exponent)| match (base, exponent) {
            (Some(base), Some(exponent)) => Some(base.powf(exponent)),
            _ => None,
        })
        .collect::<Float64Array>();

    Ok(Arc::new(array))
}

which will be compiled to a wasm module with:

cd wasm_function
cargo build

An artifact should be available at target/wasm32-unknown-unknown/debug/wasm_function.wasm.

export_udf_function! macro should add WasmEdge bindings and peace of code which would do Arrow IPC serialization/deserialization. Arrow arrays are effectively copied across rust/wasm boundary using Arrow Ipc.

This code currently handles happy path scenario, with some exceptional cases covered.

UDF Declaration

let sql = r#"
CREATE FUNCTION f1(DOUBLE, DOUBLE)
RETURNS DOUBLE
LANGUAGE WASM
AS 'wasm_function.wasm!f1'
"#;

ctx.sql(sql).await?.show().await?;

ctx.sql("select a, b, f1(a,b) from t").await?.show().await?;

full example

should produce something similar to:

+-----+-----+-------------------+
| a   | b   | f1(t.a,t.b)       |
+-----+-----+-------------------+
| 2.0 | 2.0 | 4.0               |
| 3.0 | 3.0 | 27.0              |
| 4.0 | 4.0 | 256.0             |
| 5.0 | 5.1 | 3670.684197150057 |
+-----+-----+-------------------+

Function is declared in format wasm_function.wasm!f1, where wasm_function.wasm represents module to load and f1 a function to call.