Update levenshtein
scalar function to support Utf8View
#11854
Labels
enhancement
New feature or request
levenshtein
scalar function to support Utf8View
#11854
Part of #11752 and #11790
Currently, a call to
levenshtein
with a Utf8View datatypes induces a cast. After the change that fixes this issue, it should not.levenshtein is defined here: https://github.com/apache/datafusion/blob/main/datafusion/functions/src/string/levenshtein.rs
Is your feature request related to a problem or challenge?
We are working to add complete StringView support in DataFusion, which permits potentially much faster processing of string data. See #10918 for more background.
Today, most DataFusion string functions support DataType::Utf8 and DataType::LargeUtf8 and when called with a StringView argument DataFusion will cast the argument back to DataType::Utf8 which is expensive.
To realize the full speed of StringView, we need to ensure that all string functions support the DataType::Utf8View directly.
Describe the solution you'd like
Update the function to support DataType::Utf8View directly
Describe alternatives you've considered
The typical steps are:
string_view.slt
to ensure the arguments are not being castSignature
of the function to acceptUtf8View
in addition toUtf8
/LargeUtf8
Utf8View
Example PRs
Utf8View
type instarts_with
function #11787StringViewArray
#11556Additional context
The documentation of string functions can be found here: https://datafusion.apache.org/user-guide/sql/scalar_functions.html#string-functions
To test a function with StringView with
datafusion-cli
you can use an example like this (replacingstarts_with
with the relevant function)To see if it is using utf8 view, use
EXPLAIN
to see the plan and verify there is noCAST
. In this example theCAST(column1@0 AS Utf8)
indicates that the function is not usingUtf8View
nativelyIt is also often good to test with a constant as well (likewise there should be no cast):
The text was updated successfully, but these errors were encountered: