34.0.0 (2023-12-11)
Breaking changes:
- Implement `DISTINCT ON` from Postgres #7981 (gruuya)
- Encapsulate `EquivalenceClass` into a struct #8034 (alamb)
- Make fields of `ScalarUDF`, `AggregateUDF` and `WindowUDF` non `pub` #8079 (alamb)
- Implement StreamTable and StreamTableProvider (#7994) #8021 (tustvold)
- feat: make FixedSizeList scalar also an ArrayRef #8221 (wjones127)
- Remove FileWriterMode and ListingTableInsertMode (#7994) #8017 (tustvold)
- Refactor: Unify `Expr::ScalarFunction` and `Expr::ScalarUDF`, introduce unresolved functions by name #8258 (2010YOUY01)
- Refactor aggregate function handling #8358 (Weijun-H)
- Move `PartitionSearchMode` into datafusion_physical_plan, rename to `InputOrderMode` #8364 (alamb)
- Split `EmptyExec` into `PlaceholderRowExec` #8446 (razeghi71)
Implemented enhancements:
- feat: show statistics in explain verbose #8113 (NGA-TRAN)
- feat:implement postgres style 'overlay' string function #8117 (Syleechan)
- feat: fill missing values with NULLs while inserting #8146 (jonahgao)
- feat: to_array_of_size for ScalarValue::FixedSizeList #8225 (wjones127)
- feat:implement calcite style 'levenshtein' string function #8168 (Syleechan)
- feat: roundtrip FixedSizeList Scalar to protobuf #8239 (wjones127)
- feat: impl the basic `string_agg` function #8148 (haohuaijin)
- feat: support simplifying BinaryExpr with arbitrary guarantees in GuaranteeRewriter #8256 (wjones127)
- feat: support customizing column default values for inserting #8283 (jonahgao)
- feat:implement sql style 'substr_index' string function #8272 (Syleechan)
- feat:implement sql style 'find_in_set' string function #8328 (Syleechan)
- feat: support `LargeList` in `array_empty` #8321 (Weijun-H)
- feat: support `LargeList` in `make_array` and `array_length` #8121 (Weijun-H)
- feat: ScalarValue from String #8411 (QuenKar)
- feat: support `LargeList` for `array_has`, `array_has_all` and `array_has_any` #8322 (Weijun-H)
- feat: customize column default values for external tables #8415 (jonahgao)
- feat: Support `array_sort` (`list_sort`) #8279 (Asura7969)
- feat: support `InterleaveExecNode` in the proto #8460 (liukun4515)
- feat: improve string statistics display in datafusion-cli `parquet_metadata` function #8535 (asimsedhain)
Fixed bugs:
- fix: Timestamp with timezone not considered `join on` #8150 (ACking-you)
- fix: wrong result of range function #8313 (smallzhongfeng)
- fix: make `ntile` work in some corner cases #8371 (haohuaijin)
- fix: Changed labeler.yml to latest format #8431 (viirya)
- fix: Literal in `ORDER BY` window definition should not be an ordinal referring to relation column #8419 (viirya)
- fix: ORDER BY window definition should work on null literal #8444 (viirya)
- fix: RANGE frame for corner cases with empty ORDER BY clause should be treated as constant sort #8445 (viirya)
- fix: don't unifies projection if expr is non-trival #8454 (haohuaijin)
- fix: support uppercase when parsing `Interval` #8478 (QuenKar)
- fix: incorrect set preserve_partitioning in SortExec #8485 (haohuaijin)
- fix: Pull stats in `IdentVisitor`/`GraphvizVisitor` only when requested #8514 (vrongmeal)
- fix: volatile expressions should not be target of common subexpt elimination #8520 (viirya)
Documentation updates:
- Library Guide: Add Using the DataFrame API #8319 (Veeupup)
- Minor: Add installation link to README.md #8389 (Weijun-H)
- Prepare version 34.0.0 #8508 (andygrove)
Merged pull requests:
- Fix typo in partitioning.rs #8134 (lewiszlw)
- Implement `DISTINCT ON` from Postgres #7981 (gruuya)
- Prepare 33.0.0-rc2 #8144 (andygrove)
- Avoid concat in `array_append` #8137 (jayzhan211)
- Replace macro with function for array_remove #8106 (jayzhan211)
- Implement `array_union` #7897 (edmondop)
- Minor: Document `ExecutionPlan::equivalence_properties` more thoroughly #8128 (alamb)
- feat: show statistics in explain verbose #8113 (NGA-TRAN)
- feat:implement postgres style 'overlay' string function #8117 (Syleechan)
- Minor: Encapsulate `LeftJoinData` into a struct (rather than anonymous enum) and add comments #8153 (alamb)
- Update sqllogictest requirement from 0.18.0 to 0.19.0 #8163 (dependabot[bot])
- feat: fill missing values with NULLs while inserting #8146 (jonahgao)
- Introduce return type for aggregate sum #8141 (jayzhan211)
- implement range/generate_series func #8140 (Veeupup)
- Encapsulate `EquivalenceClass` into a struct #8034 (alamb)
- Revert "Minor: remove unnecessary projection in `single_distinct_to_g… #8176 (NGA-TRAN)
- Preserve all of the valid orderings during merging. #8169 (mustafasrepo)
- Make fields of `ScalarUDF`, `AggregateUDF` and `WindowUDF` non `pub` #8079 (alamb)
- Fix logical conflicts #8187 (tustvold)
- Minor: Update JoinHashMap comment example to make it clearer #8154 (alamb)
- Implement StreamTable and StreamTableProvider (#7994) #8021 (tustvold)
- [MINOR]: Remove unused Results #8189 (mustafasrepo)
- Minor: clean up the code based on clippy #8179 (Weijun-H)
- Minor: simplify filter statistics code #8174 (alamb)
- Replace macro with function for `array_position` and `array_positions` #8170 (jayzhan211)
- Add Library Guide for User Defined Functions: Window/Aggregate #8171 (Veeupup)
- Add more stream docs #8192 (tustvold)
- Implement func `array_pop_front` #8142 (Veeupup)
- Moving arrow_files SQL tests to sqllogictest #8217 (edmondop)
- fix regression in the use of name in ProjectionPushdown #8219 (alamb)
- [MINOR]: Fix column indices in the planning tests #8191 (mustafasrepo)
- Remove unnecessary reassignment #8232 (qrilka)
- Update itertools requirement from 0.11 to 0.12 #8233 (crepererum)
- Port tests in subqueries.rs to sqllogictest #8231 (PsiACE)
- feat: make FixedSizeList scalar also an ArrayRef #8221 (wjones127)
- Add versions to datafusion dependencies #8238 (andygrove)
- feat: to_array_of_size for ScalarValue::FixedSizeList #8225 (wjones127)
- feat:implement calcite style 'levenshtein' string function #8168 (Syleechan)
- feat: roundtrip FixedSizeList Scalar to protobuf #8239 (wjones127)
- Update prost-build requirement from =0.12.1 to =0.12.2 #8244 (dependabot[bot])
- Minor: Port tests in `displayable.rs` to sqllogictest #8246 (Weijun-H)
- Minor: add `with_estimated_selectivity` to Precision #8177 (alamb)
- fix: Timestamp with timezone not considered `join on` #8150 (ACking-you)
- Replace macro in array_array to remove duplicate codes #8252 (Veeupup)
- Port tests in projection.rs to sqllogictest #8240 (PsiACE)
- Introduce `array_except` function #8135 (jayzhan211)
- Port tests in `describe.rs` to sqllogictest #8242 (Asura7969)
- Remove FileWriterMode and ListingTableInsertMode (#7994) #8017 (tustvold)
- Minor: clean up the code based on Clippy #8257 (Weijun-H)
- Update arrow 49.0.0 and object_store 0.8.0 #8029 (tustvold)
- feat: impl the basic `string_agg` function #8148 (haohuaijin)
- Minor: Make schema of grouping set columns nullable #8248 (markusa380)
- feat: support simplifying BinaryExpr with arbitrary guarantees in GuaranteeRewriter #8256 (wjones127)
- Making stream joins extensible: A new Trait implementation for SHJ #8234 (metesynnada)
- Don't Canonicalize Filesystem Paths in ListingTableUrl / support new external tables for files that do not (yet) exist #8014 (tustvold)
- Minor: Add sql level test for inserting into non-existent directory #8278 (alamb)
- Replace `array_has/array_has_all/array_has_any` macro to remove duplicate code #8263 (Veeupup)
- Fix bug in field level metadata matching code #8286 (alamb)
- Refactor Interval Arithmetic Updates #8276 (berkaysynnada)
- [MINOR]: Remove unecessary orderings from the final plan #8289 (mustafasrepo)
- consistent logical & physical `NTILE` return types #8270 (korowa)
- make `array_union`/`array_except`/`array_intersect` handle empty/null arrays rightly #8269 (Veeupup)
- improve file path validation when reading parquet #8267 (Weijun-H)
- [Benchmarks] Make `partitions` default to number of cores instead of 2 #8292 (andygrove)
- Update prost-build requirement from =0.12.2 to =0.12.3 #8298 (dependabot[bot])
- Fix Display for List #8261 (jayzhan211)
- feat: support customizing column default values for inserting #8283 (jonahgao)
- support `LargeList` for `arrow_cast`, support `ScalarValue::LargeList` #8290 (Weijun-H)
- Minor: remove useless clone based on Clippy #8300 (Weijun-H)
- Calculate ordering equivalence for expressions (rather than just columns) #8281 (mustafasrepo)
- Fix sqllogictests link in contributor-guide/index.md #8314 (qrilka)
- Refactor: Unify `Expr::ScalarFunction` and `Expr::ScalarUDF`, introduce unresolved functions by name #8258 (2010YOUY01)
- Support no distinct aggregate sum/min/max in `single_distinct_to_group_by` rule #8266 (haohuaijin)
- feat:implement sql style 'substr_index' string function #8272 (Syleechan)
- Fixing issues with for timestamp literals #8193 (comphead)
- Projection Pushdown over StreamingTableExec #8299 (berkaysynnada)
- minor: fix documentation #8323 (comphead)
- fix: wrong result of range function #8313 (smallzhongfeng)
- Minor: rename parquet.rs to parquet/mod.rs #8301 (alamb)
- refactor: output ordering #8304 (QuenKar)
- Update substrait requirement from 0.19.0 to 0.20.0 #8339 (dependabot[bot])
- Port tests in `aggregates.rs` to sqllogictest #8316 (edmondop)
- Library Guide: Add Using the DataFrame API #8319 (Veeupup)
- Port tests in limit.rs to sqllogictest #8315 (zhangxffff)
- move array function unit_tests to sqllogictest #8332 (Veeupup)
- NTH_VALUE reverse support #8327 (mustafasrepo)
- Optimize Projections during Logical Plan #8340 (mustafasrepo)
- [MINOR]: Move merge projections tests to under optimize projections #8352 (mustafasrepo)
- Add `quote` and `escape` attributes to create csv external table #8351 (Asura7969)
- Minor: Add DataFrame test #8341 (alamb)
- Minor: clean up the code based on Clippy #8359 (Weijun-H)
- Minor: Make it easier to work with Expr::ScalarFunction #8350 (alamb)
- Minor: Move some datafusion-optimizer::utils down to datafusion-expr::utils #8354 (Jesse-Bakker)
- Minor: Make `BuiltInScalarFunction::alias` a method #8349 (alamb)
- Extract parquet statistics to its own module, add tests #8294 (alamb)
- feat:implement sql style 'find_in_set' string function #8328 (Syleechan)
- Support LargeUtf8 to Temporal Coercion #8357 (jayzhan211)
- Refactor aggregate function handling #8358 (Weijun-H)
- Implement Aliases for ScalarUDF #8360 (Veeupup)
- Minor: Remove unnecessary name field in `ScalarFunctionDefintion` #8365 (alamb)
- feat: support `LargeList` in `array_empty` #8321 (Weijun-H)
- Double type argument for to_timestamp function #8159 (spaydar)
- Support User Defined Table Function #8306 (Veeupup)
- Document timestamp input limits #8369 (comphead)
- fix: make `ntile` work in some corner cases #8371 (haohuaijin)
- Minor: Refactor array_union function to use a generic union_arrays function #8381 (Weijun-H)
- Minor: Refactor function argument handling in `ScalarFunctionDefinition` #8387 (Weijun-H)
- Materialize dictionaries in group keys #8291 (qrilka)
- Rewrite `array_ndims` to fix List(Null) handling #8320 (jayzhan211)
- Docs: Improve the documentation on `ScalarValue` #8378 (alamb)
- Avoid concat for `array_replace` #8337 (jayzhan211)
- add a summary table to benchmark compare output #8399 (razeghi71)
- Refactors on TreeNode Implementations #8395 (berkaysynnada)
- feat: support `LargeList` in `make_array` and `array_length` #8121 (Weijun-H)
- remove `unalias` TableScan filters when create Physical Filter #8404 (jackwener)
- Update custom-table-providers.md #8409 (nickpoorman)
- fix transforming `LogicalPlan::Explain` use `TreeNode::transform` fails #8400 (haohuaijin)
- Docs: Fix `array_except` documentation example error #8407 (Asura7969)
- Support named query parameters #8384 (Asura7969)
- Minor: Add installation link to README.md #8389 (Weijun-H)
- Update code comment for the cases of regularized RANGE frame and add tests for ORDER BY cases with RANGE frame #8410 (viirya)
- Minor: Add example with parameters to LogicalPlan #8418 (alamb)
- Minor: Improve `PruningPredicate` documentation #8394 (alamb)
- feat: ScalarValue from String #8411 (QuenKar)
- Bump actions/labeler from 4.3.0 to 5.0.0 #8422 (dependabot[bot])
- Update sqlparser requirement from 0.39.0 to 0.40.0 #8338 (dependabot[bot])
- feat: support `LargeList` for `array_has`, `array_has_all` and `array_has_any` #8322 (Weijun-H)
- Union `schema` can't be a subset of the child schema #8408 (jackwener)
- Move `PartitionSearchMode` into datafusion_physical_plan, rename to `InputOrderMode` #8364 (alamb)
- Make filter selectivity for statistics configurable #8243 (edmondop)
- fix: Changed labeler.yml to latest format #8431 (viirya)
- Minor: Use `ScalarValue::from` impl for strings #8429 (alamb)
- Support crossjoin in substrait. #8427 (my-vegetable-has-exploded)
- Fix ambiguous reference when aliasing in combination with `ORDER BY` #8425 (Asura7969)
- Minor: convert marcro `list-slice` and `slice` to function #8424 (Weijun-H)
- Remove macro in iter_to_array for List #8414 (jayzhan211)
- fix: Literal in `ORDER BY` window definition should not be an ordinal referring to relation column #8419 (viirya)
- feat: customize column default values for external tables #8415 (jonahgao)
- feat: Support `array_sort` (`list_sort`) #8279 (Asura7969)
- Bugfix: Remove df-cli specific SQL statment options before executing with DataFusion #8426 (devinjdangelo)
- Detect when filters on unique constraints make subqueries scalar #8312 (Jesse-Bakker)
- Add alias check to optimize projections merge #8438 (mustafasrepo)
- Fix PartialOrd for ScalarValue::List/FixSizeList/LargeList #8253 (jayzhan211)
- Support parquet_metadata for datafusion-cli #8413 (Veeupup)
- Fix bug in optimizing a nested count #8459 (Dandandan)
- Bump actions/setup-python from 4 to 5 #8449 (dependabot[bot])
- fix: ORDER BY window definition should work on null literal #8444 (viirya)
- flx clippy warnings #8455 (waynexia)
- fix: RANGE frame for corner cases with empty ORDER BY clause should be treated as constant sort #8445 (viirya)
- Preserve `dict_id` on `Field` during serde roundtrip #8457 (avantgardnerio)
- feat: support `InterleaveExecNode` in the proto #8460 (liukun4515)
- [BUG FIX]: Proper Empty Batch handling in window execution #8466 (mustafasrepo)
- Minor: update `cast` #8458 (Weijun-H)
- fix: don't unifies projection if expr is non-trival #8454 (haohuaijin)
- Minor: Add new bloom filter predicate tests #8433 (alamb)
- Add PRIMARY KEY Aggregate support to dataframe API #8356 (mustafasrepo)
- Minor: refactor `data_trunc` to reduce duplicated code #8430 (Weijun-H)
- Support array_distinct function. #8268 (my-vegetable-has-exploded)
- Add primary key support to stream table #8467 (mustafasrepo)
- Add `evaluate_demo` and `range_analysis_demo` to Expr examples #8377 (alamb)
- Minor: fix function name typo #8473 (Weijun-H)
- Minor: Fix comment typo in table.rs: s/indentical/identical/ #8469 (KeunwooLee-at)
- Remove `define_array_slice` and reuse `array_slice` for `array_pop_front/back` #8401 (jayzhan211)
- Minor: refactor `trim` to clean up duplicated code #8434 (Weijun-H)
- Split `EmptyExec` into `PlaceholderRowExec` #8446 (razeghi71)
- Enable non-uniform field type for structs created in DataFusion #8463 (dlovell)
- Minor: Add multi ordering test for array agg order #8439 (jayzhan211)
- Sort filenames when reading parquet to ensure consistent schema #6629 (thomas-k-cameron)
- Minor: Improve comments in EnforceDistribution tests #8474 (alamb)
- fix: support uppercase when parsing `Interval` #8478 (QuenKar)
- Better Equivalence (ordering and exact equivalence) Propagation through ProjectionExec #8484 (mustafasrepo)
- Add `today` alias for `current_date` #8423 (smallzhongfeng)
- Minor: remove useless clone in `array_expression` #8495 (Weijun-H)
- fix: incorrect set preserve_partitioning in SortExec #8485 (haohuaijin)
- Explicitly mark parquet for tests in datafusion-common #8497 (Dennis40816)
- Minor/Doc: Clarify DataFrame::write_table Documentation #8519 (devinjdangelo)
- fix: Pull stats in `IdentVisitor`/`GraphvizVisitor` only when requested #8514 (vrongmeal)
- Change display of RepartitionExec from SortPreservingRepartitionExec to RepartitionExec preserve_order=true #8521 (JacobOgle)
- Fix `DataFrame::cache` errors with `Plan("Mismatch between schema and batches")` #8510 (Asura7969)
- Minor: update pbjson_dependency #8470 (alamb)
- Minor: Update prost-derive dependency #8471 (alamb)
- Minor/Doc: Add DataFrame::write_table to DataFrame user guide #8527 (devinjdangelo)
- Minor: Add repartition_file.slt end to end test for repartitioning files, and supporting tweaks #8505 (alamb)
- Prepare version 34.0.0 #8508 (andygrove)
- refactor: use ExprBuilder to consume substrait expr and use macro to generate error #8515 (waynexia)
- [MINOR]: Make some slt tests deterministic #8525 (mustafasrepo)
- fix: volatile expressions should not be target of common subexpt elimination #8520 (viirya)
- Minor: Add LakeSoul to the list of Known Users #8536 (xuchen-plus)
- Fix regression with Incorrect results when reading parquet files with different schemas and statistics #8533 (alamb)
- feat: improve string statistics display in datafusion-cli `parquet_metadata` function #8535 (asimsedhain)
- Defer file creation to write #8539 (tustvold)
- Minor: Improve error handling in sqllogictest runner #8544 (alamb)