35.0.0 (2024-01-20)
Breaking changes:
- Minor: make SubqueryAlias::try_new take Arc #8542 (sadboy)
- Remove ListingTable and FileScanConfig Unbounded (#8540) #8573 (tustvold)
- Rename `ParamValues::{LIST -> List,MAP -> Map}` #8611 (kawadakk)
- Rename `expr::window_function::WindowFunction` to `WindowFunctionDefinition`, make structure consistent with ScalarFunction #8382 (edmondop)
- Implement `ScalarUDF` in terms of `ScalarUDFImpl` trait #8713 (alamb)
- Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` #8562 (rspears74)
- Remove unused array_expression.rs and `SUPPORTED_ARRAY_TYPES` #8807 (alamb)
- Simplify physical expression creation API (not require schema) #8823 (comphead)
- Determine causal window frames to produce early results. #8842 (mustafasrepo)
Implemented enhancements:
- feat: implement Unary Expr in substrait #8534 (waynexia)
- feat: implement Repartition plan in substrait #8526 (waynexia)
- feat: support largelist in array_slice #8561 (Weijun-H)
- feat: support `LargeList` in `array_positions` #8571 (Weijun-H)
- feat: support `LargeList` in `array_element` #8570 (Weijun-H)
- feat: support `LargeList` in `array_dims` #8592 (Weijun-H)
- feat: support `LargeList` in `array_remove` #8595 (Weijun-H)
- feat: support inlist in LiteralGurantee for pruning #8654 (my-vegetable-has-exploded)
- feat: support 'LargeList' in `array_pop_front` and `array_pop_back` #8569 (Weijun-H)
- feat: support `LargeList` in `array_position` #8714 (Weijun-H)
- feat: support `LargeList` in `array_ndims` #8716 (Weijun-H)
- feat: remove filters with null constants #8700 (asimsedhain)
- feat: support LargeList in array_repeat #8725 (Weijun-H)
- feat: native types in `DistinctCountAccumulator` for primitive types #8721 (korowa)
- feat: support `LargeList` in `cardinality` #8726 (Weijun-H)
- feat: support `largelist` in `array_to_string` #8729 (Weijun-H)
- feat: Add bloom filter metric to ParquetExec #8772 (my-vegetable-has-exploded)
- feat: support `array_resize` #8744 (Weijun-H)
- feat: add more components to the wasm-pack compatible list #8843 (waynexia)
Fixed bugs:
- fix: make sure CASE WHEN pick first true branch when WHEN clause is true #8477 (haohuaijin)
- fix: `Antarctica/Vostok` tz offset changed in chrono-tz 0.8.5 #8677 (korowa)
- fix: struct field don't push down to TableScan #8774 (haohuaijin)
- fix: failed to create ValuesExec with non-nullable schema #8776 (jonahgao)
- fix: fix markdown table in docs #8812 (tshauck)
- fix: don't extract common sub expr in `CASE WHEN` clause #8833 (haohuaijin)
Documentation updates:
- docs: update udf docs for udtf #8546 (tshauck)
- Doc: Clarify When Limit is Pushed Down to TableProvider::Scan #8686 (devinjdangelo)
- Minor: Improve `PruningPredicate` docstrings #8748 (alamb)
- Minor: Add documentation about stream cancellation #8747 (alamb)
- docs: add sudo for install commands #8804 (caicancai)
- docs: document SessionConfig #8771 (wjones127)
- Upgrade to object_store `0.9.0` and arrow `50.0.0` #8758 (tustvold)
- docs: fix wrong pushdown name & a typo #8875 (SteveLauC)
- docs: Update contributor guide with installation instructions #8876 (caicancai)
- docs: fix wrong name in sub-crates' README #8889 (SteveLauC)
- docs: add an example for RecordBatchReceiverStreamBuilder #8888 (SteveLauC)
Merged pull requests:
- Remove order_bys from AggregateExec state #8537 (mustafasrepo)
- Fix count(null) and count(distinct null) #8511 (joroKr21)
- Minor: reduce code duplication in `date_bin_impl` #8528 (Weijun-H)
- Add metrics for UnnestExec #8482 (simonvandel)
- Prepare 34.0.0-rc3 #8549 (andygrove)
- fix: make sure CASE WHEN pick first true branch when WHEN clause is true #8477 (haohuaijin)
- Minor: make SubqueryAlias::try_new take Arc #8542 (sadboy)
- Fallback on null empty value in ExprBoundaries::try_from_column #8501 (razeghi71)
- Add test for DataFrame::write_table #8531 (devinjdangelo)
- [MINOR]: Generate empty column at placeholder exec #8553 (mustafasrepo)
- Minor: Remove now dead `SUPPORTED_STRUCT_TYPES` #8480 (alamb)
- [MINOR]: Add getter methods to first and last value #8555 (mustafasrepo)
- [MINOR]: Some code changes and a new empty batch guard for SHJ #8557 (metesynnada)
- docs: update udf docs for udtf #8546 (tshauck)
- feat: implement Unary Expr in substrait #8534 (waynexia)
- Fix `compute_record_batch_statistics` wrong with `projection` #8489 (Asura7969)
- Minor: Cleanup warning in scalar.rs test #8563 (jayzhan211)
- Minor: move some invariants out of the loop #8564 (haohuaijin)
- feat: implement Repartition plan in substrait #8526 (waynexia)
- Fix sort order aware file group parallelization #8517 (alamb)
- feat: support largelist in array_slice #8561 (Weijun-H)
- minor: fix to support scalars #8559 (comphead)
- refactor: `HashJoinStream` state machine #8538 (korowa)
- Remove ListingTable and FileScanConfig Unbounded (#8540) #8573 (tustvold)
- Update substrait requirement from 0.20.0 to 0.21.0 #8574 (dependabot[bot])
- [minor]: Fix rank calculation bug when empty order by is seen #8567 (mustafasrepo)
- Add `LiteralGuarantee` on columns to extract conditions required for `PhysicalExpr` expressions to evaluate to true #8437 (alamb)
- [MINOR]: Parametrize sort-preservation tests to exercise all situations (unbounded/bounded sources and flag behavior) #8575 (mustafasrepo)
- Minor: Add some comments to scalar_udf example #8576 (alamb)
- Move Coercion for MakeArray to `coerce_arguments_for_signature` and introduce another one for ArrayAppend #8317 (jayzhan211)
- feat: support `LargeList` in `array_positions` #8571 (Weijun-H)
- feat: support `LargeList` in `array_element` #8570 (Weijun-H)
- Increase test coverage for unbounded and bounded cases #8581 (mustafasrepo)
- Port tests in `parquet.rs` to sqllogictest #8560 (hiltontj)
- Minor: avoid a copy in Expr::unalias #8588 (alamb)
- Minor: support complex expr as the arg in the ApproxPercentileCont function #8580 (liukun4515)
- Bugfix: Add functional dependency check and aggregate try_new schema #8584 (mustafasrepo)
- Remove GroupByOrderMode #8593 (ozankabak)
- Minor: replace `not-impl-err` in `array_expression` #8589 (Weijun-H)
- Substrait insubquery #8363 (tgujar)
- Minor: port last test from parquet.rs #8587 (alamb)
- Minor: consolidate map sqllogictest tests #8550 (alamb)
- feat: support `LargeList` in `array_dims` #8592 (Weijun-H)
- Fix regression in regenerating protobuf source #8603 (andygrove)
- Remove unbounded_input from FileSinkOptions #8605 (devinjdangelo)
- Add `arrow_err!` macros, optional backtrace to ArrowError #8586 (comphead)
- Add examples of DataFrame::write* methods without S3 dependency #8606 (devinjdangelo)
- Implement logical plan serde for CopyTo #8618 (andygrove)
- Fix InListExpr to return the correct number of rows #8601 (alamb)
- Remove ListingTable single_file option #8604 (devinjdangelo)
- feat: support `LargeList` in `array_remove` #8595 (Weijun-H)
- Rename `ParamValues::{LIST -> List,MAP -> Map}` #8611 (kawadakk)
- Support binary temporal coercion for Date64 and Timestamp types #8616 (Asura7969)
- Add new configuration item `listing_table_ignore_subdirectory` #8565 (Asura7969)
- Optimize the parameter types of `ParamValues`'s methods #8613 (kawadakk)
- Do not panic on zero placeholders in `ParamValues::get_placeholders_with_values` #8615 (kawadakk)
- Fix #8507: Non-null sub-field on nullable struct-field has wrong nullity #8623 (marvinlanhenke)
- Implement `contained` API in PruningPredicate #8440 (alamb)
- Add partial serde support for ParquetWriterOptions #8627 (andygrove)
- Minor: add arguments length check in `array_expressions` #8622 (Weijun-H)
- Minor: improve dataframe functional dependency tests #8630 (alamb)
- Improve regexp_match performance by avoiding cloning Regex #8631 (viirya)
- Minor: improve `listing_table_ignore_subdirectory` config documentation #8634 (alamb)
- Support Writing Arrow files #8608 (devinjdangelo)
- Filter pushdown into cross join #8626 (mustafasrepo)
- [MINOR] Remove duplicate test utility and move one utility function for better organization #8652 (metesynnada)
- [MINOR]: Add new test for filter pushdown into cross join #8648 (mustafasrepo)
- Rewrite bloom filters to use contains API #8442 (alamb)
- Split equivalence code into smaller modules. #8649 (tushushu)
- Move parquet_schema.rs from sql to parquet tests #8644 (alamb)
- Fix group by aliased expression in LogicalPLanBuilder::aggregate #8629 (alamb)
- Refactor `array_union` and `array_intersect` functions to one general function #8516 (Weijun-H)
- Minor: avoid extra clone in datafusion-proto::physical_plan #8650 (ongchi)
- Minor: name some constant values in arrow writer, parquet writer #8642 (alamb)
- TreeNode Refactor Part 2 #8653 (berkaysynnada)
- feat: support inlist in LiteralGurantee for pruning #8654 (my-vegetable-has-exploded)
- Streaming CLI support #8651 (berkaysynnada)
- Add serde support for CSV FileTypeWriterOptions #8641 (andygrove)
- Add trait based ScalarUDF API #8578 (alamb)
- Handle ordering of first last aggregation inside aggregator #8662 (mustafasrepo)
- feat: support 'LargeList' in `array_pop_front` and `array_pop_back` #8569 (Weijun-H)
- chore: rename ceresdb to apache horaedb #8674 (tanruixiang)
- Minor: clean up code #8671 (Weijun-H)
- fix: `Antarctica/Vostok` tz offset changed in chrono-tz 0.8.5 #8677 (korowa)
- Make the BatchSerializer behind Arc to avoid unnecessary struct creation #8666 (metesynnada)
- Implement serde for CSV and Parquet FileSinkExec #8646 (andygrove)
- [pruning] Add shortcut when all units have been pruned #8675 (Ted-Jiang)
- Change first/last implementation to prevent redundant comparisons when data is already sorted #8678 (mustafasrepo)
- minor: remove useless conversion #8684 (comphead)
- refactor: modified `JoinHashMap` build order for `HashJoinStream` #8658 (korowa)
- Start setting up tpch planning benchmarks #8665 (matthewmturner)
- Doc: Clarify When Limit is Pushed Down to TableProvider::Scan #8686 (devinjdangelo)
- Closes #8502: Parallel NDJSON file reading #8659 (marvinlanhenke)
- Improve `array_prepend` signature for null and empty array #8625 (jayzhan211)
- Cleanup TreeNode implementations #8672 (viirya)
- Update sqlparser requirement from 0.40.0 to 0.41.0 #8647 (dependabot[bot])
- Update scalar functions doc for extract/datepart #8682 (Jefffrey)
- Remove DescribeTableStmt in parser in favour of existing functionality from sqlparser-rs #8703 (Jefffrey)
- Simplify `NULL [NOT] IN (..)` expressions #8691 (asimsedhain)
- Rename `expr::window_function::WindowFunction` to `WindowFunctionDefinition`, make structure consistent with ScalarFunction #8382 (edmondop)
- Deprecate duplicate function `LogicalPlan::with_new_inputs` #8707 (viirya)
- Minor: refactor bloom filter tests to reduce duplication #8435 (alamb)
- Minor: clean up code based on `Clippy` #8715 (Weijun-H)
- Minor: Unbounded Output of AnalyzeExec #8717 (berkaysynnada)
- feat: support `LargeList` in `array_position` #8714 (Weijun-H)
- feat: support `LargeList` in `array_ndims` #8716 (Weijun-H)
- feat: remove filters with null constants #8700 (asimsedhain)
- support `LargeList` in `array_prepend` and `array_append` #8679 (Weijun-H)
- Support for `extract(epoch from date)` for Date32 and Date64 #8695 (Jefffrey)
- Implement trait based API for defining WindowUDF #8719 (guojidan)
- Minor: Introduce utils::hash for StructArray #8552 (jayzhan211)
- [CI] Improve windows machine CI test time #8730 (comphead)
- fix guarantees in allways_true of PruningPredicate #8732 (my-vegetable-has-exploded)
- Minor: Avoid memory copy in construct window exprs #8718 (Ted-Jiang)
- feat: support LargeList in array_repeat #8725 (Weijun-H)
- Minor: Ctrl+C Termination in CLI #8739 (berkaysynnada)
- Add support for functional dependency for ROW_NUMBER window function. #8737 (mustafasrepo)
- Minor: reduce code duplication in PruningPredicate test #8441 (alamb)
- feat: native types in `DistinctCountAccumulator` for primitive types #8721 (korowa)
- [MINOR]: Add a test case for when target partition is 1, no hash repartition is added to the plan. #8757 (mustafasrepo)
- Minor: Improve `PruningPredicate` docstrings #8748 (alamb)
- feat: support `LargeList` in `cardinality` #8726 (Weijun-H)
- Add reproducer for #8738 #8750 (alamb)
- Minor: Use faster check for column name in schema merge #8765 (matthewmturner)
- Minor: Add documentation about stream cancellation #8747 (alamb)
- Move `repartition_file_scans` out of `enable_round_robin` check in `EnforceDistribution` rule #8731 (viirya)
- Clean internal implementation of WindowUDF #8746 (guojidan)
- feat: support `largelist` in `array_to_string` #8729 (Weijun-H)
- [MINOR] CLI error handling on streaming use cases #8761 (metesynnada)
- Convert Binary Operator `StringConcat` to Function for `array_concat`, `array_append` and `array_prepend` #8636 (jayzhan211)
- Minor: Fix incorrect indices for hashing struct #8775 (jayzhan211)
- Minor: Improve library docs to mention TreeNode, ExprSimplifier, PruningPredicate and cp_solver #8749 (alamb)
- [MINOR] Add logo source files #8762 (andygrove)
- Add Apache attribution to site footer #8760 (alamb)
- ci: speed up win64 test #8728 (Jefffrey)
- Add `schema_err!` error macros with optional backtrace #8620 (comphead)
- Fix regression by reverting Materialize dictionaries in group keys #8740 (alamb)
- fix: struct field don't push down to TableScan #8774 (haohuaijin)
- Implement `ScalarUDF` in terms of `ScalarUDFImpl` trait #8713 (alamb)
- Minor: Fix error messages in array expressions #8781 (Weijun-H)
- Move tests from `expr.rs` to sqllogictests. Part1 #8773 (comphead)
- Permit running `sqllogictest` as a rust test in IDEs (+ use clap for sqllogicttest parsing, accept (and ignore) rust test harness arguments) #8288 (alamb)
- Minor: Use standard tree walk in Projection Pushdown #8787 (alamb)
- Implement trait based API for define AggregateUDF #8733 (guojidan)
- Minor: Improve `DataFusionError` documentation #8792 (alamb)
- fix: failed to create ValuesExec with non-nullable schema #8776 (jonahgao)
- Update substrait requirement from 0.21.0 to 0.22.1 #8796 (dependabot[bot])
- Bump follow-redirects from 1.15.3 to 1.15.4 in /datafusion/wasmtest/datafusion-wasm-app #8798 (dependabot[bot])
- Minor: array_pop_first should be array_pop_front in documentation #8797 (ongchi)
- feat: Add bloom filter metric to ParquetExec #8772 (my-vegetable-has-exploded)
- Add note on using larger row group size #8745 (twitu)
- Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` #8562 (rspears74)
- fix: fix markdown table in docs #8812 (tshauck)
- docs: add sudo for install commands #8804 (caicancai)
- Standardize `CompressionTypeVariant` encoding in protobuf #8785 (tushushu)
- Make benefits_from_input_partitioning Default in SHJ #8801 (metesynnada)
- refactor: standardize exec_from funcs arg order #8809 (tshauck)
- [Minor] extract const and add doc and more tests for in_list pruning #8815 (Ted-Jiang)
- [MINOR]: Add size check for aggregate #8813 (mustafasrepo)
- Minor: chores: Update clippy in pre-commit.sh #8810 (my-vegetable-has-exploded)
- Cleanup the usage of round-robin repartitioning #8794 (viirya)
- Implement monotonicity for ScalarUDF #8799 (guojidan)
- Remove unused array_expression.rs and `SUPPORTED_ARRAY_TYPES` #8807 (alamb)
- feat: support `array_resize` #8744 (Weijun-H)
- Minor: typo in `arrays.slt` #8831 (Weijun-H)
- docs: document SessionConfig #8771 (wjones127)
- Minor: Improve `datafusion-proto` documentation #8822 (alamb)
- [CI] Refactor CI builders #8826 (comphead)
- Serialize function signature simplifications #8802 (metesynnada)
- Port tests in `group_by.rs` to sqllogictest #8834 (hiltontj)
- Simplify physical expression creation API (not require schema) #8823 (comphead)
- feat: add more components to the wasm-pack compatible list #8843 (waynexia)
- Port tests in timestamp.rs to sqllogictest. Part 1 #8818 (caicancai)
- Upgrade to object_store `0.9.0` and arrow `50.0.0` #8758 (tustvold)
- Fix ApproxPercentileCont signature #8825 (joroKr21)
- Minor: Update `with_column_rename` method doc #8858 (comphead)
- Minor: Document `parquet_metadata` function #8852 (alamb)
- Speedup new_with_metadata by removing sort #8855 (simonvandel)
- Minor: fix wrong function call #8847 (Weijun-H)
- Add options of parquet bloom filter and page index in Session config #8869 (Ted-Jiang)
- Port tests in timestamp.rs to sqllogictest #8859 (caicancai)
- test: Port `order.rs` tests to sqllogictest #8857 (simicd)
- Determine causal window frames to produce early results. #8842 (mustafasrepo)
- docs: fix wrong pushdown name & a typo #8875 (SteveLauC)
- fix: don't extract common sub expr in `CASE WHEN` clause #8833 (haohuaijin)
- Add "Extended" clickbench queries #8861 (alamb)
- Change cli to propagate error to exit code #8856 (tshauck)
- test: Port tests in `predicates.rs` to sqllogictest #8879 (simicd)
- docs: Update contributor guide with installation instructions #8876 (caicancai)
- Minor: add tests for casts between nested `List` and `LargeList` #8882 (Weijun-H)
- Disable Parallel Parquet Writer by Default, Improve Writing Test Coverage #8854 (devinjdangelo)
- Support for order sensitive `NTH_VALUE` aggregation, make reverse `ARRAY_AGG` more efficient #8841 (mustafasrepo)
- test: Port tests in `csv_files.rs` to sqllogictest #8885 (simicd)
- test: Port tests in `references.rs` to sqllogictest #8877 (simicd)
- fix bug with `to_timestamp` and `InitCap` logical serialization, add roundtrip test between expression and proto, #8868 (Weijun-H)
- Support `LargeListArray` scalar values and `align_array_dimensions` #8881 (Weijun-H)
- refactor: rename FileStream.file_reader to file_opener & update doc #8883 (SteveLauC)
- docs: fix wrong name in sub-crates' README #8889 (SteveLauC)
- Recursive CTEs: Stage 1 - add config flag #8828 (matthewgapp)
- Support array literal with scalar function #8884 (jayzhan211)
- Bump actions/cache from 3 to 4 #8903 (dependabot[bot])
- Fix `datafusion-cli` print output #8895 (alamb)
- docs: add an example for RecordBatchReceiverStreamBuilder #8888 (SteveLauC)
- Fix "Projection references non-aggregate values" by updating `rebase_expr` to use `transform_down` #8890 (wizardxz)
- Add serde support for Arrow FileTypeWriterOptions #8850 (tushushu)
- Improve `datafusion-cli` print format tests #8896 (alamb)
- Recursive CTEs: Stage 2 - add support for sql -> logical plan generation #8839 (matthewgapp)
- Minor: remove null in `array-append` and `array-prepend` #8901 (Weijun-H)
- Add support for FixedSizeList type in `arrow_cast`, hashing #8344 (Weijun-H)
- aggregate_statistics should only optimize MIN/MAX when relation is not empty #8914 (viirya)
- support to_timestamp with optional chrono formats #8886 (Omega359)
- Minor: Document third argument of `date_bin` as optional and default value #8912 (alamb)
- Minor: distinguish parquet row group pruning type in unit test #8921 (Ted-Jiang)