33.0.0 (2023-11-12)
Breaking changes:
- Refactor Statistics, introduce precision estimates (`Exact`, `Inexact`, `Absent`) #7793 (berkaysynnada)
- Remove redundant unwrap in `ScalarValue::new_primitive`, return a `Result` #7830 (maruschin)
- Add `parquet` feature flag, enabled by default, and make parquet conditional #7745 (ongchi)
- Change input for `to_timestamp` function to be seconds rather than nanoseconds, add `to_timestamp_nanos` #7844 (comphead)
- Percent Decode URL Paths (#8009) #8012 (tustvold)
- chore: remove panics in datafusion-common::scalar by making more operations return `Result` #7901 (junjunjd)
- Combine `Expr::Wildcard` and `Expr::QualifiedWildcard`, add `wildcard()` expr fn #8105 (alamb)
Performance related:
- Add distinct union optimization #7788 (maruschin)
- Fix join order for TPCH Q17 & Q18 by improving FilterExec statistics #8126 (andygrove)
- feat: add column statistics into explain #8112 (NGA-TRAN)
Implemented enhancements:
- Support InsertInto Sorted ListingTable #7743 (devinjdangelo)
- External Table Primary key support #7755 (mustafasrepo)
- add interval arithmetic for timestamp types #7758 (mhilton)
- Interval Arithmetic NegativeExpr Support #7804 (berkaysynnada)
- Exactness Indicator of Parameters: Precision #7809 (berkaysynnada)
- Implement GetIndexedField for map-typed columns #7825 (swgillespie)
- Fix precision loss when coercing date_part utf8 argument #7846 (Dandandan)
- Support `Binary`/`LargeBinary` --> `Utf8`/`LargeUtf8` in ilike and string functions #7840 (alamb)
- Support Decimal256 on AVG aggregate expression #7853 (viirya)
- Support Decimal256 column in create external table #7866 (viirya)
- Support Decimal256 in Min/Max aggregate expressions #7881 (viirya)
- Implement Hive-Style Partitioned Write Support #7801 (devinjdangelo)
- feat: support `Decimal256` for the `abs` function #7904 (jonahgao)
- Parallelize Serialization of Columns within Parquet RowGroups #7655 (devinjdangelo)
- feat: Use bloom filter when reading parquet to skip row groups #7821 (hengfeiyang)
- Support Partitioning Data by Dictionary Encoded String Array Types #7896 (devinjdangelo)
- Read only enough bytes to infer Arrow IPC file schema via stream #7962 (Jefffrey)
- feat: Support determining extensions from names like `foo.parquet.snappy` as well as `foo.parquet` #7972 (Weijun-H)
- feat: Protobuf serde for Json file sink #8062 (Jefffrey)
- feat: support target table alias in update statement #8080 (jonahgao)
- feat: support UDAF in substrait producer/consumer #8119 (waynexia)
Fixed bugs:
- fix: preserve column qualifier for `DataFrame::with_column` #7792 (jonahgao)
- fix: don't push down volatile predicates in projection #7909 (haohuaijin)
- fix: generate logical plan for `UPDATE SET FROM` statement #7984 (jonahgao)
- fix: single_distinct_aggretation_to_group_by fail #7997 (haohuaijin)
- fix: clippy warnings from nightly rust 1.75 #8025 (waynexia)
- fix: DataFusion suggests invalid functions #8083 (jonahgao)
- fix: add encode/decode to protobuf encoding #8089 (Syleechan)
Documentation updates:
- Minor: Improve TableProvider document, and add ascii art #7759 (alamb)
- Expose arrow-schema `serde` crate feature flag #7829 (lewiszlw)
- doc: fix ExecutionContext to SessionContext in custom-table-providers.md #7903 (ZENOTME)
- Minor: Document `parquet` crate feature #7927 (alamb)
- Add some initial content about creating logical plans #7952 (andygrove)
- Minor: Add implementation examples to ExecutionPlan::execute #8013 (tustvold)
- Minor: Improve documentation for Filter Pushdown #8023 (alamb)
- Minor: Improve `ExecutionPlan` documentation #8019 (alamb)
- Improve comments for `PartitionSearchMode` struct #8047 (ozankabak)
- Prepare 33.0.0 Release #8057 (andygrove)
- Improve documentation for calculate_prune_length method in `SymmetricHashJoin` #8125 (Asura7969)
- docs: show creation of DFSchema #8132 (wjones127)
- Improve documentation site to make it easier to find communication on Slack/Discord #8138 (alamb)
Merged pull requests:
- Minor: Improve TableProvider document, and add ascii art #7759 (alamb)
- Prepare 32.0.0 Release #7769 (andygrove)
- Minor: Change all file links to GitHub in document #7768 (ongchi)
- Minor: Improve `PruningPredicate` documentation #7738 (alamb)
- Support InsertInto Sorted ListingTable #7743 (devinjdangelo)
- Minor: improve documentation to `stagger_batch` #7754 (alamb)
- External Table Primary key support #7755 (mustafasrepo)
- Minor: Build array_array() with ListArray construction instead of ArrayData #7780 (jayzhan211)
- Minor: Remove unnecessary `#[cfg(feature = "avro")]` #7773 (sarutak)
- add interval arithmetic for timestamp types #7758 (mhilton)
- Minor: make tests deterministic #7771 (Weijun-H)
- Minor: Improve `Interval` Docs #7782 (alamb)
- `DataSink` additions #7778 (Dandandan)
- Update substrait requirement from 0.15.0 to 0.16.0 #7783 (dependabot[bot])
- Move nested union optimization from plan builder to logical optimizer #7695 (maruschin)
- Minor: comments that explain the schema used in simply_expressions #7747 (alamb)
- Update regex-syntax requirement from 0.7.1 to 0.8.0 #7784 (dependabot[bot])
- Minor: Add sql test for `UNION`/`UNION ALL` + plans #7787 (alamb)
- fix: preserve column qualifier for `DataFrame::with_column` #7792 (jonahgao)
- Interval Arithmetic NegativeExpr Support #7804 (berkaysynnada)
- Exactness Indicator of Parameters: Precision #7809 (berkaysynnada)
- add `LogicalPlanBuilder::join_on` #7805 (haohuaijin)
- Fix SortPreservingRepartition with no existing ordering. #7811 (mustafasrepo)
- Update zstd requirement from 0.12 to 0.13 #7806 (dependabot[bot])
- [Minor]: Remove input_schema field from window executor #7810 (mustafasrepo)
- refactor(7181): move streaming_merge() into separate mod from the merge node #7799 (wiedld)
- Improve update error #7777 (lewiszlw)
- Minor: Update LogicalPlan::join_on API, use it more #7814 (alamb)
- Add distinct union optimization #7788 (maruschin)
- Make CI fail on any occurrence of rust-tomlfmt failed #7774 (ongchi)
- Encode all join conditions in a single expression field #7612 (nseekhao)
- Update substrait requirement from 0.16.0 to 0.17.0 #7808 (dependabot[bot])
- Minor: include `sort` expressions in `SortPreservingRepartitionExec` explain plan #7796 (alamb)
- minor: add more document to Wildcard expr #7822 (waynexia)
- Minor: Move `Monotonicity` to `expr` crate #7820 (2010YOUY01)
- Use code block for better formatting of rustdoc for PhysicalGroupBy #7823 (qrilka)
- Update explain plan to show `TopK` operator #7826 (haohuaijin)
- Extract ReceiverStreamBuilder #7817 (tustvold)
- Extend backtrace coverage for `DatafusionError::Plan` errors #7803 (comphead)
- Add documentation and usability for prepared parameters #7785 (alamb)
- Implement GetIndexedField for map-typed columns #7825 (swgillespie)
- Minor: Assert `streaming_merge` has non empty sort exprs #7795 (alamb)
- Minor: Upgrade docs for `PhysicalExpr::{propagate_constraints, evaluate_bounds}` #7812 (alamb)
- Change ScalarValue::List to store ArrayRef #7629 (jayzhan211)
- [MINOR]: Do not introduce unnecessary repartition when row count is 1. #7832 (mustafasrepo)
- Minor: Add tests for binary / utf8 coercion #7839 (alamb)
- Avoid panics on error while encoding/decoding ListValue::Array as protobuf #7837 (alamb)
- Refactor Statistics, introduce precision estimates (`Exact`, `Inexact`, `Absent`) #7793 (berkaysynnada)
- Remove redundant unwrap in `ScalarValue::new_primitive`, return a `Result` #7830 (maruschin)
- Fix precision loss when coercing date_part utf8 argument #7846 (Dandandan)
- Add operator section to user guide, Add `std::ops` operations to `prelude`, and add `not()` expr_fn #7732 (ongchi)
- Expose arrow-schema `serde` crate feature flag #7829 (lewiszlw)
- Improve `ContextProvider` naming: rename `get_table_provider` --> `get_table_source`, deprecate `get_table_provider` #7831 (lewiszlw)
- DataSink Dynamic Execution Time Demux #7791 (devinjdangelo)
- Add small column on empty projection #7833 (ch-sc)
- feat(7849): coerce TIMESTAMP to TIMESTAMPTZ #7850 (mhilton)
- Support `Binary`/`LargeBinary` --> `Utf8`/`LargeUtf8` in ilike and string functions #7840 (alamb)
- Minor: fix typo in comments #7856 (haohuaijin)
- Minor: improve `join`/`join_on` docs #7813 (alamb)
- Support Decimal256 on AVG aggregate expression #7853 (viirya)
- Minor: fix typo in comments #7861 (alamb)
- Minor: fix typo in GreedyMemoryPool documentation #7864 (avh4)
- Minor: fix multiple typos #7863 (Smoothieewastaken)
- Minor: Fix docstring typos #7873 (alamb)
- Add CursorValues Decoupling Cursor Data from Cursor Position #7855 (tustvold)
- Support Decimal256 column in create external table #7866 (viirya)
- Support Decimal256 in Min/Max aggregate expressions #7881 (viirya)
- Implement Hive-Style Partitioned Write Support #7801 (devinjdangelo)
- Minor: fix config typo #7874 (alamb)
- Add Decimal256 sqllogictests for SUM, MEDIAN and COUNT aggregate expressions #7889 (viirya)
- [test] add fuzz test for topk #7772 (Tangruilin)
- Allow Setting Minimum Parallelism with RowCount Based Demuxer #7841 (devinjdangelo)
- Drop single quotes to make warnings for parquet options not confusing #7902 (qrilka)
- Add multi-column topk fuzz tests #7898 (alamb)
- Change `FileScanConfig.table_partition_cols` from `(String, DataType)` to `Field`s #7890 (NGA-TRAN)
- Maintain time zone in `ScalarValue::new_list` #7899 (Dandandan)
- [MINOR]: Move joinside struct to common #7908 (mustafasrepo)
- doc: fix ExecutionContext to SessionContext in custom-table-providers.md #7903 (ZENOTME)
- Update arrow 48.0.0 #7854 (tustvold)
- feat: support `Decimal256` for the `abs` function #7904 (jonahgao)
- [MINOR] Simplify Aggregate, and Projection output_partitioning implementation #7907 (mustafasrepo)
- Bump actions/setup-node from 3 to 4 #7915 (dependabot[bot])
- [Bug Fix]: Fix bug, first last reverse #7914 (mustafasrepo)
- Minor: provide default implementation for ExecutionPlan::statistics #7911 (alamb)
- Update substrait requirement from 0.17.0 to 0.18.0 #7916 (dependabot[bot])
- Minor: Remove unnecessary clone in datafusion_proto #7921 (ongchi)
- [MINOR]: Simplify code, change requirement from PhysicalSortExpr to PhysicalSortRequirement #7913 (mustafasrepo)
- [Minor] Move combine_join util to under equivalence.rs #7917 (mustafasrepo)
- support scan empty projection #7920 (haohuaijin)
- Cleanup logical optimizer rules. #7919 (mustafasrepo)
- Parallelize Serialization of Columns within Parquet RowGroups #7655 (devinjdangelo)
- feat: Use bloom filter when reading parquet to skip row groups #7821 (hengfeiyang)
- fix: don't push down volatile predicates in projection #7909 (haohuaijin)
- Add `parquet` feature flag, enabled by default, and make parquet conditional #7745 (ongchi)
- [MINOR]: Simplify enforce_distribution, minor changes #7924 (mustafasrepo)
- Add simple window query to sqllogictest #7928 (Jefffrey)
- ci: upgrade node to version 20 #7918 (crepererum)
- Change input for `to_timestamp` function to be seconds rather than nanoseconds, add `to_timestamp_nanos` #7844 (comphead)
- Minor: Document `parquet` crate feature #7927 (alamb)
- Minor: reduce some `#cfg(feature = "parquet")` #7929 (alamb)
- Minor: reduce use of `#cfg(feature = "parquet")` in tests #7930 (alamb)
- Fix CI failures on `to_timestamp()` calls #7941 (comphead)
- minor: add a datatype casting for the updated value #7922 (jonahgao)
- Minor: add `avro` feature in datafusion-examples to make `avro_sql` run #7946 (haohuaijin)
- Add simple exclude all columns test to sqllogictest #7945 (Jefffrey)
- Support Partitioning Data by Dictionary Encoded String Array Types #7896 (devinjdangelo)
- Minor: Remove array() in array_expression #7961 (jayzhan211)
- Minor: simplify update code #7943 (alamb)
- Add some initial content about creating logical plans #7952 (andygrove)
- Minor: Change from `&mut SessionContext` to `&SessionContext` in substrait #7965 (my-vegetable-has-exploded)
- Fix crate READMEs #7964 (Jefffrey)
- Minor: Improve `HashJoinExec` documentation #7953 (alamb)
- chore: clean useless clone based on clippy #7973 (Weijun-H)
- Add README.md to `core`, `execution` and `physical-plan` crates #7970 (alamb)
- Move source repartitioning into `ExecutionPlan::repartition` #7936 (alamb)
- minor: fix broken links in README.md #7986 (jonahgao)
- Minor: Update the `sqllogictest` crate README #7971 (alamb)
- Improve MemoryCatalogProvider default impl block placement #7975 (lewiszlw)
- Fix `ScalarValue` handling of NULL values for ListArray #7969 (viirya)
- Refactor of Ordering and Prunability Traversals and States #7985 (berkaysynnada)
- Keep output as scalar for scalar function if all inputs are scalar #7967 (viirya)
- Fix crate READMEs for core, execution, physical-plan #7990 (Jefffrey)
- Update sqlparser requirement from 0.38.0 to 0.39.0 #7983 (jackwener)
- Fix panic in multiple distinct aggregates by fixing `ScalarValue::new_list` #7989 (alamb)
- Minor: Add `MemoryReservation::consumer` getter #8000 (milenkovicm)
- fix: generate logical plan for `UPDATE SET FROM` statement #7984 (jonahgao)
- Create temporary files for reading or writing #8005 (smallzhongfeng)
- Minor: fix comment on SortExec::with_fetch method #8011 (westonpace)
- Fix: dataframe_subquery example Optimizer rule `common_sub_expression_eliminate` failed #8016 (smallzhongfeng)
- Percent Decode URL Paths (#8009) #8012 (tustvold)
- Minor: Extract common deps into workspace #7982 (lewiszlw)
- minor: change some plan_err to exec_err #7996 (waynexia)
- Minor: error on unsupported RESPECT NULLs syntax #7998 (alamb)
- Break GroupedHashAggregateStream spill batch into smaller chunks #8004 (milenkovicm)
- Minor: Add implementation examples to ExecutionPlan::execute #8013 (tustvold)
- Minor: Extend wrap_into_list_array to accept multiple args #7993 (jayzhan211)
- GroupedHashAggregateStream should register spillable consumer #8002 (milenkovicm)
- fix: single_distinct_aggretation_to_group_by fail #7997 (haohuaijin)
- Read only enough bytes to infer Arrow IPC file schema via stream #7962 (Jefffrey)
- Minor: remove a strange char #8030 (haohuaijin)
- Minor: Improve documentation for Filter Pushdown #8023 (alamb)
- Minor: Improve `ExecutionPlan` documentation #8019 (alamb)
- fix: clippy warnings from nightly rust 1.75 #8025 (waynexia)
- Minor: Avoid recomputing compute_array_ndims in align_array_dimensions #7963 (jayzhan211)
- Minor: fix doc and fmt CI check #8037 (alamb)
- Minor: remove unnecessary #cfg test #8036 (alamb)
- Minor: Improve documentation for `PartitionStream` and `StreamingTableExec` #8035 (alamb)
- Combine Equivalence and Ordering equivalence to simplify state #8006 (mustafasrepo)
- Encapsulate `ProjectionMapping` as a struct #8033 (alamb)
- Minor: Fix bugs in docs for `to_timestamp`, `to_timestamp_seconds`, ... #8040 (alamb)
- Improve comments for `PartitionSearchMode` struct #8047 (ozankabak)
- General approach for Array replace #8050 (jayzhan211)
- Minor: Remove the irrelevant note from the Expression API doc #8053 (ongchi)
- Minor: Add more documentation about Partitioning #8022 (alamb)
- Minor: improve documentation for IsNotNull, DISTINCT, etc #8052 (alamb)
- Prepare 33.0.0 Release #8057 (andygrove)
- Minor: improve error message by adding types to message #8065 (alamb)
- Minor: Remove redundant BuiltinScalarFunction::supports_zero_argument() #8059 (2010YOUY01)
- Add example to ci #8060 (smallzhongfeng)
- Update substrait requirement from 0.18.0 to 0.19.0 #8076 (dependabot[bot])
- Fix incorrect results in COUNT(*) queries with LIMIT #8049 (msirek)
- feat: Support determining extensions from names like `foo.parquet.snappy` as well as `foo.parquet` #7972 (Weijun-H)
- Use FairSpillPool for TaskContext with spillable config #8072 (viirya)
- Minor: Improve HashJoinStream docstrings #8070 (alamb)
- Fixing broken link #8085 (edmondop)
- fix: DataFusion suggests invalid functions #8083 (jonahgao)
- Replace macro with function for `array_repeat` #8071 (jayzhan211)
- Minor: remove unnecessary projection in `single_distinct_to_group_by` rule #8061 (haohuaijin)
- minor: Remove duplicate version numbers for arrow, object_store, and parquet dependencies #8095 (andygrove)
- fix: add encode/decode to protobuf encoding #8089 (Syleechan)
- feat: Protobuf serde for Json file sink #8062 (Jefffrey)
- Minor: use `Expr::alias` in a few places to make the code more concise #8097 (alamb)
- Minor: Cleanup BuiltinScalarFunction::return_type() #8088 (2010YOUY01)
- Update sqllogictest requirement from 0.17.0 to 0.18.0 #8102 (dependabot[bot])
- Projection Pushdown in PhysicalPlan #8073 (berkaysynnada)
- Push limit into aggregation for DISTINCT ... LIMIT queries #8038 (msirek)
- Bug-fix in Filter and Limit statistics #8094 (berkaysynnada)
- feat: support target table alias in update statement #8080 (jonahgao)
- Minor: Simplify downcast functions in cast.rs. #8103 (Weijun-H)
- Fix ArrayAgg schema mismatch issue #8055 (jayzhan211)
- Minor: Support `nulls` in `array_replace`, avoid a copy #8054 (alamb)
- Minor: Improve the document format of JoinHashMap #8090 (Asura7969)
- Simplify ProjectionPushdown and make it more general #8109 (alamb)
- Minor: clean up the code regarding clippy #8122 (Weijun-H)
- Support remaining functions in protobuf serialization, add `expr_fn` for `StructFunction` #8100 (JacobOgle)
- Minor: Cleanup BuiltinScalarFunction's phys-expr creation #8114 (2010YOUY01)
- rewrite `array_append/array_prepend` to remove duplicate code #8108 (Veeupup)
- Implementation of `array_intersect` #8081 (Veeupup)
- Minor: fix ci break #8136 (haohuaijin)
- Improve documentation for calculate_prune_length method in `SymmetricHashJoin` #8125 (Asura7969)
- Minor: remove duplicated `array_replace` tests #8066 (alamb)
- Minor: Fix temporary files created but not deleted during testing #8115 (2010YOUY01)
- chore: remove panics in datafusion-common::scalar by making more operations return `Result` #7901 (junjunjd)
- Fix join order for TPCH Q17 & Q18 by improving FilterExec statistics #8126 (andygrove)
- Fix: Do not try and preserve order when there is no order to preserve in RepartitionExec #8127 (alamb)
- feat: add column statistics into explain #8112 (NGA-TRAN)
- Add substrait support for `IS NULL` and `IS NOT NULL` #8093 (tgujar)
- Combine `Expr::Wildcard` and `Expr::QualifiedWildcard`, add `wildcard()` expr fn #8105 (alamb)
- docs: show creation of DFSchema #8132 (wjones127)
- feat: support UDAF in substrait producer/consumer #8119 (waynexia)
- Improve documentation site to make it easier to find communication on Slack/Discord #8138 (alamb)