14.0.0 (2022-11-04)

Full Changelog

Breaking changes:

  • Improve FieldNotFound errors #4084 [sql] (andygrove)
  • Refactor: move simplify_expression.rs and expr_simplifier.rs to a new mod simplify_expressions #3951 (HaoYang670)
  • Support for non-u64 types for Window Bound #3916 [sql] (mustafasrepo)
  • Expose parquet reader settings using normal DataFusion ConfigOptions #3822 (alamb)
  • Add Filter::try_new with validation #3796 [sql] (andygrove)
  • Change public simplify API and add a public coerce API #3758 (alamb)

Implemented enhancements:

  • Automatically register tables if ObjectStore root is configured #4094
  • Simplify small InList expressions #4089
  • Support SET command #4067
  • add uuid() function to generate unique uuid per row #4045
  • Publish benchmark crate so that it can be used as a library in Ballista #4016
  • Add statistics methods to TableProvider trait for use in cost-based optimizations in the logical plan #3983
  • Implement current_time Function #3982
  • Implement current_date Function #3981
  • Put common code used for testing code into datafusion/test_utils.rs #3960
  • Print the configurations of ConfigOptions in an ordered way so that we can directly compare the equality of two ConfigOptions by their debug strings #3952
  • Don't make dependants install protoc #3947
  • Implement right anti join and support it in HashBuildProbeOrder #3946
  • Implement right semi join and support it in HashBuildProbeOrder #3945
  • Refactor simplify_expressions and expr_simplifier #3934
  • Implement serialization for ScalarValue::FixedSizeBinary #3928
  • Support inlining view / dataframes logical plan #3913
  • Plans with tables from TableProviderFactorys can't be serialized #3906
  • Simplify a AND a and a OR a. #3895
  • Allow configuring statistics on TPC-H benchmarks #3888
  • CI checks stuck in queued mode #3883
  • Multiple optimizer passes #3879
  • datafusion-proto does not support view table scan #3874
  • TableProviderFactories need to be async and return a Result to be useful #3866
  • Factorize common AND factors out of OR predicates to support filterPushDown as possible #3858
  • Replace concat_ws with concat when the delimiter is empty string #3857
  • Concatenate contiguous literal arguments of concat_ws when doing the expression simplification #3856
  • Partition and Sort Enforcement #3854
  • Enable mimalloc by default in benchmarks #3851
  • Add collect statistics configuration #3847
  • [SQL] - Support cache/uncache table syntax #3842
  • Filter pushdown doesn't seem to apply for filter on TPC-H Q17 #3839
  • Support pushdown multi-columns in PageIndex pruning. #3834
  • Consolidate Expr manipulation code so it is more discoverable and make it easier to use #3808
  • Leverage input array's null buffer for regex replace to optimize sparse arrays #3803
  • Improve join cardinality estimation when there is no overlap in the min/max values #3802
  • datafusion-cli up to date check is failing on master #3798
  • Optimize benchmark q2 subquery filter #3789
  • Benchmark should infer schema when running against Parquet #3776
  • Allow specialized physical functions to provide hints for the array adapter #3762
  • [User Guide] Add EXPLAIN to SQL reference #3755
  • move type coercion for agg/agg udf #3752
  • Prevent Cargo.lock for datafusion-cli being out-of-date #3744
  • Add example of expr apis including simplification and coercion #3740
  • support type coercion for ScalarFunction expr in the logical phase #3731
  • Add support for DISTINCT projections in decorrelate_where_exists #3724
  • Add type coercion rule for CONCAT and CONCAT_WS #3720
  • Expose and document a simpler public API for simplify expressions #3709
  • Expose + document the type coercion API publicly #3708
  • Concatenate contiguous literal arguments of CONCAT during the expression simplification. #3683
  • DataFusion 13.0.0 Release #3671
  • Add division by 0 rules in the expression simplification #3663
  • Compressed CSV/JSON Read #3641
  • remove type coercion for agg #3623
  • extract or clause as predicate for join rels #3577
  • Improve performance of regex_replace #3518
  • Add benchmarks for parquet queries with filter pushdown enabled #3457
  • Make type coercion rule more robust #3390
  • ViewTable::scan ignores filters and limits #3249
  • Add CREATE VIEW documentation to user guide #3211
  • Push additional parquet filtering into the parquet scan [EPIC] #3147
  • Remove core/logical_plan module #2683
  • Datafusion Optimizer Enhancement #2255
  • [Optimizer] Eliminate self compare self #2252
  • Break datafusion crate into smaller crates #1750
  • Benchmark constellation-rs/amadeus's parquet implementation #1341
  • Use parquet2 async reader in physical_plan/parquet #1058
  • Table Scan Enhancement Plan #944
  • Implement parquet page-level skipping with column index, using min/max stats #847
  • Support min/max statistics in ParquetTable and ParquetExec #537
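
Several of the entries above describe expression rewrites (#3895, #3857, #3856, #3683). The following is a minimal sketch of what those rewrites mean, written against a toy expression tree. The names `Col`, `Lit`, `BoolOp`, `Call`, `merge_literals` and `simplify` are hypothetical and exist only for this illustration; they are not DataFusion's API, and the real rules handle many more cases (NULL literals, volatile functions, type coercion).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: str  # toy model: string literals only, no NULL


@dataclass(frozen=True)
class BoolOp:
    op: str  # "AND" or "OR"
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Call:
    func: str  # "concat" or "concat_ws" in this sketch
    args: Tuple["Expr", ...]


Expr = Union[Col, Lit, BoolOp, Call]


def merge_literals(args, sep):
    """Collapse runs of adjacent literals into one literal joined by `sep`."""
    merged = []
    for arg in args:
        if merged and isinstance(arg, Lit) and isinstance(merged[-1], Lit):
            merged[-1] = Lit(merged[-1].value + sep + arg.value)
        else:
            merged.append(arg)
    return tuple(merged)


def simplify(expr):
    """Apply the toy rewrite rules bottom-up."""
    if isinstance(expr, BoolOp):
        left, right = simplify(expr.left), simplify(expr.right)
        # a AND a => a, a OR a => a (assumes `a` is deterministic)
        if left == right:
            return left
        return BoolOp(expr.op, left, right)
    if isinstance(expr, Call):
        args = tuple(simplify(a) for a in expr.args)
        if expr.func == "concat_ws" and args and isinstance(args[0], Lit):
            sep, rest = args[0].value, args[1:]
            if sep == "":
                # An empty delimiter makes concat_ws equivalent to concat.
                return simplify(Call("concat", rest))
            return Call("concat_ws", (args[0],) + merge_literals(rest, sep))
        if expr.func == "concat":
            return Call("concat", merge_literals(args, ""))
        return Call(expr.func, args)
    return expr
```

For instance, `simplify(Call("concat", (Lit("a"), Lit("b"), Col("c"))))` folds the two leading literals into a single `Lit("ab")` and leaves the column argument untouched.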

Fixed bugs:

  • Clippy failing on master #4100
  • Panic when the number of partitions of the pipeline that throws the exception is inconsistent with the number of partitions output by the query #4096
  • FieldNotFound when field is available #4083
  • SingleDistinctToGroupBy being applied too broadly #4082
  • single_distinct_to_groupby strips qualifiers from group-by expressions #4049
  • Another Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate: #4046
  • Decimal multiplied by Float produces incorrect results #4035
  • Cannot query external table - TableScan replaced with EmptyExec #4027
  • benchmark q17 produces incorrect result #4026
  • benchmark q14 produces incorrect result #4025
  • benchmark q11 producing incorrect results #4023
  • Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate:" #4006
  • Incorrect results with parquet filtering pushdown enabled #4005
  • Wrong results when parquet page index filtering is enabled #4002
  • Output schema of semi join has invalid projection added after HashBuildProbeOrder #4001
  • async deserialization functions are unintuitive and possibly insecure #3977
  • Expr::to_bytes can produce output that hits Expr::from_bytes recursion limit #3968
  • Bug on propagating arrow field metadata #3964
  • Predicate still has cast when comparing Timestamp(Nano, None) to a timestamp literal, so can't be pushed down or used for pruning #3938
  • Error using IN list on dictionary encoded data: InList does not support datatype Dictionary(Int32, Utf8). #3936
  • Internal error in CAST from Timestamp[us] #3922
  • ScalarValue not implemented for FixedSizeBinary types #3910
  • [DOC] - There are unsupported DDL in the official documentation #3904
  • datafusion-proto deserialize with Substring(str [from int] [for int]) fails #3901
  • count(Literal) gives wrong column name #3891
  • projection_push_down adds duplicate projections with multiple passes #3881
  • Default physical planner generates empty relation for DROP TABLE, CREATE MEMORY TABLE, etc #3873
  • Binary expression canonical names are incorrect in some cases #3865
  • Using the window function lag causes panic. #3830
  • chrono crate : specify 0.4.22 as the minimum version due to spurious build failures #3827
  • datafusion-proto deserialize with q16 sql fails #3820
  • Filter predicates should not be aliased #3795
  • Writing CSV does not save all lines of the DataFrame #3783
  • Regression in simplifying expressions in subqueries #3760
  • DataFusionError(Internal("The size of the sorted batch is larger than the size of the input batch: 2120 > 2312")) #3747
  • "labeler" PR check is broken #3743
  • DataFrame::select_columns doesn't work with names containing "." #3733
  • TPC-H Query 1 has regressed #3729
  • [RUST][Datafusion] What causes "Error: Execution("file size of 4 is less than footer")" error? #3800
  • Field names containing periods such as f.c cannot work #3682
  • TableProvider implementation for DataFrame does not support filter pushdown #3681
  • Using Decimal(0) makes the system panic #3665
  • Cannot query some parquet files in S3, but they work locally #3633
  • col / col returns 1 when col = 0 #3615
  • register_csv allow space in table_path #3589
  • Hardcoded u64 for WindowFrameBound fields #3571
  • docs.rs cannot build datafusion-proto crate #3538
  • Row Hash loads whole aggregation state to memory before sending #3460
  • approx_percentile_cont return wrong result when scan multi parquet files. #3140
  • User guide is incorrect regarding using CLI to register CSV files using schema inference #3001
  • Exception: Internal error, Exception: Schema error #2938
  • Version 0.6.0 Panic error during SQL execution #2738
  • wrong result when operation parquet #2044
  • Local object store accepts file:/// as base path, but LocalStore returns meta without the prefix. #1923
  • Reading nested parquet files results in index out of bounds #1383
  • - (negation) with NULL literals does not work: can't be evaluated because the expression's type is Utf8, not signed #1192
  • Inconsistent cast behavior #957
  • single_distinct_to_groupby no longer drops qualifiers #4050 [sql] (andygrove)
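
The `col / col` entry above (#3615) is a good example of why a rewrite that looks algebraically obvious can be wrong. The sketch below models SQL-style integer division in plain Python, with `None` standing in for NULL, and compares it with a naive "x / x is always 1" rewrite. The function names are hypothetical, and raising on division by zero is an assumption about the desired behavior rather than a statement of what DataFusion does for every type.

```python
from typing import Optional


def sql_divide(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Toy SQL-style integer division: NULL propagates, zero divisor is an error."""
    if a is None or b is None:
        return None
    if b == 0:
        raise ZeroDivisionError("division by zero")
    return int(a / b)  # truncate toward zero


def naive_rewrite(col: Optional[int]) -> Optional[int]:
    """The unsafe simplification: replace `col / col` with the literal 1."""
    return 1


def rewrite_is_safe_for(col: Optional[int]) -> bool:
    """True when the naive rewrite agrees with actually evaluating col / col."""
    try:
        return sql_divide(col, col) == naive_rewrite(col)
    except ZeroDivisionError:
        return False
```

The rewrite agrees with real evaluation for ordinary non-zero values but disagrees for `0` and for NULL, so an optimizer can only apply it when it knows the column is neither.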

Documentation updates:

  • Clarify in docs that Identifiers are made lower-case in SQL query #2374
  • Fix broken links in contributor guide #3956 (Jefffrey)
  • add create view explanation #3925 (retikulum)
  • Update datafusion-examples README #3814 (alamb)
  • Add Seafowl to list of projects using DataFusion #3792 (mildbyte)

Closed issues:

  • [QUESTION] How many times should be the function create_name called when executing a query? #3900
  • Improve the Expr string format #3878
  • Simplify division by zero (division by one / multiplication by zero / multiplication by one) for Decimal types as well #3643
  • InList: merge check branch #2833
  • Optimization InList: compare the float data type using OrderedFloat<T> #2831
  • Outdated section of the add function of the contribution guide #2560
  • Optimize InList implementation with native types rather than ScalarValue #2165
  • Improve testing of optimizers using EXPLAIN #1118
  • Crash on parsing sql query with Cyrillic letters #184
  • [EPIC] Support all TPC-H queries in benchmark #158
  • Implement optional second argument to ltrim and rtrim functions #144
  • Benchmark crate does not have a SIMD feature #124
  • ColumnarValue::into_array should not require batch #113
  • [Rust] Parquet data source does not support complex types #83

Merged pull requests:

  • Appease new clippy #4101 (alamb)
  • minor: Split parquet reader up into smaller modules #4099 (alamb)
  • [MINOR] Update SET in cli.md #4098 (waitingkuo)
  • fix: Scheduler panic routing errors #4097 (yukkit)
  • Automatically register tables if ObjectStore root is configured #4095 (avantgardnerio)
  • minor: Use Operator::swap #4092 (alamb)
  • Simplify small InListExpr #4090 (Dandandan)
  • Minor: Add arrow-rs ticket reference and turn some comments into docstrings #4088 (alamb)
  • Support Dictionary in InListExpr #4070 (tustvold)
  • support SET variable #4069 [sql] (waitingkuo)
  • Add in list bench #4068 (tustvold)
  • Improve Error Handling and Readability for downcasting StructArray #4061 (retikulum)
  • Build tests separately from running #4060 (alamb)
  • Simplify InListExpr ~20-70% Faster #4057 (tustvold)
  • MINOR: Print unoptimized logical plan in execute_query of tpch benchmark #4056 (viirya)
  • Minor: clean the code in eliminate_filter #4055 (HaoYang670)
  • Implement current_time scalar function #4054 (naosense)
  • Cleanup hash_utils adding support for decimal256 and f16 #4053 (tustvold)
  • Fix multicolumn parquet predicate pushdown (#4046) #4048 (tustvold)
  • Add CI checks that we can serde all benchmark queries #4047 (andygrove)
  • Enable more benchmark verification tests #4044 (andygrove)
  • Extract common parquet testing code to parquet-test-util crate #4042 (alamb)
  • add uuid() function #4041 (Jimexist)
  • Update to arrow 26, change timezones #4039 [sql] (tustvold)
  • Fix Decimal and Floating type coerce rule #4038 (viirya)
  • Reserve the literal expression of Count function #4031 [sql] (HaoYang670)
  • Implement current_date scalar function #4022 (comphead)
  • Fix predicate pushdown bugs: project columns within DatafusionArrowPredicate (#4005) (#4006) #4021 (tustvold)
  • minor: remove redundant code/TODO #4019 (jackwener)
  • Add CI check to verify that benchmark queries return the expected results #4015 (andygrove)
  • Minor: Add TODO and tracking ticket reference #4012 (alamb)
  • Add right anti join support and support it in HashBuildProbeOrder #4011 (Dandandan)
  • MINOR: Generate expected benchmark query results #4010 (andygrove)
  • Minor: remove unnecessary clippy allow #4008 (alamb)
  • Minor: Do what clippy says and clean up some code #4007 (alamb)
  • Improve Error Handling and Readability for downcasting Date32Array #4004 (retikulum)
  • Don't add projection for semi joins in HashBuildProbeOrder #4000 (Dandandan)
  • Minor: use DataType::is_nested #3995 (alamb)
  • [minor] bump prettier version #3992 (Jimexist)
  • Add parquet predicate pushdown metrics #3989 (alamb)
  • Pin datafusion-proto build dependencies #3987 (tustvold)
  • Add TableProvider.statistics method #3986 (andygrove)
  • Add Pull Request guidelines to contributor guide #3985 (alamb)
  • Update protos #3979 (tustvold)
  • Revert async changes but keep deltalake working #3978 (avantgardnerio)
  • Correctness integration test for parquet filter pushdown #3976 (alamb)
  • MINOR: Stop pretty printing batches in benchmark when there are no results #3974 (andygrove)
  • MINOR: Re-export Cast struct #3971 (andygrove)
  • fix: check recursion limit in Expr::to_bytes #3970 (crepererum)
  • [Part1] Partition and Sort Enforcement, PhysicalExpr enhancement #3969 (mingmwang)
  • Support pushdown multi-columns in PageIndex pruning. #3967 (Ted-Jiang)
  • Fix benchmarks README formatting #3966 (Jefffrey)
  • Bug fix on DFField to Field conversion: preserve metadata #3965 (metesynnada)
  • Informative Error Message for LAG and LEAD functions #3963 (mustafasrepo)
  • Minor: Add some docstrings to FileScanConfig and RuntimeEnv #3962 (alamb)
  • Move common code used for testing code into datafusion/test_utils #3961 (alamb)
  • Update minimum chrono dependency to 0.4.22 #3959 (alamb)
  • Implement right semi join and support in HashBuildProbeOrder #3958 (Dandandan)
  • Print the configurations of ConfigOptions in an ordered way so that we can directly compare the equality of two ConfigOptions by their debug strings #3953 (yahoNanJing)
  • Vendor Generated Protobuf Code (#3947) #3950 (tustvold)
  • Implement serialization for ScalarValue::FixedSizeBinary #3943 (retikulum)
  • Consolidate physical join code into datafusion/core/src/physical_plan/joins #3942 (alamb)
  • Add optimizer test for simplifying predicates on timestamps #3939 (alamb)
  • Add test for querying predicate on dictionary #3937 (alamb)
  • fix: return error for unsupported SQL #3933 (Kikkon)
  • doc: fix doc about CREATE TABLE IF NOT EXISTS #3932 (jackwener)
  • Refactor Expr::Cast to use a struct. #3931 [sql] (jackwener)
  • minor: fix some typo. #3930 (jackwener)
  • chore: update cranelift-related dependencies #3926 (xudong963)
  • Change cast error from Internal to NotImplemented #3924 (alamb)
  • Support inlining view / dataframes logical plan #3923 (Dandandan)
  • Add test for Simplify redundant predicates #3915 (src255)
  • Implement ScalarValue for FixedSizeBinary #3911 (maxburke)
  • Add serde for plans with tables from TableProviderFactorys #3907 (avantgardnerio)
  • Support filter/limit pushdown for views/dataframes #3905 (Dandandan)
  • Factorize common AND factors out of OR predicates to support filterPu… #3903 (Ted-Jiang)
  • Add Substring(str [from int] [for int]) support in datafusion-proto #3902 (r4ntix)
  • Revert "Factorize common AND factors out of OR predicates to support filterPu… (#3859)" #3897 (alamb)
  • MINOR: Add notes on Apache Reporter #3893 (andygrove)
  • Allow configuring collection of statistics during TPC-H benchmarks #3889 (isidentical)
  • Improve formatting of binary expressions #3884 [sql] (andygrove)
  • Multiple optimizer passes #3880 (andygrove)
  • [MINOR] Update docs with newly added configuration values #3877 (alamb)
  • [MINOR] Add a hint about how to resolve the Cargo.lock CI check #3876 (alamb)
  • Add LogicalPlan::ViewTable support in datafusion-proto #3875 (r4ntix)
  • Optimize the concat_ws function #3869 (HaoYang670)
  • Implement foundational filter selectivity analysis #3868 (isidentical)
  • Update TableProviderFactory trait to support real-world use-cases #3867 (avantgardnerio)
  • put subquery's equal clause into join on clauses instead of filter cl… #3862 (AssHero)
  • Factorize common AND factors out of OR predicates to support filterPu… #3859 (Ted-Jiang)
  • Enable mimalloc by default in benchmark #3853 (Dandandan)
  • Refactor Expr::Between to use a struct #3850 [sql] (b41sh)
  • Handle cardinality estimation for disjoint inner and outer joins #3848 (isidentical)
  • Add setting for statistics collection #3846 (Dandandan)
  • Update to arrow 25.0.0 #3844 [sql] (tustvold)
  • Tweak list of optimization rules #3841 (Dandandan)
  • Refactor Expr::GetIndexedField to use a struct #3838 [sql] (ygf11)
  • Infer the count of maximum distinct values from min/max #3837 (isidentical)
  • Refactor Expr::Like, Expr::ILike, Expr::SimilarTo to use a struct #3836 [sql] (b41sh)
  • Refactor Expr::BinaryExpr to use a struct #3835 [sql] (zhoudongyan)
  • update postgres version to 15 in integration test #3831 (Jimexist)
  • Fix the panic when lpad/rpad parameter is negative #3829 (ZuoTiJia)
  • MINOR: Document SHOW ALL in the users guide #3826 (alamb)
  • MINOR: Add datafusion-cli documentation on showing configuration #3825 (alamb)
  • Add/Remove Division Rules #3824 (retikulum)
  • Minor: Sort the output of SHOW ALL by config name #3823 [sql] (alamb)
  • Add precision != 0 check when making decimal type #3818 [sql] (HaoYang670)
  • Infer schema when running benchmarks against parquet #3817 (andygrove)
  • Finish removing deprecated datafusion::logical_plan module #3816 (andygrove)
  • Clarify initial example with respect to capitalization #3815 (alamb)
  • Improve expression simplification by running it twice #3811 (alamb)
  • Make expression manipulation consistent and easier to use: combine/split filter conjunction, etc #3810 (alamb)
  • Consolidate expression manipulation functions into datafusion_optimizer #3809 (alamb)
  • Optimize regexp_replace when the input is a sparse array #3804 (isidentical)
  • Stop ignoring errors when writing DataFrame to csv, parquet, json #3801 (andygrove)
  • Update datafusion-cli Cargo.lock to fix CI check on master #3799 (alamb)
  • MINOR: Benchmark regression tests #3790 (andygrove)
  • MINOR: Optimizer example and docs, deprecate Expr::name #3788 (andygrove)
  • Join cardinality computation for cost-based nested join optimizations #3787 (isidentical)
  • Optimizer now simplifies multiplication, division, and modulo when an argument is a literal Decimal zero or one #3782 (drrtuy)
  • Implement parquet page-level skipping with column index, using min/ma… #3780 (Ted-Jiang)
  • Bump actions/labeler from 4.0.1 to 4.0.2 #3779 (dependabot[bot])
  • MINOR: correct ListingOptions.try_new docs to include the enabled stat collection #3775 (isidentical)
  • Teach a negative NULL expression to return NULL instead of an error #3771 (drrtuy)
  • Add benchmarks for testing row filtering #3769 (thinkharderdev)
  • move type coercion of agg and agg_udaf to logical phase #3768 (liukun4515)
  • User Guide: Add EXPLAIN to SQL reference #3767 (unvalley)
  • Allow specialized implementations to produce hints for the array adapter #3765 (isidentical)
  • Fix optimizer regression with simplifying expressions in subquery filters #3764 (andygrove)
  • Run all datafusion-examples in CI tests #3761 (alamb)
  • MINOR: Remove deprecated module datafusion::logical_plan::plan #3759 (andygrove)
  • Refactor Expr::Case to use a struct #3757 [sql] (andygrove)
  • Do not run labeler CI check if it would fail due to permissions #3756 (alamb)
  • MINOR: Improvements to scalar_subquery_to_join error handling #3754 (andygrove)
  • Always track the final size of the in-mem sorted arrays #3753 (isidentical)
  • Fix DataFrame::select_columns to handle column names with a period #3751 (zhoudongyan)
  • Fix ListingTableUrl to decode percent #3750 (unvalley)
  • remove type coercion for physical ScalarFunction #3749 (liukun4515)
  • CI: Add a new run to check whether datafusion-cli lock file is up-to-date #3745 (isidentical)
  • Add datafusion example of expression apis #3741 (alamb)
  • fix subquery where exists distinct #3732 (b41sh)
  • Remove some unneeded code in CommonSubexprEliminate #3730 (alamb)
  • Consolidate and better tests for expression re-rewriting / aliasing #3727 (alamb)
  • Fix output schema generated by CommonSubExprEliminate #3726 (alex-natzka)
  • Add type coercion rule for concat and concat_ws #3721 (HaoYang670)
  • Expose and document a simpler public API for simplify expressions #3719 (ygf11)
  • Remove dead code in UnwrapCastExprRewriter that may mask errors #3703 (alamb)
  • Fix DataFrame::with_column to handle creating column names with a period #3700 (alamb)
  • Add simplification rules for the CONCAT function #3684 (HaoYang670)
  • Compressed CSV/JSON support #3642 [sql] (Licht-T)
  • Simplify serialization by removing redundant PrimitiveScalarValue #3612 (alamb)
  • Pushdown single column predicates from ON join clauses #3578 (AssHero)
  • Simplify the serialization of ScalarValue::List #3547 (alamb)
  • Generate hash aggregation output in smaller record batches #3461 (milenkovicm)
  • Improve doc on lowercase treatment of columns on SQL #3385 (nanicpc)
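
The "Factorize common AND factors out of OR predicates" change listed above (#3858, #3859, #3903) can be pictured with a small sketch. A predicate such as `(a AND b) OR (a AND c)` is equivalent to `a AND (b OR c)`, and once `a` stands alone it becomes a candidate for filter pushdown. The code below is a toy model that represents each AND branch as a set of opaque predicate names. `factor_common` is a hypothetical helper for illustration, not the optimizer's actual implementation.

```python
def factor_common(disjuncts):
    """Split an OR of AND-branches into (common factors, remaining branches).

    `disjuncts` is a non-empty list of sets; each set holds the predicates
    that are ANDed together in one branch of the OR.
    """
    if not disjuncts:
        raise ValueError("expected at least one OR branch")
    branches = [set(d) for d in disjuncts]
    common = set.intersection(*branches)
    remainders = [branch - common for branch in branches]
    # If one branch is nothing but the common factors, the whole OR is
    # equivalent to those factors alone: a OR (a AND c) == a.
    if any(not r for r in remainders):
        remainders = []
    return common, remainders
```

Calling `factor_common([{"a", "b"}, {"a", "c"}])` yields the common set `{"a"}` and the remaining branches `{"b"}` and `{"c"}`, which corresponds to `a AND (b OR c)`. When the branches share nothing, the common set is empty and the predicate is left as it was.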