10.0.0 (2022-07-12)
Breaking changes:
- Convert batch_size to config option #2771 (andygrove)
- MINOR: Remove Offset struct #2734 (andygrove)
- feat: async extension planner #2713 (waynexia)
- Switch to object_store crate (#2489) #2677 (tustvold)
Implemented enhancements:
- update documentation, fix styling to match main Arrow project #2864
- Update top-level README #2850
- [Question]How to call an async function in `ExecutionPlan::exec` method? #2847
- Add `DataFrame::with_column` #2844
- Improve ergonomics of physical expr `lit` #2827
- Add Python examples for reading CSV and query by SQL in Doc #2824
- eliminate multi limit-offset nodes to EmptyRelation if possible #2822
- Make `LogicalPlan::Union` be consistent with other plans #2816
- Use coerced data type from value and list expressions during planning inlist expression #2793
- Add configuration option to enable/disable `CoalesceBatchesExec` #2790
- Simplify FilterNullJoinKeys rule #2780
- Allow configuration settings to be specified with environment variables #2776
- Automatically update `configs.md` in user guide #2770
- Support multiple paths for ListingTableScanNode #2768
- Reduce outer joins #2757
- support data type coerced and decimal in INLIST expr #2755
- Change ExtensionPlanner::plan_extension() to an async function #2749
- Add `IsNotNull` filter to join inputs if one side of join condition does not allow null #2739
- Sort preserving MergeJoin #2698
- Improve readability of table scan projections in query plans #2697
- DataFusion 9.0.0 Release #2676
- Improve UX for `UNION` vs `UNION ALL` (introduce a LogicalPlan::Distinct) #2573 [sql]
- Implement some way to show the sql used to create a view #2529
- Consider adopting IOx ObjectStore abstraction #2489
- Support `sum0` as a built-in agg function #2067
- implement grouping sets, cubes, and rollups #1327
- Ruby bindings #1114
- Support dates in hash join #2746 (andygrove)
Fixed bugs:
- Docker Error #2851
- Anti join ignores join filters #2842
- Can't test or compile sub-model code after upgrade to arrow-rs 17.0.0 #2835
- Not evaluate the set expr in the InList for the optimization #2820
- CASE When: result type should be coercible to a common type #2818
- IN/NOT IN List: NULL is not equal to NULL #2817
- panic when case statement returns null #2798
- InList: Can't cast the list expr data type to value expr data type directly #2774
- InList Expr: expr and list values must can be converted to a same data type #2759
- tpchgen docker syntax change prevents volume from binding #2751
- Cannot join on date columns (Unsupported data type in hasher: Date32) #2744
- `rewrite_expression` does not properly handle `Exists` and `ScalarSubquery` #2736
- LocalFileSystem Not sorted by file name, As a result, the data lines queried in multiple files are out of order. #2730
- Filter push down need consider alias columns #2725
- Recent API change in `GlobalLimitExec` breaks compatibility with Ballista #2720
- Common Subexpression Elimination pass errors if run twice on some plans: Schema contains duplicate unqualified field name 'IsNull-Column-sys.host' #2712
- The data type is not compatible with other system, for example spark or PG database #1379
Documentation updates:
- Fix docs styling #2865 (kmitchener)
- Various updates to top-level README #2854 (andygrove)
- MINOR: Add documentation for running integration tests #2839 (andygrove)
- add csv registration and sql query to examples #2825 (waitingkuo)
- [minor] refine doc #2753 (Ted-Jiang)
Closed issues:
- Consider adding a prominent note in the readme about ballista #2853
- support decimal in (NULL) #2800
- InList: Don't treat Null as UTF8(None) #2782
- InList: don't need to treat Null as UTF8 data type #2773
- Implement extensible configuration mechanism #138
Merged pull requests:
- Update CONTRIBUTING.md #2876 (waitingkuo)
- Make LogicalPlan::Union be consistent with other plans #2868 (comphead)
- minor: remove unneeded files from project root #2863 (kmitchener)
- chore: make cargo clippy happy in nightly #2860 [sql] (xudong963)
- Update to arrow 18.0.0 #2856 [sql] (alamb)
- chore: remove ballista-related docker-compose file #2852 (xudong963)
- Adding dataframe with_column function #2849 (comphead)
- anti joins now respect join filters #2843 (andygrove)
- MINOR: make name meaningful and clean up code #2841 (liukun4515)
- Make `lit` implementation more concise #2838 (alamb)
- InList: set/list value must be evaluated to get the values #2834 (liukun4515)
- Add SHOW CREATE TABLE with initial support for views #2830 [sql] (mrob95)
- Improve ergonomics of physical expr `lit` #2828 (alamb)
- Eliminate multi limit-offset nodes to emptyRelation #2823 (AssHero)
- Fix the ci #2821 (liukun4515)
- CaseWhen: coerce the all then and else data type to a common data type #2819 (liukun4515)
- Fix `ScalarValue::isNull` calculation #2815 (alamb)
- Fix nullability calculation for `CASE` expressions #2814 (alamb)
- Bump numpy from 1.21.3 to 1.22.0 in /integration-tests #2811 (xudong963)
- Fix data type calculation for `CaseExpr`s with `NULLs` #2810 (AssHero)
- InList: fix bug for comparing with Null in the list using the set optimization #2809 (liukun4515)
- Use specialized dictionary kernels (#1178) #2808 (tustvold)
- fix schema nullability for `information_schema` schema #2804 (alamb)
- fix: correctly calculate join output schema nullability #2803 (alamb)
- Correct schema nullability declaration in tests #2802 (alamb)
- Don't treat Null as UTF8(None) and change error info. #2801 (liukun4515)
- MINOR: Remove reference to docker image that is no longer available #2795 (andygrove)
- Use coerced type in inlist expr planning #2794 (viirya)
- Add LogicalPlan::Distinct #2792 [sql] (mrob95)
- Add config option for coalesce_batches physical optimization rule, make optional #2791 (andygrove)
- Improve readability of table scan projections in query plans (remove `Some` and `None`) #2789 [sql] (comphead)
- Simplify FilterNullJoinKeys rule #2781 (andygrove)
- MINOR: re-export sqlparser from datafusion-sql crate #2779 [sql] (andygrove)
- Update to arrow 17.0.0 #2778 [sql] (alamb)
- Support multiple paths for ListingTableScanNode #2775 (Ted-Jiang)
- Remove expr_sub_expressions and rewrite_expression functions #2772 (mrob95)
- minor: update cranelift related dependencies #2769 (xudong963)
- minor: panic rather than fail silently on bad dictionary in hash join #2767 (alamb)
- MINOR: make `prettier` use consistent between CI and contributing guide #2766 (andygrove)
- Rewrite subexpressions of InSubquery in rewrite_expression #2765 (mrob95)
- Support `DataType::Decimal` for `IN` and `NOT IN` expressions #2764 (liukun4515)
- Implement extensible configuration mechanism #2754 (andygrove)
- Remove redundant docker argument #2752 (avantgardnerio)
- Add optimizer pass to reduce `left`/`right`/`full` joins to `inner` join if possible #2750 [sql] (AssHero)
- MINOR: Remove legacy CLI context enum #2748 (andygrove)
- CSE unit test for duplicate fields #2747 (waynexia)
- MINOR: Improve unsupported data type error message #2745 (andygrove)
- Add optimizer rule to filter out null keys before a join #2740 (andygrove)
- Sort file names in a directory #2730 #2735 (yourenawo)
- fix: filter push down with `InList` expressions #2729 (Ted-Jiang)
- [Minor] add debug info in optimizer.rs #2726 (Ted-Jiang)
- Add public API for GlobalLimitExec and LocalLimitExec #2722 (andygrove)
- Add additional data types are supported in hash join #2721 (AssHero)
- Upgrade to arrow `16.0.0` #2718 [sql] (alamb)
- Fix clippy warnings with toolchain 1.63 #2717 [sql] (waynexia)
- Support for GROUPING SETS/CUBE/ROLLUP #2716 (thinkharderdev)
- fix: check redundant fields while building projection plan #2715 (waynexia)
- Sort preserving `SortMergeJoin` #2699 (korowa)
- fix: union schema fix #2688 [sql] (gandronchik)
- Support default precision and scale to `CAST <EXPR> AS DECIMAL` #2680 [sql] (gandronchik)