Releases: StarRocks/starrocks
3.2.2
3.0.9
Release date: January 2, 2024
New features
- Added the percentile_disc function. #36352
- Added a new metric max_tablet_rowset_num for setting the maximum allowed number of rowsets. This metric helps detect possible compaction issues and thus reduces the occurrences of the error "too many versions". #36539
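The percentile_disc function computes a discrete percentile: it returns a value that actually occurs in the input instead of interpolating between neighbors. The following Python sketch models the standard SQL definition (the smallest value whose cumulative distribution is greater than or equal to p). It is for illustration only, ignores NULLs and grouping, and the StarRocks implementation may differ in edge cases.

```python
from typing import Sequence


def percentile_disc(values: Sequence[float], p: float) -> float:
    """Smallest value whose cumulative distribution is >= p (standard SQL)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    for i, value in enumerate(ordered):
        # (i + 1) / n is the cumulative distribution of the (i + 1)-th row.
        if (i + 1) / n >= p:
            return value
    return ordered[-1]  # not reached for 0 <= p <= 1
```

For the input [10, 20, 30, 40], p = 0.5 selects 20 because exactly half of the rows are less than or equal to 20. An interpolating percentile would return 25 instead.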
Improvements
- A new value option GROUP_CONCAT_LEGACY is added to the session variable sql_mode to provide compatibility with the implementation logic of the group_concat function in versions earlier than v2.5. #36150
- When JDK is used, the default GC algorithm is changed to G1. #37386
- The be_tablets view in the information_schema database provides a new field INDEX_DISK, which records the disk usage (measured in bytes) of persistent indexes. #35615
- Queries on MySQL external tables and the external tables within JDBC catalogs support including keywords in the WHERE clause. #35917
- Supports updates onto the specified partitions of an automatically partitioned table. If the specified partitions do not exist, an error is returned. #34777
- The Primary Key table size returned by the SHOW DATA statement includes the sizes of .cols files (these are files related to partial column updates and generated columns) and persistent index files. #34898
- Optimized the performance of persistent index update when compaction is performed on all rowsets of a Primary Key table, which reduces disk read I/O. #36819
- When the string on the right side of the LIKE operator within the WHERE clause does not include % or _, the LIKE operator is converted into the = operator. #37515
- Optimized the logic used to compute compaction scores for Primary Key tables, so that the compaction scores of Primary Key tables fall within a range more consistent with those of the other three table types. #36534
- The result returned by the SHOW ROUTINE LOAD statement now includes the timestamps of consumption messages from each partition. #36222
- Optimized the performance of some Bitmap-related operations, including:
Compatibility Changes
Behavior Change
- Added the session variable enable_materialized_view_for_insert, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is false. #37505
- Changed the FE configuration item enable_new_publish_mechanism from a dynamic parameter to a static one. You must restart the FE after you modify the parameter settings. #35338
- The default retention period of trash files is changed to 1 day from the original 3 days. #37113
Parameters
Session variables
- Added the session variable cbo_decimal_cast_string_strict, which controls how the CBO converts data from the DECIMAL type to the STRING type. If this variable is set to true, the logic built in v2.5.x and later versions prevails and the system implements strict conversion (namely, the system truncates the generated string and fills 0s based on the scale length). If this variable is set to false, the logic built in versions earlier than v2.5.x prevails and the system processes all valid digits to generate a string. The default value is true. #34208
- Added the session variables transaction_read_only and tx_read_only to specify the transaction access mode. These variables are compatible with MySQL 5.7.20 and later. #37249
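The following Python sketch uses the standard decimal module to show the practical difference between the two cbo_decimal_cast_string_strict modes. It assumes that strict mode emits exactly `scale` fractional digits, truncating extra digits and padding with zeros (so a DECIMAL(10, 4) value of 1.5 becomes "1.5000"). It also assumes that the legacy mode keeps only the significant digits ("1.5"). The function name is hypothetical, and this is not the StarRocks implementation.

```python
from decimal import Decimal, ROUND_DOWN


def decimal_to_string(value: Decimal, scale: int, strict: bool = True) -> str:
    """Hypothetical model of the DECIMAL-to-STRING cast.

    strict=True mirrors cbo_decimal_cast_string_strict = true,
    strict=False mirrors the pre-v2.5.x behavior (both are assumptions).
    """
    if strict:
        # Exactly `scale` fractional digits: truncate extras, pad with zeros.
        quantum = Decimal(1).scaleb(-scale)
        return format(value.quantize(quantum, rounding=ROUND_DOWN), "f")
    # Legacy mode: drop trailing zeros and keep only significant digits.
    return format(value.normalize(), "f")
```

The "f" format is used in both branches so that very small or very large values are never rendered in scientific notation.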
FE configurations
- Added the FE configuration item routine_load_unstable_threshold_second. #36222
- Added the FE configuration item http_worker_threads_num, which specifies the number of threads that the HTTP server uses to process HTTP requests. The default value is 0. If this parameter is set to 0 or a negative value, the actual number of threads is twice the number of CPU cores. #37530
- Added the FE configuration item default_mv_refresh_immediate, which specifies whether to refresh a materialized view immediately after it is created. The default value is true. #37093
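The fallback rule for http_worker_threads_num can be expressed as a small function. The Python sketch below is hypothetical. The function name and the use of os.cpu_count() are assumptions made for illustration. The FE itself is written in Java and may detect the core count differently.

```python
import os
from typing import Optional


def effective_http_worker_threads(configured: int,
                                  cpu_cores: Optional[int] = None) -> int:
    """Resolve http_worker_threads_num to an actual thread count."""
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 1
    if configured <= 0:
        # 0 (the default) or a negative value falls back to 2 x CPU cores.
        return 2 * cpu_cores
    return configured
```

On an 8-core host, for example, the default value 0 resolves to 16 threads, while an explicit value such as 32 is used as-is.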
BE configurations
- Added the BE configuration item enable_stream_load_verbose_log. The default value is false. With this parameter set to true, StarRocks can record the HTTP requests and responses for Stream Load jobs, making troubleshooting easier. #36113
- Added the BE configuration item pindex_major_compaction_limit_per_disk to configure the maximum concurrency of compaction on a disk. This addresses uneven I/O across disks caused by compaction, which can lead to excessively high I/O on certain disks. The default value is 2. #36681
- Added BE configuration items to specify the timeout duration for connecting to object storage:
- object_storage_connect_timeout_ms: Timeout duration to establish socket connections with object storage. The default value is -1, which means the default timeout duration of the SDK configuration is used.
- object_storage_request_timeout_ms: Timeout duration to establish HTTP connections with object storage. The default value is -1, which means the default timeout duration of the SDK configuration is used.
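A per-disk concurrency cap such as pindex_major_compaction_limit_per_disk is commonly enforced by counting the tasks running on each disk and refusing to start new ones once the limit is reached. The Python sketch below illustrates that general admission pattern. The class and method names are hypothetical, and the sketch does not describe how the BE actually schedules compaction.

```python
from collections import defaultdict
from typing import DefaultDict


class PerDiskCompactionLimiter:
    """Admits at most `limit_per_disk` concurrent compaction tasks per disk."""

    def __init__(self, limit_per_disk: int = 2) -> None:
        # 2 mirrors the documented default of pindex_major_compaction_limit_per_disk.
        self.limit_per_disk = limit_per_disk
        self._running: DefaultDict[str, int] = defaultdict(int)

    def try_start(self, disk: str) -> bool:
        """Reserve a slot on `disk`; return False if the disk is at its limit."""
        if self._running[disk] >= self.limit_per_disk:
            return False
        self._running[disk] += 1
        return True

    def finish(self, disk: str) -> None:
        """Release a slot previously reserved with try_start."""
        if self._running[disk] <= 0:
            raise RuntimeError(f"no running compaction on {disk}")
        self._running[disk] -= 1
```

Because slots are counted per disk, a saturated disk does not block compaction on the other disks. This is the property that keeps I/O from piling up on a single disk.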
Bug Fixes
Fixed the following issues:
- In some cases, BEs may crash when a Catalog is used to read ORC external tables. #27971
- The BEs crash if users create persistent indexes in the event of data corruption. #30841
- BEs occasionally crash after a Bitmap index is added. #26463
- Failures in replaying replica operations may cause FEs to crash. #32295
- Setting the FE parameter recover_with_empty_tablet to true may cause FEs to crash. #33071
- Queries fail during hash joins, causing BEs to crash. #32219
- In a StarRocks shared-nothing cluster, queries against Iceberg or Hive tables may cause BEs to crash. #34682
- The error "get_applied_rowsets failed, tablet updates is in error state: tablet:18849 actual row size changed after compaction" is returned for queries. #33246
- Running show proc '/statistic' may cause a deadlock. #34237
- The FE performance plunges after the FE configuration item enable_collect_query_detail_info is set to true. #35945
- Errors may be thrown if large amounts of data are loaded into a Primary Key table with persistent index enabled. #34352
- After StarRocks is upgraded from v2.4 or earlier to a later version, compaction scores may rise unexpectedly. #34618
- If INFORMATION_SCHEMA is queried by using the database driver MariaDB ODBC, the CATALOG_NAME column returned in the schemata view holds only null values. #34627
- FEs crash because of abnormal loaded data and cannot be restarted. #34590
- If schema changes are being executed while a Stream Load job is in the PREPARED state, a portion of the source data to be loaded by the job is lost. #34381
- Including two or more slashes (/) at the end of the HDFS storage path causes the backup and restore of the data from HDFS to fail. #34601
- The partition_live_number property added by using the ALTER TABLE statement does not take effect. #34842
- The array_distinct function occasionally causes the BEs to crash. [#36377](https://github.com/S...
3.2.1
Release date: December 21, 2023
New Features
Data Lake Analytics
- Supports reading Hive Catalog tables and file external tables in Avro, SequenceFile, and RCFile formats through Java Native Interface (JNI).
Materialized View
- Added the view object_dependencies to the sys database. It contains the lineage information of asynchronous materialized views. #35060
- Supports creating synchronous materialized views with the WHERE clause.
- Supports partition-level incremental refresh for asynchronous materialized views created upon Iceberg catalogs.
- [Preview] Supports creating asynchronous materialized views based on tables in a Paimon catalog with partition-level refresh.
Query and SQL functions
- Supports prepared statements, which improve the performance of high-concurrency point lookup queries and effectively prevent SQL injection.
- Supports the following Bitmap functions: subdivide_bitmap, bitmap_from_binary, and bitmap_to_binary.
- Supports the Array function array_unique_agg.
Monitoring and alerts
- Added a new metric max_tablet_rowset_num for setting the maximum allowed number of rowsets. This metric helps detect possible compaction issues and thus reduces the occurrences of the error "too many versions". #36539
Parameter changes
- A new BE configuration item enable_stream_load_verbose_log is added. The default value is false. With this parameter set to true, StarRocks can record the HTTP requests and responses for Stream Load jobs, making troubleshooting easier. #36113
Improvements
- Changed the default GC algorithm in JDK 8 to G1. #37268
- A new value option GROUP_CONCAT_LEGACY is added to the session variable sql_mode to provide compatibility with the implementation logic of the group_concat function in versions earlier than v2.5. #36150
- The authentication information aws.s3.access_key and aws.s3.access_secret for AWS S3 in Broker Load jobs is hidden in audit logs. #36571
- The be_tablets view in the information_schema database provides a new field INDEX_DISK, which records the disk usage (measured in bytes) of persistent indexes. #35615
- The result returned by the SHOW ROUTINE LOAD statement provides a new field OtherMsg, which shows information about the last failed task. #35806
Bug Fixes
Fixed the following issues:
- The BEs crash if users create persistent indexes in the event of data corruption. #30841
- The array_distinct function occasionally causes the BEs to crash. #36377
- After the DISTINCT window operator pushdown feature is enabled, errors are reported if SELECT DISTINCT operations are performed on the complex expressions of the columns computed by window functions. #36357
- Some S3-compatible object storage returns duplicate files, causing the BEs to crash. #36103
2.5.17
Release date: December 19, 2023
New Features
- Added a new metric max_tablet_rowset_num for setting the maximum allowed number of rowsets. This metric helps detect possible compaction issues and thus reduces the occurrences of the error "too many versions". #36539
- Added the subdivide_bitmap function. #35817
Improvements
- The result returned by the SHOW ROUTINE LOAD statement provides a new field OtherMsg, which shows information about the last failed task. #35806
- The default retention period of trash files is changed to 1 day from the original 3 days. #37113
- Optimized the performance of persistent index update when compaction is performed on all rowsets of a Primary Key table, which reduces disk read I/O. #36819
- Optimized the logic used to compute compaction scores for Primary Key tables, so that the compaction scores of Primary Key tables fall within a range more consistent with those of the other three table types. #36534
- Queries on MySQL external tables and the external tables within JDBC catalogs support including keywords in the WHERE clause. #35917
- Added the bitmap_from_binary function to Spark Load to support loading Binary data. #36050
- The bRPC expiration time is shortened from 1 hour to the duration specified by the session variable query_timeout. This prevents query failures caused by RPC request expiration. #36778
Compatibility Changes
Parameters
- A new BE configuration item enable_stream_load_verbose_log is added. The default value is false. With this parameter set to true, StarRocks can record the HTTP requests and responses for Stream Load jobs, making troubleshooting easier. #36113
- The BE parameter update_compaction_per_tablet_min_interval_seconds is changed from static to mutable. #36819
Bug Fixes
Fixed the following issues:
- Queries fail during hash joins, causing BEs to crash. #32219
- The FE performance plunges after the FE configuration item enable_collect_query_detail_info is set to true. #35945
- Errors may be thrown if large amounts of data are loaded into a Primary Key table with persistent index enabled. #34352
- The starrocks_be process may exit unexpectedly when ./agentctl.sh stop be is used to stop a BE. #35108
- The array_distinct function occasionally causes the BEs to crash. #36377
- Deadlocks may occur when users refresh materialized views. #35736
- In some scenarios, dynamic partitioning may encounter an error, which causes FE start failures. #36846
3.1.6
Release date: December 18, 2023
New Features
- Added the now(p) function to return the current date and time with the specified fractional seconds precision (accurate to the microsecond). If p is not specified, this function returns only date and time accurate to the second. #36676
- Added a new metric max_tablet_rowset_num for setting the maximum allowed number of rowsets. This metric helps detect possible compaction issues and thus reduces the occurrences of the error "too many versions". #36539
- Supports obtaining heap profiles by using a command line tool, making troubleshooting easier. #35322
- Supports creating asynchronous materialized views with common table expressions (CTEs). #36142
- Added the following bitmap functions: subdivide_bitmap, bitmap_from_binary, and bitmap_to_binary. #35817 #35621
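To illustrate the output format of now(p), the Python sketch below formats a timestamp with p fractional-second digits. It makes three assumptions: p ranges from 0 to 6 (microsecond accuracy), extra digits are truncated rather than rounded, and p = 0 behaves like omitting p. The function name is hypothetical.

```python
from datetime import datetime
from typing import Optional


def now_with_precision(p: Optional[int] = None,
                       current: Optional[datetime] = None) -> str:
    """Format the current time like now(p); `current` allows a fixed timestamp."""
    if p is not None and not 0 <= p <= 6:
        raise ValueError("p must be between 0 and 6")
    if current is None:
        current = datetime.now()
    base = current.strftime("%Y-%m-%d %H:%M:%S")
    if not p:
        # p omitted (or 0): accurate to the second only.
        return base
    # Keep the first p of the six microsecond digits (truncation, not rounding).
    fraction = f"{current.microsecond:06d}"[:p]
    return f"{base}.{fraction}"
```

With a fixed timestamp of 2023-12-18 10:30:15.123456, p = 3 yields "2023-12-18 10:30:15.123", and omitting p yields "2023-12-18 10:30:15".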
Parameter Changes
- The FE dynamic parameter enable_new_publish_mechanism is changed to a static parameter. You must restart the FE after you modify the parameter settings. #35338
- The default retention period of trash files is changed to 1 day from the original 3 days. #37113
- A new FE configuration item routine_load_unstable_threshold_second is added. #36222
- A new BE configuration item enable_stream_load_verbose_log is added. The default value is false. With this parameter set to true, StarRocks can record the HTTP requests and responses for Stream Load jobs, making troubleshooting easier. #36113
- A new BE configuration item enable_lazy_delta_column_compaction is added. The default value is true, indicating that StarRocks does not perform frequent compaction operations on delta columns. #36654
Improvements
- A new value option GROUP_CONCAT_LEGACY is added to the session variable sql_mode to provide compatibility with the implementation logic of the group_concat function in versions earlier than v2.5. #36150
- The Primary Key table size returned by the SHOW DATA statement includes the sizes of .cols files (these are files related to partial column updates and generated columns) and persistent index files. #34898
- Queries on MySQL external tables and the external tables within JDBC catalogs support including keywords in the WHERE clause. #35917
- Plugin loading failures no longer cause errors or FE start failures. Instead, the FE starts properly, and the error status of the plugin can be queried by using SHOW PLUGINS. #36566
- Dynamic partitioning supports random distribution. #35513
- The result returned by the SHOW ROUTINE LOAD statement now includes the timestamps of consumption messages from each partition. #36222
- The result returned by the SHOW ROUTINE LOAD statement provides a new field OtherMsg, which shows information about the last failed task. #35806
- The authentication information aws.s3.access_key and aws.s3.access_secret for AWS S3 in Broker Load jobs is hidden in audit logs. #36571
- The be_tablets view in the information_schema database provides a new field INDEX_DISK, which records the disk usage (measured in bytes) of persistent indexes. #35615
Bug Fixes
Fixed the following issues:
- The BEs crash if users create persistent indexes in the event of data corruption. #30841
- If users create an asynchronous materialized view that contains nested queries, the error "resolve partition column failed" is reported. #26078
- If users create an asynchronous materialized view on a base table whose data is corrupted, the error "Unexpected exception: null" is reported. #30038
- If users run a query that contains a window function, the SQL error "[1064] [42000]: Row count of const column reach limit: 4294967296" is reported. #33561
- The FE performance plunges after the FE configuration item enable_collect_query_detail_info is set to true. #35945
- In the StarRocks shared-data mode, the error "Reduce your request rate" may be reported when users attempt to delete files from object storage. #35566
- Deadlocks may occur when users refresh materialized views. #35736
- After the DISTINCT window operator pushdown feature is enabled, errors are reported if SELECT DISTINCT operations are performed on the complex expressions of the columns computed by window functions. #36357
- The BEs crash if the source data file is in ORC format and contains nested arrays. #36127
- Some S3-compatible object storage returns duplicate files, causing the BEs to crash. #36103
3.2.0
Release date: December 1, 2023
New Features
Shared-data cluster
- Supports persisting indexes of Primary Key tables to local disks.
- Supports even distribution of Data Cache among multiple local disks.
Materialized View
Asynchronous materialized view
- The Query Dump file can include information of asynchronous materialized views.
- The Spill to Disk feature is enabled by default for the refresh tasks of asynchronous materialized views, reducing memory consumption.
Data Lake Analytics
- Supports creating and dropping databases and managed tables in Hive catalogs, and supports exporting data to Hive's managed tables using INSERT or INSERT OVERWRITE.
- Supports Unified Catalog, with which users can access different table formats (Hive, Iceberg, Hudi, and Delta Lake) that share a common metastore like Hive metastore or AWS Glue.
- Supports collecting statistics of Hive and Iceberg tables using ANALYZE TABLE, and storing the statistics in StarRocks, thus facilitating optimization of query plans and accelerating subsequent queries.
- Supports Information Schema for external tables, providing additional convenience for interactions between external systems (such as BI tools) and StarRocks.
Storage engine, data ingestion, and export
- Added the following features of loading with the table function FILES():
- Loading Parquet and ORC format data from Azure or GCP.
- Extracting the value of a key/value pair from the file path as the value of a column using the parameter columns_from_path.
- Loading complex data types including ARRAY, JSON, MAP, and STRUCT.
- Supports unloading data from StarRocks to Parquet-formatted files stored in AWS S3 or HDFS by using INSERT INTO FILES. For detailed instructions, see Unload data using INSERT INTO FILES.
- Supports manual optimization of table structure and data distribution strategy used in an existing table to optimize the query and loading performance. You can set a new bucket key, bucket number, or sort key for a table. You can also set a different bucket number for specific partitions.
- Supports continuous data loading from AWS S3 or HDFS using the PIPE method.
- When PIPE detects new files or modifications in a remote storage directory, it can automatically load the new or modified data into the destination table in StarRocks. While loading data, PIPE automatically splits a large loading task into smaller, serialized tasks, enhancing stability in large-scale data ingestion scenarios and reducing the cost of error retries.
Query
- Supports HTTP SQL API, enabling users to access StarRocks data via HTTP and execute SELECT, SHOW, EXPLAIN, or KILL operations.
- Supports Runtime Profile and text-based Profile analysis commands (SHOW PROFILELIST, ANALYZE PROFILE, EXPLAIN ANALYZE) to allow users to directly analyze profiles via MySQL clients, facilitating bottleneck identification and discovery of optimization opportunities.
SQL reference
Added the following functions:
- String functions: substring_index, url_extract_parameter, url_encode, url_decode, and translate
- Date functions: dayofweek_iso, week_iso, quarters_add, quarters_sub, milliseconds_add, milliseconds_sub, date_diff, jodatime_format, str_to_jodatime, to_iso8601, to_tera_date, and to_tera_timestamp
- Pattern matching function: regexp_extract_all
- Hash function: xx_hash3_64
- Aggregate function: approx_top_k
- Window functions: cume_dist, percent_rank, and session_number
- Utility functions: dict_mapping and get_query_profile
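cume_dist and percent_rank are standard SQL window functions. The Python sketch below computes both over a single partition sorted in ascending order. It follows the standard definitions: percent_rank = (rank - 1) / (rows - 1), and cume_dist = (rows with a value less than or equal to the current value) / rows. It is illustrative only and ignores NULL handling and PARTITION BY.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def percent_rank(values: Sequence[float]) -> List[float]:
    """(rank - 1) / (n - 1) per row; tied values share the same rank."""
    n = len(values)
    if n <= 1:
        return [0.0] * n
    ordered = sorted(values)
    # bisect_left counts the rows strictly smaller than v, which equals rank - 1.
    return [bisect_left(ordered, v) / (n - 1) for v in values]


def cume_dist(values: Sequence[float]) -> List[float]:
    """Fraction of rows whose value is less than or equal to the current value."""
    n = len(values)
    ordered = sorted(values)
    # bisect_right counts the rows less than or equal to v (peers included).
    return [bisect_right(ordered, v) / n for v in values]
```

For the rows 10, 20, 20, 30, percent_rank gives 0, 1/3, 1/3, 1 and cume_dist gives 0.25, 0.75, 0.75, 1.0. Tied rows receive the same result from both functions.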
Privileges and security
StarRocks supports access control through Apache Ranger, providing a higher level of data security and allowing the reuse of existing services of external data sources. After integrating with Apache Ranger, StarRocks enables the following access control methods:
- When accessing internal tables, external tables, or other objects in StarRocks, access control can be enforced based on the access policies configured for the StarRocks Service in Ranger.
- When accessing an external catalog, access control can also leverage the corresponding Ranger service of the original data source (such as Hive Service) to control access (currently, access control for exporting data to Hive is not yet supported).
For more information, see Manage permissions with Apache Ranger.
Improvements
Data Lake Analytics
- Optimized ORC Reader:
- Optimized the ORC Column Reader, resulting in nearly a two-fold performance improvement for VARCHAR and CHAR data reading.
- Optimized the decompression performance of ORC files in Zlib compression format.
- Optimized Parquet Reader:
- Supports adaptive I/O merging, allowing adaptive merging of columns with and without predicates based on filtering effects, thus reducing I/O.
- Optimized Dict Filter for faster predicate rewriting. Supports STRUCT sub-columns and on-demand dictionary column decoding.
- Optimized Dict Decode performance.
- Optimized late materialization performance.
- Supports caching file footers to avoid repeated computation overhead.
- Supports decompression of Parquet files in lzo compression format.
- Optimized CSV Reader:
- Optimized the Reader performance.
- Supports decompression of CSV files in Snappy and lzo compression formats.
- Optimized the performance of the count calculation.
- Optimized Iceberg Catalog capabilities:
- Supports collecting column statistics from Manifest files to accelerate queries.
- Supports collecting NDV (number of distinct values) from Puffin files to accelerate queries.
- Supports partition pruning.
- Reduced Iceberg metadata memory consumption to enhance stability in scenarios with large metadata volume or high query concurrency.
Materialized View
Asynchronous materialized view
- Supports automatic refresh for an asynchronous materialized view created upon views or materialized views when schema changes occur on the views, materialized views, or their base tables.
- Data consistency:
- Added the property query_rewrite_consistency for asynchronous materialized view creation. This property defines the query rewrite rules based on the consistency check.
- Added the property force_external_table_query_rewrite for external catalog-based asynchronous materialized view creation. This property defines whether to allow forced query rewrite for asynchronous materialized views created upon external catalogs.
- For detailed information, see CREATE MATERIALIZED VIEW.
- Added a consistency check for materialized views' partitioning key.
- When users create an asynchronous materialized view with window functions that include a PARTITION BY expression, the partitioning column of the window function must match that of the materialized view.
Storage engine, data ingestion, and export
- Optimized the persistent index for Primary Key tables by improving memory usage logic while reducing I/O read and write amplification. #24875 #27577 #28769
- Supports data re-distribution across local disks for Primary Key tables.
- Partitioned tables support automatic cooldown based on the partition time range and cooldown time. Compared to the original cooldown logic, it is more convenient to perform hot and cold data management on the partition level. For more information, see Specify initial storage medium, automatic storage cooldown time, replica number.
- The Publish phase of a load job that writes data into a Primary Key table is changed from asynchronous mode to synchronous mode. As such, the data loaded can be queried immediately after the load job finishes. For more information, see enable_sync_publish.
- Supports Fast Schema Evolution, which is controlled by the table property fast_schema_evolution. After this feature is enabled, the execution efficiency of adding or dropping columns is significantly improved. This mode is disabled by default (the default value is false). You cannot modify this property for existing tables using ALTER TABLE.
- Supports dynamically adjusting the number of tablets to create according to cluster information and the size of the data for Duplicate Key tables created with the Random Bucketing strategy.
Query
- Optimized StarRocks' compatibility with Metabase and Superset. Supports integrating them with external catalogs.
SQL Reference
- array_agg supports the keyword DISTINCT.
- INSERT, UPDATE, and DELETE operations now support SET_VAR. #35283
Others
- Added the session variable `large_decimal_underlying_type = "p...
2.5.16
2.5.15
Release date: November 29, 2023
Improvements
- Added slow request logs to track slow requests. #33908
- Optimized the performance of using Spark Load to read Parquet and ORC files when there are a large number of files. #34787
- Optimized the performance of some Bitmap-related operations, including:
- Optimized nested loop joins. #34804 #35003
- Optimized the bitmap_xor function. #34069
- Supports Copy on Write to optimize Bitmap performance and reduce memory consumption. #34047
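Copy on Write is a general technique: copies share the same underlying data until one of them is modified, and only then is a private copy made. This saves memory and copying time when most copies are only read. The toy Python sketch below illustrates the idea with a set-backed bitmap. All names are hypothetical, and this is not the BE's Bitmap code. For simplicity it does not release references when a bitmap is discarded, so a later write may copy unnecessarily.

```python
from typing import Iterable, Set


class _SharedBits:
    """Storage block plus a count of the bitmaps currently referencing it."""

    def __init__(self, bits: Set[int]) -> None:
        self.bits = bits
        self.refs = 1


class CowBitmap:
    """Toy copy-on-write bitmap: copies share storage until one is modified."""

    def __init__(self, values: Iterable[int] = ()) -> None:
        self._storage = _SharedBits(set(values))

    def copy(self) -> "CowBitmap":
        # O(1) copy: share the storage instead of duplicating the bits.
        clone = CowBitmap.__new__(CowBitmap)
        clone._storage = self._storage
        self._storage.refs += 1
        return clone

    def add(self, value: int) -> None:
        if self._storage.refs > 1:
            # Storage is shared, so detach with a private copy before writing.
            self._storage.refs -= 1
            self._storage = _SharedBits(set(self._storage.bits))
        self._storage.bits.add(value)

    def __contains__(self, value: int) -> bool:
        return value in self._storage.bits

    def shares_storage_with(self, other: "CowBitmap") -> bool:
        return self._storage is other._storage
```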
Compatibility Changes
Parameters
- The FE dynamic parameter enable_new_publish_mechanism is changed to a static parameter. You must restart the FE after you modify the parameter settings. #35338
Bug Fixes
- If a filtering condition is specified in a Broker Load job, BEs may crash during the data loading in certain circumstances. #29832
- Failures in replaying replica operations may cause FEs to crash. #32295
- Setting the FE parameter recover_with_empty_tablet to true may cause FEs to crash. #33071
- The error "get_applied_rowsets failed, tablet updates is in error state: tablet:18849 actual row size changed after compaction" is returned for queries. #33246
- A query that contains a window function may cause BEs to crash. #33671
- Running show proc '/statistic' may cause a deadlock. #34237
- Errors may be thrown if large amounts of data are loaded into a Primary Key table with persistent index enabled. #34566
- After StarRocks is upgraded from v2.4 or earlier to a later version, compaction scores may rise unexpectedly. #34618
- If INFORMATION_SCHEMA is queried by using the database driver MariaDB ODBC, the CATALOG_NAME column returned in the schemata view holds only null values. #34627
- If schema changes are being executed while a Stream Load job is in the PREPARED state, a portion of the source data to be loaded by the job is lost. #34381
- Including two or more slashes (/) at the end of the HDFS storage path causes the backup and restore of the data from HDFS to fail. #34601
- Running a loading task or a query may cause the FEs to hang. #34569
3.1.5
Release date: November 28, 2023
New features
- The CN nodes of a StarRocks shared-data cluster now support data export. #34018
Improvements
- The COLUMNS view in the system database INFORMATION_SCHEMA can display ARRAY, MAP, and STRUCT columns. #33431
- Supports queries against Parquet, ORC, and CSV formatted files that are compressed by using LZO and stored in Hive. #30923 #30721
- Supports updates onto the specified partitions of an automatically partitioned table. If the specified partitions do not exist, an error is returned. #34777
- Supports automatic refresh of materialized views when Swap, Drop, or Schema Change operations are performed on the tables and views (including the other tables and materialized views associated with these views) on which these materialized views are created. #32829
- Optimized the performance of some Bitmap-related operations, including:
Bug Fixes
Fixed the following issues:
- If a filtering condition is specified in a Broker Load job, BEs may crash during the data loading in certain circumstances. #29832
- An unknown error is reported when SHOW GRANTS is executed. #30100
- When data is loaded into a table that uses expression-based automatic partitioning, the error "Error: The row create partition failed since Runtime error: failed to analyse partition value" may be thrown. #33513
- The error "get_applied_rowsets failed, tablet updates is in error state: tablet:18849 actual row size changed after compaction" is returned for queries. #33246
- In a StarRocks shared-nothing cluster, queries against Iceberg or Hive tables may cause BEs to crash. #34682
- In a StarRocks shared-nothing cluster, if multiple partitions are automatically created during data loading, the data loaded may occasionally be written to unmatched partitions. #34731
- Prolonged, frequent data loading into a Primary Key table with persistent index enabled may cause BEs to crash. #33220
- The error "Exception: java.lang.IllegalStateException: null" is returned for queries. #33535
- If a query starts while show proc '/current_queries' is being executed, BEs may crash. #34316
- Errors may be thrown if large amounts of data are loaded into a Primary Key table with persistent index enabled. #34352
- After StarRocks is upgraded from v2.4 or earlier to a later version, compaction scores may rise unexpectedly. #34618
- If INFORMATION_SCHEMA is queried by using the database driver MariaDB ODBC, the CATALOG_NAME column returned in the schemata view holds only null values. #34627
- FEs crash because of abnormal loaded data and cannot be restarted. #34590
- If schema changes are being executed while a Stream Load job is in the PREPARED state, a portion of the source data to be loaded by the job is lost. #34381
- Including two or more slashes (/) at the end of the HDFS storage path causes the backup and restore of the data from HDFS to fail. #34601
- Setting the session variable enable_load_profile to true makes Stream Load jobs prone to fail. #34544
- Performing partial updates in column mode onto a Primary Key table causes some tablets of the table to show data inconsistencies between their replicas. #34555
- The partition_live_number property added by using the ALTER TABLE statement does not take effect. #34842
- FEs fail to start and report the error "failed to load journal type 118". #34590
- Setting the FE parameter recover_with_empty_tablet to true may cause FEs to crash. #33071
- Failures in replaying replica operations may cause FEs to crash. #32295
Compatibility Changes
Parameters
- Added an FE configuration item enable_statistics_collect_profile, which controls whether to generate profiles for statistics queries. The default value is false. #33815
- The FE configuration item mysql_server_version is now mutable. The new setting can take effect for the current session without requiring an FE restart. #34033
- Added a BE/CN configuration item update_compaction_ratio_threshold, which controls the maximum proportion of data that a compaction can merge for a Primary Key table in a StarRocks shared-data cluster. The default value is 0.5. We recommend reducing this value if a single tablet becomes excessively large. For a StarRocks shared-nothing cluster, the proportion of data that a compaction can merge for a Primary Key table is still automatically adjusted. #35129
System Variables
- Added a session variable cbo_decimal_cast_string_strict, which controls how the CBO converts data from the DECIMAL type to the STRING type. If this variable is set to true, the logic built in v2.5.x and later versions prevails and the system implements strict conversion (namely, the system truncates the generated string and fills 0s based on the scale length). If this variable is set to false, the logic built in versions earlier than v2.5.x prevails and the system processes all valid digits to generate a string. The default value is true. #34208
- Added a session variable cbo_eq_base_type, which specifies the data type used for data comparison between DECIMAL-type data and STRING-type data. The default value is VARCHAR, and DECIMAL is also a valid value. #34208
- Added a session variable big_query_profile_second_threshold. When the session variable enable_profile is set to false and the amount of time taken by a query exceeds the threshold specified by the big_query_profile_second_threshold variable, a profile is generated for that query. #33825
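The strict conversion controlled by cbo_decimal_cast_string_strict (true) can be illustrated with a short Python sketch. It uses Python's `decimal` module as a stand-in and is not StarRocks code. The function name `strict_decimal_to_string` is hypothetical. The sketch only models the strict behavior described above: extra fractional digits are truncated, and missing ones are filled with 0s up to the scale. The pre-v2.5 non-strict behavior is not modeled.

```python
from decimal import Decimal, ROUND_DOWN


def strict_decimal_to_string(value: str, scale: int) -> str:
    """Hypothetical sketch of strict DECIMAL-to-STRING conversion.

    Keeps exactly `scale` fractional digits: extra digits are truncated
    (ROUND_DOWN, i.e. toward zero) and missing digits are filled with 0s.
    """
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=3 -> Decimal('0.001')
    return format(Decimal(value).quantize(quantum, rounding=ROUND_DOWN), "f")


# A DECIMAL(10, 3) value of 1.5 is rendered with three fractional digits.
print(strict_decimal_to_string("1.5", 3))  # 1.500
```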
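The profiling rule for big_query_profile_second_threshold can be summarized as a simple predicate. The following is a minimal Python sketch. The function name `should_generate_profile` is hypothetical. Treating enable_profile = true as "every query is profiled" is an assumption and is not stated in these notes.

```python
def should_generate_profile(enable_profile: bool,
                            elapsed_seconds: float,
                            big_query_profile_second_threshold: float) -> bool:
    """Hypothetical sketch: decide whether a finished query gets a profile."""
    if enable_profile:
        # Assumption: with profiling enabled, every query is profiled.
        return True
    # Profiling disabled: only queries whose run time exceeds the
    # threshold (in seconds) get a profile.
    return elapsed_seconds > big_query_profile_second_threshold
```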
[Candidate] 3.2.0-rc01
Release date: November 15, 2023
New Features
Shared-data cluster
- Supports the persistent index for Primary Key tables on local disks.
- Supports the even distribution of Data Cache among multiple local disks.
Data Lake Analytics
- Supports creating and dropping databases and managed tables in Hive catalogs, and supports exporting data to Hive's managed tables using INSERT or INSERT OVERWRITE.
- Supports Unified Catalog, with which users can access different table formats (Hive, Iceberg, Hudi, and Delta Lake) that share a common metastore like Hive metastore or AWS Glue.
Storage engine, data ingestion, and export
- Added the following features for loading data with the table function FILES():
  - Loading Parquet and ORC format data from Azure or GCP.
  - Extracting the value of a key/value pair from the file path as the value of a column using the parameter columns_from_path.
  - Loading complex data types including ARRAY, JSON, MAP, and STRUCT.
- Supports the dict_mapping column property, which can significantly facilitate the loading process during the construction of a global dictionary, accelerating the exact COUNT DISTINCT calculation.
- Supports unloading data from StarRocks to Parquet-formatted files stored in AWS S3 or HDFS by using INSERT INTO FILES. For detailed instructions, see Unload data using INSERT INTO FILES.
SQL reference
Added the following functions:
- String functions: substring_index, url_extract_parameter, url_encode, url_decode, and translate
- Date functions: dayofweek_iso, week_iso, quarters_add, quarters_sub, milliseconds_add, milliseconds_sub, date_diff, jodatime_format, str_to_jodatime, to_iso8601, to_tera_date, and to_tera_timestamp
- Pattern matching function: regexp_extract_all
- Hash function: xx_hash3_64
- Aggregate function: approx_top_k
- Window functions: cume_dist, percent_rank, and session_number
- Utility functions: dict_mapping and get_query_profile
Privileges and security
StarRocks supports access control through Apache Ranger, providing a higher level of data security and allowing the reuse of the existing Ranger services of external data sources. After integration with Apache Ranger, StarRocks enables the following access control methods:
- When accessing internal tables, external tables, or other objects in StarRocks, access control can be enforced based on the access policies configured for the StarRocks Service in Ranger.
- When accessing an external catalog, access control can also leverage the corresponding Ranger service of the original data source (such as Hive Service). Access control for exporting data to Hive is not yet supported.
For more information, see Manage permissions with Apache Ranger.
Improvements
Materialized View
Asynchronous materialized view
- Creation: Supports automatic refresh for an asynchronous materialized view created upon views or materialized views when schema changes occur on the views, materialized views, or their base tables.
- Observability: Supports Query Dump for asynchronous materialized views.
- The Spill to Disk feature is enabled by default for the refresh tasks of asynchronous materialized views, reducing memory consumption.
- Data consistency:
  - Added the property query_rewrite_consistency for asynchronous materialized view creation. This property defines the query rewrite rules based on the consistency check.
  - Added the property force_external_table_query_rewrite for external catalog-based asynchronous materialized view creation. This property defines whether to allow forced query rewrite for asynchronous materialized views created upon external catalogs.
  For detailed information, see CREATE MATERIALIZED VIEW.
- Added a consistency check for the partitioning key of materialized views. When users create an asynchronous materialized view with window functions that include a PARTITION BY expression, the partitioning column of the window function must match that of the materialized view.
Storage engine, data ingestion, and export
- Optimized the persistent index for Primary Key tables by improving memory usage logic while reducing I/O read and write amplification. #24875 #27577 #28769
- Supports data redistribution across local disks for Primary Key tables.
- Partitioned tables support automatic cooldown based on the partition time range and cooldown time. For detailed information, see Set initial storage medium and automatic storage cooldown time.
- The Publish phase of a load job that writes data into a Primary Key table is changed from asynchronous mode to synchronous mode. As such, the data loaded can be queried immediately after the load job finishes. For detailed information, see enable_sync_publish.
Query
- Optimized StarRocks' compatibility with Metabase and Superset. Supports integrating them with external catalogs.
SQL Reference
- array_agg supports the keyword DISTINCT.
Developer tools
- Supports Trace Query Profile for asynchronous materialized views, which can be used to analyze their transparent rewrite.
Compatibility Changes
Parameters
- Added new parameters for Data Cache.
Bug Fixes
Fixed the following issues:
- BEs crash when libcurl is invoked. #31667
- Schema Change may fail if it takes an excessively long time, because the specified tablet version has already been removed by garbage collection. #31376
- Failed to access the Parquet files in MinIO or AWS S3 via file external tables. #29873
- The ARRAY, MAP, and STRUCT type columns are not correctly displayed in information_schema.columns. #33431
- DATA_TYPE and COLUMN_TYPE for BINARY or VARBINARY data types are displayed as unknown in the information_schema.columns view. #32678