Resolving conflicts
shalumariamsheji committed May 2, 2022
2 parents 660fab6 + bb76006 commit 5567fd8
Showing 195 changed files with 2,426 additions and 558 deletions.
2 changes: 1 addition & 1 deletion hetu-carbondata/pom.xml
Original file line number Diff line number Diff line change
@@ -22,7 +22,7 @@
<parent>
<groupId>io.hetu.core</groupId>
<artifactId>presto-root</artifactId>
<version>1.6.0-SNAPSHOT</version>
<version>1.7.0-SNAPSHOT</version>
</parent>

<artifactId>hetu-carbondata</artifactId>
2 changes: 1 addition & 1 deletion hetu-clickhouse/pom.xml
@@ -5,7 +5,7 @@
<parent>
<groupId>io.hetu.core</groupId>
<artifactId>presto-root</artifactId>
<version>1.6.0-SNAPSHOT</version>
<version>1.7.0-SNAPSHOT</version>
</parent>

<artifactId>hetu-clickhouse</artifactId>
2 changes: 1 addition & 1 deletion hetu-common/pom.xml
@@ -5,7 +5,7 @@
<parent>
<groupId>io.hetu.core</groupId>
<artifactId>presto-root</artifactId>
<version>1.6.0-SNAPSHOT</version>
<version>1.7.0-SNAPSHOT</version>
</parent>

<artifactId>hetu-common</artifactId>
2 changes: 1 addition & 1 deletion hetu-cube/pom.xml
@@ -5,7 +5,7 @@
<parent>
<groupId>io.hetu.core</groupId>
<artifactId>presto-root</artifactId>
<version>1.6.0-SNAPSHOT</version>
<version>1.7.0-SNAPSHOT</version>
</parent>

<artifactId>hetu-cube</artifactId>
2 changes: 1 addition & 1 deletion hetu-datacenter/pom.xml
@@ -4,7 +4,7 @@
<parent>
<groupId>io.hetu.core</groupId>
<artifactId>presto-root</artifactId>
<version>1.6.0-SNAPSHOT</version>
<version>1.7.0-SNAPSHOT</version>
</parent>

<artifactId>hetu-datacenter</artifactId>
25 changes: 25 additions & 0 deletions hetu-docs/en/admin/extension-execution-planner.md
@@ -0,0 +1,25 @@
# Extension Physical Execution Planner
This section describes how to add an extension physical execution planner in openLooKeng. With the extension physical execution planner, openLooKeng can use other operator acceleration libraries to speed up the execution of SQL statements.

## Configuration
To enable the extension physical execution feature, add the following configuration to
`config.properties`:

``` properties
extension_execution_planner_enabled=true
extension_execution_planner_jar_path=file:///xxPath/omni-openLooKeng-adapter-1.6.1-SNAPSHOT.jar
extension_execution_planner_class_path=nova.hetu.olk.OmniLocalExecutionPlanner
```

The above attributes are described below:

- `extension_execution_planner_enabled`: Enables the extension physical execution feature.
- `extension_execution_planner_jar_path`: The file path of the extension physical execution jar package.
- `extension_execution_planner_class_path`: The fully qualified name of the extension physical execution planner class in the jar.


## Usage
While openLooKeng is running, the following command in the WebUI or CLI enables or disables the extension physical execution feature:
```
set session extension_execution_planner_enabled=true/false
```
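
To illustrate how the three properties fit together, the sketch below resolves a planner class by name from a jar URL. This is a hypothetical loader written for this document, not openLooKeng's actual implementation; the class and method names are assumptions.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch (not openLooKeng source): resolve the class named by
// extension_execution_planner_class_path from the jar file named by
// extension_execution_planner_jar_path.
public class ExtensionPlannerLoader
{
    public static Class<?> loadPlannerClass(String jarUrl, String className)
            throws Exception
    {
        URLClassLoader loader = new URLClassLoader(
                new URL[] {new URL(jarUrl)},
                ExtensionPlannerLoader.class.getClassLoader());
        // Load without initializing; the engine would instantiate it later
        return Class.forName(className, false, loader);
    }
}
```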
7 changes: 5 additions & 2 deletions hetu-docs/en/admin/properties.md
@@ -387,9 +387,9 @@ Exchanges transfer data between openLooKeng nodes for different stages of a query
### `exchange.max-retry-count`

> - **Type:** `integer`
> - **Default value:** `10`
> - **Default value:** `100`
>
> The maximum number of retry for failed task performed by the coordinator before considering it as a permanent failure. This property is used only when exchange.is-timeout-failure-detection-enabled is set to false. This value needs to be atleast 3 (minimum retry count) to take effect.
> The maximum number of retries for a failed task performed by the coordinator before consulting the failure detector module about the remote node's status. If the failure detector reports the remote node as failed, the task is considered a permanent failure. This value is the minimum count required to make the decision, not necessarily the exact count; depending on cluster size and load, the exact count may vary slightly. This property is used only when exchange.is-timeout-failure-detection-enabled is set to false. This value must be at least 100 to take effect.
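
The retry flow described above can be sketched as follows; this is an illustrative model written for this document, not openLooKeng's coordinator code.

```java
// Illustrative model of the retry flow: the coordinator retries a failed task
// until the retry count reaches exchange.max-retry-count, then consults the
// failure detector; the failure is permanent only if the detector also
// reports the remote node as failed.
public class RetryPolicy
{
    private final int maxRetryCount; // exchange.max-retry-count, default 100

    public RetryPolicy(int maxRetryCount)
    {
        this.maxRetryCount = maxRetryCount;
    }

    public boolean shouldConsultFailureDetector(int failureCount)
    {
        return failureCount >= maxRetryCount;
    }

    public boolean isPermanentFailure(int failureCount, boolean nodeReportedFailed)
    {
        return shouldConsultFailureDetector(failureCount) && nodeReportedFailed;
    }
}
```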
### `sink.max-buffer-size`

@@ -852,6 +852,9 @@ helps with cache affinity scheduling.
> - **Default value:** `5m`
>
> The maximum time coordinator waits for remote-task related error to be resolved before it's considered a failure.
>
> Note:
> For snapshot recovery, `query.remote-task.max-error-duration` should be greater than `exchange.max-error-duration`.
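
For example, a `config.properties` fragment satisfying this constraint might look like the following (the durations are illustrative, not recommended values):

``` properties
exchange.max-error-duration=5m
# must exceed exchange.max-error-duration for snapshot recovery
query.remote-task.max-error-duration=10m
```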
## Distributed Snapshot
4 changes: 3 additions & 1 deletion hetu-docs/en/admin/reliable-execution.md
@@ -61,10 +61,12 @@ It is suggested to only turn on distributed snapshot when necessary, i.e. for qu

Snapshot capture and restore statistics are displayed in CLI along with query result when CLI is launched in debug mode

Snapshot capture statistics covers size of snapshots captured, CPU Time taken for capturing the snapshots and Wall Time taken for capturing the snapshots during the query. These statistics are displayed for all snapshots and for last snapshot separately.
Snapshot capture statistics include the number of snapshots captured, the size of the captured snapshots, and the CPU time and wall time taken to capture them during the query. These statistics are displayed for all snapshots, and separately for the last snapshot.

Snapshot restore statistics cover the number of restores from snapshots during the query, the size of the snapshots loaded for restoring, and the CPU time and wall time taken to restore from snapshots. Restore statistics are displayed only when a restore (recovery) happened during the query.

Additionally, while the query is in progress, the number of snapshots being captured and the ID of the snapshot being restored are displayed. Refer to the picture below for more details.

![](../images/snapshot_statistics.png)

## Configurations
2 changes: 2 additions & 0 deletions hetu-docs/en/admin/spill.md
@@ -65,6 +65,8 @@ When the build table is partitioned, the spill-to-disk mechanism can decrease th

With this mechanism, the peak memory used by the join operator can be decreased to the size of the largest build table partition. Assuming no data skew, this will be `1 / task.concurrency` times the size of the whole build table.

Note: spill-to-disk is not supported for Cross Join.
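
As a back-of-the-envelope illustration of the `1 / task.concurrency` claim above (a toy calculation written for this document, not engine code):

```java
// Toy calculation: with the build side hash-partitioned and no data skew,
// the peak build-side memory for the join is one partition, i.e.
// buildTableBytes / taskConcurrency.
public class SpillMath
{
    public static long peakBuildMemoryBytes(long buildTableBytes, int taskConcurrency)
    {
        return buildTableBytes / taskConcurrency;
    }
}
```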

### Aggregations

Aggregation functions perform an operation on a group of values and return one value. If the number of groups you're aggregating over is large, a significant amount of memory may be needed. When spill-to-disk
2 changes: 1 addition & 1 deletion hetu-docs/en/develop/connectors.md
@@ -21,7 +21,7 @@ This interface is too big to list in this documentation, but if you are interest
connector. If your underlying data source supports schemas, tables and columns, this interface should be straightforward to implement. If you are attempting to adapt something that is not a relational database (as
the Example HTTP connector does), you may need to get creative about how you map your data source to openLooKeng's schema, table, and column concepts.

### ConnectorSplitManger
### ConnectorSplitManager

The split manager partitions the data for a table into the individual chunks that openLooKeng will distribute to workers for processing. For example, the Hive connector lists the files for each Hive partition and creates
one or more splits per file. For data sources that don't have partitioned data, a good strategy is simply to return a single split for the entire table. This is the strategy employed by the Example HTTP connector.
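
The single-split strategy can be sketched with simplified stand-in types; the real openLooKeng SPI interfaces have different, richer signatures, so treat these names as assumptions for illustration.

```java
import java.util.Collections;
import java.util.List;

// Simplified stand-in for the SPI split type (not the real interface)
interface ConnectorSplit
{
    String tableName();
}

// For an unpartitioned source, return one split covering the whole table;
// this is the strategy the Example HTTP connector uses.
class SingleSplitManager
{
    public List<ConnectorSplit> getSplits(String tableName)
    {
        return Collections.singletonList(() -> tableName);
    }
}
```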
Binary file modified hetu-docs/en/images/snapshot_statistics.png
3 changes: 3 additions & 0 deletions hetu-docs/en/index.md
@@ -172,6 +172,7 @@ headless: true
- [SHOW CACHE]({{< relref "./docs/sql/show-cache.md" >}})
- [SHOW CATALOGS]({{< relref "./docs/sql/show-catalogs.md" >}})
- [SHOW COLUMNS]({{< relref "./docs/sql/show-columns.md" >}})
- [SHOW CREATE CUBE]({{< relref "./docs/sql/show-create-cube.md" >}})
- [SHOW CREATE TABLE]({{< relref "./docs/sql/show-create-table.md" >}})
- [SHOW CREATE VIEW]({{< relref "./docs/sql/show-create-view.md" >}})
- [SHOW FUNCTIONS]({{< relref "./docs/sql/show-functions.md" >}})
@@ -216,6 +217,8 @@ headless: true
- [Task Resource]({{< relref "./docs/rest/task.md" >}})

- [Release Notes]("#")
- [1.6.1 (27 Apr 2022)]({{< relref "./docs/releasenotes/releasenotes-1.6.1.md" >}})
- [1.6.0 (30 Mar 2022)]({{< relref "./docs/releasenotes/releasenotes-1.6.0.md" >}})
- [1.5.0 (30 Dec 2021)]({{< relref "./docs/releasenotes/releasenotes-1.5.0.md" >}})
- [1.4.1 (12 Nov 2021)]({{< relref "./docs/releasenotes/releasenotes-1.4.1.md" >}})
- [1.4.0 (15 Oct 2021)]({{< relref "./docs/releasenotes/releasenotes-1.4.0.md" >}})
25 changes: 25 additions & 0 deletions hetu-docs/en/releasenotes/releasenotes-1.6.0.md
@@ -0,0 +1,25 @@
# Release 1.6.0

## Key Features

| Area | Feature |
| --------------------- | ------------------------------------------------------------ |
| Star Tree | Support the update cube command, allowing admins to easily update an existing cube when the underlying data changes |
| Bloom Index | Hindex: optimize Bloom index size, reducing it by more than 10x |
| Task Recovery | 1. Improve failure detection time: it currently takes 300s to determine that a task has failed before resuming; improving this shortens both the resume time and the overall query time<br/>2. Snapshotting speed & size: taking a snapshot currently uses plain Java serialization, which is slow and takes more space; Kryo serialization reduces size and increases speed, raising overall throughput |
| Spill to Disk | 1. Spill-to-disk speed & size improvement: when a spill happens during HashAggregation and GroupBy, serializing the data to disk is slow and the output is large; Kryo serialization improves the write speed and reduces the size, improving overall performance<br/>2. Support spilling to HDFS: data can currently spill to multiple disks; spilling to HDFS improves throughput<br/>3. Async spill/unspill: when revocable memory crosses the threshold and a spill is triggered, accepting data from downstream operators is blocked; accepting it and adding it to the ongoing spill completes the pipeline faster<br/>4. Enable spill for right outer and full joins: the build side is not spilled for these join types because the entire data set is needed in memory for lookup, which leads to out of memory on large data; instead, enable spill and create a Bloom filter to identify the spilled data and use it during the join with the probe side |
| Connector Enhancement | Support data update and delete operations for PostgreSQL and openGauss |

## Known Issues

| Category | Description | Gitee issue |
| ------------- | ------------------------------------------------------------ | --------------------------------------------------------- |
| Task Recovery | When a snapshot is enabled and a CTAS with transaction is executed, an error is reported in the SQL statement. | [I502KF](https://e.gitee.com/open_lookeng/issues/list?issue=I502KF) |
| | An error occurs occasionally when snapshot is enabled and exchange.is-timeout-failure-detection-enabled is disabled. | [I4Y3TQ](https://e.gitee.com/open_lookeng/issues/list?issue=I4Y3TQ) |
| Star Tree | In the memory connector, after the star tree is enabled, data inconsistency occurs during query. | [I4QQUB](https://e.gitee.com/open_lookeng/issues/list?issue=I4QQUB) |
| | When the reload cube command is executed for 10 different cubes at the same time, some cubes fail to be reloaded. | [I4VSVJ](https://e.gitee.com/open_lookeng/issues/list?issue=I4VSVJ) |

## Obtaining the Document

For details, see [https://gitee.com/openlookeng/hetu-core/tree/1.6.0/hetu-docs/en](https://gitee.com/openlookeng/hetu-core/tree/1.6.0/hetu-docs/en)

15 changes: 15 additions & 0 deletions hetu-docs/en/releasenotes/releasenotes-1.6.1.md
@@ -0,0 +1,15 @@
# Release 1.6.1 (27 Apr 2022)

## Key Features

This release mainly modifies and enhances several SPIs so that they can be used in more scenarios.

| Area | Feature | PR #s |
| ----------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Data Source statistics | Adds a way to obtain statistics directly from the Connector. Some operators can be pushed down to the Connector for calculation, and statistics may need to be fetched from the Connector to show the amount of data actually processed. | 1450 |
| Operator processing extension | Users can customize the physical execution plan of worker nodes, replacing the native implementation with their own operator pipelines to accelerate operator processing. | 1436 |
| HIVE UDF extension | Adds adaptation of the HIVE UDF function namespace to support executing UDFs (including GenericUDF) written on the HIVE UDF framework. | 1456 |

## Obtaining the Document

For details, see [https://gitee.com/openlookeng/hetu-core/tree/1.6.1/hetu-docs/en](https://gitee.com/openlookeng/hetu-core/tree/1.6.1/hetu-docs/en)
24 changes: 24 additions & 0 deletions hetu-docs/zh/admin/extension-execution-planner.md
@@ -0,0 +1,24 @@
# Extension Physical Execution Planner
This section describes how to add an extension physical execution planner in openLooKeng. With the extension physical execution planner, openLooKeng can use other operator acceleration libraries to speed up the execution of SQL statements.

## Configuration
Add the following configuration to `config.properties`:

``` properties
extension_execution_planner_enabled=true
extension_execution_planner_jar_path=file:///xxPath/omni-openLooKeng-adapter-1.6.1-SNAPSHOT.jar
extension_execution_planner_class_path=nova.hetu.olk.OmniLocalExecutionPlanner
```

The above properties are described as follows:

- `extension_execution_planner_enabled`: Whether to enable the extension physical execution planner feature.
- `extension_execution_planner_jar_path`: The file path of the extension jar package.
- `extension_execution_planner_class_path`: The fully qualified name of the execution plan generation class in the extension jar.


## Usage
While openLooKeng is running, the following command in the WebUI or CLI controls whether the extension physical execution planner is enabled:
```
set session extension_execution_planner_enabled=true/false
```
2 changes: 1 addition & 1 deletion hetu-docs/zh/develop/connectors.md
@@ -31,7 +31,7 @@



### ConnectorSplitManger
### ConnectorSplitManager

The split manager partitions a table's data into chunks, which openLooKeng distributes to worker nodes for processing.

3 changes: 3 additions & 0 deletions hetu-docs/zh/index.md
@@ -171,6 +171,7 @@ headless: true
- [SHOW CACHE]({{< relref "./docs/sql/show-cache.md" >}})
- [SHOW CATALOGS]({{< relref "./docs/sql/show-catalogs.md" >}})
- [SHOW COLUMNS]({{< relref "./docs/sql/show-columns.md" >}})
- [SHOW CREATE CUBE]({{< relref "./docs/sql/show-create-cube.md" >}})
- [SHOW CREATE TABLE]({{< relref "./docs/sql/show-create-table.md" >}})
- [SHOW CREATE VIEW]({{< relref "./docs/sql/show-create-view.md" >}})
- [SHOW FUNCTIONS]({{< relref "./docs/sql/show-functions.md" >}})
@@ -215,6 +216,8 @@ headless: true
- [Task Resource]({{< relref "./docs/rest/task.md" >}})

- [Release Notes]("#")
- [1.6.1 (27 Apr 2022)]({{< relref "./docs/releasenotes/releasenotes-1.6.1.md" >}})
- [1.6.0 (30 Mar 2022)]({{< relref "./docs/releasenotes/releasenotes-1.6.0.md" >}})
- [1.5.0 (30 Dec 2021)]({{< relref "./docs/releasenotes/releasenotes-1.5.0.md" >}})
- [1.4.1 (12 Nov 2021)]({{< relref "./docs/releasenotes/releasenotes-1.4.1.md" >}})
- [1.4.0 (15 Oct 2021)]({{< relref "./docs/releasenotes/releasenotes-1.4.0.md" >}})
23 changes: 23 additions & 0 deletions hetu-docs/zh/releasenotes/releasenotes-1.6.0.md
@@ -0,0 +1,23 @@
# Release 1.6.0

## Key Features

| Area | Description |
| --------------------- | ------------------------------------------------------------ |
| Star Tree | Support the update cube command, allowing admins to easily update the contents of an existing cube when the underlying data changes |
| Bloom Index | Optimize the Bloom filter index size, reducing it by more than 10x |
| Task Recovery | 1. Improve failure detection time: it currently takes 300 seconds to determine that a task has failed before resuming; improving this improves the execution flow and the overall query time<br/>2. Optimize snapshot time and size: snapshots currently use plain Java serialization, which is slow and takes more space; Kryo serialization reduces file size and increases speed, raising overall throughput |
| Spill to Disk | 1. Improve spill speed and size: when a spill happens during HashAggregation and GroupBy, serializing the data to disk is slow and the files are large; Kryo serialization improves the write speed and reduces the spill file size, improving overall performance<br/>2. Support spilling to HDFS: data can currently spill to multiple disks; spilling to HDFS improves throughput<br/>3. Async spill/unspill: when revocable memory crosses the threshold and a spill is triggered, accepting data from downstream operators is blocked; accepting it and adding it to the ongoing spill helps finish tasks faster<br/>4. Support spill for right outer and full joins: the build side is not spilled for these join types because all of its data is needed in memory for lookup, which leads to out of memory on large data; instead, enable spill and create a Bloom filter to identify the spilled data and use it during the join with the probe side |
| Connector Enhancement | Enhance the PostgreSQL and openGauss connectors to support data update and delete operations |

## Known Issues

| Category | Description | Gitee issue |
| ------------- | ------------------------------------------------------------ | --------------------------------------------------------- |
| Task Recovery | When snapshot is enabled and a CTAS statement with a transaction is executed, the SQL statement reports an error | [I502KF](https://e.gitee.com/open_lookeng/issues/list?issue=I502KF) |
| | An error occurs occasionally when snapshot is enabled and exchange.is-timeout-failure-detection-enabled is disabled | [I4Y3TQ](https://e.gitee.com/open_lookeng/issues/list?issue=I4Y3TQ) |
| Star Tree | In the memory connector, after the star tree feature is enabled, data inconsistency occasionally occurs during queries | [I4QQUB](https://e.gitee.com/open_lookeng/issues/list?issue=I4QQUB) |
| | When the reload cube command is executed for 10 different cubes at the same time, some cubes fail to be reloaded | [I4VSVJ](https://e.gitee.com/open_lookeng/issues/list?issue=I4VSVJ) |

## Obtaining the Document

For details, see [https://gitee.com/openlookeng/hetu-core/tree/1.6.0/hetu-docs/zh](https://gitee.com/openlookeng/hetu-core/tree/1.6.0/hetu-docs/zh)
15 changes: 15 additions & 0 deletions hetu-docs/zh/releasenotes/releasenotes-1.6.1.md
@@ -0,0 +1,15 @@
# Release 1.6.1 (27 Apr 2022)

## Key Features

This release mainly modifies and enhances several SPIs so that they can be used in more scenarios.

| Area | Feature | PR #s |
| ----------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Data source statistics | Adds a way to obtain statistics directly from the Connector. Some operators can be pushed down to the Connector for calculation, and statistics may need to be fetched from the Connector to show the amount of data actually processed. | 1450 |
| Operator processing extension | Adds user-defined generation of the physical execution plan on worker nodes. Users can implement their own operator pipelines to replace the native implementation and accelerate operator processing. | 1436 |
| HIVE UDF extension | Adds adaptation of the HIVE UDF function namespace to support executing UDFs (including GenericUDF) written on the HIVE UDF framework. | 1456 |

## Obtaining the Document

For details, see [https://gitee.com/openlookeng/hetu-core/tree/1.6.1/hetu-docs/zh](https://gitee.com/openlookeng/hetu-core/tree/1.6.1/hetu-docs/zh)