merge master #46486

Merged
merged 457 commits into from Jan 7, 2025
Conversation

eldenmoon
Member

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Mryange and others added 30 commits December 6, 2024 15:59
…tate. (apache#45001)

### What problem does this PR solve?

Problem Summary:

Removed unnecessary constructors from RuntimeState
### What problem does this PR solve?

In the past, Doris had enabled -Wshadow-field, which is a subset of
-Wshadow.
Therefore, we added -Wshadow in common/compile_check_begin.h and will
gradually enable it.
Problem Summary:
1. Fix wrong gram_size 256 for ngram bf index DDL.
2. Enhance handling of an empty `bloom_filter_columns` property in bf index
DDL.
3. Add FE UT for ngram and bf index DDL.
### What problem does this PR solve?

Problem Summary:
Support handling const columns in Java UDFs.
Execution time before and after:
```

mysql [ssb]>select count(test_query_qa.java_udf_test3(c_custkey,"asd")) from test_udf;
+-----------------------------------------+
| count(java_udf_test3(c_custkey, 'asd')) |
+-----------------------------------------+
|                                36000000 |
+-----------------------------------------+
1 row in set (0.68 sec)

------------

mysql [ssb]>select count(test_query_qa.java_udf_test3(c_custkey,"asd")) from test_udf;
+-----------------------------------------+
| count(java_udf_test3(c_custkey, 'asd')) |
+-----------------------------------------+
|                                36000000 |
+-----------------------------------------+
1 row in set (0.52 sec)


```

### Release note
Support handling const columns in Java UDFs.
Related PR: apache#44410

Problem Summary:
Add code checks for some MOW code.
…pache#45105)

### What problem does this PR solve?
An instance may be closed early because its sink is finished, so that
instance will miss the EOS. The shared hash table join build sink copies
runtime filters (RFs) from other instances when the sink meets EOS, then
uses those RFs in the close() method. To avoid errors in RF processing,
this PR moves the RF-copying logic into close().


![image](https://github.com/user-attachments/assets/f80aee0b-4b23-4095-b49b-68f6e2e56f25)
apache#45085)

Problem Summary:
It causes the data dir to be undeletable, because the table shared_ptr
use count is 2.
…Support Kerberos Ticket Auto-Renewal (apache#44916)

### Background
The current implementation uses the HadoopUGI method, which invokes the
ugiDoAs function for each operation to log in and execute actions based
on the configuration. However, this approach has the following issues:

- Lack of Auto-Renewal: If the Kerberos TGT (Ticket Granting Ticket)
expires, manual re-login is required as there is no support for
automatic ticket renewal.
- Redundant Login Overhead: Each operation requires reinitializing or
checking UserGroupInformation, potentially causing performance
bottlenecks.
- Complex Management: The HadoopUGI design does not unify the lifecycle
management of UGI instances, leading to duplicated logic across the
codebase.
### Objective

- Auto-Renewal: Automatically renew Kerberos credentials when the TGT is
expired or near expiry.
- UGI Caching: Maintain reusable UserGroupInformation instances during
their lifecycle to avoid repetitive logins.
- Unified Management: Simplify the management of UGI instances and
Kerberos credentials.
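
As an illustration of the cached-UGI pattern, here is a minimal Java sketch built on Hadoop's real `UserGroupInformation` API; the wrapper class and its names are hypothetical, not the actual Doris implementation:

```java
import java.security.PrivilegedExceptionAction;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical wrapper: one cached UGI per principal, re-login before use.
public class CachedKerberosAuthenticator {
    private final ConcurrentHashMap<String, UserGroupInformation> ugiCache =
            new ConcurrentHashMap<>();

    public <T> T doAs(String principal, String keytab,
                      PrivilegedExceptionAction<T> action) throws Exception {
        UserGroupInformation ugi = ugiCache.computeIfAbsent(principal, p -> {
            try {
                // Log in once and cache the UGI for the principal's lifecycle.
                return UserGroupInformation.loginUserFromKeytabAndReturnUGI(p, keytab);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        // No-op unless the TGT is expired or close to expiry.
        ugi.checkTGTAndReloginFromKeytab();
        return ugi.doAs(action);
    }
}
```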
### What problem does this PR solve?
Problem Summary:
Fix the unstable manager show proc case.
Replace the pipeline-unfriendly `select sleep(xxx)` in the case.
### What problem does this PR solve?

Add comments to the job's test cases
). (apache#44845)

Also fix stopping splits too early in select-join statements when
enable_profile is true. Related to issue apache#40683.

Conditions for the case to appear: batch mode is used,
enable_profile=true is set, and the Hive table has enough splits
(e.g., more than 10).

### What problem does this PR solve?
When enable_profile is true, in a query with a join like "select
dt.d_year, dt.d_moy from date_dim dt, store_sales where dt.d_date_sk =
store_sales.ss_sold_date_sk group by dt.d_year, dt.d_moy order by
dt.d_year, dt.d_moy limit 100",
the old fix in commit 83797e3 stopped SplitAssignment in
FileScanNode.getNodeExplainString, which is also called during a normal
query (in StmtExecutor.updateProfile); the BE would then get empty splits
and produce wrong results.
So I moved the stopping of SplitAssignment so that it happens only for
explain-only statements.
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: apache#44439
…44680)

Related PR: apache#44344
`VTabletWriterV2::_select_streams()` already checks whether there are
enough downstream BEs to meet the replication requirements, so
`VTabletWriterV2::close()` should tolerate those non-open streams while
waiting for close.

Debug point `VTabletWriterV2._open_streams.skip_two_backends` is added
along with `VTabletWriterV2._open_streams.skip_one_backend` to check
this behavior.
…y_minmax case failure. (apache#45121)

### What problem does this PR solve?
Add log to investigate nereids_p0/stats/partition_key_minmax case
failure.

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None
…rces exist (apache#45125)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: apache#39597 

Problem Summary:
When authorization includes create, not check if resources exist
…he#45148)

### What problem does this PR solve?
Problem Summary:
Optimize reading of MaxCompute partition tables:
1. Introduce batch mode to generate splits for MaxCompute partition
tables, optimizing scenarios with a large number of partitions. Control
it through the variable `num_partitions_in_batch_mode`.
2. Introduce the catalog parameter `mc.split_cross_partition`. When true,
it is friendlier for reading partition tables; when false, it is
friendlier for debugging.
3. Add `-Darrow.enable_null_check_for_get=false` to the BE JVM to improve
the efficiency of MaxCompute Arrow data conversion.
apache#44701)

Issue Number: close apache#42843
Support the ADMIN CHECK tablet list command in Nereids.
…e#45147)

### What problem does this PR solve?
set "runtime_filter_mode=off" to avoid interference of runtime filter in
the case "invalid_stats.groovy"
…#44365)

### What problem does this PR solve?

```sql
CREATE TABLE `array_test` (
  `id` int(11) NULL COMMENT "",
  `c_array` ARRAY<int(11)> NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`id`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2"
);
```
This SQL works:
```sql
select *, array_map(x -> x+1, c_array) from array_test
```
but lambda expressions in `array_map` did not support field arguments;

for example:
```sql
select *, array_map(x -> x+id, c_array) from array_test
```

This PR solves two problems:
1. Support column arguments in lambda expressions.
2. Prevent high memory usage due to large block data.

Related PR: #xxx
Let lambda expressions support referring to outer slots.
Co-authored-by: garenshi <[email protected]>
…pache#45057)

We do not need to transition non-TTL cache blocks to normal; just exclude them.
…concurrency. (apache#44850)

### What problem does this PR solve?

In the past, each exchange sink had its own sink buffer.  
If the query concurrency is n, there would be n * n RPCs running
concurrently
in a typical shuffle scenario (each sender instance can send data to all
downstream instances).
Here, we introduce support for shared sink buffers.  
This does not reduce the total number of RPCs but can limit the number
of concurrent RPCs.
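
The actual change is in the Doris BE; the following Java sketch only illustrates the idea of capping in-flight RPCs with one shared pool across all sender instances (all names are illustrative):

```java
import java.util.concurrent.Semaphore;

// Illustrative only: sender instances share one permit pool, so at most
// maxConcurrentRpcs RPCs are in flight at any time, while the total
// number of RPCs issued stays the same.
public class SharedSinkBufferSketch {
    private final Semaphore inFlight;

    public SharedSinkBufferSketch(int maxConcurrentRpcs) {
        this.inFlight = new Semaphore(maxConcurrentRpcs);
    }

    public void sendBlock(Runnable rpcCall) throws InterruptedException {
        inFlight.acquire();       // block when the concurrency cap is reached
        try {
            rpcCall.run();        // issue the RPC
        } finally {
            inFlight.release();   // free a slot for the next queued sender
        }
    }
}
```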
…tedly (apache#45043)

### What problem does this PR solve?
in previous PR#33087, we extract common sub expression, and set
multi-layer projects to save computing effort.
project: f1(h(a)), f2(h(a))
=>
multi-layer:
[
L0: a
L1: a, h(a) as x
L2: a, h(a) as x, f1(x), f2(x)
]
"h(a) as x " is computed at layer L1 and layer L2.
this PR avoids the duplicated computing in L2 by set L2 as
L2: a, x, f1(x), f2(x)
…#44092)

### What problem does this PR solve?

Problem Summary:

`RowIdConversion::init_segment_map` may use a lot of memory during
compaction. However, this memory is not tracked, so the process may
crash with OOM.

If the process has less than 10 MB of available memory before generating
the `rowid_map` for each segment in `init_segment_map`, the task now
terminates instead of crashing.


![image](https://github.com/user-attachments/assets/652fadc9-bca0-4edd-a03e-04f2546cdf76)
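
The guard itself lives in the BE; as a rough Java sketch of the pattern only, assuming a hypothetical `availableBytes` probe of process-available memory:

```java
// Illustrative guard only, not the Doris BE code: fail the compaction task
// early when memory is nearly exhausted, instead of letting the process OOM.
public class RowIdMapGuardSketch {
    static final long MIN_AVAILABLE_BYTES = 10L * 1024 * 1024; // 10 MB floor

    // availableBytes is a hypothetical measurement of available memory.
    static void checkBeforeBuildingRowIdMap(long availableBytes) {
        if (availableBytes < MIN_AVAILABLE_BYTES) {
            throw new IllegalStateException(
                    "compaction aborted: < 10 MB available before building rowid_map");
        }
    }
}
```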
### What problem does this PR solve?

```
private static void logAuditLogImpl(ConnectContext ctx, String origStmt, StatementBase parsedStmt,
            org.apache.doris.proto.Data.PQueryStatistics statistics, boolean printFuzzyVariables) {
       
        //........
        // When we execute a statement, executeQuery parses the SQL with the new
        // optimizer first; if parsing fails, handleQueryException records the
        // audit log. At that point no executor has been set yet, resulting in the NPE.
        if (!Env.getCurrentEnv().isMaster()) {
            if (ctx.executor != null && ctx.executor.isForwardToMaster()) {
                auditEventBuilder.setState(ctx.executor.getProxyStatus());
                int proxyStatusCode = ctx.executor.getProxyStatusCode();
                if (proxyStatusCode != 0) {
                    auditEventBuilder.setErrorCode(proxyStatusCode);
                    auditEventBuilder.setErrorMessage(ctx.executor.getProxyErrMsg());
                }
            }
        }
        if (ctx.getCommand() == MysqlCommand.COM_STMT_PREPARE && ctx.getState().getErrorCode() == null) {
            auditEventBuilder.setState(String.valueOf(MysqlStateType.OK));
        }
        Env.getCurrentEnv().getWorkloadRuntimeStatusMgr().submitFinishQueryToAudit(auditEventBuilder.build());
    }
```

Co-authored-by: garenshi <[email protected]>
…ache#45061)

Downloading small files is too slow and might cause the clone tablet
task to time out. This PR supports a batch downloading API to speed up
the downloading of small files.

Before

```
succeed to copy tablet 10088, total file size: 19256126 B, cost: 78674 ms, rate: 0.244758 MB/s
```

After

```
succeed to copy tablet 30157, total files: 20006, total file size: 19311624 B, cost: 4016 ms, rate: 4.80867 MB/s
```
morningman and others added 22 commits December 22, 2024 19:02
…5756)

### What problem does this PR solve?

Related PR: apache#45433

Problem Summary:
the `confLock` should be created after replaying in `gsonPostProcess()`
of `ExternalCatalog`, or it will be null.
### What problem does this PR solve?

Related PR: apache#45355

Problem Summary:
The `sessionVariable` field is already in parent class
`FileQueryScanNode`,
remove it from `HudiScanNode`.
…CT_COMPRESS (apache#45738)

Related PR: apache#44414

Problem Summary:
In inverted index version 3 mode, using dictionary compression may
lead to incorrect results after a seek operation.
…ache#45298)

The conditions that need to be met to trigger the bug, with the second
condition being somewhat difficult to trigger, are as follows:
1. The number of tablets that need to be fixed exceeds 2000 (in the
pending queue);
2. The scheduling of the lowest priority in the pending queue has
previously experienced a clone failure, with fewer than 3 failures, and
has been put back into the pending queue. Additionally, a new scheduling
request that happens to belong to the same table as the previous one has
a higher priority than the previous scheduling.

The fix is to use tryLock on the write lock in finalizeTabletCtx. If the
lock cannot be obtained, the current scheduling fails and the next one is
rescheduled.

The blocked thread looked like this:
```
"colocate group clone checker" apache#7557 daemon prio=5 os_prio=0 cpu=686.24ms elapsed=6719.45s tid=0x00007f3e6c039ab0 nid=0x17b08 waiting on condition  [0x00007f3ec77fe000]
(1 similar threads)
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x000010014d223908> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:715)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:938)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock([email protected]/ReentrantReadWriteLock.java:959)
        at org.apache.doris.common.lock.MonitoredReentrantReadWriteLock$WriteLock.lock(MonitoredReentrantReadWriteLock.java:98)
        at org.apache.doris.catalog.Table.writeLockIfExist(Table.java:211)
        at org.apache.doris.clone.TabletSchedCtx.releaseResource(TabletSchedCtx.java:940)
        at org.apache.doris.clone.TabletSchedCtx.releaseResource(TabletSchedCtx.java:898)
        at org.apache.doris.clone.TabletScheduler.releaseTabletCtx(TabletScheduler.java:1743)
        at org.apache.doris.clone.TabletScheduler.finalizeTabletCtx(TabletScheduler.java:1625)
        at org.apache.doris.clone.TabletScheduler.addTablet(TabletScheduler.java:287)
        - locked <0x0000100009429110> (a org.apache.doris.clone.TabletScheduler)
        at org.apache.doris.clone.ColocateTableCheckerAndBalancer.matchGroups(ColocateTableCheckerAndBalancer.java:563)
        at org.apache.doris.clone.ColocateTableCheckerAndBalancer.runAfterCatalogReady(ColocateTableCheckerAndBalancer.java:340)
        at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58)
        at org.apache.doris.common.util.Daemon.run(Daemon.java:119)
```
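
A minimal Java sketch of the trylock pattern described above (the method shape and names are illustrative, not the actual TabletSchedCtx code):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative: try the table write lock with a timeout instead of blocking
// the checker thread; on failure this scheduling simply fails and the tablet
// is picked up again in a later round.
public class TryLockSketch {
    static boolean releaseResource(ReentrantReadWriteLock tableLock) {
        boolean locked = false;
        try {
            locked = tableLock.writeLock().tryLock(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!locked) {
            return false; // give up now; reschedule instead of deadlocking
        }
        try {
            // ... release replica resources under the write lock ...
            return true;
        } finally {
            tableLock.writeLock().unlock();
        }
    }
}
```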
Pick a random RPC coordinator to serve fetch_remote_tablet_schema.
### What problem does this PR solve?
Problem Summary:
In Hive, the ORC file format supports fixed-length CHAR types (CHAR(n))
by padding strings with spaces to ensure the fixed length. When data is
written into ORC tables, the actual stored value includes additional
trailing spaces to meet the defined length. These padded spaces are also
considered during the computation of statistics.

However, in Doris, fixed-length CHAR types (CHAR(n)) and variable-length
VARCHAR types are internally represented as the same type. Doris does
not pad CHAR values with spaces and treats them as regular strings. As a
result, when Doris reads ORC files generated by Hive and parses the
statistics, the differences in the handling of CHAR types between the
two systems can lead to inconsistencies or incorrect statistics.
```sql
create table fixed_char_table (
  i int,
  c char(2)
) stored as orc;

insert into fixed_char_table values(1,'a'),(2,'b '), (3,'cd');
select * from fixed_char_table where c = 'a';
```
before
```text
empty
```
after
```text
1	a
```

If a Hive table undergoes a schema change, such as a column’s type being
modified from INT to STRING, predicate pushdown should be disabled in
such cases. Performing predicate pushdown under these circumstances may
lead to incorrect filtering, as the type mismatch can cause errors or
unexpected behavior during query execution.
```sql
create table type_changed_table (
  id int,
  name string 
) stored as orc;
insert into type_changed_table values (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
ALTER TABLE type_changed_table CHANGE COLUMN id id STRING;
select * from type_changed_table where id = '1';
```
before
```text
empty
```
after
```text
1	Alice
```
### Release note
[fix](orc) Do not push down predicates on fixed-length char columns in the ORC reader apache#45484
…he#45748)

Optimize rewrite of synchronized materialized views:
1. Cache toSql.
2. Fast-parse UnboundSlot in NereidsParser.parseExpression.
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: apache#45382

Problem Summary:

apache#45382 fixed the wrong cutting when comparing date/datev1 with a
datetime literal, but the fix was incomplete.

```
if (right instanceof DateTimeLiteral) {
                    DateTimeLiteral dateTimeLiteral = (DateTimeLiteral) right;
                    right = migrateToDateV2(dateTimeLiteral);
                    if (dateTimeLiteral.getHour() != 0 || dateTimeLiteral.getMinute() != 0
                            || dateTimeLiteral.getSecond() != 0) {
                            ...
                    }
}
```


In the above code, we check whether `right` is a datetime literal; but
note that the datetimev2 literal is a child class of the datetime
literal, so datetimev2 literals also take this code path. A datetimev2
literal should additionally check that its microseconds are not equal to 0.

For example, `date_a = '2020-01-01 00:00:00.01'` should be optimized to
`FALSE`, not to `date_a = '2020-01-01'`.
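
A sketch of the completed check, following the fragment above; `DateTimeV2Literal` and its microsecond accessor are assumed from the Nereids literal hierarchy, not copied from the actual patch:

```java
// Sketch only: extend the time-part check so datetimev2 literals also
// account for their fractional seconds.
if (right instanceof DateTimeLiteral) {
    DateTimeLiteral dateTimeLiteral = (DateTimeLiteral) right;
    right = migrateToDateV2(dateTimeLiteral);
    boolean hasTimePart = dateTimeLiteral.getHour() != 0
            || dateTimeLiteral.getMinute() != 0
            || dateTimeLiteral.getSecond() != 0;
    if (dateTimeLiteral instanceof DateTimeV2Literal) {
        // datetimev2 is a subclass of datetime: also check microseconds.
        hasTimePart = hasTimePart
                || ((DateTimeV2Literal) dateTimeLiteral).getMicroSecond() != 0;
    }
    if (hasTimePart) {
        // fold the comparison to FALSE instead of dropping the time part
    }
}
```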
…apache#45588)

### What problem does this PR solve?

Previously, we allowed a ColumnPtr to be implicitly converted to a raw IColumn*:
```C++
    ColumnPtr column;  
    const IColumn* ptr = column;  
```
This can easily cause confusion.  
For example, in the following code: 
```C++
    ColumnPtr column;  
    const auto& const_column = check_and_get_column<ColumnConst>(column);  
```
The matched function is:  
```C++
template <>  
const doris::vectorized::ColumnConst* check_and_get_column<doris::vectorized::ColumnConst>(  
        const IColumn* column)  
```
However, the actual type of const_column is:  
```C++
const doris::vectorized::ColumnConst* const&
```

### Release note

None

### Check List (For Author)

- Test
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [x] No need to test or manual test. Explain why:
        - [x] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason

- Behavior changed:
    - [x] No.
    - [ ] Yes.

- Does this need documentation?
    - [x] No.
    - [ ] Yes.

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
…lead to exception in BE (apache#45265)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

1. Totally refactored `FunctionDateOrDateTimeComputation`: removed some
unnecessary functions and templates, and simplified some template
calculations.
2. All datetime arithmetic overflow now raises an exception; previously,
for nullable input, it produced a `NULL` result.
See:
```sql
mysql> select date_add('5000-10-10', interval 10000 year);
+------------------------------------------------+
| years_add(cast('5000-10-10' as DATEV2), 10000) |
+------------------------------------------------+
| NULL                                           |
+------------------------------------------------+
1 row in set (0.10 sec)
```
now:
```sql
ERROR 1105 (HY000): errCode = 2, detailMessage = (xxx)[E-218][E-218] Operation years_add of 5000-10-10, 10000 out of range
```

### Release note

All datetime arithmetic overflow now raises an exception.

### Check List (For Author)

- Test
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [x] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [x] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason

- Behavior changed:
    - [ ] No.
    - [x] Yes.

- Does this need documentation?
    - [x] No.
    - [ ] Yes.

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
…ndition when updating tablet cumu point (apache#45643)

Currently, when setting tablet's cumu point, aseert fail will happend if
new point is less than local value, resulting BE coredump.

This could happend when race condition happend:
1. thread A try to sync rowset
2. thread A fetch cumu point from ms 
3. thread B update cumu point(like sc/compaction),commit to ms after 2.
and set be tablet cumu point before 4.
4. thread A try to set cumu point seen before and meet the assertion,
coredump.
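
One way to picture a tolerant update (the real fix is in the BE code; this Java sketch is illustrative only):

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative: advance the cumulative point monotonically, so a stale,
// smaller value from a racing thread is ignored rather than asserted on.
public class CumuPointSketch {
    private final AtomicLong cumuPoint = new AtomicLong();

    public void setCumuPoint(long newPoint) {
        // max-update keeps the larger of the current and proposed values
        cumuPoint.accumulateAndGet(newPoint, Math::max);
    }
}
```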
### What problem does this PR solve?

Add elapsed time to the log.
Problem Summary:
1. Adding an inverted index cache toggle can help with debugging.
### What problem does this PR solve?

Add tests for simplifying comparison predicates.
… case (apache#45784)

Problem Summary:
Related PR: apache#45127

When `enable_match_without_inverted_index` is set to `false`,
`enable_common_expr_pushdown` must be `true`; otherwise it throws an
`[E-6001]match_any not support execute_match` error.
Add a unit test for the token extractor of the ngram bf index.
### What problem does this PR solve?

Don't call MetricRepo if it is not initialized, to avoid an NPE.

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None
### What problem does this PR solve?

1. Remove `IntAtomicCounter`; it is equivalent to `IntCounter`.
2. Remove `CoreLocal`-related code. It is not used anymore.
@Thearas
Contributor

Thearas commented Jan 6, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@eldenmoon eldenmoon changed the title Variant sparse merge master Jan 6, 2025
@eldenmoon eldenmoon merged commit 55bf035 into apache:variant-sparse Jan 7, 2025
11 of 14 checks passed