Druid 31.0.0 release notes #17092
docs/release-info/release-notes.md
Outdated
@@ -57,46 +57,541 @@ For tips about how to write a good release note, see [Release notes](https://git

This section contains important information about new and existing features.

### Compaction on MSQ
Let's rename this headline to Compaction Features, and then list:
- Compaction scheduler with greater flexibility and control over when and what to compact
- MSQ-based compaction for performant compaction jobs
- Concurrent compaction is now GA
No need to list all the nitty-gritty details as you have done right now. They can move to a different section or into the docs.
Updated. I asked @317brian to add the detail to the compaction docs.
Co-authored-by: Kashif Faraz <[email protected]>
Co-authored-by: 317brian <[email protected]>
docs/release-info/release-notes.md
Outdated
- Fixed an issue with `ScanQueryFrameProcessor` cursor build not adjusting intervals [#17168](https://github.com/apache/druid/pull/17168)
- Improved worker cancellation for the MSQ task engine to prevent race conditions [#17046](https://github.com/apache/druid/pull/17046)
- Improved memory management to better support multi-threaded workers [#17057](https://github.com/apache/druid/pull/17057)
- Reduced memory usage when transferring sketches between the MSQ task engine controller and worker [#16269](https://github.com/apache/druid/pull/16269)
This is duplicated from line 245. Also, a better way to word it might be "Add new format for serialization of sketches between MSQ controller and worker to reduce memory usage".
- Improved memory management to better support multi-threaded workers [#17057](https://github.com/apache/druid/pull/17057)
- Reduced memory usage when transferring sketches between the MSQ task engine controller and worker [#16269](https://github.com/apache/druid/pull/16269)
- Improved error handling when retrieving Avro schemas from registry [#16684](https://github.com/apache/druid/pull/16684)
- Fixed issues related to partitioning boundaries in the MSQ task engine's window functions [#16729](https://github.com/apache/druid/pull/16729)
This is duplicated from line 247.
Also, it's a nit but a better message might be: Fixed issues related to partitioning boundaries for window functions in the MSQ task engine
##### Other streaming ingestion improvements
[#16358](https://github.com/apache/druid/pull/16358)

#### Other SQL-based ingestion improvements
Is this file expected to contain all PRs marked with milestone 31.0.0? For example, I don't see #16804 mentioned, is that expected?
Hey Akshat, we typically don't include bug fixes unless there's a specific reason to. It's just new features/improvements. There are currently some fixes in there that I'll remove as part of the final cleanup.
It looks like 16804 and 17141 didn't have the bug label applied. Was that intentional?
I see, thanks for the info!
> It looks like 16804 and 17141 didn't have the bug label applied. Was that intentional?

Nope. I don't have access to update PR labels, but yes, both of those PRs are bug fixes.
- Reduced memory usage when transferring sketches between the MSQ task engine controller and worker [#16269](https://github.com/apache/druid/pull/16269)
- Improved error handling when retrieving Avro schemas from registry [#16684](https://github.com/apache/druid/pull/16684)
- Fixed issues related to partitioning boundaries in the MSQ task engine's window functions [#16729](https://github.com/apache/druid/pull/16729)
- Fixed a boost column issue causing quantile sketches to incorrectly estimate the number of output partitions to create [#17141](https://github.com/apache/druid/pull/17141)
Nit: This is specific to MSQ window functions, so we can maybe add that to the message: Fixed a boost column issue causing quantile sketches to incorrectly estimate the number of output partitions to create for window functions in the MSQ task engine
Also, I see this PR is also mentioned in the Other querying improvements section. Is that expected?
Nope, it should not be duplicated. Will remove
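Duplicates like the ones flagged in this review could be caught mechanically before the final cleanup. The following is only a rough sketch, assuming PR references always use the `[#NNNN](https://github.com/apache/druid/pull/NNNN)` link form seen in the diff; the function name `duplicated_prs` is made up for illustration and is not part of any Druid tooling.

```python
import re
from collections import defaultdict

# Matches links of the form [#16269](https://github.com/apache/druid/pull/16269).
PR_LINK = re.compile(r"\[#(\d+)\]\(https://github\.com/apache/druid/pull/\1\)")

def duplicated_prs(markdown_text):
    """Map each PR number that appears on more than one line to its 1-based line numbers."""
    seen = defaultdict(list)
    for line_number, line in enumerate(markdown_text.splitlines(), start=1):
        # Use a set so a PR linked twice on the same line counts once for that line.
        for pr in set(PR_LINK.findall(line)):
            seen[pr].append(line_number)
    return {pr: lines for pr, lines in seen.items() if len(lines) > 1}

sample_notes = "\n".join([
    "- Reduced memory usage [#16269](https://github.com/apache/druid/pull/16269)",
    "- Improved memory management [#17057](https://github.com/apache/druid/pull/17057)",
    "- Reduced memory usage [#16269](https://github.com/apache/druid/pull/16269)",
])
print(duplicated_prs(sample_notes))
```

A PR may legitimately appear in more than one section, so the result is a list of candidates to review rather than a list of errors.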
#16887 is not added in the release notes. A line item somewhere would be good.
@LakshSingla It doesn't look like it's in the milestone. Should I add it to the milestone too?
docs/release-info/release-notes.md
Outdated

### Projections (experimental)

Druid 31.0.0 includes experimental support for projections in segments. Like materialized views, projections can improve the performance of queries by optimizing the route the query takes when it executes.
OK, I gave this a shot and also included some instructions on how to use the feature, since it isn't documented yet:
Druid 31.0.0 includes experimental support for a new feature called projections. Projections are grouped pre-aggregates of a segment that are automatically used at query time to optimize execution for any queries which 'fit' the shape of the projection. They reduce both computation and I/O cost by reducing the number of rows that need to be processed. Projections are contained within segments of a datasource and do increase the segment size, but they can share data, such as value dictionaries of dictionary-encoded columns, with the columns of the base segment.
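To make the "grouped pre-aggregate" idea concrete, here is a toy sketch in plain Python. It is not Druid code and says nothing about how segments store projections; the rows, the hourly bucketing, and the helper names `build_projection` and `sum_added_by_channel` are all invented for illustration. It only shows why a query that fits the projection's shape touches fewer rows.

```python
from collections import defaultdict

HOUR_MS = 3_600_000

# Toy base rows: (timestamp_ms, channel, page, added). Values are made up.
base_rows = [
    (0, "#en", "Druid", 10),
    (1_000, "#en", "Druid", 5),
    (2_000, "#en", "Kafka", 7),
    (HOUR_MS + 1, "#en", "Druid", 3),
    (HOUR_MS + 2, "#fr", "Druid", 4),
]

def build_projection(rows):
    """Pre-aggregate rows by (hour, channel, page), summing `added`."""
    grouped = defaultdict(int)
    for ts, channel, page, added in rows:
        hour = ts - ts % HOUR_MS  # floor the timestamp to the hour
        grouped[(hour, channel, page)] += added
    return dict(grouped)

def sum_added_by_channel(projection):
    """Answer a coarser query (group by channel) from the pre-aggregated rows alone."""
    totals = defaultdict(int)
    for (_hour, channel, _page), sum_added in projection.items():
        totals[channel] += sum_added
    return dict(totals)

projection = build_projection(base_rows)
print(len(base_rows), "base rows ->", len(projection), "projection rows")
print(sum_added_by_channel(projection))
```

The coarser query reads four pre-aggregated rows instead of five base rows here; the savings grow with how many base rows share the same grouping values.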
As an experimental feature, projections are not well documented yet, but can be defined for streaming ingestion and 'classic' batch ingestion as part of the `dataSchema`. For example, using the standard wikipedia example:

    "dataSchema": {
      "granularitySpec": {
        ...
      },
      "dataSource": ...,
      "timestampSpec": {
        ...
      },
      "dimensionsSpec": {
        ...
      },
      "projections": [
        {
          "type": "aggregate",
          "name": "channel_page_hourly_distinct_user_added_deleted",
          "groupingColumns": [
            {
              "type": "long",
              "name": "__gran"
            },
            {
              "type": "string",
              "name": "channel"
            },
            {
              "type": "string",
              "name": "page"
            }
          ],
          "virtualColumns": [
            {
              "type": "expression",
              "expression": "timestamp_floor(__time, 'PT1H')",
              "name": "__gran",
              "outputType": "LONG"
            }
          ],
          "aggregators": [
            {
              "type": "HLLSketchBuild",
              "name": "distinct_users",
              "fieldName": "user",
              "round": true
            },
            {
              "type": "longSum",
              "name": "sum_added",
              "fieldName": "added"
            },
            {
              "type": "longSum",
              "name": "sum_deleted",
              "fieldName": "deleted"
            }
          ]
        },
        ...
      ]
    },
    ...

The `groupingColumns` define the order in which data is sorted in the projection. Instead of explicitly defining granularity as for the base table, you define it with a virtual column; during ingestion, the processing logic finds the 'finest' granularity virtual column that is a `timestamp_floor` expression and uses it as the `__time` column for the projection. Projections do not need to have a time column defined, in which case they can still match queries that are not grouping on time.
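As a rough illustration of the "finest granularity" selection described above, the sketch below picks the `timestamp_floor` virtual column with the smallest period. This is an assumption-laden approximation, not the actual ingestion logic: the regular expression, the small `PERIOD_MS` table, and the function name `finest_time_column` are all hypothetical, and only a handful of ISO 8601 periods are recognized.

```python
import re

# Durations in milliseconds for a few ISO 8601 periods (illustrative subset only).
PERIOD_MS = {"PT1S": 1_000, "PT1M": 60_000, "PT1H": 3_600_000, "P1D": 86_400_000}

# Matches expressions shaped like: timestamp_floor(__time, 'PT1H')
FLOOR_PATTERN = re.compile(r"^timestamp_floor\(\s*__time\s*,\s*'([^']+)'\s*\)$")

def finest_time_column(virtual_columns):
    """Return the name of the finest-granularity timestamp_floor virtual column, or None."""
    best_name, best_ms = None, None
    for column in virtual_columns:
        if column.get("type") != "expression":
            continue
        match = FLOOR_PATTERN.match(column.get("expression", "").strip())
        if match is None or match.group(1) not in PERIOD_MS:
            continue  # not a recognized timestamp_floor expression
        duration = PERIOD_MS[match.group(1)]
        if best_ms is None or duration < best_ms:
            best_name, best_ms = column["name"], duration
    return best_name

virtual_columns = [
    {"type": "expression", "expression": "timestamp_floor(__time, 'P1D')",
     "name": "__day", "outputType": "LONG"},
    {"type": "expression", "expression": "timestamp_floor(__time, 'PT1H')",
     "name": "__gran", "outputType": "LONG"},
]
print(finest_time_column(virtual_columns))
```

Returning `None` when nothing matches mirrors the case where a projection has no time column defined.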
Projections can currently only be defined by classic ingestion, but they can still be used by queries using MSQ or the new Dart engine. Future development will allow projections to be created as part of MSQ-based ingestion as well.
A few new query context flags have been added to aid in experimentation with projections:
- `useProjection` accepts a specific projection name and instructs the query engine that it must use that projection; the query fails if the projection does not match the query
- `forceProjections` accepts `true` or `false` and instructs the query engine that it must use a projection; the query fails if it cannot find a matching projection
- `noProjections` accepts `true` or `false` and instructs the query engines to not use any projections
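For experimentation, these flags would go in the query context like any other context parameter. The sketch below builds a request body of the shape accepted by Druid's SQL API (a `query` string plus a `context` object); the helper `build_sql_request`, its conflict check, and the sample SQL are assumptions made for illustration, and the sketch does not check whether the query actually matches the named projection.

```python
import json

def build_sql_request(query, use_projection=None, force_projections=False, no_projections=False):
    """Build a SQL request body carrying the projection context flags."""
    if no_projections and (use_projection is not None or force_projections):
        # Client-side sanity check added for this sketch; the source does not describe one.
        raise ValueError("noProjections conflicts with useProjection/forceProjections")
    context = {}
    if use_projection is not None:
        context["useProjection"] = use_projection
    if force_projections:
        context["forceProjections"] = True
    if no_projections:
        context["noProjections"] = True
    return {"query": query, "context": context}

request_body = build_sql_request(
    "SELECT channel, SUM(added) FROM wikipedia GROUP BY channel",
    use_projection="channel_page_hourly_distinct_user_added_deleted",
)
# This JSON would be POSTed to the SQL endpoint; no network call is made here.
print(json.dumps(request_body, indent=2))
```

Running the same query once with `noProjections` and once with `forceProjections` is one way to compare behavior with and without a projection.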
We have a lot of plans to continue to improve this feature in the coming releases, but are excited to get it out there so users can begin experimentation since projections can dramatically improve query performance.
I'm still working on the writeup for a design proposal for this. Another option would be to link to that from here, since it should contain some of this information.
I split this up. Part of it is in the highlight section and the details are in the Querying section. Also instead of including the JSON, I linked to it.
Release and upgrade notes for Druid 31.0.0
This PR has: