
Druid 31.0.0 release notes #17092

Open: wants to merge 30 commits into base branch `31.0.0`


writer-jill (Contributor)

Release and upgrade notes for Druid 31.0.0

This PR has:

  • been self-reviewed.

@@ -57,46 +57,541 @@ For tips about how to write a good release note, see [Release notes](https://git

This section contains important information about new and existing features.

### Compaction on MSQ
Contributor

Let's rename this headline to "Compaction features" and then list:

  • Compaction scheduler with greater flexibility and control over when and what to compact
  • MSQ-based compaction for performant compaction jobs
  • Concurrent compaction is now GA

There's no need to list all the nitty-gritty details as you have right now. They can move to a different section or to the docs.

writer-jill (Contributor Author), Oct 2, 2024

Updated. I asked @317brian to add the details to the compaction docs.

- Fixed an issue with `ScanQueryFrameProcessor` cursor build not adjusting intervals [#17168](https://github.com/apache/druid/pull/17168)
- Improved worker cancellation for the MSQ task engine to prevent race conditions [#17046](https://github.com/apache/druid/pull/17046)
- Improved memory management to better support multi-threaded workers [#17057](https://github.com/apache/druid/pull/17057)
- Reduced memory usage when transferring sketches between the MSQ task engine controller and worker [#16269](https://github.com/apache/druid/pull/16269)
adarshsanjeev (Contributor), Oct 8, 2024

This is duplicated from line 245. Also, a better way to word it might be "Add new format for serialization of sketches between MSQ controller and worker to reduce memory usage".

- Improved error handling when retrieving Avro schemas from registry [#16684](https://github.com/apache/druid/pull/16684)
- Fixed issues related to partitioning boundaries in the MSQ task engine's window functions [#16729](https://github.com/apache/druid/pull/16729)
Contributor

This is duplicated from line 247.

Also, it's a nit, but a better message might be: "Fixed issues related to partitioning boundaries for window functions in the MSQ task engine".

##### Other streaming ingestion improvements
[#16358](https://github.com/apache/druid/pull/16358)

#### Other SQL-based ingestion improvements
Contributor

Is this file expected to contain all PRs marked with milestone 31.0.0? For example, I don't see #16804 mentioned, is that expected?

317brian (Contributor), Oct 8, 2024

Hey Akshat, we typically don't include bug fixes unless there's a specific reason to. It's just new features/improvements. There are currently some fixes in there that I'll remove as part of the final cleanup.

It looks like 16804 and 17141 didn't have the bug label applied. Was that intentional?

Contributor

I see, thanks for the info!

> It looks like 16804 and 17141 didn't have the bug label applied. Was that intentional?

Nope. I don't have access to update PR labels, but yes, both of those PRs are bug fixes.

- Fixed a boost column issue causing quantile sketches to incorrectly estimate the number of output partitions to create [#17141](https://github.com/apache/druid/pull/17141)
Contributor

Nit: This is specific to MSQ window functions, so we could add that to the message: "Fixed a boost column issue causing quantile sketches to incorrectly estimate the number of output partitions to create for window functions in the MSQ task engine".

Also, I see this PR also mentioned in the Other querying improvements section - is that expected?

317brian (Contributor), Oct 9, 2024

Nope, it shouldn't be duplicated. I'll remove it.

LakshSingla (Contributor) left a comment

#16887 is not added in the release notes. A line item somewhere would be good.

317brian (Contributor) commented Oct 9, 2024

> #16887 is not added in the release notes. A line item somewhere would be good.

@LakshSingla It doesn't look like it's in the milestone. Should I add it to the milestone too?


### Projections (experimental)

Druid 31.0.0 includes experimental support for projections in segments. Like materialized views, projections can improve the performance of queries by optimizing the route the query takes when it executes.
clintropolis (Member), Oct 11, 2024

OK, I gave this a shot. I also included some instructions on how to use the feature, since it isn't documented yet:

Druid 31.0.0 includes experimental support for a new feature called projections. Projections are grouped pre-aggregates of a segment that are automatically used at query time to optimize execution for any queries that 'fit' the shape of the projection. They reduce both computation and I/O cost by reducing the number of rows that need to be processed. Projections are contained within the segments of a datasource and do increase segment size, but they can also share data, such as the value dictionaries of dictionary-encoded columns, with the columns of the base segment.
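To make "grouped pre-aggregate" concrete, here is a minimal Python sketch, not Druid code, that rolls up hypothetical Wikipedia-style rows by hour, channel, and page. It assumes rows are plain dicts with UTC millisecond timestamps, and it counts distinct users with an exact set as a stand-in for an approximate sketch such as HLL.

```python
from collections import defaultdict

HOUR_MS = 60 * 60 * 1000


def build_hourly_projection(rows):
    """Group rows by (hour bucket, channel, page) and pre-aggregate them.

    Illustrative only: distinct users are tracked with an exact set,
    standing in for an approximate distinct-count sketch.
    """
    groups = defaultdict(lambda: {"users": set(), "sum_added": 0, "sum_deleted": 0})
    for row in rows:
        # Same result as timestamp_floor(__time, 'PT1H') for UTC millisecond timestamps.
        gran = row["__time"] - row["__time"] % HOUR_MS
        agg = groups[(gran, row["channel"], row["page"])]
        agg["users"].add(row["user"])
        agg["sum_added"] += row["added"]
        agg["sum_deleted"] += row["deleted"]
    # Emit one row per group, ordered by the grouping columns.
    return [
        {"__gran": g, "channel": c, "page": p,
         "distinct_users": len(a["users"]),
         "sum_added": a["sum_added"], "sum_deleted": a["sum_deleted"]}
        for (g, c, p), a in sorted(groups.items())
    ]


rows = [
    {"__time": 1_000, "channel": "#en", "page": "Druid", "user": "a", "added": 10, "deleted": 1},
    {"__time": 2_000, "channel": "#en", "page": "Druid", "user": "b", "added": 5, "deleted": 0},
    {"__time": 3_000, "channel": "#en", "page": "Druid", "user": "a", "added": 1, "deleted": 2},
    {"__time": HOUR_MS + 5, "channel": "#fr", "page": "Druid", "user": "c", "added": 7, "deleted": 3},
]
projection = build_hourly_projection(rows)
print(len(rows), "->", len(projection))  # 4 -> 2
```

A query that groups by hour, channel, and page can be answered from the two pre-aggregated rows instead of the four raw rows, which is where the computation and I/O savings come from.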

As an experimental feature, projections are not well documented yet, but they can be defined for streaming ingestion and 'classic' batch ingestion as part of the `dataSchema`. For example, using the standard Wikipedia example:

    "dataSchema": {
      "granularitySpec": {
        ...
      },
      "dataSource": ...,
      "timestampSpec": {
        ...
      },
      "dimensionsSpec": {
        ...
      },
      "projections": [
        {
          "type": "aggregate",
          "name": "channel_page_hourly_distinct_user_added_deleted",
          "groupingColumns": [
            {
              "type": "long",
              "name": "__gran"
            },
            {
              "type": "string",
              "name": "channel"
            },
            {
              "type": "string",
              "name": "page"
            }
          ],
          "virtualColumns": [
            {
              "type": "expression",
              "expression": "timestamp_floor(__time, 'PT1H')",
              "name": "__gran",
              "outputType": "LONG"
            }
          ],
          "aggregators": [
            {
              "type": "HLLSketchBuild",
              "name": "distinct_users",
              "fieldName": "user",
              "round": true
            },
            {
              "type": "longSum",
              "name": "sum_added",
              "fieldName": "added"
            },
            {
              "type": "longSum",
              "name": "sum_deleted",
              "fieldName": "deleted"
            }
          ]
        },
        ...
      ]
    },
    ...

The `groupingColumns` define the order in which data is sorted in the projection. Instead of defining granularity explicitly, as for the base table, you define it with a virtual column: during ingestion, the processing logic finds the 'finest' granularity virtual column that is a `timestamp_floor` expression and uses it as the `__time` column for the projection. Projections do not need to have a time column defined; without one, they can still match queries that don't group on time.
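The rule for choosing the projection's `__time` column can be modeled in a few lines of Python. This sketch illustrates the behavior described above and is not Druid's implementation: it assumes virtual columns are dicts shaped like the JSON example, recognizes only `timestamp_floor(__time, '<period>')` expressions, and uses a hypothetical, simplified period parser that handles single-unit periods such as `PT1H`, `PT15M`, `PT30S`, and `P1D`. The shortest period counts as the finest granularity.

```python
import re

# Hypothetical, simplified parser for single-unit ISO 8601 periods.
_PERIOD = re.compile(r"^P(?:(\d+)D|T(\d+)([HMS]))$")
_UNIT_MS = {"H": 3_600_000, "M": 60_000, "S": 1_000}
_FLOOR = re.compile(r"^\s*timestamp_floor\(\s*__time\s*,\s*'([^']+)'\s*\)\s*$")


def period_to_ms(period):
    """Return the period length in milliseconds, or None if unsupported."""
    m = _PERIOD.match(period)
    if not m:
        return None
    days, amount, unit = m.groups()
    if days is not None:
        return int(days) * 86_400_000
    return int(amount) * _UNIT_MS[unit]


def pick_time_column(virtual_columns):
    """Return the name of the finest-granularity timestamp_floor virtual
    column, or None if the projection has no time column."""
    best_name, best_ms = None, None
    for vc in virtual_columns:
        if vc.get("type") != "expression":
            continue
        m = _FLOOR.match(vc.get("expression", ""))
        if not m:
            continue  # not a timestamp_floor over __time
        ms = period_to_ms(m.group(1))
        if ms is not None and (best_ms is None or ms < best_ms):
            best_name, best_ms = vc["name"], ms
    return best_name


vcs = [
    {"type": "expression", "expression": "timestamp_floor(__time, 'P1D')", "name": "__day", "outputType": "LONG"},
    {"type": "expression", "expression": "timestamp_floor(__time, 'PT1H')", "name": "__gran", "outputType": "LONG"},
    {"type": "expression", "expression": "concat(channel, page)", "name": "cp", "outputType": "STRING"},
]
print(pick_time_column(vcs))  # __gran
```

With both a daily and an hourly floor present, the hourly column wins; with no `timestamp_floor` column at all, the sketch returns `None`, matching the case of a projection without a time column.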

Currently, projections can only be defined by classic ingestion, but queries using MSQ or the new Dart engine can still use them. Future development will allow projections to be created as part of MSQ-based ingestion as well.

A few new query context flags have been added to aid experimentation with projections:

  • `useProjection` accepts a specific projection name and instructs the query engine that it must use that projection. The query fails if the projection does not match the query.
  • `forceProjections` accepts `true` or `false` and instructs the query engine that it must use a projection. The query fails if the engine cannot find a matching projection.
  • `noProjections` accepts `true` or `false` and instructs the query engine not to use any projections.
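The interaction of these flags can be summarized as a small decision function. This Python sketch only models the semantics listed above and is not Druid's query planner: `choose_projection`, `ProjectionError`, the `matches` callback, and the order in which candidates are tried are all hypothetical, and real matching depends on the query's grouping columns, filters, and aggregators. How conflicting flags interact is not stated, so the precedence used here is an assumption.

```python
class ProjectionError(Exception):
    """Raised when the query context demands a projection that cannot be used."""


def choose_projection(context, projections, matches):
    """Return the name of the projection to use, or None to read the base segment.

    context: query context dict, for example {"forceProjections": True}
    projections: projection names available in the segment (hypothetical order)
    matches: callable(name) -> bool saying whether a projection fits the query
    """
    # Assumption: noProjections takes precedence if flags conflict.
    if context.get("noProjections", False):
        return None
    wanted = context.get("useProjection")
    if wanted is not None:
        # A named projection must exist and fit, otherwise the query fails.
        if wanted not in projections or not matches(wanted):
            raise ProjectionError(f"projection {wanted!r} does not match the query")
        return wanted
    for name in projections:
        if matches(name):
            return name
    if context.get("forceProjections", False):
        raise ProjectionError("no matching projection found")
    return None


available = ["channel_page_hourly_distinct_user_added_deleted"]


def fits(name):
    # Pretend the query groups by hour, channel, and page.
    return True


print(choose_projection({}, available, fits))
print(choose_projection({"noProjections": True}, available, fits))  # None
```

Without any flags, a matching projection is used opportunistically; `forceProjections` and `useProjection` turn a missing or non-matching projection into a query failure instead of a silent fallback to the base segment.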

We plan to keep improving this feature in the coming releases, but we're excited to get it out there so users can begin experimenting, since projections can dramatically improve query performance.

Member

I'm still working on the write-up for a design proposal for this. Another option would be to link to that from here, since it should contain some of this information.

Contributor

I split this up. Part of it is in the highlight section and the details are in the Querying section. Also instead of including the JSON, I linked to it.

317brian marked this pull request as ready for review on October 15, 2024, at 18:56.