Add composite aggregations content
Signed-off-by: Melissa Vagi <[email protected]>
vagimeli committed Jul 10, 2024
1 parent 621d0c0 commit 1fdb04d
Showing 6 changed files with 930 additions and 16 deletions.
148 changes: 146 additions & 2 deletions _aggregations/bucket/early-termination.md
@@ -1,8 +1,152 @@
---
layout: default
title: Early termination
title: Optimizing composite aggregations with early termination
parent: Composite
grand_parent: Bucket aggregations
great_grand_parent: Aggregations
nav_order: 35
---
---

# Optimizing composite aggregations with early termination

Composite aggregations can be optimized for better performance by using the early termination feature. Early termination stops processing the aggregation as soon as it has found all the relevant buckets.

## Setting the index sort

To enable early termination, set the `index.sort.field` and `index.sort.order` settings on your index. These settings define the order in which documents are sorted in the index, and that order should match the order of the sources in your composite aggregation.

The following example request shows how to set the index sort when creating an index, sorting by `username` in ascending order and then by the `timestamp` field in descending order:

```json
PUT my-index
{
  "settings": {
    "index": {
      "sort.field": ["username", "timestamp"],
      "sort.order": ["asc", "desc"]
    }
  },
  "mappings": {
    "properties": {
      "username": {
        "type": "keyword",
        "doc_values": true
      },
      "timestamp": {
        "type": "date"
      }
    }
  }
}
```
{% include copy-curl.html %}
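
To check that the sort settings were applied, one option is to read the index settings back. This is a minimal sketch that assumes the index was created with the preceding request; the sort configuration would be expected to appear under `index.sort` in the response:

```json
GET my-index/_settings
```
{% include copy-curl.html %}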


## Ordering sources

For optimal early termination, composite aggregation sources should be ordered to match the index sort, with higher cardinality sources placed first, followed by lower cardinality sources. The field order within the aggregation must align with the index sort order.

For example, if the index is sorted by `username` (ascending) and then `timestamp` (descending), your composite aggregation should list its sources in the same order, as in the following query:

```json
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "user_name": { "terms": { "field": "username" } } },
          { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
        ]
      }
    }
  }
}
```
{% include copy-curl.html %}

#### Example response

```json
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "my_buckets": {
      "buckets": []
    }
  }
}
```
{% include copy-curl.html %}

## Disabling total hit tracking

To further optimize performance, you can disable the tracking of total hits by setting `track_total_hits` to `false` in your query. This prevents OpenSearch from calculating the total number of matching documents for every page of results. Note that if you need to know the total number of matching documents, you can retrieve it from the first request and skip the calculation for subsequent requests. See the following example query:

```json
GET /my-index/_search
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "user_name": { "terms": { "field": "username" } } },
          { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
        ]
      }
    }
  }
}
```
{% include copy-curl.html %}

#### Example response

```json
{
  "took": 13,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "my_buckets": {
      "buckets": []
    }
  }
}
```
{% include copy-curl.html %}
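
The page-by-page pattern described in this section can be sketched as a follow-up request that passes the `after_key` from a previous response in the `after` parameter while keeping `track_total_hits` disabled. The `after` values shown here (`alice` and the epoch-millisecond timestamp) are hypothetical placeholders, not values from the preceding example response, which contains no buckets:

```json
GET /my-index/_search
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "user_name": { "terms": { "field": "username" } } },
          { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
        ],
        "after": { "user_name": "alice", "date": 1680307200000 }
      }
    }
  }
}
```
{% include copy-curl.html %}

The keys in `after` must use the same source names as the `sources` array (`user_name` and `date` in this sketch).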

## Additional considerations

Keep the following considerations in mind when working with this feature:

- Multi-valued fields cannot be used for early termination, so it is recommended to place them last in the `sources` array.
- Index sorting can potentially slow down indexing operations, so it is important to test the impact of index sorting on your specific use case and dataset.
- If the index is not sorted, composite aggregations still attempt early termination when the query matches all documents (for example, a `match_all` query).
121 changes: 119 additions & 2 deletions _aggregations/bucket/missing-bucket.md
@@ -1,8 +1,125 @@
---
layout: default
title: Missing bucket
title: Handling missing buckets
parent: Composite
grand_parent: Bucket aggregations
great_grand_parent: Aggregations
nav_order: 20
---
---

# Handling missing buckets

By default, composite aggregations exclude documents that do not have a value for a particular source. However, you can choose to include these missing values by setting the `missing_bucket` parameter to `true` for the relevant source.

## Syntax

To include missing values in a composite aggregation, set the `missing_bucket` parameter to `true` within the relevant source definition, as shown in the following example syntax for the `sources` array:

```json
"sources": [
  {
    "NAME": {
      "AGGREGATION": {
        "field": "FIELD",
        "missing_bucket": true
      }
    }
  }
]
```
{% include copy-curl.html %}

---

## Example

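For example, the following query groups documents by day using a `date_histogram` source and by product name using a `terms` source. Because `missing_bucket` is set to `true` on the `product` source, the results also include a bucket for documents that do not have a product name specified: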

```json
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_by_day_product": {
      "composite": {
        "sources": [
          {
            "day": {
              "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "1d",
                "order": "desc"
              }
            }
          },
          {
            "product": {
              "terms": {
                "field": "product.keyword",
                "order": "asc",
                "missing_bucket": true
              }
            }
          }
        ]
      }
    }
  }
}
```
{% include copy-curl.html %}

#### Example response

```json
{
  "took": 23,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "sales_by_day_product": {
      "after_key": {
        "day": 1680307200000,
        "product": "Product B"
      },
      "buckets": [
        {
          "key": {
            "day": 1680393600000,
            "product": "Product A"
          },
          "doc_count": 1
        },
        {
          "key": {
            "day": 1680307200000,
            "product": "Product A"
          },
          "doc_count": 1
        },
        {
          "key": {
            "day": 1680307200000,
            "product": "Product B"
          },
          "doc_count": 1
        }
      ]
    }
  }
}
```
{% include copy-curl.html %}
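
All three documents in the preceding response have a product name, so no missing bucket appears in it. As a hypothetical illustration only (not output from the preceding query), a bucket for documents without a `product` value would be expected to carry a `null` key for that source, along these lines:

```json
{
  "key": {
    "day": 1680307200000,
    "product": null
  },
  "doc_count": 1
}
```

The `day` and `doc_count` values in this fragment are placeholders chosen for illustration.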
8 changes: 0 additions & 8 deletions _aggregations/bucket/ordering-composite-buckets.md

This file was deleted.
