Docs histogram bucket notes (#423)
* docs: histogram behavior notes

lukashornych authored Jan 12, 2024
1 parent 2572f3a commit 5643c91
Showing 4 changed files with 141 additions and 10 deletions.
@@ -0,0 +1,48 @@
```json
{
  "histograms": {
    "width": {
      "buckets": [
        {
          "occurrences": 1291,
          "requested": true,
          "threshold": 15.90
        },
        {
          "index": 1,
          "occurrences": 210,
          "requested": true,
          "threshold": 97.92
        },
        {
          "index": 2,
          "occurrences": 1876,
          "requested": true,
          "threshold": 179.94
        },
        {
          "index": 3,
          "occurrences": 531,
          "requested": true,
          "threshold": 261.96
        },
        {
          "index": 4,
          "occurrences": 2,
          "requested": true,
          "threshold": 343.98
        },
        {
          "index": 5,
          "occurrences": 1,
          "requested": true,
          "threshold": 426.00
        }
      ],
      "max": 508.00,
      "min": 15.90,
      "overallCount": 3911
    }
  }
}
```
@@ -0,0 +1,49 @@
```json
{
  "buckets": [
    {
      "occurrences": 3,
      "requested": true,
      "threshold": 59.00
    },
    {
      "index": 1,
      "occurrences": 2,
      "requested": true,
      "threshold": 72.00
    },
    {
      "index": 2,
      "occurrences": 3,
      "requested": true,
      "threshold": 85.00
    },
    {
      "index": 3,
      "occurrences": 1,
      "requested": true,
      "threshold": 98.00
    },
    {
      "index": 4,
      "occurrences": 6,
      "requested": true,
      "threshold": 111.00
    },
    {
      "index": 5,
      "requested": true,
      "threshold": 124.00
    },
    {
      "index": 6,
      "occurrences": 3,
      "requested": true,
      "threshold": 137.00
    }
  ],
  "max": 150.00,
  "min": 59.00,
  "overallCount": 18
}
```
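The field descriptions in `histogram.md` (changed in this same commit) say that a bucket starts at its own `threshold` and ends at the threshold of the next bucket, or at `max` for the last one, and that `overallCount` is the sum of all bucket occurrences. The following is a minimal sketch of how a client might derive those ranges from the JSON above. The class `HistogramRanges` and its records are hypothetical names used for illustration only, not evitaDB API, and the sketch assumes that a bucket without an `occurrences` field holds zero elements. Whether the upper bound is inclusive is not stated in the source, so the sketch makes no claim about it.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of evitaDB): derives bucket ranges from thresholds. */
final class HistogramRanges {

    /** One bucket from the JSON above, reduced to the fields used here. */
    record Bucket(BigDecimal threshold, int occurrences) {}

    /** Range covered by a bucket: its own threshold up to the next threshold (or max). */
    record Range(BigDecimal from, BigDecimal to, int occurrences) {}

    static List<Range> toRanges(List<Bucket> buckets, BigDecimal max) {
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < buckets.size(); i++) {
            Bucket current = buckets.get(i);
            // The upper bound is the next bucket's threshold, or `max` for the last bucket.
            BigDecimal upper = i + 1 < buckets.size() ? buckets.get(i + 1).threshold() : max;
            ranges.add(new Range(current.threshold(), upper, current.occurrences()));
        }
        return ranges;
    }

    /** Sum of all bucket occurrences; expected to match `overallCount`. */
    static int sumOccurrences(List<Bucket> buckets) {
        return buckets.stream().mapToInt(Bucket::occurrences).sum();
    }

    /** The seven buckets from the JSON above; the missing "occurrences" at index 5 is assumed to be 0. */
    static List<Bucket> sampleBuckets() {
        return List.of(
            new Bucket(new BigDecimal("59.00"), 3),
            new Bucket(new BigDecimal("72.00"), 2),
            new Bucket(new BigDecimal("85.00"), 3),
            new Bucket(new BigDecimal("98.00"), 1),
            new Bucket(new BigDecimal("111.00"), 6),
            new Bucket(new BigDecimal("124.00"), 0),
            new Bucket(new BigDecimal("137.00"), 3)
        );
    }

    public static void main(String[] args) {
        List<Bucket> buckets = sampleBuckets();
        for (Range range : toRanges(buckets, new BigDecimal("150.00"))) {
            System.out.println(range.from() + " .. " + range.to() + ": " + range.occurrences());
        }
        System.out.println("sum of occurrences = " + sumOccurrences(buckets));
    }
}
```

With the values above, the summed occurrences agree with the reported `overallCount` of 18, which is consistent with reading the omitted field as zero.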
48 changes: 41 additions & 7 deletions documentation/user/en/query/requirements/histogram.md
@@ -25,13 +25,17 @@ The histogram data structure is optimized for frontend rendering. It contains th
- **`min`** - the minimum value of the attribute in the current filter context
- **`max`** - the maximum value of the attribute in the current filter context
- **`overallCount`** - the number of elements whose attribute value falls into any of the buckets (it's basically a sum of all bucket occurrences)
- **`buckets`** - an array of buckets, each of which contains the following fields:
- **`buckets`** - a *sorted* array of buckets, each of which contains the following fields:
- **`index`** - the index of the bucket in the array
- **`threshold`** - the minimum value of the attribute in the bucket
- **`threshold`** - the minimum value of the attribute in the bucket; the maximum value is the threshold of the next bucket (or `max` for the last bucket)
- **`occurrences`** - the number of elements whose attribute value falls into the bucket
- **`requested`** - contains `true` if the query contained [attributeBetween](../filtering/comparable.md#attribute-between)
or [priceBetween](../filtering/price.md#price-between) constraint for particular attribute / price
and the bucket threshold lies within the range (inclusive) of the constraint
- **`requested`**:
- contains `true` if the query didn't contain any [attributeBetween](../filtering/comparable.md#attribute-between)
or [priceBetween](../filtering/price.md#price-between) constraints
- contains `true` if the query contained an [attributeBetween](../filtering/comparable.md#attribute-between)
or [priceBetween](../filtering/price.md#price-between) constraint for the particular attribute / price
and the bucket threshold lies within the range (inclusive) of the constraint
- contains `false` otherwise
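
The three `requested` rules above can be read as a small decision function. The sketch below is only an illustration of that reading. `RequestedFlag`, `Between` and `isRequested` are hypothetical names, not evitaDB API, and the engine's actual implementation may differ.

```java
import java.math.BigDecimal;

/** Hypothetical helper (not part of evitaDB): mirrors the documented `requested` rules. */
final class RequestedFlag {

    /** Inclusive range taken from an attributeBetween / priceBetween constraint. */
    record Between(BigDecimal from, BigDecimal to) {}

    /**
     * @param threshold          threshold of the bucket being evaluated
     * @param queryHasAnyBetween whether the query contains any between constraint at all
     * @param constraint         the between constraint for this attribute / price, or null if there is none
     */
    static boolean isRequested(BigDecimal threshold, boolean queryHasAnyBetween, Between constraint) {
        if (!queryHasAnyBetween) {
            return true; // rule 1: no between constraints in the query
        }
        if (constraint != null
            && threshold.compareTo(constraint.from()) >= 0
            && threshold.compareTo(constraint.to()) <= 0) {
            return true; // rule 2: threshold lies within the inclusive range
        }
        return false; // rule 3: every other case
    }

    public static void main(String[] args) {
        Between range = new Between(new BigDecimal("72.00"), new BigDecimal("111.00"));
        System.out.println(isRequested(new BigDecimal("72.00"), true, range));
        System.out.println(isRequested(new BigDecimal("124.00"), true, range));
        System.out.println(isRequested(new BigDecimal("124.00"), false, null));
    }
}
```

`compareTo` is used instead of `equals` because `BigDecimal.equals` also compares scale, so `72.0` and `72.00` would not be treated as equal at the range boundary.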

## Attribute histogram

@@ -114,6 +118,8 @@ The histogram result in JSON format is a bit more verbose, but it's still quite

</LanguageSpecific>

</Note>

### Attribute histogram contents optimization

During user testing, we found that histograms with scarce data are not very useful. Besides the fact that they don't
@@ -133,9 +139,21 @@ To demonstrate the optimization of the histogram, we will use the following exam

The simplified result looks like this:

<MDInclude sourceVariable="extraResults.AttributeHistogram">[The result of optimized `width` attribute histogram](/documentation/user/en/query/requirements/examples/histogram/attribute-histogram-optimized.evitaql.string.md)</MDInclude>

<Note type="info">

<NoteTitle toggles="true">

##### The optimized result of `width` and `height` attribute histogram in JSON format

</NoteTitle>

The optimized histogram result in JSON format is a bit more verbose, but it's still quite readable:

<LS to="e,j,c">

<MDInclude sourceVariable="extraResults.AttributeHistogram">[The result of optimized `width` attribute histogram](/documentation/user/en/query/requirements/examples/histogram/attribute-histogram-optimized.evitaql.string.md)</MDInclude>
<MDInclude sourceVariable="extraResults.AttributeHistogram">[The result of optimized `width` attribute histogram](/documentation/user/en/query/requirements/examples/histogram/attribute-histogram-optimized.evitaql.json.md)</MDInclude>

</LS>
<LS to="g">
@@ -149,6 +167,8 @@ The simplified result looks like this:

</LS>

</Note>

As you can see, the number of buckets has been adjusted to fit the data, contrary to the default behavior.

## Price histogram
@@ -249,9 +269,21 @@ To demonstrate the optimization of the histogram, we will use the following exam

The simplified result looks like this:

<MDInclude sourceVariable="extraResults.PriceHistogram">[The result of optimized price histogram](/documentation/user/en/query/requirements/examples/histogram/price-histogram-optimized.evitaql.string.md)</MDInclude>

<Note type="info">

<NoteTitle toggles="true">

##### The result of optimized price histogram in JSON format

</NoteTitle>

The optimized histogram result in JSON format is a bit more verbose, but it's still quite readable:

<LS to="e,j,s">

<MDInclude sourceVariable="extraResults.PriceHistogram">[The result of optimized price histogram](/documentation/user/en/query/requirements/examples/histogram/price-histogram-optimized.evitaql.string.md)</MDInclude>
<MDInclude sourceVariable="extraResults.PriceHistogram">[The result of optimized price histogram](/documentation/user/en/query/requirements/examples/histogram/price-histogram-optimized.evitaql.json.md)</MDInclude>

</LS>
<LS to="g">
@@ -265,4 +297,6 @@ The simplified result looks like this:

</LS>

</Note>

As you can see, the number of buckets has been adjusted to fit the data, contrary to the default behavior.
@@ -6,7 +6,7 @@
* | __/\ V /| | || (_| | |_| | |_) |
* \___| \_/ |_|\__\__,_|____/|____/
*
* Copyright (c) 2023
* Copyright (c) 2023-2024
*
* Licensed under the Business Source License, Version 1.1 (the "License");
* you may not use this file except in compliance with the License.
@@ -456,9 +456,9 @@ Stream<DynamicTest> testSingleFileDocumentation() {
Stream<DynamicTest> testSingleFileDocumentationAndCreateOtherLanguageSnippets() {
return this.createTests(
Environment.DEMO_SERVER,
getRootDirectory().resolve("documentation/user/en/query/requirements/facet.md"),
getRootDirectory().resolve("documentation/user/en/query/requirements/histogram.md"),
ExampleFilter.values(),
CreateSnippets.MARKDOWN, /*CreateSnippets.JAVA, CreateSnippets.GRAPHQL, */CreateSnippets.REST/*, CreateSnippets.CSHARP*/
CreateSnippets.MARKDOWN, /*CreateSnippets.JAVA,*/ CreateSnippets.GRAPHQL, CreateSnippets.REST/*, CreateSnippets.CSHARP*/
).stream();
}
