Skip to content

Commit

Permalink
Add post for release 3.2
Browse files Browse the repository at this point in the history
  • Loading branch information
rubensworks committed Jul 5, 2024
1 parent 3c5cfe9 commit ec8a561
Show file tree
Hide file tree
Showing 3 changed files with 218 additions and 68 deletions.
109 changes: 109 additions & 0 deletions pages/blog/2024-07-05-release_3_2.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
---
title: 'Release 3.2: 🔎 Knowing what to optimize'
---

For this release, we mainly focused on improving tooling to more easily track down performance issues.
Concretely, we improved our query explain output,
started running multiple benchmarks in our CI to avoid performance regressions,
and applied several performance improvements that were identified following these changes.

<!-- excerpt-end -->

## 🔎 Query explain improvements

Comunica has had several [query explain functionalities](/docs/query/advanced/explain/) for a while now,
to show how a query is parsed, optimized (logical), and executed (physical).
However, the physical plan output tended to be very verbose, which made it difficult to draw conclusions from.

In this update, the physical plan output has undergone three main changes:

1. The output from joins (especially Bind Joins) are _compacted_, so that recurring patterns in sub-plans are not repeated. Instead, a counter is added showing how many times a certain sub-plan was executed.
2. The default output is a compact text representation instead of the previous JSON output. (The old JSON representation is still available when passing the `physical-json` explain value)
3. Additional metadata is emitted, such as cardinalities and execution times.

For example, outputs such as the following can now be obtained:

```bash
$ node bin/query.js https://fragments.dbpedia.org/2016-04/en \
-q 'SELECT ?movie ?title ?name
WHERE {
?movie dbpedia-owl:starring [ rdfs:label "Brad Pitt"@en ];
rdfs:label ?title;
dbpedia-owl:director [ rdfs:label ?name ].
FILTER LANGMATCHES(LANG(?title), "EN")
FILTER LANGMATCHES(LANG(?name), "EN")
}' --explain physical
```
```text
project (movie,title,name)
join
join-inner(bind) bindOperation:(?g_0 http://www.w3.org/2000/01/rdf-schema#label "Brad Pitt"@en) bindCardEst:~2 cardReal:43 timeSelf:2.567ms timeLife:667.726ms
join compacted-occurrences:1
join-inner(bind) bindOperation:(?movie http://dbpedia.org/ontology/starring http://dbpedia.org/resource/Brad_Pitt) bindCardEst:~40 cardReal:43 timeSelf:6.011ms timeLife:641.139ms
join compacted-occurrences:38
join-inner(bind) bindOperation:(http://dbpedia.org/resource/12_Monkeys http://dbpedia.org/ontology/director ?g_1) bindCardEst:~1 cardReal:1 timeSelf:0.647ms timeLife:34.827ms
filter compacted-occurrences:1
join
join-inner(nested-loop) cardReal:1 timeSelf:0.432ms timeLife:4.024ms
pattern (http://dbpedia.org/resource/12_Monkeys http://www.w3.org/2000/01/rdf-schema#label ?title) cardEst:~1 src:0
pattern (http://dbpedia.org/resource/Terry_Gilliam http://www.w3.org/2000/01/rdf-schema#label ?name) cardEst:~1 src:0
join compacted-occurrences:2
join-inner(multi-empty) timeSelf:0.004ms timeLife:0.053ms
pattern (http://dbpedia.org/resource/Contact_(1992_film) http://dbpedia.org/ontology/director ?g_1) cardEst:~0 src:0
filter cardEst:~5,188,789.667
join
join-inner(nested-loop) timeLife:0.6ms
pattern (http://dbpedia.org/resource/Contact_(1992_film) http://www.w3.org/2000/01/rdf-schema#label ?title) cardEst:~1 src:0
pattern (?g_1 http://www.w3.org/2000/01/rdf-schema#label ?name) cardEst:~20,013,903 src:0
join compacted-occurrences:1
join-inner(multi-empty) timeSelf:0.053ms timeLife:0.323ms
pattern (?movie http://dbpedia.org/ontology/director ?g_1) cardEst:~118,505 src:0
pattern (?movie http://dbpedia.org/ontology/starring http://wikidata.dbpedia.org/resource/Q35332) cardEst:~0 src:0
filter cardEst:~242,311,843,844,161
join
join-inner(symmetric-hash) timeLife:36.548ms
pattern (?movie http://www.w3.org/2000/01/rdf-schema#label ?title) cardEst:~20,013,903 src:0
pattern (?g_1 http://www.w3.org/2000/01/rdf-schema#label ?name) cardEst:~20,013,903 src:0
sources:
0: QuerySourceHypermedia(https://fragments.dbpedia.org/2016-04/en)(SkolemID:0)
```

## ⚙️ Continuous performance tracking

In order to keep better track of the evolution of Comunica's performance,
we have added continuous performance tracking into our continuous integration.
For various benchmarks, we can now see the evolution of execution times across our commit history.
This allows us to easily identify which changes have a positive or negative impact on performance.

For considering the performance for different aspects, we have included the following benchmarks:

- WatDiv (in-memory)
- WatDiv (TPF)
- Berlin SPARQL Benchmark (in-memory)
- Berlin SPARQL Benchmark (TPF)
- Custom web queries: manually crafted queries to test for specific edge cases over the live Web

This allows us to inspect performance as follows:

<div class="docs-intro-img">
<a href="https://comunica.github.io/comunica-performance-results/comunica/master/benchmarks-total/"><img src="/img/blog/2024-07-05-release_3_2/continuous-perf.png" alt="Continuous performance" style="width:100%" \></a>
</div>

These results can be inspected in [more close detail](https://github.com/comunica/comunica-performance-results) together with execution times per query separately.

## 🏎️ Performance improvements

Thanks to the improvements to our physical query plan output and the continuous performance tracking,
we identified several low-hanging efforts for improving performance:

- [Addition of a hash-based optional join actor](https://github.com/comunica/comunica/commit/de90db0140cd10e2bfdf23c26f9eeff5e94f3ef2)
- [Tweaking constants of our internal join cost model](https://github.com/comunica/comunica/commit/50333c92ed1cf5410f172f608a213424e510986e)
- [Making optional hash and bind join only work with common variables](https://github.com/comunica/comunica/commit/df40c20e001121cd0ae9a9adf67ed221dc2966ba)

Besides these changes, we have many more performance-impacting changes in the pipeline for upcoming releases!

## Full changelog

Besides this, several fixes were applied, as well as various changes and additions.
If you want to learn more about all changes, check out the [full changelog](https://github.com/comunica/comunica/blob/master/CHANGELOG.md#v320---2024-07-05).
119 changes: 78 additions & 41 deletions pages/docs/1_query/advanced/explain.md
Original file line number Diff line number Diff line change
Expand Up @@ -147,66 +147,109 @@ $ comunica-sparql https://fragments.dbpedia.org/2016-04/en \

### Explain physical on the command line

```bash
$ node bin/query.js https://fragments.dbpedia.org/2016-04/en \
-q 'SELECT ?movie ?title ?name
WHERE {
?movie dbpedia-owl:starring [ rdfs:label "Brad Pitt"@en ];
rdfs:label ?title;
dbpedia-owl:director [ rdfs:label ?name ].
FILTER LANGMATCHES(LANG(?title), "EN")
FILTER LANGMATCHES(LANG(?name), "EN")
}' --explain physical

project (movie,title,name)
join
join-inner(bind) bindOperation:(?g_0 http://www.w3.org/2000/01/rdf-schema#label "Brad Pitt"@en) bindCardEst:~2 cardReal:43 timeSelf:2.567ms timeLife:667.726ms
join compacted-occurrences:1
join-inner(bind) bindOperation:(?movie http://dbpedia.org/ontology/starring http://dbpedia.org/resource/Brad_Pitt) bindCardEst:~40 cardReal:43 timeSelf:6.011ms timeLife:641.139ms
join compacted-occurrences:38
join-inner(bind) bindOperation:(http://dbpedia.org/resource/12_Monkeys http://dbpedia.org/ontology/director ?g_1) bindCardEst:~1 cardReal:1 timeSelf:0.647ms timeLife:34.827ms
filter compacted-occurrences:1
join
join-inner(nested-loop) cardReal:1 timeSelf:0.432ms timeLife:4.024ms
pattern (http://dbpedia.org/resource/12_Monkeys http://www.w3.org/2000/01/rdf-schema#label ?title) cardEst:~1 src:0
pattern (http://dbpedia.org/resource/Terry_Gilliam http://www.w3.org/2000/01/rdf-schema#label ?name) cardEst:~1 src:0
join compacted-occurrences:2
join-inner(multi-empty) timeSelf:0.004ms timeLife:0.053ms
pattern (http://dbpedia.org/resource/Contact_(1992_film) http://dbpedia.org/ontology/director ?g_1) cardEst:~0 src:0
filter cardEst:~5,188,789.667
join
join-inner(nested-loop) timeLife:0.6ms
pattern (http://dbpedia.org/resource/Contact_(1992_film) http://www.w3.org/2000/01/rdf-schema#label ?title) cardEst:~1 src:0
pattern (?g_1 http://www.w3.org/2000/01/rdf-schema#label ?name) cardEst:~20,013,903 src:0
join compacted-occurrences:1
join-inner(multi-empty) timeSelf:0.053ms timeLife:0.323ms
pattern (?movie http://dbpedia.org/ontology/director ?g_1) cardEst:~118,505 src:0
pattern (?movie http://dbpedia.org/ontology/starring http://wikidata.dbpedia.org/resource/Q35332) cardEst:~0 src:0
filter cardEst:~242,311,843,844,161
join
join-inner(symmetric-hash) timeLife:36.548ms
pattern (?movie http://www.w3.org/2000/01/rdf-schema#label ?title) cardEst:~20,013,903 src:0
pattern (?g_1 http://www.w3.org/2000/01/rdf-schema#label ?name) cardEst:~20,013,903 src:0

sources:
0: QuerySourceHypermedia(https://fragments.dbpedia.org/2016-04/en)(SkolemID:0)
```

```bash
$ comunica-sparql https://fragments.dbpedia.org/2016-04/en \
-q 'SELECT * { ?s ?p ?o. ?s a ?o } LIMIT 100' --explain physical
-q 'SELECT * { ?s ?p ?o. ?s a ?o } LIMIT 100' --explain physical-json

{
"logical": "slice",
"children": [
{
"logical": "project",
"variables": [
"s",
"o",
"p",
"o"
"s"
],
"children": [
{
"logical": "join",
"children": [
{
"logical": "pattern",
"pattern": "?s ?p ?o"
},
{
"logical": "pattern",
"pattern": "?s http://www.w3.org/1999/02/22-rdf-syntax-ns#type ?o"
},
{
"logical": "join-inner",
"physical": "bind",
"bindIndex": 1,
"bindOperation": {
"source": "QuerySourceHypermedia(https://fragments.dbpedia.org/2016-04/en)(SkolemID:0)",
"pattern": "?s http://www.w3.org/1999/02/22-rdf-syntax-ns#type ?o"
},
"bindOperationCardinality": {
"type": "estimate",
"value": 100022186,
"dataset": "https://fragments.dbpedia.org/2016-04/en?predicate=http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23type"
},
"bindOrder": "depth-first",
"cardinalities": [
{
"type": "estimate",
"value": 1040358853
"value": 1040358853,
"dataset": "https://fragments.dbpedia.org/2016-04/en"
},
{
"type": "estimate",
"value": 100022186
"value": 100022186,
"dataset": "https://fragments.dbpedia.org/2016-04/en?predicate=http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23type"
}
],
"joinCoefficients": {
"iterations": 6404592831613.728,
"persistedItems": 0,
"blockingItems": 0,
"requestTime": 556926378.1422498
"requestTime": 8902477556686.99
},
"children": [
{
"logical": "pattern",
"pattern": "http://commons.wikimedia.org/wiki/Special:FilePath/!!!善福寺.JPG ?p http://dbpedia.org/ontology/Image"
},
"childrenCompact": [
{
"logical": "pattern",
"pattern": "http://commons.wikimedia.org/wiki/Special:FilePath/!!!善福寺.JPG ?p http://wikidata.dbpedia.org/ontology/Image"
},
...
{
"logical": "pattern",
"pattern": "http://commons.wikimedia.org/wiki/Special:FilePath/%22..._WAAC_cooks_prepare_dinner_for_the_first_time_in_new_kitchen_at_Fort_Huachuca,_Arizona.%22,_12-05-1942_-_NARA_-_531152.jpg ?p http://wikidata.dbpedia.org/ontology/Image"
"occurrences": 100,
"firstOccurrence": {
"logical": "pattern",
"source": "QuerySourceHypermedia(https://fragments.dbpedia.org/2016-04/en)(SkolemID:0)",
"pattern": "http://commons.wikimedia.org/wiki/Special:FilePath/!!!善福寺.JPG ?p http://dbpedia.org/ontology/Image"
}
}
]
}
Expand Down Expand Up @@ -318,21 +361,15 @@ Will print:
{
explain: true,
type: 'physical',
data: {
logical: 'project',
variables: [ 's', 'p', 'o' ],
children: [
{
logical: 'join',
children: [
{
logical: 'pattern',
pattern: '?s ?p ?o',
},
],
},
],
},
data: `slice
project (o,p,s)
join
join-inner(bind) bindOperation:(?s http://www.w3.org/1999/02/22-rdf-syntax-ns#type ?o) bindCardEst:~100,022,186
pattern (http://commons.wikimedia.org/wiki/Special:FilePath/!!!善福寺.JPG ?p http://dbpedia.org/ontology/Image) src:0 compacted-occurrences:100
sources:
0: QuerySourceHypermedia(https://fragments.dbpedia.org/2016-04/en)(SkolemID:0)
`,
}
*/
```
Loading

0 comments on commit ec8a561

Please sign in to comment.