Extend PrefilterExpression for Literal and Iri #1653

Merged
merged 124 commits into from
Dec 6, 2024

Conversation

realHannes
Collaborator

@realHannes realHannes commented Dec 2, 2024

With this PR, the prefilter expressions implemented in #1619 also apply to literals and IRIs. For example, the following query reads only the relevant, prefiltered blocks from the IndexScan:

SELECT * {
?s ?p ?o FILTER (?o >= "hallo" && ?o <= "hello")
}
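
To illustrate the idea behind the query above, here is a minimal sketch of a range prefilter over sorted block metadata. It is not the code from this PR: `BlockRange` and `relevantBlocks` are hypothetical names, and plain `std::string` comparison stands in for the ordering that the index actually uses for literals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical block metadata: the smallest and largest object value stored
// in a block. This is an assumption for illustration only.
struct BlockRange {
  std::string first;
  std::string last;
};

// Return the half-open index range [begin, end) of the blocks that may
// contain a value v with lo <= v <= hi. Assumes that the blocks are sorted.
std::pair<std::size_t, std::size_t> relevantBlocks(
    const std::vector<BlockRange>& blocks, const std::string& lo,
    const std::string& hi) {
  // First block whose largest value is not smaller than `lo`.
  auto begin = std::partition_point(
      blocks.begin(), blocks.end(),
      [&lo](const BlockRange& b) { return b.last < lo; });
  // First block (at or after `begin`) whose smallest value is larger than `hi`.
  auto end = std::partition_point(
      begin, blocks.end(),
      [&hi](const BlockRange& b) { return b.first <= hi; });
  return {static_cast<std::size_t>(begin - blocks.begin()),
          static_cast<std::size_t>(end - blocks.begin())};
}
```

With five blocks `apple..banana`, `cherry..grape`, `hallo..hat`, `hello..kiwi`, and `lemon..zebra`, calling `relevantBlocks(blocks, "hallo", "hello")` selects only the third and fourth block, so the other three never have to be read.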

realHannes and others added 30 commits April 25, 2024 19:07
realHannes and others added 17 commits November 27, 2024 11:07
The recent implementation of lazy joins introduced a bug that led to deadlocks if the result of a lazy join was not fully consumed (e.g. because of a LIMIT clause, because the query was cancelled, or because an exception occurred upstream).
This PR fixes that bug.
Co-authored-by: Johannes Kalmbach <[email protected]>
…urg#1646)

An update can invalidate a cached query result in the sense that if one ran the query again after the update, the result might be different. This was ignored so far, and is now handled as follows: Each `LocatedTriplesSnapshot` gets its own "index" (starting from zero and then incremented for each new snapshot). That index becomes part of the cache key. That way, a query will make use of a cached result if and only if there was no update between the time of the query and the time when the cached result was computed.
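
The caching rule described above can be sketched as follows. This is only an illustration under assumptions: `ToyQueryCache` and `CacheKey` are hypothetical names, and results are represented as plain strings rather than the engine's actual result type.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical cache key: the query text plus the index of the snapshot that
// was current when the result was computed.
using CacheKey = std::pair<std::string, std::size_t>;

class ToyQueryCache {
 public:
  // Store a result under the query and the snapshot index it was computed on.
  void store(const std::string& query, std::size_t snapshotIndex,
             std::string result) {
    cache_[CacheKey{query, snapshotIndex}] = std::move(result);
  }

  // A lookup only hits if the snapshot index is unchanged, that is, if no
  // update happened since the result was cached.
  std::optional<std::string> lookup(const std::string& query,
                                    std::size_t snapshotIndex) const {
    auto it = cache_.find(CacheKey{query, snapshotIndex});
    if (it == cache_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

 private:
  std::map<CacheKey, std::string> cache_;
};
```

After `store(q, 0, r)`, a call to `lookup(q, 0)` returns `r`, whereas `lookup(q, 1)` misses because an update has created a new snapshot in between.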
… of `Index Scan`s (ad-freiburg#1619)

With this PR, filter expressions that can be evaluated via binary search on a sorted input are directly evaluated on the block metadata of an IndexScan. For example, in a query that contains `{ ?s ?p ?o FILTER (?o > 3) }`, only those blocks of the full index scan (sorted by the object) are read from disk that, according to their metadata, might contain values `> 3`.

Currently this mechanism has the following limitations:
1. It can only be applied if the IndexScan is the direct child of the FILTER clause.
2. It can only be applied to logical expressions (AND/OR/NOT) and to relational expressions (greater than, equal to, etc.) between a variable and a constant. Currently the constant cannot yet be an IRI or Literal.
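
The mechanism described above can be sketched for integer values as follows. This is a simplified illustration, not the actual `PrefilterExpression` code: `IntBlock`, `greaterThan`, `lessThan`, `logicalAnd`, and `logicalOr` are hypothetical names. NOT is left out on purpose, because complementing a set of blocks is not correct when a block mixes matching and non-matching values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical block metadata: smallest and largest value in a block.
struct IntBlock {
  int first;
  int last;
};
using BlockIndices = std::vector<std::size_t>;

// Indices of the blocks that may contain a value > bound (binary search).
BlockIndices greaterThan(const std::vector<IntBlock>& blocks, int bound) {
  auto begin = std::partition_point(
      blocks.begin(), blocks.end(),
      [bound](const IntBlock& b) { return b.last <= bound; });
  BlockIndices result;
  for (auto it = begin; it != blocks.end(); ++it) {
    result.push_back(static_cast<std::size_t>(it - blocks.begin()));
  }
  return result;
}

// Indices of the blocks that may contain a value < bound (binary search).
BlockIndices lessThan(const std::vector<IntBlock>& blocks, int bound) {
  auto end = std::partition_point(
      blocks.begin(), blocks.end(),
      [bound](const IntBlock& b) { return b.first < bound; });
  BlockIndices result;
  for (auto it = blocks.begin(); it != end; ++it) {
    result.push_back(static_cast<std::size_t>(it - blocks.begin()));
  }
  return result;
}

// AND keeps the blocks that are relevant for both operands.
BlockIndices logicalAnd(const BlockIndices& a, const BlockIndices& b) {
  BlockIndices result;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(result));
  return result;
}

// OR keeps the blocks that are relevant for at least one operand.
BlockIndices logicalOr(const BlockIndices& a, const BlockIndices& b) {
  BlockIndices result;
  std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                 std::back_inserter(result));
  return result;
}
```

For the blocks `[0, 2]`, `[3, 3]`, `[3, 5]`, `[6, 9]`, the expression `?o > 3 && ?o < 6` keeps only the third block, because it is the only one that both operands consider relevant.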
@realHannes realHannes requested a review from joka921 December 3, 2024 11:52

codecov bot commented Dec 3, 2024

Codecov Report

Attention: Patch coverage is 98.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.62%. Comparing base (7680177) to head (b88f3b1).
Report is 8 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/engine/sparqlExpressions/LiteralExpression.h | 94.11% | 0 Missing and 1 partial ⚠️ |
| ...engine/sparqlExpressions/RelationalExpressions.cpp | 92.85% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1653      +/-   ##
==========================================
+ Coverage   89.43%   89.62%   +0.18%     
==========================================
  Files         375      381       +6     
  Lines       36338    36839     +501     
  Branches     4100     4162      +62     
==========================================
+ Hits        32499    33017     +518     
- Misses       2518     2521       +3     
+ Partials     1321     1301      -20     


Member

@joka921 joka921 left a comment


A first round of reviews. I haven't looked at the tests yet.
Overall this looks very promising, I will try this out later!

Member

@joka921 joka921 left a comment


Some additional suggestions to make the code even cleaner.
This already looks very good and works like a charm!

src/engine/sparqlExpressions/PrefilterExpressionIndex.h (outdated)
src/engine/sparqlExpressions/PrefilterExpressionIndex.h (outdated)
src/engine/sparqlExpressions/PrefilterExpressionIndex.cpp (outdated)
src/engine/sparqlExpressions/PrefilterExpressionIndex.cpp (outdated)
src/engine/sparqlExpressions/PrefilterExpressionIndex.cpp (outdated)
test/PrefilterExpressionTestHelpers.h (outdated)
Member

@joka921 joka921 left a comment


Two small additional improvements which save even more code.

src/engine/sparqlExpressions/PrefilterExpressionIndex.cpp (outdated)
using namespace ad_utility;
constexpr static std::array mirrorPairs{P{LT, GT}, P{LE, GE}, P{GE, LE},
P{GT, LT}, P{EQ, EQ}, P{NE, NE}};
constexpr ConstexprMap<CompOp, CompOp, 6> mirrorMap(mirrorPairs);
Member


Suggested change
constexpr ConstexprMap<CompOp, CompOp, 6> mirrorMap(mirrorPairs);
constexpr ConstexprMap<CompOp, CompOp, 6> mirrorMap({{LT, GT}, {LE, GE}, ...});

(This should work; there is no need for the `using P`, the `using namespace`, or the explicit array `mirrorPairs`.)

Collaborator Author


Regarding the `std::pair<CompOp, CompOp>` values: it seems that they can't be constructed automatically within an initializer list, so I still had to use `P = std::pair<CompOp, CompOp>`.
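
For readers following this thread, here is a minimal, self-contained sketch of such a compile-time mirror table. It does not use `ad_utility::ConstexprMap` (whose interface is not shown here). The `CompOp` enum, its enumerator order, and the `mirror` function are assumptions made for illustration; only the six pairs are taken from the snippet under review. The explicit `P{...}` elements mirror the workaround described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for the comparison operators used in the review.
enum class CompOp { LT, LE, EQ, NE, GE, GT };

using P = std::pair<CompOp, CompOp>;

// Each operator is paired with the operator obtained by swapping the operands.
constexpr std::array<P, 6> mirrorPairs{{P{CompOp::LT, CompOp::GT},
                                        P{CompOp::LE, CompOp::GE},
                                        P{CompOp::GE, CompOp::LE},
                                        P{CompOp::GT, CompOp::LT},
                                        P{CompOp::EQ, CompOp::EQ},
                                        P{CompOp::NE, CompOp::NE}}};

// Linear lookup that can be evaluated at compile time. All six operators are
// present in the table, so the fallback is only a formality.
constexpr CompOp mirror(CompOp op) {
  for (std::size_t i = 0; i < mirrorPairs.size(); ++i) {
    if (mirrorPairs[i].first == op) {
      return mirrorPairs[i].second;
    }
  }
  return op;
}

static_assert(mirror(CompOp::LT) == CompOp::GT, "LT mirrors to GT");
```

Presumably such a table is used when a comparison has its constant on the left-hand side, so that `3 < ?x` can be treated like `?x > 3`; that purpose is an assumption and is not stated in this thread.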

Member

@joka921 joka921 left a comment


Thank you very much!

@sparql-conformance

sonarqubecloud bot commented Dec 5, 2024

@joka921 joka921 merged commit 9d9bab0 into ad-freiburg:master Dec 6, 2024
23 checks passed
4 participants