-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BlankNode support for SERVICE #1504
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1504 +/- ##
==========================================
+ Coverage 88.93% 88.94% +0.01%
==========================================
Files 364 366 +2
Lines 33127 33194 +67
Branches 3715 3722 +7
==========================================
+ Hits 29462 29526 +64
- Misses 2433 2439 +6
+ Partials 1232 1229 -3 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have looked at everything but the tests. This already looks nice.
You have forgotten some switch statements that use the Datatype of an ID
(at least in ExportQueryExecutionTrees.cpp
. But search for all occurences of the BlankNodeIndex
, in every occasion you should also do at least something for NewBlankNodeIndex
.
src/engine/QueryExecutionContext.h
Outdated
@@ -116,6 +118,17 @@ class QueryExecutionContext { | |||
updateCallback_(nlohmann::ordered_json(runtimeInformation).dump()); | |||
} | |||
|
|||
// Getter for the global `NEXT_NEWBLANKNODEINDEX` counter. | |||
[[nodiscard]] static int getNextNewBlankNodeIndex() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't make this function static
, even though it can currently be. The staticness should in this case be an implementation detail.
src/engine/QueryExecutionContext.h
Outdated
if (NEXT_NEWBLANKNODEINDEX == std::numeric_limits<uint64_t>::max()) | ||
[[unlikely]] { | ||
throw std::runtime_error( | ||
"Counter for blank nodes overflowed. Tell the developers to reset " | ||
"it."); | ||
} | ||
return NEXT_NEWBLANKNODEINDEX++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this has a race condition (there is one blank node left, two threads first run the check in line 123,
(which for both is okay), then they both increment by one , upps overflow.
You either have to work with a compare_exchange_weak
loop (I am sure that cppreference has an example)
or with a std::mutex
.
Also you could here directly also implement the function getMultipleNewBlankNodes
that reserves several nodes (e.g. 100) at once for performance reasons.
src/global/Constants.h
Outdated
@@ -245,3 +246,6 @@ constexpr inline size_t NUM_SORT_THREADS = 4; | |||
constexpr inline std::string_view EMPH_ON = "\033[1m"; | |||
/// ANSI escape sequence to print "normal" text again in the console. | |||
constexpr inline std::string_view EMPH_OFF = "\033[22m"; | |||
|
|||
// Counter for internal Blank Node Ids. | |||
inline std::atomic_uint64_t NEXT_NEWBLANKNODEINDEX{0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better call them (here and everywhere local
blank nodes (not new
). Then it is consistent with the local
vocab, which is a similar concept.
src/global/Constants.h
Outdated
@@ -245,3 +246,6 @@ constexpr inline size_t NUM_SORT_THREADS = 4; | |||
constexpr inline std::string_view EMPH_ON = "\033[1m"; | |||
/// ANSI escape sequence to print "normal" text again in the console. | |||
constexpr inline std::string_view EMPH_OFF = "\033[22m"; | |||
|
|||
// Counter for internal Blank Node Ids. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Counter for internal Blank Node Ids. | |
// Counter for internal Blank Node Ids that are not part of the index, but were locally added (by SERVICE clauses or expressions). |
src/engine/Service.cpp
Outdated
if (!blankNodeMap.contains(value)) { | ||
// create a new blankNodeIndex | ||
blankNodeMap[value] = | ||
Id::makeFromNewBlankNodeIndex(NewBlankNodeIndex::make( | ||
getExecutionContext()->getNextNewBlankNodeIndex())); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to compute the hash of value
twice or three times in all cases.
Use try_emplace
. You would have to first insert a dummy Id as the value, but that should not be a problem here.
(see cppreference, ask me if there's trouble).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several small, and 1.5 big requests:
In particular,
- Store the highest
BlankNodeIndex
s.t. you don't need theLocalBlankNodeIndex
- (necessary if you implement 1.) Don't use a global variable for the blank node manager, but always access it via the corresponding
Index
.
@@ -217,6 +217,9 @@ ExportQueryExecutionTrees::idToStringAndTypeForEncodedValue(Id id) { | |||
case BlankNodeIndex: | |||
return std::pair{absl::StrCat("_:bn", id.getBlankNodeIndex().get()), | |||
nullptr}; | |||
case LocalBlankNodeIndex: | |||
return std::pair{absl::StrCat("_:bn", id.getLocalBlankNodeIndex().get()), | |||
nullptr}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should the index of the highest blank node in the index during index building
(It is created in the VocabularyMerger
, should be returned from there and can be added during the index building via IndexImpl::writeConfiguration
(to the metaData.json file). It should be read from there during the reading of the index builder from file. The LocalBlankNodeManager can then be a member of the Index
class, and we don't need the additional datatype LocalBlankNodeIndex
. (Datatypes are valuable).
src/engine/Service.cpp
Outdated
"For now, consider filtering them out using the ISBLANK function " | ||
"or " | ||
"converting them via the STR function."); | ||
auto optEmplace = blankNodeMap.try_emplace(value, Id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto optEmplace = blankNodeMap.try_emplace(value, Id()); | |
auto [it, wasNew] = blankNodeMap.try_emplace(value, Id()); |
src/util/BlankNodeManager.h
Outdated
public: | ||
static const uint blockSize_ = 1000; | ||
static constexpr uint64_t totalAvailableBlocks_ = | ||
(ValueId::maxIndex + 1) / blockSize_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the +1
correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and is it necessary? one block less for better readability shouldn't be much of an issue.
namespace ad_utility { | ||
/* | ||
* Manager class for LocalBlankNodeIndex. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please elaborate a little bit, e.g. what are local blank node indices, what does this class do/manage, etc.
src/util/BlankNodeManager.h
Outdated
*/ | ||
class BlankNodeManager { | ||
public: | ||
static const uint blockSize_ = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can also be constexpr.
src/util/BlankNodeManager.h
Outdated
uint64_t nextIdx_; | ||
}; | ||
|
||
// Manages the LocalBlankNodes used within a LocalVocab. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also document the lifetime implications (how long does what have to be kept alive to savely work with the blank node indices.
src/util/BlankNodeManager.h
Outdated
|
||
} // namespace ad_utility | ||
|
||
inline ad_utility::BlankNodeManager globalBlankNodeManager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With my proposed change from above, you have to get rid of the global blank node manager, so to create a blank node manager, you need access to something. (e.g. a queryExecutionContext).
src/util/BlankNodeManager.cpp
Outdated
|
||
// Lock this part, as it might be accessed concurrently by | ||
// `LocalBlankNodeManager` instances. | ||
mtx_.lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use a std::lock_guard
object instead of manually (un)-locking the object.
src/util/BlankNodeManager.cpp
Outdated
// _____________________________________________________________________________ | ||
void BlankNodeManager::freeBlock(uint64_t blockIndex) { | ||
usedBlocksSet_.erase(blockIndex); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Erasing the blocks must also lock a mutex.
Best use the ad_utility::Synchronized
class for access to the usedBlockSet
(see the header + tests for documentation).
// BlankNode exists already, known Id will be used. | ||
Id a2 = | ||
bTTC({{"type", "bnode"}, {"value", "A"}}).toValueIdIfNotString().value(); | ||
EXPECT_EQ(a, a2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also check the datatype of the blank node index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thorough review on everything but the tests.
There's not that much to do, this looks very nice.
@@ -62,7 +62,7 @@ struct VocabularyMetaData { | |||
Id begin() const { return begin_; } | |||
Id end() const { return end_; } | |||
|
|||
// Return true iff the `id` belongs to this range. | |||
// Return true if the `id` belongs to this range. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Return true if the `id` belongs to this range. | |
// Return true iff the `id` belongs to this range. |
That is not really a typo.
In mathematics and therefore also in computer since you write iff
with two ff
s for if and only if
to be more precise.
src/util/BlankNodeManager.cpp
Outdated
// The Random-Generation Algorithm's performance is reduced once the number of | ||
// used blocks exceeds a limit. | ||
AD_CORRECTNESS_CHECK(usedBlocksSet_.rlock()->size() < | ||
totalAvailableBlocks_ / 256); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a message to this check (the second argument to the CORRECTNESS_CHECK
can be a string or a lambda that returns a string., s.t. we know, what has gone wrong here.
src/util/BlankNodeManager.cpp
Outdated
for (auto block : blocks_) { | ||
blankNodeManager_->freeBlock(block.blockIdx_); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This takes one lock + unlock per block, which is a waste of resources.
You can either create a templated function freeBlocks
s.t. you here can write manager_->freeBlocks(blocks_ | std::views::transform(...))
.
or you make the LocalBlankNodeManager a friend of the BlankNodeManager and handle the locking here yourself.
src/util/BlankNodeManager.cpp
Outdated
blocks_.back().nextIdx_ == | ||
(blankNodeManager_->minIndex_ + blocks_.back().blockIdx_ + 1) * | ||
blockSize_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can explicitly store the lastIndex
or the endIndex
in the Block
, that way you don't have to fiddle with the internals here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also,
This will be an index-breaking change (because you have to know the number of blank nodes), so you have to change something in the IndexFormatVersion.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, almost finished, only minor suggestions remaining.
Make sure to also fix that one trivial sonarcloud thing (public member var in indexImpl, can definitely be private.
src/engine/LocalVocab.cpp
Outdated
// Initialize the `localBlankNodeManager_` if it doesn't exist yet. | ||
if (!localBlankNodeManager_) [[unlikely]] { | ||
localBlankNodeManager_ = | ||
std::make_unique<ad_utility::BlankNodeManager::LocalBlankNodeManager>( | ||
blankNodeManager); | ||
ad_utility::BlankNodeManager::LocalBlankNodeManager(blankNodeManager); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for optional you can write
localBlankNodeManager_.emplace(blankNodeManager)
, emplace
also works for explicit constructors.
src/util/BlankNodeManager.cpp
Outdated
usedBlocksSet_.rlock()->size() < totalAvailableBlocks_ / 256, | ||
absl::StrCat("Critical high number of blank node blocks in use: ", | ||
usedBlocksSet_.rlock()->size(), " blocks")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should in priniple (because of rlock) be deadlock-free, but stil pecompute
auto numBlocks = ...
and then use that variable in the check to reduce the code duplication.
src/util/BlankNodeManager.cpp
Outdated
auto usedBlocksSetPtr = usedBlocksSet_.wlock(); | ||
while (usedBlocksSetPtr->contains(newBlockIndex)) { | ||
newBlockIndex = randBlockIndex_(); | ||
} | ||
usedBlocksSetPtr->insert(newBlockIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two lines withrandBlockIndex()
can be refactored, what about
while(true) {
auto idx = randIndx();
if not contained(idx) {
insert + return;
}
LocalBlankNodeManager& operator=(const LocalBlankNodeManager&) = delete; | ||
|
||
LocalBlankNodeManager(LocalBlankNodeManager&&) = default; | ||
LocalBlankNodeManager& operator=(LocalBlankNodeManager&&) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point of codecov:
Add a unit test, that moving of the local blank node managers does the freeing of the blocks still only once and at the right place. That makes us robust on ou assuptions on the moving of std::vector
@@ -2,4 +2,4 @@ add_subdirectory(ConfigManager) | |||
add_subdirectory(MemorySize) | |||
add_subdirectory(http) | |||
add_library(util GeoSparqlHelpers.cpp antlr/ANTLRErrorHandling.cpp ParseException.cpp Conversions.cpp Date.cpp DateYearDuration.cpp Duration.cpp antlr/GenerateAntlrExceptionMetadata.cpp CancellationHandle.cpp StringUtils.cpp LazyJsonParser.cpp BlankNodeManager.cpp) | |||
qlever_target_link_libraries(util re2::re2) | |||
qlever_target_link_libraries(util re2::re2 s2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right... thanks.
|
||
const Index& index2 = getQec("")->getIndex(); | ||
EXPECT_NO_THROW(index2.getBlankNodeManager()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add a test for the correct reading and writing of the minimal local blank node index
just use getQec(...) with a kg that has some blank nodes, and then access the blank node manager of the returned index to check that the number of blank nodes was propagated correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much
Conformance check passed ✅No test result changes. |
Quality Gate passedIssues Measures |
With this commit, QLever supports to add new blank nodes during the evaluation of a query. This function is used to support blank nodes in the result of a
SERVICE
request. These blank nodes are distinct from all blank nodes in the index, and also from all blank nodes from other SERVICE request, eve if they came from the same server. This behavior is correct accordin to the SPARQL 1.1 federated query standard.