Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement lazy join #1524

Merged
merged 34 commits into from
Nov 25, 2024
Merged
Show file tree
Hide file tree
Changes from 32 commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
f1a67d4
Implement lazy join
RobinTF Nov 18, 2024
711c3a7
Properly reformat
RobinTF Nov 18, 2024
08be2d8
Fix race condition
RobinTF Nov 18, 2024
c9a5648
Fix lifetime issues and add documentation
RobinTF Nov 18, 2024
bc659c6
Properly reset runtime parameter after test
RobinTF Nov 19, 2024
112a757
Properly use `LocalVocab`s from fully materialized results
RobinTF Nov 19, 2024
d091bba
Properly apply permutation
RobinTF Nov 19, 2024
f585987
Spawn thread only if generator is consumed
RobinTF Nov 20, 2024
8ecc821
Fix issue with join aggregation
RobinTF Nov 20, 2024
190b8de
Fix typo
RobinTF Nov 20, 2024
3bbc466
Unify two generators
RobinTF Nov 20, 2024
3e6561c
Fix deadlock
RobinTF Nov 20, 2024
ec02b9f
Use robuster approach for thread safety
RobinTF Nov 20, 2024
cbc6252
Add test for operation failure propagation
RobinTF Nov 20, 2024
f56a00b
Merge remote-tracking branch 'origin/master' into lazy-join
Nov 20, 2024
1f54327
Show num-local-vocabs in runtime information
Nov 20, 2024
99ea8ee
Merge branch 'lazy-join' of github.com:RobinTF/qlever into lazy-join
Nov 20, 2024
1d5dcda
Fix problem from previous conflict resolution
Nov 20, 2024
b2b944f
Add unit test to ensure correct permutations
RobinTF Nov 20, 2024
1b55f45
Avoid out-of-line definition
RobinTF Nov 20, 2024
b9b7cae
Address PR comments
RobinTF Nov 21, 2024
0f4f859
Address simple PR comments
RobinTF Nov 22, 2024
62d9295
Allow more ranges to be used in `LocalVocab`
RobinTF Nov 22, 2024
c7889a6
Merge missing vocabs
RobinTF Nov 22, 2024
f1c886a
Merge remote-tracking branch 'ad-freiburg/master' into lazy-join
RobinTF Nov 22, 2024
9e357cd
Revert "Allow more ranges to be used in `LocalVocab`"
RobinTF Nov 22, 2024
711212a
Fix compilation
RobinTF Nov 22, 2024
4b51d1d
Address minor PR comments
RobinTF Nov 24, 2024
8e85cc7
Fix issues with vocab and add unit tests
RobinTF Nov 24, 2024
a8e3240
Add another unit test
RobinTF Nov 24, 2024
3f817fa
Fix typo
RobinTF Nov 24, 2024
ad0128a
Replace else case with assertion
RobinTF Nov 24, 2024
f88d4b8
Check empty table instead of vocab
RobinTF Nov 25, 2024
66cb836
Fix unit test
RobinTF Nov 25, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 56 additions & 24 deletions src/engine/AddCombinedRowToTable.h
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,8 @@ class AddCombinedRowToIdTable {
size_t numJoinColumns_;
std::optional<std::array<IdTableView<0>, 2>> inputLeftAndRight_;
IdTable resultTable_;
LocalVocab mergedVocab_{};
std::array<const LocalVocab*, 2> currentVocabs_{nullptr, nullptr};

// This struct stores the information, which row indices from the input are
// combined into a given row index in the output, i.e. "To obtain the
Expand Down Expand Up @@ -62,7 +64,7 @@ class AddCombinedRowToIdTable {
// This callback is called with the result as an argument each time `flush()`
// is called. It can be used to consume parts of the result early, before the
// complete operation has finished.
using BlockwiseCallback = std::function<void(IdTable&)>;
using BlockwiseCallback = std::function<void(IdTable&, LocalVocab&)>;
[[no_unique_address]] BlockwiseCallback blockwiseCallback_{ad_utility::noop};

using CancellationHandle = ad_utility::SharedCancellationHandle;
Expand Down Expand Up @@ -124,42 +126,60 @@ class AddCombinedRowToIdTable {
}
}

// Unwrap type `T` to get an `IdTableView<0>`, even if it's not an
// `IdTableView<0>`. Identity for `IdTableView<0>`.
template <typename T>
static IdTableView<0> toView(const T& table) {
if constexpr (requires { table.template asStaticView<0>(); }) {
return table.template asStaticView<0>();
} else {
return table;
}
}

// Merge the vocab contained in `T` with the `mergedVocab_` and set the passed
// pointer reference to the proper value.
RobinTF marked this conversation as resolved.
Show resolved Hide resolved
template <typename T>
void mergeVocab(const T& table, const LocalVocab*& currentVocab) {
AD_CORRECTNESS_CHECK(currentVocab == nullptr);
if constexpr (requires { table.getLocalVocab(); }) {
currentVocab = &table.getLocalVocab();
mergedVocab_.mergeWith(std::span{&table.getLocalVocab(), 1});
}
}

// Flush remaining pending entries before changing the input.
void flushBeforeInputChange() {
// Clear to avoid unnecessary merge.
currentVocabs_ = {nullptr, nullptr};
if (nextIndex_ != 0) {
AD_CORRECTNESS_CHECK(inputLeftAndRight_.has_value());
flush();
} else {
// Clear vocab when no rows were written.
mergedVocab_ = LocalVocab{};
}
}

// Set or reset the input. All following calls to `addRow` then refer to
// indices in the new input. Before resetting, `flush()` is called, so all the
// rows from the previous inputs get materialized before deleting the old
// inputs. The arguments to `inputLeft` and `inputRight` can either be
// `IdTable` or `IdTableView<0>`, or any other type that has a
// `asStaticView<0>` method that returns an `IdTableView<0>`.
void setInput(const auto& inputLeft, const auto& inputRight) {
auto toView = []<typename T>(const T& table) {
if constexpr (requires { table.template asStaticView<0>(); }) {
return table.template asStaticView<0>();
} else {
return table;
}
};
if (nextIndex_ != 0) {
AD_CORRECTNESS_CHECK(inputLeftAndRight_.has_value());
flush();
}
flushBeforeInputChange();
mergeVocab(inputLeft, currentVocabs_.at(0));
mergeVocab(inputRight, currentVocabs_.at(1));
inputLeftAndRight_ = std::array{toView(inputLeft), toView(inputRight)};
checkNumColumns();
}

// Only set the left input. After this it is only allowed to call
// `addOptionalRow` and not `addRow` until `setInput` has been called again.
void setOnlyLeftInputForOptionalJoin(const auto& inputLeft) {
auto toView = []<typename T>(const T& table) {
if constexpr (requires { table.template asStaticView<0>(); }) {
return table.template asStaticView<0>();
} else {
return table;
}
};
if (nextIndex_ != 0) {
AD_CORRECTNESS_CHECK(inputLeftAndRight_.has_value());
flush();
}
flushBeforeInputChange();
mergeVocab(inputLeft, currentVocabs_.at(0));
// The right input will be empty, but with the correct number of columns.
inputLeftAndRight_ = std::array{
toView(inputLeft),
Expand Down Expand Up @@ -188,6 +208,8 @@ class AddCombinedRowToIdTable {
return std::move(resultTable_);
}

LocalVocab& localVocab() { return mergedVocab_; }

// Disable copying and moving, it is currently not needed and makes it harder
// to reason about
AddCombinedRowToIdTable(const AddCombinedRowToIdTable&) = delete;
Expand Down Expand Up @@ -320,7 +342,17 @@ class AddCombinedRowToIdTable {
indexBuffer_.clear();
optionalIndexBuffer_.clear();
nextIndex_ = 0;
std::invoke(blockwiseCallback_, result);
std::invoke(blockwiseCallback_, result, mergedVocab_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely have to talk about the LocalVocab stuff, as this really aggressively merges the local vocabs over and over again in the following cases:

  • The undef blocks are set over and over again
  • We have lazy inputs, but a fully materialized result [The most common case]
  • We have many cartesian blocks (but there we typically have much more input rows than input vocabularies).

I see two potential angles here:

  1. Deduplicate inside the LocalVocab class (the most aggressive way: Use a HashMap instead of a vector for the otherWordSets.
  2. Fiddle with the internals of the join here (hard to do, because we have to further hack the ZipperJoinWithBlocksAndUndef to be aware of the result being fully materialized etc. That's why I am a little bit against it.
    I will ask @hannahbast what she thinks about 1. It will only have a performance impact for inputs with very many nonempty local vocabs, and these are typically slower anyway, so it seems feasible to me.

RobinTF marked this conversation as resolved.
Show resolved Hide resolved
// The current `IdTable`s might still be active, so we have to merge the
// local vocabs again if all other sets were moved-out.
if (mergedVocab_.numSets() == 1) {
RobinTF marked this conversation as resolved.
Show resolved Hide resolved
// Make sure to reset `mergedVocab_` so it is in a valid state again.
mergedVocab_ = LocalVocab{};
// Only merge non-null vocabs.
auto range = currentVocabs_ | std::views::filter(toBool) |
std::views::transform(dereference);
mergedVocab_.mergeWith(std::ranges::ref_view{range});
RobinTF marked this conversation as resolved.
Show resolved Hide resolved
}
}
const IdTableView<0>& inputLeft() const {
return inputLeftAndRight_.value()[0];
Expand Down
Loading
Loading