Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow selecting all variables as payload in spatial search SERVICE #1656

Merged
merged 8 commits into from
Dec 6, 2024
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/engine/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ add_library(engine
Union.cpp MultiColumnJoin.cpp TransitivePathBase.cpp
TransitivePathHashMap.cpp TransitivePathBinSearch.cpp Service.cpp
Values.cpp Bind.cpp Minus.cpp RuntimeInformation.cpp CheckUsePatternTrick.cpp
VariableToColumnMap.cpp ExportQueryExecutionTrees.cpp
VariableToColumnMap.cpp ExportQueryExecutionTrees.cpp PayloadVariables.cpp
CartesianProductJoin.cpp TextIndexScanForWord.cpp TextIndexScanForEntity.cpp
TextLimit.cpp LazyGroupBy.cpp GroupByHashMapOptimization.cpp SpatialJoin.cpp
CountConnectedSubgraphs.cpp SpatialJoinAlgorithms.cpp PathSearch.cpp ExecuteUpdate.cpp)
Expand Down
72 changes: 72 additions & 0 deletions src/engine/PayloadVariables.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
// Copyright 2024, University of Freiburg,
// Chair of Algorithms and Data Structures.
// Author: Christoph Ullinger <[email protected]>

#include "engine/PayloadVariables.h"

// ____________________________________________________________________________
PayloadVariables::PayloadVariables(std::vector<Variable> variables)
: variables_{std::move(variables)} {};

// ____________________________________________________________________________
PayloadVariables PayloadVariables::all() {
PayloadVariables pv{};
pv.setToAll();
return pv;
};

// ____________________________________________________________________________
void PayloadVariables::addVariable(const Variable& variable) {
// Helper visitor to check if the payload variables has not been set to all
// and a variable can be added. If yes, add it.
auto addVarVisitor = [&]<typename T>(T& value) {
if constexpr (std::is_same_v<T, std::vector<Variable>>) {
value.push_back(variable);
}
};

std::visit(addVarVisitor, variables_);
};

// ____________________________________________________________________________
void PayloadVariables::setToAll() {
variables_ = detail::PayloadAllVariables{};
};

// ____________________________________________________________________________
bool PayloadVariables::empty() const {
// Helper visitor to check if the payload variables are empty
auto emptyVisitor = []<typename T>(const T& value) -> bool {
if constexpr (std::is_same_v<T, detail::PayloadAllVariables>) {
return false;
} else {
static_assert(std::is_same_v<T, std::vector<Variable>>);
return value.empty();
}
};

return std::visit(emptyVisitor, variables_);
};

// ____________________________________________________________________________
bool PayloadVariables::isAll() const {
return std::holds_alternative<detail::PayloadAllVariables>(variables_);
};

// ____________________________________________________________________________
const std::vector<Variable>& PayloadVariables::getVariables() const {
// Helper visitor to check if the payload variables has been set to all: then
// throw, otherwise return the vector
auto getVarVisitor =
[]<typename T>(const T& value) -> const std::vector<Variable>& {
if constexpr (std::is_same_v<T, std::vector<Variable>>) {
return value;
} else {
AD_THROW(
"getVariables may only be called on a non-all PayloadVariables "
"object.");
}
};

return std::visit(getVarVisitor, variables_);
};
57 changes: 57 additions & 0 deletions src/engine/PayloadVariables.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
// Copyright 2024, University of Freiburg,
// Chair of Algorithms and Data Structures.
// Author: Christoph Ullinger <[email protected]>

#pragma once

#include <vector>

#include "parser/data/Variable.h"

namespace detail {
// Represents the selection of all variables as payload
struct PayloadAllVariables : std::monostate {
bool operator==([[maybe_unused]] const std::vector<Variable>& other) const {
return false;
};

Check warning on line 16 in src/engine/PayloadVariables.h

View check run for this annotation

Codecov / codecov/patch

src/engine/PayloadVariables.h#L14-L16

Added lines #L14 - L16 were not covered by tests
};
} // namespace detail

// This class represents a list of variables to be included in the result of
// an operation. This is currently used in the spatial search.
class PayloadVariables {
public:
// Construct an empty payload variables object
PayloadVariables() = default;

// Construct a payload variables object from a vector of variables
PayloadVariables(std::vector<Variable> variables);

// Construct a payload variables object that is set to all
static PayloadVariables all();

// Add a variable to the payload variables or do nothing if all variables are
// already selected
void addVariable(const Variable& variable);

// Select all variables.
void setToAll();

// Returns whether the payload variables object is empty.
bool empty() const;

// Returns whether all variables have been selected.
bool isAll() const;

// Returns a vector of variables if all has not been set. Otherwise throws.
const std::vector<Variable>& getVariables() const;

// For testing: equality operator
bool operator==(const PayloadVariables& other) const {
return variables_ == other.variables_;
};

private:
std::variant<detail::PayloadAllVariables, std::vector<Variable>> variables_ =
std::vector<Variable>{};
};
75 changes: 32 additions & 43 deletions src/engine/SpatialJoin.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -175,30 +175,24 @@ string SpatialJoin::getDescriptor() const {
// ____________________________________________________________________________
size_t SpatialJoin::getResultWidth() const {
if (childLeft_ && childRight_) {
// Helper visitor lambda to determine the size of the right child after
// selecting only the join column and explicitly requested payload columns
auto sizeRight = [this]<typename T>(const T& value) {
if constexpr (std::is_same_v<T, PayloadAllVariables>) {
return childRight_->getResultWidth();
} else {
static_assert(std::is_same_v<T, std::vector<Variable>>);

// We convert to a set here, because we allow multiple occurrences of
// variables in payloadVariables_
absl::flat_hash_set<Variable> pvSet{value.begin(), value.end()};

// The payloadVariables_ may contain the right join variable
return pvSet.size() + (pvSet.contains(config_.right_) ? 0 : 1);
}
};

// don't subtract anything because of a common join column. In the case
// of the spatial join, the join column is different for both sides (e.g.
// objects with a distance of at most 500m to each other, then each object
// might contain different positions, which should be kept).
// For the right join table we only use the selected columns.
auto widthChildren = childLeft_->getResultWidth() +
std::visit(sizeRight, config_.payloadVariables_);
size_t sizeRight;
if (config_.payloadVariables_.isAll()) {
sizeRight = childRight_->getResultWidth();
} else {
// We convert to a set here, because we allow multiple occurrences of
// variables in payloadVariables_
std::vector<Variable> pv = config_.payloadVariables_.getVariables();
absl::flat_hash_set<Variable> pvSet{pv.begin(), pv.end()};

// The payloadVariables_ may contain the right join variable
return pvSet.size() + (pvSet.contains(config_.right_) ? 0 : 1);
}
auto widthChildren = childLeft_->getResultWidth() + sizeRight;

if (config_.distanceVariable_.has_value()) {
return widthChildren + 1;
Expand Down Expand Up @@ -324,34 +318,29 @@ VariableToColumnMap SpatialJoin::getVarColMapPayloadVars() const {
"Child right must be added before payload variable to "
"column map can be computed.");

auto payloadVariablesVisitor = [this]<typename T>(const T& value) {
const auto& varColMap = childRight_->getVariableColumns();
VariableToColumnMap newVarColMap;
const auto& varColMap = childRight_->getVariableColumns();
VariableToColumnMap newVarColMap;

if constexpr (std::is_same_v<T, PayloadAllVariables>) {
newVarColMap = VariableToColumnMap{varColMap};
} else {
static_assert(std::is_same_v<T, std::vector<Variable>>);
for (const auto& var : value) {
if (varColMap.contains(var)) {
newVarColMap.insert_or_assign(var, varColMap.at(var));
} else {
addWarningOrThrow(
absl::StrCat("Variable '", var.name(),
"' selected as payload to spatial join but not "
"present in right child."));
}
if (config_.payloadVariables_.isAll()) {
newVarColMap = VariableToColumnMap{varColMap};
} else {
for (const auto& var : config_.payloadVariables_.getVariables()) {
if (varColMap.contains(var)) {
newVarColMap.insert_or_assign(var, varColMap.at(var));
} else {
addWarningOrThrow(
absl::StrCat("Variable '", var.name(),
"' selected as payload to spatial join but not "
"present in right child."));
}
AD_CONTRACT_CHECK(
varColMap.contains(config_.right_),
"Right variable is not defined in right child of spatial join.");
newVarColMap.insert_or_assign(config_.right_,
varColMap.at(config_.right_));
}
return newVarColMap;
};
AD_CONTRACT_CHECK(
varColMap.contains(config_.right_),
"Right variable is not defined in right child of spatial join.");
newVarColMap.insert_or_assign(config_.right_, varColMap.at(config_.right_));
}

return std::visit(payloadVariablesVisitor, config_.payloadVariables_);
return newVarColMap;
}

// ____________________________________________________________________________
Expand Down
15 changes: 2 additions & 13 deletions src/engine/SpatialJoin.h
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
#include <variant>

#include "engine/Operation.h"
#include "engine/PayloadVariables.h"
#include "global/Id.h"
#include "parser/data/Variable.h"

Expand All @@ -27,18 +28,6 @@ struct MaxDistanceConfig {
// Configuration to restrict the results provided by the SpatialJoin
using SpatialJoinTask = std::variant<NearestNeighborsConfig, MaxDistanceConfig>;

// Represents the selection of all variables of the right join table as payload
struct PayloadAllVariables : std::monostate {
bool operator==([[maybe_unused]] const std::vector<Variable>& other) const {
return false;
};
};

// Configuration to select which columns of the right join table should be
// part of the result.
using PayloadVariables =
std::variant<PayloadAllVariables, std::vector<Variable>>;

// Selection of a SpatialJoin algorithm
enum class SpatialJoinAlgorithm { BASELINE, S2_GEOMETRY, BOUNDING_BOX };
const SpatialJoinAlgorithm SPATIAL_JOIN_DEFAULT_ALGORITHM =
Expand All @@ -60,7 +49,7 @@ struct SpatialJoinConfiguration {
// If given a vector of variables, the selected variables will be part of the
// result table - the join column will automatically be part of the result.
// You may use PayloadAllVariables to select all columns of the right table.
PayloadVariables payloadVariables_ = PayloadAllVariables{};
PayloadVariables payloadVariables_ = PayloadVariables::all();

// Choice of algorithm. Both algorithms have equal results, but different
// runtime characteristics.
Expand Down
26 changes: 23 additions & 3 deletions src/parser/SpatialQuery.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,13 @@

#include "parser/SpatialQuery.h"

#include <type_traits>
#include <variant>

#include "engine/PayloadVariables.h"
#include "engine/SpatialJoin.h"
#include "parser/MagicServiceIriConstants.h"
#include "parser/data/Variable.h"

namespace parsedQuery {

Expand Down Expand Up @@ -60,7 +65,22 @@ void SpatialQuery::addParameter(const SparqlTriple& triple) {
"<baseline>, <s2> or <boundingBox>.");
}
} else if (predString == "payload") {
payloadVariables_.push_back(getVariable("payload", object));
if (object.isVariable()) {
// Single selected variable

// If we have already selected all payload variables, we can ignore
// another explicitly selected variable.
payloadVariables_.addVariable(getVariable("payload", object));
} else if (object.isIri() &&
extractParameterName(object, SPATIAL_SEARCH_IRI) == "all") {
// All variables selected
payloadVariables_.setToAll();
} else {
throw SpatialSearchException(
"The argument to the <payload> parameter must be either a variable "
"to be selected or <all>.");
}

} else {
throw SpatialSearchException(absl::StrCat(
"Unsupported argument ", predString,
Expand Down Expand Up @@ -95,7 +115,7 @@ SpatialJoinConfiguration SpatialQuery::toSpatialJoinConfiguration() const {
"spatialSearch: { [Config Triples] { <Something> <ThatSelects> ?right "
"} }.");
} else if (!ignoreMissingRightChild_ && !childGraphPattern_.has_value() &&
!payloadVariables_.empty()) {
!payloadVariables_.isAll() && !payloadVariables_.empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the !isAll call redundant, because isAll() directly implies !empty() ?

Copy link
Collaborator Author

@ullingerc ullingerc Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. If I'm not totally mistaken, this is neither logically nor semantically redundant.

  • Logically because isAll implies not-empty but not-empty does not imply isAll
  • Semantically because there is a subtle edge case, where this has an actual impact: Assume the user does a max-dist-join and declares left and right outside of the service. Then they give the configuration <payload> <all>. This is the default behavior anyway. But without the !isAll() here we would throw despite the user's configuration just stating the default value.

Edit: I will add a unit test for the edge case.

throw SpatialSearchException(
"The right variable for the spatial search is declared outside the "
"SERVICE, but the <payload> parameter was set. Please move the "
Expand All @@ -112,7 +132,7 @@ SpatialJoinConfiguration SpatialQuery::toSpatialJoinConfiguration() const {
// Payload variables
PayloadVariables pv;
if (!childGraphPattern_.has_value()) {
pv = PayloadAllVariables{};
pv = PayloadVariables::all();
} else {
pv = payloadVariables_;
}
Expand Down
3 changes: 2 additions & 1 deletion src/parser/SpatialQuery.h
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@

#pragma once

#include "engine/PayloadVariables.h"
#include "parser/GraphPattern.h"
#include "parser/MagicServiceQuery.h"

Expand Down Expand Up @@ -39,7 +40,7 @@ struct SpatialQuery : MagicServiceQuery {
// this vector is required to be empty - the user may not specify the payload
// configuration parameter. It will then be automatically set to
// `PayloadAllVariables` to ensure appropriate semantics.
std::vector<Variable> payloadVariables_;
PayloadVariables payloadVariables_;

// Optional further argument: the join algorithm. If it is not given, the
// default algorithm is used implicitly.
Expand Down
Loading
Loading