-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Denial Constraint structures that are necessary to implement DC verification #417
base: main
Are you sure you want to change the base?
Conversation
b229892
to
2f7d757
Compare
This class represents an operator used in predicates for Denial Constrains (DC) representation. Predicates used there are less, greater, eq, neq, geq and leq. This C++ implementation is 100% bad, there is whole bunch of objects being created, but conceptually all of them are the same. Object "Operator '+'" and another object "Operator '+'" represents the same thing. ---------------------------------------------------------------------------- This is just a copy of a java code from https://github.com/RangerShaw/FastADC. I will refactor and think about better implementation later. I'll be just copying java code to get working algorithm ASAP, and after that I'll start thinking about good implementation.
This commit adds test_dc_structures.cpp file, which will be used to test different data structures which are required for DC representation (there are a lot).
This class represents a column operand within a predicate for FastADC. FastADC processes Denial Constraints (DCs) that involve comparisons between pairs of rows within a dataset. A typical DC example, derived from a Functional Dependency (FD) such as A -> B, is expressed as: ∀𝑡, 𝑠 ∈ 𝑟, ¬(𝑡.𝐴 = 𝑠.𝐴 ∧ 𝑡.𝐵 ≠ 𝑠.𝐵). This denotes that for any pair of rows in the relation, it should not be the case that while the values in column "A" are equal, the values in column "B" are unequal. A predicate in this context (e.g., 𝑡.𝐴 = 𝑠.𝐴) comprises three elements to be fully represented: the column operand from the first tuple ("t.A"), the comparison operator ("="), and the column operand from the second tuple ("s.A"). The `ColumnOperand` class encapsulates the column operand part of a predicate, such as "t.A" or "s.A".
First step in FastADC algorithm is to build so-called "Predicate Space". This is a long process during which many places in the code wants to get a Predicate. But each predicate is stored in a global storage -- map. In Java code this class (and other similar "provider" classes) are singletons. BaseProvider class is the class, from which a *Provider class should be derived. It ensures that only a PredicateBuilder class can initialize and free these singletons. I'm sure there exists a better approach, where we will store Provider classes in some fields to bind their lifetime more explicitly, but this is how it's done in Java, and I don't have much time to devise perfect architecture.
This class acts as a centralized storage to manage and provide access to Predicate objects. A Predicate is defined as "t1.A_i op t2.A_j", where t1 and t2 represent different rows, and A_i and A_j are columns (which may be the same or different) The FastADC algorithm first will build a so-called "Predicate Space", which is a set of all predicates that are allowed on R (set of rows, basically a table). In order to create and store predicates, this commit implements a singleton class with a hashmap storage.
FastADC processes Denial Constraints (DCs) that involve comparisons between pairs of rows within a dataset. A typical DC example, derived from a Functional Dependency such as A -> B, is expressed as: `forall t, s in r, not (t.A = s.A and t.B != s.B)` This denotes that for any pair of rows in the relation, it should not be the case that while the values in column "A" are equal, the values in column "B" are unequal. A predicate in this context (e.g., t.A == s.A) comprises three elements to be fully represented: the column operand from the first tuple ("t.A"), the comparison operator ("="), and the column operand from the second tuple ("s.A").
This simple test creates two predicates on a 2x2 table and evaluates them. We're checking for mo::GetPredicate function ability to correctly create a predicate
In the original FastADC pull request this class manages creation of predicates, so it initializes PredicateProvider. But in this pr this class is not required for DC verification. Hence adding a temorary class just to make the tests work
2f7d757
to
69ae72e
Compare
bool operator==(ColumnOperand const& rhs) const { | ||
return column_ == rhs.column_ && tuple_ == rhs.tuple_; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class has operator==
, but doesn't have operator!=
, which is weird. Define both or neither
@@ -0,0 +1,95 @@ | |||
#pragma once | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include what you use, that is all standard headers from which types/functions are used here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same applies to all files
static std::unordered_map<OperatorType, OperatorType> const InitializeInverseMap(); | ||
|
||
static std::unordered_map<OperatorType, OperatorType> const InitializeSymmetricMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
in return type does nothing
TypedColumnData const& lhs = col_data[l_.GetColumn()->GetIndex()]; | ||
TypedColumnData const& rhs = col_data[r_.GetColumn()->GetIndex()]; | ||
|
||
// Assumes that types in both columns are the same (and they should) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add assert then if it's expected
static std::unordered_map<OperatorType, OperatorType> const InitializeInverseMap(); | ||
|
||
static std::unordered_map<OperatorType, OperatorType> const InitializeSymmetricMap(); | ||
|
||
static std::unordered_map<OperatorType, std::vector<OperatorType>> const | ||
InitializeImplicationsMap(); | ||
|
||
static std::unordered_map<OperatorType, std::vector<OperatorType>> const | ||
InitializeTransitivesMap(); | ||
|
||
static std::unordered_map<OperatorType, std::string> const InitializeShortStringMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why all these functions are part of the public interface? Why do we need these functions at all?
They're used only to initialize static maps and the initialization itself is just an assignment. So why not simply make maps internally linked in src/core/algorithms/dc/operator.cpp
and assign correct values in the point of declaration without any additional functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, strictly speaking google style guide forbids static variables of type with throwing constructor (because it's impossible to catch the exception). So it's a good place to use frozen containers
template <typename Derived> | ||
class BaseProvider { | ||
protected: | ||
static std::shared_ptr<Derived> instance_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Singleton shared pointer is a bad design. (we discussed this extensively in private messages)
PredicatePtr Predicate::GetSymmetric() const { | ||
if (!symmetric_) { | ||
symmetric_ = GetPredicate(op_.GetSymmetric(), r_, l_); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a problem with this caching strategy: when the method is const
, it is implicitly assumed by other developers that it's thread safe, which is not the case for these methods and may cause subtle data race bugs in the future.
I'd think about more robust design of caching here (also these mutable
fields just look bad)
auto test = [&](int l, int r) { | ||
this->SetVals(l, r); | ||
|
||
for (auto& op : all_operators_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bad auto usage (according to google code style), op
type is not immediately obvious
// FIXME: this is wrong, there should be one class representing the algorithm, | ||
// which will create/clear singletons and call PredicateBuilder, PliShardBuiler, | ||
// SomeOtherBuilder in order.. | ||
friend PredicateBuilder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is declared friend though? This class doesn't have private fields at all
// predicates_[op][col1][col2] corresponds to the related Predicate object | ||
std::unordered_map<Operator, OperatorMap> predicates_; | ||
|
||
friend BaseProvider<PredicateProvider>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why making it firend? BaseProvider
doesn't use any private fields of this class as far as I can see
No description provided.