Syntax-based query rewriter #1494
- pattern
- rule
- ruleset
- group
- groupexpression
- binding
- memo
- optimize_context
- optimizer_task (TopDownRewrite/BottomUpRewrite)

Templates generally followed: template <class Node, class OperatorType, class OperatorExpr>

The template instantiation associated with Node = Operator, OperatorType = OpType, OperatorExpr = OperatorExpression is used primarily by the core Optimizer. All references to the templated files/classes from core optimizer files were instantiated to that.

Notes worth mentioning:
- The Operator class defines a public interface wrapper around BaseOperatorNode; it basically defines a single logical/physical operator.
- The OpType class defines the various logical/physical operations.
- The OperatorExpression class is essentially a tree of Operator.
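As a rough illustration of the templating described above, here is a minimal sketch in which one templated class is instantiated once for the optimizer and once for the rewriter. The class bodies are simplified stand-ins; `ExprTree`, `FromRoot`, `Type`, and the two alias names at the bottom are hypothetical and not taken from this PR.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-ins: the real classes carry far more state.
enum class OpType { Get, Filter };
enum class ExpressionType { CompareEqual, ValueConstant };

struct Operator { OpType type; };
struct AbsExpr_Container { ExpressionType type; };

// An expression is a tree of nodes (hypothetical shared shape).
template <class Node>
struct ExprTree {
  Node node;
  std::vector<std::shared_ptr<ExprTree<Node>>> children;
};
using OperatorExpression = ExprTree<Operator>;
using AbsExpr_Expression = ExprTree<AbsExpr_Container>;

// One templated class serves both the optimizer and the rewriter.
template <class Node, class OperatorType, class OperatorExpr>
class GroupExpression {
 public:
  explicit GroupExpression(Node node) : node_(std::move(node)) {}

  // Builds a group expression from the root of an expression tree.
  static GroupExpression FromRoot(const OperatorExpr &expr) {
    return GroupExpression(expr.node);
  }

  OperatorType Type() const { return node_.type; }

 private:
  Node node_;
};

// Instantiation used by the core optimizer.
using PlanGroupExpression =
    GroupExpression<Operator, OpType, OperatorExpression>;
// Instantiation used by the expression rewriter.
using RewriteGroupExpression =
    GroupExpression<AbsExpr_Container, ExpressionType, AbsExpr_Expression>;
```

The shared logic only relies on what both node types expose (here just a `type` field), which is the role the AbsExpr_Container/Expression indirection layer is meant to play.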
Possibly annoying problems w.r.t Peloton/terrier:
(1) Use of unique_ptr/raw pointer as opposed to shared_ptr in AbstractExpression
(2) AbstractExpression equality comparison method

Additional components needed:
- Dynamic/template/strategy rule evaluation (particularly comparison)
- Repeated/multi-level application of rules
- Layer to convert from memo -> AbstractExpression
- Some refactoring w.r.t templated code
- Better AbsExpr_Container/Expression indirection layer (intended to present a similar interface exposed by Operator/OperatorExpression relied upon by core logic)
- Proper memory management strategy (tightly coupled to problem #1)
What still doesn't work/don't care about yet/not done:
- proper memory management (terrier uses shared_ptr anyways)
- other 1-level rewrites, multi-layer rewrites, other expr rewrites
- how can we define a grammar to programmatically create these rewrites? (the one we have is way too static...)
- in relation to logical equivalence:
  (1) how do we generate logically equivalent expressions:
      - multi-pass using generating rules (similar to ApplyRule) OR
      - from Pattern A, generate logically equivalent set of patterns P OR
      - transform input expression to match certain specification OR
      - ???
  (2) what operators do we support translating?
      - probably (a AND b) ====> (b AND a)
      - probably (a OR b) ====> (b OR a)
      - probably (a = b) ====> (b = a)
      - maybe more???
  (3) do we want multi level translations?
      - i.e. (a AND b) AND c ====> (a AND (b AND c))
      - what order do we do these in?

May have to modify these operations:
- Some assertions in TopDownRewrite/BottomUpRewrite w.r.t. the iterator
- Possibly binding.cpp / optimizer_metadata.h / optimizer_task.cpp

Issues still pending:
- Comparing Values (Matt email/discussion)
- r.rule->Check (terrier issue cmu-db#332)
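For question (2) in the list above, a one-level commutative swap could look roughly like the sketch below. Everything in it (`Expr`, `ExprKind`, `CommutativeVariants`, `ToString`) is hypothetical and stands in for AbstractExpression; it is not code from this PR and it does not touch the memo.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature expression tree, not AbstractExpression.
enum class ExprKind { And, Or, Equal, Leaf };

struct Expr {
  ExprKind kind;
  std::string name;  // only meaningful for leaves
  std::shared_ptr<Expr> left;
  std::shared_ptr<Expr> right;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr Leaf(const std::string &name) {
  return std::make_shared<Expr>(Expr{ExprKind::Leaf, name, nullptr, nullptr});
}

ExprPtr Binary(ExprKind kind, ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{kind, "", std::move(l), std::move(r)});
}

bool IsCommutative(ExprKind kind) {
  return kind == ExprKind::And || kind == ExprKind::Or ||
         kind == ExprKind::Equal;
}

// Returns the input plus, for a commutative root, the variant with
// swapped children. Only the root level is considered.
std::vector<ExprPtr> CommutativeVariants(const ExprPtr &root) {
  std::vector<ExprPtr> variants{root};
  if (root && IsCommutative(root->kind)) {
    variants.push_back(Binary(root->kind, root->right, root->left));
  }
  return variants;
}

std::string ToString(const ExprPtr &e) {
  if (e->kind == ExprKind::Leaf) return e->name;
  const char *op = e->kind == ExprKind::And  ? " AND "
                   : e->kind == ExprKind::Or ? " OR "
                                             : " = ";
  return "(" + ToString(e->left) + op + ToString(e->right) + ")";
}
```

Multi-level translations as in question (3) would need this applied recursively, which is where the ordering question comes in; the sketch deliberately stops at one level.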
TEST_F(RewriterTests, SimpleEqualityTree) {
  //          [=]
  //      [=]     [=]
  //    [4] [5] [3] [3]
Maybe add a line to show what you expect it to be rewritten to, i.e. // false.
Also add a comment similar to this one to the other tests so we can easily know what they are testing.
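To make the expected result for this tree concrete: (4 = 5) folds to false, (3 = 3) folds to true, and (false = true) folds to false. Below is a minimal standalone sketch of that bottom-up folding. `Node`, `Int`, `Bool`, `Eq`, and `Fold` are hypothetical names; the PR's actual rule goes through the pattern/memo machinery instead.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical miniature expression node, not AbstractExpression.
struct Node {
  enum class Kind { IntConst, BoolConst, Equal };
  Kind kind = Kind::IntConst;
  int int_value = 0;
  bool bool_value = false;
  std::shared_ptr<Node> left;
  std::shared_ptr<Node> right;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr Int(int v) {
  auto n = std::make_shared<Node>();
  n->kind = Node::Kind::IntConst;
  n->int_value = v;
  return n;
}

NodePtr Bool(bool v) {
  auto n = std::make_shared<Node>();
  n->kind = Node::Kind::BoolConst;
  n->bool_value = v;
  return n;
}

NodePtr Eq(NodePtr l, NodePtr r) {
  auto n = std::make_shared<Node>();
  n->kind = Node::Kind::Equal;
  n->left = std::move(l);
  n->right = std::move(r);
  return n;
}

// Folds equality comparisons between constants, children first.
NodePtr Fold(const NodePtr &n) {
  if (n->kind != Node::Kind::Equal) return n;
  NodePtr l = Fold(n->left);
  NodePtr r = Fold(n->right);
  if (l->kind == Node::Kind::IntConst && r->kind == Node::Kind::IntConst) {
    return Bool(l->int_value == r->int_value);
  }
  if (l->kind == Node::Kind::BoolConst && r->kind == Node::Kind::BoolConst) {
    return Bool(l->bool_value == r->bool_value);
  }
  return Eq(l, r);  // not foldable; keep the comparison
}
```

Calling `Fold(Eq(Eq(Int(4), Int(5)), Eq(Int(3), Int(3))))` on the tree from the test comment yields a boolean constant holding false.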
  LOW = 1
};

class ComparatorElimination: public Rule<AbsExpr_Container,ExpressionType,AbsExpr_Expression> {
It would be nice if you added some documentation to this, even if it's simplistic.
@@ -68,10 +73,10 @@ class Memo {
  //===--------------------------------------------------------------------===//
  // For rewrite phase: remove and add expression directly for the set
  //===--------------------------------------------------------------------===//
- void RemoveParExpressionForRewirte(GroupExpression* gexpr) {
+ void RemoveParExpressionForRewirte(GroupExpression<Node,OperatorType,OperatorExpr>* gexpr) {
Typo?
  Rewriter(const Rewriter &) = delete;
  Rewriter &operator=(const Rewriter &) = delete;
  Rewriter(Rewriter &&) = delete;
  Rewriter &operator=(Rewriter &&) = delete;
Not sure if this is necessary as this is default behavior I think
Consider DISALLOW_COPY_AND_MOVE macro.
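For reference, a macro of this kind is typically defined along the lines below. This is a hypothetical definition for illustration; the macro in the actual code base may differ in name placement or details. `PlainRewriter` is also hypothetical and is only there to show that a class with no user-declared special members (and no non-copyable members) is copyable and movable by default.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical definition; check the project's own macro before relying on it.
#define DISALLOW_COPY_AND_MOVE(cname)       \
  cname(const cname &) = delete;            \
  cname &operator=(const cname &) = delete; \
  cname(cname &&) = delete;                 \
  cname &operator=(cname &&) = delete;

class Rewriter {
 public:
  Rewriter() = default;
  DISALLOW_COPY_AND_MOVE(Rewriter)
};

// No special members declared: the compiler generates copy and move.
class PlainRewriter {};

static_assert(!std::is_copy_constructible<Rewriter>::value,
              "Rewriter must not be copyable");
static_assert(!std::is_move_constructible<Rewriter>::value,
              "Rewriter must not be movable");
static_assert(std::is_copy_constructible<PlainRewriter>::value,
              "a plain class is copyable by default");
```

The macro keeps the four deleted declarations in one place, so the intent is visible at a glance in the class body.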
@@ -14,34 +14,71 @@
#include "optimizer/memo.h"
It would be nice if you added more documentation to memo.h to give a better high-level idea of what these Memo objects are used for.
// AbsExpr_Container does *not* handle memory correctly w.r.t internal instantiations
// from Rule transformation. This is since Peloton itself mixes unique_ptrs and
// hands out raw pointers which makes adding a shared_ptr here extremely problematic.
// terrier uses only shared_ptr when dealing with AbstractExpression trees.
(All Terrier parser behavior can be changed, just FYI. If anything would make it more convenient for you, make the case for it.)
@@ -85,16 +88,18 @@ class Optimizer : public AbstractOptimizer {

  void Reset() override;

- OptimizerMetadata &GetMetadata() { return metadata_; }
+ OptimizerMetadata<Operator,OpType,OperatorExpression> &GetMetadata() { return metadata_; }

  /* For test purposes only */
Kind of inherited from bad decisions before, but in terrier we try pretty hard to not have public test-only functions and use FRIEND_TEST instead.
namespace optimizer {

/* Rules are applied from high to low priority */
enum class RulePriority : int {
(I know nothing about optimizers) Is priority a well-defined concept for optimization or is this a rough heuristic? What happens if you have two rules of the same priority that can both be applied, is it always arbitrary which should go first?
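On the tie-breaking question, one way to make same-priority ordering deterministic is a stable sort, so rules of equal priority keep their registration order. This is only a sketch of that idea, not a description of how this PR or Peloton actually orders rules. `RuleEntry` and `OrderByPriority` are hypothetical, and the MEDIUM and HIGH values are assumptions (the diff only shows LOW = 1).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// LOW = 1 matches the diff; MEDIUM and HIGH are assumed values.
enum class RulePriority : int { LOW = 1, MEDIUM = 2, HIGH = 3 };

// Hypothetical pairing of a rule name with its priority.
struct RuleEntry {
  std::string name;
  RulePriority priority;
};

// Orders rules from high to low priority. std::stable_sort keeps the
// original relative order of rules that share a priority.
void OrderByPriority(std::vector<RuleEntry> &rules) {
  std::stable_sort(rules.begin(), rules.end(),
                   [](const RuleEntry &a, const RuleEntry &b) {
                     return static_cast<int>(a.priority) >
                            static_cast<int>(b.priority);
                   });
}
```

With registration order A (LOW), B (HIGH), C (LOW), D (HIGH), this produces B, D, A, C: ties are resolved by registration order rather than arbitrarily.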
    }
    return false;
  }
This may not be a huge deal since, as you said, the == op doesn't affect correctness but do you intend to implement the == op for rewrites? If so, does there exist an == op for AbstractExpression you can use?
I believe Terrier currently only has the notion of logical equality for abstract expressions.
    }
  }
}
What does Rebuild do in context of its AbsExpr_Container? You could probably add some more documentation for this function.
#include "optimizer/rule.h"
#include "optimizer/absexpr_expression.h"

#include <memory>
nitpick: clang-tidy will complain about the ordering of the includes, so be aware of that in the future. I believe native libraries such as <memory> should go first here.
#include "expression/comparison_expression.h"
#include "expression/constant_value_expression.h"

#include <memory>
Same nitpick about ordering #include's.
using GroupExpressionTemplate = GroupExpression<AbsExpr_Container,ExpressionType,AbsExpr_Expression>;

using GroupTemplate = Group<AbsExpr_Container,ExpressionType,AbsExpr_Expression>;
Would be nice to see more documentation of the functions in Rewriter, as well as explanations for the templates you use. Maybe a high-level explanation of what the rewriter does at the top of the rewriter.h file would be helpful?
This may just be due to Peloton, but the code in some parts is fairly cryptic, so it would help to have more documentation to understand what it's doing. Also, I think terrier may reject PRs that do not contain enough documentation for functions.
                  std::shared_ptr<PropertySet> required_prop,
                  double cost_upper_bound = std::numeric_limits<double>::max())
      : metadata(metadata),
        required_prop(required_prop),
        cost_upper_bound(cost_upper_bound) {}

- OptimizerMetadata *metadata;
+ OptimizerMetadata<Operator,OperatorType,OperatorExpr> *metadata;
Just for style: the rest of the code names class members as metadata_ or required_prop_.
In addition, is there a particular reason why these members are public?
  virtual bool Empty() = 0;
};

/**
 * @brief Stack implementation of the task pool
 */
class OptimizerTaskStack : public OptimizerTaskPool {
template <class Node, class OperatorType, class OperatorExpr>
Consider using the final keyword if this class is not meant to be inherited from.
The code here uses templates across many optimizer files to leverage the old query rewriter and allow it to operate on expression trees. A very simple rewriting task is passing. Use of abstract interfaces may provide a cleaner way to generalize the rewriter, and the code is currently in progress in a separate branch of development.