-
Notifications
You must be signed in to change notification settings - Fork 623
Syntax-Based Query Rewriter, Code Review #2 #1495
base: master
Are you sure you want to change the base?
Conversation
- pattern - rule - ruleset - group - groupexpression - binding - memo - optimize_context - optimizer_task (TopDownRewrite/BottomUpRewrite) Templates generally followed: template <class Node, class OperatorType, class OperatorExpr> The template instantiation associated with: Node = Operator, OperatorType = OpType, OperatorExpr = OperatorExpression is used primarily by the core Optimizer. All references to the templated files/classes from core optimizer files were instantiated to that. Note worth mentioning: Operator class defines a public interface wrapper around BaseOperatorNode, basically defines a single logical/physical operator. OpType class defines the various logical/physical operations OperatorExpression class is essentially a tree of Operator
Possibly annoying problems w.r.t Peloton/terrier: (1) Use of unique_ptr/raw pointer as opposed to shared_ptr in AbstractExpression (2) AbstractExpression equality comparison method Additional components needed: - Dynamic/template/strategy rule evaluation (particularly comparison) - Repeated/multi-level application of rules - Layer to convert from memo -> AbstractExpression - Some refactoring w.r.t templated code - Better AbsExpr_Container/Expression indirection layer (intended to present a similar interface exposed by Operator/OperatorExpression relied upon by core logic) - Proper memory management strategy (tightly coupled to problem #1)
What still doesn't work/don't care about yet/not done - proper memory management (terrier uses shared_ptr anyways) - other 1-level rewrites, multi-layer rewrites, other expr rewrites - how can we define a grammar to programmatically create these rewrites? (the one we have is way too static...) - in relation to logical equivalence: (1) how do we generate logically equivalent expressions: - multi-pass using generating rules (similar to ApplyRule) OR - from Pattern A, generate logically equivalent set of patterns P OR - transform input expression to match certain specification OR - ??? (2) what operators do we support translating? - probably (a AND b) ====> (b AND a) - probably (a OR b) ====> (b OR a) - probably (a = b) ====> (b = a) - maybe more??? (3) do we want multi level translations? - i.e (a AND b) AND c ====> (a AND (b AND c)) - what order do we do these in? May have to modify these operations: - Some assertions in TopDownRewrite/BottomUpRewrite w.r.t to the iterator - Possibly binding.cpp / optimizer_metadata.h / optimizer_task.cpp Issues still pending: - Comparing Values (Matt email/discussion) - r.rule->Check (terrier issue cmu-db#332)
AbstractNode will provide interface for Operator and eventually AbstractExpressions as well. Note there are a few road blocks before the rest of the rewriter can be changed to cleanly use abstract classes: (1) Similarly abstract OperatorExpressions. (2) We will have to find a good place to hide OpType, which is currently an enum type (cannot be abstracted) and pervades the code base. This may be solved by abstracting at the group level, but will have to look into it. (3) Need to clean up and separate interfaces between AbstractNode, OperatorNode, and Operator classes.
Abstract nodes were implemented in 209c46a. This is essentially just refactoring and plugging in abstract nodes throughout the optimizer. The abstract interface exposes OpType and ExpressionType for now, which ideally will be fixed later. Work remaining for abstracting OperatorExpression.
Still need to make fixes to Pattern to support both OpType and ExpType without templatizing. Will also need to clean up code after before build will work.
c55babe
to
e3ac1ba
Compare
@@ -250,6 +252,21 @@ void BindNodeVisitor::Visit(expression::TupleValueExpression *expr) { | |||
expr->SetColName(col_name); | |||
expr->SetValueType(value_type); | |||
expr->SetBoundOid(col_pos_tuple); | |||
|
|||
// TODO(esargent): Uncommenting the following code makes AddressSanitizer get mad at me with a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good issue to file, so that we don't make the same mistake when we write a new binder.
// | ||
// absexpr_expression.h | ||
// | ||
// Identification: src/include/optimizer/absexpr_expression.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, if I understand correctly, it is more convenient for you to be operating on something that is not directly the parser's expressions. We are bringing in logical operators or something like that (PR 232) in terrier, which are the outputs of the binder and inputs to the optimizer. This seems like it might belong there, then. I see that you note this in your design doc.
src/include/optimizer/rule.h
Outdated
class Rule { | ||
public: | ||
virtual ~Rule(){}; | ||
|
||
// TODO(ncx): pattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on this TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolving.
src/include/optimizer/rule_rewrite.h
Outdated
MEDIUM = 2, | ||
LOW = 1 | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good place to use macros. All the header stubs look the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolving.
src/optimizer/group.cpp
Outdated
|
||
// Additional assertion checks for AddExpression() with AST rewriting | ||
// TODO(ncx): get group expression type | ||
// if (std::is_same<Node, AbsExpr_Container>::value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason these are currently commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding comments in .h files (like rule_rewrite.h) could be useful to someone just reviewing the code without going deep into the implementation.
// | ||
// This is done to export the correct interface from the wrapped AbstractExpression | ||
// to the rest of the core rule/optimizer code/logic. | ||
class AbsExpr_Container: public AbstractNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow naming convention: AbsExprContainer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolving.
return expr->GetExpressionName(); | ||
} | ||
|
||
return "Undefined"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this case supposed to happen? If not, throw an exception instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolving.
}; | ||
|
||
|
||
class AbsExpr_Expression: public AbstractNodeExpression { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about the naming convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolving.
void RewriteLoop(int root_group_id); | ||
|
||
std::shared_ptr<AbsExpr_Expression> ConvertToAbsExpr(const expression::AbstractExpression *expr); | ||
std::shared_ptr<GroupExpression> RecordTreeGroups(const expression::AbstractExpression *expr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing documentation on these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved.
|
||
DISALLOW_COPY_AND_MOVE(Rewriter); | ||
|
||
OptimizerMetadata &GetMetadata() { return metadata_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doxygen will complain about a lack of doxygen documentation on public functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved.
|
||
~AbstractNode() {} | ||
|
||
virtual void Accept(OperatorVisitor *v) const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation on this would be really useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved.
Rewriter *rewriter = new Rewriter(); | ||
auto rewrote = rewriter->RewriteExpression(common); | ||
|
||
delete rewriter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The convention that I think we usually use is to use smart pointers with a custom scope to achieve the same thing.
TEST_F(RewriterTests, SingleCompareEqualRewritePassTrue) { | ||
type::Value leftValue = type::ValueFactory::GetIntegerValue(4); | ||
type::Value rightValue = type::ValueFactory::GetIntegerValue(4); | ||
auto left = new expression::ConstantValueExpression(leftValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Write in comments what this one is doing even though it may be relatively obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved.
std::vector<std::shared_ptr<AbstractNodeExpression>> &transformed, | ||
UNUSED_ATTRIBUTE OptimizeContext *context) const { | ||
(void)transformed; | ||
(void)context; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You already marked context as UNUSED_ATTRIBUTE on line 49. Take that macro out then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved.
@@ -19,6 +19,8 @@ | |||
#include "optimizer/rule.h" | |||
#include "settings/settings_manager.h" | |||
|
|||
#include <memory> | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some documentation of what OptimizerMetadata contains in optimizer_metadata would be nice as well.
@@ -111,6 +111,24 @@ class AbstractExpression : public Printable { | |||
children_[index].reset(expr); | |||
} | |||
|
|||
void ClearChildren() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great documentation for this function and how it fits in with the query rewriting process as a whole. Is this function only used once during the rewriting phase for a single query? If not, have you guys looked at how the rewriter’s performance is affected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is only used once during the rewriting phase in Peloton. As mentioned I believe in rewriter.cpp
, this function is solely intended for ensuring that our rewriter manages state (i.e memory, pointers) correctly.
return false; | ||
} | ||
|
||
bool operator==(const AbsExpr_Container &r) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does this == basically check if at least one of the AbsExpr_Container nodes is not a nullptr? Would it make sense to have this function call the more extensive checks on its contents? Or would this significantly affect your performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We believe extensive checks on the contents are important for performance reasons but not necessary for correctness. Due to Peloton's design, we could not use operator== directly (i.e ConstantValueExpression does not check the Value). We believe "terrier" implements operator== correctly.
// GroupMarkerExpression | ||
//===----------------------------------------------------------------------===// | ||
|
||
class GroupMarkerExpression : public AbstractExpression { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some brief high-level documentation about GroupMarkerExpression and how it is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved.
New additions:
Our full design doc and review guide can be found here.