This repository has been archived by the owner on Sep 27, 2019. It is now read-only.

Syntax-based query rewriter #1494

Open: wants to merge 3 commits into base: master

@newtoncx newtoncx commented Apr 9, 2019

The code here templates many optimizer files so that the old query rewriter can be leveraged to operate on expression trees. One very simple rewriting task currently passes. Abstract interfaces may provide a cleaner way to generalize the rewriter; that approach is in progress on a separate development branch.

The templated files/classes are:

- pattern
- rule
- ruleset
- group
- groupexpression
- binding
- memo
- optimize_context
- optimizer_task (TopDownRewrite/BottomUpRewrite)

Templates generally followed:
template <class Node, class OperatorType, class OperatorExpr>

The template instantiation associated with:
Node = Operator, OperatorType = OpType, OperatorExpr = OperatorExpression
is used primarily by the core Optimizer. All references to the templated
files/classes from core optimizer files were instantiated to that.
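A minimal sketch of the idea, assuming nothing about the real class bodies: one templated component is written once and instantiated both for the core optimizer and for the rewriter. `ExprTree`, `Pattern::MatchesRoot`, and the enum members below are hypothetical stand-ins, not Peloton's definitions.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the optimizer-side types.
enum class OpType { Get, Filter };
struct Operator { OpType type; };

// Hypothetical stand-ins for the rewriter-side types.
enum class ExpressionType { CompareEqual, ValueConstant };
struct AbsExprContainer { ExpressionType type; };

// A tree whose node payload and type tag are both template parameters.
template <class Node, class NodeType>
class ExprTree {
 public:
  explicit ExprTree(Node node) : node_(std::move(node)) {}
  void PushChild(std::shared_ptr<ExprTree> child) { children_.push_back(std::move(child)); }
  NodeType Type() const { return node_.type; }
  const std::vector<std::shared_ptr<ExprTree>> &Children() const { return children_; }

 private:
  Node node_;
  std::vector<std::shared_ptr<ExprTree>> children_;
};

// One templated component, following the parameter list used in the PR.
template <class Node, class OperatorType, class OperatorExpr>
class Pattern {
 public:
  explicit Pattern(OperatorType type) : type_(type) {}
  // True when the root of `expr` carries the type this pattern looks for.
  bool MatchesRoot(const OperatorExpr &expr) const { return expr.Type() == type_; }

 private:
  OperatorType type_;
};

// Instantiation for the core optimizer and for the rewriter.
using OperatorExpression = ExprTree<Operator, OpType>;
using AbsExprExpression = ExprTree<AbsExprContainer, ExpressionType>;
using OptimizerPattern = Pattern<Operator, OpType, OperatorExpression>;
using RewriterPattern = Pattern<AbsExprContainer, ExpressionType, AbsExprExpression>;
```

With aliases like these, core optimizer code keeps referring to one fixed instantiation, while the rewriter picks the other.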

Notes worth mentioning:
- The Operator class defines a public interface wrapper around BaseOperatorNode; it basically defines a single logical/physical operator.
- The OpType class defines the various logical/physical operations.
- The OperatorExpression class is essentially a tree of Operator.

Possibly annoying problems w.r.t. Peloton/terrier:
(1) Use of unique_ptr/raw pointer as opposed to shared_ptr in AbstractExpression
(2) AbstractExpression equality comparison method

Additional components needed:
- Dynamic/template/strategy rule evaluation (particularly comparison)
- Repeated/multi-level application of rules
- Layer to convert from memo -> AbstractExpression
- Some refactoring w.r.t templated code
- Better AbsExpr_Container/Expression indirection layer
  (intended to present a similar interface exposed by
   Operator/OperatorExpression relied upon by core logic)
- Proper memory management strategy (tightly coupled to problem (1) above)

What still doesn't work / isn't a priority yet / isn't done:
- proper memory management (terrier uses shared_ptr anyways)

- other 1-level rewrites, multi-layer rewrites, other expr rewrites

- how can we define a grammar to programmatically create these rewrites?
  (the one we have is way too static...)

- in relation to logical equivalence:
  (1) how do we generate logically equivalent expressions:
      - multi-pass using generating rules (similar to ApplyRule) OR
      - from Pattern A, generate logically equivalent set of patterns P OR
      - transform input expression to match certain specification OR
      - ???
  (2) what operators do we support translating?
      - probably (a AND b) ====> (b AND a)
      - probably (a OR b) ====> (b OR a)
      - probably (a = b) ====> (b = a)
      - maybe more???
  (3) do we want multi level translations?
      - i.e (a AND b) AND c ====> (a AND (b AND c))
      - what order do we do these in?
  May have to modify these operations:
  - Some assertions in TopDownRewrite/BottomUpRewrite w.r.t. the iterator
  - Possibly binding.cpp / optimizer_metadata.h / optimizer_task.cpp
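One of the options in (1), generating a set of logically equivalent forms, can be sketched for the commutative operators listed in (2). This is only an illustration under assumptions: `Expr`, `Binary`, and `CommutativeVariants` are hypothetical names, the tree is a toy rather than AbstractExpression, and associativity from (3) is not handled.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class ExprKind { Leaf, And, Or, Equal };

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct Expr {
  ExprKind kind;
  std::string name;  // used only by Leaf
  ExprPtr left;
  ExprPtr right;
};

ExprPtr Leaf(std::string name) {
  return std::make_shared<Expr>(Expr{ExprKind::Leaf, std::move(name), nullptr, nullptr});
}

ExprPtr Binary(ExprKind kind, ExprPtr left, ExprPtr right) {
  return std::make_shared<Expr>(Expr{kind, "", std::move(left), std::move(right)});
}

// AND, OR and = are the commutative operators named in the list above.
bool IsCommutative(ExprKind kind) {
  return kind == ExprKind::And || kind == ExprKind::Or || kind == ExprKind::Equal;
}

std::string ToString(const ExprPtr &e) {
  if (e->kind == ExprKind::Leaf) return e->name;
  const char *op = e->kind == ExprKind::And ? " AND " : e->kind == ExprKind::Or ? " OR " : " = ";
  return "(" + ToString(e->left) + op + ToString(e->right) + ")";
}

// Enumerate every expression reachable by swapping the children of
// commutative nodes at any level of the tree.
std::vector<ExprPtr> CommutativeVariants(const ExprPtr &e) {
  if (e->kind == ExprKind::Leaf) return {e};
  std::vector<ExprPtr> out;
  for (const auto &l : CommutativeVariants(e->left)) {
    for (const auto &r : CommutativeVariants(e->right)) {
      out.push_back(Binary(e->kind, l, r));
      if (IsCommutative(e->kind)) out.push_back(Binary(e->kind, r, l));
    }
  }
  return out;
}
```

The number of variants doubles with every commutative node, which is one reason the alternative of normalizing the input expression to a fixed form may be more attractive for deeper trees.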

Issues still pending:
- Comparing Values (Matt email/discussion)
- r.rule->Check (terrier issue cmu-db#332)
@newtoncx changed the title from "Simple rewrite functionality" to "Syntax-based query rewriter" Apr 9, 2019
@coveralls

Coverage decreased (-1.1%) to 75.418% when pulling f4d4e8f on 17zhangw:templatize into 484d76d on cmu-db:master.

TEST_F(RewriterTests, SimpleEqualityTree) {
  //        [=]
  //    [=]     [=]
  //  [4] [5] [3] [3]

Maybe add a line to show what you expect it to be rewritten to, i.e. // false.

Also add a comment similar to this one to the other tests so we can easily know what they are testing.
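For reference, the rewrite this test appears to expect can be sketched as a bottom-up fold of constant comparisons: (4 = 5) becomes false, (3 = 3) becomes true, and false = true becomes false. The toy `Expr` type and the `Fold` function are assumptions made for illustration, not the PR's AbsExpr_Container or its rule code.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct Expr {
  enum class Kind { IntConstant, BoolConstant, Equal } kind;
  int value;                      // integer value, or 0/1 for BoolConstant
  std::vector<ExprPtr> children;  // exactly two entries for Equal
};

ExprPtr Int(int v) { return std::make_shared<Expr>(Expr{Expr::Kind::IntConstant, v, {}}); }
ExprPtr Bool(bool v) { return std::make_shared<Expr>(Expr{Expr::Kind::BoolConstant, v ? 1 : 0, {}}); }
ExprPtr Eq(ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{Expr::Kind::Equal, 0, {std::move(l), std::move(r)}});
}

// Fold children first, then replace an equality over two constants of the
// same kind with the boolean constant it evaluates to.
ExprPtr Fold(const ExprPtr &e) {
  if (e->kind != Expr::Kind::Equal) return e;
  assert(e->children.size() == 2);
  ExprPtr l = Fold(e->children[0]);
  ExprPtr r = Fold(e->children[1]);
  const bool both_constant = l->kind != Expr::Kind::Equal && r->kind != Expr::Kind::Equal;
  if (both_constant && l->kind == r->kind) return Bool(l->value == r->value);
  return Eq(l, r);  // not foldable: keep the comparison over the folded children
}
```

Folding the tree from the test comment, `Fold(Eq(Eq(Int(4), Int(5)), Eq(Int(3), Int(3))))`, yields a boolean constant holding false.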

LOW = 1
};

class ComparatorElimination: public Rule<AbsExpr_Container,ExpressionType,AbsExpr_Expression> {

It would be nice if you added some documentation to this, even if it's simplistic.

@@ -68,10 +73,10 @@ class Memo {
//===--------------------------------------------------------------------===//
// For rewrite phase: remove and add expression directly for the set
//===--------------------------------------------------------------------===//
-void RemoveParExpressionForRewirte(GroupExpression* gexpr) {
+void RemoveParExpressionForRewirte(GroupExpression<Node,OperatorType,OperatorExpr>* gexpr) {

Typo?

Rewriter(const Rewriter &) = delete;
Rewriter &operator=(const Rewriter &) = delete;
Rewriter(Rewriter &&) = delete;
Rewriter &operator=(Rewriter &&) = delete;

Not sure if this is necessary as this is default behavior I think

Consider DISALLOW_COPY_AND_MOVE macro.
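A sketch of what such a macro could look like, with the definition below being an assumption rather than the project's actual header. The `static_assert`s show the effect of the four deleted members; whether copy and move would otherwise be generated implicitly depends on the class's members, so the checks are a cheap way to confirm the intended behavior.

```cpp
#include <type_traits>

// Hypothetical definition; a project would normally ship its own version
// in a shared macros header.
#define DISALLOW_COPY_AND_MOVE(cname)       \
  cname(const cname &) = delete;            \
  cname &operator=(const cname &) = delete; \
  cname(cname &&) = delete;                 \
  cname &operator=(cname &&) = delete;

// Toy class standing in for the PR's Rewriter.
class Rewriter {
 public:
  Rewriter() = default;
  DISALLOW_COPY_AND_MOVE(Rewriter)
};

static_assert(!std::is_copy_constructible<Rewriter>::value, "copy construction is deleted");
static_assert(!std::is_copy_assignable<Rewriter>::value, "copy assignment is deleted");
static_assert(!std::is_move_constructible<Rewriter>::value, "move construction is deleted");
static_assert(!std::is_move_assignable<Rewriter>::value, "move assignment is deleted");
```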

@@ -14,34 +14,71 @@
#include "optimizer/memo.h"

It would be nice if you added more documentation to memo.h to give a better high-level idea of what these Memo objects are used for.

// AbsExpr_Container does *not* handle memory correctly w.r.t internal instantiations
// from Rule transformation. This is since Peloton itself mixes unique_ptrs and
// hands out raw pointers which makes adding a shared_ptr here extremely problematic.
// terrier uses only shared_ptr when dealing with AbstractExpression trees.

(All Terrier parser behavior can be changed, just FYI. If anything would make it more convenient for you, make the case for it.)

@@ -85,16 +88,18 @@ class Optimizer : public AbstractOptimizer {

void Reset() override;

-OptimizerMetadata &GetMetadata() { return metadata_; }
+OptimizerMetadata<Operator,OpType,OperatorExpression> &GetMetadata() { return metadata_; }

/* For test purposes only */

Kind of inherited from bad decisions before, but in terrier we try pretty hard to not have public test-only functions and use FRIEND_TEST instead.

namespace optimizer {

/* Rules are applied from high to low priority */
enum class RulePriority : int {

(I know nothing about optimizers) Is priority a well-defined concept for optimization or is this a rough heuristic? What happens if you have two rules of the same priority that can both be applied, is it always arbitrary which should go first?
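The PR does not say how ties are broken. One possible deterministic policy, sketched here purely as an assumption, is to order rules from high to low priority with a stable sort so that rules of equal priority keep their registration order. `RuleEntry`, `OrderForApplication`, and the `HIGH`/`MEDIUM` values are hypothetical; only `LOW = 1` appears in the diff.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// LOW = 1 matches the diff; the other enumerators are assumed for illustration.
enum class RulePriority : int { HIGH = 3, MEDIUM = 2, LOW = 1 };

struct RuleEntry {
  std::string name;
  RulePriority priority;
};

// Returns the rules in application order: highest priority first, with
// equal-priority rules kept in the order they were registered.
std::vector<RuleEntry> OrderForApplication(std::vector<RuleEntry> rules) {
  std::stable_sort(rules.begin(), rules.end(), [](const RuleEntry &a, const RuleEntry &b) {
    return static_cast<int>(a.priority) > static_cast<int>(b.priority);
  });
  return rules;
}
```

Under this policy the choice between two equal-priority rules is no longer arbitrary, but it is still a heuristic: registration order carries no cost information.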

}
return false;
}

@KatiaVi KatiaVi Apr 15, 2019

This may not be a huge deal since, as you said, the == op doesn't affect correctness but do you intend to implement the == op for rewrites? If so, does there exist an == op for AbstractExpression you can use?

I believe Terrier currently only has the notion of logical equality for abstract expressions.

}
}
}

What does Rebuild do in context of its AbsExpr_Container? You could probably add some more documentation for this function.

#include "optimizer/rule.h"
#include "optimizer/absexpr_expression.h"

#include <memory>

nitpick: clang-tidy will complain about the ordering of the imports so be aware of that in the future. I believe native libraries such as <memory> should go first here.

#include "expression/comparison_expression.h"
#include "expression/constant_value_expression.h"

#include <memory>

Same nitpick about ordering #include's.

using GroupExpressionTemplate = GroupExpression<AbsExpr_Container,ExpressionType,AbsExpr_Expression>;

using GroupTemplate = Group<AbsExpr_Container,ExpressionType,AbsExpr_Expression>;

Would be nice to see more documentation of the functions in Rewriter, as well as explanations for the templates you use. Maybe a high-level explanation of what the rewriter does at the top of the rewriter.h file would be helpful?

@amlatyrngom amlatyrngom left a comment

This may just be due to Peloton, but the code in some parts is fairly cryptic, so it would help to have more documentation to understand what it's doing. Also, I think terrier may reject PRs that do not contain enough documentation for functions.

std::shared_ptr<PropertySet> required_prop,
double cost_upper_bound = std::numeric_limits<double>::max())
: metadata(metadata),
required_prop(required_prop),
cost_upper_bound(cost_upper_bound) {}

-OptimizerMetadata *metadata;
+OptimizerMetadata<Operator,OperatorType,OperatorExpr> *metadata;

Just for style: the rest of the code names class members as metadata_ or required_prop_.
In addition, is there a particular reason why these members are public?

virtual bool Empty() = 0;
};

/**
* @brief Stack implementation of the task pool
*/
class OptimizerTaskStack : public OptimizerTaskPool {
template <class Node, class OperatorType, class OperatorExpr>

Consider using the final keyword if this class is not meant to be inherited from.
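A small sketch of the suggestion, using simplified hypothetical names (`TaskPool`, `TaskStack`, `Task`) rather than the templated classes in the PR. Marking the concrete stack `final` documents that it is a leaf of the hierarchy and makes any attempt to derive from it a compile error.

```cpp
#include <memory>
#include <stack>
#include <utility>

struct Task {
  int id;
};

// Abstract pool interface, loosely modeled on the snippet above.
class TaskPool {
 public:
  virtual ~TaskPool() = default;
  virtual void Push(std::unique_ptr<Task> task) = 0;
  virtual std::unique_ptr<Task> Pop() = 0;
  virtual bool Empty() = 0;
};

// Stack implementation of the pool; `final` forbids further derivation.
class TaskStack final : public TaskPool {
 public:
  void Push(std::unique_ptr<Task> task) override { tasks_.push(std::move(task)); }

  // Precondition: the stack is not empty.
  std::unique_ptr<Task> Pop() override {
    std::unique_ptr<Task> task = std::move(tasks_.top());
    tasks_.pop();
    return task;
  }

  bool Empty() override { return tasks_.empty(); }

 private:
  std::stack<std::unique_ptr<Task>> tasks_;
};
```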

7 participants