This repository has been archived by the owner on Sep 27, 2019. It is now read-only.

Optimizer refactor and cost model additions #1484

Merged
merged 23 commits into cmu-db:master from GustavoAngulo:postgres_cost_model
Feb 13, 2019

Conversation

GustavoAngulo
Member

@GustavoAngulo GustavoAngulo commented Oct 1, 2018

This PR makes several changes ahead of moving the optimizer over to terrier:

  • Reorganizes the optimizer directory. All stats-related files are now under optimizer/stats, and all cost model files are under optimizer/cost_model.
  • Moves the cost models entirely into header files. This fits a theme of portability or "plug-and-play", similar to how indexes are entirely in header files.
  • Adds the Postgres cost model from 15721 that was never merged in.
  • Adds a trivial cost model under optimizer/cost_model/trivial_cost_model.h. It can be used for debugging or trivial execution. It does not use any statistics, so it can be thought of as heuristic-based.
  • Fixes a bug involving MurmurHash in optimizer/stats/stats_util.h.
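
To illustrate what a header-only, statistics-free cost model could look like, here is a minimal sketch. It is not the code from this PR: the file name, the `sketch` namespace, `OpKind`, `TrivialCostModelSketch`, and the constant costs are all hypothetical and chosen only for illustration.

```cpp
// trivial_cost_model_sketch.h (hypothetical file, not the PR's actual header)
#ifndef TRIVIAL_COST_MODEL_SKETCH_H
#define TRIVIAL_COST_MODEL_SKETCH_H

#include <cassert>

namespace sketch {

// Hypothetical operator kinds; a real optimizer has many more.
enum class OpKind { SeqScan, IndexScan, HashJoin, NestedLoopJoin };

// Statistics-free cost model: every operator has a fixed heuristic cost,
// so the ranking of plans never depends on table or column statistics.
class TrivialCostModelSketch {
 public:
  // Returns the fixed cost of `op` plus the accumulated cost of its children.
  double CalculateCost(OpKind op, double child_cost = 0.0) const {
    assert(child_cost >= 0.0);  // costs are assumed to be non-negative
    return OperatorCost(op) + child_cost;
  }

 private:
  // Assumed constants: index scans beat sequential scans, and hash joins
  // beat nested-loop joins. The values themselves are arbitrary.
  static double OperatorCost(OpKind op) {
    switch (op) {
      case OpKind::IndexScan:
        return 0.0;
      case OpKind::SeqScan:
        return 1.0;
      case OpKind::HashJoin:
        return 1.0;
      case OpKind::NestedLoopJoin:
        return 2.0;
    }
    return 0.0;  // not reached for valid enum values
  }
};

}  // namespace sketch

#endif  // TRIVIAL_COST_MODEL_SKETCH_H
```

Since the whole class lives in a header, a caller would only need to include the file and construct the class. For example, `sketch::TrivialCostModelSketch().CalculateCost(sketch::OpKind::SeqScan)` yields the fixed sequential-scan cost without consulting any statistics.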

@coveralls

coveralls commented Oct 1, 2018

Coverage increased (+0.03%) to 76.54% when pulling 49ba11a on GustavoAngulo:postgres_cost_model into 3bc6d46 on cmu-db:master.

Member

@linmagit linmagit left a comment

Generally looks good. Some minor comments.

src/include/optimizer/optimizer.h (outdated)
@@ -35,9 +37,16 @@ class OptimizerMetadata {
settings::SettingId::task_execution_timeout)),
timer(Timer<std::milli>()) {}
Member

Similarly, do we still need this method here?

namespace peloton {
namespace optimizer {

double PostgresCostCalculator::CalculateCost(
Member

Just curious, is this exactly the same as the Postgres query optimizer's cost model, including DEFAULT_TUPLE_COST and DEFAULT_INDEX_TUPLE_COST?

Member Author

Postgres does it a bit differently, as far as I can tell. While we charge a cost per tuple, Postgres estimates the cost of hashing a predicate when costing hash joins. This is something we could try to implement, but it would require more complex estimators that should probably go in a separate PR.
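
The difference described above can be sketched roughly as follows. This is a simplified illustration, not the formula used by this PR or by Postgres: the function names and the `sketch` namespace are hypothetical, and the sketch ignores batching, bucket-size estimates, and qualifier evaluation. The two constants mirror PostgreSQL's documented defaults for `cpu_tuple_cost` and `cpu_operator_cost`; whether Peloton's DEFAULT_TUPLE_COST matches them is not stated in this thread.

```cpp
// hash_join_cost_sketch.h (hypothetical file, for illustration only)
#ifndef HASH_JOIN_COST_SKETCH_H
#define HASH_JOIN_COST_SKETCH_H

namespace sketch {

// Assumed constants, mirroring PostgreSQL's documented defaults for
// cpu_tuple_cost and cpu_operator_cost.
constexpr double kTupleCost = 0.01;
constexpr double kOperatorCost = 0.0025;

// Per-tuple style: every input row costs the same, regardless of how many
// join predicates have to be hashed.
inline double PerTupleHashJoinCost(double build_rows, double probe_rows) {
  return (build_rows + probe_rows) * kTupleCost;
}

// Hash-clause style: charge one operator evaluation per hash clause per row,
// on top of a per-tuple charge for inserting build-side rows into the table.
inline double HashClauseJoinCost(double build_rows, double probe_rows,
                                 int num_hash_clauses) {
  const double hash_cost_per_row = kOperatorCost * num_hash_clauses;
  const double build_cost = (hash_cost_per_row + kTupleCost) * build_rows;
  const double probe_cost = hash_cost_per_row * probe_rows;
  return build_cost + probe_cost;
}

}  // namespace sketch

#endif  // HASH_JOIN_COST_SKETCH_H
```

In the second function the cost grows with the number of hash clauses, which is why it needs information about the join predicate that a flat per-tuple charge never looks at.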

@apavlo
Member

apavlo commented Jan 24, 2019

@GustavoAngulo I would not worry about getting this to work on Travis since we're not maintaining it.

@GustavoAngulo GustavoAngulo changed the title from "Separate cost model from optimizer and add postgres cost model" to "Optimizer refactor and cost model additions" Feb 12, 2019
Member

@apavlo apavlo left a comment

Just merging this. We will review it when we move to the new DBMS.

@apavlo apavlo merged commit a96b376 into cmu-db:master Feb 13, 2019
@GustavoAngulo GustavoAngulo deleted the postgres_cost_model branch February 13, 2019 00:31
mtunique pushed a commit to mtunique/peloton that referenced this pull request Apr 16, 2019
* Separate cost model from optimizer and add postgres cost model

* Fix bug overflow in Analyze

* Changes to cost model construction

* Fix bug in stats hashing

* Fix to commutativity of equality comparison expressions

* Revert group equality, this works

* Move Postgres cost model to header file and add some starting plan tests, along with more optimizer test utility functions

* Remove printf

* Remove old optimizer constructor

* Fix unused variable

* Testing if changing llvm path fixes travis

* Did not work : (

* Ok trying changing to 3.9.1_2

* Revert "Fix bug overflow in Analyze"

This reverts commit fcbf161.

* Update LLVM dir in travis config

* Add trivial cost model

* Move files into stats folder

* Add test cases for trivial cost model

* Delete cost.h and cost.cpp that were commented out

* Cost model name and directory refactoring

* Fix three join test
4 participants