Logical Query Optimization

Transform a logical plan into an ==equivalent logical plan== using ==pattern matching rules==.

The goal is to increase the likelihood of enumerating the optimal plan in the search.

==Cannot compare plans== because there is ==no cost model== but ==can "direct" a transformation to a preferred side==.

Split Conjunctive Predicates

Predicate Pushdown

Replace Cartesian Products with Joins

Projection Pushdown

Physical Query Optimization

Transform a query plan's logical operators into physical operators.

Add more execution information
Select indexes / access paths
Choose operator implementations
Choose when to materialize (i.e., temp tables).

This stage must support cost model estimates.

All the queries we have looked at so far have had the following properties:

Equi/Inner Joins
Simple join predicates that reference only two tables.
No cross products

Real-world queries are much more complex:

Outer Joins
Semi-joins
Anti-joins

Reordering Limittations

Plan Enumeration

Approach #1: Transformation

Modify some part of an existing query plan to transform it into an alternative plan that is equivalent.

Approach #2: Generative

Assemble building blocks to generate a query plan.

Dynamic Programming Optimizer

Model the query as a hypergraph and then incrementally expand to enumerate new plans.

Algorithm Overview:

Iterate connected sub-graphs and incrementally add new edges to other nodes to complete query plan.
Use rules to determine which nodes the traversal is allowed to visit and expand.

Cascades Optimizer ⭐️

Object-oriented implementation of the Volcano query optimizer.

Supports simplistic expression re-writing through a direct mapping function rather than an exhaustive search.

Optimization tasks as data structures.
Rules to place property enforcers.
Ordering of moves by promise.
Predicates as logical/physical operators.

Expressions

Groups

Multi Expression

Rules

A rule is a transformation of an expression to a logically equivalent expression.

==Transformation Rule==: Logical to Logical
==Implementation Rule==: Logical to Physical

Each rule is represented as a pair of attributes:

==Pattern==: Defines the structure of the logical expression that can be applied to the rule.
==Substitute==: Defines the structure of the result after applying the rule.

Memo Table

Stores all previously explored alternatives in a compact graph structure / hash table.
Equivalent operator trees and their corresponding plans are stored together in groups.
Provides memoization, duplicate detection, and property + cost management.

PRINCIPLE OF OPTIMALITY

Every sub-plan of an optimal plan is itself optimal.

This allows the optimizer to restrict the search space to a smaller set of expressions. (注：即 ==branch-and-bound== search)

The optimizer never has to consider a plan containing sub-plan P1 that has a greater cost than equivalent plan P2 with the same physical properties.

初始状态的 memo 如图所示，我们最终需要的结果是 [ABC] 三表 join cost 最小的执行计划，我们首先使用 Transformation Rule 得到 [ABC] 一个等价的 Logical Expression [AB] Join [C]，接着我们进行==深度优先搜索==。
当搜索到最下层的时候，对 GET(A) 使用 Implementation Rule 生成 2 个 Physical Expression，然后找到 GET(A) cost 最小的 Physical Expression 是 SeqScan(A)，并记录到 hash table 中。
同理找到 B 的最小 Physical Expression
然后搜索回到 [AB]，通过 Transformation Rule 得到另一个等价的 Logical Expression [B] Join [A]，这时我们发现 hash table 中已经有 Get(A) 和 Get(B) 的最优 Physical Expression 了，所以没必要继续搜索了。
然后对所有的 Logical Expression 应用 Implementation Rule 得到一些 Physical Expression，然后我们得到 [AB] cost 最小的 Physical Expression 是 [A] SM-Join [B]，把它记录到 hash table 中。
搜索继续回到最上层，使用同样的方法找到 [C] cost 最小的 Physical Expression
然后对 [ABC] 应用 Transformation Rule 得到更多的 Logical Expression。

Tips:

在后面搜索的过程中，如果发现 cost 已经大于第一个的结果，可以使用 branch-and-bound 规则进行剪枝

在 Cascade 的论文中， Transformation Rule 和 Implementation Rule 是混合交替使用的，有一定的规则来决定顺序

Search Termination

Approach #1: Wall-clock Time

Stop after the optimizer runs for some length of time.

Approach #2: Cost Threshold

Stop when the optimizer finds a plan that has a lower cost than some threshold.

Approach #3: Transformation Exhaustion

Stop when there are no more ways to transform the target plan. Usually done per group.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lecture #20: Optimizer Implementation (Part II).md

Lecture #20: Optimizer Implementation (Part II).md

Logical Query Optimization

Split Conjunctive Predicates

Predicate Pushdown

Replace Cartesian Products with Joins

Projection Pushdown

Physical Query Optimization

Reordering Limittations

Plan Enumeration

Dynamic Programming Optimizer

Cascades Optimizer ⭐️

Expressions

Groups

Multi Expression

Rules

Memo Table

Search Termination

Files

Lecture #20: Optimizer Implementation (Part II).md

Latest commit

History

Lecture #20: Optimizer Implementation (Part II).md

File metadata and controls

Logical Query Optimization

Split Conjunctive Predicates

Predicate Pushdown

Replace Cartesian Products with Joins

Projection Pushdown

Physical Query Optimization

Reordering Limittations

Plan Enumeration

Dynamic Programming Optimizer

Cascades Optimizer ⭐️

Expressions

Groups

Multi Expression

Rules

Memo Table

Search Termination