BH Operator Framework

Paul Rogers edited this page Jan 12, 2018 · 2 revisions

Now that we've seen the main event, we are ready to turn to the after-party: the changes made to allow the scan operator to use the result set loader. We start with a revised operator framework.

Background

Like all query engines, Drill has the concept of an "operator". Each operator has several representations: the plan-time "physical" operator, the run-time physical implementation and so on.

When first building a system, it is often unclear how best to partition it. Rather than trying to make it perfect the first time, we often take our best guess, knowing that we can refactor later based on what we learn.

Drill's first cut at the operator implementation combined the idea of an operator with the record batch on which the operator does its work. The result was the RecordBatch concept which is, in essence, an operator implementation that carries the name of the data the operator manipulates. As a consequence, each operator ("record batch") is quite complex because it takes on many tasks.

Another consequence of the existing design is that, to test an operator, it must sit within an operator stack, which must be bound to a fragment context, which in turn is bound to a drillbit context. To test any one operator, then, we need a full Drill server (or a bunch of mocks). This arrangement throws sand in the gears when trying to create comprehensive unit tests: it is very hard to set up a specific test case if one must do so at the level of an entire query.

The work needed to revise the scan operator provided an opportunity to try a different approach. The operator framework is not absolutely required, but it did turn out to be a very simple way to design, implement and test the revised scan operator. The team can decide if this approach is useful for other operators.

Requirements

Structure

Lifecycle

Usage
