
EVF Tutorial Overview

Paul Rogers edited this page May 22, 2019 · 3 revisions

Overview

This tutorial focuses on a real-world use case: a format plugin based on Drill's "Easy" framework. We'll walk through the steps to convert the plugin from the traditional way of creating vectors to an implementation based on the Enhanced Vector Framework (EVF).

The Log Plugin

The Drill log plugin is the focus of this tutorial. A simplified version of this plugin is explained in the Learning Apache Drill book. The version used here is the one that ships with Drill.

The focus here is on the conversion to EVF, rather than the details of the plugin. Each plugin has its own internal structure, so we leave it to the reader to map from the log reader to some other plugin.

Plugin Design

Most format plugins are based on the "Easy" framework. EVF extends the "Easy" framework to offer a simplified way to implement a plugin. The Easy framework supports both the legacy and EVF styles; we select one or the other (or even both) with a few lines of code.

"Legacy" plugins are based on the idea of a "record reader" (a concept borrowed from Hive). Unlike Hive's record readers, Drill's never read a single record: each call reads a batch of records. In EVF, the reader becomes a "row batch reader" which implements a new batch-focused interface.
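To make the batch-oriented contract concrete, here is a minimal Java sketch of a reader that returns a batch of rows per call rather than a single record. It is an illustration only: `BatchReaderSketch`, `LineBatchReader`, and `BatchReaderDemo` are hypothetical names invented for this example, not Drill's actual interfaces, and plain Java lists stand in for value vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical interface for illustration only; this is NOT Drill's API.
interface BatchReaderSketch {
    boolean open();          // prepare the input; false means there is nothing to read
    boolean next();          // load the next batch; false once the input is exhausted
    List<String> batch();    // rows loaded by the most recent successful next()
    void close();
}

// Toy reader over in-memory lines that loads up to batchSize rows per call.
class LineBatchReader implements BatchReaderSketch {
    private final List<String> lines;
    private final int batchSize;
    private Iterator<String> iter;
    private List<String> current = new ArrayList<>();

    LineBatchReader(List<String> lines, int batchSize) {
        this.lines = lines;
        this.batchSize = batchSize;
    }

    @Override
    public boolean open() {
        iter = lines.iterator();
        return iter.hasNext();
    }

    @Override
    public boolean next() {
        // Fill a fresh batch until it is full or the input runs out.
        current = new ArrayList<>();
        while (current.size() < batchSize && iter.hasNext()) {
            current.add(iter.next());
        }
        return !current.isEmpty();
    }

    @Override
    public List<String> batch() {
        return current;
    }

    @Override
    public void close() {
        iter = null;
    }
}

class BatchReaderDemo {
    // Drives a reader one batch at a time and records the size of each batch.
    static List<Integer> batchSizes(BatchReaderSketch reader) {
        List<Integer> sizes = new ArrayList<>();
        if (reader.open()) {
            while (reader.next()) {
                sizes.add(reader.batch().size());
            }
        }
        reader.close();
        return sizes;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(batchSizes(new LineBatchReader(lines, 2)));
    }
}
```

The caller never asks for a single row. It asks for the next batch and stops when the reader reports that no more data is available.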

In Drill 1.16 and earlier, the LogRecordReader takes the typical approach: it writes to value vectors through the associated Mutator class. Other readers tried to be more clever. For example, the "V2" text reader (Drill 1.16 and earlier) worked with direct memory itself, handling its own buffer allocation, offset vector calculations and so on.

With the EVF, we'll replace the Mutator (or direct access to vectors) with a ColumnWriter. We'll first do the simplest possible conversion, then look at how to use advanced features, such as type conversions, schema and table properties.

Let's work through the needed changes one by one.


Next: Plugin Revisions
