Transform Optimizations + Trainer Role (#68)

Summary:
Pull Request resolved: #68

To better utilize our workers, I enabled the following:
1. Parallel encoding
2. Parallel stream reading
3. I/O decoupling

Together, these raised CPU utilization on our workers from ~30% to ~85% and substantially sped up transforms.

In addition, I created a new role that trains encoding layouts for jobs that contain enough items. The trainer is not very robust yet and will be improved in the future (for example, by detecting when it is stuck and restarting it). It is not yet clear how much it contributes to transform speed.

Reviewed By: sdruzkin

Differential Revision: D59125380

fbshipit-source-id: 6d0d2ef3bd34ba268719353238d6b8fd176e8446
facebookincubator · Jun 28, 2024 · 5ba326f
1 parent 277b2c1
Showing 1 changed file with 2 additions and 1 deletion.
diff --git a/dwio/nimble/velox/VeloxWriter.cpp b/dwio/nimble/velox/VeloxWriter.cpp
@@ -471,8 +471,9 @@ VeloxWriter::VeloxWriter(
                [&sb = context_->schemaBuilder]() { return sb.getRoot(); },
                context_->options.featureReordering)}},
       root_{createRootField(*context_, schema_)},
-      spillConfig_{options.spillConfig} {
+      spillConfig_{context_->options.spillConfig} {
   NIMBLE_CHECK(file_, "File is null");
+
   if (context_->options.encodingLayoutTree.has_value()) {
     context_->flatmapFieldAddedEventHandler =
         [&](const TypeBuilder& flatmap,
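The hunk changes `spillConfig_` so that it is read from `context_->options` instead of the constructor's `options` parameter. A plausible reason, which is an assumption not stated in the commit, is that `options` is moved into the writer context earlier in the initializer list. In that case, the parameter is left in a valid but unspecified state, and reading from it would be a use-after-move. The sketch below illustrates that pitfall and the fix. `Options`, `Context`, and `Writer` are hypothetical stand-ins and are not Nimble's actual types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the writer options (not Nimble's VeloxWriterOptions).
struct Options {
  std::string spillConfig;
};

// Hypothetical stand-in for the writer context, which takes ownership of the options.
struct Context {
  explicit Context(Options opts) : options{std::move(opts)} {}
  Options options;
};

class Writer {
 public:
  explicit Writer(Options options)
      // `options` is moved into the context here. After this point the
      // parameter is valid but unspecified, so it must not be read again.
      : context_{std::make_unique<Context>(std::move(options))},
        // Read from the context, which now owns the options. Reading
        // `options.spillConfig` here would see a moved-from string.
        spillConfig_{context_->options.spillConfig} {}

  const std::string& spillConfig() const { return spillConfig_; }

 private:
  // Members are initialized in declaration order, regardless of the order
  // written in the initializer list, so context_ must be declared first.
  std::unique_ptr<Context> context_;
  std::string spillConfig_;
};
```

For example, `Writer w{Options{"spill-to-disk"}};` leaves `w.spillConfig()` equal to `"spill-to-disk"`, because the value is copied out of the context after the move rather than out of the moved-from parameter.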