Transform Optimizations + Trainer Role (#68)
Summary:
Pull Request resolved: #68

To better utilize our workers, I enabled the following:
1. Parallel encoding
2. Parallel stream reading
3. I/O decoupling
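Item 1 can be pictured as encoding independent column streams concurrently. The sketch below only illustrates that idea and is not the code from this PR. It assumes two things. First, `encodeStream` is a hypothetical run-length encoder standing in for a real Nimble encoding. Second, one `std::async` task per stream stands in for whatever executor the workers actually use.

```cpp
#include <cstdint>
#include <future>
#include <utility>
#include <vector>

// Hypothetical encoded form: (value, run length) pairs. Not a Nimble type.
using Runs = std::vector<std::pair<int32_t, uint32_t>>;

// Hypothetical stand-in for a real encoder: run-length encodes one stream.
Runs encodeStream(const std::vector<int32_t>& values) {
  Runs runs;
  for (int32_t v : values) {
    if (!runs.empty() && runs.back().first == v) {
      ++runs.back().second;  // extend the current run
    } else {
      runs.emplace_back(v, 1u);  // start a new run
    }
  }
  return runs;
}

// Streams are independent, so each one is encoded on its own async task.
// Results are collected in input order, regardless of which task finishes first.
std::vector<Runs> encodeAllParallel(const std::vector<std::vector<int32_t>>& streams) {
  std::vector<std::future<Runs>> futures;
  futures.reserve(streams.size());
  for (const auto& stream : streams) {
    // `stream` outlives the task because every future is joined below.
    futures.push_back(std::async(std::launch::async, [&stream] { return encodeStream(stream); }));
  }
  std::vector<Runs> results;
  results.reserve(futures.size());
  for (auto& f : futures) {
    results.push_back(f.get());  // blocks until that stream's task is done
  }
  return results;
}
```

In a real writer a bounded thread pool would be a better fit than one thread per stream, since a wide schema can produce many streams.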

These changes raised CPU utilization on our workers from ~30% to ~85%, which sped up transforms considerably.
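I/O decoupling (item 3) means CPU-bound encoding no longer has to wait for writes to finish. Below is a minimal producer/consumer sketch. It assumes a single writer thread and a bounded in-memory queue. `BoundedQueue` and `encodeAndWrite` are hypothetical names. The string `sink` stands in for a real output file, and wrapping each chunk in brackets stands in for real encoding work.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Bounded queue between CPU-bound producers (encoders) and an I/O-bound consumer.
class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0 && "a zero-capacity queue would block forever");
  }

  // Blocks while full, so encoders run at most `capacity` chunks ahead of the writer.
  void push(std::string chunk) {
    std::unique_lock<std::mutex> lock(mu_);
    notFull_.wait(lock, [this] { return items_.size() < capacity_; });
    items_.push_back(std::move(chunk));
    notEmpty_.notify_one();
  }

  // Returns std::nullopt once the queue has been closed and fully drained.
  std::optional<std::string> pop() {
    std::unique_lock<std::mutex> lock(mu_);
    notEmpty_.wait(lock, [this] { return !items_.empty() || closed_; });
    if (items_.empty()) return std::nullopt;
    std::string chunk = std::move(items_.front());
    items_.pop_front();
    notFull_.notify_one();
    return chunk;
  }

  void close() {
    std::lock_guard<std::mutex> lock(mu_);
    closed_ = true;
    notEmpty_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable notFull_;
  std::condition_variable notEmpty_;
  std::deque<std::string> items_;
  std::size_t capacity_;
  bool closed_ = false;
};

// Encodes on the calling thread while a dedicated thread performs the (simulated) writes.
std::string encodeAndWrite(const std::vector<std::string>& chunks, std::size_t capacity) {
  BoundedQueue queue(capacity);
  std::string sink;
  std::thread writer([&] {
    while (auto chunk = queue.pop()) {
      sink += *chunk;  // a real writer would issue file I/O here
    }
  });
  for (const auto& c : chunks) {
    queue.push("[" + c + "]");  // stand-in for CPU-heavy encoding
  }
  queue.close();
  writer.join();  // `sink` is safe to read only after the writer has finished
  return sink;
}
```

The bound is a deliberate design choice. It caps how much encoded data can accumulate in memory when the writer falls behind, while still letting encoding overlap with I/O.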

In addition, I created a new role that trains encoding layouts for jobs with enough items in them.
This trainer is not very robust yet and will be improved in the future (for example, by detecting when it is stuck and restarting it).
I'm not yet sure how much it contributes to transform speed.
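The trainer role is only described at a high level here. As a rough illustration of what "training an encoding layout" can mean, the sketch below picks one encoding per stream. It chooses whichever of two candidate encodings has the smaller estimated size on a sample. It declines to train when the sample has fewer than `minItems` values. `Encoding`, `trainEncoding`, the size model, and the threshold are all hypothetical. They are not Nimble's layout format or the trainer's actual logic.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical candidate encodings; Nimble's real encoding set differs.
enum class Encoding { Trivial, RunLength };

// Estimated sizes, assuming 4-byte values and 4-byte run lengths.
std::size_t trivialSize(const std::vector<int32_t>& v) { return v.size() * 4; }

std::size_t runLengthSize(const std::vector<int32_t>& v) {
  std::size_t runs = 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (i == 0 || v[i] != v[i - 1]) ++runs;  // a new run starts here
  }
  return runs * 8;  // one value plus one run length per run
}

// Trains only when the sample is large enough to trust the estimate;
// returns std::nullopt otherwise so the caller keeps its default layout.
std::optional<Encoding> trainEncoding(const std::vector<int32_t>& sample, std::size_t minItems) {
  if (sample.size() < minItems) return std::nullopt;
  return runLengthSize(sample) < trivialSize(sample) ? Encoding::RunLength : Encoding::Trivial;
}
```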

Reviewed By: sdruzkin

Differential Revision: D59125380

fbshipit-source-id: 6d0d2ef3bd34ba268719353238d6b8fd176e8446
helfman authored and facebook-github-bot committed Jun 28, 2024
1 parent 277b2c1 commit 5ba326f
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion dwio/nimble/velox/VeloxWriter.cpp
@@ -471,8 +471,9 @@ VeloxWriter::VeloxWriter(
[&sb = context_->schemaBuilder]() { return sb.getRoot(); },
context_->options.featureReordering)}},
root_{createRootField(*context_, schema_)},
-spillConfig_{options.spillConfig} {
+spillConfig_{context_->options.spillConfig} {
NIMBLE_CHECK(file_, "File is null");

if (context_->options.encodingLayoutTree.has_value()) {
context_->flatmapFieldAddedEventHandler =
[&](const TypeBuilder& flatmap,
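The code change reads `spillConfig` from `context_->options` instead of the constructor parameter `options`. One plausible motivation is an assumption, because the relevant initializer is not visible in this hunk: the constructor may move `options` into the writer context in an earlier member initializer. The hunk does show `context_` being used by `root_` before `spillConfig_`. If so, the parameter is moved-from by the time `spillConfig_` is initialized. Reading from the object that took ownership is safe whatever the member's type. Reading the moved-from parameter only happens to work for trivially copyable members such as raw pointers. The hypothetical, simplified types below use a `std::string` to make the difference visible.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified types; not Nimble's real options or writer context.
struct Options {
  std::string spillConfig;
};

struct Context {
  explicit Context(Options o) : options(std::move(o)) {}
  Options options;
};

class Writer {
 public:
  explicit Writer(Options options)
      : context_{std::make_unique<Context>(std::move(options))},
        // `options` has been moved from at this point, so `options.spillConfig`
        // would be in a valid but unspecified state. Read the moved-to copy instead.
        spillConfig_{context_->options.spillConfig} {}

  const std::string& spillConfig() const { return spillConfig_; }

 private:
  // Members are initialized in declaration order, so context_ must be
  // declared before spillConfig_ for the initializer above to be safe.
  std::unique_ptr<Context> context_;
  std::string spillConfig_;
};
```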