Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local execution tests #1418

Merged
merged 61 commits into from
Aug 22, 2024

Conversation

reyna-abhyankar
Copy link
Collaborator

@reyna-abhyankar reyna-abhyankar commented Jun 21, 2024

Description of changes:

Tests for local backing workflow

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #

This change is Reviewable

@reyna-abhyankar reyna-abhyankar self-assigned this Jun 21, 2024
@reyna-abhyankar reyna-abhyankar added repo-refactor Topics related to the repo and search refactors rr-runtime labels Jun 21, 2024
@reyna-abhyankar reyna-abhyankar marked this pull request as draft June 21, 2024 00:48
@reyna-abhyankar reyna-abhyankar marked this pull request as ready for review June 22, 2024 18:27
Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you double check and make sure all of the std::couts are removed? I left comments on all I found but it's not impossible that I missed one or two

Reviewed 48 of 67 files at r8, 1 of 1 files at r10, 34 of 34 files at r11, all commit messages.
Reviewable status: all files reviewed, 51 unresolved discussions (waiting on @reyna-abhyankar)


lib/kernels/include/kernels/accessor.h line 172 at r11 (raw file):

              "");

std::string format_as(std::vector<GenericTensorAccessorR> const &x);

Why do we have this instead of just having fmt support for GenericTensorAccessorR and using the default utils/fmt/vector.h fmt support for std::vector?


lib/kernels/include/kernels/device.h line 102 at r4 (raw file):

    std::stringstream _error;                                                  \
    if (status != 0) {                                                         \
      _error << "CUDA failure: " << cudaGetErrorString(status) << " ("         \

This should be kept, why was this removed?


lib/kernels/src/ff_handle.cc line 6 at r11 (raw file):

std::string format_as(PerDeviceFFHandle const &x) {
  return fmt::format("PerDeviceFFHandle");

Why not include some fields in the fmt description?


lib/local-execution/include/local-execution/cost_estimate.h line 21 at r11 (raw file):

                    std::vector<ParallelTensorAttrs> const &outputs,
                    MachineView const &mv) const = 0;
  virtual CostDetails

Remove this second overload and just ignore MachineView where it's not necessary. Overall CostEstimator should include network modelling, etc. rather than just being a thin interface onto kernel runtimes


lib/local-execution/include/local-execution/device_specific_device_states.variant.toml line 2 at r11 (raw file):

namespace = "FlexFlow"
name = "DeviceSpecificDeviceStates"

What is the purpose of this variant? What's the reasoning for having a variant<DeviceSpecific<T>, ...> vs a DeviceSpecific<variant<T, ...>>?


lib/local-execution/include/local-execution/device_states.variant.toml line 2 at r11 (raw file):

namespace = "FlexFlow"
name = "DeviceStates"

Suggestion:

name = "PerDeviceState"

lib/local-execution/include/local-execution/local_task_argument_accessor.h line 47 at r11 (raw file):

CHECK_RC_COPY_VIRTUAL_COMPLIANT(LocalTaskArgumentAccessor);

std::string format_as(std::unordered_map<slot_id_t, ConcreteArgSpec> const &x);

utils/fmt/unordered_map.h


lib/local-execution/include/local-execution/local_training_backing.h line 17 at r11 (raw file):

                       TensorBackingMap const &,
                       RuntimeArgConfig const &);
  ~LocalTrainingBacking() = default;

lib/local-execution/include/local-execution/local_training_backing.h line 38 at r11 (raw file):

  ComputationGraph computation_graph;
  TaskRegistry task_registry;
  LocalSlotsBacking local_slots_backing;

Suggestion:

private:  
  Allocator allocator;
  ComputationGraph computation_graph;
  TaskRegistry task_registry;
  LocalSlotsBacking local_slots_backing;

lib/local-execution/include/local-execution/op_task_invocation.h line 102 at r11 (raw file):

  std::unordered_map<slot_id_t, OpArgSpec> arg_bindings;
  std::tuple<decltype(tensor_bindings) const &, decltype(arg_bindings) const &>
      tie() const;

Having a separate section for member variables can improve readability

Suggestion:

private:
  std::unordered_map<SlotGradId, OpTensorSpec> tensor_bindings;
  std::unordered_map<slot_id_t, OpArgSpec> arg_bindings;

private:
  void insert_arg_spec(slot_id_t name, OpArgSpec const &arg_spec);
  std::tuple<decltype(tensor_bindings) const &, decltype(arg_bindings) const &>
      tie() const;

lib/local-execution/include/local-execution/task_argument_accessor.h line 13 at r11 (raw file):

  template <typename T>
  T const &get_argument(slot_id_t slot) const {
    if constexpr (DeviceStates::IsPartOfDeviceStates_v<T>) {

Is doing this based on type the best way of doing this, or should this be done based on whether a value was passed as device-specific in the invocation?


lib/local-execution/include/local-execution/task_impl_function.variant.toml line 12 at r7 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

I think here the key is expressive enough for the type anyway?

key isn't really used for much except for json serialization iirc


lib/local-execution/include/local-execution/task_registry.h line 12 at r11 (raw file):

namespace FlexFlow {

struct TaskRegistry {

Move TaskRegistry over to dtgen?


lib/local-execution/include/local-execution/task_registry.h line 26 at r11 (raw file):

  bool operator!=(TaskRegistry const &other) const;

private:

Suggestion:

  bool operator==(TaskRegistry const &other) const;
  bool operator!=(TaskRegistry const &other) const;

private:  
  std::unordered_map<layer_guid_t, std::optional<task_id_t>> init_task_ids;
  std::unordered_map<layer_guid_t, std::optional<task_id_t>> forward_task_ids;
  std::unordered_map<layer_guid_t, std::optional<task_id_t>> backward_task_ids;
  std::unordered_map<task_id_t, TaskSignatureAndImpl> task_mapping;
private:

lib/local-execution/include/local-execution/task_registry.h line 35 at r11 (raw file):

std::string format_as(
    std::unordered_map<layer_guid_t, std::optional<task_id_t>> const &x);

Why not use the default utils/fmt/unordered_map.h and utils/fmt/optional.h?


lib/local-execution/include/local-execution/task_signature_impl.h line 19 at r7 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

The TaskImplFunction doesn't have eq because I don't think that works for std::function, so for this I think it's fine? since the == for TaskSignatureAndImpl isn't actually comparing all values, just the signatures.

In that case it would be better to have a separate function rather than an operator==--I'd recommend only having operator== if the type does actually support equality


lib/local-execution/src/concrete_arg.cc line 19 at r11 (raw file):

std::string format_as(ConcreteArgSpec const &x) {
  std::ostringstream oss;
  oss << "<ConcreteArgSpec";

Would probably be good to print the value too since it's concrete?


lib/local-execution/src/device_states.cc line 11 at r11 (raw file):

      [device_idx](DeviceSpecific<MHAPerDeviceState> const
                       &device_specific_device_state) {
        std::cout << "im";

Delete


lib/local-execution/src/device_states.cc line 14 at r11 (raw file):

        return DeviceStates{*(device_specific_device_state.get(device_idx))};
      },
      // [device_idx](DeviceSpecific<BatchNormPerDeviceState> const

What is going on here?


lib/local-execution/src/local_training_backing.cc line 21 at r6 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

Why can't its output tensor guid just point to the same region?

Probably? It's fine for now, this can always be addressed if necessary when the legion backend is added


lib/local-execution/src/local_training_backing.cc line 33 at r7 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

Shouldn't every operator have std::nullopt by default? Some ops don't have init task for example, but if we just moved this insertion into the else case, then those operators won't have a key in the init_task_ids

In that case you'd add it to the list of attrs not to register tasks for along with WeightAttrs, InputAttrs, and NoopAttrs--that at least seems to be the current concept behind how the code works, so having two different methods for not registering a task seems odd to me


lib/local-execution/src/local_training_backing.cc line 60 at r11 (raw file):

      DeviceSpecificDeviceStates device_state =
          this->call_init_task_impl(invocation.task_id, accessor);
      std::cout << "post init";

Delete


lib/local-execution/src/op_task_signature.cc line 120 at r11 (raw file):

OpTaskSignature infer_bwd_signature(OpTaskSignature const &fwd) {
  OpTaskSignature bwd(fwd);

Suggestion:

  OpTaskSignature bwd = fwd;

lib/local-execution/src/task_registry.cc line 38 at r11 (raw file):

      this->task_mapping.insert({task_id, task_signature_impl});
    }
  }

It seems like I'd like to know if something I expected to register a task doesn't--otherwise the optional doesn't have much of a point--yes this will complain if something isn't registered, but that seems like the point

Suggestion:

  if (!(attrs.has<WeightAttrs>() || attrs.has<InputAttrs>() ||
        attrs.has<NoopAttrs>())) {
    // register tasks
    std::vector<task_id_t> task_ids = get_task_ids(attrs);
    for (task_id_t task_id : task_ids) {
      TaskSignatureAndImpl task_signature_impl = get_task_sig_impl(task_id);
      switch (task_signature_impl.task_signature.type) {
        case OpTaskType::INIT:
          assert(is_invocation_valid(task_signature_impl.task_signature,
                                     init(attrs)));
          this->init_task_ids[op_id] = task_id;
          break;
        case OpTaskType::FWD:
          assert(is_invocation_valid(task_signature_impl.task_signature,
                                     forward(attrs)));
          this->forward_task_ids[op_id] = task_id;
          break;
        case OpTaskType::BWD:
          assert(is_invocation_valid(task_signature_impl.task_signature,
                                     backward(attrs)));
          this->backward_task_ids[op_id] = task_id;
          break;
        default:
          throw mk_runtime_error("Invalid OpTaskType");
      }
      this->task_mapping.insert({task_id, task_signature_impl});
    } else {
      this->init_task_ids.insert({op_id, std::nullopt});
      this->forward_task_ids.insert({op_id, std::nullopt});
      this->backward_task_ids.insert({op_id, std::nullopt});
    }
  }

lib/local-execution/src/task_registry.cc line 61 at r11 (raw file):

std::string format_as(
    std::unordered_map<layer_guid_t, std::optional<task_id_t>> const &x) {

utils/fmt/unordered_map.h


lib/local-execution/src/task_registry.cc line 72 at r11 (raw file):

std::string
    format_as(std::unordered_map<task_id_t, TaskSignatureAndImpl> const &x) {

utils/fmt/unordered_map.h


lib/local-execution/src/task_registry.cc line 82 at r11 (raw file):

}

std::string format_as(TaskRegistry const &x) {

Fix (if you move TaskRegistry over to dtgen, this should be automatically generated for you)


lib/local-execution/src/task_signature_impl.cc line 314 at r11 (raw file):

std::string format_as(TaskSignatureAndImpl const &x) {
  return fmt::format("TaskSignatureAndImpl");

Fix (ideally through dtgen)


lib/local-execution/src/tasks.cc line 6 at r11 (raw file):

namespace FlexFlow {

std::string format_as(task_id_t const &x) {

Probably best to just make task_id_t a dtgen enum, and then you get format_as for free. We'll need a bit of conversion code when registering things in Legion, but it should be pretty trivial


lib/local-execution/src/ops/attention.cc line 134 at r11 (raw file):

                                                   kvSeqLength,
                                                   attrs.add_bias_kv);
  std::cout << "pre fn return";

Remove


lib/local-execution/src/ops/conv_2d.cc line 54 at r11 (raw file):

}

static DeviceSpecificDeviceStates

Why did this return type get changed? Why not just return a DeviceSpecific<Conv2DPerDeviceState>?

Code quote:

DeviceSpecificDeviceStates

lib/local-execution/test/src/test_local_slots_backing.cc line 12 at r11 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("Local Slots Backing -- Attention Op") {

Suggestion:

  TEST_CASE("LocalSlotsBacking -- Attention Op") {

lib/local-execution/test/src/test_local_slots_backing.cc line 31 at r11 (raw file):

    GenericTensorAccessorW key = allocator.allocate_tensor(input_tensor_shape);
    GenericTensorAccessorW value =
        allocator.allocate_tensor(input_tensor_shape);

Suggestion:

    TensorShape input_tensor_shape = TensorShape{
        TensorDims{FFOrdered<size_t>{batch_size, seq_len, feature_size}},
        DataType::FLOAT,
    };
    TensorShape query_shape = input_tensor_shape;
    TensorShape key_shape = input_tensor_shape;
    TensorShape value_shape = input_tensor_shape;
    GenericTensorAccessorW query =
        allocator.allocate_tensor(query_shape);
    GenericTensorAccessorW key = allocator.allocate_tensor(key_shape);
    GenericTensorAccessorW value =
        allocator.allocate_tensor(value_shape);

lib/local-execution/test/src/test_local_slots_backing.cc line 77 at r11 (raw file):

    for (layer_guid_t const &node :
         topological_ordering(cg_builder.computation_graph)) {
      local_slots_backing.allocate_tensors(

If we're testing allocate_tensors, while are we calling allocate_tensors so many times (i.e., in a loop)? Seems like it would be easier to call it for one input and then check the output


lib/local-execution/test/src/test_local_slots_backing.cc line 81 at r11 (raw file):

    }
    SUBCASE("Allocate and insert new tensors into slots") {
      SUBCASE("Tensor allocation") {

Suggestion:

      SUBCASE("Tensor allocation") {
        auto get_result_shape_and_dtype_for_tensor_guid = [&](tensor_guid_t t) -> std::pair<ArrayShape, DataType> {
          GenericTensorAccessorW accessor = local_slots_backing.gradient_tensor_backing.at(query_guid);
          return get_shape_and_datatype(query_grad); 
        });

lib/local-execution/test/src/test_local_slots_backing.cc line 90 at r11 (raw file):

              get_shape_and_datatype(query_grad);
          CHECK(result == correct_input);
        }

Suggestion:

        SUBCASE("Query grad") {
          std::pair<ArrayShape, DataType> result = [] {
            GenericTensorAccessorW query_grad =
              local_slots_backing.gradient_tensor_mapping.at(query_guid);
            return get_shape_and_datatype(query_grad);
          }();

          std::pair<ArrayShape, DataType> correct = {ArrayShape{query_shape}, dtype};
          CHECK(result == correct);
        }

lib/local-execution/test/src/test_local_slots_backing.cc line 124 at r11 (raw file):

          CHECK(result == correct_output);
        }
      }

Suggestion:

    SUBCASE("LocalSlotsBacking::allocate_tensors") {
      for (layer_guid_t const &node :
         topological_ordering(cg_builder.computation_graph)) {
        local_slots_backing.allocate_tensors(
          node, cg_builder.computation_graph, allocator);
      }
      SUBCASE("Tensor allocation") {
        std::pair<ArrayShape, DataType> correct_input =
            std::make_pair(ArrayShape{input_tensor_shape}, dtype);
        SUBCASE("Query grad") {
          GenericTensorAccessorW query_grad =
              local_slots_backing.gradient_tensor_mapping.at(query_guid);
          std::pair<ArrayShape, DataType> result =
              get_shape_and_datatype(query_grad);
          CHECK(result == correct_input);
        }
        SUBCASE("Key grad") {
          GenericTensorAccessorW key_grad =
              local_slots_backing.gradient_tensor_mapping.at(key_guid);
          std::pair<ArrayShape, DataType> result =
              get_shape_and_datatype(key_grad);
          CHECK(result == correct_input);
        }
        SUBCASE("Value grad") {
          GenericTensorAccessorW value_grad =
              local_slots_backing.gradient_tensor_mapping.at(value_guid);
          std::pair<ArrayShape, DataType> result =
              get_shape_and_datatype(value_grad);
          CHECK(result == correct_input);
        }

        TensorAttrs output_attrs =
            get_tensor_attrs(cg_builder.computation_graph, output_guid);
        std::pair<ArrayShape, DataType> correct_output =
            std::make_pair(ArrayShape{output_attrs.shape}, dtype);
        SUBCASE("Output") {
          GenericTensorAccessorW output =
              local_slots_backing.tensor_mapping.at(output_guid);
          std::pair<ArrayShape, DataType> result =
              get_shape_and_datatype(output);
          CHECK(result == correct_output);
        }
        SUBCASE("Output grad") {
          GenericTensorAccessorW output_grad =
              local_slots_backing.tensor_mapping.at(output_guid);
          std::pair<ArrayShape, DataType> result =
              get_shape_and_datatype(output_grad);
          CHECK(result == correct_output);
        }
      }

lib/local-execution/test/src/test_local_slots_backing.cc line 176 at r11 (raw file):

      }();

      SUBCASE("Tensor Slots Backing") {

Suggestion:

      SUBCASE("LocalSlotsBacking::construct_tensor_slots_backing") {

lib/local-execution/test/src/test_local_slots_backing.cc line 195 at r11 (raw file):

            {SlotGradId{slot_id_t{WEIGHTS}, IsGrad::NO}, weights},
            {SlotGradId{slot_id_t{OUTPUT}, IsGrad::NO}, output},
            {SlotGradId{slot_id_t{QUERY}, IsGrad::YES}, query}};

Suggestion:

        TensorSlotsBacking correct = [&] {
          TensorShape weights_shape = throw_if_unexpected(get_weights_shape(
            attrs, input_tensor_shape, input_tensor_shape, input_tensor_shape));
          GenericTensorAccessorW weights =
            allocator.allocate_tensor(weights_shape);

          TensorAttrs output_attrs =
            get_tensor_attrs(cg_builder.computation_graph, output_guid);
          GenericTensorAccessorW output =
            allocator.allocate_tensor(output_attrs.shape);
          
          return TensorSlotsBacking{
            {SlotGradId{slot_id_t{QUERY}, IsGrad::NO}, query},
            {SlotGradId{slot_id_t{KEY}, IsGrad::NO}, key},
            {SlotGradId{slot_id_t{VALUE}, IsGrad::NO}, value},
            {SlotGradId{slot_id_t{WEIGHTS}, IsGrad::NO}, weights},
            {SlotGradId{slot_id_t{OUTPUT}, IsGrad::NO}, output},
            {SlotGradId{slot_id_t{QUERY}, IsGrad::YES}, query},
          };
      }();

lib/local-execution/test/src/test_local_slots_backing.cc line 197 at r11 (raw file):

            {SlotGradId{slot_id_t{QUERY}, IsGrad::YES}, query}};

        CHECK(are_slots_backings_equivalent_up_to_tensor_allocation_addresses(

Maybe instead have a get_slot_backings_without_tensor_allocation_addresses and compare them as equal?


lib/local-execution/test/src/test_local_slots_backing.cc line 200 at r11 (raw file):

            correct, result));
      }
      SUBCASE("Arg Slots Backing") {

Suggestion:

      SUBCASE("ArgSlotsBacking::construct_arg_slots_backing")

lib/local-execution/test/src/test_local_slots_backing.cc line 216 at r11 (raw file):

            {slot_id_t{PROFILING},
             ConcreteArgSpec::create(runtime_arg_config.profiling_settings)},
            {slot_id_t{HANDLE}, ConcreteArgSpec::create(handle)}};

Suggestion:

        ArgSlotsBacking correct = [&] {
            ParallelTensorShape query_parallel_tensor_shape =
            lift_to_parallel(input_tensor_shape);
          
            return ArgSlotsBacking{
              {slot_id_t{QPROJSIZE},
               ConcreteArgSpec::create(get_qProjSize(attrs))},
              {slot_id_t{ATTRS}, ConcreteArgSpec::create(attrs)},
              {slot_id_t{QUERY_PARALLEL_TENSOR_SHAPE},
               ConcreteArgSpec::create(query_parallel_tensor_shape)},
              {slot_id_t{PROFILING},
               ConcreteArgSpec::create(runtime_arg_config.profiling_settings)},
              {slot_id_t{HANDLE}, ConcreteArgSpec::create(handle)}};
          }();

lib/local-execution/test/src/test_local_slots_backing.cc line 220 at r11 (raw file):

        CHECK(result == correct);

        SUBCASE("FF Handle") {

This seems like a check for resolve_runtime_arg_ref_spec, not construct_arg_slots_backing?


lib/local-execution/test/src/test_local_slots_backing.cc line 230 at r11 (raw file):

      }

      SUBCASE("Device Specific Device States") {

What function is being tested here?

Code quote:

      SUBCASE("Device Specific Device States") {

lib/local-execution/test/src/test_local_task_arg_accessor.cc line 9 at r11 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("Local Task Argument Accessor") {

Suggestion:

  TEST_CASE("LocalTaskArgumentAccessor") {

lib/local-execution/test/src/test_local_task_arg_accessor.cc line 49 at r11 (raw file):

    SUBCASE("get_tensor") {
      SUBCASE("Read-only input tensor") {

Suggestion:

      SUBCASE("get_tensor(slot_id_t, Permissions::RO, IsGrad::NO)") {

lib/local-execution/test/src/test_local_task_arg_accessor.cc line 54 at r11 (raw file):

        GenericTensorAccessorR result = std::get<GenericTensorAccessorR>(
            acc.get_tensor(slot_id_t{INPUT}, Permissions::RO, IsGrad::NO));
        CHECK(correct == result);

Suggestion:

        GenericTensorAccessor correct = GenericTensorAccessor{
          read_only_accessor_from_write_accessor(input),
        };
        GenericTensorAccessor result = acc.get_tensor(slot_id_t{INPUT}, Permissions::RO, IsGrad::NO);
        CHECK(correct == result);

lib/local-execution/test/src/test_local_task_arg_accessor.cc line 81 at r11 (raw file):

        GenericTensorAccessorW result = std::get<GenericTensorAccessorW>(
            acc.get_tensor(slot_id_t{INPUT}, Permissions::RW, IsGrad::YES));
        CHECK(input_grad == result);

Suggestion:

        GenericTensorAccessor result =
            acc.get_tensor(slot_id_t{INPUT}, Permissions::RW, IsGrad::YES);
        GenericTensorAccessor correct = input_grad;
        CHECK(result == correct);

lib/local-execution/test/src/test_local_task_arg_accessor.cc line 93 at r11 (raw file):

            std::get<std::vector<GenericTensorAccessorR>>(
                acc.get_variadic_tensor(
                    slot_id_t{VARIADIC_TENSORS}, Permissions::RO, IsGrad::NO));

Suggestion:

        VariadicGenericTensorAccessor result =
                acc.get_variadic_tensor(
                    slot_id_t{VARIADIC_TENSORS}, Permissions::RO, IsGrad::NO));

lib/local-execution/test/src/test_task_registry.cc line 30 at r11 (raw file):

        }};

    SUBCASE("register single layer") {

Why not just do a single == comparison on the TaskRegistry, rather than having all of these subcases?


lib/local-execution/test/src/test_task_registry.cc line 64 at r11 (raw file):

    SUBCASE("multiple layers same task") {
      layer_guid_t layer_1 = layer_guid_t{Node{1}};
      layer_guid_t layer_2 = layer_guid_t{Node{2}};

Suggestion:

      layer_guid_t other_layer_guid = layer_guid_t{Node{1}};

lib/local-execution/test/src/test_task_registry.cc line 67 at r11 (raw file):

      task_registry.register_tasks_for_layer(layer_guid, attrs);
      task_registry.register_tasks_for_layer(layer_1, attrs);
      task_registry.register_tasks_for_layer(layer_2, attrs);

Suggestion:

      task_registry.register_tasks_for_layer(layer_guid, attrs);
      task_registry.register_tasks_for_layer(other_layer_guid, attrs);

lib/local-execution/src/runtime_arg_ref.cc line 10 at r8 (raw file):

}

RuntimeArgRef<DeviceSpecific<PerDeviceFFHandle>> ff_handle() {

What's the argument for this being a DeviceSpecific<PerDeviceFFHandle> instead of just a PerDeviceFFHandle?

Code quote:

RuntimeArgRef<DeviceSpecific<PerDeviceFFHandle>>

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r12, all commit messages.
Reviewable status: all files reviewed, 51 unresolved discussions (waiting on @reyna-abhyankar)

Copy link
Collaborator Author

@reyna-abhyankar reyna-abhyankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Reviewable status: 104 of 177 files reviewed, 51 unresolved discussions (waiting on @lockshaw and @reyna-abhyankar)


lib/kernels/include/kernels/accessor.h line 172 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why do we have this instead of just having fmt support for GenericTensorAccessorR and using the default utils/fmt/vector.h fmt support for std::vector?

Done.


lib/kernels/include/kernels/device.h line 102 at r4 (raw file):

Previously, lockshaw (Colin Unger) wrote…

This should be kept, why was this removed?

Temporarily removed it because I was getting a linker error when running proj test for local-execution-tests: undefined reference to 'cudaGetErrorString'


lib/kernels/src/ff_handle.cc line 6 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why not include some fields in the fmt description?

Done.


lib/local-execution/include/local-execution/device_specific_device_states.variant.toml line 2 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What is the purpose of this variant? What's the reasoning for having a variant<DeviceSpecific<T>, ...> vs a DeviceSpecific<variant<T, ...>>?

We want the device specific to hold the actual device state type, and then the init functions for all operators can return one of these device specifics.


lib/local-execution/include/local-execution/local_task_argument_accessor.h line 47 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

utils/fmt/unordered_map.h

Done.


lib/local-execution/include/local-execution/local_training_backing.h line 17 at r11 (raw file):

                       TensorBackingMap const &,
                       RuntimeArgConfig const &);
  ~LocalTrainingBacking() = default;

Done.


lib/local-execution/include/local-execution/local_training_backing.h line 38 at r11 (raw file):

  ComputationGraph computation_graph;
  TaskRegistry task_registry;
  LocalSlotsBacking local_slots_backing;

Done.


lib/local-execution/include/local-execution/op_task_invocation.h line 102 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Having a separate section for member variables can improve readability

Done.


lib/local-execution/include/local-execution/task_argument_accessor.h line 13 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Is doing this based on type the best way of doing this, or should this be done based on whether a value was passed as device-specific in the invocation?

I'm going to make an issue for this because it's related to actually adding support for device specific bindings. #1462


lib/local-execution/include/local-execution/task_registry.h line 12 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Move TaskRegistry over to dtgen?

We modify it on the fly while constructing the local backing, so I think we should keep as is.


lib/local-execution/include/local-execution/task_registry.h line 26 at r11 (raw file):

  bool operator!=(TaskRegistry const &other) const;

private:

Why private these?


lib/local-execution/include/local-execution/task_registry.h line 35 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why not use the default utils/fmt/unordered_map.h and utils/fmt/optional.h?

Done.


lib/local-execution/include/local-execution/task_signature_impl.h line 19 at r7 (raw file):

Previously, lockshaw (Colin Unger) wrote…

In that case it would be better to have a separate function rather than an operator==--I'd recommend only having operator== if the type does actually support equality

Done.


lib/local-execution/src/concrete_arg.cc line 19 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Would probably be good to print the value too since it's concrete?

the actual value is a template function .get<T> and we don't know the type


lib/local-execution/src/local_training_backing.cc line 21 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Probably? It's fine for now, this can always be addressed if necessary when the legion backend is added

Ok.


lib/local-execution/src/local_training_backing.cc line 33 at r7 (raw file):

Previously, lockshaw (Colin Unger) wrote…

In that case you'd add it to the list of attrs not to register tasks for along with WeightAttrs, InputAttrs, and NoopAttrs--that at least seems to be the current concept behind how the code works, so having two different methods for not registering a task seems odd to me

Changed this so that the get_task_ids function handles the individual attr types instead. I think this cleans up the function


lib/local-execution/src/local_training_backing.cc line 60 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Delete

Done.


lib/local-execution/src/op_task_signature.cc line 120 at r11 (raw file):

OpTaskSignature infer_bwd_signature(OpTaskSignature const &fwd) {
  OpTaskSignature bwd(fwd);

Done.


lib/local-execution/src/task_registry.cc line 38 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

It seems like I'd like to know if something I expected to register a task doesn't--otherwise the optional doesn't have much of a point--yes this will complain if something isn't registered, but that seems like the point

The purpose of the optional is because not every operator has all tasks. So when we are doing init, for example, an operator without init can be skipped. This solution will throw an error when something you expect NOT to register a task correctly doesn't. Are you saying these maps shouldn't even contain the operators that don't have that task? (in which case why have the optional at all / why even insert weight/noop/input into the map)


lib/local-execution/src/task_registry.cc line 61 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

utils/fmt/unordered_map.h

Done.


lib/local-execution/src/task_registry.cc line 72 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

utils/fmt/unordered_map.h

Done.


lib/local-execution/src/task_registry.cc line 82 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Fix (if you move TaskRegistry over to dtgen, this should be automatically generated for you)

Made it more descriptive, see other comment about moving it to dtgen


lib/local-execution/src/task_signature_impl.cc line 314 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Fix (ideally through dtgen)

Done.


lib/local-execution/src/ops/attention.cc line 134 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Remove

Done.


lib/local-execution/src/ops/conv_2d.cc line 54 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why did this return type get changed? Why not just return a DeviceSpecific<Conv2DPerDeviceState>?

previously it was still a variant, just the DeviceSpecific. This is for consistency as mentioned above.


lib/local-execution/test/src/test_local_slots_backing.cc line 12 at r11 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("Local Slots Backing -- Attention Op") {

Done.


lib/local-execution/test/src/test_local_slots_backing.cc line 31 at r11 (raw file):

    GenericTensorAccessorW key = allocator.allocate_tensor(input_tensor_shape);
    GenericTensorAccessorW value =
        allocator.allocate_tensor(input_tensor_shape);

Done.


lib/local-execution/test/src/test_local_slots_backing.cc line 77 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

If we're testing allocate_tensors, while are we calling allocate_tensors so many times (i.e., in a loop)? Seems like it would be easier to call it for one input and then check the output

The interface for allocate tensors allocates all outgoing tensors for a layer (because the assumption is that all incoming tensors have been allocated by the previous call). So it's actually much easier this way unless we want to change that interface to make it more granular (i.e. a function that takes a shape, allocator, and tensor guid and then makes the appropriate allocator call). We could do that, there'd just be a bunch of smaller functions when the current allocate_tensors is already only ~30 lines


lib/local-execution/test/src/test_local_slots_backing.cc line 81 at r11 (raw file):

    }
    SUBCASE("Allocate and insert new tensors into slots") {
      SUBCASE("Tensor allocation") {

Done.


lib/local-execution/test/src/test_local_slots_backing.cc line 90 at r11 (raw file):

              get_shape_and_datatype(query_grad);
          CHECK(result == correct_input);
        }

Done.


lib/local-execution/test/src/test_local_slots_backing.cc line 124 at r11 (raw file):

          CHECK(result == correct_output);
        }
      }

The later subcase also relies on the allocation so it should be pulled out.


lib/local-execution/test/src/test_local_slots_backing.cc line 176 at r11 (raw file):

      }();

      SUBCASE("Tensor Slots Backing") {

Done.


lib/local-execution/test/src/test_local_slots_backing.cc line 195 at r11 (raw file):

            {SlotGradId{slot_id_t{WEIGHTS}, IsGrad::NO}, weights},
            {SlotGradId{slot_id_t{OUTPUT}, IsGrad::NO}, output},
            {SlotGradId{slot_id_t{QUERY}, IsGrad::YES}, query}};

Done.


lib/local-execution/test/src/test_local_slots_backing.cc line 197 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Maybe instead have a get_slot_backings_without_tensor_allocation_addresses and compare them as equal?

Done.


lib/local-execution/test/src/test_local_slots_backing.cc line 200 at r11 (raw file):

            correct, result));
      }
      SUBCASE("Arg Slots Backing") {

Done.


lib/local-execution/test/src/test_local_slots_backing.cc line 216 at r11 (raw file):

            {slot_id_t{PROFILING},
             ConcreteArgSpec::create(runtime_arg_config.profiling_settings)},
            {slot_id_t{HANDLE}, ConcreteArgSpec::create(handle)}};

Done.


lib/local-execution/test/src/test_local_slots_backing.cc line 220 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

This seems like a check for resolve_runtime_arg_ref_spec, not construct_arg_slots_backing?

Done.


lib/local-execution/test/src/test_local_slots_backing.cc line 230 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What function is being tested here?

Here, I was just testing device specific unwrapping, but I don't think this test is necessary anymore.


lib/local-execution/test/src/test_local_task_arg_accessor.cc line 9 at r11 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("Local Task Argument Accessor") {

Done.


lib/local-execution/test/src/test_local_task_arg_accessor.cc line 49 at r11 (raw file):

    SUBCASE("get_tensor") {
      SUBCASE("Read-only input tensor") {

Done.


lib/local-execution/test/src/test_local_task_arg_accessor.cc line 54 at r11 (raw file):

        GenericTensorAccessorR result = std::get<GenericTensorAccessorR>(
            acc.get_tensor(slot_id_t{INPUT}, Permissions::RO, IsGrad::NO));
        CHECK(correct == result);

Done.


lib/local-execution/test/src/test_local_task_arg_accessor.cc line 81 at r11 (raw file):

        GenericTensorAccessorW result = std::get<GenericTensorAccessorW>(
            acc.get_tensor(slot_id_t{INPUT}, Permissions::RW, IsGrad::YES));
        CHECK(input_grad == result);

Done.


lib/local-execution/test/src/test_local_task_arg_accessor.cc line 93 at r11 (raw file):

            std::get<std::vector<GenericTensorAccessorR>>(
                acc.get_variadic_tensor(
                    slot_id_t{VARIADIC_TENSORS}, Permissions::RO, IsGrad::NO));

Done.


lib/local-execution/test/src/test_task_registry.cc line 30 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why not just do a single == comparison on the TaskRegistry, rather than having all of these subcases?

Done.


lib/local-execution/test/src/test_task_registry.cc line 64 at r11 (raw file):

    SUBCASE("multiple layers same task") {
      layer_guid_t layer_1 = layer_guid_t{Node{1}};
      layer_guid_t layer_2 = layer_guid_t{Node{2}};

Done.


lib/local-execution/test/src/test_task_registry.cc line 67 at r11 (raw file):

      task_registry.register_tasks_for_layer(layer_guid, attrs);
      task_registry.register_tasks_for_layer(layer_1, attrs);
      task_registry.register_tasks_for_layer(layer_2, attrs);

Done.


lib/local-execution/include/local-execution/cost_estimate.h line 21 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Remove this second overload and just ignore MachineView where it's not necessary. Overall CostEstimator should include network modelling, etc. rather than just being a thin interface onto kernel runtimes

Done.


lib/local-execution/src/device_states.cc line 11 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Delete

Done.


lib/local-execution/src/device_states.cc line 14 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What is going on here?

Removed, just made it auto.


lib/local-execution/src/runtime_arg_ref.cc line 10 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What's the argument for this being a DeviceSpecific<PerDeviceFFHandle> instead of just a PerDeviceFFHandle?

I'm not too knowledgeable about how handles work/or are passed around, but I'd assume we need to know what device it's referring to? So device specific would take care of that.


lib/local-execution/src/tasks.cc line 6 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Probably best to just make task_id_t a dtgen enum, and then you get format_as for free. We'll need a bit of conversion code when registering things in Legion, but it should be pretty trivial

Done.


lib/local-execution/include/local-execution/device_states.variant.toml line 2 at r11 (raw file):

namespace = "FlexFlow"
name = "DeviceStates"

Done.

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 76 of 76 files at r13, all commit messages.
Reviewable status: all files reviewed, 27 unresolved discussions (waiting on @reyna-abhyankar)


lib/kernels/include/kernels/array_shape.h line 58 at r13 (raw file):

TensorShape get_tensor_shape(ArrayShape const &, DataType);

std::string format_as(std::pair<ArrayShape, DataType> const &);

utils/fmt/pair.h?


lib/kernels/include/kernels/device.h line 102 at r4 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

Temporarily removed it because I was getting a linker error when running proj test for local-execution-tests: undefined reference to 'cudaGetErrorString'

It should get added back in, the key is just only to call it from non-template code in kernels so the linker does the right thing. But we really do want the string version of those error messages. Of course if you wanted you could also pull parts of it out of macros and into constant functions to reduce the restrictions--for example if you just wrapped cudaGetErrorString in kernels this wouldn't be an issue


lib/kernels/src/ff_handle.cc line 9 at r13 (raw file):

  oss << "<PerDeviceFFHandle";
  oss << " workSpaceSize=" << x.workSpaceSize;
  oss << " allowTensorOpMathConversion=" << x.allowTensorOpMathConversion;

You're missing a closing >. You can also always use fmt syntax if you prefer, i.e., `return fmt::format("", x.workSpaceSize, x.allowTensorOpMathConversion);


lib/local-execution/include/local-execution/fwd_bwd_task_impl_function.h line 1 at r13 (raw file):

#include "local-execution/task_argument_accessor.h"

Header guard?


lib/local-execution/include/local-execution/fwd_bwd_task_impl_function.h line 7 at r13 (raw file):

struct FwdBwdTaskImplFunction {

std::optional<float> (*fwd_bwd_task_impl_function)(TaskArgumentAccessor const &);

Suggestion:

std::optional<float> (*function_ptr)(TaskArgumentAccessor const &);

lib/local-execution/include/local-execution/fwd_bwd_task_impl_function.h line 11 at r13 (raw file):

bool operator==(FwdBwdTaskImplFunction const &) const;
bool operator!=(FwdBwdTaskImplFunction const &) const;

Ideally also add operator<


lib/local-execution/include/local-execution/init_task_impl_function.h line 1 at r13 (raw file):

#include "local-execution/device_specific_device_states.dtg.h"

Header guard?


lib/local-execution/include/local-execution/init_task_impl_function.h line 8 at r13 (raw file):

struct InitTaskImplFunction {

DeviceSpecificDeviceStates (*init_task_impl_function)(TaskArgumentAccessor const &);

Suggestion:

DeviceSpecificDeviceStates (*function_ptr)(TaskArgumentAccessor const &);

lib/local-execution/include/local-execution/init_task_impl_function.h line 12 at r13 (raw file):

bool operator==(InitTaskImplFunction const &) const;
bool operator!=(InitTaskImplFunction const &) const;

Ideally also add operator<


lib/local-execution/include/local-execution/local_task_argument_accessor.h line 42 at r13 (raw file):

};

using TensorSlotsBackingWithoutAddresses = std::unordered_map<

dtgen wrapper would be preferable


lib/local-execution/include/local-execution/op_task_signature.h line 95 at r13 (raw file):

  std::unordered_set<OpTensorSlotSpec> op_tensor_slots;
};
FF_VISITABLE_STRUCT_NONSTANDARD_CONSTRUCTION(

Eventually this should be migrated on from visitable, but doesn't have to happen now


lib/local-execution/include/local-execution/per_device_state.h line 1 at r13 (raw file):

#include "local-execution/device_specific_device_states.dtg.h"

Header guard?


lib/local-execution/include/local-execution/per_device_state.variant.toml line 2 at r13 (raw file):

namespace = "FlexFlow"
name = "PerDeviceState"

These are all for operators, so PerDeviceOpState seems clearer--PerDeviceState makes it seem there is one per device and it has nothing to do with operators

Code quote:

"PerDeviceState"

lib/local-execution/include/local-execution/task_impl_function.variant.toml line 4 at r13 (raw file):

name = "TaskImplFunction"
features = [
  "eq",

Would be good to have hash. ord too if it's possible (currently there's no ord for unordered_set and unordered_map, so it's not always possible)


lib/local-execution/include/local-execution/task_impl_function.variant.toml line 13 at r13 (raw file):

  "local-execution/init_task_impl_function.h",
  "local-execution/fwd_bwd_task_impl_function.h",
]

Suggestion:

includes = [
  "local-execution/init_task_impl_function.h",
  "local-execution/fwd_bwd_task_impl_function.h",
]

lib/local-execution/include/local-execution/task_registry.h line 12 at r11 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

We modify it on the fly while constructing the local backing, so I think we should keep as is.

dtgen types can be modified after creation, they just don't normally have default constructors. You could define a register_tasks_for_layer function instead of the method. It's not a huge deal, just might automate some operations you currently have to maintain


lib/local-execution/include/local-execution/task_registry.h line 26 at r11 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

Why private these?

Oh, I meant just pull them into a separate section, didn't think of the fact that they're not currently private, so it would just be public: instead


lib/local-execution/include/local-execution/task_signature_impl.h line 5 at r13 (raw file):

// #include "local-execution/op_task_invocation.h"
// #include "local-execution/task_impl_function.dtg.h"

Delete

Code quote:

// #include "local-execution/op_task_invocation.h"
// #include "local-execution/task_impl_function.dtg.h"

lib/local-execution/include/local-execution/task_signature_impl.struct.toml line 4 at r13 (raw file):

name = "TaskSignatureAndImpl"
features = [
  "eq",

Ideally also add at least hash (ord if possible)


lib/local-execution/include/local-execution/task_signature_impl.struct.toml line 10 at r13 (raw file):

includes = [
  "local-execution/task_impl_function.dtg.h",
  "local-execution/op_task_invocation.h",

Suggestion:

  "local-execution/op_task_signature.h",

lib/local-execution/src/concrete_arg.cc line 19 at r11 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

the actual value is a template function .get<T> and we don't know the type

Fair enough


lib/local-execution/src/fwd_bwd_task_impl_function.cc line 17 at r13 (raw file):

  std::ostringstream oss;
  oss << "<FwdBwdTaskImplFunction";
  oss << " fwd_bwd_task_impl_function=" << x.fwd_bwd_task_impl_function;

Suggestion:

 oss << " function_ptr=" << x.fwd_bwd_task_impl_function;

lib/local-execution/src/fwd_bwd_task_impl_function.cc line 17 at r13 (raw file):

  std::ostringstream oss;
  oss << "<FwdBwdTaskImplFunction";
  oss << " fwd_bwd_task_impl_function=" << x.fwd_bwd_task_impl_function;

How does this currently render x.fwd_bwd_task_impl_function? As a pointer? A number?


lib/local-execution/src/init_task_impl_function.cc line 17 at r13 (raw file):

  std::ostringstream oss;
  oss << "<InitTaskImplFunction";
  oss << " init_task_impl_function=" << x.init_task_impl_function;

Suggestion:

  oss << " function_ptr=" << x.init_task_impl_function;

lib/local-execution/src/op_task_signature.cc line 152 at r13 (raw file):

std::string format_as(OpTaskSignature const &x) {
  std::ostringstream oss;
  oss << "<OpTaskSignature";

Remember you can also use fmt syntax if you prefer 🙂


lib/local-execution/src/task_registry.cc line 38 at r11 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

The purpose of the optional is because not every operator has all tasks. So when we are doing init, for example, an operator without init can be skipped. This solution will throw an error when something you expect NOT to register a task correctly doesn't. Are you saying these maps shouldn't even contain the operators that don't have that task? (in which case why have the optional at all / why even insert weight/noop/input into the map)

Good point, forgot that at least not everything has an init task, so adding optional everywhere makes sense. In that case is there a reason to handle Input, Weight, etc. separately vs just having get_task_ids return an empty list?


lib/local-execution/src/ops/input.h line 0 at r13 (raw file):
Why an empty file?


lib/local-execution/src/ops/noop.h line 11 at r13 (raw file):

namespace FlexFlow {

std::vector<task_id_t> get_task_ids(InputAttrs const &);

Ideally put these in their own headers, unlike in master their attr types are distinct


lib/local-execution/test/src/test_local_slots_backing.cc line 77 at r11 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

The interface for allocate tensors allocates all outgoing tensors for a layer (because the assumption is that all incoming tensors have been allocated by the previous call). So it's actually much easier this way unless we want to change that interface to make it more granular (i.e. a function that takes a shape, allocator, and tensor guid and then makes the appropriate allocator call). We could do that, there'd just be a bunch of smaller functions when the current allocate_tensors is already only ~30 lines

My point was more--why not just call it once and check that the outgoing tensor for that layer are allocated, rather than calling it a bunch of times and then checking for a bunch of operators?


lib/local-execution/src/runtime_arg_ref.cc line 10 at r8 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

I'm not too knowledgeable about how handles work/or are passed around, but I'd assume we need to know what device it's referring to? So device specific would take care of that.

Makes sense, I convinced myself 😛

Copy link
Collaborator Author

@reyna-abhyankar reyna-abhyankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 155 of 186 files reviewed, 27 unresolved discussions (waiting on @lockshaw and @reyna-abhyankar)


lib/kernels/include/kernels/array_shape.h line 58 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

utils/fmt/pair.h?

Added a format fn for ArrayShape (then pair takes care of the rest)


lib/kernels/include/kernels/device.h line 102 at r4 (raw file):

Previously, lockshaw (Colin Unger) wrote…

It should get added back in, the key is just only to call it from non-template code in kernels so the linker does the right thing. But we really do want the string version of those error messages. Of course if you wanted you could also pull parts of it out of macros and into constant functions to reduce the restrictions--for example if you just wrapped cudaGetErrorString in kernels this wouldn't be an issue

Done.


lib/kernels/src/ff_handle.cc line 9 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

You're missing a closing >. You can also always use fmt syntax if you prefer, i.e., `return fmt::format("", x.workSpaceSize, x.allowTensorOpMathConversion);

Just added the `>.


lib/local-execution/include/local-execution/fwd_bwd_task_impl_function.h line 1 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Header guard?

Done.


lib/local-execution/include/local-execution/fwd_bwd_task_impl_function.h line 7 at r13 (raw file):

struct FwdBwdTaskImplFunction {

std::optional<float> (*fwd_bwd_task_impl_function)(TaskArgumentAccessor const &);

Done.


lib/local-execution/include/local-execution/fwd_bwd_task_impl_function.h line 11 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Ideally also add operator<

Why do we need ordering? Even though we technically can order the function ptrs, there isn't a real order to the functions themselves (which is what we are actually comparing)


lib/local-execution/include/local-execution/init_task_impl_function.h line 1 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Header guard?

Done.


lib/local-execution/include/local-execution/init_task_impl_function.h line 8 at r13 (raw file):

struct InitTaskImplFunction {

DeviceSpecificDeviceStates (*init_task_impl_function)(TaskArgumentAccessor const &);

Done.


lib/local-execution/include/local-execution/init_task_impl_function.h line 12 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Ideally also add operator<

See https://reviewable.io/reviews/flexflow/FlexFlow/1418#-O3txKxiEga5ouuIbrgb


lib/local-execution/include/local-execution/local_task_argument_accessor.h line 42 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

dtgen wrapper would be preferable

It's only used in this one spot, I feel like dtgen for it is kind of overkill.


lib/local-execution/include/local-execution/op_task_signature.h line 95 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Eventually this should be migrated on from visitable, but doesn't have to happen now

Created an issue for this: #1465


lib/local-execution/include/local-execution/task_impl_function.variant.toml line 4 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Would be good to have hash. ord too if it's possible (currently there's no ord for unordered_set and unordered_map, so it's not always possible)

Added hash, for similar discussion of ord see https://reviewable.io/reviews/flexflow/FlexFlow/1418#-O3txKxiEga5ouuIbrgb


lib/local-execution/include/local-execution/task_impl_function.variant.toml line 13 at r13 (raw file):

  "local-execution/init_task_impl_function.h",
  "local-execution/fwd_bwd_task_impl_function.h",
]

Done.


lib/local-execution/include/local-execution/task_registry.h line 12 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

dtgen types can be modified after creation, they just don't normally have default constructors. You could define a register_tasks_for_layer function instead of the method. It's not a huge deal, just might automate some operations you currently have to maintain

Don't the xxx_task_ids maps have to be const? register_tasks_for_layer has to insert into them


lib/local-execution/include/local-execution/task_registry.h line 26 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Oh, I meant just pull them into a separate section, didn't think of the fact that they're not currently private, so it would just be public: instead

Done.


lib/local-execution/include/local-execution/task_signature_impl.h line 5 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Delete

Done.


lib/local-execution/include/local-execution/task_signature_impl.struct.toml line 4 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Ideally also add at least hash (ord if possible)

Currently, OpTaskSignature is not hashable, I think we can solve that in #1465
For ord, see https://reviewable.io/reviews/flexflow/FlexFlow/1418#-O3txKxiEga5ouuIbrgb


lib/local-execution/include/local-execution/task_signature_impl.struct.toml line 10 at r13 (raw file):

includes = [
  "local-execution/task_impl_function.dtg.h",
  "local-execution/op_task_invocation.h",

Done.


lib/local-execution/src/fwd_bwd_task_impl_function.cc line 17 at r13 (raw file):

  std::ostringstream oss;
  oss << "<FwdBwdTaskImplFunction";
  oss << " fwd_bwd_task_impl_function=" << x.fwd_bwd_task_impl_function;

Done.


lib/local-execution/src/fwd_bwd_task_impl_function.cc line 17 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

How does this currently render x.fwd_bwd_task_impl_function? As a pointer? A number?

Currently, it is always the number 1. Should it be (void*)x.function_ptr? That actually prints an address.


lib/local-execution/src/init_task_impl_function.cc line 17 at r13 (raw file):

  std::ostringstream oss;
  oss << "<InitTaskImplFunction";
  oss << " init_task_impl_function=" << x.init_task_impl_function;

Done.


lib/local-execution/src/op_task_signature.cc line 152 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Remember you can also use fmt syntax if you prefer 🙂

I think I just saw this format in one of the files lol. I like the readability, might just stick with it. It's not really that much of a pain.


lib/local-execution/src/task_registry.cc line 38 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Good point, forgot that at least not everything has an init task, so adding optional everywhere makes sense. In that case is there a reason to handle Input, Weight, etc. separately vs just having get_task_ids return an empty list?

Yep, there's no more separate handling anymore. If you look at local_training_backing.cc and task_signature_impl.cc you'll see a function for input, noop, and weights.


lib/local-execution/src/ops/input.h line at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why an empty file?

Had initially decided to split the get_task_ids functions for input and weight separately. Then just clubbed it with noop. Now it's back to separate again (so instead of empty this is a filled file).


lib/local-execution/src/ops/noop.h line 11 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Ideally put these in their own headers, unlike in master their attr types are distinct

Done. Though it does feel a little weird to have weight.h in op, but I guess it's the same as input being an operator too and they're all graph nodes. So it's fine


lib/local-execution/test/src/test_local_slots_backing.cc line 77 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

My point was more--why not just call it once and check that the outgoing tensor for that layer are allocated, rather than calling it a bunch of times and then checking for a bunch of operators?

Ah, actually here we are only checking for a single operator. It's just all incoming tensors for that node have to have been allocated first since we are checking both incoming and outgoing tensors exist for this operator.


lib/local-execution/include/local-execution/per_device_state.h line 1 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Header guard?

Done.


lib/local-execution/include/local-execution/per_device_state.variant.toml line 2 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

These are all for operators, so PerDeviceOpState seems clearer--PerDeviceState makes it seem there is one per device and it has nothing to do with operators

Yup that makes sense. Done.

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 30 of 30 files at r14, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @lockshaw and @reyna-abhyankar)


lib/local-execution/include/local-execution/fwd_bwd_task_impl_function.h line 11 at r13 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

Why do we need ordering? Even though we technically can order the function ptrs, there isn't a real order to the functions themselves (which is what we are actually comparing)

Usually to allow the thing to also be stored in a std::map or a std::set which require operator<


lib/local-execution/include/local-execution/local_task_argument_accessor.h line 42 at r13 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

It's only used in this one spot, I feel like dtgen for it is kind of overkill.

The reason would be more to wrap the type so the additional utils/fmt/variant.h, utils/fmt/vector.h, etc. includes aren't necessary, and just to avoid the potential confusion/weirdness of a type alias. I'd lean toward dtgen here as the overhead is pretty small and it avoids these issues, but I don't think it's worth holding up the PR over


lib/local-execution/include/local-execution/task_registry.h line 12 at r11 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

Don't the xxx_task_ids maps have to be const? register_tasks_for_layer has to insert into them

You'd take it as TaskRegistry & rather than TaskRegistry const & and then you'd be allowed to modify the std::unordered_maps


lib/local-execution/src/fwd_bwd_task_impl_function.cc line 18 at r14 (raw file):

  std::ostringstream oss;
  oss << "<FwdBwdTaskImplFunction";
  oss << " function_ptr=" << (void *)x.function_ptr;

Suggestion:

static_cast<void *>(x.function_ptr);

lib/local-execution/src/init_task_impl_function.cc line 16 at r14 (raw file):

  std::ostringstream oss;
  oss << "<InitTaskImplFunction";
  oss << " function_ptr=" << (void *)x.function_ptr;

Suggestion:

  oss << " function_ptr=" << static_cast<void *>(x.function_ptr);

lib/local-execution/src/ops/input.cc line 2 at r14 (raw file):

#include "input.h"
#include "local-execution/op_task_invocation.h"

Why is op_task_invocation.h needed?


lib/local-execution/src/ops/input.cc line 3 at r14 (raw file):

#include "input.h"
#include "local-execution/op_task_invocation.h"
#include "utils/hash-utils.h"

Why is hash-utils.h needed?


lib/local-execution/src/ops/noop.cc line 17 at r14 (raw file):

#include "noop.h"
#include "utils/hash-utils.h"

Why is hash-utils.h necessary?


lib/local-execution/src/ops/weight.cc line 2 at r14 (raw file):

#include "weight.h"
#include "local-execution/op_task_invocation.h"

Why is op_task_invocation.h necessary?


lib/local-execution/src/ops/weight.cc line 3 at r14 (raw file):

#include "weight.h"
#include "local-execution/op_task_invocation.h"
#include "utils/hash-utils.h"

Why is hash-utils.h necessary?


lib/local-execution/test/src/test_local_slots_backing.cc line 77 at r11 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

Ah, actually here we are only checking for a single operator. It's just all incoming tensors for that node have to have been allocated first since we are checking both incoming and outgoing tensors exist for this operator.

Sorry, I meant "bunch of tensors" instead of "bunch of operators"--this just seems to be checking much more than what allocate_tensors does? I'm not sure why we need to check that all incoming and outgoing tensors exist to confirm that allocate_tensors works?

@lockshaw lockshaw self-requested a review August 21, 2024 23:05
Copy link
Collaborator Author

@reyna-abhyankar reyna-abhyankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @lockshaw)


lib/local-execution/include/local-execution/fwd_bwd_task_impl_function.h line 11 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Usually to allow the thing to also be stored in a std::map or a std::set which require operator<

Ah ok. Done


lib/local-execution/include/local-execution/local_task_argument_accessor.h line 42 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

The reason would be more to wrap the type so the additional utils/fmt/variant.h, utils/fmt/vector.h, etc. includes aren't necessary, and just to avoid the potential confusion/weirdness of a type alias. I'd lean toward dtgen here as the overhead is pretty small and it avoids these issues, but I don't think it's worth holding up the PR over

I'll make an issue: #1468


lib/local-execution/include/local-execution/task_registry.h line 12 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

You'd take it as TaskRegistry & rather than TaskRegistry const & and then you'd be allowed to modify the std::unordered_maps

but when constructing TaskRegistry with dtgen, it requires const maps. Maybe I'm misunderstanding this part.


lib/local-execution/src/fwd_bwd_task_impl_function.cc line 18 at r14 (raw file):

  std::ostringstream oss;
  oss << "<FwdBwdTaskImplFunction";
  oss << " function_ptr=" << (void *)x.function_ptr;

Done. Actually needed to be reinterpret_cast


lib/local-execution/src/init_task_impl_function.cc line 16 at r14 (raw file):

  std::ostringstream oss;
  oss << "<InitTaskImplFunction";
  oss << " function_ptr=" << (void *)x.function_ptr;

Done. Actually needed to be reinterpret_cast


lib/local-execution/src/ops/input.cc line 2 at r14 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why is op_task_invocation.h needed?

Done.


lib/local-execution/src/ops/input.cc line 3 at r14 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why is hash-utils.h needed?

Done.


lib/local-execution/src/ops/noop.cc line 17 at r14 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why is hash-utils.h necessary?

Done.


lib/local-execution/src/ops/weight.cc line 2 at r14 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why is op_task_invocation.h necessary?

Done.


lib/local-execution/src/ops/weight.cc line 3 at r14 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why is hash-utils.h necessary?

Done.


lib/local-execution/test/src/test_local_slots_backing.cc line 77 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Sorry, I meant "bunch of tensors" instead of "bunch of operators"--this just seems to be checking much more than what allocate_tensors does? I'm not sure why we need to check that all incoming and outgoing tensors exist to confirm that allocate_tensors works?

allocate_tensors will allocate all outgoing tensors for a given operator. You're right that we don't have to check all incoming tensors, but seeing as those should be allocated using allocate_tensors (or allocator.allocate_tensor if it's an input tensor) then I think it's ok? Again, the alternative would be to make allocate_tensors more granular OR we only check the output tensors are allocated.

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r15, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @reyna-abhyankar)


lib/local-execution/include/local-execution/task_impl_function.variant.toml line 4 at r13 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

Added hash, for similar discussion of ord see https://reviewable.io/reviews/flexflow/FlexFlow/1418#-O3txKxiEga5ouuIbrgb

Can we also add ord here since FwdBwdTaskImplFunction now supports ord?


lib/local-execution/include/local-execution/task_registry.h line 12 at r11 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

but when constructing TaskRegistry with dtgen, it requires const maps. Maybe I'm misunderstanding this part.

I don't see why it requires const maps? It's totally fine to do TaskRegistry r = TaskRegistry{{}, {}, {}, {}}; r.init_task_ids.insert({...});. dtgen doesn't make the fields const (of course unless you declare a TaskRegistry const r, but then it's const because you explicitly told it to be)


lib/local-execution/src/fwd_bwd_task_impl_function.cc line 18 at r14 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

Done. Actually needed to be reinterpret_cast

Yeah, that's because apparently this operation (and printing a function pointer in general), is really not a well-defined operation C++-wise. It's fine here because this is really just for debugging, but in general if something is requiring reinterpret_cast remember that's essentially just reinterpreting the bytes in memory and so you lose (at least a lot if not all) portability and correctness guarantees


lib/local-execution/src/init_task_impl_function.cc line 16 at r14 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

Done. Actually needed to be reinterpret_cast

See https://reviewable.io/reviews/flexflow/FlexFlow/1418#-O4r72igEzyz6-5OdpSc


lib/local-execution/test/src/test_local_slots_backing.cc line 77 at r11 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

allocate_tensors will allocate all outgoing tensors for a given operator. You're right that we don't have to check all incoming tensors, but seeing as those should be allocated using allocate_tensors (or allocator.allocate_tensor if it's an input tensor) then I think it's ok? Again, the alternative would be to make allocate_tensors more granular OR we only check the output tensors are allocated.

The second option is what I would recommend--I have no issue with some other test checking that all of the input tensors are allocated (this actually probably is something that should be tested), but I think that test is outside the scope of the semantics of allocate_tensors, which is only required to allocate the output tensors (also, in that case it should probably be renamed to allocate_output_tensors?)

Copy link
Collaborator Author

@reyna-abhyankar reyna-abhyankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @lockshaw)


lib/local-execution/include/local-execution/task_impl_function.variant.toml line 4 at r13 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Can we also add ord here since FwdBwdTaskImplFunction now supports ord?

Done


lib/local-execution/include/local-execution/task_registry.h line 12 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I don't see why it requires const maps? It's totally fine to do TaskRegistry r = TaskRegistry{{}, {}, {}, {}}; r.init_task_ids.insert({...});. dtgen doesn't make the fields const (of course unless you declare a TaskRegistry const r, but then it's const because you explicitly told it to be)

That makes complete sense. Changed.


lib/local-execution/test/src/test_local_slots_backing.cc line 77 at r11 (raw file):

Previously, lockshaw (Colin Unger) wrote…

The second option is what I would recommend--I have no issue with some other test checking that all of the input tensors are allocated (this actually probably is something that should be tested), but I think that test is outside the scope of the semantics of allocate_tensors, which is only required to allocate the output tensors (also, in that case it should probably be renamed to allocate_output_tensors?)

Ok, I've fixed this and moved some of the allocation around in the test to reflect only allocating inputs and then separately only allocating outputs.

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 14 of 14 files at r16, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @reyna-abhyankar)


lib/local-execution/include/local-execution/task_registry.struct.toml line 9 at r16 (raw file):

includes = [
  "local-execution/task_signature_impl.h",

This file doesn't seem to have any datatype declarations? Should this be a different file? Did you mean task_signature_impl.dtg.h?


lib/local-execution/include/local-execution/task_registry.struct.toml line 12 at r16 (raw file):

  "local-execution/task_id_t.dtg.h",
  "pcg/layer_guid_t.dtg.h",
]

Suggestion:

features = [
  "eq",
  "hash",
  "fmt",
]

includes = [
  "local-execution/task_signature_impl.h",
  "local-execution/task_id_t.dtg.h",
  "pcg/layer_guid_t.dtg.h",
]

src_includes = [
  "utils/hash/unordered_map.h",
  "utils/fmt/unordered_map.h", 
]

lib/local-execution/test/src/test_task_registry.cc line 99 at r16 (raw file):

    SUBCASE("equality") {
      TaskRegistry other_task_registry = TaskRegistry{{}, {}, {}, {}};

If you want you can also create a helper function like TaskRegistry empty_task_registry() in task_registry.h that just returns a TaskRegistry{{}, {}, {}, {}} to avoid duplicating that code everywhere, but not a big deal either way

Code quote:

     TaskRegistry other_task_registry = TaskRegistry{{}, {}, {}, {}}

Copy link
Collaborator Author

@reyna-abhyankar reyna-abhyankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @lockshaw)


lib/local-execution/include/local-execution/task_registry.struct.toml line 9 at r16 (raw file):

Previously, lockshaw (Colin Unger) wrote…

This file doesn't seem to have any datatype declarations? Should this be a different file? Did you mean task_signature_impl.dtg.h?

Yes, changed.


lib/local-execution/include/local-execution/task_registry.struct.toml line 12 at r16 (raw file):

  "local-execution/task_id_t.dtg.h",
  "pcg/layer_guid_t.dtg.h",
]

Done.


lib/local-execution/test/src/test_task_registry.cc line 99 at r16 (raw file):

Previously, lockshaw (Colin Unger) wrote…

If you want you can also create a helper function like TaskRegistry empty_task_registry() in task_registry.h that just returns a TaskRegistry{{}, {}, {}, {}} to avoid duplicating that code everywhere, but not a big deal either way

Done.

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r17, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @reyna-abhyankar)


lib/local-execution/include/local-execution/task_registry.h line 6 at r17 (raw file):

#include "local-execution/task_registry.dtg.h"
#include "local-execution/task_signature_impl.h"

Why was this added?


lib/local-execution/include/local-execution/task_signature_impl.struct.toml line 15 at r17 (raw file):

src_includes = [
  "utils/hash/unordered_map.h",

Why are these necessary? I don't see any unordered_maps or unordered_sets in the members?

Copy link
Collaborator Author

@reyna-abhyankar reyna-abhyankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @lockshaw)


lib/local-execution/include/local-execution/task_registry.h line 6 at r17 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why was this added?

oops, added to the wrong file. changed (it's in local_training_backing.cc and task_registry.cc now)


lib/local-execution/include/local-execution/task_signature_impl.struct.toml line 15 at r17 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why are these necessary? I don't see any unordered_maps or unordered_sets in the members?

It's needed to be able to hash OpTaskSignature. Removed the fmt includes though.

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r18, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @reyna-abhyankar)


lib/local-execution/include/local-execution/task_signature_impl.struct.toml line 15 at r17 (raw file):

Previously, reyna-abhyankar (Reyna Abhyankar) wrote…

It's needed to be able to hash OpTaskSignature. Removed the fmt includes though.

Ah, because that's still using visitable. Let's put the includes in op_task_signature.h then rather than here

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r19, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @reyna-abhyankar)

Copy link
Collaborator Author

@reyna-abhyankar reyna-abhyankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @reyna-abhyankar)


lib/local-execution/include/local-execution/task_signature_impl.struct.toml line 15 at r17 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Ah, because that's still using visitable. Let's put the includes in op_task_signature.h then rather than here

Done.

@lockshaw lockshaw enabled auto-merge (squash) August 22, 2024 21:51
@lockshaw lockshaw merged commit 5e1f349 into flexflow:repo-refactor Aug 22, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
repo-refactor Topics related to the repo and search refactors rr-runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants