Remove AdapterSpec from metrics #2244
base: main
Conversation
Force-pushed from ba53f57 to e23220b.
Converting to draft because this requires some manual testing.
This PR is also removing ScenarioState from metrics. Consider updating the name/description to mention that.
reference_stats: Dict[ReferenceKey, ReferenceStat] = {}
for request_state in reference_request_states:
    assert request_state.reference_index is not None and request_state.request_mode is not None
    reference_key = ReferenceKey(request_state.reference_index, request_state.request_mode)
    reference_stats[reference_key] = compute_logprob_and_length(request_state, window_service)

if adapter_spec.method in [ADAPT_MULTIPLE_CHOICE_SEPARATE_ORIGINAL, ADAPT_RANKING_BINARY]:
    is_calibrated = any([request_state.request_mode == "calibration" for request_state in reference_request_states])
Why "any" here but using reference_request_states[0] to decide model_deployment?
If we are asserting in both cases that they are universal values, maybe we should write a helper to do that assertion?
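A hedged sketch of what such a helper could look like; assert_uniform is a hypothetical name, not from this PR, and the import path and model_deployment accessor are assumptions:

from typing import Callable, List, TypeVar

from helm.benchmark.adaptation.request_state import RequestState  # import path assumed

T = TypeVar("T")


def assert_uniform(request_states: List[RequestState], get_value: Callable[[RequestState], T]) -> T:
    # Assert that every request state agrees on this value, then return it.
    values = {get_value(request_state) for request_state in request_states}
    assert len(values) == 1, f"Expected a single value across request states, got: {values}"
    return values.pop()


# Hypothetical usage for the model deployment case (field name assumed):
# model_deployment = assert_uniform(reference_request_states, lambda rs: rs.request.model_deployment)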
@@ -294,20 +280,14 @@ def compute_request_state_metrics(
    stats: List[Stat] = []

    stats.append(Stat(MetricName("num_references")).add(len(request_state.instance.references)))

    # Copy from adapter spec
    stats.append(Stat(MetricName("num_train_trials")).add(adapter_spec.num_train_trials))
Is this Stat not needed?
for context, request_states in grouped_request_states.items():
    for stat in self.evaluate_instances(request_states):

for request_state in trial_request_states:
    grouped_request_states[MetricContext.from_instance(request_state.instance)].append(request_state)
This is a potential behavior change, since it can now include request_states that have a non-None reference_index.
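If only generation states were intended here, a minimal hedged sketch of preserving the old behavior by filtering explicitly:

for request_state in trial_request_states:
    # Skip reference request states to preserve the previous behavior
    # (assumes only generation states, with reference_index None, were grouped before).
    if request_state.reference_index is not None:
        continue
    grouped_request_states[MetricContext.from_instance(request_state.instance)].append(request_state)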
if request_state.reference_index is None:
    instance_to_request_state_set[instance].generation_states.append(request_state)
else:
    instance_to_request_state_set[instance].references_states.append(request_state)
Previously the reference_states were ordered by reference_index. Is that still guaranteed? Does it matter if the order changes?
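If the old ordering matters, one hedged way to restore it after grouping, assuming reference_index is always set on reference states (per the assert earlier in this diff):

# Sort each instance's reference states so downstream metrics see them
# in the original reference_index order.
for request_state_set in instance_to_request_state_set.values():
    request_state_set.references_states.sort(key=lambda rs: rs.reference_index or 0)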
@@ -166,7 +149,7 @@ def evaluate(

    # Compute per-instance stats
    per_instance_stats: List[PerInstanceStats] = []
-   for instance, stats in zip(scenario_state.instances, results):
+   for instance, stats in zip(instances, results):
I think switching this to zip(request_state_sets, results) would make it less fragile and clearer that we are putting the input and output of the parallel map back together.
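A rough sketch of the suggestion (hedged; it assumes results was produced by mapping over the dict's values in the same order):

per_instance_stats: List[PerInstanceStats] = []
# Zip the parallel map's inputs with its outputs so the pairing is explicit,
# rather than relying on a separate instances list staying in sync.
for (instance, request_state_set), stats in zip(instance_to_request_state_set.items(), results):
    ...  # build PerInstanceStats for this instance from stats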
@@ -352,3 +333,19 @@ def add_context(stat: Stat, context: MetricContext) -> Stat:
    return Stat(
        replace(stat.name, split=context.split, sub_split=context.sub_split, perturbation=context.perturbation)
    ).merge(stat)


def get_num_train_trials(request_states: List[RequestState]) -> int:
There appears to be no method calling this. Is it left over from a previous iteration?
    instance_to_request_state_set[instance].generation_states.append(request_state)
else:
    instance_to_request_state_set[instance].references_states.append(request_state)
request_state_sets: List[RequestStateSet] = list(instance_to_request_state_set.values())
Order here can also change. Maybe we want an OrderedDict?
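One note: since Python 3.7, plain dicts (including defaultdict) preserve insertion order, so an OrderedDict would not change behavior here. A hedged alternative that makes the intended order explicit, assuming every instance has an entry in the mapping:

request_state_sets: List[RequestStateSet] = [instance_to_request_state_set[instance] for instance in instances]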
This removes the coupling between the adapter and the metrics, allowing the metrics to be computed only using the requests and results from the model clients.
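A minimal sketch of the decoupled shape this enables (signature hypothetical, modeled on the evaluate_instances call shown above):

class Metric:
    def evaluate_instances(self, request_states: List[RequestState]) -> List[Stat]:
        # Compute stats purely from the requests and their completed results;
        # no AdapterSpec or ScenarioState is required.
        raise NotImplementedError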