Add xfail for conformance test (#2890)
### Changes

<!--- What was changed (briefly), how to reproduce (if applicable), what
the reviewers should focus on -->

### Reason for changes

<!--- Why should the change be applied -->

### Related tickets

<!--- Post the numerical ID of the ticket, if available -->

### Tests

<!--- How was the correctness of changes tested and whether new tests
were added -->
AlexanderDokuchaev authored Aug 19, 2024
1 parent 60fe68a commit bdf8d27
Showing 4 changed files with 34 additions and 3 deletions.
23 changes: 23 additions & 0 deletions tests/post_training/README.md
@@ -122,3 +122,26 @@ Run test with calibration dataset having batch-size=10 for all models:
```bash
pytest --data=<path_to_datasets> --batch-size 10 tests/post_training/test_quantize_conformance.py
```

## Reference data

This section describes the expected format of the reference values used during parallel testing.

```yml
<Name from model scopes>_backend_<BACKEND>:
  metric_value: <expected value>
```
> [!IMPORTANT]
> The reference file is used for parallel testing.
> The path to the `*_reference_data.yaml` files is used during testing and should not be changed without updating the Jenkins scripts.
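
As an editorial aside, a minimal sketch of reading one entry from such a file. It assumes PyYAML is installed and the script runs from the repository root; the file path, model key, and values come from the `wc_reference_data.yaml` diff in this commit, and the 0.001 fallback mirrors the default `atol` in `pipelines/base.py`. This is not how the test pipeline itself loads the file.

```python
# Illustrative only: read one model entry from a reference data file.
import yaml

with open("tests/post_training/data/wc_reference_data.yaml") as f:
    reference_data = yaml.safe_load(f)

entry = reference_data["tinyllama_data_aware_gptq_backend_OV"]
print(entry["metric_value"])              # expected metric value, e.g. 0.87134
print(entry.get("atol", 0.001))           # tolerance, falling back to the default
print(entry.get("metrics_xfail_reason"))  # xfail reason, if the test is expected to fail
```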
### Marking tests as xfail
To mark a test as expected to fail (xfail) when a validation metric does not meet expectations, add the following line to the reference data:
```yml
<Name from model scopes>_backend_<BACKEND>:
  ...
  metrics_xfail_reason: "Issue-<jira ticket number>"
```
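
A rough sketch of the effect of this key (the actual implementation is in the `pipelines/base.py` and `test_quantize_conformance.py` changes below): when the metric check fails and `metrics_xfail_reason` is present, the test is reported as an expected failure via `pytest.xfail` instead of failing hard. The function and variable names here are illustrative, not the pipeline's real API.

```python
# Illustrative sketch: turn a failed metric check into an expected failure
# when an xfail reason is present in the reference data entry.
import numpy as np
import pytest

def check_metric(metric_value: float, reference_entry: dict) -> None:
    reference = reference_entry["metric_value"]
    atol = reference_entry.get("atol", 0.001)
    if np.isclose(metric_value, reference, atol=atol):
        return  # within tolerance, nothing to report
    status_msg = f"Metric value differs from reference: {metric_value} vs {reference}"
    if "metrics_xfail_reason" in reference_entry:
        pytest.xfail(f"XFAIL: {reference_entry['metrics_xfail_reason']} - {status_msg}")
    pytest.fail(status_msg)
```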
4 changes: 2 additions & 2 deletions tests/post_training/data/wc_reference_data.yaml
@@ -26,9 +26,9 @@ tinyllama_data_aware_gptq_backend_OV:
   metric_value: 0.87134
   num_int4: 94
   num_int8: 124
-  atol: 0.0004 # issue 148819
+  metrics_xfail_reason: "Issue-148819"
 tinyllama_scale_estimation_per_channel_backend_OV:
   metric_value: 0.81389
   num_int4: 188
   num_int8: 124
-  atol: 0.006 # issue 148819
+  metrics_xfail_reason: "Issue-148819"
8 changes: 7 additions & 1 deletion tests/post_training/pipelines/base.py
@@ -36,6 +36,7 @@
 from tools.memory_monitor import memory_monitor_context
 
 DEFAULT_VAL_THREADS = 4
+METRICS_XFAIL_REASON = "metrics_xfail_reason"
 
 
 class BackendType(Enum):
@@ -307,16 +308,21 @@ def validate(self) -> None:
         if metric_value is not None and metric_value_fp32 is not None:
             self.run_info.metric_diff = round(self.run_info.metric_value - self.reference_data["metric_value_fp32"], 5)
 
+        status_msg = None
         if (
             metric_value is not None
             and metric_reference is not None
             and not np.isclose(metric_value, metric_reference, atol=self.reference_data.get("atol", 0.001))
         ):
             if metric_value < metric_reference:
                 status_msg = f"Regression: Metric value is less than reference {metric_value} < {metric_reference}"
-                raise ValueError(status_msg)
             if metric_value > metric_reference:
                 status_msg = f"Improvement: Metric value is better than reference {metric_value} > {metric_reference}"
 
+        if status_msg is not None:
+            if METRICS_XFAIL_REASON in self.reference_data:
+                self.run_info.status = f"XFAIL: {self.reference_data[METRICS_XFAIL_REASON]} - {status_msg}"
+            else:
+                raise ValueError(status_msg)
 
     def run(self) -> None:
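A worked illustration of the tolerance check above: `np.isclose(metric_value, metric_reference, atol=...)` drives the decision, with 0.001 as the default `atol`. The reference value 0.87134 is taken from the reference data in this commit; the two probe values are made up for the example.

```python
# With the default atol of 0.001, a small drift passes; a larger drop is flagged.
import numpy as np

reference = 0.87134  # metric_value for tinyllama_data_aware_gptq_backend_OV
print(np.isclose(0.87100, reference, atol=0.001))  # True: difference 0.00034 is within tolerance
print(np.isclose(0.86900, reference, atol=0.001))  # False: difference 0.00234 exceeds the tolerance
```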
2 changes: 2 additions & 0 deletions tests/post_training/test_quantize_conformance.py
@@ -366,3 +366,5 @@ def test_weight_compression(
 
     if err_msg:
         pytest.fail(err_msg)
+    if run_info.status is not None and run_info.status.startswith("XFAIL:"):
+        pytest.xfail(run_info.status)
