
Commit e1ed220
Update README.md
davebulaval authored Mar 23, 2024
1 parent 16ccf47
Showing 1 changed file with 7 additions and 7 deletions: README.md
…checks. For more details, refer to our publicly available article.

> This public version of our model uses the best model trained (in our article, we present performance results
> averaged over 10 models) for a more extended period (500 epochs instead of 250). We later observed that the
> model can further reduce dev loss and increase performance. We have also replaced the data augmentation technique
> used in the article with a more robust one that also enforces the commutative property of the meaning function,
> namely Meaning(Sent_a, Sent_b) = Meaning(Sent_b, Sent_a).
- [HuggingFace Model Card](https://huggingface.co/davebulaval/MeaningBERT)
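
As a rough illustration of how such a commutativity-aware augmentation can be set up, here is a minimal sketch in
Python. It is not the authors' implementation; the `Pair` type, the function name, and the skipping of identical
pairs are our assumptions:

```python
# Minimal sketch (not the authors' code): augment a scored sentence-pair dataset so
# that every (sent_a, sent_b, score) example also appears as (sent_b, sent_a, score),
# nudging the learned metric toward Meaning(Sent_a, Sent_b) = Meaning(Sent_b, Sent_a).
from typing import List, Tuple

Pair = Tuple[str, str, float]  # (sentence_a, sentence_b, meaning-preservation score)

def augment_with_swapped_pairs(pairs: List[Pair]) -> List[Pair]:
    augmented = list(pairs)
    for sent_a, sent_b, score in pairs:
        if sent_a != sent_b:  # swapping an identical pair adds no new information
            augmented.append((sent_b, sent_a, score))
    return augmented
```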

## Sanity Check

Correlation to human judgment is one way to evaluate the quality of a meaning preservation metric.
However, it is inherently subjective, since it uses human judgment as a gold standard, and expensive, since it
requires a large dataset annotated by several humans. As an alternative, we designed two automated tests: evaluating
meaning preservation between identical sentences (which should be 100% preserving) and between unrelated sentences
(which should be 0% preserving). In these tests, the meaning preservation target value is not subjective and does
not require human annotation to be measured. They represent a trivial and minimal threshold a good automatic meaning
preservation metric should be able to achieve. Namely, a metric should minimally be able to return a perfect score
(i.e., 100%) if two identical sentences are compared and return a null score (i.e., 0%) if two sentences are
completely unrelated.
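
Both checks can be run directly against the released checkpoint. The following is a minimal sketch, assuming the
HuggingFace checkpoint loads as a sequence-classification (regression) model whose single logit is a
meaning-preservation score on a 0-100 scale; see the model card for the exact interface. The example sentences
are ours:

```python
# Minimal sanity-check sketch: score an identical pair and an unrelated pair.
# Assumption: the checkpoint is a single-logit regression model on a 0-100 scale.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("davebulaval/MeaningBERT")
model = AutoModelForSequenceClassification.from_pretrained("davebulaval/MeaningBERT")
model.eval()

source = "The cat sat quietly on the warm windowsill."
identical = "The cat sat quietly on the warm windowsill."
unrelated = "Stock markets closed higher on Friday after the jobs report."

with torch.no_grad():
    inputs = tokenizer([source, source], [identical, unrelated],
                       return_tensors="pt", padding=True, truncation=True)
    scores = model(**inputs).logits.squeeze(-1)

print(f"identical pair: {scores[0].item():.1f}")  # expected near 100
print(f"unrelated pair: {scores[1].item():.1f}")  # expected near 0
```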

### Identical Sentences

The first test evaluates meaning preservation between identical sentences. To analyze the metrics' capabilities to
pass this test, we count the number of times a metric rating was greater than or equal to a threshold value
X∈[95, 99] and divide it by the number of sentences to create a ratio of the number of times the metric gives the
expected rating. To account for computer floating-point inaccuracy, we round the ratings to the nearest integer and
do not use a threshold value of 100%.
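
In code, the ratio described above amounts to the following (a minimal sketch; the example scores are hypothetical):

```python
# Sketch of the identical-sentence check: round each rating to the nearest integer,
# then report the fraction of sentences whose rating reaches the threshold X.
# A threshold of 100% is deliberately avoided because of floating-point inaccuracy.

def pass_ratio(ratings: list[float], threshold: int) -> float:
    rounded = [round(r) for r in ratings]
    return sum(r >= threshold for r in rounded) / len(rounded)

identical_ratings = [99.6, 100.0, 97.2, 94.4]  # hypothetical metric outputs
for x in range(95, 100):                       # X ∈ [95, 99]
    print(f"X={x}: {pass_ratio(identical_ratings, x):.2f}")
```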

### Unrelated Sentences

Our second test evaluates meaning preservation between a source sentence and an unrelated sentence generated by a
large language model. The idea is to verify that the metric finds a meaning preservation rating of 0 when given a
completely unrelated sentence.
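
This check can mirror the same ratio logic at the low end of the scale. In the sketch below, the near-zero
threshold and the scores are our assumptions for illustration, not values from the article:

```python
# Sketch of the unrelated-sentence check: a good metric should rate unrelated pairs
# near 0, so we count ratings at or below a small threshold after rounding.

def near_zero_ratio(ratings: list[float], threshold: int) -> float:
    rounded = [round(r) for r in ratings]
    return sum(r <= threshold for r in rounded) / len(rounded)

unrelated_ratings = [0.3, 1.8, 0.0, 4.9]      # hypothetical metric outputs
print(near_zero_ratio(unrelated_ratings, 5))  # fraction rated as expected (<= 5)
```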