The `encode_labels()` function in `senteval/sick.py` implements the label encoding from this article, which is meant for labels in [1, ..., K] (see section 4.2 of the paper). The `STSBenchmarkEval` class inherits from `SICKRelatednessEval`, so it also inherits its `encode_labels()` function. However, the STSBenchmark task has labels from 0 to 5.
Thus, by construction, a model trained this way can never correctly predict examples whose label is in [0, 1).
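To make the failure concrete, here is a minimal Python re-implementation of the encoding described in section 4.2 of the paper. It is a sketch, not SentEval's actual code: `encode_labels_sketch` is a hypothetical name, and it assumes SentEval's `encode_labels()` follows the paper's scheme with 1-based classes.

```python
import math


def encode_labels_sketch(y, num_classes=5):
    """Sparse target distribution p over classes 1..num_classes for a score y.

    Scheme from section 4.2 of the paper:
      p_i = y - floor(y)      if i == floor(y) + 1
      p_i = floor(y) - y + 1  if i == floor(y)
      p_i = 0                 otherwise
    Class indices are 1-based, so the scheme assumes 1 <= y <= num_classes.
    """
    lo = math.floor(y)
    p = [0.0] * num_classes
    for i in range(1, num_classes + 1):  # 1-based class index
        if i == lo + 1:
            p[i - 1] = y - lo
        if i == lo:
            p[i - 1] = lo - y + 1
    return p


# Inside the intended range the target is a proper distribution.
print(encode_labels_sketch(3.5))  # [0.0, 0.0, 0.5, 0.5, 0.0]
# Below 1, the mass that belongs to the non-existent class 0 is silently dropped.
print(encode_labels_sketch(0.5))  # [0.5, 0.0, 0.0, 0.0, 0.0]
print(encode_labels_sketch(0.0))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

A gold score of 0.5 yields a target that sums to 0.5, and a gold score of 0 yields an all-zero target, so the model is never shown a distribution whose expected value lies below 1.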
This is easy to verify: run `examples/bow.py` on the STSBenchmark task and print `min(results['STSBenchmark']['yhat'])`. The value is always greater than 1.
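Why the minimum can never go below 1: the paper decodes the predicted score as the expected class value `yhat = r . p` with `r = [1, 2, ..., K]`. The sketch below assumes SentEval decodes the same way; random probability vectors stand in for a trained model's softmax output, and `decode_score` / `random_distribution` are hypothetical helper names.

```python
import random


def decode_score(probs):
    """Expected score r . p with r = [1, 2, ..., K] (1-based class values)."""
    return sum((i + 1) * p for i, p in enumerate(probs))


def random_distribution(k, rng):
    """A random probability vector over k classes (stand-in for a softmax output)."""
    w = [rng.random() for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]


rng = random.Random(0)
preds = [decode_score(random_distribution(5, rng)) for _ in range(10_000)]

# A convex combination of the values 1..5 always lies in [1, 5],
# so gold scores in [0, 1) are unreachable.
print(min(preds) >= 1.0, max(preds) <= 5.0)  # True True
# Even putting all the mass on the lowest class predicts 1, not 0.
print(decode_score([1.0, 0.0, 0.0, 0.0, 0.0]))  # 1.0
```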
An easy fix could be to shift the original labels in `senteval/sts.py`: `sick_data['y'] = [float(s)+1 for s in sick_data['y']]`
and then adjust the ranges in the rest of the code. However, that would probably break the code for the SICK task, which is currently correct.
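A rough sketch of what "adjusting the ranges" could mean, kept separate from the SICK path so that task is untouched: shift STS-B scores from [0, 5] to [1, 6], encode with 6 classes, decode with `r = [1, ..., 6]`, and subtract the shift again. All names here (`SHIFT`, `NUM_CLASSES`, `encode_sts`, `decode_sts`, `encode_labels_sketch`) are hypothetical, and the encoder is a re-implementation of the paper's scheme, not SentEval's code.

```python
import math

SHIFT = 1.0       # hypothetical: STS-B gold scores lie in [0, 5]
NUM_CLASSES = 6   # shifted scores lie in [1, 6]


def encode_labels_sketch(y, num_classes):
    """Paper-style sparse target over 1-based classes; assumes 1 <= y <= num_classes."""
    lo = math.floor(y)
    p = [0.0] * num_classes
    for i in range(1, num_classes + 1):
        if i == lo + 1:
            p[i - 1] = y - lo
        if i == lo:
            p[i - 1] = lo - y + 1
    return p


def encode_sts(score):
    """Encode an STS-B score after shifting it into the 1-based range."""
    return encode_labels_sketch(score + SHIFT, NUM_CLASSES)


def decode_sts(probs):
    """Expected value with r = [1, ..., 6], shifted back to the [0, 5] scale."""
    expected = sum((i + 1) * p for i, p in enumerate(probs))
    return expected - SHIFT


for gold in [0.0, 0.5, 2.75, 5.0]:
    target = encode_sts(gold)
    assert abs(sum(target) - 1.0) < 1e-9           # no probability mass is lost
    assert abs(decode_sts(target) - gold) < 1e-9   # round trip recovers the gold score
print("round trip ok")
```

Pearson and Spearman correlations are unchanged by a constant shift of the predictions, so subtracting `SHIFT` only matters for error metrics such as MSE.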