Dear Authors,

Sorry to bother you again. I have a question regarding the performance evaluation on the libero_object dataset. I ran LIBERO with the following command and conducted two evaluations with slightly different configurations.

First Evaluation Configuration:

Results of the 10 tasks:

Second Evaluation Configuration:

Results:

I'm confused about two points:
1. Despite the small differences in the evaluation settings, the results show significant discrepancies. For instance, the evaluation loss for the first task is -6.28 in the first run but 52.18 in the second run. Why is there such a large gap?
2. I expected the success rate on earlier tasks to decrease as new tasks are learned in the lifelong run. However, in some cases the success rate on earlier tasks remains high, or is even higher than on the latest tasks. Could this be because the number of evaluation episodes per task (25) is not enough? Could you provide guidance on why this might happen?
Thank you again for your attention and help. I look forward to your reply!
Best regards,
Pengzhi
Hi @pengzhi1998, interesting findings. Here are my thoughts:
1. Different loss across the two experiments: the loss is the negative log-probability under the GMM head, and the variance of the GMM strongly influences that log-probability. My guess is that the policies learned in the two experiments ended up with different final means and variances, which is why the losses differ so much.
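For intuition, here is a minimal sketch of that effect using plain PyTorch distributions (not the actual LIBERO GMM-head code; the shapes, mode count, and std values are made up for illustration). The same means evaluated on the same targets give a strongly negative NLL when the learned variance is small and a large positive NLL when it is large, which is exactly the kind of sign flip between the two runs:

```python
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

def gmm_nll(means, stds, logits, target):
    """Negative log-likelihood of actions under a diagonal-Gaussian mixture."""
    mixture = Categorical(logits=logits)              # (batch, num_modes)
    components = Independent(Normal(means, stds), 1)  # event dim = action_dim
    gmm = MixtureSameFamily(mixture, components)
    return -gmm.log_prob(target).mean()

batch, num_modes, action_dim = 32, 5, 7
means = torch.zeros(batch, num_modes, action_dim)
logits = torch.zeros(batch, num_modes)                # uniform mixture weights
target = 0.005 * torch.randn(batch, action_dim)       # actions very close to the predicted mean

# Same means, same targets -- only the learned std differs:
sharp = gmm_nll(means, torch.full((batch, num_modes, action_dim), 0.01), logits, target)
broad = gmm_nll(means, torch.full((batch, num_modes, action_dim), 2.00), logits, target)
print(sharp)  # roughly -25: a confident (low-variance) head gives a negative NLL
print(broad)  # roughly +11: a broad (high-variance) head gives a large positive NLL
```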
2. In lifelong learning, yes, the results on previously learned tasks are generally worse than on the latest task; you can see that the loss of the latest task is always the lowest among the 10 learned tasks. However, for decision-making tasks the success rate is NOT always proportional to the loss, because decision making is a sequential process: easier tasks may keep high success rates even with a higher loss. Besides, the task order may also have some influence. This mismatch between loss and success rate is one of the reasons we built the LIBERO benchmark, so the community can study it.
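As a toy illustration of the sequential-process point (a back-of-the-envelope model of my own, not something from the paper or codebase): if an episode fails at the first unrecoverable mistake and mistakes occur independently with probability eps per step, the success rate behaves roughly like (1 - eps)^T over a horizon of T steps, so small per-step differences (roughly what the loss reflects) map very nonlinearly onto success rates, and the mapping depends strongly on how long and how forgiving each task is:

```python
# Toy model (illustrative only): an episode succeeds iff the policy never makes
# an unrecoverable mistake; mistakes happen independently with prob. eps per step.
def episode_success_rate(per_step_error, horizon):
    return (1.0 - per_step_error) ** horizon

for eps in (0.005, 0.01, 0.02):
    for horizon in (100, 300):
        rate = episode_success_rate(eps, horizon)
        print(f"per-step error {eps:.3f}, horizon {horizon:3d} -> success ~ {rate:.2f}")

# Doubling the per-step error (0.01 -> 0.02) roughly squares the success
# probability over 100 steps (~0.37 -> ~0.13), and longer horizons amplify
# the gap even further.
```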
Thank you, Chongkai, for your reply and clear explanation! @HeegerGao
This is very clear to me. Thanks a lot!
I understand that success rates don't have a strong relationship with losses since these are sequential decision-making tasks. However, the most confusing observation to me is that, in both runs, many of the older tasks seem to have higher success rates than the newer ones. Additionally, the success rates between the two runs differ quite a bit.
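On the 25-episode point, here is a quick back-of-the-envelope check (my own sketch; it treats each evaluation episode as an independent Bernoulli trial, and the only number taken from the runs is the 25 episodes per task):

```python
import math

def success_rate_stderr(p, n):
    """Standard error of an empirical success rate over n independent episodes."""
    return math.sqrt(p * (1.0 - p) / n)

for p in (0.5, 0.8, 0.95):
    se = success_rate_stderr(p, 25)
    print(f"true p = {p:.2f}, n = 25 episodes -> std. error ~ {se:.3f} "
          f"(~95% interval +/- {1.96 * se:.2f})")

# At p = 0.5 the standard error is 0.10, so two independent 25-episode runs can
# easily report success rates that differ by 20 percentage points from
# evaluation noise alone, before any real difference between the policies.
```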