Dear Ivan,
Thank you for all the work you have done; it is great. While using your IMDBDataset code, however, I found a few strange things. First, the array of indices and the array of inverse token mask values are misaligned from the beginning of the sequence because of the CLS token (see the reference below). Second, I was surprised to find a CLS token in the second sentence as well; I may be wrong, but the original BERT uses only one CLS token, at the beginning of the sequence. Finally, calculating the length of the vocabulary each time could be very expensive. I hope these notes help you make your code better and clearer. I would be glad to take part and help you fix this, if you give me the opportunity.
I am looking forward to your reply,
Nesemenpolkov.
Reference:
pytorch_bert/bert/dataset.py
Line 145 in 2cd4724
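The three points above can be illustrated with a minimal sketch. It is not the code in pytorch_bert/bert/dataset.py: the class `PairBuilder`, the special-token ids, and the convention that the inverse mask is `False` at masked positions are all assumptions made for illustration. It builds a single `[CLS] first [SEP] second [SEP]` sequence, creates the mask after the special tokens are added so both arrays share the same positions, and computes the vocabulary length once in the constructor. The masking rule (80% MASK, otherwise a random token) is a simplification of BERT's 80/10/10 scheme.

```python
import random

# Hypothetical special-token ids; the real values depend on the vocabulary.
PAD, CLS, SEP, MASK = 0, 1, 2, 3
NUM_SPECIAL = 4  # assumes ids 0..3 are reserved for special tokens


class PairBuilder:
    """Hypothetical helper that builds masked sentence pairs."""

    def __init__(self, vocab, mask_prob=0.15, seed=None):
        self.vocab = vocab
        self.vocab_size = len(vocab)  # computed once, not on every item
        self.mask_prob = mask_prob
        self.rng = random.Random(seed)

    def build(self, first, second):
        # One CLS at the very start, one SEP after each sentence.
        indices = [CLS] + list(first) + [SEP] + list(second) + [SEP]
        # The mask is created after the special tokens are in place,
        # so inverse_mask[i] always describes indices[i].
        inverse_mask = [True] * len(indices)
        for i, tok in enumerate(indices):
            if tok < NUM_SPECIAL:
                continue  # never mask CLS, SEP, or PAD
            if self.rng.random() < self.mask_prob:
                if self.rng.random() < 0.8:
                    indices[i] = MASK
                else:
                    # Random replacement drawn from the non-special ids.
                    indices[i] = self.rng.randrange(NUM_SPECIAL, self.vocab_size)
                inverse_mask[i] = False  # False marks a position to predict
        return indices, inverse_mask


# Usage with a toy vocabulary (token ids 4..7 are ordinary words).
vocab = ["[PAD]", "[CLS]", "[SEP]", "[MASK]", "good", "bad", "movie", "plot"]
builder = PairBuilder(vocab, seed=0)
indices, inverse_mask = builder.build([4, 6], [5, 7])
print(indices, inverse_mask)
```

Because the mask is built from the final sequence rather than from each sentence separately, the two arrays cannot drift apart when special tokens are added.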