Hey everyone, I'm pretty new to transformers and had a couple of questions about the microGPT example:
Is there any reason why an encoder-type block was used in the example?
I am trying to build a small proof of concept using BERT and wanted to see if anyone has any pointers. So far I have only modified the training_step function to take an attention_mask into account and to suppress the logits corresponding to non-masked tokens, so the loss is computed only at the masked positions. I reckon there are a few other things that need to change, but I'm not sure how to go about them.
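Something along these lines, as a rough sketch of the loss part (not my exact training_step; the function name and tensor shapes are just placeholders):

```python
import torch.nn.functional as F

def masked_lm_loss(logits, targets, mask):
    # logits:  (B, 64, vocab_size) predictions, one per square
    # targets: (B, 64) token ids taken from the *next* board state
    # mask:    (B, 64) bool, True on the squares that were masked out
    logits = logits.reshape(-1, logits.size(-1))
    targets = targets.reshape(-1)
    mask = mask.reshape(-1)
    # Only the masked squares contribute to the loss; everything else
    # is dropped before cross-entropy is computed.
    return F.cross_entropy(logits[mask], targets[mask])
```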
A little more context:
I wanted to build a chess engine for fun, and I actually got some pretty cool results from just the microGPT example.
Here is what I wanted to see if I could use BERT for:
I am unpacking each game into a series of board states. The result is a 64-token "sentence" where each token is a piece (one of rbnkqpRBNKQP, with E for empty). I then mask the piece that was moved on each turn, along with all of the squares it could move to on that turn. So it is not random masking, but masking with some context of the moves each piece can make, with a different mask for each type of piece. I then use the next board state as the target when calculating the loss. My hope is that this will embed some sense of how each piece is allowed to move during the learning process. Grateful for any pointers!
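Roughly, the data prep looks like this (a sketch rather than my exact code; I'm leaning on python-chess for move generation here, and the helper names are just placeholders; in my actual setup the mask is a precomputed pattern per piece type):

```python
import chess  # python-chess, assumed available for board/move handling

PIECES = "rbnkqpRBNKQP"
VOCAB = {c: i for i, c in enumerate(PIECES + "E")}  # 12 pieces + empty
MASK_ID = len(VOCAB)                                # extra [MASK] id

def board_tokens(board):
    """64-token 'sentence': one id per square, 'E' for empty squares."""
    toks = []
    for sq in chess.SQUARES:
        piece = board.piece_at(sq)
        toks.append(VOCAB[piece.symbol() if piece else "E"])
    return toks

def make_example(board, move):
    """Input = current board with the moved piece and its reachable
    squares masked; target = the board after the move is played."""
    inp = board_tokens(board)
    # Mask the origin square plus every square this piece can reach.
    masked = {move.from_square}
    for m in board.legal_moves:
        if m.from_square == move.from_square:
            masked.add(m.to_square)
    for sq in masked:
        inp[sq] = MASK_ID
    board.push(move)
    tgt = board_tokens(board)  # next board state as the target
    board.pop()
    mask = [i in masked for i in range(64)]
    return inp, tgt, mask
```

Each example would then be scored with the masked loss above, so only the masked squares are ever graded.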
https://colab.research.google.com/drive/17A1hi_1vaa3GElKQ-t3tUJYRKhIAmQaz#scrollTo=pGUpCVfPeYoQ
Thanks!