Hey everyone, I'm pretty new to transformers and had a couple of questions about the microGPT example:
Is there any reason why an encoder-type block was used in the example?
I am trying to build a small proof of concept using BERT and wanted to see if anyone has any pointers. So far I have only modified the training_step function to take an attention_mask into account and to suppress the logits corresponding to non-masked tokens, so the loss is computed only at the masked positions. I reckon there are a few other things that need to change, but I'm not sure how to go about them.
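Something along these lines, as a rough sketch of the loss part (not my exact training_step; the function name and tensor shapes are just placeholders):

```python
import torch.nn.functional as F

def masked_lm_loss(logits, targets, mask):
    # logits:  (B, 64, vocab_size) predictions, one per square
    # targets: (B, 64) token ids taken from the *next* board state
    # mask:    (B, 64) bool, True on the squares that were masked out
    logits = logits.reshape(-1, logits.size(-1))
    targets = targets.reshape(-1)
    mask = mask.reshape(-1)
    # Only the masked squares contribute to the loss; everything else
    # is dropped before cross-entropy is computed.
    return F.cross_entropy(logits[mask], targets[mask])
```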
A little more context:
I wanted to build a chess engine for fun, and I actually got some pretty cool results from just the microGPT example.
Here is what I wanted to see if I could use BERT for:
I am unpacking each game into a series of board states. The result is a 64-token "sentence" where each token is a piece (one of rbnkqpRBNKQP, with E for empty). I then mask the piece that was moved on each turn, along with all of the squares it could move to on that turn. So it is not random masking, but masking with some context of the moves each piece can make, with a different mask for each type of piece. I then use the next board state as the target when calculating the loss. My hope is that this will embed some sense of how each piece is allowed to move during the learning process. Grateful for any pointers!
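Roughly, the data prep looks like this (a sketch rather than my exact code; I'm leaning on python-chess for move generation here, and the helper names are just placeholders; in my actual setup the mask is a precomputed pattern per piece type):

```python
import chess  # python-chess, assumed available for board/move handling

PIECES = "rbnkqpRBNKQP"
VOCAB = {c: i for i, c in enumerate(PIECES + "E")}  # 12 pieces + empty
MASK_ID = len(VOCAB)                                # extra [MASK] id

def board_tokens(board):
    """64-token 'sentence': one id per square, 'E' for empty squares."""
    toks = []
    for sq in chess.SQUARES:
        piece = board.piece_at(sq)
        toks.append(VOCAB[piece.symbol() if piece else "E"])
    return toks

def make_example(board, move):
    """Input = current board with the moved piece and its reachable
    squares masked; target = the board after the move is played."""
    inp = board_tokens(board)
    # Mask the origin square plus every square this piece can reach.
    masked = {move.from_square}
    for m in board.legal_moves:
        if m.from_square == move.from_square:
            masked.add(m.to_square)
    for sq in masked:
        inp[sq] = MASK_ID
    board.push(move)
    tgt = board_tokens(board)  # next board state as the target
    board.pop()
    mask = [i in masked for i in range(64)]
    return inp, tgt, mask
```

Each example would then be scored with the masked loss above, so only the masked squares are ever graded.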
https://colab.research.google.com/drive/17A1hi_1vaa3GElKQ-t3tUJYRKhIAmQaz#scrollTo=pGUpCVfPeYoQ
Thanks!