Understand the attention design #124

Open

HelloWorldLTY opened this issue Sep 17, 2024 · 0 comments
Hi, thanks for your great work. I intend to compute the attention scores between tokens, and here is my code:

import torch
from transformers import BertModel, BertConfig, DNATokenizer

dir_to_pretrained_model = "./6-new-12w-0/"

config = BertConfig.from_pretrained('../src/transformers/dnabert-config/bert-config-6/config.json')
tokenizer = DNATokenizer.from_pretrained('dna6')
print(config)

model = BertModel.from_pretrained(dir_to_pretrained_model, config=config).cuda()

sequence = "AATCTAATCTAGTCTAGCCTAGCA"
model_input = tokenizer.encode_plus(sequence, add_special_tokens=True, max_length=512)["input_ids"]

model_input = torch.tensor(model_input, dtype=torch.long).cuda()
model_input = model_input.unsqueeze(0)   # to generate a fake batch with batch size one

output = model(model_input)

print(output[-1][-1])        # last item of output[-1], which I take to be the last layer's attention
print(output[-1][-1].shape)  # comes out as [1, 12, 3, 3]
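
For reference, here is how the tokenized input can be inspected (a minimal sketch; I assume DNATokenizer inherits the usual convert_ids_to_tokens from the standard tokenizer classes):

# Hypothetical check of what the tokenizer actually produced, to relate
# the attention matrix size to the number of tokens in the input.
token_ids = tokenizer.encode_plus(sequence, add_special_tokens=True, max_length=512)["input_ids"]
print(token_ids)                                   # raw vocabulary ids
print(tokenizer.convert_ids_to_tokens(token_ids))  # the tokens the model actually sees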

I think output[-1] contains the attention matrices, and I took out the last item, whose shape is [1, 12, 3, 3]. Does the 12 refer to the 12 attention heads? And do the 3, 3 dimensions represent the two sets of tokens attending to each other? Could you let me know how to compute the correct attention between these tokens? Should I just average all the attention in each layer? Thanks.
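
For concreteness, this is the kind of averaging I had in mind (a minimal sketch only, assuming output[-1] is a tuple with one attention tensor per layer, each shaped [batch, num_heads, seq_len, seq_len]):

# Sketch: average attention over heads within each layer, then over layers.
attentions = output[-1]                   # tuple: one tensor per layer (if these are indeed the attentions)
stacked = torch.stack(attentions, dim=0)  # [num_layers, batch, num_heads, seq_len, seq_len]
head_avg = stacked.mean(dim=2)            # average over heads -> [num_layers, batch, seq_len, seq_len]
layer_avg = head_avg.mean(dim=0)          # average over layers -> [batch, seq_len, seq_len]
print(layer_avg.shape)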
