Understand the attention design #124

Open

HelloWorldLTY opened this issue Sep 17, 2024 · 0 comments
Hi, thanks for your great work. I intend to compute the attention scores between tokens, and here is my code:

import torch
from transformers import BertModel, BertConfig, DNATokenizer

dir_to_pretrained_model = "./6-new-12w-0/"

config = BertConfig.from_pretrained('../src/transformers/dnabert-config/bert-config-6/config.json')
tokenizer = DNATokenizer.from_pretrained('dna6')
print(config)

model = BertModel.from_pretrained(dir_to_pretrained_model, config=config).cuda()

sequence = "AATCTAATCTAGTCTAGCCTAGCA"
model_input = tokenizer.encode_plus(sequence, add_special_tokens=True, max_length=512)["input_ids"]

model_input = torch.tensor(model_input, dtype=torch.long).cuda()
model_input = model_input.unsqueeze(0)   # to generate a fake batch with batch size one

output = model(model_input)

print(output[-1][-1])        # last item of output[-1], which I take to be the last layer's attention
print(output[-1][-1].shape)  # comes out as [1, 12, 3, 3]
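
For reference, here is how the tokenized input can be inspected (a minimal sketch; I assume DNATokenizer inherits the usual convert_ids_to_tokens from the standard tokenizer classes):

# Hypothetical check of what the tokenizer actually produced, to relate
# the attention matrix size to the number of tokens in the input.
token_ids = tokenizer.encode_plus(sequence, add_special_tokens=True, max_length=512)["input_ids"]
print(token_ids)                                   # raw vocabulary ids
print(tokenizer.convert_ids_to_tokens(token_ids))  # the tokens the model actually sees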

I think output[-1] contains the attention matrices, and I took out the last item, whose shape is [1, 12, 3, 3]. Does the 12 refer to the 12 attention heads? And do the 3, 3 dimensions represent the two sets of tokens attending to each other? Could you let me know how to compute the correct attention between these tokens? Should I just average all the attention in each layer? Thanks.
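
For concreteness, this is the kind of averaging I had in mind (a minimal sketch only, assuming output[-1] is a tuple with one attention tensor per layer, each shaped [batch, num_heads, seq_len, seq_len]):

# Sketch: average attention over heads within each layer, then over layers.
attentions = output[-1]                   # tuple: one tensor per layer (if these are indeed the attentions)
stacked = torch.stack(attentions, dim=0)  # [num_layers, batch, num_heads, seq_len, seq_len]
head_avg = stacked.mean(dim=2)            # average over heads -> [num_layers, batch, seq_len, seq_len]
layer_avg = head_avg.mean(dim=0)          # average over layers -> [batch, seq_len, seq_len]
print(layer_avg.shape)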
