The returned embedding is not exactly consistent with HuggingFace's result #8
Original version:

```python
from transformers import BertModel
from transformers import BertTokenizer

sentence = "我是一个好男人!"
max_len = 32

bert_model = BertModel.from_pretrained("/bert-base-chinese")
bert_model.eval()
text_tokenizer = BertTokenizer.from_pretrained("/bert-base-chinese", do_lower_case=True)

tensor_caption = text_tokenizer.encode(sentence,
                                       return_tensors="pt",
                                       padding='max_length',
                                       truncation=True,
                                       max_length=max_len)
outputs = bert_model(tensor_caption)
pooler_output = outputs.pooler_output
last_hidden_state = outputs.last_hidden_state
```

bert4pytorch version:

```python
import torch
from bert4pytorch.modeling import build_transformer_model
from bert4pytorch.tokenization import Tokenizer

sentence = "我是一个好男人!"
max_len = 32

root_model_path = "/bert-base-chinese"
vocab_path = root_model_path + "/vocab.txt"
config_path = root_model_path + "/config.json"
checkpoint_path = root_model_path + '/pytorch_model.bin'

# Build the tokenizer
tokenizer = Tokenizer(vocab_path)

# Encode the input and pad both id lists to max_len
tokens_ids, segments_ids = tokenizer.encode(sentence, max_len=max_len)
tokens_ids = tokens_ids + (max_len - len(tokens_ids)) * [0]
segments_ids = segments_ids + (max_len - len(segments_ids)) * [0]
tokens_ids_tensor = torch.tensor([tokens_ids])
segment_ids_tensor = torch.tensor([segments_ids])

model = build_transformer_model(config_path, checkpoint_path, with_pool=True)
model.eval()
encoded_layers, pooled_output = model(tokens_ids_tensor, segment_ids_tensor)
```
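To put a number on "not exactly consistent", one option is to compare the two outputs element-wise after converting each tensor with `.tolist()`. A minimal sketch on plain Python lists (the helper name `max_abs_diff` and the toy values are mine, not from either library):

```python
def max_abs_diff(a, b):
    """Recursively compute the largest element-wise absolute difference
    between two (possibly nested) lists of floats."""
    if isinstance(a, list):
        return max(max_abs_diff(x, y) for x, y in zip(a, b))
    return abs(a - b)

# Toy values standing in for pooler_output.tolist() from the two models
hf_out = [[0.12, -0.53, 0.88]]
b4p_out = [[0.12, -0.50, 0.91]]

print(max_abs_diff(hf_out, b4p_out))  # largest deviation between the two outputs
```

If the mapping bug described below is the cause, this deviation should drop to floating-point noise once the parameters load correctly.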
I tried removing the max_length argument from the transformers call, and then the two results are consistent.
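One plausible reason max_length matters: `tokenizer.encode` returns only the input IDs, so the [PAD] tokens are fed to the model with no attention mask and every real position attends to them, shifting the outputs. A minimal self-contained sketch of that effect with plain softmax attention weights (all numbers here are illustrative, not from either model):

```python
import math

def softmax(scores):
    """Standard softmax over a list of attention scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Attention scores from one real token to positions [tok1, tok2, PAD, PAD]
scores = [2.0, 1.0, 0.5, 0.5]

# Without a mask, the PAD positions receive real attention weight...
unmasked = softmax(scores)

# ...with an additive mask (-inf on PAD positions), they receive none
masked = softmax([2.0, 1.0, -math.inf, -math.inf])

print(unmasked)  # PAD positions get nonzero weight
print(masked)    # PAD positions get weight 0.0
```

In the HuggingFace snippet this can be avoided by calling the tokenizer directly (`inputs = text_tokenizer(sentence, ...)`) and passing the whole encoding with `bert_model(**inputs)`, so the attention mask it produces is actually used.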
After debugging, I traced the problem to the fact that the HuggingFace model names its LayerNorm parameters "gamma" and "beta", while the mapping the author wrote for importing parameters uses "weight" and "bias", so those parameters fail to load.
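If that diagnosis is right, one workaround is to rename the checkpoint's LayerNorm keys before loading. A minimal sketch operating on a plain dict standing in for the loaded `state_dict` (the helper name and the toy keys besides gamma/beta are illustrative):

```python
def remap_layernorm_keys(state_dict):
    """Rename HuggingFace-style LayerNorm parameter keys ("gamma"/"beta")
    to the "weight"/"bias" names that the importing code expects."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        if key.endswith(".gamma"):
            new_key = key[: -len("gamma")] + "weight"
        elif key.endswith(".beta"):
            new_key = key[: -len("beta")] + "bias"
        remapped[new_key] = value
    return remapped

# Toy checkpoint fragment; real keys look like
# "bert.encoder.layer.0.attention.output.LayerNorm.gamma"
ckpt = {
    "encoder.layer.0.LayerNorm.gamma": "g0",
    "encoder.layer.0.LayerNorm.beta": "b0",
    "encoder.layer.0.attention.self.query.weight": "w0",
}
print(remap_layernorm_keys(ckpt))
```

The same renaming could be applied to `torch.load(checkpoint_path)` before the weights are copied into the bert4pytorch model.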
For example with bert-base-chinese, has the author done any evaluation testing on this?