CPU operator bug #315

Open
ResDream opened this issue Nov 13, 2024 · 1 comment

Comments

@ResDream

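Describe the bug (Mandatory)
With bge-reranker-base, a model in the XLMRobertaModel family, the output is all NaN. Specifically, during the forward pass of bge-reranker-base, a -nan appears while passing through the attention layer of layer 12, and every value after that point becomes -nan. An embedding model that also uses XLMRobertaModel shows the same error. The bug does not occur on Ascend devices.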
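The behaviour described above (one NaN at a given layer, then NaN everywhere after it) can be illustrated with a small sketch. This is not mindnlp or MindSpore code: `first_nonfinite_layer` and the toy layers `double`, `inject_nan`, and `mix` are hypothetical stand-ins, and the injected NaN only simulates a fault; it says nothing about the actual cause on CPU.

```python
import math

def first_nonfinite_layer(layers, x):
    """Run x through the layers in order and return (index, output) for the
    first layer whose output contains NaN or inf, or (None, output) if all
    outputs stay finite."""
    for index, layer in enumerate(layers):
        x = layer(x)
        if not all(math.isfinite(v) for v in x):
            return index, x
    return None, x

# Toy stand-ins for model layers (hypothetical, for illustration only).
def double(x):
    return [2.0 * v for v in x]

def inject_nan(x):
    # Simulated fault: a single element becomes NaN.
    return [math.nan] + x[1:]

def mix(x):
    # Every output depends on every input, as in a dense or attention layer.
    total = sum(x)
    return [v + total for v in x]

layers = [double, inject_nan, mix]
index, out = first_nonfinite_layer(layers, [1.0, 2.0, 3.0])
print(index)  # 1: the layer where the first NaN shows up

# One NaN entering a mixing layer contaminates every output value.
final = mix(out)
print(all(math.isnan(v) for v in final))  # True
```

In a real model the same idea would be applied to the per-layer hidden states, which is one way to confirm a report such as "the first NaN appears in layer 12".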

  • Hardware Environment (Ascend/GPU/CPU):
    CPU

  • Software Environment (Mandatory):
    -- MindSpore version (e.g., 1.7.0.Bxxx): 2.4.0
    -- Python version (e.g., Python 3.7.5): 3.10
    -- OS platform and distribution (e.g., Linux Ubuntu 16.04): Windows
    -- GCC/Compiler version (if compiled from source):

To Reproduce (Mandatory)
The error above occurs whenever a string in sentences is longer than 20 characters.

from mindnlp.sentence import SentenceTransformer
model = SentenceTransformer('BAAI/bge-reranker-base')
sentences = [
    '远程仓库,可以使用gitpush命令。通常,这个命令后面会跟远程仓库的名称和要推送的分支名称。\nbash\ngitpush<remote-name><branch-name>\n例如,将本地的master分支推送到origin远程仓库:\nbash\ngitpushoriginmaster\n从远程仓库拉取\n从远程仓库获取最新的更改并合并到本地分支,可以使用gitpull命令。这个命令会将远程仓库的指定分支的更改拉取到当前分支。bash\ngitpull<remote-name><branch-name>\n例如,从origin远程仓库的master分支拉取最新更改:\nbash\ngitpulloriginmaster\n远程分支管理\n查看远程分支,可以使用gitbranch命令加上-r选项。\nbash\ngitbranch-r\n删除远程分支,可以使用gitpush命令加上--delete选项。\nbash\ngitpush<remote-name>--delete<branch-name>\n例如,删除origin远程仓库的feature分支:\nbash\ngitpushorigin--deletefeature\n远程仓库的协作与贡献\n协作和贡献通常涉及以下步骤:\n\nFork远程仓库。\nCloneFork后的仓库到本地。\n创建新的分支进行开发。\n完成开发后,将分支推送到自己的Fork仓库。\n']

# 2. Calculate embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings)

Expected behavior (Mandatory)
Correct output (no NaN values).

Screenshots / Logs (Mandatory)
If applicable, add screenshots to help explain your problem.
[Two screenshots attached to the original issue; images not reproduced here.]

Additional context (Optional)
Add any other context about the problem here.

@zhouyifeng888

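It is true that certain operators, or certain numerical conditions, work fine on Ascend but produce NaN on CPU. Also, the Windows CPU build of 2.4.0 is probably not an official release, so for some of these unstable cases there is no real fix for now. When I run into this, I usually add a check-and-assign step, for example with mindspore.ops.nan_to_num. That function checks whether values in the tensor are NaN and, where they are, replaces them with a normal value you specify. It introduces some error, but in many cases it basically solves the problem.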
https://www.mindspore.cn/docs/zh-CN/r2.4.1/api_python/ops/mindspore.ops.nan_to_num.html?highlight=nan
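A minimal pure-Python sketch of the replacement semantics described above, so it runs without MindSpore installed. The helpers `nan_to_num` and `count_nan` are illustrative stand-ins, not the library implementation. The MindSpore call appears only in a comment, with its `nan` parameter name taken from the linked documentation. Where to apply it in the model (for example, on the attention output) is an assumption left to the reader.

```python
import math

def count_nan(values):
    """Number of NaN entries, useful for logging how much was patched."""
    return sum(1 for v in values if math.isnan(v))

def nan_to_num(values, nan=0.0):
    """Replace NaN entries with the value `nan`; leave other entries unchanged."""
    return [nan if math.isnan(v) else v for v in values]

scores = [0.25, math.nan, -1.5]
print(count_nan(scores))  # 1
cleaned = nan_to_num(scores)
print(cleaned)            # [0.25, 0.0, -1.5]

# With a MindSpore tensor x, the corresponding call from the linked docs
# would look like this (check the defaults for your version):
#   x = mindspore.ops.nan_to_num(x, nan=0.0)
```

Counting the replaced entries before patching them keeps the workaround visible: a count that grows with input length suggests the replacement is hiding a real fault and not just an isolated value.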
