About difficult proteins #20
Comments
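I'm not quite sure why. The BLAST output already contains an identity between 0 and 1, the cutoff is 0.6, and I ran BLAST with 1 iteration.
Thank you for your reply. I based my lookup on the psiblast results for the test set, xx-test-ppi-blast-out.xml. Perhaps I need to run psiblast to get the results for the training set?
Oh, yes, it has to be run against the training set.
Does the identity also need to be computed further from the BLAST results?
Yes. The BLAST output contains the identity directly, and then you take the maximum over all HSPs.
I did it via: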
The definition of a difficult protein is: the sequence identity between the difficult protein and its most similar (homologous) protein in the training set is less than 60%.
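Hello, is the difficult-protein data obtained via max(hsp.identities / rec.query_length for hsp in alignment.hsps) < 0.6?
Based on this, the numbers of difficult proteins I get for cc, mf and bp differ from those given in the paper by about 10.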
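For reference, the selection rule discussed in this thread can be sketched in plain Python. This is only a sketch under stated assumptions, not the repository's actual code: the `Hsp` and `QueryRecord` classes are hypothetical stand-ins for parsed BLAST records, the hits are assumed to come from a search against the training set, identity is normalized by query length as in the expression quoted in the question, and a test protein with no hit at all is assumed to have identity 0 and therefore count as difficult.

```python
from dataclasses import dataclass, field
from typing import Dict, List

CUTOFF = 0.6  # identity cutoff quoted in the thread


@dataclass
class Hsp:
    """Hypothetical stand-in for one high-scoring segment pair."""
    identities: int  # number of identical positions in this HSP


@dataclass
class QueryRecord:
    """Hypothetical stand-in for one test protein's BLAST record."""
    query_id: str
    query_length: int
    # HSPs grouped by training-set hit id (assumed: search run against the training set)
    alignments: Dict[str, List[Hsp]] = field(default_factory=dict)


def max_identity(rec: QueryRecord) -> float:
    """Largest identities / query_length over all HSPs of all hits."""
    fractions = [
        hsp.identities / rec.query_length
        for hsps in rec.alignments.values()
        for hsp in hsps
    ]
    # Assumption: no hit in the training set means identity 0.
    return max(fractions, default=0.0)


def difficult_proteins(records: List[QueryRecord], cutoff: float = CUTOFF) -> List[str]:
    """Ids of test proteins whose best training-set identity is below the cutoff."""
    return [rec.query_id for rec in records if max_identity(rec) < cutoff]


# Made-up example data, for illustration only.
records = [
    QueryRecord("P1", 100, {"T1": [Hsp(80), Hsp(30)]}),
    QueryRecord("P2", 200, {"T2": [Hsp(90)], "T3": [Hsp(110)]}),
    QueryRecord("P3", 150),
]
print(difficult_proteins(records))  # ['P2', 'P3']
```

Note that the maximum is taken across every HSP of every hit, not per alignment, so a single strong HSP against any training protein is enough to exclude a test protein from the difficult set.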