Heuristics / Rule of Thumb for selecting --gap_cutoff (default 0.25) #5

Open
ackbar03 opened this issue Nov 25, 2022 · 0 comments

ackbar03 commented Nov 25, 2022

Hi,

How should I choose the --gap_cutoff parameter for different target sequences?

For the target sequence I am looking at, the default 25% cutoff removes most of the sequences from the MSA (620 of 758) and yields only 2 clusters. Neither of these MSA clusters gives a good prediction with AF2; both are completely off, with no structure.

Thanks! The relevant log is attached below.

620 seqs removed for containing more than 25% gaps, 138 remaining
eps n_clusters n_not_clustered
3.00 1 34
3.50 1 34
4.00 1 34
4.50 1 34
5.00 1 34
5.50 1 34
6.00 1 34
6.50 1 34
7.00 1 34
7.50 1 34
8.00 1 34
8.50 1 34
9.00 1 34
9.50 2 31
10.00 1 34
10.50 2 31
11.00 1 34
Selected eps=9.50
138 total seqs
2 clusters, 127 of 138 not clustered (0.92)
avg identity to query of unclustered: 0.38
avg identity to query of clustered: 0.31
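
To see how the cutoff trades off against the number of retained sequences, one option is to tabulate survivors at several candidate cutoffs before clustering. The following is a minimal sketch, not the tool's own filtering code. It assumes the sequences are already aligned to equal length with gaps written as `-`, and that a sequence is kept when its gap fraction is at most the cutoff (matching the "more than 25% gaps" wording in the log). The function names and the toy alignment are hypothetical.

```python
def gap_fraction(aligned_seq: str) -> float:
    """Fraction of positions that are gaps ('-') in one aligned sequence."""
    if not aligned_seq:
        return 0.0
    return aligned_seq.count("-") / len(aligned_seq)


def survivors_by_cutoff(aligned_seqs, cutoffs):
    """Map each cutoff to the number of sequences with gap fraction <= cutoff."""
    fractions = [gap_fraction(s) for s in aligned_seqs]
    return {c: sum(f <= c for f in fractions) for c in cutoffs}


# Toy alignment (made-up data, not taken from the log above).
toy_msa = ["ACDEFGHIKL", "AC--FGHIKL", "A-----HIKL", "------HIKL"]

for cutoff, kept in survivors_by_cutoff(toy_msa, [0.25, 0.5, 0.75]).items():
    print(f"gap_cutoff={cutoff:.2f}: {kept} of {len(toy_msa)} sequences kept")
```

Running the same tally on the real alignment would show whether a slightly looser cutoff recovers many sequences or only a few, which may help in picking a value to try.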
