Allele option doesn't restrict input to actual human class I alleles #6

Open
susannasiebert opened this issue Feb 1, 2024 · 8 comments


@susannasiebert
Contributor

As far as I understand it from your paper, BigMHC supports predictions for all human class I alleles. However, it doesn't restrict the input provided to the -a option to only human class I HLA alleles. I was able to provide non-human alleles, human class II alleles, and even nonsense words, and a prediction result was returned each time. Is this intended, or should this option be limited to inputs that are actual human class I HLA alleles?

@benjaminalbert
Collaborator

benjaminalbert commented Feb 1, 2024

To handle different MHC naming schemes and possibly MHCs that are not found in our MSA, we have a fuzzy string matching method called nearestMHC in the src/mhenc.py file. However, as you note, this fuzzy matching protocol does not filter out completely invalid inputs.
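One way to let fuzzy matching reject invalid input is to require a minimum similarity score and return nothing when no candidate clears it. The sketch below is hypothetical and is not how `nearestMHC` in src/mhenc.py works: it swaps in Python's standard `difflib.get_close_matches` with an arbitrary cutoff of 0.8, uses a toy three-allele list standing in for the MSA alleles, and introduces a hypothetical `normalize` helper to absorb naming-scheme differences.

```python
import difflib
from typing import Optional

# Hypothetical stand-in for the alleles covered by the MSA.
KNOWN_ALLELES = ["HLA-A*02:01", "HLA-B*07:02", "HLA-C*07:01"]


def normalize(name: str) -> str:
    """Hypothetical normalizer: 'hla-a*02:01', 'A*02:01' and 'A0201' all become 'A0201'."""
    name = name.strip().upper()
    if name.startswith("HLA-"):
        name = name[len("HLA-"):]
    for ch in "*:-":
        name = name.replace(ch, "")
    return name


def match_allele(query: str, known: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known allele, or None if nothing is similar enough."""
    # Map normalized names back to their canonical spelling.
    by_norm = {normalize(a): a for a in known}
    hits = difflib.get_close_matches(normalize(query), list(by_norm), n=1, cutoff=cutoff)
    return by_norm[hits[0]] if hits else None
```

With this cutoff, `match_allele("A*02:01", KNOWN_ALLELES)` and the typo `"HLA-A*02:011"` both resolve to `"HLA-A*02:01"`, while `"banana"`, `"HLA-DRB1*01:01"` and `"H-2-Kb"` return `None` and could be reported as errors. The cutoff trades tolerance of naming variants against the risk of silently mapping a wrong allele.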

@RachelKarchin

@susannasiebert Is this issue an obstacle to integrating BigMHC into pVACtools? If so, we will come up with a workaround for you. Please let us know.

@susannasiebert
Contributor Author

For all algorithms pVACtools currently supports, we maintain a file of supported alleles for each algorithm, so that we can fail early when someone supplies a nonsense allele value and skip prediction calls for unsupported alleles. For BigMHC, I think I will just manually create such a file with all known human class I alleles.

However, I do think it would be worthwhile to have BigMHC fail if a user supplies non-human or class II alleles, since right now predictions are still made, which might confuse users. Unless I misunderstood the paper and such predictions are valid?
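A fail-early check of the kind described here could look like the sketch below. It is hypothetical: `validate_class_i` and the optional `supported` set (which would play the role of a per-algorithm allele file) are illustrative names, not part of BigMHC or pVACtools, and the regular expression encodes an assumption about what "human class I" means, namely classical HLA-A, -B and -C names with two to four numeric fields and no expression suffix. Nonclassical loci such as HLA-E are rejected by this pattern.

```python
import re
from typing import Iterable, Optional

# Assumed definition of a valid name: HLA-A, -B or -C followed by two to four
# numeric fields, e.g. HLA-A*02:01 or HLA-B*57:01:01. Class II loci (DR, DQ, DP)
# and non-human names such as H-2-Kb do not match.
CLASS_I_PATTERN = re.compile(r"HLA-[ABC]\*\d{2,3}(:\d{2,3}){1,3}")


def validate_class_i(alleles: Iterable[str], supported: Optional[set[str]] = None) -> list[str]:
    """Return the alleles as a list, or raise ValueError naming every invalid one."""
    alleles = list(alleles)
    bad = [
        a for a in alleles
        # Reject names that are not class I, or that are absent from the optional allow-list.
        if not CLASS_I_PATTERN.fullmatch(a) or (supported is not None and a not in supported)
    ]
    if bad:
        raise ValueError(f"unsupported or non-class-I alleles: {', '.join(bad)}")
    return alleles
```

Calling `validate_class_i(["HLA-A*02:01", "HLA-DRB1*01:01"])` would raise before any prediction runs, which gives users an explicit error instead of a prediction for an allele the model was not designed for.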

@benjaminalbert
Collaborator

benjaminalbert commented Feb 2, 2024

The list of alleles used in the MSA can be found here: https://github.com/KarchinLab/bigmhc/blob/master/data/pseudoseqs.csv

I've added some text to the README describing the fuzzy string matching.

@RachelKarchin

One caveat to that - the alleles used in the MSA include many that are not human.

@susannasiebert
Contributor Author

> One caveat to that - the alleles used in the MSA include many that are not human.

Can BigMHC be used with non-human data/alleles then?

@benjaminalbert
Collaborator

>> One caveat to that - the alleles used in the MSA include many that are not human.
>
> Can BigMHC be used with non-human data/alleles then?

Although BigMHC can consume non-human data, we have not trained or validated on non-human data, so I would not recommend it.

@Stikus

Stikus commented Sep 13, 2024

@susannasiebert As far as I can see, your fork https://github.com/griffithlab/bigmhc is 10 commits ahead of and 4 commits behind this repository.

I also see that PR #8 is open.

What is the current state of this issue and that PR? Should we use your fork or the latest BigMHC?


4 participants