-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dealing with alleles and multiple gene calls #561
Comments
@MKanetscheider, I made this a separate issue. |
@zktuong, would be curious if and how you have dealt with this before? From a technical perspective I could see two solutions:
|
yea i just only do: https://github.com/zktuong/dandelion/blob/7a86c211da0a3eab6863921d2dfbbd562feeeaa9/dandelion/tools/_tools.py#L1660C13-L1660C79 but i don't select the first one: https://github.com/zktuong/dandelion/blob/7a86c211da0a3eab6863921d2dfbbd562feeeaa9/dandelion/tools/_tools.py#L1669C4-L1669C55 can be easily changed to just |
I think the matrix solution would be quite elegant, as it would also allow us to deal with the case of multiple genes, e.g. to ensure
Basically all we'd need is a function that builds such a matrix from the gene names in plug it into the clonotype definition module. @felixpetschko, do you think you could look into that? CC @FFinotello |
Hi @grst, I just took a look at my improved implementation of the v/j-gene matching and I think the matrix solution you suggested should work fine. If this implementation is not that urgent I would actually tackle this problem in a few weeks when I completed my thesis if that's fine. |
sounds good, thanks! |
I want to report some kind of bug/issue that I recognized during my work in Scirpy and has to do with scirpy.tl.define_clonotype cluster (but likely also with scirpy.tl.define_clonotype, although this was not tested). Scirpy considers any string inside v_call (same problem with j_call) as a unique V-gene assignment, and this is perfectly fine for working with Cell Ranger annotation. However, if we are working with re-annotated data, which is done with IgBlastn or IMGT/Highv-quest this is not true any more. First, the annotation contains alleles, which are depicted like this IGHV3-33*001. The problem is that if another cell would have IGHV3-33*002 these two cells would always be separated by Scirpy (if v_gene = True), because Scirpy thinks that those are totally different genes, although they only differ in their alleles, even if everything else even the junction sequence can be quite similar.
Another issue with using IgBlastn or IMGT/Highv-quest re-annotation is that they often leave multiple possible gene assignments into the v_call column if they are all similar likely.
What I did so far with my datasets was that I manually manipulated the v_call and j_call column prior to loading the datatset into an AnnData object so that they only contain the first v/j_call and without the allele information.
My idea here would be to adapt the clonotype cluster function so that it automatically ignores multiple v_call's/j_call's i.e. only considers the first one and also ignores the allele information for clustering, but doesn't manipulate the call itself. Immcantation has a own parameter on how to work with multiple calls for a gene (see "parameter first= FALSE": https://scoper.readthedocs.io/en/stable/topics/hierarchicalClones/).
Actually I encountered this problem already some time ago and discussed it with @felixpetschko but eventually we both forgot about it until now. Either way, I think it's good if @grst can also have a look on this problem and help with a solution, because if I remeber correctly it's not that trivial to "fix" this. Maybe there is some elegant workaround available?
Originally posted by @MKanetscheider in #542 (comment)
The text was updated successfully, but these errors were encountered: