Loss of ZFIN annotations (due to mapping issue?) #2404

Open
kltm opened this issue Dec 23, 2024 · 15 comments

@kltm
Member

kltm commented Dec 23, 2024

From @doughowe on the GO Helpdesk email system:

[...]
We noticed major changes in the gene descriptions being generated at the Alliance in the latest release vs. the prior
release.  The changes are related to the underlying GO annotations.  After spot checking a few, it looks like the 
underlying GO annotation set may be incomplete?  The examples I spot checked seemed to point to missing IBA 
annotations, though I'm not sure if it is limited to that.  The annotations that are affecting the gene descriptions are 
also absent in AmiGO.  Was there a problem somewhere in this department recently?

Here are some gene description examples showing the changes:
[Image: Screenshot 2024-12-19 at 11 55 37 AM-1]
@kltm
Member Author

kltm commented Dec 23, 2024

Noting from @cmungall that this may be a PANTHER/UniProt/ZFIN mapping issue.

ZDB-GENE-081104-83 == A0A8M1RHF7

https://www.uniprot.org/uniprotkb/A0A8M1RHF7/entry

No PAINT annotations for the gene:

curl -L -s ftp://ftp.pantherdb.org/downloads/paint/presubmission/gene_association.paint_zfin.gaf.gz | gzip -dc  | grep ZDB-GENE-081104-83 | wc
       0       0       0

PAINT does propagate the mappings to the protein, but these don't get mapped to the gene:

curl -L -s ftp://ftp.pantherdb.org/downloads/paint/presubmission/gene_association.paint_zfin.gaf.gz | gzip -dc  | grep A0A8M1RHF7
UniProtKB A0A8M1RHF7 si:ch1073-349o24.2 involved_in GO:0007288 GO_REF:0000033 IBA PANTHER:PTN002919391|MGI:MGI:2444274 P Cilia- and flagella-associated protein 65 UniProtKB:A0A8M1RHF7|PTN000814515 protein taxon:7955 20220331 GO_Central
UniProtKB A0A8M1RHF7 si:ch1073-349o24.2 is_active_in GO:0005737 GO_REF:0000033 IBA PANTHER:PTN002919391|UniProtKB:Q6ZU64|MGI:MGI:2444274 C Cilia- and flagella-associated protein 65 UniProtKB:A0A8M1RHF7|PTN000814515 protein taxon:7955 20220331 GO_Central
UniProtKB A0A8M1RHF7 si:ch1073-349o24.2 is_active_in GO:0036126 GO_REF:0000033 IBA PANTHER:PTN002919391|MGI:MGI:2444274|UniProtKB:Q6ZU64 C Cilia- and flagella-associated protein 65 UniProtKB:A0A8M1RHF7|PTN000814515 protein taxon:7955 20220331 GO_Central
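One way to see how the file splits between gene-level and protein-level rows is to tally column 1 (the DB field) of the GAF. This is only a minimal sketch, assuming a tab-separated GAF whose header lines start with `!`; the function name `count_by_db` is made up for illustration.

```shell
# Hypothetical helper: tally GAF annotation rows by column 1 (DB).
# Reads an uncompressed, tab-separated GAF on stdin; skips "!" header lines.
count_by_db() {
  awk -F'\t' '!/^!/ && NF > 1 { n[$1]++ } END { for (db in n) print db "\t" n[db] }' | LC_ALL=C sort
}

# Usage against the PAINT file above (needs network access):
# curl -L -s ftp://ftp.pantherdb.org/downloads/paint/presubmission/gene_association.paint_zfin.gaf.gz | gzip -dc | count_by_db
```

Rows that remain under `UniProtKB` would presumably be the ones that were not mapped to a ZFIN gene ID.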

You can see this on a rump zebrafish UniProt entry in GO Central:
https://amigo.geneontology.org/amigo/gene_product/UniProtKB:A0A8M1RHF7

Possibly noted by @pgaudet at geneontology/go-releases#92

@kltm
Member Author

kltm commented Dec 23, 2024

Once we have a handle on the exact mechanism and what has changed, we should also update the release notes at:
https://github.com/geneontology/go-site/tree/master/releases/2024-11-03

@doughowe
Contributor

doughowe commented Jan 7, 2025

Any chance this can be rectified prior to the next Alliance data release in mid-January? I have no idea how complex a problem this is. I do see A0A8M1RHF7 in ZFIN associated with the cfap65 gene, for what that's worth.

@dustine32
Contributor

@kltm I have pushed the fixed ZFIN IBA file gene_association.paint_zfin.gaf to the current PAINT download URL:
http://data.pantherdb.org/ftp/downloads/paint/presubmission/gene_association.paint_zfin.gaf.gz

So, snapshot should pick this new file up automatically.

After running the fix, the number of annotations with a ZFIN ID jumped from 53121 (previous 19.0 IBA file) to 68236. This is closer to the last 17.0 IBA file (before the ID mapping "reversion"), which had 68035 ZFIN ID annotations.

@kltm
Member Author

kltm commented Jan 8, 2025

@dustine32 We are wired into ftp for the metadata/datasets, and it doesn't seem to be there? Will that get propagated, or should we switch schema?

@kltm
Member Author

kltm commented Jan 8, 2025

@doughowe Just to let you know what's going on, we believe we have a possible fix in place and are processing it now. It will likely take about a day or so to see the results.
Initially, the updated data will appear on snapshot.geneontology.org, before going into the (longer) release process. Would the snapshot be sufficient for your purposes at this point?

@doughowe
Contributor

doughowe commented Jan 8, 2025

Thanks @kltm. From my perspective, this bug shows up as missing IBA annotations in AmiGO, and again when the Alliance pulls in IBA GO annotations, which in turn are used to generate the gene descriptions. I believe the Alliance will be completing new data submissions in the next week or so. If the fixed data will be available to the Alliance by then, that would be great. Thanks for taking a quick look into this.

@kltm
Member Author

kltm commented Jan 8, 2025

@doughowe Would you know which data the Alliance is pulling: the snapshot or release data? If the former, there should not be an issue, as we are currently processing the fixes; if the latter, we would not normally be expecting a release until more towards the end of January, and would have to coordinate a one-off release somehow, or have the data submission deadline pushed back slightly.

@sierra-moxon
Member

This used to be the config file for file location gathering at the Alliance: https://github.com/alliance-genome/agr_ferret/blob/master/src/datasets/GAF.yaml (to me, that looks like release vs. snapshot)

It could be that the A-Team gets a copy of the files for use in the curation system from a different config file.

@kltm
Member Author

kltm commented Jan 8, 2025

Okay, for the moment, let's get the file first, then we can solve getting it to the right location or setting the right metadata.

@kltm
Member Author

kltm commented Jan 12, 2025

@doughowe @dustine32 Okay, I've gone through a snapshot cycle and it looks like the ZFIN PAINT file now contains 75978 lines. What is the best way to confirm that the mapping is now as desired? E.g.:

curl -L -s http://snapshot.geneontology.org/annotations/zfin.gaf.gz | gzip -dc  | grep ZDB-GENE-081104-83
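To narrow that check to IBA rows only, one option is to filter on the GAF columns rather than grep the whole line. This is a minimal sketch, assuming column 2 holds the DB object ID and column 7 the evidence code, as in the GAF format; `iba_for_gene` is a hypothetical name.

```shell
# Hypothetical helper: print IBA rows for one gene from a GAF on stdin.
# Column 2 is the DB object ID, column 7 the evidence code; "!" lines are headers.
iba_for_gene() {
  awk -F'\t' -v id="$1" '!/^!/ && $2 == id && $7 == "IBA"'
}

# Usage (needs network access):
# curl -L -s http://snapshot.geneontology.org/annotations/zfin.gaf.gz | gzip -dc | iba_for_gene ZDB-GENE-081104-83
```

An empty result would suggest the IBA annotations for that gene are still missing from the snapshot.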

@doughowe
Contributor

doughowe commented Jan 12, 2025

@kltm @dustine32
A recent GAF taken directly out of ZFIN has 76404 IBA annotations, so that number sounds like it is in the right ballpark...probably correct. I'm assuming the 75978 you are reporting are all IBA? I don't know how those get merged with other experimental annotations before heading into the Alliance...maybe they get merged into the data source behind AmiGO before heading over to Alliance?
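Whether a given count is all IBA can be checked by tallying the evidence-code column of the file in question. A rough sketch, assuming a tab-separated GAF with the evidence code in column 7; the helper name `count_by_evidence` is invented for illustration.

```shell
# Hypothetical helper: tally GAF rows by evidence code (column 7).
# Reads an uncompressed GAF on stdin; skips "!" header lines.
count_by_evidence() {
  awk -F'\t' '!/^!/ && NF >= 7 { n[$7]++ } END { for (ev in n) print ev "\t" n[ev] }' | LC_ALL=C sort
}

# Usage (needs network access):
# curl -L -s http://snapshot.geneontology.org/annotations/zfin.gaf.gz | gzip -dc | count_by_evidence
```

The `IBA` row of the output can then be compared with the count taken from a GAF exported directly from ZFIN.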

@kltm
Member Author

kltm commented Jan 13, 2025

@doughowe Yes: the number I listed is the line count from the PAINT ZFIN file, which is the exclusive source of IBAs in the GO data flow.
To clarify a little, AmiGO is directly driven off of our generated GAFs; the final "snapshotted" version of all ZFIN data that GO currently has would be at: http://snapshot.geneontology.org/annotations/zfin.gaf.gz . I'm assuming that the Alliance has some metadata to pick up a given file. I am not sure if they are using snapshot or released versions.

@sierra-moxon
Member

@kltm - confirmed with Alliance, they are using the released version and still use the metadata file above. So we either need to change the Alliance metadata to pull the snapshot instead, or this will not be visible in the Alliance until we release and they have another data release.

@kltm
Member Author

kltm commented Jan 13, 2025

Thank you, @sierra-moxon !
@doughowe Maybe the best thing would be for you to switch to the snapshot of the GO ZFIN file for this Alliance release, then revert it for the one after? I'm not really up on the best way to communicate that. Perhaps you could make a PR or a ticket to request that we work that out with Alliance?
