No secondary structure data in CASP12 TFRecord files. I didn't check others... #5
Comments
"* CASP12 test set is incomplete due to embargoed structures. Once the embargo is lifted we will release all structures." https://github.com/aqlaboratory/proteinnet |
It doesn't apply to CASP12 only. None of the files contains secondary structure data |
Hi, yes, it looks like the text-based records of CASP11 are also missing the secondary structure entries. |
Thanks for bringing this to my attention. I will update the files soon with secondary structure information. |
@alquraishi Thank you for the amazing resource. May I please know when this issue will be fixed? Thanks! |
I checked a number of splits for a number of CASPs, in both TFRecord and text formats. I wasn't exhaustive, but it seems like secondary structure data is missing from all of them. Can the information still be (easily) added to the datasets? |
Hi @AlexeyG, yes the information can be added easily. It's mostly already there, I just need to expose it. Stay tuned. |
Hi @alquraishi, can you estimate when this is going to happen? |
I am Harun Or Rashid, doing a master's thesis on protein sequence, structure, and function analysis at the University of Wuerzburg, Germany, under Prof. Dr. Thomas Dandekar, who holds the chair of the Department of Bioinformatics. I have been trying to implement your RGN network to predict protein 3D structure from sequence. I followed the instructions in your GitHub repository (https://github.com/aqlaboratory/rgn). I installed the CPU version of TensorFlow 1.10.0 with Python 2.7 and setproctitle in a conda environment, and I created the directories as you described. I ran the script, but the only output I got was a CASP7.log file, which I have attached here. I do not understand the error or where the problem is. |
Hello, I also encountered this issue (specifically in CASP11). Are there still plans to add secondary structures in the foreseeable future? |
Any progress in resolving this issue? |
I am also very interested in using the secondary structure information. Are there still plans to release this info? Thanks! |
As an interim solution I added JSON files for the secondary structure data. I say interim because there are a few caveats: the data is not currently integrated within the rest of ProteinNet. Instead, these JSON files are on their own and in an ad hoc file format. There are two files, one corresponding to single domain entries coming from ASTRAL and the other to whole proteins coming from the PDB. The IDs of these entries match those of the original ProteinNet files, and so it should be easy to cross-reference them. The only other wrinkle is that not all ProteinNet entries have secondary structure information, but the vast majority do. The files are linked to in the main README page. |
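The cross-referencing described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the JSON layout is described as ad hoc and is not specified in this thread, so the sketch assumes a flat mapping from entry ID to a secondary structure string. The helper names `read_proteinnet_ids` and `attach_secondary_structure`, and the toy IDs, are hypothetical. The only thing taken from ProteinNet itself is that text-based records put the entry ID on the line after an `[ID]` header.

```python
import json


def read_proteinnet_ids(text):
    """Collect the line that follows each [ID] header in a text-format dump."""
    ids = []
    lines = iter(text.splitlines())
    for line in lines:
        if line.strip() == "[ID]":
            # The ID is on the next line; fall back to "" if the file is truncated.
            ids.append(next(lines, "").strip())
    return ids


def attach_secondary_structure(ids, ss_by_id):
    """Split IDs into those with a secondary structure entry and those without."""
    found = {i: ss_by_id[i] for i in ids if i in ss_by_id}
    missing = [i for i in ids if i not in ss_by_id]
    return found, missing


# Toy stand-ins for a ProteinNet text file and a secondary structure JSON file.
# ASSUMPTION: the real JSON files may be nested differently; adapt the lookup.
records = "[ID]\n1ABC_1_A\n[PRIMARY]\nMKV\n\n[ID]\n2XYZ_1_B\n[PRIMARY]\nGAS\n"
ss_by_id = json.loads('{"1ABC_1_A": "HHC"}')

found, missing = attach_secondary_structure(read_proteinnet_ids(records), ss_by_id)
print(len(found), "entries matched;", len(missing), "without secondary structure")
```

Keeping the unmatched IDs in a separate list makes it easy to see how many entries fall under the caveat that not all ProteinNet entries have secondary structure information.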
@alquraishi I've just checked the ids of validation and test datasets of CASP-11 with your added JSON files. Unfortunately, I cannot match any ids between the CASP-11 and JSON files. I would be thankful if you could kindly let me know about the possibility of adding the secondary structures info directly to the original CASP datasets? Thanks in advance. |
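One way to diagnose a mismatch like this is to count how many dataset IDs match the JSON keys as-is and how many match after stripping a prefix. The sketch below is a guess at a possible cause, not a confirmed fix: it assumes that validation and test IDs carry a `<prefix>#` in front of the entry name, which should be checked against the ProteinNet README. The functions `strip_prefix` and `overlap_report` and all IDs shown are hypothetical.

```python
def strip_prefix(entry_id):
    """Drop everything up to and including the first '#', if one is present."""
    return entry_id.split("#", 1)[-1]


def overlap_report(dataset_ids, json_ids):
    """Count exact matches and matches after prefix stripping."""
    json_ids = set(json_ids)
    raw = sum(1 for i in dataset_ids if i in json_ids)
    stripped = sum(1 for i in dataset_ids if strip_prefix(i) in json_ids)
    return {"total": len(dataset_ids), "raw_matches": raw, "stripped_matches": stripped}


# Toy IDs only. ASSUMPTION: real validation/test IDs may follow another scheme.
dataset_ids = ["30#1ABC_1_A", "TBM#T0999"]
json_ids = ["1ABC_1_A"]

report = overlap_report(dataset_ids, json_ids)
print(report)
```

If `stripped_matches` is much higher than `raw_matches`, the mismatch is a naming difference rather than missing data. IDs that still fail to match after stripping would need to be mapped some other way.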