UnboundLocalError: local variable 'sequence' referenced before assignment #6

Open
jpummil opened this issue Nov 19, 2024 · 13 comments

@jpummil

jpummil commented Nov 19, 2024

Have run the test packaged with the software successfully. But when I try to run my own data, I get the following error:

python create_inference_graphs.py --reads All+RatQ3.fastq --gfa raven-unpolished.gfa --asm raven --out Assembly
Starting to parse assembler output
Starting to loop over GFA
Traceback (most recent call last):
  File "create_inference_graphs.py", line 50, in <module>
    create_inference_graph(gfa, reads, out, asm)
  File "create_inference_graphs.py", line 13, in create_inference_graph
    graph, pred, succ, reads, edges, read_to_node, _ = graph_parser.only_from_gfa(gfa_path, training=False, reads_path=reads_path, get_similarities=True)
  File "/home/jpummil/Applications/GNNome/graph_parser.py", line 165, in only_from_gfa
    if sequence == '*':
UnboundLocalError: local variable 'sequence' referenced before assignment

The referenced .fastq file assembles fine using Raven. Line 165 as referenced begins as follows:
S f890dea9-4546-4e77-aaee-6d7924f1a07d CCGAGTGCCGCCTCTGGCACACGTGCCGTAGGTTCGCCACCACTGCTATA

Something obvious I'm doing incorrectly, or perhaps just an early code issue?

@lvrcek
Collaborator

lvrcek commented Nov 21, 2024

Hi,

Thanks for using GNNome!
Can you please tell me which version of Raven you are using? And could you send me the first few (complete) lines of raven-unpolished.gfa in a file, so that it's easier for me to debug?

I suspect that the problem lies in different versions of Raven creating slightly different GFA files, which makes parsing tricky.
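For readers hitting the same traceback: an UnboundLocalError of this kind typically appears when a variable such as `sequence` is assigned only inside a branch that matches one particular line layout, and a differently formatted line skips that branch. The following is a minimal sketch of a more tolerant S-line parser, not the actual GNNome code. The function names `parse_segment_line` and `segment_length` are hypothetical, and the sketch assumes the standard tab-separated GFA1 layout (`S`, name, sequence, then optional `TAG:TYPE:VALUE` fields).

```python
from typing import Dict, Optional, Tuple


def parse_segment_line(line: str) -> Optional[Tuple[str, str, Dict[str, str]]]:
    """Parse one GFA1 S-line into (name, sequence, tags).

    Returns None for anything that is not a well-formed S-line, so the
    caller never reads a variable that was not assigned.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3 or fields[0] != "S":
        return None
    name, sequence = fields[1], fields[2]
    tags: Dict[str, str] = {}
    # Optional fields look like TAG:TYPE:VALUE, e.g. LN:i:12345
    for field in fields[3:]:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    return name, sequence, tags


def segment_length(sequence: str, tags: Dict[str, str]) -> Optional[int]:
    """Length of a segment, falling back to the LN tag when sequence is '*'."""
    if sequence != "*":
        return len(sequence)
    if "LN" in tags:
        return int(tags["LN"])
    return None  # Length is unknown; the caller decides how to handle it.


# Hypothetical example line (hypothetical read name and tags)
parsed = parse_segment_line("S\tread1\tACGT\tLN:i:4\tRC:i:10\n")
```

Because every path either returns a complete tuple or `None`, a GFA variant with extra or missing optional fields fails in a visible, checkable way instead of raising an UnboundLocalError further down.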

@jpummil
Author

jpummil commented Nov 21, 2024

Greetings lvrcek!

I was using Raven 1.8.3 in conjunction with the DragonFlye pipeline.
Apologies for the suffix change on the .gfa file, but .gfa files apparently aren't allowed as uploads.
GFA-Sample.txt

Happy to provide any additional information you might need to resolve the issue!

Thanks again!

@lvrcek
Collaborator

lvrcek commented Nov 22, 2024

I fixed it in the latest commit, so it should be fine now.
As suspected, the cause was a slightly different GFA format. You can simply run git pull to update your local code with all the changes I made. Please let me know if you run into any issues.

Also, if something breaks again during the parsing of the L-lines in the GFA, you can send me the last few lines of your GFA and I will fix that as well.

Best,
Lovro

@jpummil
Author

jpummil commented Nov 22, 2024

Hey Lovro!

Thanks for the update! The git pull went fine, but now it has a different error.

$ python create_inference_graphs.py --reads All+RatQ3.fastq --gfa raven-unpolished.gfa --asm raven --out Assembly
Starting to parse assembler output
Starting to loop over GFA
Elapsed time: 3s
Elapsed time: 3s
Calculating similarities...
0%| | 0/7790 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "create_inference_graphs.py", line 50, in <module>
    create_inference_graph(gfa, reads, out, asm)
  File "create_inference_graphs.py", line 13, in create_inference_graph
    graph, pred, succ, reads, edges, read_to_node, _ = graph_parser.only_from_gfa(gfa_path, training=False, reads_path=reads_path, get_similarities=True)
  File "/home/jpummil/Applications/GNNome/graph_parser.py", line 306, in only_from_gfa
    overlap_similarities = calculate_similarities(edge_ids, read_seqs, overlap_lengths)
  File "/home/jpummil/Applications/GNNome/graph_parser.py", line 108, in calculate_similarities
    overlap_similarities[(src, dst)] = 1 - edit_distance / ol_length
ZeroDivisionError: division by zero

Attaching the last 50 lines of .gfa
GFA-Sample2.txt

Sorry for all the bother,
Jeff

@lvrcek
Collaborator

lvrcek commented Nov 22, 2024

Hey Jeff,

No worries at all :)

I suspect I know where the error comes from, so I pushed another commit that fixes it. If it persists, it should now at least print a message showing exactly which reads cause the error. It should say:
Zero division error occurs for reads: {id1} {id2}
{id1} and {id2} will be read IDs. You can then send me the output of the following commands:

grep {id1} raven-unpolished.gfa > id1.out
grep {id2} raven-unpolished.gfa > id2.out

Just substitute {id1} and {id2} with actual IDs. Hope the problem is fixed though.

Best,
Lovro
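The guard described in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: the function name `safe_similarities` is hypothetical, and it takes precomputed edit distances, whereas the traceback shows the real `calculate_similarities` receiving `edge_ids`, `read_seqs`, and `overlap_lengths`. The similarity formula itself is the one visible in the traceback, `1 - edit_distance / ol_length`.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int]


def safe_similarities(
    edges: Iterable[Edge],
    edit_distances: Dict[Edge, int],
    overlap_lengths: Dict[Edge, int],
) -> Tuple[Dict[Edge, float], List[Edge]]:
    """Compute 1 - edit_distance / overlap_length per edge.

    Edges whose overlap length is zero are skipped and returned
    separately so they can be reported instead of crashing the run.
    """
    similarities: Dict[Edge, float] = {}
    zero_length: List[Edge] = []
    for edge in edges:
        ol_length = overlap_lengths[edge]
        if ol_length == 0:
            zero_length.append(edge)
            continue
        similarities[edge] = 1 - edit_distances[edge] / ol_length
    return similarities, zero_length


# Hypothetical toy input: one ordinary edge and one zero-length self-loop
sims, skipped = safe_similarities(
    edges=[(1, 2), (3, 3)],
    edit_distances={(1, 2): 25, (3, 3): 0},
    overlap_lengths={(1, 2): 100, (3, 3): 0},
)
if skipped:
    print(f"Zero division error occurs for {len(skipped)} pairs: {skipped}")
```

Collecting the offending pairs rather than raising keeps the rest of the graph usable and leaves the decision about those edges (drop them, or assign a default similarity) to the caller.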

@jpummil
Author

jpummil commented Nov 22, 2024

Thanks Lovro! That did it!

I was able to run both the create_inference_graphs.py as well as the subsequent inference.py with no errors!

The resulting assembly is about 8x smaller than it should be, so I need to ponder that a bit. But, CHEERS! I really appreciate your diligence getting things running!

Jeff

@lvrcek
Collaborator

lvrcek commented Nov 22, 2024

No problem, glad you got it to work!

Hmm ok, I will try to take a look at why this happens and see if it's something about the parameters we use during the inference. Can I ask which genome you are trying to reconstruct?

Lovro

@jpummil
Author

jpummil commented Nov 22, 2024

Sure! It's C. horridus (Timber rattlesnake). We have a pretty solid assembly that's around 1.5Gb in size. In contrast, GNNome output was 214Mb...

@lvrcek
Collaborator

lvrcek commented Nov 22, 2024

I agree that's a lot shorter than it should be. I will take a look at what's going on.

@lvrcek
Collaborator

lvrcek commented Nov 28, 2024

Hey, I'm trying to debug this and would appreciate your help. Could you pull the code again and try to assemble the genome? Also, if you save the output of running GNNome to, e.g., output.log, could you send me the result of the following command:

grep "Zero division error" output.log

Thanks!

@jpummil
Author

jpummil commented Nov 28, 2024

Greetings Lovro,

Did a git pull, then repeated the process of both create_inference_graphs and inference. Oddly enough, while the first step shows Zero division errors, the inference step shows no such errors.

$ python create_inference_graphs.py --reads All+RatQ3.fastq --gfa raven-unpolished.gfa --asm raven --out Assembly
Starting to parse assembler output
Starting to loop over GFA
Elapsed time: 3s
Elapsed time: 3s
Calculating similarities...
100%|██████████████████████████████████████████████████████████████████████████████| 7790/7790 [00:09<00:00, 805.80it/s]
Zero division error occurs for 44 pairs: [(6480, 6480), (6481, 6481), (10444, 10444), (10445, 10445), (10948, 10948), (10949, 10949), (11736, 11736), (11737, 11737), (12572, 12572), (12573, 12573), (12614, 12614), (12615, 12615), (13790, 13790), (13791, 13791), (18844, 18844), (18845, 18845), (20368, 20368), (20369, 20369), (20390, 20390), (20391, 20391), (20400, 20400), (20401, 20401), (21424, 21424), (21425, 21425), (21698, 21698), (21699, 21699), (22578, 22578), (22579, 22579), (22588, 22588), (22589, 22589), (22620, 22620), (22621, 22621), (23724, 23724), (23725, 23725), (24850, 24850), (24851, 24851), (24986, 24986), (24987, 24987), (25284, 25284), (25285, 25285), (25580, 25580), (25581, 25581), (25708, 25708), (25709, 25709)]
Done!
Elapsed time: 13s
Parsed assembler output! Saving files...
Processing of graph done!

Resulting assembly after inference is now 221M (should be ~1.5G).
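When comparing assembly sizes like this, it can help to count bases directly rather than rely on file size, since headers and line breaks add to the byte count. A minimal sketch, assuming the assembly is a plain FASTA file; the function name `total_assembly_length` and the file name in the comment are hypothetical, as the thread does not name the output file.

```python
from typing import Iterable


def total_assembly_length(fasta_lines: Iterable[str]) -> int:
    """Sum the number of bases over all records in a FASTA stream."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        # Header lines start with '>'; everything else is sequence.
        if line and not line.startswith(">"):
            total += len(line)
    return total


# Usage with a real file (hypothetical path):
# with open("assembly.fasta") as handle:
#     print(total_assembly_length(handle))
```

The function reads line by line, so it works on multi-gigabase assemblies without loading the whole file into memory.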

@lvrcek
Collaborator

lvrcek commented Dec 3, 2024

It seems this happens because Raven produces a GFA in which some edges have length 0, hence the zero division error when computing edge similarities. However, it seems this only happens for self-loops. I will leave this issue open for now and try to figure out why Raven produces such edges and whether this is the cause of the short assembly.

Thank you for your help, Jeff.
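To check a GFA for the pattern described here, one option is to scan its L-lines for self-loops whose overlap is zero. The sketch below is a rough diagnostic under assumptions: it expects standard tab-separated GFA1 L-lines (`L`, from, from-orientation, to, to-orientation, CIGAR overlap), and the helper names are hypothetical. The thread does not say whether Raven records the zero length in the CIGAR field or whether GNNome derives it another way, and the pairs in the log above are GNNome node IDs rather than GFA segment names, so the results may not map one-to-one.

```python
import re
from typing import Iterable, List, Optional, Tuple

# One or more <length><op> pairs, e.g. "1234M" or "10M2D5M"
CIGAR_FULL = re.compile(r"(\d+[MIDNSHPX=])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHPX=])")


def overlap_length(cigar: str) -> Optional[int]:
    """Overlap length on the 'from' segment, or None if unspecified/invalid."""
    if not CIGAR_FULL.fullmatch(cigar):
        return None  # Covers '*' (unspecified) and malformed strings.
    # Count only operations that consume bases of the 'from' segment.
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def zero_length_self_loops(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (segment, orientations, cigar) for zero-overlap self-loop links."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[0] != "L":
            continue
        src, src_orient, dst, dst_orient, cigar = fields[1:6]
        if src == dst and overlap_length(cigar) == 0:
            hits.append((src, src_orient + dst_orient, cigar))
    return hits


# Usage with a real file:
# with open("raven-unpolished.gfa") as handle:
#     for hit in zero_length_self_loops(handle):
#         print(hit)
```

If the list comes back empty while GNNome still reports zero-length pairs, that would suggest the zero lengths arise during parsing rather than in the GFA itself.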

@jpummil
Author

jpummil commented Dec 3, 2024

Sure thing, Lovro!

I have 3-4 new projects being sequenced now as well. I'll give them a try once I'm a ways along and see if they behave similarly.

Let me know if I can be of further assistance in the future!

--jeff
