UnboundLocalError: local variable 'sequence' referenced before assignment #6

jpummil · 2024-11-19T19:42:59Z

Have run the test packaged with the software successfully. But when I try to run my own data, I get the following error:

python create_inference_graphs.py --reads All+RatQ3.fastq --gfa raven-unpolished.gfa --asm raven --out Assembly
Starting to parse assembler output
Starting to loop over GFA
Traceback (most recent call last):
File "create_inference_graphs.py", line 50, in <module>
create_inference_graph(gfa, reads, out, asm)
File "create_inference_graphs.py", line 13, in create_inference_graph
graph, pred, succ, reads, edges, read_to_node, _ = graph_parser.only_from_gfa(gfa_path, training=False, reads_path=reads_path, get_similarities=True)
File "/home/jpummil/Applications/GNNome/graph_parser.py", line 165, in only_from_gfa
if sequence == '*':
UnboundLocalError: local variable 'sequence' referenced before assignment

The referenced .fastq file assembles fine using Raven. Line 165 as referenced begins as follows:
S f890dea9-4546-4e77-aaee-6d7924f1a07d CCGAGTGCCGCCTCTGGCACACGTGCCGTAGGTTCGCCACCACTGCTATA

Something obvious I'm doing incorrectly, or perhaps just an early code issue?

lvrcek · 2024-11-21T06:45:33Z

Hi,

Thanks for using GNNome!
Can you please tell me which version of Raven you are using? And could you send me the first few (complete) lines of raven-unpolished.gfa in a file, so that it's easier for me to debug?

I suspect that the problem lies in different versions of Raven creating slightly different GFA files, which makes parsing tricky.
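
To illustrate how a slightly different GFA layout can surface as an UnboundLocalError, here is a minimal sketch of a stricter S-line parser. It is not GNNome's actual code: parse_segment_line is a hypothetical helper, and it assumes tab-separated GFA1 S-lines of the form "S, name, sequence, optional tags" with LN:i: as the length tag. The point is that every path either assigns the sequence or raises a descriptive error, so an unexpected layout fails loudly at the offending line.

```python
def parse_segment_line(line):
    """Parse one GFA1 'S' line into (name, sequence, length).

    Hypothetical helper for illustration; not part of GNNome.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3 or fields[0] != "S":
        # Fail explicitly instead of leaving 'sequence' unassigned.
        raise ValueError(f"Unrecognized S-line layout: {line[:60]!r}")

    name, sequence = fields[1], fields[2]

    # Optional tags follow the sequence, e.g. LN:i:12345.
    tags = {}
    for tag in fields[3:]:
        key, _type, value = tag.split(":", 2)
        tags[key] = value

    if sequence == "*":
        # Sequence omitted: the length has to come from the LN tag.
        if "LN" not in tags:
            raise ValueError(f"S-line for {name} has no sequence and no LN tag")
        length = int(tags["LN"])
    else:
        length = len(sequence)

    return name, sequence, length
```

Calling it on a line such as "S\tread1\tACGT\tLN:i:4" returns the name, the sequence, and a length taken from the sequence itself; a line with "*" in the sequence column falls back to the LN tag.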

jpummil · 2024-11-21T16:24:17Z

Greetings lvrcek!

I was using Raven 1.8.3 in conjunction with the DragonFlye pipeline.
Apologies for the suffix change on the .gfa file, but .gfa files apparently aren't allowed as uploads.
GFA-Sample.txt

Happy to provide any additional information you might need to resolve the issue!

Thanks again!

lvrcek · 2024-11-22T02:35:15Z

I fixed it in the latest commit, it should be fine now.
As suspected, it was about a slightly different GFA. You can simply do git pull and it should update your local code with all the changes I made. Please let me know if you run into any issues.

Also, if something breaks again during the parsing of the L-lines in the GFA, you can send me the last few lines of your GFA and I will fix that as well.

Best,
Lovro

jpummil · 2024-11-22T16:31:00Z

Hey Lovro!

Thanks for the update! The git pull went fine, but now it has a different error.

$ python create_inference_graphs.py --reads All+RatQ3.fastq --gfa raven-unpolished.gfa --asm raven --out Assembly
Starting to parse assembler output
Starting to loop over GFA
Elapsed time: 3s
Elapsed time: 3s
Calculating similarities...
0%| | 0/7790 [00:00<?, ?it/s]
Traceback (most recent call last):
File "create_inference_graphs.py", line 50, in <module>
create_inference_graph(gfa, reads, out, asm)
File "create_inference_graphs.py", line 13, in create_inference_graph
graph, pred, succ, reads, edges, read_to_node, _ = graph_parser.only_from_gfa(gfa_path, training=False, reads_path=reads_path, get_similarities=True)
File "/home/jpummil/Applications/GNNome/graph_parser.py", line 306, in only_from_gfa
overlap_similarities = calculate_similarities(edge_ids, read_seqs, overlap_lengths)
File "/home/jpummil/Applications/GNNome/graph_parser.py", line 108, in calculate_similarities
overlap_similarities[(src, dst)] = 1 - edit_distance / ol_length
ZeroDivisionError: division by zero

Attaching the last 50 lines of .gfa
GFA-Sample2.txt

Sorry for all the bother,
Jeff

lvrcek · 2024-11-22T17:13:01Z

Hey Jeff,

No worries at all :)

I suspect I know where the error comes from, so I pushed another commit that fixes it. If the error persists, it should now at least print a message saying exactly which reads trigger it. It should say:
Zero division error occurs for reads: {id1} {id2}
{id1} and {id2} will be some read IDs. You can then send me the output of the following commands:

grep {id1} raven-unpolished.gfa > id1.out
grep {id2} raven-unpolished.gfa > id2.out

Just substitute {id1} and {id2} with actual IDs. Hope the problem is fixed though.

Best,
Lovro

jpummil · 2024-11-22T17:27:30Z

Thanks Lovro! That did it!

I was able to run both create_inference_graphs.py and the subsequent inference.py with no errors!

The resulting assembly is about 8x smaller than it should be, so I need to ponder that a bit. But, CHEERS! I really appreciate your diligence getting things running!

Jeff

lvrcek · 2024-11-22T17:31:44Z

No problem, glad you got it to work!

Hmm ok, I will try to take a look at why this happens and see if it's something about the parameters we use during the inference. Can I ask which genome you are trying to reconstruct?

Lovro

jpummil · 2024-11-22T17:35:38Z

Sure! It's C. horridus (Timber rattlesnake). We have a pretty solid assembly that's around 1.5Gb in size. In contrast, GNNome output was 214Mb...

lvrcek · 2024-11-22T17:37:33Z

I agree that's a lot shorter than it should be. I will take a look at what's going on.

lvrcek · 2024-11-28T07:59:01Z

Hey, I'm trying to debug this and would appreciate your help. Could you pull the code again and try to assemble the genome? Also, if you save the output of running GNNome to, e.g., output.log, could you send me the result of the following command:

grep "Zero division error" output.log

Thanks!

jpummil · 2024-11-28T16:56:08Z

Greetings Lovro,

Did a git pull, then repeated the process of both create_inference_graphs and inference. Oddly enough, while the first step shows Zero division errors, the inference step shows no such errors.

$ python create_inference_graphs.py --reads All+RatQ3.fastq --gfa raven-unpolished.gfa --asm raven --out Assembly
Starting to parse assembler output
Starting to loop over GFA
Elapsed time: 3s
Elapsed time: 3s
Calculating similarities...
100%|██████████████████████████████████████████████████████████████████████████████| 7790/7790 [00:09<00:00, 805.80it/s]
Zero division error occurs for 44 pairs: [(6480, 6480), (6481, 6481), (10444, 10444), (10445, 10445), (10948, 10948), (10949, 10949), (11736, 11736), (11737, 11737), (12572, 12572), (12573, 12573), (12614, 12614), (12615, 12615), (13790, 13790), (13791, 13791), (18844, 18844), (18845, 18845), (20368, 20368), (20369, 20369), (20390, 20390), (20391, 20391), (20400, 20400), (20401, 20401), (21424, 21424), (21425, 21425), (21698, 21698), (21699, 21699), (22578, 22578), (22579, 22579), (22588, 22588), (22589, 22589), (22620, 22620), (22621, 22621), (23724, 23724), (23725, 23725), (24850, 24850), (24851, 24851), (24986, 24986), (24987, 24987), (25284, 25284), (25285, 25285), (25580, 25580), (25581, 25581), (25708, 25708), (25709, 25709)]
Done!
Elapsed time: 13s
Parsed assembler output! Saving files...
Processing of graph done!

Resulting assembly after inference is now 221M (should be ~1.5G).
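
A quick way to inspect the pairs reported above is to parse the log line and check how many of them are self-loops. This is only a sketch under assumptions: parse_zero_division_pairs and split_self_loops are hypothetical helpers, and the log line is assumed to have exactly the shape printed above ("Zero division error occurs for N pairs: [...]").

```python
import ast


def parse_zero_division_pairs(log_line):
    """Extract the list of (src, dst) pairs from one log line.

    Assumes the format 'Zero division error occurs for N pairs: [...]'.
    """
    _prefix, sep, payload = log_line.partition(": ")
    if not sep:
        raise ValueError("Line does not contain a pair list")
    # literal_eval safely parses the Python-style list of tuples.
    return ast.literal_eval(payload.strip())


def split_self_loops(pairs):
    """Separate self-loops (src == dst) from all other pairs."""
    self_loops = [p for p in pairs if p[0] == p[1]]
    others = [p for p in pairs if p[0] != p[1]]
    return self_loops, others


line = "Zero division error occurs for 2 pairs: [(6480, 6480), (6481, 6481)]"
loops, others = split_self_loops(parse_zero_division_pairs(line))
print(f"{len(loops)} self-loops, {len(others)} other pairs")
```

If the second list comes back empty, every affected edge connects a node to itself.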

lvrcek · 2024-12-03T02:54:15Z

This seems to happen because Raven produces a GFA in which some edges have length 0, hence the zero division error when computing edge similarities. However, it seems to happen only for self-loops. I will leave this issue open for now and try to figure out why Raven produces such edges and whether they are the cause of the short assembly.
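
For reference, a minimal sketch of guarding the similarity formula from the traceback (1 - edit_distance / ol_length) against zero-length overlaps. It is not the actual fix in graph_parser.py: safe_similarities is a hypothetical function, and the dictionaries keyed by (src, dst) are an assumed data layout. Zero-length edges are skipped and collected so they can be reported afterwards.

```python
def safe_similarities(edges, edit_distances, overlap_lengths):
    """Compute 1 - edit_distance / overlap_length per edge.

    Hypothetical helper: edges with overlap length 0 are skipped and
    returned separately instead of raising ZeroDivisionError.
    """
    similarities = {}
    skipped = []
    for src, dst in edges:
        ol_length = overlap_lengths[(src, dst)]
        if ol_length == 0:
            # Seen here only for self-loops (src == dst).
            skipped.append((src, dst))
            continue
        similarities[(src, dst)] = 1 - edit_distances[(src, dst)] / ol_length
    return similarities, skipped
```

Whether skipped edges should be dropped from the graph or given a default similarity is a separate design decision that this sketch leaves open.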

Thank you for your help, Jeff.

jpummil · 2024-12-03T03:06:15Z

Sure thing, Lovro!

I have 3-4 new projects being sequenced now as well. I'll give them a try once I'm a ways along and see if they behave similarly.

Let me know if I can be of further assistance in the future!

--jeff
