VG Paths --Metadata Warning: Duplicate Paths Exist in Graph #4440

Open
AbdelF6 opened this issue Nov 8, 2024 · 1 comment
AbdelF6 commented Nov 8, 2024

Hi!

1. What were you trying to do?
Running vg paths on the GFA file of the T2T-CHM13 reference.

2. What did you want to happen?
I wanted a text file listing the reference paths.

3. What actually happened?
The output contained only haplotype paths, with no reference or generic paths. I also get the following warning once for each chromosome. What does it mean, and how can I fix it?

warning:[GFAParser] Skipping GFA P line: GFA format error: On pass 1: On line 6061248: Duplicate path CHM13.chr1 exists in graph

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Place stacktrace here.

5. What data and command can the vg dev team use to make the problem happen?
vg paths --metadata -R -x hprc-v1.0-mc-chm13.gfa > hprc-v1.0-mc-chm13_metadata.txt

I downloaded the GFA file and all other CHM13-T2T index files from this GitHub page: https://github.com/human-pangenomics/hpp_pangenome_resources/blob/main/hprc-v1.0-mc.md

6. What does running vg version say?

Place vg version output here:
vg version v1.60.0 "Annicco"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Built by [email protected]

Thank you for all of your help and time!

@jltsiren
Contributor

I'm not sure where the warning about duplicate paths comes from. There are no duplicate paths in the GFA file, and I didn't see any obvious reasons for it in the GFA parsing code. I believe the warning can be safely ignored.
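For readers who want to check the "no duplicate paths" observation on their own copy of the file, here is a minimal sketch. It assumes a tab-separated GFA 1.x file in which the second field of each P-line is the path name. The function name `list_duplicate_p_lines` is made up for illustration. It only looks at P-lines, so it says nothing about how vg names paths that come from other record types.

```shell
# Hypothetical helper: list path names that occur on more than one P-line.
# Assumption: tab-separated GFA 1.x, field 1 = record type, field 2 = path name.
# Only P-lines are inspected; other record types are ignored.
list_duplicate_p_lines() {
    awk -F '\t' '$1 == "P" { print $2 }' "$1" | sort | uniq -d
}

# Usage (prints nothing when every P-line name is unique):
# list_duplicate_p_lines hprc-v1.0-mc-chm13.gfa
```

Empty output would be consistent with the statement above that the file itself contains no duplicate P-line names.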

Beyond that:

  • The HPRC v1.0 graphs are older than the path metadata model currently used in vg. Those graphs do not have reference paths in the sense vg understands them. The reference sequences are stored as P-lines with names of the form sample.contig, which vg interprets as generic paths. Names of the form sample#contig would have been interpreted as reference paths.
  • You can list generic paths using option -G / --generic-paths in the vg paths command.
  • We recommend using HPRC v1.1 graphs, unless you intend to replicate something from a paper using a v1.0 graph.
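Applied to the command from the report, the second point amounts to swapping -R for -G. The sketch below assumes that --metadata combines with -G the same way it did with -R in the original command. The wrapper name `list_generic_paths` and the output file name are made up for illustration.

```shell
# Hypothetical wrapper: print the path metadata table for generic paths only.
# This is the command from the report with -R replaced by -G (--generic-paths).
# Assumption: --metadata can be combined with -G as it was with -R.
list_generic_paths() {
    vg paths --metadata -G -x "$1"
}

# Usage:
# list_generic_paths hprc-v1.0-mc-chm13.gfa > hprc-v1.0-mc-chm13_generic_paths.txt
```

With the v1.0 graph, the sample.contig entries such as CHM13.chr1 would be expected in this listing rather than under -R.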
