"Total" sequence length doesn't include organelles #403

Open
2 tasks done
muffato opened this issue Sep 17, 2024 · 1 comment
Labels
bug Something isn't working

muffato commented Sep 17, 2024

Before opening an issue, please:

  • Make sure you are using the latest version using datasets --version
  • Review our documentation

Describe the bug

Hello NCBI!

The assembly GCA_964199945.1 is reported as having a "Total Sequence Length" of 1,327,610,284 bp, but the FASTA file actually contains 1,328,070,353 bp. The difference, 460,069 bp, is exactly the combined length of the mitochondrial (MT) and plastid sequences.

The documentation at https://www.ncbi.nlm.nih.gov/datasets/docs/v2/reference-docs/command-line/dataformat/tsv/dataformat_tsv_genome/ lists the field as:

assmstats-total-sequence-len    Assembly Stats Total Sequence Length

To Reproduce

$ datasets summary genome accession GCA_964199945.1 --as-json-lines | dataformat tsv genome --fields assmstats-total-sequence-len --elide-header
1327610284
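
For anyone who wants to cross-check the reported value against the FASTA file, here is a minimal Python sketch that sums sequence lengths per record. It is only an illustration: the function name `fasta_lengths`, the toy records, and the `reported_total` value are all made up for the example and are not data from GCA_964199945.1.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_id: length_in_bp} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token after '>' as the ID.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence line: every remaining character counts as one base.
            lengths[current] += len(line)
    return lengths


# Toy records for illustration only, NOT real data from the assembly.
toy_fasta = [
    ">chr1 toy nuclear chromosome",
    "ACGTACGTAC",
    "GTAC",
    ">MT toy mitochondrion",
    "ACGTA",
]

lengths = fasta_lengths(toy_fasta)
fasta_total = sum(lengths.values())

# Hypothetical stand-in for the assmstats-total-sequence-len value.
reported_total = 14

print(lengths)
print("bp in FASTA but not in reported total:", fasta_total - reported_total)
```

On a real download, the same function can be fed an open file handle (for example `with open("genome.fna") as fh: fasta_lengths(fh)`, where the file name is a placeholder). Records whose lengths add up to the gap are the ones excluded from the reported total.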

Expected behavior

I would expect the "total" sequence length to include every sequence in the assembly, organelles included. Otherwise, I would call it the length of the "nuclear" genome only.

Best regards,
Matthieu

muffato added the bug label Sep 17, 2024
olearyna (Contributor) commented

Hi muffato

Thank you for highlighting this issue. I agree that it could be clearer, and we’ll work on improving it.

Nuala
