Position and interval notation #46
h-2 started this conversation in Design Questions
There are typically two notations:
1. `[beg, end)`
2. `[beg, end]`
Some formats use 1., others use 2.
We would like to use 1. everywhere, but this means that when reading a SAM or VCF record, the reported `.pos()` will differ from the position in the plaintext file and the position shown by `samtools`/`bcftools`. That's why I initially proposed to use whatever the common formats use, even if this is inconsistent within the library.

However, with the introduction of `bio::genome_region` and subregion reading, this becomes more difficult. The algorithm to compute an overlap, for example, needs to know whether the interval is half-open or closed. It's possible to add a template parameter to `genome_region` that differentiates between half-open and closed, but this adds complexity 😒

I see three solutions:
`genome_region`. Document everywhere which notation is used.
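
To make the overlap point concrete, here is a minimal sketch of how the computation depends on the notation. Everything in it (`interval_kind`, `region`, `overlap_length`, `to_half_open`) is a hypothetical stand-in invented for illustration; it is not the actual `bio::genome_region` interface, and it only assumes that a region carries a `beg` and an `end` coordinate.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical tag (not part of the library): how the end coordinate is interpreted.
enum class interval_kind { half_open, closed };

// Hypothetical stand-in for a region type; NOT the actual bio::genome_region.
template <interval_kind kind>
struct region
{
    int64_t beg;
    int64_t end;
};

// Number of positions shared by two regions of the same kind (0 if they are disjoint).
template <interval_kind kind>
constexpr int64_t overlap_length(region<kind> const & a, region<kind> const & b)
{
    int64_t const lo = std::max(a.beg, b.beg);
    int64_t const hi = std::min(a.end, b.end);
    // A closed interval includes `end` itself, so it holds one more position.
    int64_t const len = hi - lo + (kind == interval_kind::closed ? 1 : 0);
    return std::max<int64_t>(len, 0);
}

// Convert closed [beg, end] to the equivalent half-open [beg, end + 1).
constexpr region<interval_kind::half_open> to_half_open(region<interval_kind::closed> const & r)
{
    return {r.beg, r.end + 1};
}
```

With these definitions, the same pair of numbers gives different answers: `{0, 10}` and `{10, 20}` share no position when read as half-open, but share position 10 when read as closed. Converting at the boundary with something like `to_half_open` would be one way to keep a single notation internally, at the cost of the reported positions no longer matching the file.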