Update handling of INFO/END when writing records in VCF #1201
This PR is meant to resolve issue #1200. In particular, it changes how the `write` method for `VariantFile` decides whether to write `INFO/END`. Previously, the code tried to write this field only when there were symbolic alleles, but in practice it only wrote it for insertions when asked for explicitly.

The code now skips writing `INFO/END` if it is not present in the header, and writes it in all cases when it is included in the header. This should allow users to update `END` values using `record.stop`, as in the following examples.

Start with `example.vcf.gz` as:

Here are two blocks of Python code run on it with their respective outputs:
In this case, the output matches `example.vcf.gz` despite editing the `record.stop` positions, because the `END` field is not defined in the header. However, this code block:

produces the following output:
So the user can control whether `END` appears in the INFO fields by choosing whether to include it in the header, and can then access it via `record.stop` as usual. I think it makes more conceptual sense to decide whether to print the field based on the header. Since the `sync` method uses the same formula for determining the `END` coordinate, this should be consistent with the existing paradigm in the other field setters.
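
For readers who want the rule in isolation, here is a toy model in plain Python. It is only a sketch of the behaviour described above, not the code in this PR: `ToyRecord` and `format_info` are hypothetical names, and the model assumes that `stop` is a 0-based exclusive coordinate, so it is numerically equal to a 1-based inclusive `END`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecord:
    """Hypothetical stand-in for a variant record; not pysam's VariantRecord."""
    chrom: str
    pos: int   # 1-based POS, as written in the VCF text
    ref: str
    alt: str
    stop: int  # assumed 0-based exclusive end, like record.stop above
    info: dict = field(default_factory=dict)


def format_info(record: ToyRecord, header_info_ids: set) -> str:
    """Render the INFO column, emitting END only if the header defines it."""
    fields = dict(record.info)
    # In this model END is always derived from stop, never stored directly.
    fields.pop("END", None)
    if "END" in header_info_ids:
        # A 0-based exclusive stop equals a 1-based inclusive END.
        fields["END"] = record.stop
    if not fields:
        return "."
    return ";".join(f"{key}={value}" for key, value in fields.items())


rec = ToyRecord("chr1", 100, "A", "<DEL>", stop=150)
print(format_info(rec, {"DP"}))         # END not in header: prints .
print(format_info(rec, {"DP", "END"}))  # END in header: prints END=150
rec.stop = 200
print(format_info(rec, {"END"}))        # edited stop is reflected: prints END=200
```

The only input that decides whether `END` is printed is the set of INFO IDs in the header. The allele type plays no part, which is the simplification argued for above.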