-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some kind of breakend enhancement #64
Comments
…so include breakend alt alleles. Added new GenomicVariant.Builder class for more convenient builder-based construction Added new GenomicVariant.mateId and GenomicVariant.eventId variables Added new simpler GenomicVariant static constructors not requiring an id field.
Relaxed the requirements for creating breakends in the same way as other symbolic alleles. If the GenomicBreakend API is required, then the symbolic version can be converted relatively easily: // Breakends can be treated as a symbolic variant
GenomicVariant bnd = vcfConverter.convertSymbolic(vcfConverter.parseContig("1"), "bnd_U", 12345, 12345, "C", "C[2:321682[", 0, "bnd_V", "tra2");
assertInstanceOf(GenomicVariant.class, bnd);
// or they can be converted to a specialised breakend variant
GenomicVariant bndb = vcfConverter.convertBreakend(vcfConverter.parseContig("1"), "bnd_U", 12345, ConfidenceInterval.precise(), "C", "C[2:321682[", ConfidenceInterval.precise(), "bnd_V", "tra2");
assertInstanceOf(GenomicBreakendVariant.class, bndb);
VcfBreakendResolver vcfBreakendResolver = new VcfBreakendResolver(b37);
GenomicBreakendVariant breakendVariant = vcfBreakendResolver.resolveBreakend(bnd);
assertThat(breakendVariant, equalTo(bndb));
assertThat(bnd.contig(), equalTo(breakendVariant.contig()));
assertThat(bnd.length(), equalTo(breakendVariant.length()));
assertThat(bnd.ref(), equalTo(breakendVariant.ref()));
// C[2:321682[
String breakendAltValue = VcfBreakendFormatter.makeAltVcfField(breakendVariant);
assertThat(bnd.alt(), equalTo(breakendAltValue));
assertThat(bnd.mateId(), equalTo(breakendVariant.mateId()));
assertThat(bnd.eventId(), equalTo(breakendVariant.eventId())); |
Breakend and Symbolic versions of breakend are not fully symmetrical. i.e. a // Breakends can be treated as a symbolic variant
GenomicVariant bnd = vcfConverter.convertSymbolic(vcfConverter.parseContig("1"), "bnd_U", 12345, 12345, "C", "C[2:321682[", 0, "bnd_V", "tra2");
assertInstanceOf(GenomicVariant.class, bnd);
// or they can be converted to a specialised breakend variant
GenomicVariant bndb = vcfConverter.convertBreakend(vcfConverter.parseContig("1"), "bnd_U", 12345, ConfidenceInterval.precise(), "C", "C[2:321682[", ConfidenceInterval.precise(), "bnd_V", "tra2");
assertInstanceOf(GenomicBreakendVariant.class, bndb);
// C[2:321682[
String breakendAltValue = VcfBreakendFormatter.makeAltVcfField(breakendVariant);
assertThat(bnd.alt(), equalTo(breakendAltValue));
assertThat(bnd.alt(), equalTo("C[2:321682["))
assertThat(bndb.alt(), equalTo("C[2:321682[")) // FAIL!! bndb.alt() is empty, in this case - high WTF score here. Having A solution to this might be to store the sequence of ref and alt in a new |
@ielis How does this sound as a solution to maximise ease of use and hopefully reduce the WTF factor: // consider the following breakend variant in VCF:
// 1 12345 bnd_U C C[2:321682[ 6 PASS SVTYPE=BND;MATEID=bnd_V;EVENT=tra2
// Breakends can be treated as a symbolic variant
GenomicVariant bnd = vcfConverter.convertSymbolic(vcfConverter.parseContig("1"), "bnd_U", 12345, 12345, "C", "C[2:321682[", 0, "bnd_V", "tra2");
// or they can be converted to a specialised breakend variant
GenomicBreakendVariant bndb = vcfConverter.convertBreakend(vcfConverter.parseContig("1"), "bnd_U", 12345, "C", "C[2:321682[", "bnd_V", "tra2");
// note that the input values are the SAME as the returned values. No WTF here
assertThat(bnd.isSymbolic(), equalTo(true));
assertThat(bnd.isBreakend(), equalTo(true));
assertThat(bnd.contig().name(), equalTo("1"));
assertThat(bnd.coordinates(), equalTo(Coordinates.oneBased(12345, 12345)));
assertThat(bnd.ref(), equalTo("C"));
assertThat(bnd.alt(), equalTo("C[2:321682["));
// note that the input values are often DIFFERENT to the returned values. High WTF here.
assertThat(bndb.isSymbolic(), equalTo(true));
assertThat(bndb.isBreakend(), equalTo(true));
assertThat(bndb.contig().name(), equalTo("1"));
assertThat(bndb.coordinates(), equalTo(Coordinates.oneBased(12346, 12345))); // would expect Coordinates.oneBased(12345, 12345)
assertThat(bndb.ref(), equalTo("C")); // if this was on the - strand a G would be returned here.
assertThat(bndb.alt(), equalTo("")); // would expect "C[2:321682["
// To get all these values the VcfBreakendFormatter needs to be used
// C[2:321682[
String breakendAltValue = VcfBreakendFormatter.makeAltVcfField(bndb);
// However the GenomicBreakendVariant can be converted back to a symbolic genomic variant again (see commit [36dc08d844b37e7343a60d08640a82f952fc9bbf](https://github.com/exomiser/svart/commit/36dc08d844b37e7343a60d08640a82f952fc9bbf))
// this uses the VcfBreakendFormatter under the hood
GenomicVariant symbolicBreakend = breakendVariant.toSymbolicGenomicVariant();
assertThat(bnd, equalTo(symbolicBreakend)); ProposalRetain the current methods and constructor signatures but:
In detail this would mean: // Current
/**
* @param eventId The VCF event ID
* @param left The left {@link GenomicBreakend}
* @param right The right {@link GenomicBreakend}
* @param ref The REF allele <b>AS IT WOULD BE REPRESENTED ON THE LEFT BREAKEND STRAND</b>
* @param alt Any inserted bases <b>AS IT WOULD BE REPRESENTED ON THE RIGHT BREAKEND STRAND</b> if any, or empty.
* @return a {@link GenomicBreakendVariant} composed of the input values.
*/
static GenomicBreakendVariant of(String eventId, GenomicBreakend left, GenomicBreakend right, String ref, String alt) {
return DefaultGenomicBreakendVariant.of(eventId, left, right, ref, alt);
}
// Proposed
/**
* @param eventId The VCF event ID
* @param left The left {@link GenomicBreakend}
* @param right The right {@link GenomicBreakend}
* @param breakendRef The REF allele <b>AS IT WOULD BE REPRESENTED ON THE LEFT BREAKEND STRAND</b>
* @param insertSeq Any inserted bases <b>AS IT WOULD BE REPRESENTED ON THE RIGHT BREAKEND STRAND</b> if any, or empty.
* @return a {@link GenomicBreakendVariant} composed of the input values.
*/
static GenomicBreakendVariant of(String eventId, GenomicBreakend left, GenomicBreakend right, String breakendRef, String insertSeq) {
return DefaultGenomicBreakendVariant.of(eventId, left, right, breakendRef, insertSeq);
} public interface GenomicBreakendVariant extends GenomicVariant {
// added fields
/**
* The value of the ref allele as found on the left breakend strand.
*/
String breakendRef();
/**
* The inserted sequence (if any) between the mated breakends.
*/
String insertSeq();
} The values of strand(), coordinates(), ref() and alt() will the calculated from the left(), right(), breakendRef() and insertSeq() to return the symbolic All of which which means this, in the end: // consider the following breakend variant in VCF:
// 1 12345 bnd_U C C[2:321682[ 6 PASS SVTYPE=BND;MATEID=bnd_V;EVENT=tra2
// Breakends can be treated as a symbolic variant
GenomicVariant bnd = vcfConverter.convertSymbolic(vcfConverter.parseContig("1"), "bnd_U", 12345, 12345, "C", "C[2:321682[", 0, "bnd_V", "tra2");
// or they can be converted to a specialised breakend variant
GenomicBreakendVariant bndb = vcfConverter.convertBreakend(vcfConverter.parseContig("1"), "bnd_U", 12345, "C", "C[2:321682[", "bnd_V", "tra2");
// note that the input values are the SAME as the returned values. No WTF here
assertThat(bnd.isSymbolic(), equalTo(true));
assertThat(bnd.isBreakend(), equalTo(true));
assertThat(bnd.contig().name(), equalTo("1"));
assertThat(bnd.strand(), equalTo(Strand.POSITIVE));
assertThat(bnd.coordinates(), equalTo(Coordinates.oneBased(12345, 12345)));
assertThat(bnd.ref(), equalTo("C"));
assertThat(bnd.alt(), equalTo("C[2:321682["));
// note that the input values are the SAME as the returned values. No WTF here either
assertThat(bndb.isSymbolic(), equalTo(true));
assertThat(bndb.isBreakend(), equalTo(true));
assertThat(bndb.contig().name(), equalTo("1"));
assertThat(bndb.strand(), equalTo(Strand.POSITIVE)); // this could be NEGATIVE
assertThat(bndb.coordinates(), equalTo(Coordinates.oneBased(12345, 12345))); // same as symbolic version
assertThat(bndb.ref(), equalTo("C")); // same as symbolic version
assertThat(bndb.alt(), equalTo("C[2:321682[")); // same as symbolic version
assertThat(bndb.breakendRef(), equalTo("C")); // if this was on the - strand a G would be returned here.
assertThat(bndb.insertSeq(), equalTo("")); I think that this will really minimise the differences between symbolic and breakend variants yet still allow applications such as Svanna to utilise the |
Found some TODOs but can't quite fully grok them right now.
The text was updated successfully, but these errors were encountered: