Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Add tandem_dup_count to Junction #147

Merged

Conversation

Irallia
Copy link
Collaborator

@Irallia Irallia commented Aug 4, 2021

Resolves first part of #143.

A characteristic of tandem duplications is the number of copies. I store this in the new variable int16_t tandem_dup_count. It is 0 if there is no duplication (e.g. insertion (novel element), deletion, inversion ...).

@Irallia Irallia self-assigned this Aug 4, 2021
@Irallia Irallia linked an issue Aug 4, 2021 that may be closed by this pull request
@Irallia Irallia requested review from a team and SGSSGene and removed request for a team August 4, 2021 10:49
@codecov
Copy link

codecov bot commented Aug 4, 2021

Codecov Report

Merging #147 (a125b1c) into master (15392df) will increase coverage by 0.17%.
The diff coverage is 100.00%.

❗ Current head a125b1c differs from pull request most recent head 4df660e. Consider uploading reports for the commit 4df660e to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #147      +/-   ##
==========================================
+ Coverage   94.75%   94.93%   +0.17%     
==========================================
  Files          18       18              
  Lines         706      731      +25     
==========================================
+ Hits          669      694      +25     
  Misses         37       37              
Impacted Files Coverage Δ
include/structures/cluster.hpp 100.00% <100.00%> (ø)
include/structures/junction.hpp 95.65% <100.00%> (+0.19%) ⬆️
...ules/clustering/hierarchical_clustering_method.cpp 98.90% <100.00%> (+0.01%) ⬆️
...ules/sv_detection_methods/analyze_cigar_method.cpp 100.00% <100.00%> (ø)
...sv_detection_methods/analyze_split_read_method.cpp 95.08% <100.00%> (+0.08%) ⬆️
src/structures/cluster.cpp 97.01% <100.00%> (+0.71%) ⬆️
src/structures/junction.cpp 92.00% <100.00%> (+3.11%) ⬆️
src/variant_detection/variant_output.cpp 98.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15392df...4df660e. Read the comment docs.

@@ -27,11 +32,15 @@ bool operator<(Junction const & lhs, Junction const & rhs)
: lhs.get_mate2() != rhs.get_mate2()
? lhs.get_mate2() < rhs.get_mate2()
: lhs.get_inserted_sequence() < rhs.get_inserted_sequence();
// TODO (23.7.21, irallia): tandem_dup_count doesn't play a role here right?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshuak94 @joergi-w If the amount of duplications is bigger or smaller between junctions, does not play a role for size comparsion right? Just for equality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be weird...
In general, the following must hold: If a < b and b < a are both false, then a = b.
This would be violated if you omit the duplication count.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah, you mean a duplication is larger if it contains more duplications?
And thus, one junction is larger than another if this is the case.
Right, but we look before whether a junction is smaller from the position perspective (mate < mate). The question is, does it matter how high the tandem_dup_count is and if so where would you put that? Is that more relevant than the length of the inserted sequence?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, I haven't yet fully understood what a junction is...
Can two junctions with equal positions and sequence lengths possibly have a different duplication count? If yes, I would include it with the lowest priority.
My point is, if any of the fields in class junction differ, there must be a clear order (so you can only omit field comparisons that are indirectly determined, or that are omitted in the equality comparison as well).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now thought about it again.
A junction describes a change in the read in comparison to the ref (e.g. insertion, duplication, deletion). It consists of two endpoints (breakends), a possibly inserted sequence and the name of the read and now also the number of duplications.
If we now compare two junctions, we want to know if they

  1. share approximately the same positions (breakend 1 and 2), then
  2. if they contain a similar number of duplications and
  3. the inserted sequence is of similar length.

Important to note, for a novel element insertion the inserted sequence is the whole new sequence, for a duplication it is the duplicated sequence, this can be shorter if there are multiple duplications. Therefore, I now compare first on duplication and then on the sequence.
The content of the sequence is not compared yet, but that is still planned.

include/structures/junction.hpp Outdated Show resolved Hide resolved
include/structures/junction.hpp Outdated Show resolved Hide resolved
@Irallia Irallia requested a review from SGSSGene August 6, 2021 09:41
Copy link

@SGSSGene SGSSGene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Irallia Irallia requested a review from joergi-w August 6, 2021 13:30
@Irallia Irallia force-pushed the FEATURE/add_tandem_dup_count_to_junctions branch from 6867c64 to 48a7d05 Compare August 10, 2021 17:26
Copy link
Member

@joergi-w joergi-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far! Two remarks:

@@ -13,6 +13,7 @@ class Junction
Breakend mate1{};
Breakend mate2{};
seqan3::dna5_vector inserted_sequence{};
int16_t tandem_dup_count{};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to use uint16_t?
Intuitively a count is always positive.

@@ -14,6 +14,7 @@ int32_t const chrom1_position2 = 94734377;
int32_t const chrom1_position3 = 112323345;
std::string const chrom2 = "chr2";
int32_t const chrom2_position1 = 234432;
int16_t tandem_dup_count = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use const?
However, it would be good to write at least one test with more than 0 tandem duplications...

@Irallia
Copy link
Collaborator Author

Irallia commented Aug 11, 2021

sorry, it has become a bit more with the test, because functionalities for clustering were also missing

@Irallia Irallia requested a review from joergi-w August 11, 2021 15:30
Copy link
Member

@joergi-w joergi-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments on data types again...

src/structures/cluster.cpp Outdated Show resolved Hide resolved
test/api/clustering_test.cpp Outdated Show resolved Hide resolved
src/structures/cluster.cpp Outdated Show resolved Hide resolved
src/structures/cluster.cpp Outdated Show resolved Hide resolved
src/structures/cluster.cpp Outdated Show resolved Hide resolved
@Irallia Irallia requested a review from joergi-w August 12, 2021 13:41
Copy link
Member

@joergi-w joergi-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@Irallia Irallia force-pushed the FEATURE/add_tandem_dup_count_to_junctions branch from a125b1c to 4df660e Compare August 18, 2021 12:10
@Irallia Irallia merged commit 92cc315 into seqan:master Aug 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Call Tandem Duplications
3 participants