Improve lovd_getVariantInfo() and lovd_fixHGVS() #574

Merged: 153 commits, merged on Feb 10, 2022


@loeswerkman (Collaborator) commented Nov 17, 2021

Big update on getVariantInfo:

  • Added functionality to deal with repeats;
  • Fixed bugs in the interpretation of positions;
  • Cleaned up the way wildtypes and '?' types were handled;
  • Made warning and error messages even more specific;
  • Implemented decisions on how to handle uncertain variants;
  • Expanded the test case.

Related to #550, #580, #581.
Closes #566.

ifokkema and others added 12 commits October 19, 2021 11:01
NOTE: This is test-driven development; many of these tests currently
fail. The goal of this branch is to improve the function such that the
tests no longer fail.

To make getVariantInfo simpler and easier to use, the two regular
expressions have been merged into one. Secondly, priority has been
given to finding the positions, so that variants are sorted by
position even if the variant as a whole does not follow HGVS syntax
and is therefore ambiguous or implausible. Thirdly, more warning
messages are returned to help the user understand where the problems
in their variant description lie.

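The PR doesn't include the merged regular expression itself. As a rough illustration of the "positions first" idea, here is a hypothetical Python sketch (the pattern, names and warning text are all assumptions, and far simpler than LOVD's PHP): the positions are captured up front, and anything after them that the sketch can't interpret becomes a warning instead of a rejection.

```python
import re

# Hypothetical, simplified pattern (not LOVD's actual regex): the prefix and
# positions are captured first; everything after them lands in "rest", so
# the positions survive even when the rest of the description is malformed.
POSITIONS = re.compile(
    r"^(?P<prefix>[cgmn])\."
    r"(?P<start>[-*]?\d+(?:[-+]\d+)?)"
    r"(?:_(?P<end>[-*]?\d+(?:[-+]\d+)?))?"
    r"(?P<rest>.*)$"
)

# Hypothetical whitelist of variant types this sketch considers well formed.
KNOWN_TYPES = re.compile(
    r"[ACGTN]>[ACGTN]|del[ACGTN]*|dup[ACGTN]*|ins[ACGTN]+|delins[ACGTN]+|inv|="
)


def get_positions(variant):
    """Return positions plus warnings, or None if no positions can be found."""
    match = POSITIONS.match(variant)
    if match is None:
        return None
    warnings = []
    rest = match.group("rest")
    if KNOWN_TYPES.fullmatch(rest) is None:
        # Keep the positions, but tell the user which part was not understood.
        warnings.append(f"Could not interpret {rest!r} after the positions.")
    return {
        "prefix": match.group("prefix"),
        "start": match.group("start"),
        "end": match.group("end") or match.group("start"),
        "warnings": warnings,
    }
```

Because `get_positions("g.100_102deletion")` still returns start 100 and end 102 (plus one warning), such a description can still be sorted by position.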
This new commit fixes the bugs in getVariantInfo and adds several
features. The function can now deal with repeats, and recognises when
a suffix is correctly given to a variant type other than ins or
delins. Most importantly, this commit allows getVariantInfo() to give
truly meaningful warning and error messages when poorly formatted
variants are passed to it.

Secondly, all sorts of cases have been added to the getVariantInfo
test case, to make sure even the strangest formats are tested.
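HGVS describes repeated sequences with the repeat unit followed by its copy number in square brackets, e.g. g.123_191CAG[23]. The PR doesn't show how getVariantInfo() parses these; the following is a hypothetical Python sketch of recognising that notation, including descriptions with more than one unit (all names are assumptions).

```python
import re

# HGVS repeat syntax: the reference range, then one or more repeat units,
# each with its copy number in square brackets, e.g. g.123_191CAG[23].
REPEAT = re.compile(
    r"^(?P<prefix>[cgmn])\.(?P<start>-?\d+)(?:_(?P<end>-?\d+))?"
    r"(?P<units>(?:[ACGTN]+\[\d+\])+)$"
)
UNIT = re.compile(r"([ACGTN]+)\[(\d+)\]")


def parse_repeat(variant):
    """Return positions and (unit, count) pairs, or None if not a repeat."""
    match = REPEAT.match(variant)
    if match is None:
        return None
    units = [(seq, int(count)) for seq, count in UNIT.findall(match.group("units"))]
    return {
        "start": match.group("start"),
        "end": match.group("end") or match.group("start"),
        "units": units,
    }
```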

@ifokkema (Member) left a comment:

Went through part of the code, not done yet, but enough for today 😝

10 review threads on src/inc-lib-init.php (9 on since-outdated code), all resolved.
@ifokkema changed the title from "Improve/get variant info" to "WIP: Improve lovd_getVariantInfo() and lovd_fixHGVS()" Nov 24, 2021
@ifokkema marked this pull request as draft November 24, 2021 13:50

@ifokkema (Member) left a comment:

Just two minor things that I noticed.

7 review threads on src/inc-lib-init.php (all on since-outdated code), all resolved.

These are constructed from VCF files and are translated into
substitutions. We already recognized the NN>N and N>NN types; now we
also recognize variants where either the REF or the ALT was left
empty, like N>. or even .>N. Both of these are actually bugs in the
VCF file generator.

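A hypothetical Python sketch (names and labels are assumptions, not LOVD's) of telling these VCF-style descriptions apart, with "." written for an empty allele:

```python
import re

# VCF-derived "substitutions": REF and ALT may be longer than one base, or
# (due to buggy VCF generators) be left empty, written here as ".".
VCF_STYLE = re.compile(r"^g\.(?P<pos>\d+)(?P<ref>[ACGTN]+|\.)>(?P<alt>[ACGTN]+|\.)$")


def classify_vcf_style(variant):
    """Return a label for a g.<pos><REF>><ALT> description, or None."""
    match = VCF_STYLE.match(variant)
    if match is None:
        return None
    ref, alt = match.group("ref"), match.group("alt")
    if ref == "." and alt == ".":
        return None  # nothing on either side; not a variant at all
    if alt == ".":
        return "empty ALT"  # e.g. N>.
    if ref == ".":
        return "empty REF"  # e.g. .>N
    if len(ref) == 1 and len(alt) == 1:
        return "substitution"  # a regular HGVS substitution, N>N
    return "multi-base"  # e.g. NN>N or N>NN
```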
... based on a start position (given in the format of
lovd_getVariantInfo()) and a variant length (1 being the minimum).

These are variants taken straight from VCF fields, like g.100A>AT,
which should be g.100_101insT, or even more complex variants like
g.100ATGA>AA, which should be g.101_102del.

Empty ALTs are supported, so g.100A>. becomes g.100del.

However, empty REFs are not supported, as it's unclear where the
insertion should take place. Either way, an empty ALT is not valid
VCF.

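The commit message gives three example conversions. One common way to obtain them, assumed here rather than taken from LOVD's PHP code, is to strip the bases REF and ALT share at the start (moving the position right) and at the end, then name what remains. A hypothetical Python sketch, with "." standing for an empty allele:

```python
def vcf_style_to_hgvs(position, ref, alt):
    """Convert g.<position><ref>><alt> to HGVS; "." marks an empty allele.

    Sketch only: no 3' shifting and no detection of duplications.
    """
    if ref == ".":
        # Empty REF: we cannot tell where the insertion should go.
        raise ValueError("Empty REF is not supported.")
    if alt == ".":
        alt = ""
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing changed.")
    # Strip bases shared at the start; each one moves the start position right.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        position += 1
    # Strip bases shared at the end; the start position stays the same.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    if not ref:
        # Pure insertion between the base before `position` and `position`.
        return f"g.{position - 1}_{position}ins{alt}"
    end = position + len(ref) - 1
    span = f"{position}" if end == position else f"{position}_{end}"
    if not alt:
        return f"g.{span}del"
    if len(ref) == 1 and len(alt) == 1:
        return f"g.{position}{ref}>{alt}"
    return f"g.{span}delins{alt}"
```

It does not apply the HGVS 3' rule or detect duplications: `vcf_style_to_hgvs(100, "A", "AA")` returns "g.100_101insA", whereas HGVS would describe that insertion as a duplication after shifting it as far 3' as the reference sequence allows.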
These now throw an ENOTSUPPORTED as long as they don't match the
regex; otherwise, they already threw a WSUFFIXGIVEN. We currently
don't prevent this, as that's for a later step, when we decide to
properly support them. At least this way, we get their positions and
can recognize them, and can therefore allow these variants to be
entered in LOVD.

The first variant is extracted and processed; then the type is
overwritten with ";" and an ENOTSUPPORTED is added. Warnings that
correct HGVS descriptions of combined variants can trigger are
removed. This suffices for now.

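HGVS writes several variants on the same allele inside square brackets, separated by semicolons, e.g. g.[100A>G;200C>T]. A hypothetical Python sketch of the approach described above, where `parse_single` stands in for the real single-variant parser and `ignorable_warnings` for whichever warning codes LOVD actually drops (both are assumptions):

```python
import re


def handle_combined(variant, parse_single, ignorable_warnings=()):
    """Process the first variant of g.[A;B] and flag the whole as unsupported.

    `parse_single` (a hypothetical stand-in for the single-variant parser)
    must return a dict with "type", "warnings" and "errors" keys.
    `ignorable_warnings` lists warning codes to drop (hypothetical).
    """
    match = re.match(r"^([cgmn])\.\[([^\]]+)\]$", variant)
    if match is None or ";" not in match.group(2):
        return parse_single(variant)
    prefix, inner = match.groups()
    # Only the first variant is processed, so its positions are still known.
    info = parse_single(f"{prefix}.{inner.split(';')[0]}")
    info["type"] = ";"
    info["warnings"] = {
        code: message
        for code, message in info["warnings"].items()
        if code not in ignorable_warnings
    }
    info["errors"]["ENOTSUPPORTED"] = "Combined variants are not supported yet."
    return info


def toy_parse(variant):
    """Toy single-variant parser, used only to demonstrate handle_combined()."""
    start = int(re.match(r"^[cgmn]\.(\d+)", variant).group(1))
    return {"position_start": start, "type": "subst", "warnings": {}, "errors": {}}
```

With `toy_parse`, `handle_combined("g.[100A>G;200C>T]", toy_parse)` keeps position 100 from the first variant while marking the whole description as type ";" and unsupported.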
Added tests for lovd_fixHGVS() matching those recently added for
lovd_getVariantInfo().

Positions in the 3' UTR weren't handled well yet.
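HGVS c. numbering marks 5' UTR positions with a minus sign (c.-1 is the base just before c.1), 3' UTR positions after the stop codon with an asterisk (c.*1), and intronic positions as an offset from the nearest exon base (c.123+5, c.124-2). The PR doesn't show how LOVD now handles these internally; as a hypothetical Python sketch, a sort key that orders such positions along the transcript:

```python
import re

# A c. position: optional "-" (5' UTR) or "*" (3' UTR) marker, the base
# number, and an optional intronic offset such as "+5" or "-2".
C_POSITION = re.compile(r"^(?P<utr>[-*]?)(?P<pos>\d+)(?P<offset>[-+]\d+)?$")


def c_position_sort_key(position):
    """Hypothetical sort key ordering c. positions from 5' UTR to 3' UTR."""
    match = C_POSITION.match(position)
    if match is None:
        raise ValueError(f"Not a c. position: {position!r}")
    number = int(match.group("pos"))
    offset = int(match.group("offset") or 0)
    if match.group("utr") == "-":
        # 5' UTR: c.-50 lies upstream of c.-5, so negate to sort correctly.
        return (0, -number, offset)
    if match.group("utr") == "*":
        # 3' UTR: every c.*N position lies after the last coding base.
        return (2, number, offset)
    return (1, number, offset)
```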

Removed some unneeded code, re-sorted two tests, re-added one test.

@ifokkema marked this pull request as ready for review February 10, 2022 10:42
@ifokkema changed the title from "WIP: Improve lovd_getVariantInfo() and lovd_fixHGVS()" to "Improve lovd_getVariantInfo() and lovd_fixHGVS()" Feb 10, 2022
@ifokkema merged commit 2d1d16c into master Feb 10, 2022
@ifokkema deleted the improve/getVariantInfo branch February 10, 2022 10:46
@ifokkema added this to the 3.0 Build 28 milestone Jul 15, 2022
Development: successfully merging this pull request may close the issue "Improve lovd_getVariantInfo() and lovd_fixHGVS()".
2 participants