You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When parsing FASTA headers containing "0" or "99999" as a date value, augur parse fails with an unhelpful error message instead of handling the invalid date more gracefully.
Steps to Reproduce
Create a test FASTA file:
echo -e ">someid|0\nACTG"> test.fa
Run augur parse:
augur parse --sequences test.fa --output-sequences out.fa --output-metadata out.tsv --fields id date --fix-dates dayfirst
Current Behavior
The command fails with error:
ERROR: Invalid date '0-XX-XX': Date does not match format `%Y-%m-%d`.
Expected Behavior
--fix-dates is documented as an attempt to parse non-standard dates that "should be spot-checked". The documented behavior of fix_dates is to return the input string if fixing fails and emit a warning. So it should never crash.
Stacktrace
The stack trace is actually quite interesting, showing how convoluted the current implementation is:
print("WARNING: unable to parse %s as date"%d, e, file=sys.stderr)
returnd
Traceback (most recent call last):
File "/Users/corneliusromer/code/augur_master/augur/parse.py", line 41, in fix_dates
results = parsing.parse_datetime_string_with_reso(d, dayfirst=dayfirst)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'pandas._libs.tslibs.parsing' has no attribute 'parse_datetime_string_with_reso'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pandas/_libs/tslibs/parsing.pyx", line 440, in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso
File "pandas/_libs/tslibs/parsing.pyx", line 667, in pandas._libs.tslibs.parsing.dateutil_parse
ValueError: day is out of range for month
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/corneliusromer/code/augur_master/augur/parse.py", line 44, in fix_dates
results = parsing.parse_time_string(d, dayfirst=dayfirst)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pandas/_libs/tslibs/parsing.pyx", line 367, in pandas._libs.tslibs.parsing.parse_time_string
File "pandas/_libs/tslibs/parsing.pyx", line 445, in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso
pandas._libs.tslibs.parsing.DateParseError: day is out of range for month
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/corneliusromer/code/augur_master/augur/dates/init.py", line 120, in get_numerical_date_from_value
ambig_date = AmbiguousDate(value, fmt=fmt).range(min_max_year=min_max_year)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/corneliusromer/code/augur_master/augur/dates/ambiguous_date.py", line 48, in init
self.assert_only_less_significant_uncertainty()
File "/Users/corneliusromer/code/augur_master/augur/dates/ambiguous_date.py", line 138, in assert_only_less_significant_uncertainty
if "X" in self.uncertain_date_components["Y"]:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/corneliusromer/code/augur_master/augur/dates/ambiguous_date.py", line 97, in uncertain_date_components
raise InvalidDate(self.uncertain_date,
augur.dates.errors.InvalidDate: Invalid date '0-XX-XX': Date does not match format %Y-%m-%d.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/corneliusromer/code/augur_master/blackbox.py", line 94, in <module>
run_probes()
File "/Users/corneliusromer/code/augur_master/blackbox.py", line 80, in run_probes
result, warnings = probe_date_parser(test_input, dayfirst)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/corneliusromer/code/augur_master/blackbox.py", line 20, in probe_date_parser
result = fix_dates(test_input, dayfirst)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/corneliusromer/code/augur_master/augur/parse.py", line 60, in fix_dates
parsed_date = get_numerical_date_from_value(d, "%Y-%m-%d")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/corneliusromer/code/augur_master/augur/dates/init.py", line 122, in get_numerical_date_from_value
raise AugurError(str(error)) from error
augur.errors.AugurError: Invalid date '0-XX-XX': Date does not match format %Y-%m-%d.
Environment
current master of Augur (26.0.0)
Additional Notes
The documentation states that date parsing is a "brittle process that should be spot-checked", suggesting the tool should be more resilient to invalid date formats rather than failing completely.
I discovered this while trying to migrate fix_dates to be compatible with pandas v2. Given the complexity of the current implementation, I thought I'll start treating it as a blackbox, discover current behavior and then reimplement it with the help of unit tests in a future proof way.
It turns out that there are unhandled edge cases in the current implementation.
The text was updated successfully, but these errors were encountered:
corneliusroemer
changed the title
parse: crash on invalid dates "0", "99999" instead of handling gracefully as promised in docstring
parse: crash on invalid dates "0", "99999" instead of handling gracefully as intended
Nov 10, 2024
Description
When parsing FASTA headers containing "0" or "99999" as a date value, augur parse fails with an unhelpful error message instead of handling the invalid date more gracefully.
Steps to Reproduce
Current Behavior
The command fails with error:
Expected Behavior
--fix-dates
is documented as an attempt to parse non-standard dates that "should be spot-checked". The documented behavior offix_dates
is to return the input string if fixing fails and emit a warning. So it should never crash.Stacktrace
The stack trace is actually quite interesting, showing how convoluted the current implementation is:
augur/augur/parse.py
Lines 28 to 65 in 8731851
Environment
current master of Augur (26.0.0)
Additional Notes
The documentation states that date parsing is a "brittle process that should be spot-checked", suggesting the tool should be more resilient to invalid date formats rather than failing completely.
I discovered this while trying to migrate
fix_dates
to be compatible with pandas v2. Given the complexity of the current implementation, I thought I'll start treating it as a blackbox, discover current behavior and then reimplement it with the help of unit tests in a future proof way.It turns out that there are unhandled edge cases in the current implementation.
The text was updated successfully, but these errors were encountered: