Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Casting a DATE literal without time part to TIME datatype fails #1657

Open
wants to merge 43 commits into
base: BABEL_3_X_DEV
Choose a base branch
from

Conversation

Suthapalli-Ramya-satya-vasavi-srija
Copy link
Contributor

@Suthapalli-Ramya-satya-vasavi-srija Suthapalli-Ramya-satya-vasavi-srija commented Jul 27, 2023

Description

If different kind of inputs are given to time datatype like <DATE>T<TIME>, a date literal string to a TIME datatype when the time component in the string is not present and inputs are of types YYYYMMDD, YYMMDD and if month is mentioned in text format YYYY MON DD then the PG doesn't allows casting and throw errors.

This commit fixes the following issues

  • Considers format <DATE>T<TIME>[Z] as valid and return expected result.
  • Considers formats YYYY, YYYYMMDD and YYMMDD as valid and return expected result.
  • If month is mentioned in text format considered date formats YYYY MON DD, YYYY DD MON, MON DD YYYY, MON YYYY DD, DD MON YYYY, DD YYYY MON.
  • Accepts Empty String('') to time datatype.
  • Accepts only DATE literal without time part to time datatype. Accepted formats are DD-MON-YYYY, YYYY-MON-DD, MM{-/.}DD{-/.}YYYY, YYYY{-/.}MM{-/.}DD.
  • If timezone is specified then validates the timezone whether correctly mentioned or not i.e., timezone should be of format [+-]\\d{1,2}:\\d{1,2}.
  • If [AP]M specified then made sure that it is present after the time in the input.

Issues Resolved

BABEL-1541, BABEL-1528, BABEL-1539

Test Scenarios Covered

  • Use case based -

    • Added tests whether date of formats YYYY MON DD, YYYY DD MON, MON DD YYYY, MON YYYY DD, DD MON YYYY, DD YYYY MON are accepted.
    • Added tests whether <DATE>T<TIME>[Z] is valid and expected result is sent
    • Added tests whether "[AP]M" is being accepted after time specified in the input
    • Added tests for dates of format YYYY, YYYYMMDD and YYMMDD.
    • Added tests to accept timezone of format [+-]\\d{1,2}:\\d{1,2}
  • Boundary conditions -

    • Added tests for time range by having minutes, seconds>=60 and hours>=24
    • Added tests for dates by giving invalid month/date/year for each date format
  • Arbitrary inputs -

    • Along with the time given different inputs which include [ap]m offset
    • Added tests for [AP]M by giving inputs like "am", "pm"
    • Added test for time by not including min/sec/microseconds like 1::, ::1, :1:, 1.2, 1:2.2
    • Added test for date by not including year/month/day like 1990--, -12-, --12
    • Added tests for format <DATE>T<TIME>[Z] by giving 't' and 'z'
  • Negative test cases -

    • Added tests with invalid month and date tests where date is of formats YYYY MON DD, YYYY DD MON, MON DD YYYY, MON YYYY DD, DD MON YYYY, DD YYYY MON.
    • Added tests for format <DATE>T<TIME>[Z] by separating 'T' and 'Z' from
    • Added tests for inputs with "[AP]M" by adding in-front of time or the input doesn't contain time at all.
    • Added tests for dates of format YYYY, YYYYMMDD and YYMMDD by passing invalid date/month values
    • Added tests for timezone of format [+-]\\d{1,2}:\\d{1,2} by not following pattern like giving "+2", "-8" without having :\\d{1,2} part
  • Minor version upgrade tests - Added the test to minor version schedule files.

  • Major version upgrade tests - N/A

  • Performance tests - N/A

  • Tooling impact - N/A

  • Client tests - N/A

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

For time datatype PG considers only date value(ex: '2012-02-23') as bad
i/p format. So above testcase is throwing error from engine code
time_in() much before coming to TDS side. The current time casting
doesn't support i/p having month in text format(ex: '20120-jan-3 2:3')
and whitespaces between the time for casting to different formats.
However, SQL Server allows casting a date literal string to a TIME
datatype when the time component in the string is missing: o/p becomes
00:00:00.0000000. SQL Server also allows whitespaces in between and
supports i/p of month in text when casting to time datatype.

This commit fixes the issue such that date format with missing time info
like above considers as valid input when the dialect is TSQL. Supports
month in text format while casting to time datatype. Supports
whitespaces between the time.

Task: BABEL-1528
Signed-off-by: vasavi suthapalli <[email protected]>
contrib/babelfishpg_tsql/src/tsqlIface.cpp Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/tsqlIface.cpp Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/tsqlIface.cpp Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved


-- tsql
-- when month is given in nymeric format either date has ['-', '/']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let us add tests case where data is coming from columns (datime, datetime2 and more if supported) rather than string. Also, input is coming after applying expressions ((a+b) cast as ) and going as input to expression (a + (b cast as))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, will add in next revision.

test/JDBC/expected/babel_time-vu-prepare.out Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/tsqlIface.cpp Outdated Show resolved Hide resolved
test/JDBC/expected/babel_time-vu-verify.out Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/tsqlIface.cpp Outdated Show resolved Hide resolved
test/JDBC/expected/babel_time-vu-verify.out Outdated Show resolved Hide resolved
~~ERROR (Message: Conversion failed when converting date and/or time from character string.)~~


SELECT CAST('11-2000-23' AS TIME)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let us use different dates rather than same value. Also, include corner cases like 1 Jan, feb and dec etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will add edge cases for different types of inputs.

@@ -0,0 +1,925 @@
-- psql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let us add cases where we have invalid date/time values (too small, too large). Also get date/time from a table column rather than as expression. Add cases where expression comes after applying some operator like case, +/- etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will add that in next revision.

contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
*/
for(int k = 0 ; k < 3; k++)
{
temp_field = strtok(temp_field, different_date_formats[k]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we not overwriting temp_field again without freeing earlier memory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because when tried to pfree resulting in crash.

contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
*/
if(ftype[0] == DTK_DATE)
{
ftype[1] = DTK_TIME;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we always have ftype/fields allocated irrespective of number of fields (nfs)? Here we have nf =1 but we are assuming that f*[1] is also available

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we are initializing the fields/ftype with max count at initialization. Later when we go deep into the code the field/ftype values are checked based on the nf count.

contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
*/
if (ftype[1] == DTK_STRING && pg_strcasecmp(field[1], "t") == 0 && ftype[2] == DTK_TIME)
{
/* If the input contains "t" then throw error */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is confusing, why would you throw error if input contains "t"? or is it that a T in input is converted to "t" in field[1]? also why are we only checking for "t", PG support both t/T and SQL server only T

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While parsing all the [A-Z] will be converted to [a-z], so the fields will always contain [a-z]. But for the input format "T" the "T" should always be in caps. So we are checking whether the original str contains "T" or "t", if "t" is present then we are throwing the error. All of this done only after checking that the str is of forma "T".

if (regcomp(&time_regex, pattern, REG_EXTENDED) != 0)
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("time format regex failed")));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should just say time format internal error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about, we just send the same message as conversion failed?
Like merge both the regcomp and regexec in the same if condition.

* `Mon{/.-}yyyy{/.-}dd` or `Mon{/.-}dd{/.-}yyyy then
* an error should be thrown
*/
if (isTextMonthPresent(temp_field))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this possible, we are putting Text month at position 2 (in 1 to 3)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are putting the Text Month to middle only if the input contains date parts separated like "DD MON YYYY" but earlier didn't put the "MON-DD-YYYY" the pattern which doesn't even fall under category of "MON DD YYYY". So separate handling has been done.

contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/src/hooks.c Outdated Show resolved Hide resolved
contrib/babelfishpg_common/src/datetime.c Outdated Show resolved Hide resolved
contrib/babelfishpg_common/src/datetime.c Outdated Show resolved Hide resolved
test/JDBC/upgrade/latest/schedule Outdated Show resolved Hide resolved
*/
while (i < len_str)
{
if (str[i] == ' ' && (i == (len_str - 1) || str[i+1] == ' ' || str[i+1] == ':' || str[i+1] == '/' || str[i+1] == '.' || str[i+1] == '-'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how it will convert value 2 : 1 to 2:1, wouldn't it be 2: 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition else if (str[i] == ':' || str[i] == '/' || str[i] == '.' || str[i] == '-') will convert.
Explanation for 2: 1 her after coming to position 1 i.e., ate ":" will add the current value and ignores spaces if present after ":".

nf = nf - 2;
for(int i = 1; i < nf; i++)
{
field[i] = field[i+2];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we free last two fields if possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't be possible as the fields are not initialized using palloc so freeing the fields will result in crash.

contrib/babelfishpg_common/src/datetime.c Outdated Show resolved Hide resolved
*/
else
{
for (int k = 0 ; k < 3; k++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this loop if we are interested in first field only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field can be of pattern like {"Mon{/.-}yyyy{/.-}dd", "Mon{/.-}dd{/.-}yyyy", "DD{/.-}MM{/.-}YYYY", "DD{/.-}YYYY{/.-}MM", "DD-MON-YYYY", "YYYY-MON-DD", "MM{-/.}DD{-/.}YYYY", "YYYY{-/.}MM{-/.}DD"}. It will be hard to come to conclusion of what can be the first part of field before any of [/.-]. That's why this for loop will get the part before first occurance of [/.-]. Which will be helpful in furthur analysis.

* If the modified_str is of format "<DATE>T<TIME>", then the
* <TIME> field should be of format hh:mm:ss[.sss]
* where hh should definitely be 2 digit.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments are not clear, first one seems to indicate that only sss is optional but pattern says whole mm:ss.sss is optional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the pattern might look like mm:ss.sss is optional but in general it is not optional. To confirm the pattern validation added tests for those in JDBC test file.

return false;

/* Throw a common error message while casting to time datatype */
#define TIME_IN_ERROR() \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have we made sure that we have a test case for each code path which can throw a time_in_error in this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests in the JDBC test file, where we get TIME_IN_ERROR() cases i.e., negative cases for each type.

Copy link
Contributor

@rishabhtanwar29 rishabhtanwar29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we have covered all the tests cases mentioned in the test plan which has been reviewed by DBEs.

@rishabhtanwar29 rishabhtanwar29 self-requested a review January 18, 2024 13:25
@coveralls
Copy link
Collaborator

coveralls commented Jan 23, 2024

Pull Request Test Coverage Report for Build 7635999089

Warning: This coverage report may be inaccurate.

We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
To ensure accuracy in future PRs, please see these guidelines.
A quick fix for this PR: rebase it; your next report should be accurate.

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.07%) to 72.512%

Totals Coverage Status
Change from base Build 7616106904: 0.07%
Covered Lines: 40324
Relevant Lines: 55610

💛 - Coveralls

/* Checks whether the field is valid text month */
static bool isTextMonthPresent(char* field)
{
char* months[] = {"january", "february", "march", "april", "may",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should test one non-english language/collation to make sure that all text inputs are handling different languages properly.

StringInfo res;
regex_t time_regex;
char *pattern, *temp_field;
int len_str = strlen(str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let us check for NULL str and return false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually don't need to separately check for NULL and return false. The SQL Server accepts NULL string as input

-- SQL Server
1> select cast('' as time)
2> go
                      
----------------------
      00:00:00.0000000

This particular case also been handled.

"november", "december", "jan", "feb", "mar", "apr",
"may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"};
for (int i = 0; i < 24; i++)
if (pg_strcasecmp(field, months[i]) == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are doing case compare, is it guaranteed that input will always be in lower case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is guaranteed that the field will always have the lower case because while parsing all the letters will be converted to lower case

*/
if (i == nf - 1 || ftype[i+1] != DTK_STRING)
TIME_IN_ERROR();
if (pg_strcasecmp(field[i+1], "am") == 0 || pg_strcasecmp(field[i+1], "pm") == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure that value is always in lower case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While parsing in the fields all the letters will be converted to lower case.

for (int k = 0; k < len; k++)
{
appendStringInfo(res, "%c", temp_field[k]);
if ((len == 6 && ((k + 1) % 2 == 0 && k < 5)) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just say (k == 1 or k == 3)?

* If "am/pm" is present in the input then the previous field cannot be 'z'.
* For example "1 pm Z" is valid but "1 Z pm" is invalid.
*/
if(i-1 >= 0 && ftype[i-1] == DTK_STRING && pg_strcasecmp(field[i-1], "z") == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for single field, you can compare characters directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a little help in understanding.

  1. Are you suggesting not to use pg_strcasecmp?
  2. Or we don't need that many(i-1>=0, field[i-1] == DTK_STRING) checks here?

* For example error will be thrown if the field is "[+-]02",
* and no error will be thrown if field is "[+-]02:00"
*/
pattern = "([0-9]?[0-9]:[0-9]?[0-9])$";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are we validating that only +/- are before offset? on PG side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in the function ParseDateTime the field is determined as DTK_TZ only if the input contains {+-} in-front.

* If there are no errors while parsing the modified_str then
* validate the fields based on their types.
*/
if (dterr == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a big if block, can we break relevant functionality into separate smaller functions

case 0:
case 1:
/*
* If only date is specified or if the input is empt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty

if (ftype[1] == DTK_STRING && pg_strcasecmp(field[1], "t") == 0 && ftype[2] == DTK_TIME)
{
/*
* Since the modified_str is of format "<DATE>T<TIME>" and if modified_str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not clear

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants