-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Casting a DATE literal without time part to TIME datatype fails #1657
base: BABEL_3_X_DEV
Are you sure you want to change the base?
Casting a DATE literal without time part to TIME datatype fails #1657
Conversation
For time datatype PG considers only date value(ex: '2012-02-23') as bad i/p format. So above testcase is throwing error from engine code time_in() much before coming to TDS side. The current time casting doesn't support i/p having month in text format(ex: '20120-jan-3 2:3') and whitespaces between the time for casting to different formats. However, SQL Server allows casting a date literal string to a TIME datatype when the time component in the string is missing: o/p becomes 00:00:00.0000000. SQL Server also allows whitespaces in between and supports i/p of month in text when casting to time datatype. This commit fixes the issue such that date format with missing time info like above considers as valid input when the dialect is TSQL. Supports month in text format while casting to time datatype. Supports whitespaces between the time. Task: BABEL-1528 Signed-off-by: vasavi suthapalli <[email protected]>
|
||
|
||
-- tsql | ||
-- when month is given in nymeric format either date has ['-', '/'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let us add tests case where data is coming from columns (datime, datetime2 and more if supported) rather than string. Also, input is coming after applying expressions ((a+b) cast as ) and going as input to expression (a + (b cast as))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will add in next revision.
~~ERROR (Message: Conversion failed when converting date and/or time from character string.)~~ | ||
|
||
|
||
SELECT CAST('11-2000-23' AS TIME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let us use different dates rather than same value. Also, include corner cases like 1 Jan, feb and dec etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will add edge cases for different types of inputs.
@@ -0,0 +1,925 @@ | |||
-- psql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let us add cases where we have invalid date/time values (too small, too large). Also get date/time from a table column rather than as expression. Add cases where expression comes after applying some operator like case, +/- etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will add that in next revision.
contrib/babelfishpg_tsql/src/hooks.c
Outdated
*/ | ||
for(int k = 0 ; k < 3; k++) | ||
{ | ||
temp_field = strtok(temp_field, different_date_formats[k]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we not overwriting temp_field again without freeing earlier memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because when tried to pfree resulting in crash.
contrib/babelfishpg_tsql/src/hooks.c
Outdated
*/ | ||
if(ftype[0] == DTK_DATE) | ||
{ | ||
ftype[1] = DTK_TIME; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we always have ftype/fields allocated irrespective of number of fields (nfs)? Here we have nf =1 but we are assuming that f*[1] is also available
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we are initializing the fields/ftype
with max count at initialization. Later when we go deep into the code the field/ftype
values are checked based on the nf
count.
contrib/babelfishpg_tsql/src/hooks.c
Outdated
*/ | ||
if (ftype[1] == DTK_STRING && pg_strcasecmp(field[1], "t") == 0 && ftype[2] == DTK_TIME) | ||
{ | ||
/* If the input contains "t" then throw error */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is confusing, why would you throw error if input contains "t"? or is it that a T in input is converted to "t" in field[1]? also why are we only checking for "t", PG support both t/T and SQL server only T
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While parsing all the [A-Z] will be converted to [a-z], so the fields will always contain [a-z]. But for the input format "T" the "T" should always be in caps. So we are checking whether the original str contains "T" or "t", if "t" is present then we are throwing the error. All of this done only after checking that the str is of forma "T".
contrib/babelfishpg_tsql/src/hooks.c
Outdated
if (regcomp(&time_regex, pattern, REG_EXTENDED) != 0) | ||
ereport(ERROR, | ||
(errcode(ERRCODE_INTERNAL_ERROR), | ||
errmsg("time format regex failed"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should just say time format internal error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about, we just send the same message as conversion failed?
Like merge both the regcomp and regexec in the same if condition.
contrib/babelfishpg_tsql/src/hooks.c
Outdated
* `Mon{/.-}yyyy{/.-}dd` or `Mon{/.-}dd{/.-}yyyy then | ||
* an error should be thrown | ||
*/ | ||
if (isTextMonthPresent(temp_field)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this possible, we are putting Text month at position 2 (in 1 to 3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are putting the Text Month to middle only if the input contains date parts separated like "DD MON YYYY" but earlier didn't put the "MON-DD-YYYY" the pattern which doesn't even fall under category of "MON DD YYYY". So separate handling has been done.
…ore detailed explanation
… which was failing earlier
*/ | ||
while (i < len_str) | ||
{ | ||
if (str[i] == ' ' && (i == (len_str - 1) || str[i+1] == ' ' || str[i+1] == ':' || str[i+1] == '/' || str[i+1] == '.' || str[i+1] == '-')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how it will convert value 2 : 1 to 2:1, wouldn't it be 2: 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition else if (str[i] == ':' || str[i] == '/' || str[i] == '.' || str[i] == '-')
will convert.
Explanation for 2: 1
her after coming to position 1 i.e., ate ":" will add the current value and ignores spaces if present after ":".
nf = nf - 2; | ||
for(int i = 1; i < nf; i++) | ||
{ | ||
field[i] = field[i+2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we free last two fields if possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't be possible as the fields are not initialized using palloc so freeing the fields will result in crash.
*/ | ||
else | ||
{ | ||
for (int k = 0 ; k < 3; k++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this loop if we are interested in first field only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field can be of pattern like {"Mon{/.-}yyyy{/.-}dd", "Mon{/.-}dd{/.-}yyyy", "DD{/.-}MM{/.-}YYYY", "DD{/.-}YYYY{/.-}MM", "DD-MON-YYYY", "YYYY-MON-DD", "MM{-/.}DD{-/.}YYYY", "YYYY{-/.}MM{-/.}DD"}. It will be hard to come to conclusion of what can be the first part of field before any of [/.-]. That's why this for loop will get the part before first occurance of [/.-]. Which will be helpful in furthur analysis.
* If the modified_str is of format "<DATE>T<TIME>", then the | ||
* <TIME> field should be of format hh:mm:ss[.sss] | ||
* where hh should definitely be 2 digit. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments are not clear, first one seems to indicate that only sss is optional but pattern says whole mm:ss.sss is optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the pattern might look like mm:ss.sss is optional but in general it is not optional. To confirm the pattern validation added tests for those in JDBC test file.
return false; | ||
|
||
/* Throw a common error message while casting to time datatype */ | ||
#define TIME_IN_ERROR() \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have we made sure that we have a test case for each code path which can throw a time_in_error in this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests in the JDBC test file, where we get TIME_IN_ERROR() cases i.e., negative cases for each type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, we have covered all the tests cases mentioned in the test plan which has been reviewed by DBEs.
Pull Request Test Coverage Report for Build 7635999089Warning: This coverage report may be inaccurate.We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
💛 - Coveralls |
…s by changing the regex patterns
…verage is not fulfilled and modified comments
/* Checks whether the field is valid text month */ | ||
static bool isTextMonthPresent(char* field) | ||
{ | ||
char* months[] = {"january", "february", "march", "april", "may", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should test one non-english language/collation to make sure that all text inputs are handling different languages properly.
StringInfo res; | ||
regex_t time_regex; | ||
char *pattern, *temp_field; | ||
int len_str = strlen(str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let us check for NULL str and return false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually don't need to separately check for NULL and return false. The SQL Server accepts NULL string as input
-- SQL Server
1> select cast('' as time)
2> go
----------------------
00:00:00.0000000
This particular case also been handled.
"november", "december", "jan", "feb", "mar", "apr", | ||
"may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"}; | ||
for (int i = 0; i < 24; i++) | ||
if (pg_strcasecmp(field, months[i]) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are doing case compare, is it guaranteed that input will always be in lower case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is guaranteed that the field will always have the lower case because while parsing all the letters will be converted to lower case
*/ | ||
if (i == nf - 1 || ftype[i+1] != DTK_STRING) | ||
TIME_IN_ERROR(); | ||
if (pg_strcasecmp(field[i+1], "am") == 0 || pg_strcasecmp(field[i+1], "pm") == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure that value is always in lower case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While parsing in the fields all the letters will be converted to lower case.
for (int k = 0; k < len; k++) | ||
{ | ||
appendStringInfo(res, "%c", temp_field[k]); | ||
if ((len == 6 && ((k + 1) % 2 == 0 && k < 5)) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just say (k == 1 or k == 3)?
* If "am/pm" is present in the input then the previous field cannot be 'z'. | ||
* For example "1 pm Z" is valid but "1 Z pm" is invalid. | ||
*/ | ||
if(i-1 >= 0 && ftype[i-1] == DTK_STRING && pg_strcasecmp(field[i-1], "z") == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for single field, you can compare characters directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a little help in understanding.
- Are you suggesting not to use
pg_strcasecmp
? - Or we don't need that many(
i-1>=0
,field[i-1] == DTK_STRING
) checks here?
* For example error will be thrown if the field is "[+-]02", | ||
* and no error will be thrown if field is "[+-]02:00" | ||
*/ | ||
pattern = "([0-9]?[0-9]:[0-9]?[0-9])$"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where are we validating that only +/- are before offset? on PG side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in the function ParseDateTime
the field is determined as DTK_TZ
only if the input contains {+-} in-front.
* If there are no errors while parsing the modified_str then | ||
* validate the fields based on their types. | ||
*/ | ||
if (dterr == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a big if block, can we break relevant functionality into separate smaller functions
case 0: | ||
case 1: | ||
/* | ||
* If only date is specified or if the input is empt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty
if (ftype[1] == DTK_STRING && pg_strcasecmp(field[1], "t") == 0 && ftype[2] == DTK_TIME) | ||
{ | ||
/* | ||
* Since the modified_str is of format "<DATE>T<TIME>" and if modified_str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not clear
Description
If different kind of inputs are given to
time
datatype like<DATE>T<TIME>
, a date literal string to a TIME datatype when the time component in the string is not present and inputs are of typesYYYYMMDD
,YYMMDD
and if month is mentioned in text formatYYYY MON DD
then the PG doesn't allows casting and throw errors.This commit fixes the following issues
<DATE>T<TIME>[Z]
as valid and return expected result.YYYY
,YYYYMMDD
andYYMMDD
as valid and return expected result.YYYY MON DD
,YYYY DD MON
,MON DD YYYY
,MON YYYY DD
,DD MON YYYY
,DD YYYY MON
.DD-MON-YYYY
,YYYY-MON-DD
,MM{-/.}DD{-/.}YYYY
,YYYY{-/.}MM{-/.}DD
.[+-]\\d{1,2}:\\d{1,2}
.[AP]M
specified then made sure that it is present after the time in the input.Issues Resolved
BABEL-1541, BABEL-1528, BABEL-1539
Test Scenarios Covered
Use case based -
YYYY MON DD
,YYYY DD MON
,MON DD YYYY
,MON YYYY DD
,DD MON YYYY
,DD YYYY MON
are accepted.<DATE>T<TIME>[Z]
is valid and expected result is sentYYYY
,YYYYMMDD
andYYMMDD
.[+-]\\d{1,2}:\\d{1,2}
Boundary conditions -
Arbitrary inputs -
[ap]m
offset[AP]M
by giving inputs like "am", "pm"<DATE>T<TIME>[Z]
by giving 't' and 'z'Negative test cases -
YYYY MON DD
,YYYY DD MON
,MON DD YYYY
,MON YYYY DD
,DD MON YYYY
,DD YYYY MON
.<DATE>T<TIME>[Z]
by separating 'T' and 'Z' fromYYYY
,YYYYMMDD
andYYMMDD
by passing invalid date/month values[+-]\\d{1,2}:\\d{1,2}
by not following pattern like giving "+2", "-8" without having:\\d{1,2}
partMinor version upgrade tests - Added the test to minor version schedule files.
Major version upgrade tests - N/A
Performance tests - N/A
Tooling impact - N/A
Client tests - N/A
Check List
By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.