Fix date parsing regression introduced in 2018 #6009
Comments
I agree this change was not very helpful. I feel it was mostly motivated by an interest in upgrading to newer Java APIs, without much consideration of users' requirements. As a user, I see more options than "interpreting in local time" or "interpreting in UTC". There is a third option: "interpreting as a date(time) without clear timezone info". I think that third option should be the default behaviour for a data cleaning tool, especially for dates (where timezones are even more rarely specified). The word "interpreting" might also be a bit abstract and hard to relate to concrete user workflows, so it's worth trying to pin it down in concrete terms. For instance, when loading a CSV file with datatype auto-detection:
I want the year, month and day components of the dates to be exactly what they were in the source file, and I want them to stay the same afterwards, even if I created the project in the UTC+2 timezone and re-opened it later in the UTC-5 timezone. I want those dates to be preserved in OpenRefine and also when I export the project to any format. I would expect the same behaviour when parsing datetime objects with no timezone attached. In my opinion, this is different from interpreting date(time) objects in the local timezone, because if we did so, the rendering of those objects would have to change depending on that local timezone (to preserve the interpretation). |
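To make the distinction concrete, here is a minimal sketch using plain `java.time`. It is not OpenRefine code; the class and method names (`FloatingDateSketch`, `floating`, `anchoredThenViewed`) are hypothetical and exist only for illustration. It contrasts a date kept without any zone against the same date anchored to the zone in which the project was created and later rendered in another zone.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class FloatingDateSketch {

    // "Floating" interpretation: no zone is attached, so the year, month and
    // day components are exactly those found in the source text.
    static LocalDate floating(String text) {
        return LocalDate.parse(text);
    }

    // Zone-anchored interpretation: the date becomes midnight in the zone
    // where the project was created, then is rendered in the viewer's zone.
    static LocalDate anchoredThenViewed(String text, ZoneOffset createdIn, ZoneOffset viewedIn) {
        ZonedDateTime anchored = LocalDate.parse(text).atStartOfDay(createdIn);
        return anchored.withZoneSameInstant(viewedIn).toLocalDate();
    }

    public static void main(String[] args) {
        ZoneOffset plusTwo = ZoneOffset.ofHours(2);
        ZoneOffset minusFive = ZoneOffset.ofHours(-5);

        // Components are unchanged regardless of where the project is opened.
        System.out.println(floating("2018-05-01"));

        // Midnight at UTC+2 is still the previous evening at UTC-5,
        // so the rendered day shifts back by one.
        System.out.println(anchoredThenViewed("2018-05-01", plusTwo, minusFive));
    }
}
```

Under these assumptions, the floating value keeps `2018-05-01` everywhere, while the anchored value is shown as `2018-04-30` once viewed from UTC-5.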
I made this comment on the forum as well but repeating here:
|
OK, I clearly should have used a better title. Hopefully the updated title is clearer. This is only about fixing the regression, not adding new functionality. The only question in my mind is whether fixing the regression is worse than leaving it as is.
Sure, but that's a significant chunk of new functionality to implement, and making it the default would not be backwards compatible. We have #533, which I reopened last year, to cover fixing cases where extra information is being unnecessarily added/derived.
But changing to the new APIs added no technical benefit OR user benefit and, additionally, was done incorrectly, making the new code more confusing because it misuses the APIs. It's a sad waste of everyone's time. |
Tom, I understand your frustration but I think there is no point in escalating the blame on this contributor who has long left the project by now. A lot of his contributions were made in a personal capacity. Let's stay focused on the possible solutions to the problem and keep the discussion constructive. |
Any opinions on my question?
|
I don't know exactly what you mean by "fixing the regression". In your first message you propose to "interpret in local time", but I am not sure I understand exactly what you mean by that, which is why I tried to phrase the problem differently in my first reply above. I think there are multiple approaches, each of which could be better or worse than the status quo:
In any case the move should be motivated by concrete use cases where we'd be able to validate the improvement from a user perspective. |
Users are currently not aware of what is happening with their datetimes most of the time. We should make them aware and then give them the opportunity to make a choice at import. Maybe a dialog? The smart default should be local time, and we should inform the user of this at import. I have never supported hiding any transformations OpenRefine does, and that is the current situation that should be fixed. So can OpenRefine inform the user at import? |
Historically, backward compatibility was a core value for the project. That is the user-related motivation for this. This regression broke that backward compatibility, but now we have 5 years of product in the field with the broken behavior (vs 8 years prior), so it's a damned if you do, damned if you don't situation.
This doesn't have anything to do with Java implementation classes, but rather user visible behavior. The simplest way to understand it is to look at the old and new test behavior: OpenRefine/main/tests/server/src/com/google/refine/expr/functions/ToDateTests.java Lines 94 to 101 in b7db27c
Fixing the regression would mean uncommenting the first test and adjusting the implementation code so that the test passes.
If you mean the "Attempt to parse cell text into numbers" import option, it doesn't have anything to do with dates. The affected code is the I'm not proposing (at least here) to:
I guess maybe I should just put up a PR for people to review to make things more concrete. |
As you say @tfmorris, "damned if you do, damned if you don't". Given this, my vote is:
|
I agree with @ostephens above on next steps for this issue. |
@ostephens @thadguidry I'm counting on you to follow through with issues/documentation that implement your proposed solution. I've dropped my work on a code based solution. |
@tfmorris @ostephens Please review my new PR which adds an admonition to our docs in both GREL Reference for Dates and the Dates user guide. Hopefully I've captured the provenance well enough, and noted one benefit for users. |
The current behavior of interpreting date strings without a timezone as being in UTC was a breaking change introduced in May 2018 without any announcement (as far as I can tell). The fact that the tests were changed should have been a red flag that something bad was going on.
I first discovered this in 2020 and added a ToDo about it to the relevant tests and mentioned it in the comments to #3027, but apparently didn't create an issue, so remedying that now.
Current Results
All dates and datetime strings are interpreted as being at UTC when parsed.
Expected Behavior
I believe we should go back to the historical and, to me, more logical behavior of interpreting dates as being in the local timezone rather than as being at UTC.
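The difference between the current and the proposed behavior can be sketched with plain `java.time`. This is only an illustration under stated assumptions, not OpenRefine's `toDate` implementation; the class and method names (`ZoneInterpretationSketch`, `parseAsUtc`, `parseInZone`) are hypothetical, and the zone is passed in explicitly so the result does not depend on the machine running it.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class ZoneInterpretationSketch {

    // Current behavior as described in this issue: a string with no zone
    // information is treated as if it were written in UTC.
    static Instant parseAsUtc(String text) {
        return LocalDateTime.parse(text).toInstant(ZoneOffset.UTC);
    }

    // Proposed (pre-3.0 style) behavior: the same string is treated as wall
    // clock time in a given zone, e.g. ZoneId.systemDefault() in practice.
    static Instant parseInZone(String text, ZoneId zone) {
        return LocalDateTime.parse(text).atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        String text = "2018-05-01T10:00:00";
        ZoneId minusFive = ZoneOffset.ofHours(-5); // stand-in for a user's local zone

        System.out.println(parseAsUtc(text));             // 2018-05-01T10:00:00Z
        System.out.println(parseInZone(text, minusFive)); // 2018-05-01T15:00:00Z
    }
}
```

The same input text therefore maps to two different instants, five hours apart in this example, depending on which interpretation is applied at parse time.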
Versions
OpenRefine <=2.7 - correct
OpenRefine 3.0+ - broken
Additional context
This is a second breaking change to undo the first breaking change, which is less than ideal, but I believe it's also more logical to the user to have things interpreted in the local timezone.