-
Notifications
You must be signed in to change notification settings - Fork 546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve -00:00
offset
#1042
base: main
Are you sure you want to change the base?
Preserve -00:00
offset
#1042
Conversation
e044499
to
e42b18a
Compare
e42b18a
to
d357688
Compare
d357688
to
d2d5956
Compare
Do you have an actual use case for this? |
Ai, why do you have to ask that 🤣 ? |
As a better reason: we would follow the RFCs more closely, and be able to round-trip these timestamps. |
In general, I would concur we want to follow the RFCs closely. But at the same time, "you find it interesting" to know this doesn't seem like a strong justification for adding ~90 lines of code on net. |
That is fair. Honestly the changes felt small, so i did some counting:
|
src/format/parse.rs
Outdated
@@ -853,7 +855,7 @@ fn test_rfc2822() { | |||
("Tue, 20 Jan 2015 17:35:90 -0800", Err(OUT_OF_RANGE)), // bad second | |||
("Tue, 20 Jan 2015 17:35:20 -0890", Err(OUT_OF_RANGE)), // bad offset | |||
("6 Jun 1944 04:00:00Z", Err(INVALID)), // bad offset (zulu not allowed) | |||
("Tue, 20 Jan 2015 17:35:20 HAS", Err(NOT_ENOUGH)), // bad named time zone | |||
("Tue, 20 Jan 2015 17:35:20 HAS", Err(INVALID)), // bad named time zone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would now report the correct error here.
src/format/parse.rs
Outdated
// named single-letter timezone "J" is specifically not valid | ||
("Tue, 20 Jan 2015 17:35:20 J", Err(NOT_ENOUGH)), | ||
("Tue, 20 Jan 2015 17:35:20 J", Err(INVALID)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very compelled by this standards-following PR. I thought something wasn't quite right about how the various RFC and +00:00
and -00:00
were being handled but I couldn't put my finger on it (and it seemed like a difficult bramblebush to crawl through).
Thanks much @pitdicker , this is a great change!
src/format/parse.rs
Outdated
@@ -853,7 +855,7 @@ fn test_rfc2822() { | |||
("Tue, 20 Jan 2015 17:35:90 -0800", Err(OUT_OF_RANGE)), // bad second | |||
("Tue, 20 Jan 2015 17:35:20 -0890", Err(OUT_OF_RANGE)), // bad offset | |||
("6 Jun 1944 04:00:00Z", Err(INVALID)), // bad offset (zulu not allowed) | |||
("Tue, 20 Jan 2015 17:35:20 HAS", Err(NOT_ENOUGH)), // bad named time zone | |||
("Tue, 20 Jan 2015 17:35:20 HAS", Err(INVALID)), // bad named time zone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add extra asserts
("20 Jan 2015 17:35:20 +0000", Ok("Tue, 20 Jan 2015 17:35:20 +0000")),
("20 Jan 2015 17:35:20 -0001", Ok("Tue, 20 Jan 2015 17:35:20 -0001")),
("Tue, 20 Jan 2015 17:35:20 -9900", Err(OUT_OF_RANGE)), // bad offset
(adjusted to whatever the correct value is)
src/lib.rs
Outdated
|
||
#[test] | ||
#[allow(deprecated)] | ||
fn test_type_sizes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious if this has been tested on an x86 (32 bit) platform?
IME FreeBSD x86 and OpenBSD in-general is always little weird about sizings of things (often subtle differences from Linux anyway).
If I get around to it, I'll try to compile this PR on a FreeBSD 13 x86 VM.
// of +0:00 and -0:00. | ||
// Advantage of this encoding is that it only takes a single shift right to get offset in | ||
// seconds. | ||
local_minus_utc: NonZeroI32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this code comment a docstring?
In the docstring, would you tell the user what helper functions they should call to extract various infos? Can you provide those helper functions? Can you have this PR use those bit-shift helper functions?
Even if it's only chrono developers that need to know, (even if those docstrings aren't actually published) IMO it's still worthwhile to write it down in "official-seeming" manner.
Otherwise I might assume I have to do the bit shifts and masks myself... and those are programming approaches I would like everyone to move on from. 😬
On the other hand ...
honestly, I'd prefer to have discrete struct members for each piece of information embedded here. I'm not persuaded the runtime efficiency-savings is worth the mental overhead.
Also, I'm not sure, but I'm concerned it might be a problem on less-common Big Endian platforms (I don't recall where and when Rust gives guarantees around bit endianness).
(IIUC, combining various information into one NonZeroI32
member is only done for a little less memory usage at runtime?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
honestly, I'd prefer to have discrete struct members for each piece of information embedded here. I'm not persuaded the runtime efficiency-savings is worth the mental overhead.
This is somewhat comparable to NaiveDate
, which fits a year, ordinal, leap-year flag and weekday flags in an i32
.
I agree that using individual struct members is usually better. But chrono's DateTime<FixedOffset>
is a pretty fundamental type in the rust ecosystem. It should not increase in size without very good reason.
Also, I'm not sure, but I'm concerned it might be a problem on less-common Big Endian platforms (I don't recall where and when Rust gives guarantees around bit endianness).
In safe Rust, we do not have to worry about this. As soons as we start transmuting it becomes messy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this code comment a docstring?
In the docstring, would you tell the user what helper functions they should call to extract various infos? Can you provide those helper functions? Can you have this PR use those bit-shift helper functions?
Even if it's only chrono developers that need to know, (even if those docstrings aren't actually published) IMO it's still worthwhile to write it down in "official-seeming" manner.
I would like to keep the implementation details private.
But we definitely have helper functions which this PR uses: local_minus_utc()
and no_offset_info()
.
I'll mention them in a comment:
// Use `local_minus_utc()` to get the offset in seconds, and `no_offset_info()` to inspect the
// `OFFSET_UNKNOWN` flag.
Can you find an appropriate docstring to place this information (especiially the quoted RFC stuff)? This is a great explanation of little finicky area that I suspect quite a few users will run into. |
Interestingly, this was a common case that was causing failures in |
Good idea, I have written an introduction for the |
c31316b
to
e070c8a
Compare
Great if this would help with an existing test 😄. |
60c0f59
to
f6f0495
Compare
f59ed7b
to
d82ca0e
Compare
fd30238
to
5ecc449
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve still, with suggestion.
a289068
to
af2da5a
Compare
7582a3f
to
efe03c9
Compare
efe03c9
to
68a7c7a
Compare
68a7c7a
to
bdd511e
Compare
bdd511e
to
0f13ac6
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1042 +/- ##
=======================================
Coverage 92.16% 92.16%
=======================================
Files 40 40
Lines 18052 18101 +49
=======================================
+ Hits 16637 16683 +46
- Misses 1415 1418 +3 ☔ View full report in Codecov by Sentry. |
0f13ac6
to
d1cbfea
Compare
d1cbfea
to
9f0ee28
Compare
9f0ee28
to
cc18784
Compare
RFC 3339 and RFC 2822 make a distinction between an offset of
+00:00
and-00:00
.RFC 2822:
From RFC3339:
Currently chrono drops the distinction, but I would like to preserve it.
format/scan.rs
was already written with the functionality of this RFCs in mind, but did not go all the way. It now follows the specs better and can sometimes give better parser errors.Parsed
to preserve this information is a bit messy as all the methods are public. Adding a public constantNO_OFFSET_INFO
seemed the cleanest solution. Parsing doesn't support more than two digits for the offset hours, limiting the values that are passed toParsed::set_offset
to99 * 3600 + 59 * 60 = 359940
. The value ofNO_OFFSET_INFO
is way outside that range, so no chance for conflicts.FixedOffset
can be modified in a reasonably clean way to encode-0
: shift the value 2 bits left and encode this special state as anOFFSET_UNKNOWN
flag in the rightmost bits. Advantage of this encoding is that it only takes a single shift right to get the offset in seconds, so this should not measurably impact performance.NonZeroI32
, as long as we set one of the rightmost bits to something non-null with theOFFSET_NORMAL
flag. This givesFixedOffset
a niche.Option<DateTime<FixedOffset>>
is now the same size asDateTime<FixedOffset>
: 16 bytes instead of 20.-00:00
is treated as equal to+00:00
, which is usually what is expected. Also this way we remain backwards compatible.FixedOffset::no_offset_info
returns whether the offset is-00:00
.FixedOffset::OFFSET_UNKNOWN
can be used to initialize an offset to-00:00
.