Datetime Extension #80
Thank you for the carefully thought out RFC! I agree that having some support for dates/times would be useful, but unfortunately, this proposal is out of bounds of what's analyzable. Specifically, we cannot support the proposed time representation and operators for two reasons.
First, if we keep everything else the same, then it's not possible to enforce the well-formedness constraints on time components (year, month, day, hours, minutes, seconds) without using quantifiers, which would put the encoding into the undecidable fragment of SMT. This is because Cedar lets you put arbitrary types into sets, and to ensure that every time value in a set is well-formed, we would have to use quantifiers.
Second, even if we were able to ground the quantifiers by imposing restrictions on how times can be used (e.g., time values can't be included in sets, either directly or transitively as part of another value), enforcing and solving these ground time constraints would still be expensive. For each occurrence of a datetime value, we would need to emit an assumption that says if the year field corresponds to a leap year, and if the month is February, then the month has 29 days, otherwise it has 28 days. We would also need to emit an if-then-else condition that covers all the other months too. Note that determining if a given year is a leap year itself requires checking multiple conditions. Specifically, it is not enough to check if the year is divisible by four (see here).
To retain analyzability, we can go the IAM route (see here) and support date comparisons, perhaps with some duration operations as well. The IAM operators are analyzable because we can internally represent time values as Unix epochs, which are unconstrained 64-bit integers, and the IAM operations on this representation reduce to long comparisons.
We are free to make the string representation of times richer than what IAM supports as long as the internal representation is longs, and the operators on times are expressible as simple operations on longs.
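To illustrate the second point, here is a minimal Python sketch (not part of the RFC; all function names are hypothetical) of the case analysis that a well-formedness check on a record-like date would need for every datetime value, next to the single integer comparison that an epoch representation needs.

```python
def is_leap_year(year: int) -> bool:
    # Gregorian rule: divisible by 4, except century years not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year: int, month: int) -> int:
    # The if-then-else over months that would have to be emitted per value.
    if month == 2:
        return 29 if is_leap_year(year) else 28
    if month in (4, 6, 9, 11):
        return 30
    return 31


def is_well_formed(year: int, month: int, day: int) -> bool:
    # Range constraints on the components of a record-like date.
    return 1 <= month <= 12 and 1 <= day <= days_in_month(year, month)


def before(epoch_a: int, epoch_b: int) -> bool:
    # With an epoch representation, comparison is one operation on longs.
    return epoch_a < epoch_b
```

The first three functions are what each datetime occurrence would drag into the encoding; the last one is all that the epoch-based comparison requires.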
Thanks, @emina, for taking a look! A few ideas / questions in what follows...
I'm not tied to the proposed representation. I am hopeful we can define an abstract data type with useful, and practical time operations. If using a long instead of a "record like" is the way to get there, then so be it. Please forgive the ridiculous questions and thoughts I'm replying with, as I'm more or less ignorant in Automated Theorem Proving, SAT, and SMT. As specified in this document, the
Can we prove these operations under a simpler calendar model, and then let implementations choose the most appropriate calendar model (cough Gregorian) for their implementation? Most general purpose programming languages have robust date/time facilities, so it stands to reason that different implementations of Cedar will adopt them, rather than start from scratch. I believe that the operations requested in this API spec are valid under any calendar that has years, months, days, hours, minutes, seconds, and whose measurements (instants in time) form a partial order.
The current IAM operations are insufficient to express many of the examples in this RFC. Adding additional functions and a duration type could improve the situation, but I think we'll still fall short. There wouldn't be a facility for policies that have periodic decision making abilities (e.g. "every day between $x and $y"), which we've identified as important for our customers.
I'll try to clarify some of the concerns around the proposed API and explain better (hopefully :) why the current API is not analyzable. To a first approximation, the best we can do is IAM plus a few duration operations (TBD). Let's dig into why this is the case.
These are great questions :). And your point is exactly right for the runtime implementation of the To understand the issues here, it helps to have a mental model of how the analysis works. The analyzer is a compiler: it takes as input a set of Cedar policies, a property we want to check about them (e.g., are they equivalent?), and compiles these policies into the input language of the SMT solver. The solver then checks if the property holds and, if not, it returns a concrete input (a request and entities) that violates the property. It helps to think of the solver as testing the encoded policies on the (infinite) set of all possible inputs.
Now, the language that the solvers support is very limited compared to a general purpose language. And the practical subset of this language is even more limited than what the SMT standard suggests, because some operators in the SMT language are either undecidable (the solver can run forever trying to test the policies) or extremely inefficient to reason about (the solver will come back but not in any reasonable time). So, what's expressible in Cedar is limited to what we can efficiently compile to this core functional language: think, no loops or recursion, machine integers (aka bitvectors), some string operators, some set operators, some map operators (uninterpreted functions), structs, conditionals, and boolean operators. If you're curious, here are some more details about the compilation process (Section 4).
Given this limited compilation target, we can't actually translate the This is why we don't actually encode the parsing of Decimals or IPAddresses either. Note that the strict validator creates a loophole for us to avoid parsing in the solver: it rejects attempts to call the decimal constructor on a non-literal, e.g.,
Ok, so if we can't translate the datetime parser to SMT, how would we enforce the desired constraints on date values?
The only way to do so is by emitting extra formulas that say the date components are within the right ranges. Think of it as emitting a giant conditional expression that says "if all date occurrences are well-formed, then the property we care about holds."
Unfortunately, not for analysis. We can definitely do this in general (in Lean), but for the analysis to work, we must be able to express each operation in the very limited SMT language.
I totally understand that a richer set of operators would be desirable. But if a datetime operation cannot be expressed (roughly) as a loop-free expression over integers, it's out of bounds of what's analyzable. The IAM operators have the nice property of being expressible as < and <= over longs. We can push this a bit further than IAM, but not to the point where we can support very complex calculations, like extracting the year or month from an epoch. I'd be happy to chat more offline (to save us some typing :) about whether there is a middle ground that we can design. I'm hoping we can find a reasonable design that combines IAM++ and some of the alternatives discussed in the RFC.
Based on all the feedback so far (Thank you!!!!!), it seems like I should reposition this RFC to be Alt C, with the API discussed in the comments, and millisecond precision. I'll try to post a rewrite by the end of the week.
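As a rough illustration of the "loop-free expression over integers" idea, the Python sketch below parses a literal date string once, outside any solver, and leaves only a comparison on integers. The helper name `to_epoch_millis`, the millisecond unit, and treating a missing offset as UTC are assumptions made for this example, not something the thread specifies.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso: str) -> int:
    # Parse the literal up front; a missing offset is treated as UTC (assumption).
    dt = datetime.fromisoformat(iso)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # Dividing two timedeltas with // yields a plain integer count of milliseconds.
    return (dt - EPOCH) // timedelta(milliseconds=1)


# After parsing, the policy condition is just `<=` over integers.
deadline = to_epoch_millis("2024-08-21T20:07:23")
now = to_epoch_millis("2024-08-20T09:00:00")
allowed = now <= deadline
```

Extracting the year or month from `deadline` would need calendar arithmetic again, which is exactly the kind of operation described above as out of bounds.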
I agree that |
I think I like this proposal, but I am looking for more details. Spelling them out may reveal gotchas that are relevant to approval. For example, I think what's going on is that a I'd also like to see the basic approach to validation spelled out. If you look at the recent proposals for function macros or embedded attribute maps, etc. you'll see these features written out carefully. |
text/0080-datetime-extension.md
To construct and manipulate these types we will provide the functions listed below.
All of this functionality will be hidden behind a `datetime` feature flag (analogous to the current decimal and IP extensions), allowing users to opt-out.

- `datetime(string)` constructs a datetime value. Like with other extension function constructors, strict validation requires `string` to be a string literal, although evaluation/authorization support any string-typed expression. The string must be of one of the forms, and regardless of the timezone offset is always normalized to UTC:
Maybe we should also include `datetime("epoch_start")` and `datetime("epoch_end")`?
A potential usecase would be half-open intervals. E.g., the same shape of policy could be used to express buying and renting. For buying it could be:
permit(principal = User::"Bob", action = Action::"view", resource = Movie::"UnReleasedFilm") when {
context.curr_time >= resource.release_date && context.curr_time <= datetime("epoch_end")
};
While for renting you'd use the end of the rental period, e.g., `context.curr_time <= datetime("2024-08-21T20:07:23")`.
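A small Python sketch of the same idea over integer timestamps, assuming a signed 64-bit representation: `EPOCH_END` is a hypothetical stand-in for `datetime("epoch_end")`, and `may_view` is an illustrative name, not part of the proposal.

```python
# Largest signed 64-bit value; hypothetical stand-in for datetime("epoch_end").
EPOCH_END = 2**63 - 1


def may_view(curr_time: int, release_date: int, end: int = EPOCH_END) -> bool:
    # One policy shape for both cases: buying leaves `end` at EPOCH_END,
    # renting passes the end of the rental period instead.
    return release_date <= curr_time <= end
```

With the default `end`, the upper bound holds for every representable timestamp, so only the release date matters.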
This seems fine to me. Perhaps this is mostly useful when policies are templated, possibly in the context of #61, but why would a policy author include `context.curr_time <= datetime("epoch_end")` otherwise?
Yeah, I'm envisioning templated / programmatically generated policies where it's easier to just give every policy a start and end time. In my example the RHS condition would always be true, so it would hopefully be dropped if the policies were written by hand.
I don't think I understand the utility here: if you want to time-box `context.curr_time`, wouldn't it make more sense to give specific values for the bounds rather than using `epoch_start` and/or `epoch_end`?
I'm not sure that `epoch_start` and `epoch_end` are useful abstractions here: one is in 1970, and the other is the largest representable 64-bit value, the date of which will vary with our supported precision level.
I think the usefulness is only in generated policies, but also agree with @khieta that the user could provide a date far into the future, or far into the past just as easily. I'm 👎 on adding `epoch_start` and `epoch_end`.
@khieta can you help with the validation approach?
Yes, this is detailed in the Detailed Design.
I can explicitly state that durations should be negatable, because I believe they should be. It does seem like that stance is not yet shared amongst everyone.
OK. I think most feedback that I'm aware of has been addressed by either @khieta or myself, and it should be "safe" to reevaluate. I've also implemented this API in Python to validate that the behavior makes sense. It caused me to realize that we'd have a bug with |
How does everyone feel about this signature for duration?
This gives us the ability to define more precise durations in policy without having to add a bunch of them together. It also keeps a single argument, which is convenient. It's worth noting that with validation turned on, arguments to extension constructors are meant to be literals (see https://docs.cedarpolicy.com/policies/validation.html#:~:text=Errors%20due%20to%20incorrect,validate%20(regardless%20of%20mode). )
This suggests that there's no room for doing things like this in validated Cedar: And really the better way would be to have |
To @philhassey's toplevel comment (sorry, I can't thread, because it wasn't made as a comment on any particular part of the RFC text)
I like this direction for all of the reasons given, and wonder if there is some standard for specifying durations as a string, just as there are ISO standards for specifying datetimes as a string?
text/0080-datetime-extension.md
### Durations of Time (`duration`)

The `duration(long, string)` function constructs a duration value. The string argument must be one of `"days", "hours", "minutes", "seconds", "milliseconds"`. Strict validation requires `long` and `string` to be literals, although evaluation/authorization support any appropriately-typed expressions. Values of type `duration` have the following methods:
Technically, strict validation doesn’t need to require the `long` and `string` in this case to be literals. And that’s because it’s possible to encode the meaning of this constructor to SMT for arbitrary (i.e., unknown) values.
We require arguments to `ip` and `decimal` to be literals only because it would be too expensive / infeasible to encode their behavior (parsing) on arbitrary (unknown) strings.
That said, we may want to make these literals anyway for consistency with other operators. And, admittedly, making them literals for the purpose of strict validation would help make the analysis more efficient.
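To make concrete why this constructor is easy to express on longs, here is a minimal Python sketch. It assumes a millisecond internal representation (as discussed earlier in the thread) and treats 64-bit overflow as an error; the name `duration_millis`, the lookup table, and the overflow behavior are all assumptions for illustration.

```python
# Milliseconds per unit, for the five unit strings named in the RFC text.
MILLIS_PER_UNIT = {
    "milliseconds": 1,
    "seconds": 1_000,
    "minutes": 60_000,
    "hours": 3_600_000,
    "days": 86_400_000,
}
I64_MIN, I64_MAX = -(2**63), 2**63 - 1


def duration_millis(amount: int, unit: str) -> int:
    # A finite case split on the unit followed by one multiplication.
    if unit not in MILLIS_PER_UNIT:
        raise ValueError(f"unknown duration unit: {unit!r}")
    millis = amount * MILLIS_PER_UNIT[unit]
    # Python integers are unbounded, so the 64-bit range is checked explicitly.
    if not I64_MIN <= millis <= I64_MAX:
        raise OverflowError("duration does not fit in a signed 64-bit integer")
    return millis
```

Unlike parsing an arbitrary string, nothing here needs a loop: the unit is one of five constants and the rest is integer arithmetic.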
6398ba4 to 20a9376
A few suggestions for final changes. Looks great!
text/0080-datetime-extension.md
The current proposal supports milliseconds. Do we want nanoseconds too? Or is just seconds sufficient? The ISO 8601 format does not specify a maximum precision, so we can technically allow any number of `S`s after the `.` in `YYYY-MM-DDThh:mm:ss.SSSZ`. Based on [this blog post](https://nickb.dev/blog/iso8601-and-nanosecond-precision-across-languages/), it appears that Javascript supports milliseconds (3 digits), Python supports microseconds (6 digits), and Rust and Go support nanoseconds (9 digits). Assuming nanosecond accuracy, the maximum (signed) 64-bit number (2^63 - 1) represents April 11, 2262. This date seems far enough out that any of these choices (milliseconds, microseconds, or nanoseconds) seems reasonable.

Note that the backwards compatible option is to stick with milliseconds (or seconds). We can add precision later, but not remove it without a breaking change.
In what sense is it backward compatible? The representation (as a `long`) would change its meaning, which means that data stored in the app would have to change. Are we expecting never to have to store a `long` (for duration or time) in the app? I was thinking "current time" or various deadlines would be stored that way.
Current time will need to be passed as an ISO date string with the current proposal. Users should not use the Unix `long` value directly.
Increasing precision is backwards compatible in the sense that all ISO date strings that were previously accepted will continue to be accepted in the future. But you are right that this isn't truly backwards-compatible: more precision will affect the underlying representation, which will impact when certain operators will overflow. My comment was sweeping this under the rug 🙂
Note that there isn't a (nice) way to make it backwards compatible for analysis because extending the bitvector representation won't correspond exactly to N extra decimal digits.
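For a sense of how the precision choice moves the overflow point, the following Python sketch computes the approximate range of a signed 64-bit epoch at each precision. The helper name `max_years` and the use of a mean Gregorian year of 365.2425 days are assumptions for this back-of-the-envelope check.

```python
from datetime import datetime, timedelta, timezone

I64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_YEAR = 365.2425 * 86_400  # mean Gregorian year (approximation)


def max_years(ticks_per_second: int) -> float:
    # Approximate number of years after 1970 that fit in a signed 64-bit count.
    return I64_MAX / ticks_per_second / SECONDS_PER_YEAR


# Nanoseconds: the last whole second representable, as a calendar date.
nanos_limit = EPOCH + timedelta(seconds=I64_MAX // 10**9)

for name, ticks in [("milliseconds", 10**3), ("microseconds", 10**6), ("nanoseconds", 10**9)]:
    print(f"{name}: about {max_years(ticks):,.0f} years after 1970")
print("nanosecond limit:", nanos_limit.date())
```

The nanosecond limit lands on the April 11, 2262 date quoted in the RFC text, while the millisecond range is far beyond what Python's own `datetime` can represent, which is why that case is only reported in years.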
Thanks for the team effort on this RFC.
@philhassey was questioning the special way to compute |
LGTM.
Co-developed-by: Kesha Hietala <[email protected]> Signed-off-by: Andrew Gwozdziewycz <[email protected]>
The final comment period (FCP) for this RFC is starting now, with intent to accept the RFC. The FCP will end 2024-09-10 at noon PT / 3pm ET / 7pm UTC. Please add comments, and especially any objections, if you have any. For more on the RFC process, see https://github.com/cedar-policy/rfcs.
Proposal for adding a datetime extension, co-authored with @khieta.
Rendered