Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datetime Extension #80

Merged
merged 1 commit into from
Sep 11, 2024
Merged

Datetime Extension #80

merged 1 commit into from
Sep 11, 2024

Conversation

apg
Copy link
Contributor

@apg apg commented Aug 13, 2024

Proposal for adding a datetime extension, co-authored with @khieta.

Rendered

@emina
Copy link
Contributor

emina commented Aug 13, 2024

Thank you for the carefully thought out RFC!

I agree that having some support for dates/times would be useful, but unfortunately, this proposal is out of bounds of what's analyzable.

Specifically, we cannot support the proposed time representation and operators for two reasons.

First, if we keep everything else the same, then it's not possible to enforce the well-formedness constraints on time components (year, month, day, hours, minutes, seconds) without using quantifiers, which would put the encoding into the undecidable fragment of SMT. This is because Cedar lets you put arbitrary types into sets, and to ensure that every time value in a set is well-formed, we would have to use quantifiers.

Second, even if we were able to ground the quantifiers by imposing restrictions on how times can be used (e.g., time values can't be included in sets, either directly or transitively as part of another value), enforcing and solving these ground time constraints would still be expensive. For each occurrence of a datetime value, we would need to emit an assumption that says if the year field corresponds to a leap year, and if the month is February, then the month has 29 days, otherwise it has 28 days. We would also need to emit an if-then-else condition that covers all the other months too. Note that determining if a given year is a leap year itself requires checking multiple conditions. Specifically, it is not enough to check if the year is divisible by four (see here).

To retain analyzability, we can go the IAM route (see here) and support date comparisons, perhaps with some duration operations as well. The IAM operators are analyzable because we can internally represent time values as Unix epochs, which are unconstrained 64-bit integers, and the IAM operations on this representation reduce to long comparisons. We are free to make the string representation of times richer than what IAM supports as long as the internal representation is longs, and the operators on times are expressible as simple operations on longs.

@apg
Copy link
Contributor Author

apg commented Aug 14, 2024

Thanks, @emina, for taking a look! A few ideas / questions in what follows...

Specifically, we cannot support the proposed time representation and operators for two reasons.

First, if we keep everything else the same, then it's not possible to enforce the well-formedness constraints on time components (year, month, day, hours, minutes, seconds) without using quantifiers, which would put the encoding into the undecidable fragment of SMT. This is because Cedar lets you put arbitrary types into sets, and to ensure that every time value in a set is well-formed, we would have to use quantifiers.

I'm not tied to the proposed representation. I am hopeful we can define an abstract data type with useful, and practical time operations. If using a long instead of a "record like" is the way to get there, than so be it.

Please forgive the ridiculous questions and thoughts I'm replying with, as I'm more or less ignorant in Automated Theorem Proving, SAT, and SMT.

As specified in this document, the datetime constructor is the only way to set the fields of a datetime object. The offset() method does not make an in-place modification, but rather, instantiates a new datetime with an updated set of component values. If the constructor only produces well formed datetime objects, or "errors out" at runtime, then I don't see how this is any different than a long?

Second, even if we were able to ground the quantifiers by imposing restrictions on how times can be used (e.g., time values can't be included in sets, either directly or transitively as part of another value), enforcing and solving these ground time constraints would still be expensive. For each occurrence of a datetime value, we would need to emit an assumption that says if the year field corresponds to a leap year, and if the month is February, then the month has 29 days, otherwise it has 28 days. We would also need to emit an if-then-else condition that covers all the other months too. Note that determining if a given year is a leap year itself requires checking multiple conditions. Specifically, it is not enough to check if the year is divisible by four (see here).

Can we prove these operations under a simpler calendar model, and then let implementations choose the most appropriate calendar model (cough Gregorian) for their implementation? Most general purpose programming languages have robust date/time facilities, so it stands to reason that different implementations of Cedar will adopt them, rather than start from scratch.

I believe that the operations requested in the this API spec are valid under any calendar that has years, months, days, hours, minutes, seconds, and whose measurements (instants in time) form a partial order.

To retain analyzability, we can go the IAM route (see here) and support date comparisons, perhaps with some duration operations as well. The IAM operators are analyzable because we can internally represent time values as Unix epochs, which are unconstrained 64-bit integers, and the IAM operations on this representation reduce to long comparisons. We are free to make the string representation of times richer than what IAM supports as long as the internal representation is longs, and the operators on times are expressible as simple operations on longs.

The current IAM operations are insufficient to express many of the examples in this RFC. Adding additional functions and a duration type could improve the situation, but I think we'll still fall short. There wouldn't be a facility for policies that have periodic decision making abilities (e.g. "everyday between $x and $y"), which we've identified as important for our customers.

@emina
Copy link
Contributor

emina commented Aug 14, 2024

I'll try to clarify some of the concerns around the proposed API and explain better (hopefully :) why the current API is not analyzable. To a first approximation, the best we can do is IAM plus a few duration operations (TBD). Let's dig into why this is the case.

Please forgive the ridiculous questions and thoughts I'm replying with, as I'm more or less ignorant in Automated Theorem Proving, SAT, and SMT.

As specified in this document, the datetime constructor is the only way to set the fields of a datetime object. The offset() method does not make an in-place modification, but rather, instantiates a new datetime with an updated set of component values. If the constructor only produces well formed datetime objects, or "errors out" at runtime, then I don't see how this is any different than a long?

These are great questions :). And your point is exactly right for the runtime implementation of the datetime constructor that we'd have in Lean, Rust, or Go. The issue is the analysis-time encoding of various operators.

To understand the issues here, it helps to have a mental model of how the analysis works. The analyzer is a compiler: it takes as input a set of Cedar policies, a property we want to check about them (e.g., are they equivalent?), and compiles these policies into the input language of the SMT solver. The solver then checks if the property holds and, if not, it returns a concrete input (a request and entities) that violate the property. It helps to think of the solver as testing the encoded policies on the (infinite) set of all possible inputs.

Now, the language that the solvers support is very limited compared to a general purpose language. And the practical subset of this language is even more limited than what the SMT standard suggests, because some operators in the SMT language are either undecidable (the solver can run forever trying to test the policies) or extremely inefficient to reason about (the solver will come back but not in any reasonable time).

So, what's expressible in Cedar is limited to what we can efficiently compile to this core functional language: think, no loops or recursion, machine integers (aka bitvectors), some string operators, some set operators, some map operators (uninterpreted functions), structs, conditionals, and boolean operators.

If you're curious, here are some more details about the compilation process (Section 4).

Given this limited compilation target, we can't actually translate the datetime constructor itself, because this would require writing a parser for datetime strings in the SMT language. This parser is hugely complicated, because it also needs to validate its input, and establish that the input string is a correct date time, enforcing the constraints on leap years, numbers of days in months etc. Even if we could fully unroll all loops in this parser to express it in SMT, the resulting SMT expression would be analyzable in name only :). In practice, there is a limit to how complicated and big of a formula a solver can solve in a reasonable amount of time.

This is why we don't actually encode the parsing of Decimals or IPAddresses either. Note that the strict validator creates a loophole for us to avoid parsing in the solver: it rejects attempts to call the decimal constructor on a non-literal, e.g., decimal(context.someStringRepresentation), and instead asks that arguments to these constructor are string literals. We use the regular Rust/Lean parser to parse these strings, and then turn them into concrete values (bitvectors) representable in SMT. Input (e.g., context) values of these types are represented as bitvector variables (unknowns), and the solver tries to find values for them that violate the desired property.

Ok, so if we can't translate the datetime parser to SMT, how would we enforce the desired constraints on date values? The only way to do is by emitting extra formulas that say the date components are within the right ranges. Think of it as emitting a giant conditional expression that says "if all dates occurrences are well-formed, then the property we care about holds."
Now, we can't emit this expression for a general Cedar policy because of sets: checking that every element of a set is well-formed conceptually requires iteration, which isn't supported by SMT. And even if prohibit putting dates in sets, the resulting well-formedness constraints would still be too complex in practice. This is why we can't support the proposed representation while remaining in the decidable / practical fragment of the SMT language.

Can we prove these operations under a simpler calendar model, and then let implementations choose the most appropriate calendar model (cough Gregorian) for their implementation? Most general purpose programming languages have robust date/time facilities, so it stands to reason that different implementations of Cedar will adopt them, rather than start from scratch.

Unfortunately, not for analysis. We can definitely do this in general (in Lean), but for the analysis to work, we must be able to express each operation in the very limited SMT language.

The current IAM operations are insufficient to express many of the examples in this RFC. Adding additional functions and a duration type could improve the situation, but I think we'll still fall short. There wouldn't be a facility for policies that have periodic decision making abilities (e.g. "everyday between $x and $y"), which we've identified as important for our customers.

I totally understand that a richer set of operators would be desirable. But if a datetime operation cannot be expressed (roughly) as a loop-free expression over integers, it's out of bounds of what's analyzable. The IAM operators have the nice property of being expressible as < and <= over longs. We can push this a bit further than IAM, but not to the point where we can support very complex calculations, like extracting the year or month from an epoch.

I'd be happy to chat more offline (to save us some typing :) about whether there is a middle ground that we can design. I'm hoping we can find a reasonable design that combines IAM++ and some of the alternatives discussed in the RFC.

text/0080-datetime-extension.md Outdated Show resolved Hide resolved
text/0080-datetime-extension.md Outdated Show resolved Hide resolved
@khieta khieta added the pending This RFC is pending; for definitions see the README label Aug 15, 2024
@apg
Copy link
Contributor Author

apg commented Aug 15, 2024

Based on all the feedback so far (Thank you!!!!!), it seems like I should reposition this RFC to be Alt C, with the API discussed in the comments, and millisecond precision.

I'll try to post a rewrite by the end of the week.

@philhassey
Copy link

I agree that now() or currentTime() should be removed. These are better supplied by the caller as part of the context.

text/0080-datetime-extension.md Outdated Show resolved Hide resolved
text/0080-datetime-extension.md Outdated Show resolved Hide resolved
text/0080-datetime-extension.md Outdated Show resolved Hide resolved
text/0080-datetime-extension.md Show resolved Hide resolved
text/0080-datetime-extension.md Show resolved Hide resolved
text/0080-datetime-extension.md Outdated Show resolved Hide resolved
text/0080-datetime-extension.md Outdated Show resolved Hide resolved
text/0080-datetime-extension.md Outdated Show resolved Hide resolved
text/0080-datetime-extension.md Show resolved Hide resolved
@mwhicks1
Copy link
Contributor

I think I like this proposal, but I am looking for more details. Spelling them out may reveal gotchas that are relevant to approval.

For example, I think what's going on is that a datetime and duration are represented as long integers under the covers, like Unix time. This means that evaluation of various operations is pretty simple, and encoding for SMT-based policy analysis is simple, too. But some details are important, e.g., if duration cannot be negative then x-y is really |x-y|. So I'd like to see those spelled out.

I'd also like to see the basic approach to validation spelled out. If you look at the recent proposals for function macros or embedded attribute maps, etc. you'll see these features written out carefully.

To construct and manipulate these types we will provide the functions listed below.
All of this functionality will be hidden behind a `datetime` feature flag (analogous to the current decimal and IP extensions), allowing users to opt-out.

- `datetime(string)` constructs a datetime value. Like with other extension function constructors, strict validation requires `string` to be a string literal, although evaluation/authorization support any string-typed expression. The string must be of one of the forms, and regardless of the timezone offset is always normalized to UTC:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should also include datetime("epoch_start") and datetime("epoch_end")?
A potential usecase would be half-open intervals. E.g., the same shape of policy could be used to express buying and renting. For buying it could be:

permit(principal = User::"Bob", action = Action::"view", resource = Movie::"UnReleasedFilm") when {
    context.curr_time >= resource.release_date && context.curr_time <= datetime("epoch_end")
};

While for renting you'd use the end of the rental period, e.g., context.curr_time <= datetime("2024-08-21T20:07:23").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me. Perhaps this is mostly useful when policies are templated, possibly in the context of #61, but why would a policy author include context.curr_time <= datetime("epoch_end") otherwise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm envisioning templated / programmatically generated policies where it's easier to just give every policy a start and end time. In my example the RHS condition would always true so it would hopefully be dropped if the policies were written by hand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand the utility here: if you want to time-box context.curr_time, wouldn't it make more sense to give specific values for the bounds rather than using epoch_start and/or epoch_end?

I'm not sure that epoch_start and epoch_end are useful abstractions here: one is in 1970, and the other is the largest representable 64-bit value, the date of which will vary with our supported precision level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the usefulness is only in generated policies, but also agree with @khieta that the user could provide a date far into the future, or far into the past just as easily. I'm 👎 on adding epoch_start and epoch_end.

@apg
Copy link
Contributor Author

apg commented Aug 21, 2024

I'd also like to see the basic approach to validation spelled out. If you look at the recent proposals for function macros or embedded attribute maps, etc. you'll see these features written out carefully.

@khieta can you help with the validation approach?

@apg
Copy link
Contributor Author

apg commented Aug 21, 2024

For example, I think what's going on is that a datetime and duration are represented as long integers under the covers, like Unix time.

Yes, this is detailed in the Detailed Design.

This means that evaluation of various operations is pretty simple, and encoding for SMT-based policy analysis is simple, too. But some details are important, e.g., if duration cannot be negative then x-y is really |x-y|. So I'd like to see those spelled out.

I can explicitly state that durations should be negatable, because I believe they should be. It does seems like that stance is not yet shared amongst everyone.

@apg
Copy link
Contributor Author

apg commented Aug 23, 2024

OK. I think most feedback that I'm aware of has been addressed by either @khieta or myself, and it should be "safe" to reevaluate.

I've also implemented this API in Python to validate that the behavior makes sense. It caused me to realize that we'd have a bug with -, so that's helpful!

@philhassey
Copy link

How does everyone feel about this signature for duration?

duration(string) where string is something like

"5d 3h 2m 1s"

This gives us the ability to define more precise durations in policy without having to add a bunch of them together. It also keeps a single arg (which is convenient.).

It's worth noting that with validation turned on, arguments to extension constructors are meant to be literals (see https://docs.cedarpolicy.com/policies/validation.html#:~:text=Errors%20due%20to%20incorrect,validate%20(regardless%20of%20mode). )

However, by default the validator runs in a strict mode that forbids passing non-literal strings to extension function constructors; in this mode, the expression above will fail to validate with the error extension constructors may not be called with non-literal expressions. An expression like ip("XYZ") in a policy will fail to validate (regardless of mode).

This suggests that there's no room for doing things like this in validated Cedar:
duration(context.expireDays,"days")

And really the better way would be to have context.expireDays just be the exact duration already. Or if you are writing it as policy text, then duration("5d") would be more terse anyways.

Copy link
Contributor

@mwhicks1 mwhicks1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sold on the use of - for DTs, and I suggested an alternative. I also suggested some small wording/structure adjustments. Overall I think this is looking really good. Well done @khieta and @apg !

text/0080-datetime-extension.md Show resolved Hide resolved
text/0080-datetime-extension.md Outdated Show resolved Hide resolved
text/0080-datetime-extension.md Outdated Show resolved Hide resolved
text/0080-datetime-extension.md Show resolved Hide resolved
@cdisselkoen
Copy link
Contributor

To @philhassey's toplevel comment (sorry, I can't thread, because it wasn't made as a comment on any particular part of the RFC text)

How does everyone feel about this signature for duration?

I like this direction for all of the reasons given, and wonder if there is some standard for specifying durations as a string just like there are ISO standards for specifying datetimes as a string?


### Durations of Time (`duration`)

The `duration(long, string)` function constructs a duration value. The string argument must be one of `"days", "hours", "minutes", "seconds", "milliseconds"`. Strict validation requires `long` and `string` to be literals, although evaluation/authorization support any appropriately-typed expressions. Values of type `duration` have the following methods:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, strict validation doesn’t need to require the long and string in this case to be literals. And that’s because it’s possible to encode the meaning of this constructor to SMT for arbitrary (i.e., unknown) values.

We require arguments to ip and decimal to be literals only because it would be too expensive / infeasible to encode their behavior (parsing) on arbitrary (unknown) strings.

That said, we may want to make these literals anyway for consistency with other operators. And, admittedly, making them literals for the purpose of strict validation would help make the analysis more efficient.

@apg apg force-pushed the datetime-rfc branch 5 times, most recently from 6398ba4 to 20a9376 Compare August 27, 2024 02:07
Copy link
Contributor

@mwhicks1 mwhicks1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions for final changes. Looks great!


The current proposal supports milliseconds. Do we want nanoseconds too? Or is just seconds sufficient? The ISO 8601 format does not specify a maximum precision, so we can technically allow any number of `S`s after the `.` in `YYYY-MM-DDThh:mm:ss.SSSZ`. Based on [this blog post](https://nickb.dev/blog/iso8601-and-nanosecond-precision-across-languages/), it appears that Javascript supports milliseconds (3 digits), Python supports microseconds (6 digits), and Rust and Go support nanoseconds (9 digits). Assuming nanosecond accuracy, the maximum (signed) 64-bit number (2^63 - 1) represents April 11, 2262. This date seems far enough out that any of these choices (milliseconds, microseconds, or nanoseconds) seems reasonable.

Note that the backwards compatible option is to stick with milliseconds (or seconds). We can add precision later, but not remove it without a breaking change.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what sense is it backward compatible? The representation (as a long) would change its meaning, which means that data stored in the app would have to change. Are we expecting never to have to store a long (for duration or time) in the app? I was thinking "current time" or various deadlines would be stored that way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current time will need to be passed as a ISO date string with the current proposal. Users should not use the Unix long value directly.

Increasing precision is backwards compatible in the sense that all ISO date strings that were previously accepted will continue to be accepted in the future. But you are right that this isn't truly backwards-compatible: more precision will affect the underlying representation, which will impact when certain operators will overflow. My comment was sweeping this under the rug 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there isn't a (nice) way to make it backwards compatible for analysis because extending the bitvector representation won't correspond to exactly to N extra decimal digits.

text/0080-datetime-extension.md Outdated Show resolved Hide resolved
text/0080-datetime-extension.md Show resolved Hide resolved
Copy link
Contributor

@cdisselkoen cdisselkoen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the team effort on this RFC.

text/0080-datetime-extension.md Outdated Show resolved Hide resolved
text/0080-datetime-extension.md Outdated Show resolved Hide resolved
@apg
Copy link
Contributor Author

apg commented Aug 28, 2024

@philhassey was questioning the special way to compute durationSince that's listed in the RFC, and it turns out that's completely wrong. I believe I convinced myself that that was necessary by having a bug in my reference implementation. Updating the RFC with that mistake removed.

Copy link
Contributor

@shaobo-he-aws shaobo-he-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Co-developed-by: Kesha Hietala <[email protected]>
Signed-off-by: Andrew Gwozdziewycz <[email protected]>
@khieta
Copy link
Contributor

khieta commented Sep 3, 2024

The final comment period (FCP) for this RFC is starting now, with intent to accept the RFC. The FCP will end 2024-09-10 at noon PT / 3pm ET / 7pm UTC. Please add comments, and especially any objections, if you have any. For more on the RFC process, see https://github.com/cedar-policy/rfcs.

@khieta khieta added final-comment-period This RFC is in its final comment period; for definitions see the README and removed pending This RFC is pending; for definitions see the README labels Sep 3, 2024
@khieta khieta merged commit 1046090 into cedar-policy:main Sep 11, 2024
1 check passed
@khieta khieta added active This RFC is active; for definitions see the README and removed final-comment-period This RFC is in its final comment period; for definitions see the README labels Sep 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
active This RFC is active; for definitions see the README
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants