From 60156be8038a08102b24759960bb7a275c957c27 Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Thu, 21 Mar 2024 08:59:09 -0400 Subject: [PATCH 01/17] RFC 61 Signed-off-by: Aaron Eline --- text/0061-functions.md | 294 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 294 insertions(+) create mode 100644 text/0061-functions.md diff --git a/text/0061-functions.md b/text/0061-functions.md new file mode 100644 index 00000000..524d8917 --- /dev/null +++ b/text/0061-functions.md @@ -0,0 +1,294 @@ +# User-defined Functions + +## Related Issues and PRs + +* [Semantic Versioning Issue](https://github.com/cedar-policy/cedar/issues/637) +* [RFC 58 - Standard Library](https://github.com/cedar-policy/rfcs/pull/58) + +## Timeline + +* Started: 2024-03-20 + +## Summary + +This RFC proposes to support user-defined functions in Cedar. Cedar functions provide a lightweight mechanism for users to create abstractions in their policies, aiding readability and reducing the chance for errors. Cedar functions have restrictions to ensure termination and efficiency, maintain the feasibility of validation and analysis, and help ensure policies are readable. + +### Basic Example + +Looking at the linked issue, the *semantic versioning* (SemVer) use case is a perfect use-case for functions. SemVer can be trivially encoded within Cedar’s existing expression language. + +``` +// Permit access when the api is greater than 2.1 +permit (principal, action == Action::"someAction", resource) +when { + resource.apiVersion.major > 2 || + (resource.apiVersion.major == 2 && resource.apiVersion.minor >= 1) +}; +``` + +Here it is instead represented with functions + +``` +function semver(major, minor, long) { + { major : major, minor : minor, patch : patch } +}; + +// lhs > rhs ? 
+function semverGT(lhs, rhs) { + if lhs.major == rhs.major then + if lhs.minor == rhs.minor then + lhs.patch > rhs.patch + else + lhs.minor > rhs.minor + else + lhs.major > rhs.major +}; + +// Permit access when the api is greater than 2.1 +permit (principal, action == Action::"someAction", resource) +when { + semverGT(resource.apiVersion, semver(2,1,0)) +}; +``` + +For simplicity, safety, and readability, Cedar functions cannot call other functions, and cannot take functions as arguments. + +## Motivation + +Cedar currently lacks mechanisms for users to build abstractions. The only existing mechanism is extension functions, which are insufficient for two reasons: + +1. They are extremely heavyweight, requiring modifying the source of the Cedar evaluator. This means that users who want to stay on official versions of Cedar have no choice but to attempt to submit a PR and get it accepted into the mainline. This process does not scale. + 1. For data structures that are relatively standard (ex: SemVer, or OID Users as proposed in [RFC 58](https://github.com/cedar-policy/rfcs/blob/cdisselkoen/standard-library/text/0058-standard-library.md)), it’s hard to know what’s in-demand enough to be included, and how to balance that against avoiding bloat. There’s no way to naturally observe usage because the only way to “install” the extension pre-acceptance is to vend a modified version of Cedar. + 2. Users may have data structures that are totally bespoke to their systems. It makes no sense to include these in the standard Cedar distribution at all, yet users may still want some way to build abstractions. +2. They are too powerful. Extensions are implemented via arbitrary Rust code, which is essential for encoding features that cannot be represented via Cedar expressions (such as IP Addresses), but opens the door for a wide range of bugs/design issues. It’s trivial to design an extension that is difficult to validate and/or logically encode for analysis. 
Problematically, extension functions can potentially exhibit non-determinism, non-termination, or non-linear performance; interact with the operating system; or violate memory safety. This raises the code review burden when considering an extension function's implementation. + +In contrast, functions written as simple abstractions over Cedar expressions, which themselves cannot call other Cedar functions, have none of these problems. They naturally inherit the properties of Cedar expressions. Cedar functions are guaranteed to terminate and be deterministic. Since they are compositions of Cedar expressions, it’s easy to validate and analyze them. + +## Detailed Design + +### Function declarations + +This RFC adds a new top level form to Cedar policysets: the function declaration. + +A Cedar function declaration is composed of three elements: + +1. A name, which is a valid (possibly namespaced) identifier. +2. A list of parameters, each of which is a non-namespaced identifier. +3. A body, which is a Cedar expression. + 1. The body may contain (non-namespaced) variables, drawn from the function's parameters and Cedar's global variables. + 2. The body may not contain calls to other functions. + + +Structurally, a declaration is written like this: + +``` +function name(param1, param2) { + body +}; +``` +Standard Cedar variables (`principal`, `action`, `resource`, and `context`) are not considered bound with the body. +Use of an unbound variable in the body is a syntax error. +A parameter list may not declare the same variable twice, and may not list any standard Cedar variables. +An unused variable is a syntax warning. +Function and variable names share the same namespace, with standard lexical scoping. +Inside of a function, any function application (see below) that does not resolve to an extension function or built-in operation is an error. In other words, Cedar functions are not permitted to call other Cedar functions. + +### Function applications (a.k.a.
function calls) + +A function application has the same syntax as an extension function constructor application. In particular, a function application is composed of two elements: + +1. The function name (potentially namespaced) +2. A comma separated list of arguments + +Here is an example: + +``` +foo(1,2, principal.name) +``` + +Function arguments are eagerly evaluated (i.e., call-by-value), as with extension functions today. +Functions do not exist at run-time -- they cannot be stored in entity attributes, requests, etc. As such, referencing a function by its name, without calling it, is a syntax error. +Arguments for each of a function's declared parameters, and no more, must be provided at the call, or it's a syntax error. +Other errors (such as type errors or overflow) are detected at runtime, or via validation. +Examples: +``` +function foo(a, b) { + a + b +}; + +permit(principal,action,resource) when { + foo + 1 // Parse error +}; +permit(principal,action,resource) when { + foo(1) // Parse error +}; +permit(principal,action,resource) when { + foo(1, 2) // No error +}; +permit(principal,action,resource) when { + foo(1, "hello") // Run time type error +}; +permit(principal,action,resource) when { + foo(1, "hello", principal) // Parse error +}; +permit(principal,action,resource) when { + bar(1, "hello", principal) // Parse error +}; +``` + +### Namespacing/scoping + +All functions in a policyset are in scope for all policies, i.e., function declarations do not need to lexically precede use. +Cedar policies and function bodies are lexically scoped. +`principal`/`action`/`resource`/ `context` are considered to be more tightly bound than function names. +This means that while you could name a function `principal`, you could never call it. (This should probably be a validator warning) +Function name conflicts at the top level are a parse error. +Function names may shadow extension functions (results in a warning).
+ +### Formal semantics +This RFC adds one new evaluation rule to Cedar: +If $f$ is the name of a declared function, _def_($f$) is the body of the definition, and $p_1, ..., p_n$ is the list of parameters in the definition: +$f(v_1, ..., v_n) \rightarrow$ _def_$(f) [p_1 \mapsto v_1, ..., p_n \mapsto v_n]$ +Where $e[x \mapsto v]$ means to substitute the $v$ for $x$ in $e$, as usual. + +### Formal grammar +The grammar of Cedar expressions is unchanged, as we re-use the existing call form. +``` +function ::= 'function' Path '(' Params? ')' '{' Expr '}' ';' +Params ::= Ident (',' Ident)? ','? +``` +Note the `Params` non-terminal allows trailing commas in parameter lists. + +### Validation + +The validator typechecks functions at use-sites via inlining. +This means that functions can be polymorphic. For example, one could write the following, where `eq` is used in the first instance with a pair of entities, and in the second instance with a pair of strings. +``` +function eq(a, b) { + a == b +}; + +permit(principal, action, resource) +when { + eq(principal,resource.owner) || + eq(principal.org,"Admin") +}; +``` + +## Drawbacks + +### Redundancy +Cedar functions can only accomplish things that can already be accomplished with Cedar expressions. +This means we are not expanding the expressive power of Cedar in any way. +We are also adding more than one way to accomplish the same task. +Bringing back our SemVer example: +``` +// Permit access when the api is greater than 2.1 +permit (principal, action == Action::"someAction", resource) +when { + resource.apiVersion.major > 2 || + (resource.apiVersion.major == 2 && resource.apiVersion.minor >= 1) +}; +``` +This policy does accomplish the user's goal of encoding the SemVer relationship. +The problem is readability. A reader of this policy has to work out that it is implementing the standard semantic version comparison operator, and not some bespoke versioning scheme. Another problem is maintainability.
If you have multiple policies that reason about SemVers, you have to repeat the logic, inline, in each policy. This violates basic software engineering tenets (Don’t Repeat Yourself), allowing bugs to sneak in via typos or botched copy/paste. + +### No custom parsing or error-handling +Extension functions provide more full-featured constructors through custom parsing and error handling, but Cedar functions provide no such facilities. This may make them harder to read, write, and understand. + +For example, you could encode a decimal number as a `Long`, and then make Cedar functions to construct and compare decimals: +``` +function mydecimal(i,f) { + if f >= 0 && f <= 9999 then + i * 10000 + f // will overflow if i too big + else + 18446744073709551615 + 1 // fraction f too big: induce overflow +}; +function mydecimalLTE(d,e) { + d <= e // d and e are just Long numbers +} +``` +These functions basically implement the equivalent of Cedar `decimal` numbers. But the approach has at least two drawbacks. + +First, if `i` and/or `f` are outside the allowed range, you will get a Cedar overflow exception at run-time. This exception is not as illuminating as the custom error emitted by the `decimal` extension function (whose message will be `"Too many digits"`). +Moreover, custom errors from constructor parameter validity checks can be emitted during validation when using extension functions, but not when using Cedar functions. + +Second, there is no special parsing for Cedar functions. With Cedar's built-in `decimal`, you can write `decimal("123.12")` which more directly conveys the number being represented than does `mydecimal(123,12)`. + +Of course, these drawbacks do not necessarily speak against Cedar functions generally, but suggest that for suitably general use-cases (like decimal numbers!), an extension function might be warranted instead. 
+ +### Readability: Policies are no longer standalone +A policy can no longer be read by itself, it has to be read in the context of all function definitions it uses. +Policies that use a large number of functions may be hard to read. + +## Alternatives + +### Type annotations +We could require/allow Cedar functions to have type annotations, taking type definitions from either the schema or allowing them to be inline. + +Example: + +Schema: +``` +type SemVer = { major : Long, minor : Long, patch : Long }; +``` +Policy: +``` +function semver(major : Long, minor : Long, patch : Long) -> Semver { + { major : major, minor : minor, patch : patch } +}; +``` + +This would allow functions to typechecked in absence of a use-site, and allow for easier user specification of intent. +It would also probably result in clearer type checker error messages. + +This introduces the following questions: + +1. Do we allow `type` declaration in policies, or just in schemas? +2. Are type annotations enforced dynamically à la Contracts, or are they just ignored at runtime? + 1. If they are dynamically enforced, that implies access to the schema to unfold type definitions. It also may introduce redundant type checking. +3. Are type annotations required or optional? +4. Will we have types to support generics, i.e., polymorphism? + + +Leaving them out for this RFC only precludes only one future design decision, making type annotations required. +This decision feels unlikely, as we want Cedar to be useful without using a schema/the validator. +Adding _optional_ type annotations in the future is backwards, but mandating type annotations may not be, due to the use of polymorphic functions. + +### Let functions call other functions +As long as cycles are forbidden and functions as arguments are disallowed, we could allow functions to call other functions without sacrificing termination. +However, the potential complexity explosion is high, and it's backwards compatible to add this later. 
+ + + +### Naming +Should these really be called `function`s? They are actually `macro`s. + +### Different namespaces for functions and variables +In the style of a Lisp-2, we could have different namespaces for functions are variables. +Assuming for the minute that functions-calling-functions is allowed: +``` +function foo() { + ... +}; + +function bar(foo) { + foo(1) // No longer an error. `foo` is looked up in the function namespace not the variable namespace +}; + +function bar(foo) { + foo(foo) // Also fine +} + +``` +We argue against this on principle of least surprise; very few modern languages work like this. + +## Unresolved Questions + +1. Do any functions ship with Cedar? If so, which ones? Leaving that out of this RFC and proposing it's handled equivalently to [RFC 58](https://github.com/cedar-policy/rfcs/pull/58) +2. Can you import functions? Leaving that for a future RFC. From 628a87f0e87d420cfd6b60cad0e6bb84c994f2ff Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Thu, 21 Mar 2024 09:21:01 -0400 Subject: [PATCH 02/17] Couple corrections Signed-off-by: Aaron Eline --- text/0061-functions.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index 524d8917..98ac2a7d 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -75,7 +75,7 @@ A Cedar function declaration is composed of three elements: 1. A name, which is a valid (possibly namespaced) identifier. 2. A list of parameters, each of which is a non-namespaced identifier. 3. A body, which is a Cedar expression. - 1. The body may contain (non-namespaced) variables, drawn from the function's parameters and Cedar's global variables. + 1. The body may contain (non-namespaced) variables, drawn from the function's parameters. 2. The body may not contain calls to other functions. 
@@ -86,7 +86,8 @@ function name(param1, param2) { body }; ``` -Standard Cedar variables (`principal`, `action`, `resource`, and `context`) are not considered bound with the body. +Standard Cedar variables (`principal`, `action`, `resource`, and `context`) are *not* considered bound with the body. +(Following the principal of macro hygiene) Use of an unbound variable in the body is a syntax error. A parameter list may not declare the same variable twice, and may not list any standard Cedar variables. An unused variable is a syntax warning. @@ -253,7 +254,7 @@ This introduces the following questions: Leaving them out for this RFC only precludes only one future design decision, making type annotations required. This decision feels unlikely, as we want Cedar to be useful without using a schema/the validator. -Adding _optional_ type annotations in the future is backwards, but mandating type annotations may not be, due to the use of polymorphic functions. +Adding _optional_ type annotations in the future is backwards compatible, but mandating type annotations may not be, due to the use of polymorphic functions. ### Let functions call other functions As long as cycles are forbidden and functions as arguments are disallowed, we could allow functions to call other functions without sacrificing termination. From f891ba3396209c4e789b9a05e9d6f9ee44235639 Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Thu, 21 Mar 2024 13:04:55 -0400 Subject: [PATCH 03/17] Adds CBV section to alts Signed-off-by: Aaron Eline --- text/0061-functions.md | 37 ++++++++++++++++++++++++++++++------- 1 file changed, 30 insertions(+), 7 deletions(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index 98ac2a7d..47fe55bd 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -107,7 +107,9 @@ Here is an example: foo(1,2, principal.name) ``` -Function arguments are eagerly evaluated (i.e., call-by-value), as with extension functions today. 
+Function arguments are lazily evaluated (i.e., call-by-name), as opposed to extension functions today. +Call-by-name is required to support inlining correctly in the presence of errors. +(Without errors, call-by-name and call-by-value are equivalent in Cedar) Functions do not exist at run-time -- they cannot be stored in entity attributes, requests, etc. As such, referencing a function by its name, without calling it, is a syntax error. Arguments for each of a function's declared parameters, and no more, must be provided at the call, or it's a syntax error. Other errors (such as type errors or overflow) are detected at runtime, or via validation. @@ -149,8 +151,8 @@ Function names may shadow extension functions (results in a warning). ### Formal semantics This RFC adds one new evaluation rule to Cedar: If $f$ is the name of a declared function, _def_($f$) is the body of the definition, and $p_1, ..., p_n$ is the list of parameters in the definition: -$f(v_1, ..., v_n) \rightarrow$ _def_$(f) [p_1 \mapsto v_1, ..., p_n \mapsto v_n]$ -Where $e[x \mapsto v]$ means to substitute the $v$ for $x$ in $e$, as usual. +$f(e_1, ..., e_n) \rightarrow$ _def_$(f) [p_1 \mapsto e_1, ..., p_n \mapsto e_n]$ +Where $e[x \mapsto e']$ means to substitute $e'$ for $x$ in $e$, as usual. ### Formal grammar The grammar of Cedar expressions is unchanged, as we re-use the existing call form. @@ -160,7 +162,7 @@ Params ::= Ident (',' Ident)? ','? ``` Note the `Params` non-terminal allows trailing commas in parameter lists. -### Validation +### Validation and Analysis The validator typechecks functions at use-sites via inlining. This means that functions can be polymorphic. For example, one could write the following, where `eq` is used in the first instance with a pair of entities, and in the second instance with a pair of strings. @@ -176,6 +178,8 @@ when { }; ``` +Likewise, any static analysis tools would work via inlining. 
+ ## Drawbacks ### Redundancy @@ -243,18 +247,37 @@ function semver(major : Long, minor : Long, patch : Long) -> Semver { This would allow functions to typechecked in absence of a use-site, and allow for easier user specification of intent. It would also probably result in clearer type checker error messages. +If we allowed `type` declarations in policy set files, it would just be one file: + +``` +type Semver = { major : Long, minor : Long, patch : Long }; +function semver(major : Long, minor : Long, patch : Long) -> Semver { + { major : major, minor : minor, patch : patch } +}; +``` + This introduces the following questions: -1. Do we allow `type` declaration in policies, or just in schemas? -2. Are type annotations enforced dynamically à la Contracts, or are they just ignored at runtime? +1. Do we allow `type` declarations allowed in policy sets, or just in schemas? +2. Are type annotations on functions enforced dynamically à la Contracts, or are they just ignored at runtime? 1. If they are dynamically enforced, that implies access to the schema to unfold type definitions. It also may introduce redundant type checking. 3. Are type annotations required or optional? 4. Will we have types to support generics, i.e., polymorphism? +5. Will type annotations have support for parametric polymorphism/generics? Leaving them out for this RFC only precludes only one future design decision, making type annotations required. This decision feels unlikely, as we want Cedar to be useful without using a schema/the validator. -Adding _optional_ type annotations in the future is backwards compatible, but mandating type annotations may not be, due to the use of polymorphic functions. +Adding _optional_ type annotations in the future is backwards compatible, but mandating type annotations will not be. +It will at minimum require a syntax change to add the annotations, and +may be impossible due to the use of polymorphic functions. 
+ +### Call By Value +Functions could be call-by-value instead of call-by-name. +In general, this would follow the principle of least surprise. +Extension functions are call-by-value, and few popular languages have call-by-name constructs. +The simplicity of validation and analysis are pluses for CBN, but the big problem is enabling inlining for execution. + ### Let functions call other functions As long as cycles are forbidden and functions as arguments are disallowed, we could allow functions to call other functions without sacrificing termination. From 53574891024f23535cf0321dbe2c90c16cc11a8b Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Thu, 21 Mar 2024 13:14:13 -0400 Subject: [PATCH 04/17] Update text/0061-functions.md Co-authored-by: Craig Disselkoen --- text/0061-functions.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index 47fe55bd..d52f55fa 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -86,8 +86,7 @@ function name(param1, param2) { body }; ``` -Standard Cedar variables (`principal`, `action`, `resource`, and `context`) are *not* considered bound with the body. -(Following the principal of macro hygiene) Use of an unbound variable in the body is a syntax error. A parameter list may not declare the same variable twice, and may not list any standard Cedar variables. An unused variable is a syntax warning.
From 96ed47aca4860caa0aee078dbd614d23db5fc038 Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Thu, 21 Mar 2024 14:27:26 -0400 Subject: [PATCH 05/17] Clarity around macro-ness Signed-off-by: Aaron Eline --- text/0061-functions.md | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index d52f55fa..d12fb0d4 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -11,7 +11,10 @@ ## Summary -This RFC proposes to support user-defined functions in Cedar. Cedar functions provide a lightweight mechanism for users to create abstractions in their policies, aiding readability and reducing the chance for errors. Cedar functions have restrictions to ensure termination and efficiency, maintain the feasibility of validation and analysis, and help ensure policies are readable. +This RFC proposes to support user-defined function-like macros in Cedar. +We call these Cedar Functions. +Cedar Functions provide a lightweight mechanism for users to create abstractions in their policies, aiding readability and reducing the chance for errors. +Cedar Functions have restrictions to ensure termination and efficiency, maintain the feasibility of validation and analysis, and help ensure policies are readable. ### Basic Example @@ -147,9 +150,11 @@ This means that while you could name a function `principal`, you could never cal Function name conflicts at the top level are a parse error. Function names may shadow extension functions (results in a warning). -### Formal semantics -This RFC adds one new evaluation rule to Cedar: -If $f$ is the name of a declared function, _def_($f$) is the body of the definition, and $p_1, ..., p_n$ is the list of parameters in the definition: +### Formal semantics/Desugaring rules +This RFC does not add any evaluation rules to Cedar, as functions can be completely desugared. +Dusugaring proceeds from the innermost function call to avoid hygiene issues. 
+If $f$ is the name of a declared function, _def_($f$) is the body of the definition, and $p_1, ..., p_n$ is the list of parameters in the definition. +Let $e_1, ..., e_n$ be a list of Cedar expression that do not contain and Cedar Function calls: $f(e_1, ..., e_n) \rightarrow$ _def_$(f) [p_1 \mapsto e_1, ..., p_n \mapsto e_n]$ Where $e[x \mapsto e']$ means to substitute $e'$ for $x$ in $e$, as usual. @@ -290,7 +295,7 @@ Make (any subset of) the following errors runtime errors instead of parse errors 3. Function application with incorrect arity --> ### Naming -Should these really be called `function`s? They are actually `macro`s. +Should these really be called `function`s? They are actually `macro`s. `snippet`? ### Different namespaces for functions and variables In the style of a Lisp-2, we could have different namespaces for functions are variables. From 2c0925ed80a7f0b68665946e8fc91211170f7b5e Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Thu, 21 Mar 2024 14:28:25 -0400 Subject: [PATCH 06/17] typo Signed-off-by: Aaron Eline --- text/0061-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index d12fb0d4..b0dbc6b7 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -154,7 +154,7 @@ Function names may shadow extension functions (results in a warning). This RFC does not add any evaluation rules to Cedar, as functions can be completely desugared. Dusugaring proceeds from the innermost function call to avoid hygiene issues. If $f$ is the name of a declared function, _def_($f$) is the body of the definition, and $p_1, ..., p_n$ is the list of parameters in the definition. 
-Let $e_1, ..., e_n$ be a list of Cedar expression that do not contain and Cedar Function calls: +Let $e_1, ..., e_n$ be a list of Cedar expression that do not contain any Cedar Function calls: $f(e_1, ..., e_n) \rightarrow$ _def_$(f) [p_1 \mapsto e_1, ..., p_n \mapsto e_n]$ Where $e[x \mapsto e']$ means to substitute $e'$ for $x$ in $e$, as usual. From abf28ef93802b9b5802a4514242447c1343d5c3c Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Fri, 22 Mar 2024 10:51:02 -0400 Subject: [PATCH 07/17] Change the syntax of functions Signed-off-by: Aaron Eline --- text/0061-functions.md | 93 +++++++++++++++++------------------------- 1 file changed, 37 insertions(+), 56 deletions(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index b0dbc6b7..1096894d 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -15,6 +15,7 @@ This RFC proposes to support user-defined function-like macros in Cedar. We call these Cedar Functions. Cedar Functions provide a lightweight mechanism for users to create abstractions in their policies, aiding readability and reducing the chance for errors. Cedar Functions have restrictions to ensure termination and efficiency, maintain the feasibility of validation and analysis, and help ensure policies are readable. +An important implementation note is that Cedar policies may be implemented purely through desugaring. ### Basic Example @@ -32,20 +33,20 @@ when { Here it is instead represented with functions ``` -function semver(major, minor, long) { - { major : major, minor : minor, patch : patch } -}; +def semver(?major, ?minor, ?long) + { major : ?major, minor : ?minor, patch : ?patch } +; // lhs > rhs ? 
-function semverGT(lhs, rhs) { - if lhs.major == rhs.major then - if lhs.minor == rhs.minor then - lhs.patch > rhs.patch +def semverGT(?lhs, ?rhs) + if ?lhs.major == ?rhs.major then + if ?lhs.minor == ?rhs.minor then + ?lhs.patch > ?rhs.patch else - lhs.minor > rhs.minor + ?lhs.minor > ?rhs.minor else - lhs.major > rhs.major -}; + ?lhs.major > ?rhs.major +; // Permit access when the api is greater than 2.1 permit (principal, action == Action::"someAction", resource) @@ -76,20 +77,19 @@ This RFC adds a new top level form to Cedar policysets: the function declaration A Cedar function declaration is composed of three elements: 1. A name, which is a valid (possibly namespaced) identifier. -2. A list of parameters, each of which is a non-namespaced identifier. +2. A list of parameters, each of which is a non-namespaced identifier preceded by a `?`. 3. A body, which is a Cedar expression. 1. The body may contain (non-namespaced) variables, drawn from the function's parameters. 2. The body may not contain calls to other functions. + 3. The body may contain functions calls to builtin Cedar functions/methods or extensions calls. (ex: Things like `.contains()` and `ip("10.10.10.10")` are fine) + 4. Standard Cedar variables (`principal`, `action`, `resource`, and `context`) are *not* considered bound within the body (following the principle of macro hygiene). Structurally, a declaration is written like this: ``` -function name(param1, param2) { - body -}; +def name(?param1, ?param2) body ; ``` -Standard Cedar variables (`principal`, `action`, `resource`, and `context`) are *not* considered bound within the body (following the principle of macro hygiene). Use of an unbound variable in the body is a syntax error. A parameter list may not declare the same variable twice, and may not list any standard Cedar variables. An unused variable is a syntax warning. 
@@ -101,12 +101,14 @@ Inside of a function, any function application (see below) that does not resolve A function application has the same syntax as an extension function constructor application. In particular, a function application is composed of two elements: 1. The function name (potentially namespaced) -2. A comma separated list of arguments +2. A comma separated list of arguments, which are arbitrary cedar expressions. -Here is an example: +Here are some examples: ``` foo(1,2, principal.name) +bar(1 + 1, resource.owner in principal) +baz(some_other_function(3)) ``` Function arguments are lazily evaluated (i.e., call-by-name), as opposed to extension functions today. @@ -117,9 +119,7 @@ Arguments for each of a function's declared parameters, and no more, must be pro Other errors (such as type errors or overflow) are detected at runtime, or via validation. Examples: ``` -function foo(a, b) { - a + b -}; +def foo(?a, ?b) ?a + ?b; permit(principal,action,resource) when { foo + 1 // Parse error @@ -161,8 +161,9 @@ Where $e[x \mapsto e']$ means to substitute $e'$ for $x$ in $e$, as usual. ### Formal grammar The grammar of Cedar expressions is unchanged, as we re-use the existing call form. ``` -function ::= 'function' Path '(' Params? ')' '{' Expr '}' ';' -Params ::= Ident (',' Ident)? ','? +function ::= 'def' Path '(' Params? ')' Expr ';' +Params ::= ParamIdent (',' ParamIdent)? ','? +ParamIdent ::= '?' IDENT // These are equivalent to the production rule for template slots ``` Note the `Params` non-terminal allows trailing commas in parameter lists. @@ -171,9 +172,7 @@ Note the `Params` non-terminal allows trailing commas in parameter lists. The validator typechecks functions at use-sites via inlining. This means that functions can be polymorphic. For example, one could write the following, where `eq` is used in the first instance with a pair of entities, and in the second instance with a pair of strings. 
``` -function eq(a, b) { - a == b -}; +def eq(?a, ?b) ?a == ?b ; permit(principal, action, resource) when { @@ -207,14 +206,15 @@ Extension functions provide more full-featured constructors through custom parsi For example, you could encode a decimal number as a `Long`, and then make Cedar functions to construct and compare decimals: ``` -function mydecimal(i,f) { - if f >= 0 && f <= 9999 then - i * 10000 + f // will overflow if i too big +def mydecimal(?i,?f) + if ?f >= 0 && ?f <= 9999 then + ?i * 10000 + ?f // will overflow if i too big else 18446744073709551615 + 1 // fraction f too big: induce overflow -}; -function mydecimalLTE(d,e) { - d <= e // d and e are just Long numbers +; + +def mydecimalLTE(?d,?e) { + ?d <= ?e // d and e are just Long numbers } ``` These functions basically implement the equivalent of Cedar `decimal` numbers. But the approach has at least two drawbacks. @@ -243,9 +243,9 @@ type SemVer = { major : Long, minor : Long, patch : Long }; ``` Policy: ``` -function semver(major : Long, minor : Long, patch : Long) -> Semver { - { major : major, minor : minor, patch : patch } -}; +def semver(?major : Long, ?minor : Long, ?patch : Long) -> Semver + { major : ?major, minor : ?minor, patch : ?patch } +; ``` This would allow functions to typechecked in absence of a use-site, and allow for easier user specification of intent. @@ -255,9 +255,9 @@ If we allowed `type` declarations in policy set files, it would just be one file ``` type Semver = { major : Long, minor : Long, patch : Long }; -function semver(major : Long, minor : Long, patch : Long) -> Semver { - { major : major, minor : minor, patch : patch } -}; +def semver(?major : Long, ?minor : Long, ?patch : Long) -> Semver + major : ?major, minor : ?minor, patch : ?patch +; ``` This introduces the following questions: @@ -297,25 +297,6 @@ Make (any subset of) the following errors runtime errors instead of parse errors ### Naming Should these really be called `function`s? 
They are actually `macro`s. `snippet`? -### Different namespaces for functions and variables -In the style of a Lisp-2, we could have different namespaces for functions are variables. -Assuming for the minute that functions-calling-functions is allowed: -``` -function foo() { - ... -}; - -function bar(foo) { - foo(1) // No longer an error. `foo` is looked up in the function namespace not the variable namespace -}; - -function bar(foo) { - foo(foo) // Also fine -} - -``` -We argue against this on principle of least surprise; very few modern languages work like this. - ## Unresolved Questions 1. Do any functions ship with Cedar? If so, which ones? Leaving that out of this RFC and proposing it's handled equivalently to [RFC 58](https://github.com/cedar-policy/rfcs/pull/58) From 4c50a6796ec915053c8b56da19830e9ce330b674 Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Fri, 22 Mar 2024 11:26:18 -0400 Subject: [PATCH 08/17] Renamed to "function macros" for clarity --- text/0061-functions.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index 1096894d..55be3435 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -1,4 +1,4 @@ -# User-defined Functions +# User-defined Function Macros ## Related Issues and PRs @@ -12,9 +12,9 @@ ## Summary This RFC proposes to support user-defined function-like macros in Cedar. -We call these Cedar Functions. -Cedar Functions provide a lightweight mechanism for users to create abstractions in their policies, aiding readability and reducing the chance for errors. -Cedar Functions have restrictions to ensure termination and efficiency, maintain the feasibility of validation and analysis, and help ensure policies are readable. +We call these Cedar Function Macros. +Cedar Function Macros provide a lightweight mechanism for users to create abstractions in their policies, aiding readability and reducing the chance for errors. 
+Cedar Function Macros have restrictions to ensure termination and efficiency, maintain the feasibility of validation and analysis, and help ensure policies are readable. An important implementation note is that Cedar policies may be implemented purely through desugaring. ### Basic Example @@ -66,15 +66,15 @@ Cedar currently lacks mechanisms for users to build abstractions. The only exist 2. Users may have data structures that are totally bespoke to their systems. It makes no sense to include these in the standard Cedar distribution at all, yet users may still want some way to build abstractions. 2. They are too powerful. Extensions are implemented via arbitrary Rust code, which is essential for encoding features that cannot be represented via Cedar expressions (such as IP Addresses), but opens the door for a wide range of bugs/design issues. It’s trivial to design an extension that is difficult to validate and/or logically encode for analysis. Problematically, extension functions can potentially exhibit non-determinism, non-termination, or non-linear performance; interact with the operating system; or violate memory safety. This raises the code review burden when considering an extension function's implementation. -In contrast, functions written as simple abstractions over Cedar expressions, which themselves cannot call other Cedar functions, have none of these problems. They naturally inherit the properties of Cedar expressions. Cedar functions are guaranteed to terminate and be deterministic. Since they are compositions of Cedar expressions, it’s easy to validate and analyze them. +In contrast, function macros written as simple abstractions over Cedar expressions, which themselves cannot call other Cedar functions, have none of these problems. They naturally inherit the properties of Cedar expressions. Cedar functions are guaranteed to terminate and be deterministic. Since they are compositions of Cedar expressions, it’s easy to validate and analyze them. 
## Detailed Design -### Function declarations +### Function Macro declarations -This RFC adds a new top level form to Cedar policysets: the function declaration. +This RFC adds a new top level form to Cedar policysets: the function macro declaration. -A Cedar function declaration is composed of three elements: +A Cedar function macro declaration is composed of three elements: 1. A name, which is a valid (possibly namespaced) identifier. 2. A list of parameters, each of which is a non-namespaced identifier preceded by a `?`. @@ -96,9 +96,9 @@ An unused variable is a syntax warning. Function and variable names share the same namesapce, with standard lexical scoping. Inside of a function, any function application (see below) that does not resolve to an extension function or built-in operation is an error. In other words, Cedar functions are not permitted to call other Cedar functions. -### Function applications (a.k.a. function calls) +### Function macro applications (a.k.a. function calls) -A function application has the same syntax as an extension function constructor application. In particular, a function application is composed of two elements: +A function macro application has the same syntax as an extension function constructor application. In particular, a function application is composed of two elements: 1. The function name (potentially namespaced) 2. A comma separated list of arguments, which are arbitrary cedar expressions. 
From f84b58a3963d7ebffe341a2d1a21e290023e5498 Mon Sep 17 00:00:00 2001 From: Mike Hicks Date: Fri, 22 Mar 2024 11:38:30 -0400 Subject: [PATCH 09/17] Added more rationale for CBN; simplification for naming --- text/0061-functions.md | 49 ++++++++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index 55be3435..bdda50e8 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -106,14 +106,14 @@ A function macro application has the same syntax as an extension function constr Here are some examples: ``` -foo(1,2, principal.name) +foo(1, 2, principal.name) bar(1 + 1, resource.owner in principal) baz(some_other_function(3)) ``` Function arguments are lazily evaluated (i.e., call-by-name), as opposed to extension functions today. Call-by-name is required to support inlining correctly in the presence of errors. -(Without errors, call-by-name and call-by-value are equivalent in Cedar) +(Without errors, call-by-name and call-by-value should be equivalent.) Functions do not exist at run-time -- they cannot be stored in entity attributes, requests, etc. As such, referencing a function by its name, without calling it, is a syntax error. Arguments for each of a function's declared parameters, and no more, must be provided at the call, or it's a syntax error. Other errors (such as type errors or overflow) are detected at runtime, or via validation. @@ -145,17 +145,14 @@ permit(principal,action,resource) when { All functions in a policyset are in scope for all policies, i.e., function declarations do not need to lexically precede use. Cedar policies and function bodies are lexically scoped. -`principal`/`action`/`resource`/ `context` are considered to be more tightly bound then function names. -This means that while you could name a function `principal`, you could never call it. 
(This should probably be a validator warning) -Function name conflicts at the top level are a parse error. -Function names may shadow extension functions (results in a warning). +We forbid defining functions with names `principal`, `action`, `resource`, or `context`, to avoid conflicts and confusion with global variable names. ### Formal semantics/Desugaring rules This RFC does not add any evaluation rules to Cedar, as functions can be completely desugared. Dusugaring proceeds from the innermost function call to avoid hygiene issues. If $f$ is the name of a declared function, _def_($f$) is the body of the definition, and $p_1, ..., p_n$ is the list of parameters in the definition. Let $e_1, ..., e_n$ be a list of Cedar expression that do not contain any Cedar Function calls: -$f(e_1, ..., e_n) \rightarrow$ _def_$(f) [p_1 \mapsto e_1, ..., p_n \mapsto e_n]$ +$f(e_1, ..., e_n) \rightarrow$ _def_ $(f) [p_1 \mapsto e_1, ..., p_n \mapsto e_n]$ Where $e[x \mapsto e']$ means to substitute $e'$ for $x$ in $e$, as usual. ### Formal grammar @@ -278,24 +275,40 @@ may be impossible due to the use of polymorphic functions. ### Call By Value Functions could be call-by-value instead of call-by-name. -In general, this would follow principal of least surprise. -Extension functions are call-by-value, and few popular languages have call-by-name constructs. -The simplicity of validation and analysis and pluses for CBN, but the big problem is enabling inlining for execution. +In general, this would follow principle of least surprise. +Extension functions are call-by-value, and few popular languages have call-by-name functions. +However, CBV without further changes would lead to validation soundness issues. In particular, consider the following. 
+``` +function drop(a,b) { a }; +permit(principal,action,resource) +when { drop(true,1+"hello") }; +``` +This policy will validate because once we've inlined the function we'd have +``` +permit(principal,action,resource) +when { true }; +``` +But if we actually do CBV evaluation then this policy will fail because `1+"hello"` will fail. We could solve this problem by validating the argument expressions of a function call individually, in addition to validating the entire policy after inlining. But that's extra work. There's also the same problem with analysis: Our logical encoding would have to do CBV to be consistent with the actual evaluator, rather than just inlining, or else we need to prove that eager-eval(e) = lazy-eval(e) for all validated e. + +CBN also has the benefit that it's more powerful. For example, you could not define `implies` as follows if we had CBV: +``` +def implies(e1,e2) { + !e1 || (e1 && e2) +} +permit(principal,action,resource) when { + implies(principal has attr, + resource has attr && principal.attr == resource.attr) +}; +``` +The above won't work with CBV because the second expression to the call to `implies` will fail if `principal.attr` does not exist. ### Let functions call other functions As long as cycles are forbidden and functions as arguments are disallowed, we could allow functions to call other functions without sacrificing termination. However, the potential complexity explosion is high, and it's backwards compatible to add this later. - - ### Naming -Should these really be called `function`s? They are actually `macro`s. `snippet`? +Should these really be called `function`s? They are essentially macros. Call them that?
## Unresolved Questions From 67e08439aea1cb2a6096d31af5564f51bbd87267 Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Fri, 22 Mar 2024 12:33:24 -0400 Subject: [PATCH 10/17] Justification for macros --- text/0061-functions.md | 49 ++++++++++++++++++++++++++++++------------ 1 file changed, 35 insertions(+), 14 deletions(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index bdda50e8..d412ec13 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -12,7 +12,7 @@ ## Summary This RFC proposes to support user-defined function-like macros in Cedar. -We call these Cedar Function Macros. +We call these Cedar Function Macros (macros for short). Cedar Function Macros provide a lightweight mechanism for users to create abstractions in their policies, aiding readability and reducing the chance for errors. Cedar Function Macros have restrictions to ensure termination and efficiency, maintain the feasibility of validation and analysis, and help ensure policies are readable. An important implementation note is that Cedar policies may be implemented purely through desugaring. @@ -55,7 +55,7 @@ when { }; ``` -For simplicity, safety, and readability, Cedar functions cannot call other functions, and cannot take functions as arguments. +For simplicity, safety, and readability, Cedar macros cannot call other macros, and cannot take macros as arguments. ## Motivation @@ -111,11 +111,11 @@ bar(1 + 1, resource.owner in principal) baz(some_other_function(3)) ``` -Function arguments are lazily evaluated (i.e., call-by-name), as opposed to extension functions today. +Macro arguments are lazily evaluated (i.e., call-by-name), as opposed to extension functions today. Call-by-name is required to support inlining correctly in the presence of errors. (Without errors, call-by-name and call-by-value should be equivalent.) -Functions do not exist at run-time -- they cannot be stored in entity attributes, requests, etc. 
As such, referencing a function by its name, without calling it, is a syntax error. -Arguments for each of a function's declared parameters, and no more, must be provided at the call, or it's a syntax error. +Macros do not exist at run-time -- they cannot be stored in entity attributes, requests, etc. As such, referencing a function by its name, without calling it, is a syntax error. +Arguments for each of a macro's declared parameters, and no more, must be provided at the call, or it's a syntax error. Other errors (such as type errors or overflow) are detected at runtime, or via validation. Examples: ``` @@ -143,14 +143,17 @@ permit(principal,action,resource) when { ### Namespacing/scoping -All functions in a policyset are in scope for all policies, i.e., function declarations do not need to lexically precede use. -Cedar policies and function bodies are lexically scoped. +All macros in a policyset are in scope for all policies, i.e., function declarations do not need to lexically precede use. +Cedar policies and macro bodies are lexically scoped. +`principal`/`action`/`resource`/ `context` are considered to be more tightly bound then function names. +Macro name conflicts at the top level are a parse error. +Macro names may shadow extension functions (results in a warning). We forbid defining functions with names `principal`, `action`, `resource`, or `context`, to avoid conflicts and confusion with global variable names. ### Formal semantics/Desugaring rules -This RFC does not add any evaluation rules to Cedar, as functions can be completely desugared. -Dusugaring proceeds from the innermost function call to avoid hygiene issues. -If $f$ is the name of a declared function, _def_($f$) is the body of the definition, and $p_1, ..., p_n$ is the list of parameters in the definition. +This RFC does not add any evaluation rules to Cedar, as macros can be completely desugared. +Dusugaring proceeds from the innermost macro call to avoid hygiene issues. 
+If $f$ is the name of a declared macro, _def_($f$) is the body of the definition, and $p_1, ..., p_n$ is the list of parameters in the definition. Let $e_1, ..., e_n$ be a list of Cedar expression that do not contain any Cedar Function calls: $f(e_1, ..., e_n) \rightarrow$ _def_ $(f) [p_1 \mapsto e_1, ..., p_n \mapsto e_n]$ Where $e[x \mapsto e']$ means to substitute $e'$ for $x$ in $e$, as usual. @@ -158,7 +161,7 @@ Where $e[x \mapsto e']$ means to substitute $e'$ for $x$ in $e$, as usual. ### Formal grammar The grammar of Cedar expressions is unchanged, as we re-use the existing call form. ``` -function ::= 'def' Path '(' Params? ')' Expr ';' +macro ::= 'def' Path '(' Params? ')' Expr ';' Params ::= ParamIdent (',' ParamIdent)? ','? ParamIdent ::= '?' IDENT // These are equivalent to the production rule for template slots ``` @@ -166,8 +169,8 @@ Note the `Params` non-terminal allows trailing commas in parameter lists. ### Validation and Analysis -The validator typechecks functions at use-sites via inlining. -This means that functions can be polymorphic. For example, one could write the following, where `eq` is used in the first instance with a pair of entities, and in the second instance with a pair of strings. +The validator typechecks macros at use-sites via inlining. +This means that macros can be polymorphic. For example, one could write the following, where `eq` is used in the first instance with a pair of entities, and in the second instance with a pair of strings. ``` def eq(?a, ?b) ?a == ?b ; @@ -183,7 +186,7 @@ Likewise, any static analysis tools would work via inlining. ## Drawbacks ### Redundancy -Cedar functions can only accomplish things that can already accomplished with Cedar expressions. +Cedar macros can only accomplish things that can already accomplished with Cedar expressions. This means we are not expanding the expressive power of Cedar in any way. We are also adding more than one way to accomplish the same task. 
Bringing back our SemVer example: @@ -227,8 +230,26 @@ Of course, these drawbacks do not necessarily speak against Cedar functions gene A policy can no longer be read by itself, it has to be read in the context of all function definitions it uses. Policies that use a large number of functions may be hard to read. + ## Alternatives +### Naming: Are these macros or are these functions? +The feature described in this RFC could be interpreted as either Macros or as Call-By-Name pure functions. +They are equivalent in this context. +This raises the question of what we should call them. +The RFC opts for Macros, and this sections details the pros and cons of each. + +Pros of calling them "Functions": + +* Developers are probably more familiar with the concepts of functions than of macros. Many modern languages lack a macro facility (Javascript, Java, Python, Ruby), and the feature looks like normal function calls. +* Macros mean different things in different languages: C/C++/Rust macros allow you to edit the token stream, whereas this RFC's feature can only operate on fully parsed ASTs, and can only produce fully parsed ASTs. In addition, we provide no way to pattern match/pull features out of the argument asts. +* Macros may have a poor reputation as producing impossible to read code/error messages. Most languages with macros have the guidance to avoid them if possible. + +Pros of calling them "Macros": +* While some mainstream languages lake a macro facility, all mainstream languages lack Call-By-Name functions. Most readers are used to CBV languages. So they inherently assume (having not actually read the docs) that for functions when you write f(1+2,principal.id like "foo") then you will evaluate 1+2 and then principal.id like "foo", and then call the function with the results. They will not imagine that inlining will happen, and that short-circuiting and whatnot can change the results. By calling them macros, we help point out this distinction. 
+* Regardless of the particular macro implementation (C/Rust/Whatever), users who are familiar with macros will understand that macros do not evaluate their arguments. For macros they know that when you write f(1+2,principal.id like "foo") you are substituting full expressions 1+2 and principal.id like "foo" into the body, you are not evaluating the them first. + + ### Type annotations We could require/allow Cedar functions to have type annotations, taking type definitions from either the schema or allowing them to be inline. From d8197311ab6765c22603c1397902fccdbb4c84e2 Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Fri, 22 Mar 2024 13:29:05 -0400 Subject: [PATCH 11/17] Update text/0061-functions.md Co-authored-by: Kesha Hietala --- text/0061-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index d412ec13..67904f66 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -93,7 +93,7 @@ def name(?param1, ?param2) body ; Use of an unbound variable in the body is a syntax error. A parameter list may not declare the same variable twice, and may not list any standard Cedar variables. An unused variable is a syntax warning. -Function and variable names share the same namesapce, with standard lexical scoping. +Function and variable names share the same namespace, with standard lexical scoping. Inside of a function, any function application (see below) that does not resolve to an extension function or built-in operation is an error. In other words, Cedar functions are not permitted to call other Cedar functions. ### Function macro applications (a.k.a. 
function calls) From c677d118b174cab72dc9c600f4d1e58564f4d34f Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Fri, 22 Mar 2024 13:38:23 -0400 Subject: [PATCH 12/17] Update text/0061-functions.md Co-authored-by: Kesha Hietala --- text/0061-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index 67904f66..d2d68a5e 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -33,7 +33,7 @@ when { Here it is instead represented with functions ``` -def semver(?major, ?minor, ?long) +def semver(?major, ?minor, ?patch) { major : ?major, minor : ?minor, patch : ?patch } ; From e84bcb9fa626b5b144b1b28d5e98ebc13cd53e34 Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Fri, 22 Mar 2024 13:38:50 -0400 Subject: [PATCH 13/17] Update text/0061-functions.md Co-authored-by: Kesha Hietala --- text/0061-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index d2d68a5e..0618f4a7 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -72,7 +72,7 @@ In contrast, function macros written as simple abstractions over Cedar expressio ### Function Macro declarations -This RFC adds a new top level form to Cedar policysets: the function macro declaration. +This RFC adds a new top level form to Cedar policy sets: the function macro declaration. 
A Cedar function macro declaration is composed of three elements: From 6af017a48c477f5b16e6c60bed42dbdc2853a646 Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Fri, 22 Mar 2024 15:00:29 -0400 Subject: [PATCH 14/17] Template discussion --- text/0061-functions.md | 39 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index 0618f4a7..536b216b 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -183,6 +183,43 @@ when { Likewise, any static analysis tools would work via inlining. +### Templates +Cedar macros use the same notation for their variables as do templates, since in both cases variables are "holes" that are filled in with a Cedar expression to make a full construct -- for templates, the construct is a policy, for macros it is an expression. + +Templates can include calls to macros. For example: + +``` +def semver(?major, ?minor, ?patch) ... // same as initial example +def semverGT(?lhs, ?rhs) ... // same as initial example +permit ( + principal == ?principal, + action == Action::"view", + resource in ?resource) +when { + semverGT(resource.apiVersion, semver(2,1,0)) +}; +``` + +When we link this template, we get a full policy, as usual. + +A macro parameter can end up having the same name as a template variable, with the effect that it shadows it, which is expected with normal lexical scoping: + +``` +def isOwner(?principal,?resource) ?principal == ?resource.owner; +permit(principal,action,resource in ?resource) +when { isOwner(principal,resource) }; +``` + +Here, notice that the isOwner macro's reference to ?resource is to its parameter, not to the template slot. (We'd recommend to users to choose different parameter names to avoid confusion.) + +As already mentioned, you cannot refer to a parameter in a macro that it does not bind, which enforces hygiene. So the following is not allowed. 
+ +``` +def isOwner() ?principal == ?resource.owner; +// ERROR: cannot refer to unbound variables ?principal, ?resource +``` + + ## Drawbacks ### Redundancy @@ -246,7 +283,7 @@ Pros of calling them "Functions": * Macros may have a poor reputation as producing impossible to read code/error messages. Most languages with macros have the guidance to avoid them if possible. Pros of calling them "Macros": -* While some mainstream languages lake a macro facility, all mainstream languages lack Call-By-Name functions. Most readers are used to CBV languages. So they inherently assume (having not actually read the docs) that for functions when you write f(1+2,principal.id like "foo") then you will evaluate 1+2 and then principal.id like "foo", and then call the function with the results. They will not imagine that inlining will happen, and that short-circuiting and whatnot can change the results. By calling them macros, we help point out this distinction. +* While some mainstream languages lack a macro facility, all mainstream languages lack Call-By-Name functions. Most readers are used to CBV languages. So they inherently assume (having not actually read the docs) that for functions when you write f(1+2,principal.id like "foo") then you will evaluate 1+2 and then principal.id like "foo", and then call the function with the results. They will not imagine that inlining will happen, and that short-circuiting and whatnot can change the results. By calling them macros, we help point out this distinction. * Regardless of the particular macro implementation (C/Rust/Whatever), users who are familiar with macros will understand that macros do not evaluate their arguments. For macros they know that when you write f(1+2,principal.id like "foo") you are substituting full expressions 1+2 and principal.id like "foo" into the body, you are not evaluating the them first. 
From 32ac7eb47c761a8e95a788a5c6dedc560a486812 Mon Sep 17 00:00:00 2001 From: Mike Hicks Date: Sat, 23 Mar 2024 16:00:01 -0400 Subject: [PATCH 15/17] full pass to tidy text and bring it up-to-date with discussion threads Signed-off-by: Mike Hicks --- text/0061-functions.md | 231 +++++++++++++++++------------------------ 1 file changed, 97 insertions(+), 134 deletions(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index 536b216b..6849abf4 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -1,4 +1,4 @@ -# User-defined Function Macros +# User-defined Macros ## Related Issues and PRs @@ -11,15 +11,13 @@ ## Summary -This RFC proposes to support user-defined function-like macros in Cedar. -We call these Cedar Function Macros (macros for short). -Cedar Function Macros provide a lightweight mechanism for users to create abstractions in their policies, aiding readability and reducing the chance for errors. -Cedar Function Macros have restrictions to ensure termination and efficiency, maintain the feasibility of validation and analysis, and help ensure policies are readable. -An important implementation note is that Cedar policies may be implemented purely through desugaring. +This RFC proposes to support user-defined, function-like macros in Cedar. +Cedar macros provide a lightweight mechanism for users to create abstractions in their policies, aiding readability and reducing the chance for errors. +Cedar macros have restrictions to ensure termination and efficiency, maintain the feasibility of validation and analysis, and help ensure policies are readable. ### Basic Example -Looking at the linked issue, the *semantic versioning* (SemVer) use case is a perfect use-case for functions. SemVer can be trivially encoded within Cedar’s existing expression language. 
+Looking at the linked issue [#637](https://github.com/cedar-policy/cedar/issues/637), *semantic versioning* (SemVer) is a perfect use-case for macros: it can be trivially encoded within Cedar’s existing expression language, but doing so would make policies hard to read. ``` // Permit access when the api is greater than 2.1 @@ -30,14 +28,15 @@ when { }; ``` -Here it is instead represented with functions +Here is the same policy expressed using macros. ``` +// A semver has three version components: major, minor, and patch def semver(?major, ?minor, ?patch) { major : ?major, minor : ?minor, patch : ?patch } ; -// lhs > rhs ? +// Is the semver in the first parameter newer than (>) the second ? def semverGT(?lhs, ?rhs) if ?lhs.major == ?rhs.major then if ?lhs.minor == ?rhs.minor then @@ -59,61 +58,69 @@ For simplicity, safety, and readability, Cedar macros cannot call other macros, ## Motivation -Cedar currently lacks mechanisms for users to build abstractions. The only existing mechanism is extension functions, which are insufficient for two reasons: +Cedar currently lacks adequate mechanisms for users to build abstractions. The only existing mechanism is extension functions, which are limited for two reasons: -1. They are extremely heavyweight, requiring modifying the source of the Cedar evaluator. This means that users who want to stay on official versions of Cedar have no choice but to attempt to submit a PR and get it accepted into the mainline. This process does not scale. - 1. For data structures that are relatively standard (ex: SemVer, or OID Users as proposed in [RFC 58](https://github.com/cedar-policy/rfcs/blob/cdisselkoen/standard-library/text/0058-standard-library.md)), it’s hard to know what’s in-demand enough to be included, and how to balance that against avoiding bloat. There’s no way to naturally observe usage because the only way to “install” the extension pre-acceptance is to vend a modified version of Cedar. - 2. 
Users may have data structures that are totally bespoke to their systems. It makes no sense to include these in the standard Cedar distribution at all, yet users may still want some way to build abstractions. +1. They are extremely heavyweight, requiring modifications to the Cedar evaluator. This means that users who want to stay on official versions of Cedar have no choice but to attempt to submit a PR and get it accepted into the mainline. This process does not scale. + 1. For data structures that are relatively standard (ex: SemVer, or OID Users as proposed in [RFC 58](https://github.com/cedar-policy/rfcs/blob/cdisselkoen/standard-library/text/0058-standard-library.md)), it’s hard for the Cedar maintainers to know what’s in-demand enough to be included, and how to balance that against avoiding bloat. There’s no way to naturally observe usage because the only way to “install” the extension pre-acceptance is to vend a modified version of Cedar. + 2. Users may have data structures that are totally bespoke to their systems. It makes no sense to include these in the standard Cedar distribution, yet users may still want some way to use them easily. 2. They are too powerful. Extensions are implemented via arbitrary Rust code, which is essential for encoding features that cannot be represented via Cedar expressions (such as IP Addresses), but opens the door for a wide range of bugs/design issues. It’s trivial to design an extension that is difficult to validate and/or logically encode for analysis. Problematically, extension functions can potentially exhibit non-determinism, non-termination, or non-linear performance; interact with the operating system; or violate memory safety. This raises the code review burden when considering an extension function's implementation. -In contrast, function macros written as simple abstractions over Cedar expressions, which themselves cannot call other Cedar functions, have none of these problems. 
They naturally inherit the properties of Cedar expressions. Cedar functions are guaranteed to terminate and be deterministic. Since they are compositions of Cedar expressions, it’s easy to validate and analyze them. +In contrast, macros have none of these problems. Macros are written as simple abstractions over Cedar expressions, whose evaluation is always deterministic. As macros cannot call other macros, they are simple to understand and guaranteed to terminate. Since macros are compositions of Cedar expressions, it’s easy to validate and analyze them. ## Detailed Design -### Function Macro declarations - -This RFC adds a new top level form to Cedar policy sets: the function macro declaration. - -A Cedar function macro declaration is composed of three elements: - -1. A name, which is a valid (possibly namespaced) identifier. -2. A list of parameters, each of which is a non-namespaced identifier preceded by a `?`. -3. A body, which is a Cedar expression. - 1. The body may contain (non-namespaced) variables, drawn from the function's parameters. - 2. The body may not contain calls to other functions. - 3. The body may contain functions calls to builtin Cedar functions/methods or extensions calls. (ex: Things like `.contains()` and `ip("10.10.10.10")` are fine) - 4. Standard Cedar variables (`principal`, `action`, `resource`, and `context`) are *not* considered bound within the body (following the principle of macro hygiene). - - -Structurally, a declaration is written like this: +### Macro definitions +This RFC adds a new top level form to Cedar policy sets: the macro definition. +Structurally, a macro definition is written like this: ``` def name(?param1, ?param2) body ; ``` -Use of an unbound variable in the body is a syntax error. -A parameter list may not declare the same variable twice, and may not list any standard Cedar variables. -An unused variable is a syntax warning. -Function and variable names share the same namespace, with standard lexical scoping. 
-Inside of a function, any function application (see below) that does not resolve to an extension function or built-in operation is an error. In other words, Cedar functions are not permitted to call other Cedar functions. +A macro definition has three elements: +1. A name, which is a valid, possibly namespaced, identifier. +2. A list of parameter variables, each of which is a non-namespaced identifier preceded by a `?`. (Declaring the same parameter variable twice is a syntax error.) +3. A body, which is a Cedar expression. + 1. The body may refer to the macro's parameter variables. (Use of an unbound parameter variable is a syntax error. Failure to reference one of the parameter variables is a warning.) + 2. The body may not refer to standard Cedar variables `principal`, `action`, `resource`, and `context`. (Doing so ensures macros are always functions of their parameters.) + 3. The body may not contain calls to other macros. (It may contain calls to builtin or extension functions/methods, such as `.contains()` and `ip("10.10.10.10")`, as usual.) -### Function macro applications (a.k.a. function calls) +### Macro applications (a.k.a. macro calls) -A function macro application has the same syntax as an extension function constructor application. In particular, a function application is composed of two elements: +A macro application has the same syntax as an extension function constructor application. In particular, an application is composed of two elements: -1. The function name (potentially namespaced) -2. A comma separated list of arguments, which are arbitrary cedar expressions. +1. The macro name (potentially namespaced) +2. A comma-separated list of arguments, each of which is an arbitrary Cedar expression. Here are some examples: ``` foo(1, 2, principal.name) bar(1 + 1, resource.owner in principal) -baz(some_other_function(3)) +baz(ip("1.2.3.4")) ``` -Macro arguments are lazily evaluated (i.e., call-by-name), as opposed to extension functions today. 
-Call-by-name is required to support inlining correctly in the presence of errors. -(Without errors, call-by-name and call-by-value should be equivalent.) +A macro call $f(e_1, ..., e_n)$ evaluates to the macro's body but with occurrences of parameters $p_1, ..., p_n$ replaced by argument expressions $e_1, ..., e_n$, respectively. For example, considering our SemVer use-case, evaluating `semver(2,1,0)` produces the expression `{ major : 2, minor : 1, patch : 0 }`. + +Importantly, macro calls are evaluated _lazily_, thus using a _call by name_ semantics: We do _not_ evaluate the argument expressions before substituting them in the macro body. To see the effect of this, consider the following: +``` +def implies(?e1,?e2) { + if ?e1 then ?e2 else true +} +permit(principal,action,resource) when { + implies(principal has attr, + resource has attr && principal.attr == resource.attr) +}; +``` +When evaluating the `when` clause, the call to `implies` evaluates to the following +``` + if principal has attr then + resource has attr && principal.attr == resource.attr + else true +``` +Notice how the argument expression `principal has attr` has been substituted for parameter `?e1` whole-cloth, without evaluating it first (and likewise for the other parameter/argument). This means that if `principal` indeed has optional attribute `attr`, then evaluating sub-expression `principal.attr` in the `then` clause is safe. If we eagerly evaluated a macro call's argument expressions then `principal.attr` would be evaluated eagerly, producing an error if `principal has attr` turns out to be `false`. + +### Errors + Macros do not exist at run-time -- they cannot be stored in entity attributes, requests, etc. As such, referencing a function by its name, without calling it, is a syntax error. Arguments for each of a macro's declared parameters, and no more, must be provided at the call, or it's a syntax error.
Other errors (such as type errors or overflow) are detected at runtime, or via validation. @@ -122,46 +129,34 @@ Examples: def foo(?a, ?b) ?a + ?b; permit(principal,action,resource) when { - foo + 1 // Parse error -}; -permit(principal,action,resource) when { - foo(1) // Parse error + foo + 1 // Parse error -- macro foo not in a call }; permit(principal,action,resource) when { - foo(1, 2) // No error + foo(1) // Parse error -- too few arguments to foo }; permit(principal,action,resource) when { - foo(1, "hello") // Run time type error + foo(1, "hello", principal) // Parse error -- too many arguments to foo }; permit(principal,action,resource) when { - foo(1, "hello", principal) // Parse error + foo(1, "hello") // Run-time (and validation) type error }; permit(principal,action,resource) when { - bar(1, "hello", principal) // Parse error + bar(1, "hello", principal) // Parse error -- no such macro bar }; ``` ### Namespacing/scoping -All macros in a policyset are in scope for all policies, i.e., function declarations do not need to lexically precede use. -Cedar policies and macro bodies are lexically scoped. -`principal`/`action`/`resource`/ `context` are considered to be more tightly bound then function names. +All macros in a policy set are in scope for all policies, i.e., macro definitions do not need to lexically precede their use. +Macro parameter references in expressions are resolved via lexical scoping. Macro name conflicts at the top level are a parse error. Macro names may shadow extension functions (results in a warning). -We forbid defining functions with names `principal`, `action`, `resource`, or `context`, to avoid conflicts and confusion with global variable names. - -### Formal semantics/Desugaring rules -This RFC does not add any evaluation rules to Cedar, as macros can be completely desugared. -Dusugaring proceeds from the innermost macro call to avoid hygiene issues. 
-If $f$ is the name of a declared macro, _def_($f$) is the body of the definition, and $p_1, ..., p_n$ is the list of parameters in the definition. -Let $e_1, ..., e_n$ be a list of Cedar expression that do not contain any Cedar Function calls: -$f(e_1, ..., e_n) \rightarrow$ _def_ $(f) [p_1 \mapsto e_1, ..., p_n \mapsto e_n]$ -Where $e[x \mapsto e']$ means to substitute $e'$ for $x$ in $e$, as usual. +We forbid defining macros with names `principal`, `action`, `resource`, or `context` (and, as mentioned, macro bodies cannot mention these variables either), to avoid conflicts and confusion with global variable names. ### Formal grammar The grammar of Cedar expressions is unchanged, as we re-use the existing call form. ``` -macro ::= 'def' Path '(' Params? ')' Expr ';' +Macro ::= 'def' Path '(' Params? ')' Expr ';' Params ::= ParamIdent (',' ParamIdent)? ','? ParamIdent ::= '?' IDENT // These are equivalent to the production rule for template slots ``` @@ -169,8 +164,9 @@ Note the `Params` non-terminal allows trailing commas in parameter lists. ### Validation and Analysis -The validator typechecks macros at use-sites via inlining. -This means that macros can be polymorphic. For example, one could write the following, where `eq` is used in the first instance with a pair of entities, and in the second instance with a pair of strings. +The validator typechecks macros at use-sites via inlining. In particular, when validating a policy, all macro calls are replaced as described in the evaluation semantics above, prior to validating the policy. + +This approach means that macros that are defined but never used will not be validated. It also means that macros can be polymorphic. For example, one could write the following, where `eq` is used in the first instance with a pair of entities, and in the second instance with a pair of strings. ``` def eq(?a, ?b) ?a == ?b ; @@ -184,7 +180,7 @@ when { Likewise, any static analysis tools would work via inlining. 
### Templates -Cedar macros use the same notation for their variables as do templates, since in both cases variables are "holes" that are filled in with a Cedar expression to make a full construct -- for templates, the construct is a policy, for macros it is an expression. +Macros use the same syntax for their parameter variables as do templates, since in both cases variables are "holes" that are filled in with a Cedar expression to make a full construct -- for templates, the construct is a policy, for macros it is an expression. Templates can include calls to macros. For example: @@ -206,11 +202,14 @@ A macro parameter can end up having the same name as a template variable, with t ``` def isOwner(?principal,?resource) ?principal == ?resource.owner; -permit(principal,action,resource in ?resource) -when { isOwner(principal,resource) }; + +permit(principal,action,resource in ?resource) +when { + isOwner(principal,resource) +}; ``` -Here, notice that the isOwner macro's reference to ?resource is to its parameter, not to the template slot. (We'd recommend to users to choose different parameter names to avoid confusion.) +Here, notice that the `isOwner` macro's reference to `?resource` is to its parameter, not to the template slot. (We'd recommend to users to choose different parameter names to avoid confusion.) As already mentioned, you cannot refer to a parameter in a macro that it does not bind, which enforces hygiene. So the following is not allowed. @@ -259,7 +258,7 @@ These functions basically implement the equivalent of Cedar `decimal` numbers. B First, if `i` and/or `f` are outside the allowed range, you will get a Cedar overflow exception at run-time. This exception is not as illuminating as the custom error emitted by the `decimal` extension function (whose message will be `"Too many digits"`). Moreover, custom errors from constructor parameter validity checks can be emitted during validation when using extension functions, but not when using Cedar functions. 
-Second, there is no special parsing for Cedar functions. With Cedar's built-in `decimal`, you can write `decimal("123.12")` which more directly conveys the number being represented than does `mydecimal(123,12)`. +Second, there is no special parsing for Cedar functions. With Cedar's built-in `decimal`, you can write `decimal("123.12")` which more directly conveys the number being represented than does `mydecimal(123,1200)`. (Note that `mydecimal(123,12)` represents the number 123.0012, which may surprise some readers!). Of course, these drawbacks do not necessarily speak against Cedar functions generally, but suggest that for suitably general use-cases (like decimal numbers!), an extension function might be warranted instead. @@ -271,72 +270,49 @@ Policies that use a large number of functions may be hard to read. ## Alternatives ### Naming: Are these macros or are these functions? -The feature described in this RFC could be interpreted as either Macros or as Call-By-Name pure functions. -They are equivalent in this context. -This raises the question of what we should call them. -The RFC opts for Macros, and this sections details the pros and cons of each. +The feature described in this RFC could also be referred to as _functions_ with call-by-name semantics, since that is an accurate description of what they are. There are some good reasons to call them functions, instead of macros: -Pros of calling them "Functions": - -* Developers are probably more familiar with the concepts of functions than of macros. Many modern languages lack a macro facility (Javascript, Java, Python, Ruby), and the feature looks like normal function calls. -* Macros mean different things in different languages: C/C++/Rust macros allow you to edit the token stream, whereas this RFC's feature can only operate on fully parsed ASTs, and can only produce fully parsed ASTs. In addition, we provide no way to pattern match/pull features out of the argument asts.
-* Macros may have a poor reputation as producing impossible to read code/error messages. Most languages with macros have the guidance to avoid them if possible. - -Pros of calling them "Macros": -* While some mainstream languages lack a macro facility, all mainstream languages lack Call-By-Name functions. Most readers are used to CBV languages. So they inherently assume (having not actually read the docs) that for functions when you write f(1+2,principal.id like "foo") then you will evaluate 1+2 and then principal.id like "foo", and then call the function with the results. They will not imagine that inlining will happen, and that short-circuiting and whatnot can change the results. By calling them macros, we help point out this distinction. -* Regardless of the particular macro implementation (C/Rust/Whatever), users who are familiar with macros will understand that macros do not evaluate their arguments. For macros they know that when you write f(1+2,principal.id like "foo") you are substituting full expressions 1+2 and principal.id like "foo" into the body, you are not evaluating the them first. +* Developers are probably more familiar with the concept of a function than with that of a macro. Many popular languages lack a macro facility (JavaScript, Java, Python, Ruby, etc.), and the Cedar feature looks like normal function calls. Calling them macros uses a term that may seem unnecessary or unfamiliar. +* The macro facility proposed for Cedar is much more limited than macro facilities available in other languages, so the name may be misleading. For example, in C, C++, and Rust, macros can edit the syntax token stream, whereas this RFC's feature can only operate on fully parsed ASTs, and can only produce fully parsed ASTs. In addition, Cedar macros provide no way to pattern match/pull features out of the argument ASTs.
+* Convoluted uses of macros, distressingly common in older C and C++ code, have given macros somewhat of a bad reputation, so using the term macro may (wrongly) signal that the proposed Cedar facility could be problematic in ways that it is not. +Despite these downsides, we feel that on balance the term macro is more helpful than harmful: +* Mainstream languages use call-by-value for function calls, rather than call-by-name, so using the term "function" may give the wrong impression. Most readers will naturally assume (having not actually read the docs) that when you write `f(1+2,principal.id like "foo")` you will evaluate `1+2` and then `principal.id like "foo"`, and then call `f` with the results. They may not suspect call-by-name semantics or appreciate its potentially surprising and powerful effects, as described for the `implies` example above. Thus they may end up writing incorrect policies. +* Those familiar with macros (from C/C++ especially) will properly guess the call-by-name semantics because macro calls in existing languages are call-by-name. Those unfamiliar with macros will still be alerted, by the name, that calls may have different semantics than they expect. ### Type annotations -We could require/allow Cedar functions to have type annotations, taking type definitions from either the schema or allowing them to be inline. - -Example: - -Schema: +We could require/allow Cedar macros to have type annotations, taking type definitions from either the schema or allowing them to be inline. Here is our SemVer example with schema: ``` type SemVer = { major : Long, minor : Long, patch : Long }; ``` -Policy: +and policy: ``` def semver(?major : Long, ?minor : Long, ?patch : Long) -> Semver { major : ?major, minor : ?minor, patch : ?patch } ; ``` -This would allow functions to typechecked in absence of a use-site, and allow for easier user specification of intent. -It would also probably result in clearer type checker error messages.
- -If we allowed `type` declarations in policy set files, it would just be one file: - -``` -type Semver = { major : Long, minor : Long, patch : Long }; -def semver(?major : Long, ?minor : Long, ?patch : Long) -> Semver - major : ?major, minor : ?minor, patch : ?patch -; -``` +Using type annotations would allow macros to be typechecked independently of policies that use them, and would add checked documentation of intent. +Doing so may also make it easier to provide clear validation error messages. -This introduces the following questions: +But introducing type annotations for macros raises several questions. 1. Do we allow `type` declarations allowed in policy sets, or just in schemas? -2. Are type annotations on functions enforced dynamically à la Contracts, or are they just ignored at runtime? +2. Are type annotations on functions enforced dynamically à la "contracts," or are they just ignored at runtime? 1. If they are dynamically enforced, that implies access to the schema to unfold type definitions. It also may introduce redundant type checking. 3. Are type annotations required or optional? 4. Will we have types to support generics, i.e., polymorphism? 5. Will type annotations have support for parametric polymorphism/generics? +Leaving type annotations out of this RFC precludes only one future design decision, which is making type annotations required. +It seems unlikely we would want to enforce such a requirement, as we have designed Cedar to be useful even when not using a schema and validation. +Adding _optional_ type annotations in the future is backwards compatible. -Leaving them out for this RFC only precludes only one future design decision, making type annotations required. -This decision feels unlikely, as we want Cedar to be useful without using a schema/the validator. -Adding _optional_ type annotations in the future is backwards compatible, but mandating type annotations will not be.
-It will at minimum require a syntax change to add the annotations, and -may be impossible due to the use of polymorphic functions. - -### Call By Value -Functions could be call-by-value instead of call-by-name. -In general, this would follow principle of least surprise. -Extension functions are call-by-value, and few popular languages have call-by-name functions. +### Call-by-value macro calls +Macro calls could use call-by-value (CBV) instead of call-by-name (CBN) semantics. +Doing so might match user expectations: Mainstream languages use call by value, and extension functions in Cedar are call by value as well. -However, CBV without further changes would lead to validation soundness issues. In particular, consider the following. +However, using CBV would lead to validation soundness issues if we continued to perform validation by inlining calls. In particular, consider the following. ``` function drop(a,b) { a }; permit(principal,action,resource) @@ -347,28 +323,15 @@ This policy will validate because once we've inlined the function we'd have permit(principal,action,resource) when { true }; ``` -But if we actually do CBV evaluation then this policy will fail because `1+"hello"` will fail. We could solve this problem by validating the argument expressions of a function call individually, in addition to validating the entire policy after inlining. But that's extra work. There's also the same problem with analysis: Our logical encoding would have to do CBV to be consistent with the actual evaluator, rather than just inlining, or else we need to prove that eager-eval(e) = lazy-eval(e) for all validated e. - -CBN also has the benefit that it's more powerful.
For example, you could not define `implies` as follows if we had CBV: -``` -def implies(e1,e1) { - !e1 || (e1 && e2) -} -permit(principal,action,resource) when { - implies(principal has attr, - resource has attr && principal.attr == resource.attr) -}; -``` -The above won't work with CBV because the second expression to the call to `implies` will fail if `principal.attr` does not exist. +But if we actually do CBV evaluation then this policy will fail because `1+"hello"` will fail. We believe we could solve this problem by validating the argument expressions of a function call individually, in addition to validating the entire policy after inlining. But that's extra work. There's also the same problem with analysis: Our logical encoding would have to do CBV to be consistent with the actual evaluator, rather than just inlining, or else we need to prove that eager-eval(e) = lazy-eval(e) for all validated e. -### Let functions call other functions -As long as cycles are forbidden and functions as arguments are disallowed, we could allow functions to call other functions without sacrificing termination. -However, the potential complexity explosion is high, and it's backwards compatible to add this later. +CBN also has the benefit that it's more powerful. The `implies` example introduced earlier won't work with CBV because the second argument of the call to `implies` will fail if `principal.attr` does not exist. -### Naming -Should these really be called `function`s? They are essentially macros. Call them that? +### Let macros call other macros +As long as cycles are forbidden and macros as arguments to other macros are disallowed, we could allow macros to call other macros while still ensuring termination and determinism. +However, doing so could make macros harder to read and would add some complexity to the implementation. +Calls between macros are something we can always add later. ## Unresolved Questions -1. Do any functions ship with Cedar? If so, which ones?
Leaving that out of this RFC and proposing it's handled equivalently to [RFC 58](https://github.com/cedar-policy/rfcs/pull/58) -2. Can you import functions? Leaving that for a future RFC. +We are not proposing how to _manage_ sets of macro definitions. It would be natural to want a standard library of macro definitions, and/or a way to import particular sets of macros from a third-party library. We leave the question of macro distribution and management (and standard libraries) for another RFC (e.g., [RFC 58](https://github.com/cedar-policy/rfcs/pull/58)). \ No newline at end of file From 9efba90e445b79cba8f1a22f1b013df542152262 Mon Sep 17 00:00:00 2001 From: Mike Hicks Date: Mon, 25 Mar 2024 09:06:05 -0400 Subject: [PATCH 16/17] added drawback, alternative (with new example) Signed-off-by: Mike Hicks --- text/0061-functions.md | 77 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 76 insertions(+), 1 deletion(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index 6849abf4..248e7062 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -262,13 +262,88 @@ Second, there is no special parsing for Cedar functions. With Cedar's built-in ` Of course, these drawbacks do not necessarily speak against Cedar functions generally, but suggest that for suitably general use-cases (like decimal numbers!), an extension function might be warranted instead. +### Hidden performance costs +Macros allow users to write policies whose size after expansion is exponential in their pre-expansion size. Here is a pathological example. + +``` +def double(?x) { left: ?x, right: ?x }; + +permit(principal,action,resource) when { + double(double(double(double({})))) has left +}; +``` + +With 4 nested calls to `double`, we've created a parse tree (and a runtime value) of size $2^4$. + +This is a fundamental issue: Macros provide readability through compactness, and in so doing they hide real costs.
This is true of any programming language abstraction, and we see this tension play out in many contexts. + +One mitigation is to expose the hidden costs, when needed. For example: +- A policy authoring tool could reveal the post-expansion policy size to users, compared to the pre-expansion size, and could warn users in pathological cases like the above. +- Services storing user-provided Cedar policies that wish to protect themselves can impose bounds based on the post-expansion policy size (shared with users), rather than the size of the policy as written. + +Ultimately, this RFC takes the position that it is better to give the tool of macros to users to improve the readability and maintainability of policies, than it is to withhold that tool for fear that they could misuse it. Without the tool of macros, users must "implement" their effects by cut-and-paste, which still blows up policy size while also making policies harder to read and maintain. + ### Readability: Policies are no longer standalone A policy can no longer be read by itself, it has to be read in the context of all function definitions it uses. Policies that use a large number of functions may be hard to read. - ## Alternatives +### Allow Cedar variables to appear free in macros + +This RFC requires that Cedar variables `principal`, `resource`, etc. not appear in a macro body. Allowing them to appear could make macros easier to read.
For example, consider this policy: +``` +def hasTagsForRole(?principal,?resource,?role,?tag) + ?principal.taggedRoles has ?role && + ?principal.taggedRoles[?role] has ?tag && + ?resource.tags has ?tag && + ?principal.taggedRoles[?role][?tag].containsAll(?resource.tags[?tag]) +; + +permit(principal in Group::"Role-A", action, resource) when { + hasTagsForRole(principal,resource,"Role-A","country") && + hasTagsForRole(principal,resource,"Role-A","job-family") && + hasTagsForRole(principal,resource,"Role-B","task") +}; +``` +In this scenario, a `principal` has a `taggedRoles` attribute that collects the tags associated with each of its roles, e.g., `principal.taggedRoles["Role-A"]["country"]` is a set, if present, that contains the values of the `country` tag associated with `Role-A`. The policy authorizes access to a resource when the principal contains all the values the resource has for the tags `country` and `job-family`, which are associated with `Role-A`, and `task`, which is associated with `Role-B`. + +This example is a good use-case for macros: Without them a policy author would need to cut-and-paste the `hasTagsForRole` part, producing a policy that is much harder to read and maintain: +``` +permit(principal in Group::"Role-A", action, resource) when { + principal.taggedRoles has "Role-A" && + principal.taggedRoles["Role-A"] has "country" && + resource.tags has "country" && + principal.taggedRoles["Role-A"]["country"].containsAll(resource.tags["country"]) + && + principal.taggedRoles has "Role-A" && + principal.taggedRoles["Role-A"] has "job-family" && + resource.tags has "job-family" && + principal.taggedRoles["Role-A"]["job-family"].containsAll(resource.tags["job-family"]) + && + principal.taggedRoles has "Role-B" && + principal.taggedRoles["Role-B"] has "task" && + resource.tags has "task" && + principal.taggedRoles["Role-B"]["task"].containsAll(resource.tags["task"]) +}; +``` +But it could potentially be even easier to read, and a little less error prone, if we allowed `principal` and `resource` to appear free in
the macro definition: +``` +def hasTagsForRole(?role,?tag) + principal.taggedRoles has ?role && + principal.taggedRoles[?role] has ?tag && + resource.tags has ?tag && + principal.taggedRoles[?role][?tag].containsAll(resource.tags[?tag]) +; + +permit(principal in Group::"Role-A", action, resource) when { + hasTagsForRole("Role-A","country") && + hasTagsForRole("Role-A","job-family") && + hasTagsForRole("Role-B","task") +}; +``` +This version makes it clearer that the macro is really only a function of `?role` and `?tag` -- the `principal` and `resource` part should always be the policy's `principal` and `resource`, so authors should not be forced to abstract over them only to fill in the same variables by rote at every call. + ### Naming: Are these macros or are these functions? The feature described in this RFC could also be referred to _functions_ with call-by-name semantics, since that is an accurate description of what they are. There are some good reasons to call them functions, instead of macros: From fb4f046efc946371ab55b687d423a396bc599341 Mon Sep 17 00:00:00 2001 From: Aaron Eline Date: Thu, 28 Mar 2024 11:34:42 -0400 Subject: [PATCH 17/17] Update text/0061-functions.md (function -> macro) Co-authored-by: Andrew Wells <130512013+andrewmwells-amazon@users.noreply.github.com> --- text/0061-functions.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/text/0061-functions.md b/text/0061-functions.md index 248e7062..8cafdd4f 100644 --- a/text/0061-functions.md +++ b/text/0061-functions.md @@ -121,7 +121,7 @@ Notice how the argument expression `principal has attr` has been substituted for ### Errors -Macros do not exist at run-time -- they cannot be stored in entity attributes, requests, etc.
As such, referencing a macro by its name, without calling it, is a syntax error. Arguments for each of a macro's declared parameters, and no more, must be provided at the call, or it's a syntax error. Other errors (such as type errors or overflow) are detected at runtime, or via validation. Examples: @@ -238,9 +238,9 @@ This policy does accomplish the user's goal of encoding the SemVer relationship. The problem is readability. A reader of this policy has to work out that it is implementing the standard semantic version comparison operator, and not some bespoke versioning scheme. Another problem is maintainability. If you have multiple policies that reason about SemVers, you have to repeat the logic, inline, in each policy. This violates basic software engineering tenets (Don’t Repeat Yourself), allowing bugs to sneak in via typos or botched copy/paste. ### No custom parsing or error-handling -Extension functions provide more full-featured constructors through custom parsing and error handling, but Cedar functions provide no such facilities. This may make them harder to read, write, and understand. +Extension functions provide more full-featured constructors through custom parsing and error handling, but Cedar macros provide no such facilities. This may make them harder to read, write, and understand. -For example, you could encode a decimal number as a `Long`, and then make Cedar functions to construct and compare decimals: +For example, you could encode a decimal number as a `Long`, and then make Cedar macros to construct and compare decimals: ``` def mydecimal(?i,?f) if ?f >= 0 && ?f <= 9999 then @@ -253,14 +253,14 @@ def mydecimalLTE(?d,?e) { ?d <= ?e // d and e are just Long numbers } ``` -These functions basically implement the equivalent of Cedar `decimal` numbers. But the approach has at least two drawbacks. +These macros basically implement the equivalent of Cedar `decimal` numbers. But the approach has at least two drawbacks. 
First, if `i` and/or `f` are outside the allowed range, you will get a Cedar overflow exception at run-time. This exception is not as illuminating as the custom error emitted by the `decimal` extension function (whose message will be `"Too many digits"`). -Moreover, custom errors from constructor parameter validity checks can be emitted during validation when using extension functions, but not when using Cedar functions. +Moreover, custom errors from constructor parameter validity checks can be emitted during validation when using extension functions, but not when using Cedar macros. -Second, there is no special parsing for Cedar functions. With Cedar's built-in `decimal`, you can write `decimal("123.12")` which more directly conveys the number being represented than does `mydecimal(123,1200)`. (Note that `mydecimal(123,12)` represents the number 123.0012, which may surprised some readers!). +Second, there is no special parsing for Cedar macros. With Cedar's built-in `decimal`, you can write `decimal("123.12")` which more directly conveys the number being represented than does `mydecimal(123,1200)`. (Note that `mydecimal(123,12)` represents the number 123.0012, which may surprise some readers!). -Of course, these drawbacks do not necessarily speak against Cedar functions generally, but suggest that for suitably general use-cases (like decimal numbers!), an extension function might be warranted instead. +Of course, these drawbacks do not necessarily speak against Cedar macros generally, but suggest that for suitably general use-cases (like decimal numbers!), an extension function might be warranted instead. ### Hidden performance costs Macros allow users to write policies whose size after expansion is exponential in their pre-expansion size. Here is a pathological example. @@ -284,8 +284,8 @@ One mitigation is to expose the hidden costs, when needed.
For example: Ultimately, this RFC takes the position that it is better to give the tool of macros to users to improve the readability and maintainability policies, than it is to withhold that tool for fear that they could misuse it. Without the tool of macros, users must "implement" their effects by cut-and-paste, which still blows up policy size while also making policies harder to read and maintain. ### Readability: Policies are no longer standalone -A policy can no longer be read by itself, it has to be read in the context of all function definitions it uses. -Policies that use a large number of functions may be hard to read. +A policy can no longer be read by itself, it has to be read in the context of all macro definitions it uses. +Policies that use a large number of macros may be hard to read. ## Alternatives @@ -373,7 +373,7 @@ Doing so may also make it easier to provide clear validation error messages. But introducing type annotations for macros introduces several questions. 1. Do we allow `type` declarations allowed in policy sets, or just in schemas? -2. Are type annotations on functions enforced dynamically à la "contracts," or are they just ignored at runtime? +2. Are type annotations on macros enforced dynamically à la "contracts," or are they just ignored at runtime? 1. If they are dynamically enforced, that implies access to the schema to unfold type definitions. It also may introduce redundant type checking. 3. Are type annotations required or optional? 4. Will we have types to support generics, i.e., polymorphism?
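Reviewer note on the "Hidden performance costs" drawback in this series: the blow-up from nesting `double` can be checked with a small model of expansion by substitution. The following is only a sketch in Python, not Cedar's implementation. The tuple-based expression encoding and the names `substitute`, `size`, `DOUBLE_BODY`, and `expand_double` are hypothetical, introduced purely for illustration.

```python
# Hypothetical model: an expression is either a string (a parameter such as
# "?x", or an atom such as "{}") or a tuple (operator, child, child, ...).

def substitute(body, bindings):
    """Replace parameters in `body` with unevaluated argument expressions."""
    if isinstance(body, str):
        return bindings.get(body, body)
    op, *children = body
    return (op, *(substitute(child, bindings) for child in children))


def size(expr):
    """Count the nodes in an expression tree."""
    if isinstance(expr, str):
        return 1
    return 1 + sum(size(child) for child in expr[1:])


# Stand-in for the body of `double`: a record whose two fields both use ?x.
DOUBLE_BODY = ("record", "?x", "?x")


def expand_double(depth):
    """Expand `depth` nested calls double(...double({})...), innermost first."""
    expr = "{}"
    for _ in range(depth):
        expr = substitute(DOUBLE_BODY, {"?x": expr})
    return expr


# Print the expanded tree size for 1 to 4 nested calls.
for depth in range(1, 5):
    print(depth, size(expand_double(depth)))
```

Under this model each extra level of nesting roughly doubles the expanded tree: four nested calls yield 31 nodes, 16 of which are `{}` leaves, in line with the $2^4$ figure in the RFC text. The same `substitute` helper also models call-by-name for `implies`, since argument expressions are copied into the body without being evaluated first.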