-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New entity tags proposal (supersedes RFC-68) #82
Conversation
During the implementation of RFC-68, we discovered issues that are addressed by this new RFC. See the Alternatives section of this RFC for a discussion. This RFC is entering the Final Comment Period (FCP) today with the intent to accept. The FCP closes on Wednesday, September 18, at 5 pm Pacific Time. |
``` | ||
entity User = { | ||
jobLevel: Long, | ||
} tags Set<String>; | ||
entity Document = { | ||
owner: User, | ||
} tags Set<String>; | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One concern I thought of with this syntax: If you write
entity User = {
jobLevel: Long,
} tags { foo: String };
this is valid but potentially very confusing. It might look a lot like the entity has a required tag foo
of type String
, but that is not at all what that means in the current proposal. Instead, the entity has (zero or more) tags, with any names, and the value for all of them is a record with attribute foo
of type String
.
I'm not sure if there is a clearer syntax alternative. Ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think it's Ok. Not having curly braces is clear, e.g., entity { ... } tags Set<String>
. Having them, they evoke that the tags are a record. If we wrote entity User { ... } tags { foo: String, bar: String }
what would it mean otherwise? Adding extra delimiters seems unnecessary (which could not be curly braces).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the syntax for record-typed tags is confusing, but I don't have a better suggestion. I think the confusion can be avoided by thorough documentation with plenty of examples.
I read over this as well as 68. I think I would prefer tags to be supported this way:
e.g.
This would allow tags to live anywhere in an entities attributes (not just the top level). It would also allow tags to live inside context (without having to create a separate entity to support it.). So it seems more flexible than 68 or 82 while still providing the added validation, though it would not add support for computed keys. When validating records with unknown keys, expressions such as |
@philhassey what you've suggested is essentially RFC 66. I would also prefer this, but it seems to not meet Cedar's tenets around analyzability; see comments on 66 and 68. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arguments against RFC 68 seem convincing to me. Approving the proposal as-is.
It might also be useful to add some text comparing this proposal to RFC 27, which would be the true (IMO) generalization of RFC 68. How much work would RFC 27 be over the current proposal?
``` | ||
entity User = { | ||
jobLevel: Long, | ||
} tags Set<String>; | ||
entity Document = { | ||
owner: User, | ||
} tags Set<String>; | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the syntax for record-typed tags is confusing, but I don't have a better suggestion. I think the confusion can be avoided by thorough documentation with plenty of examples.
|
||
### Validating and parsing entities | ||
|
||
The Cedar authorizer's `is_authorized` function can be asked to validate that entities in a request are consistent with a provided schema. Extending validation to work with tags is straightforward. We check that every value in the `tags` map for an entity `e` has type `T` given that type of `e` is declared to have `tags T`. If the type `e` doesn't have `tags` in the schema, then specifying a `tags` field is an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Entity validation isn't performed during is_authorized
, but rather when constructing a Entities
object (similar to schema-based parsing).
Co-authored-by: Craig Disselkoen <[email protected]>
Co-authored-by: Kesha Hietala <[email protected]>
Co-authored-by: Kesha Hietala <[email protected]>
Ah, RFC 27 is not analyzable because it introduces first-class maps. So, it's possible write something like |
@emina Can you explain this in a little more detail for me? I've been reading over 27, 66, 68, 82 and I still don't have a great grasp on this. Thanks! |
I'll try :) If we supported first-class maps, this means that you could use maps in the same way that you can use any other Cedar type. Specifically, (1) the Map type can be freely embedded into other types, so we can have an attribute of type Now, we implement analysis by compiling Cedar to a lower-level language called SMT-Lib, which is what automated theorem provers (SMT solvers) understand. For this compilation process to work, there must be a way to represent each Cedar first-class type as some SMT type. In SMT, there is no such thing as a finite map---that is, a map with a finite number of keys. There is something called an "Array", which is like a map, but it is defined on all possible keys of a given type. So,
This is clearly not what we want in Cedar: we want maps to be finite. So, we'd have to represent Cedar maps in some other way, let's say as Arrays of type With this proposal, note that it is impossible to put a Map in a set, or to compare two Maps for equality. This lets us avoid using Arrays, and we can represent maps efficiently as "uninterpreted functions". Because these are not first-class, it is safe (due to the way the SMT solvers work) to finitize these functions (which are also defined on all strings!) in a way that lets us encode Cedar maps. Hope that helps give some intuition :) |
So could an idea like this work:
This seems similar to the current proposal, in that it is a workaround for the analyzable problems. However, it also gets the benefits of things like RFC 27 where we get the nice ergonomics of attribute access. It also gets the benefit of being able to put tags wherever you want in your context or entity graph. |
Yes, this could work in theory. The proposal is effectively to introduce another opaque reference type (which is what Entities are) into the language, and have maps wrapped into this opaque reference type. This alternative is a much more invasive change to the language (adding a second reference type), and it would be significantly harder to implement than this RFC at the translation-to-SMT layer. The alternative does enable reusing the syntax for records and entities for opaque maps. But it doesn't add any expressiveness over this RFC. Specifically:
You can do this with the current RFC too by introducing a dedicated Entity type into the schema for every kind of map that you need, which gives you the freedom to nest wrapped maps into sets etc (like in this workaround). Basically, you use dedicated Entities to simulate wrapped maps, without having to introduce a second reference type into the language. |
I don't like this as nearly as much as my previous comment, but a variation on RFC 82 and the Typed Maps idea could be:
Thus This seems similar to the current proposal, in that it is a workaround for the analyzable problems. However, it also gets the benefits of things like RFC 27 where we get the nice ergonomics of attribute access. It has the benefit of not adding a new reference type, but has the drawback of requiring instances of a tags-map-typed-entity to populate your entity graph. |
Do you mean that we could change the Entity schema to allow two options: either (1) a regular entity with attributes, or (2) a special entity that has no attributes and behaves like a Map? Something like:
This is possible and wouldn't be harder to implement than the proposed RFC (at the translation layer). The downside though is that this design forces the use of the extra level of indirection to specify tags (in this case, via Since the main motivation for this feature is to support tags, I'd be reluctant to make the feature less convenient for modeling the most common notion of tags (a single tag-map attached to a resource type). Fundamentally, the price we pay with this proposal is that we need a dedicated syntax for tags ( On the balance, the price seems worth paying to me, and I'm curious to hear from others :) Edited to add: Just to re-iterate, this alternative is feasible from the implementation point of view. So, which way we go really depends on whether we want to prioritize reusing the record/entity access syntax, or we want to prioritize making tags a distinct and dedicated concept of "key-value pairs attached to an entity type." I'm currently leaning toward the latter. |
Fun to see a new RFC. I especially like the improvement that allows dynamic lookup! I'm a bit confused by a few aspects of this proposal, though:
|
Good questions!
My understanding of existing tag systems matches yours: currently, tags are just key-value pairs where keys are strings and values are strings or sets of strings. We decided to make tags more general to make them more future proof, and also to let you get effectively arbitrary string-value maps for the price of one construct :). The problem with fixing the tag-value type to
Yes, the reason is that we need to have an explicit AST representation for type checking and all other static tools that are type-aware, like the symbolic compilation or potentially a type-aware partial evaluator. It is indeed possible to achieve the desired effect by having a reserved attribute name, such as Edited to add: While we could achieve the same effect with a special field, it would be a lot trickier. Somewhere in the CST -> AST conversion, we’d have to carefully write the code to convert all occurrences of the CST pattern |
The examples in the RFC around this confuse me, though:
It seems silly to have Consider the above example, and then this response:
I'm expecting you to say "this is not possible" but I think the tags construct, and the "arbitrary" Map type is sort of a bandaid to paper over the fact that the only thing you can do with Sets is test membership. If you could more arbitrarily filter Sets and test the cardinality of the result, problem goes away. Suppose But you could also define This is, of course, a huge stretch, but I think it's a much much much more expressive abstraction than arbitrary
This is a bit unfortunate, but I understand. :) |
This sounds somewhat like RFC 21, which was accepted but then later rejected (to my disappointment haha). |
Yes, this is equivalent to RFC 21 in the sense that it requires adding higher-order functions to the language. In general, there is no way to encode higher-order functions in a fragment of SMT that is decidable---meaning, the SMT solver can definitely decide if a formula in that fragment has a model (binding of variables to values that makes the formula true). This translation requires using quantifiers and/or higher-order logic (HOL), both of which place us outside of the decidable fragment in general. There has been recent work showing that some interesting fragments of HOL are decidable, which is why we hoped RFC 21 would be feasible (i.e., expressible in these fragments). But after digging deeper, we found that a very reduced subset of RFC 21 definitely falls into this decidable HOL fragment. We closed RFC 21 because it wasn't clear this minimized version covers enough use cases to justify the complexity of adding it to the language. For example, it couldn't support the |
If Sets are finite, can't you "unroll" the filter such that you'd have:
These are encodable in SMT, since it's already being done. Perhaps the problem would be the representation of Sets doesn't allow for enumeration, or projection? But the representation of Sets must support some way to look at elements, because you can test membership. So, could this generalize, perhaps? Suppose we had a special type,
Such that when using a Query, which, perhaps, could be treated like a quotation (see also #61) ...
the actual expression becomes expanded to (I'm going to pretend there's a
For non-Record queries, you'd have |
Sets are finite but they are not bounded. There is no a priori bound on the size of Cedar sets that would let us generalize this unrolling. Even if there was, and this bound was big (say, 1000 elements), the unrolling wouldn't work in practice because there is a limit to how big of a problem the solver can handle in reasonable time.
This is, unfortunately, the same problem as above. Unrolling is possible only if we have a small bound on set size.
I'm a huge fan of Scheme---Racket specifically---myself :)) |
Perhaps too fundamental a question, but couldn't we bound this to make it practical for the solver and use Mathematical Induction to prove it for the general case? If the solver shows it works for the base case, and every case exhaustively to some bound
I suspect the difference is that the cardinality of the set isn't part of the type, so it's a runtime vs validation time concern, but how is this different than defining a record with 10,000 fields? I don't think there's a constraint on the number of fields in a Record? |
Ah, this is a deep rabbit hole. Sometimes, this is possible. In a prior life, I was part of a team that verified Linux eBPF compilers. This proof involved a combination of meta-theoretical reasoning by induction in Lean and discharging the non-inductive parts of the proof with an SMT solver. Perhaps something like this is possible here, though finding out would be substantial research effort in its own right.
It is indeed no different. If we had a Record with 10K fields, things would get very, very slow :) But there is a saving grace here. Even if the record has 10K fields, a reasonably-sized policy cannot reference all of them simultaneously in a complicated way. (We expect all production uses of Cedar to put some limit on policy size.). So, if we encounter a situation like this in the wild, it is possible to develop methods to slice out irrelevant fields for the purpose of analysis. |
@@ -0,0 +1,251 @@ | |||
# Entity tags with dedicated syntax and semantics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How committed are you to a hardcoded literal of tag
in the language (tags Set<String>
, .hasTag()
, .getTag()
)? For a general system, that probably makes sense, but I would love to use Cedar with Kubernetes, which has a similar concept on all resources but calls them labels
(and also a second set of key/value annotations
). I'm really not asking for a rename to fit Kubernetes, but perhaps wondering about flexibility for possibly type/function/literal aliasing in the language when the name conflicts.
|
||
|
||
``` | ||
e.hasTag(k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you ever add additional functions like e.hasTags(Set<String>)
? I'm thinking of use cases that AWS IAM policy functions ForAllValues:
or ForAnyValue:
satisfy today to express statements "I need all these tags to be present", "I need at least one tag to be present", or "I need all keys to match a like
operator (say, all keys must have a specific prefix)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.hasAllTags(Set<String>)
and e.hasAnyTags(Set<String>)
could be supported, though a general any/all operator isn't analyzable (see the rejected RFC 21).
}, | ||
} | ||
}, | ||
"tags" : { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can tags
be on a commonType
? If so, how do nested tags work? (I'm probably getting the syntax wrong) Ex:
{
"entityTypes": {
"Document": {
"shape": {
"type": "Record",
"attributes": {
"name": {"type": "String"},
"signature": {"type": "Signature"}
},
},
"tags": {
//...
}
}
},
"commonTypes": {
"Signature": {
"shape": {
//...
},
"tags": {
//..
}
}
}
}
I ask because Kubernetes types do have nested types, both of which have labels. (.metadata.labels
and .spec.template.metadata.labels
)
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: my-cool-app
name: example
spec:
replicas: 1
selector:
matchLabels:
app: my-cool-app
template:
metadata:
labels:
app: my-cool-app
spec:
containers:
- image: nginx:latest
name: nginx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, after reading https://github.com/cedar-policy/rfcs/pull/82/files#r1759501086, this looks to be possible.
permit(principal is User, action, resource is apps::Deployment)
when {
principal.hasTag("app") &&
resource.hasTag("app") &&
resource.spec.template.hasTag("app") &&
principal.getTag("app").contains(resource.getTag("app")) &&
principal.getTag("app").contains(resource.spec.template.getTag("app"))
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure I understand the question, but it's possible to nest tags arbitrarily using the indirection approach described in the RFC (see IntTags
and StringTags
entities).
|
||
## Drawbacks | ||
|
||
Entity types are permitted to have a single `tags` declaration `tags T`, which eliminates the possibility of attaching multiple tags-maps to a single entity (unlike [RFC 68](https://github.com/cedar-policy/rfcs/pull/68)). This RFC also does not support tags containing other tags-maps, comparing tags-maps with `==`, and the like (as discussed above). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this forever limit you to a single set of key/value types to authorize on? Ever Kubernetes object has string key/value labels
and annotations
. I don't think most users make authorization decisions based on annotation
values, but some might want policy to require a specific annotation
key has a value for audit purposes. Could I make a separate set of tags a named attribute on an entity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is possible. Paragraphs below this one show how to get arbitrarily many tag-maps attached to a single entity, and also how to achieve (if desired) arbitrary nesting of tag-maps into other types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok I get it now, thank you!
I'm sorry, but this confuses me more. For the purposes of analysis, then, I assume we limit the number of fields to something reasonable. Why is this OK for records, but wouldn't be OK for Sets? (Also: This is getting way off topic, so I'll take this question elsewhere. ) |
} tags Set<String>; | ||
``` | ||
|
||
The `User` and `Document` entities each have entity tags, denoted by `tags Set<String>`, implementing tags whose values are sets of strings. The declaration `tags Set<String>` indicates an unspecified number of optional tags that all have values of type `Set<String>`. (Of course, any type could be written, for instance `Long`, not just `Set<String>`.) Note that the set of tag (keys) does not have to be defined in the schema, but the type of all tag values needs to be the same (here, `Set<String>`). (You cannot have tag `foo` have value `2`, and tag `bar` have value `"spam"`.) Note also that there is no way to name the key-value map representing tags; you cannot treat it as a Cedar record or even compare it for equality to the tag-map on a different entity; you can only compare the values of individual keys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I fully grok the current capabilities of Cedar + this proposal, but I do have a real-world example I'd want to walk through.
In Kubernetes, admission gets both the requesting user and the resource being acted on. I'd really like to be able to implement some sort of ABAC, but the the tags I'd be working with are slightly different shapes. Every Kubernetes resource has key/value (String
/String
) set of labels.
The User structure has extra info in the form of String
/Set<String>
. Would it be as simple as using existing .contains()
functions in policy to implement ABAC?
permit (
principal is User,
action == Action::"writeDoc",
resource is Document)
when {
(principal.hasTag("my.tld/myorg") &&
resource.hasTag("my.tld/myorg") &&
principal.getTag("my.tld/myorg").contains(resource.getTag("my.tld/myorg"))
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the resource Entity Type is defined to have tags String
and the principal Entity Type is defined to have tags Set<String>
, then the example policy is type-correct and will work as expected.
Here, I would expect the policy to check ifresource.getTag("my.tld/myorg")
, which is a String, is included in the set of Strings principal.getTag("my.tld/myorg")
.
}; | ||
|
||
permit(principal is User, action, resource is Document) | ||
when { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason the two policies in this example could not be merged into one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that the policies are evaluated against inputs that conform to the example schema, we can re-write these two policies as follows:
permit(principal is User, action, resource is Document)
when {
(principal.roles.contains(context.role) &&
principal.accessLevels.hasTag(context.role) &&
principal.accessLevels.getTag(context.role) > 3) ||
(principal.hasTag("clearance") &&
principal.getTag("clearance").hasTag("level") &&
principal.getTag("clearance").getTag("level") == "top secret")
};
In general though, merging two policies like this is not a safe transformation (it can result in different authz results) because of errors, but that's a separate can of worms :)
This proposal is awesome and a very welcome addition to the language! I know we've had scenarios where we wanted to support arbitrary entity tags and have been at a bit of a loss about how to handle them in Cedar. The one thing I'd push back on is the typing of entity tags. Since this RFC is now strictly about adding the concept of entity tags rather than supporting general EA-maps proposed in RFC 68, I find that the typing of the tags adds unnecessary cognitive complexity (not to mention a change to the entity schema language). I'd bet that just drawing a line in the sand and saying that entity tags are always a Otherwise, the use case for tag typing seems to mostly be to allow you to (ab?)use the entity tags feature to get something like a |
What if the type was always Maybe for ergonomic reasons, there could be one more method:
Which would just be sugar for:
|
That wouldn't work because there would be no way to operate on the elements of the set. For example, we couldn't write something like See also this thread. |
Yes, that's a tricky one. We're already aware of use cases for |
Rendered