
New entity tags proposal (supersedes RFC-68) #82

Merged · 5 commits · Sep 19, 2024

Conversation

@emina (Contributor, Author) commented Sep 11, 2024

During the implementation of RFC-68, we discovered issues that are addressed by this new RFC. See the Alternatives section of this RFC for a discussion.

This RFC is entering the Final Comment Period (FCP) today with the intent to accept.

The FCP closes on Wednesday, September 18, at 5 pm Pacific Time.

text/0068-entity-tags.md (outdated)
Comment on lines +32 to +39
```
entity User = {
jobLevel: Long,
} tags Set<String>;
entity Document = {
owner: User,
} tags Set<String>;
```
Contributor

One concern I thought of with this syntax: If you write

entity User = {
    jobLevel: Long,
} tags { foo: String };

this is valid but potentially very confusing. It might look a lot like the entity has a required tag foo of type String, but that is not at all what that means in the current proposal. Instead, the entity has (zero or more) tags, with any names, and the value for all of them is a record with attribute foo of type String.
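The two readings can be contrasted with a small sketch (hypothetical checker names; Python is used purely for illustration, this is not Cedar code):

```python
# Hypothetical sketch contrasting the two readings of `tags { foo: String }`.

def check_record_typed_tags(tags: dict) -> bool:
    """RFC-82 reading: tags is a map with arbitrary string keys, and EVERY
    value must be a record with a required attribute `foo` of type String."""
    return all(
        isinstance(v, dict) and set(v) == {"foo"} and isinstance(v["foo"], str)
        for v in tags.values()
    )

def check_required_foo_tag(tags: dict) -> bool:
    """The misreading: the entity has a single required tag named `foo`
    whose value is a String."""
    return set(tags) == {"foo"} and isinstance(tags["foo"], str)

# A map with arbitrary keys and record-typed values is valid under the
# first reading but not the second.
tags = {"dept": {"foo": "eng"}, "team": {"foo": "cedar"}}
assert check_record_typed_tags(tags)
assert not check_required_foo_tag(tags)
```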

I'm not sure if there is a clearer syntax alternative. Ideas?

Contributor

I personally think it's OK. Without curly braces the meaning is clear, e.g., entity { ... } tags Set<String>. With them, the braces signal that the tag values are records. If we wrote entity User { ... } tags { foo: String, bar: String }, what else could it mean? Adding extra delimiters (which could not be curly braces) seems unnecessary.

Contributor

I agree that the syntax for record-typed tags is confusing, but I don't have a better suggestion. I think the confusion can be avoided by thorough documentation with plenty of examples.

@philhassey
I read over this as well as 68. I think I would prefer tags to be supported this way:

  • No new types are added to Cedar at all.
  • Schemas are extended to support specifying records that have unknown keys and values that all share the same type.

e.g.

{
  tags: {
    ?: String
  }
}

This would allow tags to live anywhere in an entity's attributes (not just at the top level). It would also allow tags to live inside context (without having to create a separate entity to support it). So it seems more flexible than 68 or 82 while still providing the added validation, though it would not add support for computed keys.

When validating records with unknown keys, expressions such as x.tags has y create the capability to access x.tags.y.

@cdisselkoen (Contributor)

@philhassey what you've suggested is essentially RFC 66. I would also prefer this, but it seems to not meet Cedar's tenets around analyzability; see comments on 66 and 68.

@khieta (Contributor) left a comment

The arguments against RFC 68 seem convincing to me. Approving the proposal as-is.

It might also be useful to add some text comparing this proposal to RFC 27, which would be the true (IMO) generalization of RFC 68. How much work would RFC 27 be over the current proposal?



### Validating and parsing entities

The Cedar authorizer's `is_authorized` function can be asked to validate that entities in a request are consistent with a provided schema. Extending validation to work with tags is straightforward. We check that every value in the `tags` map for an entity `e` has type `T`, given that the type of `e` is declared to have `tags T`. If the type of `e` doesn't declare `tags` in the schema, then specifying a `tags` field for `e` is an error.
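The validation rule described above can be sketched as follows (assumed helper names, not Cedar's actual API; only a toy fragment of the type language is covered):

```python
# Sketch of the tag-validation rule: every value in an entity's tags map must
# have the declared tag type; an entity whose type declares no tags may not
# carry a tags map at all.

def has_type(value, ty: str) -> bool:
    # Toy type check covering only the types used in this example.
    if ty == "String":
        return isinstance(value, str)
    if ty == "Set<String>":
        return isinstance(value, (set, frozenset)) and all(
            isinstance(x, str) for x in value
        )
    return False

def validate_tags(entity_type: str, tags: dict, schema: dict) -> bool:
    declared = schema.get(entity_type, {}).get("tags")  # e.g. "Set<String>"
    if declared is None:
        return len(tags) == 0  # no `tags T` declared: a tags map is an error
    return all(has_type(v, declared) for v in tags.values())

schema = {"User": {"tags": "Set<String>"}, "Photo": {}}
assert validate_tags("User", {"clearance": {"secret"}}, schema)
assert not validate_tags("User", {"clearance": 5}, schema)
assert not validate_tags("Photo", {"any": "tag"}, schema)
```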
Contributor

Entity validation isn't performed during is_authorized, but rather when constructing an Entities object (similar to schema-based parsing).

emina and others added 3 commits September 12, 2024 10:50
@emina (Contributor, Author) commented Sep 12, 2024

The arguments against RFC 68 seem convincing to me. Approving the proposal as-is.

It might also be useful to add some text comparing this proposal to RFC 27, which would be the true (IMO) generalization of RFC 68. How much work would RFC 27 be over the current proposal?

Ah, RFC 27 is not analyzable because it introduces first-class maps. So, it's possible to write something like principal.tags == resource.tags, which we avoid with RFC 68 and this RFC (through different mechanisms).

@philhassey commented Sep 12, 2024

Ah, RFC 27 is not analyzable because it introduces first-class maps. So, it's possible to write something like principal.tags == resource.tags, which we avoid with RFC 68 and this RFC (through different mechanisms).

@emina Can you explain this in a little more detail for me? I've been reading over 27, 66, 68, 82 and I still don't have a great grasp on this. Thanks!

@emina (Contributor, Author) commented Sep 12, 2024

Ah, RFC 27 is not analyzable because it introduces first-class maps. So, it's possible to write something like principal.tags == resource.tags, which we avoid with RFC 68 and this RFC (through different mechanisms).

@emina Can you explain this in a little more detail for me? I've been reading over 27, 66, 68, 82 and I still don't have a great grasp on this. Thanks!

I'll try :)

If we supported first-class maps, that would mean you could use maps in the same way as any other Cedar type. Specifically, (1) the Map type could be freely embedded into other types, so we could have an attribute of type stuff: Set<Map<String, Long>>, and (2) instances of the Map type could be compared for equality, so we could write principal.stuff == resource.stuff. This second point is subtle because equality arises in Cedar not only during == comparisons, but also as part of set operations: contains, containsAll, and containsAny.

Now, we implement analysis by compiling Cedar to a lower-level language called SMT-Lib, which is what automated theorem provers (SMT solvers) understand. For this compilation process to work, there must be a way to represent each Cedar first-class type as some SMT type.

In SMT, there is no such thing as a finite map---that is, a map with a finite number of keys. There is something called an "Array", which is like a map, but it is defined on all possible keys of a given type. So, Array String Integer represents a map that's defined on all strings. In fact, the SMT solver represents these infinite maps as if-then-else functions (in pseudosyntax):

def arrayFoo (s : String) : Integer := 
if s == "foo" 
then 0 
else if s == "bar" 
then 1 
else 2

This is clearly not what we want in Cedar: we want maps to be finite.

So, we'd have to represent Cedar maps in some other way, let's say as Arrays of type Array String (Option Integer), where the absence of a key in the (infinite) map is represented by mapping the key to none. But this doesn't work either because we have no efficient way to tell the solver "make sure that only a finite set of keys is mapped to some Integer and the rest of the keys (of which there are infinitely many) are mapped to none".

With this proposal, note that it is impossible to put a Map in a set, or to compare two Maps for equality. This lets us avoid using Arrays, and we can represent maps efficiently as "uninterpreted functions". Because these are not first-class, it is safe (due to the way the SMT solvers work) to finitize these functions (which are also defined on all strings!) in a way that lets us encode Cedar maps.
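The "finite domain plus total function" idea can be illustrated with a toy model (Python, purely illustrative; the real encoding lives at the SMT level with uninterpreted functions):

```python
# Toy illustration (not the actual SMT encoding): model a tags map as a total
# function on all strings plus an explicit finite key set, which is the shape
# the finitized uninterpreted-function encoding aims for.

def make_tag_map(pairs: dict):
    keys = frozenset(pairs)  # the finite domain of the map
    def lookup(k: str):
        # Total on all strings; None plays the role of `none` for absent keys.
        return pairs[k] if k in keys else None
    return keys, lookup

keys, get = make_tag_map({"foo": 0, "bar": 1})
assert get("foo") == 0
assert get("bar") == 1
assert get("quux") is None  # every string outside the domain maps to none
assert "quux" not in keys
```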

Hope that helps give some intuition :)

@philhassey

So could an idea like this work:

  1. A Typed Map type exists, such that keys are strings and values are all of the same type (e.g. map of strings, or map of longs, etc).
  2. This Map type is somewhat like an Entity, in that it is only identified. In the case of Maps, likely by an int64 ID number.
  3. This Map type would have two methods "has" and "get" which could be overloaded versions of the "has" and "." operators in the Cedar language. (e.g. principal.tags has "key" && principal.tags.key == "value" would work.)
  4. This Map type would only compare as equal if it were the exact same Map (identified by the same ID). Two different maps with the same keys and values would not be considered equal (e.g. principal.tags == principal.tags would be true, but resource.tags == principal.tags would be false unless they had the same internal ID). Similarly, [principal.tags].contains(principal.tags) would evaluate to true, but [principal.tags].contains(resource.tags) would be false.
  5. Maps could not be created within the Cedar language (similar to Entities, they are only defined outside the language). The only way to reference an existing map would be by referencing it as it exists on another object (e.g. principal.tags or context.extra.tags)
  6. JSON representation for entities, context would be similar to record, but with some specification of the Map type added as a wrapper.
  7. Schemas could specify that certain parts of records must be typed Maps or that a Set must only contain typed Maps.

This seems similar to the current proposal, in that it is a workaround for the analyzable problems. However, it also gets the benefits of things like RFC 27 where we get the nice ergonomics of attribute access. It also gets the benefit of being able to put tags wherever you want in your context or entity graph.

@emina (Contributor, Author) commented Sep 12, 2024

So could an idea like this work:

  1. A Typed Map type exists, such that keys are strings and values are all of the same type (e.g. map of strings, or map of longs, etc).
  2. This Map type is somewhat like an Entity, in that it is only identified. In the case of Maps, likely by an int64 ID number.
  3. This Map type would have two methods "has" and "get" which could be overloaded versions of the "has" and "." operators in the Cedar language. (e.g. principal.tags has "key" && principal.tags.key == "value" would work.)
  4. This Map type would only compare as equal if it were the exact same Map (identified by the same ID). Two different maps with the same keys and values would not be considered equal (e.g. principal.tags == principal.tags would be true, but resource.tags == principal.tags would be false unless they had the same internal ID). Similarly, [principal.tags].contains(principal.tags) would evaluate to true, but [principal.tags].contains(resource.tags) would be false.
  5. Maps could not be created within the Cedar language (similar to Entities, they are only defined outside the language). The only way to reference an existing map would be by referencing it as it exists on another object (e.g. principal.tags or context.extra.tags)
  6. JSON representation for entities, context would be similar to record, but with some specification of the Map type added as a wrapper.
  7. Schemas could specify that certain parts of records must be typed Maps or that a Set must only contain typed Maps.

This seems similar to the current proposal, in that it is a workaround for the analyzable problems. However, it also gets the benefits of things like RFC 27 where we get the nice ergonomics of attribute access. It also gets the benefit of being able to put tags wherever you want in your context or entity graph.

Yes, this could work in theory. The proposal is effectively to introduce another opaque reference type (which is what Entities are) into the language, and have maps wrapped into this opaque reference type.

This alternative is a much more invasive change to the language (adding a second reference type), and it would be significantly harder to implement than this RFC at the translation-to-SMT layer.

The alternative does enable reusing the syntax for records and entities for opaque maps. But it doesn't add any expressiveness over this RFC. Specifically:

It also gets the benefit of being able to put tags wherever you want in your context or entity graph.

You can do this with the current RFC too by introducing a dedicated Entity type into the schema for every kind of map that you need, which gives you the freedom to nest wrapped maps into sets etc (like in this workaround). Basically, you use dedicated Entities to simulate wrapped maps, without having to introduce a second reference type into the language.

@philhassey commented Sep 12, 2024

I don't like this nearly as much as my previous comment, but a variation on RFC 82 and the Typed Maps idea could be:

  • An Entity Type attribute schema could be defined as it is today, or as a typed Map.

Thus principal.tags has key && principal.tags.key == "value" can work if you define principal.tags to be an entity where the attributes are a typed map. This can also establish the capability since types are known.

This seems similar to the current proposal, in that it is a workaround for the analyzable problems. However, it also gets the benefits of things like RFC 27 where we get the nice ergonomics of attribute access. It has the benefit of not adding a new reference type, but has the drawback of requiring instances of a tags-map-typed-entity to populate your entity graph.

@emina (Contributor, Author) commented Sep 12, 2024

I don't like this nearly as much, but a variation on RFC 82 and the Typed Maps idea could be:

  • An Entity Type attribute schema could be defined as it is today, or as a typed Map.

Thus principal.tags has key && principal.tags.key == "value" can work if you define principal.tags to be an entity where the attributes are a typed map. This can also establish the capability since types are known.

This seems similar to the current proposal, in that it is a workaround for the analyzable problems. However, it also gets the benefits of things like RFC 27 where we get the nice ergonomics of attribute access. It has the benefit of not adding a new reference type, but has the drawback of requiring instances of tags to populate your entity graph.

Do you mean that we could change the Entity schema to allow two options: either (1) a regular entity with attributes, or (2) a special entity that has no attributes and behaves like a Map?

Something like:

// Regular entity
entity User {
  roles: Set<String>,
  accessLevels: IntTags
};

// Map-like entity
entity IntTags : Long;

This is possible and wouldn't be harder to implement than the proposed RFC (at the translation layer).

The downside, though, is that this design forces an extra level of indirection to specify tags (in this case, via IntTags), as opposed to requiring it only in special circumstances---for example, when we want to attach several tag-maps to a single entity.

Since the main motivation for this feature is to support tags, I'd be reluctant to make the feature less convenient for modeling the most common notion of tags (a single tag-map attached to a resource type).

Fundamentally, the price we pay with this proposal is that we need a dedicated syntax for tags (.getTag and .hasTag) that differs from record and entity syntax. But what we get in return is support for tags that's easy to explain, use (in the common case), and implement.

On the balance, the price seems worth paying to me, and I'm curious to hear from others :)

Edited to add: Just to re-iterate, this alternative is feasible from the implementation point of view. So, which way we go really depends on whether we want to prioritize reusing the record/entity access syntax, or we want to prioritize making tags a distinct and dedicated concept of "key-value pairs attached to an entity type." I'm currently leaning toward the latter.
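For intuition, the common case the RFC optimizes for (a single tag-map per entity, accessed with hasTag and getTag) can be modeled as follows (a hypothetical Python model, not the Cedar evaluator):

```python
# Minimal model of the dedicated tag operations this RFC proposes: a single
# tag-map attached to an entity, accessed via hasTag/getTag.

class Entity:
    def __init__(self, attrs=None, tags=None):
        self.attrs = attrs or {}
        self.tags = tags or {}

    def has_tag(self, key):
        return key in self.tags

    def get_tag(self, key):
        if key not in self.tags:
            raise KeyError(key)  # getTag on a missing key is an evaluation error
        return self.tags[key]

principal = Entity(tags={"role": 4})
# Models: principal.hasTag("role") && principal.getTag("role") > 3
assert principal.has_tag("role") and principal.get_tag("role") > 3
assert not principal.has_tag("team")
```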

@apg (Contributor) commented Sep 13, 2024

Fun to see a new RFC. I especially like the improvement that allows dynamic lookup!

I'm a bit confused by a few aspects of this proposal, though:

  • Tags are an addition to Entities, which makes sense. However, all tags for a given Entity have the same type, but that type is arbitrary. What is the real world use case for all tag values to be, for instance, a datetime or a Long?

    • As far as I've seen, most Tag models (infrastructure cloud, ecommerce, social media content) are "stringly"-typed, and are essentially either a String or a Set<String>. Often the model doesn't even support key-value pairs, and instead sometag is semantically equivalent to sometag=true.
    • Drawing a line in the sand and saying Tags are always { String: Set<String> } removes the ugly/confusing schema syntax, and weird "you can get additional tag value types by creating a new 'proxy' (my word) Entity and including it in your Entity" (i.e. IntTags)
  • Why are new methods needed, as opposed to overloading the has operator and extending the attribute access syntax to tags? It would seem the reason is to have explicit AST representation to "clue" type checking. If there's a well known name for the tags, say Entity._tags_ or Entity._meta_ or something, could that obviate the need for methods?

@emina (Contributor, Author) commented Sep 13, 2024

Good questions!

  • Tags are an addition to Entities, which makes sense. However, all tags for a given Entity have the same type, but that type is arbitrary. What is the real world use case for all tag values to be, for instance, a datetime or a Long?

    • As far as I've seen, most Tag models (infrastructure cloud, ecommerce, social media content) are "stringly"-typed, and are essentially either a String or a Set<String>. Often the model doesn't even support key-value pairs, and instead sometag is semantically equivalent to sometag=true.
    • Drawing a line in the sand and saying Tags are always { String: Set<String> } removes the ugly/confusing schema syntax, and weird "you can get additional tag value types by creating a new 'proxy' (my word) Entity and including it in your Entity" (i.e. IntTags)

My understanding of existing tag systems matches yours: currently, tags are just key-value pairs where keys are strings and values are strings or sets of strings. We decided to make tags more general to make them more future proof, and also to let you get effectively arbitrary string-value maps for the price of one construct :).

The problem with fixing the tag-value type to Set<String> is that you could no longer use the like operator on tag values, which comes up frequently. So, you couldn't write something like principal.hasTag(“role”) && principal.getTag(“role”) like “* admin”.

  • Why are new methods needed, as opposed to overloading the has operator and extending the attribute access syntax to tags? It would seem the reason is to have explicit AST representation to "clue" type checking. If there's a well known name for the tags, say Entity._tags_ or Entity._meta_ or something, could that obviate the need for methods?

Yes, the reason is that we need to have an explicit AST representation for type checking and all other static tools that are type-aware, like the symbolic compilation or potentially a type-aware partial evaluator.

It is indeed possible to achieve the desired effect by having a reserved attribute name, such as _tags_ or _meta_. We decided against it because there is unfortunately no name that we can reserve that wouldn’t introduce a breaking change into the language :-( Any string is a valid attribute name, so there might be policies / schema out in the wild that are already using _tags_ or _meta_ as regular attribute names.

Edited to add: While we could achieve the same effect with a special field, it would be a lot trickier. Somewhere in the CST -> AST conversion, we’d have to carefully write the code to convert all occurrences of the CST pattern e._tags_ has foo and e._tags_.foo into the AST equivalent of e.hasTag(“foo”) and e.getTag(“foo”). If we didn’t do this during CST -> AST conversion and instead tried to handle it during type checking etc., we would run into the same problem as with RFC 68: all type-aware code would have to do a lot of special-casing to rule out first-class uses of _tags_ (such as principal._tags_ == resource._tags_).

@apg (Contributor) commented Sep 13, 2024

My understanding of existing tag systems matches yours: currently, tags are just key-value pairs where keys are strings and values are strings or sets of strings. We decided to make tags more general to make them more future proof, and also to let you get effectively arbitrary string-value maps for the price of one construct :).

The examples in the RFC around this confuse me, though:

permit(principal is User, action, resource is Document)
when {
  principal.roles.contains(context.role) &&
  principal.accessLevels.hasTag(context.role) &&
  principal.accessLevels.getTag(context.role) > 3
};

It seems silly to have principal.roles and accessLevels if you could have Map<String, Role> and Role be { accessLevel: 3 ... }, no?

Consider the above example, and then this response:

The problem with fixing the tag-value type to Set<String> is that you could no longer use the like operator on tag values, which comes up frequently. So, you couldn't write something like principal.hasTag(“role”) && principal.getTag(“role”) like “* admin”.

I'm expecting you to say "this is not possible" but I think the tags construct, and the "arbitrary" Map type is sort of a bandaid to paper over the fact that the only thing you can do with Sets is test membership. If you could more arbitrarily filter Sets and test the cardinality of the result, problem goes away.

Suppose Set had another method, let's call it containsQuery(). You can now define a Record type Tag as { name: string, value: ? } and then containsQuery({ .name == "Foo" }) ... as an example. (Note: I'm using {} because of its proximity to record syntax as illustration)

But you could also define principal.roles as Set<Role> and perhaps containsQuery({ .name like "Foo *", .accessLevel > 3}). Type checking this seems possible. I imagine it's not trivially expressable in SMT, though...

This is, of course, a huge stretch, but I think it's a much much much more expressive abstraction than arbitrary Map<string, ?> which still has the fundamental problem that you MUST have map keys.

Yes, the reason is that we need to have an explicit AST representation for type checking and all other static tools that are type-aware, like the symbolic compilation or potentially a type-aware partial evaluator.

It is indeed possible to achieve the desired effect by having a reserved attribute name, such as _tags_ or _meta_. We decided against it because there is unfortunately no name that we can reserve that wouldn’t introduce a breaking change into the language :-( Any string is a valid attribute name, so there might be policies / schema out in the wild that are already using _tags_ or _meta_ as regular attribute names.

This is a bit unfortunate, but I understand. :)

@cdisselkoen (Contributor)

If you could more arbitrarily filter Sets

This sounds somewhat like RFC 21, which was accepted but then later rejected (to my disappointment haha).

@emina (Contributor, Author) commented Sep 13, 2024

This sounds somewhat like RFC 21, which was accepted but then later rejected (to my disappointment haha).

Yes, this is equivalent to RFC 21 in the sense that it requires adding higher-order functions to the language. In general, there is no way to encode higher-order functions in a fragment of SMT that is decidable---meaning, the SMT solver can definitely decide if a formula in that fragment has a model (binding of variables to values that makes the formula true).

This translation requires using quantifiers and/or higher-order logic (HOL), both of which place us outside of the decidable fragment in general.

There has been recent work showing that some interesting fragments of HOL are decidable, which is why we hoped RFC 21 would be feasible (i.e., expressible in these fragments). But after digging deeper, we found that only a very reduced subset of RFC 21 falls into this decidable HOL fragment. We closed RFC 21 because it wasn't clear this minimized version covers enough use cases to justify the complexity of adding it to the language. For example, it couldn't support the containsQuery method. Roughly speaking, we could support only simple comparisons of set elements to a constant (e.g., principal.values.all > 1).

@apg (Contributor) commented Sep 13, 2024

Yes, this is equivalent to RFC 21 in the sense that it requires adding higher-order functions to the language. In general, there is no way to encode higher-order functions in a fragment of SMT that is decidable---meaning, the SMT solver can definitely decide if a formula in that fragment has a model (binding of variables to values that makes the formula true).

If Sets are finite, can't you "unroll" the filter such that you'd have:

["some", "set", "values"].containsQuery(_ like "s*") which unrolls to:

("some" like "s*" ||
"set" like "s*" ||
"values" like "s*")

These are encodable in SMT, since it's already being done. Perhaps the problem would be the representation of Sets doesn't allow for enumeration, or projection? But the representation of Sets must support some way to look at elements, because you can test membership.
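For a concrete set whose contents are known, the unrolling above is just a finite disjunction. A sketch (Python, with fnmatch standing in for Cedar's like, and containsQuery being the hypothetical operator under discussion):

```python
import fnmatch

# Sketch of the unrolling idea for a known, concrete set: a containsQuery-style
# filter expands to a finite disjunction, one disjunct per element.

def contains_query(values, pattern):
    # Equivalent to: "some" like "s*" || "set" like "s*" || "values" like "s*"
    return any(fnmatch.fnmatch(v, pattern) for v in values)

assert contains_query(["some", "set", "values"], "s*")
assert not contains_query(["values"], "s*")
```

The catch, as discussed below, is that this expansion only exists when the set's size is known and small.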

So, could this generalize, perhaps?

Suppose we had a special type, Query<T>, which has the ability to lift the contained type to be Option... recursively (ok, now I'm really stretching for sure...). We could then define a Query to be something along the lines of:

Query<Person> {
  nameEQ: Option<String>,
  nameLIKE: Option<String>,
  ageEQ: Option<Long>,
  ageLT: Option<Long>,
  ... 
}

Such that when using a Query, which, perhaps, could be treated like a quotation (see also #61) ...

[{name: "Guy", age: 69}, {name: "Gerald", age: 77}].containsQuery(?{name like "g*", age > 70})

the actual expression becomes expanded to (I'm going to pretend there's a let P1 = Set.pop(), let P2 = Set.pop() ...

(
  ((not Q.nameLIKE? || P1.name like Q.nameLIKE) &&
  (not Q.ageGT? || P1.age > Q.ageGT)) ||
  ((not Q.nameLIKE? || P2.name like Q.nameLIKE) &&
  (not Q.ageGT? || P2.age > Q.ageGT)) ||
)

For non-Record queries, you'd have Q.valueLIKE? instead. I may... or may not... be showing my love of Scheme here.

@emina (Contributor, Author) commented Sep 13, 2024

If Sets are finite, can't you "unroll" the filter such that you'd have:

["some", "set", "values"].containsQuery(_ like "s*") which unrolls to:

("some" like "s*" ||
"set" like "s*" ||
"values" like "s*")

So, could this generalize, perhaps?

Sets are finite but they are not bounded. There is no a priori bound on the size of Cedar sets that would let us generalize this unrolling. Even if there were, and the bound were big (say, 1000 elements), the unrolling wouldn't work in practice because there is a limit to how big a problem the solver can handle in reasonable time.

actual expression becomes expanded

This is, unfortunately, the same problem as above. Unrolling is possible only if we have a small bound on set size.

I may... or may not... be showing my love of Scheme here.

I'm a huge fan of Scheme---Racket specifically---myself :))

@apg (Contributor) commented Sep 13, 2024

There is no a priori bound on the size of Cedar sets that would let us generalize this unrolling.

Perhaps too fundamental a question, but couldn't we bound this to make it practical for the solver and use Mathematical Induction to prove it for the general case? If the solver shows it works for the base case, and every case exhaustively to some bound N... by induction we've shown it works for the base case, and N+1... (I assume this is not convincing to a mathematician though... ;))

This is, unfortunately, the same problem as above. Unrolling is possible only if we have a small bound on set size.

I suspect the difference is that the cardinality of the set isn't part of the type, so it's a runtime vs validation time concern, but how is this different than defining a record with 10,000 fields? I don't think there's a constraint on the number of fields in a Record?

@emina (Contributor, Author) commented Sep 13, 2024

Perhaps too fundamental a question, but couldn't we bound this to make it practical for the solver and use Mathematical Induction to prove it for the general case? If the solver shows it works for the base case, and every case exhaustively to some bound N... by induction we've shown it works for the base case, and N+1... (I assume this is not convincing to a mathematician though... ;))

Ah, this is a deep rabbit hole. Sometimes, this is possible. In a prior life, I was part of a team that verified Linux eBPF compilers. This proof involved a combination of meta-theoretical reasoning by induction in Lean and discharging the non-inductive parts of the proof with an SMT solver.

Perhaps something like this is possible here, though finding out would be substantial research effort in its own right.

I suspect the difference is that the cardinality of the set isn't part of the type, so it's a runtime vs validation time concern, but how is this different than defining a record with 10,000 fields? I don't think there's a constraint on the number of fields in a Record?

It is indeed no different. If we had a Record with 10K fields, things would get very, very slow :)

But there is a saving grace here. Even if the record has 10K fields, a reasonably-sized policy cannot reference all of them simultaneously in a complicated way. (We expect all production uses of Cedar to put some limit on policy size.) So, if we encounter a situation like this in the wild, it is possible to develop methods to slice out irrelevant fields for the purpose of analysis.

@@ -0,0 +1,251 @@
# Entity tags with dedicated syntax and semantics


How committed are you to a hardcoded literal of tag in the language (tags Set<String>, .hasTag(), .getTag())? For a general system, that probably makes sense, but I would love to use Cedar with Kubernetes, which has a similar concept on all resources but calls them labels (and also a second set of key/value annotations). I'm really not asking for a rename to fit Kubernetes, but perhaps wondering about flexibility for possibly type/function/literal aliasing in the language when the name conflicts.



```
e.hasTag(k)
@micahhausler Sep 13, 2024

Would you ever add additional functions like `e.hasTags(Set<String>)`? I'm thinking of use cases that the AWS IAM policy operators `ForAllValues:` or `ForAnyValue:` satisfy today, to express statements like "I need all these tags to be present", "I need at least one tag to be present", or "I need all keys to match a like operator (say, all keys must have a specific prefix)".


`e.hasAllTags(Set<String>)` and `e.hasAnyTags(Set<String>)` could be supported, though a general any/all operator isn't analyzable (see the rejected RFC 21).
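In the meantime, the all/any patterns can be unrolled by hand whenever the keys are fixed literals. A sketch (the tag names here are illustrative, not from the RFC):

```cedar
// "All of these tags must be present" for a known list of keys:
permit(principal, action, resource is Document)
when {
  resource.hasTag("owner") &&
  resource.hasTag("env") &&
  resource.hasTag("cost-center")
};

// "At least one of these tags must be present":
permit(principal, action, resource is Document)
when {
  resource.hasTag("public") || resource.hasTag("shared")
};
```

This only works for keys known at policy-authoring time, which is exactly why it stays analyzable.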

```
},
}
},
"tags" : {
```

@micahhausler micahhausler Sep 13, 2024


Can tags be on a commonType? If so, how do nested tags work? (I'm probably getting the syntax wrong) Ex:

```
{
  "entityTypes": {
    "Document": {
      "shape": {
        "type": "Record",
        "attributes": {
          "name": {"type": "String"},
          "signature": {"type": "Signature"}
        },
      },
      "tags": {
        //...
      }
    }
  },
  "commonTypes": {
    "Signature": {
      "shape": {
        //...
      },
      "tags": {
        //..
      }
    }
  }
}
```

I ask because Kubernetes types do have nested types, both of which have labels. (.metadata.labels and .spec.template.metadata.labels)

```
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-cool-app
  name: example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-cool-app
  template:
    metadata:
      labels:
        app: my-cool-app
    spec:
      containers:
      - image: nginx:latest
        name: nginx
```


Ok, after reading https://github.com/cedar-policy/rfcs/pull/82/files#r1759501086, this looks to be possible.

```
permit(principal is User, action, resource is apps::Deployment)
when {
  principal.hasTag("app") &&
  resource.hasTag("app") &&
  resource.spec.template.hasTag("app") &&
  principal.getTag("app").contains(resource.getTag("app")) &&
  principal.getTag("app").contains(resource.spec.template.getTag("app"))
};
```


I'm not quite sure I understand the question, but it's possible to nest tags arbitrarily using the indirection approach described in the RFC (see IntTags and StringTags entities).
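That indirection can be sketched as follows (the entity names besides `StringTags` are illustrative, not from the RFC): a tags-map is attached to a helper entity, and the helper is referenced from an attribute, so tag-maps can appear at any nesting depth.

```cedar
// Helper entity whose only job is to carry a tag-map.
entity StringTags {} tags String;

// Nested tag-map, reachable through an attribute.
entity TemplateMeta = {
  meta: StringTags,
};

// Top-level tag-map on the Deployment itself, plus the nested one.
entity Deployment = {
  template: TemplateMeta,
} tags String;
```

A policy could then reach both levels, e.g. `resource.hasTag("app")` for the top-level map and `resource.template.meta.hasTag("app")` for the nested one.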


## Drawbacks

Entity types are permitted to have a single `tags` declaration `tags T`, which eliminates the possibility of attaching multiple tags-maps to a single entity (unlike [RFC 68](https://github.com/cedar-policy/rfcs/pull/68)). This RFC also does not support tags containing other tags-maps, comparing tags-maps with `==`, and the like (as discussed above).


Does this forever limit you to a single set of key/value types to authorize on? Every Kubernetes object has string key/value labels and annotations. I don't think most users make authorization decisions based on annotation values, but some might want policy to require that a specific annotation key has a value, for audit purposes. Could I make a separate set of tags a named attribute on an entity?


Yes, this is possible. Paragraphs below this one show how to get arbitrarily many tag-maps attached to a single entity, and also how to achieve (if desired) arbitrary nesting of tag-maps into other types.
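For instance (a sketch modeled on the Kubernetes labels/annotations use case; the entity names besides `StringTags` are illustrative), two separate tag-maps can be attached to one entity via helper entities:

```cedar
// Helper entity carrying a string-valued tag-map.
entity StringTags {} tags String;

// One entity with two independent tag-maps.
entity K8sObject = {
  labels: StringTags,
  annotations: StringTags,
};
```

Policies then distinguish the maps by attribute, e.g. `resource.labels.hasTag("app")` versus `resource.annotations.hasTag("audit")`.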


ah ok I get it now, thank you!

@apg
Contributor

apg commented Sep 13, 2024

> It is indeed no different. If we had a Record with 10K fields, things would get very, very slow :)
>
> But there is a saving grace here. Even if the record has 10K fields, a reasonably-sized policy cannot reference all of them simultaneously in a complicated way. (We expect all production uses of Cedar to put some limit on policy size.) So, if we encounter a situation like this in the wild, it is possible to develop methods to slice out irrelevant fields for the purpose of analysis.

I'm sorry, but this confuses me more. For the purposes of analysis, then, I assume we limit the number of fields to something reasonable. Why is this OK for records, but wouldn't be OK for Sets? (Also: this is getting way off topic, so I'll take this question elsewhere.)

```
} tags Set<String>;
```

The `User` and `Document` entities each have entity tags, denoted by `tags Set<String>`, implementing tags whose values are sets of strings. The declaration `tags Set<String>` indicates an unspecified number of optional tags that all have values of type `Set<String>`. (Of course, any type could be written, for instance `Long`, not just `Set<String>`.) Note that the set of tag (keys) does not have to be defined in the schema, but the type of all tag values needs to be the same (here, `Set<String>`). (You cannot have tag `foo` have value `2`, and tag `bar` have value `"spam"`.) Note also that there is no way to name the key-value map representing tags; you cannot treat it as a Cedar record or even compare it for equality to the tag-map on a different entity; you can only compare the values of individual keys.
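A policy over this schema exercises the new operators; a sketch (the tag key "roles" and the action name are illustrative):

```cedar
permit(principal is User, action == Action::"writeDoc", resource is Document)
when {
  principal.hasTag("roles") &&
  resource.hasTag("roles") &&
  // Both tag values are Set<String>, so set operators apply:
  principal.getTag("roles").containsAny(resource.getTag("roles"))
};
```

Note that the keys passed to `hasTag`/`getTag` are ordinary strings, so they need not be declared anywhere in the schema.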


I'm not sure I fully grok the current capabilities of Cedar + this proposal, but I do have a real-world example I'd want to walk through.

In Kubernetes, admission gets both the requesting user and the resource being acted on. I'd really like to be able to implement some sort of ABAC, but the tags I'd be working with are slightly different shapes. Every Kubernetes resource has a key/value (String/String) set of labels.

The User structure has extra info in the form of String/Set<String>. Would it be as simple as using existing .contains() functions in policy to implement ABAC?

```
permit (
  principal is User,
  action == Action::"writeDoc",
  resource is Document)
when {
  principal.hasTag("my.tld/myorg") &&
  resource.hasTag("my.tld/myorg") &&
  principal.getTag("my.tld/myorg").contains(resource.getTag("my.tld/myorg"))
};
```


If the resource Entity Type is defined to have `tags String` and the principal Entity Type is defined to have `tags Set<String>`, then the example policy is type-correct and will work as expected.

Here, I would expect the policy to check if `resource.getTag("my.tld/myorg")`, which is a String, is included in the set of Strings `principal.getTag("my.tld/myorg")`.
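Concretely, the schema assumed by that answer would look something like this (a sketch; the attribute-free entity shapes are simplifications):

```cedar
// Resource tag values are single strings (one value per label key).
entity Document = {} tags String;

// Principal tag values are sets of strings (the "extra info" case).
entity User = {} tags Set<String>;
```

With these declarations, `principal.getTag(k).contains(resource.getTag(k))` typechecks: a `Set<String>` membership test on a `String`.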

```
};

permit(principal is User, action, resource is Document)
when {
```


Is there any reason the two policies in this example could not be merged into one?


Assuming that the policies are evaluated against inputs that conform to the example schema, we can re-write these two policies as follows:

```
permit(principal is User, action, resource is Document)
when {
  (principal.roles.contains(context.role) &&
   principal.accessLevels.hasTag(context.role) &&
   principal.accessLevels.getTag(context.role) > 3) ||
  (principal.hasTag("clearance") &&
   principal.getTag("clearance").hasTag("level") &&
   principal.getTag("clearance").getTag("level") == "top secret")
};
```

In general though, merging two policies like this is not a safe transformation (it can result in different authz results) because of errors, but that's a separate can of worms :)
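To illustrate that can of worms (a sketch, assuming Cedar's left-to-right evaluation and its treatment of an erroring policy as simply not applying; the tag names are illustrative):

```cedar
// As two separate policies: if the first errors at runtime (e.g., the
// principal has no "clearance" tag), only that policy fails to apply;
// the second can still permit on its own.
permit(principal, action, resource)
when { principal.getTag("clearance") == "top secret" };

permit(principal, action, resource)
when { principal.hasTag("badge") };

// Naively merged with ||: an error in the left disjunct propagates, so
// the whole condition errors and the merged policy does not apply, even
// when the right disjunct alone would have been true.
permit(principal, action, resource)
when {
  principal.getTag("clearance") == "top secret" ||
  principal.hasTag("badge")
};
```

Guarding each `getTag` with a matching `hasTag` avoids the error, which is why the merged policy above (the one from this thread) does exactly that.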

@patjakdev

This proposal is awesome and a very welcome addition to the language! I know we've had scenarios where we wanted to support arbitrary entity tags and have been at a bit of a loss about how to handle them in Cedar.

The one thing I'd push back on is the typing of entity tags. Since this RFC is now strictly about adding the concept of entity tags rather than supporting the general EA-maps proposed in RFC 68, I find that the typing of the tags adds unnecessary cognitive complexity (not to mention a change to the entity schema language). I'd bet that just drawing a line in the sand and saying that entity tags are always a `{String: String}` map, and that all entity instances automatically have such a map (even if it's empty), would cover 95% of use cases. The one exception I'd be a bit wary of is the `{String: Set<String>}` case, but if you got enough complaints perhaps you could then introduce typing for the tags.

Otherwise, the use case for tag typing seems to mostly be to allow you to (ab?)use the entity tags feature to get something like a `{String: Long}` EA-map type whose values can be accessed via the tags interface. The notion of a "bag of tags" entity type like `entity IntTags {} tags Long;` feels pretty peculiar to me.

@philhassey

What if the type was always `{String: Set<String>}`... and users who needed only a single value could just have sets with a single element in them.

Maybe for ergonomic reasons, there could be one more method:

```
entity.containsTag("key","value")
```

Which would just be sugar for:

```
entity.hasTag("key") && entity.getTag("key").contains("value")
```

@emina
Contributor Author

emina commented Sep 18, 2024

> What if the type was always {String:Set<String>} ... And users who just needed a single value could just have sets with a single value in them.

That wouldn't work because there would be no way to operate on the elements of the set. For example, we couldn't write something like `resource.hasTag("role") && resource.getTag("role") like "*admin"`.

See also this thread.
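Spelled out as a full policy (a sketch assuming the resource's entity type declares `tags String`; the tag key "role" comes from the example above):

```cedar
permit(principal, action, resource)
when {
  resource.hasTag("role") &&
  // String-valued tags support the wildcard `like` operator directly:
  resource.getTag("role") like "*admin"
};
```

With `Set<String>` values there is no way to apply `like` to each element, which is the point being made.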

@emina
Contributor Author

emina commented Sep 18, 2024

> The one exception I'd be a bit wary of is the {String: Set<String>} case, but if you had enough complaints perhaps then you could introduce typing for the tags.

Yes, that's a tricky one. We're already aware of use cases for Set<String> (see also this thread).

@emina emina merged commit 30afd30 into main Sep 19, 2024
1 check passed
@adpaco-aws adpaco-aws mentioned this pull request Dec 3, 2024