Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make labels erasable #54

Merged
merged 4 commits into from
Sep 23, 2024
Merged

feat: make labels erasable #54

merged 4 commits into from
Sep 23, 2024

Conversation

TWal
Copy link
Collaborator

@TWal TWal commented Sep 1, 2024

This PR adds the [@@erasable] to the label type…
… and updates the rest of the code to reflect this change :)

Because labels are erasable, they can't appear in the Rand constructor of bytes on which we want decidable equality. As a consequence, we now have a challenge to define the get_label function on the Rand constructor that previously contained the label, and that now only contain the time at which the randomness was generated (which correspond to a RandGen event in the trace, that do contain the label).

There are two approaches to this problem:

Now that get_label depends on the trace, we need new properties such as "the output of get_label dosen't change when the trace grows".
Here we get to a technical difficulty: what does get_label answer on a Rand constructor that do not correspond to a RandGen event in the trace? For example, if the time of random generation is in the future (with respect to that trace)? Then surely the output of get_label may change when the random bytestring is actually generated.

To deal with this scenario, we define a new well-formed check (bytes_well_formed) that checks whether all Rands inside a bytes correspond to a RandGen event in the trace. This allows us to define the theorem get_label_later stating

val get_label_later:
  {|crypto_usages|} ->
  tr1:trace -> tr2:trace ->
  b:bytes ->
  Lemma
  (requires
    bytes_well_formed tr1 b /\
    tr1 <$ tr2
  )
  (ensures get_label tr1 b == get_label tr2 b)

Now the question: how do we prove the precondition bytes_well_formed tr1 b? In most cases, we can rely on the theorem

val bytes_invariant_implies_well_formed:
  {|crypto_invariants|} ->
  tr:trace -> msg:bytes ->
  Lemma
  (requires bytes_invariant tr msg)
  (ensures bytes_well_formed tr msg)

The only place where we can't rely on bytes_invariant is when we define the bytes_invariant, namely in the crypto_predicates, more precisely in the pred_later lemmas. In that case, all cryptographic functions come with lemmas stating how they preserved the well-formedness, although I think in most scenarios we will just need the ones on split and concat that will be handled by Comparse (as it is done in the NSL example).

Although it looks like a big change, the examples were easy to update and proofs mostly didn't break.

Digression on has_label

I spent quite some time thinking about a third option, which is using a predicate val has_label: trace -> bytes -> label -> prop (which would be monotonic in the trace)
It looks convenient because we could say that a malformed bytes has no labels.

However, the difficulty is the following: how do we express that the plaintext must flow to the key (e.g. in AEAD)?
Before, we wrote

(get_label msg) `can_flow tr` (get_label key)

Now we could write

forall msg_label key_label.
  has_label tr msg msg_label /\
  has_label tr key key_label ==>
  msg_label `can_flow tr` key_label

However if key is malformed, the precondition is never realized (because key has no label) so this would be true,
but that might become false when the trace grows (and key becomes well-formed).

So instead we could write

exists msg_label key_label.
  has_label tr msg msg_label /\
  has_label tr key key_label /\
  msg_label `can_flow tr` key_label

But we need to prove that key has a label, which brings back the well-formedness problem.

Hence it seems that we have to deal with well-formedness when using has_label too, so I don't think it's worth doing (as it would be strictly more annoying to use than what is in this PR).

@TWal TWal requested a review from a team as a code owner September 1, 2024 22:19
@TWal TWal requested a review from qaphla September 2, 2024 07:01
@TWal
Copy link
Collaborator Author

TWal commented Sep 2, 2024

(asking @qaphla for review since we discussed that in Stuttgart)

@qaphla
Copy link

qaphla commented Sep 2, 2024

Some preliminary thoughts, before I go through it more deeply:

  • I like the general idea (as we discussed), and this seems like an elegant way to do it.
  • get_label returning public for ill-formed bytes seems a bit suspicious (but I can't see any concrete example where it causes problems right now) --- my feeling is that we should have something like weak_get_label : trace -> bytes -> option label and then get_label : (tr:trace) -> (b:bytes{bytes_well_formed tr b}) -> label
  • Relatedly, it seems like bytes_well_formed (and some other related predicates that depend on the trace) might be better presented as a kind of three-valued predicate, being true, false, or indeterminate. In particular, for bytes_well_formed, this lets us distinguish between bytes that are definitely not well formed with a given trace (i.e., they are incompatible with that trace), and ones that may become well-formed with an extension of the trace (compatible but not necessarily so). I'm again not sure right now of an immediate application of this, but with the (partial) ordering where indeterminate is less than both true and false, this is still monotonic with respect to trace growth. The has_label predicate also has this property, if I'm not mistaken. In practice, I think for bytes, we only really want to work with ones that are well-formed with respect to the trace at the moment, but I could envision some kinds of predicates with this property being useful if we want to talk about future events. Of course, the problem is that F* probably doesn't have very nice default handling of this kind of three-valued logic... Then again, it's essentially the same as option bool, so maybe it would work with the monad handling.
  • Passing thought: It seems like most of this could also be applied to usages in a similar manner. Do you see any obvious obstacle to that?

I'll go over it more carefully soon, and also see if I come up with any better comments on these points, but I think only the second one (get_label returning public for ill-formed cases) needs to be addressed in some form for me to approve this.

@TWal
Copy link
Collaborator Author

TWal commented Sep 3, 2024

Thank you for your comments!

About get_label returning public when we don't know the label.
I agree it may seem fishy! We can actually change this to any label, I chose public with the idea that we can't do anything useful with public bytes (e.g. if it is used as an AEAD key, encrypting secrets to it). But on the other hand it means we can send it on the network, so it might also make sense to choose the secret label. But in that case we can encrypt secrets to it.
I don't know what is the choice that looks the least fishy. Anyway, in practice we cannot use it to encryption nor send it on the network, because to do that it needs to satisfy the bytes_invariant which prevents these cases.

About get_label coming with two flavors -- one with pre-condition, the other returning option label.
I agree this would avoid needing to choose a default label (which is currently public), but this would also make the user code more aware of this technical subtlety, for example cryptographic predicates would have to deal with the None case (because they can't prove the precondition). Having a default case do seem more hacky but I think it's more nice to use when doing actual security proofs.

About bytes_invariant being three-valued.
Indeed the combo bytes_invariant + bytes_well_formed could be compressed in a three-valued predicate (that is monotonic in the trace), however there is the subtlety that bytes_invariant needs an instance of {|crypto_invariants|}, whereas bytes_well_formed do not need such things. It's important because bytes_well_formed is used when defining crypto_invariants (e.g. in the pred_later lemmas of crypto_predicates).

About doing the same thing for get_usage.
I guess we could do the same thing at it would work. The benefits would be that usage and bytes wouldn't need to be mutually recursive anymore, and we wouldn't need decidable equality on usage, which could be use for further improvements of usage (although I don't have ideas for that yet).
However I have other ideas for usage, because the design constraints are not the same as with labels, and a has_usage predicate would make sense as we don't need to lift a can_flow relation as we do with labels. Although we could apply the same recipe as the one for labels, there is no technical reason we would need bytes_well_formed for usages.
Giving a few spoilers, I think this has_usage tr msg usg would replace get_usage msg == usg \/ is_publishable tr msg, which I think is the good pre-condition we need in many theorems. But I have not yet tried to see whether this actually works well.

@qaphla
Copy link

qaphla commented Sep 3, 2024

On get_label returning public for ill-formed bytes:
I see a bit better now where the problem is, I think. This feels a little bit like where get_usage returns NoUsage, and maybe the thing to do is to extend labels with something similar (which can neither flow to nor from anything else?). However, that doesn't seem nicely compatible with #55 , unless I'm missing some clever way to design such a predicate.
Maybe the three-valued approach can be used to resolve this in some sufficiently hidden way (e.g. using the typeclass approach of #51 so that instead of explicitly getting labels, we always use less_secret tr, and can internally handle the ill-formed cases), but this might be overengineered.

I think public is in some sense the "least fishy" option we have of the existing labels, and can see now the argument for why it won't cause problems because of bytes_invariant, but would still like to ensure that we keep track of it as a suspicious point.

On three-valued logic:
I hadn't thought of combining both bytes_invariant and bytes_well_formed, but rather using both separately in this three-valued form (and similarly for other trace-indexed predicates like can_flow tr). A subtlety I noticed, though, is that we seem to want different orderings for different predicates --- bytes_well_formed works with indeterminate less than both true and false, and true and false incomparable to each other, but for bytes_invariant or can_flow, or most of the other predicates we work with, we seem to want something closer to the existing stability --- indeterminate < false < true.
I think because of this, it's probably not worth digging too much into right now, but I do think there's interesting potential there if we start to work with more predicates that are similar to bytes_well_formed, or, for that matter, with monotonic functions of the same form (I guess weak_get_label like I proposed earlier would be one such function --- it can go from None to Some x with an increase in trace argument, but can never go the other direction, or from Some x to Some y. My feeling is that this might come up again when we start to look at dynamic labels, but we can see in the future.

On usages:
That sounds good to me --- I do think overall that we should make usages eraseable, much like labels, but I can see how it could be done more simply than labels, simply because it's not tangled in as much of the rest of the core.

Overall:

  • I approve of this PR --- I'll submit that
  • If there's no further discussion on get_label, I'll make an issue about it to hold future discussion (so it's not holding up this PR)
  • I'll probably also make another issue (or something else --- I need to look more at what github offers for discussions) about the three-valued logic or, more generally, monotone functions on the trace, and if we can reuse some of the Recalling a Witness-style reasoning in the same way as they do with stable predicates, to get any useful results.

qaphla
qaphla previously approved these changes Sep 3, 2024
Copy link

@qaphla qaphla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See discussion for more details.

  • Minor complaint about get_label returning public for ill-formed bytes (to become an issue)
  • Discussion sparked about three-valued logic/monotone functions for a "weak" version of get_label that avoids the need for a special case for ill-formed bytes.
  • I look forward to seeing usages get similar treatment.

@TWal
Copy link
Collaborator Author

TWal commented Sep 4, 2024

Thanks! We could make an interface file like DY.Core.Bytes.AnyLabel.fsti that would contain

val any_label: label

and use that in get_label for the ill-formed cases. It means that unless we write friend DY.Core.Bytes.AnyLabel, we can't reason with the concrete value of any_label, which makes its actual value less suspicious?
Would that look better to you?

@TWal
Copy link
Collaborator Author

TWal commented Sep 4, 2024

Now asking the thoughts of @cwaldm and @fabian-hk: do you think the impact on the proofs is fine? Especially considering this brings other benefits (such as a nice equational theory on labels done in #56 which avoid having to deal with equivalent).

@qaphla
Copy link

qaphla commented Sep 4, 2024

Thanks! We could make an interface file like DY.Core.Bytes.AnyLabel.fsti that would contain

val any_label: label

and use that in get_label for the ill-formed cases. It means that unless we write friend DY.Core.Bytes.AnyLabel, we can't reason with the concrete value of any_label, which makes its actual value less suspicious? Would that look better to you?

I think that makes it sufficiently less suspicious for now (though I do still want to think more about if there's a way to cleanly work with but hide options) --- in particular, because it prevents anyone from unintentionally using that label, regardless of what it is. I'm a bit uncertain if any_label or no_label, or maybe unknown_label is most appropriate, but i lean towards unknown_label at the moment, even though it is viable to put an arbitrary label there (the justification for any_label, presumably?

@TWal
Copy link
Collaborator Author

TWal commented Sep 5, 2024

I have done that in the DY.Core.Label.Unknown module.

In #55 I did something a bit fancier, so that reasoning on unknown_label would require reasoning Collatz conjecture, so that there is no temptation to friend DY.Core.Label.Unknown and use its actual value :)

qaphla
qaphla previously approved these changes Sep 5, 2024
Copy link

@qaphla qaphla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

I think unknown_label should be sufficiently self-documenting in get_label, especially since one can easily follow back to the fsti file for more information.

src/core/DY.Core.Bytes.fst Outdated Show resolved Hide resolved
qaphla
qaphla previously approved these changes Sep 16, 2024
Copy link

@qaphla qaphla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked over the updates, and this (still) looks fine to me, though of course feedback from others would also be welcome.

cwaldm
cwaldm previously approved these changes Sep 18, 2024
Copy link
Contributor

@cwaldm cwaldm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me as well!

fabian-hk
fabian-hk previously approved these changes Sep 18, 2024
@TWal TWal dismissed stale reviews from fabian-hk, cwaldm, and qaphla via 52c84d4 September 23, 2024 13:46
@TWal TWal merged commit e11a607 into main Sep 23, 2024
1 check passed
@TWal TWal deleted the twal/erased_labels_3 branch September 23, 2024 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants