[Draft] Confidentiality Analysis #10
Conversation
> - Type system based on the built-in tensor types and their sensitivity property.
> - An error is raised if a plaintext tensor is ever found on a player that is *not* in its sensitivity set; this can be checked at compile time and, optionally, at runtime.
> - Subtyping allows for implicitly restricting sensitivity by removing players from the set: `T(S) <: T'(S')` if `S'` is a subset of `S`.
> - `tfe.analysis.broaden` must be used to broaden sensitivity by adding players to the set: `broaden_S(x) : T(S union S')` when `x: T(S')`; this makes it syntactically clear to the user where extra attention must be paid; a no-op at runtime, used only by the type system, similar to type hints.
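To make the proposal concrete, here is a minimal sketch of how these checks might behave; `Player`, `PlaintextTensor`, `check_placement`, `restrict`, and `broaden` are illustrative stand-ins, not the actual `tfe.analysis` API:

```python
# Illustrative sketch only; these names are hypothetical, not the tfe API.

class Player:
    def __init__(self, name):
        self.name = name


class PlaintextTensor:
    def __init__(self, value, sensitivity):
        self.value = value
        # the set of players allowed to see this value
        self.sensitivity = frozenset(sensitivity)


def check_placement(tensor, player):
    # an error is raised if a plaintext tensor is ever found on a player
    # that is not in its sensitivity set
    if player not in tensor.sensitivity:
        raise RuntimeError(f"{player.name} is not in the sensitivity set")


def restrict(tensor, players):
    # the subtyping direction: removing players (S' a subset of S) is safe,
    # so it may happen implicitly
    assert frozenset(players) <= tensor.sensitivity
    return PlaintextTensor(tensor.value, players)


def broaden(tensor, extra_players):
    # adding players must be explicit: broaden_S(x) : T(S union S');
    # a no-op on the value itself, only the annotation changes
    return PlaintextTensor(tensor.value, tensor.sensitivity | frozenset(extra_players))
```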
I do see some benefits of this level of transparency, but I'm not sure this dynamic casting of the sensitivity fits my mental model. Here's how I think of it:
- A tensor (piece of data) comes with a sensitivity set, i.e. that's a static property of the object, & not the class <-- maybe this is the crux, and there are problems you've foreseen with the alternative
- When that tensor is consumed by some computation, the resulting output tensor has some new sensitivity set
- Computations can be categorized by how the output differs from the sensitivity set of a parent
- Arbitrary `broaden`ing is unnecessary because transformations of sensitivity sets are implicitly defined by each node (i.e. operation) in the computation graph. This is not to forbid the user from doing so, but just to discourage it.
Put another way, should the sensitivity set be a property of each tensor type, or each tensor?
> A tensor (piece of data) comes with a sensitivity set, i.e. that's a static property of the object, & not the class <-- maybe this is the crux, and there are problems you've foreseen with the alternative
Maybe we're saying the same thing: I imagine that each tensor instance (and not each tensor type) has its own sensitivity, i.e. `sensitivity` is an instance member, and depends e.g. on where the tensor was created.
> When that tensor is consumed by some computation, the resulting output tensor has some new sensitivity set
I was thinking that most operations do/should not change sensitivity. For high-level functionalities (such as secure aggregation), the broadening could be an internal step that doesn't require any additional broadening by the user of those functionalities.
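For instance, the internal step could look roughly like this; a hypothetical sketch building on the names from the thread above, with the policy baked into the functionality:

```python
# Hypothetical sketch: the policy "aggregation is enough to release" is
# baked into the functionality, so its callers never broaden themselves.
def secure_aggregation(gradients, model_owner):
    # conservative default: the raw sum is at least as sensitive as every
    # parent, i.e. visible only to players allowed to see all of them
    summed = PlaintextTensor(
        sum(g.value for g in gradients),
        frozenset.intersection(*(g.sensitivity for g in gradients)),
    )
    # the single, internal broadening step: release the aggregate
    # (and only the aggregate) to the model owner
    return broaden(summed, {model_owner})
```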
> Computations can be categorized by how the output differs from the sensitivity set of a parent
Any more thoughts on this?
> Arbitrary `broaden`ing is unnecessary because transformations of sensitivity sets are implicitly defined by each node (i.e. operation) in the computation graph. This is not to forbid the user from doing so, but just to discourage it.
Arbitrary broadening is an important part of the policy, e.g. expressing that aggregation is enough to release otherwise sensitive values. As mentioned above, this policy can be baked into high-level functionalities, but for general computations I don't see how we can know upfront what policy the user wants (besides the default of copying).
> Put another way, should the sensitivity set be a property of each tensor type, or each tensor?
My thoughts are each tensor. Are we saying the same thing here?
> Are we saying the same thing here?
I believe so -- I think the mention of "subtyping" was what confused me; can you clarify what you mean by that?
> Any more thoughts on this?
I've just noticed that in e.g. differential privacy (DP) there is an allowable query set that maintains the DP bound, and I suspect this might mean that operations can be similarly categorized for the kind of sensitivity we're considering here (although perhaps, as you say, that doesn't account for all the operations we'd want to be able to check).
> As mentioned above, this policy can be baked into high-level functionalities, but for general computations I don't see how we can know upfront what policy the user wants (besides the default of copying).
Where is this? I'm not sure I see it in the doc -- are you referring to the examples below?
> For secure aggregation for federated learning we obtain:
>
> 0) Model weights with type `PlaintextTensor({mo})`.
Here's an example where my understanding was different, related to the above. Any tensor `x` could equivalently be stated as having type `PlaintextTensor(U)`, where `U` is the union of `S_t` over every time `t` in the tensor's lifespan and `S_t` is the sensitivity set of `x` at time `t`. In this case, that means the model weights here would be instantiated with sensitivity `None` (which matches my intuition about the security of the weights in the secure aggregation use case).
I'm sure there are benefits to the approach you specify here over the one I'm describing -- what did you have in mind?
> Any tensor `x` could equivalently be stated as having type `PlaintextTensor(U)`, where `U` is the union of `S_t` over every time `t` in the tensor's lifespan and `S_t` is the sensitivity set of `x` at time `t`.
Something like this will happen at runtime when the policy is being checked, but my thoughts were that we need something to check against, i.e. a way to express expectations.
> In this case, that means the model weights here would be instantiated with sensitivity `None` (which matches my intuition about the security of the weights in the secure aggregation use case).
Your intuition is that it is safe to share the weights?
The idea is that when a player creates a tensor we start out by assuming that it is a very sensitive value; if that is not the case then it needs to be specified one way or another.
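For example, in terms of the earlier sketch, creation could default to the tightest possible set (again with hypothetical names):

```python
def constant(value, creator):
    # hypothetical default: a freshly created tensor is visible only to
    # its creator until explicitly restricted or broadened
    return PlaintextTensor(value, {creator})
```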
> Something like this will happen at runtime when the policy is being checked, but my thoughts were that we need something to check against, i.e. a way to express expectations.
Agreed. I was expecting that instantiating a tensor `x` with a sensitivity set `S` would enforce at runtime that the value `x` (or any of its children in the computation graph) would not be broadened beyond `S`. There might be an exception when a specific operation releases this requirement, e.g. when secure aggregation happens we can release the tensors that result, even though some of the parents in the graph might have a stricter sensitivity set. Specifically, the training data tensors & all children (including the local gradients) would have the stricter sensitivity set.
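One possible shape for that runtime check, continuing the hypothetical sketch from the earlier threads (the intersection rule here is just one plausible default for multi-input ops):

```python
# Hypothetical kernel wrapper enforcing the policy at runtime.
def apply_op(op, inputs, release_to=None):
    out_value = op(*(t.value for t in inputs))
    # ordinary ops never broaden: the output is visible only to players
    # allowed to see all of the parents
    out_sensitivity = frozenset.intersection(*(t.sensitivity for t in inputs))
    out = PlaintextTensor(out_value, out_sensitivity)
    if release_to is not None:
        # only ops explicitly marked as releasing (e.g. secure aggregation)
        # may broaden their output beyond the parents' sensitivity sets
        out = broaden(out, release_to)
    return out
```

Under this rule the training data and local gradients keep their strict sets automatically, and the aggregation op is the single place where `release_to` broadens the result.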
> Your intuition is that it is safe to share the weights?
In the secure aggregation example, it seems necessary for the policy to broaden the sensitivity of the weights to at least include the data owners/clients.
> The idea is that when a player creates a tensor we start out by assuming that it is a very sensitive value; if that is not the case then it needs to be specified one way or another.
This makes sense to me; it's a good default in the absence of what I'm describing here and in the other thread. I suppose I'd assumed the existence of that API in my comment above, so will focus attention on that thread.
> ## Detailed Design
>
> ## Questions and Discussion Topics
This seems to be defined over abstract notions of `Plaintext` & `Encrypted` -- does this mean that sensitivity would apply to e.g. an `AdditivelySharedTensor` as well as the component shares inside the `AdditivelySharedTensor`? Or would it just be at the higher level?
Concrete encrypted tensors would inherit sensitivity from the abstract `EncryptedTensor`. I did not imagine that component/backing tensors would have their own sensitivity, although they would probably have their own placement.
In this case, what qualifies these concrete encrypted tensors as being in violation of their sensitivity set? It seems like it would have to be semantically different from what it means for plaintext tensors. Encrypted might be something like "is never decrypted by a player outside of the sensitivity set" vs. plaintext might be something like "is never possessed by a player outside the sensitivity set" -- is this correct and intentional?
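If that reading is right, the two checks might fire at different events, e.g. (hypothetical, extending the earlier sketch):

```python
# Hypothetical: possession is checked for plaintext, decryption for encrypted.

class EncryptedTensor:
    def __init__(self, shares, sensitivity):
        self.shares = shares  # backing/component tensors, e.g. additive shares
        self.sensitivity = frozenset(sensitivity)


def move_to(tensor, player):
    # plaintext semantics: mere possession is the checked event
    if isinstance(tensor, PlaintextTensor) and player not in tensor.sensitivity:
        raise RuntimeError(f"{player.name} may not possess this plaintext tensor")
    return tensor  # shares of an EncryptedTensor may be placed freely


def decrypt(tensor, player):
    # encrypted semantics: decryption is the checked event
    if player not in tensor.sensitivity:
        raise RuntimeError(f"{player.name} may not decrypt this tensor")
    # additive reconstruction, just for illustration
    return PlaintextTensor(sum(tensor.shares), tensor.sensitivity)
```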
Also, I think this accounts for the mismatch I describe above -- I was only thinking about sensitivity in the context of the plaintext description "is never possessed by a player outside the sensitivity set" and was thinking that the backing/component tensors would have their own sensitivity, in which case passing them through specific kernels might have better-defined effects on sensitivity sets.