Clearly distinguish private data and privatized data by using different variable naming conventions on each side of the noise barrier. #13

Open
simsong opened this issue Jan 6, 2020 · 4 comments

Comments

@simsong

simsong commented Jan 6, 2020

In reviewing the source code, it is not clear which variables refer to private data and which refer to privatized data. I suggest that you clearly distinguish these in the source code, perhaps by using different variable naming conventions.
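A minimal sketch of one possible convention, in Python. The `conf_` and `pub_` prefixes and the `laplace_privatize` helper are hypothetical names, not taken from any existing codebase. Confidential values carry `conf_`, releasable values carry `pub_`, and the only place a `conf_` value may flow into a `pub_` name is the noise-adding function, i.e. the noise barrier. Note that `random.Random` is not a CSPRNG and naive floating-point Laplace sampling has known weaknesses, so this illustrates the naming only.

```python
import random


def laplace_privatize(conf_value: float, pub_sensitivity: float,
                      pub_epsilon: float, rng: random.Random) -> float:
    """The noise barrier: the only function that turns a conf_ value into a pub_ value."""
    if pub_sensitivity <= 0 or pub_epsilon <= 0:
        raise ValueError("pub_sensitivity and pub_epsilon must be positive")
    # Public: the Laplace scale depends only on public parameters.
    pub_scale = pub_sensitivity / pub_epsilon
    # Confidential: revealing the noise would reveal conf_value.
    # Laplace(0, b) is the difference of two independent exponentials with mean b.
    conf_noise = rng.expovariate(1.0 / pub_scale) - rng.expovariate(1.0 / pub_scale)
    return conf_value + conf_noise


# Confidential side of the barrier: raw records and the exact count.
conf_ages = [34, 51, 29, 62, 45]
conf_count_over_40 = sum(1 for conf_age in conf_ages if conf_age > 40)

# Public side of the barrier: the privacy parameter and the privatized result.
pub_epsilon = 1.0
pub_count_over_40 = laplace_privatize(conf_count_over_40, pub_sensitivity=1.0,
                                      pub_epsilon=pub_epsilon, rng=random.Random(0))
print(f"privatized count: {pub_count_over_40:.2f}")
```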

@dasmdasm

Can you be a little more specific? Which variables are you talking about? Can you provide an example?

@simsong · Author

simsong commented Jan 23, 2020

Sure. Consider https://github.com/google/differential-privacy/blob/master/differential_privacy/algorithms/bounded-sum.h

The same naming convention is used for private (confidential) variables and for privatized variables that have had noise added. This makes the code harder to read and audit. For example, in this line:

```cpp
void AddEntry(const T& t) override {
```

Is `t` a confidential value or a non-confidential value?
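With a naming convention in place, the signature alone would answer this question. Below is a hypothetical Python analogue of `AddEntry`; the class `BoundedSumSketch` and the `conf_`/`pub_` prefixes are illustrative assumptions, not the library's API, and the sketch assumes entries are clamped to public bounds as in a bounded sum.

```python
class BoundedSumSketch:
    """Hypothetical bounded-sum accumulator using conf_/pub_ name prefixes."""

    def __init__(self, pub_lower: float, pub_upper: float) -> None:
        # Clamping bounds are chosen by the analyst, so they are public.
        if pub_lower > pub_upper:
            raise ValueError("pub_lower must not exceed pub_upper")
        self.pub_lower = pub_lower
        self.pub_upper = pub_upper
        # Running total of clamped confidential entries; never released directly.
        self._conf_partial_sum = 0.0

    def add_entry(self, conf_entry: float) -> None:
        # The prefix states that each entry is confidential input.
        conf_clamped = min(max(conf_entry, self.pub_lower), self.pub_upper)
        self._conf_partial_sum += conf_clamped


conf_accumulator = BoundedSumSketch(pub_lower=0.0, pub_upper=10.0)
for conf_entry in [3.0, 12.0, -4.0]:
    conf_accumulator.add_entry(conf_entry)
```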

How about here:

```cpp
base::StatusOr<Output> GenerateResult(double privacy_budget) override {
  DCHECK_GT(privacy_budget, 0.0)
      << "Privacy budget should be greater than zero.";
  if (privacy_budget == 0.0) return Output();

  Output output;
  double sum = 0;
  double remaining_budget = privacy_budget;
```

Is the value `privacy_budget` confidential or public? It's probably epsilon, so it's probably public, but I can't tell that from the naming convention.

What about `sum`? Is that public or confidential?
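Applying the same hypothetical prefixes to a `GenerateResult`-style function would answer both questions: the budget becomes `pub_epsilon`, and the running total becomes `conf_sum`, which can only leave the function after noise is added. This is a standalone sketch, assuming Laplace noise with sensitivity `max(|lower|, |upper|)` (the sensitivity of a clamped sum when one record is added or removed); it is not the library's actual budget-splitting logic.

```python
import random
from typing import Iterable


def generate_result(conf_entries: Iterable[float], pub_epsilon: float,
                    pub_lower: float, pub_upper: float,
                    rng: random.Random) -> float:
    """Hypothetical bounded sum whose local names state which side of the barrier they are on."""
    if pub_epsilon <= 0 or pub_lower > pub_upper:
        raise ValueError("need pub_epsilon > 0 and pub_lower <= pub_upper")
    # Public: sensitivity of a clamped sum when one record is added or removed.
    pub_sensitivity = max(abs(pub_lower), abs(pub_upper))
    if pub_sensitivity == 0:
        return 0.0  # every clamped entry is 0, so the exact answer reveals nothing
    pub_scale = pub_sensitivity / pub_epsilon
    # Confidential: the exact sum of clamped entries; must never be returned as-is.
    conf_sum = sum(min(max(conf_entry, pub_lower), pub_upper) for conf_entry in conf_entries)
    # Confidential: Laplace(0, pub_scale) noise; leaking it would leak conf_sum.
    conf_noise = rng.expovariate(1.0 / pub_scale) - rng.expovariate(1.0 / pub_scale)
    # Crossing the noise barrier: the only place a conf_ value reaches a pub_ name.
    pub_noisy_sum = conf_sum + conf_noise
    return pub_noisy_sum


conf_incomes = [3.0, 12.0, -4.0]
pub_result = generate_result(conf_incomes, pub_epsilon=1.0, pub_lower=0.0,
                             pub_upper=10.0, rng=random.Random(42))
```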

@dasmdasm

Interesting suggestion. We haven't given much thought to this sort of style/convention, but I can see how it would be useful. Do you know if there are any existing examples of this sort of convention in differential privacy code? We can come up with our own, but I'd like to avoid proliferating standards for this, if possible.

@simsong · Author

simsong commented Jan 31, 2020

Sure. Check out https://github.com/uscensusbureau/census2020-das-2010ddp/blob/803126100083f6811aaf5bcb0be79c4fc7b1a148/das_decennial/programs/engine/primitives.py#L132

It's not quite the naming approach I recommend here. However, you will see code like this:

```python
shape = np.shape(true_answer)
# TODO: Implement CSPRNG and floating point
self.protected_answer = prng.laplace(loc=true_answer, scale=self.scale, size=shape)
```
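Because that code consistently uses `true_` for confidential inputs and `protected_` for noisy outputs, a convention like this can also be checked mechanically. The following is a hypothetical audit sketch built on Python's standard `ast` module; the prefix and the allowlist of noise functions (`laplace`, `geometric`) are assumptions for illustration. It flags any plain assignment that reads a `true_` name into a variable that is not itself `true_`-prefixed, unless the read happens inside a call to an allowlisted noise function.

```python
from __future__ import annotations

import ast

CONFIDENTIAL_PREFIX = "true_"               # assumed prefix for confidential values
NOISE_FUNCTIONS = {"laplace", "geometric"}  # hypothetical allowlist of noise calls


def _call_name(call: ast.Call) -> str:
    """Name of the called function for both f(...) and obj.f(...)."""
    if isinstance(call.func, ast.Name):
        return call.func.id
    if isinstance(call.func, ast.Attribute):
        return call.func.attr
    return ""


def _unprotected_reads(node: ast.AST) -> list[str]:
    """Confidential names read under node, ignoring reads inside a noise call."""
    if isinstance(node, ast.Call) and _call_name(node) in NOISE_FUNCTIONS:
        return []
    found = []
    if isinstance(node, ast.Name) and node.id.startswith(CONFIDENTIAL_PREFIX):
        found.append(node.id)
    for child in ast.iter_child_nodes(node):
        found.extend(_unprotected_reads(child))
    return found


def _target_name(target: ast.AST) -> str:
    """Assigned name for x = ... and self.x = ...; empty string otherwise."""
    if isinstance(target, ast.Name):
        return target.id
    if isinstance(target, ast.Attribute):
        return target.attr
    return ""


def audit(source: str) -> list[str]:
    """Report plain assignments that move true_ data into a non-true_ name without noise."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        leaked = _unprotected_reads(node.value)
        if not leaked:
            continue
        for target in node.targets:
            name = _target_name(target)
            if not name.startswith(CONFIDENTIAL_PREFIX):
                problems.append(f"line {node.lineno}: {name or '<target>'} <- {', '.join(leaked)}")
    return problems


SNIPPET = """
shape = np.shape(true_answer)
self.protected_answer = prng.laplace(loc=true_answer, scale=self.scale, size=shape)
leaked_total = true_answer.sum()
"""

for message in audit(SNIPPET):
    print(message)
```

Run against the two assignments above plus one deliberately leaky line, it reports line 2 (`shape`, which a reviewer would then confirm is public metadata) and line 4, while the `prng.laplace(...)` assignment passes. Only plain `=` assignments are checked; a real tool would also need to follow augmented assignments, return values, and function arguments.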
