Avoid large download and informing the server of the list of installed application #64

Open
tdelmas opened this issue Sep 15, 2019 · 6 comments
Labels
question Further information is requested

Comments

@tdelmas

tdelmas commented Sep 15, 2019

Today the application fetches two things:

The list of all applications known by exodus:

https://reports.exodus-privacy.eu.org/api/applications

{"applications": [
  {
    "id": 50261,
    "handle": "org.eu.exodus_privacy.exodusprivacy",
    "name": "Exodus Privacy",
    "creator": "Exodus Privacy",
    "downloads": "10,000+ downloads",
    "app_uid": "B6FECF6541A4C151B6FB1AB1D77AD012C95349DF",
    "icon_phash": "93967134568205988083521687304934410531",
    "report_updated_at": 1566315960.731061
  }
]}

For each installed application on the phone, the list of reports:

https://reports.exodus-privacy.eu.org/api/search/org.eu.exodus_privacy.exodusprivacy

{
  "org.eu.exodus_privacy.exodusprivacy": {
    "reports": [
      {
        "downloads": "10,000+ downloads",
        "creation_date": "2019-07-30T21:23:18.334Z",
        "updated_at": "2019-08-20T10:34:49.189Z",
        "trackers": [],
        "version_code": "7",
        "id": 87886,
        "version": "1.2.0"
      }
    ],
    "name": "Exodus Privacy",
    "creator": "Exodus Privacy"
  }
}

This has two drawbacks:

  • The first download is a large JSON file
  • The second one reveals the list of installed applications to the server

These two downloads could be combined, reducing the download size while protecting privacy:

A single entry point:

https://reports.exodus-privacy.eu.org/api/search/<SHA256>

Where <SHA256> is the beginning (4 characters, for example) of the SHA256 of the app ID (org.eu.exodus_privacy.exodusprivacy).
The server answers with all applications (with their reports) whose checksum begins with that prefix:

{"applications": [
  {
    "id": 50261,
    "handle": "org.eu.exodus_privacy.exodusprivacy",
    "name": "Exodus Privacy",
    "creator": "Exodus Privacy",
    "downloads": "10,000+ downloads",
    "app_uid": "B6FECF6541A4C151B6FB1AB1D77AD012C95349DF",
    "icon_phash": "93967134568205988083521687304934410531",
    "report_updated_at": 1566315960.731061,
    "reports": [
      {
        "downloads": "10,000+ downloads",
        "creation_date": "2019-07-30T21:23:18.334Z",
        "updated_at": "2019-08-20T10:34:49.189Z",
        "trackers": [],
        "version_code": "7",
        "id": 87886,
        "version": "1.2.0"
      }
    ]
  }
]}
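
A minimal client-side sketch of this lookup, assuming the prefix is taken from the lowercase hex SHA-256 digest of the handle (the proposal does not fix an encoding) and that the response has the shape shown above. The names `hash_prefix`, `search_url` and `pick_installed` are hypothetical, and the prefix endpoint is only the one proposed in this issue, not an existing API.

```python
import hashlib

# Proposed endpoint from this issue; it is an assumption, not an existing API.
BASE_URL = "https://reports.exodus-privacy.eu.org/api/search/"


def hash_prefix(handle: str, length: int = 4) -> str:
    """Return the first `length` hex characters of SHA-256(handle)."""
    return hashlib.sha256(handle.encode("utf-8")).hexdigest()[:length]


def search_url(handle: str, length: int = 4) -> str:
    """Build the URL the client would request; only the prefix leaves the phone."""
    return BASE_URL + hash_prefix(handle, length)


def pick_installed(response: dict, installed: set) -> list:
    """Keep only the entries for apps that are really installed on the phone."""
    return [app for app in response.get("applications", []) if app["handle"] in installed]


# Hypothetical response: one installed app plus one unrelated app from the same bucket.
response = {"applications": [
    {"handle": "org.eu.exodus_privacy.exodusprivacy", "reports": []},
    {"handle": "com.example.other", "reports": []},
]}
print(search_url("org.eu.exodus_privacy.exodusprivacy"))
print(pick_installed(response, {"org.eu.exodus_privacy.exodusprivacy"}))
```

The filtering happens on the phone, so the server only ever sees the prefix and cannot tell which entry of the bucket the client was interested in.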

The application could choose the length of the checksum prefix to balance download size against privacy: a length of 1 strongly protects privacy but downloads 3% of the database, while the full length downloads only one application but reveals it to the server. A length that matches 10-100 applications could be reasonable, and could be configurable by the user.
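
As a rough sketch of how a client could pick that length, the snippet below assumes hashes are spread uniformly and uses a hex digest (16 symbols) with roughly 64,000 applications, a figure mentioned elsewhere in this thread. The function names are hypothetical.

```python
def expected_bucket_size(n_apps: int, alphabet_size: int, length: int) -> float:
    """Average number of apps sharing one prefix, assuming uniformly spread hashes."""
    return n_apps / alphabet_size ** length


def longest_safe_length(n_apps: int, alphabet_size: int, min_bucket: float,
                        max_length: int = 64) -> int:
    """Longest prefix length whose average bucket still holds at least `min_bucket` apps.

    Returns 0 when even a 1-character prefix gives buckets smaller than `min_bucket`.
    """
    length = 0
    while (length < max_length
           and expected_bucket_size(n_apps, alphabet_size, length + 1) >= min_bucket):
        length += 1
    return length


n_apps = 64_000  # assumption: approximate database size
length = longest_safe_length(n_apps, alphabet_size=16, min_bucket=10)
print(length, expected_bucket_size(n_apps, 16, length))  # 3 15.625
```

Note that the share of the database returned for a 1-character prefix depends on the encoding: with hex it is 1/16, while the 3% figure corresponds to an alphabet of roughly 32 to 36 symbols.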

@Schoumi Schoumi transferred this issue from Exodus-Privacy/exodus-android-app Sep 16, 2019
@Schoumi
Contributor

Schoumi commented Sep 16, 2019

Reports are downloaded only for installed applications that come from Google Play and have been analysed by Exodus, so only those may be known to the server. Each time, a random number of extra apps is added to the requests to make some noise for the server.

What you propose is good for privacy, but the processing time will be long. The large amount of data will also be there at some point, if it is not already. Each new report (a new analysis of a new version of an app) adds extra data, and downloading many reports unrelated to the user makes the amount of useless data grow significantly.

Our app is not aimed at technical users who fully understand how to configure their privacy exactly, so in my opinion a hash-length setting for the API is not an option to add to the app. The app should stay very simple for all users.

And in fact, I think that if we really wanted to know which apps you have on your phone, this would not protect you as much as you want. We could guess what you have from the hash plus the probability of the app you may have installed, using the download numbers of these apps and the other apps you may have downloaded. It would not be as accurate as what we could do if we wanted to, and we don't want to know what you have on your phone.
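
To illustrate the guessing argument, here is a small sketch under strong assumptions: exactly one app in the returned bucket is installed, and the chance of an app being installed is proportional to its download count. The handles and counts are made up for the example, and `posterior_by_popularity` is a hypothetical name.

```python
def posterior_by_popularity(bucket_downloads: dict) -> dict:
    """Probability that each app in a bucket is the installed one, weighted by downloads."""
    total = sum(bucket_downloads.values())
    return {handle: count / total for handle, count in bucket_downloads.items()}


# Made-up bucket: one very popular app sharing a hash prefix with two niche apps.
bucket = {
    "com.example.popular": 10_000_000,
    "com.example.niche_a": 10_000,
    "com.example.niche_b": 10_000,
}
for handle, probability in posterior_by_popularity(bucket).items():
    print(f"{handle}: {probability:.4f}")
```

With equal weights each app would get 1/3; with these made-up counts the popular app gets more than 99% of the probability, which is the kind of inference described above.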

@Schoumi Schoumi transferred this issue from Exodus-Privacy/exodus Sep 16, 2019
@tdelmas
Author

tdelmas commented Sep 22, 2019

  • With the current solution, if the application sends, for example, 50% fake requests, the server knows the client has 50% of the requested applications. It knows each requested application has a 50% chance of being on the client.
  • With my proposal, if the client asks for a short hash that matches, for example, 100 applications, the server knows each of those applications has a 1% chance of being on the client.

In both cases, "the probability of the app you may have installed" can be used, so my proposal is still a real gain in terms of privacy.

With my proposal, the client can also send fake hash prefixes, if necessary, to confuse the server even more.

And when I say "the client", I am not talking only about this Android client, but about anybody who wants to build one. It is better to offer the possibility. But I agree that this Android app does not need to expose a "length of the hash" setting; we just need to choose a length that both respects privacy and avoids downloading the entire database!
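
The two figures in the list above can be reproduced with a short sketch. It assumes the server has no other information (a uniform guess) and that exactly one app per requested bucket is installed; the function names and the 20/20 split are hypothetical.

```python
def p_installed_with_decoys(real_requests: int, decoy_requests: int) -> float:
    """Chance that a given requested app is really installed when decoys are mixed in."""
    return real_requests / (real_requests + decoy_requests)


def p_installed_in_bucket(bucket_size: int, installed_in_bucket: int = 1) -> float:
    """Chance that a given app of a returned bucket is really installed."""
    return installed_in_bucket / bucket_size


print(p_installed_with_decoys(real_requests=20, decoy_requests=20))  # 0.5
print(p_installed_in_bucket(bucket_size=100))                        # 0.01
```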

@Gu1nness

This seems like a nice model to me (close to differential privacy, if I remember my lectures well).
With 2 characters (uppercase, lowercase, digits) we already have 3844 combinations.
With the current number of apps, that returns an average of 16.67 apps per combination.
However, with 3 characters that is too many combinations: only 0.269 apps per combination for now.
So 2 characters would be enough.
Or should we use only lowercase characters?

Maybe the easiest way to decide would be to build a table of all the possibilities?
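
Such a table could be generated with a few lines. This is only a sketch: the app count of 64,080 is an assumption back-derived from the figures above (16.67 apps x 3844 combinations), the alphabets listed are candidates rather than a decision, and hashes are assumed to be uniformly spread.

```python
# Candidate alphabets and their sizes (assumptions for illustration).
ALPHABETS = {
    "hex digits (0-9, a-f)": 16,
    "lowercase letters": 26,
    "lowercase + digits": 36,
    "uppercase + lowercase + digits": 62,
}

N_APPS = 64_080  # assumption: back-derived from 16.67 apps x 3844 combinations


def bucket_table(n_apps: int, max_length: int = 3) -> list:
    """Rows of (alphabet, prefix length, combinations, average apps per combination)."""
    rows = []
    for name, size in ALPHABETS.items():
        for length in range(1, max_length + 1):
            combinations = size ** length
            rows.append((name, length, combinations, n_apps / combinations))
    return rows


print(f"{'alphabet':<32} {'length':>6} {'combinations':>12} {'apps/prefix':>12}")
for name, length, combinations, average in bucket_table(N_APPS):
    print(f"{name:<32} {length:>6} {combinations:>12} {average:>12.3f}")
```

A hex-encoded digest only offers the 16-symbol row; the 62-symbol row assumes some other text encoding of the hash.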

@Porkepix
Member

Side question: is there a risk that hashes are not evenly spread, so that some specific prefixes fail to protect privacy (e.g. ab matches 40 applications but xy only one)?

@Gu1nness

That's what I want to check with the current DB, maybe tonight.
It shouldn't take long to compute and get the stats.
I hope that it will be evenly spread, but I cannot be sure: a sample of ~64k strings is not a lot.
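
A sketch of how that check could be run, assuming a hex SHA-256 prefix and a plain list of handles. The handles generated below are synthetic stand-ins (the real list would come from the database), and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def bucket_counts(handles, length: int = 2) -> Counter:
    """Count how many handles fall under each hex SHA-256 prefix of the given length."""
    return Counter(hashlib.sha256(h.encode("utf-8")).hexdigest()[:length] for h in handles)


def spread_summary(handles, length: int = 2) -> dict:
    """Smallest, largest and mean bucket size, counting empty buckets as size 0."""
    counts = bucket_counts(handles, length)
    total_buckets = 16 ** length
    empty = total_buckets - len(counts)
    sizes = list(counts.values()) + [0] * empty
    return {
        "min": min(sizes),
        "max": max(sizes),
        "mean": sum(sizes) / total_buckets,
        "empty": empty,
    }


# Synthetic stand-in for the ~64k real handles.
handles = [f"com.example.app{i}" for i in range(64_000)]
print(spread_summary(handles, length=2))
```

A large gap between "min" and "max", or a non-zero "empty" count, would point to prefixes that protect privacy less well than the average suggests.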

@pnu-s pnu-s added the question Further information is requested label Jan 3, 2022
@ItsIgnacioPortal

ItsIgnacioPortal commented May 13, 2022

Or we could upload the entire Exodus database to multiple file hosts every month, and offer users who want more privacy the option to download it locally. Overloading the servers with unnecessary traffic is unacceptable when a better technical solution is available.

As of writing, Exodus says that it has "261180 reports for 127692 applications". An empty report is 394 characters, so:

394 x 261180 = 102,904,920 bytes = 98.1377792 mebibytes

With a generous estimate that the average full report weighs six times that amount:

98.1377792 x 6 = 588.826675 mebibytes
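
The estimate above can be reproduced with a few lines, assuming one character is one byte and using the same "six times" guess for a full report:

```python
REPORTS = 261_180          # report count quoted above
EMPTY_REPORT_BYTES = 394   # size of an empty report, assuming 1 character = 1 byte
FULL_REPORT_FACTOR = 6     # the "generous estimate" from the comment, not a measurement
MIB = 1024 ** 2

empty_total = REPORTS * EMPTY_REPORT_BYTES
print(empty_total)                                           # 102904920
print(round(empty_total / MIB, 2))                           # 98.14
print(round(empty_total * FULL_REPORT_FACTOR / MIB, 2))      # 588.83
```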

#85 is also a valid option that doesn't require any servers, besides the initial download of the code and network signatures.
