
Do ZetaSQL examples supports JOIN queries? #233

Open
qascade opened this issue Apr 21, 2023 · 4 comments

Comments

@qascade

qascade commented Apr 21, 2023

I wanted to write a ZetaSQL query that joins two tables on a single private column for an ANON_COUNT() query.
For example, given two tables, table1 and table2, that share an email column:

SELECT WITH ANONYMIZATION OPTIONS(epsilon={{epsilon}}, delta={{delta}}, kappa={{kappa}})
ANON_COUNT(email CLAMPED BETWEEN 0 AND 300) AS common_emails
FROM table1 JOIN table2 USING (email)

Is this possible? It would also be great if this could be done with the Go library.

@qascade qascade changed the title Does ZetaSQL allows JOIN queries? Does ZetaSQL CLI Tool allows JOIN queries? Apr 21, 2023
@qascade qascade changed the title Does ZetaSQL CLI Tool allows JOIN queries? Does this library supports JOIN queries? Apr 21, 2023
@dibakch
Collaborator

dibakch commented Apr 24, 2023

In general, ZetaSQL allows you to join tables and apply DP on top of the result. However, our sample binary execute_query accepts only a single table, defined in a CSV file. To join two tables, you can modify the source code in examples/zetasql/execute_query.cc: define the second table in C++ using zetasql::MakeTableFromCsvFile, and mark email as the user ID column by calling the SetAnonymizationInfo method on each table.

ZetaSQL is written in C++ and uses the C++ DP library.

@dibakch dibakch self-assigned this Apr 24, 2023
@dibakch dibakch changed the title Does this library supports JOIN queries? Do ZetaSQL examples supports JOIN queries? Apr 24, 2023
@dibakch
Collaborator

dibakch commented Apr 25, 2023

Let's use this issue to gauge interest in this feature. Supporting join conditions in DP queries might be interesting to try out, since such joins are not straightforward: they need to propagate the column that identifies a user through to the DP aggregation.
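The propagation requirement can be illustrated with a minimal, self-contained C++ sketch. This is not ZetaSQL's implementation; it is a toy under stated assumptions: two in-memory tables keyed by email, email treated as the privacy unit in both tables, an inner join that keeps the email on every joined row, per-user contributions clamped to [lower, upper] (mirroring CLAMPED BETWEEN 0 AND 300 in the query from the original question), and Laplace noise with scale upper / epsilon. Names such as `Row`, `InnerJoinOnEmail`, `ClampedCount`, `SampleLaplace`, and `AnonCount` are hypothetical. Delta and kappa are ignored because there is only one output group and the Laplace mechanism alone gives pure epsilon-DP.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical row type. `email` is the privacy unit (user ID) column.
struct Row {
  std::string email;
  std::string payload;  // other columns, not needed for the count
};

// Inner join on email. Each joined row keeps the email, so the DP
// aggregation downstream can still attribute the row to a user.
std::vector<std::string> InnerJoinOnEmail(const std::vector<Row>& left,
                                          const std::vector<Row>& right) {
  std::map<std::string, int> right_matches;  // email -> matching rows in `right`
  for (const Row& r : right) ++right_matches[r.email];
  std::vector<std::string> joined;  // one user ID per joined row
  for (const Row& l : left) {
    auto it = right_matches.find(l.email);
    if (it == right_matches.end()) continue;  // inner join drops non-matches
    for (int i = 0; i < it->second; ++i) joined.push_back(l.email);
  }
  return joined;
}

// Exact count in which each user's contribution is clamped to [lower, upper].
// Assumes 0 <= lower <= upper, so one user changes the total by at most `upper`.
double ClampedCount(const std::vector<std::string>& user_ids, double lower,
                    double upper) {
  assert(0.0 <= lower && lower <= upper);
  std::map<std::string, double> per_user;
  for (const std::string& id : user_ids) per_user[id] += 1.0;
  double total = 0.0;
  for (const auto& [id, count] : per_user) total += std::clamp(count, lower, upper);
  return total;
}

// Laplace(0, scale) sample via the inverse CDF. Not hardened against
// floating-point attacks; real DP libraries use secure noise generation.
double SampleLaplace(double scale, std::mt19937_64& rng) {
  std::uniform_real_distribution<double> unif(-0.5, 0.5);
  double u = unif(rng);
  while (u == -0.5) u = unif(rng);  // avoid log(0)
  return -scale * std::copysign(1.0, u) * std::log(1.0 - 2.0 * std::abs(u));
}

// Noisy count: sensitivity is `upper`, so the noise scale is upper / epsilon.
double AnonCount(const std::vector<std::string>& user_ids, double lower,
                 double upper, double epsilon, std::mt19937_64& rng) {
  return ClampedCount(user_ids, lower, upper) + SampleLaplace(upper / epsilon, rng);
}
```

For example, if table1 holds emails {a, b, c} and table2 holds {a, a, c}, the join produces three rows (two for a, one for c) and the exact clamped count is 3 before noise. If the join dropped the email column, ClampedCount could no longer group rows by user, and the per-user bound that justifies the noise scale would be lost.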

@dibakch dibakch removed their assignment Apr 26, 2023
@qascade
Author

qascade commented Apr 26, 2023

I am trying to use the DP library to run SQL queries with built-in DP support. Section 4 of the DP SQL paper discusses aggregation with joins and compares it with previously built DP SQL engines. In general, joins, especially inner joins, are among the most sought-after queries. I think we should have an example of a join query and of how the join affects the accuracy of the results.

@qascade
Author

qascade commented Apr 26, 2023

Section 2 of the FLEX paper analyzes in depth which kinds of SQL queries practical differential privacy needs to support, which also backs my claim above.
