Create a string library that works on encrypted data using TFHE-rs #80
Comments
Could you provide me with the time frame? I will start working on it in two weeks; will that be okay? |
The deadline for submission is 17th December (you can see it on the associated milestone). |
I have a few questions:
this makes sense for me with trim_end, but I'm not sure about trim or trim_start... is it ok to just treat 0-byte as empty anywhere in the string (rather than just at its end / null-termination), so ClientKey's decryption would just filter out these 0s to get a string? Or should it still respect C-style and what to do in that case? Should it use a special value to mark empty places (e.g. 256 outside of u8's range) before 0-termination? |
@aquint-zama one follow-up question depending on the answer to 4. about function outputs:
|
hello @tomtau
|
Thanks @IceTDrinker . And for "replace" / "replacen" functions, should the "to" argument be FheString or a plain string/slice? |
I believe both are doable, with a complication/performance degradation if "to" is also encrypted |
@aquint-zama Some |
Hello @Lcressot |
@IceTDrinker Ok great thanks, so we need to measure and display the computation time for each operation and not just globally |
Absolutely @Lcressot |
Hi, should the encrypted pattern also have padding? I think padding the pattern would add a lot of complexity to some algorithms. Maybe we can implement both padded and unpadded functions. |
This is a good question, in the general case the pattern length can leak some amount of information about what users would be searching for. With that said the performance could be abysmal indeed, potential ideas, we will be discussing on our end at Zama but if you have some inputs on that please feel free to share :
|
Hello, thanks for answering. My current opinion on this:
This is a good question, in the general case the pattern length can leak some amount of information about what users would be searching for.
With that said the performance could be abysmal indeed, potential ideas, we will be discussing on our end at Zama but if you have some inputs on that please feel free to share :
- also have the length of the string as an encrypted value in the FheString struct (we can make it a 32 bits integer), not sure if that helps or not
Not sure the encrypted length would help, because we would have to deal with all possible length values as well. The padding zeros already contain the encrypted length information in my opinion.
- have an enum in the FheString, to know if the encrypted string is padded with 0s at the end or not allowing to match on both the encrypted string and the pattern to use better algorithms if they exist in the various implementations (less error prone for the users as this enum would be set during encryption)
I used a boolean has_padding that is set at encryption. I consider it a trade-off between security and performance for the user to decide.
- have different functions to manage all the cases, I'm not a fan of this option as it would clutter the API with a lot of function variations in addition to being error prone for the users
I think the algorithms could check has_padding for the pattern, and maybe panic if the pattern is padded and they do not support that case, while handling the string the same way (whether it has padding or not), considering it ends with /0 in all cases.
Loïc
|
So after discussion @Lcressot and @tomtau it is acceptable to have a flag indicating if the string is padded with zeros at the end or if it has a single 0, meaning the string length can be deduced by the vector it stores. Having better algorithms for when there is no padding is accepted, submissions which cover the most difficult cases (e.g. padded input and padded pattern) would likely be rated better as the other algorithms should be more straightforward |
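To make the padding flag concrete, here is a minimal clear-text sketch of the encoding it describes. This is not TFHE-rs code: each u8 stands in for one encrypted character, and the names Padding, ClearModelString, encode and decode are hypothetical. Rejecting inputs that already contain a 0 byte is an assumption of this sketch, not something stated in the thread.

```rust
/// Padding status recorded at encryption time (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Padding {
    /// Exactly one trailing 0: the length can be deduced from the vector size.
    TerminatorOnly,
    /// Extra zeros follow the terminator: the true length is hidden.
    Padded,
}

/// Clear-text stand-in for an encrypted string; each byte models one encrypted char.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ClearModelString {
    pub bytes: Vec<u8>,
    pub padding: Padding,
}

/// Validates the input, then appends the 0 terminator plus `extra_zeros` padding zeros.
/// Returns None for non-ASCII input or input containing a 0 byte (assumption of this sketch).
pub fn encode(input: &str, extra_zeros: usize) -> Option<ClearModelString> {
    if !input.is_ascii() || input.bytes().any(|b| b == 0) {
        return None;
    }
    let mut bytes = input.as_bytes().to_vec();
    // One terminator is always present; anything beyond it is padding.
    bytes.resize(input.len() + 1 + extra_zeros, 0);
    let padding = if extra_zeros == 0 {
        Padding::TerminatorOnly
    } else {
        Padding::Padded
    };
    Some(ClearModelString { bytes, padding })
}

/// Reads the content back: everything before the first 0 byte.
pub fn decode(s: &ClearModelString) -> String {
    s.bytes
        .iter()
        .take_while(|&&b| b != 0)
        .map(|&b| b as char)
        .collect()
}
```

With this model, encode("hello", 3) stores 9 bytes and decodes back to "hello", while encode("hello", 0) stores 6 bytes and is flagged TerminatorOnly.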
Instead of duplicating all functions for both types FheString and String (encrypted or clear pattern), can we write generic functions that take FheString<T>, where T can be CipherText or char? This will require main.rs to build both encrypted and clear FheString versions of the pattern string. |
having generic functions would be a nice touch I would say, they can then dispatch to the proper algorithm |
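One possible shape for such a dispatch, sketched in clear text: a single entry point takes a pattern that is either clear or "encrypted" and forwards to the matching algorithm. The Pattern enum and both helper functions are hypothetical, and plain bytes stand in for ciphertexts here; a real implementation would operate on encrypted characters.

```rust
/// Hypothetical pattern type. In this clear-text sketch the "encrypted" variant
/// is modelled by plain bytes that are null-terminated and possibly padded.
pub enum Pattern<'a> {
    Clear(&'a str),
    Encrypted(&'a [u8]),
}

/// Single entry point that dispatches to the proper algorithm.
/// `haystack` models a null-terminated, possibly padded string.
pub fn starts_with(haystack: &[u8], pattern: Pattern<'_>) -> bool {
    match pattern {
        Pattern::Clear(p) => starts_with_clear(haystack, p.as_bytes()),
        Pattern::Encrypted(p) => starts_with_padded(haystack, p),
    }
}

/// The pattern length is public, so only the first `pattern.len()` positions are compared.
fn starts_with_clear(haystack: &[u8], pattern: &[u8]) -> bool {
    haystack.len() >= pattern.len() && haystack[..pattern.len()] == *pattern
}

/// The pattern length is hidden, so every position is examined: a position matches
/// if the pattern byte is 0 (terminator or padding) or equals the haystack byte.
fn starts_with_padded(haystack: &[u8], pattern: &[u8]) -> bool {
    pattern
        .iter()
        .enumerate()
        .all(|(i, &p)| p == 0 || haystack.get(i) == Some(&p))
}
```

For example, starts_with(b"hello\0\0", Pattern::Clear("he")) and starts_with(b"hello\0\0", Pattern::Encrypted(b"he\0\0\0")) both evaluate to true.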
Regarding this note: Is it OK if the |
@matthiasgeihs a flag indicating the padding status is fine yes, or an enum works as well |
|
In my opinion you can return the result along with a boolean telling whether the prefix pattern was found or not |
@MakisChristou No BooleanBlock is just a type we introduced that you can easily convert to Radix, the new type is to convey that some values encode a boolean value (i.e. a 0 or a 1) and should make some API usage less error prone, I think some functions may have had optimizations linked to BooleanBlock but don't take my word for it. |
@MakisChristou Though for some functions if it makes sense to return a Boolean then feel free to return a BooleanBlock ! |
I see, is the conversion one-way or two-way? Like can I convert a |
So you should be able to convert the boolean block to a radix with
for further support please use the FHE.org Discord channel for TFHE-rs :) |
So this has been somewhat discussed before but wanted to make sure I am getting the requirements right. For example in many functions like |
For example we can use a non ascii byte as a placeholder for bytes to be removed by the client. The resulting string will be the valid unpadded result. |
well the special character of option 2 is a |
Not necessarily. Let's say we run repeat on “hello/0” 2 times. Instead of, for example, having “hello/0hello/0” and applying shuffling to get “hellohello/0/0”, we can do “helloxhellox”, where x can be 255u8 or something out of the ASCII range. Then the client can trivially remove the 255u8s, getting “hellohello”. In my code /0 marks the end of the string whereas this special char can mark “remove this byte on the client side”. |
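A clear-text sketch of this placeholder idea, assuming “/0” denotes the 0 byte and using 255 as the marker. The function names are hypothetical, plain bytes stand in for encrypted characters, and the final terminator appended after the copies is an assumption of this sketch.

```rust
/// Marker meaning "remove this byte on the client side" (outside the ASCII range).
const REMOVE_ME: u8 = 255;

/// Models `repeat` on a null-terminated buffer: every 0 inside a copy becomes
/// the marker, and a single terminator is appended after the last copy.
pub fn repeat_with_placeholder(s: &[u8], n: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(s.len() * n + 1);
    for _ in 0..n {
        out.extend(s.iter().map(|&b| if b == 0 { REMOVE_ME } else { b }));
    }
    out.push(0);
    out
}

/// Client-side decoding: stop at the terminator and drop every marker byte.
pub fn client_decode(bytes: &[u8]) -> String {
    bytes
        .iter()
        .take_while(|&&b| b != 0)
        .filter(|&&b| b != REMOVE_ME)
        .map(|&b| b as char)
        .collect()
}
```

Here repeat_with_placeholder(b"hello\0", 2) yields the bytes of "hello", 255, "hello", 255, 0, which client_decode turns into "hellohello".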
can you call another of the API on the string with the special character and have a correct result ? 🤔 feels like a |
And so yeah I was just checking the ASCII managed by rust is the non extended one, so all characters with code < 128, so you have 128 "free" values, though, one could argue you could use a non printable value from the non extended ASCII so that you could easily support the extended ASCII (or a non printable value from the extended ASCII). The problem of managing that special character in other algorithm still remains I believe as you could chain some methods on an FheString and each algorithm should be able to work properly I would say and it has to account for that special character that could be there as other algorithms could have put it there |
I see. Indeed if chaining has to be supported then option 2 is a no go unless I refactor all algorithms to handle the special character |
Sorry if this was not clear, but I think you can see how, the same way you could chain algorithms in rust, you should be able to chain algorithms in FHE |
In rust "".split("") gives a different result than "/0/0/0".split("/0"). |
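The difference can be checked directly with std, assuming “/0” stands for the 0 byte (written "\0" in Rust). The two wrapper functions are only there to keep the sketch self-contained.

```rust
/// Splitting the empty string on the empty pattern.
pub fn split_empty_by_empty() -> Vec<&'static str> {
    "".split("").collect()
}

/// Splitting three 0 bytes on a single 0 byte.
pub fn split_nuls_by_nul() -> Vec<&'static str> {
    "\0\0\0".split("\0").collect()
}
```

The first call returns two empty pieces and the second returns four, so a padded empty string cannot be split naively as if its zeros were content.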
@Lcressot |
which is not that crazy of an assumption as a lot of the world runs on C/C++ which means that |
We cannot set a fork to private :p |
I know 😞 , you could just init a private repo with the tfhe-rs referenced commit here and add your own commits. An even better structure for reviewing would be to keep |
One final question, besides Docstrings on individual functions is there any preferred way to document our solution. I was thinking of a Readme or is that not necessary? |
Readme would be nice (sorry for late answer) |
Will the winners be announced? |
@matthiasgeihs Yes we will share more information about winning solutions of S4 as soon as |
Winners
🥇 1st place: A submission by JoseSK999
🥈 2nd place A submission by Tomtau
🥉 3rd place : A submission by M-Bln
Overview
This TFHE-rs bounty aims to provide a set of APIs working over encrypted strings, reproducing APIs from the Rust std library str type (see documentation at https://doc.rust-lang.org/std/primitive.str.html).
This document will specify the expected structure of the example that you will produce and specify various constraints on the types that will be introduced and the APIs that are expected on the primitives.
How to participate?
1️⃣ Register here.
2️⃣ When ready, submit your code here.
🗓️ Submission deadline: December 17, 2023.
Description
String encoding
The input string encoding is expected to be ASCII.
To avoid leaking the length of the input string, we may want to encrypt zeros after the actual content of the string. To easily mark the end of the string, we will make the FheString null terminated, i.e., it must always end with the value 0u8 encrypted, like C strings are null terminated.
Functions to implement
Your submission should implement at least the following methods for an encrypted string:
- contains with clear / encrypted pattern
- ends_with with clear pattern / encrypted pattern
- eq_ignore_case
- find with clear pattern / encrypted pattern
- is_empty
- len
- repeat with clear / encrypted number of repetitions
- replace with clear pattern / encrypted pattern
- replacen with clear pattern / encrypted pattern
- rfind with clear pattern / encrypted pattern
- rsplit with clear pattern / encrypted pattern
- rsplit_once with clear pattern / encrypted pattern
- rsplitn with clear pattern / encrypted pattern
- rsplit_terminator with clear pattern / encrypted pattern
- split with clear pattern / encrypted pattern
- split_ascii_whitespace
- split_inclusive with clear pattern / encrypted pattern
- split_terminator with clear pattern / encrypted pattern
- splitn with clear pattern / encrypted pattern
- starts_with with clear pattern / encrypted pattern
- strip_prefix with clear pattern / encrypted pattern
- strip_suffix with clear pattern / encrypted pattern
- to_lowercase
- to_uppercase
- trim
- trim_end
- trim_start
- + (concatenation)
- >=, <=, !=, ==
API to use for the bounty
This bounty should make use of the integer API from TFHE-rs; more precisely, we expect you to use RadixCiphertexts with the default parallelized functions available in the crate.
Structure of the example directory
You will put your code in the directory tfhe/src/examples/fhe_strings
This directory will contain the following:
main.rs:
The code for a command line executable that will take an input string to be encrypted and a pattern for the string functions that require one. This executable will encrypt the first string and run all the available APIs on the input string, using the pattern when necessary. The results will be compared to the clear APIs provided by Rust’s std::str, and the outputs will be nicely formatted, with timing information for the FHE version, so the user can see that the results match (if there are errors then the bounty would not be considered valid).
Use clap (the version pinned in TFHE-rs) to build the command line.
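The compare-and-time part of such an executable could look roughly like the sketch below. It uses only std; Report, check_and_time and format_report are hypothetical names, and the closure stands in for "run the FHE operation, then decrypt the result". Command line parsing with clap is left out.

```rust
use std::time::{Duration, Instant};

/// Outcome of one API check (hypothetical type).
pub struct Report {
    pub name: &'static str,
    pub matches: bool,
    pub elapsed: Duration,
}

/// Runs `fhe_op`, measures how long it takes, and compares its result with the
/// value computed by the clear std::str API.
pub fn check_and_time<T: PartialEq>(
    name: &'static str,
    expected: T,
    fhe_op: impl FnOnce() -> T,
) -> Report {
    let start = Instant::now();
    let got = fhe_op();
    let elapsed = start.elapsed();
    Report {
        name,
        matches: got == expected,
        elapsed,
    }
}

/// One aligned output line per API: name, status, elapsed time.
pub fn format_report(r: &Report) -> String {
    let status = if r.matches { "ok" } else { "MISMATCH" };
    format!("{:<24} {:<8} {:?}", r.name, status, r.elapsed)
}
```

A call such as check_and_time("contains", "hello".contains("ell"), || /* FHE contains + decrypt */ true) produces one report per operation, which matches the per-operation timing requested earlier in the thread.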
ciphertext.rs:
This module will contain:
- FheAsciiChar: a wrapper type that will hold a RadixCiphertext from integer, which must be able to store at least 8 bits of data to fit a single ASCII char;
- FheString: a wrapper type around a Vec<FheAsciiChar>. The last FheAsciiChar of the string always encrypts a 0u8. It is possible to have 0u8 earlier than the last char; this allows the user to hide the actual length of the encrypted string. Accessors should be provided to iterate easily on the inner vec, both mutably and immutably. Do not provide a from_blocks primitive, as it would be easy to misuse; a new function should be enough to construct the type at encryption time with a client key. See for example tfhe/src/integer/ciphertext/mod.rs to see how Integer RadixCiphertext are built to give access to their content for use by algorithms.
client_key.rs:
Provide a ClientKey type that can be built from shortint parameters (that are also used by the integer API) or from an IntegerClientKey directly. Realistically, this ClientKey type will wrap the IntegerClientKey and provide primitives to encrypt a Rust str (validating before encryption that the str is a valid ASCII str) and decrypt it into a fresh String. There should be an encryption primitive that allows the user to specify how many “padding” encryptions of zeros should be appended after the 0-terminated string that will be encrypted from the provided input string.
ClientKey must derive serde::Serialize, serde::Deserialize and Clone.
directory server_key:
mod.rs:
Provide a ServerKey that wraps an Integer ServerKey that can be built from a ClientKey from client_key.rs or directly from an Integer ServerKey.
ServerKey must derive serde::Serialize, serde::Deserialize and Clone.
Create files for families of functions that share similar algorithmic needs, e.g., IF IT MAKES SENSE (here we are speculating, this is just an example supposing similar functions will require similar algorithms):
split.rs
Will contain an impl block for the above ServerKey dedicated to split functions like rsplit, rsplit_terminator, split, split_inclusive, split_terminator and all relevant helper functions. If some functions are needed everywhere then those functions can be put in the mod.rs file from the directory.
trim.rs
Will contain an impl block for the above ServerKey dedicated to trim functions like trim, trim_end, trim_start.
Note that the above functions will not literally trim blocks off of the provided FheString but rather will zero out the correct blocks to make the null terminated string the trimmed version as appropriate.
Also, it could make sense to separate functions depending on the type of the pattern (i.e., encrypted or clear) in separate files.
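The "zero out instead of trimming" note can be illustrated for trim_end with a clear-text sketch in which plain bytes model the encrypted characters. The function name is hypothetical, and only trim_end is covered: trim and trim_start would leave zeros at the front of the buffer, which raises the questions discussed in the comments.

```rust
/// Clear-text model of trim_end on a null-terminated, possibly padded buffer.
/// The buffer keeps its length; trailing whitespace is overwritten with zeros.
pub fn trim_end_in_place(bytes: &mut [u8]) {
    // Stays true while only zeros and whitespace have been seen, scanning from the end.
    let mut still_trailing = true;
    for b in bytes.iter_mut().rev() {
        // ASCII subset of the whitespace recognised by str::trim_end: \t, \n, \x0B, \x0C, \r, space.
        let is_whitespace = matches!(*b, b'\t'..=b'\r' | b' ');
        still_trailing = still_trailing && (*b == 0 || is_whitespace);
        if still_trailing {
            *b = 0;
        }
    }
}
```

Applied to the bytes of "hi  " followed by two zeros, the buffer keeps its six bytes and ends up as "hi" followed by four zeros; whitespace inside the content, as in "a b ", is left untouched.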
Other requirements
Each function should have a docstring describing what it does (those can be adapted from the rust std docs) with a doctest demonstrating the usage on simple hard coded cases.
We expect standard algorithms if it makes sense (with links to the papers describing them or publicly available content on the web like a Wikipedia page for example), if some tricks are used proper comments are expected to be provided in the code.
Note
As submissions are evaluated on a 64-core machine, your code should make heavy use of parallelization.
A Makefile target (see Makefile for the template of our targets) for running the tests of the example is expected. A command to run the binary easily from the terminal is also requested.
The code must compile on the latest stable rust. No #[allow(clippy::...)] are authorized unless absolutely necessary but that should never be required.
The code must pass the make pcc (pre commit check) available from our Makefile, if you get errors then the code needs to be fixed to satisfy the pcc checks.
Good luck!
Reward
🥇Best submission: up to €10,000.
To be considered best submission, a solution must be efficient, effective and demonstrate a deep understanding of the core problem. Alongside the technical correctness, it should also be submitted with a clean code, clear explanations and a complete documentation.
🥈Second-best submission: up to €3,500.
For a solution to be considered the second best submission, it should be both efficient and effective. The code should be neat and readable, while its documentation might not be as exhaustive as the best submission, it should cover the key aspects of the solution.
🥉Third-best submission: up to €1,500.
The third best submission is one that presents a solution that effectively tackles the challenge at hand, even if it may have certain areas of improvement in terms of efficiency or depth of understanding. Documentation should be present, covering the essential components of the solution.
Reward amounts are decided based on code quality and speed performance on a m6i.metal AWS server.
Related links and references
Submission
Apply directly to this bounty by opening an application here.
Questions?
Do you have a specific question about this bounty? Join the live conversation on the FHE.org discord server here. You can also send us an email at: [email protected]