
Privacy-Preserving Graph-Based Machine Learning for Collaborative Anti-Money Laundering using Concrete ML #119

Closed
fabecode opened this issue May 3, 2024 · 2 comments
Labels
📁 Concrete ML library targeted: Concrete ML 📄 Grant application This project is currently being reviewed by the Zama team

fabecode commented May 3, 2024

Zama Grant Program: Application

  • Library targeted: Concrete ML

Overview

This project is part of my Undergraduate Computer Science Final Year Project - "Privacy-Preserving Graph-Based Machine Learning with Fully Homomorphic Encryption for Collaborative Anti-Money Laundering", and will be extended for submission to a conference for publication.

With the increasing digitalization of financial transactions and the rise of cybercrime, combating money laundering has become increasingly complex. Graph-based machine learning techniques have emerged as promising tools for Anti-Money Laundering (AML) detection, capable of capturing intricate relationships within money laundering networks. However, the effectiveness of AML solutions is hindered by the challenge of data silos within financial institutions, limiting collaboration and reducing overall efficacy.

To address these challenges, this research presents a novel privacy-preserving approach for collaborative AML machine learning, facilitating secure data sharing across institutions and borders while preserving data privacy and regulatory compliance. Leveraging Fully Homomorphic Encryption (FHE), computations can be performed on encrypted data without decryption, ensuring sensitive financial data remains confidential.

The research delves into the integration of Fully Homomorphic Encryption over the Torus (TFHE), via Concrete ML, with graph-based machine learning techniques, divided into two pipelines:

  1. Privacy-Preserving Graph Neural Network (GNN) pipeline
  2. Privacy-Preserving Graph-based XGBoost pipeline using Graph Feature Preprocessor
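Both pipelines depend on quantisation, since TFHE circuits in Concrete ML operate on low bit-width integers rather than floats. A minimal pure-Python sketch of uniform affine quantisation (illustrative only; function names are hypothetical and this is not the Concrete ML or Brevitas API):

```python
def quantise(values, n_bits=4):
    # Map floats to integers in [0, 2**n_bits - 1] using a uniform affine scheme.
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (2**n_bits - 1) or 1.0  # avoid div-by-zero for constant input
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo

def dequantise(q, scale, lo):
    # Recover approximate float values from the integer representation.
    return [x * scale + lo for x in q]
```

Smaller `n_bits` shrinks the FHE circuit (faster inference) but increases quantisation error, which is the core accuracy/latency trade-off the experiments below measure.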

Description

Milestones

1. Development of privacy-preserving custom Graph Neural Network pipeline

  • Data preparation and preprocessing
  • Environment configuration to ensure compatibility between PyTorch Geometric (PyG) and Concrete ML
  • Custom quantisation of GNN layers, activation functions, node features and edge features using Brevitas
  • Pruning of the GNN (if necessary) to be compatible with FHE bit-width constraints
  • Conversion of GNN model to FHE equivalent
    • Enhancing existing ONNX node implementations (e.g. refining ONNX operations in Concrete ML such as Expand, Unsqueeze, ConstantOfShape and Reshape)
    • Integration of new ScatterElements ONNX operator in Concrete ML (or develop an alternative workaround)
    • Debugging of other conversion errors (particularly challenging given the novel integration of PyG with Concrete ML)
  • Training and evaluation of compiled GNN model using Concrete ML
  • Conduct experiments varying GNN quantisation parameters
    • Evaluation of the privacy-preserving models' performance and inference time using FHE
    • Evaluation of the floating-point equivalent (non-FHE) models' performance and inference time for comparison
    • Discuss the trade-off between model performance and inference time (FHE vs. clear-time ratio)
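The ScatterElements operator mentioned above is central to message passing in GNNs (routing per-edge updates back to node rows). Its semantics can be sketched in pure Python for the 2-D case (a simplified, illustrative version of the ONNX ScatterElements definition, not the Concrete ML implementation):

```python
def scatter_elements_2d(data, indices, updates, axis=0):
    # Simplified ONNX ScatterElements for 2-D nested lists.
    # Copies `data`, then writes each updates[i][j] to the row (axis=0)
    # or column (axis=1) selected by indices[i][j].
    out = [row[:] for row in data]  # leave the input untouched
    for i in range(len(indices)):
        for j in range(len(indices[i])):
            if axis == 0:
                out[indices[i][j]][j] = updates[i][j]
            else:
                out[i][indices[i][j]] = updates[i][j]
    return out
```

The data-dependent indexing visible here is exactly what makes the operator hard to express in FHE, where control flow cannot branch on encrypted values; hence the milestone's fallback of developing an alternative workaround.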

2. Development of privacy-preserving XGBoost pipeline using Graph Feature Preprocessor

  • Data preparation and preprocessing
  • Training and evaluation of XGBoost with Graph Feature Preprocessor using Concrete ML
  • Conduct XGBoost experiments with incrementally enriched graph features from the Graph Feature Preprocessor (GFP)
    • Evaluation of the privacy-preserving models' performance and inference time using FHE
    • Evaluation of the floating-point equivalent (non-FHE) models' performance and inference time for comparison
    • Discuss the trade-off between model performance and inference time (FHE vs. clear-time ratio)
  • Conduct experiments varying XGBoost hyperparameters such as n_estimators, max_depth and n_bits
    • Evaluation of the privacy-preserving models' performance and inference time using FHE
    • Evaluation of the floating-point equivalent (non-FHE) models' performance and inference time for comparison
    • Discuss the trade-off between model performance and inference time (FHE vs. clear-time ratio)
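The kind of per-transaction enrichment the Graph Feature Preprocessor provides can be illustrated with a pure-Python fan-in/fan-out count over a transaction edge list (illustrative only; the actual preprocessor computes richer patterns such as cycles and scatter-gather motifs):

```python
def fan_features(edges):
    # edges: list of (sender, receiver) transactions.
    # Returns per-account fan-out (number of distinct receivers) and
    # fan-in (number of distinct senders) -- simple graph features of the
    # kind used to enrich tabular inputs before training XGBoost.
    out_nbrs, in_nbrs = {}, {}
    for src, dst in edges:
        out_nbrs.setdefault(src, set()).add(dst)
        in_nbrs.setdefault(dst, set()).add(src)
    fan_out = {acct: len(nbrs) for acct, nbrs in out_nbrs.items()}
    fan_in = {acct: len(nbrs) for acct, nbrs in in_nbrs.items()}
    return fan_out, fan_in
```

Because these features are computed once in the clear before encryption, they add graph context to the XGBoost model without enlarging the FHE circuit the way extra tree depth or `n_bits` would.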

If the above milestones are achieved, tutorials or blog posts on the subject matter can also be developed.

Estimated reward: €10k-20k

Related links and references:

@fabecode fabecode added the 📄 Grant application This project is currently being reviewed by the Zama team label May 3, 2024

zama-bot commented May 3, 2024

Hello fabecode,

Thank you for your grant application! Our team will review it and add comments to this issue. In the meantime:

  1. Join the FHE.org discord server for any questions (pick the Zama library channel you will use).
  2. Ask questions privately: [email protected].

zaccherinij (Collaborator) commented

hey @fabecode,

Thank you very much for your interest in what we do at Zama, and for your grant proposal. For now, we will not be following up on it. But we invite you to keep an eye on this repository, as we will launch new bounties soon, if you're interested in playing with Zama libs.

Cheers,
JZ
