Skip to content

Email Spam Classification using BERT on flower : This project demonstrates how to use BERT, a pre-trained deep learning model, for email spam classification using Flower

Notifications You must be signed in to change notification settings

samadeep/federated_learning_BERT

Repository files navigation

Project Overview

  • Title: Email Spam Classification using BERT with Federated Learning (Flower)
  • Description: This project demonstrates how to leverage the power of BERT (Bidirectional Encoder Representations from Transformers), a pre-trained deep learning model, for email spam classification using Flower, a federated learning framework.

Key Concepts

  • BERT: A pre-trained transformer model adept at natural language understanding tasks.
  • Federated Learning: A collaborative machine learning approach where training data remains distributed across devices or servers, enhancing privacy and security.
  • Flower: A framework that simplifies federated learning workflows.

Data Preparation

  1. Data Source: Describe the email spam classification dataset (e.g., publicly available, custom collection).
  2. Preprocessing: Explain how you prepare the data for BERT:
    • Text cleaning (removing noise, punctuation)
    • Tokenization (breaking text into smaller units)
    • Padding/truncating sequences to a uniform length
    • Converting tokens to numerical representations (using BERT's vocabulary)

Model Definition

  1. Model Selection: Specify the chosen pre-trained BERT model variant (e.g., bert-base-uncased).
  2. Fine-Tuning: Explain how you adapt the pre-trained model for email spam classification:
    • Adding a custom final layer (e.g., sigmoid for binary classification)
    • Freezing or fine-tuning specific BERT layers based on complexity and dataset size

Federated Learning with Flower

  1. Client Simulation: Describe how you simulate multiple clients within a single machine using Flower:
    • Distribution of training data to simulated clients (e.g., random or stratified sampling)
    • Communication between clients and the central server (Flower) for model updates

Federated Learning Workflow (Dot Diagram)

digraph G {
    
    // Nodes
    A [label="Clients (Simulated)"];
    B [label="Local Models"];
    C [label="Global Model"];
    A1 [label="Weighted Averaging"];
    A2 [label="Median Aggregation"];
    A3 [label="Robust Aggregation"];
    A4 [label="Differential Privacy"];
    A5 [label="Secure Aggregation"];
    A6 [label="Homomorphic Encryption"];
    A7 [label="Efficient Communication"];
    A8 [label="Asynchronous Updates"];

    // Edges
    A -> B [label="Train on Local Data"];
    B -> C [label="Aggregate Updates"];
    C -> A [label="Update Clients"];

    // Subgraphs
    subgraph cluster_0 {
        label = "Custom Aggregation Strategy";
        A1; A2; A3;
    }

    subgraph cluster_1 {
        label = "Data Privacy and Security";
        A4; A5; A6;
    }

    subgraph cluster_2 {
        label = "Communication and Efficiency";
        A7; A8;
    }

}

Running the Project

Prerequisites : List required libraries (e.g., transformers, flower) and installation instructions. Customization : Explain how to modify the main.py file to adjust training parameters:

  • Number of clients
  • Training epochs per round
  • Number of federated learning rounds
  • Client script filename
  • Dataset size per client
  • Execution: Provide clear instructions on running the project with the modified main.py file:
  • Command to execute
      python3 main.py

Expected output (training progress, evaluation metrics)

About

Email Spam Classification using BERT on flower : This project demonstrates how to use BERT, a pre-trained deep learning model, for email spam classification using Flower

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages