Table of Contents
- Papers
- FL on Graph Data and Graph Neural Networks
- FL on Tabular Data
- FL in top-tier journal
- FL in top-tier conference and journal by category
- Framework
- Datasets
- Surveys
- Tutorials and Courses
- Key Conferences/Workshops/Journals
- Update log
- How to contact us
- Acknowledgments
- Citation
categories
- Artificial Intelligence (IJCAI, AAAI, AISTATS)
- Machine Learning (NeurIPS, ICML, ICLR, COLT, UAI)
- Data Mining (KDD, WSDM)
- Security (S&P, CCS, USENIX Security, NDSS)
- Computer Vision (ICCV, CVPR, ECCV, MM)
- Natural Language Processing (ACL, EMNLP, NAACL, COLING)
- Information Retrieval (SIGIR)
- Database (SIGMOD, ICDE, VLDB)
- Network (SIGCOMM, INFOCOM, MOBICOM, NSDI, WWW)
- System (OSDI, SOSP, ISCA, MLSys, TPDS)
keywords
Statistics: 🔥 code is available & stars >= 100 | ⭐ citation >= 50 | 🎓 Top-tier venue
kg.: Knowledge Graph | data.: dataset | surv.: survey
This section draws partly on the DBLP search engine and on the repositories Awesome-Federated-Learning-on-Graph-and-GNN-papers and Awesome-Federated-Machine-Learning.
Title | Affiliation | Venue | Year | TL;DR | Materials |
---|---|---|---|---|---|
FedWalk: Communication Efficient Federated Unsupervised Node Embedding with Differential Privacy | SJTU | KDD 🎓 | 2022 | FedWalk1 | [PUB] [PDF] |
FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Platform for Federated Graph Learning 🔥 | Alibaba | KDD (Best Paper Award) 🎓 | 2022 | FederatedScope-GNN2 | [PDF] [CODE] [PUB] |
Deep Neural Network Fusion via Graph Matching with Applications to Model Ensemble and Federated Learning | SJTU | ICML 🎓 | 2022 | GAMF3 | [PUB.] [CODE] |
Meta-Learning Based Knowledge Extrapolation for Knowledge Graphs in the Federated Setting kg. | ZJU | IJCAI 🎓 | 2022 | MaKEr4 | [PUB] [PDF] [CODE] |
Personalized Federated Learning With a Graph | UTS | IJCAI 🎓 | 2022 | SFL5 | [PUB] [PDF] [CODE] |
Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification | ZJU | IJCAI 🎓 | 2022 | VFGNN6 | [PUB] [PDF] |
SpreadGNN: Decentralized Multi-Task Federated Learning for Graph Neural Networks on Molecular Data | USC | AAAI 🎓 | 2022 | SpreadGNN7 | [PUB] [PDF] [CODE] [EXPLAINER] |
FedGraph: Federated Graph Learning with Intelligent Sampling | UoA | TPDS 🎓 | 2022 | FedGraph8 | [PUB.] [CODE] [EXPLAINER] |
FedGCN: Convergence and Communication Tradeoffs in Federated Training of Graph Convolutional Networks | CMU | CIKM Workshop (Oral) | 2022 | FedGCN9 | [PDF] [CODE] |
FedNI: Federated Graph Learning with Network Inpainting for Population-Based Disease Prediction | UESTC | TMI | 2022 | FedNI10 | [PUB] [PDF] |
FedEgo: Privacy-preserving Personalized Federated Graph Learning with Ego-graphs | SYSU | TOIS | 2022 | FedEgo11 | [PUB.] [CODE] |
A federated graph neural network framework for privacy-preserving personalization | THU | Nature Communications | 2022 | FedPerGNN12 | [PUB] [CODE] [EXPLAINER] |
SemiGraphFL: Semi-supervised Graph Federated Learning for Graph Classification. | PKU | PPSN | 2022 | SemiGraphFL13 | [PUB] |
Efficient Federated Learning on Knowledge Graphs via Privacy-preserving Relation Embedding Aggregation kg. | Lehigh University | ACL Workshop | 2022 | FedR14 | [PDF] [CODE] |
Power Allocation for Wireless Federated Learning using Graph Neural Networks | Rice University | ICASSP | 2022 | wirelessfl-pdgnet15 | [PUB] [PDF] [CODE] |
Privacy-Preserving Federated Multi-Task Linear Regression: A One-Shot Linear Mixing Approach Inspired By Graph Regularization | UC | ICASSP | 2022 | multitask-fusion16 | [PUB] [PDF] [CODE] |
Federated knowledge graph completion via embedding-contrastive learning kg. | ZJU | Knowl. Based Syst. | 2022 | FedEC17 | [PUB] |
Federated Graph Learning with Periodic Neighbour Sampling | HKU | IWQoS | 2022 | PNS-FGL18 | [PUB] |
A Privacy-Preserving Subgraph-Level Federated Graph Neural Network via Differential Privacy | Ping An Technology | KSEM | 2022 | DP-FedRec19 | [PUB] [PDF] |
Graph-Based Traffic Forecasting via Communication-Efficient Federated Learning | SUSTech | WCNC | 2022 | CTFL20 | [PUB] |
Federated meta-learning for spatial-temporal prediction | NEU | Neural Comput. Appl. | 2022 | FML-ST21 | [PUB] [CODE] |
BiG-Fed: Bilevel Optimization Enhanced Graph-Aided Federated Learning | NTU | IEEE Transactions on Big Data | 2022 | BiG-Fed22 | [PUB] [PDF] |
Malicious Transaction Identification in Digital Currency via Federated Graph Deep Learning | BIT | INFOCOM Workshops | 2022 | GraphSniffer23 | [PUB] |
Leveraging Spanning Tree to Detect Colluding Attackers in Federated Learning | Missouri S&T | INFOCOM Workshops | 2022 | FL-ST24 | [PUB] |
Federated learning of molecular properties with graph neural networks in a heterogeneous setting | University of Rochester | Patterns | 2022 | FLIT+25 | [PUB] [PDF] [CODE] |
Multi-Level Federated Graph Learning and Self-Attention Based Personalized Wi-Fi Indoor Fingerprint Localization | SYSU | IEEE Commun. Lett. | 2022 | ML-FGL26 | [PUB] |
Decentralized Graph Federated Multitask Learning for Streaming Data | NTNU | CISS | 2022 | PSO-GFML27 | [PUB.] |
Dynamic Neural Graphs Based Federated Reptile for Semi-Supervised Multi-Tasking in Healthcare Applications | Oxford | JBHI | 2022 | DNG-FR28 | [PUB.] |
FedGCN: Federated Learning-Based Graph Convolutional Networks for Non-Euclidean Spatial Data | NUIST | Mathematics | 2022 | FedGCN-NES29 | [PUB] |
Device Sampling for Heterogeneous Federated Learning: Theory, Algorithms, and Implementation. | Purdue | INFOCOM 🎓 | 2021 | D2D-FedL30 | [PUB] [PDF] |
Federated Graph Classification over Non-IID Graphs | Emory | NeurIPS 🎓 | 2021 | GCFL31 | [PUB.] [PDF] [CODE] [EXPLAINER] |
Subgraph Federated Learning with Missing Neighbor Generation | Emory; UBC; Lehigh University | NeurIPS 🎓 | 2021 | FedSage32 | [PUB.] [PDF] |
Cross-Node Federated Graph Neural Network for Spatio-Temporal Data Modeling | USC | KDD 🎓 | 2021 | CNFGNN33 | [PUB] [PDF] [CODE] [EXPLAINER] |
Differentially Private Federated Knowledge Graphs Embedding kg. | BUAA | CIKM | 2021 | FKGE34 | [PUB] [PDF] [CODE] [EXPLAINER] |
Decentralized Federated Graph Neural Networks | Blue Elephant Tech | IJCAI Workshop | 2021 | D-FedGNN35 | [PDF] |
FedSGC: Federated Simple Graph Convolution for Node Classification | HKUST | IJCAI Workshop | 2021 | FedSGC36 | [PDF] |
FL-DISCO: Federated Generative Adversarial Network for Graph-based Molecule Drug Discovery: Special Session Paper | UNM | ICCAD | 2021 | FL-DISCO37 | [PUB.] |
FASTGNN: A Topological Information Protected Federated Learning Approach for Traffic Speed Forecasting | UTS | IEEE Trans. Ind. Informatics | 2021 | FASTGNN38 | [PUB] |
DAG-FL: Direct Acyclic Graph-based Blockchain Empowers On-Device Federated Learning | BUPT; UESTC | ICC | 2021 | DAG-FL39 | [PUB.] [PDF] |
FedE: Embedding Knowledge Graphs in Federated Setting kg. | ZJU | IJCKG | 2021 | FedE40 | [PUB.] [PDF] [CODE] |
Federated Knowledge Graph Embeddings with Heterogeneous Data kg. | TJU | CCKS | 2021 | FKE41 | [PUB.] |
A Graph Federated Architecture with Privacy Preserving Learning | EPFL | SPAWC | 2021 | GFL42 | [PUB.] [PDF] [EXPLAINER] |
Federated Social Recommendation with Graph Neural Network | UIC | ACM TIST | 2021 | FeSoG43 | [PUB] [PDF] [CODE] |
FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks 🔥 surv. | USC | ICLR Workshop / MLSys Workshop | 2021 | FedGraphNN44 | [PDF] [CODE] [EXPLAINER] |
A Federated Multigraph Integration Approach for Connectional Brain Template Learning | Istanbul Technical University | MICCAI Workshop | 2021 | Fed-CBT45 | [PUB.] [CODE] |
Cluster-driven Graph Federated Learning over Multiple Domains | Politecnico di Torino | CVPR Workshop | 2021 | FedCG-MD46 | [PDF] [EXPLAINER] |
FedGNN: Federated Graph Neural Network for Privacy-Preserving Recommendation | THU | ICML workshop | 2021 | FedGNN47 | [PDF] [EXPLAINER] |
Decentralized federated learning of deep neural networks on non-iid data | RISE; Chalmers University of Technology | ICML workshop | 2021 | DFL-PENS48 | [PDF] [CODE] |
Glint: Decentralized Federated Graph Learning with Traffic Throttling and Flow Scheduling | The University of Aizu | IWQoS | 2021 | Glint49 | [PUB.] |
Federated Graph Neural Network for Cross-graph Node Classification | BUPT | CCIS | 2021 | FGNN50 | [PUB.] |
GraFeHTy: Graph Neural Network using Federated Learning for Human Activity Recognition | Lead Data Scientist Ericsson Digital Services | ICMLA | 2021 | GraFeHTy51 | [PUB.] |
Distributed Training of Graph Convolutional Networks | Sapienza University of Rome | TSIPN | 2021 | D-GCN52 | [PUB] [PDF] [EXPLAINER] |
Decentralized federated learning for electronic health records | UMN | NeurIPS Workshop / CISS | 2020 | FL-DSGD53 | [PUB] [PDF] [EXPLAINER] |
ASFGNN: Automated Separated-Federated Graph Neural Network | Ant Group | PPNA | 2020 | ASFGNN54 | [PUB.] [PDF] [EXPLAINER] |
Decentralized federated learning via sgd over wireless d2d networks | SZU | SPAWC | 2020 | DSGD55 | [PUB] [PDF] |
SGNN: A Graph Neural Network Based Federated Learning Approach by Hiding Structure | SDU | BigData | 2019 | SGNN56 | [PUB] [PDF] |
Towards Federated Graph Learning for Collaborative Financial Crimes Detection | IBM | NeurIPS Workshop | 2019 | FGL-DFC57 | [PDF] |
Federated learning of predictive models from federated Electronic Health Records ⭐ | BU | Int. J. Medical Informatics | 2018 | cPDS58 | [PUB] |
Federated Graph Contrastive Learning | UTS | preprint | 2022 | FGCL59 | [PDF] |
Federated Graph Machine Learning: A Survey of Concepts, Techniques, and Applications surv. | University of Virginia | CIKM Workshop (Oral) | 2022 | FGML60 | [PDF] |
FD-GATDR: A Federated-Decentralized-Learning Graph Attention Network for Doctor Recommendation Using EHR | | preprint | 2022 | FD-GATDR61 | [PDF] |
Privacy-preserving Graph Analytics: Secure Generation and Federated Learning | | preprint | 2022 | | [PDF] |
Personalized Subgraph Federated Learning | | preprint | 2022 | FED-PUB62 | [PDF] |
Federated Graph Attention Network for Rumor Detection | | preprint | 2022 | | [PDF] [CODE] |
FedRel: An Adaptive Federated Relevance Framework for Spatial Temporal Graph Learning | | preprint | 2022 | | [PDF] |
Privatized Graph Federated Learning | | preprint | 2022 | | [PDF] |
Graph-Assisted Communication-Efficient Ensemble Federated Learning | | preprint | 2022 | | [PDF] |
Federated Graph Neural Networks: Overview, Techniques and Challenges surv. | | preprint | 2022 | | [PDF] |
Decentralized event-triggered federated learning with heterogeneous communication thresholds. | | preprint | 2022 | EF-HC63 | [PDF] |
More is Better (Mostly): On the Backdoor Attacks in Federated Graph Neural Networks | | preprint | 2022 | | [PDF] |
Federated Learning with Heterogeneous Architectures using Graph HyperNetworks | | preprint | 2022 | | [PDF] |
STFL: A Temporal-Spatial Federated Learning Framework for Graph Neural Networks | | preprint | 2021 | | [PDF] [CODE] |
Graph-Fraudster: Adversarial Attacks on Graph Neural Network Based Vertical Federated Learning | | preprint | 2021 | | [PDF] [CODE] |
PPSGCN: A Privacy-Preserving Subgraph Sampling Based Distributed GCN Training Method | | preprint | 2021 | PPSGCN64 | [PDF] |
Leveraging a Federation of Knowledge Graphs to Improve Faceted Search in Digital Libraries kg. | | preprint | 2021 | | [PDF] |
Federated Myopic Community Detection with One-shot Communication | | preprint | 2021 | | [PDF] |
Federated Graph Learning -- A Position Paper surv. | | preprint | 2021 | | [PDF] |
A Vertical Federated Learning Framework for Graph Convolutional Network | | preprint | 2021 | FedVGCN65 | [PDF] |
FedGL: Federated Graph Learning Framework with Global Self-Supervision | | preprint | 2021 | FedGL66 | [PDF] |
FL-AGCNS: Federated Learning Framework for Automatic Graph Convolutional Network Search | | preprint | 2021 | FL-AGCNS67 | [PDF] |
Towards On-Device Federated Learning: A Direct Acyclic Graph-based Blockchain Approach | | preprint | 2021 | | [PDF] |
A New Look and Convergence Rate of Federated Multi-Task Learning with Laplacian Regularization | | preprint | 2021 | dFedU68 | [PDF] [CODE] |
GraphFL: A Federated Learning Framework for Semi-Supervised Node Classification on Graphs | | preprint | 2020 | GraphFL69 | [PDF] [EXPLAINER] |
Improving Federated Relational Data Modeling via Basis Alignment and Weight Penalty kg. | | preprint | 2020 | FedAlign-KG70 | [PDF] |
Federated Dynamic GNN with Secure Aggregation | | preprint | 2020 | | [PDF] |
GraphFederator: Federated Visual Analysis for Multi-party Graphs | | preprint | 2020 | | [PDF] |
Privacy-Preserving Graph Neural Network for Node Classification | | preprint | 2020 | | [PDF] |
Peer-to-peer federated learning on graphs | UC | preprint | 2019 | P2P-FLG71 | [PDF] [EXPLAINER] |
- [Arxiv 2021] Privacy-Preserving Graph Convolutional Networks for Text Classification. [PDF]
- [Arxiv 2021] GraphMI: Extracting Private Graph Data from Graph Neural Networks. [PDF]
- [Arxiv 2021] Towards Representation Identical Privacy-Preserving Graph Neural Network via Split Learning. [PDF]
- [Arxiv 2020] Locally Private Graph Neural Networks. [PDF]
This section draws on the DBLP search engine.
Title | Affiliation | Venue | Year | TL;DR | Materials |
---|---|---|---|---|---|
Federated Functional Gradient Boosting | University of Pennsylvania | AISTATS 🎓 | 2022 | FFGB72 | [PUB] [PDF] [CODE] |
Federated Random Forests can improve local performance of predictive models for various healthcare applications | University of Marburg | Bioinform. | 2022 | FRF73 | [PUB] [CODE] |
Federated Forest | JD | TBD | 2022 | FF74 | [PUB] [PDF] |
Fed-GBM: a cost-effective federated gradient boosting tree for non-intrusive load monitoring | The University of Sydney | e-Energy | 2022 | Fed-GBM75 | [PUB] |
BOFRF: A Novel Boosting-Based Federated Random Forest Algorithm on Horizontally Partitioned Data | METU | IEEE Access | 2022 | BOFRF76 | [PUB] |
eFL-Boost: Efficient Federated Learning for Gradient Boosting Decision Trees | kobe-u | IEEE Access | 2022 | eFL-Boost77 | [PUB] |
Random Forest Based on Federated Learning for Intrusion Detection | Malardalen University | AIAI | 2022 | FL-RF78 | [PUB] |
Cross-silo federated learning based decision trees | ETH Zürich | SAC | 2022 | FL-DT79 | [PUB] |
Leveraging Spanning Tree to Detect Colluding Attackers in Federated Learning | Missouri S&T | INFOCOM Workshops | 2022 | FL-ST24 | [PUB] |
VF2Boost: Very Fast Vertical Federated Gradient Boosting for Cross-Enterprise Learning | PKU | SIGMOD 🎓 | 2021 | VF2Boost80 | [PUB] |
SecureBoost: A Lossless Federated Learning Framework 🔥 | UC | IEEE Intell. Syst. | 2021 | SecureBoost81 | [PUB] [PDF] [SLIDES] [CODE] [EXPLAINER] [UC] |
A Blockchain-Based Federated Forest for SDN-Enabled In-Vehicle Network Intrusion Detection System | CNU | IEEE Access | 2021 | BFF-IDS82 | [PUB] |
Research on privacy protection of multi source data based on improved gbdt federated ensemble method with different metrics | NCUT | Phys. Commun. | 2021 | I-GBDT83 | [PUB] |
Fed-EINI: An Efficient and Interpretable Inference Framework for Decision Tree Ensembles in Vertical Federated Learning | UCAS; CAS | IEEE BigData | 2021 | Fed-EINI84 | [PUB] [PDF] |
Gradient Boosting Forest: a Two-Stage Ensemble Method Enabling Federated Learning of GBDTs | THU | ICONIP | 2021 | GBF-Cen85 | [PUB] |
A k-Anonymised Federated Learning Framework with Decision Trees | Umeå University | DPM/CBT @ESORICS | 2021 | KA-FL86 | [PUB] |
AF-DNDF: Asynchronous Federated Learning of Deep Neural Decision Forests | Chalmers | SEAA | 2021 | AF-DNDF87 | [PUB] |
Compression Boosts Differentially Private Federated Learning | Univ. Grenoble Alpes | EuroS&P | 2021 | CB-DP88 | [PUB] [PDF] |
Practical Federated Gradient Boosting Decision Trees | NUS; UWA | AAAI 🎓 | 2020 | SimFL89 | [PUB] [PDF] [CODE] |
Privacy Preserving Vertical Federated Learning for Tree-based Models | NUS | VLDB 🎓 | 2020 | Pivot-DT90 | [PUB] [PDF] [VIDEO] [CODE] |
Boosting Privately: Federated Extreme Gradient Boosting for Mobile Crowdsensing | Xidian University | ICDCS | 2020 | FEDXGB91 | [PUB] [PDF] |
FedCluster: Boosting the Convergence of Federated Learning via Cluster-Cycling | University of Utah | IEEE BigData | 2020 | FedCluster92 | [PUB] [PDF] |
New Approaches to Federated XGBoost Learning for Privacy-Preserving Data Analysis | kobe-u | ICONIP | 2020 | FL-XGBoost93 | [PUB] |
Bandwidth Slicing to Boost Federated Learning Over Passive Optical Networks | Chalmers University of Technology | IEEE Communications Letters | 2020 | FL-PON94 | [PUB] |
DFedForest: Decentralized Federated Forest | UFRJ | Blockchain | 2020 | DFedForest95 | [PUB] |
Straggler Remission for Federated Learning via Decentralized Redundant Cayley Tree | Stevens Institute of Technology | LATINCOM | 2020 | DRC-tree96 | [PUB] |
Federated Soft Gradient Boosting Machine for Streaming Data | Sinovation Ventures AI Institute | Federated Learning | 2020 | Fed-sGBM97 | [PUB] [EXPLAINER] |
Federated Learning of Deep Neural Decision Forests | Fraunhofer-Chalmers Centre | LOD | 2019 | FL-DNDF98 | [PUB] |
Statistical Detection of Adversarial examples in Blockchain-based Federated Forest In-vehicle Network Intrusion Detection Systems | | preprint | 2022 | | [PDF] |
Hercules: Boosting the Performance of Privacy-preserving Federated Learning | | preprint | 2022 | Hercules99 | [PDF] |
FedGBF: An efficient vertical federated learning framework via gradient boosting and bagging | | preprint | 2022 | FedGBF100 | [PDF] |
A Fair and Efficient Hybrid Federated Learning Framework based on XGBoost for Distributed Power Prediction. | THU | preprint | 2022 | HFL-XGBoost101 | [PDF] |
An Efficient and Robust System for Vertically Federated Random Forest | | preprint | 2022 | | [PDF] |
Efficient Batch Homomorphic Encryption for Vertically Federated XGBoost. | BUAA | preprint | 2021 | EBHE-VFXGB102 | [PDF] |
Guess what? You can boost Federated Learning for free | | preprint | 2021 | | [PDF] |
SecureBoost+ : A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning 🔥 | | preprint | 2021 | SecureBoost+103 | [PDF] [CODE] |
Fed-TGAN: Federated Learning Framework for Synthesizing Tabular Data | | preprint | 2021 | Fed-TGAN104 | [PDF] |
FedXGBoost: Privacy-Preserving XGBoost for Federated Learning | TUM | preprint | 2021 | FedXGBoost105 | [PDF] |
An Efficient Learning Framework For Federated XGBoost Using Secret Sharing And Distributed Optimization. | Tongji University | preprint | 2021 | MP-FedXGB106 | [PDF] [CODE] |
A Tree-based Federated Learning Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources | | preprint | 2021 | | [PDF] [CODE] |
Adaptive Histogram-Based Gradient Boosted Trees for Federated Learning | | preprint | 2020 | | [PDF] |
FederBoost: Private Federated Learning for GBDT | ZJU | preprint | 2020 | FederBoost107 | [PDF] |
Privacy Preserving Text Recognition with Gradient-Boosting for Federated Learning | | preprint | 2020 | | [PDF] [CODE] |
Cloud-based Federated Boosting for Mobile Crowdsensing | | preprint | 2020 | | [ARXIV] |
Federated Extra-Trees with Privacy Preserving | | preprint | 2020 | | [PDF] |
Bandwidth Slicing to Boost Federated Learning in Edge Computing | | preprint | 2019 | | [PDF] |
Revocable Federated Learning: A Benchmark of Federated Forest | | preprint | 2019 | | [PDF] |
The Tradeoff Between Privacy and Accuracy in Anomaly Detection Using Federated XGBoost | CUHK | preprint | 2019 | F-XGBoost108 | [PDF] [CODE] |
This list of federated learning papers published in Nature (and its sub-journals), Cell, Science (and Science Advances), and PNAS draws on the WOS (Web of Science) search engine.
Title | Affiliation | Venue | Year | TL;DR | Materials |
---|---|---|---|---|---|
Federated disentangled representation learning for unsupervised brain anomaly detection | TUM | Nat. Mach. Intell. | 2022 | FedDis109 | [PUB] [PDF] [CODE] |
Shifting machine learning for healthcare from development to deployment and from models to data | | Nat. Biomed. Eng. | 2022 | FL-healthy110 | [PUB] |
A federated graph neural network framework for privacy-preserving personalization | THU | Nat. Commun. | 2022 | FedPerGNN12 | [PUB] [CODE] [EXPLAINER] |
Communication-efficient federated learning via knowledge distillation | | Nat. Commun. | 2022 | | [PUB] [PDF] [CODE] |
Lead federated neuromorphic learning for wireless edge artificial intelligence | | Nat. Commun. | 2022 | | [PUB] [CODE] [EXPLAINER] |
Advancing COVID-19 diagnosis with privacy-preserving collaboration in artificial intelligence | | Nat. Mach. Intell. | 2021 | | [PUB] [PDF] [CODE] |
Federated learning for predicting clinical outcomes in patients with COVID-19 | | Nat. Med. | 2021 | | [PUB] [CODE] |
Adversarial interference and its mitigations in privacy-preserving collaborative machine learning | | Nat. Mach. Intell. | 2021 | | [PUB] |
Swarm Learning for decentralized and confidential clinical machine learning ⭐ | | Nature 🎓 | 2021 | | [PUB] [CODE] [SOFTWARE] [EXPLAINER] |
End-to-end privacy preserving deep learning on multi-institutional medical imaging | | Nat. Mach. Intell. | 2021 | | [PUB] [CODE] [EXPLAINER] |
Communication-efficient federated learning | | PNAS | 2021 | | [PUB] [CODE] |
Breaking medical data sharing boundaries by using synthesized radiographs | | Science Advances | 2020 | | [PUB] [CODE] |
Secure, privacy-preserving and federated machine learning in medical imaging ⭐ | | Nat. Mach. Intell. | 2020 | | [PUB] |
This section summarizes federated learning papers accepted at top AI (Artificial Intelligence) conferences and journals, including IJCAI (International Joint Conference on Artificial Intelligence), AAAI (AAAI Conference on Artificial Intelligence), and AISTATS (Artificial Intelligence and Statistics).
Title | Affiliation | Venue | Year | TL;DR | Materials |
---|---|---|---|---|---|
Towards Understanding Biased Client Selection in Federated Learning. | CMU | AISTATS | 2022 | | [PUB] [CODE] |
FLIX: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning | KAUST | AISTATS | 2022 | FLIX111 | [PUB] [PDF] [CODE] |
Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective. | Stanford | AISTATS | 2022 | | [PUB] [PDF] [CODE] |
Federated Reinforcement Learning with Environment Heterogeneity. | PKU | AISTATS | 2022 | | [PUB] [PDF] [CODE] |
Federated Myopic Community Detection with One-shot Communication | Purdue | AISTATS | 2022 | | [PUB] [PDF] |
Asynchronous Upper Confidence Bound Algorithms for Federated Linear Bandits. | University of Virginia | AISTATS | 2022 | | [PUB] [PDF] [CODE] |
Towards Federated Bayesian Network Structure Learning with Continuous Optimization. | CMU | AISTATS | 2022 | | [PUB] [PDF] [CODE] |
Federated Learning with Buffered Asynchronous Aggregation | Meta AI | AISTATS | 2022 | | [PUB] [PDF] [VIDEO] |
Differentially Private Federated Learning on Heterogeneous Data. | Stanford | AISTATS | 2022 | DP-SCAFFOLD112 | [PUB] [PDF] [CODE] |
SparseFed: Mitigating Model Poisoning Attacks in Federated Learning with Sparsification | Princeton | AISTATS | 2022 | SparseFed113 | [PUB] [PDF] [CODE] [VIDEO] |
Basis Matters: Better Communication-Efficient Second Order Methods for Federated Learning | KAUST | AISTATS | 2022 | | [PUB] [PDF] |
Federated Functional Gradient Boosting. | University of Pennsylvania | AISTATS | 2022 | | [PUB] [PDF] [CODE] |
QLSD: Quantised Langevin Stochastic Dynamics for Bayesian Federated Learning. | Criteo AI Lab | AISTATS | 2022 | QLSD114 | [PUB] [PDF] [CODE] [VIDEO] |
Meta-Learning Based Knowledge Extrapolation for Knowledge Graphs in the Federated Setting kg. | ZJU | IJCAI | 2022 | MaKEr4 | [PUB] [PDF] [CODE] |
Personalized Federated Learning With a Graph | UTS | IJCAI | 2022 | SFL5 | [PUB] [PDF] [CODE] |
Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification | ZJU | IJCAI | 2022 | VFGNN6 | [PUB] [PDF] |
Adapt to Adaptation: Learning Personalization for Cross-Silo Federated Learning | | IJCAI | 2022 | | [PUB] [PDF] [CODE] |
Heterogeneous Ensemble Knowledge Transfer for Training Large Models in Federated Learning | | IJCAI | 2022 | Fed-ET115 | [PUB] [PDF] |
Private Semi-Supervised Federated Learning. | | IJCAI | 2022 | | [PUB] |
Continual Federated Learning Based on Knowledge Distillation. | | IJCAI | 2022 | | [PUB] |
Federated Learning on Heterogeneous and Long-Tailed Data via Classifier Re-Training with Federated Features | | IJCAI | 2022 | CReFF116 | [PUB] [PDF] [CODE] |
Federated Multi-Task Attention for Cross-Individual Human Activity Recognition | | IJCAI | 2022 | | [PUB] |
Personalized Federated Learning with Contextualized Generalization. | | IJCAI | 2022 | | [PUB] [PDF] |
Shielding Federated Learning: Robust Aggregation with Adaptive Client Selection. | | IJCAI | 2022 | | [PUB] [PDF] |
FedCG: Leverage Conditional GAN for Protecting Privacy and Maintaining Competitive Performance in Federated Learning | | IJCAI | 2022 | FedCG117 | [PUB] [PDF] [CODE] |
FedDUAP: Federated Learning with Dynamic Update and Adaptive Pruning Using Shared Data on the Server. | | IJCAI | 2022 | FedDUAP118 | [PUB] [PDF] |
Towards Verifiable Federated Learning surv. | | IJCAI | 2022 | | [PUB] [PDF] |
HarmoFL: Harmonizing Local and Global Drifts in Federated Learning on Heterogeneous Medical Images | CUHK; BUAA | AAAI | 2022 | | [PUB] [PDF] [CODE] [EXPLAINER] |
Federated Learning for Face Recognition with Gradient Correction | BUPT | AAAI | 2022 | | [PUB] [PDF] |
SpreadGNN: Decentralized Multi-Task Federated Learning for Graph Neural Networks on Molecular Data | USC | AAAI | 2022 | SpreadGNN7 | [PUB] [PDF] [CODE] [EXPLAINER] |
SmartIdx: Reducing Communication Cost in Federated Learning by Exploiting the CNNs Structures | HIT; PCL | AAAI | 2022 | SmartIdx119 | [PUB] [CODE] |
Bridging between Cognitive Processing Signals and Linguistic Features via a Unified Attentional Network | TJU | AAAI | 2022 | | [PUB] [PDF] |
Seizing Critical Learning Periods in Federated Learning | SUNY-Binghamton University | AAAI | 2022 | FedFIM120 | [PUB] [PDF] |
Coordinating Momenta for Cross-silo Federated Learning | University of Pittsburgh | AAAI | 2022 | | [PUB] [PDF] |
FedProto: Federated Prototype Learning over Heterogeneous Devices | UTS | AAAI | 2022 | FedProto121 | [PUB] [PDF] [CODE] |
FedSoft: Soft Clustered Federated Learning with Proximal Local Updating | CMU | AAAI | 2022 | FedSoft122 | [PUB] [PDF] [CODE] |
Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better | The University of Texas at Austin | AAAI | 2022 | | [PUB] [PDF] [CODE] |
FedFR: Joint Optimization Federated Framework for Generic and Personalized Face Recognition | National Taiwan University | AAAI | 2022 | FedFR123 | [PUB] [PDF] [CODE] |
SplitFed: When Federated Learning Meets Split Learning | CSIRO | AAAI | 2022 | SplitFed124 | [PUB] [PDF] [CODE] |
Efficient Device Scheduling with Multi-Job Federated Learning | Soochow University | AAAI | 2022 | | [PUB] [PDF] |
Implicit Gradient Alignment in Distributed and Federated Learning | IIT Kanpur | AAAI | 2022 | | [PUB] [PDF] |
Federated Nearest Neighbor Classification with a Colony of Fruit-Flies | IBM Research | AAAI | 2022 | FlyNNFL125 | [PUB] [PDF] [CODE] |
Federated Learning with Sparsification-Amplified Privacy and Adaptive Optimization | | IJCAI | 2021 | | [PUB] [PDF] [VIDEO] |
Behavior Mimics Distribution: Combining Individual and Group Behaviors for Federated Learning | | IJCAI | 2021 | | [PUB] [PDF] |
FedSpeech: Federated Text-to-Speech with Continual Learning | | IJCAI | 2021 | FedSpeech126 | [PUB] [PDF] |
Practical One-Shot Federated Learning for Cross-Silo Setting | | IJCAI | 2021 | FedKT127 | [PUB] [PDF] [CODE] |
Federated Model Distillation with Noise-Free Differential Privacy | | IJCAI | 2021 | FEDMD-NFDP128 | [PUB] [PDF] [VIDEO] |
LDP-FL: Practical Private Aggregation in Federated Learning with Local Differential Privacy | | IJCAI | 2021 | LDP-FL129 | [PUB] [PDF] |
Federated Learning with Fair Averaging. 🔥 | | IJCAI | 2021 | FedFV130 | [PUB] [PDF] [CODE] |
H-FL: A Hierarchical Communication-Efficient and Privacy-Protected Architecture for Federated Learning. | | IJCAI | 2021 | H-FL131 | [PUB] [PDF] |
Communication-efficient and Scalable Decentralized Federated Edge Learning. | | IJCAI | 2021 | | [PUB] |
Secure Bilevel Asynchronous Vertical Federated Learning with Backward Updating | Xidian University; JD Tech | AAAI | 2021 | | [PUB] [PDF] [VIDEO] |
FedRec++: Lossless Federated Recommendation with Explicit Feedback | SZU | AAAI | 2021 | FedRec++132 | [PUB] [VIDEO] |
Federated Multi-Armed Bandits | University of Virginia | AAAI | 2021 | | [PUB] [PDF] [CODE] [VIDEO] |
On the Convergence of Communication-Efficient Local SGD for Federated Learning | Temple University; University of Pittsburgh | AAAI | 2021 | | [PUB] [VIDEO] |
FLAME: Differentially Private Federated Learning in the Shuffle Model | Renmin University of China; Kyoto University | AAAI | 2021 | FLAME_D133 | [PUB] [PDF] [VIDEO] [CODE] |
Toward Understanding the Influence of Individual Clients in Federated Learning | SJTU; The University of Texas at Dallas | AAAI | 2021 | | [PUB] [PDF] [VIDEO] |
Provably Secure Federated Learning against Malicious Clients | Duke University | AAAI | 2021 | | [PUB] [PDF] [VIDEO] [SLIDES] |
Personalized Cross-Silo Federated Learning on Non-IID Data | Simon Fraser University; McMaster University | AAAI | 2021 | FedAMP134 | [PUB] [PDF] [VIDEO] [UC.] |
Model-Sharing Games: Analyzing Federated Learning under Voluntary Participation | Cornell University | AAAI | 2021 | | [PUB] [PDF] [CODE] [VIDEO] |
Curse or Redemption? How Data Heterogeneity Affects the Robustness of Federated Learning | University of Nevada; IBM Research | AAAI | 2021 | | [PUB] [PDF] [VIDEO] |
Game of Gradients: Mitigating Irrelevant Clients in Federated Learning | IIT Bombay; IBM Research | AAAI | 2021 | | [PUB] [PDF] [CODE] [VIDEO] [SUPPLEMENTARY] |
Federated Block Coordinate Descent Scheme for Learning Global and Personalized Models | CUHK; Arizona State University | AAAI | 2021 | | [PUB] [PDF] [VIDEO] [CODE] |
Addressing Class Imbalance in Federated Learning | Northwestern University | AAAI | 2021 | | [PUB] [PDF] [VIDEO] [CODE] [EXPLAINER] |
Defending against Backdoors in Federated Learning with Robust Learning Rate | The University of Texas at Dallas | AAAI | 2021 | | [PUB] [PDF] [VIDEO] [CODE] |
Free-rider Attacks on Model Aggregation in Federated Learning | Accenture Labs | AISTATS | 2021 | | [PUB] [PDF] [CODE] [VIDEO] [SUPPLEMENTARY] |
Federated f-differential privacy | University of Pennsylvania | AISTATS | 2021 | | [PUB] [CODE] [VIDEO] [SUPPLEMENTARY] |
Federated learning with compression: Unified analysis and sharp guarantees 🔥 | The Pennsylvania State University; The University of Texas at Austin | AISTATS | 2021 | | [PUB] [PDF] [CODE] [VIDEO] [SUPPLEMENTARY] |
Shuffled Model of Differential Privacy in Federated Learning | UCLA; Google | AISTATS | 2021 | | [PUB] [VIDEO] [SUPPLEMENTARY] |
Convergence and Accuracy Trade-Offs in Federated Learning and Meta-Learning | | AISTATS | 2021 | | [PUB] [PDF] [VIDEO] [SUPPLEMENTARY] |
Federated Multi-armed Bandits with Personalization | University of Virginia; The Pennsylvania State University | AISTATS | 2021 | | [PUB] [PDF] [CODE] [VIDEO] [SUPPLEMENTARY] |
Towards Flexible Device Participation in Federated Learning | CMU; SYSU | AISTATS | 2021 | | [PUB] [PDF] [VIDEO] [SUPPLEMENTARY] |
Federated Meta-Learning for Fraudulent Credit Card Detection | IJCAI | 2020 | [PUB] [VIDEO] | ||
A Multi-player Game for Studying Federated Learning Incentive Schemes | IJCAI | 2020 | FedGame135 | [PUB] [CODE] [解读] | |
Practical Federated Gradient Boosting Decision Trees | NUS; UWA | AAAI | 2020 | SimFL89 | [PUB] [PDF] [CODE] |
Federated Learning for Vision-and-Language Grounding Problems | PKU; Tencent | AAAI | 2020 | [PUB] | |
Federated Latent Dirichlet Allocation: A Local Differential Privacy Based Framework | BUAA | AAAI | 2020 | [PUB] | |
Federated Patient Hashing | Cornell University | AAAI | 2020 | [PUB] | |
Robust Federated Learning via Collaborative Machine Teaching | Symantec Research Labs; KAUST | AAAI | 2020 | [PUB] [PDF] | |
FedVision: An Online Visual Object Detection Platform Powered by Federated Learning | WeBank | AAAI | 2020 | [PUB] [PDF] [CODE] | |
FedPAQ: A Communication-Efficient Federated Learning Method with Periodic Averaging and Quantization | UC Santa Barbara; UT Austin | AISTATS | 2020 | [PUB] [PDF] [VIDEO] [SUPPLEMENTARY] | |
How To Backdoor Federated Learning 🔥 | Cornell Tech | AISTATS | 2020 | [PUB] [PDF] [VIDEO] [CODE] [SUPPLEMENTARY] | |
Federated Heavy Hitters Discovery with Differential Privacy | RPI; Google | AISTATS | 2020 | [PUB] [PDF] [VIDEO] [SUPPLEMENTARY] | |
Multi-Agent Visualization for Explaining Federated Learning | WeBank | IJCAI | 2019 | [PUB] [VIDEO] |
This section summarizes Federated Learning papers accepted at top ML (machine learning) conferences and journals, including NeurIPS (Annual Conference on Neural Information Processing Systems), ICML (International Conference on Machine Learning), ICLR (International Conference on Learning Representations), COLT (Annual Conference on Computational Learning Theory), and UAI (Conference on Uncertainty in Artificial Intelligence).
- NeurIPS 2022, 2021, 2020, 2018, 2017
- ICML 2022, 2021, 2020, 2019
- ICLR 2022, 2021, 2020
- COLT NULL
- UAI 2021
Title | Affiliation | Venue | Year | TL;DR | Materials |
---|---|---|---|---|---|
FedPop: A Bayesian Approach for Personalised Federated Learning | Skoltech | NeurIPS | 2022 | FedPop136 | [PUB] [PDF] |
Fairness in Federated Learning via Core-Stability | NeurIPS | 2022 | CoreFed137 | [PUB] | |
SecureFedYJ: a safe feature Gaussianization protocol for Federated Learning | NeurIPS | 2022 | SecureFedYJ138 | [PUB] | |
FedRolex: Model-Heterogeneous Federated Learning with Rolling Submodel Extraction | NeurIPS | 2022 | FedRolex139 | [PUB] | |
On Sample Optimality in Personalized Collaborative and Federated Learning | NeurIPS | 2022 | [PUB] | ||
DReS-FL: Dropout-Resilient Secure Federated Learning for Non-IID Clients via Secret Data Sharing | HKUST | NeurIPS | 2022 | DReS-FL140 | [PUB] |
FairVFL: A Fair Vertical Federated Learning Framework with Contrastive Adversarial Learning | THU | NeurIPS | 2022 | FairVFL141 | [PUB] |
Variance Reduced ProxSkip: Algorithm, Theory and Application to Federated Learning | NeurIPS | 2022 | VR-ProxSkip142 | [PUB] | |
VF-PS: How to Select Important Participants in Vertical Federated Learning, Efficiently and Securely? | WHU | NeurIPS | 2022 | VF-PS143 | [PUB] |
DENSE: Data-Free One-Shot Federated Learning | NeurIPS | 2022 | DENSE144 | [PUB] | |
CalFAT: Calibrated Federated Adversarial Training with Label Skewness | ZJU | NeurIPS | 2022 | CalFAT145 | [PUB] [PDF] |
SAGDA: Achieving O(ϵ−2) Communication Complexity in Federated Min-Max Learning | NeurIPS | 2022 | SAGDA146 | [PUB] | |
Taming Fat-Tailed (“Heavier-Tailed” with Potentially Infinite Variance) Noise in Federated Learning | NeurIPS | 2022 | FAT-Clipping147 | [PUB] | |
Personalized Federated Learning towards Communication Efficiency, Robustness and Fairness | NeurIPS | 2022 | [PUB] | ||
Federated Submodel Optimization for Hot and Cold Data Features | SJTU | NeurIPS | 2022 | FedSubAvg148 | [PUB] |
BooNTK: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels | NeurIPS | 2022 | BooNTK149 | [PUB] [PDF] | |
Byzantine-tolerant federated Gaussian process regression for streaming data | NeurIPS | 2022 | [PUB] | ||
SoteriaFL: A Unified Framework for Private Federated Learning with Communication Compression | CMU | NeurIPS | 2022 | SoteriaFL150 | [PUB] [PDF] |
Coresets for Vertical Federated Learning: Regularized Linear Regression and K-Means Clustering | NeurIPS | 2022 | [PUB] | ||
Communication Efficient Federated Learning for Generalized Linear Bandits | NeurIPS | 2022 | [PUB] | ||
Recovering Private Text in Federated Learning of Language Models | Princeton | NeurIPS | 2022 | FILM151 | [PUB] [PDF] [CODE] |
Federated Learning from Pre-Trained Models: A Contrastive Learning Approach | NeurIPS | 2022 | FedPCL152 | [PUB] | |
Global Convergence of Federated Learning for Mixed Regression | Northeastern University | NeurIPS | 2022 | [PUB] [PDF] | |
Resource-Adaptive Federated Learning with All-In-One Neural Composition | NeurIPS | 2022 | FLANC153 | [PUB] | |
Self-Aware Personalized Federated Learning | Amazon | NeurIPS | 2022 | Self-FL154 | [PUB] [PDF] |
A Communication-efficient Algorithm with Linear Convergence for Federated Minimax Learning | Northeastern University | NeurIPS | 2022 | FedGDA-GT155 | [PUB] [PDF] |
An Adaptive Kernel Approach to Federated Learning of Heterogeneous Causal Effects | NeurIPS | 2022 | [PUB] | ||
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning | EPFL | NeurIPS | 2022 | [PUB] [PDF] | |
Personalized Online Federated Multi-Kernel Learning | NeurIPS | 2022 | [PUB] | ||
SemiFL: Semi-Supervised Federated Learning for Unlabeled Clients with Alternate Training | NeurIPS | 2022 | SemiFL156 | [PUB] [PDF] | |
A Unified Analysis of Federated Learning with Arbitrary Client Participation | NeurIPS | 2022 | [PUB] [PDF] | ||
Preservation of the Global Knowledge by Not-True Distillation in Federated Learning | NeurIPS | 2022 | FedNTD157 | [PUB] [PDF] | |
FedSR: A Simple and Effective Domain Generalization Method for Federated Learning | NeurIPS | 2022 | FedSR158 | [PUB] | |
Factorized-FL: Personalized Federated Learning with Parameter Factorization & Similarity Matching | NeurIPS | 2022 | Factorized-FL159 | [PUB] [PDF] | |
A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits | UC | NeurIPS | 2022 | FedLinUCB160 | [PUB] [PDF] |
Learning to Attack Federated Learning: A Model-based Reinforcement Learning Attack Framework | Tulane University | NeurIPS | 2022 | [PUB] | |
On Privacy and Personalization in Cross-Silo Federated Learning | CMU | NeurIPS | 2022 | [PUB] [PDF] | |
A Coupled Design of Exploiting Record Similarity for Practical Vertical Federated Learning | NUS | NeurIPS | 2022 | FedSim161 | [PUB] [PDF] |
Fast Composite Optimization and Statistical Recovery in Federated Learning | SJTU | ICML | 2022 | [PUB] [PDF] [CODE] | |
Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning | NYU | ICML | 2022 | PPSGD162 | [PUB] [PDF] [CODE] |
The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning 🔥 | Stanford; Google Research | ICML | 2022 | [PUB] [PDF] [CODE] [SLIDES] | |
The Poisson Binomial Mechanism for Unbiased Federated Learning with Secure Aggregation | Stanford; Google Research | ICML | 2022 | PBM163 | [PUB] [PDF] [CODE] |
DisPFL: Towards Communication-Efficient Personalized Federated Learning via Decentralized Sparse Training | USTC | ICML | 2022 | DisPFL164 | [PUB] [PDF] [CODE] |
FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type Method for Federated Learning | University of Oulu | ICML | 2022 | FedNew165 | [PUB] [PDF] [CODE] |
DAdaQuant: Doubly-adaptive quantization for communication-efficient Federated Learning | University of Cambridge | ICML | 2022 | DAdaQuant166 | [PUB] [PDF] [SLIDES] [CODE] |
Accelerated Federated Learning with Decoupled Adaptive Optimization | Auburn University | ICML | 2022 | [PUB] [PDF] | |
Federated Reinforcement Learning: Linear Speedup Under Markovian Sampling | Georgia Tech | ICML | 2022 | [PUB] [PDF] | |
Multi-Level Branched Regularization for Federated Learning | Seoul National University | ICML | 2022 | FedMLB167 | [PUB] [PDF] [CODE] [PAGE] |
FedScale: Benchmarking Model and System Performance of Federated Learning at Scale 🔥 | University of Michigan | ICML | 2022 | FedScale168 | [PUB] [PDF] [CODE] |
Federated Learning with Positive and Unlabeled Data | XJTU | ICML | 2022 | FedPU169 | [PUB] [PDF] [CODE] |
Deep Neural Network Fusion via Graph Matching with Applications to Model Ensemble and Federated Learning | SJTU | ICML | 2022 | [PUB] [CODE] | |
Orchestra: Unsupervised Federated Learning via Globally Consistent Clustering | University of Michigan | ICML | 2022 | Orchestra170 | [PUB] [PDF] [CODE] |
Disentangled Federated Learning for Tackling Attributes Skew via Invariant Aggregation and Diversity Transferring | USTC | ICML | 2022 | DFL171 | [PUB] [PDF] [CODE] [SLIDES] [EXPLANATION] |
Architecture Agnostic Federated Learning for Neural Networks | The University of Texas at Austin | ICML | 2022 | FedHeNN172 | [PUB] [PDF] [SLIDES] |
Personalized Federated Learning through Local Memorization | Inria | ICML | 2022 | KNN-PER173 | [PUB] [PDF] [CODE] |
Proximal and Federated Random Reshuffling | KAUST | ICML | 2022 | ProxRR174 | [PUB] [PDF] [CODE] |
Federated Learning with Partial Model Personalization | University of Washington | ICML | 2022 | [PUB] [PDF] [CODE] | |
Generalized Federated Learning via Sharpness Aware Minimization | University of South Florida | ICML | 2022 | [PUB] [PDF] | |
FedNL: Making Newton-Type Methods Applicable to Federated Learning | KAUST | ICML | 2022 | FedNL175 | [PUB] [PDF] [VIDEO] [SLIDES] |
Federated Minimax Optimization: Improved Convergence Analyses and Algorithms | CMU | ICML | 2022 | [PUB] [PDF] [SLIDES] | |
Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning | Hong Kong Baptist University | ICML | 2022 | VHL176 | [PUB] [PDF] [CODE] [EXPLANATION] |
FedNest: Federated Bilevel, Minimax, and Compositional Optimization | University of Michigan | ICML | 2022 | FedNest177 | [PUB] [PDF] [CODE] |
EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning | VMware Research | ICML | 2022 | EDEN178 | [PUB] [PDF] [CODE] |
Communication-Efficient Adaptive Federated Learning | Pennsylvania State University | ICML | 2022 | [PUB] [PDF] | |
ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training | CISPA Helmholtz Center for Information Security | ICML | 2022 | ProgFed179 | [PUB] [PDF] [SLIDES] [CODE] |
Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification 🔥 | University of Maryland | ICML | 2022 | breaching180 | [PUB] [PDF] [CODE] |
Anarchic Federated Learning | The Ohio State University | ICML | 2022 | [PUB] [PDF] | |
QSFL: A Two-Level Uplink Communication Optimization Framework for Federated Learning | Nankai University | ICML | 2022 | QSFL181 | [PUB] [CODE] |
Bitwidth Heterogeneous Federated Learning with Progressive Weight Dequantization | KAIST | ICML | 2022 | [PUB] [PDF] | |
Neural Tangent Kernel Empowered Federated Learning | NC State University | ICML | 2022 | [PUB] [PDF] [CODE] | |
Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy | UMN | ICML | 2022 | [PUB] [PDF] | |
Personalized Federated Learning via Variational Bayesian Inference | CAS | ICML | 2022 | [PUB] [PDF] [SLIDES] [UC.] | |
Federated Learning with Label Distribution Skew via Logits Calibration | ZJU | ICML | 2022 | [PUB] | |
Neurotoxin: Durable Backdoors in Federated Learning | Southeast University; Princeton | ICML | 2022 | Neurotoxin182 | [PUB] [PDF] [CODE] |
Resilient and Communication Efficient Learning for Heterogeneous Federated Systems | Michigan State University | ICML | 2022 | [PUB] | |
Bayesian Framework for Gradient Leakage | ETH Zurich | ICLR | 2022 | [PUB] [PDF] [CODE] | |
Federated Learning from only unlabeled data with class-conditional-sharing clients | The University of Tokyo; CUHK | ICLR | 2022 | FedUL183 | [PUB] [CODE] |
FedChain: Chained Algorithms for Near-Optimal Communication Cost in Federated Learning | CMU; University of Illinois at Urbana-Champaign; University of Washington | ICLR | 2022 | FedChain184 | [PUB] [PDF] |
Acceleration of Federated Learning with Alleviated Forgetting in Local Training | THU | ICLR | 2022 | FedReg185 | [PUB] [PDF] [CODE] |
FedPara: Low-rank Hadamard Product for Communication-Efficient Federated Learning | POSTECH | ICLR | 2022 | [PUB] [PDF] [CODE] | |
An Agnostic Approach to Federated Learning with Class Imbalance | University of Pennsylvania | ICLR | 2022 | [PUB] [CODE] | |
Efficient Split-Mix Federated Learning for On-Demand and In-Situ Customization | Michigan State University; The University of Texas at Austin | ICLR | 2022 | [PUB] [PDF] [CODE] | |
Robbing the Fed: Directly Obtaining Private Data in Federated Learning with Modified Models 🔥 | University of Maryland; NYU | ICLR | 2022 | [PUB] [PDF] [CODE] | |
ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity | University of Cambridge; University of Oxford | ICLR | 2022 | [PUB] [PDF] | |
Diverse Client Selection for Federated Learning via Submodular Maximization | Intel; CMU | ICLR | 2022 | [PUB] [CODE] | |
Recycling Model Updates in Federated Learning: Are Gradient Subspaces Low-Rank? | Purdue | ICLR | 2022 | [PUB] [PDF] [CODE] | |
Diurnal or Nocturnal? Federated Learning of Multi-branch Networks from Periodically Shifting Distributions 🔥 | University of Maryland; Google | ICLR | 2022 | [PUB] [CODE] | |
Towards Model Agnostic Federated Learning Using Knowledge Distillation | EPFL | ICLR | 2022 | [PUB] [PDF] [CODE] | |
Divergence-aware Federated Self-Supervised Learning | NTU; SenseTime | ICLR | 2022 | [PUB] [PDF] [CODE] | |
What Do We Mean by Generalization in Federated Learning? 🔥 | Stanford; Google | ICLR | 2022 | [PUB] [PDF] [CODE] | |
FedBABU: Toward Enhanced Representation for Federated Image Classification | KAIST | ICLR | 2022 | [PUB] [PDF] [CODE] | |
Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing | EPFL | ICLR | 2022 | [PUB] [PDF] [CODE] | |
Improving Federated Learning Face Recognition via Privacy-Agnostic Clusters | Aibee | ICLR Spotlight | 2022 | [PUB] [PDF] [PAGE] [EXPLANATION] | |
Hybrid Local SGD for Federated Learning with Heterogeneous Communications | University of Texas; Pennsylvania State University | ICLR | 2022 | [PUB] | |
On Bridging Generic and Personalized Federated Learning for Image Classification | The Ohio State University | ICLR | 2022 | Fed-RoD186 | [PUB] [PDF] [CODE] |
Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond | KAIST; MIT | ICLR | 2022 | [PUB] [PDF] | |
Constrained differentially private federated learning for low-bandwidth devices | UAI | 2021 | [PUB] [PDF] | ||
Federated stochastic gradient Langevin dynamics | UAI | 2021 | [PUB] [PDF] | ||
Federated Learning Based on Dynamic Regularization | BU; ARM | ICLR | 2021 | [PUB] [PDF] [CODE] | |
Achieving Linear Speedup with Partial Worker Participation in Non-IID Federated Learning | The Ohio State University | ICLR | 2021 | [PUB] [PDF] | |
HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients | Duke University | ICLR | 2021 | HeteroFL187 | [PUB] [PDF] [CODE] |
FedMix: Approximation of Mixup under Mean Augmented Federated Learning | KAIST | ICLR | 2021 | FedMix188 | [PUB] [PDF] |
Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms 🔥 | CMU; Google | ICLR | 2021 | [PUB] [PDF] [CODE] | |
Adaptive Federated Optimization 🔥 | ICLR | 2021 | [PUB] [PDF] [CODE] | ||
Personalized Federated Learning with First Order Model Optimization | Stanford; NVIDIA | ICLR | 2021 | FedFomo189 | [PUB] [PDF] [CODE] [UC.] |
FedBN: Federated Learning on Non-IID Features via Local Batch Normalization 🔥 | Princeton | ICLR | 2021 | FedBN190 | [PUB] [PDF] [CODE] |
FedBE: Making Bayesian Model Ensemble Applicable to Federated Learning | The Ohio State University | ICLR | 2021 | FedBE191 | [PUB] [PDF] [CODE] |
Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint Learning | KAIST | ICLR | 2021 | [PUB] [PDF] [CODE] | |
KD3A: Unsupervised Multi-Source Decentralized Domain Adaptation via Knowledge Distillation | ZJU | ICML | 2021 | [PUB] [PDF] [CODE] [EXPLANATION] | |
Gradient Disaggregation: Breaking Privacy in Federated Learning by Reconstructing the User Participant Matrix | Harvard University | ICML | 2021 | [PUB] [PDF] [VIDEO] [CODE] | |
FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Analysis | PKU; Princeton | ICML | 2021 | FL-NTK192 | [PUB] [PDF] [VIDEO] |
Personalized Federated Learning using Hypernetworks 🔥 | Bar-Ilan University; NVIDIA | ICML | 2021 | [PUB] [PDF] [CODE] [PAGE] [VIDEO] [EXPLANATION] | |
Federated Composite Optimization | Stanford; Google | ICML | 2021 | [PUB] [PDF] [CODE] [VIDEO] [SLIDES] | |
Exploiting Shared Representations for Personalized Federated Learning | University of Texas at Austin; University of Pennsylvania | ICML | 2021 | [PUB] [PDF] [CODE] [VIDEO] | |
Data-Free Knowledge Distillation for Heterogeneous Federated Learning 🔥 | Michigan State University | ICML | 2021 | [PUB] [PDF] [CODE] [VIDEO] | |
Federated Continual Learning with Weighted Inter-client Transfer | KAIST | ICML | 2021 | [PUB] [PDF] [CODE] [VIDEO] | |
Federated Deep AUC Maximization for Hetergeneous Data with a Constant Communication Complexity | The University of Iowa | ICML | 2021 | [PUB] [PDF] [CODE] [VIDEO] | |
Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning | The University of Tokyo | ICML | 2021 | [PUB] [PDF] [VIDEO] | |
Federated Learning of User Verification Models Without Sharing Embeddings | Qualcomm | ICML | 2021 | [PUB] [PDF] [VIDEO] | |
Clustered Sampling: Low-Variance and Improved Representativity for Clients Selection in Federated Learning | Accenture | ICML | 2021 | [PUB] [PDF] [CODE] [VIDEO] | |
Ditto: Fair and Robust Federated Learning Through Personalization | CMU; Facebook AI | ICML | 2021 | [PUB] [PDF] [CODE] [VIDEO] | |
Heterogeneity for the Win: One-Shot Federated Clustering | CMU | ICML | 2021 | [PUB] [PDF] [VIDEO] | |
The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation 🔥 | ICML | 2021 | [PUB] [PDF] [CODE] [VIDEO] | ||
Debiasing Model Updates for Improving Personalized Federated Training | BU; Arm | ICML | 2021 | [PUB] [CODE] [VIDEO] | |
One for One, or All for All: Equilibria and Optimality of Collaboration in Federated Learning | Toyota; Berkeley; Cornell University | ICML | 2021 | [PUB] [PDF] [CODE] [VIDEO] | |
CRFL: Certifiably Robust Federated Learning against Backdoor Attacks | UIUC; IBM | ICML | 2021 | [PUB] [PDF] [CODE] [VIDEO] | |
Federated Learning under Arbitrary Communication Patterns | Indiana University; Amazon | ICML | 2021 | [PUB] [VIDEO] | |
Sageflow: Robust Federated Learning against Both Stragglers and Adversaries | KAIST | NeurIPS | 2021 | Sageflow193 | [PUB] |
CAFE: Catastrophic Data Leakage in Vertical Federated Learning | Rensselaer Polytechnic Institute; IBM Research | NeurIPS | 2021 | CAFE194 | [PUB.] [CODE] |
Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee | NUS | NeurIPS | 2021 | [PUB.] [PDF] [CODE] | |
Optimality and Stability in Federated Learning: A Game-theoretic Approach | Cornell University | NeurIPS | 2021 | [PUB.] [PDF] [CODE] | |
QuPeD: Quantized Personalization via Distillation with Applications to Federated Learning | UCLA | NeurIPS | 2021 | QuPeD195 | [PUB.] [PDF] [CODE] [EXPLANATION] |
The Skellam Mechanism for Differentially Private Federated Learning 🔥 | Google Research; CMU | NeurIPS | 2021 | [PUB.] [PDF] [CODE] | |
No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data | NUS; Huawei | NeurIPS | 2021 | [PUB.] [PDF] | |
STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning | UMN | NeurIPS | 2021 | [PUB.] [PDF] | |
Subgraph Federated Learning with Missing Neighbor Generation | Emory; UBC; Lehigh University | NeurIPS | 2021 | FedSage32 | [PUB.] [PDF] [CODE] [EXPLANATION] |
Evaluating Gradient Inversion Attacks and Defenses in Federated Learning 🔥 | Princeton | NeurIPS | 2021 | GradAttack196 | [PUB.] [PDF] [CODE] |
Personalized Federated Learning With Gaussian Processes | Bar-Ilan University | NeurIPS | 2021 | [PUB] [PDF] [CODE] | |
Differentially Private Federated Bayesian Optimization with Distributed Exploration | MIT; NUS | NeurIPS | 2021 | [PUB.] [PDF] [CODE] | |
Parameterized Knowledge Transfer for Personalized Federated Learning | PolyU | NeurIPS | 2021 | [PUB.] [PDF] | |
Federated Reconstruction: Partially Local Federated Learning 🔥 | Google Research | NeurIPS | 2021 | [PUB.] [PDF] [CODE] [UC.] | |
Fast Federated Learning in the Presence of Arbitrary Device Unavailability | THU; Princeton; MIT | NeurIPS | 2021 | [PUB.] [PDF] [CODE] | |
FL-WBC: Enhancing Robustness against Model Poisoning Attacks in Federated Learning from a Client Perspective | Duke University; Accenture Labs | NeurIPS | 2021 | FL-WBC197 | [PUB.] [PDF] [CODE] |
FjORD: Fair and Accurate Federated Learning under heterogeneous targets with Ordered Dropout | KAUST; Samsung AI Center | NeurIPS | 2021 | FjORD198 | [PUB.] [PDF] |
Linear Convergence in Federated Learning: Tackling Client Heterogeneity and Sparse Gradients | University of Pennsylvania | NeurIPS | 2021 | [PUB.] [PDF] [VIDEO] | |
Federated Multi-Task Learning under a Mixture of Distributions | INRIA; Accenture Labs | NeurIPS | 2021 | [PUB.] [PDF] [CODE] | |
Federated Graph Classification over Non-IID Graphs | Emory | NeurIPS | 2021 | GCFL31 | [PUB.] [PDF] [CODE] [EXPLANATION] |
Federated Hyperparameter Tuning: Challenges, Baselines, and Connections to Weight-Sharing | CMU; Hewlett Packard Enterprise | NeurIPS | 2021 | FedEx199 | [PUB.] [PDF] [CODE] |
On Large-Cohort Training for Federated Learning 🔥 | Google; CMU | NeurIPS | 2021 | Large-Cohort200 | [PUB.] [PDF] [CODE] |
DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning | KAUST; Columbia University; University of Central Florida | NeurIPS | 2021 | DeepReduce201 | [PUB.] [PDF] [CODE] |
PartialFed: Cross-Domain Personalized Federated Learning via Partial Initialization | Huawei | NeurIPS | 2021 | PartialFed202 | [PUB.] [VIDEO] |
Federated Split Task-Agnostic Vision Transformer for COVID-19 CXR Diagnosis | KAIST | NeurIPS | 2021 | [PUB.] [PDF] | |
Addressing Algorithmic Disparity and Performance Inconsistency in Federated Learning | THU; Alibaba; Weill Cornell Medicine | NeurIPS | 2021 | FCFL203 | [PUB.] [PDF] [CODE] |
Federated Linear Contextual Bandits | The Pennsylvania State University; Facebook; University of Virginia | NeurIPS | 2021 | [PUB.] [PDF] [CODE] | |
Few-Round Learning for Federated Learning | KAIST | NeurIPS | 2021 | [PUB.] | |
Breaking the centralized barrier for cross-device federated learning | EPFL; Google Research | NeurIPS | 2021 | [PUB.] [CODE] [VIDEO] | |
Federated-EM with heterogeneity mitigation and variance reduction | Ecole Polytechnique; Google Research | NeurIPS | 2021 | Federated-EM204 | [PUB.] [PDF] |
Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning | MIT; Amazon; Google | NeurIPS | 2021 | [PUB] [PAGE] [SLIDES] | |
FedDR – Randomized Douglas-Rachford Splitting Algorithms for Nonconvex Federated Composite Optimization | University of North Carolina at Chapel Hill; IBM Research | NeurIPS | 2021 | FedDR205 | [PUB.] [PDF] [CODE] |
Gradient Inversion with Generative Image Prior | Pohang University of Science and Technology; University of Wisconsin-Madison; University of Washington | NeurIPS | 2021 | [PUB.] [PDF] [CODE] | |
Federated Adversarial Domain Adaptation | BU; Columbia University; Rutgers University | ICLR | 2020 | [PUB] [PDF] [CODE] | |
DBA: Distributed Backdoor Attacks against Federated Learning | ZJU; IBM Research | ICLR | 2020 | [PUB] [CODE] | |
Fair Resource Allocation in Federated Learning 🔥 | CMU; Facebook AI | ICLR | 2020 | fair-flearn206 | [PUB] [PDF] [CODE] |
Federated Learning with Matched Averaging 🔥 | University of Wisconsin-Madison; IBM Research | ICLR | 2020 | FedMA207 | [PUB] [PDF] [CODE] |
Differentially Private Meta-Learning | CMU | ICLR | 2020 | [PUB] [PDF] | |
Generative Models for Effective ML on Private, Decentralized Datasets 🔥 | ICLR | 2020 | [PUB] [PDF] [CODE] | ||
On the Convergence of FedAvg on Non-IID Data 🔥 | PKU | ICLR | 2020 | [PUB] [PDF] [CODE] [EXPLANATION] | |
FedBoost: A Communication-Efficient Algorithm for Federated Learning | ICML | 2020 | FedBoost208 | [PUB] [VIDEO] | |
FetchSGD: Communication-Efficient Federated Learning with Sketching | UC Berkeley; Johns Hopkins University; Amazon | ICML | 2020 | FetchSGD209 | [PUB] [PDF] [VIDEO] [CODE] |
SCAFFOLD: Stochastic Controlled Averaging for Federated Learning | EPFL; Google | ICML | 2020 | SCAFFOLD210 | [PUB] [PDF] [VIDEO] [UC.] [EXPLANATION] |
Federated Learning with Only Positive Labels | ICML | 2020 | [PUB] [PDF] [VIDEO] | ||
From Local SGD to Local Fixed-Point Methods for Federated Learning | Moscow Institute of Physics and Technology; KAUST | ICML | 2020 | [PUB] [PDF] [SLIDES] [VIDEO] | |
Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization | KAUST | ICML | 2020 | [PUB] [PDF] [SLIDES] [VIDEO] | |
Differentially-Private Federated Linear Bandits | MIT | NeurIPS | 2020 | [PUB] [PDF] [CODE] | |
Federated Principal Component Analysis | University of Cambridge; Quine Technologies | NeurIPS | 2020 | [PUB] [PDF] [CODE] | |
FedSplit: an algorithmic framework for fast federated optimization | UC Berkeley | NeurIPS | 2020 | FedSplit211 | [PUB] [PDF] |
Federated Bayesian Optimization via Thompson Sampling | NUS; MIT | NeurIPS | 2020 | fbo212 | [PUB] [PDF] [CODE] |
Lower Bounds and Optimal Algorithms for Personalized Federated Learning | KAUST | NeurIPS | 2020 | [PUB] [PDF] | |
Robust Federated Learning: The Case of Affine Distribution Shifts | UC Santa Barbara; MIT | NeurIPS | 2020 | RobustFL213 | [PUB] [PDF] [CODE] |
An Efficient Framework for Clustered Federated Learning | UC Berkeley; DeepMind | NeurIPS | 2020 | ifca214 | [PUB] [PDF] [CODE] |
Distributionally Robust Federated Averaging 🔥 | Pennsylvania State University | NeurIPS | 2020 | DRFA215 | [PUB] [PDF] [CODE] |
Personalized Federated Learning with Moreau Envelopes 🔥 | The University of Sydney | NeurIPS | 2020 | [PUB] [PDF] [CODE] | |
Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach | MIT; UT Austin | NeurIPS | 2020 | Per-FedAvg216 | [PUB] [PDF] [UC.] |
Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge | USC | NeurIPS | 2020 | FedGKT217 | [PUB] [PDF] [CODE] [EXPLANATION] |
Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization 🔥 | CMU; Princeton | NeurIPS | 2020 | FedNova218 | [PUB] [PDF] [CODE] [UC.] |
Attack of the Tails: Yes, You Really Can Backdoor Federated Learning | University of Wisconsin-Madison | NeurIPS | 2020 | [PUB] [PDF] | |
Federated Accelerated Stochastic Gradient Descent | Stanford | NeurIPS | 2020 | FedAc219 | [PUB] [PDF] [CODE] [VIDEO] |
Inverting Gradients - How easy is it to break privacy in federated learning? 🔥 | University of Siegen | NeurIPS | 2020 | [PUB] [PDF] [CODE] | |
Ensemble Distillation for Robust Model Fusion in Federated Learning | EPFL | NeurIPS | 2020 | FedDF220 | [PUB] [PDF] [CODE] |
Throughput-Optimal Topology Design for Cross-Silo Federated Learning | INRIA | NeurIPS | 2020 | [PUB] [PDF] [CODE] | |
Bayesian Nonparametric Federated Learning of Neural Networks 🔥 | IBM | ICML | 2019 | [PUB] [PDF] [CODE] | |
Analyzing Federated Learning through an Adversarial Lens 🔥 | Princeton; IBM | ICML | 2019 | [PUB] [PDF] [CODE] | |
Agnostic Federated Learning | ICML | 2019 | [PUB] [PDF] | ||
cpSGD: Communication-efficient and differentially-private distributed SGD | Princeton; Google | NeurIPS | 2018 | [PUB] [PDF] | |
Federated Multi-Task Learning 🔥 | Stanford; USC; CMU | NeurIPS | 2017 | [PUB] [PDF] [CODE] |
This section summarizes Federated Learning papers accepted at top DM (data mining) conferences and journals, including KDD (ACM SIGKDD Conference on Knowledge Discovery and Data Mining) and WSDM (Web Search and Data Mining).
- KDD 2022 (Research Track, Applied Data Science Track), 2021, 2020
- WSDM 2022, 2021, 2019
Title | Affiliation | Venue | Year | TL;DR | Materials |
---|---|---|---|---|---|
Collaboration Equilibrium in Federated Learning | THU | KDD | 2022 | CE221 | [PDF] [PUB] [CODE] |
Connected Low-Loss Subspace Learning for a Personalization in Federated Learning | Ulsan National Institute of Science and Technology | KDD | 2022 | SuPerFed222 | [PDF] [PUB] [CODE] |
FedMSplit: Correlation-Adaptive Federated Multi-Task Learning across Multimodal Split Networks | University of Virginia | KDD | 2022 | FedMSplit223 | [PUB] |
Communication-Efficient Robust Federated Learning with Noisy Labels | University of Pittsburgh | KDD | 2022 | Comm-FedBiO224 | [PDF] [PUB] |
FLDetector: Detecting Malicious Clients in Federated Learning via Checking Model-Updates Consistency | USTC | KDD | 2022 | FLDetector225 | [PDF] [PUB] [CODE] |
Practical Lossless Federated Singular Vector Decomposition Over Billion-Scale Data | HKUST | KDD | 2022 | FedSVD226 | [PDF] [PUB] [CODE] |
FedWalk: Communication Efficient Federated Unsupervised Node Embedding with Differential Privacy | SJTU | KDD | 2022 | FedWalk1 | [PDF] [PUB] |
FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Platform for Federated Graph Learning 🔥 | Alibaba | KDD (Best Paper Award) | 2022 | FederatedScope-GNN2 | [PDF] [CODE] [PUB] |
Fed-LTD: Towards Cross-Platform Ride Hailing via Federated Learning to Dispatch | BUAA | KDD | 2022 | Fed-LTD227 | [PDF] [PUB] [EXPLANATION] |
Felicitas: Federated Learning in Distributed Cross Device Collaborative Frameworks | USTC | KDD | 2022 | Felicitas228 | [PDF] [PUB] |
No One Left Behind: Inclusive Federated Learning over Heterogeneous Devices | Renmin University of China | KDD | 2022 | InclusiveFL229 | [PDF] [PUB] |
FedAttack: Effective and Covert Poisoning Attack on Federated Recommendation via Hard Sampling | THU | KDD | 2022 | FedAttack230 | [PDF] [PUB] [CODE] |
PipAttack: Poisoning Federated Recommender Systems for Manipulating Item Promotion | The University of Queensland | WSDM | 2022 | PipAttack231 | [PDF] [PUB] |
Fed2: Feature-Aligned Federated Learning | George Mason University; Microsoft; University of Maryland | KDD | 2021 | Fed2232 | [PDF] [PUB] |
FedRS: Federated Learning with Restricted Softmax for Label Distribution Non-IID Data | Nanjing University | KDD | 2021 | FedRS233 | [CODE] [PUB] |
Federated Adversarial Debiasing for Fair and Transferable Representations | Michigan State University | KDD | 2021 | FADE234 | [PAGE] [CODE] [SLIDES] [PUB] |
Cross-Node Federated Graph Neural Network for Spatio-Temporal Data Modeling | USC | KDD | 2021 | CNFGNN33 | [PUB] [CODE] [EXPLANATION] |
AsySQN: Faster Vertical Federated Learning Algorithms with Better Computation Resource Utilization | Xidian University; JD Tech | KDD | 2021 | AsySQN235 | [PDF] [PUB] |
FLOP: Federated Learning on Medical Datasets using Partial Networks | Duke University | KDD | 2021 | FLOP236 | [PDF] [PUB] [CODE] |
A Practical Federated Learning Framework for Small Number of Stakeholders | ETH Zürich | WSDM | 2021 | Federated-Learning-source237 | [PUB] [CODE] |
Federated Deep Knowledge Tracing | USTC | WSDM | 2021 | FDKT238 | [PUB] [CODE] |
FedFast: Going Beyond Average for Faster Training of Federated Recommender Systems | University College Dublin | KDD | 2020 | FedFast239 | [PUB] [VIDEO] |
Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data | JD Tech | KDD | 2020 | FDSKL240 | [PUB] [PDF] [VIDEO] |
Federated Online Learning to Rank with Evolution Strategies | Facebook AI Research | WSDM | 2019 | FOLtR-ES241 | [PUB] [CODE] |
This section summarizes Federated Learning papers accepted at top security conferences and journals, including S&P (IEEE Symposium on Security and Privacy), CCS (Conference on Computer and Communications Security), USENIX Security (USENIX Security Symposium), and NDSS (Network and Distributed System Security Symposium).
Title | Affiliation | Venue | Year | TL;DR | Materials |
---|---|---|---|---|---|
Private, Efficient, and Accurate: Protecting Models Trained by Multi-party Learning with Differential Privacy | Fudan University | S&P | 2023 | PEA242 | [PDF] |
Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Production Federated Learning | University of Massachusetts | S&P | 2022 | [PUB] [VIDEO] | |
SIMC: ML Inference Secure Against Malicious Clients at Semi-Honest Cost | Microsoft Research | USENIX Security | 2022 | SIMC243 | [PUB] [PDF] [CODE] |
Efficient Differentially Private Secure Aggregation for Federated Learning via Hardness of Learning with Errors | University of Vermont | USENIX Security | 2022 | | [PUB] [SLIDES] |
Label Inference Attacks Against Vertical Federated Learning | ZJU | USENIX Security | 2022 | | [PUB] [SLIDES] [CODE] |
FLAME: Taming Backdoors in Federated Learning | Technical University of Darmstadt | USENIX Security | 2022 | FLAME244 | [PUB] [SLIDES] [PDF] |
Local and Central Differential Privacy for Robustness and Privacy in Federated Learning | University at Buffalo, SUNY | NDSS | 2022 | | [PUB] [PDF] [UC.] |
Interpretable Federated Transformer Log Learning for Cloud Threat Forensics | University of the Incarnate Word | NDSS | 2022 | | [PUB] [UC.] |
FedCRI: Federated Mobile Cyber-Risk Intelligence | Technical University of Darmstadt | NDSS | 2022 | FedCRI245 | [PUB] |
DeepSight: Mitigating Backdoor Attacks in Federated Learning Through Deep Model Inspection | Technical University of Darmstadt | NDSS | 2022 | DeepSight246 | [PUB] [PDF] |
Private Hierarchical Clustering in Federated Networks | NUS | CCS | 2021 | | [PUB] [PDF] |
FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping | Duke University | NDSS | 2021 | | [PUB] [PDF] [CODE] [VIDEO] [SLIDES] |
POSEIDON: Privacy-Preserving Federated Neural Network Learning | EPFL | NDSS | 2021 | | [PUB] [VIDEO] |
Manipulating the Byzantine: Optimizing Model Poisoning Attacks and Defenses for Federated Learning | University of Massachusetts Amherst | NDSS | 2021 | | [PUB] [CODE] [VIDEO] |
Local Model Poisoning Attacks to Byzantine-Robust Federated Learning | The Ohio State University | USENIX Security | 2020 | | [PUB] [PDF] [CODE] [VIDEO] [SLIDES] |
A Reliable and Accountable Privacy-Preserving Federated Learning Framework using the Blockchain | University of Kansas | CCS (Poster) | 2019 | | [PUB] |
IOTFLA: A Secured and Privacy-Preserving Smart Home Architecture Implementing Federated Learning | Université du Québec à Montréal | S&P (Workshop) | 2019 | | [PUB] |
Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning 🔥 | University of Massachusetts Amherst | S&P | 2019 | | [PUB] [VIDEO] [SLIDES] [CODE] |
Practical Secure Aggregation for Privacy Preserving Machine Learning | | CCS | 2017 | | [PUB] [PDF] [EXPLAINER(ZH)] [UC.] [UC] |
In this section, we summarize Federated Learning papers accepted by top CV (computer vision) conferences and journals, including CVPR (Computer Vision and Pattern Recognition), ICCV (IEEE International Conference on Computer Vision), ECCV (European Conference on Computer Vision) and MM (ACM International Conference on Multimedia).
Title | Affiliation | Venue | Year | TL;DR | Materials |
---|---|---|---|---|---|
FedX: Unsupervised Federated Learning with Cross Knowledge Distillation | KAIST | ECCV | 2022 | FedX247 | [PUB.] [CODE] |
Personalizing Federated Medical Image Segmentation via Local Calibration | Xiamen University | ECCV | 2022 | LC-Fed248 | [PUB.] [CODE] |
Improving Generalization in Federated Learning by Seeking Flat Minima | Politecnico di Torino | ECCV | 2022 | FedSAM249 | [PUB.] [CODE] |
ATPFL: Automatic Trajectory Prediction Model Design Under Federated Learning Framework | HIT | CVPR | 2022 | ATPFL250 | [PUB] |
Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning | Stanford | CVPR | 2022 | ViT-FL251 | [PUB] [SUPP] [PDF] [CODE] [VIDEO] |
FedCorr: Multi-Stage Federated Learning for Label Noise Correction | Singapore University of Technology and Design | CVPR | 2022 | FedCorr252 | [PUB] [SUPP] [PDF] [CODE] [VIDEO] |
FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning | Duke University | CVPR | 2022 | FedCor253 | [PUB] [SUPP] [PDF] |
Layer-Wised Model Aggregation for Personalized Federated Learning | PolyU | CVPR | 2022 | pFedLA254 | [PUB] [SUPP] [PDF] |
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning | University of Central Florida | CVPR | 2022 | FedAlign255 | [PUB] [SUPP] [PDF] [CODE] |
Federated Learning With Position-Aware Neurons | Nanjing University | CVPR | 2022 | PANs256 | [PUB] [SUPP] [PDF] |
RSCFed: Random Sampling Consensus Federated Semi-Supervised Learning | HKUST | CVPR | 2022 | RSCFed257 | [PUB] [SUPP] [PDF] [CODE] |
Learn From Others and Be Yourself in Heterogeneous Federated Learning | Wuhan University | CVPR | 2022 | FCCL258 | [PUB] [CODE] [VIDEO] |
Robust Federated Learning With Noisy and Heterogeneous Clients | Wuhan University | CVPR | 2022 | RHFL259 | [PUB] [SUPP] [CODE] |
ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning | Arizona State University | CVPR | 2022 | ResSFL260 | [PUB] [SUPP] [PDF] [CODE] |
FedDC: Federated Learning With Non-IID Data via Local Drift Decoupling and Correction | National University of Defense Technology | CVPR | 2022 | FedDC261 | [PUB] [PDF] [CODE] [EXPLAINER(ZH)] |
Federated Class-Incremental Learning | CAS; Northwestern University; UTS | CVPR | 2022 | GLFC262 | [PUB] [PDF] [CODE] |
Fine-Tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning | PKU; JD Explore Academy; The University of Sydney | CVPR | 2022 | FedFTG263 | [PUB] [PDF] |
Differentially Private Federated Learning With Local Regularization and Sparsification | CAS | CVPR | 2022 | DP-FedAvg+BLUR+LUS264 | [PUB] [PDF] |
Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage | University of Tennessee; Oak Ridge National Laboratory; Google Research | CVPR | 2022 | GGL265 | [PUB] [PDF] [CODE] [VIDEO] |
CD2-pFed: Cyclic Distillation-Guided Channel Decoupling for Model Personalization in Federated Learning | SJTU | CVPR | 2022 | CD2-pFed266 | [PUB] [PDF] |
Closing the Generalization Gap of Cross-Silo Federated Medical Image Segmentation | Univ. of Pittsburgh; NVIDIA | CVPR | 2022 | FedSM267 | [PUB] [PDF] |
Multi-Institutional Collaborations for Improving Deep Learning-Based Magnetic Resonance Image Reconstruction Using Federated Learning | Johns Hopkins University | CVPR | 2021 | FL-MRCM268 | [PUB] [PDF] [CODE] |
Model-Contrastive Federated Learning 🔥 | NUS; UC Berkeley | CVPR | 2021 | MOON269 | [PUB] [PDF] [CODE] [EXPLAINER(ZH)] |
FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space 🔥 | CUHK | CVPR | 2021 | FedDG-ELCFS270 | [PUB] [PDF] [CODE] |
Soteria: Provable Defense Against Privacy Leakage in Federated Learning From Representation Perspective | Duke University | CVPR | 2021 | Soteria271 | [PUB] [PDF] [CODE] |
Federated Learning for Non-IID Data via Unified Feature Learning and Optimization Objective Alignment | PKU | ICCV | 2021 | FedUFO272 | [PUB] |
Ensemble Attention Distillation for Privacy-Preserving Federated Learning | University at Buffalo | ICCV | 2021 | FedAD273 | [PUB] [PDF] |
Collaborative Unsupervised Visual Representation Learning from Decentralized Data | NTU; SenseTime | ICCV | 2021 | FedU274 | [PUB] [PDF] |
Joint Optimization in Edge-Cloud Continuum for Federated Unsupervised Person Re-identification | NTU | MM | 2021 | FedUReID275 | [PUB] [PDF] |
Federated Visual Classification with Real-World Data Distribution | MIT; Google | ECCV | 2020 | FedVC+FedIR276 | [PUB] [PDF] [VIDEO] |
InvisibleFL: Federated Learning over Non-Informative Intermediate Updates against Multimedia Privacy Leakages | | MM | 2020 | InvisibleFL277 | [PUB] |
Performance Optimization of Federated Person Re-identification via Benchmark Analysis data. | NTU | MM | 2020 | FedReID278 | [PUB] [PDF] [CODE] [EXPLAINER(ZH)] |
In this section, we summarize Federated Learning papers accepted by top AI and NLP conferences and journals, including ACL (Annual Meeting of the Association for Computational Linguistics), NAACL (North American Chapter of the Association for Computational Linguistics), EMNLP (Conference on Empirical Methods in Natural Language Processing) and COLING (International Conference on Computational Linguistics).
Title | Affiliation | Venue | Year | TL;DR | Materials |
---|---|---|---|---|---|
Scaling Language Model Size in Cross-Device Federated Learning | | ACL workshop | 2022 | SLM-FL279 | [PUB] [PDF] |
Intrinsic Gradient Compression for Scalable and Efficient Federated Learning | Oxford | ACL workshop | 2022 | IGC-FL280 | [PUB] [PDF] |
ActPerFL: Active Personalized Federated Learning | Amazon | ACL workshop | 2022 | ActPerFL281 | [PUB] [PAGE] |
FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks 🔥 | USC | NAACL | 2022 | FedNLP282 | [PUB] [PDF] [CODE] |
Federated Learning with Noisy User Feedback | USC; Amazon | NAACL | 2022 | FedNoisy283 | [PUB] [PDF] |
Training Mixed-Domain Translation Models via Federated Learning | Amazon | NAACL | 2022 | FedMDT284 | [PUB] [PAGE] [PDF] |
Pretrained Models for Multilingual Federated Learning | Johns Hopkins University | NAACL | 2022 | | [PUB] [PDF] [CODE] |
Federated Chinese Word Segmentation with Global Character Associations | University of Washington | ACL workshop | 2021 | | [PUB] [CODE] |
Efficient-FedRec: Efficient Federated Learning Framework for Privacy-Preserving News Recommendation | USTC | EMNLP | 2021 | Efficient-FedRec285 | [PUB] [PDF] [CODE] [VIDEO] |
Improving Federated Learning for Aspect-based Sentiment Analysis via Topic Memories | CUHK (Shenzhen) | EMNLP | 2021 | | [PUB] [CODE] [VIDEO] |
A Secure and Efficient Federated Learning Framework for NLP | University of Connecticut | EMNLP | 2021 | | [PUB] [PDF] [VIDEO] |
Distantly Supervised Relation Extraction in Federated Settings | UCAS | EMNLP workshop | 2021 | | [PUB] [PDF] [CODE] |
Federated Learning with Noisy User Feedback | USC; Amazon | NAACL workshop | 2021 | | [PUB] [PDF] |
An Investigation towards Differentially Private Sequence Tagging in a Federated Framework | Universität Hamburg | NAACL workshop | 2021 | | [PUB] |
Understanding Unintended Memorization in Language Models Under Federated Learning | | NAACL workshop | 2021 | | [PUB] [PDF] |
FedED: Federated Learning via Ensemble Distillation for Medical Relation Extraction | CAS | EMNLP | 2020 | | [PUB] [VIDEO] [EXPLAINER(ZH)] |
Empirical Studies of Institutional Federated Learning For Natural Language Processing | Ping An Technology | EMNLP workshop | 2020 | | [PUB] |
Federated Learning for Spoken Language Understanding | PKU | COLING | 2020 | | [PUB] |
Two-stage Federated Phenotyping and Patient Representation Learning | Boston Children’s Hospital Harvard Medical School | ACL workshop | 2019 | | [PUB] [PDF] [CODE] [UC.] |
In this section, we summarize Federated Learning papers accepted by top Information Retrieval conferences and journals, including SIGIR (Annual International ACM SIGIR Conference on Research and Development in Information Retrieval).
Title | Affiliation | Venue | Year | TL;DR | Materials |
---|---|---|---|---|---|
Is Non-IID Data a Threat in Federated Online Learning to Rank? | The University of Queensland | SIGIR | 2022 | noniid-foltr286 | [PUB] [CODE] |
FedCT: Federated Collaborative Transfer for Recommendation | Rutgers University | SIGIR | 2021 | FedCT287 | [PUB] [PDF] [CODE] |
On the Privacy of Federated Pipelines | Technical University of Munich | SIGIR | 2021 | FedGWAS288 | [PUB] |
FedCMR: Federated Cross-Modal Retrieval. | Dalian University of Technology | SIGIR | 2021 | FedCMR289 | [PUB] [CODE] |
Meta Matrix Factorization for Federated Rating Predictions. | SDU | SIGIR | 2020 | MetaMF290 | [PUB] [PDF] |
In this section, we summarize Federated Learning papers accepted by top Database conferences and journals, including SIGMOD (ACM SIGMOD Conference), ICDE (IEEE International Conference on Data Engineering) and VLDB (Very Large Data Bases Conference).
Title | Affiliation | Venue | Year | TL;DR | Materials |
---|---|---|---|---|---|
Skellam Mixture Mechanism: a Novel Approach to Federated Learning with Differential Privacy. | NUS | VLDB | 2022 | SMM291 | [PUB] [CODE] |
Towards Communication-efficient Vertical Federated Learning Training via Cache-enabled Local Update | PKU | VLDB | 2022 | CELU-VFL292 | [PUB] [PDF] [CODE] |
FedTSC: A Secure Federated Learning System for Interpretable Time Series Classification. | HIT | VLDB | 2022 | FedTSC293 | [PUB] [CODE] |
Improving Fairness for Data Valuation in Horizontal Federated Learning | The UBC | ICDE | 2022 | CSFV294 | [PUB] [PDF] |
FedADMM: A Robust Federated Deep Learning Framework with Adaptivity to System Heterogeneity | USTC | ICDE | 2022 | FedADMM295 | [PUB] [PDF] [CODE] |
FedMP: Federated Learning through Adaptive Model Pruning in Heterogeneous Edge Computing. | USTC | ICDE | 2022 | FedMP296 | [PUB] |
Federated Learning on Non-IID Data Silos: An Experimental Study. 🔥 | NUS | ICDE | 2022 | ESND297 | [PUB] [PDF] [CODE] |
Enhancing Federated Learning with Intelligent Model Migration in Heterogeneous Edge Computing | USTC | ICDE | 2022 | FedMigr298 | [PUB] |
Samba: A System for Secure Federated Multi-Armed Bandits | Univ. Clermont Auvergne | ICDE | 2022 | Samba299 | [PUB] [CODE] |
FedRecAttack: Model Poisoning Attack to Federated Recommendation | ZJU | ICDE | 2022 | FedRecAttack300 | [PUB] [PDF] [CODE] |
Enhancing Federated Learning with In-Cloud Unlabeled Data | USTC | ICDE | 2022 | Ada-FedSemi301 | [PUB] |
Efficient Participant Contribution Evaluation for Horizontal and Vertical Federated Learning | USTC | ICDE | 2022 | DIG-FL302 | [PUB] |
An Introduction to Federated Computation | University of Warwick; Facebook | SIGMOD Tutorial | 2022 | FCT303 | [PUB] |
BlindFL: Vertical Federated Machine Learning without Peeking into Your Data | PKU; Tencent | SIGMOD | 2022 | BlindFL304 | [PUB] [PDF] |
An Efficient Approach for Cross-Silo Federated Learning to Rank | BUAA | ICDE | 2021 | CS-F-LTR305 | [PUB] [RELATED PAPER(ZH)] |
Feature Inference Attack on Model Predictions in Vertical Federated Learning | NUS | ICDE | 2021 | FIA306 | [PUB] [PDF] [CODE] |
Efficient Federated-Learning Model Debugging | USTC | ICDE | 2021 | FLDebugger307 | [PUB] |
Federated Matrix Factorization with Privacy Guarantee | Purdue | VLDB | 2021 | FMFPG308 | [PUB] |
Projected Federated Averaging with Heterogeneous Differential Privacy. | Renmin University of China | VLDB | 2021 | PFA-DB309 | [PUB] [CODE] |
Enabling SQL-based Training Data Debugging for Federated Learning | Simon Fraser University | VLDB | 2021 | FedRain-and-Frog310 | [PUB] [PDF] [CODE] |
Refiner: A Reliable Incentive-Driven Federated Learning System Powered by Blockchain | ZJU | VLDB | 2021 | Refiner311 | [PUB] |
Tanium Reveal: A Federated Search Engine for Querying Unstructured File Data on Large Enterprise Networks | Tanium Inc. | VLDB | 2021 | TaniumReveal312 | [PUB] [VIDEO] |
VF2Boost: Very Fast Vertical Federated Gradient Boosting for Cross-Enterprise Learning | PKU | SIGMOD | 2021 | VF2Boost80 | [PUB] |
ExDRa: Exploratory Data Science on Federated Raw Data | SIEMENS | SIGMOD | 2021 | ExDRa313 | [PUB] |
Joint blockchain and federated learning-based offloading in harsh edge computing environments | TJU | SIGMOD workshop | 2021 | FLoffloading314 | [PUB] |
Privacy Preserving Vertical Federated Learning for Tree-based Models | NUS | VLDB | 2020 | Pivot-DT90 | [PUB] [PDF] [VIDEO] [CODE] |
In this section, we summarize Federated Learning papers accepted by top Network conferences and journals, including SIGCOMM (Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication), INFOCOM (IEEE Conference on Computer Communications), MobiCom (ACM/IEEE International Conference on Mobile Computing and Networking), NSDI (Symposium on Networked Systems Design and Implementation) and WWW (The Web Conference).
Title | Affiliation | Venue | Year | TL;DR | Materials |
---|---|---|---|---|---|
Joint Superposition Coding and Training for Federated Learning over Multi-Width Neural Networks | Korea University | INFOCOM | 2022 | SlimFL315 | [PUB] |
Towards Optimal Multi-Modal Federated Learning on Non-IID Data with Hierarchical Gradient Blending | University of Toronto | INFOCOM | 2022 | HGBFL316 | [PUB] |
Optimal Rate Adaption in Federated Learning with Compressed Communications | SZU | INFOCOM | 2022 | ORAFL317 | [PUB] [PDF] |
The Right to be Forgotten in Federated Learning: An Efficient Realization with Rapid Retraining. | CityU | INFOCOM | 2022 | RFFL318 | [PUB] [PDF] |
Tackling System and Statistical Heterogeneity for Federated Learning with Adaptive Client Sampling. | CUHK; AIRS; Yale University | INFOCOM | 2022 | FLACS319 | [PUB] [PDF] |
Communication-Efficient Device Scheduling for Federated Learning Using Stochastic Optimization | Army Research Laboratory, Adelphi | INFOCOM | 2022 | CEDSFL320 | [PUB] [PDF] |
FLASH: Federated Learning for Automated Selection of High-band mmWave Sectors | NEU | INFOCOM | 2022 | FLASH321 | [PUB] [CODE] |
A Profit-Maximizing Model Marketplace with Differentially Private Federated Learning | CUHK; AIRS | INFOCOM | 2022 | PMDPFL322 | [PUB] |
Protect Privacy from Gradient Leakage Attack in Federated Learning | PolyU | INFOCOM | 2022 | PPGLFL323 | [PUB] [SLIDES] |
FedFPM: A Unified Federated Analytics Framework for Collaborative Frequent Pattern Mining. | SJTU | INFOCOM | 2022 | FedFPM324 | [PUB] [CODE] |
An Accuracy-Lossless Perturbation Method for Defending Privacy Attacks in Federated Learning | SWJTU; THU | WWW | 2022 | PBPFL325 | [PUB] [PDF] [CODE] |
LocFedMix-SL: Localize, Federate, and Mix for Improved Scalability, Convergence, and Latency in Split Learning | Yonsei University | WWW | 2022 | LocFedMix-SL326 | [PUB] |
Federated Unlearning via Class-Discriminative Pruning | PolyU | WWW | 2022 | | [PUB] [PDF] [CODE] |
FedKC: Federated Knowledge Composition for Multilingual Natural Language Understanding | Purdue | WWW | 2022 | FedKC327 | [PUB] |
Federated Bandit: A Gossiping Approach | University of California | SIGMETRICS | 2021 | Federated-Bandit328 | [PUB] [PDF] |
Hermes: an efficient federated learning framework for heterogeneous mobile clients | Duke University | MobiCom | 2021 | Hermes329 | [PUB] |
Federated mobile sensing for activity recognition | Samsung AI Center | MobiCom | 2021 | | [PUB] [PAGE] [TALKS] [VIDEO] |
Learning for Learning: Predictive Online Control of Federated Learning with Edge Provisioning. | Nanjing University | INFOCOM | 2021 | | [PUB] |
Device Sampling for Heterogeneous Federated Learning: Theory, Algorithms, and Implementation. | Purdue | INFOCOM | 2021 | D2D-FedL30 | [PUB] [PDF] |
FAIR: Quality-Aware Federated Learning with Precise User Incentive and Model Aggregation | THU | INFOCOM | 2021 | FAIR330 | [PUB] |
Sample-level Data Selection for Federated Learning | USTC | INFOCOM | 2021 | | [PUB] |
To Talk or to Work: Flexible Communication Compression for Energy Efficient Federated Learning over Heterogeneous Mobile Edge Devices | Xidian University; CAS | INFOCOM | 2021 | | [PUB] [PDF] |
Cost-Effective Federated Learning Design | CUHK; AIRS; Yale University | INFOCOM | 2021 | | [PUB] [PDF] |
An Incentive Mechanism for Cross-Silo Federated Learning: A Public Goods Perspective | The UBC | INFOCOM | 2021 | | [PUB] |
Resource-Efficient Federated Learning with Hierarchical Aggregation in Edge Computing | USTC | INFOCOM | 2021 | | [PUB] |
FedServing: A Federated Prediction Serving Framework Based on Incentive Mechanism. | Jinan University; CityU | INFOCOM | 2021 | FedServing331 | [PUB] [PDF] |
Federated Learning over Wireless Networks: A Band-limited Coordinated Descent Approach | Arizona State University | INFOCOM | 2021 | | [PUB] [PDF] |
Dual Attention-Based Federated Learning for Wireless Traffic Prediction | King Abdullah University of Science and Technology | INFOCOM | 2021 | FedDA332 | [PUB] [PDF] [CODE] |
FedSens: A Federated Learning Approach for Smart Health Sensing with Class Imbalance in Resource Constrained Edge Computing | University of Notre Dame | INFOCOM | 2021 | FedSens333 | [PUB] |
P-FedAvg: Parallelizing Federated Learning with Theoretical Guarantees | SYSU; Guangdong Key Laboratory of Big Data Analysis and Processing | INFOCOM | 2021 | P-FedAvg334 | [PUB] |
Meta-HAR: Federated Representation Learning for Human Activity Recognition. | University of Alberta | WWW | 2021 | Meta-HAR335 | [PUB] [PDF] [CODE] |
PFA: Privacy-preserving Federated Adaptation for Effective Model Personalization | PKU | WWW | 2021 | PFA336 | [PUB] [PDF] [CODE] |
Communication Efficient Federated Generalized Tensor Factorization for Collaborative Health Data Analytics | Emory | WWW | 2021 | FedGTF-EF-PC337 | [PUB] [CODE] |
Hierarchical Personalized Federated Learning for User Modeling | USTC | WWW | 2021 | | [PUB] |
Characterizing Impacts of Heterogeneity in Federated Learning upon Large-Scale Smartphone Data | PKU | WWW | 2021 | Heter-aware338 | [PUB] [PDF] [SLIDES] [CODE] |
Incentive Mechanism for Horizontal Federated Learning Based on Reputation and Reverse Auction | SYSU | WWW | 2021 | | [PUB] |
Physical-Layer Arithmetic for Federated Learning in Uplink MU-MIMO Enabled Wireless Networks. | Nanjing University | INFOCOM | 2020 | | [PUB] |
Optimizing Federated Learning on Non-IID Data with Reinforcement Learning 🔥 | University of Toronto | INFOCOM | 2020 | | [PUB] [SLIDES] [CODE] [EXPLAINER(ZH)] |
Enabling Execution Assurance of Federated Learning at Untrusted Participants | THU | INFOCOM | 2020 | | [PUB] [CODE] |
Billion-scale federated learning on mobile clients: a submodel design with tunable privacy | SJTU | MobiCom | 2020 | | [PUB] |
Federated Learning over Wireless Networks: Optimization Model Design and Analysis | The University of Sydney | INFOCOM | 2019 | | [PUB] [CODE] |
Beyond Inferring Class Representatives: User-Level Privacy Leakage From Federated Learning | Wuhan University | INFOCOM | 2019 | | [PUB] [PDF] [UC.] |
InPrivate Digging: Enabling Tree-based Distributed Data Mining with Differential Privacy | Collaborative Innovation Center of Geospatial Technology | INFOCOM | 2018 | TFL339 | [PUB] |
In this section, we summarize Federated Learning papers accepted by top System conferences and journals, including OSDI (USENIX Symposium on Operating Systems Design and Implementation), SOSP (Symposium on Operating Systems Principles), ISCA (International Symposium on Computer Architecture), MLSys (Conference on Machine Learning and Systems) and TPDS (IEEE Transactions on Parallel and Distributed Systems).
Title | Affiliation | Venue | Year | TL;DR | Materials |
---|---|---|---|---|---|
FedGraph: Federated Graph Learning with Intelligent Sampling | UoA | TPDS | 2022 | FedGraph8 | [PUB.] [CODE] [EXPLAINER(ZH)] |
AUCTION: Automated and Quality-Aware Client Selection Framework for Efficient Federated Learning. | THU | TPDS | 2022 | AUCTION340 | [PUB] |
DONE: Distributed Approximate Newton-type Method for Federated Edge Learning. | University of Sydney | TPDS | 2022 | DONE341 | [PUB] [PDF] [CODE] |
Flexible Clustered Federated Learning for Client-Level Data Distribution Shift. | CQU | TPDS | 2022 | FlexCFL342 | [PUB] [PDF] [CODE] |
Min-Max Cost Optimization for Efficient Hierarchical Federated Learning in Wireless Edge Networks. | Xidian University | TPDS | 2022 | | [PUB] |
LightFed: An Efficient and Secure Federated Edge Learning System on Model Splitting. | CSU | TPDS | 2022 | LightFed343 | [PUB] |
On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Federated Learning. | Purdue | TPDS | 2022 | Deli-CoCo344 | [PUB] [PDF] [CODE] |
Incentive-Aware Autonomous Client Participation in Federated Learning. | Sun Yat-sen University | TPDS | 2022 | | [PUB] |
Communicational and Computational Efficient Federated Domain Adaptation. | HKUST | TPDS | 2022 | | [PUB] |
Decentralized Edge Intelligence: A Dynamic Resource Allocation Framework for Hierarchical Federated Learning. | NTU | TPDS | 2022 | | [PUB] |
Differentially Private Byzantine-Robust Federated Learning. | Qufu Normal University | TPDS | 2022 | DPBFL345 | [PUB] |
Multi-Task Federated Learning for Personalised Deep Neural Networks in Edge Computing. | University of Exeter | TPDS | 2022 | | [PUB] [PDF] [CODE] |
Reputation-Aware Hedonic Coalition Formation for Efficient Serverless Hierarchical Federated Learning. | BUAA | TPDS | 2022 | SHFL346 | [PUB] |
Differentially Private Federated Temporal Difference Learning. | Stony Brook University | TPDS | 2022 | | [PUB] |
Towards Efficient and Stable K-Asynchronous Federated Learning With Unbounded Stale Gradients on Non-IID Data. | XJTU | TPDS | 2022 | WKAFL347 | [PUB] [PDF] |
Communication-Efficient Federated Learning With Compensated Overlap-FedAvg. | SCU | TPDS | 2022 | Overlap-FedAvg348 | [PUB] [PDF] [CODE] |
PAPAYA: Practical, Private, and Scalable Federated Learning. | Meta AI | MLSys | 2022 | PAPAYA349 | [PDF] [PUB] |
LightSecAgg: a Lightweight and Versatile Design for Secure Aggregation in Federated Learning | USC | MLSys | 2022 | LightSecAgg350 | [PDF] [PUB] [CODE] |
Oort: Efficient Federated Learning via Guided Participant Selection | University of Michigan | OSDI | 2021 | Oort351 | [PUB] [PDF] [CODE] [SLIDES] [VIDEO] |
Towards Efficient Scheduling of Federated Mobile Devices Under Computational and Statistical Heterogeneity. | Old Dominion University | TPDS | 2021 | | [PUB] [PDF] |
Self-Balancing Federated Learning With Global Imbalanced Data in Mobile Systems. | CQU | TPDS | 2021 | Astraea352 | [PUB] [CODE] |
An Efficiency-Boosting Client Selection Scheme for Federated Learning With Fairness Guarantee | SCUT | TPDS | 2021 | RBCS-F353 | [PUB] [PDF] [EXPLAINER(ZH)] |
Proof of Federated Learning: A Novel Energy-Recycling Consensus Algorithm. | Beijing Normal University | TPDS | 2021 | PoFL354 | [PUB] [PDF] |
Biscotti: A Blockchain System for Private and Secure Federated Learning. | UBC | TPDS | 2021 | Biscotti355 | [PUB] |
Mutual Information Driven Federated Learning. | Deakin University | TPDS | 2021 | | [PUB] |
Accelerating Federated Learning Over Reliability-Agnostic Clients in Mobile Edge Computing Systems. | University of Warwick | TPDS | 2021 | | [PUB] [PDF] |
FedSCR: Structure-Based Communication Reduction for Federated Learning. | HKU | TPDS | 2021 | FedSCR356 | [PUB] |
FedScale: Benchmarking Model and System Performance of Federated Learning 🔥 | University of Michigan | SOSP workshop / ICML 2022 | 2021 | FedScale168 | [PUB] [PDF] [CODE] [EXPLAINER(ZH)] |
Redundancy in cost functions for Byzantine fault-tolerant federated learning | | SOSP workshop | 2021 | | [PUB] |
Towards an Efficient System for Differentially-private, Cross-device Federated Learning | | SOSP workshop | 2021 | | [PUB] |
GradSec: a TEE-based Scheme Against Federated Learning Inference Attacks | | SOSP workshop | 2021 | | [PUB] |
Community-Structured Decentralized Learning for Resilient EI. | | SOSP workshop | 2021 | | [PUB] |
Separation of Powers in Federated Learning (Poster Paper) | IBM Research | SOSP workshop | 2021 | TRUDA357 | [PUB] [PDF] |
Accelerating Federated Learning via Momentum Gradient Descent. | USTC | TPDS | 2020 | MFL358 | [PUB] [PDF] |
Towards Fair and Privacy-Preserving Federated Deep Models. | NUS | TPDS | 2020 | FPPDL359 | [PUB] [PDF] [CODE] |
Federated Optimization in Heterogeneous Networks 🔥 | CMU | MLSys | 2020 | FedProx360 | [PUB] [PDF] [CODE] |
Towards Federated Learning at Scale: System Design | | MLSys | 2019 | System_Design361 | [PUB] [PDF] [EXPLAINER(ZH)] |
Note: SG means Support for Graph data and algorithms, ST means Support for Tabular data and algorithms.
- UniFed leaderboard
Here's a really great benchmark for open-source federated learning frameworks 👍: the UniFed leaderboard, which presents both qualitative and quantitative evaluation results of existing popular open-source FL frameworks from the perspectives of functionality, usability, and system performance.
For more results, please refer to Framework Functionality Support
This section partially refers to the repositories Federated-Learning and FederatedAI research. Surveys are listed in reverse chronological order of first submission (the latest at the top).
- [CIKM Workshop 2022] Federated Graph Machine Learning: A Survey of Concepts, Techniques, and Applications PDF
- [ACM Trans. Interact. Intell. Syst.] Toward Responsible AI: An Overview of Federated Learning for User-centered Privacy-preserving Computing [PUB]
- [ICML Workshop 2020] SECure: A Social and Environmental Certificate for AI Systems PDF
- [IEEE Commun. Mag. 2020] From Federated Learning to Fog Learning: Towards Large-Scale Distributed Machine Learning in Heterogeneous Wireless Networks PDF [PUB]
- [China Communications 2020] Federated Learning for 6G Communications: Challenges, Methods, and Future Directions PDF [PUB.]
- [Federated Learning Systems] A Review of Privacy Preserving Federated Learning for Private IoT Analytics PDF [PUB]
- [WorldS4 2020] Survey of Personalization Techniques for Federated Learning PDF [PUB]
- Towards Utilizing Unlabeled Data in Federated Learning: A Survey and Prospective PDF
- [IEEE Internet Things J. 2022] A Survey on Federated Learning for Resource-Constrained IoT Devices PDF [PUB]
- [IEEE Communications Surveys & Tutorials 2020] Communication-Efficient Edge AI: Algorithms and Systems PDF [PUB]
- [IEEE Communications Surveys & Tutorials 2020] Federated Learning in Mobile Edge Networks: A Comprehensive Survey PDF [PUB]
- [IEEE Signal Process. Mag. 2020] Federated Learning: Challenges, Methods, and Future Directions PDF [PUB]
- [IEEE Commun. Mag. 2020] Federated Learning for Wireless Communications: Motivation, Opportunities and Challenges PDF [PUB]
- [IEEE TKDE 2021] A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection PDF [PUB]
- [IJCAI Workshop 2020] Threats to Federated Learning: A Survey PDF
- [Foundations and Trends in Machine Learning 2021] Advances and Open Problems in Federated Learning PDF [PUB]
- Privacy-Preserving Blockchain Based Federated Learning with Differential Data Sharing PDF
- An Introduction to Communication Efficient Edge Machine Learning PDF
- [IEEE Communications Surveys & Tutorials 2020] Convergence of Edge Computing and Deep Learning: A Comprehensive Survey PDF [PUB]
- [IEEE TIST 2019] Federated Machine Learning: Concept and Applications PDF [PUB]
- [J. Heal. Informatics Res. 2021] Federated Learning for Healthcare Informatics PDF [PUB]
- Federated Learning for Coalition Operations PDF
- No Peek: A Survey of private distributed deep learning PDF
- [NeurIPS 2020] Federated Learning Tutorial [Web] [Slides] [Video]
- Federated Learning on MNIST using a CNN, AI6101, 2020 (Demo Video)
- [AAAI 2019] Federated Learning: User Privacy, Data Security and Confidentiality in Machine Learning
-
  - A Tutorial for Encrypted Deep Learning
  - Use Homomorphic Encryption (HE)
- Private Image Analysis with MPC
  - Training CNNs on Sensitive Data
  - Use SPDZ as MPC protocol
- Private Deep Learning with MPC
  - A Simple Tutorial from Scratch
  - Use Multiparty Computation (MPC)
This section partially refers to The Federated Learning Portal.
-
[AI Technology School 2022] Trustable, Verifiable and Auditable Artificial Intelligence, Singapore
-
[FL-NeurIPS'22] International Workshop on Federated Learning: Recent Advances and New Challenges in Conjunction with NeurIPS 2022 , New Orleans, LA, USA
-
[FL-IJCAI'22] International Workshop on Trustworthy Federated Learning in Conjunction with IJCAI 2022, Vienna, Austria
-
[FL-AAAI-22] International Workshop on Trustable, Verifiable and Auditable Federated Learning in Conjunction with AAAI 2022, Vancouver, BC, Canada (Virtual)
-
[FL-NeurIPS'21] New Frontiers in Federated Learning: Privacy, Fairness, Robustness, Personalization and Data Ownership, (Virtual)
-
[The Federated Learning Workshop, 2021] , Paris, France (Hybrid)
-
[PDFL-EMNLP'21] Workshop on Parallel, Distributed, and Federated Learning, Bilbao, Spain (Virtual)
-
[FTL-IJCAI'21] International Workshop on Federated and Transfer Learning for Data Sparsity and Confidentiality in Conjunction with IJCAI 2021, Montreal, QC, Canada (Virtual)
-
[DeepIPR-IJCAI'21] Toward Intellectual Property Protection on Deep Learning as a Services, Montreal, QC, Canada (Virtual)
-
[FL-ICML'21] International Workshop on Federated Learning for User Privacy and Data Confidentiality, (Virtual)
-
[RSEML-AAAI-21] Towards Robust, Secure and Efficient Machine Learning, (Virtual)
-
[NeurIPS-SpicyFL'20] Workshop on Scalability, Privacy, and Security in Federated Learning, Vancouver, BC, Canada (Virtual)
-
[FL-IJCAI'20] International Workshop on Federated Learning for User Privacy and Data Confidentiality, Yokohama, Japan (Virtual)
-
[FL-ICML'20] International Workshop on Federated Learning for User Privacy and Data Confidentiality, Vienna, Austria (Virtual)
-
[FL-IBM'20] Workshop on Federated Learning and Analytics, New York, NY, USA
-
[FL-NeurIPS'19] Workshop on Federated Learning for Data Privacy and Confidentiality (in Conjunction with NeurIPS 2019), Vancouver, BC, Canada
-
[FL-IJCAI'19] International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with IJCAI 2019, Macau
-
[FL-Google'19] Workshop on Federated Learning and Analytics, Seattle, WA, USA
- Special Issue on Trustable, Verifiable, and Auditable Federated Learning, IEEE Transactions on Big Data (TBD), 2022.
- Special Issue on Federated Learning: Algorithms, Systems, and Applications, ACM Transactions on Intelligent Systems and Technology (TIST), 2021.
- Special Issue on Federated Machine Learning, IEEE Intelligent Systems (IS), 2019.
- "Federated Learning" included as a new keyword in IJCAI'20, Yokohama, Japan
- Special Track on Federated Machine Learning, IEEE BigData'19, Los Angeles, CA, USA
- 2022/09/19 - add NeurIPS 2022 papers
- 2022/09/16 - repository is online with Github Pages
- 2022/09/06 - add information about FL on Tabular and Graph data
- 2022/09/05 - add some information about top journals and add TPDS papers
- 2022/08/31 - all papers (including 400+ papers from top conferences and top journals and 100+ papers with graph and tabular data) have been comprehensively sorted out, and information such as publication addresses, links to preprints and source codes of these papers have been compiled. The source code of 280+ papers has been obtained. We hope it can help those who use this project. 😃
- 2022/07/31 - add VLDB papers
- 2022/07/30 - add top-tier system conference papers and add COLT, UAI, OSDI, SOSP, ISCA, MLSys, AISTATS, and WSDM papers
- 2022/07/28 - add a list of top-tier conference papers and add IJCAI, SIGIR, SIGMOD, ICDE, WWW, SIGCOMM, and INFOCOM papers
- 2022/07/27 - add some ECCV 2022 papers
- 2022/07/22 - add CVPR 2022 and MM 2020,2021 papers
- 2022/07/21 - add TL;DR summaries and interpretation notes for papers, and add KDD 2022 papers
- 2022/07/15 - add a list of federated learning papers from top NLP/Security conferences, and add ICML 2022 papers
- 2022/07/14 - add a list of federated learning papers from top ML/CV/AI/DM conferences, based on innovation-cat's Awesome-Federated-Machine-Learning, and mark 🔥 papers (code is available & stars >= 100)
- 2022/07/12 - added information about the last commit time of the federated learning open source framework (can be used to determine the maintenance of the code base)
- 2022/07/12 - give a list of papers in the field of federated learning in top journals
- 2022/05/25 - complete the paper and code lists of FL on tabular data and Tree algorithms
- 2022/05/25 - add the paper list of FL on tabular data and Tree algorithms
- 2022/05/24 - complete the paper and code lists of FL on graph data and Graph Neural Networks
- 2022/05/23 - add the paper list of FL on graph data and Graph Neural Networks
- 2022/05/21 - update all of Federated Learning Framework
More items will be added to the repository. Please feel free to suggest other key resources by opening an issue report, submitting a pull request, or dropping me an email @ ([email protected]). Enjoy reading!
Many thanks ❤️ to the other awesome lists:
-
Federated Learning
-
Other fields
@misc{awesomeflGTD,
title = {Awesome-Federated-Learning-on-Graph-and-Tabular-Data},
author = {Yuwen Yang and Bingjie Yan and Xuefeng Jiang and Hongcheng Li and Jian Wang and Jiao Chen and Xiangmou Qu and Chang Liu and others},
year = {2022},
howpublished = {\url{https://github.com/youngfish42/Awesome-Federated-Learning-on-Graph-and-Tabular-Data}}
}
Footnotes
-
FedWalk is a random-walk-based unsupervised node embedding algorithm that operates on a node-level visibility graph while raw graph information stays local. ↩ ↩2
-
FederatedScope-GNN presents an easy-to-use FGL (federated graph learning) package. ↩ ↩2
-
GAMF formulates model fusion as a graph matching task that considers the second-order similarity of model weights, whereas previous work formulated model fusion merely as a linear assignment problem. To handle the growing problem scale and multi-model consistency issues, GAMF proposes an efficient graduated-assignment-based model fusion method that iteratively updates the matchings in a consistency-maintaining manner. ↩
-
We study the knowledge extrapolation problem: embedding new components (i.e., entities and relations) that come with emerging knowledge graphs (KGs) in the federated setting. In this problem, a model trained on an existing KG needs to embed an emerging KG with unseen entities and relations. To solve it, we introduce a meta-learning setting in which a set of tasks is sampled on the existing KG to mimic the link prediction task on the emerging KG. Based on the sampled tasks, we meta-train a graph neural network framework that constructs features for unseen components from structural information and outputs embeddings for them. ↩ ↩2
-
A novel structured federated learning (SFL) framework that enhances knowledge sharing in personalized federated learning (PFL) by leveraging graph-based structural information among clients, and learns the global and personalized models simultaneously using client-wise relation graphs and clients' private data. We cast SFL with a graph into a novel optimization problem that models complex client-wise relations and graph-based structural topology in a unified framework. Moreover, in addition to using an existing relation graph, SFL can be extended to learn hidden relations among clients. ↩ ↩2
-
VFGNN is a federated GNN learning paradigm for privacy-preserving node classification under vertically partitioned data, and it can be generalized to existing GNN models. Specifically, we split the computation graph into two parts: computations related to private data (i.e., features, edges, and labels) stay on the data holders, and the remaining computations are delegated to a semi-honest server. We also propose applying differential privacy to prevent potential information leakage from the server. ↩ ↩2
-
SpreadGNN is a novel multi-task federated training framework that, for the first time in the literature, can operate with partial labels and without a central server. We provide convergence guarantees and empirically demonstrate the efficacy of our framework on a variety of non-I.I.D. distributed graph-level molecular property prediction datasets with partial labels. Our results show that SpreadGNN outperforms GNN models trained over a federated learning system that depends on a central server, even in constrained topologies. ↩ ↩2
-
FedGraph performs federated graph learning among multiple computing clients, each of which holds a subgraph. It provides strong graph learning capability across clients by addressing two unique challenges. First, traditional GCN training requires sharing feature data among clients, which risks privacy leakage; FedGraph solves this with a novel cross-client convolution operation. Second, large graphs incur high GCN training overhead; we propose an intelligent graph sampling algorithm based on deep reinforcement learning that automatically converges to the optimal sampling policies balancing training speed and accuracy. ↩ ↩2
-
TBC ↩
-
FedNI leverages network inpainting and inter-institutional data via FL. Specifically, we first federatively train a missing node and edge predictor using a graph generative adversarial network (GAN) to complete the missing information of local networks. We then train a global GCN node classifier across institutions using a federated graph learning platform. This design enables more accurate machine learning models by combining federated learning and graph learning approaches. ↩
-
FedEgo is a federated graph learning framework based on ego-graphs, in which each client trains its local model while also contributing to the training of a global model. FedEgo applies GraphSAGE over ego-graphs to make full use of structural information and uses Mixup to address privacy concerns. To deal with statistical heterogeneity, we integrate personalization into learning and propose an adaptive mixing coefficient strategy that enables clients to achieve their optimal personalization. ↩
-
FedPerGNN is a federated GNN framework for effective and privacy-preserving personalization. Through a privacy-preserving model update method, we collaboratively train GNN models based on decentralized graphs inferred from local data. To further exploit graph information beyond local interactions, we introduce a privacy-preserving graph expansion protocol that incorporates high-order information under privacy protection. ↩ ↩2
-
This work focuses on the graph classification task with partially labeled data. (1) Enhancing the collaboration process: we propose a new personalized FL framework to deal with non-IID data. Clients with more similar data have greater mutual influence, where similarity can be evaluated via unlabeled data. (2) Enhancing the local training process: we introduce an auxiliary loss for unlabeled data that constrains the training process, and we propose a new pseudo-label strategy for our SemiGraphFL framework to make more effective predictions. ↩
-
In this paper, we first develop a novel attack that aims to recover the original data from embedding information, and use it to evaluate the vulnerabilities of FedE. Furthermore, we propose a federated learning paradigm with privacy-preserving relation embedding aggregation (FedR) to tackle the privacy issue in FedE. Compared with sharing entity embeddings, sharing relation embeddings can significantly reduce communication cost because the queries are smaller. ↩
-
A data-driven approach for power allocation in federated learning (FL) over interference-limited wireless networks. The power policy is designed to maximize the information transmitted during the FL process under communication constraints, with the ultimate objective of improving the accuracy and efficiency of the global FL model being trained. The proposed power allocation policy is parameterized using a graph convolutional network, and the associated constrained optimization problem is solved through a primal-dual algorithm. ↩
-
We investigate multi-task learning (MTL), where multiple learning tasks are performed jointly rather than separately to leverage their similarities and improve performance. We focus on the federated multi-task linear regression setting, where each machine holds its own data for individual tasks and sharing the full local data between machines is prohibited. Motivated by graph regularization, we propose a novel fusion framework that requires only a one-shot communication of local estimates. Our method linearly combines the local estimates to produce an improved estimate for each task, and we show that the ideal mixing weight for fusion is a function of task similarity and task difficulty. ↩
-
In the FedEC framework, a local training procedure learns knowledge graph embeddings on each client based on a specific embedding learner. We apply embedding-contrastive learning to limit the embedding update and thereby tackle data heterogeneity. Moreover, a global update procedure shares and averages entity embeddings on the master server. ↩
-
Existing FL paradigms are inefficient for geo-distributed GCN training because neighbour sampling across geo-locations soon dominates the whole training process and consumes large WAN bandwidth. We derive a practical federated graph learning algorithm that carefully strikes a trade-off among GCN convergence error, wall-clock runtime, and neighbour sampling interval. Our analysis is divided into two cases according to the budget for neighbour sampling. In the unconstrained case, we obtain the optimal neighbour sampling interval, which achieves the best trade-off between convergence and runtime; in the constrained case, we show that determining the optimal sampling interval is actually an online problem and propose a novel online algorithm with a bounded competitive ratio to solve it. Combining the two cases, we propose a unified algorithm to decide the neighbour sampling interval in federated graph learning and demonstrate its effectiveness with extensive simulation over graph datasets. ↩
-
DP-based federated GNNs have not been well investigated, especially in the subgraph-level setting, such as recommendation systems. DP-FedRec is a DP-based federated GNN that fills this gap. Private Set Intersection (PSI) is leveraged to extend the local graph of each client and thus address the non-IID problem. Most importantly, differential privacy (DP) is applied not only to the weights but also to the edges of the intersection graph from PSI, to fully protect the privacy of clients. ↩
-
Clustering-based hierarchical and Two-step-optimized FL (CTFL) employs a divide-and-conquer strategy, clustering clients based on the closeness of their local model parameters. Furthermore, we incorporate the particle swarm optimization algorithm into CTFL, which employs a two-step strategy for optimizing local models. This technique enables the central server to upload only one representative local model update from each cluster, reducing the communication overhead of model update transmission in FL. ↩
-
A privacy-preserving spatial-temporal prediction technique via federated learning (FL). Because spatial-temporal data are inherently non-independent and identically distributed (non-IID), the basic FL-based method, which shares a global model, cannot handle this data heterogeneity well; we therefore propose personalized federated learning methods based on meta-learning. We automatically construct the global spatial-temporal pattern graph under a data federation. This global pattern graph incorporates and memorizes the locally learned patterns of all clients, and each client leverages those global patterns to customize its own model by evaluating the difference between the global and local pattern graphs. Each client can then use these customized parameters as its model initialization parameters for spatial-temporal prediction tasks. ↩
-
We investigate FL scenarios in which data owners are related by a network topology (e.g., traffic prediction based on sensor networks). Existing personalized FL approaches cannot take this information into account. To address this limitation, we propose the Bilevel Optimization enhanced Graph-aided Federated Learning (BiG-Fed) approach. The inner weights enable local tasks to evolve towards personalization, and the outer shared weights on the server side target the non-i.i.d. problem, enabling individual tasks to evolve towards a global constraint space. To the best of our knowledge, BiG-Fed is the first bilevel optimization technique to enable FL approaches to cope with two nested optimization tasks at the FL server and FL clients simultaneously. ↩
-
GraphSniffer is a graph neural network model based on federated learning that identifies malicious transactions in the digital currency market. It leverages federated learning and graph neural networks to model graph-structured Bitcoin transaction data distributed across different worker nodes, and transmits the gradients of the local models to the server node for aggregation to update the parameters of the global model. ↩
-
We explore the threat of collusion attacks from multiple malicious clients who mount targeted attacks (e.g., label flipping) in a federated learning configuration. By leveraging client weights and the correlation among them, we develop a graph-based algorithm to detect malicious clients. ↩ ↩2
-
Federated learning allows end users to build a global model collaboratively while keeping their training data isolated. We first simulate a heterogeneous federated learning benchmark (FedChem) by jointly performing scaffold splitting and latent Dirichlet allocation on existing datasets. Our results on FedChem show that significant learning challenges arise when working with heterogeneous molecules across clients. We then propose a method to alleviate the problem: Federated Learning by Instance reweighTing (FLIT+), which can align local training across clients. Experiments conducted on FedChem validate the advantages of this method. ↩
-
Deep learning-based Wi-Fi indoor fingerprint localization requires a large received signal strength (RSS) dataset for training. A personalized indoor localization method based on multi-level federated graph learning and self-attention is proposed to further capture the intrinsic features of RSS and to learn how to aggregate the shared information uploaded by clients, yielding better personalization accuracy. ↩
-
This paper proposes a decentralized online multitask learning algorithm based on GFL (O-GFML). Clients update their local models using continuous streaming data, while clients and multiple servers can train different but related models simultaneously. Furthermore, to enhance the communication efficiency of O-GFML, we develop a partial-sharing-based O-GFML (PSO-GFML). PSO-GFML allows participating clients to exchange only a portion of the model parameters with their respective servers during a global iteration, while non-participating clients update their local models if they have access to new data. ↩
-
AI healthcare applications rely on sensitive electronic healthcare records (EHRs) that are scarcely labelled and are often distributed across a network of symbiont institutions. In this work, we propose a federated learning framework based on dynamic neural graphs to address these challenges. The proposed framework extends Reptile, a model-agnostic meta-learning (MAML) algorithm, to a federated setting. However, unlike existing MAML algorithms, this paper proposes a dynamic variant of neural graph learning (NGL) to incorporate unlabelled examples in the supervised training setup. Dynamic NGL computes a meta-learning update by performing supervised learning on a labelled training example while performing metric learning on its labelled or unlabelled neighbourhood. This neighbourhood of a labelled example is established dynamically using local graphs built over the batches of training examples. Each local graph is constructed by comparing the similarity between embeddings generated by the current state of the model. The introduction of metric learning on the neighbourhood makes this framework semi-supervised in nature. The experimental results on the publicly available MIMIC-III dataset highlight the effectiveness of the proposed framework for both single and multi-task settings under data decentralisation constraints and limited supervision. ↩
-
A Federated Learning-Based Graph Convolutional Network (FedGCN). First, we propose a graph convolutional network (GCN) as the local model of FL. Building on the classical graph convolutional neural network, TopK pooling layers and fully connected layers are added to improve feature extraction. Furthermore, to prevent the pooling layers from losing information, cross-layer fusion is used in the GCN, giving FL an excellent ability to process non-Euclidean spatial data. Second, a federated aggregation algorithm based on an online adjustable attention mechanism is proposed. A trainable parameter ρ is introduced into the attention mechanism. The aggregation method assigns a corresponding attention coefficient to each local model, which reduces the damage that inefficient local model parameters cause to the global model and improves the fault tolerance and accuracy of the FL algorithm. ↩
-
We consider two important characteristics of contemporary wireless networks: (i) the network may contain heterogeneous communication/computation resources, and (ii) there may be significant overlaps in devices' local data distributions. In this work, we develop a novel optimization methodology that jointly accounts for these factors via intelligent device sampling complemented by device-to-device (D2D) offloading. Our optimization aims to select the best combination of sampled nodes and data offloading configuration to maximize FedL training accuracy subject to realistic constraints on the network topology and device capabilities. Theoretical analysis of the D2D offloading subproblem leads to new FedL convergence bounds and an efficient sequential convex optimizer. Using this result, we develop a sampling methodology based on graph convolutional networks (GCNs) that learns the relationship between network attributes, sampled nodes, and the resulting offloading that maximizes FedL accuracy. ↩ ↩2
-
Graphs can also be regarded as a special type of data samples. We analyze real-world graphs from different domains to confirm that they indeed share certain graph properties that are statistically significant compared with random graphs. However, we also find that different sets of graphs, even from the same domain or the same dataset, are non-IID with respect to both graph structures and node features. We propose a graph clustered federated learning (GCFL) framework that dynamically finds clusters of local systems based on the gradients of GNNs, and we theoretically justify that such clusters can reduce the structure and feature heterogeneity among graphs owned by the local systems. Moreover, we observe that the gradients of GNNs fluctuate considerably in GCFL, which impedes high-quality clustering, and we design a gradient-sequence-based clustering mechanism based on dynamic time warping (GCFL+). ↩ ↩2
-
In this work, towards the novel yet realistic setting of subgraph federated learning, we propose two major techniques: (1) FedSage, which trains a GraphSage model based on FedAvg to integrate node features, link structures, and task labels on multiple local subgraphs; (2) FedSage+, which trains a missing neighbor generator alongside FedSage to deal with missing links across local subgraphs. ↩ ↩2
-
Cross-Node Federated Graph Neural Network (CNFGNN) is a federated spatio-temporal model that explicitly encodes the underlying graph structure using a graph neural network (GNN)-based architecture under the constraint of cross-node federated learning, which requires that data in a network of nodes be generated locally on each node and remain decentralized. CNFGNN operates by disentangling temporal dynamics modeling on the devices from spatial dynamics modeling on the server, using alternating optimization to reduce the communication cost and facilitate computation on the edge devices. ↩ ↩2
-
A novel decentralized scalable learning framework, Federated Knowledge Graphs Embedding (FKGE), in which embeddings from different knowledge graphs can be learnt in an asynchronous and peer-to-peer manner while preserving privacy. FKGE exploits adversarial generation between pairs of knowledge graphs to translate identical entities and relations of different domains into nearby embedding spaces. To protect the privacy of the training data, FKGE further implements a privacy-preserving neural network structure that guarantees no raw data leakage. ↩
-
A new Decentralized Federated Graph Neural Network (D-FedGNN for short) that allows multiple participants to train a graph neural network model without a centralized server. Specifically, D-FedGNN uses a decentralized parallel stochastic gradient descent algorithm (DP-SGD) to train the graph neural network model in a peer-to-peer network structure. To protect privacy during model aggregation, D-FedGNN introduces the Diffie-Hellman key exchange method to achieve secure model aggregation between clients. ↩
-
We study the vertical and horizontal settings for federated learning on graph data. We propose FedSGC to train the Simple Graph Convolution model under three data split scenarios. ↩
-
A holistic collaborative and privacy-preserving FL framework, namely FL-DISCO, which integrates GAN and GNN to generate molecular graphs. ↩
-
We introduce a differential-privacy-based adjacency matrix preserving approach for protecting the topological information. We also propose an adjacency matrix aggregation approach that allows local GNN-based models to access the global network for a better training effect. Furthermore, we propose a GNN-based model named attention-based spatial-temporal graph neural networks (ASTGNN) for traffic speed forecasting. We integrate the proposed federated learning framework and ASTGNN as FASTGNN for traffic speed forecasting. ↩
-
To address device asynchrony and anomaly detection in FL while avoiding the extra resource consumption caused by blockchain, this paper introduces DAG-FL, a framework that systematically empowers FL using a Directed Acyclic Graph (DAG)-based blockchain. ↩
-
In this paper, we introduce a federated setting that preserves the privacy of multi-source knowledge graphs (KGs) without transferring triples between KGs, and apply it to knowledge graph embedding, a typical method that has proven effective for knowledge graph completion (KGC) over the past decade. We propose a Federated Knowledge Graph Embedding framework, FedE, which focuses on learning knowledge graph embeddings by aggregating locally computed updates. ↩
-
FKE, a new federated framework for representation learning of knowledge graphs that deals with privacy protection and heterogeneous data. ↩
-
GFL, a private multi-server federated learning scheme, which we call graph federated learning. We use cryptographic and differential privacy concepts to privatize the federated learning algorithm over a graph structure. We further show that, under convexity and Lipschitz conditions, the privatized process matches the performance of the non-private algorithm. ↩
-
FeSoG (Federated Social recommendation with Graph neural network), a novel framework. First, FeSoG adopts relational attention and aggregation to handle heterogeneity. Second, FeSoG infers user embeddings from local data to retain personalization. The model employs pseudo-labeling with item sampling to protect privacy and enhance training. ↩
-
FedGraphNN, an open FL benchmark system that facilitates research on federated GNNs. FedGraphNN is built on a unified formulation of graph FL and contains a wide range of datasets from different domains, popular GNN models, and FL algorithms, with secure and efficient system support. ↩
-
The connectional brain template (CBT) is a compact representation (i.e., a single connectivity matrix) of the multi-view brain networks of a given population. CBTs are powerful tools for brain dysconnectivity diagnosis and holistic brain mapping if they are learned properly, i.e., if they occupy the center of the given population. We propose the first federated connectional brain template learning (Fed-CBT) framework, which learns how to integrate multi-view brain connectomic datasets collected by different hospitals into a single representative connectivity map. First, we choose a random fraction of hospitals to train our global model. Next, all hospitals send their model weights to the server for aggregation. We also introduce a weighting method for aggregating model weights to take full benefit from all hospitals. To the best of our knowledge, our model is the first and only federated pipeline to estimate connectional brain templates using graph neural networks. ↩
-
FedCG, a novel Cluster-driven Graph Federated Learning method. In FedCG, clustering addresses statistical heterogeneity, while Graph Convolutional Networks (GCNs) enable sharing knowledge across the resulting clusters. FedCG: i) identifies the domains via an FL-compliant clustering and instantiates domain-specific modules (residual branches) for each domain; ii) connects the domain-specific modules through a GCN at training to learn the interactions among domains and share knowledge; and iii) learns to cluster unsupervised via teacher-student classifier-training iterations and addresses novel unseen test domains via their domain soft-assignment scores. ↩
-
Graph neural networks (GNNs) are widely used for recommendation to model high-order interactions between users and items. We propose a federated framework for privacy-preserving GNN-based recommendation, which can collectively train GNN models from decentralized user data while exploiting high-order user-item interaction information with privacy well protected. ↩
-
We study how to efficiently learn a model in a peer-to-peer system with non-IID client data. We propose Performance-Based Neighbor Selection (PENS), in which clients with similar data distributions detect each other and cooperate by evaluating their training losses on each other's data, in order to learn a model suitable for the local data distribution. ↩
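
The neighbor-selection idea in this entry can be illustrated with a minimal sketch. It covers only the ranking step and assumes each client has already computed the loss of every peer's model on its own data; the function name `select_neighbors`, the `top_k` parameter, and the loss values are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def select_neighbors(peer_losses: Dict[str, float], top_k: int) -> List[str]:
    """Return the top_k peers whose models incur the lowest loss on local data."""
    # Sort by loss, breaking ties by peer id so the result is deterministic.
    ranked = sorted(peer_losses.items(), key=lambda item: (item[1], item[0]))
    return [peer_id for peer_id, _ in ranked[:top_k]]


# Hypothetical losses of four peers' models evaluated on this client's data.
losses = {"peer_a": 0.91, "peer_b": 0.35, "peer_c": 1.40, "peer_d": 0.42}
chosen = select_neighbors(losses, top_k=2)
print(chosen)
```

A low loss on local data is used here as a proxy for a similar data distribution, so the selected peers are the ones a client would prefer to cooperate with.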
-
We study federated graph learning (FGL) under the cross-silo setting, where several servers are connected by a wide-area network, with the objective of improving the Quality-of-Service (QoS) of graph learning tasks. We present Glint, a decentralized federated graph learning system with two novel designs: network traffic throttling and priority-based flow scheduling. ↩
-
A novel distributed, scalable federated graph neural network (FGNN) for the cross-graph node classification problem. We add the PATE mechanism to the domain adversarial neural network (DANN) to construct a cross-network node classification model, and extract effective information from the node features of source and target graphs for encryption and spatial alignment. Moreover, we use a one-to-one approach to construct cross-graph node classification models for multiple source graphs and the target graph. Federated learning is used to train the model jointly through multi-party cooperation to complete the target graph node classification task. ↩
-
Human Activity Recognition (HAR) from sensor measurements is still challenging due to noisy or missing labelled examples and issues concerning data privacy. We propose GraFeHTy, a novel algorithm based on a Graph Convolution Network (GCN) trained in a federated setting. We construct a similarity graph from sensor measurements for each user and apply a GCN to perform semi-supervised classification of human activities by leveraging inter-relatedness and closeness of activities. ↩
-
The aim of this work is to develop a fully distributed algorithmic framework for training graph convolutional networks (GCNs). The proposed method exploits the meaningful relational structure of the input data, which are collected by a set of agents that communicate over a sparse network topology. After formulating the centralized GCN training problem, we first show how to perform inference in a distributed scenario where the underlying data graph is split among different agents. Then, we propose a distributed gradient descent procedure to solve the GCN training problem. The resulting model distributes computation along three lines: during inference, during back-propagation, and during optimization. Convergence to stationary solutions of the GCN training problem is also established under mild conditions. Finally, we propose an optimization criterion to design the communication topology between agents so that it matches the graph describing data relationships. ↩
-
We focus on improving the communication efficiency of fully decentralized federated learning (DFL) over a graph, where the algorithm performs local updates for several iterations and then enables communication among the nodes. ↩
-
An Automated Separated-Federated Graph Neural Network (ASFGNN) learning paradigm. ASFGNN consists of two main components: the training of the GNN and the tuning of hyper-parameters. Specifically, to solve the data non-IID problem, we first propose a separated-federated GNN learning model, which decouples the training of the GNN into two parts: the message passing part, which is done by clients separately, and the loss computing part, which is learned by clients in a federated manner. To handle the time-consuming parameter tuning problem, we use Bayesian optimization to automatically tune the hyper-parameters of all the clients. ↩
-
Communication is a critical enabler of large-scale FL due to the significant amount of model information exchanged among edge devices. In this paper, we consider a network of wireless devices sharing a common fading wireless channel for the deployment of FL. Each device holds a generally distinct training set, and communication typically takes place in a Device-to-Device (D2D) manner. In the ideal case in which all devices within communication range can communicate simultaneously and noiselessly, a standard protocol that is guaranteed to converge to an optimal solution of the global empirical risk minimization problem under convexity and connectivity assumptions is Decentralized Stochastic Gradient Descent (DSGD). DSGD integrates local SGD steps with periodic consensus averages that require communication between neighboring devices. In this paper, wireless protocols are proposed that implement DSGD by accounting for the presence of path loss, fading, blockages, and mutual interference. The proposed protocols are based on graph coloring for scheduling and on both digital and analog transmission strategies at the physical layer, with the latter leveraging over-the-air computing via sparsity-based recovery. ↩
-
We propose a similarity-based graph neural network model, SGNN, which captures the structure information of nodes precisely in node classification tasks. It also uses the idea of federated learning to hide the original information from different data sources and protect users' privacy. We use a deep graph neural network with convolutional layers and dense layers to classify the nodes based on their structures and features. ↩
-
To detect financial misconduct, we present a methodology for sharing key information across institutions through a federated graph learning platform, which enables more accurate machine learning models by leveraging federated learning and graph learning approaches. We demonstrate that our federated model outperforms the local model by 20% on the UK FCA TechSprint data set. ↩
-
We aim to solve a binary supervised classification problem, predicting hospitalizations for cardiac events, using a distributed algorithm. We focus on the soft-margin l1-regularized sparse Support Vector Machine (sSVM) classifier. We develop an iterative cluster Primal Dual Splitting (cPDS) algorithm for solving the large-scale sSVM problem in a decentralized fashion. ↩
-
TBC ↩
-
FGML, a comprehensive review of the literature in Federated Graph Machine Learning. ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
Federated functional gradient boosting (FFGB). Under appropriate assumptions on the weak learning oracle, the FFGB algorithm is proved to efficiently converge to certain neighborhoods of the global optimum. The radii of these neighborhoods depend upon the level of heterogeneity measured via the total variation distance and the much tighter Wasserstein-1 distance, and diminish to zero as the setting becomes more homogeneous. ↩
-
Federated Random Forests (FRF) models, focusing particularly on the heterogeneity within and between datasets. ↩
-
Federated Forest, a lossless learning model of the traditional random forest method, i.e., it achieves the same level of accuracy as the non-privacy-preserving approach. Based on it, we developed a secure cross-regional machine learning system that allows a model to be jointly trained over different regions' clients with the same user samples but different attribute sets, processing the data stored in each of them without exchanging their raw data. A novel prediction algorithm is also proposed, which can largely reduce the communication overhead. ↩
-
Fed-GBM (Federated Gradient Boosting Machines), a cost-effective collaborative learning framework consisting of two-stage voting and node-level parallelism, to address the problems in co-modelling for non-intrusive load monitoring (NILM). ↩
-
A novel federated ensemble classification algorithm for horizontally partitioned data, namely Boosting-based Federated Random Forest (BOFRF), which not only increases the predictive power of all participating sites, but also significantly improves the predictive power of sites with unsuccessful local models. We implement a federated version of random forest, a well-known bagging algorithm, by adapting the idea of boosting to it. We introduce a novel aggregation and weight calculation methodology that assigns weights to local classifiers based on their classification performance at each site without increasing the communication or computation cost. ↩
-
Efficient FL for GBDT (eFL-Boost), which minimizes accuracy loss, communication costs, and information leakage. The proposed scheme focuses on appropriate allocation of local computation (performed individually by each organization) and global computation (performed cooperatively by all organizations) when updating a model. A tree structure is determined locally at one of the organizations, and leaf weights are calculated globally by aggregating the local gradients of all organizations. Specifically, eFL-Boost requires only three communications per update, and only statistical information that has low privacy risk is leaked to other organizations. ↩
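
To make the "leaf weights are calculated globally" step concrete, here is a minimal sketch. It assumes the standard XGBoost-style second-order leaf weight, w = -G / (H + λ), where G and H are the sums of first- and second-order gradients of the samples falling into the leaf; whether eFL-Boost uses exactly this formula is an assumption, and `global_leaf_weight` and `reg_lambda` are hypothetical names.

```python
from typing import List, Tuple


def global_leaf_weight(
    local_sums: List[Tuple[float, float]], reg_lambda: float = 1.0
) -> float:
    """Combine per-organization (sum_g, sum_h) pairs for one leaf into a weight."""
    total_g = sum(g for g, _ in local_sums)  # aggregated first-order gradients
    total_h = sum(h for _, h in local_sums)  # aggregated second-order gradients
    return -total_g / (total_h + reg_lambda)


# Hypothetical gradient statistics reported by three organizations for one leaf.
sums = [(2.0, 3.0), (-0.5, 1.0), (1.5, 2.0)]
weight = global_leaf_weight(sums, reg_lambda=1.0)
print(weight)
```

Only the per-leaf sums leave each organization in this sketch, which is the kind of aggregate statistic the entry describes as having low privacy risk.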
-
Random Forest Based on Federated Learning for Intrusion Detection ↩
-
A federated decision tree-based random forest algorithm in which a small number of organizations or industry companies collaboratively build models. ↩
-
VF2Boost, a novel and efficient vertical federated GBDT system. First, to handle the deficiency caused by frequent mutual-waiting in federated training, we propose a concurrent training protocol to reduce the idle periods. Second, to speed up the cryptography operations, we analyze the characteristics of the algorithm and propose customized operations. Empirical results show that our system can be 12.8-18.9 times faster than the existing vertical federated implementations and support much larger datasets. ↩ ↩2
-
SecureBoost, a novel lossless privacy-preserving tree-boosting system. SecureBoost first conducts entity alignment under a privacy-preserving protocol and then constructs boosting trees across multiple parties with a carefully designed encryption strategy. This federated learning system allows the learning process to be jointly conducted over multiple parties with common user samples but different feature sets, which corresponds to a vertically partitioned data set. ↩
-
A Blockchain-Based Federated Forest for SDN-Enabled In-Vehicle Network Intrusion Detection System ↩
-
An improved gradient boosting decision tree (GBDT) federated ensemble learning method is proposed, which takes the average gradient of similar samples together with a sample's own gradient as a new gradient to improve the accuracy of the local model. Different ensemble learning methods are used to integrate the parameters of the local models, thus improving the accuracy of the updated global model. ↩
-
Decision tree ensembles such as gradient boosting decision trees (GBDT) and random forest are widely applied, powerful models with high interpretability and modeling efficiency. However, state-of-the-art frameworks for decision tree ensembles in vertical federated learning adopt anonymous features to avoid possible data breaches, which compromises the interpretability of the model. Fed-EINI analyzes the necessity of disclosing the meanings of features to the Guest Party in vertical federated learning. Fed-EINI protects data privacy and allows the disclosure of feature meanings by concealing decision paths, and adopts a communication-efficient secure computation method for inference outputs. ↩
-
We propose a new tree-boosting method, named Gradient Boosting Forest (GBF), where the single decision tree in each gradient boosting round of GBDT is replaced by a set of trees trained on different subsets of the training data (referred to as a forest), which enables training GBDT in federated learning scenarios. We empirically show that GBF outperforms the existing GBDT methods in both centralized (GBF-Cen) and federated (GBF-Fed) cases. ↩
-
A privacy-preserving framework using Mondrian k-anonymity with decision trees for horizontally partitioned data. ↩
-
AF-DNDF, which extends DNDF (Deep Neural Decision Forests, which unite classification trees with the representation learning functionality of deep convolutional neural networks) with an asynchronous federated aggregation protocol. Based on the local quality of each classification tree, our architecture can select and combine the optimal groups of decision trees from multiple local devices. ↩
-
Differential privacy is used to obtain theoretically sound privacy guarantees against inference attacks by adding noise to the exchanged update vectors. However, the added noise is proportional to the model size, which can be very large for modern neural networks, and this can result in poor model quality. Compressive sensing is used to reduce the model size and hence increase model quality without sacrificing privacy. ↩
-
A practical horizontal federated environment with relaxed privacy constraints. In this environment, a dishonest party might obtain some information about the other parties' data, but it is still impossible for the dishonest party to derive the actual raw data of other parties. Specifically, each party boosts a number of trees by exploiting similarity information based on locality-sensitive hashing. ↩ ↩2
-
Pivot, a novel solution for privacy-preserving vertical decision tree training and prediction, ensuring that no intermediate information is disclosed other than what the clients have agreed to release (i.e., the final tree model and the prediction output). Pivot does not rely on any trusted third party and provides protection against a semi-honest adversary that may compromise m - 1 out of m clients. We further identify two privacy leakages when the trained decision tree model is released in plaintext and propose an enhanced protocol to mitigate them. The proposed solution can also be extended to tree ensemble models, e.g., random forest (RF) and gradient boosting decision tree (GBDT), by treating single decision trees as building blocks. ↩ ↩2
-
FEDXGB, a federated extreme gradient boosting (XGBoost) scheme supporting forced aggregation. First, FEDXGB involves a new secure aggregation scheme for FL based on homomorphic encryption (HE). Then, FEDXGB extends FL to a new machine learning model by applying the secure aggregation scheme to the classification and regression tree building of XGBoost. ↩
-
We propose FedCluster, a novel federated learning framework with improved optimization efficiency, and investigate its theoretical convergence properties. FedCluster groups the devices into multiple clusters that perform federated learning cyclically in each learning round. ↩
-
The proposed FL-XGBoost lets different entities jointly solve a sensitive task without revealing their own data. By exchanging decision tree models, FL-XGBoost significantly reduces the number of communications between entities. ↩
-
A bandwidth slicing algorithm for passive optical networks (PONs) is introduced for efficient FL, in which bandwidth is reserved for the involved optical network units (ONUs) collaboratively and mapped into each polling cycle. ↩
-
A distributed machine learning system based on local random forest algorithms, created with decision trees shared through the blockchain. ↩
-
A decentralized redundant n-Cayley tree (DRC-tree) for federated learning. We explore the hierarchical structure of the n-Cayley tree to enhance the redundancy rate in federated learning and mitigate the impact of stragglers. In the DRC-tree structure, the fusion node serves as the root node, while all the worker devices are the intermediate tree nodes and leaves, which are formed through a distributed message passing interface. The redundancy of workers is constructed layer by layer with a given redundancy branch degree. ↩
-
Fed-sGBM, a federated soft gradient boosting machine framework applicable to streaming data. Compared with traditional gradient boosting methods, where base learners are trained sequentially, each base learner in the proposed framework can be efficiently trained in a parallel and distributed fashion. ↩
-
Deep neural decision forests (DNDF) combine the divide-and-conquer principle with representation learning. By parameterizing the probability distributions in the prediction nodes of the forest and including all trees of the forest in the loss function, a gradient of the whole forest can be computed, which several federated learning algorithms utilize. ↩
-
TBC ↩
-
TBC ↩
-
A hybrid federated learning framework based on XGBoost for distributed power prediction from real-time external features. In addition to introducing boosted trees to improve accuracy and interpretability, we combine horizontal and vertical federated learning to address the scenario where features are scattered across local heterogeneous parties and samples are scattered across various local districts. Moreover, we design a dynamic task allocation scheme such that each party gets a fair share of information, and the computing power of each party can be fully leveraged to boost training efficiency. ↩
-
Efficient XGBoost vertical federated learning. We propose a novel batch homomorphic encryption method that cuts the cost of encryption-related computation and transmission by nearly half. This is achieved by encoding the first-order derivative and the second-order derivative into a single number for encryption, ciphertext transmission, and homomorphic addition operations. The sum of multiple first-order derivatives and second-order derivatives can be simultaneously decoded from the sum of encoded values. ↩
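
The packing idea in this entry can be sketched without any cryptography. The example below assumes fixed-point encoding, first-order derivatives in [-1, 1], and non-negative second-order derivatives; plain integer addition stands in for the homomorphic addition that an additively homomorphic scheme such as Paillier would perform on ciphertexts. The constants `SCALE`, `OFFSET`, and `SHIFT` and the function names are hypothetical, and the paper's actual encoding may differ.

```python
from typing import Tuple

SCALE = 10**6   # fixed-point precision (assumed)
OFFSET = 1.0    # shifts g from [-1, 1] into [0, 2] so it is non-negative (assumed range)
SHIFT = 64      # low bits reserved for the summed second-order part (assumed)


def pack(g: float, h: float) -> int:
    """Encode one (g, h) pair into a single non-negative integer."""
    g_fixed = round((g + OFFSET) * SCALE)
    h_fixed = round(h * SCALE)
    return (g_fixed << SHIFT) | h_fixed


def unpack_sum(packed_sum: int, count: int) -> Tuple[float, float]:
    """Recover (sum of g, sum of h) from the sum of `count` packed values."""
    g_fixed_sum = packed_sum >> SHIFT
    h_fixed_sum = packed_sum & ((1 << SHIFT) - 1)
    # Each packed g carried one OFFSET, so remove count * OFFSET after rescaling.
    return g_fixed_sum / SCALE - count * OFFSET, h_fixed_sum / SCALE


# Hypothetical gradient pairs for three samples.
pairs = [(0.25, 0.1875), (-0.5, 0.25), (0.125, 0.109375)]
packed_total = sum(pack(g, h) for g, h in pairs)  # stands in for homomorphic addition
sum_g, sum_h = unpack_sum(packed_total, len(pairs))
print(sum_g, sum_h)
```

Because one packed integer carries both derivatives, each sample needs one encryption and one ciphertext instead of two, which is where the roughly halved cost comes from. The sketch is only valid while the summed second-order part stays below 2^SHIFT.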
-
TBC ↩
-
TBC ↩
-
Two variants of federated XGBoost with privacy guarantees: FedXGBoost-SMM and FedXGBoost-LDP. Our first protocol, FedXGBoost-SMM, deploys an enhanced secure matrix multiplication method to preserve privacy with lossless accuracy and lower overhead than encryption-based techniques. Developed independently, the second protocol, FedXGBoost-LDP, is heuristically designed with noise perturbation for local differential privacy. ↩
-
MP-FedXGB, a lossless multi-party federated XGB learning framework with a security guarantee, which reshapes XGBoost's split criterion calculation process under a secret sharing setting and solves the leaf weight calculation problem by leveraging distributed optimization. ↩
-
FederBoost performs private federated learning of gradient boosting decision trees (GBDT). It supports running GBDT over both horizontally and vertically partitioned data. The key observation behind FederBoost is that the whole training process of GBDT relies on the order of the data instead of the values. Consequently, vertical FederBoost does not require any cryptographic operation, and horizontal FederBoost only requires lightweight secure aggregation. ↩
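
The observation that tree training depends on the order of feature values rather than the values themselves can be illustrated with a small sketch. It assigns samples to equal-size buckets by rank and shows that a strictly increasing transformation of the feature leaves the buckets unchanged. The function `rank_buckets` and the data are hypothetical, and this is not FederBoost's actual protocol.

```python
import math
from typing import List


def rank_buckets(values: List[float], n_buckets: int) -> List[int]:
    """Assign each sample a bucket id based only on its rank in sorted order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, idx in enumerate(order):
        buckets[idx] = rank * n_buckets // len(values)
    return buckets


# Hypothetical feature column and a strictly increasing transformation of it.
raw = [3.2, -1.0, 0.5, 7.7, 2.1, 4.4]
transformed = [math.exp(v) for v in raw]

print(rank_buckets(raw, 3))
print(rank_buckets(transformed, 3))
```

Both calls produce the same bucket assignment, so a party that shares only bucket membership reveals the ordering of its samples but not the raw feature values.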
-
A horizontal federated XGBoost algorithm for the federated anomaly detection problem, where anomaly detection aims to identify abnormalities in extremely unbalanced datasets and can be considered a special classification problem. Our proposed federated XGBoost algorithm incorporates data aggregation and sparse federated update processes to balance the tradeoff between privacy and learning performance. In particular, we introduce the virtual data sample by aggregating a group of users' data together at a single distributed node. ↩
-
With the advent of deep learning and the increasing use of brain MRIs, a great amount of interest has arisen in automated anomaly segmentation to improve clinical workflows; however, it is time-consuming and expensive to curate medical imaging. We propose FedDis to collaboratively train an unsupervised deep convolutional autoencoder on 1,532 healthy magnetic resonance scans from four different institutions, and evaluate its performance in identifying pathologies such as multiple sclerosis, vascular lesions, and low- and high-grade tumours/glioblastoma on a total of 538 volumes from six different institutions. To mitigate the statistical heterogeneity among different institutions, we disentangle the parameter space into global (shape) and local (appearance). Four institutes jointly train shape parameters to model healthy brain anatomical structures. Every institute trains appearance parameters locally to allow for client-specific personalization of the global domain-invariant features. ↩
-
Progress in machine learning (ML) for healthcare has emphasized that, from model development to model deployment, data play central roles. In this Review, we provide a data-centric view of the innovations and challenges that are defining ML for healthcare. We discuss deep generative models and federated learning as strategies to augment datasets for improved model performance, as well as the use of the more recent transformer models for handling larger datasets and enhancing the modelling of clinical text. We also discuss data-focused problems in the deployment of ML, emphasizing the need to efficiently deliver data to ML models for timely clinical predictions and to account for natural data shifts that can deteriorate model performance. ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
FedPop, a novel methodology that recasts personalised FL into the population modeling paradigm, where clients' models involve fixed common population parameters and random individual ones, aiming to explain data heterogeneity. To derive convergence guarantees for our scheme, we introduce a new class of federated stochastic optimisation algorithms that relies on Markov chain Monte Carlo methods. Compared to existing personalised FL methods, the proposed methodology has important benefits: it is robust to client drift, practical for inference on new clients, and above all, enables uncertainty quantification under mild computational and memory overheads. We provide non-asymptotic convergence guarantees for the proposed algorithms. ↩
-
We aim to formally represent this problem and address these fairness issues using concepts from co-operative game theory and social choice theory. We model the task of learning a shared predictor in the federated setting as a fair public decision making problem, and then define the notion of core-stable fairness: given N agents, there is no subset of agents S that can benefit significantly by forming a coalition among themselves based on their utilities $U_N$ and $U_S$. Core-stable predictors are robust to low quality local data from some agents, and additionally they satisfy Proportionality (each agent gets at least 1/n fraction of the best utility that she can get from any predictor) and Pareto-optimality (there exists no model that can increase the utility of an agent without decreasing the utility of another), two well sought-after fairness and efficiency notions within social choice. We then propose an efficient federated learning protocol, CoreFed, to optimize a core-stable predictor. CoreFed determines a core-stable predictor when the loss functions of the agents are convex. CoreFed also determines approximate core-stable predictors when the loss functions are not convex, like smooth neural networks. We further show the existence of core-stable predictors in more general settings using Kakutani's fixed point theorem. ↩
-
The Yeo-Johnson (YJ) transformation is a standard parametrized per-feature unidimensional transformation often used to Gaussianize features in machine learning. In this paper, we investigate the problem of applying the YJ transformation in a cross-silo Federated Learning setting under privacy constraints. For the first time, we prove that the YJ negative log-likelihood is in fact convex, which allows us to optimize it with exponential search. We numerically show that the resulting algorithm is more stable than the state-of-the-art approach based on the Brent minimization method. Building on this simple algorithm and Secure Multiparty Computation routines, we propose SECUREFEDYJ, a federated algorithm that performs a pooled-equivalent YJ transformation without leaking more information than the final fitted parameters do. Quantitative experiments on real data demonstrate that, in addition to being secure, our approach reliably normalizes features across silos as well as if data were pooled, making it a viable approach for safe federated feature Gaussianization. ↩
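As a rough single-machine illustration of the idea in this note (not the secure federated protocol), the sketch below implements the standard Yeo-Johnson transform and its Gaussian negative log-likelihood, then locates the minimizing parameter using only the sign of a numerical slope: the bracket is widened by doubling steps and then bisected. The function names, the finite-difference slope, and the starting point `lam = 1` are assumptions made for this example; the paper's exact search procedure may differ.

```python
import math

def yeo_johnson(x, lam):
    """Standard Yeo-Johnson transform of a single value."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lam) - 1.0) / (2.0 - lam)

def yj_nll(xs, lam):
    """Negative log-likelihood (up to an additive constant) under a Gaussian model."""
    n = len(xs)
    ys = [yeo_johnson(x, lam) for x in xs]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    jacobian = sum(math.copysign(1.0, x) * math.log1p(abs(x)) for x in xs)
    return 0.5 * n * math.log(var) - (lam - 1.0) * jacobian

def fit_lambda(xs, tol=1e-6, h=1e-5):
    """Minimize the convex NLL using only the sign of its (numerical) slope."""
    def slope(lam):
        return yj_nll(xs, lam + h) - yj_nll(xs, lam - h)

    lo = hi = 1.0                     # lam = 1 is the identity transform
    step = 1.0
    while slope(lo) > 0:              # widen the bracket to the left
        lo -= step
        step *= 2.0
    step = 1.0
    while slope(hi) < 0:              # widen the bracket to the right
        hi += step
        step *= 2.0
    while hi - lo > tol:              # bisect on the slope sign
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Because the search only ever asks whether the slope is positive or negative, it is the kind of routine that can plausibly be run on top of secure comparison primitives, which is the motivation given in the note above.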
-
FedRolex is a simple yet effective model-heterogeneous FL method that removes the model-homogeneity constraint. Unlike the model-homogeneous scenario, the fundamental challenge of model heterogeneity in FL is that different parameters of the global model are trained on heterogeneous data distributions. FedRolex addresses this challenge by rolling the submodel in each federated iteration so that the parameters of the global model are evenly trained on the global data distribution across all devices, making it more akin to model-homogeneous training. ↩
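The rolling idea can be pictured with a small index helper. This is a minimal sketch under the assumption that a client with capacity ratio `ratio` receives a contiguous window of hidden units whose start advances by one each round and wraps around; `rolling_indices` and its parameters are hypothetical names for this illustration, not the paper's code.

```python
def rolling_indices(round_idx, width, ratio):
    """Indices of the hidden units a client trains in a given round.

    The window has size int(ratio * width) (at least 1), starts at
    round_idx, and wraps around the end of the layer.
    """
    keep = max(1, int(ratio * width))
    return [(round_idx + j) % width for j in range(keep)]

def coverage(width, ratio, rounds):
    """Count how often each unit is selected over a number of rounds."""
    counts = [0] * width
    for t in range(rounds):
        for i in rolling_indices(t, width, ratio):
            counts[i] += 1
    return counts
```

With `width=8` and `ratio=0.5`, every unit is selected the same number of times over 8 rounds, which is the "evenly trained" property the note describes.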
-
The data-owning clients may drop out of the training process arbitrarily. These characteristics will significantly degrade the training performance. This paper proposes a Dropout-Resilient Secure Federated Learning (DReS-FL) framework based on Lagrange coded computing (LCC) to tackle both the non-IID and dropout problems. The key idea is to utilize Lagrange coding to secretly share the private datasets among clients so that the effects of non-IID distribution and client dropouts can be compensated during local gradient computations. To provide a strict privacy guarantee for local datasets and correctly decode the gradient at the server, the gradient has to be a polynomial function in a finite field, and thus we construct polynomial integer neural networks (PINNs) to enable our framework. Theoretical analysis shows that DReS-FL is resilient to client dropouts and provides privacy protection for the local datasets. ↩
-
Since in real-world applications the data may contain bias on fairness-sensitive features (e.g., gender), VFL models may inherit bias from training data and become unfair for some user groups. However, existing fair machine learning methods usually rely on the centralized storage of fairness-sensitive features to achieve model fairness, which is usually inapplicable in federated scenarios. In this paper, we propose a fair vertical federated learning framework (FairVFL), which can improve the fairness of VFL models. The core idea of FairVFL is to learn unified and fair representations of samples based on the decentralized feature fields in a privacy-preserving way. Specifically, each platform with fairness-insensitive features first learns local data representations from local features. Then, these local representations are uploaded to a server and aggregated into a unified representation for the target task. In order to learn a fair unified representation, we send it to each platform storing fairness-sensitive features and apply adversarial learning to remove bias from the unified representation inherited from the biased data. Moreover, to protect user privacy, we further propose a contrastive adversarial learning method to remove private information from the unified representation on the server before sending it to the platforms keeping fairness-sensitive features. ↩
-
We study distributed optimization methods based on the local training (LT) paradigm, i.e., methods which achieve communication efficiency by performing richer local gradient-based training on the clients before (expensive) parameter averaging is allowed to take place. While these methods were first proposed about a decade ago, and form the algorithmic backbone of federated learning, there is an enormous gap between their practical performance and our theoretical understanding. Looking back at the progress of the field, we identify 5 generations of LT methods: 1) heuristic, 2) homogeneous, 3) sublinear, 4) linear, and 5) accelerated. The 5th generation was initiated by the ProxSkip method of Mishchenko et al. (2022), whose analysis provided the first theoretical confirmation that LT is a communication acceleration mechanism. Inspired by this recent progress, we contribute to the 5th generation of LT methods by showing that it is possible to enhance ProxSkip further using variance reduction. While all previous theoretical results for LT methods ignore the cost of local work altogether, and are framed purely in terms of the number of communication rounds, we construct a method that can be substantially faster in terms of the total training time than the state-of-the-art method ProxSkip in theory and practice in the regime when local computation is sufficiently expensive. We characterize this threshold theoretically, and confirm our theoretical predictions with empirical results. Our treatment of variance reduction is generic, and can work with a large number of variance reduction techniques, which may lead to further applications in the future. ↩
-
Vertical Federated Learning (VFL) methods are facing two challenges: (1) scalability when the number of participants grows to even a modest scale and (2) diminishing returns w.r.t. the number of participants: not all participants are equally important and many will not introduce quality improvement in a large consortium. Inspired by these two challenges, in this paper, we ask: How can we select $l$ out of $m$ participants, where $l \ll m$, that are most important? We call this problem Vertically Federated Participant Selection, and model it with a principled mutual information-based view. Our first technical contribution is VF-MINE, a Vertically Federated Mutual INformation Estimator that uses one of the most celebrated algorithms in database theory, Fagin's algorithm, as a building block. Our second contribution is to further optimize VF-MINE to enable VF-PS, a group testing-based participant selection framework. ↩
-
DENSE is a novel two-stage Data-free One-Shot Federated Learning framework, which trains the global model by a data generation stage and a model distillation stage. DENSE is a practical one-shot FL method that can be applied in reality due to the following advantages: (1) DENSE requires no additional information compared with other methods (except the model parameters) to be transferred between clients and the server; (2) DENSE does not require any auxiliary dataset for training; (3) DENSE considers model heterogeneity in FL, i.e., different clients can have different model architectures. ↩
-
We study the problem of federated adversarial training (FAT) under label skewness, and first reveal one root cause of the training instability and natural accuracy degradation issues: skewed labels lead to non-identical class probabilities and heterogeneous local models. We then propose a Calibrated FAT (CalFAT) approach to tackle the instability issue by calibrating the logits adaptively to balance the classes. ↩
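One common way to calibrate logits against a skewed local label distribution is to add the log of each class's local prior before the softmax. The sketch below shows that technique in plain Python; it is an assumption that this matches the spirit of CalFAT's calibration, and the function name, the count floor, and the argument layout are made up for this illustration. The adversarial-training part is omitted entirely.

```python
import math

def calibrated_cross_entropy(logits, label, class_counts):
    """Cross-entropy on logits shifted by the log of the local class prior."""
    total = sum(class_counts)
    # Classes missing locally get a tiny floor so the log stays finite.
    log_prior = [math.log(max(c, 1e-8) / total) for c in class_counts]
    adjusted = [z + lp for z, lp in zip(logits, log_prior)]
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]

def cross_entropy(logits, label):
    """Standard softmax cross-entropy, for comparison."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]
```

With a uniform local label distribution the two losses coincide; with a skewed one, a sample from a rare class incurs a larger loss, which pushes the local model to compensate for the imbalance.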
-
Federated min-max learning has received increasing attention in recent years thanks to its wide range of applications in various learning paradigms. We propose a new algorithmic framework called stochastic sampling averaging gradient descent ascent (SAGDA), which i) assembles stochastic gradient estimators from randomly sampled clients as control variates and ii) leverages two learning rates on both server and client sides. We show that SAGDA achieves a linear speedup in terms of both the number of clients and local update steps, which yields an $O(\epsilon^{-2})$ communication complexity that is orders of magnitude lower than the state of the art. Interestingly, by noting that the standard federated stochastic gradient descent ascent (FSGDA) is in fact a control-variate-free special version of SAGDA, we immediately arrive at an $O(\epsilon^{-2})$ communication complexity result for FSGDA. Therefore, through the lens of SAGDA, we also advance the current understanding of the communication complexity of the standard FSGDA method for federated min-max learning. ↩
-
A key assumption in most existing works on FL algorithms' convergence analysis is that the noise in stochastic first-order information has a finite variance. Although this assumption covers all light-tailed (i.e., sub-exponential) and some heavy-tailed noise distributions (e.g., log-normal, Weibull, and some Pareto distributions), it fails for many fat-tailed noise distributions (i.e., "heavier-tailed" with potentially infinite variance) that have been empirically observed in the FL literature. To date, it remains unclear whether one can design convergent algorithms for FL systems that experience fat-tailed noise. This motivates us to fill this gap in this paper by proposing an algorithmic framework called FAT-Clipping (federated averaging with two-sided learning rates and clipping), which contains two variants: FAT-Clipping per-round (FAT-Clipping-PR) and FAT-Clipping per-iteration (FAT-Clipping-PI). ↩
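The difference between the two variants can be sketched with plain norm clipping. This is a minimal illustration, assuming that the per-iteration variant clips every local stochastic gradient while the per-round variant clips only the accumulated local update; the function names, the single threshold, and the plain SGD local step are assumptions for this example rather than the paper's exact algorithm.

```python
import math

def clip(vec, threshold):
    """Rescale vec so that its Euclidean norm is at most threshold."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, threshold / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]

def local_update_per_iteration(grads, lr, threshold):
    """Clip each local gradient before applying it."""
    delta = [0.0] * len(grads[0])
    for g in grads:
        clipped = clip(g, threshold)
        delta = [d - lr * c for d, c in zip(delta, clipped)]
    return delta

def local_update_per_round(grads, lr, threshold):
    """Apply raw gradients locally, then clip the accumulated update once."""
    delta = [0.0] * len(grads[0])
    for g in grads:
        delta = [d - lr * gi for d, gi in zip(delta, g)]
    return clip(delta, threshold)
```

In both cases a single extreme gradient (the kind fat-tailed noise produces) can move the client's reported update by at most a bounded amount.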
-
FedSubAvg: we study federated learning from the new perspective of feature heat, where distinct data features normally involve different numbers of clients, generating the differentiation of hot and cold features. Meanwhile, each client's local data tend to interact with part of the features, updating only the feature-related part of the full model, called a submodel. We further identify that the classical federated averaging algorithm (FedAvg) or its variants, which randomly select clients to participate and uniformly average their submodel updates, will be severely slowed down, because different parameters of the global model are optimized at different speeds. More specifically, the model parameters related to hot (resp., cold) features will be updated quickly (resp., slowly). We thus propose federated submodel averaging (FedSubAvg), which introduces the number of feature-related clients as the metric of feature heat to correct the aggregation of submodel updates. We prove that due to the dispersion of feature heat, the global objective is ill-conditioned, and FedSubAvg works as a suitable diagonal preconditioner. We also rigorously analyze FedSubAvg's convergence rate to stationary points. ↩
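The aggregation correction can be illustrated by contrasting two averaging rules on sparse submodel updates. In this sketch each client reports a dictionary from parameter index to update; uniform averaging divides every coordinate by the number of participating clients, while the heat-aware rule divides by the number of clients that actually touched that coordinate. Counting heat over the participating clients only is a simplification made here, and all names are hypothetical.

```python
def uniform_average(updates, num_params):
    """FedAvg-style rule: divide every coordinate by the number of clients."""
    sums = [0.0] * num_params
    for update in updates:
        for idx, delta in update.items():
            sums[idx] += delta
    return [s / len(updates) for s in sums]

def submodel_average(updates, num_params):
    """Heat-aware rule: divide each coordinate by the clients involving it."""
    sums = [0.0] * num_params
    heat = [0] * num_params
    for update in updates:
        for idx, delta in update.items():
            sums[idx] += delta
            heat[idx] += 1
    return [s / h if h else 0.0 for s, h in zip(sums, heat)]
```

For a cold parameter touched by one client out of three, uniform averaging shrinks its update by a factor of three, whereas the heat-aware rule keeps it at full scale, which is the slowdown the note attributes to FedAvg.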
-
BooNTK: state-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions. We show that this performance disparity can largely be attributed to optimization challenges presented by nonconvexity. Specifically, we find that the early layers of the network do learn useful features, but the final layers fail to make use of them. That is, federated optimization applied to this non-convex problem distorts the learning of the final layers. Leveraging this observation, we propose a Train-Convexify-Train (TCT) procedure to sidestep this issue: first, learn features using off-the-shelf methods (e.g., FedAvg); then, optimize a convexified problem obtained from the network's empirical neural tangent kernel approximation. ↩
-
SoteriaFL is a unified framework that enhances the communication efficiency of private federated learning with communication compression. Exploiting both general compression operators and local differential privacy, we first examine a simple algorithm that applies compression directly to differentially-private stochastic gradient descent, and identify its limitations. We then propose a unified framework, SoteriaFL, for private federated learning, which accommodates a general family of local gradient estimators including popular stochastic variance-reduced gradient methods and the state-of-the-art shifted compression scheme. ↩
-
FILM (Federated Inversion attack for Language Models) is a novel attack method for federated learning of language models. For the first time, we show the feasibility of recovering text from large batch sizes of up to 128 sentences. Different from image-recovery methods which are optimized to match gradients, we take a distinct approach that first identifies a set of words from gradients and then directly reconstructs sentences based on beam search and a prior-based reordering strategy. The key insight of our attack is to leverage either prior knowledge in pre-trained language models or memorization during training. Despite its simplicity, we demonstrate that FILM can work well with several large-scale datasets: it can extract single sentences with high fidelity even for large batch sizes and recover multiple sentences from the batch successfully if the attack is applied iteratively. ↩
-
FedPCL is a lightweight framework in which clients jointly learn to fuse the representations generated by multiple fixed pre-trained models rather than training a large-scale model from scratch. This leads us to a more practical FL problem by considering how to capture more client-specific and class-relevant information from the pre-trained models and jointly improve each client's ability to exploit those off-the-shelf models. Here, we design a Federated Prototype-wise Contrastive Learning (FedPCL) approach which shares knowledge across clients through their class prototypes and builds client-specific representations in a prototype-wise contrastive manner. Sharing prototypes rather than learnable model parameters allows each client to fuse the representations in a personalized way while keeping the shared knowledge in a compact form for efficient communication. ↩
-
To achieve resource-adaptive federated learning, we introduce a simple yet effective mechanism, termed All-In-One Neural Composition, to systematically support training complexity-adjustable models with flexible resource adaptation. It is able to efficiently construct models at various complexities using one unified neural basis shared among clients, instead of pruning the global model into local ones. The proposed mechanism endows the system with unhindered access to the full range of knowledge scattered across clients and generalizes existing pruning-based solutions by allowing soft and learnable extraction of low footprint models. ↩
-
Inspired by Bayesian hierarchical models, we develop a self-aware personalized FL method where each client can automatically balance the training of its local personal model and the global model that implicitly contributes to other clients' training. Such a balance is derived from the inter-client and intra-client uncertainty quantification. A larger inter-client variation implies more personalization is needed. Correspondingly, our method uses uncertainty-driven local training steps and an uncertainty-driven aggregation rule instead of conventional local fine-tuning and sample size-based aggregation. ↩
-
In this paper, we study a large-scale multi-agent minimax optimization problem, which models many interesting applications in statistical learning and game theory, including Generative Adversarial Networks (GANs). The overall objective is a sum of agents' private local objective functions. We first analyze an important special case, the empirical minimax problem, where the overall objective approximates a true population minimax risk by statistical samples. We provide generalization bounds for learning with this objective through Rademacher complexity analysis. Then, we focus on the federated setting, where agents can perform local computation and communicate with a central server. Most existing federated minimax algorithms either require communication per iteration or lack performance guarantees, with the exception of Local Stochastic Gradient Descent Ascent (SGDA), a multiple-local-update descent ascent algorithm which guarantees convergence under a diminishing stepsize. By analyzing Local SGDA under the ideal condition of no gradient noise, we show that generally it cannot guarantee exact convergence with constant stepsizes and thus suffers from slow rates of convergence. To tackle this issue, we propose FedGDA-GT, an improved Federated (Fed) Gradient Descent Ascent (GDA) method based on Gradient Tracking (GT). When local objectives are Lipschitz smooth and strongly-convex-strongly-concave, we prove that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-approximation solution with $O(\log(1/\epsilon))$ rounds of communication, which matches the time complexity of the centralized GDA method. Finally, we numerically show that FedGDA-GT outperforms Local SGDA. ↩
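A toy version of gradient tracking for federated descent ascent can be written in a few lines. The sketch assumes the usual tracking correction: during local steps each client replaces its own gradient by "local gradient at the current point, minus local gradient at the round's starting point, plus the global gradient at the starting point". The scalar quadratic objectives $f_i(x, y) = \tfrac{a_i}{2}(x - p_i)^2 + xy - \tfrac{a_i}{2}(y - q_i)^2$, the step size, and all function names are choices made for this example, not taken from the paper.

```python
def make_client(a, p, q):
    """Gradients of f(x, y) = a/2 (x - p)^2 + x*y - a/2 (y - q)^2."""
    def grad(x, y):
        return a * (x - p) + y, x - a * (y - q)
    return grad

def fedgda_gt(clients, rounds=100, local_steps=5, lr=0.05):
    """Descent in x, ascent in y, with gradient-tracking correction."""
    x, y = 0.0, 0.0
    m = len(clients)
    for _ in range(rounds):
        # One communication: gradients of every client at the shared point.
        anchors = [g(x, y) for g in clients]
        gx_bar = sum(a[0] for a in anchors) / m
        gy_bar = sum(a[1] for a in anchors) / m
        new_x, new_y = 0.0, 0.0
        for g, (gx0, gy0) in zip(clients, anchors):
            xi, yi = x, y
            for _ in range(local_steps):
                gx, gy = g(xi, yi)
                # Corrected directions remove the client-specific drift.
                xi, yi = (xi - lr * (gx - gx0 + gx_bar),
                          yi + lr * (gy - gy0 + gy_bar))
            new_x += xi / m
            new_y += yi / m
        x, y = new_x, new_y
    return x, y

def global_gradient(clients, x, y):
    """Average gradient over all clients, used to check stationarity."""
    grads = [g(x, y) for g in clients]
    m = len(clients)
    return sum(g[0] for g in grads) / m, sum(g[1] for g in grads) / m
```

For the two clients `make_client(1.0, 1.0, 0.5)` and `make_client(3.0, -2.0, 2.0)`, setting the averaged gradients to zero by hand gives the saddle point $x = -1.65$, $y = 0.8$, and the iteration settles there with a constant step size even though the clients have different curvature.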
-
SemiFL addresses the problem of combining communication-efficient FL like FedAvg with Semi-Supervised Learning (SSL). In SemiFL, clients have completely unlabeled data and can train multiple local epochs to reduce communication costs, while the server has a small amount of labeled data. We provide a theoretical understanding of the success of data augmentation-based SSL methods to illustrate the bottleneck of a vanilla combination of communication-efficient FL with SSL. To address this issue, we propose alternate training to 'fine-tune global model with labeled data' and 'generate pseudo-labels with global model.' ↩
-
This study starts from an analogy to continual learning and suggests that forgetting could be the bottleneck of federated learning. We observe that the global model forgets the knowledge from previous rounds, and the local training induces forgetting the knowledge outside of the local distribution. Based on our findings, we hypothesize that tackling forgetting will relieve the data heterogeneity problem. To this end, we propose a novel and effective algorithm, Federated Not-True Distillation (FedNTD), which preserves the global perspective on locally available data only for the not-true classes. ↩
-
We propose a simple yet novel representation learning framework, namely FedSR, which enables domain generalization while still respecting the decentralized and privacy-preserving natures of this FL setting. Motivated by classical machine learning algorithms, we aim to learn a simple representation of the data for better generalization. In particular, we enforce an L2-norm regularizer on the representation and a conditional mutual information (between the representation and the data given the label) regularizer to encourage the model to only learn essential information (while ignoring spurious correlations such as the background). Furthermore, we provide theoretical connections between the above two objectives and representation alignment in domain generalization. ↩
-
In real-world federated learning scenarios, participants could have their own personalized labels which are incompatible with those from other clients, due to using different label permutations or tackling completely different tasks or domains. However, most existing FL approaches cannot effectively tackle such extremely heterogeneous scenarios since they often assume that (1) all participants use a synchronized set of labels, and (2) they train on the same tasks from the same domain. In this work, to tackle these challenges, we introduce Factorized-FL, which effectively tackles label- and task-heterogeneous federated learning settings by factorizing the model parameters into a pair of rank-1 vectors, where one captures the common knowledge across different labels and tasks and the other captures knowledge specific to the task for each local model. Moreover, based on the distance in the client-specific vector space, Factorized-FL performs a selective aggregation scheme to utilize only the knowledge from the relevant participants for each client. ↩
-
We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server. We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication. We propose a simple algorithm named FedLinUCB based on the principle of optimism. We prove that the regret of FedLinUCB is bounded by $\tilde{O}\big(d\sqrt{\sum_{m=1}^{M} T_m}\big)$ and the communication complexity is $\tilde{O}(dM^2)$, where $d$ is the dimension of the contextual vector and $T_m$ is the total number of interactions with the environment by agent $m$. To the best of our knowledge, this is the first provably efficient algorithm that allows fully asynchronous communication for federated linear bandits, while achieving the same regret guarantee as in the single-agent setting. ↩
-
Vertical federated learning (VFL), where parties share the same set of samples but only hold partial features, has a wide range of real-world applications. However, most existing studies in VFL disregard the "record linkage" process. They design algorithms either assuming the data from different parties can be exactly linked or simply linking each record with its most similar neighboring record. These approaches may fail to capture the key features from other less similar records. Moreover, such improper linkage cannot be corrected by training since existing approaches provide no feedback on linkage during training. In this paper, we design a novel coupled training paradigm, FedSim, that integrates one-to-many linkage into the training process. Besides enabling VFL in many real-world applications with fuzzy identifiers, FedSim also achieves better performance in traditional VFL tasks. Moreover, we theoretically analyze the additional privacy risk incurred by sharing similarities. ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
FedScale is a federated learning (FL) benchmarking suite with realistic datasets and a scalable runtime to enable reproducible FL research. ↩ ↩2
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
CE proposes the concept of a benefit graph, which describes how each client can benefit from collaborating with other clients, and advances a Pareto optimization approach to identify the optimal collaborators. ↩
-
SuPerFed is a personalized federated learning method that induces an explicit connection between the optima of the local and the federated model in weight space so that they boost each other. ↩
-
FedMSplit is a framework that allows federated training over multimodal distributed data without assuming similar active sensors in all clients. The key idea is to employ a dynamic and multi-view graph structure to adaptively capture the correlations amongst multimodal client models. ↩
-
Comm-FedBiO proposes a learning-based reweighting approach to mitigate the effect of noisy labels in FL. ↩
-
FLDetector detects malicious clients by checking the consistency of their model updates, in order to defend against model poisoning attacks with a large number of malicious clients. ↩
-
FedSVD, a practical lossless federated SVD method over billion-scale data that simultaneously achieves lossless accuracy and high efficiency. ↩
-
Federated Learning-to-Dispatch (Fed-LTD), a framework that allows effective order dispatching by sharing both dispatching models and decisions while providing privacy protection of raw data and high efficiency. It addresses the cross-platform ride-hailing problem, in which multiple platforms collaboratively dispatch orders without sharing data. ↩
-
Felicitas is a distributed cross-device federated learning (FL) framework that addresses the industrial difficulties of FL in large-scale device deployment scenarios. ↩
-
InclusiveFL assigns models of different sizes to clients with different computing capabilities: bigger models for powerful clients and smaller ones for weak clients. ↩
-
FedAttack is a simple yet effective and covert poisoning attack on federated recommendation. Its core idea is to use globally hardest samples to subvert model training. ↩
-
PipAttack presents a systematic approach to backdooring federated recommender systems for targeted item promotion. The core tactic is to take advantage of the inherent popularity bias that commonly exists in data-driven recommenders. ↩
-
Fed2, a feature-aligned federated learning framework that establishes a firm structure-feature alignment across the collaborative models. ↩
-
FedRS focuses on a special kind of non-IID scene, label distribution skew, where each client can only access a partial set of the whole class set. Considering that the top layers of neural networks are more task-specific, we advocate that the last classification layer is more vulnerable to the shift of label distribution. Hence, we study the classifier layer in depth and point out that the standard softmax encounters several problems caused by missing classes. As an alternative, we propose "Restricted Softmax" to limit the update of missing classes' weights during the local procedure. ↩
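A minimal sketch of a restricted softmax follows. It assumes the restriction is implemented by scaling the logits of locally missing classes by a factor `alpha` in [0, 1] before normalizing; the function name, the `observed` set, and the default `alpha` are illustrative assumptions rather than the authors' code.

```python
import math


def restricted_softmax(logits, observed, alpha=0.5):
    """Softmax with logits of locally missing classes scaled by alpha."""
    scaled = [z if c in observed else alpha * z for c, z in enumerate(logits)]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 3.0]
observed = {0, 1}  # assumption: class 2 never appears in this client's data
standard = restricted_softmax(logits, observed, alpha=1.0)  # plain softmax
restricted = restricted_softmax(logits, observed, alpha=0.5)
print(standard, restricted)
```

With `alpha=1.0` the function reduces to the standard softmax. Under this scaling assumption, a smaller `alpha` also shrinks the gradient that flows into the missing classes' weights by the same factor.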
-
While adversarial learning is commonly used in centralized learning to mitigate bias, there are significant barriers when extending it to the federated framework. In this work, we study these barriers and address them by proposing a novel approach, Federated Adversarial DEbiasing (FADE). FADE does not require users' sensitive group information for debiasing and offers users the freedom to opt out of the adversarial component when privacy or computational costs become a concern. ↩
-
To address the challenges of communication and computation resource utilization, we propose an asynchronous stochastic quasi-Newton (AsySQN) framework for vertical federated learning (VFL), under which three algorithms, AsySQN-SGD, AsySQN-SVRG and AsySQN-SAGA, are proposed. The proposed AsySQN-type algorithms scale their descent steps by approximate Hessian information (without explicitly calculating the inverse Hessian matrix), so they converge much faster than SGD-based methods in practice and can dramatically reduce the number of communication rounds. Moreover, the adopted asynchronous computation makes better use of the computation resource. We theoretically prove the convergence rates of the proposed algorithms for strongly convex problems. ↩
-
A simple yet effective algorithm, named Federated Learning on Medical Datasets using Partial Networks (FLOP), that shares only a partial model between the server and clients. ↩
-
This paper builds a framework that enables federated learning (FL) for a small number of stakeholders, and describes the framework architecture, communication protocol, and algorithms. ↩
-
A novel Federated Deep Knowledge Tracing (FDKT) framework to collectively train high-quality Deep Knowledge Tracing (DKT) models for multiple silos. ↩
-
FedFast accelerates distributed learning and achieves good accuracy for all users very early in the training process. We achieve this by sampling from a diverse set of participating clients in each training round and applying an active aggregation method that propagates the updated model to the other clients. Consequently, with FedFast the users benefit from far lower communication costs and more accurate models that can be consumed at any time during training, even at the very early stages. ↩
-
FDSKL, a federated doubly stochastic kernel learning algorithm for vertically partitioned data. Specifically, we use random features to approximate the kernel mapping function and use doubly stochastic gradients to update the solutions, all of which are computed in a federated manner without disclosing data. ↩
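The random-feature approximation of a kernel can be sketched as follows. The sketch assumes an RBF kernel and standard random Fourier features; the names `make_random_features` and `rbf_kernel`, the bandwidth `sigma`, and the feature count are illustrative choices, and the federated, doubly stochastic parts of FDSKL are not shown.

```python
import math
import random


def make_random_features(dim, n_features, sigma=1.0, seed=0):
    """Return phi such that phi(x) . phi(y) approximates the RBF kernel."""
    rng = random.Random(seed)
    # Frequencies w ~ N(0, sigma^-2 I) and phases b ~ U[0, 2*pi].
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
          for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def phi(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return phi


def rbf_kernel(x, y, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


phi = make_random_features(dim=3, n_features=2000)
x, y = [0.2, -0.1, 0.4], [0.0, 0.3, 0.5]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = rbf_kernel(x, y)
print(approx, exact)
```

Because the inner product `w . x` is a sum over feature coordinates, it splits into per-party partial sums when features are vertically partitioned, which is what makes random features a natural fit for this setting.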
-
Federated Online Learning to Rank (FOLtR) is a setup where on-mobile ranking models are trained in a way that respects the users' privacy. FOLtR-ES satisfies these requirements: (a) preserving user privacy, (b) low communication and computation costs, (c) learning from noisy bandit feedback, and (d) learning with non-continuous ranking quality measures. Part of FOLtR-ES is a privatization procedure that allows it to provide ε-local differential privacy guarantees, i.e., it protects the clients from an adversary who has access to the communicated messages. This procedure can be applied to any absolute online metric that takes finitely many values or can be discretized to a finite domain. ↩
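One standard way to privatize a metric with finitely many values under ε-local differential privacy is k-ary randomized response, sketched below. This is a generic mechanism used for illustration; whether it matches the exact privatization procedure of FOLtR-ES is an assumption, and the function name `privatize` and the toy metric domain are hypothetical.

```python
import math
import random


def privatize(value, domain, epsilon, rng=random):
    """k-ary randomized response: keep the true value with probability
    e^eps / (e^eps + k - 1), otherwise report a different value uniformly."""
    k = len(domain)
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


# Toy discretized metric domain (illustrative).
domain = [0.0, 0.5, 1.0]
rng = random.Random(0)
reports = [privatize(1.0, domain, epsilon=1.0, rng=rng) for _ in range(10)]
print(reports)
```

The ratio between the probability of reporting the true value and the probability of reporting any specific other value is exactly `e^epsilon`, which is the ε-local differential privacy condition for this mechanism.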
-
PEA (Private, Efficient, Accurate) consists of a secure differentially private stochastic gradient descent (DPSGD) protocol and two optimization methods. First, we propose a secure DPSGD protocol to enforce DPSGD, a popular differentially private machine learning algorithm, in secret sharing-based secure multi-party learning (MPL) frameworks. Second, to reduce the accuracy loss caused by differential privacy noise and the huge communication overhead of MPL, we propose two optimization methods for the MPL training process, so that the parties can balance privacy, efficiency, and accuracy during training. ↩
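For orientation, a plain (non-secret-shared) DPSGD step can be sketched as below: clip each per-example gradient, sum, add Gaussian noise, and average. The function names and hyperparameter values are illustrative assumptions, and PEA's secure protocol over secret shares is not reproduced here.

```python
import math
import random


def clip(grad, max_norm):
    """Scale grad down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dpsgd_step(weights, per_example_grads, lr=0.1, max_norm=1.0,
               noise_multiplier=1.0, rng=random):
    """One DPSGD update on a batch of per-example gradients."""
    n = len(per_example_grads)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # Gaussian noise with standard deviation noise_multiplier * max_norm.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [w - lr * g / n for w, g in zip(weights, noisy)]


rng = random.Random(0)
new_weights = dpsgd_step([0.0, 0.0], [[3.0, 4.0], [0.1, -0.2]], rng=rng)
print(new_weights)
```

Clipping bounds each example's influence on the update, and the noise scale is tied to that bound; the privacy accounting that turns `noise_multiplier` into an (ε, δ) guarantee is outside this sketch.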
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
LC-Fed proposes a personalized federated framework with Local Calibration that leverages inter-site inconsistencies at both the feature and prediction levels to boost segmentation. ↩
-
Models trained in federated settings often suffer from degraded performance and fail to generalize, especially in heterogeneous scenarios. FedSAM investigates this behavior through the lens of the geometry of the loss and the Hessian eigenspectrum, linking the model's lack of generalization capacity to the sharpness of the solution. ↩
-
ATPFL helps users federate multi-source trajectory datasets to automatically design and train a powerful trajectory prediction (TP) model. ↩
-
ViT-FL demonstrates that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. ↩
-
FedCorr, a general multi-stage framework to tackle heterogeneous label noise in FL without making any assumptions about the noise models of local clients, while still maintaining client data privacy. ↩
-
FedCor, an FL framework built on a correlation-based client selection strategy to boost the convergence rate of FL. ↩
-
A novel pFL training framework dubbed Layer-wised Personalized Federated learning (pFedLA) that can discern the importance of each layer from different clients and is thus able to optimize the personalized model aggregation for clients with heterogeneous data. ↩
-
FedAlign rethinks solutions to data heterogeneity in FL with a focus on local learning generality rather than proximal restriction. ↩
-
Position-Aware Neurons (PANs) fuse position-related values (i.e., position encodings) into neuron outputs, making parameters across clients pre-aligned and facilitating coordinate-based parameter averaging. ↩
-
Federated semi-supervised learning (FSSL) aims to derive a global model by training fully-labeled and fully-unlabeled clients, or by training partially labeled clients. RSCFed presents Random Sampling Consensus Federated learning, which accounts for the uneven reliability among models from fully-labeled clients, fully-unlabeled clients, or partially labeled clients. ↩
-
FCCL (Federated Cross-Correlation and Continual Learning). For the heterogeneity problem, FCCL leverages unlabeled public data for communication and constructs a cross-correlation matrix to learn a generalizable representation under domain shift. For catastrophic forgetting, FCCL uses knowledge distillation in local updating, providing inter- and intra-domain information without leaking privacy. ↩
-
RHFL (Robust Heterogeneous Federated Learning) simultaneously handles label noise and performs federated learning in a single framework. ↩
-
ResSFL, a Split Federated Learning framework that is designed to be resistant to Model Inversion (MI) attacks during training. ↩
-
FedDC proposes a novel federated learning algorithm with local drift decoupling and correction. ↩
-
Global-Local Forgetting Compensation (GLFC), a model that learns a global class-incremental model and alleviates catastrophic forgetting from both local and global perspectives. ↩
-
FedFTG, a data-free knowledge distillation method that fine-tunes the global model on the server, relieving the issues of direct model aggregation. ↩
-
DP-FedAvg+BLUR+LUS studies the cause of model performance degradation in federated learning under a user-level DP guarantee and proposes two techniques, Bounded Local Update Regularization and Local Update Sparsification, to increase model quality without sacrificing privacy. ↩
-
Generative Gradient Leakage (GGL), a new type of leakage, validates that private training data can still be leaked under certain defense settings. ↩
-
CD2-pFed, a novel Cyclic Distillation-guided Channel Decoupling framework to personalize the global model in FL under various settings of data heterogeneity. ↩
-
FedSM proposes a novel training framework that avoids the client drift issue and, for the first time, successfully closes the generalization gap with centralized training on medical image segmentation tasks. ↩
-
FL-MRCM proposes a federated learning (FL) based solution that takes advantage of the MR data available at different institutions while preserving patients' privacy. ↩
-
MOON ↩
-
FedDG-ELCFS introduces a novel problem setting, federated domain generalization (FedDG), which aims to learn a federated model from multiple distributed source domains such that it can directly generalize to unseen target domains. For this problem it presents Episodic Learning in Continuous Frequency Space (ELCFS), which enables each client to exploit multi-source data distributions under the challenging constraint of data decentralization. ↩
-
Soteria proposes a defense against model inversion attacks in FL. The key idea is to learn to perturb the data representation so that the quality of the reconstructed data is severely degraded while FL performance is maintained. ↩
-
FedUFO, a Unified Feature learning and Optimization objectives alignment method for non-IID FL. ↩
-
FedAD proposes a new distillation-based FL framework that preserves privacy by design while consuming substantially fewer network communication resources than current methods. ↩
-
FedU, a novel federated unsupervised learning framework. ↩
-
FedUReID, a federated unsupervised person ReID system that learns person ReID models without any labels while preserving privacy. ↩
-
This paper introduces two new large-scale datasets for species and landmark classification, with realistic per-user data splits that simulate real-world edge learning scenarios. It also develops two new algorithms (FedVC, FedIR) that intelligently resample and reweight over the client pool, bringing large improvements in accuracy and training stability. ↩
-
InvisibleFL proposes a privacy-preserving solution that avoids multimedia privacy leakage in federated learning. ↩
-
FedReID applies federated learning to person re-identification and optimizes its performance under the statistical heterogeneity found in real-world scenarios. ↩
-
Given the server-client communication and on-device computation bottlenecks, this paper explores whether larger language models, such as a 21M-parameter Transformer and a 20.2M-parameter Conformer, can be trained with cross-device federated learning. First, the authors investigate quantization and partial model training to address the per-round communication and computation cost. Then, they study fast-convergence techniques that reduce the number of communication rounds, using transfer learning and centralized pretraining. They demonstrate that these techniques, individually or in combination, can scale to larger models in cross-device federated learning. ↩
-
Communication cost is the largest barrier to the wider adoption of federated learning. This paper addresses the issue by investigating a family of new gradient compression strategies, including static compression, time-varying compression and K-subspace compression, which the authors call intrinsic gradient compression algorithms. These three algorithms suit different bandwidth levels and can be combined in special scenarios. Moreover, the authors provide theoretical guarantees on performance. They train large models with 100M parameters and report state-of-the-art results compared with other gradient compression methods (e.g., FetchSGD). ↩
-
Inspired by Bayesian hierarchical models, this paper investigates how to achieve better personalized federated learning by balancing local model improvement and global model tuning. The authors develop ActPerFL, a self-aware personalized FL method that guides local training and global aggregation via inter- and intra-client uncertainty quantification. Specifically, ActPerFL adaptively adjusts the number of local training steps with automated hyperparameter selection and performs uncertainty-weighted global aggregation (a weighted average that is not based on sample size). ↩
-
This paper presents a benchmarking framework for evaluating federated learning methods on four common formulations of NLP tasks: text classification, sequence tagging, question answering, and seq2seq generation. It provides implementations of common federated learning algorithms (FedAvg, FedProx, FedOPT). ↩
-
In realistic human-computer interaction, user feedback signals are usually noisy. This paper investigates whether federated learning can be trained directly on positive and negative user feedback. The authors show that, under mild to moderate noise conditions, incorporating feedback improves model performance over self-supervised baselines. They also study different levels of noise, hoping to mitigate the impact of user feedback noise on model performance. ↩
-
Given the real-world limitations of centralized training, this paper trains mixed-domain translation models with federated learning and finds that the global aggregation strategy of federated learning can effectively fuse information from different domains, so that neural machine translation (NMT) can benefit from federated learning. To address the communication bottleneck, the authors also propose a novel and practical solution to reduce communication bandwidth. Specifically, they design Dynamic Pulling, which pulls only one type of high-volatility tensor in each round of communication. ↩
-
TBC ↩
-
In this perspective paper we study the effect of non-independent and identically distributed (non-IID) data on federated online learning to rank (FOLTR) and chart directions for future work in this new and largely unexplored research area of Information Retrieval. ↩
-
The cross-domain recommendation problem is formalized under a decentralized computing environment with multiple domain servers. We identify two key challenges for this setting: the unavailability of direct transfer and the heterogeneity of the domain-specific user representations. We then propose to learn and maintain a decentralized user encoding in each user's personal space. The optimization follows a variational inference framework that maximizes the mutual information between the user's encoding and the domain-specific user information from all domains the user has interacted with. ↩
-
Under some circumstances, private data can be reconstructed from the model parameters, which implies that data leakage can occur in FL. In this paper, we draw attention to another risk associated with FL: even if federated algorithms are individually privacy-preserving, combining them into pipelines is not necessarily privacy-preserving. We provide a concrete example from genome-wide association studies, where the combination of federated principal component analysis and federated linear regression allows the aggregator to retrieve sensitive patient data by solving an instance of the multidimensional subset sum problem. This supports the increasing awareness in the field that, for FL to be truly privacy-preserving, measures have to be taken to protect against data leakage at the aggregator. ↩
-
Federated cross-modal retrieval (FedCMR), which learns the model with decentralized multi-modal data. ↩
-
A federated matrix factorization (MF) framework, named meta matrix factorization (MetaMF), for rating prediction (RP) in mobile environments. ↩
-
We design and develop the distributed Skellam mechanism (DSM), a novel solution for enforcing differential privacy on models built through an MPC-based federated learning process. Compared to existing approaches, DSM has the advantage that its privacy guarantee is independent of the dimensionality of the gradients. Further, DSM allows tight privacy accounting thanks to the composition and sub-sampling properties of the Skellam distribution, which are key to enforcing differential privacy on models built through an MPC-based federated learning process. ↩
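A Skellam random variable is the difference of two independent Poisson variables, and a sum of independent Skellam variables is again Skellam, which is what makes the noise easy to distribute across clients. The sketch below illustrates only that property: the names `poisson`, `skellam` and `noisy_sum`, the parameter `total_mu`, and the toy inputs are assumptions, and DSM's MPC protocol, quantization, and privacy accounting are not shown.

```python
import math
import random


def poisson(mu, rng):
    """Sample Poisson(mu) with Knuth's method (adequate for small mu)."""
    limit = math.exp(-mu)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def skellam(mu, rng):
    """Symmetric Skellam noise: difference of two Poisson(mu) draws."""
    return poisson(mu, rng) - poisson(mu, rng)


def noisy_sum(client_values, total_mu, rng):
    """Each client adds Skellam(total_mu / n) noise to its integer value,
    so the aggregate carries Skellam(total_mu) noise."""
    share = total_mu / len(client_values)
    return sum(v + skellam(share, rng) for v in client_values)


rng = random.Random(0)
print(noisy_sum([3, 5, 7], total_mu=6.0, rng=rng))
```

The noise is integer-valued, so it can be added to quantized updates without leaving the integer domain, which is convenient for secure aggregation over finite rings.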
-
CELU-VFL, a novel and efficient vertical federated learning (VFL) training framework that exploits the local update technique to reduce the number of cross-party communication rounds. CELU-VFL caches stale statistics and reuses them to estimate model gradients without exchanging the ad hoc statistics. Two techniques are proposed to improve convergence. First, to handle the stochastic variance problem, we propose a uniform sampling strategy that fairly chooses the stale statistics for local updates. Second, to control the errors introduced by staleness, we devise an instance weighting mechanism that measures the reliability of the estimated gradients. Theoretical analysis proves that CELU-VFL achieves a sub-linear convergence rate similar to that of vanilla VFL training while requiring far fewer communication rounds. ↩
-
FedTSC, a novel federated learning (FL) system for interpretable time series classification (TSC). FedTSC is an FL-based TSC solution that balances security, interpretability, accuracy, and efficiency. We achieve this by first extending the concept of FL to consider both stronger security and model interpretability. Then, we propose three novel TSC methods based on explainable features to deal with the challenging FL problem. To build the model in the FL setting, we propose several security protocols that are optimized to reduce the bottleneck communication complexity as far as possible. We build the FedTSC system on this solution and provide Sklearn-like Python APIs for practical use. ↩
-
TBC ↩
-
Federated Learning (FL) is an emerging framework for distributed processing of large data volumes by edge devices, subject to limited communication bandwidth, heterogeneity in data distributions and computational resources, and privacy considerations. In this paper, we introduce a new FL protocol termed FedADMM, based on primal-dual optimization. The proposed method leverages dual variables to tackle statistical heterogeneity and accommodates system heterogeneity by tolerating a variable amount of work performed by clients. FedADMM maintains the same communication cost per round as FedAvg/Prox and generalizes them via the augmented Lagrangian. A convergence proof is established for nonconvex objectives, with no restrictions on data dissimilarity or on the number of participants per round. We demonstrate the merits through extensive experiments on real datasets, under both IID and non-IID data distributions across clients. FedADMM consistently outperforms all baseline methods in communication efficiency, reducing the number of rounds needed to reach a prescribed accuracy by up to 87%. The algorithm adapts to heterogeneous data distributions through the use of dual variables, without the need for hyperparameter tuning, and its advantages are more pronounced in large-scale systems. ↩
-
Existing FL frameworks usually suffer from resource limitation and edge heterogeneity. Herein, we design and implement FedMP, an efficient FL framework based on adaptive model pruning. We theoretically analyze the impact of the pruning ratio on model training performance, and propose a Multi-Armed Bandit based online learning algorithm that adaptively determines different pruning ratios for heterogeneous edge nodes, even without any prior knowledge of their computation and communication capabilities. With adaptive model pruning, FedMP can not only reduce resource consumption but also achieve promising accuracy. To prevent the diverse structures of pruned models from affecting training convergence, we further present a new parameter synchronization scheme, called Residual Recovery Synchronous Parallel (R2SP), and provide a theoretical convergence guarantee. Extensive experiments on classical models and datasets demonstrate that FedMP is effective for different heterogeneous scenarios and data distributions, and can provide up to 4.1× speedup compared to existing FL methods. ↩
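How a bandit can pick a pruning ratio online may be sketched with a generic UCB1 learner over a small grid of ratios. This is only an illustration: the class `UCB1`, the candidate `ratios`, and the `toy_mean_reward` values are hypothetical, and FedMP's actual bandit algorithm and reward definition may differ.

```python
import math
import random


class UCB1:
    """Generic UCB1 bandit over a fixed list of arms."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)

    def select(self):
        # Play every arm once before using confidence bounds.
        for i, count in enumerate(self.counts):
            if count == 0:
                return i
        t = sum(self.counts)
        scores = [m + math.sqrt(2.0 * math.log(t) / c)
                  for m, c in zip(self.means, self.counts)]
        return scores.index(max(scores))

    def update(self, i, reward):
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]


ratios = [0.0, 0.3, 0.6, 0.9]            # candidate pruning ratios
toy_mean_reward = [0.2, 0.8, 0.5, 0.1]   # hypothetical reward per ratio
rng = random.Random(0)
bandit = UCB1(ratios)
for _ in range(2000):
    arm = bandit.select()
    reward = toy_mean_reward[arm] + rng.uniform(-0.05, 0.05)
    bandit.update(arm, reward)
best = ratios[bandit.counts.index(max(bandit.counts))]
print(best, bandit.counts)
```

In a real deployment the reward would have to be measured per edge node, for example from observed round time and accuracy gain, since the node's capabilities are not known in advance.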
-
A key and common challenge on distributed databases is the heterogeneity of the data distribution among the parties. The data of different parties are usually not independent and identically distributed (i.e., non-IID). Many FL algorithms have been proposed to address learning effectiveness under non-IID data settings. However, an experimental study that systematically examines their advantages and disadvantages is lacking, because previous studies use very rigid data partitioning strategies among parties that are neither representative nor thorough. In this paper, to help researchers better understand and study the non-IID data setting in federated learning, we propose comprehensive data partitioning strategies that cover the typical non-IID data cases. Moreover, we conduct extensive experiments to evaluate state-of-the-art FL algorithms. We find that non-IID data does bring significant challenges to the learning accuracy of FL algorithms, and that none of the existing state-of-the-art FL algorithms outperforms the others in all cases. Our experiments provide insights for future studies on addressing the challenges in "data silos". ↩
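One widely used way to simulate label distribution skew is to split each class across clients with proportions drawn from a Dirichlet distribution. The sketch below illustrates that single strategy under stated assumptions: the function name `dirichlet_partition`, the concentration parameter `beta`, and the toy labels are illustrative, and the paper's own partitioning code is not reproduced.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, n_clients, beta=0.5, seed=0):
    """Assign sample indices to clients with Dirichlet(beta) label skew."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(n_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Normalized Gamma(beta, 1) draws form a Dirichlet(beta) sample.
        gammas = [rng.gammavariate(beta, 1.0) for _ in range(n_clients)]
        total = sum(gammas)
        start, cumulative = 0, 0.0
        for c, g in enumerate(gammas):
            cumulative += g / total
            # The last client takes whatever remains of this class.
            end = len(idxs) if c == n_clients - 1 else round(cumulative * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


labels = [i % 10 for i in range(1000)]  # toy dataset: 10 balanced classes
parts = dirichlet_partition(labels, n_clients=5, beta=0.5)
print([len(p) for p in parts])
```

A smaller `beta` concentrates each class on fewer clients (stronger skew), while a large `beta` approaches an even, IID-like split.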
-
To address the challenges of non-IID data and limited communication resources raised by federated learning (FL) in mobile edge computing (MEC), we propose an efficient framework, called FedMigr, which integrates a deep reinforcement learning (DRL) based model migration strategy into the pioneering FL algorithm FedAvg. According to the data distribution and resource constraints, FedMigr intelligently guides one client to forward its local model to another client after local updating, rather than directly sending the local models to the server for global aggregation as in FedAvg. Intuitively, migrating a local model from one client to another is equivalent to training it over more data from different clients, which helps alleviate the influence of the non-IID issue. We prove theoretically that FedMigr can help reduce the parameter divergence between different local models and the global model, even over local datasets with non-IID settings. Extensive experiments on three popular benchmark datasets demonstrate that, compared with the baselines, FedMigr achieves an average accuracy improvement of around 13% and reduces bandwidth consumption for global communication by 42% on average. ↩
-
The federated learning paradigm allows several data owners to contribute to a machine learning task without exposing their potentially sensitive data. We focus on cumulative reward maximization in Multi-Armed Bandits (MAB), a classical reinforcement learning model for decision making under uncertainty. We demonstrate Samba, a generic framework for Secure federAted Multi-armed BAndits. The demonstration platform is a Web interface that simulates the distributed components of Samba and helps the data scientist configure the end-to-end workflow of deploying a federated MAB algorithm. The user-friendly interface of Samba allows users to examine the interaction between three key dimensions of federated MAB: cumulative reward, computation time, and security guarantees. We demonstrate Samba with two real-world datasets: Google Local Reviews and Steam Video Game. ↩
-
Federated Recommendation (FR) has received considerable popularity and attention in the past few years. In FR, each user's feature vector and interaction data are kept locally on the user's own client and are thus private to others. Without access to this information, most existing poisoning attacks against recommender systems or federated learning lose validity. Benefiting from this characteristic, FR is commonly considered fairly secure. However, we argue that security improvements in FR are still both possible and necessary. To prove our opinion, in this paper we present FedRecAttack, a model poisoning attack on FR aiming to raise the exposure ratio of target items. In most recommendation scenarios, apart from private user-item interactions (e.g., clicks, watches and purchases), some interactions are public (e.g., likes, follows and comments). Motivated by this point, in FedRecAttack we make use of the public interactions to approximate users' feature vectors, so that the attacker can generate poisoned gradients accordingly and control malicious users to upload the poisoned gradients in a well-designed way. To evaluate the effectiveness and side effects of FedRecAttack, we conduct extensive experiments on three real-world datasets of different sizes from two completely different scenarios. Experimental results demonstrate that our proposed FedRecAttack achieves state-of-the-art effectiveness while its side effects are negligible. Moreover, even with a small proportion (3%) of malicious users and a small proportion (1%) of public interactions, FedRecAttack remains highly effective, which reveals that FR is more vulnerable to attack than commonly considered. 
联邦推荐(FR)在过去几年中受到了相当大的欢迎和关注。在 FR 中,对于每个用户,其特征向量和交互数据都本地保存在自己的客户端上,因此对其他人来说是私有的。如果无法访问上述信息,大多数现有的针对推荐系统或联邦学习的中毒攻击都会失去有效性。得益于这一特性,FR 通常被认为是相当安全的。然而,我们认为在 FR 中仍有可能和必要的安全改进。为了证明我们的观点,在本文中,我们提出了 FedRecAttack,这是一种针对 FR 的模型中毒攻击,旨在提高目标项目的曝光率。在大多数推荐场景中,除了私人用户-项目交互(例如,点击、观看和购买)之外,一些交互是公开的(例如,喜欢、关注和评论)。受此启发,在 FedRecAttack 中,我们利用公共交互来近似用户的特征向量,从而攻击者可以相应地生成投毒梯度,并控制恶意用户以精心设计的方式上传投毒梯度。为了评估 FedRecAttack 的有效性和副作用,我们对来自两个完全不同场景的三个不同大小的真实数据集进行了广泛的实验。实验结果表明,我们提出的 FedRecAttack 实现了最先进的效果,而其副作用可以忽略不计。此外,即使恶意用户的比例很小(3%)和公共交互的比例很小(1%),FedRecAttack 仍然非常有效,这表明 FR 比人们通常认为的更容易受到攻击。 ↩
-
This work focuses on the scenario of federated semi-supervised learning, where on-device labeled data are insufficient and in-cloud unlabeled data are abundant. Because the number of participating clients and the pseudo-labeling quality of in-cloud unlabeled data have a significant impact on performance, the authors introduce a multi-armed bandit (MAB) based online algorithm to adaptively determine the participating fraction in FL and the confidence threshold. The experimental results show 3%-14.8% higher test accuracy and up to 48% lower training cost compared with baselines. 这项工作聚焦联邦半监督学习的场景,即设备上的有标签数据不足,而云端中的无标签数据很多。考虑到参与客户的数量和云端无标签数据的伪标签质量将对性能产生重大影响,作者引入了一种基于多臂老虎机的在线算法,以适应性地确定联邦学习客户端参与比例和用于伪标签的阈值。实验结果显示,与基线相比,测试精度提高了3%-14.8%,并最高节省了48%的训练成本。 ↩
-
The performance of the FL model heavily depends on the quality of participants' local data, which makes measuring the contributions of participants an essential task for various purposes, e.g., participant selection and reward allocation. The Shapley value is widely adopted by previous work for contribution assessment; however, it requires repeated leave-one-out retraining and thus incurs a prohibitive cost for FL. In this paper, we propose a highly efficient approach, named DIG-FL, to estimate the Shapley value of each participant without any model retraining. It is worth noting that our approach is applicable to both vertical federated learning (VFL) and horizontal federated learning (HFL), and we provide a concrete design for each. In addition, we propose a DIG-FL based reweight mechanism to improve the model training in terms of accuracy and convergence speed by dynamically adjusting the weights of participants according to their per-epoch contributions, and theoretically analyze the convergence speed. Our extensive evaluations on 14 public datasets show that the estimated Shapley value is very close to the actual Shapley value, with Pearson's correlation coefficient up to 0.987, while the cost is orders of magnitude smaller than state-of-the-art methods. When more than 80% of participants hold low-quality data, DIG-FL can effectively accelerate the convergence and improve the model accuracy by dynamically adjusting the weights. FL 模型的性能在很大程度上取决于参与者本地数据的质量,这使得衡量参与者的贡献成为各种目的的基本任务,例如参与者选择和奖励分配。 Shapley 值被以前的贡献评估工作广泛采用,然而,这需要反复留一再培训,因此导致 FL 的成本过高。在本文中,我们提出了一种名为 DIG-FL 的高效方法来估计每个参与者的 Shapley 值,而无需任何模型再训练。值得注意的是,我们的方法适用于垂直联邦学习 (VFL) 和水平联邦学习 (HFL),我们为 VFL 和 HFL 提供了具体设计。此外,我们提出了一种基于 DIG-FL 的重加权机制,通过根据参与者的每个时期的贡献动态调整参与者的权重来提高模型训练的准确性和收敛速度,并从理论上分析收敛速度。我们对 14 个公共数据集的广泛评估表明,估计的 Shapley 值非常接近实际的 Shapley 值,Pearson 相关系数高达 0.987,而成本比最先进的方法小几个数量级。当有 80% 以上的参与者持有低质量数据时,通过动态调整权重,DIG-FL 可以有效加速收敛,提高模型精度。 ↩
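
To make the retraining cost mentioned in this abstract concrete, here is a minimal sketch of the exact Shapley value computed by enumerating every coalition. It is not DIG-FL's estimator. The `toy_utility` function and the `QUALITY` table are hypothetical stand-ins for "train a model on this coalition and measure its accuracy".

```python
from itertools import combinations
from math import factorial

def exact_shapley(players, utility):
    """Exact Shapley values by enumerating all coalitions (exponential cost)."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of p when joining coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values

# Hypothetical per-participant data quality; in FL, each utility call
# would correspond to one full (re)training run on that coalition.
QUALITY = {"A": 0.5, "B": 0.3, "C": 0.0}

def toy_utility(coalition):
    return sum(QUALITY[c] for c in coalition)

shapley = exact_shapley(list(QUALITY), toy_utility)
```

With an additive utility like this one, each participant's Shapley value equals its own quality. With n participants the utility is evaluated on the order of 2^n times, which is the cost a retraining-free estimator tries to avoid.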
-
Federated Computation is an emerging area that seeks to provide stronger privacy for user data, by performing large scale, distributed computations where the data remains in the hands of users. Only the necessary summary information is shared, and additional security and privacy tools can be employed to provide strong guarantees of secrecy. The most prominent application of federated computation is in training machine learning models (federated learning), but many additional applications are emerging, more broadly relevant to data management and querying data. This tutorial gives an overview of federated computation models and algorithms. It includes an introduction to security and privacy techniques and guarantees, and shows how they can be applied to solve a variety of distributed computations providing statistics and insights to distributed data. It also discusses the issues that arise when implementing systems to support federated computation, and open problems for future research. 联邦计算是一个新兴的领域,它试图为用户数据提供更强的隐私,通过执行大规模的分布式计算,数据仍然在用户手中。只有必要的摘要信息被共享,并且可以采用额外的安全和隐私工具来提供强大的保密保证。联邦计算最突出的应用是训练机器学习模型(联邦学习),但许多其他的应用正在出现,更广泛地与数据管理和数据查询有关。本教程概述了联邦计算的模型和算法。它包括对安全和隐私技术和保证的介绍,并展示了如何应用它们来解决各种分布式计算,为分布式数据提供统计和洞察力。它还讨论了在实现支持联合计算的系统时出现的问题,以及未来研究的开放问题。 ↩
-
Due to rising concerns about privacy protection, how to build machine learning (ML) models over different data sources with security guarantees is attracting growing interest. Vertical federated learning (VFL) describes such a case where ML models are built upon the private data of different participating parties that own disjoint features for the same set of instances, which fits many real-world collaborative tasks. Nevertheless, we find that existing solutions for VFL either support limited kinds of input features or suffer from potential data leakage during the federated execution. To this end, this paper aims to investigate both the functionality and security of ML models in the VFL scenario. To be specific, we introduce BlindFL, a novel framework for VFL training and inference. First, to address the functionality of VFL models, we propose the federated source layers to unite the data from different parties. Various kinds of features can be supported efficiently by the federated source layers, including dense, sparse, numerical, and categorical features. Second, we carefully analyze the security during the federated execution and formalize the privacy requirements. Based on the analysis, we devise secure and accurate algorithm protocols, and further prove the security guarantees under the ideal-real simulation paradigm. Extensive experiments show that BlindFL supports diverse datasets and models efficiently while achieving robust privacy guarantees. 垂直联邦学习 (VFL) 描述了这样一种情况,其中 ML 模型建立在不同参与方的私有数据之上,这些参与方对同一组实例拥有不相交的特征,这适合许多现实世界的协作任务。尽管如此,我们发现现有的 VFL 解决方案要么支持有限种类的输入特征,要么在联合执行期间遭受潜在的数据泄漏。为此,本文旨在研究 VFL 场景中 ML 模式的功能和安全性。具体来说,我们介绍了 BlindFL,这是一种用于 VFL 训练和推理的新框架。首先,为了解决 VFL 模型的功能,我们提出了federated source layers来统一来自不同方的数据。federated source layers可以有效地支持各种特征,包括密集、稀疏、数值和分类特征。其次,我们仔细分析了联邦学习执行期间的安全性,并正式确定了隐私要求。在分析的基础上,我们设计了安全准确的算法协议,进一步证明了理想-现实仿真范式下的安全保证。大量实验表明,BlindFL 有效地支持各种数据集和模型,同时实现了强大的隐私保证。 ↩
-
Traditional learning-to-rank (LTR) models are usually trained in a centralized approach based upon a large amount of data. However, with the increasing awareness of data privacy, it is harder to collect data from multiple owners as before, and the resultant data isolation problem makes the performance of learned LTR models severely compromised. Inspired by the recent progress in federated learning, we propose a novel framework named Cross-Silo Federated Learning-to-Rank (CS-F-LTR), where the efficiency issue becomes the major bottleneck. To deal with the challenge, we first devise a privacy-preserving cross-party term frequency querying scheme based on sketching algorithms and differential privacy. To further improve the overall efficiency, we propose a new structure named reverse top-K sketch (RTK-Sketch) which significantly accelerates the feature generation process while holding theoretical guarantees on accuracy loss. Extensive experiments conducted on public datasets verify the effectiveness and efficiency of the proposed approach. 传统的排序学习 (LTR) 模型通常基于大量数据以集中方法进行训练。然而,随着数据隐私意识的增强,像以前一样从多个所有者那里收集数据变得更加困难,由此产生的数据隔离问题使得学习到的 LTR 模型的性能受到严重影响。受联邦学习最近进展的启发,我们提出了一个新的框架,称为Cross-silo联邦学习排序(CS-F-LTR),其中效率问题成为主要瓶颈。为了应对这一挑战,我们首先设计了一种基于草图算法和差分隐私的隐私保护跨方词频查询方案。为了进一步提高整体效率,我们提出了一种名为反向 top-K 草图(RTK-Sketch)的新结构,它显着加快了特征生成过程,同时保持了精度损失的理论保证。在公共数据集上进行的大量实验验证了所提出方法的有效性和效率。 ↩
-
Recently, vertical FL, where the participating organizations hold the same set of samples but with disjoint features and only one organization owns the labels, has received increased attention. This paper presents several feature inference attack methods to investigate the potential privacy leakages in the model prediction stage of vertical FL. The attack methods consider the most stringent setting that the adversary controls only the trained vertical FL model and the model predictions, relying on no background information of the attack target's data distribution. We first propose two specific attacks on the logistic regression (LR) and decision tree (DT) models, according to individual prediction output. We further design a general attack method based on multiple prediction outputs accumulated by the adversary to handle complex models, such as neural networks (NN) and random forest (RF) models. Experimental evaluations demonstrate the effectiveness of the proposed attacks and highlight the need for designing private mechanisms to protect the prediction outputs in vertical FL. 最近,纵向 FL 受到越来越多的关注,其中参与组织持有相同的样本集但具有不相交的特征并且只有一个组织拥有标签。本文提出了几种特征推理攻击方法来研究纵向 FL 模型预测阶段潜在的隐私泄露。攻击方法考虑了最严格的设置,即对手仅控制训练好的纵向 FL 模型和模型预测,不依赖于攻击目标数据分布的背景信息。我们首先根据个体预测输出对逻辑回归 (LR) 和决策树 (DT) 模型提出两种特定攻击。我们进一步设计了一种基于对手累积的多个预测输出的通用攻击方法,以处理复杂的模型,例如神经网络(NN)和随机森林(RF)模型。实验评估证明了所提出的攻击的有效性,并强调需要设计私有机制来保护纵向 FL 中的预测输出。 ↩
-
A fundamental issue in FL is its susceptibility to erroneous training data. This problem is especially challenging due to the invisibility of clients' local training data and training process, as well as the resource constraints of a large number of mobile and edge devices. In this paper, we try to tackle this challenging issue by introducing the first FL debugging framework, FLDebugger, for mitigating test error caused by erroneous training data. The proposed solution traces the global model's bugs (test errors) back through the training log and the underlying learning algorithm, first identifying the clients and then the training samples that are most responsible for the errors. In addition, we devise an influence-based participant selection strategy to fix bugs as well as to accelerate the convergence of model retraining. The performance of the identification algorithm is evaluated via extensive experiments on a real AIoT system (50 clients, including 20 edge computers, 20 laptops and 10 desktops) and in larger-scale simulated environments. The evaluation results attest that our framework achieves accurate and efficient identification of negatively influential clients and samples, and significantly improves the model performance by fixing bugs. FL中的一个基本问题是对错误训练数据的敏感性。由于客户端本地训练数据和训练过程的不可见性,以及大量移动和边缘设备的资源限制,这个问题尤其具有挑战性。在本文中,我们尝试通过引入第一个 FL 调试框架 FLDebugger 来解决这个具有挑战性的问题,以减轻由错误训练数据引起的测试错误。所提出的解决方案通过训练日志和底层学习算法共同跟踪全局模型的错误(测试错误),以首先识别对错误负有最大责任的客户,然后是他们的训练样本。此外,我们设计了一种基于影响力的参与者选择策略来修复错误并加速模型再训练的收敛。识别算法的性能通过在真实 AIoT 系统(50 个客户端,包括 20 台边缘计算机、20 台笔记本电脑和 10 台台式机)和更大规模的模拟环境中的广泛实验来评估。评估结果证明,我们的框架实现了对负面影响的客户和样本的准确高效识别,并通过修复错误显着提高了模型性能。 ↩
-
This paper comprehensively studies the problem of matrix factorization in different federated learning (FL) settings, where a set of parties want to cooperate in training but refuse to share data directly. We first propose a generic algorithmic framework for various settings of federated matrix factorization (FMF) and provide a theoretical convergence guarantee. We then systematically characterize privacy-leakage risks in data collection, training, and publishing stages for three different settings and introduce privacy notions to provide end-to-end privacy protections. The first one is vertical federated learning (VFL), where multiple parties have the ratings from the same set of users but on disjoint sets of items. The second one is horizontal federated learning (HFL), where parties have ratings from different sets of users but on the same set of items. The third setting is local federated learning (LFL), where the ratings of the users are only stored on their local devices. We introduce adapted versions of FMF with the privacy notions guaranteed in the three settings. In particular, a new private learning technique called embedding clipping is introduced and used in all the three settings to ensure differential privacy. For the LFL setting, we combine differential privacy with secure aggregation to protect the communication between user devices and the server with a strength similar to the local differential privacy model, but much better accuracy. We perform experiments to demonstrate the effectiveness of our approaches. 
本文全面研究了不同联邦学习(FL)设置中的矩阵分解问题,其中一组方希望在训练中进行合作,但拒绝直接共享数据。我们首先为federated matrix factorization (FMF) 的各种设置提出了一个通用算法框架,并提供了理论上的收敛保证。然后,我们系统地描述了三种不同设置的数据收集、训练和发布阶段的隐私泄露风险,并引入了隐私概念以提供端到端的隐私保护。第一个是垂直联合学习(VFL),其中多方具有来自同一组用户但不相交的项目集的评分。第二个是横向联合学习(HFL),各方对同一组项目的不同用户集进行评分。第三个设置是本地联合学习 (LFL),其中用户的评分仅存储在他们的本地设备上。我们引入了 FMF 的改编版本,并在三种设置中保证了隐私概念。特别是,在所有三种设置中引入并使用了一种称为嵌入裁剪的新私有学习技术,以确保差异隐私。对于 LFL 设置,我们将差分隐私与安全聚合相结合,以保护用户设备与服务器之间的通信,其强度类似于本地差分隐私模型,但精度要高得多。我们进行实验来证明我们的方法的有效性。 ↩
-
In practice, different clients may have different privacy requirements due to varying policies or preferences. In this paper, we focus on explicitly modeling and leveraging the heterogeneous privacy requirements of different clients and study how to optimize utility for the joint model while minimizing communication cost. As differentially private perturbations affect the model utility, a natural idea is to make better use of information submitted by the clients with higher privacy budgets (referred to as "public" clients, and the opposite as "private" clients). The challenge is how to use such information without biasing the joint model. We propose Projected Federated Averaging (PFA), which extracts the top singular subspace of the model updates submitted by "public" clients and uses it to project the model updates of "private" clients before aggregating them. We then propose communication-efficient PFA+, which allows "private" clients to upload projected model updates instead of original ones. Our experiments verify the utility boost of both algorithms compared to the baseline methods, whereby PFA+ achieves over 99% uplink communication reduction for "private" clients. 在实践中,由于不同的政策或偏好,不同的客户可能有不同的隐私要求。在本文中,我们专注于显式建模和利用不同客户端的异构隐私需求,并研究如何在最小化通信成本的同时优化联合模型的效用。由于不同的私人扰动会影响模型效用,一个自然的想法是更好地利用具有较高隐私预算的客户(称为“公共”客户,反之称为“私人”客户)提交的信息。挑战在于如何在不影响联合模型的情况下使用这些信息。我们提出Projected Federated Averaging (PFA),它提取“公共”客户端提交的模型更新的顶部奇异子空间,并在聚合它们之前,利用它们来预测“私人”客户端的模型更新。然后,我们提出了高效的通信 PFA+,它允许“私人”客户端上传预计的模型更新而不是原始模型更新。我们的实验验证了这两种算法与基线方法相比的效用提升,其中 PFA+ 为“私人”客户端实现了超过 99% 的上行链路通信减少。 ↩
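
The projection step described in this abstract can be illustrated with a small sketch. It makes several simplifying assumptions that are not taken from the paper: the subspace is one-dimensional, it is found by power iteration rather than a full SVD, and the final aggregate is a plain average. All function names and the toy vectors are hypothetical.

```python
import math

def top_direction(updates, iters=100):
    """Top right-singular vector of the matrix whose rows are `updates`,
    found by power iteration on U^T U (a rank-1 stand-in for an SVD)."""
    dim = len(updates[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        coeffs = [sum(a * b for a, b in zip(u, v)) for u in updates]  # U v
        w = [sum(c * u[j] for c, u in zip(coeffs, updates)) for j in range(dim)]  # U^T (U v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v
        v = [x / norm for x in w]
    return v

def project(update, direction):
    """Orthogonal projection of `update` onto the line spanned by `direction`."""
    coeff = sum(a * b for a, b in zip(update, direction))
    return [coeff * d for d in direction]

def average(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Toy updates: "public" clients (higher privacy budget, less noise) agree on
# the first coordinate; the "private" update carries noise in the second.
public_updates = [[2.0, 0.0], [4.0, 0.0]]
private_updates = [[1.0, 3.0]]

direction = top_direction(public_updates)
projected = [project(u, direction) for u in private_updates]
aggregated = average(public_updates + projected)
```

Here the noisy second coordinate of the private update is removed before averaging, because it lies outside the subspace spanned by the public updates.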
-
How can we debug a logistic regression model in a federated learning setting when the model behaves unexpectedly (e.g., the model rejects all high-income customers' loan applications)? The SQL-based training data debugging framework has proved effective at fixing this kind of issue in a non-federated learning setting. Given an unexpected query result over model predictions, this framework automatically removes the label errors from training data such that the unexpected behavior disappears in the retrained model. In this paper, we enable this powerful framework for federated learning. The key challenge is how to develop a security protocol for federated debugging which is proved to be secure, efficient, and accurate. Achieving this goal requires us to investigate how to seamlessly integrate the techniques from multiple fields (Databases, Machine Learning, and Cybersecurity). We first propose FedRain, which extends Rain, the state-of-the-art SQL-based training data debugging framework, to our federated learning setting. We address several technical challenges to make FedRain work and analyze its security guarantee and time complexity. The analysis results show that FedRain falls short in terms of both efficiency and security. To overcome these limitations, we redesign our security protocol and propose Frog, a novel SQL-based training data debugging framework tailored for federated learning. Our theoretical analysis shows that Frog is more secure, more accurate, and more efficient than FedRain. We conduct extensive experiments using several real-world datasets and a case study. The experimental results are consistent with our theoretical analysis and validate the effectiveness of Frog in practice. 
当模型表现异常时(例如,模型拒绝所有高收入客户的贷款申请),我们如何在联邦学习环境中调试逻辑回归模型?事实证明,基于 SQL 的训练数据调试框架可以有效地解决非联邦学习环境中的此类问题。给定模型预测的意外查询结果,该框架会自动从训练数据中删除标签错误,从而使重新训练的模型中的意外行为消失。在本文中,我们为联邦学习启用了这个强大的框架。关键的挑战是如何为联合调试开发一种被证明是安全、高效和准确的安全协议。实现这一目标需要我们研究如何无缝集成来自多个领域(数据库、机器学习和网络安全)的技术。我们首先提出 FedRain,它将最先进的基于 SQL 的训练数据调试框架 Rain 扩展到我们的联邦学习设置。我们解决了几个技术挑战以使 FedRain 工作并分析其安全保证和时间复杂度。分析结果表明,FedRain 在效率和安全性方面都存在不足。为了克服这些限制,我们重新设计了我们的安全协议并提出了 Frog,这是一种为联邦学习量身定制的基于 SQL 的新型训练数据调试框架。我们的理论分析表明,Frog 比 FedRain 更安全、更准确、更高效。我们使用几个真实世界的数据集和一个案例研究进行了广泛的实验。实验结果与我们的理论分析一致,在实践中验证了 Frog 的有效性。 ↩
-
Techniques for learning models from decentralized data must properly handle two characteristics of such data, namely privacy and massive engagement. Federated learning (FL) is a promising approach for such a learning task since the technique learns models from data without exposing privacy. However, traditional FL methods assume that the participating mobile devices are honest volunteers. This assumption makes traditional FL methods unsuitable for applications where two kinds of participants are engaged: 1) self-interested participants who, without economic incentives, are reluctant to contribute their computing resources unconditionally, and 2) malicious participants who send corrupt updates to disrupt the learning process. This paper proposes Refiner, a reliable federated learning system for tackling the challenges introduced by massive engagements of self-interested and malicious participants. Refiner is built upon Ethereum, a public blockchain platform. To engage self-interested participants, we introduce an incentive mechanism which rewards each participant in terms of the amount of its training data and the performance of its local updates. To handle malicious participants, we propose an audit scheme which employs a committee of randomly chosen validators to punish them with no reward and to exclude corrupt updates from the global model. The proposed incentive and audit scheme is implemented with cryptocurrency and smart contract, two primitives offered by Ethereum. 
This paper demonstrates the main features of Refiner by training a digit classification model on the MNIST dataset.从分散数据中学习模型的技术必须正确处理此类数据的两种性质,即隐私和大规模参与。联邦学习 (FL) 是此类学习任务的一种很有前途的方法,因为该技术从数据中学习模型而不暴露隐私。然而,传统的 FL 方法假设参与的移动设备是诚实的志愿者。这一假设使得传统的 FL 方法不适用于有两种参与者参与的应用程序:1)自利的参与者,在没有经济刺激的情况下,不愿意无条件地贡献他们的计算资源,以及 2)发送损坏更新以破坏网络的恶意参与者学习过程。本文提出了 Refiner,这是一种可靠的联合学习系统,用于应对自利和恶意参与者的大规模参与所带来的挑战。 Refiner 建立在公共区块链平台以太坊之上。为了吸引自利的参与者,我们引入了一种激励机制,根据每个参与者的训练数据量和本地更新的性能来奖励每个参与者。为了处理恶意参与者,我们提出了一个审计方案,该方案使用一个随机选择的验证者委员会来惩罚他们而没有奖励,并从全局模型中排除损坏的更新。提议的激励和审计计划是通过以太坊提供的两种原语加密货币和智能合约来实施的。本文通过在 MNIST 数据集上训练一个数字分类模型来展示 Refiner 的主要功能。 ↩
-
Tanium Reveal is a federated search engine deployed on large-scale enterprise networks that is capable of executing data queries across billions of private data files within 60 seconds. Data resides at the edge of networks, potentially distributed on hundreds of thousands of endpoints. The anatomy of the search engine consists of local inverse indexes on each endpoint and a global communication platform called Tanium for issuing search queries to all endpoints. Reveal enables asynchronous parsing and indexing on endpoints without noticeable impact to the endpoints' primary functionality. The engine harnesses the Tanium platform, which is based on a self-organizing, fault-tolerant, scalable, linear chain communication scheme. We demonstrate a multi-tier workflow for executing search queries across a network and for viewing matching snippets of text on any endpoint. We analyze metrics for federated indexing and searching in multiple environments including a production network with 1.05 billion searchable files distributed across 4236 endpoints. While primarily focusing on Boolean, phrase, and similarity query types, Reveal is compatible with further automation (e.g., semantic classification based on machine learning). Lastly, we discuss safeguards for sensitive information within Reveal including cryptographic hashing of private text and role-based access control (RBAC). Tanium Reveal 是一个部署在大型企业网络上的联邦搜索引擎,能够在 60 秒内对数十亿个私有数据文件执行数据查询。数据位于网络边缘,可能分布在数十万个端点上。搜索引擎的结构包括每个端点上的本地反向索引和一个名为 Tanium 的全球通信平台,用于向所有端点发出搜索查询。 Reveal 在端点上启用异步解析和索引,而不会对端点的主要功能产生明显影响。该引擎利用基于自组织、容错、可扩展、线性链通信方案的 Tanium 平台。我们演示了一个多层工作流程,用于在网络上执行搜索查询并在任何端点上查看匹配的文本片段。我们分析了多个环境中的联邦索引和搜索指标,包括分布在 4236 个端点上的 10.5 亿个可搜索文件的生产网络。虽然主要关注布尔、短语和相似性查询类型,但 Reveal 与进一步的自动化兼容(例如,基于机器学习的语义分类)。最后,我们讨论了 Reveal 中敏感信息的保护措施,包括私有文本的加密散列和基于角色的访问控制 (RBAC)。 ↩
-
Data science workflows are largely exploratory, dealing with under-specified objectives, open-ended problems, and unknown business value. Therefore, little investment is made in systematic acquisition, integration, and pre-processing of data. This lack of infrastructure results in redundant manual effort and computation. Furthermore, central data consolidation is not always technically or economically desirable or even feasible (e.g., due to privacy, and/or data ownership). The ExDRa system aims to provide system infrastructure for this exploratory data science process on federated and heterogeneous, raw data sources. Technical focus areas include (1) ad-hoc and federated data integration on raw data, (2) data organization and reuse of intermediates, and (3) optimization of the data science lifecycle, under awareness of partially accessible data. In this paper, we describe use cases, the overall system architecture, selected features of SystemDS' new federated backend (for federated linear algebra programs, federated parameter servers, and federated data preparation), as well as promising initial results. Beyond existing work on federated learning, ExDRa focuses on enterprise federated ML and related data pre-processing challenges. In this context, federated ML has the potential to create a more fine-grained spectrum of data ownership and thus, even new markets.数据科学工作流在很大程度上是探索性的,处理未指定的目标、开放式问题和未知的商业价值。因此,在数据的系统采集、集成和预处理方面投入很少。这种基础设施的缺乏导致了多余的人工和计算。此外,中央数据整合在技术上或经济上并不总是可取的,甚至不可行(例如,由于隐私和/或数据所有权)。 ExDRa 系统旨在为联邦和异构原始数据源的探索性数据科学过程提供系统基础设施。技术重点领域包括 (1) 原始数据的临时和联合数据集成,(2) 数据组织和中间体的重用,以及 (3) 在了解部分可访问数据的情况下优化数据科学生命周期。在本文中,我们描述了用例、整体系统架构、SystemDS 新的联合后端的选定特性(用于联合线性代数程序、联合参数服务器和联合数据准备),以及有希望的初步结果。除了现有的联合学习工作之外,ExDRa 还专注于企业联合 ML 和相关的数据预处理挑战。在这种情况下,联合 ML 有可能创造更细粒度的数据所有权范围,从而创造新的市场。 ↩
-
With the popularity of edge computing, numerous Internet of Things (IoT) applications have been developed and applied to various fields. However, for the harsh environment with network fluctuations and potential attacks, traditional task offloading decision-making schemes cannot meet the requirements of real-time and security. For this reason, we propose a novel task offloading decision framework to cope with the special requirements of the environment. This framework uses a task offloading decision model based on deep reinforcement learning algorithms, and is located on the user side to reduce the impact of network fluctuations. To improve the efficiency and security of the model in harsh edge computing environments, we adopt federated learning and introduce the blockchain into the process of parameter upload and decentralization of federated learning. In addition, we design a new blockchain consensus algorithm to reduce the waste of computing resources and improve the embedding and propagation speeds of the blockchain. Furthermore, we demonstrate the effect of task offloading of this model by performing offloading decisions on a simulation platform. 随着边缘计算的普及,已经开发出大量物联网(IoT)应用并应用于各个领域。然而,对于网络波动和潜在攻击的恶劣环境,传统的任务卸载决策方案无法满足实时性和安全性的要求。出于这个原因,我们提出了一种新颖的任务卸载决策框架来应对环境的特殊要求。该框架采用基于深度强化学习算法的任务卸载决策模型,定位于用户侧,减少网络波动的影响。为了提高模型在恶劣的边缘计算环境下的效率和安全性,我们采用联邦学习,并将区块链引入到联邦学习的参数上传和去中心化过程中。此外,我们设计了一种新的区块链共识算法,以减少计算资源的浪费,提高区块链的嵌入和传播速度。此外,我们通过在仿真平台上执行卸载决策来展示该模型的任务卸载效果。 ↩
-
This paper aims to integrate two synergetic technologies, federated learning (FL) and width-adjustable slimmable neural network (SNN) architectures. FL preserves data privacy by exchanging the locally trained models of mobile devices. By adopting SNNs as local models, FL can flexibly cope with the time-varying energy capacities of mobile devices. Combining FL and SNNs is however non-trivial, particularly under wireless connections with time-varying channel conditions. Furthermore, existing multi-width SNN training algorithms are sensitive to the data distributions across devices, so are ill-suited to FL. Motivated by this, we propose a communication and energy efficient SNN-based FL (named SlimFL) that jointly utilizes superposition coding (SC) for global model aggregation and superposition training (ST) for updating local models. By applying SC, SlimFL exchanges the superposition of multiple width configurations, of which as many as possible are decoded for a given communication throughput. Leveraging ST, SlimFL aligns the forward propagation of different width configurations, while avoiding the inter-width interference during back propagation. We formally prove the convergence of SlimFL. The result reveals that SlimFL is not only communication-efficient but also can counteract non-IID data distributions and poor channel conditions, which is also corroborated by simulations. 本文旨在整合两种协同技术,即联邦学习 (FL) 和宽度可调的可精简神经网络 (SNN) 架构。联邦学习通过交换本地训练的移动设备模型来保护数据隐私。通过采用 SNN 作为local模型,FL 可以灵活地应对移动设备随时间变化的能量容量。然而,结合 FL 和 SNN 并非易事,尤其是在具有时变信道条件的无线连接下。此外,现有的多宽度 SNN 训练算法对跨设备的数据分布敏感,因此不适合 FL。受此启发,我们提出了一种基于 SNN 的通信和节能的 FL(命名为 SlimFL),它联合利用叠加编码(SC)进行全局模型聚合和叠加训练(ST)来更新局部模型。通过应用 SC,SlimFL 交换多个宽度配置的叠加,这些配置为给定的通信吞吐量尽可能多地解码。利用 ST,SlimFL 对齐不同宽度配置的前向传播,同时避免反向传播期间的宽度间干扰。我们正式证明了 SlimFL 的收敛性。结果表明,SlimFL 不仅具有通信效率,而且可以抵消非 IID 数据分布和恶劣的信道条件,这也得到了仿真的证实。 ↩
-
Recent advances in federated learning (FL) have made it feasible to train a machine learning model across multiple clients, even with non-IID data distributions. In contrast to these uni-modal models that have been studied extensively in the literature, there are few in-depth studies on how multi-modal models can be trained effectively with federated learning. Unfortunately, we empirically observed a counter-intuitive phenomenon that, compared with its uni-modal counterpart, multi-modal FL leads to a significant degradation in performance. Our in-depth analysis of such a phenomenon shows that modality sub-networks and local models can overfit and generalize at different rates. To alleviate these inconsistencies in collaborative learning, we propose hierarchical gradient blending (HGB), which simultaneously computes the optimal blending of modalities and the optimal weighting of local models by adaptively measuring their overfitting and generalization behaviors. When HGB is applied, we present a few important theoretical insights and convergence guarantees for convex and smooth functions, and evaluate its performance in multi-modal FL. Our experimental results on an extensive array of non-IID multi-modal data have demonstrated that HGB is not only able to outperform the best uni-modal baselines but also to achieve superior accuracy and convergence speed as compared to state-of-the-art frameworks. 联邦学习(FL)的最新进展使得跨多个客户端训练机器学习模型成为可能,甚至可以使用非IID数据分布。与文献中广泛研究的这些单模态模型相比,关于如何利用联邦学习有效训练多模态模型的深入研究很少。不幸的是,我们根据经验观察到一个反直觉的现象,与单模态相比,多模态FL会导致性能的显著下降。我们对这种现象的深入分析表明,模态子网络和局部模型可以以不同的速度过拟合和泛化。为了缓解协作学习中的这些不一致性,我们提出了层次梯度混合(HGB),该方法通过自适应度量局部模型的过拟合和泛化行为,同时计算模型的最佳混合和局部模型的最佳权重。当HGB应用于凸函数和光滑函数时,我们提出了一些重要的理论见解和收敛保证,并评估了它在多模态FL中的性能。我们在大量非IID多模态数据上的实验结果表明,与最先进的框架相比,HGB不仅能够优于最好的单模态基线,而且能够获得更高的精度和收敛速度。 ↩
-
Federated Learning (FL) incurs high communication overhead, which can be greatly alleviated by compressing model updates. Yet the tradeoff between compression and model accuracy in the networked environment remains unclear and, for simplicity, most implementations adopt only a fixed compression rate. In this paper, we systematically examine this tradeoff for the first time, identifying the influence of the compression error on the final model accuracy with respect to the learning rate. Specifically, we factor the compression error of each global iteration into the convergence rate analysis under both strongly convex and non-convex loss functions. We then present an adaptation framework to maximize the final model accuracy by strategically adjusting the compression rate in each iteration. We discuss the key implementation issues of our framework in practical networks with representative compression algorithms. Experiments over the popular MNIST and CIFAR-10 datasets confirm that our solution effectively reduces network traffic yet maintains high model accuracy in FL. 联邦学习(FL)会产生很高的通信开销,这可以通过对模型更新进行压缩而大大减轻。在网络环境中,压缩和模型精度之间的权衡仍然不清楚,为了简单起见,大多数实现只采用固定的压缩率。在本文中,我们第一次系统地研究了这种权衡,确定了压缩误差对最终模型精度与学习率的影响。具体而言,我们在强凸和非凸损失函数下,将每次全局迭代的压缩误差考虑到收敛速度分析中。然后,我们提出了一个自适应框架,通过在每次迭代中有策略地调整压缩率来最大化最终的模型精度。我们用代表性的压缩算法讨论了该框架在实际网络中的关键实现问题。在流行的MNIST和CIFAR-10数据集上的实验证实,我们的解决方案有效地减少了网络流量,同时在FL中保持了较高的模型精度。 ↩
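
The tradeoff between compression rate and compression error can be seen in a minimal sketch. It assumes top-k sparsification as the compressor, which the abstract does not name. The `keep_ratio` parameter, the toy update, and the ratios tried are all hypothetical, and the paper's rule for choosing the rate per iteration is not reproduced here.

```python
def top_k_sparsify(update, keep_ratio):
    """Keep the largest-magnitude entries of `update` and zero out the rest."""
    k = max(1, round(keep_ratio * len(update)))
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    kept = set(order[:k])
    return [x if i in kept else 0.0 for i, x in enumerate(update)]

def compression_error(update, compressed):
    """Squared L2 distance between the original and compressed update."""
    return sum((a - b) ** 2 for a, b in zip(update, compressed))

# Toy model update; a smaller keep_ratio means less traffic but more error.
update = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
errors = {
    ratio: compression_error(update, top_k_sparsify(update, ratio))
    for ratio in (0.17, 0.5, 1.0)
}
```

An adaptive scheme in the spirit of the abstract would pick a different `keep_ratio` in each global iteration instead of fixing one value for the whole run.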
-
In Machine Learning, the emergence of the right to be forgotten gave birth to a paradigm named machine unlearning, which enables data holders to proactively erase their data from a trained model. Existing machine unlearning techniques focus on centralized training, where access to all holders’ training data is a must for the server to conduct the unlearning process. It remains largely underexplored about how to achieve unlearning when full access to all training data becomes unavailable. One noteworthy example is Federated Learning (FL), where each participating data holder trains locally, without sharing their training data to the central server. In this paper, we investigate the problem of machine unlearning in FL systems. We start with a formal definition of the unlearning problem in FL and propose a rapid retraining approach to fully erase data samples from a trained FL model. The resulting design allows data holders to jointly conduct the unlearning process efficiently while keeping their training data locally. Our formal convergence and complexity analysis demonstrate that our design can preserve model utility with high efficiency. Extensive evaluations on four real-world datasets illustrate the effectiveness and performance of our proposed realization. 在机器学习中,遗忘权的出现催生了一种称为机器遗忘的范式,它使数据持有者能够主动地从经过训练的模型中删除数据。现有的机器学习技术侧重于集中训练,服务器必须访问所有持有者的训练数据才能进行学习过程。在无法完全访问所有训练数据的情况下,如何实现忘却学习在很大程度上仍然没有得到充分探讨。一个值得注意的例子是联邦学习(FL),其中每个参与数据持有者在本地进行训练,而不将其训练数据共享给中央服务器。本文研究了FL系统中的机器学习问题。我们从FL中的学习问题的正式定义开始,并提出了一种快速再训练方法,从训练的FL模型中完全擦除数据样本。由此产生的设计允许数据持有者联合有效地进行学习过程,同时将他们的培训数据保存在本地。我们的形式收敛性和复杂性分析表明,我们的设计可以高效地保持模型效用。对四个真实数据集的广泛评估说明了我们提出的实现的有效性和性能。 ↩
-
Federated learning (FL) algorithms usually sample a fraction of clients in each round (partial participation) when the number of participants is large and the server’s communication bandwidth is limited. Recent works on the convergence analysis of FL have focused on unbiased client sampling, e.g., sampling uniformly at random, which suffers from slow wall-clock convergence due to high degrees of system heterogeneity and statistical heterogeneity. This paper aims to design an adaptive client sampling algorithm that tackles both system and statistical heterogeneity to minimize the wall-clock convergence time. We obtain a new tractable convergence bound for FL algorithms with arbitrary client sampling probabilities. Based on the bound, we analytically establish the relationship between the total learning time and sampling probabilities, which results in a non-convex optimization problem for training time minimization. We design an efficient algorithm for learning the unknown parameters in the convergence bound and develop a low-complexity algorithm to approximately solve the non-convex problem. Experimental results from both hardware prototype and simulation demonstrate that our proposed sampling scheme significantly reduces the convergence time compared to several baseline sampling schemes. Notably, our scheme in the hardware prototype spends 73% less time than the uniform sampling baseline to reach the same target loss. ↩
-
Federated learning (FL) is a useful tool in distributed machine learning that utilizes users’ local datasets in a privacy-preserving manner. When deploying FL in a constrained wireless environment, however, training models in a time-efficient manner can be a challenging task due to intermittent connectivity of devices, heterogeneous connection quality, and non-i.i.d. data. In this paper, we provide a novel convergence analysis of non-convex loss functions using FL on both i.i.d. and non-i.i.d. datasets with arbitrary device selection probabilities for each round. Then, using the derived convergence bound, we use stochastic optimization to develop a new client selection and power allocation algorithm that minimizes a function of the convergence bound and the average communication time under a transmit power constraint. We find an analytical solution to the minimization problem. One key feature of the algorithm is that knowledge of the channel statistics is not required and only the instantaneous channel state information needs to be known. Using the FEMNIST and CIFAR-10 datasets, we show through simulations that the communication time can be significantly decreased using our algorithm, compared to uniformly random participation. ↩
-
TBC ↩
-
Existing machine learning (ML) model marketplaces generally require data owners to share their raw data, leading to serious privacy concerns. Federated learning (FL) can partially alleviate this issue by enabling model training without raw data exchange. However, data owners are still susceptible to privacy leakage from gradient exposure in FL, which discourages their participation. In this work, we advocate a novel differentially private FL (DPFL)-based ML model marketplace. We focus on the broker-centric design. Specifically, the broker first incentivizes data owners to participate in model training via DPFL by offering privacy protection as per their privacy budgets and explicitly accounting for their privacy costs. Then, it conducts optimal model versioning and pricing to sell the obtained model versions to model buyers. In particular, we focus on the broker’s profit maximization, which is challenging due to the significant difficulties in the revenue characterization of model trading and the cost estimation of DPFL model training. We propose a two-layer optimization framework to address it, i.e., revenue maximization and cost minimization under model quality constraints. The latter is still challenging due to its non-convexity and integer constraints. We hence propose efficient algorithms whose performance is both theoretically guaranteed and empirically validated. ↩
-
Federated Learning (FL) is susceptible to gradient leakage attacks, as recent studies show the feasibility of obtaining private training data on clients from publicly shared gradients. Existing work addresses this problem by incorporating a series of privacy protection mechanisms, such as homomorphic encryption and local differential privacy, to prevent data leakage. However, these solutions incur either significant communication and computation costs or significant training accuracy loss. In this paper, we show that the sensitivity of gradient changes w.r.t. training data is an essential measure of information leakage risk. Based on this observation, we present a novel defense whose intuition is to perturb gradients to match the information leakage risk, so that the defense overhead is lightweight while privacy protection is adequate. Another key observation is that global correlations of gradients can compensate for this perturbation. Based on such compensation, training can achieve guaranteed accuracy. We conduct experiments on MNIST, Fashion-MNIST and CIFAR-10 for defending against two gradient leakage attacks. Without sacrificing accuracy, the results demonstrate that our lightweight defense can decrease the PSNR and SSIM between the reconstructed images and raw images by more than 60% for both attacks, compared with baseline defensive methods. ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
This paper carries out the first empirical study to characterize the impact of heterogeneity in FL, based on large-scale data from 136k smartphones that can faithfully reflect heterogeneity in real-world settings. It also builds a heterogeneity-aware FL platform that complies with the standard FL protocol while taking heterogeneity into account. ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
PAPAYA outlines a production asynchronous FL system design. Empirically, the authors demonstrate that asynchronous FL converges faster than synchronous FL when training across nearly one hundred million devices. In particular, in high-concurrency settings, asynchronous FL is 5x faster and has nearly 8x less communication overhead than synchronous FL. ↩
-
State-of-the-art secure aggregation protocols rely on secret sharing of the random seeds used for mask generation at the users to enable the reconstruction and cancellation of the masks belonging to dropped users. The complexity of such approaches, however, grows substantially with the number of dropped users. LightSecAgg overcomes this bottleneck by changing the design from "random-seed reconstruction of the dropped users" to "one-shot aggregate-mask reconstruction of the active users via mask encoding/decoding". ↩
-
Oort improves the performance of federated training and testing with guided participant selection. To improve time-to-accuracy performance in model training, Oort prioritizes clients that have both data offering the greatest utility in improving model accuracy and the capability to run training quickly. To enable FL developers to interpret their results in model testing, Oort enforces their requirements on the distribution of participant data while improving the duration of federated testing by cherry-picking clients. ↩
-
TBC ↩
-
The paper models fairness-guaranteed client selection as a Lyapunov optimization problem and proposes a C2MAB-based method to estimate the model exchange time between each client and the server. Based on this estimate, it designs a fairness-guaranteed algorithm, termed RBCS-F, to solve the problem. ↩
-
TBC ↩
-
TBC ↩
-
TBC ↩
-
TRUDA is a new cross-silo FL system that employs a trustworthy and decentralized aggregation architecture to break down the concentration of information in a single aggregator. Based on the unique computational properties of model-fusion algorithms, all exchanged model updates in TRUDA are disassembled at parameter granularity and re-stitched into random partitions designated for multiple TEE-protected aggregators. ↩
-
TBC ↩
-
TBC ↩
-
FedProx tackles heterogeneity in federated networks. FedProx can be viewed as a generalization and re-parameterization of FedAvg, the current state-of-the-art method for federated learning. While this re-parameterization makes only minor modifications to the method itself, these modifications have important ramifications both in theory and in practice. Theoretically, we provide convergence guarantees for our framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work (systems heterogeneity). Practically, we demonstrate that FedProx allows for more robust convergence than FedAvg across a suite of realistic federated datasets. ↩
-
We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and future directions. ↩
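
As a rough illustration of the FedProx note above, the re-parameterization adds a proximal term (mu / 2) * ||w - w_global||^2 to each client's local objective, and each client may run a different number of local steps. The sketch below is a minimal NumPy version under stated assumptions, not the authors' implementation: the mean-squared-error local loss, plain gradient descent as the local solver, sample-size-weighted averaging on the server, and the names `fedprox_local_update` and `fedprox_round` are all hypothetical choices made for illustration.

```python
import numpy as np


def fedprox_local_update(w_global, X, y, mu=0.1, lr=0.01, steps=50):
    """Approximately minimize F_k(w) + (mu / 2) * ||w - w_global||^2.

    F_k is a mean-squared-error loss here (an assumption for illustration).
    """
    w = w_global.copy()
    n = len(y)
    for _ in range(steps):
        grad_loss = X.T @ (X @ w - y) / n   # gradient of 0.5 * mean squared error
        grad_prox = mu * (w - w_global)     # gradient of the proximal term
        w -= lr * (grad_loss + grad_prox)
    return w


def fedprox_round(w_global, clients, mu=0.1, lr=0.01):
    """Run one round over clients given as (X, y, local_steps) tuples.

    Each client may use a different number of local steps (systems
    heterogeneity). The server averages the local models weighted by
    local sample count, which is an assumption of this sketch.
    """
    updates, sizes = [], []
    for X, y, steps in clients:
        updates.append(fedprox_local_update(w_global, X, y, mu, lr, steps))
        sizes.append(len(y))
    weights = np.array(sizes, dtype=float) / sum(sizes)
    return sum(wt * u for wt, u in zip(weights, updates))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0, 0.5])
    clients = []
    for n, steps in [(40, 5), (60, 50)]:    # unequal data sizes and local work
        X = rng.normal(size=(n, 3))
        clients.append((X, X @ w_true, steps))
    w = np.zeros(3)
    for _ in range(20):
        w = fedprox_round(w, clients, mu=0.1, lr=0.05)
    print(w)
```

Setting `mu=0` removes the proximal pull, so local models are free to drift toward their own optimum; a larger `mu` keeps each local model closer to the current global model.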