This repo is motivated by awesome tensor compilers.
- Paper-Code
- Researcher
- LLM Serving Framework
- LLM Evaluation Platform
- LLM Inference (System Side)
- RLHF
- DiT
- LLM Inference (AI Side)
- LLM MoE
- LoRA
- Framework
- Parallelism Training
- Training
- Communication
- Serving-Inference
- MoE
- GPU Cluster Management
- Schedule and Resource Management
- Optimization
- GNN
- Fine-Tune
- Energy
- Misc
- Contribute
Title | Github |
---|---|
MLC LLM | |
TensorRT-LLM | |
xFasterTransformer | |
CTranslate2(low latency) | |
llama2.c | |
Title | Github | Website |
---|---|---|
FastChat | | |
Title | Paper | Github | Website | Pub. & Date |
---|---|---|---|---|
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism | - | Nov. 2024 | ||
FastVideo | - | Dec. 2024 |
-
code Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning OSDI'22
paper Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning OSDI'22
-
code Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization OSDI'22
-
code Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM SC21
paper Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM SC21
-
code Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training ICPP'23
paper Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training ICPP'23
-
code HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework VLDB'22
paper HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework VLDB'22
-
code DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines Eurosys'24
paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines Eurosys'24
-
Aceso: Efficient Parallel DNN Training through Iterative Bottleneck Alleviation Eurosys'24
-
code HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis Eurosys'24
paper HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis Eurosys'24
-
code Calculon: A Methodology and Tool for High-Level Co-Design of Systems and Large Language Models SC'23
paper Calculon: A Methodology and Tool for High-Level Co-Design of Systems and Large Language Models SC'23
-
code Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs NSDI'23
paper Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs NSDI'23
-
code AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness NeurIPS'22
paper AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness NeurIPS'22
-
code NASPipe: High Performance and Reproducible Pipeline Parallel Supernet Training via Causal Synchronous Parallelism ASPLOS'22
-
code Varuna: Scalable, Low-cost Training of Massive Deep Learning Models Eurosys'22
paper Varuna: Scalable, Low-cost Training of Massive Deep Learning Models Eurosys'22
-
code Chimera: efficiently training large-scale neural networks with bidirectional pipelines SC'21
paper Chimera: efficiently training large-scale neural networks with bidirectional pipelines SC'21
-
code Piper: Multidimensional Planner for DNN Parallelization NeurIPS'21
paper Piper: Multidimensional Planner for DNN Parallelization NeurIPS'21
-
code PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models ICML'21
paper PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models ICML'21
-
code DAPPLE: An Efficient Pipelined Data Parallel Approach for Large Models Training PPOPP'21
paper DAPPLE: An Efficient Pipelined Data Parallel Approach for Large Models Training PPOPP'21
-
code TeraPipe: Large-Scale Language Modeling with Pipeline Parallelism ICML'21
paper TeraPipe: Large-Scale Language Modeling with Pipeline Parallelism ICML'21
-
code PipeDream: Pipeline Parallelism for DNN Training SOSP'19
paper PipeDream: Pipeline Parallelism for DNN Training SOSP'19
-
code SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
paper SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
-
code ModelKeeper: Accelerating DNN Training via Automated Training Warmup NSDI'23
paper ModelKeeper: Accelerating DNN Training via Automated Training Warmup NSDI'23
-
code STRONGHOLD: Fast and Affordable Billion-scale Deep Learning Model Training SC'22
paper STRONGHOLD: Fast and Affordable Billion-scale Deep Learning Model Training SC'22
-
code Whale: Efficient Giant Model Training over Heterogeneous GPUs ATC'22
paper Whale: Efficient Giant Model Training over Heterogeneous GPUs ATC'22
-
code GeePS: Scalable Deep Learning on Distributed GPUs with a GPU-Specialized Parameter Server Eurosys'16
paper GeePS: Scalable Deep Learning on Distributed GPUs with a GPU-Specialized Parameter Server Eurosys'16
-
code ARK: GPU-driven Code Execution for Distributed Deep Learning NSDI'23
paper ARK: GPU-driven Code Execution for Distributed Deep Learning NSDI'23
-
code TopoOpt: Optimizing the Network Topology for Distributed DNN Training NSDI'23
paper TopoOpt: Optimizing the Network Topology for Distributed DNN Training NSDI'23
-
code Paella: Low-latency Model Serving with Virtualized GPU Scheduling SOSP'23
paper Paella: Low-latency Model Serving with Virtualized GPU Scheduling SOSP'23
-
code AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving OSDI'23
paper AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving OSDI'23
-
code Optimizing Dynamic Neural Networks with Brainstorm OSDI'23
paper Optimizing Dynamic Neural Networks with Brainstorm OSDI'23
-
code Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access Eurosys'23
paper Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access Eurosys'23
-
code Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs ASPLOS'23
paper Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs ASPLOS'23
-
code MPCFormer: fast, performant, and private transformer inference with MPC ICLR'23
paper MPCFormer: fast, performant, and private transformer inference with MPC ICLR'23
-
code High-throughput Generative Inference of Large Language Models with a Single GPU ICML'23
paper High-throughput Generative Inference of Large Language Models with a Single GPU ICML'23
-
code Cocktail: A Multidimensional Optimization for Model Serving in Cloud NSDI'22
paper Cocktail: A Multidimensional Optimization for Model Serving in Cloud NSDI'22
-
code Serving DNNs like Clockwork: Performance Predictability from the Bottom Up OSDI'20
paper Serving DNNs like Clockwork: Performance Predictability from the Bottom Up OSDI'20
-
code Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving ATC'19
paper Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving ATC'19
-
code Nexus: a GPU cluster engine for accelerating DNN-based video analysis SOSP'19
paper Nexus: a GPU cluster engine for accelerating DNN-based video analysis SOSP'19
-
code Clipper: A low-latency prediction-serving system NSDI'17
paper Clipper: A low-latency prediction-serving system NSDI'17
-
code MegaBlocks: Efficient Sparse Training with Mixture-of-Experts MLSYS'23
paper MegaBlocks: Efficient Sparse Training with Mixture-of-Experts MLSYS'23
-
code FastMoE: A Fast Mixture-of-Expert Training System PPOPP'22
paper FastMoE: A Fast Mixture-of-Expert Training System PPOPP'22
-
code AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers ICLR'23
paper AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers ICLR'23
-
code Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning NSDI'23
paper Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning NSDI'23
-
code Synergy: Looking Beyond GPUs for DNN Scheduling on Multi-Tenant Clusters OSDI'22
paper Synergy: Looking Beyond GPUs for DNN Scheduling on Multi-Tenant Clusters OSDI'22
-
code Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning OSDI'21
paper Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning OSDI'21
-
code Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads OSDI'20
paper Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads OSDI'20
-
code Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs SOCC'21
paper Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs SOCC'21
-
code An interference-aware scheduler for fine-grained GPU sharing Resources Eurosys'24
paper An interference-aware scheduler for fine-grained GPU sharing Resources Eurosys'24
-
code ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning ASPLOS'23
paper ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning ASPLOS'23
-
code Multi-Resource Interleaving for Deep Learning Training SIGCOMM'22
paper Multi-Resource Interleaving for Deep Learning Training SIGCOMM'22
-
code Out-of-order backprop: an effective scheduling technique for deep learning Eurosys'22
paper Out-of-order backprop: an effective scheduling technique for deep learning Eurosys'22
-
code KungFu: Making Training in Distributed Machine Learning Adaptive OSDI'20
paper KungFu: Making Training in Distributed Machine Learning Adaptive OSDI'20
-
code PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications OSDI'20
paper PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications OSDI'20
-
code GLake: optimizing GPU memory management and IO transmission ASPLOS'24
-
code Spada: Accelerating Sparse Matrix Multiplication with Adaptive Dataflow ASPLOS'23
paper Spada: Accelerating Sparse Matrix Multiplication with Adaptive Dataflow ASPLOS'23
-
code MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant GPU Clusters SOCC'22
paper MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant GPU Clusters SOCC'22
-
code Accpar: Tensor partitioning for heterogeneous deep learning accelerators HPCA'20
paper Accpar: Tensor partitioning for heterogeneous deep learning accelerators HPCA'20
-
code Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs ASPLOS'23
paper Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs ASPLOS'23
-
code CheckFreq: Frequent, Fine-Grained DNN Checkpointing FAST'22
paper CheckFreq: Frequent, Fine-Grained DNN Checkpointing FAST'22
-
code Efficient Quantized Sparse Matrix Operations on Tensor Cores SC'22
paper Efficient Quantized Sparse Matrix Operations on Tensor Cores SC'22
-
code PetS: A Unified Framework for Parameter-Efficient Transformers Serving ATC'22
paper PetS: A Unified Framework for Parameter-Efficient Transformers Serving ATC'22
-
code APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Core SC'21
paper APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Core SC'21
-
code Fluid: Resource-Aware Hyperparameter Tuning Engine MLSYS'21
paper Fluid: Resource-Aware Hyperparameter Tuning Engine MLSYS'21
-
code Baechi: Fast Device Placement on Machine Learning Graphs SOCC'20
paper Baechi: Fast Device Placement on Machine Learning Graphs SOCC'20
-
code Dynamic Parameter Allocation in Parameter Servers VLDB'20
paper Dynamic Parameter Allocation in Parameter Servers VLDB'20
-
code Data Movement Is All You Need: A Case Study on Optimizing Transformers
paper Data Movement Is All You Need: A Case Study on Optimizing Transformers
-
code gSampler: Efficient GPU-Based Graph Sampling for Graph Learning SOSP'23
paper gSampler: Efficient GPU-Based Graph Sampling for Graph Learning SOSP'23
-
code Legion: Automatically Pushing the Envelope of Multi-GPU System for Billion-Scale GNN Training ATC'23
paper Legion: Automatically Pushing the Envelope of Multi-GPU System for Billion-Scale GNN Training ATC'23
-
code TC-GNN: Accelerating Sparse Graph Neural Network Computation Via Dense Tensor Core on GPUs ATC'23
paper TC-GNN: Accelerating Sparse Graph Neural Network Computation Via Dense Tensor Core on GPUs ATC'23
-
code CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs SC'22
paper CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs SC'22
-
code GNNAdvisor: An Efficient Runtime System for GNN Acceleration on GPUs OSDI'21
paper GNNAdvisor: An Efficient Runtime System for GNN Acceleration on GPUs OSDI'21
-
code Marius: Learning Massive Graph Embeddings on a Single Machine OSDI'21
paper Marius: Learning Massive Graph Embeddings on a Single Machine OSDI'21
-
code Accelerating Large Scale Real-Time GNN Inference Using Channel Pruning VLDB'21
paper Accelerating Large Scale Real-Time GNN Inference Using Channel Pruning VLDB'21
-
code Reducing Communication in Graph Neural Network Training SC'20
paper Reducing Communication in Graph Neural Network Training SC'20
-
code Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training NSDI'23
paper Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training NSDI'23
-
code EnvPipe: Performance-preserving DNN Training Framework for Saving Energy ATC'23
paper EnvPipe: Performance-preserving DNN Training Framework for Saving Energy ATC'23
-
code Characterizing Variability in Large-Scale, Accelerator-Rich Systems SC'22
paper Characterizing Variability in Large-Scale, Accelerator-Rich Systems SC'22
-
code Prediction of the Resource Consumption of Distributed Deep Learning Systems SIGMETRICS'22
paper Prediction of the Resource Consumption of Distributed Deep Learning Systems SIGMETRICS'22
We encourage all contributions to this repository. Open an issue or send a pull request.