Angel 1.0.0
Release v1.0.0
ParameterServer Functionalities
- Automatically partitions and manages models represented as matrices/vectors, supporting both sparse and dense formats
- Supports push/pull operations on the model, as well as customized psFunc
- Provides multiple synchronization mechanisms (BSP/SSP/ASP)
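The push/pull and psFunc ideas above can be sketched in a few lines. This is a minimal, self-contained illustration of the semantics, not Angel's real API; the class and method names (`ToyParamServer`, `pull`, `push`, `apply`) are all hypothetical:

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Toy sketch of parameter-server semantics (NOT Angel's API): the server
// holds a dense model vector; workers pull a snapshot, compute updates
// locally, and push additive deltas back. A psFunc-style hook runs a
// user-defined function on the server side, avoiding a full round trip.
public class ToyParamServer {
    private final double[] model;

    public ToyParamServer(int dim) { this.model = new double[dim]; }

    // Pull: return a snapshot of the current model.
    public synchronized double[] pull() {
        return Arrays.copyOf(model, model.length);
    }

    // Push: apply an additive delta (the usual SGD-style update).
    public synchronized void push(double[] delta) {
        for (int i = 0; i < model.length; i++) model[i] += delta[i];
    }

    // psFunc-style hook: run a custom element-wise transform on the
    // server instead of pulling the whole model to the worker.
    public synchronized void apply(DoubleUnaryOperator f) {
        for (int i = 0; i < model.length; i++) {
            model[i] = f.applyAsDouble(model[i]);
        }
    }
}
```

A custom psFunc here is just a function shipped to the server, e.g. `ps.apply(x -> x * 0.99)` for weight decay, which is the round-trip-saving pattern the real psFunc mechanism generalizes.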
Development & Execution
- Language support: the system is developed in Scala and Java; users can develop in either language
- Deployment: runs directly on community Yarn, and also supports a local mode for debugging
- Data partitioning: automatically partitions and reads training data, compatible with Hadoop FS interface by default
- Incremental training: automatically checkpoints during training, and supports resuming incremental training from a reloaded model
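The checkpoint/reload cycle behind incremental training can be sketched as follows. This uses plain Java serialization as a stand-in, not Angel's actual checkpoint format, and all names (`CheckpointDemo`, `trainEpoch`, `roundTrip`) are hypothetical:

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of what incremental training requires: persist the model
// mid-training, reload it later, and continue updating from where
// training left off (illustrative only, not Angel's implementation).
public class CheckpointDemo {
    // Save the model vector with plain Java serialization.
    static void save(double[] model, Path p) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(p))) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reload a previously saved model.
    static double[] load(Path p) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(p))) {
            return (double[]) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Save to a temp file and load back: one checkpoint round trip.
    static double[] roundTrip(double[] model) {
        try {
            Path p = Files.createTempFile("ckpt", ".bin");
            save(model, p);
            return load(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One fake training "epoch": nudge every weight halfway toward 1.0.
    static void trainEpoch(double[] model) {
        for (int i = 0; i < model.length; i++) model[i] += 0.5 * (1.0 - model[i]);
    }
}
```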
PS Service
- Only starts up PSServer and PSAgent, providing PS service for other distributed computing platforms
- Spark-on-Angel algorithms can be developed on top of PS Service without modifying Spark's core code; this mode supports the Breeze numerical library seamlessly
Algorithms Library
- Includes machine learning algorithms such as Logistic Regression, SVM, KMeans, LDA, MF, and GBDT
- Supports multiple optimization methods, including ADMM, OWLQN, LBFGS, and GD
- Supports various loss functions and evaluation metrics, as well as L1/L2 regularization
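One algorithm/optimizer pairing from the lists above can be made concrete: logistic regression trained by batch gradient descent with log loss and L2 regularization. This is a single-machine toy sketch, not Angel's distributed implementation, and the class name `LogRegGD` is hypothetical:

```java
// Batch gradient descent for logistic regression with L2 regularization.
// Objective: (1/n) * sum_i logloss(sigmoid(w.x_i), y_i) + (l2/2) * ||w||^2
public class LogRegGD {
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Regularized average log loss over the batch.
    static double logLoss(double[][] x, double[] y, double[] w, double l2) {
        double loss = 0.0;
        for (int i = 0; i < x.length; i++) {
            double p = sigmoid(dot(w, x[i]));
            loss -= y[i] * Math.log(p) + (1 - y[i]) * Math.log(1 - p);
        }
        double reg = 0.0;
        for (double wi : w) reg += wi * wi;
        return loss / x.length + 0.5 * l2 * reg;
    }

    // One full-batch GD step; returns the updated weight vector.
    static double[] step(double[][] x, double[] y, double[] w, double lr, double l2) {
        double[] g = new double[w.length];
        for (int i = 0; i < x.length; i++) {
            double err = sigmoid(dot(w, x[i])) - y[i]; // gradient of log loss
            for (int j = 0; j < w.length; j++) g[j] += err * x[i][j];
        }
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            out[j] = w[j] - lr * (g[j] / x.length + l2 * w[j]);
        }
        return out;
    }
}
```

In the PS setting, the same gradient would be computed on each worker's data shard and pushed to the server, which is what makes this family of algorithms fit the push/pull model.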
Algorithm Optimization
- LDA: the F+LDA algorithm accelerates sampling, and parameters are fetched in a streaming fashion to reduce network latency
- GBDT: a two-stage tree-splitting algorithm offloads part of the computation to the PS, reducing network traffic and improving speed
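The two-stage split idea can be sketched as follows: in stage one, each worker builds local gradient histograms over feature bins from its data shard; in stage two, the PS merges those histograms and the best split is chosen from the merged result, so per-sample gradients never cross the network. This is an illustrative simplification (unit hessians, one feature), not Angel's implementation, and the names `TwoStageSplit`, `merge`, `bestSplit` are hypothetical:

```java
// Histogram-based split finding split across worker and PS roles.
public class TwoStageSplit {
    // Stage 2 (PS side): merge per-worker histograms element-wise.
    static double[] merge(double[][] localHists) {
        double[] global = new double[localHists[0].length];
        for (double[] h : localHists) {
            for (int b = 0; b < h.length; b++) global[b] += h[b];
        }
        return global;
    }

    // Pick the bin boundary maximizing the standard split gain
    // G_L^2/H_L + G_R^2/H_R - G^2/H, here with unit hessians so the
    // hessian histogram is just a sample-count histogram.
    static int bestSplit(double[] gradHist, double[] cntHist) {
        double g = 0, c = 0;
        for (int b = 0; b < gradHist.length; b++) {
            g += gradHist[b];
            c += cntHist[b];
        }
        double base = g * g / c;
        double bestGain = 0, gl = 0, cl = 0;
        int bestBin = -1;
        for (int b = 0; b < gradHist.length - 1; b++) {
            gl += gradHist[b];
            cl += cntHist[b];
            double gr = g - gl, cr = c - cl;
            if (cl == 0 || cr == 0) continue;
            double gain = gl * gl / cl + gr * gr / cr - base;
            if (gain > bestGain) { bestGain = gain; bestBin = b; }
        }
        return bestBin; // split point is after this bin; -1 if no gain
    }
}
```

Only the fixed-size histograms travel from workers to the PS, which is where the reduction in network traffic comes from.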