This doc focuses on the GPU-related Scala APIs. Fortunately, only one new API, setFeaturesCols, is introduced to support training on GPU.
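XGBoost-Spark 3.0 provides the following four classes to support machine learning on Spark:
- XGBoostClassifier
- XGBoostClassificationModel
- XGBoostRegressor
- XGBoostRegressionModel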
XGBoostClassifier: the full name is ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier. It extends ProbabilisticClassifier[Vector, XGBoostClassifier, XGBoostClassificationModel].
Constructor:
- XGBoostClassifier(xgboostParams: Map[String, Any])
  - all standard XGBoost parameters are supported
  - eval_sets: Map[String, DataFrame] sets the named evaluation dataset(s) for training
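A minimal sketch of the constructor's parameter map, written as spark-shell (or Scala script) statements and assuming the GPU-enabled XGBoost-Spark jar is on the classpath. The column names f1, f2 and label and the tiny evaluation DataFrame are hypothetical. "tree_method" -> "gpu_hist" is a standard XGBoost parameter that assumes every worker has a GPU, and "num_round" and "num_workers" are the usual XGBoost-Spark training parameters; the values are illustrative only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

val spark = SparkSession.builder().appName("xgboost-gpu-classifier-params").getOrCreate()
import spark.implicits._

// Hypothetical evaluation data: two numeric feature columns and a 0/1 label.
val evalDf: DataFrame = Seq((1.5, 1.5, 0.0), (8.5, 8.5, 1.0)).toDF("f1", "f2", "label")

// Standard XGBoost parameters and eval_sets share one Map[String, Any].
val xgbParams: Map[String, Any] = Map(
  "objective"   -> "binary:logistic",
  "tree_method" -> "gpu_hist",           // assumption: every worker has a GPU
  "num_round"   -> 10,
  "num_workers" -> 1,
  "eval_sets"   -> Map("eval" -> evalDf) // name -> evaluation DataFrame
)

val classifier = new XGBoostClassifier(xgbParams)
```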
Methods (only GPU-related methods are listed below):
- setFeaturesCols(value: Seq[String]): XGBoostClassifier. This method sets the feature columns for training.
  - value: a sequence of feature column names to set
  - returns the classifier itself
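A sketch of setFeaturesCols in a training call, written as spark-shell statements and assuming the GPU-enabled XGBoost-Spark jar plus one GPU per worker. It assumes the features stay as separate numeric columns (the hypothetical f1 and f2) that are named directly rather than assembled into a single Vector column; the data and parameter values are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import ml.dmlc.xgboost4j.scala.spark.{XGBoostClassificationModel, XGBoostClassifier}

val spark = SparkSession.builder().appName("xgboost-gpu-features-cols").getOrCreate()
import spark.implicits._

// Hypothetical training data: one numeric column per feature plus a 0/1 label.
val trainDf = Seq(
  (1.0, 2.0, 0.0), (2.0, 1.0, 0.0), (8.0, 9.0, 1.0), (9.0, 8.0, 1.0)
).toDF("f1", "f2", "label")

val xgbParams: Map[String, Any] = Map(
  "objective"   -> "binary:logistic",
  "tree_method" -> "gpu_hist", // assumption: every worker has a GPU
  "num_round"   -> 10,
  "num_workers" -> 1
)

// setFeaturesCols takes the feature column names and returns the classifier,
// so it chains with the inherited setLabelCol.
val featureNames: Seq[String] = Seq("f1", "f2")
val classifier = new XGBoostClassifier(xgbParams).setLabelCol("label").setFeaturesCols(featureNames)

val model: XGBoostClassificationModel = classifier.fit(trainDf)
```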
XGBoostClassificationModel: the full name is ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel. It extends ProbabilisticClassificationModel[Vector, XGBoostClassificationModel].
It has no GPU-specific methods; use it as a normal Spark model.
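Since the classification model behaves like a regular Spark ML model, the usual persistence and transform calls should apply. A minimal sketch in spark-shell style, assuming the GPU-enabled XGBoost-Spark jar, one GPU per worker, hypothetical columns f1, f2 and label, and a hypothetical local save path.

```scala
import org.apache.spark.sql.SparkSession
import ml.dmlc.xgboost4j.scala.spark.{XGBoostClassificationModel, XGBoostClassifier}

val spark = SparkSession.builder().appName("xgboost-classification-model").getOrCreate()
import spark.implicits._

// Hypothetical training and test data with numeric feature columns.
val trainDf = Seq(
  (1.0, 2.0, 0.0), (2.0, 1.0, 0.0), (8.0, 9.0, 1.0), (9.0, 8.0, 1.0)
).toDF("f1", "f2", "label")
val testDf = Seq((1.2, 1.8), (8.7, 9.1)).toDF("f1", "f2")

val params: Map[String, Any] = Map(
  "objective" -> "binary:logistic", "tree_method" -> "gpu_hist", "num_round" -> 10, "num_workers" -> 1)
val model: XGBoostClassificationModel =
  new XGBoostClassifier(params).setLabelCol("label").setFeaturesCols(Seq("f1", "f2")).fit(trainDf)

// Standard Spark ML persistence: write the model, then read it back.
val modelPath = "/tmp/xgb-classification-model" // hypothetical path
model.write.overwrite().save(modelPath)
val reloaded = XGBoostClassificationModel.load(modelPath)

// transform appends the usual rawPrediction, probability and prediction columns.
val predictions = reloaded.transform(testDf)
predictions.select("f1", "f2", "probability", "prediction").show()
```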
XGBoostRegressor: the full name is ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor. It extends Predictor[Vector, XGBoostRegressor, XGBoostRegressionModel].
Constructor:
- XGBoostRegressor(xgboostParams: Map[String, Any])
  - all standard XGBoost parameters are supported
  - eval_sets: Map[String, DataFrame] sets the named evaluation dataset(s) for training
Methods (only GPU-related methods are listed below):
- setFeaturesCols(value: Seq[String]): XGBoostRegressor. This method sets the feature columns for training.
  - value: a sequence of feature column names to set
  - returns the regressor itself
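For the regressor, a minimal sketch combining the constructor (including eval_sets) with setFeaturesCols, written as spark-shell statements. It assumes the GPU-enabled XGBoost-Spark jar, one GPU per worker for "gpu_hist", and hypothetical columns f1, f2 and a continuous label; the data and parameter values are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import ml.dmlc.xgboost4j.scala.spark.{XGBoostRegressionModel, XGBoostRegressor}

val spark = SparkSession.builder().appName("xgboost-gpu-regressor").getOrCreate()
import spark.implicits._

// Hypothetical data: numeric feature columns f1, f2 and a continuous label.
val trainDf = Seq(
  (1.0, 2.0, 3.1), (2.0, 1.0, 2.9), (8.0, 9.0, 17.2), (9.0, 8.0, 16.8)
).toDF("f1", "f2", "label")
val evalDf = Seq((1.5, 1.5, 3.0), (8.5, 8.5, 17.0)).toDF("f1", "f2", "label")

val xgbParams: Map[String, Any] = Map(
  "objective"   -> "reg:squarederror",
  "tree_method" -> "gpu_hist",           // assumption: every worker has a GPU
  "num_round"   -> 20,
  "num_workers" -> 1,
  "eval_sets"   -> Map("eval" -> evalDf) // named evaluation dataset
)

// setFeaturesCols returns the regressor, so the setter calls chain.
val regressor = new XGBoostRegressor(xgbParams).setLabelCol("label").setFeaturesCols(Seq("f1", "f2"))
val model: XGBoostRegressionModel = regressor.fit(trainDf)
```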
XGBoostRegressionModel: the full name is ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel. It extends PredictionModel[Vector, XGBoostRegressionModel].
It has no GPU-specific methods; use it as a normal Spark model.
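As with any Spark regression model, the predictions can be scored with Spark ML's RegressionEvaluator. A minimal sketch in spark-shell style, assuming the GPU-enabled XGBoost-Spark jar, one GPU per worker, and hypothetical columns f1, f2 and label; the printed RMSE reflects only the toy data and is not a benchmark.

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.sql.SparkSession
import ml.dmlc.xgboost4j.scala.spark.{XGBoostRegressionModel, XGBoostRegressor}

val spark = SparkSession.builder().appName("xgboost-regression-model").getOrCreate()
import spark.implicits._

// Hypothetical training and test data with numeric feature columns.
val trainDf = Seq(
  (1.0, 2.0, 3.1), (2.0, 1.0, 2.9), (8.0, 9.0, 17.2), (9.0, 8.0, 16.8)
).toDF("f1", "f2", "label")
val testDf = Seq((1.5, 1.5, 3.0), (8.5, 8.5, 17.0)).toDF("f1", "f2", "label")

val params: Map[String, Any] = Map(
  "objective" -> "reg:squarederror", "tree_method" -> "gpu_hist", "num_round" -> 20, "num_workers" -> 1)
val model: XGBoostRegressionModel =
  new XGBoostRegressor(params).setLabelCol("label").setFeaturesCols(Seq("f1", "f2")).fit(trainDf)

// transform appends a "prediction" column, like any Spark PredictionModel.
val predictions = model.transform(testDf)

// Score the predictions with the standard Spark ML evaluator.
val evaluator = new RegressionEvaluator().setLabelCol("label").setPredictionCol("prediction").setMetricName("rmse")
val rmse = evaluator.evaluate(predictions)
println(s"RMSE on the toy test set: $rmse")
```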