RegressionEvaluator is an Evaluator of regression models, e.g. ALS, DecisionTreeRegressor, GBTRegressor, RandomForestRegressor, LinearRegression, and GeneralizedLinearRegression.
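RegressionEvaluator only needs a dataset with prediction and label columns, no matter which model produced them. A minimal sketch (assuming a spark-shell session, where toDF is available; the scored name and the values are made up for illustration):

import org.apache.spark.ml.evaluation.RegressionEvaluator

// Hand-crafted predictions; no model required
val scored = Seq(
  (1.0, 1.1),
  (2.0, 1.9),
  (3.0, 3.2)).toDF("label", "prediction")

// rmse is the default metric
val rmse = new RegressionEvaluator().evaluate(scored)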
Metric | Description | isLargerBetter |
---|---|---|
rmse | Root mean squared error | false |
mse | Mean squared error | false |
r2 | R² (coefficient of determination) | true |
mae | Mean absolute error | false |
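You can confirm the isLargerBetter column yourself by flipping through the metrics in the shell. A quick sketch (setMetricName returns the evaluator itself, so the calls chain; eval is just an illustrative name):

import org.apache.spark.ml.evaluation.RegressionEvaluator

val eval = new RegressionEvaluator()
// Only r2 should report true
Seq("rmse", "mse", "r2", "mae").foreach { metric =>
  println(s"$metric => isLargerBetter: ${eval.setMetricName(metric).isLargerBetter}")
}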
import org.apache.spark.ml.evaluation.RegressionEvaluator

val regEval = new RegressionEvaluator()
  .setMetricName("r2")
  .setPredictionCol("prediction")
  .setLabelCol("label")
scala> regEval.isLargerBetter
res0: Boolean = true
scala> println(regEval.explainParams)
labelCol: label column name (default: label, current: label)
metricName: metric name in evaluation (mse|rmse|r2|mae) (default: rmse, current: r2)
predictionCol: prediction column name (default: prediction, current: prediction)
Parameter | Default Value | Description |
---|---|---|
metricName | rmse | Name of the regression metric for evaluation. Can be one of the following: rmse, mse, r2, mae |
predictionCol | prediction | Name of the column with predictions |
labelCol | label | Name of the column with labels |
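The defaults in the table can be read back through the param getters. A quick sanity check (the defaults name is illustrative):

import org.apache.spark.ml.evaluation.RegressionEvaluator

val defaults = new RegressionEvaluator()
assert(defaults.getMetricName == "rmse")
assert(defaults.getPredictionCol == "prediction")
assert(defaults.getLabelCol == "label")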
// prepare a fake input dataset using transformers
import org.apache.spark.ml.feature.Tokenizer
val tok = new Tokenizer().setInputCol("text")
import org.apache.spark.ml.feature.HashingTF
val hashTF = new HashingTF()
.setInputCol(tok.getOutputCol) // it reads the output of tok
.setOutputCol("features")
// A Scala trick to chain transform methods together
// It's of little use since we've got Pipelines;
// shown here merely as an alternative
val transform = (tok.transform _).andThen(hashTF.transform _)
val dataset = Seq((0, "hello world", 0.0)).toDF("id", "text", "label")
// We use the LinearRegression algorithm
import org.apache.spark.ml.regression.LinearRegression
val lr = new LinearRegression
import org.apache.spark.ml.Pipeline
val pipeline = new Pipeline().setStages(Array(tok, hashTF, lr))
val model = pipeline.fit(dataset)
// Let's make predictions
// Note that we use the same dataset the model was fit on,
// something you'd definitely not do in production
val predictions = model.transform(dataset)
// Now we're ready to evaluate the model
// Evaluator works on datasets with predictions
import org.apache.spark.ml.evaluation.RegressionEvaluator
val regEval = new RegressionEvaluator
scala> regEval.evaluate(predictions)
res0: Double = 0.0
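Evaluators really shine when they drive model selection. Here is a sketch of plugging regEval into CrossValidator, reusing pipeline and lr from above; note that the one-row dataset is far too small for real cross-validation, so the fit call is left commented out:

import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.0, 0.1))
  .build()

val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(regEval) // rmse by default, so smaller is better
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(2)

// With a real dataset you'd now run:
// val cvModel = cv.fit(trainingDataset)

CrossValidator consults the evaluator's isLargerBetter to decide which direction counts as an improvement when picking the best model.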