RegressionEvaluator — Evaluator of Regression Models

RegressionEvaluator is an Evaluator of regression models (e.g. ALS, DecisionTreeRegressor, GBTRegressor, RandomForestRegressor, LinearRegression, GeneralizedLinearRegression).

Table 1. RegressionEvaluator's Metrics and isLargerBetter Flag

| Metric | Description                       | isLargerBetter |
|--------|-----------------------------------|----------------|
| rmse   | Root mean squared error           | false          |
| mse    | Mean squared error                | false          |
| r2     | Coefficient of determination (R²) | true           |
| mae    | Mean absolute error               | false          |
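The four metrics are straightforward to define over (prediction, label) pairs. As a Spark-free sketch in plain Scala (the data and variable names below are illustrative, not Spark API):

```scala
// Hand-computed versions of the metrics RegressionEvaluator supports,
// over a tiny made-up set of (prediction, label) pairs.
val pairs = Seq((2.5, 3.0), (0.0, -0.5), (2.0, 2.0), (8.0, 7.0))
val n = pairs.size.toDouble
val errors = pairs.map { case (pred, label) => pred - label }

val mse  = errors.map(e => e * e).sum / n   // mean squared error
val rmse = math.sqrt(mse)                   // root mean squared error
val mae  = errors.map(math.abs).sum / n     // mean absolute error

// r2: 1 - (residual sum of squares) / (total sum of squares)
val labelMean = pairs.map(_._2).sum / n
val ssRes = errors.map(e => e * e).sum
val ssTot = pairs.map { case (_, l) => (l - labelMean) * (l - labelMean) }.sum
val r2 = 1.0 - ssRes / ssTot
```

Note that rmse, mse and mae are all error measures (smaller is better), while r2 measures explained variance (larger is better), which is exactly what the isLargerBetter flag encodes.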

import org.apache.spark.ml.evaluation.RegressionEvaluator
val regEval = new RegressionEvaluator().
  setMetricName("r2").
  setPredictionCol("prediction").
  setLabelCol("label")

scala> regEval.isLargerBetter
res0: Boolean = true
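isLargerBetter tells model-selection tools such as CrossValidator whether to maximize or minimize the metric when ranking candidate models. A minimal sketch of that selection logic (the helper below is hypothetical, not Spark API):

```scala
// Hypothetical helper mirroring how a model selector uses isLargerBetter:
// return the index of the best metric value across candidate models.
def selectBest(metricValues: Seq[Double], isLargerBetter: Boolean): Int =
  if (isLargerBetter) metricValues.indexOf(metricValues.max)
  else metricValues.indexOf(metricValues.min)

// rmse has isLargerBetter = false, so the smallest value (index 1) wins
val bestRmse = selectBest(Seq(1.2, 0.7, 0.9), isLargerBetter = false)

// r2 has isLargerBetter = true, so the largest value (index 2) wins
val bestR2 = selectBest(Seq(0.41, 0.87, 0.93), isLargerBetter = true)
```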

scala> println(regEval.explainParams)
labelCol: label column name (default: label, current: label)
metricName: metric name in evaluation (mse|rmse|r2|mae) (default: rmse, current: r2)
predictionCol: prediction column name (default: prediction, current: prediction)
Table 2. RegressionEvaluator's Parameters

| Parameter     | Default Value | Description                                                           |
|---------------|---------------|-----------------------------------------------------------------------|
| metricName    | rmse          | Name of the regression metric for evaluation: mae, mse, rmse, or r2   |
| predictionCol | prediction    | Name of the column with predictions                                   |
| labelCol      | label         | Name of the column with labels                                        |

// prepare a fake input dataset using transformers
import org.apache.spark.ml.feature.Tokenizer
val tok = new Tokenizer().setInputCol("text")

import org.apache.spark.ml.feature.HashingTF
val hashTF = new HashingTF()
  .setInputCol(tok.getOutputCol)  // it reads the output of tok
  .setOutputCol("features")

// Scala trick to chain transform methods
// It's of little to no use since we've got Pipelines
// Just to have it as an alternative
val transform = (tok.transform _).andThen(hashTF.transform _)

val dataset = Seq((0, "hello world", 0.0)).toDF("id", "text", "label")

// we're using Linear Regression algorithm
import org.apache.spark.ml.regression.LinearRegression
val lr = new LinearRegression

import org.apache.spark.ml.Pipeline
val pipeline = new Pipeline().setStages(Array(tok, hashTF, lr))

val model = pipeline.fit(dataset)

// Let's do prediction
// Note that we're using the same dataset as for fitting the model
// Something you'd definitely not be doing in prod
val predictions = model.transform(dataset)

// Now we're ready to evaluate the model
// Evaluator works on datasets with predictions

import org.apache.spark.ml.evaluation.RegressionEvaluator
val regEval = new RegressionEvaluator  // uses the default metric, rmse

scala> regEval.evaluate(predictions)
res0: Double = 0.0

The 0.0 is the value of the default rmse metric: the model reproduces the training labels exactly, which is unsurprising since it is evaluated on the very dataset it was fit on.

Evaluating Model Output — evaluate Method

evaluate(dataset: Dataset[_]): Double
Note
evaluate is part of Evaluator Contract.

evaluate computes the configured metric for the input dataset: it selects the predictionCol and labelCol columns (cast to doubles), builds a RegressionMetrics over the resulting (prediction, label) pairs, and returns the value of the metric given by metricName.
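The per-metric dispatch can be sketched in plain Scala. The helper below is a hypothetical stand-in for what RegressionMetrics computes, not the actual Spark implementation:

```scala
// Hypothetical stand-in for RegressionMetrics: compute the requested
// metric over extracted (prediction, label) pairs.
def evaluateSketch(pairs: Seq[(Double, Double)], metricName: String): Double = {
  val n = pairs.size.toDouble
  val errs = pairs.map { case (p, l) => p - l }
  val sse = errs.map(e => e * e).sum  // sum of squared errors
  metricName match {
    case "mse"  => sse / n
    case "rmse" => math.sqrt(sse / n)
    case "mae"  => errs.map(math.abs).sum / n
    case "r2"   =>
      val mean = pairs.map(_._2).sum / n
      1.0 - sse / pairs.map { case (_, l) => (l - mean) * (l - mean) }.sum
  }
}

val pairs = Seq((1.0, 1.0), (2.0, 3.0))
val mse = evaluateSketch(pairs, "mse")  // (0 + 4) / 2 = 2.0
```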