With Spark 3.4 and Scala 2.12.16 on Dataproc image 2, we were able to run our jobs by setting the property below:
--properties=spark.sql.legacy.allowUntypedScalaUDF=true
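For reference, the same legacy flag can also be set programmatically when building the session; this is just a sketch of the equivalent config, and the app name here is a made-up placeholder, not our actual job:

```scala
import org.apache.spark.sql.SparkSession

// Equivalent to passing --properties=spark.sql.legacy.allowUntypedScalaUDF=true
// at submit time; "MyJob" is a hypothetical app name.
val spark = SparkSession.builder()
  .appName("MyJob")
  .config("spark.sql.legacy.allowUntypedScalaUDF", "true")
  .getOrCreate()
```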
But since migrating to Spark 3.5 and Scala 2.12.18 on Dataproc image 2, we have been getting the error message below:
"exception": "AnalysisException: [UNTYPED_SCALA_UDF] You\u0027re using untyped Scala UDF,
which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument,
and the closure will see the default value of the Java type for the null argument, e.g. udf((x: Int) \u003d\u003e x, IntegerType),
the result is 0 for null input. To get rid of this error, you could:\n1. use typed Scala UDF APIs(without return type parameter),
e.g. udf((x: Int) \u003d\u003e x).\n2. use Java UDF APIs, e.g. udf(new UDF1[String, Integer] { override def call(s: String): Integer \u003d s.length() }, IntegerType),
if input types are all non primitive.\n3. set "spark.sql.legacy.allowUntypedScalaUDF" to "true" and use this API with caution."
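For context, the rewrites that options 1 and 2 of the error message point to would look roughly like this; plusOne and strLen are made-up examples, not our actual UDFs:

```scala
import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType

// Untyped variant: rejected by default in recent Spark versions.
// val plusOne = udf((x: Int) => x + 1, IntegerType)

// Option 1: typed Scala UDF with no explicit return type parameter.
// Spark infers types from the closure, so nulls passed to
// primitive-typed arguments are handled safely.
val plusOne = udf((x: Int) => x + 1)

// Option 2: Java UDF API with an explicit return type, acceptable
// when all input types are non-primitive (String here).
val strLen = udf(new UDF1[String, Integer] {
  override def call(s: String): Integer = s.length()
}, IntegerType)
```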
Is this property being deprecated with Spark 3.5?
Yes, similar code was working fine with Spark 3.3 and Scala 2.12.16 on Dataproc image 2.0; we just had to specify spark.sql.legacy.allowUntypedScalaUDF=true when submitting the Spark job to the Dataproc cluster, with the old BigQuery connector.
Since migrating to image 2.2 with Scala 2.12.18 and Spark 3.5 (the versions compatible with the new Dataproc image), we have been facing this issue.