You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I need to read only the data where ids array is not empty and it seem to work fine with a GCS connector.
spark.read.format("parquet")
.load("gs://some-path")
.filter($"ids" =!= Array.empty[String])
... // other filters
However when I try to do the same with a BigQuery connector I get an error:
spark.read.format("bigquery")
.option("viewsEnabled", "true")
.load("internal-table-name")
.filter($"ids" =!= Array.empty[String])
... // other filters
24/07/23 11:00:30 INFO ReadSessionCreator: |creation a read session for table null, parameters: |selectedFields=[name,ids],|filter=[(NOT (`ids` = ArraySeq())) AND (`ids` IS NOT NULL) AND (`name` = 'John') AND (`name` IS NOT NULL) AND (`time` <= TIMESTAMP '2024-07-18T23:59:59.999999Z') AND (`time` >= TIMESTAMP '2024-07-18T00:00:00Z') AND (`time` IS NOT NULL)]
com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.InvalidArgumentException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: request failed: Query error: Function not found: ArraySeq at [1:189]
at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:92)
at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:41)
at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:86)
at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:66)
at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
...
Seems like a predicate pushdown doesn't work correctly for this case.
Changing filter condition to .filter(array_size($"ids") > 0) works but this condition most likely not going to be pushed down.
I have a piece of code which reads the same data from GCS or BigQuery. Simplified schema for the data looks like this:
I need to read only the data where
ids
array is not empty and it seem to work fine with a GCS connector.However when I try to do the same with a BigQuery connector I get an error:
Seems like a predicate pushdown doesn't work correctly for this case.
Changing filter condition to
.filter(array_size($"ids") > 0)
works but this condition most likely not going to be pushed down.Dependencies used:
The text was updated successfully, but these errors were encountered: