
Cannot get expression from valueQuantity #1770

Open
lakime opened this issue Oct 20, 2023 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@lakime

lakime commented Oct 20, 2023

Describe the bug
While trying to fetch the valueQuantity element from Observation resources (data generated via Synthea, running on Databricks), I receive the error below.
Libraries:
au.csiro.pathling:library-api:6.3.1
latest pathling installed from PyPI

IllegalArgumentException: requirement failed: All input types must be the same except nullable, containsNull, valueContainsNull flags. The expression is: if ((NOT instanceof(assertnotnull(input[0, org.hl7.fhir.r4.model.Observation, true]).getValue, class org.hl7.fhir.r4.model.Quantity) OR isnull(objectcast(assertnotnull(input[0, org.hl7.fhir.r4.model.Observation, true]).getValue, ObjectType(class org.hl7.fhir.r4.model.Quantity))))) null else named_struct(value, staticinvoke(class org.apache.spark.sql.types.Decimal, DecimalType(32,6), apply, if (instanceof(assertnotnull(input[0, org.hl7.fhir.r4.model.Observation, true]).getValue, class org.hl7.fhir.r4.model.Quantity)) objectcast(assertnotnull(input[0, org.hl7.fhir.r4.model.Observation, true]).getValue, ObjectType(class org.hl7.fhir.r4.model.Quantity)) else null.getValueElement.getValue, true, true, true)). The input types found are
StructType(StructField(id,StringType,true),StructField(value,DecimalType(32,6),true),StructField(value_scale,IntegerType,true),StructField(comparator,StringType,true),StructField(unit,StringType,true),StructField(system,StringType,true),StructField(code,StringType,true),StructField(_value_canonicalized,StructType(StructField(value,DecimalType(38,0),true),StructField(scale,IntegerType,true)),true),StructField(_code_canonicalized,StringType,true))
StructType(StructField(value,DecimalType(32,6),true)).

If I remove "valueQuantity", it works as expected.

To Reproduce

Observation - quantities to be checked

observationfhir = json_resources.extract(
    "Observation",
    columns=[
        exp("id", "Identifier"),
        exp("status", "status"),
        exp("category.first().coding.first().code", "category"),
        exp("code.coding.code", "Observation_Code"),
        exp("code.coding.display", "Observation_Name"),
        exp("code.text", "Observation_Text"),
        exp("subject.reference", "Subject_Reference"),
        exp("encounter.reference", "Encounter_Reference"),
        exp("valueQuantity.value", "Value_Quantity"),
    ],
)

observationfhir = (
    observationfhir
    .withColumn("source", lit("payorq"))
    .withColumn("sourceFile", lit(today))
    .withColumn("Value_Quantity", col("Value_Quantity").cast("string"))
)
display(observationfhir)

Expected behavior
The values extracted from the FHIR files.

@johngrimes
Member

Thanks @lakime, we have reproduced the issue and are working on a fix.

@lakime
Author

lakime commented Oct 23, 2023

Perhaps it is the wrong construct on my side. For reference, my setup is:

from pathling import PathlingContext, Expression as exp
from pyspark.sql.functions import split, explode, col, lit, expr  # note: pyspark.sql.functions has no `cast`; use the Column.cast method instead
from datetime import date

today = date.today()
pc = PathlingContext.create()
ndjson_dir = 'dbfs:/mnt/hda/raw/payorq/landing/'
json_resources = pc.read.ndjson(ndjson_dir)

I am able to fetch the data using sql query

%sql SELECT valueQuantity.value AS Value_Quantity_Value, valueQuantity.unit AS Value_Quantity_Unit FROM bronzeraw.observation;

@johngrimes
Member

johngrimes commented Oct 27, 2023

Hi @lakime,

We have done a bit of work to figure out what is happening here.

This is essentially caused by a bug in Spark, or an inability of Spark to deal with the expressions that we generate in certain scenarios. We're working on creating a bug report for this.

This behaviour is specific to reading data directly from a raw FHIR source, such as NDJSON or Bundles.

There are two workarounds. The first is simply to cache the datasets involved in the query before running extract:

pc = PathlingContext.create()
ndjson_dir = 'dbfs:/mnt/hda/raw/payorq/landing/'
json_resources = pc.read.ndjson(ndjson_dir)

json_resources.read('Observation').cache()

observationfhir = json_resources.extract(  #...

The other workaround is to set the configuration parameter spark.sql.optimizer.nestedSchemaPruning.enabled to false:

spark = (
    SparkSession.builder
    .config("spark.sql.optimizer.nestedSchemaPruning.enabled", "false")
    .getOrCreate()
)

pc = PathlingContext.create(spark)
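On platforms where the SparkSession is created for you (such as Databricks), the same property can presumably be set through the cluster's Spark configuration rather than in code; a minimal sketch, assuming your platform accepts spark-defaults-style properties (the property name is taken from the snippet above):

```
spark.sql.optimizer.nestedSchemaPruning.enabled false
```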

Perhaps you could try this and let us know if this solves your problem?

@piotrszul
Collaborator

Reported to Spark as a bug: SPARK-45766

johngrimes added the bug label on Dec 7, 2023
Projects
Status: Backlog
Development

No branches or pull requests

3 participants