This repository has been archived by the owner on May 1, 2024. It is now read-only.

Support FHIR Extensions in Spark Datasets #68

Open
mtsargent opened this issue Jan 8, 2020 · 4 comments

@mtsargent

Description of Issue

I am currently attempting to read in FHIR Bundles from a directory that contains JSON files and then extract certain resource types to Spark Datasets. While Datasets are being successfully created, Extensions that were part of resources in my FHIR bundle are being dropped altogether.

If I am looking at the correct places in the code, it seems that the lack of Extension support was a conscious decision:

// Contained resources and extensions not yet supported.

I would like to be able to create Datasets for FHIR resources that still contain the Extensions from the original resources.

System Configuration

Project Version

Using Bunsen 0.4.9

Steps to Reproduce the Issue

Run this Scala code (or Java equivalent):

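import com.cerner.bunsen.Bundles // package name as assumed for Bunsen 0.4.x; adjust if your version differs
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object BunsenExample {
  def main(args: Array[String]): Unit = {
    failBundles()
  }

  def failBundles(): Unit = {
    val conf = new SparkConf()
      .setMaster("local[*]")
      .set("spark.sql.crossJoin.enabled", "true")
    val spark = SparkSession.builder().config(conf).getOrCreate()

    // Load every bundle in the directory and cache the result for reuse.
    val data = Bundles.forStu3().loadFromDirectory(spark, "/path/to/bundles/with/resource/extensions", 2).cache()

    // Extract the Patient resources into a Dataset; the extensions are missing from the result.
    val patients = Bundles.forStu3().extractEntry(spark, data, "Patient")
    patients.show()
    patients.printSchema()
  }
}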

The patients dataset will not contain the extensions that were originally part of the Patient FHIR resources in the bundle. There does not appear to be a place for extensions to exist in the schema for the Dataset. I verified that the Extensions are being parsed successfully and are accessible through the BundleContainers returned if you run data.collect() and dive into the result.
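The schema check described above can also be done programmatically instead of reading the printSchema() output by eye. The following is only a sketch: SchemaCheck and containsField are hypothetical names introduced for illustration, and the field name "extension" is an assumption about how an extension column would be named if it were present. It uses only the standard Spark SQL type classes and walks nested structs, arrays, and maps.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not part of Bunsen.
object SchemaCheck {
  /** Returns true if a field called `name` occurs anywhere in `dataType`,
    * descending into nested structs, array elements, and map values. */
  def containsField(dataType: DataType, name: String): Boolean = dataType match {
    case struct: StructType =>
      struct.fields.exists(f => f.name == name || containsField(f.dataType, name))
    case ArrayType(elementType, _) => containsField(elementType, name)
    case MapType(_, valueType, _)  => containsField(valueType, name)
    case _                         => false
  }
}
```

With the reproduction code above, SchemaCheck.containsField(patients.schema, "extension") would be expected to return false if the schema really has no place for extensions at any nesting level.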

Expected Outcomes

Add support for Extensions to be included in Datasets when they are created by extracting resources from a collection of FHIR Bundles.

@bdrillard
Contributor

bdrillard commented Mar 7, 2020

Extensions and Contained resources are now supported in Bunsen 0.5.x, which applies a different paradigm to creating Spark rows from FHIR resources. The Bundles API in this new major version is still much the same, so try loading your data in the latest version to see if you get the support you require.

While I believe Contained resource support was added in Bunsen 0.4.9, Extensions were known to be more difficult to implement under the earlier approach, so I don't think users should expect Extension support to be back-ported.

@mtsargent
Author

I tried running an example similar to the one I posted (except using Observations instead of Patients), and I am still not seeing extensions when the resources are extracted from the bundle. Using a debugger, I can see the extensions exist on the resources in the Bundle. I am using com.cerner.bunsen:bunsen-spark-shaded:0.5.4. Is this the correct dependency?

Also, is there R4 support in Bunsen 0.5.x? I was unable to find documentation for the 0.5.x releases similar to what is published here for 0.4.6: https://engineering.cerner.com/bunsen/0.4.6/

@Teej42

Teej42 commented Apr 28, 2020

Can we have an update on the question Matt posed here, please?

@dhallam

dhallam commented Sep 1, 2020

From looking at the codebase, it looks like commit a24851b on the 0.5.0-dev branch deleted the Python tests for R4 and removed classes such as FhirEncoders, which are still used by the bunsen-r4 sub-project. It looks like R4 has been abandoned in Bunsen. Is that right?
