Assembly shaderules break serialization/deserialization with Dataset and Dataframe #279
Comments
Hi @oroundtree, thanks for reporting. It does sound very strange that the presence of the shading rule creates a problem. Can you provide a minimal example that reproduces this, including instructions? You can start by forking https://github.com/thesamet/sparksql-scalapb-test |
Here you go: https://github.com/oroundtree/sparksql-scalapb-test-oroundtree The master branch has the shade rule and the tests, and the no-shaderule branch has the shade rule removed and the version number bumped so that I can test adding the two jars as unmanaged dependencies separately. Here are the steps I follow to reproduce it:
After that, you can give the non-shaded jar a try using the same steps as above, except:
EDIT: Also worth noting that I get the same results in both cases if I pull the jar as a managed dependency in sbt or Maven (i.e. from a private Maven repository). EDIT 2: If you are using IDEA, the IDE may complain that the imports from your unmanaged sbt dependency are not found. You can safely ignore the syntax highlighting. |
Thanks, I quickly read through. For step 3, can you provide that "another project" as well and make the edits in your message above, just so the issue is self-contained? |
I've updated the steps with the small example project and more exact steps on how to reproduce the error. Hope it helps! |
Thanks for providing the detailed example. I was able to follow the instructions and see the issue. The example leads us into something that's a little tricky to reason about: the assembled jar brings a shaded version of shapeless, and the parent project brings another, unshaded copy. I think it was unintended, but the shaded jar also brings in scalatest. The practice I want to encourage is to perform the assembly and shading as the final packaging step, just before the artifact is shipped to a Spark cluster.
|
If I didn't do this, every project that uses the proto definitions would need to have their individual .proto files edited when a change is made to a message definition |
Trying to understand the above. The suggested practice is to have all the intermediate dependencies (which can contain protos) remain unshaded, and to perform the assembly/shading only for the final artifacts you deploy. You write that this would lead to editing protos that import other protos whenever those change. I'm not following this part; can you explain in more detail? What edits would be necessary? I would suggest looking at how you can adapt your build to support the suggested practice of shading at the last step. sbt-assembly also calls out that introducing fat jars as dependencies is not a great idea. Having said that, I did look deeper, and it looks like the first failure in the encoder derivation involves invoking a macro in the shaded copy of shapeless. I've filed a bug with sbt-assembly along with a reproducible example. |
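The "shade at the last step" practice described above could look roughly like the following multi-project build. This is a minimal sketch, not the reporter's actual build: the module names `protos` and `sparkApp` and their directories are hypothetical, and it assumes the sbt-assembly and sbt-protoc plugins (plus the ScalaPB compiler plugin) are already configured in `project/plugins.sbt`. Only the shade rule itself is taken from the issue.

```scala
// build.sbt (sketch; `protos` and `sparkApp` are hypothetical module names)

// Proto definitions and the generated ScalaPB classes.
// This module is packaged and published as a plain, unshaded jar,
// so downstream projects see the same shapeless classes as everyone else.
lazy val protos = (project in file("protos"))
  .settings(
    Compile / PB.targets := Seq(
      scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
    )
  )

// The Spark job that is actually deployed.
// It is the only module that is assembled, and therefore the only place
// where shade rules are applied.
lazy val sparkApp = (project in file("spark-app"))
  .dependsOn(protos)
  .settings(
    assembly / assemblyShadeRules := Seq(
      // Rule quoted in the issue; add any other rules your deployment needs.
      ShadeRule.rename("shapeless.**" -> "shadeshapeless.@1").inAll
    )
  )
```

With a layout like this, other projects would depend on the regular `protos` artifact, and `sparkApp/assembly` would produce the fat jar for the cluster, so a shaded and an unshaded copy of shapeless never meet at compile time.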
Closing due to inactivity. |
I've been working on an issue for a while now where certain features of sparksql-scalapb haven't been working correctly. It mostly relates to encoders, and to the following error when creating a DataFrame or Dataset of serialized protobuf data:
Unable to find encoder for type Array[Byte]. An implicit Encoder[Array[Byte]] is needed to store Array[Byte] instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
My scalatests for serialization and deserialization work when they are run in the same project that the protobuf messages are in, using the compiled code. However, they fail if I'm using the assembled jar unless I remove the following shaderule from build.sbt:
ShadeRule.rename("shapeless.**" -> "shadeshapeless.@1").inAll
I've also tested this and found the same results when running a class without scalatest dependencies.
I haven't yet seen any issues from removing the above shaderule, but I'm also not sure why it is there and what the implications of removing it are...
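As a diagnostic for the `Array[Byte]` error quoted above, one option is to bypass implicit encoder lookup for that one type by passing Spark's built-in binary encoder explicitly. The sketch below is an assumption-laden illustration, not a fix for the shading problem: the object name `ExplicitBinaryEncoder`, the helper `binaryDataset`, and the placeholder bytes are all hypothetical, and it assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

object ExplicitBinaryEncoder {
  // Builds a Dataset[Array[Byte]] without relying on implicit encoder lookup:
  // Encoders.BINARY is supplied directly as the Encoder[Array[Byte]].
  def binaryDataset(spark: SparkSession, payloads: Seq[Array[Byte]]): Dataset[Array[Byte]] =
    spark.createDataset(payloads)(Encoders.BINARY)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("explicit-binary-encoder")
      .getOrCreate()
    try {
      // Placeholder bytes; in the issue these would be serialized protobuf messages.
      val payloads = Seq(Array[Byte](1, 2, 3), Array[Byte](4, 5))
      binaryDataset(spark, payloads).printSchema()
    } finally {
      spark.stop()
    }
  }
}
```

If this compiles against the assembled jar while the implicit version does not, that points at implicit resolution (and hence the shaded shapeless classes) rather than at the data itself.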