Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Updated to use the newer Kryo 5 #514

Open
wants to merge 11 commits into
base: develop
Choose a base branch
from

Conversation

nicknezis
Copy link

This updates Kryo to the latest version. The previous 4.0.2 version was released over 2 years ago. Since then there have been many updates and fixes.

This link has an initital summary of some of the larger changes. https://groups.google.com/g/kryo-users/c/sBZ10dwrwFQ/m/hb6FF5ZXCQAJ

In 2018 when Kryo 5 was released, the author also stated.

Kryo issue tracker has gone from ~100 issues to 3, and the PRs are down to 1. I'm pretty happy with the state of the project now!

Some performance comparisons can also be found here: EsotericSoftware/kryo#743 (comment)

@CLAassistant
Copy link

CLAassistant commented Jan 7, 2021

CLA assistant check
All committers have signed the CLA.

@nicknezis nicknezis changed the title Updated to use the newer Kryo 5 WIP: Updated to use the newer Kryo 5 Jan 7, 2021
@johnynek
Copy link
Collaborator

johnynek commented Jan 7, 2021

Thanks for the PR.

The issue of upgrading can be a bit of challenge, because several users have decided to use kryo for long term storage. This is strongly not recommended since each version of kryo can change serialized data, yet it has happened.

Perhaps the right answer is to just say "too bad, don't upgrade it you have done this". I tend to think that is the right answer and we should probably make it clearer on the README that you should never store kryo data beyond the life of a single process.

Next, we do need to get the CI green before we can merge anything.

I would be willing to merge a PR that did the upgrade as described above (update README to be clear, get CI green).

Thanks again!

@nicknezis
Copy link
Author

nicknezis commented Jan 7, 2021

@johnynek Yes I agree. That definitely is a challenge. I'll update the README with this PR. I learned that the kryo-shaded jar is replaced with a newer versioned jar. There are version specific classes, so I will need to update some of the code. But one of the benefits of using this newer approach is that we might be able to support reading 4 and 5 at the same time while only writing out version 5. Some of this is described here. I can look into doing this. But in general it would be easier to take the approach you outlined. I'll do some research and report back. For now I've left this as a Work In Progress. My ultimate goal is to update Spark with the newer Kryo, but I want to be respectful of the various Chill use cases.

@nicknezis
Copy link
Author

@johnynek I just realized that part of the inspiration of the Kryo4 + Kryo5 class name change was inspired by an old comment you provided to the Kryo team. Small world. :)

@nicknezis
Copy link
Author

@johnynek Sharing a link to the Google Groups discussion about what approach to take to update to Kyro 5 in Chill + Spark. I would really appreciate your input.

@nicknezis
Copy link
Author

I was able to progress with the code edits using the versioned imports. I ran into a block because Chill depends on a Storm interface IKryoFactory which has the non-versioned classpath. Storm is still using 3.0.3 which is super old. I recently updated Apache Heron to use 5.0.3, so I think I could update Storm if the maintainers were ok with it.

Doing a quick search through the Storm codebase, I didn't see any mention of Chill. They did have chill-java listed in two license files which listed the dependency, but is it possible the framework doesn't directly depend on it? I can see analytics using chill as an optional dependency.

@johnynek
Copy link
Collaborator

johnynek commented Jan 8, 2021

I think we could possibly drop support for chill-storm.

I'm not sure anyone is using that module. And if they are they can use an old version or open and issue/PR to restore it.

It really isn't much code.

@nicknezis
Copy link
Author

I kind of like that approach. We could just add the little bit of code directly into Storm?

@nicknezis
Copy link
Author

I may need some help resolving the CI tests. Some of these are identifying the fact that this change won't be binary compatible because of the API change. Also I think some syntax checks may be failing, but I don't get a specific line in the error.

@johnynek
Copy link
Collaborator

I think you will need to disable the mima binary compatibility checking.

One way would be here:

chill/build.sbt

Line 124 in 6335bb2

val unreleasedModules = Set[String]("akka")

to make that set have all the modules in it, and when we merge and publish a new version, we would bump that up.

I think we should bump the version number, like 0.10.0-SNAPSHOT or something.

cc @regadas @nevillelyh what do you all think of this. Does scio use this?

@johnynek
Copy link
Collaborator

cc @ttim does twitter care about this?

This will create a nightmare scenario of updates with storm and summingbird if those are still used. If summingbird has been retired it might be good to lock the repo and let people know they are free to use it, but it is basically a retired tech.

try {
TBase prototype = tBaseClass.newInstance();
TBase prototype = tBaseClass.getDeclaredConstructor().newInstance();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change out of curiosity?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newInstance() was marked deprecated by my IDE. This seemed to the the recommended replacement. I think it has to do with newInstance() bypassing a checked exception.

https://www.xspdf.com/resolution/53014482.html
https://stackoverflow.com/a/53014482/118999

There were some other deprecated APIs and solutions, but I wasn't sure what versions of Scala Chill is used with so I didn't make other changes. I did see some versions in the build.sbt, so would a proper test encompass building with each of those Scala versions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, when I ran publishM2 to generate local artifacts, I didn't get a chill-java to use in my Spark testing. But when I changed the Scala version it did build. Could you give me a quick guideline for how I should be testing this library? Do I need to modify the build.sbt to check each Scala version?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'd rather separate changes that are core to the kryo5 change and those irrelevant.

As far as testing, there are the tests in the repo (sbt test), but if you want to do integration tests with spark, you will need to do the publish. I'm actually not a huge sbt expert to share much there.

You can set the scala version in sbt by doing: sbt ++2.12.12 publishM2, for instance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this change as requested.

@nicknezis
Copy link
Author

I started trying to update Storm to use Kryo 5, but hit a roadblock with a super old Twitter library called Carbonite (2016). I can't find that code anywhere. ☹️ It seems to be a library for saving Clojures using Kryo. Maybe that should move into Chill (if it hasn't already?)

@johnynek
Copy link
Collaborator

here is an idea of who might be a stakeholder here:

https://mvnrepository.com/artifact/com.twitter/chill/usages

@nicknezis
Copy link
Author

FYI, I'm still working this. Took a long break, after hitting a dead end trying to update Twitter Carbonite which doesn't seem to be open sourced anywhere. I'm refocusing on updating Spark and testing that with this updated version of Chill. Once I complete that testing, I'll be able to come back and update the status here.

@nicknezis
Copy link
Author

@johnynek I think I have the code edits that I need for the newer Kryo 5 API. I also was able to compile Spark with changes there while referencing a locally built Chill dependency. I'd like to get this merged in, but not sure what's needed to get the tests passing. Going to look into this next. Any help would be appreciated.

@nicknezis
Copy link
Author

Is it fair to assume that many of the values stored in this test file will need to change with the newer Kryo? The reason I ask is because I'm surprised that I didn't see similar changes in the move from Kryo 3 to 4.

@snehal-kolte
Copy link

Team, what is plan on merging this PR and moving to 5.0.0?

@nicknezis
Copy link
Author

Just a quick recap of my efforts and where I left off. I was running into issues updating the test artifacts for a successful local unit test run. If someone could help with that, it would be greatly appreciated.

I was not able to update Apache Storm because of some weird circular dependencies. Ultimately I got stumped by a library called Carbonite that was made by Twitter, but was not open sourced. No one seems to know where it lives or who could update it. Storm might be better served in removing the dependency.

I started updating Apache Spark with a locally built version of Chill which had these edits, but without Chill first being updated, it is in a blocked state.

@chiavennasca
Copy link

(Thread... rise from your grave...)

@nicknezis (and others) The Maven Central-hosted pom.xml for [email protected] points to https://github.com/sritchie/carbonite. Does that help in any way?

@roczei
Copy link

roczei commented Apr 5, 2023

Hi @nicknezis,

Are you still working on this pull request?

@nicknezis
Copy link
Author

I'm not, but I'm happy to help if someone wants to assist. I'm also happy discarding the pull request if no one wants the change.

@roczei
Copy link

roczei commented Apr 11, 2023

Thanks @nicknezis for the quick answer!

I think I have managed to fix all local unit test issues in this pull request. Registration is required by default in Kryo5:

https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5#configuration-changes

Here is the fix: roczei@b2ba2b3

As next step I would like to rebase it to the latest state of the develop branch.

@roczei
Copy link

roczei commented Apr 18, 2023

I have rebased the commits, fixed the conflicts and unit test issues, latest state:

roczei#1

There is one remaining unit test issue, opened a Kryo bug ticket (this is the previous update in this pull request).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants