-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-18211: Override class loaders for class graph scanning in connect. #18403
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you able to write a regression test for this? Our existing test suite missed this on the library change.
Yes, made some updates and added a test for this scenario. On a side not, may I get auto approval to run the test suite on my PRs, will help with validating the Multiversion PRs. |
Sorry about that, clicking the approval button approves a single run and the label approves multiple runs. TIL |
connect/runtime/src/test/java/org/apache/kafka/connect/runtime/isolation/PluginScannerTest.java
Outdated
Show resolved
Hide resolved
// In addition, we need to explicitly specify the full classloader order, as classgraph only scans the classes available | ||
// in the classloaders and not the entire parent chain. Due to this reason if a plugin is extending a class present | ||
// in classpath/application it will not be able to find the parent class unless we explicitly specify the classloader order. | ||
List<ClassLoader> classLoaderOrder = new ArrayList<>(); | ||
ClassLoader cl = source.loader(); | ||
while (cl != null) { | ||
classLoaderOrder.add(cl); | ||
cl = cl.getParent(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true? What is the ignoreParentClassLoaders
method doing then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some more analysis on this. My statement is somewhat incorrect. Classgraph does try to find and scan the classes from parent but it fails to do so. Elaborating more.
Classgraph scanning works by computing a list of class path URLs from the provided class loaders and then manually scans the .class
files under the URLs to retrieve information about those classes. Unless we instruct classloading explicitly (which we do here) the classes are not loaded using Java's classloading. If the desired class is not found in the scanning our code never tries to load the class. The list of class path URLs are ordered based on the ordering of the provided classloaders. All the logic is in the constructor of ClasspathFinder.
ClasspathFinder uses various ClassLoaderHandlers to obtain the set of URLs for a classloader. Connects PluginClassLoader uses URLClassLoaderHandler which works fine and gets the list of jars in a plugin path. But when it comes to the application classloader and platform classloader which has the URLs for classpath it uses JPMSClassLoaderHelper and tries to get the URLs through a illegal reflections access on a private field, which will fail on modern Java (throws an IllegalAccessException, can be mitigated using --illegal-access=permit
but this is not present since Java 17 and not really recommended). Even though the classloader chain is computed the, URLs in classpath are not obtained because of this. This is why some of the tests were failing where SubclassOfClasspathConverter
which extens ByteArrayConverter
was not computed to be an implementation of a converter since ByteArrayConverter
is in classpath.
To force classpath URL scanning explicitly we need ClasspathFinder to execute this part of code, which is only possible with classloader overrides if one of the provided classloader is application/platform classloader. Passing the classloader chain for the PluginClassLoader achieves this. ignoreParentClassLoader
is tied to overrideClassloader == null
check, hence is of no use with classLoader overrides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for the explanation, I see that ClassGraph is going outside the classloader hierarchy for scanning, and just uses a list of jars. Here's what I was seeing:
Trunk
.addClassLoader(source.loader())
- Gradle wrapper
- Plugin
- Gradle worker
- build/ class directories
- connect-runtime
- build/ lib jars
- Gradle caches (dependencies)
Single Loader
.overrideClassLoaders(source.loader())
- Plugin
Loader and parents
List<ClassLoader> classLoaderOrder = new ArrayList<>();
ClassLoader cl = source.loader();
while (cl != null) {
classLoaderOrder.add(cl);
cl = cl.getParent();
}
...
.overrideClassLoaders(classLoaderOrder.toArray(new ClassLoader[0]))
- Runtime modules
- Plugin
- Gradle worker
- build/ class directories
- connect-runtime
- build/ lib jars
- Gradle caches (dependencies)
It doesn't appear that the order in which we specify the classloaders has an effect on the order in which scanning actually takes place. And I think that makes sense given the ClassLoaderHandler stuff; It's applying a deterministic ordering to generate these lists of jars.
Note that URLClassLoader subclasses do not need a custom ClassLoaderHandler (unless they need to override the delegation order, as with SpringBootRestartClassLoaderHandler), URLClassLoader subclasses are handled automatically by ClassGraph.
This sounds like it applies to us, because we have a child-first/parent-last delegation order, but we don't have a special handler telling ClassGraph about it. Maybe we can pursue upstreaming a handler to make the ordering work properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I can see that your reproduction case fails with the old code and succeeds with the new code, why is that?
I can see after the initial ClassGraph#scan()
call, it actually always finds the PluginClassLoader isolated copy of ByteArrayConverter, even with the trunk implementation. The difference happens inside of ClassInfoList#loadClasses
and ClassGraphClassLoader#loadClass
. Here's what I was seeing there:
Trunk
environmentClassLoaderDelegationOrder
- PlatformClassLoader
- AppClassLoader
- FilteringClassLoader (?)
- URLClassLoader (?)
- PluginClassLoader
Loader and parents
overrideClassLoaders
- PluginClassLoader
- AppClassLoader
- PlatformClassLoader
So the order of the specified classloaders does change the classloading order of the plugins, even though it doesn't change the scanning behavior.
I think the current implementation is satisfactory as a workaround for the behavior, but we should follow-up with ClassGraph and try and use their ClassLoaderHandler infrastructure to get the right classloader sorting. I explored circumventing the ClassGraphClassLoader entirely and calling Class.forName(..., ..., source.loader())
, but its a bit clunky because we have to also include a bunch of error handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the order of the specified classloaders does change the classloading order of the plugins, even though it doesn't change the scanning behavior.
Yes, that's why with addClassLoader
which uses environmentClassLoaderDelegationOrder
in ClassGraphClassLoader
always finds the files in the classpath loader (AppClassLoader) before PluginClassLoder has a chance to kick in.
Also, yeah maybe with a special handler for PluginClassLoader we can avoid having to pass the entire class loader chain in the reverse order.
connect/runtime/src/test/java/org/apache/kafka/connect/runtime/isolation/TestPlugins.java
Outdated
Show resolved
Hide resolved
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/ReflectionScanner.java
Outdated
Show resolved
Hide resolved
// In addition, we need to explicitly specify the full classloader order, as classgraph only scans the classes available | ||
// in the classloaders and not the entire parent chain. Due to this reason if a plugin is extending a class present | ||
// in classpath/application it will not be able to find the parent class unless we explicitly specify the classloader order. | ||
List<ClassLoader> classLoaderOrder = new ArrayList<>(); | ||
ClassLoader cl = source.loader(); | ||
while (cl != null) { | ||
classLoaderOrder.add(cl); | ||
cl = cl.getParent(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for the explanation, I see that ClassGraph is going outside the classloader hierarchy for scanning, and just uses a list of jars. Here's what I was seeing:
Trunk
.addClassLoader(source.loader())
- Gradle wrapper
- Plugin
- Gradle worker
- build/ class directories
- connect-runtime
- build/ lib jars
- Gradle caches (dependencies)
Single Loader
.overrideClassLoaders(source.loader())
- Plugin
Loader and parents
List<ClassLoader> classLoaderOrder = new ArrayList<>();
ClassLoader cl = source.loader();
while (cl != null) {
classLoaderOrder.add(cl);
cl = cl.getParent();
}
...
.overrideClassLoaders(classLoaderOrder.toArray(new ClassLoader[0]))
- Runtime modules
- Plugin
- Gradle worker
- build/ class directories
- connect-runtime
- build/ lib jars
- Gradle caches (dependencies)
It doesn't appear that the order in which we specify the classloaders has an effect on the order in which scanning actually takes place. And I think that makes sense given the ClassLoaderHandler stuff; It's applying a deterministic ordering to generate these lists of jars.
Note that URLClassLoader subclasses do not need a custom ClassLoaderHandler (unless they need to override the delegation order, as with SpringBootRestartClassLoaderHandler), URLClassLoader subclasses are handled automatically by ClassGraph.
This sounds like it applies to us, because we have a child-first/parent-last delegation order, but we don't have a special handler telling ClassGraph about it. Maybe we can pursue upstreaming a handler to make the ordering work properly.
// In addition, we need to explicitly specify the full classloader order, as classgraph only scans the classes available | ||
// in the classloaders and not the entire parent chain. Due to this reason if a plugin is extending a class present | ||
// in classpath/application it will not be able to find the parent class unless we explicitly specify the classloader order. | ||
List<ClassLoader> classLoaderOrder = new ArrayList<>(); | ||
ClassLoader cl = source.loader(); | ||
while (cl != null) { | ||
classLoaderOrder.add(cl); | ||
cl = cl.getParent(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I can see that your reproduction case fails with the old code and succeeds with the new code, why is that?
I can see after the initial ClassGraph#scan()
call, it actually always finds the PluginClassLoader isolated copy of ByteArrayConverter, even with the trunk implementation. The difference happens inside of ClassInfoList#loadClasses
and ClassGraphClassLoader#loadClass
. Here's what I was seeing there:
Trunk
environmentClassLoaderDelegationOrder
- PlatformClassLoader
- AppClassLoader
- FilteringClassLoader (?)
- URLClassLoader (?)
- PluginClassLoader
Loader and parents
overrideClassLoaders
- PluginClassLoader
- AppClassLoader
- PlatformClassLoader
So the order of the specified classloaders does change the classloading order of the plugins, even though it doesn't change the scanning behavior.
I think the current implementation is satisfactory as a workaround for the behavior, but we should follow-up with ClassGraph and try and use their ClassLoaderHandler infrastructure to get the right classloader sorting. I explored circumventing the ClassGraphClassLoader entirely and calling Class.forName(..., ..., source.loader())
, but its a bit clunky because we have to also include a bunch of error handling.
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/ReflectionScanner.java
Outdated
Show resolved
Hide resolved
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/ReflectionScanner.java
Outdated
Show resolved
Hide resolved
Hi @gharris1727, shouldn't we merge this before the 15th and port it to 4.0. TIA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thanks @snehashisp for finding and fixing this regression before it affected users!
@gharris1727 Do you want to cherry-pick this one to 4.0 branch too? |
…ct. (#18403) Reviewers: Greg Harris <[email protected]>
@dajac Just cherry picked it, thanks! |
Addresses the issue highlighted in https://issues.apache.org/jira/browse/KAFKA-18211
Committer Checklist (excluded from commit message)