
Expose DefaultSerializerProvider.flushCachedSerializers()? (or prevent Metaspace leaking without it) #1905

Closed
mattsheppard opened this issue Jan 22, 2018 · 8 comments


@mattsheppard

If I feed Jackson's ObjectMapper newly generated dynamic types on each request, I seem to get a process whose metaspace grows forever (or that eventually throws an OutOfMemoryError if I set a maximum bound on metaspace).

I created an example project to exhibit this at https://github.com/mattsheppard/JacksonDatabindMetaspaceLeak (it uses Groovy to generate the dynamic classes because that's what the application I originally found the problem in used, but it doesn't appear to be Groovy-specific).

Note that the leak occurs even if I call the TypeFactory's clearCache() method, which was discussed in #489.

After exploring some heap dumps I stumbled across ((DefaultSerializerProvider) mapper.getSerializerProvider()).flushCachedSerializers(), and calling that every once in a while seems to be sufficient to let the process run more or less indefinitely.

If it's not possible to use weak references or allow for some sort of bounded cache to avoid the actual problem (#489 suggests you may be reluctant), would it at least be possible to expose the flush method on SerializerProvider to avoid the cast?
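
For concreteness, what I'm doing now looks roughly like this (Jackson 2.x; the helper class name is just illustrative):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ser.DefaultSerializerProvider;

public class JacksonCacheFlush {
    // Flush the serializer cache (needs the cast, since flushCachedSerializers()
    // lives on DefaultSerializerProvider) plus the TypeFactory type cache (#489).
    public static void flush(ObjectMapper mapper) {
        ((DefaultSerializerProvider) mapper.getSerializerProvider()).flushCachedSerializers();
        mapper.getTypeFactory().clearCache();
    }
}
```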

@cowtowncoder
Member

Interesting idea. Yes, ideally caches should always be size-bounded. My main concern is that construction of serializers/deserializers is really, really expensive, so this is a bit different from things like buffer reuse.
One possibility would indeed be to allow explicit flushing; and with 3.0 we could add a configurable upper limit on size (since we'll switch to builder-based construction, which will make it a bit more convenient to add new non-feature config options).

I think weak references may not work quite as well here (compared to buffer recycling), but the combination of a bounded cache and the ability to clear it seems like an improvement.

One thing to note, too, which may be obvious: simply dropping the ObjectMapper and constructing a new one will achieve clearing as well. I think -- although I am not 100% certain without testing -- that ObjectMapper.copy() achieves the same thing while retaining all configuration (including modules).
So in the meantime perhaps that could be a work-around.

As to how to expose the flush: I don't think it should be on SerializerProvider but on ObjectMapper itself. Although the provider has access to the cache, the operation conceptually belongs to the mapper and not to a context object (the role SerializerProvider plays, even though it's a bit of a strange factory/blueprint combo that creates instances).
But that's a relatively minor detail.
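
A rough sketch of the copy()-based work-around (untested, as noted; the holder class is purely illustrative):

```java
import com.fasterxml.jackson.databind.ObjectMapper;

// Callers go through get(), and recycle() swaps in a copy. As noted above
// (not verified), copy() should retain configuration and registered modules
// while the new instance starts with empty (de)serializer caches.
class MapperRecycler {
    private volatile ObjectMapper mapper = new ObjectMapper(); // configure + register modules here

    ObjectMapper get() {
        return mapper;
    }

    void recycle() {
        mapper = mapper.copy(); // old instance (and its caches) becomes collectable
    }
}
```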

@cowtowncoder
Member

Ok. So... first things first: bounded caching should be done, but for 3.0.

Second part, on exposing... after thinking about this a bit, I think the existing exposure should work. This is quite specialized functionality, and the intent of the limited SerializerProvider API is to expose only the methods needed for its primary function: accessing current contextual state, including serializers (it should really be named SerializationContext, but...).
I could consider renaming DefaultSerializerProvider to something else (it is not just the default implementation; it is the internal API for databind core).

Given this, I think the situation should remain as-is for 2.9: I don't want to add an API change that is not strictly needed, even though I understand it would be more convenient not to need the cast.

@LukeButters

Would you consider the possibility of letting the outside world define the implementation of the cache?
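
For example, something vaguely like this, purely hypothetical and just to illustrate the idea (no such extension point exists today; the names are made up):

```java
import com.fasterxml.jackson.databind.JavaType;
import com.fasterxml.jackson.databind.JsonSerializer;

// Hypothetical SPI the application could implement to control caching policy
// (bounding, eviction, instrumentation), instead of the built-in cache.
interface SerializerCacheProvider {
    JsonSerializer<Object> find(JavaType type);           // null if not cached
    void add(JavaType type, JsonSerializer<Object> ser);  // may bound size / evict
    void flush();                                         // drop all cached entries
}
```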

@cowtowncoder
Member

@LukeButtersFunnelback I am not dead set against such a possibility. There'd be many details to discuss wrt the API exposed for 3.0, but I am open to it, yes.

@mattsheppard
Author

Fair enough - if you are going to bound SerializerCache and DeserializerCache in 3.0, could TypeFactory._typeCache come along for the ride?

@mattsheppard
Author

Oh, yeah - re dropping and recreating the ObjectMapper - yes, I can see how that could be a workable approach. Unfortunately, in my real case Spring has passed it off to a lot of other places, which makes that tricky to actually do unless there's some Spring magic I'm unaware of.

Potentially I could try to wrap the ObjectMapper originally given to Spring with something that passes calls through and occasionally rebuilds the underlying mapper. That said, if there's a good prospect of it being fixed in 3.0 we can probably just wait for that.
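
A crude sketch of the sort of thing I mean (hypothetical class; it flushes the caches in place via the cast above rather than actually rebuilding the mapper, and only intercepts one write method for brevity):

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ser.DefaultSerializerProvider;
import java.util.concurrent.atomic.AtomicLong;

class SelfFlushingObjectMapper extends ObjectMapper {
    private final AtomicLong writes = new AtomicLong();
    private final long flushEvery;

    SelfFlushingObjectMapper(long flushEvery) {
        this.flushEvery = flushEvery;
    }

    @Override
    public String writeValueAsString(Object value) throws JsonProcessingException {
        if (writes.incrementAndGet() % flushEvery == 0) {
            // Same flush that seemed sufficient in my example project.
            ((DefaultSerializerProvider) getSerializerProvider()).flushCachedSerializers();
            getTypeFactory().clearCache();
        }
        return super.writeValueAsString(value);
    }
}
```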

@cowtowncoder
Member

@mattsheppard The TypeFactory cache is already limited to 200 types in 2.9, so I'm not sure further work would be needed there. Exposing its clear method would probably make sense if some "super-flush" were added to ObjectMapper: something that would try to get rid of all cached state. I think that actually makes some sense; some app servers would probably benefit wrt hot reloading, since ClassLoaders can apparently end up with dangling references otherwise.

Wrt dropping the ObjectMapper... true, that is problematic for indirect usage. The idea of a delegating mapper has been floating around: one open idea is a "safe" ObjectMapper that would not expose any checked exceptions (needed for Java 8 streams, for example). Something like that would be great, but it seemed difficult to pull off with 2.x.
A more general ability to wrap might be useful, if there's a natural way to achieve it.

With Jackson 3.x I think the idea of components that are both more configurable and replaceable can help.
One challenge wrt the internal caches is that they are rather optimized for a specific usage pattern: the serializer cache, for example, has both a "private, zero-lock" read-only cache and a "shared" one.
This is optimized for the common case where the set of cached serializers converges to a steady set in which nothing changes: once it does, lookups are served from a read-only immutable snapshot with no locking, while the "shared" map (a poor name, since it is really the shared mutable one) is only consulted on misses and adds lock overhead.
So exposing an API for that gets tricky, considering that the cache also has to manage the actual construction and initialization of serializers.
For deserializers, handling is simpler just because deserializer access is much more static: a single root-level deserializer generally fetches all dependent deserializers, and lookup performance is not as dominant.
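
To sketch the shape of that two-level design (illustrative only, not the actual implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Not Jackson's real classes: just the pattern described above -- a lock-free
// immutable snapshot for reads, backed by a synchronized shared-mutable map
// that is consulted on misses and republishes the snapshot on additions.
class TwoLevelCache<K, V> {
    private final Map<K, V> shared = new HashMap<>();       // guarded by 'this'
    private volatile Map<K, V> readOnly = new HashMap<>();  // immutable snapshot

    V find(K key) {
        V v = readOnly.get(key);      // common case: no locking once the set converges
        if (v != null) {
            return v;
        }
        synchronized (this) {         // miss: fall back to the shared map under lock
            return shared.get(key);
        }
    }

    synchronized void add(K key, V value) {
        shared.put(key, value);
        readOnly = new HashMap<>(shared); // snapshot stops churning once steady
    }

    synchronized void flush() {
        shared.clear();
        readOnly = new HashMap<>();
    }
}
```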

@mattsheppard
Author

TypeFactory - Ah, so it is - not sure how I missed that. Thanks.
