Random Deserialization Problem with fst 2.57 #270
Comments
awaiting your test case. Meanwhile try disabling the use of unsafe to at least get an exception instead of a segfault. Penalty is 10-20% performance loss
@RuedigerMoeller I updated the initial bug description and added a link to the test case. If you need more information let me know :)
Have you tried to see what happens if you preregister each of the classes? Additionally, preregistering the classes would significantly reduce the size of your serialized data so it would be advised anyway.
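For illustration, a minimal sketch of what preregistering classes on a shared FSTConfiguration could look like (the Session class below is only a placeholder, not a type from this project; as far as I know both the writing and the reading side have to register the same classes in the same order):

import java.io.Serializable;
import org.nustaq.serialization.FSTConfiguration;

public class SerializerHolder {
    // placeholder type standing in for the application's own session classes
    public static class Session implements Serializable { String id; }

    // one shared, preconfigured instance for the whole application
    static final FSTConfiguration CONF = FSTConfiguration.createDefaultConfiguration();

    static {
        // registering up front lets FST write a short id instead of the full class name
        CONF.registerClass( Session.class );
    }
}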
I tried to preregister the classes in my test case. It makes no difference. But I think it is too late to register them when deserializing a stream that was written without registered classes. The problem is that it only happens from time to time in our production system. There is too little traffic in the test system, so we can't easily test whether preregistering classes makes a difference. I hope the test case, as it is, will help someone with a deep understanding of the library to find the problem :)
We might be seeing something similar. We're still quite early in the debugging process, so not really sure which part of the stack to blame just yet. At the moment, we don't have any way of reliably reproducing our problem. Anyway, we're using fst in combination with netty for RPC. No unsafe (…).
This is what our deserialization looks like:
class FstDecoder extends io.netty.handler.codec.LengthFieldBasedFrameDecoder {
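// fstConfiguration is a shared org.nustaq.serialization.FSTConfiguration instance (its declaration is not part of this snippet)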
FstDecoder( int maxObjectSize ) {
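// LengthFieldBasedFrameDecoder( maxFrameLength, lengthFieldOffset, lengthFieldLength, lengthAdjustment, initialBytesToStrip ):
// frames carry a 4-byte length prefix, which is stripped before decode() sees the frame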
super( maxObjectSize, 0, 4, 0, 4 );
}
@Override
protected Object decode( io.netty.channel.ChannelHandlerContext ctx,
io.netty.buffer.ByteBuf in ) throws Exception {
io.netty.buffer.ByteBuf frame = (io.netty.buffer.ByteBuf)super.decode( ctx, in );
if ( frame == null ) {
return null;
}
try {
byte[] bytes = ByteBufUtil.getBytes( frame );
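// note: getObjectInput() returns an FSTObjectInput instance that the configuration reuses across calls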
return fstConfiguration.getObjectInput( bytes ).readObject();
}
finally {
frame.release();
}
}
}
And this is the error we're seeing:
See #235, we discovered that issue and it seems to be our root cause.
@furti We have debugged our issue at great length and concluded that it seems to be related to our use of the G1GC garbage collector. Do you happen to be running it in production?
Currently we use OpenJDK 8 without setting the GC explicitly. When the error occurred we used Oracle JRE 8, also without changing the GC. So I assume we don't have G1GC enabled. I also checked locally by explicitly starting Eclipse with a different GC. The test program also has no GC set and runs on OpenJDK 8, but the error still occurs. So I think our problem has nothing to do with the GC. But I could be wrong.
@furti Ok, maybe these are two different issues. For reference, my test repro is here: https://github.com/perlun/fst-concurrency-issue - it would be interesting to see if this scenario fails for you without tweaking the GC setting (it can be changed in the repro). With the default GC, I have not seen this test fail on Java 8, so it would be very interesting to see how it behaves for you.
Do you have string deduplication turned on as well as G1GC? We're seeing a concurrency issue ("failed to read next byte") under heavy load which has been mentioned in a few other issues - and all of them mention the G1GC strategy. I'm wondering if people also have string deduplication turned on, which requires G1GC.
(For the moment, we are staying away from FST because of the severity of this bug. We cannot have the JVM crashing randomly; it's better to live with a few % lower performance and use another serialization library until this gets properly resolved on the FST side.)
To my understanding this concurrency issue only occurs when you reuse the streams as specified in the doco for improved speed. Does that mean it is still possible to use FST concurrently so long as we don't take advantage of the reusable stream objects?
@chrisco484 Unfortunately, no. If you look at this code in my repro repository, we instantiate new streams for each iteration in the loop. Even with this approach, we run into concurrency issues. If you find a way to work around it, please let me know.
for ( int i = 0; i < 1_000_000; i++ ) {
futures.add( pool.submit( () -> {
ByteArrayOutputStream stream = new ByteArrayOutputStream();
try {
writeObject( testObject, stream );
readObject( new ByteArrayInputStream( stream.toByteArray() ) );
}
catch ( Exception e ) {
System.err.println( "ERROR" + e );
throw new RuntimeException( e );
}
} ) );
}
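For context, roughly what the writeObject/readObject helpers in that repro boil down to, sketched here against a single shared FSTConfiguration (names and details are illustrative, not the exact repro code):

static final org.nustaq.serialization.FSTConfiguration conf =
        org.nustaq.serialization.FSTConfiguration.createDefaultConfiguration();

static void writeObject( Object obj, java.io.OutputStream out ) throws java.io.IOException {
    // getObjectOutput() hands back an FSTObjectOutput that is owned and reused by the configuration
    org.nustaq.serialization.FSTObjectOutput fstOut = conf.getObjectOutput( out );
    fstOut.writeObject( obj );
    fstOut.flush(); // flush, but do not close the reused stream
}

static Object readObject( java.io.InputStream in ) throws Exception {
    // getObjectInput() likewise returns a reused FSTObjectInput
    return conf.getObjectInput( in ).readObject();
}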
Oh wow, that's a worry. I see your writeObject and readObject methods are creating completely new instances and yet there's still a concurrency issue! That shared FSTConfiguration object must be sharing objects between the threads in a "not totally threadsafe" way :)
I think I just found something extra that I'm tempting fate with: in my app a library uses FST and so does the main app. They both instantiate separate FSTConfiguration objects, and the various threads would be touching both. That might complicate the threading semantics even more, because FSTConfiguration may be written assuming that there is only ever ONE FSTConfiguration instance within any running app, and possibly its thread locals assume complete monopolization of their resources; if there is another FSTConfiguration and it has been touched by the same thread, that could have serious consequences.
I just checked the source code: when I have multiple libraries that need to use FST, I think I need to use FSTConfiguration.getDefaultConfiguration() rather than calling FSTConfiguration.createDefaultConfiguration(). With the latter, each library creates its own config instance (which may not be good in a multi-threaded environment - I've seen byte streams with completely wrong data, data produced by a different library running in the same JVM - weird). From the source code below, if I use getDefaultConfiguration instead, the JVM will only end up with a single FSTConfiguration instance, which is probably how it's designed to work.
static AtomicBoolean conflock = new AtomicBoolean(false);
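For reference, the two factory methods being compared (a minimal sketch; whether sharing the default instance actually avoids the problem is only my hypothesis):

// createDefaultConfiguration(): every call builds a fresh, independent configuration,
// so two libraries calling it end up with two separate instances in the same JVM
org.nustaq.serialization.FSTConfiguration perLibraryConf =
        org.nustaq.serialization.FSTConfiguration.createDefaultConfiguration();

// getDefaultConfiguration(): lazily creates one process-wide configuration
// and hands back the same instance on every call
org.nustaq.serialization.FSTConfiguration sharedConf =
        org.nustaq.serialization.FSTConfiguration.getDefaultConfiguration();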
This caused me grey hair. It was failing in prod on a distributed, highly concurrent system where computations communicate through Redisson, where FST is the default codec. We had to switch to another codec.
@tlamr Which one did you switch to? We're also evaluating alternatives to FST because, even though the concept and performance are awesome, we just can't have a crucial tech component that fails under highly concurrent loads, and it seems no fix is in sight given the time that has elapsed since these issues were first raised.
I switched to the std Java serializer, since it worked out of the box for the whole app and we needed an ASAP fix. It seems that we would need to make some changes to be able to use other codecs (kryo4 wanted no-arg constructors, the JSON-based ones also wanted some additional things, ...). I plan to investigate it further, but so far I haven't dug into it. @chrisco484 How is the evaluation of alternatives going? Any preference for some other codec? I fully agree with your view on correctness and speed; we are in the very same situation, it seems :)
@tlamr The big attraction of FST was that its performance compared to the std Java serializer was so much better: the streams are tiny and serialization/deserialization is soooo much faster. Combine this with the fact that your standard Java classes work "out of the box" with FST and it was a real winner.
So far in our evaluations we haven't found any other solution (other than the std Java serializer) that doesn't require changes to our code base or the addition of new "meta data" files to make these other codecs work. So we're stuck between a rock and a hard place, just hoping that FST will get a fix soon-ish...
What about version
We moved to kryo.
For reference, kryo is here: https://github.com/EsotericSoftware/kryo
@furti Was this problem solved? We are facing a similar issue and String.equals causes occasional crashes. Kryo is not a drop-in replacement for the Java serializer.
Hello all, I have a similar problem, and maybe it's not a problem of concurrency. My project contains 3 serialisable object instances (data of an internal knowledge base). Sometimes they deserialise with the wrong types. My workaround is to use a new instance for each call. Part of my code:
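A minimal sketch of that kind of workaround, assuming it means building a fresh FSTConfiguration for every call (slower and more memory-hungry, but nothing is shared between calls):

static Object deserialize( byte[] bytes ) throws Exception {
    // fresh configuration per call: no stream reuse, no state shared between threads
    org.nustaq.serialization.FSTConfiguration conf =
            org.nustaq.serialization.FSTConfiguration.createDefaultConfiguration();
    return conf.getObjectInput( bytes ).readObject();
}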
Should be fixed by #311 . Please check. |
Using a configuration for each serialize call works; however, it can be quite slow and memory-intensive. So I tried using one per thread and ran into issues. My serializer is on one system and my deserializer is on a different system. Is it possible that FST is caching information in a different order on the two systems and therefore causing the configurations to be out of sync?
Did you check if the fix in #311 works? |
I'm still on Java 11, so I can't test any of the new releases. Note that I have a configuration per thread, so there is no concurrency issue. |
The patch should be a trivial backport to any release.
Sure, OK. |
We have been using fast-serialization in our application for over a year now and it worked like a charm. But with the version deployed this week, we switched from 2.47 to 2.57. Everything worked fine in our QA environment. But in production things got weird.
Basically we use the library to serialize and deserialize sessions to and from a relational DB. In PROD, deserialization produces strange results in some cases. Mostly, a java.lang.String.equals call failed inside the String class at line 982:
int n = value.length;
The internal value field of the String was null. This should never happen, except when you mess up the String with reflection, and we were pretty sure we don't do that ;) It went so far that the JVM terminated with a segmentation fault when such malformed data was accessed.
I wrote a small program that checked every session in the database. And there were indeed some sessions that failed. I removed them and let the program run again. But wait... now other sessions, which were valid in the first run, failed in the second run. But the binary data was exactly the same 😲
Now we had the idea to try it with the older version 2.47 we used before. And what can I say: the same sessions, written with version 2.57, could be read with version 2.47 without any problem. So the problem must lie in the deserialization logic of fst 2.57.
I have now tracked it down to a set of 3 session objects. When they are deserialized in a specific order, the deserialization does pretty weird stuff for the last session object. Most noticeably, it sets a field of type "String" to a List of other objects.
Some things to note:
private static final ThreadLocal<DefaultCoder> CODERS = ThreadLocal.withInitial(() -> new DefaultCoder());
But my test case shows that it also happens in a simple main class without any threads, and also with FSTObjectInput instances obtained from the FSTConfiguration. As long as you reuse an instance, it might happen from time to time.
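For context, this is roughly how such a thread-local coder gets used; only the declaration is quoted above, so the calls below are an assumption:

// each thread serializes and deserializes through its own DefaultCoder instance
byte[] bytes = CODERS.get().toByteArray( session );   // 'session' stands in for one of our session objects
Object restored = CODERS.get().toObject( bytes );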
TestCase
I have a test case ready. But I have to make sure first that I'm allowed to upload it to GitHub, as it contains code written for our project at work. I will update this issue as soon as I have uploaded the test case.
The test case including a description is available at https://github.com/furti/fst-deserialization-problem
But maybe someone has the same problem or has an idea what could be the cause of this strange behaviour?
I'm pretty sure it must be something in the library. But it could also be something we do wrong. So excuse me if that is the case :)