Serialization
I constantly see people tinkering with ByteArrayOutputStreams and the like (very inefficient), so here is the simplest way of using fast-serialization (threadsafe):
static FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration();
// maybe register most frequently used classes on conf
[...]
// write
byte barray[] = conf.asByteArray(mySerializableObject);
[...]
// read
MyObject object = (MyObject)conf.asObject(barray);
If you stream over a network, you need to ensure you have read a full object on the receiver side. Therefore first write the length of the object, then the object itself, like:
// write
byte barray[] = conf.asByteArray(mySerializableObject);
stream.writeInt(barray.length);
stream.write(barray);
[..flush..]
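// read ('in' is a DataInputStream on the receiver side)
int len = in.readInt();
byte buffer[] = new byte[len]; // this could be reused !
while (len > 0) {
    int read = in.read(buffer, buffer.length - len, len);
    if (read < 0) // stream closed before the full object arrived
        throw new EOFException("stream closed while reading object");
    len -= read;
}
Object readObj = conf.getObjectInput(buffer).readObject();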
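The length-prefix idea above can also be sketched with JDK classes only. This is a minimal sketch, not FST code: the class `LengthPrefixedFraming` and its two methods are hypothetical names, and it relies on `DataInputStream.readFully`, which loops internally and throws `EOFException` if the stream ends early.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of FST: frames opaque byte arrays with a 4-byte length prefix.
final class LengthPrefixedFraming {

    // Writes the payload length as a big-endian int, followed by the payload itself.
    static void writeFrame(OutputStream rawOut, byte[] payload) throws IOException {
        DataOutputStream out = new DataOutputStream(rawOut);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Reads exactly one frame. readFully blocks until all bytes have arrived
    // and throws EOFException if the stream is closed before that.
    static byte[] readFrame(InputStream rawIn) throws IOException {
        DataInputStream in = new DataInputStream(rawIn);
        int len = in.readInt();
        if (len < 0) {
            throw new IOException("negative frame length: " + len);
        }
        byte[] buffer = new byte[len];
        in.readFully(buffer);
        return buffer;
    }
}
```

With FST, the payload passed to `writeFrame` would be the result of `conf.asByteArray(...)`, and the array returned by `readFrame` would go to `conf.asObject(...)`.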
See (or use) TCPObjectSocket for a networking example.
Basically you just replace ObjectOutputStream and ObjectInputStream with FSTObjectOutput and FSTObjectInput.
public MyClass myreadMethod( InputStream stream ) throws IOException, ClassNotFoundException
{
    FSTObjectInput in = new FSTObjectInput(stream);
    MyClass result = (MyClass)in.readObject();
    in.close(); // required !
    return result;
}
public void mywriteMethod( OutputStream stream, MyClass toWrite ) throws IOException
{
    FSTObjectOutput out = new FSTObjectOutput(stream);
    out.writeObject( toWrite );
    out.close(); // required !
}
If you know the type of the Object (this saves some bytes for the class name of the initial Object), you can do:
public MyClass myreadMethod(InputStream stream) throws IOException, ClassNotFoundException
{
    FSTObjectInput in = new FSTObjectInput(stream);
    MyClass result = in.readObject(MyClass.class);
    in.close();
    return result;
}
public void mywriteMethod( OutputStream stream, MyClass toWrite ) throws IOException
{
    FSTObjectOutput out = new FSTObjectOutput(stream);
    out.writeObject( toWrite, MyClass.class );
    out.close();
}
Note
If you write with a type, you also have to read with the same type.
Note
If you create a new stream instance for each serialization, you should close the FST stream, because behind the scenes some data structures are cached and reused. If you don't, you might observe a performance hit (too much object creation going on), especially if you encode lots of smallish objects.
In order to optimize object reuse and thread safety, FSTConfiguration provides two simple factory methods to obtain input/output stream instances (they are stored in a thread local):
...
// ! reuse this Object, it caches metadata. Performance degrades massively
// if you create a new Configuration Object with each serialization !
static FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration();
...
public MyClass myreadMethod(InputStream stream) throws IOException, ClassNotFoundException
{
    FSTObjectInput in = conf.getObjectInput(stream);
    MyClass result = in.readObject(MyClass.class);
    // DON'T: in.close(); here prevents reuse and will result in an exception
    stream.close();
    return result;
}
public void mywriteMethod( OutputStream stream, MyClass toWrite ) throws IOException
{
    FSTObjectOutput out = conf.getObjectOutput(stream);
    out.writeObject( toWrite, MyClass.class );
    // DON'T out.close() when using factory method;
    out.flush();
    stream.close();
}
This will create and reuse a single FSTIn/OutputStream instance per thread, which implies you should not save references to streams returned from these methods. You can also use a global FSTConfiguration throughout your app via FSTConfiguration.getDefaultConfiguration() (that's what the configuration-free constructors do).
Note
FSTObjectIn/Output are not threadsafe; only one thread can read/write at a time. FSTConfiguration (holding class metadata) is threadsafe and can be shared application-wide.
Under high-load multithreaded en/decoding, FSTConfiguration's internal locks may become a bottleneck. In that case use one FSTConfiguration per thread, e.g. using thread locals like:
static ThreadLocal<FSTConfiguration> conf = new ThreadLocal<FSTConfiguration>() {
    @Override
    public FSTConfiguration initialValue() {
        return FSTConfiguration.createDefaultConfiguration();
    }
};
// *NOT* threadsafe. Use ThreadLocal in case
DefaultCoder coder = new DefaultCoder(); // reuse this (per thread)
byte serialized[] = coder.toByteArray( someObject );
Object deserialized = coder.toObject( serialized );
There are methods to serialize directly to an existing byte array and to direct memory addresses (OffHeapCoder). Check out the different XXCoder classes.
To improve speed, consider preregistering the most frequently serialized classes like:
DefaultCoder coder = new DefaultCoder(true,
    Car.class, CarBench.Engine.class,
    CarBench.Model.class,
    CarBench.Accel.class, CarBench.PerformanceFigures.class,
    CarBench.FueldData.class, CarBench.OptionalExtras.class);
If you don't need cycle detection or restoring of references inside the object graph, set 'shared' to false in the constructor. This improves performance significantly.
FSTConfiguration defines the encoders/decoders used during serialization. Usually you just create one global singleton (instantiating this class is very expensive). Using several distinct configurations is for special use cases that require some in-depth knowledge of the FST code. You will probably never need more than this one default instance.
e.g.
public class MyApplication {
    static FSTConfiguration singletonConf = FSTConfiguration.createDefaultConfiguration();
    public static FSTConfiguration getInstance() {
        return singletonConf;
    }
}
You can customize the FSTConfiguration returned by createDefaultConfiguration(), e.g. register new or different serializers, add hooks, or set additional flags on defined serializers. Just have a look at the source; there are also some utility methods such as asByteArray(Serializable object).
Resolution Order
- If a serializer is registered, it is used
- Check for the Externalizable interface
- Check for JDK special methods (e.g. writeObject/readObject/readResolve/writeReplace). If found, use compatibility mode for that class (=> slow, avoid)
- Use the default FST serialization implementation
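The order listed above can be illustrated with plain JDK reflection. This is a hypothetical sketch, not FST's actual lookup code: `ResolutionOrderSketch`, its `Strategy` enum and the sample classes are made up for illustration, and the JDK-hook check is simplified to look only for a `writeObject(ObjectOutputStream)` method declared directly on the class.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Set;

// Hypothetical illustration, not FST code: picks a strategy in the order listed above.
final class ResolutionOrderSketch {

    enum Strategy { REGISTERED_SERIALIZER, EXTERNALIZABLE, JDK_COMPATIBILITY, DEFAULT }

    static Strategy resolve(Class<?> type, Set<Class<?>> registered) {
        if (registered.contains(type)) return Strategy.REGISTERED_SERIALIZER;
        if (Externalizable.class.isAssignableFrom(type)) return Strategy.EXTERNALIZABLE;
        if (declaresJdkHook(type)) return Strategy.JDK_COMPATIBILITY;
        return Strategy.DEFAULT;
    }

    // Simplified check: only the writeObject hook, and only on the class itself.
    static boolean declaresJdkHook(Class<?> type) {
        try {
            type.getDeclaredMethod("writeObject", ObjectOutputStream.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Sample classes used to exercise each branch.
    static class Plain implements Serializable { int x; }

    static class WithHook implements Serializable {
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
        }
    }

    static class Ext implements Externalizable {
        public void writeExternal(ObjectOutput out) { }
        public void readExternal(ObjectInput in) { }
    }
}
```

The point of the sketch is the ordering: a registered serializer wins even for a class such as `java.util.Date` that declares its own `writeObject` hook.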
Note: use of configurations must always be symmetric, so sender and receiver have to use exactly identical configurations (including preregistered classes, etc.).
// the default to go for
FSTConfiguration.createDefaultConfiguration();
// android compatible. use this for android client/servers
FSTConfiguration.createAndroidDefaultConfiguration();
// fastest, but uses Unsafe class
// to write primitives and arrays.
FSTConfiguration.createFastBinaryConfiguration();
// (2.29 and higher) reads/writes a Json representation.
// Uses jackson-core for raw Json parsing/generation
FSTConfiguration.createJsonConfiguration();
// reads/writes MinBin (~binary Json) format.
// can be read+pretty printed using the minbin package
// (also with minbin.js)
// Original classes are not required for decoding
FSTConfiguration.createMinBinConfiguration();
// Default singleton instance used if FSTConfiguration is
// omitted in Stream Constructor
FSTConfiguration.getDefaultConfiguration();
One easy and important optimization is to register the classes your application is sure to serialize at the FSTConfiguration object. This way FST can avoid writing class names.
final FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration();
conf.registerClass(Image.class,Media.class,MediaContent.class,Image.Size.class,Media.Player.class);
It is usually easy to figure out the most frequently serialized classes and register them at application startup. Especially for short messages/small objects, having to write fully qualified class names hampers performance. That said, FST writes a class name only once per stream.
Note
Reader and writer configuration should be identical. Even the order of class registration matters.
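To see why registration order can matter, here is a toy registry. It is purely hypothetical and not FST's implementation: I am assuming, for the sake of illustration, that a registered class is replaced on the wire by a compact id derived from the order of registration. `ToyClassRegistry` and its methods are invented names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy registry, not FST's implementation: ids follow registration order.
final class ToyClassRegistry {
    private final Map<Class<?>, Integer> idByClass = new HashMap<>();
    private final List<Class<?>> classById = new ArrayList<>();

    // Assigns the next free id to each class that is not registered yet.
    void registerClass(Class<?>... classes) {
        for (Class<?> c : classes) {
            if (!idByClass.containsKey(c)) {
                idByClass.put(c, classById.size());
                classById.add(c);
            }
        }
    }

    // Writer side: the compact id written instead of the class name.
    int idOf(Class<?> c) {
        Integer id = idByClass.get(c);
        if (id == null) {
            throw new IllegalArgumentException("not registered: " + c.getName());
        }
        return id;
    }

    // Reader side: maps a received id back to a class.
    Class<?> classOf(int id) {
        return classById.get(id);
    }
}
```

Under that assumption, a reader that registers the same classes in a different order maps the writer's id to the wrong class, which is the kind of mismatch the note warns about.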
In case you just want to serialize plain cycle-free message objects as fast as possible, you can use the FSTObjectIn/OutputNoShared subclasses. Omitting cycle detection and detection of identical objects allows cutting corners in many places. You might also want to preregister all message classes. The performance difference can be up to 40%.
Note
Set FSTConfiguration.setShareReferences() to false. When mixing shared/unshared mode in a single application, create two instances of FSTConfiguration.
The unshared versions do not support JDK serialization methods such as readResolve/writeObject/readObject. If you need these to be honored, use regular FSTObjectInput/Output streams with a configuration that has setShareReferences(false). This will still yield a reasonable performance improvement.
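What "detection of identical objects" buys you can be shown with plain JDK serialization, which always tracks references. This is a sketch using `ObjectOutputStream`/`ObjectInputStream` rather than FST, and `SharedReferenceDemo` is a made-up class name: two references to the same array come back as references to one restored instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Uses plain JDK serialization (not FST) to show what tracking of identical objects means.
final class SharedReferenceDemo {

    // Serializes and deserializes a value; ObjectOutputStream tracks object identity.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    // Returns true if both slots of the restored array point to one and the same instance.
    static boolean identityPreserved() throws IOException, ClassNotFoundException {
        int[] sharedValue = {1, 2, 3};
        Object[] graph = {sharedValue, sharedValue};
        Object[] restored = (Object[]) roundTrip(graph);
        return restored[0] == restored[1];
    }
}
```

Unshared mode skips this bookkeeping, which is why it is only suitable for cycle-free graphs where restoring such identities does not matter.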
The encoded objects are written to the underlying stream once you close/flush the FSTObjectOutput. Conversely, FSTObjectInput reads the underlying stream in chunks before it starts decoding. This means you cannot read directly from blocking streams (e.g. as returned by a Socket). See the example on how to solve this.
I know of users still preferring FST for very large object graphs. Maximum size is then determined by int index, so an object graph has a max size of ~1.5 GB.
WARNING: only applicable with unshared streams and full preregistration of all classes possibly referenced in a skipped subgraph. See issue #75.
There are scenarios (e.g. when using multicast), where a receiver conditionally wants to skip decoding parts of a received Object in order to save CPU time. With FST one can achieve that using the @Conditional annotation.
class ConditionalExample {
    int messagetype;
    @Conditional
    BigObject aBigObject;
    ...
}
When you read the Object, do the following:
FSTObjectInput.ConditionalCallback conditionalCallback = new FSTObjectInput.ConditionalCallback() {
    @Override
    public boolean shouldSkip(Object halfDecoded, int streamPosition, Field field) {
        return ((ConditionalExample)halfDecoded).messagetype != 13;
    }
};
...
...
FSTObjectInput fstin = new FSTObjectInput(instream, conf);
fstin.setConditionalCallback(conditionalCallback);
Object res = fstin.readObject(cl);
The FSTObjectInput will deserialize all primitive fields of ConditionalExample, then call 'shouldSkip', passing in the partially deserialized Object. If the shouldSkip method returns false, the @Conditional reference will be decoded and set; otherwise it will be skipped.
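The general idea of "decode the cheap header, then decide whether to decode the expensive part" can be sketched without FST. Everything here is hypothetical: the class `ConditionalDecodeSketch` and the message layout (type, payload length, payload bytes) are my own assumptions for illustration and say nothing about how FST encodes @Conditional fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.function.IntPredicate;

// Hypothetical toy format, not FST's: [int messagetype][int payloadLength][payload bytes].
final class ConditionalDecodeSketch {

    static byte[] encode(int messageType, String bigPayload) throws IOException {
        byte[] payload = bigPayload.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(messageType);
        out.writeInt(payload.length);
        out.write(payload);
        return bytes.toByteArray();
    }

    // Decodes the header first, then asks shouldSkip whether to decode the payload.
    // Returns null when the payload was skipped.
    static String decodePayload(byte[] message, IntPredicate shouldSkip) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
        int messageType = in.readInt();
        int length = in.readInt();
        if (shouldSkip.test(messageType)) {
            in.skipBytes(length); // jump over the payload without decoding it
            return null;
        }
        byte[] payload = new byte[length];
        in.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

Calling `decodePayload(msg, type -> type != 13)` mirrors the callback above: only messages of type 13 have their payload decoded.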
By default FST falls back to the methods defined by the JDK. Especially if private methods like 'writeObject' are involved, performance suffers because reflection must be used. Additionally, some stock JDK classes serialize poorly in terms of size and speed. The FST default configuration already registers serializers for common classes (popular Collections and some other frequently used classes).
So if you have trouble with stock JDK serialization speed/efficiency, you might want to register a piece of custom code defining how to read and write an object of a specific class.
The basic interface to define the serialization of an Object is FSTObjectSerializer. However, in most cases you'll use a subclass of FSTBasicObjectSerializer.
The FSTDateSerializer delivered with FST (note the registration in the instantiate method; you need to do it if you instantiate the object yourself):
public class FSTDateSerializer extends FSTBasicObjectSerializer {
    @Override
    public void writeObject(FSTObjectOutput out, Object toWrite, FSTClazzInfo clzInfo, FSTClazzInfo.FSTFieldInfo referencedBy)
    {
        out.writeFLong(((Date)toWrite).getTime());
    }
    @Override
    public void readObject(FSTObjectInput in, Object toRead, FSTClazzInfo clzInfo, FSTClazzInfo.FSTFieldInfo referencedBy)
    {
    }
    @Override
    public Object instantiate(Class objectClass, FSTObjectInput in, FSTClazzInfo serializationInfo, FSTClazzInfo.FSTFieldInfo referencee, int streamPosition)
    {
        Object res = new Date(in.readFLong());
        in.registerObject(res,streamPosition,serializationInfo);
        return res;
    }
}
Typically you will override just the read and write methods and let FST manage object instantiation. However, sometimes it's handy to override the instantiate method. This allows replacing classes at read time, returning a singleton, or doing string deduplication.
A serializer is registered at the FSTConfiguration object:
static {
    ...
    conf = FSTConfiguration.createDefaultConfiguration();
    conf.registerSerializer(Date.class, new FSTDateSerializer(), false);
    ...
}
(Of course you have to use exactly this configuration later on in the FSTObjectIn/OutputStream.)
Note
Having 3 methods (read, write, instantiate) allows reading from the stream before creating the object (e.g. to decide which class to create). The common case is to just override and implement read/write; however, there are cases where read is empty and the full object is created and read in the instantiate method.
Note: some annotations have been removed in 2.x for the sake of maintainability.
FST defines a set of annotations influencing en/decoding of objects at runtime.
For tiny object graphs, a performance gain can be achieved by turning off tracking of identical objects inside the object graph. However, this requires all serialized objects to be cycle-free (most useful when serializing arguments, e.g. marshalling RPCs). Also, stock JDK serialization mechanisms (e.g. readObject/writeObject) are not supported when using the Unshared stream variants (FSTObjectOutputUnshared).
See the Serialization page.
@OneOf is applicable to String references. Often Strings contain de-facto enumerations but are not declared as such. To avoid transmitting constant Strings, one can define up to 254 constants at a field reference. If at serialization time the field contains one of the constants, only one byte is written instead of the full string. If the field contains a different string, that string is transmitted. So the list can be incomplete (e.g. just denote frequent values or default values).
...
@OneOf({"EUR", "DOLLAR", "YEN"})
String currency;
...
@OneOf({"BB-", "CC-"})
Object rating;
...
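The space saving described for @OneOf can be sketched with a toy codec. This is a hypothetical illustration, not FST's wire format: `OneOfSketch`, the escape value 255 and the use of `writeUTF` for non-constant strings are all my assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical toy codec, not FST's format: one byte for known constants, full string otherwise.
final class OneOfSketch {
    private static final int ESCAPE = 255; // assumed marker meaning "a full string follows"
    private final String[] constants;

    OneOfSketch(String... constants) {
        if (constants.length > 254) {
            throw new IllegalArgumentException("at most 254 constants");
        }
        this.constants = constants.clone();
    }

    // Writes the constant's index as a single byte, or the escape byte plus the string.
    byte[] encode(String value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        int index = indexOf(value);
        if (index >= 0) {
            out.writeByte(index);
        } else {
            out.writeByte(ESCAPE);
            out.writeUTF(value);
        }
        return bytes.toByteArray();
    }

    String decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int first = in.readUnsignedByte();
        return first == ESCAPE ? in.readUTF() : constants[first];
    }

    private int indexOf(String value) {
        for (int i = 0; i < constants.length; i++) {
            if (constants[i].equals(value)) return i;
        }
        return -1;
    }
}
```

With `new OneOfSketch("EUR", "DOLLAR", "YEN")`, a listed value such as "YEN" encodes to a single byte, while an unlisted value still round-trips at the cost of the full string.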
@Flat is a pure CPU-saving annotation. If a field referencing some Object is marked as @Flat, FST will not make any effort to reuse this Object. E.g. if a second reference to this Object exists, it will not be restored but will contain a copy after deserialization. However, if you know in advance that there are no identical objects in the object graph, you can save some CPU time. The effect is pretty small, but in case you want to save every nanosecond possible consider this ;). If a class is marked as @Flat, no instance of this class will be checked for equality or identity. Using this annotation can be important when you serialize small objects, as is typical in remoting or offheap memory.
@Predict can be used at class or field level. At class level, the classes contained in the Predict array are added to the list of known classes, so no full class names for those classes are written to the serialization stream.
If used at field level, it can save some bytes in case of loose typing. E.g. if you have a reference typed 'Entity' and you know it is most likely an 'Employee' or 'Employer', you might add @Predict({Employee.class,Employer.class}) to that field. This way only one byte is used to denote the class of the Object. This also works for fields containing loosely typed Collections.
Since FST does a pretty decent job minimizing the size of class names, this optimization makes sense only if you write very short objects, where the transmission of a classname might require more space than the object itself.
Add fields to classes without breaking compatibility to streams written with prior versions. See javadoc of the @Version annotation and related test cases.
Warning: versioning enables newer code to read objects written by an older version, not vice versa.