Documentation for 1.x is archived here.
FST 1.x is somewhat faster; 2.x pays a small price for a better overall abstraction and additional features.
Changes from 1.x to 2.x:
- Renamed package :-)
- Removed some old classes (rarely used features such as the OffHeap/Compressed Objects classes) and some annotations and flags to make FST easier to maintain.
- Added limited versioning support.
- KSon: easy text-to-object mapping with an extension of JSON (for config files and test data).
- New implementation of off-heap support: an easy-to-use off-heap Map and persistent Maps (based on memory-mapped files). The focus is on convenience, ease of use and fast iteration (leveraging fast-serialization for that).
- MinBin binary codec to enable cross-platform serialization (currently only a JavaScript reader is implemented). It also enables reading serialized streams without needing the original classes.
- Cleaned up the test mess.
## How to use Serialization
Basically, you just replace ObjectOutputStream and ObjectInputStream with FSTObjectOutput and FSTObjectInput.
```java
public MyClass myreadMethod(InputStream stream) throws IOException, ClassNotFoundException
{
    FSTObjectInput in = new FSTObjectInput(stream);
    MyClass result = (MyClass) in.readObject();
    in.close(); // required!
    return result;
}

public void mywriteMethod(OutputStream stream, MyClass toWrite) throws IOException
{
    FSTObjectOutput out = new FSTObjectOutput(stream);
    out.writeObject(toWrite);
    out.close(); // required!
}
```
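For comparison, here is a minimal JDK-only sketch of the same write/read round trip using the stock ObjectOutputStream/ObjectInputStream, i.e. the classes that FSTObjectOutput/FSTObjectInput stand in for. The class JdkRoundTrip and its helpers are hypothetical; per the text above, switching to FST means swapping the two stream types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper showing the stock JDK pattern that FST's streams replace.
class JdkRoundTrip {

    // Serializes any Serializable object into a byte array.
    static byte[] write(Serializable toWrite) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(toWrite);
        } // closing flushes any buffered data into 'bytes'
        return bytes.toByteArray();
    }

    // Reads the object back; the caller casts to the expected type.
    static Object read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```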
If you know the type of the object, you can pass it explicitly (this saves the bytes for the class name of the initial object):
```java
public MyClass myreadMethod(InputStream stream) throws Exception
{
    FSTObjectInput in = new FSTObjectInput(stream);
    // readObject(Class...) returns Object, so cast to the expected type
    MyClass result = (MyClass) in.readObject(MyClass.class);
    in.close();
    return result;
}

public void mywriteMethod(OutputStream stream, MyClass toWrite) throws IOException
{
    FSTObjectOutput out = new FSTObjectOutput(stream);
    out.writeObject(toWrite, MyClass.class);
    out.close();
}
```
**Note:** If you write with a type, you also have to read with the same type.
**Note:** If you create a new stream instance for each serialization, you should close it, because some data structures are cached and reused behind the scenes. If you don't, you might observe a performance hit (too much object creation), especially if you encode lots of small objects.
To optimize object reuse and thread safety, FSTConfiguration provides two simple factory methods to obtain input/output stream instances (they are stored thread-locally):
```java
...
// ! Reuse this object, it caches metadata. Performance degrades massively
// if you create a new configuration object with each serialization !
static FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration();
...
public MyClass myreadMethod(InputStream stream) throws Exception
{
    FSTObjectInput in = conf.getObjectInput(stream);
    MyClass result = (MyClass) in.readObject(MyClass.class);
    // DON'T call in.close() here: it prevents reuse and will result in an exception
    stream.close();
    return result;
}

public void mywriteMethod(OutputStream stream, MyClass toWrite) throws IOException
{
    FSTObjectOutput out = conf.getObjectOutput(stream);
    out.writeObject(toWrite, MyClass.class);
    // DON'T call out.close() when using the factory method; flush instead
    out.flush();
    stream.close();
}
```
This creates and reuses a single FSTObjectInput/FSTObjectOutput instance per thread, so you should not keep references to the streams returned by these methods. You can also use one global FSTConfiguration throughout your application via FSTConfiguration.getDefaultConfiguration() (that is what the configuration-free constructors do).
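The per-thread reuse behind getObjectInput/getObjectOutput can be illustrated with a plain ThreadLocal. This is a JDK-only sketch of the general pattern, not FST's implementation; ReusableEncoder and PerThreadFactory are hypothetical names. It also shows why you must not hold on to a returned instance: the next call on the same thread hands out (and resets) the very same object.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for an expensive-to-create, non-thread-safe encoder.
class ReusableEncoder {
    final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // Clears internal state so the instance can be reused for the next message.
    ReusableEncoder reset() {
        buffer.reset();
        return this;
    }
}

// Hypothetical factory: one encoder per thread, created lazily and reused afterwards.
class PerThreadFactory {
    private static final ThreadLocal<ReusableEncoder> CACHE =
            ThreadLocal.withInitial(ReusableEncoder::new);

    // Returns this thread's cached encoder after resetting it; callers must not keep it.
    static ReusableEncoder get() {
        return CACHE.get().reset();
    }
}
```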
**Note:** FSTObjectInput/FSTObjectOutput are not thread-safe; only one thread can read/write at a time. FSTConfiguration (holding class metadata) is thread-safe and can be shared application-wide.
FSTConfiguration defines the encoders/decoders used during serialization. Usually you just create one global singleton (instantiating this class is very expensive). Using several distinct configurations is for special use cases that require some in-depth knowledge of the FST code; you will probably never need more than this one default instance.
For example:
```java
public class MyApplication {
    static FSTConfiguration singletonConf = FSTConfiguration.createDefaultConfiguration();

    public static FSTConfiguration getInstance() {
        return singletonConf;
    }
}
```
You can customize the FSTConfiguration returned by createDefaultConfiguration(), e.g. register new or different serializers, install hooks, or set additional flags on defined serializers. Just have a look at the source; there are also some utilities such as asByteArray(Serializable object).
#### Resolution Order
- If a serializer is registered, it is used.
- Otherwise, check for the Externalizable interface.
- Otherwise, check for the JDK special methods (e.g. writeObject/readObject/readResolve/writeReplace). If found, use compatibility mode for that class (=> slow, avoid).
- Otherwise, use the default FST serialization implementation.
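The lookup order can be approximated with plain reflection. This is an illustrative sketch of the four steps listed, not FST's actual lookup code; SerializerResolver, its Strategy enum and the registry map are hypothetical, and the JDK-hook check is simplified to method names declared on the class itself.

```java
import java.io.Externalizable;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the four-step lookup order; not FST's actual code.
class SerializerResolver {
    enum Strategy { REGISTERED_SERIALIZER, EXTERNALIZABLE, JDK_COMPATIBILITY, FST_DEFAULT }

    // JDK serialization hooks that force the slow compatibility path.
    private static final List<String> JDK_HOOKS =
            Arrays.asList("writeObject", "readObject", "writeReplace", "readResolve");

    // Class -> custom serializer (typed as Object to keep the sketch independent of FST).
    private final Map<Class<?>, Object> registered = new HashMap<>();

    void register(Class<?> clazz, Object serializer) {
        registered.put(clazz, serializer);
    }

    Strategy resolve(Class<?> clazz) {
        if (registered.containsKey(clazz)) return Strategy.REGISTERED_SERIALIZER;          // step 1
        if (Externalizable.class.isAssignableFrom(clazz)) return Strategy.EXTERNALIZABLE; // step 2
        if (declaresJdkHook(clazz)) return Strategy.JDK_COMPATIBILITY;                    // step 3 (slow)
        return Strategy.FST_DEFAULT;                                                      // step 4
    }

    // Simplification: only methods declared directly on the class are checked, by name only.
    private static boolean declaresJdkHook(Class<?> clazz) {
        for (Method m : clazz.getDeclaredMethods()) {
            if (JDK_HOOKS.contains(m.getName())) return true;
        }
        return false;
    }
}
```

Note how registering a serializer for a class such as java.util.Date (which declares private writeObject/readObject) takes it off the slow JDK path entirely.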
#### Preregistering Classes
One easy and important optimization is to register the classes your application will certainly serialize with the FSTConfiguration object. This way FST can avoid writing class names.
```java
final FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration();
conf.registerClass(Image.class, Media.class, MediaContent.class, Image.Size.class, Media.Player.class);
```
It is usually easy to figure out the most frequently serialized classes and register them at application startup. Especially for short messages/small objects, having to write fully qualified class names hampers performance. In any case, FST writes a class name only once per stream.
**Note:** Reader and writer configurations should be identical. Even the order of class registration matters.
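Why registration saves bytes, and why its order matters, can be shown with a toy registry: a registered class is written as a small index assigned in registration order, an unregistered one as its full name. This JDK-only sketch illustrates the idea only; ClassRegistry and its tag/index layout are hypothetical and not FST's wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical registry: indices are handed out in registration order.
class ClassRegistry {
    private final List<Class<?>> classes = new ArrayList<>();

    ClassRegistry register(Class<?>... toRegister) {
        for (Class<?> c : toRegister) classes.add(c);
        return this;
    }

    // Writes a class reference: a 2-byte index if registered, otherwise the full class name.
    byte[] encodeClassRef(Class<?> clazz) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        int index = classes.indexOf(clazz);
        if (index >= 0) {
            out.writeByte(1);      // tag: registered class
            out.writeShort(index); // index = position in registration order
        } else {
            out.writeByte(0);      // tag: unregistered class
            out.writeUTF(clazz.getName());
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Decodes an index; a reader registered in a different order silently gets the wrong class.
    Class<?> classAt(int index) {
        return classes.get(index);
    }
}
```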
If you just want to serialize plain, cycle-free message objects as fast as possible, you can use the FSTObjectInputNoShared/FSTObjectOutputNoShared subclasses. Omitting cycle detection and the detection of identical objects allows FST to cut corners in many places. You might also want to preregister all message classes. The performance difference can be up to 40%.
**Note:** Call setShareReferences(false) on the FSTConfiguration you use for this. When mixing shared and unshared mode in a single application, create two instances of FSTConfiguration.
The unshared versions do not support JDK serialization methods such as readResolve/writeObject/readObject. If you need these to be honored, use the regular FSTObjectInput/FSTObjectOutput streams with a configuration that has setShareReferences(false). This still yields a reasonable performance improvement.
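What shared-reference tracking costs and buys can be seen in a toy encoder: with tracking, each object is looked up in an IdentityHashMap so a repeated instance is written as a short back-reference handle (which is also what makes cycles terminate); without it, the lookup is skipped and duplicates are simply written again. This is a JDK-only sketch of the general technique, not FST's format; RefEncoder and its tag bytes are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy encoder for a list of strings, with optional shared-reference tracking.
class RefEncoder {
    static byte[] encode(List<String> values, boolean shareReferences) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        // Identity (not equals) semantics: only the *same instance* is shared.
        Map<Object, Integer> seen = new IdentityHashMap<>();
        out.writeInt(values.size());
        for (String value : values) {
            Integer handle = shareReferences ? seen.get(value) : null;
            if (handle != null) {
                out.writeByte(1);     // tag: back-reference to an earlier object
                out.writeInt(handle);
            } else {
                out.writeByte(0);     // tag: full object follows
                out.writeUTF(value);
                if (shareReferences) seen.put(value, seen.size());
            }
        }
        out.flush();
        return bytes.toByteArray();
    }
}
```

The unshared path never touches the map, which is where the "cutting corners" speedup comes from, at the price of writing repeated instances in full.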
The encoded objects are written to the underlying stream once you close/flush the FSTObjectOutput. Conversely, FSTObjectInput reads the underlying stream in chunks before it starts decoding. This means you cannot read directly from blocking streams (e.g. as returned by a Socket). Example of how to solve this.
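One common workaround is length-prefixed framing: the sender writes the byte length of each encoded message followed by the bytes, and the receiver uses DataInputStream.readFully to block until exactly one complete message has arrived before handing it to the decoder. This JDK-only sketch assumes the payload is the byte[] produced by your serializer (e.g. via asByteArray); Framing and its methods are hypothetical, and this is not necessarily the approach in the linked example.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical length-prefixed framing for blocking streams such as socket streams.
class Framing {
    // Writes one message: a 4-byte length followed by the payload bytes.
    static void writeFrame(OutputStream stream, byte[] payload) throws IOException {
        DataOutputStream out = new DataOutputStream(stream);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Blocks until one complete message is available; returns null at end of stream.
    static byte[] readFrame(InputStream stream) throws IOException {
        DataInputStream in = new DataInputStream(stream); // unbuffered, so no bytes are lost
        int length;
        try {
            length = in.readInt();
        } catch (EOFException endOfStream) {
            return null;
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // never reads past this frame, so the next message stays intact
        return payload;
    }
}
```

On the receiving side, the returned bytes can then be decoded in one go, for example with FSTConfiguration's asObject(byte[]), assuming your FST version provides it as the counterpart of asByteArray.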
Some users still prefer FST for very large object graphs. The maximum size is then determined by the int index, so an object graph can be at most ~1.5 GB.
There are scenarios (e.g. when using multicast) where a receiver wants to conditionally skip decoding parts of a received object in order to save CPU time. With FST, this can be achieved using the @Conditional annotation.
```java
class ConditionalExample {
    int messagetype;

    @Conditional
    BigObject aBigObject;
    ...
}
```
When reading the object, do the following:
```java
FSTObjectInput.ConditionalCallback conditionalCallback = new FSTObjectInput.ConditionalCallback() {
    @Override
    public boolean shouldSkip(Object halfDecoded, int streamPosition, Field field) {
        // decode the @Conditional field only for messages of type 13
        return ((ConditionalExample) halfDecoded).messagetype != 13;
    }
};
...
FSTObjectInput fstin = new FSTObjectInput(instream, conf);
fstin.setConditionalCallback(conditionalCallback);
Object res = fstin.readObject(cl);
```
FSTObjectInput deserializes the regular fields of ConditionalExample, then calls shouldSkip, passing in the partially deserialized object. If shouldSkip returns false, the @Conditional reference is decoded and set; otherwise it is skipped.
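One way a reader can skip a field without decoding it is for the writer to put the field's encoded length in front of it: the reader decodes the cheap fields, asks a callback, and then either decodes or jumps over the remaining bytes. This JDK-only sketch assumes such a length-prefixed layout to illustrate the idea; it is not FST's wire format, and Message, ConditionalCodec and the Predicate-based callback are hypothetical stand-ins for ConditionalExample and ConditionalCallback.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.function.Predicate;

// Hypothetical message: a cheap header field plus an expensive, conditionally decoded body.
class Message {
    int messagetype;
    String bigPayload; // stands in for the @Conditional BigObject
}

class ConditionalCodec {
    static byte[] write(Message m) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        new DataOutputStream(body).writeUTF(m.bigPayload);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(m.messagetype);
        out.writeInt(body.size()); // length prefix lets readers skip the body
        body.writeTo(out);
        out.flush();
        return bytes.toByteArray();
    }

    // shouldSkip sees the half-decoded message, similar to ConditionalCallback.shouldSkip.
    static Message read(byte[] data, Predicate<Message> shouldSkip) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        Message m = new Message();
        m.messagetype = in.readInt();
        int bodyLength = in.readInt();
        if (shouldSkip.test(m)) {
            in.skipBytes(bodyLength); // body is never decoded; the field stays null
        } else {
            m.bigPayload = in.readUTF();
        }
        return m;
    }
}
```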
By default, FST falls back to the methods defined by the JDK. Especially if private methods like writeObject are involved, performance suffers because reflection must be used. Additionally, the JDK's own serialization of some stock classes is quite inefficient in both size and speed. The FST default configuration already registers serializers for some common classes (popular collections and some other frequently used classes).
So if you have trouble with the speed or efficiency of stock JDK serialization, you might want to register a piece of custom code that defines how to read and write objects of a specific class.
The basic interface for defining the serialization of an object is FSTObjectSerializer. However, in most cases you'll use a subclass of FSTBasicObjectSerializer.
The FSTDateSerializer shipped with FST (note the registerObject call in the instantiate method; you need it whenever you instantiate the object yourself):
```java
public class FSTDateSerializer extends FSTBasicObjectSerializer {
    @Override
    public void writeObject(FSTObjectOutput out, Object toWrite, FSTClazzInfo clzInfo, FSTClazzInfo.FSTFieldInfo referencedBy)
    {
        // a Date is fully described by its epoch milliseconds
        out.writeFLong(((Date) toWrite).getTime());
    }

    @Override
    public void readObject(FSTObjectInput in, Object toRead, FSTClazzInfo clzInfo, FSTClazzInfo.FSTFieldInfo referencedBy)
    {
        // nothing to do: the Date is created and fully read in instantiate()
    }

    @Override
    public Object instantiate(Class objectClass, FSTObjectInput in, FSTClazzInfo serializationInfo, FSTClazzInfo.FSTFieldInfo referencee, int streamPosition)
    {
        Object res = new Date(in.readFLong());
        // register the new object so later references to it in the stream can be resolved
        in.registerObject(res, streamPosition, serializationInfo);
        return res;
    }
}
```
A serializer is registered with the FSTConfiguration object:
```java
static {
    ...
    conf = FSTConfiguration.createDefaultConfiguration();
    conf.registerSerializer(Date.class, new FSTDateSerializer(), false);
    ...
}
```
(Of course, you then have to use exactly this configuration for your FSTObjectInput/FSTObjectOutput streams.)
**Note:** Having three methods (read, write, instantiate) makes it possible to read from the stream before creating the object (e.g. to decide which class to create). The common case is to override and implement just read/write; however, there are cases where read is empty and the full object is created and read in the instantiate method, as FSTDateSerializer does.
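To get a feel for why a dedicated serializer pays off, compare what stock JDK serialization writes for a single Date with a custom encoding that writes only the epoch milliseconds as one long, in the spirit of FSTDateSerializer. This is a JDK-only sketch; DateEncodingSize is hypothetical, and since the exact JDK byte count depends on the stream header and class descriptor, only the relation is checked. The read side mirrors the instantiate approach: the value is read first and the object is created from it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Date;

// Hypothetical comparison of stock JDK encoding vs. a custom "write the millis" encoding.
class DateEncodingSize {
    // Stock JDK serialization: stream header, class descriptor, then the value.
    static int jdkSize(Date date) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(date);
        }
        return bytes.size();
    }

    // Custom encoding: just the epoch millis as an 8-byte long.
    static byte[] customWrite(Date date) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeLong(date.getTime());
        return bytes.toByteArray();
    }

    // Read-then-create, like an instantiate() method: the object is built from the value.
    static Date customRead(byte[] data) throws IOException {
        return new Date(new DataInputStream(new ByteArrayInputStream(data)).readLong());
    }
}
```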
**Note:** Unsafe mode was removed in 1.5 but might come back in a later, refactored 2.x version of FST. The reason is that it paid off for primitive arrays only, so in the rare cases where you encode huge primitive arrays and really want to use Unsafe, implement Externalizable on the specific class for now and use Unsafe there.
**Note:** Enabling Unsafe is not worth the risk in most scenarios. However, it can be useful for certain high-performance, communication-heavy data processing. Primitive arrays and String encoding in particular improve by an order of magnitude, while usual object graphs and structures do not profit much from enabling Unsafe.
Setting the system property fst.unsafe=true (e.g. -Dfst.unsafe=true on the command line) lets FST use unsafe operations to speed things up. It has proven reliable across machines, JDK versions and operating systems (Windows, Linux) in production systems. Set fst.unsafe to "true" only if you have a very stable application without any class version mismatch issues; otherwise you might run into trouble or crashes.
Consider this a special option for in-house / intra-server communication. Never use it in clients, as they might be outdated: a version mismatch can cause unpredictable behaviour when Unsafe is turned on.
If FSTConfiguration.preferSpeed is true, primitive arrays are also serialized using unsafe operations. This means no value compression is applied, which may result in a significantly larger serialized object. However, speed often matters more than size, e.g. when serializing to off-heap memory, shared-memory queues or fast networks such as InfiniBand or 10 GBit Ethernet (even 1 GBit Ethernet is not easily saturated when using some of the various slowish enterprise frameworks).
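The size/speed trade-off can be illustrated with a generic variable-length integer encoding (7 payload bits per byte, continuation bit set while more bytes follow, as in protobuf varints) versus a raw fixed 4 bytes per int. This JDK-only sketch shows value compression in general, not FST's actual array format; VarIntDemo is hypothetical. Small values shrink to one byte, but each value costs a loop with branches, while the raw form is essentially a straight copy.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical comparison: compressed (varint) vs. raw fixed-width int array encoding.
class VarIntDemo {
    // Value compression: 7 payload bits per byte, high bit set while more bytes follow.
    static byte[] compressed(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            int rest = v;
            while ((rest & ~0x7F) != 0) {
                out.write((rest & 0x7F) | 0x80);
                rest >>>= 7; // unsigned shift so negative values terminate (they take 5 bytes)
            }
            out.write(rest);
        }
        return out.toByteArray();
    }

    // Raw encoding: always 4 bytes per value, no per-value branching.
    static byte[] raw(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        buf.asIntBuffer().put(values);
        return buf.array();
    }
}
```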
Unsafe usage can be enabled by calling System.setProperty("fst.unsafe","true") before any FST class is referenced. A better approach is to switch it on at the command line when starting your program, e.g. java -Dfst.unsafe=true ...
If you use FST in client/server applications or heterogeneous networks, you might run into byte order issues, as the byte orders of x86 and (RIP) Solaris SPARC machines differ.
Unlike standard Java IO, which is big-endian, FST always assumes x86 (little-endian) byte order, even when Unsafe is turned off. This means you can encode on an x86 server with Unsafe enabled and decode on a client with a different processor architecture, as long as Unsafe is disabled on the client.
So on big-endian platforms, never turn on Unsafe, whether it's a client or a server machine. Of course, this does not hold if ALL machines encoding/decoding FST objects are big-endian.
Clarification: disable Unsafe on all big-endian platforms (non-x86), except when all machines (client and server) are big-endian. You can always enable Unsafe on little-endian (x86) machines.
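The underlying problem is easy to reproduce with java.nio.ByteBuffer: the same int written in big-endian order (the Java IO default, e.g. DataOutputStream) and in little-endian (x86) order yields reversed bytes, and reading with the wrong order silently produces a different value instead of an error. This is a JDK-only sketch; ByteOrderDemo is hypothetical and does not call FST.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo of why reader and writer must agree on byte order.
class ByteOrderDemo {
    // Encodes one int using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Decodes one int using the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }
}
```

For example, the value 1 encoded little-endian is {1, 0, 0, 0}; decoding those bytes as big-endian gives 16777216. ByteOrder.nativeOrder() tells you which order the current machine uses.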