title | toc | date |
---|---|---|
14 Serialization and File IO | false | 2017-10-30 |
If your data will be used only by the Java program that generated it:
- Use serialization (序列化): Serialization is an (automatic) way to save and load the state of an object from a stream. The serialized object can then be stored in a file or a database, or sent over the network.
- The serialized file is much harder for humans to read, but it's much easier (and safer) for your program to restore objects from serialization than to rebuild them from text.
If your data will be used by other programs:
- Write a plain text file (纯文本文件): Write a file with delimiters that other programs can parse.
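A minimal sketch of the plain-text option, assuming a made-up one-line format `name,weapon,power`. The class `DelimitedTextDemo` and its methods are hypothetical. Any program that knows the delimiter can read the file back, whether or not it is written in Java.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DelimitedTextDemo {
    // Hypothetical record format: one character per line, fields separated by commas.
    static void save(Path file, String name, String weapon, int power) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            writer.write(name + "," + weapon + "," + power);
            writer.newLine();
        }
    }

    // Any program that knows the format can split the line on the delimiter.
    static String[] load(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine().split(",");
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("character", ".txt");
        save(file, "Elf", "bow", 50);
        System.out.println(String.join(" | ", load(file))); // Elf | bow | 50
        Files.delete(file);
    }
}
```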
- Make a FileOutputStream
```Java
FileOutputStream fileStream = new FileOutputStream("MyGame.ser");
```
- Make an ObjectOutputStream
```Java
ObjectOutputStream os = new ObjectOutputStream(fileStream);
```
- Write the object
```Java
os.writeObject(characterOne);
```
- This serializes the object referenced by characterOne and writes it to the file "MyGame.ser".
- Close the ObjectOutputStream
```Java
os.close();
```
- Closing the stream at the top closes the ones underneath, so the FileOutputStream (and the file) will close automatically.
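The four steps above, put together in one place. This is a sketch, assuming a hypothetical `GameCharacter` class (not defined in the original) and using try-with-resources so that the ObjectOutputStream, and with it the FileOutputStream, is closed even if `writeObject()` throws.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SaveGame {
    // Hypothetical serializable class standing in for the game's characters.
    static class GameCharacter implements Serializable {
        final String type;
        final int power;

        GameCharacter(String type, int power) {
            this.type = type;
            this.power = power;
        }
    }

    static void save(String fileName, GameCharacter... characters) throws IOException {
        // FileOutputStream is the connection stream; ObjectOutputStream is chained on top.
        try (ObjectOutputStream os = new ObjectOutputStream(new FileOutputStream(fileName))) {
            for (GameCharacter c : characters) {
                os.writeObject(c); // objects are written in this order
            }
        } // closing os also closes the FileOutputStream underneath
    }

    public static void main(String[] args) throws IOException {
        save("MyGame.ser",
             new GameCharacter("Elf", 50),
             new GameCharacter("Troll", 200),
             new GameCharacter("Magician", 120));
    }
}
```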
The Java I/O API has connection streams, which represent connections to destinations and sources such as files or network sockets, and chain streams, which work only if chained to other streams.
- Connection streams represent the connection (e.g. FileOutputStream).
- Chain streams can't connect to anything on their own; they call methods on the stream they are chained to (e.g. ObjectOutputStream).
- FileOutputStream writes bytes to a file.
- ObjectOutputStream turns objects into data that can be written to a stream.
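A chain stream only needs *some* stream underneath it. In this sketch (the class name `ChainDemo` is made up), the same ObjectOutputStream is chained to a `ByteArrayOutputStream`, a connection stream whose destination is a byte array in memory rather than a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class ChainDemo {
    static byte[] serialize(Object obj) throws IOException {
        // Connection stream: its destination is a byte array in memory.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Chain stream: turns objects into bytes and hands them to the stream below.
        try (ObjectOutputStream os = new ObjectOutputStream(bytes)) {
            os.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize("hello");
        // Every serialization stream starts with the magic number 0xACED.
        System.out.printf("%d bytes, starting with %02X%02X%n", data.length, data[0], data[1]);
    }
}
```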
A FileOutputStream is an output stream for writing data to a File or to a FileDescriptor.
!!! note
    FileOutputStream is meant for writing streams of raw bytes. For writing streams of characters, consider using FileWriter.
Serialization saves the entire object graph.
- When an object is serialized, all the objects it refers to from instance variables are also serialized.
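A small sketch of the whole graph being saved, using hypothetical `Dog` and `Collar` classes: only the `Dog` is passed to `writeObject()`, yet the `Collar` it refers to comes back too. The in-memory round trip through `ByteArrayOutputStream` is an assumption that keeps the example free of files.

```java
import java.io.*;

class GraphDemo {
    static class Collar implements Serializable {
        int size;
        Collar(int size) { this.size = size; }
    }

    static class Dog implements Serializable {
        String name;
        Collar collar; // saved too, because Dog refers to it
        Dog(String name, Collar collar) { this.name = name; this.collar = collar; }
    }

    // Serializes obj to memory and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Dog restored = (Dog) roundTrip(new Dog("Fido", new Collar(30)));
        System.out.println(restored.name + " wears a collar of size " + restored.collar.size);
    }
}
```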
If you want your class to be serializable, implement {==Serializable==}.
- The Serializable interface is known as a marker or tag interface(标记接口), because the interface doesn't have any methods to implement.
- Its sole purpose is to announce that the class implementing it is serializable.
- If any superclass of a class is serializable, the subclass is automatically serializable even if the subclass doesn't explicitly declare implements Serializable.
A marker interface is an interface that contains NO method declarations, but merely designates (or "marks") a class that implements the interface as having some property. [Effective Java, 2nd]
The source of Serializable.java:
```Java
public interface Serializable {
}
```
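The marker interface in action, as a sketch with hypothetical `Animal`, `Hippo`, and `Rock` classes: `Hippo` never mentions Serializable but inherits it from `Animal`, while `Rock` has no Serializable class anywhere in its hierarchy. ObjectOutputStream performs essentially this `instanceof Serializable` check before serializing an ordinary object.

```java
import java.io.Serializable;

class MarkerDemo {
    static class Animal implements Serializable { }

    // Hippo never says "implements Serializable", but inherits it from Animal.
    static class Hippo extends Animal { }

    // No Serializable anywhere in Rock's inheritance tree.
    static class Rock { }

    // The marker carries no methods; it only answers "is this type serializable?"
    static boolean isSerializable(Object obj) {
        return obj instanceof Serializable;
    }

    public static void main(String[] args) {
        System.out.println(isSerializable(new Hippo())); // true
        System.out.println(isSerializable(new Rock()));  // false
    }
}
```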
Serialization is all or nothing: Either the entire object graph is serialized correctly or serialization fails.
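To see the all-or-nothing rule, here is a sketch with hypothetical `Car` and `Engine` classes: `Car` is Serializable, but the `Engine` it refers to is not, so serializing the `Car` fails with a `NotSerializableException`.

```java
import java.io.*;

class AllOrNothingDemo {
    static class Engine { }                  // NOT Serializable

    static class Car implements Serializable {
        String model = "Roadster";
        Engine engine = new Engine();        // reachable from Car, so it must be serialized too
    }

    // Returns true if obj could be serialized, false if serialization failed.
    static boolean trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException ex) {
            // The exception message is the name of the offending class.
            System.out.println("Failed: " + ex.getMessage() + " is not serializable");
            return false;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(trySerialize(new Car())); // false
    }
}
```

Either making `Engine` implement Serializable or marking the `engine` field transient would let the `Car` be saved.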
```Java
import java.io.*;

public class Box implements Serializable {
    // No methods to implement, but when you say "implements Serializable",
    // it says to the JVM, "it's OK to serialize objects of this type."
    private int width;
    private int height;

    public void setWidth(int w) {
        width = w;
    }

    public void setHeight(int h) {
        height = h;
    }

    public static void main(String[] args) {
        Box box = new Box();
        box.setHeight(5);
        box.setWidth(6);
        // try-with-resources closes the ObjectOutputStream (and the FileOutputStream under it),
        // even if writeObject() throws.
        try (ObjectOutputStream os = new ObjectOutputStream(new FileOutputStream("foo.ser"))) {
            os.writeObject(box);
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    } // end main
} // end class Box
```
transient is a variable modifier used in serialization:
Mark an instance variable as {==transient==} if it can't (or shouldn't) be saved.
```Java
import java.io.Serializable;

class Chat implements Serializable {
    // transient says,
    // "don't save this variable during serialization, just skip it".
    transient String currentID;
    String userName;
    // more code
}
```
Variables may be marked transient to indicate that they are not part of the persistent state of an object. [Java Language Specification, Java SE 7 Edition, Section 8.3.1.3. transient Fields]
!!! note
    When an object is deserialized, a transient reference instance variable is brought back as null, regardless of the value it had at the time it was saved.
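A sketch of transient fields not surviving a round trip. It copies the `Chat` class above and adds a hypothetical transient primitive, `unreadCount`, to show that primitives come back with their default value rather than null.

```java
import java.io.*;

class TransientDemo {
    static class Chat implements Serializable {
        transient String currentID;   // skipped during serialization
        String userName;              // saved normally
        transient int unreadCount;    // hypothetical transient primitive
    }

    // Serializes obj to memory and reads it straight back.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Chat chat = new Chat();
        chat.currentID = "session-42";
        chat.userName = "alice";
        chat.unreadCount = 7;
        Chat restored = roundTrip(chat);
        System.out.println(restored.userName);     // alice
        System.out.println(restored.currentID);    // null
        System.out.println(restored.unreadCount);  // 0
    }
}
```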
!!! question "What happens if two objects in the object graph are the same object?"
    That is, what if two references in the graph point to the same object? The object is saved *only once*, and after deserialization all of those references point to the single restored object.
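A sketch of the shared-reference case with hypothetical `Dog` and `Owner` classes: two dogs share one owner, the array holding both dogs is serialized, and after deserialization the two dogs still share a single (new) `Owner` object.

```java
import java.io.*;

class SharedReferenceDemo {
    static class Owner implements Serializable {
        String name = "Kathy";
    }

    static class Dog implements Serializable {
        Owner owner;
        Dog(Owner owner) { this.owner = owner; }
    }

    // Serializes obj to memory and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Owner kathy = new Owner();
        Dog[] dogs = { new Dog(kathy), new Dog(kathy) }; // both dogs share one Owner
        Dog[] restored = (Dog[]) roundTrip(dogs);
        // Only one Owner was written, so both restored dogs point to the same restored Owner.
        System.out.println(restored[0].owner == restored[1].owner); // true
    }
}
```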
Deserialization(反序列化) is a lot like serialization(序列化) in reverse.
- Make a FileInputStream
```Java
FileInputStream fileStream = new FileInputStream("MyGame.ser");
```
- Make an ObjectInputStream
```Java
ObjectInputStream is = new ObjectInputStream(fileStream);
```
- Read the object
```Java
Object one = is.readObject();
```
- Each time you call readObject(), you get the next object in the stream, so you read the objects back in the same order in which they were written.
- Cast the objects
```Java
GameCharacter elf = (GameCharacter) one;
```
- The return value of readObject() is type Object, so you have to cast it back to the type you know it really is.
- Close the ObjectInputStream
```Java
is.close();
```
- Closing the stream at the top closes the ones underneath, so the FileInputStream and the file will close automatically.
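The deserialization steps combined with the save side, as a self-contained sketch. It assumes a hypothetical `GameCharacter` class and writes to a temporary file instead of `MyGame.ser`. Objects come back in the order they were written, and each `readObject()` result is cast back to its real type.

```java
import java.io.*;

class LoadGame {
    // Hypothetical class; it must be the same class that was serialized.
    static class GameCharacter implements Serializable {
        final String type;
        GameCharacter(String type) { this.type = type; }
    }

    static void save(File file, GameCharacter... characters) throws IOException {
        try (ObjectOutputStream os = new ObjectOutputStream(new FileOutputStream(file))) {
            for (GameCharacter c : characters) {
                os.writeObject(c);
            }
        }
    }

    static GameCharacter[] load(File file, int count) throws IOException, ClassNotFoundException {
        GameCharacter[] result = new GameCharacter[count];
        try (ObjectInputStream is = new ObjectInputStream(new FileInputStream(file))) {
            for (int i = 0; i < count; i++) {
                // readObject() returns Object, so cast back to the real type.
                result[i] = (GameCharacter) is.readObject();
            }
        } // closing is also closes the FileInputStream
        return result;
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("MyGame", ".ser");
        save(file, new GameCharacter("Elf"), new GameCharacter("Troll"));
        for (GameCharacter c : load(file, 2)) {
            System.out.println(c.type); // Elf, then Troll: the order they were written in
        }
        file.delete();
    }
}
```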
What happens during deserialization?
- The object is read from the stream.
- The JVM determines (through info stored with the serialized object) the object’s class type.
- The JVM attempts to find and load the object’s class. If the JVM can’t find and/or load the class, the JVM throws an exception and the deserialization fails.
- A new object is given space on the heap, but the serialized object’s constructor does NOT run! Obviously, if the constructor ran, it would restore the state of the object back to its original 'new' state, and that’s not what we want. We want the object to be restored to the state it had when it was serialized, not when it was first created.
- If the object has a non-serializable class somewhere up its inheritance tree, the no-arg constructor of the first non-serializable superclass will run, along with every constructor above it (those superclasses are necessarily non-serializable too). Once the constructor chaining begins, you can't stop it, which means all superclasses, beginning with the first non-serializable one, will reinitialize their state. That superclass must therefore have an accessible no-arg constructor; otherwise deserialization fails with an InvalidClassException.
- The object’s instance variables are given the values from the serialized state. Transient variables are given a value of null for object references and defaults (0, false, etc.) for primitives.
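A sketch of the constructor rule, with hypothetical `Animal` (not Serializable) and `Hippo` (Serializable) classes: after the round trip, `Hippo`'s own field comes from the stream, but `Animal()` runs again and resets the inherited field.

```java
import java.io.*;

class ConstructorDemo {
    // Not Serializable: its no-arg constructor runs again during deserialization.
    static class Animal {
        String habitat;
        Animal() { habitat = "unknown"; }
    }

    // Serializable subclass: its own constructor does NOT run during deserialization.
    static class Hippo extends Animal implements Serializable {
        String name;
        Hippo(String name) {
            this.name = name;
            this.habitat = "river"; // inherited field: not saved, Animal is not Serializable
        }
    }

    // Serializes obj to memory and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Hippo restored = (Hippo) roundTrip(new Hippo("Hugo"));
        System.out.println(restored.name);    // Hugo (restored from the stream)
        System.out.println(restored.habitat); // unknown (set again by Animal())
    }
}
```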
!!! note
    Static variables are NOT serialized. Static means "one per class", not "one per object". When an object is deserialized, its static variables have whatever value its class currently has.
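A sketch showing that a static counter is not part of the saved state. The `Duck` class and its `duckCount` field are hypothetical. Because the deserialized object's constructor does not run, the count is not incremented on deserialization either.

```java
import java.io.*;

class StaticDemo {
    static class Duck implements Serializable {
        static int duckCount = 0;   // one per class: not part of any object's state
        String name;
        Duck(String name) { this.name = name; duckCount++; }
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] saved = serialize(new Duck("Donald")); // duckCount is 1 at save time
        Duck.duckCount = 99;                          // change the class's static value
        Duck restored = (Duck) deserialize(saved);
        System.out.println(restored.name);            // Donald
        System.out.println(Duck.duckCount);           // 99: the class's current value, not 1
    }
}
```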
Question: What might happen if you change the class and want to deserialize the object?
Result: It may throw an InvalidClassException.
Changes to a class that can hurt deserialization:
- Deleting an instance variable
- Changing the declared type of an instance variable
- Changing a non-transient instance variable to transient
- Moving a class up or down the inheritance hierarchy
- Changing a class from Serializable to not Serializable
- Changing an instance variable to static
Each time an object is serialized, the object (including every object in its graph) is 'stamped' with a version ID number for the object's class. The ID is called serialVersionUID.
The serialization runtime associates with each serializable class a version number, called a serialVersionUID, which is used during deserialization to verify that the sender and receiver of a serialized object have loaded classes for that object that are compatible with respect to serialization. If the receiver has loaded a class for the object that has a different serialVersionUID than that of the corresponding sender's class, then deserialization will result in an InvalidClassException. [Java Docs]
SOLUTION TO THE QUESTION: If you think there is ANY possibility that your class might evolve, put a serialVersionUID in your class.
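- As long as the serialVersionUID stays the same while the class evolves, the JVM treats the new version of the class as compatible with objects serialized from the old one. It is then up to you to make sure the changes really are compatible (for example, adding a field is fine: it gets its default value when an old object is deserialized).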
If a serializable class does not explicitly declare a serialVersionUID, then the serialization runtime will calculate a default serialVersionUID value for that class based on various aspects of the class. However, it is strongly recommended that all serializable classes explicitly declare serialVersionUID values, since the default serialVersionUID computation is highly sensitive to class details that may vary depending on compiler implementations, and can thus result in unexpected InvalidClassExceptions during deserialization. [Java Docs]
HOW?
A serializable class can declare its own serialVersionUID explicitly by declaring a field named "serialVersionUID" that must be static, final, and of type long:
```Java
ANY-ACCESS-MODIFIER static final long serialVersionUID = 42L;
```
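A sketch contrasting an explicit and a computed serialVersionUID, using hypothetical `Dog` and `Cat` classes. `ObjectStreamClass.lookup()` returns the serialization descriptor of a class, and its `getSerialVersionUID()` reports the ID the runtime stamps on serialized objects of that class.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class VersionDemo {
    static class Dog implements Serializable {
        // Explicit version ID: it stays 42 until you change it yourself,
        // even if you add fields or methods to Dog later.
        private static final long serialVersionUID = 42L;
        String name;
    }

    static class Cat implements Serializable {
        // No explicit ID: the runtime computes one from the class's details,
        // so almost any edit to Cat changes it.
        String name;
    }

    public static void main(String[] args) {
        System.out.println(ObjectStreamClass.lookup(Dog.class).getSerialVersionUID()); // 42
        System.out.println(ObjectStreamClass.lookup(Cat.class).getSerialVersionUID()); // computed value
    }
}
```

The JDK also ships a `serialver` command-line tool that prints the computed value for a class; pasting that value into the class when you first add the field keeps objects saved earlier deserializable.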
The java.io.File class represents a file on disk, but doesn't actually represent the contents of the file.
- Think of a File object as something more like a pathname of a file (or even a directory) rather than the Actual File Itself.
File is an abstract representation of file and directory pathnames. [Java docs]
Some things you can do with a File object:
- Make a File object representing an existing file
```Java
File f = new File("Mycode.txt"); // does not actually create a file on disk
```
- Create a new, empty file (createNewFile() returns false if the file already exists)
```Java
File newFile = new File("MyFile.txt");
newFile.createNewFile();
```
- Make a new directory
```Java
File dir = new File("Chapter7");
dir.mkdir();
```
- List the contents of a directory
```Java
if (dir.isDirectory()) {
    String[] dirContents = dir.list();
    for (int i = 0; i < dirContents.length; i++) {
        System.out.println(dirContents[i]);
    }
}
```
- Get the absolute path of a file or directory
```Java
System.out.println(dir.getAbsolutePath());
```
- Delete a file or directory (returns true if successful)
```Java
boolean isDeleted = f.delete();
```
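The File operations above, combined into one runnable sketch. It works inside a fresh temporary directory (an assumption, so the example touches no real files) and checks return values, since `mkdir()`, `createNewFile()`, and `delete()` report most failures by returning false rather than throwing. The class `FileDemo` is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class FileDemo {
    // Creates base/Chapter7/MyFile.txt, lists the directory, then deletes both again.
    static String[] createListDelete(File base) throws IOException {
        File dir = new File(base, "Chapter7");   // just a pathname: nothing exists on disk yet
        if (!dir.mkdir()) {
            throw new IOException("could not create " + dir);
        }
        File f = new File(dir, "MyFile.txt");
        boolean created = f.createNewFile();     // false if the file already existed
        System.out.println("created: " + created + ", path: " + f.getAbsolutePath());

        String[] contents = dir.isDirectory() ? dir.list() : new String[0];

        // delete() returns true on success; a directory must be empty to be deleted.
        boolean fileDeleted = f.delete();
        boolean dirDeleted = dir.delete();
        System.out.println("deleted: " + fileDeleted + ", " + dirDeleted);
        return contents;
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("filedemo").toFile();
        for (String name : createListDelete(base)) {
            System.out.println(name);            // MyFile.txt
        }
        base.delete();
    }
}
```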
To write a string, call write() on a FileWriter object: `writer.write("My first String to save");`
```Java
import java.io.FileWriter;
import java.io.IOException;

class WriteAFile {
    public static void main(String[] args) {
        // If the file "Foo.txt" does not exist, FileWriter will create it.
        // try-with-resources closes the writer when you're done, even if write() fails.
        try (FileWriter writer = new FileWriter("Foo.txt")) {
            // The write() method takes a String
            writer.write("hello foo!");
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
```
Buffers are much more efficient than working without them:
```Java
BufferedWriter writer = new BufferedWriter(new FileWriter(aFile));
```
- FileWriter writes each and every thing you pass to it to the file, each and every time.
- BufferedWriter holds everything you write to it until its buffer is full. Only when the buffer is full is the FileWriter actually told to write to the file on disk.
- If you want to send data before the buffer is full, just flush it: `writer.flush()`.
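A sketch that makes the buffering visible; the class `BufferedWriteDemo` and its method are hypothetical. Right after `write()`, the file on disk is still empty because the text sits in BufferedWriter's buffer; only after `flush()` does it reach the file.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

class BufferedWriteDemo {
    // Returns the file's size on disk just before and just after flush().
    static long[] sizesAroundFlush(File file, String text) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(file))) {
            writer.write(text);              // goes into BufferedWriter's in-memory buffer
            long before = file.length();     // 0: nothing has reached the file yet
            writer.flush();                  // push the buffered text down to the file now
            long after = file.length();      // now the text is on disk
            return new long[] { before, after };
        }                                    // close() would also have flushed
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("Foo", ".txt");
        long[] sizes = sizesAroundFlush(file, "hello foo!");
        System.out.println("before flush: " + sizes[0] + " bytes, after flush: " + sizes[1] + " bytes");
        file.delete();
    }
}
```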
Usually, we use a File Object to represent a file, a FileReader to do the actual reading, and a BufferedReader to make the reading more efficient.
!!! example "Reading A FILE"
    ```Java
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;

    public class ReadAFile {
        public static void main(String[] args) {
            File myFile = new File("Foo.txt");
            // A FileReader is a connection stream for characters that connects to a text file.
            // Chain the FileReader to a BufferedReader for more efficient reading.
            // try-with-resources closes the reader (and the FileReader) even if reading fails.
            try (BufferedReader reader = new BufferedReader(new FileReader(myFile))) {
                // Make a String variable to hold each line as the line is read.
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
    ```
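In the java.io package, stream operations fall into two main categories, byte streams and character streams, and both support input and output. InputStream and OutputStream are byte streams; Reader and Writer are character streams.
Comparison: a byte stream accesses a file one byte at a time. Byte streams work with any kind of file, but they are awkward for text: for example, when each character is encoded in two bytes of Unicode, a byte stream treats the two bytes of each character separately, and you have to convert them back into characters yourself. A character stream accesses a file one character at a time, but it needs to know the file's encoding to decode the characters correctly.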
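A sketch of the byte-stream vs character-stream difference, assuming UTF-8, where each of the two characters in "你好" takes three bytes (rather than the two-byte encoding mentioned above). The byte stream sees 6 bytes, the character stream decodes 2 characters, and decoding with the wrong charset (ISO-8859-1 here) yields 6 wrong characters. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteVsCharDemo {
    // Byte stream: one read() per raw byte.
    static int countBytes(byte[] data) throws IOException {
        int count = 0;
        try (InputStream in = new ByteArrayInputStream(data)) {
            while (in.read() != -1) count++;
        }
        return count;
    }

    // Character stream: decodes the bytes with the given charset, one read() per char.
    static int countChars(byte[] data, Charset charset) throws IOException {
        int count = 0;
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            while (reader.read() != -1) count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u4F60\u597D".getBytes(StandardCharsets.UTF_8); // "你好": 2 characters
        System.out.println(countBytes(utf8));                              // 6
        System.out.println(countChars(utf8, StandardCharsets.UTF_8));      // 2
        System.out.println(countChars(utf8, StandardCharsets.ISO_8859_1)); // 6 (wrong charset)
    }
}
```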
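NIO is short for New I/O, a newer set of Java I/O APIs. The main differences between Java NIO and IO are:
IO | NIO |
---|---|
Stream oriented | Buffer oriented |
Blocking IO | Non-blocking IO |
(none) | Selectors |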
Stream Oriented vs. Buffer Oriented
The first big difference between Java NIO and IO is that IO is stream oriented(面向流), where NIO is buffer oriented(面向缓冲).
Java IO being stream oriented means that you read one or more bytes at a time from a stream. What you do with the read bytes is up to you. They are not cached anywhere. Furthermore, you cannot move back and forth in the data in a stream. If you need to move back and forth in the data read from a stream, you will need to cache it in a buffer first.
Java NIO's buffer oriented approach is slightly different. Data is read into a buffer from which it is later processed. You can move back and forth in the buffer as you need to. This gives you a bit more flexibility during processing. However, you also need to check whether the buffer contains all the data you need in order to fully process it. And you need to make sure that when reading more data into the buffer, you do not overwrite data in the buffer you have not yet processed.
Blocking vs. Non-blocking IO
Java IO's various streams are blocking. That means that when a thread invokes a `read()` or `write()`, that thread is blocked until there is some data to read, or the data is fully written. The thread can do nothing else in the meantime.
Java IO: Reading data from a blocking stream.
Java NIO's non-blocking mode enables a thread to request reading data from a channel, and only get what is currently available, or nothing at all, if no data is currently available. Rather than remain blocked until data becomes available for reading, the thread can go on with something else.
The same is true for non-blocking writing. A thread can request that some data be written to a channel, but not wait for it to be fully written. The thread can then go on and do something else in the meantime.
When a thread is not blocked in IO calls, it usually spends that idle time performing IO on other channels. That is, a single thread can now manage multiple channels of input and output.
If you need to manage thousands of open connections simultaneously, which each only send a little data, for instance a chat server, implementing the server in NIO is probably an advantage. Similarly, if you need to keep a lot of open connections to other computers, e.g. in a P2P network, using a single thread to manage all of your outbound connections might be an advantage. This one thread, multiple connections design is illustrated in this diagram:
If you have fewer connections with very high bandwidth, sending a lot of data at a time, perhaps a classic IO server implementation might be the best fit. This diagram illustrates a classic IO server design:
Selectors
Java NIO's selectors(选择器) allow a single thread to monitor multiple channels of input. You can register multiple channels with a selector, then use a single thread to "select" the channels that have input available for processing, or select the channels that are ready for writing. This selector mechanism makes it easy for a single thread to manage multiple channels.
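A minimal single-threaded sketch of a selector: one non-blocking `ServerSocketChannel` is registered for `OP_ACCEPT`, a local client connects, and `select()` reports the channel as ready. The port, the one-second timeout, and the in-process client are assumptions that keep the example self-contained; a real server would loop on `select()` and also register the accepted `SocketChannel`s, for example for `OP_READ`.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class SelectorDemo {
    // Returns how many connections the single selecting thread accepted.
    static int acceptOneConnection() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("localhost", 0)); // port 0: pick any free port
            server.configureBlocking(false);                    // must be non-blocking to register
            server.register(selector, SelectionKey.OP_ACCEPT);  // "tell me when a client connects"

            // A client connects (a blocking connect keeps the sketch simple).
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                int accepted = 0;
                // Wait up to 1 second for at least one registered channel to become ready.
                if (selector.select(1000) > 0) {
                    for (SelectionKey key : selector.selectedKeys()) {
                        if (key.isAcceptable()) {
                            SocketChannel conn = ((ServerSocketChannel) key.channel()).accept();
                            if (conn != null) {
                                accepted++;
                                conn.close();
                            }
                        }
                    }
                    selector.selectedKeys().clear(); // handled keys must be removed by hand
                }
                return accepted;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("accepted " + acceptOneConnection() + " connection(s)");
    }
}
```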
Java NIO Channels are similar to streams with a few differences:
- You can both read from and write to a Channel. Streams are typically one-way (read or write).
- Channels can be read and written asynchronously.
- Channels always read from, or write to, a Buffer.
Here are the most important Channel implementations in Java NIO:
- FileChannel reads data from and writes data to files.
- DatagramChannel can read and write data over the network via UDP.
- SocketChannel can read and write data over the network via TCP.
- ServerSocketChannel allows you to listen for incoming TCP connections, like a web server does. For each incoming connection a SocketChannel is created.
Java NIO Buffers are used when interacting with NIO Channels. Data is read from channels into buffers, and written from buffers into channels.
Using a `Buffer` to read and write data typically follows this little 4-step process:
- Write data into the `Buffer`
- Call `buffer.flip()`
- Read data out of the `Buffer`
- Call `buffer.clear()` or `buffer.compact()`
In the typical life cycle of a Java NIO buffer, the buffer is created empty ready for a producer to fill it up with data. The buffer is in filling mode(a producer writes into the buffer). After the producer has finished writing data, the buffer is then flipped to prepare it for draining mode(a consumer reads from the buffer). At this point, the buffer is ready for the consumer to read the data. Once done, the buffer is then cleared and ready for writing again.
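The four-step cycle as a sketch, using only an in-memory `ByteBuffer` (no channel). The second method shows why `compact()` exists: unlike `clear()`, it keeps bytes the consumer has not read yet and lets new data be appended after them. Class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BufferLifecycleDemo {
    // Fill, flip, drain, clear: the basic cycle with one producer and one consumer.
    static String fillThenDrain(String text) {
        ByteBuffer buf = ByteBuffer.allocate(48);           // created empty, in filling mode
        buf.put(text.getBytes(StandardCharsets.US_ASCII));  // 1. producer writes into the buffer
        buf.flip();                                         // 2. switch to draining mode
        StringBuilder out = new StringBuilder();
        while (buf.hasRemaining()) {                        // 3. consumer reads the data
            out.append((char) buf.get());
        }
        buf.clear();                                        // 4. ready for writing again
        return out.toString();
    }

    // compact() instead of clear(): unread bytes survive.
    static String compactKeepsUnread() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) 'a').put((byte) 'b').put((byte) 'c').put((byte) 'd');
        buf.flip();
        buf.get();                                          // consumer reads 'a'
        buf.get();                                          // ... and 'b', but not "cd"
        buf.compact();                                      // moves "cd" to the front, back in filling mode
        buf.put((byte) 'e');                                // producer appends after the unread data
        buf.flip();
        StringBuilder out = new StringBuilder();
        while (buf.hasRemaining()) {
            out.append((char) buf.get());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fillThenDrain("hello nio")); // hello nio
        System.out.println(compactKeepsUnread());       // cde
    }
}
```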
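```Java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

class ChannelRead {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile aFile = new RandomAccessFile("data/nio-data.txt", "rw");
             FileChannel inChannel = aFile.getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate(48);
            int bytesRead = inChannel.read(buf);     // the channel writes into the buffer
            while (bytesRead != -1) {
                System.out.println("Read " + bytesRead);
                buf.flip();                          // switch the buffer to read mode
                while (buf.hasRemaining()) {
                    System.out.print((char) buf.get());
                }
                buf.clear();                         // make the buffer ready for writing again
                bytesRead = inChannel.read(buf);
            }
        }
    }
}
```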
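A Buffer has three important properties: position, capacity, and limit.
- position: the index of the next element to be read or written
- capacity: the maximum number of elements the buffer can hold
- limit: the end of the buffer's usable data (the index of the first element that should not be read or written); always less than or equal to capacity

`flip()` sets limit to the current position and then resets position to 0. It is typically used to switch a buffer from write mode to read mode.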
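A sketch that prints the three properties at each step; `BufferStateDemo` and `state()` are hypothetical helpers. Note that `clear()` only resets position and limit; it does not erase the data.

```java
import java.nio.ByteBuffer;

class BufferStateDemo {
    static String state(ByteBuffer b) {
        return "position=" + b.position() + " limit=" + b.limit() + " capacity=" + b.capacity();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(10);
        System.out.println(state(buf)); // position=0 limit=10 capacity=10
        buf.put((byte) 1).put((byte) 2).put((byte) 3);
        System.out.println(state(buf)); // position=3 limit=10 capacity=10
        buf.flip();                     // limit = old position (3), then position = 0
        System.out.println(state(buf)); // position=0 limit=3 capacity=10
        buf.get();
        System.out.println(state(buf)); // position=1 limit=3 capacity=10
        buf.clear();                    // position = 0, limit = capacity; data is not erased
        System.out.println(state(buf)); // position=0 limit=10 capacity=10
    }
}
```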
Direct buffers can be more efficient because the JVM makes a best effort to perform native I/O operations directly on them, without copying the data to or from an intermediate buffer first. However, creating a direct buffer is an expensive operation, and might even trigger a full garbage collection. Direct buffers are usually best suited to large, long-lived buffers, and the performance gain should be measured before committing to them.
Direct buffers are created with the `allocateDirect()` method:
```Java
ByteBuffer byteBuffer = ByteBuffer.allocateDirect(100);
```