Friday, September 2, 2011

File Copy Using Java -- Basic and Fastest Way

Recently I was faced with a situation that required a simple copy command in Java, so I wrote one in the plain old style and handed it to a developer to implement and unit test. To my surprise, the developer came back and reported that my copy code was performing badly. The reason: he wanted to use it to copy at least 20 GB of file data.

So now I was faced with the challenge of implementing the fastest copy code possible. The first thing that came to my mind was to use the nio API of Java.

File copying in Java mainly depends on the following factors, which you can control:

1) Buffer Size
2) Reading the data in an efficient manner.

Buffer Size:

Buffer size depends mainly on the operating system block size and the CPU cache. For example, the default block size on Windows is 4K (4096 bytes).

Now if someone configures a buffer size of, say, 4500 bytes, then two blocks (2 * 4096 bytes) have to be read and put into memory. That means you have read extra bytes that will never be used, and you pay the price of a disk-to-RAM read for them. Often the OS caches the blocks you read so that the next read can be served from memory, but then you pay the price in RAM and the cost of the RAM-to-CPU-cache copy. So ideally the buffer size should be equal to the block size or a multiple of it, so that every read is a full-block read and nothing is wasted. Remember that the cost of a RAM-to-CPU-cache copy is much lower than the cost of a disk-to-RAM read.

Beyond that, it depends on how the CPU moves data between the L3 and L2 caches, and the complexity only increases. So if you run your program under different cache sizes it gives different results. In my experience the best buffer size is 8192 bytes; Java's own default buffer size is also 8192. I suggest trying different buffer sizes and selecting the one that best suits your environment, for example with a quick timing harness like the one below.
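A rough sketch of such a harness (my own illustration, not from the original tests; it reuses the copyFile method shown in option 1 below, and the file paths and candidate sizes are just examples):

File src = new File("sample.dat");       // illustrative test file
File dst = new File("sample-copy.dat");  // illustrative destination
int[] sizes = { 4096, 8192, 16384, 32768, 65536 };
for (int size : sizes) {
    long start = System.nanoTime();
    copyFile(src, dst, size);            // the conventional copy from option 1 below
    long millis = (System.nanoTime() - start) / 1000000L;
    System.out.println(size + " bytes: " + millis + " ms");
}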

Data Reading:

The most inefficient way of reading text data is:

while ((line = in.readLine()) != null) {
    ...
}

Here you are creating a String object every time you read a line. Let's say you are reading a 10,000-line file: that means you are creating 10,000 String objects, and unnecessary time is spent on object creation, allocation, and garbage collection. Instead of String you can use a character array, as in the sketch below.
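A minimal sketch of the character-array approach (the file name is illustrative, and real code would also want a finally block around the close):

Reader in = new BufferedReader(new FileReader("input.txt")); // illustrative file name
char[] cbuf = new char[8192];
int n;
while ((n = in.read(cbuf)) != -1) {
    // Process cbuf[0..n) directly; no per-line String is allocated.
}
in.close();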

For binary copy, many write the code as below:

FileInputStream fis = new FileInputStream(src);
FileOutputStream ost = new FileOutputStream(dst);
BufferedInputStream in = new BufferedInputStream(fis);
BufferedOutputStream out = new BufferedOutputStream(ost);
byte[] buffer = new byte[8192];

while (true) {
    int amountRead = in.read(buffer);
    if (amountRead == -1) {
        break;
    }
    out.write(buffer, 0, amountRead);
}

If you observe the out.write call: the read buffer and the BufferedOutputStream's internal buffer are both 8192 bytes, so every full write bypasses the buffering and goes straight to the underlying stream. If the file is huge this has a significant impact. So when creating the output stream, use the new BufferedOutputStream(ost, size) constructor and pass a larger buffer size, as shown below.
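For example (the 4 * 8192 size is just an illustration), giving the output stream a buffer larger than the read buffer lets writes accumulate instead of passing straight through:

byte[] buffer = new byte[8192];
// Output buffer four times the read buffer, so each 8K write is
// accumulated and flushed to disk in 32K chunks.
BufferedOutputStream out = new BufferedOutputStream(ost, 4 * 8192);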

Since the nio package was released there have been various options for reading data, one of the most interesting of which is the ByteBuffer class.

As a ByteBuffer is read into repeatedly, it keeps track of the last byte written, so you can keep reading into it and it takes care of the bookkeeping. That doesn't mean you can read endless data: a ByteBuffer has a limit. Once the limit is reached you need to flip it.

A ByteBuffer has the following concepts:

* Capacity - Size of the internal byte[].
* Position - Index of the next byte to be read or written.
* Limit - Equal to the capacity while filling, and one past the last filled byte while draining.
* Mark - A bookmark (optional).

Data is read into the ByteBuffer starting at its position, and when writing out, data is drained starting at the position up to the limit. If you observe, the position advances until it reaches the limit, and once position == limit no further reading is possible. So you need to flip() the ByteBuffer, which sets the limit to the current position (the end of the useful data, to be exact) and resets the position to 0 (the start).
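A tiny standalone illustration (my own, not from the copy code below) of what flip() does to the markers:

ByteBuffer buf = ByteBuffer.allocate(8);  // capacity = 8, position = 0, limit = 8
buf.put((byte) 1).put((byte) 2);          // position = 2, limit = 8
buf.flip();                               // position = 0, limit = 2
while (buf.hasRemaining()) {
    byte b = buf.get();                   // reads back the two bytes just written
}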


Different Ways of Copying:

1) Conventional Way Of File Copy:


public void copyFile(File src, File dst, int buffLen) throws IOException {

    BufferedInputStream in = null;
    BufferedOutputStream out = null;
    FileInputStream fis = null;
    FileOutputStream ost = null;
    byte[] buffer = new byte[buffLen];
    try {
        fis = new FileInputStream(src);
        ost = new FileOutputStream(dst);
        in = new BufferedInputStream(fis);
        out = new BufferedOutputStream(ost);
        while (true) {
            int amountRead = in.read(buffer);
            if (amountRead == -1) {
                break;
            }
            out.write(buffer, 0, amountRead);
        }
        out.flush();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (in != null) {
            in.close();
        }
        if (out != null) {
            out.close();
        }
        if (fis != null) {
            fis.close();
        }
        if (ost != null) {
            ost.close();
        }
    }
}

Here we are using buffered streams to read and write.

2) Using the nio package, letting the JVM decide the best way to copy:

public void copyUsingFileChannelInternalTransfer(File source, File destination) throws IOException {
    FileChannel in = null;
    FileChannel out = null;
    try {
        in = new FileInputStream(source).getChannel();
        out = new FileOutputStream(destination).getChannel();
        // The JVM does its best to perform this as a native I/O operation.
        in.transferTo(0, in.size(), out);
        // Closing the file channels closes the corresponding stream objects as well.
    } finally {
        out.close();
        in.close();
    }
}
Here we are depending on the JVM to pick the best transfer strategy, ideally performing the copy as a native I/O operation.
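One caveat: transferTo() is not guaranteed to move the requested count in a single call, so for the multi-gigabyte files that motivated this post a defensive loop (my own sketch, not part of the original listing) is safer:

long position = 0;
long size = in.size();
while (position < size) {
    // transferTo returns the number of bytes actually transferred,
    // which may be fewer than requested.
    position += in.transferTo(position, size - position, out);
}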

3) Copy using ByteChannel and ByteBuffer:
public void copyUsingByteChannel(File source, File destination, int bufferSize) throws IOException {
    final InputStream input = new FileInputStream(source);
    final OutputStream output = new FileOutputStream(destination);
    final ReadableByteChannel inputChannel = Channels.newChannel(input);
    final WritableByteChannel outputChannel = Channels.newChannel(output);
    try {
        final ByteBuffer byteBuffer = ByteBuffer.allocateDirect(bufferSize);
        while (inputChannel.read(byteBuffer) != -1) {
            // Prepare for writing by setting the limit to the current
            // position and resetting the position to zero.
            byteBuffer.flip();
            // Write to the output channel.
            outputChannel.write(byteBuffer);
            // If any bytes are left between position and limit, move them
            // to the front; otherwise this has the same effect as clear().
            byteBuffer.compact();
        }
        // After the loop some bytes may still sit between position and
        // limit, and those also need to be written.
        byteBuffer.flip();
        // Write any remaining bytes.
        while (byteBuffer.hasRemaining()) {
            outputChannel.write(byteBuffer);
        }
    } finally {
        inputChannel.close();
        outputChannel.close();
    }
}
Here we are using a ByteChannel and a ByteBuffer to copy; please take note of the comments in the code for understanding.

4) Copy using MappedByteBuffer:

public void copyUsingMappedByteBuffer(File source, File destination) throws IOException {
    FileInputStream fi = new FileInputStream(source);
    FileChannel fic = fi.getChannel();
    // Here too we are relying on the JVM and OS to determine how the
    // file is paged into memory.
    MappedByteBuffer mbuf = fic.map(
            FileChannel.MapMode.READ_ONLY, 0, source.length());
    fic.close();
    fi.close();
    FileOutputStream fo = new FileOutputStream(destination);
    FileChannel foc = fo.getChannel();
    foc.write(mbuf);
    foc.close();
    fo.close();
}

Here we are using a MappedByteBuffer and completely relying on the JVM and OS to determine the optimal way to page the file into memory.
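One caveat worth noting: FileChannel.map() cannot map more than Integer.MAX_VALUE bytes at once, so a single mapping will not cover the 20 GB files mentioned earlier. A chunked variant (my own sketch, assuming the usual java.io/java.nio imports; the 64 MB chunk size is an arbitrary choice) could look like this:

public void copyUsingMappedByteBufferChunked(File source, File destination) throws IOException {
    FileInputStream fi = new FileInputStream(source);
    FileChannel fic = fi.getChannel();
    FileOutputStream fo = new FileOutputStream(destination);
    FileChannel foc = fo.getChannel();
    final long CHUNK = 64L * 1024 * 1024; // 64 MB per mapping
    long size = fic.size();
    for (long pos = 0; pos < size; pos += CHUNK) {
        long len = Math.min(CHUNK, size - pos);
        // Map and write one window of the file at a time.
        foc.write(fic.map(FileChannel.MapMode.READ_ONLY, pos, len));
    }
    foc.close();
    fo.close();
    fic.close();
    fi.close();
}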

Result:
After rigorous testing with different file sizes and different buffer sizes, I came to the conclusion that option 3 is the fastest and most consistent.
We were able to achieve more than a 65% performance improvement by opting for option 3.
I will update this blog with the throughput data soon.
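For reference, a minimal driver for the winning method (the file paths and the 8K buffer are just illustrative; it assumes the methods above are in scope):

File source = new File("bigfile.dat");           // illustrative source path
File destination = new File("bigfile-copy.dat"); // illustrative destination path
copyUsingByteChannel(source, destination, 8192); // 8K buffer, per the discussion above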
