The Open Road: java.nio.file
Editor's Note: Along with exotic and ambitious proposed features like the Java Module System, closures, and language-level XML support, do you suppose Java 7 will provide us a reliable file-copy method? It could happen, as a JSR for "More New I/O APIs for the Java Platform" appears to be a likely candidate for inclusion in Java 7. In this installment of The Open Road, Elliotte Rusty Harold takes a detailed look at the current state of the NIO2 spec and how it will, and sometimes won't, help you work with files.
Before we begin, here's a brief update on the status of the OpenJDK 7 project. The most recent JDK 7 drop is Build b29, posted June 20. A look at b29's summary of changes shows this to mainly be a bug-fix beta, with defects cleared in the compiler, build scripts, AWT, and a few other areas. Releases from the project have been coming out every two weeks or so since April -- taking an unsurprising break in early May for JavaOne -- with b28, b27, b26, and b25 continuing to fix defects and add minor features, such as JMX support for platform MXBeans of any type (bug 6610094), and an IO redirection API for sub-processes (bug 4960438).
And speaking of bugs, take a look at bug 4032604, "Copy method in class java.io.File." The first two comments on the bug were posted by the author and the editor of the article you're about to read -- 11 and nine years ago, respectively. Will we finally get our wish? Read on for Elliotte's answer.
Java is a cross-platform language and environment. However, the Java VM itself needs to communicate with the native processor, operating system, and file system. If native code is to be avoided, everything you'd rely on the OS for in a classic program has to be provided by Java instead. Reimplementing a complete virtual OS and API takes a while. In Java's case, and specifically in the case of the file system, the job has taken over a decade; and it still isn't done. Nonetheless, Java 7 may finally finish some of the last abstractions needed to create filesystem-aware programs with all the power of their native counterparts.
Sometimes it's the little things that are most annoying, like
the mosquito that won't stop buzzing around your bed at night.
Sometimes these irritants grow over time. In Java 1.0 the language
was so new we barely noticed that it had no reliable way to copy or
move a file. In Java 1.1 we were so happy about
internationalization and readers and writers that we figured moving
and copying files would surely come in the next release. In Java
1.2 we got distracted by Swing, and didn't think much about I/O. By
Java 1.3, however, we were starting to get a little antsy. Surely
Sun could have offered us file copies by now? We were definitely
getting a little tired of running long streams just to move a file
from point A to point B while losing all the metadata in the
process. We were very tired of shelling out to native code to move
files because renameTo() only worked on an
unpredictable subset of the systems we needed to run on. But Sun
promised that they'd get around to a decent file system interface
in Java 1.4.
Java 1.4 arrived, and it was full of buffers and channels and
charsets and more in a spanking new java.nio package.
Unfortunately, the new filesystem interface we'd been promised was
nowhere to be found. Seems the developers working on
java.nio had gotten so excited about non-blocking I/O
and memory-mapped files that they ran out of time or just plumb
forgot about their promise to finally let us move and copy
files.
Java 5, and then Java 6, came and went with nary a file copy operation in sight, though Sun did manage to find time to invent the most complicated and simultaneously least powerful generics implementation I've ever seen. It was starting to feel like the priorities were more than a little skewed over at the JCP. Complicated, sexy proposals like generics, closures, and asynchronous I/O got a lot more attention than they deserved while basic, fundamental, and easy but unsexy functionality, such as copying and moving files, was starved for resources.
However, finally in Java 7, it looks like there's at least a
50-50 chance we'll get a filesystem API that's more powerful than
the clunky old java.io.File that was thrown together
twelve years ago to push Java 1.0 out the door. Sun, IBM, Oracle,
Intel, HP, Google, NTT, and Doug Lea are working on JSR 203 to create
"More New I/O APIs
for the Java Platform ('NIO.2')". Don't hold your breath yet,
but do keep your fingers crossed. Maybe, just maybe, we'll
finally be able to copy files in Java 7.
In Java 7 java.io.File won't be deprecated, but it
probably should be. Instead, all files should be referenced through
the java.nio.file package. What used to be a
java.io.File will now become a
java.nio.file.Path. This is a more accurate name, since
there was never any guarantee that a File object
mapped to a real file. Also, paths can refer to both files and
directories.
The Path class is abstract and has no constructors.
Instead, you'll ask a FileSystem object to create a
path for you. This way, it can create a path that's specific to the
type of file system: Windows, Unix, Mac OS X, network, zip archive,
or something else. For example, this is how you create a
Path object for the file in
/home/elharo/articles/java.net/article3.html on the local
default file system:
FileSystem local = FileSystems.getDefault();
Path p = local.getPath("/home/elharo/articles/java.net/article3.html");
You can create relative paths, too. These are relative to the current working directory:
Path p = fileSystem.getPath("articles/java.net/article3.html");
Most of the time, you'll just want to use the operating system's
native file system, which is available via the
FileSystems.getDefault() method. If this is all you
want, and usually it is, the static Path.get() method
saves you a few columns of horizontal space:
Path p = Path.get("/home/elharo/articles/java.net/paths.html");
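For comparison, here is a minimal sketch of the same two approaches written against the java.nio.file API as it was eventually released in Java 7, where the static factory lives in a separate Paths class rather than on Path itself. The class and method names (PathCreation, viaFileSystem, viaFactory) are illustrative only and do not come from the draft spec.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;

class PathCreation {
    // Asks the default file system for a path explicitly.
    static Path viaFileSystem(String name) {
        FileSystem local = FileSystems.getDefault();
        return local.getPath(name);
    }

    // Uses the convenience factory, which also targets the default file system.
    static Path viaFactory(String name) {
        return Paths.get(name);
    }

    public static void main(String[] args) {
        Path a = viaFileSystem("articles/java.net/article3.html");
        Path b = viaFactory("articles/java.net/article3.html");
        System.out.println("Same path: " + a.equals(b));
        System.out.println("Absolute:  " + a.isAbsolute());
    }
}
```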
However, you can install other file systems that point somewhere other than the local file system. For instance, you could have a file system that accesses an HTTP server, reads a zip archive, mounts an ISO disk image, or views a Mercurial repository. Each such file system would have its own path and attribute classes. However, basic operations could still be performed with the abstract superclasses I discuss here.
The possibility of alternate file systems gives us a second way
to create path objects. Given a file system that supports a URI
scheme, you can create a Path object from the URI. For
example, imagine you've installed a RESTful file system provider
that uses HTTP GET for reading, HTTP PUT for writing, HTTP OPTIONS
and HEAD for attributes, and HTTP DELETE for removal. Then you can
point to a file on the server like so:
Path p = Path.get(new URI("http://www.example.com/foo/bar.html"));
There's also a toUri() method that converts an
absolute Path to a filesystem-specific URI:
URI uri = path.toAbsolutePath().toUri();
Finally, if you're passed a traditional java.io.File
object by some old code, you can convert it to the new hotness with
the getFileRef() method:
File f = new File("/foo/bar.txt");
Path p = f.getFileRef();
Unfortunately, the more obvious name getPath() was
already taken.
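In the API that finally shipped with Java 7, this bridge is spelled File.toPath(), with Path.toFile() for the return trip. A small sketch under that assumption; LegacyBridge and upgrade are made-up names for illustration:

```java
import java.io.File;
import java.nio.file.Path;

class LegacyBridge {
    // Converts an old-style File handed over by legacy code into a Path.
    static Path upgrade(File legacy) {
        return legacy.toPath();
    }

    public static void main(String[] args) {
        File f = new File("/foo/bar.txt");
        Path p = upgrade(f);
        File back = p.toFile();   // and back again for APIs that still want a File
        System.out.println(p.getFileName() + " round-trips: " + f.equals(back));
    }
}
```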
A Path stores a hierarchical list of names indexed
from zero to one less than the length of the list. The last name in
the list is the name of the individual file or directory the path
refers to. The first name will be the root of the file system if
the path is absolute. Other parts of the path are parent
directories. These methods inspect this list:
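Path getName(int n): Returns the nth component in this path. The root of the path is 0. The file/directory itself is one less than the number of components of the path.
int getNameCount(): Returns the number of components in this path.
Path getParent(): Returns the path to the parent directory.
Path getRoot(): Returns the root of this path. For an absolute path on a Unix-like file system, this will be /. For an absolute path on a DOS-like file system, this will be something like C:\ or D:\. For a relative path, this will be null.
Of course, paths aren't always quite perfect trees. In relative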
paths, the root is missing. Sometimes symbolic links cause the path
to jump to a different subtree. The toAbsolutePath()
method converts a path to an absolute path starting from a root of
the file system, wherever that might be. Invoking
toRealPath(false) on a path removes path segments like
/./ and /../ from the path before
computing an absolute path. Invoking toRealPath(true)
on a path removes path segments like /./ and
/../ and also resolves all symbolic links before
returning the absolute path.
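To make the component list concrete, here is a hedged sketch against the API as released in Java 7. Two details differ from the draft described above and are assumptions about the final API only: the root is not counted as a name element there (getName(0) is the first element after the root), and purely syntactic cleanup of /./ and /../ is done with normalize(), while toRealPath takes LinkOption arguments rather than a boolean. PathParts and names are illustrative names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class PathParts {
    // Collects the name elements of a path, from the one closest to the
    // root down to the file or directory itself.
    static List<String> names(Path p) {
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < p.getNameCount(); i++) {
            result.add(p.getName(i).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        Path messy = Paths.get("articles/./java.net/../java.net/article3.html");
        Path clean = messy.normalize();   // syntactic only; never touches the disk
        System.out.println("Names:    " + names(clean));
        System.out.println("Parent:   " + clean.getParent());
        System.out.println("Root:     " + clean.getRoot());   // a relative path has no root
        System.out.println("Absolute: " + clean.toAbsolutePath());
    }
}
```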
You can use several variants of the resolve method
to create new paths from an existing path. For example, suppose
temp points to /usr/tmp:
Path temp = fileSystem.getPath("/usr/tmp");
We can resolve other paths with this as the root. For example,
resolving articles/java.net/article3.html against temp
creates a path pointing to
/usr/tmp/articles/java.net/article3.html:
Path p = fileSystem.getPath("articles/java.net/article3.html");
Path resolved = temp.resolve(p);
The inverse operation of resolution is relativization. Given an absolute path such as /usr/tmp/articles/java.net/article3.html, you can convert it to a path relative to some other path such as /usr/tmp:
Path absolute = fileSystem.getPath("/usr/tmp/articles/java.net/article3.html");
Path temp = fileSystem.getPath("/usr/tmp");
Path relative = temp.relativize(absolute);
If necessary, relativization can add ./ and ../ to the path to properly relativize. For example, here I calculate a relative link from an article in one directory to an article in another directory:
Path article3 = fileSystem.getPath("/usr/tmp/articles/java.net/article3.html");
Path article7 = fileSystem.getPath("/usr/tmp/articles/developerWorks/article7.html");
Path link = article7.relativize(article3);
link now points to
../../java.net/article3.html.
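A short sketch of both operations, written against the released Java 7 API (Paths.get instead of the draft's factory), shows that they undo each other. The helper name roundTrips is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathArithmetic {
    // Resolving a relative path against a base and then relativizing the
    // result against the same base should give back the relative path.
    static boolean roundTrips(Path base, Path relative) {
        Path resolved = base.resolve(relative);
        return base.relativize(resolved).equals(relative);
    }

    public static void main(String[] args) {
        Path temp = Paths.get("/usr/tmp");
        Path article = Paths.get("articles/java.net/article3.html");
        System.out.println(temp.resolve(article));
        System.out.println("Round trip holds: " + roundTrips(temp, article));

        Path article3 = Paths.get("/usr/tmp/articles/java.net/article3.html");
        Path article7 = Paths.get("/usr/tmp/articles/developerWorks/article7.html");
        System.out.println(article7.relativize(article3));
    }
}
```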
These methods could be useful when setting up a templating system for a blog engine, or a content management system, and converting file paths to URLs, for example. If you use them that way, please do be careful that you don't accidentally let crackers go wandering all over the file system outside your content root, though.
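One way to guard against that kind of wandering is to normalize the resolved path and check that it still sits under the content root. The following is only a sketch against the released Java 7 API; ContentRoot and safeResolve are invented names, and the check is purely syntactic, so a symbolic link inside the root could still point outside it.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class ContentRoot {
    // Resolves a user-supplied name against the content root and rejects
    // any result that escapes the root once "." and ".." are collapsed.
    static Path safeResolve(Path root, String requested) {
        Path base = root.toAbsolutePath().normalize();
        Path candidate = base.resolve(requested).normalize();
        if (!candidate.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes content root: " + requested);
        }
        return candidate;
    }

    public static void main(String[] args) {
        Path root = Paths.get("content");
        System.out.println(safeResolve(root, "articles/article3.html"));
        try {
            safeResolve(root, "../../etc/passwd");
        } catch (IllegalArgumentException ex) {
            System.out.println("Rejected: " + ex.getMessage());
        }
    }
}
```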
To write to a path, you call newOutputStream(), and
then use the returned object as normal. Example 1 shows a simple
method to write the ASCII letters A through Z into a file in the
current working directory named alphabet.txt:
public void makeAlphabetFile() throws IOException {
    Path p = Path.get("alphabet.txt");
    OutputStream out = p.newOutputStream();
    // Write each letter on its own line
    for (int c = 'A'; c <= 'Z'; c++) {
        out.write(c);
        out.write('\n');
    }
    out.close();
}
This program will create the file if it doesn't exist, and
overwrite it if it does. However, you can adjust this by passing
StandardOpenOption.APPEND or
StandardOpenOption.CREATE_NEW to the
newOutputStream() method:
OutputStream out = p.newOutputStream(EnumSet.of(StandardOpenOption.CREATE_NEW));
Now alphabet.txt will be created if and only if it doesn't already exist. Otherwise the attempt will throw an exception.
There are several options you can use when opening a file:
StandardOpenOption.CREATE (default behavior for writes)
StandardOpenOption.CREATE_NEW
StandardOpenOption.APPEND
StandardOpenOption.TRUNCATE_EXISTING (default for writes)
StandardOpenOption.NOFOLLOW_LINKS
StandardOpenOption.SPARSE
StandardOpenOption.DSYNC (... BufferedInputStream and BufferedWriter.)
StandardOpenOption.SYNC
StandardOpenOption.READ
StandardOpenOption.WRITE
These options apply not just in this method, but for all methods in the API that open files. Not all of these are mutually exclusive. You can use several when opening a file.
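As a rough illustration of combining options, here is a sketch against the API as released in Java 7, where the static Files.newOutputStream method takes the options as varargs rather than as a set. OpenOptionsDemo and its two helpers are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class OpenOptionsDemo {
    // Creates the file and writes the bytes; returns false if it already exists.
    static boolean createExclusively(Path p, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(p,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            out.write(data);
            return true;
        } catch (FileAlreadyExistsException ex) {
            return false;
        }
    }

    // Adds bytes to the end of the file, creating it first if necessary.
    static void append(Path p, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(p,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempDirectory("demo").resolve("alphabet.txt");
        byte[] line = "A\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("First create:  " + createExclusively(p, line));
        System.out.println("Second create: " + createExclusively(p, line));
        append(p, "B\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println("Size in bytes: " + Files.size(p));
    }
}
```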
You can buffer or otherwise filter these streams as normal.
Example 2 shows a better makeAlphabetFile() method that uses
UTF-8 encoding, and buffers the data:
public void makeAlphabetFile() throws IOException {
    Path p = Path.get("alphabet.txt");
    OutputStream out = p.newOutputStream();
    out = new BufferedOutputStream(out);
    // Encode characters as UTF-8 on their way to the stream
    Writer w = new OutputStreamWriter(out, "UTF-8");
    w = new BufferedWriter(w);
    for (int c = 'A'; c <= 'Z'; c++) {
        w.write(c);
        w.write('\n');
    }
    w.flush();
    w.close();
}
For reading, just use newInputStream() instead.
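For a sense of how the write and read sides pair up, here is a sketch using the Java 7 API as it was finally released, where Files.newBufferedWriter and Files.newBufferedReader do the buffering and charset wrapping in one step. AlphabetFile, write, and countLines are illustrative names, not part of the draft.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class AlphabetFile {
    // Writes A through Z, one letter per line, encoded as UTF-8.
    static void write(Path p) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(p, StandardCharsets.UTF_8)) {
            for (char c = 'A'; c <= 'Z'; c++) {
                w.write(c);
                w.write('\n');
            }
        }
    }

    // Reads the file back and counts its lines.
    static int countLines(Path p) throws IOException {
        int lines = 0;
        try (BufferedReader r = Files.newBufferedReader(p, StandardCharsets.UTF_8)) {
            while (r.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("alphabet", ".txt");
        write(p);
        System.out.println("Lines written: " + countLines(p));
    }
}
```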
You can also specify attributes for newly created files when opening a path for writing. I'll discuss those below.
There are also methods that create channels instead, though on modern VMs, I'm skeptical whether that's really helpful or just more complex. Threading has improved so much in Java 6 that it's no longer a problem to run thousands or even tens of thousands of streams in separate threads, thereby removing much of the impetus for using channels and non-blocking I/O in the first place. Perhaps the true asynchronous I/O also introduced with JSR-203 will make channels relevant again, but this remains to be seen.
However, there is one case that definitely calls for channels:
random access files. There's no specific new
RandomAccessFile class. Instead you ask the path to
give you a SeekableByteChannel:
Path p = Path.get("fits.dat");
SeekableByteChannel raf = p.newSeekableByteChannel(
    StandardOpenOption.READ,
    StandardOpenOption.WRITE,
    StandardOpenOption.SYNC,
    StandardOpenOption.DSYNC
);
SeekableByteChannel is a new subinterface
of ByteChannel that extends it with methods for moving
the file pointer around in the file before reading or writing:
public interface SeekableByteChannel extends ByteChannel {
public int read(ByteBuffer dest) throws IOException;
public int write(ByteBuffer source) throws IOException;
public long position() throws IOException;
public SeekableByteChannel position(long newPosition) throws IOException;
public long size() throws IOException;
public SeekableByteChannel truncate(long size) throws IOException;
}
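To show the file pointer in action, here is a minimal sketch that reads one byte at an offset and overwrites it in place. It assumes the released Java 7 API, in which the channel is obtained from the static Files.newByteChannel method; RandomAccessDemo and swapByte are made-up names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class RandomAccessDemo {
    // Overwrites the byte at the given offset and returns the byte it replaced.
    static byte swapByte(Path p, long offset, byte replacement) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(p,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ByteBuffer one = ByteBuffer.allocate(1);
            ch.position(offset);
            if (ch.read(one) != 1) {
                throw new IOException("Offset beyond end of file: " + offset);
            }
            byte old = one.get(0);
            one.clear();
            one.put(replacement);
            one.flip();
            ch.position(offset);   // the read advanced the pointer; move it back
            ch.write(one);
            return old;
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("fits", ".dat");
        Files.write(p, new byte[] {'A', 'B', 'C'});
        byte old = swapByte(p, 1, (byte) 'x');
        System.out.println("Replaced " + (char) old + ", file is now "
            + new String(Files.readAllBytes(p), "US-ASCII"));
    }
}
```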
To list the files in a directory you'll use a
DirectoryStream, which is not really a stream at all.
Instead, it's an Iterable that returns
DirectoryEntry objects from which you can get more
Paths. These Path objects are all
relative to their parent directories. The process starts with a call
to the newDirectoryStream() method of the path
representing a directory.
Example 3 is a program that lists all the .txt files in the roots of the filesystem:
import java.io.IOException;
import java.nio.file.*;
public class TextLister {
    public static void main(String[] args) throws IOException {
        // Visit every root of the default file system
        for (Path root : FileSystems.getDefault().getRootDirectories()) {
            DirectoryStream txtFiles = root.newDirectoryStream("*.txt");
            try {
                for (Path textFile : txtFiles) {
                    System.out.println(textFile.getName());
                }
            }
            finally {
                txtFiles.close();
            }
        }
    }
}
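For comparison, the same kind of glob listing can be sketched against the Java 7 API as finally released, where the stream is generic (DirectoryStream<Path>), comes from the static Files.newDirectoryStream method, and can be closed by try-with-resources. GlobLister and textFiles are illustrative names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class GlobLister {
    // Returns the names of the .txt entries directly inside dir, sorted.
    static List<String> textFiles(Path dir) throws IOException {
        List<String> names = new ArrayList<String>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);   // iteration order of a directory is unspecified
        return names;
    }

    public static void main(String[] args) throws IOException {
        for (String name : textFiles(Paths.get("."))) {
            System.out.println(name);
        }
    }
}
```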
For filters beyond simple name filters -- for instance, filtering
by size or MIME type -- you have to implement your own instance of
the DirectoryStream.Filter interface to specify which
files to accept and reject. For example, here's a simple filter
that accepts files that are less than 100K in size:
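import java.io.IOException;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.*;
public class SmallFilesOnly implements DirectoryStream.Filter {
    public boolean accept(DirectoryEntry entry) {
        try {
            // Open a channel only to ask for the size, then close it again
            SeekableByteChannel channel = entry.newSeekableByteChannel();
            try {
                return channel.size() < 102400;
            } finally {
                channel.close();
            }
        } catch (IOException ex) {
            // Reject anything whose size can't be determined
            return false;
        }
    }
}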
Unfortunately, you can't just pass an instance of this filter to
the newDirectoryStream() as you might expect. Instead,
you have to use a far less direct and more opaque means of listing
the directory using the Files.withDirectory method:
import java.io.IOException;
import java.nio.file.*;
public class TextLister {
    public static void main(String[] args) throws IOException {
        for (Path root : FileSystems.getDefault().getRootDirectories()) {
            // The action is invoked once for each entry the filter accepts
            Files.withDirectory(root, new SmallFilesOnly(), new DirectoryAction() {
                public void invoke(DirectoryEntry entry) {
                    System.out.println(entry.getName());
                }
            });
        }
    }
}
I'm not sure what the working group has against simple, straightforward iteration, but instead we have to use this confusing closure-lite syntax. However, Java is not a language that was designed around closures, and closure-based methods like this just don't fit. There are just too many layers of indirection, and it's too hard to see what actually happens. For instance, in Example 5, can you tell me how to print the names of the first 10 entries, and then break? Doable, yes; but not trivial. Functional languages have their place, but they don't mix well with iterative-based languages like Java. Usable Java APIs should emphasize imperative design patterns, not functional ones.
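For what it's worth, the java.nio.file API as eventually released in Java 7 has no withDirectory method; a DirectoryStream.Filter can be handed straight to Files.newDirectoryStream and the result iterated with an ordinary loop. Under that assumption, "take the first few matches and then break" can be sketched as follows. SmallFileLister and firstSmallFiles are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class SmallFileLister {
    // Returns at most `limit` regular files in dir that are smaller than maxBytes.
    static List<Path> firstSmallFiles(Path dir, final long maxBytes, int limit)
            throws IOException {
        DirectoryStream.Filter<Path> small = new DirectoryStream.Filter<Path>() {
            public boolean accept(Path entry) throws IOException {
                return Files.isRegularFile(entry) && Files.size(entry) < maxBytes;
            }
        };
        List<Path> found = new ArrayList<Path>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, small)) {
            for (Path entry : stream) {
                if (found.size() == limit) {
                    break;   // plain imperative early exit
                }
                found.add(entry);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        for (Path p : firstSmallFiles(Paths.get("."), 102400, 10)) {
            System.out.println(p.getFileName());
        }
    }
}
```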
Suppose you want to copy the file charm.txt in the directory cats to the file charm_harold.xml in the directory pets. Before Java 7, you had to open the source file and the destination file, read the entire contents from the source, and then write them to the destination. For a large file this could take a while, and usually you'd lose metadata such as permissions, owners, MIME types, archive flags, and such in the process. Example 6 shows how to accomplish this basic task in Java 7:
FileSystem local = FileSystems.getDefault();
Path charm = local.getPath("cats/charm.txt");
Path pets = local.getPath("pets/charm_harold.xml");
charm.copyTo(pets);
On many operating systems this will happen a lot faster than streaming data from one file to another. Furthermore, it should preserve all metadata that should be preserved. Security restrictions may prevent certain metadata from being copied, and other features such as the file creation time may be changed.
Now suppose instead of copying a file you want to move a file. In Java 6 and earlier, all you could do was rename the file, which worked on some operating systems but not on others, and usually didn't work for network volumes even if it worked for local disks. Or you could copy the file byte by byte, and then delete the original. Now however, it's this simple:
FileSystem local = FileSystems.getDefault();
Path charm = local.getPath("cats/charm.txt");
Path pets = local.getPath("pets/charm_harold.xml");
charm.moveTo(pets);
This can be much faster even for very large files because most of the time no bits need to be moved at all. The local native file system simply needs to rewrite a few entries in a virtual table. Moves between physical disks or across the network do need to move bytes and will take finite time.
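In the API as it finally shipped with Java 7, the copy and move operations are static methods on the Files class rather than instance methods on Path. A brief sketch under that assumption; CopyAndMove, copyInto, and moveTo are illustrative names only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CopyAndMove {
    // Copies source into targetDir under its own file name.
    static Path copyInto(Path source, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        return Files.copy(source, targetDir.resolve(source.getFileName()));
    }

    // Moves (or renames) source to target; fails if target already exists.
    static Path moveTo(Path source, Path target) throws IOException {
        return Files.move(source, target);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pets");
        Path charm = Files.createFile(dir.resolve("charm.txt"));
        Path copy = copyInto(charm, dir.resolve("cats"));
        Path moved = moveTo(charm, dir.resolve("charm_harold.xml"));
        System.out.println("Copy exists:     " + Files.exists(copy));
        System.out.println("Original exists: " + Files.exists(charm));
        System.out.println("Moved exists:    " + Files.exists(moved));
    }
}
```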
These methods are synchronous and blocking. If that bothers you,
just wrap the transfer in a FutureTask and pass it to
an Executor.
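A sketch of that wrapping might look like the following. It uses Files.copy from the released Java 7 API in place of the draft's copyTo, and the names BackgroundCopy and copyInBackground are invented for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Callable;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

class BackgroundCopy {
    // Wraps the blocking copy in a FutureTask and hands it to the executor.
    static FutureTask<Path> copyInBackground(Executor executor,
            final Path source, final Path target) {
        FutureTask<Path> task = new FutureTask<Path>(new Callable<Path>() {
            public Path call() throws IOException {
                return Files.copy(source, target);
            }
        });
        executor.execute(task);
        return task;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("bg");
        Path source = Files.createFile(dir.resolve("charm.txt"));
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            FutureTask<Path> task = copyInBackground(executor, source, dir.resolve("copy.txt"));
            // get() blocks until the copy finishes and rethrows any failure
            System.out.println("Copied to " + task.get().getFileName());
        } finally {
            executor.shutdown();
        }
    }
}
```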
Of course, I/O is still an unsafe operation. These methods can
throw IOExceptions if the source file doesn't exist,
if the target directory is read-only, if a floppy is ejected while
a copy is being written to it, if a network goes down while a file
is being read, or any other such problems. As always, you'll need to
wrap these operations in a try-catch block or declare that your
method throws the relevant exception. You may also want to
implement your own recovery logic. File copies and moves over the
network or between disks take real time; and if an operation is
interrupted in medias res, the target file may be half-written and in an inconsistent, corrupt state.
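One possible shape for such recovery logic, offered only as a sketch, is to copy into a scratch file next to the target and rename it into place once the copy is complete, so a half-written file never carries the target's name. This assumes the released Java 7 Files API; SafeCopy, copyThenPublish, and the .part suffix are all invented for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SafeCopy {
    // Copies to "<target>.part" first, then renames it to the final name.
    static void copyThenPublish(Path source, Path target) throws IOException {
        Path partial = target.resolveSibling(target.getFileName() + ".part");
        try {
            Files.copy(source, partial, StandardCopyOption.REPLACE_EXISTING);
            // A rename within one directory is usually atomic; whether an
            // existing target is replaced is implementation specific
            Files.move(partial, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(partial);   // nothing left to delete after a successful move
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("safe");
        Path source = Files.write(dir.resolve("charm.txt"), new byte[] {1, 2, 3});
        Path target = dir.resolve("charm_copy.txt");
        copyThenPublish(source, target);
        System.out.println("Target size: " + Files.size(target));
    }
}
```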
By default, when a file is copied or moved, the operation fails if the target already exists, and symbolic links are followed.
Sometimes this is what you want, and sometimes it isn't. You can
adjust the behavior of the copy/move by passing one or more copy
options to the copyTo() or moveTo()
methods:
StandardCopyOption.REPLACE_EXISTING: Overwrite a preexisting target file.
StandardCopyOption.COPY_ATTRIBUTES: Preserve all the original's attributes in the copy.
StandardCopyOption.NOFOLLOW_LINKS: Do not follow symbolic links from the target when copying. Copy the links themselves instead.
StandardCopyOption.ATOMIC_MOVE: Copy/move the entire file or nothing.
For example, if you want to overwrite an existing target when
copying, pass StandardCopyOption.REPLACE_EXISTING to
copyTo like so:
source.copyTo(target, StandardCopyOption.REPLACE_EXISTING);
If you want to overwrite an existing target and preserve the
original file attributes, pass
StandardCopyOption.REPLACE_EXISTING and
StandardCopyOption.COPY_ATTRIBUTES:
source.copyTo(target, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.COPY_ATTRIBUTES);
Yes, the syntax does not look like the options for creating a
stream. Those use an EnumSet while these use
varargs.
Particular file systems may support additional non-standard options, but these four are required.
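Here is a sketch of the options in use, written against the Java 7 API as released, where they are passed as varargs to Files.copy. In that final API, NOFOLLOW_LINKS lives in LinkOption rather than StandardCopyOption, so it is left out here. CopyWithOptions and its helpers are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class CopyWithOptions {
    // Copies without options; returns false if the target is already there.
    static boolean copyIfAbsent(Path source, Path target) throws IOException {
        try {
            Files.copy(source, target);
            return true;
        } catch (FileAlreadyExistsException ex) {
            return false;
        }
    }

    // Overwrites any existing target and asks for attributes to be carried over.
    static void overwritePreservingAttributes(Path source, Path target) throws IOException {
        Files.copy(source, target,
            StandardCopyOption.REPLACE_EXISTING,
            StandardCopyOption.COPY_ATTRIBUTES);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("opts");
        Path source = Files.write(dir.resolve("source.txt"), new byte[] {1, 2, 3});
        Path target = dir.resolve("target.txt");
        System.out.println("First copy:  " + copyIfAbsent(source, target));
        System.out.println("Second copy: " + copyIfAbsent(source, target));
        overwritePreservingAttributes(source, target);
        System.out.println("Target size: " + Files.size(target));
    }
}
```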
Metadata about a file such as owners, permissions, readability,
and so forth has now been separated from the file class itself. You
request attributes from a path using the new
java.nio.file.attribute.Attributes class like so:
BasicFileAttributes attrs = Attributes.readBasicFileAttributes(path, false);
This only gives you the basic attributes that are common to most file systems, most of which have been available since Java 1.0. Example 8 is a simple program to list all the attributes for files named on the command line:
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.*;
import java.util.Date;
import java.util.concurrent.TimeUnit;
public class AttributePrinter {
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Path p = Path.get(name);
            BasicFileAttributes attrs = Attributes.readBasicFileAttributes(p, false);
            TimeUnit scale = attrs.resolution();
            // all dates are since the epoch but we do need to adjust for
            // different time units used in different file systems
            System.out.println(name + " was created at "
                + new Date(scale.toMillis(attrs.creationTime())));
            System.out.println(name + " was last accessed at "
                + new Date(scale.toMillis(attrs.lastAccessTime())));
            System.out.println(name + " was last modified at "
                + new Date(scale.toMillis(attrs.lastModifiedTime())));
            if (attrs.isDirectory()) {
                System.out.println(name + " is a directory.");
            }
            if (attrs.isFile()) {
                System.out.println(name + " is a normal file.");
            }
            if (attrs.isSymbolicLink()) {
                System.out.println(name + " is a symbolic link.");
            }
            if (attrs.isOther()) {
                System.out.println(name + " is something strange.");
            }
            System.out.println(name + " is " + attrs.size() + " bytes long.");
            System.out.println("There are " + attrs.linkCount() + " links to this file.");
        }
    }
}
These attributes are assumed to be more or less the same on different file systems, though this isn't always true. Not all file systems track the last access time, for example.
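For comparison, a hedged sketch of reading the basic attributes with the API as released in Java 7: there the call is Files.readAttributes with the attribute class as an argument, timestamps come back as FileTime objects (so no resolution() conversion is needed), and the basic view has no link count. BasicAttributeReader and describe are illustrative names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

class BasicAttributeReader {
    // Summarizes the kind, size, and modification time of a path in one line.
    static String describe(Path p) throws IOException {
        // Symbolic links are followed by default, so the target is described
        BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
        String kind = attrs.isDirectory() ? "directory"
            : attrs.isRegularFile() ? "regular file"
            : attrs.isSymbolicLink() ? "symbolic link"
            : "other";
        return p + ": " + kind + ", " + attrs.size() + " bytes, modified "
            + attrs.lastModifiedTime();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe(Paths.get(".")));
    }
}
```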
You can ask for more platform-specific attributes with the
readDosFileAttributes() and
readPosixFileAttributes() methods. For example, here's
a simple program to list all the attributes for a Windows file
named at the DOS prompt:
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.*;
public class WindowsAttributePrinter {
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Path p = Path.get(name);
            DosFileAttributes attrs = Attributes.readDosFileAttributes(p, false);
            // The archive flag marks a file as changed since its last backup
            if (attrs.isArchive()) {
                System.out.println(name + " has its archive flag set.");
            }
            if (attrs.isReadOnly()) {
                System.out.println(name + " is read-only.");
            }
            if (attrs.isHidden()) {
                System.out.println(name + " is hidden.");
            }
            if (attrs.isSystem()) {
                System.out.println(name + " is a system file.");
            }
        }
    }
}
POSIX file attributes are group, owner, and permissions. You'll
get an UnsupportedOperationException if you try to
read DOS attributes from a POSIX file system or vice versa.
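The POSIX side can be sketched the same way. This example assumes the API as released in Java 7, where the attributes are requested through Files.readAttributes with PosixFileAttributes.class; it catches the UnsupportedOperationException so the same code runs on file systems without POSIX support. PosixAttributeReader and permissions are made-up names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermissions;

class PosixAttributeReader {
    // Returns "owner:group rwxr-xr-x" style text, or null if the file
    // system does not offer POSIX attributes.
    static String permissions(Path p) throws IOException {
        try {
            PosixFileAttributes attrs = Files.readAttributes(p, PosixFileAttributes.class);
            return attrs.owner().getName() + ":" + attrs.group().getName() + " "
                + PosixFilePermissions.toString(attrs.permissions());
        } catch (UnsupportedOperationException ex) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        String summary = permissions(Paths.get("."));
        System.out.println(summary == null ? "No POSIX attributes here." : summary);
    }
}
```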
Other providers can offer their own subclasses of
BasicFileAttributes. For instance, Apple might offer
MacFileAttributes, and Microsoft (or third parties)
NTFSFileAttributes. However, these additional
attributes can't be so easily plugged into the system.
I must say this is the piece of JSR-203 that strikes me as most questionable. File systems and file metadata are still evolving. The current system doesn't even support what's available today in Vista (Indexes, Archived, etc.) or Mac OS X Leopard (file type, creator type, etc.), much less what may be available in five years. I think we need a more flexible approach that does not presume it knows the names, types, or meaning of all possible file system metadata in advance. A generic key-value system would be a lot more palatable.
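As it turned out, the API that shipped in Java 7 does offer a string-keyed lookup alongside the typed attribute classes. The sketch below assumes that released API (Files.readAttributes with an attribute-name pattern, and Files.getAttribute); AttributeMap and its methods are illustrative names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

class AttributeMap {
    // Reads every attribute in the "basic" view as name/value pairs.
    static Map<String, Object> basicAttributes(Path p) throws IOException {
        return Files.readAttributes(p, "basic:*");
    }

    // Looks up a single attribute by name; the caller must know its type.
    static long size(Path p) throws IOException {
        return (Long) Files.getAttribute(p, "basic:size");
    }

    public static void main(String[] args) throws IOException {
        Path here = Paths.get(".");
        for (Map.Entry<String, Object> e : basicAttributes(here).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```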
Copying files and checking permissions aren't the sexiest parts of a programmer's job. Indeed, they're among the most prosaic. Nonetheless, they are extremely important. The lack of a good way to do this has been a really critical omission in Java for years. Finally, Java 7 fills these basic holes.
Add on top of that sexier new I/O features, such as watch lists, true asynchronous I/O, and virtual file systems, and Java 7 may finally have a modern foundation for input and output on which the next generation of clients, servers, and desktop apps can be built.
Elliotte Rusty Harold is the author of numerous books including Java I/O, Java Network Programming, and the upcoming Refactoring HTML.