PDFs are one of the most common and most significant document
formats on the internet. Typically, developers must use expensive
tools from Adobe or cumbersome APIs to generate PDFs. In this
article, you will learn how to programmatically generate PDFs easily
with plain XHTML and CSS using two open source Java libraries:
Flying Saucer and iText.
The Problem with PDFs
PDFs are a great technology. When Adobe created the PDF format, they had a
vision for a portable document format (hence the name) that could
be viewed on any computer and printed to any printer. Unlike web
pages, PDFs will look exactly the same on every device, thanks to
the rigorous PDF specification. And the best thing about PDFs is
that the specification is open so you can generate them on the fly,
using readily available open source libraries.
There is one big problem with PDFs, however: the spec is
complicated and the APIs for generating PDFs tend to be cumbersome,
requiring a lot of low-level coding of paragraphs and headers. More
importantly, you have to use code to generate PDFs. But to
make good-looking PDFs, you need a graphic designer to create the
layout. Even if graphic designers are up to the task of
programming, they still must convert their layout from some other
format to code, which can be cumbersome, buggy, and time-consuming.
Fortunately, there is a better way.
The way to make good-looking PDFs is to let the programmers do
what they are good at: writing code that manipulates data, and let
the graphic designers do what they are good at: making attractive
graphic designs. Flying Saucer and iText are tools that do this.
They let you render CSS stylesheets and XHTML, either static or
generated, directly to PDFs.
An Introduction to Flying Saucer and iText
Flying Saucer, which is the common name for the xhtmlrenderer project on
java.net, is an LGPLed Java library originally
created by me and continually developed by the java.net community.
Download it from the project page, or use the copy included with
this article's sample code (see Resources). Flying Saucer's primary purpose is to
render spec-compliant XHTML and CSS 2.1 to the screen as a Swing
component. Though it was originally intended for embedding markup
into desktop applications (things like the iTunes Music Store),
Flying Saucer has been extended to work with iText as well. This makes
it very easy to render XHTML to PDFs, as well as to images and to the
screen. Flying Saucer requires Java 1.4 or higher.
iText is a PDF generation library created by Bruno
Lowagie and Paulo Soares, licensed under the LGPL and the Mozilla
Public License. You can download iText from its home page or use the copy
in the download bundle at the end of this article (see Resources). Using the iText API, you can produce
paragraphs, headers, or any other PDF feature. Since the PDF
imaging model is fairly similar to Java2D's model, Flying Saucer
and iText can easily work together to produce PDFs. In fact, the
PDF version of the Flying
Saucer user manual was itself produced using Flying Saucer and
iText.
Generating a Simple PDF
To get started, I'm going to show you how to render a very
simple HTML document as a PDF file. You can see in the
samples/firstdoc.xhtml file below that it's a plain XHTML
document (note the XHTML DTD in the header) and contains only a
single formatting rule: b { color: green; }. This
means the default HTML formatting for paragraphs and text will
apply, with the exception that all b elements will be
green.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My First Document</title>
<style type="text/css"> b { color: green; } </style>
</head>
<body>
<p>
<b>Greetings Earthlings!</b>
We've come for your Java.
</p>
</body>
</html>
Now that we have a document, we need code to produce the PDF. The
FirstDoc.java file below is the simplest possible way to
render a PDF document.
package flyingsaucerpdf;

import java.io.*;
import com.lowagie.text.DocumentException;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class FirstDoc {
    public static void main(String[] args)
            throws IOException, DocumentException {
        String inputFile = "samples/firstdoc.xhtml";
        String url = new File(inputFile).toURI().toURL().toString();
        String outputFile = "firstdoc.pdf";
        OutputStream os = new FileOutputStream(outputFile);

        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocument(url);
        renderer.layout();
        renderer.createPDF(os);

        os.close();
    }
}
There are two main parts to the code. First it prepares the
input and output files. Since Flying Saucer deals with input URLs,
the code above converts a local file string into a
file:// URL using the File class. The
output document is just a FileOutputStream that
writes to the firstdoc.pdf file in the current working
directory.
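To make the path-to-URL step concrete, here is a minimal sketch that isolates the same File, URI, URL chain used in FirstDoc.java. The class name FileUrlDemo and the helper toFileUrl are made up for illustration; only the JDK calls come from the listing above.

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;

class FileUrlDemo {
    // Converts a local path (relative or absolute) into a file: URL string,
    // using the same File -> URI -> URL chain as FirstDoc.java.
    static String toFileUrl(String path) {
        try {
            return new File(path).toURI().toURL().toString();
        } catch (MalformedURLException e) {
            // Wrapped so callers do not have to declare a checked exception.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Relative paths are resolved against the current working directory,
        // so the printed URL is absolute and ends in samples/firstdoc.xhtml.
        System.out.println(toFileUrl("samples/firstdoc.xhtml"));
    }
}
```

Because the path is resolved against the working directory, the program should be started from the directory that contains the samples folder.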
The second part of the code creates a new
ITextRenderer object. This is the Flying Saucer class
that knows how to render PDFs using iText. You must first set the
document property of the renderer using the
setDocument(String) method. There are other methods
for setting the document using URLs and W3C DOM objects. Once the
document is installed you must call layout() to
perform the actual layout of the document and then
createPDF() to draw the document into a PDF file on
disk.
To compile and run this code you need the Flying Saucer .jar,
core-renderer.jar. For this article I am using a recent
development build (R7 HEAD). R7 final should be out in
a few weeks, perhaps by the time you read this. I chose to use a recent
R7 build instead of the year-old R6 because R7 has a rewritten CSS
parser, better table support, and of course, many, many bugfixes.
You will also need the iText .jar itext_paulo-155.jar (this
is actually an early access copy of iText from its SourceForge project
page). All of these .jars are included in the standard Flying
Saucer R6 download, and also in the examples.zip file in
this article's Resources section. Once you put
these .jars in your classpath everything will compile and run. The
finished PDF looks like Figure 1:

Figure 1. Screenshot of firstdoc.pdf
Generating Content on the Fly
Producing a PDF from static documents is useful, but it would be
more interesting if you could generate the markup programmatically.
Then you could produce documents that contain more interesting
content than simple static text.
Below is the code for a simple program that generates the
lyrics to the song "99 Bottles of Beer on the Wall." This song has
a repeated structure, so we can easily produce the lyrics with a
simple loop. This document also uses some extra CSS styles like
color, text transformation, and modified padding.
In the first part of the OneHundredBottles.java code, all of
the style and markup is appended to a StringBuffer.
Note that the style rule for h3 includes the
text-transform: capitalize declaration. This will capitalize the
first letter of every word in the title. The body of the document
is produced by a loop that counts down from 99 to 1. Notice that there
is an image, 100bottles.jpg, included at the top of
the document. iText will embed the image in the resulting PDF,
meaning the user will not need to load any other images once they
receive the PDF. This is an advantage of PDFs over HTML, where
images must be stored separately.
import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class OneHundredBottles {
    public static void main(String[] args) throws Exception {
        StringBuffer buf = new StringBuffer();
        buf.append("<html>");

        // put in some style (the style element takes a type attribute)
        buf.append("<head><style type='text/css'>");
        buf.append("h3 { border: 1px solid #aaaaff; background: #ccccff; ");
        buf.append("padding: 1em; text-transform: capitalize; font-family: sans-serif; font-weight: normal;}");
        buf.append("p { margin: 1em 1em 4em 3em; } p:first-letter { color: red; font-size: 150%; }");
        buf.append("h2 { background: #5555ff; color: white; border: 10px solid black; padding: 3em; font-size: 200%; }");
        buf.append("</style></head>");

        // generate the body
        buf.append("<body>");
        buf.append("<p><img src='100bottles.jpg'/></p>");
        // one heading and one paragraph per verse, counting down from 99 to 1
        for(int i=99; i>0; i--) {
            buf.append("<h3>"+i+" bottles of beer on the wall, "
                    + i + " bottles of beer!</h3>");
            buf.append("<p>Take one down and pass it around, "
                    + (i-1) + " bottles of beer on the wall</p>\n");
        }
        buf.append("<h2>No more bottles of beer on the wall, no more bottles of beer. ");
        buf.append("Go to the store and buy some more, 99 bottles of beer on the wall.</h2>");
        buf.append("</body>");
        buf.append("</html>");
The second part of the code parses the
StringBuffer into a DOM document using the standard
Java XML APIs and then sets that as the document on the
ITextRenderer object. The renderer needs a base
URL to load resources like images and external CSS files. If
you pass a URL for the document to the renderer, then it will infer
the base URL. For example the document URL
http://myserver.com/pdf/mydoc.xhtml would result in a base
URL of http://myserver.com/pdf/. However, if you pass in a
pre-parsed Document object instead of a URL, then the
renderer will have no idea what the base URL is. You can manually
set the base URL using the second argument to the
setDocument() method. In this case I have used a value
of null, since I am not referencing any external
resources.
        // parse the markup into an XML Document
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // StringBufferInputStream is deprecated and drops the high byte of each
        // character, so hand the parser UTF-8 bytes instead
        Document doc = builder.parse(new ByteArrayInputStream(buf.toString().getBytes("UTF-8")));

        ITextRenderer renderer = new ITextRenderer();
        // second argument is the base URL; none is supplied here
        renderer.setDocument(doc, null);

        String outputFile = "100bottles.pdf";
        OutputStream os = new FileOutputStream(outputFile);
        renderer.layout();
        renderer.createPDF(os);
        os.close();
    }
}
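The base URL rule described above can be illustrated with plain java.net.URI. This is only a sketch of the resolution rule, not Flying Saucer's internal code; the class name BaseUrlDemo and its two helpers are hypothetical. A string produced this way could be passed as the second argument to setDocument() when the markup references images or external stylesheets.

```java
import java.net.URI;

class BaseUrlDemo {
    // Drops the last path segment, leaving the "directory" of the document.
    static URI baseOf(String documentUrl) {
        return URI.create(documentUrl).resolve(".");
    }

    // Resolves a relative reference (an image or stylesheet) against the base.
    static URI resolve(URI base, String relative) {
        return base.resolve(relative);
    }

    public static void main(String[] args) {
        URI base = baseOf("http://myserver.com/pdf/mydoc.xhtml");
        System.out.println(base);                            // http://myserver.com/pdf/
        System.out.println(resolve(base, "100bottles.jpg")); // http://myserver.com/pdf/100bottles.jpg
    }
}
```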
The final document looks like Figure 2:

Figure 2. Screenshot of 100bottles.pdf
Page-Specific Features
So far the documents we have rendered are basically just web
pages in PDF form. They don't have any features that take
advantage of pages. Paged media like printed documents or
slideshows have certain features specific to pages. In particular,
pages have specific sizes and margins. Text laid out for an 8 1/2
by 11 inch piece of paper will look very different than text for a
paperback book, or a CD cover. In short, pages matter, and Flying
Saucer gives you some control over pages using page-specific
features in CSS.
This next example will print the first chapter of Lewis
Carroll's Alice in Wonderland in a paperback format. The
markup is pretty straightforward, just a bunch of headers and
paragraphs. Below are the first few paragraphs of the document (see
the download for the entire chapter). There are two things to
notice in this document. First, all of the style is included in the
alice.css file linked in the header with a link
element. The media="print" attribute must be included,
or the style will not be loaded. The other important thing to
notice is the pair of divs at the top: header
and footer. The footer has two special elements in it,
pagenumber and pagecount, which are used
to generate the page numbers. These divs and the page
number elements will not be rendered at the top of the page.
Instead, we will use some special CSS to make these
divs repeat on every page and generate the proper page
numbers at runtime.
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Alice's Adventures in Wonderland -- Chapter I</title>
<link rel="stylesheet" type="text/css" href="alice.css" media="print"/>
</head>
<body>
<div id="header" style="">Alice's Adventures in Wonderland</div>
<div id="footer" style=""> Page <span id="pagenumber"/> of <span id="pagecount"/> </div>
<h1>CHAPTER I</h1>
<h2>Down the Rabbit-Hole</h2>
<p class="dropcap-holder">
<div class="dropcap">A</div>
lice was beginning to get very tired of sitting by her sister
on the bank, and of having nothing to do: once or twice she had
peeped into the book her sister was reading, but it had no pictures
or conversations in it, `and what is the use of a book,' thought
Alice `without pictures or conversation?'
</p>
<p>So she was considering in her own mind (as well as she could,
for the hot day made her feel very sleepy and stupid), whether the
pleasure of making a daisy-chain would be worth the trouble of
getting up and picking the daisies, when suddenly a White Rabbit
with pink eyes ran close by her. </p>
<p class="figure">
<img src="alice2.gif" width="200px" height="300px"/>
<br/>
<b>White Rabbit checking watch</b>
</p>
... the rest of the chapter
Most of the alice.css file contains normal CSS rules
that can apply to any kind of XHTML document, printed or not. There
are a few, however, that are page-specific extensions:
@page {
size: 4.18in 6.88in;
margin: 0.25in;
-fs-flow-top: "header";
-fs-flow-bottom: "footer";
-fs-flow-left: "left";
-fs-flow-right: "right";
border: thin solid black;
padding: 1em;
}
#header {
font-weight: bold; font-family: serif;
position: absolute; top: 0; left: 0;
-fs-move-to-flow: "header";
}
#footer {
font-size: 90%; font-style: italic;
position: absolute; top: 0; left: 0;
-fs-move-to-flow: "footer";
}
#pagenumber:before {
content: counter(page);
}
#pagecount:before {
content: counter(pages);
}
The first thing you'll notice in the CSS above is the
@page rule. This is a rule that is attached to the
page itself rather than to any particular elements within the
document. Within this @page rule, you can set the size
of the page as well as page margins using the size and
margin properties. Note that I have set the size to
4.18in 6.88in, which is the size of a standard mass-market paperback book in the U.S. (according to CafePress). Also in the
@page rule are four special properties beginning with
-fs-flow-. These are Flying Saucer-specific properties
that tell the renderer to move content marked with the specified
names: header, footer, left,
and right to every page in the top, bottom, left, and
right positions.
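As a quick check on the numbers in the @page rule, the sketch below converts the page size to points (72 points per inch) and computes the space left between the margins. The class and method names are made up for illustration, and the calculation deliberately ignores the border and the 1em padding declared in the same rule, which reduce the usable area a little further.

```java
class PageBoxDemo {
    static final double POINTS_PER_INCH = 72.0;

    // Converts a length in inches to points.
    static double inchesToPoints(double inches) {
        return inches * POINTS_PER_INCH;
    }

    // Space left on one axis after removing a margin from both sides.
    static double insideMargins(double pageInches, double marginInches) {
        return pageInches - 2 * marginInches;
    }

    public static void main(String[] args) {
        double width = 4.18, height = 6.88, margin = 0.25; // values from the @page rule
        System.out.println("page width in points:  " + inchesToPoints(width));
        System.out.println("page height in points: " + inchesToPoints(height));
        System.out.println("width inside margins (in):  " + insideMargins(width, margin));
        System.out.println("height inside margins (in): " + insideMargins(height, margin));
    }
}
```

The page comes out at roughly 301 by 495 points, with about 3.68 by 6.38 inches between the margins.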
In the rules for the header and footer divs, you
can see another Flying Saucer-specific property called
-fs-move-to-flow, which will take the div
out of the normal document and put it in the special place marked
by "footer" or "header". This property
works in conjunction with the -fs-flow-* properties in
the @page rule to make repeated content work. These
custom properties are needed because CSS 2.1 does not define any
way to have repeated headers and footers. CSS 3 does
define a way to have repeated content, and Flying Saucer will
support the new standard mechanism in the future.
After the @page and header rules, you'll find two
more rules for the pagenumber and
pagecount elements. These are ordinary span elements with made-up ids
(the ids have no meaning in standard XHTML) that will have counters added to their content.
Since those two elements are empty, you will only see the counters
themselves. Since the pagenumber and
pagecount elements were defined in the footer, the
final page numbers will also appear in the footer. Again, these page
number elements will be replaced with their proper CSS 3
equivalents in the future.
The final rendered alice.xhtml is shown in Figure 3:


Figure 3. Screenshot of two pages of pagination.pdf
A quick note on debugging: CSS can be tricky sometimes, and it is
very easy to misspell a keyword or forget some punctuation. Flying
Saucer R7 has a brand new CSS parser with very robust error
reporting. When developing your application, I recommend turning on
the built-in logging. The in-depth details of Flying Saucer
configuration are available in the FAQ. I have found the most
useful setting is to set the logging level to INFO by
adding this to your Java command line:
-Dxr.util-logging.java.util.logging.ConsoleHandler.level=INFO
This setting will print lots of debugging information, including
places where the CSS or markup may be broken.
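If changing the command line is inconvenient, the same property can in principle be set from code. The sketch below assumes that Flying Saucer reads the property when its logging is first initialized, so the call would have to run before any Flying Saucer class is touched; that timing is an assumption, not something confirmed by the FAQ. The class name LoggingSetup is hypothetical.

```java
class LoggingSetup {
    // Same key as the -D flag above.
    static final String LEVEL_KEY =
            "xr.util-logging.java.util.logging.ConsoleHandler.level";

    // Equivalent to passing -D<key>=INFO on the command line, provided it
    // runs before the property is first read (an assumption, see above).
    static void enableInfoLogging() {
        System.setProperty(LEVEL_KEY, "INFO");
    }

    public static void main(String[] args) {
        enableInfoLogging();
        // ... create the ITextRenderer and render as usual after this point
        System.out.println(LEVEL_KEY + "=" + System.getProperty(LEVEL_KEY));
    }
}
```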
Rendering Generic XML Instead of XHTML
Every example so far has used XHTML, meaning the XHTML dialect
of XML defined by the W3C. Many documents rendered into PDF are in
fact XHTML documents, but Flying Saucer can actually handle any
well-formed XML file. In fact, Flying Saucer does very little that
is XHTML-specific. XHTML documents are just XML documents with a
default stylesheet. If you define your own stylesheet, then you can
render any XML document you want. This could be particularly useful
when working with the output of databases or web services, since
that output is probably in XML already.
Below is a very simple custom XML document, weather.xml,
that describes the weather at multiple locations. It does not use
standard XHTML elements at all; every element is custom. Notice the
second line contains a reference to the stylesheet.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href='weather.css' type='text/css'?>
<weather>
<station>
<location>Springfield, NT</location>
<description>Sunny</description>
<tempf>85</tempf>
</station>
<station>
<location>Arlen, TX</location>
<description>Super Sunny</description>
<tempf>99</tempf>
</station>
<station>
<location>South Park, CO</location>
<description>Snowing</description>
<tempf>18</tempf>
</station>
</weather>
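When the data comes from a database or web service rather than a file, a document like the one above could be built in memory with the standard DOM API. The sketch below is one way to do that; the class name WeatherDocBuilder and the String[][] input shape are made up for illustration. The resulting Document could then be handed to setDocument(doc, baseUrl), though whether the renderer picks up the stylesheet processing instruction from a pre-built DOM, and resolves weather.css against the base URL, is an assumption worth verifying.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class WeatherDocBuilder {
    // Each row is { location, description, tempf }.
    static Document build(String[][] stations) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // Stylesheet reference, equivalent to the second line of weather.xml.
        doc.appendChild(doc.createProcessingInstruction(
                "xml-stylesheet", "href='weather.css' type='text/css'"));
        Element weather = doc.createElement("weather");
        doc.appendChild(weather);
        for (String[] s : stations) {
            Element station = doc.createElement("station");
            station.appendChild(child(doc, "location", s[0]));
            station.appendChild(child(doc, "description", s[1]));
            station.appendChild(child(doc, "tempf", s[2]));
            weather.appendChild(station);
        }
        return doc;
    }

    // Creates <name>text</name>; DOM text nodes need no manual escaping.
    private static Element child(Document doc, String name, String text) {
        Element e = doc.createElement(name);
        e.setTextContent(text);
        return e;
    }

    public static void main(String[] args) {
        Document doc = build(new String[][] {
            { "Springfield, NT", "Sunny", "85" },
            { "Arlen, TX", "Super Sunny", "99" },
            { "South Park, CO", "Snowing", "18" },
        });
        System.out.println(doc.getElementsByTagName("station").getLength() + " stations");
    }
}
```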
Here is the DirectXML.java code that renders the
document. Notice that the code does nothing special. As far as
Flying Saucer is concerned, the only difference between XHTML and
XML is the file extension.
public class DirectXML {
public static void main(String[] args) throws IOException, DocumentException {
String inputFile = "samples/weather.xml";
String outputFile = "weather.pdf";
OutputStream os = new FileOutputStream(outputFile);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(new File(inputFile));
renderer.layout();
renderer.createPDF(os);
os.close();
}
}
Here's the weather.css CSS that will style the XML.
* { display: block; margin: 0; padding: 0; border: 0;}
station {
clear: both;
width: 3in; height: 3in;
padding: 0.5em; margin: 1em;
border: 3px solid black; background-color: green;
font-size: 30pt;
page-break-inside: avoid;
}
tempf {
border: 1px solid white;
background-color: blue; color: white;
width: 1.5in; height: 1.5in;
margin: 5pt;
padding: 8pt;
font: bold 300% sans-serif;
}
location { color: white; }
description { color: yellow; }
The CSS stylesheet contains all of the magic in this example.
Since this is all XML, there are no default rules to show how any
element is drawn. That's why the first rule is a *
rule to affect all elements: they should all be blocks with no
border, margins, or padding. Then I have defined a rule for each of
the four content elements. The elements take the standard CSS
properties that you could apply to HTML elements. Note that the
station element has a page-break-inside:
avoid property. This is a CSS 2.1 paged-media property that tells the
renderer that you don't want the station element split by a page
break. This is useful when you have content sections that must be printed
whole. For example you might be printing to label paper for
stickers on a map display. In that case, you definitely would
not want any boxes to be split across pages.
Note that I've set the size of the station block using inches.
When coding for the Web you usually want to avoid absolute units
like inches, centimeters, or points. Instead, you should use
relative units like ems or percentages, since these work well when a
user resizes the document or changes their font size. But then
again, PDFs aren't for the Web. They are paged media for
printing. That means absolute units are perfectly fine, and in fact
encouraged, since their use ensures the user will get a document
that looks exactly like you wanted.
The final document looks like Figure 4:

Figure 4. Screenshot of weather.pdf
Generating PDFs in a Server-Side Application
All of the examples in this article have been small command-line
programs that write PDF files. However, you can easily use this
technology to produce PDFs in a web application using a servlet.
The only difference is that you will be writing to a
ServletOutputStream instead of a
FileOutputStream. Below is a portion of the code for a
PDF generation servlet that produces a tabular report of sales for
a particular user:
public class PDFServlet extends HttpServlet {
    protected void processRequest(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // tell the browser that a PDF is coming
        response.setContentType("application/pdf");

        StringBuffer buf = new StringBuffer();
        buf.append("<html>");

        // absolute file system path of the stylesheet inside the webapp
        String css = getServletContext().getRealPath("/PDFservlet.css");
        // put in some style
        buf.append("<head><link rel='stylesheet' type='text/css' "+
                "href='"+css+"' media='print'/></head>");
        ... //generate the rest of the HTML

        // parse our markup into an xml Document
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // UTF-8 bytes instead of the deprecated StringBufferInputStream
            Document doc = builder.parse(new ByteArrayInputStream(buf.toString().getBytes("UTF-8")));
            ITextRenderer renderer = new ITextRenderer();
            renderer.setDocument(doc, null);
            renderer.layout();
            OutputStream os = response.getOutputStream();
            renderer.createPDF(os);
            os.close();
        } catch (Exception ex) {
            // report the failure to the container instead of swallowing it
            throw new ServletException(ex);
        }
    }
The code above looks pretty much like the previous examples.
There are two special things to notice, though. First, you must set
the content type to application/pdf. This will make
the user's web browser pass the PDF on to their PDF reader or
plugin instead of showing it as garbled text. Second, the CSS is
stored in a separate file in the main webapp directory (where the
JSPs and HTML would go). In order for Flying Saucer to find it, you
must use the getServletContext().getRealPath() method
to convert /PDFservlet.css into an absolute file system path and put
it in the link tag at the top of the generated markup. Once you
have your HTML properly generated, you can just parse it into a
Document and render the PDF to the output stream
returned by response.getOutputStream().
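Because the generated markup is parsed as XML before rendering, any dynamic text (a customer name, a product description) has to be escaped, or a stray ampersand will make the parse fail. The sketch below shows one way the table rows of a report might be produced; the class SalesTableDemo, its helpers, and the sample data are all hypothetical and are not taken from the servlet's elided code.

```java
class SalesTableDemo {
    // Escapes the five characters that are special in XML text and attributes.
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Builds one table row with both cells escaped.
    static String row(String item, String amount) {
        return "<tr><td>" + escapeXml(item) + "</td><td>" + escapeXml(amount) + "</td></tr>";
    }

    public static void main(String[] args) {
        // prints <tr><td>Nuts &amp; Bolts &lt;large&gt;</td><td>12.50</td></tr>
        System.out.println(row("Nuts & Bolts <large>", "12.50"));
    }
}
```

Rows built this way can be appended to the same StringBuffer that holds the rest of the markup.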
The final document looks like Figure 5:

Figure 5. Screenshot of servlet.pdf
Conclusion
PDFs are a great format for maps, receipts, reports, and
printable labels. Flying Saucer and iText let you produce PDF files
programmatically without having to use expensive tools or
cumbersome APIs. By using plain XHTML and CSS, your graphic
designer can use their existing web tools like Dreamweaver to
produce great-looking CSS templates that you or your developers
plug into your applications. By splitting the work, you can save
both time and money.
If you use Flying Saucer to produce PDFs for your company or
project, please post a link in the comments of this article or email
me. The Flying Saucer team would love to have more examples of cool
things people are doing with Flying Saucer and iText.
Resources
Joshua Marinacci first tried Java in 1995 at the request of his favorite TA and has never looked back.