Skip to Content

Creating PDF from code with FOP

View this article in Belorussian provided by fatcow. [ Translated by Uta Bayer ]

I recently had a project that needed to do a whole lot of number crunching, and then create a PDF to present those numbers and a line chart as well. This presented two problems for me - how to create a PDF from code, and how to create a chart for use within that PDF. I can say that I have had some success and wish to share with you the process. I'll talk about the PDF side of things in this article, and post another soon with details on how I handled the chart. But an overview of the chart solution is included below.

First, some more background. The project also needed a separate web interface to search the crunched data with date ranges, or specific locations. We decided early that we would utilize PHP for this as it would be real quick to do so, as well as implementing an infrastructure that can grow. The one main sore point here is that we needed PHP on a LAMP server to talk to a dedicated Windows MSSQL 2005 server running on a Windows 2003 box. After a bunch of digging it turns out that PHP on a Ubuntu server needs the php5-sybase package installed, and then using the MSSQL functions work fine. I found a page that gave me a great sample of the needed code to start from.

Seeing as we were using PHP for the search portion of the project, we opted to use it to help generate the PDF as well. Now, PHP can generate PDFs directly but the process is rather monotonous. The libraries indicate that you need to create and position everything manually, as well as taking care of page breaks manually. Instead, the local Linux Users Group came through (again) and offered an alternative. Neil Mayhew, with the Programming Special Interest Group (PROGSIG), offered a suggestion to us FOP. FOP is an Apache project that makes PDF generation very simple. It is a Java application, so you do need a Java environment, but it can be called as a command line statement. The only real issue here is that FOP makes use of XML and XSL-FO. We were already intending on putting the data into XML for easy translation via XSLT. So we only needed to go a step further and create the XSL-FO statements (which is the superset of XSLT).

The second hurdle was the chart. We initially opted to use Google Charts because it was quick and easy. However, we quickly ran into the limitations of a generic and free 3rd party solution. Sure some of the issues could be addressed, but taking into account the amount of data we would be graphing, and the URL size limits, we were going to have a problem. In addition there was the ethical/responsible question about passing our internal data through a 3rd party's servers just for a simple graph. So, again we looked into alternatives. The solution turned out to be SVG. Using the XSL-FO we could create and embedd SVG code. SVG is itself an XML based format, so this was easy. We fired up Inkscape and created a dummy line chart to use as a base for our dynamic chart.

With a plan in place to cover the sticking points, we started coding. Besides the (shallow) learning curve for XSL-FO and SVG, we were impressed with how quickly things fell together. We spent more time working out how to get our data from the database than we did building the report.

For those who want to see this in action, here's a simple How-To.

1. Build your XML.

For this how-to we'll use the following:

  <?xml version="1.0" encoding="ISO-8859-1"?>
  <mydata>
    <record>
      <date>1 May 2008</date>
      <earnings>100.00</earnings>
    </record>
    <record>
      <date>1 May 2008</date>
      <earnings>150.00</earnings>
    </record>
    <record>
      <date>1 May 2008</date>
      <earnings>80.00</earnings>
    </record>
    <record>
      <date>1 May 2008</date>
      <earnings>200.00</earnings>
    </record>
  </mydata>

You would build this with your own data and structure, of course. The PHP DOM object is useful here.

One tip. It is MUCH easier to have your code do any number crunching needed, than it is to have XSLT do it. And much easier to understand too. If you need to total any columns, or determine something dynamically that affects the formatting - have your CODE do it and insert XML elements to represent this data. Keep the XSLT stuff to simple layout chores. I'm sure the XML/XSLT zealots out there will tell me this isn't needed. But I find that I am more productive if I do the hard stuff in code first.

2. Build your XSL-FO document.

If you have ever done an XSLT file, then this is relatively easy. Otherwise, it can be rather cryptic and take a bit of getting used to. A good source to start can be found at W3Schools. In the end you'll have a document something like this:

<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet version="1.0" 
        xmlns:fo="http://www.w3.org/1999/XSL/Format" 
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        
  <xsl:output method="xml" version="1.0" indent="yes" encoding="UTF-8"/>
  
  <xsl:template match="mydata">
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    
      <fo:layout-master-set>
        <fo:simple-page-master  master-name="summarypage"
                              page-height="11in"
                              page-width="8.5in"
                              margin-top="1in"
                              margin-bottom="1in"
                              margin-left="1in"
                              margin-right="1in"
                              >
          <fo:region-body />
        </fo:simple-page-master>
      </fo:layout-master-set>
    
      <fo:page-sequence master-reference="summarypage">
        <fo:flow flow-name="xsl-region-body">
        
          <!-- Report Header -->
          <fo:block text-align="center"
                    font-size="18pt"
                    font-weight="bold">My Custom PDF Report</fo:block>
                    
          <fo:block text-align="center"
                    font-size="14pt"
                    font-weight="bold"
                    padding-after="0.25in">Showing Some data</fo:block>
          
          <!-- Current Numbers -->
          <fo:block text-align="center"
                    font-size="8pt">
            
            <fo:table table-layout="fixed" width="100%">
              <fo:table-body>
                <fo:table-row background-color="#000"
                              border-left-style="solid"
                              border-right-style="solid"
                              border-top-style="solid">
                  <fo:table-cell >
                   <fo:block color="#fff" 
                                text-align="center" 
                                padding-top="0.5em" 
                                padding-bottom="0.5em">
                                  Date
                   </fo:block>
                   </fo:table-cell>
                  <fo:table-cell >
                    <fo:block color="#fff" 
                                 text-align="center" 
                                 padding-top="0.5em"
                                 padding-bottom="0.5em">
                                   Earnings
                    </fo:block>
                  </fo:table-cell>
                </fo:table-row>   
                <xsl:for-each select="record">
                  <fo:table-row border-left-style="solid"
                                    border-right-style="solid"
                                    border-bottom-style="solid">
                    <fo:table-cell >
                      <fo:block text-align="center" 
                                   padding-top="0.5em">
                        <xsl:value-of select="date"/>
                      </fo:block>
                    </fo:table-cell>
                    <fo:table-cell >
                      <fo:block text-align="center" 
                                   padding-top="0.5em">
                        <xsl:value-of select="earnings"/>
                      </fo:block>
                    </fo:table-cell>
                  </fo:table-row>   
                </xsl:for-each>
              
              </fo:table-body>
            </fo:table>
          </fo:block>
        
        </fo:flow>
      </fo:page-sequence>
    </fo:root> 
  </xsl:template>
</xsl:stylesheet>

This looks at lot more complex than it really is. We'll break it down.

Just like any other XML document, we need a preprocessor directive saying so:

<?xml version="1.0" encoding="ISO-8859-1"?>

Then we set up a standard XSLT stylesheet document.

<xsl:stylesheet version="1.0" 
        xmlns:fo="http://www.w3.org/1999/XSL/Format" 
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

The difference here is that we are including the fo namespace as well as xsl.

Next we tell the processor we want an XML document output from this

<xsl:output method="xml" version="1.0" indent="yes" encoding="UTF-8"/>

And then we set up the main XSL template to process:

<xsl:template match="mydata">

Notice that we are matching the root element out of the XML file.

Now we start the XSL-FO section:

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

XSL-FO can define multiple layouts. For instance you may want most of your pages to use US Letter format in Portrait orientation, but have some pages use Landscape orientation. You define these layouts first, then reference them later. In our sample, we are defining a single US Letter page with portrait orientation:

<fo:layout-master-set>
        <fo:simple-page-master  master-name="summarypage"
                              page-height="11in"
                              page-width="8.5in"
                              margin-top="1in"
                              margin-bottom="1in"
                              margin-left="1in"
                              margin-right="1in"
                              >
          <fo:region-body />
        </fo:simple-page-master>
      </fo:layout-master-set>

You would add a now simple-page-master entry for each page format you want. Give it a "master-name" attribute - the value is arbitrary, but should be unique from the other simple-page-master values.

Now we create an instance using the defined page format:

<fo:page-sequence master-reference="summarypage">
        <fo:flow flow-name="xsl-region-body">

The page-sequence uses the master-reference attribute to indicate which of our page layouts should be used.

Each page-sequence can have one flow sections. Within this you can create a number of blocks which define where you will put your text, images, etc.

<!-- Report Header -->
          <fo:block text-align="center"
                    font-size="18pt"
                    font-weight="bold">My Custom PDF Report</fo:block>
                    
          <fo:block text-align="center"
                    font-size="14pt"
                    font-weight="bold"
                    padding-after="0.25in">Showing Some data</fo:block>

Here we are defining two blocks to hold our page header. We also put our header text in here. As you can see there are a number of properties/attributes for a block that define how the contents are handled - centered, font size, padding, etc. A block is very much like a div in HTML, at least in concept.

We are getting close to putting our custom data in place. We've decided to use a table for this, so the next block defines a table with a header row:

<!-- Current Numbers -->
          <fo:block text-align="center"
                    font-size="8pt">
            
            <fo:table table-layout="fixed" width="100%">
              <fo:table-body>
                <fo:table-row background-color="#000"
                              border-left-style="solid"
                              border-right-style="solid"
                              border-top-style="solid">
                  <fo:table-cell >
                    <fo:block color="#fff" 
                                 text-align="center" 
                                 padding-top="0.5em"
                                 padding-bottom="0.5em">
                                   Date
                    </fo:block>
                  </fo:table-cell>
                  <fo:table-cell >
                    <fo:block color="#fff" 
                                 text-align="center" 
                                 padding-top="0.5em" 
                                 padding-bottom="0.5em">
                                   Earnings
                    </fo:block>
                  </fo:table-cell>
                </fo:table-row>   

Tables are always begun with a table and your content goes into the table-body. Within XSL-FO, tables are referred to as "tables and captions", so there are other sections that can fit in a table that we are not using here. The remainder sets up a row, and two table cells. We've applied a little formatting to make our header role distinct from the data rows.

And now we can insert our data:

 <xsl:for-each select="record">
           <fo:table-row border-left-style="solid"
                            border-right-style="solid"
                            border-bottom-style="solid">
             <fo:table-cell 
               ><fo:block text-align="center" 
                                padding-top="0.5em">
                 <xsl:value-of select="date"/>
               </fo:block>
             </fo:table-cell>
             <fo:table-cell >
               <fo:block text-align="center" 
                            padding-top="0.5em">
                 <xsl:value-of select="earnings"/>
               </fo:block>
            </fo:table-cell>
          </fo:table-row>   
        </xsl:for-each>

We are using normal XSL transformations to say that for every record element in our XML file, we want to add a table row, that contains two cells. These two cells contain the values of the date and earnings elements for the record.

And all that is left now is to close off all our elements:

 </fo:table-body>
            </fo:table>
          </fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root> 
  </xsl:template>
</xsl:stylesheet>

3. Set up FOP

Make sure you have Java installed ( "sudo apt-get install sun-java6-jre" on Ubuntu).

Update: FOP is now available in the Ubuntu repositories. So "sudo apt-get install fop" is all you need to do. If this applies to you, skip to step #4 below after installing FOP.

Next you need to download a binary (unless you want to compile from source) for FOP. Grab the .tar.gz file for the latest version one of the mirrors. Currently it is at version 0.94. (we'd recommend the fop-0.94-bin-jdk1.4.tar.gz file)

Once you have that downloaded, follow the instructions found in the Ubuntu Forums to set it up. We set it up for global use, so that our scripts can just call the "fop" command wherever they are.

4. Do a test run

As root, enter the following command:

fop -xml test.xml -xsl test.xsl -pdf test.pdf

Replace the test.xml, test.xsl, and test.pdf to match your files. The test.xml is our sample XML. The test.xsl is the XSL-FO file we described above. And test.pdf is the output file.

If all goes well, you will be returned to the command prompt with no messages. If it doesn't go so well, you'll see one of three types of output. The first type is where you've entered the command wrong or an input file doesn't exist - you'll be given the usage dump for fop. The second type is informational - you get some messages that tell you what is happening, or warnings of content overflowing their blocks. The third type of output is where you have an error in your XML or XSL-FO. Read the messages carefully - they tell you what is wrong, but it can be hard to find until you get used to it. I've found the error messages are straight forward and helpful.

You should end up with a PDF file that looks something like this image:

Or compare to the generated PDF.

For those who are curious, that image was generated with FOP directly:

fop -xml test.xml -xsl test.xsl -png test.png

All that changed was -pdf argument was replaced with -png. (I'm really starting to like this tool!)

5. Automate it

For one off, testing type jobs, we don't need to go any further. But to really make use of this, we'll need to apply data dynamically and generate the PDF's on the fly. Using PHP, this is easily done. Our algorithm could be:

  1. Process Data
  2. Create XML
  3. Create PDF

We'll assume you know how to process the data and create the xml files, as this is highly dependent on your own needs. We'll also assume you have hand crafted an XSL-FO file, and/or created one dynamically in the "Create XML" phase. The last step is what we are interested in.

In my case I needed to create a scheduled task to generate the PDF. So I chose to do this with a command line PHP script that was called via a CRON job.

We need to execute a shell command to create our PDF. This looks like

  $cmd = "fop";
  $cmd .= " -xml " . $xmlfile;    //the xml file to apply the template to
  $cmd .= " -xsl " . $xslfoFile;  //the xsl-fo file to do the transformations
  $cmd .= " -pdf " . $pdffile;    //the pdf file to create
  
  shell_exec($cmd);

With the shell_exec being the magic line.

Now, obviously running a shell command from within code can be dangerous. There are a bunch of gotcha's surrounding the shell command execution, and then regular file permissions come into play. If the user your script is running as does not have permission to read the input files, or write the output file, you can fully expect the command to fail. Your PHP environment needs to be set up to handle shell commands as well. If you are trying to do this on a web host, you are probably out of luck - after all you probably weren't able to instal FOP either. If you fall in this later group, check out the PHP PDF support.

Now hopfully you have your PDF being generated dynamically.

Conclusion

I am finding the FOP solution to be very powerful, and creates professional looking reports. I've only scratched the surface of what it can do, but can already tell it's become a valuable item in my coding toolbox of tricks.

If you find this article useful, let me know. Better yet, send me a link to one of your PDFs to post.