Wednesday, November 28, 2007

Reducing memory usage of FOP

Here is a straightforward example of using FOP to generate PDF reports:

......
<fo:page-sequence master-reference="simpleA4">
......
  <fo:flow flow-name="xsl-region-body">
    <xsl:apply-templates select="the-bean" />
  </fo:flow>
......
</fo:page-sequence>
......


The original data is in the form of Java beans: a container holds multiple objects of type "TheBean". Castor converts the data into XML. Notice that the XSL file simply calls "apply-templates" on the beans. The XSLT processor then applies the matching template to every "the-bean" element, so all the beans are rendered into the report without an explicit loop.

But this straightforward approach has a problem. FOP writes output and frees up resources on a per-page-sequence basis. In the above example, the "apply-templates" tag sits inside a single page sequence, so FOP has to process all of the data before it writes the output PDF. In one particular test, the report was generated from 11MB of XML data and produced a PDF of over 400 pages. FOP used up 1.3GB of memory to do it.

To get around the problem, we can use a separate page sequence for each object:

......
<xsl:for-each select="the-bean">
  <fo:page-sequence master-reference="simpleA4">
......
    <fo:flow flow-name="xsl-region-body">
      <fo:block>
        <xsl:apply-templates select="." />
      </fo:block>
    </fo:flow>
......
  </fo:page-sequence>
</xsl:for-each>
......


Here, the for-each loop iterates over the beans one by one and puts each of them in its own page sequence. With this approach, the same 11MB XML file is processed using less than 48MB of memory.
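The difference between the two layouts can be checked without running FOP. The sketch below assumes an input shape and applies both stylesheets using the JDK's built-in XSLT processor (javax.xml.transform). It then counts the fo:page-sequence elements in the generated XSL-FO. The following are all hypothetical stand-ins:

* the input structure (a <container> element whose <the-bean> children each hold a <name>),
* the page dimensions,
* the class and method names.

The sketch does not measure memory. It only shows the structural point: the first layout puts every bean into one page sequence, while the for-each layout creates one page sequence per bean.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PageSequenceDemo {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Shared start of both stylesheets: a template that renders one bean
    // (hypothetical <name> child), then fo:root with an A4 master "simpleA4".
    static final String PROLOGUE =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='" + FO_NS + "'>"
        + "<xsl:template match='the-bean'>"
        + "<fo:block><xsl:value-of select='name'/></fo:block>"
        + "</xsl:template>"
        + "<xsl:template match='/container'><fo:root>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name='simpleA4'"
        + " page-height='29.7cm' page-width='21cm'>"
        + "<fo:region-body/></fo:simple-page-master>"
        + "</fo:layout-master-set>";

    static final String EPILOGUE = "</fo:root></xsl:template></xsl:stylesheet>";

    // Original layout: apply-templates inside one page sequence.
    static final String SINGLE_SEQUENCE_XSL = PROLOGUE
        + "<fo:page-sequence master-reference='simpleA4'>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<xsl:apply-templates select='the-bean'/>"
        + "</fo:flow></fo:page-sequence>"
        + EPILOGUE;

    // Revised layout: for-each creates one page sequence per bean.
    static final String PER_BEAN_XSL = PROLOGUE
        + "<xsl:for-each select='the-bean'>"
        + "<fo:page-sequence master-reference='simpleA4'>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<fo:block><xsl:apply-templates select='.'/></fo:block>"
        + "</fo:flow></fo:page-sequence>"
        + "</xsl:for-each>"
        + EPILOGUE;

    // Hypothetical stand-in for the Castor output: <container> with N beans.
    static String sampleXml(int beans) {
        StringBuilder sb = new StringBuilder("<container>");
        for (int i = 1; i <= beans; i++) {
            sb.append("<the-bean><name>Bean ").append(i).append("</name></the-bean>");
        }
        return sb.append("</container>").toString();
    }

    // Transforms the sample data with the JDK's XSLT processor and counts
    // the fo:page-sequence elements in the generated XSL-FO.
    static int countPageSequences(String xsl, int beans) {
        try {
            StringWriter fo = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(sampleXml(beans))),
                           new StreamResult(fo));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(fo.toString())));
            return doc.getElementsByTagNameNS(FO_NS, "page-sequence").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("single sequence: " + countPageSequences(SINGLE_SEQUENCE_XSL, 3));
        System.out.println("per bean:        " + countPageSequences(PER_BEAN_XSL, 3));
    }
}
```

Running main prints 1 for the single-sequence layout and 3 for the per-bean layout.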

If each object needs its own page numbering (a "Page X of Y" count that restarts for every bean), we can use a variable to give each object's "last page" block a different id:

......
<xsl:for-each select="the-bean">
  <fo:page-sequence master-reference="simpleA4"
    initial-page-number="1"
    force-page-count="no-force">

    <xsl:variable name="count">
      <xsl:value-of select="position()"/>
    </xsl:variable>
......
      Page <fo:page-number/> of
      <fo:page-number-citation
      ref-id='last-page{$count}'/>
......
    <fo:flow flow-name="xsl-region-body">
      <fo:block>
        <xsl:apply-templates select="." />
      </fo:block>
      <fo:block id="last-page{$count}"/>
    </fo:flow>
......
  </fo:page-sequence>
</xsl:for-each>
......


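Note the use of force-page-count="no-force". Every page sequence restarts at page 1, which is an odd number. With the default value "auto", this forces the previous sequence to end on an even page, so FOP inserts a blank page after any sequence that ends on an odd page. Setting "no-force" prevents these blank pages between page sequences.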
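The id-naming scheme can also be checked without FOP. The sketch below uses only the JDK's javax.xml.transform and DOM APIs. It runs a version of the numbered stylesheet and confirms two things:

* each page sequence's fo:page-number-citation points at an id that exists inside that same sequence;
* the ids differ from bean to bean.

The following are assumptions, not the post's actual setup:

* the input structure (<container>, <the-bean>, <name>);
* the page layout, including the header region (the post elides where the "Page ... of ..." text lives, so here it is assumed to be an fo:static-content for xsl-region-before);
* the class and method names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LastPageIdDemo {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Per-bean page sequences with "Page X of Y" in an assumed header region.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='" + FO_NS + "'>"
        + "<xsl:template match='/container'><fo:root>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name='simpleA4'"
        + " page-height='29.7cm' page-width='21cm'>"
        + "<fo:region-body margin-top='1.5cm'/><fo:region-before extent='1cm'/>"
        + "</fo:simple-page-master></fo:layout-master-set>"
        + "<xsl:for-each select='the-bean'>"
        + "<fo:page-sequence master-reference='simpleA4'"
        + " initial-page-number='1' force-page-count='no-force'>"
        + "<xsl:variable name='count'><xsl:value-of select='position()'/></xsl:variable>"
        + "<fo:static-content flow-name='xsl-region-before'><fo:block>"
        + "Page <fo:page-number/> of <fo:page-number-citation ref-id='last-page{$count}'/>"
        + "</fo:block></fo:static-content>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<fo:block><xsl:value-of select='name'/></fo:block>"
        + "<fo:block id='last-page{$count}'/>"
        + "</fo:flow></fo:page-sequence>"
        + "</xsl:for-each>"
        + "</fo:root></xsl:template></xsl:stylesheet>";

    // Hypothetical stand-in for the Castor output: <container> with N beans.
    static String sampleXml(int beans) {
        StringBuilder sb = new StringBuilder("<container>");
        for (int i = 1; i <= beans; i++) {
            sb.append("<the-bean><name>Bean ").append(i).append("</name></the-bean>");
        }
        return sb.append("</container>").toString();
    }

    // Runs the stylesheet and parses the resulting XSL-FO (namespace-aware).
    static Document toFo(int beans) {
        try {
            StringWriter fo = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(sampleXml(beans))),
                           new StreamResult(fo));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(fo.toString())));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // For each page sequence, returns the id its page-number-citation cites,
    // failing if no block inside the same sequence carries that id.
    static List<String> citedIds(int beans) {
        NodeList seqs = toFo(beans).getElementsByTagNameNS(FO_NS, "page-sequence");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < seqs.getLength(); i++) {
            Element seq = (Element) seqs.item(i);
            String refId = ((Element) seq.getElementsByTagNameNS(FO_NS, "page-number-citation")
                .item(0)).getAttribute("ref-id");
            NodeList blocks = seq.getElementsByTagNameNS(FO_NS, "block");
            boolean found = false;
            for (int j = 0; j < blocks.getLength() && !found; j++) {
                found = refId.equals(((Element) blocks.item(j)).getAttribute("id"));
            }
            if (!found) {
                throw new IllegalStateException("no block with id " + refId);
            }
            ids.add(refId);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(citedIds(3));
    }
}
```

For three beans, main prints [last-page1, last-page2, last-page3]. Each citation therefore resolves to the last block of its own sequence rather than to a shared id.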

1 comment:

jeff said...

Is it true that I cannot have a "nested" page-sequence?

My problem is that my table is 6,000 rows long and is all enclosed within one page-sequence. Will converting this to a for-each loop for each ROW solve my problem?