
Thursday, April 1, 2010

Free way to convert an existing PDF to PDF/A

The following test resulted in a valid and verified PDF/A produced from the Linux command line.

Requirements:
Ghostscript 8.64 only (see the notes on 8.70 and 8.71 at the end)
PDFBox 0.7.3
pdfmarks (a file that supplies additional metadata)
PDFA_def.ps
USWebCoatedSWOP.icc

I converted the PDF to PDF/A by running the following command:
gs -sDEVICE=pdfwrite -q -dNOPAUSE -dBATCH -dNOSAFER -dPDFA -dUseCIEColor -sProcessColorModel=DeviceCMYK \
-sOutputFile=Out_PDFA.pdf PDFA_def.ps pdfmarks IN_PDF.pdf

The resulting PDF/A file is named Out_PDFA.pdf.
If you verify the file at this stage, the validator reports that the annotation Print flag is not set on the hyperlinks.
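The same Ghostscript call can be driven from Java, which makes it easy to check up front that every support file is present instead of letting gs fail with /undefinedfilename. This is a minimal sketch, not part of the original workflow: the class and method names are hypothetical, and it assumes PDFA_def.ps has been edited to reference USWebCoatedSWOP.icc in the working directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunGsPdfa {
    // Same arguments as the gs command above, in the same order.
    static List<String> buildGsCommand(String inputPdf, String outputPdf) {
        return new ArrayList<>(Arrays.asList(
                "gs", "-sDEVICE=pdfwrite", "-q", "-dNOPAUSE", "-dBATCH", "-dNOSAFER",
                "-dPDFA", "-dUseCIEColor", "-sProcessColorModel=DeviceCMYK",
                "-sOutputFile=" + outputPdf,
                "PDFA_def.ps", "pdfmarks", inputPdf));
    }

    // Returns the paths (relative to the current directory, or absolute) that are not existing files.
    static List<String> missingFiles(String... paths) {
        List<String> missing = new ArrayList<>();
        for (String path : paths) {
            if (!new File(path).isFile()) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java RunGsPdfa input_pdf output_pdf");
            System.exit(2);
        }
        // Assumption: PDFA_def.ps points at USWebCoatedSWOP.icc in this directory.
        List<String> missing = missingFiles(
                "PDFA_def.ps", "pdfmarks", "USWebCoatedSWOP.icc", args[0]);
        if (!missing.isEmpty()) {
            System.err.println("Missing files: " + missing);
            System.exit(1);
        }
        // Run gs with its output sent to this console and return its exit code.
        Process gs = new ProcessBuilder(buildGsCommand(args[0], args[1]))
                .inheritIO()
                .start();
        System.exit(gs.waitFor());
    }
}
```

Passing the arguments as a list to ProcessBuilder avoids shell quoting problems if the input file name contains spaces.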

To fix that problem, the Print flag has to be set on each link. I wrote a Java program, FixPrintFlag.java, that does this: it uses the PDFBox library to access each link annotation, set its Print flag, and save the PDF.

Running FixPrintFlag (with the PDFBox 0.7.3 jar on the classpath):
Usage: java FixPrintFlag input_pdf output_pdf

For example: java FixPrintFlag Out_PDFA.pdf New_verifiablePDFA.pdf

Checked with Adobe, New_verifiablePDFA.pdf verifies as a valid PDF/A.

--- Code for FixPrintFlag.java
****start******:
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.pdmodel.PDPage;
import org.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink;

import java.util.List;

public class FixPrintFlag
{
    private FixPrintFlag()
    {
        //utility class, should not be instantiated.
    }

    public static void main( String[] args ) throws Exception
    {
        if( args.length != 2 )
        {
            usage();
            return;
        }
        PDDocument doc = null;
        try
        {
            doc = PDDocument.load( args[0] );
            List allPages = doc.getDocumentCatalog().getAllPages();
            // Walk every annotation on every page.
            for( int i = 0; i < allPages.size(); i++ )
            {
                PDPage page = (PDPage)allPages.get( i );
                List annotations = page.getAnnotations();
                for( int j = 0; j < annotations.size(); j++ )
                {
                    PDAnnotation annot = (PDAnnotation)annotations.get( j );
                    // Ghostscript leaves the Print flag unset on hyperlinks,
                    // which PDF/A does not allow.
                    if( annot instanceof PDAnnotationLink )
                    {
                        PDAnnotationLink link = (PDAnnotationLink)annot;
                        link.setPrinted( true );
                        System.out.println( "setting print flag..." );
                    }
                }
            }
            doc.save( args[1] );
        }
        catch( Exception ex )
        {
            System.err.println( "Error parsing pdf: " + ex.getMessage() );
        }
        finally
        {
            // Release the file handle even if loading or saving failed.
            if( doc != null )
            {
                doc.close();
            }
        }
    }

    private static void usage()
    {
        System.err.println( "Usage: java FixPrintFlag <input_pdf> <output_pdf>" );
    }
}
******end code********************
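For reference, setPrinted(true) sets the Print bit in the annotation's /F flags entry. The PDF specification defines these flags as bit values (Invisible = 1, Hidden = 2, Print = 4, NoView = 32), and PDF/A-1 requires Print to be set and Invisible, Hidden and NoView to be clear. FixPrintFlag above only sets Print. The following pure-Java sketch, with hypothetical class and method names and no PDFBox dependency, shows the bit arithmetic for the full rule:

```java
public class AnnotationFlags {
    // Annotation flag values from the PDF specification (/F entry).
    static final int INVISIBLE = 1;  // bit 1
    static final int HIDDEN    = 2;  // bit 2
    static final int PRINT     = 4;  // bit 3
    static final int NO_VIEW   = 32; // bit 6

    // Sets Print and clears Invisible, Hidden and NoView,
    // leaving every other flag (NoZoom, NoRotate, ...) untouched.
    static int toPdfaFlags(int flags) {
        return (flags | PRINT) & ~(INVISIBLE | HIDDEN | NO_VIEW);
    }

    // True when the flags already satisfy the PDF/A-1 rule.
    static boolean isPdfaCompliant(int flags) {
        return toPdfaFlags(flags) == flags;
    }

    public static void main(String[] args) {
        int[] samples = {0, 2, 4, 28, 40};
        for (int f : samples) {
            System.out.println(f + " -> " + toPdfaFlags(f)
                    + (isPdfaCompliant(f) ? " (already compliant)" : ""));
        }
    }
}
```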
__________________________________________________________________________________________________________________________

The pdfmarks file supplies the document metadata. Dates and times are left out intentionally.


Contents:
[ /Title (Document title)
/Author (Author name)
/Subject (Subject description)
/Keywords (comma, separated, keywords)
/Creator (application name or creator note)
/Producer (PDF producer name or note)
/DOCINFO pdfmark
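If the metadata comes from another program, the pdfmarks file can be generated rather than edited by hand. The one detail to get right is that values are PostScript literal strings, so backslashes and parentheses inside them must be escaped with a backslash. A minimal sketch with hypothetical class and method names, assuming ASCII-only values (non-ASCII metadata would need a different string encoding):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class WritePdfmarks {
    // Wraps a value as a PostScript literal string, escaping \ ( and ).
    static String psString(String value) {
        StringBuilder sb = new StringBuilder("(");
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append(')').toString();
    }

    // Builds a DOCINFO pdfmark; keys are written in map iteration order.
    static String docInfo(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("[");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(" /").append(e.getKey()).append(' ')
              .append(psString(e.getValue())).append('\n');
        }
        return sb.append("/DOCINFO pdfmark\n").toString();
    }

    public static void main(String[] args) throws IOException {
        // LinkedHashMap keeps the same key order as the hand-written file.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Title", "Document title");
        fields.put("Author", "Author name");
        fields.put("Subject", "Subject description");
        fields.put("Keywords", "comma, separated, keywords");
        fields.put("Creator", "application name or creator note");
        fields.put("Producer", "PDF producer name or note");
        Files.write(Paths.get("pdfmarks"),
                docInfo(fields).getBytes(StandardCharsets.US_ASCII));
    }
}
```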



_______________________________________________________________________________________________________________________________________

NOTE: Don't try gs v8.70. It abandons PDF/A early, even if the PDF was already processed by FixPrintFlag. There appears to be a bug in v8.70; it prints:


GPL Ghostscript 8.70: Annotation set to non-printing,
not permitted in PDF/A, reverting to normal PDF output


NOTE: It does work with gs v8.71, but with the steps in the opposite order:
run FixPrintFlag on the input PDF first,
then use the output of FixPrintFlag as the input to the gs command above.
The file produced by gs then verifies as a valid PDF/A using Adobe.
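The version-dependent order in these notes can be captured in a small lookup. This is a hypothetical helper, not part of the original workflow: it asks Ghostscript for its version with `gs --version` and prints the step order reported in this post (8.64: gs, then FixPrintFlag; 8.71: FixPrintFlag, then gs). Any other version, including the broken 8.70, is treated as untested.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PdfaPipeline {
    // Step order per Ghostscript version, as reported in this post only.
    static List<String> stepsFor(String gsVersion) {
        switch (gsVersion.trim()) {
            case "8.64":
                return Arrays.asList("gs", "FixPrintFlag");
            case "8.71":
                return Arrays.asList("FixPrintFlag", "gs");
            default:
                // 8.70 falls back to normal PDF; other versions were not tested.
                return Collections.emptyList();
        }
    }

    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("gs", "--version")
                .redirectErrorStream(true)
                .start();
        String version;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            version = r.readLine();
        }
        p.waitFor();
        if (version == null) {
            version = "";
        }
        List<String> steps = stepsFor(version);
        if (steps.isEmpty()) {
            System.out.println("Ghostscript " + version + ": not known to work with this workflow");
        } else {
            System.out.println("Ghostscript " + version + ": run " + String.join(", then ", steps));
        }
    }
}
```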

1 comment:

  1. I get an error.

    ```
    $ gs -sDEVICE=pdfwrite -q -dNOPAUSE -dBATCH -dNOSAFER -dPDFA -sProcessColorModel=DeviceCMYK -sOutputFile=Out_PDFA.pdf PDFA_def.ps pdfmarks Die-Wut-und-der-Stolz.pdf
    Error: /undefinedfilename in --file--
    Operand stack:
    --nostringval-- --nostringval-- (srgb.icc) (r)
    Execution stack:
    %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1999 1 3 %oparray_pop 1998 1 3 %oparray_pop 1982 1 3 %oparray_pop 1868 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
    Dictionary stack:
    --dict:1210/1684(ro)(G)-- --dict:0/20(G)-- --dict:79/200(L)--
    Current allocation mode is local
    Last OS error: No such file or directory
    Current file position is 793
    GPL Ghostscript 9.20: Unrecoverable error, exit code 1
    ```
