Apache PDFBox Preflight can be used to validate PDF/A files.
Using the example from the previous entry, issue the command:
java -classpath /home/fausser/LatestTrunk/PDFBox-trunk/trunk/preflight/target/preflight-1.8.0-SNAPSHOT-jar-with-dependencies.jar org.apache.pdfbox.preflight.Validator_A1b /home/fausser/Fixflag/AnnontsPDFA.pdf
The file /home/fausser/Fixflag/AnnontsPDFA.pdf is a valid PDF/A-1b file
Download the jar file from:
https://builds.apache.org/job/PDFBox-trunk/lastBuild/org.apache.pdfbox$preflight/
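To check several files at once, you can drive the command above from Java. This is a minimal sketch that uses only the JDK. It runs the same `Validator_A1b` command through `ProcessBuilder` and treats a file as valid when the output contains the "is a valid PDF/A-1b file" message shown above. The class `PreflightRunner` is hypothetical. Relying on that message text is an assumption based on the 1.8.0-SNAPSHOT output, and other Preflight versions may word it differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: wraps the Validator_A1b command line shown above.
final class PreflightRunner {

    // Builds: java -classpath <jar> org.apache.pdfbox.preflight.Validator_A1b <pdf>
    static List<String> buildCommand(String preflightJar, String pdfPath) {
        return Arrays.asList("java", "-classpath", preflightJar,
                "org.apache.pdfbox.preflight.Validator_A1b", pdfPath);
    }

    // Assumption: a passing file is reported with the message seen in the transcript above.
    static boolean reportsValid(String validatorOutput) {
        return validatorOutput.contains("is a valid PDF/A-1b file");
    }

    // Runs the validator and returns its combined stdout/stderr text.
    static String run(String preflightJar, String pdfPath)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(preflightJar, pdfPath));
        pb.redirectErrorStream(true);
        Process p = pb.start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: java PreflightRunner <preflight-jar> <pdf>...");
            return;
        }
        for (int i = 1; i < args.length; i++) {
            String output = run(args[0], args[i]);
            System.out.println(args[i] + ": " + (reportsValid(output) ? "valid" : "NOT valid"));
        }
    }
}
```

For example: `java PreflightRunner preflight-1.8.0-SNAPSHOT-jar-with-dependencies.jar AnnontsPDFA.pdf`.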
Friday, March 22, 2013
Freely convert PDFs to PDF/A using Ghostscript 9.07
Here is an updated, better way to convert PDFs to PDF/A for free using Ghostscript 9.07.
First attempt: convert annots.pdf to AnnontsPDFA.pdf:
> /home/fausser/ghostscript-9.07/bin/gs -sDEVICE=pdfwrite -q -dNOPAUSE -dBATCH -dNOSAFER -dPDFA -dUseCIEColor -sProcessColorModel=DeviceCMYK -sOutputFile=AnnontsPDFA.pdf annots.pdf
GPL Ghostscript 9.07: Annotation set to non-printing,
not permitted in PDF/A, reverting to normal PDF output
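Ghostscript reverts to normal PDF output because the link annotations are set to non-printing. The Print flag therefore has to be set first with a Java program. Here is the code listing: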
>cat FixPrintFlag.java
//package Utilities;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink;
import java.util.List;

// Sets the Print flag on every link annotation (PDFBox 1.8 API).
public class FixPrintFlag
{
    private FixPrintFlag()
    {
        //utility class, should not be instantiated.
    }

    public static void main( String[] args ) throws Exception
    {
        if( args.length != 2 )
        {
            usage();
            return;   // stop here instead of falling through to save()
        }
        PDDocument doc = null;
        try
        {
            doc = PDDocument.load( args[0] );
            List allPages = doc.getDocumentCatalog().getAllPages();
            for ( int i = 0; i < allPages.size(); i++ )
            {
                PDPage page = (PDPage)allPages.get( i );
                List annotations = page.getAnnotations();
                for ( int j = 0; j < annotations.size(); j++ )
                {
                    PDAnnotation annot = (PDAnnotation)annotations.get( j );
                    // Only link annotations (hyperlinks) are changed.
                    if ( annot instanceof PDAnnotationLink )
                    {
                        PDAnnotationLink link = (PDAnnotationLink)annot;
                        link.setPrinted( true );
                        System.out.println( "setting print flag..." );
                    }
                }
            }
            doc.save( args[1] );
        }
        catch ( Exception ex )
        {
            System.err.println( "Error processing pdf: " + ex.getMessage() );
        }
        finally
        {
            // Always release the document, even if loading or saving failed.
            if ( doc != null )
            {
                doc.close();
            }
        }
    }

    private static void usage()
    {
        System.err.println( "Usage: java FixPrintFlag input_pdf output_pdf" );
    }
}
To compile it against Apache's pdfbox.jar and commons-logging.jar:
javac -cp /home/fausser/Fixflag/pdfbox.jar:/home/fausser/Fixflag/commons-logging.jar FixPrintFlag.java
And run it:
> java -cp /home/fausser/Fixflag/pdfbox.jar:/home/fausser/Fixflag/commons-logging.jar:. FixPrintFlag annots.pdf annots_out.pdf
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
setting print flag...
>
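Each "setting print flag..." line above means that one link annotation had its Print flag turned on. In the file, that flag is a single bit in the annotation's /F entry. Per the PDF specification, Invisible is bit 1 (value 1), Hidden is bit 2 (value 2), Print is bit 3 (value 4), and NoView is bit 6 (value 32). PDF/A-1 requires Print to be set and Invisible, Hidden, and NoView to be clear.

The JDK-only sketch below shows that bit arithmetic on a plain int. The `AnnotationFlags` class is hypothetical and not part of PDFBox. Note that FixPrintFlag only sets Print; it does not clear the other three bits.

```java
// Hypothetical helper illustrating the annotation /F flag bits from the PDF specification.
final class AnnotationFlags {
    static final int INVISIBLE = 1;      // bit 1
    static final int HIDDEN = 1 << 1;    // bit 2
    static final int PRINT = 1 << 2;     // bit 3
    static final int NO_VIEW = 1 << 5;   // bit 6

    // True if the flags satisfy the PDF/A-1 rule: Print set, Invisible/Hidden/NoView clear.
    static boolean isPdfaCompliant(int flags) {
        return (flags & PRINT) != 0 && (flags & (INVISIBLE | HIDDEN | NO_VIEW)) == 0;
    }

    // Sets Print and clears the three forbidden bits, leaving all other flags untouched.
    static int makePdfaCompliant(int flags) {
        return (flags | PRINT) & ~(INVISIBLE | HIDDEN | NO_VIEW);
    }

    public static void main(String[] args) {
        int linkFlags = 0; // an annotation without an /F entry has all flags cleared (default 0)
        System.out.println(isPdfaCompliant(linkFlags));                    // false
        System.out.println(makePdfaCompliant(linkFlags));                  // 4
        System.out.println(isPdfaCompliant(makePdfaCompliant(linkFlags))); // true
    }
}
```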
Trying the conversion again, this time with the fixed file as input:
> /home/fausser/ghostscript-9.07/bin/gs -sDEVICE=pdfwrite -q -dNOPAUSE -dBATCH -dNOSAFER -dPDFA -dUseCIEColor -sProcessColorModel=DeviceCMYK -sOutputFile=/home/fausser/Fixflag/AnnontsPDFA.pdf /home/fausser/Fixflag/annots_out.pdf
The result is not yet PDF/A: it does not verify as one in Adobe Acrobat Pro 11.x. It still needs an OutputIntent, which is supplied by PDFA_def.ps:
cat /home/fausser/ghostscript-9.07/lib/PDFA_def.ps
%!
% This is a sample prefix file for creating a PDF/A document.
% Feel free to modify entries marked with "Customize".
% This assumes an ICC profile to reside in the file (ISO Coated sb.icc),
% unless the user modifies the corresponding line below.
% Define entries in the document Info dictionary :
/ICCProfile (/home/fausser/eciRGB_v2.icc) % Customize.
def
[ /Title (Title) % Customize.
/DOCINFO /home/fausser/pdfmark %not used
% Define an ICC profile :
[/_objdef {icc_PDFA} /type /stream /OBJ pdfmark
[{icc_PDFA} <</N systemdict /ProcessColorModel get /DeviceGray eq {1} {4} ifelse >> /PUT pdfmark
[{icc_PDFA} ICCProfile (r) file /PUT pdfmark
% Define the output intent dictionary :
[/_objdef {OutputIntent_PDFA} /type /dict /OBJ pdfmark
[{OutputIntent_PDFA} <<
/Type /OutputIntent % Must be so (the standard requires).
/S /GTS_PDFA1 % Must be so (the standard requires).
/DestOutputProfile {icc_PDFA} % Must be so (see above).
/OutputConditionIdentifier (CGATS TR001) % Customize
>> /PUT pdfmark
[{Catalog} <</OutputIntents [ {OutputIntent_PDFA} ]>> /PUT pdfmark
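For the output intent to be consistent, the `/N` entry written for the ICC stream has to match the number of colour components in the profile named by `ICCProfile`. One way to see what kind of profile a file holds is to read its 128-byte header. According to the ICC specification, bytes 16 to 19 contain the data colour space signature, such as `RGB ` or `CMYK`.

This JDK-only sketch reads that field. `IccHeader` is a hypothetical name, and the check looks at the header only, not at the rest of the profile.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reports the data colour space of an ICC profile from its header.
final class IccHeader {
    static final int HEADER_SIZE = 128;

    // Bytes 16..19 of the header hold the colour space signature, e.g. "RGB " or "CMYK".
    static String colourSpace(byte[] header) {
        if (header.length < HEADER_SIZE) {
            throw new IllegalArgumentException("ICC header must be at least 128 bytes");
        }
        return new String(header, 16, 4, StandardCharsets.US_ASCII);
    }

    // Component count for the common signatures; -1 for anything not handled here.
    static int components(String signature) {
        switch (signature) {
            case "GRAY": return 1;
            case "RGB ": return 3;
            case "CMYK": return 4;
            default: return -1;
        }
    }

    // Reads exactly the first 128 bytes (throws EOFException if the file is shorter).
    static byte[] readHeader(InputStream in) throws IOException {
        byte[] header = new byte[HEADER_SIZE];
        new DataInputStream(in).readFully(header);
        return header;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            String cs = colourSpace(readHeader(in));
            System.out.println(args[0] + ": colour space '" + cs + "', components = " + components(cs));
        }
    }
}
```

For example, `java IccHeader /home/fausser/eciRGB_v2.icc` prints the colour space and component count of the profile used above.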
Run gs again with PDFA_def.ps placed before the input file:
> /home/fausser/ghostscript-9.07/bin/gs -sDEVICE=pdfwrite -q -dNOPAUSE -dBATCH -dNOSAFER -dPDFA -dUseCIEColor -sProcessColorModel=DeviceCMYK -sOutputFile=/home/fausser/Fixflag/AnnontsPDFA.pdf /home/fausser/ghostscript-9.07/lib/PDFA_def.ps /home/fausser/Fixflag/annots_out.pdf
Now the output verifies as PDF/A.
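Because the same long gs command is typed several times in this post, it can help to assemble it in one place. The sketch below (hypothetical `GsPdfaCommand` class, JDK only) builds the argument list used above. As on the command line, the prefix files such as PDFA_def.ps come before the input PDF, because Ghostscript processes its input files in the order given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles the Ghostscript PDF/A command used in this post.
final class GsPdfaCommand {

    // gsBinary: path to gs (or gswin32c.exe); prefixFiles: e.g. PDFA_def.ps, pdfmarks.
    static List<String> build(String gsBinary, String outputPdf, String inputPdf, String... prefixFiles) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
                gsBinary, "-sDEVICE=pdfwrite", "-q", "-dNOPAUSE", "-dBATCH", "-dNOSAFER",
                "-dPDFA", "-dUseCIEColor", "-sProcessColorModel=DeviceCMYK",
                "-sOutputFile=" + outputPdf));
        cmd.addAll(Arrays.asList(prefixFiles)); // prefix files first ...
        cmd.add(inputPdf);                      // ... then the PDF to convert
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = build("/home/fausser/ghostscript-9.07/bin/gs",
                "/home/fausser/Fixflag/AnnontsPDFA.pdf",
                "/home/fausser/Fixflag/annots_out.pdf",
                "/home/fausser/ghostscript-9.07/lib/PDFA_def.ps");
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.out.println("gs exit code: " + p.waitFor());
    }
}
```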
Thursday, August 19, 2010
Win32 command line free PDF to PDF/A converter
This is similar to the previous entry for Linux-type machines. Note that Ghostscript 8.71 seems to be more stable than 8.64; Ghostscript 8.70 had problems.
Get Ghostscript 8.71 for Windows: http://sourceforge.net/projects/ghostscript/files/GPL%20Ghostscript/8.71/gs871w32.exe/download
Get a Java runtime (JRE) or SDK.
Files needed:
Fixflag (see April post) - Java program (FixPrintFlag) that turns 'on' the print flag in each hyperlink.
PDFA_def.ps (see April post) - PDF/A PostScript definition file.
pdfmarks (see April post) - file containing metadata; edit it to put in your values.
USWebCoatedSWOP.icc (see April post) - ICC color profile file.
test.bat - contains the Ghostscript command and parameters.
To preserve hyperlinks and turn the print flag on, first run (on Windows the classpath separator is ';', not ':'):
java -cp C:\pdftest\Fixflag\pdfbox.jar;C:\pdftest\Fixflag\commons-logging.jar;. FixPrintFlag Input.pdf Output.pdf
then run test.bat with Output.pdf as its input file.
-------------------------------------------------------------
Otherwise (if you don't need to preserve hyperlinks), just run: test.bat
-------------------------------------------------------------
Contents of test.bat:
"C:\Program Files\gs\gs8.71\bin\gswin32c.exe" -sDEVICE=pdfwrite -q -dNOPAUSE -dBATCH -dNOSAFER -dPDFA -dUseCIEColor -sProcessColorModel=DeviceCMYK -sOutputFile=pdfa_out.pdf PDFA_def.ps pdfmarks c:\pdftest\test2.pdf
Note: make sure to change the file names in test.bat to match your input and output names.
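Because the file names in test.bat have to be edited by hand for each conversion, a small generator can help. The following sketch (hypothetical `TestBatWriter` class, JDK only) writes the two commands from this post into a batch file for a given input and output name: first FixPrintFlag, then Ghostscript on FixPrintFlag's output. The classpath separator is a parameter because Windows uses `;`, while Linux uses `:`, and `File.pathSeparator` returns the right one for the current platform. The intermediate name `<input>_fixed.pdf` is an arbitrary choice for this sketch.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical generator for test.bat: FixPrintFlag first, then Ghostscript on its output.
final class TestBatWriter {

    static List<String> lines(String fixflagDir, String gsExe, String inputPdf,
                              String outputPdf, String classpathSeparator) {
        // Intermediate file name is an arbitrary choice for this sketch.
        String fixedPdf = inputPdf.replaceFirst("(?i)\\.pdf$", "") + "_fixed.pdf";
        String classpath = fixflagDir + "\\pdfbox.jar" + classpathSeparator
                + fixflagDir + "\\commons-logging.jar" + classpathSeparator + ".";
        String fix = "java -cp " + classpath + " FixPrintFlag " + inputPdf + " " + fixedPdf;
        String gs = "\"" + gsExe + "\" -sDEVICE=pdfwrite -q -dNOPAUSE -dBATCH -dNOSAFER -dPDFA"
                + " -dUseCIEColor -sProcessColorModel=DeviceCMYK -sOutputFile=" + outputPdf
                + " PDFA_def.ps pdfmarks " + fixedPdf;
        return Arrays.asList(fix, gs);
    }

    public static void main(String[] args) throws IOException {
        List<String> bat = lines("C:\\pdftest\\Fixflag",
                "C:\\Program Files\\gs\\gs8.71\\bin\\gswin32c.exe",
                args[0], args[1], File.pathSeparator);
        Files.write(Paths.get("test.bat"), bat, StandardCharsets.US_ASCII);
    }
}
```

For example, `java TestBatWriter c:\pdftest\test2.pdf pdfa_out.pdf` run on Windows writes a test.bat for that pair of names.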
Thursday, April 1, 2010
Free way to convert an existing PDF to PDF/A
The following test resulted in a valid and verified PDF/A produced from the Linux command line.
Requirements:
Ghostscript 8.64 only (8.71 also works; see the notes at the end).
PDFBox 0.7.3
pdfmarks (file to supply additional metadata)
PDFA_def.ps
USWebCoatedSWOP.icc
I converted the PDF to PDF/A by running the following command:
gs -sDEVICE=pdfwrite -q -dNOPAUSE -dBATCH -dNOSAFER -dPDFA -dUseCIEColor -sProcessColorModel=DeviceCMYK -sOutputFile=Out_PDFA.pdf PDFA_def.ps pdfmarks IN_PDF.pdf
The output file is named Out_PDFA.pdf.
If you verify the file at this stage, the report shows that the annotation Print flag is not set for the hyperlinks.
To fix that problem, the Print flag has to be set on each link. I wrote a Java program called "FixPrintFlag.java" to do this.
FixPrintFlag.java uses the PDFBox library to access each link and set its Print flag in the PDF file.
Running FixPrintFlag:
usage: FixPrintFlag input_pdf output_pdf
so: java FixPrintFlag Out_PDFA.pdf New_verifiablePDFA.pdf   (with the PDFBox 0.7.3 jar on the classpath)
Using Adobe, New_verifiablePDFA.pdf verifies as a valid PDF/A.
--- code for FixPrintFlag.java (PDFBox 0.7.3) ---
****start******:
// package org.pdfbox.examples.pdmodel;  (commented out so the class runs as "java FixPrintFlag")
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.pdmodel.PDPage;
import org.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink;
import java.util.List;

public class FixPrintFlag
{
    private FixPrintFlag()
    {
        //utility class, should not be instantiated.
    }

    public static void main( String[] args ) throws Exception
    {
        if( args.length != 2 )
        {
            usage();
            return;   // stop here instead of falling through to save()
        }
        PDDocument doc = null;
        try
        {
            doc = PDDocument.load( args[0] );
            List allPages = doc.getDocumentCatalog().getAllPages();
            for ( int i = 0; i < allPages.size(); i++ )
            {
                PDPage page = (PDPage)allPages.get( i );
                List annotations = page.getAnnotations();
                for ( int j = 0; j < annotations.size(); j++ )
                {
                    PDAnnotation annot = (PDAnnotation)annotations.get( j );
                    // Only link annotations (hyperlinks) are changed.
                    if ( annot instanceof PDAnnotationLink )
                    {
                        PDAnnotationLink link = (PDAnnotationLink)annot;
                        link.setPrinted( true );
                        System.out.println( "setting print flag..." );
                    }
                }
            }
            doc.save( args[1] );
        }
        catch ( Exception ex )
        {
            System.err.println( "Error processing pdf: " + ex.getMessage() );
        }
        finally
        {
            // Always release the document, even if loading or saving failed.
            if ( doc != null )
            {
                doc.close();
            }
        }
    }

    private static void usage()
    {
        System.err.println( "Usage: java FixPrintFlag input_pdf output_pdf" );
    }
}
******end code********************
__________________________________________________________________________________________________________________________
The pdfmarks file can be used to supply metadata; dates and times are left out intentionally.
Contents:
[ /Title (Document title)
/Author (Author name)
/Subject (Subject description)
/Keywords (comma, separated, keywords)
/Creator (application name or creator note)
/Producer (PDF producer name or note)
/DOCINFO pdfmark
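If you generate the pdfmarks file for many documents, remember that the values are PostScript strings in parentheses. A literal `(`, `)` or `\` inside a value must therefore be escaped with a backslash. Here is a minimal sketch (hypothetical `PdfmarksWriter` class, JDK only, plain ASCII values assumed) that writes a DOCINFO pdfmark in the same layout as above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generator for a pdfmarks DOCINFO file like the one shown above.
final class PdfmarksWriter {

    // Wraps a value in (...) and escapes the characters special inside a PostScript string.
    static String psString(String value) {
        StringBuilder sb = new StringBuilder("(");
        for (char c : value.toCharArray()) {
            if (c == '(' || c == ')' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append(')').toString();
    }

    // Keys are Info dictionary names without the slash, e.g. "Title", "Author".
    static String docInfo(Map<String, String> entries) {
        StringBuilder sb = new StringBuilder("[ ");
        for (Map.Entry<String, String> e : entries.entrySet()) {
            sb.append('/').append(e.getKey()).append(' ')
              .append(psString(e.getValue())).append('\n');
        }
        return sb.append("/DOCINFO pdfmark\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("Title", "Annual report (draft)");
        info.put("Author", "Author name");
        System.out.print(docInfo(info));
    }
}
```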
_______________________________________________________________________________________________________________________________________
NOTE: don't try gs v8.70; it errors out early even if the PDF was modified by FixPrintFlag. There appears to be a bug in v8.70:
GPL Ghostscript 8.70: Annotation set to non-printing,
not permitted in PDF/A, reverting to normal PDF output
NOTE: It does work with gs v8.71, but the order is reversed: run FixPrintFlag on the input PDF first,
then use the output from FixPrintFlag as the input to the gs 8.71 command above.
The file that gs produces will verify as a valid PDF/A using Adobe.