Diff'ing PDFs

As a freelance software developer I have to negotiate the terms and conditions of contracts on a regular basis. Once clients have agreed to my proposed changes they will send me a modified contract. These contracts are always in PDF format.

It can be very cumbersome to read all of the legalese again to verify that the agreed-upon changes have been properly included and, just as important, to make sure that nothing else has been slipped in.

If these contracts were text files the process would be easy: the diff utility would list the changes between two versions of a contract.

Although Adobe provides a tool for comparing PDF files, I'd rather use the tools I'm already familiar with. Xpdf is an open source viewer for PDF files. It also comes with a number of very convenient command line utilities, among them pdftotext. pdftotext takes a PDF file and dumps all the text contained in it to a text file that is suitable for comparing with diff.

If you run OS X, installing Xpdf is a simple matter of running the following (requires MacPorts):

sudo port install xpdf-tools

Now let's assume that the two versions of the PDF contract reside in the same directory. Running the following command converts both PDFs to their textual equivalent:

for f in *.pdf; do pdftotext -layout -nopgbrk -enc UTF-8 "$f" "${f%.pdf}.txt"; done

All that remains is running:

diff old.txt new.txt

to list the differences, where old.txt and new.txt stand for the text files generated from the old and the new contract.
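
If you do this often, the convert and diff steps can be wrapped in a small shell function so that no .txt files are left lying around. What follows is only a minimal sketch: the name pdfdiff is made up for this post and is not part of Xpdf, and it assumes that pdftotext is on your PATH and that your mktemp accepts a template argument. It asks diff for unified output (-u), which I find easier to read than the default format.

```shell
# pdfdiff OLD.pdf NEW.pdf
# Hypothetical helper (not shipped with Xpdf): converts both PDFs to text
# in a temporary directory and prints a unified diff of the results.
# Exit status follows diff: 0 = no differences, 1 = differences, 2 = trouble.
pdfdiff() {
    if [ "$#" -ne 2 ]; then
        echo "usage: pdfdiff OLD.pdf NEW.pdf" >&2
        return 2
    fi

    # Scratch directory for the two text files.
    tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/pdfdiff.XXXXXX") || return 2

    # Same pdftotext options as in the conversion step of this post.
    if pdftotext -layout -nopgbrk -enc UTF-8 "$1" "$tmpdir/old.txt" &&
       pdftotext -layout -nopgbrk -enc UTF-8 "$2" "$tmpdir/new.txt"; then
        diff -u "$tmpdir/old.txt" "$tmpdir/new.txt"
        status=$?
    else
        # One of the conversions failed; report it as "trouble".
        status=2
    fi

    rm -rf "$tmpdir"
    return "$status"
}
```

With the function loaded in your shell, running pdfdiff followed by the old and the new PDF prints the differences directly. Because -layout preserves the page layout, a reflowed paragraph can show up as a whitespace-only change; adding -b to the diff call is one way to quiet that down.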
