Converting a two-pages-per-sheet PDF to one-page-per-sheet

If you have a PDF scan of a book with two pages per sheet, such as this one (incidentally, this is a manuscript dating from 1449: “Tractato de septe peccati mortali” by Frate Antonino, image copyright Houghton Library, Harvard University, Cambridge, Mass.)

and you wish to convert it to a one-page-per-sheet PDF:

then you can use these steps on Debian Linux:

  1. Query the PDF for the number of pages and the resolution:
    pdfinfo ugly.pdf

    Look at the “Pages” line in the output of this command. Now type:

    pdftoppm -gray -l 1 ugly.pdf test

    then inspect the resulting test-001.pgm file with an image editor to find its dimensions in pixels. I got 223 pages and 1650 x 1275 pixels, so these numbers are used in what follows; you should of course adapt them to your own results.
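The two numbers from step 1 can also be read on the command line instead of in an image editor. The following is only a minimal sketch and not part of the original recipe: the helper names page_count and pgm_size are made up for illustration, and pgm_size assumes the PGM header has the plain layout "P5" on the first line and "WIDTH HEIGHT" on the second, with no comment lines.

```shell
#!/bin/bash

# Print the number after "Pages:" from pdfinfo output read on stdin.
page_count() {
    awk '/^Pages:/ { print $2 }'
}

# Print "WIDTH HEIGHT" from the header of a PGM file.
# Assumption: line 1 is the magic number, line 2 holds width and height.
pgm_size() {
    local magic width height
    { read -r magic; read -r width height; } < "$1"
    echo "$width $height"
}

# Example use (assumes ugly.pdf and test-001.pgm from the commands above):
#   pdfinfo ugly.pdf | page_count
#   pgm_size test-001.pgm
```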

  2. Create a bash script to process a single page:
    cat > doone.sh
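    #!/bin/bash
    # usage: ./doone.sh PAGE_NUMBER
    page=$(printf '%03d' "$1")    # zero-padded page number, e.g. 007
    pagenew="${page}_"            # basename for the split output
    # extract the single sheet into its own PDF
    gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dFirstPage="$1" -dLastPage="$1" -sOutputFile="$page.pdf" ugly.pdf
    # render the sheet as a grayscale image
    pdftoppm -gray "$page.pdf" > "$page.pgm"
    # cut it into tiles 825 pixels wide, i.e. the left and right halves
    convert -crop 825x1275 "$page.pgm" "$pagenew.pgm"
    rm "$page.pgm"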
    ^D
    chmod u+x doone.sh

    Note that for the width in the -crop option of the convert command, I entered half (825) of the horizontal size found above (1650); this way each image is split into a left half and a right half. The pdftoppm command also has a -mono option to produce monochrome images, and a -r option to set the resolution.
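The crop geometry used in step 2 can be derived from the measured size rather than typed by hand. This is a small sketch under stated assumptions: half_geometry is a hypothetical helper name, and the width is rounded up so that an odd pixel width still yields two tiles rather than two tiles plus a one-pixel sliver.

```shell
#!/bin/bash

# Build a WIDTHxHEIGHT crop geometry covering half of the sheet.
# $1 = full width in pixels, $2 = full height in pixels.
half_geometry() {
    local width=$1 height=$2
    # round the half-width up so an odd width still gives two tiles
    echo "$(( (width + 1) / 2 ))x${height}"
}

half_geometry 1650 1275   # prints 825x1275
```

The result could then be passed to convert in place of the literal value, for example as -crop "$(half_geometry 1650 1275)".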

  3. Run the bash script on all pages:
    seq 223 | xargs -n1 ./doone.sh
  4. Finally, concatenate the pages to obtain the converted PDF:
    convert *_.pgm nice.pdf

    or do that in two steps:

    for i in *_.pgm; do convert -compress fax "$i" "$(basename "$i" .pgm).pdf"; done
    gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=nice.pdf *_.pdf
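The concatenation in step 4 relies on the shell listing the files in page order, which works because the page numbers are zero-padded to three digits. For a document with 1000 or more sheets the fixed '%03d' would no longer sort correctly. As a hedged sketch (pad_page is a hypothetical helper, not something the script above uses), the padding width can be taken from the page count itself:

```shell
#!/bin/bash

# Zero-pad a page number to as many digits as the total page count has.
# $1 = page number (without leading zeros), $2 = total number of pages.
pad_page() {
    local page=$1 total=$2
    printf "%0${#total}d" "$page"
}

pad_page 7 223     # prints 007
echo
pad_page 12 1500   # prints 0012
echo
```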

About paolog

homo technologicus cynicus

One Response to Converting a two-pages-per-sheet PDF to one-page-per-sheet

  1. paolog says:

    This works, but it is much easier to do with briss: http://sourceforge.net/projects/briss/
