I’ve just gone through a project to scan all of my files. Yes, all of them. The first step was straightforward: get a sheet-fed scanner. I settled on the Brother MFC-7460DN to replace my older DCP-7030. That keeps the extraneous gadget count down, and it happened to be on sale at the time. Out of the box you just put in a document and you get a PDF. Perfect simplicity, right?
If you’re doing single-sided documents, that’s all there is to it. Then you get a double-sided document, and suddenly you have a problem. I’d actually looked for a duplexing scanner, but those don’t seem to exist outside of very high-end photocopiers. That leaves you with only a few options: you can scan things by hand, or ignore the back sides. Or if you’re really creative, just flip the stack over and feed it through the scanner again. Of course, you’ll end up with the pages in a wacky order. For a 10-page document, it’ll be: 1, 3, 5, 7, 9, 10, 8, 6, 4, 2.
This last part got me thinking. Really, we have all of the page data, it’s just in the wrong order. This isn’t a hardware problem, it’s a software problem! If we can reorder the pages in software, then the following procedure will work:
- Scan the front sides. This produces pages 1, 3, 5…
- Reload the stack upside down and scan again. This produces pages N, N-2, …, with page 2 being last.
- Save all of the pages to the same file (easy to do with the built-in software on Mac and with the bundled software on Windows)
- Run a script to reorder the pages
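To make the scanning order concrete, here is a minimal sketch that simulates what the two passes save to the file. The function name `two_pass_scan_order` is made up for illustration, and it assumes every sheet is double-sided, so the page count is even.

```python
def two_pass_scan_order(total_pages):
    """Return 1-based page numbers in the order a two-pass scan saves them.

    Hypothetical helper for illustration; assumes an even page count.
    """
    if total_pages % 2 != 0:
        raise ValueError("a fully double-sided stack has an even page count")
    fronts = list(range(1, total_pages + 1, 2))  # first pass: 1, 3, 5, ...
    backs = list(range(total_pages, 0, -2))      # second pass: N, N-2, ..., 2
    return fronts + backs


print(two_pass_scan_order(10))  # [1, 3, 5, 7, 9, 10, 8, 6, 4, 2]
```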
Which tool can we use to reorder pages? My first thought went to the Swiss Army knife of desktop programming: Python! A quick search turned up a Stack Overflow post that describes nicely how to reorder PDF pages using pyPdf.
Now all that’s needed is to get the ordering right. If you have a look at the ordering listed above, you need to get the first page and the last page, then the second page and the second last page, then the third page and the third last page. Keep going until you meet in the middle. Write all of that out to a new file. Done!
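The pairing rule can be checked without touching a PDF at all. The sketch below works on plain lists; `deduplex_order` is a hypothetical name, and it assumes the scanned file holds an even number of pages.

```python
def deduplex_order(total_pages):
    """Zero-based indices into the scanned file, listed in reading order.

    Hypothetical helper for illustration; assumes an even page count.
    """
    order = []
    for i in range(total_pages // 2):
        order.append(i)                    # i-th page counted from the front
        order.append(total_pages - i - 1)  # i-th page counted from the back
    return order


# The 10-page example: page numbers in the order the scanner saved them.
scanned = [1, 3, 5, 7, 9, 10, 8, 6, 4, 2]
restored = [scanned[i] for i in deduplex_order(len(scanned))]
print(restored)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Working from both ends toward the middle means each front side is immediately followed by its own back side.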
Here’s the Python script I wrote to get this done, with much of the documentation stripped. You can see the full duplex.py file on pastebin.
    from pyPdf import PdfFileWriter, PdfFileReader

    input_filename = "Scan.pdf"
    output_filename = "Scan_reordered.pdf"

    # create a PDF writer to hold the output
    output_pdf = PdfFileWriter()

    # open up the input file and start reading the PDF
    with open(input_filename, 'rb') as input_file:
        input_pdf = PdfFileReader(input_file)

        # check that we have an even number of pages; stop if not,
        # otherwise the middle page would be silently dropped
        total_pages = input_pdf.getNumPages()
        if (total_pages % 2) != 0:
            raise SystemExit("Cannot de-duplex an odd number of pages!")

        # Get the pages in pairs, starting with the first and last,
        # and then proceeding to the second and second last, and so on
        for i in range(0, total_pages // 2):
            print("Adding page " + str(i))
            output_pdf.addPage(input_pdf.getPage(i))
            print("Adding page " + str(total_pages - i - 1))
            output_pdf.addPage(input_pdf.getPage(total_pages - i - 1))

        # Write the output file.
        # NB: must be done before closing the input file!
        with open(output_filename, 'wb') as output_file:
            output_pdf.write(output_file)
This was actually faster to write than installing Python and pyPdf on Windows! This version looks for Scan.pdf, since that’s what the scanner produces on my Mac. I simply don’t bother to rename the file until after I’ve done the reordering. For those who aren’t familiar with Python and just want to use the tool, here are some basic instructions:
- Install Python. On Mac and Linux, chances are you already have it. Here are some instructions on how to install Python and Pip on Windows.
- Install pyPdf. On Mac and Linux it’s as easy as “sudo pip install pypdf”. On Windows, if you have pip then it’s just about the same thing.
- Download the duplex.py script
- Name the file you want reordered “Scan.pdf” and put it in the same folder as the script
- Run the script from that folder (for example, “python duplex.py” in a terminal)
- Voila! Your reordered PDF is in Scan_reordered.pdf
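If renaming every file to Scan.pdf gets tedious, the file names could be taken from the command line instead. This is only a sketch of that idea and not part of duplex.py: the helper names `reordered_name` and `pick_filenames` are invented here, and the default simply mirrors the Scan.pdf convention above.

```python
import os


def reordered_name(input_filename):
    """Build the output name: 'Scan.pdf' becomes 'Scan_reordered.pdf'."""
    root, ext = os.path.splitext(input_filename)
    return root + "_reordered" + ext


def pick_filenames(argv):
    """Hypothetical helper: use argv[1] if given, else fall back to Scan.pdf."""
    input_filename = argv[1] if len(argv) > 1 else "Scan.pdf"
    return input_filename, reordered_name(input_filename)


print(pick_filenames(["duplex.py"]))
print(pick_filenames(["duplex.py", "taxes.pdf"]))
```

In a real script the list passed in would be `sys.argv`, and the two returned names would replace the hard-coded `input_filename` and `output_filename`.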