Combining PDF Files using Python

I was surprised how easy it was to combine separate PDFs into a single file using Python. I was scanning a large document and my scanner could not handle it in one pass, so I had to perform multiple scans, which created multiple PDF files. I was looking for free PDF-combining software when I decided to search for Python code instead.

I found sample code on Stack Overflow that uses the PyPDF2 library. The code isn’t fancy: it uses listdir to get all PDF files in a common directory. I added the input directory but didn’t program a user interface, so the file names in the directory need to sort in the order you want them merged. I had 17 files, so I renamed them 01.pdf, 02.pdf, …, 10.pdf, 11.pdf, …, 16.pdf, 17.pdf.
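The zero-padded names matter because file names sort as text, not as numbers. The short sketch below shows the difference. As an alternative to renaming, it also shows a hypothetical natural_key helper (my own illustration, not part of the Stack Overflow code) that sorts by the digits embedded in each name.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks so '2.pdf' sorts before '10.pdf'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', name)]

names = ['10.pdf', '2.pdf', '1.pdf', '17.pdf']

text_order = sorted(names)                      # plain text sort
numeric_order = sorted(names, key=natural_key)  # numeric-aware sort

print(text_order)     # ['1.pdf', '10.pdf', '17.pdf', '2.pdf']
print(numeric_order)  # ['1.pdf', '2.pdf', '10.pdf', '17.pdf']
```

With zero-padded names such as 01.pdf and 02.pdf, the plain text sort already gives the right order, which is why renaming the files was enough for my purposes.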

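The code below was developed under Python 3.8.2 on Windows 10 using the PyPDF2 library. To install the library, run pip install PyPDF2. PyPDF2 3.0 and later dropped the PdfFileMerger name in favor of PdfMerger, so with a newer release the import may need adjusting.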

Code

import os
import sys                          # system interface (argvs)
from PyPDF2 import PdfFileMerger    # pdf library

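def mergePDFs(pdfdir):
    if os.path.exists(pdfdir):
        # check for an existing output file before doing any work
        outpath = os.path.join(pdfdir, 'result.pdf')
        if os.path.isfile(outpath):
            print("Output file result.pdf exists. Exit without saving.")
            return

        # os.listdir returns names in arbitrary order, so sort them explicitly
        x = sorted(a for a in os.listdir(pdfdir) if a.endswith('.pdf'))

        merger = PdfFileMerger()
        for pdf in x:
            # x holds bare file names, so join each one with the directory
            merger.append(os.path.join(pdfdir, pdf))

        with open(outpath, 'wb') as fout:
            merger.write(fout)
        merger.close()    # release the input files opened by append

    else:
        print('Directory "' + pdfdir + '" does not exist.')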
        
if __name__ == "__main__": 
    # get pdf input directory
    pdfdir = ''
    if len(sys.argv) == 2:
        pdfdir = sys.argv[1]
        
    if pdfdir == '':
        pdfdir = input('Input pdf input file directory: ')
    
    # call  function 
    mergePDFs(pdfdir) 

I like using command line arguments, so the “main” startup code checks for an input argument giving the PDF directory used for both input and output. If the argument is missing, the user is prompted for it.
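The argument-or-prompt logic can also be pulled into a small function, which makes it easier to try out without typing at a prompt. This is only a sketch: get_pdf_dir, its ask parameter, the script name merge.py, and the example directories are all names I made up for illustration.

```python
def get_pdf_dir(argv, ask=input):
    """Return the directory from argv[1], or prompt for it when it is absent."""
    if len(argv) == 2 and argv[1] != '':
        return argv[1]
    return ask('Input pdf input file directory: ')

# Directory supplied on the command line: no prompt is shown
from_arg = get_pdf_dir(['merge.py', 'C:\\scans'])

# No argument: the prompt function is called (faked here with a lambda)
from_prompt = get_pdf_dir(['merge.py'], ask=lambda prompt: 'D:\\docs')

print(from_arg)
print(from_prompt)
```

In the real script you would call get_pdf_dir(sys.argv) and leave ask at its default so the user is prompted with input.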

A list x is created from os.listdir, keeping only the files ending with .pdf. The list x holds only file names, so the directory path is added when calling the append method. Once all the files have been appended, the result file is written.
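The listing step can be tried on its own, without PyPDF2, to check which files would be merged and in what order. The sketch below is a variation under my own assumptions: list_pdf_paths is a hypothetical helper, it matches the .pdf extension case-insensitively, and it skips the output file so an old result.pdf is never merged into a new one.

```python
import os
import tempfile

def list_pdf_paths(pdfdir, output_name='result.pdf'):
    """Return sorted full paths of the PDFs in pdfdir, skipping the output file."""
    names = sorted(
        name for name in os.listdir(pdfdir)
        if name.lower().endswith('.pdf') and name != output_name
    )
    # os.listdir gives bare names, so join each one with the directory
    return [os.path.join(pdfdir, name) for name in names]

# Demonstration with empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ['02.pdf', '01.pdf', 'notes.txt', 'result.pdf']:
        open(os.path.join(tmp, name), 'wb').close()
    found = [os.path.basename(p) for p in list_pdf_paths(tmp)]

print(found)  # ['01.pdf', '02.pdf']
```

Using os.path.join instead of a hard-coded backslash also keeps the paths valid on systems other than Windows.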

Error checking is performed to ensure the input directory exists and the output file is not already present.

The next code improvement is to add a user interface where files are selected and ordered for merging.
