I was surprised by how easy it is to combine separate PDFs into a single file using Python. I was scanning a large document and my scanner could not handle it in one pass, so I had to perform multiple scans, which created multiple PDF files. I was looking for free PDF-combining software when I decided to search for Python code instead.
I found sample code on Stack Overflow that uses the PyPDF2 library. The code isn’t fancy and uses os.listdir to get all PDF files in a common directory. I added the input directory but didn’t program a user interface, so the file names in the directory need to sort in the order you want them merged. I had 17 files, so I renamed them 01.pdf, 02.pdf, …, 10.pdf, 11.pdf, …, 16.pdf, 17.pdf.
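The renaming matters because plain string sorting compares names character by character, so 10.pdf would land ahead of 2.pdf. Here is a minimal sketch of the zero-padding idea. The helper name padded_name and the two-digit width are my own assumptions for illustration, not part of the Stack Overflow code.

```python
def padded_name(number, width=2):
    """Return a zero-padded PDF file name, e.g. 3 -> '03.pdf' (hypothetical helper)."""
    return f"{number:0{width}d}.pdf"


scan_numbers = (1, 2, 10, 17)

# Unpadded names sort as strings, which puts 10.pdf ahead of 2.pdf.
unpadded = sorted(f"{n}.pdf" for n in scan_numbers)

# Zero-padded names sort in the same order as the scan numbers.
padded = sorted(padded_name(n) for n in scan_numbers)

print(unpadded)  # ['1.pdf', '10.pdf', '17.pdf', '2.pdf']
print(padded)    # ['01.pdf', '02.pdf', '10.pdf', '17.pdf']
```

With more than 99 scans, the width would need to grow to three digits for the same reason.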
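The code below was developed under Python 3.8.2 on Windows 10 using the PyPDF2 library. To install PyPDF2, run pip install PyPDF2. Note that the PdfFileMerger class was deprecated and then removed in PyPDF2 3.0 (replaced by PdfMerger), so this code needs a PyPDF2 release older than 3.0, for example pip install "PyPDF2<3.0".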
Code
import os
import sys  # system interface (argv)
from PyPDF2 import PdfFileMerger  # pdf library


def mergePDFs(pdfdir):
    if os.path.exists(pdfdir):
        # os.listdir does not guarantee any order, so sort the names explicitly
        x = sorted(a for a in os.listdir(pdfdir) if a.endswith('.pdf'))
        merger = PdfFileMerger()
        for pdf in x:
            # x holds bare file names, so join each one with the directory
            merger.append(open(os.path.join(pdfdir, pdf), 'rb'))
        outfile = os.path.join(pdfdir, 'result.pdf')
        if not os.path.isfile(outfile):
            with open(outfile, 'wb') as fout:
                merger.write(fout)
        else:
            print("Output file result.pdf exists. Exit without saving.")
    else:
        print('Directory "' + pdfdir + '" does not exist.')


if __name__ == "__main__":
    # get pdf input directory from the command line, if given
    pdfdir = ''
    if len(sys.argv) == 2:
        pdfdir = sys.argv[1]
    # otherwise prompt the user for it
    if pdfdir == '':
        pdfdir = input('Input pdf input file directory: ')
    # call function
    mergePDFs(pdfdir)
I like using command line arguments, so the “main” startup block checks for an input argument giving the PDF directory, which is used for both input and output. If the argument is missing, the script prompts the user for it.
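The argument-or-prompt logic can also be pulled into a small function, which makes it easy to try out without typing anything. This is only a sketch. The function name get_pdf_dir and its ask parameter are hypothetical names I made up for illustration.

```python
def get_pdf_dir(argv, ask=input):
    """Return the directory from argv[1], or ask the user when it is missing.

    argv is expected to look like sys.argv; ask defaults to the built-in
    input() but can be swapped for any function that takes a prompt string.
    """
    if len(argv) == 2 and argv[1] != '':
        return argv[1]
    return ask('Input pdf input file directory: ')


# Directory supplied on the command line: no prompt is shown.
from_args = get_pdf_dir(['merge.py', 'C:\\scans'])

# No argument: the ask function is called instead (faked here).
from_prompt = get_pdf_dir(['merge.py'], ask=lambda prompt: 'D:\\typed')
```

In the real script the call would be get_pdf_dir(sys.argv), after importing sys.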
A list x is created from os.listdir, keeping only the file names that end with .pdf. The list x contains file names only, so the directory path is prepended when calling the append method. Once all the files have been appended, the result file is written.
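The listing step can be tried on its own, without PyPDF2, to confirm which files would be merged and in what order. The sketch below assumes a hypothetical helper named pdf_paths and uses a temporary directory with empty placeholder files. It sorts the names explicitly because os.listdir makes no promise about order, and it uses os.path.join so the path separator is right on any platform.

```python
import os
import tempfile


def pdf_paths(pdfdir):
    """Return full paths of the .pdf files in pdfdir, sorted by file name."""
    names = sorted(a for a in os.listdir(pdfdir) if a.endswith('.pdf'))
    return [os.path.join(pdfdir, name) for name in names]


# Demonstration with empty placeholder files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('02.pdf', '01.pdf', 'notes.txt'):
        open(os.path.join(tmp, name), 'wb').close()
    found = [os.path.basename(p) for p in pdf_paths(tmp)]

print(found)  # ['01.pdf', '02.pdf']
```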
Error checking ensures that the input directory exists and that the output file result.pdf is not already present.
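One possible variation is to run both checks before any PDF is opened, so the script does not read every file only to find that it cannot save. The sketch below is an assumption about how that could look, not the code I ran. The function name check_merge_paths and its outname parameter are hypothetical.

```python
import os
import tempfile


def check_merge_paths(pdfdir, outname='result.pdf'):
    """Return an error message, or None when it is safe to merge."""
    if not os.path.isdir(pdfdir):
        return 'Directory "' + pdfdir + '" does not exist.'
    if os.path.isfile(os.path.join(pdfdir, outname)):
        return 'Output file ' + outname + ' exists. Exit without saving.'
    return None


with tempfile.TemporaryDirectory() as tmp:
    ok = check_merge_paths(tmp)            # empty directory: safe to merge
    open(os.path.join(tmp, 'result.pdf'), 'wb').close()
    blocked = check_merge_paths(tmp)       # output file already there

# A path that does not exist at all.
missing = check_merge_paths(os.path.join(tmp, 'no_such_dir'))
```

The caller would print the message and stop when the result is not None, and only then start appending files.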
The next improvement is to add a user interface where files can be selected and ordered for merging.