Python: Splitting a Multi-Page PDF File into One PDF File per Page

Saving each page of a PDF as a separate file.

As mentioned earlier, I volunteer at a not-for-profit organization where I handle their IT needs and bookkeeping. Each month, our accountant sends me a single consolidated PDF file containing the paystubs of all our staff members. With our transition to remote work, it became necessary to email each staff member their own paystub when salaries are sent out.

To streamline this process, I sought an automated solution to avoid manually printing each paystub as a separate file. Thankfully, Python came to the rescue. By using the PyPDF2 library, I developed a script that splits the consolidated PDF file into individual pages, enabling me to email each staff member their respective paystub.

Before running the script, install the PyPDF2 library with the command 'pip install PyPDF2'.

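# PdfReader and PdfWriter are the class names in current PyPDF2 releases;
# the older PdfFileReader and PdfFileWriter names were removed in PyPDF2 3.0.
from PyPDF2 import PdfReader, PdfWriter

def split_pdf_pages(input_path, output_prefix):
    # Passing the path lets PdfReader open and read the input file itself
    reader = PdfReader(input_path)

    for i, page in enumerate(reader.pages):
        # Use a fresh writer per page so each output holds exactly one page
        writer = PdfWriter()
        writer.add_page(page)

        # i is zero-based, so the first page is saved as <prefix>-page0.pdf
        output_path = f"{output_prefix}-page{i}.pdf"
        with open(output_path, "wb") as output_file:
            writer.write(output_file)

# Usage example
split_pdf_pages("w2.pdf", "document")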

The script uses the PyPDF2 module to work with PDF files in Python. It defines a function, split_pdf_pages, that takes two arguments: the path to the input PDF file and a prefix for the output file names. Inside the function, it opens the input PDF and loops through its pages. For each page, it builds a new PDF that contains only that page and writes it to its own file. The output file name combines the prefix with the page index, which starts at 0, so the usage example above saves the first page as document-page0.pdf. Once the loop finishes, there is one PDF file for every page of the input.
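Because the file names are built from a zero-based index, the first paystub ends up as page0, and names such as page10 sort ahead of page2 in most file browsers. If that is a nuisance, one option is to build the names with a small helper that starts at 1 and pads the number with zeros. The following is only a sketch of that idea: page_output_path, page_count and output_dir are hypothetical names made up for this example, not part of PyPDF2, and it assumes the total page count is already known (for instance from the length of the reader's page list).

```python
from pathlib import Path


def page_output_path(output_prefix, page_index, page_count, output_dir="."):
    """Build the output path for one page.

    Hypothetical helper for illustration only; it is not part of PyPDF2.
    """
    # Pad to the number of digits in the largest page number (e.g. 2 for 12 pages)
    width = len(str(page_count))
    # page_index is zero-based; add 1 so the file names start at page 1
    name = f"{output_prefix}-page{page_index + 1:0{width}d}.pdf"
    return Path(output_dir) / name
```

With a 12-page input, page_output_path("document", 0, 12) gives document-page01.pdf and page_output_path("document", 11, 12) gives document-page12.pdf. The result could be used in place of the f-string that builds output_path inside the loop.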
