In this article, we are going to learn how to delete pages from a PDF file in Python.
Introduction
Modifying documents is a common task. Python has libraries and modules that can process almost any file format, which makes tasks like this straightforward to automate. This article shows how to delete pages from a PDF file in Python.
Prerequisite:
The PyMuPDF library will be used for PDF processing in this article. To install the library in our system, run the following command in the command prompt.
pip install pymupdf
Note: Although the package is installed as pymupdf, it is imported under the name fitz, as shown below.
import fitz
Deleting Pages with PyMuPDF
The PyMuPDF library offers various methods that simplify deleting pages from a PDF file. It allows specifying a single page, a range of page numbers, or a list with the page numbers.
The following examples demonstrate each of these approaches.
Input PDF file used: test.pdf
Method 1: Deleting a single page from a PDF
The delete_page() method deletes a single page. It takes the page number as its argument and removes that page from the PDF. Page numbering starts at 0, so passing 0 deletes the first page. The following example deletes the first page (index 0).
Note: The PDF file and the program should be in the same folder, because the example passes only the file name rather than a full path. Otherwise, opening the file fails with an error.
Python3
import fitz

# Path of the PDF file
input_file = r"test.pdf"

# Path for the output PDF file
output_file = r"modified_test.pdf"

# Opening the PDF file and creating a handle for it
file_handle = fitz.open(input_file)

# The page no. denoted by the variable will be deleted
page = 0

# Passing the variable as an argument
file_handle.delete_page(page)

# Saving the file
file_handle.save(output_file)
Output: After running the above code, a new file named ‘modified_test.pdf’ is generated in which the first page is deleted.
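Because delete_page() counts from 0 while PDF viewers usually show page numbers starting at 1, it can help to convert between the two explicitly. The following is a minimal sketch of that idea, not part of PyMuPDF: the helper names page_number_to_index and delete_page_by_number are hypothetical, and the sketch assumes len() on the opened document returns its page count.

```python
def page_number_to_index(page_number, page_count):
    """Convert a 1-based page number (as shown in a viewer) to a 0-based index."""
    if not 1 <= page_number <= page_count:
        raise ValueError(f"page {page_number} is outside 1..{page_count}")
    return page_number - 1


def delete_page_by_number(input_path, output_path, page_number):
    """Hypothetical wrapper: delete one page given its 1-based page number."""
    import fitz  # PyMuPDF; imported here so the helper above runs without it

    doc = fitz.open(input_path)
    try:
        # Assumes len(doc) gives the number of pages in the document
        doc.delete_page(page_number_to_index(page_number, len(doc)))
        doc.save(output_path)
    finally:
        doc.close()


# First page of a 10-page file corresponds to index 0
print(page_number_to_index(1, 10))  # prints 0
```

With this wrapper, delete_page_by_number("test.pdf", "modified_test.pdf", 1) would remove the same page as the example above.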
Method 2: Deleting a range of page numbers from a PDF
The delete_pages() method deletes a range of pages. It takes two arguments: the starting index and the ending index. Both are 0-based and inclusive, so every page from the starting index through the ending index is deleted. The following example opens the PDF file and deletes the pages at indexes 2 through 7.
Python3
import fitz

# Path of the PDF file
input_file = r"test.pdf"

# Path for the output PDF file
output_file = r"modified_test.pdf"

# Opening the PDF file and creating a handle for it
file_handle = fitz.open(input_file)

# The index (page no.) from where the pages are to be deleted
start = 2

# The index to which the pages are to be deleted
end = 7

# Passing the start & end index as arguments
file_handle.delete_pages(start, end)

# Saving the file
file_handle.save(output_file)
Output: After running the above code, we get the modified PDF file, in which page numbers 3, 4, 5, 6, 7, and 8 are deleted.
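The relationship between the start and end indexes and the page numbers that disappear can be checked with plain Python before touching any file. This is a small sketch under the assumption, consistent with the output described above, that both ends of the range are inclusive. The helper name indexes_in_range is hypothetical and not part of PyMuPDF.

```python
def indexes_in_range(start, end, page_count):
    """Return the 0-based indexes covered by an inclusive start..end range."""
    if not 0 <= start <= end < page_count:
        raise ValueError(f"range {start}..{end} does not fit {page_count} pages")
    return list(range(start, end + 1))


# Indexes removed by delete_pages(2, 7) in a 10-page file
removed = indexes_in_range(2, 7, 10)
print(removed)  # prints [2, 3, 4, 5, 6, 7]

# The same pages as 1-based page numbers
print([i + 1 for i in removed])  # prints [3, 4, 5, 6, 7, 8]
```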
Method 3: Deleting a list of pages from a PDF
The select() method approaches deletion from the opposite direction. It takes a list containing the indexes of the pages we want to keep, and every other page is deleted. For example, if a PDF contains 10 pages and we pass the list [1, 3, 5] to select(), only the pages at those indexes (the second, fourth, and sixth pages) remain, and the rest are deleted. The following example deletes every page except those at indexes 0, 1, and 3.
Python3
import fitz

# Path of the PDF file
input_file = r"test.pdf"

# Path for the output PDF file
output_file = r"modified_test.pdf"

# Opening the PDF file and creating a handle for it
file_handle = fitz.open(input_file)

# This list contains the pages that we are willing to keep
# Rest are deleted
pages_list = [0, 1, 3]

# Passing the list to the select function
file_handle.select(pages_list)

# Saving the file
file_handle.save(output_file)
Output: The above code produces a modified PDF file in which only pages 1, 2, and 4 remain; the rest are deleted.
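Since select() expects the pages to keep, starting from a list of pages to delete means computing the complement first. The sketch below shows one way this might be done. The helper names pages_to_keep and delete_listed_pages are hypothetical, and the sketch assumes len() on the opened document returns its page count. It also refuses to delete every page, on the assumption that an empty selection is not useful.

```python
def pages_to_keep(page_count, pages_to_delete):
    """Return the 0-based indexes that remain after removing pages_to_delete."""
    unwanted = set(pages_to_delete)
    bad = sorted(p for p in unwanted if not 0 <= p < page_count)
    if bad:
        raise ValueError(f"indexes out of range: {bad}")
    keep = [i for i in range(page_count) if i not in unwanted]
    if not keep:
        raise ValueError("cannot delete every page")
    return keep


def delete_listed_pages(input_path, output_path, pages_to_delete):
    """Hypothetical wrapper: delete the listed 0-based pages using select()."""
    import fitz  # PyMuPDF; imported here so the helper above runs without it

    doc = fitz.open(input_path)
    try:
        # Assumes len(doc) gives the number of pages in the document
        doc.select(pages_to_keep(len(doc), pages_to_delete))
        doc.save(output_path)
    finally:
        doc.close()


# Deleting indexes 2, 4, and 5 from a 6-page file keeps indexes 0, 1, and 3
print(pages_to_keep(6, [2, 4, 5]))  # prints [0, 1, 3]
```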