In this article, we will see how to count the total number of pages in a PDF file in Python.
There are no special prerequisites for this article. We will use the PyPDF2 library for this purpose. PyPDF2 is a free and open-source pure-Python PDF library capable of performing many tasks such as splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs as well. Refer to “Working with PDF files in Python” to learn more about PyPDF2.
Installing required library
Run the following command in the command prompt or terminal to install the PyPDF2 library.
pip install PyPDF2
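Since the class and property names used in this article differ between PyPDF2 releases, it can help to confirm which version was actually installed. The following is a minimal sketch using the standard-library importlib.metadata module (Python 3.8+); the helper name installed_version is hypothetical and chosen only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string, or None if PyPDF2 is not installed.
print(installed_version("PyPDF2"))
```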
Steps to count the number of pages in a PDF file
Step 1: Import PyPDF2 library into the Python program
import PyPDF2
Step 2: Open the PDF file in read binary format using file handling
file = open('your pdf file path', 'rb')
Step 3: Read the PDF using the PdfFileReader class of the PyPDF2 library
pdfReader = PyPDF2.PdfFileReader(file)
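Note: The three steps above are common to all the methods shown in the examples that follow. The code in this article uses the older PyPDF2 names. PdfFileReader was renamed to PdfReader in PyPDF2 1.28.0, and the old name no longer works from PyPDF2 3.0.0 onward, so on newer versions use PyPDF2.PdfReader(file) instead.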
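The three steps above leave the file open until it is closed explicitly. As one possible alternative, the sketch below wraps them in a context manager so the file is closed automatically. The helper name open_pdf_reader and its reader_cls parameter are hypothetical, introduced only for this illustration; the sketch assumes PyPDF2 exposes either PdfReader or PdfFileReader, depending on the installed version.

```python
from contextlib import contextmanager


@contextmanager
def open_pdf_reader(path, reader_cls=None):
    """Yield a reader for the PDF at `path`; the file is closed on exit."""
    if reader_cls is None:
        # Imported lazily so the helper can also be tried with a stand-in class.
        import PyPDF2
        # Prefer PdfReader when the installed version has it,
        # otherwise fall back to the older PdfFileReader name.
        reader_cls = getattr(PyPDF2, "PdfReader", None) or PyPDF2.PdfFileReader
    with open(path, "rb") as file:
        yield reader_cls(file)


# Usage (the path is a placeholder, as in Step 2):
# with open_pdf_reader('your pdf file path') as pdfReader:
#     print(len(pdfReader.pages))
```

The reader should be used inside the with block, since the underlying file is closed as soon as the block ends.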
Methods to count PDF pages
We are going to learn three methods to count the number of pages in a PDF file which are as follows:
- By using the numPages property.
- By using the getNumPages() method.
- By using the pages property and len() function.
Method 1: Using numPages property
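numPages is a property of the PdfFileReader class that returns the total number of pages in the PDF file. Like getNumPages(), it has been deprecated since version 1.28.0 in favor of len(pdfReader.pages).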
totalPages1 = pdfReader.numPages
For Example:
Python3
# importing PyPDF2 library
import PyPDF2
# opened file as reading (r) in binary (b) mode
file = open('/home/hardik/GFG_Temp/dbmsFile.pdf', 'rb')
# store data in pdfReader
pdfReader = PyPDF2.PdfFileReader(file)
# count number of pages
totalPages = pdfReader.numPages
# print number of pages
print(f"Total Pages: {totalPages}")
Output:
Total Pages: 10
In the above example, we imported the PyPDF2 module and opened the file in read binary format using file handling. We then passed the opened file to PdfFileReader() to read the PDF. Next, we used the numPages property of the reader object to get the total number of pages and stored it in the variable “totalPages” for further use. Finally, we printed the variable holding the total page count of the PDF file.
Method 2: Using getNumPages() method
getNumPages() is a method of the PdfFileReader class that takes no arguments and returns an integer specifying the total number of pages. This method has been deprecated since version 1.28.0; its replacement is the next method discussed.
totalPages2 = pdfReader.getNumPages()
Python3
# importing PyPDF2 library
import PyPDF2
# opened file as reading (r) in binary (b) mode
file = open('/home/hardik/GFG_Temp/dbmsFile.pdf', 'rb')
# store data in pdfReader
pdfReader = PyPDF2.PdfFileReader(file)
# count number of pages
totalPages = pdfReader.getNumPages()
# print number of pages
print(f"Total Pages: {totalPages}")
Output:
Total Pages: 10
In the above example, we imported the PyPDF2 module and opened the file in read binary format using file handling. We then passed the opened file to PdfFileReader() to read the PDF. Next, we called the getNumPages() method of the reader object to get the total number of pages and stored it in the variable “totalPages” for further use. Finally, we printed the variable holding the total page count of the PDF file.
Method 3: Using pages property and len() function
pages is a read-only property that emulates a list of Page objects. Combined with len(), Python’s built-in function that returns the length of a sequence, it gives the total number of pages in the PDF.
totalPages3 = len(pdfReader.pages)
Python3
# importing PyPDF2 library
import PyPDF2
# opened file as reading (r) in binary (b) mode
file = open('/home/hardik/GFG_Temp/dbmsFile.pdf', 'rb')
# store data in pdfReader
pdfReader = PyPDF2.PdfFileReader(file)
# count number of pages
totalPages = len(pdfReader.pages)
# print number of pages
print(f"Total Pages: {totalPages}")
Output:
Total Pages: 10
In the above example, we imported the PyPDF2 module and opened the file in read binary format using file handling. We then passed the opened file to PdfFileReader() to read the PDF. Next, we used the pages property of the reader object to get the list of all pages, counted them with the len() function, and stored the result in the variable “totalPages” for further use. Finally, we printed the variable holding the total page count of the PDF file.
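The three methods can also be combined into one small helper that prefers the non-deprecated approach. The following is only a sketch under stated assumptions: the name count_pages is hypothetical, and the fallback branch assumes a reader object that lacks the pages property but still offers getNumPages().

```python
def count_pages(reader):
    """Return the page count of a PyPDF2-style reader object."""
    # Preferred approach (Method 3): length of the pages sequence.
    pages = getattr(reader, "pages", None)
    if pages is not None:
        return len(pages)
    # Assumed fallback for reader objects without `pages` (Method 2).
    return reader.getNumPages()


# Usage with the reader created in Step 3:
# totalPages = count_pages(pdfReader)
# print(f"Total Pages: {totalPages}")
```

Checking pages first means the deprecated numPages and getNumPages() names are not touched when the newer property is available.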