How to extract images from PDF in Python?

28 July 2024


This article shows how to extract images from a PDF file in Python, and how to convert an image to a PDF and a PDF to images.

To extract the images from PDF files and save them, we use the PyMuPDF library together with Pillow. First, install both libraries using pip.

pip install PyMuPDF Pillow

PyMuPDF is used to access PDF files. To extract images from a PDF file, follow the steps below:

Import necessary libraries
Specify the path of the file from which you want to extract images and open it
Iterate through all the pages of the PDF and get all images and objects present on every page
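Use the get_images() method (named getImageList() in older PyMuPDF releases) to get all image objects on a page as a list of tuples
Use extract_image() (formerly extractImage()) to get the image bytes along with additional information about the image, such as its file extension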

Note: Replace the file path in the code below with the path to your own PDF file.

Implementation:

Python3

# STEP 1
# import libraries
import fitz  # PyMuPDF
import io
from PIL import Image

# STEP 2
# file path you want to extract images from
file = "/content/pdf_file.pdf"

# open the file
pdf_file = fitz.open(file)

# STEP 3
# iterate over PDF pages
for page_index in range(len(pdf_file)):

    # get the page itself
    page = pdf_file[page_index]
    image_list = page.get_images()

    # print the number of images found on this page
    if image_list:
        print(
            f"[+] Found a total of {len(image_list)} images in page {page_index}")
    else:
        print("[!] No images found on page", page_index)

    # reuse the list fetched above instead of querying the page again
    for image_index, img in enumerate(image_list, start=1):

        # get the XREF of the image
        xref = img[0]

        # extract the image bytes
        base_image = pdf_file.extract_image(xref)
        image_bytes = base_image["image"]

        # get the image extension
        image_ext = base_image["ext"]

        # STEP 4
        # load the bytes with Pillow and save the image to disk
        # (assumes Pillow can write the extracted format, e.g. png or jpeg)
        image = Image.open(io.BytesIO(image_bytes))
        image.save(f"image{page_index + 1}_{image_index}.{image_ext}")
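
A PDF can reference the same image object from several pages, in which case a page-by-page loop extracts it once per page. The following is a minimal sketch of one way to skip repeats by tracking the XREF values already seen. The helper name `unique_xrefs` and the sample tuples are made up for illustration; only the convention that the first tuple element is the XREF comes from the code above.

```python
def unique_xrefs(image_lists):
    """Yield each image XREF once, in first-seen order.

    image_lists is an iterable of per-page image lists, where every
    item is a tuple whose first element is the XREF.
    """
    seen = set()
    for image_list in image_lists:
        for img in image_list:
            xref = img[0]
            if xref not in seen:
                seen.add(xref)
                yield xref


# Made-up tuples standing in for the image lists of two pages;
# the image with XREF 5 appears on both pages.
pages = [
    [(5, 0, 100, 100)],
    [(5, 0, 100, 100), (9, 0, 50, 50)],
]
result = list(unique_xrefs(pages))
print(result)  # [5, 9]
```

With a real document, the per-page lists would be collected from each page first, and the extraction step would then run once for every XREF the helper yields.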


Image to PDF and PDF to Image Conversion:

Image to PDF Conversion

Note: The example assumes an image file named image.jpeg in the working directory.

Python3

import fitz
doc = fitz.open()
imgdoc = fitz.open('image.jpeg')  # open image
pdfbytes = imgdoc.convert_to_pdf()
imgpdf = fitz.open("pdf", pdfbytes)
doc.insert_pdf(imgpdf)
doc.save('imagetopdf.pdf')  # save file

First, we open a new, empty document. Then we open the image as a document of its own.

The image is then converted to a one-page PDF with the convert_to_pdf() method, which returns the PDF as bytes.

After conversion, the PDF is reopened from those bytes and appended with insert_pdf() to the empty document created at the start. Finally, the document is saved.


PDF to Image Conversion

Note: We are using sample.pdf for the PDF to image conversion; to get the PDF, use the link below.

https://www.africau.edu/images/default/sample.pdf – sample.pdf

Python3

import fitz
doc = fitz.open('sample.pdf')
for page in doc:
    # alpha=False because the JPEG format cannot store transparency
    pix = page.get_pixmap(matrix=fitz.Identity, dpi=None,
                          colorspace=fitz.csRGB, clip=None, alpha=False, annots=True)
    pix.save("samplepdfimage-%i.jpg" % page.number)  # save file

We used the get_pixmap() method to render each page of the PDF as an image and then saved the image.

Output:

The sample.pdf is a two-page document, so two separate images are created.

A tool called UPDF can also be used to extract images from a PDF file.
