Wednesday, November 20, 2024

How to convert a PDF document to a preview image in PHP?

Converting a PDF document into a set of images may not sound like much fun, but it has a few applications. Since text cannot be copied from an image as easily as from a PDF, the conversion makes the document effectively ‘read-only’ and adds a modest layer of protection against plagiarism. The images may also come in handy when you need some ready-made slides for a quick office presentation, or when you want to embed them in your reports and blogs.
In this post, however, we will limit ourselves to a much smaller example: generating an image preview from a given PDF document. “Why previews?”, you may ask. Well, you may need one for a library management system, an online e-book retail store or just some insane weekend programming challenge. Where do you think you could use this concept in your project? Do let me know in the comments.
Implementing the complete conversion algorithm from scratch is not feasible, so we will rely on third-party tools to ease our task. The methods that I found appealing in this scenario are based on the following tools:
 

  • Ghostscript: A command-line utility, available for all three major platforms (Windows, Linux and Mac), that interprets PostScript and PDF files. You can read more about it on its official site.
  • ImageMagick: A free and open-source software suite for displaying, converting, and editing raster and vector image files. Bindings are available for the majority of mainstream programming languages, including PHP. Here’s the standard documentation for a quick overview.

 

Using Ghostscript

To use Ghostscript in your project, start by installing it. If you are on Windows, download the executable from its download page.
Linux users can install Ghostscript directly through their default package manager:
 

# RPM based distros, Fedora 26/27/28
$ sudo dnf install ghostscript

Verify the installation with this command:
 

$ gs --version

After installation, move to the directory containing the PDF file and run the following command. 
 

$ gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=jpeg \
-dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
-dFirstPage=1 -dLastPage=1 -r300 \
-sOutputFile=preview.jpg input.pdf

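This will generate an image of the first page of the document. Let us look at what each option does:
 

  • -dSAFER: restricts the file operations that the interpreted document is allowed to perform.
  • -dBATCH: makes Ghostscript exit once processing finishes instead of waiting at its interactive prompt.
  • -dNOPAUSE: disables the pause and prompt after each page, which is useful when the command runs non-interactively.
  • -sDEVICE: selects the output device, which determines the format of the image.
  • -dTextAlphaBits, -dGraphicsAlphaBits: set the anti-aliasing for text and graphics in the resulting image. Allowed values are 1, 2 and 4.
  • -r{NUM}: sets the resolution (in dpi) of the image.
  • -dFirstPage, -dLastPage: set the first and the last page of the document to be rendered.
  • -sOutputFile: sets the name of the output file.
  • input.pdf: the PDF document that is converted.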
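
The options above can also be assembled programmatically. The following is only a minimal sketch: build_gs_command() and its option keys are hypothetical names chosen for illustration, while the Ghostscript switches are the ones listed above. It assumes a POSIX shell, where escapeshellarg() wraps values in single quotes.

```php
<?php

/**
 * Build a Ghostscript command line that renders a single page.
 * Hypothetical helper: the function name and option keys are illustrative.
 */
function build_gs_command(string $input, array $options = []): string
{
    // Defaults mirror the command shown above.
    $opts = array_merge([
        'device'     => 'jpeg',
        'alpha_bits' => 4,             // anti-aliasing: 1, 2 or 4
        'page'       => 1,             // page to render (1-based)
        'resolution' => 300,           // dpi
        'output'     => 'preview.jpg',
    ], $options);

    $bits = (int) $opts['alpha_bits'];
    if (!in_array($bits, [1, 2, 4], true)) {
        throw new InvalidArgumentException('alpha_bits must be 1, 2 or 4');
    }
    $page = (int) $opts['page'];

    $parts = [
        'gs', '-dSAFER', '-dBATCH', '-dNOPAUSE',
        '-sDEVICE=' . escapeshellarg((string) $opts['device']),
        '-dTextAlphaBits=' . $bits,
        '-dGraphicsAlphaBits=' . $bits,
        '-dFirstPage=' . $page,
        '-dLastPage=' . $page,
        '-r' . (int) $opts['resolution'],
        '-sOutputFile=' . escapeshellarg((string) $opts['output']),
        escapeshellarg($input),        // quote the path so spaces are safe
    ];

    return implode(' ', $parts);
}

// Print the command for a 150 dpi preview of the first page.
echo build_gs_command('input.pdf', ['resolution' => 150]), "\n";
```

Keeping the flags in one array makes it easier to see which values come from the caller and which are fixed.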

To run this command from PHP, we call the exec() function. For example:
 

<?php
 
exec( "ls -l", $output_str, $return_val );
 
foreach ( $output_str as $line ) {
    echo $line . "\n";
}
 
?>


On Linux, this example executes the ls -l command and prints a detailed listing of the files and directories in the current directory to the console.
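
One thing to keep in mind is that exec() collects only the standard output of the command, while the exit status comes back through its third argument. The sketch below wraps both in a small helper; run_command() is a hypothetical name, and the 2>&1 redirection assumes a shell that understands it, so that error messages end up in the captured output too.

```php
<?php

/**
 * Run a shell command and return its exit status and output lines.
 * Hypothetical helper for illustration only.
 */
function run_command(string $command): array
{
    $lines  = [];
    $status = 0;

    // Redirect stderr to stdout so that error messages are captured as well.
    exec($command . ' 2>&1', $lines, $status);

    return ['status' => $status, 'output' => $lines];
}

$result = run_command('echo hello');

echo "exit status: " . $result['status'] . "\n";
foreach ($result['output'] as $line) {
    echo $line . "\n";
}
```

A status of 0 conventionally means success, which is what the preview script checks for later on.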
We can use the same concept to execute the Ghostscript command from our PHP code. Here’s how I have done it:
 

<?php
 
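function is_pdf ( $file ) {
    if ( !is_file( $file ) || !is_readable( $file ) ) {
        return false;
    }

    // Read only the first few bytes; the signature sits at the very start of the file.
    $header = file_get_contents( $file, false, null, 0, 16 );

    // Accept any single-digit major version so that PDF 2.0 files pass as well.
    return $header !== false && preg_match( "/^%PDF-[0-9]\.[0-9]+/", $header ) === 1;
}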
 
function create_preview ( $file ) {
    $output_format = "jpeg";
    $antialiasing = "4";
    $preview_page = "1";
    $resolution = "300";
    $output_file = "preview.jpg";
 
    $exec_command  = "gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=" . $output_format . " ";
    $exec_command .= "-dTextAlphaBits=". $antialiasing . " -dGraphicsAlphaBits=" . $antialiasing . " ";
    $exec_command .= "-dFirstPage=" . $preview_page . " -dLastPage=" . $preview_page . " ";
    $exec_command .= "-r" . $resolution . " ";
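    // Quote both file names so that spaces and shell metacharacters are handled safely.
    $exec_command .= "-sOutputFile=" . escapeshellarg( $output_file ) . " " . escapeshellarg( $file );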
 
    echo "Executing command...\n";
    exec( $exec_command, $command_output, $return_val );
     
    foreach( $command_output as $line ) {
        echo $line . "\n";
    }
 
    if ( !$return_val ) {
        echo "Preview created successfully!!\n";
    }
    else {
        echo "Error while creating the preview.\n";
    }
}
 
function __main__() {
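    global $argv;

    // Bail out early when no file was given on the command line.
    if ( !isset( $argv[1] ) ) {
        echo "Usage: php pdf_preview.php <file.pdf>\n";
        return;
    }
    $input_file = $argv[1];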
 
    if ( is_pdf( $input_file ) ) {
        // Create preview for the pdf
        create_preview( $input_file );
    }
    else {
        echo "The input file " . $input_file . " is not a valid PDF document.\n";
    }
}
 
__main__();
     
?>


Execution starts from __main__(), which takes the PDF file as a command-line argument. It checks whether the input file starts with a valid PDF header. If it does, it runs the Ghostscript command on the input file.
Output: 
 

$ php pdf_preview.php input.pdf
Executing command...
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Preview created successfully!!
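
If Ghostscript is missing on the machine, the script above only reports a generic error. A small preflight check can make that case clearer. This is a sketch under assumptions: ghostscript_version() is a hypothetical helper, and it assumes the binary is called gs and is reachable through the PATH, as on Linux.

```php
<?php

/**
 * Return the installed Ghostscript version, or null when gs cannot be run.
 * Hypothetical helper; assumes the binary is named "gs".
 */
function ghostscript_version(): ?string
{
    $lines  = [];
    $status = 1;

    exec('gs --version 2>&1', $lines, $status);

    // A non-zero status usually means the command was not found or failed.
    if ($status !== 0 || empty($lines)) {
        return null;
    }

    return trim($lines[0]);
}

$version = ghostscript_version();

if ($version === null) {
    echo "Ghostscript (gs) could not be run; is it installed?\n";
} else {
    echo "Found Ghostscript " . $version . "\n";
}
```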

 

Using ImageMagick

As before, we start by installing the ImageMagick binaries on the system. Begin with the dependencies:
 

$ sudo dnf install gcc php-devel php-pear

After that, install ImageMagick:
 

$ sudo dnf install ImageMagick ImageMagick-devel

Then install the PHP wrapper classes (the imagick extension) and enable them:
 

$ sudo pecl install imagick
$ sudo bash -c "echo 'extension=imagick.so' > /etc/php.d/imagick.ini"

If you plan to use it on a LAMP stack, restart the Apache web server so that the extension gets loaded:
 

$ sudo service httpd restart
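
Before moving on, it may be worth confirming that PHP actually sees the extension. The following sketch uses extension_loaded() and Imagick::queryFormats(); imagick_status() is a hypothetical helper name. Note that a build which lists PDF among its formats may still need Ghostscript installed to read such files.

```php
<?php

/**
 * Describe whether the imagick extension is usable for PDF input.
 * Hypothetical helper for illustration only.
 */
function imagick_status(): string
{
    if (!extension_loaded('imagick')) {
        return 'imagick: extension is not loaded';
    }

    // queryFormats() returns the formats matching the given pattern.
    $formats = Imagick::queryFormats('PDF');

    return empty($formats)
        ? 'imagick: loaded, but PDF is not among the supported formats'
        : 'imagick: loaded, PDF format is listed';
}

echo imagick_status() . "\n";
```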

Now that our system is ready, we can use ImageMagick in our example project. The basic functionality of the script remains the same. All you have to do is replace the create_preview() function with the following code.
 

function create_preview ( $file ) {
    $output_format = "jpeg";
    $preview_page = "1";
    $resolution = "300";
    $output_file = "imagick_preview.jpg";
 
    echo "Fetching preview...\n";
    $img_data = new Imagick();
    $img_data->setResolution( $resolution, $resolution );
    $img_data->readImage( $file . "[" . ($preview_page - 1) . "]" );
    $img_data->setImageFormat( $output_format );
 
    // Let Imagick write the converted image to disk.
    $img_data->writeImage( $output_file );
}


The code is self-explanatory. We create an Imagick instance and set parameters such as the resolution and the file format. The PDF page you want to render is given as a zero-based index in square brackets after the file name. For example:
 

First page: input.pdf[0]
Second page: input.pdf[1]
.
.
.
Nth page: input.pdf[N - 1]
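
The off-by-one between page numbers and indices is easy to get wrong, so it can help to isolate it in one place. The sketch below is an assumption-laden variant, not the script from this post: pdf_page_selector() and render_page() are hypothetical helpers, and render_page() needs the imagick extension at the time it is called.

```php
<?php

/**
 * Turn a 1-based page number into ImageMagick's zero-based page selector.
 * Hypothetical helper for illustration only.
 */
function pdf_page_selector(string $file, int $page): string
{
    if ($page < 1) {
        throw new InvalidArgumentException('Page numbers start at 1');
    }

    return $file . '[' . ($page - 1) . ']';
}

/**
 * Render one page of a PDF as a JPEG; returns true on success.
 * Hypothetical helper; requires the imagick extension when called.
 */
function render_page(string $file, int $page, string $output, int $dpi = 300): bool
{
    try {
        $image = new Imagick();
        // The resolution has to be set before the PDF is read.
        $image->setResolution($dpi, $dpi);
        $image->readImage(pdf_page_selector($file, $page));
        $image->setImageFormat('jpeg');
        $ok = $image->writeImage($output);
        $image->clear();

        return $ok;
    } catch (ImagickException $e) {
        // For example, the file could not be read.
        echo 'Could not render page: ' . $e->getMessage() . "\n";

        return false;
    }
}

echo pdf_page_selector('input.pdf', 1) . "\n"; // input.pdf[0]
echo pdf_page_selector('input.pdf', 5) . "\n"; // input.pdf[4]
```

A call such as render_page( 'input.pdf', 1, 'imagick_preview.jpg' ) would then correspond to the create_preview() function above.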

Output: 
 

$ php pdf_preview.php input.pdf
Fetching preview...

Some of you might be wondering why you would use this method over the previous one. Well, I found that the ImageMagick version fits more naturally into PHP code. A command line assembled inside a program does not look that good and can become hard to read. Note that ImageMagick itself relies on Ghostscript to read PDF files, so Ghostscript has to be installed in either case. With the same settings, Ghostscript produced smaller image files than ImageMagick did. I am not sure whether that is down to some optimization difference, but the gap is not a big concern. Choosing one over the other mostly comes down to your own taste.
So this is how you create a preview for a given PDF document. I hope you have learned something new from this post. Which method would you prefer? Do you have any suggestions for further improvements? Feel free to mention them in the comments.

 
