Convert PDF to image files

Another entry in my PDF series. 🙂

At work I had to convert a few PDFs into jpeg files. The PDF files actually contain just images, so extracting the images would’ve been as good, but converting PDF pages to images is easier. 🙂 At least on windows. On linux you could use pdfimages, which should be available for your distro through some poppler util package.

A general approach is to use the versatile ImageMagick software. Using the convert command, it’s simply a matter of:

convert -density 200x200 -quality 85% inputfile.pdf outputfile%03d.jpg

to convert inputfile.pdf to a set of jpeg files. Of course you can choose any supported output format you like: png/bmp/gif/etc.. The density option is important as it specifies the output dpi which is only 72 by default. Imagemagick uses Ghostscript to do the pdf reading, so on windows you have to make sure that environment is set up correctly. In my case I had to add its bin directory to the path and set the working dir to the lib path as it didn’t find otherwise (adding to the path didn’t help). If you know a better way, please let me know in the comments. I also specified the quality option to have a jpeg quality factor of 85, which is 75 by default.

Since I had to process a batch of them, I thought it would be nice to have a cool batch script to do them automatically. Here is what I came up with:

set path=%path%;C:\programs\gs\gs8.54\bin
cd C:\programs\gs\gs8.54\lib
for %%C in (%*) do call :startconvert %%C
goto :eof

convert -density 200x200 -quality 85%% %1 %~dp1%~n1%%03d.jpg
goto :eof

Replace the paths as appropiate. To use the bat file, simply drag and drop one or more pdf file on the bat file. It will put jpeg files in the same directory as the pdf file and number them using 3 digits (%03d printf format). %~dp1 stands for the full path of the 1st argument file, %~n1 for the name only (without extension). Also note you have to escape all other %’s.