15 March 2010

PDF viewing woes

I've been running into problems with the programs I use to view the LSE PDFs. For one thing, these PDFs are an average of 261 MB in size, whereas my file manager, Dolphin, won't generate previews for files larger than 100 MB. This is a rather annoying and arbitrary upper limit, especially considering that it takes only a few milliseconds to generate thumbnails for 100 MB PDFs. I've accordingly filed a bug report asking that the limit be removed.

The other problem is that my usual PDF viewer, Okular, is phenomenally slow at rendering the pages of the LSE PDFs—it takes between 10 and 25 seconds per page. (By comparison, the proprietary Adobe Reader renders them almost instantly.) Okular, like many other Free Software document viewers, renders PDFs using the FreeDesktop project's Poppler library, and it is there that the problem lies. Most likely this is due to a known issue, Bug 13518. For now, then, I will be using GNU gv, which isn't based on Poppler and is able to render the LSE PDFs quickly.


  1. I read that your files are, on average, 261 MB in size.

    Assuming your scans are black-and-white or grayscale, you might want to take a look at JBIG2 encoding to reduce file size without losing quality.

    Adam Langley has the sources.

    Binaries are also available. (I compiled one myself for my distro, Puppy Linux; compilation depends on the libpng version used on your system. If you want to try it, I can link to already-compiled versions, but you are a very smart guy and can easily build your own binary from the sources.)

    jbig2enc is a terrific compression encoder: from 15 MB of TIFF G4, using jbig2enc I get a 1.942 MB PDF.

    Python is also needed (using pdf.py) for the final step, assembling the JBIG2 images into a PDF.

  2. Unfortunately, I don't think JBIG2 will be appropriate for the LSE images, since they're fairly low-resolution greyscale, whereas JBIG2 is for bilevel images. However, JBIG2 would be appropriate for the 1970–1997 scans I made myself, since those are 600 dpi bilevel images. I'll be posting more about these latter scans once I'm done with the former.
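
For bilevel scans such as the 600 dpi ones, the two-step jbig2enc workflow described in the first comment (encode the page images, then assemble them with pdf.py) could be scripted roughly as follows. This is only a sketch under several assumptions: that the `jbig2` binary from jbig2enc is on the PATH and was built with TIFF support, that the `-s` (symbol coding), `-p` (PDF-ready output) and `-b` (output basename) options behave as in the jbig2enc README, and that the helper script is called `pdf.py` and runs under the chosen interpreter. The function names, the `output` basename and the file names are hypothetical.

```python
import shutil
import subprocess
import sys


def jbig2_encode_command(pages, basename="output"):
    """Build the jbig2enc command line for a list of bilevel page images.

    Assumed flags: -s symbol coding, -p PDF-ready fragments, -b basename.
    """
    return ["jbig2", "-s", "-p", "-b", basename, *[str(p) for p in pages]]


def assemble_command(basename="output", script="pdf.py"):
    """Build the command that asks pdf.py to write the PDF to stdout."""
    return [sys.executable, script, basename]


def build_pdf(pages, out_pdf, basename="output", script="pdf.py"):
    """Run both steps; raises if the jbig2 binary cannot be found."""
    if shutil.which("jbig2") is None:
        raise RuntimeError("jbig2 (jbig2enc) not found on PATH")
    # Step 1: encode all pages into JBIG2 fragments named after `basename`.
    subprocess.run(jbig2_encode_command(pages, basename), check=True)
    # Step 2: pdf.py prints the assembled PDF on stdout; redirect it to a file.
    with open(out_pdf, "wb") as fh:
        subprocess.run(assemble_command(basename, script), stdout=fh, check=True)
```

A call such as `build_pdf(["page-001.tif", "page-002.tif"], "scans.pdf")` would then run both steps. Building the command lists in separate functions makes it possible to inspect them before anything is executed.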