04 November 2012

More bugs

So I wrote to Alexey Kryukov, the author of PDFBeads, to alert him to his program's inability to handle input files whose horizontal and vertical DPI differ. I never heard back. I also looked further into manually changing the resolution of the output PDF, but from what I can tell this isn't possible. So it looks like I'll just have to override the DPI settings in the original TIFFs and live with slightly stretched or compressed page sizes.
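To get a feel for how much stretching or compression to expect, here is a rough sketch of the arithmetic: a page's physical size is its pixel dimensions divided by the DPI, so forcing the vertical DPI to match the horizontal one rescales the page height by the ratio of the two. The function names and the sample numbers (a 1728 x 2200 pixel scan at 204 x 196 DPI) are made up for illustration and are not taken from my actual files.

```python
def page_size_inches(width_px, height_px, x_dpi, y_dpi):
    """Return the physical page size (width, height) in inches."""
    return width_px / x_dpi, height_px / y_dpi


def height_scale_after_override(y_dpi_original, y_dpi_forced):
    """Factor by which the page height changes when the vertical DPI
    recorded in the file is overridden with a different value."""
    return y_dpi_original / y_dpi_forced


if __name__ == "__main__":
    # Hypothetical scan: 1728 x 2200 pixels, 204 DPI across, 196 DPI down.
    width_px, height_px = 1728, 2200
    x_dpi, y_dpi = 204, 196

    before = page_size_inches(width_px, height_px, x_dpi, y_dpi)
    # Override the vertical DPI so that both directions use x_dpi.
    after = page_size_inches(width_px, height_px, x_dpi, x_dpi)
    scale = height_scale_after_override(y_dpi, x_dpi)

    print("original size (in):   %.2f x %.2f" % before)
    print("overridden size (in): %.2f x %.2f" % after)
    print("height scale factor:  %.4f" % scale)
```

A factor below 1 means the page comes out shorter than the original; above 1, taller. The pixels themselves are untouched, only the size the PDF claims for them changes.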

After running PDFBeads on my entire collection of images, I noticed that it failed to produce some of the issues because of missing hOCR files. Looking back, I see that Tesseract 3.01 failed on some of the images, producing the following error message:

ELIST_ITERATOR::add_after_then_move:Error:Attemting to add an element with non NULL links, to a list
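To see which pages were affected, something like the following could list every image that has no hOCR file next to it. This is only a sketch: it assumes the hOCR output sits in the same directory as the image and shares its base name with an .html extension, and the function name and default extensions are my own choices, so adjust them to match your layout.

```python
import sys
from pathlib import Path


def images_missing_hocr(directory, image_exts=(".tif", ".tiff"), hocr_ext=".html"):
    """Return the names of images in `directory` that have no hOCR file.

    Assumes (hypothetically) that the hOCR file for page.tif is page.html
    in the same directory.
    """
    missing = []
    for image in sorted(Path(directory).iterdir()):
        if image.suffix.lower() not in image_exts:
            continue  # not an image we care about
        if not image.with_suffix(hocr_ext).exists():
            missing.append(image.name)
    return missing


if __name__ == "__main__":
    # Directory to scan; defaults to the current one.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in images_missing_hocr(target):
        print(name)
```

Running it on each issue's directory gives a list of pages to feed back through Tesseract once a working version is available.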

It looks like this problem has been reported at least a couple of times before on the Tesseract issue tracker (Issue 541, Issue 788). Comments on the second report suggest the problem may have been fixed in Tesseract 3.02, which was released a few days ago. That version hasn't yet been packaged in Lazy Kent's repository, so I can either wait to see if he updates the RPM, or try building one myself using his spec file and patches.