28 March 2010

First results with unpaper

The last few days have been spent figuring out how to get unpaper to work. Unlike my autocrop tool, the sheet size needs to be specified, which makes it a bit trickier to use with the LSE scans (see my earlier post "Page and image size analysis"). It also handles only uncompressed PNM files, which for some strange reason the author thinks of as a feature rather than a shortcoming. So now my corpus has ballooned by another 74 GB. Good thing I bought that 1.5 TB drive.

Anyway, I've run unpaper on the September 1904 through August 1918 issues (whose pages are all 242 mm × 460 mm). Of the 836 uncropped JPEGs for these issues, unpaper seems to have processed 779 of them (93%) correctly with the following command line:

unpaper --overwrite --layout double --output-pages 2 --pre-wipe 0,2898,4263,3102 --sheet-size 3500,2700 in.pgm out%d.pgm
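To apply that command line to a whole directory of scans, a loop along the following lines might do. This is only a sketch: the use of djpeg (from libjpeg) for the JPEG-to-PGM conversion, the directory arguments, the output naming scheme, and the DRY_RUN switch are all assumptions for illustration. Only the unpaper options themselves come from the command above.

```shell
#!/bin/sh
# Sketch: convert each JPEG to an uncompressed PGM, then run unpaper on it.
# Assumptions (not from the post): djpeg is installed, the scans are meant
# to be processed as greyscale, and the file naming scheme below.

# run CMD ARGS...: execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# process_dir SRC DST: process every SRC/*.jpg, writing results to DST.
process_dir() {
    src=$1
    dst=$2
    for jpg in "$src"/*.jpg; do
        [ -e "$jpg" ] || continue   # no matches: the glob stays literal
        base=$(basename "$jpg" .jpg)
        # unpaper only reads uncompressed PNM, so convert first
        run djpeg -grayscale -pnm -outfile "$dst/$base.pgm" "$jpg" || continue
        # same options as the command line above; %d becomes the page number
        run unpaper --overwrite --layout double --output-pages 2 \
            --pre-wipe 0,2898,4263,3102 --sheet-size 3500,2700 \
            "$dst/$base.pgm" "$dst/$base-%d.pgm"
    done
}
```

Running `DRY_RUN=1 process_dir scans pnm` (with hypothetical directory names) prints the commands without executing them, which is a cheap way to check the file names before committing tens of gigabytes of disk space.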

Of the remaining 57 JPEGs, all but 6 were correctly processed with some extra options to modify the behaviour of the black or grey filters. The remaining 6 images I will have to deskew and crop by hand, or just use my autocrop tool.

Finding the correct unpaper options for the 57 anomalous JPEGs was somewhat tedious. I would run unpaper on a file with a given set of command-line options, wait several seconds for it to finish, and launch an image viewer on the output files. If the output was not acceptable, I had to quit the viewer and start again with a different set of options. It would have been much easier if I could have kept the viewer open and set it to refresh the images automatically whenever they changed on disk. Unfortunately, neither of the viewers I tried (Gwenview and Kuickshow) has a "watch file" option. Gwenview does have a manual "refresh" command, but it does not refresh thumbnails. I've therefore created and/or voted for "watch file" and "refresh" feature requests on the KDE bug tracker.
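One way to cut down on the trial-and-error round trips would be to run several candidate option sets in one go, writing each result under its own prefix so that all of them can be inspected in a single viewer session. The sketch below assumes that is acceptable in terms of disk space. The three option sets listed (built from unpaper's --no-blackfilter and --no-grayfilter switches) are only examples, not the options that actually worked for these scans, and the variantN naming and DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Sketch: try several unpaper option sets on one input file.
# The option sets in the here-document are illustrative examples only.

# run CMD ARGS...: execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# try_variants IN OUTDIR: run unpaper once per candidate option set,
# writing pages to OUTDIR/variant<N>-<page>.pgm.
try_variants() {
    in=$1
    outdir=$2
    n=0
    while IFS= read -r opts; do
        n=$((n + 1))
        # $opts is deliberately unquoted so that it splits into separate flags;
        # stdin is redirected so the command cannot swallow the option list
        run unpaper --overwrite --layout double --output-pages 2 \
            --pre-wipe 0,2898,4263,3102 --sheet-size 3500,2700 \
            $opts "$in" "$outdir/variant$n-%d.pgm" </dev/null
    done <<EOF
--no-blackfilter
--no-grayfilter
--no-blackfilter --no-grayfilter
EOF
}
```

Something like `try_variants in.pgm /tmp/trial` would then leave one pair of output pages per variant in the (hypothetical) /tmp/trial directory, ready to be compared side by side.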

In the meantime, does anyone know of a fast, lightweight image viewer for X11 which has a "watch file" feature? It should be able to view PNM and PNG files.
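Failing that, a crude stopgap would be a small script that blocks until an output file changes, after which the viewer can be relaunched or told to reload. The sketch below polls the file's checksum with cksum. The polling approach, the function name and the default interval are my own assumptions; this is not how any particular viewer implements a "watch file" feature.

```shell
#!/bin/sh
# Sketch of a "watch file" workaround: block until a file's contents change.
# Polling with cksum is chosen for portability, not efficiency.

# wait_for_change FILE [INTERVAL]: return once FILE's checksum differs from
# its checksum at the time of the call. INTERVAL is in seconds (default 1).
wait_for_change() {
    file=$1
    interval=${2:-1}
    old=$(cksum < "$file")
    while :; do
        sleep "$interval"
        new=$(cksum < "$file")
        if [ "$new" != "$old" ]; then
            return 0
        fi
    done
}
```

Wrapped in a loop such as `while wait_for_change out1.pgm; do ...; done`, with the body replaced by whatever command reopens or refreshes the viewer, this would at least save the manual quit-and-relaunch step.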

4 comments:

  1. Maybe XnView (http://www.xnview.com/en/downloadunix.html) can perform the tasks you are looking for.

  2. Unfortunately, XnView is not free software, so I would rather not use it.

  3. Ah, I think I have discovered an appropriate free image viewer: QIV

  4. Well, for me Geeqie apparently "watches" files, and it displays p*m and PNG files. I don't know whether it does so for the thumbnails, though.
