17 March 2010

Page and image size analysis

I have just discovered two anomalies regarding the page and image sizes; it remains to be seen how they will affect the cropping task.

The Socialist Standard has used at least five different page sizes throughout its print run. However, as shown in the table below, the page dimension ratios don't seem to correspond with those of the scanned versions in the LSE PDFs.

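| Period | Physical page dimensions (mm) | Physical page ratio | Scanned page dimensions (px) | Scanned page ratio | Apparent horizontal DPI | Apparent vertical DPI |
|---|---|---|---|---|---|---|
| 1904-09 – 1918-08 | 242 × 460 | 0.526 | 1715 × 2670 | 0.642 | 180 | 147 |
| 1918-09 – 1932-08 | 180 × 239 | 0.753 | 1815 × 2555 | 0.710 | 256 | 272 |
| 1932-09 – 1950-12 | 208 × 276 | 0.754 | 1965 × 2625 | 0.749 | 240 | 242 |
| 1951-01 – 1969-12 | 212 × 270 | 0.785 | 1715 × 2625 | 0.653 | 205 | 247 |
| 1970-01 – 1972-12 | 210 × 297 | 0.707 | 1725 × 2825 | 0.611 | 209 | 242 |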

I'm at a loss as to what might account for the discrepancy, especially since the scanned aspect ratio is sometimes greater than and sometimes less than the original. Keeping in mind that the microfiche was produced from photographs of bound collections of the Socialist Standard, here are some possible causes:

  • The Standard's sheets may have been cropped to a different size for binding.
  • The Standard may have been reprinted on paper of a different size for binding.
  • Portions of the physical sheets may have been obscured during photography (for instance, to hold the book open and in place for the camera), resulting in a cropped photo.
  • The paper may be much wider than it appears in the two-dimensional photographs, because part of it curves into the binding gutter.
  • The horizontal and vertical DPI settings used for scanning the microfiche may not have been equal.

Not only are the page ratios different, but the dimensions of the entire scans (including the margins around the book and the LSE banner at the bottom) vary inexplicably. The height is always 3102 pixels but, as can be seen in the graph below, the width varies from 3405 to 4263 pixels. There is no obvious reason for this.

To automatically extract the image dimensions, I wrote the following C program using libjpeg:

#include <stdio.h>
#include <stdlib.h>
#include <jpeglib.h>

int main(int argc, char *argv[]) {

  struct jpeg_error_mgr jerr;
  struct jpeg_decompress_struct cinfo;
  FILE *infile;
  int arg = 0, status = EXIT_SUCCESS;

  /* Print usage information */
  if (argc <= 1) {
    fputs("Usage: jpegdims file.jpg ...\n", stderr);
    return EXIT_FAILURE;
  }

  /* For each filename on the command line */
  while (++arg < argc) {

    /* Open the file */
    if ((infile = fopen(argv[arg], "rb")) == NULL) {
      fprintf(stderr, "jpegdims: can't open %s\n", argv[arg]);
      status = EXIT_FAILURE;
      continue;
    }

    /* Initialize JPEG decompression */
    cinfo.err = jpeg_std_error(&jerr);
    jpeg_create_decompress(&cinfo);
    jpeg_stdio_src(&cinfo, infile);
    (void) jpeg_read_header(&cinfo, TRUE);
    (void) jpeg_start_decompress(&cinfo);

    /* Print width, height and filename; JDIMENSION is an unsigned int */
    printf("%7u\t%7u\t%s\n", cinfo.output_width,
                             cinfo.output_height, argv[arg]);

    /* Clean up */
    jpeg_destroy_decompress(&cinfo);
    fclose(infile);
  }

  return status;
}

5 comments:

  1. Oh Dear!

    your program is very useful

    can I compile and repackage for my distro? (Puppy Linux)

  2. Sure, I'll release it into the public domain, so feel free to modify and redistribute it.

  3. thanks!

    I compiled (dynamically and statically linked)

    and named by myself *jpegsize*

    I will make also a .pet package for puppy linux
    - http://puppylover.netsons.org/dokupuppy/

    as pdf addicted and old books-journals lover, I will follow closely your scanning enterprise

  4. Thanks Dingo! Where did you come across my blog, by the way?

  5. Well, I found your blog while I was looking for digital libraries and free DjVu books
