Socialist Standard digitization blog: Page and image size analysis

I have just discovered two anomalies regarding the page and image sizes; it remains to be seen how they will affect the cropping task.

The Socialist Standard has used at least five different page sizes throughout its print run. However, as shown in the table below, the aspect ratios of the physical pages don't seem to correspond with those of the scanned pages in the LSE PDFs.

period	physical page dimensions (mm)	physical page ratio (w/h)	scanned page dimensions (px)	scanned page ratio (w/h)	apparent horizontal DPI	apparent vertical DPI
1904-09 – 1918-08	242 × 460	0.526	1715 × 2670	0.642	180	147
1918-09 – 1932-08	180 × 239	0.753	1815 × 2555	0.710	256	272
1932-09 – 1950-12	208 × 276	0.754	1965 × 2625	0.749	240	242
1951-01 – 1969-12	212 × 270	0.785	1715 × 2625	0.653	205	247
1970-01 – 1972-12	210 × 297	0.707	1725 × 2825	0.611	209	242

I'm at a loss as to what might account for the discrepancy, especially since the scanned aspect ratio is greater than the original in the first period but smaller in the other four. Keeping in mind that the microfiche was produced from photographs of bound collections of the Socialist Standard, here are some possible causes:

The Standard's sheets may have been cropped to a different size for binding.
The Standard was reprinted on paper of a different size for binding.
Portions of the physical sheets were obscured during photography (for instance, to hold the book open and in place for the camera), resulting in a cropped photo.
The paper is much wider than it appears in the two-dimensional photographs due to the binding gutter.
The horizontal and vertical DPI settings used for scanning the microfiche were not equal.

Not only are the page ratios different, but the dimensions of the entire scans (including the margins around the book and the LSE banner at the bottom) also vary. The height is always 3102 pixels but, as can be seen in the graph below, the width varies from 3405 to 4263 pixels. There is no obvious reason for this.

To automatically extract the image dimensions, I wrote the following C program using libjpeg:

#include <stdio.h>
#include <stdlib.h>
#include <jpeglib.h>

int main(int argc, char *argv[]) {

  struct jpeg_error_mgr jerr;
  struct jpeg_decompress_struct cinfo;
  FILE *infile;
  int arg = 0, status = EXIT_SUCCESS;

  /* Print usage information */
  if (argc <= 1) {
    fputs("Usage: jpegdims file.jpg ...\n", stderr);
    return EXIT_FAILURE;
  }

  /* For each filename on the command line */
  while (++arg < argc) {

    /* Open the file */
    if ((infile = fopen(argv[arg], "rb")) == NULL) {
      fprintf(stderr, "jpegdims: can't open %s\n", argv[arg]);
      status = EXIT_FAILURE;
      continue;
    }

    /* Initialize JPEG decompression */
    cinfo.err = jpeg_std_error(&jerr);
    jpeg_create_decompress(&cinfo);
    jpeg_stdio_src(&cinfo, infile);
    (void) jpeg_read_header(&cinfo, TRUE);

    /* The header alone carries the dimensions (both are JDIMENSION,
       i.e. unsigned int), so no decompression needs to be started */
    printf("%7u\t%7u\t%s\n", cinfo.image_width,
                             cinfo.image_height, argv[arg]);

    /* Clean up */
    jpeg_destroy_decompress(&cinfo);
    fclose(infile);
  }

  return status;
}

5 comments:

Dingo, Thursday, March 18, 2010 at 11:43:00 p.m. GMT+1
Oh dear!

Your program is very useful.

Can I compile and repackage it for my distro (Puppy Linux)?
Tristan Miller, Friday, March 19, 2010 at 12:16:00 a.m. GMT+1
Sure, I'll release it into the public domain, so feel free to modify and redistribute it.
Dingo, Friday, March 19, 2010 at 3:36:00 p.m. GMT+1
Thanks!

I compiled it (both dynamically and statically linked) and named it *jpegsize* myself.

I will also make a .pet package for Puppy Linux:
- http://puppylover.netsons.org/dokupuppy/

As a PDF addict and lover of old books and journals, I will follow your scanning enterprise closely.
Tristan Miller, Friday, March 19, 2010 at 9:40:00 p.m. GMT+1
Thanks Dingo! Where did you come across my blog, by the way?
Dingo, Friday, March 19, 2010 at 10:15:00 p.m. GMT+1
Well, I found your blog while I was looking for digital libraries and free DjVu books.

17 March 2010
