This afternoon I used GIMP to find cropping coordinates for the 18 pages my autocrop program failed to process. Having passed these to jpegtran, I'm now in possession of 13 020 properly cropped JPEG images, of which 11 164 are unique pages of the Socialist Standard (and the 1967 supplement) and the remaining 1856 are blank pages, microfiche title slides, indices, or duplicates.
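For reference, here is a minimal Python sketch of how coordinates read off GIMP's rectangle selection could be turned into jpegtran invocations. The `CropBox` type, the function names and the filenames are hypothetical; the only jpegtran switches used are `-crop WxH+X+Y` and `-outfile`. As I understand libjpeg-turbo's documentation, a lossless crop can only start on an iMCU boundary, so jpegtran silently moves the top-left corner up and left to the nearest boundary while leaving the bottom-right corner in place. `effective_box` estimates the region actually kept, assuming an iMCU size of 16 pixels (8 for greyscale or unsubsampled JPEGs).

```python
from __future__ import annotations

import shutil
import subprocess
from typing import NamedTuple


class CropBox(NamedTuple):
    """Hypothetical crop rectangle, in pixels, as read off GIMP's selection tool."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def jpegtran_crop_args(src: str, dst: str, box: CropBox) -> list[str]:
    """Build a jpegtran command that losslessly crops src to box and writes dst."""
    geometry = f"{box.width}x{box.height}+{box.x}+{box.y}"
    return ["jpegtran", "-crop", geometry, "-outfile", dst, src]


def effective_box(box: CropBox, imcu: int = 16) -> CropBox:
    """Estimate the region jpegtran keeps, assuming it snaps the top-left corner
    up/left to a multiple of `imcu` and leaves the bottom-right corner unchanged.
    The default of 16 is an assumption (typical for 4:2:0 colour JPEGs)."""
    x0 = box.x - box.x % imcu
    y0 = box.y - box.y % imcu
    right = box.x + box.width
    bottom = box.y + box.height
    return CropBox(x0, y0, right - x0, bottom - y0)


def crop(src: str, dst: str, box: CropBox) -> None:
    """Run the crop, failing loudly if jpegtran is missing or returns an error."""
    if shutil.which("jpegtran") is None:
        raise RuntimeError("jpegtran not found on PATH")
    subprocess.run(jpegtran_crop_args(src, dst, box), check=True)
```

For example, `effective_box(CropBox(37, 52, 1800, 2600))` gives `CropBox(x=32, y=48, width=1805, height=2604)`, so a few extra pixels of margin may survive the crop.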
Cropping the images and discarding the irrelevant pages has brought the size of the corpus down from 17.58 GB to 15.13 GB, a saving of 13.94%. Of course, if LSE had properly scanned them as high-resolution bilevel images rather than JPEGs in the first place, the size would have been about a third of this. I wonder whether there is some way to convert the JPEGs to bilevel images, but given the relatively poor quality of the photographs and the low resolution of the scans, this may not be possible. I'll have a go at batch-converting them with ImageMagick and examine the results, but I am not optimistic that they will be acceptable.
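A sketch of the kind of ImageMagick trial I have in mind, written in Python so it can loop over a sample of pages. It is an assumption rather than a tested recipe: each page is converted to greyscale, thresholded at several levels (40%, 50% and 60% are arbitrary picks), and saved as a Group 4 compressed TIFF so that both legibility and file size can be compared. `magick` is the ImageMagick 7 command; ImageMagick 6 installations call it `convert`. The function names and the output naming scheme are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def bilevel_args(src: Path, out_dir: Path, threshold: int, magick: str = "magick") -> list[str]:
    """Build an ImageMagick command that greyscales src, thresholds it at
    `threshold` percent and writes a Group 4 compressed TIFF into out_dir."""
    dst = out_dir / f"{src.stem}-t{threshold}.tif"
    return [
        magick, str(src),
        "-colorspace", "Gray",          # drop any colour information first
        "-threshold", f"{threshold}%",  # every pixel becomes pure black or white
        "-compress", "Group4",          # fax-style compression for bilevel TIFF
        str(dst),
    ]


def trial_commands(pages, out_dir, thresholds=(40, 50, 60)) -> list[list[str]]:
    """One command per (page, threshold) pair, so outputs can be compared side by side."""
    return [bilevel_args(Path(p), Path(out_dir), t) for p in pages for t in thresholds]


def run_all(commands) -> None:
    """Execute each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Running only a handful of representative pages first (say, one with photographs and one with small type) should show quickly whether any single threshold is usable across the corpus.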
At any rate, the next step will be to assemble the individual pages into PDFs or DjVus, one issue per file. I shall have to look around to see what software is available for this. The only one I'm aware of is the pdfpages package for pdfTeX, though I'm sure there are others more suitable for my task.
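One candidate I haven't tried is img2pdf, which wraps JPEG files in a PDF without decoding and re-encoding them, so the pages would stay exactly as cropped. The sketch below assumes a hypothetical file-naming scheme (`1967-03-p005.jpg`) and only builds the commands: it groups pages by issue, sorts them by numeric page number, and produces one `img2pdf ... -o <issue>.pdf` invocation per issue. Apart from img2pdf's `-o` option, the names here are my own invention.

```python
from __future__ import annotations

import re
from collections import defaultdict
from pathlib import Path

# Hypothetical naming scheme: <year>-<month>-p<page>.jpg, e.g. 1967-03-p005.jpg
PAGE_RE = re.compile(r"^(?P<issue>\d{4}-\d{2})-p(?P<page>\d+)\.jpe?g$", re.IGNORECASE)


def group_by_issue(filenames) -> dict[str, list[str]]:
    """Map each issue key to its page files, sorted by numeric page number."""
    issues: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for name in filenames:
        m = PAGE_RE.match(Path(name).name)
        if m is None:
            continue  # silently skip files that don't follow the naming scheme
        issues[m["issue"]].append((int(m["page"]), name))
    return {issue: [n for _, n in sorted(pages)] for issue, pages in sorted(issues.items())}


def img2pdf_args(issue: str, pages: list[str], out_dir: str = "pdf") -> list[str]:
    """Build one img2pdf command that bundles an issue's pages into <issue>.pdf."""
    return ["img2pdf", *pages, "-o", str(Path(out_dir) / f"{issue}.pdf")]
```

Sorting on the parsed page number rather than the filename keeps `p10` after `p2` even if the page numbers aren't zero-padded.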