I recently purchased DEVONthink and I subscribe to a historical newspaper service that allows me to download copies of each page of a newspaper. I would like to download a copy of each issue of my town’s local newspaper from the 1880s to about 1980. I have several questions and would appreciate your advice:
[list]* As I download the .jpg images, I assign a name to each newspaper page in this format: Newspapername_YYYY_MM_DD_pg1. Then I add the pages to a file folder named NewspaperName_YYYY_MM_DD. Is there a drawback to this method as far as DEVONthink is concerned? Would it be better to add the pages to a .pdf document named Newspapername_YYYY_MM_DD without bothering to name each individual .jpg page?
The number of documents will, of course, be greater if you create a group for each publication date and store the pages for that day as individual JPEG images. That will increase the amount of metadata managed by a DEVONthink database.
I don’t know your plans for using the downloads.
My inclination would probably be to keep the individual page JPEG images rather than initially convert them to multipage PDFs.
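If you do keep individual page images, the per-issue folders can be worked out from the file names themselves. Here is a minimal Python sketch, assuming the files follow the Newspapername_YYYY_MM_DD_pgN pattern described in the question and end in .jpg or .jpeg. The pattern and the function name `group_pages_by_issue` are my own illustrations, not anything built into DEVONthink.

```python
import re
from collections import defaultdict

# Assumed naming scheme: "Newspapername_YYYY_MM_DD_pgN.jpg" (hypothetical pattern).
PAGE_NAME = re.compile(
    r"^(?P<paper>.+)_(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})"
    r"_pg(?P<page>\d+)\.jpe?g$",
    re.IGNORECASE,
)


def group_pages_by_issue(filenames):
    """Map each issue folder name to its page files, ordered by page number."""
    issues = defaultdict(list)
    for name in filenames:
        match = PAGE_NAME.match(name)
        if match is None:
            continue  # ignore files that do not follow the naming scheme
        folder = "{paper}_{year}_{month}_{day}".format(**match.groupdict())
        issues[folder].append((int(match["page"]), name))
    # Sort pages numerically so that pg2 comes before pg10.
    return {
        folder: [name for _, name in sorted(pages)]
        for folder, pages in sorted(issues.items())
    }
```

Note that the pages are sorted by their numeric value. In a plain alphabetical sort, pg10 would come before pg2, so zero-padding the page number (pg01, pg02, ...) may be worth considering if you rely on name order.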
Whether OCR of the images would work well depends on the resolution of the images. Low-resolution images may result in so many recognition errors that searches would not be reliable, and there might be image degradation in the conversion to PDF.
I’ve worked with microfilm of old newspapers and journals occasionally. Image quality can vary widely. Some old microfilms are barely readable and would not be good candidates for OCR. Some have very good resolution and would OCR well.
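One rough way to judge whether a download is worth running through OCR is to estimate its effective resolution from the pixel width of the image and the physical width of the original page. The sketch below is only an illustration: the function names are hypothetical, the page width is something you would have to look up or estimate for your newspaper, and the 300 dpi default is a commonly quoted rule of thumb for OCR, not a DEVONthink requirement.

```python
def effective_dpi(pixel_width, page_width_inches):
    """Estimate dots per inch from image width and the printed page width."""
    if page_width_inches <= 0:
        raise ValueError("page_width_inches must be positive")
    return pixel_width / page_width_inches


def likely_ocr_candidate(pixel_width, page_width_inches, min_dpi=300):
    """Return True if the estimated resolution meets the assumed threshold."""
    return effective_dpi(pixel_width, page_width_inches) >= min_dpi
```

For example, an image 1500 pixels wide of a page assumed to be 15 inches wide works out to about 100 dpi, which falls well below that threshold. Resolution is only part of the story, though. A sharp scan of a badly faded microfilm frame can still OCR poorly.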