We’re about to receive ~50,000 pages of multi-page TIFF and/or TIF files, and I’m a bit worried about processing them locally. Does DT3 play nice with multi-page TIFF? Will it OCR and PDF them?
Do you have a sample file to test?
PS: I hope you’re not thinking of queueing up 50,000 TIFF files to OCR.
Inquiring now-- thanks.
And yes, that is actually my plan—splitting the task across three Macs. I recently OCR’d about that many PDF pages in DT3… Started it in the morning and it was finished by the evening. But maybe processing TIFFs is more intensive? I don’t know what to expect, this is the first time I’ve had to deal with them.
Where are you getting these files, and for what purpose?
It’s an e-discovery dump coming from a state regulatory agency in the U.S. during a lawsuit. Evidently some of the more popular e-discovery platforms use TIFF.
Gotcha. I’m familiar with them from the printing industry.
Hi,
I had the same issue. Here is the solution:
- Install Homebrew (https://brew.sh) by executing this in Terminal:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
(don’t forget the closing quotation mark at the end!)
- Install ImageMagick in Terminal:
brew install imagemagick
- Put all TIFF files in a folder (e.g. ~/Desktop/tiff-convert).
- Execute the following command in Terminal:
find ~/Desktop/tiff-convert -iname "*.tiff" -exec convert {} {}.pdf \;
(the command must end with “\;”!)
This converts every file ending in .tiff in that folder to a PDF file. Be aware that in some cases (2-3%) it doesn’t work.
- The PDF files should then be processed with OCR in DEVONthink.
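For a large batch, it may help to know which files fall into that failing 2-3%. The following is only a rough sketch of the same conversion wrapped in a loop that also picks up the .tif extension and writes the path of every failed file to a log. The function name convert_tiffs, the CONVERTER variable and the log path are my own inventions for illustration. CONVERTER defaults to ImageMagick's convert command used above; on ImageMagick 7 you may prefer to set it to magick.

```shell
#!/bin/bash
# Sketch only: batch-convert TIFFs to PDF and log the ones that fail.
# CONVERTER is an assumption: "convert" as in the command above,
# or "magick" if you are on ImageMagick 7.
CONVERTER="${CONVERTER:-convert}"

# convert_tiffs is a hypothetical helper name, not part of any tool.
# $1 = folder with the TIFF files, $2 = file that collects failed paths
convert_tiffs() {
    local src_dir="$1"
    local fail_log="$2"
    : > "$fail_log"    # start with an empty failure log

    # -print0 with read -d '' keeps file names containing spaces intact
    find "$src_dir" -type f \( -iname "*.tif" -o -iname "*.tiff" \) -print0 |
    while IFS= read -r -d '' tiff; do
        # Same naming as the one-liner above: scan.tiff -> scan.tiff.pdf
        if ! "$CONVERTER" "$tiff" "$tiff.pdf" < /dev/null 2> /dev/null; then
            echo "$tiff" >> "$fail_log"    # needs a manual look later
        fi
    done
}
```

Usage would be something like convert_tiffs ~/Desktop/tiff-convert ~/Desktop/tiff-failures.txt, after which the log lists the files to retry or to import into DEVONthink directly.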
Does that mean that DT can’t OCR multi-page TIFFs?
From my initial test, yes it can. However, we are trying to locate some good real-world example files to test.
@tjur: as @BLUEFROG said (and I could confirm), DT handles multi-page TIFFs just fine. I.e. it can OCR and convert them to PDF. So, which problem did ImageMagick solve in this case?
I found an ugly multi-page TIFF here:
Interestingly, it seems to use different compressions for the pages.
As to real-world samples: The USPTO used TIFFs in the past, and you can find a lot of those here:
http://storage.googleapis.com/patents/grant_multi_page_imgs_before2000
You’ll find a heap of ZIPs, each of them containing tons of files filed by USPTO No. The quality of the TIFFs might be terrible, though.
Some anglophone courts are perhaps also using TIFFs, but I couldn’t come up with a sample yet.
(Why on earth does anyone still use text-less TIFF instead of PDF for text?)
Yeah - that’s the sample file I found too. Thanks for the link for extra TIFFs too.
Yes, that is the problem. I’m dealing with many TIFF files from the German business and commercial register. In my estimation, DT’s OCR function didn’t work with at least half of the files. DT only recognized and saved the first page of the TIFF. That’s why I use the “convert” command.
Bundesanzeiger? I was looking there, but only cursorily. Found only PDFs. Maybe you could pass one of the non-working TIFFs on to the DT developers?
I just checked some TIFF files again (ones where I think the error occurred last time) and no errors occur now. Could it be that one of the recent macOS or DEVONthink updates fixed the issue?
That’s not related to DT, I think. Thumbnails are handled by macOS, and if there’s no Quicklook plugin for TIFF, there’s no thumbnail.
OK, I assumed that the TIFF files from the commercial register were corrupted and were therefore also displayed in Finder without thumbnails. However, “Preview” can display them. Weird tactics by Apple not to deliver a native TIFF QL plugin…
I’ll get back to the error if it comes back to me.
The Thumbnails inspector supports only PDF documents currently.
Curious: is the Thumbnails inspector software from DEVONthink or macOS, or a combination?