Incomplete indexing of large PDFs

I used DTpro3 to index a folder containing PDFs of up to 13,000 pages. It appears DT (like Mac's Spotlight) does not index PDFs past something like 1,000 pages.

Is there a way to force DT to index the entirety of all PDF files regardless of size? (Is there a way to force Spotlight to do so?) If so, what is the way?


Where are you getting PDFs that large?

Medical records and books, mainly.

I could solve the problem by breaking files up into chunks, but in my circumstances that would create disadvantages I’d like to avoid.
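For anyone who does want the chunking workaround, here is a minimal sketch of it. It assumes the `qpdf` command-line tool is installed; the chunk size and output naming are illustrative, not anything DEVONthink-specific.

```python
import shutil
import subprocess


def page_ranges(total_pages: int, chunk_size: int = 1000):
    """Yield (first, last) 1-based page ranges covering the whole document."""
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def split_pdf(src: str, total_pages: int, chunk_size: int = 1000):
    """Write one PDF per chunk with qpdf, e.g. records_00001-01000.pdf."""
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf not found on PATH")
    for first, last in page_ranges(total_pages, chunk_size):
        out = f"{src.removesuffix('.pdf')}_{first:05d}-{last:05d}.pdf"
        # qpdf's --pages syntax: qpdf SRC --pages SRC first-last -- OUT
        subprocess.run(
            ["qpdf", src, "--pages", src, f"{first}-{last}", "--", out],
            check=True,
        )
```

A 13,000-page file would come out as thirteen 1,000-page chunks, each safely under the apparent indexing cutoff.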

Incidentally, Foxtrot Pro does not have this problem if you switch it from relying on Spotlight's PDF indexing to using xpdf instead.

Interesting - where is the config setting for xpdf?

If you launch Foxtrot while holding Command-Option, it's in the dialog box that appears.
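You can also check directly whether text past the ~1,000-page mark is extractable at all, using the `pdftotext` tool from xpdf (also shipped with poppler); `-f`/`-l` select a page range and `-` writes to stdout. A hedged sketch, with a placeholder file name:

```python
import shutil
import subprocess


def pdftotext_cmd(pdf: str, first: int, last: int) -> list[str]:
    """Build a pdftotext invocation extracting pages first..last to stdout."""
    return ["pdftotext", "-f", str(first), "-l", str(last), pdf, "-"]


def extract_pages(pdf: str, first: int, last: int) -> str:
    """Return the extracted text of the given page range."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (xpdf or poppler) not found on PATH")
    return subprocess.run(
        pdftotext_cmd(pdf, first, last),
        capture_output=True, text=True, check=True,
    ).stdout
```

If something like `extract_pages("records.pdf", 12000, 12001)` returns readable text but a Spotlight or DT search for a phrase from those pages finds nothing, the limit is in the indexer, not the PDF itself.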


Development would have to assess this.

A link to an example document, or the document itself, would be helpful.

DEVONthink uses a background task to index certain documents (e.g. PDFs) and a timeout so that, for example, corrupted documents can't stall the indexing or crash the main app. Therefore the only limits are the timeout and the speed of your computer. Are any of these giant PDFs downloadable/public?

FWIW - I regularly work with PDFs on the order of thousands of pages (also medical records). I have found that the long OCR and other processing of these files can at times slow down execution of smart rules; it is also not a good idea to attempt other CPU-intensive tasks in DT3 while such large files are being processed. That said, I have never hit a page limit when indexing files of this size.