DT Pro crashes while indexing

Hi,

I’m running into a persistent crash problem when indexing files and folders. I’m an academic, so many of my research files are large PDFs.

DT has worked fine in the past, but now that I have added a few hundred pdfs from Google Books it continually hangs while indexing and frequently crashes. Is there a known problem with the manner in which DT scans files compressed using JBIG2? The crash report seems to suggest that the problem is with the pdf rendering (opening lines appended).

Thanks for any assistance,
Joe Z

Crash Report follows:

Host Name: Computer
Date/Time: 2007-05-11 12:07:43.167 +1200
OS Version: 10.4.9 (Build 8P135)
Report Version: 4
Command: DEVONthink Pro
Path: /Applications/DEVONthink Pro.app/Contents/MacOS/DEVONthink Pro
Parent: WindowServer [69]
Version: 1.3.1 (1.3.1)

PID: 360
Thread: 0

Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000008

Thread 0 Crashed:
0 libJBIG2.A.dylib 0x9c66e710 JBIG2Stream::readTextRegion(int, int, int, int, unsigned, unsigned, int, JBIG2HuffmanTable*, unsigned, JBIG2Bitmap**, unsigned, unsigned, unsigned, unsigned, int, JBIG2HuffmanTable*, JBIG2HuffmanTable*, JBIG2HuffmanTable*, JBIG2HuffmanTable*, JBIG2HuffmanTable*, JBIG2HuffmanTable*, JBIG2HuffmanTable*, JBIG2HuffmanTable*, unsigned, int*, int*) + 920
1 libJBIG2.A.dylib 0x9c66e298 JBIG2Stream::readTextRegionSeg(unsigned, int, int, unsigned, unsigned*, unsigned) + 2192
2 libJBIG2.A.dylib 0x9c66ccbc JBIG2Stream::readSegments() + 944
3 libJBIG2.A.dylib 0x9c66c858 JBIG2Stream::reset() + 284
4 libJBIG2.A.dylib 0x9c66a9bc create_state(pdf_source*, pdf_source*) + 308
5 com.apple.CoreGraphics 0x9064cb0c pdf_source_create_jbig2_filter + 140
6 com.apple.CoreGraphics 0x9064d920 create_jbig2_filter + 72
7 com.apple.CoreGraphics 0x903caccc add_filter + 560
8 com.apple.CoreGraphics 0x903caa64 create_with_name + 144
9 com.apple.CoreGraphics 0x903ca9c0 pdf_filter_chain_create + 228
10 com.apple.CoreGraphics 0x903ca640 CGPDFStreamCreateFilterChain + 176

This looks like a random crash of Mac OS X/Quartz. You might switch to pdftotext (see Preferences > PDF & PS) and try to index the files & folders using this tool.
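To see which files are likely to be affected before indexing, here is a rough Python sketch that flags PDFs whose raw bytes contain the `/JBIG2Decode` filter name (the standard PDF name for JBIG2-compressed image streams). The function names `uses_jbig2` and `find_jbig2_pdfs` are made up for illustration and have nothing to do with DEVONthink. It assumes the filter name appears unobscured in the file, which is normally the case for image streams but is not guaranteed, and it only matches a lowercase `.pdf` extension on case-sensitive file systems.

```python
from pathlib import Path

# Standard PDF filter name used for JBIG2-compressed image streams.
JBIG2_MARKER = b"/JBIG2Decode"


def uses_jbig2(pdf_path, chunk_size=1 << 20):
    """Return True if the file's raw bytes contain the JBIG2 filter name.

    The file is read in chunks so that very large PDFs are not loaded
    into memory all at once.
    """
    overlap = len(JBIG2_MARKER) - 1
    tail = b""
    with open(pdf_path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if JBIG2_MARKER in window:
                return True
            # Keep the last few bytes so a marker split across two
            # chunks is still found on the next pass.
            tail = window[-overlap:]


def find_jbig2_pdfs(folder):
    """Return a sorted list of PDFs under `folder` that appear to use JBIG2."""
    return sorted(p for p in Path(folder).rglob("*.pdf") if uses_jbig2(p))
```

Calling `find_jbig2_pdfs` on the folder you intend to index gives a list of candidate files. This is a heuristic byte search, not a PDF parser, so treat the result as a hint rather than a definitive answer.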

Dear Christian,

Thanks for the prompt response. I switched the Preference as you suggested, and that did the trick. DT Pro no longer hangs or crashes when indexing my pdfs.

Thanks again!
Joe :smiley:

Is one way better than the other for DTP?

Lou

Hi, Lou. If you Index Finder files into your database, your database package will be smaller, and the memory requirements for the database will be somewhat smaller than with Import capture. If you are a heavy user of MS Word files, Index capture is somewhat preferable, as there’s one-way synchronization to your database when a Word file is edited and saved.

Remember that with Index capture the original files in the Finder must not be deleted, or that information will be lost from your database.

I prefer my databases to be highly portable, so that I can easily move them among computers. So I import-capture files, resulting in copying them into the database package (with the exception of Word files, which remain externally linked in the Finder). I tend to avoid Word files. :slight_smile:

But if I need Word files, I’ll copy them to a folder in my Documents directory before Importing them. If I move that database to a different computer, I also copy the folder containing Word files to the Documents directory of the new computer.

As always, thanks for the explanation, Bill. Much appreciated.

Lou

Lou,

Bill’s advice is spot-on. Importing files into the database enhances portability at the expense of database size (which can be enormous). Indexing is best used if you want to keep the database size manageable for easy backup. I think Indexing is also the most efficient choice if you’re using DT on a single computer.

As for the specific crash problem that prompted my original query, it looks more and more as though it’s caused by OS X 10.4 Tiger’s built-in PDFKit, which is the default indexing choice. Selecting the DT pdftotext feature as described by Christian has worked flawlessly (I’ve just finished indexing more than 7,500 pdf documents, whereas before DT Pro had consistently crashed less than halfway through the indexing).

Best,
Joe