DT hangs while importing PDFs

Hi,
I downloaded the test version of DEVONthink and I’m very excited about the program; it looks exactly like what I’ve always wanted for my research. Unfortunately, I’m having problems importing my library of papers.
When I drag and drop a folder containing several papers (PDF and PS files), DT starts importing fine but then hangs after a while, just showing the “importing …” thingy and not doing anything anymore (not using up processor cycles either).
The weird thing is that importing the offending files individually works, so it’s probably not a problem with the files…

Is there maybe a limit to the number of files you can import into the test version?

I’d really like this to work, because the program looks like the answer to my prayers — except the bit about potentially having to import all papers (there are about a thousand of them in my archive) individually, that I didn’t say anything about in my prayers…

Cheers,
David.

Strange. There’s no limit on the number of files you can import into the test version, so it must be something else. Maybe it would be a good idea to distill the .ps files BEFORE importing. DEVONthink will use Mac OS X’s normalizer anyway to create PDFs out of PostScript files.

Best,

Eric.

Thanks Eric!

But what I should have mentioned is that the program actually hangs during importing pdfs, not .ps. And, as I said, the really weird thing is that importing those same pdfs individually does work…

I’ll try installing this other PDF-to-text filter (Lightning something) once I’m back home at my Mac.

Cheers,
David.

DT PE is limited to 10,000 PDF documents, but it’s unlikely that this is causing the trouble. Is anything logged to the system console (see the “Console” application inside the folder /Applications/Utilities)? Or is the “pdftotext” process running in the background and using CPU cycles? If so, DT is waiting for that background process to finish. Another possibility: just send us the problematic files and we’ll check them over here (probably the easiest solution).
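
For anyone who would rather check the “pdftotext” question from a script than by eye, here is a rough sketch. It assumes a system where Python 3 and the standard `ps` command are available; the function names `parse_ps` and `running` are made up for this example and are not part of DEVONthink.

```python
import subprocess


def parse_ps(ps_output, name):
    """Return (pid, cpu_percent, command) for each row whose command contains `name`."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, cpu, command = parts
        if name not in command:
            continue
        try:
            matches.append((int(pid), float(cpu), command.strip()))
        except ValueError:
            continue  # row did not have the expected numeric columns
    return matches


def running(name="pdftotext"):
    """Ask ps for all processes and keep the ones that look like `name`."""
    # -A: every process; -o: only the pid, %cpu and command columns
    result = subprocess.run(
        ["ps", "-A", "-o", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout, name)
```

Calling `running()` while the “Importing” window is stuck should return an empty list if no such process exists, or one tuple per matching process. A non-zero CPU figure would suggest the converter is still working rather than hung.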

Thanks for all your efforts!

The problem really seems to be of the most helpful sort: reliably non-deterministic. It occurs seemingly randomly with different files (always pdfs, though), which all import fine on their own. My feeling is that it only occurs when I import folders containing a relatively high number of files (> 40), but I’m not sure. (There were folders with >100 that worked fine.)

But anyway, I’ve managed to import everything now. The trick was to import sub-directories with fewer files, and immediately do “backup and optimize”. I had two crashes (or, more precisely, hangs: no processing cycles, no pdftotext process, no progress, no messages on the console, so I killed DT after about 10 minutes) but was able to recover everything imported before the crash, and trying the same folders a second time did work. Weird, but at least it’s all in there now.
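
If your archive is one big flat folder, the “fewer files per import” trick can be prepared with a small script. The following is only a sketch under assumptions: the batch size of 25 is a guess (nobody in this thread has pinned down a safe number), the names `chunk` and `stage_batches` are invented here, and files are copied rather than moved so the original archive stays untouched.

```python
import shutil
from pathlib import Path


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def stage_batches(source, staging, size=25, suffixes=(".pdf", ".ps")):
    """Copy papers from `source` into staging/batch_001, batch_002, ...

    Caveat: files with the same name in different sub-folders would
    overwrite each other inside a batch folder.
    """
    source, staging = Path(source), Path(staging)
    # Build the full list first so newly created batch folders are never rescanned.
    files = sorted(
        p for p in source.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )
    folders = []
    for number, group in enumerate(chunk(files, size), start=1):
        folder = staging / f"batch_{number:03d}"
        folder.mkdir(parents=True, exist_ok=True)
        for path in group:
            shutil.copy2(path, folder / path.name)
        folders.append(folder)
    return folders
```

Each resulting `batch_NNN` folder can then be dragged into DT one at a time, with a “backup and optimize” in between as described above.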

The console tells me there was one real crash of pdftotext (I’ll send you the crash.log). However, I don’t think it was responsible for the problems: it came at a different time (?), and in any case there were two hangs but only this one pdftotext crash.

Still, there was a relatively high number of files (ps and pdf) where the log tells me that importing didn’t work; any ideas why that might be?

Anyways, many thanks, and my order is on its way!
D.

DEVONthink uses Mac OS X’s PostScript-to-PDF converter and the Quartz engine for displaying PDFs. So, all .ps and PDF files that work with Preview will also work in DEVONthink. Also, DT cannot extract text from password-protected files (see our latest DEVONtalk newsletter).

Best,

Eric.

Which PDF format do the documents use? For example, Quartz (OS X 10.3.x) can’t read/display some files created by Acrobat 6. Or are some files encrypted or maybe damaged?
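
A quick way to answer these questions for a whole folder is to look at the raw bytes of each file. The sketch below is a heuristic, not a real PDF parser: it reads the `%PDF-x.y` header, looks for an `/Encrypt` entry anywhere in the file, and checks for the `%%EOF` marker near the end. The function name `inspect_pdf` is made up for this example. As far as I know, Acrobat 6 writes PDF 1.5 by default, so a reported version of 1.5 would be a hint in that direction.

```python
import re
from pathlib import Path

# The header normally sits at the very start of the file, e.g. b"%PDF-1.4"
HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def inspect_pdf(path):
    """Return a rough summary of a PDF: version, encryption hint, end marker."""
    data = Path(path).read_bytes()
    match = HEADER.search(data[:1024])
    return {
        "version": match.group(1).decode("ascii") if match else None,
        # Heuristic: encrypted files reference an /Encrypt dictionary.
        "encrypted": b"/Encrypt" in data,
        # A missing end marker often means a truncated or damaged file.
        "has_eof": b"%%EOF" in data[-1024:],
    }
```

Running this over the files that the log reports as failed, and comparing against the ones that imported cleanly, might show whether the failures cluster around one PDF version or around encrypted files.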

Yeah, I’ve had this problem too. I’ve been trying to import three folders, one with several very large PDFs (entire books, so the file size is around 5 MB each) and the other two with around 200 PDFs each. One of the folders has several sub-folders. The PDFs in all folders are a mixture of PDFs created by Adobe Acrobat and ones created by the “Print to PDF” method. All of them open fine in Preview.

I’ve been trying to import them into DEVONthink using TextLightning and converting to RTF in order to preserve formatting. (I discovered AFTER I bought TextLightning that it is no longer recommended by DEVONthink because it’s “buggy and not very well supported”; I think those were the terms used in another forum. Way to go! Why, if you think this is problematic software, do you continue to provide a way to integrate it into DEVONthink? I feel like I just wasted my money.) I also unchecked the pref to create thumbnails and checked the pref to import the files into the database in addition to using TextLightning and converting to RTF. The background color was the default white.

Anyway, DEVONthink starts to import, calls TextLightning (which seems to work just fine, by the way) and then shows the little “Importing” window. DEVONthink’s import window stays open long after the progress window for TextLightning has closed. If DEVONthink manages to make it through the folder, I wind up with a lot of documents that the log tells me it could get no text from.

True, TextLightning outside of DEVONthink can’t do batch conversions; it invariably chokes if I drop a folder full of PDFs on it. But from what I can figure, DEVONthink is calling TextLightning for each PDF it’s importing, so technically it’s not a batch convert. And I should add that the PDFs that DEVONthink’s log tells me had no text convert just fine with TextLightning outside of DEVONthink, and convert in DEVONthink if brought in individually. Seems to me that the problem is in DEVONthink and not TextLightning, and occurs when DEVONthink tries to import multiple PDF files and/or a folder full of them.

I know that in at least some of the cases that proved to be a problem for DEVONthink (i.e. its log told me that there was no text for the document), TextLightning successfully converted the documents: I found the converted RTFs inside a folder with the name of my home directory, which was inside a folder named DEVONTHINK in my /tmp directory. So I think the problem is on DEVONthink’s end, in the way it imports the RTFs TextLightning produces into the database. I think it vital that this problem be addressed, as there are a lot of folks like me, using this program to organize and access research databases, for whom this sort of functionality is critical.

Also, as an FYI to those using TextLightning, I found after installing it that its preference file was corrupted. The plist file can be found at ~/Library/Preferences/com.metaobject.TextLightning.plist. I found that there was an item in it that referenced Soundmanager, which hasn’t been around since OS 9, I think. I deleted that item so that the only item left in the .plist file was the license key. That seemed to clear up a lot of the problems I was having with TextLightning. It also helps to trash the DEVONthink .plist file (NOT the registration one!!!).
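
For those who would rather not edit the plist by hand, the same cleanup can be sketched with Python’s standard `plistlib` module. This is an illustration only: the helper names `plist_keys` and `remove_keys` are invented, the exact name of the stale entry is not given above, so you would have to list the keys first and pick it out yourself, and a backup copy is written next to the file before anything is changed.

```python
import plistlib
import shutil
from pathlib import Path


def plist_keys(path):
    """Return the top-level keys of a property list, sorted."""
    with open(path, "rb") as fp:
        return sorted(plistlib.load(fp))


def remove_keys(path, unwanted):
    """Delete the given top-level keys, keeping a .bak copy of the original."""
    path = Path(path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    with open(path, "rb") as fp:
        prefs = plistlib.load(fp)
    removed = [key for key in unwanted if key in prefs]
    for key in removed:
        del prefs[key]
    with open(path, "wb") as fp:
        plistlib.dump(prefs, fp)  # written back in XML plist format
    return removed
```

A typical session would be to call `plist_keys` on the TextLightning plist with the application closed, decide which entry looks stale, and pass that key name to `remove_keys`. On recent systems preferences may be cached, so a direct file edit might not take effect until the next login.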