Crashes on email import

I am now having problems with DTPO crashing while importing large volumes of email. It seems similar to my problems with large batch OCR imports. I will send the crash logs for this issue as well. Any idea what is going on? With the type of database I am attempting to build, the program is crashing all the time, and I am beginning to wonder if it is up to the task. I assume DTPO should be able to handle several gigabytes of information. My largest database is only 4 GB, so I am surprised to be having these types of problems.

What’s the item/word count (see File > Database Properties)? The file size alone doesn’t matter; one could easily create databases containing terabytes of data (like movies, photos, sounds).

My import from Entourage is also producing errors. I tried to import a folder that had subfolders (count of about 5000), and it fails after about 10 minutes of trying to import.

I opened Activity Monitor and watched an import of a few thousand emails crash. I have 4 GB of memory in my system, which I would have thought was adequate. It seems to me that DTPO either requires a lot of memory or has a problem with memory management. I can only import and OCR 300 or so TIFFs at a time in DTPO without it crashing (I think it is also running out of physical memory). I was able to convert and OCR 28,000 images at one time in Acrobat without a problem, which leads me to believe the issues I am having are really software related.

I also find that performing OCR on a few hundred TIFFs/PDFs at once is problematic. Basically, DT’s memory usage just gets higher and higher until in the end it crashes. My workaround is to OCR around 400 single-page PDFs, quit DT, restart it, OCR 400 more, etc.
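For anyone who wants to script the batch-and-restart workaround described above, here is a minimal Python sketch of the batching part only. It just splits a list of file paths into groups of at most 400. The function name `batches` is made up for illustration, and nothing here talks to DEVONthink itself; how each batch gets handed to the OCR step is left open.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(items: Iterable[str], size: int = 400) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        # Pull the next `size` items; an empty list means we are done.
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Each yielded list could then be processed in one session, with the application quit and restarted before the next list is started.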

This time, just after doing a system restart and then importing only the inbox (not deleted or sent), it has now imported 12,000 emails, even though errors are reported. But even though I created a new database, it says it had previously imported 3000 of them. Getting better, but still not perfect. What a palaver!

Still crashes, with time-outs from certain folders.

I added 8 GB of RAM to see if it solved the problem. It still crashes on import of a single inbox with a few thousand emails in it.

I thought I would give you some additional information. I am running OS X 10.5.8 and DTPO 2.01 on a Mac Pro with two 3.0 GHz dual-core processors, 12 GB of RAM, and ample hard drive space. The message I am getting from DTPO is “Can’t allocate enough memory for data object”. The program crashes while it is loading the messages at the beginning of the import process, before they are actually imported. The inbox I am trying to import contains 3298 messages and is approximately 2.2 GB in size.

I used Acrobat 8 as a workaround. It has batch processing, and I was able to convert and OCR 28,000+ .tiff files at one time with no problems. After they were done, I imported them into DTPO without a problem. I have heard Acrobat’s OCR is not as good, but I have not noticed a difference. Another bonus is that Acrobat is at least 5X faster (probably more like 10X-20X) than conversion in DTPO.

Yes, I have also tried Acrobat Pro (9 in my case), and it handles lots of files without crashing. For this particular job, I have a few tens of thousands of single-page PDFs, organized in folders in DT, which I’d like to scan. They have text already, but I want to perform OCR on them again (the text looks OK but for some reason is not readable if you cut and paste it, for example).

Unfortunately, Acrobat refuses to do that. I can open them, export as TIFF, and run OCR on that in Acrobat, but I cannot automate it because Acrobat seems to only support automation in JavaScript, which I don’t know and am not about to learn just for this. DT can do it (you simply tell it to OCR a bunch of PDFs and it doesn’t care whether they already have text), so I use that.

I’ve tried a number of ways of programmatically converting a PDF file to a TIFF file, using various languages and libraries, but all fail with these files for some reason. Automator doesn’t fail on the conversion (maybe DT is using the same mechanism to convert the PDF pages to images), but it crashes all the time.

I’m planning next to find a command-line utility to convert PDFs to TIFF, and to automate the whole conversion using AppleScript to export from DT and then call that utility. Unfortunately, I currently don’t have time to do that.

At any rate, it really is true that the engine DT uses is more accurate than the one Acrobat uses, and this may or may not make a difference to the way the “see also” and “classify” features work (it does in my case, although not that much of a difference).

I wonder if there is an official workaround for this? I submitted a ticket yesterday explaining a similar situation when trying to import e-mail through Entourage.

It worries me that I am not alone with this bug in DEVONthink. I bought Pro Office because I had been testing Pro and needed the e-mail archiving solution, so I decided to put all the money into DEVONthink instead of having a separate app such as Mailsteward to handle my e-mail archives.

I tried to import selected e-mails by drag and drop from Entourage, but the naming doesn’t follow the pattern used by the Import Menu. I get names like “Mail from ?” in the Name field, etc. I have already imported all of my folders through the Import Menu, except one big “Archive” folder which holds around 11K e-mails. From there I need only those dated prior to Dec 31, 2009, and I want them imported as cleanly as the ones already imported through the Import Menu.

Hardware: MacBook 2.4 GHz, 4 GB of RAM, Snow Leopard.
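On the date cut-off for the “Archive” folder mentioned above: if the messages can be exported in some form that keeps their standard `Date:` headers (that is an assumption, not something confirmed for Entourage here), the selection itself is simple to sketch in Python. The helper name `sent_before` is hypothetical, and this only decides which messages qualify; it does not import anything into DEVONthink.

```python
from datetime import date
from email.utils import parsedate_to_datetime


def sent_before(date_header: str, cutoff: date) -> bool:
    """Return True if an RFC 2822 style Date header falls strictly before `cutoff`.

    Hypothetical helper: headers that cannot be parsed are treated as
    not matching, so they can be reviewed by hand.
    """
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Different Python versions raise different errors on bad input.
        return False
    # Compare on the calendar date as written in the header's own time zone.
    return sent.date() < cutoff
```

With `cutoff=date(2009, 12, 31)` this keeps everything dated Dec 30, 2009 or earlier; pass `date(2010, 1, 1)` instead if Dec 31 itself should be included.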