Staged Indexing


I have a ridiculous number of PDF files (well over 100K), with sizes varying from a few KB to many MB (a few over 100 MB). Obviously, I don’t want to store all of these in DEVONthink’s database because that wouldn’t be efficient. However, the problem I run into now is that the indexing process sometimes crashes or fails on particular documents (not all are searchable PDFs). Then, when restarting, it goes all the way back to the beginning. I propose the following:

Import the list of files to be indexed first along with a status. Then, a background process can actually perform the indexing without stopping useful work from being accomplished.
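To make the idea concrete, here is a rough sketch in Python of what such a staged list could look like. This is not how DEVONthink works internally, and none of these names come from its API: the `manifest` table, the status values, and the `index_one` callback are all placeholders I made up for illustration. The point is only that each file's status is saved as it goes, so a restart picks up where it left off, and a file that took the whole process down is flagged instead of being retried forever.

```python
import sqlite3
from typing import Callable, Iterable, List, Optional, Tuple


def open_manifest(db_path: str) -> sqlite3.Connection:
    """Open (or create) the on-disk list of files and their indexing status."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS manifest ("
        " path TEXT PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'pending',"  # pending / in_progress / done / failed
        " error TEXT)"
    )
    conn.commit()
    return conn


def stage_files(conn: sqlite3.Connection, paths: Iterable[str]) -> None:
    """Stage 1: record the file list only. Already-known paths are left alone."""
    conn.executemany(
        "INSERT OR IGNORE INTO manifest (path) VALUES (?)",
        [(str(p),) for p in paths],
    )
    conn.commit()


def run_pending(conn: sqlite3.Connection,
                index_one: Callable[[str], None],
                limit: Optional[int] = None) -> int:
    """Stage 2: index pending files one by one, saving status after each."""
    # Anything still 'in_progress' was being indexed when a previous run died.
    conn.execute(
        "UPDATE manifest SET status = 'failed', error = 'interrupted' "
        "WHERE status = 'in_progress'"
    )
    conn.commit()

    rows = conn.execute(
        "SELECT path FROM manifest WHERE status = 'pending' ORDER BY path"
    ).fetchall()
    if limit is not None:
        rows = rows[:limit]

    for (path,) in rows:
        conn.execute("UPDATE manifest SET status = 'in_progress' WHERE path = ?", (path,))
        conn.commit()
        try:
            index_one(path)  # hypothetical: whatever actually indexes one file
        except Exception as exc:
            conn.execute(
                "UPDATE manifest SET status = 'failed', error = ? WHERE path = ?",
                (str(exc), path),
            )
        else:
            conn.execute(
                "UPDATE manifest SET status = 'done', error = NULL WHERE path = ?",
                (path,),
            )
        conn.commit()
    return len(rows)


def status_counts(conn: sqlite3.Connection) -> dict:
    """Summary such as {'done': 2, 'failed': 1}."""
    return dict(conn.execute("SELECT status, COUNT(*) FROM manifest GROUP BY status"))


def failed_files(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Files that could not be indexed, with the recorded reason."""
    return conn.execute(
        "SELECT path, error FROM manifest WHERE status = 'failed' ORDER BY path"
    ).fetchall()
```

With something like this, staging 100K paths is quick because nothing is read yet, and `run_pending` can be called repeatedly (optionally with a `limit`) without ever starting over from the beginning.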

I’m going to guess that this could be done using a script; unfortunately, I’m not a scripting person, and I find AppleScript a bit confusing.

If this is something already implemented, please do let me know. I’ve got tons of documents that I’d love to turn the AI loose on, and perhaps it could help me organize things better.

Actually, it shouldn’t crash, as a background task is used for indexing. Has anything been logged to the system console? Could you send the crash logs to cgrunenberg - at - Thanks!

I have to say I find the underlying request in this post quite appealing. I would find it great if there really were an “indexing queue” in DTpro that one could turn on and off (even cooler: one that could also be triggered to run by a sleep event, i.e. whenever the screen saver was active).

I now have several very large databases, circa 35 GB. When I add a document with 50-100K words, it stops my work for up to sixty seconds with a spinning cursor. I assume a lot of this is memory management (DEVONthink uses about 2.5 GB of virtual memory on my computer once I’ve done a search) and that the OS is churning through memory to allow DT to do its indexing.

If I could turn indexing off for a while, work with the documents, and then turn it back on (or have it automatically turn back on), this would help my work tremendously.

In fact, the desire for this has grown as DTpro has become more useful. I am writing a book of 120K words, broken into Word files with chapters of circa 35K words. I store and organize these files in DEVONthink. Each time I save, DT has to reindex. It would be so nice if I could just turn that off for a while and let it reactivate when I walk away from my computer.

This would also be useful when dropping tons of files into the sorter. I sometimes go on a research binge, download 30 articles in Safari, and start dragging PDFs from Skim, Safari, etc. into DTpro, into a folder that I’ve placed in the sorter. DTpro and the sorter become totally unresponsive during this time, and the system is very sluggish. It would be awesome to be able to just drop 50 files into the folder of my choice and tell DTpro to wait until I say go to index them. Then I could “get to work” with the files, renaming them, using them, etc., and let the indexing happen when convenient for me.

I understand this isn’t a trivial request to implement, as the whole question of state comes to the fore. But, nevertheless, this would save me years of waiting, 30 seconds at a time!

If the queue that built up also showed the files that couldn’t be indexed, and allowed me to go to them, that would be nice. If there were an AppleScript command to turn the processing on and off, that would be great, too.
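Just to illustrate the behavior I have in mind, here is a small Python sketch of a queue that can be switched on and off and that remembers which files failed. It is only a toy under my own assumptions: `PausableIndexer` and `index_one` are made-up names, not anything DTpro or AppleScript actually provides, and a real implementation would have to deal with the state questions mentioned above.

```python
import queue
import threading
from typing import Callable, List, Tuple


class PausableIndexer:
    """Toy indexing queue: starts paused, runs in the background when resumed."""

    def __init__(self, index_one: Callable[[str], None]) -> None:
        self._index_one = index_one          # hypothetical per-file indexing step
        self._queue: "queue.Queue[str]" = queue.Queue()
        self._running = threading.Event()    # cleared = paused, set = running
        self.failed: List[Tuple[str, str]] = []  # (path, reason) for later review
        worker = threading.Thread(target=self._loop, daemon=True)
        worker.start()

    def add(self, path: str) -> None:
        """Dropping a file in only queues it; nothing is indexed yet."""
        self._queue.put(path)

    def resume(self) -> None:
        """Turn indexing on (for example, when walking away from the computer)."""
        self._running.set()

    def pause(self) -> None:
        """Turn indexing off. A file already being indexed still finishes."""
        self._running.clear()

    def wait_until_idle(self) -> None:
        """Block until every queued file has been handled."""
        self._queue.join()

    def _loop(self) -> None:
        while True:
            path = self._queue.get()
            self._running.wait()             # sit here while paused
            try:
                self._index_one(path)
            except Exception as exc:
                # Keep going, but remember what could not be indexed and why.
                self.failed.append((path, str(exc)))
            finally:
                self._queue.task_done()
```

The queue starts paused, so one could `add` fifty files, keep working, and call `resume()` when convenient; afterwards `failed` holds the files that need a closer look.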