PDFs not searchable from Spotlight

Bill De Ville, in another topic, gave me a good suggestion: “I would recommend that you set your preference for importing images and PDF/PS files to copy them to the database folder (check PDFKit for PDF & PS files). After a test import of each file type, check the new document’s Info panel and make sure the Path leads to the database’s Files folder. This has the advantage that these files are actually stored in the Finder, and Spotlight will index them.”

I did what he said, set the preferences for importing to “copy files to database folder” and I imported a bunch of articles. According to the info panel, the articles in question go to some folder within Devonthink. For example, I have an article called “Cooper against Chomsky” and here is the path info: /Users/andreariew/Documents/DEVONthink.dtBase/Files/Cooper against Chomsky-1.pdf

My problem is that I can’t get Spotlight to find any of the articles that I have in the database folder. In fact, I cannot even find the folder on my hard drive (the “DEVONthink.dtBase” file doesn’t have a “file” that can be opened to reveal its contents).

Any suggestions on what I’m doing wrong? I tried forcing Spotlight to reindex my entire HD and I even ran repair permissions before my reindex.

Thanks,

Andre Ariew

Maybe your PDFs are encrypted. Otherwise Spotlight finds any PDF, JPEG, etc. in the DT Files folder.

Maria

As I was writing a response to Maria, telling her that my files are not encrypted and so that’s not the problem, I felt my face turn hot and I swear it morphed into a donkey’s head (just like a Warner Bros. cartoon character). The answer to my question must be rather simple. The key is that there’s a copy of my document in the database, leaving the original on my hard drive, and it is the original that is searchable from Spotlight. I was ridding myself of the originals after copying to DEVONthink’s database folder. So, of course, Spotlight can’t see what was eliminated.

Is that right? Have I got the right answer? Or is there something else going on? Bill said (above): “After a test import of each file type, check the new document’s Info panel and make sure the Path leads to the database’s Files folder. This has the advantage that these files are actually stored in the Finder, and Spotlight will index them.” I read the last sentence as indicating that the files I have just copied to DEVONthink’s database are Spotlight-searchable. Now I interpret the sentence to mean that the original is stored in the Finder…

Andre

The DEVONthink.dtBase is a package file and you can view its contents from the Finder. Right-click on the file and select “Show Package Contents” from the contextual menu. My own experience with Spotlight and PDFs stored in the database folder has been spotty. Some PDFs are searchable from Spotlight and some are not. Some of the ones that Spotlight cannot find are ones that I created, so I know they are not protected.
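
For anyone who would rather check from a script than from the Finder: a package is just a folder, so it can be listed like any other directory. What follows is only a minimal sketch. It assumes the PDFs sit in a `Files` folder directly inside the package, as the Info panel path earlier in this thread suggests, and the function name `list_package_pdfs` is made up for this example.

```python
from pathlib import Path


def list_package_pdfs(package_path):
    """Return the PDFs found under <package>/Files, sorted by path.

    Assumption: the database package keeps imported files in a
    subfolder named "Files" (taken from the path quoted in this thread).
    """
    files_dir = Path(package_path).expanduser() / "Files"
    if not files_dir.is_dir():
        # No such package, or no Files folder inside it.
        return []
    return sorted(
        p for p in files_dir.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling `list_package_pdfs("~/Documents/DEVONthink.dtBase")` should give the same files that “Show Package Contents” reveals, provided the layout assumption holds on your machine.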

I’m going to back off my statement that Spotlight can index files stored in the DT Pro database Files folder, because the results are inconsistent at best.

In an earlier beta of DT Pro, before the public release betas (and an earlier version of Tiger) I was consistently able to see results in Spotlight for items in my database Files folder, including the Records of Sheets & Records. In DT Pro 1.0 and OS X 10.4.2 that’s not the case. I will ask Christian about this.

I’ve seen a hack to Spotlight that would let it index everything in the database files folder. The hack isn’t specific to DT Pro. But Spotlight already has enough problems, and I’ve seen reports of unfortunate consequences from the hack. So I do not recommend it.

Of course, DT Pro version 2.0 will let Spotlight index the files stored inside the database package.

Thank you Maria, Greg, and Bill for your help. I guess the workaround (for now) is what I said before: keep copies, outside the database, of the files that you import into the database folder. Spotlight sees those outside copies.
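
If the originals are already gone, one possible way to get Spotlight-visible copies back is to copy the files out of the package into an ordinary folder. This is a rough sketch, not a tested tool: it assumes the imported files live in a `Files` folder at the top level of the package, and both the function name `export_copies` and the destination folder are hypothetical.

```python
import shutil
from pathlib import Path


def export_copies(package_path, dest_dir):
    """Copy files from <package>/Files into dest_dir; return the names copied.

    Assumption: imported documents sit directly in <package>/Files.
    """
    files_dir = Path(package_path).expanduser() / "Files"
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    if not files_dir.is_dir():
        return copied
    for src in sorted(files_dir.iterdir()):
        if not src.is_file():
            continue
        target = dest / src.name
        # Skip files whose copy already exists and is at least as new.
        if target.exists() and target.stat().st_mtime >= src.stat().st_mtime:
            continue
        shutil.copy2(src, target)  # copy2 also carries over the modification time
        copied.append(src.name)
    return copied
```

Run it again after importing more articles and only the new or changed files should be copied. Note that this doubles the disk space used, which is the price of the workaround.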

If you add the DEVONthink database file to the “custom” catalog of Quicksilver, it is able to see (I think) all the files in the database folder. Granted, it does not see the content of the files, but you can still take advantage of Quicksilver’s other functions, such as opening articles with one keystroke, setting up triggers for actions, etc.

Andre

I just checked it on my machine, where it worked earlier, and indeed, Spotlight does not find the files inside packages any more. Sorry about the mistake. I used that feature intensively for a while when I needed it and did not notice the change.

Maria

I am trying out the demo today.

Am I misunderstanding the capabilities of DEVONthink Pro?

I dragged 300 PDF files into the window and used File Sync… nothing happens?
When I type a search in the loupe for a word in one of the PDFs, it returns ‘No items found!’
These PDF files were scanned into my Mac by a Fujitsu ScanSnap for the Mac, using the ScanSnap software.

Thanks

Steve

Hi Steve

DT is not an OCR program. It can read the text in PDFs, but not images of text. If Preview can search inside your PDFs, DT can too.
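
To sort a big pile of scans into “already searchable” and “needs OCR” without opening each one in Preview, a crude check is to look for font resources in the file: a page that is only a scanned image normally has none, while a PDF with a text layer does. The sketch below is a heuristic under that assumption, not a real PDF parser, so it can be wrong in both directions. The function names are invented for this example.

```python
import re
import zlib
from pathlib import Path

# Matches the raw bytes between "stream" and "endstream" in a PDF file.
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def probably_has_text_layer(pdf_bytes):
    """Heuristic: True if the PDF appears to reference any font."""
    if b"/Font" in pdf_bytes:
        return True
    # Newer PDFs may pack their dictionaries into Flate-compressed
    # streams, so try to inflate each stream and look again.
    for match in _STREAM_RE.finditer(pdf_bytes):
        try:
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image stream)
        if b"/Font" in data:
            return True
    return False


def pdfs_needing_ocr(folder):
    """List PDFs in a folder that show no sign of a text layer."""
    return [
        p for p in sorted(Path(folder).expanduser().glob("*.pdf"))
        if not probably_has_text_layer(p.read_bytes())
    ]
```

Treat the result as a first pass only. Preview’s own search remains the reliable test for any single file.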

OT: The best way to OCR PDFs, I think, is Acrobat Professional. Yes, it will cost a buck or two, but you can script the OCR process so that you can do them in batches. With articles I download from JSTOR (a database of academic journals) I almost always need to perform the OCR step in Acrobat. (ProjectMUSE articles have the text layer intact.) After the OCR step, they are fully indexable in DT. It is a step, but not an extra step, since there’s no avoiding it.

I assume it will be quite some time before version 2.0 comes out. Since I’m starting out here, I want to set things up “right” the first time. I currently have PDFs stored inside my database folder; for now, would it be better to move them outside that folder and simply link to the files from DT Pro? Would that speed things up on the search end inside DT as well?

My main concern in making a final purchasing decision will take speed into account. It’s not the only factor, of course, especially given the power of DTPro.

Thanks to Maria and Bill for their prompt replies to my previous inquiries.

Best,

Steven

Steven:

I’m importing PDFs into the database package Files folder. No difference in speed by comparison to external links.

But having files inside the package makes the database more portable, in case you want to run it on another computer or from an external drive.

Thanks Bill. I like having the files inside the DB folder; but then they remain outside Spotlight’s search capabilities, right? That might be fine, actually, since with Entourage the DB is not searchable from Spotlight. But searches within Entourage itself are quite efficient. And portability is convenient, to be sure.

Best,

Steven

Terceiro: do you have an automated way of doing this that you recommend? Automator or a script for taking PDFs out of DT Pro, into Acrobat, scanning, and putting them back in (or at least somewhere on the hard drive)?

Also, those of you using Acrobat for this: how long does it take your computer (please specify what you have) to scan, let’s say, a 25-page PDF off of JSTOR or the like? Is an iBook G4 just not powerful enough? I’m scanning one now, and it’s only on p. 10 of 25 after about 20 minutes… I can only shudder to think how long it will take on 200 PDFs!

I am currently searching for a paperless office solution and DEVONthink looks like a good lead. However, it appears that there is no Acrobat Standard or Pro available for Mac OS X. The OCR software that shipped with my scanner (Readiris 9) does not work.

What do others use?

thanks,

fellow

Fellow: Yes, Adobe Acrobat (standard and professional) is available for OS X.

ReadIRIS 9 does work, also.

An enhanced version of DEVONthink will include OCR as a new feature. A public beta before the end of 2006 is anticipated, but not definitely promised. :)