Search fails to find words in pdf

I am a very early stage user, so please forgive me if this question is silly, but…

I option-command dragged a few pdfs to my database, and then I searched for some words in those pdfs. The words are found in some of the pdfs, but not others. It appears that those words are part of the hidden text of the pdfs, but DEVONthink isn’t finding them.

Screenshot 1 shows my database with a number of pdf and rtf entries. One pdf is selected, and the word “Hyptis” can be plainly seen in the scan.

Screenshot 2 shows the database searching for Hyptis, and that document is not part of the found set.

Screenshot 3 shows the same document as in 1, now open in Skim, searching for “Hyptis”. Skim selects the word shown in 1, indicating that the word is part of the hidden text in that pdf.

Any thoughts about why DTP is not seeing this word, and how I can educate it?

A

Check the document type. It might be PDF, rather than PDF + Text. Only documents of the second type can be searched. If it is PDF only then you’ll have to OCR.
Or perhaps it’s something more complicated?

Sorry, I’ve just realised that the document you are wanting to search is PDF + Text.
Well, that’s all I can think of. Probably some expert will come along and solve your problem.

When I run a search I always do it in the full Search window (Tools > Search). Why? Because I can instantly inspect the search configuration, and also because there are features not available to searches using the little search field in view windows.

Try that search using the full Search window. Make certain that it’s database-wide. Perhaps your previous search had been inadvertently limited to a certain group. Or perhaps you were doing Name searches, rather than All searches.

Thanks, Bill, but that is not the problem. It appears that DEVONthink is not indexing the pdf content for some reason (or the index is not being updated).

PS–I am adding .skim files to experiment with note taking, as you suggested somewhere on this forum, hence the new files shown in the search results. But the original pdf does not show up, even tho the search word appears to be in the hidden text.

Just to be clear, I imported the .pdf into DevonThink using a drag from the Finder with option and command keys pressed. According to the Help file:

which I think means that DEVONthink should keep the index up to date if the file is changed (i.e., re-OCR’d in Acrobat). Perhaps this is a misunderstanding?

If the Index-captured file is subsequently modified, the File > Synchronize command should be invoked to update the database.

Assuming that you had Index-captured a Finder folder into the database, if you select the corresponding group in the database and choose File > Synchronize, all the content of that group will be updated.

Alternatively, there is a script, ‘Synchronize.scpt’ that can be found in the Extras folder of the DEVONthink application’s download disk image (DT Pro and DT Pro Office only). Save that script to your computer.

If you select a database group that contains Index-captured content, you can open the Info panel, click on the ‘Select’ button to the right of the ‘Script’ field and browse for and select the ‘Synchronize’ script. This will install the script on that group.

Now, whenever you click on that group in the database to open it, the Index-captured files will be automatically updated.

Also note that text annotations added to PDFs are not indexed by DEVONthink, as these are not part of the searchable text layer of PDFs. (That’s one of several reasons why I don’t use text annotations in PDFs; instead I make my notes and annotations in rich text notes that I link to the source document, whatever its file type, by hyperlink. Eric’s Annotation smart template, which can be invoked by a keyboard shortcut, is an example of such notes.)
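To illustrate why an Index-captured file that is re-OCR'd afterwards stays unfindable until File > Synchronize is run, here is a toy Python model. It is only a sketch of the general idea (search consults a stored snapshot, not the live file); `ToyIndex`, `capture`, `synchronize` and `demo` are made-up names for illustration and say nothing about how DEVONthink is actually implemented.

```python
import os
import tempfile


def _stamp(path):
    """Return (modification time, size), used to detect a changed file."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


class ToyIndex:
    """Hypothetical index that keeps a snapshot of each file's text."""

    def __init__(self):
        self.snapshots = {}  # path -> (stamp at capture time, text at capture time)

    def capture(self, path):
        with open(path, encoding="utf-8") as f:
            self.snapshots[path] = (_stamp(path), f.read())

    def search(self, word):
        # Search looks only at the stored snapshots, never at the files.
        word = word.lower()
        return [p for p, (_, text) in self.snapshots.items() if word in text.lower()]

    def synchronize(self):
        # Re-capture every file whose stamp differs from the stored one.
        updated = []
        for path, (stamp, _) in list(self.snapshots.items()):
            if _stamp(path) != stamp:
                self.capture(path)
                updated.append(path)
        return updated


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scan.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("scanned page image, no text layer yet")
        index = ToyIndex()
        index.capture(path)

        # The file is changed outside the index (think: re-OCR in another app).
        with open(path, "w", encoding="utf-8") as f:
            f.write("scanned page image, text layer added: Hyptis")

        before = index.search("Hyptis")   # snapshot is stale
        index.synchronize()
        after = [os.path.basename(p) for p in index.search("Hyptis")]
        return before, after
```

In this toy model the word is in the file all along after the edit, yet the search only sees it once `synchronize()` has refreshed the snapshot.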

Thank you! That solves it.

Somehow I thought that the Synchronize command was only for imported folders. I see that I was mistaken.

Actually, the Synchronize command has no effect on imported groups. They don’t hold the Path back to the folder in the Finder.

I’d like to second the OP. Same problem here. Searching with the toolbar search box fails to find some files for some unknown reason. I double checked all search options. Manually selecting the text in DTPO & copying it to paste into TextEdit confirms that the correct text is present, but just isn’t found. Something is wrong. Some files are found, others aren’t.

I’ve never found an instance where a valid search query that works in the full Search window, and is also valid for the more limited options in the Toolbar search field, will fail in the Toolbar search. But I find that it’s much easier to make mistakes in the Toolbar search. :slight_smile:

There’s something else to keep in mind in designing searches, by the way. One can search by Name, Content, etc. Each of these should be understood as an individual field.

What happens if I do an All search and construct a query for a term that’s only in the Name field, AND another term that’s only in the Content field? Try it. This is a case in which the AND operator will return a null result. This is quite logical, of course. I’ve looked in all the ‘fields’ of the document, and didn’t find one for which the AND expression was valid (the OR operator would have returned a result).
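The per-field behaviour described above can be sketched in a few lines of Python. This is a simplified model with plain substring matching, not DEVONthink's search engine; the function `field_search` and the sample document are hypothetical.

```python
def field_search(doc, terms, operator="AND"):
    """Return True if at least one single field satisfies the query by itself.

    doc is a dict of field name -> text, e.g. {"name": ..., "content": ...}.
    With "AND", every term must occur in the same field;
    with "OR", any one term in any one field is enough.
    """
    combine = all if operator == "AND" else any
    lowered = [t.lower() for t in terms]
    return any(
        combine(t in value.lower() for t in lowered)
        for value in doc.values()
    )


# Hypothetical document: one term lives only in the name, the other only in the content.
doc = {"name": "Field notes", "content": "Hyptis specimens collected"}

and_hit = field_search(doc, ["notes", "hyptis"], "AND")
or_hit = field_search(doc, ["notes", "hyptis"], "OR")
```

Here `and_hit` is False because no single field holds both terms, while `or_hit` is True.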

Many of my PDF documents had cryptic names when downloaded or scanned into my database. I usually rename them by selecting some text in the content of the PDF and choosing the contextual menu option, ‘Set Title As’. Presto. I’ve eliminated that particular logical conundrum noted above.

Most of the time, when designing searches, I’m really interested in the Content of documents.

Ah, this is probably the issue. I must say that, unlike you, I think this quite illogical. I think a search for All should lump all fields (content, tags, name, etc) into one long string, then search this long string using an AND operator. Just like Spotlight seems to do.

I note that the DTPO manual is vague in this regard: “Searches all elements of a document.”

I tested this with DTPO & again with Spotlight. I used a text document containing the sentence “I like potato” & called the document “giraffe”. I then did an All search in DTPO for “potato giraffe” & another search using Spotlight. DTPO follows Bill’s logic & fails to find the document. Spotlight follows my logic & finds the document.
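The two readings of an All search can be compared on this exact test case with a small Python sketch. Both functions are guesses at the observable behaviour using plain substring matching; neither is taken from DTPO or Spotlight, and the names `matches_per_field` and `matches_combined` are made up here.

```python
def matches_per_field(doc, terms):
    """AND must be satisfied inside one field (the behaviour observed in DTPO)."""
    lowered = [t.lower() for t in terms]
    return any(all(t in value.lower() for t in lowered) for value in doc.values())


def matches_combined(doc, terms):
    """All fields lumped into one string before applying AND (the Spotlight-like idea)."""
    haystack = " ".join(doc.values()).lower()
    return all(t.lower() in haystack for t in terms)


doc = {"name": "giraffe", "content": "I like potato"}
query = "potato giraffe".split()

per_field = matches_per_field(doc, query)
combined = matches_combined(doc, query)
```

With this document `per_field` is False and `combined` is True, which matches the results reported for DTPO and Spotlight respectively.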

OK, now that I know the source of the “feature” (or “bug”, perhaps :slight_smile:), I can work around it by using Spotlight. Now, to work out how to set a Spotlight Smart Search for DTPO documents only…