"Words" button on pdf's in DT?

Hi there

When I open a PDF in DT that I’ve previously imported, I notice that some don’t have a “Words” button next to the “Classify” and “See Also” buttons.

Why is this? If I import a bunch of PDFs with the same settings, some get indexed but others don’t.


Me again…

Could it be because the PDF has a security setting attached?

I’m guessing this is the case.

If there’s no “Words” button, then the imported PDF does not contain any text. The most likely reasons are…

  1. encrypted documents
  2. documents containing only vector/bitmap graphics
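
A rough way to test for either cause outside DT, sketched in Python. It assumes the `pdftotext` command-line tool is installed and on your PATH (the script will raise an error otherwise). The function names are made up for illustration, and the `/Encrypt` check is only a crude byte search, not a real PDF parse.

```python
import subprocess


def looks_encrypted(pdf_bytes: bytes) -> bool:
    # Crude heuristic: encrypted PDFs reference an /Encrypt dictionary.
    # A plain byte search like this can give false positives.
    return b"/Encrypt" in pdf_bytes


def has_text(extracted: str) -> bool:
    # True if the extracted text contains at least one letter or digit
    # (page breaks and whitespace alone do not count).
    return any(ch.isalnum() for ch in extracted)


def extract_text(path: str) -> str:
    # Assumes pdftotext is installed; "-" sends the text to stdout.
    result = subprocess.run(
        ["pdftotext", path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=False,
    )
    return result.stdout


def diagnose(path: str) -> str:
    with open(path, "rb") as f:
        data = f.read()
    if looks_encrypted(data):
        return "possibly encrypted"
    if not has_text(extract_text(path)):
        return "no extractable text (image-only or pure graphics?)"
    return "has text"
```

Calling `diagnose("some.pdf")` on a file that lacks the “Words” button should then hint at which of the two reasons applies.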

On a somewhat related note: I have the DT PDF manual in my database, and it has added a bunch of run-on "words" to my concordance, such as

"Chooseyourpreferredviewforthefrontmostwindowinthe" and "youcanalsocomfortablyhighlightimportant"

What is causing this?


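The conversion of PDF documents to text (whether using pdftotext or TextLightning) may fail. PDF documents often contain no such things as words or strings, only individually positioned characters, so those utilities have to rebuild the text from the layout. This does not always succeed, and when the spaces between words are not recovered you get run-ons like the ones you quote.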
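
To see how many of these slipped into a concordance, one option is to flag implausibly long tokens. A minimal Python sketch, assuming you can get the word list out as plain text; the function name and the 25-letter cutoff are arbitrary choices for illustration, not anything DT provides.

```python
def suspect_run_ons(words, max_len=25):
    # Flag purely alphabetic tokens longer than max_len characters;
    # ordinary English words rarely get that long.
    return [w for w in words if w.isalpha() and len(w) > max_len]


concordance = [
    "window",
    "Chooseyourpreferredviewforthefrontmostwindowinthe",
    "youcanalsocomfortablyhighlightimportant",
    "highlight",
]
print(suspect_run_ons(concordance))
```

With the sample list above, only the two run-on strings from the manual are flagged; "window" and "highlight" pass through.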

I like Index import very much, but the "index" will be blank (hence, unsearchable) if the PDF file is image-only or encrypted. To check whether an Index import succeeded, open Info for the new file. "Kind" should display "PDF + text" if it worked.

If a PDF is image-only (contains no text) and you REALLY need to capture the text, it may be possible to run it through an OCR application, such as ReadIris 9.

If the PDF is encrypted, TextLightning version 3 may be able to capture the text. Or, if printing is allowed, print it to a PDF file from Preview, then try importing that file into DEVONthink.