"Words" button on pdf's in DT?

bongoman · February 5, 2004, 5:40pm

Hi there

When I open a pdf in DT that I’ve previously imported, I notice that some don’t have a “Words” button next to the “Classify” and “See Also” buttons.

Why is this? If I import a bunch of pdf’s with the same settings, some are getting indexed but not others.

Bongoman

bongoman · February 5, 2004, 6:00pm

Me again…

Could it be because the pdf has a security setting attached?

I’m guessing this is the case.

cgrunenberg · February 5, 2004, 7:27pm

If there’s no “Words” button, then the imported PDF does not contain any text. The most likely reasons are…

encrypted documents
documents containing only vector/bitmap graphics

klanxner · February 8, 2004, 2:37am

On a somewhat related note: I have the DT pdf manual in my database. And this has added to my concordance a bunch of run-on "words" such as

"Chooseyourpreferredviewforthefrontmostwindowinthe" and "youcanalsocomfortablyhighlightimportant"

What is causing this?

Thanks.

cgrunenberg · March 12, 2004, 9:24pm

The conversion of PDF documents to text (using either pdftotext or TextLightning) may fail. PDF documents often contain no such things as words or strings, so those utilities have to rebuild the text from the positioned characters, and this does not always succeed.
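As a rough illustration of how such run-on "words" could be spotted after conversion, here is a minimal Python sketch. It is not part of DEVONthink; the function name `find_run_on_words` and the 25-letter threshold are assumptions chosen for the example, and a few genuine long words may be flagged as well.

```python
import re

def find_run_on_words(text, max_len=25):
    """Return alphabetic tokens longer than max_len letters.

    Heuristic only: very long tokens often mean the PDF-to-text
    conversion lost the spaces between words.
    """
    tokens = re.findall(r"[A-Za-z]+", text)
    return [t for t in tokens if len(t) > max_len]

# Sample text using the two run-on "words" quoted earlier in this thread.
sample = (
    "Chooseyourpreferredviewforthefrontmostwindowinthe and "
    "youcanalsocomfortablyhighlightimportant next to normal words"
)
print(find_run_on_words(sample))
```

Tokens flagged this way only point at documents worth re-converting or checking by hand; the sketch does not try to split them back into words.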

Bill_DeVille · March 12, 2004, 11:36pm

I like Index import very much, but the "index" will be blank (hence, unsearchable) if the PDF file is image-only, or is encrypted. To check the success of an Index import, open Info for the new file. "Kind" should display "PDF + text" if successful.
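A similar check can be approximated outside DEVONthink before importing. The Python sketch below assumes the pdftotext command-line tool is installed and on the PATH; the helper names `extract_text` and `has_searchable_text` are made up for this example, and treating any non-zero exit status as "no text" is a simplification.

```python
import subprocess

def extract_text(pdf_path):
    """Run pdftotext on pdf_path and return the extracted text.

    The "-" argument asks pdftotext to write to standard output.
    Assumption: any non-zero exit status (for example, a file the
    tool cannot open or is not permitted to extract) counts as no text.
    Raises FileNotFoundError if pdftotext is not installed.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
    )
    return result.stdout if result.returncode == 0 else ""

def has_searchable_text(extracted):
    """True if the extracted text contains at least one letter or digit.

    Output made up only of whitespace or page-break characters is
    treated as empty, which is what an image-only PDF would likely give.
    """
    return any(ch.isalnum() for ch in extracted)
```

For example, `has_searchable_text(extract_text("manual.pdf"))` (with a hypothetical file name) returning False would suggest the file is image-only or protected, so an Index import would likely come out blank.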

If a PDF is image-only (contains no text) and you REALLY need to capture the text, it may be possible to run it through an OCR application, such as ReadIris 9.

If the PDF is encrypted, TextLightning version 3 may be able to capture the text. Or, if printing is allowed, print it as a PDF file from Preview – then try an import to DEVONthink.