Can't select text after "Save to DT"

I’ve been pulling lots of PDFs directly into DT (1.9.2) using the “Save to Devonthink.scpt” plugin (i.e., I open the file in Preview or with the Schubert plugin in Safari, hit “Print,” then select the “Save to Devonthink.scpt” option). This is a marvelous tool, except for one odd problem. I now realize that both DT and Preview are partially blind to text imported this way: I can’t select any of the text in the files, and if I search in Preview for words I know are in the file, it just reports “no occurrences.” This is despite the fact that the text is DISPLAYED fine in both DT and Preview, and DT even does the word count and classification correctly. The Information panel in DT doesn’t indicate anything odd: the file is not locked, and its kind is reported as “PDF+text.” The problem seems to occur only with the “Save to DT” script. If I take the same starting PDF, save it to my hard drive, and then import that, the resulting file has text that can be selected, etc. But that’s less desirable, because I have to save the file elsewhere on my drive first and then import it instead of going directly into DT. Suggestions?

[1] Symptoms: Can’t select text either in DT or Preview, and Preview can’t search for text in these files.

Not being able to select text is normal behavior in DEVONthink for PDF + Text files; what DT displays is not the text itself but an image of the page. The real clue is that Preview can’t find text in the files you’ve imported via the “Save to DEVONthink.scpt” option. That suggests there IS no text in the files; they seem to be image only. The files can be read, but only as images of text.
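For anyone who wants to test the "image only" guess without relying on DT or Preview, here is a rough sketch in Python. It is not part of DEVONthink or of the "Save to DEVONthink" script; the function name `describe_pdf` is made up for illustration. It rests on the assumption that a PDF page containing real text references a font resource (a `/Font` entry), while a scanned or rasterized page only carries image objects. It looks at the raw bytes only, so a PDF that keeps its objects in compressed object streams could be misreported. Treat the answer as a hint, not proof.

```python
import re

# Hypothetical helper for illustration only; a raw-byte heuristic, not a PDF parser.
FONT_KEY = re.compile(rb"/Font\b")                 # matches /Font but not /FontDescriptor
IMAGE_KEY = re.compile(rb"/Subtype\s*/Image\b")    # image XObject marker


def describe_pdf(data: bytes) -> str:
    """Guess from raw bytes whether a PDF carries text or only page images."""
    # The %PDF- header should sit near the start of the file.
    if b"%PDF-" not in data[:1024]:
        return "not a PDF"
    if FONT_KEY.search(data):
        # Fonts are referenced, so some text is probably drawn as text.
        return "probably has text"
    if IMAGE_KEY.search(data):
        # Images but no fonts: pages are likely pictures of text.
        return "probably image only"
    return "unknown"
```

To use it, read a file in binary mode and pass the bytes in, e.g. `describe_pdf(open(path, "rb").read())`, once for the copy saved to disk and once for the copy that the script stored in DT. If the first reports text and the second reports image only, that would support the idea that the text is being lost at the 'print' step.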

But you say that, if you save the PDF file to disk, then import it to DT (using, I assume, pdftotext conversion), the resulting text in DT is selectable, and Preview can search the text when the PDF file is opened in Preview. That knocks out two possible explanations: a copy-protected PDF, or a version of PDF that Preview can’t handle properly.

I’m stumped. Anyone else have an explanation? I’ve not seen this kind of problem in the instances when I’ve ‘printed’ a Web page as PDF.

[2] I don’t use the Save to Devonthink.scpt for importing PDFs into my database. Many of the PDFs that I want to capture contain bookmarks and/or hyperlinks, and those features are lost when one ‘prints’ a PDF to save it. That’s a bummer.

[3] There’s an alternative approach that is highly automatic, and preserves the PDF in its original form. It uses Folder Action scripts that were included in the Scripts & Macros folder (on the Disk Image of your DT PE 1.9.2 download).

Because that’s a topic that may be of general interest, I’ll post a mini-tutorial on how to do it in the Tips & Tricks section a bit later – perhaps by tomorrow night.

Bill – I look forward to your tutorial! I just re-confirmed that PDFs that are “saved” directly into DT act as if they are images only, without text, when you open them directly in Preview, even though DT knows and analyzes the textual content of the same file. I don’t know anything about PDF file structure, but it’s almost as if DT sucked the text layer out of the file (maybe stored it in its own database?) and left only the image layer in the file that gets stored in the Library/Application Support/Devonthink/Files folder.