Hi all,
Not sure if this is the place to bring this up; if not, don't answer me 🙂

Most of my imports into DT are PDF files (scientific literature). One problem I have is that some scientific journals encrypt their files. As a result, the concordance contains only the word "encrypted" (although I just noticed this goes away in 1.5.2?). I have discussed the problem with Marcel Weiher (TextLightning), but he is worried about the legal issues of allowing TextLightning to decode these files.

Does anyone have any suggestions for how I can import the text of these PDFs into DT?


You could try deactivating TextLightning; DT then uses the integrated pdftotext tool to import PDF documents. If that doesn't work either, just send me an example file.

Possible work-around for encrypted PDF files:

In earlier versions of OS X, it was possible to get around PDF file encryption by opening and saving the file in Preview. That no longer works in OS X 10.2.4.

I like to use TextLightning to convert PDF text content to RTF, so that I can see the results in DT searches. If the PDF is encrypted, TextLightning can't do the RTF conversion, so DT can't search the PDF content. When that happens, and there's a Web HTML page corresponding to the PDF file (either abstract or full text), I copy the HTML page and paste it into the DT content page for the PDF file. Now a DT search can 'hit' the file.

Here's a more complicated approach, if there's no HTML counterpart to the PDF file: save the encrypted PDF as TIFF pages, OCR the image files, export the OCR'd text to Word, then copy and paste the text into the DT content page (if the PDF is a long file, I'll probably Summarize the text). This doesn't work if the encrypted PDF is a low-resolution image, as happens with some statistics sites. Since all I want to do is search for the relevance of the PDF document (I paid my subscription to the journal and I didn't alter the PDF file), and I'm not going to distribute my database, I think this is fair use. It's the equivalent of dictating into DT content while reading the PDF file, but a little less time-consuming. In the old days, I compiled bibliographies of thousands of references with summaries from handwritten or dictated notes. Now I use DT!

The easiest workaround at the moment is probably to open the encrypted PDF documents in Preview and choose "Print…" (if printing is allowed!) and then "Save as PDF…". This still works under Mac OS X 10.2.4.

Afterwards, deactivate TextLightning (see the "Import" preferences) and import the new PDF documents into DEVONthink. The results (see the concordance) are not always perfect, but anything is better than nothing.

I have a problem that hasn't been discussed. I keep my files synchronized between my home and office computers by carrying a portable FireWire disk back and forth. After synchronizing last night, all of my PDF files appeared as small, unreadable PDFs (like large thumbnails) when opened from within DT, in either the built-in viewer or an external Acrobat Reader. Yet the full files are still in my DT files folder and can be opened and viewed from the Finder.

I worked around this by reimporting, but now I have two copies of every file (one large and one small).

Has anyone else seen this? Any ideas on how to prevent it from happening again? Thanks in advance for your help.

Sounds very strange. Could you please send us either the database or a screenshot (e.g. of the split view showing those "large thumbnails"), plus another screenshot of the Info panel ("General" tab) while such a PDF document is selected? Thanks!