pdf2rtf Service, DevonThink and Pages!

Following my earlier question, I tried the solution outlined. However, Pages just doesn’t want to play ball! It will not accept a PDF directly, even though I got that impression from the information provided:

Do I need to install anything for the PDFKit to work?
The drag and drop from DT (trial, but I’m liking it!) just gives me a picture that cannot be edited. Export as RTFD is the same.

Any help would be appreciated, but DT Pro is not really an option at the moment :smiley:

Questioning, I’m not certain that I understand what you want to do.

DT Pro can import PDF files in several ways, depending on your preferences settings and choice of import methods.

If you wish to import only the text from PDFs into your DT/DT Pro database, leaving the original PDF in the Finder but linked to DT Pro so that you can open it in Preview from DT/DT Pro, here’s how:

DEVONthink Pro > Preferences > PDF & PS settings as follows:
Index & convert: check Use PDFKit (Tiger) OR, under OS X 10.3.9, check Use built-in pdftotext and also check Convert to plain text; and
Index & convert: check Convert to Rich Text
COMMENT: Tiger is recommended. If you use the settings above under Tiger, DT Pro will capture the rich text of a PDF file and link externally to the PDF EVEN IF you have set preferences to import a copy of the file to the database’s Files folder, and DT Pro will NOT import a copy of the file to your database’s Files folder. If you are using Tiger, you do not need any other application or utility to capture rich text from a PDF.

NOTE: IF you encounter a PDF file that is image-only (just a picture of text), then – obviously – PDFKit can’t read the text. The workaround would be to run the PDF through an OCR application in order to convert the image file to a PDF that contains text.

NOTE: If the PDF file is encrypted to prevent copying, PDFKit cannot read the text. A workaround would be to use a third party utility to make a copy of the file with security removed.
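
If you want a rough advance warning that a PDF is encrypted before you try to capture its text, here is a minimal Python sketch. It is only a heuristic of my own, not how PDFKit or DT Pro decides: it scans the raw bytes of the file for the /Encrypt entry that encrypted PDFs carry in their trailer. The function name looks_encrypted is hypothetical, and the check can give a false positive if that literal string happens to appear elsewhere in the file. It says nothing about image-only PDFs.

```python
from pathlib import Path


def looks_encrypted(pdf_path):
    """Heuristic check: True if the raw PDF bytes contain an /Encrypt entry.

    This does not parse the PDF; it only searches for the trailer key
    that encrypted PDFs normally carry. Treat the answer as a hint.
    """
    data = Path(pdf_path).read_bytes()
    # Every PDF file starts with a "%PDF-" header; refuse anything else.
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"{pdf_path} does not look like a PDF file")
    return b"/Encrypt" in data
```

Calling looks_encrypted("some.pdf") on a file that returns True suggests the text capture will probably fail until you have a copy with security removed.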

TIP: Even if your preferences are normally set to import PDF files via File > Import > Files & Folders into your database’s Files folder and display the document in DT/DT Pro as PDF+text (which is the setting I normally use), you can capture the text content of any of your PDFs. First export them to a target folder using File > Export > Files & Folders. Then TEMPORARILY change your Preferences > PDF & PS settings to capture rich text and import the exported files again. You will then have a second import of each PDF file in your DT/DT Pro database, which contains only the rich text content. Note also that you can export the text of a PDF from Acrobat (full version), or capture selections as rich text from within Preview.

NOW ABOUT USING PDF2RTFService:

This is a very convenient way of capturing the text content of PDF files from within any Cocoa application, such as Pages or TextEdit. If that’s what you want to do, it really works.

Follow the instructions to install it.

Then logout/login (or restart) to initialize the new Service. Works under Tiger.

This doesn’t seem to work, at least on my computer. I changed the Preferences to capture Rich Text, but dragging PDFs into DT Pro led to a load of PDFs being imported as PDF + Text. I tried quitting DevonThink to make sure the preference change had taken hold, but to no avail. After checking the preference setting again and logging out to make doubly sure, PDFs still come in as PDF + Text. Any idea what could be going wrong?

Another weird thing: I installed the Service and logged out and in. It doesn’t appear in my Services menu (though the usual DevonAgent, DevonThink, Convert, Format, Speak Text are there). So I tried creating a Services folder in the system Library and putting the Service there, and then restarting the computer, but still no sign of PDF2RTF. Does anybody have any general advice about what to do in these situations?
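
One general check for a Service that won’t show up is to confirm that the bundle really sits in one of the folders where OS X looks for Services. Here is a minimal Python sketch of that check. The bundle name PDF2RTFService.service is my assumption (use whatever name the download actually has), and find_service_bundle is a hypothetical helper, not part of any tool mentioned in this thread. It only confirms the file’s location; it cannot tell you whether the Service itself loads correctly.

```python
from pathlib import Path


def find_service_bundle(bundle_name, search_dirs=None):
    """Return the paths where a Service bundle with this name exists.

    By default, looks in the per-user and system-wide Services folders.
    """
    if search_dirs is None:
        search_dirs = [
            Path.home() / "Library" / "Services",  # per-user Services
            Path("/Library/Services"),             # Services for all users
        ]
    candidates = [Path(d) / bundle_name for d in search_dirs]
    return [p for p in candidates if p.exists()]


if __name__ == "__main__":
    # Assumed bundle name; replace with the real one from the download.
    hits = find_service_bundle("PDF2RTFService.service")
    if hits:
        for path in hits:
            print(f"Found: {path}")
    else:
        print("Service bundle not found in the usual Services folders.")
```

If the bundle is found but still missing from the Services menu, the location is probably not the problem.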

ricki:

The method I described works for me. Temporarily set the PDF & PS import preferences to BOTH use PDFKit (Tiger) AND Convert to Rich Text. Under Tiger, this currently overrides other instructions in Preferences, e.g., to import PDFs into the database Files folder. The result will be a capture of the text of the PDF file, which is NOT itself imported into the database but remains externally linked.

Try again, using the preference settings I noted. This time (to reduce variables), import your PDFs by selecting File > Import > Files & Folders rather than doing it by drag & drop. I’ll bet it works.

Note: I recommend SmartWrap 2.5 for formatting captured text. It’s not perfect (e.g., indentations in scripts aren’t followed), but it does a good job of making the captured text much more readable with just one click via Services. Not free, but if you do a lot of reading of captured text, it reduces puzzlement.