importing pdfs from web

I’ve scanned several threads here on this subject but am not yet clear on this.

What I want to do: import from the web a pdf document which, in DTP, will be indexed & searchable, & contain any images that were in the original. Links that were in the original would be nice but don’t seem to come up much in the pdfs I want.

I use Safari 2, OS 10.4.11, and latest version of DTP. PDF prefs set to Use pdfkit.

What I just tried: Bill’s suggestion of saving the web-pdf to a folder in the finder, then importing it to DTP. I did not use a folder actions script as he recommended (haven’t tried those at all yet), but just used DTP’s File> Import> Files & Folders menu item.

successful import though no choice given me as to what DTP folder I wanted it to appear in.
images: did not contain any, so haven’t answered that question yet.
searchable: yes although the search terms were highlighted in a dull grey that made it hard to pick them out; in all other documents in DTP, the search terms are highlighted in blue, controlled I suppose by Apple Prefs panel.
size of import: text 82 k, entire import 1100 k. Is this all the pdf formatting? or does it include other information?

Other questions:
Some posts alluded to the fact that some methods of import would make the DTP database harder to back up to external media or transfer to a new computer. Can someone expand on this, please?


In the current versions of OS X (10.4.11 and 10.5.3) “printing” a Web page as PDF will result in a searchable PDF that includes the text, images and (usually) the hyperlinks of the Web page.

DT Pro and DT Pro Office provide a script to capture the PDF version of the Web page and allow one to store it in any desired group in the open database. To do this while viewing a Web page, invoke the Print command (Command-P). In the Print panel, click on the PDF button, then choose “Save to DEVONthink Pro” among the available options. Select the group into which the PDF document is to be stored.

About hyperlinks in the resulting PDF: I’ve seen a few cases in which hyperlinks are not captured. Wikipedia pages captured as PDFs do capture the hyperlinks, but because of the way Wikipedia pages are set up, many of the hyperlinks will look like plain text even though they work when clicked.


Thanks for the reply, I learned a lot from it about importing web pages as pdfs via Print dialog, but must not have been clear on my question.

I have been putting pdf files into my DTP db as text because I don’t understand how to do otherwise, and I really have looked in the Help and the Forums. Maybe my question is too primitive!

Let’s say I google something and find a pdf I want to put in DTP. I have Safari set to open pdfs in Safari, rather than in Adobe. In Safari they do not open as pdfs: no images, no thumbnail navigation panel, etc. If I copy what I see and put it in DTP via New with clipboard, then I get this not-pdf text (editable, indexed, searchable, but not pdf). If I save the page after opening it in Safari, it saves with a pdf extension but is a webpage, not a pdf. So working from Safari with these settings does not yield what I want.

If I go back to the Google page and drag the url of the pdf to the DTP icon in the dock, the icon darkens as if accepting the file but the file does not seem to appear when I search DTP for the keywords.

So, in order to get a pdf as a pdf, do I have to go to the google page (in this example), control-click and Download Linked File to my disk, and then do another step to import this into DTP?

Safari is capable of displaying PDF files, complete with images. When a PDF file is displayed, it can be saved to the Finder using Save As and the resulting Finder file is in fact a PDF file. If you saved it into a Finder folder to which the Folder Action script to Import to DEVONthink Pro is attached, it will be imported into your database automatically. Otherwise, you can manually invoke File > Import > Files & Folders and select the PDF in the Finder for import.
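If you're ever unsure whether a file Safari saved is really a PDF or just a web page with a .pdf extension, one quick check is to look at its first bytes: a genuine PDF file begins with the characters %PDF-. Here is a minimal Python sketch of that check. The function name and the example path are made up for illustration; nothing here is part of DEVONthink or Safari.

```python
PDF_MAGIC = b"%PDF-"  # every PDF file is expected to begin with this header


def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF header bytes.

    This is a deliberately strict check: some PDF readers tolerate a little
    junk before the header, but this sketch does not.
    """
    with open(path, "rb") as f:
        head = f.read(len(PDF_MAGIC))
    return head == PDF_MAGIC


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py /path/to/saved-file.pdf  (hypothetical path)
    for name in sys.argv[1:]:
        verdict = "looks like a PDF" if looks_like_pdf(name) else "is NOT a PDF"
        print(name, verdict)
```

A file that fails this check (for example, a saved web page) would need to be downloaded again, e.g. with Download Linked File, before importing.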

But you say that Safari doesn’t display PDFs.

I’m wondering if you have installed software that creates errors in your operating system. The Unsanity haxie ShapeShifter is notorious for causing problems, for example. My own preference is to avoid hacks to the operating system; I keep OS X on my computers pretty stock.

If you have Acrobat or Adobe Reader installed on your computer, an Adobe plugin is installed in your boot volume’s /Library/Internet Plug-Ins/ folder. I always immediately delete that plugin, as it’s not necessary for viewing PDFs under Tiger or Leopard.

In the Finder Info panel, you should list the “parent” of PDFs (the “Open with” application) as either Acrobat or Preview (I choose Preview, because I prefer the Find routine in Preview to that of Acrobat).