Importing webpage as pdf in DTPO strange behavior

Hello to Everyone,

Searched through the forum plus purchased a Take Control eBook on DTPO and still haven’t seen any mention of this…

When I import a webpage as PDF, the PDF ends up huge or the formatting is off depending on how I import.

In both Safari and Chrome, when I use the bookmarklet I end up with a PDF of 1246 x 3875 pixels (that’s just one example… it varies between 1200+ x 3500+ pixels).

Copying the same URL into DevonAgent then converting to PDF gives me similar results.

If I drag the URL as a bookmark into the Sorter and later convert it to PDF, I end up with something in the range of 1000+ x 1700+ pixels.

The above are all viewable within DTPO, but Preview and other PDF viewers/editors become useless.

If I use the “Save PDF to DevonThink PRO.scpt” via the print drop-down menu, I end up with 612 x 792 pixels, the same as “Print to PDF” through OS X.
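A side note on those numbers: PDF page sizes are normally expressed in points (1/72 inch in default PDF user space) rather than pixels. Assuming that is what is being reported here, the sizes can be turned into physical dimensions. The short Python sketch below does the conversion for two of the sizes mentioned above; the function name `points_to_inches` is made up for illustration and is not part of DEVONthink or OS X.

```python
# Assumption: the reported sizes are PDF points, where 1 point = 1/72 inch.
POINTS_PER_INCH = 72


def points_to_inches(width_pt, height_pt):
    """Convert a page size in PDF points to inches, rounded to 2 decimals."""
    return (
        round(width_pt / POINTS_PER_INCH, 2),
        round(height_pt / POINTS_PER_INCH, 2),
    )


# Sizes quoted in the post above.
sizes = {
    "bookmarklet capture": (1246, 3875),
    "print menu script": (612, 792),
}

for label, (w, h) in sizes.items():
    w_in, h_in = points_to_inches(w, h)
    print(f"{label}: {w} x {h} pt = {w_in} x {h_in} in")
```

Read this way, 612 x 792 is exactly a US Letter page (8.5 x 11 in), while 1246 x 3875 would be a single page of roughly 17 x 54 inches, far larger than any standard paper size.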

Using DTPO v.2.3 on OSX 10.6.8

Since I’m on the topic I was hoping someone might be able to answer 2 more questions.

  2. Is there a way to change which app the “open in external editor or viewer” toolbar shortcut opens?

  3. When I print webpages from this forum (among others), the formatting is lost and I get text on top of text… I realize that’s a question better left to another forum, but seeing how informative this forum has been (can’t wait to get these posts into DTPO!!!) I figured I’d ask…

If there’s no easy solution to number 3, would the following thread be the easiest workaround?

Sorry if any of this is obvious; I really appreciate any help.

Chris

  2. To change the default app that opens a document when you use the ‘Open Externally’ icon in the Toolbar: Go to the Finder and select any file of the filetype in question. Press Command-I to open the Info panel. Choose the application that’s to open this file. Then (immediately under that option) extend that choice to all files of this filetype.

  3. I rarely capture a Web page as PDF, as that’s not a very efficient approach for file size. More importantly, I rarely want to capture the entire Web page, as there are often extraneous elements that add to file size and that may reduce the focus (efficiency) of searches, Classify and See Also by including text that’s not related to the article of interest.

To capture a page such as a thread in the user forum, I usually select the desired area of the page in Safari or in DEVONagent Pro. Most captures are made as rich text capture of the selected area using the Service command, ‘Command-)’. In the case of scripts that are presented in a scrollable box (as part of the selected area of the page), I’ll capture the selected area as WebArchive, using the command, ‘Command-%’. These Services captures are instantaneous and do not require a reload of the Web page. (But they don’t work in Firefox or Chrome.)

First, Thanks Mr. Deville for your quick and thorough reply !

As per 2. I’d prefer not to load my editor (PDFPen Pro in my case) as my system wide PDF viewer since it’s much slower than Preview for basic PDF viewing. I guess the “open with…” contextual menu will have to suffice.

  3. Thanks!!! Those are great shortcuts!!!

Any suggestions concerning 1. ?

I really do appreciate your help !

Chris

If you wish to print an entire thread from a phpBB forum, such as the one you’re reading now, the print icon at the top of the forum (see image) yields a very nicely formatted page. Over here, I always access this forum from a bookmark in DEVONthink. When I want to save a thread, I click the print icon and, once the printer version is displayed, use “Capture PDF” from the contextual menu (control-click the page) to save it directly to the Inbox of my database.

Regarding your first question about the ginormous PDF: if the problem is happening with just that URL, then there’s probably not much that can be done, since the rendering of the page depends on factors that are outside the control of the OS X PDF-capture software that DEVONthink and other programs use. In addition to Bill’s suggestions, have you tried Safari Reader or the Readability bookmarklet to extract a plain-text version of the page?

Thanks Korm!! You guys are awesome!! Between these two replies I’m seeing a much more functional approach to “data hoarding”.

That original issue still has me perplexed, though. If DTPO is using the OS X PDF-capture software, why such mixed results?

Just to reiterate:
Using the bookmarklets “DTPO to PDF” or “DTPO to …(paginated)” in both Safari and Chrome yields an EXTRA LARGE PDF.

Opening a bookmark in DTPO and capturing to PDF yields a LARGE PDF.

Yet the “Save to DTPO.scpt” in the Print drop-down menu yields the same results as “save to PDF” from OS X… ???

This is true for every URL I’ve tried, including this thread… Shouldn’t each of these processes be using the same OS X pdf-capture software?

As far as PDFs go, I only use OS X’s built-in Preview and PDFPen Pro. None of the Adobe tools. As for printing PDFs, I have an Automator action as described here
macsparky.com/blog/2008/3/19 … -os-x.html

I don’t see how any of that would affect PDF capturing…??

Very odd… Same URL and three different results. I’m not even sure how to go about troubleshooting this one…

Thanks Again for your VERY helpful input !!!

Chris

No, I haven’t. Do they relate to the Safari Reading List? Forgive my ignorance, as I’ve been a long-time FF and Session Manager user going back to my PC days. I recently switched to Chrome, but it sounds like DTPO fancies Safari.

Safari and DEVONagent Pro handle Services very well.

The Reader button in Safari’s URL address field presents a rich text view of the primary article on most pages, and can pull together articles that span more than one page.