One of the problems with documentation is that most users rarely read it, and even fewer would read it were it to become highly detailed.
The downloadable PDF documentation is searchable and provides a brief description of all of the menu and contextual menu commands.
But the additional layer you suggest would be a gargantuan task: never complete, always under revision, and probably not a useful way to lay out for examination the tools one might choose for a particular task.
That’s why I usually recommend that new users start small and experiment with the features.
Your PDF capture of a Wikipedia page is a good illustration. Were the links captured? Yes. Were all of them “marked”, and so visible without a mouseover? No. One answer to the “why” question is that designers of Web pages have considerable latitude, and it could take a lot of documentation to cover all of the possibilities.
A PDF download of a Web page may not capture all of the links used on that page, because the page may have been designed with JavaScript or other techniques that PDF conversion can’t interpret.
And a user of DT Pro who is still running under OS X 10.3.9 can capture a Web page as PDF, but the resulting PDF won’t capture hyperlinks. DT Pro makes a great many calls to the operating system (PDF captures are only one example), and there are significant differences among Panther, Tiger and Leopard that affect what happens. That’s obviously also true of some scripts.
A user’s choice of Web browser introduces important variables. Safari and other WebKit browsers (including DEVONagent and the DT Pro browser) are Cocoa applications and make use of OS X Services. Camino and Firefox, whatever their other virtues, are not Cocoa-based and have limited or no communication through Services. Another variable among browsers is their AppleScript dictionary, the built-in scripting “hooks”; Firefox doesn’t work well with AppleScript. So no set of scripts can be designed to work equally well among all browsers, unless it were limited to the level of the least scriptable browser.
So when I want to capture information from the Web, I limit my browser choices to those that communicate well through Services and work richly with AppleScript. That excludes Camino and Firefox, for example. I use (depending on my workflow of the moment) DEVONagent, DT Pro’s built-in browser or Safari, in order of decreasing “richness” of the capture-to-database options.
Another big set of variables: I use rich text captures of Web material almost exclusively. I want to capture just the text and graphics of an article (and the hyperlinks and source URL) and to exclude all other material on that page. I don’t care about the Web site designer’s layout; I just want the information content that’s important to me. That has important consequences for the efficiency of searches and See Also use, as well.

I subscribe to online scientific journals such as Science and Nature. They provide downloadable copies of articles in searchable PDF format. I don’t put those in my database. Instead, I select the text and images from the HTML view of the articles. My captures are richer, because they include hyperlinks usually not present in the PDF versions, and “tighter”, because they avoid extraneous material such as the end of the previous article and the beginning of the next article.
But if I were a Web page designer, I would have very different interests and consequently a different choice of capture format, so a different workflow from beginning to end.
My preference for rich text captures is part of my overall workflow. I do my writing inside my database in rich text. I can easily extract quotations from other rich text documents. I can insert hyperlinks into rich text documents, or use highlighting and insert page marker cues. When I’ve finished a draft I’ll copy/paste it into Pages or Papyrus for final polishing, including layout, footnotes and endnotes. (If I need to furnish the final output in MS Word, either of those can do an acceptable conversion to Word. Pages accepts my draft material from the database via copy/paste beautifully. A transfer of images to Papyrus requires more tinkering, but I have other reasons for liking Papyrus for some purposes. I’ve got MS Office, but avoid using it.)
I generally don’t download pages as PDF, for the additional reason that extracting a quotation from a PDF requires editing to get rid of all those fixed line endings. But I do download PDFs of pages from my bank’s Web site, because that gives me an accurate record of transactions, and captures as HTML or WebArchive are not permitted on such a secure site. The simplest method is to “print” the page as PDF, using DT Pro’s Save to DEVONthink Pro script in the File > Print panel under the PDF button. (A different script to save the page as PDF won’t work on my bank’s secure site if it attempts to grab the URL, download and convert.) And I’ve scanned thousands of pages into databases as searchable PDF+Text.
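As an aside, the chore of removing those fixed line endings can be automated. Here’s a minimal sketch in Python; the function name and the heuristics (blank lines mark real paragraph breaks, a hyphen at end-of-line marks a split word) are my own assumptions, not anything built into DT Pro, and PDFs vary in how they wrap text:

```python
import re

def unwrap_pdf_text(text: str) -> str:
    """Rejoin hard-wrapped lines in text copied from a PDF.

    Assumes each visual line ends in a newline and real paragraph
    breaks appear as blank lines (a common, not universal, pattern).
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Rejoin words hyphenated across a line break.
        joined = re.sub(r"-\n(?=\w)", "", para)
        # Replace the remaining fixed line endings with spaces.
        joined = re.sub(r"\s*\n\s*", " ", joined)
        cleaned.append(joined)
    return "\n\n".join(cleaned)

sample = "Extracting a quota-\ntion from a PDF means\nediting out line endings.\n\nNext paragraph."
print(unwrap_pdf_text(sample))
```

A script like this could sit between the PDF and the rich text draft, so quotations paste in as flowing text. The hyphen rule will occasionally mis-join a word that was legitimately hyphenated, so the output still deserves a quick read-through.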
Re your question about styles and lists in rich text, just activate the Ruler for a text document. The Ruler options act just as in TextEdit.
Just wanted to illustrate a few of the variables involved in choices of tools for a workflow. Most of them can’t be described within the documentation of a single application. Overall, a good workflow should honor the personal preferences of the user, and should efficiently move along to the desired end result. Although I prefer rich text captures, I’ve got a mix of file types in my database because I have to handle a variety of sources and contexts.
I find that I end up using a relatively small subset of all the tools available in DT Pro Office for most of my work. I’ve settled on those through experiment and practice. When new features come along, I sometimes adapt to them and change procedures. For example, rich text captures of tables in early versions of OS X were unusable. Now they are pretty good.