PDF or Web Archive?

I have started to do some research and want to save the articles I am reading into a database in DEVONthink Pro. Some of the articles have photos which I want to preserve in the articles themselves. I have a quick question - I do realize that both formats will preserve the photos, however, is it better to save the articles in PDF, or is it better to save the articles in web archive format?

Hi,

I store the crucial part of the page (minus ads etc.) as RTF, via Services. This way, you can still access the links, which is impossible from PDFs, and you get rid of all the junk. The web pages I collected this way look quite harmonious in my db.

The tables in websites were the only problem, so I was looking forward to Tiger’s ability to render tables in RTF. Unfortunately, the results now often look worse than the RTFs from the Panther era, and the RTF capture also tries to render tables that a webpage uses purely for layout, which is worse than running text. Somehow I got used to the situation and now get by with RTF in Tiger as well.

Maria

You’ve left out RTFD (rich text capture including images) as a capture format that, like Web Archives, retains both working hyperlinks and images. But RTFD captures may or may not render tables well, depending on the particular Web page and how the capture is made.

Another advantage of RTFD and Web Archives is that it’s easier to extract text or images from those formats than is the case for PDFs.
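
To illustrate why extraction from a Web Archive is fairly direct: a Safari-style .webarchive file is a property list in which the main page sits under the key WebMainResource and the images and other assets sit in a list under WebSubresources. The following is only a minimal Python sketch of that idea, not something DEVONthink itself provides. The function name list_images and the file name in the usage note are made up for illustration, and the sketch assumes the archive follows the Safari layout just described.

```python
import plistlib


def list_images(archive_path):
    """Return (url, mime_type, data) for each image stored in a .webarchive.

    Assumption: the file uses the Safari-style layout, i.e. a property
    list with a 'WebSubresources' list whose entries carry the keys
    'WebResourceURL', 'WebResourceMIMEType' and 'WebResourceData'.
    """
    with open(archive_path, "rb") as fp:
        # plistlib detects binary vs. XML property lists on its own.
        archive = plistlib.load(fp)

    images = []
    for resource in archive.get("WebSubresources", []):
        mime = resource.get("WebResourceMIMEType", "")
        if mime.startswith("image/"):
            images.append((
                resource.get("WebResourceURL", ""),
                mime,
                resource.get("WebResourceData", b""),
            ))
    return images
```

Calling list_images("article.webarchive") (a hypothetical file name) would return one tuple per embedded image, and the data element of each tuple can be written straight to disk. A PDF offers no comparably simple structure to read images from.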

But PDFs offer the advantage of cross-platform compatibility, which both RTFD and Web Archives lack (although one could always convert them to PDF).

And PDFs offer the advantage of saving disk space compared to the other file formats, especially if a compression utility is used.

So there are pros and cons to each format. It’s your decision, and you may wish to experiment. Personally, I may use any of the three options, depending on the material and what I want to do with it.

Note: I’ve got Pages, and wasn’t all that thrilled with it. But I used Pages to format some of my RTFD documents, and found that, to my astonishment, Pages can produce MS Word versions of the document that correspond precisely to the Pages appearance, including placement of images, and with working hyperlinks and bookmark links. So that’s another ‘standard’ for sharing material with Windows users. If you’ve saved Web Archives, you can later make a rich Note (RTFD) capture from the Web Archive, and through copy/paste into Pages retain the ability to do conversions to MS Word. (Or, of course, export to PDF.) Pages lets you add headers and footers and even footnotes.

You can download an example of a DT Pro RTFD capture, “Screening Estrogens” formatted in Pages and exported as MS Word and PDF. The URL is http://homepage.mac.com/wbdeville/FileSharing4.html.

While on the side topic of sharing material with others, I should also mention DT Pro’s ability to publish material as a Web site. Example: Check out my HTML conversion of an earlier version of the DT Pro Tutorial database at http://homepage.mac.com/wbdeville/DT_Tutorial_Export/Welcome%20to%20DEVONthink%20Pro.html. All I had to do was select the Welcome page and select File > Export > as Web site. I then put the export folder in my iDisk Sites folder and set up the Web page URL.

Bill, I appreciate your thorough reply. I had not thought of RTFD. I will try it as I have no need for cross-platform compatibility. I will look at the links you sent me.

I also appreciate your post, Maria. RTFs wouldn’t work as I want to save images as well.

(Correction on my part, Maria. I just tried saving in RTF. RTFs work just fine. I did not know they would.)

Gino:

Apple’s Cocoa rich text without images saves with the suffix .rtf.

If images are included in the selection and the document is saved, the suffix becomes .rtfd.

Cocoa applications such as TextEdit and DEVONthink Pro can open either rich text format and display it properly.

Applications such as MS Word, however, can open .rtf files but not .rtfd files.

It can get a bit confusing. :slight_smile:
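
The practical difference behind the two suffixes is that a .rtf document is a single file, while a .rtfd document is a package: a folder that holds the text as TXT.rtf alongside the image attachments. That is also why an application expecting a plain file cannot open it. Here is a small Python sketch of how one might tell the two apart on disk. The function names rich_text_kind and attachments are invented for this example, and the sketch assumes the usual package form of RTFD.

```python
from pathlib import Path


def rich_text_kind(path):
    """Classify a path as 'rtf', 'rtfd', or None.

    A .rtf document is a single file. A .rtfd document is assumed to be
    a package: a folder holding the text as TXT.rtf next to its images.
    """
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".rtf" and p.is_file():
        return "rtf"
    if suffix == ".rtfd" and p.is_dir() and (p / "TXT.rtf").is_file():
        return "rtfd"
    return None


def attachments(path):
    """List the files stored beside TXT.rtf in an .rtfd package."""
    return sorted(f.name for f in Path(path).iterdir() if f.name != "TXT.rtf")
```

For a capture with one photo, attachments() would list just that image file, while rich_text_kind() reports which of the two formats the capture ended up in.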

Bill and Gino,

I really do not see the difference. There is no service “Save as RTFD-Memo”. I only see “Save as RTF Memo” or “Append RTF Memo” (in German, so I do not know the exact translation). Using this service gives me nice RTFs and RTFDs, with pictures and without. There is no need for me to think about RTFD or RTF; it just works.

Of course I know of the difference between the RTFD package and the RTF file, but the service is one and the same.

Something wrong?

Maria

Hi, Maria:

No, nothing’s wrong. It’s interesting to note that in the English localization, both the DT/DT Pro Services and the DT/DT Pro browser contextual menu offer the options “Take Rich Note” or “Capture Note”, because the application relies on the OS to save the capture in the correct file format, depending on the selected content.

By the way, did you receive my email attachment showing the RTFD file with properly rendered tables?

Is there a way to automate PDF capture? I imported several found sets from DEVONagent and I would like to convert all the URLs to PDFs of the pages. Currently I go through the “Print” dialog box and choose “Save as PDF” for each one. Unfortunately there are a large number of them, and I would like to do further processing on them in DEVONthink Pro. Is there a more automated way to go about this?

With the current version of DEVONagent it is not possible to use scripting to get the PDF for a webpage. The upcoming version 1.8 will have this capability (and it will probably be available as an Automator action as well).

Annard

Bill,

yes I got it, but one example of nicely rendered tables won’t help. I have documents with nicely rendered tables as well.

Sometimes the Apple text engine renders tables like that on my computer as well, but sometimes it does not. Since I have found no way to influence the result when capturing RTF (or RTFD :wink: ), I have to either check how the results turned out or hope for the best. Sometimes I do the one, sometimes the other.

The second problem is that the text engine now renders not only real tables but also tables that were created purely for design purposes. The RTF is then useless as text.

The problems continue to exist, and in some cases have become worse with Tiger. But this is an Apple problem; I don’t think that DT can solve it…

That is my experience so far: as far as capturing RTF is concerned, things were better with Panther.

Best,
Maria

Later today check out the Automator workflow “Convert URLs to Webarchive” (in the final release of DT Pro) that will convert a selection of URLs to webarchives with the aid of DEVONagent.

This could be adapted later to do the same for PDFs with the future release of DEVONagent.

Thanks Annard. That’s a good tip. Any idea on the timeframe for the new version of DEVONagent? Weeks, months or years away???

David

We can but hope that it won’t take as long as DT Pro… But let’s give Christian a break, he did a stellar job on this new product! :wink:

We’re working on it, no time off for us…

Hi Annard,

Maybe I’m doing something wrong, but several of the URLs I’m trying to create archives for are just not working. The Automator workflow just stops with no explanation other than “failed”. It gets through the AppleScript, loops through opening the URLs in DEVONagent, and then dies when trying to import the resulting archives into a DEVONthink group. Is there some characteristic of an archive that won’t let it import properly?

Here is an example URL that I am having problems with:
http://64.177.91.153/xcart/customer/product.php?productid=332&cat=&page=

David

Hello David,

I tried the URL you gave me and it gave me no problems. Check the Automator log and the Console to see what is going on. If an error occurs it will be logged in the former; any warnings in our code may be logged in the latter.

Automator log: View -> Show Log
Console.app: clean the console window by clicking the sweeper button

Now run the workflow and let me know what any of these logs print out (you can do so in a private message if you like).