Converting .doc, .Pages, .rtf etc. to PDF

johnrover · February 3, 2014, 4:10pm

Hello. I have a number of documents in editable formats in DEVONthink. I would like to convert them to PDF, and have the PDF appear next to the source documents in my DEVONthink folder structure. (And, sometimes, I want to delete the editable version of the document.)

Is there any way to do this quickly?

Right now the only method I can find is to print each one to PDF separately, then manually re-import and file it back into the proper folder. There must be a better way… Any ideas?

Bill_DeVille · February 3, 2014, 6:44pm

Adobe’s Portable Document Format (PDF) was created to be viewable in all common operating systems. That makes sharing documents among colleagues using different computer platforms and operating systems a simple matter.

If that’s your objective, I understand why you wish to convert your existing documents to PDF.

Is that your objective?

The PDF file format is bloated compared to plain or rich text files. It’s difficult to edit PDF documents. Capturing blocks of text to extract excerpts for use in another document can be a hassle. Adobe’s text note annotations are limited to plain text, are not searchable in OS X’s PDFKit (used by DEVONthink) and IMHO are ugly. Those are reasons why I try to minimize the number of PDFs in my research databases.

My main database that I use for research and draft writing contains many thousands of scientific papers and other articles captured from the Web as rich text. The file size of that database is 4.8 GB. If I had captured all of those references as PDF, the file size would exceed 50 GB, and (because of extraneous text on many pages if captured as PDF) the efficiency of searches and of the AI assistants such as Classify and See Also would be significantly reduced.

In that database, comprising about 30,000 documents, only 7% are PDFs (and only 1% are WebArchive, which is more bloated than PDF). For those occasions when I need to share documents with others who don’t use Macs, I’ll send them a version printed as PDF.

I’ve been lovingly building that database of references since 2002. I’m constantly adding new content and pruning obsolete or less useful content.

By contrast, my financial database consists almost entirely of PDFs, including scanned and OCRed documents. This is a smaller database in total number of documents (but relatively larger in file size). It is highly organized by groups, and I do few searches and rarely use the AI assistants.

Still another database consists of methodological documents related to environmental sampling techniques, chemical analytical methodologies, quality assurance methods, data evaluation techniques, risk analysis and cost-benefit methodologies. The contents come largely from U.S. and EU governmental agencies; more than 80% are in PDF format and are long documents, with a bulk of almost 16 GB. I find it useful, but much less efficient for searching and AI use than my main research database. That’s not because of the file format but because of document length: a single document often covers many topics.

No, I didn’t answer your question. But I’m curious as to why you want to move to PDFs as a primary document type.

korm · February 3, 2014, 8:03pm

Not within DEVONthink. There are numerous utilities for this, including command line scripts, but I’ve not found one that does batches of conversions in any way that actually saved me time. For that reason, as a matter of course I make a print-to-PDF version of most final drafts regardless of what software I used to create them.
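For anyone who wants to try the command-line route anyway, here is a minimal sketch in Python, not a tested workflow. It assumes the documents sit in an ordinary folder on disk (for example an exported or indexed folder, not the inside of a database package) and that LibreOffice is installed with its `soffice` command on the PATH. The function names and the extension list are made up for illustration, and .Pages files are left out because I can't say whether LibreOffice will open them reliably.

```python
import subprocess
from pathlib import Path

# Extensions treated as "editable" here; an illustrative choice, adjust as needed.
EDITABLE_SUFFIXES = {".doc", ".docx", ".rtf", ".odt"}


def plan_conversions(folder, suffixes=EDITABLE_SUFFIXES):
    """Return (source, target_pdf, command) tuples for editable files under folder."""
    plans = []
    for source in sorted(Path(folder).rglob("*")):
        if not source.is_file() or source.suffix.lower() not in suffixes:
            continue
        target = source.with_suffix(".pdf")
        if target.exists():
            continue  # never overwrite a PDF that is already there
        # Assumes LibreOffice's headless converter; --outdir keeps the PDF
        # in the same folder as the source document.
        command = ["soffice", "--headless", "--convert-to", "pdf",
                   "--outdir", str(source.parent), str(source)]
        plans.append((source, target, command))
    return plans


def convert_all(folder):
    """Run every planned conversion, stopping at the first failure."""
    for source, target, command in plan_conversions(folder):
        print("converting", source, "->", target)
        subprocess.run(command, check=True)
```

Calling `plan_conversions("/path/to/folder")` only lists what would be converted, so the plan can be checked before `convert_all` touches anything. Deleting the editable originals is deliberately left as a manual step.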

Regarding Bill’s question – speaking for myself, (a) I generally don’t want to share editable documents with clients or others because I don’t want my work modified unless I modify it – partially for revenue protection and partially for content control; (b) a PDF is truly portable – especially between the platforms where I work all the time: OSX, Windows and iOS – none of the other formats the OP mentioned are universally portable – and PDFs insert well into lots of software packages I use. And regardless, storage is dirt cheap, so it doesn’t matter how much space a document takes. I can’t remember the last time in the past 10 years I even bothered to look at the size of a document.

Bill_DeVille · February 3, 2014, 9:11pm

korm, those are valid reasons supporting your uses of PDFs.

As I like to work on laptops with SSD drives, I do care about file size so as to retain plenty of free drive space. Although the memory requirements of my databases don’t differ significantly with the filetypes of their documents, a computer’s performance does begin to degrade a bit once more than half the space on a disk is filled, and still more as free disk space diminishes.

I’ve got a suite of 5 databases that are usually open when DEVONthink Pro Office is running. With that set and a couple more that are stored on my 500 GB SSD, the operating system and other files, I’ve already used 50% of the drive space, but still have room for lots of new documents so long as I keep them space-efficient.

Until SSD drives in the terabyte range become more affordable, I’ll continue to try to keep database sizes trimmed down in file size.

Even when big SSDs become inexpensive, I’ll continue to avoid making full page captures of most Web pages in order to prevent capturing extraneous images and text. Quite often, a full page capture of an article as WebArchive can require 2 orders of magnitude more storage space than a rich text capture of the article alone (PDF would be only slightly smaller). That extraneous text is what I want to avoid, as it makes searches and AI tools less efficient.