Any way to batch convert files to searchable PDFs?

jprint714 · July 21, 2012, 10:21pm

I have a number of .doc, .rtf, and .docx files that I’d like to convert to PDF so that I can annotate them.

Is there any way to batch convert them to searchable PDFs in DTP? If not, could anyone recommend another way to do this?

Thanks!

I’m also wondering whether anyone has ideas about another file format worth converting to after annotating the PDFs. Searchable PDF files can get bulky, space-wise, so maybe there’s a way to compress them into another file type once I’m done annotating them. Any thoughts?

Thanks again…

cgrunenberg · July 23, 2012, 8:22am

This script can be used with rich text documents, so Word files have to be converted via Data > Convert > To Rich Text first.


-- Convert documents to (paginated) PDFs
-- Created by Christian Grunenberg on Mon Dec 01 2008.
-- Copyright (c) 2008-2011. All rights reserved.

tell application id "com.devon-technologies.thinkpro2"
	try
		set theSelection to the selection
		if theSelection is not {} then
			show progress indicator "Converting..." steps (count of theSelection)
			set theWindow to missing value
			repeat with theRecord in theSelection
				set theName to (name of theRecord) as string
				step progress indicator theName
				if cancelled progress then exit repeat
				
				-- Skip groups, feeds and smart groups; only documents are rendered to PDF
				set theType to type of theRecord
				if theType is not group and theType is not feed and theType is not smart group then
					if theWindow is missing value then
						set theWindow to think window of (open tab for record theRecord)
					else
						set record of theWindow to theRecord
					end if
					
					-- Wait until the document has finished loading before rendering it
					repeat while loading of theWindow
						delay 0.5
					end repeat
					
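					-- Render the document as a paginated PDF; the commented-out line
					-- below renders it without pagination instead
					set theData to paginated PDF of theWindow
					-- set theData to PDF of theWindow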
					
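					-- Errors for an individual record are silently skipped by this try block
					try
						-- Create the PDF in the first parent group, then replicate it into the others
						set theParents to parents of theRecord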
						set thePDF to create record with {name:theName, URL:(URL of theRecord) as string, type:PDF document} in (item 1 of theParents)
						repeat with i from 2 to (count of theParents)
							replicate record thePDF to (item i of theParents)
						end repeat
						
						-- Store the rendered PDF data and copy the original's dates, comment and label
						set data of thePDF to theData
						set creation date of thePDF to creation date of theRecord
						set modification date of thePDF to modification date of theRecord
						set comment of thePDF to comment of theRecord
						set label of thePDF to label of theRecord
					end try
				end if
			end repeat
			if theWindow is not missing value then close theWindow saving no
			hide progress indicator
		end if
	on error error_message number error_number
		hide progress indicator
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
end tell

jprint714 · July 25, 2012, 8:07pm

This looks great! Thanks so much for putting this together… Really appreciate it! I’ll put it together and let you know how it goes…

By the way, I don’t think .docx Word files can be converted via Data > Convert > To Rich Text… If I’m wrong about this, is there some other way to do it?

Thanks so much again!

Greg_Jones · July 25, 2012, 8:24pm

Using the Convert>To Rich Text works fine with .docx documents as long as the original formatting is not overly complex (multiple columns, flowing paragraphs, etc.).
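If Data > Convert > To Rich Text won't handle some .docx files, there is a workaround outside DEVONthink. Nobody in this thread suggested it. macOS ships with a built-in `textutil` command that can batch-convert .doc and .docx files to .rtf. You can then import the resulting files and run them through the script. The Python sketch below makes a few assumptions:

* `textutil` is on the PATH. It ships with macOS only.
* The Word files sit in a single folder.
* `plan_conversions` and `convert_folder` are hypothetical helper names.

Complex layouts will likely lose formatting this way too.

```python
import shutil
import subprocess
from pathlib import Path

# Word formats that need converting to rich text before the DEVONthink script can use them
WORD_SUFFIXES = {".doc", ".docx"}


def plan_conversions(folder: Path) -> list[tuple[Path, Path]]:
    """Return (source, target) pairs for Word files that don't yet have an .rtf sibling."""
    pairs = []
    planned = set()
    for source in sorted(folder.iterdir()):
        if source.is_file() and source.suffix.lower() in WORD_SUFFIXES:
            target = source.with_suffix(".rtf")
            # Skip files converted on an earlier run, and avoid two sources
            # (e.g. notes.doc and notes.docx) writing to the same .rtf
            if not target.exists() and target not in planned:
                planned.add(target)
                pairs.append((source, target))
    return pairs


def convert_folder(folder: Path) -> int:
    """Convert each planned file with macOS textutil; returns the number converted."""
    if shutil.which("textutil") is None:
        raise RuntimeError("textutil not found (it is only available on macOS)")
    converted = 0
    for source, target in plan_conversions(folder):
        # Equivalent to: textutil -convert rtf -output <target> <source>
        subprocess.run(
            ["textutil", "-convert", "rtf", "-output", str(target), str(source)],
            check=True,
        )
        converted += 1
    return converted
```

For example, take a hypothetical folder and call `convert_folder(Path("~/Documents/To Convert").expanduser())`. It would write `report.rtf` next to `report.docx`. Files that already have an `.rtf` sibling are skipped, so you can safely re-run it on the same folder.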

jprint714 · July 25, 2012, 10:37pm

Ok, thanks. Strange that it’s not working on my end…wonder why.

Anyway, thank you again for all of your help, guys!

arnow · August 3, 2012, 11:55am

I believe it is easier, and gives much better results, to use Word rather than DEVONthink to convert .rtf, .doc, or .docx files to PDF. Have a look at the relevant Microsoft and Apple forums (for example, this discussion) to find out how this can be automated. I batch convert Word files to PDF myself by dropping them onto an Automator app running Spazek’s script. Works excellently!

jprint714 · June 10, 2013, 5:20pm

Hi @arnow, I realize it’s been a while since we exchanged posts about this issue, but I’ve just tried to set up and run the script you recommended and found that it doesn’t work. I saved it to the DTP scripts and am trying to run it from within DTP. Are you able to do the same? Did you make any changes to the script you suggested? Just wondering. It would be great to have this working… Thanks again.