OCR PDF via AppleScript without importing

calebaharrison · July 27, 2016, 9:15pm

Hey Folks,

I searched as best I could, but couldn’t find an answer that worked (“Using Applescript to convert a PDF into PDF+Text” is the closest I could find). I have a folder (‘attachments’) on Dropbox that I use to store the PDFs that are linked to my Bookends bibliography. This folder is indexed to my DTdb. Here’s what I am trying to accomplish:

Add PDF to ‘attachments’ folder.
Hazel rule triggers.
Hazel runs DT AppleScript to OCR file.

So I have a few questions about this.

A. Can I run the AppleScript on a file without importing that file to DT? In other words, can I just use the scripting features of DT to OCR a file?
B. If so, would something like the following work? (‘theFile’ is what Hazel uses as a placeholder for the file it is operating on)

tell application id "DNtp"
	ocr file theFile
end tell

I’m inclined to think that code won’t work (namely because it doesn’t! - so far it is just timing out…). But I’m not sure why. In the referenced post above, the example code suggests that the important line is “ocr file [path-to-file]”. Shouldn’t this work?

Any help/advice/scripts are appreciated!

Thanks,
Caleb

BLUEFROG · July 27, 2016, 10:24pm

No. OCR puts the resulting file into your DEVONthink database. That’s the simple answer.
Could you script something that would do otherwise? Yes, but it would still import, then you’d have to move it, and clean up after yourself.

You’d also have to decide if that’s worth the effort.
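
The import, export, and clean-up sequence described above can be sketched roughly as follows. This is a minimal sketch, not a tested workflow: the handler name `ocrAndExport` and the `outputFolder` parameter are hypothetical, it assumes `theFile` arrives as an alias (as Hazel supplies it), and the exact parameters of `ocr file` may differ between DEVONthink versions. The `with timeout` block is there because AppleScript gives up on a reply after two minutes by default, and OCR can take longer than that.

```applescript
-- Hypothetical handler: OCR a file through DEVONthink, write the result
-- to outputFolder (a POSIX path), then remove the imported record.
on ocrAndExport(theFile, outputFolder)
	-- ocr file expects a POSIX path as text, not an alias.
	set inputPath to POSIX path of theFile
	tell application id "DNtp"
		-- Allow up to 10 minutes instead of the default 2.
		with timeout of 600 seconds
			-- The OCR result is imported into the database as a new record.
			set theRecord to ocr file inputPath to (incoming group of current database)
		end timeout
		if theRecord is not missing value then
			-- Write the searchable PDF to the destination directory.
			export record theRecord to outputFolder
			-- Clean up the copy left in the database.
			delete record theRecord
		end if
	end tell
end ocrAndExport
```

From an embedded Hazel script this might be called as `ocrAndExport(theFile, "/path/to/output/folder")`, with the destination path replaced by a real folder.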

calebaharrison · July 28, 2016, 1:56pm

Thanks for the response! There’s a good chance I’m making this more trouble than it’s worth. I used to have PDFpenPro do my heavy lifting with scripted OCRing, but then they released new versions and I can’t just run the old version anymore for whatever reason; I’d rather know I can rely on DT to do my OCR scripting than think I should also look at other software.

Let me try one more way: I currently have the AppleScript importing the OCR file into a group (‘SendToBookends’). There are group tags (‘bookends’, ‘finished’) in the group, and I understand group tags to be tags that are applied to each record in the group (is that wrong?). Are DT tags accessible in any way, outside of DT? I had thought that DT tags were added to imported files as metadata, and that e.g. Hazel could then access that metadata to act on the files. If the DT tags aren’t accessible, though, then I’ll have to find another way.

It’s not the end of the world if I can’t do what I want in a simple way; it’s just that the scripted OCR features are a huge reason that I’m interested in the Pro Office version. (It’s like magic to send a janky PDF from e.g. iOS Safari to a Dropbox folder, then access that folder in my iOS PDF reader and have the PDF magically be searchable, able to be annotated, and look pretty.) And I want to keep my research PDFs in Dropbox, rather than a DTdb, so that they are accessible to e.g. Bookends, iOS PDF readers, etc.

Thanks again!

calebaharrison · July 28, 2016, 6:54pm

I had thought I’d responded to this, but apparently I hadn’t. (Or I responded to some other topic accidentally.) Whoops.

Here’s what I have currently.

Keyboard shortcut in Skim adds Spotlight comment ‘SendToBookends’ to open window.
Hazel watches smart folder for files with ‘SendToBookends’ comment.
When matched, Hazel deletes comment and OCRs file into a ‘SendToBookends’ folder in DT.
DT ‘SendToBookends’ folder has the following action script:

on triggered(theGroup)
	tell application id "DNtp"
		set theRecords to children of theGroup
		repeat with theRecord in theRecords
			if type of theRecord is PDF document and word count of theRecord is greater than 0 then
				export record theRecord to "/User/Caleb/Desktop"
				move record theRecord to (trash group of database of theRecord)
			end if
		end repeat
	end tell
end triggered

DT moves record in ‘SendToBookends’ folder to the trash.

What I can’t seem to get it to do is to export the document associated with the DT record into a folder. In other words, I’m not sure “export record theRecord to folder_path” is doing what I think it’s doing. Am I way off?

calebaharrison · August 8, 2016, 9:32pm

I’m trying to get export working by itself in the Script Editor, and I can’t seem to do that. What is wrong with the following script?

on triggered(theGroup)
	tell application id "DNtp"
		set theRecords to children of theGroup
		repeat with theRecord in theRecords
			if type of theRecord is PDF document and word count of theRecord is greater than 0 then
				set theData to data of record theRecord
				export theData to "/User/Caleb/Desktop"
				move record theRecord to (trash group of database of theRecord)
			end if
		end repeat
	end tell
end triggered

I would have thought that something like this would export the PDF attached to the record to the desktop. I must be getting something wrong, though, because it doesn’t seem to work.
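
Two things in the scripts above look worth checking, offered as a hedged sketch rather than a confirmed fix. First, macOS home folders live under `/Users`, not `/User`, so the destination directory as written probably does not exist; the corrected path below is an assumption about the intended folder, and the short user name is kept as posted. Second, `export record` expects a record and a destination directory, so passing the record's `data` to `export` is unlikely to work. Triggered scripts are also meant to be attached to a group rather than run from Script Editor, where `triggered` is never called.

```applescript
on triggered(theGroup)
	tell application id "DNtp"
		set theRecords to children of theGroup
		repeat with theRecord in theRecords
			-- Only handle PDFs that already carry a text layer.
			if type of theRecord is PDF document and word count of theRecord is greater than 0 then
				-- Assumed destination: note "/Users", not "/User".
				export record theRecord to "/Users/Caleb/Desktop"
				-- Move the database copy to the trash once it is exported.
				move record theRecord to (trash group of database of theRecord)
			end if
		end repeat
	end tell
end triggered
```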

bjorn · June 13, 2019, 8:30pm

Hi!

Would it be possible to perform OCR using a smart rule in an indexed folder in iCloud Drive and then use Hazel on the same folder?

lutefish · June 13, 2019, 8:43pm

I think several people have variations on this now, and some of the issues in 3.0b2 have been fixed in 3.0b3 - this is a smart rule for an indexed folder-

bjorn · June 16, 2019, 8:25am

Thanks a lot! Any troubles running or resulting inconsistencies?

Regards,
Björn
