Filename changed after Import, OCR and Delete

Hi,

If I use this script, a file named aaa.bbb.pdf becomes aaa_bbb.pdf after the script has run.
The Import and Delete script does not change the filename.

Changing the following line
set theRecord to ocr file thePath to incoming group
to
set theRecord to import thePath to incoming group

keeps the filename intact, but then no OCR is performed.

Cheers
Chris

I don’t know which script you are referring to - I don’t seem to have one which is called “Import, OCR and Delete”. As such, I can only guess what is going on* (although I will take a closer look if you can point me to the script or post it here [enclosing the script with ```, please; that way the script is quoted here in the appropriate format]).

A workaround might be to add `set origName to name of theRecord` at the beginning of the script (assuming theRecord is already defined at that point and points to the incoming file) and `set name of theRecord to origName` at the end of the script. If you don’t script yourself, I’ll be happy to take a look and make a more specific suggestion if you provide the original script.

* I could imagine this is happening as part of a routine to avoid “double file endings” such as “.txt.pdf”, a routine which is maybe being a little overzealous in this case. As such, you may want to experiment with my suggestion above. Assuming you are OCR’ing to PDF and not always exclusively importing PDFs, you will probably need a more complicated routine which only replaces the part of the filename before the file ending (in your case, setting origName to aaa.bbb with something along the lines of `set origName to text 1 thru ((length of origName) - 4) of origName`, and then adding back “.pdf” later).
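A minimal sketch of that idea in plain AppleScript, which splits at the last dot instead of assuming a four-character ending. The handler name `splitName` is made up for illustration and is not part of DEVONthink or of the script under discussion.

```applescript
-- Hypothetical helper (not part of DEVONthink): split a filename at its last dot.
-- Returns {baseName, extension}; the extension keeps its leading dot,
-- or is "" when the name has no usable extension.
on splitName(fileName)
	set dotOffset to 0
	-- Walk backwards so that only the last dot counts as the extension separator.
	repeat with i from (length of fileName) to 1 by -1
		if character i of fileName is "." then
			set dotOffset to i
			exit repeat
		end if
	end repeat
	-- No dot at all, or only a leading dot (e.g. ".hidden"): the whole name is the base.
	if dotOffset ≤ 1 then return {fileName, ""}
	set baseName to text 1 thru (dotOffset - 1) of fileName
	set ext to text dotOffset thru -1 of fileName
	return {baseName, ext}
end splitName

set {baseName, ext} to splitName("aaa.bbb.pdf")
-- baseName is "aaa.bbb", ext is ".pdf"
```

Keeping the base name and the ending separate makes it possible to restore the original base name later and append whatever ending the OCR result actually has.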

Hi Blanc,
I thought that this routine was part of the DEVONthink installation.
It is one of the folder action scripts. It is 10 years old.

Or do you have another script which will do these steps (import and OCR) automatically when a PDF file is created in a folder?

kind regards
Chris

-- DEVONthink - Import, OCR & Delete.applescript
-- Created by Christian Grunenberg on Fri Jun 18 2010.
-- Copyright (c) 2010-2017. All rights reserved.
on adding folder items to this_folder after receiving added_items
	try
		if (count of added_items) is greater than 0 then
			tell application id "DNtp" to launch
			repeat with theItem in added_items
				set thePath to theItem as text
				if thePath does not end with ".download:" and thePath does not end with ".crdownload:" then
					set lastFileSize to 0
					set currentFileSize to 1
					repeat while lastFileSize ≠ currentFileSize
						delay 0.5
						set lastFileSize to currentFileSize
						set currentFileSize to size of (info for theItem)
					end repeat
					try
						tell application id "DNtp"
							set theRecord to ocr file thePath to incoming group
							if exists theRecord then tell application "Finder" to delete theItem
						end tell
					end try
				end if
			end repeat
		end if
	end try
end adding folder items to

I believe that script is left over from DT2; you might want to read this thread.

The quick and dirty solution would seem to be as follows: as you suggested in your first post, use `import thePath` rather than `ocr file thePath`. Then, between the lines `if exists theRecord then tell application "Finder" to delete theItem` and `end tell`, add the following:

set theName to name of theRecord
set thePath to path of theRecord
set theFile to ocr file thePath
set name of theFile to theName
move record theRecord to trash group of theRecord's database

(Sorry, I can’t paste this into the script you have posted; if you might enclose your entire script in ```(that is: put ``` on a line above and below your script) it would become more workable for anybody who wanted to copy it, or otherwise work with it)

What my addition does is to copy the name of your file (theRecord), OCR it, set the resulting file to use the same name as the source and then bin the original file.
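For reference, here is a sketch of how the folder action might look with both changes in place. It is simply the script posted above with `import` swapped in and the five lines added where described; it has not been run as a whole, and the comments are added for orientation only.

```applescript
-- Sketch only: the posted folder action with the suggested changes applied.
on adding folder items to this_folder after receiving added_items
	try
		if (count of added_items) is greater than 0 then
			tell application id "DNtp" to launch
			repeat with theItem in added_items
				set thePath to theItem as text
				-- Skip downloads that are still in progress.
				if thePath does not end with ".download:" and thePath does not end with ".crdownload:" then
					-- Wait until the file size stops changing.
					set lastFileSize to 0
					set currentFileSize to 1
					repeat while lastFileSize ≠ currentFileSize
						delay 0.5
						set lastFileSize to currentFileSize
						set currentFileSize to size of (info for theItem)
					end repeat
					try
						tell application id "DNtp"
							-- Import instead of OCR, so the original name is kept.
							set theRecord to import thePath to incoming group
							if exists theRecord then tell application "Finder" to delete theItem
							-- Remember the name, OCR the imported copy, restore the name.
							set theName to name of theRecord
							set thePath to path of theRecord
							set theFile to ocr file thePath
							set name of theFile to theName
							-- Bin the un-OCR'd import.
							move record theRecord to trash group of theRecord's database
						end tell
					end try
				end if
			end repeat
		end if
	end try
end adding folder items to
```

Note that if the import fails, the lines after it raise an error that the surrounding `try` swallows, so the source file is left in place.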

I can’t test this, because I don’t really understand under which circumstances you would use the original script, and I don’t have a working copy of it. But I have tested my addition, and that seems not to cause any harm. Using it is up to you, though :slight_smile: if it causes an earthquake or gives you COVID, I’m not going to be responsible :wink:

import and ocr are not the same commands.

> if I use this script a file with name aaa.bbb.pdf will be aaa_bbb.pdf after this script

This is likely to ensure broad compatibility across platforms. Development would have to assess this.

The “.” is changed to an “_” as the ABBYY OCR fails with multiple “.” in the name.
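To illustrate the effect described here, the following plain AppleScript sketch reproduces the renaming seen in this thread (aaa.bbb.pdf becoming aaa_bbb.pdf). DEVONthink's actual implementation is not shown anywhere in this thread; the handler name `underscoreInnerDots` and its logic are assumptions made purely for illustration.

```applescript
-- Hypothetical illustration (not DEVONthink's code): replace every dot
-- except the last one, so the name ends up with a single "." before the extension.
on underscoreInnerDots(fileName)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "."
	set parts to text items of fileName
	-- Fewer than three parts means at most one dot: nothing to replace.
	if (count of parts) < 3 then
		set AppleScript's text item delimiters to savedDelimiters
		return fileName
	end if
	set ext to item -1 of parts
	-- Re-join everything before the extension with underscores.
	set AppleScript's text item delimiters to "_"
	set baseName to (items 1 thru -2 of parts) as text
	set AppleScript's text item delimiters to savedDelimiters
	return baseName & "." & ext
end underscoreInnerDots

underscoreInnerDots("aaa.bbb.pdf")
-- returns "aaa_bbb.pdf"
```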

Hi Blanc,
thank you very much for this post. The code you sent seems to be working.

kind regards
Chris
