Quickly importing scans using a daemon and the Lingon app

I originally posted this thread at MacScripter and have moved it to the DEVONthink forum. Since the issue seems to involve DT more directly, I thought I might get better results here, and ultimately help more people.

I am using an AppleScript and the Lingon application (based on a MacTipper blog entry at this link http://www.mactipper.com/2008/04/use-launchd-instead-of-folder-actions.html) to integrate a Fujitsu ScanSnap scanner with DEVONthink. (BTW, the script and the Lingon app can be downloaded by following links in that blog entry.) In Lingon I’ve set up a daemon to run an AppleScript whenever a folder is modified. The workflow is: scan a document to a Mac folder, “Scan Folder”. The daemon “sees” the change and triggers the AppleScript, which imports the document into my currently selected DEVONthink group. This has been WAY faster than folder actions, which can take anywhere from many seconds to a minute to activate, whereas the daemon is almost always nearly instantaneous (as long as there aren’t too many documents in the scan folder).

Now to the problems: If the scanned file is written to the Mac folder fairly quickly, the file imports into DEVONthink beautifully. But if the file is written slowly (because, for example, a multipage document is being scanned to a single file), then it seems the script is triggered before the scan is finished and before the file has been fully written to the folder. The file does finish writing, and (usually) a comment signifying that the file has been imported is written to the file’s comments. (The script is set up to import only files without this comment.) BUT, again, no import takes place when a longer document is scanned. (Note: Import does work well on files written more quickly, such as one-page PDF scans, and even on multi-page scans, as long as each page is written to a separate file.)

There are two versions of the script; only the first part differs. The first version included my original idea of reading the size of the import folder every few seconds and triggering the import when the size stops changing. (That code is at the very bottom of this post.)

The second version, posted below, includes MacScripter forum user adayzdone’s idea for triggering the script properly. Frankly, though, I don’t quite understand exactly how his code works. However, this second version does, in practice, seem to get marginally better results (fewer documents skipped or dropped on import into DT).

The main goal here is to get the scanned items to import successfully no matter how long the scan takes. When this is all worked out I’ll post the finished product here and at MacScripter, where I started this thread. It really is amazing how quick a daemon is compared to folder actions. I think this could give a lot of DT users a very quick workflow for importing a document into any selected DT group.

Can someone account for the odd behavior and maybe offer a more elegant solution for reliably delaying the import to DT until the longer scans are fully completed and the file is finished being written?

I’m thankful to adayzdone for the suggested code and to MacTipper for the, uh, mac tip.

One other bit of information. Here is the command line Lingon uses to launch the script when the folder is modified:

osascript "/Users/me/Library/Scripts/Folder Action Scripts/Import.scpt"

Here is the script: (I adjusted some of the paths and filenames for security.)

property folder_path : POSIX path of (path to desktop) & "Filers/Scan Folder"

on run
   
   -- This gets the name of the most recent file in the folder. (from adayzdone)
   set xxx to first paragraph of (do shell script "ls -t " & quoted form of folder_path)
   
   -- This checks if the file is busy
   tell application "System Events"
       repeat until busy status of alias (folder_path & "/" & xxx as text) is false
           delay 1
       end repeat
   end tell
   
--original script, with slight modifications
   tell application "Finder"
       set action_folder to ((POSIX file folder_path) as alias)
       set folder_items to every item in action_folder
   end tell
   repeat with an_item in folder_items
       tell application "Finder"
           set item_name to name of (an_item as alias)
           set the_comment to (get comment of (an_item as alias))
       end tell
       if the_comment does not contain "processed" then
           set an_item to (an_item as alias)
           tell application id "com.devon-technologies.thinkpro2" to launch
           tell application id "com.devon-technologies.thinkpro2"
               set theDatabase to "/Users/me/Documents/Data.dtbase2"
               set theGroup to current group
               try
                   set thePath to an_item as text
                   if thePath does not end with ".download:" then
                       import thePath to theGroup
                   end if
               end try
           end tell
           tell application "Finder" to set comment of an_item to (the_comment & " processed")
       end if
       
   end repeat    
   
end run

Notes on this version including adayzdone’s suggested code:
I scan a long document. The file is written to the scan folder, but no import takes place AND no comment is added to the file’s comments. It appears the script is not triggering properly. THEN I scan another document. Now the PREVIOUS document IS imported into DEVONthink and the comment “processed” is added to THAT file’s comments, BUT the file just scanned is ignored by the script. SO, the script DID trigger as the second file was written to the scan folder, but it saw the previously scanned file, which didn’t have the comment yet, and imported it. (Note: Quick scans (e.g. one-page documents), even back-to-back quick scans in the ScanSnap feeder, are imported properly…usually.)

Below is the script code for my original idea of delaying the import:

--My addition to the original script. Here's where I attempt to watch the folder for size changes until the scanned file is finished being written to the folder. (Pressing RUN in the script editor seems to recognize these lines. But not so when the daemon triggers the script.)
repeat
set base_folder_size to size of (info for folder_path)
delay 5
set change_folder_size to size of (info for folder_path)
if base_folder_size is change_folder_size then exit repeat
end repeat
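
The size-polling idea above can also be applied to the single scanned file rather than the whole folder. Here is a minimal shell sketch, assuming it would be called from the AppleScript through do shell script before the import step. The function name wait_until_stable and its default interval and retry count are hypothetical, not part of the original script. One caveat: if the scanner pauses between pages for longer than the interval, the file could look finished when it is not.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): wait until FILE's size
# is the same on two consecutive checks.
# Returns 0 once stable, 1 if it never settles, 2 if FILE is not a regular file.
wait_until_stable() {
    file=$1
    interval=${2:-2}     # seconds between checks (assumed default)
    max_tries=${3:-30}   # number of checks before giving up (assumed default)
    [ -f "$file" ] || return 2
    last=-1
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # wc -c gives the byte count; tr strips the padding BSD wc adds
        size=$(wc -c < "$file" | tr -d ' ')
        [ -n "$size" ] || return 2          # file vanished while we were waiting
        [ "$size" -eq "$last" ] && return 0 # unchanged since the previous check
        last=$size
        tries=$((tries + 1))
        sleep "$interval"
    done
    return 1
}
```

Called as wait_until_stable "/path/to/scan.pdf" 2 30 (an example path), it would check every 2 seconds and give up after 30 checks, so a stuck scan cannot hang the daemon forever.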

devamag:

Congrats on the launchd stuff. Folder Actions are notoriously unreliable for many things but, with the help of Lingon especially, launchd can be used in much more powerful ways.

Cheers!

A few thoughts:

  1. Your code as written doesn’t trap any errors. Maybe try wrapping the entire script in a try block with an on error handler. Some error may be occurring that can be trapped and recovered from.

  2. Googling “applescript busy flag” shows many complaints of the busy flag “not working” and several suggestions for workarounds. The lsof approach given at objectivelabs.com/scripts.php in the “Fast Is File Busy Check (Mac OS X)” code block seems compact and easy to implement, and appears to work. Here it is:


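on isFileBusy(thePath)
	-- Tests whether any process has the file open.
	-- thePath must be a POSIX path; quoted form protects names with spaces, such as "Scan Folder".
	try
		set myscript to "if ( lsof -Fp " & quoted form of thePath & " | grep -q '^p[0-9]' ) then echo 'file is busy'; else echo 'not busy';fi"
		set myResult to do shell script myscript
		if myResult is "file is busy" then
			return true
		else
			return false
		end if
	on error err
		display dialog ("Error: isFileBusy " & err) giving up after 5
		-- Treat a failed check as "not busy" so callers always get a boolean.
		return false
	end try
end isFileBusy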

  3. If real-time performance is not an issue, maybe try scanning to an unwatched folder and periodically moving the files to the watched folder by hand. The two folders should be on the same volume. My thinking is that moving a file between folders on the same volume (probably) does not actually move bits but (probably) only updates the file’s location metadata, so the move should complete almost instantly. That would eliminate what could be some sort of race condition, or just general flakiness in the OS’s updating, or AppleScript’s reporting, of the file’s “busy”, “locked”, or “open” status.
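
To make the staging-folder idea concrete, here is a rough shell sketch. It assumes the scanner saves into an unwatched staging folder and that both folders sit on the same volume, where mv is a rename rather than a copy. The function name release_scans and both folder arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: move finished scans from an unwatched staging folder
# into the watched folder. On the same volume, mv is a rename, so each file
# appears in the watched folder already complete.
release_scans() {
    staging=$1
    watched=$2
    for f in "$staging"/*; do
        # Skip non-files; this also skips the unmatched pattern when the folder is empty.
        [ -f "$f" ] || continue
        # Never overwrite an earlier scan that has the same name.
        [ -e "$watched/$(basename "$f")" ] && continue
        mv "$f" "$watched"/
    done
}
```

It would be run only after scanning has finished, for example release_scans "$HOME/Desktop/Filers/Staging" "$HOME/Desktop/Filers/Scan Folder" (the Staging path is an invented example), so the watched folder never sees a half-written file.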

Maybe these ideas are just shots in the dark, but HTH.

@devamag didn’t mention asking Peter Borg if Lingon could support waiting until a file is written before triggering the next action. (E.g., I know it can pause for a defined count.) Since Peter wrote Lingon, maybe this question should be sent to him?

@Shoolie:

  1. I included an error trap with a try wrap of the main block of the script. (Not sure I did it right.) Got no error dialog. Just a silent failure of the import function.

update on point 1: I went back and commented out the entire “isFileBusy” block. (It’s not commented out below.) Leaving the block out made no difference to the script’s function. (Quick scans import. Long scans don’t.) Either I’m implementing that block incorrectly, or it’s not functioning as I thought it was intended to function. (And the same would probably be true of other script blocks I’ve attempted to use for the same function in previous script versions.)

  2. The lsof approach did not appear to work in this case, assuming I implemented the script block correctly. (Still silent failure of importing long scans.)

  3. Real-time scanning/importing is one of my main goals. It continues to work beautifully on quick scans (about 2-6 seconds), but import continues to fail on multi-page scans to a single file (about 15 seconds). The file is scanned to the folder and the “processed” flag is added to the file’s comments, but no import into DT actually takes place.

@korm: Peter Borg said Lingon does not support delaying until the file is written.

Here is the script with @Shoolie’s suggested code block included, and the addition of an error trap. The script functions, or does not function, as described above:

Thanks for your help thus far. Additional ideas, anyone?

property folder_path : POSIX path of (path to desktop) & "Filers/Home Group"

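on isFileBusy(thePath)
	-- Tests whether any process has the file open.
	-- thePath must be a POSIX path; quoted form protects names with spaces.
	-- Note: nothing in the run handler below calls this handler.
	try
		set myscript to "if ( lsof -Fp " & quoted form of thePath & " | grep -q '^p[0-9]' ) then echo 'file is busy'; else echo 'not busy';fi"
		set myResult to do shell script myscript
		if myResult is "file is busy" then
			return true
		else
			return false
		end if
	on error err
		display dialog ("Error: isFileBusy " & err) giving up after 5
		-- Treat a failed check as "not busy" so callers always get a boolean.
		return false
	end try
end isFileBusy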

on run
	
	try
	repeat 1 times
		tell application "Finder"
			set action_folder to ((POSIX file folder_path) as alias)
			set folder_items to every item in action_folder
		end tell
		repeat with an_item in folder_items
			tell application "Finder"
				set item_name to name of (an_item as alias)
				set the_comment to (get comment of (an_item as alias))
			end tell
			if the_comment does not contain "processed" then
				set an_item to (an_item as alias)
				tell application id "com.devon-technologies.thinkpro2" to launch
				tell application id "com.devon-technologies.thinkpro2"
					set theDatabase to "/Users/me/Documents/data.dtbase2"
					set theGroup to current group
					try
						set thePath to an_item as text
						if thePath does not end with ".download:" then
							import thePath to theGroup
						end if
					end try
				end tell
				tell application "Finder" to set comment of an_item to (the_comment & " processed")
			end if
			
		end repeat
		delay 2
	end repeat
	on error errmsg
		display dialog errmsg buttons {"Oops"}
	end try

	-- Set the age of files that you want to purge from the folder
	set modDate to (1)
	set theDate to current date
	
	-- Check folder and move files to the trash that are older than N.
	tell application "Finder"
		try
			delete (every item of folder "Home Group" of folder "Filers" of folder "Desktop" of folder "me" of folder "Users" of startup disk whose creation date is less than ((current date)) - modDate * minutes)
		end try
		
	end tell
	
end run

Ok. Here’s a strategy. Does this have promise?

Have the script get the date & time, down to the second, of the file that’s being imported. (Does the MacOS assign a time with seconds or just to the minute?)
Have the script repeat until…DT “sees” a file of the same date & time in the current group.
THEN have the “processed” flag added to the comments of the file in the scan folder.

Thoughts on this strategy?