Merging a folder of images

I was trying to merge images into a single PDF and run OCR on it using img2pdf and ocrmypdf.
Running it via the Terminal does work.
This is the shell command: img2pdf *.tif | ocrmypdf --output-type pdf -l amh - myfile.pdf

I couldn’t get it to work via a DT smart rule:

on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			set thePath to (the POSIX path of theRecord)
			do shell script "img2pdf " & thePath & "*.tif | ocrmypdf  --output-type pdf  -l amh  - " & thePath & "myfile.pdf"
		end repeat
	end tell
end performSmartRule

Can you guys help me with it?

Is there a reason you’re not using the merge records and convert image or OCR file commands? Anyway, the smart rule fails because groups don’t have a path (and the images inside a group might be located in various folders).

Also, in AppleScript you have to apply quoted form of to every path you pass to do shell script. Otherwise, spaces and other special characters in file names will throw the shell off.
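For reference, AppleScript’s quoted form of wraps a string in single quotes and rewrites every embedded single quote as '\''. A minimal JavaScript sketch of the same rule, in case you build the command string in JXA instead (shellQuote is a hypothetical helper name, not a DEVONthink or JXA API):

```javascript
// Hypothetical helper mimicking AppleScript's `quoted form of`:
// wrap the string in single quotes and turn each embedded ' into '\''
// (close the quote, add an escaped quote, reopen the quote).
function shellQuote(s) {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

// Example: a path containing a space and an apostrophe
const p = "/Users/me/My Scans/Abebe's page.tif";
console.log(shellQuote(p));
// prints '/Users/me/My Scans/Abebe'\''s page.tif'
```

Inside single quotes the shell expands nothing, so spaces, $, backticks and double quotes in file names all survive unchanged.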

I understand the problem. Thank you.

ABBYY’s OCR doesn’t support this language (Amharic), so I am using Tesseract. As for the merge and convert image commands, I want everything to go through a single tool (ocrmypdf).

I will try with Hazel then.

Thank you.

Even apart from the missing quoted form of, this looks fishy to me. thePath is exactly what it says: the path of the record, which always includes the file extension.

But you append *.tif to this path, which makes no sense. In the shell, *.tif expands to “all files in the current directory whose names end in .tif”; appending it to a path that already ends in an extension just produces a pattern that won’t match the files you want. Either you loop over the records selected by the smart rule and use only quoted form of thePath for each one, or you process all matching records at once, in which case you have to build one long string of all the quoted form of thePath values, separated by spaces. Your script mixes these two approaches.
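To make the difference between the two approaches concrete, here is a minimal plain-JavaScript sketch that only builds the command strings. The function names, the output folder and the “image name + .pdf” naming are hypothetical; the per-record variant writes to a separate folder rather than next to the image, since a record’s file may live inside the database package.

```javascript
// Same rule as AppleScript's `quoted form of`: single-quote the string
// and turn each embedded ' into '\''.
function shellQuote(s) {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

const OCR = "ocrmypdf --output-type pdf -l amh";

// Approach 1: one command per record, i.e. one OCRed PDF per image.
// Output goes to outDir, named after the image (hypothetical naming).
function perRecordCommands(paths, outDir) {
  return paths.map(p => {
    const base = p.split("/").pop().replace(/\.[^.]+$/, "");
    const out = `${outDir}/${base}.pdf`;
    return `img2pdf ${shellQuote(p)} | ${OCR} - ${shellQuote(out)}`;
  });
}

// Approach 2: a single command for all records, i.e. one merged PDF.
function batchCommand(paths, outPath) {
  const args = paths.map(p => shellQuote(p)).join(" ");
  return `img2pdf ${args} | ${OCR} - ${shellQuote(outPath)}`;
}

const paths = ["/scans/page 1.tif", "/scans/page 2.tif"];
perRecordCommands(paths, "/tmp").forEach(c => console.log(c));
console.log(batchCommand(paths, "/tmp/myfile.pdf"));
```

Neither variant contains a glob: the file names are listed explicitly, which is what the smart rule gives you anyway.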

In JavaScript (JXA), I’d try something like this:

function performsmartrule(records) {
  const currentApplication = Application.currentApplication();
  currentApplication.includeStandardAdditions = true;
  /* Build an array of quoted path names */
  const allPaths = records.path().map(p => `"${p}"`);
  currentApplication.doShellScript(`img2pdf ${allPaths.join(" ")} | ocrmypdf --output-type pdf -l amh - /tmp/myfile.pdf`);
}

This is untested.
What it does:

  • It gets all the paths in an array (records.path())
  • Using map, it creates a new array (allPaths) in which every path is wrapped in double quotes to protect spaces (this does not protect double quotes, $ or backticks in file names, though).
  • It then calls doShellScript, joining all elements of allPaths with a space character into a single string.
  • The rest is the same as in your script, except for the output file: I use /tmp/myfile.pdf, but that’s not mandatory. Any POSIX path will do here.

The smart rule actually processes groups, which don’t have a path at all (and of course no extension).

Right – I missed that. But then the shell thingy can’t work at all, because groups don’t have a path. Yes, you said that already ;)

So, what one could do:

function performsmartrule(group) {
  const currentApplication = Application.currentApplication();
  currentApplication.includeStandardAdditions = true;
  /* Build an array of quoted path names of all children of this group */
  const allPaths = group.children.path().map(p => `"${p}"`);
  currentApplication.doShellScript(`img2pdf ${allPaths.join(" ")} | ocrmypdf --output-type pdf -l amh - /tmp/myfile.pdf`);
}

Not sure, though, whether children is the right element to use here; we talked about that, but I forgot the subtleties that distinguish records, children, and contents.
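Since a group may contain more than images and several groups may match, a per-group command probably also wants to filter for TIFFs and give each group its own output file. A minimal plain-JavaScript sketch of just the string-building part; groupCommand is a hypothetical name, and sorting pages by file name (numeric-aware) is an assumption about the desired page order:

```javascript
// Same rule as AppleScript's `quoted form of`.
function shellQuote(s) {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

// Build one img2pdf | ocrmypdf command for a single group.
// childPaths: POSIX paths of the group's children.
// Returns null when the group holds no TIFF images.
function groupCommand(groupName, childPaths, outDir) {
  const baseName = p => p.split("/").pop();
  const tifs = childPaths
    .filter(p => typeof p === "string" && /\.tiff?$/i.test(p))
    // Assumption: page order = file name order, so "page 2" precedes "page 10".
    .sort((a, b) => baseName(a).localeCompare(baseName(b), undefined, { numeric: true }));
  if (tifs.length === 0) return null;
  // "/" cannot appear in a file name, so replace it in the group name.
  const out = `${outDir}/${groupName.replace(/\//g, "-")}.pdf`;
  const args = tifs.map(p => shellQuote(p)).join(" ");
  return `img2pdf ${args} | ocrmypdf --output-type pdf -l amh - ${shellQuote(out)}`;
}

const cmd = groupCommand("Scans", ["/db/a/page 10.tif", "/db/b/page 2.tif"], "/tmp");
console.log(cmd);
```

In the smart rule you would call this once per group, passing the group’s name and its children’s paths, and hand the result to doShellScript; that wiring is untested and depends on the children question above.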
