DT3b2 Smart rule issue with OCR

I set up a smart rule to OCR whatever my network scanner scans and saves into an index group. My OCR preference “Move to Trash” is enabled. This setup worked normally in DT3b1.


Now in DT3b2, after the OCR, the original file is deleted from the indexed folder (as seen in Finder), but the index group is left with an item that points to the missing file.

At the same time, the pdf+text file is automatically moved into the Global Inbox.

Thanks in advance

I was just about to post this bug - the new OCR file appears in the Global Inbox, but is deleted from the indexed folder.

Also, there is an error with the OCR: the file was a single page of text in landscape, but it was not identified as such. ABBYY FineReader has (correctly) rotated the image, but cropped out content.

Attachment: Waleys 16-22-01-008.pdf (328.8 KB)

If you have the option to delete the original, then logically the original - the indexed file in the Finder - would be deleted.

If you used OCR > Apply, the file would be converted in place and remain indexed.

Hi

The main issue is that under DT3b2 the same smart rule automatically moves the OCRed file to the Global Inbox, which it shouldn’t. That’s why the original group is left with a missing file.

@aedwards or @cgrunenberg would have to assess this.

Thank you.

While this is being assessed, is there any way to move the newly created item back to where it came from? I’m trying to “Move” the new files back to the databases from which they’ve come, but I can’t figure out how to identify that (unless I create a smart rule for each database, and then run those separately). Thanks.

If you are running a smart rule, what did you define as your target - a specific group, all databases, …?

“All databases.”

The event trigger was “on clipping” of kind “web archive” - I’m trying to convert Web archives I’ve clipped over the last few years to PDFs.

OCR wouldn’t convert webarchives to PDF?

OCR wasn’t available as a direct option. Selecting a webarchive directly, and right-clicking and selecting “OCR->” from the menu shows only grayed-out options.

Right clicking it and “convert-> to PDF” (which I hadn’t tried before) in fact creates a new PDF+Text item. (So, the original issue with OCR in 3b2 moving items to the global inbox, which still happens, no longer needs to be solved for this particular rule).

So, I suppose 1) I don’t need to OCR webarchive files, merely apply a rule that converts webarchives to PDF+Text, but now I’m trying to figure out how to 2) set the Date Created and Date Modified back to those of the original document, and 3) move the webarchive to trash.

(Oh, and how to copy over the original URL and metadata. This is no longer a bug report, sorry.)

Conversion from webarchive to PDF preserves the metadata, etc.


Note, this creates a paginated PDF.

It’s also possible to use the Execute Script > Embedded (only using embedded for convenience here) with this code…

on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			-- Render each record's URL as one unpaginated PDF in the current group
			create PDF document from (URL of theRecord as string) in (current group) without pagination
		end repeat
	end tell
end performSmartRule
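
For the other points raised above (restoring the original dates, keeping the URL, and trashing the webarchive), the same handler could probably be extended along these lines. This is only a rough sketch and has not been tested against 3b2. It assumes that `creation date`, `modification date` and `URL` are writable record properties, that `parent 1` gives the record's first parent group, and that the database exposes a `trash group`, as I understand the scripting dictionary. The variable names (`theCreated`, `theModified`, `theGroup`, `thePDF`) are just illustrative.

```applescript
on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			-- Remember the original's dates before anything is changed
			set theCreated to creation date of theRecord
			set theModified to modification date of theRecord
			-- Assumption: file the PDF in the original's first parent group
			set theGroup to parent 1 of theRecord
			set thePDF to create PDF document from (URL of theRecord as string) in theGroup without pagination
			-- Only touch the original if a PDF was actually created
			if thePDF is not missing value then
				-- Carry over the source URL (harmless if it is already set)
				set URL of thePDF to (URL of theRecord as string)
				set creation date of thePDF to theCreated
				set modification date of thePDF to theModified
				-- Assumption: moving to the database's trash group trashes the webarchive
				move record theRecord to (trash group of (database of theRecord))
			end if
		end repeat
	end tell
end performSmartRule
```

It is worth trying this on a duplicate of one webarchive first, since the original is only trashed after the PDF has been created, but nothing here checks the PDF's content.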

Bluefrog - many, many thanks. That works as expected, and I can tinker with it as necessary.

Many, many welcomes back to you. Glad to help! 🙂