Hey there,
08.04.2019
Version 1.1
What’s new:
- The function «is finereader controller active» is now used (more stable)
- Added a progress bar (the percentage comes from the FR function “get document progress”)
  - to view the progress as a dialog, save the script as an applet;
  - to view the progress in the menu bar, run the script from the menu bar;
  - running it from KM or other apps or launchers makes the progress bar invisible;
- Added a file type handler:
  - if one or more of the files you’ve selected in DTPO are not PDF files, you get a dialog that lets you skip the file or stop the script;
  - if you do nothing, the script proceeds with the default answer (skip the non-PDF file) after a 5-second timeout.
- Clearly marked the places where you need to modify the script to tailor it to your setup:
  - Set up the FR recognition parameters
  - Set up the temporary folder
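For the temporary folder, one option is to let macOS choose the location instead of hard-coding a path. The sketch below is an assumption and not part of the posted script: it uses the Standard Additions command “path to temporary items”, and theName is a stand-in for the record name that the script takes from DTPO.

```applescript
use AppleScript version "2.4"
use scripting additions

-- Stand-in value; in the full script this comes from the DTPO record name.
set theName to "Sample document"

-- Assumption: the per-user temporary items folder is acceptable as scratch space.
-- The POSIX path of a folder alias ends with "/", so the file name can be appended directly.
set tempFolder to POSIX path of (path to temporary items)
set outPath to tempFolder & theName & ".pdf"
```

If you prefer a fixed folder you can inspect afterwards, keep the hard-coded path and only change the folder part of it.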
28.03.2019
Version 1.0
I’ve just finished a script that lets you OCR multiple PDFs from DTPO.
What’s new compared to the existing method (I used ABBYY FR Pro 12):
- Better picture quality along with a smaller resulting file size
- Recognition is faster and its quality is better
- Automatic outline in the resulting PDF
- “Aliases” and “Exclude from…” metadata are also preserved from the original PDF
- Many other tweaks that you can make manually in the script (such as paper format, embedded fonts, recognition languages, etc.)
Here is the script:
use AppleScript version "2.4"
use scripting additions
tell application id "DNtp"
    try
        set theSelection to the selection
        set theNumber to 0
        if theSelection is not {} then
            show progress indicator "Recognizing..." steps (count of theSelection)
            -- Set Up Your FineReader Recognizing Preferences Here:
            using terms from application "FineReader"
                set langList to {Russian, English}
                set PdfLayout to text under image
                set saveType to same files as source
                set CreateOutlineboolean to yes
                set UseMRCboolean to yes
                set KeepPageNumberHeadersAndFootersBoolean to yes
                set EnablePDFTaggingboolean to yes
                set KeepTextandBackgroundColorsboolean to yes
                set EmbedFontsboolean to yes
                set KeepPicturesboolean to yes
                set ImageQuality to high quality
                -- set PageSize to A4
            end using terms from
            repeat with theRecord in theSelection
                set theName to (name of theRecord) as string
                step progress indicator theName
                if cancelled progress then exit repeat
                set theType to type of theRecord
                if theType is PDF document then
                    -- Mark the original record with an "_old" suffix
                    set oldName to theName & "_old"
                    set name of theRecord to oldName
                    set inPath to path of theRecord
                    -- Set Up Your Temporary Folder Here:
                    set outPath to "/Users/ilya/Documents/00_Temp/" & theName & ".pdf"
                    set theNumber to theNumber + 1
                    tell application "FineReader"
                        -- Wait until FineReader is ready to accept commands
                        repeat until is finereader controller active
                            delay 1
                        end repeat
                        export to pdf outPath from file inPath ¬
                            ocr languages enum langList ¬
                            export mode PdfLayout ¬
                            saving type saveType ¬
                            create outline CreateOutlineboolean ¬
                            use mrc UseMRCboolean ¬
                            keep page numbers headers and footers KeepPageNumberHeadersAndFootersBoolean ¬
                            enable pdf tagging EnablePDFTaggingboolean ¬
                            keep text and background colors KeepTextandBackgroundColorsboolean ¬
                            embed fonts EmbedFontsboolean ¬
                            keep pictures KeepPicturesboolean ¬
                            image quality ImageQuality ¬
                            -- page size PageSize
                        set isBusy to true
                        tell me to set progress total steps to 100
                        tell me to set progress description to "Recognizing PDF: " & theNumber & " of " & (count of theSelection)
                        -- Poll FineReader once per second and mirror its progress in the progress bar
                        repeat until isBusy is false
                            delay 1
                            set theProgress to get document progress
                            tell me to set progress completed steps to theProgress
                            tell me to set progress additional description to theName & ": " & theProgress & "%..."
                            set isBusy to (is busy) as boolean
                        end repeat
                    end tell
                    delay 1
                    try
                        -- Import the recognized PDF into the first parent group and replicate it to the others
                        set theParents to parents of theRecord
                        set thePDF to import outPath to (item 1 of theParents)
                        repeat with i from 2 to (count of theParents)
                            replicate record thePDF to (item i of theParents)
                        end repeat
                        -- Copy the metadata from the original record, then delete the original
                        set addition date of thePDF to addition date of theRecord
                        set aliases of thePDF to aliases of theRecord
                        set attached script of thePDF to attached script of theRecord
                        set comment of thePDF to comment of theRecord
                        set creation date of thePDF to creation date of theRecord
                        set exclude from classification of thePDF to exclude from classification of theRecord
                        set exclude from search of thePDF to exclude from search of theRecord
                        set exclude from see also of thePDF to exclude from see also of theRecord
                        set exclude from tagging of thePDF to exclude from tagging of theRecord
                        set label of thePDF to label of theRecord
                        set locking of thePDF to locking of theRecord
                        -- set modification date of thePDF to modification date of theRecord
                        -- set opening date of thePDF to opening date of theRecord
                        set state of thePDF to state of theRecord
                        set tags of thePDF to tags of theRecord
                        set URL of thePDF to URL of theRecord
                        delete record theRecord
                    end try
                    -- Remove the temporary file and reset the progress bar
                    tell application "Finder" to delete outPath as POSIX file
                    tell me to set progress total steps to 0
                    tell me to set progress completed steps to 0
                    tell me to set progress description to ""
                    tell me to set progress additional description to ""
                else
                    display dialog "File: " & theName & " is not a PDF file" buttons {"Skip File", "Stop Script"} default button "Skip File" with icon caution giving up after 5
                    if the gave up of the result is true or button returned of the result is "Skip File" then
                        set theNumber to theNumber + 1
                    else
                        exit repeat
                    end if
                end if
            end repeat
            hide progress indicator
        end if
    on error error_message number error_number
        hide progress indicator
        if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
    end try
end tell
-- Quit FineReader once it is ready to accept commands again
tell application "FineReader"
    repeat until is finereader controller active
        delay 1
    end repeat
    quit
end tell
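If you want to reuse the file type handler elsewhere, the skip-or-stop dialog can be pulled out into a small handler. This is only a sketch: the handler name shouldSkipFile and the sample file name are hypothetical, while the dialog text, buttons and 5-second timeout are the ones used in the script above.

```applescript
use AppleScript version "2.4"
use scripting additions

-- Hypothetical handler: returns true to skip the file, false to stop the script.
on shouldSkipFile(fileName)
    set theReply to display dialog "File: " & fileName & " is not a PDF file" ¬
        buttons {"Skip File", "Stop Script"} default button "Skip File" ¬
        with icon caution giving up after 5
    -- "gave up" is true when the 5-second timeout expired without a click.
    return (gave up of theReply) or (button returned of theReply is "Skip File")
end shouldSkipFile

-- Sample call with a made-up file name.
if shouldSkipFile("notes.txt") then
    log "Skipping the file"
else
    log "Stopping the script"
end if
```

Storing the dialog result in a variable avoids relying on “the result”, which changes after every statement.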