Rename and divide PDF

I have made the coolest DEVONthink script:

  1. You scan a lot of paper documents to PDF.
  2. The PDFs are OCRed.
  3. On the first page, you select the title of the document: the script renames the document with the selection.
  4. On page X, you select the title of the next document: the script splits the PDF and renames the new PDF with the selection.
  5. Go back to step 4.

Everything works, but it is quite slow and unstable: I cannot find the AppleScript command to split a PDF, so I cheat by assigning a keyboard shortcut to Split and having System Events press it.

try
	tell application "DEVONthink Pro"
		activate
		if not (exists think window 1) then error "No window is open."
		if not (exists content record) then error "No document is selected."
		-- Read the selection before coercing it, so a missing value is caught
		set theSelection to selected text of think window 1
		if theSelection is missing value then error "No text is selected."
		set nameofnewrecord to theSelection as string
		if nameofnewrecord is "" then error "No text is selected."
		-- current page is zero-based: only split when past the first page
		if current page of think window 1 > 0 then
			-- Trigger DEVONthink's Split command via the shortcut assigned to it (Cmd-<)
			tell application "System Events" to keystroke "<" using command down
			delay 1
		end if
		set name of content record to nameofnewrecord
	end tell
on error error_message number error_number
	if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
end try

Do you have an idea how to improve this script?

Hi antonie,

You might find this terrific script from Bosie helpful:

Split PDF by page count or TOC (https://discourse.devontechnologies.com/t/split-pdf-by-page-count-or-toc/16523/1)

It's a great example of how to use pdfsam to split up PDFs. You could adapt the techniques to your case.

Frederiko

Hi Frederiko,

Thank you for your contribution. I had already done my due diligence by searching the forum.

pdfsam doesn't interest me, because DEVONthink already has a split function. And the script you suggest is too complex to be efficient (temporary folder, etc.).

Best regards.

a.

I really liked the idea behind your script, antonie, but I couldn't get it to work reliably, so I reworked it a little so that it doesn't rely on DT's built-in split function, which I find fails fairly regularly.

It's really useful to be able to break up a PDF, say a long book, into chapters as you read through it, on a "running" basis.

Usage:
1. Name the PDF and add tags and comments to it.
2. Scroll down to the page where the next split is required.
3. Run the script.
4. Repeat until the PDF is completely broken up.
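As an illustration of what this running workflow produces (not part of the script itself), the hypothetical bash function below lists the page range each resulting document ends up with, given the absolute page numbers at which the splits are made. It assumes, as the script's comments describe, that the page on screen becomes the first page of the new document.

```bash
#!/usr/bin/env bash
# Hypothetical helper, for illustration only.
# Usage: chapter_ranges TOTAL_PAGES SPLIT_PAGE...
# SPLIT_PAGE values are absolute 1-based page numbers in the original PDF,
# given in increasing order. Assumes each split page starts a new document.
chapter_ranges() {
  local total=$1; shift
  local start=1 next
  for next in "$@"; do
    # Everything before the split page stays in the current document
    printf '%d-%d\n' "$start" $((next - 1))
    start=$next
  done
  # The last document runs to the end of the original PDF
  printf '%d-%d\n' "$start" "$total"
}
```

For a 45-page book split at pages 12 and 30, `chapter_ranges 45 12 30` prints `1-11`, `12-29` and `30-45`, one per line.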

This script requires Java and a very powerful Java command-line tool for PDF manipulation called sejda. The comments in the script include instructions for getting both Java and sejda running.
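To check what the sejda step does before wiring it into DEVONthink, here is a minimal bash sketch of the shell half of the script. It uses only the `splitbypages -f … -o … -n …` invocation that the script itself uses. `split_at_page` is a hypothetical helper name, and the `1_<file>` / `<PAGE>_<file>` output names are an assumption carried over from the script, not something checked against sejda's documentation.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the original script.
# Usage: split_at_page SEJDA_BIN_DIR /absolute/path/file.pdf PAGE OUTPUT_DIR
split_at_page() {
  local sejda_bin=$1 pdf=$2 page=$3 outdir=$4
  local name
  name=$(basename "$pdf")
  # Run sejda from its bin directory with the same sub-command and flags the
  # AppleScript uses; the subshell keeps the cd from affecting the caller.
  (cd "$sejda_bin" && ./sejda-console splitbypages -f "$pdf" -o "$outdir" -n "$page") || return 1
  # Assumed (as in the AppleScript): the first part is named "1_<file>";
  # it overwrites the original file in place.
  mv -f "$outdir/1_$name" "$pdf" || return 1
  # Assumed: the remainder is named "<PAGE>_<file>"; this is the file the
  # script imports as the new document, so print its path for the caller.
  printf '%s\n' "$outdir/${page}_$name"
}
```

The PDF path must be absolute, because the function changes into sejda's directory before running it; the AppleScript likewise passes DEVONthink's absolute `path` of the record.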

-- Purpose: Move through a PDF, splitting it into smaller documents.
-- Splits the PDF at the currently open page. The current page and all the pages after it are placed in a new document, named with the highlighted text or, if no text is highlighted, with the name entered in the dialog. The new document is opened so that further splits can be made.
-- All the pages before the current page remain in the previously named document.
-- Usage: Open a PDF and run the script when the page where the split should take place is displayed.
-- Dependencies: Requires Java to be installed and accessible from the command line.
-- Requires the sejda Java utility.

-- To install Java so it is accessible from the command line:
(* Download from Oracle: http://java.com/en/download/mac_download.jsp?locale=en

Verify that it's installed properly by looking in System Prefs:

Command-Space to open Spotlight, type 'System Preferences', hit enter.
Click the Java icon in the bottom row. After the Java Control Panel opens, click the 'Java' tab, then 'View...', and verify that your install worked. You can also see a 'Path' there, which you can substitute into the commands below in case it is different from mine.
Verify that the version is as you expect (sub in your path as needed):

/Library/Internet\ Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java -version


(Type the backslashes exactly as shown: they escape the spaces in the path.)

Create a symbolic link from /usr/bin/java to your new install:

sudo ln -fs /Library/Internet\ Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java /usr/bin/java

Sanity check your version:

java -version

courtesy of: http://stackoverflow.com/questions/12757558/installed-java-7-on-mac-os-x-but-terminal-is-still-using-version-6
*)

-- Get the sejda Java application from http://www.sejda.org and unzip it, noting the directory you unzipped it to.

-- Credits: Based on Bosie's much more elegant script using pdfsam: https://discourse.devontechnologies.com/t/split-pdf-by-page-count-or-toc/16523/1

-- Put the correct path to the directory where the sejda console was unzipped here
set sejda to "/Users/[NAME]/Downloads/sejda-console-1.0.0.M9" & "/bin"

try
	tell application id "DNtp"
		activate
		if not (exists think window 1) then error "A PDF document must be selected and open."
		
		set nameofnewrecord to selected text of think window 1 as string
		if nameofnewrecord is missing value or nameofnewrecord is "" then
			set nameofnewrecord to display name editor "Name of New Document" info "Name of New Document"
		end if
		
		set nbpage to (current page of think window 1) + 1
		if nbpage > 1 then
			set thewindow to think window 1
			set this_item to content record of think window 1
			set thisitem_name to name of this_item
			set originalTab to (item 1 of (tabs of think window 1))
			set thisitem_filename to thisitem_name
			-- The script expects sejda's output files to be named after the file, so make sure the name ends in .pdf
			if thisitem_filename does not end with ".pdf" then
				set thisitem_filename to thisitem_filename & ".pdf"
			end if
			set tmppath to POSIX path of (path to temporary items folder)
			set filePath to get path of this_item
			show progress indicator "splitting pdf" steps -1
			-- Quote every path so folders with spaces do not break the shell command
			set cmd to "cd " & quoted form of sejda & " && ./sejda-console splitbypages -f " & quoted form of filePath & " -o " & quoted form of tmppath & " -n " & (nbpage as string)
			do shell script cmd
			-- Part 1 (the pages before the split) replaces the original file
			set cmd to "mv -f " & quoted form of (tmppath & "1_" & thisitem_filename) & " " & quoted form of filePath
			do shell script cmd
			-- The remaining pages are imported as a new record with the chosen name
			set newrecord to import (tmppath & nbpage & "_" & thisitem_filename) name nameofnewrecord to current group
			open tab for record newrecord in thewindow
			delay 1
			close (current tab of think window 1)
			do shell script "rm -f " & quoted form of (tmppath & nbpage & "_" & thisitem_filename)
			hide progress indicator
		end if
		
	end tell
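on error error_message number error_number
	-- Clean up, ignoring failures if the error happened before the temp file existed
	try
		do shell script "rm -f " & quoted form of (tmppath & nbpage & "_" & thisitem_filename)
	end try
	try
		tell application id "DNtp" to hide progress indicator
	end try
	if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
end try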

[Edit 2014/11/26] Added a progress bar (sejda can take a while with big PDFs) and a fix to properly display the most recent PDF.
[Edit 2015/01/18] Bug fixes.

Frederiko

This is the solution!