Copy first line of PDF and append to filename

Hi,

I am puzzling over a problem:

I have a huge number of PDFs (5000+) that originate from OCR’d newspaper clippings. The first line of text in each PDF contains the title of the article.
To rename these files into a more human-readable scheme, I would like to copy the first line of text in each PDF file (alternatively the first X characters) and append this string to the existing filename.

Is there any way of doing this in AppleScript?

Thanks a lot,
Marcel

This seems to work for me. The if is there to bypass blank lines I’ve found in some of my documents. Ideally, I’d do a regexp or something, but I don’t know how to do that in AppleScript.


tell application "DEVONthink Pro"
	set selectionList to selection
	repeat with i in selectionList
		repeat with aParagraph in (paragraphs of (rich text of i))
			if ((count of characters of aParagraph) > 2) then
				set name of i to aParagraph as text
				exit repeat
			end if
		end repeat
	end repeat
end tell

Works great, thanks! Didn’t realize it was that easy.
Marcel

And FWIW, if you wanted to screen out a set of predictably uninteresting first lines, you could list regexes describing them at the start of the script, and use something broadly along the lines of:

property plstJunkLines : {"^Sign in$", "^Register$", "^larger$", "^smaller$", ¬
	"^(Dear|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)", ¬
	"^Thank you", "REILLY"}

property pMax : 4 * (10 ^ 6) -- Max byte size - some PDFs are just a bit too big and slow to process automatically this way

set strSkip to ""
repeat with oJunk in plstJunkLines
	set strSkip to strSkip & "|" & oJunk
end repeat

tell application id "DNtp"
	set {dlm, my text item delimiters} to {my text item delimiters, linefeed}
	repeat with oDoc in selection as list
		tell oDoc
			if type is PDF document then
				if size < pMax then
					set strLines to (paragraphs of ((its plain text) as string)) as text -- prepare line delimiters for shell
					try
						set strLine to (do shell script "echo " & ¬
							quoted form of (strLines) & ¬
							" | perl -ne 'if (!(m/^.{0,3}$" & strSkip & "/)) {print \"$_\"; exit}' ")
						if strLine ≠ "" then set its name to strLine & ".pdf" -- rename only if a usable line was found
					end try
				end if
			end if
		end tell
	end repeat
	set my text item delimiters to dlm
end tell