How to extract text from the first page of a PDF?

I'm refining my scripts in my spare time. I am only aware of the command to extract the full text of a PDF. How can I extract only the text on the first page? I understand that a workaround is to take the first n paragraphs of the rich text content, but getting the text of the first page is my goal…

Thank you in advance.

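	tell application id "DNtp"
		set a to item 1 of (selection as list) -- first selected record
		set b to (rich text of a) -- full text content of that record
	end tell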

DEVONthink doesn’t support scripting of PDF pages, but a third-party tool might be able to do this.

Thanks. That’s OK, I’ll extract the first few hundred words or the first n paragraphs of the PDF’s text content as a proxy. That’s good enough for me.
I am reluctant to use too many different tools for any one task. I think DT3 + BetterTouchTool + TextExpander already give me almost everything I need, given that all my tasks are within DT3. The rest is just adjusting the workflow and finding a workaround with AppleScript.
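
For anyone who wants to try the same proxy, here is a minimal sketch. It assumes the record has a text layer that DEVONthink exposes through `plain text`; `firstParagraphs` is a hypothetical helper name and 10 is an arbitrary paragraph count, so neither comes from DEVONthink itself. Paragraphs in a PDF's text layer rarely line up with page breaks, so treat the result as an approximation of page one.

```applescript
-- Hypothetical helper (not a DEVONthink command): the first n paragraphs of a text.
on firstParagraphs(theText, n)
	set total to count of paragraphs of theText
	if total is 0 or n < 1 then return ""
	if n > total then set n to total
	-- A range reference keeps the original line breaks between paragraphs.
	return text from paragraph 1 to paragraph n of theText
end firstParagraphs

tell application id "DNtp"
	set theSelection to selection
	if theSelection is {} then error "Please select a PDF first."
	-- Assumption: the first selected record is a PDF with a text layer.
	set fullText to plain text of (item 1 of theSelection)
end tell

-- 10 is an arbitrary choice; tune it to your documents.
set firstPart to firstParagraphs(fullText, 10)
```

Swapping `paragraph` for `word` in the range reference (and in the count) gives the "first n hundred words" variant.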

Related to this, is it possible to use a content-driven query to set the boundaries of the extracted text? For instance, what if I want to extract the text that lies between “Abstract” and “Introduction”?

(Please be gentle. My scripting skills are rudimentary at best.)
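
In plain AppleScript, one way to sketch this is with text item delimiters: split at the first marker, then cut what follows at the second. `textBetween` is a hypothetical helper name and the sample string is made up for illustration. The match is wrapped in `considering case` so that a lowercase "abstract" in running text is less likely to be picked up, but the first exact occurrence of each marker still wins, so check the results on a few real documents.

```applescript
-- Hypothetical helper: the text after the first startMarker, up to the next endMarker.
-- Returns "" if startMarker is missing; if endMarker is missing, returns the rest of the text.
on textBetween(theText, startMarker, endMarker)
	set oldDelims to AppleScript's text item delimiters
	set theResult to ""
	considering case
		set AppleScript's text item delimiters to startMarker
		set parts to text items of theText
		if (count of parts) > 1 then
			-- Rejoin with startMarker so later occurrences of it are preserved.
			set afterStart to (items 2 thru -1 of parts) as text
			if afterStart is not "" then
				set AppleScript's text item delimiters to endMarker
				set theResult to text item 1 of afterStart
			end if
		end if
	end considering
	set AppleScript's text item delimiters to oldDelims
	return theResult
end textBetween

-- Made-up sample text, only to show the call.
set sampleText to "A Paper Title" & linefeed & "Abstract" & linefeed & "We study X." & linefeed & "Introduction" & linefeed & "Body text."
set abstractText to textBetween(sampleText, "Abstract", "Introduction")
```

The result keeps the line breaks that surrounded the abstract, so you may want to trim them before using it as a name or file content.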

Actually, the next release will support this via smart rules:

This example would rename the item using the extracted text.

Oooh, nice!

What I actually want to do is extract the Abstract (and title) out to a separate document, with an eye toward manipulating a folder full of abstracts with other tools.

Katherine

That would likely require scripting to accomplish.
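
As a rough sketch of what such a script could look like, the handler below writes a piece of text to a new plain-text record named after its source. The handler name `saveAsTextRecord` is hypothetical, and the property names passed to `create record with` (`name`, `type`, `plain text`) are written from memory of DEVONthink 3's scripting dictionary, so please verify them in Script Editor before relying on this. The loop passes each record's full text only as a placeholder for whatever extraction step you use.

```applescript
-- Hypothetical helper: store theText as a new plain-text record in the current group.
on saveAsTextRecord(sourceRecord, theText)
	tell application id "DNtp"
		set newName to (name of sourceRecord) & " (abstract)"
		-- Assumption: these property names match your DEVONthink version's dictionary.
		return create record with {name:newName, type:txt, plain text:theText} in current group
	end tell
end saveAsTextRecord

tell application id "DNtp"
	repeat with theRecord in (selection as list)
		-- Placeholder: replace with your own abstract-extraction step.
		set extractedText to plain text of theRecord
		if extractedText is not "" then my saveAsTextRecord(theRecord, extractedText)
	end repeat
end tell
```

Run it on a couple of test PDFs first, since it creates one new record per selected item.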

I suspected as much, hence my original question.

Katherine

Can you post a few screen captures of the documents you’d be processing?

Sure. At least for the immediate need, they’re all from the same conference and follow the same format. These are recent, so they all come with a text layer. (OCR not needed.)

Katherine