Exporting search results with context

I’d like to do something, but I don’t know if it’s possible with DTPO or with some scripting:

  1. Imagine I do a search and get several results. I select one of them (e.g. a PDF document) and view it in DTPO with the search terms highlighted in orange.

  2. The question: can I export (e.g. to TXT) the parts of the document around each highlighted word? For example, the 5 words before and the 5 after?

Imagine I search for “ipsum” inside a document with the following text:

Etiam vitae arcu volutpat, tincidunt augue quis, euismod nisl. Mauris elit ante, maximus vitae tristique nec, volutpat non odio. Fusce auctor venenatis turpis, rutrum aliquam felis. Sed dictum vehicula tempus. Suspendisse sed tristique ipsum. Maecenas luctus suscipit arcu, vel bibendum nulla ornare sit amet. Nullam nec aliquam ex. Phasellus ultricies lacinia luctus. Quisque non elementum sapien. Proin eleifend sed justo vitae sagittis. Donec nisi dui, lacinia vel justo eget, posuere pretium mi. Suspendisse faucibus hendrerit auctor. Morbi non velit ut sem interdum gravida. Praesent sagittis eleifend pretium. Praesent feugiat tellus mauris, id mollis nisi tincidunt et. Aliquam erat volutpat. Cras ipsum libero facilisis dolor dignissim finibus. Nam pretium interdum urna, nec eleifend elit pharetra eget. Morbi fermentum quam eget sapien gravida, a tincidunt orci pretium. Proin hendrerit dapibus consectetur. Pellentesque nec rhoncus turpis. Vivamus in consequat lorem.

I’d like to get a TXT file containing the following:

(…) vehicula tempus. Suspendisse sed tristique ipsum. Maecenas luctus suscipit arcu, vel (…)

(…) et. Aliquam erat volutpat. Cras ipsum libero facilisis dolor dignissim finibus. (…)
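A minimal sketch of the requested transformation, written in Python outside DEVONthink. `keyword_context` is a hypothetical helper. It splits the text on whitespace, tests each word against a case-insensitive regular expression and keeps up to `n` words on each side, marking cut-off context with "(…)". Because words are whitespace-separated tokens, punctuation stays attached (the second snippet ends in `finibus.`), and a pattern cannot match across two words.

```python
import re

ELLIPSIS = "(…)"


def keyword_context(text: str, pattern: str, n: int = 5) -> list[str]:
    """Return one snippet per word matching `pattern`, with up to n words on each side.

    Hypothetical helper: words are whitespace-separated tokens, so punctuation
    stays attached (e.g. "ipsum." counts as a match for "ipsum").
    """
    words = text.split()
    regex = re.compile(pattern, re.IGNORECASE)
    snippets = []
    for i, word in enumerate(words):
        if regex.search(word):
            start, end = max(0, i - n), min(len(words), i + n + 1)
            parts = words[start:end]
            # Mark truncated context with "(…)", as in the example output
            if start > 0:
                parts = [ELLIPSIS] + parts
            if end < len(words):
                parts = parts + [ELLIPSIS]
            snippets.append(" ".join(parts))
    return snippets
```

Writing `"\n\n".join(keyword_context(text, "ipsum"))` to a `.txt` file gives one snippet per paragraph, as in the layout requested above.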

Maybe some scripting? Any ideas? Thank you!

A future release will support this; in the meantime, the only option is AppleScript. Here’s a simple example: select one or more items, run the script and enter a word (or a regular expression) to summarize the items. Note that it collects every whole line containing a match, not just a few words around it:


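-- Summarize items: collect every line of the selected records that matches a regular expression

tell application id "DNtp"
	try
		set theSelection to the selection
		if theSelection is {} then error "Please select some contents."
		
		-- Ask until a non-empty expression is entered (Cancel raises error -128)
		repeat
			set regEx to display name editor "Summarize Items" info "Regular Expression:"
			if regEx is not "" then exit repeat
		end repeat
		
		set theResult to ""
		repeat with theRecord in theSelection
			set theText to plain text of theRecord
			-- grep exits with status 1 when nothing matches; the try block skips such records
			try
				-- grep -i prints each whole line containing a case-insensitive match.
				-- printf '%s' passes the text through unchanged (sh's echo may interpret backslashes)
				set theSummary to do shell script "printf '%s' " & quoted form of theText & " | grep -i " & quoted form of regEx
				set theResult to theResult & theSummary & return & return
			end try
		end repeat
		-- Store the collected lines as a new plain-text record in the current group
		if theResult is not "" then create record with {name:"The Summary", type:text, content:theResult} in current group
	on error error_message number error_number
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
end tell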
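The script returns whole lines, not word windows. `grep -i` prints every line containing a match, so a PDF whose text extracts as one long line per paragraph yields the full paragraph. The Python function below is a hypothetical mirror of the script's loop and can be handy for checking what an expression will pick up. It assumes each record's plain text is passed in as a string. Python's `re` syntax is also closer to `grep -E` than to the basic regular expressions that plain `grep` uses, so complex patterns may behave differently.

```python
import re


def summarize(texts: list[str], pattern: str) -> str:
    """Rough mirror of the AppleScript loop: keep whole matching lines per record.

    Hypothetical helper. Like `grep -i`, each matching line is kept in full.
    Records without any match contribute nothing, just as the script's inner
    `try` swallows the error raised when grep exits with status 1.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    result = ""
    for text in texts:
        lines = [line for line in text.splitlines() if regex.search(line)]
        if lines:
            # The script appends two returns after each record's matches
            result += "\n".join(lines) + "\n\n"
    return result
```

If only the matched text itself is wanted, `grep -o` prints just the matching part of each line instead of the whole line.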

Thanks, Christian. The script is very interesting, although I don’t think it supports DT’s search operators (like “NEAR”, or NEAR with a number of words).
Otherwise it works great with well-formatted PDFs. Some OCRed old documents give bad results, but I guess their text is malformed, so that’s probably my fault.
Eagerly anticipating DTPO 3.0! 🙂
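The OCR problems might come from layout debris in the text, such as words hyphenated at line ends or stray line breaks. This is a guess. If that is the cause, normalizing the text before searching may help, because a line-based tool like `grep` cannot match a word split across two lines. Below is a minimal Python sketch. `normalize_ocr_text` is a hypothetical helper and cannot repair misrecognized characters. It also joins genuine compounds that happen to break at a line end ("well-" followed by "known" on the next line becomes "wellknown"). Collapsing line breaks also turns the text into a single line, so the result suits a word-window extraction better than the line-based `grep` in the script.

```python
import re


def normalize_ocr_text(text: str) -> str:
    """Tidy common OCR layout debris before searching (a heuristic, not a fix for misread letters)."""
    # Rejoin words hyphenated across a line break: "docu-\nment" -> "document"
    text = re.sub(r"(\w)-[ \t]*\r?\n[ \t]*(\w)", r"\1\2", text)
    # Treat any remaining run of whitespace (including line breaks) as one space
    return re.sub(r"\s+", " ", text).strip()
```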

Regular expressions are a near-standard way of constructing complicated search patterns, and they are different from DT’s built-in search operators. Complicated ones can be extraordinarily mind-bending, but learning to build regular expressions well enough to do most of what you need is not hard. The web has plenty of good sites for learning regular expressions and plenty of online and offline tools for building and testing them.
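On the NEAR question above: a regular expression can approximate a proximity search, even though it is not DT’s operator. Below is a minimal Python sketch. `near_pattern` is a hypothetical helper that matches two words in either order with at most `max_between` words between them. It treats a "word" as a run of `\w` characters, so DEVONthink’s own NEAR may count words differently. The syntax is Python `re`. Non-capturing groups `(?:…)` are not part of POSIX regular expressions, so the pattern would need adapting for `grep`.

```python
import re


def near_pattern(a: str, b: str, max_between: int) -> re.Pattern:
    """Regex approximating 'a NEAR b': both words, either order, at most max_between words apart."""
    # Separator, then 0..max_between intervening words, each followed by a separator
    gap = rf"\W+(?:\w+\W+){{0,{max_between}}}"
    a, b = re.escape(a), re.escape(b)
    # \b keeps "ipsum" from matching inside a longer word such as "ipsumx"
    return re.compile(rf"\b(?:{a}{gap}{b}|{b}{gap}{a})\b", re.IGNORECASE)
```

For example, `near_pattern("sed", "ipsum", 1)` finds "sed tristique ipsum" because only one word lies between the two terms, while `max_between=0` does not.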

Frederiko

Thanks, Frederiko. I do use regex to a certain extent (mainly for search and replace), but I can always deepen my knowledge 🙂 Thanks for the links!

Ah, regular expressions! The bane of my CS students. Just when you think you understand the mechanics and test an obviously correct expression, it blows up in your face. Of course, I’m talking about the more complex expressions.