Inspector Search in files with long lines

Hello,

Is there any way to make the inspector (search on the right hand side of DT3) show me all the instances of a search term?

It currently tells me there are x number of results, but when I hit the next button, it doesn’t take me to the results. It highlights them in yellow, but I have to scroll down the document to find them or open the document in a text editor and then search there.

What I would like to get is a quick summary where it captures the phrase and shows that to me with the words and when I click on the phase, it will take me to the exact instance.

In my case, the transcript is a long file in json format so all the lines have been saved with \n rather than a real carriage return or end of line delimiter.

This seems to cause DT3 to be unable to make use of the search function.

Thank you for any ideas I can use to make the search faster and more useful.

I do notice that if I use the “concordance tools” there is a < A and A > icon just above the document that can be used. Unfortunately, those same button have no effect when used with the search panel. Perhaps this is because the search inspector/panel has the < and > already?

In any case, I would like the find next to be super obvious across all panels and I would use the concordance, but it has too many words and no way to search and find the word I want in order that I can use the < A and A > buttons that are supported in that panel.

What would be really nice is if I type the search term in the main search bar and not even need to use the inspector. What I would like to see is a search result that includes excerpts of the text instead of documents. A different view. I don’t really care which document until I find the right phrase… So, if the results of the search were in a format where I can just scan all the text results captured, then once I see the right ones, I would then perhaps open the document and get the file details. The current search is a bit backwards for me. And now that I’m starting to see ChatGPT style queries everywhere, it’s becoming annoying a bit that I have to scroll and look for highlights. Maybe a ChatGPT integration is possible so we can use natural language and get a summary of our documents?

Actually I think I was using the search wrong.

Since I see the < A and A > buttons now above the document view, they do work for the main search and let me cycle through all the results.

What doesn’t work are the < and > that are in the search inspector at the right.

I was attempting to use them and they only find the first instance.

And now that I’m starting to see ChatGPT style queries everywhere, it’s becoming annoying a bit that I have to scroll and look for highlights. Maybe a ChatGPT integration is possible so we can use natural language and get a summary of our documents?

A ChatGPT integration is highly unlikely. As noted in another thread…

ChatGPT is like a child magician. Sometimes it will surprise you, but often you just politely clap while rolling your eyes. :roll_eyes::stuck_out_tongue:

In my case, the transcript is a long file in json format so all the lines have been saved with \n rather than a real carriage return or end of line delimiter.

Why don’t you scrub the text and remove the \n instances?

I’m either misunderstanding or you’re reporting abnormal behavior…

I’m seeing the same behavior in a plain text document, essentially what JSON files are.

Cool. It works for you.

For me, that’s not the case, but as I said, I can use the left/right buttons with the A in them in your example video above the PDF document.

For GPT-3, I’ve been using the https://chat.openai.com/chat
It does a great job of pasting in a document and getting as summary, a bullet list, checking the document for errors, etc.

But because they don’t let me paste long form documents, it’s not helpful to give me summaries of things I have in DEVONthink.

It also can’t help me find stuff since it doesn’t see the entire database.

I think it would do very well for requests such as, “show me all of my documents where I mentioned xyz. give me a list in bullet format and include the url links to those locations.”

Probably could do things fancier like automation since it knows how to write code, it could use the API to produce a document on a regular interval or generate PDFs and handle backups, etc.

Oh, yes… Because I’m using an index of Descript.com transcripts. So I can’t modify the json files. They are only indexed from the Descript folder by DT3. They are not files I have imported that I can change.

Also, I have 1000’s of them and sort of need them to stay in Descript so that I can edit them and click on a word and go right to the video at that location.

So, why have a GPT3 engine in DT3?

  1. It can find errors in documents because it understand language. For example, transcripts can be corrected where the computer messed up the words or where the human that corrected the words has made spelling mistakes. In some cases, even logic errors can be identified in the documents.
  2. It can summarize documents in various ways (text, bullet point, etc.)
  3. It can be used for general search with natural language.
  4. It can be useful in writing scripts and automating tasks.

From the examples I saw here, I wouldn’t agree. The scripts are (in the best case) CS 101 stuff. In the worst case, they are wrong and not working. Better learn to script than learn to teach that software.

Edit And then there’s working code that is bad. Which is, in my opinion, just as bad as broken code: It doesn’t teach people how to write good programs/scripts.

3 Likes

As you like. I’m sure you work with good developers. I’ve worked with many and I find there are very few that can produce good code. Even Apple and Google and Microsoft mess this up regularly and we know that 1/3 of students of CS 101 fail…

The other three ideas I posted are perhaps worthy of GPT’s skills.

I think you’re giving GPT (and by extension other similar technologies) more credit than its due. I would spend some time reading the Limitations and Risks sections of this page…

I think you might want to look at this differently.
Let’s go through this one step at a time and see if there is value.

  1. Step 1: Predicting the word the user is typing = autocorrect
  2. Step 2: Finishing a sentence or responding to a chat message with a predictable answer
  3. Step 3: Predicting an entire paragraph or solving a math problem
  4. Step 4: Writing a poem, rhyming words, an essay, etc.
  5. Step 5: Writing code, solving errors etc.
  6. Step 6: Writing music, drawing, telling robots what to do by taking English instructions and writing code instantly to make the robot do what the user asked.

You get the idea. The thing is now able to pass many exams better than humans. But it’s still doing the same thing. Just predicting the next word.

I wanted to see how long it would take me to write an AppleScript that would let me summarize a document in long form in DT3 using ChatGPT.

Here’s my work. Just get your API key and plug it into the code. See the url in the code

First one will just summarize the selected document. If you want to see how it works, just uncomment all the display dialog messages.

Second one will summarize a long form document by splitting it into pieces. It will write a bunch of files to your desktop (the prompt, the curl commands it executes and the responses) and then it writes a DT3 document at the end that combines all of that. Feel free to change the prompt. Check out WroteScan.com to see some examples and understand how map reduce works on a long form PDF. My script isn’t that sophisticated (yet).

I wrote these with the help of ChatGPT. I ask it to write the code and it writes the code and I test it, give it errors and it gives me ideas or solutions to improve. Basically a little coding buddy that sits next to me and keeps me company. I don’t care if he’s stupid. He’s polite and doesn’t tell me to go pound sand. And it only took a few hours of my Saturday. So when I see DT3 team saying they will not integrate ChatGPT, that’s fine, but it’s not because they can’t do it or that it’s not bloody easy. It’s because they already gave us all the tools to do it ourselves. Easy peasy. Now you can analyze all of your DT3 files in any way you wish. Add some prompts and have fun.

Companies and people that think this isn’t the best tech ever created are probably not understanding the value. Such a simple thing… Predict the next word and it can be so powerful… Descript.com was able to get funding from OpenAI to start integrating their software with ChatGPT. Having video transcripts and perhaps eventually video visual recognition in Descript with a little chat window for asking questions will be excellent. I think it’s a great idea for DT3 also and you you can build it yourself.

-- display dialog "The following "

-- get the selected document in DEVONthink
tell application id "DNtp"
	set theSelection to the selection
	if theSelection is {} then error "Please select a document in DEVONthink."
	set theDocument to item 1 of theSelection
	set theText to plain text of theDocument
	
end tell

-- set the OpenAI API endpoint and API key
set theAPIEndpoint to "https://api.openai.com/v1/completions"
set theAPIKey to "PutYourAPIKeyHere" -- Go here: https://platform.openai.com/account/api-keys


on replace_chars(theText, searchList, replacementList)
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to the searchList
	set newText to text items of theText
	set AppleScript's text item delimiters to the replacementList
	set newText to newText as text
	set AppleScript's text item delimiters to oldDelims
	return newText
end replace_chars

set prompt to do shell script "echo " & quoted form of theText & " | tr '

' ' ' | sed 's/[^[:alnum:] ]//g'"
--set prompt to "Tell me a story in 10 words"
set prompt to "What can you tell me about this text: " & prompt

-- struggling with this. will look into it later. not used.
set requestOptions to "{
  \"model\": \"gpt-3.5-turbo\",
  \"messages\": [
    {
      \"role\": \"user\",
      \"content\": " & quoted form of prompt & "
    }
  ],
  \"temperature\": 0.7,
  \"top_p\": 1,
  \"frequency_penalty\": 0,
  \"presence_penalty\": 0,
  \"max_tokens\": 200,
  \"stream\": false,
  \"n\": 1
}"
set headers to "{
  \"Content-Type\": \"application/json\",
  \"Authorization\": \"Bearer " & theAPIKey & "\"
}"

--display dialog requestOptions
--display dialog headers



set command to "curl --silent \"https://api.openai.com/v1/chat/completions\" -H \"Authorization: Bearer " & theAPIKey & "\" -H \"Content-Type: application/json\" -d \"{\\\"model\\\": \\\"gpt-3.5-turbo\\\", \\\"messages\\\": [{\\\"role\\\": \\\"user\\\", \\\"content\\\": \\\"" & prompt & "\\\"}] }\""

--display dialog command 

-- execute the curl command and get the response
set theResponse to do shell script command

set clean_response to do shell script "echo " & quoted form of theResponse & " | perl -pe 's/\\\\([\"\\\\\\/bfnrt]|u[0-9a-fA-F]{4})/\"\\1\"/g'"

--display dialog clean_response

set message_content to do shell script "echo " & quoted form of clean_response & " | /opt/homebrew/bin/jq -r '.choices[0].message.content'"

--display dialog message_content



-- create a new record in DEVONthink with the response
tell application id "DNtp"
	--	display dialog message_content
	set currentDate to current date
	set theYear to year of currentDate -- extract the year from the date object
	set theMonth to month of currentDate -- extract the month from the date object
	set theDay to day of currentDate -- extract the day from the date object
	
	set dateString to theYear & "-" & theMonth & "-" & theDay
	
	set newRecordName to name of theDocument & " - " & dateString & " - GPT Response"
	
	
	--	display dialog newRecordName
	
	create record with {name:newRecordName, type:txt, content:message_content} in current group
	
end tell

Here is the second script if you have a longer document and want to collect the summaries into one document (map reduce style method). After you have this, if the summary isn’t too long, you can enter a new prompt for the method above (or you can rerun this one again as many times as you need to get the summary down to a size that suits you.

-- get the selected document in DEVONthink
tell application id "DNtp"
	set theSelection to the selection
	if theSelection is {} then error "Please select a document in DEVONthink."
	set theDocument to item 1 of theSelection
	set theText to plain text of theDocument
	
end tell

-- set the OpenAI API endpoint and API key
set theAPIEndpoint to "https://api.openai.com/v1/chat/completions"
set theAPIKey to "PutYourAPIKeyHere" -- Go here: https://platform.openai.com/account/api-keys

set cleanText to do shell script "echo " & quoted form of theText & " | tr '

' ' ' | sed 's/[^[:alnum:] ]//g'"

-- set the block size and prompt
set blockSize to 2000 -- number of characters per block

-- split cleanText into blocks of blockSize characters
set blockList to {}
set textLength to length of cleanText
set startIndex to 1
set endIndex to blockSize
repeat while startIndex < textLength
	if endIndex > textLength then set endIndex to textLength
	set currentBlock to text startIndex thru endIndex of cleanText
	set end of blockList to currentBlock
	set startIndex to endIndex + 1
	set endIndex to startIndex + blockSize - 1
end repeat

set prompt to "Please summarize the following text block: "

-- send each block to the OpenAI API with the prompt
set responseList to {}
repeat with i from 1 to count of blockList
	
	set currentBlock to item i of blockList
	set outputFolder to POSIX path of (path to desktop folder) -- Set the output folder to the user's desktop
	set filename to outputFolder & "response" & i & ".txt"
	--display dialog filename
	
	-- Build the curl command with output redirection
	set curlCmd to "curl --silent \"" & theAPIEndpoint & "\" -H \"Authorization: Bearer " & theAPIKey & "\" -H \"Content-Type: application/json\" -d \"{\\\"model\\\": \\\"gpt-3.5-turbo\\\", \\\"messages\\\": [{\\\"role\\\": \\\"user\\\", \\\"content\\\": \\\"" & prompt & currentBlock & "\\\"}] }\"  | /opt/homebrew/bin/jq -r '.choices[0].message.content' > " & quoted form of filename
	
	--display dialog curlCmd
	set outputFolder to (path to desktop folder) as string -- Set the output folder to the user's desktop - Finder needs this syntax instead of the posix syntax
	set commandFile to outputFolder & "command" & i & ".txt"
	set cmdFileRef to open for access file commandFile with write permission
	write curlCmd to cmdFileRef
	close access cmdFileRef
	
	set promptFile to outputFolder & "prompt" & i & ".txt"
	set pmtFileRef to open for access file promptFile with write permission
	write prompt & currentBlock to pmtFileRef
	close access pmtFileRef
	
	
	-- Run the curl command in a separate shell
	do shell script curlCmd & " &"
	
	-- Store the filename for later retrieval
	set end of responseList to filename
	
end repeat

-- Wait for all curl commands to complete
repeat with i from 1 to count of responseList
	set filename to item i of responseList
	
	-- Wait for the file to exist
	try
		set theResponse to (read file filename)
		exit repeat
	on error
		delay 1
	end try
	
end repeat

-- Read the contents of the output files in order
repeat with i from 1 to count of responseList
	
	
	set filename to item i of responseList
	set posixFileRef to POSIX file filename
	set finderFileRef to posixFileRef as alias
	set theResponse to (read finderFileRef)
	set item i of responseList to theResponse
end repeat


-- create a new record in DEVONthink with the response
tell application id "DNtp"
	
	set currentDate to current date
	set theYear to year of currentDate -- extract the year from the date object
	set theMonth to month of currentDate -- extract the month from the date object
	set theDay to day of currentDate -- extract the day from the date object
	
	set dateString to theYear & "-" & theMonth & "-" & theDay
	
	set newRecordName to name of theDocument & " - " & dateString & " - GPT Response"
	set responseText to responseList as string
	create record with {name:newRecordName, type:txt, content:responseText} in current group
	
end tell

6 Likes

Too bad there’s no direct integration of DT3 with ChatGPT.

@jsn , how do we use your scripts?

1 Like

To use the scripts, you can just place them in your script folder.
You can find this folder by clicking on the scripts icon next to help in the DT3 main menu.

You can run them by using the scripts menu.
There is some documentation in DT3 for how to give them a hotkey by naming the file as you wish.

Or, if you have Keyboard Maestro, you can put them into a hotkey as an “Execute an Applescript” command. I find KM more pleasing because I can add comments and try different hotkeys and activate/deactivate the keystrokes. Also it’s useful to see all the shortcuts, although you can use KeyCue for this, which some may prefer.

I’ve managed to get the scripts installed in my scripts folder, and they’re active in the menu, but I can’t find where they output the summaries once I’ve run them. (I’ve never used the scripts before.)

Did you check what is in the “Today” smart group?

Yeah, nothing there. (Also no evidence of activity in the Activity or Log windows when I run the script.)

Select something in DT and run the script in Script Editor. That should give you an indication what’s happening (or not)

Thanks. Now at least I can see there’s a problem, though don’t know what this means: “sh: /opt/homebrew/bin/jq: No such file or directory” Any idea how to fix? (I just put this in the scripts folder–scripts menu to the left of Help menu in DT.)

Install jq

Aside: that’s why I am against using third-party tools in scripts

2 Likes

You should PM the author of the script about this.

1 Like