Inspector Search in files with long lines

jsn · April 8, 2023, 6:48pm

I think you might want to look at this differently.
Let’s go through this one step at a time and see if there is value.

Step 1: Predicting the word the user is typing = autocorrect
Step 2: Finishing a sentence or responding to a chat message with a predictable answer
Step 3: Predicting an entire paragraph or solving a math problem
Step 4: Writing a poem, rhyming words, an essay, etc.
Step 5: Writing code, solving errors etc.
Step 6: Writing music, drawing, telling robots what to do by taking English instructions and writing code instantly to make the robot do what the user asked.

You get the idea. The thing is now able to pass many exams better than humans. But it’s still doing the same thing. Just predicting the next word.

I wanted to see how long it would take me to write an AppleScript that would let me summarize a document in long form in DT3 using ChatGPT.

Here’s my work. Just get your API key and plug it into the code. The URL for creating a key is in a comment in the code.

The first script just summarizes the selected document. If you want to see how it works, uncomment the display dialog lines.

The second script summarizes a long form document by splitting it into pieces. It writes a bunch of files to your desktop (the prompts, the curl commands it executes and the responses) and then creates a DT3 document at the end that combines the responses. Feel free to change the prompt. Check out WroteScan.com to see some examples and understand how map reduce works on a long form PDF. My script isn’t that sophisticated (yet).

I wrote these with the help of ChatGPT. I ask it to write the code, I test it and give it the errors, and it comes back with ideas or fixes. Basically a little coding buddy that sits next to me and keeps me company. I don’t care if he’s stupid. He’s polite and doesn’t tell me to go pound sand. And it only took a few hours of my Saturday. So when I see the DT3 team saying they will not integrate ChatGPT, that’s fine, but it’s not because they can’t do it or that it’s not bloody easy. It’s because they already gave us all the tools to do it ourselves. Easy peasy. Now you can analyze all of your DT3 files in any way you wish. Add some prompts and have fun.

Companies and people that think this isn’t the best tech ever created are probably not understanding the value. Such a simple thing… Predict the next word and it can be so powerful… Descript.com was able to get funding from OpenAI to start integrating their software with ChatGPT. Having video transcripts and perhaps eventually video visual recognition in Descript with a little chat window for asking questions will be excellent. I think it’s a great idea for DT3 also, and you can build it yourself.

-- display dialog "The following "

-- get the selected document in DEVONthink
tell application id "DNtp"
	set theSelection to the selection
	if theSelection is {} then error "Please select a document in DEVONthink."
	set theDocument to item 1 of theSelection
	set theText to plain text of theDocument
	
end tell

-- set the OpenAI API endpoint and API key
set theAPIEndpoint to "https://api.openai.com/v1/chat/completions" -- chat endpoint, the same URL the curl command below uses
set theAPIKey to "PutYourAPIKeyHere" -- Go here: https://platform.openai.com/account/api-keys


on replace_chars(theText, searchList, replacementList)
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to the searchList
	set newText to text items of theText
	set AppleScript's text item delimiters to the replacementList
	set newText to newText as text
	set AppleScript's text item delimiters to oldDelims
	return newText
end replace_chars

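-- flatten line breaks to spaces and strip everything except letters, digits and spaces,
-- so the text can be dropped into the JSON body without any escaping
set prompt to do shell script "echo " & quoted form of theText & " | tr '\\r\\n' '  ' | sed 's/[^[:alnum:] ]//g'"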
--set prompt to "Tell me a story in 10 words"
set prompt to "What can you tell me about this text: " & prompt

-- Not used yet (still struggling with this, will look into it later).
-- Note that quoted form of wraps the prompt in single quotes, which is not valid JSON.
set requestOptions to "{
  \"model\": \"gpt-3.5-turbo\",
  \"messages\": [
    {
      \"role\": \"user\",
      \"content\": " & quoted form of prompt & "
    }
  ],
  \"temperature\": 0.7,
  \"top_p\": 1,
  \"frequency_penalty\": 0,
  \"presence_penalty\": 0,
  \"max_tokens\": 200,
  \"stream\": false,
  \"n\": 1
}"
set headers to "{
  \"Content-Type\": \"application/json\",
  \"Authorization\": \"Bearer " & theAPIKey & "\"
}"

--display dialog requestOptions
--display dialog headers



set command to "curl --silent \"https://api.openai.com/v1/chat/completions\" -H \"Authorization: Bearer " & theAPIKey & "\" -H \"Content-Type: application/json\" -d \"{\\\"model\\\": \\\"gpt-3.5-turbo\\\", \\\"messages\\\": [{\\\"role\\\": \\\"user\\\", \\\"content\\\": \\\"" & prompt & "\\\"}] }\""

--display dialog command 

-- execute the curl command and get the response
set theResponse to do shell script command

set clean_response to do shell script "echo " & quoted form of theResponse & " | perl -pe 's/\\\\([\"\\\\\\/bfnrt]|u[0-9a-fA-F]{4})/\"\\1\"/g'"

--display dialog clean_response

set message_content to do shell script "echo " & quoted form of clean_response & " | /opt/homebrew/bin/jq -r '.choices[0].message.content'"

--display dialog message_content



-- create a new record in DEVONthink with the response
tell application id "DNtp"
	--	display dialog message_content
	set currentDate to current date
	set theYear to year of currentDate -- extract the year from the date object
	set theMonth to month of currentDate -- extract the month from the date object
	set theDay to day of currentDate -- extract the day from the date object
	
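	-- coerce the year to text first; otherwise & builds a list instead of a string
	set dateString to (theYear as text) & "-" & (theMonth as text) & "-" & (theDay as text)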
	
	set newRecordName to name of theDocument & " - " & dateString & " - GPT Response"
	
	
	--	display dialog newRecordName
	
	create record with {name:newRecordName, type:txt, content:message_content} in current group
	
end tell
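
The script above strips all punctuation from the text so it can be pasted straight into the JSON body. If you would rather keep the punctuation, one option is to let jq build the body and do the escaping. This is only a rough sketch, not part of my original script: the handler name buildPayload is made up for illustration, and it assumes jq lives at /opt/homebrew/bin/jq like in the script above.

```applescript
-- Hypothetical helper (not in the original script): builds the JSON request
-- body with jq so quotes, backslashes and line breaks in the prompt are
-- escaped instead of stripped.
on buildPayload(promptText)
	-- Assumption: jq is installed at the same path the script above uses.
	set jqPath to "/opt/homebrew/bin/jq"
	-- $content is filled in by jq from the --arg option and JSON-escaped for us.
	set jqFilter to "{model: \"gpt-3.5-turbo\", messages: [{role: \"user\", content: $content}]}"
	return do shell script jqPath & " -n -c --arg content " & quoted form of promptText & " " & quoted form of jqFilter
end buildPayload

-- Placeholder values, same names as in the script above.
set theAPIEndpoint to "https://api.openai.com/v1/chat/completions"
set theAPIKey to "PutYourAPIKeyHere"

set payload to buildPayload("He said \"hi\" and left.")

-- Assemble (but do not run) the curl command; quoted form of keeps the
-- shell from touching anything inside the payload.
set curlCmd to "curl --silent " & quoted form of theAPIEndpoint & " -H " & quoted form of ("Authorization: Bearer " & theAPIKey) & " -H 'Content-Type: application/json' -d " & quoted form of payload
```

With something like this the sed step could probably be dropped, since the prompt no longer has to be free of quotes. I have not wired it into the scripts here, so treat it as a starting point.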

Here is the second script if you have a longer document and want to collect the summaries into one document (map reduce style method). After you have this, if the summary isn’t too long, you can enter a new prompt for the method above (or you can rerun this one as many times as you need to get the summary down to a size that suits you).

-- get the selected document in DEVONthink
tell application id "DNtp"
	set theSelection to the selection
	if theSelection is {} then error "Please select a document in DEVONthink."
	set theDocument to item 1 of theSelection
	set theText to plain text of theDocument
	
end tell

-- set the OpenAI API endpoint and API key
set theAPIEndpoint to "https://api.openai.com/v1/chat/completions"
set theAPIKey to "PutYourAPIKeyHere" -- Go here: https://platform.openai.com/account/api-keys

-- flatten line breaks to spaces and strip everything except letters, digits and spaces
set cleanText to do shell script "echo " & quoted form of theText & " | tr '\\r\\n' '  ' | sed 's/[^[:alnum:] ]//g'"

-- set the block size and prompt
set blockSize to 2000 -- number of characters per block

-- split cleanText into blocks of blockSize characters
set blockList to {}
set textLength to length of cleanText
set startIndex to 1
set endIndex to blockSize
repeat while startIndex ≤ textLength -- ≤ so a final one-character block is not dropped
	if endIndex > textLength then set endIndex to textLength
	set currentBlock to text startIndex thru endIndex of cleanText
	set end of blockList to currentBlock
	set startIndex to endIndex + 1
	set endIndex to startIndex + blockSize - 1
end repeat

set prompt to "Please summarize the following text block: "

-- send each block to the OpenAI API with the prompt
set responseList to {}
repeat with i from 1 to count of blockList
	
	set currentBlock to item i of blockList
	set outputFolder to POSIX path of (path to desktop folder) -- Set the output folder to the user's desktop
	set filename to outputFolder & "response" & i & ".txt"
	--display dialog filename
	
	-- Build the curl command with output redirection
	set curlCmd to "curl --silent \"" & theAPIEndpoint & "\" -H \"Authorization: Bearer " & theAPIKey & "\" -H \"Content-Type: application/json\" -d \"{\\\"model\\\": \\\"gpt-3.5-turbo\\\", \\\"messages\\\": [{\\\"role\\\": \\\"user\\\", \\\"content\\\": \\\"" & prompt & currentBlock & "\\\"}] }\"  | /opt/homebrew/bin/jq -r '.choices[0].message.content' > " & quoted form of filename
	
	--display dialog curlCmd
	set outputFolder to (path to desktop folder) as string -- Set the output folder to the user's desktop - Finder needs this syntax instead of the posix syntax
	set commandFile to outputFolder & "command" & i & ".txt"
	set cmdFileRef to open for access file commandFile with write permission
	set eof of cmdFileRef to 0 -- truncate first so a rerun does not leave stale text behind
	write curlCmd to cmdFileRef
	close access cmdFileRef
	
	set promptFile to outputFolder & "prompt" & i & ".txt"
	set pmtFileRef to open for access file promptFile with write permission
	set eof of pmtFileRef to 0 -- same truncation for the prompt file
	write prompt & currentBlock to pmtFileRef
	close access pmtFileRef
	
	
	-- Run the curl command in a separate shell
	do shell script curlCmd & " &"
	
	-- Store the filename for later retrieval
	set end of responseList to filename
	
end repeat

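-- Wait for all curl commands to complete
repeat with i from 1 to count of responseList
	set filename to item i of responseList
	
	-- Poll until the response file exists and is non-empty.
	-- The shell creates the file (empty) as soon as the redirect opens, so existence alone is not enough.
	-- Gives up after about 120 seconds per file; that limit is an arbitrary choice.
	repeat 120 times
		try
			do shell script "test -s " & quoted form of filename
			exit repeat
		on error
			delay 1
		end try
	end repeat
	
end repeat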

-- Read the contents of the output files in order
repeat with i from 1 to count of responseList
	
	
	set filename to item i of responseList
	set posixFileRef to POSIX file filename
	set finderFileRef to posixFileRef as alias
	set theResponse to (read finderFileRef as «class utf8») -- jq writes UTF-8; a plain read would garble accented characters
	set item i of responseList to theResponse
end repeat


-- create a new record in DEVONthink with the response
tell application id "DNtp"
	
	set currentDate to current date
	set theYear to year of currentDate -- extract the year from the date object
	set theMonth to month of currentDate -- extract the month from the date object
	set theDay to day of currentDate -- extract the day from the date object
	
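	-- coerce the year to text first; otherwise & builds a list instead of a string
	set dateString to (theYear as text) & "-" & (theMonth as text) & "-" & (theDay as text)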
	
	set newRecordName to name of theDocument & " - " & dateString & " - GPT Response"
	set responseText to responseList as string
	create record with {name:newRecordName, type:txt, content:responseText} in current group
	
end tell
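
One thing the block-splitting loop in the second script does is cut at a fixed character count, so a block can end in the middle of a word. Here is a rough sketch of a splitter that breaks on spaces instead. It is not part of the script above: the handler name splitOnSpaces is made up, and it assumes the text has already been flattened to single spaces the way cleanText is.

```applescript
-- Hypothetical helper (not in the original script): packs space-separated
-- words into blocks of at most blockSize characters without cutting a word.
-- A single word longer than blockSize ends up as its own oversized block.
on splitOnSpaces(theText, blockSize)
	-- Split the text on spaces, restoring the delimiters afterwards.
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to " "
	set wordList to text items of theText
	set AppleScript's text item delimiters to oldDelims
	
	set blockList to {}
	set currentBlock to ""
	repeat with aWord in wordList
		set aWord to contents of aWord
		if currentBlock is "" then
			set currentBlock to aWord
		else if (length of currentBlock) + 1 + (length of aWord) ≤ blockSize then
			-- The word still fits, counting the joining space.
			set currentBlock to currentBlock & " " & aWord
		else
			-- The word does not fit: close this block and start a new one.
			set end of blockList to currentBlock
			set currentBlock to aWord
		end if
	end repeat
	if currentBlock is not "" then set end of blockList to currentBlock
	return blockList
end splitOnSpaces

splitOnSpaces("one two three four five", 9)
-- → {"one two", "three", "four five"}
```

In the second script you could try `set blockList to splitOnSpaces(cleanText, blockSize)` in place of the repeat loop that fills blockList. I haven't tested it on a really long document.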