That makes sense given your photography and graphic design background as I recall.
Indeed.
Can we agree on some common ground and then actually talk about use cases within DEVONthink?
I think we all agree that:
And just for fun, to go full meta, I analyzed this thread with ChatGPT
me: summarize the sentiment of this thread in 3 words
ChatGPT: Mixed, cautious optimism.

me: summarize BLUEFROG’s stance in 3 bullet points
ChatGPT:
- Concerned about non-programmers blindly trusting AI-generated code as good or reliable solutions.
- Emphasizes that ChatGPT is merely a predictive text engine and not a sentient entity communicating.
- Worries about socio-political biases in AI outputs and people taking them as truth without verification.
me: generate a reply that the user “syntagm” could have written
ChatGPT:
I agree that we should approach AI with caution and be aware of its limitations, including biases and potential misinformation. But it’s also important to recognize the potential benefits it can bring to applications like DEVONthink when used responsibly. We can explore AI integration while keeping a critical eye on the output and verifying its reliability. By doing so, we can enhance certain functionalities and improve the user experience without compromising on the quality of the results.
It’s essential to maintain a balance between embracing the opportunities that AI offers and being mindful of its risks. With the right approach, AI can be a valuable tool that complements human expertise, rather than replacing it.
To continue on the initial thread:
I’ve packaged up my little experiment and put it on GitHub, and also added a Dockerfile plus a mini HTTP server so you can host it somewhere if you don’t want to deal with Go (I’m running it in my Tailscale network, so all my devices can access it). So far it supports “classify” and “name”:
❯ curl -X POST -H "Content-Type: application/json" -d '{"content":"This is my credit card statement","options":["credit card","debit card","utility bill","phone bill"]}' http://host/classify
{"success":true,"result":"credit card"}
❯ curl -X POST -H "Content-Type: application/json" -d '{"content":"This is my credit card statement","fileType": "pdf","exampleNames":["2023-03-water-bill.pdf", "2023-06-electricity-bill.pdf"], "numNames": 3}' http://host/name | json_pp
{
   "names" : [
      "2023-07-credit-card-statement.pdf",
      "2023-Q3-credit-card.pdf",
      "2023-July-credit-card.pdf"
   ],
   "success" : true
}
The tool was 90% generated by GPT-4; I just made some smaller changes yesterday. You can also use GPT-4 as the model if you have access to the beta API, but you need to change the model it uses in Name.go and Classify.go.
You can also change the temperature in those files. Higher temperature means more creativity; lower temperature means more predictability. It’s set to 0.5, which was the sweet spot for reliably good names in my testing.
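For intuition (this is not the tool’s actual Go code, just an illustrative sketch): temperature divides the model’s raw scores before they are turned into probabilities, so lower values sharpen the distribution toward the top candidate, while higher values flatten it and let lower-ranked options through more often.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over scores scaled by 1/temperature.

    Lower temperature sharpens the distribution (more predictable picks);
    higher temperature flattens it (more varied picks).
    The logit values below are invented for illustration.
    """
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = apply_temperature(logits, 0.5)  # top option dominates strongly
flat = apply_temperature(logits, 2.0)   # probabilities are much closer together
```

At 0.5 the top candidate takes the vast majority of the probability mass; at 2.0 the same candidate only narrowly leads, which is why higher settings read as “more creative”.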
The API key can be specified with the apikey and organization arguments, or just set as an ENV var if your server runs somewhere internal. You can get an API key from http://platform.openai.com.
I’ve also hacked together a simple AppleScript for DEVONthink that uses this endpoint for naming. It’s not very glam, but it does the job (it needs jq installed to extract the JSON response):
property exampleFileNames : {"2023-03 Electricity Bill.pdf", "2023-03 Bank Letter"}

on replace_text(this_text, search_string, replacement_string)
    set prevTIDs to AppleScript's text item delimiters
    set AppleScript's text item delimiters to the search_string
    set the item_list to every text item of this_text
    set AppleScript's text item delimiters to the replacement_string
    set this_text to the item_list as string
    set AppleScript's text item delimiters to prevTIDs
    return this_text
end replace_text

on escapeString(theString)
    -- strip existing backslashes, then escape JSON special characters
    set theString to my replace_text(theString, "\\", "")
    set theEscapedString to ""
    repeat with theChar in characters of theString
        if theChar is in {"\\", "\""} then
            set theEscapedString to theEscapedString & "\\" & theChar
        else if theChar is equal to tab then
            set theEscapedString to theEscapedString & "\\t"
        else if theChar is equal to return then
            set theEscapedString to theEscapedString & "\\r"
        else if theChar is equal to linefeed then
            set theEscapedString to theEscapedString & "\\n"
        else
            set theEscapedString to theEscapedString & theChar
        end if
    end repeat
    return theEscapedString
end escapeString

on performSmartRule(theRecords)
    -- build a comma-separated list of the example file names for the JSON payload
    set csvString to ""
    repeat with i from 1 to count exampleFileNames
        set csvString to csvString & "\"" & item i of exampleFileNames & "\""
        if i is not (count exampleFileNames) then
            set csvString to csvString & ", "
        end if
    end repeat
    tell application id "DNtp"
        repeat with theRecord in theRecords
            repeat 1 times
                log ((name of theRecord) as text)
                if type of theRecord as text is "group" or (word count of theRecord) is 0 then exit repeat -- skip groups and records with zero word count
                set c to plain text of theRecord
                set fileType to type of theRecord
                set escapedContent to my escapeString(c)
                -- set escapedContent to quoted form of c
                log escapedContent as text
                set theCommand to "curl -X POST http://host/name -H \"Content-Type: application/json\" -d @- << EOF \n{\n\"content\":\"" & escapedContent & "\",\"numNames\": 1, \"fileType\": \"" & fileType & "\",\"exampleNames\":[" & csvString & "]\n}\nEOF"
                log theCommand as text
                try
                    set theResult to do shell script theCommand
                    log theResult as text
                    set firstElement to do shell script "echo " & quoted form of theResult & " | jq -r '.names[0]'"
                    log firstElement as text
                    set name of theRecord to firstElement
                on error errorMessage number errorNumber
                    display dialog "Error: " & errorMessage & " (" & errorNumber & ")"
                end try
            end repeat
        end repeat
    end tell
end performSmartRule
Then there’s also a little Python script for use with Pythonista on iOS or https://pyto.app, so it can be called from Siri Shortcuts and I can use it with DEVONthink To Go:
import requests
import sys

examplenames = [
    "2023-03 Electricity Bill.pdf",
    "2023-03 Bank Letter"
]

url = "http://host/name"
headers = {"Content-Type": "application/json"}
data = {
    "content": sys.argv[1],
    "fileType": sys.argv[2] if len(sys.argv) > 2 else "pdf",
    "exampleNames": examplenames,
    "numNames": 1
}

response = requests.post(url, headers=headers, json=data)
if response.status_code == 200:
    parsed = response.json()
    if parsed["success"]:
        print(parsed["names"][0])
else:
    print(f"Request failed with status code {response.status_code}")
The pro-AI people here, as opposed to any less reliable pro-people AI, make good points. My personal cringe remains.
You have trade secrets. It’s your novel in progress or maybe sales literature in development for an unreleased product.
Would you go to a restaurant full of total strangers and ask the maitre d’ to poll them for their opinion on your prose?
That’s what ChatGPT and other Internet based AI services seem like to me.
No question that data security is a huge issue. OpenAI is neither HIPAA compliant nor GDPR compliant. It cannot be used at this time to review any documents with identifying medical, legal, financial, or other personal information. Presumably that will be addressed in time.
Despite that huge limitation, there remain many professional and personal use cases at present where ChatGPT and Bing AI are quite useful.
I’ve been watching and working in the tech industry for many years. ChatGPT is impressive, and I think dismissing it as “merely” anything is a mistake.
Nearly unrelated fun fact: I just saw someone post a “comment” generated by ChatGPT for a piece of human-written AppleScript code. The “comment” was an even more chatty version of every aspect of the code. Unfortunately, no insights. Only code turned into prose. What lovely times we’re living in…
@chrillek - Your frustration sounds like I sound when my patients try to diagnose themselves with Google (and soon by ChatGPT).
Neither one of us is going to change either situation.
There’s abundant evidence that ChatGPT will flat out invent sources. Use this “capability” at your own risk.
ChatGPT is not a knowledge graph or lookup database, that’s very important to remember. It’s akin to having the language portion of a brain - it’s very good with language and everything related to it, but now we still need the other parts of the brain.
The announcement of ChatGPT plugins changes that to a certain extent, by allowing ChatGPT to pull information from things that may actually know this stuff. The recent wolfram alpha plugin for example is giving it the capacity to do complex mathematical computation extremely accurately.
Bing is another huge step in that direction. It’s still early, but combining the language portion with a knowledge graph of information (Bing, Google, or another search engine) will drastically reduce the rate at which it makes things up.
Without plugins and knowledge, it’s still fantastic for things that involve language. For example, you give it a bunch of text and ask it to do something on that text - it will perform exceptionally well.
It doesn’t handle programming perfectly, but the point of programming languages in the first place was to have a higher-level “language” so we can describe what a program should do in a more human way. It can’t reason or “think”, but it can act on this language. Because that’s of course not the same as communicating, it performs worse at code than at prose. Still good (especially with GPT-4), but worse.
It’s good to keep your expectations aligned with what it can and can’t do. Classifying text, summarizing, finding structure in data - that’s all great, and we should absolutely use it for those things (like in DEVONthink).
Looking up accurate information - not so much.
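As a concrete example of the “act on given text” use that the thread’s /classify endpoint already demonstrates, a minimal Python client might look like this. The host URL, field names, and response shape follow the curl example earlier in the thread; this is a sketch, not the actual tool code.

```python
import json
import urllib.request

def build_classify_payload(content, options):
    """Build the JSON body the /classify endpoint expects
    (field names taken from the curl example earlier in the thread)."""
    return json.dumps({"content": content, "options": options}).encode("utf-8")

def classify(content, options, url="http://host/classify"):
    """POST the payload and return the chosen label, assuming the
    {"success": ..., "result": ...} response shape shown earlier."""
    req = urllib.request.Request(
        url,
        data=build_classify_payload(content, options),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        parsed = json.load(resp)
    return parsed["result"] if parsed.get("success") else None
```

Usage would be something like `classify("This is my credit card statement", ["credit card", "utility bill"])`, mirroring the curl invocation above.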
(1) I built a web app which uses the ChatGPT API. A prompt to ChatGPT limits its responses to a list of real references. There is no hallucination in this situation.
(2) I am working on a workflow which expands this idea using Coda.io to scale the potential responses further
(3) Bing AI Chat has access to the Internet. If you instruct it to not respond if it does not know an answer, its risk of hallucination is almost nil.
(4) The recently announced ChatGPT plugin capability is likely to offer another route to eliminate hallucination.
Is the standard 100% accuracy? Or is the standard that it is superior to a traditional Google search?
Bing AI Chat has replaced more than 90% of the situations where I previously used Google Search.
Very relevant news in context of this topic: GitHub - openai/chatgpt-retrieval-plugin: The ChatGPT Retrieval Plugin lets you easily search and find personal or work documents by asking questions in everyday language.
What are you using the results for?
If it’s just an augmented search tool, that’s one thing. If you’re presenting the result to an audience as in some way reflective of reality, that’s another.
Too bad this doesn’t work offline. Would be cool to have a plugin that can access DEVONthink files. Still very cool.
Either one is fine as long as I confirm the source - as is always my job whether the citation is from Google, from PubMed, from a medical journal, from a legal journal, or anywhere else.
Yes - my point is that as an augmented search tool Bing AI or ChatGPT with a customized App are amazing.
Stephen Wolfram’s 19,500-word essay, What Is ChatGPT Doing … and Why Does It Work?, explains the data processing that ChatGPT and other systems based on Large Language Models use. A few high-level quotes:
…what ChatGPT is always fundamentally trying to do is to produce a “reasonable continuation” of whatever text it’s got so far, where by “reasonable” we mean “what one might expect someone to write after seeing what people have written on billions of webpages, etc.”
…
…the remarkable thing is that when ChatGPT does something like write an essay what it’s essentially doing is just asking over and over again “given the text so far, what should the next word be?”—and each time adding a word.
…
…at each step it gets a list of words with probabilities. But which one should it actually pick to add to the essay (or whatever) that it’s writing? One might think it should be the “highest-ranked” word… . But this is where a bit of voodoo begins to creep in. Because for some reason—that maybe one day we’ll have a scientific-style understanding of—if we always pick the highest-ranked word, we’ll typically get a very “flat” essay, that never seems to “show any creativity” (and even sometimes repeats word for word). But if sometimes (at random) we pick lower-ranked words, we get a “more interesting” essay.
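The selection strategy Wolfram describes, greedy top-word picking versus occasionally sampling lower-ranked words, can be sketched with a toy next-word table (the words and probabilities below are invented for illustration):

```python
import random

# Toy next-word probabilities; the entries are invented for illustration.
NEXT_WORD = {
    "the": [("cat", 0.5), ("dog", 0.3), ("idea", 0.2)],
}

def pick_greedy(context):
    """Always take the highest-ranked word (Wolfram's 'flat essay' case)."""
    return max(NEXT_WORD[context], key=lambda pair: pair[1])[0]

def pick_sampled(context, rng=random):
    """Sometimes pick lower-ranked words, weighted by their probability."""
    words, probs = zip(*NEXT_WORD[context])
    return rng.choices(words, weights=probs, k=1)[0]
```

`pick_greedy` is deterministic and always returns the same continuation, while `pick_sampled` varies from run to run, which is the “more interesting essay” behavior Wolfram attributes to occasionally choosing lower-ranked words.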
(Many, many words, explanations, amplifications, diagrams, later…)
So … What Is ChatGPT Doing, and Why Does It Work?
…
What ChatGPT does in generating text is very impressive— and the results are usually very much like what we humans would produce. So does this mean ChatGPT is working like a brain? …When it comes to training (AKA learning) the different “hardware” of the brain and of current computers (as well as, perhaps, some undeveloped algorithmic ideas) forces ChatGPT to use a strategy that’s probably rather different (and in some ways much less efficient) than the brain. And there’s something else as well: unlike even in typical algorithmic computation, ChatGPT doesn’t internally “have loops” or “recompute on data”. And that inevitably limits its computational capability—even with respect to current computers, but definitely with respect to the brain.
…
But for now it’s exciting to see what ChatGPT has already been able to do. At some level it’s a great example of the fundamental scientific fact that large numbers of simple computational elements can do remarkable and unexpected things. But it also provides perhaps the best impetus we’ve had in two thousand years to understand better just what the fundamental character and principles might be of that central feature of the human condition that is human language and the processes of thinking behind it.
The words “intelligent” and “intelligence” do not appear in his essay even once.
ChatGPT is a parlor trick, and to be even more unkind, the proverbial talking dog, but for the digital age.
ChatGPT is not intelligent, but can produce output that resembles intelligence, often quite convincingly. This, and its shiny-newness, seduce us into trusting it in irrational ways.
You might as well hire me to sort through your source materials and make authoritative sounds about them. I can do everything that ChatGPT can do, just more slowly. But ChatGPT cannot do what you and I can do: it cannot reflect upon its own process or utterances, form knowledge from facts, infer relationships among concepts, or even conceptualize, or recognize concepts, at all.
ChatGPT uses “voodoo”-powered computation to compose text toward a desired effect. It. Strings. Words. Together. According. To. Algorithms. That. Is. All.
ChatGPT did not “pass the bar exam.” It. Constructed. Text. Word. By. Word. Using. Voodoo ™. In. Response. To. Input.
Do what you will…but please, please, please, do not run code produced by a computer whose sources are opaque, programmed by parties whose motives are unknown, or make conclusions or decisions based on the output of this remarkable, but (criminally) mis-marketed, technology. The consequences could be…consequential.
I guess we’ll find out if your sarcasm and dismissive attitude are warranted in a year or two. You may be right, this may be a flash-in-the-pan fad that will fade away into an “AI Winter”, but I suspect there’s more to it than that.
Your brain is “just” a bunch of sodium moving across a membrane. That is “all” it is.
A small number of our students are already using ChatGPT to plagiarize, which is not at all unexpected. They are also, like most plagiarists, quite bad at it. My favorite example so far was an undergraduate in a large lecture class on Early Modern English literature who asked ChatGPT to write a five-page essay analyzing a sonnet. Which was fine, except the assignment had been to analyze a play, and no poem of that title exists. So, ChatGPT did the reasonable thing and composed a faux-Renaissance sonnet with the appropriate title, and then analyzed its own poem competently if blandly.
Oh yes, a lot can happen in a year or two.
All this is hardly “a flash-in-the-pan fad.” Indeed, it seems more of a genie from the bottle.
Upon reflection, I was perhaps a bit dramatic with the “consequences” bit. My intention is to express what is clear to me to be a present danger: that the seductiveness of this startling new technology is overpowering our rational and critical thinking, and this is being aided by dangerous and irresponsible marketing.
There may well be as you say “more to it,” but if there is, the insiders aren’t talking, and all of us can only go on what we know, and what we trust others, qualified to perform analyses that we individually aren’t capable of or trained to do, have to say.
Unfortunately, we can’t use ChatGPT as one of those trusted sources. It is a computer program that replies to prompts, one word at a time.