Experimenting with OpenAI API for automatic classification and renaming

syntagm · March 25, 2023, 1:59pm

With all the ChatGPT hype I wanted to experiment if I could use the APIs to come up with names for my unnamed files within DEVONthink, and hey, it somewhat works

gpt names demo

Obviously not recommending to send all your files to OpenAI (pls don’t do this), but this looks like it could be useful as part of a bigger classification pipeline

Some other things I tried as well were to classify based on content into a series of things like “water bill”, “credit card bill”, “utility”, “letter”, etc, or give it a list of file names so it ‘learns’ my naming/folder scheme.

Curious what other usecases you can think of that may be interesting to explore?

cgrunenberg · March 25, 2023, 2:04pm

Hard to tell without the source files whether the names are right/useful/smart or not but did you e.g. check the batch processing action Change Name plus the Proposed Name placeholder first?

Amontillado · March 25, 2023, 4:12pm

Internet dependent AI is going to get someone in trouble. The traffic will get snooped, data will be compromised, the world will express shock at those dastardly hackers, and then it will happen again. And again.

Sorry, had an after hours disaster that kept me up late. I’m still edgy. I’ll get better.

BLUEFROG · March 25, 2023, 4:36pm

No worries and the tech is much less impressive than the hype IMO. And the intelligence is definitely artificial.

DrJJWMac · March 25, 2023, 6:56pm

Or perhaps also imaginary (vis a vis real).

–
JJW

rkaplan · March 26, 2023, 4:07am

I am surprised @BLUEFROG - usually you are enthusiastic at what amounts to such a notable milestone in communication.

I do agree that much of what the media has been emphasizing is hype, i.e. using AI to write 10,000 blog articles per day or other similar spam creation “uses.”

Just a few notable abilities of various AI models I have tried include:

Proofreading
Asking for pro/con data or pro/con arguments on a given topic
Searching the web for sources which both support and refute a given argument
Summarizing information - including the ability to do this at multiple reading levels
Automating reformatting of text
Asking how two documents have ideas which are similar and have ideas which are contrasting
Searching through a large document for specific content - and offering a page reference or other link to that specific location
Assigning Keywords or tags to a document based on whatever criteria you wish
Parsing from JSON to an attractive HTML table and vice versa
Much more efficient and thorough for many types of searches than a traditional search engine. For example Bing AI can effortlessly prepare a detailed chart for you such as "List 20 hotels near XXX including number of stars, price, drive distance to XX, walking distance to YYY, URL, and general sentiment on Yelp

Yes some aspects of AI are overhyped. Yes there are some growing pains. But fundamentally I am convinced we will look back on AI language models as being up there with the invention of the printing press, word processor, and Internet.

P.S. A smart rule to run an OpenAI prompt on a set of documents/metadata fields in DT3 would be superb.

syntagm · March 26, 2023, 5:52am

Hmm, I’m usually the first who is critical of new tech, but the GPT3 and GPT4 advancements are pretty crazy. For example, the Golang CLI and scripts that I wrote for doing this little experiment are entirely autogenerated by GPT4, all I did were small touchups and copying it together
I’m already seeing people who can’t code releasing apps which will only pick up from here as the technology improves

I think the strengths for text based classification is that it can find structure in otherwise seemingly unstructured data and make this tech available for us “normal people”. My OCRed documents aren’t great quality and in multiple languages, but I was able to give it a few examples and it pretty nicely gave me what I wanted.
GPT4 is also able to do visual stuff, but that’s not available yet for playing around with, but it’d be pretty cool once I can give it an image of a bill, and it generates a summary with the keypoints for me. I want to be able to scan a bill with my phone, then have DEVONthink accurately extract the amount + date, and rename it accordingly. Currently, even with state of the art Abbyy and a bunch of regex rules I’m not able to do this reliably.

I gave it a try on 3 random PDFs. Here is the DEVONthink proposed name:
ss1

Here’s what GPT3.5 renamed it to from the content (not preloading other documents):

GPT4:

Not that much magic to classify such a small amount of text into a filename, but still pretty cool

Now let’s hope we’ll get models like this small enough to run locally soon, and properly embed into apps like DEVONthink

chrillek · March 26, 2023, 6:54am

I’ve seen enough people coding who can’t. Where’s the advantage in seeing more of them?
Sarcasm aside: what really riles me up is the I in this “AI”. The thing throws back stuff it finds on the net, possibly modifying it on the way. It does not write code for a problem that hasn’t been solved already.

syntagm · March 26, 2023, 8:02am

That’s a strong oversimplification of what’s going on. It’s not really a snippet lookup engine, but more a complex mathematical system that has been tuned to predict the next most likely token in a series of token based on the current context, while still making sense.

And even then, that this system is able to take what you tell it, understand your intent, then generate something that matches this intent is alone already damn impressive

Point case: I now have a program that can classify files based on it’s content that I didn’t have before, that I was able to create without writing any code.
And I now have a smart rule in DEVONthink that can give me accurate filenames on import based on the filecontent, functionality that I didn’t have before and makes my life easier.

Anyway, I’d like to keep this thread on exploring what we could use it for to power up DEVONthink, and give us more value/functionality, and not get hung up on arguing whether it can code or not, or if it’s useful (which it clearly is) lol

So far what it looks useful for within DEVONthink, in my playing around:

Summarizing documents and adding that summary (1-2 lines) into file metadata (also helps DEVONthink search for looking up documents)
Classifying into different categories/folders based on content
Automatic tagging (tag things that are classified as “utility bill” with the “utility bill tag” for example)
Generating names for unnamed files (see above)

What I’d like to see if it’s possible, but more complex:

Generate a context of existing documents, like “here are 10 credit card bills, this is their content, this is the metadata for those. Here’s the same for these other things like utility bills. Now based on this information, do the same for these other documents that aren’t processed yet”, kind of like a beefed up version of what I did above
Build a knowledge graph to reason with my DT data, so I can summon a prompt and ask stuff “how much did my water usage increase in the past 3 months” (inspired by GitHub - mpoon/gpt-repository-loader: Convert code repos into an LLM prompt-friendly format. Mostly built by GPT-4. which allows you to load an entire git repository into the GPT context and ask questions, or tell it directly to change or add features)

What are some other things worth trying or experimenting with?

rkaplan · March 26, 2023, 8:25am

That’s an extremely good argument as to why AI is not going to replace professional programmers.

But 99% of hobbist or power-user coding requests/ideas no doubt are indeed re-inventing the wheel. That does not make it less worthwhile. Github is overflowing with “solved” programming challenges; nonetheless only a tiny percent of computer users have the knowledge/capability to implement open-source code from Github. Accessing an API is surely not a novel piece of code, but GPT4 can be quite convenient at suggesting how to do it.

BLUEFROG · March 26, 2023, 12:38pm

That’s part of the problem. Many people don’t see the results as suggestions. They see them as solutions, often with no way of verifying that. Non-programmers using an AI service to write a script will think the output is brilliant because they’re not programmers. Their assessment doesn’t make the code good.

what amounts to such a notable milestone in communication.

Communicating with whom? ChatGPT isn’t communicating. It’s merely a predictive text engine.

And sorry to say, there are already socio-political biases baked into the AI, so such results are even less impressive. And again, people will take such responses as “true”, as if some cognizant or sentient entity gave them the correct answer. That is a dangerous thing in those cases, not something to be celebrated IMO.

DrJJWMac · March 26, 2023, 2:41pm

A generational reference here to a different paradigm shift some decades ago …

At this point, the wiser folks will put away their calculator and return to their order of magnitude estimate in their head, their slide rule, and their abacus as needed. The others will be left to wonder why getting a correct value pays less than knowing why the answer is true.

—
JJW

chrillek · March 26, 2023, 3:12pm

Since you baited me with that, I downloaded and installed Bing and got admitted to their “AI” trial. Asking a similar question, namely: “Which restaurants are open on a Sunday evening in Zagreb/Croatia, not more than 1,5 km from my hotel”.
Which gave me the ridiculous number of four restaurants, mostly from Tripadvisor. Narrowing it down by adding “croatian cuisine” returned three, of which one was a very, very wrong hit – those guys serve only italian food.

Now I tried something closer to your query: “Hotels Tirana Stadtzentrum ruhig 3 Sterne mit Website” (sorry, that’s in German, but you get the gist). Weirdly, Bing transformed that into a query of the three hotel portals it uses. Which returned no results. Modified again, saying “directly bookable” – Bing removed the “directly” and offered me links to the tripadvisor listings. Quite the opposite of what I was asking for. Maybe I’m too dumb, maybe Bing does not understand German well enough.

Is that really AI? Seriously? Why would I even think about using this instead of heading over to Tripadvisor directly (which I never use because it just draws everyone to the same places and everyone can write bogus reviews – horrible). For me, it would be a lot easier to just ask the guy at reception what he’d recommend for dinner tonight.

rkaplan · March 26, 2023, 3:35pm

Non-programmers using an AI service to write a script will think the output is brilliant because they’re not programmers. Their assessment doesn’t make the code good.

No doubt AI code will rarely be “good” and likely will rarely pass a code review by a professional coder.

I am saying that’s not the point. If it helps a “non-professional programmer” who happens to be a professional in some other field - and that lets him create personal scripts to extend the power of his desktop - then that can be immensely helpful.

Communicating with whom? ChatGPT isn’t communicating. It’s merely a predictive text engine.

If ChatGPT is “merely a predictive text engine” then the human brain is merely a bunch of cells. The truth is that nobody can truly explain how either design makes the jump to language.

Don’t get me wrong- I do NOT believe AI is sentient. I recognize it has major faults at present. I would never use it to author original material nor would I rely on its output without verification.

That said, today as we speak AI is extremely helpful to me in these ways:

As a big-picture editor to suggest or expand on ideas I had not considered in a document I author (ChatGPT)
To replace 90% of my traditional Google searches for basic personal information (Bing Chat)
Equally helpful (and thus always a companion source) in doing a traditional PubMed literature search (Perplexity/Bing Chat)
Superior to any other traditional source when I am doing research either for personal or professional purposes and I am explicitly seeking rebuttal arguments to be sure I have not missed something in my analysis

These are truly useful forms of communication where AI today offers capabilities not found elsewhere.

I agree its hallucinations are an issue and thus it cannot be relied upon without verification. But misinformation has been around for eons. Both sides of the Atlantic have a long history of tabloid journalism going back at least a century. So I am not worried that AI presents yet another way of spreading misinformation by those who do not care about truth. Instead I am glad that it offers another way to supplement (but not replace) sources for research and analysis in ways never before possible.

rkaplan · March 26, 2023, 3:42pm

Perhaps there is not as much data in those locations. I do know I did something like this on a real trip recently and it was extremely useful.

BLUEFROG · March 26, 2023, 3:57pm

I would definitely say that’s a false equivalency. The human brain is a far more powerful and complex (and impressive) instrument / machine than these AI engines.

As a big-picture editor to suggest or expand on ideas I had not considered in a document I author

To each their own, but it seems like you are treating it as a subject matter expert. Isn’t that what your job actually is?

et al:
How many people are reading and considering these things…

rkaplan · March 26, 2023, 4:08pm

That furthers my point. Yes the ultimate function of the human brain is far more complex than AI engines. But at its simplest level, the human brain is “just” a bunch of voltage changes across a membrane. Just as ChatGPT is “just” a predictive text engine. Nobody knows how either system evolves from its basic design to language.

Of course my ultimate job is to be responsible for the accuracy of all the content I create. I loathe the ads I see for Ai software to write 1,000 blog posts per day and other silliness.

I am saying that I have found AI to be extremely helpful in addition to other traditional sources I have used to search for information. And I have found it to be surprisingly good at suggesting additional arguments both pro and con for me to be aware of when analyzing an issue. Certainly I need to verify those ideas just like I would any other; for sure I reject some of what AI outputs, just like I reject lots of other information sources. In the end though, it is quite helpful.

All that aside, I agree that the focus in the media has been on using AI as an “automatic author” rather than considering AI to be an editor which is very capable but makes enough mistakes that it has to be watched very closely.

rkaplan · March 26, 2023, 4:11pm

Of course tons of people ignore that. Just like tons of people read The National Enquirer and other forms of tabloid journalism. And tons of people go to all sorts of traditional websites which spread wild conspiracy theories.

You keep emphasizing the limitations of AI; you are completely ignoring the fantastic benefits of AI which you can reap if you also pay attention to its limitations.

I say again - a smart rule in DT3 which would follow an OpenAI prompt for specific document/metadata in DT3 would be incredibly useful - for editing, for text transformations, for semantic searching. You are throwing the baby out with the bathwater by not considering the benefits. That’s surprising to me for software like DT3 which has the general philosophy of giving lots of customizable capability to users and letting the users assess the risks/rewards of scripting vs indexing vs smart rules etc.

BLUEFROG · March 26, 2023, 4:17pm

As I said, to each their own, but I am seeing none of these benefits, let alone anything really fantastic about it. And as we often say, “If it scratches your itch…”

On a side note: I find DALL-E more interesting, though I have philosophical arguments relative to copyright and authorship issues of the output.

I say again - a smart rule in DT3 which would follow an OpenAI prompt for specific document/metadata in DT3 would be incredibly useful - for editing, for text transformations, for semantic searching. You are throwing the baby out with the bathwater by not considering the benefits. That’s surprising to me for software like DT3 which has the general philosophy of giving lots of customizable capability to users and letting the users assess the risks/rewards of scripting vs indexing vs smart rules etc.

And again, speaking personally, I don’t see these as things I’d want DEVONthink to do.

I don’t control Development and what does or doesn’t get implemented. You are correct that DEVONthink is very customizable so you could likely write your own scripts to hook into APIs, etc.

And you’ve been around long enough to know, we don’t just jump on every bandwagon that rolls down the road. This is not a gimme nor is it a trivial thing to just plug in. But as already said, those decisions are above my pay grade and out of my wheelhouse.

PS: There is no intended denigration of the tech and people who want to use it and no offense meant. I’m glad you’re excited about it and if it’s useful and you trust the output, that’s great. But also bear in mind, we have much more to consider in the situation.

rkaplan · March 26, 2023, 4:31pm

That makes sense given your photography and graphic design background as I recall.

While image generation AI is fascinating to explore, I suspect what you will conclude about present-generation image generation is that (a) there is often something a bit off about the images which makes it apparent that it is digitally created; and (b) whereas it is easy to tweak AI text output to professional standards, it is a lot more work to do so on an AI image. Thus my guess is that at present most AI imagery is just “for fun” while it’s more common for AI text to be used “for real.”