Integrating OCR by ChatGPT into DT3

Hi there – I have found that feeding a PDF into ChatGPT and telling it to “give me a good OCR of this image, that preserves the meaning of words and sentences, rather than giving a character-by-character reading” reliably provides a vastly superior result to using DT3’s ABBY capacity. Is there a way (a script, I guess?) to get DT3 to use ChatGPT (I’m a paid subscriber) to convert a PDF in DT3 to an OCR’d version, the way ABBY now does in the contextual menu for “OCR”? (Rather than my having to use ChatGPT on the Web, then cut and paste the resulting text into a new DT3 text document, as I now do.) Afraid I’m computer illiterate (I’m an art critic!) so it would have to be an easy and transparent process, for it to work for me. THanks, folks!

1 Like

Using ChatGPT will not add a text layer to a document, so it’s not equal to OCR (list like Apple’s Live Text feature isn’t).

There are some people experimenting with LLMs and DEVONthink in threads in this Automation section, but we have nothing built-in at this time. Such integrations are under investigation for a future release.

2 Likes

I’m a bit surprised ChatGPT would do a better job than Abby at character recognition (even if it can’t create a text layer in the PDF, like BLUEFROG notes). I’m pretty happy with the results I get with DEVONthink’s built-in OCR.

What kind of PDFs are these? Some really bad scans or something? What are your OCR settings?

1 Like

This is anecdotal and as you’re surmizing, it could be very specific documents. It’s the only such report I’ve ever heard.
And you’re also sending data to someone else’s servers, so there’s that to consider as well :wink:

1 Like

I don’t follow LLM developments that closely, so I might be missing something. But right now bad scans is my only guess where I could imagine ChatGPT to yield better results.

Of course privacy is another important aspect.

And dolla-dolla-bill, yo :wink:

As the OP noted, he is a paying customer. People often forget or don’t understand, these LLMs are commercial, i.e., not free, and document processing costs more than merely asking simple questions.

2 Likes

I deal with a lot of historical documents (old newspapers etc) so the originals are often less than great, and ChatGPT really does seem to do a great job “reading” them, as follows.

Here’s what ABBY in DT3 came up with, for the text layer in a PDF of a typed 1917 document (I’d attach the PDF here if I could figure out how!):

DIVISION “C” - The Definition of ARTfor our Purposes.
Such a definition of ART in our age
(a) mustinclude a ll aspects of life. It eustfind
beauty not only in work done at leisure, but in the
daily labor of us all,
(b) There is the definition of P 1 a t o, which w ill serve
4y
m ;-.;
in part,
Zv’.
PREVIEW nAr.t_ia…,tho. jgrfool…ire,sogniE.3ailthru the aqnaoa"
(1)Platoimpliesthatthesenses arethemeans for recognizing the perfect.
(2)huttheperfectisconceived firstbythe intellect,
(3) since however every man lias the m e a n s , that is., the senses, this definition is partly democratic.
In his own Greek society the vast majority of people
were slaves. They lacked therefore the opportunity to develop their intellect. They could therefore
not enjoy ^RT as a conception of the perfect.
(o) We ought to add, therefore, to this definition of Plato
thisi
"Art is the recognition of the perfect thru the senses,
under social conditions which make it possible for every
member of society to endeavor to realize this aims* (d) It is one of the purposes of this thesis to whov that
nearly all 'art in the past has been u n d e m o c r a t ­
i c or strongly individual, and that we are just beginning to realize a more democratic ideal of ART. For this purpose, a historical review of works of art in the past
w ill be helpful.

and here’s what ChatGPT came up with:

DIVISION “C” - The Definition of ART for our Purposes
Such a definition of ART in our age:
(a) must include all aspects of life. It must find beauty not only in work done at leisure, but in the daily labor of us all.

(b) There is the definition of Plato, which will serve in part:

“Art is the perfect recognized through the senses.”

(1) Plato implies that the senses are the means for recognizing the perfect.
(2) But the perfect is conceived first by the intellect.
(3) Since, however, every man has the means, that is, the senses, this definition is partly democratic.
In his own Greek society, the vast majority of people were slaves. They lacked, therefore, the opportunity to develop their intellect. They could, therefore, not enjoy ART as a conception of the perfect.

(c) We ought to add, therefore, to this definition of Plato this:
“Art is the recognition of the perfect through the senses, under social conditions which make it possible for every member of society to endeavor to realize this aim.”

(d) It is one of the purposes of this thesis to show that nearly all ART in the past has been undemocratic or strongly individual, and that we are just beginning to realize a more democratic ideal of ART. For this purpose, a historical review of works of art in the past will be helpful.

2 Likes

I think you have the required user level to upload attatchments. You do it here:

You can also just drag and drop a file into the text field.

Aha! Thanks.
Here’s the pdf
1917 ART AND DEMOCRACY - BUCHER, HERMAN . New York University ProQuest Dissertations - Theses.pdf (303.3 KB)

That looks like a previous page. But [Edit: Ah, now I see. The first ¼ of what you pasted in above is from the previous page, then the text matches up. Your text just cuts off before “CHAPTER II”]

I just ran it through DTs OCR and converted to plain text, and I think it looks okay. Not 100% perfect, but certainly better than your example above.

The watermark does seem to cause a little trouble. Also some of the typed over characters and handwritten corrections, but I’m actually surprised how many it gets right.

Plain text result
 "Art is the perfect recognized thru the senses"
(I) Plato implies that the senses are the means
for recognizing the perfect.
(2) but the perfect is conceived first by the
intellect,
(3) since however every man has the means, that is.,
the senses, this definition is partly democraic.
In his own Greek aociety the vast majority of people were slaves. They lacked therefore the opportunity
to develop their intellect. They could therefore not enjoy ART as a conception of the perfect.
(c) We ought to add, therefore, to this definition of Plato this:
"Art is the recognition of the perfect thru the senses, under social conditions which make it possible for every member of society to endeavor to realize this aims*
(d) It is one of' the purposes of this thesis to show that nearly allAtrt in the past has been undem ocrat-
i c or strongly individual, and that we are just beginning to realize a more democratic ideal of ART. For this purpose, a historical review of works of art in the past
will be helpful.
CHAPTER II - HISTORIC.L REVIEW OF W O R K S OF ART.
DIVISION "A"
Such a brief historical review of works of art would include such works made during certain epochs.
personal belongings for purely personal pleasure,
(b) the military stage,, likewise characterised by works of art, weapons, arms, personal possessions, artistically
(c)
fashioned and decorated for personal use.
ia immense structures for eccles-
iastical institutions, rich robes and utensils for use of church dignitaries, all this while the majority of the people lived in abject poverty,
(d) the feudal and constitutional monarchial stage. the building of palaces and castles, sumptuous decoration within and without, all primarily for the glorifieat-
tion of the monarchial idea and the monarch himself, to the total exclusion of the masses. The time of the court painter and poet, as well as the court priest and court jester.
(e) the industrial and commercial stags, -JIT now being patronized by new men, the merchant prince,, the leader of finance and the captain of industry - but again ART in the service of strong, single men with strong indiw-
vidual tendencies - yet we find here an increasing number of men able to command in the field of art.
DIVISION "B"- Tendencies towards Democratization of ART,
(a) These various stages show a gradual broadening of the
tendencies and influences of ART. Art becomes more and more democratic.
(1) In the hunting and agricultural stage man created works of art for his own personal enjoyment, perhaps out of pride,
V
i
Settings

On an M1 machine:

Main things I notice, it misses:

(a) The hunting and agricultural stage, decoration of

^ the whole line, and

(c) the ecclesiastical stage

I don’t OCR a lot of typed documents, so I don’t really know what to expect.

How are you working with these documents? What level of precision do you need?

Bucher 1917 - Art and Democracy - pV (DT ocr).pdf (824.9 KB)

2 Likes

I also saw a very good result from OCR in DEVONthink. Only one noticeable concatenation of words, but otherwise not seeing a big issue with this document.

Could it be improved? Yes, I suppose so but it certainly wouldn’t be a “drastic” improvement IMHO.

1 Like

I guess a screenshot is easier to read than a codeblock :wink: But yeah, almost identical result. I didn’t even get the concatenation.
Your result is also missing the parts I noted above from DIVISION “A”. My first thought was the watermark, but on second look that doesn’t particularly seem like the cause.

How strange. Did you run the PDF I uploaded through the DT3 OCR once again, or just select the text from my PDF (ie, the underlying text layer) and paste it into that document you displayed here, that indeed has relatively few errors?
Because when I do just that – select the text, then paste it into a text document – I still get things like " nAr.t_ia…,tho. jgrfool…ire,sogniE.3ailthru the aqnaoa" for “Art is the perfect recognized through the sense” – and there are plenty of similar errors, if not quite as bad. (I also tried running the OCR on the same PDF a second time, and the result was no better.) Or did you do something else entirely to “see” and extract just the text layer?
Is it possible that my cut-and-pasted text does not correspond to what DT3 “sees” when it searches the text? (I can read the image layer of my PDFs just fine, of course, but searching them for words and phrases is an even more important need that doesn’t work if the underlying text is so faulty – I’d get too many “misses” in my searches.)
Any thoughts, gentlemen? Thanks for being so helpful…

(1)Platoimpliesthatthesenses arethemeans for recognizing the perfect.

(2)huttheperfectisconceived firstbythe intellect,

And I guess I should have included these specs for my system:

and

The plain text document was created by right-clicking the PDF (or going to Data in the menu bar) and selecting Convert > to Plain Text.

Was the original PDF you sent already OCR’ed in DEVONthink? The text layer is complete gibberish.

I don’t even get that, I get:

-h:􏰀zxWg􏰀􏰀nz?•􏰀 ^H:6••5􏰀􏰀W:wn’•H-WU􏰀ygW5z?:K z?w gN-g•g7

This is the document you uploaded converted to plain text on my machine:

Both BLUEFROG and I ran the PDF through DEVONthink’s OCR again. (Just to be sure we’re on the same page: Right-click the PDF or go to Data in the menu bar, then select OCR > to searchable PDF.)

I don’t keep Auto correct: Deskew and Page orientation enabled all the time, only for documents where I think it’s needed. But I just tried with your settings (including resolution), and I still get the same acceptable result as earlier.

I see you’re on an Intel mac, but I wouldn’t expect that to be the reason for the vastly different results here. What version of DEVONthink are you running?

And why are you still running macOS Big Sur? Why 11.4 in particular?? That’s from May 24, 2021. Even if you want to run Big Sur for some reason, the last Big Sur release was 11.7.10 – from September 11, 2023. I think this is more likely to explain why some things are not behaving as expected.

1 Like

I OCR’d the document as normally would be done. Then I converted it to plain text to show the contents of the text layer. Nothing more.

What are your OCR settings in DEVONthink?

And you’re welcome :slight_smile:

In case of public documents it’s quite likely that they were used to train ChatGPT and then it’s no real surprise that ChatGPT is able to “guess” the right text.

Poor, personal scans not available on the web would be better for a comparison of the capabilities.

2 Likes

An option is to get the gpt text and just create a new pdf and attach it to the original. Since if you’re working with historical documents you don’t want to just remove the original …

Yes, that’s what I’ve been doing – but I deal with so many hundreds of documents, when I’m writing a book, that it would slow down my workflow a great deal…

Sigh.
I guess I should update, if it’s even possible that the OS is causing problems for DT. I have to admit that whenever I’ve updated in the past, it has created problems of one kind or another, so I’ve tended to try to keep things as much as possible stable and unchanging. But that itself is clearly causing as many problems as updating might. Thanks for all the input.