Gemini Transcriptions of Files

I’m currently exporting files, transcribing them with Gemini, and then creating a searchable RTF. Is there a workflow I could set up to make this less cumbersome? I have 10,000+ files to examine. When Gemini works, it is fantastic; when it doesn’t, it’s a bit aggravating.

Gemini is a bit quirky - it appears to have limits on the total size of uploaded files, and using the same command sometimes produces different results. Odd ….

I’m new to AI, and I didn’t see anything in the manual that addresses this particular situation. I do see that, in Settings, I can select Gemini, but it says that using it requires an API key. That’s fine, but I don’t have a Google Cloud account.
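A note on the API key: one generated in Google AI Studio (aistudio.google.com) with a regular Google account should work; a full Google Cloud account isn’t required. Below is a minimal connection check in Python, assuming the google-generativeai package and a key stored in an environment variable (the model name is just an example).

```python
# Minimal connection check for the Gemini API via the google-generativeai
# package (pip install google-generativeai). The key is read from an
# environment variable rather than hard-coded.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# The model name is an example; use whichever Gemini model your key can access.
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Reply with the single word: ready")
print(response.text)
```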

What kind of files, actually?

@cgrunenberg:

Newspaper clippings that have been OCRed by DT.

How do you currently “transcribe” these documents? Or is this actually just a conversion to rich text?

using the same command sometimes produces different results

This happens because LLMs are not deterministic: repeating the same prompt can give you different results.
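If the run-to-run variation is a problem, the API (unlike the chat interface) exposes a temperature setting that controls how much randomness goes into sampling. A sketch, assuming the google-generativeai Python package; setting temperature to 0 makes repeated runs much more consistent, though still not guaranteed to be byte-identical.

```python
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# temperature=0 asks the model to always pick its most likely next token,
# which makes repeated runs of the same prompt far more consistent
# (though still not strictly guaranteed to be identical).
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config={"temperature": 0},
)
```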


bjornivarsson:

How very interesting! I did a Google search on “LLMs are not deterministic” and learned a lot. This explains why I sometimes get such “interesting” results. When that happens, I either run the command again or, often, reduce the size of the text being transcribed. Gemini seems to have difficulty processing anything over 2 MB; smaller is better.
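One way to stay under that ceiling automatically is to shrink each scan before uploading it. A sketch using Pillow; the 2 MB budget, the JPEG re-encoding, and the quality floor are assumptions to tune, and aggressive downscaling can hurt transcription accuracy on fuzzy print.

```python
# Shrink a scan until it fits under a size budget before uploading.
# The 2 MB budget, JPEG re-encoding, and quality floor are assumptions to tune.
from io import BytesIO
from pathlib import Path

from PIL import Image  # pip install Pillow

MAX_BYTES = 2 * 1024 * 1024  # roughly the 2 MB ceiling observed above

def shrink_to_fit(src: Path, dst: Path) -> None:
    img = Image.open(src).convert("RGB")
    quality = 85
    while True:
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        if buf.tell() <= MAX_BYTES or quality <= 30:
            dst.write_bytes(buf.getvalue())
            return
        # Reduce the pixel count and quality, then try again.
        img = img.resize((img.width * 3 // 4, img.height * 3 // 4))
        quality -= 10
```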

cgrunenberg:

I enter the command “transcribe exactly include date page number,” upload the file or files to Gemini (the number depends on the total size; I try not to exceed 1.5 MB), and click the submit arrow. After a while, the result appears and I take a look to see how good it is. If I’m satisfied, I click “Export to Docs” and a text document appears in a browser window. I then select File > Download to generate an RTF document and save it to Downloads.

Next I go over the document to ensure that the transcription is accurate, while also correcting spelling errors that were in the original clipping. Reporters often couldn’t spell, and typesetters sometimes didn’t do the greatest job either. Once that is done, I combine it with all the other clippings in that particular group so that I have one “master file” for that group.

This is a cumbersome process, but I’ve learned an enormous amount about the accuracy of OCR along the way. Most people get much better results because their original document is not an old newspaper with fuzzy print, poor contrast, ink spills, torn paper, and faded print. OCR + AI is a wonderful combination, but it takes time to do what I’m doing.

I just wondered if there was some kind of workflow that I could set up that would make this go a little faster. If not, oh, well …. it was a worthy question to ask.
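For what it’s worth, every step of that loop except the proofreading can be scripted against the Gemini API. Below is a sketch, assuming the scans for one group sit in a clippings/ folder as JPEGs and that plain text is an acceptable intermediate format to proofread before converting to RTF; the prompt is the one quoted above, and the folder and model names are placeholders.

```python
# Batch-transcribe a folder of clipping scans with the Gemini API and
# collect the results into one "master" text file for the group.
# Folder names, the model name, and plain-text output are all assumptions.
import os
from pathlib import Path

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
PROMPT = "transcribe exactly include date page number"

clippings = sorted(Path("clippings").glob("*.jpg"))
out_dir = Path("transcripts")
out_dir.mkdir(exist_ok=True)

parts = []
for scan in clippings:
    uploaded = genai.upload_file(str(scan))      # replaces the manual upload
    response = model.generate_content([PROMPT, uploaded])
    (out_dir / f"{scan.stem}.txt").write_text(response.text, encoding="utf-8")
    parts.append(f"--- {scan.name} ---\n{response.text}")

# One combined file per group, ready for proofreading and conversion to RTF.
(out_dir / "master.txt").write_text("\n\n".join(parts), encoding="utf-8")
```

The proofreading step stays manual, but the upload, transcription, and combining steps can run unattended.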

Yes, the primary goal of LLMs is to process and generate human language. This they do incredibly well. But often they don’t produce true statements 😉 https://www.paloaltonetworks.com/cyberpedia/large-language-models-llm


bjornivarsson:

😂 I’m using Gemini as a tool - I’ve learned rather quickly that it is not human, despite the annoying last sentence asking me: “Would you like me to …” Me? Really??


Bluefrog:

I don’t understand…. what is this?

Recognized text as an Annotation file associated with this document.

Bluefrog:

Thank you for taking the time to share your screenshots. However, this is all miles above my pay grade, so I’ll continue plodding along as I have been doing.