Un-OCRed a document?

I think I have undone an OCR of a document.

Here is what happened: The PDF document was added a couple of months ago and OCRed at the time. I highlighted it, and was able to summarize the annotations with RTF just this afternoon.

Then I highlighted a little more and noticed that the RTF summary didn’t automatically include the new highlights, so I deleted the summary and then asked it to summarize again, which it did. And it sort of looks ok - it gives the page number and then shows the highlight in color, as usual. Except the “words” it shows are random characters (I assume they are something meaningful to the display of the PDF). And I can no longer search in this PDF - it is like I’ve un-OCRed it.

When I copy and paste, this is what I get, and this is what it looks like in my RTF document (except there it is green, the highlight color):

Highlight, Green:

L[ "?—” ”“­5%h L —[!7””—8."75 "?7 ”"­57["l&7.“[7“>” 7#]7“—7[47” —[ “?7 4?7’—””“% "7.4?—[8 &.xz“."z“% .[5 "?7 ‘7.[—[8 “?.” ”?7 z“ ?7 57“—!75 Q“z’ "?7”7 7#]7“—7[47”o

I tried to do the same thing with another PDF, and this didn’t happen - I was able to delete the summary, highlight more, make another summary and everything went well. So I don’t know what I did.

I’m unsure how to fix it. When I ask it to do the OCR to searchable PDF, it asks if I want to do that again - which implies that it thinks that it is still searchable. I didn’t continue. Should I OCR it again?

(I will note that the PDF looks fine - I can read it, it isn’t gibberish - it isn’t until I try to copy and paste from it, or summarize the highlights, that I get this weird stuff)

I’m on a Mac running macOS 14.4.1 and beta 2 of DT4.

Thanks!

The current version of macOS Sonoma is 14.7.6. You should stay current with the operating system point releases.

  • What are your OCR settings?
  • What language is the PDF in?

The PDF is in English and, again, the OCR worked on it whenever it was that I uploaded it (months ago), and was working until this afternoon. The document has lots of highlights throughout it. In fact, a summary with words in English (instead of gibberish) is still in the DT trash.

The OCR settings are

  • Convert incoming scans to searchable PDF
  • Searchable PDF: enter metadata after text recognition
  • Compress PDF
  • PDF resolution: as source
  • Autocorrect: deskew
  • Autocorrect: page orientation
  • Primary language: English

Could I have clicked on something that would have caused the underlying PDF to change in this way? I was clicking on the little icon that highlights things in a PDF (the A with a gray box) - could that have caused this? Clicking that too many times?

One other thing to know about this PDF file is that it is a replicant - one of the few that I have. I’d forgotten about that until right now.

Sorry about not being quite up to date on the operating system - I’m hoping to update to Sequoia soon - when I can do without the computer for a day or so. Until then, it is what it is.

Thanks for your help!

You’re welcome!
Hold the Option key and choose Help > Report bug to start a support ticket, and please attach the problematic PDF.

BLUEFROG didn’t say anything about Sequoia. Apple also keeps releasing minor updates for the previous two major macOS versions; those are the “point releases”. For example, I updated to Ventura 13.7.6 earlier this week.

Yes, sorry, to be clear what I meant was - I have not kept up with the smaller updates to Sonoma because I keep intending to do the larger update to Sequoia.

I have a hard time fitting any update into my schedule, so the smaller ones are, in a sense, just as disruptive as the larger ones. So why do two when you can simply do one? (Of course, this would all be great if I would actually do the update to Sequoia - and I will, I swear.) Anyway, sorry for the confusion.

No worries. I would just do the update in Sonoma unless you have a real need to go to Sequoia. Personally, I don’t find anything in it compelling enough.

I don’t know - the writing tools might be fun to try. Anyway, I’ve updated Sonoma today, and will get around to Sequoia soon.