Concordance disappears!

Strange behaviour - looking at the concordance window, I use my mouse to add tags by right-clicking. All is fine, then “boomph”. The contents of the concordance window are gone!

Hmmmm

The only way I can kind of resolve it is to OCR “to searchable PDF” again. Then I see the concordance again, and can add tags (I’m not sure how long this lasts though…)

This suggests it’s not an intrinsic problem of the PDF, but some type of bug that arises occasionally, under specific circumstances, when adding a tag.

I note that I have other documents with missing concordance (i.e. blank contents where there were contents originally)…

Anyone else with this errant blank concordance issue?

(ps here is the short 4 page pdf I used
“www mit edu kardar research seminars Casimir Casimir1948.pdf”

  • it was when adding “Kd” after a few others that the concordance went blank, for me)

Did you edit the PDF document? In some cases macOS’ PDFKit might corrupt the text layer and then it’s no longer searchable (no concordance).

Thanks for your feedback - No I didn’t edit the pdf

More experiments show that if I paste the text into a fresh text file, that too does not display the concordance data.

But, if I add one character to the text file, so it is not identical, concordance data displays…

So it looks like the text of the pdf IS generating concordance data…

But it may fail to get to the screen due to another variable …

Next question is whether an error has appeared, and that is now forward propagating from the original file, due to the duplicate checker.

It looks like I need to clear everything and repeat to see what is going on…

(I have kept all visible pdf viewers closed across the mac)

Steps to reproduce this, or a screen recording showing the issue, would be great, thanks!

I’m now going to try to copy and paste, verbatim, the text content that produces the issue into my next reply (it appears that txt files are blocked from uploading). All related files are in the trash, so this text does not flag as a duplicate, although I’m not sure if it’s still interacting in some way. This text does not display a concordance as is, although I don’t know if my DEVONthink/Mac state/environment is now different.

Oh dear - I can’t get the basic text over to you…

Because it appears to have a link in it, it is blocked
I can’t upload it because ‘txt’ is not allowed
I can’t change the suffix to e.g. ‘png’ because it then appears corrupt…

Any suggestions?

Just send the zipped file to cgrunenberg - at - devon-technologies.com plus the necessary steps to reproduce this, thanks!

Have done so, also included the original pdf I dragged into devonthink

"Hi there, here’s the text file

at present:

(1) I drag into the inbox

(2) I check concordance (after giving time) and no content observed.

prior to this:

(1) I downloaded this pdf (also attached)

(2) I generated searchable text via the OCR option

(3) I checked the concordance tab and at this point it did display fine

(4) I added tags (see below) but about halfway through these, I think with ‘Ky’ (not 100% sure), the concordance stopped displaying

(5) I then repeated the OCR option and it displayed

(6) Following this point I generated a series of duplicate files to try to narrow down the issue. I was getting a lot of blank concordance ‘outputs’. But it wasn’t wholly consistent, probably because code to check for duplicates was running…

(7) So I pushed all related files into the trash

(8) And now I see what is listed under “at present”

"

PS - this exact text, as sent, now seems to be ‘inside’ DT and I can’t get it to display anything in the concordance… however, change one character and, “bang”, the concordance data comes back…

A further clue that it is more subtle than a one-character change:

In the original I changed the word “Arts” to

(1) “” - FAILS
(2) “A” - FAILS
(3) “Ar” - FAILS
(4) “Art” - FAILS
(5) “Artss” - CONCORDANCE DISPLAYS !!
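
For anyone trying to reproduce this, the variant files listed above could be generated with a short script rather than by hand. This is only a sketch: it assumes the plain-text version of the document has been saved as a UTF-8 file, and that only the first occurrence of “Arts” is replaced (the post does not say which occurrence was edited). The function names and output file names are made up for illustration.

```python
from pathlib import Path

# Replacements tried for the word "Arts" in the list above.
VARIANTS = ["", "A", "Ar", "Art", "Artss"]


def make_variants(text, word="Arts"):
    """Map each replacement to the text with the FIRST occurrence of `word` swapped.

    Assumption: only one occurrence was edited in the original experiment.
    """
    if word not in text:
        raise ValueError(f"{word!r} not found in text")
    return {v: text.replace(word, v, 1) for v in VARIANTS}


def write_variants(source, out_dir):
    """Write one text file per variant into `out_dir` (hypothetical file names)."""
    text = Path(source).read_text(encoding="utf-8")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (replacement, body) in enumerate(make_variants(text).items(), start=1):
        # The empty replacement gets the label "empty" so the name stays readable.
        path = out_dir / f"variant_{i}_{replacement or 'empty'}.txt"
        path.write_text(body, encoding="utf-8")
        paths.append(path)
    return paths
```

Each generated file can then be dragged into the inbox and checked in the concordance one at a time; the script does not talk to DEVONthink itself.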

In which original actually? The text layers of PDF documents are not editable.

For these experiments, the original text file is the first one generated from the PDF using DT’s “convert to Plain Text”

I observe pdf issue = text file issue

Does this also happen in a new, clean database? Is a verification of the existing database successful?

It works fine in a database …

However, it fails in the inbox… (I have used “Verify and Repair Database” and no problems were reported)

Does it work after rebuilding the inbox (see File menu)?

After rebuilding, problem remains.

So it only applies to “inbox” database, even after repair and rebuild…

I can move one or even a pair of the same files into any other database, and it works just fine.

Does the inbox contain sensitive data? Otherwise please export it via File > Export > Database Archive… and send the archive to cgrunenberg - at - devon-technologies.com (if not too large, otherwise please provide a download link). Thanks!

To save sending the whole inbox I will move things out first, see if it persists, then send the remaining errant part

Great, thanks.