DTPO "access denied" error doing OCR

hybra · December 10, 2009, 1:49am

Hello,
I just began using DevonThink Pro Office 2b7, and whenever I try to “Convert image to searchable PDF” the OCR stops and gives me an error:

“Couldn’t open image file”
Access to /private/var/folders/v0/v06CWsRoF+OiGq-pxmFxi++++TI/-Tmp-/DEVONscribbler-0/B6D4A0E2-FC16-420B-940C-CB2AA36CEA98 was denied.
Continue to import the document as is, or skip it?

I tried Repair Permissions on the disk, but it didn’t help.
Any guesses?

thanks

annard · December 10, 2009, 3:44am

This message means that the software can’t write into a folder it created earlier. That’s not good and shouldn’t happen, so something is not right on your Mac. Have you tried restarting it and then running Verify Disk and Repair Permissions in Disk Utility?

hybra · December 12, 2009, 5:19pm

Yes, I ran Repair Disk Permissions.
Anyway, other applications have no problems in reading/writing to the temporary folder. Only DevonThink (DTPO beta 8 now) is giving me this error.
The OCR engine copies the indexed files into the temporary subfolder DEVONscribbler-0, then when it tries to open them to analyze the images and convert them to PDF it seems it can’t access the folder anymore. That’s weird because it just wrote the files to it…

BTW, I am using Snow Leopard 10.6.2.

Bill_DeVille · December 12, 2009, 5:53pm

Possibly an OS X ‘Sharing & Permissions’ glitch on your computer. Check (in the Info panel) your permissions for the folder and for the files contained within it. You (the User) should have ‘read & write’ permissions.
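
For anyone who prefers to check this from a script rather than the Info panel, here is a minimal Python sketch. It walks a folder and reports, for each entry, the permission bits and whether the current user can read and write it. The folder name `DEVONscribbler-0` comes from the error message above; its location under the per-user temp directory is an assumption, and `describe_access` and `audit_folder` are hypothetical helper names.

```python
import os
import stat
import tempfile


def describe_access(path):
    """Return (mode string, readable, writable) for the current user."""
    mode = stat.filemode(os.lstat(path).st_mode)
    return mode, os.access(path, os.R_OK), os.access(path, os.W_OK)


def audit_folder(folder):
    """Return one (path, mode, readable, writable) row for the folder and each entry below it."""
    rows = [(folder, *describe_access(folder))]
    for root, dirs, files in os.walk(folder):
        for name in dirs + files:
            full = os.path.join(root, name)
            rows.append((full, *describe_access(full)))
    return rows


if __name__ == "__main__":
    # Assumption: the OCR scratch folder sits under the per-user temp directory,
    # as in the path quoted in the error message.
    target = os.path.join(tempfile.gettempdir(), "DEVONscribbler-0")
    if os.path.isdir(target):
        for path, mode, readable, writable in audit_folder(target):
            print(mode, "R" if readable else "-", "W" if writable else "-", path)
    else:
        print("Folder not found:", target)
```

Any row showing `-` in the R or W column for the current user would be a candidate for the "access denied" message.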

hybra · December 12, 2009, 6:17pm

I have R+W permissions to the folder. That’s one of the first things I checked.
I even tried deleting the folder and letting DT recreate it, but the same error occurs when the OCR starts analyzing the temporary pictures after they’ve been copied to the folder from the indexed file.

Bill_DeVille · December 12, 2009, 8:34pm

That’s weird.

I’m running DTPO2 pb8 under OS X 10.6.2 and with no problem for scan/OCR to my databases.

hybra · December 12, 2009, 10:53pm

It’s even weirder when you consider that this didn’t always happen. I succeeded in converting some images to PDF in the beginning, but then this behaviour started and is now permanent.
Is there any way to manually change (in some preference file maybe) the folder where this DEVONscribbler (the OCR engine?) stores temporary files?

Also, I noticed the temporary files have GUID-style names like “0B49081B-0887-42A9-8219-BD0D3A1AEB27” with no JPG extension. Is that the normal behavior for the OCR engine? I should also mention that the files I am trying to OCR are not imported, but indexed by the DTPO database.
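
Since the temporary files carry no extension, one way to see what they actually contain is to look at their leading bytes. The following is a minimal Python sketch, not anything DTPO or the OCR engine is known to do: `sniff_image_type` is a hypothetical helper that recognises a few common file signatures (JPEG, TIFF, PNG, PDF) and returns "unknown" for everything else.

```python
def sniff_image_type(path):
    """Guess a file's type from its first bytes, ignoring its name."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # TIFF starts with a byte-order mark: little-endian "II" or big-endian "MM".
    if head[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if head.startswith(b"%PDF"):
        return "pdf"
    return "unknown"
```

Running this over the extensionless files in the temporary folder would show whether they are plain copies of the original JPGs or something else.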

UPDATE:
Actually, I just tried converting the same images to searchable PDF after IMPORTING them instead of indexing them. OCR works if the images are imported, it seems, but not if they are indexed. Can you confirm this behavior?

Bill_DeVille · December 13, 2009, 12:27am

First, about conversion of Index-captured image-only PDFs: If you expected the PDF that was Indexed to be converted to a searchable PDF, that won’t happen. Instead, the external PDF will remain image-only and the result of OCR conversion will be a new Imported searchable PDF in the database, no longer linked to the external copy.

I just Index-captured an image-only PDF, then selected it within the database and chose Data > Convert > to Searchable PDF. The result was as described above. The external PDF remained image-only, and the database now holds a new searchable PDF that’s stored inside the database.

That’s what should have happened when you tried to convert your Indexed PDFs. The technical term for what really happened is that your computer is “frakked” in some way.

One possibility is that you have installed software that’s messing with (hacking) OS X so that errors result. Now that Apple has fixed, in OS X 10.6.2, a Snow Leopard bug that made it risky to work in a Guest account, I suggest that you run DTPO2 from a Guest account without the extensions such as Safari plugins, third-party preference panes, etc. that may be in your User account. For testing OCR, of course, you will have to redownload the ABBYY software after logging into the Guest account, by running Help > Install Add-ons.

If you try that, let us know what happened.

hybra · December 13, 2009, 8:59am

I will try something soon to understand where my Mac may be “frakked”.

In the meantime I have partially solved the issue and am getting what I need by “merging” the indexed images into a single PDF (which is created inside the database) and then converting this new PDF to PDF+text via OCR.

Thank you for your help.

UPDATE:
I definitely don’t think I have a “frakked” Mac.
This is the latest scenario:
Let’s say I have scanned a magazine (so mixed images and text in a bunch of JPG files).

TEST 1:
I converted these JPG images to a PDF, outside of DevonThink.
I indexed this new PDF from DevonThink like you did. Converting this indexed PDF gives no problems at all! The OCR engine creates a subfolder XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX.SEPARATED inside

/private/var/folders/v0/v06CWsRoF+OiGq-pxmFxi++++TI/-Tmp-/DEVONscribbler-0

then in this subfolder it creates one TIFF image for every page in the PDF and then starts converting successfully.

TEST 2:
I indexed the scanned magazine JPGs directly in DevonThink.
If I try to “Convert to searchable PDF” any of the indexed JPGs, it gives me the “access denied” error.
So it sounds like a bug in the OCR processing workflow in DTPO, or is it by design that indexed JPGs should not be convertible to searchable PDF?

TEST 3:
I converted one of the JPGs to TIFF in Photoshop. Then I indexed the TIFF from DevonThink. Guess what? The OCR engine has no problem converting the indexed TIFF to searchable PDF. No “access denied” error.
So in the end it seems it’s a bug in the OCR engine related to converting indexed image files other than TIFF (i.e. JPG) to searchable PDF.
Hopefully DT will have the time to fix it for the final release or the next beta!
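
Until that is fixed, the TIFF route from TEST 3 can be batch-applied without Photoshop. The sketch below swaps in the macOS command-line tool `sips` for the conversion, which is an assumption about tooling and not what was used in the test above. `sips_tiff_command` and `convert_folder` are hypothetical helper names, and the folder paths are placeholders.

```python
import shutil
import subprocess
from pathlib import Path


def sips_tiff_command(src, dest_dir):
    """Build the sips command that converts one image to TIFF in dest_dir."""
    src = Path(src)
    dest = Path(dest_dir) / (src.stem + ".tiff")
    return ["sips", "-s", "format", "tiff", str(src), "--out", str(dest)]


def convert_folder(src_dir, dest_dir):
    """Convert every *.jpg in src_dir to TIFF and return the output paths."""
    if shutil.which("sips") is None:
        raise RuntimeError("sips not found; this sketch only runs on macOS")
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(Path(src_dir).glob("*.jpg")):
        cmd = sips_tiff_command(src, dest_dir)
        subprocess.run(cmd, check=True)  # raises if sips reports a failure
        converted.append(cmd[-1])
    return converted
```

For example, `convert_folder("/path/to/scans", "/path/to/scans-tiff")` would leave the JPGs untouched and write TIFF copies that can then be indexed in their place.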

Bill_DeVille · December 13, 2009, 7:46pm

I haven’t tried an image other than PDF. Perhaps the ‘frakk’ is connected with that, perhaps not.

But, as my results demonstrated, the OCR procedures were not designed for the case of Index-captured images. The external PDF remains an image, and the searchable PDF within the database no longer retains the Path to that external PDF.

Christian has stated that in the future, there may be changes in the way that synchronization occurs in Index-captured content. But that’s not now.

s.hoffman · December 13, 2009, 11:05pm

It sounds like some very weird permissions problem is happening for you. Have you tried re-installing the ABBYY OCR plug-in from the Help menu?

Sorry to hear about your troubles. It’s very strange that there is a lack of permissions on the /private/var folders, since these are recreated every time you reboot and are always owned by your own account.

I’m having no troubles using OCR with 10.6.2 and the latest Devon beta.
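
The ownership claim about the /private/var folders can be checked directly. This is a minimal Python sketch that assumes the per-user temp directory is the one Python reports via `tempfile.gettempdir()`; `temp_dir_ownership` is a hypothetical helper name.

```python
import os
import tempfile


def temp_dir_ownership():
    """Return (temp dir path, owner uid, True if the current user owns it)."""
    tmp = tempfile.gettempdir()
    owner_uid = os.stat(tmp).st_uid
    return tmp, owner_uid, owner_uid == os.getuid()


if __name__ == "__main__":
    tmp, owner_uid, mine = temp_dir_ownership()
    print(tmp, "owner uid:", owner_uid, "owned by current user:", mine)
```

If the last value is False on the affected account, that would point back toward a permissions problem and away from a format-specific bug.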

hybra · December 19, 2009, 3:05pm

I am not having any issue with my mac’s file system permissions…I posted three tests I did before.

I believe this behavior of DevonThink is either by design or a bug that needs to be fixed. I still haven’t got any confirmation from Devon Technologies.
In case it’s a bug, I just want to point out that I am not “disappointed” with the product; I am happy I purchased it. I am just (being a software architect myself) trying to help improve an already good product. Is this forum the right place to post bug reports, to start with?

I was saying that OCRing external indexed images seems to work only with TIFF files and not, for example, with JPEG files.
You can try it if you want and report back whether you get the same behaviour, i.e. the “access denied” error.

Just try to index (not import) an external JPG picture (with some text in it of course, if you want to test the OCR engine too) in DevonThink, then try “Convert to searchable PDF”. If the error that comes up is by design, then the dialog should say something like “Indexed images do not support text recognition” and not that misleading “Access is denied”.

Just my 2 €cents

annard · December 20, 2009, 8:23am

It’s odd, though, that I cannot reproduce the behaviour you describe at all over here, especially since the OCR module doesn’t care where the files are located. support@devon-technologies.com is the most reliable place to file bug reports or requests. In your case I would recommend describing in great detail where those files are located and how you indexed them, and maybe even including examples with the bug report.