DT 2b3 OCR issues / losing tags & folder structure

When I run OCR on PDFs, the tags / folder structure are lost on the new PDFs that are created with the searchable text. Is there a way to retain the tags / folder structure?

Tags, such as keywords in Comments, will be lost, as the OCRd PDF is a new file. The image-only PDF still holds them, and they could be recovered from that file.

If one or more image-only PDFs are OCRd within the same group, the searchable PDFs will be created in that group. But if multiple PDFs from multiple groups are selected at the same time for OCR, they will be saved into a single group, IIRC, the group in which the first PDF in the selection resides.
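To make the placement rule described above concrete, here is a toy Python model of it. This is only a sketch of the behaviour as stated in this post, not DEVONthink code or its scripting API; the function name `ocr_destinations` and the `(name, group)` tuples are made up for illustration.

```python
def ocr_destinations(selection):
    """Model where searchable PDFs land after OCR, per the rule above.

    `selection` is a list of (document_name, group_name) pairs in
    selection order. Both the function and the data shape are
    hypothetical; they only illustrate the rule described in the post.
    """
    if not selection:
        return {}
    # Every converted PDF is saved into the group of the FIRST selected PDF.
    # If all selected PDFs share one group, that is simply their own group.
    first_group = selection[0][1]
    return {name: first_group for name, _group in selection}


if __name__ == "__main__":
    mixed = [("invoice.pdf", "Invoices"), ("receipt.pdf", "Receipts")]
    for name, group in ocr_destinations(mixed).items():
        print(f"{name} -> {group}")
```

In this model, selecting PDFs from two different groups puts both results in the first PDF's group, which is why converting one group at a time keeps everything where it was.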

I’m suffering from the same problem, I think.

I have a number of non OCRd PDFs that I’ve scanned in whilst waiting for the fix. As suggested, I created a Smart Group so that I know where they all are. Now, I was hoping that it would be as simple as selecting all documents in the Smart Group and converting to searchable PDF with the new document overwriting the old document (as happens when performing this action within a normal group). However, it doesn’t work like that. The document is converted and the old document is deleted but the new document loses its tags and folder structure and is saved somewhere else (not even in the Inbox). The only way I’ve been able to find it is through the ‘Today’ filter from where I’m able to drag it back to where it should be.

Yes. See my comments above.

The point here though is that because it’s a Smart Group the converted documents don’t appear to be saved anywhere. They are obviously being saved somewhere because they can be accessed using the ‘Today’ Smart Group but if you look at the info for the document the location field is blank and I can’t find them anywhere within the normal structure of my DT file. Any idea where they are being saved? Is it bug report time?

If you selected multiple image-only PDFs for conversion to searchable PDF, the searchable PDFs were all stored in the same group location.

Tip: If you’ve converted files today, they will be in the Today smart group. They will also be in the All PDF smart group. If you know the Name of one, select it within one of these smart groups and press Command-R (Reveal). That will display the document location in your organizational structure.

Want to find image-only PDFs in your database? If your view window doesn’t already have a “Kind” column in the documents list, you can add that column using View > Columns and check the option, “Kind”. Now click on the “Kind” header to sort by Kind.

Note: If you select the PDFs one at a time for conversion to searchable PDF, the converted PDF will be in the same group location as the image-only file was in. But if you select multiple PDFs for conversion, they will all be placed in the same group as the first one.

I might be missing something here but I don’t think I am so here goes. :slight_smile:

I understand what you’re saying if you’re working within normal groups, but if the selected document is in a Smart Group, the converted document is NOT saved back into its original group, even though the original unconverted document is deleted from that group (and also from the Smart Group). This happens when a single document is selected for conversion. If multiple documents are selected, the converted documents do NOT get saved into the same group as the first one, and all the original unconverted documents are deleted from their respective groups (and from the Smart Group) as well. As I said earlier, they are obviously being saved somewhere, just not anywhere obvious or particularly convenient.

No, that’s not what I said.

Try a search for the name or content of a PDF that has been converted to searchable PDF. You will be able to find it, and the Command-R (Reveal) action will display its location in your database.

You can find a document that’s been converted from a Smart Group by searching but the Command-R (Reveal) action does not display its location in the database. Command-R does show the location of other documents so the problem would appear to be specific to documents that have been converted from a Smart Group AND not yet dragged from wherever/however you find it to where you want it within your database.

Create a Smart Group to show you all PDFs that have no text and then convert one to searchable pdf from within the Smart Group. When it’s been converted the original is deleted from within its ‘normal’ group. Find the converted document and perform a Command-R on it. DT can’t (or at least doesn’t) tell you where it is.

The converted documents cannot be placed into the same group as the original document, as the original could be a replicant and so would not have one single location. They should get placed into the database’s inbox.

Ah, I seem to have lost the focus of this thread. Check whether your database is “Shared” in the database’s properties (File > Database Properties). Unshared databases cannot be searched and show zero results whatever you search for.

I’m getting confused now. :confused:

I’m not having a problem searching for documents; I’m having a problem determining where converted documents are stored. Try my little example and tell me where the document is stored. If it’s not within its original group I would expect it to be in the Inbox, but it isn’t. It doesn’t appear to be anywhere within the structure of the DT file until it is dragged from the search results and dropped into an existing group.

I appreciate that converted document(s) can be searched for one way or another, but if I’m converting a few and they’re not created in their original groups, I’d like just to be able to go to where they’ve been created and process them from there (very much like the way the Inbox works when scanning in new documents). The way it’s working at the moment, it’s quite possible to effectively ‘lose’ documents.

If you run Tools > Show Info… (Shift-Command-I) on one of the converted documents, can’t you see what’s in the Location: field at the bottom of the Information palette window? If something’s wrong, posting a screen capture of that window might be helpful.

A picture speaks a thousand words. :slight_smile:

As you can see, there’s nothing in the Location: field.

I put the test document into one of my existing groups. I then went to my ‘All Non Searchable PDF Documents’ Smart Group where, having no text yet, it was displayed. I selected it and chose to Convert to Searchable PDF. At the end of the conversion process the original test document is deleted from the group I put it in, it disappears from the Smart Group (which is expected of course) and the converted document is created somewhere, but where???

Must be close, considering the length of this thread. :slight_smile:

DT documents that don’t display a Location are at the top (root) level of the database, and where they’re listed depends on the current view and selection. If you’re in Three Panes view and no groups/items are selected, the top-level documents will be listed in the top-right pane of the current window.

Does that make sense?

You got it, Scott!

Here’s a simple way to find top-level documents that need classification:

Select any group in your database’s Three Panes view, e.g., Inbox.

Now Command-Click on that group to deselect it. Do you see some “unfiled” documents at the top of your database? That’s where one can find stuff that shows no Location in the Info panel. Such top-level documents can then be classified anywhere you wish.

It does make sense, and there it is, my test document converted to searchable pdf. :smiley:

It would be much more intuitive if they went into the Inbox, though. One for the wish list, DEVONtechnologies?