To import or not to import?

ruigrguerreiro · April 13, 2010, 9:54am

Hi there,

This is my first time here. I’m trying the new DT and loving it, but I’ve run into something I can’t figure out how to solve.
So far I’ve been importing some PDFs into DT just by dragging and dropping them into the corresponding folders (Groups). Now I’d like to get all my digital magazines in there too, so they become searchable. They are on an external HD, though, and I thought it would be best to keep them there and just create aliases in DT (Replicants, I think, is the right term). Most of them worked out fine, but a couple showed up in the Log window after importing (replicating), with “No Text” under the Info column.
I located the files, Ctrl+clicked them to call up the contextual menu, and chose Convert to Searchable PDF. First, this took a long time (those are fairly large PDFs), and second, once it finished I had both an alias and a local copy of the same document.
So my question is: why did this happen, and how can I solve it? Are some kinds of PDFs not searchable unless they are actually in DT’s database?

Thanks a lot for your help.

Best regards,
Rui Guerreiro

Greg_Jones · April 13, 2010, 10:47am

A couple of thoughts here, starting with the terminology. You add documents to a DT database either by Importing, where the document is physically contained in the database, or by Indexing, where the database entry is linked to the original document in the file system. Indexing is the equivalent of creating an alias in the Finder. A Replicant is different from an alias, in that replicants are pointers to the same document within the database structure. In other words, when you replicate a document, each instance is a replicant; there is no ‘original’ and no ‘alias’, just replicants.

As to your documents, some of your digital documents are ‘PDF+Text’, indicating that they already have a searchable text layer in the original document. The documents showing as ‘No text’ are scanned images of the magazine pages and have no text layer as is. With DTPO and its OCR capability, those documents can be converted to add the searchable text layer, which is what you have already done. The OCR conversion creates a new document with the text layer added, so that the original is left unchanged. When the converted file is an indexed one, the new document is stored in the database rather than being created next to the original in the Finder. That is why you ended up with both an alias and a local copy. If you want to keep all of these documents as indexed files, you can move the converted documents out of the database to the Finder and then index them.

Note that with some PDF documents, such as digital magazines and books, the converted PDF+Text document will be much larger than the original. With some of my digital material, I prefer to create a new PDF containing only the table of contents and the index and OCR that document, so that just those pages are searchable.