I’m a little puzzled and need some help, or an explanation of how to deal with the following situation:
I’m using DEVONthink Pro Office 2.0.3.
I’ve indexed a big folder containing hundreds of pdf files.
Some of them were not searchable (image only), so I imported them with OCR.
What happened:
In my DEVONthink database, the original PDF file was automatically put in the trash, and a new PDF file with the same name as the original was physically created somewhere in the .dtBase2 package.
BUT: the original PDF file remains in the original (indexed) folder, and that’s my problem.
I now have duplicate files, with the older version in my “official” directory in the Finder.
If I now open one of these newly imported files (from within the DT database) and make annotations with Skim, the .skim file is stored next to the corresponding PDF file, e.g. in mydatabase.dtBase2/Files.noindex/pdf/9/.
It is therefore not automatically added to my database and would NOT be found by any database search.
…
From my point of view, I would have to do the following:
Export all PDF files from the database and merge them (how?!) with the files in the “original” PDF directory.
Manually search for all .skim files under the Files.noindex/pdf path and move them to the “original” PDF directory as well.
Then: remove the indexed directory from my database and add it again (again indexed only).
Is that the right way?
Do I risk losing any information when I remove the indexed folder from my database and index it again?
P.S. If I just manually moved the “new” PDF files and .skim files from the Files.noindex/pdf directory to the other one, I assume the database would look for them and run into problems?!
Sorry for the long text, I hope I have made my problem clear…
Kind regards
Martin
The original, indexed files should be removed from the folder after emptying the trash and clicking on the option to delete “Files” or “Files & Folders” too.
This is not recommended as the path/filename of the PDF might change (e.g. after renaming or modifying).
Saving the annotated files as a .pdfd should be more reliable. But don’t save them directly inside the database package, save them e.g. in the global inbox (should be available in the Finder’s sidebar and therefore in the “Save” panel).
Thanks, Christian!
Your answer helped me to correctly formulate my “real” question:
Is there a way to convert indexed documents into “PDF+Text” and replace the original document (and not create an imported copy)?
If not, when will you implement it?

Kind regards
Martin
That’s not (yet) possible, but a script should be able to perform the conversion and replace the original file with the converted one. Here’s a simple example (using the desktop for the conversion) with little error handling, so be careful:
-- OCR indexed pictures/PDF documents
tell application id "com.devon-technologies.thinkpro2"
	set theSelection to the selection
	repeat with theRecord in theSelection
		if (indexed of theRecord) and ((type of theRecord is picture) or (type of theRecord is PDF document)) then
			try
				set thePath to path of theRecord
				if thePath is not "" then
					set theConvertedRecord to ocr file thePath to incoming group
					if exists theConvertedRecord then
						set theNewPath to export record theConvertedRecord to "~/Desktop"
						delete record theConvertedRecord
						if exists theNewPath then
							set theIndexedRecord to indicate theNewPath to parent 1 of theRecord
							if exists theIndexedRecord then
								set name of theIndexedRecord to name of theRecord
								delete record theRecord
								do shell script "rm " & quoted form of thePath
							end if
						end if
					end if
				end if
			end try
		end if
	end repeat
end tell
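Since the script above removes the original file with rm, which cannot be undone, a more cautious variant could move the original into a backup folder instead of deleting it. The following is only a minimal sketch: the handler name backupFile and the backup folder "OCR Originals" on the desktop are assumptions made for illustration and are not part of DEVONthink or of the script above.

```applescript
-- Hypothetical handler (not part of DEVONthink or of the script above):
-- moves a file into a backup folder instead of deleting it.
on backupFile(thePath, theBackupFolder)
	-- mkdir -p creates the folder if it is missing; mv -n never overwrites
	-- an existing file of the same name, so in that case the original
	-- simply stays where it is.
	do shell script "mkdir -p " & quoted form of theBackupFolder & " && mv -n " & quoted form of thePath & " " & quoted form of (theBackupFolder & "/")
end backupFile

-- Assumed backup location: a folder named "OCR Originals" on the desktop.
set theBackupFolder to (POSIX path of (path to desktop folder)) & "OCR Originals"
```

To try it, the rm line inside the tell block could be replaced by "my backupFile(thePath, theBackupFolder)"; the "my" is needed there because the call is made from inside a tell block. Originals from different folders that share a file name would not overwrite each other in the backup folder; the later ones would be left in place.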
Thanks, Christian - I’ll try it out.
An improved version of the script for DEVONthink Pro Office 2.0.9 is available here: viewtopic.php?f=2&t=13039&p=61304#p61304