DT Best practice

If you index a folder in the Finder and those documents are already in the database, then yes, they will be duplicated. I assume that the PDFs that you have imported are located in various groups in the database, and you do not have a quick way to identify what and where they are? If so, you’ll need to do some work to make the transition. Here are the steps that come to mind initially:

  1. Backup everything before proceeding.

  2. Rename in the Finder the folder that you are using now to store the PDFs to give it a descriptive name e.g. ‘TempIndex’.

  3. Create the folder, or folders, in the Finder that will serve in the future as your Indexed folder(s) for these PDF documents. It is important that for now this folder, or folders, be empty, i.e. do not use the current folder (renamed in step #2) in the Finder as your master Indexed folder.

  4. Index both the TempIndex folder from step #2 and the permanent (empty) folder from step #3 into the database.

  5. Create a Smart Group in the database that looks like this:

  6. Now when you view the Smart Group, you will see all duplicate PDF documents and (assuming you had no duplicate PDFs already) one of the instances will be imported in the database and one will be indexed. Here is what this looks like for me, although if you have the preference for displaying duplicates set differently, your view may be slightly different. Note also that I have the Three Pane view set to show the date added, which could help you identify the PDFs that you just indexed.

  7. At this point you need to go through the list of documents, select the duplicates that are in the database, and move (or replicate) them to what is to be the permanent indexed folder. (What you are doing now is somewhat counter-intuitive, as eventually we are going to delete all the documents that we just indexed, but you need them in the database to locate all the PDFs with metadata that you want to move out to the Finder as indexed files.) You can go through the entire list by Command-clicking on the non-indexed documents until you have selected them all, then right-click, select ‘Move To’ or ‘Replicate To’, and move or replicate them to the permanent indexed folder (created in step #3).

  8. Now that you have moved all the PDF documents that were contained in the database into a group that is an indexed folder in the Finder, select all the documents (you are not using the Smart Group any more; select the actual group in the database), right-click, and select ‘Move To External Folder’. This step actually exports the PDFs that were in the database back out to the Finder, with all the tags preserved.

After this step, you are almost done. You can delete the TempIndex group from step #2 from the database, empty the trash, and select ‘Only from database’ when prompted. It’s probably a good idea now to compare, in the Finder, the TempIndex folder with the permanent indexed folder to ensure that you did not miss any files in the conversion.
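The Finder comparison suggested above can also be scripted. The following is a minimal sketch, not part of DEVONthink: it assumes both folders are ordinary Finder folders, compares files by name only (because the folder layout may differ between the two), and ignores hidden files such as .DS_Store. The function names are hypothetical.

```python
from pathlib import Path


def visible_file_names(root):
    """Collect the names of all non-hidden files anywhere under root."""
    return {
        p.name
        for p in Path(root).rglob("*")
        if p.is_file() and not p.name.startswith(".")
    }


def missing_from_permanent(temp_dir, permanent_dir):
    """Return sorted file names found under temp_dir but not under permanent_dir."""
    return sorted(visible_file_names(temp_dir) - visible_file_names(permanent_dir))
```

Calling `missing_from_permanent` with the paths of the TempIndex folder and the permanent indexed folder returns the names that still need attention. Because the match is by name only, a file that was renamed along the way will show up as missing even though its content was moved.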

The folder will not be automatically indexed, but there is a script in the DEVONthink Extras folder (from the downloaded disk image) that you can attach to the indexed folder. Then, any time you select that indexed group, the index will be updated. The ‘File>Synchronize’ command does this also, but using the menu means you have to update the index manually.

If you move any folder in the Finder that is indexed in the database, then the links will be broken. You’ll want to pick a name and location for the folder created in step #3 above with care to ensure it is where you want it and named what you want. Re-indexing a folder is no big deal if you only keep the indexed documents in the indexed group in the database. If you start replicating or moving the indexed documents in the database and the link is broken, well that doesn’t end well!

By default, DEVONthink sets every group in a database to have tagging enabled. Indexed groups are excluded by default, so if you want the group name applied as a tag, you’ll need to uncheck the option to exclude the group from tagging (in the Info pane).

Also by default, DEVONthink does not update the filesystem with the OpenMeta tags applied to documents in the database, as this metadata is not available to external applications anyway. (The tags are updated when a document is exported.) However, indexed documents get their OpenMeta tags written to the filesystem immediately, so that you can use the metadata with other tagging applications, or even search on the tags with Spotlight.

Hope this helps.

That is extremely helpful, more so than I could ever have expected. Thanks so much. I’m going to give this a try when I get a free block of time. I’ll report back the results. Really, truly: thanks!

Oh, one follow-up. If I do all this and it works and I set up the new Finder folder to be automatically indexed by DT, that means I’ll have to be more vigilant about new additions to the database, right? Right now, I don’t have any external folders indexed. When I index or import anything new, it goes into the inbox. In this scenario, files I drop into the PDF folder will be indexed by the database and, if I set the options so, tagged with just the group/folder name? I assume the one indication that I haven’t “processed” a newly indexed file yet will be that it’s marked unread?

Indexed files are not marked as unread when received. One thing that you could use (I do) is a smart group that shows the files that were added today (or yesterday, this week, etc., whatever works best for your situation). I keep the smart group pictured below in the Sidebar so I can see what has been added today across all my open databases.

OK, I followed all the above steps, and had some success. One change: instead of going through and selecting all the duplicated files (with metadata), I added to my smart group a filter for all Instances of PDFs that are not Indexed (would have done a screenshot like you, but not sure how to paste that into the forum here). This worked, but the smart group only caught about half the files. I did a test by reindexing one of the missed files which has exactly the same name as an imported existing file, and the database does not identify them as duplicates. They may be slightly different files, however, as I used Acrobat at certain points to reduce the file sizes of my PDFs so that may be the issue. At this point, is there a way around that? I can’t seem to use filenames in any meaningful way to filter, but I’m experimenting. Thanks again.
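One way to test the suspicion that the Acrobat-reduced PDFs are simply different files is to compare same-named files byte for byte outside the database. This is a rough sketch under stated assumptions: it does not reproduce DEVONthink's own duplicate detection, it assumes both sets of PDFs are available as Finder folders (for example, an exported copy of the imported documents), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file's bytes in 1 MiB chunks so large PDFs are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pdfs_by_name(folder):
    """Map file name -> path for every .pdf under folder.

    If the same name occurs more than once, the last one found wins.
    """
    return {p.name: p for p in Path(folder).rglob("*.pdf") if p.is_file()}


def same_name_report(folder_a, folder_b):
    """For names present in both folders, report whether the bytes match."""
    a, b = pdfs_by_name(folder_a), pdfs_by_name(folder_b)
    report = {}
    for name in sorted(a.keys() & b.keys()):
        same = sha256_of(a[name]) == sha256_of(b[name])
        report[name] = "identical" if same else "different content"
    return report
```

Names reported as "different content" are the ones where a file was rewritten at some point, which would be consistent with them not being flagged as duplicates. For those, the file name is the only remaining link between the old and new copies.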

Actually, it looks like, in the current version of Pro Office, at least, when I drop files into the “hot folder” I created using the index folder action script, the item is indexed in DT, marked as unread, and placed in my inbox, which is all exactly as I would have it. So that’s at least not a problem. As for the unindexed PDFs, I think I may just have to go through them manually, cutting and pasting the tags from the old ones to the new ones, unless there’s another way!

Attaching the index folder action script to a folder in the Finder will indeed show the documents as unread in DEVONthink. Using the Synchronize command or the synchronize script that I discussed will not show new documents as unread.

Aside from the read/unread behavior, the key difference between the two methods is that the folder action forces an import any time a file is added to the folder or renamed in the folder, which means that DEVONthink is activated. If you normally leave DEVONthink running, that should not be a problem. Using the synchronize method will not launch DEVONthink when files are added to the Finder folders.

The other thing is that with synchronized folders, you can place them where you wish in a database and you don’t need to move them from an Inbox to another group in the database. If you normally want to move the files around in different groups, the index script could be a plus. Both methods work well, so it’s a matter of picking the workflow that works best for you. Sounds like you are getting there!

Perfect.
But I would like to do the same between two external drives. It does not seem possible. Am I wrong?

Depends on the setup. Could you describe the use case a bit more? Is DEVONthink running on a single machine, which accesses one external drive and then another? Or on two machines, each of which accesses its own external drive? It’s not clear from your question.

Many thanks for your quick response.

More precisely, I work on three Macs. A first iMac in town, a second in the country, and a MacBook in transit.

On each internal drive there is a partition for the system [HD-system] and one for the folders [HD-Workbook] that I synchronize with Synchronize! Pro X. It is on this partition that I would like to keep my DTPO database, but unfortunately the links are not maintained when I synchronize.

If I understand correctly, on each machine you have a path to your data that is something like /Volumes/HD-Workbook/Documents/My Data/My Database.dtBase2. If your indexed data is in something like /Volumes/HD-Workbook/Documents/Indexed Data/…, then I would expect that you could reproduce this on every machine, and the indexing scheme would work.

You need to be careful to ensure that the data resides at the same relative path from the root on each partition. This means, if your data partitions are always named HD-Workbook, and you always have the same folder structure with the same names defined on each file system, then you should be good to go.
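The "same volume name, same folder structure" rule can be checked mechanically. The sketch below is illustrative only: it compares two path strings and says which part differs. The function names and the second volume name in the usage note are hypothetical, and it makes no claim about how DEVONthink stores its links internally.

```python
from pathlib import PurePosixPath


def split_volume(path):
    """Split '/Volumes/<name>/<rest>' into (volume name, path below the volume)."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3 or parts[0] != "/" or parts[1] != "Volumes":
        raise ValueError(f"not a path under /Volumes: {path!r}")
    return parts[2], PurePosixPath(*parts[3:])


def path_mismatches(path_a, path_b):
    """Return a list of reasons why the two machines' paths would not line up."""
    vol_a, rel_a = split_volume(path_a)
    vol_b, rel_b = split_volume(path_b)
    problems = []
    if vol_a != vol_b:
        problems.append(f"volume name differs: {vol_a!r} vs {vol_b!r}")
    if rel_a != rel_b:
        problems.append(f"folder path differs: {str(rel_a)!r} vs {str(rel_b)!r}")
    return problems
```

For example, comparing `/Volumes/HD-Workbook/Documents/Indexed Data` on one machine with `/Volumes/Other-Workbook/Documents/Indexed Data` on another returns a single "volume name differs" entry, while two identical paths return an empty list.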

Of course, an alternative is to keep your data on a portable external drive that you attach to your city, country, and portable machines - then you never would be concerned about configuring the scheme.

I realized my mistake. To avoid confusion, I had not given the exact same name to different partitions of computers (i7-Workbook, Intel-Workbook, MB-Workbook). I did not believe that this detail was important.

I fear the possible consequences of changing the names of partitions (especially with Time Machine and Synchronize! Pro X). I think I would rather create new ones. I also considered the alternative of a portable external drive. But I’m not sure if a backup of this disk could be used in case of trouble.

Anyway, I see that solutions exist, I will try and bring you my feedback.

I appreciate your expertise and your availability.

Because OS X (and DEVONthink) conveniently collapses the front part of the path on the boot partition from something like /Volumes/Macintosh HD/Users/My Username/Documents/… to simply ~/Documents, it is easiest to keep your indexed data on the boot partition, because if they are in that location, then it doesn’t matter what your machine name or user name is on the different machines.
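To illustrate why a home-relative path travels well between machines, here is a small sketch of shell-style tilde expansion. It is a simplified stand-in written for this example (the function `expand_home` and the user names are hypothetical) and is not a description of DEVONthink's internals.

```python
from pathlib import PurePosixPath


def expand_home(path, home):
    """Expand a leading '~' against an explicitly given home directory."""
    if path == "~" or path.startswith("~/"):
        # Drop the '~' (and the following '/', if any), then join onto home.
        return str(PurePosixPath(home) / path[2:])
    # Paths that do not start with '~' are already absolute or relative as-is.
    return path


stored = "~/Documents/Indexed Data"
on_first_mac = expand_home(stored, "/Users/alice")   # hypothetical user name
on_second_mac = expand_home(stored, "/Users/bob")    # hypothetical user name
```

The same stored string resolves to a valid location under each user's home folder, whereas an absolute path that spells out the volume and user name would only be right on one of the machines.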

This is an interesting point. We understand that OS X has provided a place for everything (Applications, Documents, Pictures, Music). And it is very suitable for normal use. But, for intensive use, it results in a very large boot partition.

And I find it advantageous to keep this partition as small as possible to allow better maintenance. I try to limit this partition to 100 GB, and the other to 200 or 250 GB to store my documents, including iTunes and iPhoto.