DEVONthink 2 Databases and Time Machine Backups

I have been testing the trial of DTPO2 and have imported a decent-sized PDF library into it. The result was a 14 GB database. My question is how DTPO2 handles its database in relation to Time Machine backups. If I import one more PDF, will Time Machine have to back up all 14 GB again?

Aperture 2 had this issue initially, and Apple made a change to its library/database structure that allowed Time Machine to back up only the changed files. I am hoping something similar happens in DTPO2…

No, Time Machine won’t have to recopy the entire DT Pro/Office 2 database package every time new content is added. DEVONthink 2 has a different database structure than did DEVONthink 1.x.

Since this is kinda related, I’ll post it in this thread.

I use Jungle Disk to back up to Amazon S3. It’s a slow process on my current Internet connection, so I don’t want to transfer any more than is necessary. I noticed today that new directories are frequently being created inside my DT2 database, in the Files.noindex directory. I don’t remember making any changes to the files I’m finding in there, and most of them have dates much older than the directories that contain them. So, my questions are:

  • what’s going on?
  • can I exclude the files in Files.noindex from my backup and still safely restore from backup?

Your database documents are stored in the Files.noindex folder, inside the database.

If you exclude Files.noindex from backup, your database document files will not be backed up. That’s not a good idea.


Okay, the “noindex” threw me off. It made these sound like a subset of files excluded from search for some reason.

So, why are new directories being created for old documents?

Part of Christian’s magic. 🙂

Is this a one-time thing from the most recent upgrade, or will it keep happening in that directory? I ask because, when backing up to S3, uploading the same PDFs over and over gets expensive and time-consuming. Jungle Disk first checks whether anything has changed, and I’d like the only changes it finds to be the ones I’ve made. I don’t understand the ones DT has started making. Will they stop?

In the conversion from a DEVONthink 1 database to a DEVONthink 2 database, subfolders storing files by filetype are created within the folder that holds document files.

That’s a one-time thing.

Obviously, when Time Machine first sees a new DEVONthink 2 database, it will have to copy the entire database. Subsequent Time Machine backups will need to record only the changes made to that database.
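To make the “only the changes” part concrete: a file-level incremental backup compares each file in the package against what it recorded on the previous run. Here is a minimal Python sketch of that comparison, keyed on file size and modification time. It is only an illustration of the idea; Time Machine and Jungle Disk use their own change tracking, and `build_manifest`, `changed_files` and `deleted_files` are hypothetical names.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical manifest: path relative to the package -> (size in bytes, mtime in ns)
Manifest = Dict[str, Tuple[int, int]]


def build_manifest(root: str) -> Manifest:
    """Record the size and modification time of every file under root."""
    manifest: Manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            manifest[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return manifest


def changed_files(previous: Manifest, current: Manifest) -> List[str]:
    """Files that are new, or whose size or mtime differs from the last run."""
    return sorted(path for path, info in current.items() if previous.get(path) != info)


def deleted_files(previous: Manifest, current: Manifest) -> List[str]:
    """Files recorded on the last run that no longer exist."""
    return sorted(path for path in previous if path not in current)
```

On the first run the previous manifest is empty, so every file counts as changed (the full initial copy). After that, importing one more PDF adds one new entry and leaves the rest of the package alone.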

Sorry to jump in here, but I think this is a related issue: I’ve picked up (from posts in other threads) that copying a DTP database while it’s open is a Bad Thing and could damage the database. Is this true of Time Machine? Time Machine is hell-bent on backing up every hour, so do I need to close DTPro every hour to keep the database safe?

-Simon

It might damage the backup of a database, but I don’t see how it could harm the original.

Probably not necessary to close it, but I wouldn’t want to be updating it during any backup. Since a DEVONthink database is made up of multiple files and folders, you’d want to be sure they’ve all been backed up before making any changes. Otherwise the backup might contain an incompatible mix of old and new data. For example, backing up DEVONthink-*.dtMeta files while any of them are being modified is a bad idea. But if a backup is relatively small, the window of vulnerability is smaller, too.
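One way a homemade backup script could guard against that mix of old and new data is to record the state of the package before and after copying it, and retry if anything changed in between. A rough Python sketch, assuming a plain file copy of the database package: it is a best-effort check that only notices changes to a file’s size or modification time, not a true filesystem snapshot, and `snapshot` and `copy_consistently` are hypothetical helpers.

```python
import os
import shutil
from typing import Dict, Tuple


def snapshot(root: str) -> Dict[str, Tuple[int, int]]:
    """Map each file under root to (size in bytes, mtime in ns)."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def copy_consistently(src: str, dst: str, attempts: int = 3) -> bool:
    """Copy src to dst, accepting the copy only if src did not change meanwhile.

    Returns True for a copy taken while nothing changed, or False if every
    attempt saw files added, removed or modified during the copy.
    """
    for _ in range(attempts):
        before = snapshot(src)
        if os.path.exists(dst):
            shutil.rmtree(dst)  # discard any earlier, possibly mixed copy
        shutil.copytree(src, dst)
        after = snapshot(src)
        if before == after:
            return True
    return False
```

If it returns False, the database was busy the whole time, which is the situation sjk suggests avoiding by not working on it during a backup.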

I think the risk is in making inconsistent backups (of any data, not just DEVONthink dbs) without realizing it until problems occur when they’re restored. Some backup methods have ways to reduce that risk; I’m not sure about Time Machine.

I don’t bother to close DEVONthink when a Time Machine backup is made. I’ve tested a couple of Time Machine backup recoveries and they have been fine.

On the other hand, as sjk noted, I don’t continue working on a database during a Time Machine backup, nor is my ModBook always connected to the external drive for Time Machine backups, so I don’t consistently do hourly backups.

Note: When I’ve made important changes to a database, usually several hours of work, I don’t wait for a scheduled backup. I’ll take a break and invoke Scripts > Export > Backup Archive. When I come back, the database has been verified, optimized and has current internal and external backups. I store the compressed and dated file produced by Backup Archive on an external drive. If I ever needed to fall back on a backup of my main database (which has happened only once in about four years, and then because I was testing a hack that bollixed the database), my first choice would be the Backup Archive backup. Periodically, I copy Backup Archive archives of my databases to DVD and store the DVDs offsite. That’s insurance against losing everything, including the Time Machine backups, to theft or fire.

Thanks sjk and Bill for the suggestions. I feel much better now. I’ll start using that Backup Archive Script!

-Simon

I have a related question.

Is it possible to return to an older version of a file via Time Machine when that file was inside the DTPO database? I seem to have overwritten a file with a new version, and I’m pretty sure Time Machine has the older version backed up, but I can’t find it the usual way (open a Finder window, then invoke Time Machine and travel back in time in that folder).

You mean the file was imported into DTP and you altered it within DTP?

In that case it’s easy:
Select the new version of the file in DTP. Invoke “Show in Finder” from the context menu. This will take you to a folder within the Files.noindex folder inside the database package. Now fire up Time Machine from that folder, go back in time and restore the file.

(A word of caution: fiddling with the Files.noindex folder inside the database package is open-heart surgery. So back up the database (Export > Archive) before restoring the file, and run Verify and Repair after restoring it.)

Johannes

I have avoided this issue as follows: an Automator workflow copies each changed database to another directory on my hard drive. In Backblaze (my online backup utility) I excluded the original databases, which may be open, but included the copies, which I never open. That way, only closed databases are ever backed up, and Automator refreshes the copies every night.

The same would work for Time Machine: exclude the databases you work with and include the nightly copies.
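The nightly copy step could also be scripted outside Automator. Below is a hypothetical Python stand-in for that workflow, not the poster’s actual setup: it recopies a database package into a mirror folder only when something inside the package is newer than the existing copy, so the online or Time Machine backup only ever sees the closed copies. The paths are placeholders, and the copy itself should still run while DEVONthink isn’t writing to the database (closed or idle), for the reasons sjk gave above.

```python
import os
import shutil
from typing import List


def newest_mtime(root: str) -> int:
    """Most recent modification time (ns) of root or anything inside it."""
    newest = os.stat(root).st_mtime_ns
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            newest = max(newest, os.stat(os.path.join(dirpath, name)).st_mtime_ns)
    return newest


def mirror_changed_databases(databases: List[str], mirror_dir: str) -> List[str]:
    """Copy each database package into mirror_dir if it changed since the last copy.

    Returns the names of the packages that were (re)copied. shutil.copytree
    preserves modification times, so an unchanged package compares as up to date.
    """
    os.makedirs(mirror_dir, exist_ok=True)
    copied = []
    for db in databases:
        target = os.path.join(mirror_dir, os.path.basename(db))
        if os.path.exists(target):
            if newest_mtime(target) >= newest_mtime(db):
                continue  # mirror copy is already current; nothing to upload
            shutil.rmtree(target)
        shutil.copytree(db, target)
        copied.append(os.path.basename(db))
    return copied


# Example (hypothetical paths):
# mirror_changed_databases(["/Users/me/Databases/Research.dtBase2"],
#                          "/Users/me/DatabaseCopies")
```

Run once a night by whatever scheduler you prefer, the backup tool then excludes the live databases and includes only the mirror folder.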