database structure and iDisk

Hi,
I recently upgraded to DTPO 2.0. I run a solo law office and put all of my client matters into DTPO. It has grown now to 7.5 gigs and will continue to grow. I prefer to store the database on my iDisk so that it can constantly be backed up. My office is paperless, so everything is in DTPO, and if I lose that, a lot of documents are irreplaceable. So it is vital that I have a current backup. The problem is that every time a change is made to the database, iDisk syncing has to re-upload the entire 7.5 gig database. This takes days to do, so at this point the iDisk solution is breaking down. I never really have a fully backed up database to my iDisk, plus I never know really what is there. I do also clone my HDD every night so that I have a second backup, but the iDisk solution is preferable. Please change the database structure so that each document is a separate file so that iDisk syncing works.
Cameron Graber

Well I have a request of a similar nature, how about “exposing” the databases as pseudo drives? (even if read-only - anticipating any “Nooooo it will corrupt our databases” replies).

This would make “delta” syncing of a database to a remote target actually possible. In my case I rotate SSDs to and from a remote location and would like to use Synchronize! Pro X; if this is currently possible, I don’t know how.

It would also make access to the content from outside applications more efficient. When I have to upload a file from a browser-based application, I first have to export it from DT to the desktop and then upload it from there. That is neither efficient nor user-friendly.

Personally, I rely on both Time Machine and the Backup Archive routine (Scripts > Export > Backup Archive) to back up my databases, which are worth considerably more to me than my car. Backing up is like buying home or auto insurance, only much cheaper.

When I’ve spent time and effort on a database, adding or modifying content and perhaps organization of the database, I don’t wait for a scheduled backup. At a convenient break time, I’ll invoke Backup Archive. When I return from break, the database is ready for more work, having been verified, optimized (useful after adding lots of content) and backed up internally and externally. The external backup is the smallest possible, compressed and dated archive of the database and is suitable for storage on external media (as a precaution against a hard drive crash, for example). Periodically, I copy onto DVD-R media my recent Backup Archive files and store them at my bank, as insurance against my computer equipment being stolen (oops! Time Machine is gone) or my house burning down.
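For anyone curious what a “compressed and dated archive” amounts to, here is a minimal Python sketch. It is not the Backup Archive script itself and does none of its verifying or optimizing; the function name `archive_database` and the naming pattern are my own assumptions. It only zips a closed database package into a file stamped with the current date.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def archive_database(database: Path, dest_dir: Path,
                     today: Optional[date] = None) -> Path:
    """Zip a (closed) database package into dest_dir with a dated name.

    Hypothetical helper for illustration; the naming pattern is an assumption.
    """
    today = today or date.today()
    dest_dir.mkdir(parents=True, exist_ok=True)
    # e.g. "Demo.dtBase2" becomes "Demo 2010-01-15.zip"
    base_name = dest_dir / f"{database.stem} {today.isoformat()}"
    archive = shutil.make_archive(
        str(base_name),
        "zip",
        root_dir=str(database.parent),  # archive paths are relative to this folder
        base_dir=database.name,         # only the package itself is included
    )
    return Path(archive)
```

Because each archive is a single, self-contained file with the date in its name, it is easy to copy to external media and to tell at a glance which copy is the most recent.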

Some users have reported that Dropbox, for example, is faster than iDisk/MobileMe. I haven’t tried Dropbox, as I still prefer knowing where my backups are, knowing that they will work and that I’m in full control of them.

Bill,
Thank you for your response. The problem, however, is the structure of the database itself in that it apparently is one giant monolithic database. So any time I make a single change to the database, apparently the entire 7.5 gig database has to be reuploaded to iDisk. My guess is that you will have the same issue on your Time Machine backups. But if your TM disk is local, it won’t take nearly as long to back up as it does to upload to an online disk. Uploading to an online disk is becoming unworkable at this point because of the size of the database and its structure. My request is that the structure of the database be changed so that a change to a single file in the database will only require that change to be backed up/uploaded, rather than the entire database itself. This is how Mail.app’s database works. It’s not how Entourage works, which is one of its biggest drawbacks.
Am I making sense?
Cameron Graber
Austin, Texas

I think we just need a “Synchronise” option under “Tools”, which would indeed synchronise (source -> target) .dtBase2 files, in order to make the process as efficient and painless as possible.

This is key to any effective backup and/or business continuity strategy, and would actually encourage DT users to make “off location” backups on a daily basis. Realistically nobody’s going to do this with CD-Rs on a daily basis (I used to carry DATs to my bank 10+ years ago), yet you can be hit by a disaster (corruption, theft, fire, flood, …) anytime, including tomorrow, even though you have filed 100s of important documents in DT today.

Hi, Cameron. A great many years ago I was a grad student in logic and philosophy of science at UT, while at the same time doing research in biochemistry with Roger Williams’ group. Lived in a beautiful area adjacent to Barton Springs. Many pleasant memories.

DEVONthink Pro/Office databases are not monolithic. A database file is a package file, a special kind of folder in OS X. Indeed, if you select a database and Control-click on it, you will see a contextual menu option, “Show Package Contents”. The folder view of the database displays 10 numbered database files, Backup folders and (in DEVONthink 2) a folder, “Files.noindex”, that holds all of the document files within the database.
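Since a package is an ordinary folder underneath, any script can walk it read-only. The following Python sketch lists the top-level entries of a package with their total sizes. The function name `summarize_package` and the example path are hypothetical, and nothing here is specific to DEVONthink. Only look, never modify, and do so with the database closed.

```python
import os
from pathlib import Path
from typing import Dict


def summarize_package(package: Path) -> Dict[str, int]:
    """Map each top-level entry in a package folder to its total size in bytes."""
    summary = {}
    for entry in sorted(package.iterdir()):
        if entry.is_dir():
            # Add up every file below this sub-folder.
            total = sum(
                (Path(root) / name).stat().st_size
                for root, _dirs, files in os.walk(entry)
                for name in files
            )
        else:
            total = entry.stat().st_size
        summary[entry.name] = total
    return summary


if __name__ == "__main__":
    # Hypothetical path; point this at a closed database on your own disk.
    db = Path.home() / "Documents" / "Example.dtBase2"
    if db.is_dir():
        for name, size in summarize_package(db).items():
            print(f"{size:>12}  {name}")
```

On a DEVONthink 2 database one would expect the “Files.noindex” entry to account for most of the size, since that is where the documents themselves live.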

When you add a file to a database, whether by creating it within the database or by Import-capture of an external file, the file is stored within the Files.noindex folder. Other information about that file is added among the 10 numbered database files, including the organizational location. If the database can capture text content in that file, the database holds more information about that text content than would “ordinary” indexing, as artificial intelligence routines are an integral part of the database. Thus, See Also will examine the contextual relationships among the words in a document you are viewing, compare those patterns to those among all the other documents within the database, and suggest others that may be contextually similar. Likewise, Classify will examine the content of a selected document, compare it to the contextual relationships of the documents contained in the groups within the database, and suggest one or more groups that might be appropriate for ‘filing’ that document. These AI routines tend to improve in usefulness as a database grows. Of course, the Classify routine depends on non-random filing of content into groups. But if the user seeds groups initially with content that contains definable relationships by content, Classify will “learn” those relationships.

There are advantages to concealing the complex structure of such databases and other files, such as applications, as package files. For one thing, they become double-clickable, so that one can, in the Finder, double-click an application to launch it or double-click a DT Pro database to open it. Another advantage is protection of contents from improper changes. A DT Pro database will be damaged if any of the contents of the package is externally modified, deleted or renamed, unless via procedures allowed by the database software.

The reason iDisk copies the entire package file over again each time the database has been modified is that it cannot look inside the database file to isolate just those components that have been changed. That’s one of several reasons why I wouldn’t attempt to use iDisk for backup of my databases. Perhaps in the future iDisk will become smarter and faster; some users report better performance with Dropbox, although I haven’t tried it. I’ll also confess that although some online storage resources have improved a great deal, I’ve been burned in the past and am not quite ready to trust my important files to the “cloud”. Note: ALWAYS close a database before storing it online using iDisk or Dropbox. Especially if one attempts to access the database using another computer, the database will attempt to lock itself, and opening it will likely damage the database.

Some backup software can look inside package files to detect and back up only those elements that have changed since the last backup; other software cannot. Eric uses rsync for his databases. The DEVONthink 2 database structure works better with Time Machine backups than did the database structure in DEVONthink 1.
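To make “only those elements that have changed” concrete, here is a small Python sketch of the idea. It is a simplification and not how rsync or Time Machine actually work: it treats a file as changed when it is missing from the backup, differs in size, or is newer than the backup copy, and it does not remove files that were deleted from the source. The function names are my own. Run anything like this only on a closed database.

```python
import os
import shutil
from pathlib import Path
from typing import List


def changed_files(source: Path, target: Path) -> List[Path]:
    """Return paths (relative to source) that are new or differ in target."""
    changed = []
    for root, _dirs, files in os.walk(source):
        for name in files:
            src = Path(root) / name
            rel = src.relative_to(source)
            dst = target / rel
            if not dst.exists():
                changed.append(rel)
                continue
            s, d = src.stat(), dst.stat()
            # Whole seconds only, so coarse filesystem timestamps do not
            # make an unchanged file look newer than its copy.
            if s.st_size != d.st_size or int(s.st_mtime) > int(d.st_mtime):
                changed.append(rel)
    return sorted(changed)


def sync_package(source: Path, target: Path) -> List[Path]:
    """Copy only new or changed files from source into target; return them."""
    todo = changed_files(source, target)
    for rel in todo:
        dst = target / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source / rel, dst)  # copy2 keeps the modification time
    return todo
```

With a package-aware tool of this kind, editing one document in a 7.5 GB database means copying that one file plus whichever database files changed, rather than the whole package.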

iDisk is the worst possible candidate for backing up large files or packages such as DEVON databases. It uses the WebDAV protocol for transferring the files. The WebDAV implementation in Mac OS X is just not reliable and is implemented in the most inefficient fashion possible. I do not recommend it at all.

Bill,
Thanks again for your post and working with me on this. Barton Springs is just as beautiful as ever, though Austin continues to grow like crazy. You probably wouldn’t recognize it.

I appreciate your explanation of the package issue and iDisk. Let me offer a little more information about how I am trying to implement this, although I do not expect that it will make any difference in your response. I have iDisk syncing turned on so that there is a local copy of my iDisk on my HDD. That way I always have access to my iDisk, even if I am not online. So any time I make any change to the local iDisk, OS X syncs that to the cloud iDisk. As previously posted, as the database has grown, this has increasingly become less workable to the point that it really doesn’t work at all any more because the computer is constantly trying to upload the database - there is never a break.

I suppose it would be nice if Apple improved iDisk’s performance or made it smarter, but is there any possibility that a future version of DEVONthink will have a different structure that is more iDisk friendly? I recognize that I don’t really know what I am asking for here, even if I do consider myself pretty tech savvy. So forgive me if that is an outrageous request.

I have about come to the conclusion that I can’t store the database on my iDisk and will just have to keep it on my HDD with a backup every 24 hours rather than a “continuous” backup like I am currently trying to implement.

Cameron Graber

I had mentioned two of several reasons why I wouldn’t use iDisk for backup of database files.

Annard, fortunately, mentioned the one which is even more critical than the ones I had mentioned (iDisk is too dumb, and too slow). The ultimately critical shortcoming of iDisk is Apple’s flaky implementation of WebDAV: the risk of data corruption is too high to be acceptable.

Although I use Time Machine, I work on a laptop and am often not connected to my Time Machine backup drive. I never depend on a scheduled backup if I’ve done important modifications to a database. That’s why I use, and recommend, Backup Archive. When I break for lunch or dinner after making significant changes to a database, I start Backup Archive. When I come back to the computer, I know that I’ve got current internal and external backups, and the database has been verified and optimized.

Thank you again for your reply. I am trying to move the database off of my iDisk now and will just keep it on my HDD only (cloned daily). However, I am having trouble moving it. Every time I try to move it off of the iDisk, I get the following error:

“You cannot copy cg pc.dtBase2 to the destination because its name is the same as the name of an item on the destination, except for the case of some characters.”

I have tried numerous locations on my HDD, and I have also tried to move it to an attached HDD, but I receive the same message each time. It does not matter where I try to move the file, I get the message. I am quite certain that I do not have the same file name littered in all of these various locations. Can you offer any suggestions for what the problem might be? Note that I cannot move the 1.5.4 database off of my iDisk either - I get the same message.

Thank you,
Cameron Graber
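One possible cause of the Finder message quoted above, offered as a guess and not a diagnosis: a folder on a case-insensitive volume cannot hold two items whose names differ only in capitalization, so a copy fails if the source contains such a pair anywhere inside it. The Python sketch below scans a folder tree for names that would clash in this way. The function names are hypothetical, and the assumption that the database package contains such a pair is unverified.

```python
import os
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def colliding_names(names: Iterable[str]) -> List[List[str]]:
    """Group names that are identical when capitalization is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def find_case_collisions(root: Path) -> Dict[str, List[List[str]]]:
    """Report, per folder under root, the names that differ only by case."""
    report = {}
    for folder, dirs, files in os.walk(root):
        clashes = colliding_names(dirs + files)
        if clashes:
            report[folder] = clashes
    return report
```

An empty result from `find_case_collisions` would rule this explanation out for the folder that was scanned.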

Cameron, I’m clueless as to the cause and nature of the problem. Probably Annard or others could comment.

However, if you cannot move the database (caution: never try to move a database while it is open, or when it hasn’t completed synchronization from changes made on another computer), there’s a simple “brute force” workaround.

In the top level (Split view), select ALL the content of the database, then choose File > Export > Files & Folders. Create a new folder (e.g., in Desktop or Documents) to hold the exported content. When the export is complete, examine the Log for any files that failed the export, and save the Log as a text file if there were failures.

Now, in DEVONthink Pro/Office 2, create and save a new, empty database. Press File > Import > Files & Folders and select ALL the contents of the folder holding the previously exported material. When the import is complete, examine the Log, and save the Log as a text file if it identified files that were reported as failing the import.

Slightly off-topic, but thanks to all for this discussion. It answered a question I’d been pondering, but hadn’t gotten around to posting. Gotta love the DT forums.

No package-aware backup application will upload the whole database again, because the database is basically a folder ‘disguised’ as a file. Time Machine, correctly, does not add the whole database to the backup over and over again, only the changes.