Database recovery using Time Machine

I’m hoping for some insight into a Devonthink Pro Office database rollback, and data recovery.

I’m the techie for a friend who owns Devonthink Pro Office Ver 2 and runs it on an Intel iMac with the latest Apple software (10.6.6). She typically scans documents in with OCR on a daily basis.

Several months ago she got a second computer, so I put her dtBase into Dropbox.

Based on your FAQ that I just read, that seems to have been a mistake.

Yesterday she said her computer acted weird, and froze on her. After restarting, she could not find some of her recent scans. When I looked at her computer today, I could not find any new scan documents since February 18, 2010, a month ago.

So I restored via Time Machine, on several different dates, yet never recovered anything more recent. When I search your site, I don’t see any reference to Time Machine.

Does anybody have any ideas on whether you can restore a dtbase from Time Machine?

Thanks

mark

I have no experience with DTPro databases on Dropbox, but in general it should be possible to recover a database with Time Machine. I did that some weeks ago, after I had accidentally deleted things in my DB and ended up with a corrupted DB, and I could successfully restore a backup from the previous day.

one warning:
be careful about opening a backup of the database and the database itself at the same time; at least in older versions of DTPro this might lead to problems: see viewtopic.php?f=3&t=12699

AFAIK one problem might be:
If the TM backup is written while the database is open, it might be corrupted. It would be better to close the DB before the backup is written.
For that reason, I have configured DTProOffice (thanks to C. Grunenberg’s advice) so that it automatically saves a backup of the DB each day.
This only saves the metadata!
It is saved inside the DB package, so when TM then copies the package, that internal backup is a closed and “consolidated” state of the database and could be used for recovery if the live DB metadata that TM captured turns out to be unusable.

other points to consider:

  • where did she save her scanned files?
    Did she import them, so that they are stored in the .dtbase package, or did she index them, so that they exist only in their location in the Finder and are not copied to the DB?

  • did you also search for backups of her global inbox, or where does she put her files in DevonThinkPro?

Best wishes (I hope you’ll get back the data from anywhere),

Martin

Thanks for your reply. I’m just getting back to having access to my friend’s machine.

You make a very good point about taking backups while DT is closed. She tends to live in DT, and I’ve automated the backups, and of course Time Machine is automatic, so I’m going to have to ponder that.

And she leaves her computer running 24 hours a day, so there is no end of day to run a backup job.

And when you say metadata, I have only a vague idea of what you are referring to.

In her case, we are scanning in documents, running OCR, and saving them in DT. So the files are only within the DT structure, not anyplace else on the machine.

We are not using the Global Inbox, as it was too confusing, and she really only has one database.

The problem with storing the dtBase2 in Dropbox is that when 2 different computers are both running DT, and DT decides to reorganize the database on each machine at the same time, bad things happen.

And it looks like when I restored from Time Machine, the restore worked, but the internal indexing was already screwed up, and I could not see the newer documents.

When I ran a Verify and Repair, DT said - Reparation Failed 346 errors.

When I tried to File : Export : Database Archive, DT said verification of database failed.

When I rebuilt the database, I lost about half of the documents, although I can now pass the verification step.

I opened a backup of the old database, found one of the missing documents, and dragged it to the desktop. I closed that database, opened the newly rebuilt database, and dragged it back in.

I was surprised to see that it kept the PDF + Text format, and I could search and find words inside the PDF. So I’m guessing that the OCR words are stored inside the PDF itself?

So I think all of her data is on her hard disk, I just need to find it and reimport it into the new stable database.

I’m going to start trying to understand how to get underneath the hood, and what’s the easiest way of getting stuff in, and then deduping things.
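One way the find-and-dedupe step could be approached, as a rough Python sketch: hash every PDF under a copy of the old database package and under the rebuilt one, then list the files whose contents only exist in the old copy. The function names and folder arguments are made up for illustration, and the sketch assumes a rebuild leaves the PDF bytes themselves unchanged, which I have not verified.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file's contents in chunks so large scans don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_pdfs(root: Path) -> dict[str, Path]:
    """Map content hash -> first path found, for every PDF under root."""
    index: dict[str, Path] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() == ".pdf":
            # setdefault keeps only one path per hash, which dedupes
            # identical scans stored under different names.
            index.setdefault(sha256_of(path), path)
    return index


def missing_from_new(old_root: Path, new_root: Path) -> list[Path]:
    """Return PDFs in old_root whose contents appear nowhere in new_root."""
    old_index = index_pdfs(old_root)
    new_hashes = set(index_pdfs(new_root))
    return sorted(p for h, p in old_index.items() if h not in new_hashes)
```

Pointing `missing_from_new` at copies of the two packages (never the live databases) would give a list of candidate files to re-import. Comparing by content rather than by file name means a renamed document is not reported as missing.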

Thanks again for the insights.

mark

Hi red-diode,

you should ask the experts (DevonThink team), if the data are important for your friend.

Well, if she “imports” all the scanned documents, they are stored inside the database package and you should be able to find them with a Spotlight search or a similar tool.

You can create a copy of the database package, open it in the Finder and then copy the pdf folder (if you’re only interested in PDFs) to another location where you can examine it and try to extract lost documents (if they are still inside).
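If doing this by hand gets tedious, the same idea could be scripted. Here is a minimal Python sketch that walks a copy of the package and copies every PDF it finds into one flat folder. It does not assume any particular folder layout inside the package; `extract_pdfs` and both folder arguments are hypothetical names, and it should only ever be run against a copy, never the live database.

```python
import shutil
from pathlib import Path


def extract_pdfs(package_copy: Path, destination: Path) -> list[Path]:
    """Copy every PDF found under package_copy into one flat folder."""
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() collects the full file list before anything is copied.
    for source in sorted(package_copy.rglob("*")):
        if not (source.is_file() and source.suffix.lower() == ".pdf"):
            continue
        target = destination / source.name
        counter = 1
        # Never overwrite: add a numeric suffix when two scans share a name.
        while target.exists():
            target = destination / f"{source.stem}-{counter}{source.suffix}"
            counter += 1
        shutil.copy2(source, target)  # copy2 also tries to keep timestamps
        copied.append(target)
    return copied
```

The destination should sit outside the package copy. I would not rely on this copy carrying over extended attributes, so check a few files before trusting any tags on the extracted PDFs.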

I know the problem of a Mac and DTPro running 24/7, and that’s bad for backups.
The idea about Time Machine and the open database came from Christian Grunenberg, and it sounds more than reasonable that it might not be good to back up an open “document” of such complexity in an undefined state…

However, as I said, the “backups” that can be created automatically in Devon Think Pro contain only the metadata (don’t ask me what that means exactly; I think the groups, the tags etc.) but NOT the real documents (PDFs, …).
You can see them in the database package, and their size alone tells you that they cannot include all the content.

So to also save the “real content”, you have to save or duplicate the whole database package.

And concerning the OCR:
sure, the OCR creates a text layer IN the PDF documents (the size of the PDFs might also increase dramatically), so it’s all in the PDF.

What was also good news to me: the tags are also stored in OpenMeta format, so they are attached to the documents even if you take them out of the database and can be read/changed with other tools which can handle OpenMeta.

Kind regards

Martin

Even though the computer is running 24/7, presumably she sleeps occasionally. So I would suggest that, as her last act at the end of the day, she should shut down DT and run a manual Time Machine backup.

The last backup of the day is the one that Time Machine keeps, so doing this will ensure that she can never lose more than a day of work.

If she’s really forgetful, you could probably set up an Automator action to accomplish the same thing.
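As a rough illustration of that quit-then-backup sequence, here is a small Python sketch rather than an Automator workflow. It leans on two assumptions that need checking on her machine: that the application is named "DEVONthink Pro Office" as far as AppleScript is concerned, and that the `tmutil` command is available. `tmutil` ships with OS X 10.7 and later, so this would not run as-is on a 10.6.6 system. The function names are made up for the example.

```python
import subprocess


def quit_command(app_name: str) -> list[str]:
    """osascript call that asks an application to quit."""
    return ["osascript", "-e", f'tell application "{app_name}" to quit']


def backup_command() -> list[str]:
    """Start a Time Machine backup and wait until it finishes."""
    return ["tmutil", "startbackup", "--block"]


def nightly_backup(app_name: str = "DEVONthink Pro Office",
                   runner=subprocess.run) -> None:
    # app_name is an assumption; adjust it to the real application name.
    # check=True stops the sequence if the quit step fails, so the
    # backup is not started after a failed quit request.
    runner(quit_command(app_name), check=True)
    runner(backup_command(), check=True)
```

Calling `nightly_backup()` from a scheduled job late at night would be one way to use it. If DT is showing a dialog when the quit request arrives, the quit may not complete, so it is worth testing by hand first.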

Katherine