When rebuild eats all my files

Hi,
Ever since a hard drive crash and restore from backup, I’ve had all sorts of DT problems, and basically can’t get it to function right.

The latest is this - one of my main databases was “corrupted”. So I hit “rebuild”.

The rebuild log tells the whole story so that I don’t have to:

Rebuild Database…
Exporting 45636 items
Importing 3 items
Done.

Ok, so meek little me has one question in response to this log and the resulting empty database window:
“Where did 45,633 of my emails that were stored in this database disappear to during the rebuild?”

Did DT decide that only 3 were worthy of its awesomeness?

Thanks in advance for any insight.

(Interpretation: I will try restoring from an older backup, and hopefully that will solve my immediate needs. My goal with this message is to prod the DT developers to examine their “rebuild” process more carefully, because it is not functioning correctly if it throws away 99.99% of my files in the process.)
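For readers who want to check the figure, here is a minimal sketch that parses the two counts out of the log quoted above and works out the loss. The function name `rebuild_loss` is made up for illustration, and the log format is assumed to match the lines shown in this post.

```python
import re

def rebuild_loss(log_text):
    """Parse a rebuild log and return (exported, imported, lost, lost_percent)."""
    # Assumes the log contains lines shaped like the ones quoted in this thread.
    exported = int(re.search(r"Exporting (\d+) items", log_text).group(1))
    imported = int(re.search(r"Importing (\d+) items", log_text).group(1))
    lost = exported - imported
    return exported, imported, lost, 100.0 * lost / exported

log = """Rebuild Database…
Exporting 45636 items
Importing 3 items
Done."""

exported, imported, lost, pct = rebuild_loss(log)
print(f"{lost} of {exported} items lost ({pct:.2f}%)")
# prints: 45633 of 45636 items lost (99.99%)
```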

Morgan Giddings, PhD
morganonscience.com

Precisely because Bad Things such as a hard drive crash (or a burglary of your computer equipment or a house fire) can and do happen, I recommend a backup strategy that foresees such possibilities.

Damage to files on a hard drive can also result from seemingly lesser Bad Things, such as a System crash while data is being written to disk, a power outage or lightning strike that shuts down a desktop computer while data is being written (unless there’s robust protection by an Uninterruptible Power Supply (UPS)), or running out of free hard drive space (in which case the operating system itself could overwrite data files). Last, but unfortunately not least, there are ‘hack’ utilities that can be downloaded and that may cause all sorts of problems on your computer; the only time I’ve had a database ‘blow up’ in the last 5 years resulted from installing a utility that had caused problems for a user. I installed it as a test, and it destroyed an open database. As I had a recent backup, no damage was done.

DEVONthink has routines that can deal with minor database problems (Verify & Repair or Restore Backup) and somewhat more serious database damage (Rebuild Database). But if a database has been badly damaged, the only recourse may be to recover from a complete (and recent) backup of the database.

In some cases of failure of the Rebuild Database procedure, more files might be recoverable if instead, File > Export > Files & Folders is run with all content of the database selected, and the export is saved to a new folder in the Finder. Then create a new, empty database and run File > Import > Files & Folders and select ALL the content of the folder holding the exported content. Even if that procedure doesn’t recover all the desired content, it may help recover some items that are more recent than the last complete external backup of the database.

So complete external backups of a database should be made periodically. DEVONthink Pro/Office has several procedures that can do that. File > Export > Database Archive will produce the smallest possible compressed and dated complete backup of a database, and the archive should be stored on a medium other than your host computer’s hard drive, such as on an external hard drive. (I store such archives on a portable hard drive that can be stored at another location, such as at my bank). Scripts > Export also provides routines to save a database archive to iDisk or to JungleDisk.
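As a rough illustration of what a dated, compressed archive amounts to, the sketch below zips a database package with Python's standard library. This is not how DEVONthink's own Export > Database Archive command works internally; the function name `archive_database` and the `Example.dtBase2` package are assumptions made up for the demo, and a real database should be closed before it is copied this way.

```python
import datetime
import os
import shutil
import tempfile
import zipfile

def archive_database(database_package, destination_dir, today=None):
    """Zip a database package into destination_dir with the date in the name."""
    today = today or datetime.date.today()
    package = os.path.normpath(os.path.abspath(database_package))
    parent, name = os.path.split(package)
    base_name = os.path.join(destination_dir, f"{name} {today.isoformat()}")
    # make_archive appends ".zip" and returns the path of the archive it wrote.
    return shutil.make_archive(base_name, "zip", root_dir=parent, base_dir=name)

# Demo on a throwaway folder (hypothetical layout), so no real database is touched.
with tempfile.TemporaryDirectory() as work:
    package = os.path.join(work, "Example.dtBase2")
    os.makedirs(os.path.join(package, "Files.noindex", "txt", "0"))
    with open(os.path.join(package, "Files.noindex", "txt", "0", "note.txt"), "w") as fh:
        fh.write("hello")
    backups = os.path.join(work, "backups")
    os.makedirs(backups)
    archive_path = archive_database(package, backups, today=datetime.date(2020, 1, 2))
    archive_name = os.path.basename(archive_path)
    with zipfile.ZipFile(archive_path) as zf:
        archived_members = zf.namelist()

print(archive_name)
```

In practice `destination_dir` would point at an external drive rather than a folder next to the database, for the reasons given above.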

I also use Time Machine. Having at one time worn the hat of quality assurance manager for an agency, I like a ‘belt and suspenders’ approach, remembering one of the precepts of quality assurance concerning data integrity: “If anything goes wrong, it’s your own fault. It’s not the fault of the software, the computer or the power company, because you should have considered those possibilities.” That’s a harsh precept, but it is prudent to act on it.

Suggestion: Try the tip above about exporting the content of the database to the Finder, then importing it into a new database. That might help recover some files added more recently than the time of your most recent complete backup of that database. And make sure you are implementing a good backup strategy, so that you are not hurt by any future Bad Thing. My databases are more valuable to me than my auto, and perhaps even my house. Think of backups as a relatively cheap “insurance policy” to protect your valuable data.

Hi Billy,
Thanks for the nice long message helping me with backup strategy…

But did you actually read what I wrote?

“I will try restoring from an older backup, and hopefully that will solve my immediate needs.”

I DO have backups and a backup strategy in place (my drive crashed while I was traveling, and I had backed up only two hours before I left on that travel).

While I understand your goal to be helpful, writing back with a long-winded explanation of backup strategy is not what I was asking for (nor was it needed).

I’m a software developer - and have been for about 30 years. I understand that a developer can’t foresee every possible circumstance. I manage a team of about 9 developers at the moment (all working on the OS X platform).

That is why it is so vital for developers to listen carefully to every single bug report.

A “rebuild” that deleted 99.99% of my files from the database is a bug. No wiggly words can avoid that conclusion.

That’s why I was surprised that you said nothing along the lines of, “Thanks for the report, Morgan, we’ll definitely investigate this!”

If I were writing DevonThink, I’d want to make sure that my “Rebuild” procedure doesn’t get rid of hundreds or thousands of files.

That’s why I posted here.

Unfortunately, I didn’t expect a great response. Nearly every time I’ve posted in the past, there’s been very little help, mostly just lecturing like this.

I posted repeatedly over a period of years that DA wasn’t working for eBay searches. You guys never did anything about it, even though other users were posting too.

I posted about several very odd behaviors of DT in version 1.x - and you blamed it on my system configuration.

Now I post about an obvious bug in the rebuild procedure, and you blame it on my backup strategy.

I find it sad - the Devon Tech series of products are really, really promising, and I’ve really, really wanted to like them and use them. But the products never quite live up to the promise, because they always have these kinds of bugs (the betas of DT are getting better, but still have issues). This has been true since I first used your products three years ago, and unfortunately it remains true today.

I surmise that the reason they still have bugs may be that, instead of taking feedback like my previous message seriously and fixing the problem(s) (with a big “THANK YOU” to those of us who take the time to report them), you just assume that people like me are making stupid mistakes, and need a lecture on backing up.

Morgan

Please accept my apology for having been preachy about the importance of a good backup strategy — in your case that was unnecessary. But as this is a public forum and newbies are likely to read new threads, I frequently seize on the opportunity to write about the importance of backups.

As I mentioned, the repair, restore and rebuild procedures in DEVONthink can handle various database problems. But each of them depends on the existence and validity of information contained within the database about its contents and also on the integrity of the actual document files stored within the files.noindex folder. The latter are stored in Apple’s own file management system.

In your case the Rebuild Database log reported that it had exported tens of thousands of files contained within files.noindex, but that it was able to recover only three of them in reimport to the database. Does that indicate that the Rebuild Database procedure was buggy? Or is it possible that those files had been corrupted in some way, or that the file system had simply lost information about them as a result of the crash (the disk directory can sometimes go wonky before the final crash episode)? DEVONthink maintains information about the names and locations of files within files.noindex via paths. But if the filesystem can no longer properly copy them during the Rebuild, they cannot be successfully reimported.
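One way to test the "files can no longer be read" possibility independently of DEVONthink is to walk the folder and try to read every file from start to finish. The sketch below does only that; the function name `find_unreadable` is hypothetical, and the demo runs on a temporary folder rather than a real files.noindex. A file that reads cleanly can still have damaged content, so this catches filesystem-level failures only.

```python
import os
import tempfile

def find_unreadable(root):
    """Try to read every file under root; return (total_files, failures)."""
    total = 0
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            total += 1
            try:
                with open(path, "rb") as fh:
                    # Read in 1 MiB chunks so large files do not fill memory.
                    while fh.read(1 << 20):
                        pass
            except OSError as exc:
                failures.append((path, str(exc)))
    return total, failures

# Demo on a throwaway folder with two readable files.
with tempfile.TemporaryDirectory() as root:
    for name in ("a.eml", "b.eml"):
        with open(os.path.join(root, name), "w") as fh:
            fh.write("sample message")
    total, failures = find_unreadable(root)

print(f"{total} files checked, {len(failures)} unreadable")
```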

Working in support, I work users through procedures to recover from database problems, usually with success. After new application releases and after OS X updates I test those procedures on copies of my own databases, and they work with few exceptions. The exceptions have usually involved older WebArchive files resulting from bugs in Apple’s WebKit, or “strange” variants of PDFs. In those cases it’s usually possible to recover those files in an acceptable form (Christian has made recovery automatic for most of the problems with WebArchives). But if a file has been damaged or there were disk directory problems, recovery isn’t possible.

It’s hard to tell, because we don’t know what happened during the crash of the hard drive, how you restored the backup, or which version you’re using. Rebuilding usually skips only missing files, so the restoration of the backup was probably incomplete, or the backup itself is damaged too.

However, this should definitely work:

  • Restore a backup
  • Use Tools > Verify & Repair first
  • Then use Tools > Rebuild

Assuming that the backup is complete and undamaged, everything should be fine afterwards.

Bill: This topic seems similar to what has happened to me twice in the last year, with the total loss of some files. I have been using DevonThink Pro Office for about a year now and love it; however, I have lost files that I know were there. I’ll create a folder for a new vendor and later, be it days or months, I’ll open the folder and the files are gone! There seems to be a connection between a change in the path and the files’ disappearance.

What I’d like to know is the physical location of the raw file. You stated that the “actual document files are stored in ‘files.noindex’”. I’m unable to locate this file. When I open a file in Acrobat from the DTPO database it displays folder “c” or some other lowercase letter. Where are those folders?

I had the DTPO database on a volume named “Data 1” and briefly moved the database to my main HD “Mountain Lion Server” and used it there for several days and entered several documents. I later decided to return the database to “Data 1” and found the docs entered to the database while on Mountain Lion Server were no longer there. This brings me to the question: the path must be relative to the database, correct? It seems that an absolute path might be better or maybe the whole DB should have a container so that if the customer moves the DB to another volume or disk everything would move with it. I think I’ve also lost files when I have renamed the volume but not moved things. Is that possible?

Don’t worry, I keep a separate file of everything I scan into DTPO. This file is automatically date and time stamped at the time of scanning. If I only knew what I was missing and the creation date, I could restore it. I do backups with Time Machine, CCC and offsite, but it seems that the problem is with an internal index in DTPO. None of the backups have the missing files. Only the original PDF is still there, since this is outside the range of DevonThink. This is a great program, but I’m beginning to call it DevilThink.

Any ideas?

Thanks

John Schubert

Run Tools > Verify & Repair. Check to see if some documents are now in an “Orphan” group, ready to be refiled. Check the Log (Window > Log) for a list of missing files.

Unless they were Index-captured, DEVONthink stores the files of documents in a folder named Files.noindex, within the database.
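To see what is actually sitting in that folder, a small script can locate it inside the database package and count the files per top-level subfolder. This is only a sketch: the function name `count_stored_files`, the `Example.dtBase2` package name and the nested subfolder layout in the demo are assumptions for illustration, and the folder name is matched case-insensitively because it appears in this thread as both files.noindex and Files.noindex.

```python
import os
import tempfile
from collections import Counter

def count_stored_files(database_package):
    """Count files under the package's Files.noindex, per top-level subfolder."""
    for entry in os.listdir(database_package):
        if entry.lower() == "files.noindex":
            store = os.path.join(database_package, entry)
            break
    else:
        raise FileNotFoundError("no Files.noindex folder in " + database_package)
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(store):
        # First path component below the store; "." for files directly in it.
        top = os.path.relpath(dirpath, store).split(os.sep)[0]
        for name in filenames:
            if not name.startswith("."):  # skip hidden files such as .DS_Store
                counts[top] += 1
    return counts

# Demo on a throwaway package with a made-up layout.
with tempfile.TemporaryDirectory() as work:
    package = os.path.join(work, "Example.dtBase2")
    for sub, name in (("pdf/c", "invoice.pdf"), ("pdf/1", "scan.pdf"), ("txt/0", "note.txt")):
        folder = os.path.join(package, "Files.noindex", *sub.split("/"))
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, name), "w") as fh:
            fh.write("x")
    counts = count_stored_files(package)

print(dict(counts))
```

Comparing these counts with the number of documents the database window shows, or with the list of missing files in the Log, gives a quick idea of whether files are missing on disk or only missing from the database's index.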