Merging two Devonthink Databases

Kevin · June 21, 2014, 7:52pm

I lost 10 gigs of stuff in a crash of some sort (I believe with DTOP, though their folks strenuously deny it) in December. I did have a Time Machine backup from December and was able to restore most of it. But to do so, I moved the database files as they stood on April 24th (when I restored the TM backup) to the desktop. That means, of course, that I’ve lost whatever was in the April files that wasn’t in the December files.

Is there any way to import what was in the April files? Do I import or what? And which files? Are they hidden?

Many thanks for any advice.

Kevin · June 22, 2014, 1:36pm

Hi

Sorry to be vague, but since the architecture of the files is unclear (and, I find, unexplained in the guides), I am not sure what is meaningful.

In April I took all the Devonthink files in (I think it was) Library/Application Support and moved them to the desktop. Then I took the December files (13 gigs as opposed to 3 gigs) and moved them to Library/Application Support. But in doing so, all work between December and April was of course NOT available when I opened Devonthink.

So now, on my desktop, I have a whole series of files. I have had various versions of Devonthink (Personal, Professional, and then Pro Office) over the years, and each created a separate folder. So in the folder I see what you see in the attachment: various things, some going back to 2007!

Are you saying I should open the DTOP db from April 14, then just copy everything into what I have now when I open DTOP? Is there a script to remove duplicates? (I know I can find them, but it is tiresome to select and remove thousands of files.) And will that pull in all the other stuff from older versions?

Thank you

Kevin · June 23, 2014, 10:52am

Sigh. This admonitory tone is just why I was in the forum. The support people replied first with vague and cryptic questions, and the second reply was simply a denial that what happened could have happened: Crashes are impossible! I personally have fifty zillion files in my DevonThink database and it’s never crashed! You can’t be having this problem.

And now your unhelpful reply. DTOP is a good application with a so-so interface, and a penumbra of priestly self-satisfaction around it.

Thanks for the help.

korm · June 23, 2014, 12:56pm

Sorry for trying to help you. Sorry I didn’t succeed. Your loss is surely aggravating, no doubt.

Slapping me in the face publicly is not necessary, though.

gg378 · June 23, 2014, 2:15pm

Kevin,
If you browse through the forum for just 5 minutes, you will notice that korm is the most helpful, thoughtful person you have ever seen on a forum! And the DT people are very helpful and competent, too. Let’s leave it at that.

Concerning the merging: I haven’t followed this thread carefully, and I can’t read the answers that korm seems to have edited out, but I wonder about the timing of your restore operation. In the past I had system corruptions, too (not DT, but OS X-wide). I restored the most recent backup immediately, i.e. within a day. It was still a little tricky to get back that one day’s worth of files, but that was manageable.

Concerning your case: From your OP, I simply don’t understand what’s going on. You had a crash in December. So presumably you knew that in December? You didn’t act on it in December? So after the crash, you somehow moved on and added stuff until April, and then decided to do something about it? That sounds odd, so I assume I’m missing something.

In any case, if you now have the December DB running again, can’t you just open the April DB and define a smartfolder that lists all files that were added or modified in the interim? Then you would copy those over into the December DB. Of course, if you had done extensive tagging/grouping on these files, you have to make sure that those assignments stay intact. That could be quite a bit of work, but honestly, a 4-month lag between backup and restore is what is causing this.
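The smartfolder idea above works inside DEVONthink itself. For anyone working from a plain folder of files instead (for example an export), a rough filesystem analogue is to list everything modified after a cutoff date. This is a minimal Python sketch, assuming the files' modification times survived the copy; the function name and the cutoff date are hypothetical, and DEVONthink's own "added" dates are not visible this way.

```python
import os
from datetime import datetime
from pathlib import Path


def files_changed_since(root: str, cutoff: datetime) -> list[Path]:
    """Return files under `root` whose modification time is after `cutoff`.

    Only filesystem mtimes are used; DEVONthink's internal "added" dates
    are not visible here, so this is an approximation.
    """
    cutoff_ts = cutoff.timestamp()
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("."):
                continue  # skip hidden files such as .DS_Store
            path = Path(dirpath) / name
            if path.stat().st_mtime > cutoff_ts:
                changed.append(path)
    return sorted(changed)
```

For example, `files_changed_since("/path/to/april-files", datetime(2013, 12, 1))` (placeholder path, hypothetical cutoff) would list candidates added or edited since December 1. It is only a heuristic: any copy step that resets timestamps will make every file look new.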

Edit: I just noticed more details in one of your follow-ups: you EXPORTED items from the April DB, and then simply overwrote the April DB with the December DB restore, without keeping a copy of the actual April DB? You might still have it on Time Machine, though. I would close the December DB, restore the April DB from TM to a DIFFERENT location with a NEW name, and then open it. Also open the December DB and start copying files over. Of course, this assumes a manageable number of files. Over a December to April period I would probably have accumulated or modified a few hundred files. I would not want to use scripts for any of that, let alone duplicate removal, as duplicate identification is a little dicey in DT. This simply sounds like a day of work, or a bunch of extra late-night shifts doing it one by one. Of course, if you have dumped thousands of items during that period, I’m not sure what to propose.

The point is: if you use the April DB instead of the April export, you can very easily identify all files that were added or modified in that period, so your copy load will be much smaller and much safer. The fewer files you have to deal with, the fewer errors will happen. PLUS: all DT metadata (e.g. the URLs of web clippings) will survive.

Frederiko · June 23, 2014, 2:25pm

Here is what I would do:

a) Export all your files from your December backup into a new directory, “Directory A”.
b) In your desktop directory with your April files, I would do consecutive searches for all the file types you know are in your database, e.g. PDF, DOCX, TXT, etc.
You could even do a smart search to cover all the file types at once. That way you will have a listing of just the relevant database files and not the supporting files used by DT.
c) Copy all the identified documents to a new directory, “Directory B”.
d) Use a duplicate-finding tool like Gemini, http://macpaw.com/gemini, to identify and remove all the duplicates. Gemini looks for duplicates by checking whether the contents match, not merely the filenames or dates. Delete the duplicate files from Directory B.
e) Now the files in Directory B are the ones that are new since your December backup, and you can import them back into your December database. I am afraid you will have to recreate their positions in the directory structure manually.
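Steps b) to d) can also be approximated with a short script that compares file contents by hash. That is the same idea as Gemini's content comparison, though it is not Gemini itself. This is a minimal Python sketch, assuming Directory A holds the December export and the April folder holds the document files; the extension list and function names are hypothetical. Unlike step d) it deletes nothing: it only lists the April documents whose exact bytes do not occur anywhere in Directory A.

```python
import hashlib
from pathlib import Path

# Hypothetical list of document types; adjust to what is actually in the database.
DOC_EXTENSIONS = {".pdf", ".docx", ".txt", ".rtf", ".html"}


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file's bytes, read in chunks so large PDFs are fine."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def document_files(root: Path, extensions=DOC_EXTENSIONS):
    """Step b: yield files under root whose extension is in `extensions`."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            yield path


def new_since_backup(dir_a: Path, april_root: Path) -> list[Path]:
    """Steps b-d: April documents whose contents are not already in Directory A.

    Byte-identical copies within the April folder are reported only once.
    """
    known = {file_digest(p) for p in dir_a.rglob("*") if p.is_file()}
    seen = set()
    new_files = []
    for path in document_files(april_root):
        digest = file_digest(path)
        if digest not in known and digest not in seen:
            seen.add(digest)
            new_files.append(path)
    return new_files
```

A document annotated after December gets a different hash and correctly shows up as new; a file that was merely re-saved with different bytes will show up too, so the list still needs a human look before importing.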

(I apologise if any of this duplicates Korm’s advice, which had already been deleted when I saw this thread.)

I agree, there is no need to insult Korm. He is a user just like you and me, and a phenomenally helpful one who must have spent hundreds of hours giving us all the amazing advice he has contributed to these forums.

Good luck. Recreating a database is hours of work. The only compensation I can offer is that this will never happen to you again, because you will never be without backups again. Been there.

Frederiko.

PS Never rely on just one Time Machine backup drive. They are known to fail silently on occasion. I keep two alternating drives for Time Machine backups in case one goes bad, and swap in a new one every year. Carefully created data is priceless compared to the comparatively trivial cost of backup drives.

gg378 · June 23, 2014, 2:38pm

I’m curious: why is everyone so big on exporting? To me exporting has a fatal flaw: replicants are lost. The export process (in the form of exporting the whole DB at once) creates a folder structure in OS X that mimics the DT group structure, but with one (for me) absolute deal breaker: each replicant gets exported as a standalone duplicate file, not as an OS X alias. You might say “well, at least no info or files get lost, disks are big enough these days, so who cares”. Well, the problem is annotation/modification. If I later grab one of those files, say a PDF, and make further annotations or modifications, only one of those initially identical files gets changed, not the others. That destroys the whole point of the DT DB for me.

I was planning to start a new forum thread on this issue some day.

Frederiko · June 23, 2014, 2:46pm

The only purpose of exporting was to create a working copy of the files for comparison and deduplication by Gemini. After the deduplicated files had been imported into DT, Directory A could be deleted.

Frederiko

gg378 · June 23, 2014, 3:00pm

Sorry, Frederiko,
My remark was not aimed so much at you. Gemini sounds like an interesting tool! But as long as Kevin still has the April DB somewhere in backup, I still think that working straight from that DB is best, as it would be guaranteed that the added/created/modified dates and other metadata are still absolutely correct, which is crucial for his purpose.

P.S.: Completely agree on the Time Machine advice. TM is a very handy system for fetching old stuff, but it ain’t no backup, especially when using a Time Capsule, which is on all the time. I have one Time Machine drive and at least three bootable system backups, in at least two physically separate locations. They are now all 2.5" USB-powered drives. They are attached weekly to make the backup, and then immediately taken off the “grid” and stored in safe locations, so lightning will not get them. One backup is NEVER a backup. The moment it is hooked up to the system for backup, it is no longer a “backup”, because at that moment it sits on the same machine on the same grid as the original. Power surges and disk controller or OS failures could easily take down the original and the backup together! So there must be AT LEAST one additional drive at that moment that is unconnected and safe, better two. I think this method takes care of virtually all failures to a very high degree, with the exception of silent file corruption. That wouldn’t be noticed until it is too late, and that’s where TM comes in really handy. But of course, ideally one should have at least two TM drives: one could be on all the time to capture all recent file modifications, and the other connected maybe only once a week to create a slow stream of decently spaced old versions.

Kevin · June 24, 2014, 12:05pm

Dear All

Thanks for the helpful suggestions. Apologies to Korm if I was unduly harsh to a helpful contributor; I was responding to what seemed to be a kind of piling-on from DevonThink personnel and then the forum.

To satisfy curiosity: I use DTOP as a repository mostly for articles I’ve already read, and clippings I might someday need. It so happened that I kept adding stuff and never went systematically looking for anything during that December to early April period. Then came a couple of weeks of casual musing: “That can’t be right…” etc. etc. Also, not to the point, but I have TM and also two bootable backups, one off site, regularly swapped. But because I do this systematically every week, that was no help.

On to business. The material I moved to the desktop was not intended to be used; I get that its location matters. I was simply getting it out of the way, since I didn’t want to delete it when I pulled in the December material.

I’ll read through all your suggestions, and a couple from Devon, and move forward.

Thanks again for all your help.

gg378 · June 24, 2014, 2:06pm

Hi Kevin,
Sorry for the tangent on backups, but good to see that we actually seem to follow a very similar procedure.

“Curiosity” is not what I would call our questions. You might have come to the forum strictly for a quick suggestion about the merging. But the forum is about give and take. By mentioning the circumstances that led up to this, you created, at least in me, not so much curiosity as concern: concern that something could go corrupt in December and not be noticeable until April. That must be the biggest fear of any knowledge management system user. I simply wanted to establish whether you had a known crash in December (which can ALWAYS happen, via OS X, via DTPO, via other crazy circumstances) and didn’t do anything about it until April, OR whether on the surface your databases ran fine until you finally hit upon the corruption in April. The latter is the worrisome scenario. In your last email you seem to hint at this latter scenario, at least partially.

Kevin · June 24, 2014, 3:59pm

What I’m saying is that it’s perfectly possible that the corruption I noticed (“missing file”) was observable long before April IF I’d been using DTOP to retrieve information, not just putting it in.

alanshutko · June 24, 2014, 11:27pm

Well, if you are only exporting to import them back in, that is easy to fix. Go to the scripts menu, More Scripts… and download the Dupes to Replicants script. Then you can convert those dupes right back to replicants.

gg378 · June 25, 2014, 2:13pm

Yes, for an export-then-import scheme that would work. However, that still leaves metadata such as URLs for clippings and wiki-links in the dust (from Kevin’s description, that does not seem to be an issue in his case). I have never done this, but I wonder: how reliable is dupe detection? I have quite a few files that are absolutely not the same but are labelled as duplicates (example: three videos of oscilloscope traces that differ in size by megabytes, have completely different poster frames and content, and one is even portrait instead of landscape, yet DT labels them as dupes; I filed a ticket on that). So running a duplicate removal script over a large number of files is not my cup of tea; presumably, after a “duplicate” is detected, it is deleted and replaced by a replicant of the other file? So there is potential for data loss. In fact, “script” combined with “removal” just sounds wrong, unless there is no other choice.

But in the end, I was really talking about “final exports” from DT (admittedly a tangent to the OP), i.e. without re-import. If replicants are turned into standalone files, that’s simply not going to be useful for the reasons I mentioned in the previous post.

If Kevin has backups of both the December and April DB, it still seems unnecessary to me to use “export”.

In fact, after his last email I understand the situation a little better, and it seems to me that I would follow a different route:

IF:

all the “corruption” in the April DB is “missing files”, and nothing else,
and the number of missing pre-Dec files is less than the number of files added in the Dec-Apr period (*), then the least troublesome path would be:

Keep the April DB going (**). Get a list of all the “missing files”. This can probably be accomplished (haven’t checked) by

Checking the log of a “verify” operation on the DB
Making a smartfolder that includes ALL files in the DB and using the “as icons” view style; I assume that “missing” files would show up with some icon that clearly alerts you to the fact [edit: does not work, DTPO still has a thumbnail of the missing file].

If those are all pre-December files, then hunt those down in a backup copy of the December DB and re-insert them in the April DB.

Do a verify & repair and probably a rebuild, and it should be OK.

(*) Kevin lost “10 gigs” of data. Is that three 3 GB videos (OK, then we have already spent way too much time here) or 10,000 1 MB files? The latter will mean “work”.

(**) Time Machine can restore old files to a different location, so both versions of the DB can be open at the same time. Or you can first rename the current version from “mydb” to “mydb-current”, then restore the old “mydb” from TM and perhaps name it “mydb-dec”, and work with both. Note that renaming the DB file does not change the name of the DB as it appears in DTPO once it is opened, so it might be best to change the database name as well, not just the filename.
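Hunting down the missing pre-December files in a December backup can be partly automated once you have their names. This is a minimal Python sketch, assuming you have already turned the verify log into a plain list of filenames (no particular log format is assumed here) and that the December copy can be browsed as ordinary files; the function names are hypothetical. It only locates candidates; re-inserting them into the April DB is still done in DTPO.

```python
from collections import defaultdict
from pathlib import Path


def index_by_name(backup_root: Path) -> dict[str, list[Path]]:
    """Map each filename under backup_root to every path where it occurs."""
    index: dict[str, list[Path]] = defaultdict(list)
    for path in backup_root.rglob("*"):
        if path.is_file():
            index[path.name].append(path)
    return index


def locate_missing(missing_names: list[str], backup_root: Path):
    """Split the missing names into (found, not_found).

    `found` maps each name to all matching backup paths, sorted;
    `not_found` keeps the input order of names with no match.
    """
    index = index_by_name(backup_root)
    found = {name: sorted(index[name]) for name in missing_names if name in index}
    not_found = [name for name in missing_names if name not in index]
    return found, not_found
```

The same filename can occur more than once in a backup, so every match is returned and a human picks the right one.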

cgrunenberg · June 26, 2014, 7:22am

If you import all files and folders exported via File > Export > Files & Folders… at once, replicants are rebuilt automatically.