Backing up Files.noindex via Hazel from DTPO

I have had some data loss for unknown reasons.
It is possible that I deleted some groups and files with the Backspace key, as there was no confirmation dialog. (Meanwhile, I have restored the confirmation dialog with a Terminal command.)

As a new supplementary backup strategy, I would like to automatically copy every document that is imported into the database to a Dropbox folder.

This way, classification data is not backed up, but the original documents (a lot of PDFs) would be in a folder that can be searched in order to retrieve information.

But as DTPO puts the folder inside a package, Hazel cannot watch the Files.noindex folder in order to automatically copy the most recent files.

My questions:

  1. How can I automate an export of every new document into a backup folder?

  2. By the way, what is the system behind Files.noindex? Why are there subfolders “0”, “1”, …, and not one folder for PDFs, one for DOCs, …?

  3. And how can I sync the documents back? When I restore a database, using a backup folder or a Time Machine version of the .dtBase2 package, I also have to retrieve the docs and PDFs that are in Files.noindex. But how can I copy only the missing files from the Files.noindex directory in the Time Machine backup into the Files.noindex directory of the current database?
    Thank you

Don’t be looking to back up your documents to a location inside your database. That’s no backup and is pointless. Would you think that duplicating the content of a Word document into the same document is a “backup”?

You need to be backing up the entire database package (mydatabase.dtBase2) to multiple locations. Bill DeVille and Jim Neumann have written about this extensively, as have numerous other readers – if you’d take the time to search the forum, you’ll find the answers to all of your questions about how to back up. I urge you to read what’s already published here – lots of folks have already spent hours publishing their useful suggestions.

As for the internal structure of Files.noindex – it’s used by the database software and not structured for any other purpose. I’m once again going to guess you’re looking for some technique to manually modify the internal structure of your databases, which will ultimately destroy your work.

Yes, I performed a search for “Hazel noindex.files” and got no results.

I know that I can back up the entire database and then try to sync them (I did it manually; I am searching for another way, but this might be described in the forum. I will search, but if someone quickly has a hint, thanks).

But what I want is a kind of “everything that goes in is saved as-is in ONE folder”. Hazel could do this job very well if it had access to the folder inside the package.

The only workaround that I see is to stop using the Inbox (that’s sad), create a “MyOwnInboxFolder”, and tell Hazel to copy everything that it finds there to the inbox of DTPO.

Once again: DTPO is great software, but it has some very, very ugly points that I and other users have pointed out hundreds of times:

  • No real-time sync
  • Conflict resolution says nothing useful when reporting conflicts (<- this could be the error that made me lose about 500 documents)
  • When you classify and you are *not* highlighting the item but a group, because you searched for the group <- you classify a whole group by mistake

and so on.

I can live with these points as I love the rest (especially duplicate highlighting and classifying),
but I really need to be able to see every document that goes in somewhere else, to compensate for the lack of protection against mistakenly deleted items.

So what happens after a few months when you again mistakenly delete items, and your ‘other place’ also contains all the documents from the past few months, including the documents that you did intend to delete? Or a situation where you have made changes to documents in your database, but those changes are not reflected in the ‘other place’? How are you going to reconcile these scenarios?

It seems that if a reliable backup strategy to recover deleted documents cannot be put in place, then perhaps a safeguard somewhere in the 3-stage deletion process of ‘Move documents to DEVONthink trash > Empty DEVONthink trash to Finder trash > Empty Finder trash’ would accomplish what you want.

Thanks for your comments.

Generally, I work with documents that are edited once and then never again:
scanned PDFs for example, screenshots, PDF-captured web pages.

OK, there might be annotations on PDFs, but losing those is another problem.

But I lost scanned documents, and as we destroy the paper after scanning, there are a lot of important documents that exist only in digital form.

Thanks to the backup strategy based on:

  • Time Machine
  • Cloud-service-based backup of the folder containing the DEVONthink database

all documents are definitely somewhere. But it is a big mess now to sync lost documents from the old database versions into the newest one.

Your topic is “Backup”, so search for that. Here, for example, is a search that produced 2,000+ threads on that topic:

search.php?keywords=backup&terms=all&author=&sc=1&sf=all&sr=posts&sk=t&sd=d&st=0&ch=300&t=0&submit=Search

Whoa. You want to back up all your documents into a single folder? How is that going to help?

DEVONthink Sync is not a backup solution. If you mistakenly delete your data, the deletions would be synced. What good is that?

What do you mean by “Conflict Resolution”? It looks like you lost your documents because you had disabled the warning about deleting documents – and then you emptied the DEVONthink trash – and then you emptied your OS X trash – and you had no backups of your database. This is the three-step process Greg mentioned. Sorry, automation cannot stop us when we’re determined to ignore the safeguards. :open_mouth:

User error – it’s useful to pay attention, read the manual, etc.

It sounds less like you want to “see” the items and more like you want to reproduce them. Two very different things.

No offense, but this feels like an overreaction to you mistakenly deleting something. It’s like you threw out your favorite shoes so you now want to buy a second pair of all your clothes and store them in a big pile in your garage… just in case you MIGHT throw out another item.

korm is absolutely correct on this, so I’ll repeat it: Sync is NOT a backup solution! And as Sync is a mirror, the deletions would propagate to the syncing machines (unless they were offline or not connected, which would conflict with the “realtime” concept). In fact, the “realtime Sync” you mention would delete your files even more quickly, so you’d have to be even more aware of what you’re doing.

Thanks to Bluefrog and Korm, but you misunderstand me.

  • I have at least 30 different versions of my DEVONthink database, with everything in it (files and database), since I began working with it (and imported 16 GB of data from Evernote, which was not easy; as you say, Evernote does not want to let us go… :wink:)
    So my data is there, for sure! But where…?

I have no idea when the data went missing between January 2015 and now…
Now I have found it, by restoring old databases by trial and error until I found the group again.
I lost 2 hours on this…

  • Now my problem is that I don’t know how to sync the “old” databases in order to update the current one, putting in only the things that are really interesting and have been deleted.

  • I don’t want a “one-folder backup” as an overreaction, but as a principle. Scan a paper, then file and classify it in DTPO, and “fire and forget” a copy into a one-folder backup folder.
    This can be useful:

  • if one day I have to move every original document into a new filing system (as I had to do with Evernote -> DTPO). I hope not, but who knows?

  • It would be a chronological backup, very simple to check.
    How can I check in Files.noindex all the documents that were scanned 10 weeks ago when there is a problem with DTPO? In such a folder, it would be very simple.

  • It would also be very simple to compare against docs that are backed up in another place. All folder-sync applications that I know have a lot of problems with nested folders, but a comparison of two single folders, even with thousands of files, is very simple and fast…

And I don’t want DTPO to change its saving system; I only would like to automate this third backup solution, which would come in addition to the others…

  • The lack of real-time sync has nothing to do with this problem; my partner and I are just a little tired of asking each other at work: “Can I open the Contracts group? Are the new contracts up to date?” “No, wait, I have to sync…”
    And with every sync -> memory leak -> DTPO’s memory needs grow, grow, grow…
    And then:
    “Can I open the Customers group? Are the new customers up to date?” “No, wait, I have to sync. Stop, I have to clean up the machine; it’s slow as a snail with DEVONthink eating 6 GB of memory…”

:slight_smile:

30 versions of the database? Each version larger than 16 GB? I can’t imagine how you can work your way out of that. :open_mouth:

If it were me, at this point I’d forget about DEVONthink, export everything into a series of standard folder hierarchies in the file system, and work my way through the mess with Beyond Compare or something similar to find and get rid of all the duplicates.

Hi Korm,
of course I am talking about “backup” versions, and I also take into account the Time Machine backups, which, as you know, are not 20 GB copies but only the changes that macOS noticed, allowing it to rebuild the files as they were on a specific date.

(By the way, it is much more than 30, because there are two Time-Machine-backed-up Macs synced, plus a cloud-storage backup with 5 versions of one of the two Macs. At hubic.com, you can get 10 TB of cloud storage for 5 euros/month or 50 euros/year :open_mouth: :open_mouth:)

But in order to find the missing group, I had to restore 6 or 8 versions before I found it…

And I still don’t know how I can quickly browse the differences between two database versions in order to find the other documents (not the big group) that have been accidentally deleted (or lost for unknown reasons).

I am in contact with support.

Thank you

The support team very quickly gave me some hints on synchronizing old databases, but it is really a big mess.

Let me summarize:

  1. Even with a good backup strategy, if DTPO decides to mess up your database, you have a BIG problem. DTPO can mess up the DB while synchronizing, and also when you are optimizing the database: DTPO changes every “date added” to the date of the Optimize command, even for a document that you added 2 years ago :open_mouth: :open_mouth:

  2. I have all my documents somewhere, thanks to the backups organized outside of DTPO (Time Machine on two machines, cloud backups). Somewhere…
    I have already restored 20 versions from Time Machine, but still haven’t found everything.

  3. My initial question was: how can I access/back up every document in one folder when it is added to the DB? This is what I have to organize now for the future, and I still don’t have a simple answer. One idea could be to create a “Hazel input” folder, so that everything that is added to DTPO goes there first. Afterwards, Hazel can copy it to DTPO’s inbox.
    Problem: OCR will be performed on the DTPO documents only, and not on the backup.
    Anyone with a good idea on how to export every document regularly from DTPO?

  4. If you rely on DTPO for a paperless office, another parallel DB with this kind of backup folder really seems to be the only way to be sure that you keep all documents, and to prevent DTPO from killing documents by mistake or while synchronizing.

By the way, in my opinion, there is no reason for DTPO to delete scanned documents: a 1 MB scan of a paper letter that you received should be perfectly protected. There should be a flag like “original document” on scanned documents, so that they cannot be deleted, or only after confirmation, showing a preview of the doc.

  5. Synchronizing between computers must be clearer: if there is a conflict, DTPO should not only show the dates of the documents but also a preview of the content, and say what it will do with these docs (replace? update? delete?)

Ugly story – sorry to hear it. Hard to follow the exact sequence – were you able to rule out user error and identify the exact failures in DEVONthink? That sort of detail helps with dev QC.

I wonder if DEVONthink is the product for your needs – no disrespect intended. Since you need to keep exact copies of everything inside the database in a location outside the database – which is expensive of your time and attention and gets really, really cumbersome to manage without losing track of actions – the obvious question, IMO, is why bother with DEVONthink? Or maybe just index everything?

(BTW - FROBGOBLIN has posted some ideas here and elsewhere about using an indexed database so that you can use a cloud service to access your data on multiple machines. This sort of thing eliminates the need to bother with Sync.)

Unless I’m misunderstanding what you want, there’s a preference for this in DEVONthink Preferences > OCR.

Thanks for your quick answer.

I think the errors occurred when syncing two machines and/or when repairing databases that cannot find the original missing files for some reason. (This is what I am saying: it should not delete any record, even if the file is missing. Missing file: move the record into a special group, “Records without data” (a bit like orphaned files, but the other way around).)

And you ask if DTPO is really what i need:

  • YES, if the little problems are solved, it could be the perfect tool.

  • and YES, indexing existing folder structures (like a Dropbox folder) would have been the perfect thing. This is exactly what I intended to do when I began working with DTPO.
    But (as often with DTPO, a little detail makes a solution impossible), I had to learn, after hours of trials, that DTPO does not sync an indexed database; instead it copies the indexed files to the other computer, which makes strictly no sense.

My initial idea was:

  • Two computers, two Dropbox folders, always synced between both (Dropbox sync is very reliable; Google Drive is not)

  • Both machines have a DTPO database that indexes every document in the Dropbox folder.

  • But this doesn’t work, because when you sync the two databases, they copy the indexed Dropbox folder to the other machine… so the 2 GB DTPO database turns into a 102 GB database with the documents inside…

This is why I finally decided to entrust the documents to DTPO, in its database.

And now i have to find a way to retrieve the right document in the right places.

For the moment, I have to wait; there is not enough time to repair this right now. I keep 3 Time Machine versions of the database on the desktop, and when I notice that something is missing, I open the backup databases. But I don’t know how many items are missing; the fact is that groups have disappeared, and in some groups that I haven’t viewed or edited for a long time, one or two items are also missing for unknown reasons.

One of the main wishes that I have (I have already written this several times):

  • When syncing, the Activity window should really stay in front, and when the action is finished, DTPO should really and clearly say: “Sync OK”, or “Verify and repair”, or anything else.

  • But not close the window and let the user check: “Activity window, where are you? Ah, OK, there (always behind all other windows!)”, and then “Is the sync really finished? Let’s have a look in the log to see if there is an error.” This behavior really makes no sense! And if the developers think that another fixed pane in the main window with sync or error status makes too much of a mess, then please, a little flag in a corner for “Sync in progress”, “Sync OK” or “Sync failed”, with three colors, should really not be a problem…

  • When there are sync conflicts, DTPO should say exactly why there is a conflict, and show both documents/items in a preview window. Otherwise, it should keep a copy of both items, clearly flagged as “Sync Conflict Group (1)” and “Sync Conflict Group (2)”, a bit like Dropbox does.

  • An option: don’t sync data for indexed databases.

  • When classifying, and DTPO doesn’t find the right group (this happens, even though DTPO is really intelligent): a little search field where I can type “Ele” and the group “Electricity and Water invoices” appears…

  • Don’t let the user classify into the groups shown in the proposal pane (the groups that DTPO thinks match). A mistake can happen very quickly:

  1. DTPO makes some proposals of where to classify.
  2. The user highlights one of the proposals in the group window, but finally decides to look for another group that matches better.
  3. As he has to activate other windows in order to search for the group, the current window loses focus.
  4. When the user comes back and clicks “Classify” (because the first group finally matches after all), the whole highlighted group is classified into another group…

Thank you so much for DTPO, which looks to me like one of the best pieces of software, but with some of the worst bugs or bad behaviors that you can imagine.

Hi Greg, yes, this is a misunderstanding. I know about this option.

What I want to say is that every original document that comes in, after having been OCR’d, should be protected from deletion with very strong security (more than one “Do you really want to delete this item?” confirmation, especially because if you click the delete icon in the toolbar, there is no confirmation).

I say this because I believe that scanned documents, especially if you throw away the originals one day later (after TM backups and cloud backups; destroying the paper makes sense in a real paperless office), should and must stay forever, except in some rare situations.

So those docs must be “holy” for the database software, and perfectly protected.

In the worst case, it is not a problem to have 256,000 orphaned files after a big bug.
You can quickly find documents and retrieve every piece of information with Finder’s search inside the PDFs.

But it is a problem to have groups with missing documents, when you don’t even know that they are missing until you search for them.

A good example of the right behavior is Lightroom.

I have 111,000 pictures in my Lightroom database.

Sometimes (rarely) pictures go missing, for several reasons.

But as there is an option to keep the index entries of missing pictures, I can see at once that an original picture is missing from a group of pictures, and I can see its preview. This makes it very easy to retrieve it from old backup copies of the data folders…

If you are indexing all documents in the database to a Dropbox folder, or folders, why is there a need to sync databases via DEVONthink? Just index the Dropbox folder(s) into databases on both machines, and as long as Dropbox does its thing correctly, the databases will mirror one another.

I don’t know how it would ever be practically possible for DEVONthink, or any app, to automatically track which documents are ‘holy’ and which are not. Unlike editing in Lightroom, OCR creates a new document by converting an image to the PDF+Text format. That’s not the same as tracking changes to an existing document. If you are already using Hazel, one option would be to write a Hazel script to lock the documents before importing/indexing to DEVONthink.

Only one of several possible reasons: because you want to classify these documents, and the classification information is in DEVONthink, not in Dropbox. So you have to sync the databases. If you classify all these docs by folders in Dropbox, DTPO’s classification intelligence cannot come into play…

This could be an idea, but… you lose the OCR workflow integrated into DEVONthink (it performs very well!).

I would like to work with DEVONthink, and I think that small changes or options could make it perfect.

That behaviour would be worrisome, but I do not see it over here when using the Backup & Optimise command.

The OP is incorrect. DEVONthink does not change Date Added when Backup & Optimize is run. It changes Date Added when Rebuild is run – as it should. Date Added is merely internal database bookkeeping, obviously. DEVONthink does not change Date Created unless explicitly told to do so by the user. Date Created is what matters in tracking your documents. For scan-PDF-to-DEVONthink, Date Created will always tell you when that document was originally added to the database.

Thanks for confirming this Korm. It didn’t sound like expected behaviour at all.