Undo mass processing by smart rule

That was exactly what I expected: folders, yes, but the content (the files) is not touched. But in my example the file is renamed. :frowning:

Currently none. I also thought I could rule out side effects from other filters, because the problem was solved after I switched this filter to manual.

That is not an effort one gladly invests for 3,000 files. But wouldn't it make sense to be able to reverse recent actions (the latest 50 files or so)? Doesn't that work in the Server edition either?

When I get a moment, I'll try to reproduce this tomorrow; as Jim has not been successful in doing so, I wonder whether you have stumbled upon a bug which is only active under very specific conditions. What type of file did you have in the "2010" folder? Were they all of the same type? Can you let me know the names of one or two files (before they were renamed by the rule)? How exactly did you import the files?

PS: If you don't have a backup of your files, take a moment to rethink your backup strategy (I know, it's a bit late to point that out now). It must have taken you ages to rename 3,000 files – and in that time, at least one of your strategies should have created a backup.

Thank you for your efforts. I had made a GIF earlier but unfortunately deleted it; I can do it again tomorrow and add the file names etc. At the moment I can't do anything because I'm having the 3,000 files re-read by OCR… But I only use PDFs. I moved the folder (which contains one PDF) from the Finder into that group.

Feel free to add any further info whenever you have time; I fear anything I do won't actually help you directly – but if there is a bug, finding it would help the wider community, so it's worth my time anyway. I'm sorry for the time you have lost.

Again, please check your backup strategies to see whether they really cover you against the widest range of possible incidents. I posted my strategy a while back.

1 Like

None of the editions have such a mechanism built in for mass undos.

Did you actually move the folder from the Finder into the database, i.e., you have no copy of it in the filesystem any longer?

At the moment I can't do anything because I'm having the 3,000 files re-read by OCR

What do you mean by this??
Why would you run OCR on the documents again?

I doubt that very much, though I don’t have the server edition. I suppose it is more or less DT combined with a web server and another GUI.
And frankly, as with any kind of automation, it gives you a lot of power – including the power to shoot yourself in the foot (I once managed to run `rm -rf *` from the root directory of a Unix server – talk about "reverting some 50 files"). No one can safeguard against all kinds of actions; simply imagine a script/rule that moves files to the trash and empties the trash immediately afterwards.
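
To make that concrete, here is a deliberately destructive sketch (the paths are made up – do not run this) of what any rule that can execute a script could do:

```sh
# Hypothetical and deliberately destructive -- do NOT run.
# ~/Databases/Inbox is an invented path; any scriptable rule could do
# the equivalent, and no application-level undo could bring it back.
mv ~/Databases/Inbox/* ~/.Trash/   # "move to trash"
rm -rf ~/.Trash/*                  # "empty trash" immediately afterwards
```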

1 Like

I had named the documents in a folder on the filesystem. From there they were imported into DEVONthink and then OCRed. I just did that because it was faster than digging through the backups.

I am not talking about covering all eventualities. But being able to undo the last 50 changes from a change log should be possible, IMO.

That depends on the definition of "changes". The last fifty keystrokes – probably. The last fifty actions of an arbitrary script – probably not.

Imagine an AppleScript running inside your smart rule which in turn executes a shell script that sends your files to an external server and then removes them from your disk. How could any software revert that action? Or, closer to your experience: what if a script renamed 50 files to the same name by mistake? How would you restore the 50 originals from the single file that remains?
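
To illustrate that last point, a minimal sketch (the file names are made up) of such a buggy rename:

```sh
# Hypothetical buggy rename: the target name never varies, so each mv
# silently overwrites the previous file. 49 of the 50 originals are
# destroyed, and their names are gone with them.
for f in scan-*.pdf; do
  mv -f "$f" "2010.pdf"   # bug: the target should be unique per file
done
```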

Perhaps a solution would be an option whereby, for any open database, the user could configure a snapshot or autosave of the database every X minutes while retaining Y snapshots. This would be similar to AutoSave in Microsoft Word.

Good idea. On the other hand, databases tend to be a tad bigger than office documents, which could make taking a snapshot not so "snappy" and risk filling up the hard disk in no time. Also, DT would have to stop doing anything else while taking a snapshot. Stop right in the middle of a smart rule because it's snapshot time? I wouldn't want to implement all that transaction handling 😉

2 Likes

Would it? Time Machine, Arq, and Apple's APFS filesystem all take snapshots silently in the background. Windows Server has a similar "Volume Shadow Copy" feature.
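
On an APFS Mac with Time Machine configured, local snapshots are a single tmutil call away (the date stamp below is just an example):

```sh
tmutil localsnapshot                 # take an APFS snapshot of the boot volume now
tmutil listlocalsnapshots /          # list the local snapshots that exist
sudo tmutil deletelocalsnapshots 2021-05-01-120000   # delete one by its date stamp
```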

Granted, maybe those are features so complex that they are best left to the operating system or a freestanding utility.

In fact, you could already implement the snapshot feature I am suggesting fairly easily today with Carbon Copy Cloner and/or ChronoSync. Having it integral to DT3 would make it simpler to configure, but it is doable right now with that software.
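
For what it's worth, the core of it can even be rolled by hand. Here is a minimal sketch of a Time-Machine-style hard-link snapshot script (all paths and the retention count of 10 are assumptions, not DEVONthink features, and the database should be closed while it runs – copying a live database may yield an inconsistent snapshot):

```sh
#!/bin/sh
# Hypothetical snapshot script with retention -- a sketch, not a DT feature.
SRC="$HOME/Databases"                 # folder holding the databases
DEST="/Volumes/Backup/dt-snapshots"   # snapshot destination
STAMP=$(date +%Y-%m-%d-%H%M%S)

# Unchanged files become hard links into the previous snapshot, so a new
# snapshot only costs the space of the files that actually changed.
# (On the very first run "latest" does not exist yet; rsync just warns
# and copies everything.)
rsync -a --link-dest="$DEST/latest" "$SRC/" "$DEST/$STAMP/"
rm -f "$DEST/latest"
ln -s "$DEST/$STAMP" "$DEST/latest"

# Retention: keep the 10 newest snapshots, delete the rest.
ls -1d "$DEST"/20* 2>/dev/null | sort -r | tail -n +11 | xargs -I{} rm -rf {}
```

Run every X minutes via cron or launchd, that is roughly the snapshot-every-X-retain-Y behaviour suggested above.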

1 Like

I was thinking: what happens if a snapshot is taken while 3,000 files are being moved and renamed? But you're probably right: there would be one snapshot before all that, one in the middle, and one afterwards. One would just have to restore the one that should be kept.

Agreed

And as for filling up the hard drive: that's the point of letting the user determine the frequency of snapshots and how many to retain. The snapshots could thus be tuned to the type of database operations the user performs and to the speed and capacity of the available storage.

Related issue – I plan a separate post on this at some point with more details, but I have recently concluded that at the scale of "large" databases and/or large hard drives (for me, about 200 GB of databases on an 8 TB drive with 2.5 TB of total content), Time Machine simply does not scale. I disabled Time Machine this week and switched to Carbon Copy Cloner for my local backups (continuing with Arq for cloud backups), and I am much happier with its performance. Time Machine is neither configurable nor feature-rich enough to work at this scale.

The Apple story: easy for the average user. A pain for anybody with advanced requirements.

I recall someone saying that you are supposed to use whatever backup technology you have at hand for these purposes. DT's internal backup is not about saving the contents of your data – it is about saving the database structure and integrity.
Or @BLUEFROG wrote about it in the manual somewhere…

that's the point of letting the user determine the frequency of snapshots and how many to retain. The snapshots could thus be tuned to the type of database operations the user performs and to the speed and capacity of the available storage.

This was previously an option in DEVONthink 2.x’s Preferences > Backup but…

  1. It confused people into thinking there were file backups being performed.

  2. Some people would not set reasonable maximums and bloated their databases with excess and unnecessary backup data, e.g., setting it to hourly with 24 backups retained.

PS: Creating a snapshot is no trivial matter, and it would have to be a proper snapshot to account for any potential action to be undone, not merely a file-renaming problem.

3 Likes

That depends on your point of view. macOS is a great combination of a simple UI with all the power of a Linux-like OS under the hood.

Re Time Machine: it basically uses standard OS features and hard links to create snapshots. The beauty of it is that it's technically open and fully accessible with basic Linux console know-how (handy if the TM drive is on a NAS).
The downside is that in scenarios like a database, where a couple of blocks change in a 100 MB file, the whole file is copied again. Other software may store block-level deltas in a proprietary format. A TM-based workaround would be to exclude the DB from TM and maintain a separate backup strategy for it, or have the DB write out change logs for TM to back up.
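
To make the hard-link point concrete (the file names are made up; `ls -i` prints the inode number):

```sh
# Two Time-Machine-style snapshot folders. An unchanged file shares one
# inode -- i.e. one physical copy -- across snapshots:
ls -i snap-Mon/notes.txt snap-Tue/notes.txt
#  123456 snap-Mon/notes.txt
#  123456 snap-Tue/notes.txt    <- same inode, stored only once

# Change a single block in a 100 MB file, and the next snapshot holds a
# complete second copy:
ls -i snap-Mon/data.bin snap-Tue/data.bin
#  123457 snap-Mon/data.bin
#  987654 snap-Tue/data.bin     <- new inode, full 100 MB duplicated
```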

Firstly, I'm of course using macOS and do like it – most of the time. Before that I used Linux and hated the GUI most of the time. However, macOS does not have Linux-like power, IMHO. There's no package management, for example; you have to go for Homebrew or something like that, or install from source. There's no filesystem standard describing where additional installs go. Setting up simple things like a LAMP stack is a pain, and every major OS upgrade changes something under the hood that breaks your LAMP stack. Automation is bad and getting worse with every release: AppleScript is a dead horse, JXA has been abandoned by Apple, and even their own apps lack automation functions. Major utilities like sed work differently than in Linux and are behind (like the just-abandoned bash).
One can use a Mac for development in spite of macOS, not because of it. The GUI is still one of the best out there. But don't compare it against older Apple standards. Why does Apple Mail not have a share function?
But that has nothing to do with DT, so let’s take that private, if need be.

1 Like