Behavior of Indexed Watch Folders on WebDAV?

I own DevonThink Pro Office.

I have asked a question about this topic already but I am still a bit confused as to what to expect.

I have used folder actions to INDEX a WebDav folder which I have mounted to my machine using TRANSMIT (a rich FTP client).

Obviously, I want DevonThink to index anything that gets put in this folder. And it worked!

But… I am a bit confused as to how it works exactly, as there was an instance where it did NOT work. I added a file, and it didn’t get sent to the database.

I re-added it, and then it did.

  1. Because the WebDav folder is always mounted, but experiences interruptions (internet goes off, server down, etc.), I am wondering how DevonThink will handle this?

  2. Does DevonThink poll the folder on a time basis and then add new files? Or does it just index files on their way IN to the folder?

The difference of course, is important for a WebDav folder. If Devon polls the folder at intervals, when the WebDav folder is next stable and available, it’ll just index anything new. If however, it only adds files as they are added to the WebDav, this could present the opportunity to miss something when the WebDav folder is offline and unavailable.
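To make the distinction concrete, here is a toy Python simulation of the two strategies. It is only a sketch of the general idea, not a description of how DEVONthink or Folder Actions are actually implemented; the class names `EventWatcher` and `PollingWatcher` and the file names are made up for illustration.

```python
class EventWatcher:
    """Indexes a file only if it is reachable at the moment the file arrives."""

    def __init__(self):
        self.indexed = set()

    def on_file_added(self, name, online):
        # A notification that fires while the mount is offline is simply lost.
        if online:
            self.indexed.add(name)


class PollingWatcher:
    """Compares the folder's full contents with what is already indexed."""

    def __init__(self):
        self.indexed = set()

    def poll(self, folder_contents, online):
        # An offline cycle is skipped; the next successful poll catches up.
        if online:
            self.indexed |= set(folder_contents)


folder = []  # what is stored on the (hypothetical) WebDAV server
event_watcher, polling_watcher = EventWatcher(), PollingWatcher()

# (file name, is the mount reachable from the Mac when the file arrives?)
arrivals = [("a.pdf", True), ("b.pdf", False), ("c.pdf", True)]
for name, online in arrivals:
    folder.append(name)
    event_watcher.on_file_added(name, online)
    polling_watcher.poll(folder, online)

missed_by_events = sorted(set(folder) - event_watcher.indexed)
missed_by_polling = sorted(set(folder) - polling_watcher.indexed)
print(missed_by_events, missed_by_polling)
```

In this simulation the event-driven watcher never learns about `b.pdf`, while the polling watcher picks it up on its next successful cycle.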

  3. Also - this will probably be answered by question 2 above - but what kind of time delay should I expect between when something is added to an indexed ‘watch’ folder and when it’s actually added to DevonThink?

Thanks

You are dependent on the intersection of at least eight processes - roughly in this order: your host server where the physical files are stored, the Internet, your ISP, your local network, the local OS X file system managing the WebDAV mount, the Folder Action Dispatcher monitoring changes to the WebDAV folder, Transmit attempting to manage the file system’s management of the WebDAV mount, and DEVONthink. Of these, only the first seven are actively doing or monitoring something, and none of those seven are doing their job continuously. They are either driven by interrupts or by polling. If one of them breaks down, an interrupt is not monitored, or an event occurs that a polling cycle misses, then the Folder Action will not trigger DEVONthink.
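As a rough illustration of why a long chain is fragile, the Python sketch below multiplies per-stage success rates to get an end-to-end figure. The 0.99 values are invented purely for the example, and the calculation assumes the stages fail independently, which is a simplification.

```python
import math


def chain_reliability(stage_reliabilities):
    """Probability that every stage succeeds, assuming independent failures."""
    return math.prod(stage_reliabilities)


# Hypothetical success rates for the seven active stages; not measured values.
stages = {
    "host server": 0.99,
    "Internet": 0.99,
    "ISP": 0.99,
    "local network": 0.99,
    "OS X file system (WebDAV mount)": 0.99,
    "Folder Action Dispatcher": 0.99,
    "Transmit": 0.99,
}

overall = chain_reliability(stages.values())
print(f"end-to-end: {overall:.3f}")
```

Even with each stage succeeding 99% of the time in this made-up scenario, the whole chain succeeds noticeably less often than any single stage.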

In a Folder Action, DEVONthink isn’t monitoring anything. Ever. It is the Folder Action Dispatcher (a part of OS X and not part of DEVONthink) that attempts to execute AppleScript commands that then activate DEVONthink and tell it to index the file. If in the chain of events mentioned above the Folder Action Dispatcher fails, then DEVONthink will fail to index.

It is not possible with the hardware and software you own to have 100% reliable folder actions.

Thank you for the detailed reply. This is a bit disheartening, as while I understand the dependency chain you describe (though this is my first time using MacOS, which I in fact purchased just to use DevonThink - so I didn’t know that folder actions were OS-functions), I feel as if it’s, essentially, a fairly simple concept I’m implementing.

Namely - I have data stored on a WebDav server (‘the cloud’) and just want DevonThink to index the cloud folders.

I have so far noticed that it works only when I add NEW data to a folder which was already set to “index,” which, to me, seems to indicate the index script runs only when data is added to the folder. It does, so far, seem to work, so long as I only add data to the folder AFTER I’ve indexed it.

Obviously though, as you’ve mentioned, if I add a file to the WebDav folder when the internet connection is down, DevonThink (or the index script rather) will not “see it” for indexing, and it will be passed over and left out.

  1. Is there a way to force DevonThink to poll a folder and just sync the missing indices? (I know there is an update index button?)

  2. Still, I’m a little confused as to how DevonThink’s “sync” feature works then, since I was under the impression it was meant to do precisely what I’m looking to do?

Yes, it’s a simple concept. But failures can arise from causes independent of both the operating system (OS X) and DEVONthink, as korm observed.

Ok… but am I missing something as regards the two functions I inquired about, namely,

  1. “update indexed items” and

  2. The entire ‘sync’ functionality, which I know was recently launched - what does this do, if not something along the lines of what I want to do? (WebDav Index Sync)?

No, there is not.

The DEVONthink Sync feature is different from what you want to achieve. Sync is for synchronizing whole databases between machines and users. Sync can use a WebDAV location for that purpose – but that has nothing to do with either folder actions or indexing. Completely different technologies.

Did you read something that created a different impression – if so, it might be helpful to know what that was.

Ok… so then, should I understand the matter as:

  1. I can index local folders without problem (and have done this successfully), but using WebDav folders as index targets is going to be a problematic solution?

(even though I have made it work, it’s not especially robust, and due to aforementioned issues, does miss files, and certainly doesn’t update very well, such as when I create a new sub-folder in the indexed folder, the new folder doesn’t index because it wasn’t “added”… so what I do is I drag it to the desktop, and drag it back… then it indexes properly in DevonThink…)

  2. I didn’t read/misunderstand anything, I just assumed this wouldn’t be a problem, as I know that DevonThink does list WebDav support - I assumed that this was essentially a sync operation.

  3. I’m also getting the sense that DevonThink is really more designed for importing data INTO the database. I’ve mostly opposed this so far for the reason that I want local copies of my data (and don’t much like the idea of duplicates of everything) AND simply don’t have enough space on the Mac to store all the data in my cloud.

  4. I’m still committed to making DevonThink work however, so I’m wondering if there’s anything I can do to make the setup more robust. To be fair, Transmit is a superb WebDav mount client - and it’s solid even on jumping WiFi, disconnects, etc. - I think the problem lies in the scripting that is used by the “index” folder action.

I’m wondering if I write/acquire a script that does the simplest thing possible - RE-INDEXES the folder on a time-basis… it would completely make all of this work…
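Roughly, the core of such a script would be a snapshot-and-compare step like the Python sketch below. This is only an outline under assumptions: the function names `snapshot` and `rescan` are made up, it does not talk to DevonThink at all (handing the new items over for indexing is left out), and a real version would run on a timer against the mounted folder rather than a temporary directory.

```python
import os
import tempfile


def snapshot(root):
    """Return the relative paths of all files and subfolders under root."""
    paths = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def rescan(root, known):
    """Return (items not seen before, updated set of known items)."""
    current = snapshot(root)
    # Keep remembering old items even if the folder looks empty right now
    # (e.g. the mount is offline), so they are not reported as new later.
    return sorted(current - known), known | current


# Demo on a temporary folder standing in for the indexed folder.
with tempfile.TemporaryDirectory() as root:
    known = snapshot(root)
    os.makedirs(os.path.join(root, "sub"))
    open(os.path.join(root, "sub", "note.txt"), "w").close()

    new_items, known = rescan(root, known)    # finds the new subfolder and file
    second_pass, known = rescan(root, known)  # nothing new on the next cycle

print(new_items, second_pass)
```

Because the comparison walks the whole tree, a new subfolder is picked up like any other item, and a second pass over unchanged contents reports nothing, so already-seen items would not be handed over twice.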

Thoughts?

A manual (if tedious and potentially confusing) workaround I’ve found to my indexing problem (which I should note, isn’t JUST a problem with WebDav, but can also get messy with LOCAL index targets if, as mentioned above, you create a new folder for example - the script does NOT index it)…

My solution is to just “index” a folder manually by selecting it… which creates massive duplicates in the inbox from the time it was indexed before… but then I just go Data>Move duplicates to trash… and it cleans it up.

It’s not amazing but… I’m thinking it could work…

Ah never mind about that… I got excited and then realized that duplicates are only removed in the open window - so, if I had duplicate data already dispersed from the Inbox Global, and then re-indexed the folder, and then tried to remove duplicates, DevonThink doesn’t think they’re duplicate, so they stay.

I would have to go and de-duplicate every database for that to work, which would be very annoying.

No, not really problematic as long as your machine is connected to the WebDAV location. (If the location is disconnected, however, you’ll get “File Missing” errors in DEVONthink.) DEVONthink can’t index a file that the file system doesn’t know about.

Oh

DEVONthink is designed to import and index, equally well.

Not really. Why assume that? Folder actions are managed by OS X, not DEVONthink. It is the OS X Folder Action Dispatcher that tells DEVONthink to execute DEVONthink’s piece of the script. Folder Actions can be unreliable and fail to notice a change to a folder. There is nothing DEVONthink can do to prevent OS X from making a mistake.

Why not forget this whole folder action? Select the indexed group in DEVONthink. With the group selected, open Tools > Show Info. Attach the script at /Applications/DEVONthink Pro Office/Extras/Scripts/Triggered/Synchronize.scpt to the group. With that script attached, every time you select the indexed group it will be updated.

Personally, I wouldn’t bother with a script that automatically reindexes a folder on some interval. It’s a tricky script to write and get working correctly.

Hmm - I didn’t mean DevonThink was responsible - I also didn’t realize, being completely new to Mac, that Folder Actions were part of OS X’s own scripting system…

I didn’t know that there were ways of adding things to DevonThink Groups like that - I saw a tutorial that showed the use of folder actions so I did that.


I tried this method of using the ‘synchronize’ function, though I can’t figure out what it’s doing exactly… I have noticed some things about it just playing with it though…

  1. I put the script on the ‘Documents’ LOCAL folder, but I have subfolders in there. It seems to work on the top level folder, but when something is added directly to the subfolder (like I save pictures from the web there for instance), it does NOT update.

  2. When I click the “update indexed items” command after highlighting everything in the Global Inbox - it does reveal the nested folder changes.

  3. Do I have to attach the script to every subfolder of a main folder that I index? If so… does that mean every time I add a folder to index, I should go through and index every subfolder individually as well?

  4. How does this command differ from running ‘update indexed items’ manually?

  5. Another thing I’ve noticed - If I move a file from the Global Inbox to another database… and THEN “update indexed items,” DevonThink re-indexes the same file again (which makes sense)… but… this is problematic since it means a lot of duplicates will result - whenever I reload the indexed files, all the files I’ve already sorted into the database will be indexed again!


Thanks so much for the help - I was expecting to be able to set this sort of thing up more easily.

Since I’ve posted so much I should just state clearly what I’m trying to do in a sentence: I have 3 machines, 1 of which is the DevonThink Mac. I want to unite all the data into a cloud server I own. I then want to use DevonThink to INDEX all the data on the 2 remote machines, and its own local Mac.

Obviously then, as both the local and WebDav folders are changed (data added to them mainly), I would like the data pushed into DevonThink so it maintains an index of ALL my data.

I understand that deleting or moving data breaks indices, so I’ve attempted to use subfolders to minimize how often this occurs, or make it easier to trace when it does.

Thanks again!