Indexing folders on multiple machines creating copies of files?

Hi folks, I have a Google Drive folder that I am indexing in DEVONthink on my laptop.
That works fine. But I also want to read the files in this folder in DEVONthink on my desktop.

So I was told here in the forum that the Google Drive folder should have the same name and location on my desktop, and then all would work well.

So indeed I can read it fine on the desktop as well. The problem is that I am getting dozens and dozens of copies of the files made somehow.

The newly created files are getting numbers appended to their names: filename(1), filename(1)(1), filename (1)(2)(1)(1)… etc. Since the folder contains about 5000 distinct files, you can see the cumulative effect: my Google Drive is filled!!!
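To get a feel for how far the duplication has spread, a small script can group the numbered copies under the name they were presumably copied from. This is only a minimal sketch: the helper names `original_name` and `group_copies` are made up for illustration, and it assumes every trailing "(n)" group in a file name was added by the duplication, which will also catch files legitimately named something like "Chapter (2).pdf". It only reports; it does not delete anything.

```python
import re
from collections import defaultdict
from pathlib import Path

# One or more trailing "(n)" groups at the end of the stem, e.g. "report (1)(2)".
_COPY_SUFFIX = re.compile(r"(?:\s*\(\d+\))+$")


def original_name(filename):
    """Strip trailing "(n)" groups from the stem: "a (1)(2).pdf" -> "a.pdf"."""
    p = Path(filename)
    stem = _COPY_SUFFIX.sub("", p.stem)
    if not stem:
        # The whole stem was "(n)"; treat it as a real name, not a copy.
        return filename
    return stem + p.suffix


def group_copies(root):
    """Map each presumed original path to the numbered copies found under root."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            base = original_name(path.name)
            if base != path.name:
                groups[str(path.with_name(base))].append(path)
    return dict(groups)
```

Calling `group_copies` on the local Google Drive folder and printing `len(copies)` per key would show which files have been duplicated most often.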

Any suggestions as to what is causing this and how to fix it?

At a first guess, I would suspect the files are not continuously available locally. As you rightly said, indexing files on two or more devices requires that those files are to be found at the same path on all devices. The second requisite, however, is that the files must always reside in that location. From what you are describing, I suspect they are being made available locally temporarily rather than continuously.

I don’t myself use Google products, so cannot provide any specific guidance. I assume that you have the option of making files or folders available locally (i.e. off-line); you need to do that.

I’m not really sure how you use indexed folders/files …

Normally you just have to drag and drop the Google Drive directory to the location of your choice in your DEVONthink database. Hold down the OPTION+CMD keys (the cursor will turn into a curved arrow) while dropping the folder… That’s it.

If you synchronize the DEVONthink database with a second Mac you should NOT synchronize the Google Drive folder on the second Mac, because that leads to copies of added files…

What exactly do you mean by that? (I’m not sure of the use of the word synchronise here; do you mean “don’t drop GD into DT on this machine too”? So “don’t index GD a second time” - in fact, once DT has synced to DT on the second Mac, that Mac will already be indexing GD, so no need to add it again. Or do you really mean “synchronise”, ie don’t let GD sync itself with GD?)

That’s what I mean. If you then add a file to the Google Drive folder (or via DEVONthink in the indexed Google Drive folder) on the first device, this file is immediately synchronized to both devices via Google’s servers. The file will also be added immediately to DEVONthink (via the indexed Google Drive folder) on the second device. Then, once you sync the DEVONthink database, you get duplicates, because DEVONthink will add the duplicated file again… and again. In my experience, this can result in a large number of duplicate files.

I think that, with respect to indexing cloud drives, there remains some confusion for some people, and it comes up here frequently. I’ll spell it out as best I can below. If anything is wrong, speak!

  • I find Google Drive syncs reliably (compared to Apple’s iCloud). Main purpose for me is to share files with my siblings and family. (Not discussing or evaluating here any differences in perceptions of privacy/security of the various cloud service providers!).

  • It is essential that Google Drive is set to keep all files destined for indexing into DEVONthink on the local drive. DEVONthink does not reach into the cloud! This is controlled in the Preferences of the Google Drive app, where it should be set to “mirror files”. This is how they describe that mode (compared with what they call “Stream Files”):

- Store all My Drive files in the cloud and on your computer
- Access files from a folder on your computer
- All files are automatically available offline

Note: other cloud sync service providers use different nomenclature and methods to control these local/server copies, if they offer the option at all, which adds to the confusion for some.

  • While it’s possible to attempt to create an index in DEVONthink by Opt-Cmd-dragging the “Google Drive” item from the Finder sidebar, nothing really happens in DEVONthink afterwards, as far as I can tell, other than it looking like some of the folders are there.

  • The way I index Google Drive into DEVONthink is to do it for only a selection (only those I really want) of folders (not individual files) that are mirrored on the local drive. I avoid the Opt-Cmd key option to drag as I can’t remember it. I use the Menu->File->Index Files and Folders …

  • The “mirror” copies are indexed from exactly the same macOS folder path on both iMac and MacBook: /Macintosh HD → Users → rmschne → Google Drive, so it all works on both machines. The file and folder names all have to look the same from the DEVONthink/macOS perspective.

With this approach I don’t see duplicates. I accept that some do. A mystery worth watching. Sometimes I do notice it takes a while for new files in the Google Drive folder that are indexed into DEVONthink to show up in DEVONthink. I don’t know why.

That should not be a problem; it is common practice for the indexed cloud store to be syncing between two devices using its own sync mechanism and also be synced via DT. Obviously, however, you are speaking from personal experience with GD, which I cannot. The same basic theory applies to indexing iCloud, Dropbox or OneDrive files though, and this is the first time I have read a report on file duplications and the solution you have suggested. When you saw file duplications, were those files set to be available locally (i.e. offline) in GD?

Well, I need to circle back and walk-back something (without a crib sheet).

When I wrote above :

I was looking at a DEVONthink database that was syncing to WebDAV and available via Bonjour, but I was unaware that the other machine had not yet “imported” it. When I turned that on and allowed a full sync of this database with ONLY the Google Drive folder indexed … oops. Doubled up on the folders. Even after updating the index. And then when I add a test file to the Google Drive on the MacBook, it syncs up to WebDAV, then on the iMac that new file is momentarily shown in the indexed folder, and then, as reported above, it’s trashed in DEVONthink (but not in the Google Drive folder).

It did not use to do this. I used to have no qualms about indexing Google Drive. Something changed somewhere along the way. Many candidates.

Edit: Other weirdness. Created a Test:

Machine A (iMac): New database, indexed one folder on Google Drive. Everything was indexed OK. Enabled syncing via WebDAV and Bonjour. From watching, all files and folders synced up. Then turned off sync (to enable the test).

Machine B (MacBook): Imported the new database from the sync menu. Files should be pointing to the same indexed files on the Google Drive, which is fully synced on that machine. Did a sync. Seemed to work. Added a random file from the Desktop to the Google Drive folder that is indexed. That file showed up in the right place in the DEVONthink indexed folder. Allowed DEVONthink to sync.

I checked that Google Drive synced the new test file correctly. It did.

Machine A (iMac): After the test file was added, I turned sync back on and watched the sync with that folder “open” in DEVONthink. The new test file appeared where it should.

I repeated the above, after cleaning the sync locations and deleting the test database. This time, on Machine A after the round-trip sync … duplicated folders and incomplete files. No files/folders were duplicated or deleted in the source Google Drive.

This is maddening.

Hypothesis: If DEVONthink syncs databases before the Google sync engine does its thing, then DEVONthink will delete files that the database says exist but that are not yet in the local folder that database indexes. I never noticed this before, but it may or may not be new.

My plan: turn off DEVONthink syncs on indexed files on sync services and only do when I know the sync services are complete. Synching doesn’t have to be that timely anyway. Manually sync. I can’t think of what else, even in DEVONthink can be done.
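One way to judge by hand whether the cloud service has finished before triggering a DEVONthink sync is to compare a listing of the indexed folder from each machine. The following is a rough sketch, not a feature of DEVONthink or Google Drive: `build_manifest` and `diff_manifests` are hypothetical helpers, and the manifest from the other Mac would have to be carried over somehow (for example saved as JSON). It reads each file fully into memory, which is fine for documents but not ideal for very large files.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file path (relative to root) to a SHA-256 digest of its contents."""
    root_path = Path(root)
    manifest = {}
    for path in sorted(root_path.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root_path).as_posix()] = digest
    return manifest


def diff_manifests(a, b):
    """Return (paths only in a, paths only in b, paths whose contents differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed
```

If all three lists come back empty, the two local copies match and, under the hypothesis above, it should be a safer moment to sync the database manually.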

Further to this, and maybe it should have been obvious from the start, I’m concluding that DEVONthink databases that hold indexes pointing to folders that are synced by other methods, e.g. Google Drive (and probably all the rest), should NOT also be synced by DEVONthink. Leave well enough alone and let the cloud service provider do the syncing at its own pace (which sometimes is not immediate, if it happens at all).

Then when new files re-appear on the local drive that DEVONthink indexes and DEVONthink notices the changes, DEVONthink will react to the changes automagically or with a re-index.

This also implies that if such indexed cloud folders are intermingled with other groups that are imported, and thus need DEVONthink to take over syncing responsibility, things will probably get messed up (technical term) when the timing/status is not all in sync.

SO: I’m going to use dedicated DEVONthink databases for any cloud-synced folders that are mirrored/not-optimised/off-line (pick a word) on the local drive. I will not add indexes of such cloud-synced folders to databases that require DEVONthink syncing. These databases, which are not synced by DEVONthink, can reside on multiple machines that also host cloud syncing with the same folder names (described above).

I hope I’m explaining this observation and strategy well enough. Or if I’m wrong, speak!

Yes, it’s complicated, because all these syncing methodologies are involved, especially when the folders are indexed into DEVONthink. I think it’s the double-syncing that leads to the issues I (and others) have observed.

Is the Google Drive folder the same on all machines?

Yes. But only after all syncing by Google is complete.

See new note posted just as you posted.

Actually I meant whether the path of the Google Drive folder is the same on all machines.

Yes, as I explained above. I’d link to my post above, but can’t see a way to do it.

But it turns out that the conventional wisdom (which I also believed) about using the same folder names is not important if one relies on the cloud service (and not DEVONthink) to sync these files and folders, which reside in the local file system but are synced by a cloud service provider. Getting DEVONthink involved simultaneously with the cloud syncing service causes the “mess-up”.

And what’s the path of your Google Drive folder? Actually the sync shouldn’t even handle the files/folders in case of known cloud folders.

As described above. For this particular folder,

/Macintosh HD → Users → rmschne → Google Drive → Family → RHS

It is “RHS” that I index.

Well, that’s not what I think I’m seeing. When both Google Drive sync and DEVONthink sync are on, things get messed up: files there but not in DEVONthink, doubling up of folders (groups), etc. When Google Drive sync (with their Google Drive app) is on and DEVONthink sync is not on for the database holding the synced folders … all works as expected. The Google Drive app is fully in charge of distributing the files.

Perhaps whatever DEVONthink now does to detect cloud folders (Google Drive in particular) is now erroneous due to some change? Just speculating.

And this is a folder or an alias on all Macs? E.g. on Monterey the actual location is now a different one and the former location is just an alias.

Real folders, no aliases involved, on iMac and MacBook, both running macOS 12.3. I don’t know the marketing name of what I have for macOS.

Aliases and symbolic links do my head in (though I have been known to use them for special purposes), and they don’t work for some of the cloud service providers anyway.

Last time I tried Google Drive an alias was actually used on Monterey (12.x). Which Google Drive version do you use, the latest one?

No aliases involved in my past that I know of, nor do I recall that being an issue, but I might not have paid attention. I’ve been changing the Google Drive folder on and off all afternoon on both iMac and MacBook, and all is “fine and dandy” in DEVONthink on both machines without DEVONthink sync turned on but relying on Google Drive sync.

“ls” reports:

drwx------@  15 rmschne  staff   480 28 Mar 12:55 Google Drive

I’m going to bet that is an alias; it seems unlikely that GD can do what iCloud, OneDrive and Dropbox don’t. Take a look in ~/Library/CloudStorage and see what you find there (I think that is where OD and DB have to keep their files nowadays). Alternatively, use Get Info on the Google Drive folder rather than a subfolder.
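For anyone who prefers checking from a script rather than Finder, here is a minimal sketch of both checks. The helper names `describe_location` and `list_cloud_storage` are invented for this example. One caveat: Python can only detect symbolic links this way; a Finder alias is a different kind of object and will not show up as a symlink, so Get Info remains the check for that case.

```python
import os
from pathlib import Path


def describe_location(path):
    """Report whether a path exists, whether it is a symlink, and where it resolves."""
    p = Path(path).expanduser()
    return {
        "path": str(p),
        "exists": p.exists(),
        "is_symlink": p.is_symlink(),      # True for symbolic links only, not Finder aliases
        "resolved": os.path.realpath(p),   # final location after following symlinks
    }


def list_cloud_storage(home=None):
    """List entry names in <home>/Library/CloudStorage, or [] if the folder is absent."""
    base = Path(home).expanduser() if home else Path.home()
    cloud = base / "Library" / "CloudStorage"
    if not cloud.is_dir():
        return []
    return sorted(child.name for child in cloud.iterdir())
```

Running `describe_location("~/Google Drive")` on both Macs and comparing the `resolved` values would show whether the two machines really index the same path.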

Or is the fact that Google Drive isn’t doing the same as everyone else the problem? I’ve just tried to get more info; the only thing I found was that in February GD wasn’t in ~/Library/CloudStorage, but no info on why the changes to macOS might not have affected GD, or perhaps have.