Database with indexed files / group not replicated to hard drive

Uniqueuser · January 22, 2010, 8:01pm

Hi, I just set up a database and added files via indexing.

I have some stuff in the DT inbox that I want to move around.

So I created a new group (which I think is like a folder) and moved the file into it. I expected these actions (creating the folder, moving the file) to be replicated on the hard drive, because the database uses an indexed approach.

But this doesn’t seem to be the case. Or am I missing something?

Bill_DeVille · January 23, 2010, 7:07pm

That assumption wasn’t correct. For Index-captured content, synchronization is one-way: from the Finder to the database, but not the reverse.

Uniqueuser · January 26, 2010, 7:07pm

Ok, understood. So anything I add directly to DT lives inside it from that point on.

Is there an export option to get it onto my hard drive? I’m not a fan of having my data in a tool that I can’t access without it. Outlook burned me too often with its PST file approach.

Bill_DeVille · January 26, 2010, 10:10pm

Certainly. DEVONthink 2 stores all your documents in their native file types in a folder inside the database package file, so they can always be recovered from there.

At any time, you can choose File > Export > Files & Folders, select all of the content of a database and export it back to the Finder.

DEVONthink provides two capture modes: Index, so that the files remain external to the database, and Import, so that the files are copied into and stored within the database.

Of the two modes, my preference is for Import, as I want my databases to be self-contained so that I can easily migrate them among my computers, and also because I’m free to reorganize the database without being constrained by organization in the Finder.

Uniqueuser · January 27, 2010, 8:12am

Will this export only Files & Folders, or other information objects like web links, web archives, etc. too?

That would be my preferred way as well, but I just have too much data. And how do you keep it in sync over several systems? By copying it? Or is there a built-in “keep these systems up to date” function?

And it would be a killer feature if this worked via something like a P2P network where I can select which parts to sync to whom.

Bill_DeVille · January 27, 2010, 5:20pm

Bookmarks (Web links) and WebArchives are files too, and are exported by the File > Export > Files & Folders command.

The storage space of the files is the same, whether they are Index-captured and remain external to the database, or are Import-captured, copied into the database. After importing files from the Finder, I either delete the Finder files or archive them to an external drive.

I work on a database on one computer at a time. The big advantage of a self-contained (Import-captured) database is that migrating it to a different computer involves copying only the database itself, whereas migrating an Index-captured database would involve copying not only the database package file but also all the externally linked files in such a way that their paths aren’t broken, which is inefficient, messy and has a high probability of losing information.

Comments about synchronization: From a quality assurance viewpoint, “synchronization” of databases on multiple computers remains an ideal that, unfortunately, still isn’t well supported by the existing technology and infrastructure of the Internet, and involves issues of data integrity that can bite the user if ignored. Here, I’m talking about approaches to synchronization of files, address books, calendars and the like using MobileMe, DropBox and other approaches to ‘cloud’ computing.

I use MobileMe to synchronize my Address Book, iCal calendars and some other data on my Macs and my iPhone. When Apple first started up MobileMe there were some fairly serious teething problems. MobileMe is working pretty well now, so that if I update an Address Book contact it almost always gets synchronized among my Macs and iPhone in a reasonable amount of time.

“Almost always” is OK for my Address Book. But that’s not good enough for me to trust the integrity of an important database entirely to a system over which I don’t have total control, if I know that it isn’t totally reliable. At a minimum, I’m going to keep frequent backups on local media.

Currently, we strongly recommend against using MobileMe/iDisk to synchronize DEVONthink databases, for still another reason. Apple’s approach works pretty well for files, but not for a complex database. The developers are working on a plugin to reduce that kind of problem, so look for this in a future release. Currently, you should expect damaged databases if you use MobileMe to synchronize them among multiple computers.

Some users report success using DropBox to synchronize DEVONthink Pro/Office databases on multiple computers. But problems will result if two cautions are ignored. Always properly close a database after accessing it on one computer, before accessing it on a second computer. And never assume that synchronization through the ‘cloud’ takes place instantaneously. Data movement is much slower than on your hard drive; if you are modifying a database while it is still being updated via DropBox, chaos can result.

Because I have a slow satellite broadband connection that is unreliable if there’s bad weather either locally or at the distant uplink station, ‘cloud’ synchronization isn’t feasible for me. I run pretty large databases. Often, the synchronization procedure would take hours to complete. If I had a fast and reliable broadband connection, I would make use of ‘cloud’ synchronization, but would still maintain local backups.

So, I do “synchronization” either by copying a database from one computer to another, or by running the database over a portable hard drive that can be connected to the computer I’m using.

Yes, there are little databases such as Evernote that depend heavily on ‘cloud’ storage. But Evernote is much less powerful than DEVONthink and cannot handle the volume of data that DEVONthink does. However, a DEVONthink app for the iPhone (and a certain new Apple product) is currently under development.

As for syncing selected parts to selected people over something like a P2P network: that can’t happen currently, as DEVONthink databases are single-user. However, DT Pro Office has a Web Server mode that allows one to ‘broadcast’ databases over a network.

Uniqueuser · January 31, 2010, 1:07pm

Thanks for the long answer.

DT being single-user, without a way to sync (or, more precisely, to replicate) to other systems, is most likely a show-stopper for me.

I use Jungledisk to keep several directories in sync between my MacPro and my MacBook Pro. I need both, as I’m one of those road warriors.

So using DT on the directories wouldn’t be a problem; those are the same on both machines. But if I now add stuff to DT on the MP or the MBP, it’s stuck on that system, so I will end up with two forks of the database.

Again, I think the concepts of DT are great. But to really become successful, it first needs to keep data where it is and just add the DT metadata to its own database. Things added to DT need to be reflected on the local filesystem. And the DT database needs a way to be replicated, or to be multi-user, to work on more than one system or in a team.

Bill_DeVille · January 31, 2010, 7:18pm

I completely agree with you that it would be great if the problems of divergences in the content of a database on multiple computers were eliminated or reduced. I’m often a road warrior, too. Usually, when I’m on the road I’m working in my research databases, which are pretty large: about 9 GB on disk, mostly text content.

The DEVONthink developers are working on ways to improve synchronization of databases on two computers via MobileMe (which currently should not be used, except for zipped backups) or DropBox. But I’m still plagued by the slowness of such synchronization for anything but relatively small databases. Moving data around on the Internet is by no means instantaneous. ‘Cloud’ computing involves uploading data from the source computer (usually at a fraction of the rated download speed of the broadband connection), then downloading it to the target computer(s).
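To put rough numbers on that upload-then-download path, here is a back-of-the-envelope sketch in Python. The link speeds (1 Mbit/s up, 8 Mbit/s down) are assumptions for illustration only, not measurements from this thread; the 9 GB figure is the research database size mentioned above.

```python
def transfer_hours(size_gb: float, mbit_per_s: float) -> float:
    """Hours to move size_gb gigabytes at mbit_per_s megabits per second.

    Uses decimal units (1 GB = 8000 megabits) and ignores protocol
    overhead, so real transfers will take longer than this estimate.
    """
    megabits = size_gb * 8000
    return megabits / mbit_per_s / 3600


# Hypothetical link speeds: uploads usually run at a fraction of the
# rated download speed of a broadband connection.
UPLOAD_MBIT = 1.0
DOWNLOAD_MBIT = 8.0
db_size_gb = 9.0  # research database size mentioned above

up = transfer_hours(db_size_gb, UPLOAD_MBIT)
down = transfer_hours(db_size_gb, DOWNLOAD_MBIT)
print(f"upload {up:.1f} h + download {down:.1f} h = {up + down:.1f} h")
```

With these assumed speeds a full 9 GB transfer takes about 22.5 hours, most of it spent on the upload leg.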

Let me describe one of my road adventures. I was comparing how two shipyards managed a particular waste material. I travelled to the location of a governmental agency that held the voluminous reporting records of one of them. These documents were public record but not accessible via the Internet. I scanned hundreds of pages into a new database, with OCR turned off at the time (OCR was later run overnight). While scanning was going on, I was entering notes into a database about interviews with agency staff and from inspection of other documents. In this case, all the information was at the public-record level.

I also had similar information already gathered about the other shipyard, but in that case also had access to information that was not public record.

OK. For that project I used my 9 GB of research databases, which held the content necessary for analysis of the new data, plus 3.1 GB of data documenting the two shipyards’ practices: about 12 GB of data in all. It took me 3 days to analyze the data and demonstrate differences in practices at the two facilities. Informal sharing of portions of those conclusions with two people achieved the purpose of that work, a settlement agreement involving quite a few million dollars.

Suppose I had created a workgroup for that project. Forget anything like Evernote, which doesn’t have the horsepower to let one work quickly with that volume of information. My most practical method of sharing would be to provide other workgroup members zipped archives of the databases, with some content redacted in one database for most of the workgroup members. Copying would be done by Ethernet or sneakernet for local members, or via MobileMe, JungleDisk, etc. for remote members. Synchronization in that case would NOT be desirable.

Re multi-user access: The DEVONthink applications are single-user for very good reasons. The logical problems of multiuser access to a database are not trivial. A multiuser database must protect database integrity in the face of potential conflicts among users, such as simultaneous attempts to access the same document and edit it in different ways. And I mentioned the fact that my data included some content that could not be disclosed to others, which means that levels of access to data must be managed. Some users should not be able to read certain content. Some users should be prohibited from the ability to edit, delete or reorganize content. A multi-user database is very much more complex than a single-user database — and generally MUCH more expensive for that reason. (The Server mode of DT Pro Office allows multiple remote users to browse, search and download content of databases, but does not let them make any changes to the viewed databases — although remote users can upload notes and files for consideration by the database administrator.)

Example: A single-user FileMaker Pro license is $300. A FileMaker Server 10 Advanced license is $3,000 — and users who access the multi-user database must be running a copy of FileMaker Pro. FileMaker Pro is a good database application for numerical data, but it doesn’t compare to DEVONthink Pro Office for document management and analysis.

DEVONtechnologies has long-range plans for a multi-user “enterprise” level of DEVONthink. It would allow multiple users access to a central database, so that “synchronization” is automatic. But it would not be intended to be a consumer-level product.

Re the comment that DEVONthink should merely retain metadata and place the files themselves in the Finder: Christian has noted the possibility of two-way synchronization of Indexed content in a future release. But in the case of synchronization of databases on multiple computers, I see a number of potential complications for that approach. Personally, I would avoid it.

nsgirl71 · February 10, 2010, 3:29pm

Hi, I have some questions which I think are related to this post, though my circumstances are a whole lot simpler.

First, I created several databases on my iMac for all of my filing. Some of my files are outdated but still need to be kept, and would only need to be accessed in unusual circumstances (e.g., an audit). I burned these onto a disk, which is in safekeeping, and indexed the files in my database, basically just so I know what I have on these disks. When I click on an item I get the expected message “file missing: drivename/diskname/filename”, but the file size still shows as however many MB. Is the file actually taking up as much space when it is not on my hard drive as it would if it were there? Or is this just giving me information about the file?

Second, I just bought a laptop and would like to do some work on it. I am only one person: my desktop will be shut down when my laptop is on, and no one else will be using my database. What is the best way to do this? I don’t need to keep them constantly syncing while I work, because I won’t be on both at the same time. Could I keep my database on a flash drive or something similar and work off of it on whichever computer I’m on, or would that be too slow, or corrupt the data? What would be the best method to deal with this? I would be using only the laptop for extended periods while away from home, but also using it during evenings at home when I might have been using the desktop during the day. So it would be a little tedious to copy back and forth every time I switch, but I would do it if it were the best or only option.

Finally, if I move the database onto the laptop and put in one of the disks that contain the outdated files, will it still know to look for them in the CD drive?

Uniqueuser · February 16, 2010, 10:25pm

Using a flash drive or USB stick might work, but it depends on how often DT writes to the database. Writing is a lot slower than reading.

I think you’ll need to try it out. I once bought a 64GB USB stick to store some virtual machines on. Reading speed is good, around 25MB/s, but writing is only about 5MB/s. Not usable…
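Using the read and write speeds quoted above, a quick Python sketch shows how lopsided a full copy to and from such a stick would be. The 2 GB database size is a hypothetical example, and the figures assume sustained sequential throughput.

```python
READ_MB_S = 25.0   # read speed quoted above for the USB stick
WRITE_MB_S = 5.0   # write speed quoted above for the same stick


def seconds_for(size_mb: float, mb_per_s: float) -> float:
    """Seconds to stream size_mb megabytes at a sustained mb_per_s rate."""
    return size_mb / mb_per_s


db_mb = 2000.0  # hypothetical 2 GB database (decimal units)
print(f"read:  {seconds_for(db_mb, READ_MB_S):.0f} s")
print(f"write: {seconds_for(db_mb, WRITE_MB_S):.0f} s")
```

That is 80 seconds to read the whole database but 400 seconds to write it. Sequential throughput is the best case; a database that makes many small writes while you work can fare worse.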

Nexus · March 3, 2010, 9:35pm

I also have some questions which I think are related to this post, though my circumstances are a whole lot simpler, too.

I have many DT databases into which I have indexed data from external hard drives. My primary goal is to delete duplicate movie files and move all the indexed movie files (the original files remain on the external drive) to another database where I collect all my movie files. This works very well. But somehow many indexed files I have moved from a database to my master database (where I collect all the movie files) now show 0 KB in size. Although I have the external disk with these movie files attached, so I can play them, they still show 0 KB in the database, and therefore I have trouble sorting these files by size.

So my question is:
Is it possible to sync these movie files that show 0 KB so they get their real size back in my primary database?

h.wenzel · March 14, 2010, 1:02pm

@nsgirl71, I have nearly the same scenario (iMac at home and Mac mini at work), and I managed it like this: after finishing work on my home computer, I copied the data to an external drive.
(By the way, to protect the data in case the drive gets lost, they are copied into an encrypted sparsebundle on this external drive.)
When I arrived at work, I copied the data to the Mac mini, so I had access to exactly the same database.

But be careful! For DT databases it’s essential to quit the app before copying. And the DT database should be placed in exactly the same (corresponding) folder on both computers.

As copying takes a lot of time, I soon switched to synchronizing. For synchronization I use Synchronize! X Plus, but I think there are several other apps that should work as well. And I split my former single big DT database into several small ones. As I don’t use each database every day, that’s much faster.

Of course, it takes some time and discipline (DT must be shut down, and don’t forget to synchronize before leaving!). But as a positive side effect, you have two backups of your data.

I use this way of synchronizing even for Mail, iCal, Safari bookmarks and others. With Mail and iCal it doesn’t always work smoothly. But for my DT databases, I haven’t had any problems except when I forgot to shut down the app or forgot to synchronize. Maybe an Automator action that forces the application to quit and then synchronizes at the end of the day could prevent this kind of human error.
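The routine described above (quit DEVONthink first, then copy the database to the corresponding folder on the other machine) could be scripted roughly as below. This is a minimal Python sketch, not a DEVONthink feature: `app_is_running` is a hypothetical check you would have to supply yourself, and the relative path and `Research.dtBase2` name are illustrative only.

```python
import shutil
from pathlib import Path
from typing import Callable


def copy_database(src_root: Path, dst_root: Path, rel_path: str,
                  app_is_running: Callable[[], bool]) -> Path:
    """Copy a database package to the same relative location under dst_root.

    rel_path is relative to each machine's root folder, so the database
    lands in the corresponding folder on both computers. app_is_running is
    a caller-supplied (hypothetical) check; copying is refused while it
    returns True, because an open database may be mid-write.
    """
    if app_is_running():
        raise RuntimeError("Quit DEVONthink before copying the database")
    src = src_root / rel_path
    dst = dst_root / rel_path
    if dst.exists():
        shutil.rmtree(dst)  # replace the older copy as a whole
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dst)  # a database package is a directory on disk
    return dst


# Example usage (hypothetical paths):
# copy_database(Path("/Volumes/Transfer"), Path.home(),
#               "Databases/Research.dtBase2", lambda: False)
```

The database is copied as a whole rather than merged, which matches the advice above to treat each DT database as one unit.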

korm · March 14, 2010, 1:16pm

+1

I have basically the same routine as h.wenzel, though I use ChronoSync.

I also have most of my client data in a folder structure external to the databases I use for client work. These are indexed. For federal security reasons I can’t import them. My approach is to have all my work in a folder at the root of the system drive on both the desktop and the laptop. The folders are mirror images, and ChronoSync keeps the whole hierarchy synchronized. (I use “dissect packages” and only sync changes – speeds up the process and has never failed.)

The MyWork hierarchy below the root is outlined below [use your own names]. It is MyWork that is synced between machines; it brings along the data and all the databases I need. When I’m finished with a project, I archive the database outside of MyWork.

MyWork
--Client 1
----Project 1
------Project 1 document hierarchy
----Project 2
----[etcetera]
--Client 2
--[etcetera]
--DEVONThink Databases
----Database for Client 1 Project 1 ← indexed to Project 1 docs
----Database for Client 1 Project 2 ← indexed to Project 2 docs

I also have work documents, schedules, issue trackers, etc., that are my work products and are not indexed but reside inside their respective databases.

Finally, I keep the contents of the Application Support folders for Scripts and for Templates.noindex synchronized so that I have the same DTPO utilities on both machines.

For full assurance, I TimeMachine the root every hour, and clone the whole root volume every morning to an external drive. Thus I get three snapshots of my work beside the master.
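The “dissect packages and only sync changes” idea above can be sketched in Python as a one-way mirror. This is not how ChronoSync works internally; it assumes that comparing file size and modification time is good enough to detect a change, and it treats package directories (such as database packages) like ordinary folders so only the changed files inside them are copied. It does not propagate deletions or resolve conflicts, so only one machine should change a database between syncs.

```python
import shutil
from pathlib import Path


def mirror_changes(src: Path, dst: Path) -> list[Path]:
    """One-way mirror of src into dst, copying only new or changed files.

    Walks into package directories like any other folder. A file counts as
    changed when its size or whole-second modification time differs from
    the copy in dst. Files deleted from src are NOT removed from dst here.
    Returns the relative paths that were copied.
    """
    copied = []
    for s in src.rglob("*"):
        if not s.is_file():
            continue
        d = dst / s.relative_to(src)
        st = s.stat()
        if d.exists():
            dt = d.stat()
            if dt.st_size == st.st_size and int(dt.st_mtime) == int(st.st_mtime):
                continue  # unchanged, skip it
        d.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(s, d)  # copy2 also preserves the modification time
        copied.append(s.relative_to(src))
    return copied
```

Running it a second time with nothing changed copies nothing, which is what keeps repeated syncs of a large hierarchy like MyWork fast.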

h.wenzel · March 14, 2010, 3:17pm

That is good to hear. Until now, I didn’t dare to synchronize indexed DT databases.

Just to be clear: does this apply to ‘normal’ files and folders in the Finder? As far as I’ve understood, a DT database should be copied or synchronized as a whole because of the internal links within the database (?)

Thanks for mentioning ChronoSync; I think I’ll give it a try. Synchronize! X Plus provides fewer features (e.g., no bootable backup), and they ask for a two-year renewal fee of $23.00.

korm · March 14, 2010, 6:09pm

Everything for my work (all the indexed folders containing the client documents, as well as the DTPO databases) lives in subfolders of the folder named “MyWork” in the example I posted above. The databases are in the “DEVONThink Databases” subfolder; the indexed data are in the various “Client” subfolders. So, yes, this applies to normal and indexed files. The whole hierarchy under MyWork is synced.

As for copying or synchronizing each DT database as a whole: yes, that’s what I do.

Nexus · April 1, 2010, 4:16pm

At last I have an explanation for this strange behaviour, which may be a bug for the moment.
When you duplicate indexed files to another database, it seems that for now DEVONthink must be able to reach the source files of these video clips for you to get the real size of the video files.

When I got the error described in my earlier post, many files could not be reached at the time, and I thought I could duplicate/move them to another database and keep the same properties. But in the latest DTP(O) 2.0.2 this seems impossible. You can still play the video files after duplicating/moving them to another database, but if the source files cannot be reached, the clips show 0 bytes in size and their sizes cannot be synchronized.

The solution for me was to start over with these files and make sure I could reach the source files before duplicating them to a tag in another database.

Developers, Is this a bug?

cgrunenberg · April 2, 2010, 7:48am

Yes, it’s a bug. V2.0.3 will fix this.

Nexus · April 2, 2010, 9:39am

Thanks for confirming that this is a bug, Christian!