Cleaning large document collections

This topic is listed in the Manual on page 22, but it’s not very clear how to actually do it. Importing the documents into DT allows me to manipulate the documents within the database, but the changes are not reflected in my original documents folder.

While that makes sense, I’m wondering how DT can then help me clean up that original documents folder. It doesn’t make much sense to just duplicate everything, and there are some formats that DT just doesn’t handle, so deleting the originals and entirely relying on DT doesn’t seem to be the way to go (but convince me if I’m wrong!).

If covered before, please just point me in the right direction! I’m a newbie to DT …

Steve

There is no standard procedure we can recommend here. DEVONthink does not display most file types directly, but it can still handle them: you keep them inside the database and open them in an external application. For files you also need outside of the database, you can index them instead, which means the file stays where it is and is referenced from inside the database. You do this via File > Index or by dragging with Command and Option held.

Is the idea then to import all your documents and discard the originals, opening files within the database with an outside application when necessary?

Say you have a folder on your disk entitled “Various bits and pieces”.
The question is now whether to import or to index.
Option 1 is to import. In this case all of the files come into the file structure set up and organized by DTP. At that point you can delete the original folder.
Option 2 is to index. I haven’t done this myself, but if you index the folder, the files remain where they are, and DTP knows what they are and can access them, search them and all the rest. If you index, you shouldn’t delete the folder.

That was a very amateur response, which I hope isn’t too misleading.

Declan

Not at all!

I do understand that - it’s the “delete the original files” part that is, frankly, a little scary. Can the originals later be exported out of the database in their original form (i.e., can a Microsoft Office .pptx file imported into the database later be exported and be indistinguishable from the original)? Or, should I import a file that DT can’t display/use properly, can I export it from the database (and, if I choose, then index it just to allow the database to track it)?

Files in DT databases are stored in the database package unchanged from their original form. In fact, they are their original form (.doc, .pptx, .pdf, and so on). To see for yourself, go to Finder, right-click a DT database file (they have the extension .dtBase2) and choose “Show Package Contents” from the contextual menu. When the package contents are displayed, browse the folder Files.noindex and its sub-folders. All of your files are in those folders. They are all originals.

The particular structure of the folders matters to DT, although it might be confusing to human eyes. Nonetheless, you can use File > Export > Files and Folders inside DT to get any, or all, of those files out of DT’s database whenever you wish. Use only Export to do this, because removing files manually from inside a database package will damage the database and possibly render it unusable by DT.
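If you would rather not click around inside the package in Finder, a small read-only script can summarize what is stored there. This is only a sketch, not anything DT provides: it assumes the package contains a Files.noindex folder as described above, the function name is made up for illustration, and it only reads the folder tree and never moves or deletes anything.

```python
from collections import Counter
from pathlib import Path


def count_stored_files(database_package):
    """Count files by extension under Files.noindex, without modifying anything.

    database_package is the path to a .dtBase2 package (hypothetical example
    below). The Files.noindex folder name is taken from the post above.
    """
    files_dir = Path(database_package) / "Files.noindex"
    if not files_dir.is_dir():
        raise FileNotFoundError(f"No Files.noindex folder in {database_package}")

    counts = Counter()
    for path in files_dir.rglob("*"):
        if path.is_file():
            # Group by lower-cased extension; files without one go under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


# Example call with a made-up database location:
# for ext, n in count_stored_files(Path.home() / "Databases" / "Example.dtBase2").most_common():
#     print(ext, n)
```

Comparing these counts against your original folder gives a rough idea of whether everything you imported actually landed in the database.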

Personally, I never delete files from their original folders after I’ve imported them to DT. That’s because the data belongs to my clients and I need to demonstrate that I have it readily accessible. My point is: don’t just do what anyone tells you. Before you delete anything, think about your circumstances and what happens if that file is gone, and then decide for yourself about deleting on a case-by-case basis.
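If you do decide to delete originals, one way to take some of the fear out of it is to check first that every original has a byte-identical copy somewhere else, for example in a folder you exported from DT. The sketch below is an assumption-laden illustration rather than a DT feature: the function names and folder paths are hypothetical, it matches files by SHA-256 of their contents (so renamed copies still count), and it ignores Finder metadata such as tags or extended attributes.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def originals_without_copy(originals_dir, copies_dir):
    """List files under originals_dir with no byte-identical file under copies_dir."""
    # Hash every file in the copies folder once, regardless of name or location.
    copy_hashes = {
        sha256_of(p) for p in Path(copies_dir).rglob("*") if p.is_file()
    }
    # Any original whose content hash is absent has no identical copy.
    return sorted(
        p
        for p in Path(originals_dir).rglob("*")
        if p.is_file() and sha256_of(p) not in copy_hashes
    )


# Example call with made-up folder names:
# for path in originals_without_copy("Various bits and pieces", "DT export"):
#     print("No identical copy found for:", path)
```

An empty result means every original file has an identical twin in the other folder. Anything that is listed deserves a closer look before you delete it, though hidden files such as .DS_Store will usually show up there too and can be ignored.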