Preparing to Import

levelbest · April 5, 2015, 2:40pm

I am thinking of testing importing vs. indexing. This makes me nervous, because importing makes my drive too big to back up safely until I erase the original files I just imported. I still have a cloned backup that I made just before beginning this, but once the import succeeds I cannot re-clone to that drive without paring down (removing) the originals.

I am planning to 1) import my main working folder from my Documents folder, 2) delete the imported files from my Mac, 3) work on the files with Classify, 4) export them back out to my Mac, and 5) re-index them.

I may try this on just a subfolder first, to experiment until I feel more confident that I know what I am doing. If it all works as described, I may feel confident enough to skip the last step of re-indexing. But for now I need to be completely sure this is a safe process before I start deleting all my old files.

I have to ask: is there any reason to be concerned that importing something as large as my Documents folder (258.06 GB on disk), which contains photos and images as well as other documents (some pictures go with projects), and then exporting it again will cause any change in the files? ANY CHANGE? I just need to be certain before I go further.
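One way to answer "ANY CHANGE?" with evidence rather than trust is to fingerprint the files before importing and check the exported copies afterwards. Below is a minimal sketch, assuming Python 3 on the Mac; the helper names (`build_manifest`, `compare_manifests`) are hypothetical and not part of DEVONthink. It compares SHA-256 digests of file contents only, so it does not detect changes to metadata such as timestamps, Finder tags or extended attributes. The path-based comparison also assumes the export recreates the same folder and file names; the `lost` result checks content regardless of names.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name == ".DS_Store":  # Finder housekeeping file, not user data
                continue
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):  # skips broken symlinks and the like
                manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def compare_manifests(before, after):
    """Return (missing, added, changed, lost) as sorted lists.

    missing/added/changed compare by relative path; lost lists digests
    from 'before' whose content appears nowhere in 'after', whatever
    the file is now called.
    """
    missing = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    lost = sorted(set(before.values()) - set(after.values()))
    return missing, added, changed, lost


# Usage sketch (hypothetical paths): save the first manifest, e.g. with
# json.dump, BEFORE deleting anything, then check the exported folder.
#   before = build_manifest("/Users/me/Documents/Projects")
#   after = build_manifest("/Users/me/Exported/Projects")
#   missing, added, changed, lost = compare_manifests(before, after)
```

If `lost` is empty, every original file's bytes survived the round trip somewhere in the exported folder, even if DEVONthink named or arranged it differently.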

Thanks.

Bill_DeVille · April 5, 2015, 5:10pm

Why would you want to capture your entire Documents folder to a DEVONthink database?

My own Documents folder contains a lot of folders and files that wouldn’t add value to a DEVONthink database, such as the Microsoft User Data folder, my collection of DEVONthink Pro Office database files, and many files that really aren’t of any significant lasting interest.

I don’t treat DEVONthink like a Finder replacement. To do so would, for my purposes, dilute the effectiveness and efficiency of my DEVONthink databases. Between Spotlight searches and EasyFind searches, I can find what I’m looking for in the Finder, especially items that I might wish to capture to DEVONthink.

I have two kinds of DEVONthink databases: collections of references and notes about topics that suit an interest or need, and collections of data such as financial records.

My most important topical database reflects my professional interests in environmental science, technology, policy issues, case histories and laws and regulations. It holds some 30,000 documents and a total word count comparable to the Encyclopedia Britannica. I’ve been building it for more than 12 years, adding new references and notes, pruning obsolete or less useful items. Because it’s topical, the AI assistants such as Classify, See Also and See Related Text are powerful tools for organizing and exploring the information content.

Long ago, I improved its effectiveness and efficiency by spinning off into a second environmental database the content that focuses on methodologies: environmental sampling techniques, sample analysis methodologies, evaluation of environmental data (statistical methods and quality assurance procedures), risk analysis and cost/benefit methodologies.

For example, when I’m researching the human health effects of mercury in fish, I want to see case histories, toxicology, and actual or proposed safety and regulatory standards. I’m not immediately interested in the collection, preparation and analysis of fish samples, or in risk assessment techniques, which I would find distracting. By the same token, when I’m evaluating potential analytical methodologies for mercury compounds, I don’t want to be distracted by search or See Also lists about case histories or toxicity.

By splitting my collection of environmentally related references and notes into two databases, I’ve made both collections more useful than they would be in a single database. Both databases are topical in nature, although each covers a range of disciplines.

By contrast, I have other databases, such as my financial database, which collects and organizes banking, investment and project cost information (highly useful at tax time). There the AI assistants such as Classify, See Also and See Related Text are of little use, because the text content of these items really isn’t topically coherent, but they really aren’t needed for my purposes. A great deal of this content comes from scans sent to DEVONthink Pro Office for OCR. Those scans first go to a special database, Incoming Scans, that holds 32 smart groups. Each of those smart groups is based on a search of document content, which separates incoming scans by text content, so that documents from each bank, investment firm, bills by vendor and so on are automatically organized and can be moved into the appropriate groups in my financial database. Neat trick! Most of the filing chores that result from scanning a stack of documents to DEVONthink are simplified. After I empty those 32 smart groups, there will likely be only a few remaining items in the Incoming Scans database that require individual attention.

I treat my databases like information Lego blocks, opening and closing them as needed. Because the Database Properties option to supply indexing information to Spotlight is checked, I can run Spotlight searches across all my databases, whether they are currently open or not.

I rarely need to share data with other applications on my Mac, so I favor Import-captured rather than Index-captured databases. I like being able to organize database content without worrying about keeping it synchronized with Finder content. That’s just a matter of my workflows and preferences. Others may need or prefer Index-captures, and that works for them.

korm · April 5, 2015, 7:20pm

If you are cloning your boot drive and its content to a smaller drive, you might want to reconsider that strategy. I always clone boot drives to external drives with at least the same capacity as the boot drive, and I use software built for cloning, such as Carbon Copy Cloner.

levelbest · April 5, 2015, 8:55pm

I agree. I have had problems upgrading to Yosemite, so I am still on Mavericks. I will get there; it’s on my list of projects. But for now, I have to make sure my backup fits on a 1 TB drive. And yes, I use CCC too. Highly recommended app.

levelbest · April 5, 2015, 9:12pm

In my case, I have many years of poorly backed-up files. I have whole Documents folders inside other folders. I have backups of my old MDD G4 hard drives and of my old 12" PowerBook drive. It’s all a very large and lovely MESS!

I have used duplicate checkers already, quite a bit. I have also been learning new strategies for how I think and where I should be putting things so I can find them again. One of the benefits that I discovered in DTPO right away was how I could look at a file, use the magic hat, and see a different file that was also relevant.

For example, I worked as a FEMA inspector for a few major disasters. I did nearly 500 emergency inspections after Katrina alone. As a writer, naturally I did my best to record the stories that I heard and to describe how I felt. But that process took some years and involved writing in a coffee shop in Taos, writing in a cabin on Hood Canal, writing in Virginia on my old MDD G4, and on and on and on.

When I found some writing I had done, just a few pages, about the Superdome and some stories I was told, I clicked down the magic hat list and found something I had written in a cafe in Taos about another aspect of the disaster. The files were not named the same at all but DTPO made a good suggestion to look at them both.

I am culling the herd, as it were, finding the interesting ideas again and killing off the duplicates. I am turning what is research into proper research, and what is salable into real marketable ideas.

I realize that my Documents folder also has Microsoft User Data and other non-essential files, and no, I am not going to include them in my index or import. I am focused primarily on my 4 life goals and their sub-goals, and my 8 support areas.

All of this is about learning to build and to maintain a concrete folder structure so that I can decide where things belong.