Import vs Index

ajoyce · April 21, 2009, 8:10pm

OK. I have a few questions:
If DT is a document management program and the encouragement is given to not index your files but to import them instead, that would mean that I would have two copies of my files. Obviously that would fill up my hard drive pretty good. So, I could delete my original files and just keep the ones in the DT db. Now, assuming I did that and imported all my files:

I wouldn’t be able to see those individual files anywhere because they’d be buried somewhere in my library. Correct?
I would only be able to open those files by finding them and clicking on them within DT. Correct?
If I’m working within a program and want to open a file, I’d have to go to DT and search for the file and then click it to open it. Correct?
If I’m working with a file I opened through DT into an external program and decided to use the “Save As” function, then if I save that new file into the same directory as the file I was saving from, that new file won’t show up in DT and I’d have to find it and add it manually, or do some kind of sync function. Correct?
If I open a program by itself and then create a new file, then that file would have to be saved to the desktop and “put away” later. Correct?
If all my files are in DT and DT decides to go out of business and something happens to my DT program, I’m out of luck, unless I’ve made a backup to an external disk. My Time Machine backup wouldn’t be much help. Correct?

However, if I index my files, against the general recommendation of DT, then my regular backups through Time Machine would be sufficient to restore if necessary. And, I would be able to save new files as I’ve always done. And, I can create new files and store them “on the fly.” Correct?

If the answer to that last question is indeed yes, then DT becomes simply a “find the file quickly” program, or a replacement for Spotlight that has a little more muscle. Correct?

Please help me understand better why this program will be of sufficient value to me to spend the money to buy it. I can see where the use of DT is a totally new paradigm, but I’m unsure as to whether it will be the help to me that is suggested in the few descriptive phrases used in marketing the program.

Bill_DeVille · April 21, 2009, 10:29pm

The decision as to whether to Index or Import content is up to the user. There’s nothing ‘wrong’ with either approach, depending on your needs and workflows. Yes, I prefer Import-captures, but that doesn’t mean that my approach would be best for you. If you wish, you can Index some files and Import others.
Re your questions about Imported files in the database: All the files are stored in their native file format inside the ‘Files.noindex’ folder, which is inside the database. Your files are not held hostage in the event that you quit using DEVONthink. The files can either be exported to the Finder, or the database can be opened (Show Package Contents, for DT Pro/Office) and the files inside ‘Files.noindex’ can be copied or moved to another location.

Suppose you no longer had a working copy of your DEVONthink application and database, but had a Time Machine backup of it. Could you recover all of your files from that backup? Of course. Just open up the database and copy or move your files.
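For anyone who would rather script that recovery than drag files around in the Finder, here is a minimal Python sketch. It assumes only what is stated above: that the database is a package (a folder) containing a ‘Files.noindex’ folder with the documents in their native formats. The function name and the example paths are hypothetical, and the internal subfolder layout of ‘Files.noindex’ is not described in this thread, so the sketch simply mirrors whatever it finds.

```python
import shutil
from pathlib import Path


def copy_out_files(database_package: Path, destination: Path) -> list[Path]:
    """Copy every regular file under <package>/Files.noindex into destination.

    The folder name 'Files.noindex' comes from the thread; the subfolder
    layout inside it is unknown here, so it is reproduced as-is.
    """
    source_root = database_package / "Files.noindex"
    if not source_root.is_dir():
        raise FileNotFoundError(f"No Files.noindex folder in {database_package}")

    copied = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue  # skip directories; they are recreated on demand below
        target = destination / src.relative_to(source_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also preserves modification times
        copied.append(target)
    return copied
```

A call might look like copy_out_files(Path("Research.dtBase2"), Path("Recovered")), where both paths are placeholders. It copies rather than moves, so the database or backup is left untouched.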

Yes – although some of your assumptions are not correct, i.e., there’s nothing wrong with the Index-capture approach, and you could fully recover everything from an Import-captured Time Machine backup.

No, DEVONthink does a great deal more than the Finder and Spotlight.

Your database integrates all the text content of your documents into a unified body of information, with artificial intelligence features that can recognize contextual relationships of the text in each document to all of the others, and the relationships of the documents within each group. For example, See Also can suggest other documents that are similar to the one you are viewing. That can be a very powerful research tool, which has no counterpart in the Finder and Spotlight. If you download new content from the Web, the Classify feature can suggest a likely group into which it can be placed. (One has, of course, greater freedom to create or modify group organization in the case of Import-captured material than for Index-captured material.)

In rich text documents within your database, you can establish static or Wiki links to other documents. I take lots of notes inside the database and it’s easy to associate them with other documents. I do most of my drafting inside the database, with immediate access to all my notes and references. If I wish to see a list of the documents that contain a certain word, I can Option-click on it to see such a list. If I select a text string and press Command-/ (Lookup) a Search window opens so that I can search for other occurrences of those words or the exact string within my database, or all open databases. If you have a large batch of unfiled documents, the Auto Group artificial intelligence feature will group documents with contextually related content (it’s usually pretty smart).

DEVONthink provides Services, Bookmarklets and scripts that can be associated with Safari or another Web browser to provide robust capabilities for capturing Web content into your database, and a built-in browser for tabbed-browsing display of HTML, WebArchive or bookmarked Web pages.

Searches: DEVONthink has more powerful search capabilities than does Spotlight, with the ability to search all properties/content of documents or to limit searches to Name, Comments, Label, State, etc. Searches are very fast. Search results may be saved as a smart group, which can be further edited to provide extremely powerful filtering of your information. The Search window provides features such as Spelling, to look for variants of a search term, and Context, to look for terms related to the search results.

In other words, your DEVONthink database provides an information management, research and writing environment that has no parallel in the Finder and Spotlight.

I won’t go into details of many other available features, such as the ability to create a rich text ‘Index’ document, link it to ‘asset’ groups/files, and export as a Web site. Or the large AppleScript dictionary to automate many procedures, including a number of furnished scripts. In DT Pro Office, the ability to ‘broadcast’ one or more open databases to multiple remote users accessing the database(s) via a Web browser, with the ability to browse, search, view or download documents and to upload notes and files. Or the integration of scanning and OCR of paper copy into a database.

So, no, DEVONthink isn’t just a somewhat enhanced Finder and Spotlight. It’s much more than that. I call it the best research assistant I’ve ever had.

chatoyer · April 22, 2009, 5:03am

I think Bill covered that quite well, but just to chime in: my own workflow is a combination of indexing and importing, and I will admit that it took me a few months to carefully think about what to import and what to index. I got there in the end. I have quite a few Keynote files imported into DTPO. While I cannot edit them directly within DTPO (but I can view them), a double click launches Keynote. When I’m done editing, a ‘save’ actually saves them to the database. Very efficient.

I constantly amaze my colleagues with the speed of being able to find something, and the built in AI has, on more than one occasion, found something in my database that I had long forgotten about but which was highly relevant to the document I was currently viewing.

ajoyce · April 22, 2009, 12:31pm

I appreciate the responses. I’ve gone through the video tutorial and I’ve read through the online tutorials as well as the user manual. I’ve been trying to get my head wrapped around this paradigm and, frankly, haven’t had much success. When I first saw the program and read the description I thought, “Hmmm, this sounds interesting, I’ll give it a shot. It may be just what I need to organize and keep track of a burgeoning hard disk.”

So, I grabbed hold of DT and used it for a few days before grabbing DTP, because I felt like if I was going to go forward with this program I wanted to have the more optioned program. But in the days since that acquisition and use of the program, I just can’t understand how this is supposed to help me. I’ve spent literally hours trying to fit the operation into my routine (and maybe that’s the problem… maybe I need to dump my routine because DTP is so dramatically different). Perhaps I’m trying to put all of my thousands of files into DTP when I really should only put a segment of my files (for example: leave out my graphics, music, and other such files).

In my first post on this thread I made several statements and ended them with a question: “correct?” Those weren’t answered directly and I guess the answers must be, “Yes. That’s right.” to those questions. I appreciated Bill’s explanation but I’m just not there yet… and I don’t know if I can get there. I’ve found the tutorials “wanting” for me to sufficiently grasp the underlying principles of operation. I’ve been working on a Mac since 1986 and don’t consider myself ignorant of its operation. I’ve gone through the original Mac to a MacBookPro and use the machine daily for my work as well as my personal enjoyment. Perhaps I’m stuck in my preconceived notions of computer operability.

But, all that said, I thank you for your help, though I may be beyond help at this point… I’m so confused by it all and I’m not sure a forum can really help. I can’t help but believe that there are probably others like me who must feel some of this in regard to DT. Yet, I don’t really have a solution other than perhaps a better video that would take a person used to the old paradigm into this new paradigm. I think the video is too focused upon the basics of using a program (such as file… new… save… copy… paste… etc.) rather than “you used to do it this way; consider a new approach.” But can such a video even be made? Perhaps not.

Thanks again.

AsafKeller · April 22, 2009, 12:42pm

Pray tell how to perform this magic!

Bill_DeVille · April 22, 2009, 5:25pm

If you set up a rich text document with links to other documents and then select that set and choose File > Export > as Website, that rich text document has been converted to an HTML file with navigational links to the related files.

I created a little DT Pro 2 database to illustrate how I associate rich text notes to PDF and other documents. Because it contains a TOC (table of contents) rich text document with navigational links it can be converted into a functional little Web site.

If you wish to play with this feature, download the file named Example Database - Associating Notes.dtBase2 08-12-18.zip at homepage.mac.com/WebObjects/File … US&lang=en

Open the database in DT Pro 2. Switch to the Split View and Select All the contents. Export as Website to a new Finder folder.

In the Finder, double-click on the TOC HTML file to open it in your Web browser. Click on a link. Away you go!

chatoyer · April 22, 2009, 5:40pm

Hi ajoyce

Let me see if I can help further. Others can chime in if I’ve made any errors. Your original post:

Yes

Technically, you would still be able to open them if you navigate to where they are stored as Bill explained, but I would probably recommend DT to act as the ‘front-end’ to finding and opening those documents that you have imported or indexed. In many ways, think of it as a slight replacement to the Finder, perhaps? At least in some instances when working with files you have directly imported and then wish to modify.

Yes and no. If you have indexed that file, that means you haven’t actually imported the file into DT, but you have rather asked DT to index its contents and leave it where it is on your hard drive. I’ve never tried to modify an indexed file by accessing it through DT so I’m not sure of the outcome. If you have imported the file into DT, then yes you would have to use DT to locate it and then modify it. Depending on how you have your database structured, finding it can be easy (as in my case).

Pretty sure I covered that in my first post, as did Bill, but let me know if something isn’t clear.

Do you mean ‘put away’ to DT? If so, then yes, but there are some useful shortcuts that I use and can explain if you wish. But, yes, I do not believe you can save directly to DTPO. At least not in the current version.

Bill has addressed this - not correct. DT stores files ‘natively’, so a pdf stored in DT is a pdf in the database and accessible as a pdf.

That I’m not sure of. I use SuperDuper to make an exact clone of my hard drive so I don’t have to worry about whether Time Machine is sufficient.

I don’t think it is, but it depends on what you use it for I suppose. At a bare minimum, I think the answer to that might be yes, but that’s without really diving into the application and making it work for you. In all honesty, the existence of DTPO formed the bulk of my reason to switch (back) to Mac in late 2006. I’m an academic and work with thousands of documents, ideas and timelines. I’ve found that my workflow in DTPO (which doesn’t make use of all of its features, that’s for sure) has improved my productivity immensely.

Hope this helps.

AsafKeller · April 22, 2009, 7:07pm

Very helpful; thanks, Bill. What I was hoping for was a procedure to easily create the “index” RTF file. I understand how to create the links, one at a time. I was hoping for a way to select multiple files and create links for them (in the RTF) all at once.

Johannes · April 22, 2009, 8:42pm

Easy:
Create a new RTF file and open it in its own window. Go back to some list view, select all the files you would like to link, and Cmd-Opt-drag them into the RTF window. Voilà: a list of links (only possible issue: it is a flat list, no hierarchy).

Johannes

Bill_DeVille · April 22, 2009, 10:01pm

Thanks, Johannes. Especially in the Split view, one can select multiple files, even from different groups, and Option-Command-Drag them onto the rich text file that’s to serve as the index to the site.

By creating a ‘second level’ (or deeper) set of index RTFs that are linked from the primary index, then linking those to files, one can set up a structured or branching site (more navigational links would be indicated here, and are relatively easy to set up).

This isn’t a full-featured Web site creator, but it can be useful for rapidly distributing information via HTML, e.g. in an academic setting for course information, readings, etc.
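To make the idea of an ‘index’ page concrete outside the application, here is a small Python sketch that builds a flat HTML page of links, roughly the kind of table of contents described above. This is only an illustration and not how DEVONthink’s Export as Website works internally; the function name build_index and the sample labels and paths are hypothetical.

```python
from html import escape
from urllib.parse import quote


def build_index(title: str, links: list[tuple[str, str]]) -> str:
    """Return a minimal HTML page listing (label, relative path) links."""
    items = "\n".join(
        # quote() percent-encodes spaces in paths; escape() protects the HTML
        f'    <li><a href="{escape(quote(href), quote=True)}">{escape(label)}</a></li>'
        for label, href in links
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html>\n<head><title>{escape(title)}</title></head>\n<body>\n"
        f"  <h1>{escape(title)}</h1>\n  <ul>\n{items}\n  </ul>\n</body>\n</html>\n"
    )
```

A second-level index is then just another page produced the same way and linked from the first, which mirrors the branching structure of linked index RTFs.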

Of course, having set up such a set of links in a database, it also works within the database, as well as through browser access by multiple users via Web sharing in the Server mode of DT Pro Office 2.

I make a great deal of use of such ‘index’ RTFs to add structure and associations to my notes, drafts and projects.

ajoyce · April 22, 2009, 11:00pm

Well, this thread is definitely going in a direction that hasn’t been much help to me so I guess I’ll move along.

Thank you chatoyer for your comments. I do appreciate them.

rpor · April 22, 2009, 11:27pm

[quote=“Bill_DeVille”]
Thanks, Johannes. Especially in the Split view, one can select multiple files, even from different groups, and Option-Command-Drag them onto the rich text file that’s to serve as the index to the site.
[/quote]

Nice tip! When a Group is referenced using this technique it is displayed as icons or icons with details. Is there any way, short of invoking the “Reveal” command, of displaying it in the standard 3 pane format?

Bill_DeVille · April 23, 2009, 5:51am

Perhaps I should have picked up on your ‘paradigm’ comment earlier.

As a matter of fact, I don’t use DT Pro Office to “organize and keep track of a burgeoning hard disk.” By no means do I put all my hard drive contents into my DTPO2 databases. Nor do I put everything into a single DTPO2 database; I’ve got a number of topically oriented databases that reflect various interests and needs.

My main database reflects my professional interests in environmental science, policy issues and legislation and regulations. I’ve been lovingly building it for years, starting back in 2002 with the original version of DEVONthink. It has grown to hold more than 25,000 notes and references comprising more than 35,000,000 total words. Two or three years ago I spun off a portion of the original database that deals with technical and methodological topics such as environmental sampling, chemical and physical analytical and quantitative procedures, environmental data evaluation methodologies and related topics into a separate database, because this kind of content meets different research needs than those for my main database. Splitting these databases improves the focus and effectiveness of searches and See Also in both databases.

The paradigm that attracted me to the original version of DEVONthink was that I could unify my collection of documents of various filetypes so that I could not only search across them, but view and work with them within a single workspace. My collection of digital documents included mostly Word, HTML, text and PDF formats. DEVONthink let me search across those filetypes and, in the search results view, view documents without having to jump around among several different applications.

In other words, DEVONthink unified the information content of my collection of documents in an important way, bypassing the Tower of Babel effect that tended to segregate information by the filetype of the creator application.

There are artificial intelligence features built into the core of a DEVONthink database. The Concordance collects the words within the database. Think of it this way: the database not only ‘knows’ all the words contained within each document, but can compare the contextual relationships of the words in each document to the contextual relationships of the words used in every other document in the database. That’s the basis of the See Also routine. While viewing a document, when See Also is invoked a list of ‘similar’ documents is presented. I often find that valuable, as it helps me find relationships of terms and ideas that I may not have realized.
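For readers who want a feel for what ‘comparing the words in one document to the words in every other document’ can mean, here is a toy Python sketch using word counts and cosine similarity. To be clear, this is a generic textbook technique swapped in for illustration; the thread does not say how See Also is actually implemented, and DEVONthink’s method is certainly more sophisticated. The function names and sample texts are made up.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lowercase words in a text (a crude stand-in for a concordance)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def see_also(current: str, documents: dict[str, str], limit: int = 3) -> list[str]:
    """Rank the other documents by similarity to `current`, best first."""
    target = word_counts(documents[current])
    scored = [
        (cosine_similarity(target, word_counts(text)), name)
        for name, text in documents.items()
        if name != current
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Documents sharing no words at all are left out of the suggestions.
    return [name for score, name in scored[:limit] if score > 0]
```

Even this crude version shows why splitting unrelated material into separate databases can sharpen results: the fewer off-topic documents there are sharing incidental words, the cleaner the ranking.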

So DEVONthink not only unifies the information content of my documents and allows me to work with them within a unified environment, it provides a research assistant to help me find and analyze information.

DEVONthink also ‘watches’ the content of groups created by the user, looking for relationships of the documents contained in each group. As the database and its groups become well populated, the Classify artificial intelligence feature gets better and better as a filing assistant, suggesting appropriate locations for new content added to the database.

Taking still another step, the Auto Group feature can analyze the text content of a collection of unclassified documents and begin to group those that are related. Now, in DEVONthink 2 beta 4, the groups so created are tagged by the name of one of the documents they contain, so that these AI-created groups have names that are intelligible to mere humans. When I’ve got hundreds or thousands of items to organize, Auto Group gets me off to a running start.

The Search feature is very powerful and very fast, and uses AI to rank search results. In addition, the Spelling feature lists terms that are variants of the search terms, which can be useful for catching misspelled terms and other variants. The Context feature looks at the collection of documents listed in the results list and displays other terms that may be contextually related to the query terms.

So this is a second-level paradigm of DEVONthink, provision of assistance to the user to help find and explore information and to organize it.

These two paradigms take DEVONthink qualitatively beyond working in the Finder, different applications and Spotlight.

I mentioned that I have a number of databases that meet my interests and needs. For example, a financial database that contains banking records, investment records, bills, invoices, tax records and the like. DEVONthink helps me organize, find and analyze this kind of information also. Through organization, searches and smart groups I can quickly pull together the financial records I may need for any purpose, whether it’s the material and construction cost documentation of a renovation project, a spreadsheet or the scanned and OCRd PDF of a tax filing for a prior year.

I don’t bother to archive all my email. But I have a useful email database containing correspondence of special interest.

I have a database containing hundreds of listserve message collections dealing with the Apple Newton. I don’t use my Newton nowadays, but I keep that database up to date. Perhaps one of these days it might be useful to a Newton fanatic.

When I tackle a new project, I may do extensive Web searches using DEVONagent. In one case, I collected more than 10,000 HTML documents related to effects of hurricanes Katrina and Rita on the health care facilities and infrastructure in Louisiana. I dumped those into a new DT Pro database and used searches to organize them and filter the collection down to about 5,000 documents that provided useful information. It was used by health care professionals to assist in analysis of problems and identification of resources.

Jan_Butler · April 23, 2009, 10:25pm

Bill, how does one dump their Devon Agent findings into DTP2?

Bill_DeVille · April 24, 2009, 1:53am

From the Pages view of search results, Select All (or some) results. Choose Data > Add to Devonthink > as HTML.

pym9 · August 2, 2009, 6:59pm

After reading through this thread (and much else besides), I’m still confused.

Questions:

Basically, if I want to use DTPO for my academic research, I need to fully convert over to it and import all of my research notes (at least) into it. If I don’t, then either a) if importing, after creating or editing a document outside of DTPO in, say, Word, Scrivener or Papers, I need to save it to a file outside the database, manually add that document to the database, and then trash the extraneous outside file; or b) if indexing, I need to manually resynchronize my index every time I want to use DTPO.
Note-taking. I tried using DTP 1.x for a month in the archives a year and a half ago, but nearly lost it when I found that my first few days’ worth of notes were completely out-of-order (from the order I had recorded them) and were instead now in alphabetical order. So I had to go back through and institute a numbering scheme to keep them in the proper order. Instead of titling each new .rtf file “Colbert to Stewart, Aug 2 2009” I now gave them a code: CC-001. But then I ran into the problem that if I skipped a few documents and later decided to take notes on them, I had to insert them between two numbers. So after a few days of that, I switched to dates. Now every new rtf was titled: “CC-2009-09-02 Colbert to Stewart.” Of course, that still leaves the problem of items with no date, etc. None of this was necessary when I just took notes in Word, and put everything from a single box/folder into its own word file. Of course, then it was hard to find things in a 30+ page document, but at least the integrity of the order of the notes was preserved (and I can go back and add things later). I suppose I could just take long notes in DTPO, but then, from what I read, the AI features don’t do much good. So how do others take notes in a way that preserves the original structure of the information without creating very long documents (ie all the items with a folder within a box)?
Zotero. I tried taking notes in Zotero, and I liked it, but couldn’t find any way to export individual notes from Zotero to DTPO. The advantage is that all of the document info is correlated with each note, and you can have multiple notes on each item. I can’t figure out how to export the individual notes in a meaningful way.
External editors. If I’m writing in Word, and I want to use other Word documents that are in the DT database, I have to open DTPO to open the files? I can’t just use Word’s “Open” or “Open recent”? This seems hugely inconvenient. Why can’t DT operate like Spotlight or Copernic Desktop Search (which I used 2 years ago when I had Windows), and index things automatically, all the time, without interfering so much in file structure? The index feature doesn’t really do this, since it’s manual. Plus, I want an application to take notes in that provides better structure, and I’m just not sure that DTPO is it.

I really want to be able to use the AI features of DTPO in working on my articles and manuscript, but find all of this so confusing and frustrating that I’ll probably give up and just use something else. It really is NOT user-friendly. I’ve devoted probably 20+ hours to trying to figure all this out.

gnoli · August 2, 2009, 8:28pm

Dear pym9,
I think that you are trying to make DTPO do things it is not intended for. DTPO is a powerful, flexible, freeform database. That means that you need neither a fixed form nor a fixed structure (as in FileMaker, for example), and this is the principal characteristic of DTPO. If you need a particular fixed structure, I think that you could use apps like Scrivener, OmniOutliner or Tinderbox …

parezcoydigo · August 2, 2009, 8:56pm

pym9–

On #1, well essentially the answer to your question is a qualified yes. If you want to use the tools that DT puts at your disposal to organize, manipulate, and use your data it helps to have all your data together in the database. Or, in a series of tightly defined databases. Indexing does let you do this without importing, as long as the computer where your files are is the same one you use for your database, ie that they will always be together.

On #2, there is a very simple solution. The folder hierarchy in the left pane is indeed alpha ordered. But, in a two or three-pane view, individual note files can very easily be ordered by the date they were created either through using the view->sort->date created drop down menu options, or by adding the column date created along with the defaults of ‘Name,’ ‘Modified,’ ‘Kind,’ ‘Size,’ etc. Right click on any of those columns, check ‘Date Created’ and then select that column to sort your notes.

On #3b, if you index instead of importing files, you can put them wherever you want in the finder file hierarchy. Or, alternatively, you can use templates or whatever from within DT to open files from within the database in your editor of choice. When working with Scrivener, I prefer to import notes from DT into Scrivener for specific writing tasks and to write my articles and chapters there.

DT allows you to create as much or as little structure in your database as you prefer. I’ve used it to manage academic projects with more than 5000 items and millions of words. In the process, I recreated the structure of both the archives I work in, and of the structure of the book.

pym9 · August 3, 2009, 3:37pm

Thanks for your replies, you have certainly cleared up some of my concerns. But for #2, the problem is that I may very well skip a document or group of documents and later decide that they were significant and go back and take notes on them. In this case, the sort by date created would not work.

So maybe if I named each note: “XXXX-2009-09-02-10 Colbert to Stewart” whenever I skip documents that would take care of the problem? XXXX would be my abbreviation for the Archive/Collection, and I would only use the “10” at the end in case I’m skipping documents of the same date, so I could later go back and add 1-9?

What to do with undated documents, ie. if there was an undated mss. between XXXX-2009-09-02 and XXXX-2009-09-05? Should I give it a fake date for the computer and make it clear to myself that the item is undated, i.e., “XXXX-2009-09-03 ND Shopping List”? What should I do with a whole bunch of undated materials in a box? I suppose just ignore the problem of order, assuming it’s somewhat arbitrary anyway?
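The naming idea above can be checked with a short Python sketch. It assumes that names are sorted by plain character order (how DEVONthink itself sorts names is not stated in this thread), and it zero-pads the sequence number so that 05 sorts before 10. The helper note_name and its parameters are hypothetical; the ‘ND’ marker and the ‘XXXX’ collection code are taken from the examples in this post.

```python
from datetime import date
from typing import Optional


def note_name(collection: str, day: date, title: str,
              seq: Optional[int] = None, undated: bool = False) -> str:
    """Build a note title whose alphabetical order matches chronological order.

    `day` is the real date, or a placeholder date for undated items
    (pass undated=True so the title is visibly marked 'ND').
    `seq` is an optional within-day sequence number, zero-padded to two digits.
    """
    prefix = f"{collection}-{day.isoformat()}"  # ISO dates sort correctly as text
    if seq is not None:
        prefix += f"-{seq:02d}"
    if undated:
        title = f"ND {title}"
    return f"{prefix} {title}"


names = [
    note_name("XXXX", date(2009, 9, 5), "Later letter"),
    note_name("XXXX", date(2009, 9, 2), "Colbert to Stewart", seq=10),
    note_name("XXXX", date(2009, 9, 2), "Skipped item", seq=5),
    note_name("XXXX", date(2009, 9, 3), "Shopping List", undated=True),
]
for name in sorted(names):
    print(name)
```

Leaving gaps in the sequence numbers (10, 20, ...) keeps room for items that were skipped the first time through, and the placeholder date only fixes the position of an undated item while ‘ND’ keeps it honest.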

Of course, many archives are based on the box/folder structure, and items within a folder are subject to being shuffled around, so to a certain extent I guess I could just use the tried and true XXXX/box/folder structure and forget using dates. When reading correspondence, though, it’s always nicer to have it in order. Maybe I need to do “XXXX/box/folder/2009-09-02 Colbert to Stewart”? Seems like a lot to type, but I guess that might be the price I need to pay to use the flexibility of DT.

Certainly I need to adjust the structure of the database intelligently based on the vagaries of the structure of the individual archive, but I want to avoid the catastrophe of losing chronological order like I did on my first attempt to use DT.

Another random question: When taking notes on secondary sources or long documents, do you all break up your notes into shorter bits, i.e. chapter by chapter or something like that?

Johannes · August 3, 2009, 4:18pm

Maybe I missed the point, but you can set the sorting order to unsorted (only in split view). With that the documents will be ordered according to when the document came in – or wherever you drag them. That is the reason why I work exclusively in split view.

And you can change the date columns of a document by AppleScript (some examples are in the menu Scripts>Date).

Johannes