Could I ruin the AI?

I recently purchased DEVONthink Pro and am happily integrating it into my workflow. My goal is to form a research library from collections of interesting and relevant clippings that sprout up over the next several years. The thing is, I don’t want to invest time into carefully forming a research library if I’m not using the program effectively. I want to set the system up properly from the start; I don’t want to ruin the potential power of DEVONthink’s AI features in the long run. These are some issues that have been bugging me in this regard:

1- Pre-filtering: Should I invest time into “helping” DEVONthink by making sure that none of my files contains extraneous information? For instance, instead of pulling a highly relevant quote from a website, I’d rather archive the whole page. Is this a bad idea? Will this taint the relevance of search results and the usefulness of the “See also…” feature (since irrelevant content inevitably appears in web archives)?

Another scenario: If I stick a monster PDF file in the database, am I messing with DEVONthink’s AI features? Wouldn’t it be “unfair” to mix an enormous block of text with pre-filtered highly relevant quotes?

2- Databases: How many databases should I have? My research library is in the same database as project support materials for my clients (but in different groups). Should these groups be in separate databases since they are completely unrelated? Could keeping too much unrelated information in one database spoil the potential usefulness of DEVONthink’s AI features?

3- Groups: Does it really matter what group a certain file is in? Say I have a group entitled “nytimes.com” where I store clippings from news articles that I find interesting. One may be about software and another may be about healthcare. I don’t believe this would work well with the program’s “Classify” feature. Should I find another way to organize clippings that doesn’t involve grouping them with their source?

4- Naming files: I’m trying to implement a method for storing excerpts that is described at http://www.stevenberlinjohnson.com/movabletype/archives/000230.html . How should I name files – can I rely on DEVONthink’s auto-name feature (for clippings)? If I am capturing an excerpt from a book, where should I store the page number – in the file itself or as meta-information?

I’d love to hear what power DEVONthink users think of these issues. Thanks for your help!

Hi, Matt. Good questions.

I usually capture selected text/images as rich text, precisely to avoid including extraneous material. I’ve noticed that many journals and other sources either make this easy to do (Science is a good example, as the full content of an article can easily be selected) or provide a printer-friendly version (the New York Times is a good example).

I know that Johnson recommends a “sweet spot” of perhaps 500 words for notes. But I’ve got many long documents, including PDFs that run 500 pages or more, and See Also works well for me.

I recommend topical databases for two reasons. [1] I’ve got so many files in my various databases that if they were combined into a single database it would be slow, especially for AI work. On my MacBook Pro with 2 GB RAM I try to keep my main database between 20 and 25 million total words for good performance. I’ve got more than 21,000 documents in it at the moment. There are several hundred zero-word bookmark links, and at the other extreme some book-length PDF files. On average, word count is roughly 1,000 words per document.

I use that main database as the temporary container for new content that I’ll periodically export to one of my other databases, as that mode lets me collect stuff without having to switch databases. I’ve always got some “unrelated” material in my database, and that doesn’t usually hurt See Also’s usefulness.

But I’m working on a very large and very detailed database that holds reference manuals and protocols for chemical analytical techniques, environmental sampling methodologies, statistical and other considerations for evaluating environmental data, and related quality assurance guidance documents. This database will eventually be as large as, or perhaps even larger than, my main database, which deals with environmental science, technology, policy and legal issues. When I’m researching e.g. health effects of mercury in fish in my main database, I don’t want to be overwhelmed with hundreds of documents on related analytical procedures (including sample preparation) for detecting mercury in fish samples. So those two databases serve different purposes, and work better for me separately rather than combined.

I like to write inside the database that holds my reference collection, as the references are at my fingertips. I’ll create a new group for a writing project, which will contain subgroups of duplicates or excerpts of the most important reference materials. Why duplicates rather than replicants? So that I’m free to mark up, highlight and otherwise “vandalize” my working copies without messing up the original reference documents. When finished with the project I’ll probably export that group to a new database for archival purposes, but keep the final, polished output (currently I use Papyrus 12 hybrid PDF files for final output) in the database.

I prefer topical grouping rather than organization by source. Of course, DT Pro will let you replicate documents, so you can “file” them both ways.

When storing excerpts from a book I usually make the name of the document indicate the book title, author and page number(s). And I’ll include another document that holds a standard citation for the book, so that all the segments and any notes about them can be quickly pulled together by a name search. That way, I’ve got everything I need to do footnotes or endnotes in the final version of the project.
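The naming habit described above can be sketched in a few lines of Python. This is only an illustration: the name format, the helper names (`excerpt_name`, `citation_name`, `name_search`) and the sample titles are all hypothetical, not Bill's actual names, and nothing here calls DEVONthink itself. It just shows why putting the title in every name lets one name search pull the excerpts and the citation document together.

```python
def excerpt_name(title: str, author: str, pages: str) -> str:
    """Build a name carrying book title, author and page number(s) (assumed format)."""
    return f"{title} ({author}) p. {pages}"


def citation_name(title: str, author: str) -> str:
    """Name for the separate document that holds the standard citation (assumed format)."""
    return f"{title} ({author}) citation"


def name_search(names: list[str], query: str) -> list[str]:
    """Case-insensitive substring match on document names."""
    q = query.lower()
    return [n for n in names if q in n.lower()]


# Hypothetical sample data: three documents for one book, one for another.
names = [
    excerpt_name("Example Book", "A. Author", "12"),
    excerpt_name("Example Book", "A. Author", "40-41"),
    citation_name("Example Book", "A. Author"),
    excerpt_name("Other Book", "B. Writer", "7"),
]

# One search on the title gathers both excerpts and the citation document.
matches = name_search(names, "example book")
```

Any consistent format would do; the point is only that the title appears in every related name.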

Thank you Bill. Your comments on multiple databases are especially helpful: it seems like if I’m initially saving things to one “master” database, DEVONthink is flexible enough to let me effortlessly move items to more specific topical databases in the future. Your comments on duplicates/replicants finally clarified the purpose of those features for me.

This is the one thing that confuses me. How do you “include another document” within a document? Also, can you post a sample name of one of your documents so that I can better understand your method?

Thanks :slight_smile:

Response to mattw at http://www.devon-technologies.com/phpBB2/viewtopic.php?t=3530.

WBD - 061115

mattw said on 14 Nov 2006:

Sorry, I should have been more precise. Suppose that in my research notes for a project I’m clipping excerpts from a book or article, and that I may wish to cite and/or quote them in a footnote or endnote in my finished article.

I do a great deal of initial drafting and note-taking inside the database environment, so that I’m limited to the capabilities of rich text notes. Unfortunately, those capabilities don’t include footnotes or endnotes. Maybe one of these days, when Apple beefs up Cocoa text. :slight_smile:

Suppose I’m looking at a particularly interesting document and start clipping excerpts for possible use in my project. Each such clipping is stored as a rich text document in my project reference notes group(s). When I search or browse those notes later, I like to see document names that cue me to the content, and/or alternatively to the source. And if I end up quoting or citing material in a note, I will need to quickly find citation information associated with it.

[1] Assume the source reference document is already in my database. I will likely create rich text notes that are either excerpts or my comments about a topic in it. I will probably assign to each such note a name that’s meaningful as to the topic, and perhaps may include the name of a principal author. But if I quote from an excerpt or need to cite the source, I’ll need some reference to the source material in order to construct the citation in my finished article. So, below my excerpt or note in that document I’ll create a link to the source document, or to another document in my database that contains the citation information for the source. (If the source is available both in print and on the Net, I’ll probably include in the citation a URL link.)

[2] If I’m making notes or typing excerpts into my database from a printed source that’s not in my database, I operate as in [1] above, but make sure to create a separate “citation note” document for the source to which each note can be linked. And I’ll likely include the page number of the book or article for each quote or comment.

I’m using Papyrus 12 to “polish” some projects for final output, usually as PDF that can either be printed, posted on the Web or made available on CD or DVD. (I’m using Papyrus 12 because I can use its hybrid PDF file format. So PDFs can be viewed in my database exactly as created in Papyrus, but can be directly edited by Papyrus without going through intermediate steps to recreate a PDF version.)

I’ll probably set up a topical outline in Papyrus, and drag and drop or copy/paste material into the Papyrus document from my DT Pro database.

Thanks to Alexandria’s forum post, I use a neat utility named Afloat that lets me set up my current Papyrus document so that it can “float” above any other application when I invoke it from the Dock, into which I’ve minimized it when not in use.

So I can work with my project material in DT Pro. When I find something I’d like to incorporate into the Papyrus document, I can copy it to the clipboard, then click on the Papyrus document in the Dock and paste it in, and use Exposé (or Command-Tab) to jump back to DT Pro. Or I can make Papyrus float transparently above DT Pro, select a document or text and drag it onto the Papyrus page. If only the one Papyrus window is open, I can press Command-M to minimize it, then click on the DT Pro window to activate it. Almost as convenient as though Papyrus were the native text editing mode.

Of course, Afloat will work with other word processors as well. I’ve chosen Papyrus for the editable PDF capability.

In Papyrus I can insert footnotes and/or endnotes. Since I set up for each of my research notes a link to citation source material, I can quickly find appropriate text for a footnote, and/or for a References set of endnotes. If I wish, I can set up hyperlink bookmarks between an endnote number in the text and the endnote, and those links will work with all standard PDF viewers.

I wish such tools for managing and mining information and producing final output had been available during my academic days. Back then, my tools were index cards and a typewriter. But I suppose that experience makes me appreciate DT Pro and a competent word processor all the more. :slight_smile:

Thanks again Bill. I love the efficiency of your method. Looking forward to experimenting with it. :slight_smile:

Bill, in a recent post you said:

Could you expand on this a bit? I understand the naming convention, but want to make sure I understand the organization. I assume that a quote from a book goes into a file named for that book, similar to Steve Johnson’s suggestion for “chunking” (although I know you disagree with his 500-word limit). Does this book file stand alone? Go into a topical file?

What is the nature of the “other document”? Does this work as a bibliographic reference manager? Where is this stored?

I don’t propose my habits as “the method” – merely as an approach that works for me, and can be modified to fit different projects and circumstances.

For a given writing project, my notes are more likely to be located in a topical subgroup of project notes than in a subgroup organized by source. But I might do it either way.

The “other document” isn’t mysterious, really. It’s merely the bibliographic citation information for the source. Rather than repeat all that information in each note I’ve made from that source, it’s easier to make a link to it.

I don’t use any bibliographic reference management software. My attitude is that I’m not going to list any citations that I haven’t, in reality, actually used.

Every few years there’s another study – usually accompanied by much distress – about how frequently academic and scientific publications list citations that were, in fact, never looked at by the author. I’ve got several such studies in my database, as examples of fraud. It’s surprisingly common.

I think it’s unethical to “lift” citations from other authors or from citation searches that I haven’t looked at or used.

The cases that amuse me most are the ones where authors keep repeatedly citing references listed in another publication, when those citations were erroneous, or irrelevant to the topic. :slight_smile:

Once, I was criticized by someone because I hadn’t listed a reference that was cited by an author. LOL. That was one of the fraudulent reference cases – that author had lifted a citation that contained several typos from another publication, so had obviously never looked at it.

Bill,
I agree, of course, about not using references that one hasn’t read.
What I’m trying to understand is how you structure your internal reference file so that you can grab it as a reference when you need it. Do you use a sheet for references and then link to them? Is there some other post you could point me toward?
thanks,
Lew

Hi, Lew.

It depends. If I’m working on a project that’s going to have lots of references, I might create a sheet, which contains records of the citation information. But records contain only plain text and the format isn’t all that great for grabbing the information to a word processor, especially if multiple fields (cells) are used.

So I’m much more likely to create a rich text document for each citation and store them in a “Cites” subgroup. The citation may be to a book, a journal article, a URL or perhaps, e.g. both the standard journal citation and its online URL.

When I take notes and/or copy excerpts from a reference, I’ll create a hypertext link using the contextual menu option “Link to”. I always use a little text block beneath the note, comment or quote that I select for this purpose. I’ll usually use a cue such as See or Reference to select for making the link. Sometimes the same note may link to more than one reference, using multiple “cue” links.

I usually start a writing project by creating a “TOC” outline in a rich text document. I’ll simply hack out the section headings that I want to “fill out” for the final product. I don’t care about the niceties of outlining, as it’s easy to do this using rich text lists or ordinary text, or a mixture. Each Section will be linked to a corresponding draft document for that heading/subheading. As I work in those sections the notes I’ve got will be grist for the writing mill. Each time I use in my draft a note or quotation referring to one or more citations, I’ll hit its link(s) to the citation(s), copy/paste it/them into my draft and enclose it/them in {} brackets.

Now, when I grab that draft material into a competent word processor (usually Papyrus 12 in my case, sometimes Pages) I’ve got my working material including citations for footnotes or endnotes.
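For anyone who wants to post-process such a draft outside the word processor, here is a minimal Python sketch of the {} convention. It assumes the draft has been exported as plain text and that citations never contain nested braces; the function name `extract_citations` and the sample draft are hypothetical, and none of this is part of DT Pro or Papyrus. It swaps each {citation} for a numbered marker and collects the unique citations in order of first use, which is roughly the raw material for a set of endnotes.

```python
import re


def extract_citations(draft: str) -> tuple[str, list[str]]:
    """Replace each {citation} with a [n] marker; return the new body and the
    unique citations in order of first use (a repeated citation reuses its number)."""
    notes: list[str] = []

    def to_marker(match: re.Match) -> str:
        cite = match.group(1).strip()
        if cite not in notes:
            notes.append(cite)
        return f"[{notes.index(cite) + 1}]"

    # Assumption: no nested braces inside a citation.
    body = re.sub(r"\{([^{}]+)\}", to_marker, draft)
    return body, notes


# Hypothetical draft text using the {} convention.
draft = (
    "First point {Author A, Example Journal, p. 12}. "
    "Second point {Author B, Example Manual, p. 3}, "
    "as noted earlier {Author A, Example Journal, p. 12}."
)
body, notes = extract_citations(draft)
```

The markers and the `notes` list can then be turned into footnotes or endnotes by hand in whatever word processor is used.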

Simply work out some procedures that will accomplish your needs and that are consistent enough that they become comfortable to use in practice. As a former senator from Louisiana, Russell Long, sometimes said, “There are more ways to kill a cat than by stuffing it with butter.” :slight_smile: