Well, I don’t know if I would call SBJ’s article “path-blazing.” Illuminating an extant trail, yes. Blazing a new one, no.
And, to answer both the question he asks and the one raised here: I think the point, for everyone including the blazing SBJ, is that what makes his database so damned good is that the information in it has been pre-selected. It’s not regular grain, it’s fortified breakfast cereal. Context matters, and what makes a search work, especially the fuzzy search of DT, is a rich contextual network. If you poured every part of an article in, you would weaken that network. You might still get interesting results. You might even get unanticipated results that lead to fundamental insights and innovations, but what SBJ was getting back was his own mental processes, mirrored and sorted outside of his own mind. It’s sort of the ultimate version of Lacan’s mirror stage: “Goodness, that’s me!” Except, it’s more like: “Goodness! That’s what I think!” And lest you think I’m making a mockery of all this: this is a very cool thing indeed. Witness my own use of DT Pro, in a way very similar to SBJ’s:
Gumbo.dtbase
/Agriculture
//Okra
//Rice
…
/History
//African
//Colonial
//European
///Foodways
////Montanari 1994
/////24 – Quotation or notes paraphrasing larger span of materials. Anywhere from 10-1000 words.
…
The “Montanari 1994” above is a text (as a folklorist I use Chicago style documentation, with the author-date option) and the “24” beneath it is a page number. Sometimes it can be a range: as small as one for a quotation that breaks across a page, but sometimes 10 pages, or even more, when I simply prefer to summarize a chunk of the text that is less useful in detail but still worth capturing in some fashion.
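For anyone who would rather read that outline as an indented tree, here is a minimal Python sketch. It assumes only the convention used above, where the number of leading slashes gives the nesting depth. The function names are my own invention and have nothing to do with DT itself, and the sample is an abridged copy of the outline.

```python
def parse_outline(lines):
    """Return (depth, name) pairs, where depth is the count of leading slashes."""
    items = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name = line.lstrip("/")
        depth = len(line) - len(name)
        items.append((depth, name))
    return items


def indent_outline(lines, indent="    "):
    """Render the slash-prefixed outline with one indent step per nesting level."""
    return "\n".join(indent * depth + name for depth, name in parse_outline(lines))


# Abridged copy of the outline from the post.
outline = [
    "Gumbo.dtbase",
    "/Agriculture",
    "//Okra",
    "//Rice",
    "/History",
    "//European",
    "///Foodways",
    "////Montanari 1994",
]
print(indent_outline(outline))
```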
Labor intensive? Yes. Context rich – that is, a context of my choosing and so richly meaningful to me? Yes.
Would I be interested in a script that could take an RTF, DOC, or PDF and break it into paragraphs and make each paragraph a separate entry in DT? Sure. But I’m not sure the results would be as good.
Of course, I could be wrong and I would be happy to hear discussion to the contrary.
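The splitting half of such a script is easy enough to sketch, for what it is worth. The Python below is only an illustration under some loud assumptions: the text has already been extracted from the RTF, DOC, or PDF into a plain string, paragraphs are separated by blank lines, and the entry names are hypothetical. Getting the pieces into DT is a separate step that is not shown here.

```python
import re


def split_paragraphs(text):
    """Split plain text into paragraphs on one or more blank lines."""
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    chunks = re.split(r"\n\s*\n", text)
    # Collapse hard line wraps inside a paragraph and drop empty chunks.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


def make_entries(source, text):
    """Pair each paragraph with a hypothetical entry name such as 'Montanari 1994 para 3'."""
    return [
        {"name": f"{source} para {number}", "text": paragraph}
        for number, paragraph in enumerate(split_paragraphs(text), start=1)
    ]
```

What this cannot do is the selecting: every paragraph goes in, whether or not it means anything to the person building the database.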
EDIT: I didn’t originally write “Goodness” in the quotations above. It was sh*t, but that got replaced with “Nuts” and that just didn’t sound like me. I have been known to replace almost any expletive with “goodness” when my one-year-old daughter is around, so I went back in and cleaned up the language my way. I had no idea that there was a language censor built into this thing. Also, leading spaces aren’t respected by this version of phpBB, and that’s why the directory structure represented above has multiple slashes to represent nested groups.