"Keywords" popup?

riddle · February 26, 2005, 11:03pm

The DEVONthink 1.9.2 User Manual, p. 45, says:

Huh? I don’t see any pop-up in the lower left corner of a document window.

Is this maybe an out-of-date description of the “Words” button in the lower right, next to the “Classify” and “See Also” buttons? When I click on “Words” I don’t get a popup but a drawer which displays a list of words from the document, sorted by frequency, with their lengths and weights. And if I click on one of those words I don’t see a list of documents containing it, but I do get a search window with the word in question filled in, which sounds like a step in the same direction.

So – is this the same feature, or something different? And if there’s really a Keywords popup, why can’t I see it?

– Prentiss Riddle prentissriddle.com

riddle · February 26, 2005, 11:24pm

Aha – elsewhere in the manual, on p. 55, it says:

I do see the “>>” icon. And when I click on it, I get a popup list of words with “heat bars” beside them. If I click on one of those, I get a list of documents containing the words. The words and their ordering in the popup are peculiar – some seem to be relevant terms that it would be useful to search on, while others are pretty much noise (such as random alphanumeric strings extracted from any URLs that happen to appear in the document).

Exactly what does “keywords that DEVONthink has calculated for the document” mean? Is this the mysterious “Keywords” feature, elsewhere described as a button on the lower left? And if so, what is it good for?

– Prentiss Riddle prentissriddle.com

Peter_Gallagher · February 26, 2005, 11:50pm

You’ve asked the right question, Prentiss.

I’d like to know the answer to that, too. The ‘keywords’ seem to be driven, like a lot of other things in DT, by the weightings in a concordance.
This makes them ‘good for…’ purely statistical uses.

After a couple of weeks’ fairly regular use of DT, I find it hard to see them as keywords in any meaningful, research-related sense … unless you put a much higher priority than I do on serendipitous ‘random walk’ type essays into the database (usually a last ditch technique for someone not writing fiction).

I’d really like to hear from someone who has found a valuable use for either the keywords or ‘classify’.

Peter

riddle · February 27, 2005, 3:56am

Thanks, Peter. What do you think about “See Also”? Have you seen Steven Johnson’s recent New York Times essay praising DEVONthink in general and See Also in particular? Johnson is a science writer whose work I admire and he says he lives by DT and See Also. You can see the article and two rounds of discussion in his blog at:

[query.nytimes.com/gst/fullpage.h ... A9639C8B63](http://query.nytimes.com/gst/fullpage.html?res=9B0CE1DA1038F933A05752C0A9639C8B63)

[stevenberlinjohnson.com/mova ... 00230.html](http://www.stevenberlinjohnson.com/movabletype/archives/000230.html)

[stevenberlinjohnson.com/mova ... 00231.html](http://www.stevenberlinjohnson.com/movabletype/archives/000231.html)

Now Johnson has a couple of things in his workflow which may not be generalizable to other people. For starters he has a research assistant who transcribes into DEVONthink anything Johnson highlights in his reading! Also, Johnson focuses on 50-500 word excerpts and groups them by the book or article they come from, rather than by topic. In fact, from his description of his method it sounds like he doesn’t use Classify at all. But having invested all that labor into setting up his DEVONthink database, he says See Also is invaluable for tracking down connections he otherwise might miss as he writes.

Sounds plausible. Then again, maybe it’s really just a random walk in a dynamite collection of quotes and excerpts, and he’d do as well rolling dice.

– Prentiss Riddle prentissriddle.com

Bill_DeVille · February 27, 2005, 6:04am

Peter:

You’ve been using DT fairly heavily for a couple of weeks. My database is now about 29 months old, and for the most part contains data that I’ve carefully selected over that time period. However, because my interests center around environmental science, technology and policy my collection ranges from chemical analytical/biological laboratory procedures, to legislation and regulation, to hazardous site investigation and remediation, to risk assessment, to statistical methodologies for assessment of environmental data, to toxicology and health effects, to conservation biology, genetics, economics – and a wide swath of other related materials.

SEARCHES

My initial interest in DEVONthink was roused simply by the fact that it can do searches on the text of a variety of file types. I already had thousands of reference files on my computer. DT allowed me to search my collection, which is no small accomplishment. If that were all that DT could do, I would still be using it every day. For it has allowed me to construct a personal “encyclopedia” of my interests that is far larger, deeper and richer than the Encyclopedia Britannica.

DT doesn’t (yet) offer a full set of Boolean search operators. Users are limited to AND, OR, phrase and wildcard searches, with case/no case variants. There’s also the fuzzy search, which I often find useful for technical terminology. Of course, it’s fairly easy to do multilevel sorts and searches of search hits (DT Pro beta allows one to create “smart” groups whose contents are based on a search strategy, but this can easily be emulated in DT PE). With a little thought, “smart” searches of a DT database can be done now, and surprisingly quickly.

I’m looking forward to having the more complex search tools of DEVONagent in DEVONthink. The search capabilities I miss most in DT are a NEAR operator and the ability to mix exact multi-term phrases with AND and/or OR terms. But there will be a price to pay when we are able to construct very complex search operations: speed. Really complex searches of my database, which contains tens of thousands of files and tens of millions of words, will take longer than simple AND or OR searches of a few words. Even so, I’d rather stress my CPU than my brain. (And I’ll probably move to a G5 when DT Pro version 2 comes along.)

KEYWORDS

When I’m really digging into a topic I do sometimes find the Keywords button useful, especially in a jargon-rich field, or when I’m interested in checking other items by the same author.

Perhaps the major weakness of Keywords, IMHO, is that DT only uses single terms as keywords. Let’s say that I’m looking in my database for information about polybrominated diphenyl ethers. Keywords doesn’t seem helpful. But wait! There’s an acronym, PBDEs, that’s commonly used for these chemicals. Sure enough, there the acronym is in Keywords. Click on it, and there’s a long list of the references in my database. (It helps to know the jargon.)

SEE ALSO

I make a lot more use of the See Also button. DEVONthink’s contextual recognition features really work, and can recognize multi-term patterns – think of it as Keywords on steroids, or even (often) as a very intelligent search feature.

How smart is DEVONthink’s See Also feature? In my experience, it covers the spectrum from genius to idiot. On balance, I find it extremely useful. Always remember, while you are looking at articles on the biochemistry of nitrate uptake by marine algae, that DEVONthink doesn’t know anything at all about biochemistry, or algae. See Also merely suggests other articles that have similarities to the article you are reading. It’s up to the user to understand the content being read, and to judge how related DT’s suggestions are to the user’s interest. In my database, DT will suggest several other research articles on nitrogen uptake by algae or phytoplankton, as well as articles about algal blooms and phosphate limitation, “dead zones” resulting from algal blooms, programs to reduce nitrate runoff from agriculture, potential economic losses to fisheries from excessive marine nitrate levels, and so on. DT may also suggest some items that seem dumb, based on factors such as the mailing address of a research institute, an author with the same last name who writes about racing cars rather than biochemistry, and so on. Overall, DT makes a good research assistant.

CLASSIFY

I’ve got just under 800 groups/subgroups in my database. Some groups/subgroups are very well organized, most are rather messy. I’ve never used auto-classify (although I will experiment with that using a database copy sometime soon).

The Classify button in early versions of DT often refused to highlight a suggested group for location of an item. Recently, DT classification has become more aggressive – almost always, one or more suggestions are highlighted for action by the Move button. I like that. I’ve got a big backlog of unclassified items in my “Edit These” group, and I’m making more use of DT’s decisions rather than manual decisions. Even when I could quibble with DT’s classification decisions, I see no ill effects on searching and See Also functions. The more I let DT make classification decisions, the better (more consistent, even if not like the decisions I would make) they seem to get – in this instance, perhaps DT is smarter than I am.

I have no hesitation in locating an item in more than one group. That breaks strict hierarchical rules, and can lead to a network rather than hierarchical organization. Hierarchical groups are easier for us humans to understand, but perhaps that isn’t so important after all.

At this point, I carefully classify some items, such as project records, for my own convenience. For other items, I’m willing to let DT decide where to put them.

Peter_Gallagher · February 27, 2005, 6:47am

Hi Bill,

You’ve certainly got a lot more experience with DT than I do, so I am happy to hear that you’re satisfied that DT does boolean searching.

But I can’t find any way to make it do a ‘not’ search. Without that, I don’t consider DT to have a boolean search at all (nor, apparently, does DevonTech – which now denies that DT includes the boolean searches that the manual and help file both say are included). My reason for putting such emphasis on negation is that it’s MUCH more selective than a conjunctive or disjunctive (AND/OR) search. The ‘not’ keyword is simply essential for targeted search.
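Peter’s point about negation can be pictured as set operations over an inverted index (a map from each word to the documents containing it). The sketch below is a minimal Python illustration using a made-up three-document collection; it shows only the set logic behind AND, OR and NOT, not how DEVONthink’s own search engine works.

```python
# Hypothetical toy collection: document name -> its text.
docs = {
    "pbde_review": "polybrominated diphenyl ethers pbdes flame retardants",
    "pbde_fish": "pbdes measured in fish tissue",
    "retardant_market": "flame retardants market prices",
}

# Inverted index: word -> set of document names containing that word.
index = {}
for name, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(name)


def hits(word):
    """Documents containing the word (empty set if it never occurs)."""
    return index.get(word, set())


and_hits = hits("flame") & hits("pbdes")  # both words present
or_hits = hits("flame") | hits("pbdes")   # either word present
not_hits = hits("flame") - hits("pbdes")  # flame AND NOT pbdes

print(sorted(and_hits))  # ['pbde_review']
print(sorted(or_hits))   # ['pbde_fish', 'pbde_review', 'retardant_market']
print(sorted(not_hits))  # ['retardant_market']
```

OR widens the result to all three documents, while NOT removes every document that mentions the excluded term. The flip side, as Bill notes later in the thread, is that in a collection of long documents NOT can also discard items that mention the excluded term only in passing.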

I strongly agree with you about proximity searching (at least as a sticky ‘preference’ set for ‘all words’ searches).

As for classify and ‘see also’: I’m willing to wait to see what happens with further use. But I find them so often wrong (and so unpredictable in the ways that they are wrong) right now that I don’t find them useful.

On ‘keywords’: I think you may have stated their best use. I haven’t been able to use them like that because the ‘keywords’ being returned from my database right now are typically the most frequently appearing longer terms in an article. These are only accidentally terms that matter to the meaning of the document itself. So checking what other documents say about the word ‘predominance’ or ‘unravel’ (two of the top of the list of ‘keywords’ in the document I’m browsing right now) is pretty unhelpful to me. Had DT chosen the keywords more intelligently than just statistically weighting a concordance, I might be making use of them right now – even after two weeks of use – instead of arguing about them.

Best wishes,

Peter

Bill_DeVille · February 27, 2005, 7:44am

Peter:

It’s been quite an adventure using DEVONthink, and overall, quite enjoyable. My own database has become more usable as it grows. I expect that you will have a similar experience.

I agree that most of the keywords are of little use to me. I think we’ll see more opportunity for metadata tags on items in the future. For example, I’d love to be able to tag results of searches.

If there were more ‘snippets’ of information in my database, like Steven Johnson’s, I would miss the NOT operator more than I do now. With many large documents in the db, NOT would often exclude things I’d like to look at. Search strategies can be tricky.

If DT’s capabilities grow as much over the next two years as they have over the last two, I’ll be delighted. My guess is that by that time, the AI features in DT will be taken about as far as they can go on current G5 Macs, and will be pressing for another generation of hardware.

riddle · February 27, 2005, 3:32pm

Thanks for the info, Bill. I find the pro-con with Peter very helpful as I evaluate DEVONthink.

One of Steven Johnson’s main points in his blog entry is that there’s a “sweet spot” of 50-500 words where See Also is valuable. You say you’re finding it useful even though you have longer texts in your database. I’d like to know more about what you think of his point. Is he wrong about the sweet spot? Is it perhaps the case that See Also works okay with large docs but would work better with small ones? Or maybe your research purposes require less granularity than his serendipitous strolls among ideas?

Thanks again to both of you.

– Prentiss Riddle prentissriddle.com

xuanyingzi · April 1, 2005, 7:50am

Oddly, there doesn’t seem to be any close relation between Keywords and word frequency or “weight”. Based on a sample document I’ve been working with (religion.sbc.edu/jester.html), none of the words shown in the list of Keywords is among the most frequently used words. And no word that occurs frequently or is assigned a heavy “weight” in the list of Words is listed among the Keywords.

It’s unclear to me what “keyword” and “weight” mean in DT. For the Keywords, the PDF user manual and DT’s own help file say that DT looks for the most important words that are also found in other documents. But according to the Concordance, the three most frequent nouns in my entire database are “century”, “Chinese”, and “world”. These three words also appear in my sample document, but are not listed among its Keywords. Neither the manual nor the help file says anything about “weight”. It would seem to refer to the relative importance of each word, but the word ranked at 100 in the list of Words of my sample document is “rhinoceroses”, which occurs only once and definitely is not a crucial term in that document. (“Rhinoceroses” does not occur elsewhere in my database.) And anyway, if DT believes that “rhinoceroses” is the main word, shouldn’t it be one of the Keywords?
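One possible reading of the “rhinoceroses” puzzle: if a word’s weight rewards rarity across the database, a word that occurs once in one document and nowhere else can outrank words that are frequent everywhere. The Python sketch below assumes a classic TF-IDF formula purely for illustration; nothing in this thread or the manual confirms that DEVONthink computes its weights this way, and the corpus is invented.

```python
import math

# Hypothetical database: document name -> list of its words.
corpus = {
    "jester": ["century", "chinese", "world", "jester", "jester", "rhinoceroses"],
    "doc2": ["century", "chinese", "world", "dynasty"],
    "doc3": ["century", "world", "trade"],
}


def tf_idf(word, doc_name):
    """Assumed TF-IDF weight: in-document frequency times log(N / document frequency)."""
    words = corpus[doc_name]
    tf = words.count(word) / len(words)
    df = sum(1 for ws in corpus.values() if word in ws)
    idf = math.log(len(corpus) / df)
    return tf * idf


weights = {w: tf_idf(w, "jester") for w in set(corpus["jester"])}
for word, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:14s} {weight:.3f}")
```

Under this assumed formula, “century” and “world” score zero because they occur in every document, while “rhinoceroses”, with a single occurrence, scores well above “chinese”. That would match the observation that the most frequent database nouns carry little weight.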

This made me curious to learn how DT discovers related documents in the database and suggests classifications. The list of related documents appears to be based on word frequency instead of Keyword or “weight” – which is good, because these might give questionable results. As for Classify, DT’s choices seem to be heavily influenced by the size of the documents in the database. A large document is likely to contain a large number of occurrences of the most frequently used words in an average-sized document, so DT suggests classifying the latter into the group that contains the former. This brings us back to the question of the “sweet spot” that Prentiss raised in the previous post in this thread.
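The size effect described here can be reproduced if a new document is compared with candidates by raw word-count overlap: a huge document shares many occurrences with almost anything. The Python sketch below assumes that kind of scoring, which is not confirmed for DEVONthink, and contrasts it with cosine similarity, a standard length-normalized alternative. The word counts are invented.

```python
import math
from collections import Counter


def dot(a, b):
    """Raw overlap: sum of products of shared word counts."""
    return sum(count * b[word] for word, count in a.items() if word in b)


def cosine(a, b):
    """Overlap divided by both document lengths, so sheer size no longer wins."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


new_doc = Counter({"jester": 3, "court": 2, "emperor": 1})
candidates = {
    # A very long survey that mentions the same words in passing.
    "large_doc": Counter({"jester": 10, "court": 10, "emperor": 10,
                          "trade": 200, "war": 200, "river": 200}),
    # A short, closely related note.
    "small_doc": Counter({"jester": 3, "court": 2, "emperor": 2}),
}

best_by_raw = max(candidates, key=lambda k: dot(new_doc, candidates[k]))
best_by_cosine = max(candidates, key=lambda k: cosine(new_doc, candidates[k]))
print(best_by_raw)     # large_doc
print(best_by_cosine)  # small_doc
```

Excluding large documents from classification, as xuanyingzi goes on to suggest, is a user-level way to get a similar effect without any change to the scoring itself.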

A useful lesson I learned from all this is that, at first sight, it wouldn’t make much sense to keep very large documents in the DT database (or documents that are much larger than the average). But on the other hand, if one removes the larger documents from the database, one cannot use them as a convenient starting point to find shorter related documents.

So the best way to deal with large documents might simply be to exclude them from those used to classify other documents, which can be done from the Info palette. Indeed, this allows Classify to give better results, and does not prevent the larger documents from appearing in the list of See Also.

Timotheus · August 20, 2005, 10:44am

I must confess that I dearly miss a ‘traditional’ keywords feature in DT. So I don’t mean the feature which goes under this name in the present version of DT, which to be honest is of little use to me. I mean the simple ability to manually attach keywords of your own choice to a particular file, which exists for instance in MacJournal and in countless other applications.

Am I alone, or are there others who would like to see a similar feature implemented in DT?

Maria · August 21, 2005, 4:24am

Hi,

I used to feel the same, and I loved the stickies in CP Notebook, although they can be a bit childish and are not really necessary.

As for keywords, there is a workaround which may be even more sophisticated than real keywords:

Create a group “Keywords” at the top level and, inside it, further groups named after all your keywords and – if you like – keyword hierarchies. Now, if you wish to assign a keyword to a certain document or folder, just ctrl-click it and create a replicant in the appropriate group.

(Advantage 1) Had you used traditional keywords, you would search for files with these keywords. In DT, you need not search but just open the group in the Keywords group, where your data is immediately available.

(Advantage 2) Keywords cannot easily represent a hierarchy (search for “Fiat” and you do not get the “cars”). In groups you can go up and down the hierarchy and include or exclude whatever you like.

(Advantage 3) No additional commands.

(Disadvantage) Takes some getting used to.
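The second advantage, going up and down a keyword hierarchy, can be modeled as a tree of groups in which a document may sit in several groups at once (a replicant). The Python sketch below is a hypothetical model: `Group` and `all_docs` are illustrative names, not part of DEVONthink, and collecting a parent keyword together with its sub-keywords is one way to “include” a whole branch.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical keyword group: replicated documents plus sub-keyword groups."""
    name: str
    docs: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def all_docs(self):
        """Documents filed here or anywhere beneath this group."""
        found = set(self.docs)
        for child in self.children:
            found |= child.all_docs()
        return found


fiat = Group("Fiat", {"fiat_review.pdf"})
volvo = Group("Volvo", {"volvo_safety.pdf", "car_survey.pdf"})
# car_survey.pdf is "replicated": it sits in both Cars and Volvo.
cars = Group("Cars", {"car_survey.pdf"}, [fiat, volvo])
keywords = Group("Keywords", children=[cars])

print(sorted(fiat.all_docs()))  # ['fiat_review.pdf']
print(sorted(cars.all_docs()))  # ['car_survey.pdf', 'fiat_review.pdf', 'volvo_safety.pdf']
```

Opening only the Fiat group stays narrow, while the Cars branch picks up the Fiat and Volvo documents too, and the replicated survey is counted once.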

What do you think about it? I haven’t tried it yet, because my databases are more of a mess of data; I hope to clean them up when I have holidays or turn 60 or something like that…

Maria

Timotheus · August 21, 2005, 12:23pm

Your solution sounds very ingenious, Maria, and might well deserve a serious tryout.

Yet I wonder what the result would be if I tried to create in DT a similar database for the 1766 unique keywords presently contained in my Bookends database. If I understand you correctly, I would then have the choice between creating one huge file with all 1766 keywords, which would probably be useless as a working instrument, and dividing these keywords into subgroups, the subgroups into sub-subgroups, etc. This last strategy would certainly be possible in theory, but (1) would require quite a lot of annoying work, and (2) would not only force me to open one or more folders very frequently, but also to remember the exact location of every single keyword inside the structure. And I fear, or rather am sure, my memory would fail to do that.

I feel the need of a database application with a rather powerful and versatile keywords feature, in order to be able to create not just a simple and endless list of chosen keywords, but as many lists of keywords as I like.

An archeologist, for instance, should have the possibility to create separate lists of “archeologists”, “archeological finds”, “locations”, “maps of locations”, “inscriptions”, “epochs”, “photographs of archeological finds”, “publications” etc. etc., in order to be able to search every possible combination of these lists. Moreover, it should be possible to create hierarchical structures between (and within) these lists, and to locate any keyword within any desired number of lists.

This would make it possible to verify instantly if the archeologist X ever dug up pottery, if he or she ever found anything dating back to the 12th century, if any archeologist in the past century ever found a mirror in the South of France, or wrote about brick patterns in the middle ages, if the database contains photographs of objects which could be both writing instruments and combs, etc. etc. Needless to say, such an instrument would be very, very helpful both for commenting on past research and for planning future research.

This is the kind of functionality which I ardently desire, and which, as far as I can see, DT in its present form doesn’t offer. For when I want to know if any archeologist ever found, for instance, a table, DT not only finds the tables that may have been dug up, but also the tables (= charts) in the publications of my colleagues, and the files containing the word “table” in which the concept “table” has no importance at all. On the other hand, it doesn’t find files about household goods, among which there could also be tables.
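These separate keyword lists amount to faceted tagging: each record carries keywords from several independent lists, and a query combines them. The Python sketch below uses invented records and hypothetical facet names (`archaeologist`, `find`, `epoch`, `location`). It also contrasts a facet query with a plain full-text match on the word “table”, which picks up a chart reference as well.

```python
# Hypothetical records: keywords from separate lists (facets) plus free text.
records = [
    {"id": 1, "archaeologist": {"X"}, "find": {"pottery"},
     "epoch": {"12th century"}, "location": {"South of France"},
     "text": "pottery sherds from the excavation"},
    {"id": 2, "archaeologist": {"Y"}, "find": {"mirror"},
     "epoch": {"19th century"}, "location": {"South of France"},
     "text": "mirror fragment dimensions in table 2"},
    {"id": 3, "archaeologist": {"X"}, "find": {"table"},
     "epoch": {"Middle Ages"}, "location": {"Tuscany"},
     "text": "oak table from a refectory"},
]


def facet_query(**wanted):
    """Ids of records whose facets contain every requested keyword."""
    return [r["id"] for r in records
            if all(value in r.get(facet, set()) for facet, value in wanted.items())]


def full_text(word):
    """Ids of records whose free text contains the word anywhere."""
    return [r["id"] for r in records if word in r["text"].split()]


print(facet_query(archaeologist="X", find="pottery"))          # [1]
print(facet_query(find="mirror", location="South of France"))  # [2]
print(facet_query(find="table"))                               # [3]
print(full_text("table"))                                      # [2, 3]
```

A hierarchy of find keywords (household goods above table, say) could be layered on by expanding a keyword into its narrower terms before matching; that part is left out of this sketch.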

I hope these simple words may at least give some idea of what I mean, of what I’m looking for. To be honest, I often have the feeling that DT, with all its sophistication, fails to offer some basic functionality which could be very useful for anybody.

But it might well be that this post simply betrays my ignorance about the real force and real possibilities of DT, and my inability to exploit them. In that case I apologize.

laudunum · August 21, 2005, 8:46pm

1766 keywords! The map is approaching the territory, no? How did you end up with so many?

Timotheus · August 22, 2005, 5:29am

Well, just to cite the first keywords beginning with A: accademia platonica - accessus ad auctores - Acciaioli, Donato - Accolto, Benedetto - accusa - Aconci - acutezza - Acuto, Giovanni - adagio - Adams, Henry - ……………

and I can assure you that the territory behind this map is infinitely more ample than the map itself …

Maria · August 22, 2005, 6:27am

Hi Timotheus,

1766 is quite a number, and in whatever organization, quite uncomfortable to maintain. I have a slightly different concept of keywords (in number…)

No, I do not think of a file, but of folders, each named after a keyword (well, a lot of work with 1766. But maybe you could find a workaround via OPML) and containing replicants of those documents that will be given that keyword.

Yes and no. A logical, hierarchical classification of your keywords would make it easiest to find the place. But with 1766 keywords I agree that it may be annoying to open too many folders on too many levels.

But if you create a flat hierarchy in alphabetical order, it would be like a keyword popup menu anyway. I think this may be closer to what you want.

Best,
Maria

PS Thanks for choosing archaeological examples!

laudunum · August 23, 2005, 1:43am

Ah, you and I think of keywords differently – could be a disciplinary difference. Such a list as yours strikes me as more of an index, but, obviously, it has been working for you and that’s fine.

Back on topic: Maria, your idea is brilliant, but I would still prefer a way to embed/associate keywords into/with the documents themselves in DT. Like EXIF information for digital cameras, one could hope that such metadata could also be exportable for use in other applications.

Timotheus · August 23, 2005, 5:06am

Thanks, Maria, for your explanation! And yes, Maria and Laudunum, my conception of keywords may be somewhat particular, but I like to make them as specific as possible; and for me it works brilliantly. If I kept them more generic, I just would get far too many hits while searching in my bibliographic database, which contains some 5000 records.

But I begin to fear that my ideas about keywords and about how a keywords feature could and should be implemented in DT seem rather strange to many. That’s a pity: personally, for such a feature (which, by the way, is already implemented in an application like Bookends, where I use it with great profit) I would gladly sacrifice a whole range of existing features: for instance, the Classify-feature, which I rarely use.

I admit that my ideas about what a database application should be able to do are rather static and traditional; yet, I believe in their fruitfulness.

ChemBob · August 23, 2005, 11:53am

This is interesting. I made a significant effort to organize everything into logical groups when I first began using DT. Now when I add something to DT and tell it to “classify”, it classifies correctly probably more than 95% of the time, saving me from having to drill through groups to find the best location (when I might not even remember the best group for it; I think my memory is more like yours, Bill DeVille, than some of the others here). DT often picks a more appropriate group than I might have, and if I would also like the item elsewhere, I just choose all the groups where I want it in the “classify” pane and it replicates it to all of them. Using DT the way I do, searches always return appropriate results when the proper combination of search parameters is chosen (I usually use the standalone search window).

So I guess my point is I can do without keywords far more easily than I can do without classify. To each their own I suppose, but I thought one of the major features of DT was to be able to avoid all this manual labor (e.g., forever having to choose where to group an item, picking keywords for it, color coding it, yada yada yada).

ChemBob

jams · August 23, 2005, 2:41pm

Funny. I’ve come to the end of my tether with groups and hierarchies. I’m presently looking for a way to work without them (well, more-or-less without them), assigning metadata instead. I’ve realised that typing a few keywords in the comments pane (and labelling the file) takes all of 5 seconds. Items can then be grouped in specific ways using the sort command or by searching. That way I can just throw everything into the top level of the database, without having to deal with an impenetrable maze of nested groups (and without wasting time at some future point by reorganising my folders).

I know that DT’s classify function can be an excellent way of dealing with this maze, but I’ve noticed that more and more people are seeing the future in metadata (rather than the nested folder metaphor). With folders, you have to make one-off decisions about how to organise your files. Sorting/searching for metadata is specific to your exact needs at that moment. And you are not polluting your database with further groups each time.

Perhaps I’ve just never understood the best way to make use of groups. (For my work files, for instance, I’ve always grouped things by client, rather than by topic - and the ‘Classify’ function just can’t handle this). Or perhaps it’s just that I can’t stand navigating eastwards across my browser pane as I work my way through a hierarchy of groups.

Anyway, the system I’m trying out at the moment makes better sense to me (my head feels much clearer!). Virtually all my files are sitting on or near the top level of my database - I make use of groups only as temporary folders for projects I am currently working on. I run searches, then group the necessary notes, ungrouping them again when I’m done (admittedly, some of these groups are going to sit around a while). My only permanent folder for the time being is an inbox (where everything new goes and stays until the metadata is assigned). Seems so much simpler. I’m only using DN at the moment, but would certainly make use of Smart groups if/when I upgrade to DT Pro.

Ironically, this process has also helped me realise new ways of using groups, i.e. groups=temporary/metadata=permanent. My only concern now is that DN/DT may not be the best tool for this system (Boswell tackles the concept in an interesting way, but is lagging behind in so many other ways). Full column support would also make things much easier (maybe DT Pro has this? I’m only using DN.)

Gosh, I didn’t mean to write so much. And I know some of this will seem naive. I’m certainly not an advanced user. But I’d be interested to hear what people think about the groups/metadata argument.

I just get the feeling that with DT’s AI and search functions, I might be able to do away with my old folder hierarchy altogether. Put simply, I want to spend more time writing, and less time organising files!