Varieties of data visualization approaches:
Data visualization covers a very wide range of approaches, both in how items are connected through some property or measure, and in the method of presentation.
Many approaches focus on a property that is innate to a document. For example, in the Digest view DEVONagent 2.x lists the words that appear frequently in the set of pages, the “Topics”. Click on a topic and one sees a network with, at its center, the subset of pages that contain that term, graphically linked to subsets that also contain another of the terms in the topic list, and so on. In this example, no user intervention is required to generate the list of topics or to define the characteristics of the network (although the user can enter a new topic term).
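DEVONagent's actual topic algorithm isn't described here, so what follows is only a minimal sketch of the idea under stated assumptions: topics are ranked by the number of pages they appear in (document frequency) after dropping a small, hypothetical stop-word list, and the network's hub is the set of pages containing the chosen topic, with an edge to every other topic that shares some of those pages. All function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real tool would use a much larger one.
STOP_WORDS = {"the", "and", "for", "with", "that", "this", "from"}


def tokenize(text):
    """Return the set of lower-cased words of 3+ letters, minus stop words."""
    return {w for w in re.findall(r"[a-z]{3,}", text.lower()) if w not in STOP_WORDS}


def topics(pages, top_n=10):
    """Rank terms by how many pages contain them (document frequency)."""
    counts = Counter()
    for text in pages.values():
        counts.update(tokenize(text))  # each page counts a term at most once
    return [term for term, _ in counts.most_common(top_n)]


def topic_network(pages, centre, topic_list):
    """Return the hub (pages containing `centre`) and, for each other topic,
    the subset of hub pages that also contain it; these are the edges."""
    words = {name: tokenize(text) for name, text in pages.items()}
    hub = {name for name, ws in words.items() if centre in ws}
    edges = {}
    for other in topic_list:
        if other == centre:
            continue
        shared = {name for name in hub if other in words[name]}
        if shared:
            edges[other] = shared
    return hub, edges
```

For three toy pages, "Adam Smith and the division of labour", "Smith on markets and labour" and "Keynes on markets", `topics(pages, 3)` returns smith, labour and markets (tie order is not guaranteed), and the network centred on "smith" links both Smith pages to "labour" and one of them to "markets". No user input is needed beyond choosing the centre, which matches the behaviour described above.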
DEVONagent users often ask for the ability to sort a list of Web pages by date. Unfortunately, there’s not a standard – or if so, certainly not one in wide use – that defines the use of dates for organizing Web pages. A page will have been captured and saved to the Archive or to DEVONthink on a certain date. The page may have been modified by the site administrator on a certain date, it may contain the date of publication as a news article, and it may refer to an event that happened, e.g. in 1957. Most date references simply occur in the body of the text, with no standard contextual reference to clarify how they are to be interpreted by software.
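To make the ambiguity concrete, here is a minimal sketch (all names hypothetical) that keeps each kind of date in its own field and shows that a pattern match on body text finds years without knowing what they refer to. The sort policy at the end is an arbitrary assumption; choosing it is precisely the decision that no widely used standard makes for us.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class PageDates:
    """Hypothetical record: one field per kind of date a page can carry."""
    captured: date                     # when the page was saved to the archive
    modified: Optional[date] = None    # e.g. from an HTTP Last-Modified header, if sent
    published: Optional[date] = None   # publication date, if it can be identified at all
    mentioned: List[int] = field(default_factory=list)  # years merely occurring in the text


# Four-digit years from 1500 to 2099.
YEAR = re.compile(r"\b(1[5-9]\d\d|20\d\d)\b")


def years_in_text(text):
    """Find years in body text. The pattern cannot tell a publication
    year from the year of an event the page merely describes."""
    return [int(y) for y in YEAR.findall(text)]


def sort_key(page: PageDates) -> date:
    # Assumed policy: prefer publication, then modification, then capture date.
    return page.published or page.modified or page.captured
```

Given "Posted in 2004 about events of 1957.", `years_in_text` returns both years with nothing to say which is the publication date; any sort by date has to pick a policy like `sort_key` above, and a different policy gives a different order.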
Suppose I’m researching a paper on the influence of Adam Smith on contemporaneous and subsequent economic theory. I’ve got a large collection of documents, from original materials to historical overviews to a number of scholarly publications written at various times.
I’m likely to discover that different writers have different perspectives on Adam Smith. I may try to analyze this in terms of “schools” of economic theory, and I’ll find that a timeline becomes important in describing the rise and fall of such schools. For example, it will be pretty evident that a book written by a German economist in 1890 differs substantially in theoretical approach from one written by a German economist in 1996 (and not merely in the number of footnotes – a subject about which I tend to tease scholars).
And I’m likely to find that the geopolitical setting in which writers work is important. For example, Galbraith and Laffer have commonalities of perspective that they would not share with a Russian Marxist economist.
Timeline presentations would make good tools to help explicate the evolution of economic theory from the time of Adam Smith to the present.
Here’s where a tagging scheme becomes important. To support such a presentation I’ll have to provide two kinds of component: a quantitative one, time (which can be aggregated by year, by decade, or perhaps by an interval such as the period dominated by a political movement), and qualitative ones, such as descriptive terms for a school of economic theory. I can then search my database for references and/or time periods.
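A minimal sketch of such a scheme, assuming each document carries a single year (the quantitative component) and a set of school tags (the qualitative components). The period boundaries, tag names and titles are hypothetical placeholders; in practice they are the researcher's own choices and remain open to revision.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Doc:
    title: str
    year: int                                  # quantitative component
    schools: set = field(default_factory=set)  # qualitative tags, e.g. "classical"


# Hypothetical named periods; the boundaries are the researcher's own choice.
PERIODS = {"pre-1900": (1700, 1899), "1900-1945": (1900, 1945), "post-war": (1946, 2100)}


def decade(year):
    return year // 10 * 10


def period_of(year):
    """Map a year to the first named period that contains it, else None."""
    for name, (lo, hi) in PERIODS.items():
        if lo <= year <= hi:
            return name
    return None


def timeline(docs, bucket=decade):
    """Count documents per (time bucket, school) pair, ready to plot."""
    counts = defaultdict(int)
    for d in docs:
        for s in d.schools:
            counts[(bucket(d.year), s)] += 1
    return dict(counts)


def search(docs, school=None, start=None, end=None):
    """Select by tag and/or year range; omitted criteria match everything."""
    return [d for d in docs
            if (school is None or school in d.schools)
            and (start is None or d.year >= start)
            and (end is None or d.year <= end)]
```

Passing `bucket=period_of` instead of the default `decade` groups the same documents by named period. Keeping time as a plain number is what allows that: the aggregation can change without retagging anything, while the tags themselves carry the subjective part of the analysis.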
That tagging scheme is metadata about selected topics, arising from my own research and analysis. It doesn’t exist in my database independently. I’ve got to add it. Or perhaps just create tables representing the presentation I want to provide the reader. And that tagging scheme (or presentation) is “subjective” in the sense that it is subject to review and analysis of my findings. It’s a hypothesis (at best; I studied under Karl Popper).
And it’s the culmination of a lot of hard work! Which is why, whenever possible, I try to work with “objective” visual representations. An example of such an objective visual representation is a weather map. Or a three-dimensional representation displaying by color and shade the level of contamination of a pollutant in surface and subsurface soils at a Superfund site, which summarizes reams of data. Note, however, that it has taken decades of work to establish the techniques and conventions underlying such visual representations, so that they can be accepted as both objective and meaningful. These two examples are really wonderful, as one can obtain a great deal of information at a glance.
I must say that I have reservations about most mind-mapping schemes, as they often tend to support quick, facile conclusions about relationships that may well be wrong or meaningless. Example: I once conducted a series of 730 experiments, with the results of each documented in a lab notebook and each containing a visual representation of the data as a graph. Flipping through the pages and glancing at the graphs, one quickly gained the impression that each set of results displayed a trend, and the many-fold replications of that trend reinforced the impression that one could draw a meaningful conclusion from the graphs. Wrong. A careful statistical analysis of the data showed there was no trend. As these experiments involved a potential medical procedure (for treatment of glaucoma) that was costly and had associated risks, a more careful evaluation led to the conclusion that the procedure should not be done, as it would not have benefited patients, might injure some and would be a waste of resources. That conclusion was subsequently verified by other researchers.
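The post doesn't say which statistical analysis was used. As one common example (an assumption, not the original analysis), a least-squares slope with its t-statistic asks whether an apparent trend can be distinguished from the scatter around it; for moderate sample sizes, |t| below roughly 2 is not significant at the conventional 5% level. The function name is hypothetical.

```python
import math


def slope_t(xs, ys):
    """Ordinary least-squares slope of ys on xs and its t-statistic.

    Under the null hypothesis of no trend, t follows Student's t
    distribution with n - 2 degrees of freedom.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences of at least 3 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # fitted slope
    a = my - b * mx        # fitted intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    if se == 0:
        # Perfect fit: any nonzero slope is exact, a zero slope is exactly zero.
        return b, (math.copysign(math.inf, b) if b != 0 else 0.0)
    return b, b / se
```

The essential step is that the slope is judged against the scatter around the fitted line, which is exactly what a glance at a graph does not do.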
Subsequently I showed that collection of 730 graphs to several doctors and medical researchers. Without exception, they concluded that the data indicated a useful procedure that should produce favorable results. How easily we can be led to jump to wrong conclusions!
We often bring preconceptions to visual experience. A number of artists have demonstrated that they can make us “see” things that are not there, or that are physically impossible.
Such caveats aside, visual representation of data can be very useful, and it’s likely that it will appear in DEVONthink. I like the visual representation in DEVONagent 2.2 and hope that at some point it can be extended to (more memory-intensive) use of phrases rather than just single-word terms.
The revised database structure that will be used in DEVONthink 2.0 will make it easier for other applications to access the content of a database, or perhaps to develop cooperative interactions with other applications. So that should provide additional routes to visualization of the content of databases.