One of the problems with documentation is that most users rarely read it, and even fewer would read it were it to become highly detailed.
The downloadable PDF documentation is searchable and provides a brief description of all of the menu and contextual menu commands.
But the additional layer you suggest would be a gargantuan task: never complete, always under revision, and probably not a useful way to lay out for examination the tools one might choose for a particular task.
That’s why I usually recommend that new users start small and experiment with the features.
Your PDF capture of a Wikipedia page is a good illustration. Were the links captured? Yes. Were all of them “marked”, and so visible without a mouseover? No. One answer to the “why” question is that designers of Web pages have considerable latitude, and it could take a lot of documentation to cover all of the possibilities.
A PDF download of a Web page may not capture all of the links used on that page, because the page may have been designed with JavaScript or other techniques that PDF conversion can’t interpret.
And a user of DT Pro who is still running under OS X 10.3.9 can capture a Web page as PDF, but the resulting PDF won’t capture hyperlinks. DT Pro makes a great many calls to the operating system (PDF captures are only one example), and there are significant differences among Panther, Tiger and Leopard that affect what happens. That’s obviously also true of some scripts.
A user’s choice of Web browser introduces important variables. Safari and other WebKit browsers (including DEVONagent and the DT Pro browser) are Cocoa applications and make use of OS X Services. Camino and Firefox, whatever their other virtues, are not Cocoa-based and have limited or no communication through Services. Another variable among browsers is their AppleScript dictionary, the built-in scripting “hooks”; Firefox doesn’t work well with AppleScript. So no set of scripts can be designed to work equally well among all browsers, unless it were limited to the level of the least scriptable browser.
So when I want to capture information from the Web, I limit my browser choices to those that communicate well through Services and work richly with AppleScript. That excludes Camino and Firefox, for example. I use (depending on my workflow of the moment) DEVONagent, DT Pro’s built-in browser or Safari, in order of decreasing “richness” of the capture-to-database options.
Another big set of variables: I use rich text captures of Web material almost exclusively. I want to capture just the text and graphics of an article (and the hyperlinks and source URL) and to exclude all other material on that page. I don’t care about the Web site designer’s layout; I just want the information content that’s important to me. That has important consequences for the efficiency of searches and See Also use, as well.

I subscribe to online scientific journals such as Science and Nature. They provide downloadable copies of articles in searchable PDF format. I don’t put those in my database. Instead, I select the text and images from the HTML view of the articles. My captures are richer, because they include hyperlinks usually not present in the PDF versions, and “tighter”, because they avoid extraneous material such as the end of the previous article and the beginning of the next article.
But if I were a Web page designer, I would have very different interests and consequently a different choice of capture format, so a different workflow from beginning to end.
My preference for rich text captures is part of my overall workflow. I do my writing inside my database in rich text. I can easily extract quotations from other rich text documents. I can insert hyperlinks into rich text documents, or use highlighting and insert page marker cues. When I’ve finished a draft I’ll copy/paste it into Pages or Papyrus for final polishing, including layout, footnotes and endnotes. (If I need to furnish the final output in MS Word, either of those can do an acceptable conversion to Word. Pages accepts my draft material from the database via copy/paste beautifully. A transfer of images to Papyrus requires more tinkering, but I have other reasons for liking Papyrus for some purposes. I’ve got MS Office, but avoid using it.)
I generally don’t download pages as PDF, for the additional reason that extracting a quotation from a PDF requires editing to get rid of all those fixed line endings. But I do download PDFs of pages from my bank’s Web site, because that gives me an accurate record of transactions, and captures as HTML or WebArchive are not permitted on such a secure site. The simplest method is to “print” the page as PDF, using DT Pro’s Save to DEVONthink Pro script in the File > Print panel under the PDF button. (A different script to save the page as PDF won’t work on my bank’s secure site if it attempts to grab the URL, download and convert.) And I’ve scanned thousands of pages into databases as searchable PDF+Text.
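As an aside, the chore of removing those fixed line endings can be automated. Here’s a minimal sketch in Python; the function name and the heuristics (blank lines mark real paragraph breaks, a hyphen at end-of-line marks a split word) are my own assumptions, not anything built into DT Pro, and PDFs vary in how they wrap text:

```python
import re

def unwrap_pdf_text(text: str) -> str:
    """Rejoin hard-wrapped lines in text copied from a PDF.

    Assumes each visual line ends in a newline and real paragraph
    breaks appear as blank lines (a common, not universal, pattern).
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Rejoin words hyphenated across a line break.
        joined = re.sub(r"-\n(?=\w)", "", para)
        # Replace the remaining fixed line endings with spaces.
        joined = re.sub(r"\s*\n\s*", " ", joined)
        cleaned.append(joined)
    return "\n\n".join(cleaned)

sample = "Extracting a quota-\ntion from a PDF means\nediting out line endings.\n\nNext paragraph."
print(unwrap_pdf_text(sample))
```

A script like this could sit between the PDF and the rich text draft, so quotations paste in as flowing text. The hyphen rule will occasionally mis-join a word that was legitimately hyphenated, so the output still deserves a quick read-through.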
Re your question about styles and lists in rich text, just activate the Ruler for a text document. The Ruler options act just as in TextEdit.
Just wanted to illustrate a few of the variables involved in choices of tools for a workflow. Most of them can’t be described within the documentation of a single application. Overall, a good workflow should honor the personal preferences of the user, and should efficiently move along to the desired end result. Although I prefer rich text captures, I’ve got a mix of file types in my database because I have to handle a variety of sources and contexts.
I find that I end up using a relatively small subset of all the tools available in DT Pro Office for most of my work. I’ve settled on those through experiment and practice. When new features come along, I sometimes adapt to them and change procedures. For example, rich text captures of tables in early versions of OS X were unusable. Now they are pretty good.