On Documents and Information

Greetings, my fellow Devon-software users. I’d like to share some thoughts on the downsides of DevonThink with you: drawbacks for which I haven’t found a solution or workaround, which probably affect other users, too, and which should therefore be tackled by the Devon Development Forces as soon as possible.

DevonThink is a great application and I have been using it more or less intensely since its infancy. You see, I am a teacher, and folks like me have to deal with tons of the most varied information imaginable. DevonThink, so I thought, would help me keep track of all those worksheets, texts, diagrams and research data I had assembled in years past, pretty much what Steve J. promised Spotlight would do. And, so I hoped, DevonThink could thus keep me from reinventing the wheel. Finding and reusing old material would be much more efficient than creating it all over again every time I needed it. Unfortunately, however, this never really worked for me. The reason DT has not lived up to my expectations is rooted in the one great and painful gap in its feature set: It cannot handle documents.

Now, this is the point where many of you might raise your hands to intervene. You should feel encouraged to do so, but let me explain my point in a bit more detail first. DT has always been great for gathering information. Convenient system shortcuts, a built-in web browser and lots of other cool features help you collect all kinds of information that might appear useful to you. It goes like this: “Ah, I might use this little snippet later, and who knows, these pictures might come in handy later, too. Oh, and this website contains tons of information that might not be related to my current project but I’d better store its URL for later reference.” Ain’t it so? That’s how information gathering via DT has always worked, and it’s great. Now let’s take a look at what this kind of workflow creates: Yes, you are right, lots of tiny little snippets, notes, pics, clips, whatever. These snippets are what I refer to as ‘information’. It’s a good thing that DT is also an unparalleled information manager. Artificial intelligence and a clever browser with lots of views and features make it easy to stay organized. But organized or not, this information really isn’t worth a penny unless you make something of it. That’s what we gathered it for, isn’t it? We want to create something with this information. We want to write a text, an article, a project plan, an outline, a mind map, a letter or eMail, a report or an essay. In short, we want to create a ‘document’ that in some way incorporates the previously gathered and organized ‘information’.

Unfortunately, here we reach the limits of DT’s feature set. There may be a basic text editor, ok, but you can’t really create a polished document with it. It’s great for jotting down notes, but that’s about it. It surely can’t produce, for instance, a decent school worksheet or a page layout for the next company memo. What follows now is the colourful world of fancy workarounds. You could copy and paste the stuff you need into other applications, always keep a DT window open next to your word processor, and so on and so on, and that would be fine with me, but… yes, again there is a little but. When it comes to handling ‘documents’, DT hardly understands any file format apart from RTF and PDF. This means that you have to save your creations outside the DT database. This doesn’t seem acceptable to me, because you leave the reach of DT’s search capabilities. DT can no longer answer the question “Where is that report I wrote 6 months ago on a similar topic?” Sure enough, it can still find you the information you gathered for it, but not the report itself. Again, workarounds are imaginable but cumbersome. You could export your documents as PDF to DT. But why should you create and look after two editions of your documents? This turns the user into a file converter and manager rather than an ingenious mastermind with a productivity head start.

That said, here is my wish list for future versions of DT:

  1. Give up that nasty database approach.
    I say this from a user’s point of view. I have no idea whether this is possible from the development point of view. Spotlight can index all the files on the HD, so why can’t DT? As a user I don’t want to constantly think about which files I put into the database and which not. This reduces efficiency.

  2. Make DT more transparent
    Once the DB is gone, there is no need for a browser anymore. You should instead turn DT into an EasyFind on steroids, which incorporates all the features (e.g. AI) that make DT so unique and would thoroughly kick Spotlight’s butt.

  3. Add tools to facilitate document creation
    No, I am not talking about new text editor features here. I am thinking (or dreaming) of floating panels that give you the information needed for the document creation process. I am dreaming of graphical visualizations (like in DA 2.0b) to illuminate connections and relations between the information snippets gathered for a certain project.

That’s about it. Now, let the DT wars begin. I am sure many righteous knights will be happy to defend their app of choice. As much as I am looking forward to this discussion, I am particularly interested in what Devon Officials think about my thoughts and ideas.

Best regards,
Christian

Hello, Christian,

I find your post very interesting. But I have a different point of view. For me, gathering and creating information are two different tasks that I would not like to get mixed up. DT is really GREAT for gathering and organizing information, but for creating my own documents, I would not expect DT to replace TeXShop, for example (this is the one I use the most), or Word and Co.
Instead of having a single application that replaces all others, I would prefer that DT keeps what makes it unique: the file browser, with 6 (!) different views, the web browser so well integrated with the application, the great import/export capabilities, the AI (although I’m always a bit sceptical about DT’s “classify”), and above all, the excellent AppleScript support. If you really have a large number of files in your DB, and need, for some reason, to change the organization, you quickly come to the conclusion that it’s impossible without AppleScript. AppleScript really makes DT versatile.
Everyone can adapt DT to their own working methods. For my own needs, I could set up a script that checks, every month, the new publications on arxiv.org, downloads and classifies the abstracts, and makes a sheet with the BibTeX info. It now takes 15 to 20 minutes to check 7 topics (just the time to have a coffee). In the past, doing the same thing by hand took me half a day for 2 topics, usually with some errors. I’m sure that many users could give such examples.

So, for me, no DT war, just a wish that in the future, DT interacts more and more with OS X:
Creating, sharing and reading Spotlight’s smart folders would be great, I think. If I understand correctly, DTPro 2 will be Spotlight compatible. I’m not sure exactly what that means, but Spotlight will certainly be able to index the DT database; that’s at least half of my wish.
Improvements in the metadata treatment, like links in sheets, or the possibility to add custom columns to the database, as in Mori.
Some improvements to the interface: like you, I think floating, transparent windows, drawers and toolbars à la DragThing would be helpful.
But also, more features mean a bigger, slower application, needing more memory, more CPU, etc… So I guess the developers are constantly searching for a balance between features and speed (am I wrong?).

I’m also very interested in the opinion of DT Pro users and developers on these questions.

Regards,
Alb

Neither would I want that. If you understood my post this way, you certainly understood it wrongly. I quoted the important passage of my original message once more and set the crucial parts in bold letters. I said that I didn’t want DT to become a document creation app. I spoke of “tools to facilitate document creation”, like floating panels that show information next to the document creation apps.

I couldn’t agree more.

Sidenote: Maybe I should consider learning AppleScript. Your automated workflow sounds too good to be true.

However, this only refers to DT as an information gatherer and manager. It still doesn’t address DT’s problems with handling documents.

I think it has turned out that we agree on more than the last matter.

Well, my suggestions would make DT much leaner than it is today. Of course, I fear they are quite unrealistic, as giving up the database approach would mean a complete rewrite of the app. I don’t expect that to happen. As far as I am concerned, tearing down the borders between the file system and DT’s database would be enough.

Hoping for more answers,
Christian

But it works, and I didn’t (and still don’t) know much AppleScript. I looked at the scripts provided with DT and asked for advice on this forum. I always got quick and precise answers, and sometimes some pieces of script, especially from Christian Grunenberg, Bob Annard, and Bill Deville.

You don’t really need to learn AppleScript, but it’s worth spending a little time to understand what AppleScript can do.

Can you explain with more details what you mean by “give up the database approach”?

Alb

I can! The database is IMHO a big productivity bottleneck. You have to put files in before you can work with them, you have to export files you wish to work on with external apps, and afterwards you have to import those files again. All a big hassle for the user, for whom the name “information worker” gets a whole new meaning, namely as “the person who constantly works with files” instead of “on files”. The database is the reason why DTPro constantly is in the way of my workflow. It is the reason for a good deal of data redundancy, because many files don’t work in DTPro and have to be kept in a stripped-down form (RTF) within DTPro as well as in a complete version (Pages, Word, Excel, whatever) in your file system.

If the developers could figure out a way to make the DT database work more like the Spotlight index does, this would be great. Because in this scenario the user would never know that there might be some database running in the background. Why not make DTPro a Finder plugin? This is what I meant when I said “Make DT more transparent” in the first place. In this scenario the user could just work on his files, as it should be.

Of course I know, I have to say this again, that this is all fantasy. I doubt that the developers will ever make that dream come true.

Best regards,
Christian

Hi Alb

Can you share your Arxiv script somewhere? It sounds like an amazing tool.
Thanks in advance.

Here’s a simplified form of the same thing (I’ve deleted the previous one).


--cat_list: the list of categories you're interested in
--thedate: should be of the form yymm; by default, thedate is the year and the month of the current date
--tex_link: the link to the TeX source of the paper


set cat_list to {"OA", "FA"} --, "QA", "KT", "SP", "GN", "PR"}
--build yymm from the date components; "short date string" depends on the system's date format
set now to current date
set thedate to (text -2 thru -1 of ("0" & ((year of now) mod 100))) & (text -2 thru -1 of ("0" & ((month of now) as integer)))
repeat with this_cat in cat_list
	set main_URL to "http://front.math.ucdavis.edu/math." & this_cat & "/" & thedate
	RecordSelectlinks(main_URL)
end repeat
----

on RecordSelectlinks(this_url)
	set selected_links to {}
	tell application "DEVONthink Pro"
		try
			close every window
		end try
		set thesource to download markup from this_url
		set abstract_group to my selecpart(thesource, "<title>", "</title>")
		set thelist to (get links of thesource)
		set thedestination to create location "_Nouveaux abstracts/" & "-" & abstract_group & "/"
		activate
		open window for record thedestination
		tell application "System Events" to keystroke "6" using (option down & command down)
		--set tex_sources to create location "_Nouveaux abstracts/" & "/TeX_Sources/" & abstract_group & "/"
		set currentdelims to AppleScript's text item delimiters
		set AppleScript's text item delimiters to {"/"}
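		--keep only links whose third "/"-separated part is a number above 9800 (presumably the paper IDs); the try block skips all others
		repeat with thelink in thelist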
			try
				set testlink to text item 3 of thelink
				if (testlink / 100 > 98) then set selected_links to selected_links & thelink
			end try
		end repeat
		set AppleScript's text item delimiters to currentdelims
		----
		----	
		repeat with short_link in selected_links
			set complete_link to "http://front.math.ucdavis.edu" & short_link
			--set tex_link to "http://fr.arxiv.org/e-print" & short_link & ".gz?front"
			set thesource to download markup from complete_link
			set record_comment to my selecpart(thesource, "<title>", "</title>")
			set record_title to my selecpart(thesource, "<b>Title:</b>", "</font><p>")
			set theabstract to my selecpart(thesource, "<b>Title:</b>", "</pre>")
			set thesource to "<html><body bgcolor=#ffffff>" & "<br clear=all><p><FONT size=+1><B>Title:</B>" & theabstract & "</pre></body></html>"
			create record with {name:record_title, type:html, source:thesource, URL:complete_link, comment:record_comment} in thedestination
			--create record with {name:record_title, type:nexus, URL:tex_link, comment:record_comment} in tex_sources
			beep 1
		end repeat
	end tell
end RecordSelectlinks
-----
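--selecpart: returns the text between the first delim1 and the delim2 that follows it; raises an error if delim1 is not in thistext
on selecpart(thistext, delim1, delim2)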
	set currentdelim to AppleScript's text item delimiters
	set AppleScript's text item delimiters to delim1
	set thispart to the second text item of thistext
	set AppleScript's text item delimiters to delim2
	set thispart to the first text item of thispart
	set AppleScript's text item delimiters to currentdelim
	return thispart
end selecpart
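
For anyone new to AppleScript, the selecpart handler is the part of the script that does the HTML scraping: it splits the text at the first delimiter, keeps what follows, then splits that at the second delimiter and keeps what comes before it. Below is a minimal sketch of the same idea that should run in Script Editor without DEVONthink. The handler name text_between and the sample string are made up for illustration, and returning an empty string when the first delimiter is missing is an assumption of this sketch, not something the script above does.

```applescript
-- Hypothetical standalone variant of selecpart, for illustration only.
-- Returns the text between the first delim1 and the delim2 that follows it,
-- or "" when delim1 does not occur in thistext.
on text_between(thistext, delim1, delim2)
	set saved_delims to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to delim1
		set thispart to text item 2 of thistext -- errors if delim1 is absent
		set AppleScript's text item delimiters to delim2
		set thispart to text item 1 of thispart
	on error
		set thispart to ""
	end try
	-- always restore the delimiters, even after an error
	set AppleScript's text item delimiters to saved_delims
	return thispart
end text_between

-- made-up sample markup, not real output from the server
set sample to "<html><head><title>math.OA 0603</title></head></html>"
set found_title to text_between(sample, "<title>", "</title>") -- "math.OA 0603"
set missing_part to text_between(sample, "<h1>", "</h1>") -- "" because <h1> is absent
{found_title, missing_part}
```

Restoring the text item delimiters in every code path matters because they are global to the running script; leaving them changed after an error can quietly break later list-to-text coercions.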


I’m just a new/trial DT user, so perhaps getting ahead of myself, but I couldn’t agree more with this observation!

The ideal for me would be to drop the database concept, and continue to refine DT’s “meta-Finder plus document viewer” capability. I really only want to deal with ONE set of documents - the originals - but also retain DT’s amazing ability to classify, re-organise and preview those documents.

I’m sure it’s much safer to have DT acting only upon its own database, rather than messing around with the disk directory itself.

BUT if DT simply maintained its own disk directory of chosen documents (i.e. imported references rather than imported documents), and independent of the OS, then one could choose to view that (within DT) or, if needed, go to the Finder to see the full, original structure.

Obviously there would have to be some kind of sync enabled, so that DT knew when any documents in its own directory had been moved or deleted - but as long as the user could choose when to invoke it (or at least, at what intervals) I don’t think it would bog things down at all.

Of course, all this is easy for me to write…but no doubt not so easy to programme! But I think this kind of shift would really maximize the potential of this already-useful programme.

SN

Version 2 will do this (more or less) - the database will contain only metadata, the contents will be files (and therefore you can easily edit them with any application), version 2 will synchronize the filesystem and its contents (e.g. rename external files when the content is renamed in DT), will be Spotlight compatible and will support smart groups (without needing Spotlight).

Version 1.1 of DT Pro will be the first step towards version 2 by removing all those nasty copy/don’t copy preferences (and some more confusing preferences), by combining the File > Index and File > Link To commands, by improving File > Synchronize and by making indexing compatible with phrase/wildcard searching.

Sounds a bit like iViewMedia Pro - not only for pictures???

I didn’t dare to dream of it… in this case there is only one question left: When will those versions (both 1.1 and 2.0) be available?

Happiest regards,
Christian

A universal binary of V1.1 will be available later this month.

Question 1: Duplicates. Will v2 (or even 1.1) provide a way to eliminate duplicates when synchronizing? Ideally, this would take place recursively at the lowest selected level of both the DT database and the source folder (so, for example, if you select the entire database, all groups will be searched as if they were ‘flat’).

Here’s an example of how that would be useful: I imported a couple of thousand docs and meticulously reorganised them within DT. However, I then lost track of which folders I already had imported from, so I “cleverly” decided that simply synchronizing the database with the source data would take care of it. Surprise! - hundreds of dupes - very tedious to deal with using the little script.

Question 2: File name colours. After messing up my data with all these dupes, I noticed that many of the duplicate file names were in red - but I thought that dupes were supposed to be in blue. At one point it appeared that red also indicated unsupported formats, but many of these open just fine in DT. I can’t seem to find an explanation of this in the help files. Could anyone explain the meaning of red file names?

Question 3: Indexing. In my quest for a ‘meta-finder’ non-database way of working with DT, I keep coming back to using indexing. I’m aware of the limitations on searching if the originals are changed, and loss of the easy portability of a unified database. But, what about a sequence of indexing, organising within DT, then exporting the entire new organisation outside of DT? Would there be anything lost by following this procedure multiple times (until v 2.0 comes out!)?

Thanks for any thoughts.

As all the ‘old hands’ already knew, I eventually discovered (from the tutorial file, not the help file) that red indicates “replicants”. While I still don’t understand why imported dupes would be tagged as replicants sometimes and dupes at others, at least I understand what it means…

I’ve spent some time testing this, and it has worked quite well. One gets most of the advantages of DT, along with the flexibility of keeping the docs in the Finder. Documents not directly supported by DT were correctly resolved on export, creating a nicely re-organised structure.

It sounds like something like this (only more elegant and more sophisticated) is at the root of v2.0, which is great - but in the meantime, using Indexing instead of Importing seems to be a very convenient and surprisingly efficient workaround - at least for my purposes.

SN

The synchronization of V1.1 will be more sophisticated & reliable and therefore won’t create duplicates anymore if you’ve reorganized (e.g. grouped) the contents of imported/indexed folders.

Once V1.1 is available and makes indexing compatible with phrase/wildcard searching, the only remaining disadvantage of indexing will be that indexed material can’t be edited.