Overdue DA Review (warning: long post)

sgmiller · December 1, 2006, 11:05am

When I first switched to the Mac in July, one of the first programs I started using was DevonAgent (DA) which I viewed as a replacement for CopernicAgent (CA) which runs under Windows and is the only program that I know of which is comparable to DA. At that time, I felt there were a number of shortcomings in DA compared to CA which I posted about. After that post, Bill advised me not to compare DA to CA without taking into account the capabilities of DevonThink. After using both DA and DT for some time now, I see the wisdom of his advice, realizing that DA and DT are so tightly integrated that I consider them to be part of a single package. So, instead of trying to make a comparison of DevonAgent to CopernicAgent, I thought I would take another stab at a comment on the whole Devon package which I will just refer to as “Devon.” I am posting this to the DA section because most of my suggestions involve DA.

Before I do that, I need to explain a bit about how I use Devon. My work commonly involves gathering a large amount of mostly online information on a given subject, usually an organization or an individual, and then using that information to write a report. I would estimate that about 75% of this material is from webpages, another 20% from Lexis/Nexis and perhaps 5% from paper documents which I scan into PDF files. I usually begin with a DA search using a search set consisting of the largest search engines and then dump the results into DT. I then supplement this material by importing any relevant material from my hard drive (which, incidentally, I search using a superb program called FoxTrot instead of the almost useless Spotlight). I then get to work using DT’s autogroup and classification functions to start categorizing the material into an outline which later serves as the basis for my report. (Since almost all my material is online, I don’t have the problems with sourcing that some users seem to have, since the URL, title, etc. are preserved in the DT database.) All things considered, the Devon software is probably just about perfectly matched to the way I work and far superior in most ways to the combination of CopernicAgent and NetSnippets that I used to run under Windows.

That said, there are a few areas which need improvement:

PDFs and other file types

Since file types other than HTML are basically invisible to DA, they don’t make it into my initial sweep. I can’t just ignore these files for obvious reasons, so I have worked out a way to get them into DT. I do a search on Google and restrict the file type to PDF, for example. Let’s say I get 3 pages with 100 results on each page. I then use the three URLs as input into the Download Manager, which then imports the files into DT, where I dump them in with all the HTML files for grouping and classification. The lack of pre-filtering is not a problem with these kinds of files since, unlike web pages, documents don’t change and if they are unavailable, Download Manager simply doesn’t find them. This system works fine but it is a bit cumbersome. I think it would be better to have an option in DA to bring other file types into the results. As I said, filtering is probably unnecessary so it shouldn’t be so hard to do.
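
The three result-page URLs mentioned above can be generated rather than copied by hand. This is only a rough Python sketch, not part of DA or DT: the function name `result_page_urls` is made up for illustration, and it assumes Google's `q`, `num` and `start` query parameters and the `filetype:` search operator behave as they have historically.

```python
from urllib.parse import urlencode

def result_page_urls(terms, filetype="pdf", pages=3, per_page=100):
    """Build one Google results URL per page for a file-type-restricted search.

    Hypothetical helper: the q/num/start parameters are assumptions about
    Google's search URL format, not something DA or DT provides.
    """
    query = f"{terms} filetype:{filetype}"
    urls = []
    for page in range(pages):
        params = {"q": query, "num": per_page, "start": page * per_page}
        urls.append("https://www.google.com/search?" + urlencode(params))
    return urls

if __name__ == "__main__":
    # Print the URLs so they can be pasted into the Download Manager.
    for url in result_page_urls("Mr X"):
        print(url)
```

Each printed URL corresponds to one page of results, so three pages of 100 results give three URLs to feed into the Download Manager.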

The Funnel v. The Net

I see Devon as a kind of funnel into which you pour a lot of material and get useful results out of the end. For example, I just started working on a project which will involve processing about 1500 documents, which I can’t imagine doing without the Devon capabilities. However, there is a situation which I sometimes face that is the opposite of dealing with a large amount of information. Sometimes there may only be a handful of online sources on a given subject. Hypothetically, let’s say that I am researching Mr. X and there are 10 sources on the entire Internet broken down as follows:

Six html documents, two of which are only in the Google cache
Three PDF documents
One .doc

DA is only going to find 4 HTML documents and will miss the cached HTML and the PDF/doc files. Once again, I am going to have to run a separate Google search to make sure I don’t miss the other file types as well as the cached files. For some people, missing a few documents might be irrelevant but for me, in some situations, it could be a catastrophe. CopernicAgent did find the cached files (as well as the other file types) because the filtering function could be turned off, instructing the software to return all results, even if the pages did not contain the search terms. Again, the workaround is to simply run a Google search in these instances. By definition, there won’t be many files so importing into DT is not a big deal.

Evaluating Plugins

I don’t see any means to identify which plugins are returning which results in DA, so how do I know which plugins are most effective for my purposes? If there is a way to do this, please let me know, but as it stands, I find myself just guessing about the efficacy of the various plugins.

Archives

For me, the archives are almost useless. My searches are often highly related and I usually will want to see a document even if it is in another archive because that document will be relevant to current research. The only time I want to avoid a duplicate is when I am updating a particular search. So, best would be the ability to filter against one or more particular archives. I could work around this by searching the archive first, but it is an additional bit of work and as it stands, there are no Boolean searches possible for the archive. There is another workaround involving making a backup copy of the archive and then deleting all but the relevant archive, but that is both cumbersome and potentially dangerous, risking the loss of the whole archive.
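
To make the request concrete, here is a rough Python sketch of the filtering behaviour I have in mind. It is purely illustrative and says nothing about how DA actually stores its archives: the function `filter_against` and the idea of an archive as a named set of URLs are assumptions made up for the example.

```python
def filter_against(results, archives, selected):
    """Drop results whose URL already appears in any of the selected archives.

    results  -- list of result URLs from a new search
    archives -- dict mapping archive name -> set of URLs (hypothetical model)
    selected -- names of the archives to filter against
    """
    seen = set()
    for name in selected:
        # Unknown archive names are treated as empty rather than as an error.
        seen |= archives.get(name, set())
    # Keep the original result order; only remove known duplicates.
    return [url for url in results if url not in seen]

if __name__ == "__main__":
    archives = {
        "Mr X": {"http://example.com/a", "http://example.com/b"},
        "Other project": {"http://example.com/c"},
    }
    new_results = ["http://example.com/a", "http://example.com/c", "http://example.com/d"]
    # Updating the "Mr X" search: only its own archive suppresses duplicates.
    print(filter_against(new_results, archives, ["Mr X"]))
```

The point of the design is that a document held in a different archive (here `http://example.com/c`) still shows up, while duplicates from the search being updated are suppressed.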

Stability

I find both DA and DT to be a bit quirky. DA almost never actually finishes its searches for me and DT seems prone to corruption in the database as well as being subject to the eternal spinning beach ball for reasons I don’t understand. None of this is crippling but it can be annoying.

Manual

I consider myself fairly intelligent, but often I can’t understand what the help manual is saying. For example, I just read the section on workflows and I still don’t understand what most of them do. This thing needs to be rewritten.

So, all in all, I am very satisfied with the switch now that I understand how DA and DT work together. I would like to see the suggestions I made incorporated but I can basically live with things as they are now, with the exception of the archive problem. This really needs to be fixed, ideally with the selection option or at least the possibility of Boolean searches. Also nice would be the addition of some kind of visual metaphor for organizing in DT, but that’s a big subject and probably would need another post to discuss.

cgrunenberg · December 1, 2006, 11:23am

Thank you for the immense feedback! Some of your suggestions are already in the pipeline.

But a database corruption shouldn’t happen - did you force quit DT Pro or did it crash? If the “spinning beach ball” problem should happen again, please open Apple’s Activity Monitor, select DT Pro, open the Info panel and press the “Sample” button. Then save the sample and send it to us so that we can check what’s causing the issue. Thanks in advance!

sgmiller · December 1, 2006, 11:32am

I am pleased to support Devon in any way that I can since the product has become so integral to my work life. I am told that developers appreciate this kind of detailed feedback, so if it is helpful, I am glad.

As for corruption, I am using the term as a non-programmer so I am not sure if it is technically correct. For example, I recently had a database that was getting impossible to work with due to frequent spinning balls, etc. I did a Verify/Repair and it showed a large number of memory errors so I did a repair and almost all of the problems disappeared, but I still find myself doing a Force Quit from time to time.

I also had another database that simply got trashed. I don’t remember how, but it became unusable. I don’t know who to blame for these problems but after using databases for many years, I do realize they are prone to problems which I call corruption. Whatever problems I have had with DT in this regard, they have not been insurmountable, but I will try to remember to send you the sample so you can check.

Once again, thanks for your quick support. It’s one of the pleasures in using the Mac. So many of the software houses seem to be very eager to please their customers. I often get responses to my queries a few seconds after I send them off!

cgrunenberg · December 1, 2006, 11:37am

Using Force Quit might damage databases, e.g. if DT is currently updating the database files. How much RAM does your computer have and how large is your database (number of contents/words, see File > Database Properties)? If there’s not enough RAM and/or the database is too large, this would explain the spinning beach balls.

Rebuilding the database you’ve repaired might also be a good idea to prevent future issues.

sgmiller · December 1, 2006, 11:47am

My current database is about 172,000 unique words / 3.4 million total words and I have 1 GB of RAM.

valente · December 1, 2006, 11:55am

To sgmiller:

Thanks for pointing out FoxTrot. I’m an unhappy user of Spotlight too and I don’t like NotLight or EasyFind that much either.

Just a tiny, hopeful (but doubtful) question: can you use it to search into the .dtBase?

– MJ

valente · December 1, 2006, 11:59am

I downloaded FoxTrot and I’m doing a “test drive” and – !!! – it does search inside the dtBase too. I’m so happy you pointed this out!!! It may be just what I needed.

Continuing the drive now.

– MJ

sgmiller · December 1, 2006, 12:10pm

With regards to Spotlight, sadly no, it cannot search DT files, but this is part of a larger problem I have had for some time, which is the problem of proprietary databases versus standard file types. That is, even if FT did search DT files, I am unwilling to trust my long-term archives to any proprietary system. I have learned the hard way that I cannot know what program, operating system, etc. I will be using down the road, so any files that need to be archived have to be stored in standard file formats that I can reasonably count on to be generally usable into the future.

My additional problem is that I need to archive online material with at least the URL, and DT can only “stamp” the URL onto exported pages in RTF format as opposed to HTML and web archives. Since many web pages don’t convert nicely to RTF, I have come up with another answer. Even if I have a webpage in DT, I make sure that I create another copy, which I do first by “stamping” the URL/title/date onto the page using a bookmarklet I customized for that purpose and then using a QuicKeys macro to save it to an archive folder on my hard drive. I have all this set up to work under the F1 key so all I need to do is load the page into Safari and hit one key to archive the page. I have the “Add to DT as WebArchive” set up as F3 so I can quickly import the page and archive it to my hard drive with two key presses if I want to do it at the same time.
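
For anyone without a bookmarklet, the “stamping” step can be approximated on an already saved HTML file. The following is a Python sketch of the idea only, not my actual bookmarklet or macro: the function `stamp_page` and the `archive-stamp` class name are invented for the example, and it assumes the page has an ordinary `<body>` tag.

```python
import re
from datetime import date
from html import escape

def stamp_page(html_text, url, title, saved_on=None):
    """Insert a visible source/title/date line at the top of an HTML page.

    Hypothetical helper: it does not touch DT or Safari, it only edits
    the HTML text passed in and returns the stamped copy.
    """
    saved_on = saved_on or date.today()
    stamp = (
        '<div class="archive-stamp">'
        f'Source: <a href="{escape(url, quote=True)}">{escape(url)}</a> | '
        f'Title: {escape(title)} | Saved: {saved_on.isoformat()}'
        '</div>'
    )
    # Put the stamp right after the opening <body> tag if there is one.
    match = re.search(r"<body[^>]*>", html_text, flags=re.IGNORECASE)
    if match is None:
        # No <body> tag found: fall back to prepending the stamp.
        return stamp + html_text
    return html_text[:match.end()] + stamp + html_text[match.end():]

if __name__ == "__main__":
    page = "<html><body><p>Article text</p></body></html>"
    print(stamp_page(page, "http://example.com/article", "Example article"))
```

Because the result is still plain HTML, the stamped copy stays readable in any browser and searchable by any desktop search tool, which is the whole reason for archiving outside a proprietary database.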

It’s more work but I got burned some years back in Windows by having thousands of web pages stored in a program called SurfSaver that didn’t have any way to export the files properly. I did figure out a workaround but since then, I swore that I would always archive pages in normal file formats. My current system is the latest in a series of ways of doing that, but FoxTrot finds the pages under all the ways I have used.

In fact, Spotlight is so bad that without FoxTrot, I could not work on the Mac and would be forced to go back to Windows (NOOOOOO!)

sgmiller · December 1, 2006, 12:12pm

So, FT does search DT? I guess I never tried it but I am glad for you. I still need to archive stuff separately for the reasons I specified, but that is interesting.

sgmiller · December 1, 2006, 12:16pm

Valente…are you sure about that? What does FoxTrot call the file type? Are you sure it is not just finding PDF files stored in the DT package file?

valente · December 1, 2006, 12:55pm

Well… that’s a good question.

I search and FT does find the reference I’m looking for in DTP. The path it gives me is (e.g.) “/Users/xxx/Documents/Dbase/xxx.dtBase/Files/.” I can even search (preview) the file in FT, which is great. (I’m mostly doing this for PDF files that are stored inside a “Files” folder in the .dtBase, but cannot be searched using Spotlight and cannot be browsed via the Finder either.)

However, I’ve been having several crashes. I’m still going to try it a bit longer (a few days, if the trial mode lets me) and if the crashes keep happening I’ll try their support.

If FT really stabilizes I’m going to purchase it. If not – well…

Btw, I also agree with you on “backuping” the info you have in DT (especially since there’s no way now to access externally the files stored inside the .dtBase). I’m doing it regularly to an external drive so my 80GB internal hard drive won’t fill up with duplicates.

– MJ

sgmiller · December 1, 2006, 1:24pm

Sounds like you are accessing only the PDF files as I said. If you manage to “see” another filetype such as html that is stored in the DT database itself, let me know, but I doubt it.

As for FT stability, it is a bit quirky but they just released a beta FC for a new version which I have tested and is more stable. In my experience, it doesn’t crash very much. The index does need to be rebuilt from time to time but that is true of all DTS programs I have found. I just rebuilt mine and it works like a charm. I have it set to ignore everything on my drive except for my data files (and email) which are large (several gigs) because Spotlight is ok for finding the odd random non-work related file.

As I said, FoxTrot is everything Spotlight should have been and I take that very seriously. I need to be able to find things in my archive reliably and so far, FT has never failed. It would be nice if it updated “on the fly” rather than having to update itself periodically but since I use Spotlight for that kind of stuff, it isn’t a big problem for me.

I don’t understand why the DTS market isn’t bigger for the Mac than it is. It is such a hot thing in Windows with at least 5 major competitors, but Vista will probably kill them all off. I refuse to believe that Mac users find Spotlight acceptable with its lack of preview, no proper phrase searching, etc.

valente · December 1, 2006, 1:29pm

One more piece of information:

In FT, the name of the file found in the .dtBase is not the name I gave it inside the database; it’s the original name the file had before it was imported into the database.

For instance:

I find a PDF file in Science Magazine that downloads as “sarticle.pdf” to my desktop. (Normally I don’t open it in Safari since the Acrobat Reader plugin is very slow; alas, the very nice DT script “Save PDF to DEVONthink” that renames it instantly doesn’t work on links.) Then I import it to DTP and change the name inside to “Barker 1997.pdf.” When I run a search in FT, the result it gives me is a file called “sarticle.pdf” inside the .dtBase.

It’s not the best, but I already knew that DTP only renames the file internally. What I like about FT (damn the crashes!) is that I don’t have to open DTP when I’m searching online and find an article that I’m not sure is in the database. I can run a search in FT and instantly know if I have it there or not, thus making the whole process (Find > Check > Download > Archive in DT > Rename) faster.

– MJ

sgmiller · December 1, 2006, 1:31pm

Also, the other argument for storing files externally is that you can do one search with a DTS program such as FoxTrot rather than having to remember to search different places on your drive with different programs.

I am still waiting for a program like Google Desktop Search that will let you know when searching Google that you also have material related to that search term on your hard drive. Even simple notification would be great. However, I am no big fan of GDS by itself because I don’t think the Google concept works well for searching your hard drive where an extensive preview function is far more helpful than relevance rankings which don’t translate well to hard drive searches. (Never found any relevance ranking in any program of any kind to be actually relevant to me!)

valente · December 1, 2006, 1:42pm

You are absolutely right. It doesn’t. However, since my main searching need is on .pdf files, it’s okay. Most probably this has to do with the way pdf v. html files are stored inside dtBase.

That’s good to know. How often does FT crash on you? It’s been happening a lot to me. (If I leave it open, without use, it crashes after a few minutes.) But then again I have a MBP (Intel) and maybe that has something to do with it.

Yup! Spotlight is better than nothing, but its instant accessing to the disk and inability to allow more personalized searches makes it almost unusable for me. (Besides, as a launcher I have the great Quicksilver.)

– MJ

sgmiller · December 1, 2006, 1:44pm

Also, where is this “Save PDF to DT” script? I have heard about it before and asked about it, but I can’t seem to find it. Is it a script or workflow?

sgmiller · December 1, 2006, 1:46pm

Sorry I didn’t answer your question fully. I also have a MBP and FT rarely crashes for any reason and never while just sitting on the desktop. Sounds like a conflict of some kind.

sgmiller · December 1, 2006, 1:49pm

Sorry, I just re-read the email from the FT dev and I can’t post the beta. Forget I said that.

valente · December 1, 2006, 2:02pm

And its indexing abilities make everything much faster! I first started by purchasing iPassepartout because it allowed me to search (and instantly view) PDF files. But it was very slow. Then I tried Yep, but I hated the UI and the general way of behaving.

Finding DT was bliss, really. To organize and search. The only downside is not allowing external searches. However (I hope I’m not mistaken), it seems that will be possible in v. 2.

If FT stabilizes it can be a great complement for now.

The other thing I would welcome would be the ability to comment on the PDF files in DT (even if the comments would only be reachable inside the .dtBase). Doing it in Preview doesn’t work for me.

– MJ

valente · December 1, 2006, 2:16pm

There are actually two scripts that work in Safari. (I’m not sure about other browsers, like Camino.)

Since I don’t know where they are exactly (they are Christian’s work, if I remember correctly) I’ll copy-paste them here for you:

Save PDF to DT (in a Group):

tell application "Safari"
	try
		if not (exists document 1) then error "No browser is open."
		set theURL to URL of document 1
		if theURL is missing value or theURL is "" then error "No page loaded."
		
		set this_name to ""
		repeat while this_name is ""
			display dialog "Saving PDF to DEVONthink Pro. Please enter a file name:" default answer this_name
			set this_name to the text returned of the result
		end repeat
		
		tell application "DEVONthink Pro"
			if not (exists current database) then error "Please open a database before using this script!"
			set theDestination to display group selector "Destination" buttons {"Cancel", "OK"}
			set thePDF to download URL theURL
			set theRecord to create record with {name:this_name, type:picture, URL:theURL} in theDestination
			set data of theRecord to thePDF
		end tell
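	on error error_message number error_number
		-- Error -128 means the user cancelled a dialog, so stay silent in that case.
		if the error_number is not -128 then
			try
				display alert "Safari" message error_message as warning
			on error number error_number
				-- Error -1708 means "display alert" was not understood; fall back to a plain dialog.
				if error_number is -1708 then display dialog error_message buttons {"OK"} default button 1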
			end try
		end if
	end try
end tell

Save PDF to DT (not to a specific group):

tell application "Safari"
	try
		if not (exists document 1) then error "No browser is open."
		set theURL to URL of document 1
		if theURL is missing value or theURL is "" then error "No page loaded."
		
		set this_name to ""
		repeat while this_name is ""
			display dialog "Saving PDF to DEVONthink Pro. Please enter a file name:" default answer this_name
			set this_name to the text returned of the result
		end repeat
		
		tell application "DEVONthink Pro"
			if not (exists current database) then error "Please open a database before using this script!"
			set thePDF to download URL theURL
			set theRecord to create record with {name:this_name, type:picture, URL:theURL}
			set data of theRecord to thePDF
		end tell
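	on error error_message number error_number
		-- Error -128 means the user cancelled a dialog, so stay silent in that case.
		if the error_number is not -128 then
			try
				display alert "Safari" message error_message as warning
			on error number error_number
				-- Error -1708 means "display alert" was not understood; fall back to a plain dialog.
				if error_number is -1708 then display dialog error_message buttons {"OK"} default button 1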
			end try
		end if
	end try
end tell

You’ll have to put them in: /Users/xxx/Library/Scripts/Applications/Safari/
and from my use they don’t work on links (“link save to…”) – you’ll have to open the PDF file in Safari (via the Acrobat plugin) and then use the Safari script.

– MJ