Almost satisfied but not

sgmiller · August 4, 2006, 4:32pm

I started using DA today and I have a pretty good idea of how it works, but I have a HUGE problem with the fact that it doesn’t search the content of anything but HTML files. I do hope that is going to be remedied soon because it is a real limitation for me. Once that is done, this will be close to a great product, but it isn’t there yet.

eboehnisch · August 9, 2006, 11:37am

Which file types would you like to see getting searched by DEVONagent?

Eric.

sgmiller · August 9, 2006, 12:10pm

Well, at a minimum I would say .pdf, .doc and .xls, and then I would add .rtf and .ppt.

Sorry to be so hard on this point, but without these other data types, it is impossible to do what I would call a comprehensive search. There is just way too much info these days in PDF and doc files to have them simply ignored.

Anyway, I plan on doing an in-depth comparison of DA versus Copernic sometime soon, which will mention this and some other points. I think you guys are very close to having the clearly superior product with just this and a couple of other changes.

eboehnisch · August 9, 2006, 12:18pm

The main problem with .xls and .ppt files is that these file formats are proprietary to Microsoft, and that you need special libraries to read them. We’re already looking for suitable frameworks.

Another problem: we’re using Web-based search engines for the initial search. So, if they don’t index, say, .xls files, DEVONagent will never find them, of course.

Eric.

sgmiller · August 9, 2006, 12:25pm

Ok, but then PDF and DOC files should be doable and these are the most important anyway. Of course, Google and the other major search engines index them so that is also not a problem.

ionos · August 9, 2006, 1:13pm

Hello Eric:

I meant to suggest this for DevonThink, too: For most file formats, there are command line tools to convert them to plain text files (antiword, wv, ps2ascii, …). I would be very happy if I could simply tell DT/DA which command line tool to use to convert (and index) certain file types. And a menu item “Reindex” …
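
A minimal sketch of what such a user-configurable converter table could look like, written in Python purely for illustration. Nothing here reflects how DT/DA actually work: the table `CONVERTERS` and the functions `build_command` and `extract_text` are hypothetical names, and the sketch assumes that `antiword` and `ps2ascii` are installed and write plain text to standard output when given a file name.

```python
import subprocess
from pathlib import Path

# Hypothetical user-editable table: file extension -> converter command.
# "{path}" is replaced by the file to convert. Each tool is assumed to be
# installed and to write plain text to standard output.
CONVERTERS = {
    ".doc": ["antiword", "{path}"],
    ".ps": ["ps2ascii", "{path}"],
}


def build_command(path, converters=CONVERTERS):
    """Return the argv list for converting `path`, or None if no tool is configured."""
    template = converters.get(Path(path).suffix.lower())
    if template is None:
        return None
    return [part.replace("{path}", str(path)) for part in template]


def extract_text(path, converters=CONVERTERS):
    """Run the configured converter and return its output (None if unsupported)."""
    command = build_command(path, converters)
    if command is None:
        return None
    # check=True raises CalledProcessError if the converter exits with an error.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Supporting another format would then only mean adding one more line to the table, and the returned text could be handed to whatever does the indexing.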

Cheers
i

eboehnisch · August 9, 2006, 1:24pm

Well, do you have CLI tools for Excel and Powerpoint as well?

Eric.

ionos · August 9, 2006, 1:51pm

There are some on dataconv.org/apps_office.html. From the FreeBSD port package for xlhtml: “Convert Excel and PowerPoint files to HTML and text”. Don’t know what Excel/PowerPoint versions it works with, but for me, anything helps.

Cheers
i

eboehnisch · August 9, 2006, 1:53pm

ionos: Thank you! We’ll check them out!

Eric.

Bill_DeVille · August 9, 2006, 3:24pm

PDF files are a very tiny fraction of the files on the Net, but the most important alternative to HTML for “upfront” data dissemination. If I had to vote for one additional file type searchable and downloadable in DEVONagent, it would have to be PDF.

Word files are declining on the Net as a primary means of posting information, and I think the trend is accelerating. Same for Powerpoint, as it’s becoming very common to post PDF versions of Powerpoint presentations. I think the number of DA users that would need direct searching and download of Excel files would be vanishingly small.

At this point I want to distinguish between file types posted as the primary communication medium with site viewers, and additional file types that are listed, e.g. via an HTML page, as available for download. This latter group includes a very broad spectrum of common and not-so-common file types. I see .rtf, Excel, Word, Powerpoint, and dozens of other file types as included in this category, including – among the Mac community – file types of considerable significance such as Pages, KeyNote, OmniOutliner, and so on.

DEVONagent can directly search HTML. So it might be argued that one would miss important PDF, Word, and other information resources that cannot be directly searched and downloaded by DEVONagent.

As a practical matter, that’s really not true. The reality is that DEVONagent searches are one of the richest sources of new PDF, Word, RTF, and other file types that I add to my DEVONthink Pro databases.

That’s so because most “good” Web sites list and describe such linked file resources. Because DEVONagent actually downloads the search results to one’s computer, those links are included, and the descriptions are of course searchable when I send my important DA searches over to a DT Pro database.

Example: Suppose I’m looking for technical and policy information on the European Union’s Web site. That information is available in PDF format. DEVONagent can’t search and download PDF files. But DEVONthink Pro can. If I send HTML pages with linked PDFs from DEVONagent to DT Pro, I can search the descriptions and download the PDF (or Word, or Powerpoint) files to my database, where the text is not only searchable but also benefits from useful artificial intelligence features that help me analyze the content.

Suppose I’m led to an interesting Excel file. DT Pro can’t read the Excel file. But it can open it under Excel and capture a version as PDF that can be read and analyzed. So it has ways of capturing information of significant importance from almost any file format that’s readable on the Mac.

Suppose I’ve sent over from DA to DT Pro a page that links to hundreds of Word and PDF files. I can use a script to automatically add them to my database.
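
To illustrate the link-collecting step such a script needs, here is a minimal stand-alone Python sketch. It is not the DEVONthink script mentioned above, and it does not talk to DT Pro at all. The names `DocumentLinkCollector`, `find_document_links` and `DOCUMENT_EXTENSIONS` are made up for this example, and it assumes the documents of interest can be recognized by a .pdf or .doc extension in the link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: linked documents are recognizable by their file extension.
DOCUMENT_EXTENSIONS = (".pdf", ".doc")


class DocumentLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links that point to document files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the address of the page itself.
        absolute = urljoin(self.base_url, href)
        # Compare only the path, so query strings do not hide the extension.
        if urlparse(absolute).path.lower().endswith(DOCUMENT_EXTENSIONS):
            self.links.append(absolute)


def find_document_links(html, base_url):
    """Return the document links found in `html`, in page order."""
    collector = DocumentLinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

The resulting list of URLs is what a download-and-import step would then work through.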

Yesterday, I created a new DT Pro database containing more than 10,000 HTML pages, which consisted of the search results from two DEVONagent searches. Those searches were the start of a new database on heavy metals, starting with lead and mercury. DEVONagent had already filtered out junk pages and I’ll do some further filtering and organization in DT Pro, with the end result a very useful reference collection.

I simply can’t overstate the importance and usefulness of the teamwork between DEVONagent and DEVONthink Pro. The combination lets me capture and mine information from documents in ways that Copernic can’t approach.

I hope that teamwork is included in your comparison of Copernic and DEVONagent.

sgmiller · August 10, 2006, 4:46pm

I agree that PDF is the most important, but even if .doc is a “declining” format, it still is very important and I often find critical information in this format. XLS and .ppt are not as important but should be included for comprehensiveness.

As for downloading linked PDF, that doesn’t cut it for me. Much of the time I am looking for a name which would almost never appear in any kind of description of a file. I need the entire document to be searched in order to find what I am looking for.

No, I won’t be including DEVONthink in my comparison. I can see what you are saying, but I don’t use the program and I want to do an even comparison. Of course, DA’s integration with both the Mac OS and DT is a plus, but the outcome will depend on many factors including this one.

Bill_DeVille · August 10, 2006, 6:56pm

Fair enough.

But I hope that your review mentions that most people who use DEVONagent for Web research also use its companion program, DEVONthink Pro, and that DEVONthink Pro does download and handle PDF and Word formats with ease.

From DEVONagent, one can send content to a DEVONthink database or initiate a search in the database.

From DEVONthink Pro one can initiate a Web search in DEVONagent.

Copernic lacks such an interactive companion application.

sgmiller · August 10, 2006, 8:58pm

I certainly will mention that and clearly, along with the Mac OS integration, it is a positive point for DA.

Look, I really want to like, make that love, DA. I switched from Microsoft because my frustration with XP had grown to monumental proportions and the Intel processor made it possible for me to make the switch without giving up anything. I will say more about this when I get a chance to write my review, but there are just a couple of things holding me back from making the complete switch from Copernic to DA. I think these can and, from what I understand here, will be addressed but I can’t ignore them. I make my living doing research so I have to make sure I have the bases covered.

More later.