IRIS, which version?

hwangeruk · January 21, 2008, 12:03am

Hi
I am trialing this software.
I want to run OCR on my PDFs. This software has IRIS built in, and it works well.

Is this the latest version of the IRIS engine?
If so, and DEVONthink is cheaper, why would I buy IRIS rather than just use this software for OCR?

Thanks
Hugh

annard · January 21, 2008, 8:59am

It depends on what you want to achieve. If you want to convert a lot of paper to searchable PDF and these papers are nicely legible, our software is a good choice. If you need to do complex OCR with contrast/brightness adjustment, custom dictionaries and other settings that need to be fine-tuned for optimum results, you’ll want to purchase IRIS.

hwangeruk · January 21, 2008, 10:33am

Thanks for the reply.

I found DevonThink to be very good quality, very fast for scanning.

I am struggling to find a use for the “database” nature of documents.
I can OCR my PDFs, put them in a Mac OS X folder and let Spotlight index and search them.

What is the advantage of putting them in DevonThink DB? (A polite question, I hope that doesn’t seem rude)

Perhaps encryption? And I am locked in to DevonThink if I load all my documents into it.

I’d buy it just for the quick OCR, though.

Thanks for any help you give.

Bill_DeVille · January 21, 2008, 8:27pm

First, let me emphasize that files entered into a DT database are not “locked in” – they can be exported to the Finder at any time. Actually, PDFs are already stored in the Finder, in the Files folder inside the database package. DEVONthink isn’t encrypting the files stored in the database. So you can completely recover all the files which were imported, as well as all the files you may create or modify inside the database.

Your second question was “What is the advantage of putting them into a DEVONthink DB?” I love that question. No, it’s not rude. Are you in for a response!

I’m going to tell you why and how I use my databases, why I call DT Pro the best research assistant I’ve ever had, and why I don’t use EagleFiler, Yep, Yojimbo, Pages or Together for my own research. Tip: none of the alternatives can handle my main database responsively, and most choke up on attempts to import it. None of them has the AI features that I use very heavily in DT Pro/Office, so none qualifies as a sometimes bright research assistant. All of them have UI deficiencies for my purposes, although I’ll grant that some of them have “prettier” GUIs than does DT Pro.

I’m managing more than 150,000 documents that reside in a number of topically designed DT Pro databases. Almost all of those databases are self-contained, so that I can move them between computers easily. Some are designed for distribution on DVDs or over the Internet as teaching guides or resources. I use some databases frequently, others rarely. I do most of my work on a laptop with 2 GB RAM and a 100 GB HD. I couldn’t fit all my databases on the laptop, so most of them reside on external drives and/or another computer.

I don’t put all of the files that are contained on my computers and external drives into my databases. My databases contain collections of files related to my interests and needs. But I use iTunes to manage my collections of audio and video files, because it’s very well designed for that. And I use Aperture and iPhoto to manage my collections of photos, because they are very well designed for that. I use iCal and Mail to manage calendar events and To-Dos, although I’ll summarize interesting events in a daily journal that I keep in my main DT Pro Office database.

I pretty much live in my main database, both because it contains tens of thousands of references and notes related to my interests in environmental science, law and policy issues (more than 24 million words of text content), and because, as my default database, it’s also the vehicle for pulling in new content both for those topics and for material ultimately destined for other databases. I do most of my drafting inside the database when I’m working on a project.

Let’s explore what I gain by putting material into a DT Pro database, rather than depending on Spotlight to help me find material of interest, and perhaps place a related collection into a smart folder.

First, I gain a lot in search speed compared to using Spotlight searches, assuming my collection isn’t so big that I need Virtual Memory. I’m spoiled. I like most single-term searches to take 50 milliseconds or less, and that happens in my main database. I typically use the Search window (Tools > Search), and I can keep open as many Search windows as I wish while I’m looking for information. If the collection of all my files is analogous to a library, my collection in a topically-designed database is analogous to the case where I, as librarian, have placed a collection of interesting and related books into adjacent shelves in the stacks. But the collection involves many disciplines of environmental science and technology, as well as environmental laws and regulations and policy issues.

Second, I’m in a much richer working environment than is provided by the results of a Spotlight search. For most of my content I can immediately view a search result inside the database, with query terms highlighted. I can use Find to look for an alternate term. I can copy excerpts to the clipboard for incorporation into a new or existing document. I can select a text string and do a Lookup on that string inside the database. I can call up the Classify operation if I wish to file the result document elsewhere, or replicate or duplicate it elsewhere. I can invoke See Also to see a list of contextually similar documents in the database. For a long document, perhaps a book-length PDF, I can select a paragraph, section or chapter and see a list of similar documents contextually related to that excerpt. I can examine a list of topics or words that an AI feature considers significant in the content. Often, I will select some or all of the search results and move them or replicate them into a new group for organizational purposes, especially for a new project.

If I wish, I can tag some or all of the results of a search by replication or duplication into a new group, or by invoking a script that will allow me to insert a searchable text string into the Comment field, or by adding a searchable Label or State to the selected items. Most of the time, I don’t want such tags to be permanent, precisely because most of the time a new project involves looking at my collection in a way that’s different from previous investigations, so the old tags would tend to get in the way of a fresh set of insights. For that reason I’ll usually clear my main database of such tagging added for a project by resetting Labels and States to the default None (Label) and Off (State). Often, I’ll first spin off a group containing all the project files into a new database, perhaps keeping just the final article or report in my working database.

More about tags: Yes, the next major upgrade of the DEVONthink applications will add tagging features. I will probably make little use of them, beyond the approaches noted above. I’ve been working around computer information systems since the 1960s. Back then, if you wanted to find something, tagging was a necessity. Now it’s not a necessity, and it often gets in the way of doing productive work. Tagging is not a waste of time for a few items. But if you are working with thousands of new items added to a new database, tagging requires a lot of drudgery by the user, and – especially important – tagging schemes can never be logically adequate for managing material that can be approached from many differing perspectives. It’s easy and perhaps useful to tag the photos and notes resulting from my trip to Malta last March. But tagging a research collection of reference materials is quite a different beast, and that’s what my main database is. I would find tagging much more useful for a different kind of database, such as my financial records including checking account reports, expenditures and tax records – I do use tags in that database. And I use tags within a “project” group created within my main database, as noted above, precisely because a project group has limited objectives and I can use tags as mnemonic devices in that context. Will I ever attempt to tag all of the content in my main database? No, because I would find the effort counter-productive. I’ll confess that I usually have thousands of items that haven’t yet been organized in my main database, and I’ll probably never catch up with organization. I don’t want to be a slave to filing chores. The database is still a useful working environment. Searches, See Also and See Selected Text are not limited by incomplete organization of material.

Third, DT Pro/Office as a research assistant: In the long-ago past I’ve been a research assistant. Later, I had research assistants. Typically a research assistant has the assignment of finding useful material and mining specific information out of that material in support of a research or writing project. Research assistants usually live a life of drudgery, involving finding material, reading it and extracting gems of information. Because research assistants (human) have to eat, clothe themselves and have some sort of shelter, they cost a considerable amount of money.

Whenever I received the work output of a research assistant I’ve had to do some reading, thinking and evaluation.

Nowadays I use DT Pro/Office as my research assistant. It’s amazingly inexpensive, but I rate it as perhaps the best one I’ve ever had. It manages my collections and rapidly searches material, and I make a great deal of use of See Also for finding insights into relationships among ideas.

See Also isn’t trained in any discipline. But it’s very quick in looking for relationships among the uses of words in my large collection of references and notes. It makes a lot of dumb suggestions, but also a lot of suggestions that turn out to be very useful.
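As a rough illustration of what "looking for relationships among the uses of words" can mean, here is a minimal sketch that ranks documents by how similar their word usage is. This is not DEVONthink's method, which this thread doesn't describe; cosine similarity over raw word counts is swapped in as a simple stand-in, and the function names and the tiny three-document "database" are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count its words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def see_also(target, documents):
    """Rank every other document by similarity to documents[target]."""
    target_counts = word_counts(documents[target])
    scores = [
        (name, cosine_similarity(target_counts, word_counts(text)))
        for name, text in documents.items()
        if name != target
    ]
    # Most similar first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical miniature "database", for illustration only.
documents = {
    "wetlands": "wetland habitat loss reduces bird population growth",
    "population": "bird population dynamics depend on habitat and food",
    "tax": "tax records and checking account expenditures for the year",
}

ranking = see_also("wetlands", documents)
```

Here the "population" note ranks above the "tax" note because it shares words with the "wetlands" note. A scheme this crude also explains the "dumb suggestions": shared vocabulary does not guarantee a meaningful connection, so the reader still has to judge each hit.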

It’s my responsibility to understand the disciplines that I’m researching for a project, and to evaluate the suggestions made to me. Sometimes I’ll follow a trail of See Also suggestions. Once in a while I have a “Eureka!” moment – See Also has helped me grasp an insight – perhaps about population dynamics in an ecological setting, or perhaps a fundamental difference between environmental regulations in the European Union and those in the U.S. – that is surprising and useful, something I hadn’t thought of.

I’ve been building that collection of reference materials for years, beginning long before I started using an early version of DEVONthink. Prior to putting them into a database, each PDF or text or HTML file sat on my computer drive individually, each opening in its own application. Yes, I could search across files to an extent. But from the beginning DEVONthink included the ability to search for contextual relationships among documents. As my database grew (and it’s still growing) that feature has become more and more useful to me. It allows modes of interaction with the information content of a database that are not possible with any of the other Mac database applications noted above. That’s why I call DT Pro/Office the best research assistant I’ve ever had.

hwangeruk · January 21, 2008, 11:02pm

Wow, thanks for the “astonishing” reply!

I totally agree with your view on tagging; it takes a lot of maintenance.
I have had this same “academic” discussion with myself when deciding whether to archive my work emails into a hierarchy or just rely on skillful searching.

A hierarchy, like tags, adds overhead, and I have found that dumping all my archived emails into one pot and using smarter searches is more efficient in terms of time spent managing the data. I have yet to fail to find anything, even if it takes 10 minutes of head scratching and searching, and filing would have taken cumulatively much more than that over the last 5 years of mail at this particular company.

You’ve really got my interest up. I need to tinker some more to see what I can get out of it.
Presumably Spotlight is indexing in the background, and I could turn that off once my files are in DT, since I don’t really need to find files otherwise?

Sounds like DT has matured over time. I like iteration rather than revolution.

Thanks for your input.