New User Question - OCR/Scan only available on DTPO?

I am a new user, using the free demo but ready to buy very soon. Can somebody clarify the OCR/scan capabilities? On the DT comparison page it says that this is only available on DEVONthink Pro Office, but I’ve seen other places where (I thought) DT Pro users were scanning. I am just an individual and don’t have a huge office, but I would like to be able to scan documents into DT to make my own office paperless. Will it really cost me $50 just for this feature?

Thanks!

OCR is included only in DEVONthink Pro Office.

Thanks Bill,

I think I was confused by this, in part because I haven’t really used OCR in many years.

I have an HP OfficeJet Pro L770, which supports Mac. It has included software that allows you to scan and archive, and wireless direct digital filing to a preset network destination, all at the touch of a button on the OfficeJet, so that if a preset destination is set up, the computer only has to be on to receive it. The HP does the rest. It can also duplex scan (I think), and has the ReadIris Pro OCR software built in. This is one of the reasons I bought this printer, but to date I have not used this feature.

Once the document is scanned, can I select DEVONthink as the program I want it edited in? I don’t know if this is more a function of the HP side of things, which may or may not recognize DT, or whether I should be able to select any destination software that I want.

Do you think I could specify the destination folder as the DT inbox? If so, could I run the OCR software from there, or would I have to send it somewhere else first, run OCR software, and then drop it in the Inbox, or other DT destination?

Direct digital filing is one of the main reasons I bought this all-in-one, though I have not used it at all since purchasing it, in part I think because I didn’t have any sort of organizational and management system for the scanned data. DT seems to fit the bill perfectly, though I’m just trying to determine if I’d be bypassing this cool feature of the OfficeJet if I purchased DTPO, or if, since I do have this advanced feature on the HP, I would be enhancing what I can do by purchasing the Pro Office version (or will DTPO even work with the HP?). Or, if they won’t work together, it seems that I would have greater functionality by purchasing DTP and keeping the direct digital filing and OCR capabilities of HP and ReadIris separate, then moving docs into DT from there.

I’ve searched the DEVONtechnologies site to find out if my HP is directly compatible with DTPO, but I don’t find any information about this. So, since I haven’t used this feature and because I don’t know if my HP can directly integrate with DTPO, I’m seeking wisdom and feedback from those of you who are familiar with this aspect of the software. Based on my setup, can you tell me which option you think might be better: HP direct digital filing and then importing into DEVONthink Pro, or having my HP integrate directly with DTPO, the combination giving me a very solid set of features and capabilities?

Any advice or suggestions are greatly appreciated.

If you use the ReadIRIS OCR software, I suggest that you designate a folder other than the ‘Inbox’ folder into which to save the scanner output. (If you saved the image file resulting from scanning to the Inbox folder, IRIS would not be able to find it, as the image file would be immediately sent to your DEVONthink Pro Global Inbox.)

After OCR, you can capture the searchable PDF into DT Pro.

I’m going to offer some opinionated comments on the HP scanner. I think you’re talking about a similar scanner to mine (I have an HP OfficeJet L7780).

In short: that scanner is a massive pain in the neck and I won’t use it for documents ANYMORE, period, full stop.

The digital filing seemed useful. However, to make it work, you have to share an SMB filesystem out there. It doesn’t support AFP shares. The sheet feeder has a tendency to jam, and when it jams you pretty much have to restart the scan job from scratch, or try manually combining PDFs later, since it just says “Gee, I’m sorry, I quit.” The scanning is SLOW. It can scan at a pretty decent quality, which is good for images, but overkill for documents. For some reason, when I scanned from digital filing, it often got the output dimensions of the PDF wrong: I’d scan an 8.5"x11" piece of paper, and it would claim it was 17"x22" or larger when the PDF came out. That gave me problems doing stuff in other programs like Acrobat… But it was doable.

The software HP wrote that came with it was junk, and most of it stopped working when Snow Leopard came out. I never used ReadIris (though they keep spamming me to upgrade) so I can’t say how well that works. The documentation suggests that it won’t OCR colored text, and it will convert text to b/w, but I don’t know if I believe that.

For me, the HP was decent if I only had to scan one or two things a month, but wasn’t worth using to try to have a digital office. When I really wanted to go paperless, I got a Fujitsu ScanSnap, which fixes all the above problems. (See other posts in this forum about it.)

The good news for you is that just about any OCR software can be used with DTPro, and people here have probably used it. A common thing to do is have the scanner output stuff to the same directory each time (easy with Digital Filing) and have a folder action or Hazel watch the folder and kick off OCR, and then have it import into DT when it’s done. It works quite well.
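If you’d rather not set up a folder action or Hazel, here is a rough Python sketch of the same watch-folder idea. It is only an illustration under some assumptions: the function names are made up, `run_ocr` is a stand-in for whatever OCR tool you actually use (the sample `copy_without_ocr` just copies the file and does no OCR at all), and `import_dir` is assumed to be a folder that DEVONthink picks up, such as the Inbox folder mentioned earlier in the thread.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, List, Optional, Set


def pending_scans(watch_dir: Path, seen: Set[Path]) -> List[Path]:
    """Return the PDFs in watch_dir that have not been handled yet, sorted by name."""
    return sorted(p for p in watch_dir.glob("*.pdf") if p not in seen)


def process_scan(pdf: Path, run_ocr: Callable[[Path, Path], None], import_dir: Path) -> Path:
    """Run the OCR step on one scan, writing the result into import_dir.

    The original scan is left where it is.
    """
    import_dir.mkdir(parents=True, exist_ok=True)
    target = import_dir / pdf.name
    run_ocr(pdf, target)
    return target


def copy_without_ocr(src: Path, dst: Path) -> None:
    """Placeholder OCR step: copies the file unchanged (no text recognition).

    Swap this for a function that calls your real OCR tool.
    """
    shutil.copy2(src, dst)


def watch(
    watch_dir: Path,
    import_dir: Path,
    run_ocr: Callable[[Path, Path], None],
    interval: float = 10.0,
    max_cycles: Optional[int] = None,
) -> Set[Path]:
    """Poll watch_dir and process each new PDF once.

    Runs forever unless max_cycles is given. Note that this simple version
    does not check whether the scanner has finished writing a file.
    """
    seen: Set[Path] = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for pdf in pending_scans(watch_dir, seen):
            process_scan(pdf, run_ocr, import_dir)
            seen.add(pdf)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval)
    return seen
```

To try it, something like `watch(Path.home() / "Scans", Path.home() / "Scans-OCRed", copy_without_ocr)` would check the first folder every 10 seconds; both folder names are hypothetical, so point them at wherever your scanner saves files and wherever DT imports from.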

The advantages with DTPO are:

  • You can OCR within DTPO, so if you have a PDF that you got somewhere that wasn’t OCRed, you can right-click to OCR it.
  • You can let DTPO manage the OCR queue.
  • With the ScanSnap, you can hit the button on the device to automatically scan and OCR into your DTPO inbox.

Advantages to OCRing outside DT:

  • You might already have an OCR engine and not want to buy another. For example, Acrobat Pro, PDFPen, Abbyy Finereader, IRIS, or even VueScan.
  • A different OCR engine might work better for you. Many folks like the Abbyy that comes with DTPO, but other engines may work better in certain situations, or offer more control.
  • Other engines might be faster. The Omnipage engine that comes with PDFPen, for instance, is multithreaded and can use all your CPU cores. The one on DTPO is not.

Just to add a note on the original question ($50 just to add OCR): the Office version also includes some additional features that make importing email much nicer, including an import plug-in for Mail.app.

Thanks so much for the feedback. Ironically, since posting my last comment, I was wondering about OCR of existing PDFs, i.e. those you don’t scan yourself. So your point is a good one, alanshutko. And I have the exact same OfficeJet as you; I typed the model number incorrectly in my post. And I was disheartened, to say the least, when I read about your experience with it. I haven’t even used that machine much, as I tend to use my Color LaserJet 1600 for all printing, and scan/OCR was something that I was “going to get around to one of these days.” It never dawned on me that it would even be problematic. I’ve always been of the opinion that if it’s an HP printer, it’s the best. I’m very sad to have to rethink that, but thanks for the heads up nonetheless. You’ve given me very helpful information, though almost more than I can wrap my head around at the moment. That’s a good thing though.

I appreciate your detailed information on the setup. Unfortunately I’m stuck with it for the moment as I just lost my job (which is why I was hoping I wouldn’t have to buy DTPO), but at least I know what to expect. And it sounds like my best bet is to go with DTPO for long-term options and the ability to OCR existing PDFs, and think about a ScanSnap later as finances permit.

And Greg, to your point about the other enhancements of DTPO, I hadn’t really thought that archiving mail would be something that I would have a need to do, but I can now think of certain cases where that could be helpful. I was just really hoping to save some money, but your point is a good one. And I think that both DTP and DTPO allow multiple databases, do they not? I can’t imagine needing to use more than a couple, but what are the constraints in terms of number of allowable databases?

And thanks to you too Bill for your help.

With either Pro or Pro Office, the number of open databases would only be limited by the speed and memory of your Mac.

So I’m assuming you guys are using a ScanSnap? If so, which one? I was reading reviews of the S1100 and S1300, and they talk about the auto align feature of the S1100. Is this also in the S1300? One reviewer also says: “the software “sits and waits” between receipts you feed through, until you click “finished”. This is a FAR superior software feature to the larger S1300-- because it takes a few moments to put down & pick up another receipt. The scanner sits patiently and waits for you to feed through as many receipts or pieces of paper as you want, with no urgency.”

What do they mean by this? Does the 1300 somehow turn off, or if you don’t get the second page in fast enough, does it break up pages that you want to be a multipage PDF into individual pages?

Thanks!

I use an S1500M, which is great for heavy-duty large-load processing. The software has an option in Scanning to “continue scanning after current scan is finished”. This means the scanner will not close and save the PDF until the size limit is reached (up to 1,000 pages and/or 1 GB). Obviously, you can terminate this well before 1,000 pages.

Would you think that the 1300 has the same feature?

I’m also using an S1500. I believe the S1300 would also have the “wait for next page” feature, since that’s strictly in the software. I think all the ScanSnap scanners automatically adjust for orientation, skew, etc. At least, the S1500 does, and I’m pretty sure the S1300 does as well. The main differences I’ve seen between the models are the speed of scanning and the size of the document feeder.

GeekyGirl, don’t get TOO down on the L7780. It’s a good printer and a pretty decent flatbed scanner. With normal weight letter paper, you can definitely use the document feeder for a lot of stuff. Since it’s all you have right now, it’s worth using. I really started to hate it when scanning magazine-weight paper because it jammed a lot more often.

So definitely try things out. You can probably get a pretty decent workflow going, but there may be more manual work required than with a ScanSnap (for example, merging PDFs or rotating pages). Fortunately, DT makes it very easy to make those changes.

Thanks. Don’t worry. I’m not going out to buy one now. I’m just checking out my options to see what would be best. That way I can keep an eye out for any sales or promotions, etc. I’m the type that researches everything to death and then goes out and buys on instinct. Yes, I know, very logical. But that instinct is always based on my research.

I’m assuming that you meant that PDFs can be merged (and I’m assuming split as well?) within DT?

Did you choose 1500 over 1300 because you do a lot of high volume work?

I had downloaded the trial version of DT Personal, but based on this thread I’m going to try out the Pro Office version, as I want to play with OCR on existing PDFs. I’m fairly certain that’s the one I’ll buy, again based on the great feedback on this thread. I’m hoping moving files from Personal to Pro Office won’t be too difficult.

Yes, you can combine PDFs (select them, then right-click and choose Merge), split PDFs (open one, select the page, then right-click and choose Split), rotate pages, reorder pages, and annotate them. It’s great.

I got the S1500 because I wanted the bigger document feeder. I cut up a lot of old magazines and scan them, and it’s a lot easier with the larger capacity feeder than it would be with a smaller one.

I’m pretty sure someone can link you to instructions on pulling your database from DT Personal. As I recall, it’s a standard .dtbase that DTPO can open; it’s just stored somewhere in Library/?

Thanks to the DEVONtech search in the DEVONagent beta, here’s a link!

viewtopic.php?f=3&t=8048&hilit=migrating+database+from+personal

Thanks for the information. And I’ll check out the link.

I hadn’t thought about scanning old magazines. Wow, that gives me endless new ideas for how to use the software.