Import or Scan In

Hello, I am a bit confused about how to use DT Pro. I have quite a few invoices/statements that I scanned as PDFs with my ScanSnap over the last few years. Some are OCR'd with PDFpen and others I haven't gotten to yet. I also have quite a few documents that I still need to scan into PDFs. I am unsure how to proceed. Should I scan directly into DT and OCR there, or scan as a PDF into my home folder, OCR it with PDFpen and then add it to DT, or what? I'm also not sure how to handle all of the existing PDFs, both the OCR'd ones and those not yet OCR'd that are just sitting in my home folder. Will I have duplicates, one copy in my home folder and another in DT?

lifehacker,
I too am new to DT and have quite a mix of PDFs. Although this may not answer your question directly, it may help you make a decision. Here is a link to another thread:
viewtopic.php?f=2&t=11564&p=54301&hilit=scan#p54301

Like me, it sounds like you may have to import your documents in more than one way. Most of my PDFs did not have OCR performed on them when they were created, so I imported them by choosing File > Import > Images (with OCR).
My newly created PDFs were imported by starting the scan from the scanner: I let the scanner's software perform the OCR and then send the result to DEVONthink.
Hope this helps.

Also

I scan directly into DTPO and OCR from there. The ABBYY OCR engine is high quality, and you save that extra step!

For any non-OCR’ed docs you have, just dump them into DTPO, right-click and “convert to searchable PDF.” You may also want to take a look at the OCR tab in preferences. I like to keep the conversion quality at 100% and move the original to trash.

Can you explain what you mean by "dump them"? Do you mean import? Does this create two copies, the original and a DT copy?

Yes, by “dump” he meant Import.

It’s up to you whether to keep the original image-only PDF or not. See Preferences > OCR. There’s an option to move the original to Trash after OCR.

Note that you may add a sortable "Kind" column to a view window to help identify image-only PDFs that you may wish to run through OCR in order to convert them to searchable PDFs. To add that column, choose View > Columns > Kind.

The Kind of an image-only PDF is “PDF”.

The Kind of a searchable PDF is “PDF+Text”.

If I scan directly into DT, will I have access to a PDF that I can use outside of DT, in case I ever switch to something else? I'm not big on having my data tied to one application; I'd rather be able to use it with any application that opens and searches PDFs. In other words, does DT create a scanned file that only it can use, or can I open it with anything else? If it's available to other apps, do I need to export it out of DT (out of the database) first, or is it in some folder?

A DT database is basically a folder. As long as you can locate the path within that folder, you can get at the PDF. I have even used this on my Windows machine: I just sync the database over and use another search tool to locate the document.
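For anyone curious what "getting at the PDF" inside the database folder could look like in practice, here is a minimal Python sketch. It assumes the database is stored on disk as an ordinary folder, as described above, and that the PDFs sit somewhere inside it as normal files. It does not use any DEVONthink feature, and the function names and example path are hypothetical. It only copies files out and never modifies the database; as pointed out further down the thread, the copies come out as a flat list without your group structure.

```python
import shutil
import tempfile
from pathlib import Path


def find_pdfs(package_dir):
    """Return every file ending in .pdf (any case) anywhere under package_dir, sorted by path."""
    root = Path(package_dir)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf")


def copy_pdfs(package_dir, dest_dir):
    """Copy all PDFs found under package_dir into dest_dir as a flat list.

    Files with the same name get a numeric suffix (bill.pdf, bill-1.pdf, ...)
    instead of overwriting each other. Returns the list of copied paths.
    """
    sources = find_pdfs(package_dir)  # collect first so new copies are never re-scanned
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sources:
        target = dest / src.name
        n = 1
        while target.exists():  # pick a free name rather than overwrite
            target = dest / f"{src.stem}-{n}{src.suffix}"
            n += 1
        shutil.copy2(src, target)  # copy2 also preserves modification times
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Hypothetical real use (path is an example only):
    # copy_pdfs(Path.home() / "Documents" / "Invoices.dtBase", Path.home() / "Desktop" / "PDF backup")
    # Demo on a throwaway folder standing in for a database:
    with tempfile.TemporaryDirectory() as tmp:
        fake_db = Path(tmp) / "Example.dtBase"
        (fake_db / "a").mkdir(parents=True)
        (fake_db / "a" / "statement.pdf").write_bytes(b"%PDF-1.4\n")
        (fake_db / "notes.txt").write_text("not a PDF")
        for path in copy_pdfs(fake_db, Path(tmp) / "export"):
            print(path.name)
```

Before pointing something like this at a real database, it's probably wise to quit DEVONthink or work from a backup copy, and to double-check that the folder you give it really is your database.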

On a side note, has anyone found anything half decent for Windows? I am trying Benubird but it’s not even close to being half as great as DT…

I think I found the answer I was looking for. Playing around with DT, it looks like you can export the files. I did an export to a folder on my desktop and the 3 files in DT were exported there. Is there any downside to doing this to create a backup of just the PDFs? I'm asking strictly from the point of view of having the PDFs outside of DT.

Well, it's more work for no real benefit. You can just back up the whole database, and even if disaster strikes and you have to get all the PDFs out, you can easily do so.

Note that you can forget about file locations within DT: if you want a PDF somewhere else (for example, on an iPhone or a USB stick), you can just drag and drop the file from DT to the destination. I have done this to put files in GoodReader on my iPhone.

There is one advantage to using an export as a backup. If, for some reason, you could no longer run DEVONthink, an exported hierarchy would keep the group/folder/directory structure you had created. Accessing the files within the .dtBase package would lose that directory structure.

I don’t think this scenario is sufficiently likely to worry about, but other people may think differently, depending on the value they place on that organization.