Good afternoon! I am new to DT and have a few questions.
I imported about 90 PDFs by dragging them in bulk onto the DT icon. I was unsure how to have the PDFs OCR’d on import so that I could search them, so once they were imported, I selected them all and went to Data > Convert > to Searchable PDF.
What I ended up with was two copies of each PDF in DT. Both say “PDF + Text.” However, one has yesterday’s date, and the other has the original date the PDF was placed on my computer (so I assume this one is the original). The file sizes are different. For example, I have one PDF + Text that is 253.7 KB (the original, I assume, based on the date stamp) and another PDF + Text that is 4.4 MB with yesterday’s date. The two PDFs also have different word counts, and the difference is not consistent. For example, for the PDFs referenced above, the original has 8639 words and the one with yesterday’s date has 8650. With another PDF it’s reversed: the original has 9082 words and the copy with yesterday’s date has 8743 words.
I’m confused as to why the word counts differ at all, and why the difference is inconsistent between the original PDFs and the copies dated yesterday. I also don’t need both copies, and I’m unclear why this occurred and which one I should keep. Could you advise what I did wrong? Ironically, when I ask DT to check whether these are duplicates, it doesn’t treat the paired PDFs that way, though clearly they are, except that one is much larger in size with a different word count.
If an imported document only says PDF, is it fair to assume it has not been OCR’d? I tried selecting Convert > to Searchable PDF, but the “kind” still showed only PDF. Can you advise?
What is the best way to import existing PDFs from my hard drive into DT so that they are automatically OCR’d and searchable?
What are the pros and cons of having one massive DT database with groups for different things (e.g., journal articles, GTD reference, personal finances such as tax returns and receipts) versus multiple databases (e.g., one database for all research/journal articles, a separate one for GTD reference material, a separate one for personal finances, etc.)?
Finally, if I import an entire folder of documents, does the folder become a group by default or do I need to select the folder and tell DT it is a group?
Many thanks in advance for your help!
Smitty