As a non-user who just happened to stumble across this software (but who has read up on it by now), I would love to have your feedback and expectations on whether DT would be able to do what I intend… So the question is not how it is done, but whether you would expect it to be possible (and easy).
Situation: I am the editor of a law journal. I have the last 20+ years of issues as PDF files (10 per year, one per print issue; each issue is approx. 44 printed pages with approx. 250,000 characters and practically no pictures), and I am considering getting DT to make better use of the knowledge contained therein.
Intention 1: use DT for cross-referencing.
Imagine a new decision is handed down by a court. I am working on the text to publish it in an upcoming issue and wish to add references to cases as they were published in my journal (e.g., a judgment from 2022 references a decision from 2007, and I want to say “that was published in my journal in 2008, page 27”). To do that, I would want to search the PDF files (without splitting them up text by text) for the case reference.
Intention 2: research/writing.
I want to write an article and wish to dig through the existing PDF files for relevant information using Boolean operators and things like NEAR. Suppose I have also added search results from DevonAgent to the database for this purpose. Would DT cross-reference the DA search results against the contents of the 20+ years of PDFs?
Intention 3: own database.
There are numerous accessible databases that DA is not able to crawl (login, language, silly interface etc.). Imagine I download all documents from one of these databases that contain a reference to section 1 Sales of Goods Act, say 200 texts in HTML. Next I download all those that reference section 2 Sales of Goods Act, say 180 texts in HTML. Next I … and so on. There will be hundreds of duplicates, because if a text says “section 1 and 2”, the database will return it for both of the first two searches. Add to that the fact that I will search different databases, so the copies will not be exactly alike (one database will abridge the decision, one will provide it “as is”, and a third will add a commentary to it). Will DT mark, highlight, or notify me of such duplicates and provide a feature that lets me get rid of all but one?
Thanks in advance for your thoughts!