Simplify Scanning

We have been using DEVONthink Pro Office for the past few years, but other than simple scanning and putting files in folders we have not really made the software work for us.

I wanted to know if it’s possible to have rules or scripts set up that would automatically file the scans in the correct folder based on a word written on the paper being scanned.



The only possibility right now is to use scripting. Here’s an example that will file the selected items to various groups:

-- Keywords to look for, and the group each one files to (same order in both lists)
property pWords : {"DEVONthink", "DEVONagent", "Apple"}

property pLocations : {"/DEVONtechnologies/DEVONthink", "/DEVONtechnologies/DEVONagent", "/Apple"}

tell application id "DNtp"
	set theSelection to the selection
	repeat with theRecord in theSelection
		set theText to plain text of theRecord
		set i to 1
		repeat with theWord in pWords
			if theText contains theWord then
				set theDatabase to database of theRecord
				-- create location returns the group at that path, creating it if needed
				set theGroup to create location (item i of pLocations) in theDatabase
				move record theRecord to theGroup
				-- File by the first matching keyword only
				exit repeat
			end if
			set i to i + 1
		end repeat
	end repeat
end tell


What would that script actually do? How does it know what word to pick to be used for filing?

If you could explain that would be great.



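In this case, it looks in the text of each selected document for one of the words defined in pWords (each acting as a Keyword). At the first word it finds, it creates the matching Group from pLocations in that document’s database, if the Group doesn’t already exist, and moves the record into it.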

Note: This is a simple proof-of-concept script to show how a mechanism might be built. This could stay simple or become incredibly complex.

Comment: Christian’s script does illustrate the possibility of turning over filing decisions to the computer, so that one never has to think about this chore.

Personally, I prefer making my own filing decisions, sometimes with the aid of the Classify assistant.

I do a lot of scans. The scanned documents tend to be of two types, financial (invoices, bills, tax records) that I file by category and year in a financial database; or papers, articles or books that relate to my research interests and that are filed by topic into an appropriate group in an appropriate database.

The design of my financial database makes filing very easy, and I don’t need the computer’s assistance to do that. For example, a receipt from Acme Brick would be dumped into a group created to hold receipts for 2014 (or other year in which it was paid).

The organizational design of my research databases tends to be more complex, based on filing items by their topics. This is done for my own convenience, not for the computer’s. There are usually a lot of groups, some of them with hierarchical structure. In some databases I have hundreds of groups. Occasionally, I find it useful to file a document in more than one group.

I often use the Classify assistant to make suggestions about possible filing locations for a new item in a research database. But I rarely use the Auto Classify assistant, as I do want to maintain some level of control of my topics. I would never turn over all filing decisions to the computer, as might be done using a script to look for keywords and make filing an automated procedure. In a short time, this would result in filing decisions very different from my own.

Let’s go back to that database that holds my financial records. Several years ago a user posted a neat script that looked into terms in the receipt and created a new name for each document based on Vendor, Amount and Date. As the Amount might be in dollars, Euros, etc. the script was designed to distinguish the currency used. As the Date might be written in variant forms, it was designed to recognize the possible variants and present them in a uniform format. Interesting. But when I played with it, there were sometimes errors, so I decided not to trust it, attractive though the idea was. For example, a script is likely to fail in attempts to identify the Vendor, and the Amount and Date it extracts are sometimes wrong.

But, with the simple organizational structure, I sometimes need to manipulate the contents of my Receipts 2014 group. For example, I may need to analyze the total receipts of a project as distinguished from costs of other projects. I don’t have many project categories, and a simple approach is to Tag by project. That will allow a search of the Receipts 2014 group by project.

Here’s a trick that I may use at tax time, or to do other quantitative analysis of costs. DEVONthink isn’t designed for much number crunching. Excel is. Sometimes it would be great to transform my cost records into a form that can be read by Excel.

I spend a bit of time manually renaming some or all of my receipt documents in the Receipts 2014 group. Each such document name will be in this format: VendorName Amount Date. In my case, Amount will be a number expressed in dollar units and Date will be in the form YYYYMMDD.

Example: AcmeBrick 324.96 20140519

Notice that the VendorName item doesn’t contain a space between Acme and Brick. That’s important.
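To illustrate why the missing space matters, here is a minimal Python sketch that runs outside DEVONthink. The function name parse_receipt_name is hypothetical, and the sketch assumes the three fields are separated by single spaces as in the example above. A vendor written as "Acme Brick" would split into four pieces and be rejected.

```python
from datetime import datetime
from decimal import Decimal


def parse_receipt_name(name):
    """Split 'VendorName Amount Date' into (vendor, amount, date)."""
    parts = name.split()
    if len(parts) != 3:
        # A space inside the vendor name produces an extra field.
        raise ValueError(f"expected 3 fields, got {len(parts)}: {name!r}")
    vendor, amount_text, date_text = parts
    amount = Decimal(amount_text)  # exact decimal, suitable for money
    date = datetime.strptime(date_text, "%Y%m%d").date()  # YYYYMMDD
    return vendor, amount, date


print(parse_receipt_name("AcmeBrick 324.96 20140519"))
```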

If I use such a format in receipt names, I can then easily do searches for various purposes. For example, if I want to see all the receipts indicating payments to Acme Brick by year or over all years in the database, I can easily do that. If some of those receipts are for different projects, I can break them out by project. If I want to look at just the receipts from Acme Brick in the month of May 2012, I can do that.
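The same kind of lookup can be sketched outside DEVONthink on a plain list of names. This is only an illustration in Python: the function matching_receipts is hypothetical, and apart from the Acme Brick example above the sample names are made up. Because the Date is YYYYMMDD, a year or a year and month is just a prefix of it.

```python
def matching_receipts(names, vendor=None, date_prefix=""):
    """Return names whose vendor and date (YYYYMMDD) match the filters."""
    hits = []
    for name in names:
        fields = name.split()
        if len(fields) != 3:
            continue  # skip names not in 'VendorName Amount Date' form
        name_vendor, _amount, date = fields
        if vendor is not None and name_vendor != vendor:
            continue
        if not date.startswith(date_prefix):
            continue
        hits.append(name)
    return hits


# Made-up sample names, except the first one.
names = [
    "AcmeBrick 324.96 20140519",
    "AcmeBrick 120.00 20120507",
    "CityWater 48.10 20120512",
]

# Receipts from Acme Brick in May 2012.
print(matching_receipts(names, vendor="AcmeBrick", date_prefix="201205"))
```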

Let’s say that I want to do a report of the total cost for a project in 2014. I do a search in that year’s group by the tag for that project (I’ll use the full Search window and the Advanced button). The result is a list of document names that meet the criterion. But that list is just text strings. How can we transform the list of document names to a form readable by Excel?

Select all the results. Press Command-C to copy them to the clipboard. Create a new text document formatted as plain text (PLAIN is important!). Paste the list into it, and save the document. Now do a search and replace operation on the document, searching for each Space and replacing it by a Tab. Save the document again. Now change the filetype to one reflecting tab-delimited text that can be read by Excel. Read it into Excel. Number crunch as you wish. Slice and dice by sorts, e.g., VendorName, Date, etc.
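For anyone who would rather not do the search and replace by hand, the conversion can be sketched in a few lines of Python. This is not part of DEVONthink. The function name names_to_tsv is hypothetical, the header row is an added convenience, and the running total assumes every Amount is a plain number in a single currency.

```python
import csv
import io
from decimal import Decimal


def names_to_tsv(names_text):
    """Turn pasted 'VendorName Amount Date' lines into tab-delimited text.

    Returns the tab-delimited text and the sum of the Amount column.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["VendorName", "Amount", "Date"])
    total = Decimal("0")
    for line in names_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank lines and names in another format
        writer.writerow(fields)
        total += Decimal(fields[1])
    return out.getvalue(), total


pasted = "AcmeBrick 324.96 20140519\n"
tsv_text, total = names_to_tsv(pasted)
print(tsv_text)
print("Total:", total)
```

Saving the returned text to a file that Excel reads as tab-delimited text gives the same result as the manual steps, with the total available as a quick cross-check.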

I keep a blank Excel sheet template among my Templates. When I want to do such number crunching I’ll call up that Excel template into my working area in DEVONthink and import the tab-delimited text. I keep the sheet in the database.