Classify Blank

Hello,

How long does it take for Classify to start making suggestions? I just imported a folder structure of bills and financial data into a new database, and then dropped a few files that need organizing into the inbox. I expected DEVONthink to recognize the kinds of files in each folder and suggest where to file the new ones, but the Classify panel is blank.

Perhaps I’m being too impatient? Or maybe my organizational structure doesn’t suit DEVONthink’s AI?

Thanks,

Jon

Are the bills you imported, and the new files you’ve added to your inbox, PDFs? If so, do they contain selectable text? You can check by looking at the file type in DEVONthink, which will show either PDF or PDF+Text, or simply by trying to select some text. If the type is just “PDF”, there is no machine-readable text and Classify won’t work. If the files are all PDF+Text, then we have more troubleshooting to do!

Check that all of your bills are PDF+Text.

Yep, every file is a PDF, and all the PDFs are OCR’d before they’re imported into DEVONthink. I use a ScanSnap and its bundled OCR software.

However, I was just reading the thread Auto Classify failure, and it looks like I’m having the same issue and simply misunderstanding how the software works. Apparently the AI doesn’t work well with bills, so perhaps that’s my problem too.

Note, though, that the poster in the linked thread said their PDFs were not OCR’d.

When I look at the bills from my phone company, water utility, electric utility, security service, Internet service provider, etc., I conclude that, other than my name and address, one or more dates, and one or more dollar amounts, there’s little in common among the text of those bills. Each company uses different terms to describe the services it provides. In other words, there is little topical coherence in the contextual relationships among them, and those relationships are what the Classify assistant analyzes. If I were to file each company’s bills into its own group (limited to bills from that company) in my database, Classify would become more useful, but I prefer not to do that.

Classify works wonderfully in some of my databases that organize their contents into topical categories, for example scientific papers and policy documents on environmental matters such as the effects of invasive species on populations of native species or the environmental impacts of strip coal mining. The contextual relationships among the documents in each such group become quite distinctive in the terms used, including how likely particular terms are to appear together. DEVONthink’s AI algorithms “see” different patterns of text content in each group, and can then make useful suggestions about where to file a new document about, say, a case history of strip coal mining problems. I’ve got hundreds of such topical groups in one database, and Classify is a useful assistant there.

But I could make Classify useless for that same collection of documents by reorganizing it into groups that were not topical, for example one group per publication date. The topical coherence among the contextual relationships of the contents of each group would be destroyed.
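To make the idea of topical coherence concrete, here is a toy nearest-centroid classifier. It is not DEVONthink’s algorithm, which isn’t described in this thread; it’s a minimal sketch, assuming plain word counts and cosine similarity. The function names (`word_counts`, `group_profile`, `suggest`) and the sample groups are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    # Lowercase alphabetic tokens only; real tools also drop stop words, stem, weight terms, etc.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two word-count vectors (0.0 means no shared words).
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_profile(docs):
    # Hypothetical "profile" of a group: the summed word counts of all its documents.
    profile = Counter()
    for doc in docs:
        profile.update(word_counts(doc))
    return profile


def suggest(groups, new_doc):
    # Rank groups by how closely their combined vocabulary matches the new document.
    v = word_counts(new_doc)
    scores = {name: cosine(v, group_profile(docs)) for name, docs in groups.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical topical groups, loosely modeled on the examples above.
groups = {
    "Invasive species": [
        "invasive species displace native species populations",
        "native species decline after invasive predators arrive",
    ],
    "Strip mining": [
        "strip coal mining damages streams and soil",
        "reclamation of strip coal mining sites",
    ],
}
new_doc = "a case history of strip coal mining problems"

for name, score in suggest(groups, new_doc):
    print(f"{name}: {score:.2f}")
```

Because each group has a distinctive vocabulary, the new document clearly matches "Strip mining" and shares no words with "Invasive species". If the same documents were regrouped by publication year, every group would mix both vocabularies, and the scores for a new document would sit much closer together, which is the same problem a single group of bills from unrelated companies has.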

Bill, thanks for the explanation, that sounds perfectly sensible. I’ve already got a system in place using Hazel for automatically filing my bills in the filesystem, so I think I’ll just keep that as it is and concentrate on using DEVONthink for my technical research.

Thinking about topical groups, I now see a few places in my database where I can help the text analysis engine along. Thanks again!

(P.S. I imagine you’re getting tired of explaining how the AI works. How many times have you had to do this? 🙂)