"Classify" not working

jrzap · December 21, 2010, 5:06am

Hello, I know this must have been asked before but can’t find an answer and I’m desperate!

How does “Classify” really work?? I haven’t used my database in months, but I do remember that I could choose any file, hit the “hat” icon and there would be 10 to 15 suggestions as to where to move the file. However, now nothing happens! With some files, I am only suggested 3 or 4 groups (and I have many groups/topics), but with most of the files I don’t get anything at all.

-The files are not in the Global database.

-I am using Dropbox to sync with another Mac but they have never been on at the same time.

-My database is 4 GB

-I do have the latest update of DTP

-Using the new 11 inch Macbook Air

Please help. I need to “Classify” a couple hundred researched documents and I don’t want to be “manually” looking where to put each one!

Thanx

jrzap · December 22, 2010, 5:15pm

anyone?
please

Bill_DeVille · December 23, 2010, 4:16pm

The Classify artificial intelligence assistant works by comparing the contextual relationships of the terms used in the selected document to the contextual relationships of the documents contained in each of your groups.

The more coherent the contents of the documents in each group, and the larger such groups become, the fewer suggestions Classify will make. However, if Classify doesn’t see significant similarity between the new document and the contents of your existing groups, it cannot make a suggestion.

We humans often organize documents in ways that don’t make much “sense” to Classify. Suppose, for example, that we create a group to contain articles written by a certain reporter, the author. But if his articles cover a wide variety of topics, thus with widely varying uses of terms, they won’t appear coherent to Classify, which looks for contextual similarities among the documents in a group. Classify looks ONLY at the textual content of a document. It doesn’t look at Name, Document Properties, Spotlight Comments or other metadata. I like that behavior. Perhaps one might better use Tags for our own purposes such as articles written by a certain author. Or create a group for documents written by an author, which holds replicants of items that had been filed by topic.

But if we create a group that holds documents about coal mining, Classify will fairly quickly begin to find similarities that will lead it to suggest that group as the possible location for a new article about coal mining.
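To make the idea concrete, here is a minimal sketch in Python. It is not DEVONthink's actual algorithm, which this thread only describes in general terms. It assumes a plain bag-of-words model with cosine similarity, and the names `STOP_WORDS`, `suggest_groups`, the `threshold` value, and the sample groups are all made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, only so that common words do not create
# false similarity in this tiny example.
STOP_WORDS = {"the", "a", "an", "of", "on", "at", "as", "for", "after"}

def term_vector(text):
    """Count lowercase alphanumeric terms in a text, skipping stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def suggest_groups(document, groups, threshold=0.2):
    """Rank groups by similarity to the document and drop weak matches."""
    doc_vec = term_vector(document)
    scores = []
    for name, texts in groups.items():
        # Pool all documents of a group into one term vector.
        group_vec = Counter()
        for text in texts:
            group_vec.update(term_vector(text))
        score = cosine(doc_vec, group_vec)
        if score >= threshold:
            scores.append((name, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

groups = {
    # A topically coherent group: its documents share vocabulary.
    "Coal mining": [
        "coal mining output rose as the mine opened a new coal seam",
        "miners at the coal mine voted on mining safety rules",
    ],
    # A group organized by author: its documents share little vocabulary.
    "Articles by J. Smith": [
        "the city council approved a new budget for parks",
        "a review of the latest science fiction film",
    ],
}

new_doc = "safety inspectors closed the coal mine after a seam collapse"
suggestions = suggest_groups(new_doc, groups)
print(suggestions)
```

In this toy setup only the "Coal mining" group clears the threshold. The author-based group shares no meaningful terms with the new document, so it is never suggested, which mirrors the behavior described above.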

BTW, identifying the size of a database by its disk storage size is almost irrelevant (assuming you’ve got plenty of free drive and/or disk image space). The single most important measure of database size is the total number of words contained in it. The next most important measure of size is the number of documents in the database. Those items are listed in File > Database Properties.

Try this experiment. Select a large searchable PDF file and choose Data > Convert > to plain text. The plain text file will show a file size (required storage space) that’s much smaller than the file size of the PDF. DEVONthink focuses on the text content of documents and metadata about them (such as Name, group location, tags, Creation Date, Spotlight Comments, etc.). As each text-containing document is added to a database, the document is indexed.

The Concordance contains a list of all the unique words (e.g., strings of alphanumeric characters ranging from 3 to 50 characters in length) in the database. The Concordance also lists the frequency of use of each unique word in the database. DEVONthink “knows” which documents contain each of those unique words.
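As a rough illustration of what such an index holds, here is a small Python sketch. It assumes a simple tokenizer (runs of ASCII letters and digits, kept only when 3 to 50 characters long); DEVONthink's real indexing rules are not documented in this thread, and `build_concordance` and the sample documents are hypothetical.

```python
import re
from collections import Counter, defaultdict

def build_concordance(documents, min_len=3, max_len=50):
    """Return (word frequencies, mapping of word -> set of document names)."""
    frequency = Counter()
    postings = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            # Keep only words within the assumed 3-to-50 character range.
            if min_len <= len(word) <= max_len:
                frequency[word] += 1
                postings[word].add(name)
    return frequency, postings

documents = {
    "mine-report.txt": "Coal output at the mine rose. Coal prices fell.",
    "budget.txt": "The council cut the parks budget as prices rose.",
}

frequency, postings = build_concordance(documents)
total_words = sum(frequency.values())   # indexed words across all documents
unique_words = len(frequency)           # distinct entries in the concordance

print(total_words, unique_words)
print(frequency.most_common(3))
print(sorted(postings["prices"]))
```

The two totals at the end correspond loosely to the word counts mentioned above as the meaningful measures of database size, although the figures DEVONthink reports may be computed differently.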

If you want to boggle your mind, think about what the Classify AI routine is doing when you ask it to suggest possible group locations for a new document. It’s looking at the contextual relationships of the words in that document, and comparing those to the similarities of contextual relationships of the clusters of documents within each group in the database. It does this much better in a database with tens of thousands of documents and hundreds of groups, than it does when you first start a new database with only a small number of documents.

korm · December 23, 2010, 5:25pm

Are your documents scanned text or images? Images won’t classify.

If text, are they PDFs or some other format? If PDFs are they OCRd or not OCRd? PDFs not OCRd will classify poorly or not at all.

jrzap · December 23, 2010, 6:00pm

Thank you!!!

But my problem is that I either don’t get any “suggestions” or only the same 4 groups over and over again.

I’ve tried with documents that I know should get specific group suggestions (in the past I auto-classified similar documents and got a specific group), but I only get the top-level folders where those groups are located, not the groups themselves, which contain similar data and have been suggested before.

So I don’t know if I inadvertently changed something, or if my memory is failing and there is something I am not doing right.

Any help is appreciated.

padillac · December 23, 2010, 9:06pm

I’ve found this approach useful but irritating because if I replicate the document first, and then use classify on the “original” item, the replicant I just created gets deleted. So what’s the best approach for this? Right now I use the tags field to manually enter the name of the author group, and the group name suggested to me by classify. But then that leaves the third “original” replicant in place that I then have to clean up.

I’m interested in hearing about a clean and straightforward approach to using replicants with classify.

korm · December 24, 2010, 11:49am

Following up on my last post, could you tell us about the documents you are trying to classify (and which fail)? What kind of documents (.doc, .rtf, .pdf… or what?). Are the documents images or not?

Have you run Tools > Verify & Repair on your database to see if there are errors to fix?

Have you checked Tools > Concordance to see if the concordance seems reasonably complete (it is, as Bill points out, a factor in classification)?

You might attempt Tools > Rebuild Database.

fmottaz · August 20, 2012, 8:13am

I have a similar problem. I have four large databases where Classify has worked without problems since the beginning. I use it a lot and 99% of my documents are OCRised PDFs.

In the latest database, I only get one suggested group. I first thought I needed to feed more items to the database manually until each group would have enough data. To no avail.

I’ve repaired, optimized, but I’m still stuck and really need this function back.

Greg_Jones · August 20, 2012, 9:35am

Check your groups using the info panel, and make sure that ‘Exclude From Classification’ is not checked.

fmottaz · August 20, 2012, 9:47am

Thanks a lot, that did it, but how could this happen? I’ve never checked that box and did not even know it existed until today.

Greg_Jones · August 20, 2012, 10:17am

Did you originally index these groups? If you index folders from the Finder into DEVONthink, the groups will have classification turned off by default.

fmottaz · August 20, 2012, 12:31pm

Thanks for your reply, but no. I only use internal storage with DTP. Guess it is a glitch or a case of mysterious spontaneous activation (read user mistake somehow). But now I’ll know where to look if something like that happens.

RobMcC · February 1, 2015, 7:09am

Problem: Concordance and Classify & See Also no longer work.

I have always kept them open when adding files, but I have not used Devonthink for a few months. Also messily transferred it across computers while retiring from academic employment.

My files are all pdfs and I always scan any that are image only.

Any suggestions? Should I rebuild my databases? Is this a risky thing to do?