The Classify artificial intelligence assistant works by comparing the contextual relationships of the terms used in the selected document to the contextual relationships of the documents contained in each of your groups.
The more coherent the contents of the documents in each group, and the larger such groups become, the fewer suggestions Classify will make. However, if Classify doesn’t see significant similarity between the new document and the contents of your existing groups, it cannot make a suggestion.
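DEVONthink doesn’t document the internals of Classify, so the following is only a rough analogy, not its actual algorithm. It is a minimal sketch of one well-known technique, nearest-centroid classification with cosine similarity on word counts. Each group’s documents are pooled into a single term vector, the new document is scored against each group, and groups below a similarity threshold are dropped. That last step mirrors the point that Classify makes no suggestion when nothing is similar enough. The function names, the 3-letter token rule and the 0.2 threshold are all hypothetical choices for illustration. Like Classify, the sketch looks only at text content, not at names or other metadata.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Count lowercase alphabetic tokens of 3+ letters (a hypothetical tokenizer)."""
    return Counter(re.findall(r"[a-z]{3,}", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def suggest_groups(new_text: str, groups: dict, threshold: float = 0.2) -> list:
    """Rank groups by similarity of new_text to each group's pooled word counts.

    groups maps a group name to a list of document texts. Groups scoring below
    the (hypothetical) threshold are omitted, so an empty list means "no suggestion".
    """
    doc = term_vector(new_text)
    scored = []
    for name, texts in groups.items():
        centroid = Counter()
        for text in texts:
            centroid.update(term_vector(text))  # pool every document in the group
        score = cosine(doc, centroid)
        if score >= threshold:
            scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, with a "Coal mining" group holding two articles about coal seams and a "Gardening" group about tomatoes and roses, a new note about coal seams found by mining engineers should score only against "Coal mining". A note on quantum physics shares no words with either group and returns an empty list. Adding more on-topic documents to a group strengthens its pooled vector, which is a loose analogy for why coherent, well-populated groups give better results.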
We humans often organize documents in ways that don’t make much “sense” to Classify. Suppose, for example, that we create a group to hold articles written by a certain reporter. If his articles cover a wide variety of topics, and so use terms in widely varying ways, they won’t appear coherent to Classify, which looks for contextual similarities among the documents in a group. Classify looks ONLY at the textual content of a document. It doesn’t look at Name, Document Properties, Spotlight Comments or other metadata. I like that behavior. For our own purposes, such as tracking articles by a certain author, Tags may be a better fit. Or create a group for an author’s documents that holds replicants of items already filed by topic.
But if we create a group that holds documents about coal mining, Classify will fairly quickly begin to find similarities that will lead it to suggest that group as the possible location for a new article about coal mining.
BTW, the disk storage size of a database is almost irrelevant as a measure of its size (assuming you’ve got plenty of free drive and/or disk image space). The single most important measure of database size is the total number of words it contains. The next most important measure is the number of documents in the database. Both are listed in File > Database Properties.
Try this experiment. Select a large searchable PDF file and choose Data > Convert > to plain text. The plain text file will show a file size (required storage space) much smaller than that of the PDF. DEVONthink focuses on the text content of documents and on metadata about them (such as Name, group location, tags, Creation Date, Spotlight Comments, etc.). As each text-containing document is added to a database, it is indexed.
The Concordance contains a list of all the unique words (i.e., strings of alphanumeric characters ranging from 3 to 50 characters in length) in the database. The Concordance also lists how often each unique word is used in the database. DEVONthink “knows” which documents contain each of those unique words.
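To make the idea of a concordance concrete, here is a minimal sketch of the general data structure, not DEVONthink’s actual implementation. It has a word-frequency table plus an inverted index recording which documents contain each word. It follows the word definition above (alphanumeric runs of 3 to 50 characters). Folding case to lowercase is an assumption. It also returns the total word count, which together with the document count gives the two size measures mentioned earlier. The function names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    """Split text into "words": alphanumeric runs 3 to 50 characters long.

    Lowercasing is an assumption of this sketch, not documented DEVONthink behavior.
    """
    return [w.lower() for w in re.findall(r"[0-9A-Za-z]+", text) if 3 <= len(w) <= 50]


def build_concordance(docs: dict):
    """Index documents; docs maps a document name to its text content.

    Returns (frequency, postings, total_words):
      frequency   -- word -> number of occurrences across all documents
      postings    -- word -> set of document names containing that word
      total_words -- total number of words indexed
    """
    frequency = Counter()
    postings = defaultdict(set)
    total_words = 0
    for name, text in docs.items():
        words = tokenize(text)
        total_words += len(words)
        frequency.update(words)
        for word in words:
            postings[word].add(name)  # record that this document contains the word
    return frequency, postings, total_words
```

With two short documents, "Coal mining and coal miners" and "Mining safety rules", the sketch counts 8 words in 2 documents and 6 unique words. It reports "coal" used twice, and "mining" appears in both documents. Short strings such as "of" or "an" fall below the 3-character minimum and are not indexed.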
If you want to boggle your mind, think about what the Classify AI routine is doing when you ask it to suggest possible group locations for a new document. It’s looking at the contextual relationships of the words in that document and comparing them to the contextual relationships within the cluster of documents in each group in the database. It does this much better in a database with tens of thousands of documents and hundreds of groups than in a new database with only a small number of documents.