Help on Classify (AI)

Hi
I just bought DTPO + DTTG after spending 50 hours in the trial. It is an amazing product, and I am using it to move away from Evernote after 10 years of suffering its horrible search capabilities, waiting until DT could properly handle mobile.

However, I am not getting anything out of the Classify functionality. I have migrated my first notebook from Evernote into a DT database and recreated the same folder structure I had been using in Evernote, but for some reason the Classify drawer shows either no recommendation at all or just one group, always the same one no matter the document.

To make myself clearer, the database is a three-level hierarchy with 5 main groups named

A1 - Legal & Taxes
A2 - Real Estate
A3 - Vehicles
A4 - Invoicing
A5 - Investments & Banks

Each of those has at most 2 levels below it, e.g. A3 - Vehicles -> Ford, A4 - Invoicing -> Hardware, or A1 - Legal & Taxes -> PIT -> Taxes 2016.

Each of the final nodes of the hierarchy has a maximum of 60 documents, with an average of probably 30-40.

For any document in the database, Classify just points to the A2 - Real Estate group with a very low confidence rating.

Any ideas or suggestions?

Does the A2 - Real Estate group contain many or huge documents? What kind of documents do the groups contain?

Are these PDF files you are trying to classify?

Hi both and thanks for taking the time on this.

Most of the documents in this database are searchable PDF files. That is true not only of the A2 group but of the whole database, which I would say is about 80% searchable PDFs.

A2 is the biggest group in terms of size:

A1 - 170MB, 289 items in 3 subgroups (94, 18, 177)
A2 - 700MB, 208 items in 3 subgroups (14, 7, 187)

In A2 there is a 359-page, 216MB searchable PDF (which, by the way, is shown as PDF, not as PDF+Text), a 20MB note with 3 huge photos, a 60MB audio file (m4a) and about 7 searchable PDFs of 20MB each. None of these files sit directly in the A2 group; they are inside second- or third-level groups. The rest are normal notes and PDF files.

A3 - 108MB, 48 items
A4 - 16MB, 46 items
A5 - 170MB, 65 items

Does excluding this document from classification (see Info panel) improve the results? What’s usually the top See Also result?

Hi

Regarding the “See Also” feature, I must say it works remarkably well. The first document I see as a suggestion is always the same document I have selected (is this intended behavior?), but the rest of the documents in the “See Also” drawer are very good recommendations; e.g. an invoice for a furniture purchase results in recommendations for other invoices from the same store or to the same purchaser, or even a furniture catalog from another shop, which is awesome.

As for the PDF document, I am afraid excluding it from classification did not help, so I did some extra tests:

1.- Excluding the big PDF document from classification in the Info panel did not improve the situation. I used the PDF smart group in the database to randomly select about 30-40 documents, and the recommendation was always the A2 - Real Estate group. Not any subgroup inside it, but the top group, which contains only subgroups and no documents. I also created a smart group for non-PDF documents and did the same check, with the same results.

Is there maybe a way to force DTPO to reindex everything, something I am not doing after the changes?

2.- Moving this big PDF document out of the database into a new one I created for this purpose showed no improvement. All documents were still showing “A2 - Real Estate” in the Classify drawer.

3.- Moving all PDFs bigger than 10MB to a different database changed nothing. Still “A2 - Real Estate” in the Classify drawer for all of the documents.

4.- I moved the A2 - Real Estate group itself into the new database. This immediately changed the Classify drawer to empty. I reviewed about 80 randomly selected documents from the original database, and the Classify drawer was always empty.

Any ideas?

Check if the PDF is actually searchable. If OCR was done on Evernote’s servers, the data is not local to the file.

Hi Jim
The PDF is searchable. I can open it with Preview and search for any term inside.

However, if you consider my previous post, it seems that this PDF has nothing to do with it, as removing it completely from the database has no impact on the recommendation. It is always “A2 - Real Estate”.

It seems that AI/Classify is not working for me at all: if I remove the whole group, I get no suggestion at all for the rest of the documents in the database, while the “See Also” feature is returning great suggestions.

So basically it is either “A2 - Real Estate” or no suggestions at all in the Classify drawer.

Is there anything else I can do? Is there some kind of reset feature or something I can do on my side to get Classify working, even with “low confidence” results?

Any chance that you could send us a copy of the database? Because it’s hard to tell whether it’s working as expected or not as the results depend on the contents/groups of the database.

I am afraid not; it contains tax reports, customer data and other sensitive information. Is there any related info I can send you without sending the documents? You probably have a way to extract the database's internal structure, data model, keywords, or similar things without the documents themselves.

As the original documents are still in Evernote, I will create a new database with the empty group structure and start adding the documents one by one to see if the drawer starts showing meaningful results at a certain point, unless you suggest otherwise.

You could create a copy of the database package, choose “Show Package Contents” in the Finder’s contextual menu, remove the “Files.noindex” folder from the copied package, and zip the package afterwards. This would be sufficient to reproduce the problem.
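For anyone who prefers to script those steps, here is a minimal Python sketch. The function name `make_metadata_only_archive` and the paths in the usage note are hypothetical; the only detail taken from the thread is that the “Files.noindex” folder is left out. It assumes the database is closed while it is copied, and it never touches the original package.

```python
import shutil
from pathlib import Path


def make_metadata_only_archive(db_package, work_dir):
    """Copy a database package without Files.noindex and zip the copy.

    Hypothetical helper: db_package is the path to the database package,
    work_dir is a folder that receives the stripped copy and the zip file.
    """
    db_package = Path(db_package)
    work_dir = Path(work_dir)
    copy_path = work_dir / db_package.name

    # Skip every entry named "Files.noindex" while copying, so the
    # documents themselves never end up in the copy.
    shutil.copytree(
        db_package,
        copy_path,
        ignore=shutil.ignore_patterns("Files.noindex"),
    )

    # Zip only the stripped copy; the archive is written next to it.
    archive = shutil.make_archive(
        str(work_dir / db_package.stem),
        "zip",
        root_dir=work_dir,
        base_dir=db_package.name,
    )
    return Path(archive)
```

Usage would look something like `make_metadata_only_archive("/path/to/MyDatabase.dtBase2", "/tmp/dt-support")`, where both paths are placeholders. It is worth opening the resulting zip before sending it to confirm that no documents are inside.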

Just let me know the best way to send you the database structure. Do you want me to upload it here?

Thanks for the metadata! Almost all groups are excluded from classification (see Info panel); “A2 Real Estate” is one of the few remaining groups that isn’t excluded. The classification seems to work as expected (as far as I can tell) after changing this setting.
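To see why that setting produces exactly the symptoms described earlier in the thread, here is a toy model in Python. It is not DEVONthink's actual algorithm: the `Group` class, the `suggest` function and the word-overlap score are all assumptions made up for illustration. The only point is that a classifier which skips excluded groups can only ever suggest the groups that remain.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    words: set
    excluded: bool = False  # stands in for "exclude from classification"


def suggest(doc_words, groups):
    """Rank the non-excluded groups by word overlap with the document.

    Toy scoring (Jaccard similarity), used here only as a placeholder.
    """
    ranked = []
    for group in groups:
        if group.excluded:
            continue  # excluded groups can never be suggested
        union = doc_words | group.words
        score = len(doc_words & group.words) / len(union) if union else 0.0
        ranked.append((group.name, score))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


groups = [
    Group("A2 - Real Estate", {"mortgage", "deed"}),
    Group("A3 - Vehicles", {"ford", "insurance"}, excluded=True),
]

# A vehicle invoice still gets the only eligible group, with a low score.
print(suggest({"ford", "invoice"}, groups))
```

In this sketch, every document gets the single eligible group regardless of content, and removing that group leaves the list empty, which mirrors "always A2 or nothing". Clearing the flag on the other groups lets the better match rank first.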

Thanks Christian!

All the groups and subgroups (with a few exceptions) are a result of the import from Evernote.

I have repeated the Evernote import a third time, and I can confirm that when you import from Evernote, all the groups and subgroups created by the import script have the “Exclude from classify” bit set. That has been the root of all my problems with the AI.

Thanks for the time and support!
Emilio

The next maintenance release won’t set this option automatically anymore.

Thank you! This has been driving me crazy for ages.