I can’t get Auto Group to work on OCR’d PDFs. What’s up?
Only related documents are grouped. Could you send some examples you’ve tried to auto group to cgrunenberg - at - devon-technologies.com? Thanks in advance!
Before I do that, let me share what I found this morning.
First I selected about 170 PDFs (converted to PDF+Text beforehand), ran Auto Group, and got only one group with about 6 related documents in it.
I undid that action and tried again, this time with only about 20 PDFs selected (a subset of the original 170). The results were much better: 4 groups were formed, each containing correctly related documents.
Why did it not work the first time?
BTW: Thanks for your quick reply!
The command depends heavily on the selection and its contents. Therefore it’s hard to tell without having access to the documents.
I understand that the selection and its contents are crucial, but isn’t it strange that I get better results from a small subset than from the whole sample it was taken from?
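For what it’s worth, here is one way a larger selection could make the same two documents look less related. This is only a toy sketch and not DEVONthink’s actual algorithm, which isn’t described anywhere in this thread. It assumes term weights are computed relative to the current selection (plain TF-IDF) and compares documents with cosine similarity. The function names, sample texts, and the 0.3 threshold are all made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight each term by its count in the doc times log(N / document frequency).

    N and the document frequencies come from the selection itself, so the
    same document gets different weights in different selections.
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: count * math.log(n / df[term])
         for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_in_selection(selection, i, j):
    """Similarity of documents i and j, weighted relative to the selection."""
    vectors = tfidf_vectors(selection)
    return cosine(vectors[i], vectors[j])

# Hypothetical sample texts: two related invoices plus two unrelated recipes.
small_selection = [
    "invoice acme march",
    "invoice acme april",
    "recipe soup",
    "recipe bread",
]
# The larger selection contains the small one plus more similar invoices,
# which makes "invoice" and "acme" common and therefore nearly weightless.
large_selection = small_selection + [
    "invoice acme may", "invoice acme june", "invoice acme july",
    "invoice acme august", "invoice acme september", "invoice acme october",
]

THRESHOLD = 0.3  # hypothetical cut-off for "related enough to group"

small_sim = similarity_in_selection(small_selection, 0, 1)
large_sim = similarity_in_selection(large_selection, 0, 1)
print(f"small selection: {small_sim:.3f}, grouped: {small_sim >= THRESHOLD}")
print(f"large selection: {large_sim:.3f}, grouped: {large_sim >= THRESHOLD}")
```

In this sketch the first two documents clear the threshold in the small selection but not in the large one, because the words they share stop being distinctive once most of the selection contains them. Whether anything like this happens inside Auto Group is a guess on my part.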
I tested this by repeatedly importing a group of 88 PDFs collected from the NY Times Civil War blog, then Auto Grouping the imported group. Results vary slightly each time. Always, around 40 PDFs (~45%) are not auto-grouped. Between 17 and 20 groups are created. Since the content has a high degree of similarity, I’d expect that they would all be auto-grouped, rather than merely the ~55% that are.
Also, because Auto Group gives no clue as to why it groups documents the way it does, it’s impossible to suss out what’s different about the ~45% of documents that are not auto-grouped. It would be helpful if Auto Group added tags to the grouped documents and/or created group names that included semantic clues as to the reason for that group.