I have over 5000 files to file and would like to automate placement into the groups

I have about 8 years of downloaded PDF files, Word docs, spreadsheets, etc. and have been sorting them into various folders on my Mac. I have over 5000 still left (already sorted about 6000 or so). It would be helpful if I could place them into DEVONthink and have the program automatically sort them into the folders (groups) that I have created and am importing into DT3. Otherwise it will take me another four weeks full time to sort each of the remaining files. I have looked everywhere but cannot find an answer to this, unless it is so obvious that I am not seeing it. Any help is welcome. Thanks in advance. BN

Welcome @barrynewman

Note the AI isn’t prescient… or sentient so where things are filed is based on existing data in the locations you have manually filed to.
However as you’ve filed many already, the chances of good suggestions are higher.

Note: you do not have to wait to import the files. Classification is done on imported files. I’d import them into the Inbox of your database, select one and open the Tools > See Also & Classify inspector. The top section is where classification suggestions are made. You can even double-click a suggested group to file the current document into that location.

Thanks, but I still have two questions (at least).
First, I understand that the AI is not perfect, but as you noted, I was hoping that the broad array of folders I already have would help with that. But the inspector basically finds where the AI thinks it might go. There is a button to move the file to wherever you choose (I guess you could choose multiple groups). However, do I need to do that for every file then? Or is there a prior step where I can just say move it without the inspection? (I couldn’t find where that option or command was, if it is there.)
Second, what about doing this for multiple files? What if I selected, say, 30 files at a time and wanted DT to move all of them to appropriate places using the AI? Can I do that as a bulk selection, or am I back to moving them one by one, with the intermediate step of the inspector?

Another question- is there some way of better tagging all the current folders and files as they were brought into DT so that they were tagged, and matching new files because easier of more accurate?

If there is some resource that has all of this info, please advise; I could not find it anywhere, either in the user’s guide or in the support groups. Thanks again, Barry

But the inspector basically finds where the AI thinks it might go.

100% correct.

However, do I need to do that for every file then? Or is there a prior step where I can just say move it without the inspection?

I would respectfully say, you should not be thinking about this until you’ve used the classification inspector enough to gauge what it’s suggesting.

Yes, you can classify more than one file via the Data > Classify command. But again, I would not be doing this until you’ve gotten acquainted with what the AI suggestions are across a range of documents.

The Inspectors > See Also & Classify section of the built-in Help and manual covers the AI and this inspector.

Another question- is there some way of better tagging all the current folders and files as they were brought into DT so that they were tagged, and matching new files because easier of more accurate?

Define better tagging and what you mean by “matching new files because easier of more accurate?”

PS: I am curious… You seem in an awful rush to complete this and I would advise you to consider the process and ramifications. Are you on a deadline that requires all this be done by Monday? Are you okay with some things being classified in places you wouldn’t have classified them? Will you be spot checking these after the process is done?

Remember, DEVONthink is still operating on best guesses, even if those guesses are sometimes (or often) eerily accurate.

PPS: I hope my questions and comments aren’t read as brusque. I’m just trying to manage your expectations so you don’t end up with an unwanted situation later on. 🙂

Thanks for the ongoing answers. While not on a specific deadline, I have too many other things going on, and have just spent the last three weeks, working nearly daily, trying to clean up the mess and organize my files so I can move on and do the things I need to do. (Heading a national task force, completing a survey and needing to do analysis, starting another survey, working on a qualitative study that requires interviewing 30 subjects, and on and on…) Being able to use the AI feature would be very helpful, but I agree that if it comes at the cost of losing track of these files, it is not worth it. However, it will likely take me another three to four weeks to go through all the files individually and try to sort them. I should have been doing this all along, but as they say, life is what happens when you’ve made other plans.

I have tried the Data > Classify command, and more often than not it returns “failed” in the log. Are the files that I brought in within folders automatically classified when I do that, or do I need to force DT to classify them afterwards? Is there some way to improve the tagging? (I.e., if the AI is learning, is there some means of having the AI assign more or better tags to each file, so that when you go to search it is easier to find the ones you need, and also easier for DT to recognize where to send a new file without a problematic manual search? Thanks.) Would I need to try to OCR them even if they are editable PDFs or Word files?
From what I can see, and what I am hearing, I think I may just be better off completing my own manual sorting into folders, dumping the folders into DT, and then later seeing if new files added can be moved in with the AI function. By then, it will have a large database on which to work. (I will be adding files into about 6-7 databases; sadly I have a lot of interests/responsibilities 😉)

And while I have you, one other quick question: when it finds duplicates, I am assuming that it includes dups in other groups, even though I intentionally added the file into sometimes two or even up to four folders. Is there some way of limiting the discovery of duplicates to those in the same group?
Thanks again for all your help.

And PS: I was going through some other emails, and unless you have an evil twin, you have answered my emails as far back as 2017. (See how long it has taken me to get to where I am finally entering my files into DT? Sad, I know, but as I said, life is what happens…) Thanks again for ALL your help.

Perhaps I can add a bit of the experience I gained when I adopted a collection of about 37000 “miscellaneous” documents that I had to sort through. They were originally gathered on a Windows machine, where the file system does not offer tagging, so the only sorting structure was nested folders. I essentially had to re-sort everything.

What worked for me, after some trial and error:

  • Import the entire collection in a new DT database.
  • Convert duplicates to replicants.
  • OCR all PDFs (select those where the file type is PDF, not PDF+text). Convert in small batches of about 100, as there seems to be a limit to the built-in library used for the conversion.
  • Now that the AI has text to work with from every document, start sorting documents manually until you have about 15 to 20 representative documents in each of your target folders. Think about similarity here: The AI can only measure “similarity” by looking at groups of words.
  • Start going through your unsorted documents one by one with the “See Also & Classify” inspector open on the right. It will show you suggestions where to file the document. If it guesses correctly, a simple and quick CTRL-C will file the document, otherwise, you can simply double-click on a suggestion or type your intended target above.
  • As soon as you find you press CTRL-C almost all the time, let the AI have a go on batches of documents.
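
To make the point about similarity concrete, here is a minimal sketch in Python of a classifier that compares documents only by the words they share. It is not DEVONthink's actual algorithm (which is not described in this thread); the function names, the sample groups, and the sample texts are all made up for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its words (a simple bag of words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_group(doc_text, groups):
    """Rank groups by how similar their combined text is to the document.

    `groups` maps a group name to the texts already filed there.
    Returns a list of (group name, score), best match first.
    """
    doc = word_counts(doc_text)
    scores = {}
    for name, texts in groups.items():
        profile = Counter()
        for t in texts:
            profile.update(word_counts(t))
        scores[name] = cosine(doc, profile)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample data: two groups with a couple of filed documents each.
groups = {
    "healthcare": ["hospital patient care nursing staff",
                   "patient insurance clinic health"],
    "policing": ["police officer patrol training",
                 "community policing crime report"],
}

ranking = suggest_group("survey of patient care in a rural clinic", groups)
best_group, best_score = ranking[0]
print(best_group, round(best_score, 2))

# A document with no readable text gets a score of 0.0 for every group.
empty_ranking = suggest_group("", groups)
```

In this sketch, a document with no extractable text scores zero against every group, which is one way to picture why the OCR step comes before sorting, and why a group needs a reasonable number of representative documents before its word profile becomes useful.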

That worked for me, and perhaps you can get something out of it… but be aware that I did not require perfect sorting, I was OK with creating replicants in several places in case a document fit into several folders, and I heavily use tagging and searching to find things. Your needs may vary. Good luck!

Thanks for the note. Quick question, though: the PDFs are not OCR’d but are text editable. Do I still need to OCR all of them? I already have dozens of folders created with multiple files sorted into each of them, so I may have achieved a sufficient baseline. Also, as I have different databases (politics, healthcare, computer science, policing (I do work on that), preparedness), can I sort these from one database into the others if they are all open at the same time? Or do I need to presort them into each of the respective databases and then continue? Thanks again, very helpful. Barry

The question is: can you find them when doing a contents-based search?

Also, as I have different databases (politics, healthcare, computer science, policing (I do work on that), preparedness), can I sort these from one database into the others if they are all open at the same time?

Of course, you can. It’s really no different than the Finder in this regard.