An oldie but still pertinent…
If you are collecting hundreds of documents a week but not reading them, then you are a librarian or an archivist. Which are noble occupations! But being a very good archivist is not the same thing as being an expert in whatever your subject is.
These days I’m not quite as interested in knowledge, I’m more interested in understanding.
As for the OP
Theoretically, there are many ways to organize information:
- Classification. You find criteria by which one set of things differs from another, starting from the top, all-inclusive level and moving down to atoms, as in biology (in pre-molecular biology it was a major scientific method).
DT solution: groups. A very powerful tool in DT, even if all the files in there are named File1, File2, File3 and so on. The same can be done right in the filesystem with folders, but with one shortcoming: any change to the name of any subfolder in the hierarchy will break all links to the files. Not in DT.
- Multiple classification. When you cannot make a single hierarchy and the criteria are not mutually exclusive. The exact representation may depend on the situation or the angle of view.
DT solution: groups, replicants, tags, groups as tags. You cannot do this in the filesystem with just folders without duplicating information. But it is easy with DT: you can make multiple representations of the same files depending on the situation and build alternative hierarchies, without duplicating files, re-grouping on the go without corrupting any links. Groups as tags make it very easy to sort new files/groups/replicants into those groups.
- Indexing. Coding information with specific address data. Almost all of it is done automatically behind the scenes, except metadata.
DT solution: core functionality of DT: power search tools, See Also, metadata, custom metadata.
Some users on this forum go to the extreme and place all files in one or a few groups for the whole DB, because the search functionality in DT is that powerful (advanced search, smart groups, See Also). Some argue that this is much faster than browsing group structures.
- Networking. Linking arbitrary pieces of information.
DT solution: wiki-linking, linking any DT items using stable universal item IDs inside DT and, via URI schemes, outside it; info in the inspector about incoming and outgoing links and mentions.
You can build anything from interactive tables of contents to complex neural nets with your DB info, DBWideWeb )
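To make the difference between folder paths and stable IDs concrete, here is a toy Python model. It is only a sketch of the idea, not how DEVONthink is implemented; `Item`, `Database` and all method names are made up for illustration. Each file is stored once, groups only hold references to it, and links point at the ID rather than at a name or path.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    # Stable identifier: links point at this, never at a path or a name.
    item_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class Database:
    """Hypothetical toy store: one copy per item, many group memberships."""

    def __init__(self):
        self.items = {}   # item_id -> Item (each file stored once)
        self.groups = {}  # group name -> set of item_ids

    def add(self, item, *group_names):
        self.items[item.item_id] = item
        for group in group_names:
            self.groups.setdefault(group, set()).add(item.item_id)
        return item.item_id

    def rename_group(self, old, new):
        # Only the label changes; the item IDs inside are untouched.
        self.groups[new] = self.groups.pop(old)

    def resolve(self, item_id):
        return self.items[item_id]

    def groups_of(self, item_id):
        return sorted(g for g, ids in self.groups.items() if item_id in ids)


db = Database()
link = db.add(Item("File1"), "Taxes", "2024")  # one file, two groups
db.rename_group("Taxes", "Finance")            # would break a path-based link
```

After the rename, `db.resolve(link)` still returns the same item and `db.groups_of(link)` lists both groups, while `db.items` holds a single copy. A link written as a folder path would have gone stale at that point.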
Yes, Albert Einstein was such a librarian in a patent bureau at first
Seriously though, I just mean that failing to see this “boundary” as a necessary step from nothing to knowledge is no less a problem than stopping at it for too long.
Literally, knowing begins with defining what you don’t know. You have to acknowledge first that it exists, you just don’t know it (but you “know about it”). That’s it.
When your little child comes to you and says that he understands nothing, it often means: “please do my homework for me”. You just say: I will help you if you tell me what exactly you don’t understand. Trying to answer your question, he may simply come to realize that everything is clear and there is nothing to ask for help with.
I’m pretty sure that the documents he handled at the patent bureau made little or no contribution to his theories.
It’s a joke of course )
But who knows (see below)…
Yes, but only after you are clear about what knowledge you need to understand fully, what only in part, and what you don’t need at all.
«…
It is conceivable that his labors at the patent office had a bearing on his development of his special theory of relativity. He arrived at his revolutionary ideas about space, time and light through thought experiments about the transmission of signals and the synchronization of clocks, matters which also figured in some of the inventions submitted to him for assessment.
…»
Galison (2000), p. 377
Stay on course, everyone
thank you everyone once again, there are so many things covered here i will need time to ingest all these different approaches (though some overall common approaches i will start to implement, like not dumping every single thing automatically into devonthink )
I do want to emphasize that while i am a full time academic, i guess i’m still old school in the way i collect and store academic related data (papers, data, figures, manuscripts), and this makes me think i need to revisit that at some point as well…
My concern with this initial post (and thus the bankruptcy part…) relates to all other non work/academic aspects of life such as:
- my PKM (personal knowledge management) where i have many files (.md/pdf etc) on home maintenance, programming, Mac, food recipes, tourism etc etc etc
- all the tens of bills, mortgage, tax, tech invoices (and here also work reimbursement i guess) that appear daily in my email/web
- family stuff like health insurance, documents, house etc
these are the areas where i literally get tens of emails/web downloads a day (i’m sure it’s not just me :)). up until now they have all gone into the devonthink inbox, and then when i have the courage (and time) i manually sift through them (or, if possible and the PDF is in english, use the amazing smart rules)
that’s what my initial questions about automation approaches were targeting. Most of these “life” documents have to be stored long term in devonthink for one reason or another
appreciate all the responses again and happy to hear more approaches (And some code/real life examples) if people have found zen approach to all this !
Z
- my PKM (personal knowledge management) where i have many files (.md/pdf etc) on home maintenance, programming, Mac, food recipes, tourism etc etc etc
- all the tens of bills, mortgage, tax, tech invoices (and here also work reimbursement i guess) that appear daily in my email/web
- family stuff like health insurance, documents, house etc
To me, this immediately suggests multiple databases.
#1 feels like a catch-all database or multiple, like recipes, interests, etc. Also, a home database could be useful or integrated into #3.
#2 Finance or business, potentially one database for each
#3 A family database, potentially a health and a family database.
Also see the Building Your Database section of the built-in Help and manual.
I think these divisions are even more important when mobile is involved. There is no need to carry all your data, all the time, e.g., on your phone. Do you need a PDF about San Juan Capistrano when you’re at the doctor’s office? How many times have you been asked for your 2007 tax return? Do you do your finances on your phone, sitting in a Starbucks?
Watch someone with an out-of-control wallet or purse, fumbling through nonsense they don’t need to carry, desperately trying to find something: lipstick, driver’s license, kleenex, aspirin, whatever. How effective is having all that extraneous stuff? And before someone jumps onto the “Well, this is digital so that doesn’t apply!!”, I’d plant my flag that it certainly does apply. Having a bigger wallet doesn’t mean you should just add more junk to it. It just means you have some extra space should it be needed.
A strength and weakness of DEVONthink is it allows you to efficiently gather and store tons of data. Not all that data is useful, and much of it is impermanent.
Well actually…as an insufferable coffee snob, only in a Starbucks under duress, but,
I am always dealing with receipts* on my phone. I have an iOS shortcut that asks me for some basic values (total, tax, tip, vendor, category) and then formats it as text in the comment section and adds the PDF to DTTG.
Then on desktop a script extracts the comment and enters the values into Custom Metadata fields and files by date and category.
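For anyone wanting to try something similar, here is a minimal Python sketch of the parsing half of such a script. The `key: value` comment layout, the `Receipts/<category>/<year>/<month>` destination and the function names are all assumptions for illustration; the actual comment format and script are not shown in this thread, and writing the values into Custom Metadata would still need DEVONthink's own scripting.

```python
from datetime import date
from decimal import Decimal
from pathlib import PurePosixPath

# Fields treated as money amounts (assumed field names).
NUMERIC_FIELDS = {"total", "tax", "tip"}


def parse_comment(comment):
    """Turn 'key: value' lines into a dict; amounts become Decimal."""
    fields = {}
    for line in comment.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a 'key: value' separator
        key, value = key.strip().lower(), value.strip()
        fields[key] = Decimal(value) if key in NUMERIC_FIELDS else value
    return fields


def filing_location(fields, received):
    """Build a group path by category, then year and month (assumed layout)."""
    category = fields.get("category", "Uncategorized")
    return str(
        PurePosixPath("Receipts", category, f"{received:%Y}", f"{received:%m}")
    )


# Hypothetical comment as the phone-side shortcut might have written it.
comment = "total: 42.50\ntax: 3.40\ntip: 8.00\nvendor: Cafe Luna\ncategory: Meals"
fields = parse_comment(comment)
location = filing_location(fields, date(2024, 3, 9))
```

Amounts are parsed as `Decimal` rather than `float` so that totals add up exactly later on.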
It’s a great way to make use of little bits of time and to avoid too many loose ends.
Right now I am experimenting with adding a link to receipts for events (meals, movies, concerts, plays) to my daily journal. So I can look back and see how the day went and what I was up to even if it didn’t warrant an actual journal entry. Sometimes I just watched a movie and didn’t have anything to say about it.
*Receipts come as scans of paper, texted links to a webpage that I can screenshot, emails,
Amen! Decent lattes, but dreadful coffee IMO.
Back in the days when Katie was still doing Mac Power Users, she and/or David talked about using Hazel to automatically rename and file documents. If I recall correctly, s/he (or they) used Hazel to OCR (if necessary), then extract the date and the name of the service provider, like the electricity company or the bank; rename the file in the chosen format (I think yyyy-mm-dd and the name, for example); then file it.
Macsparky might have better instructions in his Hazel or other book(s).
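The renaming step described above can be sketched in a few lines of Python. This is not Hazel's rule syntax and not the podcast's actual setup, just an illustration: the provider names are invented, and the date pattern assumes statements print dates as mm/dd/yyyy.

```python
import re
from datetime import datetime

# Hypothetical provider names to look for in the OCR'd text.
PROVIDERS = ["Acme Electric", "First Example Bank"]

# Assumes dates appear as mm/dd/yyyy; adjust for other layouts.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def suggest_name(text, extension=".pdf"):
    """Return 'yyyy-mm-dd Provider.pdf', or None if either part is missing."""
    match = DATE_PATTERN.search(text)
    provider = next((p for p in PROVIDERS if p.lower() in text.lower()), None)
    if match is None or provider is None:
        return None
    month, day, year = (int(part) for part in match.groups())
    try:
        stamp = datetime(year, month, day).strftime("%Y-%m-%d")
    except ValueError:
        return None  # digits matched but do not form a real date
    return f"{stamp} {provider}{extension}"


sample = "ACME ELECTRIC\nStatement date: 03/09/2024\nAmount due $81.20"
new_name = suggest_name(sample)
```

Returning `None` when the date or provider cannot be found leaves the document for manual filing instead of guessing a name.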
I have been using HoudahSpot for global search.
Since macOS Sonoma, it no longer searches Apple Mail.
Does Foxtrot really search emails in Apple Mail too?
If so, given that I was perfectly happy with HoudahSpot (until it stopped searching Apple Mail), do I really need Foxtrot Pro instead of Foxtrot Personal? I am usually just trying to find a document on a subject.
Hazel has no OCR function built in. You must use other tools for that, perhaps in connection with Hazel.
My suggestion is for you to invest some time in Tesseract. It has the potential to OCR almost any language in the world.
I work on Amharic. I use Tesseract to OCR Amharic documents and search them using Foxtrot.
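As a rough illustration of that kind of setup, the Python sketch below builds a Tesseract command line that writes a searchable PDF next to the scan. It assumes the `tesseract` binary is on the PATH and that the Amharic language data (`amh`) is installed; the file name and helper names are made up, and the exact workflow used above is not described in this thread.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image, language="amh"):
    """Build the CLI call: tesseract <image> <output base> -l <lang> pdf."""
    image = Path(image)
    output_base = image.with_suffix("")  # Tesseract appends .pdf itself
    return ["tesseract", str(image), str(output_base), "-l", language, "pdf"]


def ocr_to_pdf(image, language="amh"):
    """Run Tesseract and return the path of the searchable PDF it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    subprocess.run(tesseract_command(image, language), check=True)
    return Path(image).with_suffix(".pdf")


# Hypothetical scan; only the command is built here, nothing is executed.
command = tesseract_command("scans/letter.png")
```

Calling `ocr_to_pdf("scans/letter.png")` would then produce `scans/letter.pdf`, which a search tool can index like any other text PDF.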
I find Personal absolutely worthless. Pro is the way to go.
I’m afraid I can’t remember. I don’t use Apple Mail, I use MailMate, which is perfectly searchable.