I wish for two features (which are in EagleFiler):
Documents in DEVONthink Pro should be organized in hierarchical folders on the hard drive that reflect the hierarchical organization in the DEVONthink database. Currently, DEVONthink Pro organizes the documents in arbitrary folders in the Finder. Without opening the DEVONthink Pro database, it is difficult to find the document you want.
Have a re-scan action to rescan the document folder for stray files. When using Skim, Skim creates a .skim file in the same directory as the PDF file being read. This .skim file is not listed among the PDF files in the database, though it exists in the Finder. It would be nice for DEVONthink Pro to be able to re-scan its document folders for stray files.
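To make the re-scan request concrete, here is a minimal sketch in Python of what such a scan could look like. It is only an illustration: `find_stray_files` and its `known_paths` argument are hypothetical names standing in for whatever list of files the database already tracks, not anything DEVONthink or EagleFiler actually exposes.

```python
import os


def find_stray_files(root, known_paths):
    """Return files under `root` that are not listed in `known_paths`.

    `known_paths` is a hypothetical stand-in for the set of files the
    database already knows about.
    """
    known = {os.path.normpath(p) for p in known_paths}
    strays = []
    # Walk every folder below the document root and collect unknown files.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.normpath(os.path.join(dirpath, name))
            if path not in known:
                strays.append(path)
    return sorted(strays)
```

A .skim file created next to a tracked PDF would show up in the returned list, and the application could then offer to add it to the same group as the PDF.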
I would like DEVONthink’s Groups to mirror the Finder’s folder structure containing the documents.
Thus, when I create a new group in DEVONthink, a new folder in the Finder appears. When I nest that group inside another group, that folder is moved in the Finder into that other group’s folder. Essentially, I would like DEVONthink to be able to manipulate the location of files in the hard drive after importing them into the database.
Currently, if I create an indexed database by selecting “Index…” and choosing a folder of files, that folder’s structure is static in the DEVONthink database. If I subsequently move a document to another group, its location in the Finder (on the hard drive) does not change. This is not what I want.
Other document organizers such as EagleFiler and Together - which do things that are similar to DEVONthink - are different from DEVONthink in mirroring the Finder’s folder structure in their groups. If a new group is formed, a new folder in the Finder is formed. This way, they allow you to find a document in the Finder WITHOUT having to open up the database. This is VERY HANDY when sharing databases remotely. One does not have to have users of the database own a copy of EagleFiler or Together to find a document. One can also simply use the Spotlight to find the document. One can even use one’s iPhone or iPad to remotely retrieve a document.
This currently cannot be done in DEVONthink. DEVONthink organizes the documents in arbitrary numerically ordered folders as they are imported. When I import a PDF which I have chopped into numerous individual chapters, the chapters are strewn over several folders by DEVONthink. I simply can’t go to the Finder to find the chapters neatly organized in a single folder.
Thus, DEVONthink’s internal organization of its documents is a HUGE MESS. One is highly dependent on DEVONthink’s database to find one’s documents.
And if this database becomes corrupted, the documents are in a messy folder organization rather than one already reflecting one’s group hierarchy.
It would be far easier and far more logical and far more useful for DEVONthink’s groups to actually mirror the folder structure containing the documents.
This is why one long-standing wish of mine, since buying DEVONthink years ago, has been for it to have the folder hierarchy feature that competing applications have.
Create the group in DEVONthink, nest it inside another group, select the nested group, right-click, and select ‘Move to External Folder’. A folder will be created in the Finder inside the parent folder. You can move documents around in a similar manner, by moving them into the database, moving them to another group in DEVONthink, and then moving them back to the external folder. You can attach a script to the group in DEVONthink to automate the process of moving the documents and groups to the external folder(s). Sure, it’s not as simple as what I suspect you would like, but it does work.
The DEVONthink database structure allows you to replicate documents into different groups and also allows you to have documents with the same name in the same group. I’m not familiar with EagleFiler, but I do know that Together is not capable of doing that. Over the years I have exported databases and run them extensively as indexed, and each time I end up importing them again because having the documents directly in the Finder is just too confusing. The ability to replicate documents into multiple groups is a very powerful feature of the program to me, and your documents will never have a 1:1 DEVONthink group/Finder folder relationship if the database groups are mirrored in the Finder.
That is a thoroughly awful way to work. One of my DEVONthink databases has 80 gigabytes+ of information. This technique would force me to duplicate the files every time I want to reorganize files in the DEVONthink database.
This is why it is far simpler for DEVONthink to move the documents and folders around to maintain a hierarchy in the Finder which is similar to the hierarchy in the database.
Replicating documents in DEVONthink is not truly replicating the document. There is only one document store in an arbitrary folder in the actual database.
What DEVONthink does is create aliases of all of its documents - not replicants. And it is the alias you are manipulating in the database. Worse, it doesn’t even italicize the Alias like the Finder would - creating a confusing abstraction. For example, when you trash one replicant and not the others, you do not destroy the original file or even a duplicate of the original, you are destroying the alias - the pointer to the document.
I agree that aliases/replicants can be useful in organizing information. They physically organize the information in a way that cannot be concretely done with tags.
However, I wish I could have my cake and eat it too.
What DEVONthink could do for example is to have a preference setting to allow DEVONthink to organize the real documents hierarchically in the finder to mirror at least one hierarchy created in the database.
The replicants could instead be called aliases and indicated by italicization. Then one can create as many as one wants to organize the information. But at least one knows where a document will really be in the Finder since the non-italicized version would correspond to the real file and its location in the group/folder hierarchy would mirror its location in the Finder. (Thus, with my solution, every group/folder created in DEVONthink would have a corresponding folder in the Finder).
This way, by creating aliases rather than “replicants” which are not replicants since they are not file duplicates, one is more realistically modeling what is actually happening to the documents and not creating confusion.
I suppose we will need to agree to disagree. I believe it would be utterly confusing to change DEVONthink to mirror the Finder’s alias-type system, where if you do delete the original, all of the aliases are now broken and the document is gone. Using a mirrored Finder organizational system, all documents captured to the database Inbox would be the ‘original’ document. If the user then organizes the database by creating replicants (Finder alias) of the document in one or more groups, then the database folders in the Finder would show the original document in the Inbox and all the alias pointers to the original scattered in their respective folders.
However, the user doesn’t want the document remaining in the Inbox - (s)he deletes it after it has been replicated. Since the document in the Inbox is the ‘original’, the document has also been deleted from the database (as well as from the Inbox folder in the Finder) and all of the replicants that were created in DEVONthink are now alias pointers to a nonexistent document. Let’s assume that DEVONthink would warn the user that an original document, with alias replicants, is about to be deleted (the Finder is not as forgiving). Then what is the user going to do - locate where one of the alias replicants is located, and move the original to that group? You think the example I gave on indexed files is an awful way to work with 80+ GB of data? This scenario would be much, much worse.
I believe Christian is pretty smart and I suspect that he thought through the various possibilities when deciding on DEVONthink’s database format. It’s extremely scalable, fast, and dependable. As I mentioned earlier, I have no experience with EagleFiler but I have played with Together - I gather that you have tried out both? How well did they scale to accommodate 80+GB of data? I didn’t have acceptable performance with Together, as it began to choke for me at about 3k documents.
I neglected to comment on this - I believe the above is partially correct while also partially incorrect. The statement that DEVONthink is not truly replicating the document is accurate, as there is only one document stored in an arbitrary folder in the database. However, it is more correct to state that all documents in the database groups are aliases, as they are all pointers to the actual document that is stored in the arbitrary folder of the database.
When creating a replicant, DEVONthink is actually creating a replicant of that pointer. Delete an occurrence of a replicant, then the app deletes that pointer. The user does not need to be concerned with which occurrence is the ‘master’ and which occurrence is an alias - they are all equal. Delete the last pointer and the document is also deleted from the database.
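The pointer behaviour described above can be modelled in a few lines. This is only a sketch of the idea as stated in this thread (equal pointers, document removed with the last pointer); the class and method names are made up for illustration and say nothing about how DEVONthink is actually implemented.

```python
class PointerStore:
    """Toy model: each document is reachable through one or more equal pointers."""

    def __init__(self):
        # doc_id -> set of group names that hold a pointer to the document
        self._pointers = {}

    def add(self, doc_id, group):
        """Import a document: create its first pointer in `group`."""
        self._pointers.setdefault(doc_id, set()).add(group)

    def replicate(self, doc_id, group):
        """Create another pointer to an existing document in `group`."""
        if doc_id not in self._pointers:
            raise KeyError(doc_id)
        self._pointers[doc_id].add(group)

    def delete(self, doc_id, group):
        """Delete one pointer; return True if the document itself is now gone."""
        groups = self._pointers[doc_id]
        groups.discard(group)
        if not groups:
            # That was the last pointer, so the document is removed as well.
            del self._pointers[doc_id]
            return True
        return False

    def exists(self, doc_id):
        return doc_id in self._pointers
```

In this model there is no 'master' occurrence: deleting any one pointer leaves the others valid, which is the difference from a Finder alias that breaks when its original is deleted.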
DEVONthink does assign priority to the first pointer of a document with replicants, as the first pointer will be the occurrence that shows up in a document search. Other than that, the way that DEVONthink handles replicants is completely transparent to the user.
We do not have to agree to disagree. With the features I am proposing, you can completely do what you are already doing currently. The changes would be completely invisible to you.
Again, with the features I am proposing, you can completely do what you are already doing currently. The changes would be completely invisible to you. Thus, they would not be confusing at all.
I am not proposing creating aliases in the Finder. I am simply proposing that creating the folder structure in the DEVONthink database creates an identical folder structure in the Finder.
Then, the first replicant of the document (using your term for replicant - first replicant means the original representation of the document in DEVONthink) can point to the location of the document in the folder structure.
Moving the first replicant to another folder in DEVONthink will cause DEVONthink to move the document to the corresponding folder in the Finder.
This causes the structure of the DEVONthink database to have a mirrored structure in the Finder.
Subsequent replicants can be created (second, third, fourth, etc.). Moving these replicants does not move the actual document in the Finder’s folder hierarchy.
All of the replicants point to the actual document in the Finder as they currently do.
If the first replicant is deleted, the second replicant takes its place. This second replicant becomes the new first replicant. Moving this replicant in the DEVONthink folder hierarchy will move the actual document to the same place in the Finder hierarchy.
Note that not everyone uses replicants. I don’t. I use duplicates instead. The advantage of duplicates over replicants is that I can highlight the text differently in each copy. Replicants are an attempt to save disk space. But with multi-terabyte hard drive storage systems, saving space is not as necessary.
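The rules above can be sketched in Python to show that they are consistent. This is a toy model under my own assumptions: `MirroredDocument` is a hypothetical name, groups are written as slash-separated paths such as "Science/Soil", and the file is an empty stand-in. It is not how DEVONthink stores anything.

```python
import os
import shutil


class MirroredDocument:
    """One file on disk plus an ordered list of groups that show it."""

    def __init__(self, root, filename, group):
        self.root = root
        self.filename = filename
        self.groups = [group]          # groups[0] is the "first replicant"
        os.makedirs(self._folder(group), exist_ok=True)
        open(self.path, "a").close()   # create an empty stand-in file

    def _folder(self, group):
        return os.path.join(self.root, *group.split("/"))

    @property
    def path(self):
        # The real file always lives in the folder of the first replicant.
        return os.path.join(self._folder(self.groups[0]), self.filename)

    def _relocate(self, old_path):
        new_path = self.path
        if new_path != old_path:
            os.makedirs(os.path.dirname(new_path), exist_ok=True)
            shutil.move(old_path, new_path)

    def replicate(self, group):
        """Add a further replicant; the file on disk does not move."""
        self.groups.append(group)

    def move(self, old_group, new_group):
        """Move one replicant; the file moves only if it was the first."""
        old_path = self.path
        self.groups[self.groups.index(old_group)] = new_group
        self._relocate(old_path)

    def delete(self, group):
        """Delete one replicant; the next one is promoted if needed."""
        old_path = self.path
        self.groups.remove(group)
        if not self.groups:
            os.remove(old_path)        # last replicant gone: delete the file
        else:
            self._relocate(old_path)   # no-op unless the first was deleted
```

For example, a document created in "Inbox" and replicated to "Science/Soil" stays in the Inbox folder until the Inbox replicant is deleted, at which point the file is moved to the Science/Soil folder.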
For those that don’t use replicants, the DEVONthink database exactly mirrors the organization of documents in the Finder.
The advantage of what I am proposing is that one can easily find and share documents without opening up the DEVONthink database.
What I am proposing would not create aliases in the Finder.
I am not proposing creating an alias in the Finder for every replicant.
There would be no scattering of aliases in their respective folders.
This point is a misunderstanding of what I am proposing.
If there are two or more replicants, the original replicant is designated the first replicant, the subsequent one is the second, the next is the third, etc.
All replicants point to the same document in the Finder - as they do now.
Only the designated first replicant represents where the actual document is in the Finder folder hierarchy, which mirrors DEVONthink’s folder hierarchy.
If the first replicant is deleted, the second takes its place and becomes the designated first replicant.
All of this happens in the background, managed by DEVONthink.
From your perspective, there is no difference between how DEVONthink currently works and how it would work with my proposed change.
The difference to me, however, is that DEVONthink becomes far more convenient.
Another benefit of my proposal: One can locate the location of the folder structure and files OUTSIDE of the DEVONthink database itself. This makes the documents even more accessible to others. It makes synchronization easier. One doesn’t have to open the DEVONthink database “package” to see the original files. Locating the files outside of the database is how Sente, for example, manages its documents.
This leads to another benefit of my proposal: If the documents can be located outside the DEVONthink database file, then one can locate the documents with Mac OS X Spotlight. Currently, they are hidden in the DEVONthink database package.
Currently, you can’t find the document unless a reference to it is temporarily stored in the DEVONthink /library/cache.
For example, one file I have “Soil Mineral Deficiency” has a reference to it in “~/Library/Caches/Metadata/DEVONthink Pro 2/DC470C81-C366-4339-A012-0CDD5B8E3367/c/9CE6910B-F099-41AE-BADA-1CDA4F58D0E2.dtp2”. This was found by Spotlight.
But Spotlight did not directly locate the file within the database itself, which is: “/Users/Shared/Shared Files/Definitive Mind Data and References/DEVONthink Database/DefinitiveMind.dtBase2/Files.noindex/pdf/0/Soil Mineral Deficiency.pdf”
The big disadvantage to storing documents in DEVONthink is that they are hidden to the rest of the operating system unless DEVONthink creates a cache reference file to it.
Finding documents in your DEVONthink database without having to have DEVONthink open is very handy.
With what I propose, DEVONthink would not delete the original document unless it was deleting the last replicant.
Thus there is no change from the original behavior.
What I propose does not change anything about how current users expect DEVONthink to behave.
What I propose is simply a significant new additional benefit to using DEVONthink.
The new features I am proposing would NOT slow down DEVONthink at all. They would be transparent to current users.
With the large files I have, DEVONthink’s memory requirements grow to much more than 1 gigabyte of RAM when several files are open. This can slow down the rest of the Mac if I have to keep DEVONthink open all the time and run out of memory, forcing the Mac to run off of hard-drive swap files. In this scenario, it is an advantage to be able to use documents in the DEVONthink database from time to time without having to open DEVONthink.
DEVONthink is becoming slower at opening databases as they become larger. It has also become much slower when I have to open up several large databases.
The advantage of what I propose is that I can find and open up documents in my DEVONthink database without having to open the database itself.
For example, I can open up one working database in DEVONthink while accessing the documents in the other databases I have for DEVONthink, without having to open and close them.
This speeds up work in DEVONthink. And it utilizes memory more efficiently, thus speeding up work when multitasking in several programs.
My problem with this proposal is that you’re basically asking for a major rewrite of the whole program underpinnings, with all the delay and potential for new bugs that this brings.
You’ve mentioned some benefits to compensate, but to me they just don’t seem worth it in comparison to the risks of disruption, and the opportunity costs - the time spent on this would be better spent improving other aspects of the program.
All just my opinion, of course - and the only opinions that matter will be those of the developers and their cost-benefit analysis…
If you are a Mac Programmer, you would understand.
The feature of file and folder manipulation in the operating system is basic and should be easy for a good programmer to implement since the ties to the system are already provided by the OS X Cocoa framework.
File manipulation routines have long been present in the Cocoa framework. They are basic functions. This is why other simpler applications that store and organize files - like EagleFiler and Together - can implement it.
DEVONthink already manipulates files - storing them in folders designating type of file (e.g. PDF, CHM, DOC, etc.), then storing them within that folder in alphanumerically ordered folders. If you add another type of file - e.g. mp4 - DEVONthink will create another folder for that filetype.
It is not that big of a jump for DEVONthink to create folders to mimic its own folder hierarchy rather than use arbitrary folders.
Even relocating the document folder outside of the DEVONthink database is a trivial programming exercise.
There would not be any disruption to current users since:
It is invisible to them. They would work with DEVONthink like before
It can be also implemented as a preference.
Current users can be informed that the database needs to be updated to the new format - which occurs with practically every major database - before DEVONthink creates the new folder hierarchy in file system.
Certainly, the developers for DEVONthink can implement it in a 3.0 version, if they want. This way, everyone can expect changes, improvements, and can expect some possible disruption, if any. So I don’t expect any disruption at all.
Another issue which makes it nice to have hierarchical folders in the file system that mimic DEVONthink’s folder structure - and additionally, storage of the files outside of the database package:
I store hundreds to thousands of large files (e.g., 16 to 160+ megabyte PDFs) in DEVONthink databases.
When I open a file in Adobe Acrobat, read it, highlight a paragraph, then close the file, DEVONthink FREEZES for several minutes as it re-indexes (I guess) the file. (Note that I have to use Adobe Acrobat since DEVONthink is incapable of showing or creating bookmarks in PDF files.)
This is pretty irritating. It takes the spontaneity out of using DEVONthink.
Ideally, this re-indexing happens in the background. However, it does not. DEVONthink is frozen until it is done. You cannot re-select the document. You cannot open up any other document.
By allowing files to be accessed outside of the DEVONthink database, perhaps it would be easier to decouple the re-indexing of the document so that DEVONthink wouldn’t freeze every time I read and highlight a document in Acrobat.
This would make using DEVONthink far faster.
Currently, I have to consider alternatives to storing large documents in DEVONthink because the re-indexing slowdown/hangup/freeze is significant.
Additionally: realize that DEVONthink is NOT exactly a database. It is a file system with a database that stores metadata about the files it is storing. As such, DEVONthink is essentially its own file operating system, riding on top of the Mac’s own file operating system, OS X. It creates its own folders (directories). It moves, stores, deletes, and duplicates files. (This means it can already do much of what I am proposing.)
However, since it hides the files from the actual operating system there are some limitations it imposes on your files. For example, your documents are not searchable from Spotlight. Your documents aren’t instantly or conveniently available in other applications since you have to have DEVONthink open and they are hidden within the database “package”. And there are slow-downs which freeze DEVONthink when you use other applications to open and modify the documents - such as by highlighting sections of a PDF in Acrobat.
What I propose is to allow DEVONthink to be more open and accessible with its files.
You clearly have a greater technical knowledge of the file system than I do and I’ve enjoyed reading your posts. You’ve obviously spent a lot of time thinking about this and it clearly causes you concern, in a way that it really doesn’t to me (I don’t experience any of the problems you face, and my databases are in the gigabyte range). If your changes could be made risk-free and in a short time, fine. Nonetheless, I think the concerns I express about the risks and the opportunity costs are valid.
But your and my opinions, and our knowledge of the filesystem, are in a way irrelevant - neither you nor I know intimately how the database works: do you not think it at least possible that the developers chose the implementation they did because it facilitates (or stronger, is necessitated by) the AI, which after all is one of the main selling points of the program?
The biggest selling point of DEVONthink is to organize one’s papers on the computer so one can have a paperless system.
The storage of the files in the file system does not have anything to do with how the database file itself works.
The operating system gives DEVONthink a reference ID for each file so that DEVONthink can find, open, analyze, and display the document. Where the file is does not matter since DEVONthink can do everything it needs to via the file reference ID. This is how it works, for example, with an indexed database - where the files are stored outside of DEVONthink, though not manipulated by DEVONthink.
In an indexed DEVONthink database, one tells DEVONthink where the folder containing the hierarchy of files is. It is a fixed hierarchy, however, since DEVONthink does not - again - manage the hierarchy of files in the file system. It only manipulates the groups/folders within the database.
My feature request is to give DEVONthink the power to manage the folders and files within the external folder and maintain an indexed database of those files.
In fact, this way of viewing my suggestion is the easiest way to achieve this: Allow DEVONthink to create an indexed database. Then create the hierarchy of folders in the Finder that mirrors the folders in the database.
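As a rough illustration of the second step, creating folders that mirror the database groups, here is a short Python sketch. The nested dictionary is a made-up representation of a group hierarchy, and `mirror_groups` is a hypothetical helper; the point is only that the folder-creation side is a small recursive walk.

```python
import os


def mirror_groups(root, groups):
    """Create one folder per group below `root` and return the folders made.

    `groups` is a hypothetical nested dict: {group name: {subgroup name: ...}}.
    """
    created = []
    for name, children in groups.items():
        folder = os.path.join(root, name)
        os.makedirs(folder, exist_ok=True)   # safe to re-run on existing folders
        created.append(folder)
        created.extend(mirror_groups(folder, children))
    return created
```

Calling it with `{"Science": {"Soil": {}, "Medicine": {}}, "Inbox": {}}` would create Science, Science/Soil, Science/Medicine and Inbox under the chosen root. Keeping the two hierarchies in sync afterwards (renames, moves, deletions) is the harder part and is not covered by this sketch.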
Thus my idea is a more capable variation of the current idea of an indexed database.
There would be no risk to implementing this since DEVONthink already is capable of referencing external files in an indexed database. The groundwork is already there.
On a side note: I haven’t found DEVONthink’s AI search more useful than a Spotlight search in all the years I have had DEVONthink.
Human beings have an unlimited memory for pattern recognition. When formally tested, researchers gave up on determining the limits for human pattern recognition when subjects demonstrated recognition of over 10,000 patterns - even with a single exposure to each pattern.
As such, I rely on this to remember the relevant points in the thousands of articles I have. Then all I need is a word pattern match and I’m good to go. Spotlight serves me well in this regard since it also takes into account the text inside documents.
DEVONthink is a great way to organize documents. I bought and tested every competitor to DEVONthink and I think DEVONthink is the overall best.
Yes, it has its limitations and faults - e.g. being unable to show or manipulate bookmarks in PDFs. But DEVONthink’s developers did listen to suggestions - e.g. adding changeable highlight colors in PDFs.
And I would like DEVONthink to be even better. Thus, my suggestions.
One of the recommendations in the DEVONthink forums is to use the application “Skim” to export your PDF highlights so they can be harvested in other applications.
A problem for DEVONthink’s organization of its files is that Skim creates “.skim” files with the same name as your PDF document and stores them alongside the PDF document.
This is fine with EagleFiler in that it happily keeps the .skim files with their associated PDF files since it creates folders in the file system which mimic the folders in its database - like I would like DEVONthink to do. Rescanning the documents will place these skim files alongside the PDF document files.
However, DEVONthink organizes the documents first by document type (.pdf, .doc, etc.). When the database is verified and repaired (the analogous procedure to EagleFiler’s rescan), DEVONthink creates a “.skim” folder and puts all of the Skim Files in these folders. This separates them from their associated PDF document. In the database, it places the skim files in a new “Orphaned Files” folder. Imagine having thousands of PDF files highlighted using Skim. The Orphaned Files folder will contain thousands of disorganized Skim files.
Luckily, Skim is smart enough to keep track of the .skim file associated with the PDF, even if the .skim file is moved elsewhere.
However, these skim files should have been kept with the original PDF since the folder where they are located in the database is the appropriate topic where they should be stored. These skim files also contain your notes about the PDF.
DEVONthink forces you to reorganize your skim files.
It would have been easier if DEVONthink organized the documents hierarchically, mimicking the database’s folder organization in this case.
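Since a .skim file shares its name with its PDF, re-pairing orphaned .skim files is mostly a matter of matching names. The following Python sketch shows the idea; `pair_skim_files` is a hypothetical helper, and it assumes the only link between the two files is the shared base name, as described above.

```python
import os


def pair_skim_files(skim_paths, pdf_paths):
    """Suggest where each .skim file belongs, based on a same-named PDF.

    Returns (pairs, unmatched): `pairs` maps each .skim path to the path it
    would have next to its PDF; `unmatched` lists .skim files with no PDF of
    that name, or with more than one candidate.
    """
    pdf_by_stem = {}
    for pdf in pdf_paths:
        stem = os.path.splitext(os.path.basename(pdf))[0]
        pdf_by_stem.setdefault(stem, []).append(pdf)

    pairs, unmatched = {}, []
    for skim in skim_paths:
        stem = os.path.splitext(os.path.basename(skim))[0]
        candidates = pdf_by_stem.get(stem, [])
        if len(candidates) == 1:
            target_folder = os.path.dirname(candidates[0])
            pairs[skim] = os.path.join(target_folder, os.path.basename(skim))
        else:
            unmatched.append(skim)   # ambiguous or missing PDF: leave for the user
    return pairs, unmatched
```

Names that occur more than once are deliberately left unmatched, because DEVONthink allows several documents with the same name and a name match alone cannot tell them apart.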
Marinco, I cut the Gordian knot years ago by deciding to work with self-contained (import-captured) databases. I have total freedom in organizing and reorganizing the databases. Once captured, I don’t care what’s in the Finder.
My main database, which has been evolving for more than nine years, has been migrated to several generations of Macs. Essentially none of its contents have Finder counterparts on my current Mac that’s hosting it.
You are requesting a very significant redesign of the database. Christian has commented a number of times on the forum that the seemingly arbitrary organization of files in the Files.noindex folder within the database contributes in major ways to the speed and scalability of DEVONthink databases. Not only are the files grouped generally by filetype, in a structure that bears no resemblance to the group organizational structure, but the organization of files is dynamic. A given file may change its location over time. That’s why Christian notes that an attempt to refer to the path of a file within the database from outside the database isn’t recommended.
Other document databases such as Together and EagleFiler have non-dynamic organization of files. But they cannot scale up in database size and performance as does DEVONthink. I’ve tested them with the contents of my main database. They cannot handle it. Performance was dreadful.
So what you are suggesting would seriously restrict the performance of large databases.
I was puzzled by your remark that you saw no advantages provided by DEVONthink’s AI features, compared to Spotlight searches. I find the ranking of search results in DEVONthink useful.
Of course, the Finder has no counterparts to the Classify and See Also AI assistants. When I’m working on a writing project See Also is my favorite tool for exploring ideas.
I second the initial request. The database of DT is optimized for DT only - but a disadvantage when other programs access the files, expecting a regular file system. I don’t think it’s a big step: An indexed database helps a lot, except that it’s sometimes cumbersome to handle (e.g. moving files, creating folders), though this can be eased with scripts. Replicants can be implemented with hard links in the file system (actually hard links behave already a lot like replicants in indexed databases, only DT does not know about them).
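To illustrate the hard-link idea, here is a small Python sketch using the standard `os.link` call. `replicate_with_hard_link` is just an illustrative wrapper name. Two caveats I am aware of: hard links only work within a single volume, and applications that save by writing a new file and renaming it over the old one will silently break the link.

```python
import os


def replicate_with_hard_link(original, replicant):
    """Make `replicant` a second directory entry for the same file data."""
    folder = os.path.dirname(replicant)
    if folder:
        os.makedirs(folder, exist_ok=True)
    os.link(original, replicant)   # both paths now refer to the same file
```

After the call, both paths are equal names for one file: editing through either path changes the same data, and deleting one path leaves the file reachable through the other until the last link is removed. That is close to how replicants are described earlier in this thread.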
Maybe I draw the wrong conclusion here, but: If performance and scalability of DT depend on this file structure, what about an indexed database? Indexing has neither the dynamic structure nor the grouping by filetype. Does this greatly impact speed and possible size?
It seems to me that Bill’s point about scalability is decisive. Brookter’s point about getting from here to there is also important. And I don’t overlook the AI features of DT, which aren’t available elsewhere, and certainly aren’t provided by Spotlight.
Scalability and AI, it seems to me, are key. I’ve tried Together. Perhaps it has improved recently, but when I used it, big dollops of data used to slow it down to the point of being, well, not very useable. I’ve also tried EagleFiler; it doesn’t seem to choke on the large volumes of data that Together didn’t like and I respect its developer, but it lacks DT’s AI features that I find very valuable.
In my experience, and I’ve looked fairly exhaustively across the personal data management field on the Mac, DT is unique in these two respects - its ability to handle huge amounts of data, and its so-called AI features: Classify and See Also are so useful. From my perspective, I wouldn’t want their capabilities compromised. Any supporter of a rival philosophy needs to explain how those two sets of features can be preserved. Oh, and how DT’s development can proceed acceptably whilst a completely different framework is created.
If what I am capturing consist of distinct PDF and other files, then sure I am interested in what is in the Finder. The reason is that these files HAVE to interact with other applications - not just DEVONthink.
PDF files HAVE to interact with Adobe Acrobat or Skim, for example, because DEVONthink is LIMITED in its abilities to manage PDF files.
Where DEVONthink is failing me is when the files it contains start becoming very very large. When PDF files are very large, for example, storing them in DEVONthink starts becoming inefficient. For example, when highlighting a PDF that is 150 megabytes large in Adobe, then returning it to DEVONthink, DEVONthink FREEZES for minutes. Situations like this would make it nice to have the files accessible outside of DEVONthink itself.
Certainly, DEVONthink does allow you to store the files externally to itself - via an indexed database. Thus, it is capable of the first step of what I am asking. But the structure of files in folders in an indexed database is static, not dynamic. I would like it to be dynamic.
This is very different from how I use DEVONthink. Everything I store in DEVONthink are distinct files (PDFs, .doc, photos, sounds, etc.). They have Finder counterparts.
I do not store data within the DEVONthink database itself.
Not really. I am asking that the indexed database be allowed to dynamically store files in the external folder, rather than statically store them as it does currently.
If a given file can already change its location over time, then what I am asking is not that much different.
Correction: Together and EagleFiler have DYNAMIC organization of files.
Perhaps if the files were numerous small ones, then DEVONthink may be slowed down. I haven’t noticed this.
But for LARGE files, DEVONthink is definitely less efficient and slower to use.
My work covers numerous fields in science and medicine. DEVONthink’s AI is NOT smart enough.
For example, when I search for a medication or a chemical or a neurotransmitter or hormone, in my database, DEVONthink’s AI results are pretty much the same as Spotlight’s results.
It is like the difference between SIRI and voice recognition on the iPhone. SIRI understands context and concepts, voice recognition does not.
I want conceptual understanding, DEVONthink AI does not understand this. Thus DEVONthink’s search ends up being similar to doing “find” in a word processor - a sequential linear search.
I use DEVONthink differently.
DEVONthink isn’t useful for me for “exploring ideas” since as a database, it has no ideas to begin with. For me, the best tool for exploration of ideas is the mind. The human mind has a far faster and far larger capacity for exploring ideas than what DEVONthink can do.
DEVONthink is highly useful, however, for storing ideas I already know so I may retrieve them later. The structure of the folders in DEVONthink is the schema for these ideas.