A question on file organization and methodology...

I use DTPO mostly for reference and to let me carry reference files around in DTTG. I do also use it sometimes for storing ongoing projects if they look like they’ll take a while or become large and complex.

Normally, outside of DTPO, I duplicate files when I want a copy in another project so that old references cannot be changed inadvertently, but this leads to many (…thousands? I can’t remember…) duplicate files in DEVONthink. I’m also leaning more and more towards keeping active work in DEVONthink since DTTG has been working well for a while now.

This makes me want to turn DEVONthink into a metaphorical filing cabinet with one copy of each file, adding replicants to any project or place I need it, or possibly even to things I logically connect but file separately (thinking that might help the AI searches).

Is there any generally accepted best practice for this?

Thank you,
ttwoods

Hi,

I guess you need to consider your workflow. What I tend to do is create a folder containing my project files and documents and save it as a template within DTPO. When I have a new project, a couple of clicks and all my structure, including the files, will appear. You can do this easily by creating your folder structure in the Finder and including all your “blank” documents. When you’re finished, just save the folder as Name.dtTemplate and move it to DTPO’s template folder. You can find this folder easily via Data > New from Template > Open Templates Folder.
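If you prefer to script the skeleton rather than build it by hand in the Finder, here is a minimal sketch in Python. The folder and file names are purely hypothetical examples; only the .dtTemplate naming convention matters:

```python
from pathlib import Path

# Hypothetical project skeleton -- adjust the folder and file names to taste.
STRUCTURE = {
    "Correspondence": [],
    "Drafts": ["Project Log.txt"],   # blank starter documents
    "Reference": [],
}

def make_template(dest: Path, name: str) -> Path:
    """Create a folder skeleton named <name>.dtTemplate under dest."""
    root = dest / f"{name}.dtTemplate"
    for folder, files in STRUCTURE.items():
        sub = root / folder
        sub.mkdir(parents=True, exist_ok=True)
        for filename in files:
            (sub / filename).touch()  # create an empty "blank" document
    return root

print(make_template(Path.home() / "Desktop", "New Project"))
```

Afterwards, move the resulting folder into DTPO’s template folder as described above.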

Hope that helps.
Best
Steffi

Thank you for the time and thoughts. I have also made various templates and want to learn more. I just haven’t needed more of them yet and don’t want to spend my time making things I won’t use, even if I am learning and enjoying the process.

My central concern is the part about keeping one copy of each file and replicating it elsewhere. I’m most worried that DTPO may call two files copies; I delete one and replace it with a replicant, and then find out they were different. As an example, I used to handle inventory stocking levels. Assuming no items were added or dropped, two spreadsheets might be identical except for the date and the numbers in one column. I could potentially lose an entire year’s data if DTPO were to flag them as copies, and I cannot find any information about how many cells would have to change, or how severely, before it differentiates them when it looks for copies.
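One safeguard I could apply before trusting a duplicate flag: export both files and compare checksums, keeping only one copy if they are byte-identical. A minimal sketch (the paths are made up, and byte-identity is a stricter test than whatever similarity measure DTPO uses):

```python
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    """SHA-256 digest of a file, read in chunks to handle large spreadsheets."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical exported copies of two files DTPO flagged as duplicates.
a = Path("~/Desktop/inventory_2017.xlsx").expanduser()
b = Path("~/Desktop/inventory_2018.xlsx").expanduser()

if sha256(a) == sha256(b):
    print("Byte-identical: safe to keep one copy and replicate it.")
else:
    print("Contents differ: keep both files.")
```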

I kept my original post vague on details because there is certainly the possibility of other problems I haven’t realized. I mostly like the idea of keeping one copy of each file and using replicants or tags to put it in other places-- rather like a library with infinitely deep shelves. I’m just wondering about the wisdom of that because I would probably never successfully untangle the mess if I ever decided to go back to my normal method.

I may use my eBooks for an experiment. I’ve been keeping them in a folder hierarchy within my reference DB, but there are now about fifty books and several hundred articles, so I’ve been thinking of putting them in their own DB. I might try copying them into a new DB and then organizing that one as I am proposing. I don’t think it would be a fair test, though, since I won’t actually be modifying any of those files, but it is a large enough sample to gain some insight.

Speaking from my own personal opinion, I generally don’t concern myself with duplicate files. In fact, one of my primary daily databases has 357 duplicates of the same file, and it is of no concern to me. (I also segregate data in such a way that duplication is the better fit for my organizational approach.)

Also, filenaming conventions will generally help filter duplicates. If you have two files marked as duplicates but they have different names, it’s usually a good bet they are contextual duplicates (i.e., not exact dupes, but sufficiently similar). But since they have different names, this could be sufficient to keep you from “deleting a file” unnecessarily. In your example, the inventory sheets are likely contextually similar, but I would imagine they’d have varying names (based on location, dates, or whatever). If you are just reusing the same name, I would consider revising this practice.

In the case of the inventory spreadsheets, I was doing the same thing with the same part numbers and equations-- only the numbers sold would change. Rather than creating an entire sheet every time, or even filling out a template, I simply copied the file from last time, changed the date, and updated the number of units moved (adding or removing rows as needed). That saved a lot of time and provided a consistent format. It also meant that only about 1-5% of the thousands of cells (some of the cells in one column), the date, and the filing location changed. Otherwise, the files were identical.

Unless I find a compelling reason to change, I’ll live with the duplicates. They just strike me as useless clutter, and that drives me batty. I’ve lived with it as long as I’ve used computers, though, and it hasn’t killed me yet. :wink:

Thanks again,
Tim

“Changed the date” - as in the modification date or part of the filename?

That example was from a few years ago so I’d have to go check, but I would normally name the file something like “20180510 Stocking Level Adjustments”. Whatever I named it, the date at the beginning of the filename would change, and then the year on the spreadsheet’s summary page inside the file would change.

I first wanted to just add sheets inside the spreadsheet. Unfortunately, it ended up needing about four sheets per cycle, and the number of sheets in the spreadsheet quickly became too cumbersome. That’s why I ended up with a different (but almost identical) file each year.

The example I gave came to mind because the differences between files are small but the loss of a file would be unrecoverable-- though not catastrophic. With those names, it would also be easy to spot falsely flagged duplicates. The scenario I would be more concerned about is something less noticeable: similar files that are identically named but are either not the same or should not be considered the same.

…hmmm… I might export an essay or article, make some final edits after meeting with someone, and reimport it. There, the modification date changes but the filename and little else does, yet the changes mattered enough to make. More trivially, two different projects might each have a bunch of notes imported for archiving when they are finished. Each might have a note or single log entry saying, “Call Bob.” The projects are different and the calls unrelated, but the files could easily share a filename since I tend to reuse local names (e.g., most projects have a file named “project log”, though some logs stay nearly or completely empty). I just don’t know going in which it will be. I might also drop in a quick plain-text note with a descriptive name that has no naming conflict within its own project, but the same name becomes attractive in another project for the same reasons.

I haven’t noticed any files falsely flagged as duplicates. I just realize that the potential exists. I can weigh the cost of misidentifying a file, but not the probability.

I’m not sure why you’d export, edit, and reimport. Why wouldn’t you either edit the document directly, or duplicate it in DEVONthink then edit the duplicate?

I see many references in this discussion to identical file names and duplicates of documents, so I want to mention this to clear up any possible confusion. DEVONthink does not evaluate the name of the document to determine if documents have duplicates in the database. It is the content of documents that is evaluated to determine if there is enough similarity of content to flag documents as duplicates.
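To illustrate the principle of content-based (rather than name-based) matching, here is a toy sketch using cosine similarity over word frequencies. This is only a conceptual illustration, not DEVONthink’s actual code or algorithm:

```python
from collections import Counter
from math import sqrt

def similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of word-frequency vectors, from 0.0 to 1.0."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Two tiny "inventory sheets" differing only in the date and the quantities
# still score high, whatever their filenames might be.
sheet_2017 = "date 20170510 part A100 qty 12 part B200 qty 7"
sheet_2018 = "date 20180510 part A100 qty 15 part B200 qty 9"
print(round(similarity(sheet_2017, sheet_2018), 2))  # ~0.79 here
# Real sheets with thousands of unchanged cells would score far closer to 1.0.
```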

Greg,
Thank you for checking in and also following along. Unless I’m misunderstanding, filenames have come up only as a way to differentiate files that might be improperly flagged as duplicates.

Blue Frog,
Normally, I would not export, modify, and then reimport. The two situations I can think of off the top of my head where that sometimes happens are: (1) when I need to share a file or otherwise work collaboratively but DT is not available (for example, a file I might bring or send to a Windows-based coworker), (2) when I am working with an important file.

I try not to modify any important file from inside DTPO after reading a post from a highly respected community member who said we aren’t supposed to do that. I had already been doing it for a few years before reading that, so I don’t worry much about it, but I will take those extra steps anyway if it is a critical file. I’m guessing his advice was true in some specific context, but I decided to limit my risk exposure.

Adding a bit of detail (all of the factual information I remember, but keeping the member anonymous in case he doesn’t want attention): he was explaining to someone that DTPO isn’t intended for active work, so projects should be kept as regular files and directories outside of DTPO until they are done and ready to archive. The search and filing are convenient enough that, coupled with my desire to be consistent, I still use it for large or complex work, but I generally keep small or short-term things out of DTPO until they are done. Unfortunately, that means I sometimes have to look in two places to find something and don’t always have what I want at hand if I’m not at my computer.

Truthfully, DTPO may not even be the best tool for me. When I bought it, I didn’t believe it was the best tool; I only needed OCR and wanted a better way to file and find things. It gave me everything I needed at a lower total cost, along with other capabilities I might someday want. I ended up liking it a lot and have been using it for many years, but mostly just as a filing cabinet and for searching my files. The times I’ve tried to do more with it, I find myself either overcomplicating things or finding some other tool simpler and faster. I’ve blamed that on my own ignorance of its full capabilities, but I really don’t know.

After my initial reading of the manual and watching a few YouTube videos, I just read what I can when I have time, search the forums when I’m unclear, and try things from time to time or when it appears to be a better solution (even if it later turns out not to be better).

–Tim

This is certainly a matter of opinion, or defined by environmental / situational requirements that must be met. I actively work in at least four databases every day and I use importing 95% of the time, i.e., I work solely inside the database.

The admonition, I think, was intended to be: do not modify the internal structure of a database. Files should be accessed within DEVONthink, not by accessing the internals.

The thing about DEVONthink is not that it’s complicated. It’s really not. Its flexibility and the number of options give it an appearance of complexity. Much of the experience of running DEVONthink comes down to what your needs are and what you are bringing to the situation. This is no different from Photoshop. Photoshop has a ton of tools and features the majority of people will never use, but that doesn’t mean someone can’t go in and make some adjustments, etc. to suit their needs. Options are… well, optional :mrgreen:, but that doesn’t necessarily make the application “complicated” to use.

Thank you for that encouragement. That is what I had been doing when I started; it is simpler and was the natural methodology for me. I will return to making it my default to put everything in from the beginning.

You know…

Maybe I just needed to step back and look at my underlying thoughts a bit. The top three strengths (for me) of the software are its ability to find something, to find things related to that thing, and to find some likely places to put something. Everything beyond that is (sometimes very) convenient but non-essential. What I just realized is that most of my problems and concerns come from me trying to do the software’s job rather than from finding the right solution.

My biggest concern from the day I first touched a computer has been losing data-- either by damaging it or by misplacing it. Consequently, I want to keep everything and keep it highly organized. I’m also old enough to remember my first 6 MB HD and thinking I’d never be able to fill it up before the computer died. The result is that I’m a terrible digital packrat who frets about culling data while needing to do so to keep my HD useful and to be able to find anything.

It might just be time to realize that drives are cheap now and the software can probably find things better than I can, so I should consider letting it do so while I just file logically, trim cruft, focus on my work, and get bigger drives as necessary. Hmm… that still sounds like a lot, but a lot less than I was doing.

My databases range from a few megabytes for simple testing to several gigabytes (some for testing, some for other purposes). Some databases are highly structured; some are very organic. My Support database (and yes, I do 90% of my job directly in a Pro Office database!) is much more carefully curated. I create and delete databases as I need them. Again, this shows off DEVONthink’s flexibility.

Well said, and indeed drives are cheap. I have about 15TB of external drive space spanning many drives, most of it for exploration and support purposes.