Practical limits on number of replicants?

I’ve been programmatically working through the organization of my DEVONthink databases, moving things around and regrouping a good number of files to improve the efficiency and accessibility of my PDFs.

I’ve significantly increased the number of replicants in my primary database, replicating files across the multiple groups to which they are relevant. These are NOT files that I would change in any way – for the most part they’re digital copies/scans of journal articles and book chapters, though I’m talking about hundreds, eventually thousands, of articles here – I just want to arrange them into multiple, distinct groups that make sense in my mental model of the database’s deep organization. I’m curious about the effects of this…

How much overhead does adding a lot of replicants add to DT’s back-end operations, say, in searches or verify-and-repair operations?

How much additional file space is required for a replicant?

I’ve not done this yet, but: what are the impacts of replicating indexed files?

Is there a practical limit on the number of replicants of a given file, or on the total number of replicants in a given database, above which DT might become unstable?

Why not use tags instead and forego the replication?


No additional space is required for a replicant, but I agree with @chrillek’s question: why not just use tags instead?
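One way to picture why replicants require no additional space: a replicant is a new reference to the same underlying item, much like a hard link in a file system. Here is a toy sketch of that idea (purely illustrative, not DEVONthink’s actual implementation; the class and group names are invented):

```python
# Toy model: groups hold references to a single shared document object,
# so adding a replicant stores no additional copy of the file data.
# (Illustrative only -- not DEVONthink's real data model.)

class Document:
    def __init__(self, name, content):
        self.name = name
        self.content = content  # the file data, stored exactly once


class Group:
    def __init__(self, name):
        self.name = name
        self.items = []  # references to Document objects, not copies

    def replicate(self, doc):
        # "Replicating" just adds another reference to the same object.
        self.items.append(doc)


paper = Document("serres-hermes.pdf", b"...")  # one stored file
by_author = Group("serres-m")
by_theme = Group("Ecopoetics")

by_author.replicate(paper)
by_theme.replicate(paper)

# Both groups point at the very same object in memory:
assert by_author.items[0] is by_theme.items[0]
```

Deleting the reference from one group in this model would leave the other group’s reference (and the stored data) intact, which mirrors how removing one replicant does not remove the item itself.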

I use tags but I’ve been trying to reduce my reliance on them as they tend to get too numerous and complicated over time, and it’s hard to visualize their relationships. (I’m working on simplifying my uses of them.) Replicants seem to me an effective way of maintaining parallel, hierarchical organizations of data that are each easy to visualize and parse, for example: historically-organized AND thematically-organized collections of the “same” documents. Or documents replicated in groups according to their relevance to an ongoing project, without changing their original groups in the database.

Tags are groups. Tags can be arranged hierarchically. Tags can easily be shown in smart group results. Tags can do everything you are using replicants for.


While you’re welcome to pursue the course of action that makes sense to you, I think you’re trading one set of complications for another as you are multiplying replicants to the point you’re asking if there’s a limit to them. So you’re still setting up a complicated system, just supplanting tags with replicants.


Technically there’s no difference – tags are groups, and tagging means replicating to tags. And if the option to exclude groups from tagging is disabled, then groups are tags too, and replicating is also tagging.


I was probably exaggerating my intentions to replicate all over the place; I’m more likely to have only a few hundred replicants active by the end of this process – as I said before, to use them to define essentially different ways of gathering and visualizing files, according to particular criteria.

That resembles tagging, I can see that, and smart groups based on tags might be another way to manage this process. In fact I do use tags for things like subject keywords and project identification: all the documents for my fall undergraduate course, “LIT 3400,” have that tag, no matter where the actual file resides. (One approach I’m considering is to strip out all tags that are subject keywords – those really tend to multiply – use tags only to identify writing and teaching projects, and then gather those materials with smart groups.)

Perhaps if the UI for managing Smart Groups were changed to include ways to group Smart Groups, I would be more inclined to rely on tags for the procedures I’m describing. I’ve got about 30 smart groups at present, and scrolling around in the sidebar to find the ones I want to focus on can be tedious. A better way of organizing them would help.

Replication does, in fact, address pretty well one of the frictions I’ve run into with DT: its overreliance, to my mind, on semantic classification to the detriment of other forms of classification that have their place at times, such as arranging by last-name-first-name of the principal author.

An example: my main literature database includes dozens of texts by the French philosopher Michel Serres, the breathtaking range of whose literary and critical writings defies simple classification. It makes good sense, at least for me, to keep those texts all in one place, a group named “serres-m,” and then to place replicants of individual texts in a variety of other thematic or historical groups elsewhere in the database: “Verne, Jules,” “Philosophy of Science,” “Ecopoetics.” This enables me to ask the subtly different questions, “Didn’t Michel Serres write something about ecopoetics? – Right, there it is in the ‘serres-m’ group!” and “Let me check the ‘Ecopoetics’ group to see if I’ve neglected to check someone’s work on the subject – Oh, yes, there’s that essay by Michel Serres I forgot about!” I could do this with tags and smart groups, but often it’s just easier to look in groups of replicants that fit the general criteria that interest me at the moment.

I’m not sure what overreliance you’re referring to as DEVONthink accommodates many methods.


“Overreliance” is too strong a term and I regret having used it earlier. In summary: DT3 is a terrific tool, and I rely on it daily (hourly) as my principal front end to, and archival method for, a huge database of texts for note-taking, teaching, writing projects, etc. But I’m still trying to sort out the best practices of a working model that includes both a stable classification scheme for materials – say, alphabetical by last name of principal author, or by date of publication (as a literary historian, that one often matters to me) – and other, flexible, opportunistic schemes of classification, according to the needs of a particular class I’m teaching or a writing project. The flexible schemes shouldn’t alter the stable scheme – the “originals” always stay in place there – which would thus be the default organization.

The stable schemes could be based on semantic relations (classification by DT’s “analysis of content”). But many/most of the documents I keep in my databases are not short scientific articles, for which keyword and content-analysis methods are fairly reliable, but monographs and edited collections of essays, for which text-based content analysis is less reliable – a book on a notionally “single” topic will include specific content that wanders all over the place. (Classify by content or tag the Standard Edition of the Complete Psychological Works of Sigmund Freud and you get pretty much “Freud” + “Psychoanalysis,” which is at best self-evident.) So the stable-flexible alternate-schemes approach is especially important, so that my files don’t end up all over the place, tucked away in groups that are irrelevant to other, perfectly reasonable lines of inquiry. Thus, as I observed before, I keep baseline copies of all texts by Michel Serres in one stable group but classify replicants for this or that specific project, or use tags and smart groups creatively for that purpose.

I could accomplish this – and I’m trying to accomplish this – with creative and more precise methods of replicating, tagging, searching, smart grouping. But I find that the current interface for viewing tags and smart groups is fiddly and inefficient. (Count me among those who hope that one day we’ll be able to group and subgroup Smart Groups in the sidebar. Smart Groups of Smart Groups?) So I keep trying new solutions…


How many items & words (see File > Database Properties) actually?

Many. :slight_smile: I believe we discussed this in another thread several months ago. I keep several databases open or readily available, but my primary archive, which is always open, includes 18,000 items, 5.8 million unique words, and 590 million total words. As you may recall from that earlier thread, I previously kept the contents of that archive distributed across several databases that were open all the time. To simplify my workflow, and to facilitate replicating and manage tagging within the stable/flexible model described above, I combined the previous four or five large databases into a single database. Searches are a little slower (especially on my circa-2017 Intel MacBook Pro, which I hope to upgrade this fall) but very comprehensive.

I definitely can’t remember every thread anymore - too many users, threads, topics & numbers :slight_smile:


I’m sure that’s the case, and I didn’t expect it of you :slight_smile: . The forums here are rich and complex: I learn something useful every time I check in.