Does this "Workflow" for Tags > Groups make sense?

Hello All,

I hate to be “that” newbie, who skips the zillions of similarly asked questions/posts in the past, and jumps in with the very same ‘face-palm’ question that has been asked since someone switched the internet on… :neutral_face:

I’ve had a look at the various YouTube tutorials, and have now read through many posts on Tags/Groups etc., but figure I’m going to try my luck regardless…

I’m slowly beginning to get my head around DtP, and its potential power… The search function, in conjunction with OCR’ed PDFs, is yielding excellent results, and has already allowed me to call up related articles that I would most likely have missed, were I to have relied solely on Finder.

With this being said, I know I’m only halfway there - primarily due to my ineffective use of the Group functionality, and hope someone would be kind enough to “vet” what I am proposing, prior to my going all the way down the proverbial garden-path, before realising I’m doing it wrong, and need to start from scratch again…

My dissertation is in law, and I am doing a comparative study of 4 main jurisdictions. Prior to crossing over to Mac, I had a complex folder-tree, with a myriad of sub-folders under each of the 4 primary jurisdiction folders…

As indicated in the picture below, prior to DtP, I had many different folders, sub-divided according to different topics, with PDF’s of full-text journal articles, or extracts from books, inside - along with cases and pieces of legislation.

The obvious difficulty - many PDF’s should actually have been placed in several different folders, simultaneously, since they spanned several different topics - not to mention the fact that it became increasingly frustrating wading through the subfolder levels, in an attempt to find things. I was missing my own files - often stumbling across files long forgotten, or even worse - never read…

My remedy started with using Leap etc., to start tagging. And it was whilst researching additional Leap functionality, that I came across DtP - and when a special/discount offer package presented itself, I jumped at it.

Where I’m at now:

Before initially importing all my research data into DtP, I had restructured my research folders. To facilitate the tagging, and OCR process, I flattened the folder structures, and placed all the PDF articles and chapter excerpts (photocopies scanned to pdf) etc. into a generic “Data” folder, inside each of the jurisdiction folders. I then still had separate Caselaw and Legislation folders. The latter two are not problematic, since there are not that many files in each of them - but the articles number in the thousands per jurisdiction data folder.

As is hopefully clear from the above, my “Groups” are somewhat lacking. My “tag” option, is however improving by the day. I had many tagged documents already done when I imported them initially, and over the past few weeks, as I’ve opened pdf’s, I keep updating and expanding the “tag” database, as it were.

My question:

Where to from here? I realise that the “tagging” option, and the “tag” view will offer functionality, but I cannot help but think that improved “grouping” will bring additional - and much needed - functionality to the party as well. What I am uncertain of, is how to go about doing this?

Do I:

A.) Forge ahead only tagging, and then using the tag-view to generate smart-groups by (e.g.) sorting via tags?
A-1) If so - what do I do with articles that contain multiple tags? Do I Replicate or Duplicate these then, into the separate Smart Folders?

B.) Continue tagging as I open/read articles, during the ordinary course of working on my dissertation/with DtP - but in addition, start creating “Groups” manually - i.e. simply “repeat” the Folder groups I used to have, prior to my flattening my folder structure?
B-1) If I manually create these groups - again, what would be the best method to handle the files that share common themes (and would ‘fit’ in several Groups) - Replicate or Duplicate?

C.) Or - am I missing some amazing automated aspect of DtP, something like auto-classify(??), that will do this for me?

I have read up on the DtP intelligent sorting etc., but gathered that this works better once some initial sorting/classifying has been done manually - DtP then manages to recognise the groups/sorting/classifying easier - is this correct? Would I first need to do some of it myself, before handing over control?

I have no problems with that - just want to get some feel of what would be required…

Apologies for the length of this post. But I find it annoying when a one-liner query is put up, with little to no information - only for it to have to be pulled out, before any meaningful advice can be given. I prefer to get as much info out initially, to give others a better idea of whether they are able/willing to assist!

Many, many thanks in advance!!

Holy moses!

It has taken a fair amount of reading, and re-reading - but I’m finally getting my head around Duplicate vs Replicate.

Things are made slightly more complicated by the frequency of posts where lengthy debates are had about the terminology/underlying philosophy/scripting req’s/implications/meaning of life, the universe and everything/ etc. pertaining to replicate vs duplicate… It takes a while before finally finding posts where the difference is actually explained, not debated! :laughing:

Good times!

With this being said, reading through my soliloquy above, it’s clear that I should be using Replication, to have a PDF in multiple groups… It actually suits me that a change in one, will be updated across all of them, so this is what I will be going with…
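For anyone else who lands here equally confused, this is roughly how I now picture the difference, written as a small Python analogy. It is only an analogy and says nothing about how DtP actually stores things internally; the file name, group names and comment text are all made up for illustration.

```python
import copy

# Hypothetical stand-ins: a "document" is a dict, a "group" is a list.
article = {"name": "fair_representation.pdf", "comment": ""}

us_group = [article]                        # original location
unions_group = [article]                    # "replicant": the same object in a second group
archive_group = [copy.deepcopy(article)]    # "duplicate": an independent copy

# Edit the item through one group only.
us_group[0]["comment"] = "key discussion on p. 12"

replicant_comment = unions_group[0]["comment"]   # sees the edit
duplicate_comment = archive_group[0]["comment"]  # does not see the edit

print(repr(replicant_comment))
print(repr(duplicate_comment))
```

The replicant picks up the change because both groups point at one and the same item, whereas the duplicate carries on with its own separate contents.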

Now to figure out the shortcut for doing the above!

[And thanks to you all, for holding back, and allowing me the time to find my own answer! Personal development and growth - beautiful! :wink: ]

Hi there,

I’m also a researcher, and I also have a lot of data that belongs to different categories. I find that the best way to deal with this is tagging, and if I need to find all articles that belong to two given tags, I just do an advanced search where I specify that I want all entries with the two tags. If I need the same double (or triple, or whatever) tags frequently, I create smart groups. No need to replicate/duplicate anything.
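To show what I mean by "all entries with the two tags", here is a minimal Python sketch of the idea. It is not DEVONthink's search syntax or API, just the set logic behind it; the file names, tags and the `with_all_tags` helper are invented for the example.

```python
# Hypothetical library: file name -> set of tags.
library = {
    "wagner_act_overview.pdf": {"USA", "legislation", "fair representation"},
    "lra_commentary.pdf": {"RSA", "legislation"},
    "dfr_case_note.pdf": {"USA", "caselaw", "fair representation"},
    "workchoices_review.pdf": {"AUS", "legislation"},
}

def with_all_tags(library, *wanted):
    """Return the names of entries that carry every one of the wanted tags."""
    wanted = set(wanted)
    return sorted(name for name, tags in library.items() if wanted <= tags)

# The "two tags" search: nothing is copied or moved, the files are just filtered.
matches = with_all_tags(library, "USA", "fair representation")
print(matches)
```

A smart group is then essentially the same filter saved under a name, which is why nothing needs to be replicated or duplicated to appear in several places.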

Just a quick response (I am offsite right now) but Tagging actually minimizes the need for Replicants as pseudo-Replicants are created in each of the appropriate Tag folders. (Not to confuse or dissuade you.)

Replicants have their place but in a Tags-based workflow, files truly can be in “two places at the same time” (I am a huge Tagging fan - previously having worked for Ironic Software 8) ).

Replicants and folder hierarchies may still be your method - it’s your choice but I just wanted to clear up a potential misunderstanding.


Two responses, and both advocating tagging!

Many thanks for the replies. I am pleased that I’m not too far off…

My natural inclination was proceeding with the tagging - as I will in any event - and appreciate the fact that it sidesteps the replication/duplication “issue”, since - in conjunction with the smart folders, it allows the ‘2 or more places at once’ approach… Although, technically speaking, that’s a misnomer in and of itself… :slight_smile:

Having said this - I’m beginning to realise I’m probably more ‘visually-inclined’ than I might have previously realised… Whereas the tagging fulfils all my needs from the “find it” perspective, since using the Tag View will immediately throw out all the appropriately tagged files, it still leaves me somehow wanting more…

I cannot put a proper finger on it, but it [cannot believe I’m about to type this] feels as if a more complete folder structure, staring at me from the main view, will compensate, in some manner, for my apparent need to “see” the concepts before me, if that makes any kind of sense…

Just seeing a folder structure with a few sub-folders to my main jurisdiction folders has me feeling like I’m missing something. Which is completely irrational. And yet - it lingers. So before I slink off to properly interrogate myself on a recliner of sorts, anyone else out there ‘get’ what I’m saying? Or am I showing progressive signs of Overthink?

Joy. :neutral_face:

Thinking about it some more…

I guess I could push through with the tagging, and only then - when it’s mostly completed, I could reassess the Groups that “present” themselves from the Tag View perspective, and replicate into Groups then… This probably makes more sense. [Sigh of relief at finding some rationality again :wink: ]

Having dealt with many of these kinds of questions with Ironic Software, please be assured you are quite sane (well, as far as I know you now 8) ).

Here’s the thing… folder hierarchies are not inherently bad BUT the biggest issue isn’t how useful they really are, it’s how culturally biased we are to using them because it’s what we’ve been using for so long. I find a combination of high level folders combined with Tags to be a nice combo, partially because some data segregation aids in speeding up searches (and/or the speed of windows in Finder / Open & Save dialogs, etc.) This high level is already present in having a ~/Documents folder - I just extend it a bit further, like Business or Personal, etc.

I think it’s also more comfortable mentally, and it matches what we have to deal with when we go to other people’s machines.

Items that can be rigidly and effectively categorized can be handled well either by an organizational structure or by tagging. By “effectively” I mean that the categorization will be sufficient and unchanging, such as categorization of a tax-related receipt for a particular year.

In my financial database I use a well-structured organization to handle such categorizations of documents, and that works well for me. I almost never use tags in that database.

But in the databases in which I spend the most time, I’m doing research and writing among my collections of tens of thousands of references. Ironically, those databases, which are much more important to me, are very sloppy in organizational structure, and I never bother with tagging new content as I add it.

That’s because, in the course of a series of research projects, a given document will be of varying importance, and the concepts it discusses will individually be given different emphasis related to the project of the moment. In other words, an a priori effort to effectively categorize that document, trying to anticipate its potential usefulness for the kinds of projects I might undertake, would take so much time and effort to become effective and sufficient that I would never get anything else done. My objective in adding content to a database is to make use of its information content, not to spend a great deal of time and effort in guessing the manifold ways in which the bits of information it contains might be relevant to how it should be filed or tagged.

Many years ago, before the days of full text searching (and AI assistants), I was director of a university information center that disseminated results of federally-funded research. Searches could only be done by keywords that represented the content of each document. A human had read each document and created a list of keywords to categorize the content. Even with lists of standard keywords appropriate to various scientific and technical disciplines, this approach has serious methodological problems. Different individuals would apply keywords differently to the same document, and a given individual would often apply keywords differently to similar documents depending on whether the work was done before or after lunch.

Not surprisingly, I developed an abiding suspicion of the consistency and sufficiency of categorization of documents, whether by grouping, keywords or tagging. The more time and effort spent, the better the results will be – but the return on investment of that time and effort is questionable, especially when done “up front”.

When adding content, I do broad categorization as to which database receives the new item, and somewhat more specific categorization by filing it into a group. But when adding a lot of stuff, I’ll often break rules and end up with a lot of unfiled items, which I may eventually try to organize.

DEVONthink’s search tools, including smart groups, together with the AI assistants such as See Also save me a lot of time and effort, and allow me to explore information content from various perspectives. These are the tools that I use to identify potentially useful information when I embark on a project.

At the project level I do spend a fair amount of effort in organizing and tagging useful references (as replicants), together with my notes and drafts. There’s a good return on investment of effort at this level.

But when that project is completed, I’ll usually archive the project group and remove any tags applied to documents, as they probably wouldn’t be useful for the next project. (Don’t cringe, Jim!)

My professional work and my research are both project and task oriented, and so maintaining group hierarchies works well from a document organization perspective. I was enthusiastic about tagging well before DEVONthink incorporated the feature (I used Leap and Yep extensively for this), but gradually lost interest in tags because there was no extra payback from all that work over and above what I was already doing with group hierarchies. I rarely tag anything any more.

Tags are just a parallel classification scheme – just another group. In DEVONthink tags do nothing to make the AI better – See Also & Classify ignore tags, so adding tags to documents doesn’t improve on the classifications I generate in group structures. In fact, the AI gets much better when there are more groups and shorter documents.

For my work, to organize by both groups and tags is like paying twice for the same result.

The downside of both a group-oriented organization and tags is that they each require one to guess at what will be important when we get deeper into our research, as in Bill’s example of the work to keyword documents at his information center. I get more value from content and context search than group or tag organization and have stuck with DEVONthink more for the search and AI features than anything else – otherwise I would have reverted to OS X file system folders a long time ago because without search and AI DEVONthink is just another file system.

One thing I would advise, is never attempt to structure your document organization too far in advance – whichever method is used, groups or tags or both, don’t try to build an empty structure in which to place documents later. Every time I’ve done that the whole organization proves to be too rigid and therefore useless. Another lesson here is to never give too much weight to anyone’s opinion on how to organize – including my opinions :laughing: Learn by doing, then redo and learn more.

Repeated because this is truth; couldn’t have said it better myself. Just start out and tear down what doesn’t work and build it back until it does work. I used DEVONthink for many years before I arrived at the best combination of groups, tags, and replicants for me, and I still tweak it as my needs change.

I agree with korm’s last statement (and it’s one that is hard for people to believe). People would always get on Ironic’s forums and cry, “How am I supposed to Tag?? Tell me exactly how to structure and what Tags are best?!??!” waaahh!!! :open_mouth: Then they’d disappear when I’d say, “Well, it depends…”

If you are working in a collaborative environment, things have to be structured and decided on a group level (whether Tagging or filing in folders or some combo of the two). If the structure is yours alone, do what feels right for you and don’t be dogmatic, even with yourself about it. Do searches that feel natural to you and build your structure to suit that. Granted, DT’s tag searching is not optimized for Tagging as much as content (and no one else has the giant electronic brain that DT does, so this makes perfect sense) but it’s still useful if it makes sense to you.

Cheers! 8^)

PS: Yes, Bill - I am suppressing my cringing. 8^)

PPS: I think our next app should just be called “GEB - Giant Electronic Brain”. Has a very Tokyo feel to it. Anime mascot anyone? 8)

Thank you all for the insightful opinions - it is much appreciated.

There are some excellent points here, and I have been given much to think about…

I can see the value in making this process my own - and it remains central to any attempts on my part…

Having read the above, and thought about it, I realise that several nails have been hit on heads - what I was looking for was mainly due to my always having used a Folder System - and not because it would necessarily make for a better database…

With this being said - therein lies a fair bit of Faith! I guess what I need most, is to get my head around the fact that DtP is eminently capable of giving me what I need, and more, as it is right now… I guess I need to start trusting the “See Also” and AI of DtP - which is kind of difficult, since in the back of my mind I carry the “But what if it misses that crucial file” fears…

I will do some searching in the Forums about the above, to better educate myself in how they work. Since if I can relinquish ‘control’ of the data to the Programme, things will surely start working quicker… Right now, it feels like I’m spending way too much time on worrying about how I need to set the Database up, whereas I should just start using it - and get back to the Writing Up…

Thanks again!

Don’t be shocked, but DEVONthink will miss “crucial” connections sometimes – because it cannot know what’s “crucial”. Only you do. So, note taking, annotation, organizing your thoughts and categories – all the stuff you have to do anyway to research your topic will help DEVONthink but won’t substitute for your own artfulness.

If software could research and write legal dissertations we wouldn’t need lawyers, hmmm? :unamused:

Ha! And the World would be a better place for it, no doubt! :slight_smile:

I’ve read through several posts on the “See Also” and Classify - and have also just bought Kissel’s Take Control of Getting Started book, so think I will work through that properly, before popping up back here… :blush:

I say that, since only a few hours ago, I was ready to drop my fixation on the Folder Structure, and focus on Tagging alone, together with the ‘AI/See Also’ magic of DtP [with due regard to your point, that DtP (fortunately??) does not understand The Law!] - and then I began to realise that the AI/See Also magic will work dramatically better if I have a decent folder structure in place… So I’m back to where I almost started! 8)

So think I’ll get the basics down, and then return…

Have some questions about that AI algorithm - not that I would understand any of it - but hopefully some kind souls, without revealing trade secrets, will be able to give me a clearer understanding of how I should go about splitting up my data…

@Cassidy: Although Christian has noted that in the future the AI assistants such as Classify and See Also may be able to span open databases, currently they work only within individual databases. Personally, I prefer the limitation to individual databases. As I’ve created a number of databases that meet my interests and needs, I don’t consider it difficult to take on the task of deciding which database is to receive a new document. Having done that, I think the AI assistants are more useful to me, than if they were “diluted” by assigning them the additional task of deciding which open database should receive a new item, or suggesting similarities among documents across open databases.

One of my obvious choices for database design is the one that holds financial information such as banking and investment transactions and tax-related data such as scanned receipts, etc.

Obviously, I’ll decide to put new content related to my financial interests into that financial database.

I’ve got two databases, each of which deals with content related to my professional interests in environmental science, policy and regulation. However, I found early on that I gain benefits by splitting the material into separate databases.

My main database holds some 25,000 references covering a number of scientific and engineering references, policy issues and laws and regulations, as well as some 5,000 of my own notes. I’ve been building and updating that database in DEVONthink for more than 10 years, and it is extremely useful.

I also have a large companion database that deals with methodological issues such as protocols for environmental data sampling, analytical procedures, quality assurance, assessment of environmental data and related techniques such as risk assessment and cost/benefit analysis.

Why did I separate those materials into two databases?

For one thing, were I to combine them, the single database would have poor performance on my MacBook Air, which is limited to 4 GB RAM. I’m spoiled. I want most searches to take 50 milliseconds or less, and See Also suggestions to pop up immediately. I use a rule of thumb on database sizes. If I hold the total word count of each database to 40,000,000 total words or less, I can run that database at full speed on the Air. (Now that I’m using a MacBook Pro Retina with 16 GB RAM, splitting databases for performance reasons has become less critical, although of course even the larger memory environment could ultimately be overwhelmed as databases grow.)
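For anyone who wants to apply that rule of thumb to their own material, here is a minimal Python sketch. It assumes the documents are already available as plain-text strings and counts words by splitting on whitespace, so the numbers may differ from whatever count DEVONthink itself reports; the function names and the sample texts are made up for illustration.

```python
WORD_BUDGET = 40_000_000  # rule of thumb from the post above, for a 4 GB machine

def total_words(texts):
    """Count whitespace-separated words across all documents."""
    return sum(len(text.split()) for text in texts)

def within_budget(texts, budget=WORD_BUDGET):
    """True if the combined word count stays at or under the budget."""
    return total_words(texts) <= budget

# Tiny stand-in for a database's text content.
sample = [
    "mercury levels in fish tissue",
    "sampling protocol for lake sediment",
]
count = total_words(sample)
print(count, within_budget(sample))
```

If a combined collection would exceed the budget, that is the point at which a topical split into two databases starts to pay off on a memory-limited machine.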

More importantly, those references and notes become more useful when segregated topically. For example, if I’m doing research on the human health effects of mercury pollution in fish, I don’t want search results or See Also suggestions to distract me with information about how the fish samples were collected and analyzed. I want to concentrate on case histories, toxicology, health standards and regulatory approaches. Conversely, if I’m working in the environmental methodologies database I will be interested in potential problems of sampling design, chemical analytical methods that avoid problems posed by interfering substances and so forth.

Rarely, I might file a document into both databases. Suppose, for example, that a new report criticizing a major toxicological study for methodological reasons is released. That report will belong in both databases.

@Cassidy: I’m a heavy user of See Also and the related See Selected Text assistants when exploring concepts in a database.

See Also works by comparing the terms used and their relative frequencies in the document being viewed to the terms used and their relative frequencies in all other documents in the database. A suggested list of possibly similar documents will be presented. Often, depending on text contents, that list may bridge across terms, so that, for example, the list of suggestions might include a document about wolves (also canines) when a document about dogs is being viewed and See Also is invoked.
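To make the idea of comparing terms and their relative frequencies concrete, here is a minimal Python sketch. It is a generic bag-of-words comparison using cosine similarity, which is an assumption chosen for illustration and not DEVONthink's actual algorithm; the function names and sample texts are invented.

```python
import math
import re
from collections import Counter

def term_frequencies(text):
    """Map each word to its relative frequency in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words)
    return {word: count / total for word, count in Counter(words).items()}

def cosine(a, b):
    """Cosine similarity between two term-frequency dictionaries."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def see_also(current, others):
    """Rank the other documents by similarity to the current one, best first."""
    cur = term_frequencies(current)
    scored = [(name, cosine(cur, term_frequencies(text))) for name, text in others.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

docs = {
    "wolves": "wolves are canines that hunt in packs",
    "tax": "keep every tax receipt for the year",
}
ranking = see_also("dogs are canines that live with people", docs)
print(ranking)
```

In this toy case the wolves text ranks above the tax text even though it never mentions dogs, because the two share other terms such as "canines". It is a much cruder version of the bridging across terms described above.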

DEVONthink has no training or insight into any discipline, such as a scientific discipline, the law or anything else. DEVONthink doesn’t do critical thinking. See Also merely looks at patterns of words, but it can do so among many thousands of documents very quickly, something we humans cannot do.

It’s up to the human user to decide whether or not a suggestion made by See Also is useful. Many of them will not be useful. They are either items the human is already familiar with, or irrelevant to the idea being evaluated. The suggestions that really delight me are ones that I wouldn’t have thought of, and that might give me a new insight – a Eureka! moment.

I right-click on each of the suggestions made by See Also and open them in a new tab in the document being viewed. I’ll scan each tabbed item, quickly deleting those that don’t seem useful. This approach also allows me to do See Also on a tabbed item, following another list of suggestions that might provide useful material.

See Also doesn’t do semantic analysis or understand concepts; the human “partner” does, however, and that partnership can be a wonderful way of exploring the information content of a database. (And it’s a great way to break writer’s block.)

Steven Johnson discusses at some length how he uses DEVONthink in his book, Where Good Ideas Come From: The Natural History of Innovation.

Many thanks Bill.

I’ve read through many of your responses (and those of others) - including one where you explained something in the context of formation of arsenic(?) :slight_smile: - and I think I now have a fairer idea of what’s potting…

I completely understand the limitations of the AI/algorithm, appreciate it will never be a replacement for my ability to evaluate data-relevance - and acknowledge it being able to throw up potentially interesting links, not necessarily seen initially…

Having said that, each situation will obviously be different.

In my particular case, which is no doubt very similar [and thereby contradicting my point just made :slight_smile: ] to other DtP users, I came into the software package very late into the progression of my research for my dissertation… The implication is that I had, by and large, already weeded out the relevant from the irrelevant… This would then be the opposite situation of where I would find myself at the commencement of all future research projects, where (as someone beginning to appreciate the possibilities of DtP), I would simply throw everything into my new Database, and use DtP’s features to sort them later…

As it stands, I have a series of PDFs, albeit spread out over 4 different jurisdictions, but they all focus on a fairly specific aspect of the Law.

The American sources have “labor”, vs the South African, British and Australian term of “labour”, and each of them have distinguishable Legislative names/characteristics (Wagner Act; Labour Relations Act; Fair Work Act), and they have certain ‘concepts’ that are present in some, but not in others (Apartheid; Fair Representation; Golden Formula; Workchoices etc.) - but apart from these distinguishing characteristics - that are spread out over hundreds of PDF’s within a particular jurisdiction, they are all focused on a very specific area…

Now whereas I will be the first to pop up here again, to sing it from the rooftops if I am proven wrong, I remain rather sceptical regarding to what extent the ‘AI/See Also’ feature is going to make a meaningful impact within the above data-context… But then again - it probably shouldn’t have to, since I already have these conceptual frameworks within my mind…

HOWEVER - (and please note - I stand to be corrected), what I was hoping for is a system where, were I to select an American labor law article dealing with Fair Representation in terms of the Wagner Act, the “See Also” AI would then throw out Fair Representation articles, from the US, that were applicable in terms of the Wagner Act (1st Prize), and then less relevant - but certainly related, in terms of the Norris-LaGuardia Act (2nd Prize), and 3rd Prize - in terms of the Landrum-Griffin Act…

Instead - what I now have every expectation of happening, is that whereas DtP might throw those back at me, it would also pick-up British, Australian and South African articles, due to the convergence of specific general terms (Representation; Labour/Labor; Unions; etc.) - all of which would not be relevant in the slightest, since none of them have Fair Representation as a cornerstone of their law…

Which is why the Folder Structure seemingly becomes crucial, since if the American articles were at least grouped together at an upper level, then (again) presumably(??) DtP would have an easier time of ‘recognising’ the links, and hopefully bump USA PDFs ‘up’ the Confidence Meter, over say AUS/RSA/UK articles… And if the Folder Structure is a key component of having that work properly, then I am even less convinced of the possible benefits to be accrued - since if I have them grouped already, then how does “See Also” yield better/different results to an Advanced Search?

I realise I’m waffling off about something I haven’t even fully tested. But I will be the first to come back here, hat in hand… I’m just trying to get my head around things, since if I’m possibly right about the above - then I shouldn’t spend too much time worrying about creating a detailed Folder Structure (where I currently don’t really have one), since the Search & Tagging option will suffice…

But if there is any chance that I am wrong about the above - then I’ll gladly put in the effort. Because if the AI/See Also can even deliver 1/10th of what I was hoping it could deliver, then it would be completely worth any input effort…

I hope I’ve managed to make myself clear?..

And thanks again, in advance…

Having been around for a long time, I’m no longer surprised when I find a different way of looking at something that I had thought I understood very well. The sensation may feel disquieting or exhilarating.

Kuhn wrote about paradigm shifts in the history of science. I’ve seen a lot of paradigm shifts in my lifetime. Academic disciplines such as psychology, history, political science, anthropology and philosophy undergo dramatic shifts in focus and methodology every few years. And then there are the rapidly changing fads of popular music and pop culture.

There’s an old Yiddish saying to the effect that 99% of everything is drek, which I interpret as an admonition to use discrimination rather than blanket acceptance. As for paradigm shifts in writing history, I find myself missing the grand sweep of Arnold Toynbee.

Standards in the legal profession have changed greatly since I was young. Practices that are now common would have resulted in disbarment back then. (Properly so, I think. Not all changes are for the better.)

Just sayin’. :slight_smile:

As your database grows, the AI assistants grow with it. From time to time, try playing with See Also. Once in a while, a gem-like suggestion may pop up. Perhaps it will lead you to a new paradigm.