I stand by both parts of that response to a Support ticket, but in different ways.
The first paragraph refers to issues of integrating the new tagging features in Mavericks with the existing OpenMeta-compatible tagging scheme in DEVONthink, including differences between DEVONthink and the Finder and differences between Imported and Indexed documents in DEVONthink. These issues will certainly receive attention from the developers and from the users of DEVONthink.
The other parts of that response reflected my personal opinions about the ROI (return on investment) of spending a lot of time tagging or keywording documents. Those opinions are not a policy position of DEVONtechnologies, but I hope they are of use to users of DEVONthink.
Tags or keywords can be very useful tools for capturing unique characteristics of documents that make them more easily identifiable and retrievable, especially if those characteristics remain valid across repeated accesses and are easy to assign. For example, if I tag a collection of notes and photos to identify them as related to a trip to Malta, I’ve made them easier to retrieve later. Tags can also serve other purposes, such as identifying a set of references in my collection of documents that are useful for a particular research/writing project. In the first case I would probably leave those tags or keywords permanently in place. In the second case I might remove them after completing the project, and in fact add value to my database by removing them.
Back in the day I was managing a university center that accepted scientific and technical queries related to environmental issues, and searched computer tapes for information about federally funded research that might provide useful answers.
We searched computer tapes by keywords. The result of a search was a list of numbers that matched the numbers of more than a million paper copies of abstracts, which were filed in shoeboxes in a Quonset hut on campus. We sent the search lists to staff in the Quonset hut, who then pulled the corresponding abstracts, made photocopies of them and sent us back the photocopies.
Our staff was supplemented by hiring a number of graduate students familiar with various scientific and engineering disciplines.
When we received a query, the first task was to translate it into keywords likely to pull relevant material in the computer search stage. The second was to examine the stack of photocopied abstracts resulting from the search and determine their relevance to the original query. Relevant abstracts were organized and sent back to the Quonset hut staff to be pasted up on letter-sized paper and photocopied as collections, which were sent as the response to the query.
At the time, this was a bleeding-edge project that often did provide useful information to people who sent in queries. We received support in part from federal funding, and in part from fees charged to (primarily) industrial and governmental customers. It did help disseminate the results of federally funded research to potential users of that information. Today, of course, it seems very primitive.
There are serious fundamental problems in attempting to make documents retrievable by assigning keywords or tags to them. These problems have often been addressed in the field of information science.
One problem is comprehensiveness of keywording/tagging. A given document may be relevant to multiple topics. Limiting the keyword or tag to a very high-level topic, such as air pollution, would (in the case of our information dissemination center) return many thousands of abstracts in a search result. That’s not very useful; keywords should filter the search to produce results specific to a query. So each important topical element of an abstract needed a keyword at the lowest level of terminology possible. Typically, the keywords on those computer tapes had been assigned at the federal agency that sent them to us.
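The mechanics behind that kind of search can be sketched as an inverted index mapping each assigned keyword to the accession numbers of matching abstracts. This is a minimal illustration with invented abstract numbers and keywords, not the actual tape format; it shows why a coarse keyword like "air pollution" returns everything, while a more specific keyword narrows the result to something usable.

```python
# Hypothetical abstracts: accession number -> keywords assigned by an indexer.
abstracts = {
    101: {"air pollution", "sulfur dioxide", "power plants"},
    102: {"air pollution", "ozone", "urban transport"},
    103: {"air pollution", "particulates", "health effects"},
}

# Build the inverted index: keyword -> set of abstract numbers.
index = {}
for number, keywords in abstracts.items():
    for kw in keywords:
        index.setdefault(kw, set()).add(number)

def search(*keywords):
    """Return abstract numbers carrying ALL of the given keywords."""
    results = [index.get(kw, set()) for kw in keywords]
    return set.intersection(*results) if results else set()

print(sorted(search("air pollution")))                    # -> [101, 102, 103]
print(sorted(search("air pollution", "sulfur dioxide")))  # -> [101]
```

The coarse query matches every abstract in the collection, which at the scale of a million abstracts is useless; only the specific keyword, if the indexer happened to assign it, filters the search usefully.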
This requires the person assigning keywords to an abstract to recognize the elements of information it contains, and to assign one or more keywords to each “element” that might be important. That takes time, though, and so can make keyword assignment expensive.
Quite often, in reviewing the final results of a query response, one of us would recognize that potentially important information we were familiar with had been left out, usually because the keywords used to describe it had not matched the search, and/or because the person choosing keywords for the query had omitted an important one.
During that project I visited several of the federal agencies that did the keywording and supplied the tapes, to discuss this problem. They had tried to mitigate it with two approaches: glossaries of keywords, and staff training. While there were some improvements (which raised the cost of the effort), there was never a satisfactory solution. I had the same kind of problem at my end, in the phase of translating a query into a set of keywords.
Setting aside a related problem (the terminology used to describe information differs across disciplines even for closely related items, and within a given discipline it tends to change over time), the fundamental problem with comprehensiveness of descriptors is that it cannot be mitigated very much without a drastic increase in time and effort.
The second hair-pulling issue is consistency in the application of descriptors, whether by different individuals or by the same person at different times. Glossaries and personnel training helped somewhat, but never enough to keep this from being a serious problem. Adding an additional layer of review of the descriptors used for a document helped too, but added substantially to cost.
Based on that experience, and on the fact that I often need to approach the information in my research databases from differing perspectives, I do not tag new items as they are added to those databases. I simply don’t have the time to do an adequate job of it, and wouldn’t consider the effort well repaid. DEVONthink gives me full-text searches and the ability to vary search criteria to improve results when I’m looking for information. See Also can sometimes help overcome variations in the terms used for similar topics. The DEVONthink environment is very different from our information dissemination project in the old days, which relied entirely on descriptors for searches.
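The advantage of full-text search over descriptor-only search can be sketched in a few lines. The documents and keywords here are invented for illustration: a descriptor search misses a relevant report because the indexer never assigned the term, while a full-text search still finds the term in the body.

```python
# Hypothetical documents, each with indexer-assigned keywords and full text.
documents = {
    "report_a": {"keywords": {"water quality"},
                 "text": "Effects of acid rain on lake water quality."},
    "report_b": {"keywords": {"forestry"},
                 "text": "Acid rain damage to spruce forests."},
}

def keyword_search(term):
    """Find documents only via their assigned keywords."""
    return {name for name, doc in documents.items()
            if term in doc["keywords"]}

def fulltext_search(term):
    """Find documents by scanning their full text (case-insensitive)."""
    return {name for name, doc in documents.items()
            if term.lower() in doc["text"].lower()}

print(keyword_search("acid rain"))   # -> set(): no indexer used that term
print(fulltext_search("acid rain"))  # -> {'report_a', 'report_b'}
```

Both reports discuss acid rain, but neither indexer chose it as a descriptor, so the keyword search returns nothing; the full text still carries the term, which is the gap a full-text engine closes without any tagging effort.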
That doesn’t mean that I consider tagging unimportant. It does mean that I tend to restrict tagging to a relatively small number of items, where that becomes a major aid to retrieval or use of the tagged items.
I often dump hundreds of new documents into a database. It’s unlikely that I’ll consider upfront tagging for any of them to be worth my time. In a few cases, such as the example of associating notes and photos of a trip to Malta, I might do so.
Feel free, as always, to consider me an eccentric. I probably am.