Indexed PDFs and tags

I have a large PDF library that I have indexed in DTO. I tagged the files with Taggit, and those tags have been working in DTO.
I figured those meta tags would work with the Mavericks file system and vice-versa. It doesn’t seem so. Am I missing a step? I really want them to work smoothly together.

I could wish that my Zotero tags would also work but Zotero never supported meta-tags anyway.

Mavericks tags are a bit mysterious – the “all tags” tag browser panel in Finder seems to randomly show tags and then not show them.

But, for your specific problem, I think DEVONthink is just a bystander. You’d need to take it up with Taggit or the Apple Forums. DEVONthink is compliant with Mavericks tags and OpenMeta. If Taggit tags are not compliant with either, then that’s something that developer would have to address.

If I tag a DTO indexed pdf file with Mavericks, that tag does not show up in my DTO Database. If I do a ‘get info’ on that file inside DTO, it only shows the tags from DTO if there were any. If I likewise do a ‘get info’ on that pdf in Finder, it does not show the OpenMeta tags from DTO.

It looks as though the ‘tag field’ is pointing to different fields in DTO and Mavericks.

The DEVONthink tags are compatible with Mavericks tags. However, DEVONthink only updates its tags in the filesystem when the document is exported out of the database; it does not write out changes to documents that are already indexed into the database. DEVONthink works this way to avoid constantly reading and writing data, which would adversely affect performance. Move the document into the database, then move it back to the Indexed location and the tags will be visible to Mavericks.

I agree with Greg. However (and I haven’t tested this fully), if a file has been tagged in Mavericks’ Finder Get Info, and I then add a tag to that file after it is indexed in DEVONthink and go back to Finder to look at the tags again, I see that DEVONthink has replaced all the tags with the one that I added in DEVONthink.

I don’t believe that DEVONthink should be removing tags that Mavericks’ Finder added – but it looks like it is.

I’d appreciate if someone else could try this experiment with their own indexed files and see if DEVONthink is removing the Mavericks tags unexpectedly.

Edited: Update – I’m told this is expected and perhaps might change in the future. But Mavericks does not give as much info about tags as does OpenMeta so coordination among programs might be difficult.

I’m just getting started with Mavericks, but so far it does look like the only Finder tags that survive with the indexed documents are the initial tags upon indexing. It does look like DEVONthink is overwriting any subsequent tags added via the Finder.

I’m not seeing what you’re describing, korm, but I might be doing it wrong.

The Mavericks tag system is an extended attribute containing a binary property list. For each tag, there’s a label (“Green”) and a number (“4”) (if there’s an associated color).
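A minimal sketch of that layout, for anyone who wants to poke at it. As far as I know the attribute is named `com.apple.metadata:_kMDItemUserTags` and holds a binary property list containing an array of strings, each of the form `Name\nN`, where the number is only present for colored tags. Treat the attribute name and the string layout as my assumptions, not something confirmed in this thread. The helper names and the example tags are made up, and the code only works on bytes in memory, so it doesn't touch any real file.

```python
import plistlib

def encode_tags(tags):
    """Turn a list of (label, color_number_or_None) into binary plist bytes."""
    entries = []
    for label, color in tags:
        # Assumed layout: "Label\nN" for colored tags, plain "Label" otherwise.
        entries.append(label if color is None else f"{label}\n{color}")
    return plistlib.dumps(entries, fmt=plistlib.FMT_BINARY)

def decode_tags(blob):
    """Turn binary plist bytes back into a list of (label, color_number_or_None)."""
    tags = []
    for entry in plistlib.loads(blob):
        label, _, color = entry.partition("\n")
        tags.append((label, int(color) if color else None))
    return tags

# 6 is an arbitrary example color number, not a verified color mapping.
blob = encode_tags([("Receipts", 6), ("Malta trip", None)])
print(decode_tags(blob))
```

On a Mac, the bytes printed by `xattr -px` for that attribute could in principle be fed to `decode_tags`, but I haven't checked that against a real file.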

In addition, the 3 bits used to track the label in previous versions of OS X are used here to store the color of the last-applied label, so the label system is backward compatible.
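To make the "3 bits" concrete: my understanding is that the classic label sits in bits 1 to 3 of the 16-bit Finder flags (mask `0x000E`). That mask is my assumption from the classic Finder flags layout, not something stated in this thread, and the function names are hypothetical. The sketch works on a plain integer rather than on a real file's Finder info.

```python
LABEL_MASK = 0x000E  # bits 1-3 of the 16-bit Finder flags (assumed layout)

def get_label(finder_flags):
    """Extract the 3-bit label number (0-7) from the flags word."""
    return (finder_flags & LABEL_MASK) >> 1

def set_label(finder_flags, label):
    """Return the flags word with the label bits replaced, other bits untouched."""
    if not 0 <= label <= 7:
        raise ValueError("label must fit in 3 bits (0-7)")
    return (finder_flags & ~LABEL_MASK) | (label << 1)

flags = 0x4000               # some unrelated flag bit already set
flags = set_label(flags, 6)  # store label number 6
print(get_label(flags), hex(flags))  # prints: 6 0x400c
```

Since only one 3-bit number fits, a file with several colored tags can only carry one color in this legacy field, which matches the "last-applied label" description above.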

So DEVONthink should be compatible, at least to the point where it sees the color of the Mavericks tag and applies it to the file when imported. However, I’m not seeing that – they don’t seem labeled at all when they are imported.

As far as exporting goes, in my testing the Mavericks tags seem to take precedence.

EDIT: It’s possible that my copy of Mavericks has a bug; I’m using the first GM, not the final.

Perhaps I’m misunderstanding what you are saying, but DEVONthink exports out all of its OpenMeta tags to a Mavericks-compatible format. At that point, all the tags are available in the Finder (not just the color/number combinations). So in the Finder I can apply a tag to a document that originated in DEVONthink. As all my databases are indexed, I now have many tags appearing in the Finder in addition to the original Mavericks colored Finder tags.

From the Mavericks Finder:

Sorry, my mistake. I was focusing on the DT labels vs. Mavericks tags issues (?) that I seem to be perceiving. I read what korm said about DT tags but was thinking DT labels. Carry on!

With Mavericks’ introduction of tags to OS X, I see a BIG problem with DTP’s current management of indexed files:

  1. DTP doesn’t seem to “see” tags set in Finder, but
  2. Finder does recognize tags set in DTP (but not all of them… a bug?).
  3. When a file is moved to Finder and reindexed, DTP seems to exclude any tags set in Finder in the meantime.
    As I see it, this means the “update indexed file” command loses part of its purpose for indexed files.
    I have written this to DTP Support:

This is the long and interesting response I received from the DTP support team. I decided to share it because it raises some relevant arguments about the use, misuse, or non-use of tags and the DTP creators’ policies on this:

I don’t know if the entire statement is indicative of how the DEVONthink team feels about tagging. I’d pretty much look at the first paragraph as the official response, and everything that follows ‘As for me…’ to be Bill’s personal opinion.

Back to the challenges of tagging with DEVONthink and Mavericks, I have to wonder if part of the problem lies with Mavericks? It would appear that the Finder’s tag search function only works with OpenMeta tags? Here I have a document on the desktop with an OpenMeta tag that contains an ‘o’ in the tag name, as well as the Mavericks tag ‘Orange’. If I begin my search with typing ‘o’, then the document is returned in the search result.

However, when I add an ‘r’ to narrow my search to ‘or’, no documents are returned, as the OpenMeta tag does not contain ‘or’.

Unrelated to the tagging questions, it’s also curious why Mavericks shows that the document was modified Today at 10:10 AM, when it is 7:15 AM as I compose this?

Yes. The problems reported here relate to a seeming technical incompatibility between DEVONthink and Mavericks with respect to indexed documents.

I would like to see that resolved. First question is whether this is a DEVONthink issue, or a Mavericks issue, or both.

The rest is philosophy, which is interesting but off topic.

Greg – if you do your test with a non-colored tag (there can be only 7 colored tags in Mavericks) do you get the same result?

Any tag in Mavericks can be assigned one of the 7 colors, and (unlike Labels prior to Mavericks), any document can have multiple colors assigned to it based on the color of the tags.

I would assume that Mavericks ships with 7 pre-defined tags with color names so that users can ‘label’ documents as they did prior to Mavericks. I never used Finder labels much so I cannot tell for certain, but I don’t believe documents that were assigned a label in (Mountain) Lion inherited the corresponding colored tag in Mavericks. Can anyone confirm or not?

To the bigger question, if I create a tag in the Finder and immediately search for it, it does not turn up in the search results. If I then import that document into DEVONthink, then DEVONthink correctly identifies the tag. If I then export the document back out to the Finder and perform the tag search there, then the document is found. I believe this limited testing would support the possibility that a) DEVONthink is exporting OpenMeta tags back to the Finder in Mavericks and b) the Finder’s tag search only looks for OpenMeta tags and c) the default tag format of Mavericks does not appear as an OpenMeta tag to Finder searches. It would be nice to have someone with more knowledge of the entire tagging syntax test this, as again I have no special expertise on any of this, nor have I tested this extensively at this point.

“I would assume that Mavericks ships with 7 pre-defined tags with color names so that users can ‘label’ documents as they did prior to Mavericks. I never used Finder labels much so I cannot tell for certain, but I don’t believe documents that were assigned a label in (Mountain) Lion inherited the corresponding colored tag in Mavericks. Can anyone confirm or not?”

I only had 2 labeled files in the finder and they are both now showing a tag of the same color. It appears that labels are indeed being converted to tags.

I had numerous labels in Mountain Lion and they all survived the Mavericks upgrade. But of course DEVONthink does not pick up OS X labels (never has and, IMO, should not).

I stand by both areas of that response to a Support ticket, but in different ways. :slight_smile:

The first paragraph refers to issues of integration of the new tagging features in Mavericks with the existing OpenMeta-consistent tagging scheme in DEVONthink, including differences between DEVONthink and the Finder and differences between Imported and Indexed documents in DEVONthink. These issues will certainly receive attention by the developers of DEVONthink and by the users of DEVONthink.

The other parts of that response referred to my personal opinions about the ROI (return on investment) of spending a lot of time tagging or keywording documents. They are not a policy position of DEVONtechnologies, but I hope they are of use to users of DEVONthink.

Tags or keywords can be very useful tools for handling unique characteristics of documents that make them more easily identifiable and retrievable, especially if those characteristics remain valid for repeated purposes of access of those documents and are easy to assign. For example, if I tag a collection of notes and photos to identify them as related to a trip to Malta, I’ve made it easier to retrieve them if I wish to do so. They can be very useful for other purposes, such as identifying a set of references in my collection of documents that are useful for a particular research/writing project. But in the first case I would probably leave those tags or keywords permanently in place. In the second case I might decide to remove them after completion of the project, and in fact add value to my database by removing them.

Back in the day I was managing a university center that accepted queries about scientific and technical issues related to environmental issues, and searched computer tapes for information about federally funded research that might provide useful information.

We searched computer tapes by keywords. The result of a search was a list of numbers that matched the numbers of more than a million paper copies of abstracts, which were filed in shoeboxes in a quonset hut on campus. We sent the search lists to staff in the quonset hut, who then pulled the corresponding abstracts, made photocopies of them and sent us back the photocopies.

Our staff was supplemented by hiring a number of graduate students familiar with various scientific and engineering disciplines.

When we received a query, the first task was to translate the query into keywords that would be likely to pull relevant material in the computer search stage. The second stage was to examine the stack of photocopied abstracts resulting from a search, and determine their relevance to the original query. Relevant abstracts were organized and sent back to the quonset hut staff to be pasted up on letter-sized paper and photocopied as collections to be sent as a response to the query.

At the time, this was a bleeding edge project that often did provide useful information to people who sent in queries. We received support in part from federal funding, and in part from fees charged to (primarily) industrial and governmental customers. It did help disseminate the results of federally funded research to potential users of information. Today, of course, it seems very primitive.

There are serious fundamental problems in attempting to make documents retrievable by assigning keywords or tags to them. These problems have often been addressed in the field of information science.

One problem is comprehensiveness of keywording/tagging. A given document may be relevant to multiple topics. Limiting the keyword or tag to very high levels of a topic, such as air pollution, would (in my example of our information dissemination center) result in many thousands of abstracts in a search result. That’s not very useful. Keywords should filter the search to provide results for a specific query. So we need to use such specific keywords to designate each of the important topical elements of an abstract at the lowest level of terminology possible. Typically, keywords supplied on those computer tapes had been assigned at the federal agency that supplied us with the tapes.

This requires the person who is assigning keywords to an abstract to recognize the elements of information contained, and to assign one or more keywords to each “element” of information that might be important. That takes time, though, and so can make keyword assignment expensive.

Quite often, in reviewing the final results of a query response, one of us would recognize that potentially important information that we were familiar with had been left out of the response, usually because the keywords used to describe it had not made it relevant for the search, and/or because the person choosing keywords to match a query had not included an important one.

During that project I visited several of the federal agencies that did the keywording and supplied the tapes, to discuss this problem. They had tried to mitigate the problem by two approaches: development of glossaries of keywords, and staff training. While there were some improvements (which raised the cost of the effort), there were never satisfactory solutions to the issue. I had the same kind of problem at my end, in the phase of translating a query to a set of keywords.

Ignoring a related problem, which is that the terminology in different disciplines to define information may differ even for closely related items, and that the terminology in a given discipline tends to change over time, the fundamental problem with the issue of comprehensiveness of descriptors is that it cannot be mitigated very much without drastic increase of time and effort.

The second hair-pulling issue is consistency of application of descriptors, whether by different individuals, or by the same person at different times. Use of glossaries and training of personnel helped somewhat, but never made enough difference to keep this from being a serious problem. Adding an additional layer of review of the descriptors used for a document helped, but that added substantially to cost.

Based on that experience and on the fact that I often need to approach the analysis of information in my research databases from differing perspectives, I do not tag new items as they are added to those databases. I simply don’t have the time to attempt an adequate job of that, and wouldn’t consider the effort likely to be repaid well. DEVONthink gives me access to full text searches and to the ability to vary search criteria to improve results, when I’m looking for information. See Also can sometimes help overcome the problem of variations of terms used for similar topics. The DEVONthink environment is very different from our information dissemination project in the old days, which relied entirely on use of descriptors for searches.

That doesn’t mean that I consider tagging unimportant. It does mean that I tend to restrict tagging to a relatively small number of items, where that becomes a major aid to retrieval or use of the tagged items.

I often dump hundreds of new documents into a database. It’s unlikely that I’ll consider upfront tagging for any of them to be worth my time. In a few cases, such as the example of associating notes and photos of a trip to Malta, I might do so.

Feel free, as always, to consider me an eccentric. I probably am. :slight_smile:

DTP has just released an update (2.7.1). After giving it a try, the current behavior with indexed items seems to be:

  1. The tag is set in Finder > DTP mirrors that tag; :smiley:
  2. The tag is deleted in Finder > DTP keeps that tag; :astonished:
  3. The tag is set in DTP > Finder mirrors that tag; :smiley:
  4. The tag is deleted in DTP > Finder keeps that tag, and it’s back in DTP after updating. :confused:

Case 4) is quite strange: the result is that a deleted tag comes back to life against the user’s will. It’s on its way to becoming a source of mess.
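For what it's worth, all four cases are what you would get if the update step simply merged the tags on both sides (took their union) and never propagated deletions. That is only a guess at the behavior, not something confirmed by DEVONtechnologies. The toy model below uses plain Python sets and a hypothetical `sync` function to show why a deleted tag comes back.

```python
def sync(finder_tags, dtp_tags):
    """Hypothetical union merge: both sides end up with every tag seen on either side."""
    merged = set(finder_tags) | set(dtp_tags)
    return set(merged), set(merged)

finder, dtp = {"invoice"}, {"invoice"}

# Case 1: tag set in Finder, DTP mirrors it after updating.
finder.add("2013")
finder, dtp = sync(finder, dtp)

# Case 4: tag deleted in DTP, but Finder still has it, so the merge restores it.
dtp.discard("2013")
finder, dtp = sync(finder, dtp)
print(sorted(dtp))  # prints: ['2013', 'invoice']
```

A merge like this can't tell "deleted on one side" from "added on the other" without remembering the previous state, which would explain why deletions don't stick.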

I have also discovered that all DTP tags have now appeared in the Mavericks tags list (accessed via Finder), but only after a long while: Mavericks seems to need a lag (in my case, several hours) to detect and compile a list of the DTP tags (is this normal, or is it an indexing issue?). It’s on the Mavericks side, but it’s strange.

@agostinocirillo, thank you for the additional testing.

I’ve been testing your scenarios and I believe an explanation that matches all of your findings (1) to (4) is that

(step a) DEVONthink recognizes all tags that existed in the document at the time the file is indexed.
(step b) Any Mavericks tag changes made in Finder after (step a) are not shown in DEVONthink
(step c) Any tag changes made in DEVONthink after (step b) are added to the file and all Mavericks tags added in (step b) disappear from the file
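Steps (a) to (c) can be written down as a small toy model. This is only an illustration of the explanation above, with a hypothetical `IndexedFile` class. It assumes the whole tag set is rewritten from the database copy in step (c), and it says nothing about whether DEVONthink or Mavericks is the one doing the rewriting.

```python
class IndexedFile:
    def __init__(self, file_tags):
        self.file_tags = set(file_tags)  # tags stored on the file (what Finder shows)
        self.db_tags = set(file_tags)    # step a: snapshot taken when the file is indexed

    def finder_add(self, tag):
        # step b: the file changes, but the database snapshot is not refreshed
        self.file_tags.add(tag)

    def devonthink_add(self, tag):
        # step c: the database copy is written out and replaces the file's tags
        self.db_tags.add(tag)
        self.file_tags = set(self.db_tags)

doc = IndexedFile({"pdf"})
doc.finder_add("urgent")        # added in Finder after indexing
doc.devonthink_add("research")  # added in DEVONthink afterwards
print(sorted(doc.file_tags))    # prints: ['pdf', 'research']
```

The "urgent" tag added in Finder is gone at the end, which is the disappearing-tag behavior described in step (c).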

The question then is whether it is DEVONthink that is deleting the Mavericks tags added in (step b) or if it is Mavericks. A question for developers to sort.

I notice a behavior in Mavericks’ Get Info that might be a red herring, but might be relevant. When you add a Mavericks tag in Get Info, the tag is initially shown with a dotted marquee, which turns into a solid line after a few seconds, suggesting that there is a slight delay in the file system as something is updating to reflect the new tag. I don’t know why there would be a delay updating a file’s attributes, but there could be a delay updating the Spotlight index.

At this point, it’s a good idea, IMO, to either add your tags in Mavericks, or add them in DEVONthink, but don’t go back and forth because there’s a chance you will not get what you’re expecting.

I have discovered that the Finder does indeed have a search option for both OpenMeta and Mavericks tags. The results that I was reporting earlier were based on my having the search set only to find OM tags. Select the ‘Other…’ option in the Finder’s search window:

Also, the Academic workflows on Mac blog is posting some good info on this very subject, and includes DEVONthink in its testing. It’s a blog that most DEVONthink users might find well worth following, even if one doesn’t have an interest in tagging.