Distinctive metadata

Hi,

I am considering using DEVONthink for my document management, and I have some questions regarding DT 2.0.2 Personal.

How do I search for distinctive metadata, for example the author? Say the author is Black (as in the name); how do I search for this?

Or how do I search by creation/modification time?

And, if possible, how do I EDIT, for example, the author field on PDF files etc.?

Thank you.

Frank

You can search for distinctive metadata simply by creating a smart group – not sure what magic is required in the search window… would be good to know on occasion, but certainly not a show-stopper.

Show Properties (⌥⌘P) will allow you to add an author name to rtf and pdf files.

Hmm, that’s a start, ⌘⌥P does show some more metadata.

Is there no other way to search for it than creating a smart group, though?

Tools > Search

(Recommendation removed. Not needed by OP.)

Hmm, all this makes DT very complex… It feels like more of a bother than a help to me…

Some types of ‘distinctive metadata’ such as the Author field in Document Properties are only available for PDF and rich text documents, and often require one to enter that data.

As was noted above, a smart group can be quickly created (Data > New > Smart Group) that will list all documents that, e.g., list “Arthur E. Black” in the Author field of documents that have Document Properties. So will a metadata search in Tools > Search.

But I probably wouldn’t bother with that, as such ‘distinctive metadata’ is far too limited in my main database of nearly 30,000 references and notes, which holds a variety of document filetypes — most of which don’t use Document Properties. For that reason, I don’t bother to enter data into Document Properties for my PDFs and rich text documents.

Another real problem is that an author’s name may be referenced in several ways, such as ‘Arthur E. Black’, ‘Black, Arthur E.’, ‘Black, Arthur Edward’, ‘Arthur Edward Black’ or ‘Black, A. E.’.

So I wouldn’t bother with a search based on Document Properties. That would be far too limiting.

DEVONthink will let me do a search that will find all the references to Black. It can be formulated in various ways:

“Arthur E. Black” OR “Black, Arthur E.” OR “Black, A. E.” with an ALL search would pull the most common ways in which he would be listed. These exact strings will be highlighted and a selected search result will scroll to the first occurrence in several common filetypes.
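To make the idea of OR-ing several exact phrases concrete, here is a minimal sketch in Python. It is not DEVONthink's query engine; the function names (`name_variants`, `matches_any`) and the sample documents are made up for illustration, and the matching is a plain case-insensitive substring test.

```python
def name_variants(first, middle, last):
    """Return common ways one author's name is written (illustrative list only)."""
    fi = first[0] + "."
    mi = middle[0] + "."
    return [
        f"{first} {mi} {last}",        # Arthur E. Black
        f"{last}, {first} {mi}",       # Black, Arthur E.
        f"{last}, {first} {middle}",   # Black, Arthur Edward
        f"{first} {middle} {last}",    # Arthur Edward Black
        f"{last}, {fi} {mi}",          # Black, A. E.
    ]


def matches_any(text, phrases):
    """True if any phrase occurs in text (case-insensitive), i.e. phrase1 OR phrase2 OR ..."""
    lowered = text.lower()
    return any(p.lower() in lowered for p in phrases)


# Hypothetical documents: file name -> extracted text
docs = {
    "paper1.pdf": "Black, A. E. (1998). Soil studies.",
    "note.rtf": "Painted the fence black today.",
    "review.txt": "As Arthur E. Black showed, the method works.",
}

variants = name_variants("Arthur", "Edward", "Black")
hits = [name for name, text in docs.items() if matches_any(text, variants)]
print(hits)
```

The note that merely mentions the colour black is not matched, because only the full name phrases are searched for, not the bare word.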

I find it much less work to write such a query than to go through the drudgery of entering a priori tags, keywords or Author fields into individual documents as they are added to the database — and even if I tried to do that on all documents that mention this author, I might easily miss some that would be pulled by the query. Note that if I now wish to tag some or all of the results found by my query, that’s easy.

As for metadata such as Creation Date, for example, there will likely be additional enhancements of smart groups. Usually, I’m either looking for a specific date or for a range of dates. By adding a Creation Date or Modification Date column (View > Columns) to a smart group or search results view I can sort by date and quickly select a desired date or range of dates. As for dates contained in the content of documents, there have been some very ingenious scripts discussed in the forum that can look for dates written in a variety of date formats and handle them consistently.
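As a rough idea of what a script that handles several written date formats might look like, here is a small Python sketch. It is not one of the forum scripts mentioned above; the three formats covered (ISO, "2 April 2009", "March 30, 2009") and the name `find_dates` are assumptions chosen for illustration.

```python
import re
from datetime import date

MONTHS = {m: i for i, m in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Each entry: (regex, function mapping the match groups to (year, month, day)).
PATTERNS = [
    (re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b"),           # 2009-03-14
     lambda g: (int(g[0]), int(g[1]), int(g[2]))),
    (re.compile(r"\b(\d{1,2}) ([A-Za-z]+) (\d{4})\b"),      # 2 April 2009
     lambda g: (int(g[2]), MONTHS.get(g[1].lower()), int(g[0]))),
    (re.compile(r"\b([A-Za-z]+) (\d{1,2}), (\d{4})\b"),     # March 30, 2009
     lambda g: (int(g[2]), MONTHS.get(g[0].lower()), int(g[1]))),
]


def find_dates(text):
    """Return the dates found in text, in order of appearance, as date objects."""
    found = []
    for pattern, to_ymd in PATTERNS:
        for m in pattern.finditer(text):
            year, month, day = to_ymd(m.groups())
            if month is None:          # word was not a month name
                continue
            try:
                found.append((m.start(), date(year, month, day)))
            except ValueError:         # e.g. day 31 in a 30-day month
                continue
    return [d for _, d in sorted(found)]


sample = "Received 2009-03-14, replied on 2 April 2009, filed March 30, 2009."
print(find_dates(sample))
```

Once every date is a `date` object, sorting or selecting a range works the same way regardless of how the date was originally written.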

Which annoys me no end. Author should be available for pretty much any document type.

Entering data doesn’t bother me. I did it with my iTunes library back when I started using iTunes.

Smart groups are a poor way to search for things on a regular basis.

If it works for you, please do stick with it.

author:Black means author contains Black, not author is Black.
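The difference between "contains" and "is" can be shown with a small Python sketch. This only illustrates the distinction being drawn here; it does not reproduce the search syntax of any particular application, and the record layout and function names are made up.

```python
def author_contains(doc, term):
    """'contains' match: term appears anywhere in the author field."""
    return term.lower() in doc.get("author", "").lower()


def author_is(doc, term):
    """'is' match: the author field equals term exactly (ignoring case and outer spaces)."""
    return doc.get("author", "").strip().lower() == term.strip().lower()


# Hypothetical records; d.txt has no author field at all.
docs = [
    {"name": "a.pdf", "author": "Arthur E. Black"},
    {"name": "b.pdf", "author": "Black"},
    {"name": "c.pdf", "author": "Blackwood, J."},
    {"name": "d.txt"},
]

contains_hits = [d["name"] for d in docs if author_contains(d, "Black")]
is_hits = [d["name"] for d in docs if author_is(d, "Black")]
print(contains_hits)
print(is_hits)
```

A "contains" match finds the author even when only the surname is remembered, at the cost of also pulling in names such as Blackwood; an "is" match needs the field to be typed exactly.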

I do not agree, I find that wide angle searching makes me lose sight of the forest for all the trees…

See above about contains vs is. Half the time I do not remember people’s full names, especially if they contain middle names. I would search for black, which is a word WAY too common to do a wide search on.

I find smart groups a lousy way to search. They are a good way to store a search for later use, but they are imo ill suited to actually search. Too much hassle to make or edit a smart group etc…

I think maybe DT isn’t for me. Everything it seems to do, I do not much care for. Everything I would want it to do, it doesn’t.

Drats, that might have been something I was interested in when skimming through some thread yesterday and wanted to get back to… and I can’t PM you to ask about it.

@ cnf

Document Properties are characteristic of some, but not all, filetypes. Within DEVONthink, only rich text and PDFs have editable Properties. DEVONthink has no control of that, or even over which Properties fields are editable from one PDF to another.

Of course in a DEVONthink query the term ‘Black’ will be found if it is contained in metadata or a document. I didn’t see your point.

It so happens that I’ve counted at least 14 authors with the last name ‘Black’ in my main database — and that’s not a very common last name. If I want to look at the articles and bibliographies of a particular author named Black, I’ll have to refine my query if I want to look at Arthur’s publications and not those of James, Harold, and others. If you work with scientific literature, and try to stick to only one of the ways of writing an author’s name (such as the way the name was entered in the ‘Author’ field of a PDF of a book) you will likely miss a number of that author’s publications and will certainly miss citations of that author’s work — so that often isn’t a sufficient piece of metadata; in other words, it’s not very reliable.

In three or four minutes I can construct a query that will with high reliability find all the references to a particular author in a database containing tens of thousands of documents, without having spent time and effort in a priori metadata entry as each item was entered into the database. I can then save that as a group, or as a smart group (a 1-button click turns the results into a smart group, if I wish). If I were to need to separate out those documents of which the author was a primary or secondary author, or another group in which that author was cited by others, I can do that in just a few minutes. If I wish, I can now tag something that may be interesting and useful. And I’ve saved myself a lot of time and effort.

Years ago I worked on a massive 3-volume bibliography funded by the U.S. National Science Foundation, and a series of subsequent publications (also funded by NSF) based in part on that literature. That required me to do a priori multilevel categorization of thousands of items, which was an almost intolerable amount of drudgery. It took years. The reason I love DEVONthink is that it would have made that job almost a piece of cake — and I could have done the job much better with DEVONthink.

I have no intention of spending several minutes constructing queries. Spotlight is faster than that.

I guess my mind works in a different way from yours. What I am looking for, in the simplest analogy I can think of, is iTunes for my documents. It seems DT is not it for me…

Yes, you’ve mentioned that a few times.

OK cnf. But my point is that I save time and effort (and with more comprehensive results) compared to the total spent using an iTunes-like procedure, as in most cases the ‘Author’ information would have to be added. The time and effort for adding metadata should be counted. In a database with tens of thousands of references, that could add up to a lot of time.

Note that the procedure I would use handles cases of multiple authors and cases where the publication may use a variant of the author’s name (which is very common).

Once done, if I wish, I can tag the entire batch in a single operation. Then that author’s stuff has a single-click metadata access.

Together, Yojimbo, and (to a lesser degree) EagleFiler use more of an iTunes-like metaphor for document management than DT.

In that case, have you looked at Papers? I am not sure how well it works with documents other than PDF, though. Perhaps worth a try.

All you need is a simple search index like Spotlight… Your needs don’t appear to be so complex that you need to find very specific data in a large dataset or discover associations between documents. Spotlight can do none of that, but if your needs are generalized, go with the solution that is as complex as your needs are. No sense killing an ant with a sledgehammer.

Keep in mind you can have Spotlight index your DT databases, so DT gives you room to grow if you think you might need that someday. Otherwise, if you’re positive you’ve reached the limits of your needs, then go with a solution that piggybacks on the existing OS X search APIs, like the alternatives mentioned.

I preferred EagleFiler of the lot mentioned… but I outgrew a simple search index like Spotlight so that’s how I found myself where I am today.

I think I am just going to give up on this… I can’t find a single app that even approximates something that works for me. I have heard a lot of people raving about DT, but for me it adds more pain than it solves. Sadly the same can be said for virtually every app I have looked at so far :(

EDIT: I have tried Together, Yojimbo, EagleFiler, Papers etc. They tend to give me the same headaches…

I think you might be looking for something like Tinderbox, which is built around attributed data, or multi-column outliners such as OmniOutliner or TAO.

The “downside” to these programs is that they aren’t as friendly to the wide variety of data types that DTPO (and the others you mention) handle: PDF, MP3, Quicktime, Sheets, RTF, HTML, etc.

As a result, I use DTPO with Tinderbox and OO3 for a variety of mostly unimportant reasons.* As my data gets more organized, my clips and notes get attributes and land in one of these.

Then I’m able to write based on my “notes,” and can always follow links back to the original data, which is kept in DTPO and Bibdesk, where it’s easy to “mine” with the wonderful tools that Devon provides.

HTH, Charles

*For quite a while Tinderbox didn’t support Unicode, which made import/export really nasty. I got hooked on OO3 because it did. Although OO3 is a less sophisticated program, it is AppleScript-able, and the integration with DTPO on that level makes up for a lot of its un-sophistication.

Owkay, that just looks amazingly confusing ^^;

I also don’t really write notes all that often ^^;

And TB is really expensive, kludgy, has a terrible interface, and stores all of your information in a single, colossal XML file. Great job, Tinderbox creator :/

I’ve been begging on this board for extensible metadata for a few years. For instance, select five documents, add an “Author” text-type field to them. Add the author’s name to that field, either through AppleScript or manually. Then type “Author:Black” or something similar in the search interface, and the search returns those documents. Then, in the “Additional Information” section of the Get Info pane, display a list of the fields and their values.

Actually logged in here to see if it had been added, or if there was any consideration of it. Be back in another year or so, I guess :P