See Also does not find objects just by "name" (espec. no fol

elwood151 · August 10, 2010, 3:24pm

I’m having the following problem with DT Pro Office:

I have a large collection of scientific articles in PDF format.
The PDF files are normally named AuthorYear, e.g. Sago2003.pdf (= full text).
All the pdf files are indexed.

Furthermore, I have a bibtex database with >2000 entries, where each entry is named AuthorYear (the bibtex key; in DTPro the field is called “@”),
so the bibtex entry named “Sago2003” belongs to the file Sago2003.pdf.
(Normally, the bibtex field “pdf” also contains the file name, e.g. “Sago2003.pdf”.)

So far, so good.

I now want to take notes about the different articles.
I highlight them with Skim (and also index the .skim files with DTPro).

For very important ones, I also take notes in RTF format.
I create those “reviews” in DTPro directly and they are named “Sago2003_rev”.

Those citekeys (like “Sago2003”) are normally unique in the database; they could only “interfere” with “Sago2003a”, b, c…

So, as I understand the See Also function, if I select “See Also” for the RTF file “Sago2003_rev”, I would like to see Sago2003.pdf and the entry “Sago2003” in the bibtex database among the first results.

Unfortunately, this is not the case and it seems to me that DTPro somehow ignores the object names in its search.

Example:
I created an empty RTF file “Sago2003_rev”, so See Also is also empty.
If I paste “Sago2003” as the only text in this file, nothing changes.
If I paste “Sago2003” a second time, then the bibtex database entry appears in the “see also” results.

This gets even worse if I add more text to the review file (text notes about its content):

Even if I insert the complete bibliographical information (Authors, Title, Conference, Year) in the RTF, the bibtex database entry is not in the results…
Even if I add the citekey “Sago2003” 12 times, the bibtex file does not appear in the results! I have to add it 13 times (in addition to the 67 words in the text file).

Another example: For books, I may have more than one PDF file (10…50 copied pages as single PDF files), and I save them in a folder named e.g. “Hemingway1940”.
If I now create the RTF review file “Hemingway1940_rev”, the folder with the nearly same name (Hemingway1940) does not appear in the “see also” results!
Does that make sense?

My conclusion:
The name of the objects (which I tried to use as an identifier) is not that highly rated by the AI engine for see also.

Any hints on how I could improve my system?
(Or can the behavior of DTPro be improved?)

(Sure, this is somewhat obvious and I know that “Sago2003.pdf” belongs to “Sago2003_rev”, but it makes me feel uncomfortable that DTPro does not seem to understand this system of mine.)

Martin

Bill_DeVille · August 10, 2010, 4:26pm

By design, See Also looks only at the content of documents and not Name or other metadata.

But Lookup searches for Name text strings can do what you are trying to do.
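
To illustrate what matching by Name (rather than by content) means for the AuthorYear naming scheme described above, here is a minimal Python sketch. It is not DEVONthink's Lookup feature or any DEVONthink API; the functions `citekey_of` and `related_by_name` are hypothetical helpers, and the `_rev` suffix handling simply follows the naming convention from the first post.

```python
from pathlib import PurePath

def citekey_of(name: str) -> str:
    """Reduce an item name to its citekey (hypothetical helper).

    Drops a file extension such as ".pdf" and a trailing "_rev",
    following the AuthorYear naming convention described in the thread.
    """
    stem = PurePath(name).stem          # "Sago2003.pdf" -> "Sago2003"
    if stem.endswith("_rev"):
        stem = stem[: -len("_rev")]     # "Sago2003_rev" -> "Sago2003"
    return stem

def related_by_name(name: str, all_names: list[str]) -> list[str]:
    """Return the other items whose citekey equals that of `name`."""
    key = citekey_of(name)
    return [n for n in all_names if n != name and citekey_of(n) == key]

names = ["Sago2003.pdf", "Sago2003.skim", "Sago2003_rev",
         "Sago2003a.pdf", "Hemingway1940", "Hemingway1940_rev"]
print(related_by_name("Sago2003_rev", names))
print(related_by_name("Hemingway1940_rev", names))
```

Because the comparison is on the whole citekey, “Sago2003a” does not get mixed up with “Sago2003”.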

elwood151 · August 10, 2010, 5:01pm

Thanks Bill for making that clear.

I think I had already understood that from seeing DT’s behavior.
But: could the name and the metadata not be added to the content?

The name (at least if given intentionally by the database “owner”) might carry at least as much “meaning” as the content, and I don’t want to create a tag for each of the 26xx entries in my references database and assign it to the PDF files and reviews as well…

So, if you say that metadata are also ignored, it would not help to add this identifier (cite key) to the Spotlight keywords of the PDF?

Bill_DeVille · August 10, 2010, 6:03pm

That’s right. See Also won’t look at Spotlight Comments either.

Unlike other databases, the artificial intelligence features such as See Also and Classify are built into the core of DEVONthink databases.

When you think about what See Also is doing, which is very complex, you might agree that managing comparison of the textual content of the document being viewed to – not only the textual content of every other document in the database – but also the contextual relationships of the words in each document, is a big job.

See Also’s most important use, for me, is in bringing to my attention other database content that surprises me, because it involves ideas or connections that I hadn’t thought of. So I wouldn’t want to “control” it by trying to tilt its list of suggestions toward items that I already know about. Obviously, a list of See Also suggestions may include some that I dismiss as dumb or irrelevant to my interest, and I’m responsible for separating the wheat from the chaff.

I use See Also not merely to suggest other documents that might be interesting about a topic, but to help me explore new ways of looking at the material and finding new ideas. I would find it much less useful if it merely “cataloged” the documents about a topic, based upon a limited set of criteria.

If I wish to emphasize special relationships (of my own creation) among documents, I use other tricks to do that, such as hyperlinks and/or “Lookup” text strings in rich text notes, or perhaps tags. DEVONthink provides a pretty wide range of tools and tricks with which I can define such relationships, e.g. for the important references for a project, or for documents that contain the works of a particular author and also citations of those works in other documents, or a smart group that meets certain criteria, etc. When I make notes about a document, regardless of its filetype, I do so in a searchable rich text note, to which I can add links, tables and images if I wish (so I don’t use text notes in PDFs, don’t use Document Properties and rarely use Spotlight Comments).

This doesn’t mean that I’m critical of others who use text note annotations in PDFs or use Document Properties, and so on. If you disagree with my workflows, feel free to consider me eccentric about such matters – you are probably right.

My purpose is just to suggest that DEVONthink provides a very rich environment in which there are many avenues for accomplishing a task, so it will accept varying workflows preferred by the user.

elwood151 · August 10, 2010, 6:14pm

Hi Bill,

thanks for your long reply.
But the “lookup” strings were one of my problems:
as I said, in an RTF with 67 words I had to insert this lookup string 13 times until See Also noticed its relation to the bibtex database…

OK, I’m just starting to see the “mathematics” behind it:
The bibtex database with its >2500 entries contains about 450,000 words, so the single occurrence of this unique lookup term carries very little weight for the document as a whole.
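
A rough way to see this dilution effect is the Python sketch below. It assumes a plain cosine similarity over raw word counts, which is only a stand-in: how See Also really scores documents is not stated in this thread. The filler words and document sizes are made up for illustration; only the 67-word note and the citekey “Sago2003” come from the example above.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def note(k: int) -> Counter:
    """A review note: 67 filler words plus the citekey repeated k times."""
    words = Counter({f"note{i}": 1 for i in range(67)})
    words["sago2003"] = k
    return words

# Toy stand-in for the whole bibtex file: many other words, citekey once.
big = Counter({f"w{i}": 1 for i in range(5000)})
big["sago2003"] = 1

# Toy stand-in for a single bibtex entry stored as its own document.
single = Counter({f"w{i}": 1 for i in range(20)})
single["sago2003"] = 1

for k in (1, 13):
    print(k, "vs whole file:", cosine(note(k), big))
print(1, "vs single entry:", cosine(note(1), single))
```

In this toy model, repeating the citekey raises the score against the big file, but a single mention already scores higher against a small one-entry document, because the shared term is a much larger share of that document.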

It seems that I’d have to split the bibtex database file into pieces (one file for each entry) to be able to relate each entry to other contents…
(As my bibtex library is still evolving, this would have to be automated and is not something I’m happy about. This problem seems to be related to the fact that See Also cannot (as it still could in DT 1.x) locate single records of a sheet, but only the whole document, if you know what I mean…)

If it treated the sheet as n single records, it could also find the relations more easily…
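
The splitting could be automated roughly as in the Python sketch below. It is a simplified sketch, not a full BibTeX parser: it assumes every entry starts at the beginning of a line with `@type{key,`, and anything between two entries (such as `@string` or `@comment` blocks) stays attached to the preceding entry. The function names and the output folder are hypothetical.

```python
import re
from pathlib import Path

# Matches the start of an entry such as "@article{Sago2003,".
ENTRY_START = re.compile(r"^@(\w+)\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)

def split_entries(bibtex_text: str) -> dict[str, str]:
    """Map each citekey to the raw text of its entry."""
    matches = list(ENTRY_START.finditer(bibtex_text))
    entries = {}
    for i, m in enumerate(matches):
        # An entry runs until the next entry start, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(bibtex_text)
        entries[m.group(2)] = bibtex_text[m.start():end].strip()
    return entries

def write_entries(entries: dict[str, str], out_dir: str) -> None:
    """Write one <citekey>.bib file per entry into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for key, body in entries.items():
        (out / f"{key}.bib").write_text(body + "\n", encoding="utf-8")

sample = """@article{Sago2003,
  author = {Sago, A.},
  year = {2003}
}

@book{Hemingway1940,
  author = {Hemingway, Ernest},
  year = {1940}
}
"""
entries = split_entries(sample)
print(list(entries))
# write_entries(entries, "bib_entries")  # hypothetical output folder
```

Re-running the script after each change to the bibtex library would overwrite the per-entry files, so the folder could stay in sync with the evolving database.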

Martin