Synonyms Go Anywhere by 2012?

jasperx · October 15, 2012, 9:53pm

I am trying out DT to organize technical information from wells. A historical artifact of these data is that a well is sometimes identified by a unique well identifier (UWI), sometimes by a serial number, and sometimes by a user-defined name. For a given area I maintain a spreadsheet (a “Rosetta Stone”) which relates these different “names” so that I can find and relate the data. It sure would be handy if I could feed DT my spreadsheet of names so that I could leave the Rosetta Stone behind. The UWI, serial, and user names need to behave synonymously in all ways to avoid a terrible tagging job.

jasperx · October 18, 2012, 8:54pm

I just finished up an effort to create a workaround for the lack of synonyms and it does not seem to be fully working… partially but not quite what is needed. Perhaps one of you will notice where I have gone wrong and suggest a better path.
I created a CSV of 650 wells where each line contained all of the synonyms used as identifiers for a given well. Then I got some help writing a script which turned the CSV into 650 files, each named after one of its well's names and containing all the synonyms for that well. Then I imported the 650 files into DT. Say the file is named Awell.txt and contains NameA, NameB and NameC. If I search on Awell, I find Awell.txt and any files which have Awell in the name. If I type in NameB I get Awell.txt… very good! But what I do not get is the other files that a search on Awell turned up. A bit confusing, but the bottom line is that having the synonyms referenced in a document does not make searches work the same for each of the names. Perhaps it would work better if instead of 650 files I created three times that number, so that every well name gets its own file which references the other two names. Bummer.
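A minimal Python sketch of the "three files per well" variant mentioned above. The original script was not posted, so the function name `write_synonym_files`, the CSV layout (one well per line, identifiers separated by commas, no header) and the `.txt` naming are all assumptions. Whether DT would then treat the names as true synonyms is exactly the open question in this post; the sketch only generates the cross-reference files.

```python
import csv
import io
import tempfile
from pathlib import Path


def write_synonym_files(csv_text, out_dir):
    """Write one text file per identifier; each file lists every
    identifier of the same well (assumed layout: one well per CSV line)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in csv.reader(io.StringIO(csv_text)):
        names = [n.strip() for n in row if n.strip()]
        for name in names:
            # Hypothetical naming scheme: <identifier>.txt
            path = out_dir / f"{name}.txt"
            path.write_text("\n".join(names) + "\n", encoding="utf-8")
            written.append(path.name)
    return written


if __name__ == "__main__":
    sample = "12345,123456789ab,23_01\n"  # one well, three identifiers
    with tempfile.TemporaryDirectory() as tmp:
        print(write_synonym_files(sample, tmp))
```

With 650 wells and three identifiers each, this would produce roughly 1,950 small files to import.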

korm · October 18, 2012, 9:47pm

jasperx · October 18, 2012, 10:32pm

Tagging or commenting is really a poor substitute for real synonyms. Imagine this for a single project area:
The state makes information available in files named using NameA… a service company makes information available in files using NameB, and your client uses NameC. NameA=NameB=NameC. Each of those sources has several hundred documents you need to download and incorporate in your project. Is the solution to hand-tag thousands of documents? Or for the program to recognize that NameA, NameB and NameC are really the same name? I am still toying with the idea of creating 1800 files so that I can cross-reference the 600 wells with their 3 names and avoid the human error… but I am worried that even this will not act like synonyms.

korm · October 18, 2012, 11:05pm

jasperx · October 18, 2012, 11:41pm

Korm,
Were you referring to the option on the Info Panel to

?
This is fine for a file here or there. What I need to do is feed DT a table of aliases.
I have several documents for each of 650 wells. The wells have 3 different naming conventions… 1800 names for 600 wells for 4,000-plus documents. How do I feed the synonyms to DT? I have a lovely CSV of the synonyms and just need a way to feed it to the program.
Adding aliases to 10,000 documents one at a time is really not a solution.

BLUEFROG · October 19, 2012, 1:10am

I am confused by your terms and hesitance to use Tags.

What do you see as a “synonym”, any data from the UWI ?

jasperx · October 19, 2012, 1:52pm

Perhaps this will illustrate the issue:
Data from wells come from different sources who use different naming conventions:
One data source will have data like this
12345_log.jpg
12345_wellHist.pdf
12345_scoutTicket.pdf
Another source will have data like this
123456789ab_detailProduction.xls
123456789ab_summaryProd.xls
Yet another source will have data like this
23_01_sampleDescription.pdf
23_01_myNotes.txt
23_01_surveyorPlat.pdf
12345 is the “Serial Number”, 123456789ab is the truncated “API”, and 23_01 is the client’s internal reference number… all of these are data for the SAME well. The desired situation is that files from the three sources could be imported and a search for 12345 would produce the same results as a search for 123456789ab or for 23_01. There will be thousands of files, but they will all have one of these three identifiers somewhere in their names. Tagging or aliasing each file, one at a time, is unworkable. It seems so much more straightforward to feed DT a “Rosetta Stone” file of my 650 wells which would relate the three names to each other… i.e., make them synonymous.
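The desired behavior can be sketched outside DT in a few lines of Python. This is only an illustration of the idea, not a DT feature: `build_alias_index` and `files_for` are hypothetical names, the CSV is assumed to hold one well per line, and the identifier is assumed to be a file-name prefix followed by an underscore, as in the examples above.

```python
import csv
import io


def build_alias_index(csv_text):
    """Map every identifier to the full tuple of identifiers for its well."""
    index = {}
    for row in csv.reader(io.StringIO(csv_text)):
        names = tuple(n.strip() for n in row if n.strip())
        for name in names:
            index[name] = names
    return index


def files_for(identifier, filenames, index):
    """Return every file whose name starts with any alias of `identifier`.

    Assumption: names look like <identifier>_<description>.<ext>.
    """
    aliases = index.get(identifier, (identifier,))
    return [f for f in filenames if any(f.startswith(a + "_") for a in aliases)]


if __name__ == "__main__":
    index = build_alias_index("12345,123456789ab,23_01\n")
    files = ["12345_log.jpg", "123456789ab_summaryProd.xls", "23_01_myNotes.txt"]
    # Searching by any one of the three identifiers yields the same list.
    print(files_for("12345", files, index) == files_for("23_01", files, index))
```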

BLUEFROG · October 19, 2012, 3:34pm

(I am unfamiliar with “Rosetta Stone” outside the language learning program. Is this a spreadsheet app? I am also not sure how you would approach using aliases either.)

Actually Tagging looks like the most viable option here.

If you had your own internal reference number in addition to the three given values, you could tag any of the files in your example with this singular value. Then a Tags based search would yield any file with this Tag.
If you wanted this “synonym” approach, you could Tag all of the mentioned files with all three values. A search for a Tag of one value would show the other files. So if someone on the phone is referring to the truncated API, search on that value and you’d get docs Tagged with that value, including the files with different filenames.

There are two things to consider here…

Do you need to Tag all your documents now? Often the answer is “No.” Many systems are workable by integrating data on an as needed basis.
If you do need to Tag “all” your documents, you don’t have to do it one at a time. You don’t even have to do it within DEVONthink. And though the initial load may be hard, then it’s only maintenance after that. You have already made correlations between the three given parameters so I imagine you have some file / organizational structure that helps you coordinate these documents for bulk Tagging.

If you get new documents on a known well, you’d just need to apply the existing Tags and they’d be found in a search. New docs on a new well would just require a new set of Tags and those would be related.
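As a rough illustration of preparing bulk Tags outside DEVONthink, the Python sketch below builds a "tag plan" from the existing correlations. It only decides which Tags each file should receive; actually applying them (by script or by hand) is not shown, because no specific DEVONthink interface is assumed here. The name `plan_tags` and the prefix-plus-underscore file-name convention are assumptions based on the examples earlier in the thread.

```python
def plan_tags(wells, filenames):
    """Decide which Tags each file should get.

    wells: list of identifier tuples, one tuple per well.
    Returns (plan, unmatched): plan maps filename -> list of Tags,
    unmatched lists files with no recognizable identifier prefix.
    """
    plan, unmatched = {}, []
    for fname in filenames:
        # Find the well whose identifier prefixes this file name.
        match = next(
            (w for w in wells if any(fname.startswith(i + "_") for i in w)),
            None,
        )
        if match is None:
            unmatched.append(fname)  # left for manual review
        else:
            plan[fname] = list(match)  # Tag with all identifiers of the well
    return plan, unmatched


if __name__ == "__main__":
    wells = [("12345", "123456789ab", "23_01")]
    plan, unmatched = plan_tags(wells, ["12345_log.jpg", "23_01_myNotes.txt", "readme.txt"])
    for fname, tags in plan.items():
        print(fname, "->", ", ".join(tags))
    print("unmatched:", unmatched)
```

Keeping the unmatched files in a separate list makes it easy to see what still needs attention when documents for a new well arrive.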

The only thing to bear in mind is the focus of the search. If I have a file with 12345 in the contents but it’s not tagged with a 12345 Tag, I’d need to make sure my search is on Tags, not on a filename or file contents. This is the reason some people used to use a prefix like @12345 or &12345 (which is a habit from early pseudo-tagging methods). With OpenMeta Tags this is no longer needed from Spotlight but still may be useful within DEVONthink.

jasperx · October 19, 2012, 4:16pm

Thanks for the response.
“Rosetta Stone” is my slang for a file which contains the names for all 650 wells. It is a CSV I exported from Excel. Over a period of months I have built up the spreadsheet to contain the API, serial and client numbers for all 650 wells. I have a huge number of files… many thousands that are named by either the serial or the API or the client number. Do I need to work with all of the files? No… I am probably working with about 200 of the 650 wells. What I do need is to see every file for the 200, whether it has the API, serial or client number in its file name. How do I do searches now? I open my spreadsheet, look up the three names for each well and then perform three searches. Pretty darned ridiculous. Eliminating the need to go into several thousand files and individually tag or alias them, or do any other manual, one-at-a-time operation that involves typing strings of numbers, is an opportunity for a programmer to make a big win for users. My situation is another version of what other posts have described in the context of multilingual search… feed DT a list of terms in several languages and it suddenly becomes possible to seamlessly mine collections of files in as many languages. This would be huge.

BLUEFROG · October 19, 2012, 7:24pm

This looks more and more like a Tagging job. I think the bigger part of tedium is that you have 650 wells!

How many files per well do you think you have? This doesn’t sound impossible (or potentially that hard). The hardest parts are…

A certain mindset to see Tags as very useful for the contextual relationships they establish. (Tagging is usually poorly done and has a lot of misunderstanding.)
The signal-to-noise ratio could be low without supplemental Tags (not a huge deal either but some sense should be applied). Consider this… If I have 300 files out of 30,000 files tagged “23_01”, “12345”, and “123456789ab”, a search on any of those Tags will return all 300 files. This is a very good thing but it may not be as helpful in finding the exact document I am looking for. If I am talking to someone and I want to see documents about zoning then a “Zoning” Tag may be appropriate, etc. You also have other non-Tag metadata to help trim results. Make sense?

The one area where people complain most about Tagging: “Why doesn’t your app Tag my files automatically? I have 17 gajillion files to tag!!! >:(” The problem is that Tagging is relative to its context so how is an app to know what Tags are appropriate? (To some degree, work in latent semantic analysis and mapping can work but I’ve also seen this fail often. There is no good autoTagging engine thus far.) What you’re talking about, at the base level of your “synonyms”, is simple, static, and an accepted dictionary. If you want to add beyond that, it’s best to have an agreed-upon dictionary (even for yourself it’s not a bad idea, though I’m pretty loose in my Tagging because I’m not sharing files and Tag data. No one has to understand it but me). Sorry, preaching now. 8^)

Bill_DeVille · October 21, 2012, 12:31am

As you have a list of the possible names for each well or related document, one obvious solution is to use OR searches including all the synonymous names of a well. You already have the list of synonyms, so search for them. For multiword well names, enclose the name in quotation marks to designate it as an exact string.
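Building such a query by hand for every well gets tedious, so a small helper could generate it from the list of synonyms. This Python sketch assumes only what is described above: terms joined with OR, and multiword names wrapped in quotation marks. The function name `or_query` and the sample well name are hypothetical, and the resulting string would still need to be pasted into the search field.

```python
def or_query(names):
    """Join synonyms with OR, quoting any name that contains whitespace."""
    terms = [f'"{n}"' if any(c.isspace() for c in n) else n for n in names]
    return " OR ".join(terms)


if __name__ == "__main__":
    print(or_query(["12345", "123456789ab", "23_01"]))
    # prints: 12345 OR 123456789ab OR 23_01
    print(or_query(["Smith Ranch 1", "12345"]))  # hypothetical multiword name
    # prints: "Smith Ranch 1" OR 12345
```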