function for importing a large bibliographic database

dclayton · December 26, 2004, 1:26am

Surely this has been raised before, but in any case I’ll raise it again. It would be AWESOME if there were a command or script in DT to import and parse a flat database file, splitting it so that each record becomes its own note or file within the DT database. I just saw another user (over in Usage Scenarios) suggest this for HyperCard stacks, and yesterday I asked about it (over in Tips and Tricks) in the context of importing my existing StickyBrain data into DT.

Today I thought of an even more incredible use of DT, if only I could do the import. I’m a scientist and I use Endnote to organize the references in my manuscripts. Over the years I have built up four Endnote libraries that together contain about 4000 bibliographic records, each of which includes fields for the paper’s abstract, author information, notes I’ve made about the paper and, more recently, even the URL for downloading the full PDF from the journal. This would be a veritable motherlode for DT to go to work on.

I could easily output all the data into a single text file that demarcates each entry with some uniform separator (a character, tabs and spaces, a period at the beginning of a line, etc.). But at the moment, if I imported this into DT, it would come in as a single enormous file. I bet some ingenious AppleScripter could figure out how to turn it into something DT would recognize as 4000 separate files, but I don’t have any scripting experience myself. I’d LOVE to be able to use Classify and See Also to mine all the data I already have, and to manage the tsunami of new information pouring daily into PubMed/NCBI and the many journals. I bet a lot of other people would, too. Can anyone help? Any hope for this in the future?
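For anyone who wants to try the splitting step outside DT, here is a minimal sketch in Python (not AppleScript, and not a DT feature). It assumes each record in the exported text file is separated by a line containing only `----`; that separator and the `record_0001.txt` file names are assumptions to adjust to whatever the export actually uses.

```python
from pathlib import Path

def split_records(text: str, separator: str = "----") -> list[str]:
    """Split a flat export into records at lines consisting solely of `separator`."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == separator:
            # End of a record: keep what we collected so far.
            if current:
                records.append("\n".join(current).strip())
                current = []
        else:
            current.append(line)
    if current:
        records.append("\n".join(current).strip())
    # Drop records that were only whitespace (e.g. two separators in a row).
    return [r for r in records if r]

def write_records(records: list[str], outdir: str) -> list[Path]:
    """Write each record to its own numbered text file in `outdir`."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, rec in enumerate(records, start=1):
        p = out / f"record_{i:04d}.txt"  # hypothetical naming scheme
        p.write_text(rec + "\n", encoding="utf-8")
        paths.append(p)
    return paths
```

The resulting folder of small text files could then be dragged into DT in one go, giving one document per record.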

moses · December 26, 2004, 7:39am

I’ve figured out a workaround for this and posted it as its own topic under “Tips and Tricks”:
devon-technologies.com/phpBB … =5392#5392

Hope this helps…

Timotheus · December 26, 2004, 8:39am

But, dclayton, why? Why should it be more convenient to have your bibliographic database in DT instead of in Endnote, Bookends, Sente, or another application especially designed for this? What does DT offer you, in this respect, that these applications don’t offer you?

smolk · December 27, 2004, 7:38pm

As far as I am concerned, it is convenient to have all reference material in one location, be it abstracts, references, summaries, or whole articles imported into DT. That makes it easier to search for something.

Sente does not yet cover everything (edited volumes, for example). Personally, I could never get used to the hideous look of Bookends. I know, that shouldn’t count. Still, it does.

Timotheus · December 27, 2004, 8:46pm

But this means that you’ll end up having thousands of often very small records in DT. Is this really convenient? And does DT offer you the import facilities (from Library of Congress, Pub Med etc. etc.) which bibliographical applications like Endnote, Bookends etc. offer? And is making and editing a bibliography just as easy with DT as it is with those applications?

dclayton · January 9, 2005, 6:37am

Here’s a brief update/follow-through on the helpful tip above. I played around with this yesterday, and as Moses mentioned, the limitation of his trick is the lack of useful filenames for the individual files produced by the splitting process. If you’re trying to import a large existing Endnote database, you can indeed get all your records into DT, which makes them available for analysis with DT’s very powerful Classify and See Also functions. But the usefulness is greatly undercut because you can’t scan the results of a Classify or See Also operation and make immediate sense of them, unless you’ve gone to the trouble of renaming every record by hand (e.g., “FirstAuthor-Keyword-Source-Year” is my standard format for PDFs). So, for now at least, I think I’ll stick with Endnote’s search functions for my existing bibliographic database, and use DT for my rapidly growing collection of PDFs.
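The hand-renaming step could in principle be scripted. Below is a minimal Python sketch that builds a “FirstAuthor-Keyword-Source-Year” name from one record. It assumes the records were exported from Endnote in a Refer-style tagged format, where each line starts with a tag such as `%A` (author), `%K` (keywords), `%J` (journal) or `%D` (date); those tags, the “Surname, Initials” author layout, and the `Unknown`/`nd` fallbacks are all assumptions.

```python
import re

def parse_refer(record: str) -> dict[str, list[str]]:
    """Collect values of Refer-style '%X value' lines, keyed by tag letter (assumed format)."""
    fields: dict[str, list[str]] = {}
    for line in record.splitlines():
        m = re.match(r"^%(\w)\s+(.*)$", line)
        if m:
            fields.setdefault(m.group(1), []).append(m.group(2).strip())
    return fields

def first(fields: dict[str, list[str]], tag: str, default: str = "Unknown") -> str:
    """Return the first value for `tag`, or `default` if the tag is absent."""
    values = fields.get(tag)
    return values[0] if values else default

def safe(part: str) -> str:
    """Keep only letters and digits so the piece is filename-safe."""
    return re.sub(r"[^A-Za-z0-9]+", "", part) or "Unknown"

def make_name(record: str) -> str:
    """Build 'FirstAuthor-Keyword-Source-Year' from a Refer-style record."""
    f = parse_refer(record)
    surname = first(f, "A").split(",")[0]   # assumes "Surname, Initials"
    keyword = first(f, "K").split(";")[0]   # assumes ';'-separated keywords
    source = first(f, "J")
    year_match = re.search(r"\d{4}", first(f, "D", ""))
    year = year_match.group(0) if year_match else "nd"
    return "-".join([safe(surname), safe(keyword), safe(source), year])
```

For a record with `%A Smith, J.`, `%J J. Exp. Biol.`, `%D 2003` and `%K honeybee; foraging`, this would yield `Smith-honeybee-JExpBiol-2003`, which could be used in place of the numbered file names when writing out the split records.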

Also, Timotheus questioned why one would want to import bibliographic records:

Right, I don’t foresee using DT for the actual entry and editing of bibliographies in manuscripts; Endnote or Bookends will handle that just fine. But what DT uniquely offers is semantic analysis, which would work hand-in-glove with the article ABSTRACTS that are now imported as a matter of course into bibliographic records downloaded from PubMed, Web of Science, etc. This sort of “text mining” is very much an emerging field. I have a colleague who received a $5 million NSF grant whose main thrust is building just such a semantically organized database for all the research literature pertaining to honeybees (as it were), using primarily the abstracts downloaded from NCBI and other sources. $40 seems rather like a bargain in that context (granted, there are other aspects to his grant!).

In any case, I’m having a lot of fun with DT, and it’s clearly the right tool for managing my exploding collection of scientific literature PDFs – and probably lots of other things!

moses · January 10, 2005, 5:24am

Regarding the file names, I have thought it would be nice if you could tell DT to automatically rename a selected doc or docs based on the first line in the doc – as is done when you use the Take Rich Note Service.
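Outside DT, naming by first line can be approximated on plain text files. Here is a minimal Python sketch; it only imitates the behaviour described for the Take Rich Note service and is not how DT itself does it. The 60-character limit, the character clean-up and the “ 2”, “ 3” suffixes for duplicate titles are assumptions.

```python
import re
from pathlib import Path

def title_from_first_line(text: str, max_len: int = 60) -> str:
    """Return the first non-empty line, cleaned up for use as a file name."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            # Slashes, colons and backslashes are awkward in file names.
            cleaned = re.sub(r"[/:\\]+", "-", line)
            return cleaned[:max_len].rstrip()
    return "Untitled"

def rename_by_first_line(path: Path) -> Path:
    """Rename a text file after its first line, adding ' 2', ' 3', ... on clashes."""
    text = path.read_text(encoding="utf-8", errors="replace")
    base = title_from_first_line(text)
    target = path.with_name(base + path.suffix)
    n = 2
    # Never overwrite a different file that already has this title.
    while target.exists() and target != path:
        target = path.with_name(f"{base} {n}{path.suffix}")
        n += 1
    return path.rename(target)
```

Run over a folder of split records before importing, this gives each document a readable name. PDFs would need their text extracted first, since the sketch only reads plain text.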

With that in mind, I have had another idea for getting the files in with useful names. You could use my workaround, but instead of dragging the PDFs into DT, open them all in individual windows and create a macro (using another macro program such as QuicKeys or Keyboard Maestro) that automatically does “select all” in the front window, hits “command-)”, closes the window, and then repeats with the next window, and so on… This would create new files in DT via Services, with names taken from the first line of each file. The files may first need to be converted into something other than PDF; I don’t know. dclayton, you may want to try that if names taken from the first line would work for you. I have no pressing need to do this myself just now, so I haven’t spent the time trying, and won’t until I need to…