DevonThink + Papers (Abstracts Only)

heshca · March 18, 2011, 12:40pm

At the moment I’m using DevonThink and Mekentosj’s amazing program Papers to keep my academic papers organized and searchable by indexing the folder that Papers uses to store both the PDFs and the index files of its database. The main problem I am having, however, stems from the snippet philosophy of Steven Johnson that I built my DT database around. Essentially, I don’t want my DT database to be cluttered with full PDFs because that’s just too much information to be useful.

What I would like, ideally, is a way to index just the abstracts of the papers (where available). I have no AppleScript experience, but I feel like it is feasible considering Papers pulls abstracts from academic search engines and matches them to the pdf - attaching them so they appear in the inspector each time a specific paper is highlighted. I’m wondering if somebody has done this, or can point me in the right direction? There must be some variable Papers stores abstracts in for each pdf that can be mined via an indexing script from DT.

acl · March 18, 2011, 5:20pm

Papers is completely unscriptable. So, unfortunately, there is no easy way to do what you want (or at least, none that I can see).

houthakker · March 20, 2011, 10:04pm

We should probably all send notes to Mekentosj to request the provision of an AppleScript library …

In the meantime, although the application is unscriptable, the SQLite data files can be read, and I have sketched a rough draft of a Papers 2 to DevonThink script which sends summaries and notes from the Papers 2 database to the currently selected folder in DEVONthink 2.
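For anyone curious what reading the SQLite data directly might look like, here is a minimal Python sketch. It is not the script itself, and the schema is an assumption: only the existence of a Publications table is mentioned in this thread, so the `title` and `summary` column names are hypothetical, and the demo runs against an in-memory stand-in rather than a real Papers 2 library.

```python
import sqlite3


def fetch_summaries(conn):
    """Return (title, summary) pairs for records with a non-empty summary.

    The column names `title` and `summary` are assumed for illustration;
    the real Papers 2 schema may differ.
    """
    rows = conn.execute(
        "SELECT title, summary FROM Publications "
        "WHERE summary IS NOT NULL AND TRIM(summary) <> '' "
        "ORDER BY title"
    )
    return [(title, summary) for title, summary in rows]


# Stand-in database so the sketch runs without a Papers 2 library.
# A real file could be opened read-only with
# sqlite3.connect("file:<path to database>?mode=ro", uri=True).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publications (title TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO Publications VALUES (?, ?)",
    [
        ("Paper A", "An abstract."),
        ("Paper B", None),   # no abstract captured
        ("Paper C", "  "),   # whitespace only, treated as missing
    ],
)

summaries = fetch_summaries(conn)
print(summaries)
```

Opening the file read-only is probably the safer choice, since the database belongs to another application.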

(It uses the same customizable CSS formatting as the Sente 2 DevonThink and Skim to DevonThink scripts).

houthakker · March 23, 2011, 11:07pm

In addition to the abstract and any notes, Ver 0.16 also appends a standard Papers2 citekey, e.g. {Walton:2005uv} to facilitate using the DEVONthink notes when drafting a paper.

heshca · March 30, 2011, 5:03pm

houthakker,

Thanks for the link and your amazing script. The only problem now is figuring out why some of my papers don’t have abstracts showing up, while others do. I’m sure it’s just a problem with the information Papers has stored.

houthakker · March 30, 2011, 8:47pm

Are there abstracts which display in the Papers2 GUI but fail to export through the script, or is it simply that Papers2 doesn’t always capture an abstract from your reference sources?

heshca · March 30, 2011, 11:41pm

The latter.

heshca · April 6, 2011, 1:32am

I’ve noticed now, after looking a bit more closely, that the script is doing something a bit odd. I currently have only 12 papers in my Papers database and the script is importing 18 items, 6 of which have the author as “Anon.”

Any ideas?

houthakker · April 7, 2011, 9:52pm

It’s exporting every record in the Publications table of Papers 2. When these are simply journals, rather than particular articles, for example, Papers creates a record and attributes it to Anon.

If this is distracting, such records could be filtered out - for the moment I have erred on the side of completeness.
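A filter of that kind could be sketched roughly as follows in Python. This is an illustration, not part of the script: it assumes each exported record is a dictionary with an `author` key (a hypothetical shape), and that journal-level records can be recognized by an empty author or one that reads "Anon" or "Anon.".

```python
ANONYMOUS_LABELS = {"", "anon", "anon."}


def drop_anonymous(records):
    """Keep only records whose author is not empty or an 'Anon' placeholder.

    The record shape (dicts with an 'author' key) is assumed for illustration.
    """
    kept = []
    for record in records:
        author = (record.get("author") or "").strip().lower()
        if author not in ANONYMOUS_LABELS:
            kept.append(record)
    return kept


records = [
    {"author": "Walton", "title": "An article"},
    {"author": "Anon.", "title": "A journal"},
    {"author": None, "title": "Another journal"},
]

articles = drop_anonymous(records)
print([r["title"] for r in articles])
```

Filtering on the author label is a blunt test: a genuinely anonymous article would be dropped along with the journal records.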

Bill_DeVille · April 9, 2011, 7:26pm

I’m commenting here on the OP’s assumption that Johnson’s ‘snippets’ approach means Johnson himself would find a database of pre-prepared abstracts of publications satisfactory for his own projects. I’m quite sure that he would not.

Steven Johnson is a prolific writer (a good one), and has often written about how important he finds DEVONthink databases for collecting and accessing information for his book projects. In a recent book, Where Good Ideas Come From: The Natural History of Innovation, Johnson describes his DEVONthink database (probably the one he used when writing that book). It consists of his own writings: chapters, essays, blog posts and notes. It also includes thousands of passages (quotes) transcribed from books or articles or clipped from Web pages.

Those quotes, sections of text from an article or book of up to a few hundred words that describe an idea, place, event or thing, are what Johnson means by his “snippets”. They are much richer than abstracts, and a single article or book might be the source of several snippets in his database.

Johnson uses DEVONthink’s See Also feature to explore the database for connections. For example: “Several years ago, I was working on a book about cholera in London and queried DEVONthink for information about Victorian sewage systems. Because the software had detected that the word ‘waste’ is often used alongside ‘sewage’, it directed me to a quote that explained the way bones evolved in vertebrate bodies: namely, by repurposing the calcium waste products created by the metabolism of cells. …it sent me off on a long and fruitful tangent into the way complex systems—whether cities or bodies—find productive use for the waste they create. That idea became a central organizing theme for one of the chapters in the cholera book.”

Conversely, while writing, Johnson may drop a paragraph into DEVONthink and invoke See Also, to see whether there are interesting connections to other items in the database. (Had he been writing within the database, as I usually do, he could have selected that paragraph and invoked See Related Text, with the same result. I find it a wonderful way to explore ideas, or to break writer’s block.)

Note that Johnson’s database includes longer items, as well as those snippets of up to 500 words. In other writings, he has noted that he uses assistants to help scour the literature for excerpts that go into his database (I don’t have that luxury).

My reference databases differ from Johnson’s in that I prefer including complete articles, reports or even books. I do most of my reading and research within a database, as it provides a rich environment including powerful searching and, of course, See Also. If I wish to see what other documents contain a particular term, Option-click immediately lists them. Or I can select a phrase and do a Lookup search for that phrase.

I don’t find most papers or articles too large to make See Also less functional for me. True, if See Also suggests a large report or a book, I may have to search within it for the relevant section. I mitigate that over time, for interesting topics, by adding a linked rich text note that may include an excerpt (like one of Johnson’s snippets), or summary with comments—and that note might expand to cite similar references in the database, perhaps with page links or Lookup phrases that make finding the important section easier.

Abstracts? I collect very few, usually from journals to which I don’t have full access, or for a topic that’s of only potential interest and thus as a ‘marker’ for a reference that I might want to explore more fully, later. Otherwise, I want the full article in my database, if the topic is one in which I’m interested.

houthakker · April 9, 2011, 10:07pm

The title of this thread may give the wrong impression about the function of the script.

It does, as it happens, include the summary/abstract field among its exports, (in response to a request - and perhaps summaries are indeed more useful in DT than in Papers2), but more importantly, it also exports notes …

(Papers2, unfortunately, does not segment its notes for a reference - there may well be an argument for creating a separate record for each paragraph, though sequence and coherence might be lost).

(But I always enjoy discussions of auctoritas - what would Freud, Marx, Keynes etc have said ? Perennially diverting, for some reason … Experimental method has displaced it in the natural sciences, but fortunately it still seems to find a paleo-botanic niche in the humanities ).

houthakker · April 10, 2011, 7:40am

A thought stimulated by Bill’s helpful reflection is that I could modify the script to:

Export the summary field as a separate record (without the notes section).
Allow for user segmentation of the notes field (lines ending in a colon could be interpreted as new headers).

The modified script could then export the Papers2 notes field to several DT records. The colon-terminated lines could be used as record titles, and the rest as a comment field.

(Each note record would still include the Author/Year/Title and citation tag of the source, as well as the comment, and be placed in the DT folder for that source).
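The colon-header idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script's actual code: the function name, the "Notes" default title for text that precedes the first header, and the choice to skip headers with no text under them are all hypothetical.

```python
def segment_notes(notes, default_title="Notes"):
    """Split a notes field into (title, comment) pairs.

    A line ending in a colon starts a new segment and supplies its title.
    Text before the first header is filed under `default_title`.
    Headers with no text beneath them are skipped.
    """
    segments = []
    title, body = default_title, []

    def flush():
        comment = "\n".join(body).strip()
        if comment:
            segments.append((title, comment))

    for line in notes.splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and len(stripped) > 1:
            flush()                       # close the previous segment
            title = stripped[:-1].strip()
            body = []
        else:
            body.append(line)
    flush()                               # close the final segment
    return segments


example = (
    "Preliminary remark\n"
    "Method:\n"
    "Uses panel data.\n"
    "\n"
    "Critique:\n"
    "Small sample.\n"
    "No controls."
)

for record_title, comment in segment_notes(example):
    print(record_title, "->", repr(comment))
```

Each pair would then become one DT record, with the source's Author/Year/Title and citation tag added by whatever code does the export.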

A modified script could also index any PDF attached to the Papers2 record, in the manner of the Sente to DT script.

[DONE: Segmentation of notes by colon-terminated headers, and indexing in DevonThink of PDFs attached to Papers2 references, have now been added to ver 0.022 of the script].

(I am personally still using Sente, which provides built-in segmentation of notes, with Title, Quotation, Page, and Comments fields - generally a better framework than Papers2 for a database of well-segmented notes. Text entry in the Papers2 note-field can also be puzzlingly slow in the current build …).