Building Library of Journal Articles, Conference Papers

milligan · August 19, 2005, 6:05pm

Would appreciate feedback from experienced users about whether I am approaching my deployment of DT Pro v1. properly. Does the following make sense as a way to organize our information and make it accessible via DT? I suspect it is a fairly common situation.

I am creating a large database of pdf’d articles from 5 annual conferences of the International System Dynamics Society, an assortment of simulation model files, and URL links to articles accessible through the online version of the System Dynamics Review. DEVONagent is working flawlessly to capture web archives of the abstract pages for each article. All the local files (pdfs and models) are copied to our server, where they are accessible via our network.

We will deploy three copies of DT on laptops that connect to the server when the users are in our offices. We want the 3 DT databases to be identical. The pdf documents will not be housed within each database on the laptops, but will reside only on the server. The databases will simply link to the copies on the server, saving hard disk space on the laptops.

The pdf articles came on CDs, each corresponding to the year of the conference, but they deal with a whole range of topics: transportation, energy, health, project management, environmental, systems theory, etc. On the server, the pdf file content from each CD resides in a separate folder, labeled by the year of the conference (i.e., one folder per year). I have imported each folder into a laptop-based DT database. In the database, the files are grouped into a folder whose name is identical to the one on the server. The file path shown in the DT database leads back from the laptop to the original folder on the server.

I really want to be able to access all these files by the subject matter that they relate to, not the year of publication. I could leave them in the groups categorized by year of publication and rely on DT to search by content, but that isn’t generating sufficiently narrowed results. So, I have created topical groups (transportation, energy, fisheries, environment, business, project management, etc.). Originally, I created replicants for every pdf, displayed the first page of each pdf in the “vertical split” view of the database, and determined which topical group the document most related to. I then manually moved one of the replicants for that file into the topical group folder, leaving the second replicant in the original group relating to the year of the conference. I did this so that I could easily see where each paper originated, since this is not included in the pdf document itself.

I now believe that I don’t need to use replicants at all. I should just sort the files into topical groups in the database, leaving them in their original arrangement by year on the server. I could rely on the path to tell me what conference they came from, but it is hard to see in the column view. I have opted for adding the conference year to the Spotlight comments field for all files in a folder, using an Automator “add comments” script, which means that when the files are imported into DT, the Spotlight comments show up in the comments field in the DT database.
So, the DT database will be grouped topically, but the files in the database will link to files on the server that are grouped by Conference year.
First: Does this approach make sense?

Second, if I copy the database on the first laptop and put the copy on a second laptop, will the paths be correct or will I have to amend them all to reflect the fact that the DTP database is on a different computer and possibly in a different location within the HD directories?

Third, is there any way to gain some efficiency in sorting the files into topical groups? By far the most common content element in all the files is “systems”, so DT seems not to know how to distinguish content, which means that the “auto group” command pretty much leaves them all sitting in the same group as they were originally. Interestingly, using only one year of publications as a test (200+ articles), I found in random testing that the “see also” feature worked pretty well at finding related articles. Not sure why that would happen, since I assume the same AI logic is being applied to the content of the files.

Fourth: When DT displays a web archive page, the URL links within the page are active and generally seem to work fine. However, when I attempt to activate a link on a journal publisher web site to download a pdf file of the article that is abstracted on the web archive page, DT crashes. Putting aside the fact that crashing shouldn’t happen, is there something I could do to avoid this problem? I suspect that it might have something to do with the fact that when I access the publisher web site via DT, I have not logged in as a paid subscriber, so do not have permission to download the pdfs. However, in Safari, this would simply lead to a window explaining that I can’t download since I am not a subscriber. In DT, it results in a crash, at least on the site I am dealing with.

howarth · August 19, 2005, 9:06pm

“I really want to be able to access all these files by the subject matter that they relate to, not the year of publication.”

I would suggest using the Comment field to enter keywords. Then search on Comments and you’ll find all the relevant items.

As you import the PDFs, change the item subject line to the date of the paper, written year first: 2005/08/19 for August 19, 2005. All the items will then auto-sort by date (if you use the default sorting style).
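A minimal Python sketch of why the year-first format sorts correctly: with zero-padded year/month/day, plain alphabetical order matches chronological order. The function name `subject_prefix` and the sample dates are made up for illustration; nothing here is part of DT itself.

```python
from datetime import date


def subject_prefix(d: date) -> str:
    """Format a date as YYYY/MM/DD (zero-padded) for use in a subject line."""
    return d.strftime("%Y/%m/%d")


# Hypothetical paper dates, deliberately out of order.
paper_dates = [date(2005, 8, 19), date(2003, 12, 1), date(2004, 2, 28)]
subjects = [subject_prefix(d) for d in paper_dates]

# Sorting the strings as text puts them in chronological order.
for subject in sorted(subjects):
    print(subject)
```

A day-first or month-first format would not have this property, because text sorting compares the leading characters first.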

Beyond these two factors, you don’t really need to sort the items into folders and subfolders. Instead of doing all that prep work, just create a good, consistent list of keywords and share that list with your colleagues.
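One way to keep a shared keyword list consistent is to check each comment against it before the files are imported. The Python sketch below assumes comments hold comma-separated keywords, which is a convention invented here, not something DT requires. The vocabulary reuses the topic names mentioned earlier in the thread, and the function names are hypothetical.

```python
from typing import List, Set

# Hypothetical shared vocabulary, using topics named in the thread.
VOCABULARY: Set[str] = {
    "transportation", "energy", "fisheries",
    "environment", "business", "project management",
}


def parse_keywords(comment: str) -> List[str]:
    """Split a comma-separated comment into lowercase, trimmed keywords."""
    return [part.strip().lower() for part in comment.split(",") if part.strip()]


def unknown_keywords(comment: str, vocabulary: Set[str] = VOCABULARY) -> List[str]:
    """Return keywords in the comment that are not in the shared vocabulary."""
    return [k for k in parse_keywords(comment) if k not in vocabulary]


def has_keyword(comment: str, keyword: str) -> bool:
    """True if the comment contains the keyword (case-insensitive, exact match)."""
    return keyword.strip().lower() in parse_keywords(comment)


print(unknown_keywords("Energy, transport, Fisheries"))
print(has_keyword("Energy, Fisheries", "fisheries"))
```

Catching near-misses such as "transport" versus "transportation" early matters because a later search on the comments will only find the spelling that was actually entered.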

I offer that suggestion based on my own experience of creating a highly structured set of folders and later finding that keywords would have been a simpler way to go.

milligan · August 23, 2005, 2:59am

Thanks, Howarth, for your advice. I am going to re-think the approach I have been taking. Luckily, I am not too far down the road.

cgrunenberg · August 24, 2005, 5:09pm

If the path to the files on the server is identical for all laptops (e.g. /Volumes/theServer/wherever/…), then this should work.
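The identical-path condition can be sanity-checked before copying the database to another laptop. The Python sketch below assumes you already have a list of the linked file paths as text (how to export them from DT is not covered here). The mount point `/Volumes/theServer` is taken from the example above, and the sample file paths are invented.

```python
from pathlib import PurePosixPath
from typing import List


def outside_mount(linked_paths: List[str], mount_point: str) -> List[str]:
    """Return the linked paths that do not live under the shared mount point."""
    mount = PurePosixPath(mount_point)
    return [p for p in linked_paths if mount not in PurePosixPath(p).parents]


# Hypothetical linked paths; only the first is on the shared server volume.
paths = [
    "/Volumes/theServer/conference-2004/paper1.pdf",
    "/Users/someone/Desktop/paper2.pdf",
    "/Volumes/theServerOld/conference-2003/paper3.pdf",
]

for p in outside_mount(paths, "/Volumes/theServer"):
    print("will not resolve on another laptop:", p)
```

Comparing path components rather than string prefixes avoids treating a similarly named volume such as `/Volumes/theServerOld` as if it were the shared one.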

This feature definitely needs some more work and fine-tuning, but it’s just v1.0.

In my experience, the most common reason for this is that an Acrobat Internet plugin is still installed. This plugin is not compatible with the new WebKit of 10.3.9/10.4.x (only with Safari). However, if you’re using 10.4.x, then both DA and DT can display remote PDF documents without an Acrobat plugin, so removing the plugin should fix the issue.

milligan · August 24, 2005, 5:24pm

Thanks, Christian. I will go searching for the plug-in. I recently updated Acrobat, so you are probably correct in your diagnosis of the problem.

Since my original post, I am also re-thinking the approach of keeping our library of articles on the server. Most of my use of the database occurs when I am away from the office, and seeing only a reduced image of the first page of the pdf documents (because the original linked documents are not accessible) is not working too well for me. We are going to try burning the whole library to DVDs for each laptop user. We would probably have to re-import all the documents since the file paths will be different, but if we aren’t going to sort them into topical folders (relying instead on keywords in the comments field), this won’t be much of a problem. It does mean, however, that we should modify the comments for each file in the Finder info field before we burn the copies to DVD, so we don’t have to duplicate the categorization for each DTP database.

Bill_DeVille · August 24, 2005, 7:59pm

milligan:

As you are rethinking the location of the PDF contents anyway, why not consider the approach of importing the PDFs directly into the Files folder of the DT Pro database package? One would do that, of course, after properly setting the DT Pro Preferences options for PDF & PS imports: check the option to import to the db Files folder; check the option to use PDFKit, but DO NOT also check the option to use rich text. Then do File > Import > Files & Folders from the source PDFs (whether from the server, or from a DVD).

You would probably appreciate the faster access to the PDFs (compared to DVD access) and wouldn’t have to remember to carry the DVD about with you.

And it would be a simple matter to update the database periodically as new conference materials are released.

You are only talking about relatively few gigabytes of data. These days, laptop hard drives have gotten pretty large and portable FireWire drives have gotten fairly cheap.

Another option would be a read-only copy of the database (probably based on a DVD), but I don’t recommend that, as you and others will probably want to be able to write in and add to the database.

milligan · August 24, 2005, 9:50pm

Good idea, Bill. Might as well do it that way. My problem is lack of space on my HD. BUT, I think a good part of my problem is really due to poor housekeeping. Lots of duplicates or successive drafts with small revisions. Also, I tend to create copies of storage-intensive products like Keynote presentations (lots of graphics) when I customize them for different clients. Gobbles up space. DTP should actually be able to help me with all this clutter and mess, but my current situation is the result of years of bad habits (good intentions, bad habits), so it will take a while to get it under control, and in the meantime, I have to earn a living. So, doing payable work is a priority. I could, of course, buy a larger HD for my PowerBook, but then I’d just have a bigger closet to throw stuff in!