Changing my way of using DTPO for research

First off, I have been using Devonthink for several years and been enjoying it greatly. I appreciate the versatility and stability of the software.
I am a researcher (life science) running a lab and use DTPO mainly for two tasks:

  • topical databases for things that are central to my research. I just dump everything in there, knowing that this will be the place to look it up be it a pdf research article, a hurriedly scribbled note of an idea I had, etc.
  • I index all my research literature that I keep in my reference management software (currently in Sente and Papers, but I will finally make up my mind once Papers 2 is out).

While I used to work on two different computers (office/home) and synchronized both to a portable Firewire harddrive, I have now put everything on a MBP 15’’ which has become my main computer. It has its disadvantages such as lugging the computer around all the time but at least I have one big worry off my mind (keeping everything in sync). Besides, I am not sure if the Sorter would take kindly to being sync’ed between two computers.

Since upgrading to DTPO 2 I have been asking myself if I make the best use of the software. Most notably, I keep all my files in folders in Finder, only some items, mostly information snippets clipped from webpages, actually reside inside my libraries.

One thing I am particularly unsure about is whether I should depart from the filesystem-based storage concept and declare DTPO my one and only information administration hub. Before you point me elsewhere: Yes, I did read the online help and searched the forums, and I am roughly familiar with the pros and cons of either approach. Feel free to point out particularly relevant points though. But what I don’t know is whether all the information is up to date and how much of a (dis)advantage some things represent in practice. Hence the longish introduction to give you an idea about my work.

In no particular order:

  • sync and portability
    Working on a single portable computer provides some degree of portability of my data. Does it still make sense to strive for importing data or would an indexed approach have its advantages since the portability is already provided by the laptop (even though the database would not be self-contained)?

  • file types: If some filetypes are not understood by DTPO, would moving from a Finder-based approach to a DTPO hub create a bias against finding and using the information contained in those particular file types? Right now I am painfully aware that I need to look outside DTPO but this feeling might vanish.

  • Version 6 of Sente does not store pdfs in the file system space any longer but inside its (cryptically named) subfolder structure hidden inside a UNIX package. But its note-taking abilities are awesome, and the comments can be exported to the file system as text files, complete with quotes, my own comments etc, so I am kind of reluctant to give Sente up for making (not maintaining) my notes about research literature, not even using this technique http://www.devon-technologies.com/scripts/userforum/viewtopic.php?f=2&t=9434.
    One advantage these annotation files have is that they are searchable. Perhaps I’ll find a way to share the pdfs between Sente, Papers and DTPO like in the old days.

  • What is currently (!) the best way to work with documents that reside inside DTPO? I have experimented with the template-based approach described here
    http://www.devon-technologies.com/scripts/userforum/viewtopic.php?f=2&t=9506&p=44140&hilit=template#p44140
    and
    http://www.devon-technologies.com/scripts/userforum/viewtopic.php?f=2&t=7595&p=35731
    Does this still apply and how do you deal with filetypes that you use less often or that DTPO does not understand? Either way there would be no templates to facilitate making DTPO aware of the new file. Or can you automate this?

  • Is importing and killing the duplicate outside DTPO after successful import creating friction or is it still worth it? Again, can this/should this be automated?

  • size.
    DTPO is incredibly fast and stable. But my current databases are already measured in Gigabytes (including their internal backups though). Would I not hit a reasonable size limit if I tried to import everything? Even a product in the top league will have its limits somewhere. I can foresee trying to split my current databases into more numerous, but smaller databases as a consequence of this. Would this entail some consequence elsewhere in my workflow that I may not be aware of? Such as the inability of creating replicants across databases? What else?

  • longevity
    Generally I try to keep things simple because every shiny gadget can break. Since we are talking about possibly decades of work still ahead, even a rare error is a threat. So I am thinking about relying on simple lookup functionality (instead of, or at least in addition to, a link), in which case the fancy search stuff would take place in realtime and the only thing that needs to be stored is a little string of text. Moving or renaming the file that contains the target passage is no problem then. Paranoid? Lately some people lost all their notes due to the fact that Skim (a very nice pdf reader) stored the notes in a place that was unsafe, so in some cases all notes, sometimes representing years of note-taking, were gone (Yes, I know, backups and everything, but you get my point).
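
The lookup idea in the last point can be sketched in a few lines of Python. This is only an illustration of the principle (store a short string, search for it when needed), not DTPO's lookup feature; the function name `lookup`, the folder layout and the list of file suffixes are assumptions made up for the example, and it only handles plain-text files.

```python
from pathlib import Path


def lookup(snippet, root, suffixes=(".txt", ".md")):
    """Return the files under root whose text contains snippet.

    Whitespace and case are normalised so that a re-wrapped passage
    still matches. Only plain-text files are searched (an assumption
    of this sketch).
    """
    needle = " ".join(snippet.split()).lower()
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if needle in " ".join(text.split()).lower():
                hits.append(path)
    return hits


# Hypothetical usage: the stored "link" is just the quoted passage.
# lookup("misfolded protein aggregates", "~/Research/notes")
```

Because the only thing stored is the passage itself, moving or renaming the note does not invalidate the reference; the price is that the search has to be repeated every time.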

Am I trying to over-engineer my work or are some of these points relevant? How did you solve your problems? I am particularly worried that in solving one problem I am creating another (does this work-flow really work? and how do I make sure there is nothing disappearing in the cracks). If the new task is similar in dimension to my old problem then I ought to stick to the file system approach (keeping things simple, remember?)

Any and all comments, particularly real-life experience from similar fields, are most welcome.

Prion

PS: Sorry for the long post.

I can address a couple of your questions, from my own workflow. I use DTPO almost exclusively on my desktop machine as my information hub, and I sync with a MacBook.

Index or import: Of the 8 - 10 databases that I keep open continually, those that are reserved for client projects have their data indexed. More accurately, the data I get from clients is indexed, and my work papers are only within the DT database. I do this solely because of document retention/destruction requirements of the clients. My own databases, research archives, etc., I keep all within DT. My bias is toward importing.

File Types: I use lots of file types and lots of tools. I rarely have an issue using Open With in DT to open/save a file from its native app. There’s always the fallback of using “Reveal in Finder” and then opening the file from its Finder locale. It’s just never a problem.

Annotation: Note taking is the least settled part of my workflow. Since it is a very frequently asked question on these forums, I suspect that most people (except Bill DeVille who is perfection on this topic :confused: ) use an ad hoc variety of approaches. Notating PDFs is the main problem, and it depends on PDFKit’s shortcomings, IMHO. Ideally, I would like there to be an annotation layer that is searchable within DT, but that doesn’t modify the underlying text layer. Recently I’ve found that Curio can do a passable job of that, and I’m using it more frequently for annotation. Except, there’s no way to easily get those notes out of Curio.

Importing and keeping a file outside DTPO: Kill the file outside DTPO. Inside DTPO it’s still a file in its native form, and you can copy, export, or whatever. So why keep another instance?
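
For anyone who wants a safety net before killing the outside file, here is a minimal Python sketch that deletes the original only if an identical copy exists elsewhere. It is not a DEVONthink feature; the function names are made up for the example, and how you locate the imported copy (for instance via an export, or by looking inside the database package) is left open as an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remove_if_duplicated(original, imported_copy):
    """Delete original only when imported_copy is a separate, identical file.

    Returns True if the original was deleted, False otherwise.
    """
    original, imported_copy = Path(original), Path(imported_copy)
    if not imported_copy.is_file():
        return False  # nothing to compare against
    if original.resolve() == imported_copy.resolve():
        return False  # same file: deleting it would destroy the only copy
    if sha256_of(original) != sha256_of(imported_copy):
        return False  # contents differ, keep both
    original.unlink()
    return True
```

Comparing hashes rather than names or sizes means a renamed copy still counts as a duplicate, while a truncated import does not.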

Database size: when they grow big (see other posts here for the recommended limits) just split in some logical manner.

Stability/longevity: Nothing lasts; not DT either (sorry, Eric). Keep your data backed up in as close to native as you can; export notes separately if you can; and hope for the best.

Hi korm

I remember you mentioned elsewhere that you do not use the Sorter (I do). Perhaps when I have settled on something worth being called a workflow I may return to sync’ing the two computers and just carry a 2.5 HDD around, but until then it is probably safer to use the laptop and go from there.

I was more thinking along the lines of what material I should import into DTPO and what to keep outside. I am not talking about stuff that is irrelevant, that is trivial and needs to stay out. But what about relevant stuff that is either very big and/or not understood by DTPO anyway? I am of two minds whether or not to import this kind of relevant material.
In this case there would be just one instance of each file, either inside of DTPO or in the Finder, but two places where I would need to look. Hmm.

On the other hand, I am reluctant to commit all my digital life to DTPO. I have no clear understanding as yet how much this would exclude the use of other tools. It is more a kind of weariness in the presence of “my way or the highway” type of applications, I guess.
What do you do, what do you keep inside/outside?

Annotating pdfs and taking notes on research papers is very important for me, too. Although not settled either, I feel that Sente is very close to my way of doing things on paper, I move to a certain part of the relevant page, highlight or underline a passage and scribble a note to the page margin (or sometimes use another piece of paper which I am then frightened to lose). The Sente notes can be exported as files and the format is almost endlessly configurable. Sente does have its drawbacks, but the notetaking for pdfs is really nice.

Yep, I am aware of that, but that is more a question of how much of a data lock-in a certain app imposes on you. DTPO is fairly friendly here, though, you can always export to the Finder.

But some features, such as the application-specific links to documents, are nice in principle (papers://xyz and the like) and will work for a while. However, when I tested the “rename my pdf collection” feature in Sente, which works very well, the links kept working, but when I sync’ed this to another computer all the links were gone. Dead. Luckily, I was still testing and did not lose any data. This is a different kind of longevity issue. It will probably be unimportant when the lifetime of a project is limited but for a life-long occupation with a subject, it may be something to keep an eye on. Perhaps it would have been a solvable issue but I am drawn towards a simpler technique, DTPO’s lookup feature: http://www.devon-technologies.com/scripts/userforum/viewtopic.php?f=3&t=9360&p=43640&hilit=lookup%2A#p43640

Thank you for your helpful remarks, much appreciated, and do post new ideas to help me form an opinion.
Prion

I look forward to tracking this thread. Great questions and information.

Re: workflow: as an academic myself (Business School), I throw all kinds of news items into DTPO as web archives. In fact, this is probably the most common file type in my more active database (more on database splits and workflows later). I tend to store documents and journal articles (pdfs for the most part) on my hard drive and have DTPO index them for quick retrieval. Like the original poster, I’ve wrestled with whether or not to put EVERYTHING into DTPO and be done with it, but the size of the resulting database and subsequent performance issues, if any, turn me away. My thinking is that, as I say, DTPO indexes everything for me, so I will find something regardless of whether it lives within DTPO or not.

At one point I was using DTPO for notes, but I’ve recently found Notational Velocity to be much better for quick notes. Most of my writing is done either in Bean or TexShop (for LaTeX).

Have to admit, I’ve never fully embraced the multiple database concept. I do have separate databases for teaching, research and personal stuff, but that’s all. I do find that my teaching database is far more likely to have diverse file types. Same for my ‘personal’ database.

Just had an idea. If anyone is interested (say one or two others?), I would be keen to get a few people together on skype and record a podcast on this very topic, if only because it is precisely the kinds of thing I would personally want to listen to. I’ve done a few podcasts before (for www.nzmac.com) so I have the ability to record, edit, etc., but we would need to impose on the folks at DT to host the mp3 file. Anyone?

David

Tracking? Nah: Contributing! :smiley:

I am becoming less sure that the size-related performance difference between indexed and self-contained databases is dramatic. In fact, it may not even exist at all. If I understand it correctly, the size difference is caused mainly by the referenced files themselves, which are contained in their original form within a UNIX package. The performance, on the other hand, should be limited mostly by the complexity of the contents of the database that keeps tracking those files, be they inside or outside the UNIX package. It might be worth checking.

It is at least possible for an application to behave like this. I recently went from an indexed to a self-contained library of digital photos in Aperture 2. The monolithic library is now 110 GB (yes, that’s GIGAbytes) and the performance difference to when the library appeared sleek and lean with all the actual images residing outside is minimal. When you right-click the monolithic package you are allowed to view the content in the Finder (applies to Aperture, DTPO etc. libraries) and it looks like not a lot has changed at all, there is the database and there are the files. It is just less likely that a user or a misguided other application creates a big mess resulting in a disagreement between database and the files referenced therein.
Whether DTPO is in that league I think I (read: someone) should test when I have a moment before discarding imported libraries as an option.

True, but what other consequences does importing vs indexing have?
Some have been mentioned already:

contents somewhat hidden (advantage: database less likely to get out of touch with referenced content; disadvantage: content has to be accessed through DTPO (is that so?))

size may encourage user to split into several databases, but replicants and “see also” feature do not work across databases (is that correct?)

What else belongs here?

I am debating two things mostly:
Dump everything of relevance for a project in or not? If so, can I still access things through the Finder? I probably should not but then, what is the point of doing so? To keep everything in a single place? What is the difference between making the Finder this place then and just index the folder containing the relevant files? Is Synchronizing its contents to the DTPO database enough to keep everything healthy as the project keeps evolving?

The second thing I forgot. I’ll go through a mental reboot and report back in, the post is long enough as it is already.

Prion

Good points. I suppose I’m not quite ready to throw everything at DTPO, and am happy with doing the keyboard shortcut for Synchronise every day or so. This method/approach probably has drawbacks that I’m not quite aware of yet, but it seems to suit my workflow (or is it the other way around!). I suppose it comes down to the fact that the indexing capabilities of DTPO allow for indexing of material outside the database to begin with, thus negating the need to import? That’s how I’ve always looked at it, but then I have to admit I do not use the AI features of DTPO that much which may, or may not, be entirely dependent on a self-contained database.

When I use DTPO to search my indexed folders (i.e., my journal article repository in Finder), I do indeed use DTPO to display and do any further searches within the document I’ve pulled up. I suppose you could ‘reveal in finder’ but I’ve never felt the need. DTPO manages it well enough for my needs.

David

@ Prion re Import/Index differences and issues:

Size: In DEVONthink 2 there’s little if any effective difference in size between an Imported or Indexed database.

True, the database package size of an Imported database will be larger than the database package size of an Indexed database.

But that’s not relevant to memory usage and performance.

In the DEVONthink 1.x applications, some types of files were stored in the monolithic database, which had to be loaded into memory. Such files included text files, RTF/RTFD files, HTML, and WebArchive files. Therefore, an Imported database which held such filetypes had a larger memory footprint than did an Indexed database holding the same content. An Imported database could take longer to load into memory when opened, and especially if RAM resources were low, performance/responsiveness of the Imported database compared to an Indexed database with the same content could suffer.

The DEVONthink 2 database structure now stores all filetypes in the Finder, so the differences in memory footprint resulting from capture mode are no longer significant. Likewise, there’s really no difference in disk storage requirements, assuming in the case of an Imported database that the files copied into the database are subsequently deleted (which is what I do).

Portability: Because Import-captured databases are self-contained, the advantage goes to this method of capture if one wishes to be able to easily move databases among computers, or to transport or even run databases on external media. Because information will be lost if the Paths of Index-captured files are broken, moving the database also requires moving the externally linked files in such a way as to retain valid Paths.
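
To make the path problem concrete, here is a small Python sketch that checks a list of indexed file paths after a move and shows how a path would have to be rebased onto a new root. It does not talk to DEVONthink at all; the list of paths is assumed to come from somewhere else (for example, a hand-made export), and both function names are hypothetical.

```python
from pathlib import Path


def find_broken_paths(paths):
    """Return the paths that no longer point to an existing file or folder."""
    return [p for p in paths if not Path(p).expanduser().exists()]


def rebase(path, old_root, new_root):
    """Map a path under old_root to the same relative location under new_root.

    Raises ValueError if path is not located under old_root.
    """
    return Path(new_root) / Path(path).relative_to(old_root)


# Hypothetical usage after moving ~/Articles to an external drive:
# broken = find_broken_paths(indexed_paths)
# fixed = [rebase(p, "/Users/me/Articles", "/Volumes/Ext/Articles") for p in broken]
```

The sketch only shows why the external files must keep the same relative layout when they move: if the structure below the root changes, no simple rebasing will restore the links.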

I frequently do move my databases, so I prefer Import-captured files. This also simplifies backup. If I backup a database (e.g., using File > Export > Database Archive), I’ve also backed up all the referenced content as well; the compressed archive holds the complete content of the database. That would not be true for such a backup of an Index-captured database.

Organizational flexibility: I like total flexibility to organize and reorganize the group structure of my databases. Here, the advantage goes to Import-captured content. That’s because if I wish to use Index-capture and maintain synchronization of my database content to external files and folders, I need to keep the structure created by the initial Index capture of the Finder folders and files.

Considerations favoring the Index-capture mode:

Need to stay in synch with a structured source of new and/or modified data: Suppose it’s important to keep updated to a source of files such as a server to which new content is systematically added, and/or on which the files are sometimes modified (e.g., standard operating procedure updates). Here the advantage goes to the Index capture mode. Invoking File > Synchronize on the top-level groups on the server will update the database content. One could attach the Synchronize script to group(s) in the database, so that when such a group is opened it automatically synchronizes to the current content of the corresponding folder on the server.

Need to share files with another database or application: Example: if I used a citation manager database to manage PDF files in its database, I would likely Index-capture those databases rather than Import them into a DEVONthink database. Now I can use the information content of those PDFs in my database without in any way affecting the operation of the citation manager, and if new items are added by the citation manager application, I can synchronize them into my database. Note: I would expect that some day in the future it will be easier to share content within a DEVONthink database with external applications. For example, use of URL/UUID file information allows access to a file contained in a DEVONthink database by some applications.

Two other points:

Spotlight compatibility: One has the option to provide Spotlight index metadata so that Spotlight searches can find content stored within a DEVONthink database.

A future release will enable cross-database use of Classify and See Also. This should add confidence for creation of multiple topical-oriented (or historical) databases.

YAPBDP (Yet Another Prolific Bill DeVille Post). :mrgreen:

Still can’t convince you to post these gems somewhere they’re easier for everyone to find/reference later? Lots of good stuff only buried deeply in forum threads. :frowning:

Calloo Callay!

Perhaps you missed: this. :slight_smile:

Hi Prion

I am a new user of DevonThink, and came across your post trying to answer some questions of my own.

Regarding Sente 6, it only uses a package and cryptically named files if you are set up to sync between computers. If you turn sync off, you can name the files intelligently.

Discussions on the Sente forum point out that indexing a “sync-ready” folder is not a good idea because the files come and go, are duplicated, renamed and deleted, according to Sente’s needs. However, if you don’t sync, then you control the horizontal and vertical, and indexing should work.

There are long discussions on the forum about how to convert your folder–some could do it, others had problems.

In the Sente 6 pref’s, you can change it to keep files in place. I successfully linked DT indexed files to the Sente database. Now to figure out a good way to keep the PDF notes in sync.

@alanterra & dizziness

many thanks for your suggestions. I don’t think it is Sente’s sync’ed libraries (which I don’t use) that is creating the cryptically named pdfs, rather it is a consequence of letting Sente keep full control over the attached pdfs by letting them reside inside the library.
I have opted not to keep my pdf files outside the Sente library bundle because I work at two different computers as the need arises. I have experimented with many methods that would keep the links intact on both machines but over time each and every one of them failed under certain circumstances, forcing me to re-connect the pdfs (the automatic link repair in Sente did not always work as expected).
Keeping everything (references and pdf files) inside the Sente bundle keeps everything under Sente’s control and so far it looks like a smooth ride, I can easily move the library to another computer and everything Sente cares about is in one place. The downside is that Devonthink does not see those files normally.

However, as noticed here http://www.devon-technologies.com/scripts/userforum/viewtopic.php?f=4&t=10360, creating a symlink to the attachment folder inside the Sente library bundle and placing it outside this bundle lets Devonthink index the pdfs and also later on synchronize newly added files.
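
For the record, the symlink step can be sketched in Python as below. The folder names in the usage comment are placeholders I made up; I am not claiming this is how the attachment folder inside a Sente library bundle is actually named, so check the bundle contents in the Finder first.

```python
import os
from pathlib import Path


def link_attachments(attachment_dir, link_path):
    """Create a symlink at link_path that points to attachment_dir.

    Returns the link path. If the link already exists and points to the
    same folder, nothing is changed.
    """
    target = Path(attachment_dir).expanduser().resolve()
    link = Path(link_path).expanduser()
    if not target.is_dir():
        raise FileNotFoundError(f"no such folder: {target}")
    if link.is_symlink():
        if link.resolve() == target:
            return link  # already linked to the right place
        raise FileExistsError(f"{link} already links somewhere else")
    if link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(target, link, target_is_directory=True)
    return link


# Hypothetical usage (both paths are placeholders):
# link_attachments("~/Documents/MyLibrary.sente6lib/Contents/Attachments",
#                  "~/Documents/SentePDFs")
```

The link then sits outside the bundle and is the folder you would point the Index command at; the pdfs themselves stay where Sente put them.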

Hope this helps
Prion

@all

I have moved all research related material into four topical databases, I think it is the only way to find out if this will pay off.

1: Contrary to the indexed approach the monolithic DT databases of the imported type are self-contained and easily portable to a new computer. Good. I will have to see if I miss any of the tools I would normally use in the Finder.
2: Most often I use the three-pane view because it gives me A) the Finder-like context, B) the number and name of other files in the same group and C) the content of the selected document.
I already used this view before but I was surprised to realize how much of an impact this has on my workflow.

At the moment I am reviewing a manuscript and I have the file containing the body of the text, another file with the figures, yet another with the supplementary material, then a file outlining the reviewing guidelines of the journal and another file in which I type my comments, all sitting inside the same group. I can move very quickly between them using just the keyboard and still see a representation of the content which is actually usable (unlike the coverflow icons in the Finder which I never liked) with access to highlighting and annotation features. Whatever I can see, I can start working with right away without having to open it first.
A small thing perhaps but something I haven’t been able to achieve quite the same way using the Finder and Quicklook. None of the things under point 2 actually require an imported library but perhaps I am more in a discovery-ish mood because I committed myself fully and made Devonthink a more central place in my workflow.

Bonus: Because I have my Sente pdfs indexed in the same database, I can at the touch of a button search for similar articles.
The paper I am reviewing is actually rather good, the South African red wine that sits beside me is even better. Bliss.

Prion,

I have followed your and the other entries about Sente and Indexing vs. Importing, thanks a lot for finally telling about the workflow you came up with.

But the last paragraph makes me wonder: I thought you came up with importing every PDF, but then you say you index your Sente-PDFs? Did I get anything wrong?

Thanks for clarifying, and some more nice evenings with a good glass of wine,

Maria

Maria
yes, I import all research-related pdfs, but I import them into Sente, not Devonthink, and index Sente’s pdf folder in DTPO. The details are in my first post from March 04 above.
However, one of the recent updates of Sente broke that system somewhat because Sente will not let me choose the naming pattern of the pdfs inside the Sente library. Word is that this is making a re-appearance in a future version, though.

Much as I like the interplay between Sente and DTPO, I am going through an ever increasing imbalance between the joy of handling the tools to carry out my research and my research as such. I am spending less time mucking around writing Applescripts if something is not working to my liking, not because they will not work but because it is beyond my control how long it will be until one of the programs I depend on will break this script.
The fewer programs I depend on, and the more they stay out of my way and require a minimum of maintenance, the better. I now have more time for reading other people’s papers and writing my own; administration is already requiring too much attention.
I don’t mean to discourage, some day Sente may restore the functionality and I will return to indexing its pdfs in DTPO on a real-time basis, for the time being I update the index manually every now and then and put up with the fact that the pdf title is totally garbled.
Still I want to get that paper out by the end of this week and I will not let my computer be the reason I won’t :wink:
Prion

Prion,

Thanks for your reply. It helped, but I particularly liked this statement:

Besides the worries about the stability of a system in the future, this is what I am mostly concerned with as well. It is one reason that I even abandoned DT for some years – I only worked with the file system and simple notes in BBEdit. So now I am trying a second time to make maximum use of technology without losing time on administration (after the time I have to spend on setting up the system).

Good luck for your paper, mine is due the weekend after this week…

Maria

Prion,

I don’t know if you’re still monitoring this thread, but a sentence in your original post caught my attention:

It finally came out a couple of weeks ago. Which reference management program do you now favor? Has anyone else tried it?

Rick

For those who wholly or partially Index content to their database but also wish to mirror new content in their Indexed database groups back to the Finder, DEVONthink version 2.0.9 will allow you to move such content from the database to the corresponding external folder, then synchronize the folder back to the database.

For example, you may wish to share your OCRed PDFs with another database, which can then be Indexed into your DT Pro Office database.

But OCR by DT Pro Office results in Import of the searchable PDFs. Previously, in order to achieve your wish, you would have to export the PDFs to the external folder, delete the originals in the database and Index or Synchronize the external folder to the database group in which those PDFs are to be displayed. That’s a lot of work.

Now, there’s a command in version 2.0.9 that can save you almost all the work. Move the PDFs into a previously Indexed group, invoke the command and Synchronize the group. Your OCRed PDFs are now stored in the desired Indexed external folder, and displayed in the desired Indexed group in the database. They are now available for sharing, e.g., with your citation manager database.

See the Release Notes and user documentation for version 2.0.9.

Of course, there are other tools in DEVONthink that will let you change Indexed content to Imported content. It’s up to you – and you can change your mind later! :slight_smile:

Bill, I am rethinking my approach to DTP and I found your post quite helpful. However, it’s two years old, so I thought I’d check. Currently, I have multiple databases, several of them in the several-GB range. They are indexed because I thought that would reduce RAM issues. That seems not to be the case. If so, I will move back to an “imported” approach. What is the best way to do so while retaining the A.I. information?

Thanks
Paul