Using DT as a reference database – best practices

Hi!

I’m a journalist in need of a reference database. I need somewhere to place ideas, random notes, highlights from articles I’ve read, web pages, PDF documents, images, etc. Anything I need right now or might need in a year or two.

I’ve tried DT a couple of times before, but it never stuck with me. With the new sync I’ve decided to give DT a serious chance. But before I dive in and try to set things up myself, I would appreciate any input on how to have DT store what I need.

What I need to store

  • Plain text notes I create myself.
  • Plain text notes that I create using this IFTTT recipe (https://ifttt.com/recipes/455089-instapaper-highlights-to-dropbox), which takes highlights I make in Instapaper and stores them as individual plain text notes in my Dropbox account.
  • Academic papers in PDF format (which I want to both read and annotate)
  • Offline copies of web pages
  • Images

Devices in use
I want to use my reference database on OS X, iPad and iPhone.

Questions

  • To me, all of the information I want to store is treated the same: it’s stuff that I need in my work. So it seems natural to have just one database. But is there a reason to split things into more than one?
  • Indexing vs importing – when is either the better choice?
  • If a folder is indexed, I can still access the files outside of DT, which means that they are available for backup as individual files. If I import stuff into the database, am I running the risk of the whole database becoming corrupted and losing everything?
  • One of the reasons I like plain text files and PDFs is that there is minimal lock-in. Having everything in those formats today is the reason I can consider trying DT in the first place. If I import files into the DT database, am I losing this possibility for the future, or can files easily be exported in their native file format to a directory structure on my hard drive?
  • Can DT monitor certain folders (like the Dropbox folder with plain text highlights from Instapaper) and import every new file that goes into it?
  • Dropbox seems like the best sync store option for me. A lot of my plain text files are already on Dropbox. Can DTTG access them directly, or should I still create a sync store on Dropbox? From what I’ve read in the documentation, I should create a sync store, in part because of sync performance and stability. If I go with indexing, can the contents of the plain text files still be accessed in DTTG?
  • Can DT monitor RSS feeds and create offline copies of every URL in the feed? I’m collecting bookmarks on Pinboard and would like to have them archived in Devonthink as well.
  • Can DT import bookmarks from text files in Netscape bookmarks file format?
  • Does any DT edition do OCR on JPEGs?
  • Which format do you recommend for web pages: web archives or PDFs? Currently I’m archiving them as web archives in EagleFiler.

A lot of questions. Any input much appreciated!

Thanks,
Anders

I know it’s annoying to suggest “browse the forum”, but in the case of DEVONthink I think that’s an essential part of learning the product. There are years of advice from some very informed and adept users, and that advice is invaluable. Yes, phpBB is ugly forum software to use and makes searching difficult, but time and patience will pay off.

This is a personal choice, which I make based on categorization: for example, personal info vs. work info, or one client vs. another. A second factor is size. I usually have one database per client, and over time I split off data that has aged out and is no longer relevant to the current work into an archival database for that client’s work. It doesn’t matter how many databases you have.

The eternal question. Bottom line: it usually doesn’t matter as far as desktop use is concerned, though with the current operation of DEVONthink To Go I personally prefer importing, because for now I’m not comfortable trusting the process of syncing indexed files. (I love my data and don’t trust it to a process I cannot monitor.) Please search for “index vs. importing” in the forum using its “Advanced Search” feature and read everything Greg Jones (@greg_jones), Christopher Mayo (@FROBGOBLIN) and Bill DeVille (@bill_deville) have written on the topic. Pay particular attention to the discussion of how to handle moving indexed files. The current methods can seem inelegant because in some situations they are cumbersome.

I like Christopher’s article here: christopher-mayo.com/?p=2376

Imported documents are stored internally in a folder structure, and the documents themselves are stored intact and unmodified. Databases are what Apple calls “packages”, a type of folder. You should back up databases and treat them like any other file from a backup and security perspective. You can export some or all of your database, using the DEVONthink client, at any time. If you export a group hierarchy containing documents, you will get a folder hierarchy, viewable in Finder, that mirrors the groups and contains the documents.

Your documents are not modified. See the last question.

Yes. I index a number of Dropbox folders, refresh the index in DEVONthink and voilà: my changes are in the database.

DTTG cannot access Dropbox files directly (i.e., it doesn’t act like Ulysses or similar apps). What you said is correct: “create a sync store…” etc.

Yes, DEVONthink can monitor RSS feeds. (Not DTTG, though.)

Your best bet, I think, is to export your bookmarks from Netscape, import them into Safari and then import the Safari bookmarks into DEVONthink. You can use the Clip to DEVONthink browser extension, too. I’ll admit I’m not a Netscape user, though.

DEVONthink Pro Office. As you know, the result of .jpeg OCR depends on the quality of the .jpeg. If you take an image of a skewed document you might get odd results, for example.

It doesn’t really matter, unless you like to refresh archives occasionally; in that case I would use web archives. You can capture a web archive and then, from inside the database, have DEVONthink make a PDF from it. You cannot make a web archive from a PDF, however.

Your databases can also be searchable in Spotlight if you enable the “Create Spotlight Index” option in the Database Properties for that database. A Spotlight search will find your documents; it doesn’t, however, report the database or group where a document is located.

I suggest also looking into DEVONsphere Express (a menu bar search tool that can also locate database data) and DEVONagent (for web research and import into DEVONthink).

Thanks for the shout-out about my indexing post. Some of the particulars in the blog post are a little outdated (evidence that DT is constantly improving the product), but I think it gets across the main points about how I use indexing. In general, I prefer indexing for the flexibility, and don’t really see many downsides to it as long as you are good about making backups in case you run into strange behavior (usually user error).

I’m in the same profession, same requirements. Some quick observations:

  • For random text notes (including the same IFTTT recipe you mention) I have a folder in Dropbox called “MemoryTXT” that I index in DT’s Global Inbox. This lets me generate notes from different places in iOS but have them all end up in one place in DT. Some are tapped out in Drafts, some are clipped from iOS Safari using Workflow and then processed through Drafts to MemoryTXT, and some land there via Instapaper and IFTTT.

Where possible, clips are automatically renamed with a year-month-day stamp for easy chronological sorting.

It would be far better if DEVONthink To Go had all the clipping options of the desktop version, but it doesn’t, and this works quite well.

  • The new DEVONthink To Go has consolidated DEVONthink’s role as my repository of all knowledge. With it, DEVONthink is the best reference database system out there. I can’t think of a better ecosystem for journalists, not least because you have the option of keeping data in your own garden, without the need to transact through another company’s servers.
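The year-month-day renaming mentioned above is easy to reproduce outside Drafts or Workflow if you want the same naming for notes that arrive some other way. Below is a minimal Python sketch, assuming the notes are `.txt` files sitting in one local folder (for example, a synced copy of MemoryTXT) and that a file’s modification date is a good enough stand-in for when the note was made. The function names are hypothetical, not part of any DEVONthink or Dropbox tool.

```python
from __future__ import annotations

import re
from datetime import date, datetime
from pathlib import Path

# Matches names that already start with a "YYYY-MM-DD " stamp, so files are never stamped twice.
_STAMP = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def stamped_name(name: str, day: date) -> str:
    """Return `name` prefixed with an ISO 8601 date, e.g. '2016-05-03 note.txt'."""
    if _STAMP.match(name):
        return name
    return f"{day.isoformat()} {name}"


def stamp_folder(folder: Path) -> list[Path]:
    """Rename every .txt file in `folder` using its modification date (an assumption)."""
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(folder.glob("*.txt")):
        day = datetime.fromtimestamp(path.stat().st_mtime).date()
        new_name = stamped_name(path.name, day)
        if new_name == path.name:
            continue  # already stamped
        target = path.with_name(new_name)
        if not target.exists():  # never overwrite an existing note
            path.rename(target)
            renamed.append(target)
    return renamed
```

Putting the date first in `YYYY-MM-DD` form is what makes this work: a plain alphabetical sort by name, which is what DEVONthink, Finder or any file list will do anyway, is then also a chronological sort.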

Not annoying at all. I’m browsing along, but these were specific questions that had come up. So I really appreciate that you took the time to answer them!

What I don’t get about syncing indexed files, and this might be what you are referring to, is how they are synced at all: on the desktop, DT has pointers to the folder where the files are, but the files are not imported into the database. Does that mean I can’t edit the indexed files on iOS?

Also, when reading about indexing vs importing, I increasingly get the impression that it’s not an either/or choice: I can index some folders and import others? Pure reference material, like PDFs and web archives, could be imported because I will never edit those files, while my plain text files, which are more like “living documents” that I might want to edit in other apps, could be indexed instead. (As long as I can still edit the indexed files on iOS.)

Would a database with internal problems fail completely all at once, or could some parts of it be OK while others are not? What I’m wondering is whether the parts of the database I use daily could be fine while there is some kind of error in a rarely used part, which would mean I might not notice it for a long time. That in turn would make a restore hard, because a lot of new information would have been added between the last good backup and now.

I wasn’t clear: my question was not about automatic indexing of Dropbox folders, but about automatic importing from a Dropbox folder. Could DT automatically import all new text files created by Instapaper/IFTTT, and delete the files from Dropbox once the import is done and confirmed OK?
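Whatever tool ends up doing the import, the safe pattern being asked about is “copy, verify, and only then delete”. The Python below is a minimal sketch of that pattern, not a DEVONthink feature: it copies new `.txt` files from one folder to another, compares SHA-256 hashes of the original and the copy, and deletes the original only when they match. Both folder paths are assumptions; `dest_dir` stands in for wherever you want the files to land, and this sketch does not assume DEVONthink picks them up from there on its own.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file's bytes so a copy can be compared with its original."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def move_verified(source_dir: Path, dest_dir: Path, pattern: str = "*.txt") -> list[Path]:
    """Copy matching files to dest_dir; delete each original only after its copy verifies."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in sorted(source_dir.glob(pattern)):
        dst = dest_dir / src.name
        if dst.exists():
            continue  # name clash: leave both files alone for a human to sort out
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        if sha256_of(src) == sha256_of(dst):
            src.unlink()  # the copy is byte-for-byte identical, so the original can go
            moved.append(dst)
        else:
            dst.unlink()  # bad copy: discard it and keep the original
    return moved
```

Hashing both files rather than just checking that the copy exists is what makes “confirmed OK” mean something: a truncated or half-synced copy is caught and the original stays in the Dropbox folder.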

I’m not a Netscape user either. But Pinboard can export bookmarks as a text file in “Netscape bookmarks format”. EagleFiler can read that file, import the bookmarked URLs, and add the tags and annotations I’ve made on Pinboard (both of which are included in the file).
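A “Netscape bookmarks format” file is really a loose HTML file: each bookmark is an `<A HREF=…>` link inside `<DT>`, optionally followed by a `<DD>` with a description. As a rough sketch of what a tool like EagleFiler has to pull out of it, the Python below extracts each URL, title, tag list and note using only the standard library. It assumes tags arrive as a comma-separated `TAGS` attribute on the link; check that against your own Pinboard export before relying on it. It does not talk to DEVONthink; what you do with the results is up to you.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Bookmark:
    url: str
    title: str = ""
    tags: list[str] = field(default_factory=list)
    note: str = ""


class NetscapeBookmarkParser(HTMLParser):
    """Collects <A HREF> links and their <DD> descriptions (tag names arrive lowercased)."""

    def __init__(self) -> None:
        super().__init__()
        self.bookmarks: list[Bookmark] = []
        self._in_title = False
        self._in_note = False

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            # Assumption: tags are stored as TAGS="tag1,tag2".
            raw_tags = attr_map.get("tags") or ""
            tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
            self.bookmarks.append(Bookmark(url=attr_map["href"], tags=tags))
            self._in_title = True
            self._in_note = False
        elif tag == "dd" and self.bookmarks:
            self._in_note = True  # description for the most recent bookmark
        elif tag in ("dt", "dl"):
            self._in_note = False

    def handle_endtag(self, tag: str) -> None:
        if tag == "a":
            self._in_title = False
        elif tag == "dl":
            self._in_note = False

    def handle_data(self, data: str) -> None:
        if self._in_title:
            self.bookmarks[-1].title += data
        elif self._in_note:
            self.bookmarks[-1].note += data


def parse_bookmarks(html_text: str) -> list[Bookmark]:
    """Parse an exported bookmarks file into Bookmark records."""
    parser = NetscapeBookmarkParser()
    parser.feed(html_text)
    parser.close()
    for b in parser.bookmarks:
        b.title = b.title.strip()
        b.note = b.note.strip()
    return parser.bookmarks
```

`HTMLParser` is used instead of an XML parser because these files are not well-formed: `<DT>` and `<DD>` are never closed, so a strict parser would reject them.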

So MemoryTXT is a unified destination folder for a lot of different sources on iOS? Do you leave the text files in the Global Inbox, or do you file them into the correct folder later?

Exactly my thoughts!

Yes, MemoryTXT is a sort of Global Inbox in Dropbox. You could have any number of Dropbox inboxes, but I’m too lazy to set them up.

Also, because I use MemoryTXT mostly for random notes, occasional flashes of genius, etc., notes tend to stay in there. I look through them occasionally, or they bob up in DT’s AI searches.

Other reference materials (PDFs, web archives, bookmarks) go to DT’s Global Inbox and get sorted into appropriate folders from there.

Really, it’s a blank slate, and your usage will determine how things get set up. My only advice is to build in minimal overhead. Invariably, if a system requires lots of filing or tagging or tricky title taxonomy, I end up neglecting it and it descends into a mess. But that’s me; you might be a person of more orderly habits.

(I should add that none of the ideas I’ve mentioned are original. I can’t remember where I cribbed them from, possibly these very forums, but there are a lot of people wiser than me who devise workflow systems for sanity and fun. I just steal their ideas and implement them imperfectly for my own use.)