webpage → Readability → PDF → DEVONthink?

My research workflow involves:

  1. finding articles on the web
  2. converting them to a nice clean view using Readability.com’s Chrome extension
  3. using Chrome’s “Save to PDF” function to save the cleaned-up article as a PDF to my Desktop
  4. importing the resulting PDF into DEVONthink.

This seems dreadfully tedious. Is there a more automated way I can do this?

Yeah, use the Clip to DEVONthink extension for Chrome to send the PDF directly to a database. Just tried it in Safari and it works a treat! 8^)

Sorry, but I laughed out loud when I read your “dreadfully tedious” comment about your workflow.

As a certified old geezer, I’ll tell you a variant of the yarn about how I had to walk five miles barefoot through the snow to school, uphill all the way – and uphill all the way back home after school.

Back in the 1960s I worked with Lynton K. Caldwell to put together a three-volume bibliography titled Science, Technology and Public Policy, funded by the National Science Foundation. It included some 3,000 references.

Shocking though it may seem, DEVONthink didn’t exist back then, much less personal computers, the World Wide Web, and digital books and journals.

I spent a lot of time in the library at Indiana University, going through card catalogs, then down in the stacks browsing books and journals.

When I found a book or article worth including in the project, there was no command to invoke to copy it or make a summary of it. Instead, I used 4 x 5 inch cards to record information, including complete citation information and notes about the content – all handwritten.

Data storage was in shoeboxes. Groups consisted of stacks of cards that held topical similarities, each stack held together by a rubber band. As I didn’t have the benefit of searches, Classify or See Also, I created those stacks by flipping through the cards and sorting them into stacks based on my interpretation of content relevance and importance. Slips of paper inserted under the rubber bands provided information about each stack’s topic and the fit of that stack in the bibliography’s organizational structure (also handwritten).

Sorting was by no means a one-time procedure. The initial sorts were later subdivided into smaller topical groups. Once in a while, as the work progressed, the topical groupings changed, which led to a lot of reshuffling.

Finally, together with handwritten material describing each of the topical areas in the bibliography volumes, the stacks of cards were delivered to typists to produce camera-ready copy.

That, my young friend, was a project in which the term “dreadfully tedious” had real significance. You have no idea how easy you have it. :smiling_imp:

That story may give you an appreciation of how much I love DEVONthink for managing my collections of references and notes.

My own workflow for capturing Web content differs from yours. I find content I want to capture from searches (DEVONagent Pro) or from browsing journals, news and governmental sources.

When I find something worth keeping, I select the portion of the Web page that contains the desired content and invoke Command-) to capture it as a rich text note. As I’ve set Preferences > Import - Destination to ‘Select group’, I can immediately file the addition in any group in any open database. (But don’t try this in Chrome or Firefox, as they don’t properly handle Services.)

I prefer rich text capture to PDF capture, as rich text is more editable and it’s easier to move into Pages by copy/paste if I want to capture a quotation. I almost never capture a complete Web page, as I want to avoid capturing extraneous content in order to improve the efficiency of searches and the AI assistants. Rich text also takes less file storage space than a PDF capture.

Love your story, Bill. Oh, how I remember those days, and they were not that long ago, really. I got my first computer (a very modern, cutting-edge Apple LC II) only in late 1994, I think it was; before that, research was very much like your story. Thanks for sharing this :smiley:

Ha! Awesome tale, Bill. I agree with Allsop - thanks for sharing. 8^)

With Readability-rendered pages, the DEVONthink extension for Safari does not behave like the Print to DEVONthink service.

Whatever style you choose in Readability, the background will always be at least a bit darker than white, and that background ends up in the PDF generated by the extension. When you use Print to DEVONthink, however, you get black on pure white, which is a good thing if you actually want to print the PDF.

This is, by the way, not a Readability-only problem: a lot of web pages have a distinct style sheet for print that renders black on white no matter how the page looks in the browser. I would prefer the extension (which is great and very handy) to use the print style sheet too. In many cases these styles not only have printer-friendly color schemes but also hide menus, ads, and other non-content elements.

Another problem with Readability and DEVONthink is that when you import a Readability-rendered page (whether by the extension or the Print to DEVONthink service), the URL field in DEVONthink holds the Readability URL rather than the original one.

I know that Readability adds the original URL to the text of the PDF, but it would be awesome if DEVONthink had an import filter for Readability and similar services that detects the original URL at the very bottom of the PDF and writes it into the URL field in the info pane.
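
To make the idea concrete, here is a rough Python sketch of the detection half of such a filter. It is not anything DEVONthink ships; the function name `find_original_url` and the `wrapper_host` default are made up for illustration. It assumes the PDF's text has already been extracted into a string, and that the original address is the last URL in that text that does not point at the Readability service. Writing the result into DEVONthink's URL field is left out.

```python
import re
from urllib.parse import urlsplit

# Matches http(s) URLs up to the next whitespace, quote, or bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'()\[\]]+")


def find_original_url(pdf_text, wrapper_host="readability.com"):
    """Return the last URL in pdf_text that is not on wrapper_host, or None.

    Assumption: the clean-up service prints the original address near the
    bottom of the page, so the last non-wrapper URL is the best guess.
    """
    # Collect every URL, trimming punctuation that often trails a link in prose.
    candidates = [m.group(0).rstrip(".,;:") for m in URL_PATTERN.finditer(pdf_text)]

    # Walk backwards so that URLs closest to the end of the text win.
    for url in reversed(candidates):
        try:
            host = (urlsplit(url).hostname or "").lower()
        except ValueError:
            continue  # malformed URL, skip it
        if not host:
            continue
        if host == wrapper_host or host.endswith("." + wrapper_host):
            continue  # this is the wrapper service's own link
        return url
    return None


if __name__ == "__main__":
    sample = (
        "Article body...\n"
        "Saved from https://www.readability.com/articles/abc123\n"
        "Original URL: http://example.org/story.html.\n"
    )
    print(find_original_url(sample))
```

The same function would cover "similar services" by passing a different `wrapper_host`. If a service prints its own link after the original one, the backwards walk still skips it.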