Bulk convert imported Pinboard bookmarks to Markdown

My knowledge database is primarily based on plain text (specifically Markdown formatted files). I work this way because it keeps the majority of my database lean, and I have no problem taking an index-only approach to my data - that way I can use my Markdown editors of choice across both desktop and mobile to get data into my DT database. I also have a huge PDF library hosted on OneDrive (ever since MS provided unlimited storage as part of their Office 365 package), which is also indexed by DT and updates automatically and seamlessly each time I relaunch DT.

I have approximately 5K Markdown notes that live in an nvALT folder, and each note takes up a minimal amount of space as all of my image/rich resources reside on a separate web server (or, increasingly, are direct links to OneNote or Flickr based images). This workflow works well for me as the project documentation I produce (based on my knowledge database) is created via Brett Terpstra’s Marked app (and sometimes via Ulysses on my iOS device), which effectively converts an HTML representation of my documentation to a CSS styled PDF/DOCX on the fly.

But the major problem I’m having at the moment is that the only way to get my (20K plus) Pinboard bookmark data into DT is to import the bookmarks and then face the ‘Hobson’s Choice’ of converting to HTML, webarchive, rich note or PDF, which has a huge overhead as each bookmark ends up being on average 300KB - 3MB (depending on the format choice). HTML is the most efficient but it is still a poor choice because the majority of the data is taken up by markup. That same Pinboard data lives in HistoryHound (St Clair Software) and the index size for all 20K plus bookmarks is less than 100MB (it only indexes the text content of each web page). On that basis I made the decision to use HistoryHound over DT as my tool for searching my Pinboard bookmarks, as HH also has DT-like sophisticated search logic that helps me locate relevant content. The main thing that I’m missing using this approach is the smart concordance based search features of DT - primarily the ‘see also’ feature, which has consistently provided me with wonderfully unexpected links in my data over the years. I’ve yet to find another desktop tool that can do the same thing. None of this would be necessary if the full text search capabilities of Pinboard (part of Pinboard’s premium archival service) functioned well, but alas they don’t.
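To put rough numbers on that overhead, here is a back-of-the-envelope sketch using only the figures quoted above (20K bookmarks, 300KB to 3MB each, a sub-100MB text index). It assumes decimal units, and the helper name `total_gb` is made up for illustration.

```python
# Rough storage estimate for the figures quoted above (decimal units assumed).
BOOKMARKS = 20_000


def total_gb(per_item_kb: float, count: int = BOOKMARKS) -> float:
    """Total size in GB for `count` items of `per_item_kb` kilobytes each."""
    return per_item_kb * count / 1_000_000


low_gb = total_gb(300)      # 300 KB per converted bookmark
high_gb = total_gb(3_000)   # 3 MB per converted bookmark
text_index_kb = 100_000 / BOOKMARKS  # a 100 MB index spread over every bookmark

print(f"Rich formats: roughly {low_gb:.0f} to {high_gb:.0f} GB in total")
print(f"Text-only index: under {text_index_kb:.0f} KB per bookmark")
```

In other words, the rich formats land somewhere in the 6 to 60 GB range for the whole archive, while a text-only index works out at under 5 KB per bookmark.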

The DT developers already provide a bookmarklet for converting individual web pages to Markdown (via Brett Terpstra’s Markdownifier web service) and then automatically adding the converted page to DT. My wish is that this type of conversion tool could be part of the main DT program for bulk converting existing bookmarks to Markdown formatted files. At the very least an AppleScript could/should be provided to handle this requirement.

An alternative thought is that maybe DT’s developers could strike up a strategic partnership with St Clair - developers of HistoryHound so that DT can make use of HH’s index.

I’m aware that Devonthink wasn’t originally developed for plain text workflows and the developers believe it functions best as a monolithic database that captures and holds all your data but in this multi platform, multi device, cloud driven world, many people have moved away from this way of working. Overall I’d like to see the DT developers optimise their product for both plain text workflows and index only approaches. The eternal wait for a useful Devonthink To Go product line would surely benefit from this too. :wink:


Did you have a look at this thread? Web Clipping from ANYWHERE workflow

Thanks for this Christian but it’s not really what I’m looking for (that’s not to say it won’t come in useful). I have no problem clipping content to my nvALT database as Markdown via desktop or iOS application (there’s a great ‘Workflow App’ script for iOS). The problem I have is bulk converting my 20K plus archive of bookmarks that reside in Pinboard to Markdown.

I use my nvALT database and Pinboard very differently. Pinboard is just a collection of anything I find interesting or relevant to aspects of my work. My nvALT database (indexed by DT) is a collection of data that I’m using (or have used) for specific projects. I then use DT’s ‘Replicant’ feature to group together nvALT docs for specific projects as part of the planning process.

This is a great system for my needs but I lose out on much of Devonthink’s power because there is no easy way to convert my 20K plus archive of Pinboard bookmarks to Markdown content (much of this archive originated as Delicious bookmarks but is cleaned up on a regular basis for 404’s using Spillo).

My interim solution is to import my bookmarks into HistoryHound so that I can search my archive using boolean logic. This is pretty useful but not as useful as the more sophisticated search capabilities within DT.

It’s for this reason I posted this message as a feature request and not something specific to scripting, although I’m sure an AppleScript solution exists. DT has a web browser built in, and my thinking is that the existing web clipper bookmarklet could be adapted to work on bookmarks within the DT database (in bulk, as DT’s other conversion processes do).

Perform the same function as:

heckyesmarkdown.com/

Which enables you to take the content of a web page and convert it to a Markdown formatted document. The existing Devonthink web clipper/bookmarklet uses the API of this service (in the background) to perform its magic (if you choose Markdown as the desired format).

This is great for individual pages but it doesn’t enable you to bulk convert a bunch of bookmarks. And I have over 20,000 that need to be converted.

The existing DT Pinboard script downloads the bookmarks and their associated tags but you can’t index bookmarks, so those bookmarks need to be converted into indexable data. Markdown is by far the most efficient form of data (if you want access to the rich content) but it isn’t one of DT’s built in conversions (those being HTML, Webarchive, Rich Text, PDF etc). It would seem logical that an application that already facilitates conversion of such a rich variety of formats would include native Markdown conversion too.
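For anyone starting from a bookmark export rather than DT’s import script, a minimal sketch of the first step (pulling out the URL, title and tags for each bookmark) might look like the following. It assumes a Pinboard-style JSON export where each entry has `href`, `description` and a space-separated `tags` string; check those field names against your own export. `Bookmark` and `parse_pinboard_export` are names invented here for illustration.

```python
import json
from typing import Iterator, List, NamedTuple


class Bookmark(NamedTuple):
    url: str
    title: str
    tags: List[str]


def parse_pinboard_export(raw_json: str) -> Iterator[Bookmark]:
    """Yield bookmarks from a Pinboard-style JSON export.

    Assumed layout: a list of objects with "href", "description" (the title)
    and "tags" (a space-separated string). Adjust if your export differs.
    """
    for entry in json.loads(raw_json):
        url = entry.get("href", "").strip()
        if not url:
            continue  # nothing to convert without a URL
        title = entry.get("description", "").strip()
        tags = entry.get("tags", "").split()
        yield Bookmark(url, title, tags)


# Tiny inline sample standing in for a real export file.
sample = json.dumps([
    {"href": "https://example.com/a", "description": "Example A", "tags": "design research"},
    {"href": "", "description": "entry with no URL", "tags": ""},
])
bookmarks = list(parse_pinboard_export(sample))
print(bookmarks)
```

Keeping the tags alongside each URL means they can be written back into the converted file later, so nothing captured by the import is lost.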

I’ve been a long term user of Devonthink and find it essential to my workflow but I sometimes feel that the developers of DT see Markdown as nothing more than a hipster fad. You see much the same attitude in the Scrivener forums - another application that I see as an essential part of my arsenal, not that it’s so essential that I can’t see how Ulysses has usurped it in many ways over the last 12 months or so.

Not having a dig at anyone here, especially not you Korm (I’ve always made a point of reading your considered posts in this forum). But Christian’s suggested solution indicated to me that he hadn’t properly read my request. Not a good sign considering he’s one of the lead developers.

I’m probably feeling a tad touchy having read through the appalling tone some senior members of this forum (many of whom are on the DT staff) took towards a poster in this recent thread - New Version

I’m not suggesting he was without fault but I thought his treatment was very harsh.

Correction, I have one single need - the conversion of 20,000 plus bookmarks imported to my DT database to Markdown. I have no problem converting web page content to markdown on an ad hoc basis. I also keep my nvALT folder in Dropbox and index it with both DT on my laptop and Desktop so I can avoid having to use the DT sync engine.

I’ve discussed the problem directly with Brett Terpstra (I’m part of the Marked Beta team) and we both felt the task was best resolved from within Devonthink. Brett suggests that it will be a simple enough task to achieve for somebody with appropriate AppleScript skills via the heckyesmarkdown.com/ API.

The core of my suggestion is that seeing as Markdown documents are a standard part of the DT spec, DT should be providing a conversion tool in much the same way as they do for other document formats that are part of the DT spec - WebArchive, HTML, Rich Text, PDF etc.

I did of course search far and wide for an alternative solution and came across this Python based solution - larryhynes.net/2015/03/local-ful … earch.html - which I will trial and report back on the results. However I still believe DT should provide a native solution seeing as they already facilitate the import of the bookmarks from Pinboard via an AppleScript in the first place.

For anybody else reading this thread, Korm and I have taken this conversation offline to explore a favourable solution.

Please note that DEVONtechnologies does not write the conversion tools or “native solutions” for document formats such as WebArchive, HTML, rich text, PDF, etc. For all of those listed filetypes, DEVONthink uses code available in OS X including Apple’s WebKit, rich and plain text, and PDFKit, as well as Apple’s Quick Look technology. The ability of DEVONthink to index text content and display documents in many other filetypes is via Spotlight plugins and Quick Look rendering usually supplied by the developers of filetypes. This approach enables DEVONtechnologies to focus its development resources on integration, organization and manipulation of the information content of a variety of document filetypes in the case of DEVONthink. DEVONtechnologies is a small company and doesn’t have the resources to independently develop code for the variety of document filetypes that can be accommodated in databases, nor would that be a productive utilization of resources.

In each generation of DEVONthink there have been a number of maintenance releases that adapt to updates and upgrades of OS X, add new features, respond to user suggestions and fix bugs. This takes a lot of development resources, at no charge to registered users. The developers are also extending their vision of DEVONthink towards a future generation, planning for more power, more features and improved simplicity and usability (we think that will respond to many user comments about UI and appearance – although functionality will always be paramount).

From time to time over DEVONthink’s history new document filetypes have emerged. Some take hold and become important, and others don’t. Markdown is an example. In response to user requests new filename extensions have been recognized, such as .markdown and .md. Display has become possible via third-party code (not developed by DEVONtechnologies). A menu command, Data > New > Markdown Text (with appropriate filetype extension) has been added to DEVONthink. As DEVONthink Pro and Pro Office have large scripting dictionaries (and allow execution of other scripts also), automation of procedures and extensions of features become possible. There’s a lot of activity in the Scripting area of the user forum. A number of scripts are available for installation in the Extras area under Help > Support Assistant. There are many outside sources of scripts that may be useful to users of DEVONthink Pro and Pro Office.

Christian Grunenberg is DEVONtechnologies’ chief developer. Although he sometimes suggests script approaches and corrections or additions to them, he has a lot on his plate.

The script you discovered may do the job. As it doesn’t include extensive error management (timeouts, etc.) I wouldn’t suggest you run it on the entire block of items to be converted but at least in the beginning on smaller blocks of them. Please do report on your experience with the script. It may be useful to others as well.
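As an illustration of that advice (small blocks, with failures recorded rather than halting the run), here is a minimal sketch. It is not the script under discussion: `chunks`, `convert_block` and `fake_convert` are hypothetical names, and the stand-in converter simply simulates a timeout so the example runs without any network access.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Split a list into consecutive blocks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def convert_block(
    urls: List[str], convert: Callable[[str], str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Convert one block, collecting failures instead of stopping the run."""
    converted: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for url in urls:
        try:
            converted[url] = convert(url)
        except Exception as exc:  # timeouts, dead links, parse errors...
            failed[url] = str(exc)
    return converted, failed


def fake_convert(url: str) -> str:
    """Stand-in for a real converter (web service or local library)."""
    if "slow" in url:
        raise TimeoutError("timed out")
    return f"# Converted from {url}\n"


urls = [f"https://example.com/{name}" for name in ("a", "b", "slow", "c", "d")]
all_converted: Dict[str, str] = {}
all_failed: Dict[str, str] = {}
for block in chunks(urls, size=2):
    done, failed = convert_block(block, fake_convert)
    all_converted.update(done)
    all_failed.update(failed)

print(f"{len(all_converted)} converted, {len(all_failed)} failed: {all_failed}")
```

The failed URLs can then be retried on their own once the rest of the archive has been processed, which is much easier than restarting a 20K item run from scratch.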

Bill – with respect – there are already cases where DEVONthink is converting HTML to Markdown using the heckyesmarkdown API mentioned by the OP. The Clip to DEVONthink tool does this. DEVONthink also already supports importing from Pinboard. I believe the OP is merely suggesting extending the existing solutions which are provided by DEVONtech today for individual web pages to a solution that can be run in batch against multiple Pinboard links. It’s feasible.

This might not meet the needs of a large portion of the user base, but it’s a reasonable and thoughtful suggestion that at least should go into the maybe/someday category.

Thanks for your detailed response Bill. It’s good to get some background on the way you use existing OS X technologies for certain tasks and how you ascertain the importance of emerging technologies/file formats and how they will integrate in future iterations of Devonthink.

I’m already aware of how to display markdown content within DT and make use of the hidden Terminal initiated preference to make ‘Best Alternative’ the default view for my Markdown documents. Although I’m not the most active member of this forum I’ve been a Devonthink customer since March 2007 and have developed a workflow that’s apt for my specific needs. I make a lot of use of cloud services and use my iOS devices as productivity tools (not just for consumption) so in recent years have moved to an index only approach to my DT data. I know this isn’t the way DT was designed (it certainly isn’t the way I used it back in 2007) but cloud/mobile is mission critical for me so my workflow has developed to accommodate this over the years. I’ve explored most of the AppleScripts available on the DT website and through the support assistant too. So I think of myself as a relatively well informed user of Devonthink.

The script that Christian linked to would not work for me for the following reasons:

  • The IFTTT rules rely on the Pinboard bookmarks being public. Mine are not public; Pinboard is mainly used for private collections (hence it being advertised as the ‘anti-social bookmarking service’).

  • Even if I decided to make them public on a temporary basis, the process of adding the ‘2md’ tag to 20K plus bookmarks would be arduous to say the least. Doable via scripting Spillo but hardly straightforward.

I’m going to contact the author of the Python script tomorrow to see if it could be tweaked to use Brett Terpstra’s superior web service before committing to a test and with luck Brett will help out too if he has the time. I’m hoping there may even be a way of combining the approach of Brett’s original Ruby script with the Python script approach for a more solid solution.

Rest assured, I’ll report back my findings either way.

This is exactly what I was suggesting if it’s at all possible.

Thanks for making sense of my request Korm and putting it into Devon-speak! :slight_smile:

I agree that the OP has a legitimate wish to convert thousands of documents to result in Markdown-formatted text documents. I agree that such an ability might be useful to other users of Markdown and DEVONthink. I agree that it might be useful to users of Markdown, whether or not they use DEVONthink.

If I appeared to resist anything, it was the implication that lack of that existing ability in DEVONthink was a failure to meet the requirements of a specification (explicit or implied) of the application. While Christian and others in DEVONtechnologies often do offer advice or even scripts in response to user requests, we don’t have the resources to do that in every case. Usually, to develop and test a response that takes more than a few minutes of a developer’s time isn’t justified by a single request, unless it appears likely to be useful to a significant number of users.

The community of Markdown users appears to be growing. If that’s the case, I would think that approaches to bulk conversion of document filetypes such as those bookmarks to Markdown documents would be of generic interest in that community, and would lead to such developments independently of DEVONthink. If not, why not?

It’s also important to point out that many people initially thought the popularity of Markdown was something intrinsically linked to iOS/Android devices’ initial weakness with rich text (an argument I’ve witnessed often on the Scrivener forum). But over time Markdown has picked up in popularity because of its data portability/efficiency benefits (as well as its core focus of simple semantic markup that helps provide focus and separates content from layout/design).

As a customer of DEVONthink I shouldn’t have to be aware of which technologies are provided by the core DEVONthink engine and which are part of WebKit, PDFKit or other extended parts of the OS X engine. The only data format that exists under the ‘Data’ menu in DEVONthink that doesn’t have inclusive conversion capabilities is Markdown. I accept that Markdown wasn’t originally a native data format when DEVONthink 2 was launched but now that it’s possible to create a Markdown document through the ‘Data’ menu I don’t think it’s unreasonable to expect that DEVON Technologies provide a conversion tool (that functions on bulk data) in just the same manner as they do for all the other data formats in the ‘Data’ menu.

Markdown is growing in popularity so any conversion tools that bring parity to the options available to DEVONthink customers through the ‘Data’ menu will be available to all DEVONthink users. If I felt this was a single use case I would of course look to hack together an AppleScript with my limited scripting skills (I’m a ‘Design Strategist’ by trade not a programmer and use DEVONthink to manage my research).

As Korm highlighted, and as I had already mentioned, it appears that much of this scripting challenge has already been addressed by the DEVONthink Web Clipper (and associated bookmarklets), and this is another reason why I had hoped that the DEVONthink team could adapt this functionality for bulk conversions.

I’m aware that many of the scripts offered by DEVON Technologies were originally developed by the community (e.g. I originally downloaded the Pinboard import script via the kind user that posted it in this forum many years ago). But now that Markdown is a standard data format that can be created as well as read through the DEVONthink ‘Data’ menu, I stand by my original request that this is something that DEVON Technologies should be providing for its customers. I bring this up now as I’d like to ensure that it’s part of the functional specification consideration for DEVONthink 3 as much as a requirement for the current iteration of DEVONthink.

Listening to Gabe Weatherhead on this week’s Mac Power Users (the show was dedicated to DEVONthink) I was reminded of the strength of the community here on the DEVONthink forum and in particular the collaborative nature of the DEVONthink customer service function. I’m not sure that the closing comment of “If not, why not?” with regard to the Markdown community being responsible for finding a solution here is reflective of this collaborative culture.

Indeed, I think it’s likely that conversion to/from Markdown formatted documents will be available within DEVONthink’s Data > Convert menu options.

The question is more “when” than “whether”. DEVONtechnologies does want to support its user community’s preferences in document filetypes. A number of DEVONthink users do create and work with Markdown documents, and DEVONthink has adapted to recognition of the filetype extension.

Markdown is a relatively recent set of conventions for writing HTML in plain text, introduced by John Gruber in 2004. Those conventions can be interpreted (usually by a Perl script) to result in display of a document to include formatted text, links, images, lists, page layout, etc.

There is not a definitive set of those conventions at this time. There are “flavors” (variants) of Markdown, and some in the Markdown community describe Gruber as having abandoned the effort to define and evolve Markdown. There are ongoing efforts to standardize Markdown, and a draft of standards approaches is scheduled to be submitted to the Internet Engineering Steering Group this month (April, 2015), with request to the user community to review and comment on the proposals.

Which is to say, a conversion routine for Markdown may not produce the desired results for all users of Markdown. That’s one of the reasons I asked why the Markdown community had not yet, apparently, created an “unmessy” approach to bulk conversion of documents such as your large collection of bookmarks. There may be a least common denominator approach that’s possible, but it may not satisfy all users of Markdown. By contrast, the contextual conversion options in DEVONthink’s Data > Convert options depend on use of standards applicable to Apple’s plain and rich text, HTML (and its variants, WebArchive and Formatted Note) and PDF. The filetype of selected document(s) is recognized and (if possible) the conversion options are displayed.

Given the existing set of priorities and heavy workloads of DEVONtechnologies’ developers, how soon should a set of conversion routines for Markdown documents be provided, if they must be developed within DEVONthink? Should they be based on Gruber’s 2004 syntax and markup conventions, or on Markdown standards proposed to or adopted by IESG?

I think the best approach would be for DEVON Technologies to continue using the resource the Markdown community provides via Brett Terpstra’s excellent API (as you currently do via the DEVONthink Web Clipper).

This way you allow Brett and the community to keep up with the specifics of which flavour of Markdown the conversion uses - most people I know favour the canonical ‘Semantic Markdown’ approach as detailed here: nikcodes.com/2013/08/20/semantic-markdown/

All that’s needed right now to facilitate bulk conversion of the Webloc files within a DEVONthink database, is a way of triggering the DEVONthink Web Clipper from within DEVONthink rather than an external web browser.

Looking through the DEVONthink AppleScript dictionary there appears to be a ‘do javascript’ command within the DEVONthink suite so maybe a script could be developed utilising this.

It’s also worth mentioning that you already have a ‘create Markdown from’ command in the DEVONthink suite, which I’ve managed to test (via a basic single action script, not a looping version for bulk import), but this doesn’t offer the choice of creating the Markdown via the Readability engine (as offered through the DEVONthink Web Clipper), so the results are a tad messy and not very readable.

Hopefully this provides further clarity as to my request Bill. DEVONthink is already using Brett’s API to convert external web pages to readable Markdown. It doesn’t seem such a huge leap to me for DEVON Technologies to use this same API to allow bulk conversion of Webloc files that already exist within the DEVONthink database.

I should add that using Brett Terpstra’s API via heckyesmarkdown.com/ carries a risk, as it relies upon Brett kindly making this API available and maintaining it. But in the unlikely situation that he stops supporting this service, the DEVONthink Web Clipper will break too.

It’s great to hear that conversion to/from Markdown will appear within the Data > Convert menu at some point. I fully understand the issues regarding standards and the complexities that this adds but considering Brett is in the process of commercialising his nvALT application, I think there’s very little risk of him abandoning his conversion API as it’s a central aspect of the nvALT product offering.

I’ve just run some tests on the Python script I mentioned in an earlier post - larryhynes.net/2015/03/local-ful … earch.html - and it works as described by its creator (with the caveats he describes), although you need to ensure you have both the BeautifulSoup and HTML2md packages installed within Python for the script to function.

However it’s definitely ‘rough and ready’ and doesn’t produce the same quality of markdown as that produced by Brett Terpstra’s elegant API. On the plus side it’s very speedy (as long as the source web sites serve their pages in a speedy fashion) and it certainly provides the most simple way of capturing indexable data from a bulk export from Pinboard (with a tiny storage footprint within your database too seeing as it’s plain text).

I’d definitely class it as an interim solution but a very workable one. The biggest problem with this script as I see it is that you lose the source link as part of the conversion. The messy Markdown I can live with, as it’s the textual content I’m most interested in. And the Markdown it produces is still leagues ahead of that produced via DEVONthink’s AppleScript suite function.
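The lost source link could in principle be patched in after conversion. As a rough sketch (this is not part of the script being tested, and `with_source_link` is a name invented here), each converted document could be given a header line that links back to the original bookmark:

```python
def with_source_link(markdown: str, url: str, title: str = "") -> str:
    """Prepend a 'Source' line linking back to the original bookmark.

    Falls back to showing the bare URL when no title is available.
    Titles containing square brackets are not escaped in this sketch.
    """
    label = title.strip() or url
    header = f"Source: [{label}]({url})"
    return f"{header}\n\n{markdown.lstrip()}"


converted = "\n# Some article\n\nBody text of the page.\n"
print(with_source_link(converted, "https://example.com/article", "Some article"))
```

Because the link is plain text at the top of the file, it gets indexed along with everything else and remains clickable in any Markdown viewer.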

Sadly seeing as the script relies on the HTML2md package to function there’s no way to alter it so that it can make use of Brett’s API. But the approach the script uses is interesting none the less.

The attached zip shows the same bookmark converted via the DevonThink Web Clipper (that uses Brett Terpstra’s superior API) and via the Python script (the Python version uses the .Markdown file extension and the web clipper version uses the .md extension).

As you’ll see the Python version has significant syntax errors and the Instapaper engine doesn’t create Markdown image links as well as the Readability API version with the source website (The Guardian). Instapaper isn’t as good at parsing modern responsive design websites as Readability.

The Python version is usable but it does have significant errors.
MD Conversions.zip (16.4 KB)

I’ve come up with a couple of native solutions for this challenge that work within DEVONthink which I’ve detailed here:

The Python script makes too many syntax errors in the Markdown for my liking and the Instapaper API (core to the script) is nowhere near as good as the Readability API (which the DEVONthink Web Clipper uses).

I hope you’ve seen Obsidian by now, right?