Linking to web pages of imported web sites


I have downloaded (imported) a few websites using the built-in importer in DTPO. What disconcerts me is that the links are not translated to “local” links (paths), and therefore I am not able to navigate the website when my machine is offline.

May I therefore ask two questions:

  1. Is it possible to import the site in such a way that all links will point to the locally downloaded pages? (i.e. using the ‘paths’ rather than the URLs)

  2. Is there a way to capture the local “path” of a link rather than its URL?

Thank you.

DEVONthink Pro’s download manager doesn’t support this; it’s optimized for downloading to a database and doesn’t modify the downloaded files. But there are lots of third-party tools that do this, e.g. SiteSucker, Blue Crab (Lite), HyperImage, CocoaWget, Web Dumper, SimpleWget, WebCopier.

Thanks, Christian.

With regard to point 1 of my original post:
Actually, it seems that the DT downloader does indeed modify the URLs so that the site is coherent and no links are broken when it is read offline. The problem I reported was with a particular site, which perhaps used some tricks and was recalcitrant to offline downloaders in general. So please consider this issue addressed.

With regard to point 2:
It would be great if DT provided a facility (ideally via a contextual menu—perhaps a service?) which, given a particular link on a downloaded web page, would return the local file path corresponding to the link. Or, at least, a way to “reveal” the “position”/“path”, in the DT database, of the downloaded web page that I am currently reading. Perhaps this facility is already in place and I have missed it? The only workaround I can think of is to “search” for the web page that I am viewing, right-click on the appropriate search result and select “reveal”. This is rather counterintuitive and cumbersome, however.

There is a simple reason for insisting on this: I need to enrich some of my text notes with offline-viewable links to particular web pages of my downloaded web sites.

Thank you.

The downloaded files are never modified, but their original URLs are also stored. DEVONthink Pro can therefore access the required files, independent of their location in the group hierarchy and independent of the path.

Linking to paths inside the database packages isn’t recommended, but you can of course insert item links, either via Edit > Copy Item Link or via the contextual menu of rich texts (“(Insert) Link To > …”).

OK, thanks, it now makes sense. Allow me to say that the downloader is kinda half-baked without the ability to modify the URLs so that they point to the locally stored files. Your point about wget etc. being good alternatives is well-taken, but there is little point in downloading an entire website for offline viewing without being able to also navigate it offline. I am not saying that there is “no point” but there is certainly “little point”. In fact, I wonder whether it’s worth the resources to develop and maintain your own in-house downloader at all, since it is relatively deficient and offers no advantages (as far as I can tell) compared to a solution like wget, other than being integrated in the DEVONthink GUI.

At least for my workflow, all in all, the best solution seems to be to download web sites with wget and then import the entire downloaded tree into the DT database.

Or am I missing something again?
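
For anyone trying the wget route, here is a minimal sketch of how such a mirroring command might be assembled from Python. The function name, the example URL and the destination folder are made up for illustration; the flags are standard GNU Wget options, but which ones you actually need depends on the site. The script only builds and prints the command; it does not run wget.

```python
import shlex


def build_wget_mirror_command(url, dest_dir):
    """Return an argument list for mirroring a site for offline reading.

    Hypothetical helper: url and dest_dir are whatever you choose.
    """
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links to point at the local copies
        "--page-requisites",   # also fetch images, CSS and scripts
        "--adjust-extension",  # save pages with matching file extensions
        "--no-parent",         # never ascend above the starting directory
        "--directory-prefix", dest_dir,
        url,
    ]


if __name__ == "__main__":
    # Example values only; replace with the site and folder you want.
    cmd = build_wget_mirror_command("https://example.com/docs/", "mirror")
    print(shlex.join(cmd))
```

The resulting folder could then be imported into the database as a whole, as described above.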

It’s possible, see above: “DEVONthink Pro can access the required files, independent of their location in the group hierarchy and independent of the path.”

My apologies, Christian. I only now realized that the URLs in the page sources are relative and therefore work both online and offline without modification. :blush:

And similarly, I only now realized that Edit > Copy Item Link does the trick with regard to my point #2.

So all is clear now.
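
For later readers, the point about relative URLs can be illustrated with a short sketch using Python's standard `urllib.parse.urljoin`. The base addresses and the link below are invented examples, not taken from any real site or from DEVONthink itself. The same relative link resolves against whichever base the page was loaded from, while an absolute link keeps pointing at the online copy.

```python
from urllib.parse import urljoin

# Hypothetical locations of the same page, online and in a local mirror.
online_base = "https://example.com/manual/index.html"
offline_base = "file:///Users/me/mirror/manual/index.html"

# A relative link as it might appear in the page source.
relative_href = "chapter2/intro.html"

# An absolute link to the same target.
absolute_href = "https://example.com/manual/chapter2/intro.html"

# The relative link follows the base it is resolved against.
online_target = urljoin(online_base, relative_href)
offline_target = urljoin(offline_base, relative_href)

# The absolute link ignores the base and still points online.
absolute_target = urljoin(offline_base, absolute_href)

if __name__ == "__main__":
    print(online_target)
    print(offline_target)
    print(absolute_target)
```

This is why a site whose pages use relative links can be browsed offline without any rewriting, whereas a site that uses absolute links needs a tool that converts them.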