I don’t know whether it could work, but it seems clear that it should work:
From help:
All links within the site are modified so that they point to the downloaded images or other embedded objects. This ensures that the page/site can be displayed at any time.
The whole paragraph:
Website: Opens the Download Manager and downloads a complete web page/site for archiving and offline viewing. Make sure the download options are set correctly, especially the options that define which links DEVONthink should follow (if any). All links within the site are modified so that they point to the downloaded images or other embedded objects. This ensures that the page/site can be displayed at any time. By default, groups created by the Download Manager are excluded from tagging.
Of course the browser will report an error. However, my understanding based on the documentation and some forum posts from @cgrunenberg is that DEVONthink should use the imported documents when following a link from an imported file. So when I open index.html in DEVONthink and click the link to all.html, it should load the all.html that it has already imported, rather than making a request to the server.
I understand where you’re coming from, I just believe that you aren’t fully aware of the import site / download manager feature.
If I’ve misunderstood how the feature is supposed to work, then fine - but I’d like to hear that directly from @cgrunenberg / @BLUEFROG. They both claim that it works as I expect it to, it just appears to not be working for me or others.
I watched the Navigation bar as I hovered over links. It was pointing to my localhost. After shutting off the SimpleHTTPServer, navigation is no longer possible.
The links aren’t relative; they still point to a server.
I wasn’t until I tried it out. I have to admit that I find the GUI irritating and the results not consistent: importing to the filesystem seemed to work ok whereas importing into the database did not give the results promised by the documentation.
DEVONthink doesn’t change links while downloading (and never did), it only looks for items in your database(s) having the full absolute URL (e.g. after resolving relative links) while browsing and should use them if found. Please send me the database and I’ll have a look at it.
The URLs are fine. The question at hand is this part of the behavior:
So you open the all.html page which has a link to subdir/page.html. When you click that link, DT should “resolve the relative link” to be “the full absolute URL” http://pat.local/dt-html-import-site/subdir/page.html, and since there’s a document that was downloaded that has that exact URL, use that item.
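The resolution step described above can be sketched with Python’s standard library. This is purely illustrative (the URLs are the ones from this thread; the actual lookup happens inside DEVONthink):

```python
from urllib.parse import urljoin

# Base URL of the page currently being viewed (example from this thread).
base = "http://pat.local/dt-html-import-site/all.html"

# The relative link found in the page's HTML.
relative = "subdir/page.html"

# Resolving the relative link against the base yields the full absolute URL
# that DT is described as searching for among the downloaded items.
absolute = urljoin(base, relative)
print(absolute)  # http://pat.local/dt-html-import-site/subdir/page.html
```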
I seem to be missing something here. Until now, I thought that you wanted to view downloaded HTML pages while you’re offline (cf. your first post here). Now you say that you want DT to “resolve the relative link” (i.e. subdir/page.html) to the absolute URL (i.e. http://host/something/subdir/page.html).
But this is, as far as I can tell, the exact behaviour DT shows now. With the obvious caveat that it can’t “resolve” this relative link when the relevant server is offline (either because it is turned off or because you have no connection).
I just tried it with a website here (mind you: one hosted outside of my local network – maybe that’s relevant?) and it works exactly as expected: if WLAN is off, DT displays the HTML pages as they were downloaded. So that seems to work as described in the documentation, at least in this case. Note: since @pete31 saw the same behaviour as @padillac, also with a local web server, maybe it is related to the fact that the URLs’ domain is .local? At least in @pete31’s example, I’m certain that the loopback interface will be used. Depending on the network layer, someone/something might take a shortcut there and handle server/connection detection differently than for a “real” interface?
I apologize for shouting, @BLUEFROG. @BLUEFROG and I see no problem, but @pete31 and @padillac do, because @BLUEFROG and I apparently tried external websites (i.e. not hosted by a server running on the current machine).
I just tried to replicate @padillac’s setup with a local Apache and saw the exact same behaviour: DT displays the website ok as long as the server is running. If the server is turned off, DT complains about it not being available. I think that this is inconsistent: it should not matter on which server the site is hosted.
So maybe @cgrunenberg could have a look into the code and figure out how the local interface (lo) and the other(s) (en0 etc.) are handled differently?
Indeed. As described multiple times previously, the process is:
Use the Download Manager to import a site
Open one of the imported items
Click a link, and have DT open the imported document that corresponds to that link, rather than requesting it from the web server.
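The steps above can be sketched as a small lookup routine. Everything here is an assumption for illustration only (the item store, its contents, and the function are hypothetical, not DEVONthink’s actual implementation); it just shows the expected “prefer the imported copy” behaviour:

```python
from urllib.parse import urljoin

# Hypothetical map of absolute URL -> imported document, as the Download
# Manager might record it (example URLs from this thread).
imported = {
    "http://pat.local/dt-html-import-site/index.html": "<index.html contents>",
    "http://pat.local/dt-html-import-site/all.html": "<all.html contents>",
}

def follow_link(current_url, href):
    """Resolve a clicked link and prefer the imported copy if one exists."""
    absolute = urljoin(current_url, href)
    if absolute in imported:
        return ("local", imported[absolute])   # no network request needed
    return ("network", absolute)               # fall back to the web server

source, _ = follow_link("http://pat.local/dt-html-import-site/index.html", "all.html")
print(source)  # local
```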
Does it work when you click a link on one of the pages? That’s the question of this thread. The individual pages all work fine. It’s when clicking on a link from one page to another that DEVONthink does not load the imported document.
Interesting observation, and one that I found plausible. However, I removed the pat.local entry from /etc/hosts and restarted the machine, and DT still tries to connect to pat.local (which now just times out because the domain doesn’t exist).
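To double-check that a hosts entry is really gone, name resolution can be tested directly. A minimal sketch using the standard library (the hostname is the one from this thread; note that mDNS may still answer for the .local domain on macOS):

```python
import socket

def resolves(hostname):
    """Return True if the hostname resolves to an IP address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

# After removing the /etc/hosts entry, pat.local should no longer resolve
# (unless something like Bonjour/mDNS still answers for .local names).
print(resolves("pat.local"))
```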
I will try it with an external site at some point though, to see if it behaves differently. fwiw, @BLUEFROG confirmed that he saw the same issue that I reported.
Ok, I read, and reread this entire thread. The solution seems to be pat.chicken
Seriously, I want to do the same as padillac wanted to do:
Import a website
Navigate it in DT3 offline (either with no network connection, or with the website down)
I imported a website, subdirectory (complete), Files [all options selected], follow links in subdirectories
When I selected the main page [html] and turned off wifi, I had the same issue, with the page showing “The internet connection appears to be offline”
I am using DEVONthink 3 on my iMac. I do not know what he meant by changing from pat.local to pat.chicken - and I do not think that applies to me, though it could
If (and that is actually a very small if) the website uses JavaScript to load parts of the page, this behaviour is expected and completely normal. One of the consequences of Web 2.0, I’d say.
@pete31 shared an example of the Apple developer documentation recently. It consists of just a bare HTML scaffolding, and every single part is filled in at “run time” (aka when the page is opened) by JavaScript: the documentation is retrieved from a server, which obviously will not work when your machine is offline.
So “importing a website for offline use” will probably not do what one would naively expect in many cases nowadays.