When importing website trees, the index files seem to be missed

halloleo · August 15, 2023, 8:21am

I sometimes import documentation websites (e.g. the Flet documentation) to have them available locally when I am offline in the middle of nowhere.

I use the Import → Website functionality with the following settings:

Subdirectory (Complete)
Download To Folder

However, when I look at the resulting tree, everything seems to get downloaded except the index.html files, such as https://flet.dev/docs/controls/index.html.

Why’s that? And is there a way to download them as well?

chrillek · August 15, 2023, 8:34am

When I open this website, the URL doesn’t say anything about “index.html”. If the server does not provide this file, it might be difficult to download.

curl --head https://flet.dev/docs/controls/index.html
HTTP/2 308
date: Tue, 15 Aug 2023 08:31:00 GMT
location: /docs/controls/
…

As the output shows, the server answers with a redirect (status 308) to /docs/controls/. Nobody can download index.html, since the server doesn’t serve anything under that name.
Also, there’s no rule anywhere saying that an index.html must exist. It is simply the factory default for many web servers, and sites can easily change that default (for example, to index.php).
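
To make the redirect concrete, here is a minimal Python sketch of how a client resolves the relative location header from the response above against the URL it requested. It only uses urljoin from the standard library; the function name resolve_redirect is made up for illustration and says nothing about how DEVONthink handles redirects internally.

```python
from urllib.parse import urljoin


def resolve_redirect(request_url: str, location: str) -> str:
    """Resolve a (possibly relative) Location header against the request URL."""
    return urljoin(request_url, location)


# Values taken from the curl output above.
target = resolve_redirect(
    "https://flet.dev/docs/controls/index.html",
    "/docs/controls/",
)
print(target)  # https://flet.dev/docs/controls/
```

So a request for .../index.html ends up at the directory-style URL, which is the only address under which the server offers that page.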

halloleo · August 15, 2023, 4:56pm

True, @chrillek. Good point.

Still, it would be nice if DEVONthink downloaded the page served at /docs/controls/ and saved it under some file name… After all, the URL https://flet.dev/docs/controls does serve a document.
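
To illustrate what "some file name" could mean, here is a rough Python sketch of one possible naming rule a downloader might apply: directory-style URLs get a default file name appended. This is purely an assumption for illustration, not how DEVONthink works. The function local_name, the index.html default, and the "no file extension means directory" heuristic are all hypothetical choices, and the heuristic would misjudge paths such as /docs/v1.2.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_name(url: str, default: str = "index.html") -> str:
    """Map a URL to a relative local file path (hypothetical naming rule)."""
    path = urlsplit(url).path
    # Treat a path that ends in "/" or whose last segment has no
    # extension as a directory-style URL and append the default name.
    if path.endswith("/") or not PurePosixPath(path).suffix:
        path = path.rstrip("/") + "/" + default
    return path.lstrip("/")


print(local_name("https://flet.dev/docs/controls/"))  # docs/controls/index.html
print(local_name("https://flet.dev/docs/controls"))   # docs/controls/index.html
```

With a rule like this, the page served at /docs/controls/ would land in the local tree as docs/controls/index.html, even though the server never exposes a file under that name.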