Download all links from a webpage?

For example …

https://www.stigqter.com/stigs/U_SOL_11_X86_STIG_V2R3_Manual-xccdf.html

This page is just a table of links to little documents, each with a short explanation.

I want this in DEVONthink, complete, so that I can refer to any of those linked little documents even when offline or when this site goes down.

I thought that a web archive would do this, but it only contains the main page and refers to all the others just as links to the web version.

Any idea how to get this into DT?!?

:open_mouth:

A web archive doesn’t create an offline archive of linked files.

If you just wanted individual files from the links, you could do something simple like this…

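tell application id "DNtp"
	-- Page whose links should be downloaded
	set theURL to "https://www.stigqter.com/stigs/U_SOL_11_X86_STIG_V2R3_Manual-xccdf.html"
	-- Fetch the HTML source of that page
	set theSource to download markup from theURL
	-- Extract its links, resolving relative ones against the base URL
	set linkList to (get links of theSource base URL "https://www.stigqter.com/stigs/")
	-- Create the destination group (if necessary) in the current database
	set newDest to create location "/stigqter" in (current database)
	-- Allow plenty of time, since every linked page is downloaded and converted
	with timeout of 6000 seconds
		repeat with thisLink in linkList
			-- Save each linked page as a PDF document in the group
			create PDF document from (contents of thisLink) in newDest
		end repeat
	end timeout
end tell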

Opening the web page in DEVONthink and using Scripts > Download > Links of Page should be sufficient for simple documents like these.


As this looked like the faster approach, I tried this first.
Many thanks!

It took me a long time to find the Scripts menu, as I was searching for the string “Scripts” and totally overlooked the menu with the graphic symbol :smiley:

But when I finally found and executed it, nothing happened.

Is there something that should be visible?
Or some place where the results could be found?

Many thanks for this, but I need to download many similar webpages … so a generic script would be needed.

Also, ideally I just want the HTML, like a WebArchive, not one or lots of PDFs …

For now, I just tried this:

tja@mini:~/Downloads/Solaris_Benchmarks$ wget -r -l 1 <URL of the page "STIGQter: STIG Details: Solaris 11 SPARC Security Technical Implementation Guide Version: 2 Release: 3 Benchmark Date: 23 Apr 2021">

Works like a charm :slight_smile:
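Since many similar pages need the same treatment, the `wget -r -l 1` call can be wrapped in a small loop. This is only a sketch in POSIX `sh`: the function name `mirror_pages`, the destination-folder argument, the `WGET` override variable, and the extra flags `-np` (never ascend above the starting directory) and `-k` (rewrite links in the saved HTML to point at the local copies) are assumptions beyond what was used in the thread, so check them against the `wget` manual for your pages.

```shell
# Sketch: mirror each given page plus the pages it links to (one level deep)
# into DEST. Flags beyond "-r -l 1" are assumptions; adjust for your site:
#   -np  never ascend to the parent directory of the starting URL
#   -k   after downloading, rewrite links to point at the local copies
#   -P   save everything under DEST (wget still creates host/path subfolders)
# The downloader command can be overridden via WGET (default: wget).
mirror_pages() {
    if [ "$#" -lt 2 ]; then
        echo "usage: mirror_pages DEST URL..." >&2
        return 1
    fi
    dest=$1
    shift
    status=0
    for url in "$@"; do
        # Report failures instead of staying silent, but keep going
        if ! ${WGET:-wget} -r -l 1 -np -k -P "$dest" "$url"; then
            echo "mirror_pages: download failed for $url" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `mirror_pages ~/Downloads/Solaris_Benchmarks https://www.stigqter.com/stigs/U_SOL_11_X86_STIG_V2R3_Manual-xccdf.html` followed by further page URLs. As in the thread, the files end up in a `www.stigqter.com/stigs` subfolder of the destination.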

I then just copied the 223 downloaded files to my indexed folder:

(base) tja@mini:~/Downloads/Solaris_Benchmarks/www.stigqter.com/stigs$ cp -p * /Volumes/CRYPTOMATOR_ONEDRIVE/WORK/Security/Solaris/Benchmarks/

And there we have it, as items in DT!
This also lets me open the main page within DT and click the individual links to open the stored pages from the DT items. Great.
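When several pages have been mirrored, the manual `cp -p *` step can be scripted as well. A minimal sketch, assuming all wanted pages end in `.html` and that flattening the folder tree causes no name clashes; the function name `collect_html` is hypothetical:

```shell
# Sketch: copy every .html file found anywhere under a wget mirror directory
# into one (indexed) folder, preserving timestamps with -p as "cp -p *" did.
# Assumption: file names are unique; same-named files would overwrite each other.
collect_html() {
    if [ "$#" -ne 2 ]; then
        echo "usage: collect_html MIRROR_DIR INDEXED_FOLDER" >&2
        return 1
    fi
    mkdir -p "$2" || return 1
    find "$1" -type f -name '*.html' -exec cp -p {} "$2" \;
}
```

Called as `collect_html ~/Downloads/Solaris_Benchmarks /Volumes/CRYPTOMATOR_ONEDRIVE/WORK/Security/Solaris/Benchmarks`, it would pick up the pages from every host subfolder, not just `www.stigqter.com/stigs`.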

I think this method will suffice …

Many thanks nevertheless!

Did you view the web page in the active window before using the script? E.g. a bookmark without a preview pane is not sufficient.

I’m not sure, to be honest.

I clicked on the “bookmark” that I added to the INBOX of DEVONthink.

That opened the webpage in the preview panel below.
If this is “view the web page in the active window”, then yes!

After that, I opened the Scripts menu and clicked on the mentioned script.

But nothing visible happened.

Any way to debug this?
It would be great to do such things directly from DT …

Thanks a bunch!

Which edition of DEVONthink do you use?

Always the latest :hugs:

The edition, not the version :slight_smile:

Oh, sorry.

The Standard edition

This edition doesn’t support the download manager.

I was aware of that.

I just did not realize that this advice and the script actually use the Download Manager.

Many thanks.

P.S. The script could output something about the situation, instead of just being totally silent …

The next release will add a warning to the Log panel.
