Is it possible to import multiple web pages as one PDF into DevonThink?

Example: the list of books in the Edge library, Books by People at Edge.org | Edge.org.
At the bottom of the page there are links to the next pages.
It’s easy to convert one web page into a PDF, but is there a way to concatenate all the pages into one?
I would like to avoid having to click on and import each page number.
Thanks in advance for your time and help.

Just use Tools > Merge Items after clipping the web pages.


So there is no way around having to click on each page link, as described above? Thank you.

The web page is designed to be used by people.

The web site designers demand/require that you click on those little numbers, which you recognise as links, to retrieve and load into your browser yet another, different page. Except perhaps for the hyped ChatGPT or something similarly “intelligent”, it’s hard to see how that could be turned into a computer algorithm that does this in any generic way.

Perhaps ask the web site to change their design to allow a link to a consolidated PDF they create for you.

Edit: You might be able to make the “wget” program work for you on this web site. It can do recursive downloads. Search on “wget command macOS” for how to obtain and install on a Mac.
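
Since this particular site uses simple numbered “?page=” URLs, a plain loop may be easier than a fully recursive crawl. A rough, untested sketch (the page range and file names are just assumptions, and it only saves HTML files, which you would still have to convert and merge in DEVONthink):

# Rough sketch: fetch pages 0–9 of the listing as separate HTML files
for i in $(seq 0 9); do
	wget -O "edge-library-page-$i.html" "https://www.edge.org/library?page=$i"
done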


Only by scripting and using e.g. the create PDF document from command.
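
For a single URL, a minimal sketch of that command might look like this (the destination here is simply the current group, and the example URL is just the one from this thread):

-- Minimal sketch: capture one URL as a paginated PDF in the current group
tell application id "DNtp"
	create PDF document from "https://www.edge.org/library?page=0" in current group with pagination
end tell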


Thank you. I will have a look.

Just an addendum: I could have formulated my question better. It was triggered by the fact that each page URL has an easy-to-recognize syntax, for example for page 2:

https://www.edge.org/library?page=2

This script should do it.

You can set theURL_EndPage to a higher value (but I’m not sure whether it’s a good idea to do pages 0 thru 58 in one script run).

-- Create PDFs from URLs and merge them

property theURL : "https://www.edge.org/library"
property theURL_Query : "?page="
property theURL_StartPage : 0
property theURL_EndPage : 9

tell application id "DNtp"
	try
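		-- Ask for a destination group and note the start time;
		-- the timestamp is used later to find only the PDFs created by this run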
		set theGroup to display group selector "Choose destination:"
		set theDate to do shell script "date \"+%Y-%m-%d %H:%M:%S\""
		
		repeat with i from theURL_StartPage to theURL_EndPage
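			-- build the URL for page i and capture that page as a paginated PDF in the chosen group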
			set thisURL to theURL & theURL_Query & i
			set thisRecord to create PDF document from thisURL in theGroup with pagination
		end repeat
		
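		-- Find the PDFs created since the recorded start time and merge them into a single document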
		set theChildren to search "kind:pdf creationDate>=" & theDate in theGroup
		
		if theChildren ≠ {} then
			set theRecord to merge records theChildren in theGroup
			set name of theRecord to (name without extension of theRecord) & space & "[Merged]"
			set URL of theRecord to theURL
		else
			error "No PDF records found"
		end if
		
	on error error_message number error_number
		if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
		return
	end try
end tell

ABSOLUTELY INCREDIBLE! YOU ARE A GENIUS. Works perfectly. Fantastic. Thank you so much!


May I ask how I would modify the script to choose the same source and destination, and ideally make it generic so that it takes the URL of the currently selected DEVONthink “.webloc” item? Thanks again.

May I ask why you’re trying to generate a PDF of these pages?


To annotate them. Thank you for your post.

You’re welcome 🙂
So you’re annotating a PDF showing the thumbnails of books in their library?


I chose this web site just as an example; I would never disturb forum members with a problem limited to a single web site. It is a VERY common and recurrent problem: I just have to replace the web site’s URL in the script. That’s why I was asking whether the procedure could be made automatic (i.e. recognition of the web site from the DEVONthink webloc item).

See Script: Create PDFs from URLs and merge them.
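
For the “take the URL from the selected item” part, here is a minimal, untested sketch rather than a finished solution. It assumes DEVONthink 3 (for the selected records property), that the selected bookmark’s URL is the plain listing URL without a “?page=” query, and an arbitrary page count; the PDFs are filed into the same group as the selected bookmark.

-- Rough sketch: capture a few "?page=" URLs derived from the selected bookmark
property thePageCount : 5 -- number of pages to capture (placeholder value)

tell application id "DNtp"
	try
		if (count of selected records) is 0 then error "Please select a bookmark first."
		set theItem to item 1 of selected records
		set theBaseURL to URL of theItem
		-- file the PDFs into the same group as the selected bookmark
		set theParents to parents of theItem
		set theGroup to item 1 of theParents
		repeat with i from 0 to (thePageCount - 1)
			create PDF document from (theBaseURL & "?page=" & i) in theGroup with pagination
		end repeat
	on error error_message number error_number
		if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
	end try
end tell

The merge step from the script above could then be appended unchanged.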