Importing PDFs from a Web Site

I have access to a web site that posts a small PDF digest each day. I’d love to be able to snag those PDFs each day so that they can be indexed and searchable in my DT database.

The site has a page for each month of the year, and on each page the files have a unique, date-based name (e.g., “Digest - 6.22.22.pdf”).

Any thoughts on an approach to do this or am I looking at writing some kind of web-scraping script?

Thanks for your thoughts.

— Robert

If it’s just a publicly accessible website (no logins), you could write a small shell script that uses curl or wget to download the PDF.
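For what it's worth, here is a rough sketch of what such a script might look like. It assumes the PDFs sit directly under one base URL and follow the "Digest - M.D.YY.pdf" naming from the original post. The base URL, the destination folder, and the function names are all placeholders I made up, so adjust them to whatever the real site uses.

```shell
#!/bin/sh
# Hypothetical settings: replace with the real site and a folder of your choice.
BASE_URL="${BASE_URL:-https://example.com/digests}"
DEST_DIR="${DEST_DIR:-$HOME/Downloads/Digests}"

# Build the file name for a date, following the "Digest - M.D.YY.pdf" pattern.
# $1 = four-digit year, $2 = month, $3 = day (pass month and day without leading zeros).
digest_name() {
    printf 'Digest - %d.%d.%02d.pdf' "$2" "$3" "$(( $1 % 100 ))"
}

# Build the download URL; spaces in the file name are percent-encoded.
digest_url() {
    name=$(digest_name "$1" "$2" "$3")
    printf '%s/%s' "$BASE_URL" "$(printf '%s' "$name" | sed 's/ /%20/g')"
}

# Download one digest unless a file with that name is already present.
fetch_digest() {
    name=$(digest_name "$1" "$2" "$3")
    mkdir -p "$DEST_DIR" || return 1
    [ -f "$DEST_DIR/$name" ] && return 0
    curl --fail --silent --show-error --location \
        --output "$DEST_DIR/$name" "$(digest_url "$1" "$2" "$3")"
}
```

Calling `fetch_digest 2022 6 22` would then try to fetch "Digest - 6.22.22.pdf" into the destination folder. Run from cron or launchd once a day, that folder could be indexed or imported into the database. If the real links don't follow a predictable URL, you would have to fetch the monthly page and pull the PDF links out of it instead.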

Yep, that makes sense @mdbraber. Thanks, I needed that nudge.

Since the download manager of the Pro/Server editions can’t be scheduled, another option is a scheduled smart rule or reminder that runs a small AppleScript. Here’s a simple example of a reminder script, which could be assigned to a bookmark of the web page containing the downloads.

-- Group that receives the downloaded PDFs (created on first use)
property pLocation : "/Downloaded PDFs"

-- Called by DEVONthink when the reminder of the bookmark fires
on performReminder(theBookmark)
	tell application id "DNtp"
		try
			set theDatabase to database of theBookmark
			with timeout of 30 seconds
				-- Fetch the page behind the bookmark and collect its PDF links
				set theURL to URL of theBookmark
				set this_page to download markup from theURL
				set these_docs to get links of this_page base URL theURL type "PDF"
				
				repeat with this_doc in these_docs
					-- Skip PDFs for which a record with this URL already exists
					if not (exists record with URL this_doc) then
						if not (exists record at pLocation in theDatabase) then
							set this_group to create location pLocation in theDatabase
						else
							set this_group to get record at pLocation in theDatabase
						end if
						-- Download the PDF into the group and flag it as unread
						set this_record to create PDF document from this_doc in this_group
						set unread of this_record to true
					end if
				end repeat
			end timeout
		end try
	end tell
end performReminder