I have access to a web site that posts a small PDF digest each day. I’d love to be able to snag those PDFs each day so that they can be indexed and searchable in my DT database.
The site has a page for each month of the year, and on each page the files have a unique, date-based name (e.g., “Digest - 6.22.22.pdf”).
Any thoughts on an approach to do this or am I looking at writing some kind of web-scraping script?
Thanks for your thoughts.
If it’s just a publicly accessible website (no logins), you could write a small shell script that uses wget to download the PDF.
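As a rough sketch of that idea, the script below builds the date-based file name and fetches it with wget. The base URL, the destination folder, and the assumption that each PDF sits directly under one URL are all hypothetical; only the “Digest - 6.22.22.pdf” naming pattern comes from the question, so adjust the rest to match the real site.

```shell
#!/bin/sh
# Sketch only: BASE_URL and DEST_DIR are placeholders, not the real site or folder.
BASE_URL="${BASE_URL:-https://example.com/digests}"
DEST_DIR="${DEST_DIR:-$HOME/Downloads/Digests}"

# Turn an ISO date (YYYY-MM-DD) into a name like "Digest - 6.22.22.pdf".
digest_filename() {
  old_ifs=$IFS; IFS=-; set -- $1; IFS=$old_ifs   # split into year, month, day
  year=$1; month=${2#0}; day=${3#0}              # drop one leading zero
  printf 'Digest - %s.%s.%s.pdf' "$month" "$day" "${year#??}"
}

# Download the digest for the given date unless it is already on disk.
fetch_digest() {
  name=$(digest_filename "$1")
  url="$BASE_URL/$(printf '%s' "$name" | sed 's/ /%20/g')"   # encode spaces
  target="$DEST_DIR/$name"
  [ -e "$target" ] && return 0
  mkdir -p "$DEST_DIR"
  # Write to a temporary name first so a failed download leaves no stub file.
  if wget --quiet --output-document="$target.part" "$url"; then
    mv "$target.part" "$target"
  else
    rm -f "$target.part"
    return 1
  fi
}
```

To use it, add a call such as fetch_digest "$(date +%Y-%m-%d)" at the end of the script and run it once a day from cron or launchd. Pointing DEST_DIR at a folder that DEVONthink indexes would make new digests show up in the database without a separate import step.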
Yep, that makes sense @mdbraber. Thanks, I needed that nudge.
As the download manager of the Pro/Server editions can’t be scheduled, another option would be a scheduled smart rule or reminder that runs a small AppleScript. Here’s a simple example for a reminder, which could be assigned to a bookmark of the web page containing the downloads.
property pLocation : "/Downloaded PDFs"

-- Runs when the reminder attached to the bookmark fires.
on performReminder(theBookmark)
	tell application id "DNtp"
		set theDatabase to database of theBookmark
		with timeout of 30 seconds
			set theURL to URL of theBookmark
			set this_page to download markup from theURL
			set these_docs to get links of this_page base URL theURL type "PDF"
			repeat with this_doc in these_docs
				-- Skip PDFs that were already downloaded.
				if not (exists record with URL this_doc) then
					-- Reuse the destination group, or create it on first run.
					if not (exists record at pLocation in theDatabase) then
						set this_group to create location pLocation in theDatabase
					else
						set this_group to get record at pLocation in theDatabase
					end if
					set this_record to create PDF document from this_doc in this_group
					set unread of this_record to true
				end if
			end repeat
		end timeout
	end tell
end performReminder