Batch download website folder to PDFs -- How?

Hello to all,

Is there a way to download a complete folder from a website and have the pages as PDFs (to preserve formatting, no other reason) in DTPO?
Bonus points if it can go straight to the currently selected group. :slight_smile:

Example: I want to have arduino.cc/en/Reference/ for offline reference.

Ideas?

Thanks,
C

One possibility is to bookmark all pages of interest (manually or maybe via AppleScript) and to use the script “Convert URLs to PDFs” (see Scripts > More Scripts…) afterwards.
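
If you want to gather the list of pages to bookmark automatically, here is a rough sketch in Python (not AppleScript) that pulls the links out of an index page's HTML and keeps only those below the folder you care about. The `https://` prefix, the sample HTML, and the page names in it are made up for illustration. Getting the resulting URLs into DEVONthink as bookmarks is still a separate step.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute links that live below a given base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop any #fragment part.
        absolute = urljoin(self.base_url, href).split("#")[0]
        # Keep only pages inside the folder, without duplicates.
        if absolute.startswith(self.base_url) and absolute not in self.links:
            self.links.append(absolute)


def collect_links(html_text, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    return parser.links


# Hypothetical index page content, for illustration only.
sample = """
<a href="DigitalWrite">digitalWrite()</a>
<a href="/en/Reference/AnalogRead#notes">analogRead()</a>
<a href="https://example.com/elsewhere">off-site</a>
"""
BASE = "https://arduino.cc/en/Reference/"  # assumed scheme
pages = collect_links(sample, BASE)
for page in pages:
    print(page)
```

In real use you would feed it the HTML of the actual index page instead of `sample`.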

Thank you!
C

OK, for everyone with a similar problem, here is what worked fine for me. :bulb:

  1. Download the site with all necessary files using Terminal and wget (I wanted something easier and used a commercial GUI app, SiteSucker). If you get a lot of unwanted material as well, don’t worry too much about refining the download; it will be deleted later. Have the links converted to local links.

  2. Import the complete downloaded material into an empty group in DEVONthink. Select the (sub)group holding the content you want as PDFs and check that the pages (still HTML) display correctly.

  3. Select all documents in that group and run the script Download > Convert URLs to PDF documents. When it is done, sort the group by kind and trash the non-PDFs.

  4. Move the group (which should now hold all desired pages as PDFs) to its final location and delete the remaining unwanted downloads.
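
For anyone taking the wget route in step 1, here is a minimal sketch of how the command could be put together, written in Python so the options are easy to comment. The flags are standard GNU wget options. The `https://` prefix on the URL and the `site-download` folder name are my assumptions, not something I tested against the Arduino site.

```python
import shlex


def build_wget_command(url, dest_dir="site-download"):
    """Return an argv list for a recursive wget download of one website folder."""
    return [
        "wget",
        "--recursive",         # follow links starting from the given URL
        "--no-parent",         # never climb above the starting folder
        "--page-requisites",   # also fetch the CSS, images, and scripts pages need
        "--convert-links",     # rewrite links so they point at the local copies
        "--adjust-extension",  # save HTML pages with an .html suffix
        "--directory-prefix=" + dest_dir,  # hypothetical target folder
        url,
    ]


# Assumed full URL for the example folder mentioned above.
cmd = build_wget_command("https://arduino.cc/en/Reference/")
print(shlex.join(cmd))
```

The printed line can be pasted into Terminal, or the list can be handed to `subprocess.run(cmd, check=True)` if wget is installed. `--convert-links` is what takes care of "have the links converted to local links".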

A procedure like this could be a new feature. It could be added to the download manager or become a new item under More Scripts.

It’s really fun working with DTPO once I’ve figured something out! :smiley:

C

DEVONthink is a lot of fun indeed! :smiley: