Archive part of a very large site (BBC recipes)

The BBC here in the UK has a very useful recipe section on its website, but it is about to close the whole thing down (under pressure from the government and from commercial providers). There are more than 11,000 recipes on there.

I would like to use DTPO to archive the entire recipe section of the BBC site (bbc.co.uk/food/recipes/). I realise this is ambitious, but thought I’d give it a go all the same. I tried the File > Import > Website command and it began its download. I left it running, knowing it was bound to take some time. Fortunately I noticed that it wasn’t archiving just the recipes section, but the entire BBC website. Frankly, that’s overambitious.

So, I’d like to know if there is a neat and clever way to archive the recipes section of the BBC site into a database - before the whole thing is lost to posterity. Any tips or suggestions?

You could customize which links are followed; see the Options panel of the Download Manager. For example, restrict it to the same directory or to subdirectories only.
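In case it helps to see the idea spelled out: the point is that the downloader should only follow a link if that link stays inside the recipes path. Here is a minimal Python sketch of that rule (nothing to do with DEVONthink’s own code; the bbc.co.uk host and /food/recipes/ prefix are assumptions based on the URL quoted above):

```python
# Minimal sketch of the "same directory or subdirectories" idea:
# only follow a link if it resolves to a page under the recipes path.
from urllib.parse import urljoin, urlparse

ALLOWED_PREFIX = "/food/recipes/"   # assumed path of the recipes section

def should_follow(base_url: str, href: str) -> bool:
    """Return True only for links inside the recipes subdirectory."""
    absolute = urljoin(base_url, href)   # resolve relative links against the page URL
    parts = urlparse(absolute)
    return parts.netloc.endswith("bbc.co.uk") and parts.path.startswith(ALLOWED_PREFIX)

# Links out to news or sport are rejected; recipe pages pass.
print(should_follow("https://www.bbc.co.uk/food/recipes/", "/news/uk"))           # False
print(should_follow("https://www.bbc.co.uk/food/recipes/", "lemon_drizzle_123"))  # True
```

The Options panel is presumably applying some test along these lines when you choose “same directory” or “subdirectories”, which is why getting the starting URL right matters so much.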

The BBC recipes are excellent and I would also like to download them all. It would be good if some kind person would detail, in easy steps, how to achieve such a download (I’m afraid I do not understand cgrunberg’s reply :frowning: ). Also, is it best to open the BBC website in DEVONagent Pro or Safari to accomplish this? Thanks in advance.

Downloading and Importing an Entire Site Automatically
To download and import an entire Web site, choose File > Import Site, enter the URL, and click Add. Then, in the Download Manager panel, select the site, click the Actions pop-up menu, and choose Download To > Database Name or Inbox (that is, anything except Folder). Finally, click the Start/Stop Queue button. Be aware that downloading an entire site may take an extremely long time and use an appalling amount of disk space—so use this feature with caution.
(For details on the many options you can use when downloading an entire site, consult DEVONthink’s help: from the main help screen, click Documentation. Look in “Menus,” then “The File Menu,” and then find the “Import Site” paragraph.)

Criss is referring to the Options under the Action button (gear icon). The URL I’m capturing here is bbc.co.uk/food/ and I’m using the Subdirectory (Images and multimedia) option, though I did add a few file types. I am downloading directly into the Recipes database.
As you can see, there is a LOT of data coming through (it’s over 55,000 items now and climbing). I am letting it run its course for due diligence’s sake. :smiley:

… Now at 200,000 items… :open_mouth:

There can be some amount of fiddling about with these options. Testing is always a good idea before committing.
(Screenshot attached: Website_Download.png)

I’ve tried tweaking and playing with most of those options, but whatever I try, the download appears to be attempting to archive the entire BBC website - news, sport, the lot. That’s way too much for one database, or even one hard drive, I suspect.
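If the Download Manager really won’t stay inside the recipes section whatever the options say, one blunt fallback outside DEVONthink is a tiny crawler that refuses to follow any link outside /food/recipes/ and saves the pages to a folder you can then import into the database. Below is a minimal Python sketch of that approach; the start URL, path prefix, page limit and output folder name are my assumptions, and it deliberately ignores images, robots.txt handling and rate limiting, so treat it as a starting point only:

```python
# Rough fallback sketch: crawl only URLs under /food/recipes/ and save
# the HTML into a folder for later import into DEVONthink.
import os
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

START = "https://www.bbc.co.uk/food/recipes/"   # assumed starting page
PREFIX = "/food/recipes/"                       # assumed recipes path
OUT_DIR = "bbc_recipes"                         # hypothetical output folder

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start, limit=500):
    os.makedirs(OUT_DIR, exist_ok=True)
    seen, queue = set(), deque([start])
    while queue and len(seen) < limit:          # cap so a test run stays small
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=30).read().decode("utf-8", "replace")
        except Exception:
            continue                             # skip pages that fail to load
        # Save the page under a filename derived from its path
        name = urlparse(url).path.strip("/").replace("/", "_") or "index"
        with open(os.path.join(OUT_DIR, name + ".html"), "w", encoding="utf-8") as f:
            f.write(html)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            parts = urlparse(absolute)
            # Only queue links that stay inside the recipes subdirectory
            if parts.netloc.endswith("bbc.co.uk") and parts.path.startswith(PREFIX):
                queue.append(absolute.split("#")[0])

if __name__ == "__main__":
    crawl(START)
```

Once the folder has filled up, importing it (or simply dragging it into the Recipes database) should get the pages into DEVONthink, and you can raise the page limit once you’re happy the crawl is staying where it should.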

Apparently, ‘how to download bbc recipes’ is a trending search in Google today. And #bbcrecipes seems to be trending on Twitter. So we are not alone in trying to find a solution.