Alternative way to crawl/archive web pages

Check out urltoys.com, a command-line tool for crawling and spidering web pages. You can “rip” all PDFs or all images from a site or a list of URLs, and then import the results into DT.
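
For anyone curious what this kind of “ripping” involves, here is a rough Python sketch of the general idea: scan a page's HTML for links and keep only those ending in the file types you want. This is not URLToys and does not use its commands; the names `LinkCollector` and `find_files`, the extension list, and the example.com page are all made up for illustration. It only lists the matching URLs, and downloading them would be a separate step.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects every href/src attribute value found in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_files(html, base_url, extensions=(".pdf", ".jpg", ".png")):
    """Return absolute URLs in `html` whose path ends with one of `extensions`."""
    parser = LinkCollector()
    parser.feed(html)
    found = []
    for link in parser.links:
        absolute = urljoin(base_url, link)          # resolve relative links
        path = urlparse(absolute).path.lower()      # ignore query strings, case
        if path.endswith(tuple(extensions)) and absolute not in found:
            found.append(absolute)
    return found


# Made-up sample page; a real run would fetch the HTML from each URL in your list.
page = (
    '<a href="docs/manual.pdf">Manual</a> '
    '<img src="/img/logo.png"> '
    '<a href="about.html">About</a>'
)
print(find_files(page, "http://example.com/site/"))
```

The resulting list of URLs could then be downloaded into a folder and that folder imported into DT.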