I do not seem to be able to import a web site into an archive without DT immediately crashing. I’m running Leopard.
When I first started to import the site I was able to partially archive it; since then, the download will start but then crash within seconds!
Any ideas?
Thanks
Steve Didier
Steve, the Download Manager routines are hitting some still rough-edged areas of Leopard. We are hopeful that Apple will improve the stability of WebKit and some related code in the next Leopard update or two.
I rarely (perhaps twice in the last three years) use Download Manager to capture complete sites, as I typically want only a small portion of a site, e.g., selected papers/articles from a journal issue or news source. The vast majority of my Web captures are selected rich text that includes the text and graphics of articles but excludes extraneous material, such as ads and other clutter, that's common on most Web pages. So I don't use Download Manager for two reasons: one cannot designate just the specific pages to capture among many, and I don't like to capture full HTML pages.
But I can understand why users might like to capture an entire site, e.g., a lawyer might wish to document the content of a site as of a particular date (perhaps as evidence in a patent law dispute), or a real estate agent might wish to capture a set of listings.
Currently, Download Manager works well with some sites, but on others it may encounter page elements/layouts that trigger those Leopard instabilities.
Thanks Bill, This was the first time I had tried to download an entire site. But the site is essentially an aerodynamics tome for dummies that I wanted to convert in its entirety to a PDF file for easier reading offline.
Oh well, hopefully Apple will fix it soon
Is it possible to import the site as individual pages and then stitch it back together in DT?
Steve
Sure, if you capture the individual sections as rich text, you can subsequently hyperlink each section to the next, for navigation to the next section.
Or you can create a Table of Contents rich text note that lists and links to the various sections (the sections could be captured as text, HTML, WebArchive or PDF, using that approach). Assuming that the site itself contains such a TOC linked list, a quick way to do this would be to capture that site page as plain text (to “kill” the Web links), then select Format > Make Rich Text. Now select each section title and right-click on the selected text, then choose Link To from the contextual menu options. Navigate to the document to be linked.
I have to admit, though, I would be very happy if the Download Manager did work well and one could tweak the options a bit more, because this function of DT has awesome potential and could become an essential part of my workflow.
Often I have to download an entire site's content, not actually making a 1-to-1 copy, but just grabbing certain files.
E.g., downloading documents from a site or subdomain. In this case I use an app called "Blue Crab". You specify a site/domain, choose the level of links to follow (I'm not exactly sure if it just follows the links or if it actually scans the whole domain or directory you select), and tell it to download only files ending in .pdf and .doc.
That's very helpful, because this way I don't have to click through every single page and download each file I need separately; instead I get the whole bunch of PDFs I need in a folder on my desktop.
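For anyone curious, the underlying idea behind a tool like Blue Crab (fetch a page, collect its links, and save only the files with certain extensions) can be sketched in a few lines of Python. This is just an illustration of the technique, not Blue Crab's actual implementation; the URL, folder name, and function names are made up for the example:

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def download_documents(start_url, dest_folder, extensions=(".pdf", ".doc")):
    """Fetch start_url, then download every linked file whose name
    ends in one of the given extensions into dest_folder."""
    os.makedirs(dest_folder, exist_ok=True)
    html = urllib.request.urlopen(start_url).read().decode("utf-8", "replace")
    parser = LinkCollector()
    parser.feed(html)
    for link in parser.links:
        url = urljoin(start_url, link)          # resolve relative links
        if url.lower().endswith(extensions):    # keep only .pdf / .doc
            filename = os.path.join(dest_folder, url.rsplit("/", 1)[-1])
            urllib.request.urlretrieve(url, filename)
```

A real site ripper would also follow links recursively to a chosen depth and throttle its requests, but the filter-by-extension step is essentially this.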
I see that what I am describing is a full-blown download manager / site ripper, and one could argue whether it belongs within the DT environment, or even whether it is closer to DEVONagent. But since there is something called Download Manager, which already works quite well and offers good features, and someone mentioned the matter, I thought I'd throw in my opinion.
Thanks Prontomat, Blue Crab works as advertised and once I had the site in a folder it was easy to import into DT as a webarchive.
Steve Didier