Importing and editing HTML sites?

Hi, I’m still getting used to the HTML editing feature: I import a .html page (or web page), and I can then move my cursor around in it, highlight, add text, and so on.

It seems to work if I archive a page. But, if I click on a link on that page, and then click back, my edits are gone! Hitting Cmd-S after the edits gets me a beep.

The ability to edit HTML I’ve imported would be of real help. How can I do this stably?

Also, how can I archive a group of pages? Say I’ve got an online manual, where the TOC page links to the rest. I’d love to import the entire thing into my database so I can edit the pages, link them to other entries in the database, etc.

Thanks in advance for your help!

Editing HTML or Web Archive documents has limitations and is sometimes problematic due to a bug in Apple’s WebKit code.

I can fairly easily select and delete areas of a page, and sometimes the change can be saved. For HTML it’s possible to open the source view and make changes in the source that will be saved.

But adding links, highlighting and notes to documents captured as HTML or Web Archives can be a major task involving changing the source.

Personally, I prefer capturing information from Web pages as RTF or RTFD rich text documents. I’ve got many thousands of captures this way. The advantages of this approach include capturing images that are viewable offline, avoiding extraneous material such as advertisements, and having captured material that is easily editable, including the ability to link to other documents.

In your example of a multi-part document that has a TOC, I would download the rich text of the TOC and each segment, then change the links in the TOC to point to the captured text segments rather than out to the Web. Now I can do lots of things easily, including highlighting, adding notes and comments, and linking the documents to other documents in my reference collection.
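If you would rather keep the TOC as HTML and edit its source, the same relinking idea can be scripted outside DEVONthink. The following is only a rough sketch of that idea, not a DEVONthink feature: the function name `rewrite_toc_links`, the example URLs, and the local file names are all made up for illustration, and it assumes the links are written as plain double-quoted `href="..."` attributes.

```python
import re


def rewrite_toc_links(toc_html, local_files):
    """Point TOC links at local copies where a copy exists.

    toc_html    -- HTML source of the table-of-contents page
    local_files -- dict mapping an original URL to a local file name
                   (hypothetical names, supplied by you)
    """
    pattern = re.compile(r'href="([^"]*)"')

    def swap(match):
        url = match.group(1)
        # Links without a captured local copy are left pointing at the Web.
        return 'href="{}"'.format(local_files.get(url, url))

    return pattern.sub(swap, toc_html)


# Illustrative input only; these URLs and file names are assumptions.
toc = (
    '<a href="http://example.com/manual/ch1.html">Chapter 1</a> '
    '<a href="http://example.com/other">Other</a>'
)
mapping = {"http://example.com/manual/ch1.html": "ch1.rtf"}
print(rewrite_toc_links(toc, mapping))
```

Only the links listed in the mapping are changed, so anything you did not capture still points out to the Web.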

I often download HTML or Web Archive pages from a DEVONagent search and transfer them to a DT Pro database. If I need to do any editing or linking of some of those pages, they can be converted to rich text using Data > Convert > to Rich Text. Then I can easily select and throw away extraneous material and add hyperlinks or markups as I please. (Sometimes that Apple WebKit bug may bite during a conversion, but I usually don’t encounter it.) But as I often add hundreds of pages at a time from a set of DEVONagent search results, I only rarely go to the trouble of converting and editing those pages.

It’s also possible to use DEVONthink’s Download Manager to download an entire Web site. I rarely do that. You can find the directions in the Help files (Help > DEVONthink Pro Help). Depending on the way the site links are designed, it can be tricky to capture the wanted material without also capturing a large volume of unwanted material.
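To illustrate why link design matters, here is a rough sketch of the kind of filtering any site download has to do: keep the links under the manual's own address and skip the rest. This is not how the Download Manager works internally (I am not describing its code); the class and function names, and the example addresses, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def wanted_links(page_html, page_url, prefix):
    """Return absolute link URLs on the page that start with prefix."""
    collector = LinkCollector()
    collector.feed(page_html)
    # Resolve relative links against the address of the page itself.
    absolute = [urljoin(page_url, href) for href in collector.links]
    return [url for url in absolute if url.startswith(prefix)]


# Illustrative page only; all addresses here are assumptions.
page = (
    '<a href="ch1.html">Chapter 1</a>'
    '<a href="/ads/banner.html">Ad</a>'
    '<a href="http://other.example/x">Elsewhere</a>'
)
print(wanted_links(page, "http://example.com/manual/index.html",
                   "http://example.com/manual/"))
```

A site whose chapters all live under one folder filters cleanly this way; a site that scatters its pages across several paths does not, which is where the unwanted material tends to creep in.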

Hope this helps.