Problem with images when capturing web pages

Hello everyone,

Today I was working with my database after moving it from my iMac to my iBook, and I realised that none of my web captures keep the images inside the HTML document. When I have an internet connection the images load without problems, but no copy of them is kept inside my database. How can I achieve this?

Thanks

That’s the nature of HTML page captures. Only the text and the hyperlinks to images are captured.

The most efficient way to capture text and the images you want would be to do rich text captures (also known as note captures) of selected material. I say most efficient because one can choose not to capture unwanted images, or can easily edit them out after capture. Generally, I do at least 90% of my data captures from the Web as rich text captures.

Another way, which captures all images, would be to save a page as a Web Archive and import it into DT Pro or, if using DEVONagent or the browser in DT Pro, use the contextual menu option to import the page as a Web Archive. Note that Web Archives are Macintosh only; they cannot be read by Windows users.

Still another way is to capture a Web page as a PDF document. DT Pro installs a script at YourBootVolumeName/Library/PDF Services/ named Save to DEVONthink Pro.scpt. To use it while viewing a Web page, select File > Print. When the Print panel appears, press the PDF button, then select Save to DEVONthink Pro.scpt and choose the location for the import. Note: Hyperlinks on the Web page will not work in the PDF capture.

Thank you for such a quick answer!

Is there any automated or semi-automated way to convert a bunch of captured web pages into web archives?

Thank you again

I saw the current thread just as I was about to write and ask about importing web sites (File-Import Site…). There are a couple of sites I want to import as a web archive. The problems I have are several: sometimes all that’s imported is the index.html file; sometimes I get lots of files but things still seem to require being connected to the web to view them (I’m not talking about going to other sites which are linked because I certainly don’t want to suck all those, too). I really want the sites in an archive in my DTPro database and not need to connect to the web.

I read the following in the DT Help file:

Import Site…: Opens the “Download Manager” and downloads a complete web page/site for archiving and offline viewing. Make sure the download options are set correctly, especially the options that define which links DEVONthink Professional should follow (if any). All links within the site are modified so that they point to the downloaded images or other embedded objects. This ensures that the page/site can be displayed at any time.

I don’t understand what “options that define which links DTP should follow” means. I don’t see anything about this in Preferences->Import, and I don’t see anything in the dialogue box that pops up when you choose File->Import Site…

There’s a lot of data I need to get into DT for a big research project I’m working on, and if I could suck these sites successfully so that I can work with them while offline, it would help me tremendously.

Any help understanding Import Site… would be greatly appreciated.

Happy New Year to all!
Martin
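For anyone wondering what the help text means by “all links within the site are modified so that they point to the downloaded images”, here is a minimal Python sketch of that idea. It is not DEVONthink’s code: the function names, the `images/` folder and the file-naming scheme are hypothetical, it does not download anything, and it only handles double-quoted `src` attributes on `<img>` tags.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def local_name(url, index):
    # Hypothetical naming scheme: images/<n>_<file name from the URL path>.
    base = posixpath.basename(urlparse(url).path) or "image"
    return f"images/{index}_{base}"


def rewrite_image_links(html, page_url):
    """Point each <img> at a local copy.

    Returns (rewritten_html, mapping), where mapping goes from the absolute
    image URL (what a downloader would fetch) to the local path written into
    the page. Assumes double-quoted src values with no HTML entities.
    """
    collector = ImageCollector()
    collector.feed(html)
    mapping = {}
    rewritten = html
    for src in dict.fromkeys(collector.sources):  # unique, in page order
        absolute = urljoin(page_url, src)  # resolve relative links
        if absolute not in mapping:
            mapping[absolute] = local_name(absolute, len(mapping))
        rewritten = rewritten.replace(f'src="{src}"', f'src="{mapping[absolute]}"')
    return rewritten, mapping


html = '<p><img src="/img/logo.png"><img src="photo.jpg"></p>'
page, files = rewrite_image_links(html, "http://example.com/docs/index.html")
print(page)   # <p><img src="images/0_logo.png"><img src="images/1_photo.jpg"></p>
print(files)
```

A real downloader would then fetch each URL in `mapping` and save it under the matching local path, which is why the archived page can still display its images with no connection.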

When I wanted to capture web pages for the database, I had been using the script “Add page to DevonThink.”

Then I read what Bill wrote and wondered if I should be doing this:

But I can’t find a script or command called rich text capture or note capture. How does one do this?

PS: Is there any place that lists the best way to do common tasks? I keep doing what sounds right, only to discover in the forum that there was a better way.

Thanks,
Jerry

Jerry:

There’s not a script for what I do: Selecting the desired text/images and leaving out what I don’t want. :slight_smile:

Different users have different “best ways” to do things. I write a lot about how I do things. My practices may not be the best way for you (or even for me, if I would only realize it sometimes).

I’ve learned a lot from other users on this forum and expect to keep learning new ‘better ways’ here.

Thank you, Bill.

So, in Safari, I select what I want and then use the script “Add selection …” or “Add text to DevonThink”? Is this pretty much the equivalent of Add page to DT, except that in your method, the images themselves can actually be captured?

Thanks,
Jerry

This looks like a wonderful idea. I had not seen that script, and I immediately tested it, first with Safari, then with OmniWeb.

However, in both cases it didn’t work: it seems to do the printing job as usual (without asking me for the location, by the way), but then suddenly I get a “Printing error” message, and my browser window remains frozen. There is no way to move anything, the sidebar stays grey instead of turning blue, and the only way out is to quit and relaunch the browser.

Do you have any idea how I could solve that problem and make the script work without the printing error? It would really be nice to be able to convert webpages into PDF like that!

Yes. Locate the “Save to DEVONthink Pro.scpt” file in /Users/YourUserName/Library/PDF Services/. Double-click on the script to open it in Script Editor. Press Command-S to save it back to the same location. Now it will work.

Note: I often do rich text captures of selected text/images to retain working hyperlinks and also to avoid obtrusive advertising. The browsers in DEVONagent and in DT Pro offer convenient contextual menu options for choosing that option or for capturing as HTML or as a Web Archive.

In Safari, and other apps that support System Services, you might prefer using these services:

Take Rich Note [command-)]
Take Plain Note [command-(]

They’re faster than running the Add Selection/Text to DEVONthink scripts.

Thank you, Bill! I have done as instructed, and it works! So easy to do… but I had not figured out how to do it. I must say that I am a relatively recent “convert” to the Mac: I was on a PC until 2004 and then switched, so scripts remain a somewhat mysterious world for me.

Thank you also for your advice about Rich Text Capture. Obviously, there may be different capturing strategies depending upon the context.

Yes, thank you Bill! I just re-saved my script and it works perfectly. I haven’t been using it much, but thought I’d give it a go. I should have known you’d have the answer!

Back to work! :slight_smile: (I’m about to hand in my second chapter. Scary!)

Alexandria