Clip to DEVONthink & missing images in Wikipedia

jakemkc · March 12, 2020, 8:00pm

Hi all,

Quite often I need to save Wikipedia pages as PDFs for offline reference. I do this with the Clip to DEVONthink plugin or the create PDF bookmarklet.

However, I have to repeat this step a few times before the PDF contains the images from the wiki page.
It is reproducible in the latest versions of Firefox and Chrome, e.g., with https://en.wikipedia.org/wiki/Factor_analysis

Is there a setting I can tweak to improve the saving behavior here? I want to do it just once.

I am clipping with the option “pdf (one page)”.

Best,
Jake

MichaelHD · March 12, 2020, 8:16pm

The print layout (= same as the PDF layout) is determined by the print stylesheets that the website provides. Try saving the page as a webarchive.

jakemkc · March 12, 2020, 8:20pm

Thanks for the input.

However, in my case, if I repeat the conversion to create multiple PDFs, after a few attempts I get a PDF with the images. So I don’t think it’s related to stylesheets.

I prefer saving to PDF because it is easier to annotate and can be exported back to Finder without additional issues.

Best,
Jake

BLUEFROG · March 12, 2020, 9:10pm

Try it in Safari.

jakemkc · March 12, 2020, 9:29pm

Just tried in Safari and it behaves the same as in Firefox and Chrome.

Can others reproduce the issue using the example wiki page?

My hunch is that once the link is passed to DEVONthink, it doesn’t wait until the images are loaded before creating the PDF…
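
To illustrate the hunch, here is a minimal sketch of what "waiting for the images" could look like in page JavaScript. This is not DEVONthink's actual code, and I don't know how the clipper renders pages internally. `waitForImages` and its `timeoutMs` parameter are hypothetical names made up for this example. The only browser features it relies on are the standard `complete` property and the `load`/`error` events of image elements.

```javascript
// Hypothetical helper (not part of DEVONthink or any browser API).
// Resolves with "loaded" once every image has finished loading or failed,
// or with "timeout" if that takes longer than timeoutMs.
function waitForImages(images, timeoutMs = 10000) {
  // img.complete is already true for images that finished (or failed) loading.
  const pending = Array.from(images).filter((img) => !img.complete);

  const allSettled = Promise.all(
    pending.map(
      (img) =>
        new Promise((resolve) => {
          // Treat a failed image as settled so one broken image cannot block forever.
          img.addEventListener("load", resolve, { once: true });
          img.addEventListener("error", resolve, { once: true });
        })
    )
  ).then(() => "loaded");

  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });

  // Whichever finishes first wins; always clear the timer afterwards.
  return Promise.race([allSettled, timeout]).finally(() => clearTimeout(timer));
}

// Possible use in a browser console or bookmarklet (assumption, untested with DEVONthink):
// waitForImages(document.images).then(() => window.print());
```

If a converter snapshots the page before a promise like this resolves, larger images that are still downloading would be missing, while small ones that had already finished loading would appear.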

Jake

MichaelHD · March 12, 2020, 9:38pm

You can try FireShot in Firefox.

jakemkc · March 13, 2020, 4:13am

Thanks for the suggestion. However, FireShot’s PDF is not text-selectable; it’s just an image in PDF format. Firefox can already save the entire page as an image without installing a plugin.

I understand that clipping will not work well on every website, especially those that require a login or are powered by complex scripts for fancy effects.

I suppose Wikipedia is not one of them, and it is a popular knowledge portal for many internet users, so related issues are worth fixing.

Jake

BLUEFROG · March 13, 2020, 10:28am

That’s strange as I have no issue in Safari. What OS are you running?

jakemkc · March 13, 2020, 7:33pm

Here are my computer settings:

Mojave 10.14.6
DT 3.0.4
Safari 12.1.2
Firefox 74
Chrome 80.0.3987.132

Is there a DT setting related to this behavior that I can try to adjust?

Jake

suavito · March 14, 2020, 4:25pm

Did you try Save as PDF to DEVONthink in the Print menu?

I save webpages from Safari as PDF all the time and have found printing to PDF to be the best way, as it a) shows a preview and b) lets you choose between the “normal” view (the result depends on the print CSS of the webpage) and the Safari Reader view. The latter is the best option for me most of the time.

freizhang · January 2, 2021, 4:15am

I have this issue on Mojave and Big Sur too. The missing image issue is frequently reproducible and quite annoying.

BLUEFROG · January 2, 2021, 3:31pm

Welcome @freizhang

What URL are you trying to capture?

freizhang · January 4, 2021, 11:58am

Sorry for the late reply. It can be reproduced with the URL @jakemkc provided.
It seems all the pictures are fine (mainly MathJax pictures, which are small) except one slightly bigger picture.
Could it be that a timeout was triggered while resolving the page's resource dependencies?

jrgetsin · January 4, 2021, 2:40pm

DT3 has had this problem from the beginning, in my experience. It is clearly a bug in the program, as it repeats in exactly the same way every time.

Images are included with Wikipedia clips only in the third iteration. Doing anything else with DT3 in between iterations means you have to start over with three iterations again.