Document not captured correctly

GJT333 · March 29, 2022, 5:44pm

I am curious why my captured PDF is showing ASCII for the Translate and lookup options in the menu.

BLUEFROG · March 29, 2022, 5:59pm

A PDF isn’t a text file, so what you see and the underlying text layer aren’t necessarily the same.

Select the PDF and choose Data > Convert > to Plain Text and inspect the text file.

How did you capture this PDF and from what URL?

GJT333 · March 29, 2022, 6:41pm

I captured it using your plug-in for Firefox. The URL is here (The Historical Unity of Russians and Ukrainians - Modern Diplomacy). The result of converting it to plain text can be seen in the screen grab.

cgrunenberg · March 30, 2022, 8:07am

This seems to be an issue with the Safari/WebKit engine and/or PDFKit. A PDF exported or printed from Safari has the same broken text layer.

GJT333 · March 30, 2022, 8:21pm

OK. Is there a known workaround? Do you guys report it to the Apple/WebKit folks?

cgrunenberg · March 31, 2022, 7:20am

We did in the past, but none of the reported issues that break the text layer were ever fixed.

GJT333 · March 31, 2022, 3:12pm

So does this mean that PDFs captured from the web on a Mac are not searchable? I guess I am trying to understand the impact of this.

cgrunenberg · March 31, 2022, 3:23pm

Usually they are, but not in the case of websites using certain languages and/or fonts.
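For anyone who wants to check whether a particular capture is affected, here is a rough Python sketch that works on the plain-text file produced by Data > Convert > to Plain Text. It is only a heuristic and not part of DEVONthink: the function names, the file name in the usage note, and the choice of "suspicious" character categories are all assumptions made for illustration. The phrase check is the more reliable of the two, since a broken text layer can also consist of ordinary-looking but wrong characters.

```python
import unicodedata

# Unicode categories treated as "suspicious" here (an assumption):
# control, private-use, unassigned, and surrogate code points.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}


def suspicious_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like extraction debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in SUSPICIOUS_CATEGORIES
    )
    return bad / len(chars)


def is_searchable(text: str, expected_phrases: list[str]) -> bool:
    """True if every phrase you can read on the page is found in the text layer.

    Whitespace is collapsed and case is folded so that line breaks in the
    exported text do not cause false negatives.
    """
    folded = " ".join(text.split()).casefold()
    return all(" ".join(p.split()).casefold() in folded for p in expected_phrases)
```

To use it, read the exported file (for example a hypothetical `capture.txt`) with `open("capture.txt", encoding="utf-8").read()` and pass in a few phrases that are visibly present in the PDF. If `is_searchable` returns `False` for text you can clearly see on the page, the text layer of that capture is probably broken.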