When I convert a pdf to rich text I always get an image of the pdf in the converted text. This is never what I want — it means I have to go through every converted text and remove the image afterwards.
I can’t find a way of turning this behaviour off, and I can’t convert to plain text because I’d lose bold, italic, etc. The Display PDF Attachments setting for rich texts has no effect.
Is there a setting to disable this behaviour? If not, does anyone know of a way of stripping out all the images automatically?
EDIT: Found a way of getting to the underlying text — open the document as a folder in Finder and look at the underlying TXT.rtf file. But that’s not really a sustainable solution. Is there a less clunky way? Thanks.
I can’t reproduce this. Please post a screencap of what you’re seeing.
Thanks Jim. Here you go…
It’s always done this (several years), as far as I remember: it’s just that I’ve started to use it more often, so it’s becoming irritating.
But I’ve just tested, and it only seems to happen on a pdf which has itself been converted from PNG, like this one:
I typically use it for getting text from screenshots, which is what I did here: screenshot of our posts > import > convert to pdf > convert to rich text > swear at embedded image…
Our free WordService includes a service to remove attachments. Another possibility is to use AppleScript:
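-- Removes every attachment (e.g. the embedded image) from the rich text in the frontmost window
tell application id "DNtp"
    tell text of think window 1
        repeat with theAttribute in attribute runs
            if exists attachment 1 of theAttribute then
                set text of theAttribute to ""
            end if
        end repeat
    end tell
end tell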
Thanks Christian — those are both very helpful!
Is it a bug, by the way, or is it expected behaviour with converted PNGs?
It’s more or less expected: the actual conversion is performed by macOS’s PDFKit framework, so the results may vary depending on the document and the macOS version.
Thanks for your help, Christian.