Copied PDF text question: Paragraph break - different behaviour, same document, copying text from PDFs (DevonThink & PDF Expert)

Hi, does anyone have any advice? I need to copy chunks of text from multiple PDFs, and I would prefer to copy straight from my DevonThink database. However, when pasting from DevonThink, the text always has a paragraph return at the end of each line. When I open the same PDF in PDF Expert and copy the same text, it doesn’t have a paragraph break at the end of each line.

Is there a setting in DevonThink I should change so there are no additional paragraph breaks, or is this just how DevonThink behaves? I’m puzzled, as I have never noticed this before (and have used DevonThink for several years). Thanks in advance. (I have looked at the guidance and previous discussions, though possibly not well enough.)

What do you see in Apple’s Preview app?

I’ve tried that by picking a PDF in DEVONthink at random (printed from a web page), selecting four lines, and then pasting into Word using all three paste methods. No paragraph pilcrows come in as you report.

Perhaps it is the PDFs you are copying from? Or the paste target file?

PDF Expert doesn’t use Apple’s PDFKit, so it’s not a 1:1 comparison.

And as @rmschne asked, where are you pasting?

Thanks all for your rapid responses.

I’m pasting into Scrivener.

I’ve also just seen that copying from a different PDF within DevonThink does NOT include end-of-line paragraph breaks. So maybe it’s the quality of a lot (!) of my PDFs.

I also did a test using the same block of text from one of the PDFs that generates line breaks: both DevonThink and Preview behave the same and generate end-of-line paragraph breaks; PDF Expert doesn’t (all pasting into Scrivener). I have also tried pasting into Word and get the same behaviour as pasting into Scrivener. So it seems I just have a massive trove of PDFs which haven’t been created in the best way? If anyone has any workarounds, please do share! I’m working with around 200 PDFs and most of them seem to generate line breaks when text is copied.

If there’s something I can do to the PDFs so they don’t have paragraph line breaks, please do still advise.

However, I have answered my own question for a semi-elegant workaround: I sometimes use CopyEm, and have instructed it to paste all copied text as ‘one line’ (‘transform it’). Doing this also gets rid of real paragraph breaks, but it’s still not a bad compromise.
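For anyone who wants to keep real paragraph breaks while still removing the end-of-line ones, here is a minimal Python sketch of the same "transform" idea. It is not how CopyEm or DEVONthink work internally. The function name `unwrap_lines` is made up for illustration, and it assumes that genuine paragraphs are separated by a blank line in the copied text. If a PDF puts a single break after every line with no blank lines at all, this will still merge everything into one paragraph.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines, keeping blank-line paragraph breaks.

    Assumption: real paragraphs are separated by at least one blank line.
    """
    # Normalise Windows and old-Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split into paragraphs wherever one or more blank lines occur.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside each paragraph, collapse every run of whitespace
    # (including the unwanted line breaks) into a single space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


if __name__ == "__main__":
    copied = "This sentence was\nbroken across lines.\n\nA second paragraph\nfollows here."
    print(unwrap_lines(copied))
```

The copied text could be run through something like this before pasting into Scrivener or Word. Words hyphenated across a line break are not rejoined by this sketch.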

Thanks

I just tried the same experiment, copying from a PDF displayed inside DEVONthink and then pasting into Scrivener with two methods (Paste, and Paste and Match Style). Neither time did I get the pilcrow end-of-paragraph marks you are reporting.

I don’t know what CopyEm is, but I can’t help but think it has a role in this mystery you have.

Thanks for testing: I have come to the conclusion that it relates to the original PDFs (how they were created perhaps). I don’t think CopyEm has a role in this - it’s a handy copy/paste utility and it’s always been in the background.

OK, I have found a solution: I used the DevonThink command ‘convert to PDF (paginated)’ on a ‘bad’ PDF, and then tested copy and paste: voila it works without the end breaks! So I shall now gradually bulk ‘convert’ all the PDFs.

Thank you all for jumping on this and helping me think it through. Still puzzled why some PDFs behave this way (not just in DevonThink but also Preview), but at least I have a solution and a workaround!

There are more flavors of PDF than Baskin-Robbins’ ice cream! :icecream: :slight_smile:
And just like ice cream, not all of them taste good :stuck_out_tongue:

Seriously, PDFs have been around a very long time and there are many ways to generate them, not all of them good and some made for very specific purposes. Even PDFs coming from one source can vary as they’re generated over the course of years.

Thank you Bluefrog, I can see that now. Thankfully DevonThink’s ‘convert’ function can clean them up and turn them all into a palatable flavour! So all good! I probably should have tried that earlier!

My pleasure :slight_smile: