When I summarize annotations as Markdown or RTF, the resulting document includes the exported highlights from the PDF, but sometimes the words within a highlight are out of order. It doesn’t happen with every highlighted segment, but it has happened in several different PDFs, and I can’t pinpoint a common denominator. I’ve included screenshots below to illustrate the problem. Does anyone else have this issue?
I get this all the time: something later in the text shows up above something earlier in the text (in the summary document). If there is a way to fix this, I’m all ears, but I believe it’s an issue with the text layer/PDFKit.
Which application did you use to highlight the text? It might also be due to the internal order of some highlight annotations. An example document would be useful.
Hi Bluefrog! Yes, I tried this and the plain text is accurate. The problematic quotes also show up correctly when I copy and paste them from a PDF into a new document.
Thank you for the document! I was able to reproduce the issue. It’s caused by internally unsorted rectangles in some markup annotations (e.g. highlights), usually created by applications that don’t use macOS’ PDFKit. The next release will fix this.
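For anyone curious what "unsorted rectangles" can mean in practice: a highlight that spans several lines is stored as a list of rectangles, one per line or fragment. If text is pulled out rectangle by rectangle in the stored order, the words come out in that order too. The sketch below shows one way such rectangles could be put into reading order. It is only an illustration, not the actual fix in the application. `Rect`, `reading_order`, and `line_tolerance` are hypothetical names, and it assumes PDF-style coordinates (origin at the bottom left) and a simple single-column layout.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Hypothetical highlight rectangle in PDF coordinates (origin bottom-left)."""
    x: float       # left edge
    y: float       # bottom edge
    width: float
    height: float


def reading_order(rects: List[Rect], line_tolerance: float = 2.0) -> List[Rect]:
    """Return rects sorted top-to-bottom, then left-to-right within a line.

    Rectangles whose vertical centres differ by at most `line_tolerance`
    from the first rectangle of a line are treated as the same line.
    """
    # Highest vertical centre first, because y grows upwards in PDF space.
    by_height = sorted(rects, key=lambda r: -(r.y + r.height / 2))

    lines = []  # list of (centre of the line's first rect, rects on that line)
    for r in by_height:
        centre = r.y + r.height / 2
        if lines and abs(lines[-1][0] - centre) <= line_tolerance:
            lines[-1][1].append(r)
        else:
            lines.append((centre, [r]))

    # Within each line, order fragments from left to right.
    return [r for _, line in lines for r in sorted(line, key=lambda r: r.x)]
```

Grouping by line before sorting on `x` means small baseline differences between fragments of the same line do not scramble their left-to-right order. Multi-column pages or rotated text would need more than this.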