PDF numbering to reflect document page number

Just wondering if there is a chance DT could show, in the PDF viewer, the page number (or even the page range) of the source, not just the PDF’s own page number.
E.g. “2/20 (55)”, with 2 being the PDF’s page number and 55 being the page number in the original source, say a journal article or ebook.

Furthermore, if this could be reflected in the summaries of annotations, that would be even better.

It’s really great that DT can summarise annotations. But when using those summaries or annotations as citations or references in reports, essays etc., having only the PDF page number can be confusing and adds friction when moving from research to writing. If I want to copy and paste a quote from an annotation into an article, I have to click the link or go back to the PDF to find the page number in the actual source. Having the source page included in the page link would make this much easier.

E.g. link in the annotation summary: “Page 4 (57)”

Is the PDF standard aware of the “actual” page number (i.e. the second of the two numbers you want shown)? If not, there is no way DT could provide it, short of scanning typical locations and hoping any number found there is a page number (correlating the result with the previous and following pages would make that more reliable, of course). This would be no small undertaking, I fear.
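The scanning heuristic described above could look roughly like this. It is a minimal sketch, assuming the header and footer text of each page has already been extracted (e.g. via OCR) into a list of strings, and assuming the printed numbers run with one constant offset from the PDF page numbers; the function name guess_page_offset is hypothetical. Every number found proposes an offset, and the offset proposed by the most pages wins, which is the “correlating with previous and following pages” part.

```python
import re
from collections import Counter
from typing import List, Optional


def guess_page_offset(margin_texts: List[str], min_votes: int = 2) -> Optional[int]:
    """Guess `offset` such that printed page number = PDF page number + offset.

    margin_texts[i] holds the header/footer text of PDF page i + 1
    (hypothetical input, e.g. produced by OCR).
    """
    votes: Counter = Counter()
    for index, text in enumerate(margin_texts):
        pdf_page = index + 1
        # Every standalone 1-4 digit number proposes an offset; the set makes
        # sure a single page votes at most once for any given offset.
        offsets = {int(num) - pdf_page for num in re.findall(r"\b\d{1,4}\b", text)}
        votes.update(offsets)
    if not votes:
        return None
    offset, count = votes.most_common(1)[0]
    # Require several pages to agree before trusting the result.
    return offset if count >= min_votes else None


margins = ["Journal of Examples 2019", "55 Journal of Examples", "Smith et al. 56", "57"]
print(guess_page_offset(margins))  # 53: PDF page 2 is printed page 55, page 4 is 57
```

Stray numbers such as the year only get a single vote and lose. The sketch ignores Roman numerals and assumes a single offset, so it breaks down when the numbering restarts or jumps (e.g. because of unnumbered inserted plates).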


EDIT: The following is, unfortunately, utter nonsense. Instead, please read

No. PDF supports shockingly little metadata, and the page number is not part of it. As far as the standard is concerned, it’s just some ink somewhere on the page.

This makes retrieving the page number from the document a difficult task, as you correctly said, even more so if it depends on the results of OCR.


I’m not sure if this is part of the standard PDF spec, but I often come across PDFs where the page number doesn’t start from 1. An example:

Sometimes those start from a larger number because it’s an excerpt from a magazine. Sometimes the first page is “cover” because it’s the book cover. Sometimes the front matter is numbered using Roman numerals, as is the case in the example linked above.

Many programs can read these correctly, e.g. PDF Expert and Firefox (I’m almost certain DT can as well, but I’m not at my desktop atm). But I don’t know of any that can edit them.

Though my previous post sounded very certain, I stand corrected. Embarrassingly, there is a way to define page labels inside a PDF, and there’s also a label property in the PDFPage class of Apple’s PDFKit framework.

The latter should make it relatively easy to add the page label to the annotation summary, especially since DT already displays the page label correctly in its preview pane.
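For the curious: the page-labels section of the PDF specification stores these labels as a /PageLabels number tree in the document catalog. Each entry gives the 0-based index of the first page of a range, plus an optional numbering style /S (D decimal, R/r upper/lower Roman, A/a upper/lower letters), an optional prefix /P and a start value /St. PDFKit resolves this for you via PDFPage’s label, so the following Python is only a simplified sketch of that mapping, assuming the number tree has already been parsed into a sorted list; LabelRange and page_label are hypothetical names.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class LabelRange(NamedTuple):
    """One (simplified) entry of a /PageLabels number tree."""
    first_index: int        # 0-based PDF page index where this range begins
    style: Optional[str]    # /S: "D", "R", "r", "A", "a", or None (prefix only)
    prefix: str = ""        # /P
    start: int = 1          # /St


def to_roman(n: int) -> str:
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
                (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
                (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def to_letters(n: int) -> str:
    # A..Z for 1..26, then AA..ZZ for 27..52, AAA.. and so on (letters repeat, not base 26).
    return chr(ord("A") + (n - 1) % 26) * ((n - 1) // 26 + 1)


def page_label(ranges: List[LabelRange], index: int) -> str:
    """Label of the 0-based page `index`; `ranges` must be sorted by first_index."""
    pos = bisect_right([r.first_index for r in ranges], index) - 1
    if pos < 0:
        return str(index + 1)  # no applicable range: fall back to the plain page number
    r = ranges[pos]
    n = r.start + (index - r.first_index)
    if r.style == "D":
        number = str(n)
    elif r.style in ("R", "r"):
        number = to_roman(n) if r.style == "R" else to_roman(n).lower()
    elif r.style in ("A", "a"):
        number = to_letters(n) if r.style == "A" else to_letters(n).lower()
    else:
        number = ""  # no style: the label is just the prefix
    return r.prefix + number


# A book with a cover, four pages of Roman-numbered front matter, then the main text.
book = [LabelRange(0, None, "Cover"), LabelRange(1, "r"), LabelRange(5, "D")]
print([page_label(book, i) for i in range(7)])
# ['Cover', 'i', 'ii', 'iii', 'iv', '1', '2']

# A journal excerpt whose second PDF page is printed page 55.
excerpt = [LabelRange(0, "D", start=54)]
print(page_label(excerpt, 1))  # 55
```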

@snakenuts: To atone for my previous erroneous comment, I could try to provide a script that fixes the annotations in the way you proposed. But since I do not use annotations at all, I have no sample summary file to work with. Could you provide such a file, preferably as a Markdown document (while it might be possible to modify an RTF file, too, it’s a lot more work)?
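Pending a real sample file, the proposed rewrite could look roughly like this Python sketch. It assumes (unverified) that the Markdown summary links each annotation back to the PDF with link text of the form “Page N”, N being the PDF page number, and that a mapping from PDF page numbers to page labels was obtained elsewhere, for example from PDFKit’s PDFPage label. The function name add_page_labels is hypothetical; the link target is left untouched, whatever it is.

```python
import re
from typing import Dict

# Assumed link format in the summary: [Page N](target), N = PDF page number.
# Simplification: targets containing ")" are not handled.
PAGE_LINK = re.compile(r"\[Page (\d+)\]\(([^)]*)\)")


def add_page_labels(markdown: str, labels: Dict[int, str]) -> str:
    """Turn [Page 4](target) into [Page 4 (57)](target) when page 4 has label 57."""
    def replace(match) -> str:
        page = int(match.group(1))
        label = labels.get(page)
        # Leave the link alone if no label is known or it equals the PDF page number.
        if label is None or label == str(page):
            return match.group(0)
        return f"[Page {page} ({label})]({match.group(2)})"

    return PAGE_LINK.sub(replace, markdown)


summary = '* [Page 4](link-to-pdf) "A quoted passage"'
print(add_page_labels(summary, {4: "57"}))
# * [Page 4 (57)](link-to-pdf) "A quoted passage"
```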


There are definitely programs that can display the source’s page range. My GoodReader app is able to do it, but I would assume that only works for more recently digitised versions.
And to be honest, I swear there was one time I noticed it in DT as well. I just can’t recall exactly.

Edit: Yes, I just confirmed that DT can display the internal page numbers of PDFs (if the metadata is there, of course), but these aren’t reflected in the annotation summaries.


How much of the document would you need?
I can upload one, for sure. Would I need to attach the PDF as well?

As promised, I wrote a first iteration of a script: