PDF+Text vs. PDF-Document

I have already searched the forum, but I only found older posts, so I’m hoping for a brief clarification here.
When I scan a document with my ScanSnap directly on the MacBook and send it to DT, it automatically creates a PDF+Text document. If I scan with my mobile phone and put the file into the Inbox (unfortunately I don’t know of any automation for that either), I “only” get a PDF document.

The PDF document in my example is 1.4 MB in size. The PDF+Text, on the other hand, is only 80 KB.
Judging by the size alone, the quality of the PDF+Text is of course significantly worse. But what other differences are there?
I can select the text in both PDFs, and the DT search works perfectly on both.

Apart from quality, the only difference I see is that in the sidebar on the right, the words are counted for “PDF+Text” but not for “PDF document”.

So where is the real difference?
Unfortunately, I did not find anything in the manual either.

Which do you use most, and why?

Answers in German are welcome as well 🙂

A PDF+Text document has a text layer that DEVONthink has indexed, so full-text search and the Concordance can use the indexed text. A PDF document doesn’t (e.g. if the PDF contains only graphics/images).

Wow, I love this forum and how quickly there are always answers here, thank you!

If that isn’t a full-text search on the PDF document, what is it? I can still select the text and search it normally. When I type a word into the search field at the top of DT, it also finds results in plain PDF documents, not just in PDF+Text ones.
Or does it have to scan the text again each time, whereas PDF+Text has an index that speeds up the search?

Sorry for my dumb questions.
I would have expected a PDF+Text file to be larger than a PDF document.
Or is compression always applied when converting to PDF+Text?

The only real difference is whether DEVONthink has indexed the text layer. This has no impact on file size, and selecting text might still be possible thanks to the Live Text feature of recent macOS releases. Anything beyond that is impossible to tell without a copy of the document.

Okay, thanks. But when I convert a PDF document, I just right-click → convert to → PDF+Text.
I then get a duplicate document (PDF+Text), and its file size is smaller than that of the original PDF document:

2023-02-24_e.on (PDF-Document).pdf (3.0 MB)
2023-02-24_e.on (PDF+Text).pdf (1007.9 KB)

An OCR’d PDF is never going to be the same size as the original: in some cases it is larger, in others smaller. Compression can also be enabled or disabled in Preferences > OCR.

Also, enabling Original Document: Move To Trash in the same preferences will move the un-OCR’d original to the database’s Trash.

Thanks again 🙂
Compression is disabled, which is why I’m puzzled by the smaller size of the PDF+Text.

I wasn’t aware that Original Document: Move To Trash also applies to individually converted documents; I thought it only affected incoming/imported documents.
I’ll activate it, thanks 🙂

You’re welcome 🙂
Each page is processed and saved individually, then collated back into a finished file.

PS: A smaller (or larger) file isn’t indicative of its quality.
