Merging PDF and PDF/Text Documents

rkaplan · December 20, 2019, 4:27pm

If I have a set of documents, some of which are PDF and some PDF/Text, and I choose to Merge them, the resulting merged document is shown as a PDF/Text document.

I would suggest that it would be better to call it a PDF document because the “PDF/Text” designation is misleading - in reality it is only partially OCR'd, and calling it a PDF would alert me to run OCR and convert it to a fully text-searchable document.

BLUEFROG · December 20, 2019, 4:37pm

The file will report as PDF+Text since there is a valid text layer detected in the merged file. And there are plenty of PDFs that have a mix of text and pages of images requiring no OCR.

I would not suggest doing OCR post-merge. It should be done before merging the files.

rkaplan · December 20, 2019, 4:59pm

That could really slow down the workflow, since OCR takes time, especially on long files. It is best left for overnight rather than done immediately on receipt of documents, when the goal is to organize them (often meaning combine them).

What is the risk or downside of doing OCR post-merge?