DT3 doesn't see text inside PDF

I can read such files in DT3, but the word count shows 0. When I select text and copy it, nothing is copied. Pasting inserts only some blank space, as if the characters were invisible… Other apps, like PDF Expert or Adobe Acrobat, don’t have such problems.
What’s wrong?

Here is a page of such text: Page 10.pdf (96.9 KB)
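To confirm that a copied selection really contains nothing usable, rather than relying on how the paste looks, the pasted string can be checked for visible characters. The following is a minimal sketch, not anything DT3 itself does; the function name `visible_words` is made up for illustration, and it assumes the pasted text is available as a Python string.

```python
import unicodedata


def visible_words(text: str) -> list[str]:
    """Split text into words after blanking out invisible characters.

    Characters in Unicode categories C* (control, format, private use,
    unassigned) and Z* (spaces and separators) are treated as blanks.
    """
    cleaned = "".join(
        " " if unicodedata.category(ch)[0] in "CZ" else ch
        for ch in text
    )
    return cleaned.split()


# Example: a paste made only of zero-width, no-break and private-use characters.
pasted = "\u200b \u00a0\ue000\t"
print(len(visible_words(pasted)))
```

An empty result for text that is clearly readable on the page suggests the PDF has glyphs but no usable character mapping behind them.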

Other apps, like PDF Expert or Adobe Acrobat, don’t have such problems.

These apps don’t use PDFKit, so they’re not a reliable comparator.

You can open it in Preview and save over itself with the Create Generic PDFX-3 Document option enabled.

It should then be imported as PDF+Text and be searchable.

Thank you, Jim! It worked fine!

You’re welcome :slight_smile:

Does anyone know PDF types well enough to explain the difference between PDF/X, which is for printing, and PDF/A, which is for archiving? Both seem to have a lot of the same specifications. I ask because Preview doesn’t make a PDF/A and I don’t want to spend money on an app that will (even if I did, Preview is easy to use).

Adobe knows…

About PDF/X, PDF/E, and PDF/A standards

PDF/X, PDF/E, and PDF/A standards are defined by the International Organization for Standardization (ISO). PDF/X standards apply to graphic content exchange; PDF/E standards apply to the interactive exchange of engineering documents; PDF/A standards apply to long-term archiving of electronic documents. During PDF conversion, the file that is being processed is checked against the specified standard. If the PDF does not meet the selected ISO standard, you are prompted to either cancel the conversion or create a non-compliant file.

The most widely used standards for a print publishing workflow are several PDF/X formats: PDF/X‑1a, PDF/X‑3, and (in 2008) PDF/X‑4. The most widely used standards for PDF archiving are PDF/A‑1a and PDF/A‑1b (for less stringent requirements). Currently, the only version of PDF/E is PDF/E-1.

For more information on PDF/X, PDF/E, and PDF/A, see the ISO and AIIM websites.

For details on creating and working with PDF/A files, see www.adobe.com/go/learn_acr_pdfa_en.

Thanks for that, it helps. But it doesn’t actually answer the question of the differences or similarities. For example, both formats must have no JavaScript, must have embedded fonts, no video or audio, and so on. There are a lot of similarities between the formats, so I’m curious as to whether the PDF/X format, which can be created from within Preview, is similar enough to PDF/A to be used for a “poor man’s” archival purpose.
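One practical difference is how a file declares which standard it claims to follow: PDF/A files carry `pdfaid:part` and `pdfaid:conformance` in their XMP metadata, while PDF/X files carry a `GTS_PDFXVersion` entry (in the Info dictionary for PDF/X-1a and PDF/X-3). Below is a rough sketch that looks for those markers in the raw bytes. The function name `claimed_standards` is hypothetical, it assumes the metadata is stored uncompressed, and it only reports what the file claims, not whether the file actually conforms.

```python
import re


def claimed_standards(pdf_bytes: bytes) -> dict:
    """Report the PDF/A and PDF/X identifiers found in raw PDF bytes.

    Heuristic only: metadata inside compressed streams is not seen.
    """
    found = {"pdfa": None, "pdfx": None}

    # PDF/A: XMP holds pdfaid:part and usually pdfaid:conformance,
    # written either as elements or as attributes.
    part = re.search(rb'pdfaid:part(?:>|=")\s*(\d)', pdf_bytes)
    conf = re.search(rb'pdfaid:conformance(?:>|=")\s*([A-Za-z])', pdf_bytes)
    if part:
        level = conf.group(1).decode().lower() if conf else ""
        found["pdfa"] = f"PDF/A-{part.group(1).decode()}{level}"

    # PDF/X: GTS_PDFXVersion as a string literal in the Info dictionary,
    # or as an XMP element/attribute.
    pdfx = re.search(
        rb'GTS_PDFXVersion\s*(?:\(|>|=")\s*(PDF/X[^)<"]*)', pdf_bytes
    )
    if pdfx:
        found["pdfx"] = pdfx.group(1).decode("latin-1").strip()

    return found


# Usage, with a hypothetical file name:
# with open("Page 10.pdf", "rb") as fh:
#     print(claimed_standards(fh.read()))
```

Running this on a file before and after saving it from Preview would show whether the saved copy declares PDF/X, PDF/A, or neither.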

Any PDF can be used for a “poor man’s archival purpose”.

Thanks @BLUEFROG for that handy trick. I hoped it might work for my bank statements, which are delivered electronically in a peculiar PDF format that DT doesn’t recognise as PDF+text. The text is selectable, but when copied it yields only a few garbage characters. Anyway the PDFX-3 approach didn’t work so I will revert to using OCR. But if you have any better ideas for recalcitrant PDFs they would be welcome.

(Preview.app shows the PDF producer as the Columbus suite from Macro4.)

You’re welcome. We are always discussing things in here, so if we discover something more robust, we’ll certainly let everyone know. Cheers!

Not for someone who wants long-term archiving and hopes to open the PDF 20+ years from now. That’s why the PDF/A format exists. But short of paying for an app that can create PDF/A, I thought the PDF/X format might be a way to create a long-term archival PDF with native PS tools.

After I saved the file over itself with the Create Generic PDFX-3 Document option enabled, all the text in the PDF was gone. Did I do something wrong?

Welcome @luklau88

This is likely an issue with Apple’s PDFKit and CID fonts (usually Asian character sets). Where are you seeing the text is missing?
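To gather some evidence for the CID-font guess, one can count font-related markers in the raw file. This is only a heuristic sketch with a made-up function name, `font_hints`; it assumes the font dictionaries are not stored inside compressed object streams, and it says nothing about what PDFKit does internally. CID fonts with no `/ToUnicode` map are a common reason that text can be displayed but not copied or searched.

```python
import re


def font_hints(pdf_bytes: bytes) -> dict:
    """Count font markers that commonly relate to text-extraction trouble."""
    return {
        # Descendant CID fonts (CIDFontType0 or CIDFontType2).
        "cid_fonts": len(re.findall(rb"/CIDFontType[02]\b", pdf_bytes)),
        # Composite (Type0) fonts that wrap CID fonts.
        "type0_fonts": len(re.findall(rb"/Subtype\s*/Type0\b", pdf_bytes)),
        # Maps from character codes to Unicode, needed for copying text.
        "to_unicode_maps": len(re.findall(rb"/ToUnicode\b", pdf_bytes)),
    }


# Usage, with a hypothetical file name:
# with open("statement.pdf", "rb") as fh:
#     print(font_hints(fh.read()))
```

A file reporting several CID fonts and zero `/ToUnicode` maps would be consistent with the explanation above, though all-zero counts may simply mean the dictionaries are compressed.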