PDF is not a PDF document?

Dear all, today I noticed something really strange. Some documents (bills sent as PDFs from Thomann, to be precise :-)) can be imported into DEVONthink, but they don’t include any text. Everything is there when I open the PDF in Preview. However, DEVONthink cannot access the text, which means I can’t search for it :frowning:
I tried to OCR them within DEVONthink, but these menu items are greyed out. DEVONthink recognizes these documents as “PDF documents” (vs. PDF or PDF with text).
Any idea what I can do with these files?

Were any issues logged to Window > Log? Does it work after reimporting them?

I found a similar problem with PDF and PDF+Text, and the same strange behavior in DT3. When I get an invoice from Telekom and open it with Adobe Acrobat, there is no problem with the text: I’m able to search for text in the document.
The file size is about 130 kB. Importing into DT3 leaves the file size as is, but the file type is without text. It is possible to select text with the cursor, but copying or searching is not possible in DT3. And to top it off: when the file is exported to the Mac file system, outside of DT3, the file size explodes by a factor of 10. When I run OCR in DT3, everything seems okay except the file size, which increases by a factor of 20.
It seems like a miracle to me. Is there anybody who can explain this behavior?

Maybe the PDF document doesn’t permit copying. Does this work in Preview.app?

In the Preview app it is the same as with DT3. I can select the text, but copying does not work. I looked at the permissions; they state that I have all rights to copy and so on. To my surprise, the file size went from 135 KB up to 2.6 MB, just by opening the file in Preview.

It’s possible that the PDF document is not compatible with macOS’ Quartz engine and PDFKit framework. This would explain why it’s not working in both DEVONthink and Preview.

My Mac runs macOS 10.11.6 and I use Adobe Acrobat 9. With this version of Acrobat everything works fine: the file size stays small and it is possible to copy text from the PDF document. Searching the text also works fine. I also ran a test with PDFpen Pro: there was no text and OCR was not possible.
Is this progress? With every step we lose something proven.

You can’t reliably compare the behavior of Acrobat with other PDF applications. Adobe created the PDF standard and Acrobat can do many things other apps can’t.

Okay, I agree with you. But what about the blow-up of the file (20 times) when I export it from DT to the file system?

How are you exporting the file?

Also note that Acrobat can support different kinds of compression, some quite aggressive. It’s likely a difference in compression, or potentially a lack of compression in PDFKit’s output.
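One rough way to check the compression theory is to compare which stream filters the original and the exported PDF declare. The following is a minimal sketch, not a real PDF parser: it only scans the raw bytes for `/Filter` entries (for example `/FlateDecode` or `/DCTDecode`) and for `stream` keywords. The function names are made up for illustration, and dictionaries stored inside compressed object streams will not be seen, so treat the counts as a hint only.

```python
import re
from collections import Counter

# Matches "/Filter /Name" as well as "/Filter [/A /B]" in raw PDF bytes.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")
# Matches the "stream" keyword, but not the tail of "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n")


def count_stream_filters(data: bytes) -> Counter:
    """Count how often each filter name is declared (heuristic byte scan)."""
    counts = Counter()
    for match in FILTER_RE.finditer(data):
        for name in NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts


def count_streams(data: bytes) -> int:
    """Count 'stream' keywords, i.e. roughly the number of stream objects."""
    return len(STREAM_RE.findall(data))


def summarize(path: str) -> dict:
    """Return file size, stream count and declared filters for one PDF file."""
    with open(path, "rb") as handle:
        data = handle.read()
    return {
        "size_bytes": len(data),
        "streams": count_streams(data),
        "filters": dict(count_stream_filters(data)),
    }
```

Calling `summarize()` on the original invoice and on the exported copy and comparing the two results may show whether the larger file declares fewer filters, or a different kind, relative to its number of streams. That would be consistent with a difference in compression, but it does not prove it.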

Select the file and go to the menu Storage - Export as document. The result is a 2.6 MB file, starting from a 135 kB file. In my opinion, the export is intended as an easy method to make files available to others, isn’t it?

That export should leave the file intact.
Can you ZIP and post the PDF you are referring to?

One invoice, for example: Rechnung 200812.pdf.zip (43.8 KB)

In general there is no problem with self-produced scans and OCR with Acrobat. Invoices from third parties sent as Acrobat files show the described behavior.

Thanks for the file!
Interesting - I see the same thing here.
@cgrunenberg may have some insight into why this may happen.

These export commands (Word, PDF, RTF etc.) support multiple selections and merge the items into a new document for exporting. For simply exporting a file, drag & drop or sharing is usually recommended.