PDF processing in Sierra

FYI-
appleinsider.com/articles/17/01/ … -data-loss

Note the most recent issue:
“In the most recent update to Sierra, users that edit PDFs in Preview are also discovering that saved OCR text layers generated by any utility relying on the ABBYY FineReader engine, including ScanSnap and Doxie, are stripped out by the Apple utility after a save.”

Looks like Apple really dropped the ball on this one. Many thanks to the DEVON team for attempting to manage this. I’m sure it’s much worse than we see from this side.
I suggest users send their complaints directly to Apple.

Edited to add:
tidbits.com/article/16966

"…the recently released macOS 10.12.2 has introduced a serious new bug related to PDFKit. Brooks Duncan of the DocumentSnap site published a note from one of his readers that warns that the OCR text layer added to scanned PDFs by Fujitsu’s ScanSnap software will be deleted if you edit the PDF in Preview. Eric Bönisch-Volkmann confirmed this, saying ruefully:

10.12.2 fixes a few bugs but kills the OCR text layer in PDFs. We worked around the earlier bugs in DEVONthink 2.9.8 and will address 10.12.2’s new problems in the upcoming 2.9.9. But yes, as soon as you edit a PDF in Preview the text layer is gone. Our customers are delighted."

ken

Thank you, Ken. Very useful info.

For bug reporting to Apple, non-developers can use this site:

bugreport.apple.com

or you can use

discussions.apple.com

Personally, I find submitting bug reports to the Discussions site is not useful – it’s not certain that Apple staff ever read things reported there.

Thanks, ken, and yes, it has been a very trying time for Development: a workload that should not be on our shoulders. Thanks for the note and the support.

Could this potentially have anything to do with the issue I’m seeing?

When I scan directly to DT Pro, I am not able to convert the result to a searchable PDF.

When I scan to a file and then import that file, I can convert it to a searchable PDF.

Running latest versions of all programs and 10.12.3 Beta.

My apologies if this is an inappropriate question for this thread.

It’s not inappropriate, and it is the cause of your problem.

Do note: macOS 10.12.3 is a beta release and should NOT be used in mission-critical or production environments, nor should it be expected to behave reliably. Betas are a moving target code-wise. We code for public releases, as the code can change substantially during the beta process. Again, you should NOT run beta operating systems on a machine you need to be working.

THIS IS EXTREMELY ANNOYING!

I just switched back to DTP Office after a two-year pause, and I import all my ScanSnap scans into DT.

Now I have realized that OCR does not work with macOS Sierra 10.12.2, not even manually. I know the cause is Apple, and FineReader OCR for ScanSnap does not work either. But since this is key functionality for most of us, I think it should be mentioned somewhere other than just the support forum.

Are there workarounds?

Use JPEG/TIFF/PNG for your raw scans instead of PDF and disable the option to enter metadata after OCR and it should work again.

I just installed the latest macOS beta and it appears to have fixed the PDF issue I outlined above. I’m able both to scan to DTP and get a PDF+Text file and to convert previous PDF-only scans.

Since I’m running 2.9.8 of DTP and the latest versions of ScanSnap software, the only variable I can isolate is the latest beta from Apple.

I can confirm what jeffg says.
I was able to convert scanned documents into PDF + text that I could not convert with 10.12.3 Beta 1 or version 10.12.2.

Promising, although perhaps Apple will break it again with beta 4.

I am still running macOS 10.12.1. Does that version have the PDF problem mentioned above?

No, not this particular issue. That broke in 10.12.2.

Great, thanks for your reply!