Many PDFs that we think don’t contain searchable text actually do; the text just doesn’t show up in Apple Preview because Apple’s PDFKit API is not very good. (Did they fix this in Mojave?) You can confirm whether a PDF already has text by opening it in Adobe Acrobat Reader and searching for any text string you can see on the page.
The reason to test this is so that you don’t perform a long re-OCR that balloons your file size.
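If you would rather script this check than open Acrobat Reader, the following is a minimal Python sketch. It is an assumption, not something taken from this thread: it shells out to Ghostscript's `txtwrite` device to dump whatever text the file carries. The helper names (`extract_text_command`, `looks_like_text`, `has_text_layer`) and the 20-character threshold are made up for illustration.

```python
import subprocess


def extract_text_command(pdf_path, gs_binary="gs"):
    """Build a Ghostscript command that writes the PDF's text to stdout."""
    return [
        gs_binary,
        "-q",                 # quiet: suppress the startup banner
        "-dNOPAUSE",          # do not pause between pages
        "-dBATCH",            # exit when the file has been processed
        "-sDEVICE=txtwrite",  # Ghostscript's text-extraction device
        "-sOutputFile=-",     # "-" sends the output to standard output
        pdf_path,
    ]


def looks_like_text(extracted, min_chars=20):
    """True if the output has at least min_chars non-whitespace characters.

    The threshold is an arbitrary assumption; tune it for your documents.
    """
    return sum(1 for ch in extracted if not ch.isspace()) >= min_chars


def has_text_layer(pdf_path, gs_binary="gs"):
    """Run Ghostscript and report whether the PDF appears to contain text."""
    result = subprocess.run(
        extract_text_command(pdf_path, gs_binary),
        capture_output=True,
        text=True,
        errors="replace",  # tolerate odd bytes in the extracted text
        check=True,
    )
    return looks_like_text(result.stdout)
```

Calling `has_text_layer("scan.pdf")` (a hypothetical file name) should return `True` for a PDF that already has a text layer, in which case a re-OCR is probably unnecessary.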
If you find such a PDF, you can use a simple terminal command with a utility called Ghostscript to make it readable to Apple Preview.
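The exact command is not quoted in this thread. One common invocation that rewrites a PDF through Ghostscript's `pdfwrite` device is `gs -o out.pdf -sDEVICE=pdfwrite in.pdf`, and the Python sketch below wraps it. Treat it as an assumption about what was meant; the function names and file names are hypothetical. The writability check reflects the problem reported later in the thread, where Ghostscript could only write the output after the files were moved to the Documents folder.

```python
import os
import subprocess


def rewrite_command(src, dst, gs_binary="gs"):
    """Build the Ghostscript command that re-writes src into dst."""
    # "-o" sets the output file and implies batch mode without pausing.
    return [gs_binary, "-o", dst, "-sDEVICE=pdfwrite", src]


def rewrite_pdf(src, dst, gs_binary="gs"):
    """Re-write a PDF with Ghostscript's pdfwrite device and return dst."""
    # Never overwrite the input while Ghostscript is still reading it.
    if os.path.abspath(src) == os.path.abspath(dst):
        raise ValueError("source and destination must differ")
    # Fail early if the output folder is not writable.
    out_dir = os.path.dirname(os.path.abspath(dst))
    if not os.access(out_dir, os.W_OK):
        raise PermissionError(f"cannot write to {out_dir}")
    subprocess.run(rewrite_command(src, dst, gs_binary), check=True)
    return dst
```

For example, `rewrite_pdf("original.pdf", "readable.pdf")` writes a new file next to the original and leaves the original untouched, so you can compare the two in Preview before deleting anything.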
PS: The script Convert > Convert PDF with Quartz, available in Pro and Pro Office’s Script menu > More Scripts will allow you to do the same without Preview.
I’m running DTPO 2.10.2 (latest version as of this date) and there is no Convert PDF with Quartz script in More Scripts. The available Convert to PDF does not make the document searchable.
Can anyone please provide a link to the Convert PDF with Quartz script?
If I use the Convert to Searchable PDF function in DTPO, it performs an unnecessary OCR and destroys the table of contents and any other internal links.
Saving as PDF-X3 from Preview does not render the PDF as searchable.
EDITED: I got Ghostscript to work by moving the files into the Documents folder, where GS could write the file without trouble. The resulting file has a working TOC and is searchable in Preview and other apps that use the Apple PDF frameworks.
This is the method that finally worked for me, thanks. I’m running macOS 10.14.1 and couldn’t get any of the Preview-based methods to work. FWIW, you can get Ghostscript (“gs”) without having to use Homebrew. I got it by downloading MacTeX (pages.uoregon.edu/koch/), although I didn’t try it.
Once installed, open a terminal window and verify it’s installed by typing “which gs” or “man gs”.
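The same check can be done from a script. This is a small Python sketch of what `which gs` does, using the standard library's `shutil.which`; the function name `find_ghostscript` is hypothetical.

```python
import shutil


def find_ghostscript(name="gs"):
    """Return the full path of the Ghostscript binary, or None if it is not on PATH."""
    return shutil.which(name)


path = find_ghostscript()
if path is None:
    print("Ghostscript not found on PATH")
else:
    print(f"Ghostscript found at {path}")
```

If this reports that Ghostscript is missing right after an install, opening a new terminal window so that the updated PATH is picked up is worth trying first.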
These PDF scripts seem to have disappeared in DEVONthink 3 Pro. Is there an alternative?
I would like to apply a Black & White Quartz filter to an OCR’d PDF from within DEVONthink.
During OCR conversion, PDFs get too big because the pages are stored in grayscale.
Thanks for the quick response! It takes a few clicks to select and apply the B/W Quartz filter, but it works.
How about supporting this B/W conversion natively in DEVONthink? I read a lot of threads here from people concerned about big file sizes after OCRing.
I could imagine a context menu entry, such as “OCR → to searchable PDF (Black/White)”.
You’re welcome, and yeah, it’s an Automator workflow under the hood.
Thanks for the suggestion. Development would have to assess this.