For PDFs with text that is searchable in Acrobat but not in Preview.

Many PDFs that we think don’t contain searchable text actually do. The text just doesn’t show up in Apple Preview because Apple’s PDFKit API is not very good (did they fix this in Mojave?). You can confirm whether a PDF already has text by opening it in Adobe Acrobat Reader and searching for any text string you can see on the page.

The reason to test this is so that you don’t perform a long re-OCR that balloons your file size.
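If you would rather check from the terminal than open Acrobat, something like the sketch below may work. It assumes Ghostscript is already installed and uses its txtwrite device to dump whatever text layer the PDF carries. The helper names has_text and pdf_has_text_layer are made up for this example, and I have not tested it against every kind of PDF.

```shell
# has_text: succeeds if stdin contains at least one non-whitespace character.
has_text() {
  [ -n "$(tr -d '[:space:]')" ]
}

# pdf_has_text_layer (hypothetical helper): extract text with Ghostscript's
# txtwrite device and succeed only if something other than whitespace comes out.
pdf_has_text_layer() {
  gs -q -dNOPAUSE -dBATCH -sDEVICE=txtwrite -sOutputFile=- "$1" 2>/dev/null | has_text
}

# Usage:
#   if pdf_has_text_layer "/path/sourcefile.pdf"; then
#     echo "text layer found, no OCR needed"
#   fi
```

If this reports a text layer but Preview still can’t search the file, the PDF is a candidate for the Ghostscript rewrite rather than a fresh OCR pass.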

If you find such a PDF, you can use a simple Terminal command with a utility called Ghostscript to make it readable to Apple Preview.

Here is the command:

gs -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile="/path/outputfile.pdf" "/path/sourcefile.pdf"
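If you run this often, a small wrapper can save some typing. The sketch below is only one way to do it: the function name rewrite_pdf is made up, and the -q and -dBATCH switches are my additions to the command above (-q quiets the startup banner, -dBATCH makes gs exit when it finishes instead of waiting at its GS> prompt). It also refuses to write the output over the source file.

```shell
# rewrite_pdf SRC DST (hypothetical helper): pass SRC through Ghostscript's
# pdfwrite device and save the result as DST.
rewrite_pdf() {
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  if [ "$1" = "$2" ]; then
    echo "refusing to overwrite the source file" >&2
    return 1
  fi
  gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$2" "$1"
}

# Usage:
#   rewrite_pdf "/path/sourcefile.pdf" "/path/outputfile.pdf"
```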

Here is a guide on installing Ghostscript on your Mac via Homebrew.

macappstore.org/ghostscript/

Good luck
Sam

If you resave a file as PDF/X-3 from Preview, it should also resolve the issue.

PS: The script Convert > Convert PDF with Quartz, available in Pro and Pro Office’s Script menu > More Scripts, will allow you to do the same without Preview.

Thanks - I tried saving as PDF/X-3 and saving with Quartz in Preview, and neither seemed to work for me. I’ll try again sometime.

I’m running DTPO 2.10.2 (latest version as of this date) and there is no Convert PDF with Quartz script in More Scripts. The available Convert to PDF does not make the document searchable.

Can anyone please provide a link to the Convert to PDF with Quartz script?

If I use the Convert to Searchable PDF function in DTPO, it performs an unnecessary OCR and destroys the table of contents and any other internal links.

Saving as PDF/X-3 from Preview does not render the PDF as searchable.

EDITED: I got Ghostscript to work by moving the files into the Documents folder, where GS could write the file without trouble. The resulting file has a working TOC and is searchable in Preview and other apps that use the Apple PDF frameworks.
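For anyone hitting the same write problem, one option is to send the output to a folder you know you can write to and leave the source where it is. This is only a sketch: the function name target_path and the folder ~/Documents/gs-out are placeholders I picked, not anything Ghostscript requires.

```shell
# target_path OUT_DIR SRC (hypothetical helper): print the path SRC would
# get inside OUT_DIR, keeping the original file name.
target_path() {
  printf '%s/%s\n' "$1" "$(basename "$2")"
}

# Usage (needs Ghostscript installed; the output folder is an assumption):
#   out_dir="$HOME/Documents/gs-out"
#   mkdir -p "$out_dir"
#   src="/path/sourcefile.pdf"
#   gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
#      -sOutputFile="$(target_path "$out_dir" "$src")" "$src"
```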

It’s called Filter PDF.

This is the method that finally worked for me, thanks. I’m running macOS 10.14.1 and couldn’t get any of the Preview-based methods to work. FWIW, you can get Ghostscript (“gs”) without having to use Homebrew. I got it by downloading MacTeX (pages.uoregon.edu/koch/), although I didn’t try it.

Once installed, open a terminal window and verify it’s installed by typing “which gs” or “man gs”.
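If you want a check that also reports the version, something along these lines should do. The helper name have_cmd is made up for this example; “command -v” is the POSIX way to ask the shell where a command lives, and “gs --version” prints the installed Ghostscript version.

```shell
# have_cmd NAME (hypothetical helper): succeeds if NAME is on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd gs; then
  echo "Ghostscript $(gs --version) found at $(command -v gs)"
else
  echo "gs was not found on your PATH" >&2
fi
```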

Hope this helps too!

These PDF scripts seem to have disappeared in DEVONthink 3 Pro. Is there an alternative?

I would like to apply a Black & White Quartz filter to an OCR’d PDF from within DEVONthink.
During OCR conversion, PDFs get too big because they are saved in grayscale.

That is from quite some time ago. Here is a version you can test.

  1. Unzip the attached file.
  2. In the Finder, press Command-Shift-G and paste: ~/Library/Application Scripts/com.devon-technologies.think3.
  3. Drag and drop the unzipped workflow from Step 1 into the Menu folder in this window (or a desired subfolder).
  4. Relaunch DEVONthink and you can access the script from the Script icon menu (and the subfolder, if chosen).

DT Quartz Filter.workflow.zip (187.8 KB)
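If you prefer the terminal, steps 1 to 3 can be roughly sketched as below. The function name install_menu_script is made up, and I am assuming the zip unpacks to a bundle called “DT Quartz Filter.workflow”; check the actual name after unzipping.

```shell
# install_menu_script BUNDLE MENU_DIR (hypothetical helper): create MENU_DIR
# if it does not exist yet and copy the script or workflow bundle into it.
install_menu_script() {
  mkdir -p "$2" && cp -R "$1" "$2/"
}

# Usage, run from the folder that holds the download:
#   unzip "DT Quartz Filter.workflow.zip"
#   install_menu_script "DT Quartz Filter.workflow" \
#     "$HOME/Library/Application Scripts/com.devon-technologies.think3/Menu"
```

You still need to relaunch DEVONthink afterwards, as in step 4.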

Let me know how it behaves.

Thanks for the quick response! It takes a few clicks to select and apply the B/W Quartz filter, but it works.

How about supporting this B/W conversion natively in DEVONthink? I read a lot of threads here from people concerned about big file sizes after OCRing.
I could imagine a context menu entry, such as “OCR → to searchable PDF (Black/White)”.

You’re welcome and yeah, it’s an Automator workflow under-the-hood.

Thanks for the suggestion. Development would have to assess this.

Cheers!