Search in PDF?

roufianos · March 6, 2009, 9:01pm

Hi,

I cannot search for a word inside PDF documents. I enter the word in the search field, but my PDF files are ignored.
Spotlight indexing is enabled.
Any ideas?

Thanks!

Rouf

Bill_DeVille · March 6, 2009, 10:28pm

Try your search in the full Search window (Tools > Search), where it’s easy to inspect all the settings for the query. Were you searching in the correct database where your PDFs are stored, for example? Was the query set for Name, although you wanted a Content search?

I do all my searches in the Search window, not only because I can inspect the settings more easily, but because there are more options available in the Search window. And I don’t lose my place in the document I was reading.

If you haven’t tried the Lookup command, give that a try. Select a word or string and press Command-/. The Search window will open with the selection entered in the search field. You can then modify the query using the operators and search syntax, if necessary. For example, if the selected string was John Doe, but you don’t want to find anything about deer, enclose the string within quotation marks.
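To illustrate why the quotation marks matter, here is a minimal Python sketch of the difference between matching the words independently and matching them as an exact phrase. It is not DEVONthink's search code; the function names and the two sample documents are made up for the example, and it assumes an unquoted query simply requires each word to appear somewhere in the text.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def matches_all_terms(text, query):
    """True if every query word occurs somewhere in the text, in any order."""
    words = set(tokenize(text))
    return all(term in words for term in tokenize(query))

def matches_phrase(text, phrase):
    """True if the query words occur consecutively, in the given order."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )

# Hypothetical documents, for illustration only.
docs = {
    "memo": "John Doe signed the contract on Friday.",
    "field notes": "John saw a doe near the river.",
}

for name, text in docs.items():
    print(name,
          matches_all_terms(text, "John Doe"),
          matches_phrase(text, "John Doe"))
```

Both documents pass the word-by-word test, but only the memo passes the phrase test, which is the effect of enclosing the string in quotation marks.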

If you have verified that your query is formulated properly and is looking in the correct database where a PDF resides, yet it doesn’t show up in the search results, examine the Kind of that PDF. If Kind = PDF, it’s not searchable; you can see words in the image layer, but the PDF doesn’t contain searchable text. If the Kind = PDF+Text, it does have a searchable text layer. OCR will convert an image-only PDF to a searchable PDF.

roufianos · March 6, 2009, 10:52pm

Thanks for your reply.

Yes, I tried the Search window as well.

All parameters are ok. The file type is PDF+Text.
Nothing shows up when I search for a word inside the PDF…

eboehnisch · March 9, 2009, 5:52pm

Can you search within that PDF when you open it externally in Preview?

acl · March 10, 2009, 5:13pm

It could be that PDFKit is having trouble with the text. Try copying the text from the PDF to the clipboard (by selecting it) and then pasting it somewhere. To take a completely random PDF file on my hard disk, I get
Fractionalizationindimerizedgrapheneandgraphenebilayer
i.e., all spaces are gone. That may well be what is happening in your case (in my case, all PDFs from the same source have this problem, but if I open them with Acrobat I can copy and paste with no problems).
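If you want to check a batch of pasted or exported text for this symptom, a rough heuristic is to look for implausibly long "words" or a very low share of whitespace. The sketch below is only that: the function name and the thresholds (30 characters, 5% whitespace, a 40-character minimum length) are arbitrary assumptions, not anything PDFKit or DEVONthink provides.

```python
def looks_like_missing_spaces(text, max_word_len=30, min_space_ratio=0.05):
    """Heuristic check for extracted text whose word spaces were lost.

    Thresholds are illustrative guesses and may need tuning per language.
    """
    stripped = text.strip()
    if not stripped:
        # Nothing was extracted at all: that is a different problem
        # (no text layer), not missing spaces.
        return False
    tokens = stripped.split()
    longest = max(len(t) for t in tokens)
    space_ratio = sum(ch.isspace() for ch in stripped) / len(stripped)
    too_long = longest > max_word_len
    # Only trust the ratio on text long enough to be meaningful.
    too_few_spaces = len(stripped) >= 40 and space_ratio < min_space_ratio
    return too_long or too_few_spaces

pasted = "Fractionalizationindimerizedgrapheneandgraphenebilayer"
expected = "Fractionalization in dimerized graphene and graphene bilayer"

print(looks_like_missing_spaces(pasted))    # True
print(looks_like_missing_spaces(expected))  # False
```

A word search for "graphene" cannot match the first string as a whole word, which would explain why such a PDF is skipped even though its Kind is PDF+Text.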

OK, take a look at this discussion:
betalogue.com/2007/07/13/mac … t-a-space/
(to save time, you can just read the main text and the comments by John Calhoun if you want).

Is this the kind of thing you are seeing? One solution would be to export the PDFs to images and then OCR those (!)… With DTPO this results in huge files, but I also have Acrobat Pro, which is much better in this regard. However, this way you lose hyperlinks (and scripting Acrobat Pro is very unpleasant too).

annard · March 10, 2009, 5:38pm

You can also just try “Convert > to Searchable PDF”. And with ABBYY the files shouldn’t be so big anymore.

acl · March 10, 2009, 11:32pm

Indeed, that works and is less hassle. And yes, files are smaller now than before. But, for example, I just tried it on a 4-page, 125 KB text+image PDF: using the ABBYY engine with the preferences set to 200 dpi and 75% quality results in a 3 MB file and pretty bad image quality (but good OCR), while Acrobat Pro results in a 120 KB file and much better image quality (it also looks like Acrobat does some image processing, which helps it look better).

I haven’t tried changing the OCR settings to bring the ABBYY file size down further; that may be possible.
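For a sense of scale, here is a back-of-envelope calculation of how much raw bitmap data a 4-page document represents at 200 dpi. It assumes the OCR step re-renders every page as a full-page image, that the pages are US Letter (8.5 x 11 in), and that the image is 24-bit color; none of these is confirmed for this particular file, so treat the numbers as a sketch.

```python
def raster_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Uncompressed size in bytes of one page rendered as a bitmap."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel

PAGES = 4               # page count of the test file mentioned above
DPI = 200               # resolution set in the OCR preferences
OUTPUT_BYTES = 3_000_000  # the roughly 3 MB result, taken as 3 * 10^6 bytes

# Assumed: US Letter pages, 3 bytes per pixel (24-bit RGB).
per_page = raster_bytes(8.5, 11, DPI, 3)
total = per_page * PAGES

print(per_page)               # 11220000
print(total)                  # 44880000
print(total / OUTPUT_BYTES)   # about 15
```

Under those assumptions the pages amount to roughly 45 MB of raw pixels, so a 3 MB result already reflects about 15:1 compression. Getting much smaller would take a lower resolution, a lower quality setting, or a different approach such as keeping the original page content instead of a bitmap.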