PDF "printed" to DTP Office is not searchable

Hi all,

Sorry if this is a silly question, but I have only just started using DTP Office.

I have many PDF files in the database, a lot of them generated by the "print to DTP" option installed in the Safari print dialog.

I can search all but one of these PDF files OK.

The one I have a problem with started as a PDF file loaded from the web into Safari, which I then saved to DTP via the print command. It shows up as type PDF + Text, and I can highlight words within it when viewing, so I do not think it is just an image version of the PDF.

But when I search for terms within this PDF, I get a "no results found" message.

The PDF is around 20-30 pages long and just under 900 KB in size, if that makes any difference.

Does anyone have any ideas on how to make this file searchable, and why it isn't searchable in the first place?

Thanks

Suggestion: Select that PDF in your database and choose Data > Convert > to Rich text. Examine the resulting text file. Does that suggest why your query didn’t work?

In Preview, open the file and choose Tools > Inspector. How was that PDF produced?

Hi

And thanks for the suggestions. I have tried them, with the following results:

If I try to convert to rich text, I immediately get a log message saying “not converted”.

If I try to convert to plain text, DTPO hangs at 90% CPU and has to be killed off.

I was not sure what info you were after with the Preview suggestion. It just kind of looks OK to me.

The PDF in question is at centerleft.net/journals/betsy/do … lBlogs.pdf if you want to try it and see what happens at your end.

I just tried this:

  • Create a brand new DTP database [I called mine test]
  • Navigate to the PDF page above in Safari
  • In Safari, hit Print and select “to DTP” under the PDF tab
  • Hit OK

It puts the PDF into the new DB OK, all 130+ pages of it, and it lets you view and highlight words in the PDF. But if you view the PDF and pick a word in it to search for (I used “Fictional”), you should [well, I do anyway] get an empty result set.

Any ideas ?

This is using:
Mac OS X 10.4.11
Safari 3.0.4 (523.12)
MacBook Pro Core Duo [older model]
DTPO version 1.5

Thanks

I didn’t have any problem importing the PDF into a DT Pro Office database, and the PDF document is fully searchable.

Conversion to plain text took less than 2 seconds on my MacBook Pro (older) with 2 GB RAM. Like you, I found that conversion to rich text was logged as not converted.

I used Inspector in Preview to check the version and creator. The PDF is version 1.2 (older than usual these days), produced from a MS Word document (Windows, I think) using Acrobat 5 PS script. Apple’s routine for reading/converting this PDF as rich text probably found something non-standard in the text formatting and aborted. Note: Although Adobe’s “Portable Document Format” is a ‘standard’ there are actually a considerable number of ‘flavors’ of PDF files. I’m sticking with Acrobat 7 at the moment, as I don’t think Acrobat 8 is currently fully compatible with Mac OS X 10.4.x or 10.5.x.
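For anyone who wants to check the PDF version without opening Preview: a PDF file starts with a header line of the form `%PDF-` followed by the version number (e.g. `%PDF-1.2`). The sketch below is a minimal Python check, assuming you have a local copy of the file; the function name is made up for illustration, and it only reads the header, not the creator information that Preview's Inspector also shows.

```python
import re
from typing import Optional

def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version from a PDF header such as b'%PDF-1.2', or None.

    Only the first 1024 bytes are examined, since the header is expected
    at the very start of the file.
    """
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode("ascii") if match else None

# Hypothetical usage with a downloaded copy of the file:
#   with open("downloaded.pdf", "rb") as f:
#       print(pdf_header_version(f.read(1024)))
```

Note that from PDF 1.4 onward a document can override the header version in its catalog, so for newer files this is only a first approximation.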

It may not have been a good idea to Force Quit DT Pro while it was doing the data conversion, as it might have been writing to disk at the time. The conversion may have been slow because your Mac was making heavy use of virtual memory. You should immediately run Tools > Verify & Repair to check for possible database damage from the Force Quit. I usually do a Restart after performing a Force Quit on any application, as there could be errors in memory.

Hmm,

This is strange.

Following your last reply, I again made a fresh new database in DTP, so there's no possibility of a crash or leftovers from a force quit etc.

This time I saved the PDF to disk from Preview and used the DTP Import menu item to get the PDF into DT.

I tried a search and again got an empty set.

BUT

Then I tried highlighting a word in the PDF while viewing it in DT, copying it, and searching for it by pasting it into the search box in DT:

  • First off, the paste results in unreadable characters in the search box: some strange graphic characters I have not seen before

This is what I see in the search box: img.skitch.com/20080101-m3rfj9ay … 1ne65g.png

  • The search failed again

Now, on a hunch, I double-clicked a line in the PDF displayed in DT and copied that to the search box.

Again, the result was unreadable, graphic-like characters in the search box, BUT

I got a hit on the search this time!!!

This seems like it might be some form of character encoding problem.

If I type the exact same text into the search box, the text displays OK in the search box but I get no results. If I paste it in, I get graphic-looking characters in the search box but a hit in the results!
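One way to test the encoding hunch is to compare the Unicode code points of the typed text with those of the pasted text, since two strings can look alike (or the pasted one can look like garbage) while containing different characters. The Python sketch below assumes you can get the pasted string into a file or script; all function names are made up for illustration. It lists each character's code point, category and name, flags characters in the control (Cc), private-use (Co) or unassigned (Cn) categories, and compares strings after NFKC normalization, which folds compatibility characters such as the "fi" ligature (U+FB01) into plain letters. Whether this particular PDF uses ligatures or a custom font encoding is an assumption; nothing in this thread establishes it.

```python
import unicodedata

# Unicode general categories that usually indicate "not ordinary text":
# Cc = control, Co = private use, Cn = unassigned.
SUSPECT_CATEGORIES = {"Cc", "Co", "Cn"}

def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, category, name) for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch),
         unicodedata.name(ch, "<no name>"))
        for ch in text
    ]

def suspect_chars(text: str) -> list[str]:
    """Return code points of characters unlikely to match typed text."""
    return [f"U+{ord(ch):04X}" for ch in text
            if unicodedata.category(ch) in SUSPECT_CATEGORIES]

def same_after_normalizing(typed: str, pasted: str) -> bool:
    """Compare two strings after NFKC normalization (folds ligatures etc.)."""
    return (unicodedata.normalize("NFKC", typed)
            == unicodedata.normalize("NFKC", pasted))
```

For example, `"definition"` and `"de\ufb01nition"` (with a hypothetical ligature) are different strings, so a plain text search for one would miss the other, but `same_after_normalizing` treats them as equal.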

I am using this on a UK Mac. Would that make a difference, do you think?

Very strange.

??

I’m puzzled.

Searches (including Lookup of a selected text string) work on my computer, and I can copy/paste rich text from the PDF into a DT Pro rich text document or a TextEdit document with no problems (and no strange characters).

But the fact that OS X couldn’t convert the PDF to a rich text document, although it succeeded with a plain text conversion, indicates that something is strange about this PDF.

Suggestions, anyone? Is UK localization/keyboard that different from U.S. localization/keyboard?

Agreed, strange indeed.

On the face of it, I do not think UK is that different from US, but as you say, this might be a strange PDF.

All of the other PDFs I have captured to DTP [so far] seem to work fine. I have spent a while today checking that.

The really strange thing, I think, is that a single word copied from the PDF into the search field does not work, whereas a whole line [or the amount selected by a double-click] does work. Both produce the funny graphic characters in the search field, though.