PDF now an image

Wayne_Dreier · January 28, 2006, 3:22am

Can someone explain what is happening in this scenario:

I used the DA browser to locate a web page. I then went to Data > Add to DEVONthink and selected PDF (One Page), and the page appeared in DT fine. It is exactly what I want and need. I did notice, however, that when the page shows up in DT it is listed as an Image type rather than the PDF type I selected in DA. Should I be concerned about the apparent change of type?

Bill_DeVille · January 28, 2006, 4:47am

Wayne:

Sometimes it’s difficult for me to check out user queries, as I’m generally using development versions of the applications.

On my Mac that still has OS X 10.3.9 I’m using DEVONagent 1.8 beta 5. On that computer, if I open a Web page in DEVONagent’s browser and select Data > Add to DEVONthink > PDF (One Page), DEVONthink Pro receives a PDF+Text document (as indicated in the Info panel) and can “read” the text contents.

The same is true on my main Mac, running 10.4.4 and DEVONagent 2.0 beta 7.

I didn’t reinstall DA 1.7 to check out that version under 10.3.9. But if memory serves, DA 1.7 also produced PDF+Text documents when Data > Add to DEVONthink > PDF (One Page) was selected. Are you using DA 1.7 or the public beta version 2.0?

Can you select text or find (Command-F) text in the documents you are capturing? I’m assuming there was text in the Web pages you captured. If there was no text, the resulting capture would be an image-only PDF. Note: I have come across a few Web pages like that. Can Command-F find text in the original Web page?

One more possibility. In your database, check the Path field in the Info panel for one of your documents captured this way. Does the path contain “tmp”? If so, that would indicate that the PDF was sent to a temporary folder, and would disappear after a reboot, breaking the path to the file – and in that case, your database would contain only an icon of the original file. If that has happened, the solution would be to change your DT Pro preferences setting for PDF & PS, so that PDF & PS files are added either to your database or (my preference) the database package Files folder. Then reimport the Web pages previously captured.
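The path check described above can also be scripted if there are many documents to go through. A minimal sketch in Python, assuming the Path values have been copied out of the Info panel by hand; the function name `looks_temporary`, the list of folder names, and the example paths are all hypothetical and not anything DEVONthink provides.

```python
from pathlib import PurePosixPath

# Folder names that usually indicate temporary storage on Mac OS X.
# This set is an assumption for illustration, not an official list.
TEMP_PARTS = {"tmp", "TemporaryItems"}

def looks_temporary(path: str) -> bool:
    """Return True if any component of the path looks like a temporary location."""
    parts = PurePosixPath(path).parts
    return any(part in TEMP_PARTS or part.endswith(".tmp") for part in parts)

# Hypothetical example paths, not taken from a real database.
print(looks_temporary("/private/tmp/501/capture.pdf"))
print(looks_temporary("/Users/me/Documents/census.pdf"))
```

Matching whole path components rather than searching for the substring “tmp” avoids false alarms on file names that merely contain those letters.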

Wayne_Dreier · January 30, 2006, 2:32am

Bill_DeVille:

Are you using DA 1.7 or the public beta version 2.0?

Can you select text or find (Command-F) text in the documents you are capturing? I’m assuming there was text in the Web pages you captured. If there was no text, the resulting capture would be an image-only PDF. Note: I have come across a few Web pages like that. Can Command-F find text in the original Web page?

One more possibility. In your database, check the Path field in the Info panel for one of your documents captured this way. Does the path contain “tmp”? If so, that would indicate that the PDF was sent to a temporary folder, and would disappear after a reboot, breaking the path to the file – and in that case, your database would contain only an icon of the original file. If that has happened, the solution would be to change your DT Pro preferences setting for PDF & PS, so that PDF & PS files are added either to your database or (my preference) the database package Files folder. Then reimport the Web pages previously captured.

I used DA 2.0 beta 4.

Finding text gives mixed results. Certain words are found, others are not. I haven’t been able to find any consistent pattern.

The page is from the 1920 census, captured from Heritage Quest Online. I am assuming that it may be a combination of image and text, if that is possible. When searching, anything that is found appears to be only in the area which is probably text. But again, some words in that area are found while others are not.

I checked the path and it does not contain “tmp” anywhere. I can zoom in on the page in DT Pro, which is one of my requirements. I can also view the page while offline, which was not the case when I captured the same page using the DT Pro browser.

cgrunenberg · January 30, 2006, 11:07am

Could you please post the URL of the captured page? Then I could check this over here - thank you!

Wayne_Dreier · January 30, 2006, 4:03pm

Here is the URL: persi.heritagequestonline.com.oh … 1&offset=1

Because it is a database accessed through my local library, I’m not sure it will work to go directly to it.

cgrunenberg · January 30, 2006, 5:22pm

No, it doesn’t work. Do you have another URL? In addition, please check that “pdftotext” is selected in the PDF preferences if you’re still running 10.3.9 and don’t use TextLightning.
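Outside DEVONthink, one rough way to see whether a captured PDF carries any text at all is to run it through the standalone `pdftotext` command-line tool (from Xpdf or Poppler) and count what comes back. The sketch below is in Python and assumes that tool is installed separately and is on the PATH; it is not the same thing as the preference setting mentioned above. The function names and the 20-character threshold are arbitrary choices for illustration.

```python
import shutil
import subprocess

def extract_text(pdf_path: str) -> str:
    """Run pdftotext on the file; "-" as output name sends the text to stdout."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def has_text_layer(extracted: str, min_chars: int = 20) -> bool:
    """Treat the PDF as searchable if it yields at least min_chars non-space characters.

    The threshold is an assumption; an image-only page typically yields
    nothing but whitespace and form-feed characters.
    """
    return sum(1 for ch in extracted if not ch.isspace()) >= min_chars
```

For example, `has_text_layer(extract_text("capture.pdf"))` (with a hypothetical file name) would come back false for an image-only scan. A page that mixes a scanned image with a small amount of real text would still pass, which would be consistent with searches finding some words and not others.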