PDF forms in DT

dpfels · August 17, 2010, 3:16pm

Hi,
I am trying to use a PDF form from within DT.

I notice that PDF forms with data entered in Acrobat retain that information in their individual fields. However, if a field is edited from within DT, the new information is lost, even though information entered earlier in other fields of the same form is retained.

This seems odd, as DT allows editing of individual PDF form fields. Why isn’t the field information saved? Is this functioning correctly, or is it a bug?

Thanks,
Dan

Bill_DeVille · August 17, 2010, 4:04pm

It’s easy to forget that there is no single PDF file type, but rather a number of different “flavors” of PDF.

I usually think of a PDF as consisting of an image layer and a searchable text layer. But many PDFs have other things tucked away in them as well.

Adobe adds a “form” layer to PDFs, and Acrobat contains special code for creating and managing forms.

DEVONthink uses Apple’s PDFKit to interpret and display PDFs, and PDFKit doesn’t have the same forms code that Acrobat has.

Try this: search for a term that is in the normal text layer of a PDF form in your database. That PDF will be listed in the search results. Now repeat the search, but use a term that appears only in a form field and nowhere else in that PDF. Search cannot find it.

Now edit that form PDF in DEVONthink or in Preview and save it. Repeat that last search query, and the PDF will be found.

In effect, the form fields have been “flattened” into the text layer. They are now searchable, but no longer editable as form fields.
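To make the “flattening” idea concrete, here is a deliberately simplified toy model in Python. It is not how PDFKit, Acrobat, or DEVONthink actually store or index PDFs; the `ToyPDF` class and its methods are hypothetical, and it assumes (as described above) that search sees only the text layer, while flattening copies field values into that layer and removes the editable fields.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPDF:
    """Hypothetical, highly simplified model of a PDF form (not a real PDF API)."""
    text_layer: str  # text the search index can see
    form_fields: dict = field(default_factory=dict)  # editable field name -> value

    def searchable(self, term: str) -> bool:
        # Assumption: search reads only the text layer, never form-field values.
        return term.lower() in self.text_layer.lower()

    def flatten(self) -> None:
        # Copy each non-empty field value into the text layer, then drop the
        # fields: the values become searchable but are no longer editable fields.
        values = [v for v in self.form_fields.values() if v]
        if values:
            self.text_layer = " ".join([self.text_layer, *values])
        self.form_fields.clear()


if __name__ == "__main__":
    doc = ToyPDF("Application form", {"name": "Ada Lovelace"})
    print(doc.searchable("Application"))  # True: term is in the text layer
    print(doc.searchable("Lovelace"))     # False: value lives only in a form field
    doc.flatten()
    print(doc.searchable("Lovelace"))     # True: value was flattened into the text layer
    print(doc.form_fields)                # {}: no editable fields remain
```

The trade-off the model shows is the one described above: after flattening, field content is searchable, but there is no longer a field to edit.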

dpfels · August 17, 2010, 4:18pm

Thanks for that very helpful reply.
However, when I open my form in Preview, it is both editable and savable. That is, whatever I enter into a form field reappears in that same field when I reopen the file.

I am also confused by the fact that DEVONthink allows me to edit the form fields. If it were not recognizing the field, why would it be editable?

I see no evidence that DT or Preview is flattening the PDF form and rendering it uneditable. I see only that DT is unable to save edited changes, something that works fine in Preview, which I assume uses the same Apple PDFKit.

Am I missing something?
Thanks,
Dan

Bill_DeVille · August 17, 2010, 4:46pm

I’ll confess that I haven’t played with PDFs containing forms for a while, but back then (over a year ago) I found that the form fields were no longer editable. I don’t think those forms were produced with Adobe software, so there may have been other differences.

However, quite recently another issue came up that involved OCR of PDFs that contained form fields, and that was quite interesting. In that case the user sent a sample PDF. The “background” text of the PDF was not searchable. Form fields had been added and contained text that the user wished to search.

Following OCR of such a PDF, the background text was now searchable, but the form fields were not searchable, though they displayed readable text. But the important data for the user’s research was in those non-searchable form fields.

I found that if I made an edit in Preview and saved the change to the original PDF, then after OCR the content of the form fields was searchable. And “printing” the original PDF to DT of course flattened it, so that OCR made all of the text searchable. In this particular case, because some of the text in the form fields was obscured within the fields, “printing” to PDF produced a better display of the form-field content, and OCR then produced a completely searchable PDF.