Copy & convert PDFs in one action?

Is there a way to import PDFs, have them converted to Plain Text, AND have the original PDF copied into the DT Files folder – all in one action? I have fiddled with the preferences and can’t figure out how to do it; it seems you can only do two of those three actions. I’d like to be able to copy PDFs into DT and have a plain text file generated for enhanced searching capabilities, all in one action. Any suggestions?

Are you saying you’d like both the “Files: Copy files to database folder” and “Index & Convert: Convert to Plain Text” actions of the PDF & PS preference settings to happen during import?

Seems confusing that the “Convert to Plain Text” checkbox overrides whatever the import behavior is when that pref is unset, more like it belongs under the Files category.

I see this import settings topic as related to “Override PDF Import Settings?”, where Maria brings up the broader issue of selecting alternative methods for importing. And I still wonder if there’s some kind of feedback that would make it more obvious what’ll happen during import without having to check the prefs.

We don’t think that this is a very common requirement – to generate two documents when importing one. Consequently, when you choose to convert imported PDFs to plain text, the only thing that you’ll find in the database is a plain text file.

You can either import the PDF file (including copying the file to the database folder) OR import just the plain text portion of it.

Best,

Eric.

Maybe the preferences could make the behavior clearer? For example, when “Convert to Plain Text” is checked the “Files: *” settings could be grayed-out.

Put that way, it makes sense, but I still wish I could do it :slight_smile:
The issue of wanting to have “two” documents in DT stems from a larger problem: PDFs don’t show highlighted search results in DT… but here’s an idea: What about a feature where DT can copy a PDF into DT, convert it to uneditable Plain Text, and then invisibly link the two together, so that as far as the user is concerned they are one? When such a document is viewed, a button could be displayed that says “View as PDF” or “View as Plain Text”. Thus, searches in such documents could reveal the actual words in the Plain Text mode, and then to see formatting and images the user could click the “View as PDF” button. Basically, the need here is for highlightable searches of PDFs within DT, like in Preview, so that Plain Text conversions could be eliminated in most cases… I have some huge PDFs and none of the above options work very well in DT; I am always having to do multiple searches, in DT and then again in Preview… I am sure other users miss this too.
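
To make the linked-pair idea concrete, here is a minimal sketch in Python. It is purely illustrative and says nothing about how DT stores documents; the `LinkedDocument` class, its field names, and the `toggle_view` and `find` methods are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkedDocument:
    """Hypothetical record pairing a stored PDF with its read-only text conversion."""
    pdf_path: str          # where the copied PDF lives (assumed layout)
    plain_text: str        # text extracted from the PDF at import time
    view: str = "pdf"      # which representation is currently shown: "pdf" or "text"

    def toggle_view(self) -> str:
        """Switch between 'View as PDF' and 'View as Plain Text'."""
        self.view = "text" if self.view == "pdf" else "pdf"
        return self.view

    def find(self, term: str) -> List[int]:
        """Return the character offsets of every case-insensitive match in the text layer."""
        offsets: List[int] = []
        needle = term.lower()
        if not needle:
            return offsets
        haystack = self.plain_text.lower()
        start = haystack.find(needle)
        while start != -1:
            offsets.append(start)
            start = haystack.find(needle, start + 1)
        return offsets


doc = LinkedDocument("paper.pdf", "Tiger brings PDFKit. tiger again")
hits = doc.find("tiger")   # offsets that a text view could highlight
doc.toggle_view()          # the user switches to the plain text view
```

The user would see one document; the offsets from `find` are what the plain text view would highlight, and the PDF stays available for layout and images.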

Creating a multi-layered PDF and text document sounds interesting.

Maybe it’s possible (in DT Pro) to have a script that automatically generates the text version of a PDF file when importing?
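
As a rough sketch of what such a script would have to do, independent of DT's actual scripting interface: copy the PDF into a files folder and write a plain text sibling next to it. Everything here is an assumption made for illustration. The function name `import_pdf_with_text` and the folder layout are hypothetical, and the text extraction step is passed in as a callable, because how the text gets extracted (DT itself, a command-line tool, a library) is not specified in this thread.

```python
import shutil
from pathlib import Path
from typing import Callable, Tuple


def import_pdf_with_text(
    source: Path,
    files_folder: Path,
    extract_text: Callable[[Path], str],
) -> Tuple[Path, Path]:
    """Copy a PDF into files_folder and store its extracted text beside it.

    extract_text is a placeholder for whatever converter is available;
    it receives the path of the copied PDF and returns its text.
    """
    files_folder.mkdir(parents=True, exist_ok=True)

    # Step 1: keep the original PDF (the "copy files to database folder" part).
    pdf_copy = files_folder / source.name
    shutil.copy2(source, pdf_copy)

    # Step 2: generate the plain text version (the "convert to plain text" part).
    text_copy = pdf_copy.with_suffix(".txt")
    text_copy.write_text(extract_text(pdf_copy), encoding="utf-8")

    return pdf_copy, text_copy
```

Both results end up side by side, so the text file can be searched while the PDF keeps the original presentation.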

Searching PDF files in DT might be improved by integrating PDFKit from Tiger.

Which is due to the lack of an API for doing this in Quartz. Maybe Tiger will bring some relief here.

This is exactly what DT does anyway :wink:

Hmm, maybe that could be done. We’ll have a look at it.

We agree. But until we have an API for this from Apple (the search code is directly integrated into Preview), we could only do this by completely writing our own PDF displaying code, which is basically a PostScript raster image processor, or embed Acrobat like the Acrobat plug-in does it for web browsers. But this would be clumsy, memory-hungry and not easy to accomplish.

Best,

Eric.

A somewhat related use would be if:

  1. you have a password-protected PDF
  2. you extract the text with textlightning

Then, it’d be nice to have both the PDF and the text in the DEVONthink database; the PDF for better layout / presentation, the text to do searching.

Joe

We’re patiently waiting and hoping. :slight_smile:

I noticed Introducing PDF Kit says:

Being able to annotate PDF documents directly in DT would be quite nice.

I second the following requests:

If I understand correctly, the information (i.e. the plain text) is already there in the database; it is just a matter of displaying it. And as you have done with HTML, the possibility of that button would be really great.

A way to do “highlighting” or “stickies” on top of PDFs would be nice. I was thinking of meta-information along the lines of “highlight the area between two coordinates in this color”, mimicking pen-based highlighting on paper. As PDFs are frozen layouts, this would be the only information necessary to create a useful additional feature. If you also add the possibility to attach text to such a highlight, we could even do bookmarks, notes, and annotations in the PDFs that are searchable, classifiable, etc. (in the form of a new file format, a “pdf-highlight-location” or so…). Just dreaming.
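
The highlight meta-information described above could look something like this minimal Python sketch. It is only an illustration of the idea, not an existing file format: the `Highlight` record, its fields, and the `search_notes` helper are hypothetical, and the coordinates are assumed to be page points measured from one corner of the page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Highlight:
    """Hypothetical record: a colored rectangle on one page, plus an optional note."""
    page: int
    x0: float   # first corner of the highlighted area
    y0: float
    x1: float   # opposite corner of the highlighted area
    y1: float
    color: str
    note: str = ""


def search_notes(highlights: List[Highlight], term: str) -> List[Highlight]:
    """Return the highlights whose attached note contains term (case-insensitive)."""
    needle = term.lower()
    if not needle:
        return []
    return [h for h in highlights if needle in h.note.lower()]


marks = [
    Highlight(page=3, x0=72.0, y0=640.0, x1=310.0, y1=655.0,
              color="yellow", note="Key definition of the method"),
    Highlight(page=7, x0=90.0, y0=200.0, x1=400.0, y1=215.0, color="green"),
]
found = search_notes(marks, "DEFINITION")   # matches the note on page 3
```

Because the PDF itself is never modified, records like these could live alongside it and be searched or classified like any other text.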

I was about to post a request for some better form of document annotation in DEVONthink (even if it were just something along the lines of adding a field to the document’s info window that is essentially a hidden rich text document associated just with the document[1] – that way I wouldn’t have to worry about managing yet another document just to jot down a set of notes) but I did a quick search first and found this, and it sounds like it would be outstanding – if it actually allows for nice in-PDF annotations.

Have you guys been able to take a look at PDF Kit yet? And does it look like it might make annotations possible? A good annotation solution is one of the few things that DEVONthink is still missing, and unfortunately it’s also the kind of thing I’ve really had a need for lately…

[1] Actually, I’d like a bit more than that: it would also be nice to have a new command that, when the appropriate key combo is pressed, would automatically open the annotation and append a tag noting the current page of the document that I was reading. This way, if I was reading and I wanted to make a note about a passage, I could just hit Cmd-A (or whatever), type in my note under the automatically inserted stamp, then close the annotation window and go back to reading.

I would like to add that the items being mentioned here are all items I would dearly love to have in DT,

Specifically,

Bookmarks for pdf and rtf docs
Embedded and attached annotations for pdf and rtf docs

Wiki links are almost a way to create attached annotations for RTF documents, but that’s not the most convenient and “usable” solution. Adding comments is another “almost” method, generalized to other document types besides RTF.

Different styles of annotation might be sufficient for certain doc types. Highlighted text in HTML docs would be useful (for me). That example brings up a distinction between simple marking and more sophisticated annotating.

Specific annotation/marking styles, generalized for all document types, would be ideal. That could easily move out of range of intended capabilities for DT, at least for a while. :slight_smile:

I have exactly the same request as Moses.
As I want to be able to trace all the PDFs I have referenced in DT, I have chosen the ‘copy in the DB Folder’ option.
But as I don’t want my PDFs to be mutilated, leaving me unable to read the original paper in its original presentation, I have not chosen the ‘convert into Text’ option.

And it’s a huge problem not to have the search words highlighted when performing a search, as DT does for text documents (that’s the whole power of DT!!), because I have more PDFs than text files in my DT DB…
I then have to open dozens of PDFs in Acrobat and search for the keywords; it’s really boring.

When performing a search, I don’t think that having the PDF itself highlighted is important. We just want to see the relevant sentence/ paragraph. If it seems to be particularly relevant, we then can read the PDF itself (in its original presentation, not in a text version), without highlight, it doesn’t matter.

So, as DT already has a copy of all the PDFs to be able to perform its search, I think it would be really easy for you to implement, as a first step, a ‘toggle’ button (instead of waiting for an OS X improvement) to give us the possibility to have the raw text file highlighted when performing a search.
The ‘pdf highlight’ function would be also interesting, but is not as precious as the simple ability to highlight the text while performing a search.

Please, please, please, follow this request, I think it’s really simple to implement, and it would be a great enhancement for the user!
Thank you,
Jean-Christophe

Thank you for the suggestion. One of the next 1.9.x releases will probably introduce a “Summary” column which will display the first occurrence (like the “Pages” tab of DEVONagent’s search windows) and this should be similar to your request. And of course Tiger will introduce lots of new possibilities to handle & display PDF documents.
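
For readers wondering what a “first occurrence” summary amounts to, here is a minimal sketch in Python. It is not DEVONthink’s or DEVONagent’s implementation; the function name `first_occurrence_snippet` and the `context` parameter are hypothetical, and the only assumption is that a plain text version of each document is available to search.

```python
from typing import Optional


def first_occurrence_snippet(text: str, term: str, context: int = 40) -> Optional[str]:
    """Return the first match of term with some surrounding text, or None if absent."""
    if not term:
        return None
    index = text.lower().find(term.lower())   # case-insensitive search
    if index == -1:
        return None

    # Take `context` characters on each side of the match, clipped to the text.
    start = max(0, index - context)
    end = min(len(text), index + len(term) + context)
    snippet = text[start:end].replace("\n", " ")

    # Mark the sides where text was cut off.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return f"{prefix}{snippet}{suffix}"
```

A results list could call this once per matching document and show the returned string in the summary column, so the relevant sentence is visible without opening the PDF.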