Editing PDFs for searchable notes

My apologies if this question has been covered…I tried searching the forum but came up empty-handed.

If I annotate PDFs using DTPO 2.0 pb7, are the “notes” or “text” I add searchable? I tried a few test runs, and it seems that my mark-ups added to PDFs don’t turn up when I save the edits to the PDF, move to a different folder just to make sure, and then run a search. DT doesn’t seem to recognize the text I add.

I have tried importing Skim files that have my notes added to them in annotations, and the import works great – brings the notes right in – but they’re not searchable. My limited understanding is that this all has to do with the way PDFs are saved. In any case, I prefer to mark up my research articles and would love to be able to search the mark-ups in DT. I am also thinking about this as a workaround for qualitative data analysis.

Any thoughts would be much appreciated. DT is simply amazing – thank you for all you do.

I use this workaround: set the Skim preference to “Automatically save Skim notes backups”. It creates a sidecar file in the same place as your PDF, with the same name and the extension .skim. If you index the .skim files, they are really RTFs and are searchable. When you update the PDF in Skim and save it, the .skim file is updated too. (This approach is tricky if you import files.)
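For anyone who wants to check which PDFs in an indexed folder actually have a sidecar next to them, here is a minimal Python sketch. It only relies on the naming rule described above (same folder, same name, .skim extension). The function name `pair_pdfs_with_sidecars` is made up for illustration and is not part of Skim or DEVONthink.

```python
from pathlib import Path


def pair_pdfs_with_sidecars(folder):
    """Map each PDF file name in `folder` to its .skim sidecar name, or None.

    Hypothetical helper: it assumes the sidecar sits beside the PDF and
    shares its base name, as described in the post above.
    """
    pairs = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        sidecar = pdf.with_suffix(".skim")  # paper.pdf -> paper.skim
        pairs[pdf.name] = sidecar.name if sidecar.exists() else None
    return pairs
```

A PDF that maps to None has no notes backup yet, so there is nothing for the index to pick up for that file.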

Thanks, Korm, I just tried that and it works well for getting the notes in as searchable text.

Only thing is: I’d like to have the notes in there, searchable, right alongside the original PDF text. If I use your workaround, or import an RTF of notes that I export from Skim, it means dealing with two files: the original PDF and then the notes file. Is there any way to work with my notes inside the original PDF within DT, so that I see the anchored notes about particular places in the text right alongside the text?

For example, if I’m reading a research article and find a quote that reminds me of another author, I’ll want to write an anchored note “just like Smith wrote in his 1990 piece!” or the like. I’d love it if I could then search “Smith” in DT and have that PDF pop up in the search results, and when I open it, it’ll take me right to the note and the quote where I was reminded of Smith.

This is how I’d also envision a workaround for qualitative data analysis, if the “notes” or “tags” would be searchable and locatable to their exact anchoring spot. If the data files (fieldnotes, archival sources, interview transcripts) can be saved as PDFs, then one could annotate them in DT or Skim with the appropriate labels in the exact place, and then the search function would be able to bring you right back to that place when the label is entered as a search query.

DT does that beautifully when the search query is textually identical (or fuzzily similar!) to the data - it snaps you right back to that word. But if your thought or label or code or note is something totally different, a concept or word not “native” to the PDF at hand, then the game changes. I don’t expect DT to be a mind-reader, but I’d love to use its powerful search features to focus the search results in a different way than an exact match or a “see also” match. Tags are a great start, but as I understand it they are applied to the given file as a whole, not to specific places within the file, so searching a research article or an interview transcript marked up with multiple codes would not turn up its location-specific particulars.

Right now I’m just unsure whether this is possible at present or is perhaps in the works for future releases.

And then there are my infamous kludges – which let me get my own work done pretty efficiently – using rich text notes and a Lookup string to link to any location within a document of any filetype, or Page Links to link to a specific page of a PDF.

Thanks Bill — could you say more about said kludges of infamy?

So I’m guessing there’s no way at present – either using DT, Skim, or Preview – to have annotated PDFs in the DT database whereby the notes are searchable, without any exporting or other type of workaround?

I work primarily with rich text, PDF and WebArchive notes and references, although other filetypes creep in as well. I don’t want to use a system of note-taking that’s specific to only one filetype, such as PDFs, and I prefer not modifying my original files.

So when I’m making notes about a reference document, I’ll create a new rich text note and give it the same name as the reference document plus an extension to the Name, e.g. “ - Note 091009” or another extension that may be useful when looking at the note’s Name. Now, if I do a search for the reference document by Name, I’ll also pull up the list of all the associated notes about it.
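The naming convention is simple enough to sketch in a few lines of Python, in case it helps to see it spelled out. This is only an illustration, not anything DEVONthink provides: the helper names are hypothetical, and reading “091009” as a YYMMDD date stamp is an assumption.

```python
from datetime import date


def note_name(reference_name, on=None):
    """Build a note Name: the reference document's Name plus ' - Note <stamp>'.

    Assumption: the stamp is a YYMMDD date; the post does not say so.
    """
    stamp = (on or date.today()).strftime("%y%m%d")
    return f"{reference_name} - Note {stamp}"


def search_by_name(reference_name, all_names):
    """Return every Name containing the reference Name.

    A plain substring match stands in for a search by Name, so the
    reference document and all of its notes come back together.
    """
    return [name for name in all_names if reference_name in name]
```

Because each note’s Name starts with the reference document’s Name, one Name search returns the document and its notes together, which is the whole point of the convention.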

That’s one of my kludges, but it results in searchable rich text notes that are associated by Name to reference documents. Also, within that note I’ll use a hyperlink to the reference document, so that I can pop it up from within the note. The hyperlink can be created by selecting a text string, Control-clicking and choosing the contextual menu option, ‘Link to’. I then navigate to the desired document and choose it. Often, when viewing the referenced document, I’ll choose ‘Edit > Copy Item Link’ and then paste that into the note, resulting in a hyperlink to the reference document.

There’s more. Suppose I want to tie the note to a specific section of the referenced document. I can do that by selecting a text string at the desired location in the reference document and copying it to the clipboard. Then I paste that string (enclosed within quotation marks) into the rich text note as a ‘cue’ string. Later, if I select the cue string (including the enclosing quotation marks if I’m using DEVONthink 2) and press Command-/ (the Lookup command), a Search window opens, and when I hit Return the search list will include at least two results – the reference document and my rich text note. Select the reference document from the list and it will scroll down to the first occurrence of that ‘cue’ string. As a practical matter, I find it’s easy to select a cue string (3, 4 or 5 words) that’s very specific to the desired location in the reference document.
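To show why a short, distinctive cue string works as a location anchor, here is a rough Python sketch of the idea. It is not how DEVONthink’s Lookup is implemented. It assumes each document is available as plain text and does an exact, case-sensitive phrase match, and the names `find_cue` and `lookup` are hypothetical.

```python
def find_cue(cue, text):
    """Return the offset of the first occurrence of the cue phrase, or -1.

    Enclosing straight or curly quotation marks are stripped first, so the
    quoted cue pasted into a note can be passed in as-is.
    """
    phrase = cue.strip().strip('"\u201c\u201d')
    return text.find(phrase)


def lookup(cue, documents):
    """Return (name, offset) for every document whose text contains the cue.

    `documents` is a dict mapping document names to their plain text.
    """
    hits = []
    for name, text in documents.items():
        offset = find_cue(cue, text)
        if offset != -1:
            hits.append((name, offset))
    return hits
```

Both the reference document and the note that quotes the cue come back as hits, which matches the “at least two results” described above. The offset is what a viewer would scroll to. The more distinctive the cue, the fewer stray hits turn up.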

For PDFs, I can use the Page Link to a desired page in the reference document. If I wish to create a link in my rich text note to page 151 of the referenced PDF, I can obtain the Page Link for page 151 by a Control-click on the page icon for page 151, then copy the link to the clipboard and paste it into my rich text note. (This requires that the PDF’s sidebar be visible. If not, choose View > PDF Display and select ‘Sidebar’.)

Some document filetypes, e.g. Word, Excel or others that use Quick Look for display, can’t scroll to the cue string occurrence. In that case I may open the document under its parent application, perhaps noting that I need to jump to page 13.

Bottom line: Yes, those tricks are kludges, but I find them useful and more efficient for me than any note association scheme that’s limited only to a single filetype. I do my draft writing inside a database, and those rich text notes usually evolve into portions of a final draft, including content that provides information necessary for citation and footnote/endnote content when I move the draft over to a capable word processor for final polishing and editing.

I put together an example database illustrating these techniques, and it can be downloaded at files.me.com/wbdeville/6hwjy4

That link expires 9 December, 2009.

Bill, this is excellent guidance. Your sample database is a great model. Both should be included in DEVONacademy when it is updated for DT 2.

(Eric - hint! :wink: )

Bill, this is really helpful. Thanks for taking the time to explain!

Hi Bill,

Is it possible for you to re-upload your example database?

Thanks in advance

Hi, Patrick - I had intended to update it, but haven’t had time.

Here’s the link again: files.me.com/wbdeville/6hwjy4
Expiration date: 4/13/2010.

Thank you very much.

Hi Bill,

Thanks for the explanation. Could you please re-upload your example database? I have just found this thread.

Thank you :slight_smile:

Gun

Sorry, I’ve taken it apart to add some of the newer features of version 2.x and simply haven’t had time to finish it. I’ll try to do that one of these days and post a note on the forum.

I am looking forward to the new version. Thanks!!!

Ditto–looking forward to seeing the file.