Simple query about pdf annotations

michiganuser67 · July 24, 2010, 11:23pm

Hello – I’m a new user of DTPO, using the trial license and trying to learn about the program to see whether it fits my needs (these are academic: I write and teach at a university). I’ve explored the forums before posting this question, but couldn’t find anything recent (nothing applying to DTPO 2.0).

DTPO lets me annotate pdfs. I can draw boxes, highlight, etc. I can also add text notes or comments. However, the text comments I add are not being captured by the index – and that seems wrong. What’s the point of adding notes if they aren’t accessible to the program?

A long-ago thread attributed this to problems in the PDF text layer and said the issue would be addressed (viewtopic.php?f=7&t=6929&p=31766&hilit=pdf+annotate+indexed#p31766). The user’s manual for 2.0 also talks about DTPO’s ability to annotate pdfs, trumpeting this as a big improvement in 2.0. Of course that’s true. But if one takes notes about the pdf directly on the pdf, surely that text should be indexed? Or do I need to write all of my thoughts/analytical notes as RTFs that will be attached, and not in the more visible format of a marginal comment?

(I’m thinking about the annotation tools I’ve seen for pdfs in Sente – another program I am contemplating as part of an academic workflow.)

michiganuser67 · July 24, 2010, 11:24pm

Thanks!

Bill_DeVille · July 25, 2010, 2:39am

Your query is a simple one, but the issues are not simple.

Text notes added to PDFs are not part of the text layer of PDFs that’s searchable and indexable in OS X, e.g., in Preview and Spotlight. But even if they were, I wouldn’t be satisfied by just searchable PDF text annotations, as I work with references of different filetypes, and a solution for PDFs isn’t a universal solution. (I’ve been associating notes with documents of any filetype for years and it works for me, but apparently it seems unwieldy to most of the rest of the world.)

If you search the forum for the terms “Sente”, “BookEnds”, etc. you will find a number of discussions, and some at least partial solutions. Some users had developed a workable cooperation between Sente and DEVONthink, which broke when Sente released an upgrade. Perhaps cooperation can be established again.

One of the core principles of DEVONthink is that it will not change the filetypes of imported documents; you should always be able to export files you put into a database back to the Finder, still in their native filetype (even if you don’t have a working copy of a DEVONthink application). That means DEVONthink is unlikely to adopt a proprietary approach that solves annotation issues by converting the user’s files to some sort of DEVONthink filetype, such as a new package file that includes the original file plus its annotations and that could handle all document filetypes, e.g., PDF, Excel, Pages, WebArchive, etc. That’s interesting to think about, but such an approach would raise issues; for example, an annotated PDF or Excel file would no longer be directly readable by a non-DEVONthink user’s PDF viewer or Excel if sent out as an attachment in an email message.

michiganuser67 · July 25, 2010, 4:01am

Thank you, that’s really helpful in understanding the issues. And I applaud DTPO’s general principle of not altering the underlying file – that’s impressive, and welcome. I just hadn’t realized the issues with pdf (which are the one kind of source I’m thinking about at the moment).

The problem with associating notes (as separate files rather than marginal comments) as a way to comment on pdfs (or on any other source) is that it requires additional work to specify which part you’re talking about – and to be clear which words are yours, and which are in the original.

This is problematic for, e.g., jpgs that are partly OCR-renderable, but not entirely. That means I can’t just cut & paste the text and put it in the note (between quotation marks), with my comments before or after. And I can’t take just a snippet of a jpg and stick it into an rtf (as a small image chunk) and comment next to it – can I? If so, that might work. If not, I would need to retype text, and/or rely on clunky descriptors like, “In paragraph 2, after the description of George Washington, …”

I have read some of the forum descriptions of workflows with Sente and with Bookends, both of which do sound complicated. I had been leaning away from incorporating either one, simply for workflow simplicity, and trying as far as possible to work with DTPO until real writing begins, then switching to Scrivener or Nisus or Word. I had thought most people relying on Sente or Bookends did so because they really needed the reference manager functions in particular, i.e. the need to export bibliographically formatted citations. (That’s not absolutely critical for me, and I know some academics have devised architectures to keep citation information handy in DTPO, e.g. idlethink.wordpress.com/2008/09/ … geekery-i/). However, pdf annotations that are searchable (as Sente’s seem to be, at least within Sente) may be something to consider. Although, based on your comments, I’m assuming that if I created such files,

a) such annotations couldn’t be read by other programs, and specifically,

b) DTPO could not index the annotations in Sente either? In the Sente screen these note fields appear on the side of the screen, but I’m not sure how the underlying code works: whether the notes are attached to the pdf metadata, stored in the text layer, or somewhere else. I understand that DTPO may not be able to “see” a Sente note if it’s in the wrong place.

Thanks again for your thoughtful comments, and your support for a newcomer to this stuff!

nestor · July 25, 2010, 8:14am

Hi,
what about annotating PDFs in Skim (which is open source)?
Generally I use the marker to underline text (this produces a note containing the selected text), and then you can add your comments etc. to the same note (I use brackets to distinguish my comments from the quote). Then you can:

save the notes in Skim’s own format (.skim), which is readable by DT, or
(I prefer this) export the notes as an .rtf file (which can afterwards be split into single note files).
Hope this helps

elwood151 · July 25, 2010, 9:37am

@michiganuser

I second nestor’s suggestion: you should try skim.

It lets you highlight pdfs and add notes of many different kinds in a separate file;
if you open the pdf with Skim, you can see the notes as “layers”;
the plain text of the notes is searchable in the Finder and with DevonThink.

I also use Skim for annotating pdfs and am quite happy.
However, for very long summaries, I create a separate rtf file in DevonThink.

Martin

michiganuser67 · July 25, 2010, 1:43pm

Thanks very much – I will try Skim!

One point of detail. I’m actually dealing with these files as .jpgs – that is, DTPO imports them as jpgs and then does OCR to find the readable text. (As discussed in another forum post.) This is great, as it saves me the conversion step (time-consuming, as I have a large quantity of images; the only alternative I found was to batch-convert jpgs to pdfs with GraphicConverter, which was great except that it produced very large files).

Anyway, am I correct that Skim can annotate these “pdfs” that are actually OCR’d jpgs? In other words, even if the file extension is not “pdf”? The DTPO annotation tools work fine with these files, so I assume Skim wouldn’t have a problem either. (Although the files still have a jpg extension, DTPO lists them in the “Kind” column as “PDF + Text.”)

Thanks again!

elwood151 · July 25, 2010, 2:48pm

The file extension IS pdf indeed!

DevonThink displays the old name, e.g. Einstein1931_p05.jpg, but if you right-click the file and select “Show in Finder”, you’ll see that the actual file is named Einstein1931_p05.pdf.

Those are real pdf files, so skim or preview or anything else can open and display them.

Skim is great; I’m sure you’ll like it.

Note that in DevonThink you can tell from the file type, “PDF” or “PDF+Text”, whether or not the file has text content.

GraphicConverter is a great utility, and its settings can also affect the file size; not everything is intuitive, as there are so many features. If you really want to get familiar with it, have a look at the manuals.

If you just want to convert graphics to pdf and do not fear the command line, you could also try ImageMagick (you can find how-tos with Google).
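As a rough illustration of the ImageMagick route, here is a minimal Python sketch that builds one ImageMagick command per JPEG in a folder and, only if asked, runs them. It assumes ImageMagick is installed and on the PATH (version 7 provides `magick`, older 6.x installs provide `convert`; both infer the output format from the `.pdf` extension). The helper names `imagemagick_command` and `convert_folder` are hypothetical, made up for this example; this is not a DEVONthink or Skim feature, and it does not address the file-size question.

```python
import shutil
import subprocess
from pathlib import Path


def imagemagick_command(jpg: Path) -> list[str]:
    """Build the ImageMagick command that writes a PDF next to one JPEG.

    Hypothetical helper: prefers ImageMagick 7's `magick` and falls back
    to the older `convert` binary if `magick` is not on the PATH.
    """
    tool = "magick" if shutil.which("magick") else "convert"
    return [tool, str(jpg), str(jpg.with_suffix(".pdf"))]


def convert_folder(folder: Path, dry_run: bool = True) -> list[list[str]]:
    """Collect one conversion command per *.jpg file in `folder`.

    Hypothetical helper: with dry_run=True (the default) nothing is executed
    and the commands are only returned for inspection. Only lower-case
    ".jpg" names are matched; other files are ignored.
    """
    commands = [imagemagick_command(p) for p in sorted(folder.glob("*.jpg"))]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a conversion fails.
            subprocess.run(cmd, check=True)
    return commands
```

With the default `dry_run=True` you can print the returned commands and check them first; calling `convert_folder(Path("~/Scans").expanduser(), dry_run=False)` (the folder name is just an example) would actually run the conversions.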

Kind regards

Martin

michiganuser67 · July 25, 2010, 3:42pm

Thanks! I do have access to Adobe Acrobat (through my employer, so its high cost isn’t a concern). Would you recommend Skim over Acrobat for its annotation features – either ease of use or access via DTPO indexing – or should I use Acrobat for this sort of annotation if the cost isn’t a disqualifying factor?

elwood151 · July 25, 2010, 6:02pm

I don’t have experience with Acrobat and its annotation features (because of its cost).
Skim is free and very powerful. So when choosing between expensive commercial software and an excellent freeware application,
as long as the expensive software does not have features I urgently need, I would always stick with the excellent freeware…

rolfschmolling · July 25, 2010, 6:06pm

Hi,

to chime in: in my experience, annotations in Acrobat are like those in Preview or DTP(O); you need to save them to keep them, and then they cannot be changed any more. This is different with Skim, though: at a later point one can choose to convert the annotations, marks etc. into Acrobat/Preview-compatible annotations. I perceive the inability of DTP(O) to incorporate Skim’s FULL abilities as a big flaw, since the code behind this is freely available (API).

Anyway it is possible to work externally with Skim and work – clumsily – around this flaw…

regards, Rolf

self-propelled · July 25, 2010, 9:46pm

I’ve been checking out Skim, and while its note-taking features are great, I can’t see a way to annotate PDFs with hyperlinks. Does anyone know if this is possible? At the moment I’m stuck with Preview’s clunky invisible-linkbox method, and I’m surprised that I can’t find anything that will just let me type in a URL and make it clickable.

I find having to export Skim notes as .skim or .rtf in order to render them searchable by DT unsatisfactory, as you then have to somehow associate those notes with the file from which they came (with a folder or link or something), and they exist as a separate file in the DT structure, which doubles everything up. The whole point of annotations ought to be that they’re written ‘on’ and associated with the pdf. It would be great if DevonThink could incorporate the style and automaticity of Skim’s notes into DT’s own more basic ‘Data>New From Template>Annotation’ feature, which does have a built-in link to the source document.