PDF Editing within DevonThink Pro Office!

musicpenguy · January 20, 2007, 4:09am

One feature that keeps this software from being perfect is that you cannot add text to PDFs. If DevonThink gained abilities like PDFPen’s, it would truly create a paperless office.

This is the one feature that is a must in the next update; please, please try to develop and add it to the next Pro version.

Thanks for the great software!

jwiegley · January 20, 2007, 4:54am

I have to say that the focus of DEVONthink seems to be document storage and retrieval, not editing. For example, even though DT can edit text and RTF files, I know many users who prefer to use TextMate, Scrivener, Mellel, Papyrus, etc.

Now, you can always use Apple’s own Preview application to add notes to a PDF. This is what I do to mark up work-related PDFs while I’m on the phone with a client.

John

Bill_DeVille · January 20, 2007, 4:56am

Although currently PDF files cannot be directly modified in DT Pro/Office, searchable text can be added as metadata in the Comment field of the PDF’s Info panel. So you can add keywords, notes and comments about a PDF as a metadata ‘attachment’.

It’s likely that metadata features will be enhanced in future versions.

In OS X 10.4.x Preview allows one to make searchable text annotations on PDFs. Perhaps this can be done directly inside DT Pro in a future version of the PDFKit code that’s used to display PDFs in your database.

Currently you can select a PDF file that has been captured into your database using either the Index or Import capture mode, and choose Open With to open it under another application such as Preview, Acrobat or PDFPen, where the PDF can be edited and saved. Next time the PDF is opened or synchronized, the edit changes will be visible in your database. (Note: please don’t expect Open With to work for editing Imported MS Word files.)

musicpenguy · January 20, 2007, 4:58am

I think at least the Pro Office version should have this, to make it a one-stop shop for all of your documents. Since changes made to PDFs in external editors are not reflected automatically, this is a must-have feature.
(I have switched from using Pages to DTPO for all of my writing and notes, so I think that if you can highlight a word with OCR, you should be able to add your own.)

musicpenguy · January 20, 2007, 5:20am

Thanks for the Index tip; it seems to work seamlessly.

Denriddy · May 22, 2007, 5:08pm

Then I have to say that the people at DEVONthink, to be responsible, should run, not walk, to their web site and immediately remove the following:

“DEVONthink is the one and only database for all your digital files, a practical, powerful document manager… .”

Because it isn’t.

The lack of ability to edit PDFs, coupled with the included ability to edit text and rich text documents, makes the actual management in DEVONthink of large mixed collections of these types of documents that contain text that needs editing, dup removal, spell checking, etc. extremely unwieldy.

I already brought up in another thread the lack of global find and replace and was ridiculed and told how little sense it would make in “the one and only database for all your digital files.” I can’t even comprehend this kind of attitude under the banner of those kinds of claims.

It is a powerful program. There is much it can do for those who need specific—and limited—functionality in organization of and access to and searching of files.

But there are far too many aspects of true management of digital files that simply are not supplied by the program. If I hadn’t needed them, if I hadn’t gotten DEVONthink specifically to have that kind of functionality, if I hadn’t believed—from such sweeping marketing statements as above—that it could and should provide such functionality, I wouldn’t have brought them up and asked about them.

I hope it matures into a program that lives up to its marketing claims. It isn’t yet.

Bill_DeVille · May 22, 2007, 7:12pm

The term “document manager” certainly does not imply the ability to edit all of the file types being managed.

DEVONthink Pro isn’t intended to be a universal editor of all file types, nor is that possible now or in the foreseeable future. The very idea implies that were it to attempt that, DT Pro would become the universal application – with the attendant consequences of a huge file size and high cost. That’s not what document management – or information management – is about.

The focus of the DT applications is to allow one to collect together documents of whatever OS X-compatible file type and to assist the user to search and analyze the information content of text and metadata.

The most significant problem hindering that focus is the Tower of Babel effect resulting from the existence of many proprietary file types. Currently, the universe of file types has two categories: “known” ones, meaning that the text and/or media content can be extracted, and “unknown” ones that aren’t readable. However, a DT database contains metadata about both known and unknown file types, and this metadata can be supplemented by the user.

DEVONtechnologies depends for the most part on OS X to capture text content, e.g. Cocoa text, PDFKit and WebKit. Text/images can be captured as PDF and added to the database for any OS X-printable file type document, and text content of most file types can be clipped and added to a database.

Future versions of the DT applications will be able to capture text from (but not necessarily accurately render or edit) a broader universe of “known” files. It’s likely that Apple’s PDFKit will allow limited editing of PDFs from within a database. But for the most part, editing of many file types will best be done under the parent application rather than inside the database; for that purpose the “Launch Path” or “Open With” commands are available in the database.

Denriddy · May 22, 2007, 7:51pm

It seems to when it’s convenient (e.g., text and rich text), not to when it’s inconvenient (e.g. PDFs and Word documents).

I even could understand and work with this bifurcated (and I did not say “schizoid”) approach to the handling of textual information if only the following promise would be made good, from the “Indexing Files on Your Hard Disk” page of the Help file:

“A future version of DEVONthink Pro Office will keep indexed files up-to-date automatically and will allow editing of indexed files.”

It’s not even clear what “allow editing of indexed files” means the way the sentence is written, but if it in fact means “edit indexed text and rich text files,” that would make indexing—and then editing certain types of files with an external editor—a viable option. As it stands, neither indexing nor importing is workable for managing, including editing, a large mixed collection of documents.

We live in a world of mixed formats of documents. I can understand the difficulties in creating a program that aspires to do as much as DEVONthink aspires to do. What I don’t understand as well is why there seems to be such defensiveness about the things it has promised or claimed that it doesn’t yet handle well.

Obviously, the developers already see the need of being able to edit files with text in them, or they would not have included the function for plain and rich text. Obviously, the developers already see the need of being able to edit indexed collections and have them automatically updated, or they wouldn’t have promised the capability. I am not attacking anyone involved. I simply think it would be prudent if the public claims and expectations were made more clear and realistic, and sincerely hope that the program will address the things that currently make it unworkable for my needs.

Other programs are moving into this niche fairly rapidly, and I actually would like to see DEVONthink capitalize on and maintain its early lead—whatever you think of my comments and suggestions.

Bill_DeVille · May 22, 2007, 9:37pm

I must emphasize that “document management” and “editing” are entirely separable operations.

Many document management databases, e.g. for a corporate intranet, do not allow user editing of documents. The documents are read-only, and properly so.

Note that the DT Pro Office Web Server mode allows one to distribute the contents of a database to networked users in such a searchable but read-only mode; users (Mac or PC) can search for and read database content but can make no changes to the database. Such a document management system can be useful to an organization for distribution of policy and procedure documents, forms, etc.

The term “document management” has a number of technical definitions and levels in information science. The DT applications are intended to assist single users to maintain and mine information from collections of documents, with some useful and unique artificial intelligence assistance. Personally, I use DT Pro because it helps me manage and analyze the information content in my databases. I use DT Pro as an interactive research assistant.

The phrase you quoted about future changes in the Index capture of documents is a simple statement. Currently, one must invoke the Synchronize command to update indexed content; in the future, updates will take place automatically. Currently, the text content of Indexed files is read-only; in the future, the user may be allowed to modify the text for certain file types.

Do not expect that one will be able to modify a Word document within the database, including images, formatting and layout. Do not expect that one can substantially edit (significant changes in content and layout) a PDF document within the database – even Acrobat Pro is very limited in editing capabilities. The same goes for Excel, Pages, PowerPoint, Keynote, Mellel and many other applications. It would be impractical in development or license costs to DEVONtechnologies (and the resulting increased costs to users) to try to build in extensive multi-filetype editing capabilities within the DT applications, beyond those available in OS X itself. OS X keeps getting better in that respect. It’s likely that simple editing such as annotation of PDFs will appear in future DT versions.

Do expect that development will focus on the purposes and intent of the DT applications, including improved searching, database structure, speed, user assistance, scalability and interoperability with the operating system and more applications. Editing a complex Word document is best left to MS Word itself, Papyrus or perhaps the free NeoOffice application.

Denriddy · May 23, 2007, 12:37am

And I’ll emphasize that they “have been,” but will continue to be only for has-beens.

Meanwhile, the trend clearly is toward integrated document management and editing, and will continue inexorably in that direction, leading to the realization of longed-for capability that reaches back to the dawn of computing. Whether DEVONthink comes to the party or not is no skin off my nose.

I realize that many corporate megaliths with grunting cubicle dwellers have to keep bodies of data “read only.” Entire industries have been built on providing them just such clunkware databases.

Thankfully, we’ve arrived in the 21st century, and with document formats like rich text and PDF, the future is integrated document management and editing, not proprietary dinosaurs like “MS Word docs.” Even Leopard is rumored to have basic editing in Preview.

Rail against it. Argue against it. Fight it. Stand in front of it and wave your hands in its face. Fine with me.

It’s coming.

Denriddy

kewms · May 23, 2007, 12:59am

It does? Funny, I hadn’t noticed. My main database now exceeds 1.5 million words, and I don’t find it unwieldy at all. I recently pulled in a mixed collection of more than 300 files in one bite, and didn’t find that particularly challenging, either.

You might consider that document management and document creation are not the same thing. Both corporations and researchers need to manage very large numbers of documents, most of which are either static or semi-static (changing only at prescribed intervals).

While I assume scenarios exist in which one might want to batch-edit or spell check 300 files at a pop, I must admit I’m having trouble imagining one. Do you have a particular example in mind?

Katherine

DarylF2 · May 23, 2007, 1:12am

PDF was never intended as an editable format. It simply isn’t structured for editing, and even Adobe Acrobat Pro can only do very limited editing of existing PDF documents. Asking DEVONthink Pro to be able to edit PDFs is asking something which is extraordinarily difficult, and well beyond what a tool like DTpro should do…

Word .doc is a closed, proprietary format and so is tricky to handle well. Apple has provided decent Word import/export in Mac OS X, but it doesn’t support MANY advanced Word features. Edit those Word documents in TextEdit (which can open .doc files) or Pages, re-save them, and those advanced features are lost… I don’t think most of DTpro’s users would like that… Indexing of Word files within DTpro is very important, but editing them is not, for the VAST majority of users, I’d wager.

Denriddy · May 23, 2007, 1:21am

I believe the phrase “that contain text that needs editing, dup removal, spell checking, etc.” is the dividing line. “Pulling in” documents is not an issue. Editing them in various formats and keeping the target collection updated with changes is…well, extremely unwieldy. To coin a phrase.

Asked and answered, counselor.

I can give you several: managing and editing for publication a large collection of works from various sources where, e.g., “all right” has been written incorrectly by some as “alright.”

Or: having works where British punctuation and quotation conventions have been used in some of the MSes, and need to be converted to American Chicago Manual of Style conventions.

Or: dealing with a great deal of OCRed documents where certain OCR errors are repeated frequently due to the font in a particular scanned text.

Or: needing to replace every instance of “–” or " - " or " – " with a proper em dash without spaces, viz:—.

Or: having inconsistency in formatting, such as some documents with single hard returns between paragraphs plus tab at the beginning of each paragraph, and needing for consistency’s sake to convert all to double space between paragraphs, no first line indent.

I could go on. But this should suffice.
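
For plain-text files only, the kind of batch cleanup listed above could be sketched roughly as follows. This is a minimal illustration, not anything DEVONthink provides: the `RULES` list and the `normalize` function are hypothetical names, the rules are deliberately narrow, and a real house style would need review of each change rather than a blind Replace All. It does not touch PDFs or Word files.

```python
import re

# Hypothetical house-style rules, for illustration only.
RULES = [
    # "alright" -> "all right" (whole words, both capitalizations)
    (re.compile(r"\balright\b"), "all right"),
    (re.compile(r"\bAlright\b"), "All right"),
    # spaced hyphen, spaced en dash, or double hyphen -> unspaced em dash;
    # unspaced hyphens ("well-known") and en dashes ("1990\u20131995") are left alone
    (re.compile(r"[ ]+[-\u2013][ ]+|[ ]*--[ ]*"), "\u2014"),
    # single hard return followed by a tab -> blank line, no first-line indent
    (re.compile(r"\n\t"), "\n\n"),
]


def normalize(text: str) -> str:
    """Apply every rule in order and return the cleaned-up text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Running `normalize` over each file's contents and writing the result back would give the across-documents behaviour; applying it to copies first would leave the source documents untouched.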

Denriddy

Bill_DeVille · May 23, 2007, 2:52am

Denriddy, I’m managing more than 150,000 documents in various DT Pro databases.

My main database contains many thousands of reference materials, including books and peer-reviewed journal articles. In that collection I occasionally come across typos. As these are source documents, I believe it would be incorrect to correct the typos. I’m not going to do that. Nor will I ‘correct’ British usage/spelling in a published paper.

Never forget that some of the language usage in the social sciences (especially, and sometimes in the ‘hard’ sciences) would look like misspelling and/or poor syntax in ‘ordinary’ English. Turning loose a blind search and replace engine across multiple documents could seriously alter the meaning of some text – not to mention Bowdlerizing Shakespeare.

I sympathize with one of your points, OCR errors in PDFs. I wish I could correct them easily. Ideally, I should be able to correct the errors only in the text layer, as the image layer is authentic and should not be changed (that can be critically important for scanned documents). However, the often random nature of OCR errors would lead me to do correction document by document, instead of across documents. I don’t think batch processing across multiple PDFs would be useful for correcting OCR errors.

But Adobe didn’t design the PDF format to be easy to edit. I know of no software available at the consumer level that can correct OCR errors at the text level, but not affect the image level. I’ve heard rumors, but so far I know of no working software that I could acquire.

Yes, this is the 21st century and software development has come a long way. However, batch search and replace across a variety of file types remains a non-trivial exercise. Moreover, for what I consider very valid reasons, I wouldn’t do it to my databases, even if I could.

howarth · May 23, 2007, 3:28am

Denriddy:

I can give you several: managing and editing for publication a large collection of works from various sources where, e.g., “all right” has been written incorrectly by some as “alright.”

Or: having works where British punctuation and quotation conventions have been used in some of the MSes, and need to be converted to American Chicago Manual of Style conventions.

Or: dealing with a great deal of OCRed documents where certain OCR errors are repeated frequently due to the font in a particular scanned text.

Or: needing to replace every instance of “–” or " - " or " – " with a proper em dash without spaces, viz:—.

Or: having inconsistency in formatting, such as some documents with single hard returns between paragraphs plus tab at the beginning of each paragraph, and needing for consistency’s sake to convert all to double space between paragraphs, no first line indent.

DR, each of your examples is of accidental rather than substantive variation. Accidentals change the appearance of text, not its meaning. Substantives affect meaning, like “dank” for “dark,” but there’s no meaningful difference between “all right” and “alright” or the presence/absence of single/double spaces.

You’re describing house style, the consistency of formatting imposed on texts by copy-editors. You may achieve such conformity with a word-processor and a few Replace All commands. In a research database like DTPro, irregularity of content is more desirable than uniformity, especially at the substantive level, and at the accidental level, conformity is not meaningful; it only affects the appearance of text.

On the distinction between accidentals and substantives, see W. W. Greg, “The Rationale of Copy-Text” (1950); also summarized at en.wikipedia.org/wiki/Textual_criticism

cyberbryce · May 23, 2007, 4:40am

Perhaps some of the resistance you are facing is a sort of reflex reaction, since it appears that many users have the mistaken impression that PDFs are document formats that are structured in such a way as to be editable. Can you explain specifically what you mean by editing PDFs? Do you mean like Skim or Acrobat Professional, or more like Intaglio, like xpdf, like KWord, like PoDoFo? Or something altogether different?

Bryce

Edit: Add PDFPen; forgot about that one. And never mind about xpdf – but there are a variety of PostScript editors that might be added to the list too, I guess.

Denriddy · May 23, 2007, 5:27am

It looks like you had a discussion with yourself and won—although some programs you named don’t edit, per se. But PDFs are editable. I edited one tonight in Acrobat.

You didn’t mention Infix, which allows editing of PDFs like using a word processor on Windoze machines, and Nitro PDF Professional and several others that are all over Windoze platforms like a cheap suit and allow editing of PDF text.

I don’t know why OS X is so far behind the curve in this arena. But it can’t stay here for much longer. I’ll mention again that at least limited text editing is rumored to be included in Preview for Leopard.

Having come this far, I’ll go farther and predict that PDF is only the first wave of alternative document formats that are “portable,” so I’m not suggesting that they are the be all and end all of the evolution of document management (in the fullest sense of the word). But they can be edited. If they couldn’t be edited, there wouldn’t be so many programs on the market to edit them with.

Given, though, the circular futility of discussions like these, I’m now considering converting every PDF document in the large document collection I’ve been trying to get usage out of with DTP et al. to plain text or rich text so I can waste less time in dead-end forum threads and spend that time getting the editing done that I need to get done (with not a single apology to or approval from W.W. Greg or any of his minions), then convert to PDF (as needed or appropriate) after the editing is finalized.

Denriddy

Bill_DeVille · May 23, 2007, 5:35am

I’ll toss in the fact that a number of the PDFs in my main database are completely editable – text, footnotes, images, layout, everything.

But that’s an illusion. Those files are hybrid PDFs with the “.pap.pdf” file type. They were created by Papyrus 12. They remain editable under Papyrus, but appear to be PDFs to any PDF viewer on any computer platform.

However, ‘standard’ PDFs cannot be edited by Papyrus, although the developers hint that they are working on that.

sjk · May 23, 2007, 7:31am

Glad you’ve found something more useful to do with your time than waste it on circularly futile dead-end forum threads that share differing ideas and opinions. Shame on me for thinking there might still be something useful to contribute to this discussion.

Daud · May 23, 2007, 8:42pm

I don’t want to edit the PDF itself but would like to select text for cutting and pasting into my RTF note. However, I can achieve this only by opening the PDF in external Acrobat Reader, because inside DTP selecting a piece of text in a double-column article leads to selecting text in BOTH columns at the same time.
Can DTP have the same text-selecting ability as Acrobat?

David