Import Word Docs without Conversion!?

There should be a way to import M$ Word docs without converting them! This way formatting can be preserved. It would be fine if that made them “read-only” to DT, and therefore only user editable by opening the external application…I guess that is the same as ‘Linking’ to the files…except that this procedure requires manual updating.

The value of this program as a repository for collected knowledge is severely limited by its inability to deal with Word documents without murdering the formatting. I embed images in all my docs as a way of associating text and images…it's all lost on import.

Moreover, OpenOffice docs aren't even recognized! Dealing with OpenOffice docs natively would be a HUGE plus for DT. I hope these issues will be addressed in an upcoming release of DT.

Best wishes

…Saving Word docs OR OpenOffice docs as HTML preserves formatting AND embedded pictures in a way that is fully compatible with DT…perhaps then DT should consider implementing an HTML conversion tool for these types of docs…

However, once in HTML format, editing text in DT is very cumbersome as you must work around all the HTML…
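For anyone who just wants the readable text out of such an HTML export without wading through the markup, here is a minimal sketch in Python using only the standard library. The names `TextOnly` and `html_to_text` are made up for illustration, and this is not something DT provides; it simply drops the tags and keeps the text.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text of an HTML string with all markup removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Note that this joins everything with single spaces, so paragraph breaks and the embedded images are lost; it is only meant for getting at the words.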

Best wishes

Unfortunately, it’s beyond the scope of DT development to try to capture the text information content and also the ability to directly render and provide full editing capabilities for .doc Word documents. Even were Microsoft to release to developers the code to do that (they have not), the code size of the DT applications would become much larger. And then there are all the other applications in this Tower of Babel world for which we would like to provide similar features.

DT doesn’t convert .doc documents. Instead, DT uses a built-in feature of OS X to “read” the text content of .doc files – without images and without full format and layout of the original.

This does allow searching and analysis of that text content within a database that may also include related information from other file formats, e.g. PDF, HTML, plain or rich text and so on. That can be very valuable.
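To get a feel for what text-only capture of a .doc file looks like, OS X ships a command-line tool, textutil, that does a comparable job. The sketch below calls it from Python. It is an assumption that its output resembles what DT stores, since the exact OS X feature DT uses is not named here, and the function names are hypothetical.

```python
import subprocess
import sys


def textutil_command(doc_path):
    """Build the command line that asks textutil for plain text on stdout."""
    return ["textutil", "-convert", "txt", "-stdout", doc_path]


def read_doc_text(doc_path):
    """Return the text content of a .doc file (macOS only)."""
    if sys.platform != "darwin":
        raise RuntimeError("textutil is only available on macOS")
    result = subprocess.run(
        textutil_command(doc_path),
        capture_output=True,
        text=True,
        check=True,  # raise if textutil reports an error
    )
    return result.stdout
```

As with the import described above, images and most layout do not survive; only the words come through.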

I agree that images, format and layout often provide important information beyond the “raw” text of a document. It’s possible, of course, to save such a document as PDF or HTML to enhance the information content as seen in the database. But that requires extra work on the part of the user, additional storage space, and extra steps to edit the original file and save the changes to the database.

My preferred ‘heavy-duty’ word processor is Papyrus 12, simply because it has a hybrid PDF format that allows me to see exactly the document as it was created, with images, layout, special formatting and so on. That’s because the file is read as PDF in the database (with working links, e.g. to endnotes), but remains fully editable within Papyrus, with edit changes immediately visible in the database. Wouldn’t it be wonderful if Microsoft were to do something like that?

DEVONtechnologies hopes to add more “known” filetypes in the future, so as to allow at a minimum text capture from additional file types, and perhaps improved rendering capabilities of some of those, as well. To the extent that developers “wall off” their products with proprietary file types, that remains a difficult task.

Thanks for that thorough reply!

I will check out the Papyrus program you mentioned, as I really want to get the most from the DT db…

I keep all my lab records in Word, simply because it was at hand and licensed…also I collect all manner of other data…

Because of the issues with M$ Word that you already mentioned, I have recently switched to…which is able to convert all my Word docs and, for the most part, preserve the formatting including the embedded images…however, the OpenOffice docs aren't recognised by DT…considering that OO is an open source project, I guess it would not be a major obstacle to integrate this file format…and it would certainly be very helpful…
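To illustrate why I guess this should be feasible: an OpenOffice text document is an ordinary ZIP archive with the text stored in a file called content.xml. Below is a rough sketch in Python of pulling the paragraphs out. The function names are my own invention, it matches paragraph and heading elements by local tag name only (an assumption, so that older and newer OO namespaces both work), and it ignores images and styling entirely.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_text(source):
    """Return the paragraphs of an OpenOffice text document.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    paragraphs = []
    for element in root.iter():
        # Paragraphs are <text:p>, headings are <text:h>.
        if local_name(element.tag) in ("p", "h"):
            text = "".join(element.itertext()).strip()
            if text:
                paragraphs.append(text)
    return paragraphs


def make_sample():
    """Build a tiny stand-in document in memory (not a complete, valid file)."""
    xml = (
        '<office:document-content '
        'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
        'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
        '<office:body><office:text>'
        '<text:h>Lab record</text:h>'
        '<text:p>Sample <text:span>42</text:span> looked fine.</text:p>'
        '</office:text></office:body></office:document-content>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for paragraph in extract_text(make_sample()):
        print(paragraph)
```

Text inside nested containers such as text boxes may show up twice with this approach, so treat it as a starting point rather than a proper importer.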

I have never even heard of Papyrus before…although I do like the sound of the hybrid PDF output…so I’ll check it out.

Keep well,