Bypassing defects of Data > Convert > to Rich Text

The current implementation (DT 2.0pb5) of Data > Convert > to Rich Text is useful but a little rough for HTML-to-rich-text conversion: italics, for example, are lost, as is any space between paragraphs.

Until this is fixed in DT2, I am bypassing Data > Convert > to Rich Text and using an AppleScript that automates the more fully developed HTML-to-RTF conversion built into TextEdit.

In case it is of use to anyone else: I select the HTML record in DEVONthink and run the following script:

-- BYPASS SOME DEFECTS OF DT 2.0pb5's built-in HTML to RTF conversion
-- (Data > Convert > to Rich Text drops italics, paragraph spacing etc.)

-- Select an HTML record and run this script
tell application id "com.devon-technologies.thinkpro2"
	set oHTMLRec to content record of front viewer window
	if oHTMLRec ≠ missing value then
		if type of oHTMLRec ≠ html then return
		set oRTF to my makeRTFcopy(oHTMLRec)
	end if
end tell

on makeRTFcopy(oRec)
	-- collect what we need from DEVONthink up front
	tell application id "com.devon-technologies.thinkpro2"
		set strSource to source of oRec
		set strName to name of oRec
		set oLocn to parent 1 of oRec
	end tell
	if length of strSource = 0 then return missing value
	
	-- save the HTML record source to a temporary file with an html extension
	set strTempFolder to (path to temporary items folder as string)
	set strTempPath to strTempFolder & "tempRec.html"
	set strRTFfile to strTempFolder & "tempRec.rtf"
	
	set oFile to open for access file strTempPath with write permission
	try
		set eof oFile to 0 -- discard anything left over from an earlier run
		write strSource to oFile as «class utf8»
		close access oFile
	on error errMsg number errNum
		close access oFile -- never leave the file open on failure
		error errMsg number errNum
	end try
	
	-- load the source into TextEdit and save it with an rtf extension
	tell application "TextEdit"
		set oDoc to open (strTempPath as alias)
		tell oDoc
			save in file strRTFfile
		end tell
		close oDoc saving no -- already saved as RTF; don't save again
	end tell
	
	-- remove the temporary html
	do shell script "rm -f " & quoted form of (POSIX path of strTempPath)
	
	-- create a new DT record by importing the rtf file
	-- (DEVONthink's import command expects a POSIX path, not an HFS path)
	tell application id "com.devon-technologies.thinkpro2"
		set oNewRec to import (POSIX path of strRTFfile) to oLocn
		set name of oNewRec to strName
	end tell
	
	-- remove the temporary rtf
	do shell script "rm -f " & quoted form of (POSIX path of strRTFfile)
	
	return oNewRec
end makeRTFcopy

Do you have an example? DEVONthink uses Mac OS X's own conversion to turn HTML into rich text, so it should probably behave the same way as TextEdit.

If you create a record with the source below, I think you will find that Data > Convert > to Rich Text strips out the italics and the space between paragraphs, whereas TextEdit preserves them.


<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/></head><body><I>Quotation:</I><P>To my view, the advantage of constraints can be explained as follows: On one  hand, a most natural "physical" & evolutionary constraint is placed on our brain in  terms of relative size: The human brain accounts only for 2% of the body mass, <b>but</b>  <u>nevertheless</u> consumes 20% of the total metabolism, which has to be maintained with  energy, i.e. with food (Roth & Dicke, 2005). Therefore, our brain is very ‘expensive’ in terms of energy supply. Increasing capacity in increasing brain size therefore does  not seem to be an evolutionary advantage, since in increasing brain size, also the  intake of fo  s of other organs has to be maintained.<P><I>Comment:</I><P>May be worth trying<P>{Jaeggi 2007@223}<P></body>

Please check the fonts under Preferences > Web. The defaults should be “Times 11” and “Courier 13”, both of which support italics.

That does the trick - thank you.