MHT file format

jebster · November 26, 2009, 2:44pm

Dear all,
Has anyone encountered this situation, and ideally can suggest a workaround? I am importing my research notes into Devonthink Pro Office: a collection of thousands of files, including pdf, doc, and quite a few webpages saved in the mht format. Most of these files were created on Windows XP; the mht files in particular are single-file web archives saved by Internet Explorer. The pdf and doc files are imported without problems, but the mht files are reported as an “unknown file format”.

I have thought about converting the mht files to pdf, but that conversion is hit and miss unless you check the settings and the result for each file (text that is cut off, large whitespace gaps, etc.). That is easy enough for a few files, but not when you reach a few hundred or even a few thousand…

Is there a way to add mht file support to OSX, that would also then enable Devonthink to read these files? Or is there another way?

Many thanks for any assistance and ideas!

cgrunenberg · November 27, 2009, 11:04am

Opera should be able to open MHT files. In addition, there’s a utility to extract the contents of MHT files: see macupdate.com/info.php/id/16101/file-juicer

jebster · November 27, 2009, 3:40pm

Thanks very much for the reply. Yes, indeed, Opera will work for viewing MHT files, and File Juicer will extract the contents into a folder with HTML and related (images, stylesheets, javascript, etc.).

For those who come across this thread and are also looking for a solution for MHT files, I’ve also come across unMHT for Firefox as well as QuickLook and Spotlight. The latter allows Devonthink to show a preview. I notice that for some of my MHT files (ones without images, maybe?) Devonthink recognizes the file as an MHTML file without going through unMHT (which shows a smaller thumbnail preview).

But so far I have found nothing that will allow Devonthink to index the contents of the MHT. Unless I’m missing something?

So I think I’m reluctantly looking at conversion. The File Juicer option would leave me with a bunch of files in a folder, and avoiding that was the reason I started saving single-file MHTs in the first place. If those are the only two options, I’ll have to choose between a folder of loose files and the uncertain formatting of the PDFs. Either way, Devonthink would be able to index the content, which is the most important thing.
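
A possible middle ground, in case it helps anyone: an MHT file is a MIME message (the same container format as email), so the main HTML page can be pulled out with a short script and saved as one .html file per archive. What follows is only a rough sketch using Python's standard email module. The function names `extract_html` and `convert_folder` are made up for illustration, and I am assuming the first text/html part is the page itself. Embedded images are not carried over, so this only addresses text indexing, not faithful rendering.

```python
import email
from pathlib import Path


def extract_html(mht_bytes: bytes) -> bytes:
    """Return the first text/html part of an MHT (MIME) archive.

    The transfer encoding (quoted-printable or base64) is decoded, but the
    bytes stay in the page's original charset.
    """
    msg = email.message_from_bytes(mht_bytes)
    # walk() also yields the message itself, so single-part archives work too.
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True)
            if payload is not None:
                return payload
    raise ValueError("no text/html part found")


def convert_folder(src: Path, dst: Path) -> int:
    """Write one .html file into dst for each .mht file in src.

    Returns the number of files converted; unreadable archives are skipped.
    """
    dst.mkdir(parents=True, exist_ok=True)
    converted = 0
    for mht in sorted(src.glob("*.mht")):
        try:
            html = extract_html(mht.read_bytes())
        except ValueError as err:
            print(f"skipped {mht.name}: {err}")
            continue
        (dst / (mht.stem + ".html")).write_bytes(html)
        converted += 1
    return converted
```

Something like `convert_folder(Path("notes_mht"), Path("notes_html"))` (both folder names hypothetical) would then leave the originals untouched and produce a parallel folder of HTML files to import. Whether the resulting pages look right in Devonthink would need checking on a few samples first.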

But the best solution would be for Devonthink to recognize MHT. I’m a bit baffled because it does seem to recognize MHTML. Just not MHT. Even so, it does NOT index the contents of the MHTML, which leaves me back at square one…

Any further ideas or opinions on the conversion option are welcome!