… that would certainly set it apart from similar programs.
For all I know, there may be a secret regex that accomplishes this. Imagine hitting CMD-F, typing in the magic code (or checkbox), and then replacing all images with nothing.
Wow, would that save time when removing dozens of avatars embedded in forum text, or what? I often select and clip—using CMD-) to Take Rich Note—a long swathe of Reddit text. It’s such a nightmare to delete 20+ embedded images that I’ve actually considered doing a plain text clip instead and losing many important links (along with bold and italics that might make a difference).
Anyway, if such power does not exist, it would definitely attract public interest if it came into existence, say, in the next update. Holy cow!
Too bad the Clutter-Free checkbox for whole-webpage capture (which I never use) does not also, optionally, apply to selected-region capture via CMD-) and strip the images there.
The days of avatar and other crap-image clutter are upon us. We have faith that DevonThink can conquer this foe!
It’s not very difficult to remove images from HTML programmatically. Something along these lines should do the trick:
/* Regular expression matching <img> tags in HTML */
const RE = /<img[^>]*>/g;
const app = Application("DEVONthink 3");
/* Get all selected records and loop over them */
app.selectedRecords().forEach(r => {
  /* Get the type of the record and skip everything that is not HTML */
  const type = r.type();
  if (type === "html") {
    /* Remove all occurrences of the RE in the current record. */
    r.plainText = r.plainText().replaceAll(RE, "");
  }
});
Note: Code is not tested at all and requires at least one HTML record to be selected.
Removing images from RTF(D)
I’m uncertain if DT ever creates an RTF file with embedded images. When I clipped a web document in RTF format, I got an RTFD, which is in fact a folder. What one could do (and that’s really quite convoluted):
1. Get the path of the “document” (in fact, the path of the folder).
2. Export that folder to some temporary place on the computer (like /tmp).
3. Open the file TXT.rtf in this folder.
4. In that file, remove all references to images. Note that they might also come in the form of URLs to external images like avatars.
5. Write the modified contents back to disk.
6. Import this RTF file again into DT.
7. Remove the folder from /tmp and the old RTFD from DT.
As I said: Convoluted, but not impossible.
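The "remove all references to images" step could look roughly like the JavaScript sketch below. It rests on an assumption that is not checked against DT's output: that each attachment in TXT.rtf is written as a group of the form {{\NeXTGraphic filename ...}¬}. The helper name stripRtfdAttachments and the sample string are made up for illustration. Reading and writing the file, deleting the image files from the folder, and references to external image URLs are not covered.

```javascript
// Assumption: each attachment in TXT.rtf is a group such as
//   {{\NeXTGraphic avatar.png \width32 \height32 ...}¬}
// The pattern matches from "{{\NeXTGraphic" through the second closing brace.
const ATTACHMENT = /\{\{\\NeXTGraphic\b[^}]*\}[^}]*\}/g;

// Hypothetical helper: returns the RTF text without attachment groups.
function stripRtfdAttachments(rtf) {
  return rtf.replace(ATTACHMENT, "");
}

// Made-up sample, not taken from a real record.
const sampleRtf = "Hello {{\\NeXTGraphic avatar.png \\width32 \\height32\n}\u00AC} world";
console.log(JSON.stringify(stripRtfdAttachments(sampleRtf)));
```

Ordinary RTF groups such as {\b bold} are left alone because the pattern only fires on the literal \NeXTGraphic control word.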
More than you asked for
Also, this will remove all images. Which is not necessarily helpful if someone included an image to make a point (as opposed to get on your nerves with an avatar).
So, you might as well just save the web document as text (or markdown), perhaps.
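If the clip stays HTML, the blanket regex could be narrowed so that only avatar-like images go and the meaningful ones stay. A minimal sketch, assuming avatars can be recognised by the word "avatar" somewhere in the tag (class name, file name or URL). That heuristic and the name removeAvatarImages are assumptions for illustration; real forum markup may well need a different test.

```javascript
// Matches any <img ...> tag, regardless of case.
const IMG_TAG = /<img\b[^>]*>/gi;
// Hypothetical heuristic: a tag that mentions "avatar" is treated as an avatar.
const AVATAR_HINT = /avatar/i;

// Removes only the <img> tags that look like avatars; other images are kept.
function removeAvatarImages(html) {
  return html.replace(IMG_TAG, tag => (AVATAR_HINT.test(tag) ? "" : tag));
}

// Made-up sample markup.
const sampleHtml =
  '<p><img class="avatar" src="a.png">See <img src="chart.png" alt="chart"></p>';
console.log(removeAvatarImages(sampleHtml));
```

The function works on a plain string, so it could be applied to whatever text the record script above reads and writes.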
This script creates RTF records from selected RTFD records.
Note: This script creates new records. They have a new UUID, i.e. existing links still link to the original record and NOT to the new one.
(It is possible to write the RTF data back into the original record; however, DEVONthink then doesn’t know that the record has changed and still shows its type as RTFD. If one afterwards opens the record and changes its content (e.g. by typing a space and deleting it), DEVONthink changes the type to RTF. If someone wants to use such a script, let me know.)
-- Create RTF records from RTFD
-- Note: This script creates new records. They have a new UUID, i.e. existing links still link to the original record and NOT to the new one.

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

property moveOriginalRecordToTrash : true

tell application id "DNtp"
	try
		set theRecords to selected records
		if theRecords = {} then error "Please select some RTFD records."
		repeat with thisRecord in theRecords
			set thisRecord_Type to (type of thisRecord) as string
			if thisRecord_Type is in {"rtfd", "«constant ****rtfd»"} then
				set thisRecord_Path to path of thisRecord
				my importRTFVersion(thisRecord_Path, thisRecord)
			end if
		end repeat
	on error error_message number error_number
		if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
		return
	end try
end tell

on importRTFVersion(theRecord_Path, theRecord)
	try
		set theURL to current application's NSURL's fileURLWithPath:theRecord_Path
		set {theAttributedString, theError} to current application's NSAttributedString's alloc()'s initWithURL:theURL options:(missing value) documentAttributes:(missing value) |error|:(reference)
		if theError ≠ missing value then error (theError's localizedDescription() as string)
		set theAttributedString_Range to {location:0, |length|:theAttributedString's |length|()}
		set theDocumentAttributesRTF to {NSDocumentTypeDocumentAttribute:(current application's NSRTFTextDocumentType)}
		set theData to (theAttributedString's RTFFromRange:(theAttributedString_Range) documentAttributes:theDocumentAttributesRTF)
		set theTempDirectoryURL to my createTempDirectory()
		set theTempURL to ((theTempDirectoryURL's URLByAppendingPathComponent:(current application's NSProcessInfo's processInfo()'s globallyUniqueString()))'s URLByAppendingPathExtension:"rtf")
		set {successWriteRTF, theError} to (theData's writeToURL:theTempURL options:(current application's NSDataWritingAtomic) |error|:(reference))
		set theTempPath to (theTempURL's |path|()) as string
		tell application id "DNtp"
			try
				set theImportedRecord to import theTempPath name (name without extension of theRecord) to (location group of theRecord)
				tell theImportedRecord
					set aliases to aliases of theRecord
					set comment to comment of theRecord
					set creation date to creation date of theRecord
					try
						set custom meta data to custom meta data of theRecord
					end try
					set exclude from search to exclude from search of theRecord
					set exclude from see also to exclude from see also of theRecord
					set exclude from Wiki linking to exclude from Wiki linking of theRecord
					set label to label of theRecord
					set locking to locking of theRecord
					set rating to rating of theRecord
					set state to state of theRecord
					set tags to tags of theRecord
					try
						set thumbnail to thumbnail of theRecord
					end try
					set unread to unread of theRecord
					set URL to URL of theRecord
				end tell
				if moveOriginalRecordToTrash then
					move record theRecord to trash group of (database of theRecord)
				end if
			on error error_message number error_number
				if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
				return
			end try
		end tell
		set {successDeleteDir, theError} to (current application's NSFileManager's defaultManager()'s removeItemAtURL:(theTempDirectoryURL) |error|:(reference))
		if theError ≠ missing value then error (theError's localizedDescription() as string)
	on error error_message number error_number
		activate
		if the error_number is not -128 then display alert "Error: Handler \"importRTFVersion\"" message error_message as warning
		current application's NSFileManager's defaultManager()'s removeItemAtURL:(theTempDirectoryURL) |error|:(missing value)
		error number -128
	end try
end importRTFVersion

on createTempDirectory()
	try
		set theTempDirectoryURL to current application's |NSURL|'s fileURLWithPath:((current application's NSTemporaryDirectory())'s stringByAppendingPathComponent:("" & space & (current application's NSProcessInfo's processInfo()'s globallyUniqueString())))
		set {successCreateDir, theError} to current application's NSFileManager's defaultManager's createDirectoryAtURL:theTempDirectoryURL withIntermediateDirectories:false attributes:(missing value) |error|:(reference)
		if theError ≠ missing value then error (theError's localizedDescription() as string)
		return theTempDirectoryURL
	on error error_message number error_number
		activate
		if the error_number is not -128 then display alert "Error: Handler \"createTempDirectory\"" message error_message as warning
		error number -128
	end try
end createTempDirectory
No, I was not thinking about OCR. Rather: does it copy the content (aka bytes) of the images into the RTF, like a formatted note in DT does? Apparently not.
Btw: what’s the difference here between RTF and RTFD? Afaict, RTF would allow referencing or embedding images, too.
WordService provides a service to remove attachments. In addition, the hidden preference RichNotesWithoutAttachments makes it possible to always capture RTF and never RTFD.
Thank you so much for this. I’m a DIY guy and should have considered using emacs to scan through RTFD documents, kill the image-inclusion bits, and delete the associated files in the hidden directory.
It works! Amazing script. I used to program in Obj-C (command-line tools only) and still to this day have not yet included the Foundation framework in any of my scripts and taken them to the next level. Lovely work. Thanks.