In DT’s Markdown editor, certain Unicode characters require two backspaces to delete instead of one.
In some instances, if the user presses backspace once instead of twice, the Unicode character ceases to be visible. However, the document cannot be saved in this state, even if ⌘S is pressed manually, presumably because the “remnant” of that character corrupts the text. As a result, data can be lost without any warning. I have actually lost my edits a couple of times this way.
How to reproduce
Scenario A (a quirk)
1. Create a new Markdown document and use Source view.
2. Type “Test☯️” in the editor. (Note: the emoji form of the yin yang character is used.)
3. Backspace once and save. The editor still shows “Test☯️”.
4. Switch to Preview and then back to Source. Both views now show “Test☯” (the plain Unicode character YIN YANG). This is an unexpected change.
It turns out that two backspaces are needed to fully delete ☯️, whereas only one is needed to delete ☯. Seems like an inconsistency.
Scenario B (a serious bug)
1. Create a new Markdown document and use Source view.
2. Type “Test💟” in the editor. (Note: the Unicode character HEART DECORATION is used.)
3. Backspace once and press ⌘S. The symbol disappears from the editor, as the user would expect. The menu bar item “Data” flashes once.
4. Close the document and then reopen it. It is now empty! The previous edit has been lost.
5. Redo steps B1 through B3. In this state you cannot switch between views (Preview/Source/Two panes), which is one of only two implicit indications that something has gone wrong. The other is the file size (still 0 bytes after B3), but that is not something one often pays attention to.
Comments
These are not the only unicode characters that trigger unexpected behaviors.
The data loss possibility is particularly nasty for two reasons:
(1) There is no explicit indication that DT cannot save a document containing a “half-deleted” Unicode symbol.
(2) The “half-deleted” symbol leaves no visually discernible trace in the editor.
This bug does not seem to occur in the plain text editor. Only the Markdown editor is affected.
How to prevent
As users, for the time being, we’d better backspace twice when removing a unicode symbol in markdown, just to be sure.
Basically, it always says “/Users/xxx/Library/Application Support/DEVONthink 3/Inbox.dtBase2/Files.noindex/md/37/New Markdown Text 7.md: Unicode text, UTF-8 text, with no line terminators” when there is a Unicode character present.
When I fully delete the unicode character, leaving only 4 bytes (“Test”), it says /Users/xxx/Library/Application Support/DEVONthink 3/Inbox.dtBase2/Files.noindex/md/37/New Markdown Text 7.md
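For anyone who wants to repeat this kind of check from a script, here is a rough Python sketch. It is not what produced the output above: the function name inspect_text_file and the sample file names are made up for illustration, and the files are throwaway temporary files rather than real DEVONthink paths. It reports the byte count and whether the bytes are plain ASCII, valid UTF-8, or neither.

```python
import tempfile
from pathlib import Path


def inspect_text_file(path):
    """Return (size in bytes, rough classification) for a text file."""
    data = Path(path).read_bytes()
    if not data:
        return 0, "empty"
    # Try the stricter encoding first, then fall back to UTF-8.
    for encoding, label in (("ascii", "ASCII text"), ("utf-8", "UTF-8 text")):
        try:
            data.decode(encoding)
            return len(data), label
        except UnicodeDecodeError:
            continue
    return len(data), "not valid UTF-8"


# Demo on throwaway files (hypothetical names, not real DEVONthink paths).
samples = {
    "with_heart.md": "Test\U0001F49F",
    "plain.md": "Test",
    "empty.md": "",
}
results = {}
with tempfile.TemporaryDirectory() as folder:
    for name, text in samples.items():
        path = Path(folder) / name
        path.write_bytes(text.encode("utf-8"))
        results[name] = inspect_text_file(path)

for name, (size, kind) in results.items():
    print(f"{name}: {size} bytes, {kind}")
```

An “empty” result right after pressing ⌘S would match the 0-byte symptom described in Scenario B.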
The video shows that when the heart symbol is inserted, the text “Test” moves downwards a little bit, presumably because the unicode symbol has a larger height. When the symbol is removed with one backspace, the text does not move upwards back to its original position. The document in this state could not be saved.
I see similar behavior with BBEdit.
The first character (yin yang) consists of six bytes in UTF-8 (E2 98 AF EF B8 8F).
In Unicode terms it is a sequence of two code points, U+262F U+FE0F.
The second one (heart decoration) consists of four bytes (F0 9F 92 9F, U+1F49F).
A single backspace after yin yang gives the b/w yin yang symbol (E2 98 AF in UTF-8, U+262F in Unicode). Thus, it apparently deletes three bytes from the back of the character.
In BBEdit, a backspace after the heart decoration removes the symbol entirely.
The three bytes removed from the yin yang encode “Variation Selector-16” (U+FE0F), which requests emoji presentation for the preceding character. So, it makes sense that removing it gives you the ordinary yin yang in black/white.
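The byte counts above can be checked with a few lines of Python. This is only a sketch for inspecting the two characters; the helper name describe is made up here and has nothing to do with DT or BBEdit.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, UTF-8 bytes in hex) per code point."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"
        name = unicodedata.name(ch, "<unnamed>")
        utf8_hex = ch.encode("utf-8").hex(" ").upper()
        rows.append((code_point, name, utf8_hex))
    return rows


yin_yang_emoji = "\u262F\uFE0F"    # yin yang followed by Variation Selector-16
heart_decoration = "\U0001F49F"

for label, text in (("yin yang", yin_yang_emoji),
                    ("heart decoration", heart_decoration)):
    print(f"{label}: {len(text.encode('utf-8'))} bytes in UTF-8")
    for row in describe(text):
        print("   ", *row)

# Dropping the last code point of the yin yang sequence leaves plain U+262F.
print(describe(yin_yang_emoji[:-1]))
```

The yin yang shows up as two code points of three bytes each, the heart as one code point of four bytes, which is why a single backspace can leave something behind in the first case.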
It reminds me of Apple’s usage of decomposed umlauts: in every other OS, a ü is a single precomposed character. In macOS/iOS/iPadOS, it is composed of a u and a combining diaeresis (¨). In some situations, you have to press backspace twice to get rid of the ü. PITA.
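The difference between the two forms of ü is easy to see with Python’s unicodedata module. This is just an illustration of Unicode normalization (NFC vs. NFD), not a statement about what any particular editor does internally.

```python
import unicodedata

composed = "\u00FC"                                   # ü as one code point
decomposed = unicodedata.normalize("NFD", composed)   # u + combining diaeresis

print([f"U+{ord(c):04X}" for c in composed])
print([f"U+{ord(c):04X}" for c in decomposed])

# Removing only the last code point of the decomposed form leaves a bare "u",
# which is roughly what a "first backspace" looks like.
after_one_backspace = decomposed[:-1]
print(after_one_backspace)

# Recomposing gives back the single-code-point form.
print(unicodedata.normalize("NFC", decomposed) == composed)
```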
Now, for the heart – that’s a weird thing. I can copy the heart from your post and paste it into a new MD document in DT. If I duplicate that document, it is empty. Nothing.
OTOH, if I type the heart and then “test” on a new line and delete the heart, the MD file is kind of ok (i.e. it still contains “test”). It displays a bit weird sometimes, though.
So, I’d say the problem with the “heart decoration” is a bug. The one with the yin yang is a nuisance. But Unicode is complicated…
I agree with this. It’s not a big deal if a few emotes refuse to behave as they should. The real issue is that a half-deleted character could result in data loss, as I have explained in my original post. The issue is compounded by the fact that it’s EXTREMELY EASY to misselect emojis, instead of words, when using the Simplified Chinese (Pinyin) input method.
If you are dealing with a markdown document that cannot save, there is a possibility that it’s caused by an invisible half-deleted unicode character. Better not attempt to find out where the invisible culprit is, since it is, well, invisible.
The practical solution is to screenshot the editor pane (use multiple screenshots if needed), extract text from the screenshot(s), and then paste the extracted text into a new document (or optionally, overwrite the current document).
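As for why a half-deleted heart might block saving at all, here is one possible explanation, offered strictly as an assumption since nothing in this thread confirms how DT’s editor handles the keystroke. Cocoa strings are UTF-16 based, and U+1F49F needs two UTF-16 code units (a surrogate pair). If a single backspace removed only the second unit, the first would be left behind as a lone surrogate, which cannot be written out as UTF-8. The Python sketch below only demonstrates that encoding rule; the helper names are made up.

```python
heart = "\U0001F49F"

# UTF-16 stores U+1F49F as two 16-bit code units (a surrogate pair).
utf16 = heart.encode("utf-16-be")
units = [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)]
print(units)

# ASSUMPTION: a single backspace removed only the trailing code unit,
# leaving the leading (high) surrogate behind on its own.
high_surrogate = chr(int.from_bytes(utf16[:2], "big"))
half_deleted = "Test" + high_surrogate


def can_encode_utf8(text):
    """True if the text can be written out as UTF-8."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def lone_surrogate_positions(text):
    """Indices of surrogate code points, which are never valid on their own."""
    return [i for i, ch in enumerate(text) if 0xD800 <= ord(ch) <= 0xDFFF]


print(can_encode_utf8("Test" + heart))
print(can_encode_utf8(half_deleted))
print(lone_surrogate_positions(half_deleted))
```

If that assumption holds, it would also explain why the leftover is invisible: a lone surrogate has no glyph of its own.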