This will be a quick post to collect some screenshots.
I ran diff in the terminal (the macOS default Terminal) on a couple of files containing text in Kannada script, and the output was unreadable. So I tried installing a bunch of other terminals: turns out they’re all even worse.
Here’s the test case (just picking the real-world text I was trying to diff; it may contain OCR errors but is well-formed): the string “ತನ್ನ ಒಂದು ಸತ್ಯಸಂಕಲ್ಪದಂತೆ ಸೃಷ್ಟಿಯಲ್ಲಿ ವ್ಯವಸ್ಥೆಯಿಲ್ಲದೆ ತನ್ನ”. In plain text:
ತನ್ನ ಒಂದು ಸತ್ಯಸಂಕಲ್ಪದಂತೆ ಸೃಷ್ಟಿಯಲ್ಲಿ ವ್ಯವಸ್ಥೆಯಿಲ್ಲದೆ ತನ್ನ
It renders perfectly fine in, say, Emacs (GUI, not terminal):
Now observe what different terminals do (in roughly descending order of quality):
Terminal (the default macOS terminal):
iTerm2:
WezTerm:
Alacritty:
kitty:
(Note how some glyphs are missing, and text is cut off below a certain depth!)
Warp (a terminal that requires sign-in!):
Tabby:
Hyper:
Contour:
This was surprising! Filed bug/request here: after the fix, it looks like:
What’s going on?
Well I’ve exceeded my time budget for this post, so I’ll leave discussion of wcwidth and all that for another time. In the meantime, here’s something I found that discusses the issue somewhat: a Unicode Technical Committee “Text Terminal Working Group” formed in 2023:
Looking at the Devanagari in the screenshot in the second link, it appears that support is slightly better on Linux (Konsole/Gnome-Terminal, and especially mlterm), but still rather buggy.
Edit: Here are a couple of other unconventional approaches that work fine. Maybe I’ll start using one of them as my primary terminal?
Jupyter notebook, running in the browser:
(It would be annoying to have to prefix ! before each command, but I believe there are Jupyter kernels for bash and zsh, though I haven’t tried either of them yet.)
Emacs M-x shell (similar with M-x eshell):
Edit (2024-07-14): I’ve started using eshell as my primary shell (at least when working with files that have non-ASCII filenames), and am very happy with it.
A bit about what’s going on is in my comments here. In short, it goes something like this:
- Terminal emulators (unlike eshell and Jupyter notebook above) have a notion of a “grid of cells” where a certain “cell” can be addressed by row and column. (For example, a terminal program like Vim or anything that uses ncurses may want to “draw” something at a certain cell position. Also, there is an assumption that cursor movement will only be by integer multiples of cells.) Needless to say, this pretty much only works in the case of monospaced text in ASCII (or just a bit more).
- For scripts that require complex text layout (like Indic scripts), one needs to leave things to the font and shaping engine (like HarfBuzz / DirectWrite / Core Text), which can give widths of glyphs that are not integer multiples of the “cell” width. All the problematic terminals above seem to be trying to force these glyph widths to the grid in a dumb way, either rounding up (Terminal, iTerm2) or forcing each grapheme or glyph to one cell (and having text overlap). None of these approaches will (or even can?) work with Indic text.
- This means that only a shell / command-line interpreter (like eshell mentioned above) that does not pretend to be a terminal emulator (i.e. does not try to force everything to a grid) can properly render Indic text. (It may be possible to get something acceptable by scaling glyphs, as I suggested in the above comment, but sadly no terminal emulator has implemented this.)
(A thought I had is that because terminal-emulator developers are likely not familiar with non-Latin scripts, it may be interesting to ask them to properly support setting your terminal font to a variable-width one (i.e. like Helvetica or Garamond: not a monospaced font), and having things like em and en dashes and hair spaces etc. work properly. That would break the grid-of-cells assumption, and then maybe Indic scripts will work too, the way emojis led to Unicode support.)
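To make the grid mismatch described above a bit more concrete, here is a rough Python sketch. It is not how any particular terminal works: naive_cell_width is a simplified stand-in I made up for a wcwidth-style per-code-point estimate, snap_to_cells is a made-up helper for the "rounding up" behaviour, and the pixel advances are invented numbers rather than anything measured from a real font.

```python
import math
import unicodedata

# First word of the test string, spelled with escapes so the code points are
# explicit: TA (U+0CA4), NA (U+0CA8), VIRAMA (U+0CCD), NA (U+0CA8).
WORD = "\u0ca4\u0ca8\u0ccd\u0ca8"


def naive_cell_width(text: str) -> int:
    """Simplified wcwidth-style estimate: one width per code point, no shaping."""
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # combining marks and format characters get no cell
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


def snap_to_cells(advances_px, cell_px):
    """Round each shaped advance up to a whole number of cells."""
    return [math.ceil(advance / cell_px) for advance in advances_px]


print(len(WORD))               # number of code points
print(naive_cell_width(WORD))  # cells according to the per-code-point estimate

# Hypothetical advances (in pixels) for two shaped clusters, with an 8 px cell.
advances = [13.0, 17.5]
cells = snap_to_cells(advances, cell_px=8.0)
print(cells)
# Extra blank space introduced by snapping each cluster to the grid.
print([n * 8.0 - a for n, a in zip(cells, advances)])
```

The per-code-point estimate only sees three spacing characters and one combining mark; it knows nothing about how the font will actually shape the conjunct. And whatever advances the shaping engine reports, rounding them up to whole cells leaves visible gaps, while forcing them into fewer cells makes text overlap.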