The sorry state of terminal support for Indic scripts (on macOS)

This will be a quick post to collect some screenshots.

I ran diff in the terminal (the macOS default Terminal) on a couple of files containing text in Kannada script, and the output was unreadable. So I tried installing a bunch of other terminals: turns out they’re all even worse.

Here’s the test case (just picking the real-world text I was trying to diff; it may contain OCR errors but is well-formed): the string “ತನ್ನ ಒಂದು ಸತ್ಯಸಂಕಲ್ಪದಂತೆ ಸೃಷ್ಟಿಯಲ್ಲಿ ವ್ಯವಸ್ಥೆಯಿಲ್ಲದೆ ತನ್ನ”. In plain text:

ತನ್ನ ಒಂದು ಸತ್ಯಸಂಕಲ್ಪದಂತೆ ಸೃಷ್ಟಿಯಲ್ಲಿ ವ್ಯವಸ್ಥೆಯಿಲ್ಲದೆ ತನ್ನ

It renders perfectly fine in, say, Emacs (GUI, not terminal):

Now observe what different terminals do (in roughly descending order of quality):

Terminal (the default macOS terminal):





(Note how some glyphs are missing, and text is cut off below a certain depth!)

Warp (a terminal that requires sign-in!):




This was surprising! I filed a bug/request here; after the fix, it looks like:

What’s going on?

Well, I’ve exceeded my time budget for this post, so I’ll leave discussion of wcwidth and all that for another time. In the meantime, here’s something I found that discusses the issue somewhat: a Unicode Technical Committee “Text Terminal Working Group” formed in 2023:

Looking at the Devanagari in the screenshot in the second link, it appears that support is slightly better on Linux (Konsole/Gnome-Terminal, and especially mlterm), but still rather buggy.

Edit: Here are a couple of other unconventional approaches that work fine. Maybe I’ll start using one of them as my primary terminal?

Jupyter notebook, running in the browser:

(It would be annoying to have to prefix ! before each command, but I believe there are Jupyter kernels for bash and zsh, though I haven’t tried either of them yet.)

Emacs M-x shell (similar with M-x eshell):

Edit (2024-07-14): I’ve started using eshell as my primary shell (at least when working with files that have non-ASCII filenames), and am very happy with it.

A bit about what’s going on is in my comments here. In short, it goes something like this:

(A thought I had: because terminal-emulator developers are likely not familiar with non-Latin scripts, it may be interesting to ask them instead to properly support setting the terminal font to a variable-width one (e.g. Helvetica or Garamond, rather than a monospaced font), with things like em and en dashes and hair spaces working properly. That would break the grid-of-cells assumption, and then maybe Indic scripts would work too, the way emoji led to Unicode support.)
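To make the grid-of-cells assumption a bit more concrete, here is a rough Python sketch of the kind of per-code-point width counting that a wcwidth-style approach does. This is an illustration only, not the logic of any particular terminal: the function name `naive_cell_width` and its rules (zero cells for non-spacing marks and format characters, two for East Asian wide characters, one otherwise) are simplifying assumptions.

```python
import unicodedata


def naive_cell_width(text: str) -> int:
    """Hypothetical wcwidth-style count: one decision per code point.

    Assumed rules (a simplification, not any real terminal's table):
    non-spacing/enclosing marks and format characters take 0 cells,
    East Asian wide/fullwidth characters take 2, everything else 1.
    """
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


# The first word of the test string: TA, NA, VIRAMA, NA.
word = "\u0ca4\u0ca8\u0ccd\u0ca8"  # ತನ್ನ

for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

print("code points:", len(word))
print("naive cells:", naive_cell_width(word))
```

Under these assumed rules the word is counted as three cells (the virama contributes none), yet on the page it is two units: ತ followed by the conjunct ನ್ನ, whose second ನ is drawn as a subscript. How wide the shaped result actually is depends on the font and the shaping engine, which a per-code-point table cannot know.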