To simply use TeX, you don’t need to know anything about how the TeX program itself is implemented.
If you do want to read the source code of the TeX program, you can simply do so, as it has been published in book form:
and is available as a PDF:
texdoc tex, if you have a TeX distribution installed, or else find it online here (3.4 MiB).
However, neither is perfect:
(Book ≫ PDF) The book has a useful introduction and appendices (including a diagram) that are not present in the PDF, and more importantly every page-spread (pair of facing pages) has a mini-index for all the identifiers that occur on that pair of pages.
(PDF ≫ book) The PDF has (some) clickable cross-references, which in the book require manually turning pages.
This webpage will attempt to provide the best of both worlds, and eventually to explain the program and make it easily understandable. But it is still under construction, and will take some time.
In the meantime, here are some pages from the book that are not in the PDF (scans for now; hope to transcribe and/or create equivalents):
Preface (page v)
How to Read a WEB (pages viii-xiii – temporarily taken down till I can double-check permission issues)
Chart of TeX’s code components (dark) and memory regions (light+dark) (page 594), also available here (via this in TUGboat – but note that the middle section is scaled by amount of code as Knuth says in the videos, and not by memory structure as the caption in TUGboat says: scaling by memory is only for the outer sections).
Additionally, to become familiar with the WEB style of writing programs, you may also want to read:
“Literate Programming” by Donald Knuth, The Computer Journal Volume 27 (1984), pages 97 to 111.
The WEB manual: The WEB system of Structured Documentation (also available with
Many articles available on the excellent literateprogramming.com website (I wonder who is behind it?), e.g. An Introduction to the WEB Style of Literate Programming by Bart Childs.
Wayne Sewell’s book Weaving a Program: Literate Programming in Web
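For a rough feel of the idea behind the readings above, here is a minimal sketch of the core step TANGLE performs: replacing references to named sections with their bodies. It is a toy under stated assumptions, not how TANGLE is implemented. The `@<Name@>` reference syntax is real WEB, but the `SECTIONS` dictionary, the `expand` function, and the choice of a named section called "Program" as the starting point are hypothetical simplifications (real WEB programs start from unnamed sections, allow a name to be defined in several pieces, and also handle macros and more).

```python
import re

# Toy model of a WEB file: each named section maps to its Pascal body.
# In real WEB source, a section is defined with @<Name@>= and used with @<Name@>.
SECTIONS = {
    "Program": "program hello(output);\nbegin\n  @<Print the greeting@>;\nend.",
    "Print the greeting": "writeln('Hello, world!')",
}

# Matches a section reference such as @<Print the greeting@>.
REF = re.compile(r"@<(.+?)@>")


def expand(name, sections, stack=()):
    """Return the body of `name` with all section references expanded."""
    if name in stack:
        raise ValueError(f"circular reference to section {name!r}")
    body = sections[name]
    # Recursively substitute each reference with the expanded section body.
    return REF.sub(
        lambda m: expand(m.group(1), sections, stack + (name,)), body
    )


tangled = expand("Program", SECTIONS)
print(tangled)
```

The point of the sketch is only that the order in which code is presented to the reader (small named pieces, explained one at a time) is independent of the order in which the compiler eventually sees it.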
Then, you may want to work your way up to TeX from smaller programs written by Knuth in the same style: I’ve prepared a List of WEB files. Specifically, a possible reading order, from smallest to largest, is:
| Program | Pages | Sections | Fraction of TeX |
|---|---|---|---|
| POOLTYPE | 7 + 4 | 20 + 2 | ≈ 1.4% to 2% |
| GLUE | 8 + 3 | 26 + 1 | ≈ 1.6% to 2% |
| DVITYPE | 47 + 7 | 111 + 2 | ≈ 8% to 10% |
| TANGLE | 66 + 9 | 187 + 2 | ≈ 13.5% to 14% |
| WEAVE | 98 + 12 | 263 + 2 | ≈ 19% to 20.5% |
| TEX | 478 + 57 | 1378 + 2 | 100% |
POOLTYPE and DVITYPE share a lot of their code with TeX (specifically, TeX’s string handling and DVI output sections respectively),
GLUE shows alternatives to some code in TeX (and is probably best read after understanding the corresponding parts of TeX, even though it was published first),
TANGLE and WEAVE are the implementation of WEB, and perhaps worth reading as programs of smaller/intermediate size.
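The “Fraction of TeX” column in the table appears to be consistent with simple ratios of the page and section counts against those of TEX. The sketch below recomputes such ranges from the table's numbers. How the column was actually derived is an assumption here, as is the reading of the number after each “+” as index or other end material; the names `COUNTS` and `fraction_range` are made up for illustration.

```python
# Counts copied from the table: (pages, extra pages, sections, extra sections).
# Assumption: the number after each "+" counts index or other end material.
COUNTS = {
    "POOLTYPE": (7, 4, 20, 2),
    "GLUE": (8, 3, 26, 1),
    "DVITYPE": (47, 7, 111, 2),
    "TANGLE": (66, 9, 187, 2),
    "WEAVE": (98, 12, 263, 2),
    "TEX": (478, 57, 1378, 2),
}


def fraction_range(name, counts=COUNTS):
    """Return (lowest, highest) size ratio of `name` relative to TEX."""
    pages, extra_pages, sections, extra_sections = counts[name]
    t_pages, t_extra_pages, t_sections, t_extra_sections = counts["TEX"]
    ratios = [
        pages / t_pages,
        (pages + extra_pages) / (t_pages + t_extra_pages),
        sections / t_sections,
        (sections + extra_sections) / (t_sections + t_extra_sections),
    ]
    return min(ratios), max(ratios)


for name in COUNTS:
    low, high = fraction_range(name)
    print(f"{name:9s} {low:6.1%} to {high:6.1%}")
```

Whichever measure is used, the ordering stays the same, which is what matters for choosing a reading order.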
I have separate pages for each of these programs on this site:
Totally unrelated to TeX, but you could look at other “literate programs” entirely: Knuth’s CWEB programs, or the (Academy Award winning!) Physically Based Rendering book (see random chapter).
You can read Jensen and Wirth (“Pascal user manual and report”), the original tutorial and definition of Pascal.
An interesting document is Kernighan (the K in K&R) on “Why Pascal is Not My Favorite Programming Language”. Kernighan faced many difficulties with Pascal, and WEB in many ways is a solution to those same difficulties (I wrote more on that at the beginning and end of the annotated version of POOLTYPE).
You can try writing a few of your own small programs in Pascal (with and without WEB), as with my 7-page “Hello, world!” program here.
You can try reading some Pascal programs written by others, so that you at least get past the idiosyncrasies that are common to all Pascal programs. My suggestions are:
Note that these two programs are both compilers. TeX itself is written like a compiler, so you’ll find many similarities.
Instead of reading the Pascal source as written by Knuth, there are many other versions you could read. I’ve collected a bunch here.
Many (not all!) are only of historical interest.
The most relevant may be LuaTeX, which is under active development, and started with a manual translation of the Pascal code to C. For (a randomly picked?) example, compare section 426 of the TeX program with this part of LuaTeX — it is in more familiar C style, but there are also more cases. (Compare with pdfTeX and XeTeX.)
Finally, after having read these smaller programs and having gained a bit of familiarity with Pascal, before reading the full TeX program I strongly recommend the series of 12 lectures that Knuth gave in 1982 called The Internal Details of TeX82. They are available on YouTube, but I’ve embedded them on this website, with some comments, here.
Also, as you read the program / watch the videos, you can try solving these exercises from a course DEK gave about TeX:
It is unclear how much this would help, but often the earlier versions of a program are less complex, or at least illuminate how the program got into its current state.
TeX’s early “design documents” (TEXDR.AFT and TEX.ONE) are available on https://www.saildart.org/ (also published in Digital Typography).
The site also contains early versions of the TeX programs, in the SAIL language.
If you’re done with all the prerequisites and are ready for reading TeX itself, click here for a raw dump of TeX.
A debugger may help. I started writing one (it currently has a really bad interface and a naive implementation).
After looking at the program.