# Session 4

This is the fourth session of the videos.


Parts 17 to 20 of TeX: The Program. Tokens, token lists, eqtb (plus the hash table), saving and restoring for groups. This video in playlist: https://www.youtube.com/watch?v=D1jhVMx5lLo&list=PL94E35692EB9D36F3&index=15

0:03 "macro" and "control sequence" are basically the same in TeX (though only the former is used when talking of WEB).

0:48 Three kinds of control sequences: \a or \& (one character long), \end (two or more letters), ~ (active character). (Apparently \& and & were not different in the old version of TeX.) Note that spaces are not gobbled after active characters, while they are after the first two kinds (control sequences).

2:34 What's nontrivial in programming this: making sure control sequences disappear at the end of a group, and the 2×2 possibilities with \long and \outer.

5:30 A way to represent these (macros / control sequences) is "the other important data structure that appears in mem". (I think this means: these are memory_words, interpreted and organized in a particular way.) They are called token lists (Part 20, explained in (now) sections 289 and 291). A token fits in a halfword; the other half of the memory word contains a link (for token lists).

6:20 The idea of a token is to represent, in a halfword (16+ bits), everything that TeX is scanning. This is represented as a pair (command code, character code). E.g. if you scan the byte or character 'A', you get a command code of 11 (for letter) and a character code of 65 (for 'A').

7:28 The command codes are defined in module 207. Some are the same as catcodes, but catcodes that will never actually get into a token list are reused for other commands. E.g. the escape character (catcode 0) will never get into an actual token list, because TeX would have done something on seeing it; so command code 0 is also used to mean "\relax". Similarly, command code 5 (catcode 5 = end of line) is also used for "output a macro parameter". The command codes that actually get into tokens are 1 to 14.
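The packing described above can be made concrete with a small sketch. This is an illustration in Python, not tex.web's actual code; the function names are mine, but the formula (command code × 256 + character code, with control-sequence tokens offset by cs_token_flag = 2^12 − 1) follows the description in the video.

```python
# Sketch of TeX's token packing: a token fits in a halfword.
# A character token is cmd * 256 + chr; a control-sequence token is
# cs_token_flag + p, where p is a pointer into the hash/eqtb region.
# (Names here are illustrative, not tex.web's.)

CS_TOKEN_FLAG = 2**12 - 1  # 4095; cs tokens are strictly greater than this

def pack_char_token(cmd, chr_code):
    """Pack a (command code, character code) pair, e.g. (11, ord('A'))."""
    assert 1 <= cmd <= 14 and 0 <= chr_code <= 255
    return cmd * 256 + chr_code

def pack_cs_token(eqtb_ptr):
    """Pack a pointer to a control sequence's hash-table entry."""
    return CS_TOKEN_FLAG + eqtb_ptr

def unpack(token):
    """Recover (cmd, chr) for a character token, or the cs pointer."""
    if token > CS_TOKEN_FLAG:
        return ("cs", token - CS_TOKEN_FLAG)
    return ("char", token // 256, token % 256)

# Scanning the letter 'A' (catcode 11) yields token 11*256 + 65 = 2881.
```

Since the largest character token is 14·256 + 255 = 3839 and every control-sequence token exceeds 4095, the two cases never overlap, which is exactly what lets both live in the same halfword.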
11:00 The other case of a token is a control sequence, like \relax or \blah or \& or ~ or whatever. In this case the token is represented as the index ("pointer") of the name ("relax" or "blah" or whatever) in the hash table (eqtb). (Still packed in the same 16 bits; the two cases are distinguished by value: a character token is less than 2^8 × 14 + 256 = 3840, while a control-sequence token is greater than cs_token_flag = 2^12 − 1 = 4095.)

12:42 to 17:12(?) Digression: Toby(?) asks about input being limited to 7-bit codes. DEK goes into what would need to change; he would make the change eventually in ~1989.

17:12 Return to topic. The procedure "show_token_list" (section 292) is worth reading, to solidify understanding. On the complicated example shown there, in 291 and 292:

> This is not intended to be the simple example. This is intended to be the one that shows you everything. So that you'll be completely puzzled at first, but if you have a little patience you'll feel that nothing is--- once you figure this one out, you'll feel that you have total power to do all the rest.

This example shows all the kinds of things that can happen, except for matching a left brace, which is a slightly special case. (Wow: it even considers the case of someone having multiple characters of catcode 6, e.g. both # and something else.)

22:22 Reason for treating all whitespace characters the same.

28:40 The procedure show_token_list(p, q, l) (section 292), which prints a token list starting at p, with nodes up to q (optionally non-null) printed on the top line and the rest below, stopping after l characters.

32:45

> Here I want to talk about another one of TeX's big tables: this is the table called the "table of equivalents". The table of equivalents is where we keep all of the meanings of all the control sequences. The table of equivalents is described starting in [section 220], and it's called "eqtb". It's one of the few places where I've used a cryptic abbreviation instead of a word... or something unpronounceable as an identifier in this program. I used to do that a lot, but now, having the freedom to use multi-letter identifiers, it's turned out usually better to have a long pronounceable identifier. But this darned one was used so often I couldn't see myself writing out "equivalents_table" and I couldn't think of a good in-between, so it's called "eqtb" and I don't know what I say to myself except "ik-tb" or something like that when I read it-- I'm sorry about it.

33:59 The six regions in eqtb (see section 220) -- all of these are things that go away at the end of a group.

36:44 For the first four regions, store 4 bits + 16 bits, called (eq_type, equiv). (The first 4 bits are for eq_level.) The eq_type is a command code from sections 207-210 (and more?) -- e.g. a primitive like "def", or "openin", or "parshape", or a macro call that is any combination of \long, \outer, etc. The equiv is, e.g. for a control sequence, the start of the token list that holds the meaning.

38:14 The (eq_type, equiv) here gets called (cmd, chr) by the scanner when it reads it -- though note that some of these eq_types (command codes) will never be seen by the scanner as a command (section 210).

39:40 The last two regions of eqtb have integers (scaled or otherwise) in them.

40:08 Corresponding to region 2 of eqtb (equivalents of multi-letter control sequences) there is also a hash table (array) with as many entries as region 2. Each hash-array entry has one pointer (index) into the string pool, and another one internal to the hash table (a link to another entry in a list). The hash table uses the thesis work of DEK's student J. S. Vitter (see the mention in section 261). It converts each string to a number (hash code) in about the first 85% of the table, and the first thing that collides goes into the last 15%, etc. The hash-table lists have average length about 1.7 ("fewer than two").
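The scheme described at 40:08 -- hash into roughly the first 85% of the table, and chain colliding entries into the last 15% via internal links -- is coalesced hashing with a "cellar". Here is a toy Python sketch of that idea; the class name, table size, and the (deliberately weak) hash function are all my own illustration, not the code in tex.web.

```python
# Toy sketch of hashing with a cellar, in the spirit of Vitter's scheme:
# keys hash into the first ~85% of the table; colliding keys are placed
# in free slots taken from the top (the last ~15% fills first), chained
# together by internal links.

class CellarHash:
    def __init__(self, size=20, cellar_fraction=0.15):
        self.size = size
        self.address_region = int(size * (1 - cellar_fraction))  # ~first 85%
        self.key = [None] * size
        self.link = [None] * size       # internal chain pointers
        self.next_free = size - 1       # allocate overflow slots from the top

    def _hash(self, s):
        # Deliberately simple hash for illustration only.
        return sum(map(ord, s)) % self.address_region

    def insert(self, s):
        """Return s's slot, inserting it if absent (like TeX's id_lookup)."""
        i = self._hash(s)
        if self.key[i] is None:
            self.key[i] = s
            return i
        while self.key[i] != s:         # walk the collision chain
            if self.link[i] is None:    # not found: append in a free slot
                while self.key[self.next_free] is not None:
                    self.next_free -= 1
                self.link[i] = self.next_free
                self.key[self.next_free] = s
                return self.next_free
            i = self.link[i]
        return i

    def lookup(self, s):
        i = self._hash(s)
        while i is not None and self.key[i] is not None:
            if self.key[i] == s:
                return i
            i = self.link[i]
        return None
```

Because chains only form among keys that actually collide (they never run through other keys' home slots the way linear probing does), the average chain stays short -- consistent with the "fewer than two" figure quoted above.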
42:01 "There are details about hashing in another book I wrote" -- a reference to TAOCP.

42:22 Recap of how eqtb is used (a useful example, leading up to a segue into save_stack at 45:30).

45:30 save_stack, a table of memory words that are going to be put back later. (Section 268 / Part 19.) E.g. location 226 in region 1 of eqtb contains the meaning of "\a". So before overwriting eqtb[226], push the old meaning (a halfword?) onto the stack, and then "226". Later, when popping the stack, we re-set eqtb[226] to this meaning. (Some technicalities.) So save_stack contains these pairs of entries.

54:59 "It's one of the more subtle parts of TeX"

1:02:30

> [Module 1335 now] It gives this message: "end occurred inside a group at level" and then it prints out cur_level - level_one. If you can think of a more informative error message than that -- That seems to be one that my wife understood. That's my test.

1:03:00 Someone asks a question, and an example is worked out: `\def\a{alpha} {\def\a{beta} {\gdef\a{gamma}} }`

1:07:30 There's a hint of how to "prove" this, stated informally in (now) section 283.

1:08:25 There's a question I can't catch, but the answer seems to be about how to read the TeX program:

> Start at the beginning I would say -- Start at the beginning, read some of these comments and familiarize yourself with use of the index, and then set yourself some problem or other. In other words, something say "I wonder how he does that". Now the way you can do that -- that's a good question -- if you take any one of TeX's primitives like "def" or something like that, you can look it up in the index under that name and it will say "def primitive" and it will refer you to the place where it was put in the hash table and that will refer you to what command code it has and you might be able to trace through looking at the index the whole seq-- the whole history of def, how it comes through TeX.
>
> If you don't like that problem, give yourself some other little task, saying "I wonder what this does" and that will just give you a reason for perusing the index and finding your way through the report... the main thing to do is just to get a little familiar with the notation and mess up the page(?) get the page a little black on the edges.
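The save_stack discipline and the \def/\gdef example at 1:03:00 can be imitated in a short Python sketch. All names and the handling of global assignments here are my own (real TeX uses eq_level bookkeeping to decide whether a popped value should be restored or destroyed; this sketch gets the same effect by scrubbing stale saves). It is meant only to make the alpha/beta/gamma outcome concrete.

```python
# Sketch of group save/restore (not tex.web): eqtb maps names to meanings;
# entering a group pushes a boundary; an assignment inside a group first
# saves the old eqtb entry; leaving the group pops entries back. A global
# assignment (like \gdef) changes eqtb without saving, and must not be
# undone by later restores of older saved values.

BOUNDARY = object()

class Equivalents:
    def __init__(self):
        self.eqtb = {}
        self.save_stack = []
        self.cur_level = 1            # level_one: outermost level

    def begin_group(self):
        self.save_stack.append(BOUNDARY)
        self.cur_level += 1

    def define(self, name, meaning):  # like \def
        self.save_stack.append((name, self.eqtb.get(name)))
        self.eqtb[name] = meaning

    def gdefine(self, name, meaning):  # like \gdef
        self.eqtb[name] = meaning
        # Drop saved values for this name so restores won't undo the
        # global assignment (TeX instead checks eq_level when popping).
        self.save_stack = [e for e in self.save_stack
                           if e is BOUNDARY or e[0] != name]

    def end_group(self):
        while self.save_stack:
            entry = self.save_stack.pop()
            if entry is BOUNDARY:
                break
            name, old = entry
            if old is None:
                self.eqtb.pop(name, None)
            else:
                self.eqtb[name] = old
        self.cur_level -= 1

# The worked example: \def\a{alpha} {\def\a{beta} {\gdef\a{gamma}} }
eq = Equivalents()
eq.define("a", "alpha")
eq.begin_group()
eq.define("a", "beta")      # saves the old meaning "alpha"
eq.begin_group()
eq.gdefine("a", "gamma")    # global: survives both group ends
eq.end_group()
eq.end_group()
# After both groups close, \a means "gamma".
```

Tracing it through shows why the example is instructive: the restore for the inner \def would normally bring back "alpha", but the intervening global assignment must win, which is exactly the subtlety Part 19 has to handle.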