# Session 4

This is the fourth session of the videos.


Parts 17 to 20 of TeX: The Program. Tokens, token lists, eqtb (plus the hash table), saving and restoring for groups. This video in playlist: https://www.youtube.com/watch?v=D1jhVMx5lLo&list=PL94E35692EB9D36F3&index=15

0:03 "macro" and "control sequence" are basically the same in TeX (though only the former is used when talking of WEB).

0:48 Three kinds of control sequences: \a or \& (one character long), \end (two or more letters), ~ (active character). (Apparently \& and & were not different in the old version of TeX.) Note that spaces are not gobbled after active characters, while they are after the first two kinds (control sequences).

2:34 What's nontrivial in programming this: making sure control sequences disappear at the end of a group, and the 2×2 possibilities with \long and \outer.

5:30 A way to represent these (macros / control sequences) is "the other important data structure that appears in mem". (I think this means: these are memory_words, interpreted and organized in a particular way.) They are called token lists (Part 20, explained in (now) sections 289 and 291). A token fits in a halfword; the other half of the memory word contains a link (for token lists).

6:20 The idea of a token is to represent, in a halfword (16+ bits), everything that TeX is scanning. This is represented as a pair (command code, character code). E.g. if you scan the byte or character 'A', you get a command code of 11 (for letter) and a character code of 65 (for 'A').

7:28 The command codes are defined in module 207. Some are the same as catcodes, but catcodes that will never actually get into a token list are reused for other commands. E.g. the escape character (catcode 0) will never get into an actual token list, because TeX would have done something on seeing it; so command code 0 is also used to mean "\relax". Similarly, command code 5 (catcode 5 = end of line) is also used for "output a macro parameter". The command codes that actually get into tokens are 1 to 14.
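The packing described above can be made concrete with a small sketch. This is an illustration in Python, not tex.web's actual code; the function names are mine, but the formula (command code × 256 + character code, with control-sequence tokens offset by cs_token_flag = 2^12 − 1) follows the description in the video.

```python
# Sketch of TeX's token packing: a token fits in a halfword.
# A character token is cmd * 256 + chr; a control-sequence token is
# cs_token_flag + p, where p is a pointer into the hash/eqtb region.
# (Names here are illustrative, not tex.web's.)

CS_TOKEN_FLAG = 2**12 - 1  # 4095; cs tokens are strictly greater than this

def pack_char_token(cmd, chr_code):
    """Pack a (command code, character code) pair, e.g. (11, ord('A'))."""
    assert 1 <= cmd <= 14 and 0 <= chr_code <= 255
    return cmd * 256 + chr_code

def pack_cs_token(eqtb_ptr):
    """Pack a pointer to a control sequence's hash-table entry."""
    return CS_TOKEN_FLAG + eqtb_ptr

def unpack(token):
    """Recover (cmd, chr) for a character token, or the cs pointer."""
    if token > CS_TOKEN_FLAG:
        return ("cs", token - CS_TOKEN_FLAG)
    return ("char", token // 256, token % 256)

# Scanning the letter 'A' (catcode 11) yields token 11*256 + 65 = 2881.
```

Since the largest character token is 14·256 + 255 = 3839 and every control-sequence token exceeds 4095, the two cases never overlap, which is exactly what lets both live in the same halfword.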
11:00 The other case of a token is a control sequence, like \relax or \blah or \& or ~ or whatever. In this case the token is represented as the index ("pointer") of the name ("relax" or "blah" or whatever) in the hash table (eqtb). (Still packed in the same 16 bits; the two cases are distinguished by value: a character token is less than 2^8 × 14 + 256 = 3840, while a control-sequence token is greater than cs_token_flag = 2^12 − 1 = 4095.)

12:42 to 17:12(?) Digression: Toby(?) asks about input being limited to 7-bit codes. DEK goes into what would need to change; he would make the change eventually in ~1989.

17:12 Return to topic. The procedure "show_token_list" (section 292) is worth reading, to solidify understanding. On the complicated example shown there, in 291 and 292:

> This is not intended to be the simple example. This is intended to be the one that shows you everything. So that you'll be completely puzzled at first, but if you have a little patience you'll feel that nothing is--- once you figure this one out, you'll feel that you have total power to do all the rest.

This example shows all the kinds of things that can happen, except for matching a left brace, which is a slightly special case. (Wow: it even considers the case of someone having multiple characters of catcode 6, e.g. both # and something else.)

22:22 Reason for treating all whitespace characters the same.

28:40 The procedure show_token_list(p, q, l) (section 292), which prints a token list starting at p, with nodes up to q (optionally non-null) printed on the top line and the rest below, stopping after l characters.

32:45

> Here I want to talk about another one of TeX's big tables: this is the table called the "table of equivalents". The table of equivalents is where we keep all of the meanings of all the control sequences. The table of equivalents is described starting in [section 220], and it's called "eqtb". It's one of the few places where I've used a cryptic abbreviation instead of a word... or something unpronounceable as an identifier in this program. I used to do that a lot, but now, having the freedom to use multi-letter identifiers, it's turned out usually better to have a long pronounceable identifier. But this darned one was used so often I couldn't see myself writing out "equivalents_table" and I couldn't think of a good in-between, so it's called "eqtb" and I don't know what I say to myself except "ik-tb" or something like that when I read it-- I'm sorry about it.

33:59 The six regions in eqtb (see section 220) -- all of these are things that go away at the end of a group.

36:44 For the first four regions, store 4 bits + 16 bits, called (eq_type, equiv). (The first 4 bits are for eq_level.) The eq_type is a command code from sections 207-210 (and more?) -- e.g. a primitive like "def", or "openin", or "parshape", or a macro call that is any combination of \long, \outer, etc. The equiv is, e.g. for a control sequence, the start of the token list that holds the meaning.

38:14 The (eq_type, equiv) here gets called (cmd, chr) by the scanner when it reads it -- though note that some of these eq_types (command codes) will never be seen by the scanner as a command (section 210).

39:40 The last two regions of eqtb have integers (scaled or otherwise) in them.

40:08 Corresponding to region 2 of eqtb (equivalents of multi-letter control sequences) there is also a hash table (array) with as many entries as region 2. Each hash-array entry has one pointer (index) into the string pool, and another one internal to the hash table (a link to another entry in a list). The hash table uses the thesis work of DEK's student J. S. Vitter (see the mention in section 261). It converts each string to a number (hash code) in about the first 85% of the table, and the first thing that collides goes into the last 15%, etc. The hash-table lists have average length about 1.7 ("fewer than two").
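The scheme described at 40:08 -- hash into roughly the first 85% of the table, and chain colliding entries into the last 15% via internal links -- is coalesced hashing with a "cellar". Here is a toy Python sketch of that idea; the class name, table size, and the (deliberately weak) hash function are all my own illustration, not the code in tex.web.

```python
# Toy sketch of hashing with a cellar, in the spirit of Vitter's scheme:
# keys hash into the first ~85% of the table; colliding keys are placed
# in free slots taken from the top (the last ~15% fills first), chained
# together by internal links.

class CellarHash:
    def __init__(self, size=20, cellar_fraction=0.15):
        self.size = size
        self.address_region = int(size * (1 - cellar_fraction))  # ~first 85%
        self.key = [None] * size
        self.link = [None] * size       # internal chain pointers
        self.next_free = size - 1       # allocate overflow slots from the top

    def _hash(self, s):
        # Deliberately simple hash for illustration only.
        return sum(map(ord, s)) % self.address_region

    def insert(self, s):
        """Return s's slot, inserting it if absent (like TeX's id_lookup)."""
        i = self._hash(s)
        if self.key[i] is None:
            self.key[i] = s
            return i
        while self.key[i] != s:         # walk the collision chain
            if self.link[i] is None:    # not found: append in a free slot
                while self.key[self.next_free] is not None:
                    self.next_free -= 1
                self.link[i] = self.next_free
                self.key[self.next_free] = s
                return self.next_free
            i = self.link[i]
        return i

    def lookup(self, s):
        i = self._hash(s)
        while i is not None and self.key[i] is not None:
            if self.key[i] == s:
                return i
            i = self.link[i]
        return None
```

Because chains only form among keys that actually collide (they never run through other keys' home slots the way linear probing does), the average chain stays short -- consistent with the "fewer than two" figure quoted above.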
42:01 "There are details about hashing in another book I wrote" -- a reference to TAOCP.

42:22 Recap of how eqtb is used (a useful example, leading up to a segue into save_stack at 45:30).

45:30 save_stack, a table of memory words that are going to be put back later. (Section 268 / Part 19.) E.g. location 226 in region 1 of eqtb contains the meaning of "\a". So before overwriting eqtb[226], push the old meaning (a halfword?) onto the stack, and then "226". Later, when popping the stack, we re-set eqtb[226] to this meaning. (Some technicalities.) So save_stack contains these pairs of entries.

54:59 "It's one of the more subtle parts of TeX"

1:02:30

> [Module 1335 now] It gives this message: "end occurred inside a group at level" and then it prints out cur_level - level_one. If you can think of a more informative error message than that -- That seems to be one that my wife understood. That's my test.

1:03:00 Someone asks a question, and an example is worked out: `\def\a{alpha} {\def\a{beta} {\gdef\a{gamma}} }`

1:07:30 There's a hint of how to "prove" this, stated informally in (now) section 283.

1:08:25 There's a question I can't catch, but the answer seems to be about how to read the TeX program:

> Start at the beginning I would say -- Start at the beginning, read some of these comments and familiarize yourself with use of the index, and then set yourself some problem or other. In other words, something say "I wonder how he does that". Now the way you can do that -- that's a good question -- if you take any one of TeX's primitives like "def" or something like that, you can look it up in the index under that name and it will say "def primitive" and it will refer you to the place where it was put in the hash table and that will refer you to what command code it has and you might be able to trace through looking at the index the whole seq-- the whole history of def, how it comes through TeX.
>
> If you don't like that problem, give yourself some other little task, saying "I wonder what this does" and that will just give you a reason for perusing the index and finding your way through the report... the main thing to do is just to get a little familiar with the notation and mess up the page(?) get the page a little black on the edges.
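The save_stack discipline and the \def/\gdef example at 1:03:00 can be imitated in a short Python sketch. All names and the handling of global assignments here are my own (real TeX uses eq_level bookkeeping to decide whether a popped value should be restored or destroyed; this sketch gets the same effect by scrubbing stale saves). It is meant only to make the alpha/beta/gamma outcome concrete.

```python
# Sketch of group save/restore (not tex.web): eqtb maps names to meanings;
# entering a group pushes a boundary; an assignment inside a group first
# saves the old eqtb entry; leaving the group pops entries back. A global
# assignment (like \gdef) changes eqtb without saving, and must not be
# undone by later restores of older saved values.

BOUNDARY = object()

class Equivalents:
    def __init__(self):
        self.eqtb = {}
        self.save_stack = []
        self.cur_level = 1            # level_one: outermost level

    def begin_group(self):
        self.save_stack.append(BOUNDARY)
        self.cur_level += 1

    def define(self, name, meaning):  # like \def
        self.save_stack.append((name, self.eqtb.get(name)))
        self.eqtb[name] = meaning

    def gdefine(self, name, meaning):  # like \gdef
        self.eqtb[name] = meaning
        # Drop saved values for this name so restores won't undo the
        # global assignment (TeX instead checks eq_level when popping).
        self.save_stack = [e for e in self.save_stack
                           if e is BOUNDARY or e[0] != name]

    def end_group(self):
        while self.save_stack:
            entry = self.save_stack.pop()
            if entry is BOUNDARY:
                break
            name, old = entry
            if old is None:
                self.eqtb.pop(name, None)
            else:
                self.eqtb[name] = old
        self.cur_level -= 1

# The worked example: \def\a{alpha} {\def\a{beta} {\gdef\a{gamma}} }
eq = Equivalents()
eq.define("a", "alpha")
eq.begin_group()
eq.define("a", "beta")      # saves the old meaning "alpha"
eq.begin_group()
eq.gdefine("a", "gamma")    # global: survives both group ends
eq.end_group()
eq.end_group()
# After both groups close, \a means "gamma".
```

Tracing it through shows why the example is instructive: the restore for the inner \def would normally bring back "alpha", but the intervening global assignment must win, which is exactly the subtlety Part 19 has to handle.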