Of the videos, this is the fifth session.
The following is the comment I had left on YouTube:
Syntax (TeX's syntactic routines: parts 21 to 27 of TeX: The Program)
The module (now called section) numbers are around 40 higher in the present program than those talked about in this lecture.
12:00 TeX takes about 100 machine instructions per character of text.
23:00 What was called |get_nc_token| in this lecture is called |get_x_token| in the current TeX program. (Also |cs_ptr| is now |cur_cs|, |ch_code| is now |cat_code|.)
At 37:00 there's an interesting sidenote on why TeX uses language with "I" and "you" for talking to the user.
58:00 Mentions how the 72.27 points per inch came about.
And finally, it ends with a question (~59:45) in response to which DEK explains/defends his conventions of labels and GOTOs. (Useful context: his "Structured programming with go to statements" article from 1974.)
The transcript below is autogenerated from YouTube, and needs to be cleaned up.

I guess I'd better start this morning by recapitulating a little bit where we are. The main theme for today, in the four lectures, will be the guts of TeX: the inner parts that really do most of the processing inside. Tomorrow's theme is going to be the things that are of primary concern to people in this audience, what they really want to know about: the things that are going to be slightly different from one place to another.

So, to remind you of the structure of TeX itself: somebody asked yesterday whether they could have a copy of this diagram, so I made up Xerox copies. We'll pass these out now and start them going in two directions, so you can get a feeling for where we are. Remember that I said this part was the code, and each box represents about ten modules in your listing. So when we were talking yesterday about the dynamic memory, in the last hour, this represents the amount of code there was for the dynamic memory. But the dynamic memory itself was taking a lot more space; it might overlap this whole diagram, at least half of it but maybe all of it, for the dynamic memory itself. Then the eqtb, the equivalence table, is what I call "locals" here. This is the part of the program that refers to saving and restoring the equivalents, and defining all the macros that we had for equivalents.

In the first lecture that we're having this morning, I want to study this part, syntax, which is the part of TeX that reads the input and figures out what to do next. Then next hour we'll be talking about semantics, which is doing it. OK, any questions on what we're about to do?

Then we'll go back in and take a look at the listing. Certainly I can't cover all of the ideas of syntax in one hour, but what I want to do is give you the feeling for how it's organized, so that if
you have any particular question on it, you know exactly where to look. The sections on syntax start out with part 21, module 277, "Introduction to the syntactic routines", and that carries on through part 27, "Building token lists". The basic scanning subroutines, part 26, are the bulk of the thing; that's about 55 modules' worth in that section, and I'll probably talk a lot about basic scanning. Since I can't cover everything, the approach I usually take is to do a random sample: we'll study the whole thing lightly and one thing in depth, and then everybody gets some idea. I'm not sure which one to study in depth, and maybe I can take some suggestions from the audience as we get into it.

But first we should start out with the basic idea of syntax, what we have to do. The goal of syntax is to scan the input and deliver the necessary information to the semantic routines, at the level the semantic routine wants it. It usually wants to get a command and a modifier to the command, which is called "char", although char is sometimes a pointer, not really a character. So we have a command code and a character part. Then there's also cs_ptr that goes with it: in case the thing that you fetched from the input was a control sequence, cs_ptr will point to whatever control sequence actually had this command and char, so that in an error message, for example, you can use the control sequence that actually caused this thing. Somebody in TeX can say \let\a=\def or something like this; or what if you've decided to redefine TeX's primitive called \def, and the user is using some other name for it? Then when you have an error message, you would like to use the new name in the error message rather than the old one. So cs_ptr is another thing you get from the input routine.

So the thing
that delivers the next token is called the get_next subroutine, and its job is to get the next token of input. Now that might sound like an easy job; after all, you just advance your pointer by one in the buffer and there it is, right? However, get_next turns out to have a lot of different things that it might have to get. For example, it might have to expand a macro; it might be reading an argument of a macro; it might have to recognize a control sequence, whether it's defined or not, and look it up in the hash table; it might find that in order to get the next thing we actually have to prompt for it, because it hasn't been given to us yet; it might run to the end of a file, which means that we have to close out the file and do some processing. All kinds of things can actually happen when you say get_next.

Historically, in fact, get_next was the routine that started this whole WEB project going. We took the get_next procedure out of TeX78 and we said, well, how would we really want to present this code? Luis and I tried several drafts until we got to what we thought would be a good way to present the get_next routine, and that led to the DOC system of two years ago and eventually to WEB. It had enough structure and subparts to it that it was really our first test case for the whole idea of this style of documentation.

The get_next routine begins in module 317, and the whole structure of it is given there. So get_next by itself looks very simple. Let's see if we can focus on this. Can you read that with your camera? Let's see if we can set it up the way we had it yesterday. Oh, OK, you need enough light on my podium in order to see where my
finger is, so that the people watching the screen tonight can see where I'm pointing, if they project onto the screen what's going on the monitor. It worked yesterday; it might take a little bit of time to figure out how to do it here. Now I've got something on the screen here, and I'm pointing to get_next. Oh, you can see a finger? Yeah, well, that's about all. OK, somebody said that they could shut the door back there, and that would make it easier to read the screen.

OK, so the whole idea of get_next is rather simple. We have a couple of labels in here: restart, if for some reason what we tried to get resulted in nothing, so we have to start again; then there are some other switches that we go to. But the code for get_next is very short, except that it involves a couple of submodules.

So we set cs_ptr to 0. cs_ptr is going to be 0 unless there was a control sequence gotten; that's why the eqtb starts at location 1 instead of 0, because the first control sequence actually is at 1.

Now, state is one of the variables that we have set up for us all the time, at our fingertips, to tell us what kind of input we're doing. If the state is token_list, that means that we're getting something not from the user's source file but from inside of TeX, from some token list that we have to read through, and we get the next symbol off that token list. The token list can be of about 10 different kinds. For example, it could be an output routine; it could be a macro; it could be \everypar, this feature that comes up at the beginning of a paragraph; it could be the parameter to a macro; it could be something that we inserted just to back up (we read some token, we decided we weren't ready for it now, so we put it back into a small token list so that we can read it again); it might be something like a number that we converted to Roman numerals and now want to read.
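The control flow described so far (reset cs_ptr to 0, look at state, read from a token list or from the file, and start over when a source yields nothing) can be sketched roughly as follows. This is only an illustration in Python, not the Pascal of TeX itself: the `Source` class, the stack of sources, the `FROM_FILE` state name, and the convention that an integer in a token list stands for a control sequence's eqtb address are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

TOKEN_LIST = "token_list"  # state while reading a stored token list
FROM_FILE = "file"         # hypothetical stand-in for the file-reading states

# Assumption: a character is a str; a control sequence is its eqtb address (>= 1).
Token = Union[str, int]


@dataclass
class Source:
    state: str
    items: List[Token]
    loc: int = 0  # position of the next item to deliver


def get_next(stack: List[Source]) -> Optional[Tuple[Optional[str], int]]:
    """Return (character or None, cs_ptr), or None when all input is used up."""
    while stack:                       # each pass begins at the "restart" label
        cs_ptr = 0                     # stays 0 unless a control sequence is found
        src = stack[-1]
        if src.loc >= len(src.items):  # this source produced nothing
            stack.pop()                # finish it and resume the one beneath
            continue                   # "go to restart"
        item = src.items[src.loc]
        src.loc += 1
        if src.state == TOKEN_LIST and isinstance(item, int):
            cs_ptr = item              # control sequences are stored by address
            return None, cs_ptr
        return item, cs_ptr
    return None


# A backed-up token list sits on top of the file and is read first.
stack = [Source(FROM_FILE, list("ab")), Source(TOKEN_LIST, [5, "x"])]
results = []
while (tok := get_next(stack)) is not None:
    results.append(tok)
print(results)
```

Pushing a one-token `Source` onto the stack is how this sketch would model "putting a token back so that we can read it again".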
Anyway, a token list is one way we can get input. But if the state is not token_list, then we get input from an external file, and we go to restart if there wasn't anything found. Otherwise we input from a token list, and there are other cases where we might have to go to restart because there was nothing there yet. So there are two main branches of get_next, for these two cases.

Then, if we've got to the end of an alignment entry, we have to do something special for alignment. Here's one of the places where this "post-hypnotic suggestion" comes in with alignment, saying that, for example, if you just now got to a tab mark in an alignment, this is the time to emit the thing that's supposed to finish off that column. So you do a little bit of special calculation for alignment, and you'll notice that it gets triggered right here in the middle of get_next.

Now, get_next is part of the inner loop of the code, certainly the first part of it here, this restart, cs_ptr set to zero, and so on. I talked a little bit about the inner loop yesterday. If you want to find out exactly where I think most of the time is spent in TeX, you look in the index under "inner loop", and it refers you to about 20 modules, which I believe are the ones that account for almost all of the processing time, in the sense that the other ones are probably executed an order of magnitude less often. Not all of these modules are executed very often, but we try to write the program so that the frequent parts are done rather fast. My estimate, something like a hundred machine instructions per character of text, is approximately right, I think. I have to double-check it again now that I've got the whole of TeX put together. But that was: if you took a TeX file and added one more character to one of the paragraphs, how many more machine instructions would the CPU have to do? And it was on the order of a
hundred, I believe. We try to keep that going fast, so we want get_next in particular to be reasonably fast.

If we're inputting from an external file, we go to module 319. Module 319 is a switch for when we're reading from an external file. Actually the state consists of a number of different things; five or six different quantities correspond to the current input state. One of them is called state, which we looked at to see if it was token_list or not. Another one is called loc, which is the location where we are in the buffer. Another one is called limit, which is the end of that line in the buffer. In the buffer we might have lots of different lines: we might have part of a line from a terminal; then we were reading some file, and in the middle of that line it said to input from another file; and so we might have several different things in the buffer. But limit tells where the current line ends. If loc is less than or equal to limit, the current line is not yet finished, and so we have to do the thing that we're almost always doing.

When loc is greater than limit, though, we have an unusual case, not in the inner loop, and there we move to the next line of the file, or go to restart if there is no next line of the file. That's a time when we can also check for an interrupt. There is a provision, if you can do it in your system, for interrupting TeX, and at safe times to take an interrupt I'll say check_interrupt. There aren't many places where I do it, but this is a fairly good place, at the end of a line in the buffer. The end of a token list is another place. This happens fast enough that if somebody's sitting there and wants to stop the TeX run or do something special, you can probably catch it in one of those places.

The main case, though, the one that we
want to be at high speed is this first line here, where loc is less than or equal to limit, because that's the case we're most interested in getting to go fast. So then we set the current char, cur_chr, to buffer[loc]. The buffer is an array of ASCII code characters; this is TeX's internal code, and all of the external file has been converted to that and sits in the buffer in internal-code form. Next we increase loc, and then we set up the current command, which is the ch_code of cur_chr. Now ch_code of cur_chr is really a macro; it's an abbreviation for "get out of the eqtb the current value of the code for that character". We have made sure elsewhere that this is going to be in range: nobody's allowed to store into that part of eqtb a number that couldn't possibly be the ch_code of a character. I think the only legal ones are 0 to 15, so the current command is now known to be a number between 0 and 15.

Then we have the big thing that changes state, and goes to switch if the current character should be ignored. In the TeX manual I have a chapter that sort of says what these states are and how we're actually reading. So we have a state that says we're skipping blanks; if we're in the skipping-blanks state, then if a blank comes next we skip it, and so on. There are three states, actually: mid_line, new_line, and skip_blanks. And we want this to be fast.

So that's module 320; let's take a look at what that is also. Module 320 is a big case statement, and it's done by "case state + cur_cmd". The state is now encoded as a number that is 1, 17, or 33, I believe it is, so that the current command, being between 0 and 15, added to the state will take us to a unique location. There's no sense multiplying here; we're in the inner loop, so we might as well define the state code so that it already includes the multiplication by 16. So then we can have cases for various
states, like, well, here's an example: mid_line + spacer, mid_line + carriage return, skip_blanks + carriage return, any_state_plus of end-line. Now any_state_plus is a macro that's defined, let's see, just over here; where was it? Here: any_state_plus of something is mid_line plus the thing, skip_blanks plus the same thing, new_line plus the same thing. So anyway, this is the way we recognize it; it's a 48-way switch with which we decide at high speed what to do next. If, for example, we get any_state_plus of escape, we have to scan a control sequence and then set state to skip_blanks, because we're going to skip a blank after a control sequence. That's the way this part of the code looks.

The number of machine instructions actually needed for this is supposed to be rather small. If your compiler really does terrible things with case statements, then you might want to take the few places in the inner loop where there's a case statement and do something special to that part of the program as a refinement, because it will make TeX run faster. If you can make this part run five percent faster, your TeX will probably run four percent faster, something like that. So I think that is worth pointing out.

Now, if you have a token list, that starts at module 333. For the token list, again, we have state information besides state: the variable called state itself will equal token_list, but then loc in this case will be a pointer to which token is next in mem, in the memory array. If that's not null, then we extract out of the token the command and char and cs_ptr, just as if we had scanned it out of the input. Otherwise, if it's null, we're at the end of the tokens, and again we have one of these things where we can end a token list. I guess end_token_list itself is where it would check for an interrupt. OK, and I am
surprised I don't see a pause for an interrupt in there, but I suppose it's in end_token_list itself. I don't want to take time to look it up now.

One thing I wanted to mention about this is that, you'll notice, a control sequence never lives in mem by its alphabetic name. A control sequence is represented in a token only as an address in eqtb. So the only time you're ever looking at a control sequence to figure out what its hash address is, is when it appeared in the file. Only when you are scanning it do you have to compute a hash code to figure out where the eqtb address is, in the first case of get_next, where you saw an escape delimiter in the file. So if you have a long or short macro name, it doesn't make any difference, when you're actually using the macro, how long that name was. It only took a little longer for a long name when you input the thing into a token list in the first place. But once you've got a control sequence into a token list, it's just like any other control sequence, no matter how long its actual external name was.

OK, so all of these modules handle, then, the weird cases of get_next, where we have to do things like this: a person has set pause to a nonzero value, and that means that after you've read a new line out of your file, you're supposed to display it on the terminal and give the user a chance to see what's going on, and maybe make a change to it, before TeX actually gets it.

In my change file, the only thing I had to change in get_next was to make it more friendly to our particular editor at SAIL, which is a page-oriented editor. We have not only line numbers but also page numbers, so I had to make sure that the page number would get adjusted when I passed a page mark in the file. Also, I spent most of an afternoon trying to get it to work with our line editor, which has a
special feature, in the system, that allows a user to change the line as it comes in and edit it with some rather powerful editing commands that will search for characters, insert characters in the line, and so on. I made this part of the get_next routine talk to that system routine. It's certainly not standard in Pascal, so I had to do those two things in the change file for get_next.

Now, almost all uses of get_next are covered in the next part of the program, starting at module 338, and the most common way to call get_next is get_nc_token. If you look at the rest of TeX to find out where get_next is actually used, you'd think, well, this would be one of the most common subroutines to call, because you need to get the next token in a lot of different places and continue on to do something. But if you look in the index under get_next for the uses of it, you'll find that it's only used in about four places. It's used in get_nc_token, and most of the other times in TeX when we want the next thing, we actually call this subroutine instead of the other one.

The only place where the main body of TeX itself calls get_next is in the inner loop, where we want to avoid the extra level of procedure overhead of get_nc_token. So in the very inner loop, where you're scanning a word out of a paragraph and looking for ligatures and things like this, where you expect that the next thing is probably just going to be a letter, you call get_next in order to avoid get_nc_token. Then if you look at it and it looks like something that's more complicated, you call a subroutine and leave the inner loop. So the idea of making this inner loop efficient has affected the coding to a certain extent; it makes it a little harder
to read, but I think only one percent harder, so it was OK.

Now, get_nc_token is like get_next, but it does more. "nc" stands for "non-call", and a call is a macro, you know, something that has to be expanded. So get_nc_token means: it's get_next, but if it turned out the next thing was a macro, expand it and get me the next thing that isn't a macro. So when I call get_nc_token, this says: let me see whatever is next, but be sure that it's not something that's been \def'd.

Furthermore, besides command and char (in fact they're called cur_cmd and cur_chr), the variables that get_next set, get_nc_token sets cur_tok, which is a token, a halfword representation of the command and char and cs_ptr, so that you can have one variable that stands for all of the others in a packed form. This is convenient if you want to store it away in another token list or something like that, or if you have to back up; it's all ready to be backed up. So get_nc_token gives you the values of cur_cmd and cur_chr, cs_ptr, and cur_tok, and this last is enough by itself to deduce the other three.

get_nc_token will also expand not only macros but also marks, like \topmark or something like that, because they're somewhat analogous to macros. If the next token is an undefined control sequence, get_nc_token issues the error message saying "Undefined control sequence", because this is where it would not let it get through. When you call get_nc_token, you don't have to check afterwards whether the thing was an undefined control sequence; get_nc_token will already remove all undefined control sequences for you and get the next thing that's really legitimate.

There's another routine called nc_token that I might just mention, because it's used only in the one place in the inner
loop. In the inner loop we had a place where we expected that it was unlikely we'd have a macro coming next, and so we called get_next. That was going to save us the level of procedure call that would have called this procedure, which then would call get_next; it's a little faster. But if we found out that our assumption was wrong, that there really was a macro there, then we call nc_token, which is exactly the same as get_nc_token except that, somehow, get_nc_token is equivalent to get_next followed by nc_token.

The other main way to call get_next is called get_token, and this one does not expand macros; it just gives the token to you as is. So with the token that you get, you get your cur_cmd and your cur_chr and your cs_ptr, and you get cur_tok also. But it's possible that cur_cmd would be a call, or long_call for calling a long macro, or of course something like a mark, \topmark, or an undefined control sequence. It's certainly possible that get_token will return you the name of an undefined control sequence. This is the thing that you use at times when you're building a definition. For example, after the word \def: if we say \def, for whatever follows it you'd say get_token. You certainly wouldn't want to say get_nc_token; that would replace the thing by its previous definition, or give you an error if it wasn't already defined, which would be a big, tragic mistake. So there are places where we definitely want to suppress macro expansion, and that's what get_token does.

OK, so those routines, get_token and get_nc_token, are rather simple routines. Module 339 is get_token; just to reinforce it, let's show you what get_token is. This is module 339, and it says it sets cur_cmd and cur_chr and cur_tok. I goofed: I said cur_token on the board here; I meant cur_tok. OK, no_new_control_sequence is set to false, then get_next, and set no_new_control_sequence
true. In the middle of the get_next routine there is a part that might have to go into the hash table and add a new undefined control sequence to the hash table. Now, we don't want to put misspelled things in the hash table if they turn out to be an error, so generally we have a variable no_new_control_sequence that says new control sequences aren't allowed. This is the only place where we set it false, and we set it true afterwards.

Now, this violates prevailing wisdom about programming, which says that global variables are considered harmful and that we should pass such things as parameters. But my comments about the inner loop should indicate why I feel that, as long as we're using it in a disciplined way and we know what we're doing, this is a valid way to save a lot of time, by not saying get_next of some parameter that says whether or not a new control sequence is allowed. get_next would have to propagate that down to the hash routine and a bunch of other things; setting up a parameter every time we call that subroutine would make TeX run a lot slower.

All right, so this get_next will then allow a new control sequence to appear. I guess I told a little lie when I said that's the only place this is set false; there's one other place, and that's in INITEX when you're first loading the hash table.

OK, then if cs_ptr is 0, this means there was no control sequence found, and we do this part; otherwise we set cur_tok equal to the cs_token_flag plus cs_ptr, so cur_tok is properly set up and that's all there is to it. But if it was 0, then there's a fatal error in here if the current command is endv; endv occurs in alignments. This would actually be something I'm not even sure can ever happen, though probably it can in theory. But if it did happen, I wouldn't have any idea how to recover from it.
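The get_token logic described here (open the hash table to new names for exactly one call of get_next, then pack the result into cur_tok) might be sketched as below. This is a hedged illustration in Python rather than the module from the listing: the constant values, the toy `hash_table`, the `UNDEFINED_CS` address, the helper `id_lookup`, and the simplified `get_next` that takes one raw item are all assumptions, and the endv check is left out.

```python
CS_TOKEN_FLAG = 4096   # illustrative value: larger than any packed cmd/chr pair
LETTER = 11            # illustrative command code for a letter (0..15 range)
UNDEFINED_CS = 9999    # hypothetical address meaning "undefined control sequence"

hash_table = {"def": 1, "par": 2}   # toy table: name -> eqtb address (from 1)
no_new_control_sequence = True      # global flag, normally true


def id_lookup(name):
    """Find a name's address; enter unknown names only if the flag allows it."""
    if name not in hash_table:
        if no_new_control_sequence:
            return UNDEFINED_CS
        hash_table[name] = len(hash_table) + 1
    return hash_table[name]


def get_next(item):
    """Toy stand-in: item is '\\name' or one character.

    Returns (cur_cmd, cur_chr, cs_ptr); cs_ptr is 0 for a plain character.
    """
    if item.startswith("\\"):
        return 0, 0, id_lookup(item[1:])
    return LETTER, ord(item), 0


def get_token(item):
    """Fetch one token without expansion and return its packed form cur_tok."""
    global no_new_control_sequence
    no_new_control_sequence = False      # new names are allowed only here
    cur_cmd, cur_chr, cs_ptr = get_next(item)
    no_new_control_sequence = True       # and forbidden again afterwards
    if cs_ptr == 0:
        return cur_cmd * 256 + cur_chr   # packed command and character
    return CS_TOKEN_FLAG + cs_ptr        # control sequence: flag plus address


print(get_token("a"), get_token("\\def"), get_token("\\foo"))
```

The global flag is read deep inside `id_lookup` without being passed through `get_next`, which is the trade-off the lecture defends: no extra parameter on the hot path, at the cost of a disciplined set-and-reset around the single call.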
This would mean that somebody is calling a macro without expansion at a time when it's also possible to be finishing a column in an alignment. Well, if you think about it a while, I think there's nothing else to do except give a fatal error, and we'll see what happens. If it turns out there is a way to recover in a reasonably common case, then of course we would change this part of the code. So I make the check here, because if I didn't, and I allowed that endv to get through, then all kinds of things could get screwed up in the rest of TeX. I wanted to trap it here, even though this might be the inner loop. And cur_tok is then set, in this case, to a packed version of the command and character; as we mentioned yesterday, that's the way we represent a token.

All right, those are the basic things; all the rest of TeX relies on get_token and get_nc_token in order to look at its input. Any questions on that?

OK, the interesting part of syntax is the section called "Basic scanning subroutines", and these are the things that do higher-level parsing of TeX's source. The basic scanning routines start out with some that are actually pretty dumb. There's one called scan_left_brace: every once in a while TeX gets to a point where it's got to see a left brace or else it's stuck, and so it calls scan_left_brace. This subroutine, I think, is probably used about 15 or 20 times. This is module 363. This is the typical way that we would call something: it says, get the next non-blank non-call token. If the next thing is really a space, then we won't give an error message saying "Missing left brace", because we will keep on going till we get something besides a space. So this is a little thing that gets used very often. Notice module 364; this code is used in sections 363
rather often. We'll say get_nc_token until cur_cmd is unequal to spacer. OK, and then if cur_cmd is not a left brace, we've got this long error message here. It gives us a chance to show how a typical error is done. Fatal errors are very unusual, but this kind of error, where we give a help message, is typical. So take a look at it. We start out by printing something on a new line; our official error message here starts with "! Missing left brace inserted". Now the help message is given next, before I call the error subroutine, and the number of lines of help is used here so that I could stick to simple WEB macros. So I have help4 for a four-line help message, help3 for a three-line help message, and so on.

The help message says: a left brace was mandatory here, so I put one in; you might want to delete or insert some corrections so that I'll find a matching right brace soon; if you're confused by all this, try typing "I}" now. That's probably the best way to recover: if they had a missing left brace and I'm going to put one in, it can also be a problem that there isn't a right brace somewhere else.

Then I call back_error. back_error is something that takes the current token and puts it back so that it'll be read again; so the next thing get_token or get_next will see is the one that it just saw. It backs up, and also does the right things for backing up. Then the current token is set to a left-brace token, and we pretend that we've read it. That's in agreement with what we said here, that a missing one was inserted.

Somebody asked me the other day about the style of these help messages, using "I" and "you" in these messages. I believe there are several reasons why it turns out to be a win to have the computer talking to you as if it understands. I
mean, it'll make a statement like "I don't understand this" or "I do understand something", when we all know that the computer doesn't really think. Well, the main reason is that you can communicate a lot more in a small space when you use the English language the way it was designed to be used. The English language has developed over thousands of years, and I think it's most powerful in discourse between people. So if we restrict ourselves to using the third person all the time, then we're losing a great deal of its power. In fact I tried out all kinds of ways to write help messages, and finally this one turned out to be by far the most effective; I'm quite convinced of this.

All the help messages have to be at most 60 characters long per line, and I also like to make those lines break in a fairly nice way. So it was a little bit of a challenge sometimes to take a 61-character one and figure out what was the right word to substitute. When writing these help messages I also tried to use words that were different from the ones in the official error message that was given earlier, so that a person who has the error described in two different ways has a chance of understanding one of the two.

So if people here from foreign countries who are implementing TeX would like to make TeX more friendly there, it would probably be better to have the help messages translated into your own native language. Or people from the East Coast might want to translate it into East Coast English, and so on. MIT is another special case, right? I mean, anyone who's read the help messages in Emacs knows that they have a language of their own.

Anyway, we try to make the help messages an "I/you" situation. And when you see the new TeX manual with the illustrations, you also get the idea that we can sort of personify TeX, because there's a
character that appears in these illustrations, who is going to be on all our T-shirts next year.

So this is a simple use of the scanning, and there’s a slightly more interesting one on the next page, in case you’re getting bored. This one is used to scan a keyword, when we’re looking for something like "plus" or "minus", or for various other words that might be present in a text. This one will look ahead to see if you match a keyword, either in lowercase or possibly an uppercase version of the keyword; it’s all done here. Whenever you’re looking for such a keyword, this function scan_keyword returns yes or no, whether it found the keyword or not. If it found it, then we have passed it in the input and we’re ready to read on. If it didn’t find it, we back up over the partial finds that we might have made and are positioned back where we were before we started.

Okay. After that we get into the interesting things. The really important scanning routines are the ones that recognize high-level constructs in the input. For example, if the time comes for TeX to look for an integer number, then TeX can say scan_int, and it will look for an integer number next in the input. Now an integer number can be an integer like 12, a one followed by a two. But it could be octal: if the next thing we see is a ', then we say, well, we’ll scan in octal notation, and '12 will mean 10. It might start with a backwards quote, `, which means we’re looking for a character constant. An integer number might start with a minus sign, of course; then we have to have all these options again after we’ve decided to negate it. Or it can start with a plus sign, which is sort of a no-op, and we go past the plus sign. And we can have a bunch of minus signs, in fact a bunch of minus signs and plus signs, there. Certainly it’s important, when you
have a language that uses macros, to allow minus minus to appear, because otherwise the macros would have to be very worried about not allowing that to come through. So minus minus should be legal. But besides all of these things, integers can also be stored in TeX’s registers, so it might say \count something-or-other, and we’re supposed to recognize that and fetch the current value of the counter. The most complicated one is when you say \the. In TeX82 (this is something that is [unclear] in the previous TeX), if you want to give an integer you can start out with the word \the, and then \the will fetch many things inside of TeX’s internal tables. So \the will be able to fetch out, for example, any one of the parameters, the hyphenation penalty or something like that. \the can be used to find out parameters that have been stored with a particular font, and many other things.

Now scan_int, then, scans an integer. There’s also scan_dimen, which scans a dimension and will return the value of the dimension. Now when I wrote this, it turned out to be better not to return the value as the value of a function but to put it in a global variable, cur_val. Here again was a case where my original design was not to use a global variable, but I found out later that I was getting a much better program by using a global variable for this purpose. So if you call scan_int, the answer appears in cur_val. So scan_dimen is for getting a dimension, and there’s a scan_skip or scan_glue, gosh, I forgot what I called it. scan_dimen starts at 400, and scan_glue, yes, scan_glue, at 413. So these are three high-level things. The scan_glue routine is going to look for something that is a dimension, and then possibly "plus" some amount of stretchability, and then "minus" some amount. Now, all three of these things can say \the. For
example, when TeX comes to the point where it wants to scan glue, the glue might start with the word \the; it might be \the\baselineskip, for example. So if you want to do a \vskip by the baseline skip, you can say \vskip\the\baselineskip. When TeX starts its \vskip instruction, it’ll call scan_glue to figure out how much it’s supposed to skip by. So \vskip can be followed by \the.

Now let’s take a little bit of a look at the syntax of glue. What does it consist of? Well, it consists of a dimension, and then optionally "plus" a dimension, and optionally "minus" a dimension. We know that in order to scan these words "plus" and "minus" we have that scan_keyword subroutine, and we’ve got scan_dimen to do this thing. But now if you think about it, a dimen, like, you know, 10 points, starts with scan_int. A dimension can also be '77 points; you can give dimensions in octal. So it has to start by scanning an integer. Or you can say \count5 points, things like that. So a dimen can start with an integer, and further, glue can start with a dimen, and all of these can be \the.

So when I call \the here, okay: if it says \the\baselineskip, that would substitute for all of this. But if it says \the\varunit, that’s a dimension, or \the\hsize or \the\vsize, that’s a dimension; that would leave us only at the first part, and we still have to look for "plus" again. And if it was a \count or some other integer register, like \the\time or something like that, all kinds of integers (who would like to skip by the time of day? but why not), then that’s only part of a dimension. So we have a subroutine of TeX called scan_the, but it doesn’t know what it is going to wind up with. It might wind up with glue, a whole piece of glue; it might wind up with a
dimension; it might wind up with just an integer. So besides cur_val, which contains the result of scan_the, we also have cur_val_level, which tells what kind of a thing it found. cur_val_level will be set to either int_val or dimen_val or glue_val, telling you what kind of a thing you got. So you call a subroutine called scan_the, and it looks ahead to see what’s following, and afterwards, if it found \baselineskip, a full piece of glue, then cur_val_level will say glue_val and cur_val will be a pointer to the specification of the glue that was found. So you can figure out what you got. That was the main unusual aspect of the design of these particular subroutines.

Now these things are also recursive. If you’re trying to scan an integer, and the integer starts with the word \count, then which counter are you using? You have to give a number between 0 and 255, and that’s an integer. So scanning an integer can reduce to scanning an integer, or scanning \the; I mean, you can say \count\the something-or-other, and that would give you one of the counters. And so scan_the has to call scan_int, scan_int has to call scan_the, and so on. So these procedures are mutually recursive, but they keep on getting further and further into the input, so they don’t get into a loop.

Question? Yeah.

"It seems almost that what you’re asking it to do there is identical to what you’re asking it to do when it expands macros. I’m a little curious about why you didn’t do it the same way, in the get_next routine itself: simply have glue parameters in fact be macros, and have them expand to token lists."

Well, the easy answer is I didn’t think of it. But I haven’t got a simple answer otherwise; this is the way that occurred to me to do this.

Okay. In the remaining five minutes I’d like to look at one of these
in a little more detail, so you can get a little more feeling for it. I’ll take a suggestion from the audience: what do you think would be a good one to pursue, just to get a feeling for what’s involved here? Or you don’t want to look at any of them? No idea what to do? Okay, well, let’s take a look at the details of scanning a dimen. scan_dimen starts on module 400, and we’ll look at what actually has to be done.

Now the job of scan_dimen is going to be to put in cur_val an integer in scaled points, or in other words a scaled quantity, and the units are 2 to the minus 16 points. This is supposed to be something where all versions of TeX82 will arrive at exactly the same number; it’s quite important to design a language so that it’s machine independent in this way.

It turns out there are three parameters to this, when you start looking at what scan_dimen has to handle. The first parameter is called mu, because of things in math mode. In math mode, when you say \mskip or \mkern to give spacing, you’re not supposed to say 3 points, you’re supposed to say 3mu, which is three math units. This is a variable unit that will change depending on whether or not you’re in a subscript or something like that. But math units are not allowed outside of these contexts. So scan_dimen takes the first parameter, mu, which is either true or false, saying whether the units are supposed to be mu or not.

The next parameter to scan_dimen is called inf, and this says whether or not it will allow the units to be infinite. There are three versions of infinity: fil, fill, and filll. If anybody knows the poem by Ogden Nash about how a one-l lama is a priest, a two-l llama is a beast, and he bets something that they’ll never see a three-l lllama, or something like that: well, we don’t have four-l fills in this language.
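Editorial aside: the "scaled quantity" just described, an integer counting units of 2 to the minus 16 points, can be illustrated with a few lines of Python. This is only a sketch of the representation, not code from the listing; the names UNITY, to_scaled and scaled_to_str are chosen here for illustration and are assumptions.

```python
# Illustrative sketch of a scaled quantity: an integer number of
# 2**-16 points. All names here are hypothetical, not from the listing.
UNITY = 1 << 16  # 2**16 scaled units make one point


def to_scaled(whole_points: int, fraction: int) -> int:
    """Combine whole points and a fraction f/2**16 into one integer."""
    if not 0 <= fraction < UNITY:
        raise ValueError("fraction must be in the range 0 .. 2**16 - 1")
    return whole_points * UNITY + fraction


def scaled_to_str(scaled: int) -> str:
    """Show a scaled quantity exactly, as whole points plus f/65536."""
    sign = "-" if scaled < 0 else ""
    whole, fraction = divmod(abs(scaled), UNITY)
    return f"{sign}{whole} + {fraction}/65536 pt"


ten_and_a_half = to_scaled(10, UNITY // 2)  # 10.5 points
print(ten_and_a_half)                       # 688128
print(scaled_to_str(ten_and_a_half))        # 10 + 32768/65536 pt
```

Because everything is an integer, every machine that follows the same rules gets the same number, which is the machine independence the lecture is after.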
Now, this inf parameter to scan_dimen is true or false depending on whether or not such things are legal, and they’re only legal after "plus" or "minus", so usually that parameter is false. The third parameter is called shortcut, and that’s because when you’re scanning a dimension you might already have built up part of that dimension, and you’re ready to go right to the part that inserts the points or whatever. You might have called \the and it gave you an integer, and so you’re halfway through scanning a dimension already, so you want to pass by that first part of the routine. So those are the three parameters to scan_dimen. Most of the time when I want to get a dimension in one of my semantic routines, I’ll call scan_dimen(false, false, false), because none of those other cases has arisen: there’s no shortcut, no mu, and no infinite glue about.

Okay, now scan_dimen: I’m looking at module 401, and I have a fairly large procedure here, but I think by reading it in this module form we boil it down to, in this case, a little more than 12 lines, and we can see exactly what all it has to do. f is an integer, the numerator of a fraction whose denominator is 2 to the 16th. The big problem in scan_dimen is going to be to preserve machine independence, to make sure I do my fixed-point arithmetic with numbers that would fit on a 32-bit machine, and also I have to check for arithmetic overflow that might occur during these calculations. Then cur_order: this is a global variable; if I do find infinite glue, I have to tell somebody which kind of fil it was, with one, two, or three l’s. So cur_order is going to be normal unless I found some special glue. negative is something I set false, and that will flip back and forth if I find minus signs in front of the darn thing; negative means: should the answer
be negated? Okay. Then, if not shortcut, I have to do all this stuff, which essentially sets up a value in cur_val containing the integer part of the dimension, and f is the fraction part of the dimension. By the time I finish this, I’ve got some integer plus a fraction over 2 to the 16th, and that’s got to be finished up with the units supplied.

So let’s look at this first part, which we do in case there’s no shortcut. "Get the next non-blank non-sign token; set negative appropriately": this is really going to pass over minus signs and plus signs that might be at the beginning, and blanks. Then we have to check if it’s \the or a register. Register is what comes up if you say \count or \dimen or one of the internal things, for one of TeX’s internal registers. In that case we have to do a special thing that fetches either an internal integer or an internal dimension. If we find a whole dimension, like if we find \the\vsize, then we get to go to attach_sign, because we’ve got the whole thing in the right units. But if we just get an integer, then we continue on, and we have to figure out whether it’s points or centimeters or whatever.

If we didn’t have \the, in that other case we back up the input, get ready to read the token again, and check to see whether it was a decimal point or not. If it was a decimal point, we have to set the radix to decimal. In other words, the whole thing might have started with a decimal point: you can say .1, you don’t have to say 0.1, and if you start with a point then you’re getting into a base-10 thing. If you look at this code, you’ll see what happens if a guy just says a decimal point and doesn’t say a digit either before or after it, but just a period and then pt or something like that. It turns out that that’s zero, but I hope people don’t use that.

Okay, that’s the way the thing goes, and on the next page we have one of the prettiest modules, the way it gets
formatted by WEAVE, with the else ifs, I guess. Anyway, this is where we’re just looking for all the different kinds of units that can arise. We have to convert inches to points by calling a macro here, set_conversion, which takes two parameters, a numerator and a denominator; it says essentially multiply by 7227 and divide by 100 to get inches into points. This is my definition of a point. I’ve searched for a standard official definition of what a point is, and I have not found a pronouncement of the Bureau of Standards or anything like that. The reference sources give more or fewer significant figures, and the ones that gave the most significant figures were all consistent with this value. So, a nice simple fraction; I’m choosing it as the definition of an inch to a point. And I want to emphasize again that all this arithmetic has been carefully designed so that it will round well, and it will not go over 32-bit calculations, and will give a decent answer.

Okay, any questions before we break? So these are the routines. Yes, there is one.

"This is a question mainly on WEB and philosophy. get_next shows that you use labels in one module that aren’t actually defined until some number of modules later. Do you find this to be a problem?"

Well, I wish that Pascal had had a better setup for those. But actually what I did is that almost all the labels in the program are defined once and for all at the beginning, and I probably mentioned that in my first lecture; module 15 tells about this. I have labels that are used in ways that are quite common in order to do things. So exit is generally just before the end of a subroutine, and there’s done to get out of loops, and found, and things like this, continue. And these are used in ways that are always the same, as if I had a programming language that supported them, and so
these are a few general idioms that are used in the programming, and the labels were defined once and for all here. Only in a few places will we have a special label like attach_sign, and in that case, if you notice, attach_sign was defined in the module but also carried through in the names of other modules that were going to go to attach_sign. It’s very important when you give a name to a module that, if there’s something funny about the control structure, that gets into the name of the module. That was one of the important things we learned about formulating these rules.

But if we had reprogrammed this to be done with very few goto statements, I believe the program would have been a lot less readable and a lot slower, just because of the way the programming languages are working. If you don’t believe me, we should debate this after class, but that is a firmly held belief, a philosophy, after doing lots and lots of coding in this form. I try to have these labels of module 15 as part of the style of programming represented in this code. For example, with exit: I have defined return to be goto exit. Pascal didn’t have a return statement, so we just do that. Now that means that when I use a return statement in a procedure, I have to remember to declare the label exit at the beginning and to put it there at the end. It wasn’t much of a hassle to remember that, though.

Okay, thanks a lot. See you again at eleven o’clock.
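Editorial aside: the inch-to-point conversion described in this session (multiply by 7227, divide by 100, keeping an integer part plus a fraction over 2 to the 16th) can be sketched in Python as follows. This is a sketch under stated assumptions, not the code from the listing: the function names are invented for illustration, inputs are assumed non-negative, and Python integers never overflow, so the care about staying within 32-bit arithmetic that the lecture emphasizes is not modeled here.

```python
# Illustrative sketch of converting inches to scaled points with exact
# integer arithmetic. Function names are hypothetical, and Python's
# unbounded integers hide the 32-bit overflow concerns of the original.
UNITY = 1 << 16  # 2**16 scaled units make one point


def convert_units(whole: int, frac: int, num: int, denom: int) -> tuple:
    """Multiply (whole + frac/2**16) by num/denom, rounding down.

    Returns the new (whole, frac) pair; inputs must be non-negative.
    """
    whole, remainder = divmod(whole * num, denom)
    # Carry the remainder of the whole part into the fraction.
    frac = (num * frac + UNITY * remainder) // denom
    whole += frac // UNITY
    frac %= UNITY
    return whole, frac


def inches_to_scaled_points(whole_inches: int, frac: int = 0) -> int:
    """Convert inches (whole part plus frac/2**16) to scaled points."""
    whole, frac = convert_units(whole_inches, frac, 7227, 100)
    return whole * UNITY + frac


print(inches_to_scaled_points(1))  # 4736286, i.e. 72.27 pt rounded down
```

One inch is 72.27 × 65536 = 4736286.72 scaled units, and the integer arithmetic above yields 4736286, the same on any machine.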