Session 10

Of the videos, this is the tenth session. It is about TFM files, but ends abruptly.

This will get replaced with the video (if the JavaScript runs properly).

The transcript below is autogenerated from YouTube, and needs to be cleaned up.

that it the section of the tech 82 listing it’s like is part 20 starting in section 456 page 158 on your copies the important thing the important thing to understand here is is what kind of files Tech users in order to get all the information it needs about fonts and these are called TFM files tech font metric files and they are used not only by tech but by other system programs and the design was intended whoops to make them to make it a compact way to communicate lots of information these were designed by Lau Ramshaw who’s sitting here in the audience today and so he’s going to be bored by this deck by this hour but he can also correct me in case I make mistakes and so a TFM file consists of a long string of 8-bit bytes when we send you a TF m file we find that the easiest way to do it so that nobody could get confused by this is by sending these bytes in decimal with spaces between them so that device would say zero space 255 space or whatever in decimal everybody has been able to read those kinds of files then you can then there’s a two or three line program in Fortran or something which will convert those into whatever your local conventions are for files that are made out of 8-bit bytes packed file of 8-bit bytes but this is but that vic seems to vary greatly between systems as far as tech is concerned it just calls it a byte file which means a packed file of 8-bit and eight bits is a is our numbers between 0 and 255 and it has to be 0 to 255 not minus 128 2 plus 127 when we when we can when we refer to something from the byte file we’re going to get the next byte from it and it’s got to be a number between 0 and 255 you have to do something to make sure that that accept that assumption is valid but that’s the only assumption we make about bike files now actually you could also consider it as a 32-bit words because the number of bytes in the file is always a multiple of 4 and we always read it and we always read them almost always read them 4 bytes at a time oh it’s only at the beginning when we read them 2 bytes at a time and so if it’s if you wanted to you could you could imagine it as a assisting of these words but all of the but for tech purposes it was more it was it was better to stick to smaller number of file types if possible so we so Tech has alpha files byte files and word files we needed those three anyway didn’t want to go to a fourth one which was made out of 32-bit words from TFM file let’s think of it as bytes but 4 bytes to a word so I’m going to draw my lines here indicating 4 bytes it 4 bytes and I’m going to consider that the bytes are left to right located in a word so 0 by 0 the file byte 1 by 2 by 3 and so on this is what Lyle calls big endian order where the big end is that right you nodded your head this way it’s called big endian order but I didn’t invent the term Dan Cohen of bias I invented the term in a very amusing essay about why order inside of words ok because they go in different in different ways and and that’s why we transmit them as bytes this way but I’m going to represent them in on the board assuming that this is your first bite second bite third by fourth way that’s just a convention for the way I the way I write I write and now that the the first part of the file is six words long so it’s always fixed and it tells you how big the rest of the file is and it has twelve half words in it 12 16 bit integers that are composed of the first byte times 256 plus the second byte that’s the big endian order there so this is the the length and we’ve got a whole bunch of these things they’re all described in module 457 I’ll just read them to you God so you get the feeling of it the first comes the length of the entire file in words so the length of the header data so that breast is the neck the next parts of the file is the header and then there’s the char info done let me try to do it from memory and then I can correct it afterwards this would be more fun for me I don’t be confusing for you if I get it wrong but let’s hope I get right then comes the width then comes the heights and comes the death and comes the italic Corrections and comes the ligature current instructions then comes plan ahead here I’ve got to get you then come the currents and then come the extensible characters and finally the parameters that’s all just made it how did I get it right Wow okay turns yeah now so start out length of the file then the length of the header part then it then the next two half words here are the beginning character and ending character the front and characters don’t have to be consecutive inside there’s another way to indicate gaps inside but but this is more efficient if we can if you don’t have two characters in the front one of which is number zero and the other is which 255 because then we have more space in your the the char info gives character starts at BC and ends at EC and so if you so there will be zeros in here for missing characters but the fewer those are the fewer words of char info you need Chari & Co is packed as I’ll describe later giving for each character in the font information about its actual width height depth nutella those piece of information are indirect pointers to the width height depth and intelligent parts of the format file that come next so char info flowers here then you specify the number of widths a number of heights the number of depths the number of italic Corrections the length of your league a number of commands you have of ligatures currents the number of currents the number of extension parts the number of parameters so this first part just gives the length of all those and there’s a formula has to be satisfied that the length of the file has to be 6 plus length of the header plus you know and so on that that has to be checked and so you’ll know something is wrong if that equation doesn’t work out correctly now the beginning character and ending character are supposed to satisfy the relation BC minus 1 less than or equal you see less than or equal to 55 inside of the TEC 82 I’m not saying that BC has to be less than or equal EC you could have beginning character equals 1 and eat any character equals 0 that’s the empty font it’s not a particularly interesting font but the I just put it I just made sure that it would be legal because the history of mathematics indicated that it was useful to have 0 around and so sometimes a person might want to be able to input a null font but that does BC minus one really have to be greater than equals 0 I believe so what happens at BC is 0 no you’re right you see just has to be greater than equals 0 sorry yeah excuse me and that’s all I said to in in that section now the definition of metric data not only is in this section but it’s also in the other handout that you have called tecware and the program in there called tf2 PL is the is the I believe it’s got starts at page 200 in one it’s the second program in the in the book they it jumps from page 110 to page 201 TF de piel contains a separate description of you know just copied all the tech of this of these fonts again it’s it’s reproduced there but with a slight more generality because this isn’t intended for also for processors that are a little more general in test and so there will be a comment in in that section that’s not in the tech manual if you look at module 8 of TF 2 PL it’ll say an exception to these rules is planned for oriental fonts which will be identified by the condition EC equals 256 and we’re planning to say that if EC is 256 then we’re going to do something special in an extension attack well say something about that that plan the saving thing about oriental characters like Japanese or Chinese is that they don’t have that many different widths Heights and depths their effect they’re almost always the same they have lots and lots of different different shapes of characters but only a few with sights and depth so inside of tech we would represent those as a character code first of all it tells what it’s width height and depth is and then we would have another bit another thing after it 16 bits that would tell us exactly what glyph it is in the font and by looking but by looking for a font that has something that has EC equal to 56 we tech in such an extension would know to look for a glyph code besides to figure out what character what the actual character is but the first part of it would be treated just as usual by tech to figure out this the width height and depth so the character the different widths Heights and depths would be described in the in the ordinary way I’ll tell you well that’s something that we would then then I think we were going to say character 256 would maybe refer to would represent all of the other ones or something like that that there’s no large proposal worked out I just wanted to make sure that this kind of thing would would be easy to graph kin without much difficulty okay now inside of these char info parts we have that we have the highly compressed data which makes it possible to do this most most of the time the rest of this if these tables are also going to be fairly short in fact well we can find out how how big these files are I suppose let me let me just type something here so I can you let me see I’m not going to be doing anything real interesting so you don’t have to see it yourself but on but if you want if you want to we could switch over to the computer now for just to test a link and you can you dim the house lights now and go to the computer terminal and see if we can get anything here I’m going to try CMR 10 dot TFM on Texas and type it out in octal is to see what see how long it is that’s it can anybody see anybody now this is the beginning of the file you see those are those that header information occurring in the first six word can you see it no yeah all right they’d like to see the computer screen is that connected okay see what a dreadful thing if you have to watch me all the time I I can’t stand to look at mirrors now they did so this is the beginning you see the length of the file in words is 215 octo no no wait a minute we got to be careful because either 36 bit words and they’re shifted left so what what is that really oh all two one five now let’s see that’s three six nine twelve fifteen bits so we need 16 bits so that’ll really be one zero l really before 4.32 octo which is four times 64 256 24 more 26 more 256 and 26 to 82 something like that 282 words so typically we’re running between 250 and 300 words for the top for the for the whole log TFM file you see the next part of the file coming through there this would be your char in four words for the for this roman font after the header the header the header contains the name in of the font it will say computer modern or something like that and technically pretty much ignores that but then the character intro parts most of these are zero that’s when the the height and depth the depth is zero for example on most of these characters and both of the information is taken by the char info table and at the end we’re going to get things where there are more bits on probably and that’s when we’re getting into the hike the actual width data and height data but looks like most binary dump files okay go back to the house lights on again then it’s packed highly condensed and so we have the TF to pl program available to print that in symbolic form and we’ll we’ll take a look at what it prints from that binary data in a minute well the char in four words are are packed into they have four bytes to them the first bite second bite third bite and fourth buyer gets with zeros the first second this one is width and it’s a pointer into the width table if the width is zero however this means the character is not there just the condition width equals there always is in is necessary and sufficient for aesthetics concern saying don’t put it in the DVI font and and give an error message if the user wants to be told about such errors so it’s an eight bit pointer we could handle almost all cases however if somebody has 256 nonzero widths in his and 256 characters in the front then we couldn’t have we couldn’t handle them all because we only have two we have to we have to 0 is reserved for a flag to indicate not present the next one height and depth is packed four bits for height and four bits for depth which means that we’re only allowing sixteen different heights sixteen different depths furthermore the first width is zero the first height is zero the first steps is zero first to tell it correction is zero this is a convention also that we can help to check that we’ve got a TFM file and well that doesn’t really hurt you because you usually want at least one character that has height zero or depth zero so you’re not paying but now so you can so if you have a font that has more than sixteen different heights in it the program will round to the coat to the closest the closest ones that it had it’ll take all the heights that are there and it’ll pick 16 of them so that the maximum error is as small as possible that’s of note that’s of little consolation however if your height is represents the thickness of a rule of a square root sign so so we’re careful to design our symbol spawn so that whenever we have a square root sign it’s it’s not going to be in a font that’s that’s Heights are getting rounded they’re going to be exact Heights given but typically this isn’t this is plenty and we only have that many bits so that’s it so four bits for height four bits for depths are packed in there the italic correction and tag are here but this is six bit bit so this is 8 bits live for four this is six and two italic correction and we have 66 bit so we can have 64 different italic Corrections in the file and in my if I’m not well if I’d if I calculate each italic correction in by a tricky math formula then I tend to get 65 or 66 of them in a typical italic font of 128 characters but that but rounding doesn’t really bother me on donnatella Corrections that I won’t see the difference of a percentage point of a point so we got that thing then the tag field which it which has four values and then there’s the remainder field which is a which is sort of like a parameter to the tag eight bits now the tag has four values and zero one two and three zero is the vanilla case a normal case and almost all characters have a tag a zero tag of one is linked it says a ligature current program is there and in that case the remainder is a pointer into the ligature current commands and we start looking there to see what about ligatures and currents we’ll talk a little bit more about that in a minute type two it is where we have a character list and this means we there is a next larger character in some list of characters for example of left parentheses might be might be linked together and and and so we can get the knit to the next larger one bye-bye and the remainder is a pointer to the to the character that’s next larger in the sequence and that can point to still another one and so on so we can find we have a we can have lists the characters and the last one is extent the third cases for an extensible character in that case the remainder field is a pointer into the extensible into the extensible recipes for making up extensible characters well I can explain extensible recipes right now extensible recipes have four parts top middle bottom and repeatable part and these are the names of four characters in the font or they might be 0 in this in these three case they might be 0 indicating that the thing isn’t present so we make up repeatable characters in that way for example that a left brace a curly brace has four has off all four things present repeatable part is this here gets repeated as many times as you need the same number of times above and below the middle part of course is this the top part this the bottom part that four different characters and tech will we’ll take top any number you know n of these then middle and end again of these and end the bottom if middle is empty that would it be indicated by zero here that would be a character like a left parenthesis has just the top of bottom and a repeater part font designer should drop should to make a good font the font designer should make the this set of all different widths that you can get be the same for all of his de limiters but you shouldn’t it’s for exactly so that if you’d have a left parenthesis with the repeater part of six points then your your repeater part for braces should be three points each so so that you’ll have the same sequence of of lengths because when tech is is matching suppose somebody for some reason one of the left parenthesis and a right brace it would be funny if the right brace would be a different size from the left parenthesis just because the set of different available Heights is it’s different for the two you’d like to make them consistent so so you design it so that the set of different things that that that can be made up has the same has the same dimensions anyway you take a symbol like vertical line that doesn’t have top middle or bottom so our repeater parts and one of these floor brackets has bottom but no top and so on I haven’t seen one yet that has a middle and nothing else but could you can imagine off all those combinations the repeater part is always there even if it’s 0 it’s that new that would mean that 0 is your repeater character but but if it’s an extensible character this is used in certain parts of tech mass mode to see if you have if this character is an extensible character and then then it will see if the tag is 3 what about the ligature current programs I think I’ve covered everything about this except parameters and ligature current commands because with sidesteps these are just the different depths I got to mention though that these are given in in certain units because they represent physical dimensions so we ought to talk about those units the in the header in the header part there’s two important words far as tech is concerned and I forget which comes first and which comes second but the Fiat first is the checksum and next is the young is the design size so we got to talk about those two things checksum first that’s a number that’s supposed to be that’s supposed to change if-if-if text in important information is different if for example somebody is if uses tech to produce a DVI file in January and then you use that DVI file to print in February the checksum that tech found in its in its TFM file that used in January should agree with the checksum that’s being used in February to print the font otherwise there’s been an incompatibility okay and so this text um is computed by marathon if the checksum is 0 it doesn’t get there no test is made but but but otherwise the user is warned that something has happened to the font in the meantime so that the assumptions that tech made generating the file are not necessarily valid anymore and we can’t mom maybe printed correctly so beware that’s what the checksum is for the design side is is so besides that the that’s sort of normal for this font whether that the designer had in mind it’s but it’s kind of an arbitrary thing it’s just what all the other units are expressed in terms of so all the widths are expressed as multiples of the design size or all the heights depths that tell corrections currents and parameters except one parameter is pure number but all the other parameters are expressed as multiples of the design side and the design size doesn’t necessarily have to have to reflect any universal truth for something but if you say that you’ve got that your design size is 10 points this is given in units of points then then if somebody just calls for this font without saying at a certain other size then he gets it at 10 points the design size is sort of the default ant size so a person in in writing in tech will say font 10 equals CMR 10 and its design size happens to be 10 and the design size of CMR 9 happens to be 9 but if he says cm are 10 at 15 point then tech just reads in the same TFM file but pretends that 15 was the design size here and so everything is in a multiple of the design size so everything gets multiplied by by 15 instead now these wit these widths are actually given as as what the documentation calls fixed words and fixed words are our our fixed point things that have 20 bits to the right of the decimal point so there’s 20 bits to the right of the decimal point and therefore 12 to the left of the decimal point okay now that the fixed words however it should not actually use all this precision to the left a decimal point Tech is only going to allow them to use eight bits to the left the decimal point or is it for four bits yeah only four of those should actually be used so we’re not saying so nobody’s supposed to have a width more than 16 times the design size sorry so for example if you really want to you know to have something more than 16 amps the design size we might call an M is typically as it is one quarter one M if somebody really wants to have a character that’s more than 16 M s in some direction or other he should say that his design size is larger okay and the reason is that Ventech can can do the conversion multiplying by the design size with a single precision without a lot of time and saves it saves it saves these internal calculations so the so these are given as fixed words so the first byte will either be 0 or 255 but if a negative is given into its complement notation so the first byte has to be 0 or 255 in all the websites depths parameters okay now to leek late your current commands here’s where we’ve got another another highly efficient way to pack logic into bits and the ligature current command table is a big big table of instructions possibly big but but possibly is empty and if the tag bit was one that would say the remainder is an index into this table as to where we start there’s four bit for bite commands in the table and and you can you do and you execute this command and if it didn’t find something then you go on to the next one no the commands have four parts and the first one is called the stop bite or the dot bit this one is called the op I think this one is called no this one is called the character and this one’s called the op and this one is called the remainder or the upper end or something I don’t have a specific name for it call it remainder and the command has the following the the right now we’re using one bit for stopped one bit for op eight bits per character and 8 bits for remainder so people like Pierre Makai have been eyeing these extra 14 bits with great interest because they want to do things for Arabic and other languages that where they would like to go beyond 256 character limit right now we’re just using the high order bit in the stop bite and the op bite the commands means this look at the next character following the present one and it doesn’t match this one if not go on to the next instruction or stop depending on whether this top bit is there so the first thing is look for a match on the character after it and if and if you didn’t find it then you stop if this stop it is on otherwise you go on and sequentially until you either find a match or you find a stop it now if you did find a match then up it tells you it’s either a leecher or a current if it’s a if it’s a Kern then you’d look at the remainder part and that’s an index into this current table here tells you how much the current is if it’s a ligature the remainder tells you replace your present character and the following character by this one and then start the whole process over again pretend that that was the character that was input and go through and see if that character in turn has a ligature tag and if so it will refer to it to its program to continue on so see what happens to an F for example in a typical American version of ligatures F has a ligature program that says are you following if you’re followed by an F then replace this by FF character as a ligature and go on if you’re followed by an AI replace it by Fi and and and and same for L somewhere then FF itself FF ligature is followed by an AI and replace that by F fi f would also be followed for two also often followed by currents for example F followed by an exclamation point you’d want to put some extra space between F and information point so that they don’t run into each other quote marks the same way okay so that’s the idea of these ligature current instructions and a command and tech will interpret this in several parts of the of the program so let’s now see what the tf2 pl program does with this and so let’s have the lights off and go back to the computer and see if this works it’s still up tf2 PL tf2 PL has a program that is is one of these tecware programs that you’ve got and it asked me for this TFM file and I’ll just use the CMR 10 TFM that on texas that we were just looking at in binary on this and I’ll print out CMR 10 dot PL and this is the this is the program that you have a listing of doing it it prints out the mate the character codes that it has processed as its doing it ok so now we should be able to look at CMR

That's the end of this page. If you have any feedback about anything on this site, you can contact me here. To go back to the top-level page (if you are not on it already), click here.