## Brief Notes on Computer Word and Byte Sizes

7 March 2023

This is not my usual blog fodder, but there’s too much material here for even a Mastodon thread. The basic question is why assorted early microcomputers—and all of today’s computers—use 8-bit bytes. A lot of this material is based on personal experience; some of it is what I learned in a Computer Architecture course (and probably other courses) I took from one of my mentors, Fred Brooks.

There are three starting points important to remember. First, punch card data processing is far older than computers: it dates back to Hollerith in the late 19th century. When computerization started taking place, it had to accommodate these older “databases”. Second, early computers had tiny amounts of storage by today’s standards, both RAM and bulk storage (which may have been either disk (for some values of “disk”!) or tape). Third, until the mid-1960s, computers were either “commercial” or “scientific”, and had architectures suited for those purposes.

Punch card processing was seriously constrained. Punch cards (at least the IBM type; there were competing companies) had 80 columns with 12 rows each. There was a strong desire to keep all data for a given record on a single card, given the way that data processing worked in the pre-computer era (but that’s a topic for another time). This meant that there was a premium on ways to compress data, and to compress it without today’s software-based algorithms. The easiest way to do this was to put extra holes in a card column. Consider a column holding a single digit “3”. That was represented by a single hole in the 3-row of a single column. There were thus 10 rows reserved for digits—but in a numeric field, the 11-row and the 12-row weren’t used. You could encode two more bits in that colum, as long as the “programming” knew that, say, a column with a 12-3 punch was really a 12 punch and the number 3 and not the letter C. Clearly, 10 digit rows plus two "zone" rows gives us 40 possible characters; a few more were added when things were computerized.

Let’s look at such computers. The underlying technology was binary, because it’s a lot easier to build a circuit that looks at on/off rather than, say, 10 different voltage levels. When reading a card, though, you had to preserve the two zone bits separately, because their meaning was application-dependent. Accordingly, they used 6-bit characters: two zone bits, plus four bits for a single digit. But you can fit 16 possible values in those four bits, not just 10, so machines of that era actually had 64-bit character sets. In a purely numeric field, the zone bits were used for things like the sign bit and (sometimes) for an end-of-field marker of some sort, but that’s not really relevant to what I’m talking about so I won’t say more about those. The important thing is that each column had had to be read in as a single character, more or less uninterpreted.

Representing a number as a string of (effectively) decimal characters was also ideal for commercial data processing, where you’re often dealing with money, i.e., with dollars and cents or francs and centimes. It turns out that \$.10 can’t be represented in binary: 1/10 is a repeating string in binary, just like 1/3 is in decimal, and CFOs and bankers didn’t really like the inaccuracy that would result from truncating values at a finite number of places. (Pounds, shillings, and pence? Don’t go there!) The commerical computers of the day, then, would do arithmetic on long strings of decimal digits.

Scientic computers had a different constraint. They were often dealing with inexact numbers anyway (what is the exact diameter of the earth when computing an orbit), and had to deal with logarithms, trig functions, and more. Furthermore, many calculations were inherently imprecise: a Taylor series won’t yield an exact answer except by chance, and it might not be possible even in theory. (What is the exact value of π? It’s not just irrational, it’s transcendental.) But there were other constraints. Sometimes, scientists and engineers were dealing with very large numbers; other times, they were dealing with very small numbers. Furthermore, they needed a reasonable amount of precision, though just how much was needed would vary depending on the problem. Floating point numbers were represented internally in scientific notation: an exponent (generally binary) and a mantissa. There were thus two critical parameters: the number of bits in the mantissa, which translated into the precision of numbers stored, and the number of bits in the exponent, which translated into the range. (Both fields, of course, included a sign bit in some form.) Given these constraints, and given that commercial data processing, with its 6-bit characgters, came first, it was natural to use 36-bit words: plently of bits of precision and range, and the ability to hold six characters if that’s what you were doing.

That’s where matters stood when the IBM S/360 series was being designed starting in 1961. But one of the goals of the 360s was to have a single unified architecture that could do both scientific and commerical computing. There was still the need to support those old BCD databases, whether they were still on punch cards or had migrated to magetic tape, and there was still the need to support decimal arithmetic. The basic design was for a machine that could support memory-to-register arithmetic for scientifc work and general utlity computing, and storage-to-storage decimal arithemtic for commercial computing. This clearly implied a hybrid byte/word architecture. But how big should bytes be? One faction favored 6-bit bytes and either 24-bit or 36-bit words; another favored 8-bit bytes and 32-bit words. Ultimately, Brooks made the call: 8-bit bytes permitted lower-case letters, which he foresaw would become important to permit character processing. (Aside: Brooks, apart from being a mensch, was a brilliant man. It’s sobering to realize that he was appointed to head the S/360 design project, a bet-the-compay effort by IBM, when he was just 30 years old, and this was just after his previous project, the 8000 series of scientific computers, was canceled. I wasn’t even out of grad school when I was 30!)

The reduction from 36 bits to 32 bits for floating point numbers was challenging: there was a loss of precision. You could go to double-precision floating point—64 bits—but that cost storage, which was expensive. In fact, 8-bit bytes were also expensive: 33% more bits for each character. (IBM did many simulations and analyses to confirm that 32 bits would usually suffice.) But Brooks’ vision of the need for lower case letters has been amply confirmed. (Other character sets than the American Latin alphabet? Not really on folks’ radar then, which was unfortunate. But it would have been hard to do something like Unicode back then. The lowest plane of Unicode is based on ASCII, not IBM’s EBCDIC. Many people within IBM wanted to go to ASCII for the S/360 line (there was even support in the Program Status Word for ASCII bytes instead of EBCDIC ones when dealing with decimal arithmetic), but major customers begged IBM not to do that—remember those pesky zone punches that still existed and that still couldn’t be converted in a context-independent fashion?)

8-bit bytes have other, albeit minor, advantages. If you’re trying to create a bit array, it’s nice to be able to lop off the lower-order 3 bits and use them to index into a byte. But Brooks himself said that the primary reason for his decision was to support lower-case letters. (Aside: Gerritt Blaauw, one of the other architects of the S/360, spent a semester at UNC Chapel Hill where I was a grad student, and I took a course in computer design from him. There were rumors in the trade press that IBM was going to switch to 9-bit bytes for future computers. I happened to overhear a conversation between him and Brooks about this rumor. Neither knew if it was true, but they both agreed that it would be unfortunate, given how hard they’d had to fight for 8-bit bytes.) USASCII fits nicely into 7 bits, but that’s a really awkward byte size. The upper plane was used for a variety of other alphabets’ characters. That usage, though, has largely been supplanted by Unicode. What it boils down to is that every since the S/360, there has never been a good reason to use a byte size of anything other than 8 bits. On IBM systems, you have EBCDIC, an 8-bit character set. On everything else, you have ASCII, which fits nicely in 8 bits and was more international.

Word sizes are more linked to hardware. The real issue, especially in the days before cache, was the width of the memory bus. A wide bus is better for performance, but of course is more expensive. The S/360 was originally planned to have five models, from the low-end 360/30 to the 360/70, that shared the same instruction set. It turns out that the 360/50 was a sweet spot for price/peformance and for profit—and it had a 32-bit memory bus. If you’re trying to do a 32-bit addition, you really want the memory operand to be aligned on a 4-byte boundary, or you’d have to do two memory fetches. 32 bits, then, is the natural word size, and the size of the registers. You could do half-word fetches, but that’s easy; you just discard the half of the word you don’t want. A double-precision 64-bit operand requires two fetches, but on a higher-end machine with a 64-bit bus it’s only one fetch if the operand is aligned on an 8-byte boundary. And on the IBM Z series, the modern successor to the S/360? Words are still 32 bits, because the nomenclature is established. A pair of 64-bit registers together is said to hold a “quadword”. That is, what a “word” is is was defined by the original history of the architecture; after that, it’s likely historical.

https://www.cs.columbia.edu/~smb/blog/2023-03/2023-03-07.html