Inside the Machine

After a recommendation on Twitter, I started reading the book Inside the Machine by Jon Stokes, available as a paperback and also, somewhere, as an ebook. The book describes the design and history of the Intel Pentium and PowerPC processor lines, up to the Intel Core.

What makes it interesting is that the book does not focus on how transistors are made, but on the design of the processor: how level 1 and level 2 caches work, what they are for and what they actually are. Transistors become smaller each year, but how do processor designers actually make use of them?
As a programmer I usually consider the microprocessor, and the whole computer for that matter, as a black box, a generic Turing machine. That may not be a problem for most day-to-day programming, but I found it very interesting to see how Intel, AMD and ARM processors (e.g. Apple) differ from each other.
Some things I remember from the book:

Level 1, level 2, level 3, etc. caches
Since the Pentium, the speed of processors has increased exponentially, as we all know. Your computer’s internal memory hasn’t kept up. One of the solutions is introducing caches: memory that’s a lot faster, but also a lot more expensive. Modern processors have multiple levels of cache. The lowest level is the processor itself, which has a few registers (think of them as a sort of variables) where values can be stored. Then there’s the level 1 cache, very fast memory that’s part of the CPU. The next level of cache is slightly slower, and so on, all the way up to your computer’s internal memory.

Your processor is like a workshop in a city, where your workers manufacture, for example, wooden shoes for tourists. Raw material is needed. Your workers make different types of shoes from different types of raw material. All of the raw material is in a warehouse outside the city, a one-hour drive away. You can predict what your workers will need to make (tourists are quite predictable), and fortunately your workshop has some limited storage space as well, where you can store the raw materials for the day.
If all is well, you can ship all goods for the day in one truck from your warehouse to the workshop, so your workers can work as fast as possible. But what if someone orders a unique wooden shoe that requires different raw material? Then your truck has to drive for one hour, nearly empty. The customer has to wait. A cache miss.
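The workshop picture can be sketched as a toy simulation. This is only an illustration and not something taken from the book: the function name simulate_cache, the capacity parameter and the "throw out whatever was used longest ago" (LRU) rule are my own assumptions, and real caches work on blocks of memory addresses rather than single items.

```python
from collections import OrderedDict


def simulate_cache(accesses, capacity):
    """Count hits and misses for a tiny cache that holds `capacity` items.

    Assumption: the least recently used item is evicted when the cache is full.
    """
    cache = OrderedDict()  # keys are the items currently in the "workshop"
    hits = 0
    misses = 0
    for item in accesses:
        if item in cache:
            hits += 1
            cache.move_to_end(item)  # mark as most recently used
        else:
            misses += 1  # the truck has to drive to the warehouse
            cache[item] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used item
    return hits, misses


# Predictable orders: the same four materials over and over.
predictable_orders = ["oak", "pine", "paint", "varnish"] * 5
print(simulate_cache(predictable_orders, capacity=4))

# The same day with one unusual order at the end.
print(simulate_cache(predictable_orders + ["ebony"], capacity=4))
```

With enough storage space only the first delivery of each material is a miss. The unusual order at the end adds one more miss, which is the customer waiting for the truck.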
There’s more your workshop can do. Branch prediction: starting to manufacture a wooden shoe while a customer is still placing their order. If you guessed well, you can deliver immediately. But if the customer changed their mind or you misunderstood the order, you’ve created something you have to throw away.
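To make the guessing a bit more concrete, here is a minimal sketch of one classic textbook scheme, a 2-bit saturating counter. I am not claiming this is what any particular Pentium or PowerPC uses. The function name simulate_predictor and the starting state are assumptions for illustration.

```python
def simulate_predictor(outcomes):
    """Return how many branch outcomes a 2-bit saturating counter guesses right.

    `outcomes` is a sequence of booleans: True means the branch was taken.
    Counter values 0 and 1 predict "not taken", 2 and 3 predict "taken".
    """
    counter = 2  # assumption: start in the "weakly taken" state
    correct = 0
    for taken in outcomes:
        prediction = counter >= 2
        if prediction == taken:
            correct += 1
        # Move one step towards what actually happened, staying within 0..3.
        if taken:
            counter = min(counter + 1, 3)
        else:
            counter = max(counter - 1, 0)
    return correct


# A loop that runs nine times and then exits: very predictable.
loop_branch = [True] * 9 + [False]
print(simulate_predictor(loop_branch), "of", len(loop_branch))

# A customer who keeps changing their mind: hard to predict.
fickle_branch = [True, False] * 5
print(simulate_predictor(fickle_branch), "of", len(fickle_branch))
```

The loop is guessed right every time except at the exit, while the alternating pattern is only right half of the time. Every wrong guess is a shoe you have to throw away.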

RISC and CISC
In the early Pentium processors, a surprising one fifth or so of the transistors was dedicated to supporting the x86 instruction set, i.e. staying compatible with older processors.
Internally, a Pentium processor was a RISC processor, using simplified instructions (‘Reduced Instruction Set’). Externally, however, the Pentium accepted complex x86 instructions (‘Complex Instruction Set’), the instruction set used by older processors. This meant the processor was backward compatible without the need for an emulator. The disadvantage was that programmers weren’t using the processor at full capacity.
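The idea of accepting complex instructions on the outside and running simple ones on the inside can be sketched like this. It is a toy model only: the tuple format, the decode function and the "tmp" register are made up for illustration and are not real x86 encodings or Intel's actual internal operations.

```python
def decode(instruction):
    """Split one complex instruction into a list of simple internal operations.

    `instruction` is a hypothetical (operation, destination, source) tuple.
    A destination written as "[address]" means "the value in memory at address".
    """
    operation, destination, source = instruction
    if destination.startswith("["):
        # Working directly on memory is one complex instruction on the outside,
        # but three simple steps on the inside: load, compute, store.
        address = destination.strip("[]")
        return [
            ("load", "tmp", address),
            (operation, "tmp", source),
            ("store", address, "tmp"),
        ]
    # Register-to-register work is already simple enough.
    return [(operation, destination, source)]


print(decode(("add", "[0x10]", "eax")))
print(decode(("add", "ebx", "eax")))
```

In this toy version the translation step is the part that costs extra transistors: it has to exist in hardware, in front of the simple core that does the actual work.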

Compare that to the PowerPC 60x processors (made by Motorola and IBM) that were included in, among others, Apple’s Power Macintosh. These processors accepted only simplified instructions, allowing them to be faster. To use older programs an emulator was needed: a small program that translates software compiled for older processors (the Motorola 68000 family) into the new instruction set.
One main use of these newer processors was the Apple Mac. Since Apple controlled which operating system was used on their machines, including an emulator in the operating system to run old software was a lot easier than for Intel, where people were used to installing the operating system themselves.

You’d think the faster Motorola processors would eventually take over, but as we now know that was not the case. The penalty of about 20% shrank quickly in later processors such as the Pentium II, down to a negligible percentage. And even though it’s kind of ugly that a modern processor contains a built-in emulator, normal consumers and even programmers don’t care about that as long as the processor is fast.