Starting With a Blank Page

What if you could make your own processor from scratch?

In The Heart of the Beast, I mentioned that microprocessors have a fixed number of functional units used to execute the instructions in your application program. For example, whether your application needs zero or forty math units, you get a fixed number of math units (typically four) in a microprocessor. If your program could really use forty math units, tough. Only four may be used at a time, and so sharing these four math units becomes a performance bottleneck. What if you could create your own processor, crafted to exactly match your program’s ideal requirements?

You can indeed define and create your own processor… if you have a million or so US dollars, can pay a hardware engineer or five to design the thing for you, and have the patience to wait six to twelve months for the first devices to roll off the fabrication lines. This situation is not ideal, but it works well if you have a large enough business case to support the effort. These things are called ASICs, Application Specific Integrated Circuits. You could say a microprocessor is a general purpose, programmable ASIC.

There exists another custom processor solution these days. Cue the entrance of a device called an FPGA. (Which stands for Field Programmable Gate Array, and tells you nothing about what it does.) FPGAs have been around for a very long time, but until recently only hardware engineers could use them. To be fair, until recently these FPGAs were also not large enough to be interesting to application software developers.

What are these FPGAs to software people? They are essentially a blank piece of paper that can be configured and re-configured with “hardware” created using software languages instead of hardware languages. The good news is that you can create a set of functional units exactly tuned to your program’s needs. The bad news is that you can create a set of functional units exactly tuned to your program’s needs. Seriously. You have the complete freedom to create whatever kind of processing you want, so the problem then becomes “what do you want?”. That is a bit of a problem to solve these days, as programmers are used to dealing with a fixed number of functional units in a microprocessor. Complete freedom to create whatever is sometimes unnerving. Fortunately, there are two major concepts that make life simpler and reasonably sane for the application programmer.

The first is the concept of data flow. Instead of microprocessor program instructions acting upon data and shuttling data around, imagine your data simply flows simultaneously through all of the functional blocks you create. Are you starting to see why these FPGA devices might be useful for application performance? Not only is your FPGA-based custom processor 100% dedicated to your application, you’re getting a tremendous amount of parallel (simultaneous) processing by not needing to schedule execution around a limited set of functional units.
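
The data-flow idea can be sketched in ordinary software. The little pipeline below is purely illustrative (the stage names and numbers are made up, and real FPGA designs are not written this way): each stage is its own “functional block,” and values stream through the whole chain rather than being shuttled back and forth to a shared unit.

```python
# A toy data-flow pipeline: values stream through a chain of stages.
# On an FPGA, every stage would exist as its own piece of hardware and
# all stages would be busy on different samples at the same instant;
# these Python generators only model the streaming structure.

def scale(samples, factor):
    for s in samples:
        yield s * factor

def offset(samples, bias):
    for s in samples:
        yield s + bias

def clamp(samples, low, high):
    for s in samples:
        yield max(low, min(high, s))

raw = range(5)  # the incoming data stream: 0, 1, 2, 3, 4
pipeline = clamp(offset(scale(raw, 3), 1), 0, 10)
print(list(pipeline))  # [1, 4, 7, 10, 10]
```

Notice that the program describes the shape of the processing, not a sequence of instructions fighting over shared math units; that shape is exactly what gets laid down onto the FPGA.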

The second concept is the astonishing power of modern compilers. A compiler is software that turns software programming language statements into something useful in the computer world. In the case of an FPGA, a compiler turns a program into hardware functional units which are then laid down (“programmed”) upon the blank page FPGA.
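
To make “compiler” a bit more concrete, here is a toy that translates one arithmetic statement into a list of primitive steps. This is entirely illustrative: real compilers do this at enormous scale, and an FPGA compiler emits hardware descriptions rather than lists like this one.

```python
# A toy "compiler" output: the statement  y = a * b + c  broken into
# primitive operations. Each tuple is (operation, destination, input1, input2).
def compile_expr():
    return [
        ("mul", "t0", "a", "b"),   # t0 = a * b
        ("add", "y", "t0", "c"),   # y  = t0 + c
    ]

for step in compile_expr():
    print(step)
```

A microprocessor compiler would turn those primitive steps into machine instructions; an FPGA compiler instead turns each one into a dedicated multiplier or adder wired together on the blank page.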

Why bother with an FPGA? Well, microprocessor performance has hit a brick wall. The microprocessor clock rate (instruction execution rate) cannot be cranked up any higher without the device melting down from waste heat. On the other hand, FPGA-based applications typically run at one-tenth the clock rate of a microprocessor, which leads to much less waste heat and much less power consumption. Even at one-tenth the clock rate, applications on an FPGA may run five times to a hundred-plus times faster than the equivalent microprocessor application. Ok, you knew the bad news was coming. Due to numerous factors, not all applications on an FPGA will outperform their microprocessor equivalent. I may go into some of these factors in a later post. In the meantime, remember that an FPGA is sometimes an application accelerator and sometimes an application decelerator.
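
A quick back-of-the-envelope calculation shows how a slower clock can still win. The numbers here are hypothetical round figures chosen for illustration, not measurements of any real device.

```python
# Hypothetical comparison: a microprocessor at 3 GHz with 4 math units
# versus an FPGA design at one-tenth the clock with 400 custom math units.
cpu_clock_hz = 3.0e9
cpu_units = 4
fpga_clock_hz = cpu_clock_hz / 10   # 300 MHz
fpga_units = 400

# Ideal peak throughput: one operation per unit per clock tick.
cpu_ops = cpu_clock_hz * cpu_units
fpga_ops = fpga_clock_hz * fpga_units

print(f"speedup: {fpga_ops / cpu_ops:.0f}x")  # speedup: 10x
```

Ten times the throughput at one-tenth the clock rate, and correspondingly less waste heat per unit of work. Of course, this only holds when the application has enough parallel work to keep all those custom units busy, which is exactly the “decelerator” caveat above.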

The Heart of the Beast

How do computers do so many different things?

Computers run spreadsheets, web browsers, games, take pictures, do email and a host of other (hopefully) useful things. Computer servers, desktops and mobile devices do the many things they do because of a chip at their heart that executes instructions. These instructions are collectively known as a program. (More on programs in a later post.) This chip is called a microprocessor or a CPU (Central Processing Unit), and it executes billions of instructions per second. Each instruction is rather simple, so it takes a rather horrifying number of executed instructions to, say, open a web page in your browser. Good thing the microprocessor is really, really fast.

There are many, many ways to describe the guts of a microprocessor. A computational instruction execution engine, an uncountable number of transistors, a set of defined functional units, and a serious power hog and heat generator are a few of the ways of looking at a microprocessor. For now, let’s look at a microprocessor as a set of functional units.

Let’s warp the idea of a household kitchen for a moment and view it through the lens of it being a set of functional units. You’ve got your refrigerator, freezer, sink, faucet, blender, oven, microwave, dishwasher, cupboards, toaster and so on. Each “functional unit” in your kitchen does one thing really well. A faucet is good at producing water and a toaster is great at toasting bagels. A faucet is maybe not so good at toasting bagels and I sincerely hope your toaster does not produce water. All of these functional unit things together make up your kitchen. A microprocessor is similar in that it has math units for computation, caches for temporary storage of instructions and data, instruction execution units to execute programs, memory management units to keep memory straight (wish my brain had one), and so on. Fewer functional units in total than your kitchen, actually. A bit smaller than your kitchen, too.

The good news is that many years of experience, experimentation and observation have paid off in that we have microprocessors today that can do many, many things reasonably well. It’s more of a generic efficiency apartment kitchen than a Uno’s Pizzeria and Grill kitchen. Nothing wrong with that, I don’t need an Uno’s kitchen when I make dinner tonight.

The bad news is that the functional units in a microprocessor are fixed at what they are. If I needed another freezer in my generic efficiency apartment kitchen, it would not be so easy to add one. Same thing with a microprocessor. They typically have four floating point math functional units to, well, do math. If my program only needs four at a time for execution, all is well. But let’s say I have a weather prediction program I want to execute and I want to predict the weather tomorrow. A weather prediction program has a lot of math in it, as you may imagine. If I want the program to complete before tomorrow, I’d really like to have, say, a thousand floating point math functional units all running at the same time. It’s not so easy to add more math functional units to a microprocessor.
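
The cost of that fixed limit is easy to work out. Using the illustrative numbers from the paragraph above (a thousand independent math operations per time step, four available math units):

```python
import math

# Illustrative figures from the weather example: 1000 independent
# floating point operations wanted per time step, only 4 math units
# available. The work must be done in batches of 4.
operations_per_step = 1000
math_units = 4

batches = math.ceil(operations_per_step / math_units)
print(batches)  # 250
```

Every time step takes 250 turns through the four shared units instead of one turn through a thousand dedicated ones. That 250-fold serialization is the bottleneck a custom processor can remove.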

A microprocessor is a good general purpose instruction execution engine that has a fixed number of functional units to do the work it needs to do. In later posts, I will touch on the nature of programs, alternative ways and means to compute stuff, programs that translate geek-readable text into microprocessor instructions and whatever else might appear in our wandering.

Hello, and welcome

The what and why of this blog.

Seymour Cray, the father of supercomputing, inspired me at a very difficult time in my undergraduate education. I was struggling with Network Analysis, a difficult course in any Electrical Engineering curriculum. I stumbled upon this quote from Seymour Cray:

“I’m all for simplicity. If it’s very complicated I can’t understand it.”

I thought, if Seymour himself can’t understand things unless they are simplified, perhaps the reason I am struggling with this material is that I am not breaking it down into simpler pieces I can understand. Sure enough, that was my problem. I was trying to digest giant-sized chunks of complex information without simplifying it first.

I practice simplification to this day, and thoroughly enjoy describing seemingly complex things to people in a manner and at a level they can appreciate. Which is why I started this blog. I intend to describe concepts and ideas from computer engineering and computer science at a reasonably simple level. I understand not many people really want to know every little exquisite detail about the underlying hardware or software in computers. However, I believe a larger number of people are indeed interested in a high-level, rough understanding of the various nuts and bolts of computation today. I know I certainly appreciate it when a scientist takes the time to describe quantum mechanics or chaos theory or black holes in a manner I am able to grasp.

The “hidden” in this blog tag line refers to the invisible aspects of computer technology. I am not going to touch on iPhone operation nor how to make Microsoft Word do a certain thing. Instead, I’ll be going underneath it all, looking at microprocessors, accelerators, networking, aspects of programming, computational considerations, and the like. I intend to describe these things in a manner an inquisitive and generalist mind may easily grasp. I request you tell me when I miss the mark, as I need your feedback in order to refine my descriptions.

I have no idea of exactly where this blog is going to go, nor which topics I will choose in which order. Storytelling takes on a life of its own, and I am willing to go where this path takes me. And thoroughly enjoy the ride! I hope you enjoy it, too.