Left Brain vs. Right Brain Programming?

Serial and parallel programming style.

Until the past decade or so, most software programming was serial. That is, the code was meant to execute one statement at a time. That was fine; serial code performance kept increasing as long as microprocessor clock rates kept increasing.

Microprocessor clock rates stopped increasing a few years ago, and so people turned to parallel programming to extract more performance out of their application programs. One central idea in parallel programming is that independent data and instruction streams may be divided up and executed simultaneously. Most of the underlying processors (CPUs, GPUs, and FPGAs) support this style of programming. What I find puzzling is that the code development environments do not seem (to me, at least) to fully support a parallel programming style.
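To make the "divide up independent streams" idea concrete, here is a minimal Python sketch: the same sum-of-squares is computed serially, and then again by splitting the data into four independent chunks handed to a worker pool. (This is purely illustrative; a thread pool demonstrates the decomposition, even though Python threads do not give true simultaneous number-crunching.)

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """One independent stream of work: sum the squares of a slice of the data."""
    return sum(x * x for x in chunk)

data = list(range(1_000))
chunks = [data[i::4] for i in range(4)]  # four independent data streams

# Serial style: one pass, one statement at a time.
serial_result = partial_sum(data)

# Parallel style: the independent chunks may be executed simultaneously,
# and the partial answers combined at the end.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_result = sum(pool.map(partial_sum, chunks))

assert serial_result == parallel_result
```

The key point is not the thread pool itself but the shape of the program: the work is expressed as independent pieces plus a combining step, which is exactly what parallel hardware can exploit.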

I’m speaking from personal experience here. I know I absolutely have to draw out my parallel code on a large piece of paper, and only then can I implement pieces of the code in a traditional text-based development environment. Much to my surprise, I’ve seen papers in cognitive neuroscience concluding that our brains use the speech center to work things out in a serial fashion and the visual center to work things out in a parallel fashion. It sure seems like a useful parallel programming environment would fully support coding using visualization instead of coding using text characters from a computer language. To be fair, this may be confirmation bias on my part. Or perhaps a programming environment using only imagery is a bit too far off in the future. I do not know, as I tend to have many more questions than answers. And I’d love to be able to create parallel programs from pictures.


Starting With a Blank Page

What if you could make your own processor from scratch?

In The Heart of the Beast, I mentioned that microprocessors have a fixed number of functional units used to execute the instructions in your application program. For example, whether your application needs zero or forty math units, you get a fixed number of math units (typically four) in a microprocessor. If your program could really use forty math units, tough. Only four may be used at a time, and so sharing these four math units becomes a performance bottleneck. What if you could create your own processor, crafted to exactly match your program’s ideal requirements?
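The sharing bottleneck is easy to see with a back-of-envelope model. Assuming every operation is independent and takes one cycle (a deliberate simplification), the cycle count is just the operation count divided by the number of units, rounded up:

```python
import math

def cycles_needed(independent_ops: int, functional_units: int) -> int:
    """Cycles to execute `independent_ops` when only `functional_units`
    of them can run per cycle. Toy model: all ops independent, 1 cycle each."""
    return math.ceil(independent_ops / functional_units)

# A microprocessor with 4 math units must share them across the work:
print(cycles_needed(40, 4))   # 10 cycles
# A custom processor with 40 math units finishes in a single pass:
print(cycles_needed(40, 40))  # 1 cycle
```

Real processors pipeline and overlap work in ways this toy model ignores, but the ratio captures why a fixed, small set of functional units becomes the bottleneck for a highly parallel program.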

You can indeed define and create your own processor… if you have a million or so US dollars, a hardware engineer or five to design the thing for you, and the patience to wait six to twelve months for the first devices to roll off the fabrication lines. This situation is not ideal, but it works well if you have a large enough business case to support the effort. These things are called ASICs, Application Specific Integrated Circuits. You could say a microprocessor is a general purpose, programmable ASIC.

There exists another custom processor solution these days. Cue the entrance of a device called an FPGA (which stands for Field Programmable Gate Array, a name that tells you nothing about what it does). FPGAs have been around for a very long time, but until recently only hardware engineers could use them. To be fair, until recently these FPGAs were not large enough to be interesting to application software developers.

What are these FPGAs to software people? They are essentially a blank piece of paper that can be configured and re-configured with “hardware” created using software languages instead of hardware languages. The good news is that you can create a set of functional units exactly tuned to your program’s needs. The bad news is that you can create a set of functional units exactly tuned to your program’s needs. Seriously. You have the complete freedom to create whatever kind of processing you want, so the problem then becomes “what do you want?” That is a hard problem these days, as programmers are used to dealing with a fixed number of functional units in a microprocessor, and complete freedom to create whatever you like is sometimes unnerving. Fortunately, there are two major concepts that make life simpler and reasonably sane for the application programmer.

The first is the concept of data flow. Instead of microprocessor program instructions acting upon data and shuttling data around, imagine your data simply flows simultaneously through all of the functional blocks you create. Are you starting to see why these FPGA devices might be useful for application performance? Not only is your FPGA-based custom processor 100% dedicated to your application, you’re getting a tremendous amount of parallel (simultaneous) processing by not needing to schedule execution around a limited set of functional units.
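A software sketch of the data flow idea, using Python generators as stand-in "functional blocks" (the block names here are my own invention): each stage transforms samples as they stream through, and the program is just the wiring between stages. On an FPGA the stages would all be physical hardware running truly concurrently; in this sketch they merely pass samples along one at a time.

```python
def scale(stream, factor):
    """Functional block: multiply each sample by a constant."""
    for x in stream:
        yield x * factor

def offset(stream, amount):
    """Functional block: add a constant to each sample."""
    for x in stream:
        yield x + amount

def clamp(stream, lo, hi):
    """Functional block: saturate each sample into [lo, hi]."""
    for x in stream:
        yield max(lo, min(hi, x))

# Wire the blocks together; data flows through the whole chain
# with no central instruction scheduler deciding what runs when.
samples = range(5)
pipeline = clamp(offset(scale(samples, 3), 1), 0, 10)
print(list(pipeline))  # [1, 4, 7, 10, 10]
```

Notice that the "program" is a description of structure, not a sequence of instructions, which is exactly the mental shift data flow asks for.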

The second concept is the astonishing power of modern compilers. A compiler is software that turns software programming language statements into something useful in the computer world. In the case of an FPGA, a compiler turns a program into hardware functional units which are then laid down (“programmed”) upon the blank page FPGA.

Why bother with an FPGA? Well, microprocessor performance has hit a brick wall. The microprocessor clock rate (instruction execution rate) cannot be cranked up any higher without the device melting down from waste heat. FPGA-based applications, on the other hand, run at roughly one-tenth the clock rate of a microprocessor, which leads to much less waste heat and much less power consumption. Even at one-tenth the clock rate, applications on an FPGA may run five times to a hundred-plus times faster than the equivalent microprocessor application. Ok, you knew the bad news was coming. Due to numerous factors, not all applications on an FPGA will outperform their microprocessor equivalent. I may go into some of these factors in a later post. In the meantime, remember that an FPGA is sometimes an application accelerator and sometimes an application decelerator.
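The accelerator-or-decelerator tradeoff comes down to simple arithmetic. As a rough (and deliberately oversimplified) model, compare against a microprocessor issuing one operation per cycle: an FPGA clocked at one-tenth the rate wins only if it performs more than ten operations every cycle. Ignoring memory bandwidth, pipelining, and superscalar CPU issue, the back-of-envelope looks like this:

```python
def fpga_speedup(ops_per_cycle: int, clock_ratio: float = 0.1) -> float:
    """Rough speedup vs. a microprocessor doing 1 op per cycle,
    with the FPGA clocked at `clock_ratio` of the CPU's rate.
    Toy model only: ignores memory, pipelining, and CPU parallelism."""
    return ops_per_cycle * clock_ratio

print(fpga_speedup(50))    # 5.0   -> accelerator
print(fpga_speedup(1000))  # 100.0 -> big accelerator
print(fpga_speedup(5))     # 0.5   -> decelerator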