In The Heart of the Beast, I mentioned that microprocessors have a fixed number of functional units used to execute the instructions in your application program. For example, whether your application needs zero or forty math units, you get a fixed number of math units (typically four) in a microprocessor. If your program could really use forty math units, tough. Only four may be used at a time, and so sharing these four math units becomes a performance bottleneck. What if you could create your own processor, crafted to exactly match your program’s ideal requirements?
You can indeed define and create your own processor… if you have a million or so US dollars, can pay a hardware engineer or five to design the thing for you, and have the patience to wait six to twelve months for the first devices to roll off the fabrication lines. This situation is not ideal, but it works well if you have a strong enough business case to support the effort. These things are called ASICs, or Application-Specific Integrated Circuits. You could say a microprocessor is a general-purpose, programmable ASIC.
There exists another custom processor solution these days. Cue the entrance of a device called an FPGA. (Which stands for Field Programmable Gate Array, and tells you nothing about what it does.) FPGAs have been around since the mid-1980s, but until recently only hardware engineers could use them. To be fair, until recently these FPGAs were not large enough to be interesting to application software developers.
What are these FPGAs to software people? They are essentially a blank piece of paper that can be configured and re-configured with “hardware” created using software languages instead of hardware languages. The good news is that you can create a set of functional units exactly tuned to your program’s needs. The bad news is that you can create a set of functional units exactly tuned to your program’s needs. Seriously. You have the complete freedom to create whatever kind of processing you want, so the problem then becomes “what do you want?” That is a bit of a problem to solve these days, as programmers are used to dealing with a fixed number of functional units in a microprocessor. Complete freedom to create whatever you like is sometimes unnerving. At the same time, there are two major concepts that make life simpler and reasonably sane for the application programmer.
The first is the concept of data flow. Instead of microprocessor program instructions acting upon data and shuttling data around, imagine your data simply flows simultaneously through all of the functional blocks you create. Are you starting to see why these FPGA devices might be useful for application performance? Not only is your FPGA-based custom processor 100% dedicated to your application, you’re getting a tremendous amount of parallel (simultaneous) processing by not needing to schedule execution around a limited set of functional units.
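To put rough numbers on the data flow idea, here is a toy cycle-count model in Python. It is a sketch under loud assumptions, not a description of any real device: every operation takes exactly one clock cycle, data items are independent of each other, and the function names (cpu_cycles, dataflow_cycles) are made up for this illustration. The microprocessor must schedule every operation onto a small shared pool of math units, while the data flow design gives each operation its own functional block and streams the data through them like an assembly line.

```python
from math import ceil


def cpu_cycles(num_items: int, ops_per_item: int, math_units: int = 4) -> int:
    """Cycles needed when all operations share a fixed pool of math units.

    Assumption: each operation takes one cycle and the units are always busy.
    """
    total_ops = num_items * ops_per_item
    return ceil(total_ops / math_units)


def dataflow_cycles(num_items: int, ops_per_item: int) -> int:
    """Cycles needed when every operation has its own functional block.

    Assumption: the blocks form a pipeline one stage per operation, so the
    first item takes ops_per_item cycles to emerge and each later item
    follows one cycle behind it.
    """
    if num_items == 0:
        return 0
    return ops_per_item + (num_items - 1)


# 1000 data items, each needing 40 math operations
print(cpu_cycles(1000, 40))       # 10000 cycles on four shared math units
print(dataflow_cycles(1000, 40))  # 1039 cycles through forty dedicated blocks
```

In this toy model the data flow version needs about one-tenth as many cycles, simply because nothing waits in line for a math unit. Real designs have memory bandwidth and other limits that this sketch ignores.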
The second concept is the astonishing power of modern compilers. A compiler is software that turns software programming language statements into something the computer can actually run. In the case of an FPGA, a compiler turns a program into hardware functional units which are then laid down (“programmed”) upon the blank-page FPGA.
Why bother with an FPGA? Well, microprocessor performance has hit a brick wall. The microprocessor clock rate (instruction execution rate) cannot be cranked up any higher without the device melting down from waste heat. On the other hand, FPGA-based applications typically run at around one-tenth the clock rate of a microprocessor, which leads to much less waste heat and much less power consumption. Even at one-tenth the clock rate, applications on an FPGA may run five times to a hundred-plus times faster than the equivalent microprocessor application. OK, you knew the bad news was coming. Due to numerous factors, not all applications on an FPGA will outperform their microprocessor equivalent. I may go into some of these factors in a later post. In the meantime, remember that sometimes an FPGA is an application accelerator, and sometimes it is an application decelerator.
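The accelerator-or-decelerator question comes down to simple arithmetic: work done per clock cycle multiplied by clock rate. The Python sketch below makes that explicit. The clock rates (3000 MHz and 300 MHz), the four operations per cycle for the microprocessor, and the function names are illustrative assumptions chosen to match the one-tenth ratio above, not measurements of any particular chip.

```python
def relative_speed(
    fpga_ops_per_cycle: int,
    fpga_clock_mhz: int = 300,    # assumed: one-tenth of the CPU clock
    cpu_ops_per_cycle: int = 4,   # assumed: four math units, all kept busy
    cpu_clock_mhz: int = 3000,    # assumed CPU clock rate
) -> float:
    """FPGA throughput divided by CPU throughput (above 1.0 means faster)."""
    fpga_throughput = fpga_ops_per_cycle * fpga_clock_mhz
    cpu_throughput = cpu_ops_per_cycle * cpu_clock_mhz
    return fpga_throughput / cpu_throughput


def verdict(speed: float) -> str:
    """Label a relative speed the way the text does."""
    return "accelerator" if speed > 1.0 else "decelerator"


# A design that keeps 200 functional units busy every cycle
print(relative_speed(200), verdict(relative_speed(200)))  # 5.0 accelerator

# A design that only manages 2 operations per cycle
print(relative_speed(2), verdict(relative_speed(2)))      # 0.05 decelerator
```

Under these assumptions the FPGA has to sustain more than forty operations per cycle just to break even, which is one way to see why an application with little parallelism can end up slower.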