Serial and parallel programming style.
Until the past decade or so, most software programming was serial. That is, the code was meant to execute one statement at a time. That was fine; serial code performance kept increasing as long as microprocessor clock rates kept increasing.
Microprocessor clock rates stopped increasing a few years ago, and so people turned to parallel programming to extract more performance out of their application programs. One central idea in parallel programming is that independent data and instruction streams may be divided up and executed simultaneously. Many of the underlying computer processors (CPUs, GPUs, and FPGAs) support this style of programming. What I find puzzling is that the code development environments do not seem (to me, at least) to fully support a parallel programming style.
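That central idea, dividing independent data among workers that run simultaneously, can be sketched in a few lines of Python. This is just an illustration, not any particular library's API; the names `square` and `parallel_map` are mine:

```python
# A minimal sketch of the core parallel-programming idea: independent
# chunks of data are handed to separate workers that run simultaneously.
# Here the "workers" are OS processes from Python's standard library.
from multiprocessing import Pool

def square(x):
    # Each call is independent: no worker needs any other worker's result.
    return x * x

def parallel_map(func, data, workers=4):
    # Divide the data among worker processes and apply func to each piece.
    # pool.map returns the results in the original order.
    with Pool(processes=workers) as pool:
        return pool.map(func, data)

if __name__ == "__main__":
    print(parallel_map(square, range(8)))
```

Because each `square(x)` call touches only its own data, the runtime is free to execute the calls in any order, or all at once, and the answer comes out the same as the serial version.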
I’m speaking from personal experience here. I know I absolutely have to draw out my parallel code on a large piece of paper, and only then can I implement pieces of the code in a traditional text-based development environment. Much to my surprise, I’ve seen papers in cognitive neuroscience concluding that our human brains use the speech center to work things out in a serial fashion and the visual center to work things out in a parallel fashion. It sure seems like a useful parallel programming environment would fully support coding using visualization instead of coding using text characters from a computer language. To be fair, this may be confirmation bias on my part. Or perhaps a programming environment using only imagery is a bit too far off in the future. I do not know, as I tend to have many more questions than answers. And I’d love to be able to create parallel programs from pictures.
Re-purposing GPUs to solve a certain class of compute problems.
Seymour Cray once said, “If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?” Seymour was comparing a couple of powerful supercomputer processors to a large cluster of networked microprocessors for computation. Since Seymour was the father of supercomputing, you can guess which he would choose. I’d answer him, “it depends.”
So far in this blog, I’ve talked about things with which I have fairly extensive experience. I’m going to go out on a limb here and talk about something I’ve never actually programmed (yet): using GPUs as compute processors. Today, these devices are certainly used as processors in the same sense that microprocessors, FPGAs, and embedded microcontrollers are used as processors. I did not want to leave GPUs out of the discussion, even though I have only a theoretical understanding of them.
GPU stands for “Graphics Processing Unit.” For many years this device was the thing in your computer that created the images on your computer screen. It started life as an application-specific device to assist a microprocessor with displaying graphics. You see, once upon a time, a computer display was text-only. No windows, graphics, images, mouse pointer, nothing. Text. Humble beginnings, no? A microprocessor acting alone could not do its usual general-purpose work and do graphical work at the same time. So GPUs were created to offload the graphics work from the microprocessor, which is why your computer desktop has images and icons and windows and whatnot.
When microprocessor performance hit a brick wall a few years ago, some clever people realized the GPU could be used to solve a certain class of processing problems. A single GPU consists of thousands of moderately powerful graphics instruction processors, each executing the same set of program instructions on different sets of data. Look at your computer screen and imagine it is divided into many independent regions of graphics data. All of those independent regions have common image processing instructions performed on them, like shading, blending, interpolation, and the like. The data is different; the instructions are the same. Some application programs share these characteristics: they have to do the same thing to an enormous number of independent chunks of data. That is massively parallel execution, as long as each moderately powerful graphics instruction processor in a GPU is executing the same program on a different independent set of data. So maybe 1024 chickens are better than two strong oxen for certain types of problems. Not surprising, really, as a hammer is great for certain problems while a socket wrench is great for other types of problems. This is why we have microprocessors, FPGAs, embedded microcontrollers, and GPUs used as processors. It always depends on the problem you’re trying to solve.
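The “same instructions, different data” pattern above can be sketched in plain Python. This is only a toy model of how a GPU kernel is organized, not real GPU code; `shade_kernel` and `run_kernel` are names I made up for the illustration:

```python
# Toy model of the GPU execution style: one small program (a "kernel")
# is applied uniformly to many independent pieces of data. A real GPU
# would launch thousands of these at once; this loop runs them serially
# but shows the same structure.

def shade_kernel(pixel):
    # The same instructions run for every pixel; only the data differs.
    # Here the "shading" is simply halving each color channel to darken it.
    r, g, b = pixel
    return (r // 2, g // 2, b // 2)

def run_kernel(kernel, pixels):
    # Each pixel is independent, so nothing stops these calls from
    # happening simultaneously on parallel hardware.
    return [kernel(p) for p in pixels]

tile = [(200, 100, 50), (10, 20, 30)]
print(run_kernel(shade_kernel, tile))
```

The key property is that `shade_kernel` never looks at a neighboring pixel, so the work divides cleanly across as many “chickens” as you have.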