The balance of computation and communication in any application or task is, of course, known to be a fundamental determinant of delivered performance. Such balance is a much-sought-after goal for the architects and designers of any system, large or small, from the smallest processor chip to the largest of interconnected hardware or software systems (such as the Web). Depending on the workload, and the level at which you analyze performance, information processing is either computation or communication bound.
The workload variability factor is easy to see, but let's take a look at the other factor: the measurement perspective. For example, from the perspective of a single machine node in a massive cluster of computers, the main CPU might frequently sit underutilized, waiting for data from the networked world to which it connects. At the compute node's level, then, the application appears mostly communication bound.
However, the overriding reason for this high communication latency might actually be low processing bandwidth or poor throughput at the network router level. The network processor(s) handling the communication traffic within the router hardware might be compute bound, and this could be the real performance bottleneck when viewed from the perspective of the whole (integrated) system. Indeed, in general, the workload might exhibit phased behavior: at one time the main compute node processor is the bottleneck; at another, the network processor hub is. And, of course, limitations in the link's communication latency and bandwidth can also be the main source of traffic congestion and performance shortfall, rather than the compute node or the communication node per se.
So, if the goal is to optimize overall system performance in a networked environment, the balance between computation and communication at each node (be it a true number-crunching compute node or a traffic routing communication node) and the intervening communication links requires careful architecting.
If the compute node idles because of poor communication latency, but there is spare communication bandwidth, then it makes sense to upgrade the compute node to support higher degrees of multithreading (either through multiple cores or the use of simultaneous multithreading, or both).
On the other hand, if available bandwidth is the constraint, reducing the number of threads in the compute node CPU while increasing single-thread performance is probably the way to go. How to maintain such rate matching, or balance, dynamically across variable application workloads, so as to preserve overall system efficiency in both performance and power, is an interesting research topic that many in this field are pursuing.
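The adaptation policy sketched in the last two paragraphs can be captured in a few lines of code. The following is a minimal illustrative sketch, not a real system's interface: the function name, the utilization metrics, and the 0.5 and 0.8 thresholds are all hypothetical choices made for the example.

```python
# Toy sketch of the dynamic rate-matching heuristic described above.
# All names and thresholds here are hypothetical illustrations.

def tune_compute_node(cpu_utilization, link_utilization,
                      active_threads, max_threads):
    """Return a (possibly adjusted) thread count for a compute node.

    cpu_utilization, link_utilization: fractions in [0, 1] sampled
    over a recent interval; active_threads: current thread count.
    """
    if cpu_utilization < 0.5 and link_utilization < 0.8:
        # Compute node idles while spare link bandwidth remains:
        # add threads to overlap more communication with computation.
        return min(active_threads * 2, max_threads)
    if link_utilization >= 0.8:
        # Link bandwidth is the constraint: run fewer, faster threads,
        # since extra threads would only queue behind the saturated link.
        return max(active_threads // 2, 1)
    return active_threads  # reasonably balanced: leave it alone
```

A real controller would, of course, have to damp oscillation across workload phases and weigh the power cost of each configuration, which is precisely what makes the research problem interesting.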
IEEE Micro devotes this issue to network processors, a key set of components that help engineers build the complex, interconnected computer systems that solve a variety of scientific, engineering, and commercial problems. Network processor design brings out many of the inherent power-performance trade-offs common in the world of general-purpose processors. Yet, because of their distinctive role in optimizing a system's communication needs (as opposed to its computational needs), these processors have certain unique requirements that make it interesting to focus on this segment of the microprocessor market. I hope you find these articles interesting and relevant to your needs. Enjoy!