Multiprocessing: Functions and Features

Performance Traps

Using two processors for a multithreaded application does not mean that the application will run at double speed. In his 1967 law, Gene Amdahl described how the performance gain from multiple processors is limited by the portions of a program that cannot be parallelized. According to Amdahl, a multithreaded application therefore never scales linearly with the number of processors involved. Further losses come from limited system resources such as the bandwidth of the shared main memory in SMP systems.
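To make this limit concrete, the following Python sketch evaluates Amdahl's formula, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the program and n the number of processors. The function name and the assumed 90 percent parallel fraction are illustrative, not taken from the text above.

    # Amdahl's law: the serial fraction of a program caps the attainable speedup.
    def amdahl_speedup(parallel_fraction, processors):
        # speedup = 1 / ((1 - p) + p / n)
        serial_fraction = 1.0 - parallel_fraction
        return 1.0 / (serial_fraction + parallel_fraction / processors)

    if __name__ == "__main__":
        # Even if 90 percent of the work parallelizes, two CPUs give only ~1.8x.
        for cpus in (1, 2, 4, 8):
            print(f"{cpus} CPU(s): speedup = {amdahl_speedup(0.9, cpus):.2f}x")

With p = 0.9, two processors yield a speedup of roughly 1.82, noticeably short of the doubling one might naively expect.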

Programs in which synchronizing threads and data costs more effort than parallel processing saves are not suited for multithreading. Such software runs more slowly than a single threaded version, even if more CPUs are used. In contrast, multithreading is particularly simple and efficient if large amounts of data can be split into independent segments. An example is image editing with complex filters: the image is divided into several regions, and each processor works on a different region (see the sketch below). Similar scenarios are found in numerous scientific and engineering applications (structural analysis, field calculations, fluid mechanics etc.) that work with the finite element method. In these cases, individual areas of the mesh to be calculated are assigned to each processor. Once the individual calculations are complete, the partial results are combined according to the rules of the framework.
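As a rough illustration of the image-filter scenario, the following Python sketch splits an image (represented here simply as a list of pixel rows) into horizontal strips and hands each strip to its own worker process. The filter, the strip layout, and the function names are illustrative assumptions, not taken from the text.

    # Minimal sketch: split an image into horizontal strips and filter each strip
    # in a separate process. The "filter" here just inverts every pixel value.
    from multiprocessing import Pool

    def apply_filter(strip):
        # Stand-in for a complex, CPU-bound filter working on one region.
        return [[255 - value for value in row] for row in strip]

    def filter_image(image, workers=2):
        # Divide the rows into one independent strip per worker.
        strip_height = (len(image) + workers - 1) // workers
        strips = [image[i:i + strip_height] for i in range(0, len(image), strip_height)]
        with Pool(processes=workers) as pool:
            filtered = pool.map(apply_filter, strips)
        # Recombine the partial results in their original order.
        return [row for strip in filtered for row in strip]

    if __name__ == "__main__":
        image = [[(x + y) % 256 for x in range(8)] for y in range(8)]
        print(filter_image(image)[0])

Because each strip is processed without reference to the others, no synchronization is needed until the partial results are merged at the end, which is exactly what makes such workloads scale well.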

In theory, even single threaded applications should run faster on a multiprocessor system: the operating system assigns the application to one CPU, while the second CPU handles the operating system's own overhead. This assumption seems to be confirmed when the CPU usage of both processors is viewed in the system monitor.

However, when the performance of a single threaded application is actually measured, it is often lower on a multiprocessor system than on a PC with just one processor. The same applies to benchmark results delivered, for example, by the BAPCo suite, which contains numerous single threaded applications. The explanation: the two processors are in constant communication and continuously keep their cache contents synchronized, and the overhead of this synchronization slows down the application involved.
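One mitigation worth mentioning as an assumption of my own, not something claimed above, is to pin a single threaded process to one CPU so the scheduler does not migrate it between processors and its working set stays in one processor's cache. Whether this helps depends on the workload; on Linux the idea can be sketched as follows.

    # Linux-only sketch: restrict the calling process to a single CPU.
    import os

    def pin_to_cpu(cpu=0):
        os.sched_setaffinity(0, {cpu})   # pid 0 means "the calling process"
        return os.sched_getaffinity(0)   # report the CPUs we may now run on

    if __name__ == "__main__":
        print("Allowed CPUs:", pin_to_cpu(0))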