This work was a PhD thesis completed in 1996. I have decided to publish it in a more accessible form because many of the tricks and techniques I developed in the early 1990s for programming high-end systems now apply to mass-market computers, thanks to the widespread adoption of multicore designs.
This book should be of interest to anyone wanting to develop efficient code on any shared-memory multiprocessor. That includes multicore devices in common use, such as those in desktop computers, and a wide range of parts aimed at the embedded market. Since it is a PhD thesis, a sound background in architecture and, to a lesser extent, in algorithms and data structures will be helpful as a starting point in reading this book.
What are the main lessons? The memory hierarchy is critical to achieving good performance. The examples I studied mostly required optimizing interaction with the cache system, but the paged memory translation system can also have a significant impact on overall performance. These lessons apply to uniprocessor systems as well. Multiprocessor systems must, in addition, be designed to take into account interprocessor communication and synchronization, both of which can have serious performance impacts.
This work was done when C++ was still a new language, so some features such as templates and exceptions are not explored, since they were not fully implemented at the time. For this reason, you will not find the Standard Template Library (STL) mentioned. Nonetheless, the stripping away of complexity relative to what you could do today is helpful in understanding the principles.
Please excuse formatting glitches: I've priced it accordingly.