Restructuring a Parallel Simulation to Improve Cache Behavior in a Shared-Memory Multiprocessor:

The Value of Distributed Synchronization

In Proc. 7th Workshop on Parallel and Distributed Simulation, San Diego, May 1993, pp. 159-162

David R. Cheriton, Hendrik A. Goosen*, Hugh Holbrook, and Philip Machanick**

Computer Science Department, Stanford University

Synchronization is a significant cost in many parallel programs, and can be a major bottleneck if it is handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-stepped simulation, the use of global synchronization primitives limits scalability, increases the sensitivity to load imbalance, and reduces the potential for exploiting locality to improve cache behavior.

This paper presents the results of an initial one-application study quantifying the costs and performance benefits of distributed, nearest-neighbor synchronization. The application studied, MP3D, is a particle-based wind tunnel simulation. Our results for this one application on current shared-memory multiprocessors show a significant decrease in synchronization time using these techniques. We prototyped an application-independent library that implements distributed synchronization. The library allows a variety of parallel simulations to exploit these techniques without increasing the application programming effort beyond that of conventional approaches.
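To make the contrast with a global barrier concrete, the following is a minimal Python sketch of nearest-neighbor synchronization in a time-stepped loop. It is an illustration only, not the paper's library: the function name `run_neighbor_sync`, the 1-D chain of workers, and the per-worker step counters are all assumptions made for this example. Each worker waits only until its adjacent workers have finished the previous step, so no worker ever blocks on a distant, slower one.

```python
import threading

def run_neighbor_sync(num_workers=4, num_steps=5):
    # Hypothetical helper for illustration; not an API from the paper.
    # done[i] counts the time steps worker i has completed.
    done = [0] * num_workers
    # One condition variable per worker, so waiting and signalling stay local.
    conds = [threading.Condition() for _ in range(num_workers)]
    # lead[i] records the largest step gap worker i saw against a neighbor.
    lead = [0] * num_workers

    def neighbors(i):
        # Assumed topology: a 1-D chain, so at most two neighbors.
        return [j for j in (i - 1, i + 1) if 0 <= j < num_workers]

    def worker(i):
        for step in range(num_steps):
            # Wait only for adjacent workers to finish the previous step.
            for j in neighbors(i):
                with conds[j]:
                    conds[j].wait_for(lambda j=j, step=step: done[j] >= step)
            # Placeholder: the real work on region i for this step goes here.
            with conds[i]:
                done[i] = step + 1
                conds[i].notify_all()
            for j in neighbors(i):
                lead[i] = max(lead[i], abs(done[i] - done[j]))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done, max(lead)
```

Calling `run_neighbor_sync(4, 5)` returns the final step counts and the largest gap observed between adjacent workers. Under this scheme adjacent workers can differ by at most one step, while non-adjacent workers may drift further apart, which is the slack a global barrier would remove.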

* On leave from the Computer Science Department, University of Cape Town.

** On leave from the Computer Science Department, University of the Witwatersrand.