Several techniques have been used to reduce the performance impact of process synchronization in fine-grained multiprocessor systems. These existing techniques tend to have long synchronization times or high shared-bus use, or they require complex and expensive hardware. A new technique is presented that uses distributed hardware locking queues to reduce both contention and latency to the minimum values that can be obtained using a shared bus. This technique is shown to require at most two shared-bus transactions, with one transaction being typical. The latency for process continuation after obtaining a lock is reduced to near zero. Barrier synchronization using this distributed mechanism requires only one shared-bus transaction per processor involved in the barrier. This new technique is scalable, is applicable to both new architectures and existing systems, and is less complex than other hardware solutions.
|Original language||English (US)|
|Journal||Proceedings of the International Conference on Parallel Processing|
|State||Published - 1994|
|Event||23rd International Conference on Parallel Processing, ICPP 1994 - Raleigh, NC, United States|
Duration: Aug 15 1994 → Aug 19 1994
- Hardware Barrier