In large multiprocessor systems, fast synchronisation is crucial for high performance. However, synchronisation traffic tends to create "hot-spots" in shared memory and cause network congestion. Multistage combining networks have been proposed and built to handle synchronisation traffic. Software combining schemes have also been proposed to relieve network congestion caused by hot-spots. However, multistage combining networks are very expensive and software combining is very slow. In this paper, we propose a single-stage combining network that handles synchronisation traffic separately from the regular memory traffic. A single-stage combining network has several advantages: (1) It is very cost-effective because only one stage is needed (instead of log N stages). (2) Only one network is needed to handle both forward and returning requests. (3) All synchronisation traffic is confined to a single stage; hence, packets have more opportunities to combine. (4) Combined requests are distributed evenly through the network, so the wait buffer size is reduced. (5) Fast-finishing algorithms can be used to shorten the network delay. We show that, because of these advantages, a single-stage combining network gives good performance at a lower cost than a multistage combining network.
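As a rough illustration of what "combining" means here, the sketch below merges two requests aimed at the same hot-spot address into one and later splits the single reply using a wait-buffer entry. It assumes a fetch-and-add primitive, which the abstract does not specify; the names `FetchAdd`, `combine`, and `decombine` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class FetchAdd:
    source: int     # id of the requesting processor (hypothetical field)
    address: int    # shared-memory address of the synchronisation variable
    increment: int  # amount to add to the variable


def combine(first, second, wait_buffer):
    """Merge two fetch-and-add requests to the same address into one."""
    assert first.address == second.address
    # Record what is needed to split the reply on the return path.
    wait_buffer.append((first.source, second.source, first.increment))
    return FetchAdd(first.source, first.address,
                    first.increment + second.increment)


def decombine(old_value, wait_buffer):
    """Turn the single memory reply into the two replies the requesters expect."""
    first_src, second_src, first_inc = wait_buffer.pop()
    # The result is as if `first` had been served immediately before `second`.
    return {first_src: old_value, second_src: old_value + first_inc}


# Two processors hit the same hot-spot address; memory sees one request.
memory = {0x40: 10}
wait_buffer = []
merged = combine(FetchAdd(0, 0x40, 1), FetchAdd(1, 0x40, 2), wait_buffer)
old = memory[merged.address]
memory[merged.address] += merged.increment
replies = decombine(old, wait_buffer)
```

Each combined pair occupies one wait-buffer entry until its reply returns, which is why the distribution of combined requests across the network affects the buffer size required.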