<p><it>Abstract—</it>Conventional multiprocessors mostly use centralized, memory-based barriers to synchronize concurrent processes created in multiple processors. These centralized barriers often become the bottleneck or hot spots in the shared memory. In this paper, we overcome the difficulty by presenting a distributed and hardwired barrier architecture, that is hierarchically constructed for fast synchronization in cluster-structured multiprocessors. The hierarchical architecture enables the scalability of cluster-structured multiprocessors. A special set of synchronization primitives is developed for explicit use of distributed barriers dynamically. To show the application of the hardwired barriers, we demonstrate how to synchronize Doall and Doacross loops using a limited number of hardwired barriers. Timing analysis shows an $<tmath>O(10^2)</tmath>$ to $<tmath>O(10^5)</tmath>$ reduction in synchronization overhead, compared with the use of software-controlled barriers implemented in a shared memory. The hardwired architecture is effective in implementing any partially ordered set of barriers or fuzzy barriers with extended synchronization regions. The versatility, scalability, programmability, and low overhead make the distributed barrier architecture attractive in constructing fine-grain, massively parallel MIMD systems using multiprocessor clusters with distributed shared memory.</p><p><it>Index Terms—</it>Barrier synchronization, distributed shared memory, Doacross loops, Doall loops, fuzzy barriers, parallel processing, partially ordered barriers, scalable multiprocessors, wired-NOR logic.</p>