Abstract | Interconnection networks of modern distributed computer systems are hierarchical. In such systems, the communication time between processors depends on their placement in the system. In large-scale NUMA/SMP computer clusters, the first two levels are formed by the network switches of a two-stage fat-tree or dragonfly topology, and the third level is the shared memory of the compute nodes. In this paper, we describe a dynamic optimization method for all-to-all collective communication operations on hierarchical computer clusters. Our approach exploits knowledge of the L-level hierarchy of a computer system and is based on mapping intensively communicating processes onto the same compute nodes. We propose optimized versions of the Bruck and recursive doubling algorithms for the MPI_Allgather operation. The algorithms are implemented in our experimental library TopoMPI. Performance results on multicore SMP computer clusters with InfiniBand and Gigabit Ethernet networks indicate that the Allgather algorithms based on our approach outperform the original Bruck and recursive doubling algorithms.
|
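To make the mapping idea concrete, the following minimal sketch counts how many data blocks cross node boundaries during a textbook recursive doubling Allgather under two process placements. It is an illustration under stated assumptions, not the TopoMPI implementation: the function names (`recursive_doubling_traffic`, `block_mapping`, `strided_mapping`) are hypothetical, the number of processes is assumed to be a power of two, and only a two-level hierarchy (node versus network) is modeled.

```python
def recursive_doubling_traffic(node_of, block_size=1):
    """Count data volume sent within nodes and between nodes.

    node_of[r] is the node hosting rank r (hypothetical placement table).
    At the step with distance d, rank r exchanges with rank r XOR d and
    sends the d blocks it has gathered so far.
    Returns (intra_node_volume, inter_node_volume).
    """
    p = len(node_of)
    # Classic recursive doubling assumes a power-of-two process count.
    assert p > 0 and p & (p - 1) == 0, "process count must be a power of two"
    intra = inter = 0
    dist = 1
    while dist < p:
        for rank in range(p):
            peer = rank ^ dist
            volume = dist * block_size  # message size doubles every step
            if node_of[rank] == node_of[peer]:
                intra += volume
            else:
                inter += volume
        dist *= 2
    return intra, inter


def block_mapping(p, cores_per_node):
    """Consecutive ranks share a node: ranks 0..c-1 on node 0, and so on."""
    return [rank // cores_per_node for rank in range(p)]


def strided_mapping(p, cores_per_node):
    """Ranks are dealt round-robin over the nodes, so the partners of the
    late, large-message steps end up on the same node."""
    nodes = p // cores_per_node
    return [rank % nodes for rank in range(p)]


if __name__ == "__main__":
    p, cores = 8, 4  # assumed example: 2 nodes with 4 cores each
    print("block  :", recursive_doubling_traffic(block_mapping(p, cores)))
    print("strided:", recursive_doubling_traffic(strided_mapping(p, cores)))
```

In this toy configuration the total volume is the same for both placements (56 blocks), but the block placement sends 32 of them across the network while the strided placement sends only 8, because the step with the largest messages stays inside shared memory. This is the effect that mapping intensively communicating processes onto the same node aims for; the actual method described here handles an L-level hierarchy and also covers the Bruck algorithm.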