High performance sparse Cholesky factorization algorithm for scalable parallel computers

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Scopus citations

Abstract

This paper presents a new parallel algorithm for sparse matrix factorization. This algorithm uses subforest-to-subcube mapping instead of the subtree-to-subcube mapping of another recently introduced scheme by Gupta and Kumar. Asymptotically, both formulations are equally scalable on a wide range of architectures and a wide variety of problems. But the subtree-to-subcube mapping of the earlier formulation causes significant load imbalance among processors, limiting overall efficiency and speedup. The new mapping largely eliminates the load imbalance among processors. Furthermore, the algorithm has a number of enhancements to improve the overall performance substantially. This new algorithm achieves up to 20 GFlops on a 1024-processor Cray T3D for moderately large problems. To our knowledge, this is the highest performance ever obtained on an MPP for sparse Cholesky factorization.

Original languageEnglish (US)
Title of host publicationFrontiers of Massively Parallel Computation - Conference Proceedings
PublisherIEEE
Pages140-147
Number of pages8
StatePublished - Jan 1 1995
EventProceedings of the 5th Symposium on the Frontiers of Massively Parallel Computation - McLean, VA, USA
Duration: Feb 6 1995Feb 9 1995

Other

OtherProceedings of the 5th Symposium on the Frontiers of Massively Parallel Computation
CityMcLean, VA, USA
Period2/6/952/9/95

Fingerprint

Dive into the research topics of 'High performance sparse Cholesky factorization algorithm for scalable parallel computers'. Together they form a unique fingerprint.

Cite this