High performance sparse Cholesky factorization algorithm for scalable parallel computers

George Karypis; Vipin Kumar

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

This paper presents a new parallel algorithm for sparse matrix factorization. This algorithm uses subforest-to-subcube mapping instead of the subtree-to-subcube mapping of another recently introduced scheme by Gupta and Kumar. Asymptotically, both formulations are equally scalable on a wide range of architectures and a wide variety of problems. But the subtree-to-subcube mapping of the earlier formulation causes significant load imbalance among processors, limiting overall efficiency and speedup. The new mapping largely eliminates the load imbalance among processors. Furthermore, the algorithm has a number of enhancements to improve the overall performance substantially. This new algorithm achieves up to 20 GFlops on a 1024-processor Cray T3D for moderately large problems. To our knowledge, this is the highest performance ever obtained on an MPP for sparse Cholesky factorization.
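The load-imbalance argument in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the `Node` class, the greedy two-way split, the tolerance `tol`, and the cost model (work on a shared node divides evenly over its processors, communication is ignored, each node has at most two children) are all assumptions made for illustration. It compares the idealized completion time when each child subtree is pinned to half of the processors against a forest-based split that keeps expanding expandable trees until the two halves carry similar work.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    work: float                                   # hypothetical cost of this tree node
    children: list = field(default_factory=list)  # at most two children assumed


def total(node):
    """Work in the whole subtree rooted at node."""
    return node.work + sum(total(c) for c in node.children)


def split(trees):
    """Greedy two-way split: heaviest tree first, always into the lighter bin."""
    bins, loads = ([], []), [0.0, 0.0]
    for t in sorted(trees, key=total, reverse=True):
        i = 0 if loads[0] <= loads[1] else 1
        bins[i].append(t)
        loads[i] += total(t)
    return bins, loads


def subtree_time(node, p):
    """Subtree-style mapping: node shared by p processors, each child gets p // 2."""
    if p == 1:
        return total(node)
    q = p if len(node.children) == 1 else p // 2
    below = max((subtree_time(c, q) for c in node.children), default=0.0)
    return below + node.work / p


def subforest_time(trees, p, tol=0.1):
    """Subforest-style mapping: expand heavy trees until the halves balance."""
    if p == 1:
        return sum(total(t) for t in trees)
    trees, shared = list(trees), 0.0
    while True:
        (a, b), loads = split(trees)
        balanced = abs(loads[0] - loads[1]) <= tol * (sum(loads) or 1.0)
        expandable = [t for t in trees if t.children]
        if balanced or not expandable:
            break
        heavy = max(expandable, key=total)
        trees.remove(heavy)            # its root is handled by all p processors
        shared += heavy.work
        trees.extend(heavy.children)   # its children join the forest
    halves = max(subforest_time(a, p // 2, tol), subforest_time(b, p // 2, tol))
    return halves + shared / p


# A deliberately lopsided toy tree: the left subtree holds three times the work.
left = Node(2, [Node(5), Node(5)])
right = Node(2, [Node(1), Node(1)])
root = Node(2, [left, right])

print("total work / 2 processors:", total(root) / 2)
print("subtree-style mapping:    ", subtree_time(root, 2))
print("subforest-style mapping:  ", subforest_time([root], 2))
```

On this toy tree the subtree-style mapping is bounded by the heavier child, while the forest-based split reaches the ideal of half the total work. Real elimination trees, communication costs, and the paper's actual partitioning heuristics are outside what this sketch models.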

Original language	English (US)
Title of host publication	Frontiers of Massively Parallel Computation - Conference Proceedings
Publisher	IEEE
Pages	140-147
Number of pages	8
State	Published - Jan 1 1995
Event	Proceedings of the 5th Symposium on the Frontiers of Massively Parallel Computation - McLean, VA, USA Duration: Feb 6 1995 → Feb 9 1995

Cite this

@inproceedings{9dbfc37e1f8840c2bfb9eea7f6028e78,
  title = "High performance sparse Cholesky factorization algorithm for scalable parallel computers",
  abstract = "This paper presents a new parallel algorithm for sparse matrix factorization. This algorithm uses subforest-to-subcube mapping instead of the subtree-to-subcube mapping of another recently introduced scheme by Gupta and Kumar. Asymptotically, both formulations are equally scalable on a wide range of architectures and a wide variety of problems. But the subtree-to-subcube mapping of the earlier formulation causes significant load imbalance among processors, limiting overall efficiency and speedup. The new mapping largely eliminates the load imbalance among processors. Furthermore, the algorithm has a number of enhancements to improve the overall performance substantially. This new algorithm achieves up to 20 GFlops on a 1024-processor Cray T3D for moderately large problems. To our knowledge, this is the highest performance ever obtained on an MPP for sparse Cholesky factorization.",
  author = "George Karypis and Vipin Kumar",
  year = "1995",
  month = jan,
  day = "1",
  language = "English (US)",
  pages = "140--147",
  booktitle = "Frontiers of Massively Parallel Computation - Conference Proceedings",
  publisher = "IEEE",
  note = "Proceedings of the 5th Symposium on the Frontiers of Massively Parallel Computation; Conference date: 06-02-1995 Through 09-02-1995",
}

TY  - GEN
T1  - High performance sparse Cholesky factorization algorithm for scalable parallel computers
AU  - Karypis, George
AU  - Kumar, Vipin
PY  - 1995/1/1
Y1  - 1995/1/1
N2  - This paper presents a new parallel algorithm for sparse matrix factorization. This algorithm uses subforest-to-subcube mapping instead of the subtree-to-subcube mapping of another recently introduced scheme by Gupta and Kumar. Asymptotically, both formulations are equally scalable on a wide range of architectures and a wide variety of problems. But the subtree-to-subcube mapping of the earlier formulation causes significant load imbalance among processors, limiting overall efficiency and speedup. The new mapping largely eliminates the load imbalance among processors. Furthermore, the algorithm has a number of enhancements to improve the overall performance substantially. This new algorithm achieves up to 20 GFlops on a 1024-processor Cray T3D for moderately large problems. To our knowledge, this is the highest performance ever obtained on an MPP for sparse Cholesky factorization.
AB  - This paper presents a new parallel algorithm for sparse matrix factorization. This algorithm uses subforest-to-subcube mapping instead of the subtree-to-subcube mapping of another recently introduced scheme by Gupta and Kumar. Asymptotically, both formulations are equally scalable on a wide range of architectures and a wide variety of problems. But the subtree-to-subcube mapping of the earlier formulation causes significant load imbalance among processors, limiting overall efficiency and speedup. The new mapping largely eliminates the load imbalance among processors. Furthermore, the algorithm has a number of enhancements to improve the overall performance substantially. This new algorithm achieves up to 20 GFlops on a 1024-processor Cray T3D for moderately large problems. To our knowledge, this is the highest performance ever obtained on an MPP for sparse Cholesky factorization.
UR  - http://www.scopus.com/inward/record.url?scp=0029185367&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0029185367&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:0029185367
SP  - 140
EP  - 147
BT  - Frontiers of Massively Parallel Computation - Conference Proceedings
PB  - IEEE
T2  - Proceedings of the 5th Symposium on the Frontiers of Massively Parallel Computation
Y2  - 6 February 1995 through 9 February 1995
ER  -
