Similarity graph-based approach to declustering problems and its application towards parallelizing grid files

Duen Ren Liu; Shashi Shekhar

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

17 Scopus citations

Abstract

We propose a new similarity-based technique for declustering data. The proposed method can adapt to available information about query distributions, data distributions, data sizes, and partition-size constraints. The method is based on max-cut partitioning of a similarity graph defined over the given set of data, under constraints on the partition sizes. It maximizes the chances that pairs of data items accessed together by queries are allocated to distinct disks. We show that the proposed method can achieve optimal speed-up for a query set whenever some other declustering method achieves optimal speed-up for that set. Experiments in parallelizing Grid Files show that the proposed method outperforms mapping-function-based methods for interesting query distributions as well as for non-uniform data distributions.
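The abstract describes the method only at a high level. As a rough illustration, the Python sketch below builds a similarity graph in which the weight of the edge between two data items is the number of queries that access both. It then assigns items to disks with a simple greedy heuristic for size-constrained max-cut, and measures a query's response time as the number of items it reads from its busiest disk (the optimum being ceil(|q|/M) for M disks). The query-count edge weights, the balanced cap of ceil(N/M) items per disk, the greedy placement rule, and all function names are assumptions made for illustration. This is not the authors' partitioning algorithm, and the heuristic carries no optimality guarantee.

```python
from collections import defaultdict
from itertools import combinations
from math import ceil


def similarity_graph(queries):
    """Edge weight w[(i, j)] = number of queries that access both items i < j."""
    w = defaultdict(int)
    for q in queries:
        for i, j in combinations(sorted(q), 2):
            w[(i, j)] += 1
    return w


def _weight_to_disk(x, d, disk_of, w):
    """Similarity weight between item x and the items already placed on disk d."""
    return sum(w.get((min(x, y), max(x, y)), 0)
               for y, dy in disk_of.items() if dy == d)


def greedy_max_cut(items, w, num_disks):
    """Hypothetical greedy heuristic for a size-constrained max-cut.

    Each item goes to the disk (below the size cap) where it shares the least
    weight with already-placed items, so co-accessed pairs tend to be split.
    """
    cap = ceil(len(items) / num_disks)  # assumed balanced partition-size limit
    degree = defaultdict(int)
    for (i, j), wt in w.items():
        degree[i] += wt
        degree[j] += wt
    # Place the most heavily connected items first; ties broken by item id.
    order = sorted(items, key=lambda x: (-degree[x], x))
    disk_of, load = {}, [0] * num_disks
    for x in order:
        open_disks = [d for d in range(num_disks) if load[d] < cap]
        best = min(open_disks,
                   key=lambda d: (_weight_to_disk(x, d, disk_of, w), load[d], d))
        disk_of[x] = best
        load[best] += 1
    return disk_of


def response_time(query, disk_of, num_disks):
    """Parallel I/O steps for a query: items fetched from its busiest disk."""
    per_disk = [0] * num_disks
    for x in query:
        per_disk[disk_of[x]] += 1
    return max(per_disk)


if __name__ == "__main__":
    # Hypothetical workload: 6 buckets, every window of 3 adjacent buckets queried.
    queries = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
    num_disks = 3
    disk_of = greedy_max_cut(list(range(6)), similarity_graph(queries), num_disks)
    for q in queries:
        print(sorted(q), response_time(q, disk_of, num_disks),
              "optimal:", ceil(len(q) / num_disks))
```

On this toy workload the sketch places buckets 0 through 5 on disks 1, 2, 0, 1, 2, 0, so every query reads one bucket per disk and finishes in a single parallel step, the best possible for three items on three disks. Real grid-file workloads would need query frequencies as edge weights and a stronger partitioner than this greedy pass.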

Original language	English (US)
Title of host publication	Proceedings - International Conference on Data Engineering
Publisher	IEEE
Pages	373-381
Number of pages	9
State	Published - Jan 1 1995
Event	Proceedings of the 1995 IEEE 11th International Conference on Data Engineering, Taipei, Taiwan; Duration: Mar 6 1995 → Mar 10 1995

Cite this

@inproceedings{1b639a2631394eae863c485a175d9710,
  title = "Similarity graph-based approach to declustering problems and its application towards parallelizing grid files",
  abstract = "We propose a new similarity-based technique for declustering data. The proposed method can adapt to available information about query distributions, data distributions, data sizes, and partition-size constraints. The method is based on max-cut partitioning of a similarity graph defined over the given set of data, under constraints on the partition sizes. It maximizes the chances that pairs of data items accessed together by queries are allocated to distinct disks. We show that the proposed method can achieve optimal speed-up for a query set whenever some other declustering method achieves optimal speed-up for that set. Experiments in parallelizing Grid Files show that the proposed method outperforms mapping-function-based methods for interesting query distributions as well as for non-uniform data distributions.",
  author = "Liu, {Duen Ren} and Shashi Shekhar",
  year = "1995",
  month = jan,
  day = "1",
  language = "English (US)",
  pages = "373--381",
  booktitle = "Proceedings - International Conference on Data Engineering",
  publisher = "IEEE",
  note = "Proceedings of the 1995 IEEE 11th International Conference on Data Engineering; Conference date: 06-03-1995 through 10-03-1995",
}

TY  - GEN
T1  - Similarity graph-based approach to declustering problems and its application towards parallelizing grid files
AU  - Liu, Duen Ren
AU  - Shekhar, Shashi
PY  - 1995/1/1
Y1  - 1995/1/1
N2  - We propose a new similarity-based technique for declustering data. The proposed method can adapt to available information about query distributions, data distributions, data sizes, and partition-size constraints. The method is based on max-cut partitioning of a similarity graph defined over the given set of data, under constraints on the partition sizes. It maximizes the chances that pairs of data items accessed together by queries are allocated to distinct disks. We show that the proposed method can achieve optimal speed-up for a query set whenever some other declustering method achieves optimal speed-up for that set. Experiments in parallelizing Grid Files show that the proposed method outperforms mapping-function-based methods for interesting query distributions as well as for non-uniform data distributions.
AB  - We propose a new similarity-based technique for declustering data. The proposed method can adapt to available information about query distributions, data distributions, data sizes, and partition-size constraints. The method is based on max-cut partitioning of a similarity graph defined over the given set of data, under constraints on the partition sizes. It maximizes the chances that pairs of data items accessed together by queries are allocated to distinct disks. We show that the proposed method can achieve optimal speed-up for a query set whenever some other declustering method achieves optimal speed-up for that set. Experiments in parallelizing Grid Files show that the proposed method outperforms mapping-function-based methods for interesting query distributions as well as for non-uniform data distributions.
UR  - http://www.scopus.com/inward/record.url?scp=0029232285&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0029232285&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:0029232285
SP  - 373
EP  - 381
BT  - Proceedings - International Conference on Data Engineering
PB  - IEEE
T2  - Proceedings of the 1995 IEEE 11th International Conference on Data Engineering
Y2  - 6 March 1995 through 10 March 1995
ER  - 