TY - GEN
T1 - Global linear neighborhoods for efficient label propagation
AU - Tian, Ze
AU - Kuang, Rui
PY - 2012
Y1 - 2012
N2 - Graph-based semi-supervised learning improves classification by combining labeled and unlabeled data through label propagation. It has been shown that a sparse representation of the graph by weighted local neighbors provides a better similarity measure between data points for label propagation. However, selecting local neighbors can lead to disjoint components and incorrect neighbors in the graph, and thus fail to capture the underlying global structure. In this paper, we propose to learn a nonnegative low-rank graph to capture global linear neighborhoods, under the assumption that each data point can be linearly reconstructed from weighted combinations of its direct neighbors and reachable indirect neighbors. The global linear neighborhoods utilize information from both direct and indirect neighbors to preserve the global cluster structures, while the low-rank property retains a compressed representation of the graph. An efficient algorithm based on a multiplicative update rule is designed to learn a nonnegative low-rank factorization matrix minimizing the neighborhood reconstruction error. Large-scale simulations and experiments on UCI datasets and high-dimensional gene expression datasets show that label propagation based on global linear neighborhoods captures the global cluster structures better and achieves more accurate classification results.
AB - Graph-based semi-supervised learning improves classification by combining labeled and unlabeled data through label propagation. It has been shown that a sparse representation of the graph by weighted local neighbors provides a better similarity measure between data points for label propagation. However, selecting local neighbors can lead to disjoint components and incorrect neighbors in the graph, and thus fail to capture the underlying global structure. In this paper, we propose to learn a nonnegative low-rank graph to capture global linear neighborhoods, under the assumption that each data point can be linearly reconstructed from weighted combinations of its direct neighbors and reachable indirect neighbors. The global linear neighborhoods utilize information from both direct and indirect neighbors to preserve the global cluster structures, while the low-rank property retains a compressed representation of the graph. An efficient algorithm based on a multiplicative update rule is designed to learn a nonnegative low-rank factorization matrix minimizing the neighborhood reconstruction error. Large-scale simulations and experiments on UCI datasets and high-dimensional gene expression datasets show that label propagation based on global linear neighborhoods captures the global cluster structures better and achieves more accurate classification results.
UR - http://www.scopus.com/inward/record.url?scp=84880250285&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84880250285&partnerID=8YFLogxK
U2 - 10.1137/1.9781611972825.74
DO - 10.1137/1.9781611972825.74
M3 - Conference contribution
AN - SCOPUS:84880250285
SN - 9781611972320
T3 - Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012
SP - 863
EP - 872
BT - Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012
PB - Society for Industrial and Applied Mathematics Publications
T2 - 12th SIAM International Conference on Data Mining, SDM 2012
Y2 - 26 April 2012 through 28 April 2012
ER -