Divide and conquer strategies for effective information retrieval

Jie Chen, Yousef Saad

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Scopus citations

Abstract

The standard application of Latent Semantic Indexing (LSI), a well-known technique for information retrieval, requires the computation of a partial Singular Value Decomposition (SVD) of the term-document matrix. This computation becomes infeasible for large document collections, since it is very demanding both in terms of arithmetic operations and in memory requirements. This paper discusses two divide and conquer strategies, with the goal of alleviating these difficulties. Both strategies divide the document collection in subsets, perform relevance analysis on each subset, and conquer the analysis results to form the query response. Since each sub-problem resulting from the recursive division process has a smaller size, the processing of large scale document collections requires much fewer resources. In addition, the computation is highly parallel and can be easily adapted to a parallel computing environment. To reduce the computational cost, we perform the analysis on the subsets by using the Lanczos vectors instead of singular vectors as in the standard LSI method. This technique is far more efficient than the computation of the truncated SVD, while its accuracy is comparable. Experimental results confirm that the proposed divide and conquer strategies are effective for information retrieval problems.

Original languageEnglish (US)
Title of host publicationSociety for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133
Pages445-456
Number of pages12
StatePublished - Dec 1 2009
Event9th SIAM International Conference on Data Mining 2009, SDM 2009 - Sparks, NV, United States
Duration: Apr 30 2009May 2 2009

Publication series

NameSociety for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics
Volume1

Other

Other9th SIAM International Conference on Data Mining 2009, SDM 2009
CountryUnited States
CitySparks, NV
Period4/30/095/2/09

Fingerprint

Dive into the research topics of 'Divide and conquer strategies for effective information retrieval'. Together they form a unique fingerprint.

Cite this