Improve precategorized collection retrieval by using supervised term weighting schemes

Ying Zhao; George Karypis

doi:10.1109/ITCC.2002.1000353

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

The emergence of the World Wide Web has led to increased interest in methods for searching for information. A key characteristic of many online document collections is that their documents carry pre-defined category information; examples include scientific articles accessible via digital libraries (e.g., ACM, IEEE), medical articles, newswires, and various directories (e.g., Yahoo, OpenDirectory Project). However, most previous information retrieval systems have not taken this pre-existing category information into account. In this paper, we present weight adjustment schemes based on the category information in the vector-space model, which are able to select the most content-specific and discriminating features. Our experimental results on TREC data sets show that the pre-existing category information does provide additional information that improves retrieval. The proposed weight adjustment schemes perform better than the vector-space model with the inverse document frequency (IDF) weighting scheme when queries are less specific. The proposed weighting schemes can also benefit retrieval when clusters are used as approximations to categories.
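
The abstract does not spell out the weight adjustment schemes themselves, so the following is only an illustrative sketch of the general idea, not the authors' method. It builds the IDF baseline mentioned above and then applies a hypothetical category-based adjustment: each term's IDF weight is scaled by the share of its document occurrences that fall in its most frequent category. The function names, the concentration factor, and the toy documents and labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Baseline IDF weight per term: log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def category_adjusted_weights(docs, labels):
    """Hypothetical supervised adjustment (an assumption, not the paper's scheme).

    Each IDF weight is multiplied by the fraction of the term's document
    occurrences that fall in its single most frequent category, so terms
    spread evenly across categories are down-weighted.
    """
    idf = idf_weights(docs)
    per_category = {}
    for doc, label in zip(docs, labels):
        per_category.setdefault(label, Counter()).update(set(doc))
    weights = {}
    for term, base in idf.items():
        counts = [c[term] for c in per_category.values()]
        weights[term] = base * max(counts) / sum(counts)
    return weights


def vectorize(tokens, weights):
    """Sparse vector of term frequency times term weight."""
    tf = Counter(tokens)
    return {t: tf[t] * weights[t] for t in tf if t in weights}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs, weights):
    """Document indices ordered by decreasing cosine similarity to the query."""
    q = vectorize(query, weights)
    scored = [(cosine(q, vectorize(d, weights)), i) for i, d in enumerate(docs)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Toy collection with made-up category labels (illustrative only).
docs = [
    ["gene", "protein", "cell"],
    ["protein", "cell", "model"],
    ["model", "network", "learning"],
    ["network", "learning", "model"],
]
labels = ["biology", "biology", "computing", "computing"]

baseline = idf_weights(docs)
adjusted = category_adjusted_weights(docs, labels)
ranking = rank(["gene", "model"], docs, adjusted)
```

In this toy collection, "model" appears in both categories, so its adjusted weight falls below its IDF weight, while "protein" appears only in one category and keeps its IDF weight unchanged. Swapping `adjusted` for `baseline` in the call to `rank` gives the plain IDF ranking for comparison.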

Original language	English (US)
Title of host publication	Proceedings - International Conference on Information Technology
Subtitle of host publication	Coding and Computing, ITCC 2002
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	16-21
Number of pages	6
ISBN (Electronic)	0769515061, 9780769515069
DOIs	https://doi.org/10.1109/ITCC.2002.1000353
State	Published - 2002
Event	International Conference on Information Technology: Coding and Computing, ITCC 2002 - Las Vegas, United States
Duration	Apr 8 2002 → Apr 10 2002

Publication series

Name	Proceedings - International Conference on Information Technology: Coding and Computing, ITCC 2002

Other

Other	International Conference on Information Technology: Coding and Computing, ITCC 2002
Country/Territory	United States
City	Las Vegas
Period	4/8/02 → 4/10/02

Bibliographical note

Publisher Copyright:
© 2002 IEEE.

Keywords

Computer science
Contracts
Frequency
Indexing
Information retrieval
Inverse problems
Software libraries
Text categorization
Text mining
US Department of Energy

Cite this

Zhao, Y., & Karypis, G. (2002). Improve precategorized collection retrieval by using supervised term weighting schemes. In Proceedings - International Conference on Information Technology: Coding and Computing, ITCC 2002 (pp. 16-21). Article 1000353 (Proceedings - International Conference on Information Technology: Coding and Computing, ITCC 2002). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ITCC.2002.1000353

Improve precategorized collection retrieval by using supervised term weighting schemes. / Zhao, Ying; Karypis, George.
Proceedings - International Conference on Information Technology: Coding and Computing, ITCC 2002. Institute of Electrical and Electronics Engineers Inc., 2002. p. 16-21 1000353 (Proceedings - International Conference on Information Technology: Coding and Computing, ITCC 2002).

Zhao, Y & Karypis, G 2002, Improve precategorized collection retrieval by using supervised term weighting schemes. in Proceedings - International Conference on Information Technology: Coding and Computing, ITCC 2002, 1000353, Proceedings - International Conference on Information Technology: Coding and Computing, ITCC 2002, Institute of Electrical and Electronics Engineers Inc., pp. 16-21, International Conference on Information Technology: Coding and Computing, ITCC 2002, Las Vegas, United States, 4/8/02. https://doi.org/10.1109/ITCC.2002.1000353

Zhao Y, Karypis G. Improve precategorized collection retrieval by using supervised term weighting schemes. In Proceedings - International Conference on Information Technology: Coding and Computing, ITCC 2002. Institute of Electrical and Electronics Engineers Inc. 2002. p. 16-21. 1000353. (Proceedings - International Conference on Information Technology: Coding and Computing, ITCC 2002). doi: 10.1109/ITCC.2002.1000353

Zhao, Ying ; Karypis, George. / Improve precategorized collection retrieval by using supervised term weighting schemes. Proceedings - International Conference on Information Technology: Coding and Computing, ITCC 2002. Institute of Electrical and Electronics Engineers Inc., 2002. pp. 16-21 (Proceedings - International Conference on Information Technology: Coding and Computing, ITCC 2002).

@inproceedings{85aa06269bee4493ba74d7722a609bbf,

title = "Improve precategorized collection retrieval by using supervised term weighting schemes",

abstract = "The emergence of the World Wide Web has led to increased interest in methods for searching for information. A key characteristic of many online document collections is that their documents carry pre-defined category information; examples include scientific articles accessible via digital libraries (e.g., ACM, IEEE), medical articles, newswires, and various directories (e.g., Yahoo, OpenDirectory Project). However, most previous information retrieval systems have not taken this pre-existing category information into account. In this paper, we present weight adjustment schemes based on the category information in the vector-space model, which are able to select the most content-specific and discriminating features. Our experimental results on TREC data sets show that the pre-existing category information does provide additional information that improves retrieval. The proposed weight adjustment schemes perform better than the vector-space model with the inverse document frequency (IDF) weighting scheme when queries are less specific. The proposed weighting schemes can also benefit retrieval when clusters are used as approximations to categories.",

keywords = "Computer science, Contracts, Frequency, Indexing, Information retrieval, Inverse problems, Software libraries, Text categorization, Text mining, US Department of Energy",

author = "Ying Zhao and George Karypis",

note = "Publisher Copyright: {\textcopyright} 2002 IEEE.; International Conference on Information Technology: Coding and Computing, ITCC 2002 ; Conference date: 08-04-2002 Through 10-04-2002",

year = "2002",

doi = "10.1109/ITCC.2002.1000353",

language = "English (US)",

series = "Proceedings - International Conference on Information Technology: Coding and Computing, ITCC 2002",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "16--21",

booktitle = "Proceedings - International Conference on Information Technology",

}

TY - GEN

T1 - Improve precategorized collection retrieval by using supervised term weighting schemes

AU - Zhao, Ying

AU - Karypis, George

PY - 2002

Y1 - 2002

AB - The emergence of the World Wide Web has led to increased interest in methods for searching for information. A key characteristic of many online document collections is that their documents carry pre-defined category information; examples include scientific articles accessible via digital libraries (e.g., ACM, IEEE), medical articles, newswires, and various directories (e.g., Yahoo, OpenDirectory Project). However, most previous information retrieval systems have not taken this pre-existing category information into account. In this paper, we present weight adjustment schemes based on the category information in the vector-space model, which are able to select the most content-specific and discriminating features. Our experimental results on TREC data sets show that the pre-existing category information does provide additional information that improves retrieval. The proposed weight adjustment schemes perform better than the vector-space model with the inverse document frequency (IDF) weighting scheme when queries are less specific. The proposed weighting schemes can also benefit retrieval when clusters are used as approximations to categories.

KW - Computer science

KW - Contracts

KW - Frequency

KW - Indexing

KW - Information retrieval

KW - Inverse problems

KW - Software libraries

KW - Text categorization

KW - Text mining

KW - US Department of Energy

UR - http://www.scopus.com/inward/record.url?scp=34250177459&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34250177459&partnerID=8YFLogxK

U2 - 10.1109/ITCC.2002.1000353

DO - 10.1109/ITCC.2002.1000353

M3 - Conference contribution

AN - SCOPUS:34250177459

T3 - Proceedings - International Conference on Information Technology: Coding and Computing, ITCC 2002

SP - 16

EP - 21

BT - Proceedings - International Conference on Information Technology

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - International Conference on Information Technology: Coding and Computing, ITCC 2002

Y2 - 8 April 2002 through 10 April 2002

ER -
