TY - GEN
T1 - Summarization - Compressing data into an informative representation
AU - Chandola, Varun
AU - Kumar, Vipin
PY - 2005
Y1 - 2005
AB - In this paper, we formulate the problem of summarization of a dataset of transactions with categorical attributes as an optimization problem involving two objective functions - compaction gain and information loss. We propose metrics to characterize the output of any summarization algorithm. We investigate two approaches to address this problem. The first approach is an adaptation of clustering and the second approach makes use of frequent itemsets from the association analysis domain. We illustrate one application of summarization in the field of network data where we show how our technique can be effectively used to summarize network traffic into a compact but meaningful representation. Specifically, we evaluate our proposed algorithms on the 1998 DARPA Off-line Intrusion Detection Evaluation data and network data generated by SKAION Corp for the ARDA information assurance program.
UR - http://www.scopus.com/inward/record.url?scp=34548580802&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34548580802&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2005.137
DO - 10.1109/ICDM.2005.137
M3 - Conference contribution
AN - SCOPUS:34548580802
SN - 0769522785
SN - 9780769522784
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 98
EP - 105
BT - Proceedings - Fifth IEEE International Conference on Data Mining, ICDM 2005
T2 - 5th IEEE International Conference on Data Mining, ICDM 2005
Y2 - 27 November 2005 through 30 November 2005
ER -