TY - GEN

T1 - A framework for exploring categorical data

AU - Chandola, Varun

AU - Boriah, Shyam

AU - Kumar, Vipin

PY - 2009/12/1

Y1 - 2009/12/1

N2 - In this paper, we present a framework for categorical data analysis which allows such data sets to be explored using a rich set of techniques that are only applicable to continuous data sets. We introduce the concept of separability statistics in the context of exploratory categorical data analysis. We show how these statistics can be used as a way to map categorical data to continuous space given a labeled reference data set. This mapping enables visualization of categorical data using techniques that are applicable to continuous data. We show that in the transformed continuous space, the performance of the standard k-nn based outlier detection technique is comparable to the performance of the k-nn based outlier detection technique using the best of the similarity measures designed for categorical data. The proposed framework can also be used to devise similarity measures best suited for a particular type of data set.

AB - In this paper, we present a framework for categorical data analysis which allows such data sets to be explored using a rich set of techniques that are only applicable to continuous data sets. We introduce the concept of separability statistics in the context of exploratory categorical data analysis. We show how these statistics can be used as a way to map categorical data to continuous space given a labeled reference data set. This mapping enables visualization of categorical data using techniques that are applicable to continuous data. We show that in the transformed continuous space, the performance of the standard k-nn based outlier detection technique is comparable to the performance of the k-nn based outlier detection technique using the best of the similarity measures designed for categorical data. The proposed framework can also be used to devise similarity measures best suited for a particular type of data set.

UR - http://www.scopus.com/inward/record.url?scp=72849132071&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=72849132071&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:72849132071

SN - 9781615671090

T3 - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics

SP - 184

EP - 195

BT - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133

T2 - 9th SIAM International Conference on Data Mining 2009, SDM 2009

Y2 - 30 April 2009 through 2 May 2009

ER -