TY - JOUR
T1 - Conversion of categorical variables into numerical variables via Bayesian network classifiers for binary classifications
AU - Lee, Namgil
AU - Kim, Jong Min
PY - 2010/5/1
Y1 - 2010/5/1
N2 - Many pattern classification algorithms, such as Support Vector Machines (SVMs), Multi-Layer Perceptrons (MLPs), and K-Nearest Neighbors (KNNs), require data consisting purely of numerical variables. However, many real-world data sets contain both categorical and numerical variables. In this paper, we suggest an effective method for converting mixed categorical and numerical data into purely numerical data for binary classification. Because the suggested method is based on the theory of learning Bayesian Network Classifiers (BNCs), it is computationally efficient and robust to noise and data loss. The suggested method is also expected to extract sufficient information for estimating a minimum-error-rate (MER) classifier. Simulations on artificial and real-world data sets are conducted to demonstrate that the suggested method is competitive when each categorical variable takes a large number of values and BNCs model the data accurately.
AB - Many pattern classification algorithms, such as Support Vector Machines (SVMs), Multi-Layer Perceptrons (MLPs), and K-Nearest Neighbors (KNNs), require data consisting purely of numerical variables. However, many real-world data sets contain both categorical and numerical variables. In this paper, we suggest an effective method for converting mixed categorical and numerical data into purely numerical data for binary classification. Because the suggested method is based on the theory of learning Bayesian Network Classifiers (BNCs), it is computationally efficient and robust to noise and data loss. The suggested method is also expected to extract sufficient information for estimating a minimum-error-rate (MER) classifier. Simulations on artificial and real-world data sets are conducted to demonstrate that the suggested method is competitive when each categorical variable takes a large number of values and BNCs model the data accurately.
UR - http://www.scopus.com/inward/record.url?scp=77249101489&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77249101489&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2009.11.003
DO - 10.1016/j.csda.2009.11.003
M3 - Article
AN - SCOPUS:77249101489
SN - 0167-9473
VL - 54
SP - 1247
EP - 1265
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
IS - 5
ER -