Conversion of categorical variables into numerical variables via Bayesian network classifiers for binary classifications

Namgil Lee; Jong Min Kim

doi:10.1016/j.csda.2009.11.003

Research output: Contribution to journal › Article › peer-review

16 Scopus citations

Abstract

Many pattern classification algorithms, such as Support Vector Machines (SVMs), Multi-Layer Perceptrons (MLPs), and K-Nearest Neighbors (KNNs), require data to consist of purely numerical variables. However, many real-world data sets consist of both categorical and numerical variables. In this paper, we suggest an effective method of converting mixed data of categorical and numerical variables into data of purely numerical variables for binary classification. Because the suggested method is based on the theory of learning Bayesian Network Classifiers (BNCs), it is computationally efficient and robust to noise and data loss. The suggested method is also expected to extract sufficient information for estimating a minimum-error-rate (MER) classifier. Simulations on artificial and real-world data sets are conducted to demonstrate the competitiveness of the suggested method when the number of values in each categorical variable is large and BNCs accurately model the data.
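
The abstract does not spell out the conversion itself, so the following is only an illustrative sketch and not the authors' procedure. It assumes the simplest Bayesian network classifier, naive Bayes, and replaces each category with a Laplace-smoothed log-likelihood ratio between the two classes. The function names `fit_log_ratio_encoder` and `transform`, the smoothing parameter `alpha`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def fit_log_ratio_encoder(values, labels, alpha=1.0):
    """Map each category v to log P(v | y=1) - log P(v | y=0).

    Assumption: a naive Bayes model with Laplace smoothing (alpha).
    This is an illustration, not the method defined in the paper.
    """
    categories = sorted(set(values))
    counts = {0: Counter(), 1: Counter()}
    for v, y in zip(values, labels):
        counts[y][v] += 1
    totals = {y: sum(counts[y].values()) for y in (0, 1)}
    k = len(categories)
    encoding = {}
    for v in categories:
        # Smoothed class-conditional probabilities of category v.
        p1 = (counts[1][v] + alpha) / (totals[1] + alpha * k)
        p0 = (counts[0][v] + alpha) / (totals[0] + alpha * k)
        encoding[v] = math.log(p1 / p0)
    return encoding


def transform(values, encoding, default=0.0):
    """Replace categories by their scores; unseen categories get `default`."""
    return [encoding.get(v, default) for v in values]


# Toy data (hypothetical): one categorical variable and binary labels.
colors = ["red", "red", "blue", "green", "blue", "red"]
labels = [1, 1, 0, 0, 0, 1]
enc = fit_log_ratio_encoder(colors, labels)
numeric = transform(colors, enc)
```

Under these assumptions each categorical variable becomes a single numerical column regardless of how many values it takes, which is the situation the abstract highlights. A positive score indicates a category seen more often in class 1, a negative score one seen more often in class 0, and an unseen category defaults to 0.0 (no evidence either way).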

Original language	English (US)
Pages (from-to)	1247-1265
Number of pages	19
Journal	Computational Statistics and Data Analysis
Volume	54
Issue number	5
DOIs	https://doi.org/10.1016/j.csda.2009.11.003
State	Published - May 1 2010

Bibliographical note

Copyright 2010 Elsevier B.V. All rights reserved.

Cite this

@article{3eefb8928dac4f39a4be597a18e5318a,
  title = "Conversion of categorical variables into numerical variables via Bayesian network classifiers for binary classifications",
  abstract = "Many pattern classification algorithms such as Support Vector Machines (SVMs), Multi-Layer Perceptrons (MLPs), and K-Nearest Neighbors (KNNs) require data to consist of purely numerical variables. However many real world data consist of both categorical and numerical variables. In this paper we suggest an effective method of converting the mixed data of categorical and numerical variables into data of purely numerical variables for binary classifications. Since the suggested method is based on the theory of learning Bayesian Network Classifiers (BNCs), it is computationally efficient and robust to noises and data losses. Also the suggested method is expected to extract sufficient information for estimating a minimum-error-rate (MER) classifier. Simulations on artificial data sets and real world data sets are conducted to demonstrate the competitiveness of the suggested method when the number of values in each categorical variable is large and BNCs accurately model the data.",
  author = "Lee, Namgil and Kim, {Jong Min}",
  year = "2010",
  month = may,
  day = "1",
  doi = "10.1016/j.csda.2009.11.003",
  language = "English (US)",
  volume = "54",
  number = "5",
  pages = "1247--1265",
  journal = "Computational Statistics and Data Analysis",
  issn = "0167-9473",
  publisher = "Elsevier",
}

TY - JOUR

T1 - Conversion of categorical variables into numerical variables via Bayesian network classifiers for binary classifications

AU - Lee, Namgil

AU - Kim, Jong Min

PY - 2010/5/1

Y1 - 2010/5/1

AB - Many pattern classification algorithms such as Support Vector Machines (SVMs), Multi-Layer Perceptrons (MLPs), and K-Nearest Neighbors (KNNs) require data to consist of purely numerical variables. However many real world data consist of both categorical and numerical variables. In this paper we suggest an effective method of converting the mixed data of categorical and numerical variables into data of purely numerical variables for binary classifications. Since the suggested method is based on the theory of learning Bayesian Network Classifiers (BNCs), it is computationally efficient and robust to noises and data losses. Also the suggested method is expected to extract sufficient information for estimating a minimum-error-rate (MER) classifier. Simulations on artificial data sets and real world data sets are conducted to demonstrate the competitiveness of the suggested method when the number of values in each categorical variable is large and BNCs accurately model the data.

UR - http://www.scopus.com/inward/record.url?scp=77249101489&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77249101489&partnerID=8YFLogxK

U2 - 10.1016/j.csda.2009.11.003

DO - 10.1016/j.csda.2009.11.003

M3 - Article

AN - SCOPUS:77249101489

SN - 0167-9473

VL - 54

SP - 1247

EP - 1265

JO - Computational Statistics and Data Analysis

JF - Computational Statistics and Data Analysis

IS - 5

ER -
