TY - GEN
T1 - Algorithms to calculate the Manhattan (L1) distance for vertical data represented in pTrees
AU - Hossain, Mohammad K.
AU - Roy, Arjun G.
AU - Chatterjee, Arijit
AU - Perrizo, William
PY - 2012
Y1 - 2012
N2 - In data mining applications different types of distance metrics are used to measure the closeness of two data points. Among these metrics Manhattan (L1), Euclidean (L2) and Max (L.) distances are used very frequently in various algorithms. In pTree vertical data representation Max distance can be efficiently implemented using only bitwise operations across the pTrees without any horizontal access of the data points. But many clustering and classification algorithms require computing L1 and L2 distances in order to increase 1 their accuracy. In this paper we have shown how Manhattan or L1 distance can be calculated for vertical data represented in pTrees. Similar to the Max distance this algorithm also uses only bitwise operations across various pTrees without performing any horizontal scan of the data points. As a result the algorithm works very fast on huge volume of data represented by pTrees comparing with traditional horizontal data representation. Also these algorithms enable various data mining algorithms that use pTrees to improve their accuracy without sacrificing any significant speed.
AB - In data mining applications different types of distance metrics are used to measure the closeness of two data points. Among these metrics Manhattan (L1), Euclidean (L2) and Max (L.) distances are used very frequently in various algorithms. In pTree vertical data representation Max distance can be efficiently implemented using only bitwise operations across the pTrees without any horizontal access of the data points. But many clustering and classification algorithms require computing L1 and L2 distances in order to increase 1 their accuracy. In this paper we have shown how Manhattan or L1 distance can be calculated for vertical data represented in pTrees. Similar to the Max distance this algorithm also uses only bitwise operations across various pTrees without performing any horizontal scan of the data points. As a result the algorithm works very fast on huge volume of data represented by pTrees comparing with traditional horizontal data representation. Also these algorithms enable various data mining algorithms that use pTrees to improve their accuracy without sacrificing any significant speed.
UR - http://www.scopus.com/inward/record.url?scp=84872004517&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84872004517&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84872004517
SN - 9781880843840
T3 - Proceedings of the ISCA 27th International Conference on Computers and Their Applications, CATA 2012
SP - 65
EP - 70
BT - Proceedings of the ISCA 27th International Conference on Computers and Their Applications, CATA 2012
T2 - 27th International Conference on Computers and Their Applications, CATA 2012
Y2 - 12 March 2012 through 14 March 2012
ER -