TY - GEN
T1 - Identifying high cardinality internet hosts
AU - Cao, Jin
AU - Jin, Yu
AU - Chen, Aiyou
AU - Bu, Tian
AU - Zhang, Zhi-Li
PY - 2009
Y1 - 2009
N2 - The Internet host cardinality, defined as the number of distinct peers that an Internet host communicates with, is an important metric for profiling Internet hosts. Example applications include behavior-based network intrusion detection, P2P host identification, and server identification. However, given the tremendous number of hosts in the Internet and the high speed of links, tracking the exact cardinality of each host is not feasible with limited memory and computation resources. Existing approaches to host cardinality counting have primarily focused on hosts of extremely high cardinalities. These methods do not work well with hosts of moderately large cardinalities, which are needed for certain host behavior profiling tasks such as detection of P2P hosts or port scanners. In this paper, we propose an online sampling approach for identifying hosts whose cardinality exceeds some moderate prescribed threshold, e.g., 50, or falls within specific ranges. The main advantage of our approach is that it can filter out the majority of low-cardinality hosts while preserving the hosts of interest, and hence minimize the memory resources wasted by tracking irrelevant hosts. Our approach consists of three components: 1) two-phase filtering for eliminating low-cardinality hosts, 2) a thresholded bitmap for counting cardinalities, and 3) bias correction. Through both theoretical analysis and experiments using real Internet traces, we demonstrate that our approach requires much less memory than existing approaches while yielding more accurate estimates.
AB - The Internet host cardinality, defined as the number of distinct peers that an Internet host communicates with, is an important metric for profiling Internet hosts. Example applications include behavior-based network intrusion detection, P2P host identification, and server identification. However, given the tremendous number of hosts in the Internet and the high speed of links, tracking the exact cardinality of each host is not feasible with limited memory and computation resources. Existing approaches to host cardinality counting have primarily focused on hosts of extremely high cardinalities. These methods do not work well with hosts of moderately large cardinalities, which are needed for certain host behavior profiling tasks such as detection of P2P hosts or port scanners. In this paper, we propose an online sampling approach for identifying hosts whose cardinality exceeds some moderate prescribed threshold, e.g., 50, or falls within specific ranges. The main advantage of our approach is that it can filter out the majority of low-cardinality hosts while preserving the hosts of interest, and hence minimize the memory resources wasted by tracking irrelevant hosts. Our approach consists of three components: 1) two-phase filtering for eliminating low-cardinality hosts, 2) a thresholded bitmap for counting cardinalities, and 3) bias correction. Through both theoretical analysis and experiments using real Internet traces, we demonstrate that our approach requires much less memory than existing approaches while yielding more accurate estimates.
UR - http://www.scopus.com/inward/record.url?scp=70349653447&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349653447&partnerID=8YFLogxK
U2 - 10.1109/INFCOM.2009.5061990
DO - 10.1109/INFCOM.2009.5061990
M3 - Conference contribution
AN - SCOPUS:70349653447
SN - 9781424435135
T3 - Proceedings - IEEE INFOCOM
SP - 810
EP - 818
BT - IEEE INFOCOM 2009 - The 28th Conference on Computer Communications
T2 - 28th Conference on Computer Communications, IEEE INFOCOM 2009
Y2 - 19 April 2009 through 25 April 2009
ER -