Identifying high cardinality internet hosts

Jin Cao; Yu Jin; Aiyou Chen; Tian Bu; Zhi-Li Zhang

doi:10.1109/INFCOM.2009.5061990

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

71 Scopus citations

Abstract

The Internet host cardinality, defined as the number of distinct peers that an Internet host communicates with, is an important metric for profiling Internet hosts. Example applications include behavior-based network intrusion detection, p2p host identification, and server identification. However, given the tremendous number of hosts in the Internet and the high speed of its links, tracking the exact cardinality of each host is not feasible with limited memory and computation resources. Existing approaches to host cardinality counting have primarily focused on hosts of extremely high cardinalities. These methods do not work well for hosts of moderately large cardinalities, which are needed for certain kinds of host behavior profiling such as detection of p2p hosts or port scanners. In this paper, we propose an online sampling approach for identifying hosts whose cardinality exceeds some moderate prescribed threshold, e.g. 50, or falls within specific ranges. The main advantage of our approach is that it can filter out the majority of low-cardinality hosts while preserving the hosts of interest, and hence minimize the memory resources wasted by tracking irrelevant hosts. Our approach consists of three components: 1) two-phase filtering for eliminating low-cardinality hosts, 2) a thresholded bitmap for counting cardinalities, and 3) bias correction. Through both theoretical analysis and experiments using real Internet traces, we demonstrate that our approach requires much less memory than existing approaches while yielding more accurate estimates.
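
For readers unfamiliar with bitmap-based counting, the following is a minimal sketch of a plain bitmap estimator in the style of linear counting. It is not the paper's thresholded bitmap, two-phase filter, or bias correction, whose details the abstract does not give. The names `BitmapCounter`, `num_bits`, and `exceeds_threshold`, and the choice of SHA-256 as the hash, are assumptions made for illustration only.

```python
import hashlib
import math


class BitmapCounter:
    """Plain bitmap (linear counting) estimate of one host's distinct peers.

    Illustrative only: a generic technique, not the method from the paper.
    """

    def __init__(self, num_bits: int = 1024) -> None:
        self.num_bits = num_bits
        # One byte per bit keeps the sketch readable; a real system would pack bits.
        self.bits = bytearray(num_bits)

    def add(self, peer: str) -> None:
        """Hash the peer identifier to one position and set that bit."""
        digest = hashlib.sha256(peer.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % self.num_bits
        self.bits[index] = 1

    def estimate(self) -> float:
        """Estimate distinct peers as m * ln(m / zeros), where m is the bitmap size."""
        zeros = self.num_bits - sum(self.bits)
        if zeros == 0:
            return float("inf")  # bitmap saturated; no finite estimate
        return self.num_bits * math.log(self.num_bits / zeros)


def exceeds_threshold(counter: BitmapCounter, threshold: float) -> bool:
    """Report whether the estimated cardinality reaches the given threshold."""
    return counter.estimate() >= threshold
```

Because a repeated peer always hashes to the same bit, re-adding it leaves the estimate unchanged, which is what makes a bitmap suitable for counting distinct peers rather than packets. The memory cost is fixed by `num_bits` per tracked host, which is why filtering out low-cardinality hosts before allocating a bitmap, as the abstract describes, can save memory.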

Original language	English (US)
Title of host publication	IEEE INFOCOM 2009 - The 28th Conference on Computer Communications
Pages	810-818
Number of pages	9
DOIs	https://doi.org/10.1109/INFCOM.2009.5061990
State	Published - 2009
Event	28th Conference on Computer Communications, IEEE INFOCOM 2009 - Rio de Janeiro, Brazil
Duration	Apr 19 2009 → Apr 25 2009

Publication series

Name	Proceedings - IEEE INFOCOM
ISSN (Print)	0743-166X

Other

Other	28th Conference on Computer Communications, IEEE INFOCOM 2009
Country/Territory	Brazil
City	Rio de Janeiro
Period	4/19/09 → 4/25/09

Cite this

@inproceedings{f5753aefdd0a465b9def1d56f970596e,
  title = "Identifying high cardinality internet hosts",
  abstract = "The Internet host cardinality, defined as the number of distinct peers that an Internet host communicates with, is an important metric for profiling Internet hosts. Some example applications include behavior based network intrusion detection, p2p hosts identification, and server identification. However, due to the tremendous number of hosts in the Internet and high speed links, tracking the exact cardinality of each host is not feasible due to the limited memory and computation resource. Existing approaches on host cardinality counting have primarily focused on hosts of extremely high cardinalities. These methods do not work well with hosts of moderately large cardinalities that are needed for certain host behavior profiling such as detection of p2p hosts or port scanners. In this paper, we propose an online sampling approach for identifying hosts whose cardinality exceeds some moderate prescribed threshold, e.g. 50, or within specific ranges. The main advantage of our approach is that it can filter out the majority of low cardinality hosts while preserving the hosts of interest, and hence minimize the memory resources wasted by tracking irrelevant hosts. Our approach consists of three components: 1) two-phase filtering for eliminating low cardinality hosts, 2) thresholded bitmap for counting cardinalities, and 3) bias correction. Through both theoretical analysis and experiments using real Internet traces, we demonstrate that our approach requires much less memory than existing approaches do whereas yields more accurate estimates.",
  author = "Jin Cao and Yu Jin and Aiyou Chen and Tian Bu and Zhi-Li Zhang",
  year = "2009",
  doi = "10.1109/INFCOM.2009.5061990",
  language = "English (US)",
  isbn = "9781424435135",
  series = "Proceedings - IEEE INFOCOM",
  pages = "810--818",
  booktitle = "IEEE INFOCOM 2009 - The 28th Conference on Computer Communications",
  note = "28th Conference on Computer Communications, IEEE INFOCOM 2009; Conference date: 19-04-2009 through 25-04-2009",
}

TY  - GEN
T1  - Identifying high cardinality internet hosts
AU  - Cao, Jin
AU  - Jin, Yu
AU  - Chen, Aiyou
AU  - Bu, Tian
AU  - Zhang, Zhi-Li
PY  - 2009
Y1  - 2009
N2  - The Internet host cardinality, defined as the number of distinct peers that an Internet host communicates with, is an important metric for profiling Internet hosts. Some example applications include behavior based network intrusion detection, p2p hosts identification, and server identification. However, due to the tremendous number of hosts in the Internet and high speed links, tracking the exact cardinality of each host is not feasible due to the limited memory and computation resource. Existing approaches on host cardinality counting have primarily focused on hosts of extremely high cardinalities. These methods do not work well with hosts of moderately large cardinalities that are needed for certain host behavior profiling such as detection of p2p hosts or port scanners. In this paper, we propose an online sampling approach for identifying hosts whose cardinality exceeds some moderate prescribed threshold, e.g. 50, or within specific ranges. The main advantage of our approach is that it can filter out the majority of low cardinality hosts while preserving the hosts of interest, and hence minimize the memory resources wasted by tracking irrelevant hosts. Our approach consists of three components: 1) two-phase filtering for eliminating low cardinality hosts, 2) thresholded bitmap for counting cardinalities, and 3) bias correction. Through both theoretical analysis and experiments using real Internet traces, we demonstrate that our approach requires much less memory than existing approaches do whereas yields more accurate estimates.
AB  - The Internet host cardinality, defined as the number of distinct peers that an Internet host communicates with, is an important metric for profiling Internet hosts. Some example applications include behavior based network intrusion detection, p2p hosts identification, and server identification. However, due to the tremendous number of hosts in the Internet and high speed links, tracking the exact cardinality of each host is not feasible due to the limited memory and computation resource. Existing approaches on host cardinality counting have primarily focused on hosts of extremely high cardinalities. These methods do not work well with hosts of moderately large cardinalities that are needed for certain host behavior profiling such as detection of p2p hosts or port scanners. In this paper, we propose an online sampling approach for identifying hosts whose cardinality exceeds some moderate prescribed threshold, e.g. 50, or within specific ranges. The main advantage of our approach is that it can filter out the majority of low cardinality hosts while preserving the hosts of interest, and hence minimize the memory resources wasted by tracking irrelevant hosts. Our approach consists of three components: 1) two-phase filtering for eliminating low cardinality hosts, 2) thresholded bitmap for counting cardinalities, and 3) bias correction. Through both theoretical analysis and experiments using real Internet traces, we demonstrate that our approach requires much less memory than existing approaches do whereas yields more accurate estimates.
UR  - http://www.scopus.com/inward/record.url?scp=70349653447&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=70349653447&partnerID=8YFLogxK
U2  - 10.1109/INFCOM.2009.5061990
DO  - 10.1109/INFCOM.2009.5061990
M3  - Conference contribution
AN  - SCOPUS:70349653447
SN  - 9781424435135
T3  - Proceedings - IEEE INFOCOM
SP  - 810
EP  - 818
BT  - IEEE INFOCOM 2009 - The 28th Conference on Computer Communications
T2  - 28th Conference on Computer Communications, IEEE INFOCOM 2009
Y2  - 19 April 2009 through 25 April 2009
ER  -
