Counting YouTube videos via random prefix sampling

Jia Zhou, Yanhua Li, Vijay Kumar Adhikari, Zhi Li Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

47 Scopus citations

Abstract

Leveraging the characteristics of the YouTube video ID space and exploiting a unique property of the YouTube search API, in this paper we develop a random prefix sampling method to estimate the total number of videos hosted by YouTube. Through theoretical modeling and analysis, we demonstrate that the estimator based on this method is unbiased, and provide bounds on its variance and confidence interval. These bounds enable us to judiciously select sample sizes to control estimation errors. We evaluate our sampling method and validate the sampling results using two distinct collections of YouTube video IDs (namely, treating each collection as if it were the "true" collection of YouTube videos). We then apply our sampling method to the live YouTube system, and estimate that there were roughly 500 million YouTube videos as of May 2011. Finally, using an unbiased collection of YouTube videos sampled by our method, we show that YouTube video view count statistics collected by prior methods (e.g., through crawling of related video links) are highly skewed, significantly under-estimating the number of videos with very small view counts (
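
The general idea of estimating a collection's size from random prefixes can be illustrated with a small simulation. The sketch below is not the paper's estimator and does not call the YouTube search API. It assumes a synthetic "true" collection of IDs over a 64-character alphabet, and every name in it (`make_synthetic_ids`, `count_by_prefix`, `estimate_total`) is hypothetical. It draws prefixes uniformly at random, counts the IDs that start with each one, and scales the mean count by the number of possible prefixes, which gives an unbiased estimate of the collection size under these assumptions.

```python
import random
import string

# Assumption: a 64-character alphabet, chosen only for illustration.
ALPHABET = string.ascii_letters + string.digits + "-_"


def make_synthetic_ids(n, id_len, rng):
    """Build a toy 'true' collection of n distinct random IDs."""
    ids = set()
    while len(ids) < n:
        ids.add("".join(rng.choice(ALPHABET) for _ in range(id_len)))
    return ids


def count_by_prefix(ids, prefix_len):
    """Index the collection: prefix -> number of IDs starting with it."""
    counts = {}
    for vid in ids:
        prefix = vid[:prefix_len]
        counts[prefix] = counts.get(prefix, 0) + 1
    return counts


def estimate_total(counts, prefix_len, num_samples, rng):
    """Sample uniform random prefixes and scale the mean match count
    by the number of possible prefixes of that length."""
    total_matches = 0
    for _ in range(num_samples):
        prefix = "".join(rng.choice(ALPHABET) for _ in range(prefix_len))
        total_matches += counts.get(prefix, 0)  # stands in for a prefix query
    num_prefixes = len(ALPHABET) ** prefix_len
    return num_prefixes * total_matches / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    ids = make_synthetic_ids(20_000, id_len=6, rng=rng)
    counts = count_by_prefix(ids, prefix_len=2)
    estimate = estimate_total(counts, prefix_len=2, num_samples=2_000, rng=rng)
    print("true size:", len(ids), "estimate:", round(estimate))
```

In this toy setting, increasing `num_samples` tightens the estimate around the true size, which mirrors the role the abstract gives to variance bounds in choosing sample sizes.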

Original language: English (US)
Title of host publication: IMC'11 - Proceedings of the 2011 ACM SIGCOMM Internet Measurement Conference
Pages: 371-379
Number of pages: 9
State: Published - 2011
Event: 2011 ACM SIGCOMM Internet Measurement Conference, IMC'11 - Berlin, Germany
Duration: Nov 2 2011 → Nov 4 2011

Publication series

Name: Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC

Other

Other: 2011 ACM SIGCOMM Internet Measurement Conference, IMC'11
Country/Territory: Germany
City: Berlin
Period: 11/2/11 → 11/4/11

Keywords

  • YouTube
  • online social networks
  • sampling
