This paper discusses the challenges and opportunities of using archival Internet data to observe a range of social science phenomena. Specifically, it introduces HistoryTracker, a new tool for accessing and extracting archived data from the Internet Archive, the largest repository of archived Web data in existence. HistoryTracker serves to create a Web observatory that allows scholars to study the history of the Web. It takes advantage of Hadoop processing capacity and allows researchers to extract large swaths of archived data into a link list format that can be easily transferred to a number of other analytical tools. A brief illustration demonstrates the use of the tool. Finally, a number of continuing research challenges are discussed, and future research opportunities are outlined.
|Original language||English (US)|
|Title of host publication||WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web|
|Publisher||Association for Computing Machinery, Inc|
|Number of pages||6|
|State||Published - Apr 7 2014|
|Event||23rd International Conference on World Wide Web, WWW 2014 - Seoul, Korea, Republic of|
|Duration||Apr 7 2014 → Apr 11 2014|
Bibliographical note
Funding Information:
The author acknowledges support from the National Science Foundation (NSF Award 1244727), as well as the support of a number of collaborators including Kris Carpenter, David Lazer, Katherine Ognyanova, Vinay Goel, Luan Nguyen, Hai Nguyen and Allie Kosterich.
© Copyright 2014 by the International World Wide Web Conferences Steering Committee.
- Archived data
- Data extraction
- Network analysis
- Occupy Wall Street
- Social sciences
- Web observatory