Observing the web by understanding the past: Archival internet research

Matthew S. Weber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

This paper discusses the challenges and opportunities of using archival Internet data to observe a host of social science phenomena. Specifically, this paper introduces HistoryTracker, a new tool for accessing and extracting archived data from the Internet Archive, the largest repository of archived Web data in existence. The HistoryTracker tool serves to create a Web observatory that allows scholars to study the history of the Web. HistoryTracker takes advantage of Hadoop processing capacity and allows researchers to extract large swaths of archived data into a link list format that can be easily transferred to a number of other analytical tools. A brief illustration demonstrates the use of the tool. Finally, a number of continuing research challenges are discussed, and future research opportunities are outlined.

Original language: English (US)
Title of host publication: WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web
Publisher: Association for Computing Machinery, Inc
Pages: 1031-1036
Number of pages: 6
ISBN (Electronic): 9781450327459
State: Published - Apr 7 2014
Externally published: Yes
Event: 23rd International Conference on World Wide Web, WWW 2014 - Seoul, Korea, Republic of
Duration: Apr 7 2014 to Apr 11 2014

Publication series

Name: WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web

Other

Other: 23rd International Conference on World Wide Web, WWW 2014
Country/Territory: Korea, Republic of
City: Seoul
Period: 4/7/14 to 4/11/14

Bibliographical note

Funding Information:
The author acknowledges support from the National Science Foundation (NSF Award 1244727), as well as the support of a number of collaborators including Kris Carpenter, David Lazer, Katherine Ognyanova, Vinay Goel, Luan Nguyen, Hai Nguyen and Allie Kosterich.

Publisher Copyright:
© Copyright 2014 by the International World Wide Web Conferences Steering Committee.

Keywords

  • Archived data
  • Data extraction
  • Network analysis
  • Occupy Wall Street
  • Social sciences
  • Web observatory
