Observing the web by understanding the past: Archival internet research

Matthew S. Weber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

This paper discusses the challenges and opportunities of using archival Internet data to observe a range of social science phenomena. Specifically, it introduces HistoryTracker, a new tool for accessing and extracting archived data from the Internet Archive, the largest repository of archived Web data in existence. HistoryTracker creates a Web observatory that allows scholars to study the history of the Web. It takes advantage of Hadoop processing capacity and allows researchers to extract large swaths of archived data into a link list format that can be easily transferred to a number of other analytical tools. A brief illustration demonstrates the use of the tool. Finally, a number of continuing research challenges are discussed, and future research opportunities are outlined.

Original language: English (US)
Title of host publication: WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web
Publisher: Association for Computing Machinery, Inc
Pages: 1031-1036
Number of pages: 6
ISBN (Electronic): 9781450327459
State: Published - Apr 7 2014
Externally published: Yes
Event: 23rd International Conference on World Wide Web, WWW 2014 - Seoul, Korea, Republic of
Duration: Apr 7 2014 to Apr 11 2014

Publication series

Name: WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web

Other

Other: 23rd International Conference on World Wide Web, WWW 2014
Country: Korea, Republic of
City: Seoul
Period: 4/7/14 to 4/11/14

Keywords

  • Archived data
  • Data extraction
  • Network analysis
  • Occupy Wall Street
  • Social sciences
  • Web observatory

