Integrating crowdsourcing and active learning for classification of work-life events from tweets

Yunpeng Zhao, Mattia Prosperi, Tianchen Lyu, Yi Guo, Le Zhou, Jiang Bian

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Social media, especially Twitter, is increasingly used for research with predictive analytics. In social media studies, natural language processing (NLP) techniques are used in conjunction with expert-based, manual, and qualitative analyses. However, social media data are unstructured and must undergo complex manipulation before they can be used for research. Manual annotation, in which multiple expert raters must reach consensus on every item, is the most resource- and time-consuming step, but it is essential for creating gold-standard datasets for training NLP-based machine learning classifiers. To reduce the burden of manual annotation while maintaining its reliability, we devised a crowdsourcing pipeline combined with active learning strategies. We demonstrated its effectiveness through a case study that identifies job loss events from individual tweets. We used the Amazon Mechanical Turk platform to recruit annotators from the Internet and designed a number of quality control measures to assure annotation accuracy. We evaluated four active learning strategies (least confident, entropy, vote entropy, and Kullback-Leibler divergence). The active learning strategies aim to reduce the number of tweets needed to reach a desired performance of automated classification. Results show that crowdsourcing is useful for creating high-quality annotations and that active learning helps reduce the number of required tweets, although there was no substantial difference among the strategies tested.
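
The four query strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so the code below uses common textbook formulations of each score; the function names, array shapes, and the `select_batch` helper are assumptions made for illustration only. In every case a higher score means the classifier (or committee of classifiers) is less certain about a tweet, so that tweet is a better candidate to send for annotation.

```python
import numpy as np

EPS = 1e-12  # avoids log(0); an implementation detail assumed here


def least_confident(probs):
    """probs: (n_samples, n_classes) predicted probabilities from one model.
    Score = 1 - probability of the most likely class."""
    return 1.0 - probs.max(axis=1)


def entropy(probs):
    """Shannon entropy of each sample's predicted class distribution."""
    return -(probs * np.log(probs + EPS)).sum(axis=1)


def vote_entropy(votes, n_classes):
    """votes: (n_members, n_samples) hard labels from a committee of models.
    Entropy of the fraction of committee votes each class receives."""
    frac = np.stack([(votes == c).mean(axis=0) for c in range(n_classes)], axis=1)
    return -(frac * np.log(frac + EPS)).sum(axis=1)


def kl_divergence(committee_probs):
    """committee_probs: (n_members, n_samples, n_classes).
    Mean KL divergence of each member's prediction from the committee average."""
    consensus = committee_probs.mean(axis=0)
    log_ratio = np.log(committee_probs + EPS) - np.log(consensus + EPS)
    kl_per_member = (committee_probs * log_ratio).sum(axis=2)
    return kl_per_member.mean(axis=0)


def select_batch(scores, k):
    """Indices of the k highest-scoring (most informative) samples."""
    return np.argsort(scores)[::-1][:k]
```

In an active learning loop under these assumptions, one would score the unlabeled tweets with one of the functions, pass the scores to `select_batch`, send the selected tweets to the crowd for annotation, and retrain on the enlarged labeled set.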

Original language: English (US)
Title of host publication: Trends in Artificial Intelligence Theory and Applications. Artificial Intelligence Practices - 33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2020, Proceedings
Editors: Hamido Fujita, Jun Sasaki, Philippe Fournier-Viger, Moonis Ali
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 333-344
Number of pages: 12
ISBN (Print): 9783030557881
State: Published - 2020
Event: 33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2020 - Kitakyushu, Japan
Duration: Sep 22 2020 to Sep 25 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12144 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2020
Country/Territory: Japan
City: Kitakyushu
Period: 9/22/20 to 9/25/20

Bibliographical note

Funding Information:
This study was supported by NSF Award #1734134.

Publisher Copyright:
© Springer Nature Switzerland AG 2020.

Keywords

  • Active learning
  • Crowdsourcing
  • Social media
