DAPPER: Scaling Dynamic Author Persona Topic Model to Billion Word Corpora

Robert Giaquinto, Arindam Banerjee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Extracting common narratives from multi-author dynamic text corpora requires complex models, such as the Dynamic Author Persona (DAP) topic model. However, such models can struggle to scale to large corpora, often because of challenging non-conjugate terms. To overcome such challenges, we adapt new ideas in approximate inference to the DAP model, resulting in the DAP Performed Exceedingly Rapidly (DAPPER) topic model. Specifically, we develop Conjugate-Computation Variational Inference (CVI) based variational Expectation-Maximization (EM) for learning the model, yielding fast, closed-form updates for each document that replace the iterative optimization used in earlier work. Our results show significant improvements in model fit and training time without needing to compromise the model's temporal structure or the application of Regularized Variational Inference (RVI). We demonstrate the scalability and effectiveness of the DAPPER model on multiple datasets, including the CaringBridge corpus, a collection of 9 million journals written by 200,000 authors during health crises.

Original language: English (US)
Title of host publication: 2018 IEEE International Conference on Data Mining, ICDM 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 971-976
Number of pages: 6
ISBN (Electronic): 9781538691588
State: Published - Dec 27 2018
Event: 18th IEEE International Conference on Data Mining, ICDM 2018 - Singapore, Singapore
Duration: Nov 17 2018 → Nov 20 2018

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2018-November
ISSN (Print): 1550-4786

Conference

Conference: 18th IEEE International Conference on Data Mining, ICDM 2018
Country: Singapore
City: Singapore
Period: 11/17/18 → 11/20/18

Bibliographical note

Funding Information:
We thank reviewers for their valuable comments, the University of Minnesota Supercomputing Institute for technical support, and CaringBridge for support and collaboration. The research was supported by NSF grants IIS-1563950, IIS-1447566, IIS-1447574, IIS-1422557, CCF-1451986, and CNS-1314560.

Keywords

  • Approximate inference
  • Graphical model
  • Healthcare
  • Non-conjugate models
  • Regularized variational inference
  • Text mining
  • Topic modeling
