TY - GEN
T1 - Emerging topic detection using dictionary learning
AU - Kasiviswanathan, Shiva Prasad
AU - Melville, Prem
AU - Banerjee, Arindam
AU - Sindhwani, Vikas
PY - 2011
Y1 - 2011
N2 - Streaming user-generated content in the form of blogs, microblogs, forums, and multimedia sharing sites provides a rich source of data from which invaluable information and insights may be gleaned. Given the vast volume of such social media data being continually generated, one of the challenges is to automatically tease apart the emerging topics of discussion from the constant background chatter. Such emerging topics can be identified by the appearance of multiple posts on a unique subject matter, which is distinct from previous online discourse. We address the problem of identifying emerging topics through the use of dictionary learning. We propose a two-stage approach based, respectively, on detection and clustering of novel user-generated content. We derive a scalable approach by using the alternating directions method to solve the resulting optimization problems. Empirical results show that our proposed approach is more effective than several baselines in detecting emerging topics in traditional news story and newsgroup data. We also demonstrate the practical application to social media analysis, based on a study on streaming data from Twitter.
AB - Streaming user-generated content in the form of blogs, microblogs, forums, and multimedia sharing sites provides a rich source of data from which invaluable information and insights may be gleaned. Given the vast volume of such social media data being continually generated, one of the challenges is to automatically tease apart the emerging topics of discussion from the constant background chatter. Such emerging topics can be identified by the appearance of multiple posts on a unique subject matter, which is distinct from previous online discourse. We address the problem of identifying emerging topics through the use of dictionary learning. We propose a two-stage approach based, respectively, on detection and clustering of novel user-generated content. We derive a scalable approach by using the alternating directions method to solve the resulting optimization problems. Empirical results show that our proposed approach is more effective than several baselines in detecting emerging topics in traditional news story and newsgroup data. We also demonstrate the practical application to social media analysis, based on a study on streaming data from Twitter.
KW - clustering
KW - dictionary learning
KW - l1 reconstruction
UR - http://www.scopus.com/inward/record.url?scp=83055161779&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=83055161779&partnerID=8YFLogxK
U2 - 10.1145/2063576.2063686
DO - 10.1145/2063576.2063686
M3 - Conference contribution
AN - SCOPUS:83055161779
SN - 9781450307178
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 745
EP - 754
BT - CIKM'11 - Proceedings of the 2011 ACM International Conference on Information and Knowledge Management
T2 - 20th ACM Conference on Information and Knowledge Management, CIKM'11
Y2 - 24 October 2011 through 28 October 2011
ER -