TY - JOUR
T1 - Automatic tagging with existing and novel tags
AU - Wang, Junhui
AU - Shen, Xiaotong
AU - Sun, Yiwen
AU - Qu, Annie
PY - 2017/6/1
Y1 - 2017/6/1
AB - Automatic tagging by key words and phrases is important in multi-label classification of a document. In this paper, we first introduce a tagging loss to measure the discrepancy between predicted and actual tag sets, which is expressed in terms of a sum of weighted pairwise margins between two tags by their degree of similarity. We then construct a regularized empirical loss to incorporate linguistic knowledge, and identify a tagger maximizing the separations between the pairwise margins. One salient feature of the proposed method is its capability to identify novel tags absent from a training sample by using their similarity to existing tags. Computationally, the proposed method is implemented by an alternating direction method of multipliers, integrated with a difference convex algorithm. This permits scalable computation. We show that the method achieves accurate tagging, and that it compares favourably with existing methods. Finally, we apply the proposed method to tagging a Reuters news dataset.
KW - Alternating direction method of multipliers
KW - Large margin
KW - Multi-label classification
KW - Scalability
KW - Social bookmarking system
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=85026899763&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85026899763&partnerID=8YFLogxK
U2 - 10.1093/biomet/asx016
DO - 10.1093/biomet/asx016
M3 - Article
AN - SCOPUS:85026899763
SN - 0006-3444
VL - 104
SP - 273
EP - 290
JO - Biometrika
JF - Biometrika
IS - 2
ER -