Automatic tagging with existing and novel tags

Junhui Wang; Xiaotong Shen; Yiwen Sun; Annie Qu

doi:10.1093/biomet/asx016

Statistics (Twin Cities)

Research output: Contribution to journal › Article › peer-review

Abstract

Automatic tagging by key words and phrases is important in multi-label classification of a document. In this paper, we first introduce a tagging loss to measure the discrepancy between predicted and actual tag sets, which is expressed in terms of a sum of weighted pairwise margins between two tags by their degree of similarity. We then construct a regularized empirical loss to incorporate linguistic knowledge, and identify a tagger maximizing the separations between the pairwise margins. One salient feature of the proposed method is its capability to identify novel tags absent from a training sample by using their similarity to existing tags. Computationally, the proposed method is implemented by an alternating direction method of multipliers, integrated with a difference convex algorithm. This permits scalable computation. We show that the method achieves accurate tagging, and that it compares favourably with existing methods. Finally, we apply the proposed method to tagging a Reuters news dataset.
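
As a rough illustration of the tagging loss described in the abstract, the Python sketch below sums hinge penalties over pairs made of one actual tag and one non-actual tag, and down-weights pairs of similar tags. The function name, the hinge form, the unit margin, and the weight 1 - similarity are all assumptions made for illustration; the abstract does not state the paper's exact formulation, so this should not be read as the authors' method.

```python
def weighted_pairwise_loss(scores, actual_tags, similarity):
    """Illustrative similarity-weighted pairwise margin loss (hypothetical form).

    scores      -- dict mapping each candidate tag to a real-valued tagger score
    actual_tags -- set of tags that truly apply to the document
    similarity  -- dict mapping (tag_a, tag_b) to a similarity in [0, 1];
                   missing pairs are treated as similarity 0 (an assumption)
    """
    loss = 0.0
    for j in actual_tags:
        for k in scores:
            if k in actual_tags:
                continue  # only compare an actual tag against a non-actual tag
            sim = similarity.get((j, k), similarity.get((k, j), 0.0))
            weight = 1.0 - sim             # confusing similar tags costs less
            margin = scores[j] - scores[k]  # pairwise margin between the two tags
            loss += weight * max(0.0, 1.0 - margin)  # hinge penalty, unit margin
    return loss


# Hypothetical example: "crude" is close to the actual tag "oil", "sports" is not.
example_scores = {"oil": 0.0, "crude": 0.5, "sports": 0.0}
example_similarity = {("oil", "crude"): 0.8}
example_loss = weighted_pairwise_loss(example_scores, {"oil"}, example_similarity)
```

Under this weighting, ranking "crude" above "oil" is penalized less than failing to separate "oil" from the unrelated tag "sports", which is the intuition behind weighting margins by tag similarity.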

Original language	English (US)
Pages (from-to)	273-290
Number of pages	18
Journal	Biometrika
Volume	104
Issue number	2
DOIs	https://doi.org/10.1093/biomet/asx016
State	Published - Jun 1 2017

Keywords

Alternating direction method of multipliers
Large margin
Multi-label classification
Scalability
Social bookmarking system
Text mining

Cite this

@article{ec008220abc24671a40a01400773a4e8,
  title = "Automatic tagging with existing and novel tags",
  abstract = "Automatic tagging by key words and phrases is important in multi-label classification of a document. In this paper, we first introduce a tagging loss to measure the discrepancy between predicted and actual tag sets, which is expressed in terms of a sum of weighted pairwise margins between two tags by their degree of similarity. We then construct a regularized empirical loss to incorporate linguistic knowledge, and identify a tagger maximizing the separations between the pairwise margins. One salient feature of the proposed method is its capability to identify novel tags absent from a training sample by using their similarity to existing tags. Computationally, the proposed method is implemented by an alternating direction method of multipliers, integrated with a difference convex algorithm. This permits scalable computation. We show that the method achieves accurate tagging, and that it compares favourably with existing methods. Finally, we apply the proposed method to tagging a Reuters news dataset.",
  keywords = "Alternating direction method of multipliers, Large margin, Multi-label classification, Scalability, Social bookmarking system, Text mining",
  author = "Junhui Wang and Xiaotong Shen and Yiwen Sun and Annie Qu",
  year = "2017",
  month = jun,
  day = "1",
  doi = "10.1093/biomet/asx016",
  language = "English (US)",
  volume = "104",
  pages = "273--290",
  journal = "Biometrika",
  issn = "0006-3444",
  publisher = "Oxford University Press",
  number = "2",
}

TY  - JOUR
T1  - Automatic tagging with existing and novel tags
AU  - Wang, Junhui
AU  - Shen, Xiaotong
AU  - Sun, Yiwen
AU  - Qu, Annie
PY  - 2017/6/1
Y1  - 2017/6/1
N2  - Automatic tagging by key words and phrases is important in multi-label classification of a document. In this paper, we first introduce a tagging loss to measure the discrepancy between predicted and actual tag sets, which is expressed in terms of a sum of weighted pairwise margins between two tags by their degree of similarity. We then construct a regularized empirical loss to incorporate linguistic knowledge, and identify a tagger maximizing the separations between the pairwise margins. One salient feature of the proposed method is its capability to identify novel tags absent from a training sample by using their similarity to existing tags. Computationally, the proposed method is implemented by an alternating direction method of multipliers, integrated with a difference convex algorithm. This permits scalable computation. We show that the method achieves accurate tagging, and that it compares favourably with existing methods. Finally, we apply the proposed method to tagging a Reuters news dataset.
AB  - Automatic tagging by key words and phrases is important in multi-label classification of a document. In this paper, we first introduce a tagging loss to measure the discrepancy between predicted and actual tag sets, which is expressed in terms of a sum of weighted pairwise margins between two tags by their degree of similarity. We then construct a regularized empirical loss to incorporate linguistic knowledge, and identify a tagger maximizing the separations between the pairwise margins. One salient feature of the proposed method is its capability to identify novel tags absent from a training sample by using their similarity to existing tags. Computationally, the proposed method is implemented by an alternating direction method of multipliers, integrated with a difference convex algorithm. This permits scalable computation. We show that the method achieves accurate tagging, and that it compares favourably with existing methods. Finally, we apply the proposed method to tagging a Reuters news dataset.
KW  - Alternating direction method of multipliers
KW  - Large margin
KW  - Multi-label classification
KW  - Scalability
KW  - Social bookmarking system
KW  - Text mining
UR  - http://www.scopus.com/inward/record.url?scp=85026899763&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85026899763&partnerID=8YFLogxK
U2  - 10.1093/biomet/asx016
DO  - 10.1093/biomet/asx016
M3  - Article
AN  - SCOPUS:85026899763
SN  - 0006-3444
VL  - 104
SP  - 273
EP  - 290
JO  - Biometrika
JF  - Biometrika
IS  - 2
ER  - 
